Nature 409, 860–921 (2001).

We have identified several items requiring correction or clarification in our paper on the sequencing of the human genome.

• Six additional authors should have been included: Pieter de Jong, Joseph J. Catanese, and Kazutoyo Osoegawa (Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, New York 14263, USA; present address: Children's Hospital Oakland Research Institute, 747 52nd Street, Oakland, California 94609, USA) and Hiroaki Shizuya, Sangdun Choi and Yu-Juin Chen (Division of Biology, California Institute of Technology, Pasadena, California 91125, USA). These investigators and their laboratories constructed the high-quality BAC libraries that were crucial in sequencing the genome, as described in Table 1. These libraries were not previously published. We apologize to our colleagues for this omission.

• The Supplementary Information on Nature's website has been revised. Changes to the original Supplementary Information are available in the Supplementary Information to this Correction. We have added 7 additional investigators to the full list of authors. We have also added 79 additional references, citing previously published sequences that were included in the draft genome sequence.

• Table 27 reported 18 instances of apparently novel paralogues of genes encoding drug targets. We have carefully reviewed these 18 cases and found that two are incorrect: a paralogue of an insulin-like growth factor-1 receptor gene and a paralogue of the calcitonin-related polypeptide alpha gene. In both cases, we had incorrectly recorded the chromosomal location sequence of the known gene, thereby erroneously giving rise to an apparent paralogue (the first instance was identified by J. Englebrecht and C. Kristensen (personal communication)). Of the 16 remaining apparent paralogues, two (calcium channel paralogue IGI_M1_ctg17137_10 and heparan N-deacetylase/N-sulphotransferase paralogue IGI_M1_ctg13263_18) have so far been confirmed as bona fide genes (refs 1, 2).

• Several correspondents have written to point out that a handful of clones listed as human sequence in the HTG division of GenBank (established to house ‘unfinished’ sequence data) are actually mouse sequence (about two dozen out of 30,000 clones). They asked whether these clones give rise to contamination in the human draft sequence. As noted in the paper, we used computer programs to identify and eliminate instances of such contamination (with mouse sequence, vector sequence, and so on) before assembling the draft genome sequence. In reviewing the work, we identified one mouse clone that slipped through the filter. This clone has been eliminated in subsequent assemblies (http://genome.cse.ucsc.edu/). Because the draft sequence remains an imperfect partial product, we welcome additional comments that could help in improving it.

• The discussion of possible horizontal gene transfer from bacterial genomes to vertebrate genomes has provoked considerable debate (refs 3–5). We reported 113 instances of human genes that had reasonably close homologues in bacteria, but either had no homologue or only a weaker homologue in non-vertebrate eukaryotes for which extensive genomic sequence was available. We suggested two hypotheses to explain these data: horizontal gene transfer (HGT) from bacteria to human or gene loss in the other lineages. We had no data to distinguish between these hypotheses, although we suggested that the former was a more “parsimonious” explanation as it involved fewer independent events. In the introduction we stated that this seemed “likely”.

Several correspondents have undertaken a more comprehensive analysis and have argued that a significant proportion of the cases can be explained by gene loss (refs 3–5). We agree. We believe that the two hypotheses cannot be distinguished on the basis of parsimony, because too little is known about the relative rates of HGT and gene loss in evolution. Instead, extensive sequence data from many additional organisms will be required to assess definitively the provenance of each gene.

We note that the process of HGT into the vertebrate genome from other organisms has clearly occurred on multiple occasions, as seen from the sudden arrival of many DNA transposons with strong similarities to other organisms. The most recent documented cases occurred subsequent to the eutherian radiation (see Fig. 19).

• A key reference concerning 3′-transduction by LINE elements was omitted on page 887. The sentence citing references 205 and 206 should also have cited Goodier et al. (ref. 6).

• In Fig. 33, the unit on the y axis should be bp, not kb. The legend should read: “Sequence properties of segmental duplications. Distribution of length and per cent nucleotide identity are shown as a function of the number of aligned bp from the finished vs finished human genomic sequence dataset. Intrachromosomal (blue), interchromosomal (red).”

• In Fig. 41, the legend should begin: “For each of the 27 common domain families, the number of different Pfam domain types that co-occur with the family in each of the five eukaryotic proteomes. The 27 families were chosen to include the 10 most common domain families in each proteome. The data are ranked …”

• In Table 22, the entry 81,126 should be 8,126.

• On page 898, line 31, the final phrase of the sentence (“… and the representativeness of currently ‘known’ human genes”) should be deleted. The sentence should read: “Before discussing the gene predictions for the human genome, it is useful to consider background issues, including previous estimates of the number of human genes and lessons learned from worms and flies”.

• On page 900, line 38, remove “(see above)”.

• We failed to acknowledge the crucial role of sequence editing software, which has been widely used for inspection and subsequent finishing of the sequence assemblies. The two principal programs used were CONSED (ref. 7) and GAP4 (ref. 8).