ARTICLE OPEN
doi:10.1038/nature12027

The African coelacanth genome provides
insights into tetrapod evolution
Chris T. Amemiya1,2*, Jessica Alföldi3*, Alison P. Lee4, Shaohua Fan5, Hervé Philippe6, Iain MacCallum3, Ingo Braasch7,
Tereza Manousaki5,8, Igor Schneider9, Nicolas Rohner10, Chris Organ11, Domitille Chalopin12, Jeramiah J. Smith13, Mark Robinson1,
Rosemary A. Dorrington14, Marco Gerdol15, Bronwen Aken16, Maria Assunta Biscotti17, Marco Barucca17, Denis Baurain18,
Aaron M. Berlin3, Gregory L. Blatch14,19, Francesco Buonocore20, Thorsten Burmester21, Michael S. Campbell22, Adriana Canapa17,
John P. Cannon23, Alan Christoffels24, Gianluca De Moro15, Adrienne L. Edkins14, Lin Fan3, Anna Maria Fausto20,
Nathalie Feiner5,25, Mariko Forconi17, Junaid Gamieldien24, Sante Gnerre3, Andreas Gnirke3, Jared V. Goldstone26,
Wilfried Haerty27, Mark E. Hahn26, Uljana Hesse24, Steve Hoffmann28, Jeremy Johnson3, Sibel I. Karchner26, Shigehiro Kuraku5†,
Marcia Lara3, Joshua Z. Levin3, Gary W. Litman23, Evan Mauceli3†, Tsutomu Miyake29, M. Gail Mueller30, David R. Nelson31,
Anne Nitsche32, Ettore Olmo17, Tatsuya Ota33, Alberto Pallavicini15, Sumir Panji24†, Barbara Picone24, Chris P. Ponting27,
Sonja J. Prohaska34, Dariusz Przybylski3, Nil Ratan Saha1, Vydianathan Ravi4, Filipe J. Ribeiro3†, Tatjana Sauka-Spengler35,
Giuseppe Scapigliati20, Stephen M. J. Searle16, Ted Sharpe3, Oleg Simakov5,36, Peter F. Stadler32, John J. Stegeman26,
Kenta Sumiyama37, Diana Tabbaa3, Hakim Tafer32, Jason Turner-Maier3, Peter van Heusden24, Simon White16, Louise Williams3,
Mark Yandell22, Henner Brinkmann6, Jean-Nicolas Volff12, Clifford J. Tabin10, Neil Shubin38, Manfred Schartl39, David B. Jaffe3,
John H. Postlethwait7, Byrappa Venkatesh4, Federica Di Palma3, Eric S. Lander3, Axel Meyer5,8,25 & Kerstin Lindblad-Toh3,40

The discovery of a living coelacanth specimen in 1938 was remarkable, as this lineage of lobe-finned fish was thought to
have become extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient
relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on
land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic
analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth
protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features.
Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved
in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain and olfaction. Functional assays of
enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues show the importance
of the coelacanth genome as a blueprint for understanding tetrapod evolution.

In 1938 Marjorie Courtenay-Latimer, the curator of a small natural
history museum in East London, South Africa, discovered a large,
unusual-looking fish among the many specimens delivered to her by
a local fish trawler. Latimeria chalumnae, named after its discoverer1,
was over 1 m long, bluish in colour and had conspicuously fleshy fins
that resembled the limbs of terrestrial vertebrates. This discovery is
considered to be one of the most notable zoological finds of the twentieth
century. Latimeria is the only living member of an ancient group
of lobe-finned fishes that was known previously only from fossils and
believed to have been extinct since the Late Cretaceous period,
approximately 70 million years ago (Myr ago)1. It was almost 15 years
before a second specimen of this elusive species was discovered in the

*These authors contributed equally to this work.

1Molecular Genetics Program, Benaroya Research Institute, Seattle, Washington 98101, USA. 2Department of Biology, University of Washington, Seattle, Washington 98105, USA. 3Broad Institute of MIT
and Harvard, Cambridge, Massachusetts 02142, USA. 4Comparative Genomics Laboratory, Institute of Molecular and Cell Biology, A*STAR, Biopolis, Singapore 138673, Singapore. 5Department of Biology,
University of Konstanz, Konstanz 78464, Germany. 6Département de Biochimie, Université de Montréal, Centre Robert Cedergren, Montréal H3T 1J4, Canada. 7Institute of Neuroscience, University of
Oregon, Eugene, Oregon 97403, USA. 8Konstanz Research School of Chemical Biology, University of Konstanz, Konstanz 78464, Germany. 9Instituto de Ciencias Biologicas, Universidade Federal do Para,
Belem 66075-110, Brazil. 10Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA. 11Department of Anthropology, University of Utah, Salt Lake City, Utah 84112, USA.
12Institut de Genomique Fonctionnelle de Lyon, Ecole Normale Superieure de Lyon, Lyon 69007, France. 13Department of Biology, University of Kentucky, Lexington, Kentucky 40506, USA. 14Biomedical
Biotechnology Research Unit (BioBRU), Department of Biochemistry, Microbiology & Biotechnology, Rhodes University, Grahamstown 6139, South Africa. 15Department of Life Sciences, University of
Trieste, Trieste 34128, Italy. 16Department of Informatics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK. 17Department of Life and Environmental Sciences, Polytechnic University of Marche,
Ancona 60131, Italy. 18Department of Life Sciences, University of Liege, Liege 4000, Belgium. 19College of Health and Biomedicine, Victoria University, Melbourne VIC 8001, Australia. 20Department for
Innovation in Biological, Agro-food and Forest Systems, University of Tuscia, Viterbo 01100, Italy. 21Department of Biology, University of Hamburg, Hamburg 20146, Germany. 22Eccles Institute of Human
Genetics, University of Utah, Salt Lake City, Utah 84112, USA. 23Department of Pediatrics, University of South Florida Morsani College of Medicine, Children’s Research Institute, St. Petersburg, Florida
33701, USA. 24South African National Bioinformatics Institute, University of the Western Cape, Bellville 7535, South Africa. 25International Max-Planck Research School for Organismal Biology, University of
Konstanz, Konstanz 78464, Germany. 26Biology Department, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, USA. 27MRC Functional Genomics Unit, Oxford University, Oxford
OX1 3PT, UK. 28Transcriptome Bioinformatics Group, LIFE Research Center for Civilization Diseases, Universität Leipzig, Leipzig 04109, Germany. 29Graduate School of Science and Technology, Keio
University, Yokohama 223-8522, Japan. 30Department of Molecular Genetics, All Children’s Hospital, St. Petersburg, Florida 33701, USA. 31Department of Microbiology, Immunology and Biochemistry,
University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA. 32Bioinformatics Group, Department of Computer Science, Universität Leipzig, Leipzig 04109, Germany. 33Department of
Evolutionary Studies of Biosystems, The Graduate University for Advanced Studies, Hayama 240-0193, Japan. 34Computational EvoDevo Group, Department of Computer Science, Universität Leipzig,
Leipzig 04109, Germany. 35Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX1 2JD, UK. 36European Molecular Biology Laboratory, Heidelberg 69117, Germany. 37Division of
Population Genetics, National Institute of Genetics, Mishima 411-8540, Japan. 38University of Chicago, Chicago, Illinois 60637, USA. 39Department Physiological Chemistry, Biocenter, University of
Wuerzburg, Wuerzburg 97070, Germany. 40Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala 751 23, Sweden. †Present addresses: Genome
Resource and Analysis Unit, Center for Developmental Biology, RIKEN, Kobe, Japan (S.K.); Boston Children’s Hospital, Boston, Massachusetts, USA (E.M.); Computational Biology Unit, Institute of Infectious
Disease and Molecular Medicine, University of Cape Town Health Sciences Campus, Anzio Road, Observatory 7925, South Africa (S.P.); New York Genome Center, New York, New York, USA (F.J.R.).

18 April 2013 | Vol 496 | Nature | 311

©2013 Macmillan Publishers Limited. All rights reserved

Comoros Islands in the Indian Ocean, and only 309 individuals have
been recorded in the past 75 years (R. Nulens, personal communication)2.
The discovery in 1997 of a second coelacanth species in Indonesia,
Latimeria menadoensis, was equally surprising, as it had been assumed
that living coelacanths were confined to small populations off the East
African coast3,4. Fascination with these fish is partly due to their pre-
historic appearance—remarkably, their morphology is similar to that
of fossils that date back at least 300 Myr, leading to the supposition that,
among vertebrates, this lineage is markedly slow to evolve1,5. Latimeria
has also been of particular interest to evolutionary biologists, owing to
its hotly debated relationship to our last fish ancestor, the fish that first
crawled onto land6. In the past 15 years, targeted sequencing efforts
have produced the sequences of the coelacanth mitochondrial genomes7,
HOX clusters8 and a few gene families9,10. Nevertheless, coelacanth
research has felt the lack of large-scale sequencing data. Here we describe
the sequencing and comparative analysis of the genome of L. chalumnae,
the African coelacanth.

Genome assembly and annotation
The African coelacanth genome was sequenced and assembled using
DNA from a Comoros Islands Latimeria chalumnae specimen (Sup-
plementary Fig. 1). It was sequenced by Illumina sequencing tech-
nology and assembled using the short read genome assembler
ALLPATHS-LG11. The L. chalumnae genome has been reported previ-
ously to have a karyotype of 48 chromosomes12. The draft assembly is
2.86 gigabases (Gb) in size and is composed of 2.18 Gb of sequence plus
gaps between contigs. The coelacanth genome assembly has a contig
N50 size (the contig size above which 50% of the total length of the
sequence assembly can be found) of 12.7 kilobases (kb) and a scaffold
N50 size of 924 kb, and quality metrics comparable to other Illumina
genomes (Supplementary Note 1, and Supplementary Tables 1 and 2).
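
The N50 definition given above can be made concrete with a short calculation. The following is a minimal Python sketch, assuming only a list of contig (or scaffold) lengths as input; the function name `n50` and the example lengths are hypothetical and are not taken from the assembly itself.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths, in arbitrary units)
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

Because the statistic depends only on the sorted lengths, the same function applies to contigs and to scaffolds; only the input list changes.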

The genome assembly was annotated separately by both the Ensembl
gene annotation pipeline (Ensembl release 66, February 2012) and by
MAKER13. The Ensembl gene annotation pipeline created gene models
using protein alignments from the Universal Protein Resource (Uni-
prot) database, limited coelacanth complementary DNA data, RNA-seq
data generated from L. chalumnae muscle (18 Gb of paired-end reads
were assembled using Trinity software14, Supplementary Fig. 2) as well
as orthology with other vertebrates. This pipeline produced 19,033
protein-coding genes containing 21,817 transcripts. The MAKER
pipeline used the L. chalumnae Ensembl gene set, Uniprot protein
alignments, and L. chalumnae (muscle) and L. menadoensis (liver
and testis)15 RNA-seq data to create gene models, and this produced
29,237 protein-coding gene annotations. In addition, 2,894 short non-
coding RNAs, 1,214 long non-coding RNAs, and more than 24,000
conserved RNA secondary structures were identified (Supplementary
Note 2, Supplementary Tables 3 and 4, Supplementary Data 1–3 and
Supplementary Fig. 3). It was inferred that 336 genes underwent spe-
cific duplications in the coelacanth lineage (Supplementary Note 3,
Supplementary Tables 5 and 6, and Supplementary Data 4).

The closest living fish relative of tetrapods
The question of which living fish is the closest relative to ‘the fish that
first crawled on to land’ has long captured our imagination: among
scientists the odds have been placed on either the lungfish or the
coelacanth16. Analyses of small to moderate amounts of sequence data
for this important phylogenetic question (ranging from 1 to 43 genes)
have tended to favour the lungfishes as the extant sister group to the
land vertebrates17. However, the alternative hypothesis that the lung-
fish and the coelacanth are equally closely related to the tetrapods
could not be rejected with previous data sets18.

To seek a comprehensive answer we generated RNA-seq data from
three samples (brain, gonad and kidney, and gut and liver) from the
West African lungfish, Protopterus annectens, and compared it to gene
sets from 21 strategically chosen jawed vertebrate species. To perform a
reliable analysis we selected 251 genes in which a 1:1 orthology ratio
was clear and used CAT-GTR, a complex site-heterogeneous model of
sequence evolution that is known to reduce tree-reconstruction arte-
facts19 (see Supplementary Methods). The resulting phylogeny, based
on 100,583 concatenated amino acid positions (Fig. 1, posterior probability
= 1.0 for the lungfish–tetrapod node) is maximally supported
except for the relative positions of the armadillo and the elephant. It
corroborates known vertebrate phylogenetic relationships and
strongly supports the conclusion that tetrapods are more closely
related to lungfish than to the coelacanth (Supplementary Note 4
and Supplementary Fig. 4).

The slowly evolving coelacanth
The morphological resemblance of the modern coelacanth to its fossil
ancestors has resulted in it being nicknamed ‘the living fossil’1. This
invites the question of whether the genome of the coelacanth is as
slowly evolving as its outward appearance suggests. Earlier work
showed that a few gene families, such as Hox and protocadherins,
have comparatively slower protein-coding evolution in coelacanth
than in other vertebrate lineages8,10. To address the question, we
compared several features of the coelacanth genome to those of other
vertebrate genomes.

Protein-coding gene evolution was examined using the phyloge-
nomics data set described above (251 concatenated proteins) (Fig. 1).
Pair-wise distances between taxa were calculated from the branch
lengths of the tree using the two-cluster test proposed previously20
to test for equality of average substitution rates. Then, for each of
the following species and species clusters (coelacanth, lungfish,
chicken and mammals), we ascertained their respective mean distance
to an outgroup consisting of three cartilaginous fishes (elephant
shark, little skate and spotted catshark). Finally, we tested whether
there was any significant difference in the distance to the outgroup of
cartilaginous fish for every pair of species and species clusters, using a

[Figure 1 tree. Tip labels: Dog, Human, Mouse, Elephant, Armadillo, Opossum, Platypus, Chicken, Turkey, Zebra finch, Lizard, Western clawed frog, Chinese brown frog, Lungfish, Coelacanth, Tilapia, Pufferfish, Zebrafish, Elephant shark, Little skate, Spotted catshark, Tammar wallaby. Clade labels: Tetrapods, Lobe-finned fish, Cartilaginous fish, Ray-finned fish. Scale bar: 0.1 substitutions per site.]

Figure 1 | A phylogenetic tree of a broad selection of jawed vertebrates
shows that lungfish, not coelacanth, is the closest relative of tetrapods.
Multiple sequence alignments of 251 genes with a 1:1 ratio of orthologues in
22 vertebrates and with a full sequence coverage for both lungfish and
coelacanth were used to generate a concatenated matrix of 100,583
unambiguously aligned amino acid positions. The Bayesian tree was inferred
using PhyloBayes under the CAT + GTR + Γ4 model with confidence estimates
derived from 100 gene jack-knife replicates (support is 100% for all clades but
armadillo + elephant with 45%)48. The tree was rooted on cartilaginous fish, and
shows that the lungfish is more closely related to tetrapods than the coelacanth,
and that the protein sequence of coelacanth is evolving slowly. Pink lines
(tetrapods) are slightly offset from purple lines (lobe-finned fish), to indicate
that these species are both tetrapods and lobe-finned fish.


Z statistic. When these distances to the outgroup of cartilaginous fish
were compared, we found that the coelacanth proteins that were
tested were significantly more slowly evolving (0.890 substitutions
per site) than the lungfish (1.05 substitutions per site), chicken (1.09
substitutions per site) and mammalian (1.21 substitutions per site)
orthologues (P < 10⁻⁶ in all cases) (Supplementary Data 5). In addition,
as can be seen in Fig. 1, the substitution rate in coelacanth is approximately
half that in tetrapods since the two lineages diverged. A Tajima’s
relative rate test21 confirmed the coelacanth’s significantly slower rate
of protein evolution (P < 10⁻²⁰) (Supplementary Data 6).
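
The logic of a relative rate test can be illustrated with a toy example. The following is a minimal Python sketch of Tajima's one-degree-of-freedom test, assuming three pre-aligned sequences of equal length (two ingroup lineages and one outgroup). The function name `tajima_relative_rate` and the example sequences are hypothetical; this is not the authors' pipeline or data.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's relative rate test on three aligned sequences.

    m1 counts sites where seq1 alone differs (seq2 == outgroup),
    m2 counts sites where seq2 alone differs (seq1 == outgroup).
    Under equal rates, (m1 - m2)^2 / (m1 + m2) follows a chi-square
    distribution with 1 degree of freedom.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, c in zip(seq1, seq2, outgroup):
        if "-" in (a, b, c):          # skip alignment gaps
            continue
        if a != b and b == c:         # change unique to lineage 1
            m1 += 1
        elif a != b and a == c:       # change unique to lineage 2
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of chi-square with 1 d.f.: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p_value


# Toy alignment: lineage 1 is unchanged, lineage 2 has two unique changes
slow = "AAAAAAAAAA"
fast = "ACCAAAAAAA"
out = "AAAAAAAAAA"
print(tajima_relative_rate(slow, fast, out))
```

A markedly smaller count of lineage-specific changes in one ingroup sequence than in the other is the kind of asymmetry that such a test detects; with only ten toy sites the difference is not significant.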

We next examined the abundance of transposable elements in the
coelacanth genome. Theoretically, transposable elements may make
their greatest contribution to the evolution of a species by generating
templates for exaptation to form novel regulatory elements and exons,
and by acting as substrates for genomic rearrangement22. We found
that the coelacanth genome contains a wide variety of transposable-
element superfamilies and has a relatively high transposable-element
content (25%); this number is probably an underestimate as this is a
draft assembly (Supplementary Note 5 and Supplementary Tables
7–10). Analysis of RNA-seq data and of the divergence of individual
transposable-element copies from consensus sequences shows that
14 coelacanth transposable-element superfamilies are currently active
(Supplementary Note 6, Supplementary Table 10 and Supplementary
Fig. 5). We conclude that the current coelacanth genome shows both
an abundance and activity of transposable elements similar to many
other genomes. This contrasts with the slow protein evolution observed.

Analyses of chromosomal breakpoints in the coelacanth genome
and tetrapod genomes reveal extensive conservation of synteny and
indicate that large-scale rearrangements have occurred at a generally
low rate in the coelacanth lineage. Analyses of these rearrangement
classes detected several fission events published previously23 that are
known to have occurred in tetrapod lineages, and at least 31 inter-
chromosomal rearrangements that occurred in the coelacanth lineage
or the early tetrapod lineage (0.063 fusions per 1 Myr), compared to
20 events (0.054 fusions per 1 Myr) in the salamander lineage and
21 events (0.057 fusions per 1 Myr) in the Xenopus lineage23 (Sup-
plementary Note 7 and Supplementary Fig. 6). Overall, these analyses
indicate that karyotypic evolution in the coelacanth lineage has
occurred at a relatively slow rate, similar to that of non-mammalian
tetrapods24.

In a separate analysis we also examined the evolutionary divergence
between the two species of coelacanth, L. chalumnae and L. menadoensis,
found in African and Indonesian waters, respectively. Previous ana-
lysis of mitochondrial DNA showed a sequence identity of 96%, but
estimated divergence times range widely from 6 to 40 Myr25,26. When
we compared the liver and testis transcriptomes of L. menadoensis27
to the L. chalumnae genome, we found an identity of 99.73% (Sup-
plementary Note 8 and Supplementary Fig. 7), whereas alignments
between 20 sequenced L. menadoensis bacterial artificial chromosomes
(BACs) and the L. chalumnae genome showed an identity of 98.7%
(Supplementary Table 11 and Supplementary Fig. 8). Both the genic
and genomic divergence rates are similar to those seen between the
human and chimpanzee genomes (99.5% and 98.8%, respectively;
divergence time of 6 to 8 Myr ago)28, whereas the rates of molecular
evolution in Latimeria are probably affected by several factors, includ-
ing the slower substitution rate seen in coelacanth. This suggests a
slightly longer divergence time for the two coelacanth species.

The adaptation of vertebrates to land
As the species with a sequenced genome closest to our most recent
aquatic ancestor, the coelacanth provides a unique opportunity to
identify genomic changes that were associated with the successful
adaptation of vertebrates to the land environment.

Over the 400 Myr that vertebrates have lived on land, some genes
that are unnecessary for existence in their new environment have been
eliminated. To understand this aspect of the water-to-land transition,
we surveyed the Latimeria genome annotations to identify genes that
were present in the last common ancestor of all bony fish (including
the coelacanth) but that are missing from tetrapod genomes. More
than 50 such genes, including components of fibroblast growth factor
(FGF) signalling, TGF-β and bone morphogenic protein (BMP) signalling,
and WNT signalling pathways, as well as many transcription
factor genes, were inferred to be lost based on the coelacanth data
(Supplementary Data 7 and Supplementary Fig. 9). Previous studies of
genes that were lost in this transition could only compare teleost fish
to tetrapods, meaning that differences in gene content could have
been due to loss in the tetrapod or in the lobe-finned fish lineages.
We were able to confirm that four genes that were shown previously to
be absent in tetrapods (And1 and And2 (ref. 29), Fgf24 (ref. 30) and
Asip2 (ref. 31)), were indeed present and intact in Latimeria, support-
ing the idea that they were lost in the tetrapod lineage.

We functionally annotated more than 50 genes lost in tetrapods
using zebrafish data (gene expression, knock-downs and knockouts).
Many genes were classified in important developmental categories
(Supplementary Data 7): fin development (13 genes); otolith and
ear development (8 genes); kidney development (7 genes); trunk,
somite and tail development (11 genes); eye (13 genes); and brain
development (23 genes). This implies that critical characters in the
morphological transition from water to land (for example, fin-to-limb
transition and remodelling of the ear) are reflected in the loss of
specific genes along the phylogenetic branch leading to tetrapods.
However, homeobox genes, which are responsible for the develop-
ment of an organism’s basic body plan, show only slight differences
between Latimeria, ray-finned fish and tetrapods; it would seem that
the protein-coding portion of this gene family, along with several
others (Supplementary Note 9, Supplementary Tables 12–16 and Supplementary
Fig. 10), has remained largely conserved during the
vertebrate land transition (Supplementary Fig. 11).

As vertebrates transitioned to a new land environment, changes
occurred not only in gene content but also in the regulation of existing
genes. Conserved non-coding elements (CNEs) are strong candidates
for gene regulatory elements. They can act as promoters, enhancers,
repressors and insulators32,33, and have been implicated as major faci-
litators of evolutionary change34. To identify CNEs that originated in
the most recent common ancestor of tetrapods, we predicted CNEs
that evolved in various bony vertebrate (that is, ray-finned fish, coela-
canth and tetrapod) lineages and assigned them to their likely branch
points of origin. To detect CNEs, conserved sequences in the human
genome were identified using MULTIZ alignments of bony vertebrate
genomes, and then known protein-coding sequences, untranslated
regions (UTRs) and known RNA genes were excluded. Our ana-
lysis identified 44,200 ancestral tetrapod CNEs that originated after
the divergence of the coelacanth lineage. They represent 6% of the
739,597 CNEs that are under constraint in the bony vertebrate lin-
eage. We compared the ancestral tetrapod CNEs to mouse embryo
ChIP-seq (chromatin immunoprecipitation followed by sequencing)
data obtained using antibodies against p300, a transcriptional coacti-
vator. This resulted in a sevenfold enrichment in the p300 binding
sites for our candidate CNEs and confirmed that these CNEs are
indeed enriched for gene regulatory elements.

Each tetrapod CNE was assigned to the gene whose transcription
start site was closest, and gene-ontology category enrichment was cal-
culated for those genes. The most enriched categories were involved
with smell perception (for example, sensory perception of smell,
detection of chemical stimulus and olfactory receptor activity). This
is consistent with the notable expansion of olfactory receptor family
genes in tetrapods compared with teleosts, and may reflect the neces-
sity of a more tightly regulated, larger and more diverse repertoire of
olfactory receptors for detecting airborne odorants as part of the
terrestrial lifestyle. Other significant categories include morphogenesis
(radial pattern formation, hind limb morphogenesis, kidney morphogenesis)
and cell differentiation (endothelial cell fate commitment,
epithelial cell fate commitment), which is consistent with the body-
plan changes required for land transition, as well as immunoglobulin
VDJ recombination, which reflects the presumed response differences
required to address the novel pathogens that vertebrates would encoun-
ter on land (Supplementary Note 10 and Supplementary Tables 17–24).
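
The assignment of each CNE to the gene with the closest transcription start site (TSS) can be sketched as a nearest-neighbour lookup. The following minimal Python example assumes a single chromosome, TSS coordinates already sorted in ascending order, and a CNE represented by its midpoint. The function name, coordinates and gene names are hypothetical, and the gene-ontology enrichment step is not shown.

```python
from bisect import bisect_left


def nearest_gene(cne_midpoint, tss_positions, gene_names):
    """Return the gene whose TSS lies closest to a CNE midpoint.

    tss_positions must be sorted ascending; gene_names is a parallel list.
    On an exact tie, the downstream (higher-coordinate) TSS is chosen.
    """
    if not tss_positions:
        raise ValueError("no TSS positions supplied")
    i = bisect_left(tss_positions, cne_midpoint)
    candidates = []
    if i < len(tss_positions):
        candidates.append(i)        # first TSS at or after the CNE
    if i > 0:
        candidates.append(i - 1)    # last TSS before the CNE
    best = min(candidates, key=lambda j: abs(tss_positions[j] - cne_midpoint))
    return gene_names[best]


# Hypothetical TSS coordinates on one chromosome
tss = [100, 500, 900]
genes = ["geneA", "geneB", "geneC"]
for midpoint in (50, 120, 800, 1000):
    print(midpoint, nearest_gene(midpoint, tss, genes))
```

A binary search keeps each lookup logarithmic in the number of genes, which matters when tens of thousands of elements are assigned; a genome-wide version would repeat the lookup per chromosome.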

A major innovation of tetrapods is the evolution of limbs charac-
terized by digits. The limb skeleton consists of a stylopod (humerus or
femur), the zeugopod (radius and ulna, or tibia and fibula), and an
autopod (wrist or ankle, and digits). There are two major hypotheses
about the origins of the autopod: that it was a novel feature of tetrapods,
or that it has antecedents in the fins of fish35 (Supplementary
Note 11 and Supplementary Fig. 12). We examine here the Hox
regulation of limb development in ray-finned fish, coelacanth and
tetrapods to address these hypotheses.

In mouse, late-phase digit enhancers are located in a gene desert
that is proximal to the HOX-D cluster36. Here we provide an align-
ment of the HOX-D centromeric gene desert of coelacanth with those
of tetrapods and ray-finned fishes (Fig. 2a). Among the six cis-regulatory
sequences previously identified in this gene desert36, three sequences
show sequence conservation restricted to tetrapods (Supplementary
Fig. 13). However, one regulatory sequence (island 1) is shared by tetra-
pods and coelacanth, but not by ray-finned fish (Fig. 2b and Supplemen-
tary Fig. 14). When tested in a transient transgenic assay in mouse, the
coelacanth sequence of island 1 was able to drive reporter expression in a
limb-specific pattern (Fig. 2c). This suggests that island 1 was a lobe-
fin developmental enhancer in the fish ancestor of tetrapods that was
then coopted into the autopod enhancer of modern tetrapods. In this
case, the autopod developmental regulation was derived from an ances-
tral lobe-finned fish regulatory element.

Changes in the urea cycle provide an illuminating example of the
adaptations associated with transition to land. Excretion of nitrogen is
a major physiological challenge for terrestrial vertebrates. In aquatic
environments, the primary nitrogenous waste product is ammonia,
which is readily diluted by surrounding water before it reaches toxic
levels, but on land, less toxic substances such as urea or uric acid must
be produced instead (Supplementary Fig. 15). The widespread and
almost exclusive occurrence of urea excretion in amphibians, some
turtles and mammals has led to the hypothesis that the use of urea as
the main nitrogenous waste product was a key innovation in the
vertebrate transition from water to land37.

With the availability of gene sequences from coelacanth and lungfish,
it became possible to test this hypothesis. We used a branch-site model
in the HYPHY package38, which estimates the ratio of non-synonymous (dN)
to synonymous (dS) substitutions (ω values) among different
branches and among different sites (codons) across a multiple-species
sequence alignment. For the rate-limiting enzyme of the hepatic urea
cycle, carbamoyl phosphate synthase I (CPS1), only one branch of the
tree shows a strong signature of selection (P = 0.02), namely the branch
leading to tetrapods and the branch leading to amniotes (Fig. 3); no
other enzymes in this cycle showed a signature of selection. Conversely,
mitochondrial arginase (ARG2), which produces extrahepatic urea as a
byproduct of arginine metabolism but is not involved in the production
of urea for nitrogenous waste disposal, did not show any evidence of
selection in vertebrates (Supplementary Fig. 16). This leads us to con-
clude that adaptive evolution occurred in the hepatic urea cycle during
the vertebrate land transition. In addition, it is interesting to note that
of the five amino acids of CPS1 that changed between coelacanth and
tetrapods, three are in important domains (the two ATP-binding sites
and the subunit interaction domain) and a fourth is known to cause a
malfunctioning enzyme in human patients if mutated39.

The adaptation to a terrestrial lifestyle necessitated major changes in
the physiological environment of the developing embryo and fetus,
resulting in the evolution and specialization of extra-embryonic mem-
branes of the amniote mammals40. In particular, the placenta is a com-
plex structure that is critical for providing gas and nutrient exchange
between mother and fetus, and is also a major site of haematopoiesis41.

We have identified a region of the coelacanth HOX-A cluster that
may have been involved in the evolution of extra-embryonic struc-
tures in tetrapods, including the eutherian placenta. Global alignment
of the coelacanth Hoxa14–Hoxa13 region with the homologous
regions of the horn shark, chicken, human and mouse revealed a
CNE just upstream of the coelacanth Hoxa14 gene (Supplementary
Fig. 17a). This conserved stretch is not found …

ARTICLE OPEN
doi:10.1038/nature12027

The African coelacanth genome provides
insights into tetrapod evolution
Chris T. Amemiya1,2*, Jessica Alföldi3*, Alison P. Lee4, Shaohua Fan5, Hervé Philippe6, Iain MacCallum3, Ingo Braasch7,
Tereza Manousaki5,8, Igor Schneider9, Nicolas Rohner10, Chris Organ11, Domitille Chalopin12, Jeramiah J. Smith13, Mark Robinson1,
Rosemary A. Dorrington14, Marco Gerdol15, Bronwen Aken16, Maria Assunta Biscotti17, Marco Barucca17, Denis Baurain18,
Aaron M. Berlin3, Gregory L. Blatch14,19, Francesco Buonocore20, Thorsten Burmester21, Michael S. Campbell22, Adriana Canapa17,
John P. Cannon23, Alan Christoffels24, Gianluca De Moro15, Adrienne L. Edkins14, Lin Fan3, Anna Maria Fausto20,
Nathalie Feiner5,25, Mariko Forconi17, Junaid Gamieldien24, Sante Gnerre3, Andreas Gnirke3, Jared V. Goldstone26,
Wilfried Haerty27, Mark E. Hahn26, Uljana Hesse24, Steve Hoffmann28, Jeremy Johnson3, Sibel I. Karchner26, Shigehiro Kuraku5{,
Marcia Lara3, Joshua Z. Levin3, Gary W. Litman23, Evan Mauceli3{, Tsutomu Miyake29, M. Gail Mueller30, David R. Nelson31,
Anne Nitsche32, Ettore Olmo17, Tatsuya Ota33, Alberto Pallavicini15, Sumir Panji24{, Barbara Picone24, Chris P. Ponting27,
Sonja J. Prohaska34, Dariusz Przybylski3, Nil Ratan Saha1, Vydianathan Ravi4, Filipe J. Ribeiro3{, Tatjana Sauka-Spengler35,
Giuseppe Scapigliati20, Stephen M. J. Searle16, Ted Sharpe3, Oleg Simakov5,36, Peter F. Stadler32, John J. Stegeman26,
Kenta Sumiyama37, Diana Tabbaa3, Hakim Tafer32, Jason Turner-Maier3, Peter van Heusden24, Simon White16, Louise Williams3,
Mark Yandell22, Henner Brinkmann6, Jean-Nicolas Volff12, Clifford J. Tabin10, Neil Shubin38, Manfred Schartl39, David B. Jaffe3,
John H. Postlethwait7, Byrappa Venkatesh4, Federica Di Palma3, Eric S. Lander3, Axel Meyer5,8,25 & Kerstin Lindblad-Toh3,40

The discovery of a living coelacanth specimen in 1938 was remarkable, as this lineage of lobe-finned fish was thought to
have become extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient
relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on
land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic
analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth
protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features.
Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved
in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain and olfaction. Functional assays of
enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues show the importance
of the coelacanth genome as a blueprint for understanding tetrapod evolution.

In 1938 Marjorie Courtenay-Latimer, the curator of a small natural
history museum in East London, South Africa, discovered a large,
unusual-looking fish among the many specimens delivered to her by
a local fish trawler. Latimeria chalumnae, named after its discoverer1,
was over 1 m long, bluish in colour and had conspicuously fleshy fins
that resembled the limbs of terrestrial vertebrates. This discovery is
considered to be one of the most notable zoological finds of the twentieth
century. Latimeria is the only living member of an ancient group
of lobe-finned fishes that was known previously only from fossils and
believed to have been extinct since the Late Cretaceous period,
approximately 70 million years ago (Myr ago)1. It was almost 15 years
before a second specimen of this elusive species was discovered in the

*These authors contributed equally to this work.

1Molecular Genetics Program, Benaroya Research Institute, Seattle, Washington 98101, USA. 2Department of Biology, University of Washington, Seattle, Washington 98105, USA. 3Broad Institute of MIT
and Harvard, Cambridge, Massachusetts 02142, USA. 4Comparative Genomics Laboratory, Institute of Molecular and Cell Biology, A*STAR, Biopolis, Singapore 138673, Singapore. 5Department of Biology,
University of Konstanz, Konstanz 78464, Germany. 6Département de Biochimie, Université de Montréal, Centre Robert Cedergren, Montréal H3T 1J4, Canada. 7Institute of Neuroscience, University of
Oregon, Eugene, Oregon 97403, USA. 8Konstanz Research School of Chemical Biology, University of Konstanz, Konstanz 78464, Germany. 9Instituto de Ciencias Biologicas, Universidade Federal do Para,
Belem 66075-110, Brazil. 10Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA. 11Department of Anthropology, University of Utah, Salt Lake City, Utah 84112, USA.
12Institut de Genomique Fonctionnelle de Lyon, Ecole Normale Superieure de Lyon, Lyon 69007, France. 13Department of Biology, University of Kentucky, Lexington, Kentucky 40506, USA. 14Biomedical
Biotechnology Research Unit (BioBRU), Department of Biochemistry, Microbiology & Biotechnology, Rhodes University, Grahamstown 6139, South Africa. 15Department of Life Sciences, University of
Trieste, Trieste 34128, Italy. 16Department of Informatics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK. 17Department of Life and Environmental Sciences, Polytechnic University of Marche,
Ancona 60131, Italy. 18Department of Life Sciences, University of Liege, Liege 4000, Belgium. 19College of Health and Biomedicine, Victoria University, Melbourne VIC 8001, Australia. 20Department for
Innovation in Biological, Agro-food and Forest Systems, University of Tuscia, Viterbo 01100, Italy. 21Department of Biology, University of Hamburg, Hamburg 20146, Germany. 22Eccles Institute of Human
Genetics, University of Utah, Salt Lake City, Utah 84112, USA. 23Department of Pediatrics, University of South Florida Morsani College of Medicine, Children’s Research Institute, St. Petersburg, Florida
33701, USA. 24South African National Bioinformatics Institute, University of the Western Cape, Bellville 7535, South Africa. 25International Max-Planck Research School for Organismal Biology, University of
Konstanz, Konstanz 78464, Germany. 26Biology Department, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, USA. 27MRC Functional Genomics Unit, Oxford University, Oxford
OX1 3PT, UK. 28Transcriptome Bioinformatics Group, LIFE Research Center for Civilization Diseases, Universität Leipzig, Leipzig 04109, Germany. 29Graduate School of Science and Technology, Keio
University, Yokohama 223-8522, Japan. 30Department of Molecular Genetics, All Children’s Hospital, St. Petersburg, Florida 33701, USA. 31Department of Microbiology, Immunology and Biochemistry,
University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA. 32Bioinformatics Group, Department of Computer Science, Universität Leipzig, Leipzig 04109, Germany. 33Department of
Evolutionary Studies of Biosystems, The Graduate University for Advanced Studies, Hayama 240-0193, Japan. 34Computational EvoDevo Group, Department of Computer Science, Universität Leipzig,
Leipzig 04109, Germany. 35Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX1 2JD, UK. 36European Molecular Biology Laboratory, Heidelberg 69117, Germany. 37Division of
Population Genetics, National Institute of Genetics, Mishima 411-8540, Japan. 38University of Chicago, Chicago, Illinois 60637, USA. 39Department Physiological Chemistry, Biocenter, University of
Wuerzburg, Wuerzburg 97070, Germany. 40Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala 751 23, Sweden. †Present addresses: Genome
Resource and Analysis Unit, Center for Developmental Biology, RIKEN, Kobe, Japan (S.K.); Boston Children’s Hospital, Boston, Massachusetts, USA (E.M.); Computational Biology Unit, Institute of Infectious
Disease and Molecular Medicine, University of Cape Town Health Sciences Campus, Anzio Road, Observatory 7925, South Africa (S.P.); New York Genome Center, New York, New York, USA (F.J.R.).

18 APRIL 2013 | VOL 496 | NATURE | 311

©2013 Macmillan Publishers Limited. All rights reserved

Comoros Islands in the Indian Ocean, and only 309 individuals have
been recorded in the past 75 years (R. Nulens, personal communication)2.
The discovery in 1997 of a second coelacanth species in Indonesia,
Latimeria menadoensis, was equally surprising, as it had been assumed
that living coelacanths were confined to small populations off the East
African coast3,4. Fascination with these fish is partly due to their pre-
historic appearance—remarkably, their morphology is similar to that
of fossils that date back at least 300 Myr, leading to the supposition that,
among vertebrates, this lineage is markedly slow to evolve1,5. Latimeria
has also been of particular interest to evolutionary biologists, owing to
its hotly debated relationship to our last fish ancestor, the fish that first
crawled onto land6. In the past 15 years, targeted sequencing efforts
have produced the sequences of the coelacanth mitochondrial genomes7,
HOX clusters8 and a few gene families9,10. Nevertheless, coelacanth
research has felt the lack of large-scale sequencing data. Here we describe
the sequencing and comparative analysis of the genome of L. chalumnae,
the African coelacanth.

Genome assembly and annotation
The African coelacanth genome was sequenced and assembled using
DNA from a Comoros Islands Latimeria chalumnae specimen (Sup-
plementary Fig. 1). It was sequenced by Illumina sequencing tech-
nology and assembled using the short read genome assembler
ALLPATHS-LG11. The L. chalumnae genome has been reported previ-
ously to have a karyotype of 48 chromosomes12. The draft assembly is
2.86 gigabases (Gb) in size and is composed of 2.18 Gb of sequence plus
gaps between contigs. The coelacanth genome assembly has a contig
N50 size (the contig size above which 50% of the total length of the
sequence assembly can be found) of 12.7 kilobases (kb) and a scaffold
N50 size of 924 kb, and quality metrics comparable to other Illumina
genomes (Supplementary Note 1, and Supplementary Tables 1 and 2).
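
The N50 statistic defined above can be computed directly from a list of contig (or scaffold) lengths. The following Python sketch is illustrative only: the function name `n50` and the toy lengths are assumptions introduced here, and the code is not part of the ALLPATHS-LG pipeline or of the reported assembly statistics.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is taken here as the largest length L such that sequences of
    length >= L together contain at least half of the total assembly.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths (not taken from the coelacanth assembly).
toy_contigs = [2, 3, 4, 5, 6, 10]
print(n50(toy_contigs))
```

Applied separately to contig lengths and to scaffold lengths, the same function would give the two N50 values that are conventionally reported for a draft assembly.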

The genome assembly was annotated separately by both the Ensembl
gene annotation pipeline (Ensembl release 66, February 2012) and by
MAKER13. The Ensembl gene annotation pipeline created gene models
using protein alignments from the Universal Protein Resource (Uni-
prot) database, limited coelacanth complementary DNA data, RNA-seq
data generated from L. chalumnae muscle (18 Gb of paired-end reads
were assembled using Trinity software14, Supplementary Fig. 2) as well
as orthology with other vertebrates. This pipeline produced 19,033
protein-coding genes containing 21,817 transcripts. The MAKER
pipeline used the L. chalumnae Ensembl gene set, Uniprot protein
alignments, and L. chalumnae (muscle) and L. menadoensis (liver
and testis)15 RNA-seq data to create gene models, and this produced
29,237 protein-coding gene annotations. In addition, 2,894 short non-
coding RNAs, 1,214 long non-coding RNAs, and more than 24,000
conserved RNA secondary structures were identified (Supplementary
Note 2, Supplementary Tables 3 and 4, Supplementary Data 1–3 and
Supplementary Fig. 3). It was inferred that 336 genes underwent spe-
cific duplications in the coelacanth lineage (Supplementary Note 3,
Supplementary Tables 5 and 6, and Supplementary Data 4).

The closest living fish relative of tetrapods
The question of which living fish is the closest relative to ‘the fish that
first crawled on to land’ has long captured our imagination: among
scientists the odds have been placed on either the lungfish or the
coelacanth16. Analyses of small to moderate amounts of sequence data
for this important phylogenetic question (ranging from 1 to 43 genes)
have tended to favour the lungfishes as the extant sister group to the
land vertebrates17. However, the alternative hypothesis that the lung-
fish and the coelacanth are equally closely related to the tetrapods
could not be rejected with previous data sets18.

To seek a comprehensive answer we generated RNA-seq data from
three samples (brain, gonad and kidney, and gut and liver) from the
West African lungfish, Protopterus annectens, and compared it to gene
sets from 21 strategically chosen jawed vertebrate species. To perform a
reliable analysis we selected 251 genes in which a 1:1 orthology ratio
was clear and used CAT-GTR, a complex site-heterogeneous model of
sequence evolution that is known to reduce tree-reconstruction
artefacts19 (see Supplementary Methods). The resulting phylogeny, based
on 100,583 concatenated amino acid positions (Fig. 1, posterior
probability = 1.0 for the lungfish–tetrapod node) is maximally supported
except for the relative positions of the armadillo and the elephant. It
corroborates known vertebrate phylogenetic relationships and
strongly supports the conclusion that tetrapods are more closely
related to lungfish than to the coelacanth (Supplementary Note 4
and Supplementary Fig. 4).

The slowly evolving coelacanth
The morphological resemblance of the modern coelacanth to its fossil
ancestors has resulted in it being nicknamed ‘the living fossil’1. This
invites the question of whether the genome of the coelacanth is as
slowly evolving as its outward appearance suggests. Earlier work
showed that a few gene families, such as Hox and protocadherins,
have comparatively slower protein-coding evolution in coelacanth
than in other vertebrate lineages8,10. To address the question, we
compared several features of the coelacanth genome to those of other
vertebrate genomes.

Protein-coding gene evolution was examined using the phyloge-
nomics data set described above (251 concatenated proteins) (Fig. 1).
Pair-wise distances between taxa were calculated from the branch
lengths of the tree using the two-cluster test proposed previously20
to test for equality of average substitution rates. Then, for each of
the following species and species clusters (coelacanth, lungfish,
chicken and mammals), we ascertained their respective mean distance
to an outgroup consisting of three cartilaginous fishes (elephant
shark, little skate and spotted catshark). Finally, we tested whether
there was any significant difference in the distance to the outgroup of
cartilaginous fish for every pair of species and species clusters, using a

[Figure 1: phylogenetic tree. Scale bar: 0.1 substitutions per site. Clade labels: Tetrapods, Lobe-finned fish, Cartilaginous fish, Ray-finned fish. Taxa shown: Dog, Human, Mouse, Elephant, Armadillo, Opossum, Platypus, Chicken, Turkey, Zebra finch, Lizard, Western clawed frog, Chinese brown frog, Lungfish, Coelacanth, Tilapia, Pufferfish, Zebrafish, Elephant shark, Little skate, Spotted catshark, Tammar wallaby.]

Figure 1 | A phylogenetic tree of a broad selection of jawed vertebrates
shows that lungfish, not coelacanth, is the closest relative of tetrapods.
Multiple sequence alignments of 251 genes with a 1:1 ratio of orthologues in
22 vertebrates and with a full sequence coverage for both lungfish and
coelacanth were used to generate a concatenated matrix of 100,583
unambiguously aligned amino acid positions. The Bayesian tree was inferred
using PhyloBayes under the CAT + GTR + Γ4 model with confidence estimates
derived from 100 gene jack-knife replicates (support is 100% for all clades but
armadillo + elephant with 45%)48. The tree was rooted on cartilaginous fish, and
shows that the lungfish is more closely related to tetrapods than the coelacanth,
and that the protein sequence of coelacanth is evolving slowly. Pink lines
(tetrapods) are slightly offset from purple lines (lobe-finned fish), to indicate
that these species are both tetrapods and lobe-finned fish.


Z statistic. When these distances to the outgroup of cartilaginous fish
were compared, we found that the coelacanth proteins that were
tested were significantly more slowly evolving (0.890 substitutions
per site) than the lungfish (1.05 substitutions per site), chicken (1.09
substitutions per site) and mammalian (1.21 substitutions per site)
orthologues (P < 10⁻⁶ in all cases) (Supplementary Data 5). In addition,
as can be seen in Fig. 1, the substitution rate in coelacanth is approximately
half that in tetrapods since the two lineages diverged. A Tajima’s
relative rate test21 confirmed the coelacanth’s significantly slower rate
of protein evolution (P < 10⁻²⁰) (Supplementary Data 6).
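
Tajima's relative rate test compares two ingroup sequences against an outgroup. Under equal rates, sites at which only the first sequence differs (m1) and sites at which only the second differs (m2) are expected to be equally frequent, and (m1 - m2)^2 / (m1 + m2) is approximately chi-square distributed with one degree of freedom. The Python sketch below illustrates that calculation under simplifying assumptions: the function name, the gap handling and the toy sequences are hypothetical, and it is not the analysis behind Supplementary Data 6.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's relative rate test on three aligned sequences.

    Returns (m1, m2, chi2, p), where m1 and m2 count the sites that are
    unique to seq1 and seq2 respectively, relative to the outgroup.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, c in zip(seq1, seq2, outgroup):
        if "-" in (a, b, c):      # assumption: skip sites with gaps
            continue
        if a != b and b == c:     # change unique to lineage 1
            m1 += 1
        elif a != b and a == c:   # change unique to lineage 2
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper-tail probability of a chi-square variable with 1 d.f.
    p = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p


# Hypothetical toy alignment: the second sequence carries all the changes.
slow = "AAAAAAAAAA"
fast = "AAAAATTTTT"
out = "AAAAAAAAAA"
print(tajima_relative_rate(slow, fast, out))
```

In this toy case all five informative sites fall on the second lineage, so the test statistic is 5.0; a genome-scale comparison would use concatenated protein alignments rather than short strings.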

We next examined the abundance of transposable elements in the
coelacanth genome. Theoretically, transposable elements may make
their greatest contribution to the evolution of a species by generating
templates for exaptation to form novel regulatory elements and exons,
and by acting as substrates for genomic rearrangement22. We found
that the coelacanth genome contains a wide variety of transposable-
element superfamilies and has a relatively high transposable-element
content (25%); this number is probably an underestimate as this is a
draft assembly (Supplementary Note 5 and Supplementary Tables
7–10). Analysis of RNA-seq data and of the divergence of individual
transposable-element copies from consensus sequences shows that
14 coelacanth transposable-element superfamilies are currently active
(Supplementary Note 6, Supplementary Table 10 and Supplementary
Fig. 5). We conclude that the current coelacanth genome shows both
an abundance and activity of transposable elements similar to many
other genomes. This contrasts with the slow protein evolution observed.

Analyses of chromosomal breakpoints in the coelacanth genome
and tetrapod genomes reveal extensive conservation of synteny and
indicate that large-scale rearrangements have occurred at a generally
low rate in the coelacanth lineage. Analyses of these rearrangement
classes detected several fission events published previously23 that are
known to have occurred in tetrapod lineages, and at least 31 inter-
chromosomal rearrangements that occurred in the coelacanth lineage
or the early tetrapod lineage (0.063 fusions per 1 Myr), compared to
20 events (0.054 fusions per 1 Myr) in the salamander lineage and
21 events (0.057 fusions per 1 Myr) in the Xenopus lineage23 (Sup-
plementary Note 7 and Supplementary Fig. 6). Overall, these analyses
indicate that karyotypic evolution in the coelacanth lineage has
occurred at a relatively slow rate, similar to that of non-mammalian
tetrapods24.

In a separate analysis we also examined the evolutionary divergence
between the two species of coelacanth, L. chalumnae and L. menadoensis,
found in African and Indonesian waters, respectively. Previous ana-
lysis of mitochondrial DNA showed a sequence identity of 96%, but
estimated divergence times range widely from 6 to 40 Myr25,26. When
we compared the liver and testis transcriptomes of L. menadoensis27
to the L. chalumnae genome, we found an identity of 99.73%
(Supplementary Note 8 and Supplementary Fig. 7), whereas alignments
between 20 sequenced L. menadoensis bacterial artificial chromosomes
(BACs) and the L. chalumnae genome showed an identity of 98.7%
(Supplementary Table 11 and Supplementary Fig. 8). Both the genic
and genomic divergence rates are similar to those seen between the
human and chimpanzee genomes (99.5% and 98.8%, respectively;
divergence time of 6 to 8 Myr ago)28, whereas the rates of molecular
evolution in Latimeria are probably affected by several factors, includ-
ing the slower substitution rate seen in coelacanth. This suggests a
slightly longer divergence time for the two coelacanth species.

The adaptation of vertebrates to land
As the species with a sequenced genome closest to our most recent
aquatic ancestor, the coelacanth provides a unique opportunity to
identify genomic changes that were associated with the successful
adaptation of vertebrates to the land environment.

Over the 400 Myr that vertebrates have lived on land, some genes
that are unnecessary for existence in their new environment have been
eliminated. To understand this aspect of the water-to-land transition,
we surveyed the Latimeria genome annotations to identify genes that
were present in the last common ancestor of all bony fish (including
the coelacanth) but that are missing from tetrapod genomes. More
than 50 such genes, including components of fibroblast growth factor
(FGF) signalling, TGF-β and bone morphogenetic protein (BMP) signalling,
and WNT signalling pathways, as well as many transcription
factor genes, were inferred to be lost based on the coelacanth data
(Supplementary Data 7 and Supplementary Fig. 9). Previous studies of
genes that were lost in this transition could only compare teleost fish
to tetrapods, meaning that differences in gene content could have
been due to loss in the tetrapod or in the lobe-finned fish lineages.
We were able to confirm that four genes that were shown previously to
be absent in tetrapods (And1 and And2 (ref. 29), Fgf24 (ref. 30) and
Asip2 (ref. 31)), were indeed present and intact in Latimeria, support-
ing the idea that they were lost in the tetrapod lineage.

We functionally annotated more than 50 genes lost in tetrapods
using zebrafish data (gene expression, knock-downs and knockouts).
Many genes were classified in important developmental categories
(Supplementary Data 7): fin development (13 genes); otolith and
ear development (8 genes); kidney development (7 genes); trunk,
somite and tail development (11 genes); eye (13 genes); and brain
development (23 genes). This implies that critical characters in the
morphological transition from water to land (for example, fin-to-limb
transition and remodelling of the ear) are reflected in the loss of
specific genes along the phylogenetic branch leading to tetrapods.
However, homeobox genes, which are responsible for the develop-
ment of an organism’s basic body plan, show only slight differences
between Latimeria, ray-finned fish and tetrapods; it would seem that
the protein-coding portion of this gene family, along with several
others (Supplementary Note 9, Supplementary Tables 12–16 and
Supplementary Fig. 10), has remained largely conserved during the
vertebrate land transition (Supplementary Fig. 11).

As vertebrates transitioned to a new land environment, changes
occurred not only in gene content but also in the regulation of existing
genes. Conserved non-coding elements (CNEs) are strong candidates
for gene regulatory elements. They can act as promoters, enhancers,
repressors and insulators32,33, and have been implicated as major faci-
litators of evolutionary change34. To identify CNEs that originated in
the most recent common ancestor of tetrapods, we predicted CNEs
that evolved in various bony vertebrate (that is, ray-finned fish, coela-
canth and tetrapod) lineages and assigned them to their likely branch
points of origin. To detect CNEs, conserved sequences in the human
genome were identified using MULTIZ alignments of bony vertebrate
genomes, and then known protein-coding sequences, untranslated
regions (UTRs) and known RNA genes were excluded. Our ana-
lysis identified 44,200 ancestral tetrapod CNEs that originated after
the divergence of the coelacanth lineage. They represent 6% of the
739,597 CNEs that are under constraint in the bony vertebrate lin-
eage. We compared the ancestral tetrapod CNEs to mouse embryo
ChIP-seq (chromatin immunoprecipitation followed by sequencing)
data obtained using antibodies against p300, a transcriptional coacti-
vator. This resulted in a sevenfold enrichment in the p300 binding
sites for our candidate CNEs and confirmed that these CNEs are
indeed enriched for gene regulatory elements.

Each tetrapod CNE was assigned to the gene whose transcription
start site was closest, and gene-ontology category enrichment was cal-
culated for those genes. The most enriched categories were involved
with smell perception (for example, sensory perception of smell,
detection of chemical stimulus and olfactory receptor activity). This
is consistent with the notable expansion of olfactory receptor family
genes in tetrapods compared with teleosts, and may reflect the neces-
sity of a more tightly regulated, larger and more diverse repertoire of
olfactory receptors for detecting airborne odorants as part of the
terrestrial lifestyle. Other significant categories include morphoge-
nesis (radial pattern formation, hind limb morphogenesis, kidney mor-
phogenesis) and cell differentiation (endothelial cell fate commitment,

epithelial cell fate commitment), which is consistent with the body-
plan changes required for land transition, as well as immunoglobulin
VDJ recombination, which reflects the presumed response differences
required to address the novel pathogens that vertebrates would encoun-
ter on land (Supplementary Note 10 and Supplementary Tables 17–24).
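
The assignment of each CNE to the gene with the closest transcription start site (TSS) can be sketched as a nearest-neighbour lookup over sorted TSS coordinates. The Python example below is a minimal illustration under stated assumptions: it treats a single chromosome, represents each element by a midpoint coordinate, and breaks ties in favour of the upstream TSS. The function name, element names, gene names and coordinates are all hypothetical and do not come from the study.

```python
from bisect import bisect_left


def assign_to_nearest_tss(element_midpoints, tss_by_gene):
    """Map each element to the gene whose TSS is closest.

    element_midpoints: dict of element name -> midpoint coordinate
    tss_by_gene:       dict of gene name -> TSS coordinate (same chromosome)
    """
    if not tss_by_gene:
        raise ValueError("need at least one TSS")
    tss_sorted = sorted((pos, gene) for gene, pos in tss_by_gene.items())
    positions = [pos for pos, _ in tss_sorted]
    assignments = {}
    for name, midpoint in element_midpoints.items():
        i = bisect_left(positions, midpoint)
        # Only the TSSs immediately left and right of the element can be closest.
        candidates = tss_sorted[max(i - 1, 0): i + 1]
        _, gene = min(candidates, key=lambda pg: abs(pg[0] - midpoint))
        assignments[name] = gene
    return assignments


# Hypothetical coordinates for illustration only.
tss = {"geneA": 100, "geneB": 500, "geneC": 900}
cnes = {"cne1": 120, "cne2": 720, "cne3": 50, "cne4": 1000}
print(assign_to_nearest_tss(cnes, tss))
```

The resulting gene list would then be the input to a gene-ontology enrichment calculation, which is not shown here.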

A major innovation of tetrapods is the evolution of limbs characterized
by digits. The limb skeleton consists of a stylopod (humerus or
femur), a zeugopod (radius and ulna, or tibia and fibula), and an
autopod (wrist or ankle, and digits). There are two major hypotheses
about the origins of the autopod: that it was a novel feature of tetrapods,
or that it has antecedents in the fins of fish35 (Supplementary
Note 11 and Supplementary Fig. 12). We examine here the Hox
regulation of limb development in ray-finned fish, coelacanth and
tetrapods to address these hypotheses.

In mouse, late-phase digit enhancers are located in a gene desert
that is proximal to the HOX-D cluster36. Here we provide an align-
ment of the HOX-D centromeric gene desert of coelacanth with those
of tetrapods and ray-finned fishes (Fig. 2a). Among the six cis-regulatory
sequences previously identified in this gene desert36, three sequences
show sequence conservation restricted to tetrapods (Supplementary
Fig. 13). However, one regulatory sequence (island 1) is shared by tetra-
pods and coelacanth, but not by ray-finned fish (Fig. 2b and Supplemen-
tary Fig. 14). When tested in a transient transgenic assay in mouse, the
coelacanth sequence of island 1 was able to drive reporter expression in a
limb-specific pattern (Fig. 2c). This suggests that island 1 was a lobe-
fin developmental enhancer in the fish ancestor of tetrapods that was
then coopted into the autopod enhancer of modern tetrapods. In this
case, the autopod developmental regulation was derived from an ances-
tral lobe-finned fish regulatory element.

Changes in the urea cycle provide an illuminating example of the
adaptations associated with transition to land. Excretion of nitrogen is
a major physiological challenge for terrestrial vertebrates. In aquatic
environments, the primary nitrogenous waste product is ammonia,
which is readily diluted by surrounding water before it reaches toxic
levels, but on land, less toxic substances such as urea or uric acid must
be produced instead (Supplementary Fig. 15). The widespread and
almost exclusive occurrence of urea excretion in amphibians, some
turtles and mammals has led to the hypothesis that the use of urea as
the main nitrogenous waste product was a key innovation in the
vertebrate transition from water to land37.

With the availability of gene sequences from coelacanth and lungfish,
it became possible to test this hypothesis. We used a branch-site model
in the HYPHY package38, which estimates the ratio of non-synonymous (dN)
to synonymous (dS) substitutions (ω values) among different
branches and among different sites (codons) across a multiple-species
sequence alignment. For the rate-limiting enzyme of the hepatic urea
cycle, carbamoyl phosphate synthase I (CPS1), only one branch of the
tree shows a strong signature of selection (P = 0.02), namely the branch
leading to tetrapods and the branch leading to amniotes (Fig. 3); no
other enzymes in this cycle showed a signature of selection. Conversely,
mitochondrial arginase (ARG2), which produces extrahepatic urea as a
byproduct of arginine metabolism but is not involved in the production
of urea for nitrogenous waste disposal, did not show any evidence of
selection in vertebrates (Supplementary Fig. 16). This leads us to con-
clude that adaptive evolution occurred in the hepatic urea cycle during
the vertebrate land transition. In addition, it is interesting to note that
of the five amino acids of CPS1 that changed between coelacanth and
tetrapods, three are in important domains (the two ATP-binding sites
and the subunit interaction domain) and a fourth is known to cause a
malfunctioning enzyme in human patients if mutated39.
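
The ω statistic used in selection analyses of this kind is conventionally the ratio dN/dS, with values below 1 read as purifying selection, values near 1 as neutral evolution and values above 1 as positive selection. The Python sketch below only illustrates that interpretation. It is not the HYPHY branch-site model, which fits ω classes per branch and per site by maximum likelihood; the function names, the tolerance and the example rates are assumptions made here.

```python
def omega(dn, ds):
    """Return omega = dN / dS for one branch or one gene."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def selection_regime(w, tol=0.05):
    """Classify an omega value; tol is an arbitrary illustrative margin."""
    if w > 1 + tol:
        return "positive selection"
    if w < 1 - tol:
        return "purifying selection"
    return "approximately neutral"


# Hypothetical rates (substitutions per site), for illustration only.
for label, dn, ds in [("branch X", 0.5, 2.0), ("branch Y", 3.0, 1.5)]:
    w = omega(dn, ds)
    print(label, w, selection_regime(w))
```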

The adaptation to a terrestrial lifestyle necessitated major changes in
the physiological environment of the developing embryo and fetus,
resulting in the evolution and specialization of extra-embryonic mem-
branes of the amniote mammals40. In particular, the placenta is a com-
plex structure that is critical for providing gas and nutrient exchange
between mother and fetus, and is also a major site of haematopoiesis41.

We have identified a region of the coelacanth HOX-A cluster that
may have been involved in the evolution of extra-embryonic struc-
tures in tetrapods, including the eutherian placenta. Global alignment
of the coelacanth Hoxa14–Hoxa13 region with the homologous
regions of the horn shark, chicken, human and mouse revealed a
CNE just upstream of the coelacanth Hoxa14 gene (Supplementary
Fig. 17a). This conserved stretch is not found …
