Rapid isolation of gene homologs across taxa: Efficient identification and isolation of gene orthologs from non-model organism genomes, a technical report
© Heffer and Pick; licensee BioMed Central Ltd. 2011
Received: 16 December 2010
Accepted: 1 March 2011
Published: 1 March 2011
Tremendous progress has been made in the field of evo-devo through comparisons of related genes from diverse taxa. While the vast number of species in nature precludes a complete analysis of the molecular evolution of even one single gene family, this would not be necessary to understand fundamental mechanisms underlying gene evolution if experiments could be designed to systematically sample representative points along the path of established phylogenies to trace changes in regulatory and coding gene sequence. This isolation of homologous genes from phylogenetically diverse, representative species can be challenging, especially if the gene is under weak selective pressure and evolving rapidly.
Here we present an approach - Rapid Isolation of Gene Homologs across Taxa (RIGHT) - to efficiently isolate specific members of gene families. RIGHT is based upon modification and a combination of degenerate polymerase chain reaction (PCR) and gene-specific amplified fragment length polymorphism (AFLP). It allows targeted isolation of specific gene family members from any organism, only requiring genomic DNA. We describe this approach and how we used it to isolate members of several different gene families from diverse arthropods spanning millions of years of evolution.
RIGHT facilitates systematic isolation of one gene from large gene families. It allows for efficient gene isolation without whole genome sequencing, RNA extraction, or culturing of non-model organisms. RIGHT will be a generally useful method for isolation of orthologs from both distant and closely related species, increasing sample size and facilitating the tracking of molecular evolution of gene families and regulatory networks across the tree of life.
One focus of evolutionary biologists is to understand how changes in regulatory and coding regions of genes contribute to species evolution and adaptation [1, 2]. This requires sequence comparisons across distantly related taxa as well as among closely related species. A major limitation in studying molecular evolution is the amount of comprehensive sequence data available to track these changes in genes and their networks. Standard approaches include comparisons across widely divergent model organisms, comparison of gene sequences that have been deposited in databases, and comparisons of whole genome sequences. This can result in an incomplete matrix of information about the lineages of particular gene families, making it difficult to trace steps leading to functional changes in regulatory and coding sequences. Additionally, the sequence conservation of duplicated and diverged genes within gene families [3, 4] poses a challenge: How can we identify a particular member of a gene family without isolating and screening through closely-related homologs? Here we report a strategy to efficiently isolate genes from genomic DNA that can be used to obtain sequence information from un-sequenced genomes and non-model organisms not easily reared in the laboratory. Rapid Isolation of Gene Homologs across Taxa (RIGHT) is based on the fact that homologous genes (both paralogs and orthologs) generally show conservation of at least one domain, even if other parts of the sequence are under weaker selective pressure. For example, the Hox proteins have retained the conserved DNA binding domain after duplication and divergence [5, 6]. While not forging fundamentally new technology, this approach combines and modifies existing procedures to facilitate the rapid isolation of genes, allowing sampling of a large number of taxa.
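Because RIGHT starts from a conserved domain, the first primers are degenerate: each one stands for a pool of concrete sequences. The following is a minimal sketch of how that pool can be enumerated using the standard IUPAC ambiguity codes. The function names and the example primer are illustrative assumptions, not taken from the protocol described here.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Return how many distinct sequences a degenerate primer represents."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical primer for illustration only: AAR ATH TGG covers all
# codons for the tripeptide Lys-Ile-Trp.
example_primer = "AARATHTGG"
pool_size = degeneracy(example_primer)
pool = expand(example_primer)
```

In practice, a lower degeneracy usually means more specific amplification, so a count like this can help when choosing which stretch of a conserved domain to target.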
Sequence up- and downstream of the conserved region (obtained in Step 1, Figure 1) is next isolated by modifications of AFLP and TE-display techniques [7–12] that allow selective amplification of only the gene sequence of interest. Traditional AFLP uses restriction enzymes to digest genomic DNA followed by ligation of adapters of known sequence to DNA ends. Adapter-specific primers are used in subsequent PCRs to amplify DNA fragments, which are then separated on a gel and analyzed. RIGHT uses the basic idea of AFLP up to the amplification step; however, instead of amplifying DNA fragments using adapter sequences as both primers (which generates many fragments), an adapter-specific primer is used as one primer and a gene-specific primer (derived from degenerate PCR used in Figure 1, Step 1) as the other primer. Thus, only a sequence from the gene of interest is isolated. The digestion of genomic DNA and ligation of adapters is done in a single step (Figure 1, Step 2). Adapter sequences are designed to anneal to, but destroy, restriction sites in order to avoid re-digestion in this combined restriction/ligation reaction. Several different restriction digests are set up in parallel to provide different-length PCR templates covering the gene of interest. This is also beneficial because restriction site locations are not known for genomes that have not been sequenced. The digestion/ligation is followed by two rounds of nested PCR (Figure 1, Step 3), which functions to increase specificity of primer binding and the amount of product. After the PCR product is amplified and sequenced, new gene-specific primers are designed at the sequence ends to repeat PCRs with a different restriction digest/ligation as template in order to extend the sequence. By repeating this process, one can "walk" along the genomic sequence to isolate the entire coding sequence (Figure 1, Step 4).
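The reason for setting up several digests in parallel can be illustrated with a toy in-silico model. This sketch assumes a known single-stranded template, which is never the case for an un-sequenced genome. It also ignores the adapter sequence and the exact cut position within each recognition site. The function name, the toy template and the primer are hypothetical; the recognition sequences for EcoRI, HindIII and BamHI are the standard ones.

```python
def anchored_products(template, primer, sites):
    """For each enzyme, return the stretch from the gene-specific primer
    to the nearest downstream recognition site, or None if the site is absent.

    This approximates the product amplified between a gene-specific primer
    and an adapter ligated at the restriction site.
    """
    start = template.find(primer)
    if start == -1:
        raise ValueError("primer not found in template")
    products = {}
    for enzyme, site in sites.items():
        cut = template.find(site, start + len(primer))
        products[enzyme] = None if cut == -1 else template[start:cut]
    return products


# Standard recognition sequences.
SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "BamHI": "GGATCC"}

# Toy template (hypothetical), assembled from labeled pieces.
primer = "GGTACGATC"
template = ("TTGACC" + primer + "GTTAGC" + "AAGCTT"
            + "CCGATAGGCTA" + "GAATTC" + "GGCATTA")

products = anchored_products(template, primer, SITES)
lengths = {name: (len(seq) if seq is not None else None)
           for name, seq in products.items()}
```

In this toy case each enzyme yields a product of a different length, and one enzyme yields none. This mirrors why parallel digests are useful when site locations are unknown.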
In most cases only one clear product was observed after nested PCR; occasionally there were several. In these situations, either all products were sequenced, or products were re-amplified using the same primers or another nested set to reduce the number of products. Where multiple bands persisted, this was usually because restriction sites lay very close together in the genome, and almost all of the sequenced regions overlapped. After a new sequence has been isolated, its continuity is always checked by PCR with primers at the extreme opposite ends of the sequence obtained so far, to confirm that the newly isolated sequence is contiguous with that upstream and/or downstream (Figure 1, Step 5). This check is very important because, although infrequent, ligation may occur between genomic DNA fragments in Step 2. As demonstrated, RIGHT is efficient and saves time compared with other protocols. The combination of degenerate PCR and gene-specific AFLP is a powerful method for obtaining full gene sequence information, including coding and regulatory regions.
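Before the PCR-based continuity check, a newly sequenced fragment can also be screened computationally. Because each new primer is designed at the end of known sequence, a genuine extension should begin with a stretch that matches that end. The sketch below is an illustrative assumption rather than part of the published protocol: the function name, the minimum-overlap threshold and the example reads are hypothetical. It complements the PCR check in Step 5 and does not replace it.

```python
def merge_by_overlap(known, new_read, min_overlap=8):
    """Extend `known` with `new_read` if a suffix of `known` exactly
    matches a prefix of `new_read`; return None when no such overlap exists.

    The longest exact overlap is preferred. Sequencing errors are not
    tolerated in this simplified version.
    """
    longest = min(len(known), len(new_read))
    for k in range(longest, min_overlap - 1, -1):
        if known.endswith(new_read[:k]):
            return known + new_read[k:]
    return None


# Hypothetical reads: the shared core plays the role of the region
# between the gene-specific primer and the end of the known sequence.
core = "TACGGATCCA"
known = "ATGGCTAGCT" + core
new_read = core + "TTGCAAGT"

extended = merge_by_overlap(known, new_read)
unrelated = merge_by_overlap("AAAAAAAAAA", "CCCCCCCCCC")
```

A read that returns `None` here shares no overlap with the known sequence and would be a candidate for closer inspection before further walking.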
RIGHT isolation of homeobox and nuclear receptor genes
In addition to ftz, we isolated other homeobox-containing genes such as extradenticle (exd), as well as the orphan nuclear receptor gene ftz-f1, from multiple species with great success (unpublished). RIGHT was used to isolate partial exd sequences from Thermobia domestica (firebrat), Callosobruchus maculatus (beetle), and Folsomia candida (collembolan). In combination with RACE, full-length exd coding regions were isolated from these species. Partial ftz-f1 sequences were isolated from several species, including Artemia salina (brine shrimp), Folsomia candida (collembolan), Thermobia domestica (firebrat), Callosobruchus maculatus (beetle), Dermestes maculatus (beetle), Oncopeltus fasciatus (milkweed bug), and Acyrthosiphon pisum (aphid). As for exd, full-length ftz-f1 sequences have been obtained from many of these organisms in combination with RACE. For this work, as per experimental design, sequences were obtained from species representing key points in arthropod phylogeny to allow for systematic analysis of a small network of functionally related genes from different families (ftz, ftz-f1, exd). Thus far, every gene that we have attempted to isolate from any chosen species using RIGHT has been obtained.
The ability to isolate homologous genes from diverse taxa will empower studies of the molecular evolution of genes, gene families and gene networks. In the past, these studies were limited by the absence of genomic information. Even though genome sequencing is now practical for a larger number of species, it is unlikely to cover more than a small fraction of the millions of species on Earth. Similarly, investments are being made in developing new model systems to expand on the standard fly, mouse and worm systems. However, the investment needed to bring a new model system up to speed is substantial, and it is neither necessary nor practical to fully develop hundreds of genetic model systems. We suggest that genome sequencing and new model systems, while enormously important for the field of evo-devo, are not always necessary to answer specific evolutionary questions. RIGHT provides a fast and efficient way to isolate genes, including coding regions and candidate cis-regulatory regions, and overcomes many practical constraints, realistically allowing for the isolation of tens if not hundreds of genes from families or gene networks to study molecular evolution across divergent taxa or within specific clades. This approach removes common limitations, such as the need for an available genome sequence or for rearing species in the lab. It has been used successfully to isolate specific members of several large gene families, allowing for a comparative analysis over millions of years of evolutionary time.
- AFLP: Amplified Fragment Length Polymorphism
- Antp: Antennapedia
- exd: extradenticle
- ftz: fushi tarazu
- ftz-f1: fushi tarazu factor 1
- PCR: Polymerase Chain Reaction
- RACE: Rapid Amplification of cDNA Ends
- RIGHT: Rapid Isolation of Gene Homologs across Taxa
- Scr: Sex combs reduced
We thank Arun Subramanian, Jerry Regier, Dave Hawthorne and Jeff Shultz for helpful suggestions and David O'Brochta for comments on the manuscript. AH acknowledges support from the University of Maryland's Graduate Student Summer Fellowship. This work was supported by NSF grant IBN0641717 to L.P.
1. Schlosser G, Wagner GP: Modularity in Development and Evolution. 2004, Chicago: University of Chicago Press.
2. Carroll SB, Grenier JK, Weatherbee SD: From DNA to diversity: molecular genetics and the evolution of animal design. 2005, Oxford: Blackwell Science Ltd, Second edition.
3. Ohno S: Evolution by gene duplication. 1970, Berlin: Springer-Verlag.
4. Lynch M, Force A: The probability of duplicate gene preservation by subfunctionalization. Genetics. 2000, 154: 459-473.
5. McGinnis W, Krumlauf R: Homeobox genes and axial patterning. Cell. 1992, 68: 283-302. 10.1016/0092-8674(92)90471-N.
6. Gehring WJ, Kloter U, Suga H: Evolution of the Hox gene complex from an evolutionary ground state. Curr Top Dev Biol. 2009, 88: 35-61.
7. Vos P, Hogers R, Bleeker M, Reijans M, van de Lee T, Hornes M, Frijters A, Pot J, Peleman J, Kuiper M: AFLP: a new technique for DNA fingerprinting. Nucleic Acids Res. 1995, 23: 4407-4414. 10.1093/nar/23.21.4407.
8. Beeman RW, Stauth DM: Rapid cloning of insect transposon insertion junctions using 'universal' PCR. Insect Mol Biol. 1997, 6: 83-88. 10.1046/j.1365-2583.1997.00159.x.
9. Casa AM, Brouwer C, Nagel A, Wang L, Zhang Q, Kresovich S, Wessler SR: Inaugural article: the MITE family heartbreaker (Hbr): molecular markers in maize. Proc Natl Acad Sci USA. 2000, 97: 10083-10089. 10.1073/pnas.97.18.10083.
10. Biedler J, Qi Y, Holligan D, della Torre A, Wessler S, Tu Z: Transposable element (TE) display and rapid detection of TE insertion polymorphism in the Anopheles gambiae species complex. Insect Mol Biol. 2003, 12: 211-216. 10.1046/j.1365-2583.2003.00403.x.
11. Hawthorne DJ: AFLP-based genetic linkage map of the Colorado potato beetle Leptinotarsa decemlineata: sex chromosomes and a pyrethroid-resistance candidate gene. Genetics. 2001, 158: 695-700.
12. Subramanian RA, Arensburger P, Atkinson PW, O'Brochta DA: Transposable element dynamics of the hAT element Herves in the human malaria vector Anopheles gambiae s.s. Genetics. 2007, 176: 2477-2487. 10.1534/genetics.107.071811.
13. Telford MJ: Evidence for the derivation of the Drosophila fushi tarazu gene from a Hox gene orthologous to lophotrochozoan Lox5. Curr Biol. 2000, 10: 349-352. 10.1016/S0960-9822(00)00387-0.
14. Lohr U, Yussa M, Pick L: Drosophila fushi tarazu: a gene on the border of homeotic function. Curr Biol. 2001, 11: 1403-1412. 10.1016/S0960-9822(01)00443-2.
15. Lohr U, Pick L: Cofactor-interaction motifs and the cooption of a homeotic Hox protein into the segmentation pathway of Drosophila melanogaster. Curr Biol. 2005, 15: 643-649. 10.1016/j.cub.2005.02.048.
16. Heffer A, Shultz J, Pick L: Surprising flexibility in a conserved Hox transcription factor over 550 million years of evolution. Proc Natl Acad Sci USA. 2010, 107: 18040-18045. 10.1073/pnas.1010746107.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.