A comparative gene expression database for invertebrates
© Ormestad et al; licensee BioMed Central Ltd. 2011
Received: 23 May 2011
Accepted: 24 August 2011
Published: 24 August 2011
As whole genome and transcriptome sequencing gets cheaper and faster, a great number of 'exotic' animal models are emerging, rapidly adding valuable data to the ever-expanding Evo-Devo field. All these new organisms serve as a fantastic resource for the research community, but the sheer amount of data, some published, some not, makes detailed comparison of gene expression patterns very difficult - a problem sometimes noticeable even within a single lab. The need to merge existing data with new information in an organized manner that is publicly available to the research community is now greater than ever.
In order to offer a homogeneous way of storing and handling gene expression patterns from a variety of organisms, we have developed the first web-based comparative gene expression database for invertebrates that allows species-specific as well as cross-species gene expression comparisons. The database can be queried by gene name, developmental stage and/or expression domains.
This database provides a unique tool for the Evo-Devo research community that allows the retrieval, analysis and comparison of gene expression patterns within or among species. In addition, this database enables a quick identification of putative syn-expression groups that can be used to initiate, among other things, gene regulatory network (GRN) projects.
Laboratories that use well-established developmental biology model systems have recognized the importance of species-specific databases and developed extensive tools for their community, for example Zfin (zebrafish, http://zfin.org), MEPD (medaka, http://ani.embl.de:8080/mepd/), FlyBase (Drosophila, http://flybase.org), BDGP (Drosophila, http://www.fruitfly.org/DGC/index.html), WormBase (C. elegans, http://wormbase.org), Aniseed (Ciona, http://aniseed-ibdm.univ-mrs.fr/), Gene Expression Database (GXD) at MGI (mouse, http://www.informatics.jax.org/) [5–7], EMAGE (mouse, http://www.emouseatlas.org) [8–10], XENBASE (Xenopus, http://www.xenbase.org) [11–13] and a comparative database, 4DXpress, for major animal model species (http://ani.embl.de/4DXpress). Recently, efforts have also been made to create databases for emerging model systems such as cnidarians (StellaBase, http://www.stellabase.org) and Platynereis (PEPD, http://ani.embl.de:8080/pepd/). These databases are mainly used to combine all available resources (genome, expressed sequence tags (ESTs), transgenes, publications, expression data, and so on), and the large amount of data sometimes makes it tricky to retrieve a simple gene expression pattern and its related information in an intuitive manner.
Although some of these databases offer comparison between taxonomically related species (for example, among tunicates, Aniseed), we know of no user-friendly and intuitive tool for large-scale comparison of gene expression patterns among diverse organisms.
To address questions about functional embryonic development and body plan patterning or evolution, it is crucial to identify genes involved in developmental processes and tissue specific markers.
The increased feasibility of sequencing projects, due to massively parallel sequencing technologies, is leading to a steady appearance of new data and gene expression patterns from a growing list of formerly understudied species. Currently, comparison of gene expression patterns among species involves numerous hours of searching for the publications of interest. Interpretation of the data may be quite difficult in the case of taxonomically unrelated organisms. In order to collect and manage data and to facilitate their organization and mining, we have created a freely accessible online comparative gene expression database for invertebrates. We expect this to aid in the study, analysis and interpretation of developmentally regulated processes across evolutionarily diverse organisms. The following sections describe the basic features of the database and how to easily identify putative syn-expression groups for further GRN analysis. A more detailed manual of this scientific tool can be downloaded directly from the database website (http://www.kahikai.org/index.php?content=genes).
A comparative gene expression database for invertebrates
As several genomic and/or transcriptome resources are already available for most currently studied species (vertebrate as well as invertebrate models), we focused our interest on an intuitive way to gather, store, make available and compare gene expression patterns of invertebrates. Currently, the database contains data from 18 different species from seven distinct phyla (Ctenophora, Cnidaria, Acoelomorpha, Ecdysozoa, Lophotrochozoa, Echinodermata, Hemichordata), more than 180 genes, 210 experiments (72 'unpublished', that is, not accessible to the whole community) and more than 1,300 images. The database is used for storing and browsing in situ gene expression data as well as immunohistochemistry (IHC) information, comparing gene expression patterns within or among species and identifying possible syn-expression groups. The contributor can assign expression data to different experiment types such as wild type expression, drug treatments, mRNA or morpholino injection and so on, providing an easy means of organizing functional studies. The unique aspect of this database is the flexibility of data storage and organization: the contributor can restrict access to a set of gene expression patterns to a given laboratory, extend it to a group of collaborators or open it to the whole community (refer to the online manual for more details). In any case, the data will always be accessible to the current members of a laboratory, who can add new experiments to complete missing information (for example stages) or provide complementary information (for example drug treatments, gene perturbation) for a given expression pattern. This database can be accessed from anywhere in the world and is backed up frequently on the host servers (http://www.dreamhost.com), removing this burden from individual laboratories.
Construction and data integration
The primary objective of this platform is to create one database in which users can easily compare datasets from a diverse selection of organisms. In this first version of the database we have chosen not to include any sequence information and instead fully concentrate on the comparison of manually annotated expression patterns. Since these kinds of comparisons are fairly simple we have chosen to build the site using a PHP/MySQL backend. The manual annotation steps are kept to a minimum in order to keep data uploads homogeneous and straightforward for the user. Our database can store all the information required for the MISFISHIE standard (minimum information specification for in situ hybridization and immunohistochemistry (IHC) experiments) [17,18]. This format will allow us to integrate other systems and provide the information required to interact and exchange information with other resources in the future.
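To make the relational layout concrete, the following is a minimal sketch of how species, genes, experiments and stage/domain annotations could be linked. It is not the actual Kahi Kai schema: the site uses PHP/MySQL, whereas this sketch uses Python's built-in sqlite3 module purely for illustration, and every table name, column name and demo row is an assumption.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the real Kahi Kai PHP/MySQL layout.
SCHEMA = """
CREATE TABLE species (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    phylum TEXT
);
CREATE TABLE gene (
    id INTEGER PRIMARY KEY,
    species_id INTEGER NOT NULL REFERENCES species(id),
    name TEXT NOT NULL
);
CREATE TABLE experiment (
    id INTEGER PRIMARY KEY,
    gene_id INTEGER NOT NULL REFERENCES gene(id),
    method TEXT NOT NULL,           -- e.g. 'in situ hybridization' or 'IHC'
    experiment_type TEXT NOT NULL   -- e.g. 'wild type', 'drug treatment'
);
CREATE TABLE annotation (
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    stage TEXT NOT NULL,
    domain TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database holding one placeholder record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO species VALUES (1, 'Nematostella vectensis', 'Cnidaria')"
    )
    conn.execute("INSERT INTO gene VALUES (1, 1, 'otxA')")
    conn.execute(
        "INSERT INTO experiment VALUES (1, 1, 'in situ hybridization', 'wild type')"
    )
    conn.execute(
        "INSERT INTO annotation VALUES (1, 'planula', 'pharyngeal ectoderm')"
    )
    return conn


def annotations_for_gene(conn, gene_name):
    """Return (species, stage, domain) rows annotated for a gene name."""
    rows = conn.execute(
        """SELECT s.name, a.stage, a.domain
           FROM annotation a
           JOIN experiment e ON e.id = a.experiment_id
           JOIN gene g ON g.id = e.gene_id
           JOIN species s ON s.id = g.species_id
           WHERE g.name = ?""",
        (gene_name,),
    )
    return rows.fetchall()
```

Keeping annotations as plain (stage, domain) pairs per experiment mirrors the manual annotation described above; sequence data is deliberately absent, as in the first version of the database.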
Integration of this database into an online community
This database is openly accessible via the Internet and is hosted on our web community platform 'Kahi Kai' (meaning 'one ocean' in Hawaiian, http://www.kahikai.org), allowing researchers to interact, collaborate, and share their data. All contributions will have a clear identification of the author of the expression pattern and the laboratory they are associated with. Every user can decide whether to 'publish' a gene expression pattern, and therefore share it with the whole community, or keep it 'unpublished' and visible only to a user-defined subset of members (for example the lab where they work). These 'unpublished' expression patterns can be made accessible to other groups, allowing collaboration and sharing of unpublished data with other users outside of the local lab (refer to the online manual). For security reasons and to avoid unsolicited content in our database, unpublished data are also visible to the site administrators and will be handled with the highest confidentiality. Our vision is that this database becomes an integral part of the Evo-Devo community and that it will provide a useful tool for ongoing and future collaborations.
The classification of genes from multiple organisms into orthologous groups is the prerequisite for comparing gene expression and function. Although approaches to identify gene orthologies have improved over the last decade, concerns still exist, especially in non-bilaterian animals such as cnidarians, sponges or ctenophores, in which orthology can be problematic and the complete annotated genomes required to identify paralogy groups may not yet be available. Therefore, we decided not to implement an automatic way to analyze gene orthologies in the first version of this database, but plan to do so when a reliable online platform becomes available. In the meantime each user has responsibility for assigning gene orthology before publishing it within the database (as is the case for peer-reviewed articles). Any gene that cannot be assigned to a clear ortholog group can be named 'XXX-like' (for example, 'nodal-like'). Raw sequence data will be available as a GenBank retrieval accession number.
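The visibility rules described for published and unpublished patterns can be sketched as a small access check. This is a hedged illustration, not the site's actual code: the `Pattern` class, its field names and the `can_view` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pattern:
    """Hypothetical record for one uploaded expression pattern."""
    author: str
    lab: str
    published: bool = False
    # Labs outside the contributor's own lab that were granted access.
    shared_with_labs: set = field(default_factory=set)


def can_view(pattern, user_lab, is_admin=False):
    """Return True if a user from `user_lab` may see the pattern.

    Published patterns are visible to everyone; unpublished ones are
    visible to the contributing lab, to labs the contributor shared them
    with, and to site administrators.
    """
    if pattern.published or is_admin:
        return True
    return user_lab == pattern.lab or user_lab in pattern.shared_with_labs
```

For example, `can_view(Pattern("A", "lab1"), "lab3")` evaluates to `False`, whereas the same call with `is_admin=True` evaluates to `True`.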
Developmental stages and expression domains
Comparing developmental stages within a single species does not present major difficulties and depends on the fine-scale precision of the gene expression annotation generated by the user. However, the existence of different life cycles in marine invertebrates within the same phylum (for example echinoids) gives rise to developmental stages (larval stage) within indirect developing species that are absent in direct developers (no larval phase per se). These differences make comparison between two species a difficult task, and it gets more complicated as a greater number of species are added to the list. To avoid matching non-comparable stages in cross-species analysis, we decided to compare only broadly accepted parent stages. In Table 1, we compare the developmental stages of the cnidarian Nematostella vectensis with those of the acoel flatworm Convolutriloba longifissura and classify the various stages that will be used in the comparison algorithm of our database. Obviously, some temporal resolution is lost when reducing development into only six chronological stages shared among invertebrates, so each parent stage is assigned taxon-specific child stages that are used when doing intra-species comparisons of gene expression.
Even more difficult than comparing developmental stages is the comparison of localized gene expression among 'complex' animals that possess taxon-specific structures (for example gill slits or tube feet). In order to overcome the issue of comparing non-homologous structures, we defined parent territories that can be subdivided into species-specific expression domains. In Table 2 we show the expression domains defined for the same species as described above (N. vectensis and C. longifissura), sorted by broad domains that can be argued to be homologous between different taxa.
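The parent/child stage logic can be illustrated with a short sketch. The mapping below is a placeholder, not the content of Table 1: the stage names, the `CHILD_TO_PARENT` dictionary and both functions are assumptions chosen only to show how child stages are used within a species and parent stages across species.

```python
# Placeholder mapping (species, child stage) -> parent stage.
# These names are illustrative and are NOT taken from Table 1.
CHILD_TO_PARENT = {
    ("N. vectensis", "early gastrula"): "Gastrula",
    ("N. vectensis", "late gastrula"): "Gastrula",
    ("N. vectensis", "planula"): "Larva",
    ("C. longifissura", "gastrula"): "Gastrula",
}


def parent_stage(species, child_stage):
    """Look up the shared parent stage for a species-specific child stage."""
    return CHILD_TO_PARENT.get((species, child_stage))


def stages_match(a, b):
    """Decide whether two (species, child_stage) annotations are comparable.

    Within one species the fine-grained child stages are compared directly;
    across species only the broad parent stages are compared.
    """
    if a[0] == b[0]:
        return a[1] == b[1]
    parent_a, parent_b = parent_stage(*a), parent_stage(*b)
    return parent_a is not None and parent_a == parent_b
```

With this placeholder mapping, 'early gastrula' and 'late gastrula' of N. vectensis do not match each other, but either one matches the C. longifissura 'gastrula' because both collapse to the same parent stage.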
In order to extend this standardization of species-specific developmental stages and expression domains to all species present in the database, we are currently implementing a Comparative Embryonic Developmental database that will provide an overview of the main developmental steps and morphological structures of a variety of animals. We anticipate that this database may help facilitate the move towards a common set of experimental conditions in studies of a particular organism. In addition, by cross referencing both gene expression and embryology databases we can provide information required for the understanding of species-specific development and expression domains for the non-specialist.
Adding and editing species, genes and experiments
To get the most use from this database (advanced search, comparison and suggestions of gene expression similarities in a species-specific context) it is important for the user to input stage and expression information in a standardized manner. Therefore, we have made an effort to simplify the upload procedure to the bare minimum that utilizes the user's knowledge of their experimental system. To add a new species (requires registration), the user simply submits an online form, defining the species-specific developmental stages and expression domains and associating them with the corresponding parent stages and parent domains. The user should submit a picture of the adult and a publication relevant to the staging system of the animal. After verification, an administrator adds the species information to the database, enabling the user to submit genes and experiments. To add genes and their expression patterns, the user needs to follow the instructions given on the website step-by-step. Mainly, these steps consist of i) adding species-specific gene information (gene name, synonyms (additional and/or former names), and relevant publications if available), ii) assigning an experiment (in situ hybridization, IHC) to that gene, providing the required minimum information (type of staining, temperature, vector and so on) that would allow other users to reproduce the experiment, iii) uploading individual images (expression patterns should always be oriented in the defined direction to facilitate comparison), iv) assigning the images to the corresponding developmental stages and v) annotating the expression domains. Users can edit and delete their own experiments; a user who wishes to edit information added by someone else can contact the person associated with the information, leave a comment or contact a Kahi Kai administrator. The use of images extracted/cropped from publication figures may infringe the copyright of the given journals. We therefore highly recommend the use of original images (whether or not they have been used in publications).
A more detailed manual and guidelines can be requested by email or downloaded directly from the website (http://www.kahikai.org/index.php?content=genes).
Utility - a case study
In its current state, this database is used to store annotated RNA in situ hybridization and IHC information for marine invertebrates that can be searched and compared within or between different species. The expression data is assigned to different experiment types such as wild type expression, mRNA or morpholino injection and so on, making it easy to analyze results from functional studies. We will present the general aspects of the database and guide the reader through the various query options and the analysis of the result pages using as examples published endodermal genes expressed in the cnidarian N. vectensis.
Querying the database
Query by gene name
In order to compare gene expression patterns across larger evolutionary distances, the user can query the database for all species available in the repository by entering the gene name (otx) or one of its synonyms (orthodenticle, otd) into the search field on the starting page and keeping the species selector at its default position (All species) (Figure 4A). The following results page lists all the otx genes from all the species uploaded into the database. In this example, all three above-mentioned N. vectensis genes (Cnidaria), one Ptychodera flava gene (Hemichordata), two Parhyale hawaiensis genes (Ecdysozoa), one Fungia scutaria gene (Cnidaria, Loeffler et al. unpublished) and one Terebratalia transversa gene (Spiralia) are shown. Clicking the gene name in front of the species name will lead to the entire uploaded expression pattern for that gene in the given animal (data not shown). On the other hand, selecting and comparing all identified genes will lead to a comparison table that indicates in which animals otx orthologs are expressed at a similar developmental stage in a similar expression domain. Clicking the yellow square will provide access to the gene expression patterns in the various species at the given stage. This quick analysis shows that, similar to N. vectensis otxA, otxB and otxC, F. scutaria otx is expressed in the pharyngeal ectoderm in larval stages, and P. flava otx is expressed similarly to all N. vectensis otx genes in the presumptive endomesoderm (data not shown).
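A query by gene name or synonym across all species can be sketched as follows. The matching rule used here (case-insensitive substring match on the name and its synonyms) is an assumption about how a search for 'otx' could return otxA, otxB and otxC; the record layout, the `search_genes` function and the 'nodal-like' demo entry are likewise hypothetical.

```python
# Demo records; the field names and the synonym lists are illustrative.
GENES = [
    {"species": "Nematostella vectensis", "name": "otxA", "synonyms": ["orthodenticle", "otd"]},
    {"species": "Nematostella vectensis", "name": "otxB", "synonyms": ["orthodenticle", "otd"]},
    {"species": "Nematostella vectensis", "name": "otxC", "synonyms": ["orthodenticle", "otd"]},
    {"species": "Fungia scutaria", "name": "otx", "synonyms": ["orthodenticle", "otd"]},
    {"species": "Ptychodera flava", "name": "otx", "synonyms": ["orthodenticle", "otd"]},
    {"species": "Nematostella vectensis", "name": "nodal-like", "synonyms": []},
]


def search_genes(records, term, species=None):
    """Return records whose name or synonyms contain `term`.

    `species=None` corresponds to the default 'All species' selector;
    passing a species name restricts the search to that species.
    """
    needle = term.strip().lower()
    hits = []
    for rec in records:
        if species is not None and rec["species"] != species:
            continue
        names = [rec["name"], *rec.get("synonyms", [])]
        if any(needle in candidate.lower() for candidate in names):
            hits.append(rec)
    return hits
```

Calling `search_genes(GENES, "otd")` finds the same five otx records as a search for 'otx', because the synonym list is searched together with the primary name.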
Species-specific comparison of gene expression (otxA, otxB and otxC)
For the following example we selected otxA and otxC and compared them with the initial N. vectensis query otxB (Figure 3B). The default output (Figure 5A) of this comparison is a table (developmental stages versus expression domains) in which all green cells indicate co-expression of the selected genes at that particular combination of stage and domain, suggesting that these genes may be involved in the same biological process and define a putative syn-expression group. Yellow cells indicate that at least two of the three genes are co-expressed at that stage in the given domain, while orange, black or white cells indicate that only one gene is expressed, no gene is expressed or no information is available, respectively. By clicking on one of the green cells the images corresponding to the genes co-expressed at the given stage are shown in a new window (Figure 5B), and by selecting an image, a final window will present detailed information about that particular stage/experiment with the option of downloading the image (Figure 5C). Analysis of these expression patterns shows that the three cnidarian otx paralogs are co-expressed during most of embryonic development, but that otxC is not detected in the apical domain of the planula larvae, suggesting a differential transcriptional control of these factors in later stages.
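The colour coding of the comparison table can be written down as a small function. This is a sketch of the rule as described in the text, not the database's own implementation; the function names, the True/False/None encoding and the handling of partially missing information are assumptions.

```python
def cell_color(calls):
    """Classify one stage/domain cell of the comparison table.

    `calls` holds one entry per selected gene: True (expressed),
    False (not expressed) or None (no information available).
    """
    if len(calls) < 2:
        raise ValueError("a comparison needs at least two genes")
    known = [call for call in calls if call is not None]
    if not known:
        return "white"      # no information for any gene
    expressed = sum(known)
    if expressed == 0:
        return "black"      # no gene expressed
    if expressed == len(calls):
        return "green"      # all selected genes co-expressed
    if expressed >= 2:
        return "yellow"     # at least two, but not all, co-expressed
    return "orange"         # only one gene expressed


def comparison_table(genes, stages, domains):
    """Build {(stage, domain): colour} from per-gene annotation dicts."""
    return {
        (stage, domain): cell_color(
            [annotation.get((stage, domain)) for annotation in genes.values()]
        )
        for stage in stages
        for domain in domains
    }


# Demo input following the otx example in the text: all three genes in the
# pharyngeal ectoderm of the planula, otxC not detected in the apical domain.
OTX = {
    "otxA": {("planula", "pharyngeal ectoderm"): True, ("planula", "apical domain"): True},
    "otxB": {("planula", "pharyngeal ectoderm"): True, ("planula", "apical domain"): True},
    "otxC": {("planula", "pharyngeal ectoderm"): True, ("planula", "apical domain"): False},
}
TABLE = comparison_table(OTX, ["planula"], ["pharyngeal ectoderm", "apical domain"])
```

Here `TABLE` marks the pharyngeal ectoderm cell green and the apical domain cell yellow, which corresponds to the reading of the otx paralogs given above.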
Species-specific identification of co-expressed genes
Cross-species (Larval/Embryo, Ectoderm)
By using the advanced search function on all species (default value) but defining the simplified and optimized 'parent' stage (Larval/Embryo) and the domain (Ectoderm) as search values, the user will obtain a list of genes from all species that fulfill the query requirements. The steps are similar to the ones described for the species-specific advanced search and will lead to a comparison table that allows the visualization of the given gene expression patterns, with access to a more detailed species-specific annotation simply by clicking on the image of interest. This can quickly identify candidate genes from different species that are expressed with similar developmental patterns.
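The cross-species advanced search amounts to filtering annotations by parent stage and parent domain and grouping the hits by species. The sketch below assumes a flat list of annotation records; the field names, the `advanced_search` function and the parent stage assigned to the P. flava record are placeholders.

```python
# Demo annotations; 'parent_stage' and 'parent_domain' are assumed field names.
ANNOTATIONS = [
    {"species": "Nematostella vectensis", "gene": "otxA",
     "parent_stage": "Larval/Embryo", "parent_domain": "Ectoderm"},
    {"species": "Fungia scutaria", "gene": "otx",
     "parent_stage": "Larval/Embryo", "parent_domain": "Ectoderm"},
    # Placeholder stage: the text does not state the stage for this record.
    {"species": "Ptychodera flava", "gene": "otx",
     "parent_stage": "Gastrula", "parent_domain": "Endomesoderm"},
]


def advanced_search(annotations, parent_stage, parent_domain, species=None):
    """Return {species: sorted gene names} matching a stage/domain query.

    `species=None` searches all species (the default described in the text).
    """
    hits = {}
    for record in annotations:
        if species is not None and record["species"] != species:
            continue
        if (record["parent_stage"] == parent_stage
                and record["parent_domain"] == parent_domain):
            hits.setdefault(record["species"], set()).add(record["gene"])
    return {name: sorted(genes) for name, genes in hits.items()}
```

Querying `advanced_search(ANNOTATIONS, "Larval/Embryo", "Ectoderm")` on this demo list returns the N. vectensis and F. scutaria entries and leaves out the P. flava record.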
Discussion and future directions
The major challenges for data repositories include the initial and continuous input of data into the database and their long-term sustainability. As described by Merali and Giles, community-based and community-driven databases are generally more successful than projects initiated and maintained by single labs or small research groups. The Evo-Devo field is an interactive and growing community and members are invited to submit their gene expression patterns as well as suggestions for improvement to this new database. Currently, the database contains only data from marine invertebrates, but contributions for terrestrial organisms are welcome. Our hope is that this will encourage researchers to share their data with the Evo-Devo community through a community platform that acts in parallel to peer-reviewed publications, improves the visibility of the published work and fosters scientific interactions.
We hope that community-driven projects like this one will help improve the way we publish gene expression data today. Instead of collecting all information in static PDFs for print, all image data should primarily be annotated and published in a searchable and standardized format that can then be summarized for print (with automatic creation of hyperlinks to the online data).
The Kahi Kai non-profit organization is in the process of assembling an international scientific committee from different laboratories. We are considering using part of this committee to screen all added genes to minimize erroneous additions to the database in regard to naming conventions, duplicate entries, orthologies and image orientations. This review process will require all new additions to be put in a queue causing a slight delay (the reviewing time) before the expression data is visible to all users.
To make this tool even more useful, we are planning to add additional features including possibilities to implement qPCR and other quantified expression data such as RNA-seq and microarray that will enable gene regulatory network predictions. Therefore, we will develop tools to organize this information in gene regulatory networks where each node can be linked to the corresponding data, making it easy to check, confirm and compare relations.
Gene expression patterns are defined by precise developmental stages and embryonic regions and/or germ layers of a given species. This information is usually known only by specialists, which sometimes makes comparison between species difficult for non-specialists. To facilitate this, we will associate each species present in the database with detailed information about its embryonic/larval development (database in progress). In addition, we anticipate adding illustrations of the developmental stages highlighting the various domains relevant for the gene expression data (refer to Figure 1). We also plan to associate each species present in the database with information about habitat, life cycle, feeding behavior, spawning season and advice for laboratory cultures. This information will also be illustrated with high-resolution pictures of the animals, which can be used, for example, for outreach, education, scientific presentations and publications.
We encourage independent Principal Investigators who submit data to open access journals such as EvoDevo to also submit expression data to the Kahi Kai gene expression database.
The present comparative gene expression database allows storing, querying and sharing of data not only with the research community, but also in a more restricted way with a group of collaborators or at the level of a single laboratory. In its current state, this tool has been used to track and store gene expression patterns from current and former lab members, allowing new studies based on these resources. As described above, a few steps are sufficient to retrieve the expression pattern of a single gene, compare it to genes expressed in a similar way or even identify putative syn-expression groups in order to predict genetic interactions that can be tested with functional experiments. All available information is tightly linked to the user as well as the laboratory they are associated with, ensuring that each user and laboratory gets proper credit for their contributions. We have a strong focus on making simple and intuitive interfaces and we believe that this comparative gene expression pattern database in its current state will be a useful tool for the research community and students interested in zoology, and evolutionary and developmental biology. Future improvements and additions to the existing database will further enhance its usability for the Evo-Devo community. By integrating scientific data, educational material and general information about animals on a community platform we hope to improve scientific outreach as well as provide students and teachers with means to study and to interact directly with the research community.
Availability and requirements
The database can be accessed at: http://www.kahikai.org/index.php?content=genes.
To query the database no restrictions apply. To add, edit or delete data, the user needs to be logged in.
GRN: gene regulatory network
hpf: hours post fertilization
EST: expressed sequence tag.
The authors are thankful to the members of the Martindale lab for their input during the development of the database and to the researchers who submitted their published and unpublished data to the database during the test phase. We also thank Timothy DuBuc and Elizabeth Vallen for critically reading the manuscript and for discussion. This project was funded by the European Molecular Biology Organisation (EMBO; to ER), by the Swedish Research Council (Vetenskapsrådet; to MO) and by the National Science Foundation (NSF; to MQM).
1. Sprague J, Bayraktaroglu L, Clements D, et al: The Zebrafish Information Network: the zebrafish model organism database. Nucleic Acids Res. 2006, 34: D581-D585. 10.1093/nar/gkj086.
2. Heinrich T, Ramialison M, Wittbrodt B, et al: MEPD: a resource for medaka gene expression patterns. Bioinformatics. 2005, 21: 3195-3197. 10.1093/bioinformatics/bti478.
3. Tomancak P, Berman BP, Beaton A, et al: Global analysis of patterns of gene expression during Drosophila embryogenesis. Genome Biol. 2007, 8: R145. 10.1186/gb-2007-8-7-r145.
4. Grumbling G, Strelets V: FlyBase: anatomical data, images and queries. Nucleic Acids Res. 2006, 34: D484-D488. 10.1093/nar/gkj068.
5. Finger JH, Smith CM, Hayamizu TF, et al: The mouse Gene Expression Database (GXD): 2011 update. Nucleic Acids Res. 2011, 39: D835-D841. 10.1093/nar/gkq1132.
6. Hill DP, Begley DA, Finger JH, et al: The mouse Gene Expression Database (GXD): updates and enhancements. Nucleic Acids Res. 2004, 32: D568-D571. 10.1093/nar/gkh069.
7. Smith CM, Finger JH, Hayamizu TF, et al: The mouse Gene Expression Database (GXD): 2007 update. Nucleic Acids Res. 2007, 35: D618-D623. 10.1093/nar/gkl1003.
8. Christiansen JH, Yang Y, Venkataraman S, et al: EMAGE: a spatial database of gene expression patterns during mouse embryo development. Nucleic Acids Res. 2006, 34: D637-D641. 10.1093/nar/gkj006.
9. Richardson L, Venkataraman S, Stevenson P, et al: EMAGE mouse embryo spatial gene expression database: 2010 update. Nucleic Acids Res. 2010, 38: D703-D709. 10.1093/nar/gkp763.
10. Venkataraman S, Stevenson P, Yang Y, et al: EMAGE--Edinburgh Mouse Atlas of Gene Expression: 2008 update. Nucleic Acids Res. 2008, 36: D860-D865.
11. Bowes JB, Snyder KA, Segerdell E, et al: Xenbase: a Xenopus biology and genomics resource. Nucleic Acids Res. 2008, 36: D761-D767.
12. Bowes JB, Snyder KA, Segerdell E, et al: Xenbase: gene expression and improved integration. Nucleic Acids Res. 2009, 38: D607-D612.
13. Segerdell E, Bowes JB, Pollet N, et al: An ontology for Xenopus anatomy and development. BMC Dev Biol. 2008, 8: 92. 10.1186/1471-213X-8-92.
14. Haudry Y, Berube H, Letunic I, et al: 4DXpress: a database for cross-species expression pattern comparisons. Nucleic Acids Res. 2008, 36: D847-D853.
15. Sullivan JC, Ryan JF, Watson JA, et al: StellaBase: the Nematostella vectensis Genomics Database. Nucleic Acids Res. 2006, 34: D495-D499. 10.1093/nar/gkj020.
16. Niehrs C, Pollet N: Synexpression groups in eukaryotes. Nature. 1999, 402: 483-487. 10.1038/990025.
17. Deutsch EW, Ball CA, Berman JJ, et al: Minimum information specification for in situ hybridization and immunohistochemistry experiments (MISFISHIE). Nat Biotechnol. 2008, 26: 305-312. 10.1038/nbt1391.
18. Deutsch EW, Ball CA, Bova GS, et al: Development of the Minimum Information Specification for In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE). OMICS. 2006, 10: 205-208. 10.1089/omi.2006.10.205.
19. Mazza ME, Pang K, Martindale MQ, et al: Genomic organization, gene structure, and developmental expression of three clustered otx genes in the sea anemone Nematostella vectensis. J Exp Zool B Mol Dev Evol. 2007, 308: 494-506.
20. Harada Y, Okai N, Taguchi S, et al: Developmental expression of the hemichordate otx ortholog. Mech Dev. 2000, 91: 337-339. 10.1016/S0925-4773(99)00279-8.
21. Browne WE, Schmid BG, Wimmer EA, et al: Expression of otd orthologs in the amphipod crustacean, Parhyale hawaiensis. Dev Genes Evol. 2006, 216: 581-595. 10.1007/s00427-006-0074-7.
22. Passamaneck YJ, Furchheim N, Hejnol A, et al: Ciliary photoreceptors in the cerebral eyes of a protostome larva. Evodevo. 2011, 2: 6. 10.1186/2041-9139-2-6.
23. Merali Z, Giles J: Databases in peril. Nature. 2005, 435: 1010-1011. 10.1038/4351010a.
24. Magie CR, Daly M, Martindale MQ: Gastrulation in the cnidarian Nematostella vectensis occurs via invagination not ingression. Dev Biol. 2007, 305: 483-497. 10.1016/j.ydbio.2007.02.044.
25. Fritzenwanker JH, Genikhovich G, Kraus Y, et al: Early development and axis specification in the sea anemone Nematostella vectensis. Dev Biol. 2007, 310: 264-279. 10.1016/j.ydbio.2007.07.029.
26. Lee PN, Kumburegama S, Marlow HQ, et al: Asymmetric developmental potential along the animal-vegetal axis in the anthozoan cnidarian, Nematostella vectensis, is mediated by Dishevelled. Dev Biol. 2007, 310: 169-186. 10.1016/j.ydbio.2007.05.040.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.