Morphology and function of the internal ectoderm
To look for evidence of morphological and functional variation, we examined the fine structure of the internal and external ectoderm around the mouth of N. vectensis during feeding (Fig. 1). Cells of the external ectoderm around the mouth form a low cuboidal epithelium that covers the closed mouth between feeding events (Fig. 1a–d). In the presence of prey, the pharynx is partially everted, exposing the tall columnar epithelium of the pharyngeal ectoderm (Fig. 1e–g). After passing through the pharynx (Fig. 1h), ingested prey remains in contact with the ectodermal portion of the mesenteries, which is populated by cnidocytes and gland cells (Fig. 1i–k).
The pharyngeal ectoderm contains numerous distinct electron-dense (zymogen-secreting) and electron-lucent (mucus-secreting) gland cells (Fig. 2a–f). The adjacent non-secretory cells in this epithelium have distinctive apical electron-dense vesicles (Fig. 2f). The proximal region of the mesentery (adjacent to the body wall) is composed of endoderm, whereas the distal portion (the free edge) is composed of ectoderm (Fig. 2g) [10, 12, 15]. The ectodermal region gives rise to both the cnidoglandular tract at its most distal extent (Fig. 2h–j, m–o) and, more proximally, the ciliated tract (Fig. 2k, l). Thin sections of the ectodermal mesentery in the oral region (near the pharynx) show abundant zymogen gland cells (Fig. 2h–j), some of which contain secretory vesicles with heterogeneous contents (Fig. 2j). Ciliated tracts are short and present only at the oral end of each mesentery. Cells of the ciliated tract are highly proliferative and bear apical motile cilia but have no other distinguishing features (Fig. 2k, l). The aboral mesentery lacks a ciliated tract, but its cnidoglandular tract still contains numerous distinct zymogen gland cells, some with motile apical cilia (Fig. 2m–o). Mucus-secreting cells were found in the pharyngeal ectoderm (Fig. 2d) and in the external ectoderm of the body wall and tentacles (Additional file 1), but never in the endoderm.
Proteolytic enzymes are expressed in the developing mesenteries
We previously identified numerous genes encoding different classes of proteases that are upregulated in the adult mesentery of N. vectensis [11]. To understand the ontogeny of digestive function and the onset of terminal gut cell differentiation, we used in situ hybridization to examine the spatial and temporal expression of several of these protease classes during early development of the pharynx and mesenteries. All genes examined were expressed in individual ectodermal cells of the mesenteries at the primary polyp stage, just after metamorphosis (Fig. 3a, b); two protease genes (NVJ_82725 and NVJ_83864) were also expressed in the pharyngeal ectoderm of the primary polyp. There was surprisingly little variation in the onset of protease expression, although serine proteases (trypsins) were consistently expressed already at the early planula stage, before differentiation of the presumptive pharynx and mesenteries (Fig. 3b). Double fluorescent in situ hybridization for two metalloprotease genes (NVJ_88668 and NVJ_2109) revealed both co-expression of these two enzymes in a few cells at the aboral end of the pharynx and independent expression of the two genes in distinct cells of the ectodermal mesenteries at the late tentacle bud stage (Fig. 3c). These results suggest that adult gland cell identity is acquired very early in development, coincident with the morphogenesis of the pharynx and mesenteries.
The surprising lack of any obvious spatial segregation in protease expression led us to hypothesize that many proteases are co-expressed in the few anatomically distinguishable gland cells identified above (Fig. 2). Using the raw data from a previously published single-cell RNA-Seq study [16], we detected co-expression of 6 of the 10 proteases examined by in situ hybridization in a single putative gland cell (Fig. 3d). Using the same data set and a very low expression cutoff (N ≥ 1 read), we examined co-expression across the large trypsin superfamily more fully and found 6727 cells expressing at least one trypsin gene. Nearly 50% of these trypsin-expressing cells (3282/6727) appear to express only a single trypsin, whereas the remaining cells co-express up to 24 trypsins (Fig. 3e). For each trypsin, we then compared the ubiquity of its expression (the total number of cells in which it is expressed) with the number of cells in which it is co-expressed with other trypsins and found a strong positive correlation (Fig. 3f), indicating that the most broadly expressed trypsins were also the most likely to be co-expressed with other trypsins.
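The co-expression analysis above can be sketched in a few lines. This is a minimal illustration, not the pipeline used on the single-cell data: it assumes the counts are available as a per-cell dictionary of trypsin read counts, treats "co-expressed" as sharing a cell with at least one other trypsin, and uses a Pearson coefficient because the correlation method is not stated here. The gene and cell names are hypothetical toy values.

```python
from collections import Counter
from math import sqrt

MIN_READS = 1  # expression cutoff from the text: a gene counts as expressed at N >= 1 read


def trypsins_per_cell(counts, min_reads=MIN_READS):
    """Map each cell to the set of trypsin genes it expresses.

    counts: dict of cell ID -> dict of trypsin gene ID -> read count.
    Cells expressing no trypsin at the cutoff are dropped.
    """
    expressed = {}
    for cell, genes in counts.items():
        hits = {gene for gene, reads in genes.items() if reads >= min_reads}
        if hits:
            expressed[cell] = hits
    return expressed


def coexpression_histogram(expressed):
    """Number of cells expressing exactly k trypsins, keyed by k."""
    return Counter(len(genes) for genes in expressed.values())


def ubiquity_and_coexpression(expressed):
    """Per trypsin: (cells expressing it, cells where it co-occurs with >= 1 other trypsin)."""
    total, shared = Counter(), Counter()
    for genes in expressed.values():
        for gene in genes:
            total[gene] += 1
            if len(genes) > 1:
                shared[gene] += 1
    return {gene: (total[gene], shared[gene]) for gene in total}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical toy counts (not data from the study).
toy_counts = {
    "cell1": {"tryA": 3},
    "cell2": {"tryA": 1, "tryB": 2},
    "cell3": {"tryA": 5, "tryB": 1, "tryC": 1},
    "cell4": {"tryC": 0},  # below the cutoff, so this cell is dropped
}
expressed = trypsins_per_cell(toy_counts)
histogram = coexpression_histogram(expressed)
stats = ubiquity_and_coexpression(expressed)
ubiquity = [stats[g][0] for g in sorted(stats)]
coexpressed = [stats[g][1] for g in sorted(stats)]
r = pearson(ubiquity, coexpressed)
```

With a 1-read cutoff, any single stray read marks a gene as expressed, so the histogram of trypsins per cell is sensitive to sequencing noise; raising `min_reads` is the obvious knob for checking robustness.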
The tryptome of N. vectensis is unique
To characterize the tryptome (all proteins with a trypsin domain) of N. vectensis, we searched the JGI gene models (https://genome.jgi.doe.gov/Nemve1/Nemve1.home.html) for all sequences containing a significant Trypsin or Trypsin_2 domain using hmmsearch (HMMER 3.1b2; http://hmmer.org) and constructed a domain architecture diagram for each protein (Fig. 4). Of the 72 trypsin gene models that remained after curation (see “Methods”), 28 encode a trypsin domain but no other conserved domain, and the remaining 44 encode a trypsin domain and at least one additional conserved domain. In total, trypsin domains were found in association with 24 other domains in N. vectensis. To determine whether any of these associated domains were overrepresented in the tryptome, we compared the abundance of each trypsin-associated domain in the tryptome with its abundance among all proteins predicted from the JGI gene models (N = 27,273 predicted proteins). Six domains were highly represented in the tryptome, with ≥ 10% of their copies in the predicted proteome occurring in trypsin-containing proteins: DIM, ShK, Lustrin_cystein, Sushi, MAM and SRCR (Fig. 4a). The DIM and Lustrin_cystein domains are rare in the predicted proteome (1 and 4 copies in total, respectively), which artificially inflates their apparent abundance in the tryptome. For ShK, Sushi, MAM, and SRCR, ≥ 15% of the copies found in the proteome were associated with trypsins, suggesting that the association between trypsin and each of these domains provides a strong selective advantage in the biology of N. vectensis.
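The tallies described above (trypsin-only versus multidomain proteins, and the share of each partner domain's proteome-wide copies that sit in trypsin proteins) can be sketched from an `hmmsearch --domtblout` table. This is a minimal sketch under stated assumptions: the i-Evalue cutoff of 1e-5 is a placeholder rather than the study's threshold, curation steps are omitted, and `toy_row` with its protein names is a hypothetical helper that builds fake rows with dummy numeric fields.

```python
from collections import Counter

TRYPSIN_DOMAINS = {"Trypsin", "Trypsin_2"}  # Pfam HMM names searched in the text


def parse_domtblout(lines, max_ievalue=1e-5):
    """Yield (protein ID, domain name) for each hit in hmmsearch --domtblout output.

    HMMER3 --domtblout layout: field 1 is the target (protein), field 4 the
    query (Pfam HMM) name, field 13 the independent E-value.
    The default i-Evalue cutoff is an assumption, not the study's threshold.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(maxsplit=22)  # 22 fixed columns + free-text description
        protein, domain, i_evalue = fields[0], fields[3], float(fields[12])
        if i_evalue <= max_ievalue:
            yield protein, domain


def summarize_tryptome(hits, threshold=0.10):
    """Return (trypsin_only, multidomain, copies, enriched).

    copies maps each non-trypsin domain seen in the tryptome to
    (copies in tryptome, copies in whole proteome).
    """
    hits = list(hits)
    domains_by_protein = {}
    for protein, domain in hits:
        domains_by_protein.setdefault(protein, set()).add(domain)
    tryptome = {p for p, ds in domains_by_protein.items() if ds & TRYPSIN_DOMAINS}
    trypsin_only = {p for p in tryptome if domains_by_protein[p] <= TRYPSIN_DOMAINS}

    # Count domain copies (not proteins) across the proteome and within the tryptome.
    in_proteome = Counter(d for _, d in hits)
    in_tryptome = Counter(d for p, d in hits if p in tryptome)
    copies = {d: (in_tryptome[d], in_proteome[d])
              for d in in_tryptome if d not in TRYPSIN_DOMAINS}
    # Rare domains (e.g. one copy in total) reach high fractions trivially: keep the raw counts.
    enriched = {d for d, (k, n) in copies.items() if k / n >= threshold}
    return trypsin_only, tryptome - trypsin_only, copies, enriched


def toy_row(protein, domain, i_evalue):
    """Hypothetical --domtblout row; every numeric field except i-Evalue is a dummy."""
    fields = [protein, "-", "300", domain, "-", "200", "1e-30", "100.0", "0.1",
              "1", "1", "1e-30", str(i_evalue), "100.0", "0.1",
              "1", "200", "10", "210", "5", "215", "0.95", "toy protein"]
    return " ".join(fields)


toy_lines = [
    "# toy hmmsearch --domtblout output",
    toy_row("protA", "Trypsin", 1e-40), toy_row("protA", "ShK", 1e-8),
    toy_row("protB", "ShK", 1e-9),
    toy_row("protC", "Trypsin", 1e-50),
    toy_row("protD", "ShK", 1e-7), toy_row("protD", "MAM", 1e-20),
    toy_row("protE", "ShK", 1e-6),
    toy_row("protF", "Trypsin", 1.0),  # not significant, filtered out
]
trypsin_only, multidomain, copies, enriched = summarize_tryptome(parse_domtblout(toy_lines))
```

Keeping the raw `(in tryptome, in proteome)` pair next to each fraction makes the DIM and Lustrin_cystein caveat visible: a domain with one copy in the whole proteome scores 100% as soon as that copy sits in a trypsin.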
To determine whether the makeup of the tryptome was unique to N. vectensis, we searched for proteins with these same domain architectures in representatives from all domains of life (other cnidarians, bilaterians, non-metazoan eukaryotes, and a selection of prokaryotes). Two domain architectures were widely distributed across taxa: those with only a trypsin domain, and those with a trypsin and a PDZ domain (Fig. 4b). Trypsin diversity appears to have expanded considerably with the evolution of multicellular animals: both choanoflagellate lineages had fewer than 5 trypsins, whereas the ctenophore Mnemiopsis leidyi and the placozoan Trichoplax adhaerens (representing two of the earliest-diverging animal lineages) each have at least 20. Surprisingly, trypsin domain architecture was poorly conserved across animals. The tryptome of N. vectensis shared more trypsin domain architectures with other actiniarians (sea anemones) than with any other animal group; however, we still identified 3 trypsin architectures unique to N. vectensis that were absent even from Edwardsiella lineata (a representative of the sister genus to Nematostella). Two of these (NVJ_105271 and NVJ_199428) represent unique associations between trypsin and other conserved domains (WSC and DIM, respectively), and the third (NVJ_105548) exhibits a novel arrangement of trypsin and its associated MAM domains (Fig. 4b).
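The cross-taxon comparison above reduces to set operations once each protein is summarized as an ordered architecture. The sketch below assumes an architecture is the N-to-C tuple of domain names, ordered by domain start position, so that a rearrangement of the same domains (as in the novel MAM arrangement) counts as a different architecture. The taxa and architectures are hypothetical toy values, not the study's data.

```python
def architecture(domain_hits):
    """Ordered N-to-C domain architecture from (start position, domain name) pairs."""
    return tuple(name for _, name in sorted(domain_hits))


def shared_by_all(arch_by_taxon):
    """Architectures found in every taxon."""
    return set.intersection(*arch_by_taxon.values())


def unique_to(arch_by_taxon, focal):
    """Architectures found in the focal taxon and in no other taxon."""
    others = set().union(*(archs for taxon, archs in arch_by_taxon.items() if taxon != focal))
    return arch_by_taxon[focal] - others


# Hypothetical architectures per taxon (toy values, not results from the study).
toy = {
    "taxon_1": {("Trypsin",), ("Trypsin", "PDZ"),
                architecture([(400, "MAM"), (20, "MAM"), (200, "Trypsin")])},
    "taxon_2": {("Trypsin",), ("Trypsin", "PDZ"), ("MAM", "MAM", "Trypsin")},
    "taxon_3": {("Trypsin",), ("Trypsin", "PDZ"), ("ShK", "Trypsin")},
}
common = shared_by_all(toy)
novel = unique_to(toy, "taxon_1")
```

Here `taxon_1` and `taxon_2` carry the same domains in a different order, and the ordered representation is what keeps the `taxon_1` arrangement in the "unique" set. Treating architectures as unordered sets would collapse the two.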
Trypsins diversified independently in cnidarians and bilaterians
To characterize the diversification of animal trypsins, we built a phylogeny of trypsin domains from taxa representing each of the 5 major animal lineages: bilaterians, cnidarians, placozoans, sponges, and ctenophores. Using this tree, we identified 6 clades of trypsins and named them according to the functions of their human members: a non-catalytic group, the intracellular trypsins, tryptases and transmembrane trypsins, trypsins involved in coagulation and immune response, chymotrypsins, and the clade including granzymes, pancreatic trypsins, kallikreins, hepatocyte growth factors, and elastases (Fig. 5a). Each of these clades includes representatives from bilaterians, cnidarians, and at least one placozoan, sponge, or ctenophore; together, they likely represent the suite of trypsin clades present in the last common ancestor of animals. The N. vectensis tryptome includes representatives of 5 of these 6 clades; N. vectensis may have lost its representatives of the tryptase/transmembrane clade, as these trypsins appear to be present in M. leidyi, A. digitifera, and bilaterians (Fig. 5a, Additional file 2).
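The inference rule stated above (a clade with bilaterian, cnidarian, and at least one placozoan, sponge, or ctenophore member is taken as present in the animal ancestor, and such a clade lacking N. vectensis members becomes a candidate loss) can be written as a small filter. This is a sketch of the stated criterion only; the clade memberships below are simplified toy values, with just the tryptase/transmembrane membership mirroring the text. A missing member could also reflect incomplete gene models rather than a true loss.

```python
EARLY_LINEAGES = {"Placozoa", "Porifera", "Ctenophora"}


def in_animal_ancestor(lineages):
    """Criterion from the text: bilaterian + cnidarian + at least one placozoan, sponge or ctenophore."""
    lineages = set(lineages)
    return {"Bilateria", "Cnidaria"} <= lineages and bool(lineages & EARLY_LINEAGES)


def candidate_losses(clades, lineage_of, focal):
    """Clades inferred to be ancestral (criterion above) that have no member from the focal species."""
    return sorted(
        name for name, members in clades.items()
        if in_animal_ancestor(lineage_of[t] for t in members) and focal not in members
    )


lineage_of = {
    "Homo sapiens": "Bilateria",
    "Nematostella vectensis": "Cnidaria",
    "Acropora digitifera": "Cnidaria",
    "Mnemiopsis leidyi": "Ctenophora",
    "Trichoplax adhaerens": "Placozoa",
}
# Simplified, partly hypothetical clade memberships for illustration.
clades = {
    "tryptase/transmembrane": ["Homo sapiens", "Acropora digitifera", "Mnemiopsis leidyi"],
    "intracellular": ["Homo sapiens", "Nematostella vectensis", "Trichoplax adhaerens"],
    "hypothetical_cnidarian_only": ["Nematostella vectensis", "Acropora digitifera"],
}
losses = candidate_losses(clades, lineage_of, "Nematostella vectensis")
```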
We compared the distribution of conserved domains across the trypsin clades of N. vectensis and H. sapiens (Fig. 5b). In N. vectensis, domain diversity is greatest among the trypsins that group with human chymotrypsins (N = 14), followed by trypsins in the immune/coagulation group (N = 10), the “pancreatic” group (including granzymes, kallikreins, HGF, and elastase) (N = 5), and intracellular trypsins (N = 2). Trypsins from the non-catalytic clade lack associated domains completely. Four trypsin-associated domains (Sushi, EGF_CA, CUB, and FXa_inhibition) were found in the immune/coagulation clade in both N. vectensis and H. sapiens, the CUB domain was found in chymotrypsins from both taxa, and the PDZ domain is restricted to the intracellular clade in both taxa. Surprisingly, no other domains were shared between N. vectensis and H. sapiens trypsins from the same clade (see Additional file 3 for the distribution of human trypsin domain architectures).
To determine whether the tryptome diversity of N. vectensis is representative of cnidarians more broadly, we built a phylogeny using representatives of each class within Cnidaria (Fig. 6). We identified 16 clades of trypsins that include representatives of at least two anthozoan lineages and two medusozoan lineages, suggesting that these clades may have been present in the stem cnidarian. Two clades (the trypsin-MAM and trypsin-ShK clades) seem to have undergone further expansion in anthozoans after their divergence from medusozoans.
The Nematostella tryptome diversified through numerous mechanisms
To understand the mechanisms generating trypsin diversity in N. vectensis, we examined the evolutionary relationships of the 72 trypsin proteins in the tryptome (Fig. 7a). Of these 72 predicted proteins, 85% (61/72) retain all three conserved residues of the catalytic triad and are likely to function as proteases, 79% (57/72) are predicted to have a signal peptide and are presumably secreted, and 7% (5/72) are predicted to have a transmembrane domain (see Additional file 4). The trypsin superfamily therefore shows evidence of functional specialization through changes in primary structure, both in the catalytic residues and in the targeting signals that direct proteins to specific subcellular compartments. Furthermore, 4 of the 5 clades of trypsins in N. vectensis (all but the intracellular clade) include secreted trypsins, membrane-bound trypsins, and trypsins with divergent sequences that have likely lost catalytic function, suggesting that spatial and functional specialization has evolved multiple times in different trypsin lineages.
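A catalytic-triad screen like the one summarized above can be sketched on a multiple sequence alignment: locate the alignment columns of the reference triad residues, then check whether each protein carries His, Asp and Ser there. This is a minimal sketch, assuming a reference sequence whose residue positions define the triad (for example, bovine chymotrypsinogen A with positions 57, 102 and 195, since chymotrypsinogen numbering derives from that sequence); the toy alignment below is hypothetical. It is not necessarily how the study scored the triad.

```python
TRIAD = ("H", "D", "S")  # His57, Asp102, Ser195 in chymotrypsinogen numbering


def alignment_columns(aligned_reference, residue_positions):
    """Convert 1-based ungapped residue positions of a reference sequence to alignment columns."""
    wanted = set(residue_positions)
    column_of = {}
    position = 0
    for column, residue in enumerate(aligned_reference):
        if residue not in "-.":  # skip gap characters
            position += 1
            if position in wanted:
                column_of[position] = column
    return [column_of[p] for p in residue_positions]


def has_catalytic_triad(aligned_sequence, triad_columns):
    """True if the aligned sequence carries H, D and S at the three triad columns."""
    return all(aligned_sequence[c].upper() == aa for c, aa in zip(triad_columns, TRIAD))


def percent(part, whole):
    """Percentage rounded to the nearest integer, as reported in the text."""
    return round(100 * part / whole)


# Toy alignment: in the ungapped reference "AHDGS", the triad sits at residues 2, 3 and 5.
columns = alignment_columns("AH-DGS", [2, 3, 5])
intact = has_catalytic_triad("AHKDGS", columns)
degenerate = has_catalytic_triad("AHKDGA", columns)  # Ser replaced by Ala
```

With the counts from the text, `percent(61, 72)`, `percent(57, 72)` and `percent(5, 72)` give 85, 79 and 7, matching the reported percentages.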
Numerous trypsins from the “pancreatic” and chymotrypsin clades were associated with ShK domains. Likewise, over 30% (26/82) of the ShK domains in N. vectensis are associated with trypsins (Fig. 4a). To determine whether the trypsin and ShK domains may have duplicated together, we built a phylogeny of all 108 ShK domains from the proteins predicted from N. vectensis gene models (Fig. 7b). Despite the abundance of trypsin-ShK associations, the ShK domains from sister trypsins were almost never monophyletic, suggesting that this domain is easily gained and lost. Consistent with this, every ShK domain in the N. vectensis tryptome is encoded by a single exon (Additional file 5), supporting rapid evolution of the tryptome through exon shuffling. Two trypsin-ShK proteins (NVJ_218669 and NVJ_218670) were sister in both phylogenies, suggesting they arose by duplication of the combined domains. These two genes lie on the same scaffold, separated by approximately 1 kb of genomic DNA, and are therefore likely the result of a recent tandem duplication. The ShK domain is named after ShK, a short potassium channel-blocking peptide originally isolated from the sea anemone Stichodactyla helianthus [17]. The role of the ShK domain when paired with a trypsin domain is unknown, but the overabundance of this domain combination in cnidarian tryptomes (Additional file 6), together with its multiple independent origins in N. vectensis (Fig. 7b), suggests that the pairing provides a strong selective advantage in the biology of cnidarians.
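Whether a domain is encoded by a single exon can be checked by projecting its protein coordinates onto the coding sequence and comparing them with cumulative exon boundaries. A minimal sketch, assuming exon coding lengths are already given in transcript order (5' to 3', so strand is resolved) and that domain coordinates are 1-based, inclusive residue positions; the exon lengths below are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def exons_spanned(coding_exon_lengths, aa_start, aa_end):
    """0-based indices of coding exons overlapped by residues aa_start..aa_end (1-based, inclusive).

    coding_exon_lengths: coding (CDS) length of each exon in transcript order (5' to 3').
    """
    exon_ends = list(accumulate(coding_exon_lengths))  # cumulative CDS end of each exon
    nt_start = 3 * (aa_start - 1) + 1  # first base of the first codon
    nt_end = 3 * aa_end                # last base of the last codon
    if nt_start < 1 or nt_start > nt_end or nt_end > exon_ends[-1]:
        raise ValueError("domain coordinates fall outside the coding sequence")
    first = bisect_left(exon_ends, nt_start)
    last = bisect_left(exon_ends, nt_end)
    return list(range(first, last + 1))


def single_exon(coding_exon_lengths, aa_start, aa_end):
    """True if the whole domain is encoded within one exon."""
    return len(exons_spanned(coding_exon_lengths, aa_start, aa_end)) == 1


# Hypothetical gene with three coding exons of 30, 90 and 60 bases (CDS ends at 30, 120, 180).
toy_exons = [30, 90, 60]
inside_exon_2 = exons_spanned(toy_exons, 12, 38)    # bases 34..114
crosses_boundary = exons_spanned(toy_exons, 5, 15)  # bases 13..45
```

A domain that falls entirely inside one exon can be moved between genes as a unit, which is why single-exon encoding is read as compatible with exon shuffling; the check by itself does not show that shuffling occurred.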
Multidomain proteins are more common than single-domain proteins, as domain recombination increases the versatility of protein function [18]. Selection to maintain the catalytic activity of the trypsin domain, while allowing the context in which this domain is expressed to vary, appears to have been a critical component of diversification in this gene superfamily. In support of this, we found surprisingly little conservation of trypsin-associated domains across animals, even among cnidarians (Fig. 4, Additional file 6), suggesting that the associated domains have been continually gained and lost in each lineage. Furthermore, nearly 40% (28/72) of the proteins in the N. vectensis tryptome have only a trypsin domain (Fig. 7), yet these trypsin-only proteins do not form a monophyletic group (Figs. 5, 6), suggesting that trypsin domains themselves may be rapidly gained and lost from evolutionarily unrelated proteins. Trypsin diversification also occurs independently of the acquisition of associated domains: one gene in the N. vectensis tryptome (NVJ_127465) encodes three trypsin domains that form a monophyletic group, suggesting that this gene structure arose through tandem duplication of the trypsin domain (Fig. 5). The H. sapiens tryptome also includes two proteins with three trypsin domains each (Additional file 6). However, these 6 human trypsin domains fall in the tryptase/transmembrane clade (Additional file 3), whereas the three domains in NVJ_127465 group with chymotrypsins (Fig. 5). Thus, despite their similar domain architecture, triple-trypsin proteins appear to have evolved multiple times.
Several other mechanisms contributed to the diversification of trypsins in N. vectensis. We identified four cases in which sister trypsins are found on the same scaffold (Fig. 7a), suggesting tandem gene duplication. Furthermore, whereas most trypsin genes (70/72) encode their trypsin domains across multiple exons (Additional file 5), two genes (NVJ_128003 and NVJ_216003) lack introns entirely and likely arose through recent retrotransposition. These two genes are also on the same scaffold, suggesting that retrotransposition may have been followed by tandem gene duplication.
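The two genomic signals used above (sister genes on the same scaffold, and genes without introns) can be flagged from gene coordinates plus a list of sister pairs taken from the tree. This is a minimal sketch: the `Gene` record, the optional `max_gap` filter, and all gene IDs and coordinates are hypothetical, and a single-exon gene is only a candidate retrocopy, not proof of retrotransposition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    scaffold: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    n_exons: int


def intergenic_distance(a, b):
    """Bases separating two genes on the same scaffold (0 if they overlap); None otherwise."""
    if a.scaffold != b.scaffold:
        return None
    return max(0, max(a.start, b.start) - min(a.end, b.end) - 1)


def tandem_candidates(sister_pairs, genes, max_gap=None):
    """Sister pairs on the same scaffold, optionally within max_gap bases of each other."""
    hits = []
    for x, y in sister_pairs:
        gap = intergenic_distance(genes[x], genes[y])
        if gap is not None and (max_gap is None or gap <= max_gap):
            hits.append((x, y, gap))
    return hits


def intronless(genes):
    """Single-exon genes: candidates (not proof) for retrotransposed copies."""
    return sorted(g.gene_id for g in genes.values() if g.n_exons == 1)


# Hypothetical genes and sister pairs, for illustration only.
genes = {g.gene_id: g for g in [
    Gene("geneA", "scaffold_1", 1000, 4000, 6),
    Gene("geneB", "scaffold_1", 5001, 8000, 6),
    Gene("geneC", "scaffold_2", 200, 1400, 1),
    Gene("geneD", "scaffold_2", 3000, 4200, 1),
    Gene("geneE", "scaffold_9", 100, 900, 4),
]}
sisters = [("geneA", "geneB"), ("geneC", "geneD"), ("geneA", "geneE")]
tandem = tandem_candidates(sisters, genes)
close_tandem = tandem_candidates(sisters, genes, max_gap=1200)
retro = intronless(genes)
```

In the toy data, `geneC` and `geneD` are both intronless and share a scaffold, the same combination that leads to the inference of retrotransposition followed by tandem duplication.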
Trypsin diversity increases through new associations with old domains
Gene age can be estimated with a phylostratigraphic approach, in which the minimum age of a gene is inferred as the last common ancestor shared by the focal species and the most distantly related taxon in which the gene is present [19, 20]. To understand the evolution of trypsin diversity, we examined the age of the trypsins found in N. vectensis and the age of each associated domain across all domains of life. Trypsin-PDZ proteins and a subset of the trypsin-only proteins likely arose before bacteria and archaea split from eukaryotes, over 2 billion years ago (Fig. 8). While trypsin-only proteins are present in every lineage examined, trypsin-PDZ proteins appear to have been lost in several taxa, including C. owczarzaki, M. leidyi, A. vanhoeffeni, and C. cruxmelitensis (Fig. 4). All other associations between trypsin and other conserved domains appear to have originated after the stem metazoan diverged from the rest of life (~ 800 million years ago) [21]. Many of the trypsin-associated domains originated long before they became associated with trypsin; for example, the Astacin domain was present in the ancestor of all life, but the trypsin-Astacin association likely did not arise until the origin of Cnidaria (Fig. 8a). By contrast, the SRCR domain and its association with trypsin likely both arose in the stem metazoan, as trypsin-SRCR proteins were found in M. leidyi (Additional file 6).
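The phylostratigraphic logic above can be sketched as follows: map every taxon to the stratum of its last common ancestor with N. vectensis, then date a feature (a domain or a trypsin-domain association) to the oldest such stratum among the taxa that carry it. The strata list is deliberately simplified (intermediate ranks omitted), it assumes ctenophores are the sister group of all other animals (as the inference from M. leidyi implies), and the taxon sets below are illustrative toy values.

```python
# Simplified phylostrata for N. vectensis, oldest first (intermediate ranks omitted).
PHYLOSTRATA = [
    "cellular organisms",
    "Eukaryota",
    "Metazoa",
    "Parahoxozoa",
    "Cnidaria",
    "Edwardsiidae",
    "Nematostella vectensis",
]

# Stratum of the last common ancestor shared by each taxon and N. vectensis
# (Metazoa for M. leidyi assumes ctenophores are sister to all other animals).
LCA_WITH_FOCAL = {
    "Escherichia coli": "cellular organisms",
    "Mnemiopsis leidyi": "Metazoa",
    "Trichoplax adhaerens": "Parahoxozoa",
    "Edwardsiella lineata": "Edwardsiidae",
    "Nematostella vectensis": "Nematostella vectensis",
}


def phylostratum(taxa_with_feature):
    """Minimum age: the oldest LCA between the focal species and any taxon carrying the feature."""
    return min((LCA_WITH_FOCAL[t] for t in taxa_with_feature), key=PHYLOSTRATA.index)


def association_lag(domain_taxa, association_taxa):
    """Number of strata between the origin of a domain and the origin of its pairing with trypsin."""
    return (PHYLOSTRATA.index(phylostratum(association_taxa))
            - PHYLOSTRATA.index(phylostratum(domain_taxa)))


# Toy presence/absence sets, for illustration only.
lustrin_domain = {"Trichoplax adhaerens", "Edwardsiella lineata", "Nematostella vectensis"}
lustrin_with_trypsin = {"Edwardsiella lineata", "Nematostella vectensis"}
srcr_with_trypsin = {"Mnemiopsis leidyi", "Nematostella vectensis"}
```

Because the age is set by the most distant carrier, a single spurious homolog in a distant taxon (for example, from contamination or horizontal transfer) pushes the estimate back; phylostratigraphic ages are therefore minimum ages that depend on taxon sampling.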
We found no relationship between the age of a domain and the origin of its association with trypsin (Fig. 8b). Two trypsin associations were found only in N. vectensis, trypsin-DIM (NVJ_199428) and trypsin-WSC (NVJ_105271), and one association was found only in Edwardsiidae (Nematostella + Edwardsiella): trypsin-Lustrin_cystein (NVJ_164017). The WSC domain is present throughout eukaryotes (Fig. 8a) but is associated with trypsin only in N. vectensis, whereas the Lustrin_cystein domain appears to have arisen in the last common ancestor of Parahoxozoa (Placozoa + Cnidaria + Bilateria). These two associations represent extreme cases in which trypsin diversity in N. vectensis arose through acquisition of a young (Lustrin_cystein) and an old (WSC) domain, respectively.