Protein Similarity Networks Reveal Relationships Among Sequence, Structure, and Function within the Cupin Superfamily
Chemistry and Biochemistry
The cupin superfamily is extremely diverse and includes catalytically inactive seed storage proteins, sugar-binding metal-independent epimerases, and metal-dependent enzymes possessing dioxygenase, decarboxylase, and other activities. Although numerous proteins of this superfamily have been structurally characterized, the functions of many of them have not been experimentally determined. We report the first use of protein similarity networks (PSNs) to visualize trends in sequence and structure in order to make functional inferences in this remarkably diverse superfamily. PSNs provide a way to visualize the relatedness of structure and sequence among a given set of proteins. Structure- and sequence-based clustering of cupin members reflects functional clustering. Networks based only on cupin domains and networks based on whole proteins provide complementary information. Domain clustering supports phylogenetic conclusions that the N- and C-terminal domains of bicupin proteins evolved independently. Interestingly, although many functionally similar enzymatic cupin members bind the same active-site metal ion, structure- and sequence-based clustering does not correlate with the identity of the bound metal. It is anticipated that the application of PSNs to this superfamily will inform experimental work and influence functional annotation in databases.
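The abstract does not say how the networks were constructed. As a rough illustration of the general idea behind a similarity network, the sketch below treats each protein as a node, joins two nodes when their pairwise similarity score meets a cutoff, and reports the resulting connected groups as clusters. The protein labels (`P1` to `P5`), the scores, the cutoff, and the function names `build_network` and `clusters` are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def build_network(scores, threshold):
    """Build an undirected network from pairwise similarity scores.

    scores: dict mapping (protein_a, protein_b) -> similarity score
            (hypothetical values; higher means more similar).
    An edge is drawn only when the score is >= threshold.
    """
    adjacency = defaultdict(set)
    for (a, b), score in scores.items():
        # Touch both keys so every protein appears as a node,
        # even if it ends up with no edges.
        adjacency[a]
        adjacency[b]
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def clusters(adjacency):
    """Return the connected components of the network as sorted lists."""
    seen = set()
    groups = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack = [start]
        group = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adjacency[node] - seen)
        groups.append(sorted(group))
    return groups


# Made-up scores for five placeholder proteins (illustration only).
toy_scores = {
    ("P1", "P2"): 0.90,
    ("P2", "P3"): 0.80,
    ("P1", "P3"): 0.30,
    ("P3", "P4"): 0.20,
    ("P4", "P5"): 0.85,
    ("P1", "P5"): 0.10,
}

network = build_network(toy_scores, threshold=0.5)
print(clusters(network))
```

Raising the cutoff removes edges and splits groups apart, while lowering it merges them, so in a sketch like this the choice of threshold determines how finely the proteins are grouped.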