Background

The aim of this study was to characterize the Spen family at the sequence level. Proteins of the Spen family participate in various biological processes, including neuronal cell fate, survival and axonal guidance [1-3], cell cycle regulation [4], and repression of head identity in the embryonic trunk [5]. More recently, the fly split ends gene has been shown to participate in Wingless signalling in the eye, wing and leg imaginal discs [6]. The human Spen protein SHARP (SMRT/HDAC1-associated repressor protein) has been identified as a component of transcriptional repression complexes in both the nuclear receptor and the Notch/RBP-Jkappa signalling pathways [7-9]. The Spen family of proteins therefore appears to regulate transcription in several signalling pathways. In addition, the human Spen protein RBM15 (RNA-binding motif protein 15) is involved in the recurrent t(1;22) translocation, whose product is the RBM15-MKL1 fusion protein, in which RBM15 is fused to MKL1 (megakaryoblastic leukemia 1) [10, 11].

DIO (death inducer-obliterator) and its homologue PHF3 (PHD finger protein 3), on the other hand, are human proteins that contain a PHD (plant homeodomain) finger and a TFIIS (transcription factor S-II) domain, both of which are usually associated with transcription [12, 13]. The DIO-1 protein regulates the early stages of cell death in mice and humans [14, 15]. Experimental evidence shows that PHF3 is ubiquitously expressed in normal tissues, including brain, but that its expression is dramatically reduced or lost in glioblastoma, the most frequent malignant primary brain tumour in humans [12, 13]. Although this family has been implicated in apoptosis and cancer, the underlying molecular mechanisms remain unclear.

Results

A sequence profile of the conserved C-terminal region of the DIO family detected the SPOC (Spen paralog and ortholog C-terminal) domain of the Spen family with an E-value of 0.083. Reciprocally, the profile of the Spen SPOC domain detected the DIO family with an E-value of 0.05. We identified new members of the Spen and DIO families in different eukaryotic lineages (Figures 1 and 2). Statistically significant E-values connected all the SPOC domain-containing families. None of these HMMer profile searches retrieved any unrelated sequences and, as noted above, the reciprocal searches produced convergent results.
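The reciprocal criterion can be made explicit with a short Python sketch. It assumes the best E-value of each profile search has been collected into a dictionary keyed by (query profile, target family); the helper name `reciprocal_support` and the 0.1 cut-off are hypothetical choices for illustration, not part of the original analysis. Only the two E-values come from the text.

```python
from typing import Dict, Tuple

# best E-value for each (query_profile, target_family) search
BestHits = Dict[Tuple[str, str], float]


def reciprocal_support(hits: BestHits, a: str, b: str, cutoff: float = 0.1) -> bool:
    """True if profile a detects family b AND profile b detects family a,
    both with E-values below the (assumed) cut-off.
    A search that was never run counts as no hit (E = infinity)."""
    forward = hits.get((a, b), float("inf"))
    reverse = hits.get((b, a), float("inf"))
    return forward < cutoff and reverse < cutoff


hits: BestHits = {
    ("DIO", "Spen"): 0.083,  # DIO C-terminal profile -> Spen SPOC domain
    ("Spen", "DIO"): 0.05,   # Spen SPOC profile -> DIO family
}

print(reciprocal_support(hits, "DIO", "Spen"))        # True
print(reciprocal_support(hits, "DIO", "Spen", 0.06))  # False, since 0.083 >= 0.06
```

Requiring both directions guards against a single profile drifting onto an unrelated family, which is the point of checking that reciprocal searches converge.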

Figure 1

Representative multiple alignment of the SPOC domain. The colouring scheme indicates the average BLOSUM62 score (correlated with amino acid conservation) in each alignment column: cyan (greater than 3), light red (between 3 and 1.5) and light green (between 1.5 and 0.5). The limits of the domains are indicated by the residue positions on each side; for partially sequenced genes whose full-length proteins are not available, these limits are not shown. Secondary structure elements from the X-ray structure of the SPOC domain [9] (PDB code: 1OW1) are shown below the SHARP sequence (Swissprot-ID: MINT_HUMAN). The PHD secondary structure prediction [29] for the DIO family is shown below the human DIO-1 sequence (Swissprot-ID: DAT1_HUMAN), with E indicating a β strand (in red) and H an α helix (in green). The asterisks below the alignment mark the conserved arginine-tyrosine pair discussed in the text. Sequences are named with their Swissprot or SpTrEMBL identifiers and, where necessary, with their common species name: Human, Homo sapiens; Frog, Xenopus laevis; Drome, Drosophila melanogaster; Caeel, Caenorhabditis elegans; Arath, Arabidopsis thaliana; Ciona, Ciona intestinalis; Yeast, Saccharomyces cerevisiae; Fish, Brachydanio rerio; Plafal, Plasmodium falciparum; Aspnidu, Aspergillus nidulans; Pinus, Pinus taeda; Glycine, Glycine max; Schpo, Schizosaccharomyces pombe; Triti, Triticum aestivum. The "est" prefix identifies consensus sequences manually reconstructed by assembling highly similar expressed sequence tags from the same species (conceptual translations). The "unf" prefix identifies sequences obtained from unfinished genomes through the Genome BLAST server at NCBI [24]. Complementary information is accessible at: http://www.pdg.cnb.uam.es/SPOC.
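The column colouring described in this legend can be approximated with the Python sketch below. It assumes the column score is the mean BLOSUM62 score over all pairs of non-gap residues in the column, that non-standard symbols are ignored, and that values exactly on a threshold fall into the lower band; Belvu's exact conventions may differ. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

# Row/column order of the standard BLOSUM62 matrix (20 amino acids).
ORDER = "ARNDCQEGHILKMFPSTWYV"
_ROWS = """
 4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
-1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
-2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
-2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
 0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
-1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
-1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
 0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
-2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
-1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
-1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
-1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
-1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
-2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
 1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
 0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
-2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
 0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""

# Parse the rows into a lookup table keyed by residue pair.
BLOSUM62: Dict[Tuple[str, str], int] = {
    (a, b): int(v)
    for a, row in zip(ORDER, _ROWS.strip().splitlines())
    for b, v in zip(ORDER, row.split())
}


def column_score(column: str) -> Optional[float]:
    """Mean BLOSUM62 score over all residue pairs in one alignment column.
    Gaps and non-standard symbols ('-', '.', 'X', ...) are skipped;
    returns None when fewer than two residues remain."""
    residues = [c for c in column.upper() if c in ORDER]
    pairs = list(combinations(residues, 2))
    if not pairs:
        return None
    return sum(BLOSUM62[p] for p in pairs) / len(pairs)


def column_colour(column: str) -> str:
    """Bin a column score using the thresholds given in the legend."""
    score = column_score(column)
    if score is None or score <= 0.5:
        return "none"
    if score > 3:
        return "cyan"
    if score > 1.5:
        return "light red"
    return "light green"


for col in ["RRRR", "ILV", "ST", "AW"]:
    print(col, column_colour(col))
# RRRR cyan / ILV light red / ST light green / AW none
```

A fully conserved arginine column scores 5 (the R/R diagonal entry), while a mixed hydrophobic column such as I, L, V still scores 2 because BLOSUM62 rewards conservative substitutions; this is why the colouring reflects conservation of properties rather than strict identity.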

Figure 2

Schematic representation of the domain architecture and common features in representative members of the SPOC domain-containing families. Sequences are named with their Swissprot or SpTrEMBL identifiers and, where necessary, with their common species name. Slashes represent inserts that are not shown. Proteins are drawn approximately to scale. The positions of the other domains (PHD, TFIIS, BRK (BRM and KIS domain) and RRM) were assigned according to the Pfam and SMART domain databases [35,36].

Secondary structure predictions were performed for the SPOC domain of the DIO family. These predictions agreed well with the crystal structure of the SPOC domain of SHARP [9] (Figure 1).
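Agreement between a prediction and a crystal structure is commonly summarised with the Q3 score, the percentage of residues whose three-state assignment (helix, strand, other) matches. The text does not state how agreement was quantified, so this Python sketch is only an illustration; it assumes both strings are already aligned residue by residue, and the strings in the usage line are hypothetical toy data, not the DIO or SHARP assignments.

```python
def _three_state(symbol: str) -> str:
    """Collapse a secondary-structure symbol to H (helix), E (strand) or '-' (other)."""
    return symbol if symbol in ("H", "E") else "-"


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted three-state assignment
    matches the observed one (Q3 score)."""
    if len(predicted) != len(observed):
        raise ValueError("strings must cover the same residues")
    if not predicted:
        raise ValueError("empty secondary-structure strings")
    matches = sum(
        _three_state(p) == _three_state(o) for p, o in zip(predicted, observed)
    )
    return 100.0 * matches / len(predicted)


# Hypothetical example: one strand residue is predicted where coil is observed.
print(round(q3("HHHEE--", "HHH-E--"), 1))  # 85.7
```

Collapsing every non-H, non-E symbol to a single "other" state means that different coil or loop notations (for example '-', 'C' or 'L') do not count as disagreements.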

To test whether fold recognition analysis gave consistent results, we submitted the SPOC domain of DIO-1 (Swissprot-ID: DAT1_HUMAN, residues 1093 to 1199) as a query to an independent fold assignment system (see Methods). The template 1OW1 (the SPOC domain of SHARP) was found with a Z-score of -12.2 (estimated error rate <1%) despite the low sequence identity (16%).
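A percent-identity figure such as the 16% quoted above depends on how it is normalised. The Python sketch below assumes identity is counted over aligned columns in which neither sequence has a gap; other conventions (dividing by the length of the shorter sequence, or by the full alignment length) give different numbers, and the fold recognition server may use yet another definition. The function name and example strings are hypothetical.

```python
GAPS = set("-.")


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two aligned sequences, counting only columns
    where neither sequence has a gap (one of several conventions)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x not in GAPS and y not in GAPS
    ]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# Toy example: 4 gap-free columns, 3 of them identical.
print(percent_identity("AC-DE", "ACGDF"))  # 75.0
```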

Considering the E-values of the HMMer searches, the agreement of the secondary structure predictions with the SHARP crystal structure, and the fold assignment results, we are confident that the SPOC domain is present in the DIO family of proteins.

To illustrate the degree of fold conservation, we generated a structural model of the SPOC domain of DIO (Figure 3B). The sequence alignment shown in Figure 1 lacks the C-terminal region of SPOC. This region includes two small helices (named E and F), which are not part of the core and are poorly conserved within the SHARP family, and β strand 7, which is part of the β-barrel core. The high sequence divergence in this region prevented automatic methods from extending the alignment. For modelling purposes, however, the alignment was carefully extended into the C-terminal region: a β strand was detected in DIO, whereas the two helices were absent. The SPOC domain of DIO therefore adopts a fold similar to that of the SPOC domain of SHARP, and the seven-stranded β-barrel core is maintained (Figure 3).

Figure 3

Comparison of SPOC domains from SHARP and DIO proteins. (A) Ribbon representation and electrostatic surface potential map of the structure of the SPOC domain of SHARP (PDB code: 1OW1). (B) Ribbon representation and electrostatic surface potential map of the homology model of the SPOC domain of DIO. Blue indicates positively charged regions, whereas red indicates negatively charged regions. All molecules are shown in the same orientation. Dotted circles indicate the conserved basic cluster, where the conserved arginine-tyrosine pair is located.

Discussion

The SPOC domain of SHARP has been reported to be a protein-protein interaction domain, and its structure contains a basic cluster essential for interaction with the co-repressor SMRT (silencing mediator for retinoid and thyroid receptors) [9]. Arg 3552 of SHARP lies in this basic cluster. Interestingly, this arginine is fully conserved in the SPOC alignment (Figures 1 and 3). This full conservation may be significant: at a protein-protein interaction interface, a conservative substitution (e.g. by lysine, which preserves the positive charge) would be expected to have little effect on the binding capability of the cluster, so charge alone does not explain why arginine is strictly retained. One explanation for this fully conserved arginine could therefore be a specific post-translational modification. Arginine methylation is a common post-translational modification of transcription regulation proteins and is catalysed by type I and type II protein arginine methyltransferases [16]. Arginine methylation modulates transcriptional activity and has recently been linked to a wide range of cellular processes [17].

The nuclear receptor coactivator CBP (CREB-binding protein) illustrates this scheme. CBP is methylated at Arg 600, a residue essential for stabilising the structure of the domain implicated in CREB recruitment, and Arg 600 forms a critical interaction with Tyr 640 [18]. Disruption of this interaction by arginine methylation could lead to conformational changes [19]. A similar interaction is observed on the surface of SPOC between the conserved residues Arg 3552 and Tyr 3602 (Figure 3) [9].

Because arginine is one of the amino acids most frequently found in enzyme active sites [20], an alternative hypothesis is that this conserved arginine forms part of an active site of unknown function in the SPOC domain, or contributes to a specific catalytic function in other proteins, analogous to the "arginine finger" of Ras-GAP proteins [21].

Conclusions

The SPOC domain is present in different domain architectures across all the eukaryotic lineages studied (Figures 1 and 2). This study shows that, with the exponential growth of sequence databases, sequence analysis can shed new light on biological function even when a structure is already available. The presence of this domain in proteins related to cancer and apoptosis underlines its potential importance in transcriptional regulation. Additional experiments with different members of the SPOC domain-containing families are required to test these hypotheses.

Methods

Sequence analyses

We related distant protein families through intermediate sequence searches [22] using global hidden Markov model (HMM) profiles (hmmsearch from the HMMer package, http://hmmer.wustl.edu/) [23]. To improve profile quality we followed two approaches: first, BLAST searches against unfinished genomes [24], and second, additional searches against EST (expressed sequence tag) databases [25]. This sequence enrichment improved the quality of the profiles that were then used to search the non-redundant protein databases. We used NAIL to view and analyse the HMMer results [26]. The alignment was produced with HMMer [23] and T-Coffee [27] using default parameters, refined slightly by hand, and viewed with the Belvu program [28].
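The idea behind intermediate sequence searches can be pictured as reachability in a graph whose nodes are families and whose directed edges are significant profile hits: if profile A detects family B, and B's profile detects C, then A and C are linked through B even if A never detects C directly. The Python sketch below is a simplified illustration under that assumption; it does not run HMMer, and the family names, E-values, cut-off and function name are all hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def linked_families(
    hits: Iterable[Tuple[str, str, float]], start: str, cutoff: float
) -> Set[str]:
    """Families reachable from `start` through chains of significant hits.

    Each hit is (query_profile, target_family, e_value); only hits with
    e_value < cutoff become edges, and every newly reached family is used
    as the next query (breadth-first search)."""
    graph: Dict[str, Set[str]] = {}
    for query, target, e_value in hits:
        if e_value < cutoff:
            graph.setdefault(query, set()).add(target)

    reached = {start}
    queue = deque([start])
    while queue:
        family = queue.popleft()
        for neighbour in graph.get(family, set()):
            if neighbour not in reached:
                reached.add(neighbour)
                queue.append(neighbour)
    return reached


# Hypothetical hit table: A never detects C directly, but B bridges them.
toy_hits = [
    ("A", "B", 1e-6),
    ("B", "C", 1e-3),
    ("C", "D", 0.5),  # not significant at the cut-off used below
]
print(sorted(linked_families(toy_hits, "A", cutoff=0.01)))  # ['A', 'B', 'C']
```

Because every link in a chain must pass the cut-off on its own, raising the cut-off lengthens the chains and increases the risk of drifting into unrelated families; this is one reason to confirm such links with reciprocal searches.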

Structural predictions and modelling

Secondary structure predictions were performed using PHD [29]. Fold recognition analyses were performed using the FFAS server [30] (http://ffas.ljcrf.edu/). The model was built with SWISS-MODEL [31] using the published crystal structure of the SPOC domain of SHARP [9] as the template. The model was evaluated using the PSQS [32] and WHAT IF [33] tools. Illustrations were generated with MOLMOL [34].

Authors' contributions

LSP and AR carried out the sequence and structural analysis of the domain.

KVW and CMA provided the initial input for the research.

LSP, AR, KVW, CMA and AV authored the manuscript.