DNA methylome profiling of human tissues identifies global and tissue-specific methylation patterns
g:Profiler (https://biit.cs.ut.ee/gprofiler) is a widely used gene list functional profiling and namespace conversion toolset that has contributed to reproducible biological data analysis since 2007. Here we introduce the accompanying R package, gprofiler2, developed to facilitate programmatic access to g:Profiler computations and databases via a REST API. The gprofiler2 package provides easy-to-use functionality that enables researchers to incorporate functional enrichment analysis into automated analysis pipelines written in R. The package also implements interactive visualisation methods that help to interpret the enrichment results and to illustrate them for publications. In addition, gprofiler2 gives access to the versatile gene/protein identifier conversion functionality in g:Profiler, which maps between hundreds of different identifier types or orthologous species. The gprofiler2 package is freely available from the CRAN repository.
There are many situations in which information has a hierarchical or nested structure like that found in family trees or organization charts. The abstraction that models hierarchical structure is called a tree, and this data model is among the most fundamental in computer science. It is the model that underlies several programming languages, including Lisp. Trees of various types appear in many of the chapters of this book. For instance, in Section 1.3 we saw how directories and files in some computer systems are organized into a tree structure. In Section 2.8 we used trees to show how lists are split recursively and then recombined in the merge sort algorithm. In Section 3.7 we used trees to illustrate how simple statements in a program can be combined to form progressively more complex statements. The following themes form the major topics of this chapter: The terms and concepts related to trees (Section 5.2). The basic data structures used to represent trees in programs (Sect...
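As a minimal sketch of the tree abstraction described in this excerpt, the Python below models a small directory hierarchy. The class name TreeNode, the helper functions and the directory names are all hypothetical and chosen only for illustration; the book itself may represent trees quite differently.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class TreeNode:
    """A node holding a label and an ordered list of child nodes."""
    label: str
    children: List["TreeNode"] = field(default_factory=list)

    def add(self, child: "TreeNode") -> "TreeNode":
        """Attach a child and return it, so that subtrees can be built step by step."""
        self.children.append(child)
        return child


def preorder(node: TreeNode, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, label) pairs, visiting each node before its children."""
    yield depth, node.label
    for child in node.children:
        yield from preorder(child, depth + 1)


def height(node: TreeNode) -> int:
    """Number of edges on the longest path from this node down to a leaf."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)


# Hypothetical directory layout, used only for illustration.
root = TreeNode("/")
home = root.add(TreeNode("home"))
home.add(TreeNode("notes.txt"))
root.add(TreeNode("etc"))

# Print the hierarchy with two spaces of indentation per level.
for depth, label in preorder(root):
    print("  " * depth + label)
```

Storing the children in an ordered list keeps the sketch short; other representations trade this simplicity for different access patterns.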
AIM: To develop a web tool for survival analysis based on CpG methylation patterns. MATERIALS & METHODS: We utilized methylome data from 'The Cancer Genome Atlas' and used the Cox proportional-hazards model to develop an interactive web interface for survival analysis. RESULTS: MethSurv enables survival analysis for a CpG located in or near a query gene. For further mining, it provides cluster analysis for a query gene, which associates methylation patterns with clinical characteristics, and browsing of top biomarkers for each cancer type. MethSurv includes 7358 methylomes from 25 different human cancers. CONCLUSION: The MethSurv tool is a valuable platform that lets researchers without programming skills perform an initial assessment of methylation-based cancer biomarkers.
We performed a systematic analysis of gene upstream regions in the yeast genome for occurrences of regular expression-type patterns with the goal of identifying potential regulatory elements. To achieve this goal, we have developed a new sequence pattern discovery algorithm that searches exhaustively for a priori unknown regular expression-type patterns that are over-represented in a given set of sequences. We applied the algorithm in two cases: (1) discovery of patterns in the complete set of >6000 sequences taken upstream of the putative yeast genes and (2) discovery of patterns in the regions upstream of the genes with similar expression profiles. In the first case, we looked for patterns that occur more frequently in the gene upstream regions than in the genome overall. In the second case, first we clustered the upstream regions of all the genes by similarity of their expression profiles on the basis of publicly available gene expression data and then looked for sequence patt...
High titer autoantibodies produced by B lymphocytes are clinically important features of many common autoimmune diseases. APECED patients with a deficient autoimmune regulator (AIRE) gene collectively display a broad repertoire of high titer autoantibodies, including some which are pathognomonic for major autoimmune diseases. AIRE deficiency severely reduces thymic expression of gene products ordinarily restricted to discrete peripheral tissues, and developing T cells reactive to those gene products are not inactivated during their development. However, the extent of the autoantibody repertoire in APECED and its relation to thymic expression of self-antigens are unclear. Here we undertook a broad protein array approach to assess the autoantibody repertoire in APECED patients. Our results show that in addition to shared autoantigen reactivities, APECED patients display high inter-individual variation in their autoantigen profiles, which collectively are enriched in evolutionarily conserved...
The inner uterine lining (endometrium) is a unique tissue that goes through remarkable changes each menstrual cycle. The endometrium has a characteristic DNA methylation profile, although not much is known about how the endometrial methylome changes throughout the menstrual cycle. The impact of methylome changes on gene expression, and thereby on the function of the tissue, including establishing receptivity to an implanting embryo, is also unclear. Therefore, this study used genome-wide technologies to characterize the methylome and the correlation between DNA methylation and gene expression in endometrial biopsies collected from 17 healthy fertile-aged women in the pre-receptive and receptive phases within one menstrual cycle. Our study showed that the overall methylome remains relatively stable during this stage of the menstrual cycle, with small-scale changes affecting 5% of the studied CpG sites (22,272 of the 437,022 studied CpGs, FDR < 0.05). Of differentially methylated CpG sites with the...
Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB), Feb 1, 2000
We have developed a set of methods and tools for automatic discovery of putative regulatory signals in genome sequences. The analysis pipeline consists of gene expression data clustering, sequence pattern discovery from upstream sequences of genes, a control experiment for detecting the pattern significance threshold, selection of interesting patterns, grouping of these patterns, representing the pattern groups in a concise form and evaluating the discovered putative signals against existing databases of regulatory signals. Pattern discovery is the most crucial and computationally most expensive step. Our tool performs a rapid exhaustive search for a priori unknown statistically significant sequence patterns of unrestricted length. The statistical significance is determined for a set of sequences in each cluster with respect to a set of background sequences, allowing the detection of subtle regulatory signals specific to each cluster. The potentially large number of significant patterns is reduced to a small number of groups by clustering them by mutual similarity. Automatically derived consensus patterns of these groups represent the results in a form that a human investigator can readily comprehend. We have performed a systematic analysis for the yeast Saccharomyces cerevisiae. We created a large number of independent clusterings of expression data, simultaneously assessing the "goodness" of each cluster. For each of the over 52,000 clusters acquired in this way we discovered significant patterns in the upstream sequences of the respective genes. We selected nearly 1,500 significant patterns by formal criteria and matched them against the experimentally mapped transcription factor binding sites in the SCPD database. We clustered the 1,500 patterns into 62 groups, for which we automatically derived alignments and consensus patterns. Of these 62 groups, 48 had patterns with matching sites in the SCPD database.
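The pattern discovery step can be illustrated with a deliberately simplified Python sketch. It rests on several assumptions that the abstract does not state: patterns are plain substrings of bounded length rather than regular expression-type patterns of unrestricted length, significance is a binomial tail probability, and the background set includes the cluster itself. The function names and the toy sequences are hypothetical.

```python
from collections import Counter
from math import comb
from typing import List, Set, Tuple


def substrings(seq: str, max_len: int) -> Set[str]:
    """All distinct substrings of seq with length 1..max_len."""
    return {seq[i:i + k]
            for k in range(1, max_len + 1)
            for i in range(len(seq) - k + 1)}


def sequence_counts(seqs: List[str], max_len: int) -> Counter:
    """For each substring, the number of sequences that contain it at least once."""
    counts: Counter = Counter()
    for seq in seqs:
        counts.update(substrings(seq, max_len))
    return counts


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def score_patterns(cluster: List[str], background: List[str],
                   max_len: int = 6) -> List[Tuple[str, int, float]]:
    """Rank every substring seen in the cluster by how surprising its frequency is.

    Assumption: background contains the cluster sequences, so every pattern
    found in the cluster has a non-zero background frequency.
    """
    n = len(cluster)
    in_cluster = sequence_counts(cluster, max_len)
    in_background = sequence_counts(background, max_len)
    scored = []
    for pattern, k in in_cluster.items():
        p = in_background[pattern] / len(background)
        scored.append((pattern, k, binomial_tail(k, n, p)))
    # Smallest tail probability first; ties are broken alphabetically.
    return sorted(scored, key=lambda item: (item[2], item[0]))


# Toy data, invented for illustration only.
cluster = ["ACGTGA", "TTCGTG", "CGTGCC"]
background = cluster + ["AAAAAA", "TTTTTT", "GGGGGG",
                        "CCCCCC", "ATATAT", "GAGAGA"]

results = score_patterns(cluster, background, max_len=4)
for pattern, k, p_value in results[:6]:
    print(f"{pattern:5s} in {k}/{len(cluster)} cluster sequences, tail = {p_value:.4f}")
```

Enumerating every substring is feasible only for short sequences and small length bounds; a rapid exhaustive search over long patterns would need a more compact index than this sketch uses.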
Proceedings of the 7th International Symposium on Algorithms and Computation, 1996
Alvis Brazma (1), Esko Ukkonen (2), Jaak Vilo (2). (1) Institute of Mathematics and Computer Science, University of Latvia, 29 Rainis Bulevard, LV-1459 Riga, Latvia, abra@cclu.lv. (2) Department of Computer Science, University of Helsinki, P.O. Box 26, FIN-00014 University of Helsinki, Finland, ukkonen,vilo@cs.helsinki.fi. Abstract. The problem of learning unions of certain pattern languages from positive examples is considered. We restrict to the regular patterns, i.e., patterns where each variable...
The papers in the series are intended for internal use and are distributed by the author. Copies may be ordered from the library of the Department of Computer Science. Abstract. We consider the problem of learning unions of pattern languages from positive examples. We consider three different classes of patterns: regular patterns, substring patterns and the so-called PROSITE patterns. By regular patterns we understand patterns where each variable symbol can appear only once. By substring patterns we understand a subclass of regular patterns of the type xαy, where x and y are variables and α is a string of constant symbols. PROSITE patterns are a class of patterns used for classification of bio-sequences in the PROSITE database. We present an algorithm which, given a set of sequences, finds a 'good' collection of patterns 'covering' this set. The notion of a 'good covering' is defined as the most probable collection of patterns likely to produce the examples in some simple and natural probabilistic model. We show that this criterion is equivalent to the so-called Minimum Description Length (MDL) principle. We present a polynomial-time algorithm for approximating the optimal cover within a logarithmic factor and prove its performance guarantees. In the case of substring patterns the running time of the algorithm is almost linear.
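The abstract does not spell out the approximation algorithm, so the Python sketch below uses the textbook greedy heuristic for weighted set cover, which is known to stay within a logarithmic factor of the optimum. Whether it matches the authors' algorithm is an assumption. The cost function (pattern length plus one) is a hypothetical stand-in for a description length, and the sequences and candidate patterns are invented.

```python
from typing import Dict, Iterable, List, Set


def greedy_cover(universe: Iterable[int],
                 candidates: Dict[str, Set[int]],
                 cost: Dict[str, float]) -> List[str]:
    """Pick patterns greedily by lowest cost per newly covered example."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        best = min(
            (name for name, covered in candidates.items() if covered & uncovered),
            key=lambda name: (cost[name] / len(candidates[name] & uncovered), name),
            default=None,
        )
        if best is None:
            raise ValueError("the candidate patterns cannot cover every example")
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


# Toy positive examples and substring patterns, invented for illustration.
sequences = ["abcab", "xabcx", "cabyy", "zzyyz", "yyabc"]
patterns = ["abc", "cab", "yy", "ab"]

# A substring pattern covers every sequence that contains its constant part.
candidates = {p: {i for i, s in enumerate(sequences) if p in s} for p in patterns}

# Hypothetical cost: longer patterns are more expensive to describe.
cost = {p: len(p) + 1.0 for p in patterns}

chosen = greedy_cover(range(len(sequences)), candidates, cost)
print(chosen)
```

Each round rescans all candidates, which is fine for a toy example but would not give the almost linear running time the abstract reports for substring patterns.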
... (Extended Abstract) Alvis Brazma ... [3] A. Goffeau, BG Barrell, H. Bussey, RW Davis, B. Dujon, H. Feldmann, F. Galibert, JD Hoheisel, C. Jacq, M. Johnston, EJ Louis, HW Mewes, Y. Murakami, P. Philippsen, H. Tettelin, and SG Oliver. Life with 6000 genes. ...
Papers by Jaak Vilo