
Klaus Tschira Institute for Computational Cardiology

Bioinformatics & Systems Cardiology

The Klaus Tschira Institute for Computational Cardiology is active in three thematic areas. The first is RNA maturation and processing: the development and physiology of the heart require strict control of RNA biology, and our laboratory has published numerous software solutions for investigating the complex RNA world. Second, we have established the field of systems cardiology for in vitro and in vivo models of heart failure. Third, we build a bridge to clinical data science through the HiGHmed Consortium, which is part of the Medical Informatics Initiative. Our AI work on unstructured German-language texts from cardiological settings deserves particular mention here.

The Klaus Tschira Institute for Computational Cardiology was founded in September 2015 with the support of the Klaus Tschira Foundation and is directed by Prof. Dr. Christoph Dieterich. In bioinformatics, we study how genetic information is processed on its way from DNA to protein. This has often been seen as a straightforward process in which RNA is merely an intermediate product. However, this picture does not do justice to the role of RNA, which is an interactive and dynamic information carrier that fulfils a variety of functions. The stability and translational efficiency of an RNA are controlled by its secondary structure and by its interactions with RNA-binding proteins and non-coding RNAs such as microRNAs and lncRNAs. Co- and post-transcriptional processes, such as RNA modifications, can also alter RNA molecules at the level of individual bases and thus influence the final protein sequence even after transcription. With the rediscovery of circular RNAs (circRNAs), another, still largely unexplored group of RNA molecules has joined the realm of non-coding RNAs. The interplay of all these components in a large interaction network is known as post-transcriptional gene regulation and controls numerous processes in our cells. Our work typically starts from specific questions or observations in RNA biomedicine.

One such question might be: "Heart muscle cells grow both in response to fitness training and in response to pathological influences such as high blood pressure. But why do the long-term effects differ so markedly at the molecular and medical level?"

 

As a rule, we develop hypotheses together with our experimental partners and then test them using established bioinformatics and statistical methods as well as software and workflows developed in-house. Newly developed software tools are released to the scientific community as open source and are continuously developed further.

For example, the research group has developed software that detects modified RNA bases in sequencing data (Piechotta et al., 2017). Other specialised tools are Baltica for RNA splicing and circtools for circular RNAs (Jakobi et al., 2019). circtools covers the entire workflow, from quality analysis of the raw data through detection and reconstruction of circular RNAs to the design of primer sequences for validation experiments.

The stability of RNA is a critical factor for many regulatory functions. In many cases, the availability of RNA templates is rapidly adjusted through decay or synthesis, depending on the context. With PulseR and accompanying theoretical work, the research group has developed a tool for analysing the kinetics of RNA metabolism from RNA sequencing data (Uvarovskii et al., 2019).
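To make the idea of RNA metabolic kinetics concrete, here is a minimal sketch of the textbook first-order model often used for such data. It is not PulseR's API or statistical model. It assumes a metabolic pulse-labelling experiment at steady state (an assumption, since the experimental design is not described here), in which newly made RNA is labelled for a time t and the labelled fraction of a transcript approaches 1 - exp(-d * t) for a degradation rate d. All function names are hypothetical.

```python
import math


def degradation_rate(labelled_fraction, labelling_time):
    """Estimate a first-order degradation rate d from the labelled fraction.

    Assumes steady state, so that the labelled fraction after time t is
    1 - exp(-d * t). Inverting this gives d = -ln(1 - fraction) / t.
    """
    if not 0.0 < labelled_fraction < 1.0:
        raise ValueError("labelled_fraction must lie strictly between 0 and 1")
    if labelling_time <= 0.0:
        raise ValueError("labelling_time must be positive")
    return -math.log(1.0 - labelled_fraction) / labelling_time


def half_life(rate):
    """Convert a first-order degradation rate into a half-life (ln 2 / d)."""
    return math.log(2.0) / rate


def synthesis_rate(total_abundance, rate):
    """At steady state, synthesis balances decay: s = d * total abundance."""
    return rate * total_abundance


if __name__ == "__main__":
    # Hypothetical transcript: half of its molecules are labelled after 2 hours.
    d = degradation_rate(labelled_fraction=0.5, labelling_time=2.0)
    print(f"degradation rate: {d:.4f} per hour")
    print(f"half-life: {half_life(d):.2f} hours")
```

In this toy case the half-life equals the labelling time, because exactly half of the pool has turned over. Real analyses additionally have to model sequencing count noise and imperfect separation of labelled and unlabelled RNA, which this sketch ignores.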

In other cases, however, it is important to know which RNAs are actually translated into proteins and how the level of translation compares with the level of transcription. Ribosome profiling by high-throughput sequencing (Ribo-seq) is a promising technique for characterising the distribution of ribosomes on RNA at single-nucleotide resolution. Because the ribosome translates mRNA into protein, its occupancy provides a detailed view of ribosome density and position, which can be used, among other things, to discover new translated open reading frames (ORFs). A Bayesian approach to predicting ORFs from ribosome profiles has been implemented in the software Rp-Bp (Malone et al., 2017).
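One signal commonly used to recognise translated ORFs in Ribo-seq data is three-nucleotide periodicity: ribosomes advance one codon at a time, so footprint positions accumulate in one reading frame. The sketch below only counts positions per frame for a candidate ORF; it is not the Bayesian model implemented in Rp-Bp. The function names, the coordinate convention (0-based, end-exclusive) and the input positions are illustrative assumptions.

```python
def frame_counts(p_site_positions, orf_start, orf_end):
    """Count footprint positions per reading frame within [orf_start, orf_end).

    Frame 0 is the frame of the candidate ORF's first nucleotide.
    Positions outside the ORF are ignored.
    """
    counts = [0, 0, 0]
    for pos in p_site_positions:
        if orf_start <= pos < orf_end:
            counts[(pos - orf_start) % 3] += 1
    return counts


def in_frame_fraction(counts):
    """Fraction of counted positions that fall in frame 0 (0.0 if none)."""
    total = sum(counts)
    return counts[0] / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical inferred P-site positions on a transcript.
    positions = [50, 100, 103, 106, 107, 109]
    counts = frame_counts(positions, orf_start=100, orf_end=112)
    print(counts)                     # [4, 1, 0]
    print(in_frame_fraction(counts))  # 0.8
```

A fraction well above one third suggests periodic, in-frame signal, but any decision threshold would have to be chosen per data set, and a probabilistic model is better suited to weighing this evidence against noise.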

 

Quantitative systems cardiology generates immense amounts of data that can no longer be handled on ordinary workstation computers. The research group therefore maintains its own high-performance computing cluster, which can analyse even extensive experimental data sets in a short time. The cluster currently consists of 26 dedicated compute nodes with up to one terabyte of main memory, which is needed, for example, for genome assembly or the parallel analysis of large omics data sets.

In addition, the cluster includes a dedicated server equipped with NVIDIA GPUs (graphics processing units). This hardware is derived from 3D graphics cards for computer games, which have become increasingly powerful in recent years, and its highly parallel architecture makes it well suited to machine learning and artificial intelligence (AI) tasks. The system is used for a variety of tasks, ranging from extracting the base sequence from raw sequencing data and text mining in medical documents to the analysis of patient genomes from molecular genetic data.

Open source software tools for the scientific community

In clinical practice, large amounts of data from a wide variety of areas are generated routinely. Our software, the Medical Data Explorer (MedEx) (Kindermann et al., 2019), is an intuitive, web-based solution with options for easy data import. It combines a modern, dynamic web interface with an in-memory database for near-real-time responsiveness. MedEx offers various visualisation options that give a simple overview of the loaded data and support hypothesis generation and elementary analyses.

In medicine, much treatment-relevant information is still recorded as unstructured German text. A typical example is the doctor's letter, which serves as a transfer document for communication between doctors. Our project MIEdeep (Medical Information Extraction using deep learning) aims to make this data source usable for information extraction. To this end, it applies approaches from deep neural networks (deep learning) and natural language processing (NLP). We combine machine learning approaches for data preparation, training data creation and information extraction with a modern graphical user interface suitable for use in a clinical environment.

Group photo. From left to right: Tami Liebfried, Etienne Boileau, Isabel Naarmann-de Vries, Magdalena Smieszek, Christoph Dieterich, Qi Wang, Phillip Richter-Pechanski, Thiago Britto Borges, Aljoscha Kindermann, Tobias Jakobi (Photo: Tobias Jakobi)
From left to right: Maja Bencun, Aljoscha Kindermann, Thiago Britto Borges, Isabel Naarmann-de Vries, Etienne Boileau, Tami Liebfried, Jessica Eschenbach, Magdalena Smieszek, Christoph Dieterich, Qi Wang, Phillip Richter-Pechanski, Qi Wang, Tobias Jakobi (Photo: Tobias Jakobi)