






circtools: a one-stop software solution for circular RNA research


Circular RNAs (circRNAs) originate through back-splicing events from linear primary transcripts, are resistant to exonucleases, are typically not polyadenylated, and have been shown to be highly specific for cell type and developmental stage. Although a few circular RNA molecules have been shown to act as miRNA sponges, the function of the vast majority of circRNAs is yet to be determined.

The prediction of circular RNAs is a multi-stage bioinformatics process that starts with raw sequencing data and usually ends with a list of circRNA candidates which, depending on tissue and condition, may contain hundreds to thousands of entries. While a number of tools already exist for the prediction process (e.g. DCC and CircTest), publicly available downstream analysis tools are rare.

We developed circtools, a modular, Python 3-based framework for circRNA-related tools that unifies several functionalities in a single command-line-driven software package. The command line follows the "circtools subcommand" pattern that is also employed by samtools and bedtools. Currently, circtools includes modules for detecting and reconstructing circRNAs, a quick check of circRNA mapping results, RBP enrichment screenings, circRNA primer design, statistical testing, and an exon usage module.
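The subcommand pattern mentioned above can be illustrated with a minimal sketch based on Python's standard argparse module. This is not the circtools implementation: the program name "toolkit", the subcommand names "detect" and "primer", and all of their arguments are hypothetical and serve only to show how one entry point can dispatch to several modules.

```python
import argparse


def run_detect(args):
    # Hypothetical handler; a real module would start the detection step here.
    return f"detect on {args.input}"


def run_primer(args):
    # Hypothetical handler for a primer design module.
    return f"primer design for {args.candidates}, up to {args.max_pairs} pairs"


def build_parser():
    # "toolkit" is a placeholder program name, not a real tool.
    parser = argparse.ArgumentParser(prog="toolkit")
    subparsers = parser.add_subparsers(dest="command", required=True)

    detect = subparsers.add_parser("detect", help="detect circRNA candidates")
    detect.add_argument("input", help="mapping output to scan")
    detect.set_defaults(handler=run_detect)

    primer = subparsers.add_parser("primer", help="design primers")
    primer.add_argument("candidates", help="list of circRNA candidates")
    primer.add_argument("--max-pairs", type=int, default=5)
    primer.set_defaults(handler=run_primer)
    return parser


def main(argv=None):
    # Parse the command line and hand over to the selected module.
    args = build_parser().parse_args(argv)
    return args.handler(args)
```

Calling main(["detect", "reads.junction"]) selects the first handler, while main(["primer", "c.bed", "--max-pairs", "2"]) selects the second. Each module keeps its own options, yet users only need to remember a single program name.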



Cyntenator is a software tool for identifying conserved syntenic blocks between multiple genomes. The program computes Smith-Waterman alignments of sequences, whereby the alphabet consists of all annotated genes and the scoring system is defined by protein sequence similarities and distances between species in a phylogenetic tree. The algorithm is an extension of the Syntenator partial order aligner, described in Rödelsperger and Dieterich, 2008.
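To make the idea of aligning gene orders concrete, the following sketch runs a plain pairwise Smith-Waterman local alignment in which each "letter" is a gene identifier. It is a simplified illustration, not Cyntenator's algorithm: the similarity function, the gap penalty and the gene names are assumptions, and the phylogenetic weighting and the extension to multiple genomes are left out.

```python
def smith_waterman(genes_a, genes_b, similarity, gap=-1.0):
    """Return (best_score, aligned_gene_pairs) of the best local alignment."""
    rows, cols = len(genes_a) + 1, len(genes_b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    best, best_pos = 0.0, (0, 0)

    # Fill the dynamic programming matrix; scores never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + similarity(genes_a[i - 1], genes_b[j - 1])
            score[i][j] = max(0.0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + similarity(genes_a[i - 1], genes_b[j - 1])
        if score[i][j] == diag:
            pairs.append((genes_a[i - 1], genes_b[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gene in genome A without a partner
        else:
            j -= 1  # gene in genome B without a partner
    pairs.reverse()
    return best, pairs


# Hypothetical homology table standing in for protein sequence similarity.
HOMOLOGS = {("a1", "b1"), ("a2", "b2"), ("a4", "b4")}


def toy_similarity(gene_a, gene_b):
    return 2.0 if (gene_a, gene_b) in HOMOLOGS else -1.0


block_score, block = smith_waterman(
    ["a1", "a2", "a3", "a4"], ["b9", "b1", "b2", "b4"], toy_similarity
)
```

In this toy example the gene a3 has no counterpart in the second genome, so the alignment bridges it with a gap and still reports a1, a2 and a4 as one conserved block.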


The source code of Cyntenator is available from https://github.com/dieterich-lab/cyntenator

DCC and CircTest


Circular RNAs (circRNAs) are a poorly characterized class of molecules that were identified decades ago. Emerging high-throughput sequencing methods as well as first reports on confirmed functions have sparked new interest in this RNA species. However, computational tools for their detection and quantification are still limited.


We developed a software tandem, DCC and CircTest. DCC uses output from the STAR read mapper to systematically detect back-splice junctions in next-generation sequencing data. DCC applies a series of filters and integrates data across replicate sets to arrive at a precise list of circRNA candidates. We assessed the detection performance of DCC on a newly generated mouse brain data set and publicly available sequencing data. Our software achieves a much higher precision than state-of-the-art competitors at similar sensitivity levels. Moreover, DCC estimates circRNA versus host gene expression by counting junction and non-junction reads. These read counts are finally used by our R package CircTest to test for host gene-independence of circRNA expression across different experimental conditions. We demonstrate the benefits of this approach on previously reported age-dependent circRNAs in the fruit fly.
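The two ideas described above, integrating evidence across replicates and relating junction reads to non-junction reads, can be sketched as follows. This is only an illustration under stated assumptions: the thresholds, the candidate identifiers and the function names are hypothetical, and neither DCC's actual filters nor the statistical model of CircTest is reproduced here.

```python
def filter_candidates(junction_counts, min_reads=2, min_replicates=2):
    """Keep candidates with at least min_reads junction reads
    in at least min_replicates replicates (assumed thresholds)."""
    kept = {}
    for circ_id, counts in junction_counts.items():
        supported = sum(1 for count in counts if count >= min_reads)
        if supported >= min_replicates:
            kept[circ_id] = counts
    return kept


def circular_ratio(junction_reads, linear_reads):
    """Fraction of reads at a junction that support the circular form."""
    total = junction_reads + linear_reads
    return junction_reads / total if total else 0.0


# Hypothetical junction read counts per candidate across three replicates.
counts = {
    "chr1:100-500": [5, 3, 0],
    "chr2:40-90": [1, 0, 0],
}
candidates = filter_candidates(counts)
ratio = circular_ratio(junction_reads=5, linear_reads=15)
```

Only the first candidate is supported in two replicates and therefore kept. A ratio of this kind, followed across conditions, is the type of quantity on which a test for host gene-independence could operate.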


The source code of DCC and CircTest is available from https://github.com/dieterich-lab


The program Flexbar preprocesses high-throughput sequencing data efficiently. It demultiplexes barcoded runs and removes adapter sequences. Trimming and filtering features are provided as well. Flexbar increases read mapping rates and improves genome as well as transcriptome assemblies. It supports next-generation sequencing data in FASTA and FASTQ format, e.g. from the Roche 454 and Illumina platforms.
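Adapter removal, one of the steps named above, can be illustrated with a deliberately simple sketch. It scans a read for the leftmost position at which the remainder of the read matches the start of the adapter and cuts the read there. This is not how Flexbar is implemented: the function name, its parameters and the example sequences are assumptions chosen for illustration.

```python
def trim_adapter(read, adapter, min_overlap=3, max_mismatches=0):
    """Remove a 3' adapter, or a prefix of it at the read end, from a read."""
    for start in range(len(read) - min_overlap + 1):
        # Compare the read tail with as much of the adapter as fits.
        overlap = min(len(adapter), len(read) - start)
        window = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter[:overlap]) if a != b)
        if mismatches <= max_mismatches:
            return read[:start]
    # No acceptable adapter match: return the read unchanged.
    return read


ADAPTER = "AGATCGGAAGAGC"  # example sequence, used here only for illustration

partial = trim_adapter("ACGTACGTAGATCGG", ADAPTER)        # adapter cut off by read end
complete = trim_adapter("ACGTAGATCGGAAGAGCTTTT", ADAPTER)  # full adapter inside read
untouched = trim_adapter("ACGTACGT", ADAPTER)              # no adapter present
```

The min_overlap parameter guards against removing short chance matches at the read end, and max_mismatches allows for sequencing errors within the adapter.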


The source code of Flexbar is available from https://github.com/seqan/flexbar.



FUCHS is a Python pipeline designed to fully characterize circular RNAs. As input, it takes a list of circular RNAs together with the reads spanning each back-splice junction, as well as a BAM file containing the mapping of all reads (alternatively, of all chimeric reads).

FUCHS extracts the reads from each circle and saves them in an individual BAM file. Based on these BAM files, FUCHS detects alternative splicing within the same circle boundaries, summarizes different circular isoforms from the same host gene, and generates coverage plots for each circle. It also clusters circles based on their coverage profiles. These results can be used to identify potential false-positive circles.


The source code of FUCHS is licensed under the GNU General Public Licence (GPL) version 3 and available from https://github.com/FUCHS.


RNA editing is a co-transcriptional modification that changes the sequence of transcripts and thereby increases molecular diversity and alters secondary structure and protein-coding sequences. The most common RNA editing modification is the single base substitution (A → I) that is catalyzed by members of the adenosine deaminases acting on RNA (ADAR) family. Typically, editing sites are identified as RNA-DNA differences (RDDs) in a comparison of genome and transcriptome data from next-generation sequencing experiments. However, a method for robust detection of site-specific editing events from replicate RNA-seq data has not been published so far. Even more surprisingly, condition-specific editing events, which would show up as differences in RNA-RNA comparisons (RRDs) and depend on particular cellular states, are rarely discussed in the literature.

Our software JACUSA detects single nucleotide variants by comparing data from next-generation sequencing experiments (RNA-DNA or RNA-RNA). In practice, JACUSA shows higher recall and comparable precision in detecting A → I sites from RNA-DNA comparisons, while showing higher precision and recall in RNA-RNA comparisons.

The source code of JACUSA is licensed under the GNU General Public Licence (GPL) version 3 and available from https://github.com/JACUSA.

Coming soon: pulseR

RNA metabolic rates can be inferred from a pulse-chase experimental design. In this approach, labelled molecules are incorporated into the nascent RNA, which can later be isolated from the total RNA pool. These fractions are sequenced using RNA-seq. We developed the R package pulseR to handle count data derived from such pulse-chase RNA-seq experiments.


The source code of pulseR is licensed under the GNU General Public Licence (GPL) version 3 and available from dieterich-lab.github.io/pulseR


Ribosome profiling via high-throughput sequencing (ribo-seq) is a promising new technique for characterizing the occupancy of ribosomes on messenger RNA (mRNA) at base-pair resolution. The ribosome is responsible for translating mRNA into proteins, so information about its occupancy offers a detailed view of ribosome density and position, which can be used, among other things, to discover new translated open reading frames (ORFs). In this work, we propose RP-BP, an unsupervised Bayesian approach to predict translated ORFs from ribosome profiles. We use state-of-the-art Markov chain Monte Carlo (MCMC) techniques to estimate posterior distributions of the likelihood of translation of each ORF. Hence, an important feature of RP-BP is its ability to incorporate and propagate uncertainty in the prediction process. A second novel contribution is the automatic Bayesian selection of read lengths and ribosome P-site offsets (BPPS). We empirically demonstrate that our read length selection technique modestly improves sensitivity by identifying more canonical and non-canonical ORFs. Proteomics- and QTI-seq-based validation verifies the high quality of all of the predictions. Experimental comparison shows that RP-BP results in more peptide identifications and proteomics-validated ORF predictions compared to another recent tool for translation prediction.


The source code of rp-bp is available from https://github.com/rp-bp.

Please find a complete list of our software projects on GitHub: https://github.com/dieterich-lab/