RUbioSeq is a suite of automated and parallelized workflows for the analysis of:
- Single Nucleotide Variants (SNVs)
- Copy Number Variations (CNVs)
- ChIP-seq experiments (ChIP-Seq)
- Bisulfite-seq experiments (BS-Seq)
As RUbioSeq depends on more than 20 different software packages, some of them difficult to install and set up, a customized 64-bit LiveDVD (based on the Ubuntu 14.04 Desktop LiveCD) has been created. It bundles RUbioSeq 3.7 plus all its dependencies, ready to be used on any computer. You can also install the contents of this customized Ubuntu on any computer, so that RUbioSeq and Ubuntu are installed in one step.
The GUI supports two types of profiles: a) a basic user profile, in which users with limited bioinformatics skills can execute all the NGS analysis tasks provided by the software, and b) an administration mode, in which bioinformaticians and advanced users can manage and configure all the technical parameters of the application. Full documentation is available at http://rubioseq.bioinfo.cnio.es
Quality Control
Different types of quality control analyses are performed during execution. Some are enabled through the corresponding parameter in the experiment configuration file; others are performed automatically.
FastQC is available in all workflows and is enabled by the user. It aims to provide a simple way to run quality control checks on raw sequence data coming from high-throughput sequencing pipelines. It provides a modular set of analyses that give a quick impression of whether your data has any problems you should be aware of before doing any further analysis.
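As an illustration of this kind of check outside RUbioSeq, the sketch below assembles and optionally runs a FastQC command from Python. It assumes a `fastqc` executable on the PATH. The function names and the file names in the usage note are hypothetical, and this is not necessarily how RUbioSeq invokes FastQC internally.

```python
import shutil
import subprocess
from pathlib import Path


def build_fastqc_command(fastq_files, out_dir, threads=1):
    """Return the argument list for a FastQC run (nothing is executed here)."""
    cmd = ["fastqc", "--outdir", str(out_dir), "--threads", str(threads)]
    cmd.extend(str(f) for f in fastq_files)
    return cmd


def run_fastqc(fastq_files, out_dir, threads=1):
    """Run FastQC on the given files; fail early if the tool is not installed."""
    if shutil.which("fastqc") is None:
        raise FileNotFoundError("fastqc executable not found on PATH")
    # FastQC expects the output directory to exist already.
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_fastqc_command(fastq_files, out_dir, threads), check=True)
```

For example, `run_fastqc(["sample_R1.fastq.gz", "sample_R2.fastq.gz"], "qc_reports", threads=2)` would write one report per input file into `qc_reports`.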
BAM file validation
Available in the SNV and CNV detection workflows and performed automatically. This check reads a SAM or BAM file and reports on its validity. It is carried out by the ValidateSamFile tool (Picard tools).
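To reproduce this validation by hand, a call to Picard's ValidateSamFile might be assembled as in the sketch below. It assumes a single `picard.jar` whose location you supply (some older Picard releases shipped one jar per tool instead). The function name and default path are hypothetical, and RUbioSeq's own invocation may differ.

```python
def build_validate_command(bam_path, picard_jar="picard.jar", mode="SUMMARY"):
    """Return the argument list for Picard ValidateSamFile (not executed here)."""
    # ValidateSamFile reports either every problem (VERBOSE) or a summary table.
    if mode not in {"VERBOSE", "SUMMARY"}:
        raise ValueError(f"unsupported mode: {mode!r}")
    return [
        "java", "-jar", str(picard_jar),
        "ValidateSamFile",
        f"I={bam_path}",
        f"MODE={mode}",
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)`; a non-zero exit status then indicates that the file failed validation or that the tool could not be run.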
Users can select and configure multiple parameters and filters to apply to the internal workflow programs and data. Parameters are supplied to RUbioSeq through XML configuration files.
Output files organized in directories.
Output files are organized in directories, so users can easily find the files they are looking for.
Standard format output files.
All output files and results generated by RUbioSeq are in standard formats, such as SAM, BAM, BED, WIG and VCF.
Whole-workflow and independent-level execution.
Users can execute the whole workflow, from raw data to the final outputs, or execute or repeat a single level selected by a parameter.
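The idea of running either everything or one selected level can be sketched as follows. The level names, the `selected` argument and the handler mapping are all hypothetical and chosen only for illustration; they do not reflect RUbioSeq's actual level names or parameter values.

```python
# Hypothetical level names, in execution order; not RUbioSeq's real ones.
LEVELS = ["alignment", "variant_calling", "annotation"]


def levels_to_run(selected=None):
    """Return every level when nothing is selected, else only the chosen one."""
    if selected is None:
        return list(LEVELS)
    if selected not in LEVELS:
        raise ValueError(f"unknown level: {selected!r}")
    return [selected]


def run(selected=None, handlers=None):
    """Call the handler of each level to run and return the executed level names."""
    handlers = handlers or {}
    executed = []
    for level in levels_to_run(selected):
        handler = handlers.get(level)
        if handler is not None:
            handler()
        executed.append(level)
    return executed
```

Repeating a single level this way only makes sense when the outputs of the earlier levels are already on disk, which is why organized output directories matter for this mode.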
RUbioSeq has been designed to run on an HPC cluster scheduled by an SGE system. This design allows multiple samples to be processed in parallel in order to reduce processing time. It can also be run on an HPC cluster with a PBS system (experimental) and on a standard workstation in sequential mode.
Parallel multiple sample execution.
RUbioSeq relies on the capabilities of the HPC scheduler to run samples in parallel. This parallel execution can be used in two ways:
- Standalone multisample: All samples will be executed in parallel and there will be an output file per input file.
- Joint multisample: All input data will generate a single output calls file.
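The difference between the two modes can be sketched in a few lines. This is a conceptual illustration only: it uses a local thread pool rather than an SGE or PBS scheduler, and the function names and the `process_one` / `process_all` callables are hypothetical stand-ins for real analysis steps.

```python
from concurrent.futures import ThreadPoolExecutor


def standalone_multisample(inputs, process_one, max_workers=4):
    """Process each input independently, in parallel: one output per input."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(process_one, inputs))


def joint_multisample(inputs, process_all):
    """Process all inputs together: a single combined output."""
    return process_all(list(inputs))
```

With `process_one = lambda path: path.replace(".bam", ".vcf")`, the standalone mode turns `["a.bam", "b.bam"]` into two output names, while the joint mode hands the whole list to `process_all`, which returns one result for all samples.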