class:inverse middle center # Running MultiQC at OSC ---- <br> <br> <br> ### Jelmer Poelstra, MCIC Wooster ### 2021/02/12 (updated: 2021-02-11) --- ## MultiQC MultiQC is a program that *aggregates and summarizes QC log files* into a single nicely formatted HTML page. It is most often used to summarize FastQC results, but can also be used to summarize STAR and featureCounts output logs, as we'll do later. <br> <p align="center"> <img src=img/multiqc_front.png width="90%"> </p> --- ## MultiQC - For instance, this figure shows mean quality scores across 132 samples: <p align="center"> <img src=img/multiqc_qual.png width="100%"> </p> --- ## MultiQC and Conda at OSC MultiQC is *not* available as a module at OSC (i.e., no `module load`). The best way to use it is by installing it for yourself **with the software manager Conda**, which: - Creates *environments* with one (best-practice) or more software packages. - Lots of bioinformatics software is available as a Conda package. - Handles dependencies but does not require admin rights. - Environments are activated and deactivated like with the `module` system. --- ## Conda at OSC - To use Conda at OSC, you first need to load a module: ```sh $ module load python/3.6-conda5.2 ``` .content-box-info[ Like with other software, this module needs to be loaded every time you want to use Conda. ] --- ## One-time installation of MultiQC <br> into a Conda environment - Now, we are ready to create a new Conda environment with Python: ```sh $ conda create -y -n multiqc-env python=3.7 ``` - `create` is the Conda command to create a new environment. - `-y` avoids a confirmation prompt. - `-n multiqc-env` is the name for the new environment, which is arbitrary. - `python=3.7` is how specify that we want to install Python and specifically, Python version 3.7. --- ## One-time installation of MultiQC <br> into a Conda environment - Next, we *activate* the environment and install MultiQC into it: ```sh $ source activate multiqc-env $ conda install -y -c bioconda -c conda-forge multiqc ``` - `-c bioconda -c conda-forge` is how we indicate which so-called *channels*, i.e. repositories, we want to use. (These sorts of details are provided in software installation instructions.) --- ## Running MultiQC from now on - Now that we have MultiQC installed into a Conda environment at OSC, we just need to take the following steps whenever we want to use it: ```sh $ module load python/3.6-conda5.2 $ source activate multiqc-env ``` - Having done that, you should be able to run MultiQC: ```sh $ multiqc --help ``` <br> .content-box-info[ If you didn't manage to create your own conda environment, you can also use mine: ```sh $ source activate /users/PAS0471/jelmer/.conda/envs/multiqc ``` ] --- ## Running MultiQC Actually running MultiQC is straightforward: ```sh # This will put the MultiQC report in the working dir: $ multiqc <dir-with-fastqc-reports> # You can also specify the output dir for the MultiQC report: $ multiqc <dir-with-fastqc-reports> -o <output-dir> ``` <br> -- For example, to start an interactive job and run MultiQC there: ```sh $ cd /fs/project/PAS0471/teach/misc/2021-02_rnaseq/$USER # Start an interactive job at OSC: $ sinteractive -A PAS0471 # No time specified: will be 30 minutes $ module load python/3.6-conda5.2 # Load the conda module $ source activate multiqc # Activate the MultiQC env. $ multiqc results/QC_fastq/ -o results/QC_fastq/ # Run MultiQC $ ls -lh results/QC_fastq/multiqc_report.html # Main output file ``` --- ## MultiQC output <br> .content-box-diy[ Let's take a look at the output file via OnDemand's file browser. ] --- ## MultiQC script We could also run MultiQC in a script (`scripts/QC_fastq/multiqc.sh`): ```sh #!/bin/bash #SBATCH --account=PAS0471 #SBATCH --time=60 #SBATCH --out=slurm-multiqc-%j.out set -e -u -o pipefail # Save command-line arguments in named variables: input_dir="$1" output_dir="$2" mkdir -p "$output_dir" # If necessary, create the output dir # Report before starting the pogram: echo "Running MultiQC for dir: $input_dir" echo "Output dir: $output_dir" # Load the conda module and the MultiQC conda environment: module load python/3.6-conda5.2 source activate /users/PAS0471/jelmer/.conda/envs/multiqc # Run fastqc: multiqc "$input_dir" -o "$output_dir" ``` --- ## Running our MultiQC script - Submitting the script as a job: ```sh $ dir=results/QC_fastq/ $ sbatch multiqc.sh $dir $dir ``` <br> - Checking the queue: ```sh $ squeue -u $USER ``` - Checking the output: ```sh $ ls $ less slurm* $ ls $dir/multiqc_report.html ``` --- class: inverse middle center # Questions? ---- <br> <br> <br> <br>