FQC: FASTQ QA/QC Dashboard¶
FQC organizes FastQC results into groups, where each group comprises the FASTQs from a single experiment or sequencing batch. Individual samples are grouped into paired-end sets when available, and the dashboard’s extensibility allows a user to add plots or tables as desired.
Example Site¶
Citation¶
Joseph Brown, Meg Pirrung, Lee Ann McCue; FQC Dashboard: integrates FastQC results into a web-based, interactive, and extensible FASTQ quality control tool. Bioinformatics 2017 doi: 10.1093/bioinformatics/btx373
Install¶
Requires¶
Parsing the table and running FastQC are performed with code written for Python 3. We recommend using Anaconda (https://www.continuum.io/downloads) to install the FastQC dependency.
Set Up BIOCONDA channels¶
conda config --add channels conda-forge
conda config --add channels defaults
conda config --add channels bioconda
Install Dependency¶
conda install fastqc
Install¶
The dashboard reads local files, so install where you will eventually be serving the site:
git clone https://github.com/pnnl/fqc.git
cd fqc
python setup.py install
This installs the fqc command-line tool, which processes FASTQs and creates the dashboard.
Then, to deploy a local copy from within the fqc directory, you can run:
python -m http.server --bind localhost 8000
Then navigate to localhost:8000 in your browser.
By default, this will show the test data QC as determined by the data directory in js/fqc.js:
var filePath = "/example/plot_data/"
Edit filePath in js/fqc.js to point to your own data directory within the fqc directory tree.
Example Workflow¶
Using the example data set provided in the repo, let’s walk through creating the example site with paired-end data and some custom plots.
In our example, we’re looking at highly replicated 16S amplicon data, so we will be omitting non-informative plots from our display using --exclude.
Running QC¶
After installation, run QC over the first sample and generate the dashboard from within the fqc repository:
$ cd example
$ fqc qc -t 8 -e 'Basic Statistics' \
-e 'Overrepresented Sequences' \
-e 'Count by Length' \
-e 'Kmer Content' \
2016 160912_M03018 data/fastqs/160912_M03018_R1.fastq.gz
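When the same exclusions apply to many samples, the command above can be assembled programmatically. The following is a minimal sketch that only builds the argument list. `build_qc_command` is a hypothetical helper, not part of fqc, and it assumes the three positional arguments are the group ID, the sample ID, and the FASTQ path, in the order shown above.

```python
import shlex


def build_qc_command(group_id, sample_id, fastq, excludes=(), threads=8):
    """Build an argument list mirroring the `fqc qc` call shown above.

    Hypothetical helper: only the -t and -e flags from the example are used.
    """
    cmd = ["fqc", "qc", "-t", str(threads)]
    for name in excludes:
        # One -e flag per excluded FastQC module, as in the example.
        cmd += ["-e", name]
    cmd += [group_id, sample_id, fastq]
    return cmd


cmd = build_qc_command(
    "2016",
    "160912_M03018",
    "data/fastqs/160912_M03018_R1.fastq.gz",
    excludes=[
        "Basic Statistics",
        "Overrepresented Sequences",
        "Count by Length",
        "Kmer Content",
    ],
)
# Print a shell-quoted version for inspection before running it.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from within the example directory.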

Adding Custom Plots¶
Next, add the Lorenz curve, which gives us an idea of how equitably the sequences were distributed among barcodes. This plot will be added as the top tab using --prepend:
$ fqc add --prepend --x-value FractionOfSamples --y-value Equal \
--y-value Actual --x-label 'Fraction of Samples' \
--y-label 'Fraction of On Target Reads' \
plot_data/2016/160912_M03018/config.json \
'Read Distribution' \
line \
data/tables/160912_lorenz.csv
The JSON entry printed by this command is just for confirmation; it has already been added to this run’s configuration.
Our display now shows the new tab in the first position:

Tip
Based on the distribution of barcodes, the user could have specified a fail or warning condition using the Gini coefficient and set a tab icon using --status fail in the previous command. For more information, see Tab Status.
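A Lorenz curve table and the Gini coefficient can both be derived from per-barcode read counts. The following is a minimal sketch, not the code used to build the example table: `lorenz_rows` and `gini` are hypothetical helpers, and only the column names FractionOfSamples, Equal, and Actual come from the command above. Choosing a threshold that maps the coefficient to a --status value is left to the user.

```python
def lorenz_rows(counts):
    """Return (FractionOfSamples, Equal, Actual) rows for a Lorenz curve.

    Assumes at least one sample and a non-zero total read count.
    """
    ordered = sorted(counts)
    total = sum(ordered)
    n = len(ordered)
    rows = [(0.0, 0.0, 0.0)]
    running = 0
    for i, count in enumerate(ordered, start=1):
        running += count
        fraction = i / n
        # "Equal" is the diagonal: every sample receives the same share.
        rows.append((fraction, fraction, running / total))
    return rows


def gini(counts):
    """Gini coefficient computed as 1 - 2 * (area under the Lorenz curve)."""
    rows = lorenz_rows(counts)
    area = 0.0
    for (x0, _, y0), (x1, _, y1) in zip(rows, rows[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid rule
    return 1 - 2 * area


print(gini([10, 10, 10, 10]))  # perfectly even distribution
print(gini([0, 0, 0, 40]))     # all reads in a single barcode
```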
To add the sequence summary table, we add another tab, again placing it first using --prepend:
$ fqc add --prepend \
plot_data/2016/160912_M03018/config.json \
'Run Stats' \
table \
data/tables/160912_summary.csv

Another plot we typically add shows read abundance across primer plates. This is valuable to observe positive and negative control wells in addition to seeing the effects on neighboring wells. Run 160912 did not have any controls, so we’ll just add abundance for its plate:
$ fqc add --x-value WELL_COL --y-value WELL_ROW \
--value TOTAL_PAIRED_READS --label LABEL \
--label-color LABEL_COLOR \
plot_data/2016/160912_M03018/config.json \
'Abundance by Plate' \
plateheatmap \
data/tables/160912_plate_1.csv

And finally, sometimes things go wrong and barcode plates get mixed up, so we display the top barcodes and their counts. Usually we simply add it as a table, but let’s display it as a bar plot:
$ fqc add --x-value Barcode --y-value Count \
plot_data/2016/160912_M03018/config.json \
"Barcode Counts" \
bar \
data/tables/160912_top50barcodes.csv

The rest of the example site is built by repeating these steps for the remaining samples.
Groups¶
Located within the plot_data directory, groups.json holds metadata for each group and the samples within each group:
[
{
"group_id": "group_01",
"uids": [
"test_01"
]
},
{
"group_id": "group_00",
"uids": [
"test_00"
]
}
]
Renders as:

The sample ID and group ID must match the underlying directory tree that is built by fqc qc and maintained when using fqc batch-qc and fqc add.
The directory tree of this simple example:
plot_data/
├── group_00
│ └── test_00
│ ├── R1
│ ├── R2
│ └── config.json
├── group_01
│ └── test_01
│ ├── R1
│ ├── R2
│ └── config.json
└── groups.json
For a more detailed example, see the groups.json file located in the example data directory.
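Because the IDs in groups.json must mirror the directory tree, a quick consistency check can catch mismatches before the site is served. The sketch below assumes the layout shown above (plot_data/GROUP_ID/UID/config.json); `missing_samples` is a hypothetical helper, not part of fqc.

```python
import json
import tempfile
from pathlib import Path


def missing_samples(plot_data):
    """List (group_id, uid) pairs from groups.json that lack a config.json."""
    plot_data = Path(plot_data)
    groups = json.loads((plot_data / "groups.json").read_text())
    missing = []
    for group in groups:
        for uid in group["uids"]:
            config = plot_data / group["group_id"] / uid / "config.json"
            if not config.is_file():
                missing.append((group["group_id"], uid))
    return missing


# Demonstration on a throwaway tree in which group_01/test_01 was never created.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "group_00" / "test_00").mkdir(parents=True)
    (root / "group_00" / "test_00" / "config.json").write_text("[]")
    groups = [
        {"group_id": "group_01", "uids": ["test_01"]},
        {"group_id": "group_00", "uids": ["test_00"]},
    ]
    (root / "groups.json").write_text(json.dumps(groups))
    demo_missing = missing_samples(root)
    print(demo_missing)
```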
Configuration File¶
Each FASTQ or FASTQ pair will have an associated config.json file that holds information like which plots to draw into which tabs and where the associated data files reside. Each entry must have a tab_name, filename, and chart_properties specific to the chart type. For more information on individual plots and their requirements, see their respective pages.
Tab Status¶
This is an optional icon for page tabs and supports values of pass, fail, or warn. It’s rendered as:

Plot Options¶
The configuration of plots themselves occurs within the chart_properties key. Individual plot options are defined in the docs under each plot type.
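The requirements described so far (three required keys, an optional status, and a chart type inside chart_properties) can be checked with a few lines of code. This is a sketch under the assumption that each entry is a JSON object like the examples in these docs; `entry_problems` is a hypothetical helper and does not reflect how fqc itself validates its configuration.

```python
REQUIRED_KEYS = ("tab_name", "filename", "chart_properties")
VALID_STATUS = {"pass", "fail", "warn"}


def entry_problems(entry):
    """Return a list of human-readable problems for one config entry."""
    problems = [f"missing '{key}'" for key in REQUIRED_KEYS if key not in entry]
    props = entry.get("chart_properties")
    if props is not None and "type" not in props:
        problems.append("chart_properties has no 'type'")
    status = entry.get("status")  # optional, so None is acceptable
    if status is not None and status not in VALID_STATUS:
        problems.append(f"unsupported status '{status}'")
    return problems


entry = {
    "filename": "160912_summary.csv",
    "chart_properties": {"type": "table"},
    "tab_name": "Run Stats",
}
print(entry_problems(entry))
```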
Manual Editing¶
Plots can be edited and plot data can be added manually to a particular sample’s config.json. If you’re adding a new sample manually, you will also have to add it to groups.json.
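Registering a manually added sample amounts to a small update of the groups.json structure shown under Groups. The sketch below works on the parsed list; `add_sample` is a hypothetical helper, and reading and writing the file with the json module is left to the caller.

```python
import json


def add_sample(groups, group_id, uid):
    """Add uid to group_id in a groups.json-style list, creating the group if needed."""
    for group in groups:
        if group["group_id"] == group_id:
            if uid not in group["uids"]:
                group["uids"].append(uid)
            return groups
    groups.append({"group_id": group_id, "uids": [uid]})
    return groups


groups = [{"group_id": "group_00", "uids": ["test_00"]}]
add_sample(groups, "group_00", "test_02")  # existing group
add_sample(groups, "group_01", "test_01")  # new group
print(json.dumps(groups, indent=2))
```

The sample directory and its config.json must still exist under the matching group directory for the new entry to display.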
Plot Tabs¶
Tabs can be added to the plot area using a list of lists for the filename attribute. The first position is the name of the tab while the second is the file path. An example JSON entry for this in a sample’s config.json looks like:
"filename": [
[
"Plate 1",
"plt1_counts.csv"
],
[
"Plate 2",
"plt2_counts.csv"
]
]
Which will render as:

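Since filename may be either a single path or a list of [tab name, file path] pairs, code that reads config.json by hand may want to normalize the two forms first. A minimal sketch, where `plot_tabs` is a hypothetical helper:

```python
def plot_tabs(filename):
    """Normalize a config `filename` value to a list of (tab name, path) pairs."""
    if isinstance(filename, str):
        # A single path has no named plot tab.
        return [(None, filename)]
    return [(name, path) for name, path in filename]


print(plot_tabs("R1/Per_base_sequence_quality.csv"))
print(plot_tabs([["Plate 1", "plt1_counts.csv"], ["Plate 2", "plt2_counts.csv"]]))
```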
Area Range¶
The area range plot is automatically generated from FastQC output for inspecting read quality as a function of position along the sequence.
Example Data¶
An example table after parsing output from FastQC looks like:
Base | Mean | Lower Quartile | Upper Quartile |
---|---|---|---|
1 | 32.193 | 32.0 | 33.0 |
2 | 32.365 | 32.0 | 33.0 |
3 | 32.570 | 32.0 | 33.0 |
Plot Options¶
Option | Value |
---|---|
filename | supports either a single file path or list of lists with [plot tab name, file path] pairs (see Plot Tabs) |
tab_name | left main menu entry |
status | left main menu icon – supports ‘pass’, ‘fail’, ‘warn’, or alternatively, omitted (see Tab Status) |
chart_properties | See table below |
Chart Properties¶
Option | Value |
---|---|
type | the required entry is ‘arearange’ |
x_label | x-axis label |
x_value | the header label defined in filename corresponding to x-values |
y_label | y-axis label |
lower_quartile | the header label defined in filename corresponding to lower quartile values |
upper_quartile | the header label defined in filename corresponding to upper quartile values |
mean | the header label defined in filename corresponding to mean values |
zones | defined as ‘value’:’color’ with an initial ‘color’ as the base; see example below |
Example JSON entry:
{
"filename": "R1/Per_base_sequence_quality.csv",
"tab_name": "Quality by Position",
"status": "pass",
"chart_properties": {
"type": "arearange",
"x_label": "Position",
"x_value": "Base",
"y_label": "Quality (Phred score)",
"lower_quartile": "Lower Quartile",
"upper_quartile": "Upper Quartile",
"mean": "Mean"
}
}

There is support for adding zones as well, if you’re going for the classic FastQC look and feel:
{
"filename": [
["R1", "R1/Per_base_sequence_quality.csv"],
["R2", "R2/Per_base_sequence_quality.csv"]
],
"tab_name": "Quality by Position",
"status": "warn",
"chart_properties": {
"type": "arearange",
"x_label": "Position",
"x_value": "Base",
"y_label": "Quality (Phred score)",
"lower_quartile": "Lower Quartile",
"upper_quartile": "Upper Quartile",
"mean": "Mean",
"zones": [
{"value": 30, "color": "#e5afb0"},
{"value": 34, "color": "#e6d6b1"},
{"color": "#b0e5b1"}
]
}
}
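To make the zones list easier to reason about, the sketch below picks a background color for a given quality score. It assumes Highcharts-style semantics, in which each zone’s value is an exclusive upper bound and a zone without a value covers everything above the last bound. That interpretation is an assumption rather than something stated in these docs, and `zone_color` is a hypothetical helper.

```python
def zone_color(value, zones):
    """Pick the color for value, treating each zone's 'value' as an exclusive upper bound."""
    for zone in zones:
        if "value" not in zone or value < zone["value"]:
            return zone["color"]
    return None  # no zone matched and no open-ended zone was defined


zones = [
    {"value": 30, "color": "#e5afb0"},
    {"value": 34, "color": "#e6d6b1"},
    {"color": "#b0e5b1"},
]

for score in (25, 32, 36):
    print(score, zone_color(score, zones))
```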

Bar¶
No bar plots are automatically generated from FastQC output, but they can be added for custom data tables.
Example Data¶
Barcode | Count |
---|---|
TCACGGGAGTTG | 579558 |
AGTTCAGACGCT | 250808 |
ATTTCGACATGC | 245063 |
TAATGACCACGC | 230339 |
CGATCCGTATTA | 215466 |
Usage to Add¶
Given the example data for barcode counts:
$ fqc add --x-value Barcode --y-value Count \
plot_data/2016/160912_M03018/config.json \
"Barcode Counts" \
bar \
example/data/tables/160912_top50barcodes.csv
Plot Options¶
Option | Value |
---|---|
filename | supports either a single file path or list of lists with [plot tab name, file path] pairs (see Plot Tabs) |
tab_name | left main menu entry |
status | left main menu icon – supports ‘pass’, ‘fail’, ‘warn’, or alternatively, omitted (see Tab Status) |
chart_properties | See table below |
Chart Properties¶
Option | Value |
---|---|
type | the required entry is ‘bar’ |
subtitle | an optional subtitle for the plot |
x_label | x-axis label |
x_value | the header label defined in filename corresponding to x-values |
y_label | y-axis label |
y_value | the header label defined in filename corresponding to y-values |
Example JSON entry:
{
"filename": "bar_plot_example.csv",
"tab_name": "Barcode Counts",
"chart_properties": {
"type": "bar",
"x_value": "Barcode",
"x_label": "Barcode",
"y_value": [ "Count" ],
"y_label": "Count"
}
}

Heatmap¶
A heatmap is generated using tile quality data from FastQC, but a custom one can be generated using data with an x, a y, and a value associated with each coordinate.
Example Data¶
Tile | Base | Mean |
---|---|---|
1101 | 1 | 0.4305 |
1101 | 2 | 0.1525 |
1101 | 3 | 0.0202 |
Usage to Add¶
Tile example data from FastQC can be added manually using:
$ fqc add --x-value Base --y-value Tile --value Mean --min-value -10 --max-value 10 \
plot_data/2016/160912_M03018/config.json \
"Quality by Tile" \
heatmap \
plot_data/2016/160912_M03018/R1/Per_tile_sequence_quality.csv
Plot Options¶
Option | Value |
---|---|
filename | supports either a single file path or list of lists with [plot tab name, file path] pairs (see Plot Tabs) |
tab_name | left main menu entry |
status | left main menu icon – supports ‘pass’, ‘fail’, ‘warn’, or alternatively, omitted (see Tab Status) |
chart_properties | See table below |
Chart Properties¶
Option | Value |
---|---|
type | the required entry is ‘heatmap’ |
subtitle | an optional subtitle for the plot |
x_label | x-axis label |
x_value | the header label defined in filename corresponding to x-values |
y_label | y-axis label |
y_value | the header label defined in filename corresponding to y-values |
value | the header label defined in filename corresponding to heatmap values |
min_color | the minimum color used in the heatmap color map |
mid_color | the middle color used in the heatmap color map |
max_color | the maximum color used in the heatmap color map |
min | the minimum value defined for the heatmap color map |
max | the maximum value defined for the heatmap color map |
Example JSON entry:
{
"filename": "R1/Per_tile_sequence_quality.csv",
"tab_name": "Quality by Tile",
"status": "pass",
"chart_properties": {
"type": "heatmap",
"subtitle": "Per Tile Average Quality Deviation",
"x_label": "Position",
"x_value": "Base",
"y_label": "Tile",
"y_value": "Tile",
"shape": "square",
"value": "Mean",
"min": "-10",
"max": "10",
"min_color": "#36c",
"mid_color": "#ffffff",
"max_color": "#dc3912"
}
}

Histogram¶
This plot is useful in the context of 16S amplicon sequencing after reads have been quality trimmed and paired-end reads have been joined. Tabulating observed read lengths and their respective counts can give insights into the quality of the sequence ends.
Example Data¶
Length |
---|
153 |
179 |
177 |
191 |
198 |
Usage to Add¶
Given the example data for read length counts:
$ fqc add -x Length -Y "Read Count" --step 10 \
plot_data/2016/160912_M03018/config.json \
"Joined Read Lengths" \
histogram \
data/tables/histogram_example.csv
Plot Options¶
Option | Value |
---|---|
filename | supports either a single file path or list of lists with [plot tab name, file path] pairs (see Plot Tabs) |
tab_name | left main menu entry |
status | left main menu icon – supports ‘pass’, ‘fail’, ‘warn’, or alternatively, omitted (see Tab Status) |
chart_properties | See table below |
Chart Properties¶
Option | Value |
---|---|
type | the required entry is ‘histogram’ |
subtitle | an optional subtitle for the plot |
x_label | x-axis label |
x_value | the header label defined in filename corresponding to values |
y_label | y-axis label |
step | histogram bin size |
Example JSON entry:
{
"filename": "histogram_example.csv",
"tab_name": "Joined Read Lengths",
"chart_properties": {
"type": "histogram",
"x_value": "Length",
"x_label": "Length",
"y_label": "Read Count",
"step": 10
}
}
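The step option sets the width of each histogram bin. As a rough illustration of what binning with a step of 10 does to the example lengths, the sketch below counts values per bin and labels each bin by its lower edge. How the dashboard aligns its bins is not specified here, so the alignment to multiples of step is an assumption, and `bin_lengths` is a hypothetical helper.

```python
from collections import Counter


def bin_lengths(lengths, step):
    """Count values per bin of width `step`, labeling each bin by its lower edge."""
    counts = Counter((length // step) * step for length in lengths)
    return dict(sorted(counts.items()))


binned = bin_lengths([153, 179, 177, 191, 198], step=10)
print(binned)  # {150: 1, 170: 2, 190: 2}
```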

Line¶
Example Data¶
Quality | Count |
---|---|
18 | 1.0 |
19 | 14.0 |
20 | 46.0 |
21 | 111.0 |
22 | 141.0 |
Usage to Add¶
Given quality data across read positions, we can add this plot using:
$ fqc add --x-value Quality --y-value Count \
plot_data/2016/160912_M03018/config.json \
"Quality by Position" \
line \
data/tables/simple_line.csv
Plot Options¶
Option | Value |
---|---|
filename | supports either a single file path or list of lists with [plot tab name, file path] pairs (see Plot Tabs) |
tab_name | left main menu entry |
status | left main menu icon – supports ‘pass’, ‘fail’, ‘warn’, or alternatively, omitted (see Tab Status) |
chart_properties | See table below |
Chart Properties¶
Option | Value |
---|---|
type | the required entry is ‘line’ |
subtitle | an optional subtitle for the plot |
x_label | x-axis label |
x_value | the header label defined in filename corresponding to x-values |
y_label | y-axis label |
y_value | the header label defined in filename corresponding to y-values |
Example JSON entry:
{
"filename": "simple_line.csv",
"tab_name": "Quality by Position",
"chart_properties": {
"type": "line",
"x_value": "Quality",
"x_label": "Quality",
"y_value": ["Count"],
"y_label": "Count"
}
}

Multi-line Plots¶
When multiple y-values are being plotted:
Base | G | A | T | C |
---|---|---|---|---|
1 | 45.96 | 51.62 | 1.38 | 1.04 |
2 | 0.8999999999999999 | 7.26 | 1.5 | 90.34 |
3 | 8.14 | 79.12 | 11.72 | 1.02 |
4 | 1.1199999999999999 | 5.1 | 4.5 | 89.28 |
5 | 1.82 | 1.0999999999999999 | 1.32 | 95.76 |
These data are added by specifying -y multiple times:
$ fqc add -x Base -X Position \
-y G -y A -y C -y T \
-Y Percent \
plot_data/2016/160912_M03018/config.json \
"Content by Position" \
line \
data/tables/multiple_line.csv
Example JSON entry:
{
"filename": "multiple_line.csv",
"tab_name": "Content by Position",
"chart_properties": {
"type": "line",
"x_label": "Position",
"x_value": "Base",
"y_label": "Percent",
"y_value": [
"G",
"A",
"C",
"T"
]
}
}
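Every x_value and y_value must name a header in the data file, and a misspelled column is easy to miss when several y-values are listed. The sketch below checks a chart_properties block against a table’s header row. It assumes comma-delimited files, and `missing_columns` is a hypothetical helper.

```python
import csv
import io


def missing_columns(csv_text, chart_properties):
    """Return x/y column names from chart_properties that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    y_values = chart_properties["y_value"]
    if isinstance(y_values, str):
        # Accept a single column name as well as a list of names.
        y_values = [y_values]
    wanted = [chart_properties["x_value"], *y_values]
    return [name for name in wanted if name not in header]


table = "Base,G,A,T,C\n1,45.96,51.62,1.38,1.04\n"
props = {"type": "line", "x_value": "Base", "y_value": ["G", "A", "C", "T"]}
print(missing_columns(table, props))
```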

Plate Heatmap¶
This plot is intended to be a nicely spaced heatmap specifically for showing trends over sample plates. Definitions for colors are optional and will be used to outline their respective coordinates.
An example for this plot type is executed in the Example Workflow.
Example Data¶
This is an alternate example from the workflow to show how to deal with multiple label colors.
WELL_COL | WELL_ROW | TOTAL_PAIRED_READS | LABEL | LABEL_COLOR |
---|---|---|---|---|
1 | A | 205 | sample1 | |
2 | A | 103 | POS CTRL | #d62728 |
3 | A | 125 | NEG CTRL | #1f77b4 |
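Plate layouts are often recorded by well ID (for example A1) rather than as separate row and column fields. The sketch below converts such a mapping into rows with the column names used above and writes them as CSV. The input format, `plate_rows`, and the `label_colors` mapping are all hypothetical; only the output column names come from the example.

```python
import csv
import io
import re

FIELDS = ["WELL_COL", "WELL_ROW", "TOTAL_PAIRED_READS", "LABEL", "LABEL_COLOR"]


def plate_rows(wells, label_colors=None):
    """Turn {well ID: (reads, label)} into rows with the columns used above."""
    label_colors = label_colors or {}
    rows = []
    for well, (reads, label) in wells.items():
        match = re.fullmatch(r"([A-Za-z]+)(\d+)", well)
        if match is None:
            raise ValueError(f"unrecognized well ID: {well!r}")
        rows.append({
            "WELL_COL": int(match.group(2)),
            "WELL_ROW": match.group(1).upper(),
            "TOTAL_PAIRED_READS": reads,
            "LABEL": label,
            # Wells without a highlight color get an empty cell.
            "LABEL_COLOR": label_colors.get(label, ""),
        })
    return rows


wells = {"A1": (205, "sample1"), "A2": (103, "POS CTRL"), "A3": (125, "NEG CTRL")}
colors = {"POS CTRL": "#d62728", "NEG CTRL": "#1f77b4"}
rows = plate_rows(wells, colors)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```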
Usage to Add¶
$ fqc add --x-value WELL_COL --y-value WELL_ROW \
--value TOTAL_PAIRED_READS --label LABEL \
--label-color LABEL_COLOR \
plot_data/2016/160912_M03018/config.json \
'Abundance by Plate' \
plateheatmap \
data/tables/160912_plate_1.csv
Plot Options¶
Option | Value |
---|---|
filename | supports either a single file path or list of lists with [plot tab name, file path] pairs (see Plot Tabs) |
tab_name | left main menu entry |
status | left main menu icon – supports ‘pass’, ‘fail’, ‘warn’, or alternatively, omitted (see Tab Status) |
chart_properties | See table below |
Chart Properties¶
Option | Value |
---|---|
type | the required entry is ‘plateheatmap’ |
subtitle | an optional subtitle for the plot |
x_label | x-axis label |
x_value | the header label defined in filename corresponding to x-values |
y_label | y-axis label |
y_value | the header label defined in filename corresponding to y-values |
value | the header label defined in filename corresponding to heatmap values |
label | the header label defined in filename corresponding to point labels; displayed on hover when specified |
label_color | header of column containing colors; acts to color surrounding point to highlight |
Example JSON entry:
{
"filename": "160912_plate_1.csv",
"tab_name": "Abundance by Plate",
"chart_properties": {
"type": "plateheatmap",
"x_value": "WELL_COL",
"x_label": "WELL_COL",
"y_value": ["WELL_ROW"],
"y_label": "WELL_ROW",
"value": "TOTAL_PAIRED_READS",
"label": "LABEL",
"label_color": "LABEL_COLOR"
}
}

Table¶
Example Data¶
Measure | Value |
---|---|
Total Reads | 15752091 |
Fraction On Target | 0.67042235853 |
Fraction Off Target | 1.0284348916e-05 |
Fraction Unmatched | 0.329567357121 |
Coefficient of Distribution (G) | 0.260394791718 |
Usage to Add¶
$ fqc add --prepend \
plot_data/2016/160912_M03018/config.json \
'Run Stats' \
table \
data/tables/160912_summary.csv
Plot Options¶
Option | Value |
---|---|
filename | supports either a single file path or list of lists with [plot tab name, file path] pairs (see Plot Tabs) |
tab_name | left main menu entry |
status | left main menu icon – supports ‘pass’, ‘fail’, ‘warn’, or alternatively, omitted (see Tab Status) |
Example JSON entry:
{
"filename": "160912_summary.csv",
"chart_properties": {
"type": "table"
},
"tab_name": "Run Stats"
}
