
QueryEngine Documentation

input | output
known bugs and limitations
contact

Input

Modifiable HTML elements are highlighted in blue.

VCF file

Input files must be in VCF format, and coordinates must refer to GRCh37 (also called hg19). We do not yet offer processing of (merged) VCF files containing variants obtained from sequencing two or more samples. The uploaded VCF file may therefore contain data from only one sample.
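If you are unsure whether your file is a merged multi-sample VCF, you can check it before uploading. The following is a minimal sketch, not part of the QueryEngine: it relies only on the general VCF layout, in which the #CHROM header line lists eight fixed columns, an optional FORMAT column and then one column per sample. The function names are our own choice for illustration.

```python
import io

# In a VCF, the #CHROM header line has 8 fixed columns (CHROM..INFO),
# optionally followed by FORMAT and one column per sample.
FIXED_COLUMNS = 9  # 8 fixed columns plus FORMAT


def sample_names(vcf_lines):
    """Return the sample names listed on the #CHROM header line."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            return columns[FIXED_COLUMNS:]
    raise ValueError("no #CHROM header line found")


def has_multiple_samples(vcf_lines):
    """True if the VCF header lists two or more samples."""
    return len(sample_names(vcf_lines)) > 1


# Tiny made-up example with a single sample column.
example = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tpatient1\n"
    "1\t12345\t.\tA\tG\t50\tPASS\tDP=20\tGT\t1/1\n"
)
print(has_multiple_samples(example))
```

For a real file, pass an open file handle instead of the in-memory example.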

E-mail address

An e-mail address has to be provided so that we can notify you when your MutationTaster results are ready. The results can be browsed on our server for three weeks and are deleted afterwards.

HTML files

If you want to download all the single MutationTaster results in HTML format when the QueryEngine has finished your analysis, you have to choose create HTML files. The whole QueryEngine run then takes longer, because writing, storing and zipping the HTML files takes some time.
If you are only interested in downloading the summarized results in TSV format, choose don't create HTML files. The QueryEngine will then run much faster. You can still view the detailed HTML results online on our server, because we provide direct links to query MutationTaster again on demand for every variant. We assume that you will probably not look at every single variant from your Exome Sequencing Project (or similar). That is why we think it is better to re-query the most interesting variants afterwards instead of storing thousands of HTML files in advance.

Analysis settings

search for homozygous variants
Check yes if you are interested in MutationTaster results for homozygous variants only; heterozygous variants will be ignored. If unchecked, all variants in your VCF will be processed (unless other options are checked).

search for compound heterozygous variants
Check yes if you are interested in MutationTaster results for compound heterozygous variants. If unchecked, all variants in your VCF will be processed (unless other options are checked).
This option is not yet implemented, but will be soon.

combine neighbouring variants
Sometimes single base exchanges are located very close to each other. If considered separately as single alterations, they might seem harmless, but if they act together, they might be deleterious. For this reason we offer to combine neighbouring variants (only single base exchanges) and treat them as if they were one, but more complex, alteration. Check yes if you are interested in this. The analysis of the combined alterations is conducted in addition to the analysis of the single alterations.
This option is in beta status. Please let us know if you encounter any problems or inconsistencies. Thank you!
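To illustrate the idea of neighbouring single base exchanges, here is a rough sketch that groups SNVs lying close together on the same chromosome. It is not the QueryEngine's implementation: the distance threshold max_gap is a hypothetical parameter (this documentation does not state which distance is used), and the tuple layout of the variants is our own assumption.

```python
def group_neighbouring_snvs(variants, max_gap=2):
    """Group single base exchanges that lie close together.

    variants: iterable of (chrom, pos, ref, alt) tuples.
    max_gap:  hypothetical maximum distance in bases between
              consecutive SNVs of one group (an assumption).
    Returns a list of groups, each containing two or more SNVs.
    """
    # Keep only single base exchanges and sort by chromosome and position.
    snvs = sorted(v for v in variants if len(v[2]) == 1 and len(v[3]) == 1)
    groups, current = [], []
    for variant in snvs:
        same_chrom = current and variant[0] == current[-1][0]
        if same_chrom and variant[1] - current[-1][1] <= max_gap:
            current.append(variant)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [variant]
    if len(current) > 1:
        groups.append(current)
    return groups


# Made-up variants: two adjacent SNVs, one distant SNV, one deletion.
variants = [
    ("1", 100, "A", "G"),
    ("1", 101, "C", "T"),
    ("1", 200, "G", "A"),
    ("1", 102, "AT", "A"),
]
print(group_neighbouring_snvs(variants))
```

Each group would then be treated as one more complex alteration, in addition to the single alterations it consists of.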

analyse complete VCF / variants on chr / analyse custom regions / exclude custom regions
If you don't need your complete VCF file to be analysed, you can save time by restricting the analysis to certain regions (for example linkage regions or homozygous regions). Choose analyse custom regions (a text field will open) and enter your regions of interest in BED format. Some people are interested in variants all over the genome, but mostly in exonic ones. They can leave the option analyse complete VCF selected and additionally tick the ...but only exons option, which uses a ready-made set of all suitable Ensembl69 exons for the analysis. Since many people are also interested in intronic variants that are close to exons, you can enter a value between 0 and 99: this is the number of "flanking" bases adjacent to intron/exon borders which are additionally analysed. The ...but only exons option is also available if you are interested in all variants on a certain chromosome (choose analyse variants on chr and enter the chromosome) but again want to exclude intronic ones. You can also exclude certain regions with the exclude custom regions option.
This option is in beta status. Please let us know if you encounter any problems or inconsistencies. Thank you!
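The region handling described above can be pictured with a small sketch. It assumes standard BED conventions (tab- or space-separated chromosome, 0-based start, exclusive end) and 1-based VCF positions. The flank parameter only mimics the idea of "flanking" bases from the exons option. Whether chromosome names need a "chr" prefix in the QueryEngine text field is not covered here, and the function names are hypothetical.

```python
def parse_bed(text):
    """Parse BED text into a list of (chrom, start, end) tuples."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        # Skip empty lines, comments and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def in_regions(chrom, pos, regions, flank=0):
    """Check whether a 1-based VCF position lies inside any region.

    BED starts are 0-based and BED ends are exclusive, so a region
    (start, end) covers the 1-based positions start+1 .. end.
    """
    for region_chrom, start, end in regions:
        if chrom == region_chrom and start - flank < pos <= end + flank:
            return True
    return False


regions = parse_bed("1\t100\t200\n2\t5000\t6000\n")
print(in_regions("1", 150, regions))           # inside the first region
print(in_regions("1", 203, regions, flank=5))  # inside only thanks to the flank
```

An exclude custom regions filter would simply keep the variants for which in_regions returns False.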

filter against 1000G
Here you can specify filter options to skip the analysis of variants that were also found in the 1000 Genomes Project (1000G, also abbreviated TGP). If you wish to exclude variants found in 1000G 4 or more times in homozygous state but include all heterozygous variants, you can leave everything as it is (default setting). You are free to change the number of cases that have to be present in 1000G in order to exclude variants from analysis, and you may additionally filter out variants found in heterozygous state in 1000G. For this purpose, check the checkbox and adjust the number in the corresponding text field (heterozygous in ... or more 1000G samples). Filtering of heterozygous variants is not turned on by default. If you do not want to filter against 1000G at all, uncheck both boxes. Once a box is checked, a numerical value must be entered. This can also be zero (0), which is virtually the same as unchecking the box. Checking the checkbox and leaving the text field empty will result in an error message.
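The filter logic described above can be summarized in a short sketch. This is only our reading of the rules in this section, not the QueryEngine's code: the function name and arguments are hypothetical, None stands for an unchecked box, and a threshold of 0 is treated like an unchecked box, as stated above.

```python
def skip_due_to_1000g(hom_count, het_count, hom_threshold=4, het_threshold=None):
    """Decide whether a variant is excluded because of its presence in 1000G.

    hom_count / het_count: number of 1000G samples carrying the variant
                           in homozygous / heterozygous state.
    hom_threshold / het_threshold: minimum number of samples needed to
                           exclude the variant; None means the box is
                           unchecked, 0 behaves like an unchecked box.
    The defaults mirror the default setting: homozygous in 4 or more
    samples, no filtering of heterozygous variants.
    """
    if hom_threshold and hom_count >= hom_threshold:
        return True
    if het_threshold and het_count >= het_threshold:
        return True
    return False


print(skip_due_to_1000g(hom_count=4, het_count=0))    # default setting
print(skip_due_to_1000g(hom_count=3, het_count=100))  # heterozygous filter is off
```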

minimum coverage
Positions with very low coverage do not provide reliable data. It is therefore useful to exclude such variants from analysis (if this has not already been done during variant calling / pileup). We offer the possibility to skip variants that are covered below a user-defined threshold. To do so, adjust the number in the corresponding text field. If you don't want to exclude poorly covered variants, enter 0. The default minimum coverage is 4.
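As an illustration of such a threshold, the sketch below reads the depth from a VCF INFO field and compares it with the minimum coverage. Note the assumptions: this documentation does not say which VCF field the QueryEngine uses for coverage, so reading the DP key from INFO is our guess, and keeping variants without any DP value is an arbitrary choice made for this example.

```python
def info_depth(info_field):
    """Return the integer DP value from a VCF INFO string, or None."""
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        if key == "DP" and value.isdigit():
            return int(value)
    return None


def passes_coverage(info_field, min_coverage=4):
    """True if the variant is covered at or above the threshold.

    Variants without a DP entry are kept (an assumption made for this
    sketch, not documented behaviour).
    """
    depth = info_depth(info_field)
    if depth is None:
        return True
    return depth >= min_coverage


print(passes_coverage("AC=2;DP=3;AF=1.0"))     # below the default of 4
print(passes_coverage("AC=2;DP=3;AF=1.0", 0))  # threshold 0 keeps everything
```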

queue status

We display the current load of the QueryEngine. Jobs are automatically sorted into different queues depending on their nature and size.
Submitted jobs are generally sorted into either the small (VCF files containing 1-500 variants / lines), medium (VCF files containing 501-10,000 variants / lines) or large (VCF files containing more than 10,000 variants / lines) queue. The queues are executed with different priorities and independently of each other. Even when the queue for large jobs is full, a small job will be processed immediately if there are free slots in the small queue.
Database (DB) jobs are automatically generated during a QueryEngine run and placed in a separate queue, since they may put a heavy load on our server. DB queries are executed several times during every QueryEngine run, independent of the size of the submitted VCF file.
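The size-based sorting into queues can be written down as a simple rule. The sketch below only restates the thresholds given above; the function name is hypothetical and the separate DB queue is not modelled.

```python
def queue_for(variant_count):
    """Return the queue name for a VCF with the given number of variants / lines."""
    if variant_count < 1:
        raise ValueError("a job needs at least one variant")
    if variant_count <= 500:
        return "small"
    if variant_count <= 10_000:
        return "medium"
    return "large"


for count in (1, 500, 501, 10_000, 10_001):
    print(count, queue_for(count))
```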

Output

Statistics

Most often, MutationTaster will not analyse each and every line of your VCF file, either because you have set certain filters, or because certain variants were not suitable for analysis with MutationTaster.

submitted variants - Number of alterations (lines) in VCF file.

pre-discarded variants - Number of variants which were filtered out according to user input (below coverage, not homozygous, out of specified region / chromosome) or due to input / format errors (e.g. variant equals refseq, reference allele equals alternative allele, Indel is too long or neither genotype nor frequency is supplied). All pre-discarded variants are written to a file (skipped.txt) which can be downloaded on the results page as soon as your job has been finished.

analysable variants - Number of variants which were suitable for analysis. This number can be considerably higher than the number of lines in the VCF, because one line in the VCF sometimes contains more than one alternative allele. If you choose to combine neighbouring variants, the number will rise further.

discarded (TGP) - Number of variants ignored for analysis due to presence in 1000 Genomes Project (applies only if one or both of the two filter against TGP options are set).

discarded (out of gene/exon/region) - Number of variants which were excluded from analysis because they are a) extragenic and/or b) out of/distant from exon (applies only if option for only exons is set) or c) out of chromosome (applies only if option for only chromosome CHR is set) or d) out of region (applies only if option for analyse custom region is set) or e) inside region (applies only if option for exclude custom region is set).

analysed variants - Number of variants which were analysed with MutationTaster. This number will normally be considerably higher than the number of analysable variants, because more than one (suitable) transcript will be found for most variants.
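Why the number of analysable variants can exceed the number of VCF lines is easiest to see with an example. The sketch below splits one VCF line with several alternative alleles into single variants and drops alleles that equal the reference. It is a simplified illustration under our own assumptions (tab-separated standard VCF columns), not the QueryEngine's actual pre-processing.

```python
def expand_alleles(vcf_line):
    """Split one VCF data line into one (chrom, pos, ref, alt) tuple per ALT allele."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, ref, alt_field = fields[0], int(fields[1]), fields[3], fields[4]
    variants = []
    for alt in alt_field.split(","):
        # Skip missing alleles and alleles identical to the reference.
        if alt == "." or alt == ref:
            continue
        variants.append((chrom, pos, ref, alt))
    return variants


# One made-up VCF line with two alternative alleles yields two variants.
line = "1\t12345\t.\tA\tG,T\t50\tPASS\tDP=20\tGT\t1/2"
print(expand_alleles(line))
```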

Storage and download of results

MutationTaster results are stored in our database and can be accessed online on our server. Up to now, results are not deleted, but as soon as the QueryEngine is made public, we will store your results only for three weeks. Afterwards, they will be deleted automatically. You can download your results as a zip archive. We offer two download options:

a) download results as archive of single HTML files (only recommended for input VCFs with few variants) - the resulting archive contains all the MutationTaster results files as single HTML files. Since they are (up to now) neither divided into sub-folders, nor summarized in one overview HTML-file, this zip-archive gets bulky when many variants were processed. That's why we don't recommend it for large input VCFs. Moreover, please be sure to activate the 'create HTML files' option before submitting your VCF (otherwise we will not store the HTML results files and you cannot download them).

b) download results summarized with main features as TSV file(s) (generally recommended, especially for large input VCFs) - the resulting archive contains one TSV file with one variant per line and the following columns per variant: chromosome | position | genesymbol | pred_index | model | probability | alt_type | AAE | snp_id | allele_ref | allele_alt | f_ClinVar

We generally recommend downloading your results as TSV for two main reasons: 1) the QueryEngine will run much faster if no HTML files have to be created and saved, and 2) the resulting TSV file can be filtered and/or sorted both before and after downloading. We offer to filter out certain variants (e.g. those that were excluded due to presence in TGP) and to sort the remaining variants according to user-specified criteria (see Display / filter / export results). Once the TSV file is downloaded and stored on your own machine, you can still re-sort it with Microsoft Excel or similar spreadsheet programs. Especially when large VCF files have to be analysed (e.g. from Exome Sequencing), it is very likely that you won't look at every single HTML result file, but only at those fulfilling certain criteria (e.g. prediction disease causing or variants in certain genes). You have all the results in the TSV file and can then query the interesting variants manually.
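Besides spreadsheet programs, the downloaded TSV file can also be processed with a few lines of code. The following sketch picks variants in genes of interest and orders them by probability. It assumes that the file starts with a header line carrying the column names listed above and that the probability column is numeric. Both are assumptions, so please check them against your own download. The rows and gene names in the example are made up and show only a subset of the columns.

```python
import csv
import io


def load_results(handle):
    """Read a tab-separated results file into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def variants_in_genes(rows, genes):
    """Keep only the rows whose genesymbol is in the given collection."""
    wanted = set(genes)
    return [row for row in rows if row["genesymbol"] in wanted]


def by_probability_desc(rows):
    """Sort rows by the probability column, highest first."""
    return sorted(rows, key=lambda row: float(row["probability"]), reverse=True)


# Made-up data for illustration only (placeholder gene names).
example_tsv = io.StringIO(
    "chromosome\tposition\tgenesymbol\tprobability\n"
    "1\t1000\tGENE_A\t0.75\n"
    "2\t2000\tGENE_B\t0.99\n"
    "1\t3000\tGENE_A\t0.95\n"
)
rows = load_results(example_tsv)
for row in by_probability_desc(variants_in_genes(rows, ["GENE_A"])):
    print(row["chromosome"], row["position"], row["genesymbol"], row["probability"])
```

For a real file, replace the in-memory example with open("results.tsv", newline=""), where results.tsv is a placeholder for the name of your downloaded file.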

The option to delete your data as soon as your download is completed will soon be added.

Display / filter / export results

The results stored in our database can be sorted and filtered by different criteria for either displaying and browsing them directly on our server or for exporting them.

sort & group
1) sort & group by prediction | model | gene symbol; choose this option for sorting from prediction disease causing to prediction polymorphism, from complex_aae via simple_aae to without_aae model, from gene symbols starting with A to gene symbols starting with Z

2) sort & group by prediction | model | gene symbol | variation; similar to 1) but additional level of grouping according to the variation.

3) sort by these attributes; choose this option if you want to sort & group by customized criteria in one, two or three levels. The different criteria are:
genesymbol ASC (genesymbol from A to Z)
genesymbol DESC (genesymbol from Z to A)
chromosome ASC (chromosome from 1 to Y)
chromosome DESC (chromosome from Y to 1)
position ASC (ascending)
position DESC (descending)
pred_index ASC (prediction from disease causing to polymorphism)
pred_index DESC (prediction from polymorphism to disease causing)
pred_problem ASC (reason for prediction problem, from A to Z)
pred_problem DESC (reason for prediction problem, from Z to A)
model ASC (model used by the classifier, from without_aae via simple_aae to complex_aae)
model DESC (model used by the classifier, from complex_aae via simple_aae to without_aae)
probability ASC (probability of the prediction ascending)
probability DESC (probability of the prediction descending)
alt_type ASC (alteration type in order single base exchange, insertion and deletion, insertion, deletion)
alt_type DESC (alteration type in order deletion, insertion, insertion and deletion, single base exchange)
snp_id ASC (rs-number ascending)
snp_id DESC (rs-number descending)
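Sorting by up to three levels can be reproduced on a downloaded TSV file. The sketch below is one possible approach, not the server's implementation: it applies stable sorts from the least to the most significant criterion. It covers only the chromosome, position and probability columns with special handling; all other columns are compared as plain text. Treating chromosomes as 1 to 22 followed by X and Y is our assumption about the order "from 1 to Y".

```python
# Assumed chromosome order: 1..22, then X, then Y.
CHROM_ORDER = {str(number): number for number in range(1, 23)}
CHROM_ORDER.update({"X": 23, "Y": 24})


def sort_key(column, value):
    """Convert a text value into a sortable key depending on its column."""
    if column == "chromosome":
        return CHROM_ORDER.get(value, 99)  # unknown names go last
    if column == "position":
        return int(value)
    if column == "probability":
        return float(value)
    return value


def sort_rows(rows, criteria):
    """Sort rows by a list of (column, "ASC" or "DESC"), most significant first."""
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant
    # criterion first yields a correct multi-level ordering.
    for column, direction in reversed(criteria):
        result.sort(
            key=lambda row: sort_key(column, row[column]),
            reverse=(direction == "DESC"),
        )
    return result


rows = [
    {"chromosome": "X", "position": "5"},
    {"chromosome": "2", "position": "10"},
    {"chromosome": "2", "position": "3"},
    {"chromosome": "10", "position": "1"},
]
for row in sort_rows(rows, [("chromosome", "ASC"), ("position", "DESC")]):
    print(row["chromosome"], row["position"])
```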

hide
The following options are available to hide certain alterations: silent alterations (i.e. without amino acid exchange), all predicted polymorphisms, known polymorphisms (i.e. homozygous > 4 times in 1000Genomes Project) and prediction problems. The selected options apply both to displaying results in the browser and to downloading them as TSV.

get the data
The results can either be displayed online in your browser (choose display) or be downloaded as TSV (choose export as TSV). Filtering and sorting options are applied to both methods.

General comments on MutationTaster output

Please note: The option to show nucleotide alignment (multi-species alignment of nucleotide sequence around the submitted alteration) in the MutationTaster results is turned off by default in the QueryEngine. This is mainly due to speed issues, since the BLAST call slows down MutationTaster and the results are not used by the Bayes Classifier anyway. show nucleotide alignment is turned on by default if you use the link to re-query single variants in MutationTaster, which is provided in the results table on our server.

The QueryEngine will process the variants from the submitted VCF file in all suitable Ensembl69 transcripts. Some transcripts will not be included in the analysis, e.g. transcripts which a) have no or too many corresponding NCBI gene ID(s), b) are protein-coding but have no correct start codon (ATG) or stop codon (TGA, TAA, TAG). MutationTaster first tries to use protein-coding transcripts, and if there is at least one, it won't search for transcripts of other biotypes. Only if no protein-coding transcripts are available will it try to use transcripts of other biotypes (although certain biotypes are excluded from analysis outright, e.g. nonsense_mediated_decay, ambiguous_orf, TR_pseudogene etc.).
Please see the MutationTaster documentation for details of the MutationTaster analysis results and MutationTaster error messages.

Known bugs and limitations

only VCF files containing data from one sample can be processed

Future plans

analysis of merged VCF files with data from multiple samples or several single-sample VCF files at once
filter options to exclude / include variants present / absent in defined samples

Contact

In case you discover bugs, have suggestions or questions, please write an e-mail to
Jana Marie Schwarz (jana-marie.schwarz AT charite.de) or to
Dominik Seelow (dominik.seelow AT charite.de).
We also appreciate hearing about your general experiences using this QueryEngine and MutationTaster.