Class CheckFingerprint
java.lang.Object
    picard.cmdline.CommandLineProgram
        picard.fingerprint.CheckFingerprint

@DocumentedFeature public class CheckFingerprint extends CommandLineProgram
Checks the sample identity of the sequence/genotype data in the provided file (SAM/BAM or VCF) against a set of known genotypes in the supplied genotype file (in VCF format).

Summary
Computes a fingerprint (essentially, genotype information from different parts of the genome) from the supplied input file (SAM/BAM or VCF) and compares it to the expected fingerprint genotypes provided. The key output is a LOD score that represents the relative likelihood of the sequence data originating from the same sample as the genotypes vs. from a random sample.
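As a rough illustration of how such a LOD score can be read, the sketch below converts a base-10 LOD into a likelihood ratio and assigns it a coarse label. The class LodInterpretation, its method names, and the inconclusiveBand parameter are hypothetical and not part of Picard; in particular, the width of the "near zero" band is an assumption chosen for the example, since the documentation does not give a numeric cutoff.

```java
/** Hypothetical helper for reading a LOD score; not part of Picard. */
final class LodInterpretation {

    /** Converts a base-10 LOD score into a likelihood ratio: ratio = 10^LOD. */
    static double likelihoodRatio(double lod) {
        return Math.pow(10.0, lod);
    }

    /**
     * Labels a LOD score. The inconclusive band is an assumption made for
     * illustration; the tool's documentation only says "near zero".
     */
    static String classify(double lod, double inconclusiveBand) {
        if (Math.abs(lod) <= inconclusiveBand) {
            return "INCONCLUSIVE";
        }
        return lod > 0 ? "MATCH" : "MISMATCH";
    }

    public static void main(String[] args) {
        double lod = 6.0;
        System.out.println("likelihood ratio: " + likelihoodRatio(lod));
        System.out.println("label: " + classify(lod, 1.0));
    }
}
```

With a LOD of 6 the ratio is 10^6, i.e. the data are a million times more likely to come from the expected sample than from a random one.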
Two outputs are produced:
- A summary metrics file that gives metrics of the fingerprint matches when comparing the input to a set of genotypes for the expected sample. Metrics are reported at the single-sample level if the input was a VCF, or at the read level (lane or index within a lane) if the input was a SAM/BAM.
- A detail metrics file that contains the individual SNP/haplotype comparisons within a fingerprint comparison.

The metrics files correspond to the classes FingerprintingSummaryMetrics and FingerprintingDetailMetrics. The output files may be specified individually using the SUMMARY_OUTPUT and DETAIL_OUTPUT options. Alternatively, the OUTPUT option may be used to give the base name of the two output files, with the summary metrics having the file extension "fingerprinting_summary_metrics" and the detail metrics having the file extension "fingerprinting_detail_metrics".
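The naming rule for the OUTPUT option can be sketched as follows. The class FingerprintOutputNames and its methods are hypothetical; the two extension strings are the ones quoted above, and joining the base and the extension with a single dot is an assumption about how the names are assembled.

```java
import java.io.File;

/** Hypothetical sketch of deriving both metrics file names from an OUTPUT base. */
final class FingerprintOutputNames {
    // Extension strings as quoted in the tool documentation.
    static final String SUMMARY_SUFFIX = "fingerprinting_summary_metrics";
    static final String DETAIL_SUFFIX = "fingerprinting_detail_metrics";

    /** Assumption: base and extension are joined with a single dot. */
    static File summaryFile(String outputBase) {
        return new File(outputBase + "." + SUMMARY_SUFFIX);
    }

    /** Assumption: base and extension are joined with a single dot. */
    static File detailFile(String outputBase) {
        return new File(outputBase + "." + DETAIL_SUFFIX);
    }

    public static void main(String[] args) {
        String base = "sample_fingerprinting";
        System.out.println(summaryFile(base).getName());
        System.out.println(detailFile(base).getName());
    }
}
```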
Example comparing a BAM against known genotypes:
    java -jar picard.jar CheckFingerprint \
        INPUT=sample.bam \
        GENOTYPES=sample_genotypes.vcf \
        HAPLOTYPE_MAP=fingerprinting_haplotype_database.txt \
        OUTPUT=sample_fingerprinting
Detailed Explanation
This tool calculates a single number that reports the LOD score for the identity check between the INPUT and the GENOTYPES. A positive value indicates that the data appear to have come from the same individual; in other words, the identity checks out. The scale is logarithmic (base 10), so a LOD of 6 indicates that it is 1,000,000 times more likely that the data match the genotypes than not. A negative value indicates that the data do not match. A score near zero is inconclusive and can result from low coverage or non-informative genotypes.

The identity check makes use of haplotype blocks defined in the HAPLOTYPE_MAP file, aggregating data from several SNPs in each block to gain statistical power for detecting an identity match or a swap. This enables an identity check of samples with very low coverage (e.g. ~1x mean coverage).

When provided a VCF, the identity check looks at the PL, GL and GT fields (in that order) and uses the first one that it finds.
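The field precedence described for VCF inputs can be sketched as a simple first-match lookup. This is an illustration only: the class GenotypeFieldPrecedence and the map-based representation of a genotype's fields are assumptions made to keep the example self-contained, and it does not use the htsjdk classes that Picard itself relies on.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of the PL, then GL, then GT lookup order. */
final class GenotypeFieldPrecedence {
    // Order taken from the documentation: PL first, then GL, then GT.
    static final List<String> PRECEDENCE = Arrays.asList("PL", "GL", "GT");

    /** Returns the first field in precedence order that the genotype carries. */
    static Optional<String> firstAvailableField(Map<String, String> fields) {
        for (String key : PRECEDENCE) {
            if (fields.containsKey(key)) {
                return Optional.of(key);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new HashMap<>();
        fields.put("GT", "0/1");
        fields.put("GL", "-3.2,-0.01,-4.1");
        // GL outranks GT, and PL is absent, so GL is selected.
        System.out.println(firstAvailableField(fields).orElse("none"));
    }
}
```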
Field Summary
Fields
Modifier and Type    Field
File                 DETAIL_OUTPUT
String               EXPECTED_SAMPLE_ALIAS
static String        FINGERPRINT_DETAIL_FILE_SUFFIX
static String        FINGERPRINT_SUMMARY_FILE_SUFFIX
double               GENOTYPE_LOD_THRESHOLD
String               GENOTYPES
File                 HAPLOTYPE_MAP
boolean              IGNORE_READ_GROUPS
String               INPUT
String               OBSERVED_SAMPLE_ALIAS
String               OUTPUT
File                 SUMMARY_OUTPUT
-
Fields inherited from class picard.cmdline.CommandLineProgram
COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, GA4GH_CLIENT_SECRETS, MAX_ALLOWABLE_ONE_LINE_SUMMARY_LENGTH, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, referenceSequence, specialArgumentsCollection, TMP_DIR, USE_JDK_DEFLATER, USE_JDK_INFLATER, VALIDATION_STRINGENCY, VERBOSITY
Constructor Summary
Constructors
CheckFingerprint()
-
Method Summary
Modifier and Type     Method                           Description
protected String[]    customCommandLineValidation()    Put any custom command-line validation in an override of this method.
protected int         doWork()                         Do the work after command line has been parsed.
Methods inherited from class picard.cmdline.CommandLineProgram
checkRInstallation, getCommandLine, getCommandLineParser, getCommandLineParserForArgs, getDefaultHeaders, getFaqLink, getMetricsFile, getPGRecord, getStandardUsagePreamble, getStandardUsagePreamble, getVersion, hasWebDocumentation, instanceMain, instanceMainWithExit, makeReferenceArgumentCollection, parseArgs, requiresReference, setDefaultHeaders, useLegacyParser
Field Detail
-
INPUT
@Argument(shortName="I", doc="Input file SAM/BAM/CRAM or VCF. If a VCF is used, it must have at least one sample. If there is more than one sample in the VCF, the parameter OBSERVED_SAMPLE_ALIAS must be provided in order to indicate which sample's data to use. If there are no samples in the VCF, an exception will be thrown.") public String INPUT
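The sample-selection rule for VCF inputs can be sketched as below. The class ObservedSampleSelector is hypothetical, the exception type is an assumption (the documentation only says that an exception is thrown), and rejecting an alias that does not appear in the VCF is likewise an assumption rather than documented behaviour.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of choosing which VCF sample to fingerprint. */
final class ObservedSampleSelector {

    static String selectSample(List<String> vcfSamples, String observedSampleAlias) {
        if (vcfSamples.isEmpty()) {
            // Documented: a VCF with no samples results in an exception.
            throw new IllegalArgumentException("Input VCF contains no samples.");
        }
        if (observedSampleAlias == null) {
            if (vcfSamples.size() == 1) {
                return vcfSamples.get(0); // a single sample is unambiguous
            }
            // Documented: with several samples the alias must be provided.
            throw new IllegalArgumentException(
                    "Input VCF has several samples; OBSERVED_SAMPLE_ALIAS must be provided.");
        }
        if (!vcfSamples.contains(observedSampleAlias)) {
            // Assumption: an alias absent from the VCF is treated as an error.
            throw new IllegalArgumentException("Sample not found in VCF: " + observedSampleAlias);
        }
        return observedSampleAlias;
    }

    public static void main(String[] args) {
        System.out.println(selectSample(Arrays.asList("NA12878"), null));
        System.out.println(selectSample(Arrays.asList("NA12878", "NA12891"), "NA12891"));
    }
}
```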
-
OBSERVED_SAMPLE_ALIAS
@Argument(optional=true, doc="If the input is a VCF, this parameter is used to select which sample's data in the VCF to use.") public String OBSERVED_SAMPLE_ALIAS
-
OUTPUT
@Argument(shortName="O", doc="The base prefix of output files to write. The summary metrics will have the file extension 'fingerprinting_summary_metrics' and the detail metrics will have the extension 'fingerprinting_detail_metrics'.", mutex={"SUMMARY_OUTPUT","DETAIL_OUTPUT"}) public String OUTPUT
-
SUMMARY_OUTPUT
@Argument(shortName="S", doc="The text file to which to write summary metrics.", mutex="OUTPUT") public File SUMMARY_OUTPUT
-
DETAIL_OUTPUT
@Argument(shortName="D", doc="The text file to which to write detail metrics.", mutex="OUTPUT") public File DETAIL_OUTPUT
-
GENOTYPES
@Argument(shortName="G", doc="File of genotypes (VCF) to be used in comparison. May contain any number of genotypes; CheckFingerprint will use only those that are usable for fingerprinting.") public String GENOTYPES
-
EXPECTED_SAMPLE_ALIAS
@Argument(shortName="SAMPLE_ALIAS", optional=true, doc="This parameter can be used to specify which sample's genotypes to use from the expected VCF file (the GENOTYPES file). If it is not supplied, the sample name from the input (VCF or BAM read group header) will be used.") public String EXPECTED_SAMPLE_ALIAS
-
HAPLOTYPE_MAP
@Argument(shortName="H", doc="The file lists a set of SNPs, optionally arranged in high-LD blocks, to be used for fingerprinting. See https://software.broadinstitute.org/gatk/documentation/article?id=9526 for details.") public File HAPLOTYPE_MAP
-
GENOTYPE_LOD_THRESHOLD
@Argument(shortName="LOD", doc="When counting haplotypes checked and matching, count only haplotypes where the most likely haplotype achieves at least this LOD.") public double GENOTYPE_LOD_THRESHOLD
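A minimal sketch of the counting rule this threshold describes: only haplotypes whose most likely genotype reaches at least the threshold are counted. The class HaplotypeLodCounter, the array of per-haplotype LOD values, and the threshold value used in main are all hypothetical and serve only to show the "at least" comparison.

```java
/** Hypothetical sketch of threshold-based haplotype counting; not Picard code. */
final class HaplotypeLodCounter {

    /** Counts haplotypes whose LOD is at least the threshold (inclusive). */
    static int countAtOrAboveThreshold(double[] haplotypeLods, double threshold) {
        int count = 0;
        for (double lod : haplotypeLods) {
            if (lod >= threshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double[] lods = {0.5, 5.0, 7.2}; // illustrative values only
        double threshold = 5.0;          // illustrative threshold, not a documented default
        System.out.println(countAtOrAboveThreshold(lods, threshold));
    }
}
```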
-
IGNORE_READ_GROUPS
@Argument(optional=true, shortName="IGNORE_RG", doc="If the input is a SAM/BAM/CRAM, and this parameter is true, treat the entire input BAM as one single read group in the calculation, ignoring RG annotations, and producing a single fingerprint metric for the entire BAM.") public boolean IGNORE_READ_GROUPS
-
FINGERPRINT_SUMMARY_FILE_SUFFIX
public static final String FINGERPRINT_SUMMARY_FILE_SUFFIX
- See Also:
- Constant Field Values
-
FINGERPRINT_DETAIL_FILE_SUFFIX
public static final String FINGERPRINT_DETAIL_FILE_SUFFIX
- See Also:
- Constant Field Values
-
-
Method Detail
-
doWork
protected int doWork()
Description copied from class: CommandLineProgram
Do the work after command line has been parsed. A RuntimeException may be thrown by this method and is reported appropriately.
- Specified by:
- doWork in class CommandLineProgram
- Returns:
- program exit status.
-
customCommandLineValidation
protected String[] customCommandLineValidation()
Description copied from class: CommandLineProgram
Put any custom command-line validation in an override of this method. clp is initialized at this point and can be used to print usage and access argv. Any options set by the command-line parser can be validated.
- Overrides:
- customCommandLineValidation in class CommandLineProgram
- Returns:
- null if the command line is valid. If the command line is invalid, returns an array of error messages to be written to the appropriate place.
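The return contract (null when valid, an array of messages when invalid) can be sketched without depending on Picard itself. The class ValidationSketch and the two checks it performs are hypothetical: they are inferred from the argument descriptions (IGNORE_READ_GROUPS is described for SAM/BAM/CRAM inputs, OBSERVED_SAMPLE_ALIAS for VCF inputs) and are not claimed to be the checks that CheckFingerprint actually performs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the null-or-errors validation contract. */
final class ValidationSketch {

    /** Returns null if the options are consistent, otherwise the error messages. */
    static String[] validate(boolean inputIsVcf, boolean ignoreReadGroups,
                             String observedSampleAlias) {
        List<String> errors = new ArrayList<>();
        // Assumed check: read groups only exist in SAM/BAM/CRAM inputs.
        if (inputIsVcf && ignoreReadGroups) {
            errors.add("IGNORE_READ_GROUPS applies only to SAM/BAM/CRAM inputs.");
        }
        // Assumed check: the observed sample alias only selects among VCF samples.
        if (!inputIsVcf && observedSampleAlias != null) {
            errors.add("OBSERVED_SAMPLE_ALIAS applies only to VCF inputs.");
        }
        return errors.isEmpty() ? null : errors.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] result = validate(true, true, null);
        System.out.println(result == null ? "valid" : String.join("; ", result));
    }
}
```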
-
-