Package picard.sam

Class AbstractAlignmentMerger

  • Direct Known Subclasses:
    SamAlignmentMerger

    public abstract class AbstractAlignmentMerger
    extends Object
    Abstract class that coordinates the general task of taking in a set of alignment information, which may be in SAM format or in other formats, and merging it with the set of all reads for which alignment was attempted, stored in an unmapped SAM file.

    The order of processing is as follows:

    1. Get records from the unmapped BAM and the alignment data.
    2. Merge the alignment information and public tags ONLY from the aligned SAMRecords.
    3. Do additional modifications: handle clipping, trimming, etc.
    4. Fix up mate information on paired reads.
    5. Do a final calculation of the NM and UQ tags (coordinate sorted only).
    6. Write the records to the output file.

    Concrete subclasses that extend AbstractAlignmentMerger must implement the abstract methods getQuerynameSortedAlignedRecords and getDictionaryForMergedBam. If the records returned by getQuerynameSortedAlignedRecords are not in queryname order, mergeAlignment will throw an IllegalStateException.

    Subclasses may optionally override ignoreAlignment(), which can be used to skip over certain alignments.
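    The queryname-order requirement can be illustrated with a small standalone sketch. It is not Picard's implementation: it works on plain read-name strings instead of SAMRecords, assumes plain lexicographic String ordering (htsjdk's queryname comparator may order some names differently), and the class and method names are hypothetical. Repeated names are allowed, because the mates and secondary alignments of a read share one name.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: fail fast when read names arrive out of queryname order. */
public class QuerynameOrderCheck {

    /** Wraps an iterator of read names; throws IllegalStateException at the first name smaller than its predecessor. */
    static Iterator<String> checkedQuerynameOrder(Iterator<String> names) {
        return new Iterator<String>() {
            private String previous = null;

            @Override
            public boolean hasNext() {
                return names.hasNext();
            }

            @Override
            public String next() {
                String current = names.next();
                // Equal names are fine (mates, secondary alignments); only a decrease is an error.
                if (previous != null && previous.compareTo(current) > 0) {
                    throw new IllegalStateException(
                            "Records not in queryname order: " + previous + " followed by " + current);
                }
                previous = current;
                return current;
            }
        };
    }

    public static void main(String[] args) {
        Iterator<String> ok = checkedQuerynameOrder(List.of("read1", "read1", "read2").iterator());
        while (ok.hasNext()) {
            System.out.println(ok.next());
        }
        Iterator<String> bad = checkedQuerynameOrder(List.of("read2", "read1").iterator());
        bad.next();
        try {
            bad.next();
        } catch (IllegalStateException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

    Checking lazily inside the iterator means the error surfaces at the first offending record, without a separate pass over the aligned input.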

    • Field Detail

      • referenceFasta

        protected final File referenceFasta
    • Constructor Detail

      • AbstractAlignmentMerger

        public AbstractAlignmentMerger(File unmappedBamFile,
                                       File targetBamFile,
                                       File referenceFasta,
                                       boolean clipAdapters,
                                       boolean bisulfiteSequence,
                                       boolean alignedReadsOnly,
                                       htsjdk.samtools.SAMProgramRecord programRecord,
                                       List<String> attributesToRetain,
                                       List<String> attributesToRemove,
                                       Integer read1BasesTrimmed,
                                       Integer read2BasesTrimmed,
                                       List<htsjdk.samtools.SamPairUtil.PairOrientation> expectedOrientations,
                                       htsjdk.samtools.SAMFileHeader.SortOrder sortOrder,
                                       PrimaryAlignmentSelectionStrategy primaryAlignmentSelectionStrategy,
                                       boolean addMateCigar,
                                       boolean unmapContaminantReads)
        Constructor with a default setting for unmappingReadsStrategy. See the full constructor for parameter descriptions.
      • AbstractAlignmentMerger

        public AbstractAlignmentMerger(File unmappedBamFile,
                                       File targetBamFile,
                                       File referenceFasta,
                                       boolean clipAdapters,
                                       boolean bisulfiteSequence,
                                       boolean alignedReadsOnly,
                                       htsjdk.samtools.SAMProgramRecord programRecord,
                                       List<String> attributesToRetain,
                                       List<String> attributesToRemove,
                                       Integer read1BasesTrimmed,
                                       Integer read2BasesTrimmed,
                                       List<htsjdk.samtools.SamPairUtil.PairOrientation> expectedOrientations,
                                       htsjdk.samtools.SAMFileHeader.SortOrder sortOrder,
                                       PrimaryAlignmentSelectionStrategy primaryAlignmentSelectionStrategy,
                                       boolean addMateCigar,
                                       boolean unmapContaminantReads,
                                       AbstractAlignmentMerger.UnmappingReadStrategy unmappingReadsStrategy)
        Constructor
        Parameters:
        unmappedBamFile - The BAM file that was used as the input to the aligner, which includes information on all reads, including those that did not map. Required.
        targetBamFile - The file to which to write the merged SAM records. Required.
        referenceFasta - The reference sequence for the map files. Required.
        clipAdapters - Whether adapters marked in the unmapped BAM file should be marked as soft-clipped in the merged BAM. Required.
        bisulfiteSequence - Whether the reads are bisulfite sequence (used when calculating the NM and UQ tags). Required.
        alignedReadsOnly - Whether to output only those reads that have alignment data.
        programRecord - Program record for the SAMRecords created in the target file.
        attributesToRetain - Private attributes from the alignment record that should be included when merging. This overrides the default exclusion of attributes whose tags start with the reserved characters X, Y, and Z.
        attributesToRemove - Attributes from the alignment record that should be removed when merging. If a tag appears in both lists, attributesToRemove takes precedence over attributesToRetain.
        read1BasesTrimmed - The number of bases trimmed from start of read 1 prior to alignment. Optional.
        read2BasesTrimmed - The number of bases trimmed from start of read 2 prior to alignment. Optional.
        expectedOrientations - A List of SamPairUtil.PairOrientations that are expected for aligned pairs. Used to determine the properPair flag.
        sortOrder - The order in which the merged records should be output. If null, output will be coordinate-sorted.
        primaryAlignmentSelectionStrategy - What to do when there are multiple primary alignments, or multiple alignments but none primary, for a read or read pair.
        addMateCigar - True to add or maintain the mate CIGAR (MC) tag; false to remove it or leave it out.
        unmapContaminantReads - If true, identify reads having the signature of cross-species contamination (i.e. mostly clipped bases), and mark them as unmapped.
        unmappingReadsStrategy - An enum describing how to handle reads whose mapping information is being removed (currently this happens due to cross-species contamination). Ignored unless unmapContaminantReads is true.
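        The interplay of attributesToRetain, attributesToRemove and the reserved X/Y/Z prefixes can be sketched as a filter over the alignment record's tags. This is a hypothetical model using a plain Map instead of SAMRecord attributes; its isReservedTag stand-in only checks the first character against X, Y and Z, as the parameter description suggests, and Picard's own isReservedTag may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of choosing which alignment-record tags survive the merge. */
public class AttributeMergeSketch {

    /** Stand-in for isReservedTag: treats tags starting with X, Y or Z as reserved (private). */
    static boolean isReservedTag(String tag) {
        char first = tag.charAt(0);
        return first == 'X' || first == 'Y' || first == 'Z';
    }

    /** Returns the tags to copy: public tags plus explicitly retained reserved ones, minus removed ones. */
    static Map<String, Object> tagsToCopy(Map<String, Object> alignmentTags,
                                          Set<String> attributesToRetain,
                                          Set<String> attributesToRemove) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : alignmentTags.entrySet()) {
            String tag = entry.getKey();
            if (attributesToRemove.contains(tag)) {
                continue; // removal takes precedence over retention
            }
            if (!isReservedTag(tag) || attributesToRetain.contains(tag)) {
                result.put(tag, entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> tags = new LinkedHashMap<>();
        tags.put("NM", 2);
        tags.put("XS", 17);
        tags.put("XA", "chr2,+100,50M,1;");
        tags.put("AS", 48);
        System.out.println(tagsToCopy(tags, Set.of("XS", "AS"), Set.of("AS"))); // {NM=2, XS=17}
    }
}
```

    In the example, XS survives only because it is retained, XA is dropped as a reserved tag, and AS is dropped because it appears in both lists.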
    • Method Detail

      • getDictionaryForMergedBam

        protected abstract htsjdk.samtools.SAMSequenceDictionary getDictionaryForMergedBam()
      • getQuerynameSortedAlignedRecords

        protected abstract htsjdk.samtools.util.CloseableIterator<htsjdk.samtools.SAMRecord> getQuerynameSortedAlignedRecords()
      • ignoreAlignment

        protected boolean ignoreAlignment(htsjdk.samtools.SAMRecord sam)
      • isContaminant

        protected boolean isContaminant(picard.sam.HitsForInsert hits)
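        isContaminant carries no description here. Going by the unmapContaminantReads parameter (reads whose signature is mostly clipped bases), one plausible check counts the read bases that are aligned rather than clipped and compares the count to a threshold. This is an assumption-laden sketch: it inspects a single CIGAR string rather than a HitsForInsert, and the minUnclippedBases threshold and all names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a "mostly clipped" contamination test on one CIGAR string. */
public class ContaminantSketch {

    private static final Pattern CIGAR_ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Counts read bases that are aligned rather than clipped: M, I, = and X consume read bases without clipping. */
    static int unclippedReadBases(String cigar) {
        Matcher m = CIGAR_ELEMENT.matcher(cigar);
        int count = 0;
        while (m.find()) {
            int length = Integer.parseInt(m.group(1));
            char op = m.group(2).charAt(0);
            if (op == 'M' || op == 'I' || op == '=' || op == 'X') {
                count += length;
            }
        }
        return count;
    }

    /** Hypothetical rule: too few aligned bases marks the read as a likely contaminant. */
    static boolean looksLikeContaminant(String cigar, int minUnclippedBases) {
        return unclippedReadBases(cigar) < minUnclippedBases;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeContaminant("20S80M", 32)); // false
        System.out.println(looksLikeContaminant("75S25M", 32)); // true
    }
}
```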
      • getAttributesToReverse

        public Set<String> getAttributesToReverse()
        Gets the set of attributes to be reversed on reads marked as negative strand.
      • setAttributesToReverse

        public void setAttributesToReverse(Set<String> attributesToReverse)
        Sets the set of attributes to be reversed on reads marked as negative strand.
      • getAttributesToReverseComplement

        public Set<String> getAttributesToReverseComplement()
        Gets the set of attributes to be reverse complemented on reads marked as negative strand.
      • setAttributesToReverseComplement

        public void setAttributesToReverseComplement(Set<String> attributesToReverseComplement)
        Sets the set of attributes to be reverse complemented on reads marked as negative strand.
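        For a read marked as negative strand, per-base tag values must be flipped to stay in step with the reverse-complemented read. A minimal sketch of the two transformations, assuming string-valued tags: quality-like values are simply reversed, while base-like values are reverse complemented. The class and method names are hypothetical, and array-valued tags are not covered.

```java
/** Hypothetical sketch of the transformations applied to per-base tags on negative-strand reads. */
public class NegativeStrandAttributes {

    /** Reverses a per-base value such as a quality string. */
    static String reverse(String value) {
        return new StringBuilder(value).reverse().toString();
    }

    /** Reverse-complements a base string; characters other than A, C, G, T (either case) are kept as-is. */
    static String reverseComplement(String bases) {
        StringBuilder out = new StringBuilder(bases.length());
        for (int i = bases.length() - 1; i >= 0; i--) {
            out.append(complement(bases.charAt(i)));
        }
        return out.toString();
    }

    private static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default: return base; // e.g. N stays N
        }
    }

    public static void main(String[] args) {
        System.out.println(reverse("#,AFF"));           // FFA,#
        System.out.println(reverseComplement("ACGTN")); // NACGT
    }
}
```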
      • setMaxRecordsInRam

        public void setMaxRecordsInRam(int maxRecordsInRam)
        Allows the caller to override the maximum number of records held in RAM.
      • setAddPGTagToReads

        public void setAddPGTagToReads(boolean addPGTagToReads)
        Sets addPGTagToReads. If true, the PG tag will be added to reads when applicable; if false, it will not be added. Default is true.
      • mergeAlignment

        public void mergeAlignment(File referenceFasta)
        Merges the alignment data with the non-aligned records from the source BAM file.
      • fixNmMdAndUq

        public static void fixNmMdAndUq(htsjdk.samtools.SAMRecord record,
                                        htsjdk.samtools.reference.ReferenceSequenceFileWalker refSeqWalker,
                                        boolean isBisulfiteSequence)
        Calculates and sets the NM, MD, and UQ tags from the record and the reference.
        Parameters:
        record - the record to be fixed
        refSeqWalker - a ReferenceSequenceFileWalker that will be used to traverse the reference
        isBisulfiteSequence - a flag indicating whether the sequence came from bisulfite sequencing, which implies a different calculation of the NM tag. No return value; modifies the provided record.
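        To make the three tags concrete, the sketch below computes them for the simplest case: an alignment whose CIGAR is a single M operation, given the read bases, the matching stretch of reference and the base qualities. NM counts mismatches (indels would add their lengths), MD records the run of matching bases before each mismatched reference base, and UQ is computed here as the sum of base qualities at mismatched positions. It is a hypothetical illustration, not the code fixNmMdAndUq runs; it ignores indels, clipping and the bisulfite adjustment.

```java
/** Hypothetical sketch: NM, MD and UQ for an ungapped (single M) alignment. */
public class NmMdUqSketch {

    /** Holds the three tag values for one alignment. */
    static final class Tags {
        final int nm;
        final String md;
        final int uq;

        Tags(int nm, String md, int uq) {
            this.nm = nm;
            this.md = md;
            this.uq = uq;
        }
    }

    /** readBases and refBases are aligned position by position; quals holds Phred scores per read base. */
    static Tags computeUngapped(String readBases, String refBases, int[] quals) {
        int nm = 0;
        int uq = 0;
        int runOfMatches = 0;
        StringBuilder md = new StringBuilder();
        for (int i = 0; i < readBases.length(); i++) {
            char read = Character.toUpperCase(readBases.charAt(i));
            char ref = Character.toUpperCase(refBases.charAt(i));
            if (read == ref) {
                runOfMatches++;
            } else {
                nm++;
                uq += quals[i];
                md.append(runOfMatches).append(ref); // MD stores the reference base at each mismatch
                runOfMatches = 0;
            }
        }
        md.append(runOfMatches); // MD always ends with a (possibly zero) match count
        return new Tags(nm, md.toString(), uq);
    }

    public static void main(String[] args) {
        int[] quals = {30, 30, 30, 20, 30, 30, 30, 30};
        Tags t = computeUngapped("ACGAACGT", "ACGTACGT", quals);
        System.out.println("NM=" + t.nm + " MD=" + t.md + " UQ=" + t.uq); // NM=1 MD=3T4 UQ=20
    }
}
```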
      • fixUq

        public static void fixUq(htsjdk.samtools.SAMRecord record,
                                 htsjdk.samtools.reference.ReferenceSequenceFileWalker refSeqWalker,
                                 boolean isBisulfiteSequence)
        Calculates and sets the UQ tag from the record and the reference.
        Parameters:
        record - the record to be fixed
        refSeqWalker - a ReferenceSequenceFileWalker that will be used to traverse the reference
        isBisulfiteSequence - a flag indicating whether the sequence came from bisulfite sequencing. No return value; modifies the provided record.
      • encodeMappingInformation

        public static String encodeMappingInformation(htsjdk.samtools.SAMRecord rec)
        Encodes mapping information from a record into a string according to the format specified in the SAM spec under the SA tag. There is no protection against missing values (for the CIGAR and the NM tag). (It might make sense to move this to htsjdk.)
        Parameters:
        rec - SAMRecord whose alignment information will be encoded
        Returns:
        String encoding rec's alignment information according to SA tag in the SAM spec
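        The SAM specification defines each SA entry as rname,pos,strand,CIGAR,mapQ,NM terminated by a semicolon. The sketch below formats one entry from plain values instead of a SAMRecord; the class and method names are hypothetical. A null NM is written as the text null, mirroring the missing-value caveat in the description.

```java
/** Hypothetical sketch of formatting one SA-tag entry. */
public class SaTagSketch {

    /** Builds rname,pos,strand,CIGAR,mapQ,NM; with pos as the 1-based alignment start. */
    static String encodeSaEntry(String contig, int alignmentStart, boolean negativeStrand,
                                String cigar, int mappingQuality, Integer nm) {
        return String.join(",",
                contig,
                Integer.toString(alignmentStart),
                negativeStrand ? "-" : "+",
                cigar,
                Integer.toString(mappingQuality),
                String.valueOf(nm)) + ";"; // String.valueOf(Object) yields "null" for a missing NM
    }

    public static void main(String[] args) {
        System.out.println(encodeSaEntry("chr1", 10000, true, "50M50S", 60, 0)); // chr1,10000,-,50M50S,60,0;
    }
}
```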
      • clipForOverlappingReads

        protected static void clipForOverlappingReads(htsjdk.samtools.SAMRecord read1,
                                                      htsjdk.samtools.SAMRecord read2,
                                                      boolean useHardClipping)
        Checks whether the ends of the reads overlap and clips reads if necessary. For inward-facing read pairs, this method will soft clip the 3' end of each read so that the 3' aligned end of each read does not extend past the 5' aligned end of its mate. If useHardClipping is true, this method will additionally hard clip the 3' end of each read if necessary so that the 3' end of each read (including soft-clipped bases) does not extend past the 5' end of its mate (including soft-clipped bases).

        Some examples are illustrative:

            <-MMMMMMMMMMMMMMMMM
                 MMMMMMMMMMMMMMMMM->

        will be soft-clipped to

            <-SSSMMMMMMMMMMMMMM
                 MMMMMMMMMMMMMMSSS->

        and with useHardClipping true, this would then be hard-clipped to

            <-HHHMMMMMMMMMMMMMM
                 MMMMMMMMMMMMMMHHH->

        A more complicated example:

            <-MMMMMMMMMMMMMMMSS
                 MMMMMMMMMMMMMMMMM->

        will be soft-clipped to

            <-SSSMMMMMMMMMMMMSS
                 MMMMMMMMMMMMSSSSS->

        and with useHardClipping true, this would then be hard-clipped to

            <-HHHMMMMMMMMMMMMSS
                 MMMMMMMMMMMMSSHHH->

        Note that the soft-clipping is done such that the clipped starts and ends of each read are the same, and hard-clipping is done such that the unclipped starts and ends of each read are the same.
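        The clip lengths in these examples follow from comparing the aligned coordinates (for soft clipping) or the unclipped coordinates (for hard clipping) of the two mates. A minimal sketch, assuming an inward-facing pair, 1-based inclusive reference coordinates and no indels, so that reference positions map one-to-one onto read bases; the class and method names are hypothetical.

```java
/** Hypothetical sketch of the clip arithmetic for an inward-facing, overlapping read pair. */
public class OverlapClipSketch {

    /** Bases to clip from the 3' (right) end of the forward read so it does not pass its mate's end. */
    static int forwardReadClip(int forwardEnd, int reverseEnd) {
        return Math.max(0, forwardEnd - reverseEnd);
    }

    /** Bases to clip from the 3' (left) end of the reverse read so it does not precede its mate's start. */
    static int reverseReadClip(int reverseStart, int forwardStart) {
        return Math.max(0, forwardStart - reverseStart);
    }

    public static void main(String[] args) {
        // Second example: reverse read unclipped at 1-17 but aligned at 1-15; forward read at 4-20.
        // Soft clipping compares aligned coordinates.
        System.out.println(forwardReadClip(20, 15)); // 5
        System.out.println(reverseReadClip(1, 4));   // 3
        // Hard clipping compares unclipped coordinates.
        System.out.println(forwardReadClip(20, 17)); // 3
        System.out.println(reverseReadClip(1, 4));   // 3
    }
}
```

    Using the same two helpers for both passes reflects the note above: soft clipping equalizes the clipped starts and ends, and hard clipping equalizes the unclipped ones.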
      • setValuesFromAlignment

        protected void setValuesFromAlignment(htsjdk.samtools.SAMRecord rec,
                                              htsjdk.samtools.SAMRecord alignment,
                                              boolean needsSafeReverseComplement)
        Sets the values from the alignment record on the unaligned BAM record. This preserves all data from the unaligned record (ReadGroup, NoiseRead status, etc.) and adds all the alignment information.
        Parameters:
        rec - The unaligned read record
        alignment - The alignment record
      • createNewCigarsIfMapsOffEndOfReference

        public static void createNewCigarsIfMapsOffEndOfReference(htsjdk.samtools.SAMRecord rec)
        Soft-clips an alignment that hangs off the end of its reference sequence. Checks both the read and its mate, if available.
        Parameters:
        rec - the record to check; its CIGAR is replaced if the alignment extends past the end of its reference sequence.
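        For the simplest case, a read aligned with a single M operation, the overhang is the alignment end minus the reference sequence length, and those bases become a trailing soft clip. The sketch below assumes that case and 1-based coordinates, and uses a hypothetical method name; it does not handle the mate's CIGAR or alignments that already contain clips or indels.

```java
/** Hypothetical sketch of soft-clipping an all-match alignment that runs off the reference end. */
public class OffEndClipSketch {

    /** Returns the CIGAR for readLength bases aligned at alignmentStart on a reference of referenceLength bases. */
    static String clipOffEnd(int alignmentStart, int readLength, int referenceLength) {
        if (alignmentStart > referenceLength) {
            throw new IllegalArgumentException("Alignment starts past the end of the reference");
        }
        int alignmentEnd = alignmentStart + readLength - 1;
        int overhang = alignmentEnd - referenceLength;
        if (overhang <= 0) {
            return readLength + "M"; // fits entirely on the reference
        }
        return (readLength - overhang) + "M" + overhang + "S";
    }

    public static void main(String[] args) {
        System.out.println(clipOffEnd(931, 100, 1000)); // 70M30S
        System.out.println(clipOffEnd(100, 100, 1000)); // 100M
    }
}
```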
      • updateCigarForTrimmedOrClippedBases

        protected void updateCigarForTrimmedOrClippedBases(htsjdk.samtools.SAMRecord rec,
                                                           htsjdk.samtools.SAMRecord alignment)
      • getProgramRecord

        protected htsjdk.samtools.SAMProgramRecord getProgramRecord()
      • setProgramRecord

        protected void setProgramRecord(htsjdk.samtools.SAMProgramRecord pg)
      • isReservedTag

        protected boolean isReservedTag(String tag)
      • getHeader

        protected htsjdk.samtools.SAMFileHeader getHeader()
      • resetRefSeqFileWalker

        protected void resetRefSeqFileWalker()
      • isClipOverlappingReads

        public boolean isClipOverlappingReads()
      • setClipOverlappingReads

        public void setClipOverlappingReads(boolean clipOverlappingReads)
      • setHardClipOverlappingReads

        public void setHardClipOverlappingReads(boolean hardClipOverlappingReads)
      • isKeepAlignerProperPairFlags

        public boolean isKeepAlignerProperPairFlags()
      • setKeepAlignerProperPairFlags

        public void setKeepAlignerProperPairFlags(boolean keepAlignerProperPairFlags)
        If true, keep the aligner's proper-pair flags rather than letting the alignment merger decide.
      • setIncludeSecondaryAlignments

        public void setIncludeSecondaryAlignments(boolean includeSecondaryAlignments)
      • close

        public void close()