Class for helping to filter a SAM/BAM record.
#include <SamFilter.h>
Return type | Member function | Description
---|---|---
static FilterStatus | clipOnMismatchThreshold (SamRecord &record, GenomeSequence &refSequence, double mismatchThreshold) | Clip the read based on the specified mismatch threshold.
static FilterStatus | softClip (SamRecord &record, int32_t numFrontClips, int32_t numBackClips) | Soft clip the record from the front and/or the back.
static FilterStatus | softClip (Cigar &oldCigar, int32_t numFrontClips, int32_t numBackClips, int32_t &startPos, CigarRoller &updatedCigar) | Soft clip the cigar from the front and/or the back, writing the result into updatedCigar. updatedCigar and startPos are updated only if the returned FilterStatus is CLIPPED.
static FilterStatus | filterOnMismatchQuality (SamRecord &record, GenomeSequence &refSequence, uint32_t qualityThreshold, uint8_t defaultQualityInt) | Filter the read based on the specified quality threshold.
static uint32_t | sumMismatchQuality (SamRecord &record, GenomeSequence &refSequence, uint8_t defaultQualityInt) | Get the sum of the qualities of all mismatches in the record.
static void | filterRead (SamRecord &record) | Filter the read by marking it as unmapped.
SamFilter is defined at line 25 of file SamFilter.h.
◆ FilterStatus
Enum describing what sort of filtering was done.
Enumerator | Description
---|---
NONE | The filter did not affect the read.
CLIPPED | Filtering clipped the read.
FILTERED | Filtering caused the read to be modified to unmapped.
Definition at line 29 of file SamFilter.h.
◆ clipOnMismatchThreshold()
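Clip the read based on the specified mismatch threshold. The threshold is compared against the fraction of mismatched bases (mismatches divided by bases examined, a value from 0 to 1), counted separately from the front and from the back of the read.
- Returns
- How the read was affected: NONE if the read was not modified, CLIPPED if the read was clipped, or FILTERED if the whole read would have been clipped, in which case the read was instead marked unmapped.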
Definition at line 27 of file SamFilter.cpp.
```cpp
// Excerpt from SamFilter.cpp; "// ..." marks source lines not shown here.
const int32_t initialLastFrontClipPos = -1;
int32_t lastFrontClipPos = initialLastFrontClipPos;
// ...
int32_t firstBackClipPos = readLength;
// ...
bool fromFrontComplete = false;
bool fromBackComplete = false;
int32_t numBasesFromFront = 0;
int32_t numBasesFromBack = 0;
int32_t numMismatchFromFront = 0;
int32_t numMismatchFromBack = 0;
// ...
while(!fromFrontComplete || !fromBackComplete)
    // ...
    while(!fromFrontComplete &&
          ((numBasesFromFront <= numBasesFromBack) ||
        // ...
        if(iterFromFront.getNextMatchMismatch(baseMatchInfo) == false)
            // ...
            fromFrontComplete = true;
        // ...
            fromFrontComplete = true;
        // ...
        if(baseMatchInfo.getType() == SamSingleBaseMatchInfo::MISMATCH)
            // ...
            ++numMismatchFromFront;
            // ...
            double mismatchPercent =
                (double)numMismatchFromFront / numBasesFromFront;
            if(mismatchPercent > mismatchThreshold)
                // ...
                numBasesFromFront = 0;
                numMismatchFromFront = 0;
    // ...
    while(!fromBackComplete &&
          ((numBasesFromBack <= numBasesFromFront) ||
           (fromFrontComplete)))
        // ...
        if(iterFromBack.getNextMatchMismatch(baseMatchInfo) == false)
            // ...
            fromBackComplete = true;
        // ...
            fromBackComplete = true;
        // ...
        if(baseMatchInfo.getType() == SamSingleBaseMatchInfo::MISMATCH)
            // ...
            ++numMismatchFromBack;
            // ...
            double mismatchPercent =
                (double)numMismatchFromBack / numBasesFromBack;
            if(mismatchPercent > mismatchThreshold)
                // ...
                numBasesFromBack = 0;
                numMismatchFromBack = 0;
// ...
return(softClip(record, lastFrontClipPos + 1, readLength - firstBackClipPos));
```
References SamQuerySeqWithRefIter::getNextMatchMismatch(), SamSingleBaseMatchInfo::getQueryIndex(), SamRecord::getReadLength(), SamSingleBaseMatchInfo::getType(), and softClip().
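The two-ended scan in the excerpt above can be condensed into a standalone sketch. It is not the library code: a hypothetical AlignedBase list (the query index and a mismatch flag for each base aligned to the reference) stands in for SamRecord, GenomeSequence, and SamQuerySeqWithRefIter, and the elided lines (where the clip positions are updated, and where each scan stops on reaching the other side's clip point) are filled with assumptions consistent with the visible code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-base information an iterator such as
// SamQuerySeqWithRefIter yields: query index and whether it mismatches.
struct AlignedBase {
    int32_t queryIndex;
    bool mismatch;
};

// Returns {numFrontClips, numBackClips}. Scans inward from both ends without
// letting either side get ahead, and moves a clip point whenever the mismatch
// fraction since the previous clip point exceeds mismatchThreshold (0..1).
std::pair<int32_t, int32_t> mismatchClipCounts(
    const std::vector<AlignedBase>& bases, int32_t readLength,
    double mismatchThreshold)
{
    int32_t lastFrontClipPos = -1;          // nothing clipped from the front yet
    int32_t firstBackClipPos = readLength;  // nothing clipped from the back yet
    bool fromFrontComplete = false;
    bool fromBackComplete = false;
    int32_t numBasesFromFront = 0, numBasesFromBack = 0;
    int32_t numMismatchFromFront = 0, numMismatchFromBack = 0;
    std::size_t front = 0;            // next base to read from the front
    std::size_t back = bases.size();  // one past the next base from the back

    while (!fromFrontComplete || !fromBackComplete) {
        while (!fromFrontComplete &&
               (numBasesFromFront <= numBasesFromBack || fromBackComplete)) {
            // Assumed stop rule: out of bases, or reached the back clip point.
            if (front >= bases.size() ||
                bases[front].queryIndex >= firstBackClipPos) {
                fromFrontComplete = true;
                break;
            }
            const AlignedBase& base = bases[front++];
            ++numBasesFromFront;
            if (base.mismatch) {
                ++numMismatchFromFront;
                double fraction = (double)numMismatchFromFront / numBasesFromFront;
                if (fraction > mismatchThreshold) {
                    lastFrontClipPos = base.queryIndex;  // clip through this base
                    numBasesFromFront = 0;
                    numMismatchFromFront = 0;
                }
            }
        }
        while (!fromBackComplete &&
               (numBasesFromBack <= numBasesFromFront || fromFrontComplete)) {
            // Assumed stop rule: out of bases, or reached the front clip point.
            if (back == 0 || bases[back - 1].queryIndex <= lastFrontClipPos) {
                fromBackComplete = true;
                break;
            }
            const AlignedBase& base = bases[--back];
            ++numBasesFromBack;
            if (base.mismatch) {
                ++numMismatchFromBack;
                double fraction = (double)numMismatchFromBack / numBasesFromBack;
                if (fraction > mismatchThreshold) {
                    firstBackClipPos = base.queryIndex;  // clip from this base on
                    numBasesFromBack = 0;
                    numMismatchFromBack = 0;
                }
            }
        }
    }
    // Same conversion as the final softClip() call in the excerpt.
    return {lastFrontClipPos + 1, readLength - firstBackClipPos};
}
```

The returned pair corresponds to the arguments of the final softClip() call in the excerpt; when the two counts add up to the read length, the whole read would be clipped, which the documented function reports as FILTERED.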
◆ filterOnMismatchQuality()
Filter the read based on the specified quality threshold.
- Returns
- How the read was affected: NONE if the read was not modified, or FILTERED if the read was marked unmapped because the sum of its mismatch qualities exceeded qualityThreshold.
Definition at line 430 of file SamFilter.cpp.
```cpp
// Excerpt from SamFilter.cpp; "// ..." marks source lines not shown here.
uint32_t totalMismatchQuality =
    // ...
if(totalMismatchQuality > qualityThreshold)
```
References FILTERED, filterRead(), NONE, and sumMismatchQuality().
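The decision in filterOnMismatchQuality() reduces to a strict comparison of the summed mismatch quality against qualityThreshold. A minimal sketch, assuming a hypothetical SketchRecord whose mismatch qualities are already known (the real function obtains the total from sumMismatchQuality()) and assuming that marking a read unmapped means setting SAM FLAG bit 0x4:

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

enum class FilterStatus { NONE, CLIPPED, FILTERED };

// Hypothetical minimal record: the SAM FLAG plus the base qualities of the
// positions that mismatch the reference.
struct SketchRecord {
    uint16_t flag = 0;
    std::vector<uint32_t> mismatchQualities;
};

// Assumed stand-in for filterRead(): set the SAM "segment unmapped" bit.
void markUnmappedSketch(SketchRecord& record) {
    record.flag |= 0x4;
}

FilterStatus filterOnMismatchQualitySketch(SketchRecord& record,
                                           uint32_t qualityThreshold) {
    uint32_t totalMismatchQuality = std::accumulate(
        record.mismatchQualities.begin(), record.mismatchQualities.end(),
        uint32_t{0});
    // Strictly greater than, as in the excerpt: a total equal to the
    // threshold leaves the read alone.
    if (totalMismatchQuality > qualityThreshold) {
        markUnmappedSketch(record);
        return FilterStatus::FILTERED;
    }
    return FilterStatus::NONE;
}
```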
◆ softClip() [1/2]
Soft clip the cigar from the front and/or the back, writing the result into updatedCigar. updatedCigar and startPos are updated only if the returned FilterStatus is CLIPPED.
- Parameters

Parameter | Description
---|---
oldCigar | Cigar prior to clipping.
numFrontClips | Number of bases to clip from the front of the read (total count, including any bases already clipped).
numBackClips | Number of bases to clip from the back of the read (total count, including any bases already clipped).
startPos | Input: the 0-based start position of the cigar before clipping. Output: the 0-based start position after clipping; updated only if the result is CLIPPED.
updatedCigar | Output: set to the clipped cigar; updated only if the result is CLIPPED.
Definition at line 191 of file SamFilter.cpp.
```cpp
// Excerpt from SamFilter.cpp; "// ..." marks source lines not shown here.
int32_t endClipPos = readLength - numBackClips;
// ...
if((numFrontClips != 0) || (numBackClips != 0))
    // ...
    int32_t totalClips = numFrontClips + numBackClips;
    if(totalClips >= readLength)
        // ...
    int origCigarOpIndex = 0;
    // ...
    int32_t numPositions = 0;
    // ...
    bool onlyClips = true;
    // ...
    while((origCigarOpIndex < oldCigar.size()) &&
          (numPositions < numFrontClips))
        // ...
        switch(op->operation)
        // ...
        numPositions += op->count;
    // ...
    if(numFrontClips != 0)
        // ...
        int32_t newCount = numPositions - numFrontClips;
        // ...
        if(numPositions > endClipPos)
            // ...
            newCount -= (numPositions - endClipPos);
        // ...
        updatedCigar.Add(op->operation, newCount);
    // ...
    while((origCigarOpIndex < oldCigar.size()) &&
          (numPositions <= endClipPos))
        // ...
        uint32_t numPosTilClip = endClipPos - numPositions;
        // ...
        if(numPosTilClip < op->count)
            // ...
            if(numPosTilClip != 0)
                // ...
                updatedCigar.Add(op->operation,
                    // ...
        numPositions += op->count;
    // ...
    if(numBackClips != 0)
        // ...
        while(origCigarOpIndex < oldCigar.size())
    // ...
    if(numFrontClips > 0)
        // ...
        int32_t lastFrontClipPos = numFrontClips - 1;
        // ...
        startPos = newStartPos + 1;
```
References CigarRoller::Add(), CLIPPED, Cigar::del, FILTERED, Cigar::foundInQuery(), Cigar::getExpectedQueryBaseCount(), Cigar::getOperator(), Cigar::getRefPosition(), Cigar::hardClip, Cigar::INDEX_NA, Cigar::insert, Cigar::isClip(), Cigar::match, Cigar::mismatch, NONE, Cigar::none, Cigar::size(), Cigar::skip, and Cigar::softClip.
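The cigar-level softClip() can be approximated with a much simpler, self-contained sketch. It uses a hypothetical CigarOp vector rather than Cigar/CigarRoller, models only the M, I, D, N, S, and H operations, and computes the new start position by counting the reference bases that fall inside the new front clip instead of calling getRefPosition(). The NONE and FILTERED checks follow the visible excerpt; the rest is an assumption, not the library's algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum class FilterStatus { NONE, CLIPPED, FILTERED };

// Hypothetical simplified CIGAR element; not libStatGen's Cigar/CigarRoller.
struct CigarOp {
    char op;        // one of 'M', 'I', 'D', 'N', 'S', 'H' ('=', 'X', 'P' not modeled)
    int32_t count;
};

bool consumesQuery(char op) { return op == 'M' || op == 'I' || op == 'S'; }

// Appends an operation, merging it with the previous one when the type matches.
void addOp(std::vector<CigarOp>& cigar, char op, int32_t count) {
    if (count <= 0) return;
    if (!cigar.empty() && cigar.back().op == op) cigar.back().count += count;
    else cigar.push_back({op, count});
}

FilterStatus softClipCigarSketch(const std::vector<CigarOp>& oldCigar,
                                 int32_t numFrontClips, int32_t numBackClips,
                                 int32_t& startPos, std::vector<CigarOp>& updatedCigar) {
    int32_t readLength = 0;
    for (const CigarOp& c : oldCigar)
        if (consumesQuery(c.op)) readLength += c.count;
    if (numFrontClips == 0 && numBackClips == 0) return FilterStatus::NONE;
    if (numFrontClips + numBackClips >= readLength) return FilterStatus::FILTERED;

    const int32_t endClipPos = readLength - numBackClips;  // first back-clipped query index
    std::vector<CigarOp> leadingHard, middle, trailingHard;
    int32_t queryPos = 0;  // query bases consumed before the current operation
    int32_t refShift = 0;  // reference bases now covered by the front clip
    for (const CigarOp& c : oldCigar) {
        if (c.op == 'H') {
            addOp(queryPos == 0 ? leadingHard : trailingHard, 'H', c.count);
        } else if (consumesQuery(c.op)) {
            const int32_t begin = queryPos, end = queryPos + c.count;
            // Keep only the part of this operation inside [numFrontClips, endClipPos).
            addOp(middle, c.op, std::min(end, endClipPos) - std::max(begin, numFrontClips));
            // Aligned bases that move into the front clip shift the start position.
            if (c.op == 'M')
                refShift += std::max<int32_t>(0, std::min(end, numFrontClips) - begin);
            queryPos = end;
        } else if (queryPos > numFrontClips && queryPos < endClipPos) {
            addOp(middle, c.op, c.count);  // D/N strictly inside the kept region
        } else if (queryPos <= numFrontClips) {
            refShift += c.count;  // D/N at or before the front boundary: dropped, skipped over
        }  // D/N at or after the back boundary: dropped
    }
    std::vector<CigarOp> clipped = leadingHard;
    addOp(clipped, 'S', numFrontClips);
    for (const CigarOp& c : middle) addOp(clipped, c.op, c.count);
    addOp(clipped, 'S', numBackClips);
    for (const CigarOp& c : trailingHard) addOp(clipped, c.op, c.count);

    // Outputs are written only on CLIPPED, matching the documented contract.
    updatedCigar = clipped;
    startPos += refShift;
    return FilterStatus::CLIPPED;
}

std::string toCigarString(const std::vector<CigarOp>& cigar) {
    std::string s;
    for (const CigarOp& c : cigar) s += std::to_string(c.count) + c.op;
    return s;
}
```

Deletions or skips that would end up adjacent to a new soft clip are dropped here (and, at the front, counted toward the start-position shift); the excerpt does not show how the library handles that case.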
◆ softClip() [2/2]
Soft clip the record from the front and/or the back.
- Parameters

Parameter | Description
---|---
record | Record to be clipped (input/output parameter).
numFrontClips | Number of bases to clip from the front of the read (total count, including any bases already clipped).
numBackClips | Number of bases to clip from the back of the read (total count, including any bases already clipped).
Definition at line 155 of file SamFilter.cpp.
```cpp
// Excerpt from SamFilter.cpp.
status = softClip(*cigar, numFrontClips, numBackClips,
                  startPos, updatedCigar);
```
References CLIPPED, FILTERED, filterRead(), SamRecord::get0BasedPosition(), SamRecord::getCigarInfo(), NONE, SamRecord::set0BasedPosition(), and SamRecord::setCigar().
Referenced by clipOnMismatchThreshold().
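Judging by the functions it references (get0BasedPosition(), getCigarInfo(), set0BasedPosition(), setCigar(), filterRead()), the record-level softClip() delegates to the cigar-level overload and then applies the result. A sketch of that dispatch, assuming a hypothetical SketchRecord and taking the cigar-level clipper as a callable so the block stands alone; using FLAG bit 0x4 as the stand-in for filterRead() is also an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

enum class FilterStatus { NONE, CLIPPED, FILTERED };

// Hypothetical minimal record; not libStatGen's SamRecord.
struct SketchRecord {
    uint16_t flag = 0;   // SAM FLAG field
    int32_t pos0 = 0;    // 0-based leftmost position
    std::string cigar;   // SAM-formatted CIGAR string
};

// Any cigar-level clipper with the documented contract: startPos and
// updatedCigar are written only when it returns CLIPPED.
using CigarClipper = std::function<FilterStatus(
    const std::string& oldCigar, int32_t numFrontClips, int32_t numBackClips,
    int32_t& startPos, std::string& updatedCigar)>;

FilterStatus softClipRecordSketch(SketchRecord& record, int32_t numFrontClips,
                                  int32_t numBackClips, const CigarClipper& clipCigar) {
    int32_t startPos = record.pos0;  // like get0BasedPosition()
    std::string updatedCigar;
    FilterStatus status = clipCigar(record.cigar, numFrontClips, numBackClips,
                                    startPos, updatedCigar);
    if (status == FilterStatus::FILTERED) {
        record.flag |= 0x4;           // assumed filterRead(): set the unmapped bit
    } else if (status == FilterStatus::CLIPPED) {
        record.pos0 = startPos;       // like set0BasedPosition()
        record.cigar = updatedCigar;  // like setCigar()
    }
    return status;  // NONE leaves the record untouched
}
```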
◆ sumMismatchQuality()
uint32_t SamFilter::sumMismatchQuality (SamRecord &record, GenomeSequence &refSequence, uint8_t defaultQualityInt) [static]
Get the sum of the qualities of all mismatches in the record.
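This page documents only the signature of sumMismatchQuality(). A sketch of one plausible implementation, assuming the mismatch positions are already known (the real function compares the read with refSequence), that qualities are Phred+33 encoded as in the SAM QUAL field, and that defaultQualityInt substitutes for a missing quality (a QUAL of "*"); sumMismatchQualitySketch and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Sums Phred qualities at the given mismatch positions of the read.
uint32_t sumMismatchQualitySketch(const std::string& qual,
                                  const std::vector<int32_t>& mismatchQueryIndexes,
                                  uint8_t defaultQualityInt) {
    uint32_t total = 0;
    for (int32_t index : mismatchQueryIndexes) {
        bool known = qual != "*" && index >= 0 &&
                     static_cast<std::size_t>(index) < qual.size();
        // Phred+33: '!' (ASCII 33) is quality 0, 'I' (ASCII 73) is quality 40.
        total += known ? static_cast<uint32_t>(qual[index] - 33) : defaultQualityInt;
    }
    return total;
}
```

The returned total is the value filterOnMismatchQuality() compares against qualityThreshold.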