libStatGen Software
1
Allows the user to easily read/write a SAM/BAM file.
#include <SamFile.h>
Public Types
enum OpenType { READ, WRITE }
Enum for indicating whether to open the file for read or write.
enum SortedType { UNSORTED = 0, FLAG, COORDINATE, QUERY_NAME }
Enum for indicating the type of sort expected in the file.
Public Member Functions
SamFile ()
Default constructor; initializes the variables but does not open any files.
SamFile (ErrorHandler::HandlingType errorHandlingType)
Constructor that sets the error handling type.
SamFile (const char *filename, OpenType mode)
Constructor that opens the specified file in the specified mode (READ/WRITE); aborts if the file could not be opened.
SamFile (const char *filename, OpenType mode, ErrorHandler::HandlingType errorHandlingType)
Constructor that opens the specified file in the specified mode (READ/WRITE) and handles errors according to errorHandlingType.
SamFile (const char *filename, OpenType mode, SamFileHeader *header)
Constructor that opens the specified file in the specified mode (READ/WRITE) and reads or writes the header; aborts if the file could not be opened or the header could not be read/written.
SamFile (const char *filename, OpenType mode, ErrorHandler::HandlingType errorHandlingType, SamFileHeader *header)
Constructor that opens the specified file in the specified mode (READ/WRITE) and reads or writes the header, handling errors according to errorHandlingType.
virtual ~SamFile ()
Destructor.
bool OpenForRead (const char *filename, SamFileHeader *header=NULL)
Open a SAM/BAM file for reading with the specified filename, determining the compression type and whether it is SAM or BAM by reading the file (unless reading from stdin).
bool OpenForWrite (const char *filename, SamFileHeader *header=NULL)
Open a SAM/BAM file for writing with the specified filename, determining SAM/BAM from the extension (.bam = BAM).
bool ReadBamIndex (const char *filename)
Read the specified BAM index file.
bool ReadBamIndex ()
Read the BAM index file using the BAM filename as a base.
void SetReference (GenomeSequence *reference)
Sets the reference to the specified genome sequence object.
void SetReadSequenceTranslation (SamRecord::SequenceTranslation translation)
Set the type of sequence translation to use when reading the sequence.
void SetWriteSequenceTranslation (SamRecord::SequenceTranslation translation)
Set the type of sequence translation to use when writing the sequence.
void Close ()
Close the file if there is one open.
bool IsOpen ()
Returns whether or not the file has been opened successfully.
bool IsEOF ()
Returns whether or not the end of the file has been reached.
bool ReadHeader (SamFileHeader &header)
Reads the header section from the file and stores it in the passed-in header.
bool WriteHeader (SamFileHeader &header)
Writes the specified header into the file.
bool ReadRecord (SamFileHeader &header, SamRecord &record)
Reads the next record from the file and stores it in the passed-in record.
bool WriteRecord (SamFileHeader &header, SamRecord &record)
Writes the specified record into the file.
void setSortedValidation (SortedType sortType)
Set the flag to validate that the file is sorted as it is read/written.
uint32_t GetCurrentRecordCount ()
Return the number of records that have been read/written so far.
SamStatus::Status GetFailure ()
Deprecated; get the status of the last call that sets status.
SamStatus::Status GetStatus ()
Get the status of the last call that sets status.
const char * GetStatusMessage ()
Get the status message of the last call that sets status.
bool SetReadSection (int32_t refID)
Sets which reference ID (index into the BAM list of reference information) of the BAM file should be read.
bool SetReadSection (const char *refName)
Sets which reference name of the BAM file should be read.
bool SetReadSection (int32_t refID, int32_t start, int32_t end, bool overlap=true)
Sets which reference ID (index into the BAM list of reference information) and start/end positions of the BAM file should be read.
bool SetReadSection (const char *refName, int32_t start, int32_t end, bool overlap=true)
Sets which reference name and start/end positions of the BAM file should be read.
void SetReadFlags (uint16_t requiredFlags, uint16_t excludedFlags)
Specify which reads should be returned by ReadRecord.
int32_t getNumMappedReadsFromIndex (int32_t refID)
Get the number of mapped reads in the specified reference ID.
int32_t getNumUnMappedReadsFromIndex (int32_t refID)
Get the number of unmapped reads in the specified reference ID.
int32_t getNumMappedReadsFromIndex (const char *refName, SamFileHeader &header)
Get the number of mapped reads in the specified reference name.
int32_t getNumUnMappedReadsFromIndex (const char *refName, SamFileHeader &header)
Get the number of unmapped reads in the specified reference name.
uint32_t GetNumOverlaps (SamRecord &samRecord)
Returns the number of bases in the passed-in read that overlap the region that is currently set.
void GenerateStatistics (bool genStats)
Whether or not statistics should be generated for this file.
const BamIndex * GetBamIndex ()
Return the BAM index if one has been opened.
int64_t GetCurrentPosition ()
Get the current file position.
void DisableBuffering ()
Turn off file read buffering.
void PrintStatistics ()
Print the statistics that have been recorded due to a call to GenerateStatistics.
bool attemptRecoverySync (bool(*checkSignature)(void *data), int length)
void setAttemptRecovery (bool flag=false)
Protected Member Functions
void init ()
void init (const char *filename, OpenType mode, SamFileHeader *header)
void resetFile ()
Resets the file, prepping it for a new file.
bool validateSortOrder (SamRecord &record, SamFileHeader &header)
Validate that the record is sorted compared to the previously read record, if there is one, according to the specified sort order.
SortedType getSortOrderFromHeader (SamFileHeader &header)
bool processNewSection (SamFileHeader &header)
bool ensureIndexedReadPosition ()
bool checkRecordInSection (SamRecord &record)
Protected Attributes
IFILE myFilePtr
GenericSamInterface * myInterfacePtr
bool myIsOpenForRead
Flag to indicate if a file is open for reading.
bool myIsOpenForWrite
Flag to indicate if a file is open for writing.
bool myHasHeader
Flag to indicate if a header has been read/written; this is required before a record can be read/written.
SortedType mySortedType
int32_t myPrevCoord
Previous values used for checking if the file is sorted.
int32_t myPrevRefID
String myPrevReadName
uint32_t myRecordCount
Keep a count of the number of records that have been read/written so far.
SamStatistics * myStatistics
Pointer to the statistics for this file.
SamStatus myStatus
The status of the last SamFile command.
bool myIsBamOpenForRead
Values for reading sorted BAM files via the index.
bool myNewSection
bool myOverlapSection
int32_t myRefID
int32_t myStartPos
int32_t myEndPos
uint64_t myCurrentChunkEnd
SortedChunkList myChunksToRead
BamIndex * myBamIndex
GenomeSequence * myRefPtr
SamRecord::SequenceTranslation myReadTranslation
SamRecord::SequenceTranslation myWriteTranslation
std::string myRefName
Allows the user to easily read/write a SAM/BAM file.
The SamFile class contains additional functionality that allows a user to read specific sections of sorted and indexed BAM files. To take advantage of this capability, the index file must be read before the read section is set. This avoids reading the entire file by taking advantage of the seeking capability of BGZF.
enum SamFile::OpenType
enum SamFile::SortedType
SamFile::SamFile (ErrorHandler::HandlingType errorHandlingType)
Constructor that sets the error handling type.
errorHandlingType: how to handle errors.
Definition at line 35 of file SamFile.cpp.
References resetFile().
SamFile::SamFile (const char *filename, OpenType mode)
Constructor that opens the specified file in the specified mode (READ/WRITE); aborts if the file could not be opened.
filename: name of the file to open.
mode: mode to use for opening the file.
Definition at line 45 of file SamFile.cpp.
SamFile::SamFile (const char *filename, OpenType mode, ErrorHandler::HandlingType errorHandlingType)
Constructor that opens the specified file in the specified mode (READ/WRITE) and handles errors according to errorHandlingType.
filename: name of the file to open.
mode: mode to use for opening the file.
errorHandlingType: how to handle errors.
Definition at line 54 of file SamFile.cpp.
SamFile::SamFile (const char *filename, OpenType mode, SamFileHeader *header)
Constructor that opens the specified file in the specified mode (READ/WRITE) and reads or writes the header; aborts if the file could not be opened or the header could not be read/written.
filename: name of the file to open.
mode: mode to use for opening the file.
header: header to read into (READ) or write from (WRITE).
Definition at line 64 of file SamFile.cpp.
SamFile::SamFile (const char *filename, OpenType mode, ErrorHandler::HandlingType errorHandlingType, SamFileHeader *header)
Constructor that opens the specified file in the specified mode (READ/WRITE) and reads or writes the header, handling errors according to errorHandlingType.
filename: name of the file to open.
mode: mode to use for opening the file.
errorHandlingType: how to handle errors.
header: header to read into (READ) or write from (WRITE).
Definition at line 73 of file SamFile.cpp.
void SamFile::GenerateStatistics (bool genStats)
Whether or not statistics should be generated for this file.
The value is carried over between files and is not reset, but the statistics themselves are reset between files.
genStats: set to true if statistics should be generated, false if not.
Definition at line 878 of file SamFile.cpp.
References myStatistics.
const BamIndex * SamFile::GetBamIndex ()
Return the BAM index if one has been opened.
Definition at line 903 of file SamFile.cpp.
SamStatus::Status SamFile::GetFailure () [inline]
Deprecated; get the status of the last call that sets status.
Kept for backwards compatibility; it will be removed later.
Definition at line 201 of file SamFile.h.
References GetStatus().
int32_t SamFile::getNumMappedReadsFromIndex (const char *refName, SamFileHeader &header)
Get the number of mapped reads in the specified reference name.
Returns -1 for unknown reference names.
refName: reference name for which to extract the number of mapped reads.
header: header object containing the map from refName to refID.
Definition at line 820 of file SamFile.cpp.
References StatGenStatus::FAIL_ORDER, BamIndex::getNumMappedReads(), SamFileHeader::getReferenceID(), myStatus, BamIndex::REF_ID_UNMAPPED, and StatGenStatus::setStatus().
int32_t SamFile::getNumMappedReadsFromIndex (int32_t refID)
Get the number of mapped reads in the specified reference ID.
Returns -1 for out-of-range refIDs.
refID: reference ID for which to extract the number of mapped reads.
Definition at line 790 of file SamFile.cpp.
References StatGenStatus::FAIL_ORDER, BamIndex::getNumMappedReads(), myStatus, and StatGenStatus::setStatus().
uint32_t SamFile::GetNumOverlaps (SamRecord &samRecord)
Returns the number of bases in the passed-in read that overlap the region that is currently set.
Overlapping means that the bases occur in both the read and the reference as either matches or mismatches. This does not count insertions, deletions, clips, pads, or skips.
samRecord: the record to check for overlapping bases.
Definition at line 864 of file SamFile.cpp.
References SamRecord::getNumOverlaps(), SamRecord::setReference(), and SamRecord::setSequenceTranslation().
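To make the overlap definition concrete, here is a minimal standalone sketch that walks a CIGAR string and counts the bases aligned as M, = or X whose reference position falls inside the current region. It assumes the region is 0-based and half-open, [start, end), as documented for SetReadSection. The function countOverlappingBases is hypothetical and does not use libStatGen; SamRecord::getNumOverlaps may be implemented differently.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical sketch: count read bases aligned as M, = or X whose 0-based
// reference position lies in [regionStart, regionEnd). D and N consume
// reference positions without being counted; I, S, H and P consume no
// reference positions and are never counted.
std::uint32_t countOverlappingBases(std::int32_t alignStart, const std::string& cigar,
                                    std::int32_t regionStart, std::int32_t regionEnd)
{
    std::uint32_t count = 0;
    std::int32_t refPos = alignStart;  // reference position of the next CIGAR op
    std::int32_t len = 0;              // length of the op currently being parsed
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            len = len * 10 + (c - '0');
            continue;
        }
        if (c == 'M' || c == '=' || c == 'X')
        {
            // Aligned bases: count those inside the region.
            for (std::int32_t i = 0; i < len; ++i, ++refPos)
            {
                if (refPos >= regionStart && refPos < regionEnd)
                {
                    ++count;
                }
            }
        }
        else if (c == 'D' || c == 'N')
        {
            refPos += len;  // deletion/skip: reference only
        }
        // I, S, H, P: no reference consumed, nothing counted.
        len = 0;
    }
    return count;
}
```

For a read at position 100 with CIGAR 5M2D5M and a region of [103, 108), the deleted reference bases 105 and 106 are skipped, so only positions 103, 104 and 107 count (3 bases).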
int32_t SamFile::getNumUnMappedReadsFromIndex (const char *refName, SamFileHeader &header)
Get the number of unmapped reads in the specified reference name.
Returns -1 for unknown reference names.
refName: reference name for which to extract the number of unmapped reads.
header: header object containing the map from refName to refID.
Definition at line 842 of file SamFile.cpp.
References StatGenStatus::FAIL_ORDER, BamIndex::getNumUnMappedReads(), SamFileHeader::getReferenceID(), myStatus, BamIndex::REF_ID_UNMAPPED, and StatGenStatus::setStatus().
int32_t SamFile::getNumUnMappedReadsFromIndex (int32_t refID)
Get the number of unmapped reads in the specified reference ID.
Returns -1 for out-of-range refIDs.
refID: reference ID for which to extract the number of unmapped reads.
Definition at line 805 of file SamFile.cpp.
References StatGenStatus::FAIL_ORDER, BamIndex::getNumUnMappedReads(), myStatus, and StatGenStatus::setStatus().
bool SamFile::IsEOF ()
Returns whether or not the end of the file has been reached.
Definition at line 424 of file SamFile.cpp.
References ifeof().
bool SamFile::IsOpen ()
Returns whether or not the file has been opened successfully.
Definition at line 410 of file SamFile.cpp.
References InputFile::isOpen().
bool SamFile::OpenForRead (const char *filename, SamFileHeader *header = NULL)
Open a SAM/BAM file for reading with the specified filename, determining the compression type and whether it is SAM or BAM by reading the file (unless reading from stdin).
filename: the SAM/BAM file to open for reading.
header: header to read into (optional).
Definition at line 93 of file SamFile.cpp.
References InputFile::BGZF, InputFile::DEFAULT, StatGenStatus::FAIL_IO, ifopen(), ifread(), ifrewind(), myIsBamOpenForRead, myIsOpenForRead, myStatus, ReadHeader(), resetFile(), InputFile::setAttemptRecovery(), StatGenStatus::setStatus(), StatGenStatus::SUCCESS, and InputFile::UNCOMPRESSED.
Referenced by Pileup< TestPileupElement >::processFile().
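The description says the format is determined by reading the file. One plausible way to do that, sketched below under our own assumptions, is to inspect the leading bytes: BGZF output begins with the gzip magic bytes 0x1f 0x8b, and an uncompressed BAM stream begins with the magic string "BAM\1". The enum DetectedFormat and the function detectFormat are hypothetical; libStatGen's actual detection (which goes through ifopen/ifread) may differ, and a BGZF stream still has to be decompressed before it can be confirmed as BAM rather than compressed SAM.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical classification of an input stream by its leading bytes.
enum class DetectedFormat { BgzfCompressed, UncompressedBam, SamText };

DetectedFormat detectFormat(const std::vector<std::uint8_t>& head)
{
    // BGZF is gzip-compatible, so it starts with the gzip magic bytes.
    if (head.size() >= 2 && head[0] == 0x1f && head[1] == 0x8b)
    {
        return DetectedFormat::BgzfCompressed;
    }
    // An uncompressed BAM stream starts with "BAM\1".
    if (head.size() >= 4 && head[0] == 'B' && head[1] == 'A' &&
        head[2] == 'M' && head[3] == 1)
    {
        return DetectedFormat::UncompressedBam;
    }
    // Anything else is treated as plain SAM text in this sketch.
    return DetectedFormat::SamText;
}
```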
bool SamFile::OpenForWrite (const char *filename, SamFileHeader *header = NULL)
Open a SAM/BAM file for writing with the specified filename, determining SAM/BAM from the extension (.bam = BAM).
filename: the SAM/BAM file to open for writing.
header: header to write to the file (optional).
Definition at line 223 of file SamFile.cpp.
References InputFile::BGZF, StatGenStatus::FAIL_IO, ifopen(), myIsOpenForWrite, myStatus, resetFile(), StatGenStatus::setStatus(), StatGenStatus::SUCCESS, InputFile::UNCOMPRESSED, and WriteHeader().
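A minimal sketch of the documented extension rule follows. It assumes a case-sensitive match on a trailing ".bam"; writesBam is a hypothetical helper, and the library may treat special names (for example stdout) differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if the name ends in ".bam" (BAM output),
// false otherwise (SAM text output). Case-sensitive by assumption.
bool writesBam(const std::string& filename)
{
    const std::string ext = ".bam";
    return filename.size() >= ext.size() &&
           filename.compare(filename.size() - ext.size(), ext.size(), ext) == 0;
}
```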
bool SamFile::ReadBamIndex ()
Read the BAM index file using the BAM filename as a base.
The index must be read before setting a read section, to allow seeking and reading portions of a BAM file. It must be read after opening the BAM file, since the BAM filename is used as the base name for the index file. It first tries filename.bam.bai; if that fails, it tries the name without the .bam extension, filename.bai.
Definition at line 328 of file SamFile.cpp.
References StatGenStatus::FAIL_ORDER, InputFile::getFileName(), myStatus, and StatGenStatus::setStatus().
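The lookup order can be modelled as a list of candidate index names, tried first to last. candidateIndexNames is a hypothetical helper, not part of libStatGen; it assumes the second candidate is only tried when the BAM name actually ends in ".bam".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: index names in the order ReadBamIndex() is documented
// to try them ("<name>.bai", then "<name without .bam>.bai").
std::vector<std::string> candidateIndexNames(const std::string& bamName)
{
    std::vector<std::string> names{bamName + ".bai"};
    const std::string ext = ".bam";
    if (bamName.size() > ext.size() &&
        bamName.compare(bamName.size() - ext.size(), ext.size(), ext) == 0)
    {
        // Strip ".bam" and append ".bai".
        names.push_back(bamName.substr(0, bamName.size() - ext.size()) + ".bai");
    }
    return names;
}
```

For "sample.bam" this yields "sample.bam.bai" followed by "sample.bai".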
bool SamFile::ReadBamIndex (const char *filename)
Read the specified BAM index file.
It must be read before setting a read section, to allow seeking and reading portions of a BAM file.
filename: the name of the BAM index file to be read.
Definition at line 300 of file SamFile.cpp.
References myStatus, BamIndex::readIndex(), StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.
bool SamFile::ReadHeader (SamFileHeader &header)
Reads the header section from the file and stores it in the passed in header.
Definition at line 437 of file SamFile.cpp.
References StatGenStatus::FAIL_ORDER, myHasHeader, myIsOpenForRead, myStatus, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.
Referenced by OpenForRead(), and Pileup< TestPileupElement >::processFile().
bool SamFile::ReadRecord (SamFileHeader &header, SamRecord &record)
Reads the next record from the file and stores it in the passed-in record.
If it is an indexed BAM file and SetReadSection was called, only alignments in the section specified by SetReadSection are read. If they have all already been read, this method returns false.
Validates that the record is sorted according to the value set by setSortedValidation. No sorting validation is done if the sort type is UNSORTED or setSortedValidation was never called.
Definition at line 501 of file SamFile.cpp.
References StatGenStatus::FAIL_ORDER, SamRecord::getFlag(), myHasHeader, myIsOpenForRead, myRecordCount, myStatistics, myStatus, SamRecord::setReference(), SamRecord::setSequenceTranslation(), StatGenStatus::setStatus(), StatGenStatus::SUCCESS, and validateSortOrder().
Referenced by Pileup< TestPileupElement >::processFile().
void SamFile::SetReadFlags (uint16_t requiredFlags, uint16_t excludedFlags)
Specify which reads should be returned by ReadRecord.
ReadRecord only returns reads that have all of the specified required flags set and none of the specified excluded flags set. ReadRecord continues reading from the file until it finds a record that satisfies these flag settings or reaches the end of the file/region.
requiredFlags: flags that must be set in records returned by ReadRecord (0x0 if there are no required flags).
excludedFlags: flags that must not be set in records returned by ReadRecord (0x0 if there are no excluded flags).
Definition at line 781 of file SamFile.cpp.
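The required/excluded rule reduces to two bitmask tests. The sketch below is a hypothetical standalone predicate, not libStatGen's code; the flag values used in the note (0x4 unmapped, 0x400 PCR or optical duplicate) come from the SAM specification.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical predicate: every required bit must be set in the record's
// flag, and no excluded bit may be set.
bool passesFlagFilter(std::uint16_t flag, std::uint16_t requiredFlags,
                      std::uint16_t excludedFlags)
{
    return (flag & requiredFlags) == requiredFlags && (flag & excludedFlags) == 0;
}
```

For example, requiredFlags = 0x0 with excludedFlags = 0x404 skips both unmapped reads and duplicates.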
bool SamFile::SetReadSection (const char *refName)
Sets which reference name of the BAM file should be read.
The records for that reference name will be retrieved on each ReadRecord call. Specify "" or "*" to read records not associated with a reference. When all records have been retrieved for the specified reference name, ReadRecord will return failure until a new read section is set. Must be called only after the file has been opened for reading. Sorting validation is reset every time SetReadSection is called, since it can jump around in the file.
refName: the reference name of the records to read from the file.
Definition at line 692 of file SamFile.cpp.
References SetReadSection().
bool SamFile::SetReadSection (const char *refName, int32_t start, int32_t end, bool overlap = true)
Sets which reference name and start/end positions of the BAM file should be read.
The records for this reference name and positions will be retrieved on each ReadRecord call. Specify "" or "*" to indicate reads with no reference. When all records have been retrieved for the specified section, ReadRecord will return failure until a new read section is set. Must be called only after the file has been opened for reading. Sorting validation is reset every time SetReadSection is called, since it can jump around in the file.
refName: the reference name of the records to read from the file.
start: inclusive 0-based start position of records that should be read for this reference name.
end: exclusive 0-based end position of records that should be read for this reference name.
overlap: when true (default), return reads that overlap the region even partially; when false, only return reads that fall completely within the region.
Definition at line 736 of file SamFile.cpp.
References StatGenStatus::FAIL_ORDER, myIsBamOpenForRead, myPrevCoord, myStatus, BamIndex::REF_ID_ALL, BamIndex::REF_ID_UNMAPPED, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.
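The difference between overlap = true and overlap = false can be expressed as two interval tests. This sketch assumes both the region and the read's aligned span are 0-based and half-open, matching the inclusive start and exclusive end documented above; readInSection is a hypothetical helper, not the library's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical region test. Region is [regionStart, regionEnd) and the
// read's aligned span is [readStart, readEnd), both 0-based.
bool readInSection(std::int32_t readStart, std::int32_t readEnd,
                   std::int32_t regionStart, std::int32_t regionEnd,
                   bool overlap = true)
{
    assert(readStart <= readEnd && regionStart <= regionEnd);
    if (overlap)
    {
        // Any shared reference position is enough.
        return readStart < regionEnd && readEnd > regionStart;
    }
    // The read must lie completely within the region.
    return readStart >= regionStart && readEnd <= regionEnd;
}
```

With a region of [100, 200), a read spanning [90, 110) is returned when overlap is true but not when it is false, and a read starting at 200 is never returned because the end is exclusive.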
bool SamFile::SetReadSection (int32_t refID)
Sets which reference ID (index into the BAM list of reference information) of the BAM file should be read.
The records for that reference ID will be retrieved on each ReadRecord call.
Reference IDs start at 0, and -1 indicates reads with no reference. When all records have been retrieved for the specified reference ID, ReadRecord will return failure until a new read section is set. Must be called only after the file has been opened for reading. Sorting validation is reset every time SetReadSection is called, since it can jump around in the file.
refID: the reference ID of the records to read from the file.
Definition at line 683 of file SamFile.cpp.
Referenced by SetReadSection().
bool SamFile::SetReadSection (int32_t refID, int32_t start, int32_t end, bool overlap = true)
Sets which reference ID (index into the BAM list of reference information) and start/end positions of the BAM file should be read.
The records for that reference ID and positions will be retrieved on each ReadRecord call. Reference IDs start at 0, and -1 indicates reads with no reference. When all records have been retrieved for the specified section, ReadRecord will return failure until a new read section is set. Must be called only after the file has been opened for reading. Sorting validation is reset every time SetReadSection is called, since it can jump around in the file.
refID: the reference ID of the records to read from the file.
start: inclusive 0-based start position of records that should be read for this refID.
end: exclusive 0-based end position of records that should be read for this refID.
overlap: when true (default), return reads that overlap the region even partially; when false, only return reads that fall completely within the region.
Definition at line 700 of file SamFile.cpp.
References StatGenStatus::FAIL_ORDER, myIsBamOpenForRead, myPrevCoord, myStatus, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.
void SamFile::SetReadSequenceTranslation (SamRecord::SequenceTranslation translation)
Set the type of sequence translation to use when reading the sequence.
Passed down to the SamRecord when it is read.
The default type (if this method is never called) is NONE (the sequence is left as-is).
translation: type of sequence translation to use.
Definition at line 387 of file SamFile.cpp.
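As a rough illustration of what a translation mode does, the sketch below converts read bases that match the reference to '=' and back. Only NONE is named in this documentation; the enum Translation with its Equal and Bases values and the function translateSequence are hypothetical, and the sketch assumes an ungapped alignment so that read[i] lines up with ref[i].

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the translation modes.
enum class Translation { None, Equal, Bases };

// Simplified model: assumes an ungapped alignment, so read[i] lies over ref[i].
std::string translateSequence(const std::string& read, const std::string& ref,
                              Translation mode)
{
    assert(read.size() == ref.size());
    std::string out = read;
    for (std::size_t i = 0; i < out.size(); ++i)
    {
        if (mode == Translation::Equal && out[i] == ref[i])
        {
            out[i] = '=';     // base matches the reference
        }
        else if (mode == Translation::Bases && out[i] == '=')
        {
            out[i] = ref[i];  // replace '=' with the reference base
        }
    }
    return out;
}
```

Translating "ACGT" against the reference "ACTT" in Equal mode gives "==G=", and Bases mode turns "==G=" back into "ACGT".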
void SamFile::SetReference (GenomeSequence *reference)
Sets the reference to the specified genome sequence object.
reference: pointer to the GenomeSequence object.
Definition at line 380 of file SamFile.cpp.
Referenced by Pileup< TestPileupElement >::processFile().
void SamFile::setSortedValidation (SortedType sortType)
Set the flag to validate that the file is sorted as it is read/written.
Must be called after the file has been opened. Sorting validation is reset every time SetReadSection is called, since it can jump around in the file.
sortType: specifies the type of sort to be checked for.
Definition at line 669 of file SamFile.cpp.
Referenced by Pileup< TestPileupElement >::processFile().
void SamFile::SetWriteSequenceTranslation (SamRecord::SequenceTranslation translation)
Set the type of sequence translation to use when writing the sequence.
Passed down to the SamRecord when it is written. The default type (if this method is never called) is NONE (the sequence is left as-is).
translation: type of sequence translation to use.
Definition at line 394 of file SamFile.cpp.
bool SamFile::validateSortOrder (SamRecord &record, SamFileHeader &header) [protected]
Validate that the record is sorted compared to the previously read record, if there is one, according to the specified sort order.
If the sort order is UNSORTED, true is returned. Sorting validation is reset every time SetReadSection is called, since it can jump around in the file.
Definition at line 1006 of file SamFile.cpp.
References FLAG, SamRecord::get0BasedPosition(), SamRecord::getReadName(), SamRecord::getReferenceID(), SamFileHeader::getReferenceLabel(), StatGenStatus::INVALID_SORT, myPrevCoord, myRecordCount, myStatus, QUERY_NAME, BamIndex::REF_ID_UNMAPPED, SamRecord::setReference(), SamRecord::setSequenceTranslation(), StatGenStatus::setStatus(), and UNSORTED.
Referenced by ReadRecord(), and WriteRecord().
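For COORDINATE sorting, the check described here can be modelled on (reference ID, position) pairs. The sketch assumes, as the SetReadSection documentation indicates, that reference ID -1 marks reads with no reference, and that, following the usual SAM convention, such reads come after all mapped reads. isCoordinateSorted is a hypothetical helper that checks a whole sequence at once and does not cover FLAG or QUERY_NAME ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical COORDINATE-order check over (refID, 0-based position) pairs.
bool isCoordinateSorted(const std::vector<std::pair<std::int32_t, std::int32_t>>& records)
{
    const std::int32_t kNoReference = -1;
    for (std::size_t i = 1; i < records.size(); ++i)
    {
        const auto& prev = records[i - 1];
        const auto& cur = records[i];
        if (prev.first == kNoReference)
        {
            // Once reads with no reference start, only such reads may follow.
            if (cur.first != kNoReference) return false;
            continue;
        }
        if (cur.first == kNoReference) continue;  // first unplaced read is fine
        if (cur.first < prev.first) return false;  // reference IDs must not decrease
        if (cur.first == prev.first && cur.second < prev.second)
        {
            return false;  // positions must not decrease within a reference
        }
    }
    return true;
}
```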
bool SamFile::WriteHeader (SamFileHeader &header)
Writes the specified header into the file.
Definition at line 467 of file SamFile.cpp.
References StatGenStatus::FAIL_ORDER, myHasHeader, myIsOpenForWrite, myStatus, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.
Referenced by OpenForWrite().
bool SamFile::WriteRecord (SamFileHeader &header, SamRecord &record)
Writes the specified record into the file.
Validates that the record is sorted according to the value set by setSortedValidation. No sorting validation is done if the sort type is UNSORTED or setSortedValidation was never called. Returns false and does not write the record if the record is not properly sorted.
Definition at line 619 of file SamFile.cpp.
References StatGenStatus::FAIL_ORDER, StatGenStatus::INVALID_SORT, myHasHeader, myIsOpenForWrite, myRecordCount, myStatus, SamRecord::setReference(), StatGenStatus::setStatus(), StatGenStatus::SUCCESS, and validateSortOrder().
Referenced by SamCoordOutput::flush().