Package htsjdk.tribble.gff
Class Gff3Codec
- java.lang.Object
  - htsjdk.tribble.AbstractFeatureCodec<Gff3Feature,LineIterator>
    - htsjdk.tribble.gff.Gff3Codec

- All Implemented Interfaces:
FeatureCodec<Gff3Feature,LineIterator>
public class Gff3Codec extends AbstractFeatureCodec<Gff3Feature,LineIterator>
Codec for parsing GFF3 files, as defined in https://github.com/The-Sequence-Ontology/Specifications/blob/31f62ad469b31769b43af42e0903448db1826925/gff3.md. Note that while the spec states that all feature types must be defined in the Sequence Ontology, this implementation makes no check on feature types and allows any string as a feature type. Each feature line in the GFF3 file will be emitted as a separate feature. Features linked together through the "Parent" attribute will be linked through Gff3Feature.getParents(), Gff3Feature.getChildren(), Gff3Feature.getAncestors(), Gff3Feature.getDescendents(), and Gff3Feature.flatten(). This linking is not guaranteed to be comprehensive when the file is read for only the features overlapping a particular region, using a tribble index. In this case, a feature will only be linked to those of its related features in the input file that also overlap the given region.
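To make the Parent-based linking concrete, here is a minimal stand-alone model in plain Java. It is a hypothetical sketch, not htsjdk's code.

- `ParentLinkSketch`, `Node` and `link` are invented names.
- Attribute values are not URL-unescaped.
- Each input is assumed to be the ninth (attributes) column of a feature line.

Because links are resolved only among the lines actually read, it also reproduces the caveat above: a feature whose parent lies outside the queried region ends up with no parent link.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of "Parent"-attribute linking; it is not the htsjdk implementation.
// Assumes inputs like "ID=mRNA1;Parent=gene1", where Parent may list several
// comma-separated IDs, as the GFF3 spec allows.
class ParentLinkSketch {

    static final class Node {
        final String id;                      // ID attribute, or null if absent
        final List<String> parentIds;         // raw values of the Parent attribute
        final List<Node> parents = new ArrayList<>();
        final List<Node> children = new ArrayList<>();

        Node(String id, List<String> parentIds) {
            this.id = id;
            this.parentIds = parentIds;
        }

        // Every node reachable by following parent links (assumes the graph has no cycles).
        Set<Node> ancestors() {
            Set<Node> out = new LinkedHashSet<>();
            for (Node p : parents) {
                if (out.add(p)) {
                    out.addAll(p.ancestors());
                }
            }
            return out;
        }
    }

    // Splits "key=value;key=value" into an ordered map (no URL-unescaping in this sketch).
    static Map<String, String> parseAttributes(String column9) {
        Map<String, String> attrs = new LinkedHashMap<>();
        for (String pair : column9.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                attrs.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return attrs;
    }

    // Builds one Node per input line, then connects each node to the parents it names.
    // A parent that is not among the inputs (e.g. outside a queried region) stays unlinked.
    static List<Node> link(List<String> attributeColumns) {
        List<Node> nodes = new ArrayList<>();
        Map<String, Node> byId = new HashMap<>();
        for (String column9 : attributeColumns) {
            Map<String, String> attrs = parseAttributes(column9);
            List<String> parentIds = new ArrayList<>();
            String parent = attrs.get("Parent");
            if (parent != null) {
                for (String pid : parent.split(",")) {
                    parentIds.add(pid);
                }
            }
            Node node = new Node(attrs.get("ID"), parentIds);
            nodes.add(node);
            if (node.id != null) {
                byId.put(node.id, node);
            }
        }
        for (Node node : nodes) {
            for (String pid : node.parentIds) {
                Node p = byId.get(pid);
                if (p != null) {
                    node.parents.add(p);
                    p.children.add(node);
                }
            }
        }
        return nodes;
    }
}
```

For example, calling `link` on `ID=gene1`, `ID=mRNA1;Parent=gene1` and `Parent=mRNA1` gives the third node one parent (`mRNA1`) and two ancestors.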
Nested Class Summary
Nested Classes
- static class Gff3Codec.DecodeDepth
- static class Gff3Codec.Gff3Directive
Enum for parsing directive lines.
Constructor Summary
Constructors
- Gff3Codec()
- Gff3Codec(Gff3Codec.DecodeDepth decodeDepth)
Method Summary
- boolean canDecode(String inputFilePath)
This function returns true iff the file at inputFilePath can be parsed by this codec.
- void close(LineIterator lineIterator)
Adapter method that closes the provided LineIterator.
- Gff3Feature decode(LineIterator lineIterator)
Decode a single Feature from the LineIterator, reading no further in the underlying source than the end of that feature.
- Feature decodeLoc(LineIterator lineIterator)
Decode a line to obtain just its FeatureLoc for indexing: contig, start, and stop.
- Map<Integer,String> getCommentsWithLineNumbers()
Gets a map from line number to the comment found on that line.
- List<String> getCommentTexts()
Gets the list of comments parsed by the codec.
- List<SequenceRegion> getSequenceRegions()
Gets the list of sequence regions parsed by the codec.
- TabixFormat getTabixFormat()
Define the tabix format for the feature, used for indexing.
- boolean isDone(LineIterator lineIterator)
Adapter method that assesses whether the provided LineIterator has more data.
- LocationAware makeIndexableSourceFromStream(InputStream bufferedInputStream)
Return a LineIterator for this codec that implements LocationAware, and is thus suitable for use during indexing.
- LineIterator makeSourceFromStream(InputStream bufferedInputStream)
Generates a reader of type LineIterator appropriate for use by this codec from the generic input stream.
- FeatureCodecHeader readHeader(LineIterator lineIterator)
Read and return the header, or null if there is no header.
Methods inherited from class htsjdk.tribble.AbstractFeatureCodec
getFeatureType
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface htsjdk.tribble.FeatureCodec
getPathToDataFile
Constructor Detail
Gff3Codec
public Gff3Codec()
Gff3Codec
public Gff3Codec(Gff3Codec.DecodeDepth decodeDepth)
Method Detail
decode
public Gff3Feature decode(LineIterator lineIterator) throws IOException
Description copied from interface: FeatureCodec
Decode a single Feature from the LineIterator, reading no further in the underlying source than the end of that feature.
- Parameters:
lineIterator - the input stream from which to decode the next record
- Returns:
the Feature encoded by the line, or null if the line does not represent a feature (e.g. is a comment)
- Throws:
IOException
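The null-for-non-features contract of `decode` can be illustrated with a simplified, JDK-only line decoder. This is a hypothetical sketch with the following limits:

- `DecodeSketch` and `Record` are invented names.
- Any line starting with `#` (comments and directives alike) is treated as a non-feature.
- Only some of the nine GFF3 columns are kept.

Like Gff3Codec, it accepts any string as the feature type.

```java
// Hypothetical illustration of decode()'s contract; not htsjdk code.
class DecodeSketch {

    static final class Record {
        final String seqid, source, type, attributes;
        final int start, end;   // 1-based, inclusive (GFF3 convention)
        final char strand;

        Record(String[] cols) {
            seqid = cols[0];
            source = cols[1];
            type = cols[2];     // any string is accepted as the feature type
            start = Integer.parseInt(cols[3]);
            end = Integer.parseInt(cols[4]);
            strand = cols[6].charAt(0);
            attributes = cols[8];
        }
    }

    // Returns a Record for a feature line, or null for comments, directives and blank lines.
    static Record decodeLine(String line) {
        if (line.isEmpty() || line.startsWith("#")) {
            return null;
        }
        String[] cols = line.split("\t", -1);
        if (cols.length != 9) {
            throw new IllegalArgumentException("expected 9 tab-separated columns, got " + cols.length);
        }
        return new Record(cols);
    }
}
```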
getSequenceRegions
public List<SequenceRegion> getSequenceRegions()
Get the list of sequence regions parsed by the codec.
- Returns:
list of sequence regions
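The GFF3 specification declares sequence regions with `##sequence-region seqid start end` directive lines. A minimal parser for such a line might look like the following. `SequenceRegionSketch` and its `Region` class are hypothetical and do not reproduce the constructor or fields of htsjdk's `SequenceRegion`.

```java
// Hypothetical sketch of parsing a "##sequence-region seqid start end" directive.
// Region is an invented type; it is not htsjdk's SequenceRegion.
class SequenceRegionSketch {

    static final class Region {
        final String seqid;
        final long start;
        final long end;

        Region(String seqid, long start, long end) {
            this.seqid = seqid;
            this.start = start;
            this.end = end;
        }
    }

    private static final String PREFIX = "##sequence-region";

    // Returns the parsed region, or null if the line is not a sequence-region directive.
    static Region parse(String line) {
        if (!line.startsWith(PREFIX + " ")) {
            return null;
        }
        String[] parts = line.substring(PREFIX.length()).trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed sequence-region directive: " + line);
        }
        return new Region(parts[0], Long.parseLong(parts[1]), Long.parseLong(parts[2]));
    }
}
```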
getCommentsWithLineNumbers
public Map<Integer,String> getCommentsWithLineNumbers()
Gets a map from line number to the comment found on that line. The text of the comment EXCLUDES the leading # which indicates a comment line.
- Returns:
map from line number to the comment found on that line
getCommentTexts
public List<String> getCommentTexts()
Gets the list of comments parsed by the codec. The leading # which indicates a comment line is excluded.
- Returns:
list of comments parsed by the codec
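The two comment getters above can be modelled with a single pass over the input lines. This sketch is hypothetical and rests on assumptions that the htsjdk docs here do not state:

- Line numbers start at 1.
- Lines beginning with `##` are directives rather than comments, following the GFF3 convention.

Only the leading `#` is stripped, so any whitespace after it is kept.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of comment collection; not htsjdk code.
// Assumptions: line numbers start at 1, and "##" lines are directives, not comments.
class CommentSketch {

    final Map<Integer, String> commentsWithLineNumbers = new LinkedHashMap<>();

    void scan(List<String> lines) {
        int lineNumber = 0;
        for (String line : lines) {
            lineNumber++;
            if (line.startsWith("#") && !line.startsWith("##")) {
                // Drop only the leading '#'; whitespace after it is preserved.
                commentsWithLineNumbers.put(lineNumber, line.substring(1));
            }
        }
    }

    // Comment texts in the order they were encountered.
    List<String> commentTexts() {
        return new ArrayList<>(commentsWithLineNumbers.values());
    }
}
```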
decodeLoc
public Feature decodeLoc(LineIterator lineIterator) throws IOException
Description copied from interface: FeatureCodec
Decode a line to obtain just its FeatureLoc for indexing: contig, start, and stop.
- Specified by:
decodeLoc in interface FeatureCodec<Gff3Feature,LineIterator>
- Overrides:
decodeLoc in class AbstractFeatureCodec<Gff3Feature,LineIterator>
- Parameters:
lineIterator - the input stream from which to decode the next record
- Returns:
the FeatureLoc encoded by the line, or null if the line does not represent a feature (e.g. is a comment)
- Throws:
IOException
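`decodeLoc` exists so that indexing can read just the coordinates of each feature. Below is a hypothetical JDK-only sketch of that idea. It splits off only the first five tab-separated columns and returns the contig (column 1), start (column 4) and end (column 5). It skips the attribute parsing a full decode would do. `LocSketch` and `Loc` are invented names.

```java
// Hypothetical sketch of a location-only decode for indexing; not htsjdk code.
class LocSketch {

    static final class Loc {
        final String contig;
        final int start;   // 1-based, inclusive
        final int end;     // 1-based, inclusive

        Loc(String contig, int start, int end) {
            this.contig = contig;
            this.start = start;
            this.end = end;
        }
    }

    // Returns null for lines that do not describe a feature (comments, directives, blanks).
    static Loc decodeLoc(String line) {
        if (line.isEmpty() || line.startsWith("#")) {
            return null;
        }
        // Limit 6: the first five columns are split out; columns 6-9 stay unparsed in cols[5].
        String[] cols = line.split("\t", 6);
        if (cols.length < 6) {
            throw new IllegalArgumentException("not a GFF3 feature line: " + line);
        }
        return new Loc(cols[0], Integer.parseInt(cols[3]), Integer.parseInt(cols[4]));
    }
}
```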
canDecode
public boolean canDecode(String inputFilePath)
Description copied from interface: FeatureCodec
This function returns true iff the file at inputFilePath can be parsed by this codec. Note that checking the file's extension is a perfectly acceptable implementation of this method; file contents only rarely need to be checked.
There is an assumption that two different codecs never both return true for the same file. If this occurs, the recommendation is to error out.
Note that this function must never throw an error. All errors should be trapped and false returned.
- Parameters:
inputFilePath - the file to test for parsability with this codec
- Returns:
true if inputFilePath can be parsed, false otherwise
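An extension-based `canDecode` that honours the never-throw rule can be sketched as follows. The accepted suffixes (`.gff3` and `.gff`, each optionally followed by `.gz`) are an assumption for illustration, not the list Gff3Codec actually checks. `CanDecodeSketch` is a hypothetical class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical extension-based canDecode; the suffix list is an assumption, not htsjdk's.
class CanDecodeSketch {

    private static final List<String> EXTENSIONS = Arrays.asList(".gff3", ".gff");
    private static final List<String> COMPRESSION = Arrays.asList("", ".gz");

    static boolean canDecode(String inputFilePath) {
        try {
            String lower = inputFilePath.toLowerCase(Locale.ROOT);
            for (String ext : EXTENSIONS) {
                for (String comp : COMPRESSION) {
                    if (lower.endsWith(ext + comp)) {
                        return true;
                    }
                }
            }
            return false;
        } catch (RuntimeException e) {
            // The contract forbids throwing: any failure (e.g. a null path) means "cannot decode".
            return false;
        }
    }
}
```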
readHeader
public FeatureCodecHeader readHeader(LineIterator lineIterator)
Description copied from interface: FeatureCodec
Read and return the header, or null if there is no header. Note: Implementers of this method must be careful to read exactly as much from the LineIterator as needed to parse the header, and no more. Otherwise, data that might otherwise be fed into parsing a Feature may be lost.
- Parameters:
lineIterator - the source from which to decode the header
- Returns:
header object
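Reading exactly the header and no more usually relies on one line of look-ahead: peek at the next line, and consume it only if it belongs to the header. The sketch below assumes the header is the leading run of `#` lines. `HeaderSketch` and `PeekingLines` are hypothetical stand-ins, not htsjdk's `LineIterator` or `FeatureCodecHeader`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of the "read the header and nothing more" rule; not htsjdk code.
class HeaderSketch {

    static final class PeekingLines implements Iterator<String> {
        private final Iterator<String> source;
        private String buffered; // at most one line of look-ahead

        PeekingLines(Iterator<String> source) {
            this.source = source;
        }

        String peek() {
            if (buffered == null) {
                if (!source.hasNext()) {
                    throw new NoSuchElementException();
                }
                buffered = source.next();
            }
            return buffered;
        }

        @Override
        public boolean hasNext() {
            return buffered != null || source.hasNext();
        }

        @Override
        public String next() {
            String line = peek();
            buffered = null;
            return line;
        }
    }

    // Consumes leading '#' lines and stops at the first feature line without consuming it.
    // Returns null when there is no header, mirroring the readHeader contract.
    static List<String> readHeader(PeekingLines lines) {
        List<String> header = new ArrayList<>();
        while (lines.hasNext() && lines.peek().startsWith("#")) {
            header.add(lines.next());
        }
        return header.isEmpty() ? null : header;
    }
}
```

After `readHeader` returns, the first feature line is still available from `next()`, so nothing that `decode` needs has been lost.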
makeSourceFromStream
public LineIterator makeSourceFromStream(InputStream bufferedInputStream)
Description copied from interface: FeatureCodec
Generates a reader of type LineIterator appropriate for use by this codec from the generic input stream. Implementers should assume the stream is buffered.
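With only the JDK, turning a buffered `InputStream` into a source of lines could look like the following. It is a hypothetical stand-in (htsjdk returns its own `LineIterator` implementation) and assumes UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical JDK-only stand-in for makeSourceFromStream; not htsjdk's LineIterator.
// Lines are produced lazily; read errors surface as java.io.UncheckedIOException.
class LineSourceSketch {

    static Iterator<String> makeSourceFromStream(InputStream bufferedInputStream) {
        // InputStreamReader decodes bytes to characters; BufferedReader splits them into lines.
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(bufferedInputStream, StandardCharsets.UTF_8));
        return reader.lines().iterator();
    }
}
```

`BufferedReader` reads ahead of the lines it has returned. Because of this, the position of the underlying stream does not tell you where the current line started. That is why a separate, `LocationAware` source is needed for indexing.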
makeIndexableSourceFromStream
public LocationAware makeIndexableSourceFromStream(InputStream bufferedInputStream)
Description copied from interface: FeatureCodec
Return a LineIterator (this codec's SOURCE type) that implements LocationAware, and is thus suitable for use during indexing. Like FeatureCodec.makeSourceFromStream(java.io.InputStream), except that LocationAware compatibility is required for creating indexes. Implementers of this method must return a type that is both LocationAware and a LineIterator. Note that this requirement cannot be enforced via the method signature due to limitations in Java's generic typing system. Instead, consumers should cast the call result into a LineIterator when applicable.

NOTE: During the indexing process, the indexer passes the LineIterator to the codec to consume Features from it one at a time, recording each Feature's location via the LineIterator's LocationAware interface. Therefore, the following three pieces used during indexing must not introduce any buffering that would advance the LineIterator by more than a single feature (or by more than the size of the header, in the case of FeatureCodec.readHeader(SOURCE)):
- the LineIterator implementation
- the FeatureCodec.readHeader(SOURCE) method
- the FeatureCodec.decodeLoc(SOURCE) method
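The no-buffering requirement can be illustrated with a line reader that counts every byte it consumes. Its position therefore always marks the start of the next unread line, which is the offset an index would record for the next feature. `PositionTrackingLines` is hypothetical and is not htsjdk's `LocationAware` type. It also makes these assumptions:

- Lines end with `\n`.
- Text is UTF-8.
- Input is read one byte at a time.

Wrapping the input in a `BufferedInputStream` underneath it is fine, because buffering below the counter does not change the count.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical location-aware line source; not htsjdk code.
// position() equals the number of bytes consumed, i.e. the offset where the next line starts.
class PositionTrackingLines {

    private final InputStream in;
    private long position = 0;

    PositionTrackingLines(InputStream in) {
        this.in = in;
    }

    long position() {
        return position;
    }

    // Returns the next line without its '\n' terminator, or null at end of stream.
    // Reads exactly up to and including the terminator, never beyond it.
    String readLine() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAny = true;
            position++;
            if (b == '\n') {
                break;
            }
            buf.write(b);
        }
        return sawAny ? new String(buf.toByteArray(), StandardCharsets.UTF_8) : null;
    }
}
```

An indexer using such a source would call `position()` before decoding each feature line, and store that offset alongside the feature's contig and coordinates.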
isDone
public boolean isDone(LineIterator lineIterator)
Description copied from interface: FeatureCodec
Adapter method that assesses whether the provided LineIterator has been exhausted. Returns true if it has no more data, false otherwise.
close
public void close(LineIterator lineIterator)
Description copied from interface: FeatureCodec
Adapter method that closes the provided LineIterator.
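The adapter methods `isDone`, `decode` and `close` are typically driven by a loop like the one sketched here.

- `MiniCodec` is an invented, cut-down mirror of those three methods, not htsjdk's `FeatureCodec`.
- `LineCodec` is a toy codec that treats every line not starting with `#` as a feature.

The loop skips the nulls that `decode` returns for non-feature lines, and closes the source even if decoding fails.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a reader driving a codec's adapter methods; not htsjdk code.
class DecodeLoopSketch {

    interface MiniCodec<F, S> {
        boolean isDone(S source);   // true once the source has no more data
        F decode(S source);         // may return null for non-feature lines
        void close(S source);
    }

    // Toy codec: each non-comment line becomes a "feature" (the raw line itself).
    static final class LineCodec implements MiniCodec<String, Iterator<String>> {
        boolean closed = false;

        @Override public boolean isDone(Iterator<String> source) { return !source.hasNext(); }

        @Override public String decode(Iterator<String> source) {
            String line = source.next();
            return line.startsWith("#") ? null : line;
        }

        @Override public void close(Iterator<String> source) { closed = true; }
    }

    static <F, S> List<F> readAll(MiniCodec<F, S> codec, S source) {
        List<F> features = new ArrayList<>();
        try {
            while (!codec.isDone(source)) {
                F feature = codec.decode(source);
                if (feature != null) {   // skip comments and other non-feature lines
                    features.add(feature);
                }
            }
        } finally {
            codec.close(source);
        }
        return features;
    }
}
```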
getTabixFormat
public TabixFormat getTabixFormat()
Description copied from interface: FeatureCodec
Define the tabix format for the feature, used for indexing. The default implementation in FeatureCodec throws an exception. Note that only AsciiFeatureCodec can read tabix files as defined in AbstractFeatureReader.getFeatureReader(String, String, FeatureCodec, boolean, java.util.function.Function, java.util.function.Function).
- Returns:
the format to use with tabix
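A tabix format essentially records which columns hold the contig, start and end, and which leading character marks lines to skip. The hypothetical `TabixColumnsSketch` below uses the values of htslib tabix's GFF preset: 1-based columns 1, 4 and 5, with meta character `#`. That `getTabixFormat` returns equivalent settings here is an assumption, and the class is not htsjdk's `TabixFormat`. It also shows the coordinate shift tabix applies to 1-based formats such as GFF.

```java
// Hypothetical sketch of the information a tabix format carries; not htsjdk's TabixFormat.
// Column numbers follow htslib tabix's GFF preset: sequence 1, begin 4, end 5, meta '#'.
class TabixColumnsSketch {

    static final class Interval {
        final String contig;
        final long begin0; // 0-based, inclusive
        final long end;    // 0-based, exclusive (numerically equal to the 1-based inclusive end)

        Interval(String contig, long begin0, long end) {
            this.contig = contig;
            this.begin0 = begin0;
            this.end = end;
        }
    }

    final int sequenceColumn; // 1-based column holding the contig name
    final int beginColumn;    // 1-based column holding the start coordinate
    final int endColumn;      // 1-based column holding the end coordinate
    final char metaChar;      // lines starting with this character are not indexed

    TabixColumnsSketch(int sequenceColumn, int beginColumn, int endColumn, char metaChar) {
        this.sequenceColumn = sequenceColumn;
        this.beginColumn = beginColumn;
        this.endColumn = endColumn;
        this.metaChar = metaChar;
    }

    static final TabixColumnsSketch GFF_LIKE = new TabixColumnsSketch(1, 4, 5, '#');

    // Returns the indexed interval of a data line, or null for blank or meta lines.
    // GFF starts are 1-based, so one is subtracted to reach tabix's 0-based half-open form.
    Interval interval(String line) {
        if (line.isEmpty() || line.charAt(0) == metaChar) {
            return null;
        }
        String[] cols = line.split("\t", -1);
        long begin0 = Long.parseLong(cols[beginColumn - 1]) - 1;
        long end = Long.parseLong(cols[endColumn - 1]);
        return new Interval(cols[sequenceColumn - 1], begin0, end);
    }
}
```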