Class Allele
- java.lang.Object
  - htsjdk.variant.variantcontext.Allele
- All Implemented Interfaces:
Serializable, Comparable<Allele>
public class Allele extends Object implements Comparable<Allele>, Serializable
Immutable representation of an allele. Types of alleles:
Ref: a t C g a // C is the reference base
   : a t G g a // C base is a G in some individuals
   : a t - g a // C base is deleted w.r.t. the reference
   : a t CAg a // A base is inserted w.r.t. the reference sequence
In these cases, where are the alleles?
- SNP polymorphism of C/G -> { C , G } -> C is the reference allele
- 1 base deletion of C -> { tC , t } -> C is the reference allele and we include the preceding reference base (null alleles are not allowed)
- 1 base insertion of A -> { C , CA } -> C is the reference allele (because null alleles are not allowed)
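The three cases above can be sketched in plain Java. The class `PaddedAlleles` and its methods are hypothetical helpers written only for this illustration and are not part of htsjdk. They assume a 0-based position into an upper-case reference string and return `{REF, ALT}` string pairs. In htsjdk, those strings would then be wrapped with `Allele.create(bases, true)` for the reference and `Allele.create(bases)` for the alternate.

```java
// Hypothetical helpers (NOT part of htsjdk) that derive left-padded REF/ALT
// strings for a SNP, a deletion and an insertion. Positions are 0-based.
final class PaddedAlleles {
    private PaddedAlleles() {}

    /** SNP at pos: REF is the reference base, ALT is the substituted base. */
    static String[] snp(String reference, int pos, char altBase) {
        return new String[] { reference.substring(pos, pos + 1), String.valueOf(altBase) };
    }

    /**
     * Deletion of `length` bases starting at pos. The preceding reference base is
     * included in both alleles so that neither of them is empty (no null alleles).
     */
    static String[] deletion(String reference, int pos, int length) {
        if (pos < 1) {
            throw new IllegalArgumentException("a preceding base is needed for padding");
        }
        String ref = reference.substring(pos - 1, pos + length);
        String alt = reference.substring(pos - 1, pos);
        return new String[] { ref, alt };
    }

    /** Insertion of `inserted` immediately after the base at pos. */
    static String[] insertion(String reference, int pos, String inserted) {
        String ref = reference.substring(pos, pos + 1);
        return new String[] { ref, ref + inserted };
    }
}
```

With the reference `ATCGA`, `deletion("ATCGA", 2, 1)` yields `{"TC", "T"}` and `insertion("ATCGA", 2, "A")` yields `{"C", "CA"}`. These match `{ tC , t }` and `{ C , CA }` in the list above, up to case.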
Suppose I see the following in the population:
Ref: a t C g a // C is the reference base
   : a t G g a // C base is a G in some individuals
   : a t - g a // C base is deleted w.r.t. the reference
How do I represent this? There are three segregating alleles:
{ C , G , - }
and these are represented as:
{ tC, tG, t }
Now suppose I have this more complex example:
Ref: a t C g a // C is the reference base
   : a t - g a
   : a t - - a
   : a t CAg a
There are actually four segregating alleles:
{ Cg , -g , -- , CAg } over bases 2-4
represented as:
{ tCg, tg, t, tCAg }
Critically, it should be possible to apply an allele to a reference sequence to create the correct haplotype sequence:
Allele + reference => haplotype
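The following is a minimal sketch of "Allele + reference => haplotype". It assumes the allele's position is known from outside the Allele, as described in the next paragraph, and is supplied as a 0-based offset. `HaplotypeBuilder.apply` is a hypothetical helper, not an htsjdk method, and it works on plain strings rather than `Allele` objects.

```java
// Hypothetical helper (NOT an htsjdk method): splice an allele's bases into the
// reference in place of the reference allele's bases.
final class HaplotypeBuilder {
    private HaplotypeBuilder() {}

    /**
     * @param reference the reference sequence
     * @param start     0-based offset at which the reference allele begins
     * @param refAllele bases of the reference allele; must match the reference at start
     * @param allele    bases of the allele to apply
     */
    static String apply(String reference, int start, String refAllele, String allele) {
        if (!reference.startsWith(refAllele, start)) {
            throw new IllegalArgumentException("reference allele does not match at offset " + start);
        }
        // Prefix before the site + allele bases + suffix after the reference allele.
        return reference.substring(0, start) + allele + reference.substring(start + refAllele.length());
    }
}
```

For the complex example, all four alleles (`TCG`, `TG`, `T`, `TCAG`) start at offset 1 of `ATCGA`. Applying them gives `ATCGA`, `ATGA`, `ATA` and `ATCAGA`, which are the four haplotypes drawn in that example.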
For convenience, we are going to create Alleles where the GenomeLoc of the allele is stored outside of the Allele object itself. So there's an idea of an A/C polymorphism independent of its surrounding context. Given a list of alleles, it's possible to determine the "type" of the variation:
  A / C @ loc => SNP
  - / A => INDEL
If you know which allele is the reference, you can determine whether the variant is an insertion or a deletion.
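A small classifier shows that the variant type depends on which allele is the reference. `VariantKind` is a hypothetical sketch for a single REF/ALT pair of distinct, left-padded base strings. It is not htsjdk's own type-determination logic.

```java
// Hypothetical sketch (NOT htsjdk code): classify one REF/ALT pair of distinct,
// left-padded base strings. The same pair of bases flips between INSERTION and
// DELETION depending on which one is the reference.
final class VariantKind {
    private VariantKind() {}

    enum Kind { SNP, INSERTION, DELETION, OTHER }

    static Kind classify(String ref, String alt) {
        if (ref.length() == 1 && alt.length() == 1) return Kind.SNP;
        // Left-padded indels: the shorter allele is a prefix of the longer one.
        if (alt.length() > ref.length() && alt.startsWith(ref)) return Kind.INSERTION;
        if (ref.length() > alt.length() && ref.startsWith(alt)) return Kind.DELETION;
        return Kind.OTHER;
    }
}
```

`classify("C", "CA")` returns `INSERTION`. Swapping the roles, `classify("CA", "C")` returns `DELETION`.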
Allele also supports the concept of a NO_CALL allele. This Allele represents a haplotype that couldn't be determined, and is usually represented by a '.' allele.
Note that Alleles store all bases as bytes, in **UPPER CASE**. So 'atc' == 'ATC' from the perspective of an Allele.
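The upper-case rule can be modelled with a tiny immutable value class. `UpperCaseBases` is a hypothetical stand-in, not htsjdk's `Allele`. It assumes ASCII base letters and upper-cases them once at construction, so equality and hashing never need to consider case.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in (NOT htsjdk's Allele): bases are stored as upper-case
// ASCII bytes, so 'atc' and 'ATC' compare equal.
final class UpperCaseBases {
    private final byte[] bases;

    UpperCaseBases(String bases) {
        byte[] raw = bases.getBytes(StandardCharsets.US_ASCII);
        for (int i = 0; i < raw.length; i++) {
            // ASCII-only upper-casing: 'a'..'z' -> 'A'..'Z'
            if (raw[i] >= 'a' && raw[i] <= 'z') {
                raw[i] = (byte) (raw[i] - ('a' - 'A'));
            }
        }
        this.bases = raw;
    }

    String baseString() {
        return new String(bases, StandardCharsets.US_ASCII);
    }

    @Override public boolean equals(Object o) {
        return o instanceof UpperCaseBases && Arrays.equals(bases, ((UpperCaseBases) o).bases);
    }

    @Override public int hashCode() {
        return Arrays.hashCode(bases);
    }
}
```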
- See Also:
- Serialized Form
Field Summary

| Modifier and Type | Field | Description |
|---|---|---|
| static Allele | ALT_A | |
| static Allele | ALT_C | |
| static Allele | ALT_G | |
| static Allele | ALT_N | |
| static Allele | ALT_T | |
| static Allele | NO_CALL | |
| static String | NO_CALL_STRING | A generic static NO_CALL allele for use |
| static Allele | NON_REF_ALLELE | |
| static String | NON_REF_STRING | Non ref allele representations |
| static Allele | REF_A | |
| static Allele | REF_C | |
| static Allele | REF_G | |
| static Allele | REF_N | |
| static Allele | REF_T | |
| static long | serialVersionUID | |
| static Allele | SPAN_DEL | |
| static String | SPAN_DEL_STRING | A generic static SPAN_DEL allele for use |
| static Allele | SV_SIMPLE_CNV | |
| static Allele | SV_SIMPLE_DEL | |
| static Allele | SV_SIMPLE_DUP | |
| static Allele | SV_SIMPLE_INS | |
| static Allele | SV_SIMPLE_INV | |
| static Allele | UNSPECIFIED_ALTERNATE_ALLELE | |
| static String | UNSPECIFIED_ALTERNATE_ALLELE_STRING | |
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| static boolean | acceptableAlleleBases(byte[] bases) | |
| static boolean | acceptableAlleleBases(byte[] bases, boolean isReferenceAllele) | |
| static boolean | acceptableAlleleBases(String bases) | |
| static boolean | acceptableAlleleBases(String bases, boolean isReferenceAllele) | |
| boolean | basesMatch(byte[] test) | |
| boolean | basesMatch(Allele test) | |
| boolean | basesMatch(String test) | |
| int | compareTo(Allele other) | |
| static Allele | create(byte base) | |
| static Allele | create(byte[] bases) | Creates a non-Ref allele. |
| static Allele | create(byte[] bases, boolean isRef) | Creates a new Allele with the given bases, tagged as the reference allele if isRef == true. |
| static Allele | create(byte base, boolean isRef) | |
| static Allele | create(Allele allele, boolean ignoreRefState) | Creates a new allele based on the provided one. |
| static Allele | create(String bases) | Creates a non-Ref allele. |
| static Allele | create(String bases, boolean isRef) | |
| boolean | equals(Allele other, boolean ignoreRefState) | Returns true if this and other are equal. |
| boolean | equals(Object other) | |
| static Allele | extend(Allele left, byte[] right) | |
| byte[] | getBases() | Return the DNA bases segregating in this allele. |
| String | getBaseString() | Return the DNA bases segregating in this allele in String format. |
| byte[] | getDisplayBases() | Same as getDisplayString(), but returns the result as byte[]. |
| String | getDisplayString() | Return the printed representation of this allele. |
| static Allele | getMatchingAllele(Collection<Allele> allAlleles, byte[] alleleBases) | |
| int | hashCode() | |
| boolean | isBreakpoint() | |
| boolean | isCalled() | |
| boolean | isNoCall() | |
| boolean | isNonRefAllele() | |
| boolean | isNonReference() | |
| boolean | isReference() | |
| boolean | isSingleBreakend() | |
| boolean | isSymbolic() | |
| int | length() | |
| static boolean | oneIsPrefixOfOther(Allele a1, Allele a2) | |
| String | toString() | |
| static boolean | wouldBeBreakpoint(byte[] bases) | |
| static boolean | wouldBeNoCallAllele(byte[] bases) | |
| static boolean | wouldBeNullAllele(byte[] bases) | |
| static boolean | wouldBeSingleBreakend(byte[] bases) | |
| static boolean | wouldBeStarAllele(byte[] bases) | |
| static boolean | wouldBeSymbolicAllele(byte[] bases) | |
Field Detail
-
serialVersionUID
public static final long serialVersionUID
- See Also:
- Constant Field Values
-
NO_CALL_STRING
public static final String NO_CALL_STRING
A generic static NO_CALL allele for use.
- See Also:
- Constant Field Values
-
SPAN_DEL_STRING
public static final String SPAN_DEL_STRING
A generic static SPAN_DEL allele for use.
- See Also:
- Constant Field Values
-
NON_REF_STRING
public static final String NON_REF_STRING
Non ref allele representations.
- See Also:
- Constant Field Values
-
UNSPECIFIED_ALTERNATE_ALLELE_STRING
public static final String UNSPECIFIED_ALTERNATE_ALLELE_STRING
- See Also:
- Constant Field Values
-
REF_A
public static final Allele REF_A
-
ALT_A
public static final Allele ALT_A
-
REF_C
public static final Allele REF_C
-
ALT_C
public static final Allele ALT_C
-
REF_G
public static final Allele REF_G
-
ALT_G
public static final Allele ALT_G
-
REF_T
public static final Allele REF_T
-
ALT_T
public static final Allele ALT_T
-
REF_N
public static final Allele REF_N
-
ALT_N
public static final Allele ALT_N
-
SPAN_DEL
public static final Allele SPAN_DEL
-
NO_CALL
public static final Allele NO_CALL
-
NON_REF_ALLELE
public static final Allele NON_REF_ALLELE
-
UNSPECIFIED_ALTERNATE_ALLELE
public static final Allele UNSPECIFIED_ALTERNATE_ALLELE
-
SV_SIMPLE_DEL
public static final Allele SV_SIMPLE_DEL
-
SV_SIMPLE_INS
public static final Allele SV_SIMPLE_INS
-
SV_SIMPLE_INV
public static final Allele SV_SIMPLE_INV
-
SV_SIMPLE_CNV
public static final Allele SV_SIMPLE_CNV
-
SV_SIMPLE_DUP
public static final Allele SV_SIMPLE_DUP
Constructor Detail
-
Allele
protected Allele(byte[] bases, boolean isRef)
-
Allele
protected Allele(String bases, boolean isRef)
-
Allele
protected Allele(Allele allele, boolean ignoreRefState)
Creates a new allele based on the provided one. Ref state will be copied unless ignoreRefState is true (in which case the returned allele will be non-Ref). This method is efficient because it can skip the validation of the bases (since the original allele was already validated).
- Parameters:
allele
- the allele from which to copy the bases
ignoreRefState
- should we ignore the reference state of the input allele and use the default ref state?
Method Detail
-
create
public static Allele create(byte[] bases, boolean isRef)
Create a new Allele with the given bases, tagged as the reference allele if isRef == true. If bases == '-', a Null allele is created. If bases == '.', a no-call Allele is created. If bases == '*', a spanning-deletion Allele is created.
- Parameters:
bases
- the DNA sequence of this variation, '-', '.', or '*'
isRef
- should we make this a reference allele?
- Throws:
IllegalArgumentException
- if bases contains illegal characters or is otherwise malformed
-
create
public static Allele create(byte base, boolean isRef)
-
create
public static Allele create(byte base)
-
wouldBeNullAllele
public static boolean wouldBeNullAllele(byte[] bases)
- Parameters:
bases
- bases representing an allele
- Returns:
- true if the bases represent the null allele
-
wouldBeStarAllele
public static boolean wouldBeStarAllele(byte[] bases)
- Parameters:
bases
- bases representing an allele
- Returns:
- true if the bases represent the SPAN_DEL allele
-
wouldBeNoCallAllele
public static boolean wouldBeNoCallAllele(byte[] bases)
- Parameters:
bases
- bases representing an allele
- Returns:
- true if the bases represent the NO_CALL allele
-
wouldBeSymbolicAllele
public static boolean wouldBeSymbolicAllele(byte[] bases)
- Parameters:
bases
- bases representing an allele
- Returns:
- true if the bases represent a symbolic allele, including breakpoints and breakends
-
wouldBeBreakpoint
public static boolean wouldBeBreakpoint(byte[] bases)
- Parameters:
bases
- bases representing an allele
- Returns:
- true if the bases represent a symbolic allele in breakpoint notation, (ex: G]17:198982] or ]13:123456]T )
-
wouldBeSingleBreakend
public static boolean wouldBeSingleBreakend(byte[] bases)
- Parameters:
bases
- bases representing an allele
- Returns:
- true if the bases represent a symbolic allele in single breakend notation (ex: .A or A. )
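The `wouldBe...` predicates above can be pictured as one classifier over the allele spellings used in VCF 4.x. In that notation, `.` is a missing call and `*` is an overlapping (spanning) deletion. `<ID>` marks a symbolic allele, square brackets mark a breakend such as `G]17:198982]`, and a leading or trailing `.` marks a single breakend. `AlleleSpelling` is a hypothetical sketch of these rules and is not htsjdk's implementation.

```java
// Hypothetical classifier (NOT htsjdk's implementation) for the special allele
// spellings described above, following VCF 4.x notation.
final class AlleleSpelling {
    private AlleleSpelling() {}

    enum Kind { NO_CALL, SPAN_DEL, SYMBOLIC, BREAKPOINT, SINGLE_BREAKEND, BASES }

    static Kind classify(String s) {
        if (s.equals(".")) return Kind.NO_CALL;   // cf. wouldBeNoCallAllele
        if (s.equals("*")) return Kind.SPAN_DEL;  // cf. wouldBeStarAllele
        if (s.length() > 1 && s.startsWith("<") && s.endsWith(">")) {
            return Kind.SYMBOLIC;                 // e.g. <DEL>, <NON_REF>
        }
        if (s.indexOf('[') >= 0 || s.indexOf(']') >= 0) {
            return Kind.BREAKPOINT;               // e.g. G]17:198982] or ]13:123456]T
        }
        if (s.length() > 1 && (s.startsWith(".") || s.endsWith("."))) {
            return Kind.SINGLE_BREAKEND;          // e.g. .A or A.
        }
        return Kind.BASES;
    }

    /** Breakpoints and single breakends also count as symbolic, as documented above. */
    static boolean isSymbolicLike(Kind k) {
        return k == Kind.SYMBOLIC || k == Kind.BREAKPOINT || k == Kind.SINGLE_BREAKEND;
    }
}
```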
-
acceptableAlleleBases
public static boolean acceptableAlleleBases(String bases)
- Parameters:
bases
- bases representing a reference allele
- Returns:
- true if the bases represent a well-formatted allele
-
acceptableAlleleBases
public static boolean acceptableAlleleBases(String bases, boolean isReferenceAllele)
- Parameters:
bases
- bases representing an allele
isReferenceAllele
- whether this is a reference allele
- Returns:
- true if the bases represent a well-formatted allele
-
acceptableAlleleBases
public static boolean acceptableAlleleBases(byte[] bases)
- Parameters:
bases
- bases representing a reference allele
- Returns:
- true if the bases represent a well-formatted allele
-
acceptableAlleleBases
public static boolean acceptableAlleleBases(byte[] bases, boolean isReferenceAllele)
- Parameters:
bases
- bases representing an allele
isReferenceAllele
- true if a reference allele
- Returns:
- true if the bases represent a well-formatted allele
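The following is a minimal sketch of a base-level validity check in the spirit of `acceptableAlleleBases(bases, isReferenceAllele)`. It rests on two assumptions that this page does not state: the accepted alphabet is A, C, G, T and N in either case, and the spanning-deletion allele `*` may not be tagged as the reference. Symbolic and no-call spellings are deliberately left out. `BaseCheck` is hypothetical and is not htsjdk's code.

```java
// Hypothetical sketch (NOT htsjdk's code). ASSUMPTIONS: accepted bases are
// A, C, G, T, N in either case; '*' is only valid as a non-reference allele.
// Symbolic ("<...>") and no-call (".") spellings are not handled here.
final class BaseCheck {
    private BaseCheck() {}

    static boolean acceptable(String bases, boolean isReferenceAllele) {
        if (bases.isEmpty()) return false;                 // empty (null) alleles are rejected
        if (bases.equals("*")) return !isReferenceAllele;  // '*' can only be an ALT allele
        for (int i = 0; i < bases.length(); i++) {
            switch (bases.charAt(i)) {
                case 'A': case 'C': case 'G': case 'T': case 'N':
                case 'a': case 'c': case 'g': case 't': case 'n':
                    break;                                 // allowed base, keep scanning
                default:
                    return false;                          // any other character is rejected
            }
        }
        return true;
    }
}
```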
-
create
public static Allele create(String bases, boolean isRef)
- Parameters:
bases
- bases representing an allele
isRef
- is this the reference allele?
- See Also:
Allele(byte[], boolean)
-
create
public static Allele create(String bases)
Creates a non-Ref allele. See Allele(byte[], boolean) for full information.
- Parameters:
bases
- bases representing an allele
-
create
public static Allele create(byte[] bases)
Creates a non-Ref allele. See Allele(byte[], boolean) for full information.
- Parameters:
bases
- bases representing an allele
-
create
public static Allele create(Allele allele, boolean ignoreRefState)
Creates a new allele based on the provided one. Ref state will be copied unless ignoreRefState is true (in which case the returned allele will be non-Ref). This method is efficient because it can skip the validation of the bases (since the original allele was already validated).
- Parameters:
allele
- the allele from which to copy the bases
ignoreRefState
- should we ignore the reference state of the input allele and use the default ref state?
-
isNoCall
public boolean isNoCall()
- Returns:
- true if this is the NO_CALL allele
-
isCalled
public boolean isCalled()
-
isReference
public boolean isReference()
- Returns:
- true if this Allele is the reference allele
-
isNonReference
public boolean isNonReference()
- Returns:
- true if this Allele is not the reference allele
-
isSymbolic
public boolean isSymbolic()
- Returns:
- true if this Allele is symbolic (i.e. no well-defined base sequence), this includes breakpoints and breakends
-
isBreakpoint
public boolean isBreakpoint()
- Returns:
- true if this Allele is a breakpoint ( ex: G]17:198982] or ]13:123456]T )
-
isSingleBreakend
public boolean isSingleBreakend()
- Returns:
- true if this Allele is a single breakend (ex: .A or A.)
-
getBases
public byte[] getBases()
Return the DNA bases segregating in this allele. Note this isn't reference polarized, so the Null allele is represented by a vector of length 0.
- Returns:
- the segregating bases
-
getBaseString
public String getBaseString()
Return the DNA bases segregating in this allele in String format. This is useful because toString() adds a '*' to reference alleles, and calling toString() on the byte[] returned by getBases() does not give the bases.
- Returns:
- the segregating bases
-
getDisplayString
public String getDisplayString()
Return the printed representation of this allele. Same as getBaseString(), except for symbolic alleles. For symbolic alleles, the base string is empty while the display string contains <TAG>.
- Returns:
- the allele string representation
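The three string views described for `getBaseString()`, `getDisplayString()` and `toString()` can be modelled as follows. `AlleleViews` is a hypothetical model, not htsjdk's implementation. It treats only `<...>` spellings as symbolic and assumes that the reference marker `*` is appended after the display string.

```java
import java.util.Locale;

// Hypothetical model (NOT htsjdk's implementation) of the base string, the
// display string and toString(). ASSUMPTION: the '*' marking a reference
// allele is appended at the end.
final class AlleleViews {
    private final String spelling; // allele as written, e.g. "tc" or "<DEL>"
    private final boolean isRef;

    AlleleViews(String spelling, boolean isRef) {
        this.spelling = spelling;
        this.isRef = isRef;
    }

    boolean isSymbolic() {
        return spelling.length() > 1 && spelling.startsWith("<") && spelling.endsWith(">");
    }

    /** Bases only: empty for symbolic alleles, upper-cased otherwise. */
    String baseString() {
        return isSymbolic() ? "" : spelling.toUpperCase(Locale.ROOT);
    }

    /** Same as baseString(), except that symbolic alleles keep their <TAG>. */
    String displayString() {
        return isSymbolic() ? spelling : baseString();
    }

    /** Display string plus a '*' marker for the reference allele. */
    @Override public String toString() {
        return displayString() + (isRef ? "*" : "");
    }
}
```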
-
getDisplayBases
public byte[] getDisplayBases()
Same as getDisplayString(), but returns the result as byte[]. Slightly faster than getDisplayString().
- Returns:
- the allele string representation
-
equals
public boolean equals(Object other)
-
equals
public boolean equals(Allele other, boolean ignoreRefState)
Returns true if this and other are equal. If ignoreRefState is true, the two alleles are not required to have the same ref tag.
- Parameters:
other
- allele to compare to
ignoreRefState
- if true, ignore ref state in comparison
- Returns:
- true if this and other are equal
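The effect of `ignoreRefState` can be illustrated with a minimal immutable value class. `TaggedBases` is a hypothetical sketch, not htsjdk's `Allele`. The two-argument overload relaxes only the reference tag, and the bases must still match exactly. The plain `equals(Object)` keeps the strict behaviour so that it stays consistent with `hashCode()`.

```java
import java.util.Arrays;

// Hypothetical value class (NOT htsjdk's Allele) showing an equality check whose
// only relaxation is the reference tag.
final class TaggedBases {
    private final byte[] bases;
    private final boolean isRef;

    TaggedBases(byte[] bases, boolean isRef) {
        this.bases = bases.clone(); // defensive copy keeps the object immutable
        this.isRef = isRef;
    }

    /** Bases must match; the ref tag must match too unless ignoreRefState is true. */
    boolean equals(TaggedBases other, boolean ignoreRefState) {
        return this == other
            || ((isRef == other.isRef || ignoreRefState) && Arrays.equals(bases, other.bases));
    }

    @Override public boolean equals(Object o) {
        return o instanceof TaggedBases && equals((TaggedBases) o, false);
    }

    @Override public int hashCode() {
        return 31 * Arrays.hashCode(bases) + Boolean.hashCode(isRef);
    }
}
```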
-
basesMatch
public boolean basesMatch(byte[] test)
- Parameters:
test
- bases to test against
- Returns:
- true if this Allele contains the same bases as test, regardless of its reference status; handles Null and NO_CALL alleles
-
basesMatch
public boolean basesMatch(String test)
- Parameters:
test
- bases to test against
- Returns:
- true if this Allele contains the same bases as test, regardless of its reference status; handles Null and NO_CALL alleles
-
basesMatch
public boolean basesMatch(Allele test)
- Parameters:
test
- allele to test against
- Returns:
- true if this Allele contains the same bases as test, regardless of its reference status; handles Null and NO_CALL alleles
-
length
public int length()
- Returns:
- the length of this allele. Null and NO_CALL alleles have 0 length.
-
getMatchingAllele
public static Allele getMatchingAllele(Collection<Allele> allAlleles, byte[] alleleBases)
-
compareTo
public int compareTo(Allele other)
- Specified by:
compareTo
in interface Comparable<Allele>
-
isNonRefAllele
public boolean isNonRefAllele()
- Returns:
- true if this Allele is either <NON_REF> or <*>