Protein Class Reference

#include <bioseq.h>

Inheritance diagram for Protein:

bioseq

Public Member Functions

 Protein ()
 Protein (const Protein &s)
 Protein (const string &str)
 Protein (const bioseq &s)
 Protein (const string &n, const string &s)
Protein subseq (int b, int e) const
 ~Protein ()
Protein & operator= (const Protein &p)
Protein & operator= (const string &str)
double relativeEntropy () const
double relativeEntropyUniform () const
const double * getCodeFrequence ()
sequenceType getSequenceType () const
bool hasStart () const
bool hasStop () const
bool hasInternalStop () const
int countInternalStops ()

Static Public Member Functions

static const char * threeLetterCode (const char A)
static const char * aminoAcidFullName (const char A)

Static Public Attributes

static const int numcodes = 27
static const char * oneLetterSymbols = "ABCDEFGHIKLMNPQRSTUVWXYZ*"

Private Attributes

double codefreq [27]

Static Private Attributes

static const char * symbols []
static const double aafreq []


Detailed Description

All protein single-letter codes use upper case. The most recent version of the amino acid code is stored in this class.

Constructor & Destructor Documentation

Protein::Protein (  )  [inline]

References codefreq.

Protein::Protein ( const Protein &  s  )  [inline]

References codefreq, and numcodes.

Protein::Protein ( const string &  str  )  [inline]

References codefreq.

Protein::Protein ( const bioseq &  s  )  [inline]

References codefreq.

Protein::Protein ( const string &  n,
const string &  s 
) [inline]

References codefreq.

Protein::~Protein (  )  [inline]


Member Function Documentation

Protein Protein::subseq ( int  b,
int  e 
) const [inline]

Returns the subsequence from b to e, using 1-based, inclusive indices [b, e]. Performance is very poor if the sequence is long.

Reimplemented from bioseq.

References bioseq::subseq().
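
The 1-based, inclusive [b, e] convention is easy to get wrong by one, so a minimal sketch of the index arithmetic may help. It works on a plain std::string rather than on Protein or bioseq, and subseqInclusive is a hypothetical helper name, not part of this class.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Protein): returns the residues at
// 1-based positions b..e, with both ends included.
std::string subseqInclusive(const std::string &seq, int b, int e) {
    // Precondition for this sketch: 1 <= b <= e <= length of seq.
    assert(b >= 1 && b <= e && e <= static_cast<int>(seq.size()));
    // 1-based position b is 0-based offset b - 1; the length is e - b + 1.
    return seq.substr(static_cast<std::size_t>(b - 1),
                      static_cast<std::size_t>(e - b + 1));
}
```

With this convention, subseqInclusive("MKTAYIAK", 2, 4) yields "KTA", and [1, length] yields the whole sequence.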

Protein & Protein::operator= ( const Protein &  p  ) 

Protein & Protein::operator= ( const string &  str  ) 

Reimplemented from bioseq.

References codefreq, and bioseq::operator=().

static const char* Protein::threeLetterCode ( const char  A  )  [inline, static]

References symbols, and toupper().

static const char* Protein::aminoAcidFullName ( const char  A  )  [inline, static]

References symbols, and toupper().

double Protein::relativeEntropy (  )  const

The relative entropy is computed against SwissProt amino acid statistics.

References aafreq, bioseq::getFrequency(), and toupper().

Referenced by main().
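
This page does not give the formula, so the following is only a sketch under an assumed definition: the sum over letters of p * log2(p / q), where p is the observed frequency in the sequence and q is the background frequency. The function name relativeEntropySketch, the choice of log base 2, and the rule of skipping letters whose background frequency is zero are all assumptions, not the documented behavior of relativeEntropy().

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cmath>
#include <string>

// Assumed definition: sum over letters of p * log2(p / q).
// background holds q for 'A'..'Z' at indices 0..25.
double relativeEntropySketch(const std::string &seq,
                             const std::array<double, 26> &background) {
    std::array<double, 26> count{};
    double total = 0.0;
    for (unsigned char c : seq) {
        int idx = std::toupper(c) - 'A';
        if (idx < 0 || idx >= 26) continue;    // skip '*', gaps, digits
        assert(background[idx] >= 0.0);        // frequencies cannot be negative
        if (background[idx] == 0.0) continue;  // assumption: ignore such letters
        count[idx] += 1.0;
        total += 1.0;
    }
    if (total == 0.0) return 0.0;
    double h = 0.0;
    for (int i = 0; i < 26; ++i) {
        if (count[i] == 0.0) continue;         // a zero term contributes nothing
        double p = count[i] / total;
        h += p * std::log2(p / background[i]);
    }
    return h;
}
```

Under this definition, a sequence whose composition matches the background scores 0, and the score grows as the composition diverges from it.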

double Protein::relativeEntropyUniform (  )  const

const double * Protein::getCodeFrequence (  ) 

Returns a pointer to the internally stored int array, which is terminated by -1. All legal codes for amino acids range from 0 to 26. The array has one more element than the sequence length. Calling this before calling encode will return the array according to the 26 letter codes. Note: all letters are used.

References codefreq, bioseq::getcode(), and bioseq::length().

sequenceType Protein::getSequenceType (  )  const [inline, virtual]

With good programming practice, this method should not be needed.

Reimplemented from bioseq.

References PROTEINSEQ.

bool Protein::hasStart (  )  const [inline]

References bioseq::seq.

bool Protein::hasStop (  )  const [inline]

References bioseq::seq.

Referenced by GenModel::valid().

bool Protein::hasInternalStop (  )  const

References bioseq::seq.

int Protein::countInternalStops (  ) 

References bioseq::seq.

Referenced by GenModel::valid().
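
The start and stop checks above are undocumented, so the sketch below only illustrates one plausible reading. It assumes that a start is a leading 'M' (methionine), that a stop is a trailing '*' (the stop symbol used elsewhere on this page), and that an internal stop is a '*' anywhere before the last position. All function names are hypothetical and operate on a plain std::string.

```cpp
#include <cassert>
#include <string>

// Assumption: a start is a leading methionine.
bool hasStartSketch(const std::string &seq) {
    return !seq.empty() && seq.front() == 'M';
}

// Assumption: a stop is a trailing '*'.
bool hasStopSketch(const std::string &seq) {
    return !seq.empty() && seq.back() == '*';
}

// Counts '*' at every position except the last one.
int countInternalStopsSketch(const std::string &seq) {
    int n = 0;
    for (std::size_t i = 0; i + 1 < seq.size(); ++i) {
        if (seq[i] == '*') ++n;
    }
    return n;
}

bool hasInternalStopSketch(const std::string &seq) {
    return countInternalStopsSketch(seq) > 0;
}

// Usage example for the helpers above.
void stopSketchExample() {
    assert(hasStartSketch("MKT*"));
    assert(hasStopSketch("MKT*"));
    assert(!hasInternalStopSketch("MKT*"));
    assert(countInternalStopsSketch("MK*T*") == 1);
}
```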


Member Data Documentation

const int Protein::numcodes = 27 [static]

Referenced by operator=(), and Protein().

const char * Protein::oneLetterSymbols = "ABCDEFGHIKLMNPQRSTUVWXYZ*" [static]

double Protein::codefreq[27] [mutable, private]

Stores the amino acid frequency as an array: A at index zero, B at index one, and so on. Calling computeEntropy will fill this array. The first element is set to -1 to indicate that the array has not been set. Letters that do not occur have a frequency of zero. The 27th element is the stop codon "*".

Referenced by getCodeFrequence(), operator=(), and Protein().
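
To make the array layout concrete, here is a minimal sketch that fills a 27-element frequency table from a plain std::string, with 'A'..'Z' at indices 0..25 and '*' at index 26. The name letterFrequencies is hypothetical, and the sketch does not model the -1 "not yet set" marker described above.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: relative frequency of each letter in seq.
// Index 0..25 holds 'A'..'Z'; index 26 holds the stop symbol '*'.
std::array<double, 27> letterFrequencies(const std::string &seq) {
    std::array<double, 27> freq{};
    std::size_t counted = 0;
    for (unsigned char c : seq) {
        int idx;
        if (c == '*') {
            idx = 26;
        } else if (std::isalpha(c)) {
            idx = std::toupper(c) - 'A';
        } else {
            continue;  // ignore gaps, digits, whitespace
        }
        assert(idx >= 0 && idx < 27);
        freq[idx] += 1.0;
        ++counted;
    }
    if (counted > 0) {
        for (double &f : freq) f /= static_cast<double>(counted);
    }
    return freq;  // letters that never occur keep a frequency of zero
}
```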

const char * Protein::symbols [static, private]

Initial value:

{
"Ala", "alanine", 
"Asx", "aspartic acid or asparagine",
"Cys", "cysteine", 
"Asp", "aspartic acid",
"Glu", "glutamic acid", 
"Phe", "phenylalanine",
"Gly", "glycine", 
"His", "histidine", 
"Ile", "isoleucine",
"NUL", "not defined",
"Lys", "lysine", 
"Leu", "leucine", 
"Met", "methionine",
"Asn", "asparagine", 
"NUL", "not defined",
"Pro", "proline", 
"Gln", "glutamine",
"Arg", "arginine", 
"Ser", "serine", 
"Thr", "threonine",
"Sec", "selenocysteine", 
"Val", "valine", 
"Trp", "tryptophan",
"Xaa", "unknown or 'other' amino acid", 
"Tyr", "tyrosine",
"Glx", "glutamic acid or glutamine (or substances such as 4-carboxyglutamic acid and 5-oxoproline that yield glutamic acid on acid hydrolysis of peptides)",
"Stp", "stop codon"}
Array of 2x27 entries: each three-letter code is followed by its description. char(index + 'A') is the single-letter code.

Referenced by aminoAcidFullName(), and threeLetterCode().
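
Because the table interleaves codes and descriptions, a lookup for letter index i presumably reads entry 2*i for the three-letter code and entry 2*i + 1 for the full name. The sketch below illustrates that arithmetic on an excerpt covering only A to E. The names kSymbolsExcerpt, threeLetterSketch and fullNameSketch are hypothetical, and returning nullptr for letters outside the excerpt is a choice made for this sketch, not the documented behavior of threeLetterCode() or aminoAcidFullName().

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Excerpt of the table above (letters A..E only), same interleaved layout.
static const char *const kSymbolsExcerpt[] = {
    "Ala", "alanine",
    "Asx", "aspartic acid or asparagine",
    "Cys", "cysteine",
    "Asp", "aspartic acid",
    "Glu", "glutamic acid"};
constexpr int kExcerptLetters = 5;

// Maps a letter to its index in the excerpt, or -1 if not covered.
int letterIndex(char a) {
    int idx = std::toupper(static_cast<unsigned char>(a)) - 'A';
    return (idx >= 0 && idx < kExcerptLetters) ? idx : -1;
}

const char *threeLetterSketch(char a) {
    int idx = letterIndex(a);
    return idx < 0 ? nullptr : kSymbolsExcerpt[2 * idx];      // even slot: code
}

const char *fullNameSketch(char a) {
    int idx = letterIndex(a);
    return idx < 0 ? nullptr : kSymbolsExcerpt[2 * idx + 1];  // odd slot: name
}

// Usage example for the lookups above.
void lookupExample() {
    assert(std::strcmp(threeLetterSketch('c'), "Cys") == 0);
    assert(std::strcmp(fullNameSketch('D'), "aspartic acid") == 0);
    assert(threeLetterSketch('Z') == nullptr);  // outside this excerpt
}
```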

const double Protein::aafreq [static, private]

Initial value:

 { 
   0.0783, 0.0000, 0.0152, 0.0532, 0.0664,
   0.0400, 0.0693, 0.0229, 0.0591, 0.0000,
   0.0593, 0.0964, 0.0238, 0.0418, 0.0000,
   0.0483, 0.0395, 0.0535, 0.0686, 0.0542, 
   0.0000, 0.0671, 0.0115, 0.0000, 0.0306, 0.0000}
Frequencies taken from SwissProt.

Referenced by relativeEntropy().


The documentation for this class was generated from the following files:

Generated on Wed Aug 10 11:57:14 2011 for Softwares from Orpara by  doxygen 1.5.6