intronstop.cpp File Reference

#include <iostream>
#include <fstream>
#include "RNAModel.h"
#include <mysql++.h>
#include "dbinfo.h"
#include <iomanip>

Classes

struct  instopres

Functions

void countStop (Connection &conn, const string &file, const map< string, string > &genostore)
void cacheGenomic (Connection &conn, map< string, string > &gstore, const string &host, const string &database)
void stopsInIntrons (const Noschain &ch, const string &gseq, map< Range, instopres > &res)
int numberOfStops (const string &seq)
void countIntronN (Connection &conn, const string &file, const map< string, string > &genostore)
void countNInIntrons (const Noschain &ch, const string &gseq, map< Range, int > &res, int bs[4])
void writeBaseCount (ostream &ous, int base[4])
void usage ()
int main (int argc, char *argv[])

Function Documentation

void cacheGenomic ( Connection &  conn,
map< string, string > &  gstore,
const string &  host,
const string &  database 
)

The schema for genomic sequence in JGI is not straightforward, so this function may not work well. It is usually better to build a genomic table from the subtables. There are three tables: scaffold, scaffoldSeq, and scaffoldInfo. scaffold is a JGI track table. scaffoldInfo describes the internal structure of each scaffold in terms of contigs; it also has a sequence column, but that column is left unpopulated. scaffoldSeq holds the actual sequence. If the genomic sequence is long, the scaffoldSeq table cannot be used; use the kzgenomic table instead.

Referenced by main().

void countIntronN ( Connection &  conn,
const string &  file,
const map< string, string > &  genostore 
)

Count N in all intron sequences. Both coding and non-coding introns are counted. The four bases plus N are counted and their frequencies derived.

References countNInIntrons(), and writeBaseCount().

Referenced by main().

void countNInIntrons ( const Noschain &  ch,
const string &  gseq,
map< Range, int > &  res,
int  bs[4] 
)

Parameters:
bs[] an array of 4 elements holding the A, C, G, T counts from the intron sequences. This is an accumulator.
res a map from intron (begin, end) to the number of N bases in that intron.

References Range::begin(), Range::direction(), Range::end(), Noschain::numberOfRanges(), and reverseComplement().

Referenced by countIntronN().
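
The accumulator behaviour described for bs[] and res can be illustrated with a minimal sketch. The helper name tallyIntronBases and the use of 0-based inclusive coordinates are assumptions made for illustration; the real function works on Noschain and Range objects, whose interfaces are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of intronstop.cpp): tally one intron given as
// 0-based inclusive coordinates [begin, end] on the genomic sequence gseq.
// bs accumulates A, C, G, T counts (in that order) across calls.
// The return value is the number of N bases found in this intron.
int tallyIntronBases(const std::string &gseq, std::size_t begin,
                     std::size_t end, int bs[4]) {
    assert(begin <= end && end < gseq.size());
    int nCount = 0;
    for (std::size_t i = begin; i <= end; ++i) {
        switch (gseq[i]) {
            case 'A': case 'a': ++bs[0]; break;
            case 'C': case 'c': ++bs[1]; break;
            case 'G': case 'g': ++bs[2]; break;
            case 'T': case 't': ++bs[3]; break;
            case 'N': case 'n': ++nCount; break;
            default: break;  // other IUPAC codes are ignored in this sketch
        }
    }
    return nCount;
}
```

A caller would zero bs once, call the helper for every intron, and store each return value in the map under that intron's coordinates.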

void countStop ( Connection &  conn,
const string &  file,
const map< string, string > &  genostore 
)

Count stop codons in coding introns, in the frame of the preceding exon.

References stopsInIntrons(), and Noschain::subchain().

Referenced by main().

int main ( int  argc,
char *  argv[] 
)

int numberOfStops ( const string &  seq  ) 

Count stop codons in frame 0, which is in frame with the preceding exon.

Referenced by stopsInIntrons().
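
Counting stops in frame 0 can be sketched as follows. This is an illustration only, not the body of numberOfStops(): the name countStopsInFrame0 is hypothetical, and the sketch assumes the standard stop codons TAA, TAG and TGA, case-insensitive matching, and that a trailing partial codon is ignored.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for numberOfStops(): count TAA, TAG and TGA codons
// read in frame 0, that is, starting at the first base of seq.
int countStopsInFrame0(const std::string &seq) {
    int stops = 0;
    // Step through whole codons; a trailing partial codon is skipped.
    for (std::size_t i = 0; i + 3 <= seq.size(); i += 3) {
        std::string codon = seq.substr(i, 3);
        for (std::size_t k = 0; k < codon.size(); ++k) {
            codon[k] = static_cast<char>(
                std::toupper(static_cast<unsigned char>(codon[k])));
        }
        if (codon == "TAA" || codon == "TAG" || codon == "TGA") {
            ++stops;
        }
    }
    return stops;
}
```

With this sketch, "ATGTAAGGG" contains one in-frame stop, while "ATAAGG" contains none because its TAA is out of frame.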

void stopsInIntrons ( const Noschain &  ch,
const string &  gseq,
map< Range, instopres > &  res 
)

Parameters:
res the accumulator for each genomic sequence. The Range keys only make sense within this genomic sequence.

References Range::begin(), Range::direction(), Range::end(), Range::length(), Noschain::numberOfRanges(), numberOfStops(), and reverseComplement().

Referenced by countStop().
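
A rough sketch of the per-sequence accumulation follows. Interval, IntronStops, revComp, stopsInFrame0 and tallyIntronStops are hypothetical stand-ins, since the interfaces of Range, instopres and Noschain are not documented on this page. The sketch also assumes that exons are sorted, non-overlapping, 0-based inclusive intervals, and that each intron is read from its own first base; any codon phase carried over from the preceding exon is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Range and instopres.
typedef std::pair<std::size_t, std::size_t> Interval;  // 0-based, inclusive
struct IntronStops { std::size_t length; int stops; };

// Reverse-complement an upper-case DNA string; non-ACGT characters become N.
std::string revComp(const std::string &s) {
    std::string out(s.rbegin(), s.rend());
    for (std::size_t i = 0; i < out.size(); ++i) {
        switch (out[i]) {
            case 'A': out[i] = 'T'; break;
            case 'C': out[i] = 'G'; break;
            case 'G': out[i] = 'C'; break;
            case 'T': out[i] = 'A'; break;
            default:  out[i] = 'N'; break;
        }
    }
    return out;
}

// Count TAA, TAG and TGA codons starting at the first base of seq.
int stopsInFrame0(const std::string &seq) {
    int n = 0;
    for (std::size_t i = 0; i + 3 <= seq.size(); i += 3) {
        const std::string codon = seq.substr(i, 3);
        if (codon == "TAA" || codon == "TAG" || codon == "TGA") ++n;
    }
    return n;
}

// Derive introns from consecutive exons and record one result per intron.
// Keys in res are coordinates on gseq, so they are meaningful only for it.
void tallyIntronStops(const std::vector<Interval> &exons, bool minusStrand,
                      const std::string &gseq,
                      std::map<Interval, IntronStops> &res) {
    for (std::size_t i = 0; i + 1 < exons.size(); ++i) {
        const std::size_t b = exons[i].second + 1;  // first intron base
        const std::size_t e = exons[i + 1].first;   // one past last intron base
        assert(b <= e && e <= gseq.size());
        if (b == e) continue;                       // adjacent exons, no intron
        std::string intron = gseq.substr(b, e - b);
        if (minusStrand) intron = revComp(intron);  // read in gene orientation
        IntronStops r = { intron.size(), stopsInFrame0(intron) };
        res[Interval(b, e - 1)] = r;
    }
}
```

Keying the map by genomic coordinates is why a fresh accumulator is needed for each genomic sequence: the same (begin, end) pair on another sequence would refer to a different intron.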

void usage (  ) 

Given gmap summary format, this program converts it into combest archive format (*.car).

It can be used as a pipe, or given specific file names.

This is a helper program to count distinct estids in the ESTId column of the combest result, so it reports the actual number of ESTs mapped. This could be lower for a deeply covered genome because of the coverage depth-dependent filtering.

void writeBaseCount ( ostream &  ous,
int  base[4] 
)

Parameters:
base input array of frequencies of the four bases
ous output stream; the result is written in the format Base Character <TAB> Count <TAB> Frequency. The expected stop frequency in isolation is also written.

Referenced by countIntronN().
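
The output format can be sketched as below. The names writeBaseTable, expectedStopFrequency and baseTableAsString are hypothetical, and so is the label on the last output line. The sketch further assumes that "expected stop frequency in isolation" means the probability that three independently drawn bases spell TAA, TAG or TGA; the source does not define the term, so treat that formula as a guess.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Assumed definition: probability that three independent draws from the
// base frequencies (A, C, G, T order) give TAA, TAG or TGA.
double expectedStopFrequency(const double freq[4]) {
    const double a = freq[0], g = freq[2], t = freq[3];
    return t * a * a + t * a * g + t * g * a;
}

// Hypothetical stand-in for writeBaseCount(): one line per base in the form
// Base <TAB> Count <TAB> Frequency, followed by the expected stop frequency.
void writeBaseTable(std::ostream &ous, const int base[4]) {
    static const char names[4] = {'A', 'C', 'G', 'T'};
    const int total = base[0] + base[1] + base[2] + base[3];
    assert(total > 0);  // frequencies are undefined for an empty tally
    double freq[4];
    for (int i = 0; i < 4; ++i) {
        freq[i] = static_cast<double>(base[i]) / total;
        ous << names[i] << '\t' << base[i] << '\t' << freq[i] << '\n';
    }
    ous << "expected stop frequency\t" << expectedStopFrequency(freq) << '\n';
}

// Convenience wrapper that returns the table as a string.
std::string baseTableAsString(const int base[4]) {
    std::ostringstream os;
    writeBaseTable(os, base);
    return os.str();
}
```

For a uniform composition each base has frequency 0.25, and the assumed formula gives 3/64 = 0.046875, which is the familiar chance that a random codon is a stop.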


Generated on Wed Aug 10 11:57:02 2011 for Softwares from Orpara by  doxygen 1.5.6