#include <iostream>
#include <fstream>
#include "RNAModel.h"
#include <mysql++.h>
#include "dbinfo.h"
#include <iomanip>

Classes
| struct | instopres |
Functions
| void | countStop (Connection &conn, const string &file, const map< string, string > &genostore) |
| void | cacheGenomic (Connection &conn, map< string, string > &gstore, const string &host, const string &database) |
| void | stopsInIntrons (const Noschain &ch, const string &gseq, map< Range, instopres > &res) |
| int | numberOfStops (const string &seq) |
| void | countIntronN (Connection &conn, const string &file, const map< string, string > &genostore) |
| void | countNInIntrons (const Noschain &ch, const string &gseq, map< Range, int > &res, int bs[4]) |
| void | writeBaseCount (ostream &ous, int base[4]) |
| void | usage () |
| int | main (int argc, char *argv[]) |
void cacheGenomic(Connection &conn, map<string, string> &gstore, const string &host, const string &database)
The schema for genomic sequence in JGI is not straightforward, so this function may not work well. It is usually better to make a genomic table from the subtables. There are three tables: scaffold, scaffoldSeq, and scaffoldInfo. scaffold is a JGI track table. scaffoldInfo describes the internal structure of each scaffold in terms of contigs; it also has a sequence column, but that column is left unpopulated. scaffoldSeq holds the actual sequence. If the genomic sequence is long, you cannot use the scaffoldSeq table; you have to use my kzgenomic table instead.
Referenced by main().
void countIntronN(Connection &conn, const string &file, const map<string, string> &genostore)
Counts N in all intron sequences; both coding and non-coding introns are counted. The four bases plus N are counted, and their frequencies are derived.
References countNInIntrons(), and writeBaseCount().
Referenced by main().
void countNInIntrons(const Noschain &ch, const string &gseq, map<Range, int> &res, int bs[4])
| bs[] | A 4-element array of A, C, G, T counts from the intron sequences. This is an accumulator. |
| res | Map from intron (begin, end) to its number of N bases. |
References Range::begin(), Range::direction(), Range::end(), Noschain::numberOfRanges(), and reverseComplement().
Referenced by countIntronN().
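A minimal sketch of the per-intron counting described for countNInIntrons(), assuming exons arrive sorted by genomic position as 0-based inclusive intervals and that minus-strand introns are reverse-complemented before counting. SimpleRange, revComp() and countNInIntronsSketch() are hypothetical stand-ins, not the project's Noschain, Range or reverseComplement() API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Range: an exon as 0-based, inclusive coordinates (b <= e).
struct SimpleRange {
    int b;
    int e;
};

// Reverse complement of a DNA string; non-ACGT characters (e.g. N) are kept as-is.
std::string revComp(const std::string& s) {
    std::string r(s.rbegin(), s.rend());
    for (char& c : r) {
        switch (std::toupper(static_cast<unsigned char>(c))) {
            case 'A': c = 'T'; break;
            case 'T': c = 'A'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            default: break;
        }
    }
    return r;
}

// For each intron (gap between consecutive exons), store its N count in res
// and add its A, C, G, T counts to the accumulator bs[0..3].
void countNInIntronsSketch(const std::vector<SimpleRange>& exons,
                           bool minusStrand, const std::string& gseq,
                           std::map<std::pair<int, int>, int>& res,
                           int bs[4]) {
    for (std::size_t i = 0; i + 1 < exons.size(); ++i) {
        const int ib = exons[i].e + 1;      // first intron base
        const int ie = exons[i + 1].b - 1;  // last intron base
        if (ie < ib) continue;              // abutting exons: no intron
        assert(ib >= 0 && static_cast<std::size_t>(ie) < gseq.size());
        std::string intron = gseq.substr(ib, ie - ib + 1);
        if (minusStrand) intron = revComp(intron);  // read on the transcript strand
        int n = 0;
        for (char ch : intron) {
            switch (std::toupper(static_cast<unsigned char>(ch))) {
                case 'A': ++bs[0]; break;
                case 'C': ++bs[1]; break;
                case 'G': ++bs[2]; break;
                case 'T': ++bs[3]; break;
                case 'N': ++n; break;
                default: break;  // other IUPAC codes are ignored
            }
        }
        res[{ib, ie}] = n;
    }
}
```

Keying the result by the intron's (begin, end) pair mirrors the map<Range, int> in the documented signature, while bs keeps accumulating across calls.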
void countStop(Connection &conn, const string &file, const map<string, string> &genostore)
Counts stop codons in coding introns, in the frame of the preceding exon.
References stopsInIntrons(), and Noschain::subchain().
Referenced by main().
int main(int argc, char *argv[])
int numberOfStops(const string &seq)
Counts stop codons in frame 0, which is in frame with the preceding exon.
Referenced by stopsInIntrons().
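A sketch of frame-0 stop counting as described for numberOfStops(), assuming an upper-case sequence and the standard stop codons TAA, TAG and TGA; the name numberOfStopsSketch() is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count stop codons in reading frame 0, i.e. codons starting at 0, 3, 6, ...
// A trailing partial codon (1 or 2 bases) is ignored.
int numberOfStopsSketch(const std::string& seq) {
    int stops = 0;
    for (std::size_t i = 0; i + 3 <= seq.size(); i += 3) {
        const std::string codon = seq.substr(i, 3);
        if (codon == "TAA" || codon == "TAG" || codon == "TGA") ++stops;
    }
    return stops;
}
```

Stepping by three means a stop that straddles codon boundaries, such as the TAA in ATAAGG, is not counted.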
void stopsInIntrons(const Noschain &ch, const string &gseq, map<Range, instopres> &res)
| res | Accumulator for each genomic sequence. A Range is meaningful only within this genomic sequence. |
References Range::begin(), Range::direction(), Range::end(), Range::length(), Noschain::numberOfRanges(), numberOfStops(), and reverseComplement().
Referenced by countStop().
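A sketch of how stopsInIntrons() might walk the introns of one chain, assuming sorted 0-based inclusive exon intervals, reverse complementing on the minus strand, and, as the numberOfStops() description states, that frame 0 of each intron is in frame with the preceding exon. ExonRange, IntronStops and stopsInIntronsSketch() are hypothetical; the real fields of instopres are not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical exon interval: 0-based, inclusive genomic coordinates.
struct ExonRange { int b; int e; };

// Hypothetical stand-in for instopres: intron length and frame-0 stop count.
struct IntronStops { int length; int stops; };

// Reverse complement (upper-case ACGT assumed; other characters kept).
std::string revComp(const std::string& s) {
    std::string r(s.rbegin(), s.rend());
    for (char& c : r) {
        if (c == 'A') c = 'T';
        else if (c == 'T') c = 'A';
        else if (c == 'C') c = 'G';
        else if (c == 'G') c = 'C';
    }
    return r;
}

// Stop codons in frame 0 (TAA, TAG, TGA); a trailing partial codon is ignored.
int frame0Stops(const std::string& s) {
    int n = 0;
    for (std::size_t i = 0; i + 3 <= s.size(); i += 3) {
        const std::string codon = s.substr(i, 3);
        if (codon == "TAA" || codon == "TAG" || codon == "TGA") ++n;
    }
    return n;
}

// Record length and frame-0 stop count for every intron between consecutive exons.
void stopsInIntronsSketch(const std::vector<ExonRange>& exons, bool minusStrand,
                          const std::string& gseq,
                          std::map<std::pair<int, int>, IntronStops>& res) {
    for (std::size_t i = 0; i + 1 < exons.size(); ++i) {
        const int ib = exons[i].e + 1;
        const int ie = exons[i + 1].b - 1;
        if (ie < ib) continue;  // no intron between abutting exons
        assert(static_cast<std::size_t>(ie) < gseq.size());
        std::string intron = gseq.substr(ib, ie - ib + 1);
        if (minusStrand) intron = revComp(intron);
        res[{ib, ie}] = IntronStops{ie - ib + 1, frame0Stops(intron)};
    }
}
```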
void usage()
Given gmap summary format, this program converts it into combest archive format (*.car).
It can be used as a pipe or given specific file names.
This is a helper program to count distinct EST IDs in the ESTId column of the combest result, so it gives the actual number of ESTs mapped. This number can be lower for a deeply covered genome because of coverage-depth-dependent filtering.
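A sketch of counting distinct EST IDs, assuming tab-separated input with an optional header line. The column index of ESTId and the function countDistinctIds() are hypothetical, since the combest result layout is not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Count distinct non-empty values in tab-separated column `col` (0-based),
// optionally skipping one header line.
std::size_t countDistinctIds(std::istream& in, std::size_t col, bool hasHeader) {
    std::set<std::string> ids;  // a set keeps each EST ID once
    std::string line;
    if (hasHeader && !std::getline(in, line)) return 0;  // empty input
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string field;
        bool found = false;
        for (std::size_t i = 0; std::getline(fields, field, '\t'); ++i) {
            if (i == col) { found = true; break; }
        }
        if (found && !field.empty()) ids.insert(field);
    }
    return ids.size();
}
```

Because duplicates collapse in the set, an EST mapped to several loci is counted once, which is what "actual number of ESTs mapped" implies.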
void writeBaseCount(ostream &ous, int base[4])
| base | Input array of the counts of the four bases (A, C, G, T) |
| ous | Output stream. The result is written as Base character <TAB> Count <TAB> Frequency; the expected stop-codon frequency in isolation is also written. |
Referenced by countIntronN().
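A sketch of the output format, assuming bases are ordered A, C, G, T and that "expected stop frequency in isolation" means the probability of TAA, TAG or TGA when bases occur independently at the observed frequencies, i.e. P(stop) = fT·fA·fA + fT·fA·fG + fT·fG·fA. The function name and the label of the last line are hypothetical.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Write "Base<TAB>Count<TAB>Frequency" for A, C, G, T, then the expected
// stop-codon frequency under independent bases. Returns that frequency.
double writeBaseCountSketch(std::ostream& ous, const int base[4]) {
    const char names[4] = {'A', 'C', 'G', 'T'};
    long total = 0;
    for (int i = 0; i < 4; ++i) total += base[i];
    double f[4] = {0.0, 0.0, 0.0, 0.0};
    for (int i = 0; i < 4; ++i) {
        // Guard against empty input (all counts zero).
        f[i] = total > 0 ? static_cast<double>(base[i]) / total : 0.0;
        ous << names[i] << '\t' << base[i] << '\t' << f[i] << '\n';
    }
    const double fA = f[0], fG = f[2], fT = f[3];
    // TAA + TAG + TGA
    const double pStop = fT * fA * fA + fT * fA * fG + fT * fG * fA;
    ous << "expected_stop\t" << pStop << '\n';
    return pStop;
}
```

With equal base counts each frequency is 0.25 and the expected stop frequency is 3/64 = 0.046875, a useful baseline against the observed in-frame stops in introns.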
1.5.6