gathergene.cpp File Reference

#include <iostream>
#include <fstream>
#include <mysql++.h>
#include <dbinfo.h>
#include "SimpleRNAModel.h"
#include <hatrees.h>

Functions

int groupModel (set< mRNAModelLight *, lessByChainDirectionPtr > &model, ostream &ous)
void clustergene (list< mRNAModelLight * > &mod, ostream &ous)
void releaseMemory (list< mRNAModelLight * > &mod)
void updateInputTable (const string &intab, const string &file, Connection &conn)
void createGeneTable (const string &tabname, const string &intab, Connection &conn)
void loadModelGeneTable (const string &genetable, const string &genefile, Connection &conn)
void combestInput (const string &intab, Connection &conn, const string &genetable)
void jgiInput (const string &intab, Connection &conn)
string nameGeneTable (const string &inputTable)
bool isJGIInput (Connection &conn, const string &tab)
void usage ()
int main (int argc, char *argv[])
void storeBadModel (const string &badtab, const map< int, string > &badmod, Connection &conn)
void updateJGIGene (const string &intab, const string &m2gtab, Connection &conn)

Function Documentation

void clustergene ( list< mRNAModelLight * > &  mod,
ostream &  ous 
)

Parameters:
mod the input list of models.
ous the output stream for the result, written as (modelid, geneid) rows where geneid == repmodelid. The representative model (repmodel) is the model with the longest sum of exons.
If mod has only one model, the function deallocates memory, clears mod, and returns. If there is more than one model, it uses the mRNAModelLight::sameGene() method to cluster the models into genes. The function uses a geneid counter to make geneids. It should use the modelid of the representative instead, which would save one value. This function should be moved into combest so that this program is no longer needed.

This function is also used to generate gene info for JGI tracks, so it cannot be discarded yet. It will be treated as a new branch.

References hatrees< T >::clusterArray(), hatrees< T >::getNodeCount(), hatrees< T >::keyset(), and releaseMemory().
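The clustering step described above can be sketched as follows. This is a minimal illustration, not the code in gathergene.cpp: ModelSketch is a hypothetical stand-in for mRNAModelLight, a plain union-find stands in for hatrees (whose interface is not shown on this page), and the sameGene test is passed in as a callable. The sketch follows the (modelid, geneid == repmodelid) output convention, so each geneid is the modelid of the cluster member with the longest exon sum.

```cpp
#include <cstddef>
#include <map>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for mRNAModelLight: an id and the summed exon length.
struct ModelSketch {
    int modelId;
    int exonSum;
};

// Minimal union-find over the indices 0..n-1 (assumed substitute for hatrees).
class DisjointSets {
public:
    explicit DisjointSets(std::size_t n) : parent_(n) {
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }
    std::size_t find(std::size_t x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // path halving
            x = parent_[x];
        }
        return x;
    }
    void unite(std::size_t a, std::size_t b) { parent_[find(a)] = find(b); }

private:
    std::vector<std::size_t> parent_;
};

// Returns one (modelid, geneid) row per input model, in input order.
// geneid is the modelid of the representative of the model's cluster.
template <typename SameGene>
std::vector<std::pair<int, int>> clusterSketch(
        const std::vector<ModelSketch>& models, SameGene sameGene) {
    DisjointSets sets(models.size());
    for (std::size_t i = 0; i < models.size(); ++i) {
        for (std::size_t j = i + 1; j < models.size(); ++j) {
            if (sameGene(models[i], models[j])) sets.unite(i, j);
        }
    }
    // For every cluster root, remember the member with the longest exon sum.
    std::map<std::size_t, std::size_t> rep;
    for (std::size_t i = 0; i < models.size(); ++i) {
        const std::size_t root = sets.find(i);
        auto it = rep.find(root);
        if (it == rep.end() || models[i].exonSum > models[it->second].exonSum) {
            rep[root] = i;
        }
    }
    std::vector<std::pair<int, int>> rows;
    for (std::size_t i = 0; i < models.size(); ++i) {
        rows.emplace_back(models[i].modelId, models[rep[sets.find(i)]].modelId);
    }
    return rows;
}
```

Using the representative's modelid as the geneid means no separate counter has to be kept, which is the saving the description above refers to.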

void combestInput ( const string &  intab,
Connection &  conn,
const string &  genetable 
)

Uses combest tables as input to make a gene track. This function could also use the nrepstopfix1 table, whose schema is similar.

Parameters:
intab the input table. The default should be 'combest_model'. This function can also use tables with the columns (genomic, exons, cdsb, cdse), where cdsb and cdse are the genomic CDS begin and end, and genomic is the genomicid.

References createGeneTable(), groupModel(), string(), and updateInputTable().

Referenced by main().

void createGeneTable ( const string &  tabname,
const string &  intab,
Connection &  conn 
)

Creates a full-featured gene table from the model table. The gene table is just the simple stats summarized into one table, provided for convenience. Its columns are (geneid, genomicid, begin, end, nummodel).

Parameters:
intab the input table. It must have the schema from combest (modelid, geneid, genomicid, begin, end).
tabname the name of the gene table to be built.
This function basically runs a simple query that reformats the existing relation with a GROUP BY statement. It should be moved into loadcombest2mysql.
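A query of the kind described above might look like the sketch below. The actual SQL used by createGeneTable() is not shown on this page, so the aggregates are assumptions: MIN(begin) and MAX(end) for the gene extent and COUNT(*) for nummodel. The helper name geneTableQuerySketch is hypothetical, and it only builds the statement text. Table names are inserted without escaping, so it is suitable for trusted names only.

```cpp
#include <string>

// Builds a CREATE TABLE ... SELECT statement that summarizes the model table
// into one row per gene. The choice of aggregates is an assumption.
std::string geneTableQuerySketch(const std::string& tabname,
                                 const std::string& intab) {
    return "CREATE TABLE `" + tabname + "` AS "
           "SELECT geneid, genomicid, "
           "MIN(`begin`) AS `begin`, MAX(`end`) AS `end`, "
           "COUNT(*) AS nummodel "
           "FROM `" + intab + "` GROUP BY geneid, genomicid";
}
```

The resulting string could then be passed to whatever query interface the Connection object provides.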

int groupModel ( set< mRNAModelLight *, lessByChainDirectionPtr > &  model,
ostream &  ous 
)

Groups the models from one chromosome or genomic DNA into genes.

Parameters:
model the input: a set of mRNAModelLight pointers sorted directionally with lessByChainDirectionPtr().
ous the output in tabular format (modelid <TAB> geneid). The result is meant for direct upload into a database table and is used to update the extra geneid column of the input table.
The function first puts models with > 0.5 genomic overlap into groups, then clusters the members of each group into genes with sameGene().

References clustergene(), Range::length(), Range::merge(), min, and Range::overlap().
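The first pass, grouping by genomic overlap, can be sketched as below. Several details are assumptions rather than facts taken from gathergene.cpp: RangeSketch is a hypothetical closed interval standing in for Range, the overlap fraction is measured against the shorter of the two ranges (suggested by the reference to min, but not confirmed), and each new model is compared against the merged span of the current group.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical closed interval [begin, end] with begin <= end.
struct RangeSketch {
    int begin;
    int end;
};

int lengthOf(const RangeSketch& r) { return r.end - r.begin + 1; }

// Number of positions shared by a and b, or 0 if they do not overlap.
int overlapOf(const RangeSketch& a, const RangeSketch& b) {
    const int lo = std::max(a.begin, b.begin);
    const int hi = std::min(a.end, b.end);
    return hi >= lo ? hi - lo + 1 : 0;
}

// Input must be sorted by begin. Returns the group index of every range.
// A range joins the current group when its overlap with the group's merged
// span exceeds minFraction of the shorter of the two lengths.
std::vector<int> groupByOverlapSketch(const std::vector<RangeSketch>& sorted,
                                      double minFraction = 0.5) {
    std::vector<int> group(sorted.size());
    int current = -1;
    RangeSketch span{0, 0};
    for (std::size_t i = 0; i < sorted.size(); ++i) {
        const RangeSketch& r = sorted[i];
        bool joins = false;
        if (current >= 0) {
            const double shorter = std::min(lengthOf(span), lengthOf(r));
            joins = overlapOf(span, r) > minFraction * shorter;
        }
        if (joins) {
            // Merge the new range into the span of the current group.
            span.begin = std::min(span.begin, r.begin);
            span.end = std::max(span.end, r.end);
        } else {
            ++current;  // start a new group
            span = r;
        }
        group[i] = current;
    }
    return group;
}
```

Each resulting group would then be handed to the clustering step, which splits it into genes with sameGene().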

bool isJGIInput ( Connection &  conn,
const string &  tab 
)

Referenced by main().

void jgiInput ( const string &  intab,
Connection &  conn 
)

void loadModelGeneTable ( const string &  genetable,
const string &  genefile,
Connection &  conn 
)

Simply loads a table file into a database table.

Parameters:
genefile the input file. The file has two columns (modelid, geneid).
genetable the database table name. This is a simple mapping table.

Referenced by jgiInput().

int main ( int  argc,
char *  argv[] 
)

The gathergene program is no longer run on combest, because this component has been added to the combest algorithm. The combest algorithm is much better, so gathergene should be discarded from the pipeline. However, the program is still very useful outside the combest pipeline, because it has been made to run on any JGI track.

References combestInput(), MysqlDBInfo::getAuthenInfo(), MysqlDBInfo::getPassword(), MysqlDBInfo::getUser(), isJGIInput(), jgiInput(), nameGeneTable(), and usage.

string nameGeneTable ( const string &  inputTable  ) 

Referenced by main().

void releaseMemory ( list< mRNAModelLight * > &  mod  ) 

void storeBadModel ( const string &  badtab,
const map< int, string > &  badmod,
Connection &  conn 
)

Referenced by jgiInput().

void updateInputTable ( const string &  intab,
const string &  file,
Connection &  conn 
)

Updates the input table with the gene table stored in a file. The function first loads the gene table file into a gene table. This gene table has only two columns (modelid => geneid) and is called the minimal table. It is not very useful on its own, so a full-featured gene table is made later. The function then uses the minimal table in the database to update the geneid column of the model table.
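The two steps described above could be expressed as the following pair of MySQL statements. This is a sketch under assumptions: the real statements issued by updateInputTable() are not shown on this page, the helper updateStatementsSketch and its genetab parameter are hypothetical (the documented signature takes only intab, file, and conn), and the file is assumed to be tab-delimited so that the LOAD DATA defaults apply. Names and paths are inserted without escaping.

```cpp
#include <string>
#include <vector>

// Returns the statements for (1) loading the minimal modelid => geneid table
// from a file and (2) copying geneid into the model table by joining on
// modelid. Only the statement text is built; nothing is executed.
std::vector<std::string> updateStatementsSketch(const std::string& intab,
                                                const std::string& genetab,
                                                const std::string& file) {
    return {
        "LOAD DATA LOCAL INFILE '" + file + "' INTO TABLE `" + genetab + "`",
        "UPDATE `" + intab + "` m JOIN `" + genetab + "` g "
        "ON m.modelid = g.modelid SET m.geneid = g.geneid"
    };
}
```

The statements are meant to be run in order, since the UPDATE reads the table that the LOAD DATA statement fills.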

void updateJGIGene ( const string &  intab,
const string &  m2gtab,
Connection &  conn 
)

Referenced by jgiInput().

void usage (  ) 

Given the gmap summary format, this program converts it into the combest archive format (*.car).

It can be used as a pipe, or given specific file names.

This is a helper program to count the distinct estids in the ESTId column of the combest result, which is the actual number of ESTs mapped. This number could be lower for a deeply covered genome because of the coverage depth-dependent filtering.


Generated on Wed Aug 10 11:57:02 2011 for Softwares from Orpara by  doxygen 1.5.6