Bacterial genome annotation using Annovar - annotations

I am writing my own pipeline in Python to annotate a bacterial genome (MTB). I am new to this field and a bit lost. I converted my VCF to the appropriate Annovar input format, but then I got stuck. I have to use dbSNP to annotate SNPs and H37Rv as the reference genome for annotation, but I don't really know the correct command format or what else I need to provide. I read the manual, but it is not really helping me. Does anyone have experience using Annovar to annotate bacterial genomes? Thanks in advance.

Related

RDKit: generate fingerprints from ZINC database for cluster analysis

I'm new to RDKit. I need to do a cluster analysis of a database of compounds.
I've downloaded 191K compounds from ZINC database in 3D mol2 format and now I need to obtain fingerprints using RDKit.
First, I don't understand whether it's possible to convert mol2 format into fingerprints, or what kind of fingerprint is better for this type of analysis (I need to understand which chemotypes I have in the database in order to eventually find some representatives).
Does anyone have suggestions? (Practical suggestions are really appreciated, too.)
Thanks
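RDKit supports loading mol2 files. You can use the MolFromMol2File function for that.
from rdkit import Chem

mol2_paths = ['path1', 'path2', 'path3']  # ... and so on, one entry per mol2 file
mols = []
for path in mol2_paths:
    mol = Chem.MolFromMol2File(path)
    if mol is not None:  # MolFromMol2File returns None when a file cannot be parsed
        mols.append(mol)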
The loop above loads each mol2 file and creates an RDKit molecule object for it. Once an object is created, you can use it to calculate any of the properties (just as you would if you had started from a SMILES string).
Now, for clustering, RDKit has a ClusterData module that you can use. See the module here. See an example usage of the module here. Another example here. Check out this presentation on different methods of clustering in RDKit here. An alternative way to cluster here.
I hope this is sufficient information for you to go ahead.
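To make the clustering step more concrete, here is a minimal sketch of greedy, centroid-based clustering on Tanimoto distance, in the spirit of the Butina algorithm. It deliberately does not call RDKit, so it runs without it installed: each fingerprint is represented as a plain Python set of on-bit indices. The example bit sets, the function names tanimoto and butina_style_clusters, and the 0.5 cutoff are all made up for illustration; with real data you would derive the bit sets from RDKit fingerprints of the loaded molecules.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def butina_style_clusters(fingerprints, cutoff=0.5):
    """Greedy clustering: every member lies within `cutoff` distance of its centroid.

    Returns a list of clusters; each cluster is a list of indices into
    `fingerprints`, with the centroid first.
    """
    n = len(fingerprints)
    # Build the neighbour lists: i and j are neighbours if distance <= cutoff.
    neighbours = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i):
            if 1.0 - tanimoto(fingerprints[i], fingerprints[j]) <= cutoff:
                neighbours[i].add(j)
                neighbours[j].add(i)

    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # The next centroid is the unassigned item with the most unassigned
        # neighbours (ties go to the lowest index).
        centre = max(sorted(unassigned), key=lambda i: len(neighbours[i] & unassigned))
        members = [centre] + sorted(neighbours[centre] & unassigned)
        clusters.append(members)
        unassigned -= set(members)
    return clusters


# Hypothetical fingerprints: two obvious groups of similar bit patterns.
example_fps = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {1, 2, 3, 4, 5},
    {10, 11, 12, 13},
    {10, 11, 12, 14},
]
for cluster in butina_style_clusters(example_fps, cutoff=0.5):
    print("centroid:", cluster[0], "members:", cluster)
```

The first index in each cluster is its centroid, which is a natural candidate for the "representative" of that group. The pairwise loop is quadratic and written for readability, so it only illustrates the idea; for a database of 191K compounds you would want the library's own optimised routines.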

How to write Moses decoder translations into a file?

I'm trying to test the Moses decoder. I have a huge amount of data and I need to write its translations to a file. That is, I want to give all the data to Moses and record the translations in a file. Can anyone help me? I have followed this guide: http://www.statmt.org/moses/?n=Moses.Baseline and so far I can only give it a single sentence to translate.
Thanks!

training a new model using pascal kit

I need some help with this.
Currently I am doing a project on computer vision that requires me to train a new model to detect a certain object.
In this case, I am using the system provided by P. Felzenszwalb, D. McAllester, D. Ramanan and their team: Discriminatively Trained Deformable Part Models, which is implemented in Matlab.
Project webpage: http://www.cs.uchicago.edu/~pff/latent/.
However, I have no idea how to direct the system to use my dataset (a collection of images and annotations), which is different from the PASCAL datasets, so as to train a new model.
By directing, I mean a line of code that lets me change the dataset the system reads from when training a model.
E.g.
% directory for caching models, intermediate data, and results
cachedir = ['/var/tmp/rbg/YOURPATH/' VOCyear '/'];
I tried looking at their README and documentation guides, but they do not mention this. Do correct me if I am wrong.
Let me know if I have not made my problem clear enough.
I also looked at some files such as global.m, but no luck.
Your help is much appreciated and thanks in advance!
You can try reading pascal.m in the DPM package (voc-release5); it contains similar code that works with the VOC2007/2010 datasets.
There are plenty of parts that need to be adapted to achieve this. For example, voc_config has to be adapted in order to read from your files.
The same goes for the pascal_train.m function. Depending on the images and the way you parse them, adapting this function may take quite some time.
Other functions to consider:
imreadx
pascal_test
pascaleval

Extracting function call list from Doxygen XML Output

I posted a question on the Doxygen forums and am also posting it here for a better response.
I have a moderately sized C project of about 2,900 functions. I am using Doxygen 1.5.9 and it is successfully generating a call graph for the functions. Is there a way to extract this for further analysis? A simple paired list would be sufficient, e.g.
Caller,Callee
FunctionX, FunctionY
...
I am comfortable with XSLT, but I must say that the Doxygen XML output is complex. Has anyone done this before who can provide some guidance on how to parse the XML files?
Thanks in advance!
Based on what I see in the contrived example that I created:
Parse files with a name similar to ^_(.+)\d+(c|cpp|h|hpp)\.xml$, if my regex-fu is right.
Find all <memberdef kind="function">. Each has a unique id attribute. I believe the XPath for this is //memberdef[@kind='function'].
Within that element, find all <references>.
For each of those tags, the refid attribute uniquely refers to the id attribute of the corresponding <memberdef> that is being called.
The text node within each <references> corresponds to the <name> of the corresponding <memberdef> that is being called.
This seems like a nice, straightforward way to express call graphs. You should have no trouble using XSLT or any other sane XML-parsing suite to get the desired results.
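The steps above can be sketched in a few lines of Python using only the standard library. This is a sketch under assumptions, not a tested recipe for real Doxygen output: SAMPLE_XML is a hand-written, heavily simplified stand-in for one Doxygen XML file, the ids in it are invented, and the function name call_pairs is hypothetical. It only relies on the element and attribute names described above (memberdef, kind, name, references).

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in for a Doxygen XML file (ids are invented).
SAMPLE_XML = """<doxygen>
  <compounddef id="main_8c" kind="file">
    <sectiondef kind="func">
      <memberdef kind="function" id="main_8c_1a1">
        <name>FunctionX</name>
        <references refid="main_8c_1a2">FunctionY</references>
        <references refid="main_8c_1a3">FunctionZ</references>
      </memberdef>
      <memberdef kind="function" id="main_8c_1a2">
        <name>FunctionY</name>
      </memberdef>
      <memberdef kind="variable" id="main_8c_1a4">
        <name>counter</name>
      </memberdef>
    </sectiondef>
  </compounddef>
</doxygen>"""


def call_pairs(xml_text):
    """Return (caller, callee) name pairs found in one XML document."""
    root = ET.fromstring(xml_text)
    pairs = []
    # Every function definition, wherever it sits in the tree.
    for member in root.iterfind(".//memberdef[@kind='function']"):
        caller = member.findtext("name")
        # Each <references> child names one entity this function refers to.
        for ref in member.iterfind("references"):
            pairs.append((caller, ref.text))
    return pairs


print("Caller,Callee")
for caller, callee in call_pairs(SAMPLE_XML):
    print(f"{caller},{callee}")
```

To cover a whole project you would run call_pairs over each matching XML file and concatenate the results. If two functions can share a name, the refid attribute is the safer key to join on than the text of the element.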

smlnj listdir problems

I am a newbie learning SML, and the question I have been given involves IO functions that I do not understand.
Here are the 2 questions that I really need help with to get started. Please provide some code and an explanation; I will be able to use trial and error with the given code for the other questions.
Q2) readlist(filename), which reads a list of filenames (each of which was produced by listdir in Q1) and combines them into one large list.
(It reads from the text file in Q1 and then puts the contents into one big list containing all the information.)
The thing is, the lecturer only covered the introduction section in school. No input or output example was shown, and not even the "use file" function was taught. If anyone who knows SML sees this, please help. Thanks to anyone who takes the effort to help me.
Thanks for the reply. Currently I am using SML/NJ to try to do this. Basically, Q1 requires me to list the files of the directory "directoryname" into the text file "filename". Q2 requires me to read from the "filename" text file and place the contents into one large list.
BTW, if you keep seeing this post and something is unclear, please ask questions as well. Currently I am stuck trying to read from the txt file and append it to a list. I am able to do it for a single line, but I am now trying to do it for the whole file:
fun readlist(infile : string) =
  let val ins = TextIO.openIn infile
      fun listing() =
        TextIO.inputLine ins;
  in listing()
  end;
TextIO.closeIn;
It is very hard for me to make out what questions you are trying to ask.
The functions you ask about are not part of the Standard Basis Library for ML. If you are supposed to write them, you are going to have a hard time without some kind of Posix module. You can tell your instructor I didn't care for this assignment.
Moscow ML contains a listDir function which is admirably simple:
- load "Mosml";
> val it = () : unit
- Mosml.listDir ".";
> val it = ["natural-semantics.djvu", "natural-semantics.pdf"] : string list
-
To get more help, please be a little clearer about what you are asking.
EDIT: Since it's a homework question I shouldn't just give you the answer, but some useful functions include openDir, readDir, and closeDir from the OS.FileSys structure. These will tell you what's in the directory. Then to read and write files you'll want TextIO.
You'll find the Standard Basis Library documentation indispensable.
You sure i didn't teach u?
u owe me one chicken pie.