RDKit: generate fingerprints from ZINC database for cluster analysis

I'm new to RDKit. I need to do a cluster analysis of a database of compounds.
I've downloaded 191K compounds from the ZINC database in 3D mol2 format, and now I need to obtain fingerprints using RDKit.
First, I'm not sure whether it's possible to convert the mol2 format into fingerprints, or which kind of fingerprint is best for this type of analysis (I need to understand which chemotypes I have in the database in order to eventually pick some representatives).
Does anyone have suggestions? (Practical suggestions are especially appreciated.)
Thanks

RDKit supports loading mol2 files. You can use the Chem.MolFromMol2File function for that.
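from rdkit import Chem
mol2_paths = ['path1', 'path2', 'path3']  # ...and so on for the rest of your files
mols = []
for path in mol2_paths:
    mol = Chem.MolFromMol2File(path)
    if mol is not None:  # MolFromMol2File returns None when a file cannot be parsed
        mols.append(mol)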
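The loop above loads each mol2 file and creates an RDKit molecule object for it. MolFromMol2File reads a single molecule per call, so a download that packs many molecules into one mol2 file has to be split into one molecule per file first. Once a molecule object exists, you can calculate any property or fingerprint from it, just as you would for a molecule created from a SMILES string.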
For clustering, RDKit has a ClusterData function (in the rdkit.ML.Cluster.Butina module) that you can use. See the module here, an example usage of it here, and another example here. Check out this presentation on different clustering methods in RDKit here, and an alternative way to cluster here.
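RDKit's ClusterData implements Taylor-Butina clustering. To show what it does without depending on RDKit, here is a minimal pure-Python sketch of the same idea. It assumes each fingerprint is a set of on-bit indices (for example, what `GetOnBits()` gives you from an RDKit bit-vector fingerprint); the names `tanimoto` and `butina_cluster` are hypothetical helpers, not RDKit's API. At each step it picks as centroid the unclustered molecule with the most unclustered neighbours, which matches the re-ranking variant (the `reordering` option of RDKit's ClusterData) rather than a single up-front sort.

```python
from typing import List, Sequence, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints count as identical
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def butina_cluster(fps: Sequence[Set[int]], dist_thresh: float) -> List[Tuple[int, ...]]:
    """Taylor-Butina clustering on Tanimoto distance (1 - similarity).

    Hypothetical helper, not RDKit's API. Returns clusters as tuples of
    indices into `fps`; the first index of each tuple is the centroid.
    """
    n = len(fps)
    # j is a neighbour of i when their distance is within the threshold.
    neighbours: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if 1.0 - tanimoto(fps[i], fps[j]) <= dist_thresh:
                neighbours[i].add(j)
                neighbours[j].add(i)

    unassigned = set(range(n))
    clusters: List[Tuple[int, ...]] = []
    while unassigned:
        # Centroid: unassigned molecule with the most unassigned neighbours;
        # max() keeps the first maximum, so ties go to the lowest index.
        centroid = max(sorted(unassigned), key=lambda i: len(neighbours[i] & unassigned))
        members = sorted(neighbours[centroid] & unassigned)
        clusters.append((centroid, *members))
        unassigned -= {centroid, *members}
    return clusters


if __name__ == "__main__":
    fps = [{1, 2, 3}, {1, 2, 3, 4}, {10, 11}, {10, 11, 12}, {20}]
    print(butina_cluster(fps, dist_thresh=0.4))  # [(0, 1), (2, 3), (4,)]
```

The first index of each cluster is its centroid, a natural choice for the "representatives" mentioned in the question. The neighbour search compares every pair, so the cost grows quadratically: 191K compounds means roughly 1.8 × 10^10 pairs, which is why in practice you would lean on RDKit's compiled routines (such as DataStructs.BulkTanimotoSimilarity) or work on a subset.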
Hopefully this is enough information to get you going.


Matlab converting library to model

I'm working on a script to convert a Simulink library to a plain model, meaning one that can be simulated, does not auto-lock, etc.
Is there a way to do this with code, other than basically copy-pasting every single block into a new model? And if there isn't, what is the most efficient way to do the "copy-paste"?
I was not able to find any clues on how to approach this problem here, on Google, in the official documentation, or on the MathWorks forum, so I'm at a loss on how to proceed.
Thank you in advance!
I don't think it's possible to convert a library to a model, but you can programmatically add library blocks to models like so:
sys = 'testModel';
new_system(sys);
open_system(sys);
add_block('Simulink/Sources/Sine Wave', [sys, '/MySineWave']);
save_system(sys);
close_system(sys);
sim(sys);
You could even use the find_system command to list all the blocks in a library and then loop through them all and create a new model for each using the above code.

Is it possible to update and use updated .ini and .ned files while an OMNeT++ simulation is running?

I am trying to run OMNeT++ and MATLAB in parallel and want them to communicate. While OMNeT++ is running, I want to update the positions of the nodes, and for that I want to edit the .ned and .ini files continuously with MATLAB results. During the simulation I want the result files to be generated using the updated files. I only want to update the positions; I don't want to add or delete any nodes. Could you suggest a way to proceed?
matlab_loop
{
matlab_writes_position_in_ned_file;
delay(100ms);
}
omnet_loop
{
omnet_loads_ned_and_simulates;
//sca and vec should update;
delay(100ms);
}
Thank you.
NED and ini files are read only during initialization of the model; you can't re-read them after the simulation has started. On the other hand, you are free to modify your parameters and create/delete modules using OMNeT++'s C++ API. What you want to achieve is basically to set your node positions based on calculations carried out by MATLAB code. The proper way to do it:
Generate C code from your MATLAB code.
Link that code to your OMNeT++ model.
Create a new mobility model (assuming you are using INET) that uses the MATLAB-generated code.
What you are looking for seems to be more of a project than a question that can be solved on a Q&A site like Stack Overflow.
Unfortunately, I don't know MATLAB and V-REP well enough to give you a satisfactory answer. However, it seems that you will need to work with the APIs at a lower level.
As an example of coupling different simulation tools into a simulation framework, consider reading this paper and this one.
Also note the answer given by Rudi. He seems to know what he is talking about.

Training a new model using the PASCAL kit

I need some help with this.
Currently I am doing a project on computer vision that requires me to train a new model to detect a certain object.
In this case, I am using the system provided by P. Felzenszwalb, D. McAllester, D. Ramanan and their team, Discriminatively Trained Deformable Part Models, which is implemented in MATLAB.
Project webpage: http://www.cs.uchicago.edu/~pff/latent/.
However, I have no idea how to direct the system to use my own dataset (a collection of images and annotations), which is different from the PASCAL datasets, in order to train a new model.
By "directing" I mean a line of code that changes which dataset the system reads from when training a model.
E.g.
% directory for caching models, intermediate data, and results
cachedir = ['/var/tmp/rbg/YOURPATH/' VOCyear '/'];
I tried looking at their README and documentation guides, but they don't mention this. Do correct me if I am wrong.
Let me know if I have not made my problem clear enough.
I also looked at some files, such as global.m, but had no luck.
Your help is much appreciated and thanks in advance!
Try reading pascal.m in the DPM package (voc-release5); it contains similar code that works on the VOC2007/2010 datasets.
There are plenty of parts that need to be adapted to achieve this. For example, voc_config has to be adapted to read from your files.
The same goes for the pascal_train.m function. Depending on your images and the way you parse them, adapting this function may take quite some time.
Other functions to consider:
imreadx
pascal_test
pascaleval

Test data generator for InterSystems Caché?

Are there any easy ways (i.e. libraries) to create test data for Caché, similar to the Populator and Faker gems for Ruby/Rails?
Edit:
I am trying to create test data for an Epic implementation. In addition to the electronic-medical-record (EMR) application, the implementation includes a tool called 'Text'. I'm hoping that I can use the data-generator with Text.
The %Populate class has a bunch of methods designed to help you create test data for your persistent classes.
Do ##class(MyApp.MyClass).Populate()
You could also use the %PopulateUtils class directly to get random values:
USER>w ##class(%PopulateUtils).Name()
Taylor,Kenny O.
USER>w ##class(%PopulateUtils).Street()
3012 Oak Drive
USER>w ##class(%PopulateUtils).SSN()
113-89-3577
mccrackend is right. The docs on this can be found here:
http://docs.intersystems.com/cache20102/csp/docbook/DocBook.UI.Page.cls?KEY=GOBJ_populate

Export from OpenCascade, import into OpenSceneGraph

We have a modeling tool which uses OCC, and a 3D editor using OSG. What I want to do is export a model from the first tool and import it into the second. I have been searching the web for days, but I can't find a solution.
Any of these three things would solve my problem:
An exporter for OCC that exports to OSG-supported formats (.ive, .osg, and many more),
An importer for OSG that imports OCC-supported formats (.stp, .step, .igs, .iges, .brp, .brep),
A converter between a format supported by OCC and a format supported by OSG.
Has anybody done this before, or know of anything that can help?
I am trying to avoid writing a custom exporter for OCC.
I found a solution. OpenCascade has an import/export example which can export VRML files, without texture support. Some modifications to the import/export code and to other parts (where the OCC model is represented by VRML classes) were enough to successfully export my model to a VRML file. Then I built the VRML plugin for OpenSceneGraph and successfully imported the model.
CADExchanger (OCC based) does a pretty good job converting between BRep and other formats (STEP, IGES, STL, VRML...)
Why don't you have a look at pythonocc.org?
I'm assuming OSG takes meshes?
Load the STEP / IGES file in (python)OCC, grab its mesh, push the verts / indices to OSG.
Would that work?
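A minimal sketch of the "push the verts / indices to OSG" step, assuming you have already triangulated the shape in (python)OCC and extracted, for each face, its node coordinates (already transformed into model coordinates), its triangles as index triples, and whether the face is reversed. `merge_face_meshes` is a hypothetical helper, not part of pythonocc or OSG. It concatenates all faces into one vertex list and one flat index list, which is the shape of data an OSG vertex array plus a triangle DrawElementsUInt primitive set would typically take. The `index_base=1` default assumes OCC-style 1-based triangle indices; pass 0 if your extraction already produces 0-based ones.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Tri = Tuple[int, int, int]
# Hypothetical per-face record: (nodes, triangles, is_reversed)
FaceMesh = Tuple[Sequence[Vec3], Sequence[Tri], bool]


def merge_face_meshes(faces: Sequence[FaceMesh], index_base: int = 1) -> Tuple[List[Vec3], List[int]]:
    """Concatenate per-face triangulations into one vertex list and one flat index list.

    Triangle indices are local to each face and start at `index_base`.
    Triangles of reversed faces get their winding flipped so the winding
    stays consistent with the face orientation.
    """
    vertices: List[Vec3] = []
    indices: List[int] = []
    for nodes, triangles, is_reversed in faces:
        offset = len(vertices)  # global index of this face's first node
        vertices.extend(nodes)
        for a, b, c in triangles:
            if is_reversed:
                b, c = c, b  # swapping two corners flips the winding order
            indices.extend(offset + i - index_base for i in (a, b, c))
    return vertices, indices


if __name__ == "__main__":
    # A unit square split into two faces, the second one reversed.
    square = [
        ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)], [(1, 2, 3)], False),
        ([(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)], [(1, 2, 3)], True),
    ]
    verts, idx = merge_face_meshes(square)
    print(len(verts), idx)  # 6 [0, 1, 2, 3, 5, 4]
```

Vertices shared between faces are duplicated rather than welded; that keeps normals sharp along CAD edges, at the cost of a larger vertex array.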