Is there a way to modify a molecule in RDKit? - chemistry

I have a branched molecule like the one in the image (left).
I want to add COOH at the end of each branch, as in the image (right).
Here is the SMILES of my molecule in a simplified form with 4 branches:
[N:1]([CH2:2][CH2:3][N:4]([CH2:47][CH2:48][CH:49]([NH:50][CH2:51][CH2:52][NH2:53])[O-:55])[CH2:66][CH2:67][CH:68]([NH:69][CH2:70][CH2:71][NH2:72])[O-:74])([CH2:9][CH2:10][CH:11]([NH:12][CH2:13][CH2:14][NH2:15])[O-:17])[CH2:28][CH2:29][CH:30]([NH:31][CH2:32][CH2:33][NH2:34])[O-:36]
I actually have a much bigger molecule, but if I can find a way to do it with the simple one, I think I can extend the solution to the bigger one.
Here is a code example:
mod_mol = Chem.ReplaceSubstructs(m,
                                 Chem.MolFromSmiles('[NH2:34]'),
                                 Chem.MolFromSmiles('[CH2:99]'),
                                 replaceAll=True)
mod_mol[0]
For example, I tried to change NH2 to CH2, but nothing happens.

In general, it helps to check where the error message shows a NoneType. In this case:
rdkit.Chem.rdmolops.ReplaceSubstructs(Mol, NoneType, Mol)
The second argument is None because Chem.MolFromSmiles was given a SMARTS string, like this:
`Chem.MolFromSmiles('[NH2:34]')`
The solution is to use Chem.MolFromSmarts instead, like this:
Chem.MolFromSmarts('[NH2:34]')
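As a minimal sketch of that fix, the snippet below builds the query with Chem.MolFromSmarts and passes it to ReplaceSubstructs. The small diamine NCCN, the SMARTS [NX3;H2] for a primary amine, and the variable names are assumptions chosen for illustration; they are not taken from the question's molecule.

```python
from rdkit import Chem

# Illustrative molecule (assumption): ethylenediamine, with two terminal NH2 groups.
mol = Chem.MolFromSmiles("NCCN")

# Query built from SMARTS: a trivalent nitrogen carrying exactly two hydrogens.
pattern = Chem.MolFromSmarts("[NX3;H2]")
# Replacement fragment built from SMILES: a single carbon atom.
replacement = Chem.MolFromSmiles("C")

# Both parsers return None on failure, so check before calling ReplaceSubstructs.
if mol is None or pattern is None or replacement is None:
    raise ValueError("could not parse one of the input strings")

# replaceAll=True replaces every match inside a single product molecule.
products = Chem.ReplaceSubstructs(mol, pattern, replacement, replaceAll=True)
product = products[0]
Chem.SanitizeMol(product)  # recompute valences and implicit hydrogens
smiles = Chem.MolToSmiles(product)
print(smiles)
```

The explicit None check follows the answer's advice: a NoneType in the error message points at an input string that failed to parse.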

Related

Cimplicity Screen - one object/button that is dependent on hundreds of points

So I have created a huge screen that essentially just shows the robot status for every robot in this factory (individually)… At the very end of the project, they decided they want one object on the screen that blinks if any of the 300 robots fault. I am trying to think of a way to make this work. Maybe a global script of some kind? Problem is, I do not do much scripting in Cimplicity, so any help is appreciated.
All the points that are currently used on this screen (to indicate a fault) have very similar names: the beginning is the same. So I was thinking of a script that could recognize whether a bit is high based on PART of its name string. The end will change a little each time, but I am sure there is a way to look at only part of a string and ignore the rest. If the end has to be hard coded, that's fine.
You can use a Python script in Cimplicity.
I will not go into detail on the use of python in Cimplicity, which is well described in the documentation indicated above.
Here's an example of what can be done. Note that I have no way to test it, and it only works if the names of your robots follow the format Robot_1, Robot_2, Robot_3 ... Robot_10 ... Robot_300. It also depends on the name and type of the fault variable; since you didn't define it, I assume it is an integer, with zero indicating no error. If you use something else, you can easily change it.
import cimplicity
(...)
OneRobotWithFault = False
# Read each robot's fault code and stop at the first fault found.
# The robots are named Robot_1 ... Robot_300, so count from 1 to 300.
for i in range(1, 301):
    pointName = f'MyFactory.Robot_{i}.FaultCode'
    robotFaultCode = cimplicity.point_get(pointName)
    if robotFaultCode > 0:
        OneRobotWithFault = True
        break
# Set the status to the variable "WeHaveRobotWithFault"
cimplicity.point_set("WeHaveRobotWithFault", OneRobotWithFault)

Parsing XML and retrieving attributes from (nested?) elements

I am trying to get specific data from an XML file, namely X, Y coordinates that appear, to my beginner's eyes, to be attributes of an element called "Point" in my file. I cannot get to that data with anything other than a sledgehammer approach and would gratefully accept some help.
I have used the following successfully:
for Shooter in root.iter('Shooter'):
    print(Shooter.attrib)
But if I try the same with "Point" (or "Points") there is no output. I cannot even see "Point" when I use the following:
for child in root:
    print(child.tag, child.attrib)
So: the sledgehammer
print([elem.attrib for elem in root.iter()])
Which gives me the attributes for every element. This file is a single collection of data and could contain hundreds of data points and so I would rather try to be a little more subtle and home in on exactly what I need.
My XML file
https://pastebin.com/abQT3t9k
UPDATE: Thanks for the answers so far. I tried the solution posted and ended up with 7000 lines of output, which wasn't quite what I was after. I should have explained in more detail. I also tried (as suggested)
def find_rec(node, element, result):
    for item in node.findall(element):
        result.append(item)
        find_rec(item, element, result)
    return result
print(find_rec(ET.parse(filepath_1), 'Shooter', [])) #Returns <Element 'Shooter' at 0x125b0f958>
print(find_rec(ET.parse(filepath_1), 'Point', [])) #Returns None
I admit I have never worked with XML files before, and I am new to Python (but enjoying it). I wanted to get the solution myself but I have spent days getting nowhere.
I perhaps should have just asked from the beginning how to extract the XY data for each ShotNbr (in this file there is just one) but I didn't want code written for me.
I've managed to get the XY from this file but my code will never work if there is more than one shot, or if I want to specifically look at, say, shot number 20.
How can I find shot number 2 (ShotNbr="2") and extract only its XY data points?
Assuming that you are using xml.etree.ElementTree, you are only looking at the direct children of root.
You need to recurse into the tree to access elements lower in the hierarchy.
This seems to be the same problem as "ElementTree - findall to recursively select all child elements",
which has an excellent answer that I am not going to plagiarize.
Just apply it.
Alternatively,
import xml.etree.ElementTree as ET
root = ET.parse("file.xml").getroot()
print(root.findall('.//Point'))
should work.
See: https://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax
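For the follow-up question (only the XY points of ShotNbr="2"), the same XPath subset can filter on an attribute value. The sketch below is untested against the real file: the sample XML, the element names Session, Shot and Points, and the helper points_for_shot are all hypothetical, since the actual layout is only available through the pastebin link. It assumes ShotNbr is an attribute of the element that encloses the Point elements, and that X and Y are attributes of Point.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the structure of the real file is an assumption.
SAMPLE = """
<Session>
  <Shooter Name="A">
    <Shot ShotNbr="1">
      <Points>
        <Point X="1.0" Y="2.0"/>
      </Points>
    </Shot>
    <Shot ShotNbr="2">
      <Points>
        <Point X="3.5" Y="4.5"/>
        <Point X="5.0" Y="6.0"/>
      </Points>
    </Shot>
  </Shooter>
</Session>
"""

def points_for_shot(root, shot_nbr):
    """Return (X, Y) tuples for every Point below the shot with this number."""
    # './/' searches at any depth; the predicate selects on the attribute value.
    shot = root.find(f".//Shot[@ShotNbr='{shot_nbr}']")
    if shot is None:
        return []
    # iter() walks all descendants of the selected shot only.
    return [(float(p.get("X")), float(p.get("Y"))) for p in shot.iter("Point")]

root = ET.fromstring(SAMPLE)
shot2_points = points_for_shot(root, 2)
print(shot2_points)
```

Searching inside the selected shot, rather than from root, is what keeps the output limited to that shot's points.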

Why does fastText test return only 1 example when my test file contains 135?

I'm trying to test the model (model.bin) I've made with fastText on a test file (test.txt). This test file contains 135 labelled examples. I expect fastText to test my model on that number of examples, but instead it only tests it on 1 example. Where does this problem come from?
I've already done the same thing with another model and another test file, and it all worked nicely.
This is how I test my model. model_baby.bin is the model, and test.data.txt is my test file.
./fasttext test model_baby.bin test.data.txt
N 1
P#1 1
R#1 0.0164
Number of examples: 1
And here is an extract from my test file:
__label__4.0 I love the fact you can hide your stuff. Only down is that the straps to hold it at midpoint and bottom could be better designed for your car. It's got plenty of room which is great. __label__5.0 This hid our ipad wonderfully. Especially for those quick stops where we all had jump out and use the restroom. It zipped, folded and held all our stuff for the kids in the back seat. __label__3.0
As I have more than 1 labelled example in my test file, I expect the output "Number of examples: " to be more than 1, but the actual value is "1".
From the official documentation (https://fasttext.cc/docs/en/supervised-tutorial.html): Each line of the text file contains a list of labels, followed by the corresponding document. All the labels start by the __label__ prefix, which is how fastText recognize what is a label or what is a word.
I don't fully understand your extract, but it looks like all the examples are on a single line. I think each example should be on its own line, like this:
__label__4.0 I love the fact you can hide your stuff. Only down is that the straps to hold it at midpoint and bottom could be better designed for your car. It's got plenty of room which is great.
__label__5.0 This hid our ipad wonderfully. Especially for those quick stops where we all had jump out and use the restroom. It zipped, folded and held all our stuff for the kids in the back seat.
__label__3.0 ...
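If the file really does hold every example on one line, a small script can rewrite it. This is only a sketch: the helper split_examples and the shortened sample text are made up for illustration, and it assumes exactly one label per example (an example carrying several labels would be split apart).

```python
import re

def split_examples(text):
    """Split text into examples, each starting at its __label__ marker."""
    # The lookahead keeps the __label__ prefix attached to the example
    # that follows it, so only the separating whitespace is consumed.
    parts = re.split(r"\s+(?=__label__)", text.strip())
    return [p for p in parts if p]

# Shortened, illustrative stand-in for the one-line test file.
raw = ("__label__4.0 I love the fact you can hide your stuff. "
       "__label__5.0 This hid our ipad wonderfully. "
       "__label__3.0 It was fine.")

lines = split_examples(raw)
print("\n".join(lines))
```

Writing "\n".join(lines) to a new file gives one example per line, which is the layout the quoted documentation describes.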

Remove Matlab r2014b Plot Browser limit

Among many distressing graphics changes in r2014b, the Plot Browser now only displays a certain number of lines per plot (the limit looks like 50). Any lines above this limit are not displayed in the Plot Browser; it just says "and 78 more..."
Is there any way to remove the limit? I want to see all my lines in the plot browser.
Unfortunately the answer is currently:
No you cannot remove this limit
This was already reported to mathworks a while ago, and here is the reply:
I am writing in reference to your Technical Support Case #01143663
regarding 'plot browser with "and xxxx more...." indication'.
Really interesting question (and even a bit surprising). Basically
this limitation has been introduced with MATLAB 2014b!
Our developers are aware of that, and they are working to solve it. An
enhancement/bug request has been already submitted, and I am going to
add this case to the list. However, I cannot guarantee you a release
date.
If you think this limitation is crucial for your work, I would
strongly suggest you to contact your account manager, which will have
a bit more influence on the developers than a simple engineer :).
Of course, if there is anything else I can do for you, please let me
know
So it seems that you will have to deal with this limitation if you keep using 2014b.
This was also an issue for me; in particular I wanted to see the DisplayName property of the graphics object in the list. I used a workaround where I created a callback function, so that when clicking on a data point, the DisplayName would be shown. This can be helpful if you have a plot of many lines and want to see the DisplayName of a particular one. You would first need to set the DisplayName property of the graphics objects for this to work, as it's empty by default. You could also use this to show other properties, such as Color or LineStyle, that are shown in the plot browser:
%Based on
%http://www.mathworks.com/help/matlab/ref/datacursormode.html
%'fig_h' is the figure handle
dcm_obj = datacursormode(fig_h);
set(dcm_obj,'UpdateFcn',@myupdatefcn)
And then include this function as a separate file on the Matlab path, or paste into the function you're currently writing, and include an extra 'end' at the end of that function:
function name = myupdatefcn(empt,event_obj)
    % Customizes text of data tips
    tar = get(event_obj,'Target');
    name = get(tar,'DisplayName');
end

Data Processing, how to approach

I have the following problem, given this XML data structure:
<level1>
  <level2ElementTypeA></level2ElementTypeA>
  <level2ElementTypeB>
    <level3ElementTypeA>String1Ineed</level3ElementTypeA>
  </level2ElementTypeB>
  ...
  <level2ElementTypeC>
    <level3ElementTypeB attribute1>
      <level4ElementTypeA>String2Ineed</level4ElementTypeA>
    </level3ElementTypeB>
  </level2ElementTypeC>
  ...
  <level2ElementTypeD></level2ElementTypeD>
</level1>
<level1>...</level1>
I need to create an entity which contains String1Ineed and String2Ineed.
So every time I come across a level3ElementTypeB with a certain value in attribute1, I have my String2Ineed. The ugly part is how to obtain String1Ineed, which is located in the first element of type level2ElementTypeB above the current level2ElementTypeC.
My 'imperative' solution is to always keep a variable with the last value of String1Ineed and, if I hit the criteria for String2Ineed, simply use that. But looking at this from a plain collection-processing point of view: how would you model the backtracking logic between String1Ineed and String2Ineed? Using the State monad?
Isn't this what XPath is for? You can find String2Ineed and then change the axis (for example to preceding-sibling) to search back for String1Ineed.
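A rough sketch of that idea in Python follows. The standard library's xml.etree.ElementTree implements only a subset of XPath and has no preceding-sibling axis, so the backward step is emulated by walking the earlier siblings in reverse; a full XPath engine could express it directly. The wrapping root element, the attribute value "wanted", and the helper pair_strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the question's structure (root and "wanted" are assumptions).
SAMPLE = """
<root>
  <level1>
    <level2ElementTypeA/>
    <level2ElementTypeB>
      <level3ElementTypeA>String1Ineed</level3ElementTypeA>
    </level2ElementTypeB>
    <level2ElementTypeC>
      <level3ElementTypeB attribute1="wanted">
        <level4ElementTypeA>String2Ineed</level4ElementTypeA>
      </level3ElementTypeB>
    </level2ElementTypeC>
    <level2ElementTypeD/>
  </level1>
</root>
"""

def pair_strings(root, wanted):
    """Pair each matching String2 with the String1 of the nearest earlier TypeB sibling."""
    pairs = []
    path = f"level3ElementTypeB[@attribute1='{wanted}']/level4ElementTypeA"
    for level1 in root.iter("level1"):
        siblings = list(level1)
        for idx, node in enumerate(siblings):
            if node.tag != "level2ElementTypeC":
                continue
            for hit in node.findall(path):
                # Emulates XPath: preceding-sibling::level2ElementTypeB[1]/level3ElementTypeA
                prev = next((s for s in reversed(siblings[:idx])
                             if s.tag == "level2ElementTypeB"), None)
                if prev is not None:
                    pairs.append((prev.findtext("level3ElementTypeA"), hit.text))
    return pairs

root = ET.fromstring(SAMPLE)
pairs = pair_strings(root, "wanted")
print(pairs)
```

Compared with the running-variable approach, the lookup starts from the match and searches backwards, so no state is carried between elements.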