MATLAB reading specific fields of a pdb file - matlab

I need to do this in MATLAB. I would like to extract the coordinates of certain atoms belonging only to certain residues in a PDB file. For example, I would like to extract the coordinates of the CA atoms of all alanine residues in the file. I tried find(strcmp(atoms,'CA')), but it returns all CA atoms, not only those of alanine. How can I solve this problem in MATLAB? Thank you.

All I know about PDB files is what I've read today at http://www.wwpdb.org/index and here (http://www.wwpdb.org/documentation/file-format-content/format33/v3.3.html).
I've used the example provided in the MATLAB help to read the PDB file.
Judging from the structure of the data read from the PDB file and the description of the file format, the data you are looking for appear to be in the Model.Atom field.
More precisely (gfl is the name of the struct returned by the pdbread function):
gfl.Model.Atom(:).AtomName
gfl.Model.Atom(:).resName
gfl.Model.Atom(:).X
gfl.Model.Atom(:).Y
gfl.Model.Atom(:).Z
If so, to identify the "CA" atoms of alanine you can combine the find and strcmp functions as follows:
pos=find(strcmp({gfl.Model.Atom(:).AtomName},'CA') & ...
strcmp({gfl.Model.Atom(:).resName},'ALA'))
The output array pos contains the indices of the atoms you are looking for.
To extract the coordinates, you can then use those indices as follows:
X=[gfl.Model.Atom(pos).X]
Y=[gfl.Model.Atom(pos).Y]
Z=[gfl.Model.Atom(pos).Z]
You can make the code more general by defining the atom name and the residue name as parameters.
The complete script, based on the example file provided in the MATLAB help, follows.
% Download a PDB file and save it locally (example from the MATLAB help)
gfl = getpdb('1GFL','TOFILE','1gfl.pdb')
% Read the PDB file
gfl = pdbread('1gfl.pdb')
% Define the Atom Name
atom_name='CA';
% Define the Residue Name
res_name='ALA';
% Search for the couple "Atom name - Residue Name"
pos=find(strcmp({gfl.Model.Atom(:).AtomName},atom_name) & ...
strcmp({gfl.Model.Atom(:).resName},res_name))
% Extract the coordinates of the Atoms matching the search criteria
X=[gfl.Model.Atom(pos).X]
Y=[gfl.Model.Atom(pos).Y]
Z=[gfl.Model.Atom(pos).Z]
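The same selection can also be written with a logical mask instead of find, collecting the result into one N-by-3 matrix. The following is only a minimal sketch: it uses a small hand-made struct array (atoms, with made-up values) in place of gfl.Model.Atom so that it runs without the Bioinformatics Toolbox; the field names are the ones used in the answer above.

```matlab
% Mock of the gfl.Model.Atom struct array. The field names follow the
% answer above; the values are invented purely for illustration.
atoms = struct( ...
    'AtomName', {'N',   'CA',  'CA',  'CB',  'CA'}, ...
    'resName',  {'ALA', 'ALA', 'GLY', 'ALA', 'ALA'}, ...
    'X',        {1.0, 2.0, 3.0, 4.0, 5.0}, ...
    'Y',        {1.5, 2.5, 3.5, 4.5, 5.5}, ...
    'Z',        {0.1, 0.2, 0.3, 0.4, 0.5});

% Logical mask: true where both the atom name and the residue name match
mask = strcmp({atoms.AtomName}, 'CA') & strcmp({atoms.resName}, 'ALA');

% Keep only the matching atoms and stack their coordinates as rows [X Y Z]
sel    = atoms(mask);
coords = [[sel.X].', [sel.Y].', [sel.Z].'];
```

With real data, replacing atoms by gfl.Model.Atom should give one row per alanine CA atom.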
Hope this helps.

Related

Why am I getting "Unable to read file 'topo60c'. No such file or directory" error in Matlab?

Many of Matlab's Mapping toolbox examples require "topo60c" world map data. Here's an example
load topo60c
axesm hatano
meshm(topo60c,topo60cR)
zlimits = [min(topo60c(:)) max(topo60c(:))];
demcmap(zlimits)
colorbar
However, when I run the above script, Matlab displays a file not found error for "topo60c". Does anyone know why I'm getting this error? I have the Mapping toolbox installed, and it works with other Mapping sample code that doesn't reference that file.
In the acknowledgements section of the mapping toolbox docs there is a note about example data sources:
https://uk.mathworks.com/help/map/dedication-and-acknowledgment.html
Except where noted, the information contained in example and sample data files (found in matlabroot/examples/map/data and matlabroot/toolbox/map/mapdata) is derived from publicly available digital data sets. These data files are provided as a convenience to Mapping Toolbox™ users. MathWorks® makes no claims that any of this data is free of defects or errors, or that the representations of geographic features or names are up to date or authoritative.
You can open these folders from MATLAB (on Windows) using
winopen( fullfile( matlabroot, 'examples/map/data' ) )
winopen( fullfile( matlabroot, 'toolbox/map/mapdata' ) )
Or simply use the fullfile commands above to identify the paths and navigate there yourself.
I can see (MATLAB R2020b) the topo60c file within the first of these folders, which isn't on your path by default because it's within "examples" and not a toolbox directory.
So you could either:
1. Add this folder to your path so that MATLAB can see the file: addpath(fullfile(matlabroot,'examples/map/data'));
2. Reference the full file path to the data when running examples: load(fullfile(matlabroot,'examples/map/data/topo60c.mat'));
I would prefer option 2 to avoid changing the path.
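As a rough sketch of the full-path approach, the snippet below checks that the file exists before loading it, so a missing file produces a readable message rather than an error. It assumes the folder layout described above (which may differ between releases) and uses isfile, which requires R2017b or later; S is just an illustrative variable name.

```matlab
% Build the expected location of the example data (layout as seen in R2020b;
% other releases may store it elsewhere).
dataDir  = fullfile(matlabroot, 'examples', 'map', 'data');
dataFile = fullfile(dataDir, 'topo60c.mat');

if isfile(dataFile)
    % Load into a struct so the base workspace and the path stay untouched
    S = load(dataFile);
    disp(fieldnames(S));   % list the variables stored in the MAT-file
else
    fprintf('topo60c.mat not found in %s\n', dataDir);
end
```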
Additionally, there is another note in the Raster Geodata section of the docs which details what that dataset should contain
https://uk.mathworks.com/help/map/raster-geodata.html
When raster geodata consists of surface elevations, the map can also be referred to as a digital elevation model/matrix (DEM), and its display is a topographical map. The DEM is one of the most common forms of digital terrain model (DTM), which can also be represented as contour lines, triangulated elevation points, quadtrees, octree, or otherwise.
The topo60c MAT-file, which contains global terrain data, is an example of a DEM. In this 180-by-360 matrix, each row represents one degree of latitude, and each column represents one degree of longitude. Each element of this matrix is the average elevation, in meters, for the one-degree-by-one-degree region of the Earth to which its row and column correspond.
Given that it's generated from publicly available data anyway (see the first docs quote) and you now know what data it represents (see the second docs quote), you could create replacement data if really needed.

Keeping instances IDs during mcl clustering

I am trying to cluster points using mcl. The points take indices ind (e.g. ind = [4, 54, 3, ...]). I converted my graph to .abc format and applied mcl to this file (following the instructions provided by micans). The output gives me clusters using the canonical domain (that is, for the example above, 3 would be represented by 0, 4 by 1, 54 by 2). Is there a way to get the output using the indices I gave as input?
This is the basic workflow, using an example file name 'f.abc' in abc format:
mcxload -abc f.abc --stream-mirror -o f.mci -write-tab f.tab
mcl f.mci
mcxdump -icl out.f.mci.I20 -tabr f.tab -o dump.f.mci.I20
The file dump.f.mci.I20 should now contain the labels that were used in the 'abc' file.
However, if you just do
mcl f.abc --abc
then you should get the exact same result, although now in the (default output) file out.f.abc.I20. By default mcl assumes an 'mcl graph file' (in the documentation this is often called matrix format or referred to as a matrix file, as graphs and sparse matrices are the same thing in the mcl software). You can give mcl a file in abc format, but it will not figure out by itself that the format is different, hence the use of the --abc option.

csvwrite in loop with numbered filenames in matlab

I'm fairly new to MATLAB. I searched the csvwrite documentation and several existing websites but couldn't find a way to put the values of my variables into the output file names when exporting to CSV. My script is below; I would like the output files to be named something like output_$aa_$dd.csv, where aa and dd are the counters of the first and second for loops respectively.
for aa=1:27
for dd=1:5
M_Normal=bench(aa,dd).Y;
for j=1:300
randRand=M_Normal(randperm(12000,12000));
for jj = 1:numel(randMin(:,1)); % loops over the rand numbers
vv= randMin(jj,1); % gets the value
randMin(jj,j+1)=min(randRand(1:vv)); % get and store the min of the selection in the matrix
end
end
csvwrite('/home/amir/amir_matlab/sprintf(''%d%d',aa, bb).csv',randMin);
end
end
String concatenation in MATLAB is done like a matrix concatenation. For example
a='app';
b='le';
c=[a,b] % returns 'apple'
Hence, in your problem, the full path can be formed this way (your loop counters are aa and dd; bb is not defined in your script).
['/home/amir/amir_matlab/',sprintf('%d_%d',aa,dd),'.csv']
Furthermore, it is usually best not to hard-code the file separator, so that your code also runs on other operating systems. I suggest you write the full path as
fullfile(filesep,'home','amir','amir_matlab',sprintf('%d_%d.csv',aa,dd))
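Put together, the naming scheme inside a nested loop might look like the sketch below. It is only an illustration: the output folder (a csv_demo subfolder of tempdir), the reduced loop ranges, and the random placeholder matrix M are assumptions standing in for your own folder, counters, and randMin.

```matlab
% Hypothetical output folder inside the system temp directory
outDir = fullfile(tempdir, 'csv_demo');
if ~isfolder(outDir)
    mkdir(outDir);
end

for aa = 1:2            % the question uses 1:27
    for dd = 1:3        % the question uses 1:5
        M = rand(4, 2); % placeholder for the randMin matrix
        % Build a name such as output_1_3.csv from the two counters
        fname = fullfile(outDir, sprintf('output_%d_%d.csv', aa, dd));
        csvwrite(fname, M);
    end
end
```

In recent releases writematrix is the recommended replacement for csvwrite; the file name is built in exactly the same way.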
Cheers.

MATLAB Saving and Loading Feature Vectors

I am trying to load feature vectors into classifiers such as a k-nearest neighbors classifier.
I have my code for GLCM, so I get contrast, correlation, energy, homogeneity in numbers (feature vectors).
My question is, how can I save every set of feature vectors from all the training images? I have seen somewhere that people had a .set file to load into classifiers (maybe it is a special case for the particular classifier toolbox).
load 'mydata.set';
for example.
I suppose it does not have to be a .set file.
I'd just need a way to store all the feature vectors from all the training images in a separate file that can be loaded.
I've googled and found this, which may be useful, but I am not entirely sure.
Thanks for your time and help in advance.
Regards.
If you arrange your feature vectors as the columns of an array called X, then just issue the command
save('some_description.mat','X');
Alternatively, if you want the save file to be readable, say in ASCII, then just use this instead:
save('some_description.txt', 'X', '-ASCII');
Later, when you want to re-use the data, just say
var = {'X'}; % <-- You can modify this if you want to load multiple variables.
load('some_description.mat', var{:});
X = load('some_description.txt', '-ascii'); % <-- Use this instead if you saved to a .txt file; ASCII files do not store variable names.
Then the variable named 'X' will be loaded into the workspace and its columns will be the same feature vectors you computed before.
You will want to replace the some_description part of each file name above and instead use something that allows you to easily identify which data set's feature vectors are saved in the file (if you have multiple data sets). Your array of feature vectors may also be called something besides X, so you can change the name accordingly.
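A small round-trip sketch of this save/load pattern is shown below. The feature values are random placeholders, and the labels vector and the file name glcm_features_demo.mat are assumptions added only to show that several variables can be stored in one MAT-file.

```matlab
% Placeholder data: 4 GLCM features (contrast, correlation, energy,
% homogeneity) for 10 training images, one image per column.
X      = rand(4, 10);
labels = randi(2, 1, 10);   % hypothetical class label for each image

% Save both variables to a MAT-file in the temp directory
matFile = fullfile(tempdir, 'glcm_features_demo.mat');
save(matFile, 'X', 'labels');

% Later: load them back into a struct and access them by field name
S = load(matFile, 'X', 'labels');
featuresForClassifier = S.X.';   % many classifiers expect one row per sample
```

Loading into a struct (S) rather than directly into the workspace avoids silently overwriting existing variables called X or labels.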

Reading in points from a file

I have a txt file in which each row contains the x, y, z coordinates of a point, separated by spaces. I want to read the points from this file and store them in MATLAB as a matrix of the form [Pm_1 Pm_2 ... Pm_nmod], where each Pm_n is a point. Could someone help me with this?
I actually have to pass it to code which accepts the model as:
"model - matrix with model points, [Pm_1 Pm_2 ... Pm_nmod]"
I use importdata heavily for this. It reads all kinds of formats; I normally use other methods like dlmread only if importdata doesn't work.
Usage is as simple as M = importdata('data.txt');
Just use
load -ascii data.txt
That creates a matrix called data in your workspace whose rows contain the coordinates.
You can find all the details of the conversion in the documentation for the load command.
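Since the target code expects the points side by side as [Pm_1 Pm_2 ... Pm_nmod], the matrix read from the file probably needs to be transposed so that each point becomes a column. The sketch below first writes a tiny example file (points_demo.txt in tempdir, with made-up coordinates) so that it is self-contained; with real data only the last two statements are needed.

```matlab
% Write a small example file: one point per row, "x y z" separated by spaces
ptsFile = fullfile(tempdir, 'points_demo.txt');
pts     = [1 2 3; 4 5 6; 7 8 9; 10 11 12];   % made-up coordinates
fid     = fopen(ptsFile, 'w');
fprintf(fid, '%g %g %g\n', pts.');           % fprintf consumes values column-wise
fclose(fid);

% Read the file: P is N-by-3, one point per row
P = load(ptsFile, '-ascii');

% Transpose so that each column is one point: model = [Pm_1 Pm_2 ... Pm_N]
model = P.';
```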