Keeping instance IDs during mcl clustering - cluster-analysis

I am trying to cluster points using mcl. The points have indices ind (e.g. ind = [4, 54, 3, ...]). I converted my graph to .abc format and applied mcl to this file (following the instructions provided by micans). The output gives me clusters using the canonical domain (that is, for the example above, 3 would be represented by 0, 4 by 1, and 54 by 2). Is there a way to get the output using the indices I gave as input?

This is the basic workflow, using an example file name 'f.abc' in abc format:
mcxload -abc f.abc --stream-mirror -o f.mci -write-tab f.tab
mcl f.mci
mcxdump -icl out.f.mci.I20 -tabr f.tab -o dump.f.mci.I20
The file dump.f.mci.I20 should now contain the labels that were used in the 'abc' file.
However, if you just do
mcl f.abc --abc
then you should get the exact same result, although now in the (default output) file out.f.abc.I20. By default mcl assumes an 'mcl graph file' (in the documentation this is often called matrix format or referred to as a matrix file, as graphs and sparse matrices are the same thing in the mcl software). You can give mcl a file in abc format, but it will not figure out by itself that the format is different, hence the use of the --abc option.

Related

Why am I getting "Unable to read file 'topo60c'. No such file or directory" error in Matlab?

Many of Matlab's Mapping toolbox examples require "topo60c" world map data. Here's an example
load topo60c
axesm hatano
meshm(topo60c,topo60cR)
zlimits = [min(topo60c(:)) max(topo60c(:))];
demcmap(zlimits)
colorbar
However, when I run the above script, Matlab displays a file not found error for "topo60c". Does anyone know why I'm getting this error? I have the Mapping toolbox installed, and it works with other Mapping sample code that doesn't reference that file.
In the acknowledgements section of the mapping toolbox docs there is a note about example data sources:
https://uk.mathworks.com/help/map/dedication-and-acknowledgment.html
Except where noted, the information contained in example and sample data files (found in matlabroot/examples/map/data and matlabroot/toolbox/map/mapdata) is derived from publicly available digital data sets. These data files are provided as a convenience to Mapping Toolbox™ users. MathWorks® makes no claims that any of this data is free of defects or errors, or that the representations of geographic features or names are up to date or authoritative.
You can open these folders from MATLAB (on Windows) using
winopen( fullfile( matlabroot, 'examples/map/data' ) )
winopen( fullfile( matlabroot, 'toolbox/map/mapdata' ) )
Or simply use the fullfile commands above to identify the paths and navigate there yourself.
I can see (MATLAB R2020b) the topo60c file within the first of these folders, which isn't on your path by default because it's within "examples" and not a toolbox directory.
So you could either:
1. Add this folder to your path so that MATLAB can see the file: addpath(fullfile(matlabroot,'examples/map/data'));
2. Reference the full file path to the data when running examples: load(fullfile(matlabroot,'examples/map/data/topo60c.mat'));
I would prefer option 2 to avoid changing the path.
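As a minimal sketch of option 2, assuming the file sits where it does in R2020b (the location may differ in other releases). The names dataFile and S are illustrative only. Loading into a struct keeps the variables out of the base workspace, and the existence check gives a clearer message than the original error.

```matlab
% Build the full path to the example data (location as seen in R2020b;
% this is an assumption for other releases).
dataFile = fullfile(matlabroot, 'examples', 'map', 'data', 'topo60c.mat');

if isfile(dataFile)                 % isfile requires R2017b or later
    S = load(dataFile);             % load into a struct, not the workspace
    axesm hatano
    meshm(S.topo60c, S.topo60cR)
    demcmap([min(S.topo60c(:)) max(S.topo60c(:))])
    colorbar
else
    warning('topo60c.mat not found at %s', dataFile);
end
```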
Additionally, there is another note in the Raster Geodata section of the docs which details what that dataset should contain:
https://uk.mathworks.com/help/map/raster-geodata.html
When raster geodata consists of surface elevations, the map can also be referred to as a digital elevation model/matrix (DEM), and its display is a topographical map. The DEM is one of the most common forms of digital terrain model (DTM), which can also be represented as contour lines, triangulated elevation points, quadtrees, octree, or otherwise.
The topo60c MAT-file, which contains global terrain data, is an example of a DEM. In this 180-by-360 matrix, each row represents one degree of latitude, and each column represents one degree of longitude. Each element of this matrix is the average elevation, in meters, for the one-degree-by-one-degree region of the Earth to which its row and column correspond.
Given that it's generated from publicly available data anyway (see the first docs quote) and you now know what data it represents (see the second docs quote), you could build replacement data yourself if really needed.

extra lines in command window output

I am very new to MATLAB and I am currently trying to learn how to import files into MATLAB and work on them. I am importing a "*.dat" file, which contains a single column of floating point numbers (they are just filter coefficients I got from a C++ program), into an array in MATLAB. When I display the output in the command window, the first line is always " 1.0e-03 * " followed by the contents of my file. What does it mean? When I check my workspace, the array contains the correct number of elements. My sample code and the first few lines of output are below:
Code:-
clear; clc;
coeff = fopen('filterCoeff.dat');
A = fscanf(coeff, '%f');
A
fclose(coeff);
Output:-
A =
1.0e-03 *   <===== What does this mean?
-0.170194000000000
0
0.404879000000000
0
-0.410347000000000
P.S.: I found many options for reading a file, e.g. textscan, fscanf, etc. Which one is the best to use?
It is a multiplier that applies to all the numbers displayed after that. It means that, for example, the last entry of A is not -0.410347 but -0.410347e-3, that is, -0.000410347.
I think it is just MATLAB's number display format. It means each displayed value is scaled by that common factor; the stored values are unchanged. Try
format longg
A
and see what it displays. Look at the docs for format for other options.
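The effect can be reproduced without the data file. This is a small sketch that uses the first three coefficients shown in the question; it only changes how the numbers are printed, not what is stored.

```matlab
% Same values as the first three coefficients shown in the question.
A = 1e-3 * [-0.170194; 0; 0.404879];

format short      % default style: may print a common factor such as "1.0e-03 *"
disp(A)

format longG      % prints each element in full, without a common factor
disp(A)

% fprintf ignores the display format entirely.
fprintf('%.9f\n', A);

format            % restore the default display format
```

If the goal is only to print the coefficients, fprintf with an explicit format string avoids depending on the session's format setting.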

MATLAB reading specific fields of a pdb file

I have to code this in MATLAB. I would like to extract the coordinates of certain atoms belonging only to some residues in a PDB file. For example, I would like to extract the coordinates of the CA atoms of all alanine residues present in the PDB file. I tried using find(strcmp(atoms,'CA')), but it gives me all CA atoms and not only the CA atoms of alanine. How can I solve this problem in MATLAB? Kindly help. Thank you.
All I know about PDB files is what I've read today at http://www.wwpdb.org/index and here (http://www.wwpdb.org/documentation/file-format-content/format33/v3.3.html).
I've used the example provided in the MATLAB help to read the PDB file.
According to the structure of the data read from the PDB file and the description of the file format, it seems to me that the data you are looking for are contained in the Model.Atom field.
More precisely (gfl is the name of the struct returned by the pdbread function):
gfl.Model.Atom(:).AtomName
gfl.Model.Atom(:).resName
gfl.Model.Atom(:).X
gfl.Model.Atom(:).Y
gfl.Model.Atom(:).Z
If so, in order to identify the "CA" atoms of alanine you can use a combination of the find and strcmp functions as follows:
pos=find(strcmp({gfl.Model.Atom(:).AtomName},'CA') & ...
strcmp({gfl.Model.Atom(:).resName},'ALA'))
The output array pos contains the indices of the atoms you are looking for.
To extract the coordinates, you can then use those indices as follows:
X=[gfl.Model.Atom(pos).X]
Y=[gfl.Model.Atom(pos).Y]
Z=[gfl.Model.Atom(pos).Z]
You can make the code more general by defining the atom name and the residue name as parameters.
Below is the complete script, based on the example file provided by MATLAB.
% Retrieve PDB entry 1GFL and save it to a file (example from the MATLAB help)
gfl = getpdb('1GFL','TOFILE','1gfl.pdb')
% Read the PDB file
gfl = pdbread('1gfl.pdb')
% Define the Atom Name
atom_name='CA';
% Define the Residue Name
res_name='ALA';
% Search for the couple "Atom name - Residue Name"
pos=find(strcmp({gfl.Model.Atom(:).AtomName},atom_name) & ...
strcmp({gfl.Model.Atom(:).resName},res_name))
% Extract the coordinates of the Atoms matching the search criteria
X=[gfl.Model.Atom(pos).X]
Y=[gfl.Model.Atom(pos).Y]
Z=[gfl.Model.Atom(pos).Z]
Hope this helps.
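The same filter can be tried without the Bioinformatics Toolbox or a network connection. The sketch below uses a small hand-made struct array (atoms, with made-up coordinates) that mimics the fields of gfl.Model.Atom listed above; it is an illustration under that assumption, not real PDB data. It also collects the coordinates into one N-by-3 matrix.

```matlab
% Hypothetical stand-in for gfl.Model.Atom: four atoms, made-up coordinates.
atoms = struct( ...
    'AtomName', {'N',   'CA',  'CA',  'CB'}, ...
    'resName',  {'ALA', 'ALA', 'GLY', 'ALA'}, ...
    'X',        {1.0,   2.0,   3.0,   4.0}, ...
    'Y',        {0.1,   0.2,   0.3,   0.4}, ...
    'Z',        {10,    20,    30,    40});

% Logical mask: atom name AND residue name must both match.
mask = strcmp({atoms.AtomName}, 'CA') & strcmp({atoms.resName}, 'ALA');

% One row per matching atom, columns are X, Y, Z.
xyz = [[atoms(mask).X]', [atoms(mask).Y]', [atoms(mask).Z]'];
disp(xyz)
```

Only the second atom is both a CA atom and part of an ALA residue, so xyz has a single row. With real data, replace atoms by gfl.Model.Atom.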

Example data sets in Matlab

There are several example data sets in Matlab, for example wind and mri. If you execute the command load wind you will load the data in the data set wind. Some are included in toolboxes and some appear to be included in standard Matlab. These example data sets are valuable as test data when developing algorithms.
Where can one find a list of all such data sets included in Matlab?
You can enter demo in matlab to get a list. The wind table is part of Example — Stream Line Plots of Vector Data, etc.
For the tables on your computer, have a look at:
C:\Program Files\MATLAB\R2007b\toolbox\matlab\demos
The example data is located in .mat files in ../toolbox/matlab/demos.
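A short sketch for listing those files from within MATLAB, assuming the data still lives in toolbox/matlab/demos under matlabroot (true for the releases mentioned in this thread; newer releases may organise it differently). The names demoDir, files and names are illustrative.

```matlab
% Folder that holds the bundled example data in the releases mentioned here.
demoDir = fullfile(matlabroot, 'toolbox', 'matlab', 'demos');

% List every MAT-file in that folder and strip the extension.
files = dir(fullfile(demoDir, '*.mat'));
[~, names] = cellfun(@fileparts, {files.name}, 'UniformOutput', false);
disp(names')

% Inspect one data set without loading it (wind is named in the question).
if isfile(fullfile(demoDir, 'wind.mat'))   % isfile requires R2017b or later
    whos('-file', fullfile(demoDir, 'wind.mat'))
end
```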
The following data is available in MATLAB 2014a:
% In MATLAB, run:
H = what('demos')
disp(H.mat)
You can also use your favorite Linux console:
/usr/local/MATLAB/R2014a/toolbox/matlab/demos$ ls -1 *.mat | sed -e 's/\.mat$//'
This is my list, for readers who cannot try it on their machine while reading this answer:
accidents
airfoil
cape
census
clown
detail
dmbanner
durer
earth
flujet
gatlin
gatlin2
integersignal
logo
mandrill
membrane
mri
patients
penny
quake
seamount
spine
stocks
tetmesh
topo
topography
trimesh2d
trimesh3d
truss
usapolygon
usborder
vibesdat
west0479
wind
xpmndrll
In MATLAB 2018b, by contrast, the command demo opens the help browser with a selection of demos.
You can find a list of the available data sets and their descriptions at the following link:
https://www.mathworks.com/help/stats/sample-data-sets.html

MATLAB Saving and Loading Feature Vectors

I am trying to load feature vectors into classifiers such as a k-nearest neighbors classifier.
I have my code for GLCM, so I get contrast, correlation, energy, homogeneity in numbers (feature vectors).
My question is, how can I save every set of feature vectors from all the training images? I have seen somewhere that people had a .set file to load into classifiers (maybe it is a special case for the particular classifier toolbox).
load 'mydata.set';
for example.
I suppose it does not have to be a .set file.
I'd just need a way to store all the feature vectors from all the training images in a separate file that can be loaded.
I've googled and found this, which may be useful, but I am not entirely sure.
Thanks for your time and help in advance.
Regards.
If you arrange your feature vectors as the columns of an array called X, then just issue the command
save('some_description.mat','X');
Alternatively, if you want the save file to be readable, say in ASCII, then just use this instead:
save('some_description.txt', 'X', '-ASCII');
Later, when you want to re-use the data, just say
var = {'X'}; % <-- You can modify this if you want to load multiple variables.
load('some_description.mat', var{:});
X = load('some_description.txt', '-ascii'); % <-- Use this if you saved to .txt file.
Then the variable named 'X' will be in the workspace and its columns will be the same feature vectors you computed before. Note that an ASCII file stores only numbers, not variable names, so its contents have to be assigned to X explicitly.
You will want to replace the some_description part of each file name above and instead use something that allows you to easily identify which data set's feature vectors are saved in the file (if you have multiple data sets). Your array of feature vectors may also be called something besides X, so you can change the name accordingly.
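A round trip can be checked with a few lines. This is a sketch with made-up feature values and temporary file names (matFile, txtFile); it assumes one column per training image, as described above. For the ASCII variant, the '-double' option writes 16 significant digits instead of the default 8, which matters if the loaded values should match the originals closely.

```matlab
% Hypothetical GLCM features: one column per training image,
% rows = contrast, correlation, energy, homogeneity.
X = [0.12 0.34 0.27;
     0.95 0.90 0.92;
     0.40 0.45 0.38;
     0.88 0.81 0.85];

% Binary MAT-file: exact round trip.
matFile = [tempname '.mat'];
save(matFile, 'X');
S = load(matFile, 'X');            % S.X holds the loaded array
assert(isequal(S.X, X));

% ASCII file: -double writes 16 digits; the default is 8.
txtFile = [tempname '.txt'];
save(txtFile, 'X', '-ascii', '-double');
Xtxt = load(txtFile, '-ascii');    % plain numeric matrix, no variable name
assert(max(abs(Xtxt(:) - X(:))) < 1e-12);

% Clean up the temporary files.
delete(matFile);
delete(txtFile);
```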