manipulating columns in the pdb files in matlab - matlab

I have written the following code, which operates on columns 6, 7 and 8 only, as shown. However, I want the resulting .pdb file to contain the other columns from the input file as well. That is, I just want to replace columns 6, 7 and 8 of the input file with the final result, without affecting the other columns. I have not been able to figure out how to do this.
%
phi1 =188.7*pi/180;
PHI = 82.3*pi/180;
phi2 = 150.4*pi/180;
%%%%% Euler angles above are arbitrary values, to be changed as needed
%%%%%%% define the rotation matrix
R(1,1) = cos(phi1)*cos(phi2)-sin(phi1)*sin(phi2)*cos(PHI);
R(1,2) = sin(phi1)*cos(phi2)+cos(phi1)*sin(phi2)*cos(PHI);
R(1,3) = sin(phi2)*sin(PHI);
R(2,1) = -cos(phi1)*sin(phi2)-sin(phi1)*cos(phi2)*cos(PHI);
R(2,2) = -sin(phi1)*sin(phi2)+cos(phi1)*cos(phi2)*cos(PHI);
R(2,3) = cos(phi2)*sin(PHI);
R(3,1) = sin(phi1)*sin(PHI);
R(3,2) = -cos(phi1)*sin(PHI);
R(3,3) = cos(PHI);
%%output
rotationMatrix=R;
fid2 = fopen('final_result.pdb','wt');
for k=1:100
fid = fopen('ipp31.pdb');
A = textscan(fid, '%s %s %f %s %s %f %f %f %s ') ;
%read the file
a = A{6};
b = A{7};
c = A{8};
p=[a b c];
p_t=p.';
M=rotationMatrix*p_t;
M_T=M.';
fid = fopen('ipp31.pdb', 'wt'); % Open for writing
fprintf(fid,' %f\t %f\t %f\n',M_T);
fclose(fid);
%
textfilename = [num2str(k) '.pdb'];
fid1 = fopen(textfilename,'wt');
fprintf(fid1,' %f\t %f\t %f\n',M_T);
fclose(fid1);
fprintf(fid2,' %f\t %f\t %f\n',M_T);
end
fclose(fid2);
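One possible approach, as a minimal sketch: keep every column that textscan returns and rebuild each line from the untouched columns plus the rotated coordinates. The sample records, the stand-in rotation matrix R and the output precision below are assumptions for illustration; the format string is the one from the question.

```matlab
% Made-up sample records in the 9-column layout implied by the question's
% format string. A real file would be read with fopen/textscan/fclose instead.
txt = sprintf('%s\n', ...
    'ATOM N 1 ALA A 1.000 0.000 0.000 N', ...
    'ATOM CA 2 ALA A 0.000 1.000 0.000 C');

A = textscan(txt, '%s %s %f %s %s %f %f %f %s');   % same format as the question

% Stand-in rotation (90 degrees about z); substitute the Euler-angle matrix.
R = [0 -1 0; 1 0 0; 0 0 1];

xyz    = [A{6} A{7} A{8}];      % N-by-3 coordinates
xyzRot = (R * xyz.').';         % rotate every point, back to N-by-3

% Rebuild each line: columns 1-5 and 9 unchanged, columns 6-8 replaced.
n   = numel(A{1});
out = cell(n, 1);
for i = 1:n
    out{i} = sprintf('%s %s %d %s %s %.3f %.3f %.3f %s', ...
        A{1}{i}, A{2}{i}, A{3}(i), A{4}{i}, A{5}{i}, ...
        xyzRot(i,1), xyzRot(i,2), xyzRot(i,3), A{9}{i});
end

% To save the result (not executed here):
% fid = fopen('final_result.pdb', 'wt'); fprintf(fid, '%s\n', out{:}); fclose(fid);
```

Note that this writes whitespace-separated fields. Standard PDB files use fixed column positions, so if downstream tools expect that layout, the sprintf format would need fixed field widths instead.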

Related

Using textscan to extract values from a text file

I have the following text file:
Leaf Tips:2867.5,1101.66666666667 2555,764.166666666667 2382.5,1221.66666666667 2115,759.166666666667 1845,1131.66666666667 1270,991.666666666667
Leaf Bases:1682.66666666667,800.333333333333 1886,850.333333333333 2226,920.333333333333 2362.66666666667,923.666666666667 2619.33333333333,967
Ear Tips:1029.33333333333,513.666666666667 1236,753.666666666667
Ear Bases:1419.33333333333,790.333333333333 1272.66666666667,677
These are the coordinates of regions of interest for each category in an image. I need to extract these regions. I know I have to use textscan to accomplish this, but I am unsure of the formatSpec options needed, since whichever setting I use seems to give me some jumbled form of cell output.
What formatSpec should I use so that I get the coordinates of each region output in a cell?
I've tried the following:
file = '0.txt';
fileID = fopen(file);
formatSpec = '%s %f %f %f %f %f %f %f %f';
C = textscan(fileID, formatSpec, 150, 'Delimiter', ':');
Here's an example of what you can do:
fid = fopen('0.txt'); % open file
T = textscan(fid, '%s','Delimiter',':'); % read all lines, separate row names from numbers
fclose(fid); % close file
T = reshape(T{1},2,[]).'; % rearrange outputs so it makes more sense
T = [T(:,1), cellfun(@(x)textscan(x,'%f','Delimiter',','), T(:,2))]; % parse numbers
Which will result in a cell array as follows:
T =
4×2 cell array
{'Leaf Tips' } {12×1 double}
{'Leaf Bases'} {10×1 double}
{'Ear Tips' } { 4×1 double}
{'Ear Bases' } { 4×1 double}
This is how I would do it:
fid = fopen('file.txt');
x = textscan(fid,'%s', 'Delimiter', char(10)); % read each line
fclose(fid);
x = x{1};
x = regexp(x, '\d*\.?\d*', 'match'); % extract numbers of each line
C = cellfun(@(t) reshape(str2double(t), 2, []).', x, 'UniformOutput', false); % rearrange
Result:
>> celldisp(C)
C{1} =
1.0e+03 *
2.867500000000000 1.101666666666670
2.555000000000000 0.764166666666667
2.382500000000000 1.221666666666670
2.115000000000000 0.759166666666667
1.845000000000000 1.131666666666670
1.270000000000000 0.991666666666667
C{2} =
1.0e+03 *
1.682666666666670 0.800333333333333
1.886000000000000 0.850333333333333
2.226000000000000 0.920333333333333
2.362666666666670 0.923666666666667
2.619333333333330 0.967000000000000
C{3} =
1.0e+03 *
1.029333333333330 0.513666666666667
1.236000000000000 0.753666666666667
C{4} =
1.0e+03 *
1.419333333333330 0.790333333333333
1.272666666666670 0.677000000000000
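For a single line, the same split can also be sketched with strsplit and sscanf. This is only an illustration on one line copied from the question; the variable names are made up, and it assumes every line has exactly one colon followed by space-separated x,y pairs.

```matlab
% One line from the question's file, held in a variable for illustration.
txtLine = 'Ear Tips:1029.33333333333,513.666666666667 1236,753.666666666667';

parts = strsplit(txtLine, ':');     % category name, then the coordinate text
name  = parts{1};

% '%f,%f' is reapplied until the text is exhausted; [2 Inf] stores each
% x,y pair as one column, so transposing gives one row per point.
xy = sscanf(parts{2}, '%f,%f', [2 Inf]).';
```

Here xy has one row per point, with x in the first column and y in the second.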

Speedup processing of larger binary files

I have to process thousands of binary files (each of 16MB) by reading pairs of them and creating a bit-level data structure (usually a 1x134217728 array) in order to process them on bit level.
Currently I am doing this the following way:
conv = @(c) uint8(bitget(c,1:32));
measurement = NaN(1,(sizeOfMeasurements*8)) %(1,134217728)
fid = fopen(fileName, 'rb');
byteContent = fread(fid,'uint32');
fclose(fid);
bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false);
measurement=[bitRepresentation1{:}];
I also replaced fopen with memmapfile, as below:
m = memmapfile(fileName,'Format',{'uint32', [4194304 1], 'byteContent'});
byteContent = m.data.byteContent;
byteContent = double(byteContent);
I printed timing information (using tic/toc) for the individual instructions and it turns out that the bottleneck is:
bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false); % see first line of code for conv
Are there more efficient ways of transforming byteContent into an array that stores a bit per index (i.e. that is a bit representation of byteContent)?
Let bitget handle the loop over all the numbers, and loop over the bit positions yourself:
fid = fopen(fileName, 'rb');
bitContent = fread(fid,'*ubit64');
fclose(fid);
conv = @(ii) uint8(bitget(bitContent, ii));
bitRepresentation = arrayfun(conv, 1:64, 'UniformOutput', false);
measurement = [bitRepresentation{:}]';
measurement = measurement(:).';
EDIT: you can also try a direct loop:
fid = fopen(fileName, 'rb');
bitContent = fread(fid,'*ubit64');
fclose(fid);
sz = 64 * size(bitContent,1);
measurement3 = zeros(1, sz, 'uint8');
weave = 1:64:sz;
for ii = 1:64
measurement3(weave + ii - 1) = uint8(bitget(bitContent, ii));
end
On my system that is (surprisingly) slower than arrayfun, but my MATLAB version is from the stone age, so your mileage may vary. Give it a try.
Several things that seem to provide further improvement on Rody's suggestion:
(minor:) Using a local function instead of a function handle for conv.
(major:) Converting the result of conv to logical using ~~ instead of uint8.
(major:) cell2mat instead of [bitRepresentation{:}]'.
The result:
function q40863898(filename)
fid = fopen(filename, 'rb');
bitContent = fread(fid,'*ubit64');
fclose(fid);
bitRepresentation = arrayfun(@convert, 1:64, 'UniformOutput', false);
measurement = reshape(cell2mat(bitRepresentation).',[],1).';
function out = convert(ii)
out = ~~(bitget(bitContent, ii, 'uint64'));
end
end
Benchmark result (on MATLAB R2016b, Win10 x64, 14MB file):
Rody's vectorized method: 0.87783
Rody's loop method: 2.37
Dev-iL's method: 0.68387
Benchmark code:
function q40863898(filename)
%% Common code:
fid = fopen(filename, 'rb');
bitContent = fread(fid,'*ubit64');
fclose(fid);
%% Verification:
ref = Rody1();
res = {Rody2(), uint8(Devil1())};
assert(isequal(ref,res{1}));
assert(isequal(ref,res{2}));
%% Benchmark:
disp(['Rody''s vectorized method: ' num2str(timeit(@Rody1))]);
disp(['Rody''s loop method: ' num2str(timeit(@Rody2))]);
disp(['Dev-iL''s method: ' num2str(timeit(@Devil1))]);
%% Functions:
function measurement = Rody1()
conv = @(ii) uint8(bitget(bitContent, ii));
bitRepresentation = arrayfun(conv, 1:64, 'UniformOutput', false);
measurement = [bitRepresentation{:}]';
measurement = measurement(:).';
end
function measurement = Rody2()
sz = 64 * size(bitContent,1);
measurement = zeros(1, sz, 'uint8');
weave = 1:64:sz;
for ii = 1:64
measurement(weave + ii - 1) = uint8(bitget(bitContent, ii));
end
end
function measurement = Devil1()
bitRepresentation = arrayfun(@convert, 1:64, 'UniformOutput', false);
measurement = reshape(cell2mat(bitRepresentation).',[],1).';
function out = convert(ii)
out = ~~(bitget(bitContent, ii, 'uint64'));
end
end
end
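To make the bit ordering used by the methods above easier to check by hand, here is a tiny sketch on three made-up uint8 values instead of a 64-bit file read. It is not a benchmark candidate, only an illustration of the layout: all bits of the first value (least significant first), then all bits of the second value, and so on.

```matlab
vals  = uint8([5; 255; 0]);     % made-up sample values
nBits = 8;                      % bits per value for uint8

% Column k holds bit k of every value; bitget works on the whole vector.
bits = false(numel(vals), nBits);
for k = 1:nBits
    bits(:, k) = logical(bitget(vals, k));
end

% Transpose so each value's bits are contiguous, then flatten to a row.
measurement = reshape(bits.', 1, []);
```

The value 5 is binary 00000101, so its eight entries start with 1 0 1 followed by zeros.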

importing text file data by blocks?

I am trying to import every row that starts with '//'. I have tried to extract them with the script below. Can anybody check my script, please?
formatSpec = '//NFE=%f //ElapsedTime=%f //SBX=%f //DE=%f //PCX=%f //SPX=%f //UNDX=%f //UM=%f //Improvements=%f //Restarts=%f //PopulationSize=%f //ArchiveSize=%f //MutationIndex=%f %*f';
N = 1;
k = 0;
while ~feof(fileID)
k = k+1;
C = textscan(fileID,formatSpec,N,'CommentStyle','#','Delimiter','\n');
end
It is not clear to me how you want the output to look, but here is one possibility:
fid = fopen(filename, 'rt');
dataset = textscan(fid, '%s', 'delimiter', '\n', 'headerlines', 0);
fclose(fid);
result = regexp(dataset{1}, '//([A-Za-z].*)=([0-9\.].*)', 'tokens');
result = result(cellfun(@(x) ~isempty(x), result));
result contains both the type, e.g. NFE or SBX, and the number (albeit in character format).
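As a small follow-up sketch, the tokens can be turned into names and numeric values with str2double. The sample lines below are made up, and the pattern is a slightly stricter variant of the one above (anchored at the start of the line, one name and one number per line), so treat it as an assumption about the file layout.

```matlab
% Made-up lines standing in for dataset{1}.
dataset = {'//NFE=100'; '0.5 0.25 0.1'; '//ElapsedTime=1.25'; '#'};

% 'once' returns at most one match per line: a 1-by-2 cell {name, number}.
tok = regexp(dataset, '^//([A-Za-z]+)=([0-9.eE+-]+)', 'tokens', 'once');
tok = tok(~cellfun(@isempty, tok));          % drop lines that did not match

names  = cellfun(@(t) t{1}, tok, 'UniformOutput', false);
values = cellfun(@(t) str2double(t{2}), tok);
```

names is then a cell array of the keys and values a numeric column vector in the same order.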

Importing data block with Matlab

I have a set of data in the following format, and I would like to import each block in order to analyze them with Matlab.
Emax=0.5/real
----------------------------------------------------------------------
4.9750557 14535
4.9825821 14522
4.990109 14511
4.9976354 14491
5.0051618 14481
5.0126886 14468
5.020215 14437
5.0277414 14418
5.0352678 14400
5.0427947 14372
5.0503211 14355
5.0578475 14339
5.0653744 14321
Emax=1/real
----------------------------------------------------------------------
24.965595 597544
24.973122 597543
24.980648 597543
24.988174 597542
24.995703 597542
25.003229 597542
I have modified this piece of code from MathWorks, but I think I have problems dealing with the spaces between the columns.
Each block of data consists of 3874 rows and is separated from the next by a text line (Emax=XX/real) and a line of dashes; unfortunately, this is the only way the software exports the data.
Here is one way to import the data:
% read file as a cell-array of lines
fid = fopen('file.dat', 'rt');
C = textscan(fid, '%s', 'Delimiter','');
C = C{1};
fclose(fid);
% remove separator lines
C(strncmp('---',C,3)) = [];
% location of section headers
headInd = [find(strncmp('Emax=', C, length('Emax='))) ; numel(C)+1];
% extract each section
num = numel(headInd)-1;
blocks = struct('header',cell(num,1), 'data',cell(num,1));
for i=1:num
% section header
blocks(i).header = C{headInd(i)};
% data
X = regexp(C(headInd(i)+1:headInd(i+1)-1), '\s+', 'split');
blocks(i).data = str2double(vertcat(X{:}));
end
The result is a structure array containing the data from each block:
>> blocks
blocks =
2x1 struct array with fields:
header
data
>> blocks(2)
ans =
header: 'Emax=1/real'
data: [6x2 double]
>> blocks(2).data(:,1)
ans =
24.9656
24.9731
24.9806
24.9882
24.9957
25.0032
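If the numeric Emax value of each block is needed as well, it could be read from the header with sscanf. This is only a sketch: the struct array below is rebuilt by hand from a few of the question's sample values, and it assumes every header follows the Emax=XX/real pattern exactly.

```matlab
% Hand-built stand-in for the blocks struct array produced above.
blocks = struct( ...
    'header', {'Emax=0.5/real'; 'Emax=1/real'}, ...
    'data',   {[4.9750557 14535; 4.9825821 14522]; [24.965595 597544]});

% Literal text in the format must match the header; %f captures the number.
emax = arrayfun(@(b) sscanf(b.header, 'Emax=%f/real'), blocks);
```

emax is then a numeric column vector with one entry per block.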
This should work. I don't think textscan() will work directly on a file like this because of the breaks between blocks.
Essentially, this code loops through the lines between blocks until it finds a line that matches the data format. The code is naive and assumes that the file has exactly the number of blocks and the number of lines per block that you specify. If there were a fixed number of lines between blocks, it would be a lot easier: you could remove the first inner loop and replace it with a single fgets(fid) call for each line to skip.
function block_data = readfile(in_file_name)
fid = fopen(in_file_name, 'r');
delimiter = ' ';
line_format = '%f %f';
n_cols = 2; % Number of numbers per line
block_length = 3874; % Number of lines per block
n_blocks = 2; % Total number of blocks in file
tline = fgets(fid);
line_data = cell2mat(textscan(tline,line_format,'delimiter',delimiter,'MultipleDelimsAsOne',1));
block_n = 0;
block_data = zeros(n_blocks,block_length,n_cols);
while ischar(tline) && block_n < n_blocks
block_n = block_n+1;
tline = fgets(fid);
if ischar(tline)
line_data = cell2mat(textscan(tline,line_format,'delimiter',delimiter,'MultipleDelimsAsOne',1));
end
while ischar(tline) && isempty(line_data)
tline = fgets(fid);
line_data = cell2mat(textscan(tline,line_format,'delimiter',delimiter,'MultipleDelimsAsOne',1));
end
line_n = 1;
while line_n <= block_length
block_data(block_n,line_n,:) = cell2mat(textscan(tline,line_format,'delimiter',delimiter,'MultipleDelimsAsOne',1));
tline = fgets(fid);
line_n = line_n+1;
end
end
fclose(fid);

Get all the values from the "for loop" run

I have some *.dat files in a folder. I would like to extract a particular column (the 8th) from all of the files and put them into an Excel file. I have run a for loop, but it only gives me the result of the final run (i.e. if there are 10 files, it only returns the 8th column of the 10th file).
data = cell(numel(files),1);
for i = 1:numel(files)
fid = fopen(fullfile(pathToFolder,files(i).name), 'rt');
H = textscan(fid, '%s', 4, 'Delimiter','\n');
C = textscan(fid, repmat('%f ',1,48), 'Delimiter',' ', ...
'MultipleDelimsAsOne',true, 'CollectOutput',true);
fclose(fid);
H = H{1}; C = C{1};
data{i} = C;
B = C(:,8);
end
Looking for your help on this.
It would be greatly appreciated.
You are overwriting B on each iteration. Assigning to B(:,i) instead, i.e. B(:,i) = C(:,8);, will put column 8 of each file's C into its own column of B.
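A minimal sketch of that fix, using made-up matrices in place of the data read from each file (the names fileData and nFiles are hypothetical). It assumes every file has the same number of rows; if not, storing each column in a cell array would be the safer choice.

```matlab
% Made-up stand-ins for the matrix C read from each file.
nFiles   = 3;
fileData = cell(nFiles, 1);
for i = 1:nFiles
    fileData{i} = i * reshape(1:40, 5, 8);   % 5 rows, 8 columns per "file"
end

% Preallocate one column per file, then fill column i on iteration i.
B = zeros(5, nFiles);
for i = 1:nFiles
    C = fileData{i};
    B(:, i) = C(:, 8);      % keep column 8 of file i instead of overwriting B
end
```

B can then be written to a spreadsheet in one call, for example with xlswrite('result.xlsx', B), or writematrix in newer releases.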