rearrange/sort matrices using custom function - matlab

How can I sort/rearrange a matrix with a custom function?
I.e. if I have data like this:
data = rand(5)
data =
0.9705954 0.9280535 0.1763516 0.0024225 0.8622087
0.4187847 0.3452783 0.7068682 0.9295234 0.0906835
0.3371114 0.4808020 0.8806709 0.9573226 0.5422291
0.6477601 0.5711606 0.1111401 0.3909264 0.3565683
0.0086004 0.8695550 0.7431095 0.8492812 0.6675760
I want to sort/rearrange the matrix columns by the value of the last row divided by the value of the first row. I can't figure out how to do that in a nice way. The only approach I found so far is this ugly hack:
tmp = [data(end,:) ./ data(1,:); data]
tmp =
8.8610e-03 9.3697e-01 4.2138e+00 3.5059e+02 7.7426e-01
9.7060e-01 9.2805e-01 1.7635e-01 2.4225e-03 8.6221e-01
4.1878e-01 3.4528e-01 7.0687e-01 9.2952e-01 9.0683e-02
3.3711e-01 4.8080e-01 8.8067e-01 9.5732e-01 5.4223e-01
6.4776e-01 5.7116e-01 1.1114e-01 3.9093e-01 3.5657e-01
8.6004e-03 8.6955e-01 7.4311e-01 8.4928e-01 6.6758e-01
tmp = sortrows(tmp')'
tmp =
8.8610e-03 7.7426e-01 9.3697e-01 4.2138e+00 3.5059e+02
9.7060e-01 8.6221e-01 9.2805e-01 1.7635e-01 2.4225e-03
4.1878e-01 9.0683e-02 3.4528e-01 7.0687e-01 9.2952e-01
3.3711e-01 5.4223e-01 4.8080e-01 8.8067e-01 9.5732e-01
6.4776e-01 3.5657e-01 5.7116e-01 1.1114e-01 3.9093e-01
8.6004e-03 6.6758e-01 8.6955e-01 7.4311e-01 8.4928e-01
data = tmp(2:end,:)
data =
0.9705954 0.8622087 0.9280535 0.1763516 0.0024225
0.4187847 0.0906835 0.3452783 0.7068682 0.9295234
0.3371114 0.5422291 0.4808020 0.8806709 0.9573226
0.6477601 0.3565683 0.5711606 0.1111401 0.3909264
0.0086004 0.6675760 0.8695550 0.7431095 0.8492812
For large matrices, this is pretty slow :(

Use the second output of sort to get the column indices:
[~, ind] = sort(data(end,:)./data(1,:))
result = data(:,ind);
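The same two-line pattern works for any per-column key, not just this ratio; a sketch (reusing the data variable from the question):

```matlab
% Sort columns by an arbitrary scalar key computed per column.
key = data(end,:) ./ data(1,:);   % one key value per column
[~, ind] = sort(key, 'descend');  % 'descend' flips the order if needed
result = data(:, ind);            % reorder columns; rows stay intact
```

Since only the key vector is sorted, this avoids building and transposing the augmented matrix, which is what made the sortrows hack slow for large inputs.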


CSV file export in MATLAB adding new lines

I'm trying to read a complex csv file into MATLAB and edit/rearrange it. I used readtable to pull out the numerical data, which I then manipulated. Afterwards I aimed to take the first 52 rows of text metadata, add three more rows, then combine it with the numerical data and output as a csv file.
However, I can't find a way to handle the text metadata rows that doesn't add blank lines between them in the output.
e.g. this is how my csv outputs:
#HEADER
,,,,,,
Instrument, Data Processed Analysis
,,,,,
Version,1.3.1,1.25
,,,,
Created,Monday, August 16, 2021 3:13:38 PM
,,,
Filename,D5-116.NGXDP
,,,,,
When it should look like:
#HEADER
NGX, Data Processed Analysis
Version,1.3.1,1.25
Created,Monday, August 16, 2021 3:13:38 PM
Filename,D5-116.NGXDP
Here is my code:
close
clear
fileName = 'D5-116.NGXDP';
fileOut = fileName(1:end-6);
fileOut = append(fileOut, '.csv');
eEnergy = 1.602176634e-19; %electron energy in coulombs
resisTor = 1e11; %Resistor value for EM
%% Reading and Edit Mass Data - this is too hardcoded right now
T = readtable(fileName, 'FileType', 'text');
T = T{:,:}; %convert from table to matrix
dataOrg(:,1:3) = T(1087:1236, 1:3); %cycle:time:Ar
dataOrg(:,4) = T(935:1084, 3); %39
dataOrg(:,5) = T(783:932, 3); %38
dataOrg(:,6) = T(631:780, 3); %37
dataOrg(:,7) = T(479:628, 3); %36
dataOrg(:,7) = dataOrg(:,7) * eEnergy * resisTor; %36Ar - convert CPS to V
dataOrg(:,3:7) = dataOrg(:,3:7) * 1000; %Convert all to mV
%% Reading the metadata from the file
T2 = fopen(fileName);
metaData = fread(T2, '*char')'; %read content
fclose(T2);
results = strsplit(metaData, '\n'); %split the char data by entry spaces
metaNeat = results(:,1:52)'; %Takes the metadata prior to #collectors
metaNeat(end+1, 1) = {'Blocks, 1'}; %added data
metaNeat(end+1, 1) = {'Cycles, 150'}; %added data
metaNeat(end+1, 1) = {'Cycle, Time, 40, 39, 38, 37, 36'};
for n = 1:length(metaNeat) %scan each row, if there is a comma split them
if contains(metaNeat(n,1), ',') == 1
tempMeta = split(metaNeat(n,1), ',')';
metaNeat(n,1:size(tempMeta, 2)) = tempMeta;
clear tempMeta
end
end
outPut = metaNeat; %metadata section
outPut(end+1:(end+150), 1:7) = num2cell(dataOrg); %mass value section
writecell(outPut,fileOut,'Delimiter',',', 'FileType', 'text')
I'm sure there is a way to do this in three lines of course. My guess is that the strsplit function I'm using to separate the char cell into multiple rows is preserving the \n value. The added lines (e.g. Blocks, 1) do not show gaps in the csv file. Any tips or advice would be appreciated.
Thanks for your time,
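One thing worth checking (an assumption, since the file's origin isn't stated): if the file uses Windows-style \r\n line endings, splitting on '\n' alone leaves a trailing \r inside each metadata cell, and writecell then emits that carriage return as an extra line break. A minimal sketch of splitting on all variants:

```matlab
% Split on \r\n, lone \r, or lone \n so no carriage return survives
% inside the cells. strsplit accepts a cell array of delimiters.
results = strsplit(metaData, {'\r\n', '\r', '\n'});
```

The added rows such as 'Blocks, 1' show no gaps precisely because they never contained a \r in the first place.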

Dynamic Output Arguments in for-loop

I am fairly new to Matlab and I created this script to help me gather 4 numbers out of an excel file. This one works so far:
clear all
cd('F:/Wortpaare Analyse/Excel')
VPNumber = input('Which Participant Number?', 's');
filename = (['PAL_',VPNumber,'_Gain_Loss.xlsx']);
sheet = 1;
x1Range = 'N324';
GainBlock1 = xlsread(filename,sheet,x1Range);
x1Range = 'O324';
LossBlock1 = xlsread(filename,sheet,x1Range);
x1Range = 'AD324';
GainBlock2 = xlsread(filename,sheet,x1Range);
x1Range = 'AE324';
LossBlock2 = xlsread(filename,sheet,x1Range);
AnalyseProband = [GainBlock1, LossBlock1, GainBlock2, LossBlock2]
Now I would like to make a script that will analyze the first 20 excel files and tried this:
clear all
cd('F:/Wortpaare Analyse/Excel')
for VPNumber = 1:20 %for the 20 files
a = (['PAL_%d_Gain_Loss.xlsx']);
filename = sprintf(a, VPNumber) % specifies the name of the file
sheet = 1;
x1Range = 'N324';
(['GainBlock1_',VPNumber]) = xlsread(filename,sheet,x1Range);
....
end
The problem seems to be that I can only have one output argument. I would like to change the output argument in each loop, so it doesn't overwrite "GainBlock1" in every cycle.
In the end I would like to have these variables:
GainBlock1_1 (for the first excel sheet)
GainBlock1_2 (for the second excel sheet)
...
GainBlock1_20 (for the 20th excel sheet)
Is there a clever way to do that? I was able to write the first script fairly easily, but was unable to produce any significant progress in the second script. Any help is greatly appreciated.
Best,
Luca
This can be accomplished by either storing the data in an array or a cell. I believe an array will be sufficient for what you're trying to do. I've included a re-organized code sample below, broken out into functions to help make it easier to read and understand.
Basically, the first function handles the loop and controls where the data gets placed in the array, and the second function is your original script which accepts the VPNumber as an input.
What this should return is a 20x4 array, where the first index selects the sheet the data was pulled from and the second index selects among [GainBlock1, LossBlock1, GainBlock2, LossBlock2]. For example, GainBlock1 for sheet 5 would be AnalyseProband(5,1), and LossBlock2 for sheet 11 would be AnalyseProband(11,4).
function AnalyseProband = getAllProbands()
currentDir = pwd;
returnToOriginalDir = onCleanup(@()cd(currentDir));
cd('F:/Wortpaare Analyse/Excel')
numProbands = 20;
AnalyseProband = zeros(numProbands,4);
for n = 1:numProbands
AnalyseProband(n,:) = getBlockInfoFromXLS(n);
end
end
function AnalyseProband = getBlockInfoFromXLS(VPNumber)
a = 'PAL_%d_Gain_Loss.xlsx';
filename = sprintf(a, VPNumber); % specifies the name of the file
sheet = 1;
x1Range = 'N324';
GainBlock1 = xlsread(filename,sheet,x1Range);
x1Range = 'O324';
LossBlock1 = xlsread(filename,sheet,x1Range);
x1Range = 'AD324';
GainBlock2 = xlsread(filename,sheet,x1Range);
x1Range = 'AE324';
LossBlock2 = xlsread(filename,sheet,x1Range);
AnalyseProband = [GainBlock1, LossBlock1, GainBlock2, LossBlock2];
end
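Note that xlsread has been discouraged since R2019a; a sketch of the inner function using readmatrix instead (the cell addresses are carried over from the question as-is):

```matlab
function AnalyseProband = getBlockInfoFromXLS(VPNumber)
% Read the four single-cell ranges with readmatrix (R2019a+).
filename = sprintf('PAL_%d_Gain_Loss.xlsx', VPNumber);
ranges = {'N324', 'O324', 'AD324', 'AE324'}; % Gain1, Loss1, Gain2, Loss2
AnalyseProband = zeros(1, numel(ranges));
for c = 1:numel(ranges)
    AnalyseProband(c) = readmatrix(filename, 'Sheet', 1, 'Range', ranges{c});
end
end
```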

Excessively large overhead in MATLAB .mat file

I am parsing a large text file full of data and then saving it to disk as a *.mat file so that I can easily load in only parts of it (see here for more information on reading in the files, and here for the data). To do so, I read in one line at a time, parse the line, and then append it to the file. The problem is that the file itself is >3 orders of magnitude larger than the data contained therein!
Here is a stripped down version of my code:
database = which('01_hit12.par');
[directory,filename,~] = fileparts(database);
matObj = matfile(fullfile(directory,[filename '.mat']),'Writable',true);
fidr = fopen(database);
hitranTemp = fgetl(fidr);
k = 1;
while ischar(hitranTemp)
if abs(hitranTemp(1)) == 32;
hitranTemp(1) = '0';
end
hitran = textscan(hitranTemp,'%2u%1u%12f%10f%10f%5f%5f%10f%4f%8f%15c%15c%15c%15c%6u%2u%2u%2u%2u%2u%2u%1c%7f%7f','delimiter','','whitespace','');
matObj.moleculeNumber(1,k) = uint8(hitran{1});
matObj.isotopeologueNumber(1,k) = uint8(hitran{2});
matObj.vacuumWavenumber(1,k) = hitran{3};
matObj.lineIntensity(1,k) = hitran{4};
matObj.airWidth(1,k) = single(hitran{6});
matObj.selfWidth(1,k) = single(hitran{7});
matObj.lowStateE(1,k) = single(hitran{8});
matObj.tempDependWidth(1,k) = single(hitran{9});
matObj.pressureShift(1,k) = single(hitran{10});
if rem(k,1e4) == 0;
display(sprintf('line %u (%2.2f)',k,100*k/K));
end
hitranTemp = fgetl(fidr);
k = k + 1;
end
fclose(fidr);
I stopped the code after 13,813 of the 224,515 lines had been parsed because it had been taking a very long time and the file size was getting huge, but the last printout indicated that I had only just cleared 10k lines. I cleared the memory, and then ran:
S = whos('-file','01_hit12.mat');
fileBytes = sum([S.bytes]);
T = dir(which('01_hit12.mat'));
diskBytes = T.bytes;
disp([fileBytes diskBytes diskBytes/fileBytes])
and get the output:
524894 896189009 1707.37141022759
What is taking up the extra 895,664,115 bytes? I know the help page says there should be a little extra overhead, but nearly a GB of descriptive header seems excessive!
New information:
I tried pre-allocating the file, thinking that perhaps MATLAB was doing the same thing it does when a matrix is embiggened in a loop and reallocating a chunk of disk space for the entire matrix on each write, and that isn't it. Filling the file with zeros of the appropriate data types results in a file that my short check script returns:
8531570 71467 0.00837677004349727
This makes more sense to me. Matlab is saving the file sparsely, so the disk file size is much smaller than the size of the full matrix in memory. Once it starts replacing values with real data, however, I get the same behavior as before and the file size starts skyrocketing beyond all reasonable bounds.
New new information:
Tried this on a subset of the data, 100 lines long. To stream to disk, the data has to be in v7.3 format, so I ran the subset through my script, loaded it into memory, and then resaved as v7.0 format. Here are the results:
v7.3: 3800 8752 2.30
v7.0: 3800 2561 0.67
No wonder the v7.3 format isn't the default. Does anyone know a way around this? Is this a bug or a feature?
This seems like a bug to me. A workaround is to write in chunks to pre-allocated arrays.
Start off by pre-allocating:
fid = fopen('01_hit12.par', 'r');
data = fread(fid, inf, 'uint8');
nlines = nnz(data == 10) + 1;
fclose(fid);
matObj.moleculeNumber = zeros(1,nlines,'uint8');
matObj.isotopeologueNumber = zeros(1,nlines,'uint8');
matObj.vacuumWavenumber = zeros(1,nlines,'double');
matObj.lineIntensity = zeros(1,nlines,'double');
matObj.airWidth = zeros(1,nlines,'single');
matObj.selfWidth = zeros(1,nlines,'single');
matObj.lowStateE = zeros(1,nlines,'single');
matObj.tempDependWidth = zeros(1,nlines,'single');
matObj.pressureShift = zeros(1,nlines,'single');
Then to write in chunks of 10000, I modified your code as follows:
... % your code plus pre-alloc first
bs = 10000;
while ischar(hitranTemp)
if abs(hitranTemp(1)) == 32;
hitranTemp(1) = '0';
end
for ii = 1:bs,
hitran{ii} = textscan(hitranTemp,'%2u%1u%12f%10f%10f%5f%5f%10f%4f%8f%15c%15c%15c%15c%6u%2u%2u%2u%2u%2u%2u%1c%7f%7f','delimiter','','whitespace','');
hitranTemp = fgetl(fidr);
if ~ischar(hitranTemp), bs = ii; break; end
end
% this part really ugly, sorry! trying to keep it compact...
matObj.moleculeNumber(1,k:k+bs-1) = uint8(builtin('_paren',cellfun(@(c)c{1},hitran),1:bs));
matObj.isotopeologueNumber(1,k:k+bs-1) = uint8(builtin('_paren',cellfun(@(c)c{2},hitran),1:bs));
matObj.vacuumWavenumber(1,k:k+bs-1) = builtin('_paren',cellfun(@(c)c{3},hitran),1:bs);
matObj.lineIntensity(1,k:k+bs-1) = builtin('_paren',cellfun(@(c)c{4},hitran),1:bs);
matObj.airWidth(1,k:k+bs-1) = single(builtin('_paren',cellfun(@(c)c{6},hitran),1:bs));
matObj.selfWidth(1,k:k+bs-1) = single(builtin('_paren',cellfun(@(c)c{7},hitran),1:bs));
matObj.lowStateE(1,k:k+bs-1) = single(builtin('_paren',cellfun(@(c)c{8},hitran),1:bs));
matObj.tempDependWidth(1,k:k+bs-1) = single(builtin('_paren',cellfun(@(c)c{9},hitran),1:bs));
matObj.pressureShift(1,k:k+bs-1) = single(builtin('_paren',cellfun(@(c)c{10},hitran),1:bs));
k = k + bs;
fprintf('.');
end
fclose(fidr);
The final size on disk is 21,393,408 bytes. The usage breaks down as,
>> S = whos('-file','01_hit12.mat');
>> fileBytes = sum([S.bytes]);
>> T = dir(which('01_hit12.mat'));
>> diskBytes = T.bytes; ratio = diskBytes/fileBytes;
>> fprintf('%10d whos\n%10d disk\n%10.6f\n',fileBytes,diskBytes,ratio)
8531608 whos
21389582 disk
2.507099
Still fairly inefficient, but not out of control.
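If partial loading via matfile isn't actually needed, another workaround is to accumulate the parsed columns in memory and write the file once in -v7 format, which compresses and avoids the HDF5 chunk overhead of -v7.3; a sketch (assuming the full arrays fit in RAM, with variable names as in the question):

```matlab
% Accumulate in memory during parsing, then save once. -v7 files are
% compressed but do not support matfile partial loading/streaming.
s.moleculeNumber   = moleculeNumber;    % 1 x nlines uint8
s.vacuumWavenumber = vacuumWavenumber;  % 1 x nlines double
s.lineIntensity    = lineIntensity;     % 1 x nlines double
save('01_hit12.mat', '-struct', 's', '-v7');
```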

How to programmatically construct a large cell array

How can I automatically construct a dataset like the one below, assuming that matrix summary_whts has approximately 400 columns?
lrwghts = dataset(...
{summary_whts(:,01),'w00'},...
{summary_whts(:,02),'w01'},...
{summary_whts(:,03),'w02'},...
{summary_whts(:,04),'w03'},...
{summary_whts(:,05),'w04'},...
{summary_whts(:,06),'w05'},...
{summary_whts(:,07),'w06'},...
{summary_whts(:,08),'w07'},...
{summary_whts(:,09),'w08'},...
{summary_whts(:,10),'w09'},...
{summary_whts(:,11),'w10'},...
{summary_whts(:,12),'w11'},...
'ObsNames',summary_mthd);
Why not use a simple loop to populate dataset?
nCols = size(summary_whts,2);
dataset = cell(nCols, 2);
for i = 1:nCols
dataset{i,1} = summary_whts(:,i);
dataset{i,2} = sprintf('w%02d', i-1);
end
dataset{end+1,1} = 'ObsNames';
dataset{end, 2} = summary_mthd;
At last, I found it! This is what I was looking for:
cat = [];
for i = 0:(size(X,2)),
cat = [cat;sprintf('w%03d',i)];
end
cat = cellstr(cat);
lrwghts = dataset({summary_whts,cat{:}},'ObsNames',cellstr(summary_mthd));
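In current MATLAB, dataset (Statistics Toolbox) is superseded by table; a sketch of the same construction with array2table, reusing summary_whts and summary_mthd from the question:

```matlab
% One variable name per column (w00, w01, ...), generated programmatically.
names = arrayfun(@(k) sprintf('w%02d', k), 0:size(summary_whts,2)-1, ...
                 'UniformOutput', false);
lrwghts = array2table(summary_whts, 'VariableNames', names, ...
                      'RowNames', cellstr(summary_mthd));
```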

Recursive concatenation of Matlab structs

Is it somehow possible to concatenate two MATLAB structures recursively, without iterating over all the leaves of one of the structures?
For instance
x.a=1;
x.b.c=2;
y.b.d=3;
y.a = 4;
would result in the following
res = mergeStructs(x,y)
res.a=4
res.b.c=2
res.b.d=3
The following function works for your particular example. There will be things it doesn't consider, so let me know if there are other cases you want it to work for and I can update.
function res = mergeStructs(x,y)
if isstruct(x) && isstruct(y)
res = x;
names = fieldnames(y);
for fnum = 1:numel(names)
if isfield(x,names{fnum})
res.(names{fnum}) = mergeStructs(x.(names{fnum}),y.(names{fnum}));
else
res.(names{fnum}) = y.(names{fnum});
end
end
else
res = y;
end
Then res = mergeStructs(x,y); gives:
>> res.a
ans =
4
>> res.b
ans =
c: 2
d: 3
as you require.
EDIT: I added isstruct(x) && to the first line. The old version worked fine because isfield(x,n) returns 0 if ~isstruct(x), but the new version is slightly faster if y is a big struct and ~isstruct(x).