MATLAB: Use splitapply to write multiple files

I have grouped tables by a variable and I am trying to write multiple files based on the grouping variable, but it does not work.
I used findgroups and splitapply, but splitapply is where I am having problems.
Here is one version of the commands I am using:
load patients;
G=findgroups(Gender);
func=@(x,y) csvwrite(x,y);
splitapply(func,Gender,Weight,G);
I am getting the following error message:
Error using splitapply (line 132)
Applying the function '@(x,y)csvwrite(x,y)' to the 1st group of data generated the following error:
FILENAME must be a character vector or string scalar.
When I figure out how to use this, I will be using it on large datastore tall arrays. Please help!

The problem is that the first parameter of csvwrite must be a file name.
In your code sample, the first parameter to csvwrite is a cell array, and not a string.
You can see it by using the following trick:
func=@(x,y) display(x);
The output of splitapply(func,Gender,Weight,G) is:
x =
53×1 cell array
{'Female'}
{'Female'}
{'Female'}
...
x =
47×1 cell array
{'Male'}
{'Male'}
{'Male'}
Solution:
Use x{1} instead of x:
func=@(x,y) csvwrite(x{1}, y);
It is good practice to add a file extension such as .txt to the file name:
func=@(x,y) csvwrite([x{1}, '.txt'], y);
Remark:
Combining splitapply with csvwrite may miss the original intent of the splitapply function.
According to the documentation, splitapply appears to be better suited to computations on each group, such as statistics; it does not seem to be intended for I/O operations such as writing files.
I am not sure whether the above code pattern is the right approach for "using it on large datastore tall arrays".
Complete code sample:
load patients;
G=findgroups(Gender);
%The first parameter of csvwrite must be a file name.
%x{1} = 'Male' for the whole Male group, and 'Female' for the whole Female group.
%[x{1}, '.txt'] adds a '.txt' extension to the file name.
%
%y is a column vector holding the Weight values of that group,
%so each file gets one weight per line.
func=@(x,y) csvwrite([x{1}, '.txt'], y);
splitapply(func,Gender,Weight,G);
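Since the remark above questions whether splitapply is meant for writing files, the same task can also be written as an explicit loop over the groups that findgroups returns. This is a minimal sketch: the Gender and Weight values below are made-up stand-ins for the patients data, csvwrite is kept to match the answer, and the pattern has not been tried on datastore tall arrays.

```matlab
% Made-up stand-in for the patients data: a grouping variable and values
Gender = {'Female'; 'Male'; 'Female'; 'Male'; 'Female'};
Weight = [130; 176; 142; 183; 125];

% G(i) is the group number of row i; names{k} is the label of group k
[G, names] = findgroups(Gender);

for k = 1:numel(names)
    % One file per group (Female.txt, Male.txt), holding that group's weights
    csvwrite([names{k}, '.txt'], Weight(G == k));
end
```

The loop makes the file name and the selected rows explicit, which some readers may find easier to debug than an anonymous function passed to splitapply.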

Related

Dynamically labelling in MATLAB

I have a MATLAB script which creates a matrix, 'newmatrix', and exports it as matrix.txt:
save -ascii matrix.txt newmatrix
My script also calculates the distance between certain elements of the matrix, because the size of the matrix depends on a variable, 'width', which I specify in the script.
width = max(newmatrix(:,5)) - min(newmatrix(:,5))
x_vector = width + 2
The variable x_vector is defined as width + 2.
I want to know whether it is possible to export x_vector with a label, e.g. my_vector $x_vector, so that "my_vector 7.3" is written when the value of x_vector is 7.3.
I have tried:
save -ascii 'my_vector' + x_vector
But receive the following errors:
warning: save: no such variable +
warning: no such variable 'my_vector'
Three things:
1) I prefer to use the functional form of the calls, so that you can pass in variables rather than static strings.
save -ascii matrix.txt newmatrix
is equivalent to:
save('-ascii','matrix.txt','newmatrix')
In other words, in the first (command) form, all inputs are treated as character-vector inputs to the function.
2) Using + on character arrays in MATLAB does not join them; it adds their character codes numerically. Instead, concatenate them with [] or use sprintf.
name = sprintf('my_vector_%g',x_vector);
save('-ascii',name)
Note that by using the functional form we can now pass in a variable. However, this still won't work, because name must be either a valid option or the name of a variable, and my_vector_7.3 is neither.
3) I'm not entirely sure what you're asking, but I think you want the text file to contain "my_vector 7.3". I don't think -ascii supports strings, but you could write the file yourself using fprintf:
fid = fopen('matrix.txt','w');
fprintf(fid,'%s',mat2str(newmatrix)); % matrix as text, e.g. [1 2;3 4]; '%s' stops it being read as a format
fprintf(fid,'\n');
fprintf(fid,'my_vector %g',x_vector);
fclose(fid);
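If the file should keep a plain one-row-per-line layout like the one save -ascii produces (rather than the bracketed mat2str text) and then end with the labelled line, fprintf can write both. This is a minimal sketch, assuming a small made-up newmatrix whose fifth column gives width = 7.3 and therefore x_vector = 9.3; the %g number format is an assumption and differs from the exponent format that save -ascii writes.

```matlab
% Made-up example matrix; column 5 spans 5.5 to 12.8, so width = 7.3
newmatrix = [1 2 3 4 5.5; 6 7 8 9 12.8];
width = max(newmatrix(:,5)) - min(newmatrix(:,5));
x_vector = width + 2;

fid = fopen('matrix.txt', 'w');
assert(fid ~= -1, 'Could not open matrix.txt for writing');

% fprintf consumes its data argument in column order, so pass the
% transpose to write one row of newmatrix per line.
fmt = [repmat('%g ', 1, size(newmatrix, 2) - 1), '%g\n'];
fprintf(fid, fmt, newmatrix.');

% The labelled value goes on its own final line
fprintf(fid, 'my_vector %g\n', x_vector);
fclose(fid);
```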

Initialize variables with names from a file

I have a txt file with a bunch of parameters created by an external program.
Let us consider a simple example:
input.txt
a= 1
b= 2
c= 3
I can read both names and values in MATLAB R2018a:
[names, values]=textread('input.txt','%s%f');
As a result, names will be a 3x1 cell array with entries a=, b= and so on, whereas values will be a conventional 3x1 array of doubles.
In my current workspace, I want to initialize the obtained variables (with the corresponding names) and set them equal to the corresponding values.
In the example above, variables a=1, b=2 and c=3 should be created in the current workspace.
I have no idea how to do it...
Thanks!
Edit: in my actual example, variable names can contain many letters and digits (by the standard convention, variable names always start with a letter, not a digit), e.g.
Rcirc1= 30.0
SaveStride= 1000
You can use a combination of regexp and assignin to achieve the desired output:
%Read data.
data = fileread('input.txt');
%Extract variable name and value in named groups.
%A name is a letter followed by any letters, digits or underscores.
s = regexp(data,'(?<var>[A-Za-z]\w*)\s*=\s*(?<val>[-+]?\d+(?:\.\d+)?)','names');
%Loop over struct s contents to create variables in the base workspace.
cellfun(@(x,y) assignin('base',x,str2double(y)),{s.var},{s.val})
The assignments in the text file can be evaluated directly by MATLAB, so you don't need to extract them. To silence the text printed for each line, you can use evalc:
evalc(fileread('input.txt'));
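If the values do not have to become separate workspace variables, one alternative (not the approach of either answer) is to collect them in a struct using dynamic field names, so the set of names stays inspectable in one place. This is a minimal sketch, assuming the name= value format from the question; the sample file it writes is made up, and the regular expression only accepts signed or unsigned decimals without exponents.

```matlab
% Write a sample file in the format described in the question
fid = fopen('input.txt', 'w');
fprintf(fid, 'a= 1\nb= 2\nc= 3\nRcirc1= 30.0\nSaveStride= 1000\n');
fclose(fid);

data = fileread('input.txt');

% Each match yields a name (a letter followed by letters, digits or _)
% and a numeric value with an optional sign and fractional part
s = regexp(data, '(?<var>[A-Za-z]\w*)\s*=\s*(?<val>[-+]?\d+(?:\.\d+)?)', 'names');

params = struct();
for k = 1:numel(s)
    % Dynamic field name: params.Rcirc1, params.SaveStride, ...
    params.(s(k).var) = str2double(s(k).val);
end
```

Values are then used as params.Rcirc1 and so on; if real workspace variables are still wanted, each field can be passed to assignin as in the answer above.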

How can I apply Huffman coding correctly?

I applied the zigzag function after quantization to an image block, and I want to compute the Huffman coding of this block. I understand that the input argument must be a vector, and that the histogram should be calculated.
I wrote the following code, but it doesn't seem to work:
[M N]=size(yce);
fun1=zigzag(yce);
count1 = imhist(fun1);
p1 = count1/ numel(fun1);
[dict1,avglen1]=huffmandict(count1,p1);
comp1= huffmanenco(fun1,dict1);
Im1 = huffmandeco(comp1,dict1);
I get the following error with the huffmandict function:
Error in project at 65
[dict1,avglen1]=huffmandict(count1,p1);
Source symbols repeat.
zigzag.m is a function I wrote in a separate MATLAB file. It converts a matrix into a vector, thus eliminating long sequences of zeros.
The Huffman encoding function (huffmandict) in MATLAB requires that the values in the symbols vector (the first argument of the function) all be unique. This symbols vector is a list of all possible symbols that appear in the data you want to encode / compress, so it wouldn't make sense for that list to contain duplicates. This is much like a dictionary of words, where it wouldn't make sense to see the same word twice. The second parameter of the function is the probability of occurrence of each symbol in your sequence.
With huffmandict, you are creating a dictionary for Huffman encoding that consists of all possible unique symbols to be encountered when encoding/decoding, together with their associated probabilities. Looking at your code, you therefore need both the bin locations and the frequencies of occurrence from imhist, which means calling the two-output version of imhist. The second output of imhist gives you the bin locations, i.e. the intensities / symbols, while the first output gives you the frequency of each of these intensities / symbols in your data. You then normalize the first output by the total number of symbols / intensities in your data to get the empirical probability of each symbol. Once this is done, you pass both of these to huffmandict.
In other words, you only need to change two lines of code:
[M N]=size(yce);
fun1=zigzag(yce);
[count1,x] = imhist(fun1); %// Change
p1 = count1/ numel(fun1);
[dict1,avglen1]=huffmandict(x,p1); %// Change
comp1= huffmanenco(fun1,dict1);
Im1 = huffmandeco(comp1,dict1);
Edit
Now that I know how fun1 is structured: do not use imhist. imhist assumes that you are passing in image data, but that doesn't seem to be the case here. Instead, try using histc to compute the frequency of occurrence. Simply modify your code to this:
[M N]=size(yce);
fun1=zigzag(yce);
bins = unique(fun1); %// Change
count1 = histc(fun1, bins); %// Change
p1 = count1/ numel(fun1);
[dict1,avglen1]=huffmandict(bins,p1); %// Change
comp1= huffmanenco(fun1,dict1);
Im1 = huffmandeco(comp1,dict1);
unique finds the distinct values in your vector so that we can use them as bins to count frequencies. These values are also exactly the set of symbols seen in the data, which is what huffmandict needs as its first argument.
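As a base-MATLAB alternative to histc (which recent MATLAB documentation lists as not recommended), the third output of unique can be combined with accumarray to count each symbol. This is a swapped-in technique rather than the answer's, and fun1 below is a made-up vector standing in for the zigzag output. The huffmandict lines are left as comments because they require the Communications Toolbox.

```matlab
% Made-up zigzag-scanned block, standing in for fun1 = zigzag(yce)
fun1 = [12 -3 0 0 5 0 0 0 -3 0 0 0]';

% symbols: sorted distinct values; idx(i) is the position of fun1(i) in symbols
[symbols, ~, idx] = unique(fun1);

counts = accumarray(idx, 1);   % number of occurrences of each symbol
p = counts / numel(fun1);      % empirical probabilities, summing to 1

% With the Communications Toolbox installed, these feed huffmandict directly:
% [dict1, avglen1] = huffmandict(symbols, p);
% comp1 = huffmanenco(fun1, dict1);
```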

csvwrite in loop with numbered filenames in matlab

I'm fairly new to MATLAB. After searching the csvwrite documentation and some existing web resources on my question, I couldn't find a way to put the values of my variables into the output file names when exporting to CSV. Given my script below, I would like the output files to be named something like output_$aa_$dd.csv, where aa and dd are the first and second for-loop counters of the script.
for aa=1:27
for dd=1:5
M_Normal=bench(aa,dd).Y;
for j=1:300
randRand=M_Normal(randperm(12000,12000));
for jj = 1:numel(randMin(:,1)); % loops over the rand numbers
vv= randMin(jj,1); % gets the value
randMin(jj,j+1)=min(randRand(1:vv)); % get and store the min of the selection in the matrix
end
end
csvwrite('/home/amir/amir_matlab/sprintf(''%d%d',aa, bb).csv',randMin);
end
end
String concatenation in MATLAB is done like a matrix concatenation. For example
a='app';
b='le';
c=[a,b] % returns 'apple'
Hence, in your problem, the full path can be formed this way.
['/home/amir/amir_matlab/',sprintf('output_%d_%d',aa,dd),'.csv']
Note that the inner loop counter is dd, not bb. Furthermore, it is usually best not to hard-code the file separator, so that your code also runs on other operating systems. I suggest you build the full path with fullfile (the leading filesep keeps the path absolute, as in the original):
fullfile(filesep,'home','amir','amir_matlab',sprintf('output_%d_%d.csv',aa,dd))
Cheers.
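Putting the answer's pieces together, the loop could look like the minimal sketch below. The output folder under tempdir, the reduced loop bounds, and the random randMin are placeholders (assumptions) so that the sketch runs on its own; substitute the question's folder and computation.

```matlab
% Placeholder output folder; tempdir is used only so the sketch runs anywhere
outDir = fullfile(tempdir, 'amir_matlab_demo');
if ~exist(outDir, 'dir')
    mkdir(outDir);
end

for aa = 1:3
    for dd = 1:2
        randMin = rand(4, 2);   % placeholder for the matrix computed in the question
        % sprintf turns the loop counters into the name, e.g. output_3_2.csv
        fname = fullfile(outDir, sprintf('output_%d_%d.csv', aa, dd));
        csvwrite(fname, randMin);
    end
end
```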

Reading in points from a file

I have a txt file in which each row has the x, y, z coordinates of a point, separated by spaces. I want to read the points from this txt file and store them as a matrix in MATLAB of the form [Pm_1 Pm_2 ... Pm_nmod], where each Pm_n is a point. Could someone help me with this?
I actually have to pass it to code that accepts the model as:
"model - matrix with model points, [Pm_1 Pm_2 ... Pm_nmod]"
I use importdata heavily for this. It reads all kinds of formats; I normally use other methods such as dlmread only if importdata doesn't work.
Usage is as simple as M = importdata('data.txt');
Just use
load -ascii data.txt
That creates a matrix called 'data' in your workspace whose rows contain the coordinates.
You can find all the details of the conversion in the documentation for the load command.
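The code in the question expects [Pm_1 Pm_2 ... Pm_nmod], which reads as one point per column, whereas both answers give one point per row. A minimal sketch, assuming that column layout is what the model code wants and using a made-up three-point file: call load in functional form so the matrix lands in a variable of your choosing, then transpose it.

```matlab
% Write a small sample file: one point per row, x y z separated by spaces
fid = fopen('data.txt', 'w');
fprintf(fid, '%g %g %g\n', [0 0 0; 1 2 3; 4.5 -1 2].');
fclose(fid);

% Functional form of load returns the numeric matrix instead of creating
% a workspace variable named after the file
P = load('data.txt');   % N-by-3, one point per row
model = P.';            % 3-by-N, one point per column: [Pm_1 Pm_2 ... Pm_N]
```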