Convert .mat to .csv octave/matlab

I'm trying to write an octave program that will convert a .mat file to a .csv file. The .mat file has a matrix X and a column vector y. X is populated with 0s and 1s and y is populated with labels from 1 to 10. I want to take y and put it in front of X and write it as a .csv file.
Here is a code snippet of my first approach:
load(filename, "X", "y");
z = [y X];
basename = split{1};
csvname = strcat(basename, ".csv");
csvwrite(csvname, z);
The resulting file contains lots of really small decimal numbers, e.g. 8.560596795891285e-06,1.940359477121703e-06, etc...
My second approach was to loop through and manually write the values out to the .csv file:
load(filename, "X", "y");
z = [y X];
basename = split{1};
csvname = strcat(basename, ".csv");
csvfile = fopen(csvname, "w");
numrows = size(z, 1);
numcols = size(z, 2);
for i = 1:numrows
for j = 1:numcols
fprintf(csvfile, "%d", z(i, j));
if j == numcols
fprintf(csvfile, "\n");
else
fprintf(csvfile, ",");
end
end
end
fclose(csvfile);
That gave me a correct result, but took a really long time.
Can someone tell me either how to use csvwrite in a way that will write the correct values, or how to manually create the .csv file more efficiently?
Thanks!

The problem is that if y is of type char, your X vector gets converted to char, too. Since your labels are nothing else but numbers, you can simply convert them to numbers and save the data using csvwrite:
csvwrite('data.txt', [str2num(y) X]);
Edit: Also, in the loop you save the numbers using the integer conversion %d, while csvwrite writes doubles if your data is of type double. If the zeros are not exactly zero, csvwrite will write them in scientific notation, while your loop will round them. Hence the different behavior.
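To see the type issue in isolation, here is a minimal sketch with made-up data (the values of y and X below are assumptions, not the contents of your .mat file). It only shows that concatenating a char array with a numeric matrix gives a char result, and that converting the labels first keeps everything double:

```matlab
% Hypothetical stand-in data: labels stored as text, features as doubles.
y = [' 3'; '10'];          % 2-by-2 char array, one label per row
X = [0 1; 1 0];            % numeric matrix

z = [y X];                 % mixing char and double yields a char array
zfix = [str2num(y) X];     % convert the labels first, result stays double

disp(class(z));            % class of the mixed concatenation
disp(class(zfix));         % class after converting the labels
```

Checking class(y) right after load is a quick way to find out whether this applies to your file.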

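Just a heads up: your code isn't optimized for MATLAB/Octave. Switch the for i and for j lines around.
Octave stores arrays in column-major order, so iterating the way you're doing is not cache efficient. Making the change will speed up the overall loop, probably to an acceptable time. Keep in mind that the order of the fprintf calls also determines the order of the values in the file, so the loop body needs adjusting if the rows of z must stay rows in the .csv.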
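For the "more efficient manual write" part of the question, a sketch that avoids the double loop entirely is to hand the whole matrix to a single fprintf call. The data and the output name example_out.csv are assumptions for illustration, and %d assumes the values really are integers:

```matlab
% Hypothetical stand-in data; in the question these come from load(filename, "X", "y").
X = [0 1 1; 1 0 0];
y = [3; 10];
z = [y X];

% One format string per row: "%d,%d,...,%d\n".
fmt = [repmat('%d,', 1, size(z, 2) - 1), '%d\n'];

csvname = 'example_out.csv';   % assumed output name
fid = fopen(csvname, 'w');
% fprintf consumes its data column by column and recycles the format,
% so passing the transpose writes the original rows in order.
fprintf(fid, fmt, z.');
fclose(fid);
```

If the values are not exactly integers, rounding them first (or using a format such as %g) would be needed; which one is right depends on what the data is supposed to contain.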

Related

Skip all type of characters matlab while using fscanf or sscanf

I need to read a text that has mix of numerical values and characters. Here is an example:
% Loc : LAT = -19.6423 LON = -70.817 DEP = 21.5451196625
I only need to read numerical fields.
Typically I used this:
x = fgetl(fid);
out = sscanf(x,'%% Loc : LAT = %f LON = %f DEP = %f\n');
It works but the problem is that not all the files have fixed format and sometimes letters are written in upper or lower cases. In such cases what I did does not work.
I tried skipping all characters using
out = sscanf(x,'%[a-zA-Z.=\t\b]s %f %[a-zA-Z.=\t\b]s %f %[a-zA-Z.=\t\b]s %f\n');
but it does not work!
Please note that file lines are not the same and I have different number of numerical field in each line of file.
I'm still a little unclear on your file format, but it seems like you could do this much easier using textscan instead of the lower level functions.
Something like this should work:
while (~feof(fid))
textscan(fid, '%s :'); % Read the part of the line through the colon
data = textscan(fid, '%s = %f');
% Do something with the data here
end
The variable fid is a file identifier that you get from calling fopen, and you'll need to call fclose when you're done.
I don't think this is going to exactly fix your problem, but hopefully it will get you on a track that's much shorter and cleaner. You'll have to play with this to make sure that you actually get to the end of file, for example, and that there are not corner cases that trip up the pattern matching.
Assuming C: *scanf() takes a format string like "%d", not a multi-character constant like '%d'.
Detail: " vs. '.
"%[]" does not take a trailing 's' as OP used in '%[a-zA-Z.=\t\b]s'.
"%n" records the int count of characters scanned so far.
Suggest
// Adjust these as needed
// The space inside the scanset lets one skip cover a run such as "% Loc : LAT "
#define SKIPCHAR " %*[a-zA-Z.%: ]"
#define EQUALFLOAT " =%f"
int n = 0;
float lat, lon, dep;
sscanf(x, SKIPCHAR EQUALFLOAT SKIPCHAR EQUALFLOAT SKIPCHAR EQUALFLOAT " %n",
&lat, &lon, &dep, &n);
if (n > 0 && x[n] == '\0') Success();
else Fail();
To cope with different number of numbers in a line:
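#define XN 100
float val[XN];       // parsed numbers
const char *p = x;   // x is the input line, as above
size_t i;
for (i=0; i<XN; i++) {
  int n = 0;
  sscanf(p, SKIPCHAR " %n", &n);   // skip the label, if any
  p += n;
  n = 0;
  sscanf(p, EQUALFLOAT " %n", &val[i], &n);
  if (n == 0) break;               // no further "= number" found
  p += n;
}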
I've found a possible solution; it is certainly not "elegant", but it seems to work.
It is based on the following process:
read the file line by line using fgets
parse each string using strtok
try converting each token to a number with str2num
if the token is actually a number (i.e. if str2num does not return an empty array), insert the number in the output matrix
The output matrix is initialized (to NaN) at the beginning of the script, big enough to have:
a number of rows greater than or equal to the number of rows of the input file (if it is not known in advance, a "reasonable" value should be defined)
a number of columns greater than or equal to the maximum number of numeric values that can be present in a row of the input file (if it is not known in advance, a "reasonable" value should be defined).
Once you've read the whole input file, you can "clean" the output matrix by removing the surplus all-NaN rows and columns.
Below you can find the script, the input file I've used and the output matrix (looking at it should make clearer why it is initialized to NaN, I hope).
Notice that identifying and extracting the numbers (using strtok) relies on the format of your example row: in particular, it relies on all the tokens of the string being separated by a space.
This means that the code is not able to identify =123.456 as a number.
If your input file has tokens such as =123.456, the code has to be modified.
% Initialize rows counter
r_cnt=0;
% Initialize column counter
c_cnt=0;
% Define the number of rows of the input file (if it is not known in advance,
% put a "reasonable" value) - Used to initialize the output matrix
file_rows=5;
% Define the number of numeric values to be extracted from the input file
% (if it is not known in advance, put a "reasonable" value) - Used to
% initialize the output matrix
max_col=5;
% Initialize the variable holding the maximum number of column. Used to
% "clean" the output matrix
max_n_col=-1;
% Initialize the output matrix
m=nan(file_rows,max_col);
% Open the input file
fp=fopen('char_and_num.txt','rt');
% Get the first row
tline = fgets(fp);
% Loop to read line by line the input file
while ischar(tline)
% Increment the row counter
r_cnt=r_cnt+1;
% Parse the line looking for numeric values
while(true)
[str, tline] = strtok(tline);
if(isempty(str))
break
end
% Try to convert the string into a number
tmp_val=str2num(str);
if(~isempty(tmp_val))
% If the token is a number, increment the column counter and
% insert the number in the output matrix
c_cnt=c_cnt+1;
m(r_cnt,c_cnt)=tmp_val;
end
end
% Track the largest number of non-NaN columns seen so far in the output
% matrix
max_n_col=max(max_n_col,c_cnt);
% Reset the column counter before the next iteration
c_cnt=0;
% Read next line of the input file
tline = fgets(fp);
end
% After having read all the input file, close it
fclose(fp);
% Clean the output matrix removing the exceeding full NaN rows and columns
m(r_cnt+1:end,:)=[];
m(:,max_n_col+1:end)=[];
m
Input file
% Loc : LAT = -19.6423 LON = -70.817 DEP = 21.5451196625
% Loc : xxx = -1.234 yyy = -70.000 WIDTH = 333.369 DEP = 456.5451196625
% Loc : zzz = 1.23
Output
m =
-19.6423 -70.8170 21.5451 NaN
-1.2340 -70.0000 333.3690 456.5451
1.2300 NaN NaN NaN
Hope this helps.
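A shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: pull every numeric token out of the line with regexp and convert with str2double. It ignores the labels entirely, so upper/lower case does not matter, and it also picks up tokens such as =123.456. The pattern below is my assumption about what counts as a number; it would also match digits embedded in a label such as LAT2.

```matlab
% Example line taken from the question.
txt = '% Loc : LAT = -19.6423 LON = -70.817 DEP = 21.5451196625';

% Optional sign, digits, optional fraction, optional exponent.
tokens = regexp(txt, '[-+]?\d+\.?\d*(?:[eE][-+]?\d+)?', 'match');

% str2double converts a cell array of strings element by element.
vals = str2double(tokens);
disp(vals);
```

Applied line by line (for example on the result of fgetl), each line can yield a different number of values, so storing the results in a cell array avoids the NaN padding.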

How to store .csv data and calculate average value in MATLAB

Can someone help me understand how, in MATLAB, I can load a group of .csv files, select only the columns I am interested in, and get as output a final file containing the average value and standard deviation of the y columns? I am not very experienced with MATLAB, so I would appreciate any help with this question.
Here what I tried to do till now:
clear all;
clc;
which_column = 5;
dirstats = dir('*.csv');
col3Complete=0;
col4Complete=0;
for K = 1:length(dirstats)
[num,txt,raw] = xlsread(dirstats(K).name);
col3=num(:,3);
col4=num(:,4);
col3Complete=[col3Complete;col3];
col4Complete=[col4Complete;col4];
avgVal(K)=mean(col4(:));
end
col3Complete(1)=[];
col4Complete(1)=[];
%columnavg = mean(col4Complete);
%columnstd = std(col4Complete);
% xvals = 1 : size(columnavg,1);
% plot(xvals, columnavg, 'b-', xvals, columnavg-columnstd, 'r--', xvals, columnavg+columstd, 'r--');
B = reshape(col4Complete,[5000,K]);
m=mean(B,2);
C = reshape (col4Complete,[5000,K]);
S=std(C,0,2);
Now I know that I should compute the mean and standard deviation inside the for loop, using the mean() function, but I am not sure how to use it.
which_column = 5;
dirstats = dir('*.csv');
col3Complete=[]; % Initialise as empty matrix
col4Complete=[];
avgVal = zeros(length(dirstats),2); % one row per file: [mean, std]
for K = 1:length(dirstats)
[num,txt,raw] = xlsread(dirstats(K).name);
col3=num(:,3);
col4=num(:,4);
col3Complete=[col3Complete;col3];
col4Complete=[col4Complete;col4];
avgVal(K,1)=mean(col4(:)); % 1st column contains mean
avgVal(K,2)=std(col4(:)); % 2nd column contains standard deviation
end
%columnavg = mean(col4Complete);
%columnstd = std(col4Complete);
% xvals = 1 : size(columnavg,1);
% plot(xvals, columnavg, 'b-', xvals, columnavg-columnstd, 'r--', xvals, columnavg+columstd, 'r--');
B = reshape(col4Complete,[5000,K]);
meanVals=mean(B,2);
I didn't change much, just initialised your arrays as empty arrays so you do not have to delete the first entry later on, and made avgVal a two-column matrix with the mean in column 1 and the standard deviation in column 2. You can of course add two more columns if you want to collect those statistics for the 3rd column of the csv as well.
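If what you are after is the mean and standard deviation across files for each sample index (which is what the reshape at the end suggests), a minimal sketch with made-up numbers looks like this. The matrix B here is an assumption standing in for the reshaped col4Complete, with one column per file:

```matlab
% Hypothetical data: 3 "files", each contributing one column of 4 samples.
B = [ 1  2  3;
      4  5  6;
      7  8  9;
     10 11 12];

rowMean = mean(B, 2);     % average across files, one value per sample index
rowStd  = std(B, 0, 2);   % standard deviation across files (N-1 normalisation)

disp([rowMean rowStd]);
```

The second argument of std selects the normalisation (0 means N-1), and the third selects the dimension to work along, the same way as the second argument of mean.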
As a side note: xlsread is rather heavy for reading files, since Excel is horribly inefficient. If you want to read a structured file such as a csv, it's faster to use importdata or csvread.
Create some random matrix to store in a file with a header:
A = rand(1e3,5);
out = fopen('output.csv','w');
fprintf(out,['ColumnA', ',', 'ColumnB', ',', 'ColumnC', ',', 'ColumnD', ',', 'ColumnE','\n']);
fclose(out);
dlmwrite('output.csv', A, '-append', 'delimiter', ',');
Load it using csvread, skipping the header row:
data = csvread('output.csv',1);
data now contains your five columns, without any headers.

Copying Contents of Matrix in MATLAB

I am trying to copy the results to a matrix and want the output in a 32768*8 array. This is the code I am using, but it stops working after the last line.
As you can see for the first file ( i=1), the decimal data,T(32768*1) is converted to M(32768*8). Now I want this M to be stored for each iteration of i, without overwriting anything.
Files_list = getAllFiles('C:\Stellaris Measurements\Stellaris-LM4F120_all');
for i = 1:15000
B=num2str(cell2mat(Files_list(i)));
fid = fopen(B,'rb');
T= fread(fid,inf,'uint8','ieee-be');
total = numel(T);
%M=textread('C:\Users\admin\Workspace\STELLARIS-LM4F120_00_210214_104000_0001_temp_025.bin','%2c');
%M=dec2bin(M);
M= de2bi(T,8,'left-msb');
M = measure(i);
end
So, basically I want to create a matrix for each of the measurements, which will store the converted binary results in a 32768*8 array.
Thanks!
BR,
\Kashif

Matlab: read in part of binary data

I have a data set (binary file) from which I want to read only the first half of the X data (and the corresponding Y), which is saved to a 4D matrix:
for i = 1:vols
for j = 1:cols
XY(i,:,:,j) = fread(fid,[X Y],'int16');
end
end
How do I modify the above loop so that only the first e.g. 10 X values (and the corresponding Y) are read in for each vols and cols?
thanks
You will need to implement the reading for each vols and cols in the following order:
read part of Y for the first input X, then skip the rest of this line, read part of Y for the second input X, etc.
After reading the requested number of X lines, you will need to skip the rest of the matrix before reading the next (vols, cols) pair.
To skip part of the matrix you can use the fseek function.
Let X_count and Y_count be the dimensions of the submatrix, and X_total and Y_total the dimensions of the total matrix. You need something like the following:
for i = 1:vols
for j = 1:cols
for k=1:X_count
XY(i,k,:,j) = fread(fid,Y_count,'int16');
fseek(fid,(Y_total-Y_count)*2,'cof');
end
fseek(fid,(X_total-X_count)*Y_total*2,'cof');
end
end
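As an alternative sketch, fread can do the skipping itself through its skip argument combined with a repeat count in the precision string. One thing to check first is the layout: fread(fid,[X Y],...) fills the matrix column by column, so this sketch assumes X is the direction that is contiguous in the file. The sizes and the temporary file below are made up for the demo, and it reads a single X-by-Y block only.

```matlab
% Hypothetical sizes, chosen only for this demo.
X_total = 6; Y_total = 4; X_count = 2;
fname = [tempname() '.bin'];

% Write one X_total-by-Y_total block of int16 values (column-major on disk).
whole = reshape(1:X_total*Y_total, X_total, Y_total);
fid = fopen(fname, 'w');
fwrite(fid, whole, 'int16');
fclose(fid);

% Read X_count values, skip the remaining (X_total - X_count) values, repeat.
fid = fopen(fname, 'r');
precision = sprintf('%d*int16', X_count);   % read in groups of X_count
skipBytes = (X_total - X_count) * 2;        % int16 is 2 bytes
part = fread(fid, [X_count, Y_total], precision, skipBytes);
fclose(fid);
delete(fname);

disp(part);   % first X_count rows of every column
```

Inside the vols/cols loops it is worth checking with ftell that the file position ends up at the start of the next block after each read.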

How to read binary file in one block rather than using a loop in matlab

I have this file which is a series of x, y, z coordinates of over 34 million particles and I am reading them in as follows:
parfor i = 1:Ntot
x0(i,1)=fread(fid, 1, 'real*8')';
y0(i,1)=fread(fid, 1, 'real*8')';
z0(i,1)=fread(fid, 1, 'real*8')';
end
Is there a way to read this in without doing a loop? It would greatly speed up the read in. I just want three vectors with x,y,z. I just want to speed up the read in process. Thanks. Other suggestions welcomed.
I do not have a machine with Matlab and I don't have your file to test either but I think coordinates = fread (fid, [3, Ntot], 'real*8') should work fine.
Maybe fread is the function you are looking for.
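Here is a small self-contained sketch of that one-call read, using a temporary file with made-up values instead of your data. It assumes the file stores the coordinates interleaved as x, y, z per particle, which is what the loop in the question implies; 'double' is used in place of 'real*8'.

```matlab
% Hypothetical demo file with Ntot interleaved (x, y, z) triples.
Ntot = 4;
xyz = [1 2 3; 4 5 6; 7 8 9; 10 11 12].';   % 3-by-Ntot, one particle per column
fname = [tempname() '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, xyz, 'double');
fclose(fid);

% Single fread call: each column of coords is one particle.
fid = fopen(fname, 'r');
coords = fread(fid, [3, Ntot], 'double');
fclose(fid);
delete(fname);

% Split the rows into three column vectors.
x0 = coords(1, :).';
y0 = coords(2, :).';
z0 = coords(3, :).';
```

For 34 million particles the 3-by-Ntot matrix is roughly 800 MB of doubles, so whether a single call is practical depends on available memory.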
You're right. Reading data in larger batches is usually a key part of speeding up file reads. Another part is pre-allocating the destination variables, for example with a zeros call.
I would do something like this:
%Pre-allocate
x0 = zeros(Ntot,1);
y0 = zeros(Ntot,1);
z0 = zeros(Ntot,1);
%Define a desired batch size. make this as large as you can, given available memory.
batchSize = 10000;
%Use while to step through file
indexCurrent = 1; %indexCurrent is the next element which will be read
while indexCurrent <= Ntot
%At the end of the file, we may need to read less than batchSize
currentBatch = min(batchSize, Ntot-indexCurrent+1);
%Load a batch of data
tmpLoaded = fread(fid, currentBatch*3, 'real*8')';
%Deal the fread data into the desired three variables
x0(indexCurrent + (0:(currentBatch-1))) = tmpLoaded(1:3:end);
y0(indexCurrent + (0:(currentBatch-1))) = tmpLoaded(2:3:end);
z0(indexCurrent + (0:(currentBatch-1))) = tmpLoaded(3:3:end);
%Update index variable
indexCurrent = indexCurrent + batchSize;
end
Of course, make sure you test, as I have not. I'm always suspicious of off-by-one errors in this sort of work.