Octave - Convert number-strings from CSV import file to numerical matrix

I'm writing a code to import data from a CSV file and convert it into an Octave matrix. The imported data can be seen in the following snap:
In the next step I added the following command to delete the commas and "":
meas_raw_temp = strrep(meas_raw_temp,',',' ')
And then I get the data format in the following form:
The problem is that Octave still sees the data as a single 1-dimensional array: when I use the size command I get a single dimension of 2647. What I need is a matrix output, with each line of the snaps becoming a row of the matrix and each element separated out.
Any thoughts?

Here's what's happening.
You have a 1-dimensional (rows only) cell array. Each element (i.e. cell) in the cell array contains a single string.
Your strings contain commas and literal double-quotes in them. You have managed to get rid of them all by replacing them in-place with an 'empty string'. Good. However that doesn't change the fact that you still have a single string per cell.
You need to create a for loop to process each cell separately. For each cell, split the string into its components (i.e. 'tokens') using ' ' (i.e. space) as the delimiter. You can use either strsplit or strtok appropriately to achieve this. Note that the 'tokens' you get out of this process are still of 'string' type. If you want numbers, you'll need to convert them (e.g. using str2double or something equivalent).
For each cell you process, find a way to fill the corresponding row of a preallocated matrix.
As Adriaan has pointed out in the comments, the exact manner in which you follow the steps above programmatically can vary, therefore I'm not going to provide the range of possible ways that you could do so, and I'm limiting the answer to this question to the steps above, which is how you should think about solving your problem.
If you attempt these steps and get stuck on a 'programmatic' aspect of your implementation, feel free to ask another stackoverflow question.
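For concreteness, here is a minimal sketch of those steps (assuming the cell array is called meas_raw_temp, every line holds the same number of values, and the commas have already been replaced by spaces as above):
nrows = numel(meas_raw_temp);                          % one cell per line of the file
ncols = numel(strsplit(strtrim(meas_raw_temp{1}), ' ')); % number of values per line
meas = zeros(nrows, ncols);                            % preallocate the output matrix
for k = 1:nrows
    tokens = strsplit(strtrim(meas_raw_temp{k}), ' '); % split the string into tokens
    meas(k, :) = str2double(tokens);                   % convert tokens to numbers
end
size(meas) then reports the expected two dimensions instead of a single length.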

Related

MATLAB: making a histogram plot from csv files read and put into cells?

Unfortunately I am not too tech proficient and only have a basic MATLAB/programming background...
I have several csv data files in a folder, and would like to make a histogram plot of all of them simultaneously in order to compare them. I am not sure how to go about doing this. Some digging online gave a script:
d = dir('*.csv');              % return the list of csv files
for i = 1:length(d)
    m{i} = csvread(d(i).name); % put into cell array
end
The problem is I cannot now simply write histogram(m(i)) command, because m(i) is a cell type not a csv file type (I'm not sure I'm using this terminology correctly, but MATLAB definitely isn't accepting the former).
I am not quite sure how to proceed. In fact, I am not sure what exactly is the nature of the elements m(i) and what I can/cannot do with them. The histogram command wants a matrix input, so presumably I would need a 'vector of matrices' and a command which plots each of the vector elements (i.e. matrices) on a separate plot. I would have about 14 altogether, which is quite a lot and would take a long time to load, but I am not sure how to proceed more efficiently.
Generalizing the question:
I will later be writing a script to reduce the noise in the csv data, smooth it, and binarise it (the csv files are for noisy images with vague shapes, and I want to distinguish these shapes by setting a cut-off for the pixel intensity/value in the csv matrix, so as to create a binary image showing these shapes). Ideally, I would like to apply this to all of the images in my folder at once so I can sift out which images are best for analysis. So my question is: how can I run a script over all of the csv files in my folder so that I can compare them all at once? I presume whatever technique I use for the histogram plots can apply to this too, but I am not sure.
It should probably be better to write a script which:
-makes a histogram plot and/or runs the binarising script for each csv file in the folder
-and puts all of the images into a new, designated folder, so I can sift through these.
I would greatly appreciate pointers on how to do this. As I mentioned, I am quite new to programming and am getting overwhelmed when looking at suggestions, seeing various different commands used to apparently achieve the same thing: reading several files at once.
The function csvread natively returns a matrix. I am not sure, but it is possible that if some elements inside the csv file are not numbers, Matlab automatically makes a cell array out of the output. Since I don't know the structure of your csv files, I recommend trying out some similar functions (readtable, xlsread):
M = readtable(d(i).name) % Reads table like data, most recommended
M = xlsread(d(i).name) % Excel like structures, but works also on similar data
Try them out and let me know if it worked. If not please upload a file sample.
The function csvread(filename) always returns a numerical matrix M and never returns a cell array.
If you have textual data inside the .csv file, it will give you an error because it accepts numerical data only. The only reason I can see for using a cell array when reading the files is if the dimensions of the individual matrices read from each file are different, for example the first .csv file contains data organised as 3xA and the second .csv file contains data organised as 2xB, so you can place them all into a single structure.
However, it is still possible to use histogram on a cell array, by extracting each element as a numerical array instead of extracting it as a cell.
If M is a cell array, there are two options for extracting the data: M(i) and M{i}. M(i) gives you the cell itself and cannot be used for histogram, whereas M{i} returns its contents in their original form, which is a numerical matrix.
TL;DR use histogram(M{i}) instead of histogram(M(i)).
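To tie this back to the original question, here is a minimal batch-processing sketch along those lines (assuming every file really does contain only numeric data; the output folder name histograms is made up):
d = dir('*.csv');                          % list the csv files in the current folder
outdir = 'histograms';                     % hypothetical folder for the saved figures
if ~exist(outdir, 'dir')
    mkdir(outdir);
end
m = cell(1, length(d));                    % preallocate the cell array
for i = 1:length(d)
    m{i} = csvread(d(i).name);             % numeric matrix for this file
    figure;
    histogram(m{i}(:));                    % histogram of all values in the matrix
    title(d(i).name, 'Interpreter', 'none');
    saveas(gcf, fullfile(outdir, [d(i).name '.png']));
    close(gcf);
end
Any later processing (smoothing, thresholding to a binary image) can be dropped into the same loop body.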

Can I write out a txt or csv doc with data of varying dimensions in Matlab?

I am using Matlab R2013b.
I have a 100x100 matrix which contains both numbers and strings. I converted it to a cell array (alldat) and wrote it to a csv file (blah.csv).
I then tried to append a single number to the top line of this csv file...which Matlab won't let me do.
cell2csv('blah.csv',alldat)
I can append the single number 'n' at the bottom of the matrix:
dlmwrite('blah.csv',n,'-append','delimiter',' ','roffset',1)
But it won't let me do it the other way around (so that the number goes in the first cell of the csv file, with the matrix below it).
Can anyone advise?
I also tried outputting the cell array to a txt document using dlmwrite:
dlmwrite('blah.txt',alldat,'delimiter',' ');
And I kept getting this error:
Error using dlmwrite (line 113) The input cell array cannot be converted to a matrix.
I often use tables for such tasks. Since you have a 100 x 100 array rather than variables with different dimensions, it should be possible to adapt this approach.
VarA={'12A3';123;'12B3'};
VarB={'45A6';456;'45B6'};
T=table(VarA,VarB);
writetable(T,'test.csv','WriteVariableNames',false)
T1=readtable('test.csv','ReadVariableNames',false)
You may want to use cell2table to create a table directly from your cell array, although it didn't work for me because it made some strange conversions from number to character.
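If you specifically need the single number on the first line and the mixed cell array below it, a low-level alternative (only a sketch; it assumes alldat is your cell array and n the header value, and writes the file directly with fprintf rather than cell2csv):
fid = fopen('blah.csv', 'w');
fprintf(fid, '%g\n', n);                       % single number on the first line
for r = 1:size(alldat, 1)
    for c = 1:size(alldat, 2)
        if ischar(alldat{r, c})
            fprintf(fid, '%s', alldat{r, c});  % write strings as-is
        else
            fprintf(fid, '%g', alldat{r, c});  % write numbers
        end
        if c < size(alldat, 2)
            fprintf(fid, ',');                 % comma between fields
        end
    end
    fprintf(fid, '\n');                        % end of row
end
fclose(fid);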

Octave / Matlab - Reading fixed width file

I have a fixed width file format (original was input for a Fortran routine). Several lines of the file look like the below:
1078.0711005.481 932.978 861.159 788.103 716.076
How this actually should read:
1078.071 1005.481 932.978 861.159 788.103 716.076
I have tried various methods, textscan, fgetl, fscanf etc., however the problem I have is, as seen above, that sometimes because of the fixed width of the original files there is no whitespace between some of the numbers. I can't seem to find a way to read them directly and I can't change the original format.
The best I have come up with so far is to use fgetl which reads the whole line in, then I reshape the result into an 8,6 array
A = fgetl(fid)        % fid obtained from fopen on the file
A = reshape(A, 8, 6)
which generates the following result
11
009877
703681
852186
......
049110
787507
118936
So now I have the above and thought I might be able to concatenate the rows of that array to form each number, although that is proving difficult as well, having tried strcat, vertcat etc.
All of that seems a long way round so was hoping for some better suggestions.
Thanks.
If you can rely on three decimal numbers you can use a simple regular expression to generate the missing blanks:
s = '1078.0711005.481 932.978 861.159 788.103 716.076';
s = regexprep(s, '(\.\d\d\d)', '$1 ');
c = textscan(s, '%f');
Now c{1} contains your numbers. This will also work if s is in fact the whole file instead of one line.
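For instance, applied to a whole file at once (a sketch; the file name is made up, and the reshape assumes six numbers per original line as in the example):
s = fileread('fortran_output.txt');        % read the entire file into one string
s = regexprep(s, '(\.\d\d\d)', '$1 ');     % insert a blank after each 3-digit fractional part
c = textscan(s, '%f');
data = reshape(c{1}, 6, []).';             % one row per original line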
You haven't mentioned which class of output you need, but I guess you want to read doubles from the file to do some calculations. I assume you are able to read your file, since you already have results from the reshape() function. However, using reshape() will not be efficient in your case, since your values are not all the same width (e.g. 1078.071 versus 932.978).
If I didn't misunderstand your problem:
-Your data is squashed in some parts (e.g. 1078.0711005.481 instead of 1078.071 1005.481).
-The fractional part of each value has 3 digits.
First of all we need to get rid of spaces from the string array:
A = A(~ismember(A,' '));   % drop all spaces from the character array
Then using the information that fractional parts are 3 digits:
iter = length(strfind(A, '.'));      % number of values = number of decimal points
for k = 1:iter
    [stat, ind] = ismember('.', A);  % index of the first remaining decimal point
    B(k) = str2double(A(1:ind+3));   % take everything up to 3 digits after the point
    A = A(ind+4:end);                % and remove it from the string
end
B will be an array of doubles as a result.
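Since every field in the original file is a fixed 8 characters wide, another option worth noting (a sketch, using the same fixed-width %Nf trick that comes up in the next question) is to let sscanf do the splitting itself:
s = '1078.0711005.481 932.978 861.159 788.103 716.076';
v = sscanf(s, '%8f')    % read at most 8 characters per number; leading whitespace is skipped
Here v comes back as a 6x1 vector of doubles, the same result as the regexprep approach above.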

dynamically specifying floating point number size in a function-enclosed sscanf statement

I have a structured data file consisting of header lines interspersed with blocks of data. I am reading each block of data (as defined by the header line) into a separate cell of a cell array. For instance, suppose that after loading the data with textscan, I have a cell array x and an array of indices of header lines and EOF (headerIdx) of the following form:
x={'header line 1';'98.78743';'99.39717';'99.93578';'100.40125';'100.79166';'101.10525';'101.34037';'101.49553';'101.56939';'101.56072';'101.4685';'101.29184';'101.03002';'100.68249';'header line 2';'100.24887';'99.72897';'99.12274';'98.43036';'97.65215';'96.78864';'95.84054';'header line 3';'3.2';'4.31';'2.7';'4.6';'9.3'};
headerIdx=[1;16;24;30];
I then attempt to extract each block of data below a header line into a separate element of a cell array using sscanf and str2mat (as suggested by this post). Initially, this approach failed because the elements within a given block of data were of different length. This can be solved by including a numerical flag for the '%f' argument to help sscanf know where to delimit the input data (as suggested by this post). One can then use a strategy such as the following to effect the conversion of structured data to a cell array of block-specific double arrays:
extract_data = @(n) sscanf(str2mat(x(headerIdx(n)+1:headerIdx(n+1)-1)).',['%' num2str(size(str2mat(x(headerIdx(n)+1:headerIdx(n+1)-1)).',1)) 'f']);
extracted_data = arrayfun(extract_data,1:numel(headerIdx)-1,'UniformOutput',false);
The numerical flag of the format string can either be set to something arbitrarily large to encompass all the data, or can be set on a block-specific basis as I have done in the example above. The latter approach leads to redundant evaluation of str2mat (once for the input to sscanf and once for the input to the '%f' string generator). Can this redundancy be avoided without using loop statements that store the output of the str2mat command in a temporary variable? Note that one cannot simply take the output of the size command applied to the output of str2mat(x).' on the entire data set, because the header lines are generally going to be the lines with the greatest number of characters.
Finally, I have constructed the x matrix above to reflect the fact that some blocks of data may have different precision than other blocks. This is the reason to set the format string in a block-specific manner. My testing has shown that despite accurate construction of a block-specific format string (['%' num2str(size(str2mat(x(headerIdx(n)+1:headerIdx(n+1)-1)).',1)) 'f']), the data in all elements of the resulting cell array (extracted_data) are ultimately forced to have the same precision (see below). Why is this the case, and how can it be corrected?
extracted_data{:}
ans =
98.7874
99.3972
99.9358
100.4013
100.7917
101.1052
101.3404
101.4955
101.5694
101.5607
101.4685
101.2918
101.0300
100.6825
ans =
100.2489
99.7290
99.1227
98.4304
97.6522
96.7886
95.8405
ans =
3.2000
4.3100
2.7000
4.6000
9.3000
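Two possible answers to the questions above, offered only as a sketch. First, the repeated str2mat evaluation can be avoided without an explicit loop by splitting the anonymous function in two, so that the char matrix for each block is built once and reused (the names blockmat and scan_block are made up for this sketch):
blockmat = @(n) str2mat(x(headerIdx(n)+1:headerIdx(n+1)-1)).';     % char matrix for block n, built once
scan_block = @(m) sscanf(m, ['%' num2str(size(m, 1)) 'f']);        % field width taken from the matrix itself
extract_data = @(n) scan_block(blockmat(n));
extracted_data = arrayfun(extract_data, 1:numel(headerIdx)-1, 'UniformOutput', false);
Second, the apparent uniform precision is only a display effect: sscanf returns full-precision doubles regardless of the field width, and the default format short shows about five significant digits; format long makes the stored values visible.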

Avoid list being truncated by Matlab

Edit 1 - This question has been solved; it was due to a typo. Thanks to Floris for spotting this.
I have a one-line matrix in Matlab which is being truncated, causing me to lose data.
My code reads:
[status,Vf_rpm_string] = system (fragment_velocity_string);
Vf_rpm_shape=regexprep(Vf_rpm_string,'\n',' ');
Vf_rpm_vector=str2num(Vf_rpm_string);
Vf_rpm= reshape(Vf_rpm_vector,[],1);
The code runs a system command and stores the result. The result is a matrix of numbers, and sometimes the last line of the matrix has fewer columns than the previous lines. Matlab doesn't like this, as it does not know what to do with the few empty columns in the last line, so I have to remove the newline characters (\n) from the result and replace them with spaces.
This was working fine until the results from the system command became too large: when I remove the newline characters (\n) and replace them with spaces to create a one-line matrix, it is too long for Matlab, which truncates it, and I lose a lot of my data. So when I convert the returned data (which comes back as a string) to numbers I get an empty matrix, and the reshape command is pointless at that point.
This is how it reads in Matlab:
20.65866342... Output truncated. Text exceeds maximum line length of 25,000 characters for Command Window display.
So 20.65866342 is the last value before I start to lose data. I know it says it is too large for the Command Window, but still the variable does not store all the data and it is lost.
Does anyone have any solutions to avoid this truncation?
Or does anyone want to suggest an alternative method for me to convert my data?
I am using Matlab 2012b and Windows 7
Thanks for your time.
Could the problem be that you strip the newlines, but the stripped string isn't the one you are parsing?
[status,Vf_rpm_string] = system (fragment_velocity_string);
Vf_rpm_shape=regexprep(Vf_rpm_string,'\n',' ');
Vf_rpm_vector=str2num(Vf_rpm_string);
Vf_rpm= reshape(Vf_rpm_vector,[],1);
That third line of code should be
Vf_rpm_vector=str2num(Vf_rpm_shape);
if I understand the logic of your code.
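For completeness, the corrected block would then read (the same four lines, with the third parsing the stripped string):
[status, Vf_rpm_string] = system(fragment_velocity_string);
Vf_rpm_shape  = regexprep(Vf_rpm_string, '\n', ' ');   % strip newlines
Vf_rpm_vector = str2num(Vf_rpm_shape);                 % parse the cleaned string, not the original
Vf_rpm        = reshape(Vf_rpm_vector, [], 1);
Note also that the 'Text exceeds maximum line length of 25,000 characters' message only concerns Command Window display; it does not shorten what the variable itself holds.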