Converting csv of strings into a matrix - matlab

I just started using Octave (no money for MATLAB :/) and I'm also new to Stack Overflow, so please pardon any mistakes I make with conventions.
Problem: I have a CSV file of strings like this:
Bob Marley,Kobe Bryant,Michael Jackson,Kevin Hart
I would like to make this into a one-column matrix (I need it in a matrix so that I can combine it with data stored in other matrices).
My approach: I tried textread, but it gives me a cell array. I then tried converting the resulting cell array to a matrix with cell2mat, but I suspect this fails because my strings have different lengths.
Let me know if any other information is necessary.

You can use a char array:
fid = fopen('strings.csv');
A = textscan(fid, '%s', 'delimiter', ',');
fclose(fid);
B = char(A{:})
[rows, cols] = size(B)
Output is the following:
B =
Bob Marley
Kobe Bryant
Michael Jackson
Kevin Hart
rows = 4
cols = 15
As you can see, the number of columns of B is the length of the longest string ("Michael Jackson", 15 characters). All shorter strings are padded with trailing spaces.
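A minimal sketch (MATLAB or Octave) of how that padding behaves and how to undo it. It builds the same names directly with char instead of reading the file, and ids is a hypothetical numeric column used only to show how text and numbers can share one cell array, since a numeric matrix cannot hold text:

```matlab
% Same four names as in the question, built directly as a padded char matrix
B = char('Bob Marley', 'Kobe Bryant', 'Michael Jackson', 'Kevin Hart');
[rows, cols] = size(B);    % 4 rows; 15 columns = length of 'Michael Jackson'
padded = B(1, :);          % 'Bob Marley' followed by 5 padding spaces

% cellstr removes the trailing padding and returns a 4x1 cell array
names = cellstr(B);

% ids is a hypothetical numeric column, for illustration only.
% A cell array can hold the names and the numbers side by side.
ids = (1:4)';
combined = [names, num2cell(ids)];   % 4x2 cell: {name, id} per row
```

If you only need to compare or display the names, the padded char matrix is fine; converting back with cellstr is the usual step before joining text with numeric data.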

Assuming you are in the directory that contains the file "strings.csv" with the content from the question, your code would look like this:
fid = fopen('strings.csv');
A = textscan(fid, '%s', 'delimiter', ',');
fclose(fid);
A = A{1};
% string() needs MATLAB R2016b or later; Octave has no string class
A = cellfun(@(x) string(x), A, 'uni', 0);
B = [A{:}]';   % transpose the 1x4 string array into a 4x1 column

If your data is that simple, you can do it in a one-liner. Use fileread to slurp all the data in, and then strsplit to separate the elements, and a ' transpose to convert it to a column vector.
x = strsplit(fileread('myfile.txt'), ',')'
If you end up with spaces around the commas in your data, switch to regexp:
x = regexp(fileread('myfile.txt'), ' *, *', 'split')'
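A small self-contained sketch of both one-liners, assuming the file holds the single line from the question (the script writes it to a temporary file first). strtrim is added because fileread keeps a trailing newline, which would otherwise stay attached to the last name when the file ends with one:

```matlab
% Write the sample line from the question to a temporary file (demo only)
fname = [tempname() '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Bob Marley,Kobe Bryant,Michael Jackson,Kevin Hart\n');
fclose(fid);

% strsplit version: split on commas, trim, transpose to a column
x = strtrim(strsplit(fileread(fname), ','))';

% regexp version: the pattern also absorbs whitespace around each comma
y = strtrim(regexp(fileread(fname), '\s*,\s*', 'split'))';

delete(fname);
```

Both results are 4x1 cell arrays of char vectors, which can then be concatenated with num2cell of your numeric data.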

Related

Matlab fscanf read two column character/hex data from text file

Need to read in data stored as two columns of hex values in text file temp.dat into a Matlab variable with 8 rows and two columns.
Would like to stick with the fscanf method.
temp.dat looks like this (8 rows, two columns):
0000 7FFF
30FB 7641
5A82 5A82
7641 30FB
7FFF 0000
7641 CF05
5A82 A57E
30FB 89BF
% Matlab code
fpath = './';
fname = 'temp.dat';
fid = fopen([fpath fname],'r');
% Matlab treats hex as a character string
formatSpec = '%s %s';
% Want the output variable to be 8 rows two columns
sizeA = [8,2];
A = fscanf(fid,formatSpec,sizeA)
fclose(fid);
Matlab is producing the following which I don't expect.
A = 8×8 char array
'03577753'
'00A6F6A0'
'0F84F48F'
'0B21F12B'
'77530CA8'
'F6A00F59'
'F48F007B'
'F12B05EF'
In another variation, I attempted changing the format string like this:
formatSpec = '%4c %4c';
Which produced this output:
A =
8×10 char array
'0↵45 F7↵78'
'031A3F65E9'
'00↵80 4A↵B'
'0F52F0183F'
'7BA7B0C20 '
'F 86↵0F F '
'F724700AB '
'F6 1F↵55 '
Still another variation like this:
formatSpec = '%4c %4c';
sizeA = [8,16];
A = fscanf(fid,formatSpec);
Produces a one by 76 character array:
A =
'00007FFF
30FB 7641
5A82 5A827641 30FB
7FFF 0000
7641CF05
5A82 A57E
30FB 89BF'
Would like and expect Matlab to produce a workspace variable with 8 rows and 2 columns.
Have followed the example on the Matlab help area here:
https://www.mathworks.com/help/matlab/ref/fscanf.html
My Matlab code is based on the 'read file contents into an array' section about 1/3 of the way down the page. The example I reference is doing something very similar except that the two columns are one int and one float rather than two characters.
Running MATLAB R2017a on Red Hat.
Here is the complete code with the solution provided by Azim, and comments about what I learned as a result of posting the question.
fpath = './';
fname = 'temp.dat';
fid = fopen([fpath fname],'r');
formatSpec = '%9c\n';
% specify the output size as the input transposed, NOT the input.
sizeA = [9,8];
A = fscanf(fid,formatSpec,sizeA);
% A' is an 8 by 9 character array, which is the goal matrix size.
% B is an 8 by 1 cell array, each member has this format 'dead beef'.
%
% Cell arrays are data types with indexed data containers called cells,
% where each cell can contain any type of data.
B = cellstr(A');
% split divides str at whitespace characters.
S = split(B)
fclose(fid)
S =
8×2 cell array
'0000' '7FFF'
'30FB' '7641'
'5A82' '5A82'
'7641' '30FB'
'7FFF' '0000'
'7641' 'CF05'
'5A82' 'A57E'
'30FB' '89BF'
Your 8x2 MATLAB variable will most likely end up being a cell array. This can be done in two steps.
First, your lines have 9 characters so you could use formatSpec = '%9c\n' to read each line. Next you need to adjust the size parameter to read 9 rows and 8 columns; sizeA = [9 8]. This will read in all 9 characters into columns of the output; transposing the output will get you closer.
In the second step you need to convert the result of fscanf into your 8x2 cell array. Since you have R2017a you can then use cellstr and split to get your result.
Finally, if you need the integer values of each hex value you can use hex2dec on each cell in the cell-array.
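If the integer values are the real goal, here is a sketch of two routes, assuming temp.dat has exactly the layout shown in the question (the script recreates it in a temporary file). The first reads the hex text with textscan and converts it with hex2dec; the second swaps in fscanf's %x conversion, which parses hexadecimal directly and stays with the fscanf method the question asked about:

```matlab
% Recreate temp.dat from the question in a temporary file (demo only)
fname = [tempname() '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '0000 7FFF', '30FB 7641', '5A82 5A82', '7641 30FB', ...
             '7FFF 0000', '7641 CF05', '5A82 A57E', '30FB 89BF');
fclose(fid);

% Route 1: hex text into an 8x2 cell array, then hex2dec
fid = fopen(fname, 'r');
C = textscan(fid, '%s %s');    % 1x2 cell, each an 8x1 cell of char vectors
fclose(fid);
S = [C{:}];                    % 8x2 cell array of hex strings
D1 = reshape(hex2dec(S(:)), size(S));   % hex2dec returns a column; reshape to 8x2

% Route 2: let fscanf parse the hex values directly
fid = fopen(fname, 'r');
D2 = fscanf(fid, '%x', [2 Inf])';   % fills a 2xN array column-wise, so transpose
fclose(fid);
delete(fname);
```

Because fscanf fills its output column by column, the size argument is [2 Inf] and the result is transposed, the same idea as sizeA = [9 8] in the answer above.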

Matlab: loading every n-th column of a csv with varying formats in row using textscan

I am trying to load a CSV file with a semicolon (;) delimiter.
Example:
150501;190722;ms_since_start=;30001276;temp=;31.97;IT=;147753;spec num=;1000;(here I have 512 floating-point values and ';;' to mark the end of the line)
This pattern repeats for 1000 rows.
I have been trying to use textscan, but I only get empty cells with the following code:
formatSpec = ['%s%s%*s%*s%*s%*s%*s%*s%*s%*s%*s' repmat('%f', [1,512]) '%*[^;;]']
M = textscan(dirtmp, formatSpec, 'Delimiter', ';')
The goal is to get the first 2 columns, skip 9, get the remaining 512 columns and repeat this for 1000 rows.
Any help is highly appreciated
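Look at the answer to this question, which defines a read_mixed_csv function (it is not built into MATLAB). I don't think you can actually skip the columns while reading, but once you have the output you can keep only the columns you need:
x = read_mixed_csv('example.csv',';');
y = x(:, [1, 2, 11:end]);   % parentheses keep y a cell array; x{...} would return a comma-separated list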
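As an alternative that needs no helper function, here is a line-by-line sketch using only fgetl, strsplit and str2double. It assumes the data values start at field 11 (two kept fields plus the eight label/value fields in the example line), and it writes two made-up demo rows with only three data values each instead of 512:

```matlab
% Two made-up demo rows in the question's layout, with only 3 data values
% per row instead of 512 (illustration only)
fname = [tempname() '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    '150501;190722;ms_since_start=;30001276;temp=;31.97;IT=;147753;spec num=;1000;1.5;2.5;3.5;;', ...
    '150502;190723;ms_since_start=;30002276;temp=;32.01;IT=;147753;spec num=;1000;4.5;5.5;6.5;;');
fclose(fid);

firstData = 11;   % assumption: field 11 is the first data value
heads = {};       % first two fields of each row, kept as text
vals = [];        % data values, one row per line of the file

fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)
    f = strsplit(tline, ';');
    f = f(~cellfun(@isempty, f));                     % drop empty fields from ';;'
    heads(end+1, :) = f(1:2);                         %#ok<AGROW>
    vals(end+1, :) = str2double(f(firstData:end));    %#ok<AGROW>
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

strsplit collapses consecutive delimiters by default, so the ';;' at the end of each row leaves at most one empty field, which the filter removes. If real rows can contain empty fields in the middle, the fixed index firstData would no longer line up.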

How to save data transposed in a tab-delimited file

Assume you have an array of 5 lines by n columns as a MATLAB variable.
How do you save it to a file so that each column of the array becomes a line, as follows:
column1 becomes line1 and so on.
I need this to be without commas between elements, so it should be something along the lines of
dlmwrite('pointcloud.pts', cloud, 'delimiter', '\t');
This produces a file in which each row of the array is a line, but I want column one to be saved as line one.
I think you only have to transpose your matrix. Here's an example:
n = 7;
test = rand(5, n);
dlmwrite('pointcloud.pts', test', 'delimiter', '\t');
It works fine for me. The apostrophe (') is the transpose operator. Or did I misunderstand you?
EDIT: I think you are still saving the non-transposed matrix, so you are writing the first 443250 elements of the first row into the first line of your file. Transposing your data with the apostrophe (') lets you store it the way you want. Have a look at my code: there is one apostrophe (the transpose operator) right after test.
You can see that for example if you type:
a = rand(2, 4);
a_transposed = a';
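A quick sketch to check the transposed output, using the question's variable name cloud filled with random data. It also shows a second route: fprintf consumes its arguments in column order, so passing the untransposed 5-by-n matrix with a format of five tab-separated fields already writes one column per line:

```matlab
n = 7;
cloud = rand(5, n);   % 5 lines by n columns, as in the question

% Route 1: transpose before dlmwrite, as in the answer
f1 = [tempname() '.pts'];
dlmwrite(f1, cloud', 'delimiter', '\t');
back1 = dlmread(f1, '\t');   % n-by-5: line k holds column k of cloud

% Route 2: fprintf walks cloud column by column, so each group of
% size(cloud, 1) values (one column) becomes one tab-separated line
f2 = [tempname() '.pts'];
fid = fopen(f2, 'w');
fmt = [repmat('%.6f\t', 1, size(cloud, 1) - 1) '%.6f\n'];
fprintf(fid, fmt, cloud);
fclose(fid);
back2 = dlmread(f2, '\t');

delete(f1);
delete(f2);
```

The fprintf route also lets you choose the number format explicitly, which dlmwrite otherwise picks through its precision option.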

MATLAB Creating a .txt file containing numbers and strings from a cell

Dear stackoverflowers,
I'd like to create a .txt file using matlab.
The content should be separated with tabs.
It should have 3 columns, and the 3rd column should be filled with strings from a cell array.
Let's say
A=[2; 3; 3;];
B=2*A;
C=cell(3,1);
C{1,1}='string1'; C{2,1}='string2'; C{3,1}='string3';
In the end, it should look like this:
2 4 string1
3 6 string2
3 6 string3
I already found out how to put the 2 matrices in a text file:
dlmwrite('filename.txt', [A B], 'delimiter', '\t')
But how to append the content of the cell?
It would be best to have only the strings in the file, not the single quotes.
I have not found a solution to this elsewhere, nor have I asked this anywhere else.
I appreciate all kinds of suggestions.
Try the following:
% Open a file for writing (if you want to append to file use 'a' instead of 'w')
fid = fopen('filename.txt','w');
for i = 1:size(A,1)
fprintf(fid,'%d\t%d\t%s\n',A(i),B(i),C{i});
end
fclose(fid);
Hope this helps
The documentation for dlmwrite states:
Remarks
The resulting file is readable by spreadsheet programs.
The dlmwrite function does not accept cell arrays for the input matrix
M. To export a cell array that contains only numeric data, use
cell2mat to convert the cell array to a numeric matrix before calling csvwrite.
To export cell arrays with mixed alphabetic and numeric
data, where each cell contains a single element, you can create an
Excel spreadsheet (if your system has Excel installed) using xlswrite.
For all other cases, you must use low-level export functions to write
your data.
So either you write it as an Excel spreadsheet, or you have to write your own export code using low-level functions such as fprintf.
For example
A=[2; 3; 3;];
B=2*A;
C=cell(3,1);
C{1,1}='string1'; C{2,1}='string2'; C{3,1}='string3';
% First solution
f = fopen('filename.txt', 'w');
for n = 1:3
fprintf(f, '%d\t%d\t%s\n', A(n), B(n), C{n});
end
fclose(f);
% Another solution
% create the table as a single cell array with only strings
C2 = [arrayfun(@num2str, [A, B], 'UniformOutput', false) C]'; % <- note the transpose
f = fopen('filename.txt', 'w');
fprintf(f, '%s\t%s\t%s\n', C2{:}); % <- every three entries are written as a line
fclose(f);

Reading text values into matlab variables from ASCII files

Consider the following file
var1 var2 variable3
1 2 3
11 22 33
I would like to load the numbers into a matrix, and the column titles into a variable that would be equivalent to:
variable_names = char('var1', 'var2', 'variable3');
I don't mind splitting the names and the numbers into two files; however, preparing MATLAB code files and eval'ing them is not an option.
Note that there can be an arbitrary number of variables (columns).
I suggest importdata for operations like this:
d = importdata('filename.txt');
The return is a struct with the numerical fields in a member called 'data', and the column headers in a field called 'colheaders'.
Another useful interface for importing and manipulating data like these is the 'dataset' class available in the Statistics Toolbox.
If the header is on the first row, then
A = dlmread(filename,delimString,1,0);
will read the numeric data into the matrix A (the row and column offsets are zero-based, so 1,0 skips only the header row).
You can then use
fid = fopen(filename)
headerString = fgetl(fid) % reads the header line into a string
fclose(fid)
You can then split headerString into a cell array, for example by calling strtok repeatedly or with strsplit. This is one approach I can think of to deal with an unknown number of columns.
Edit
fixed fscanf function call
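A sketch of this dlmread approach on the example file from the question (recreated in a temporary file). It assumes single spaces between fields and uses fgetl for the header, since fgetl returns exactly one line; strsplit then splits it on whitespace:

```matlab
% Recreate the example file from the question (demo only)
filename = [tempname() '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'var1 var2 variable3\n1 2 3\n11 22 33\n');
fclose(fid);

% Offsets are zero-based: R1 = 1 skips the header row, C1 = 0 keeps column 1
A = dlmread(filename, ' ', 1, 0);

% fgetl returns the first line (the header) without its newline
fid = fopen(filename);
headerString = fgetl(fid);
fclose(fid);
delete(filename);

names = strsplit(strtrim(headerString));   % split on whitespace
variable_names = char(names);              % padded char matrix, like char('var1', ...)
```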
Just use textscan with different format specifiers.
fid = fopen(filename,'r');
heading = textscan(fid,'%s %s %s',1);
fgetl(fid); %advance the file pointer one line
data = textscan(fid,'%n %n %n');%read the rest of the data
fclose(fid);
In this case, 'heading' will be a cell array with one cell per column heading, so you will have to convert it into a cell array of strings or whatever form you need. 'data' will be a cell array containing one numeric column for each column read, so you will have to concatenate them (for example with cell2mat) to get a single matrix.
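A sketch that generalizes this to an arbitrary number of columns, assuming whitespace-separated values and a single header line: the header is read with fgetl, its word count sets the textscan format, and cell2mat joins the per-column results into one matrix:

```matlab
% Demo file matching the question (written to a temporary file)
fname = [tempname() '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'var1 var2 variable3\n1 2 3\n11 22 33\n');
fclose(fid);

fid = fopen(fname, 'r');
header = fgetl(fid);                                % first line: column titles
names = regexp(strtrim(header), '\s+', 'split');    % 1xN cell of titles
nCols = numel(names);
data = textscan(fid, repmat('%f', 1, nCols));       % one numeric cell per column
fclose(fid);
delete(fname);

M = cell2mat(data);                  % N-column numeric matrix
variable_names = char(names{:});     % padded char matrix, as in the question
```

Building the format string from the header means the same code works whether the file has three columns or thirty.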