splitting a character array into a cell array and matrix - matlab

I have a text file.
The file has approximately 20,000 rows of data. Each row has one column and contains 256 characters (all of them digits).
I need to split each row into a cell array or matrix. Every 8 characters form "one piece" of information: I want the first 3 characters to go into a cell array and the next 5 characters into a double array, and then the same again for each following block of 8 characters.
Example (one 16-character row):
1653256719812345
myCellArray (1 x 2): [165, 198]
myDoubleArray (1 x 2): [32567, 12345]
What is the best way to do this?

Use textscan.
fid = fopen('MyFileName.txt');
data = textscan(fid, '%3d%5d', 'Delimiter', '');
fclose(fid);
testing:
% Test with string of 256 random digits that all happen to be 1:8 repeated 32 times
x = '1234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678';
>> y = textscan(x, '%3d%5d', 'Delimiter', '')
y =
[32x1 int32] [32x1 int32]
>> y{1}
ans =
123
123
123
123
...
I don't know the exact format of your files, so you may have to do this line by line within a loop. In that case, get each line with fgetl and pass the resulting string to textscan in place of fid.
In general, whenever you have to read data produced by FORTRAN code (fixed-width text files), textscan's 'Delimiter', '' and 'Whitespace', '' parameters are your friends.
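A minimal sketch of the line-by-line variant, assuming every row is a run of digits whose length is a multiple of 8. The temporary file and its two sample rows are hypothetical (created only so the sketch runs on its own); each line's 3-digit and 5-digit fields are stored in one cell per line.

```matlab
% Create a hypothetical two-row test file so the sketch is self-contained
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1653256719812345\n1563256719812345\n');
fclose(fid);

codes = {};    % codes{n}: 3-digit fields of line n (int32 row vector)
values = {};   % values{n}: 5-digit fields of line n (double row vector)
fid = fopen(fname, 'r');
tline = fgetl(fid);              % fgetl returns -1 (not a char) at end of file
while ischar(tline)
    % Fixed-width parsing of one line: %3d then %5d, repeated across the line
    parts = textscan(tline, '%3d%5d', 'Delimiter', '');
    codes{end+1} = parts{1}.';
    values{end+1} = double(parts{2}).';
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);                   % remove the temporary test file
```

For a 256-character row, each cell would hold 32 values.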

Use regexp. If the file data.txt contains
1653256719812345
1563256719812345
1233256719812345
1463256719812345
Then the following MATLAB statements will read the numbers.
>> txt = fileread('data.txt') % Read entire file in txt
>> out = regexp(txt,'(\d{3})(\d{5})(\d{3})(\d{5})','tokens') % Match regex capturing groups
out =
{1x4 cell} {1x4 cell} {1x4 cell} {1x4 cell}
Each cell in out corresponds to one row of the file and contains the parsed numbers as strings. You can use str2double to convert them to a numeric data type in MATLAB:
>> nums = cellfun(@str2double,out,'uni',0)
nums =
[1x4 double] [1x4 double] [1x4 double] [1x4 double]
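The pattern above hard-codes four groups, which fits the 16-character sample rows. A sketch that also works for 256-character rows, assuming every row length is a multiple of 8 (so no block straddles a line break, since \d never matches a newline), matches one 3+5 block at a time and stacks the results. A literal string stands in for fileread('data.txt').

```matlab
% Stand-in for txt = fileread('data.txt'): two 16-digit rows
txt = sprintf('1653256719812345\n1563256719812345\n');

% One token pair per 8-digit block: {3-digit string, 5-digit string}
tok = regexp(txt, '(\d{3})(\d{5})', 'tokens');

% Stack the 1x2 token cells into an N-by-2 cell, then convert to double
pairs = str2double(vertcat(tok{:}));
codes = pairs(:, 1);    % 3-digit fields, in file order
values = pairs(:, 2);   % 5-digit fields, in file order
```

With 256-character rows, each row of the file contributes 32 consecutive rows of pairs.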

Iterate over your rows one by one and run something like the following code on each row:
>> k = '1653256719812345';  % one row of the file, kept as text
>> myCellArray{1} = k(1:3)
myCellArray =
'165'
>> mydoublearray(1) = str2double(k(4:8))
mydoublearray =
32567
Since the layout repeats every 8 characters, derive the indices from that pattern instead of hard-coding them.
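One way to use the repeating 8-character pattern instead of hard-coded indices, sketched for a single row held as text (for example the output of fgetl) and assuming its length is a multiple of 8: reshape the row so each block becomes one row of a char matrix, then slice columns 1-3 and 4-8.

```matlab
k = '1653256719812345';              % one row of the file, as text

% Column-major reshape: column j holds characters 8*(j-1)+1 .. 8*j;
% transpose so each 8-character block becomes a row
blocks = reshape(k, 8, []).';

myCellArray = cellstr(blocks(:, 1:3)).';                  % 1xN cell of char
myDoubleArray = str2double(cellstr(blocks(:, 4:8))).';    % 1xN double
```

For a 256-character row, both outputs have 32 elements.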

Related

cellarray to matrix in matlab

I want to import some data in an m-file. So far I have managed to create a cell array of the data, and now I want to convert it into a matrix. I used cell2mat but I get an error. I'm new to MATLAB, so I would like some help. Here is my complete code:
fid = fopen('vessel-movements.csv');
C = textscan(fid, '%f %f %f %f %f %s %s %s', 'HeaderLines', 1, 'Delimiter', ',')
fclose(fid);
iscell(C)
T = cell2mat(C)
The answer I get is:
C =
Columns 1 through 4
[300744x1 double] [300744x1 double] [300744x1 double] [300744x1 double]
Columns 5 through 8
[300744x1 double] {300744x1 cell} {300744x1 cell} {300744x1 cell}
ans =
1
??? Error using ==> cell2mat at 46
All contents of the input cell array must be of the same data type.
Error in ==> test at 5
T = cell2mat(C)
My question is: how can I do that? The data is in the linked file vessel-movements.csv. It contains numbers (IDs and coordinates) and timestamps.
As the error message says:
All contents of the input cell array must be of the same data type.
Columns 6, 7 and 8 contain text (date strings), so they cannot be converted into a numeric matrix. Leave them in a cell array.
You can convert only the numeric data into a matrix: data = cell2mat(C(:,1:5)). The remaining three columns have to be converted with datenum() into numeric times before they can be added to the data matrix.
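A sketch of the datenum conversion, using a small hypothetical stand-in for C (two rows instead of 300744). The timestamp format 'yyyy-mm-dd HH:MM:SS' is an assumption; the real format in vessel-movements.csv is not shown, so the format string must be adapted.

```matlab
% Hypothetical stand-in for the textscan output: 5 numeric columns, 3 date-string columns
C = {[1; 2], [10; 20], [5.5; 6.5], [7; 8], [9; 10], ...
     {'2013-01-01 10:00:00'; '2013-01-01 10:05:00'}, ...
     {'2013-01-01 11:00:00'; '2013-01-01 11:05:00'}, ...
     {'2013-01-02 09:00:00'; '2013-01-02 09:30:00'}};
fmt = 'yyyy-mm-dd HH:MM:SS';      % assumed timestamp format

data = cell2mat(C(:, 1:5));        % numeric columns -> 2x5 double
for k = 6:8
    % datenum turns each date string into a serial date number (in days)
    data(:, end+1) = datenum(C{k}, fmt);
end
```

The result is a single all-double matrix (here 2x8) with the timestamps as extra columns.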
If you have R2013b or later, you can read the file into a table instead: data = readtable('vessel-movements.csv');
I assume you only want to convert the first five columns of C, which are the ones that contain numeric data. You can use cell2mat as follows:
M = cell2mat(C(:,1:5));
or equivalently
M = [C{:,1:5}];
The main difference between a matrix and a cell array (in MATLAB parlance) is that a matrix holds elements of the same type and size, while a cell array can hold elements of different types and sizes.
You read numbers and strings. The numbers do have the same type and size (double, 1×1) while the strings are different (they're all char type, but usually different sizes).
To group your numeric data, you must select only the numeric elements of your cell array:
N = horzcat(C{1:5});
while for the strings you should keep the cell array structure:
S = horzcat(C{6:8});
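A small check of these calls, using a hypothetical 2-row stand-in for C with the same layout (five numeric columns, then three cell columns of strings):

```matlab
% Hypothetical stand-in for C: same layout as the textscan output, but 2 rows
C = {[1; 2], [3; 4], [5; 6], [7; 8], [9; 10], {'a'; 'b'}, {'c'; 'd'}, {'e'; 'f'}};

N = horzcat(C{1:5});        % numeric columns -> 2x5 double matrix
S = horzcat(C{6:8});        % string columns stay a 2x3 cell array
M = cell2mat(C(:, 1:5));    % same result as N for the numeric part
```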
Later edit: Since you say you're new to MATLAB, here is a general recommendation: whenever you come across a function you don't know, or one that behaves unexpectedly, select its name and press F1. The MATLAB documentation is quite comprehensive and includes many examples of typical uses for each function.

In MATLAB, how can I put the values from a single cell, containing multiple values separated by a specific delimiter, into a matrix?

I would like to load the data from only one column of a .csv file into a matrix. Each cell in that column contains several integers (exactly 3) separated by semicolons. Here is an example of what the data looks like:
A B
1;2;3
4;5;6
(note that A means column A, column B is empty)
The desired output would be an array in Matlab with 3 columns and 2 rows.
>> matrixFromCsvFile=
1 2 3
4 5 6
What I tried was:
fid = fopen('test.csv');
matrixFromCsvFile = textscan(fid, '%d %d %d', 'delimiter', ';')
fclose(fid);
Instead of the desired output I got this:
matrixFromCsvFile =
[2x1 int32] [2x1 int32] [2x1 int32]
>> matrixFromCsvFile{1}
ans =
1
4
Did I really just create three arrays within an array? I want just one. Luckily the values 1 and 4 are correct. This has already taken me a long time, and I'm stuck.
You can fix your example just by adding a CollectOutput flag to textscan:
M = textscan(fid, '%d %d %d', 'delimiter', ';','CollectOutput',1);
By default, textscan returns each column separately (so your data is there, just split across M{1}, M{2} and M{3}). Setting CollectOutput combines consecutive columns of the same class into a single array, so the 2x3 matrix you want is M{1}.
For example, this returns five columns in five separate arrays:
M = textscan(fid, '%d %d %f %f %f');
This returns two arrays, one containing the first two (int32) columns and one containing the last three (double) columns:
M = textscan(fid, '%d %d %f %f %f','CollectOutput',1);
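A quick sketch of both behaviors, with literal strings standing in for test.csv (textscan also accepts a char array instead of a file identifier):

```matlab
% Stand-in for the contents of test.csv (two rows of semicolon-separated integers)
str = sprintf('1;2;3\n4;5;6');

% Without CollectOutput: one cell per column, as in the question
cols = textscan(str, '%d %d %d', 'Delimiter', ';');

% With CollectOutput: consecutive same-class columns end up in one array
M = textscan(str, '%d %d %d', 'Delimiter', ';', 'CollectOutput', 1);
matrixFromCsvFile = M{1};          % 2x3 int32 matrix

% Mixed classes: the int32 columns and the double columns are collected separately
M2 = textscan('1 2 3.5 4.5 5.5', '%d %d %f %f %f', 'CollectOutput', 1);
```

Note that M itself is still a 1x1 cell; the matrix is its only element.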
Use importdata:
M = importdata('test.csv',';',1)
matrixFromCsvFile = M.data
Alternatively, keep your original textscan call and concatenate its output afterwards:
matrixFromCsvFile = cell2mat(matrixFromCsvFile);

Matlab: how to write cell array of matrices to file

I have a cell array of several matrices (all double, with the same dimensions):
my_cell =
[172x15 double] [172x15 double] [172x15 double] [172x15 double]
I would like to write the matrices side by side, tab-separated, to a .txt file with 172 rows and 60 columns (in this case).
Use dlmwrite and cell2mat:
mat = cell2mat(my_cell);
delimiter = '\t'; % // tab character separating two values in a row of the file
filename = 'test.txt';
dlmwrite(filename,mat,delimiter);
>> dlmwrite('file1.txt', [my_cell{:}],'delimiter','\t','precision','%.5f')
or
>> dlmwrite('file2.txt', my_cell(:)','delimiter','\t','precision','%.5f')
You have to choose a precision; otherwise the lines will be non-uniform because the numbers have different numbers of decimal places.
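A round-trip sketch of the first call, using hypothetical random data of the sizes in the question and a temporary file, then reading it back with dlmread to confirm the 172x60 layout:

```matlab
% Hypothetical data: four 172x15 double matrices, as in the question
my_cell = {rand(172, 15), rand(172, 15), rand(172, 15), rand(172, 15)};
filename = [tempname '.txt'];       % temporary file so the sketch is self-contained

% Side by side ([my_cell{:}] is 172x60), tab-separated, fixed 5 decimals
dlmwrite(filename, [my_cell{:}], 'delimiter', '\t', 'precision', '%.5f');

back = dlmread(filename, '\t');     % read it back to check the layout
delete(filename);
```

Values read back differ from the originals by at most the rounding to 5 decimal places.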
Code
%// output_filepath is the name of your output text file
c1 = horzcat(my_cell{:});
datacell = mat2cell(c1,ones(1,size(c1,1)),ones(1,size(c1,2)));
dlmwrite(output_filepath,datacell,'\t'); %// a TAB delimiter is used

Specify decimal separator for .dat file in matlab [duplicate]

This question already has answers here:
Matlab: How to read in numbers with a comma as decimal separator?
I've got a bunch of .dat files where the decimal separator is a comma instead of a dot. Is there any function in MATLAB to set the comma as the decimal separator?
You will have to read the data in as text (with textscan, textread, dlmread, etc.) and convert to numeric.
Say you have read the data into a cell array with each number in a cell:
>> C = {'1,2345','3,14159','2,7183','1,4142','0,7071'}
C =
'1,2345' '3,14159' '2,7183' '1,4142' '0,7071'
Use strrep and str2double as follows:
>> x = str2double(strrep(C,',','.'))
x =
1.2345 3.1416 2.7183 1.4142 0.7071
For your example data from comments, you have a file "1.dat" formatted similarly to:
1,2 3,4
5,6 7,8
Here you have a space as a delimiter. By default, textscan uses whitespace as a delimiter, so that is fine. All you need to change below is the format specifier for the number of columns in your data by repeating %s for each column (e.g. here we need '%s%s' for two columns):
>> fid = fopen('1.dat','r');
>> C = textscan(fid,'%s%s')
C =
{2x1 cell} {2x1 cell}
>> fclose(fid);
The output of textscan is a cell array for each column delimited by whitespace. Combine the columns into a single cell array and run the commands to convert to numeric:
>> C = [C{:}]
C =
'1,2' '3,4'
'5,6' '7,8'
>> x = str2double(strrep(C,',','.'))
x =
1.2000 3.4000
5.6000 7.8000
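If commas appear only as decimal separators (never as field delimiters), an alternative is to replace them once in the raw text and parse numbers directly, skipping the per-cell str2double step. A sketch, with a literal string standing in for reading 1.dat with fileread:

```matlab
% Stand-in for txt = fileread('1.dat')
txt = sprintf('1,2 3,4\n5,6 7,8');

% Assumes commas appear only as decimal separators, never as field delimiters
txt = strrep(txt, ',', '.');

% Parse as numbers directly; CollectOutput gives one 2x2 double matrix
C = textscan(txt, '%f %f', 'CollectOutput', 1);
x = C{1};
```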

Use textscan in Matlab to output data

I've got a large text file with some headers and numerical data. I want to ignore the header lines and specifically output the data in columns 2 and 4.
Example data
[headers]
line1
line2
line3
[data]
1 2 3 4
5 6 7 8
9 10 11 12
I've tried using the following code:
FID = fopen('datafile.dat');
data = textscan(FID,'%f',4,'delimiter',' ','headerLines',4);
fclose(FID);
I only get a 0x1 cell as output.
Try this:
FID = fopen('datafile.dat');
data = textscan(FID,'%f %f %f %f', 'headerLines', 6);
fclose(FID);
data will be a 1x4 cell array. Each cell will contain a 3x1 array of double values, which are the values in each column of your data.
You can access the 2nd and 4th columns of your data by executing data{2} and data{4}.
With your original code, the main issue is that the data file has 6 header lines but you've specified that there are only 4.
Additionally, you'll run into problems with how you specify the number of times to match the formatSpec. Take for instance the following code:
data = textscan(FID,'%f',4);
which attempts to match a floating-point value 4 times. Keep in mind that textscan stops after matching 4 values. For simplicity, imagine that your data file contained only the data (i.e. no header lines). Executing that code several times would then give the following results:
>> FID = fopen('datafile_noheaders.dat');
>> data_line1 = textscan(FID,'%f', 4)
data_line1 =
[4x1 double]
>> data_line1{1}'
ans =
1 2 3 4
>> data_line2 = textscan(FID,'%f', 4)
data_line2 =
[4x1 double]
>> data_line2{1}'
ans =
5 6 7 8
>> data_line3 = textscan(FID,'%f', 4)
data_line3 =
[4x1 double]
>> data_line3{1}'
ans =
9 10 11 12
>> data_line4 = textscan(FID,'%f', 4)
data_line4 =
[0x1 double]
>> fclose(FID);
Notice that textscan picks up where it "left off" each time it is called. In this case, each of the first three calls returns one row of your data file (as a cell containing a 4x1 column). The fourth call returns a cell containing an empty 0x1 array. For the use case you described, this format is not particularly helpful.
The example given at the top should return data in a format that is much easier to work with for what you are trying to accomplish. In this case it will match four floating point values in each of your rows of data, and will continue with each line of text until it can no longer match this pattern.
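If only columns 2 and 4 are needed, textscan's %*f specifier (read a field and discard it) can drop columns 1 and 3 during parsing. The sketch below uses a literal string holding the sample exactly as it appears in the question, which has five lines before the numbers; set HeaderLines to match the real file.

```matlab
% The question's sample data as a string (stand-in for datafile.dat)
txt = sprintf('[headers]\nline1\nline2\nline3\n[data]\n1 2 3 4\n5 6 7 8\n9 10 11 12\n');

% %*f reads a number and discards it, so only columns 2 and 4 are kept
data = textscan(txt, '%*f %f %*f %f', 'HeaderLines', 5, 'CollectOutput', 1);
cols24 = data{1};    % 3x2 double: columns 2 and 4 of the data
```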