Read string from txt file and use string for loop - matlab

I am trying to read a txt file and then loop through all the strings in it. Unfortunately, I can't get it to work.
fid = fopen(fullfile(source_dir, '1.txt'),'r')
read_current_item_cells = textscan(fid,'%s')
read_current_item = cell2mat(read_current_item_cells);
for i=1:length(read_current_item)
current_stock = read_current_item(i,1);
current_url = sprintf('http:/www.', current_item)
.....
I basically try to convert the cell arrays to a matrix as textscan outputs cell arrays. However now I get the message
Error using cell2mat (line 53) Cannot support cell arrays containing cell arrays or objects.
Any help is very much appreciated

That is the normal behaviour of textscan. It returns a cell array with one element per format specifier in the format string you passed to it; each element is either another cell array or an ordinary array, depending on the specifier. For example, if 1.txt contains
appl 12
msft 23
running your code returns
>> read_current_item_cells
read_current_item_cells =
{4x1 cell}
>> read_current_item_cells{1}
ans =
'appl'
'12'
'msft'
'23'
which itself is another cell array:
>> iscell(read_current_item_cells{1})
ans =
1
and its elements can be accessed using
>> read_current_item_cells{1}{1}
ans =
appl
Now if you change the format from '%s' to '%s %d' you get
>> read_current_item_cells
read_current_item_cells =
{2x1 cell} [2x1 int32]
>> read_current_item_cells{1}
ans =
'appl'
'msft'
>> read_current_item_cells{2}
ans =
12
23
But the interesting part is that
>> iscell(read_current_item_cells{1})
ans =
1
>> iscell(read_current_item_cells{2})
ans =
0
That means the cell element corresponding to %s is turned into a cell array, while the one corresponding to %d is left as an array. Now since I do not know the exact format of the rows in your file, I guess you have one cell array with one element which in turn is another cell array containing all the elements in the table.
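As a minimal sketch of the difference described above, textscan can also be run on an in-memory char vector instead of a file. The two-line sample text mirrors the 1.txt example and is only an assumption about what your data looks like; the variable names are made up for illustration.

```matlab
% Sample text standing in for the contents of 1.txt (assumed format).
sample_text = sprintf('appl 12\nmsft 23');

% textscan returns one output cell per format specifier.
C = textscan(sample_text, '%s %d');

names  = C{1};   % %s -> 2x1 cell array of char vectors
values = C{2};   % %d -> 2x1 int32 array

first_name  = names{1};    % curly braces: contents of a cell
first_value = values(1);   % parentheses: element of a numeric array
```

Note the two kinds of indexing on the last two lines: the %s column needs curly braces, the %d column needs parentheses.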

What can happen is that the data gets wrapped into a cell array of cell arrays, and to access the stored strings you need to index past the first array with
read_current_item_cells = read_current_item_cells{1};
Converting with cell2mat will not work if your strings are not equal in length, in which case you can use strvcat:
read_current_item = strvcat(read_current_item_cells{:});
Then you should be able to loop through the char array:
for ii = 1:size(read_current_item, 1)
current_stock = strtrim(read_current_item(ii, :)); % remove the padding added by strvcat
current_url = sprintf('http:/www.%s', current_stock) % %s is where the string is inserted
.....
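If the char matrix is not needed for anything else, the conversion can be skipped and the loop can run over the cell array directly. The following is only a sketch under assumptions: the sample file contents, the variable names, and the base URL http://example.com/ are placeholders, not the values from the question.

```matlab
% Write a small sample file so the sketch is self-contained (made-up contents).
tmp_file = [tempname '.txt'];
fid = fopen(tmp_file, 'w');
fprintf(fid, 'appl\nmsft\ngoog\n');
fclose(fid);

% Read it back: textscan returns a 1x1 cell holding an Nx1 cell of char vectors.
fid = fopen(tmp_file, 'r');
scanned = textscan(fid, '%s');
fclose(fid);
symbols = scanned{1};            % unwrap the outer cell

% Loop over the cell array; symbols{ii} is a plain char vector of any length.
urls = cell(size(symbols));
for ii = 1:numel(symbols)
    % placeholder base URL, assumed for illustration only
    urls{ii} = sprintf('http://example.com/%s', symbols{ii});
end

delete(tmp_file);                % clean up the temporary file
```

Because each string stays in its own cell, nothing has to be padded to a common length and no trimming is needed inside the loop.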

Related

Why does cell2mat convert a cell array into a matrix of characters, instead of a matrix of cell contents?

I have this cell array
>> FooCellArray
FooCellArray =
1×7 cell array
Columns 1 through 4
{'Foo1'} {'Foo2'} {'Foo3'} {'Foo4'}
Columns 5 through 7
{'Foo5'} {'Foo6'} {'Foo7'}
cell2mat() converts the array into a character array instead of a 1x7 or 7x1 matrix of 'Foo1' ... 'Foo7'.
>> (cell2mat(FooCellArray))'
ans =
28×1 char array
'F'
'o'
'o'
'1'
'F'
'o'
'o'
'2'
'F'
'o'
'o'
'3'
'F'
'o'
'o'
'4'
....
Why?
cell2mat is doing precisely the correct thing, as documented. Each cell element is a char vector of size 1xN, and your overall cell array is 1xM. cell2mat concatenates the contents of every cell in the "natural" direction as defined by the shape of the cell array. Your original cell looks a bit like this:
FooCellArray = [{'Foo1'}, {'Foo2'}]
The effect of cell2mat is basically as if you removed the {}, so you're left with
cell2mat(FooCellArray) --> ['Foo1', 'Foo2']
And therefore this gets concatenated into a single char vector 'Foo1Foo2'.
Compare with vectors of double instead of vectors of char:
>> FooCellArray = [{[1,2,3]}, {[4,5,6]}]
FooCellArray =
1×2 cell array
{[1 2 3]} {[4 5 6]}
>> cell2mat(FooCellArray)
ans =
1 2 3 4 5 6
With a smaller starting example:
FooCellArray = {'Foo1','Foo2','Foo3'} ;
If your MATLAB version is >= 2017b
You can directly use the function convertCharsToStrings:
>> convertCharsToStrings(FooCellArray)
ans =
1×3 string array
"Foo1" "Foo2" "Foo3"
The benefit of this method is that it will work even if the strings contained in your cell array are not all of the same length. You can transpose it if you want it as a column instead of a row vector. Note the terminology of the result type: it is a string array.
If your MATLAB version is older AND all the strings in the cell array have the same length, you could convert your cell array into a 2D character array:
>> reshape(cell2mat(FooCellArray),4,[]).'
ans =
3×4 char array
'Foo1'
'Foo2'
'Foo3'
For this one, transposition wouldn't really make sense. This result type is a char array. Char arrays are fine when they are simple vectors, but they get quite unwieldy in 2D, mainly because they are not as flexible as strings: each row has to have the same number of elements. So, as @Adriaan pointed out, if one of your cells contained Foo24 then the reshape command would error.
Edit: Or, as Chris Luengo kindly mentioned in a comment, a simpler command to get exactly the same result:
>> cell2mat(FooCellArray.')
ans =
3×4 char array
'Foo1'
'Foo2'
'Foo3'
This has the same restriction, all the cell contents must have the same number of characters or the command will error.
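When the lengths do differ and a string array is not an option, the built-in char function is one way around the restriction, since it pads shorter rows with trailing spaces. A small sketch, with Foo24 added on purpose to make the lengths unequal:

```matlab
FooCellArray = {'Foo1', 'Foo24', 'Foo3'};   % unequal lengths on purpose

% char pads shorter rows with trailing spaces, so this does not error.
padded = char(FooCellArray);     % 3x5 char array

% strtrim removes the padding again when a single row is pulled out.
first  = strtrim(padded(1, :));  % 'Foo1'
second = strtrim(padded(2, :));  % 'Foo24'
```

The price is that every row has to be trimmed before use, which is part of why 2D char arrays are awkward to work with.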

How to convert the string 'pi' to double? (MATLAB 2015a)

When I try to convert the string 'pi' to a double it gets converted to NaN.
>> str2double('pi')
ans =
NaN
I'm reading a file that contains comma separated values, which might include a multiple of pi. For example (assume pi_in_string was read from a file):
>> pi_in_string = '0,1,-pi/6'
pi_in_string =
0,1,-pi/6
>> split_string = strsplit(pi_in_string, ',')
split_string =
'0' '1' '-pi/6'
>> str2double(split_string)
ans =
0 1 NaN
I figured out that I need to use str2num instead of str2double, but str2num doesn't work on a cell array. So instead, I looped through the elements in the cell array, converting each to type char first, then using str2num.
pi_in_string = '0,1,-pi/6';
str_array = strsplit(pi_in_string, ','); %str_array now cell array
num_elements = length(str_array); %get # elements to loop
num_vector = zeros(1,num_elements); %initialize vector
%loop through elements in str_array
for i = 1:num_elements %converting each element first to type char
num_vector(i) = str2num(char(str_array(i)));
end
It is easy to avoid both the for loop and the explicit initialization of the num_vector array by using the cellfun command:
pi_in_string = '0,1,-pi/6';
str_array = strsplit(pi_in_string, ',');
num_vector = cellfun(@(x) eval(x), str_array, 'UniformOutput', true)
num_vector =
0 1.0000 -0.5236
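Since the question already settled on str2num, the same one-liner can be sketched with a direct function handle instead of an anonymous eval wrapper. This is just a variation on the approach above, not a different technique: str2num evaluates its input much like eval does.

```matlab
pi_in_string = '0,1,-pi/6';
str_array = strsplit(pi_in_string, ',');   % 1x3 cell array of char vectors

% str2num evaluates each piece as a MATLAB expression, so 'pi' is understood.
% Like eval, it runs whatever text it is given: only use it on trusted input.
num_vector = cellfun(@str2num, str_array);
```

This relies on every piece evaluating to a scalar; if a piece could evaluate to something else, 'UniformOutput', false would be needed and the result would be a cell array.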

Matlab cellstr length limit

I am using cellstr in Matlab to convert characters to cell string array. For example:
A = 'a1a2a3a4...a100'; % I do not list all of the characters in A
B = cellstr(A);
But the result is
B = 'a1a2a3a4a5a6a7a8a9a10a11a12a13a14a15a16a17a18a19a20a21a22a23a24a25a26a27a28a29a30a31a32a33a34a35a36a37a38a39a40a41a42a43a...'
It does not convert all the characters. I guess this is caused by some length limit. Does anyone know how to increase this limit?
It does convert all of the characters. The ellipsis comes from Matlab truncating the output of a cell array display at the width of your command window.
You can display the full contents of the cell array using B{1}:
>> A = sprintf('a%g',1:100);
>> B = cellstr(A)
B =
'a1a2a3a4a5a6a7a8a9a10a11a12a13a14a15a16a17a18a19a20a21a22a23a24a25a26a...'
>> B{1}
ans =
a1a2a3a4a5a6a7a8a9a10a11a12a13a14a15a16a17a18a19a20a21a22a23a24a25a26a27a28a29a30a31a32a33a34a35a36a37a38a39a40a41a42a43a44a45a46a47a48a49a50a51a52a53a54a55a56a57a58a59a60a61a62a63a64a65a66a67a68a69a70a71a72a73a74a75a76a77a78a79a80a81a82a83a84a85a86a87a88a89a90a91a92a93a94a95a96a97a98a99a100
This is the default display format for cell arrays of strings. Each element is truncated so that as many elements as possible can be shown in the command window at one time:
>> [B,B]
ans =
'a1a2a3a4a5a6a7a8a9a10a11a12a1...' 'a1a2a3a4a5a6a7a8a9a10a11a12a1...'
>> [B,B,B]
ans =
'a1a2a3a4a5a6a7a8...' 'a1a2a3a4a5a6a7a8...' 'a1a2a3a4a5a6a7a8...'
However, the entire string is stored in the cell array element itself. The only limit on its size is the amount of memory Matlab has available to create the array.
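A quick way to convince yourself that nothing was lost is to compare lengths rather than rely on what the command window shows. A minimal sketch, using the same sprintf construction as above (the variable names n_in, n_out and same are made up for illustration):

```matlab
A = sprintf('a%g', 1:100);   % 'a1a2a3...a100'
B = cellstr(A);              % 1x1 cell holding the same char vector

n_in  = numel(A);            % characters in the original char vector
n_out = numel(B{1});         % characters stored inside the cell
same  = isequal(A, B{1});    % true if the contents are identical

disp(B{1})                   % prints the whole string, no truncation
```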

Matlab Array has strange syntax

In Matlab, we use textscan to get a cell array from a file or somewhere. But the behavior of the cell array is so strange.
Here is the sample code:
>> str = '0.41 8.24 3.57 6.24 9.27';
>> C = textscan(str, '%3.1f %*1d');
>> C
C =
[5x1 double]
We can see that C is a cell array of size 5 * 1. When I use C{1}, C{1}{1} and C(1), I get the following results:
>> C{1}
ans =
0.4000
8.2000
3.5000
6.2000
9.2000
>> C{1}{1}
Cell contents reference from a non-cell array object.
>> C(1)
ans =
[5x1 double]
Why can't I use C{1}{1} to get the element from the cell array? Then how can I get the elements from that cell array?
An example I found on the Internet is :
%% First import the words from the text file into a cell array
fid = fopen(filename);
words = textscan(fid, '%s');
%% Get rid of all the characters that are not letters or numbers
for i=1:numel(words{1,1})
ind = find(isstrprop(words{1,1}{i,1}, 'alphanum') == 0);
words{1,1}{i,1}(ind)=[];
end
As words{1,1}{i,1}(ind)=[] shows, {} can be chained. What is the mechanism behind using {}?
Thanks
Then how can I get the elements from that cell array ?
C = C{:}; % or C = C{1};
Access values by C(1), C(2) and so on
There is a slightly different syntax for indexing into cell arrays and numerical arrays. Your output
>> C
C =
[5x1 double]
is telling you that what you have is a 1x1 cell array, and in that 1 cell is a 5x1 array of doubles. Cell arrays are indexed into with {}, while 'normal' arrays are indexed into with ().
So you want to index into the first element of the cell array, and then index down to the first value in the 5x1 array of doubles using C{1}(1). To get the second value - C{1}(2), and so forth.
If you're familiar with other programming languages, cell arrays are something like arrays of pointers; the operator A(n) is used to get the nth element of the array A, while A{n} gets the object pointed to by the nth element of the array A (or 'contained in the nth cell of cell array A'). If A is not a cell array, A{n} fails.
So, knowing that C is a cell array, here's why you got what you got in the cases you tried -
C{1} returns the 5x1 double array contained in the first cell of C.
C{1}{1} gets the object (call it B) contained in the first cell of C, and then tries to get the object contained in the first cell of B. It fails because B is not a cell array; it is a 5x1 double array.
And C(1) returns the first element of C, which is a single cell containing a 5x1 double array.
But C{1}(1) would get you the first element of the 5x1 array contained in the first cell of C, which is what you are looking for. As @Cheery above me noted, it's probably easier, instead of writing C{1}(1), C{1}(2), ... to remove the 'cell-level' indexing by setting C=C{1}, which means C is now a 5x1 double array, and you can get the elements of it using C(1), C(2), ... Hope that makes sense!
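The three kinds of indexing can be tried side by side without textscan at all. The sketch below uses a hand-built cell array as a stand-in for the textscan output (an assumption made to keep it self-contained), and then repeats the nested pattern from the words{1,1}{i,1}(ind)=[] example on a tiny made-up word list.

```matlab
% Hand-built stand-in for the textscan output: a 1x1 cell holding a 5x1 double.
C = {[0.4; 8.2; 3.5; 6.2; 9.2]};

a = C(1);       % parentheses on a cell array: still a 1x1 cell
b = C{1};       % curly braces: the 5x1 double stored inside
c = C{1}(2);    % second value of that double array, 8.2

% Nested case: a cell (outer) containing a cell of char vectors (inner).
words = {{'ab-c'; 'd!e'}};
unwanted = ~isstrprop(words{1}{1}, 'alphanum');  % logical mask of non-alphanumerics
words{1}{1}(unwanted) = [];                      % 'ab-c' becomes 'abc'
```

Reading words{1}{1}(unwanted) left to right: the first {} opens the outer cell, the second {} opens the inner cell to reach a char vector, and the final () selects characters of that char vector.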

Use textscan in Matlab to output data

I've got a large text file with some headers and numerical data. I want to ignore the header lines and specifically output the data in columns 2 and 4.
Example data
[headers]
line1
line2
line3
[data]
1 2 3 4
5 6 7 8
9 10 11 12
I've tried using the following code:
FID = fopen('datafile.dat');
data = textscan(FID,'%f',4,'delimiter',' ','headerLines',4);
fclose(FID);
I only get an output of 0x1 cell
Try this:
FID = fopen('datafile.dat');
data = textscan(FID,'%f %f %f %f', 'headerLines', 6);
fclose(FID);
data will be a 1x4 cell array. Each cell will contain a 3x1 array of double values, which are the values in each column of your data.
You can access the 2nd and 4th columns of your data by executing data{2} and data{4}.
With your original code, the main issue is that the data file has 6 header lines but you've specified that there are only 4.
Additionally, though, you'll run into problems with the specification of the number of times to match the formatSpec. Take for instance the following code
data = textscan(FID,'%f',4);
which specifies that you will attempt to match a floating-point value 4 times. Keep in mind that after matching 4 values, textscan will stop. So for the sake of simplicity, let's imagine that your data file only contained the data (i.e. no header lines), then you would get the following results when executing that code, multiple times:
>> FID = fopen('datafile_noheaders.dat');
>> data_line1 = textscan(FID,'%f', 4)
data_line1 =
[4x1 double]
>> data_line1{1}'
ans =
1 2 3 4
>> data_line2 = textscan(FID,'%f', 4)
data_line2 =
[4x1 double]
>> data_line2{1}'
ans =
5 6 7 8
>> data_line3 = textscan(FID,'%f', 4)
data_line3 =
[4x1 double]
>> data_line3{1}'
ans =
9 10 11 12
>> data_line4 = textscan(FID,'%f', 4)
data_line4 =
[0x1 double]
>> fclose(FID);
Notice that textscan picks up where it "left off" each time it is called. In this case, the first three times that textscan is called it returns one row from your data file (in the form of a cell containing a 4x1 column of data). The fourth call returns a cell containing an empty 0x1 array, because there is nothing left to read. For the use case you described, this format is not particularly helpful.
The example given at the top should return data in a format that is much easier to work with for what you are trying to accomplish. In this case it will match four floating point values in each of your rows of data, and will continue with each line of text until it can no longer match this pattern.
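If only columns 2 and 4 are ever needed, the unwanted columns can also be dropped at read time with the %*f specifier, which reads a field and discards it. A minimal sketch, run on an in-memory char vector that contains only the data rows; it assumes the header lines have already been skipped (with 'headerLines' when reading from the file), and the variable names are made up for illustration.

```matlab
% Data rows only; the header lines from the question are assumed to be skipped already.
data_text = sprintf('1 2 3 4\n5 6 7 8\n9 10 11 12');

% %*f reads a number and discards it, so only columns 2 and 4 are returned.
cols = textscan(data_text, '%*f %f %*f %f');

col2 = cols{1};           % second column of the data
col4 = cols{2};           % fourth column of the data
both = [col2, col4];      % combined into a 3x2 numeric matrix
```

Note that the output cell array then has two elements rather than four, so the second column is cols{1} and the fourth is cols{2}.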