How to detect two consecutive tabs in a string in MATLAB

I am reading data from a tab-separated file:
str1 = sprintf('1\t\t3')  % two consecutive tabs: the second field is empty
str2 = sprintf('4\t5\t6')
In str1 the second field is empty. I read the file line by line in MATLAB, use strsplit to extract the values from each line, and then build arrays from them. Each column in the text file corresponds to one array.
strsplit(str1, '\t')
yields ==> '1' '3'
strsplit(str2, '\t')
yields ==> '4' '5' '6'
This loses the fact that the second field in the first string is empty. How can I preserve this information?

Try using a regular expression:
str1 = sprintf('1\t\t3')
numel(regexp(str1, '\t')) % count the tab characters in str1
will return 2.
For your problem you could do the following:
tmp = regexp(str1, '(\d*)\t(\d*)\t(\d*)', 'tokens')
tmp{1}
ans =
'1' '' '3'

MATLAB has built-in support for reading tab-separated files:
A = importdata('file.txt', '\t')
If your file looks like this:
1\t2\t3
4\t\t5
importdata yields:
A =
1 2 3
4 NaN 5
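If you would rather stay with strsplit, as in the question, here is a minimal sketch. It assumes the line really contains literal tab characters (the sample is built with sprintf purely for illustration). By default strsplit collapses consecutive delimiters into one, which is why the empty field disappears; turning that option off keeps it.

```matlab
% Assumption: the line holds literal tabs; sprintf is used only to build a sample.
str1 = sprintf('1\t\t3');

% By default strsplit merges consecutive delimiters into one.
% Disabling CollapseDelimiters keeps the empty field between the two tabs.
parts = strsplit(str1, '\t', 'CollapseDelimiters', false);

% str2double maps the empty field to NaN, which marks the missing value.
vals = str2double(parts);
```

The NaN placeholder keeps every row the same length, so the columns can still be stacked into arrays.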

Related

Loading variables from ascii file

I am trying to load variables from a .dat file I have created.
The file is in the following format:
x = 1
y = 2
z = 3
I understand that if the file were in the format:
1 2 3
I could use
s = load('filename.dat')
and it would create an array named s storing all the numbers in the file.
However, with the first format I showed, I would like each value stored as a separate variable.
I know I could do this with a .MAT file, but that doesn't really suit my requirements because the file needs to be easily edited, preferably with Notepad or another text editor.
Try the textread function:
[varNames, varValues] = textread('tmp.txt', '%s%f', 'whitespace','\n', 'delimiter','=');
disp(varNames);
'x '
'y '
'z '
disp(varValues);
1
2
3
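To get from name/value pairs to something addressable by name, one option is a struct with dynamic field names. The following is only a sketch: the sample file is written to a temporary path so the example is self-contained, and the names fname and vars are hypothetical. It reads line by line with fgetl instead of textread.

```matlab
% Hypothetical sample file with lines of the form "name = value".
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'x = 1\ny = 2\nz = 3\n');
fclose(fid);

% Read each line, split it at '=', and store the value under its name.
vars = struct();
fid = fopen(fname, 'r');
line = fgetl(fid);
while ischar(line)
    parts = strsplit(line, '=');
    if numel(parts) == 2
        % strtrim removes the blanks around the name; it must be a valid identifier.
        vars.(strtrim(parts{1})) = str2double(parts{2});
    end
    line = fgetl(fid);
end
fclose(fid);
```

The values are then available as vars.x, vars.y and vars.z, which avoids creating workspace variables whose names are only known at run time.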

How to separate the elements of a matrix with comma in Matlab

I would like to separate each element in the matrix below with a comma.
1 2 3
4 5 6
7 8 9
Here's my attempt:
s = sprintf('%.17g,',matrix)
Output: 1,2,3,4,5,6,7,8,9,
Desired output:
1, 2, 3
4, 5, 6
7, 8, 9
Thanks in advance for your suggestions.
You just need to specify the formatting of the entire first line:
s = sprintf('%.17g, %.17g, %.17g\n',matrix.')
MATLAB keeps re-using the formatting string as long as there are elements left in matrix.
To generalize this process, use the following expression:
s = sprintf([strjoin(repmat({'%.17g'},1,size(matrix,2)), ', ') '\n'], matrix.')
So there's a lot going on in this one line - let's unpack it from inside out:
repmat({'%.17g'},1,size(matrix,2))
This sub-expression takes a single cell array of size 1x1, containing the string %.17g, and duplicates it into a cell array with dimensions specified by the next two arguments. We want to construct a cell array with a single row (hence the argument 1) representing all the format specifiers (%...) we need. Since we want one instance of %.17g for each column, we use size(matrix,2) as the last argument to repmat, since that returns the number of columns of the matrix.
As an example, if you have 5 columns, you get this:
>> repmat({'%.17g'},1,5)
ans =
'%.17g' '%.17g' '%.17g' '%.17g' '%.17g'
Next, since you want columns delimited by commas and spaces, you can use strjoin():
>> strjoin(repmat({'%.17g'},1,5), ', ')
ans =
%.17g, %.17g, %.17g, %.17g, %.17g
Note the use of a comma followed by whitespace as the second argument (the delimiting string) to strjoin(). Adjust the number of spaces according to your display needs. We need one more thing to be able to print a multi-line matrix: a newline. To add it, we use the fact that two strings in square brackets [] are concatenated by MATLAB:
[strjoin(repmat({'%.17g'},1,size(matrix,2)), ', ') '\n']
This produces the final formatting string that we need. All that is left is to add the sprintf call and pass in the matrix argument. As Rijul Sudhir pointed out, you do have to transpose your matrix, because MATLAB walks down each column when pairing the matrix elements with the format specifiers.
EDIT: Stewie Griffin was correct about the transpose operation (.') - code has been corrected.
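Putting the pieces together for the 3x3 matrix from the question, a short worked sketch (the variable names fmt and rows are introduced here for illustration only):

```matlab
matrix = [1 2 3; 4 5 6; 7 8 9];

% One '%.17g' per column, joined by ', ', with a newline closing each row.
fmt = [strjoin(repmat({'%.17g'}, 1, size(matrix, 2)), ', ') '\n'];

% Transpose so that sprintf consumes the elements row by row.
s = sprintf(fmt, matrix.');

% Split on the newline character to inspect the individual rows.
rows = strsplit(strtrim(s), sprintf('\n'));
```

Here fmt is '%.17g, %.17g, %.17g\n', and rows holds the three lines '1, 2, 3', '4, 5, 6' and '7, 8, 9'.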

Replacing letters with numbers in a MATLAB array

I am trying to write a function to mark the results of a test. The answers given by participants are stored in an nx1 cell array. However, these are stored as letters. I am looking for a way to convert these letters (a-d) into numbers (1-4), i.e. a=1, b=2, so that they can be compared with the correct answers using logical operations.
What I have so far is:
[num,txt,raw]=xlsread('FolkPhysicsMERGE.xlsx', 'X3:X142');
FolkPhysParAns=txt;
I seem to be able to find how to convert from numbers into letters but not the other way around. I feel like there should be a relatively easy way to do this, any ideas?
If you have a cell array of letters:
>> data = {'a','b','c','A'};
you only need to:
Convert to lower-case with lower, to treat both cases equally;
Convert to a character array with cell2mat;
Subtract (the ASCII code of) 'a' and add 1.
Code:
>> result = cell2mat(lower(data))-'a'+1
result =
1 2 3 1
More generally, if the possible answers are not consecutive letters, or even not single letters, use ismember:
>> possibleValues = {'s', 'm', 'l', 'xl', 'xxl'};
>> data = {'s', 'm', 'xl', 'l', 'm', 'l', 'aaa'};
>> [~, result] = ismember(data, possibleValues)
result =
1 2 4 3 2 3 0
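Once the letters are numbers, marking comes down to an element-wise comparison. A minimal sketch, assuming a hypothetical answer key and hypothetical participant answers (neither comes from the question's spreadsheet):

```matlab
answers = {'a', 'c', 'B', 'd'};   % hypothetical participant answers
key     = [1 2 2 4];              % hypothetical answer key (a=1 ... d=4)

% Map letters to 1-4, ignoring case; anything outside a-d becomes 0.
[~, given] = ismember(lower(answers), {'a', 'b', 'c', 'd'});

correct = (given == key);         % logical vector, one entry per question
score   = sum(correct);           % number of correct answers
```

With these sample values the second answer is wrong, so score is 3.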
Thought I might as well write an answer...
You can use strrep to replace 'a' with '1' (note that the replacement is a string), do the same for all 26 letters, and then use str2num to convert the strings '1' to '26' into the numbers 1 to 26.
Let's say:
t = {'a','b','c'}      % cell array of strings
t = strrep(t,'a','1')  % replace all 'a' with '1'
t = strrep(t,'b','2')  % replace all 'b' with '2'
t = strrep(t,'c','3')  % replace all 'c' with '3'
% Or in one line (cell-array arguments of equal size are matched element by element):
t = strrep(t,{'a','b','c'},{'1','2','3'})
>> t =
'1' '2' '3'
output = cellfun(@str2num,t,'un',0)  % keeps the cell structure
>> output =
[1] [2] [3]
alternatively:
output = str2num(cell2mat(t'))  % uses the matrix structure instead; NOTE the transpose ', it is crucial here.
>> output =
1
2
3

average range of data and plot in gnuplot

I have this kind of data:
label-> 1 2 3 4 5
val1 1.67E+07 2.20E+07 3.04E+07 7.89E+07 1.24E+08
val2 1.71E+07 2.35E+07 2.70E+07 7.80E+07 1.31E+08
val3 1.48E+07 2.15E+07 2.74E+07 7.18E+07 1.17E+08
val4 1.57E+07 2.07E+07 2.49E+07 7.46E+07 1.27E+08
val5 1.32E+07 2.23E+07 3.07E+07 7.50E+07 1.16E+08
I need to plot the label vs the average of each val column, like this:
label-> 1 2 3 4 5
val1 1.67E+07 2.20E+07 3.04E+07 7.89E+07 1.24E+08
val2 1.71E+07 2.35E+07 2.70E+07 7.80E+07 1.31E+08
val3 1.48E+07 2.15E+07 2.74E+07 7.18E+07 1.17E+08
val4 1.57E+07 2.07E+07 2.49E+07 7.46E+07 1.27E+08
val5 1.32E+07 2.23E+07 3.07E+07 7.50E+07 1.16E+08
mean 1.55E+07 2.20E+07 2.81E+07 7.57E+07 1.23E+08
Is it possible to perform this operation in gnuplot, or should I stick with Excel?
You could do it using awk and gnuplot. Assume your example data (without the mean row) is in data.txt.
The idea is to compute the mean of each column, starting from the second column (i=2) and the second row. The first record (NR==1) holds the labels, so it is not summed; instead, the auxiliary array a is filled with zeros (a[i]=0.0). An awk condition handles this: if (NR==1) {...} else {...accumulate the sums...}.
Awk reads the data row by row. In each row, you iterate over the fields and add the value from column i to the array element a[i]:
{for(i=2;i<=NF;i++) a[i]+=$i;}
When iterating over the first row (NR==1), we skip the summation and only initialize a[i].
At the END of the awk script (all rows processed), divide each sum by the number of data rows, NR-1, to get the mean values. Note that the code below assumes rectangular data (NF=const).
Also, save the column labels from the first row into the label array:
if (NR==1) {for(i=2;i<=NF;i++) {label[i]=$i; ...} }
Then print the labels and mean values, one row per label:
for(i=2;i<=NF;i++) {printf label[i]" "; print a[i]/(NR-1)}
The final data table would look like this:
1 15500000
2 22000000
3 28080000
4 75660000
5 123000000
Then you could plot one column against the other.
Note that the final data for gnuplot should be formatted in columns, not rows.
The following code performs the described operations:
gnuplot> unset key
gnuplot> plot "<export LC_NUMERIC=C; awk '{if (NR==1) {for(i=2;i<=NF;i++) {label[i]=$i; a[i]=0.0}} else {for(i=2;i<=NF;i++) a[i]+=$i}} END {for(i=2;i<=NF;i++) {printf label[i]\" \"; print a[i]/(NR-1)}}' data.txt"
Note that the double quotes inside the gnuplot command string must be escaped with a backslash (\").

textscan introduces additional zeros in output array

I have a .txt file like this:
ord,cat,1
ord,cat,1
ord,cat,3
ord,cat,1
ord,cat,4
I know the number of entries in each row (comma separated) but not the number of rows.
I need to import the number following 'cat' into an array.
I wrote this:
fid=fopen(filename)
A=textscan(fid,'%s%s%d','Delimiter',',')
But I get this:
A = {17x1 cell} [16x1 int32]
where the number of cells is clearly wrong.
When I try to read
A{3}
i get
ans =
0
0
0
0
0
1
0
1
0
3
0
1
0
4
I'm really only interested in the integer array, but it may be useful to show you the other outputs as well:
A{1}
ans =
'{\rtf1\ansi\ansicpg1252\cocoartf1187\cocoasubrtf400'
'{\fonttbl\f0\fswiss\fcharset0 Helvetica;}'
'{\colortbl;\red255\green255\blue255;}'
[1x75 char]
[1x102 char]
'\f0\fs24 \cf0 ord'
'\'
'ord'
'\'
'ord'
'\'
'ord'
'\'
'ord'
'}'
A{2}
ans =
''
''
''
''
''
'cat'
''
'cat'
''
'cat'
''
'cat'
''
'cat'
OK, I think there was some kind of format mistake in the input file.
I deleted it and created a new .txt file, and the code above works fine.
You're not giving the right format command to textscan.
A=textscan(fid,'%s%d','Delimiter',',')
'%s%d' here means "read one string, then one integer". So it will probably sit there reading string-integer-string-integer (or trying to), and the "0"s likely arise from fields that could not be read as integers.
Since you have three entries per line, try instead:
A=textscan(fid,'%s%s%d','Delimiter',',')
Your numbers should be in A{3}.
If you don't need the first two columns, you can also skip over those fields:
A=textscan(fid,'%*s%*s%d','Delimiter',',')
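The A{1} output in the question begins with '{\rtf1\ansi...', which is the header of a rich-text (RTF) file rather than plain text, and that fits the asker's note that re-creating the .txt file fixed the problem. The sketch below is one way to guard against that before parsing. The sample file is written to a temporary path only so the example is self-contained, and the names isRtf and nums are hypothetical.

```matlab
% Hypothetical sample file, written here only so the sketch is self-contained.
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'ord,cat,1\nord,cat,1\nord,cat,3\nord,cat,1\nord,cat,4\n');
fclose(fid);

fid = fopen(filename, 'r');
firstLine = fgetl(fid);
frewind(fid);                      % go back to the start before parsing

% An RTF file begins with the marker {\rtf, as in the A{1} output above.
isRtf = ischar(firstLine) && strncmp(firstLine, '{\rtf', 5);
if isRtf
    fclose(fid);
    error('File looks like RTF; re-save it as plain text first.');
end

% Skip the two string fields and keep only the integer column.
A = textscan(fid, '%*s%*s%d', 'Delimiter', ',');
fclose(fid);
nums = A{1};
```

Closing the file with fclose after textscan also avoids leaking file handles when the code is run repeatedly.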