Extracting certain part of a string using strtok - matlab

I'm trying to extract a part of the string by using strtok(), but I am unable to get complete output.
For input:
string = '3_5_2_spd_20kmin_corrected_1_20190326.txt';
Output:
>> strtok(string)
ans =
'3_5_2_spd_20kmin_corrected_1_20190326.txt'
>> strtok(string,'.txt')
ans =
'3_5_2_spd_20kmin_correc'
>> strtok(string,'0326')
ans =
'_5_'
>> strtok(string,'2019')
ans =
'3_5_'
I expect the output 3_5_2_spd_20kmin_corrected_1_20190326, but the actual output was 3_5_2_spd_20kmin_correc. Why is that and how can I get the correct output?

strtok treats every character inside the second input argument as a separate delimiter.
For example, when calling:
strtok("3_5_2_spd_20kmin_corrected_1_20190326.txt",'.txt')
MATLAB treats '.', 't' and 'x' as separate delimiters, so it splits your input at the first 't' it encounters (the one in 'corrected') and returns 3_5_2_spd_20kmin_correc.
In your other example using '2019', again '2019' is not a single delimiter but delimiterS, in the sense that the actual delimiters used are all '2','0','1','9'. Therefore the first delimiter encountered in the string (left to right) is '2', right after '3_5_'. That's why it returns '3_5_'.
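A minimal sketch of this behaviour, using the string from the question (the variable names tok1, rest1, tok2 and rest2 are only illustrative). It also shows why the '0326' call returned '_5_': strtok ignores delimiters at the very start of the input, so the leading '3' is dropped before the token begins.

```matlab
str = '3_5_2_spd_20kmin_corrected_1_20190326.txt';

% Every character of the second argument is a delimiter of its own.
[tok1, rest1] = strtok(str, '.txt');   % delimiters are '.', 't' and 'x'
[tok2, rest2] = strtok(str, '0326');   % delimiters are '0', '3', '2' and '6'

disp(tok1)   % token ends at the first 't', inside "corrected"
disp(rest1)  % remainder starts at that delimiter
disp(tok2)   % leading '3' is skipped, token ends at the next '2'
```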
To achieve your expected output, I think you would be better off using strsplit instead:
result = strsplit(string,".txt");
result{1}

extractBefore does what you're looking to do:
>> string = '3_5_2_spd_20kmin_corrected_1_20190326.txt';
>> extractBefore(string,'.txt')
ans =
'3_5_2_spd_20kmin_corrected_1_20190326'

If your strings are file names/paths, and your goal is to extract the file name without extension, the best option would be to use fileparts, like so:
>> str = '3_5_2_spd_20kmin_corrected_1_20190326.txt';
>> [~, name] = fileparts(str)
name =
'3_5_2_spd_20kmin_corrected_1_20190326'
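The two approaches only differ when '.txt' appears more than once in the name. A small sketch, assuming a made-up file name (fname is hypothetical and not from the question): fileparts splits at the last dot, whereas extractBefore stops at the first occurrence of the pattern.

```matlab
% Hypothetical file name in which '.txt' occurs twice.
fname = 'notes.txt.backup.txt';

[~, byFileparts] = fileparts(fname);             % extension is taken from the last dot
byExtractBefore  = extractBefore(fname, '.txt'); % cut at the first '.txt'

disp(byFileparts)      % notes.txt.backup
disp(byExtractBefore)  % notes
```

For file names like the one in the question, where '.txt' occurs only at the end, both give the same result.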

Related

How to calculate the number of appearance of each letter(A-Z ,a-z as well as '.' , ',' and ' ' ) in a text file in matlab?

How can I go about doing this? So far I've opened the file like this
fileID = fopen('hamlet.txt','r');
[A,count] = fscanf(fileID, '%s');
fclose(fileID);
Getting spaces from the file
First, if you want to capture spaces, you'll need to change your format specifier. %s reads only non-whitespace characters.
>> fileID = fopen('space.txt','r');
>> A = fscanf(fileID, '%s');
>> fclose(fileID);
>> A
A = Thistexthasspacesinit.
Instead, we can use %c:
>> fileID = fopen('space.txt','r');
>> A = fscanf(fileID, '%c');
>> fclose(fileID);
>> A
A = This text has spaces in it.
Mapping between characters and values (array indices)
We could create a character array that contains all of the target characters to look for:
search_chars = ['A':'Z', 'a':'z', ',', '.', ' '];
That would work, but to map the character to a position in the array you'd have to do something like:
>> char_pos = find(search_chars == 'q')
char_pos = 43
You could also use containers.Map, but that seems like overkill.
Instead, let's use the ASCII value of each character. For convenience, we'll use only values 1:126 (0 is NUL, and 127 is DEL. We should never encounter either of those.) Converting from characters to their ASCII code is easy:
>> c = 's'
c = s
>> a = uint8(c) % MATLAB actually does this using double(). Seems wasteful to me.
a = 115
>> c2 = char(a)
c2 = s
Note that by doing this, you're counting characters that are not in your desired list like ! and *. If that's a problem, then use search_chars and figure out how you want to map from characters to indices.
Looping solution
The most intuitive way to count each character is a loop. For each character in A, find its ASCII code and increment the counter array at that index.
char_count = zeros(1, 126);
for current_char = A
c = uint8(current_char);
char_count(c) = char_count(c) + 1;
end
Now you've got an array of counts for each character with ASCII codes from 1 to 126. To find out how many instances of 's' there are, we can just use its ASCII code as an index:
>> char_count(115)
ans = 4
We can even use the character itself as an index:
>> char_count('s')
ans = 4
Vectorized solution
As you can see with that last example, MATLAB's weak typing makes characters and their ASCII codes pretty much equivalent. In fact:
>> 's' == 115
ans = 1
That means that we can use implicit broadcasting and == to create a logical 2D array where L(c,a) == 1 if character c in our string A has an ASCII code of a. Then we can get the count for each ASCII code by summing along the columns.
L = (A.' == [1:126]);
char_count = sum(L, 1);
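As a quick sanity check, the loop and the vectorized version can be compared on the sample text read earlier. This is only a sketch: the names loop_count and vec_count are illustrative, and the == with implicit expansion assumes MATLAB R2016b or newer.

```matlab
A = 'This text has spaces in it.';

% Looping version: one counter per ASCII code 1..126.
loop_count = zeros(1, 126);
for current_char = A
    c = double(current_char);
    loop_count(c) = loop_count(c) + 1;
end

% Vectorized version: numel(A)-by-126 logical matrix, summed down the columns.
vec_count = sum(A.' == 1:126, 1);

disp(isequal(loop_count, vec_count))  % both methods should agree
disp(vec_count('s'))                  % count for 's'
```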
A one-liner
Just for fun, I'll show one more way to do this: histcounts. This is meant to put values into bins, but as we said before, characters can be treated like values.
char_count = histcounts(uint8(A), 1:127); % 127 edges give 126 bins, one per ASCII code 1..126
There are dozens of other possibilities, for instance you could use the search_chars array and ismember(), but this should be a good starting point.
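For the search_chars and ismember() route, one possible sketch follows. The use of accumarray to tally the positions is my own choice here, not something the approaches above prescribe, and counts is an illustrative name. Unlike the ASCII-indexed arrays, this counts only the characters in search_chars.

```matlab
A = 'This text has spaces in it.';
search_chars = ['A':'Z', 'a':'z', ',', '.', ' '];

% idx(k) is the position of A(k) in search_chars, or 0 if it is not listed.
[tf, idx] = ismember(A, search_chars);

% Tally how often each position occurs; unlisted characters are dropped.
counts = accumarray(idx(tf).', 1, [numel(search_chars), 1]).';

disp(counts(search_chars == 's'))  % occurrences of 's'
disp(counts(search_chars == ' '))  % occurrences of the space character
```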
With [A,count] = fscanf(fileID, '%s'); you only get one overall count, not a count per letter. You can use regexp instead: give it a cell array with one expression per character you are interested in, and it returns a cell array in which each entry holds the indices where that character occurs. The number of indices in each entry is the count for that character. Note that the file is read with %c here so that spaces are kept, and that the period is escaped as '\.' because a bare '.' matches any character in a regular expression:
fileID = fopen('hamlet.txt','r');
A = fscanf(fileID, '%c');
indexCellArray = regexp(A,{'A','B','C','D',... % add the remaining upper-case letters here
'a','b','c','d',... % add the remaining lower-case letters here
',','\.',' '});
letterCount = cellfun(@numel,indexCellArray);
fclose(fileID);
You might put the counts in a struct with one field name per letter; otherwise you might lose track of which count belongs to which letter.
There may well be a much easier solution, because typing every letter into the regexp call is tedious, but it works.

MATLAB 'text' function not working with 'sprintf' argument

Trying to print the following range of labels in a figure:
aux = {'ca155.mat','ca154.mat','ca159.mat','ca146.mat','ca148.mat','ca004.mat'};
But I need it upper case and without the extension, so I use
text(0,0,upper(sprintf([aux{i},'\b\b\b\b'])));
In the command window I get the correct output such as for i=1, i.e. CA155. However the text function on a figure doesn't work and produces:
CA155.MAT[][][][]
Except instead of brackets there are closed rectangles (I couldn't copy the character).
How can I fix this?
When processing your text, you did not delete the extension; you appended backspace characters. Here is a short demonstration:
>> x=upper(sprintf([aux{i},'\b\b\b\b']))
x =
'CA155'
>> size(x)
ans =
1 13
>> x(1:9)
ans =
'CA155.MAT'
>> x(1:10)
ans =
'CA155.MA'
The first 9 characters are still there. The trailing backspaces only erase them visually when the string is printed in the command window. It looks like text does not interpret backspaces, and they are not the right tool for removing the extension anyway.
Use fileparts instead:
>> [filepath,name,ext]=fileparts(aux{i})
filepath =
0×0 empty char array
name =
'ca155'
ext =
'.mat'
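Putting this together for all of the labels, a minimal sketch could look like the following. The labels variable is illustrative, and the text call is left as a comment so the snippet also runs without a figure window.

```matlab
aux = {'ca155.mat','ca154.mat','ca159.mat','ca146.mat','ca148.mat','ca004.mat'};

labels = cell(size(aux));
for i = 1:numel(aux)
    [~, name] = fileparts(aux{i});  % drop the folder and the extension
    labels{i} = upper(name);        % e.g. 'ca155' becomes 'CA155'
end

disp(labels{1})
% In a figure, the label could then be drawn with:
% text(0, 0, labels{1});
```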

How to display selected entries of an array of structures in MATLAB

Suppose we have an array of structures. Each structure has the fields name, price and cost.
Suppose the array A has size n x 1. If I'd like to display the names of the 1st, 3rd and the 4th structure, I can use the command:
A([1,3,4]).name
The problem is that it prints the following thing on screen:
ans =
name_of_item_1
ans =
name_of_item_3
ans =
name_of_item_4
How can I remove those ans = things? I tried:
disp(A([1,3,4]).name);
only to get an error/warning.
By doing A([1,3,4]).name, you are returning a comma-separated list. This is equivalent to typing in the following in the MATLAB command prompt:
>> A(1).name, A(3).name, A(4).name
That's why you'll see the MATLAB command prompt give you ans = ... three times.
If you want to display all of the names together, consider using strjoin to join them, separated by a comma. To do this, you first have to place the names in a cell array. Let's call this cell array names:
names = {A([1,3,4]).name};
This is the same as doing:
names = {A(1).name, A(3).name, A(4).name};
This will create a 1 x 3 cell array of names and we can use these names to join them together by separating them with a comma and a space:
names = {A([1,3,4]).name};
out = strjoin(names, ', ');
You can then show what this final string looks like:
disp(out);
You can use:
[A([1,3,4]).name]
which will, however, concatenate all of the names into a single string.
The better way is to make a cell array using:
{ A([1,3,4]).name }
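A self-contained sketch of both ideas, assuming a small made-up struct array (the item names, prices and costs below are invented for illustration). Once the names are in a cell array, they can be joined with strjoin or printed one per line with fprintf, and no ans = appears in either case.

```matlab
% Hypothetical 1-by-4 struct array with the fields from the question.
A = struct('name',  {'apple', 'pear', 'plum', 'fig'}, ...
           'price', {1.0, 2.0, 3.0, 4.0}, ...
           'cost',  {0.5, 1.0, 1.5, 2.0});

names = {A([1,3,4]).name};    % collect the comma-separated list in a cell array

out = strjoin(names, ', ');   % one line, comma separated
disp(out)

fprintf('%s\n', names{:});    % one name per line; the format is reused for each name
```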

Extract values from filenames

I have file names stored as follows:
>> allFiles.name
ans =
k-120_knt-500_threshold-0.3_percent-34.57.csv
ans =
k-216_knt-22625_threshold-0.3_percent-33.33.csv
I wish to extract the 4 values from them and store in a cell.
data={};
for k =1:numel(allFiles)
data{k,1}=csvread(allFiles(k).name,1,0);
data{k,2}= %kvalue
data{k,3}= %kntvalue
data{k,4}=%threshold
data{k,5}=%percent
...
end
There's probably a regular expression that can be used to do this, but a simple piece of code would be
data=cell(numel(allFiles),5); % preallocate an N-by-5 cell array
for k =1:numel(allFiles)
data{k,1}=csvread(allFiles(k).name,1,0);
[~,name] = fileparts(allFiles(k).name);
dashIdx = strfind(name,'-'); % find location of dashes
usIdx = strfind(name,'_'); % find location of underscores
data{k,2}= str2double(name(dashIdx(1)+1:usIdx(1)-1)); %kvalue
data{k,3}= str2double(name(dashIdx(2)+1:usIdx(2)-1)); %kntvalue
data{k,4}= str2double(name(dashIdx(3)+1:usIdx(3)-1)); %threshold
data{k,5}= str2double(name(dashIdx(4)+1:end)); %percent
...
end
For efficiency, you might consider using a single matrix to store all the numeric data, and/or a structure (so that you can access the data by name rather than index).
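For the regular expression route mentioned at the start of this answer, one possible sketch uses capture groups with the 'tokens' option. The pattern below is my own guess at the naming scheme, based only on the two file names shown in the question, so adjust it if other files deviate.

```matlab
filename = 'k-216_knt-22625_threshold-0.3_percent-33.33.csv';

% One capture group per value; [\d.]+ matches digits and decimal points.
pattern = '^k-([\d.]+)_knt-([\d.]+)_threshold-([\d.]+)_percent-([\d.]+)\.csv$';
tok = regexp(filename, pattern, 'tokens', 'once');  % 1-by-4 cell of char vectors

vals = str2double(tok);  % numeric row vector: k, knt, threshold, percent
disp(vals)
```

Inside the loop, vals(1) to vals(4) could then be stored in data{k,2} to data{k,5}.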
You simply need to tokenize using strtok multiple times (there is more than one way to solve this). There are also scripts on the web that tokenize a string into a cell array.
(1) Starting with:
filename = 'k-216_knt-22625_threshold-0.3_percent-33.33.csv'
Use strfind to prune out the extension
r = strfind(filename, '.csv')
filenameWithoutExtension = filename(1:r-1)
This leaves us with:
'k-216_knt-22625_threshold-0.3_percent-33.33'
(2) Then tokenize this:
'k-216_knt-22625_threshold-0.3_percent-33.33'
using '_' . You get the tokens:
'k-216'
'knt-22625'
'threshold-0.3'
'percent-33.33'
(3) Lastly, for each string, tokenize using '-'. The second token of each string will be:
'216'
'22625'
'0.3'
'33.33'
And use str2num to convert.
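Steps (1) to (3) could be sketched with repeated strtok calls as follows. The variable names remaining, pair, valueStr and values are illustrative, and str2double is used here in place of str2num, which is a substitution on my part.

```matlab
filename = 'k-216_knt-22625_threshold-0.3_percent-33.33.csv';

% (1) Prune the extension.
r = strfind(filename, '.csv');
remaining = filename(1:r(end)-1);

values = [];
while ~isempty(remaining)
    % (2) Take the next 'key-value' token; leading '_' characters are skipped.
    [pair, remaining] = strtok(remaining, '_');
    if isempty(pair), break; end

    % (3) Split at '-'; the remainder still starts with the '-' itself.
    [~, valueStr] = strtok(pair, '-');
    values(end+1) = str2double(valueStr(2:end)); %#ok<AGROW>
end

disp(values)
```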
Strategy: strsplit() + str2num().
data={};
for k =1:numel(allFiles)
data{k,1}=csvread(allFiles(k).name,1,0);
words = strsplit( allFiles(k).name(1:(end-4)), '_' );
data{k,2} = str2num(words{1}(3:end)); % skip 'k-'
data{k,3} = str2num(words{2}(5:end)); % skip 'knt-'
data{k,4} = str2num(words{3}(11:end)); % skip 'threshold-'
data{k,5} = str2num(words{4}(9:end)); % skip 'percent-'
end

How to store the line matching an expression in matlab

I have the following code, and I want to store the entire line that contains the matching expression, but currently I am only able to store the expression itself.
expr='\hello';
fileread = regexp(filetext, expr, 'match');
fid = fopen('data.txt', 'wt');
fprintf(fid, '%s\n',fileread{:});
suppose my file contains:
Hello,my name is X
X hello
Not this line
my file data.txt stores
hello
hello
instead of the entire line containing the expression.
desired data.txt
Hello,my name is X
X hello
what am i doing wrong?
Based on the way you are interacting with the regexp function I will assume you have all the file text in a single variable. Let's imagine that variable takes the following form:
my name is hello there
Hello,my name is X
X hello
Not this line
For your reference, I've constructed this variable using sprintf
string = sprintf('my name is hello there\nHello,my name is X\n X hello\n Not this line')
You can extract the lines which have hello with the following regexp:
[~,~,~,d] = regexp(string, '.*?[Hh]ello.*?\n')
The results can be retrieved from the cell array with:
>> d{1}
ans =
my name is hello there
>> d{2}
ans =
Hello,my name is X
>> d{3}
ans =
X hello
Note that I used a couple of lazy quantifiers .*?, check out Laziness Instead of Greediness at this link if you would like to learn more: http://www.regular-expressions.info/repeat.html
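One caveat: in MATLAB's regexp the dot also matches newline characters by default, so the lazy pattern can pull in a preceding line that does not contain the word, and it misses a matching last line that has no trailing newline. A slightly more defensive sketch, which is my own variation rather than part of the answer above, excludes newlines explicitly (txt and matchedLines are illustrative names):

```matlab
txt = sprintf('my name is hello there\nHello,my name is X\n X hello\n Not this line');

% [^\n]* can never cross a line break, so each match stays within one line.
matchedLines = regexp(txt, '[^\n]*[Hh]ello[^\n]*', 'match');

fprintf('%s\n', matchedLines{:});  % prints the three lines that contain hello/Hello
```

To write the lines to data.txt as in the question, the same fprintf call can be given the file identifier returned by fopen as its first argument.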
What you're doing wrong is not using the MATLAB regexp function correctly. If you look under "Return Substrings using 'match' Keyword" on this site, you will see that the result you got is what is expected for what your code stated (it returns the parts of the input string that match the regular expression you supplied). I was going to post a suggestion, but someone beat me to it ;-). Good luck.