How to store the line matching an expression in MATLAB

I have the following code. I want to store the entire line that contains the matching expression, but currently I am only able to store the expression itself.
expr='\hello';
fileread = regexp(filetext, expr, 'match');
fid = fopen('data.txt', 'wt');
fprintf(fid, '%s\n',fileread{:});
Suppose my file contains:
Hello,my name is X
X hello
Not this line
My file data.txt then contains:
hello
hello
instead of the entire lines containing the expression.
Desired data.txt:
Hello,my name is X
X hello
What am I doing wrong?

Based on the way you are interacting with the regexp function I will assume you have all the file text in a single variable. Let's imagine that variable takes the following form:
my name is hello there
Hello,my name is X
X hello
Not this line
For your reference, I've constructed this variable using sprintf:
txt = sprintf('my name is hello there\nHello,my name is X\nX hello\nNot this line')
You can extract the lines that contain hello (or Hello) with the following regexp:
[~,~,~,d] = regexp(txt, '.*?[Hh]ello.*?\n')
The results can be retrieved from the cell array with:
>> d{1}
ans =
my name is hello there
>> d{2}
ans =
Hello,my name is X
>> d{3}
ans =
X hello
Note that I used a couple of lazy quantifiers (.*?). If you would like to learn more, see the "Laziness Instead of Greediness" section at: http://www.regular-expressions.info/repeat.html
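A sketch of a more robust variant, assuming the whole file is in a char vector called filetext (rebuilt here with sprintf from the example above). In MATLAB's regexp, '.' matches newlines by default, so with the lazy pattern a non-matching line that comes before a matching one would be swallowed into the same match. The 'lineanchors' and 'dotexceptnewline' options keep each match on a single line, and 'ignorecase' picks up Hello as well as hello.

```matlab
% Example text; in practice you might use filetext = fileread('input.txt');
% ('input.txt' is a hypothetical file name)
filetext = sprintf('my name is hello there\nHello,my name is X\nX hello\nNot this line');

% ^ and $ match at the start/end of each line ('lineanchors'),
% '.' does not cross line breaks ('dotexceptnewline'), and
% 'ignorecase' makes "hello" also match "Hello".
matchedLines = regexp(filetext, '^.*hello.*$', 'match', ...
    'lineanchors', 'dotexceptnewline', 'ignorecase');

% Write each matching line to data.txt, one per line.
fid = fopen('data.txt', 'wt');
fprintf(fid, '%s\n', matchedLines{:});
fclose(fid);
```

Here matchedLines holds the three lines that mention hello/Hello, and the last line (which has no trailing newline) would still be found if it matched. If the file uses Windows line endings, each matched line may keep a trailing carriage return.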

What you're doing wrong is not using the MATLAB regexp function correctly. If you look under "Return Substrings using 'match' Keyword" on this site, you will see that the result you got is what is expected for what your code stated (it returns the parts of the input string that match the regular expression you supplied). I was going to post a suggestion, but someone beat me to it ;-). Good luck.

Related

MATLAB 'text' function not working with 'sprintf' argument

I'm trying to print the following set of labels in a figure:
aux = {'ca155.mat','ca154.mat','ca159.mat','ca146.mat','ca148.mat','ca004.mat'};
But I need them in upper case and without the extension, so I use:
text(0,0,upper(sprintf([aux{i},'\b\b\b\b'])));
In the Command Window I get the correct output, e.g. CA155 for i=1. However, the text function in a figure doesn't work and produces:
CA155.MAT[][][][]
except that instead of brackets there are closed rectangles (I couldn't copy the character).
How can I fix this?
When processing your text, you did not delete the extension; you appended backspace characters. Here is a short demonstration:
>> x=upper(sprintf([aux{i},'\b\b\b\b']))
x =
'CA155'
>> size(x)
ans =
1 13
>> x(1:9)
ans =
'CA155.MAT'
>> x(1:10)
ans =
'CA155.MA'
The first 9 characters are still there; the trailing backspaces only erase them visually when the text is printed in the Command Window. The text function does not interpret backspace characters (it draws them as boxes), so backspaces are not the way to go here.
Use fileparts instead:
>> [filepath,name,ext]=fileparts(aux{i})
filepath =
0×0 empty char array
name =
'ca155'
ext =
'.mat'
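Putting it together, here is a sketch that strips the extension with fileparts, upper-cases the result, and passes the clean char vector to text. The label positions (one row per label) are arbitrary and only chosen so the labels don't overlap.

```matlab
aux = {'ca155.mat','ca154.mat','ca159.mat','ca146.mat','ca148.mat','ca004.mat'};

labels = cell(size(aux));
for i = 1:numel(aux)
    [~, name] = fileparts(aux{i});   % 'ca155.mat' -> 'ca155'
    labels{i} = upper(name);         % 'ca155'     -> 'CA155'
end

% Draw each label on its own row; the positions are arbitrary choices.
figure;
axis([0, 1, 0, numel(labels) + 1]);
for i = 1:numel(labels)
    text(0.1, i, labels{i});
end
```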

Having trouble conditionally moving files based on their names

I am trying to write a script that automatically sorts files based on the 7th and 8th digits in their names, but I get the following error from line 16:
Argument must be a string scalar or character vector.
Error in sort_files (line 16)
movefile (filelist(i), DirOut)
Here's the code:
DirIn = 'C:\Folder\Experiment' %set incoming directory
DirOut = 'C:\Folder\Experiment\1'
eval(['filelist=dir(''' DirIn '/*.wav'')']) %get file list
for i = 1:length(filelist);
Filename = filelist(i).name
name = strsplit(Filename, '_');
newStr = extractBetween(name,7,8);
if strcmp(newStr,'01')
movefile (filelist(i), DirOut)
end
end
Also, I am trying to make the destination folder conditional, so that if the 10th and 11th digits are 02 the file goes to DirOut/02, etc.
First, try to avoid the eval function: it is widely discouraged because it is slow and makes code hard to read, especially when it is used to create variables. Instead, do this:
filelist = dir(fullfile(DirIn,'*.wav'));
Second, the line:
name = strsplit(Filename, '_');
makes name a cell array, so you can access name{1}, name{2}, and so on. Each of these is a char vector, but name itself is not. extractBetween then works on each piece separately and returns a cell array as well, which is not what you want here. Note that you could simply have indexed the file name directly, since Filename is a char array (MATLAB's traditional string type):
newStr = Filename(7:8);
EDIT:
Since it has now been established that the error occurs on movefile (filelist(i), DirOut), the likely cause is that filelist(i) is a struct, whereas movefile expects a file name (char array). The solution is to replace this line with:
movefile(fullfile(filelist(i).folder, filelist(i).name), DirOut)
Now, if you also want the destination subfolder to depend on the file name, you can do this:
movefile(fullfile(filelist(i).folder, filelist(i).name), [DirOut,filesep,Filename(7:8)])
This will move a file to DirOut/01. If you want DirOut/1 instead, you can do this:
movefile(fullfile(filelist(i).folder, filelist(i).name), [DirOut,filesep,int2str(str2double(Filename(7:8)))])
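A complete sketch of the sorting loop, assuming the folder code sits at characters 7-8 of the file name as in the answer (the question also mentions characters 10-11, so adjust the indices to your naming scheme). To make it runnable as-is, it creates a temporary input folder with made-up file names (demoNames is hypothetical); replace DirIn and DirOut with your own folders. It also creates each subfolder before moving, because if the destination folder does not exist, movefile is likely to treat the path as a new file name and rename the file instead.

```matlab
% Demo setup (hypothetical): a temporary input folder with made-up .wav files.
% In practice, set DirIn and DirOut to your own folders instead.
DirIn  = tempname;
mkdir(DirIn);
DirOut = fullfile(DirIn, 'sorted');
demoNames = {'rec_a_01_x.wav', 'rec_b_02_y.wav', 'rec_c_01_z.wav'};
for k = 1:numel(demoNames)
    fclose(fopen(fullfile(DirIn, demoNames{k}), 'w'));  % create an empty file
end

if ~exist(DirOut, 'dir')
    mkdir(DirOut);                          % make sure the output root exists
end

filelist = dir(fullfile(DirIn, '*.wav'));   % struct array, no eval needed
for i = 1:numel(filelist)
    Filename = filelist(i).name;            % char vector
    if numel(Filename) < 8
        continue;                           % too short to contain characters 7-8
    end
    code = Filename(7:8);                   % e.g. '01'
    target = fullfile(DirOut, code);        % e.g. <DirIn>/sorted/01
    if ~exist(target, 'dir')
        mkdir(target);                      % create the subfolder first
    end
    movefile(fullfile(filelist(i).folder, Filename), target);
end
```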

Extracting certain part of a string using strtok

I'm trying to extract part of a string using strtok(), but I don't get the complete output.
For input:
string = '3_5_2_spd_20kmin_corrected_1_20190326.txt';
Output:
>> strtok(string)
ans =
'3_5_2_spd_20kmin_corrected_1_20190326.txt'
>> strtok(string,'.txt')
ans =
'3_5_2_spd_20kmin_correc'
>> strtok(string,'0326')
ans =
'_5_'
>> strtok(string,'2019')
ans =
'3_5_'
I expect the output 3_5_2_spd_20kmin_corrected_1_20190326, but the actual output was 3_5_2_spd_20kmin_correc. Why is that and how can I get the correct output?
strtok treats every character inside the second input argument as a separate delimiter.
For example, when calling:
strtok("3_5_2_spd_20kmin_corrected_1_20190326.txt",'.txt')
MATLAB treats '.', 't', and 'x' as separate delimiters, so it splits your input at the first t it encounters and returns 3_5_2_spd_20kmin_correc.
In your other example using '2019', '2019' is again not a single delimiter but a set of delimiters: '2', '0', '1', and '9'. The first of these encountered in the string (left to right) is the '2' right after '3_5_', which is why it returns '3_5_'.
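A quick way to convince yourself of this is to scramble the delimiter characters: strtok gives the same token and remainder whether you pass '.txt' or 'xt.'. The second output (the remainder) starts at the delimiter where the split happened.

```matlab
s = '3_5_2_spd_20kmin_corrected_1_20190326.txt';

% '.txt' is treated as the character set {'.', 't', 'x'}, so the order
% and repetition of the characters do not matter:
[tok1, rem1] = strtok(s, '.txt');
[tok2, rem2] = strtok(s, 'xt.');
% tok1 and tok2 are both '3_5_2_spd_20kmin_correc';
% rem1 and rem2 both start at the first 't' of 'corrected'.
```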
To achieve your expected output, I think you would be better off using strsplit instead:
result = strsplit(string,".txt");
result{1}
extractBefore does what you're looking to do:
>> string = '3_5_2_spd_20kmin_corrected_1_20190326.txt';
>> extractBefore(string,'.txt')
ans =
'3_5_2_spd_20kmin_corrected_1_20190326'
If your strings are file names/paths, and your goal is to extract the file name without extension, the best option would be to use fileparts, like so:
>> str = '3_5_2_spd_20kmin_corrected_1_20190326.txt';
>> [~, name] = fileparts(str)
name =
'3_5_2_spd_20kmin_corrected_1_20190326'
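If you have many such file names in a cell array, a regexprep sketch (a swapped-in technique, not one of the answers above) strips the last extension from every element in one call. The extra names below are made up for illustration. Unlike fileparts, this pattern knows nothing about folders, so use it on bare file names.

```matlab
names = {'3_5_2_spd_20kmin_corrected_1_20190326.txt', ...
         'run2.v1.txt', ...
         'no_extension'};
% '\.[^.]*$' = a dot followed by non-dot characters up to the end of the
% text, so only the last extension is removed.
baseNames = regexprep(names, '\.[^.]*$', '');
```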

Extract specific column information from table in MATLAB

I have several *.txt files with 3 columns of information. Here is an example of one file:
namecolumn1 namecolumn2 namecolumn3
#----------------------------------------
name1.jpg someinfo1 name
name2.jpg someinfo2 name
name3.jpg someinfo3 name
othername1.bmp info1 othername
othername2.bmp info2 othername
othername3.bmp info3 othername
From column 1 ("namecolumn1"), I would like to extract only the names that start with name.
My code looks like this:
file1 = fopen('test.txt','rb');
c = textscan(file1,'%s %s %s','Headerlines',2);
tf = strcmp(c{3}, 'name');
info = c{1}{tf};
The problem is that disp(info) shows only the first entry from the table, name1.jpg, but I would like to have all of them:
name1.jpg
name2.jpg
name3.jpg
You're pretty much there. What you're seeing is an example of MATLAB's comma-separated list: c{1}{tf} returns each value separately, and an assignment to a single variable keeps only the first one.
You can verify this by entering c{1}{tf} in the command line after running your script, which returns:
>> c{1}{tf}
ans =
name1.jpg
ans =
name2.jpg
ans =
name3.jpg
You could concatenate them, but for character arrays the result is harder to work with than the cell array you get by indexing with parentheses:
>> info = [c{1}{tf}]
info =
name1.jpgname2.jpgname3.jpg
versus
>> info = c{1}(tf)
info =
'name1.jpg'
'name2.jpg'
'name3.jpg'
The former would require you to reshape the result (and whitespace pad, if the strings are different lengths), whereas you can index the strings in a cell array directly without having to worry about any of that (e.g. info{1}).
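A runnable sketch of the fixed version, assuming a test file that mirrors the example in the question (it is written to a temporary file here, so the file name is hypothetical). It also closes the file after reading, and shows an alternative filter on the prefix of column 1 itself using strncmp, in case that is closer to what you want.

```matlab
% Hypothetical test file mirroring the example in the question.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'namecolumn1 namecolumn2 namecolumn3\n');
fprintf(fid, '#----------------------------------------\n');
fprintf(fid, 'name1.jpg someinfo1 name\nname2.jpg someinfo2 name\nname3.jpg someinfo3 name\n');
fprintf(fid, 'othername1.bmp info1 othername\nothername2.bmp info2 othername\n');
fclose(fid);

fid = fopen(fname, 'r');
c = textscan(fid, '%s %s %s', 'HeaderLines', 2);
fclose(fid);                         % close the file once it has been read

tf   = strcmp(c{3}, 'name');         % rows whose 3rd column is 'name'
info = c{1}(tf);                     % () keeps a cell array: {'name1.jpg'; ...}
disp(info)

% Alternative: filter on the start of column 1 instead of column 3.
info2 = c{1}(strncmp(c{1}, 'name', 4));
```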

Extracting strings from cells in MATLAB [duplicate]

Possible Duplicate:
Using regexp to find a word
I'm working on an assignment for my CS course.
We're given a plain text file, which, in my case, contains a series of tweets.
What I need to do is create a script that detects hashtags and then saves each hashtag into a cell array.
So far I know how to detect the '#' symbol:
strfind(textRead{i},'#');
inside a for loop over i = 1:30 (the number of cells of text). Beyond that, I'm at a loss as to how to detect the '#' and return the text between it and the next ' ' (space) character.
Try this:
str = '#someHashtag other tweet text ignore #random';
regexp(str, '#\w+', 'match')   % \w = letters, digits and underscore
I think you'll be able to find the rest out yourself :)
Here is a basic skeleton, but make sure you use the correct regexp to extract the values ;-)
With Dorin's regexp and 'match' above, you get one value at a time. You can also use tokens, as in this example from MathWorks.
Sample:
str = ['if <code>A </code> == x<sup>2 </sup>, ' ...
       '<em>disp(x) </em>']
str = if <code>A </code> == x<sup>2 </sup>, <em>disp(x) </em>
expr = '<(\w+).*?>.*?</\1>';
[tok mat] = regexp(str, expr, 'tokens', 'match');
tok{:}
ans = 'code'
ans = 'sup'
ans = 'em'
In the code above you don't need a loop: you can process the entire text as one string, as long as it fits in memory.
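As a sketch of processing the whole text in one go (and of the token idea), assuming the tweets live in a plain text file; the file and its contents below are made up for the demo. '#\w+' matches letters, digits, and underscores after '#'. With 'match' the results include the '#', while the token (the part in parentheses) keeps only the tag itself.

```matlab
% Hypothetical tweets file so the sketch runs as-is.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Loving the #sunshine today\nNo tags here\n#cs101 homework done #tired\n');
fclose(fid);

txt = fileread(fname);                        % whole file as one char vector

% With 'match', each result includes the '#'.
withHash = regexp(txt, '#\w+', 'match');      % {'#sunshine','#cs101','#tired'}

% With a token, only the part inside the parentheses is kept.
tok  = regexp(txt, '#(\w+)', 'tokens');       % {{'sunshine'},{'cs101'},{'tired'}}
tags = [tok{:}];                              % flatten to {'sunshine','cs101','tired'}
```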
But if you want or need to loop, you can use the following sample, based on Rody's regexp with 'match'.
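fid = fopen('data.txt');
hashtags = {};                              % collects every hashtag found
dataText = fgetl(fid);                      % read the first line
while ischar(dataText)                      % fgetl returns -1 at end of file
    X = regexp(dataText, '#\w+', 'match');  % hashtags on this line
    hashtags = [hashtags, X];               % append them to the result
    dataText = fgetl(fid);                  % read the next line
end
disp(hashtags)
fclose(fid);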