Extract or import two specific values from a text file - MATLAB

I am working on a problem in MATLAB in which I need to import two specific values (highlighted) from a text file, as shown in Figure 1.
The corresponding .txt file is attached at the following link (Link).

When you want to extract specific values from a text file you should always consider using regular expressions.
For getting the two values you highlighted you can use:
raw = fileread('M7_soil_FN_1.txt');
val = regexp(raw,'(\d+)\s+(\d+\.\d+)\s+(?=NPTS)','tokens')
The regular expression says:
(\d+) Match digits and capture.
\s+ Match whitespace.
(\d+\.\d+) Match and capture digits, a full stop, more digits.
\s+ Match whitespace.
(?=NPTS) Positive lookahead to ensure that what follows is NPTS.
Then convert val to double:
val = str2double(val{:})
>>val(1)
5991
>>val(2)
0.0050
If you are interested, you can see the regular expression in action live here.
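For anyone who wants to try the pattern without the attached file, here is a minimal sketch on an in-memory string. The header text below is a made-up stand-in (the real file layout may differ), and the variable names npts and dt are only illustrative. The 'once' option returns the tokens of the first match directly, and the isempty check avoids a confusing error when nothing matches.

```matlab
% Hypothetical stand-in for the file contents; the real header may differ.
raw = sprintf('header line\n   5991    0.0050   NPTS, DT\n0.1 0.2 0.3');

% 'once' returns the tokens of the first match as a 1x2 cell of char vectors.
tok = regexp(raw, '(\d+)\s+(\d+\.\d+)\s+(?=NPTS)', 'tokens', 'once');
if isempty(tok)
    error('No match: check the pattern against the file header.');
end

val  = str2double(tok);   % 1x2 double
npts = val(1);            % illustrative name for the first value
dt   = val(2);            % illustrative name for the second value
```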

Hopefully this will work.
delimiter = ' ';
startRow = 7;
fileID = fopen(filename,'r'); % filename: path to the text file
formatSpec = '%f%f%[^\n\r]';
% Skip startRow-1 header lines, then apply the format exactly once (one line)
dataArray = textscan(fileID, formatSpec, 1, 'Delimiter', delimiter, 'HeaderLines', startRow-1, 'ReturnOnError', false, 'MultipleDelimsAsOne', 1, 'EndOfLine', '\r\n');
fclose(fileID);
val1 = dataArray{1,1};
val2 = dataArray{1,2};
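If the textscan options feel heavy, the same idea (skip to a known line, then read two numbers) can be sketched with fgetl and sscanf instead. This is a different technique from the textscan call above, not a drop-in replacement. The temporary file and its layout (values on line 3) are assumptions made purely so the example runs on its own.

```matlab
% Build a small stand-in file (hypothetical layout: values on line 3).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'title\nunits\n 5991 0.0050 NPTS, DT\n1 2 3\n');
fclose(fid);

targetRow = 3;                 % line that holds the two values
fid = fopen(fname, 'r');
for k = 1:targetRow-1
    fgetl(fid);                % discard the lines before the target row
end
vals = sscanf(fgetl(fid), '%f', 2);   % first two numbers on that line
fclose(fid);
delete(fname);                 % clean up the stand-in file
```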

Related

How to calculate the number of appearance of each letter(A-Z ,a-z as well as '.' , ',' and ' ' ) in a text file in matlab?

How can I go about doing this? So far I've opened the file like this
fileID = fopen('hamlet.txt','r');
[A,count] = fscanf(fileID, '%s');
fclose(fileID);
Getting spaces from the file
First, if you want to capture spaces, you'll need to change your format specifier. %s reads only non-whitespace characters.
>> fileID = fopen('space.txt','r');
>> A = fscanf(fileID, '%s');
>> fclose(fileID);
>> A
A = Thistexthasspacesinit.
Instead, we can use %c:
>> fileID = fopen('space.txt','r');
>> A = fscanf(fileID, '%c');
>> fclose(fileID);
>> A
A = This text has spaces in it.
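The same difference can be checked without a file, since sscanf uses the same format specifiers as fscanf. A quick sketch (the sample sentence is just the one from the output above):

```matlab
txt = 'This text has spaces in it.';

no_spaces   = sscanf(txt, '%s');   % %s skips whitespace between fields
with_spaces = sscanf(txt, '%c');   % %c reads every character, spaces included
```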
Mapping between characters and values (array indices)
We could create a character array that contains all of the target characters to look for:
search_chars = ['A':'Z', 'a':'z', ',', '.', ' '];
That would work, but to map the character to a position in the array you'd have to do something like:
>> char_pos = find(search_chars == 'q')
char_pos = 43
You could also use containers.Map, but that seems like overkill.
Instead, let's use the ASCII value of each character. For convenience, we'll use only values 1:126 (0 is NUL, and 127 is DEL. We should never encounter either of those.) Converting from characters to their ASCII code is easy:
>> c = 's'
c = s
>> a = uint8(c) % MATLAB actually does this using double(). Seems wasteful to me.
a = 115
>> c2 = char(a)
c2 = s
Note that by doing this, you're counting characters that are not in your desired list like ! and *. If that's a problem, then use search_chars and figure out how you want to map from characters to indices.
Looping solution
The most intuitive way to count each character is a loop. For each character in A, find its ASCII code and increment the counter array at that index.
char_count = zeros(1, 126);
for current_char = A
c = uint8(current_char);
char_count(c) = char_count(c) + 1;
end
Now you've got an array of counts for each character with ASCII codes from 1 to 126. To find out how many instances of 's' there are, we can just use its ASCII code as an index:
>> char_count(115)
ans = 4
We can even use the character itself as an index:
>> char_count('s')
ans = 4
Vectorized solution
As you can see with that last example, MATLAB's weak typing makes characters and their ASCII codes pretty much equivalent. In fact:
>> 's' == 115
ans = 1
That means that we can use implicit expansion (MATLAB's name for broadcasting, available since R2016b) and == to create a logical 2D array where L(c,a) == 1 if character c in our string A has an ASCII code of a. Then we can get the count for each ASCII code by summing along the columns.
L = (A.' == [1:126]);
char_count = sum(L, 1);
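As a sanity check, here is a small sketch that runs the loop and the vectorized version on the same short string and confirms they agree. The sample string is only a stand-in for the file contents, and the comparison assumes every character has a code between 1 and 126.

```matlab
A = 'This text has spaces in it.';   % stand-in for the file contents

% Loop version: one increment per character.
char_count_loop = zeros(1, 126);
for current_char = A
    k = double(current_char);
    char_count_loop(k) = char_count_loop(k) + 1;
end

% Vectorized version: column of characters compared against a row of codes
% (relies on implicit expansion, R2016b or later).
char_count_vec = sum(A.' == 1:126, 1);

same_result = isequal(char_count_loop, char_count_vec);
```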
A one-liner
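Just for fun, I'll show one more way to do this: histcounts. This is meant to put values into bins, but as we said before, characters can be treated like values. Note that N+1 edges define N bins and the last bin includes its right edge, so the edges must run to 127 to get a separate bin for each of the 126 codes.
char_count = histcounts(double(A), 1:127);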
There are dozens of other possibilities, for instance you could use the search_chars array and ismember(), but this should be a good starting point.
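For completeness, here is a rough sketch of the search_chars plus ismember() idea, which counts only the characters in the target list. The sample string is a stand-in for the file contents, and accumarray is my choice for the tallying step. Other approaches would work just as well.

```matlab
A = 'To be, or not to be.';                       % stand-in for the file contents
search_chars = ['A':'Z', 'a':'z', ',', '.', ' '];

% idx(k) is the position of A(k) in search_chars, or 0 if it is not a target.
[is_target, idx] = ismember(A, search_chars);

% Tally the positions; the result has one count per entry of search_chars.
counts = accumarray(idx(is_target).', 1, [numel(search_chars), 1]).';

n_o     = counts(search_chars == 'o');   % count for a single character
n_space = counts(search_chars == ' ');
```

Characters outside the list (such as ! or *) are simply ignored here, which addresses the caveat about the ASCII-index approach.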
With [A,count] = fscanf(fileID, '%s'); you only get one overall count, not a count per letter. You can use regexp instead: it searches for each character you specify and returns a cell array in which each cell holds the indices where that character occurs. Counting the indices in each cell then gives the count for each character:
fileID = fopen('hamlet.txt','r');
A = fscanf(fileID, '%c'); % %c keeps whitespace, so spaces can be counted too
fclose(fileID);
indexCellArray = regexp(A,{'A','B','C','D',... % add the remaining letters in the same way
'a','b','c','d',...
',','\.',' '}); % the dot is escaped so it matches a literal '.'
letterCount = cellfun(@numel,indexCellArray);
It may help to put the counts in a struct with one field name per letter, otherwise you might lose track of which count belongs to which letter.
There may be a much easier solution, since listing every letter in the regexp call is tedious, but it works.

Blank cells while reading substrings and numbers from within a string with textscan

I have a text file that consists of line after line of data in an xml-like format like this:
<item type="newpoint1" orient_zx="0.8658983248810842" orient_zy="0.4371062806139187" orient_zz="0.2432245678709263" electrostatic_force_x="0" electrostatic_force_y="0" electrostatic_force_z="0" cust_attr_HMTorque_0="0" cust_attr_HMTorque_1="0" cust_attr_HMTorque_2="0" vel_x="0" vel_y="0" vel_z="0" orient_xx="-0.2638371745169712" orient_xy="-0.01401379799313232" orient_xz="0.9644654264455047" pos_x="0" cust_attr_BondForce_0="0" pos_y="0" cust_attr_BondForce_1="0" pos_z="0.16" angvel_x="0" cust_attr_BondForce_2="0" angvel_y="0" id="1" angvel_z="0" charge="0" scaling_factor="1" cust_attr_BondTorque_0="0" cust_attr_BondTorque_1="0" cust_attr_BondTorque_2="0" cust_attr_Damage_0="0" orient_yx="0.4249823952954215" cust_attr_HMForce_0="0" cust_attr_Damage_1="0" orient_yy="-0.8993006799250595" cust_attr_HMForce_1="0" orient_yz="0.1031903618333235" cust_attr_HMForce_2="0" />
I'm only interested in the values within the " " so I'm trying to read this with textscan. To do this I take the first line and use a regex find/replace to swap all numbers for %f and all strings for %s, like this:
expression = '"[-+]?\d*\.?\d*"';
expression2 = '"\w*?"';
newStr = regexprep(firstline,expression,'"%f"');
FormatString = sprintf('%s',regexprep(newStr,expression2,'"%s"'));
Then I re-open the file and read it with this format string using the following call:
while ~feof(InputFile) % Read all lines in file
data = textscan(InputFile,FormatString,'delimiter','\n');
end
But all I get is an array of empty cells. I can't see what my mistake is - can someone point me in the right direction?
Clarification:
MathWorks provides the following example for textscan to remove literal text, which is what I'm trying to do.
"Remove the literal text 'Level' from each field in the second column of the data from the previous example."
filename = fullfile(matlabroot,'examples','matlab','scan1.dat');
fileID = fopen(filename);
C = textscan(fileID,'%s Level%d %f32 %d8 %u %f %f %s %f');
fclose(fileID);
C{2}
Ok, after looking at this with some fresh eyes today I spotted my problem.
newStr = regexprep(firstline,expression,'"%f"');
FormatString = sprintf('%s',regexprep(newStr,expression2,'%q'));
data = textscan(InputFile,FormatString,'delimiter',' ');
The replacement for the strings needed to be switched to the %q option, which allows a string within double quotes to be read, and the delimiter for textscan needed to be reverted to a single space. The code is working fine now.
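As an alternative to building a format string, the attribute values can also be pulled out with regexp tokens. This is only a sketch of a different technique, not what the code above does. The shortened sample line is taken from the data in the question, and the variable names are illustrative.

```matlab
% Shortened sample line from the question.
txt = '<item type="newpoint1" orient_zx="0.8658983248810842" pos_z="0.16" id="1" />';

% Each match yields two tokens: the attribute name and the quoted value.
tok = regexp(txt, '(\w+)="([^"]*)"', 'tokens');
tok = vertcat(tok{:});                 % N-by-2 cell array

names      = tok(:, 1);                % attribute names
raw_values = tok(:, 2);                % values as text
num_values = str2double(raw_values);   % NaN where the value is not numeric
```

Keeping the names alongside the values means the result does not depend on the attribute order, which varies in files like this.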

Matlab: Parsing strings

How can I turn strings like 1-14 into 01_014? (and for 2-2 into 02_002)?
I can do something like this:
testpoint_number = '5-16';
temp = textscan(testpoint_number, '%s', 'delimiter', '-');
temp = temp{1};
first_part = temp{1};
second_part = temp{2};
output_prefix = strcat('0',first_part);
parsed_testpoint_number = strcat(output_prefix, '_',second_part);
parsed_testpoint_number
But I feel this is very tedious, and I don't know how to handle the second part (16 to 016)
As you are handling integers, I would suggest changing the textscan format to %d (integer numbers). With that, you can use the formatting power of the *printf commands (e.g. sprintf).
*printf lets you specify the width of the integer. With %02d, the integer is printed at least 2 characters wide and padded with leading zeros.
textscan returns a {1x1} cell, which contains a 2x1 array of integers. *printf can handle this itself, so you just have to supply the argument temp{1}:
temp = textscan(testpoint_number, '%d', 'delimiter', '-');
parsed_testpoint_number = sprintf('%02d_%03d',temp{1});
Your textscan approach is probably the most intuitive way to do this, but from there I would recommend converting the scanned first_part and second_part into numeric format, giving you integers.
You can then sprintf these into your target string using the appropriate C-style format specifiers to set the zero-padded field width, e.g.:
temp = textscan(testpoint_number, '%d', 'delimiter', '-');
parsed_testpoint_number = sprintf('%02d_%03d', temp{1});
Take a look at the C sprintf() documentation for an explanation of the string formatting options.
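A slightly shorter variant, sketched here with sscanf instead of textscan (a swapped-in function, so treat it as one option among several): the format '%d-%d' reads the two integers around the dash, and sprintf pads them as before.

```matlab
testpoint_number = '5-16';

nums = sscanf(testpoint_number, '%d-%d');               % 2x1 double: [5; 16]
parsed_testpoint_number = sprintf('%02d_%03d', nums);   % zero-padded widths 2 and 3

% The same two calls applied to the other examples from the question.
parsed_a = sprintf('%02d_%03d', sscanf('1-14', '%d-%d'));
parsed_b = sprintf('%02d_%03d', sscanf('2-2',  '%d-%d'));
```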

Textscan Matlab ; Doesn't read the format

I have a file in the following format:
**400**,**100**::400,descendsFrom,**76**::0
**400**,**119**::400,descendsFrom,**35**::0
**400**,**4**::400,descendsFrom,**45**::0
...
...
Now I need to read only the parts in bold. I've written the following formatspec:
formatspec = '%d,%d::%*d,%*s,%d::%*d\n';
data = textscan(fileID, formatspec);
It doesn't seem to work. Can someone tell me what's wrong?
I also need to know how to 'not use' delimiter, and how to proceed if I want to express the exact way my file is written in, for example in the case above.
EDITED
A possible problem is the %s part of the formatspec variable. Because %s matches an arbitrary string, the descendsFrom,76::0 part of the line is assigned to this string. So with the formatspec '%d,%d::%d,%s,%d::%d\n' you will get the following cells from the first line:
400 100 400 'descendsFrom,76::0'
To solve this problem you have two possibilities:
formatspec = '%d,%d::%d,descendsFrom,%d::%d\n';
OR
formatspec = '%d,%d::%d,%12s,%d::%d\n';
In the first case the 'descendsFrom' string has to be present in each row (as in your example). In the second case the string can vary, but its length must be 12.
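The literal-text idea from the first option can be tried on an in-memory string. Note that this sketch uses sscanf rather than textscan, and assumes the file contains plain numbers (no asterisks) and always the word descendsFrom. The two sample lines are taken from the question.

```matlab
% Two sample lines from the question, without the bold markers.
txt = sprintf('400,100::400,descendsFrom,76::0\n400,119::400,descendsFrom,35::0\n');

% Literal characters in the format must match the input exactly; %d skips
% the newline before the next line, so the format is reused for every row.
data = sscanf(txt, '%d,%d::%d,descendsFrom,%d::%d', [5 Inf]).';

wanted = data(:, [1 2 4]);   % keep only the three columns of interest
```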
Your delimiter is ",", so you should first split on it and then maybe run a regex. Here is how I would go about it:
fileID = fopen('file.csv');
D = textscan(fileID,'%s %s %s %s ','Delimiter',','); %read everything as strings
fclose(fileID);
column1 = regexprep(D{1},'\*','') % * is a regex quantifier, so it is escaped
column2 = regexprep(D{2},{'\*',':'},{'',''})
column3 = D{3}
column4 = regexprep(D{4},{'\*',':'},{'',''})
This should generate your 4 columns, which you can then combine.
I believe the Delimiter can only be one symbol. A more efficient way is to run regexprep directly on your entire line, which would generate:
test = '**400**,**4**::400,descendsFrom,**45**::0'
test = regexprep(test,{'\*',':'},{'',''})
>> test = 400,4400,descendsFrom,450
You can do multiple delimiters in textscan, they need to be supplied as a cell array of strings. You don't need the end of line character in the format, and you need to set 'MultipleDelimsAsOne'. Don't have MATLAB to hand but something along these lines should work:
formatspec = '%d %d %*d %*s %d %*d';
data = textscan(fileID, formatspec,'Delimiter',{',',':'},'MultipleDelimsAsOne',1);
If you want to return it as a matrix of numbers not a cell array, try adding also the option 'CollectOutput',1

Nested textscan statements

The following two statements read the first line from an input file (fid) and parse said line into strings delimited by whitespace.
a = textscan(fid,'%s',1,'Delimiter','\n');
b = textscan(a{1}{1},'%s');
I would like to know if this action can be accomplished in a single statement, having a form similar to the following (which is syntactically invalid).
b = textscan(textscan(fid,'%s',1,'Delimiter','\n'),'%s');
Thanks.
Instead of
a = textscan(fid, '%s', 1, 'Delimiter', '\n');
you can use
a = fgetl(fid);
That will return the next line in fid as a string (the newline character at the end is stripped). You can then split that line into white-space separated chunks as follows:
b = regexp(a, '\s+', 'split');
Combined:
b = regexp(fgetl(fid), '\s+', 'split');
Note that this is not 100% equivalent to your code, since using textscan adds another cell layer (representing different lines in the file). That's not a problem, though; simply use
b = {regexp(fgetl(fid), '\s+', 'split')};
if you need that extra cell-layer.
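One caveat worth sketching: if the line starts or ends with whitespace, a regexp split returns empty strings at the ends, which textscan with '%s' would not. Trimming first avoids that. The string below is a made-up stand-in for what fgetl(fid) would return.

```matlab
a = sprintf('  alpha beta\tgamma  ');    % stand-in for fgetl(fid)

% strtrim removes leading/trailing whitespace so no empty pieces appear.
b = regexp(strtrim(a), '\s+', 'split');  % 1x3 cell array of char vectors
```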