Retrieving the content of a text file in Matlab - matlab

I have a .txt file which includes 300 lines. For example the first line is:
ANSWER: correct: yes, time: 6.880674, guess: Lay, action: Lay, file: 16
or the second line is:
ANSWER: correct: no, time: 7.150422, guess: Put on top, action: Stir, file: 18
Only the 'time' and 'file' values are numbers; the others are strings.
I want to store the values of "correct", "time", "guess", "action" and "file" from all 300 lines in separate variables (such as arrays).
How can I do this in Matlab?

Option 1:
You can use textscan with the following formatSpec:
formatSpec = 'ANSWER: correct:%s time:%f guess:%s action:%s file: %f';
data = textscan(fileID,formatSpec,'Delimiter',',');
where fileID is the file identifier obtained by fopen.
Option 2:
Another option is to use readtable, with the formatting above (directly with the file name, no fileID):
data = readtable('53485991.txt','Format',formatSpec,'Delimiter',',',...
'ReadVariableNames',false);
% the next lines are just to give the table variables some meaningful names:
varNames = strsplit(formatSpec,{'ANSWER',':','%s',' ','%f'});
data.Properties.VariableNames = varNames(2:end-1);
The result (ignore the values, as I messed that example a little bit while playing with it):
data =
4×5 table
correct time guess action file
_______ ______ _______________ ______ ____
'yes' 6.8888 'Lay' 'Lay' 16
'no' 7.8762 'Put on top' 'Stir' 18
'no' 7.1503 'Put on bottom' 'Stir' 3
'no' 7.151 'go' 'Stir' 270
The advantage of option 2 is that a table is a much more convenient way to hold these data than a cell array (which is what textscan returns).

Use fgetl to read one line of the file at a time, and a while loop to go through all of the lines.
For each line, use regexp with the 'split' option to break the string into cells at the : and , delimiters. Then use strip to remove leading and trailing whitespace from each cell.
Here is the solution:
f = fopen('a.txt');
aline = fgetl(f);
i = 1;
while ischar(aline)
content = strip(regexp(aline,':|,','split'));
correct{i} = content{3};
time(i) = str2double(content{5});
guess{i}= content{7};
action{i} = content{9};
file(i) = str2double(content{11});
i = i + 1;
aline = fgetl(f);
end
fclose(f);
Example:
Suppose a.txt file looks like this
ANSWER: correct: yes, time: 6.880674, guess: Lay, action: Lay, file: 16
ANSWER: correct: no, time: 7.150422, guess: Put on top, action: Stir, file: 18
After executing the script, the results are
correct =
1×2 cell array
'yes' 'no'
time =
6.8807 7.1504
guess =
1×2 cell array
'Lay' 'Put on top'
action =
1×2 cell array
'Lay' 'Stir'
file =
16 18
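As a variation on the loop above, the same fields can be pulled out with named tokens in regexp, which avoids counting positions in the split result. This is only a sketch: the two sample lines stand in for the contents of a.txt, and the pattern assumes that no field value contains a comma.

```matlab
% Hypothetical sample lines standing in for the contents of a.txt
sampleLines = { ...
    'ANSWER: correct: yes, time: 6.880674, guess: Lay, action: Lay, file: 16', ...
    'ANSWER: correct: no, time: 7.150422, guess: Put on top, action: Stir, file: 18'};

% One named token per field; [^,]* assumes values never contain a comma
pattern = ['correct:\s*(?<correct>[^,]*),\s*time:\s*(?<time>[^,]*),' ...
           '\s*guess:\s*(?<guess>[^,]*),\s*action:\s*(?<action>[^,]*),' ...
           '\s*file:\s*(?<file>\S+)'];

tokens = regexp(sampleLines, pattern, 'names', 'once'); % cell array of scalar structs
rec = [tokens{:}];                                      % 1-by-N struct array

correct = {rec.correct};            % cell array of strings
time    = str2double({rec.time});   % numeric row vector
guess   = {rec.guess};
action  = {rec.action};
file    = str2double({rec.file});
```

For a real file, the lines could first be collected with the fgetl loop shown above and then passed to regexp in one call.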

Related

Reading rows of data till its end

I am trying to read the data in column2 until the row ends. I don't know the number of rows of data, but it is finite ( < 100). The columns are space separated:
header line 1
header line 2
header line 3
column1 column2
1 2
3 5
5 7
7 9
. .
. .
. .
header line 4
header line 5
I tried the following code. It works if there are no further header lines:
mydata = dlmread('data.txt', '', 4, 1)
How does it work with further header lines after the data rows as shown above ends?
An easier solution is to use textscan to read your file. You can specify the number of header lines as an extra argument to the function call. The additional lines at the end of your file are ignored by the function when you specify the correct conversion specifier.
fileID = fopen('data.txt');
mydata = textscan(fileID,'%d%d','HeaderLines',4);
fclose(fileID);
mydata{2} contains data from column 2.
An easy approach is to read the file as text, drop the header lines, and write the remaining lines to a new file.
% Read the file
fid = fopen(filePath,'r');
str = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
% Extract number lines
str2 = str{1}(5:end-2);
% Save as a text file
fid2 = fopen('new.txt','w');
fprintf(fid2,'%s\n', str2{:});
fclose(fid2);
mydata = dlmread('new.txt','',0,1);
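If the number of header lines before or after the data is not known in advance, the numeric rows can be picked out by pattern instead of by position. The sketch below assumes that every data row consists of exactly two integers separated by whitespace; the cell array rawLines is a hypothetical stand-in for the lines read from the file (for example str{1} from the code above).

```matlab
% Hypothetical file content, one cell per line
rawLines = {'header line 1'; 'header line 2'; 'header line 3'; ...
            'column1 column2'; '1 2'; '3 5'; '5 7'; '7 9'; ...
            'header line 4'; 'header line 5'};

% Keep only lines made of exactly two integers (assumption about the data)
isData = ~cellfun(@isempty, regexp(rawLines, '^\s*-?\d+\s+-?\d+\s*$', 'once'));

% Convert each remaining line to a 1-by-2 numeric row and stack the rows
rows = cellfun(@(s) sscanf(s, '%f').', rawLines(isData), 'UniformOutput', false);
values = vertcat(rows{:});

column2 = values(:, 2);
```

Rows with decimal numbers would need a wider pattern than the one assumed here.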

writetable without dimension names

I'm trying to write a CSV from a table using writetable with writetable(..., 'WriteRowNames', true), but when I do so, Matlab defaults to putting Row in the (1,1) cell of the CSV. I know I can change Row to another string by setting myTable.Properties.DimensionNames{1} but I can't set that to be blank and so it seems like I'm forced to have some text in that (1,1) cell.
Is there a way to leave the (1,1) element of my CSV blank and still write the row names?
There doesn't appear to be any way to set any of the character arrays in the 'DimensionNames' field to either empty or whitespace. One option is to create your .csv file as you do above, then use xlswrite to clear that first cell:
xlswrite('your_file.csv', {''}, 1, 'A1');
Even though the xlswrite documentation states that the file argument should be a .xls, it still works properly for me.
Another approach could use memmapfile to modify the leading bytes of the file in memory.
For example:
% Set up data
LastName = {'Smith';'Johnson';'Williams';'Jones';'Brown'};
Age = [38;43;38;40;49];
Height = [71;69;64;67;64];
Weight = [176;163;131;133;119];
BloodPressure = [124 93; 109 77; 125 83; 117 75; 122 80];
T = table(Age, Height, Weight, BloodPressure, 'RowNames', LastName);
% Write data to CSV
fname = 'asdf.csv';
writetable(T, fname, 'WriteRowNames', true)
% Overwrite row dimension name in the first row
% Use memmapfile to map only the dimension name to memory
tmp = memmapfile(fname, 'Writable', true, 'Repeat', numel(T.Properties.DimensionNames{1}));
tmp.Data(:) = 32; % Change to the ASCII code for a space
clear('tmp'); % Clean up
Which brings us from:
Row,Age,Height,Weight,BloodPressure_1,BloodPressure_2
Smith,38,71,176,124,93
Johnson,43,69,163,109,77
Williams,38,64,131,125,83
Jones,40,67,133,117,75
Brown,49,64,119,122,80
To:
,Age,Height,Weight,BloodPressure_1,BloodPressure_2
Smith,38,71,176,124,93
Johnson,43,69,163,109,77
Williams,38,64,131,125,83
Jones,40,67,133,117,75
Brown,49,64,119,122,80
Unfortunately the name is not quite deleted (its characters are overwritten with spaces rather than removed), but it's a fun approach.
Alternatively, you can use MATLAB's low level file IO to copy everything after the row dimension name to a new file, then overwrite the original:
fID = fopen(fname, 'r');
fID2 = fopen('tmp.csv', 'w');
fseek(fID, numel(T.Properties.DimensionNames{1}), 'bof');
fwrite(fID2, fread(fID));
fclose(fID);
fclose(fID2);
movefile('tmp.csv', fname);
Which produces:
,Age,Height,Weight,BloodPressure_1,BloodPressure_2
Smith,38,71,176,124,93
Johnson,43,69,163,109,77
Williams,38,64,131,125,83
Jones,40,67,133,117,75
Brown,49,64,119,122,80
No, that is currently not supported. The only workaround I see is to use a placeholder as dimension name and to programmatically remove it from the file afterwards.
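A minimal sketch of that workaround, assuming the file is small enough to hold in memory: write the table, read the file back as text, strip the dimension name from the very start, and write the text out again. The table contents and the temporary file name are made up for illustration.

```matlab
% Small hypothetical table with row names
T = table([38; 43], [71; 69], ...
          'VariableNames', {'Age', 'Height'}, ...
          'RowNames', {'Smith'; 'Johnson'});

fname = [tempname '.csv'];                 % temporary file, for illustration only
writetable(T, fname, 'WriteRowNames', true);

% Remove the row dimension name from the start of the file
dimName = T.Properties.DimensionNames{1};
txt = fileread(fname);
txt = regexprep(txt, ['^' regexptranslate('escape', dimName)], '', 'once');

% Write the modified text back
fid = fopen(fname, 'w');
fwrite(fid, txt);
fclose(fid);
```

The '^' anchor only matches at the very beginning of the text, so a row or variable that happens to share the dimension name elsewhere in the file is left alone.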
writetable(T,fileFullPath,'WriteVariableNames',false);
When you specify 'WriteVariableNames' as false (the default is true), the variable/dimension names are NOT written to the output file.
Ref link: https://uk.mathworks.com/help/matlab/ref/writetable.html

concatenate on the third dimension for each subject

I have one file .mat for each condition (4) and each subject (24). So, I have in total 96 files .mat.
Example:
cond1_sbj5_ToProcess_av.mat, cond1_sbj7_ToProcess_av.mat, cond1_sbj10_ToProcess_av.mat, etc.
cond2_sbj5_ToProcess_av.mat, cond2_sbj7_ToProcess_av.mat, cond2_sbj10_ToProcess_av.mat, etc.
cond3_sbj5_ToProcess_av.mat, cond3_sbj7_ToProcess_av.mat, cond3_sbj10_ToProcess_av.mat, etc.
cond4_sbj5_ToProcess_av.mat, cond4_sbj7_ToProcess_av.mat, cond4_sbj10_ToProcess_av.mat, etc.
In each file, depending on the condition, I have a variable that is 66x3000single. AA1 for condition1, AA2 for condition2, AA3 for condition3, AA4 for condition4.
I would like to concatenate on the third dimension AA1,AA2,AA3,AA4 for each subject, in a loop.
So, for each subject I should obtain a 3D matrix/structure with 4 'sheets' as third dimension.
Any suggestion? Thanks a lot.
OK, here is a new attempt, with all previous approaches deleted. The problem was that cond_number was a string, and using a string as an index produced the 52; the load(filename) call was also missing. I tried it with some dummy files and it now works, but only if your subject numbers run from 1 to 24 without gaps (not like 1 5 7 9 19 ...). If they do have gaps, you need to modify the code accordingly. Good luck!
clear all
close all
origindir = 'c:\yourdirectory';
cd (origindir)
av_files = dir(fullfile('*.mat'));
mymatrix = zeros(24,66,3000,4);
for ifile = 1:size(av_files,1)
filename = av_files(ifile).name;
load(filename)
if ~isempty(str2num(filename(10:11)))
sub_number = str2num(filename(10:11));
else
sub_number = str2num(filename(10));
end
cond_number_str = filename(5);
cond_number = str2double(cond_number_str);
varname = strcat('AA',cond_number_str);
mymatrix(sub_number,:,:,cond_number)=eval(sprintf(varname));
end
for sub = 1:24
varname2 = strcat('newmat',num2str(sub));
eval([sprintf('%s = squeeze(mymatrix(%i,:,:,:));',varname2,sub)])
end
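For subject numbers with gaps, one possible variation is to parse both numbers out of the file name with regexp and to store each subject's 66x3000x4 array in a struct field, which also avoids eval. This is a sketch under assumptions: the dummy files, the temporary folder, and the field names such as sbj5 are made up for the demonstration; only the file-name pattern and the AA1 to AA4 variable names come from the question.

```matlab
% Create hypothetical dummy files in a temporary folder (demo only)
workDir = tempname;
mkdir(workDir);
for s = [5 7 10]
    for c = 1:4
        S = struct();
        S.(sprintf('AA%d', c)) = c * ones(66, 3000, 'single');
        save(fullfile(workDir, sprintf('cond%d_sbj%d_ToProcess_av.mat', c, s)), ...
             '-struct', 'S');
    end
end

% Collect the four conditions of each subject along the third dimension
files = dir(fullfile(workDir, 'cond*_sbj*_ToProcess_av.mat'));
result = struct();
for k = 1:numel(files)
    tok = regexp(files(k).name, '^cond(\d+)_sbj(\d+)_', 'tokens', 'once');
    condNum = str2double(tok{1});
    subKey  = ['sbj' tok{2}];                    % e.g. 'sbj10'
    S = load(fullfile(workDir, files(k).name));  % load into a struct, no eval
    if ~isfield(result, subKey)
        result.(subKey) = zeros(66, 3000, 4, 'single');
    end
    result.(subKey)(:, :, condNum) = S.(sprintf('AA%d', condNum));
end

rmdir(workDir, 's');   % remove the demo files
```

After the loop, result.sbj5, result.sbj7 and result.sbj10 each hold one 66x3000x4 array, with the condition number as the index into the third dimension.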

Modify the value of a specific position in a text file in Matlab

AIR, ID
AIR.SIT
50 1 1 1 0 0 2 1
43.57 -116.24 1. 857.7
Hi, All,
I have a text file like the one above. In Matlab, I want to create 5000 text files in which the number "2" (the specific number in the 3rd row) is replaced by 1 to 5000, one value per file, while all other contents stay the same. In every loop iteration, the new number equals the loop number, and the output is saved to a new text file with a name like AIR_LoopNumber.SIT.
I've spent some time on this, but it is kind of difficult for a newbie. Here is what I have:
% - Read source file.
fid = fopen ('Air.SIT');
n = 1;
textline={};
while (~feof(fid))
textline(n,1)={fgetl(fid)};
end
FileName=Air;
% - Replace characters when relevant.
for i = 1 : 5000
filename = sprintf('%s_%d.SIT','filename',i);
end
Anybody can help on finishing the program?
Thanks,
James
If your file is this short, you do not have to read it line by line. Just read the whole thing into one variable, modify only the necessary part of it before each write, then write the full variable back in one go.
%% // read the full file as a long sequence of 'char'
fid = fopen ('Air.SIT');
fulltext = fread(fid,Inf,'char') ;
fclose(fid) ;
%% // add a few blank placeholder (3 exactly) to hold the 4 digits when we'll be counting 5000
fulltext = [fulltext(1:49) ; 32 ; 32 ; 32 ; fulltext(50:end) ] ;
idx2replace = 50:53 ; %// save the index of the characters which will be modified each loop
%% // Go for it
baseFileName = 'AIR_%d.SIT' ;
for iFile = 1:1000:5000
%// build filename
filename = sprintf(baseFileName,iFile);
%// modify the string to write
fulltext(idx2replace) = num2str(iFile,'%04d').' ; %//'
%// write the file
fidw = fopen( filename , 'w' ) ;
fwrite(fidw,fulltext) ;
fclose(fidw) ;
end
This example works with the text in your example, you may have to adjust slightly the indices of the characters to replace if your real case is different.
Also, I set a step of 1000 in the for loop so you can try it and see whether it works without writing thousands of files. When you are satisfied with the result, remove the 1000 step from the for loop.
Edit:
The format specifier %04d I gave in the first solution ensures that the output takes 4 characters, padding any smaller number with zeros (e.g. 23 => 0023). It is sometimes desirable to keep the length constant, and in your particular example it made things easier because the output string has exactly the same length for all the files.
However, it is not mandatory at all. If you do not want the loop number to be padded with zeros, you can use the simple format %d, which only uses the required number of digits.
The side effect is that the output string has a different length for different loop numbers, so we cannot reuse one string for all the iterations; we have to rebuild the string at each iteration. The modifications are as follows. Keep the first paragraph of the solution above as is, and replace the last 2 paragraphs with the following:
%% // prepare the block of text before and after the character to change
textBefore = fulltext(1:49) ;
textAfter = fulltext(51:end) ;
%% // Go for it
baseFileName = 'AIR_%d.SIT' ;
for iFile = 1:500:5000
%// build filename
filename = sprintf(baseFileName,iFile);
%// rebuild the string to write
fulltext = [textBefore ; num2str(iFile,'%d').' ; textAfter ]; %//'
%// write the file
fidw = fopen( filename , 'w' ) ;
fwrite(fidw,fulltext) ;
fclose(fidw) ;
end
Note:
A constant number of characters for the number may not matter inside the file, but it can be very useful in the file names: with names like AIR_0001 ... AIR_0023 ... AIR_0849 ... AIR_4357 etc., the files appear properly ordered in any file explorer window.
If you want your files named with constant-length numbers, then just use:
baseFileName = 'AIR_%04d.SIT' ;
instead of the current line.
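If counting character positions feels fragile, a possible alternative is to split the text into lines and fields and to replace the 7th field of the 3rd line by position. The sketch below is only an illustration: the template text is copied from the question, it assumes single spaces between the fields of line 3, and it keeps the results in a cell array instead of writing files.

```matlab
% Template text from the question (assumes single spaces between fields)
template = sprintf('AIR, ID\nAIR.SIT\n50 1 1 1 0 0 2 1\n43.57 -116.24 1. 857.7\n');
nl = sprintf('\n');

% Split into lines, keeping empty lines so the layout is preserved
fileLines = strsplit(template, nl, 'CollapseDelimiters', false);

loopNumbers = [1 23 5000];               % a few loop numbers to try
outputs = cell(size(loopNumbers));
for k = 1:numel(loopNumbers)
    fields = strsplit(strtrim(fileLines{3}), ' ');
    fields{7} = sprintf('%d', loopNumbers(k));   % the value to replace
    newLines = fileLines;
    newLines{3} = strjoin(fields, ' ');
    outputs{k} = strjoin(newLines, nl);
    % To write a file instead, something like:
    % fidw = fopen(sprintf('AIR_%04d.SIT', loopNumbers(k)), 'w');
    % fwrite(fidw, outputs{k}); fclose(fidw);
end
```

Because the line is rebuilt from its fields, the number can have any width without shifting indices by hand.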

Read specific hex data in CSV file in MATLAB

I've looked through the posts on StackOverflow and can't seem to find the answer I am looking for. I have a large CSV file (450 MB) with hex data that looks like this:
63C000CF,6000002F,603000AF,6000C06F,617300EF,6C7C001F,6000009F,0%,63C000CF...
That is a very truncated example, but basically I have approximately 78 different hex values separated by commas, then there will be the '0%', then 78 more hex values. This will continue for a very long time. I've been using textscan like this:
data = textscan(fid, '%s', 1, 'delimiter', '%');
data = textscan(data{1}{1}, '%s', 'delimiter', ',');
data = data{1};
count = size(data);
outstring = ['%', sprintf('\n')];
for idx = 1:count(1)
string = data{idx};
stringSize = size(string);
if stringSize(2) > 1
outstring = [outstring, string, sprintf('\n')];
end
end
fprintf(output_fid, '%s', outstring)
This allowed me to format the csv file in a way to which I could use fgetl() to analyze whether or not I was looking at the data I needed. Because the data repeats itself, I can use fseek() to jump to the next occurrence before calling fgetl() again.
What I need is a way to skip to the ending. I want to just be able to use something like fgetl() but have it only return the first hex value it encounters. I will know how many bytes to shift through the file. Then I need to make sure I can read other hex values. Is what I'm asking possible? My code using textscan above takes far too long on a csv file that is 90 MB let alone 450 MB.
Answer obtained from user Cedric Wannaz on the Mathworks MATLAB Central Answers page.
NEW solution
Here is a more efficient solution; I am using a 122MB file, so you have an idea about the timing
% One line for reading the whole file. To perform once only.
tic ;
content = fileread( 'adam_1.txt' ) ;
fprintf( 'Time for reading the file : %.2fs\n', toc ) ;
% One line for defining an extraction function. To perform once only.
extract = @(label) content(bsxfun( @plus, ...
strfind( content, [label,','] ).' - 6, ...
0 : 5 )) ;
% Then it is one call per label to extract data.
tic ;
data = extract( 'CF' ) ;
fprintf( 'Time for extracting one label: %.2fs\n', toc ) ;
Running this, I obtain
Time for reading the file : 0.52s
Time for extracting one label: 0.62s
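To see what the indexing in extract does, here is the same idea spelled out step by step on a short string. The sample content is made up from the values in the question, and it assumes, as the extraction function above does, that each entry is 6 hex characters followed by a 2-character label and a comma.

```matlab
% Short hypothetical sample in the same layout as the real file
content = '63C000CF,6000002F,603000AF,6000C06F,0%,617300CF,6C7C001F,';
label = 'CF';

% Position of the first of the 6 characters preceding each 'CF,'
starts = strfind(content, [label, ',']).' - 6;   % column vector

% One row of 6 consecutive indices per match
idx = bsxfun(@plus, starts, 0 : 5);

% Indexing with a matrix returns a char matrix of the same shape
data = content(idx);

% Optional: convert each 6-character row to a number
values = hex2dec(data);
```

Here data is a 2-by-6 char array with the rows '63C000' and '617300', one row per occurrence of the label.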
FORMER solution
Would the following work for you?
% Read file content. To do once only.
content = fileread( 'myFile.txt' ) ;
% Define regexp-based extraction function. To do once only.
getByLabel = @(label) regexp( content, sprintf( '\\w{6}(?=%s)', label ), ...
'match' ) ;
% Get all entries for e.g. label 'CF'.
entries_CF = getByLabel( 'CF' ) ;
% Get all entries for e.g. label '6F'.
entries_6F = getByLabel( '6F' ) ;
I am not completely clear on what you need to achieve ultimately; if I had to design a GUI where users can choose a label and get corresponding data, I would process the data much further during the init phase, e.g. by grouping them by label in a cell array. Regexp is not the most efficient approach in this case I guess, but the principle would be..
labels = {'CF', '6F', 'AF', ..} ;
nLabels = numel( labels ) ;
data = cell( 1, nLabels ) ;
for lId = 1 : nLabels
data{lId} = getByLabel( labels{lId} ) ;
end
and then when a user selects 'CF' ..
lId = strcmpi( label, labels ) ;
dataForThisLabel = data{lId} ;