I have a text file containing a random sequence of 0s and 1s, and I want to read it into MATLAB and obtain each element in an array.
Goal:
I have two text files that I want to compare to see whether they are identical and, if not, how much they differ. The two files are:
1) the original file that I send over a communication line
2) the received file, which should be identical to the sent file
Example of my code:
errorCount = 0;
for i = 1:numel(send)
    if send(i) ~= received(i), errorCount = errorCount + 1; end
end
However, I need to know how to obtain these two arrays from the text files, where all the "0" and "1" characters are on one line.
Since you only want to check that the contents of the two files are the same, you do not need to worry about the format of their contents or the sequence of zeros and ones; the files should simply be identical. You can use the following code to read the entire text file into a char vector:
C = char(join(readlines(filename), ''));
To compare the contents of two files and find the error rate (the fraction of differing characters), you can do the following:
act = char(join(readlines(actualfilename), ''));
exp = char(join(readlines(expectedfilename), ''));
err = (sum(act~=exp))/length(act);
You should also detect whether the two files contain a different number of characters:
act = char(join(readlines(actualfilename), ''));
exp = char(join(readlines(expectedfilename), ''));
al = length(act); % actual length
el = length(exp); % expected length
dl = abs(al-el);
if (dl>0)
ml = min(al, el); % min length
act = act(1:ml); % shorten act if needed
exp = exp(1:ml); % shorten exp if needed
end
err = (sum(act~=exp)+dl)/al % error
Note that in the second case, if a character is added or lost in the middle of the file, all subsequent characters will be counted as errors.
Reading in the Text Files:
If the text file is configured with spaces or line breaks:
Text.txt (line breaks)
0
1
0
1
1
Text.txt (spaces)
1 0 1 0 1 1
The data can be read with the fscanf() function, using the format specification %d to read the file contents as integers.
File_Name = "Text.txt";
File_ID = fopen(File_Name);
Binary = fscanf(File_ID,'%d');
If the text file has the characters beside/concatenated on the same line without spaces:
Text.txt (single line, no spaces)
01011
If the file is read with the format specification %s, its contents are returned as a single string (char vector). This string can be split and converted into a numeric array by using split(), cell2mat() and str2num():
split() - Splits the string into a cell array of individual bits
cell2mat() - Converts the cell array to a character array
str2num() - Converts the character array to a numeric double array
File_Name = "Text.txt";
File_ID = fopen(File_Name);
Binary = fscanf(File_ID,'%s');
Binary = split(Binary,'');
Binary = str2num(cell2mat(Binary(2:end-1))).';
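As an alternative sketch (not part of the answer above), the same conversion can usually be done with character arithmetic, assuming the string contains only the characters '0' and '1'. The variable name bitString and its sample value are made up for illustration.

```matlab
% Hypothetical sample: the char vector that fscanf(File_ID,'%s') would return
bitString = '01011';
% Subtracting the character '0' maps '0' -> 0 and '1' -> 1 (result is double)
Binary = bitString - '0';
```

This yields a row vector; append .' if a column vector is needed to match the %d case.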
Comparing to Evaluate Amount of Errors:
Errors can be counted by comparing the arrays element-wise with a logical condition. The nnz() (number of non-zeros) function then counts how many times the condition is true ("1"). Here the condition is that the two binary signals Binary_1 and Binary_2 are not equal to each other.
Code Snippet:
Error = nnz(Binary_1 ~= Binary_2);
Error
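A small worked sketch of this count, with an error rate added on top. The bit values are invented sample data, and the length check and Error_Rate are additions for illustration, not part of the original snippet.

```matlab
Binary_1 = [0 1 0 1 1 0 1 0];   % hypothetical sent bits
Binary_2 = [0 1 1 1 1 0 0 0];   % hypothetical received bits

% Element-wise comparison requires equally sized arrays
if numel(Binary_1) ~= numel(Binary_2)
    error('Bit streams differ in length; element-wise comparison is not defined.');
end

Error = nnz(Binary_1 ~= Binary_2);       % number of differing positions
Error_Rate = Error / numel(Binary_1);    % fraction of differing positions
```

With this sample data the streams differ at positions 3 and 7.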
Full Script Option 1 (line breaks/spaces text file):
File_Name = "Text_1.txt";
File_ID = fopen(File_Name);
Binary_1 = fscanf(File_ID,'%d');
fclose(File_ID);
File_Name = "Text_2.txt";
File_ID = fopen(File_Name);
Binary_2 = fscanf(File_ID,'%d');
fclose(File_ID);
clearvars -except Binary_1 Binary_2
Error = nnz(Binary_1 ~= Binary_2);
Error
Full Script Option 2 (single line, no spaces text file):
File_Name = "Text_1.txt";
File_ID = fopen(File_Name);
Binary_1 = fscanf(File_ID,'%s');
fclose(File_ID);
Binary_1 = split(Binary_1,'');
Binary_1 = str2num(cell2mat(Binary_1(2:end-1))).';
File_Name = "Text_2.txt";
File_ID = fopen(File_Name);
Binary_2 = fscanf(File_ID,'%s');
Binary_2 = split(Binary_2,'');
Binary_2 = str2num(cell2mat(Binary_2(2:end-1))).';
fclose(File_ID);
clearvars -except Binary_1 Binary_2
Error = nnz(Binary_1 ~= Binary_2);
Error
Ran using MATLAB R2019b
I am generating 2500 values in Matlab in format (time,heart_rate, resp_rate) by using below code
numberOfSeconds = 2500;
time = 1:numberOfSeconds;
newTime = transpose(time);
number0 = size(newTime, 1)
% generating heart rates
heart_rate = 50 +(70-50) * rand (numberOfSeconds,1);
intHeartRate = int64(heart_rate);
number1 = size(intHeartRate, 1)
% hist(heart_rate)
% generating resp rates
resp_rate = 50 +(70-50) * rand (numberOfSeconds,1);
intRespRate = int64(resp_rate);
number2 = size(intRespRate, 1)
% hist(heart_rate)
% joining time and sensor data
joinedStream = strcat(num2str(newTime),{','},num2str(intHeartRate),{','},num2str(intRespRate))
dlmwrite('/Users/amar/Desktop/geenrated/rate.txt', joinedStream,'delimiter','');
The data shown in the console is fine, but when I save it to a .txt file, each line contains extra spaces at the beginning. Hence I am not able to parse the .txt file to generate the input stream. Please help.
Replace the last two lines of your code with the following. No need to use strcat if you want a CSV output file.
dlmwrite('/Users/amar/Desktop/geenrated/rate.txt', [newTime intHeartRate intRespRate]);
The solution suggested in the other answer is the simplest for your case. This answer explains why you get the unexpected output.
The data written in the file is exactly what is shown in the console.
>> joinedStream(1) %The exact output will differ since 'rand' is used
ans =
cell
' 1,60,63'
num2str converts a matrix into a character array, so every row of the result must contain the same number of characters. For each column of the original matrix, the entry with the most characters sets the width, and shorter entries are padded with leading spaces. Columns are separated by 2 spaces. The following smaller example illustrates this:
>> num2str([44, 42314; 4, 1212421])
ans =
2×11 char array
'44    42314'
' 4  1212421'
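If the comma-separated lines are needed as text, one way to avoid the num2str padding is to format the numbers directly with sprintf. This is a minimal sketch with a made-up three-row sample standing in for the 2500-row vectors; the variable names rows and txt are hypothetical.

```matlab
% Hypothetical three-row sample standing in for the generated vectors
newTime      = (1:3).';
intHeartRate = int64([60; 55; 68]);
intRespRate  = int64([63; 51; 70]);

% sprintf walks through its argument column by column, so build a
% 3-by-N matrix in which each column holds one output line
rows = [newTime, double(intHeartRate), double(intRespRate)].';
txt  = sprintf('%d,%d,%d\n', rows);
```

The resulting text has no leading spaces and could be written to a file with fopen, fprintf(fid, '%s', txt) and fclose.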
I have a txt file that I want to read into Matlab. Data format is like below:
term2 2015-07-31-15_58_25_612 [0.9934343, 0.3423043, 0.2343433, 0.2342323]
term0 2015-07-31-15_58_25_620 [12]
term3 2015-07-31-15_58_25_625 [2.3333, 3.4444, 4.5555]
...
How can I read these data in the following way?
name = [term2 term0 term3] or namenum = [2 0 3]
time = [2015-07-31-15_58_25_612 2015-07-31-15_58_25_620 2015-07-31-15_58_25_625]
data = {[0.9934343, 0.3423043, 0.2343433, 0.2342323], [12], [2.3333, 3.4444, 4.5555]}
I tried to use textscan in this way 'term%d %s [%f, %f...]', but for the last data part I cannot specify the length because they are different. Then how can I read it? My Matlab version is R2012b.
Thanks a lot in advance if anyone could help!
There may be a way to do this in a single pass, but for me this kind of problem is easier to solve with a two-pass approach.
Pass 1: Read all the columns with a constant format according to their type (string, integer, etc ...) and read the non constant part in a separate column which will be processed in second pass.
Pass 2: Process your irregular column according to its specificities.
In a case with your sample data, it looks like this:
%% // read file
fid = fopen('Test.txt','r') ;
M = textscan( fid , 'term%d %s %*c %[^]] %*[^\n]' ) ;
fclose(fid) ;
%% // dispatch data into variables
name = M{1,1} ;
time = M{1,2} ;
data = cellfun( @(s) textscan(s,'%f',Inf,'Delimiter',',') , M{1,3} ) ;
What happened:
The first textscan instruction reads the full file. In the format specifier:
term%d read the integer after the literal expression 'term'.
%s read a string representing the date.
%*c ignore one character (to ignore the character '[').
%[^]] read everything (as a string) until it finds the character ']'.
%*[^\n] ignore everything until the next newline ('\n') character (to not capture the last ']').
After that, the first 2 columns are easily dispatched into their own variables. The 3rd column of the result cell array M contains strings of different lengths, each holding a different number of floating point numbers. We use cellfun in combination with another textscan to read the numbers in each cell and return a cell array containing doubles.
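To see the second pass in isolation, here is a minimal sketch that swaps in sscanf for the inner textscan (a substitution made for illustration, not the method used above). The cell array raw is a hypothetical stand-in for the third column M{1,3}.

```matlab
% Hypothetical stand-in for M{1,3}: one string of numbers per file line
raw = {'0.9934343, 0.3423043, 0.2343433, 0.2342323'; ...
       '12'; ...
       '2.3333, 3.4444, 4.5555'};

% '%f,' reads a number followed by an optional comma; sscanf stops at the
% end of the string and returns what it has read so far
data = cellfun(@(s) sscanf(s, '%f,').', raw, 'UniformOutput', false);
```

Each cell of data then holds a row vector whose length matches the number of values on that line.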
Bonus:
If you want your time to be a numeric value as well (instead of a string), use the following extension of the code:
%% // read file
fid = fopen('Test.txt','r') ;
M = textscan( fid , 'term%d %f-%f-%f-%f_%f_%f_%f %*c %[^]] %*[^\n]' ) ;
fclose(fid) ;
%% // dispatch data
name = M{1,1} ;
time_vec = cell2mat( M(1,2:7) ) ;
time_ms = M{1,8} ./ (24*3600*1000) ; %// take care of the milliseconds separately as they are not handled by "datenum"
time = datenum( time_vec ) + time_ms ;
data = cellfun( @(s) textscan(s,'%f',Inf,'Delimiter',',') , M{1,end} ) ;
This gives you an array time containing MATLAB serial date numbers (often easier to use than strings). To show that the serial numbers still represent the right times:
>> datestr(time,'yyyy-mm-dd HH:MM:SS.FFF')
ans =
2015-07-31 15:58:25.612
2015-07-31 15:58:25.620
2015-07-31 15:58:25.625
For complicated string parsing situations like this one, it is best to use regexp. Assuming the data is in the file data.txt, the following code should do what you are looking for:
txt = fileread('data.txt')
tokens = regexp(txt,'term(\d+)\s(\S*)\s\[(.*)\]','tokens','dotexceptnewline')
% Convert namenum to numeric type
namenum = cellfun(@(x)str2double(x{1}),tokens)
% Get time stamps from the second element of each token
time = cellfun(@(x)x{2},tokens,'UniformOutput',false);
% Split the numbers in the third element of each token
data = cellfun(@(x)str2double(strsplit(x{3},',')),tokens,'UniformOutput',false)
I have a text file with the contents shown below. I need to read this file column-wise (i.e., 2 columns here). I have tried many ways but could not do it, because the file contains "(", ",", ")" etc. Please guide.
(-2.714141687294326, 0.17700122506478025)
(-2.8889905690592976, 0.1449494260855578)
(-2.74534285564141, 0.3182989792519164)
(-2.728716536554531, -0.3267545129349194)
(-2.280859632844493, -0.7413304490629143)
(-2.8205377507406095, 0.08946138452856946)
(-2.6261449731466335, -0.16338495969832847)
(-2.8863827317805537, 0.5783117541867042)
(-2.6727557978209546, 0.11377424587411682)
(-2.5069470906518565, -0.6450688986485736)
(-2.6127552309087236, -0.01472993916137419)
(-2.7861092661880185, 0.23511200020171835)
(-3.2238037438656533, 0.5113945870063824)
(-2.6447503899420304, -1.1787646364375748)
Try this:
x = importdata('filename.txt');
x = regexp(x,'-?\d+\.?\d*','match'); %// detect numbers as [-]a[.][bcd]
x = cellfun(@str2num, vertcat(x{:}));
If the file can contain numbers in decimal form ("1.234") and in scientific notation ("1.234e-56"):
x = importdata('filename.txt');
x = regexp(x,'-?\d+\.?\d*(e-?\d+)?','match');
x = cellfun(@str2num, vertcat(x{:}));
You can use fscanf and specify the desired format:
fid = fopen('filename.txt');
x = fscanf(fid,'(%f, %f)\n',[2,inf]).';
fclose(fid);
The format spec '(%f, %f)\n' reads two float values inside parentheses, separated by a comma, from each line. With [2,inf] you specify that the values go into a 2 x n array, where n is as large as needed. To get the same layout as in the file (one row per line), transpose the result with .'.
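The same format spec can be tried without a file by using sscanf on a string. This is a small sketch with made-up, shortened sample values; the variable sample is hypothetical.

```matlab
% Hypothetical three-line sample in the same "(a, b)" layout as the file
sample = sprintf('(-2.71, 0.17)\n(-2.88, 0.14)\n(-2.74, 0.31)\n');

% Values are filled column by column into a 2-by-n array, then transposed
x = sscanf(sample, '(%f, %f)\n', [2, inf]).';
```

After the transpose, x has one row per input line and two columns.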
I am importing a CSV file that is comma delimited into MATLAB. Each column has quotes around anything I want to consider as text and then a comma.
I am using read_mixed_csv function from the answer to this question to read in the data as a cell: Import CSV file with mixed data types
thisdata = read_mixed_csv(fname, ','); % Reads in the CSV file
thisdata = regexprep(thisdata, '^"|"$','');
However, since a few of my columns look like this:
"FAIRHOPE, Alabama"
"FAIRHOPE HIGH SCHOOL, FAIRHOPE, ALABAMA"
"Daphne-Fairhope-Foley, AL"
MATLAB places everything after a comma into a new column. So
"Daphne-Fairhope-Foley, AL"
Becomes two columns
"Daphne-Fairhope-Foley
AL"
How can I get MATLAB to read in a mixed csv file and not only consider a comma as a delimiter, but also consider the quotation marks? Is there a more automated way of doing it than textscan? If textscan is an option, what would that look like?
Here is a sample of the data I'm trying to read in with the header included:
"State Code","County Code","Site Num","Parameter Code","POC","Latitude","Longitude","Datum","Parameter Name","Sample Duration","Pollutant Standard","Date Local","Units of Measure","Event Type","Observation Count","Observation Percent","Arithmetic Mean","1st Max Value","1st Max Hour","AQI","Method Name","Local Site Name","Address","State Name","County Name","City Name","CBSA Name","Date of Last Change"
"01","003","0010","88101",1,30.498001,-87.881412,"NAD83","PM2.5 - Local Conditions","24 HOUR","PM25 24-hour 2006","2013-01-01","Micrograms/cubic meter (LC)","None",1,100.0,7.3,7.3,0,30,"R & P Model 2025 PM2.5 Sequential w/WINS - GRAVIMETRIC","FAIRHOPE, Alabama","FAIRHOPE HIGH SCHOOL, FAIRHOPE, ALABAMA","Alabama","Baldwin","Fairhope","Daphne-Fairhope-Foley, AL","2014-02-11"
"01","003","0010","88101",1,30.498001,-87.881412,"NAD83","PM2.5 - Local Conditions","24 HOUR","PM25 24-hour 2006","2013-01-04","Micrograms/cubic meter (LC)","None",1,100.0,7.6,7.6,0,32,"R & P Model 2025 PM2.5 Sequential w/WINS - GRAVIMETRIC","FAIRHOPE, Alabama","FAIRHOPE HIGH SCHOOL, FAIRHOPE, ALABAMA","Alabama","Baldwin","Fairhope","Daphne-Fairhope-Foley, AL","2014-02-11"
"01","003","0010","88101",1,30.498001,-87.881412,"NAD83","PM2.5 - Local Conditions","24 HOUR","PM25 24-hour 2006","2013-01-07","Micrograms/cubic meter (LC)","None",1,100.0,8.6,8.6,0,36,"R & P Model 2025 PM2.5 Sequential w/WINS - GRAVIMETRIC","FAIRHOPE, Alabama","FAIRHOPE HIGH SCHOOL, FAIRHOPE, ALABAMA","Alabama","Baldwin","Fairhope","Daphne-Fairhope-Foley, AL","2014-02-11"
"01","003","0010","88101",1,30.498001,-87.881412,"NAD83","PM2.5 - Local Conditions","24 HOUR","PM25 24-hour 2006","2013-01-10","Micrograms/cubic meter (LC)","None",1,100.0,7,7,0,29,"R & P Model 2025 PM2.5 Sequential w/WINS - GRAVIMETRIC","FAIRHOPE, Alabama","FAIRHOPE HIGH SCHOOL, FAIRHOPE, ALABAMA","Alabama","Baldwin","Fairhope","Daphne-Fairhope-Foley, AL","2014-02-11"
*Note: Converting the CSV file to a tab delimited file makes it easier for MATLAB to deal with and circumvents this problem.
Having a text qualifier (like ") is a little tricky, but the following might work if you ensure that each row of your table will have the same number of columns (and probably no empty ones).
Anything not within the text qualifier must be convertible to a number.
function C = csvmixed(eachLine,delim,textQualifier)
% Outputs cell containing mixed string and numeric data given a delimiter (',')
% and a text qualifier ('"'). Each line of the delimited file must be loaded into
% the cell array eachLine, and each line must have the same number of columns.
%
% Example:
% fid = fopen('testcsv.txt','r');
% eachLine = textscan(fid,'%s','Delimiter','\n'); fclose(fid);
% C = csvmixed(eachLine{1},',','"')
assert(ischar(delim) && numel(delim)==1);
assert(ischar(textQualifier) && numel(textQualifier)==1);
% find strings, as specified by the input qualifier
patternStr = sprintf('"([^"]*)"%c?',delim);
patternStr = strrep(patternStr,'"',textQualifier);
Cstr = regexp(eachLine,patternStr,'tokens');
% find numeric data
patternNum = sprintf('(?<=(,|^))[^%c,a-zA-Z]*(?=(,|$))',textQualifier);
patternNum = strrep(patternNum,',',delim);
Cnum = regexp(eachLine,patternNum,'match','emptymatch');
numCols = cellfun(@numel,Cstr) + cellfun(@numel,Cnum);
assert(nnz(diff(numCols))==0,'Number of columns not consistent.')
% get string extents (begin, end) indexes for each string
strExtents = regexp(eachLine,patternStr,'tokenExtents');
% deal out parsed data for each line
C = cell(numel(eachLine),numCols(1));
for ii = 1:numel(eachLine),
strBounds = vertcat(strExtents{ii}{:});
delimLocs = getDelimLocs(eachLine{ii},strBounds,delim);
strCellMap = getCellMap(strBounds,delimLocs);
C(ii,strCellMap) = [Cstr{ii}{:}]; % TODO: preallocate
C(ii,~strCellMap) = num2cell(str2double(Cnum{ii})); % all else must be numeric
end
end
function delimLocs = getDelimLocs(lineText,solidBounds,delim)
delimCharLocs = strfind(lineText,delim);
delimLocs = delimCharLocs(~any(bsxfun(@ge,delimCharLocs,solidBounds(:,1)) & ...
bsxfun(@le,delimCharLocs,solidBounds(:,2)),1));
end
function cellMap = getCellMap(typeBounds,delimLocs)
cellMap = any(bsxfun(@gt,typeBounds(:,1),[0 delimLocs]) & ...
bsxfun(@lt,typeBounds(:,1),[delimLocs Inf]), 1);
end
UPDATE: Fix small typos in getDelimLocs. Add preallocation of cell array.
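As a possibly simpler alternative for files with a known column layout, textscan has a %q conversion that reads a double-quoted field as one value, including any delimiters inside the quotes. The sketch below is an assumption-laden illustration, not part of the function above: the sample line is a shortened, made-up version of one data row, and the format string must be written by hand to match the real column types.

```matlab
% Hypothetical shortened row: quoted text, a number, and two quoted fields
% that contain commas
sampleLine = '"01",30.498001,"FAIRHOPE, Alabama","Daphne-Fairhope-Foley, AL"';

% %q reads a quoted field and drops the surrounding quotes; %f reads a number
C = textscan(sampleLine, '%q %f %q %q', 'Delimiter', ',');
```

Here C{3}{1} should hold the full text FAIRHOPE, Alabama in a single cell rather than being split at the comma.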
Use the file exchange code replaceinfile to replace the strings that have commas in them with a period instead.
Use read_mixed_csv from Import CSV file with mixed data types to read in the file.
Remove the extra quotes from the strings that are still left.
replaceinfile(', ', '. ', fname); % Replace commas inside quoted strings (not meant as delimiters) with periods so they do not create new columns
thisdata = read_mixed_csv(fname, ','); % Reads in the CSV file (\t for tab)
thisdata = regexprep(thisdata, '^"|"$',''); % Remove quotes from file and only keep the first 28 columns (last two columns are empty)
For replaceinfile.m function:
For running the code on Linux, change the first line of the section on Perl to
perlCmd = sprintf('"%s"', '/usr/bin/perl');