Joining arrays in MATLAB and writing to file using dlmwrite() adds extra space

I am generating 2500 values in MATLAB in the format (time, heart_rate, resp_rate) using the code below:
numberOfSeconds = 2500;
time = 1:numberOfSeconds;
newTime = transpose(time);
number0 = size(newTime, 1)
% generating heart rates
heart_rate = 50 +(70-50) * rand (numberOfSeconds,1);
intHeartRate = int64(heart_rate);
number1 = size(intHeartRate, 1)
% hist(heart_rate)
% generating resp rates
resp_rate = 50 +(70-50) * rand (numberOfSeconds,1);
intRespRate = int64(resp_rate);
number2 = size(intRespRate, 1)
% hist(heart_rate)
% joining time and sensor data
joinedStream = strcat(num2str(newTime),{','},num2str(intHeartRate),{','},num2str(intRespRate))
dlmwrite('/Users/amar/Desktop/geenrated/rate.txt', joinedStream,'delimiter','');
The data shown in the console is alright, but when I save it to a .txt file, it contains extra spaces at the beginning. Hence I am not able to parse the .txt file to generate the input stream. Please help.

Replace the last two lines of your code with the following. No need to use strcat if you want a CSV output file.
dlmwrite('/Users/amar/Desktop/geenrated/rate.txt', [newTime intHeartRate intRespRate]);

π‘‡β„Žπ‘’ π‘ π‘œπ‘™π‘’π‘‘π‘–π‘œπ‘› 𝑠𝑒𝑔𝑔𝑒𝑠𝑑𝑒𝑑 𝑏𝑦 π‘ƒπΎπ‘œ 𝑖𝑠 π‘‘β„Žπ‘’ π‘ π‘–π‘šπ‘π‘™π‘’π‘ π‘‘ π‘“π‘œπ‘Ÿ π‘¦π‘œπ‘’π‘Ÿ π‘π‘Žπ‘ π‘’. π‘‡β„Žπ‘–π‘  π‘Žπ‘›π‘ π‘€π‘’π‘Ÿ 𝑒π‘₯π‘π‘™π‘Žπ‘–π‘›π‘  π‘€β„Žπ‘¦ π‘¦π‘œπ‘’ 𝑔𝑒𝑑 π‘‘β„Žπ‘’ 𝑒𝑛𝑒π‘₯𝑝𝑒𝑐𝑑𝑒𝑑 π‘œπ‘’π‘‘π‘π‘’π‘‘.
The data written in the file is exactly what is shown in the console.
>> joinedStream(1)   % The exact output will differ since 'rand' is used
ans =
  cell
    '   1,60,63'
num2str basically converts a matrix into a character array, so the number of characters in each of its rows must be the same. For each column of the original matrix, the widest entry sets the column width, and shorter entries are padded with spaces. Columns are separated by 2 spaces. Take a look at the following smaller example to understand:
>> num2str([44, 42314; 4, 1212421])
ans =
  2×11 char array
    '44    42314'
    ' 4  1212421'
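For completeness, if you do want to build the file row by row yourself, printing with fprintf avoids the num2str padding entirely. A minimal sketch using the variables from the question (one possible approach, not the only one):
fid = fopen('/Users/amar/Desktop/geenrated/rate.txt', 'w');
% fprintf consumes the 3-by-N matrix column by column, writing one 'time,hr,rr' line per column
fprintf(fid, '%d,%d,%d\n', [newTime.'; double(intHeartRate).'; double(intRespRate).']);
fclose(fid);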

Related

matlab: speed up for loop with string analysis

I have a very large CSV file containing three columns. I want to load these columns into a MATLAB matrix as fast as possible.
Currently what I do is this:
fid = fopen(inputfile, 'rt');
g = textscan(fid,'%s','delimiter','\r\n');
tdata = g{1};
fclose(fid);
results = zeros(numel(tdata)-4, 3);
tic
display('start reading data...');
for r = 4:numel(tdata)
    if ~mod(r, 100)
        display(['data row: ' num2str(r) ' / ' num2str(numel(tdata))]);
    end
    entries = strsplit(tdata{r}, ',');
    results(r-3,1) = str2double(strrep(entries{1}, ',', '.'));
    results(r-3,2) = str2double(strrep(entries{2}, ',', '.'));
    results(r-3,3) = str2double(strrep(entries{3}, ',', '.'));
end
This however takes ~30 seconds for 200,000 lines, i.e. 150 µs per line, which is really slow. The code is not accepted by parfor.
Now I would like to know what causes the bottleneck in the for loop and how I can speed it up.
Here are the measured times:
str2double    578253 calls    29.631 s
strsplit      192750 calls    13.388 s
EDIT:
The content has this structure in the file
0.000000, -0.00271, 5394147
0.000667, -0.00271, 5394148
0.001333, -0.00271, 5394149
0.002000, -0.00271, 5394150
I think a lot can be improved by calling textscan differently.
You do this:
g = textscan(fid,'%s','delimiter','\r\n');
But then you call tdata = g{1};
If textscan is called correctly, it should already split all your data and give it back as numbers.
Try this:
g = textscan(fid, '%f,%f,%f', 'delimiter', '\r\n');
It should give you back a cell array with three cells, one per column of values. To convert it to a matrix you can use:
g = cell2mat(g)
I imported 200k lines in 0.12 seconds.
Your code also has some other workarounds. You start at r = 4, so it seems you have 3 lines that you don't want to read. After fopen you can therefore call
[~] = fgetl(fid);
three times to get to the interesting part of your file.
You also first split each line with ',' as the separator, but then replace all ',' by '.'. That does nothing: all ',' are already gone, since they were used as separators.
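Putting those pieces together, a minimal sketch of the faster read (assuming three header lines and the comma-separated layout shown above):
fid = fopen(inputfile, 'rt');
fgetl(fid); fgetl(fid); fgetl(fid);                  % skip the three header lines
g = textscan(fid, '%f,%f,%f', 'delimiter', '\r\n');  % parse the numbers directly
fclose(fid);
results = cell2mat(g);                               % N-by-3 numeric matrix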
If you used csvread you wouldn't need str2double or strsplit, which you say are the slow lines; it is likely much quicker for a CSV file.
You would be able to replace all the above code by:
results = csvread(inputfile);
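If the first three lines really are header text (the loop above starts at r = 4), csvread can skip them via its zero-based row offset; a sketch:
results = csvread(inputfile, 3, 0);   % start reading at the 4th row, 1st column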

MATLAB reading CSV file with timestamp and values

I have the following sample from a CSV file. Structure is:
Date ,Time(Hr:Min:S:mS), Value
2015:08:20,08:20:19:123 , 0.05234
2015:08:20,08:20:19:456 , 0.06234
I then would like to read this into a matrix in MATLAB.
Attempt:
Matrix = csvread('file_name.csv');
I also tried formatting the string:
fmt = '%u:%u:%u %u:%u:%u:%u %f';
Matrix = csvread('file_name.csv', fmt);
The problem is that when the file is read, the format is wrong and the data is displayed differently.
Any help or advice given would be greatly appreciated!
EDIT
When using @Adriaan's answer the result is
2015 -11 -9
8 -17 -1
So it seems that MATLAB thinks the '-' is the delimiter (separator).
Matrix = csvread('file_name.csv',1,0);
csvread does not support a format specifier. Just enter the number of header rows (I took it to be one, as per the example) and the number of header columns, 0.
Your file, however, contains non-numeric data. Thus import it with importdata:
data = importdata('file_name.csv')
This will get you a structure data with two fields: data.data contains the numeric data, i.e. a vector containing your values; data.textdata is a cell array containing the rest of the data. You need its first two columns and can extract the date numbers from them, i.e.
for ii = 2:size(data.textdata,1)
    tmp1 = data.textdata{ii,1};
    Date(ii,1) = datenum(tmp1, 'yyyy:mm:dd');
    tmp2 = data.textdata{ii,2};
    Date(ii,2) = datenum(tmp2, 'HH:MM:SS:FFF');
end
Thanks to @Excaza, it turns out milliseconds are supported.
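A vectorized sketch of the same idea, in case the loop is slow on large files (datenum accepts cell arrays of date strings; the field names are those returned by importdata above):
data  = importdata('file_name.csv');
dates = datenum(data.textdata(2:end,1), 'yyyy:mm:dd');    % date column as serial date numbers
times = datenum(data.textdata(2:end,2), 'HH:MM:SS:FFF');  % time column, milliseconds included
vals  = data.data;                                        % the numeric Value column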

read complicated format .txt file into Matlab

I have a .txt file that I want to read into MATLAB. The data format is like below:
term2 2015-07-31-15_58_25_612 [0.9934343, 0.3423043, 0.2343433, 0.2342323]
term0 2015-07-31-15_58_25_620 [12]
term3 2015-07-31-15_58_25_625 [2.3333, 3.4444, 4.5555]
...
How can I read these data in the following way?
name = [term2 term0 term3] or namenum = [2 0 3]
time = [2015-07-31-15_58_25_612 2015-07-31-15_58_25_620 2015-07-31-15_58_25_625]
data = {[0.9934343, 0.3423043, 0.2343433, 0.2342323], [12], [2.3333, 3.4444, 4.5555]}
I tried to use textscan with the format 'term%d %s [%f, %f...]', but for the last data part I cannot specify the number of values because it differs from line to line. How can I read it? My MATLAB version is R2012b.
Thanks a lot in advance if anyone could help!
There may be a way to do that in one single pass, but for me these kinds of problems are easier to handle with a 2-pass approach.
Pass 1: Read all the columns that have a constant format according to their type (string, integer, etc.), and read the non-constant part into a separate column which will be processed in the second pass.
Pass 2: Process your irregular column according to its specificities.
With your sample data, it looks like this:
%% // read file
fid = fopen('Test.txt','r') ;
M = textscan( fid , 'term%d %s %*c %[^]] %*[^\n]' ) ;
fclose(fid) ;
%% // dispatch data into variables
name = M{1,1} ;
time = M{1,2} ;
data = cellfun( @(s) textscan(s,'%f',Inf,'Delimiter',',') , M{1,3} ) ;
What happened:
The first textscan instruction reads the full file. In the format specifier:
term%d reads the integer after the literal text 'term'.
%s reads a string representing the date.
%*c ignores one character (to skip the character '[').
%[^]] reads everything (as a string) until it finds the character ']'.
%*[^\n] ignores everything until the next newline ('\n') character (so as not to capture the trailing ']').
After that, the first 2 columns are easily dispatched into their own variables. The 3rd column of the result cell array M contains strings of different lengths, each holding a different number of floating point values. We use cellfun in combination with another textscan to read the numbers in each cell and return a cell array containing doubles.
Bonus:
If you want your time to be a numeric value as well (instead of a string), use the following extension of the code:
%% // read file
fid = fopen('Test.txt','r') ;
M = textscan( fid , 'term%d %f-%f-%f-%f_%f_%f_%f %*c %[^]] %*[^\n]' ) ;
fclose(fid) ;
%% // dispatch data
name = M{1,1} ;
time_vec = cell2mat( M(1,2:7) ) ;
time_ms = M{1,8} ./ (24*3600*1000) ; %// take care of the milliseconds separately as they are not handled by "datenum"
time = datenum( time_vec ) + time_ms ;
data = cellfun( @(s) textscan(s,'%f',Inf,'Delimiter',',') , M{1,end} ) ;
This will give you an array time with MATLAB serial date numbers (often easier to use than strings). To show that the serial numbers still represent the right times:
>> datestr(time,'yyyy-mm-dd HH:MM:SS.FFF')
ans =
2015-07-31 15:58:25.612
2015-07-31 15:58:25.620
2015-07-31 15:58:25.625
For complicated string parsing situations like this it is best to use regexp. In this case, assuming you have the data in the file data.txt, the following code should do what you are looking for:
txt = fileread('data.txt')
tokens = regexp(txt,'term(\d+)\s(\S*)\s\[(.*)\]','tokens','dotexceptnewline')
% Convert namenum to numeric type
namenum = cellfun(@(x)str2double(x{1}),tokens)
% Get the time stamps from the second token of each match
time = cellfun(@(x)x{2},tokens,'UniformOutput',false);
% Split the numbers in the third token
data = cellfun(@(x)str2double(strsplit(x{3},',')),tokens,'UniformOutput',false)

Import Mixed CSV that has quotes around text

I am importing a comma-delimited CSV file into MATLAB. Each column has quotes around anything I want to consider as text, followed by a comma.
I am using read_mixed_csv function from the answer to this question to read in the data as a cell: Import CSV file with mixed data types
thisdata = read_mixed_csv(fname, ','); % Reads in the CSV file
thisdata = regexprep(thisdata, '^"|"$','');
However, since a few of my columns look like this:
"FAIRHOPE, Alabama"
"FAIRHOPE HIGH SCHOOL, FAIRHOPE, ALABAMA"
"Daphne-Fairhope-Foley, AL"
MATLAB places everything after a comma into a new column. So
"Daphne-Fairhope-Foley, AL"
Becomes two columns
"Daphne-Fairhope-Foley
AL"
How can I get MATLAB to read in a mixed csv file and not only consider a comma as a delimiter, but also consider the quotation marks? Is there a more automated way of doing it than textscan? If textscan is an option, what would that look like?
Here is a sample of the data I'm trying to read in with the header included:
"State Code","County Code","Site Num","Parameter Code","POC","Latitude","Longitude","Datum","Parameter Name","Sample Duration","Pollutant Standard","Date Local","Units of Measure","Event Type","Observation Count","Observation Percent","Arithmetic Mean","1st Max Value","1st Max Hour","AQI","Method Name","Local Site Name","Address","State Name","County Name","City Name","CBSA Name","Date of Last Change"
"01","003","0010","88101",1,30.498001,-87.881412,"NAD83","PM2.5 - Local Conditions","24 HOUR","PM25 24-hour 2006","2013-01-01","Micrograms/cubic meter (LC)","None",1,100.0,7.3,7.3,0,30,"R & P Model 2025 PM2.5 Sequential w/WINS - GRAVIMETRIC","FAIRHOPE, Alabama","FAIRHOPE HIGH SCHOOL, FAIRHOPE, ALABAMA","Alabama","Baldwin","Fairhope","Daphne-Fairhope-Foley, AL","2014-02-11"
"01","003","0010","88101",1,30.498001,-87.881412,"NAD83","PM2.5 - Local Conditions","24 HOUR","PM25 24-hour 2006","2013-01-04","Micrograms/cubic meter (LC)","None",1,100.0,7.6,7.6,0,32,"R & P Model 2025 PM2.5 Sequential w/WINS - GRAVIMETRIC","FAIRHOPE, Alabama","FAIRHOPE HIGH SCHOOL, FAIRHOPE, ALABAMA","Alabama","Baldwin","Fairhope","Daphne-Fairhope-Foley, AL","2014-02-11"
"01","003","0010","88101",1,30.498001,-87.881412,"NAD83","PM2.5 - Local Conditions","24 HOUR","PM25 24-hour 2006","2013-01-07","Micrograms/cubic meter (LC)","None",1,100.0,8.6,8.6,0,36,"R & P Model 2025 PM2.5 Sequential w/WINS - GRAVIMETRIC","FAIRHOPE, Alabama","FAIRHOPE HIGH SCHOOL, FAIRHOPE, ALABAMA","Alabama","Baldwin","Fairhope","Daphne-Fairhope-Foley, AL","2014-02-11"
"01","003","0010","88101",1,30.498001,-87.881412,"NAD83","PM2.5 - Local Conditions","24 HOUR","PM25 24-hour 2006","2013-01-10","Micrograms/cubic meter (LC)","None",1,100.0,7,7,0,29,"R & P Model 2025 PM2.5 Sequential w/WINS - GRAVIMETRIC","FAIRHOPE, Alabama","FAIRHOPE HIGH SCHOOL, FAIRHOPE, ALABAMA","Alabama","Baldwin","Fairhope","Daphne-Fairhope-Foley, AL","2014-02-11"
Note: Converting the CSV file to a tab-delimited file makes it easier for MATLAB to deal with and circumvents this problem.
Having a text qualifier (like ") is a little tricky, but the following might work if you ensure that each row of your table will have the same number of columns (and probably no empty ones).
Anything not within the text qualifier must be convertible to a number.
function C = csvmixed(eachLine,delim,textQualifier)
% Outputs cell containing mixed string and numeric data given a delimiter (',')
% and a text qualifier ('"'). Each line of the delimited file must be loaded into
% the cell array eachLine, and each line must have the same number of columns.
%
% Example:
%   fid = fopen('testcsv.txt','r');
%   eachLine = textscan(fid,'%s','Delimiter','\n'); fclose(fid);
%   C = csvmixed(eachLine{1},',','"')
assert(ischar(delim) && numel(delim)==1);
assert(ischar(textQualifier) && numel(textQualifier)==1);
% find strings, as specified by the input qualifier
patternStr = sprintf('"([^"]*)"%c?',delim);
patternStr = strrep(patternStr,'"',textQualifier);
Cstr = regexp(eachLine,patternStr,'tokens');
% find numeric data
patternNum = sprintf('(?<=(,|^))[^%c,a-zA-Z]*(?=(,|$))',textQualifier);
patternNum = strrep(patternNum,',',delim);
Cnum = regexp(eachLine,patternNum,'match','emptymatch');
numCols = cellfun(@numel,Cstr) + cellfun(@numel,Cnum);
assert(nnz(diff(numCols))==0,'Number of columns not consistent.')
% get string extents (begin, end) indexes for each string
strExtents = regexp(eachLine,patternStr,'tokenExtents');
% deal out parsed data for each line
C = cell(numel(eachLine),numCols(1));
for ii = 1:numel(eachLine)
    strBounds = vertcat(strExtents{ii}{:});
    delimLocs = getDelimLocs(eachLine{ii},strBounds,delim);
    strCellMap = getCellMap(strBounds,delimLocs);
    C(ii,strCellMap) = [Cstr{ii}{:}]; % TODO: preallocate
    C(ii,~strCellMap) = num2cell(str2double(Cnum{ii})); % all else must be numeric
end
end
function delimLocs = getDelimLocs(lineText,solidBounds,delim)
delimCharLocs = strfind(lineText,delim);
delimLocs = delimCharLocs(~any(bsxfun(@ge,delimCharLocs,solidBounds(:,1)) & ...
    bsxfun(@le,delimCharLocs,solidBounds(:,2)),1));
end
function cellMap = getCellMap(typeBounds,delimLocs)
cellMap = any(bsxfun(@gt,typeBounds(:,1),[0 delimLocs]) & ...
    bsxfun(@lt,typeBounds(:,1),[delimLocs Inf]), 1);
end
UPDATE: Fix small typos in getDelimLocs. Add preallocation of cell array.
Use the File Exchange function replaceinfile to replace the commas that appear inside the quoted strings with periods.
Use read_mixed_csv from Import CSV file with mixed data types to read in the file.
Remove the extra quotes from the strings that are still left.
replaceinfile(', ', '. ', fname); % Replace the commas that were inside quotes (and not meant as separators) with periods so they don't show up as new columns
thisdata = read_mixed_csv(fname, ','); % Reads in the CSV file (\t for tab)
thisdata = regexprep(thisdata, '^"|"$',''); % Remove quotes from file and only keep the first 28 columns (last two columns are empty)
For the replaceinfile.m function: to run the code on Linux, change the first line of the Perl section to
perlCmd = sprintf('"%s"', '/usr/bin/perl');
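Since the question also asks what a textscan-only approach would look like: the %q conversion reads a double-quoted field and ignores delimiters inside the quotes. A minimal sketch, assuming the 28-column header shown in the sample:
fid = fopen(fname, 'r');
C = textscan(fid, repmat('%q', 1, 28), 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
% C is a 1-by-28 cell array of column cell arrays; convert numeric columns as needed, e.g.
latitude  = str2double(C{6});
longitude = str2double(C{7});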

Extract variables from a textfile in matlab

I have a large text file (~3500 lines) which contains output data from an instrument. This consists of repeated entries from different measurements which are all formatted as follows:
* LogFrame Start *
variablename1: value
variablename2: value
...
variablename35: value
* LogFrame End *
Each log frame contains 35 variables. From these I would like to extract two, 'VName' and 'EMGMARKER', and put the associated values in columns of a MATLAB array (i.e. the array should be (VName, EMGMARKER)), which I can then associate with data from another output file that I already have in a MATLAB array. I have no idea where to start with extracting the variables from this file, and my searches on the internet so far have been unsuccessful. Any advice on this would be much appreciated.
You could use textscan:
C = textscan(file_id,'%s %f');
Then you extract the variables of interest like this:
VName_indices = strcmp(C{1},'VName:');
VName = C{2}(VName_indices);
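A fuller sketch of this idea (the file name here is hypothetical; it assumes every 'name: value' pair holds a numeric value, and uses textscan's CommentStyle option to skip the '* LogFrame ... *' lines):
file_id = fopen('instrument_output.txt', 'r');        % hypothetical file name
C = textscan(file_id, '%s %f', 'CommentStyle', '*');  % skip lines starting with '*'
fclose(file_id);
VName     = C{2}(strcmp(C{1}, 'VName:'));
EMGMARKER = C{2}(strcmp(C{1}, 'EMGMARKER:'));
result    = [VName, EMGMARKER];                       % two-column array as requested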
If the variable names, i.e. 'VName' and 'EMGMARKER', are always the same, you can just read through the file, search for the lines containing them, and extract the data from there. I don't know what values they contain, so you might have to use cells instead of arrays when extracting them.
fileID = fopen([target_path fname]); % open .txt file
% read the first line of the specified .txt file
current_line = fgetl(fileID);
counter = 1;
% go through the whole file
while ischar(current_line)
    if ~isempty(strfind(current_line, 'VName'))
        % find the location of the whitespace delimiter between name and value
        d_loc = strfind(current_line, char(9)); % if it is tab separated (\t)
        d_loc = strfind(current_line, ' ');     % if it is a simple white space
        variable_name{counter} = strtok(current_line); % strtok returns everything up to the first white space
        variable_value{counter} = str2double(strtok(current_line(d_loc(1):end))); % everything from the first to the second white space, converted to a double
    end
    % same block for EMGMARKER
    counter = counter + 1;
    current_line = fgetl(fileID); % next line
end
You do the same for EMGMARKER and you have cells containing the data you need.