Trouble importing csv-data within MATLAB - matlab

I am trying to read in a csv-file that contains daily data on EUR/USD exchange rates including the dates specifying year, month and day. The problem is that using readtable(filename) puts single quotes around all table-entries and therefore hinders me using the data at all.
Detect import options:
opts = detectImportOptions('EUR_USD Historische Data.csv');
Read in the data:
EUR_USD = readtable('EUR_USD Historische Data.csv');
Substract dates and transform to datetime variable:
dt = EUR_USD(:,1);
dates = datetime(dt,'InputFormat','yyyyMMdd');
% Does not work because of single quotes
I was able to subtract closing prices and make them workable, but I am not sure if this is an elegant way of doing so:
closing_prices = str2double(table2array(EUR_USD(:,5)));
Ultimately the goal is to make the data workable. I need to compare two columns with datetime-variables and if dates do not match between the two columns I need to remove that entry such that in the end both columns match.
This is the vector with dates:
Dates vector wrong
I need it to look like this:
Dates vector correct

I think all you need to do is remove the ' character in order to read the data into datetime correctly. Look at the following example:
%stringz is the same as dt here: just the string data
T = table;
T.stringz = string(['''string1'''; '''string2'''; '''string3''']);
stringz = T.stringz;
%Run the for loop to remove the ' chars
for i = 1:length(stringz)
strval = char(stringz(i,1));
strval = strval(2:end-1);
strmat(i,1) = string(strval);
end
%Then load data into datetime after this for loop
dates = datetime(strmat,'InputFormat','yyyyMMdd');
strmat return a 3x1 string array with no ' characters on the outside of the string.

Related

OCTAVE data import from PCE-VDL data logger device and conversion of decimal coma to decimal point

I have a measurement device PCE-VDL, which gives me measurements in following CSV format below, which I need to import to OCTAVE for further investigation.
Especially I need to import last 3 columns with xyz acceleration data.
The file is in CSV format with delimiter of semicolon ";".
I have tried:
A_1 = importdata ("file.csv", ";", 3);
but have recieved
error: missing_idx(10): out of bound 9
The CSV file looks like this:
#PCE-VDL X - TableView series
#2020.16.11
#Date;Time;Duration [s];t [°C];RH [%];p [mbar];aX [g];aY [g];aZ [g];
2020.28.10;16:16:32:0000;00:000;;;;0,0195;-0,0547;1,0039;
2020.28.10;16:16:32:0052;00:005;;;;0,0898;-0,0273;0,8789;
2020.28.10;16:16:32:0104;00:010;;;;0,0977;-0,0313;0,9336;
2020.28.10;16:16:32:0157;00:015;;;;0,1016;-0,0273;0,9297;
The numbers in last 3 columns have also decimal coma and not decimal point. So there probably should be done also some conversion.
Thank you very much for any help.
Regards
EDIT: 18.11.2020
Thanks for help. I have tried now following:
A_1_str = fileread ("file.csv");
A_1_str_m = strrep (A_1_str, ".", "-");
A_1_str_m = strrep (A_1_str_m, ",", ".");
save "A_1_str_m.csv" A_1_str_m;
A_1 = importdata ("A_1_str_m.csv", ";", 8);
and still receive error: file_content(140): out of bound 139
There is probably some problem with time format in first columns, which I do not want to read. I just need last three columns.
After my conversion, the file looks like this:
# Created by Octave 5.1.0, Wed Nov 18 21:40:52 2020 CET <zdenek#ASUS-F5V>
# name: A_1_str_m
# type: sq_string
# elements: 1
# length: 7849
#PCE-VDL X - TableView series
#2020-16-11
#Date;Time;Duration [s];t [°C];RH [%];p [mbar];aX [g];aY [g];aZ [g];
2020-28-10;16:16:32:0000;00:000;;;;0.0195;-0.0547;1.0039;
2020-28-10;16:16:32:0052;00:005;;;;0.0898;-0.0273;0.8789;
2020-28-10;16:16:32:0104;00:010;;;;0.0977;-0.0313;0.9336;
Thanks for support!
You can first read the data with fileread, which stores the data as a string. Then you can manipulate the string like this:
new_string = strrep(string, ",", ".");
strrep replaces all occurrences of a pattern within a string. Afterwards you save this data as a separate file or you overwrite the existing file with the manipulated data. When this is done you proceed as you have tried before.
EDIT: 19.11.2020
To avoid the additional heading lines in the new file, you can save it like this:
fid = fopen("A_1_str_m.csv", "w");
fputs(fid, A_1_str_m);
fclose(fid);
fputs will just write the string to the file.
The you can read the new file with dlmread.
A1_buf = dlmread("A_1_str_m.csv", ";");
A1_buf = real(A1); # get the real value of the complex number
A1_buf(1:3, :) = []; # remove the headlines
A1 = A1_buf(:, end-3:end-1); # get only the the 3 columns you're looking for
This will give you the three columns your looking for. But the date and time data will be ignored.
EDIT 20.11.2020
Replaced abs with real, so the sign of the value will be kept.
Use csv2cell from the io package.

Suppressing unwanted spaces in dates

From reading code from elsewhere, I have a matrix of dates called 'time' that have unwanted spaces that I want removed.
I've tried isspace and regexprep with no luck
time = regexprep(time, '\W', '');
I have about 130000 dates with the following format:
04-July -2017 09:54:30.000
04-July -2017 09:54:31.000
etc
There are two spaces between the end of 'July' to the next dash I want to suppress to:
04-July-2017 09:54:30.000
04-July-2017 09:54:31.000
Replace two or more spaces with nothing:
>> time = {'04-July -2017 09:54:30.000'
'04-July -2017 09:54:31.000'}
>> regexprep(time,' {2,}','')
{'04-July-2017 09:54:30.000'}
{'04-July-2017 09:54:31.000'}
Unless you just want to correct your input file for later usage, you do not necessarily need to correct the input. There are several ways to parse the time directly with the extra spaces:
Let time be:
time = ['04-July -2017 09:54:31.000';
'04-July -2017 09:54:32.000']
Then to directly parse the string representation of the datetime into a MATLAB date serial number you can use:
%% get date in [MATLAB date serial number]
formatIn = 'dd-mmm -yyyy HH:MM:SS.FFF' ;
matlabTime = datenum(time,formatIn)
matlabTime =
736880.412858796
736880.41287037
This serial time representation is not so human readable but it's the fastest thing you can get if you want to do calculations with date/time.
If your goal is simply to correct the string, then you can use the same trick to read the value in, and define exactly which output format you want out:
%% get date in [string]
formatIn = 'dd-mmm -yyyy HH:MM:SS.FFF' ;
formatOut = 'dd-mmm-yyyy HH:MM:SS.FFF' ;
stringTime = datestr(datenum(time,formatIn),formatOut)
stringTime =
04-Jul-2017 09:54:31.000
04-Jul-2017 09:54:32.000
If you want to use the new datetime objects, the input format has a slight different syntax but the operation is roughly the same:
%% get date in [datetime] objects
formatIn = 'dd-MMM -yyyy HH:mm:ss.SSS' ;
t = datetime(time,'InputFormat',formatIn)
t =
04-Jul-2017 09:54:31
04-Jul-2017 09:54:32
Although the MATLAB console display t in human readable format, t is now a datetime object. Check the documentation if you want to use this.
Replace only two white-spaces after a month and preceding a dash (-):
>> date = '04-July -2017 09:54:30.000';
>> regexprep(date, '(\w) -', '$1-')
ans =
'04-July-2017 09:54:30.000'

Pull specific cells out of a cell array by comparing the last digit of their filename

I have a cell array of filenames - things like '20160303_144045_4.dat', '20160303_144045_5.dat', which I need to separate into separate arrays by the last digit before the '.dat'; one cell array of '...4.dat's, one of '...5.dat's, etc.
My code is below; it uses regex to split the file around the '.dat', reshapes a bit then regexes again to pull out the last number of the filename, builds a cell to store the filenames in then, and then I get a tad stuck. I have an array produced such as '1,0,1,0,1,0..' of required cell indexes which I thought might be trivial to pull out, but I'm struggling to get it to do what I want.
numFiles = length(sampleFile); %sampleFile is the input cell array
splitFiles = regexp(sampleFile,'.dat','split');
column = vertcat(splitFiles{:});
column = column(:,1);
splitNums = regexp(column,'_','split');
splitNums = splitNums(:,1);
column = vertcat(splitNums{:});
column = column(:,3);
column = cellfun(#str2double,column); %produces column array of values - 3,4,3,4,3,4, etc
uniqueVals = unique(column);
numChannels = length(uniqueVals);
fileNameCell = cell(ceil(numFiles/numChannels),numChannels);
for i = 1:numChannels
column(column ~= uniqueVals(i)) = 0;
column = column / uniqueVals(i); %e.g. 1,0,1,0,1,0
%fileNameCell(i)
end
I feel there should be an easier way than my hodge-podge of code, and I don't want to throw together a ton of messy for-loops if I can avoid it; I definitely believe I've overcomplicated this problem massively.
We can neaten your code quite a bit.
Take some example data:
files = {'abc4.dat';'abc5.dat';'def4.dat';'ghi4.dat';'abc6.dat';'def5.dat';'nonum.dat'};
You can get the final numbers using regexp and matching one or more digits followed by '.dat', then using strrep to remove the '.dat'.
filenums = cellfun(#(r) strrep(regexp(r, '\d+.dat', 'match', 'once'), '.dat', ''), ...
files, 'uniformoutput', false);
Now we can put these in a structure, using the unique numbers (prefixed by a letter because fields can't start with numbers) as field names.
% Get unique file numbers and set up the output struct
ufilenums = unique(filenums);
filestruct = struct;
% Loop over file numbers
for ii = 1:numel(ufilenums)
% Get files which have this number
idx = cellfun(#(r) strcmp(r, ufilenums{ii}), filenums);
% Assign the identified files to their struct field
filestruct.(['x' ufilenums{ii}]) = files(idx);
end
Now you have a neat output
% Files with numbers before .dat given a field in the output struct
filestruct.x4 = {'abc4.dat' 'def4.dat' 'ghi4.dat'}
filestruct.x5 = {'abc5.dat' 'def5.dat'}
filestruct.x6 = {'abc6.dat'}
% Files without numbers before .dat also captured
filestruct.x = {'nonum.dat'}

MATLAB reading CSV file with timestamp and values

I have the following sample from a CSV file. Structure is:
Date ,Time(Hr:Min:S:mS), Value
2015:08:20,08:20:19:123 , 0.05234
2015:08:20,08:20:19:456 , 0.06234
I then would like to read this into a matrix in MATLAB.
Attempt :
Matrix = csvread('file_name.csv');
Also tried an attempt formatting the string.
fmt = %u:%u:%u %u:%u:%u:%u %f
Matrix = csvread('file_name.csv',fmt);
The problem is when the file is read the format is wrong and displays it differently.
Any help or advice given would be greatly appreciated!
EDIT
When using #Adriaan answer the result is
2015 -11 -9
8 -17 -1
So it seems that MATLAB thinks the '-' is the delimiter(separator)
Matrix = csvread('file_name.csv',1,0);
csread does not support a format specifier. Just enter the number of header rows (I took it to be one, as per example), and number of header columns, 0.
You file, however, contains non-numeric data. Thus import it with importdata:
data = importdata('file_name.csv')
This will get you a structure, data with two fields: data.data contains the numeric data, i.e. a vector containing your value. data.textdata is a cell containing the rest of the data, you need the first two column and extract the numerics from it, i.e.
for ii = 2:size(data.textdata,1)
tmp1 = data.textdata{ii,1};
Date(ii,1) = datenum(tmp1,'YYYY:MM:DD');
tmp2 = data.textdata{ii,2};
Date(ii,2) = datenum(tmp2,'HH:MM:SS:FFF');
end
Thanks to #Excaza it turns out milliseconds are supported.

Date Format in MATLAB

Here is my code. I have 59 CSV files in my directory, I have 11 variables in each, with first one date in quarter format. I need to tell MATLAB that first column has date format, because the code below imports it as a string variable.
ext = '.csv';
countries = dir(['*', ext]);
countryFiles = {countries.name};
countriesNames = strrep(countryFiles, ext, '');
%%
a = cell(length(countriesNames), 2);
a(:,1) = countriesNames(:)';
a(:,2) = cellfun(#(file) readtable(file, 'TreatAsEmpty', '.','Format'?), countryFiles(:), 'uni', 0);
As far as I understand this is option 'Format' with datenum in readtable... However, I can`t find useful information on it in helpfiles and whatever I try I get errors... Below how my data looks for each country. Data is 1980Q1-2014Q4.
So I have 59*2 cell array with first column representing countryNames and second column contains 59 140*11 tables. First I do not know how to access variable names within cell array`s table. Second problem is that if you try to define x to a column in that table and then datenum(x,'YYYYQQ') I get "Input expected to be a cell array, was table instead."
So, I need to extract from cell a tables in a{1,2}, a{2,2}, a{3,2} ... then convert those tables to cellarray using table2cell option. How to do it using loop for each country and then write it back to the cell?