I have an Excel worksheet with several entries of column data. The data is arranged in pairs such that the first column contains dates and the second contains the time series data corresponding to those dates. So, for example, time series 1 will be in columns A and B, where A is the dates and B is the data. Column C is blank, then columns D and E contain the entries for time series 2, and so on.
How do I merge these into a single file in Matlab where the dates match up? Specifically, I want the first column to contain the dates and the other columns to contain the data. I have tried to do this with the fts and merge functions but have failed so far.
You could grab the dates like this: dates = [raw{:,1}]' and the data like this: data = reshape([raw{:, 2:3:end}]', size(raw,1), []); to get normal MATLAB matrices in case you want to manipulate them in MATLAB.
Otherwise if you just want to send them straight back to excel then:
data = [raw(:,1) raw(:, 2:3:end)];
xlswrite(...blablafilename_etc..., data);
But in this case you should have used a VBA macro :/
If I have two dates in the format "28-SEP-2018 12.40.00" in separate columns, does anyone know how I can create a new third column containing the result of the formula below?
(0.1+sum(DATE_1)- sum(DATE_2))/(0.1+sum(DATE_2))*100
I need to ingest a CDF (Common Data Format) file into MATLAB. I have used the cdfread command for this purpose. An image of my output is attached below:
When I open data_import, columns 4 and 5 are in a particular 3 x 1 format (as shown in data_import(1,4)).
My question is: Is there a simple way to extract the data for each cell in column 4, such that for the 2nd row in data_import(1,4), it gets inserted as a new column (i.e. column 5) in the original data (data_import)? Similarly, 3rd row in data_import(1,4) should be inserted as a new column (column 6) in the original data (data_import). This procedure should also be repeated in the original Column 5 data which also has a similar 3 x 1 structure within each cell.
I hope I'm not being too vague in what I am describing, but I'm really not sure what I'm supposed to do regarding the commands to call for the operation. Thank you in advance.
Your desired final output has columns which are made up of these cells converted from 3 x 1 arrays to 1 x 3 cell arrays and then concatenated for each row. It's easier to do the concatenation first with the elements the "wrong way round" and then transpose the final result:
data_import = [data_import(:,1:3) num2cell([data_import{:,4}; data_import{:,5}]') data_import(:,6:end)];
I have two time series x and y which roughly cover the same period of time. The data is in daily form however there are some days that have data in one dataset but no data in the other. I wish to use matlab to create two data-sets of equal size with matching dates. Essentially I wish to remove the days that don't have data in both x and y. Is there a simple way to do this? Thanks.
You could use an inner join (see help join) if you are able to convert your time series into datasets. If not, you could use the ismember function, applying it only to the dates.
Something like this will work:
a = {'2015-01-01', '2015-02-02', '2015-03-03'};
b = {'2015-01-01', '2015-03-03', '2015-04-04'};
newA = a(ismember(a,b));
newB = b(ismember(b,a));
I'm porting a Matlab script to Python. Below is an extract:
%// Create a list of unique trade dates
DateList = unique(AllData(:,1));
%// Loop through the dates
for DateIndex = 1:size(DateList,1)
CalibrationDate = DateList(DateIndex);
%// Extract the data for a single calibration date (but all expiries)
SubsetIndices = ismember(AllData(:,1) , DateList(DateIndex)) == 1;
SubsetAllExpiries = AllData(SubsetIndices, :);
AllData is an N-by-6 cell matrix; the first 2 columns are dates (strings) and the other 4 are numbers. In Python I will be getting this data out of a CSV file, so something like this:
import numpy as np
AllData = np.recfromcsv(open("MyCSV.csv", "rb"))
So now, if I'm not mistaken, AllData is a numpy array of ordinary tuples. Is this the best format to have this data in? The goal is to extract a list of unique dates from column one and, for each date, extract the rows with that date in column one (column one is ordered). Then, for each of those rows, do some maths on the numbers and date in the remaining 5 columns.
So in MATLAB I can get the list of dates with unique(AllData(:,1)), and then I can get the records (rows) corresponding to a given date (i.e. with that date in column one) like this:
SubsetIndices = ismember(AllData(:,1) , MyDate) == 1;
SubsetAllExpiries = AllData(SubsetIndices, :);
How can I best achieve the same results in Python?
To put things in context, np.recfromcsv is just a modified version of np.genfromtxt which outputs record arrays instead of structured arrays.
A structured array lets you access the individual fields (here, your columns) by their names, like in my_array["field_one"] while a record array gives you the same plus the possibility to access the fields as attributes, like in my_array.field_one. I'm not fond of "access-as-attributes", so I usually stick to structured arrays.
For your information, structured/record arrays are not arrays of tuples but arrays of a numpy object called np.void: each element is a block of memory composed of as many sub-blocks as there are fields, the size of each sub-block depending on its datatype.
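As a minimal sketch of the difference between the two access styles, assuming a small made-up array (the field names date and value and the sample values are hypothetical, not taken from the question):

```python
import numpy as np

# Hypothetical two-field structured array; names and values are illustrative.
structured = np.array(
    [("2015-01-01", 1.5), ("2015-01-02", 2.5)],
    dtype=[("date", "U10"), ("value", "f8")],
)

# Structured array: fields are accessed by name.
values_by_name = structured["value"]

# Viewing the same memory as a record array adds attribute access.
record = structured.view(np.recarray)
values_by_attribute = record.value

# A single element of a structured array is a np.void, not a tuple.
first_element = structured[0]
element_is_void = isinstance(first_element, np.void)
```

Both access styles read the same underlying data, so the choice between them is mostly a matter of taste.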
That said, yes, what you seem to have in mind is exactly the kind of usage for a structured array. The approach would then be:
1. Take your dates array and filter it to find the unique elements.
2. Find the indices of these unique elements, as an array of integers we'll call, say, matching.
3. Use matching to access the corresponding records (i.e., rows of your array) using fancy indexing, as my_array[matching].
4. Perform your computations on the records, as you want.
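The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the array all_data, its field names (trade_date, expiry, price) and its values are made up, and the final computation is a placeholder for whatever maths you actually need.

```python
import numpy as np

# Hypothetical stand-in for AllData; field names and values are made up.
all_data = np.array(
    [
        ("2015-01-01", "2015-06-30", 1.0),
        ("2015-01-01", "2015-12-31", 2.0),
        ("2015-01-02", "2015-06-30", 3.0),
    ],
    dtype=[("trade_date", "U10"), ("expiry", "U10"), ("price", "f8")],
)

# Unique dates (np.unique also returns them sorted).
date_list = np.unique(all_data["trade_date"])

subsets = {}
for calibration_date in date_list:
    # Integer indices of the rows carrying this date.
    matching = np.flatnonzero(all_data["trade_date"] == calibration_date)
    # Fancy indexing pulls out the corresponding records.
    subsets[str(calibration_date)] = all_data[matching]

# Placeholder computation on one subset: the mean of a numeric field.
mean_first_day = subsets["2015-01-01"]["price"].mean()
```

The comparison all_data["trade_date"] == calibration_date plays the role of ismember in the MATLAB extract; the boolean mask could also be used for indexing directly, without np.flatnonzero.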
Note that you can keep your dates as strings or transform them into datetime objects using a user-defined converter, as described in the documentation. For example, you could transform a YYYY-MM-DD string into a datetime object with lambda s: datetime.datetime.strptime(s, "%Y-%m-%d"). That way, instead of having, say, an array of N records where each record consists of two dates as strings and 4 floats, you would have an array of N records where each record consists of two datetime objects and 4 floats.
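Here is a small sketch of the conversion itself. To keep it self-contained it applies the converter to in-memory rows rather than passing it through the file reader's converters argument, and the rows and field names are hypothetical:

```python
import datetime

import numpy as np


def to_datetime(s):
    """Parse a YYYY-MM-DD string into a datetime object."""
    return datetime.datetime.strptime(s, "%Y-%m-%d")


# Hypothetical rows: two date strings followed by one float.
raw_rows = [
    ("2015-01-01", "2015-06-30", 1.0),
    ("2015-01-02", "2015-12-31", 2.0),
]

# Object ("O") fields hold the datetime objects inside the structured array.
converted = np.array(
    [(to_datetime(d1), to_datetime(d2), v) for d1, d2, v in raw_rows],
    dtype=[("trade_date", "O"), ("expiry", "O"), ("price", "f8")],
)

# With datetime objects, date arithmetic works directly on the fields.
days_to_expiry = [(row["expiry"] - row["trade_date"]).days for row in converted]
```

Keeping the dates as datetime objects makes differences such as time to expiry straightforward, at the cost of object fields being slower than native numpy types.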
Note the shape of your array (via my_array.shape), it says (N,), meaning it's a 1D array, even if it looks like a 2D table with multiple columns. You can access individual fields (each "column") by using its name. For example, if we create an array consisting of one string field called first and one int field called second, like that:
x = np.array([('a',1),('b',2)], dtype=[('first',"|S10"),('second',int)])
you could access the first column with
>>> x['first']
array(['a', 'b'],
dtype='|S10')
How do I write a text file in the same format that it is read in MATLAB?
I looked, and my question is almost the same as the above question.
I want to read in a file which is 84641 x 175.
I want to make a new .txt file that is 84641 x 40, deleting the rest of the columns.
I have to rewrite the dates n times. The date is in the first column in the format 6/26/2010 and the time is in the 2nd column in the format ' 00:00:04'.
When I use the code given in the above question I keep getting the error:
??? Error using ==> reshape
Product of known dimensions, 181, not divisible into total number of elements, 14812175.
Error in ==> write at
data = reshape(data{1},N+6,[])';
When I comment this line out, I get errors in the fprintf statements that write the date and the data.
Any ideas??
thanks
As the author of the accepted answer in the question you link to, I'll try to explain what I think is going wrong.
The code in my answer is designed to read data from a file which has a date XX/XX/XXXX in the first column, a time XX:XX:XX in the second column, and N additional columns of data.
You list the number of elements in data as 14812175, which is evenly divisible by 175. This implies that your input data file has 2 columns for the date and time, then 169 additional columns of data. This value of 169 is what you have to use for N. When the date and time columns are read from the input file they are broken up into 3 columns each in data (for a total of 6 columns), which when added to the 169 additional columns of data gives you 175.
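The arithmetic behind this can be checked quickly. The sketch below is written in Python purely as a calculator; the numbers come from the question and the error message, and the reading that the 181 arose from setting N to 175 is an inference from the reshape call (N+6), not something the question states.

```python
total_elements = 14812175   # total number of elements reported by the error
rows = 84641                # number of rows in the input file
date_time_columns = 6       # date and time are each split into 3 columns

# Columns per row after reading, and the resulting value of N.
columns_per_row = total_elements // rows
n_data_columns = columns_per_row - date_time_columns

# The failing reshape used a row length of 181, i.e. N + 6 with N = 175
# (assumption: N had been set to the full column count).
failing_row_length = 181
remainder_with_181 = total_elements % failing_row_length
remainder_with_175 = total_elements % columns_per_row
```

A nonzero remainder for 181 is what makes reshape fail, while 175 divides the total exactly.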
After reshaping, the size of data should be 84641-by-175. The first 6 columns contain the date and time values. If you want to write the date, the time, and the first 40 columns of the additional data to a new file, you would only have to change one line of the code in my answer. This line:
fprintf(fid,', %.1f',data(i,7:end)); %# Output all columns of data
Should be changed to this:
fprintf(fid,', %.1f',data(i,7:46)); %# Output first 40 columns of data