Changing format of many files in Excel - perl

I have a folder filled with thousands of csv files. When I open one file, the data looks like:
20110503 01:46.0 1527.8 1 E
20110503 01:46.0 1537.8 1 E
20110504 37:40.0 1536.6 1 E
20110504 37:40.0 1533.6 1 E
20110504 36:17.0 1531.1 1 E
The second column (time) has minutes and seconds before the decimal point. If I select the second column, right-click, choose Format Cells, select Time, and change to the 13:30:55 mode, the same data looks like:
20110503 19:01:46 1527.8 1 E
20110503 19:01:46 1537.8 1 E
20110504 0:37:40 1536.6 1 E
20110504 0:37:40 1533.6 1 E
20110504 8:36:17 1531.1 1 E
Now I can see hours, minutes and seconds. I have written a MATLAB function that reads these files, but it needs to be able to read the hours, so it can only be used after I change the format to display them. Now I need to apply the function to all the files in the folder.
I'm wondering, is there a way to change the default time display so hours are included? If not, is there a way of writing a script to change the format of these files? Thanks!
Note: the part of my MATLAB function that reads the file looks like:
fid = fopen('E:\Tick Data\Data Output\NGU13.csv','rt');
c = fscanf(fid, '%d,%d:%d:%d,%f,%d,%*c');
datamat = reshape(c,6,length(c)/6)'; % reshape into matrix
yyyymmdd = datamat(:,1);
hr = datamat(:,2);
mn = datamat(:,3);
sec = datamat(:,4);
pp = datamat(:,5); % price
vv = datamat(:,6); % volume
In Excel: (screenshot not reproduced here)
In Notepad, you can see hours, minutes, seconds, and milliseconds:
20111206,09:50:56.411,4.320,1,E
20111206,10:02:10.167,4.300,1,E
20111206,11:24:09.052,4.313,1,E
20111206,11:46:09.359,4.307,1,E
20111206,11:50:22.785,4.320,1,E

For a record of the type
20010402, 09:30:24.456, 4.235, 1, E
you should use a format string like:
fmt = '%f%f:%f:%f.%f%f%f%*s';
data = textscan(fid, fmt, 'Delimiter',',','CollectOutput',true);
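Putting it together, a minimal sketch of reading a whole file of such records and splitting the collected matrix into columns (this assumes the file already shows the full hh:mm:ss.mmm times, as in the Notepad excerpt above; 'ticks.csv' is a placeholder name):
fid = fopen('ticks.csv','rt');
fmt = '%f%f:%f:%f.%f%f%f%*s';           % yyyymmdd, hh:mm:ss.mmm, price, volume, skip flag
data = textscan(fid, fmt, 'Delimiter',',', 'CollectOutput',true);
fclose(fid);
M = data{1};                            % one numeric matrix, one row per tick
yyyymmdd = M(:,1);
hr  = M(:,2);
mn  = M(:,3);
sec = M(:,4);                           % whole seconds
ms  = M(:,5);                           % milliseconds
pp  = M(:,6);                           % price
vv  = M(:,7);                           % volume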

Related

Get matrix from one text file

I need to get a matrix from the attached text file.
The file is:
stk.v.11.0
WrittenBy STK_v11.2.0
BEGIN Attitude
NumberOfAttitudePoints 1441
BlockingFactor 20
InterpolationOrder 1
CentralBody Earth
ScenarioEpoch 1 Sep 2019 07:30:00.000000
Epoch in JDate format: 2458727.81250000000000
Epoch in YYDDD format: 19244.31250000000000
Time of first point: 1 Sep 2019 07:30:00.000000000 UTCG = 2458727.81250000000000 JDate = 19244.31250000000000 YYDDD
CoordinateAxes Fixed
AttitudeTimeQuaternions
0.00000000000000e+00 -6.31921733422396e-01 -3.17293118154861e-01 -3.46458799928973e-01 6.16414065342263e-01
6.00000000000000e+01 -6.43812442721383e-01 -3.36172942616126e-01 -3.26612615262581e-01 6.04828480481266e-01
1.20000000000000e+02 -6.55079207620880e-01 -3.54631351046725e-01 -3.06354155948717e-01 5.92667670562959e-01
1.80000000000000e+02 -6.65707851270427e-01 -3.72650894083426e-01 -2.85703135087044e-01 5.79946623834615e-01
I need to separate these 4 rows of the matrix and assign them to one matrix variable.
In fact I have a bigger matrix than that (nearly 1400 rows) that I must extract from the text file and store in one matrix variable.
Use readmatrix and specify the number of lines to skip to get to the matrix data. If the number is fixed at 25 as in your example, then:
>> a = readmatrix('textfile.txt','NumHeaderLines',25)
a =
0 -0.6319 -0.3173 -0.3465 0.6164
60.0000 -0.6438 -0.3362 -0.3266 0.6048
120.0000 -0.6551 -0.3546 -0.3064 0.5927
180.0000 -0.6657 -0.3727 -0.2857 0.5799
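If the number of header lines varies from file to file, a sketch of one way to find it automatically (this assumes R2020b or newer for readlines, and that the numeric block starts right after the AttitudeTimeQuaternions line):
txt = readlines('textfile.txt');                             % one string per line
nHeader = find(contains(txt,'AttitudeTimeQuaternions'), 1);  % number of lines to skip
a = readmatrix('textfile.txt', 'NumHeaderLines', nHeader);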

Analyze weather data stored in csv

I have some weather data stored in a csv file in the form of: „id, date, temperature, rainfall“, with id being the weather station and, obviously, date being the date of measurement. The file contains the data of 3 different stations over a period of 10 years.
What I'd like to do is analyze the data of each station and each year. For example: I'd like to calculate day-to-day differences in temperature [abs((n+1)-n)] for each station and each year.
I thought while-loops could be a possibility, with the loop calculating something as long as the id value is equal to the one in the next row.
But I’ve no idea how to do it.
Best regards
If you still need assistance, I would consider importing the .csv file data using "readtable". So long as only the first row is text, MATLAB will create a 'table' variable (this shouldn't be an issue for a .csv file). The individual columns can be accessed via "tablename.header" and can be re-established as double data type (e.g. variable_1 = tablename.header). You can then concatenate your dataset as you like. As for sorting by date and station id, I would advocate using "sortrows". For example, if the station id is the first column, sortrows(data,1) will sort "data" by the station id, and sortrows(data,[1 2]) will sort "data" by the first column, then by the second column. From there, you can write an if statement to compare the station ids and perform the required calculations. I hope my brief answer is somewhat helpful.
A basic code structure would be:
path=['copy and paste file path here']; % show matlab where to look
data=readtable([path '\filename.csv'], 'ReadVariableNames',1); % read the file from csv format to table
variable1=data.header1 % general example of making double type variable from table
variable2=data.header2
variable3=data.header3
double_data=[variable1 variable2 variable3]; % concatenates the three columns together
sorted_data=sortrows(double_data, [1 2]); % sorts double_data by column 1 then column 2
It always helps to have actual data to work on and specifics as to what kind of output format is expected. Basically, ins and outs :) With the little info provided, I figured I would generate random data for you in the first section, and then calculate some stats in the second. I include the loop as an example since that's what you asked, but I highly recommend using vectorized calculations whenever available, such as the one done in summary stats.
%% example for weather stations
% generation of random data to correspond to what your csv file looks like
rng(1); % keeps the random seed for testing purposes
nbDates = 1000; % number of days of data
nbStations = 3; % number of weather stations
measureDates = repmat((now()-(nbDates-1):now())',nbStations,1); % nbDates days of data ending today
stationIds = kron((1:nbStations)',ones(nbDates,1)); % assuming 3 weather stations with IDs [1,2,3]
temp = rand(nbStations*nbDates,1)*70+30; % temperatures are in F and vary between 30 and 100 degrees
rain = max(rand(nbStations*nbDates,1)*40-20,0); % rain fall is 0 approximately half the time, and between 0mm and 20mm the rest of the time
csv = table(measureDates, stationIds, temp, rain);
clear measureDates stationIds temp rain;
% augment the original dataset as needed
years = year(csv.measureDates);
data = [csv,array2table(years)];
sorted = sortrows( data, {'stationIds', 'measureDates'}, {'ascend', 'ascend'} );
% example looping through your data
for i = 1 : size( sorted, 1 )
    fprintf( 'Id=%d, year=%d, temp=%g, rain=%g', sorted.stationIds( i ), sorted.years( i ), sorted.temp( i ), sorted.rain( i ) );
    if( i > 1 && sorted.stationIds( i )==sorted.stationIds( i-1 ) && sorted.years( i )==sorted.years( i-1 ) )
        fprintf( ' => absolute difference with day before: %g', abs( sorted.temp( i ) - sorted.temp( i-1 ) ) );
    end
    fprintf( '\n' ); % new line
end
% depending on the statistics you wish to do, other more efficient ways of
% accessing summary stats might be accessible, for example:
grpstats( data ...
, {'stationIds','years'} ... % group by categories
, {'mean','min','max','meanci'} ... % statistics we want
, 'dataVars', {'temp','rain'} ... % variables on which to calculate stats
) % doesn't require data to be sorted or any looping
This produces one line printed for each row of data (and only calculates difference in temperature when there is no year or station change). It also produces some summary stats at the end, here's what I get:
stationIds years GroupCount mean_temp min_temp max_temp meanci_temp mean_rain min_rain max_rain meanci_rain
__________ _____ __________ _________ ________ ________ ________________ _________ ________ ________ ________________
1_2016 1 2016 82 63.13 30.008 99.22 58.543 67.717 6.1181 0 19.729 4.6284 7.6078
1_2017 1 2017 365 65.914 30.028 99.813 63.783 68.045 5.0075 0 19.933 4.3441 5.6708
1_2018 1 2018 365 65.322 30.218 99.773 63.275 67.369 4.7039 0 19.884 4.0615 5.3462
1_2019 1 2019 188 63.642 31.16 99.654 60.835 66.449 5.9186 0 19.864 4.9834 6.8538
2_2016 2 2016 82 65.821 31.078 98.144 61.179 70.463 4.7633 0 19.688 3.4369 6.0898
2_2017 2 2017 365 66.002 30.054 99.896 63.902 68.102 4.5902 0 19.902 3.9267 5.2537
2_2018 2 2018 365 66.524 30.072 99.852 64.359 68.69 4.9649 0 19.812 4.2967 5.6331
2_2019 2 2019 188 66.481 30.249 99.889 63.647 69.315 5.2711 0 19.811 4.3234 6.2189
3_2016 3 2016 82 61.996 32.067 98.802 57.831 66.161 4.5445 0 19.898 3.1523 5.9366
3_2017 3 2017 365 63.914 30.176 99.902 61.932 65.896 4.8879 0 19.934 4.246 5.5298
3_2018 3 2018 365 63.653 30.137 99.991 61.595 65.712 5.3728 0 19.909 4.6943 6.0514
3_2019 3 2019 188 64.201 30.078 99.8 61.319 67.082 5.3926 0 19.874 4.4541 6.3312
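For the day-to-day temperature differences themselves, here is a vectorized alternative to the loop above (a sketch that assumes the sorted table built in the code above; rows where the station or year changes get NaN instead of a difference):
dT = [NaN; abs(diff(sorted.temp))];                               % |temp(n) - temp(n-1)|
sameGroup = [false; diff(sorted.stationIds)==0 & diff(sorted.years)==0];
dT(~sameGroup) = NaN;                                             % no difference across station/year boundaries
sorted.tempDiff = dT;                                             % store alongside the rest of the data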

how to read from file and display the data in desired rows in Matlab

I am trying to read from a file and display the data in rows 6, 11, 111 and 127 in MATLAB. I could not figure out how to do it. I have been searching MATLAB forums and this platform for an answer. I used fscanf, textscan and other functions, but they did not work as intended. I also used a for loop, but again the output was not what I wanted. At the moment I can only read one row and display it. I simply want to display all of them (the data in the rows given above) at the same time. How can I do that?
MATLAB code:
n = [0 :1: 127];
%% Problem 1
figure
x1 = cos(0.17*pi*n)
%it creates file and writes content of x1 to the file
fileID = fopen('file.txt','w');
fprintf(fileID,'%d \n',x1);
fclose(fileID);
%line number can be changed in order to obtain wanted values.
fileID = fopen('file.txt');
line = 6;
C = textscan(fileID,'%s',1,'delimiter','\n', 'headerlines',line-1);
celldisp(C)
fclose(fileID);
and this is the file:
1
8.607420e-01
4.817537e-01
-3.141076e-02
-5.358268e-01
-8.910065e-01
-9.980267e-01
-8.270806e-01
-4.257793e-01
9.410831e-02
5.877853e-01
9.177546e-01
9.921147e-01
7.901550e-01
3.681246e-01
-1.564345e-01
-6.374240e-01
-9.408808e-01
-9.822873e-01
-7.501111e-01
-3.090170e-01
2.181432e-01
6.845471e-01
9.602937e-01
9.685832e-01
7.071068e-01
2.486899e-01
-2.789911e-01
-7.289686e-01
-9.759168e-01
-9.510565e-01
-6.613119e-01
-1.873813e-01
3.387379e-01
7.705132e-01
9.876883e-01
9.297765e-01
6.129071e-01
1.253332e-01
-3.971479e-01
-8.090170e-01
-9.955620e-01
-9.048271e-01
-5.620834e-01
-6.279052e-02
4.539905e-01
8.443279e-01
9.995066e-01
8.763067e-01
5.090414e-01
-4.288121e-15
-5.090414e-01
-8.763067e-01
-9.995066e-01
-8.443279e-01
-4.539905e-01
6.279052e-02
5.620834e-01
9.048271e-01
9.955620e-01
8.090170e-01
3.971479e-01
-1.253332e-01
-6.129071e-01
-9.297765e-01
-9.876883e-01
-7.705132e-01
-3.387379e-01
1.873813e-01
6.613119e-01
9.510565e-01
9.759168e-01
7.289686e-01
2.789911e-01
-2.486899e-01
-7.071068e-01
-9.685832e-01
-9.602937e-01
-6.845471e-01
-2.181432e-01
3.090170e-01
7.501111e-01
9.822873e-01
9.408808e-01
6.374240e-01
1.564345e-01
-3.681246e-01
-7.901550e-01
-9.921147e-01
-9.177546e-01
-5.877853e-01
-9.410831e-02
4.257793e-01
8.270806e-01
9.980267e-01
8.910065e-01
5.358268e-01
3.141076e-02
-4.817537e-01
-8.607420e-01
-1
-8.607420e-01
-4.817537e-01
3.141076e-02
5.358268e-01
8.910065e-01
9.980267e-01
8.270806e-01
4.257793e-01
-9.410831e-02
-5.877853e-01
-9.177546e-01
-9.921147e-01
-7.901550e-01
-3.681246e-01
1.564345e-01
6.374240e-01
9.408808e-01
9.822873e-01
7.501111e-01
3.090170e-01
-2.181432e-01
-6.845471e-01
-9.602937e-01
-9.685832e-01
-7.071068e-01
-2.486899e-01
2.789911e-01
Assuming the file is not exceedingly large, the simplest way would probably be to read the entire file and index the output with your desired line numbers.
line = [6 11 111 127];
fileID = fopen('file.txt');
C = textscan(fileID,'%s','delimiter','\n');
fclose(fileID);
disp(C{1}(line))
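If you want those rows as numbers rather than strings, a small follow-up using the same variables as above is to convert the selected cells with str2double:
vals = str2double(C{1}(line))   % numeric values of rows 6, 11, 111 and 127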

Matlab - read unstructured file

I'm quite new to MATLAB and I've been searching, unsuccessfully, for the following issue: I have an unstructured txt file with several rows I don't need, but a number of rows inside the file have a structured format. I've been researching how to "load" the file so I can edit it, but cannot find anything.
Since I don't know if I was clear, let me show you the content of the file:
8782 PROJCS["UTM-39",GEOGC.......
1 676135.67755473056 2673731.9365976951 -15 0
2 663999.99999999302 2717629.9999999981 -14.00231124135486 3
3 709999.99999999162 2707679.2185399458 -10 2
4 679972.20003752434 2674637.5679516452 0.070000000000000007 1
5 676124.87132483651 2674327.3183533219 -18.94794942571912 0
6 682614.20527054626 2671000.0000000549 -1.6383425512446661 0
...........
8780 682247.4593014461 2676571.1515358146 0.1541080392180566 0
8781 695426.98657108378 2698111.6168302582 -8.5039945992245904 0
8782 674723.80100125563 2675133.5486935056 -19.920312922947179 0
16997 3 21
1 2147 658 590
2 1855 2529 5623
.........
I'd appreciate it if someone could tell me whether it is possible to open the file and then load only the rows starting with 1 up to the one starting with 8782. The first row and all the others are not important.
I know that manually copying and pasting into a new file would be a solution, but I'd like to know whether the file can be read and edited programmatically, for other ideas I have.
Thanks!
% Now lines{i} is the string of the i'th line.
lines = strsplit(fileread('filename'), '\n')
% Now elements{i}{j} is the j'th field of the i'th line.
elements = arrayfun(@(x){strsplit(x{1}, ' ')}, lines)
% Remove the first row:
elements(1) = []
% Take the first several rows:
n_rows = 8782
elements = elements(1:n_rows)
Or if the number of rows you need to take is not fixed, you can replace the last two statements above by:
firsts = arrayfun(@(x)str2num(x{1}{1}), elements)
n_rows = find((firsts(2:end) - firsts(1:end-1)) ~= 1, 1, 'first')
elements = elements(1:n_rows)
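To turn the kept rows into a numeric matrix (a sketch, assuming every kept row has the same number of fields):
rowsNum = cellfun(@(r) str2double(r), elements, 'UniformOutput', false);  % each row -> numeric vector
numMat  = vertcat(rowsNum{:});                                            % stack the rows into one matrix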

Creating open-high-low-close (ohlc) bars from tick data in Matlab

I have a CSV file 'XPQ12.csv' of futures tick data in the following form:
20090312 30:14.0 717.25 1 E
20090312 30:15.0 718.47 1 E
20090312 30:17.0 717.25 1 E
20090312 30:32.0 718.42 1 E
20090312 30:49.0 715.32 1 E
20090312 30:58.0 717.57 1 E
20090312 31:06.0 716.65 3 E
20090312 31:12.0 718.35 2 E
20090312 31:45.0 721.14 1 E
20090312 31:52.0 719.24 1 E
20090312 32:11.0 717.02 6 E
20090312 32:29.0 717.14 1 E
20090312 32:35.0 717.34 1 E
20090312 32:55.0 717.26 1 E
(The first column is the year-month-date, the second column is minutes:seconds.tenths of a second, the third column is the price, the fourth column is the number of contracts traded, and the fifth indicates whether the trade was electronic or in a pit.) In my actual data set, I may have thousands of price quotes within any given minute.
I read the file using the following code:
fid = fopen('C:\Program Files\MATLAB\R2013a\XPQ12.csv','r');
[c] = fscanf(fid, '%d,%d:%d.%d,%f,%d,%c')
Which outputs:
20090312
30
14
0
717.25
1
69
20090312
30
15
0
718.47
3
69
.
.
.
(the 69s are the ASCII code for E, I believe)
Now I want to cut this up into one minute ohlc bars, so that for each minute, I record what the first, highest, lowest, and last price was within that minute. I'd really like to know the best way to go about this.
My original idea was to store the sequence of minutes in a vector d and, while working through the data, each time the number at the end of d changed, record the corresponding price as an open, record the previous price as a close for the last bar, and find the largest and smallest prices between each open and close.
c(2) is the first minute, so I said:
d(1)=c(2);
and then noting that I'd always be counting by 7 before getting to the next minute, I said:
Nrows = numel(textread('XPQ12.csv','%1c%*[^\n]')); % counts rows in file
for i=1:Nrows
    if mod(i-2,7)== 0;
        d(end+1)=c(i);
    end
end
which should fill up d with all the minutes:
30
30
30
30
30
30
31
31
31
31
32
32
32
32
in the case of the example data. I'm kind of lost what to do from here, or if what I'm doing is on the right track.
From where you are:
Minutes = c(2:7:end);
Prices = c(5:7:end);
% make sure both vectors cover the same number of records (the last record may be incomplete)
if (length(Prices)>length(Minutes))
    Prices = Prices(1:length(Minutes));
elseif (length(Prices)<length(Minutes))
    Minutes = Minutes(1:length(Prices));
end
% make the minute count keep increasing across 59 -> 0 wrap-arounds
OverflowValues = 1+find(Minutes(2:end)==0 & Minutes(1:end-1)==59);
for v=length(OverflowValues):-1:1
    Minutes(OverflowValues(v):end) = Minutes(OverflowValues(v):end)+60;
end
MinuteValues = unique(Minutes);
% one open/high/low/close value per distinct minute
Highs = zeros(1,length(MinuteValues));
Lows  = zeros(1,length(MinuteValues));
First = zeros(1,length(MinuteValues));
Last  = zeros(1,length(MinuteValues));
for v=1:length(MinuteValues)
    Highs(v) = max(Prices(Minutes==MinuteValues(v)));
    Lows(v)  = min(Prices(Minutes==MinuteValues(v)));
    First(v) = Prices(find(Minutes==MinuteValues(v),1,'first'));
    Last(v)  = Prices(find(Minutes==MinuteValues(v),1,'last'));
end
Using textread would make this easier for you, as mentioned.
(If you are lost at this stage, I don't think accumarray, as mentioned in the comments, is the best place to start!)
By the way, this assumes that the minutes keep increasing beyond 60 and that you don't have hours in there somewhere; otherwise this won't work at all.
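As a sketch of reading the columns directly instead of reshaping the fscanf output (this assumes the same comma-separated layout as XPQ12.csv and mirrors the textscan approach from the first answer above):
fid = fopen('C:\Program Files\MATLAB\R2013a\XPQ12.csv','r');
C = textscan(fid, '%f%f:%f.%f%f%f%*s', 'Delimiter',',', 'CollectOutput',true);
fclose(fid);
M = C{1};                 % columns: yyyymmdd, minute, second, tenth, price, volume
Minutes = M(:,2);
Prices  = M(:,5);
From there, Minutes and Prices can be fed straight into the per-minute loop above.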