Skip file headers in MATLAB

My file format is *.Vt2; it's a strong ground motion record.
It can be downloaded at http://peer.berkeley.edu/smcat/search.html
Choose an earthquake, then download the record of any station. The data looks like this:
PEER STRONG MOTION DATABASE RECORD. PROCESSING BY PACIFIC ENGINEERING.
TAIWAN SMART1 (45) 11/14/86, SMART1 C00, EW
VELOCITY TIME HISTORY IN UNITS OF CM/S. FILTER POINTS: HP=0.1 Hz LP=25.0 Hz
NPTS= 4000, DT= .01000 SEC
.9437205E-03 .1497919E-01 .3328475E-01 .5111011E-01 .6865002E-01
.8659123E-01 .9975034E-01 .1072606E+00 .1168364E+00 .1217983E+00
.1135203E+00 .8993586E-01 .6435175E-01 .3819334E-01 .1840042E-01
What I want to do is skip the 4 header lines and read the numbers (by row), then save them in an
N×1 matrix M. But I don't know how to do it.
Any help would be great.

Try:
x=importdata('filename.txt');
Your data will be:
x.data

I believe you might try this:
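fid = fopen(filename,'r');
A = textscan(fid,'%f%f%f%f%f','HeaderLines',4); % values are whitespace-separated, so no 'Delimiter' is needed
fclose(fid);
data = cat(2,A{:}); % one row of the file per row of data
M = reshape(data.',[],1); % N-by-1 vector in row-by-row order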
I hope it helps.
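Another way to get the N×1 vector directly is to discard the header lines with fgetl and then let fscanf read every remaining number in file order. The following is a minimal sketch, not a tested reader for real *.Vt2 files: the temporary file name and its contents are made up so the example is self-contained, and it assumes exactly 4 header lines followed by whitespace-separated numbers.

```matlab
% Write a small sample file in the same layout (4 header lines, 5 values per row).
fname = [tempname '.txt'];          % hypothetical temporary file
fid = fopen(fname, 'w');
fprintf(fid, 'HEADER LINE 1\nHEADER LINE 2\nHEADER LINE 3\nNPTS= 10, DT= .01000 SEC\n');
fprintf(fid, '%15.7E%15.7E%15.7E%15.7E%15.7E\n', (1:10) * 0.5);
fclose(fid);

% Read it back: discard the header lines, then read every number in file order.
fid = fopen(fname, 'r');
for k = 1:4
    fgetl(fid);                     % skip one header line
end
M = fscanf(fid, '%f');              % column vector, values in row-by-row order
fclose(fid);
delete(fname);                      % clean up the sample file
```

Because fscanf with a single '%f' returns a column vector, no reshape is needed afterwards.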

Related

How to forecast electricity consumption using MATLAB's command "forecast"?

I have the electricity consumption data of a region for the year 2017. So I have two arrays, one with the months and the other with the consumption. I want to use the command forecast to predict the consumption of the first month of 2018, but I don't know how to do this even after reading the examples on MATLAB's help page.
Example:
data = {1166974.25000000, 1132479.36000000, 1137173.86000000, 1145853.58000000, 1118875.72000000, 1071456.85000000 ,1047171.87000000, 1071179.65000000 ,1077986.32000000 ,1112111.10000000, 1149668.47000000 ,1161649.19000000, 1175576.25000000 ,1126753.31000000 ,1204843.11000000 ,1183946.03000000, 1153080.36000000, 1120182.07000000, 1104726.03000000 ,1108110.02000000 ,1137729.28000000 ,1189699.45000000, 1252975.55000000, 1218118.20000000 ,1259580 ,1208193 ,1194430, 1244458, 1218867, 1205705 ,1177362, 1185584, 1164758, 1226991 ,1286044 ,1305312, 1360681.70000000 ,1332020 ,1306497.90000000 ,1299819.10000000 ,1316167.70000000 ,1246959.40000000 ,1256700.20000000 ,1266490.60000000, 1275642.90000000, 1358839.80000000, 1361440.10000000, 1398059.40000000};
data = [data{:}];
sys = ar(data,4)
K = 49;
p = forecast(sys,data,K);
plot(data,'b',p,'r'), legend('measured','forecasted')
Why does this not work?
I hope you found a solution to your problem. If you have not, maybe I can be of assistance.
MathWorks' documentation of the function notes that the "PastData" argument (named "data" in your code) can be either an iddata object or an N x N_y matrix of doubles. Your implementation uses a matrix, so I decided to try out the code with an iddata object.
rawdat = [1166974.25000000, 1132479.36000000, 1137173.86000000, 1145853.58000000, 1118875.72000000, 1071456.85000000 ,1047171.87000000, 1071179.65000000 ,1077986.32000000 ,1112111.10000000, 1149668.47000000 ,1161649.19000000, 1175576.25000000 ,1126753.31000000 ,1204843.11000000 ,1183946.03000000, 1153080.36000000, 1120182.07000000, 1104726.03000000 ,1108110.02000000 ,1137729.28000000 ,1189699.45000000, 1252975.55000000, 1218118.20000000 ,1259580 ,1208193 ,1194430, 1244458, 1218867, 1205705 ,1177362, 1185584, 1164758, 1226991 ,1286044 ,1305312, 1360681.70000000 ,1332020 ,1306497.90000000 ,1299819.10000000 ,1316167.70000000 ,1246959.40000000 ,1256700.20000000 ,1266490.60000000, 1275642.90000000, 1358839.80000000, 1361440.10000000, 1398059.40000000];
data = iddata(rawdat',[]);
sys = ar(data,4);
K = 49;
p = forecast(sys,data,K);
plot(data,'b',p,'r'), legend('measured','forecasted')
Notice that I also changed the initial data's variable name and type.
The above code leads to the following figure.
Please update us. Thanks.
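If only the next month is needed, the same approach can be reduced to a one-step forecast. This is a sketch under assumptions: the series below is synthetic (made-up values, not the poster's data), and it needs the System Identification Toolbox for ar, forecast and iddata. One plausible reason the original call fails is that a 1-by-48 row is read as a single sample with 48 outputs rather than 48 samples of one output; the post does not show the error message, so treat that as a guess.

```matlab
% Requires the System Identification Toolbox (ar, forecast, iddata).
rng(0);                                    % reproducible noise
t = (1:48)';
y = 1.1e6 + 4e3*t + 5e4*sin(2*pi*t/12) + 1e4*randn(48, 1);  % synthetic monthly series (made-up values)

data = iddata(y, []);                      % N-by-1 output, no input channel: a time series
sys  = ar(data, 4);                        % 4th-order autoregressive model
p    = forecast(sys, data, 1);             % forecast one step past the last sample
nextMonth = p.OutputData;                  % numeric forecast for sample 49
```

Note that the data is passed as a column (one row per month), which matches the N x N_y layout mentioned above.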

Issues fitting an exponential function

I'm having some serious issues fitting an exponential function (Beer-Lambert law) to my data. The optimization toolset function that I'm using produces terrible fits:
function [ Coefficients ] = fitting_new( Modified_Spectrum_Data,trajectory )
x_axis = trajectory;
fun = @(x,x_axis) (x(1)*exp((-x(2))*x_axis));
start = [Modified_Spectrum_Data(1) 0.05];
opts = statset('nlinfit'); % assumed default options; opts was not defined in the original post
nlm = nlinfit(x_axis,Modified_Spectrum_Data,fun,start,opts);
Coefficients = nlm;
end
Data:
Modified_Spectrum_Data = [1.11111111111111, 1.08784976353957, 1.06352170731165, 1.04099672033640, 1.02649723285838, 1.00423806910703, 0.994116452961827, 0.975928861361604, 0.963081773802984, 0.953191520906905, 0.940636278551651, 0.930360007604054, 0.922259178548511, 0.916659345499171, 0.909149956799775, 0.901241601559703, 0.895375741449218, 0.893308346234150, 0.887985459843162, 0.884657500398024, 0.883852990694089, 0.877158499678129, 0.874817832833850, 0.875428444059047, 0.873170360623947, 0.871461252768665, 0.867913776631497, 0.866459074988087, 0.863819528471106, 0.863228815347816 ,0.864369045426273 ,0.860602502500599, 0.862653463581049, 0.861169231463016, 0.858658616425390, 0.864588421841755, 0.858668693409622, 0.857993365648639]
trajectory = [0.0043, 0.9996, 2.0007, 2.9994, 3.9996, 4.9994, 5.9981, 6.9978, 7.9997, 8.9992, 10.0007, 10.9993, 11.9994, 12.9992, 14.0001, 14.9968, 15.9972, 16.9996, 17.9996, 18.999, 19.9992, 20.9996, 21.9994, 23.0003, 23.9992, 24.999, 25.9987, 26.9986, 27.999, 28.9991, 29.999, 30.9987, 31.9976, 32.9979, 33.9983, 34.9988, 35.999, 36.9991]
I've tried using multiple different fitting functions and messing around with the options, but they don't seem to make too big of a difference. Additionally, I've tried changing the initial guess, but again that doesn't really make a difference.
Excel seems to be able to fit the data perfectly fine, but I have 900 rows of data I want to fit so doing it in Excel is not possible.
Any help would be greatly appreciated, thank you.
You'll want to use cftool. Your data appears to follow a power law. Load the data as shown below and run cftool, then choose 'Modified_Spectrum_Data' as your x axis and 'trajectory' as your y. Select 'Power' from the drop-down menu towards the top of the GUI.
Modified_Spectrum_Data = [1.11111111111111, 1.08784976353957, 1.06352170731165, 1.04099672033640, 1.02649723285838, 1.00423806910703, 0.994116452961827, 0.975928861361604, 0.963081773802984, 0.953191520906905, 0.940636278551651, 0.930360007604054, 0.922259178548511, 0.916659345499171, 0.909149956799775, 0.901241601559703, 0.895375741449218, 0.893308346234150, 0.887985459843162, 0.884657500398024, 0.883852990694089, 0.877158499678129, 0.874817832833850, 0.875428444059047, 0.873170360623947, 0.871461252768665, 0.867913776631497, 0.866459074988087, 0.863819528471106, 0.863228815347816 ,0.864369045426273 ,0.860602502500599, 0.862653463581049, 0.861169231463016, 0.858658616425390, 0.864588421841755, 0.858668693409622, 0.857993365648639]
trajectory = [0.0043, 0.9996, 2.0007, 2.9994, 3.9996, 4.9994, 5.9981, 6.9978, 7.9997, 8.9992, 10.0007, 10.9993, 11.9994, 12.9992, 14.0001, 14.9968, 15.9972, 16.9996, 17.9996, 18.999, 19.9992, 20.9996, 21.9994, 23.0003, 23.9992, 24.999, 25.9987, 26.9986, 27.999, 28.9991, 29.999, 30.9987, 31.9976, 32.9979, 33.9983, 34.9988, 35.999, 36.9991]
cftool
For more information on the curve fitting (cftool), see: https://www.mathworks.com/help/curvefit/curvefitting-app.html
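Since the question mentions 900 rows, a GUI is awkward for batch work. A power law y = a*x^b can also be fitted in plain MATLAB with polyfit on log-transformed data. The sketch below uses made-up x and y values and assumes all values are strictly positive. It is a swapped-in technique, not what cftool does internally: linear regression in log-log space weights the errors differently from a nonlinear least-squares fit, so the coefficients can differ slightly on noisy data.

```matlab
x = (1:20)';                           % hypothetical positive x data
y = 2 * x.^(-0.5);                     % hypothetical y data following y = a*x^b exactly

coeffs = polyfit(log(x), log(y), 1);   % straight-line fit: log(y) = b*log(x) + log(a)
b = coeffs(1);                         % exponent of the power law
a = exp(coeffs(2));                    % prefactor of the power law
```

For many rows, the same three lines can be placed inside a for loop over the rows, storing a and b per row.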

Using low pass filter in matlab to get same endpoints of the data

This is an extension of my previous question: https://dsp.stackexchange.com/questions/28095/choosing-low-pass-filter-parameters
I am recording people from an overhead camera. I have tracks of each person's head obtained with some software. I want to remove the periodicity in the tracks caused by head wobbling.
I apply a low-pass Butterworth filter. I want the starting point and ending point of the filtered tracks to be the same as those of the unfiltered tracks.
Data:
K>> [xcor_i,ycor_i ]
ans =
-101.7000 -77.4040
-102.4200 -77.4040
-103.6600 -77.4040
-103.9300 -76.6720
-103.9900 -76.5130
-104.0000 -76.4780
-105.0800 -76.4710
-106.0400 -77.5660
-106.2500 -77.8050
-106.2900 -77.8570
-106.3000 -77.8680
-106.3000 -77.8710
-107.7500 -78.9680
-108.0600 -79.2070
-108.1200 -79.2590
-109.9500 -80.3680
-111.4200 -80.6090
-112.8200 -81.7590
-113.8500 -82.3750
-115.1500 -83.2410
-116.1500 -83.4290
-116.3700 -83.8360
-117.5000 -84.2910
-117.7400 -84.3890
-118.8800 -84.7770
-119.8400 -85.2270
-121.1400 -85.3250
-123.2200 -84.9800
-125.4700 -85.2710
-127.0400 -85.7000
-128.8200 -85.7930
-130.6500 -85.8130
-132.4900 -85.8180
-134.3300 -86.5500
-136.1700 -87.0760
-137.6500 -86.0920
-138.6900 -86.9760
-140.3600 -87.9000
-142.1600 -88.4660
-144.7200 -89.3210
Code (answer by @SleuthEye):
dataOut_x = xcor_i(1)+filter(b,a,xcor_i-xcor_i(1));
dataOut_y = ycor_i(1)+filter(b,a,ycor_i-ycor_i(1));
Output:
In the above example, the endpoint (to the left) is different for the filtered and unfiltered tracks. How can I ensure it is the same?
Your question is pretty ambiguous, and doesn't really have a specific question. I'm assuming you want to have your filtered data start at the same points as the measured data, but are unsure why this is not happening already, and how to do so.
A low pass filter is a filter which lowers the effect of rapid changes. One way of doing this, and the method which appears to be used here, is by using a rolling average. A rolling average is simply an average (mean) of the previous data points. It looks like you are using a rolling average of 5 data points. Therefore you need five points of raw data before your filter will give you a single data point.
-101.7000 -77.4040 }
-102.4200 -77.4040 } }
-103.6600 -77.4040 } }
-103.9300 -76.6720 } }
-103.9900 -76.5130 } Filter point 1. }
-104.0000 -76.4780 } Filter point 2.
-105.0800 -76.4710
-106.0400 -77.5660
-106.2500 -77.8050
-106.2900 -77.8570
-106.3000 -77.8680
-106.3000 -77.8710
In order to solve this problem, you could just append the first data point to the data set four times, as this means that the filter will produce the same number of points. This is a pretty rough solution, however, as you are creating new data. This could be achieved quite simply, for example if your dataset is called myArray:
firstEntry = myArray(1,:);
myNewArray = [firstEntry; firstEntry; firstEntry; firstEntry; myArray];
This will create four data points equal to your first data point, which should then allow you to apply the low pass filter to your data, and have it start at the same point.
Hope this helps, although it's worth bearing in mind that filtering ALWAYS results in a loss of data.
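The padding idea can be sketched with the built-in filter function. The window length of 5 is the assumption made in this answer (the question itself uses a Butterworth filter, whose coefficients are not shown), and the x values are the first few from the question. The sketch also shows that, for a moving average, padding with the first sample gives the same result as subtracting the first sample before filtering and adding it back, as in the code quoted in the question.

```matlab
x = [-101.70; -102.42; -103.66; -103.93; -103.99; -104.00; -105.08; -106.04];  % first x-coordinates from the question
n = 5;                              % assumed window length of the rolling average
b = ones(1, n) / n;                 % moving-average numerator coefficients
a = 1;                              % no feedback terms

% Option 1: prepend n-1 copies of the first sample, filter, then drop the padding.
padded = [repmat(x(1), n-1, 1); x];
yPad = filter(b, a, padded);
yPad = yPad(n:end);                 % same length as x, starts exactly at x(1)

% Option 2: subtract the first sample before filtering and add it back afterwards.
yShift = x(1) + filter(b, a, x - x(1));
```

Both options pin only the start of the track. The filtered track still lags behind the raw data at the other end, which is what the second answer addresses.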
Because you don't want to implement it but want someone else to:
The theory as above is correct, but instead you need to add 2 values at the end of your vectors:
x_last = xcor_i(end);
y_last = ycor_i(end);
xcor_i = [xcor_i;x_last;x_last];
ycor_i = [ycor_i;y_last;y_last];
This gives the following:
As you can see the ends are pretty close to being the same now.

Massive CSV file into Matlab

I have a CSV file 1.6 GB large, that I need to feed into matlab. I will have to do this frequently and I need it to run quickly. The file is of the form:
20111205 00:00.2 99.18 6 E
20111205 00:00.2 99.18 5 E
20111205 00:00.2 99.18 1 E
20111205 00:00.2 99.195 5 E
20111205 00:00.2 99.195 5 E
20111205 01:27.0 99.19 5 E
20111205 02:01.4 99.185 1 E
20111205 02:01.4 99.185 1 E
20111205 02:01.4 99.185 1 E
20111205 02:01.4 99.185 1 E
The code I have right now is the following:
tic;
format long g
fid = fopen('C:\Program Files\MATLAB\R2013a\EDU13.csv','r');
[c] = fscanf(fid, '%d,%d:%d.%d,%f,%d,%c');
c = reshape(c, 7, length(c)/7)
toc;
But this is far too slow. I would appreciate a method of getting this CSV file into matlab in the most efficient manner possible. Thank you!
Consider using a binary file format. Binary files are much smaller and don't need to be converted by MATLAB into the binary format. Hence they are much faster to read and write. They may also be more accurate (precision may be higher).
http://www.mathworks.com.au/help/matlab/ref/fread.html
The recommended syntax is textscan (http://www.mathworks.com/help/matlab/ref/textscan.html)
Your code would look like this:
fid = fopen('C:\Program Files\MATLAB\R2013a\EDU13.csv','r');
c = textscan(fid, '%d,%d:%d.%d,%f,%d,%c');
fclose(fid);
You end up with a cell array... whether it's worth converting that to another shape really depends on how you want to access the data afterwards.
It is quite likely that this would be faster if you include a loop that allows you to use a smaller, fixed amount of memory for much of the operation. One problem with reading large files is the fact that you don't know ahead of time how big it will be - and that very likely means that Matlab guesses the amount of memory it needs, and frequently has to rescale. That is a very slow operation - if it happens every 1MB, say, then it copies 1 MB once, next 2 MB, then again 3 MB, etc - as you can see it is quadratic in the size of the array.
If instead you allocate a fixed amount of memory for the final result, and process in smaller batches, you avoid all that overhead. I'm pretty sure it will be much faster - but you would have to experiment a bit with the block size. That would look something like this:
block = 1000;
Nlines = 35E6;
fid = fopen('C:\Program Files\MATLAB\R2013a\EDU13.csv','r');
%... preallocate storage for Nlines records in c (for example, one numeric array per field) ...
c_offset = 0;
while ~feof(fid)
temp = textscan(fid, '%d,%d:%d.%d,%f,%d,%c', block);
bt = numel(temp{1}); % number of rows read - should be `block`, except for last loop
%... extract, process, store in c(c_offset + (1:bt))...
c_offset = c_offset + bt;
end
fclose(fid);
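A self-contained version of this block-wise pattern might look like the following. It is a sketch, not a benchmarked solution: the temporary file, the tiny block size, and the choice to keep only the price and quantity columns are all assumptions made for illustration, and the format treats the time field as text instead of splitting it.

```matlab
% Create a tiny CSV stand-in for the real file (hypothetical contents).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '20111205,00:00.2,99.18,6,E\n');
fprintf(fid, '20111205,00:00.2,99.195,5,E\n');
fprintf(fid, '20111205,01:27.0,99.19,5,E\n');
fprintf(fid, '20111205,02:01.4,99.185,1,E\n');
fprintf(fid, '20111205,02:01.4,99.185,1,E\n');
fclose(fid);

block  = 2;                       % rows per textscan call (tune for real files)
Nlines = 10;                      % upper bound on the row count, used to preallocate
price  = zeros(Nlines, 1);        % preallocated once, never grown inside the loop
qty    = zeros(Nlines, 1);
offset = 0;

fid = fopen(fname, 'r');
while ~feof(fid)
    temp = textscan(fid, '%f %s %f %f %s', block, 'Delimiter', ',');
    bt = numel(temp{3});          % rows actually read in this block
    if bt == 0, break; end        % guard against an empty final read
    price(offset + (1:bt)) = temp{3};
    qty(offset + (1:bt))   = temp{4};
    offset = offset + bt;
end
fclose(fid);
delete(fname);                    % clean up the sample file

price = price(1:offset);          % trim the unused preallocated tail
qty   = qty(1:offset);
```

The block size trades memory for the number of textscan calls, so it is worth timing a few values on the real file.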
Inspired by @Axon's answer, I implemented a "fast" C program to convert the file to binary, then read it in using Matlab's fread function. Spoiler alert: reading is then 20x faster... although the initial conversion takes a little bit of time.
To make the job in Matlab easier, and the file size smaller, I am converting each of the number fields into an int16 (short integer). For the first field - which looks like a yyyymmdd field - that involves splitting into two smaller numbers; similarly the decimal numbers are converted to two short integers (given the apparent range I think that is valid). All this is recognizing that "to really optimize, you must really know your problem" - so if assumptions are invalid, the results will be too.
Here is the C code:
#include <stdio.h>
int main(){
FILE *fp, *fo;
long int ld1;
int d2, d3, d4, d5, d6, d7;
short int buf[9];
char c8;
int n;
short int year, monthday;
fp = fopen("bigdata.txt", "r");
fo = fopen("bigdata.bin", "wb");
if (fp == NULL || fo == NULL) {
printf("unable to open file\n");
return 1;
}
while ((n = fscanf(fp, "%ld %d:%d.%d %d.%d %d %c\n",
&ld1, &d2, &d3, &d4, &d5, &d6, &d7, &c8)) == 8) {
// ld1 holds the yyyymmdd date; split it so each part fits in a short int
year = (short int)(ld1 / 10000);
monthday = (short int)(ld1 - 10000L * year);
// move everything into buffer for single call to fwrite:
buf[0] = year;
buf[1] = monthday;
buf[2] = d2;
buf[3] = d3;
buf[4] = d4;
buf[5] = d5;
buf[6] = d6;
buf[7] = d7;
buf[8] = c8;
fwrite(buf, sizeof(short int), 9, fo);
}
fclose(fp);
fclose(fo);
return 0;
}
The resulting file is about half the size of the original - which is encouraging and will speed up access. Note that it would be a good idea if the output file could be written to a different disk than the input file - it really helps keep data streaming without a lot of time wasted in seek operations.
Benchmark: using a file of 2 M lines as input, this ran in about 2 seconds (same disk). The resulting binary file is read in Matlab with the following:
tic
fid = fopen('bigdata.bin');
d = fread(fid, 'int16');
d = reshape(d, 9, []);
toc
Of course, now if you want to recover the numbers as floating point numbers, you will have to do a little bit of work; but I think it's worth it. One possible problem you will have to solve is the situation where the value after the decimal point has a different number of digits: converting (a,b) into float isn't as simple as "a + b/100" when b > 100... "exercise for the student"?
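As a sketch of that recovery step, the snippet below works on a small made-up matrix in the same 9-row layout the C program writes (year, monthday, three time parts, integer and fractional price parts, quantity, character code). The date and the character are easy to rebuild. The price is not: "99.05" and "99.5" are both stored as (99, 5), so the number of digits after the decimal point has to be known from somewhere else. Here it is simply assumed per record.

```matlab
% Two records in the 9-row layout written by the C program (made-up values):
% rows: year, monthday, time part 1, time part 2, time part 3,
%       price integer part, price fractional digits, quantity, character code
d = [2011 2011; 1205 1205; 0 1; 0 27; 2 0; 99 99; 18 195; 6 5; 69 69];

dateNum = d(1,:) * 10000 + d(2,:);        % back to a yyyymmdd number
flag    = char(d(9,:));                   % character codes back to letters

% Number of digits after the decimal point in the original text.
% This is NOT stored in the binary file; it is an assumption supplied by hand.
nDigits = [2 3];
price   = d(6,:) + d(7,:) ./ 10.^nDigits; % 99 + 18/100 and 99 + 195/1000
```

If the digit count cannot be known, a safer conversion would store the price as a single scaled integer in the C program instead of two separate parts.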
A little benchmarking: The above code took about 0.4 seconds. By comparison, my first suggestion with textscan took about 9 seconds on the same file; and your original code took a little over 11 seconds. The difference may get bigger when the file gets bigger.
If you do this a lot (as you said), it clearly is worth converting your files once to binary format, and using them that way. Especially if the file needs to be converted only once, and read many times, the savings will be considerable.
Update:
I repeated the benchmark with a 13M line file. The conversion took 13 seconds, the binary read < 3 seconds. By contrast each of the other two methods took over a minute (textscan: 61s; fscanf: 77s). It seems that things are scaling linearly (file size 470M text, 240M binary)

IDL and MatLab getting strange values from NetCDF file

I have a NetCDF file, which contains data representing total precipitation across the globe over several months (so it's stored in a three dimensional array). I first ensured that the data was sensible, and the way it was formed, both in XConv and ncdump. All looks sensible - values vary from very small (~10^-10 - this makes sense, as this is model data, and effectively represents zero) to about 5x10^-3.
The problems start when I try to handle this data in IDL or MatLab. The arrays generated in these programs are full of huge negative numbers such as -4x10^4, with occasional huge positive numbers, such as 5000. Strangely, looking at a plot of the data in MatLab with respect to latitude and longitude (at a specific time), the pattern of rainfall looks sensible, but the values are just completely wrong.
In IDL, I'm reading the file in to write it to a text file so it can be handled by some software that takes very basic text files. Here's the code I'm using:
PRO nao_heaps
address = '/Users/levyadmin/Downloads/'
file_base = 'output'
ncid = ncdf_open(address + file_base + '.nc')
MONTHS=['january','february','march','april','may','june','july','august','september','october','november','december']
varid_field = ncdf_varid(ncid, "tp")
varid_lon = ncdf_varid(ncid, "longitude")
varid_lat = ncdf_varid(ncid, "latitude")
varid_time = ncdf_varid(ncid, "time")
ncdf_varget,ncid, varid_field, total_precip
ncdf_varget,ncid, varid_lat, lats
ncdf_varget,ncid, varid_lon, lons
ncdf_varget,ncid, varid_time, time
ncdf_close,ncid
lats = reform(lats)
lons = reform(lons)
time = reform(time)
total_precip = reform(total_precip)
total_precip = total_precip*1000. ;put in mm
noLats=(size(lats))(1)
noLons=(size(lons))(1)
noMonths=(size(time))(1)
; the data may not be an integer number of years (otherwise we could make this next loop cleaner)
av_precip=fltarr(noLons,noLats,12)
for month=0, 11 do begin
year = 0
while ( (year*12) + month lt noMonths ) do begin
av_precip(*,*,month) = av_precip(*,*,month) + total_precip(*,*, (year*12)+month )
year++
endwhile
av_precip(*,*,month) = av_precip(*,*,month)/year
endfor
fname = address + file_base + '.dat'
OPENW,1,fname
PRINTF,1,'longitude'
PRINTF,1,lons
PRINTF,1,'latitude'
PRINTF,1,lats
for month=0,11 do begin
PRINTF,1,MONTHS(month)
PRINTF,1,av_precip(*,*,month)
endfor
CLOSE,1
END
Anyone have any ideas why I'm getting such strange values in MatLab and IDL?!
AH! Found the answer. This NetCDF file stores the data in packed form, using an offset and a scale factor to keep the size of the file to a minimum. To get the correct values, I simply need to:
total_precip = offset + (scale_factor * total_precip) ;put into correct range
At present I'm getting the scale factor and offset from ncdump, and hard coding them into my IDL program, but does anyone know how I can get them dynamically in my IDL code..?
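On the MATLAB side, the packing attributes can be read from the file instead of being hard-coded. The sketch below builds a tiny throwaway NetCDF file so it is self-contained. The attribute names scale_factor and add_offset follow the usual NetCDF convention and are an assumption about this particular file (ncdump shows the actual names), and the variable name tp is taken from the IDL code above. In IDL, the analogous routine should be NCDF_ATTGET, though that is not shown here.

```matlab
% Build a small packed file (hypothetical values) so the example can run on its own.
fname = [tempname '.nc'];
nccreate(fname, 'tp', 'Dimensions', {'x', 3}, 'Datatype', 'int16');
ncwrite(fname, 'tp', int16([2; 4; 6]));          % packed integers, written before the attributes exist
ncwriteatt(fname, 'tp', 'scale_factor', 0.5);
ncwriteatt(fname, 'tp', 'add_offset', 10);

% Read the packing attributes from the file instead of hard-coding them.
scaleFactor = ncreadatt(fname, 'tp', 'scale_factor');
addOffset   = ncreadatt(fname, 'tp', 'add_offset');

% The low-level interface returns the packed integers exactly as stored.
ncid  = netcdf.open(fname, 'NOWRITE');
varid = netcdf.inqVarID(ncid, 'tp');
raw   = netcdf.getVar(ncid, varid);
netcdf.close(ncid);

unpacked = addOffset + scaleFactor * double(raw); % same formula as above
delete(fname);                                    % clean up the sample file
```

As far as I know, the high-level ncread function applies these two attributes automatically, so the manual step is only needed with the low-level netcdf.getVar interface.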