I'm looking for some suggestions on how to compress time-series data in MATLAB.
I have some data sets of pupil size, gathered over 1 sec with 25,000 points per trial (I'm still not sure whether it is proper to call the data a 'time series'). What I'd like to do now is compare them with other data, and for that I need to reduce the number of points to about 10,000 or fewer while minimizing loss of information. Are there any ways to do this?
I've tried searching for how to do this, but all I could find were ways to smooth the data or to compress digital images, which I have either already done or which are not useful to me.
The data sets simply consist of pupil diameter changing over time. For each trial, 25,000 points were gathered over 1 sec, which means one point corresponds to the pupil diameter measured every 0.04 msec. What I want to do is adjust this data to 0.1 msec per point; however, I'm not sure whether I can apply techniques like the FFT in this case, because this is the first time I have handled this kind of data. I appreciate your advice again.
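If the goal is simply the 0.04 ms to 0.1 ms conversion described above, a plain rate change with resample may already be enough. A minimal sketch, where pupil is a placeholder for one recorded trial:

    % Hypothetical example: one trial of pupil diameter, 25,000 samples over 1 s (25 kHz).
    fsOld = 25000;                 % original rate, samples/s (0.04 ms per point)
    fsNew = 10000;                 % target rate, samples/s (0.1 ms per point)
    pupil = randn(25000, 1);       % stand-in for a real trial; replace with your data

    % resample applies an anti-aliasing filter and changes the rate by p/q = 2/5,
    % turning 25,000 points into 10,000.
    pupilDown = resample(pupil, 2, 5);

    tOld = (0:numel(pupil)-1)     / fsOld;
    tNew = (0:numel(pupilDown)-1) / fsNew;
    plot(tOld, pupil, tNew, pupilDown)
    legend('25 kHz original', '10 kHz resampled')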
A standard compression technique for time-series data is to take the fast Fourier transform and keep only a subset of the frequency coefficients to represent your data (i.e., calculate the spectrum and truncate it). You can compare data sets using these frequency amplitudes, though to lose the least amount of information you would want to keep the frequencies with the largest amplitudes -- but then each trial keeps a different set of frequencies, which makes the data trickier to compare... (A minimal sketch of the largest-amplitude variant follows after the list below.) Here is the standard MATLAB tutorial on the FFT. Some other possibilities include:
- ARMA models
- Wavelets
Check out this paper on the "SAX" method, a modern approach for time-series compression -- it also discusses classic time-series compression techniques.
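As a rough illustration of the FFT idea above (the variant that keeps the largest-amplitude coefficients), here is a minimal sketch; the variable names and the choice of K are mine:

    % FFT-based compression sketch: keep only the K largest-magnitude coefficients
    % and reconstruct from them.
    x = randn(25000, 1);           % stand-in for one trial
    K = 500;                       % number of coefficients to keep (tune to taste)

    X = fft(x);
    [~, idx] = maxk(abs(X), K);    % indices of the K strongest components
    Xc = zeros(size(X));
    Xc(idx) = X(idx);              % compressed representation: K indices + K complex values

    xRec = real(ifft(Xc));         % approximate reconstruction
    fprintf('Relative error: %.3g\n', norm(x - xRec)/norm(x));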
I have a lot of mixed files, and from those files I would like to keep those that correspond to EEG recordings.
The problem is that the EEG files may have been recorded with 2, 4, 5, or 6 channels, and there is also other data (of unknown type) with the same number of channels (I have already filtered out the files that don't have these channel counts). And of course there are thousands of files, so manual inspection isn't really an option.
So, is there some kind of metric or algorithm that helps me differentiate non-EEG signals from EEG ones? With MATLAB if possible.
You might first try comparing statistics such as the mean, standard deviation, and kurtosis of each channel relative to known-good channels. Since you have lots of files, you would probably want to sample possible matches in a few places (e.g., compare seconds 1-3 and, say, 7-9) and then derive a probability of a match. Interesting question!
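A hedged sketch of that idea, assuming candidate and reference are placeholder [samples x channels] matrices with the same channel count and fs is the sampling rate in Hz:

    % Compare a candidate recording with a known-good EEG recording on two short windows
    % using simple per-channel statistics.
    fs   = 250;
    win1 = 1*fs : 3*fs;                              % roughly seconds 1-3
    win2 = 7*fs : 9*fs;                              % roughly seconds 7-9

    chanStats = @(x) [mean(x); std(x); kurtosis(x)]; % kurtosis: Statistics Toolbox

    sCand = [chanStats(candidate(win1,:)), chanStats(candidate(win2,:))];
    sRef  = [chanStats(reference(win1,:)), chanStats(reference(win2,:))];

    % Small relative distance between the statistics -> plausibly an EEG file
    d = norm(sCand - sRef, 'fro') / norm(sRef, 'fro');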
This is not really a MATLAB question, but...
What is (potentially) in the other recordings? Perhaps it is easier to filter them out than it is to identify EEG specifically.
I don't know of any metric or algorithm that can identify EEG recordings. One problem is that EEG recordings often contain, to varying degrees, artefacts such as muscle activity, line noise, and any other electromagnetic interference in the room, which make the recording nonspecific to cortical activity.
The frequency spectrum may be one meaningful measure. Cortical activity usually tapers off in power towards, say, 40 Hz, and tends to have some peaks below that. For example, depending on the placement of EEG electrodes and the task that was being performed, a peak in the alpha band (around 10 Hz) may be prominent. This is assuming there is little interference from artefacts.
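A small sketch of the spectrum check, assuming a single channel x and a sampling rate fs (both placeholders):

    % Estimate the power spectral density and check whether power tapers off
    % towards ~40 Hz and whether there is an alpha peak near 10 Hz.
    fs = 250;                                    % assumed sampling rate, Hz
    [pxx, f] = pwelch(x, 4*fs, [], [], fs);      % Welch PSD with 4 s segments

    alphaPower = bandpower(pxx, f, [8 12],  'psd');
    highPower  = bandpower(pxx, f, [35 45], 'psd');
    ratio = alphaPower / highPower;              % a large ratio is at least consistent with EEG

    semilogy(f, pxx), xlim([0 60])
    xlabel('Frequency (Hz)'), ylabel('PSD')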
Also the amplitude of the signal may be something to look at.
Perhaps you could take a number of measures and statistical properties (e.g. powers in different frequency bands, variance, drift/slope, amplitude, etc.), cluster them using e.g. t-SNE, and, assuming you get clearly separable clusters, identify a few samples from each cluster by hand to figure out which is the EEG cluster.
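A sketch of that clustering workflow; the cell array files, the band edges, and the sampling rate fs are all placeholders:

    % Build a small feature vector per file, embed with t-SNE, and inspect the clusters.
    % 'files' is a hypothetical cell array of [samples x channels] matrices.
    fs   = 250;                                      % assumed common sampling rate, Hz
    feat = zeros(numel(files), 4);
    for k = 1:numel(files)
        x = mean(files{k}, 2);                       % collapse channels for simplicity
        feat(k,:) = [bandpower(x, fs, [1 4]), ...    % low-frequency power
                     bandpower(x, fs, [8 12]), ...   % alpha-band power
                     var(x), ...
                     x(end) - x(1)];                 % crude drift measure
    end

    Y = tsne(zscore(feat));                          % Statistics and ML Toolbox
    scatter(Y(:,1), Y(:,2), 'filled')                % label a few points per cluster by hand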
I have a dataset that contains hourly wind speed data for seven years. I am trying to fit a forecasting model to the data, and the review paper states that trimming the diurnal, weekly, monthly, and annual patterns in the data significantly enhances estimation accuracy. They then use a Fourier series to remove the periodic components, as seen in the image. Any ideas on how I can model this in MATLAB?
I am afraid this topic cannot be explained "urgently". What you need is a filter for the respective frequencies and a certain number of their harmonics. You can implement such a filter with an FFT or directly with an IIR/FIR formula.
The FFT is faster than an IIR/FIR implementation, but it requires some care with respect to the window function. Even if you do a "continuous" DFT, you will have a window function (such as exponential or Gaussian). The window function determines the bandwidth: the wider the window, the smaller the bandwidth. With an IIR/FIR filter the bandwidth is encoded in the recursive parameters.
For suppressing single frequencies (like the 24-hour weather signal) you need a notch filter. This also requires you to specify a bandwidth, as you can see in the linked article. The smaller the bandwidth, the longer it takes (in time) until the filter has locked onto the frequency and suppresses it. If you want the filter to track the amplitude of the 24-hour signal quickly, you need a wider bandwidth; but then you will also slightly suppress frequencies just below and just above 1/24 hrs. It's a tradeoff.
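For hourly data the diurnal component sits at 1/24 cycles per hour. A hedged sketch of a single notch around that frequency; the series name windSpeed and the bandwidth choice are placeholders:

    % Notch out the 24 h component of an hourly series with a narrow band-stop filter.
    fs = 1;                    % samples per hour
    f0 = 1/24;                 % diurnal frequency, cycles per hour

    d = designfilt('bandstopiir', 'FilterOrder', 4, ...
                   'HalfPowerFrequency1', 0.9*f0, ...
                   'HalfPowerFrequency2', 1.1*f0, ...
                   'SampleRate', fs);      % wider band -> faster settling, more collateral damage

    windFiltered = filtfilt(d, windSpeed); % zero-phase filtering of the hourly series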
If you also want to suppress several harmonics (as described in the paper), you have to combine several notch filters in series. If you want to do it with the FFT, you model the desired transfer function in frequency space, and since you can do this for all frequencies at once, it's more efficient.
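A sketch of the FFT route, zeroing the bins at the diurnal frequency and a few harmonics (x is a placeholder for the hourly series; in practice you may need to zero a few neighbouring bins as well because of spectral leakage):

    % Remove the 24 h component and its first few harmonics by zeroing FFT bins.
    fs = 1;                                    % samples per hour
    N  = numel(x);
    X  = fft(x);

    for h = 1:6                                % fundamental (24 h) and harmonics (12 h, 8 h, ...)
        k = round((h/24) * N / fs) + 1;        % bin nearest to h/24 cycles per hour
        X(k)         = 0;
        X(N - k + 2) = 0;                      % conjugate-symmetric partner
    end

    xClean = real(ifft(X));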
An easy but approximate way to get something similar to a notch filter including several of its harmonics is a comb filter. It is only an approximation, though: you have no control over the details of the transfer function. You could do this in MATLAB by adding to the original a copy of the signal shifted by 12 hrs, because a sinusoid cancels with a copy of itself shifted by half a period (a phase shift of pi).
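A sketch of that comb idea; note that averaging with a copy delayed by 12 hours cancels the 24-hour sinusoid and its odd harmonics, while even harmonics (12 h, 6 h, ...) pass through:

    % Average the hourly series x with a copy of itself delayed by half the 24 h period.
    lag   = 12;                                 % hours
    xComb = (x(1+lag:end) + x(1:end-lag)) / 2;  % result is 12 samples shorter than x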
So you see, there's lots of possibilities for what you want.
Suppose I have an example like the following:
where the blue data is my calculated/measured data and the red data is the given ground-truth data. The task is to measure the similarity/closeness between the measured data and each of the given curves so that a classification can be done; it could also be acceptable to choose multiple classes if the results are very close.
In my mind I can divide the problem into several subproblems:
The data ranges are not the same
The resolution of the calculated/measured data is higher than that of the ground-truth data
The calculated data has some bias/shift
The following questions come to my mind when trying to solve those problems
Is it better to fit the calculated/measured data first then attempting to solve the problem?
Would it be fine to use the data points as-is and calculate the mean squared error against each curve, treating it as a fitting attempt and picking the best fit? What would be the effect of the bias/shift in this case?
What is a good approach to dealing with the resolution/range mismatch: decreasing the number of samples of the more finely sampled data, or increasing the number of samples of the more coarsely sampled data within the shared range?
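One hedged way to address the last two questions: interpolate the measured curve onto each ground-truth grid over the overlapping range, remove a constant offset, and rank the classes by mean squared error. All variable names (xMeas, yMeas, xRef, yRef) are placeholders:

    % Score each ground-truth class by MSE after aligning range, resolution, and bias.
    nClasses = numel(yRef);
    mse = zeros(nClasses, 1);
    for c = 1:nClasses
        % restrict to the overlapping range and bring the measurement onto the reference grid
        lo   = max(min(xMeas), min(xRef{c}));
        hi   = min(max(xMeas), max(xRef{c}));
        keep = xRef{c} >= lo & xRef{c} <= hi;
        yM   = interp1(xMeas, yMeas, xRef{c}(keep));

        % crude bias removal: subtract the means before comparing
        r      = yRef{c}(keep);
        mse(c) = mean(((yM - mean(yM)) - (r - mean(r))).^2);
    end
    [~, best] = min(mse);    % index of the most similar class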
I have certain movement data acquired from a motion capture system, and I want to automatically choose which 5 signals are most alike.
The picture shows an example of the particular data, all curves normalized to 100 samples due to differences in movement speed.
Data set for knee flexion/extension
What I am looking for is some idea to actually compare the shapes of the curves.
The easiest solution is just to subtract the "raw" curves and check which one has the smallest RMSE.
But looking at your data (which are smooth curves), another option is to use PLS or GMM to describe them. Then you can use the RMSE to compute the error between your input curve and your database of curves and take the one with the lowest error.
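A minimal sketch of the raw-curve RMSE baseline, assuming all curves are already normalized to 100 samples and stored as rows of a matrix curves (a made-up name):

    % Rank stored curves by RMSE against a query curve.
    query  = curves(1, :);                       % the signal to match
    others = curves(2:end, :);

    rmse = sqrt(mean((others - query).^2, 2));   % RMSE of every row of 'others' vs the query
    [~, order] = sort(rmse);                     % order(1:5) are the 5 most similar rows of 'others'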
I am trying to resample/recreate already recorded data for plotting purposes. I thought this was the best place to ask the question (besides dsp.se).
The data is sampled at a high frequency, contains too many data points, and is not suitable for plotting in the time domain (not enough memory). I want to downsample it with minimal loss. The sampling interval of the resulting data doesn't need to be uniform (again, it is for plotting purposes, not analysis), although the input data is uniformly sampled.
When we use the regular resample command from MATLAB/Octave, it can distort sharp parts of the curve.
What is the best approach here?
For reference, I put two pictures (found on tex.se):
The first image is a regular resample.
The second image is better-resampled data that behaves well around the peaks.
You should try this set of files from the File Exchange. It computes an optimal lookup table based on either a maximum number of points or a given error tolerance. You can choose natural, linear, or spline interpolation. Spline will have the smallest table size but is slower than linear. I don't use natural unless I have a really good reason.
Sincerely,
Jason
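This is not the File Exchange submission Jason mentions, just a common alternative sketch: per-bucket min/max decimation, which keeps peaks visible in the plot while drastically reducing the point count. The names y and nBuckets are placeholders:

    % Downsample a long, uniformly sampled vector y for plotting by keeping the
    % minimum and maximum sample of each bucket, so peaks are not lost.
    nBuckets = 5000;
    n     = numel(y);
    edges = round(linspace(1, n+1, nBuckets+1));

    yPlot = zeros(2*nBuckets, 1);
    xPlot = zeros(2*nBuckets, 1);
    for k = 1:nBuckets
        idx = edges(k):edges(k+1)-1;
        [yPlot(2*k-1), iMin] = min(y(idx));
        [yPlot(2*k),   iMax] = max(y(idx));
        xPlot(2*k-1) = idx(iMin);
        xPlot(2*k)   = idx(iMax);
    end

    [xPlot, ord] = sort(xPlot);        % restore sample order within each bucket
    plot(xPlot, yPlot(ord))            % roughly 2*nBuckets points instead of numel(y)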