Remove Spikes from Periodic Data with MATLAB

I have some data which is time-stamped by an NMEA GPS string that I decode to obtain the individual fields: Year, Month, Day, and so on.
The problem is that on a few occasions the GPS (probably due to some signal loss) goes haywire and spits out very wrong values. This generates spikes in the time-stamp data, as you can see from the attached picture, which plots the vector of Days as output by the GPS.
As you can see, the GPS data are generally well behaved: the days run from 1 to 30/31 each month before falling back to 1 at the start of the next month. Every so often, though, the GPS spits out a random day.
I tried all the standard MATLAB functions for despiking (such as medfilt1 and findpeaks), but either they are not suited to the task or I do not know how to set them up properly.
My other idea was to loop over differences between adjacent elements, but the vector is so big that the computer cannot really handle it.
Is there any vectorized way to go down such a road and detect those spikes?
Thanks so much!

You need to filter your data using a simple low pass (here, a moving average) to get rid of the outliers:
windowSize = 5;                        % number of samples to average over
b = (1/windowSize)*ones(1,windowSize); % moving-average FIR coefficients
a = 1;                                 % no feedback: a pure FIR filter
FILTERED_DATA = filter(b,a,YOUR_DATA); % apply the filter
Just play a bit with windowSize until you get the smoothness you want.
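As for the vectorized difference approach mentioned in the question, here is a minimal sketch; the variable name days and the thresholds are assumptions based on the description (legitimate steps are 0, +1, or a large negative jump at the month rollover):
d = [0; diff(days(:))];                 % jump from the previous sample
ok = (d == 0) | (d == 1) | (d <= -27);  % steady, next-day step, or month rollover
days(~ok) = NaN;                        % flag everything else as a spike
% Note: the first good sample after a spike is flagged too, since its
% difference is also large; repair the NaNs afterwards, e.g.:
days = fillmissing(days, 'previous');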


How to persist previous data point when time range doesn't include a data point

TL;DR:
Can I get Grafana to show me the previous data point, when the currently selected time period does not have a data point? I have an example which sounds ridiculous, but at least it's simple to understand: I send data every 1 minute, and I wish to zoom into the last 30 seconds, and still see data. You may ask "why not just zoom out to 2 minutes" but the reason is that other data is on the same graph that has updated more often, and I wish to compare with that data. Also, for the more lengthy reasons below.
If not, how can I achieve what I want to achieve, see below?
Context
For a few years, I have been monitoring the water level in three of our basement sumps (which have pumps installed) by sending this data from Node-RED to InfluxDB, then visualising the sump levels in Grafana. I have set up three waterproof ultrasonic distance sensors, each pointed down a pipe that is inserted vertically into each sump. The water fills the pipe and the distance sensor, connected to an Arduino, sends me the reading. The Arduino also has other sensors connected (temp / humidity) and deals with distance calibrations to calculate the percent full of each sump. All this data is sent to Node-RED. In total, I am sending 4 values per sump: distance measurement in mm, percent full, temp, humidity. So that's 12 fields. Data is sent every 2 seconds, because I wished to have a reasonably high resolution to see nice curves in graphs.
I also decided to store all this data so that I could later troubleshoot issues (we have had sewage floods resulting in water not being able to be pumped away, etc.) and design some warning systems for these issues based on the data.
Storing 12 values for every 2 seconds, over the course of a number of years, takes up a lot of space (8GB).
Nature of the data
Storing data at this resolution has also helped me describe the nature of the data, which I will do here.
(1) Non-meaningful NOISE - the percent-full reading goes up and down by 1 or 2 percent every couple of seconds.
(2) Meaningful DRIFT - I don't mean sensor drift; I am referring to actual water levels changing slowly over time, e.g. over 1 day or 1 week. Perhaps condensation on the walls drips down into the sump, or water evaporates from the sump, and the value can waver by a few percent over the course of a day. Each sump has slightly different characteristics.
(3) Meaningful MONITORING DATA - during wet weather, depending on rainfall amount, the sumps fill up over the course of, say, 30 minutes to 3 hours. Then the pumps run, the water level drops again and wavers a bit, then the sumps continue to fill up. If the rain has stopped, you can see a lovely curve as the water fills in progressively more slowly (the green line in the graph).
Solution to downsample
I know Influx has its own downsampling possibilities; however, because of the nature of the data (which can hardly vary for two months, but when it does I really need to capture it in detail), I don't think lowering the sample rate is a great idea.
I have some understanding of digital filters (e.g. low pass etc.) but have never programmed one myself. So I have written a basic filter in JavaScript (a Node-RED function) to filter the data in real time as follows: only send a reading when it has changed from the previously sent one by some amount x, and update the stored previous reading whenever that occurs.
This has already vastly reduced the amount of data being stored, and I can vary x to filter out noise shown in my first graph above, at the expense of resolution when the pumps run. Even if I set the x value to 2, it still vastly reduces data over long periods of dry weather.
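For reference, here is the send-on-change idea sketched in MATLAB, for consistency with the rest of this page (the real implementation is a Node-RED JavaScript function; the readings vector and threshold here are illustrative):
readings = 50 + cumsum(0.5*randn(1000,1)); % synthetic percent-full samples
x = 2;                                     % change threshold, as in the post
keep = false(size(readings));
last = -Inf;                               % force the first reading through
for i = 1:numel(readings)
    if abs(readings(i) - last) >= x
        keep(i) = true;                    % this reading would be sent
        last = readings(i);                % update the stored previous value
    end
end
filtered = readings(keep);                 % what actually gets stored in InfluxDB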
So - on to my problem! Data is now not logged to InfluxDB unless there is some meaningful change, which means that when I zoom in to, e.g., a 15-minute timeframe, there is nothing to see.
Grafana does have the option of "fill (previous)", but this draws a line between points on the existing graph rather than showing the previous data as if it hadn't changed since that point. Now my Grafana dashboard looks a bit sad :(
One proposed solution is, in addition to sending "delta" data, to send "summary" data - that is, a full suite of data every 1 minute regardless of whether the data changed or not. But then we get the noise back again, plus pointless storage.
Any other ideas?

What's the most efficient way to use large data from Excel in my C# code?

I ran a computer simulation of my pendulum to measure the time taken to reach the lowest point, for every velocity and every angle.
As you can imagine, there is a lot of data: thousands of lines covering all angles and velocities.
On every frame, I will be measuring the velocity and angle of the pendulum, and will look for the closest data in my Excel spreadsheet.
How can I go about this to make sure it's not too CPU-intensive?
Should I create a massive array where every element corresponds to a certain angle? For example, myArray[30] would hold all velocities and times for all my data between 30.0 and 30.999 degrees. (That way I would avoid lots of if statements.)
Or should I keep everything in my Excel spreadsheet?
Any suggestions?
The best approach in my opinion would be dividing your data into intervals, based on its distribution, since you have to access that data every frame. Then, when you measure the velocity and angle, you can locate the right interval and access only that part of your data.
I would find the maximum and minimum of your data points while importing into Unity, and then split that range into bins of size (maximum - minimum) / NumOfIntervals. Let's say your interval size is 5 for each angle. When you get an angle of 17, you can do 17 / 5 = 3 with integer division (assuming indexes start from zero) and go straight to the third item in your structure. This can be a dictionary or an array of instances of an arbitrary class, depending on your data.
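Since the question is C#/Unity, here is just the indexing idea sketched in MATLAB (the row layout [angle, velocity, time], the synthetic data, and the bin size are assumptions):
data = [180*rand(10000,1), 10*rand(10000,1), 5*rand(10000,1)]; % synthetic rows: [angle, velocity, time]
binSize = 5;                                % degrees per bin, as in the example above
nBins = ceil(180/binSize);
bins = cell(nBins, 1);
for b = 1:nBins
    lo = (b-1)*binSize;                     % each bin covers [lo, lo+binSize)
    bins{b} = data(data(:,1) >= lo & data(:,1) < lo + binSize, :);
end
measuredAngle = 17;                         % per-frame lookup: angle 17 -> bin 4 (MATLAB is 1-based)
candidates = bins{floor(measuredAngle/binSize) + 1}; % only search this subset for the nearest velocity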
I can try to help further if you can share the structure of your data. But in my opinion, an even distribution of the data across the intervals is important.

Step function between two alternating values

I'm looking to plot connectivity over time to see connection duration and amount of disconnects. Here is the graph I currently have.
This graph is misleading, though. It makes it seem like the machine is slowly disconnecting between Sep 29th and Oct 3rd, when in reality it is connected that whole time before a brief disconnection.
I'd like the line to remain at 1 / connected until it is not connected.
Thanks in advance for any help!
Tableau is doing this because it draws a line between all data points in the view along the x-axis. I'm assuming you don't have a 1 recorded just before October 3rd, so the line slowly slopes down to the next point, which happens to be a 0.
There are a few approaches you could use to visualize this type of data. If the system is connected whenever it is not explicitly disconnected, then you could just visualize the points that are disconnects. Additionally, switching to a bar plot may sometimes communicate your intent better than a line in this situation.
Depending on the structure of your underlying data, and on assumptions about how the disconnects/connects are ordered, you could create a table calculation that uses the last value in the partition to determine its value (connected vs. disconnected).
You could also resample the data to turn your irregular time series into a regular one. This would add a large number of data points, depending on the time interval you are looking for (about 1.3 million points for 15 days at 1-second resolution).
A few suggestions:
Clarify the units on your x-axis: days? hours? secs?
Try using dots instead of a line connector
Flip the visualization around: plot a transform of your data where 'connected'=0 and 'disconnected'=1

Serial input buffer size in MATLAB

I'm trying to read a lot of data coming from my Arduino, so I've set my input buffer to 500000 to make sure it can handle all of it. My data are readings from 4 sensors, each sampled at 250 Hz. With the default buffer size (712), I used to get snags when plotting the readings in real time: the samples would get disordered, which made the plot go crazy. I solved this by increasing the buffer size to 500000. But now, while this works for a while, if I run it for 15 minutes I get the same misbehavior after about 5 minutes, and in addition the plotting gets slower. I do have some processing code along with the live plotting, but it shouldn't behave like this with such a big buffer. I want to know whether the buffer keeps all the data from the beginning until it's full, or whether it erases older data when it gets full (knowing that I have already saved the data in another vector and plotted it). I truly don't understand why this keeps happening.
kind regards
I.H
When the buffer gets full, newly arriving data overwrites the old data. The behavior you are seeing occurs because your processing and plotting are slower than the flow of the data.
Try to make sure that you optimize your processing.
Make sure the plotting is done with drawnow; that way you can be sure the graphics queue is flushed rather than building up.
Try to avoid saving and keeping all the data.
If the problem is still there, you can try to implement a timer to make sure that you read your data at consistent intervals.
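A minimal sketch of such a read/plot loop using the classic serial interface (the port name, baud rate, and run time are assumptions; adapt to your setup):
s = serial('COM3', 'BaudRate', 115200); % hypothetical port and rate
s.InputBufferSize = 500000;             % set before opening the port
fopen(s);
data = [];
h = plot(NaN, NaN);
t0 = tic;
while toc(t0) < 15*60                   % run for 15 minutes
    n = s.BytesAvailable;
    if n > 0
        chunk = fread(s, n, 'uint8');   % drain the buffer in one read
        data = [data; chunk];           % keep a copy only if you really need it
        set(h, 'XData', 1:numel(data), 'YData', data);
        drawnow limitrate               % flush graphics without queuing up
    end
end
fclose(s);
delete(s);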

Measuring Frequency of Square wave in MATLAB using USB 1024HLS

I'm trying to measure the frequency of a square wave which is read through a USB 1024HLS DAQ module in MATLAB. What I've done is create a loop which reads 100 values from the digital input, giving me a vector of 0's and 1's. There is also a timer in this loop which measures the duration for which the loop runs.
After getting the vector, I count the number of 1's and then use frequency = num_transitions/time to give me the frequency. However, this doesn't seem to work well :( I keep getting different frequencies for different numbers of iterations of the loop. Any suggestions?
I would suggest trying the following code:
vec = ...(the 100-element vector of digital values)...
dur = ...(the time required to collect the above vector)...
edges = find(diff(vec)); % Finds the indices of transitions between 0 and 1
period = 2*mean(diff(edges)); % Finds the mean period, in number of samples
frequency = 100/(dur*period);
First, the code finds the indices of the transitions from 0 to 1 or 1 to 0. Next, the differences between these indices are computed and averaged, giving the average duration (in number of samples) for the lengths of zeroes and ones. Multiplying this number by two then gives the average period (in number of samples) of the square wave. This number is then multiplied by dur/100 to get the period in whatever the time units of dur are (i.e. seconds, milliseconds, etc.). Taking the reciprocal then gives the average frequency.
One additional caveat: in order to get a good estimate of the frequency, you might have to make sure the 100 samples you collect contain at least a few repeated periods.
Functions of interest used above: DIFF, FIND, MEAN
First of all, you have to make sure that your 100 samples contain at least one full period of the signal, otherwise you'll get false results. You need a good compromise between sample rate (the more samples per period you have, the better the measurement) and the number of samples.
To be really precise, you should either have a timestamp associated with every measurement (as you usually can't be sure that you get equidistant time spacing in the for loop), or perhaps it's possible to switch your USB module into some "running" mode which doesn't just grab one sample at a time but acquires a complete waveform at a fixed sample rate.
Concerning the calculation of the frequency, gnovice already pointed out the right way. If you have individual timestamps (in seconds), the following changes are necessary:
tst = ...(the timestamps associated with every sample)...
period = 2*mean(diff(tst(edges)));
frequency = 1/period;
I can't figure out the problem, but if the boolean vector were v, then:
frequency = sum(v)/total_time
(counting the number of 1's over the total sampling time, as in the question).
Based on your description, it doesn't sound like a problem with the software, UNLESS you are using the Windows system timer, which is notoriously inaccurate (it is only accurate to about 15 milliseconds).
There are high-resolution timers available in Windows, but I don't know how to use them in Matlab. If you have access to the .NET framework, the Stopwatch class has 1 microsecond accuracy (or better), as does the QueryPerformanceCounter API in Win32.
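If the .NET framework is available, one way to reach the Stopwatch from within MATLAB on Windows is sketched below (whether this is needed at all depends on your MATLAB version; newer releases have a high-resolution tic/toc):
sw = System.Diagnostics.Stopwatch.StartNew(); % start a high-resolution timer
% ... the code being timed ...
sw.Stop();
elapsedSeconds = double(sw.ElapsedTicks) / double(System.Diagnostics.Stopwatch.Frequency);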
Other than that, you might have some jitter. There could be something in your signal chain that is causing false triggers, etc.
UPDATE: The following CodeProject article should solve the timing problem, if there is one. You should check the Matlab documentation of your version of Matlab to see if it has a native high-resolution timer. Otherwise, you can use this:
C++/Mex wrapper adds microsecond resolution timer to Matlab under WinXP
http://www.codeproject.com/KB/cpp/Matlab_Microsecond_Timer.aspx
mersenne31:
Thanks everyone for your responses. I have tried the solutions that gnovice and groovingandi mentioned and I'm sure they will work as soon as the timing issue is solved.
The code I've used is shown below:
for i = 1:100
    tic;
    value = getvalue(portCH);
    vec(i) = value(1);
    tst(i) = toc; % gets an individual time sample
end
% to get the total time I put total_time = toc after the for loop
totaltime = sum(tst);
edges = find(diff(vec)); % Finds the indices of transitions between 0 and 1
period = 2*mean(diff(edges)); % Finds the mean period, in number of samples
frequency = 100/(totaltime*period);
The problem is that measuring the time for one sample doesn't really help, because it is nearly the same for all samples. What is needed is, as groovingandi mentioned, some "running" mode which reads samples for 3 seconds.
So something like for(3 seconds), and then we do the data capture. But I can't find anything like this. Is there any function in MATLAB that could do this?
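One option is a while loop guarded by toc, sketched here with the getvalue/portCH names used earlier in the thread:
t0 = tic;
i = 0;
vec = [];
tst = [];
while toc(t0) < 3           % capture for 3 seconds
    i = i + 1;
    value = getvalue(portCH);
    vec(i) = value(1);
    tst(i) = toc(t0);       % timestamp of each sample, relative to the start
end
% The timestamps can then be used as in groovingandi's answer above.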
This won't answer your question, but it's what I thought of after reading your question: square waves have infinite frequency content. The spectrum of a square wave follows sin(x)/x, which extends from -inf to +inf.
Also try counting only the rising edges in MATLAB. You can quantize the signal to just 1 and 0, and then only increment the count when you see a [0 1] slice of your vector.
OR
You can quantize, decimate, then just sum. This will only work if each square pulse is the same length and your sampling frequency is constant. I think this one would be harder to do.
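A minimal sketch of the rising-edge count described in the first suggestion, with vec and totaltime as in the earlier snippets:
rising = sum(diff(vec) == 1);   % number of 0 -> 1 transitions
frequency = rising / totaltime; % cycles per second, if totaltime is in seconds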