Step function between two alternating values - visualization

I'm looking to plot connectivity over time to see connection duration and the number of disconnects. Here is the graph I currently have.
This graph is misleading, though. It makes it seem like the machine is slowly disconnecting between Sep 29th and Oct 3rd, when in reality it is connected that whole time before a brief disconnection.
I'd like the line to remain at 1 / connected until it is not connected.
Thanks in advance for any help!

Tableau is doing this because it draws a line between all data points in the view along the x-axis. I'm assuming you don't have a data point at 1 just before October 3rd, so the line slowly slopes down to the next point, which happens to be a 0.
There are a few approaches you could use to visualize this type of data. If the system is considered connected whenever it is not disconnected, you could simply plot the disconnect events as points. Switching to a bar plot may also communicate your intent better than a line in this situation.
Depending on the structure of your underlying data, and on the assumptions you can make about how the connects/disconnects are ordered, you could create a table calculation that uses the last value in the partition to determine its value (connected vs. disconnected).
You could also resample the data to turn your irregular time series into a regular one. This would add a large number of data points, depending on the time interval you are looking for (roughly 1.3 million for 15 days at 1-second resolution).
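For illustration only (the question is about Tableau, but the resampling idea is tool-agnostic), here is a minimal sketch using MATLAB's timetable API; the timestamps and the Connected variable are hypothetical:

t  = datetime(2023,9,29) + seconds([0 5 350000 350002]);   % hypothetical connect/disconnect log
tt = timetable(t', [1; 1; 0; 1], 'VariableNames', {'Connected'});
regular = retime(tt, 'regular', 'previous', 'TimeStep', seconds(1));   % hold the last value at 1 s steps

Plotting the regular series then produces the flat-until-disconnect line the question asks for, at the cost of one row per second.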

A few suggestions:
Clarify the units on your x-axis: days? hours? seconds?
Try using dots instead of a line connector.
Flip the visualization around: plot a transform of your data where 'connected' = 0 and 'disconnected' = 1.

Related

How to persist previous data point when time range doesn't include a data point

TL;DR:
Can I get Grafana to show me the previous data point when the currently selected time period does not contain a data point? I have an example which sounds ridiculous, but at least it's simple to understand: I send data every minute, and I wish to zoom into the last 30 seconds and still see data. You may ask "why not just zoom out to 2 minutes?", but the reason is that the same graph holds other data that updates more often, and I wish to compare against that data. There are also lengthier reasons below.
If not, how can I achieve what I want? See below.
Context
For a few years, I have been monitoring the water level in three of our basement sumps (which have pumps installed) by sending this data from Node-RED to InfluxDB, then visualising the sump levels in Grafana. I have set up three waterproof ultrasonic distance sensors, each pointed down a pipe that is inserted vertically into each sump. The water fills the pipe and the distance sensor, connected to an Arduino, sends me the reading. The Arduino also has other sensors connected (temp / humidity) and deals with distance calibrations to calculate the percent full of each sump. All this data is sent to Node-RED. In total, I am sending 4 values per sump: distance measurement in mm, percent full, temp, humidity. So that's 12 fields. Data is sent every 2 seconds, because I wished to have a reasonably high resolution to see nice curves in graphs.
Also, I decided to store all this data so that I could later troubleshoot issues (we have had sewage floods where the water could not be pumped away, etc.) and design some warning systems for these issues based on the data.
Storing 12 values every 2 seconds, over the course of a number of years, takes up a lot of space (8 GB).
Nature of the data
Storing data at this resolution has also helped me describe the nature of the data, which I will do here.
(1) Non-meaningful NOISE (see below) - the percent-full reading goes up and down by 1 or 2 percent every couple of seconds:
(2) Meaningful DRIFT (see below) - I don't mean sensor drift, I am referring to actual water levels changing slowly over time, e.g. over 1 day or 1 week. Perhaps condensation on the walls drips down into the sump, or water evaporates from the sump, and the value can waver by a few percent over the course of a day. Each sump has slightly different characteristics.
(3) Meaningful MONITORING DATA - during wet weather, depending on rainfall amount, the sumps fill up over the course of say 30 mins to 3 hours. Then the pumps run and the water level drops again, wavers a bit, then the sumps continue to fill up. If the rain stopped, you can see a lovely curve as the water fills in progressively more slowly (see the green line below):
Solution to downsample
I know Influx has its own downsampling capabilities; however, because of the nature of the data (which can barely vary for 2 months, but when it does I really need to capture the change in detail), I don't think lowering the sample rate is a great idea.
I have some understanding of digital filters (e.g. low-pass) but have never programmed one myself. So I have written a basic filter in JavaScript (a Node-RED function) to filter the data in real time as follows: only send a reading when it has changed from the previously sent one by some amount x (and update the stored previous value when that occurs).
This has already vastly reduced the amount of data being stored, and I can vary x to filter out the noise shown in my first graph above, at the expense of resolution when the pumps run. Even with x set to 2, it still vastly reduces the data over long periods of dry weather.
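For reference, a minimal sketch of that send-on-change (deadband) idea, written here in MATLAB rather than the author's Node-RED JavaScript; the variable names, sample data, and threshold are all illustrative:

readings = [50 51 50 52 55 55 54 50];   % hypothetical raw percent-full samples
x = 2;                                  % minimum change worth logging
kept = false(size(readings));
kept(1) = true;                         % always keep the first sample
last = readings(1);
for k = 2:numel(readings)
    if abs(readings(k) - last) >= x
        kept(k) = true;                 % log this sample...
        last = readings(k);             % ...and make it the new reference
    end
end
filtered = readings(kept);              % what actually gets written to the database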
So, on to my problem! Now data is not logged to InfluxDB unless there is some meaningful change, which means that when I zoom in to, say, a 15-minute timeframe, there is nothing to see.
Grafana does have a "fill (previous)" option, but this draws a line between the points already on the graph, rather than showing the previous data as if it hasn't changed since that point. Now my Grafana dashboard looks a bit sad :(
One proposed solution is, in addition to the "delta" data, to send "summary" data: a full suite of readings every minute regardless of whether anything changed. But then the noise comes back, along with pointless storage.
Any other ideas?

Removing outliers with multiple consecutive values, similar to a step

I am processing ocean wave data, where I have a time series of the peak wave period, Tp (s). Typical values of Tp range from 2 s to 15 s at this location. However, Tp may reach values above 15 s during extreme events such as storms, so removing data based on a simple threshold is not suitable.
As you can see in the figure below, there are multiple values that are outliers. The high values occur only for a short duration and then drop back down, whereas a genuine extreme event would last for hours.
I have tried the functions filloutliers and medfilt1, but they are not successful in removing the outliers, which I presume is because multiple consecutive outlier data points exist.
Does a built-in MATLAB function exist to handle such a situation?
Otherwise, if I need to write my own function to filter such signals, could you provide some guidance?
Attaching a small data sample here as well: Download Data
Dataset plot (only the segment in the provided data above).
Zoomed-in plot at one of the outliers.
If we know that the values need to be in the range (2, 15), we can clip the values > 15 to 15.
Another way is to take a high percentile (say the 95th) of the observations and clip values above it.
The filloutliers and medfilt1 methods are not removing values like 18 because they do not treat them as outliers: 18 is not very far from the typical range of (2, 15).
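A minimal MATLAB sketch of both options; the Tp values here are hypothetical stand-ins for the posted data sample:

Tp = [8; 9; 18; 19; 9; 8];            % hypothetical peak wave periods in seconds
TpClip = min(Tp, 15);                 % hard clip at the known physical ceiling
% or clip at an empirical high percentile instead of a fixed limit:
TpClip = min(Tp, prctile(Tp, 95));    % prctile requires the Statistics Toolbox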

What's the most efficient way to use large data from Excel in my C# code?

I ran a computer simulation of my pendulum to measure the time taken to reach the lowest point, for every velocity and every angle.
As you can imagine there is a lot of data: thousands of lines covering all angles and velocities.
On every frame, I will be measuring the velocity and angle of the pendulum, and will look for the closest data in my Excel spreadsheet.
How can I go about this to make sure it's not too CPU-intensive?
Should I create a massive array where every element corresponds to a certain angle? For example, myArray[30] would hold all velocities and times for my data between 30.0 and 30.999 degrees. (That way I avoid lots of if statements.)
Or should I keep everything in my Excel spreadsheet?
Any suggestions?
The best approach in my opinion would be to divide your data into intervals based on its distribution, since you have to access that data every frame. Then, when you measure the velocity and angle, you can look up the right interval and access only that part of your data.
I would find the maximum and minimum of your data points while importing into Unity, and then size the intervals as (maximum - minimum) / NumOfIntervals. Let's say your interval size is 5 degrees per bucket. When you get an angle of 17, you can do (int)(17 / 5) = 3 (assuming indexes start from zero) and go to the item at index 3 in your structure. This can be a dictionary or an array of instances of an arbitrary class, depending on your data.
I can try to help further if you can share the structure of your data. But in my opinion an even distribution of data across the intervals is important.
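A minimal sketch of that bucketed lookup; the indexing arithmetic is language-agnostic, so it is shown here in MATLAB with hypothetical angleCol/velCol/timeCol columns standing in for the imported spreadsheet data:

% hypothetical columns imported from the spreadsheet (angle, velocity, time)
angleCol = [ 2.1; 16.4; 17.8; 19.0; 30.5 ];
velCol   = [ 0.3;  0.6;  0.8;  1.1;  0.9 ];
timeCol  = [ 1.90; 0.84; 0.71; 0.66; 0.42 ];

width = 5;                                            % bucket size in degrees
idx = floor((angleCol - min(angleCol)) / width) + 1;  % one bucket id per row, computed once at load

% per-frame query for a measured angle of 17.3 deg and velocity 0.8 (illustrative values):
q = floor((17.3 - min(angleCol)) / width) + 1;        % bucket to search
rows = find(idx == q);                                % only rows in this bucket are considered
[~, j] = min(abs(velCol(rows) - 0.8));                % nearest stored velocity within the bucket
tClosest = timeCol(rows(j));                          % simulated time-to-lowest-point for that state

In C# you would precompute one list of row indices per bucket at import time, rather than scanning for them each frame as find does here.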

Remove Spikes from Periodic Data with MATLAB

I have some data which is time-stamped by an NMEA GPS string that I decode in order to obtain the individual fields: year, month, day, et cetera.
The problem is that on a few occasions the GPS (probably due to some signal loss) goes haywire and spits out very wrong values. This generates spikes in the time-stamp data, as you can see from the attached picture, which plots the vector of days as output by the GPS.
As you can see, the GPS data is generally well behaved: the days run from 1 to 30/31 each month before falling back to 1 at the start of the next month. At certain moments, though, the GPS spits out a random day.
I tried all the standard MATLAB functions for despiking (such as medfilt1 and findpeaks), but either they are not suited to the task, or I do not know how to set them up properly.
My other idea was to loop over the differences between adjacent elements, but the vector is so big that the computer cannot really handle it.
Is there any vectorized way to go down such a road and detect those spikes?
Thanks so much!
You can filter your data using a simple low-pass (moving-average) filter to get rid of the outliers:
windowSize = 5;                          % number of samples to average over
b = (1/windowSize)*ones(1,windowSize);   % FIR coefficients: equal weights = moving average
a = 1;                                   % no feedback, so this is a pure FIR filter
FILTERED_DATA = filter(b,a,YOUR_DATA);   % apply the moving average to your vector
Just play a bit with windowSize until you get the smoothness you want.
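As a side note, the adjacent-difference check the question asks about can be fully vectorized with diff, with no loop. A minimal sketch for isolated one-sample spikes; gpsDay is a hypothetical name for the decoded day vector and the threshold of 5 days is illustrative:

gpsDay = [28; 29; 30; 31; 1; 2; 14; 3; 4];   % hypothetical decoded days with one spike (14)
dUp   = [0; abs(diff(gpsDay))];              % jump from the previous sample
dDown = [abs(diff(gpsDay)); 0];              % jump to the next sample
spike = dUp > 5 & dDown > 5;                 % a spike jumps away AND straight back, so the
                                             % legitimate 31 -> 1 month rollover is not flagged
gpsDay(spike) = NaN;                         % mark spikes; repair with fillmissing(gpsDay,'previous')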

Using velocity/time data to create a displacement/acceleration matrix

I've recently started a new job and, having never used MATLAB before, I'm a little stuck. Any help would be much appreciated.
Here is the story. I have been given the velocities of fragmentation pellets at time intervals of 0.001 milliseconds. There are 28 gauges (i.e. 28 sets of data) and each gauge has 20,000 readings. I have created a matrix consisting of all this data.
My objective is to take that matrix and create 2 more matrices with the corresponding displacement and acceleration values for each reading. The next step is to export the time and acceleration values to an Excel spreadsheet.
I am at a loss as to how to do this. I have tried to integrate and differentiate but I can't seem to get it right. Is it possible to create a function that takes the velocity data and automatically calculates acceleration/displacement? (This would make things easier, as people in the future could reuse the same code.)
Any help on how to solve any part of this problem would be much appreciated. I've only been using the software for 3 days.
Many thanks.
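A minimal MATLAB sketch of the standard approach, assuming V is the 20000x28 velocity matrix described above (one column per gauge) and the stated 0.001 ms sample interval:

V  = rand(20000, 28);              % stand-in for the real gauge matrix
dt = 0.001e-3;                     % 0.001 ms expressed in seconds; adjust if the units differ
t  = (0:size(V,1)-1)' * dt;        % common time vector for every gauge

displ = cumtrapz(t, V);            % displacement: trapezoidal integration of each column
accel = zeros(size(V));
for g = 1:size(V,2)
    accel(:,g) = gradient(V(:,g), dt);   % acceleration: central differences per gauge
end

writematrix([t accel], 'acceleration.xlsx');   % R2019a+; older releases can use xlswrite

Wrapping these lines in a function that takes V and dt and returns displ and accel would give the reusable piece of code the question asks for.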