How to calculate rolling sum using Wrangler in Cloud Data Fusion GCP? - google-cloud-data-fusion

I am trying to calculate a rolling sum for a numeric column in Cloud Data Fusion. The Wrangler UI does not offer an option for this. Is there a specific way to do it, and if so, how? Or is there a different tool that can be used?
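For reference, the computation being asked about can be sketched in plain Python. This is not a Wrangler directive or a Data Fusion plugin; the function name rolling_sum and its window parameter are hypothetical. The sketch only pins down what "rolling sum" is assumed to mean here: at each row, the sum of the current value and the preceding window - 1 values.

```python
from collections import deque


def rolling_sum(values, window):
    """Return, for each position, the sum of the last `window` values.

    Early positions with fewer than `window` values use what is available.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque()  # values currently inside the window
    total = 0
    result = []
    for value in values:
        recent.append(value)
        total += value
        if len(recent) > window:
            total -= recent.popleft()  # drop the value that left the window
        result.append(total)
    return result


print(rolling_sum([1, 2, 3, 4, 5], window=3))  # [1, 3, 6, 9, 12]
```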

Related

How to plot uniform time-series in MongoDB Charts

I've just started to use MongoDB Charts to plot incoming data from a series of IoT devices that send data at regular intervals. Each device sends a package with a timestamp and some data (JSON to our NoSQL db), and I would like to plot several devices on the same chart.
To visually check the continuity of the incoming data (i.e. to see whether a device has failed to upload data), I want to plot each data point over a continuous time-series x-axis. Does anyone know if MongoDB Charts has a feature to make the x-axis continuous? Currently, the chart plots one point per observation and spaces them equally regardless of the time between them.
Example of data: points 1, 2, and 3 are each 6 minutes apart, but visually appear non-uniform, since the x-axis is not continuous.
Yes, MongoDB Charts has a "Continuous Line" chart type that should do what you want. As you discovered, the discrete chart always aggregates values over binned time periods, but the continuous version plots a point for each document. Please see the following screenshot as an example.

tsfresh time-series clustering of stock data

How would we use "tsfresh" for time-series clustering of stock data,
where we do not have a vector of target values?
The select_features function requires a vector of target values.
First, calculate a set of features from your stock time series (e.g. from price and volume data). To do that, you will have to convert your stock data into a dataframe in one of the tsfresh input formats (https://tsfresh.readthedocs.io/en/latest/text/data_formats.html).
tsfresh will return a feature matrix that you can then feed to clustering algorithms, e.g. from scikit-learn (http://scikit-learn.org/stable/modules/clustering.html). So, by using tsfresh, you move your problem from the time series domain into the feature matrix domain.
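A minimal sketch of that move from the time series domain to the feature matrix domain, using only the Python standard library. The three hand-computed features stand in for the much larger set that tsfresh extracts; the function name summary_features, the tickers, and the prices are made up for illustration, and none of this is tsfresh's API.

```python
from statistics import mean, pstdev


def summary_features(series):
    """Map one time series to a fixed-length feature vector."""
    changes = [b - a for a, b in zip(series, series[1:])]
    return [
        mean(series),                                       # level
        pstdev(series),                                     # spread
        mean(abs(c) for c in changes) if changes else 0.0,  # mean absolute change
    ]


# Hypothetical closing prices; the series may differ in length.
prices = {
    "AAA": [10.0, 10.5, 10.2, 10.8],
    "BBB": [50.0, 49.0, 51.0, 48.0, 52.0],
}

# One row per stock, one column per feature.
feature_matrix = {ticker: summary_features(s) for ticker, s in prices.items()}
for ticker, row in feature_matrix.items():
    print(ticker, row)
```

Every row has the same length regardless of how long the original series was, so the rows can be handed to a clustering algorithm directly. No vector of target values is involved at any point.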

Clustering of Features

I have a multiview video dataset and have extracted features using IDT (the Improved Dense Trajectory method). A dictionary of features is created using the Vl_fisher method. Each row corresponds to a video and each column corresponds to a feature of that video. I want to cluster the features, but when I apply k-means or spectral clustering I get the indices of the videos (rows) assigned to each cluster; instead, I want to know which feature belongs to which cluster. Can anybody suggest a method? I am using MATLAB for my work.

K-medoids with Dynamic Time Warping in RapidMiner

How can I perform K-medoids clustering with Dynamic Time Warping as the distance measure in RapidMiner?
The idea of Dynamic Time Warping is that it can be applied to time series of different lengths. How can I do that in RapidMiner? I get this error message:
The data contains missing values which is not allowed for KMediods
How can I cluster time series of different length?
You could fill the missing values with zeroes. The Replace Missing Values operator does this. I don't know the details of your data or how RapidMiner calculates DTW distances, so I can't tell whether this approach would yield valid results.
Faced with this, I might use the R extension with the dtw and cluster packages to investigate how distances between time series of different lengths could be used to form clusters. Once you have R working, you can call it from RapidMiner.
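To make the concern about zero-filling concrete, here is a plain-Python sketch of the textbook DTW recurrence with an absolute-difference cost. It is an illustration only: it is not RapidMiner's operator and not the R dtw package, and the function name dtw_distance and the sample series are hypothetical.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier point of b
                cost[i][j - 1],      # b[j-1] also matches an earlier point of a
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


short = [1, 2, 3]
stretched = [1, 1, 2, 2, 3, 3]  # same shape, twice as long

# Different lengths are compared directly, with no padding.
print(dtw_distance(short, stretched))

# Padding the short series with zeroes changes the shape being compared.
padded = short + [0, 0, 0]
print(dtw_distance(padded, stretched))
```

In this sketch the unpadded comparison gives a distance of zero because one series is just a stretched copy of the other, while the zero-padded version gives a positive distance. That is one reason to check whether padded inputs still produce meaningful clusters.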

Average line in time series with iReport

How can I draw the average line in a time series?
Solutions
There are a few ways to do this:
Perform in-line trend analysis and write a chart customizer to perform the calculations. See also:
Trend analysis using iterative value increments
Best fit curve for trend line
Use an integrated statistical package with your database to perform a statistical analysis. See also:
Non-linear regression models in PostgreSQL using R
Use a third-party tool to perform the analysis. See:
http://www.revolutionanalytics.com/
In-line Analysis
The disadvantage of performing the analysis in-line is that JasperReports plots a single value at a time. Any customizer you write will have to calculate the trend from the data points seen so far, rather than from an analysis of all the data points at the end. This will cause a slight skew in the resulting line.
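The skew can be seen in a small Python sketch that compares an average computed one point at a time with the average over the complete data set. This is not JasperReports or chart customizer code; the function name running_means and the sample values are hypothetical.

```python
def running_means(values):
    """Average of all points seen so far, computed one point at a time."""
    total = 0.0
    means = []
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means


data = [2.0, 4.0, 6.0, 8.0]

inline = running_means(data)     # what an in-line calculation can know at each point
overall = sum(data) / len(data)  # what an analysis of all the points gives

print(inline)   # [2.0, 3.0, 4.0, 5.0]
print(overall)  # 5.0
```

Only the last in-line value matches the overall average; the earlier values differ from it because they are based on partial data.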
Integrated Stats Package
The disadvantage to using a statistics package is that you would have to find a way to integrate it with your database. (You would also have to learn the corresponding statistical functions to perform the analysis.)
Third-party Tool
The disadvantage here is that you might have to pay for the product, or support. The integration is likely the easiest.
Recommended Solution
If you have a PostgreSQL database, I would recommend installing PL/R. Use R to perform the aggregate data analysis and then send the result back to JasperReports for the time series chart.
What you are asking to do can be quite involved.