How can I draw the average line in a time series?
Solutions
There are a few ways to do this:

1. Perform in-line trend analysis and write a chart customizer to perform the calculations. See also:
   - Trend analysis using iterative value increments
   - Best fit curve for trend line
2. Use an integrated statistical package with your database to perform a statistical analysis. See also:
   - Non-linear regression models in PostgreSQL using R
3. Use a third-party tool to perform the analysis. See:
   - http://www.revolutionanalytics.com/
In-line Analysis
The disadvantage to performing the analysis in-line is that JasperReports plots a single value at a time. Any customizer you write will have to calculate the trend from the data points seen so far, rather than from an analysis of all the data points at the end. This will cause a slight skew in the plotted trend line.
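To see the skew concretely, here is a small Python sketch (not a JasperReports customizer, and the values are made up) comparing the running mean a point-by-point customizer can compute with the mean of the whole series:

```python
import numpy as np

# Hypothetical values for the time series (made-up data).
values = np.array([10.0, 12.0, 9.0, 15.0, 14.0, 20.0])

# What an in-line customizer can compute at each point: the mean of the
# data seen so far (a running mean).
running_mean = np.cumsum(values) / np.arange(1, len(values) + 1)

# What you actually want plotted: the mean of the whole series.
overall_mean = values.mean()

print(running_mean)  # [10.  11.  10.33 11.5  12.  13.33] -- drifts toward the true mean
print(overall_mean)  # 13.33... -- only the final running value matches it
```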
Integrated Stats Package
The disadvantage to using a statistics package is that you would have to find a way to integrate it with your database. (You would also have to learn the corresponding statistical functions to perform the analysis.)
Third-party Tool
The disadvantage here is that you might have to pay for the product or for support, but the integration is likely the easiest of the three options.
Recommended Solution
If you have a PostgreSQL database, I would recommend installing PL/R. Use R to perform the aggregate data analysis and then send the result back to JasperReports for the time series chart.
What you are asking to do can be quite involved.
Related
I want to use the SelectKBest scikit-learn algorithm (with the chi-square test) to reduce the number of features in my ML setup. Unfortunately, my feature data is continuous. So my question is: does scikit-learn group this data itself, or is that my job? On the one hand, in most cases the input data is continuous and the chi-square test seems to be an often-used method. On the other hand, there is no way to specify a number of groups.
Thanks, Prickles
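As far as I know, scikit-learn's chi2 scorer does not group continuous features for you; a common workaround is to discretize them yourself before running SelectKBest. A minimal sketch, assuming binning is acceptable for your data (the bin count of 5 is arbitrary):

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.feature_selection import SelectKBest, chi2

# Made-up continuous feature matrix and class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = rng.integers(0, 2, size=200)

# Discretize each feature into ordinal bins, which also makes the values
# non-negative as chi2 requires.
X_binned = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile").fit_transform(X)

# Keep the 3 highest-scoring features according to the chi-square test.
X_reduced = SelectKBest(chi2, k=3).fit_transform(X_binned, y)
print(X_reduced.shape)  # (200, 3)
```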
Well, I have been studying up on the different algorithms used for clustering, like k-means, k-medoids, etc., and I was trying to run the algorithms and analyze their performance on the leaf dataset here:
http://archive.ics.uci.edu/ml/datasets/Leaf
I was able to cluster the dataset via k-means by first reading the CSV file, filtering out unneeded attributes, and applying k-means to it. The problem I am facing is that I wish to calculate measures such as entropy, precision, recall, and f-measure for the model developed via k-means. Is there an operator available that allows me to do this, so that I can quantitatively compare the different clustering algorithms available in RapidMiner?
P.S. I know about performance operators like Performance (Classification) that allow me to calculate precision and recall for a model, but I don't know any that allow me to calculate entropy.
Help would be much appreciated.
The short answer is to use R. Here's a link to a book chapter about this very subject. There is a revised version coming soon that works for the most recent version of RapidMiner.
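If R turns out not to be an option, the measures themselves can also be computed outside RapidMiner. Here is a rough Python sketch (not a RapidMiner operator) of weighted cluster entropy plus precision, recall, and F-measure after mapping each cluster to its majority class:

```python
import numpy as np
from scipy.stats import entropy
from sklearn.metrics import precision_score, recall_score, f1_score

def external_cluster_metrics(y_true, clusters):
    """y_true: integer class labels; clusters: cluster id assigned to each example."""
    y_true = np.asarray(y_true)
    clusters = np.asarray(clusters)
    n = len(y_true)
    weighted_entropy = 0.0
    mapped = np.empty_like(y_true)
    for c in np.unique(clusters):
        members = y_true[clusters == c]
        counts = np.bincount(members)            # class distribution inside the cluster
        weighted_entropy += len(members) / n * entropy(counts, base=2)
        mapped[clusters == c] = counts.argmax()  # label the cluster with its majority class
    return {
        "entropy": weighted_entropy,
        "precision": precision_score(y_true, mapped, average="weighted"),
        "recall": recall_score(y_true, mapped, average="weighted"),
        "f_measure": f1_score(y_true, mapped, average="weighted"),
    }
```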
I have been all over Google trying to find a good function/package to perform multivariate regression (i.e. predict multiple continuous variables given another set of multiple continuous variables).
I wish to use something like fitlm(), since that also gives me p-values and R-squared statistics. Does anything like that exist?
MATLAB has a bundle of tools for this; see this page.
I believe that mvregress is the most well-rounded and mainstream tool. See this page for setting up an analysis with it.
Also, a comment in this post may be useful for alternatives, if needed: it is possible to approach this via separate regression analyses, one for each response variable.
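If it helps to see the shape of that alternative outside MATLAB, here is a rough Python/statsmodels sketch (made-up data) of fitting one ordinary least-squares regression per response variable; each fit reports its own p-values and R-squared, much like fitlm() does:

```python
import numpy as np
import statsmodels.api as sm

# Made-up data: 3 predictors, 2 continuous response variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Y = X @ rng.normal(size=(3, 2)) + rng.normal(scale=0.1, size=(100, 2))

X_design = sm.add_constant(X)              # add an intercept column
for j in range(Y.shape[1]):
    fit = sm.OLS(Y[:, j], X_design).fit()  # one OLS fit per response variable
    print(f"response {j}: R^2 = {fit.rsquared:.3f}")
    print(fit.pvalues)                     # p-value per coefficient (intercept first)
```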
I am doing a project in OpenModelica and I have to simulate filters in it using active elements (op-amps). OpenModelica plots graphs with respect to time, but I want my graphs with respect to frequency so I can analyze the frequency response of the system. I searched the internet but couldn't find anything useful. Please reply as soon as possible.
If you want to plot a variable with respect to another variable, you can use plotParametric from OMShell (OpenModelica Shell). In OMEdit (OpenModelica Connection Editor) you can click the parametric plot button x(y) and then select two variables.
I assume that what you want is a Bode plot. If so, it is important to understand that such a plot does not arise from a transient simulation. It is necessary to transform your system into a linear, time-invariant representation in order to express the response of your system in the frequency domain.
I do not know what specific features OpenModelica has in this regard. But those are at least the kinds of things you should search the documentation for. If you have access to MATLAB, then all you really need to do is extract the linearized version of the model (normally expressed as the so-called "ABCD" matrices) and MATLAB can get you the rest of the way.
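As a rough illustration of that last step outside MATLAB, here is a Python/SciPy sketch that turns a made-up set of ABCD matrices (a first-order low-pass filter with a 1 ms time constant, standing in for your linearized model) into a frequency response:

```python
import numpy as np
from scipy import signal

# Hypothetical ABCD matrices: a first-order low-pass filter, 1 ms time constant.
tau = 1.0e-3
A = np.array([[-1.0 / tau]])
B = np.array([[1.0 / tau]])
C = np.array([[1.0]])
D = np.array([[0.0]])

sys = signal.StateSpace(A, B, C, D)
w, mag, phase = signal.bode(sys)  # frequency (rad/s), magnitude (dB), phase (degrees)
print(mag[:3], phase[:3])
```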
There is also the Modelica_LinearSystems2 library which might be compatible with OpenModelica (I have no idea). It includes many types of operations you would typically perform on linear systems.
If I were to create interpolated splines from a large amount of data (about 400 charts, 500,000 values each), how could I then access the coordinates of those splines from another program quickly and efficiently?
Initially I intended to run a regression on the data and use the resulting formula in my delphi program, but that turned out to be a bigger pain than I thought.
I am currently using MATLAB, but I can use other software if need be.
Edit: It is probably relevant that this data represents the empirical cumulative distribution of some other data (which I already have in a database).
Here is what one of these charts would look like.
The emphasis is on speed of access. I intend to use this data to run simulations on financial data.
MATLAB has a command for converting a spline into a piecewise polynomial. You can then extract the breaks and the coefficients of each piece of polynomial with unmkpp, and evaluate them in another program.
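If any part of your pipeline can run Python, one way to consume the exported breaks and coefficients is scipy's PPoly. A sketch with a tiny made-up piecewise-linear example (MATLAB's pp form stores one row of descending-power coefficients per piece, so they are transposed here; verify the layout against your own export):

```python
import numpy as np
from scipy.interpolate import PPoly

# Hypothetical arrays exported from MATLAB's unmkpp: 'breaks' holds the knots and
# 'coefs' has one row of descending-power coefficients per polynomial piece.
breaks = np.array([0.0, 1.0, 2.0])
coefs = np.array([
    [0.0, 0.0, 1.0, 0.0],   # piece on [0, 1):  y = (x - 0)
    [0.0, 0.0, 1.0, 1.0],   # piece on [1, 2]:  y = (x - 1) + 1
])

# PPoly expects coefficients with shape (order, n_pieces), hence the transpose.
pp = PPoly(coefs.T, breaks)
print(pp([0.5, 1.5]))       # [0.5 1.5]
```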
If you are also familiar with C, you could use MATLAB Coder or something similar to get an intermediate library to connect your Delphi program and MATLAB. Interfacing Delphi and C code is, albeit a tad tedious, certainly possible (or it was back in the days of Delphi 7). You could even write the algorithm in MATLAB, convert the code to C using MATLAB Coder, and call the generated C library from within Delphi.
Perhaps a bit overkill, but you could store your data in a database (e.g. MySQL) from MATLAB and retrieve it from Delphi.
Finally: is Delphi a real constraint? You could also use MATLAB to do the simulations, as you might have the same tools (or even more) available in MATLAB as in Delphi. Afterwards you can just share the results, which I suppose is less speed-critical.
My initial guess at doing this efficiently would be to create a memory mapped file in MATLAB using memmapfile, stuff a look-up table with your data into that, then open the memory mapped file in your Delphi code and read the data from that file.
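The same flat-file idea sketched in Python with numpy rather than MATLAB's memmapfile (file name, dtype, and layout are arbitrary; a Delphi reader would open the same file with its own memory-mapping API):

```python
import numpy as np

# Made-up look-up table: a grid of x values and a fake CDF evaluated on it,
# stored as one flat float64 block so any memory-mapping reader (Delphi
# included) can address it by offset.
x = np.linspace(-5.0, 5.0, 500_000)
cdf = 1.0 / (1.0 + np.exp(-x))          # placeholder for the real spline values
np.vstack([x, cdf]).astype(np.float64).tofile("lookup_table.bin")

# Reopen via memory mapping: entries are read on demand, not loaded into RAM.
table = np.memmap("lookup_table.bin", dtype=np.float64, mode="r", shape=(2, 500_000))
print(table[0, :3], table[1, :3])
```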
The fastest option is most likely a look-up table that you save to disk, then load and use in your simulation code (although: why not run the simulation in MATLAB?).
You can evaluate the spline for a finely-grained list of values of x using FNVAL, and use the closest value of x to look up the cdf.
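A rough sketch of that look-up step in Python/numpy rather than MATLAB (the grid and CDF values are made up; in practice they would come from evaluating your spline):

```python
import numpy as np

# Made-up finely-grained grid of x values and CDF values (these would come from
# evaluating the spline, e.g. with fnval in MATLAB).
xs = np.linspace(-5.0, 5.0, 1_000_000)
cdf = 1.0 / (1.0 + np.exp(-xs))          # placeholder for the spline evaluation

def cdf_lookup(x):
    """Return the tabulated CDF value at the grid point closest to x."""
    i = np.searchsorted(xs, x)           # binary search: O(log n) per query
    i = np.clip(i, 1, len(xs) - 1)
    # Pick whichever neighbour is closer to x.
    return np.where(np.abs(xs[i - 1] - x) <= np.abs(xs[i] - x), cdf[i - 1], cdf[i])

print(cdf_lookup(np.array([-1.0, 0.0, 2.5])))
```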