Difference between Array and Timeseries [closed] - matlab

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I want to save results with a To File block in a Simulink model.
I just want to know the difference between Array and Timeseries in the Save format field.

Let's start with Array, since it's the simpler one. With the Array option, a To Workspace block saves just a column of the values of your variable. (As far as I know, a To File block in Array format writes a matrix whose first row holds the time values and whose remaining rows hold the data.)
If you use Timeseries, the values are written in timeseries format. This object has several properties; the main ones are Time and Data, so you get not only the values but also the times that correspond to them. It also carries some additional information, such as the interpolation method (see the help for details).
When should I use Array and when Timeseries?
Clearly, if the time instants matter to you, you need Timeseries. For example, if your simulation uses a variable time step, the samples will not be uniformly spaced, so it helps to have the times too.
An array is useful when the sample times are not important, for example when I save only a single value of my variable from an Enabled Subsystem.
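To make the variable-step point concrete, here is a small sketch in plain Python (not Simulink code). The sample times and the signal y(t) = 10*t are made up for illustration. It integrates the same logged values twice: once with the real sample times, as a Timeseries-style record would allow, and once with a guessed uniform step, which is all you can do if only the values were kept.

```python
# Hypothetical log from a variable-step simulation of y(t) = 10*t.
times = [0.0, 0.1, 0.3, 0.7, 1.5]     # non-uniform sample times
values = [0.0, 1.0, 3.0, 7.0, 15.0]   # signal values at those times


def trapezoid(ts, ys):
    """Integrate ys over ts with the trapezoidal rule."""
    return sum((t1 - t0) * (y0 + y1) / 2
               for t0, t1, y0, y1 in zip(ts, ts[1:], ys, ys[1:]))


# Values plus times: the integral over [0, 1.5] comes out right.
with_time = trapezoid(times, values)

# Values only: a guessed uniform step of 0.1 gives a wrong answer.
guessed_times = [i * 0.1 for i in range(len(values))]
without_time = trapezoid(guessed_times, values)

print(with_time, without_time)
```

The first result matches the exact integral of 10*t over [0, 1.5]; the second does not, because the guessed time axis is wrong.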

Related

How do you do stratified sampling across different groups, when creating train and test sets, in pyspark? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I am looking for a solution to split my data to Test and Train sets but I want to have all the levels of my categorical variable in both test and train.
My variable has 200 levels and the data is 18 million records. I tried sampleBy function with fractions (0.8) and could get the training set but had difficulties getting the test set since there is no index in Spark and even with creating a key, using left join or subtract is very slow to get the test set!
I want to do a groupBy based on my categorical variable and randomly sample each category and if there is only one observation for that category, put that in the train set.
Is there a default function or library to help with this operation?
A pretty hard problem.
I don't know of a built-in function that will do this for you. Using sampleBy and then a subtraction would work but, as you said, it would be pretty slow.
Alternatively, I wonder if you could try this*:
1. Use window functions to add a row number within each category, and move every row with rownum = 1 into a separate dataframe, which you will add to your training set at the end.
2. On the remaining data, use randomSplit (a dataframe function) to divide it into training and test sets.
3. Add the separated data from Step 1 to the training set.
This should be faster.
*(I haven't tried it before! It would be great if you could share what worked in the end!)
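The logic of those steps can be sketched in plain Python on a small in-memory list. This is not Spark code and it will not scale to 18 million records. The function name `stratified_split` and its parameters are hypothetical. In Spark, the "first row per category" part would correspond to a row number over a window partitioned by the category, and the random split to randomSplit.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, train_frac=0.8, seed=0):
    """Split rows so that every category appears in the training set."""
    rng = random.Random(seed)

    # Group rows by their category (the "partition by" of the window).
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    train, test = [], []
    for members in groups.values():
        rng.shuffle(members)
        # Step 1: reserve one row per category for training (rownum = 1).
        train.append(members[0])
        # Step 2: split the remaining rows at random.
        for row in members[1:]:
            (train if rng.random() < train_frac else test).append(row)
    # Step 3 is implicit: the reserved rows are already in train.
    return train, test


rows = ([("a", i) for i in range(50)]
        + [("b", i) for i in range(30)]
        + [("c", 0)])  # "c" has a single observation
train, test = stratified_split(rows, key=lambda r: r[0])
```

A category with a single observation always ends up in the training set. As in the steps above, nothing guarantees that every category also appears in the test set; that would need a second reserved row per category.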

ML model to transform words [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I am building a model that takes a correct word as input and outputs a plausible version of that word as written by a human (so it contains some errors). My training dataset looks like this:
input - output
hello - helo
hello - heelo
hello - hellou
between - betwen
between - beetween
between - beetwen
between - bettwen
between - bitween
etc.
During preprocessing I add a measure of the distortion of each word. Then I encode the letters as numbers.
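For illustration only, this kind of preprocessing might look roughly like the sketch below. It assumes that the distortion measure is the Levenshtein edit distance and that words contain only the letters a-z. Both are assumptions, and the helper names `edit_distance` and `encode` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def encode(word, length, pad=0):
    """Map letters a..z to 1..26 and right-pad with `pad` up to `length`."""
    codes = [ord(c) - ord("a") + 1 for c in word.lower()]
    return codes + [pad] * (length - len(codes))


print(edit_distance("hello", "helo"))   # distortion of one training pair
print(encode("helo", 6))                # fixed-length numeric input
```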
My current model is a CNN. The number of input neurons equals the length of the longest word in the training dataset, and the number of output neurons is the same.
This model doesn't work as I expected. The output word does not look the way I expect.
E.g.:
input - output
house - gjrtdd
Question:
How can I build/improve model for this task? Is CNN a good idea? What other methods can I use for this task?

How to Test conditional independence between random variables using available samples? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
How can I test for independence between two random variables given a third one (i.e. whether P(A|C) = P(A|C,B) or not) using available samples? In other words, I have just 1000 samples of 3 random variables, generated with bntoolbox in MATLAB, and now I want to test for conditional independence (CI) between arbitrary random variables.
I've read something about Fisher's method but honestly don't understand it.
Thanks in advance.
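One common reading of "Fisher's method" in this setting is the Fisher z-transform applied to a partial correlation. That reading is an assumption, as is the choice of test below. The sketch is in plain Python and only makes sense for roughly Gaussian, continuous variables; for discrete samples a chi-square or G-test within each value of C is the usual alternative. The function names and the synthetic data are hypothetical.

```python
import math
import random


def corr(x, y):
    """Pearson correlation of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ci_test_fisher_z(a, b, c):
    """Two-sided p-value for H0: A is independent of B given C."""
    r_ab, r_ac, r_bc = corr(a, b), corr(a, c), corr(b, c)
    # Partial correlation of A and B with C held fixed.
    r = (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))
    # Fisher z-transform; sqrt(n - |C| - 3) * z is ~ N(0, 1) under H0.
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(len(a) - 1 - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))


# Synthetic samples (made up): C drives A; B depends on C only, or on A.
rng = random.Random(0)
c = [rng.gauss(0, 1) for _ in range(1000)]
a = [ci + rng.gauss(0, 1) for ci in c]
b_indep = [ci + rng.gauss(0, 1) for ci in c]  # A and B independent given C
b_dep = [ai + rng.gauss(0, 1) for ai in a]    # A and B dependent given C

p_indep = ci_test_fisher_z(a, b_indep, c)
p_dep = ci_test_fisher_z(a, b_dep, c)
print(p_indep, p_dep)
```

A small p-value is evidence against conditional independence; a large one only means the test found no evidence of dependence at this sample size.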

Reading a 5-D double data structure in matlab [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
How do I read a 5-D data structure (of type double) in MATLAB? I have loaded the data in MATLAB and it says 5-D double. How do I read it afterwards?
With the current information I cannot say much, but usually this is a good first try:
In matlab navigate to the file, right click on it and choose import data.
The import wizard is quite powerful so you have a good chance to get the data you need from that. Afterwards you may need to try help reshape if it does not have the right dimensions yet.

Which is the best clustering algorithm to find outliers? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
Basically I have some hourly and daily data like
Day 1
Hours,Measure
(1,21)
(2,22)
(3,27)
(4,24)
Day 2
hours,measure
(1,23)
(2,26)
(3,29)
(4,20)
Now I want to find outliers in the data by considering both the hourly and the daily variations, using bivariate analysis on hour and measure.
So which clustering algorithm is best suited to finding outliers in this scenario?
One 'good' piece of advice (:P) I can give you is that, in my experience, it is NOT a good idea to treat time like a spatial feature, so beware of solutions that do this. You can probably start by searching the literature on outlier detection for time-series data.
You really should use a different representation for your data.
Why don't you use an actual outlier detection method, if you want to detect outliers?
Other than that, just read through some literature. k-means for example is known to have problems with outliers. DBSCAN on the other hand is designed to be used on data with "Noise" (the N in DBSCAN), which essentially are outliers.
Still, the way you are representing your data will make none of these work very well.
You should use a time-series-based outlier detection method because of the nature of your data (it has its own seasonality, trend, autocorrelation etc.). Time series outliers come in different kinds (AO, IO etc.) and it's kind of complicated, but there are applications which make it easy to implement.
Download the latest build of R from http://cran.r-project.org/. Install the packages "forecast" & "TSA".
Use the auto.arima function of the forecast package to find the best-fitting model for your data, and pass the result, along with your data, to the detectAO and detectIO functions of TSA. These functions report any outliers present in the data together with their time indexes.
R is also easy to integrate with other applications, or you can simply run it as a batch job. Hope that helps.
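For a quick first look without R, a much simpler stand-in can be sketched in plain Python. This is not the ARIMA-based AO/IO detection described above; it only flags points that sit far from the median of their neighbours, using the median absolute deviation (MAD) as a robust scale. The function name, window size, threshold and sample series are all hypothetical.

```python
from statistics import median


def rolling_mad_outliers(series, window=5, threshold=3.5):
    """Return indexes of points far from the median of their neighbourhood."""
    half = window // 2
    # Residual of each point from the median of a centred window.
    residuals = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        residuals.append(series[i] - median(series[lo:hi]))

    # Robust scale of the residuals: median absolute deviation.
    mad = median(abs(r) for r in residuals)
    if mad == 0:
        # Degenerate case: flag every point that deviates at all.
        return [i for i, r in enumerate(residuals) if r != 0]
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [i for i, r in enumerate(residuals)
            if 0.6745 * abs(r) / mad > threshold]


# Made-up hourly measurements in time order, with one spike at index 6.
series = [20, 21, 22, 21, 20, 21, 60, 21, 22, 21, 20, 21]
outliers = rolling_mad_outliers(series)
print(outliers)
```

Because the neighbourhood is defined along the time axis, the series must be kept in time order. This sketch ignores seasonality and trend, which is exactly what the ARIMA-based approach is meant to handle.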