Encog predictive neural network results - neural-network

I have been using the Encog Neural Net workbench (version 3.2) to run the sunspot prediction routine and have noticed that, when the future prediction window is set greater than 1, the results in sunspot_output.csv appear to be time-offset, so that the outputs produced when the network evaluates at t=0 are not really (t+1), (t+2), (t+3), etc. It's very likely I'm not understanding how the workbench displays the results, so perhaps someone could clarify this for me.
As I understand it, if you use a past window of 30 and a future window of 14, then the network will look at the last 30 records and predict forward from the last available record (in this case let's say 11/1/1951 is the last available record). So an evaluation on 11/1/1951 will look back 30 records to 5/1/1949 and feed this information through the trained network to predict data for 12/1/1951 (t+1), 1/1/1952 (t+2), 2/1/1952 (t+3), etc. However, looking at the result file, this does not appear to be the case. The "prediction" really appears to be a repeat of the pattern from the previous 14 records, so that (t+1) is really more representative of (t-14), 08/01/1950, than of the next record forward from t=0, which would be 12/1/1951.
I have an image that shows this, but unfortunately I don't appear to have the reputation points to post it yet. To reproduce this issue, I suggest using the Encog workbench with a past window of 30, a future window of 14, and a training error of 1 or 2%.
To summarize:
Has anyone else noticed this issue when looking at the predictive network results, particularly for greater than one time step ahead?
Why do the workbench results suggest that the Encog predictive neural network is not properly predicting into the future when you look at the dates associated with the outputs?
Thank you for any thoughts you may have!

That's not an issue; that's how a sliding-window time series forecaster works.
I would suggest reading further here: https://www.cs.rutgers.edu/~pazzani/Publications/survey.pdf
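To make that concrete, here is a minimal sketch (Python, purely illustrative, not Encog's actual code) of how a sliding-window data set with a past window of 30 and a future window of 14 is typically built: each row pairs 30 past values with the 14 values that immediately follow them.

def make_windows(series, past_window=30, future_window=14):
    """Pair each block of 30 past values with the 14 values that follow it."""
    pairs = []
    last_start = len(series) - past_window - future_window
    for start in range(last_start + 1):
        past = series[start:start + past_window]                # t-29 .. t
        future = series[start + past_window:
                        start + past_window + future_window]    # t+1 .. t+14
        pairs.append((past, future))
    return pairs

# With 44 values there is exactly one complete window:
inputs, targets = make_windows(list(range(44)))[0]
print(inputs[-1], targets[0], targets[-1])   # 29 30 43

Whether the workbench writes those 14 target values on the row for t or on a later row is a display convention, so it is worth checking that alignment before concluding the network is not predicting forward.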
It really depends on how you tune the neural network.
If you want more predictive power you have to extract features or synthesize new ones (for example, I would use wavelet extraction and denoising).
Pay attention to normalization. Use range normalization if the ranges are known; otherwise use z-normalization.
Use the proper activation function: sigmoid if the normalized range is [0, 1], or tanh if the range is [-1, 1].
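As a rough illustration of those two normalizations (a sketch, not Encog's own normalization code):

import numpy as np

x = np.array([12.0, 45.0, 7.0, 88.0, 53.0])   # hypothetical raw input column

# Range normalization to [0, 1], using known bounds (here assumed to be 0..100);
# pairs naturally with a sigmoid activation.
lo, hi = 0.0, 100.0
x_range = (x - lo) / (hi - lo)

# z-normalization, for when no fixed range is known; roughly zero-mean and
# unit-variance, so tanh is the more natural activation.
x_z = (x - x.mean()) / x.std()

print(x_range)
print(x_z)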
But before concluding that the neural network is not predicting, I would suggest trying SVR (Support Vector Regression), which is included in Encog.
It is guaranteed to reach the global minimum (if one exists).
See if the SVR predicts better than the ANN.
If not, use my earlier suggestions ;-)
Vincenzo

Related

Is Matlab incorrect for mnrfit?

It seems Matlab is giving incorrect results for multinomial logistic regression.
In their example documentation using Fisher's Iris dataset [link], they give coefficients for the model which can be used on the same data set itself to get the modeled probabilities.
load fisheriris
sp = categorical(species);
[B,dev,stats] = mnrfit(meas,sp);
PHAT=mnrval(B,meas);
However, none of the expected value aggregates match the population aggregates, which is a requirement for a MaxEnt classifier (see slide 35 [here], or Eq. 14 [here], or Agresti, "Categorical Data Analysis", p. 298, etc.).
For example
>> sum(PHAT)
ans =
   49.9828   49.8715   50.1456
These should all equal 50 (the population values), and likewise for other aggregations.
If the parameters
B=[36.9450 42.6378
12.2641 2.4653
14.4401 6.6809
-30.5885 -9.4294
-39.3232 -18.2862]
were used instead then all aggregated sufficient statistics match.
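For comparison, here is a sketch of the same sufficient-statistic check (scikit-learn in Python, not the MATLAB code above): with an unpenalized intercept, the score equations force the fitted class probabilities to sum to the observed class counts, i.e. 50 per class for Fisher's iris.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Multinomial logistic regression; the intercept is not penalized, so at the
# optimum the per-class sums of predicted probabilities equal the class counts.
model = LogisticRegression(max_iter=5000).fit(X, y)
phat = model.predict_proba(X)

print(phat.sum(axis=0))   # expected: approximately [50. 50. 50.]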
Additionally, it seems odd that Matlab solves this via the likelihood, which can produce a warning:
Warning: Maximum likelihood estimation did not converge. Iteration
limit exceeded. You may need to merge categories to increase observed
counts
when the only requirement, which follows from the MLE conditions, is that the expected values match; no likelihood evaluation is needed.
It would be a nice feature if, instead of requiring the true classes, there were an option to supply just the aggregate information.
I submitted a technical error report on the MathWorks website. Their reply:
Hello [----],
I am writing in reference to your Technical Support Case #01820504
regarding 'mnrfit'.
Thanks a lot for your patience and reporting this issue. This appears
to be unexpected behavior. It appears to be related to an existing
issue we have in our records, that "mnrfit" does not give correct
maximum likelihood estimates in certain cases. Since the "mnrfit"
function is not finding the maximum likelihood estimates for the
coefficients, we calculated the actual MLEs. When we use these
estimates, we get the desired result of all 50s in this case.
The issue is that, for this particular dataset in our example, the
classes can be separated perfectly. This means that the logistic
function, in order to get exact zero or one probabilities, needs to
have infinite coefficients. The "mnrfit" function carries out an
iterative procedure with the coefficients getting larger, but it stops
at a point where the results have the issue that you have found. We
certainly agree that "mnrfit" could be made to do better. Our
development team is working on it.
At this stage, I am not able to suggest a workaround other than to
write a custom implementation as my colleague and I had tried. For
now, I will be closing this request as I have already forwarded it to
our records. However, if you have any additional questions related to
this case, please do not hesitate to reach me.
Sincerely,
[----]
MathWorks Technical Support Department

Why does putting a filter on an output modify the source signal? Is this a Simulink bug?

I know it sounds strange, and that's a bad way to write a question, but let me show you this odd behavior.
As you can see, this signal, r5, is nice and clean: exactly what I expected from my simulation.
Now look at this:
This is EXACTLY the same simulation; the only difference is that the filter is now not connected. I tried for hours to find a reason, but it seems like a bug.
This is my file; you can test it yourself by disconnecting the filter.
---- Edited:
I tried it with Simulink 2014 and on a friend's 2013, on two different computers... if someone can test it on 2015, that would be great.
(Attaching the filter to any other r, r1-r4 included, ''fixes'' the noise on ALL of r1-r8; I tried putting it on other signals but the noise won't go away.)
The expected result is exactly the smooth one. This file has proven quite robust in other simulations (so I guess the math inside the blocks is good), and this case happens only with one of the two ''link number'' inputs (top left) set to 4, although a small amount of noise appears with one ''link number'' set to 3.
Thanks in advance for any help.
It seems to me that the only thing the filter could affect is the time step used in the integration, assuming you are using a dynamic time step (which is the default). So, my guess is that (if this is not a bug) your system is numerically unstable/chaotic. It could also be related to noise, caused by differentiation. Differentiating noise over a smaller time step mostly makes things even worse.
Solvers such as ode23 and ode45 use a dynamic time step. ode23 compares a second and third order integration and selects the third one if the difference between the two is not too big. If the difference is too big, it does another calculation with a smaller timestep. ode45 does the same with a fourth and fifth order calculation, more accurate, but more sensitive. Instabilities can occur if a smaller time step makes things worse, which could occur if you differentiate noise.
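As a toy illustration of that last point (a standalone sketch, unrelated to the actual Simulink model), finite-differencing measurement noise amplifies it roughly in proportion to 1/dt, so a smaller step makes the derivative estimate noisier, not better:

import numpy as np

rng = np.random.default_rng(0)

def diffed_noise_std(dt):
    # Standard deviation of the finite-difference "derivative" of pure noise.
    t = np.arange(0.0, 10.0, dt)
    noise = 1e-3 * rng.standard_normal(t.size)
    return np.diff(noise).std() / dt

for dt in (1e-2, 1e-3, 1e-4):
    print(dt, diffed_noise_std(dt))   # grows roughly 10x each time dt shrinks 10x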
To overcome the problem, try using a fixed time step, change your precision/solver, or, better, avoid differentiation: use some type of state estimator to obtain derivatives, or calculate them analytically.

Newbie to Neural Networks

Just starting to play around with Neural Networks for fun after playing with some basic linear regression. I am an English teacher so don't have a math background and trying to read a book on this stuff is way over my head. I thought this would be a better avenue to get some basic questions answered (even though I suspect there is no easy answer). Just looking for some general guidance put in layman's terms. I am using a trial version of an Excel Add-In called NEURO XL. I apologize if these questions are too "elementary."
My first project is related to predicting a student's Verbal score on the SAT based on a number of test scores, GPA, practice exam scores, etc. as well as some qualitative data (gender: M=1, F=0; took SAT prep class: Y=1, N=0; plays varsity sports: Y=1, N=0).
In total, I have 21 variables that I would like to feed into the network, with the output being the actual score (200-800).
I have 9000 records of data spanning many years/students. Here are my questions:
How many records of the 9000 should I use to train the network?
1a. Should I completely randomize the selection of this training data or be more involved and make sure I include a variety of output scores and a wide range of each of the input variables?
If I split the data into an even number, say 9x1000 (or however many) and created a network for each one, then tested the results of each of these 9 on the other 8 sets to see which had the lowest MSE across the samples, would this be a valid way to "choose" the best network if I wanted to predict the scores for my incoming students (not included in this data at all)?
Since the scores on the tests that I am using as inputs vary in scale (some are on 1-100, and others 1-20 for example), should I normalize all of the inputs to their respective z-scores? When is this recommended vs not recommended?
I am predicting the actual score, but in reality, I'm NOT that concerned about the exact score but more of a range. Would my network be more accurate if I grouped the output scores into buckets and then tried to predict this number instead of the actual score?
E.g.
750-800 = 10
700-740 = 9
etc.
Is there any benefit to doing this or should I just go ahead and try to predict the exact score?
What if ALL I cared about was whether or not the score was above or below 600. Would I then just make the output 0(below 600) or 1(above 600)?
5a. I read somewhere that it's not good to use 0 and 1, but instead 0.1 and 0.9 - why is that?
5b. What about -1(below 600), 0(exactly 600), 1(above 600), would this work?
5c. Would the network always output -1, 0, 1, or would it output fractions that I would then have to round up or round down to finalize the prediction?
Once I have found the "best" network from Question #3, would I then play around with the different parameters (number of epochs, number of neurons in hidden layer, momentum, learning rate, etc.) to optimize this further?
6a. What about the Activation Function? Will log-sigmoid do the trick, or should I try the other options my software has as well (threshold, hyperbolic tangent, zero-based log-sigmoid)?
6b. What is the difference between log-sigmoid and zero-based log-sigmoid?
Thanks!
First a little bit of meta content about the question itself (and not about the answers to your questions).
I have to laugh a little that you say 'I apologize if these questions are too "elementary."' and then proceed to ask the single most thorough and well thought out question I've seen as someone's first post on SO.
I wouldn't be too worried that you'll have people looking down their noses at you for asking this stuff.
This is a pretty big question in terms of the depth and range of knowledge required, especially the statistical knowledge needed and familiarity with Neural Networks.
You may want to try breaking this up into several questions distributed across the different StackExchange sites.
Off the top of my head, some of it definitely belongs on the statistics StackExchange, Cross Validated: https://stats.stackexchange.com/
You might also want to try out https://datascience.stackexchange.com/ , a beta site specifically targeting machine learning and related areas.
That said, there is some of this that I think I can help to answer.
Anything I haven't answered is something I don't feel qualified to help you with.
Question 1
How many records of the 9000 should I use to train the network? 1a. Should I completely randomize the selection of this training data or be more involved and make sure I include a variety of output scores and a wide range of each of the input variables?
Randomizing the selection of training data is probably not a good idea.
Keep in mind that truly random data includes clusters.
A random selection of students could happen to consist solely of those who scored above a 30 on the ACT exams, which could potentially result in a bias in your result.
Likewise, if you only select students whose SAT scores were below 700, the classifier you build won't have any capacity to distinguish between a student expected to score 720 and a student expected to score 780 -- they'll look the same to the classifier because it was trained without the relevant information.
You want to ensure a representative sample of your different inputs and your different outputs.
Because you're dealing with input variables that may be correlated, you shouldn't try to do anything too complex in selecting this data, or you could mistakenly introduce another bias in your inputs.
Namely, you don't want to select a training data set that consists largely of outliers.
I would recommend trying to ensure that your inputs cover all possible values for all of the variables you are observing, and all possible results for the output (the SAT scores), without constraining how these requirements are satisfied.
I'm sure there are algorithms out there designed to do exactly this, but I don't know them myself -- possibly a good question in and of itself for Cross Validated.
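One concrete (and hedged) way to approximate this is a stratified split on binned output scores, so that both the training and held-out sets span the whole score range; the column and file names below are made up for illustration and have nothing to do with NEURO XL:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("students.csv")               # hypothetical file of 9000 records

# Bin the 200-800 target into 50-point buckets and stratify the split on the
# bins, so train and test both contain low, middle, and high scorers.
bins = np.arange(200, 801, 50)
score_bucket = pd.cut(df["sat_verbal"], bins=bins, include_lowest=True)

train_df, test_df = train_test_split(
    df, test_size=0.2, stratify=score_bucket, random_state=42
)

Stratifying only on the output is a simplification; it does not guarantee coverage of every input variable, which is the part that may deserve its own Cross Validated question.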
Question 3
Since the scores on the tests that I am using as inputs vary in scale (some are on 1-100, and others 1-20 for example), should I normalize all of the inputs to their respective z-scores? When is this recommended vs not recommended?
My understanding is that this is not recommended as the input to a Neural Network, but I may be wrong.
The convergence of the network should handle this for you.
Every node in the network will assign a weight to its inputs, multiply them by their weights, and sum those products as a core part of its computation.
That means that every node in the network is searching for some coefficients for each of their inputs.
To do this, all inputs will be converted to numeric values -- so conditions like gender will be translated into "0=MALE,1=FEMALE" or something similar.
For example, a node's metric might look like this at a given point in time:
2*ACT_SCORE + 0*GENDER + (-5)*VARSITY_SPORTS ...
The coefficients for each value are exactly what the network is searching for as it converges.
If you change the scale of a value, like ACT_SCORE, the coefficient that is found simply changes by the reciprocal of that scaling factor.
The result should still be the same.
There are other concerns in terms of accuracy (computers have limited capacity to represent small fractions) and speed that may enter this, but not being familiar with NEURO XL, I can't say whether or not they apply for this technology.
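To make the weighted-sum point concrete, here is a toy sketch (not how NEURO XL computes anything internally) showing that rescaling an input is exactly compensated by scaling its weight by the reciprocal:

def node_output(weights, inputs):
    # A single node's weighted sum, before any activation function.
    return sum(w * x for w, x in zip(weights, inputs))

inputs  = [28.0, 1.0, 0.0]    # ACT_SCORE, GENDER, VARSITY_SPORTS
weights = [2.0, 0.0, -5.0]

# Suppose ACT_SCORE were recorded on a scale 4x larger; a weight 4x smaller
# yields the identical sum, which is why a converged network can in principle
# absorb differences in input scale.
rescaled_inputs  = [28.0 * 4.0, 1.0, 0.0]
rescaled_weights = [2.0 / 4.0, 0.0, -5.0]

print(node_output(weights, inputs))                    # 56.0
print(node_output(rescaled_weights, rescaled_inputs))  # 56.0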
Question 4
I am predicting the actual score, but in reality, I'm NOT that concerned about the exact score but more of a range. Would my network be more accurate if I grouped the output scores into buckets and then tried to predict this number instead of the actual score?
This will reduce accuracy, although you should converge to a solution much faster with fewer possible outputs (scores).
Neural Networks actually describe very high-dimensional functions in their input variables.
If you reduce the granularity of that function's output space, you essentially state that you don't care about local minima and maxima in that function, especially around the borders between your output scores.
As a result, you are sacrificing information that may be an essential component of the "true" function that you are searching for.
I hope this has been helpful, but you really should break this question down into its many components and ask them separately on different sites -- potentially some of them do belong here on StackOverflow as well.

Dymola/Modelica real-time simulation advances too fast

I want to simulate a model in Dymola in real time for HiL use. In the results I see that the simulation is advancing about 5% too fast.
Integration terminated successfully at T = 691200
CPU-time for integration : 6.57e+005 seconds
CPU-time for one GRID interval: 951 milli-seconds
I already tried to increase the grid interval to reduce the relative error, but the simulation still advances too fast. I have only read about approaches that reduce model complexity to allow simulation within the defined time steps.
Note that the simulation does keep up with real time and is even faster. How can I match simulated time and real time in this case?
Edit 1:
I used the Lsodar solver with the "Synchronize with realtime" option checked in the Realtime tab. I have the realtime simulation licence option. I use Dymola 2013 on Windows 7. Here is the result for a step size of 15 s:
Integration terminated successfully at T = 691200
CPU-time for integration : 6.6e+005 seconds
CPU-time for one GRID interval : 1.43e+004 milli-seconds
The deviation is still roughly 4.5%.
I did not, however, use inline integration.
Do I need hard realtime or inline integration to improve those results? It should be possible to get a deviation lower than 4.5% using soft realtime, shouldn't it?
Edit 2:
I took the Python27 block from the Berkeley Buildings library to read the System time and compare it with the Simulation advance. The result shows that 36 hours after Simulation start, the Simulation slows down slightly (compared to real time). About 72 hours after the start of the simulation it starts getting about 10% faster than real time. In addition, the jitter in the result increases after those 72 hours.
Any explanations?
Next steps will be:
- changing to a fixed step solver (might well be a big part of the solution)
- changing from the DDE server to an OPC server, which at the moment does not seem to be possible in Dymola 2013, however.
Edit 3:
Nope... using a fixed step solver does not seem to solve the problem. In the first 48 hours of simulation time the deviation seems to be equal to the deviation using a variable step size solver. In this example I used the Rkfix3 solver with an integrator step of 0.1.
Does nobody know how to get rid of these huge deviations?
If I recall correctly, Dymola has a special compilation option for real-time performance. However, I think it is a licensed option (not sure).
I suspect that Dymola is picking up the wrong clock speed.
You could use the "Slowdown factor" that is in the Simulation Setup, on the Realtime tab just below "Synchronize with realtime". Set this to 1/0.95.
There is a parameter in Dymola that you can use to set the CPU speed but I could not find this now, I will have a look for this again later.
I solved the problem by switching to an embedded OPC server. The error between real time and simulation time in this case is shown below.
Compiling Dymola problems with an embedded OPC server requires administrator rights (which I did not have before). The active folder of Dymola must not be write-protected.

Create new event in output adapter (StreamInsight)

I have the following problem in StreamInsight. I have a query where new tasks from an order come in and trigger an output adapter to make a prediction. The output adapter writes the predicted task cycle time to a table (in Windows Azure). The prediction is based on neural networks and is plugged into the output adapter. After the prediction is written to the table, I want to do something else with all the predicted times. So in a second query I want to count the number of written tasks in a time window of 5 minutes. When the number of predicted values saved in the table equals the number of tasks in an order, I want to get all the predicted values from the table and make a prediction of the order cycle time.
For this idea I need to create a new event in my output adapter once the predicted time has been written to the table. But I don't think it's possible to enqueue new events into the StreamInsight server from an output adapter.
Maybe this figure makes the problem clear:
http://i40.tinypic.com/4h4850.jpg
Hope someone can help me.
Thanks Carlo
First off, I'm assuming you are using pre-2.1 StreamInsight based on your use of the term "output adapter".
From what you've posted, I would strongly recommend that your adapters do either input or output, but not both. This cuts down on the complexity, makes the implementation easier, and depending on how you wrote the adapter, you now have a reusable piece of code in your solution.
If you are wanting to send data from StreamInsight to your neural network prediction engine, you will need to write an output adapter to do that. Then I would create an input adapter that will get the results from the neural network prediction engine and enqueue the data into StreamInsight. After creating your stream from the neural network prediction engine input adapter, you can use dynamic query composition to share the stream to a Windows Azure storage output adapter and your next query.
If your neural network prediction engine can "push" data to your input adapter, that would be the way to go. If not, you'll have to poll for results.
There is a lot more to this, but it's difficult to drill in to more specifics without more details.
Hope this helps.