Caffe: How to train end-to-end (image to image)? - neural-network

We are quite new to caffe, but what we have seen so far looks really promising.
After reading a few papers (1,2), we wanted to reproduce the result of 1, specifically for a segmentation challenge 4.
We downloaded the modified caffe from 3 and were able to execute it, only to see that the trained network didn't work with the dataset from 4.
At first we thought that the network needed to be trained for the specific problem.
This led to the problem of how to do 'image-to-image (aka end-to-end) learning' (4, training data).
This in turn led us to 'holistically nested edge detection' (hed, 2), where image-to-image learning seems to be used.
With hed, we were able to retrain the network on our own. But it doesn't work (it leads to all 0 or 0.5 images - black images :-( ) if we try to train the network for the dataset of 4. For initialization we wrote a script to calculate the mean map, which we use for the dataset of 4.
Our questions are:
How can we reproduce the result mentioned in 1 by running image-to-image training?
How do you train networks where we have image-to-image learning?
Since we only have 30 image-to-image pairs, should we implement deformation as mentioned in 1/3 via matlab/python, or is there a functionality within caffe already?
Are we missing something simple from 1 or 2?
Kind regards,
Klaus and Bernhard
Ps: We asked the same question at the caffe-user group and intend to post solutions at both locations.

After some time and trying several different things out, I stumbled upon:
Using that caffe fork with caffe_neural_models and caffe_neural_tool, training image(raw)-to-image(labels) can be done quite simply.
Just check out 'caffe_neural_models/net*' for different configurations.


How to create a "Denoising Autoencoder" in Matlab?

I know Matlab has the function TrainAutoencoder(input, settings) to create and train an autoencoder. The result is capable of running the two functions "Encode" and "Decode".
But this is only applicable to normal autoencoders. What if you want a denoising autoencoder? I searched and found some sample code that uses the "Network" function to convert the autoencoder to a normal network and then calls Train(network, noisyInput, smoothOutput) to train it like a denoising autoencoder.
But there are multiple missing parts:
How do I use this new network object to "encode" new data points? It doesn't support encode().
How do I get the "latent" variables (the features) out of this "network"?
I would appreciate it if anyone could help me resolve this issue.
At present (2019a), MATLAB does not permit users to add layers manually to an autoencoder. If you want to build your own, you will have to start from scratch using the layers provided by MATLAB.
In order to use TrainNetwork(...) to train your model, you will have to find a way to insert your data into an object called imDatastore. The difficulty with an autoencoder's data is that there is NO label, which imDatastore requires, so you will have to find a smart way around it. Essentially you are dealing with a so-called OCC (One Class Classification) problem.
Use activations(...) to dump outputs from intermediate (hidden) layers.
I went back and forth between MATLAB and Python (Keras) for deep learning for a couple of weeks and eventually chose the latter, although I am a long-term and loyal MATLAB user and a rookie in Python. My two cents: the former has too many restrictions when it comes to deep learning.
Good luck.:-)
If by 'simulation' you mean prediction/inference, simply use activations(...) to dump outputs from any intermediate (hidden) layer, as I mentioned earlier, so that you can check them.
Another way is to construct an identical network containing only the encoding part, copy your trained parameters into it, and feed it your simulated signals.

Neural Network playing Tic Tac Toe doesn't learn

I have a neural network playing tic-tac-toe. (I know there are other better methods for this, but I want to learn about NN)
So the NN plays against a random AI. First, it should learn to make an allowed move, i.e. not choose a field that is already occupied.
It doesn't get very far with this, however.
When the NN chooses an illegal move, I optimize the weights such that the distance to another, randomly chosen (legal) field is minimized. (There is one output, which should have values between 1 and 9.)
My problem is: in changing the weights, a formerly optimized outcome is now also changed. So I have this kind of overfitting: every time I backpropagate to optimize the weights for one particular situation, the decision for every other situation becomes worse!
I know I should probably have 9 output neurons instead of 1 and should probably not use a random field as the target, as I assume this can mess things up. I am starting to change this.
Still, the issue seems to remain. Obviously. How can I improve the decision in one situation without forgetting every other situation?
One solution I came up with is to "remember" every game played and optimize simultaneously over all games played.
However, after a while this becomes computationally very demanding. Also, it seems to go in the direction of a complete enumeration of all possible board situations. This might be possible for Tic Tac Toe, but if I move to another game, say Go, it becomes infeasible.
Where is my mistake? How do I generally tackle this problem? Or where could I read about it? Thanks a lot!
To tackle this problem efficiently, you should consider Reinforcement Learning methods instead of what you are currently doing. What you are trying to do is to learn the behaviour of an agent playing Tic Tac Toe. The agent gets a high reward when it wins a game, a high penalty when it loses and an even higher penalty when it performs an illegal move. My guess is that using methods such as Q-learning with neural networks will work perfectly, even with very simple neural nets. One useful paper on the topic could be:, or earlier papers on TD-Gammon (I think you can easily find tutorials on the topic using the keywords TD-Gammon, Q-learning, ...).
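To make the Q-learning idea concrete, here is a minimal sketch of the update rule. Note that it is the tabular variant (a dictionary of values), not the neural-network variant suggested above, because that keeps the example short. The board encoding, the function names and the constants ALPHA, GAMMA and EPSILON are all assumptions made for illustration.

```python
import random
from collections import defaultdict

ALPHA = 0.5    # learning rate (assumed value)
GAMMA = 0.9    # discount factor (assumed value)
EPSILON = 0.1  # exploration rate (assumed value)

# Q[(state, action)] -> estimated return; unseen pairs default to 0.0
Q = defaultdict(float)

def legal_actions(state):
    """state is a tuple of 9 cells ('X', 'O' or ' '); empty cells are legal moves."""
    return [i for i, cell in enumerate(state) if cell == ' ']

def choose_action(state, rng=random):
    """Epsilon-greedy choice restricted to legal moves."""
    actions = legal_actions(state)
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, done):
    """One Q-learning step: move Q(s, a) toward reward + GAMMA * max Q(s', a')."""
    if done:
        target = reward
    else:
        best_next = max((Q[(next_state, a)] for a in legal_actions(next_state)),
                        default=0.0)
        target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```

Each update touches only the value of the (state, action) pair that was just played, which is one reason this formulation does not "forget" other situations the way a single shared regression output can. With a neural network in place of the table, the same target would be used as the training signal for the chosen action's output.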
By the way, a more down-to-earth answer to why your model might not work is that you are seemingly using one single unit to represent categorical outputs: if you want to represent an integer between 1 and N, you should represent it using N output neurons with values between 0 and 1, and pick the neuron with the highest value as your answer. Using a single neuron with a value between 1 and 9 creates an unnatural asymmetry between your outputs: for example, when the expected value is 3, your network gets a higher error for outputting a 9 than a 2. This should obviously not be the case: all wrong answers are equally wrong.
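The asymmetry can be checked with a few lines of Python. This is only a sketch: the helper names are made up, squared error is assumed as the loss, and restricting the choice to empty fields in pick_move is an addition that the answer above does not mention.

```python
def one_hot(move, n=9):
    """Encode a move in 1..n as a target vector containing a single 1.0."""
    return [1.0 if i == move - 1 else 0.0 for i in range(n)]

def squared_error(output, target):
    """Sum of squared differences between two equally long vectors."""
    return sum((o - t) ** 2 for o, t in zip(output, target))

def pick_move(outputs, board):
    """Return the 1-based index of the highest-scoring empty field.

    board is a list of 9 entries; None marks an empty field (assumed encoding).
    """
    legal = [i for i, cell in enumerate(board) if cell is None]
    return max(legal, key=lambda i: outputs[i]) + 1

# Single output neuron: answering 9 instead of 3 costs far more than answering 2.
single_err_9 = (9 - 3) ** 2
single_err_2 = (2 - 3) ** 2

# Nine output neurons: both wrong answers cost exactly the same.
target = one_hot(3)
err_9 = squared_error(one_hot(9), target)
err_2 = squared_error(one_hot(2), target)
```

Here single_err_9 is 36 while single_err_2 is 1, whereas err_9 and err_2 are both 2.0.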
Hope this helps,

Recommendation system design

I am currently working on a research project in which I try to predict people's IQ.
This is how the research goes: on day 1, participants take an IQ test. At regular intervals of 2 weeks they continue to take the test (with different questions, maybe) for 6 months.
Given this information (or dataset), how does one go about designing a recommendation system?
I imagine it something like this
IQvalue --input--> [ Recommendation Engine ] --spits out--> probable IQ value (after 6 months)
My actual research is not on IQ at all. I just made this example up.
Am I going in the right direction at all? Are there any algorithms that do something similar?
Appreciate any help.
For case 1, where you only have the time-related IQ values, I suggest you consider time series analysis methods. Your target is to predict how the IQ changes with time. My suggestion for this solution is the statsmodels library. Its GitHub address is as follows: .
This tool is written in Python and is easy to use. It contains many commonly used tsa models, such as ARIMA.
For case 2, where you also have features of the people, for instance the answers in the QA test, age, gender, education, etc., I suggest you consider using machine learning methods to predict the IQs. You may consider random forests or gradient boosting to solve this problem. I suggest you use tools such as Scikit-learn or xgboost.
For case 3, you can model it as a recommender system problem. Taking the test people as users, the IQ tests as items and the IQ values as ratings, you can construct a user-item matrix. After that, you can use RS methods, such as matrix factorization or memory-based methods, to predict the IQ values.
In my opinion, the first two approaches may be better for your case.

Bidirectional LSTM for Classification

I am done with searching for "how to implement bidirectional lstm network for a classification problem (say with iris data)". I have not found any satisfying answer. In almost every case I came across a solution where a BLSTM is implemented for a sequence prediction problem. My simple question is: how can I create a bidirectional network in pybrain? Whenever I try to build one, I write *
My intention was to add modules later by pybrain.addInputModule() or so. But it fails, of course, as I am not specifying seqlen as in
n = BidirectionalNetwork(seqlen=20, inputsize=1,
hiddensize=5, symmetric=False)
What will seqlen be if I have 4 inputs and 3 outputs (as in the iris data) and 150 samples? Will it be 150? Things are not clear, as I have no example of a classification problem.

Encog predictive neural network results

I have been using the Encog Neural Net workbench (version 3.2) to run the sunspot prediction routine and have noticed that when changing the future prediction window to greater than 1, the results in sunspot_output.csv appear to be time offset, so that the outputs when the network evaluates at t=0 are not really (t+1), (t+2), (t+3) etc. It's very likely I'm not understanding how the workbench is displaying the results, so perhaps someone could clarify this for me.
As I understand it, if you use a past window of 30 and a future window of 14, then the network will look at the last 30 records and predict forward from the last available record (in this case let's say 11/1/1951 is the last available record). So an evaluation on 11/1/1951 will look back 30 records to 5/1/1949 and feed this information through the trained network to predict data for 12/1/1951 (t+1), 1/1/1952 (t+2), 2/1/1952 (t+3), etc. However, looking at the result file, this does not appear to be the case. The "prediction" really appears to be a repeat of the pattern from the previous 14 records, so that (t+1) is really more representative of (t-14), 08/01/1950, than of the next record forward from (t=0), which would be 12/1/1951.
I have an image that shows this, but unfortunately I don't appear to have the reputation points to post it yet. To reproduce this issue I suggest using the Encog workbench with a past window of 30, a future window of 14 and a training error of 1 or 2%.
To summarize:
Has anyone else noticed this issue when looking at the predictive network results, particularly for more than one time step ahead?
Why do the workbench results show that the encog predictive neural network is not properly predicting into the future when you look at the dates associated with the outputs?
Thank you for any thoughts you may have!
That's not an issue; it is how a sliding-window time series forecaster works.
I would suggest reading further here
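As an illustration of the sliding-window idea, the following sketch shows how a series is cut into (past, future) pairs. It is not Encog's implementation; the function name and the stand-in data are made up, and only the window sizes (30 and 14) come from the question.

```python
def sliding_windows(series, past=30, future=14):
    """Return (inputs, targets) pairs: `past` values followed by the next `future` values."""
    pairs = []
    last_start = len(series) - past - future
    for start in range(last_start + 1):
        inputs = series[start:start + past]
        targets = series[start + past:start + past + future]
        pairs.append((inputs, targets))
    return pairs

series = list(range(100))        # stand-in data: the value equals its index
pairs = sliding_windows(series)
inputs, targets = pairs[0]
# inputs covers indices 0..29; targets covers indices 30..43,
# i.e. (t+1)..(t+14) relative to the last input at index 29.
```

Every output row therefore belongs to the 14 records that follow the last record of its input window, which is worth keeping in mind when matching dates in the result file.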
It really depends on how you tune the neural network.
If you want more predictive power, you have to extract features or synthesize new features (for example, I would use wavelet extraction and denoising).
Pay attention to normalization. Use range normalization if the value ranges are known, otherwise z-normalization.
Use the proper activation function: sigmoid if the normalized range is [0, 1], or tanh if the range is [-1, 1].
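The two normalizations can be sketched in plain Python as follows. The function names are made up for illustration, and this is not how Encog implements them.

```python
import statistics

def range_normalize(values, lo, hi, new_lo=0.0, new_hi=1.0):
    """Map values from the known range [lo, hi] to [new_lo, new_hi]."""
    scale = (new_hi - new_lo) / (hi - lo)
    return [new_lo + (v - lo) * scale for v in values]

def z_normalize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]
```

For a sigmoid output you would call range_normalize with the default target range of 0 to 1, and for tanh you would pass new_lo=-1.0 and new_hi=1.0.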
But before concluding that the neural network is not predicting, I would suggest you try the SVR (Support Vector Regression) included in encog.
It is guaranteed to reach the global minimum (if one exists).
See if the SVR predicts better than the ANN.
If not, use my first suggestions ;-)