Scaling up a simple linear regression model - linear-regression

This is a simple supply chain analysis that I'm hoping to get some feedback on. I want to build an analysis that shows a negative correlation between inbound service level and outbound transportation costs. The theory is that when our incoming shipments are delayed, we need to expedite outbound shipments to pick up the slack, thus incurring additional cost.
I built a simple linear regression model that plots each item ID on a graph with service level % on the x-axis and transportation cost on the y-axis. This was straightforward. Now I want to build a similar graph that shows the sum total of our transportation spend as the maximum of the y-axis. This would demonstrate that with each percent increase in overall inbound service level, we can decrease overall transportation costs by __.
My questions are:
Is it mathematically correct to translate the first analysis into the second?
How would I go about building the second graph based on the first analysis? (included sample data below for illustration)
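To make the second graph concrete, here is a minimal sketch of the aggregate regression, assuming one row per period with the overall inbound service level and the total transportation spend. The numbers and the variable names are invented purely for illustration; they are not the sample data mentioned above, and the sketch does not settle whether the per-item result carries over to the totals.

```python
import numpy as np

# Hypothetical data, invented for illustration: one row per period.
service_level_pct = np.array([80.0, 85.0, 90.0, 95.0, 100.0])
total_transport_cost = np.array([520.0, 480.0, 440.0, 400.0, 360.0])

# Ordinary least-squares line: cost ~ intercept + slope * service level.
slope, intercept = np.polyfit(service_level_pct, total_transport_cost, deg=1)

# Pearson correlation between the two series; a negative value is what the
# "delays force expediting" theory predicts.
r = np.corrcoef(service_level_pct, total_transport_cost)[0, 1]

print(f"slope = {slope:.2f} cost units per percentage point, r = {r:.2f}")
```

The fitted slope is the number that would fill the blank in "decrease overall transportation costs by __", under the assumption that the relationship is roughly linear over the observed range.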

Related

Choice of Neural Network and Activation Function

I am very new to the field of neural networks. Apologies if this question is very amateurish.
I am looking to build a neural network model to predict whether a particular image that I am about to post on a social media platform will get a certain engagement rate.
I have around 120 images with historical data about the engagement rate. The following information is available:
Images of size 501 px x 501 px
Type of image (Exterior photoshoot/Interior photoshoot)
Day of posting the image (Sunday/Monday/Tuesday/Wednesday/Thursday/Friday/Saturday)
Time of posting the image (18:33, 10:13, 19:36 etc)
No. of people who have seen the post (15659, 35754, 25312 etc)
Engagement rate (5.22%, 3.12%, 2.63% etc)
I would like the model to predict if a certain image when posted on a particular day and time will give an engagement rate of 3% or more.
As you may have noticed, the input data is images, text (signifying what type or day), time and numbers.
Could you please help me understand how to build a neural network for this problem?
P.S.: I am very new to this field. It would be great if you could give detailed direction on how I should proceed to solve this problem.
A neural network has three kinds of neuronal layers:
Input layer. It stores the inputs this network will receive. The number of neurons must equal the number of inputs you have;
Hidden layer. It uses the inputs that come from the previous layer and it does the necessary calculations so as to obtain a result, which passes to the output layer. More complex problems may require more than one hidden layer. As far as I know, there is not an algorithm to determine the number of neurons in this layer, so I think you determine this number based on trial and error and previous experience;
Output layer. It gets the results from the hidden layer and gives them to the user. The number of neurons in the output layer equals the number of outputs you have.
According to what you write here, your training database has 5 inputs and one output (the engagement rate). This means that your artificial neural network (ANN) will have 5 neurons on the input layer and one neuron on the output layer.
I am not sure if you can pass images as inputs to a neural network. Also, because in theory there are infinitely many types of images, I think you should categorize them a bit, with each category receiving a number. An example of categorization would be:
Images with dogs are in category 1;
Images with hospitals are in category 2, etc.
So, each record in your training database will look like this:
Image category (dogs=1, hospitals=2, etc.);
Type of image (Exterior photoshoot=1, interior photoshoot=2);
Posting day (Sunday=1, Monday=2, etc.);
Time of posting the image;
Number of people who have seen the post;
Engagement rate (this is the output, not an input).
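The coding scheme in the list above could be sketched like this. The function names, the choice of "minutes since midnight" for the posting time, and the 3% threshold helper are my own assumptions for illustration; the threshold itself comes from the question.

```python
from datetime import datetime

# Numeric codes taken from the list above.
IMAGE_TYPES = {"Exterior photoshoot": 1, "Interior photoshoot": 2}
DAYS = {"Sunday": 1, "Monday": 2, "Tuesday": 3, "Wednesday": 4,
        "Thursday": 5, "Friday": 6, "Saturday": 7}


def encode_record(image_category, image_type, day, time_hhmm, views):
    """Turn one post into a list of numbers the network can take as inputs."""
    t = datetime.strptime(time_hhmm, "%H:%M")
    # Assumption: represent the posting time as minutes since midnight.
    minutes_since_midnight = t.hour * 60 + t.minute
    return [image_category, IMAGE_TYPES[image_type], DAYS[day],
            minutes_since_midnight, views]


def encode_target(engagement_rate_pct, threshold=3.0):
    """1 if the engagement rate reaches the threshold, otherwise 0."""
    return 1 if engagement_rate_pct >= threshold else 0


inputs = encode_record(1, "Exterior photoshoot", "Sunday", "18:33", 15659)
target = encode_target(5.22)
print(inputs, target)
```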
The number of hidden layers and the number of neurons in each hidden layer depend on your problem's complexity. Having 120 pictures, I think one hidden layer with 10 neurons is enough.
The ANN will have one output neuron (the engagement rate).
Once the database containing the information about the 120 pictures (known as the training database) is created, the next step is to train the ANN using the database. However, there is some discussion here.
Training an ANN means computing the parameters of the neurons with an optimization algorithm so that the sum of squared errors is minimized. The training process has some degree of randomness to it. To minimize the effect of the randomness factor and to get estimates that are as precise as possible, your training database must have:
Consistent data;
Many records.
I don't know how consistent your data are, but from my experience, a small training database with consistent data beats a huge database with inconsistent data.
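For what it is worth, here is a minimal NumPy sketch of what "training" means in this sense: one hidden layer of 10 neurons, with the weights adjusted by plain gradient descent so that the sum of squared errors goes down. The tanh activation, the learning rate, the number of epochs and the toy data are all assumptions chosen for illustration; real ANN software will typically use a more sophisticated optimizer.

```python
import numpy as np


def train_one_hidden_layer(X, y, hidden=10, lr=0.05, epochs=500, seed=0):
    """Fit a one-hidden-layer network by gradient descent on squared error."""
    rng = np.random.default_rng(seed)  # the seed fixes the random starting weights
    n, n_inputs = X.shape
    W1 = rng.normal(0.0, 0.5, size=(n_inputs, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    sse_history = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)        # hidden-layer activations
        pred = H @ W2 + b2              # linear output neuron
        err = pred - y
        sse_history.append(float(np.sum(err ** 2)))
        # Gradient of the sum of squared errors, scaled by 1 / (2 n).
        grad_pred = err / n
        grad_W2 = H.T @ grad_pred
        grad_b2 = grad_pred.sum(axis=0)
        grad_H = (grad_pred @ W2.T) * (1.0 - H ** 2)  # tanh derivative
        grad_W1 = X.T @ grad_H
        grad_b1 = grad_H.sum(axis=0)
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
    return (W1, b1, W2, b2), sse_history


# Toy data, invented for illustration: the target is the sum of two inputs.
X_demo = np.linspace(-1.0, 1.0, 40).reshape(20, 2)
y_demo = X_demo.sum(axis=1, keepdims=True)
_, sse = train_one_hidden_layer(X_demo, y_demo)
print(f"SSE before training: {sse[0]:.3f}, after: {sse[-1]:.3f}")
```

Changing the seed changes the starting weights and therefore the result, which is the randomness mentioned above.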
Judging by the problem, I think you should use the default activation function provided by the software you use for ANN handling.
Once you have trained your ANN, it is time to see how effective this training was. The software which you use for ANN should provide you with tools to estimate this, tools which should be documented. If the training is satisfactory, you may begin using the network. If it is not, you may either re-train the ANN or use a larger database.

Episodic Semi-gradient Sarsa with Neural Network

While trying to implement Episodic Semi-gradient Sarsa with a neural network as the approximator, I wondered how I choose the optimal action based on the currently learned weights of the network. If the action space is discrete, I can just calculate the estimated value of the different actions in the current state and choose the one which gives the maximum. But this does not seem to be the best way of solving the problem. Furthermore, it does not work if the action space is continuous (like the acceleration of a self-driving car, for example).
So, basically I am wondering how to solve the 10th line, "Choose A' as a function of q̂(S', ·, w)", in this pseudo-code of Sutton:
How are these problems typically solved? Can anyone recommend a good example of this algorithm using Keras?
Edit: Do I need to modify the pseudo-code when using a network as the approximator? So, that I simply minimize the MSE of the prediction of the network and the reward R for example?
I wondered how I choose the optimal action based on the currently learned weights of the network
You have three basic choices:
Run the network multiple times, once for each possible value of A' to go with the S' value that you are considering. Take the maximum value as the predicted optimum action (with probability of 1-ε, otherwise choose randomly for the ε-greedy policy typically used in SARSA).
Design the network to estimate all action values at once, i.e. to have |A(s)| outputs (perhaps padded to cover "impossible" actions that you need to filter out). This alters the gradient calculations slightly: zero gradient should be applied to the inactive outputs of the last layer (i.e. anything not matching the A of (S,A)). Again, just take the maximum valid output as the estimated optimum action. This can be more efficient than running the network multiple times. This is also the approach used by the recent DQN Atari game-playing bot and AlphaGo's policy networks.
Use a policy-gradient method, which works by using samples to estimate the gradient that would improve a policy estimator. You can see chapter 13 of Sutton and Barto's second edition of Reinforcement Learning: An Introduction for more details. Policy-gradient methods become attractive when there are large numbers of possible actions, and they can cope with continuous action spaces (by making estimates of the distribution function for the optimal policy, e.g. choosing the mean and standard deviation of a normal distribution, which you can sample from to take your action). You can also combine policy-gradient with a state-value approach in actor-critic methods, which can be more efficient learners than pure policy-gradient approaches.
Note that if your action space is continuous, you don't have to use a policy-gradient method; you could just quantise the action. Also, in some cases, even when actions are in theory continuous, you may find the optimal policy involves only using extreme values (the classic mountain car example falls into this category: the only useful actions are maximum acceleration and maximum backwards acceleration).
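As a rough sketch of the first two options plus the quantisation idea: once you have one estimated value per action for S' (from several forward passes or from a multi-output network), ε-greedy selection is just an argmax with occasional random exploration. The function names and the valid_actions argument are hypothetical helpers, not part of any library.

```python
import random


def epsilon_greedy(q_values, epsilon, valid_actions=None, rng=random):
    """Pick an action index from a list of estimated action values.

    q_values holds one estimate per action for the state S'.
    valid_actions optionally filters out "impossible" actions.
    """
    if valid_actions is None:
        valid_actions = range(len(q_values))
    valid_actions = list(valid_actions)
    if rng.random() < epsilon:
        return rng.choice(valid_actions)                   # explore
    return max(valid_actions, key=lambda a: q_values[a])   # exploit


def quantise_actions(low, high, n_bins):
    """Evenly spaced discrete stand-ins for a continuous action range."""
    return [low + i * (high - low) / (n_bins - 1) for i in range(n_bins)]


# Example: a continuous acceleration in [-1, 1] quantised into 5 actions.
accelerations = quantise_actions(-1.0, 1.0, 5)
q_estimates = [0.2, 0.1, 0.0, 0.4, 0.9]   # made-up network outputs
chosen = accelerations[epsilon_greedy(q_estimates, epsilon=0.0)]
print(chosen)
```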
Do I need to modify the pseudo-code when using a network as the approximator? So, that I simply minimize the MSE of the prediction of the network and the reward R for example?
No. There is no separate loss function in the pseudocode, such as the MSE you would see used in supervised learning. The error term (often called the TD error) is given by the part in square brackets, and achieves a similar effect. Literally, the term ∇q(S,A,w) (sorry for the missing hat, no LaTeX on SO) means the gradient of the estimator itself, not the gradient of any loss function.
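To show where the TD error and the gradient of the estimator enter, here is a minimal sketch of one semi-gradient Sarsa update for a linear approximator, where the gradient of q with respect to w is simply the feature vector. The function names and the feature vectors are hypothetical; with a neural network the gradient would come from backpropagation through the network output instead.

```python
import numpy as np


def q_hat(w, features):
    """Linear action-value estimate: the dot product of weights and features."""
    return float(w @ features)


def sarsa_update(w, x_sa, reward, x_next, alpha, gamma, terminal):
    """One semi-gradient Sarsa step.

    x_sa is the feature vector of (S, A); x_next is that of (S', A').
    """
    target = reward if terminal else reward + gamma * q_hat(w, x_next)
    td_error = target - q_hat(w, x_sa)   # the part in square brackets
    grad = x_sa                          # gradient of a linear q_hat w.r.t. w
    return w + alpha * td_error * grad


w = np.zeros(2)
w = sarsa_update(w, x_sa=np.array([1.0, 0.0]), reward=1.0,
                 x_next=np.array([0.0, 1.0]), alpha=0.5, gamma=0.9,
                 terminal=False)
print(w)
```

Note that the target is treated as a constant: no gradient is taken through q(S', A', w), which is what makes the method "semi-gradient".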

Encog - Using Hybrid Neural Networks

How is using simulated annealing in conjunction with a feed-forward neural network different than simply resetting the weights (and placing the hidden layer into a new error valley) when a local minimum is reached? Is simulated annealing used by the FFNN as a more systematic way of moving the weights around to find a global minimum, and hence only one iteration is performed each time the validation error begins to increase relative to the training error... slowly moving the current position across the error function? In this case, the simulated annealing is independent of the feed-forward network and the feed-forward network is dependent on the simulated annealing output. If not, and the simulated annealing is directly dependent on results from the FFNN, I don't see how the simulated annealing trainer would receive this information in terms of how to update its own weights (if that makes sense). One of the examples mentions a cycle (multiple iterations), which doesn't fit into my first assumption.
I have looked at different examples, where network.fromArray() and network.toArray() are used, but I only see network.encodeToArray() and network.decodeFromArray(). What is the most current way (v3.2) to transfer weights from one type of network to another? Is this the same for using genetic algorithms, etc?
Neural network training algorithms, such as simulated annealing, are essentially searches. The weights of the neural network are essentially vector coordinates that specify a location in a high-dimensional space.
Consider hill-climbing, possibly the simplest training algorithm. You adjust one weight, thus moving in one dimension, and see if it improves your score. If the score is improved, then great, stay there and try a different dimension next iteration. If your score is NOT improved, retreat and try a different dimension next time. Think of a human looking at every point they can reach in one step and choosing the step that increases their altitude the most. If no step will increase altitude (you are standing in the middle of a valley), then you're stuck. This is a local minimum.
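A minimal sketch of that hill-climbing loop, assuming a score function where higher is better and a fixed step size; this is an illustration of the idea, not how Encog implements any of its trainers, and the function names are made up.

```python
import random


def hill_climb(score, weights, step=0.1, iterations=1000, seed=0):
    """Greedy search: change one weight at a time, keep only improvements."""
    rng = random.Random(seed)
    weights = list(weights)
    best = score(weights)
    for _ in range(iterations):
        i = rng.randrange(len(weights))      # pick one dimension
        old = weights[i]
        weights[i] = old + rng.choice([-step, step])
        new = score(weights)
        if new > best:
            best = new                       # improved: stay there
        else:
            weights[i] = old                 # not improved: retreat
    return weights, best


def toy_score(w):
    """Toy score, invented for illustration: best when every weight is 1."""
    return -sum((wi - 1.0) ** 2 for wi in w)


final_weights, final_score = hill_climb(toy_score, [0.0, 0.0])
print(final_weights, final_score)
```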
Simulated annealing adds one critical component to hill-climbing: we might move to a worse location (the search is not greedy). The probability that we will move to a worse location is determined by the temperature, which decreases over time.
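One common way to express that acceptance rule is the Metropolis criterion, sketched below under the assumption that higher scores are better. I am not claiming this is exactly what Encog's simulated annealing trainer does internally; the function names are hypothetical.

```python
import math
import random


def acceptance_probability(current_score, new_score, temperature):
    """Probability of accepting a move (higher score is better)."""
    if new_score >= current_score:
        return 1.0
    # Worse moves are accepted with a probability that shrinks as the
    # temperature drops or as the move gets worse.
    return math.exp((new_score - current_score) / temperature)


def accept_move(current_score, new_score, temperature, rng=random):
    """Randomly decide whether to take the move."""
    return rng.random() < acceptance_probability(
        current_score, new_score, temperature)


hot = acceptance_probability(1.0, 0.0, temperature=10.0)
cold = acceptance_probability(1.0, 0.0, temperature=0.1)
print(f"hot: {hot:.3f}, cold: {cold:.6f}")
```

Early on (high temperature) the search can escape a valley by accepting a worse position; late in the run (low temperature) it behaves almost like plain hill-climbing.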
If you look inside of the NeuralSimulatedAnnealing classes you will see calls to NetworkCODEC.NetworkToArray() and NetworkCODEC.ArrayToNetwork(). These are how the weight vector is directly updated.

Lake Visitor Modeling by Neural Networks

Let's say I want to model the number of visitors at an arbitrary lake at a specific time.
Given Data:
Time Series of Amount of Visitors for 12 lakes.
Weather Time Series for the 12 lakes
Number of Trees at lake
Percentage of grass/stone ground of the beach.
I want to use a Neural Network (NN) to model the number of visitors, and I have some essential questions which I want to introduce step by step. Note that the visitor time series shall not be used!
1) we only use the Inputs:
Time of Day
Day of Week
So there are two inputs and one output. I read about a rule of thumb which says that the number of hidden neurons should be chosen as
#inputs >= #hidden neurons >= #outputs.
Is the number of inputs here 2, or is it an estimate of the real number of variables the output depends on (such as weather, people's mood, the economic situation, ...)? If it is the former, I should choose 1 or 2 hidden neurons, correct?
2) If I want to include lake-specific parameters such as the number of trees or the ground ratio, can I just add these as additional inputs (constant for each of the twelve lakes), or would that not help for some reason? How could I ensure that there is a causal connection between these inputs and the output?
3) Since weather is a time series, which weather values should I use? How do I get the optimal delay, for example? Would Granger causality be a means to determine that?
Hope you can help. I just want to discuss the strength of NNs for modeling and hear your opinion. I would use MATLAB's Neural Network Toolbox for this.
Thanks in advance.

Multi Step Prediction Neural Networks

I have been working with the MATLAB neural network toolkit. Here I am using the NARX network. I have a dataset consisting of prices of an object as well as the quantity of the object purchased over a period of time. Essentially this network does one-step prediction, which is defined mathematically as follows:
y(t) = f(y(t-1), y(t-2), ..., y(t-n_y), x(t-1), x(t-2), ..., x(t-n_x))
Here y(t) is the price at time t and x is the amount. So the input features I am using are price and amount, and the target is the price at time t+1. Suppose I have 100 records of such transactions, each consisting of the price and the amount. Then my neural network can predict the price of the 101st transaction. This works fine for one-step predictions. However, if I want to do multi-step predictions, say predicting 10 transactions ahead (the 110th transaction), then I assume that I do a one-step prediction of the price and feed it back into the neural network, repeating this until I reach the 110th prediction. The problem is that after I predict the 101st price, I can feed it into the neural network to predict the 102nd price, but I do not know the amount of the object at the 101st transaction. How do I go about this? I was thinking about setting my targets to be the prices of transactions that are 10 transactions ahead of the current one, so that when I predict the 101st transaction, I am essentially predicting the price of the 110th transaction. Is this a viable solution, or am I going about this in a completely wrong manner? Thanks in advance for any help.
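The "targets 10 transactions ahead" idea from the question amounts to building a training set where each row pairs the most recent lagged prices and amounts with the price a fixed number of steps later. A minimal sketch in Python (rather than MATLAB), with a hypothetical helper name and made-up series:

```python
import numpy as np


def make_direct_dataset(price, amount, n_lags, horizon):
    """Rows: the last n_lags prices and amounts up to time t.
    Targets: the price at time t + horizon."""
    price = np.asarray(price, dtype=float)
    amount = np.asarray(amount, dtype=float)
    X, y = [], []
    for t in range(n_lags - 1, len(price) - horizon):
        lagged_prices = price[t - n_lags + 1 : t + 1]
        lagged_amounts = amount[t - n_lags + 1 : t + 1]
        X.append(np.concatenate([lagged_prices, lagged_amounts]))
        y.append(price[t + horizon])
    return np.array(X), np.array(y)


# Made-up series, for illustration only.
prices = np.arange(10.0)          # 0, 1, ..., 9
amounts = np.arange(10.0, 20.0)   # 10, 11, ..., 19
X, y = make_direct_dataset(prices, amounts, n_lags=2, horizon=3)
print(X.shape, y[:3])
```

A model trained on such rows never needs the unknown future amounts, at the cost of needing one model (or one output) per horizon.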
Similar to what kostas said, once you have the predicted 101st price, you can use all your data to predict the 101st amount, then use that to predict the 102nd price, then use the 102nd price to predict the 102nd amount, etc. However, this compounds any error in your predictions for each variable. To mitigate that, you can add several other features, like a tapering discount on past values or a measure of error to use in the prediction (search for temporal difference learning for similar ideas in the reinforcement learning realm).
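The alternating scheme can be sketched as a simple loop. Here predict_price and predict_amount are hypothetical stand-ins for two trained models (any callables that take the histories so far and return the next value); the toy lambdas exist only to make the sketch runnable.

```python
def recursive_forecast(prices, amounts, predict_price, predict_amount, steps):
    """Alternate one-step predictions of price and amount, feeding each
    prediction back in as if it were an observation."""
    prices, amounts = list(prices), list(amounts)
    for _ in range(steps):
        next_price = predict_price(prices, amounts)
        prices.append(next_price)
        # The amount model sees the freshly predicted price as well.
        next_amount = predict_amount(prices, amounts)
        amounts.append(next_amount)
    return prices, amounts


# Toy stand-in models, invented for illustration only.
toy_price_model = lambda p, a: p[-1] + 1
toy_amount_model = lambda p, a: a[-1] * 2

forecast_prices, forecast_amounts = recursive_forecast(
    [1, 2], [1, 1], toy_price_model, toy_amount_model, steps=3)
print(forecast_prices, forecast_amounts)
```

Because every predicted value is reused as an input, any error made at step 101 is carried into steps 102 and beyond, which is the compounding described above.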
I guess you can use a separate neural network to do time series prediction for x in order to produce x(t+1) up to x(t+10) and then use these values to feed another ANN to predict y(t).