I have a 3D input dataset with dimensions (24, 80, 42): 80 is the number of timesteps (samples), each timestep has 24 entities, and each entity has 42 features. How do I give this as input to an ordinary feed-forward neural network? I have already got results with an LSTM.
This is the error I'm getting; I don't know how to reshape the data to pass it as input:
ValueError: Error when checking input: expected dense_3_input to have 3 dimensions, but got array with shape (1920, 42)
from keras import models, layers

input_shape = (80, 24, 42)

network = models.Sequential()
# Add fully connected layer with a ReLU activation function
network.add(layers.Dense(units=42, activation='relu',
                         input_shape=input_shape))
# Add fully connected layer with a ReLU activation function
network.add(layers.Dense(units=42, activation='relu'))
# Add fully connected layer with no activation function
network.add(layers.Dense(units=24))
network.summary()
Is this correct?
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_25 (Dense)             (None, 80, 24, 42)        1806
_________________________________________________________________
dense_26 (Dense)             (None, 80, 24, 42)        1806
_________________________________________________________________
dense_27 (Dense)             (None, 80, 24, 24)        1032
=================================================================
Total params: 4,644
Trainable params: 4,644
Non-trainable params: 0
_________________________________________________________________
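Not from the original post, but for context: a Dense layer in Keras only acts on the last axis, so a 3D input_shape keeps the extra axes around (hence the 4D output shapes above) and does not match a 2D array like (1920, 42). A minimal sketch of one way to feed this data to a feed-forward network, assuming each timestep's (24, 42) block should be flattened into a single feature vector:

import numpy as np
from keras import models, layers

# Stand-in for the real dataset: (entities, timesteps/samples, features)
data = np.random.rand(24, 80, 42)

# Move the sample axis to the front, then flatten each sample:
# (24, 80, 42) -> (80, 24, 42) -> (80, 24 * 42)
x = np.transpose(data, (1, 0, 2)).reshape(80, 24 * 42)

network = models.Sequential()
network.add(layers.Dense(units=42, activation='relu', input_shape=(24 * 42,)))
network.add(layers.Dense(units=42, activation='relu'))
network.add(layers.Dense(units=24))
network.summary()  # every Dense output is now 2D: (None, units)

Alternatively, if each of the 1920 entity rows is meant to be an independent sample, the existing (1920, 42) array can be used directly with input_shape=(42,).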
Related
I'm trying to train a neural network to approximate a known scalar function of two variables; however, no matter the parameters of my training, the network always ends up simply predicting the average value of the true outputs.
I am using an MLP and have tried:
using several network depths and widths
different optimizers (SGD and ADAM)
different activations (ReLU and Sigmoid)
changing the learning rate (several points within the range 0.1 to 0.001)
increasing the data (to 10,000 points)
increasing the number of epochs (to 2,000)
and different random seeds
to no avail.
My loss function is MSE and always plateaus to a value of about 5.14.
Regardless of changes I make, I get the following results:
In the plot, the blue surface is the function to be approximated, and the green surface is the MLP approximation of the function, with a value that is roughly the average of the true function over that domain (the true average is 2.15, with a square of 4.64 - not far from the loss plateau value).
I feel like I could be missing something very obvious and have just been looking at it for too long. Any help is greatly appreciated! Thanks
I've attached my code here (I'm using JAX):
from typing import Sequence

import jax.numpy as jnp
from jax import grad, jit, vmap, random, value_and_grad
import flax
import flax.linen as nn
import optax

seed = 2
key, data_key = random.split(random.PRNGKey(seed))
x1, x2, y = generate_data(data_key)  # Data generation function

# Using Flax - define an MLP
class MLP(nn.Module):
    features: Sequence[int]

    @nn.compact
    def __call__(self, x):
        for feat in self.features[:-1]:
            x = nn.relu(nn.Dense(feat)(x))
        x = nn.Dense(self.features[-1])(x)
        return x

# Define function that returns JITted loss function
def make_mlp_loss(input_data, true_y):
    def mlp_loss(params):
        pred_y = model.apply(params, input_data)
        loss_vector = jnp.square(true_y.reshape(-1) - pred_y)
        return jnp.average(loss_vector)

    # Outer scope encapsulation saves the data and true output
    return jit(mlp_loss)

# Concatenate independent variable vectors to be proper input shape
input_data = jnp.hstack((x1.reshape(-1, 1), x2.reshape(-1, 1)))

# Create loss function with data and true output
mlp_loss = make_mlp_loss(input_data, y)

# Create function that returns loss and gradient
loss_and_grad = value_and_grad(mlp_loss)

# Example architectures I've tried
architectures = [[16, 16, 1], [8, 16, 1], [16, 8, 1], [8, 16, 8, 1], [32, 32, 1]]

# Only using one seed but iterated over several
for seed in [645]:
    for architecture in architectures:
        # Create model
        model = MLP(architecture)

        # Initialize model with random parameters
        key, params_key = random.split(key)
        dummy = jnp.ones((1000, 2))
        params = model.init(params_key, dummy)

        # Create optimizer
        opt = optax.adam(learning_rate=0.01)  # sgd
        opt_state = opt.init(params)

        epochs = 50
        for i in range(epochs):
            # Get loss and gradient
            curr_loss, curr_grad = loss_and_grad(params)
            if i % 5 == 0:
                print(curr_loss)
            # Update
            updates, opt_state = opt.update(curr_grad, opt_state)
            params = optax.apply_updates(params, updates)

        print(f"Architecture: {architecture}\nLoss: {curr_loss}\nSeed: {seed}\n\n")
I got many links to solve this and read different Stack Overflow answers related to it, but I'm not able to figure it out.
My image size is torch.Size([8, 3, 16, 16]).
My architecture is as below:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # linear layer (16*16 -> 768 hidden nodes)
        self.fc1 = nn.Linear(16 * 16, 768)
        self.fc2 = nn.Linear(768, 64)
        self.fc3 = nn.Linear(64, 10)
        self.dropout = nn.Dropout(p=.5)

    def forward(self, x):
        # flatten image input
        x = x.view(-1, 16 * 16)
        # add hidden layers, with relu activation function and dropout
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = F.log_softmax(self.fc3(x), dim=1)
        return x
# specify loss function
criterion = nn.NLLLoss()

# specify optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=.003)

# number of epochs to train the model
n_epochs = 30  # suggest training between 20-50 epochs

model.train()  # prep model for training

for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0

    ###################
    # train the model #
    ###################
    for data, target in trainloader:
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update running training loss
        train_loss += loss.item() * data.size(0)

    # print training statistics
    # calculate average loss over an epoch
    train_loss = train_loss / len(trainloader.dataset)
    print('Epoch: {} \tTraining Loss: {:.6f}'.format(
        epoch + 1,
        train_loss
    ))
I am getting a ValueError:
ValueError: Expected input batch_size (24) to match target batch_size (8).
How do I fix it? My batch size is 8, my input image size is (16*16), and I have a 10-class classification problem here.
Your input images have 3 channels, therefore your input feature size is 16*16*3, not 16*16. Currently, you treat each channel as a separate instance: after the x.view(-1, 16*16) flattening, the batch becomes (24, 16*16), so the classifier output has a batch size of 24 rather than the expected 8 (8*3 = 24), which is why the batch sizes don't match.
You could either:
Switch to a CNN to handle multi-channel inputs (here 3 channels).
Use a self.fc1 with 16*16*3 input features (a sketch follows below).
If the input is RGB, maybe even convert it to a 1-channel grayscale map.
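For the second option, a minimal sketch of what the adjusted model might look like (keeping the rest of the Net class from the question unchanged; only the flattening and fc1 input size account for the 3 channels):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # account for all 3 channels when flattening: 16 * 16 * 3 input features
        self.fc1 = nn.Linear(16 * 16 * 3, 768)
        self.fc2 = nn.Linear(768, 64)
        self.fc3 = nn.Linear(64, 10)
        self.dropout = nn.Dropout(p=.5)

    def forward(self, x):
        # flatten each image to a single vector: (8, 3, 16, 16) -> (8, 768)
        x = x.view(x.size(0), -1)
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        return F.log_softmax(self.fc3(x), dim=1)

Using x.view(x.size(0), -1) keeps the batch dimension at 8 regardless of the channel count, so the output batch size matches the target batch size.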
My input images have 8 channels, my output (label) has 1 channel, and my CNN in Keras is as below:
import keras
from keras.models import Sequential
from keras.layers import Conv2D

def set_model(ks1=5, ks2=5, nf1=64, nf2=1):
    model = Sequential()
    model.add(Conv2D(nf1, padding="same", kernel_size=(ks1, ks1),
                     activation='relu', input_shape=(62, 62, 8)))
    model.add(Conv2D(nf2, padding="same", kernel_size=(ks2, ks2),
                     activation='relu'))
    model.compile(loss=keras.losses.binary_crossentropy,
                  optimizer=keras.optimizers.Adadelta())
    return model
Below is the summary of the model implemented above:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 62, 62, 64) 12864
_________________________________________________________________
conv2d_2 (Conv2D) (None, 62, 62, 1) 1601
=================================================================
Total params: 14,465
Trainable params: 14,465
Non-trainable params: 0
_________________________________________________________________
And from another question I asked here (How to have a 3D filter for Conv2D in Keras?), I am convinced that every channel has its own set of weights. Now, my question is: when I add another channel, increasing the count from 7 to 8, why do the results deteriorate?
Prediction accuracy index with 7 channels:
Prediction accuracy index with 8 channels:
I also standardized all 8 channels into values between 0 and 1 (a standardization sketch follows after the prediction images below).
one of the predictions with 7 channels:
one of the predictions with 8 channels:
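As referenced above, a minimal sketch of per-channel min-max standardization to [0, 1] (scale_channels_01 is a hypothetical helper, not from the original post; it assumes channels-last arrays like the (62, 62, 8) inputs):

import numpy as np

def scale_channels_01(x):
    """Min-max scale each channel of an (N, H, W, C) array to [0, 1] independently."""
    x = x.astype(np.float32)
    mins = x.min(axis=(0, 1, 2), keepdims=True)
    maxs = x.max(axis=(0, 1, 2), keepdims=True)
    return (x - mins) / (maxs - mins + 1e-8)

# e.g. a batch of 62x62 images with 8 channels
batch = np.random.rand(16, 62, 62, 8) * 100
scaled = scale_channels_01(batch)
print(scaled.min(), scaled.max())  # ~0.0 and ~1.0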
I have a neural network as below:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_11 (Dense) (None, 36) 288
_________________________________________________________________
dense_12 (Dense) (None, 1) 37
=================================================================
Total params: 325
Trainable params: 325
Non-trainable params: 0
_________________________________________________________________
The activation functions for the first and second layers are "relu" and "sigmoid" respectively.
My problem is that the output is a straight line:
I did more investigation and figured out that the weights of this neural net also form a straight line (just the first layer).
x_train shape is (2516, 7), and y_train shape is (280, 7)
One of the features (dimensions) of the input data is shown below, and the others are similar to it:
and the labels are shown below:
I am relatively new to Keras, so I played around with a simple seq2seq architecture.
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Activation, TimeDistributed

model = Sequential()
model.add(Embedding(22, 10, input_length=32, mask_zero=True))
model.add(LSTM(4, return_sequences=True))
model.add(TimeDistributed(Dense(output_dim=3)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
model.fit(x, y, nb_epoch=1, batch_size=1, verbose=2)
Exception: Error when checking model target: expected activation_6 to have shape (None, 32, 3) but got array with shape (2, 4, 3)
x has a shape of (2, 32) and y has a shape of (2, 4, 3).
According to my understanding, this means that y has 2 examples, each a sequence of length 4, one-hot encoded into 3 dimensions. However, when I run model.fit, it seems this is not the shape it is expecting. Why is this so?
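For reference, a minimal sketch (not necessarily the fix the poster wants) illustrating why the model expects targets of shape (batch, 32, 3): with input_length=32, return_sequences=True, and a TimeDistributed Dense, the model emits one 3-way softmax per input timestep, so the target array must also provide one one-hot vector per timestep. The sketch uses Keras 2-style arguments (Dense(3) instead of output_dim=3, epochs instead of nb_epoch) and dummy data:

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Activation, TimeDistributed

model = Sequential()
model.add(Embedding(22, 10, input_length=32, mask_zero=True))
model.add(LSTM(4, return_sequences=True))
model.add(TimeDistributed(Dense(3)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

x = np.random.randint(1, 22, size=(2, 32))  # 2 sequences of 32 token ids
y = np.zeros((2, 32, 3))                    # one 3-class one-hot vector per timestep
y[..., 0] = 1                               # dummy one-hot targets

print(model.output_shape)                   # (None, 32, 3) -- matches y's shape
model.fit(x, y, epochs=1, batch_size=1, verbose=2)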