Predicting self-drawn numbers with Keras - neural-network

I used Keras to train a model on the MNIST dataset.
After training I tested it on the MNIST test data.
Now I have drawn a picture of a ZERO, ONE, TWO and THREE on paper, uploaded them to my Jupyter notebook, and would like to predict the numbers that I drew.
I tried to preprocess those images but I am still getting errors when predicting them.
Here is the code, plus one of the pictures I drew.
img = np.random.rand(224,224,3)
img_path = "0_a.jpg"
img = image.load_img(img_path, target_size=(224, 224))
print(type(img))
x = image.img_to_array(img)
print(type(x))
print(x.shape)
plt.imshow(x/255.)
model.predict(x)
Here is the error I get; I am not sure what to do.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-166-2648d9cfd8aa> in <module>()
----> 1 model.predict(x)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/models.py in predict(self, x, batch_size, verbose, steps)
1025 self.build()
1026 return self.model.predict(x, batch_size=batch_size, verbose=verbose,
-> 1027 steps=steps)
1028
1029 def predict_on_batch(self, x):
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/engine/training.py in predict(self, x, batch_size, verbose, steps)
1780 x = _standardize_input_data(x, self._feed_input_names,
1781 self._feed_input_shapes,
-> 1782 check_batch_axis=False)
1783 if self.stateful:
1784 if x[0].shape[0] > batch_size and x[0].shape[0] % batch_size != 0:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/engine/training.py in _standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
108 ': expected ' + names[i] + ' to have ' +
109 str(len(shape)) + ' dimensions, but got array '
--> 110 'with shape ' + str(data_shape))
111 if not check_batch_axis:
112 data_shape = data_shape[1:]
ValueError: Error when checking : expected dense_13_input to have 2 dimensions, but got array with shape (224, 224, 3)
[1]: https://i.stack.imgur.com/SvBta.jpg

Your images have three channels (RGB), while the MNIST images only have one channel (grayscale). Your images need to be opened in grayscale mode. More info can be found at How can I convert an RGB image into grayscale in Python?, assuming you're using PIL.
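For reference, here is a minimal preprocessing sketch along those lines. It assumes the model was trained on flattened 28x28 MNIST images (the "expected dense_13_input to have 2 dimensions" message suggests a Dense input layer) and that the drawing is dark ink on white paper, which is the inverse of MNIST's white-on-black digits:

import numpy as np
from keras.preprocessing import image

img = image.load_img("0_a.jpg", grayscale=True, target_size=(28, 28))  # 1 channel, MNIST size (newer Keras: color_mode="grayscale")
x = image.img_to_array(img)       # shape (28, 28, 1)
x = 255.0 - x                     # assumption: invert so the digit is bright on a dark background, like MNIST
x = x / 255.0                     # scale to [0, 1], matching the usual MNIST preprocessing
x = x.reshape(1, 28 * 28)         # (batch, 784) for a Dense input layer
pred = model.predict(x)
print(np.argmax(pred, axis=1))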

Related

PyTorch LayerNorm vs nnabla layer_normalization -> output with same normalized input size differ, why?

PyTorch: layernorm_before = nn.LayerNorm(hidden_size) with normalized_shape = 768; the output size is 768. nnabla code: the input is (1, 197, 768); batch_axis=1 refers to the 197 axis and batch_axis=2 refers to the 768 axis.
Scenario 1: layernorm = PF.layer_normalization(x, batch_axis=2) - here the size passed to nnabla is 768, but the output shape is (1, 197, 1).
Scenario 2: Layernorm = PF.layer_normalization(x, batch_axis=1) - here the size passed to nnabla is 197, but the output shape is (1, 1, 768). Scenario 2 matches the PyTorch output, yet scenario 1 is the one I expected to match. Why does it behave this way?
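I am not sure how nnabla maps batch_axis internally, but as a reference point, here is a minimal PyTorch sketch (no nnabla) showing what nn.LayerNorm(768) actually computes on a (1, 197, 768) input: the mean and variance are taken over the last (768) axis only, separately for each of the 197 positions. Whichever nnabla configuration reproduces this is the one equivalent to the PyTorch layer.

import torch
import torch.nn as nn

x = torch.randn(1, 197, 768)
ln = nn.LayerNorm(768)                     # normalized_shape = 768
out = ln(x)

# manual layer norm over the last axis only
mean = x.mean(-1, keepdim=True)            # shape (1, 197, 1)
var = x.var(-1, unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + ln.eps)
print(torch.allclose(out, manual * ln.weight + ln.bias, atol=1e-6))   # True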

Given groups=1, weight of size [10, 1, 5, 5], expected input[2, 3, 28, 28] to have 1 channels, but got 3 channels instead

I am trying to train a CNN on MNIST but test it on my own handwritten digits. To do that I wrote the following code, but I am getting the error shown in the title of this question:
batch_size = 64

train_dataset = datasets.MNIST(root='./data/',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = ImageFolder('my_digit_images/', transform=transforms.ToTensor())

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        #print(self.conv1.weight.shape)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv3 = nn.Conv2d(20, 20, kernel_size=3)
        #print(self.conv2.weight.shape)
        self.mp = nn.MaxPool2d(2)
        self.fc = nn.Linear(320, 10)

    def forward(self, x):
        in_size = x.size(0)
        x = F.relu(self.conv1(x))
        #print(x.shape)
        x = F.relu(self.mp(self.conv2(x)))
        x = F.relu(self.mp(self.conv3(x)))
        #print("2.", x.shape)
        # x = F.relu(self.mp(self.conv3(x)))
        x = x.view(in_size, -1)  # flatten the tensor
        #print("3.", x.shape)
        x = self.fc(x)
        return F.log_softmax(x)

model = Net()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

def test():
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data, target = Variable(data, volatile=True), Variable(target)
        output = model(data)
        test_loss += F.nll_loss(output, target, size_average=False).data
        pred = output.data.max(1, keepdim=True)[1]
        correct += pred.eq(target.data.view_as(pred)).cpu().sum()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
The MNIST dataset contains black-and-white, 1-channel images, while yours are probably 3-channel RGB. Either re-encode your images or preprocess them like
img = img[:, 0:1, :, :]
You can do this with a custom transform, added after transforms.ToTensor().
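A minimal sketch of such a transform, assuming the test images really are RGB; note that after ToTensor() a single image is (C, H, W) with no batch dimension, so the slice only keeps channel 0 (transforms.Grayscale, shown in the next answer, is the more principled conversion):

from torchvision import transforms
from torchvision.datasets import ImageFolder

# keep only the first channel of each (3, H, W) image tensor
to_single_channel = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda img: img[0:1, :, :]),   # (3, H, W) -> (1, H, W)
])

test_dataset = ImageFolder('my_digit_images/', transform=to_single_channel)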
The images used in training and testing should follow the same distribution. Since MNIST data is grayscale by default, and assuming you did not change the channels, the model expects the same number of channels at test time.
The following code is an example of how to do this with a transformation.
Following the order defined below, it:
1. Converts the image to a single channel (grayscale)
2. Resizes the image to the size of the default MNIST data
3. Converts the image to a tensor
4. Normalizes the tensor with the same mean and std as during training (assuming you used the same values).
test_dataset = ImageFolder('my_digit_images/',
                           transform=transforms.Compose([
                               transforms.Grayscale(num_output_channels=1),
                               transforms.Resize((28, 28)),
                               transforms.ToTensor(),
                               transforms.Normalize((0.1307,), (0.3081,))]))
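As a quick sanity check (assuming the folder path above), a sample drawn from this dataset should now have the same shape as an MNIST sample:

img, label = test_dataset[0]
print(img.shape)   # expected: torch.Size([1, 28, 28]), matching MNIST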

Very long training time on a simple CNN

#building a convolutional neural network for images
#hyperparams
learning_rate = 5e-3
in_channel = 3
num_classes = 57
batch_size = 24

class CNNet(nn.Module):
    def __init__(self, num_classes):
        super(CNNet, self).__init__()
        #using 3 in_channels for RGB values
        self.conv1 = nn.Conv2d(in_channels=3,
                               out_channels=15,
                               kernel_size=(3, 3),
                               stride=(1, 1),
                               padding=(1, 1),
                               bias=True)
        #in_channels on the second layer must match out_channels on the first layer
        self.conv2 = nn.Conv2d(in_channels=15,
                               out_channels=30,
                               kernel_size=(3, 3),
                               stride=(1, 1),
                               padding=(1, 1),
                               bias=True)
        #max pooling reduces the image size to 13 x 13
        self.pool = nn.MaxPool2d(2, 2)
        #last layer is fully connected: num out channels * (original image dim / 2)
        self.fc = nn.Linear(30 * 13 * 13 * 3, num_classes)

    def forward(self, x):
        x = f.relu(self.conv1(x))
        x = f.relu(self.conv2(x))
        x = self.pool(x)
        x = x.reshape(x.shape[0], -1)
        x = self.fc(x)
        return x

#instantiate the class object of CNNet
cnn = CNNet(57)
cnn_optimizer = optim.Adam(cnn.parameters(), lr=learning_rate)

#training the CNN
for epoch in range(1):
    for i, (data, target) in enumerate(train_loader):
        cnn_optimizer.zero_grad()
        #forward
        outputs = cnn(data)
        loss = criterion(outputs, target)
        #backward propagation
        cnn_optimizer.zero_grad()
        loss.backward()
        cnn_optimizer.step()
I'm having an issue training a convolutional neural network on custom data.
I have about 10000 color images, all of size 26x26, that I'm trying to use to train the model to predict 57 different classes.
When I run it on the custom images, training is taking hours to complete: currently 12 hours and it hasn't finished.
I'm assuming I have something wrong in the training process that's causing such a long run time, but I'm not sure where.
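As a first diagnostic step, a small timing sketch like the one below (using the loader, model, optimizer and criterion names from the snippet above, which are assumed to be defined) can show whether the time is going into data loading or into the forward/backward pass:

import time

start = time.time()
for i, (data, target) in enumerate(train_loader):
    load_done = time.time()
    cnn_optimizer.zero_grad()
    outputs = cnn(data)
    loss = criterion(outputs, target)
    loss.backward()
    cnn_optimizer.step()
    step_done = time.time()
    if i % 50 == 0:
        print(f'batch {i}: data loading {load_done - start:.3f}s, forward/backward {step_done - load_done:.3f}s')
    start = time.time()
    if i == 200:    # a few hundred batches are enough for a rough estimate
        break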

I want to use NumPy to simulate the inference process of a quantized MobileNet V2 network, but the outcome differs from the PyTorch one

Python version: 3.8
Pytorch version: 1.9.0+cpu
Platform: Anaconda Spyder5.0
To reproduce this problem, just copy all the code below into a single file.
The ILSVRC2012_val_00000293.jpg file used in this code is shown below; you also need to download it and then change its path in the code.
Some background to this problem:
I am working on a project that aims to develop a hardware accelerator to run inference for the MobileNet V2 network. I used a pretrained quantized PyTorch model to simulate the outcome, and the result comes out very well.
In order to implement this in hardware, I need to know every input, output and intermediate variable produced while running this piece of PyTorch code. I used a package named torchextractor to fetch the output of the first layer, which in this case is a 3x3 convolution layer.
import numpy as np
import torchvision
import torch
from torchvision import transforms, datasets
from PIL import Image
from torchvision import transforms
import torchextractor as tx
import math

#########################################################################################
##### Processing of the input image
#########################################################################################
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
test_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize, ])
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
# image file destination (raw string so the backslashes are not treated as escapes)
filename = r"D:\Project_UM\MobileNet_VC709\MobileNet_pytorch\ILSVRC2012_val_00000293.jpg"
input_image = Image.open(filename)
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)

#########################################################################################
# ---- First verify that the torchextractor class does not influence the inference outcome
# ofmp of layer1 before putting it into torchextractor
a, b, c = quantize_tensor(input_batch)  # quantize the input tensor and return an int8 tensor, scale and zero point
input_qa = torch.quantize_per_tensor(torch.tensor(input_batch.clone().detach()), b, c, torch.quint8)  # using torch's quantize_per_tensor method
# Load a quantized mobilenet_v2 model
model_quantized = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=True)
model_quantized.eval()
with torch.no_grad():
    output = model_quantized.features[0][0](input_qa)  # ofmp of layer1, datatype: quantized tensor
# print("FM of layer1 before tx_extractor:\n", output.int_repr())  # ofmp of layer1, datatype: int8 tensor
output1_clone = output.int_repr().detach().numpy()  # clone ofmp of layer1, datatype: ndarray

#########################################################################################
# ofmp of layer1 after adding torchextractor
model_quantized_ex = tx.Extractor(model_quantized, ["features.0.0"])  # capture the module inside the first layer
model_output, features = model_quantized_ex(input_batch)  # forward propagation
# feature_shapes = {name: f.shape for name, f in features.items()}
# print(features['features.0.0'])  # ofmp of layer1, datatype: quantized tensor
out1_clone = features['features.0.0'].int_repr().numpy()  # clone ofmp of layer1, datatype: ndarray
if out1_clone.all() == output1_clone.all():
    print('Model with torchextractor attached outputs the same value as the original model')
else:
    print('The torchextractor method influences the outcome')
Here I define a NumPy quantization scheme based on the scheme proposed in "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference".
# Convert a normal regular tensor to a quantized tensor with scale and zero_point
def quantize_tensor(x, num_bits=8):  # quantize the input tensor and return an int8 tensor, scale and zero point
    qmin = 0.
    qmax = 2.**num_bits - 1.
    min_val, max_val = x.min(), x.max()
    scale = (max_val - min_val) / (qmax - qmin)
    initial_zero_point = qmin - min_val / scale
    zero_point = 0
    if initial_zero_point < qmin:
        zero_point = qmin
    elif initial_zero_point > qmax:
        zero_point = qmax
    else:
        zero_point = initial_zero_point
    # print(zero_point)
    zero_point = int(zero_point)
    q_x = zero_point + x / scale
    q_x.clamp_(qmin, qmax).round_()
    q_x = q_x.round().byte()
    return q_x, scale, zero_point
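For reference, the matching dequantization step under the same scheme is x ≈ scale * (q_x - zero_point); this helper is not part of the original code and its name is illustrative:

def dequantize_tensor(q_x, scale, zero_point):
    # invert the affine mapping q = round(x / scale) + zero_point
    return scale * (q_x.float() - zero_point)

# round-trip example:
# q, s, zp = quantize_tensor(input_batch)
# x_approx = dequantize_tensor(q, s, zp)   # close to input_batch, up to quantization error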
#%%
# #############################################################################################
# --------- Simulate the inference process of layer0: conv33 using numpy
# #############################################################################################
# get the input_batch quantized buffer data
input_scale = b.item()
input_zero = c
input_quantized = a[0].detach().numpy()
# get the layer0 output scale and zero_point
output_scale = model_quantized.features[0][0].state_dict()['scale'].item()
output_zero = model_quantized.features[0][0].state_dict()['zero_point'].item()
# get the quantized weight with scale and zero_point
weight_scale = model_quantized.features[0][0].state_dict()["weight"].q_scale()
weight_zero = model_quantized.features[0][0].state_dict()["weight"].q_zero_point()
weight_quantized = model_quantized.features[0][0].state_dict()["weight"].int_repr().numpy()
# print(weight_quantized)
# print(weight_quantized.shape)
# bias_quantized,bias_scale,bias_zero= quantize_tensor(model_quantized.features[0][0].state_dict()["bias"])# to quantize the input tensor and return an int8 tensor, scale and zero point
# print(bias_quantized.shape)
bias = model_quantized.features[0][0].state_dict()["bias"].detach().numpy()
# print(input_quantized)
print(type(input_scale))
print(type(output_scale))
print(type(weight_scale))
Then I wrote a quantized 2D convolution using NumPy, hoping to figure out every detail of the PyTorch data flow during inference.
#%% numpy simulated layer0 convolution function definition
def conv_cal(input_quantized, weight_quantized, kernel_size, stride, out_i, out_j, out_k):
    weight = weight_quantized[out_i]
    input = np.zeros((input_quantized.shape[0], kernel_size, kernel_size))
    for i in range(weight.shape[0]):
        for j in range(weight.shape[1]):
            for k in range(weight.shape[2]):
                input[i][j][k] = input_quantized[i][stride*out_j+j][stride*out_k+k]
    # print(np.dot(weight, input))
    # print(input, "\n")
    # print(weight)
    return np.multiply(weight, input).sum()

def QuantizedConv2D(input_scale, input_zero, input_quantized, output_scale, output_zero, weight_scale, weight_zero, weight_quantized, bias, kernel_size, stride, padding, ofm_size):
    output = np.zeros((weight_quantized.shape[0], ofm_size, ofm_size))
    input_quantized_padding = np.full((input_quantized.shape[0], input_quantized.shape[1]+2*padding, input_quantized.shape[2]+2*padding), 0)
    zero_temp = np.full(input_quantized.shape, input_zero)
    input_quantized = input_quantized - zero_temp
    for i in range(input_quantized.shape[0]):
        for j in range(padding, padding + input_quantized.shape[1]):
            for k in range(padding, padding + input_quantized.shape[2]):
                input_quantized_padding[i][j][k] = input_quantized[i][j-padding][k-padding]
    zero_temp = np.full(weight_quantized.shape, weight_zero)
    weight_quantized = weight_quantized - zero_temp
    for i in range(output.shape[0]):
        for j in range(output.shape[1]):
            for k in range(output.shape[2]):
                # output[i][j][k] = (weight_scale*input_scale)*conv_cal(input_quantized_padding, weight_quantized, kernel_size, stride, i, j, k) + bias[i]  # floating-point output
                output[i][j][k] = weight_scale*input_scale/output_scale*conv_cal(input_quantized_padding, weight_quantized, kernel_size, stride, i, j, k) + bias[i]/output_scale + output_zero
                output[i][j][k] = round(output[i][j][k])  # int output
    return output
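For reference, the requantization rule that the code above implements (the one from the paper cited earlier) can be written per output element as

q_out = round( (S_in * S_w / S_out) * sum( (q_in - z_in) * (q_w - z_w) ) + bias / S_out + z_out )

where S_in, S_w, S_out are the input, weight and output scales, z_in, z_w, z_out the corresponding zero points, and the sum runs over the kernel's receptive field.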
Here I input the same image, weight, and bias together with their zero_point and scale, then compare this "numpy simulated" result to the PyTorch calculated one.
quantized_model_out1_int8 = np.squeeze(features['features.0.0'].int_repr().numpy())
print(quantized_model_out1_int8.shape)
print(quantized_model_out1_int8)

out1_np = QuantizedConv2D(input_scale, input_zero, input_quantized, output_scale, output_zero, weight_scale, weight_zero, weight_quantized, bias, 3, 2, 1, 112)
np.save("out1_np.npy", out1_np)
for i in range(quantized_model_out1_int8.shape[0]):
    for j in range(quantized_model_out1_int8.shape[1]):
        for k in range(quantized_model_out1_int8.shape[2]):
            if out1_np[i][j][k] < 0:
                out1_np[i][j][k] = 0
print(out1_np)

flag = np.zeros(quantized_model_out1_int8.shape)
for i in range(quantized_model_out1_int8.shape[0]):
    for j in range(quantized_model_out1_int8.shape[1]):
        for k in range(quantized_model_out1_int8.shape[2]):
            if quantized_model_out1_int8[i][j][k] == out1_np[i][j][k]:
                flag[i][j][k] = 1
                out1_np[i][j][k] = 0
                quantized_model_out1_int8[i][j][k] = 0
# Compare the simulated result to the extractor-fetched result, and get the total hit rate
print(flag.sum()/(112*112*32)*100, '%')
If a "numpy simulated" value is the same as the extracted one, I call it a hit. Printing the total hit rate shows that NumPy gets 92% of the values right. Now the problem is, I have no idea why the remaining 8% of values come out wrong.
Comparison of the two outcomes:
The picture below shows the values that differ between the NumPy result and the PyTorch result; the sample channel is index [1]. The upper left corner is the NumPy result and the upper right corner is the PyTorch result; I have set all values that are the same between them to 0. As you can see, most of the values differ only by 1 (this can be viewed as error introduced by the precision loss of fixed-point arithmetic), but some have large differences, e.g. the value [1][4]: 121 vs. 76 (I don't know why).
Focus on one strange value:
This code replays the calculation process of the value [1][4]. Originally I expected that a trial-and-error process could lead me to the wanted number of 76, but no matter what I tried, it did not output 76. If you want to try this, I paste the code here for your convenience.
#%% A test code to check the calculation process
weight_quantized_sample = weight_quantized[2]
M_t = input_scale * weight_scale / output_scale
ifmap_t = np.int32(input_quantized[:, 1:4, 7:10])
weight_t = np.int32(weight_quantized_sample)
bias_t = bias[2]
bias_q = bias_t / output_scale
res_t = 0
for ch in range(3):
    ifmap_offset = ifmap_t[ch] - np.int32(input_zero)
    weight_offset = weight_t[ch] - np.int32(weight_zero)
    res_ch = np.multiply(ifmap_offset, weight_offset)
    res_ch = res_ch.sum()
    res_t = res_t + res_ch
res_mul = M_t * res_t
# for n in range(1, 30):
#     res_mul = multiply(n, M_t, res_t)
res_t = round(res_mul + output_zero + bias_q)
print(res_t)
Could you help me out with this? I have been stuck here for a long time.
I implemented my own version of quantized convolution and got a hit rate between 99.999% and 100% (the single mismatched value differs by 1, which I consider a rounding issue). The link to the paper in the question helped a lot.
But I found that your formulas are the same as mine, so I don't know what your issue was. As I understand it, quantization in PyTorch is hardware dependent.
Here is my code:
def my_Conv2dRelu_b2(input_q, conv_layer, output_shape):
    '''
    Args:
        input_q: quantized tensor
        conv_layer: quantized tensor
        output_shape: the pre-computed shape of the result
    Returns:
    '''
    output = np.zeros(output_shape)
    # extract needed float numbers from quantized operations
    weights_scale = conv_layer.weight().q_per_channel_scales()
    input_scale = input_q.q_scale()
    weights_zp = conv_layer.weight().q_per_channel_zero_points()
    input_zp = input_q.q_zero_point()
    # extract needed convolution parameters
    padding = conv_layer.padding
    stride = conv_layer.stride
    # extract float numbers for results
    output_zp = conv_layer.zero_point
    output_scale = conv_layer.scale
    conv_weights_int = conv_layer.weight().int_repr()
    input_int = input_q.int_repr()
    biases = conv_layer.bias().numpy()
    for k in range(input_q.shape[0]):
        for i in range(conv_weights_int.shape[0]):
            output[k][i] = manual_convolution_quant(
                input_int[k].numpy(),
                conv_weights_int[i].numpy(),
                biases[i],
                padding=padding,
                stride=stride,
                image_zp=input_zp, image_scale=input_scale,
                kernel_zp=weights_zp[i].item(), kernel_scale=weights_scale[i].item(),
                result_zp=output_zp, result_scale=output_scale
            )
    return output

def manual_convolution_quant(image, kernel, b, padding, stride, image_zp, image_scale, kernel_zp, kernel_scale,
                             result_zp, result_scale):
    H = image.shape[1]
    W = image.shape[2]
    new_H = H // stride[0]
    new_W = W // stride[1]
    results = np.zeros([new_H, new_W])
    M = image_scale * kernel_scale / result_scale
    bias = b / result_scale
    paddedIm = np.pad(
        image,
        [(0, 0), (padding[0], padding[0]), (padding[1], padding[1])],
        mode="constant",
        constant_values=image_zp,
    )
    s = kernel.shape[1]
    for i in range(new_H):
        for j in range(new_W):
            patch = paddedIm[
                :, i * stride[0]: i * stride[0] + s, j * stride[1]: j * stride[1] + s
            ]
            res = M * ((kernel - kernel_zp) * (patch - image_zp)).sum() + result_zp + bias
            if res < 0:
                res = 0
            results[i, j] = round(res)
    return results
Code to compare pytorch and my own version.
def calc_hit_rate(array1, array2):
    good = (array1 == array2).astype(np.int).sum()
    all = array1.size
    return good / all

# during inference
y2 = model.conv1(y1)
y2_int = torch.int_repr(y2)
y2_int_manual = my_Conv2dRelu_b2(y1, model.conv1, y2.shape)
print(f'y2 hit rate= {calc_hit_rate(y2.int_repr().numpy(), y2_int_manual)}')  # hit_rate=1.0

Add dense layer before LSTM layer in keras or Tensorflow?

I am trying to implement a denoising autoencoder with an LSTM layer in between. The architecture is as follows:
FC layer -> FC layer -> LSTM cell -> FC layer -> FC layer.
I am unable to understand what my input dimensions should be to implement this architecture. I tried the following code:
batch_size = 1
model = Sequential()
model.add(Dense(5, input_shape=(1,)))
model.add(Dense(10))
model.add(LSTM(32))
model.add(Dropout(0.3))
model.add(Dense(5))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, nb_epoch=100, batch_size=batch_size, verbose=2)
My trainX is a [650, 20, 1] vector. It is time-series data with only one feature.
I am getting the following error:
ValueError Traceback (most recent call last)
<ipython-input-20-1248a33f6518> in <module>()
3 model.add(Dense(5, input_shape=(1,)))
4 model.add(Dense(10))
----> 5 model.add(LSTM(32))
6 model.add(Dropout(0.3))
7 model.add(Dense(5))
/usr/local/lib/python2.7/dist-packages/keras/models.pyc in add(self, layer)
330 output_shapes=[self.outputs[0]._keras_shape])
331 else:
--> 332 output_tensor = layer(self.outputs[0])
333 if isinstance(output_tensor, list):
334 raise TypeError('All layers in a Sequential model '
/usr/local/lib/python2.7/dist-packages/keras/engine/topology.pyc in __call__(self, x, mask)
527 # Raise exceptions in case the input is not compatible
528 # with the input_spec specified in the layer constructor.
--> 529 self.assert_input_compatibility(x)
530
531 # Collect input shapes to build layer.
/usr/local/lib/python2.7/dist-packages/keras/engine/topology.pyc in assert_input_compatibility(self, input)
467 self.name + ': expected ndim=' +
468 str(spec.ndim) + ', found ndim=' +
--> 469 str(K.ndim(x)))
470 if spec.dtype is not None:
471 if K.dtype(x) != spec.dtype:
ValueError: Input 0 is incompatible with layer lstm_10: expected ndim=3, found ndim=2
The Dense layer can take sequences as input, and it will apply the same dense layer to every vector (along the last dimension). Example:
If you have a 2D input tensor that represents a sequence (timesteps, dim_features) and apply a dense layer with new_dim outputs, the tensor you get after the layer will be a new sequence (timesteps, new_dim).
If you have a 3D tensor (n_lines, n_words, embedding_dim), which could be a document with n_lines lines, n_words words per line and embedding_dim dimensions per word, applying a dense layer with new_dim outputs will give you a new document tensor (3D) with shape (n_lines, n_words, new_dim).
You can see here the input and output dimensions that you can feed to and get from the Dense() layer.
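A minimal sketch of that idea applied to the shapes in the question (trainX of shape (650, 20, 1)); the layer widths are only illustrative:

from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout

model = Sequential()
# Dense applied to a 3D input acts on the last dimension only,
# so the sequence structure (20 timesteps) is preserved for the LSTM.
model.add(Dense(5, input_shape=(20, 1)))   # output shape: (batch, 20, 5)
model.add(Dense(10))                       # output shape: (batch, 20, 10)
model.add(LSTM(32))                        # consumes the 3D sequence, outputs (batch, 32)
model.add(Dropout(0.3))
model.add(Dense(5))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()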
Although Nassim Ben has already explained the background, since Google brought me here I would like to mention the tensorflow.keras.layers.Reshape layer.
I originally came at this from a "how to implement dropout" point of view, but ran into the same problem.
Although the different layer classes may already come with their own dropout options embedded, I like to have my own, separate tensorflow.keras.layers.Dropout squeezed in between (it helps me keep track of them).
With
model.add(Reshape((1, Xtrain1.shape[1])))
now squeezed in between layers such as Dropout (or Dense, for that matter) and an LSTM, one has a solution for tying together layers with different requirements in terms of tensor dimensions.
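A minimal sketch of that pattern, with Xtrain1 assumed to be a 2D array of shape (samples, features) as implied above (the data here is just a placeholder):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Reshape, LSTM

Xtrain1 = np.random.rand(100, 16)    # placeholder: 100 samples, 16 features
ytrain1 = np.random.rand(100, 1)

model = Sequential()
model.add(Dense(32, input_shape=(Xtrain1.shape[1],)))   # 2D output: (batch, 32)
model.add(Dropout(0.3))
model.add(Reshape((1, 32)))          # make it 3D: (batch, 1 timestep, 32 features)
model.add(LSTM(16))                  # LSTM requires a 3D input
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
model.fit(Xtrain1, ytrain1, epochs=2, batch_size=16, verbose=0)
model.summary()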