Related
I'm using pytorch and this is the model from huggingface transformers link:
from transformers import BertTokenizerFast, BertForSequenceClassification
bert = BertForSequenceClassification.from_pretrained("bert-base-uncased",
num_labels=int(data['class'].nunique()),
output_attentions=False,
output_hidden_states=False)
and in the forward function I'm building, I'm calling x1, x2 = self.bert(sent_id, attention_mask=mask)
Now, as far as I know, x2 is the cls output(which is the output of the first transformer encoder) but yet again, I don't think I understand the output of the model.
but I want the output of all the 12 last transformer encoders.
How can I do that in pytorch ?
Ideally, if you want to look into the outputs of all the layer, you should use BertModel and not BertForSequenceClassification. Because, BertForSequenceClassification is inherited from BertModel and adds a linear layer on top of the BERT model.
from transformers import BertModel
my_bert_model = BertModel.from_pretrained("bert-base-uncased")
### Add your code to map the model to device, data to device, and obtain input_ids and mask
sequence_output, pooled_output = my_bert_model(ids, attention_mask=mask)
# sequence_output has the following shape: (batch_size, sequence_length, 768), which contains output for all tokens in the last layer of the BERT model.
sequence_output contains output for all tokens in the last layer of the BERT model.
In order to obtain the outputs of all the transformer encoder layers, you can use the following:
my_bert_model = BertModel.from_pretrained("bert-base-uncased")
sequence_output, pooled_output, all_layer_output = model(ids, attention_mask=mask, output_hidden_states=True)
all_layer_output is a output tuple containing the outputs embeddings layer + outputs of all the layer. Each element in the tuple will have a shape (batch_size, sequence_length, 768)
Hence, to get the sequence of outputs at layer-5, you can use all_layer_output[5]. As, all_layer_output[0] contains outputs of the embeddings.
detailed in the doc: https://huggingface.co/transformers/model_doc/bert.html#transformers.BertModel.
from transformers import BertModel, BertConfig
config = BertConfig.from_pretrained("xxx", output_hidden_states=True)
model = BertModel.from_pretrained("xxx", config=config)
outputs = model(inputs)
print(len(outputs)) # 3
hidden_states = outputs[2]
print(len(hidden_states)) # 13
embedding_output = hidden_states[0]
attention_hidden_states = hidden_states[1:]
I have a code that is subclassing nn.module .
i don't know what exactly reset_() function does and i did not find any reset_() function in nn.module source code.
who knows how can i use this for resetting connections in neural networks when there is no operations in that and also there is no function in name reset_() in parent class???
class Connection(torch.nn.module):
super().__init__()
def reset_(self) -> None:
#Contains resetting logic for the connection.#
super().reset_()
Although I am not sure what did you mean by reset() function in a PyTorch Module, however, usually in many NN layers, there is a reset_parameters() function which is used to reset the parameters of that layer. I am giving you an example if it helps.
import torch
import torch.nn as nn
class Connection(nn.Module):
def __init__(self):
super().__init__()
# a weight matrix of shape [10 x 100] as parameters
self.weight = nn.Parameter(torch.Tensor(10, 100))
def reset_parameters(self) -> None:
# reset parameters using random values from a uniform distribution
nn.init.uniform_(self.weight, -0.01, 0.01)
c = Connection()
c.reset_parameters() # reset the weight parameters
This is merely an example, you can modify the reset_parameters function to fulfill your need.
I'm creating an LSTM Encoder-Decoder Network, using Keras, following the code provided here: https://github.com/LukeTonin/keras-seq-2-seq-signal-prediction. The only change I made is to replace the GRUCell with an LSTMCell. Basically both the encoder and decoder consists of 2 layers, of 35 LSTMCells. The layers are stacked over (and combined with) each other using an RNN Layer.
The LSTMCell returns 2 states whereas the GRUCell returns 1 state. This is where I am encountering an error, as I do not know how to code for the 2 returned states of the LSTMCell.
I have created two models: first, an encoder-decoder model. Second, a prediction model. I am not encountering any problems in the encoder-decoder model, but a encountering problems in the decoder of the prediction model.
The error I am getting is:
ValueError: Layer rnn_4 expects 9 inputs, but it received 3 input tensors. Input received: [<tf.Tensor 'input_4:0' shape=(?, ?, 1) dtype=float32>, <tf.Tensor 'input_11:0' shape=(?, 35) dtype=float32>, <tf.Tensor 'input_12:0' shape=(?, 35) dtype=float32>]
This error happens when this line below, in the prediction model, is run:
decoder_outputs_and_states = decoder(
decoder_inputs, initial_state=decoder_states_inputs)
The section of code this fits into is:
encoder_predict_model = keras.models.Model(encoder_inputs,
encoder_states)
decoder_states_inputs = []
# Read layers backwards to fit the format of initial_state
# For some reason, the states of the model are order backwards (state of the first layer at the end of the list)
# If instead of a GRU you were using an LSTM Cell, you would have to append two Input tensors since the LSTM has 2 states.
for hidden_neurons in layers[::-1]:
# One state for GRU, but two states for LSTMCell
decoder_states_inputs.append(keras.layers.Input(shape=(hidden_neurons,)))
decoder_outputs_and_states = decoder(
decoder_inputs, initial_state=decoder_states_inputs)
decoder_outputs = decoder_outputs_and_states[0]
decoder_states = decoder_outputs_and_states[1:]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_predict_model = keras.models.Model(
[decoder_inputs] + decoder_states_inputs,
[decoder_outputs] + decoder_states)
Could somebody help me with the for loop above, and initial states I should be passing the decoder after that?
I had an similar error and i solved just doing what he says, adding another input tensor:
# If instead of a GRU you were using an LSTM Cell, you would have to append two Input tensors since the LSTM has 2 states.
for hidden_neurons in layers[::-1]:
# One state for GRU
decoder_states_inputs.append(keras.layers.Input(shape=(hidden_neurons,)))
decoder_states_inputs.append(keras.layers.Input(shape=(hidden_neurons,)))
here it solved the prolem...
In the user manual, it shows the different kernel_initializer below
https://keras.io/initializers/
the main purpose is to initialize the weight matrix in the neural network.
Anyone knows what the default initializer is? the document didn't show the default.
Usually, it's glorot_uniform by default. Different layer types might have different default kernel_initializer. When in doubt, just look in the source code. For example, for Dense layer:
class Dense(Layer):
...
def __init__(self, units,
activation=None,
use_bias=True,
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
**kwargs):
GlorotUniform, keras uses Glorot initialization with a uniform distribution.r = √(3/fan_avg)
fan_avg = (fan_in + fan_out) /2
number of inputs = fan_in
number of nurons in a layer = fan_out
I am looking for the best approach to train on larger-than-memory-data in Keras and currently noticing that the vanilla ImageDataGenerator tends to be slower than I would hope.
I have two networks training on the Kaggle cat's vs dogs dataset (25000 images):
1) this approach is exactly the code from: http://www.pyimagesearch.com/2016/09/26/a-simple-neural-network-with-python-and-keras/
2) same as (1) but using an ImageDataGenerator instead of loading into memory the data
Note: for below, "preprocessing" means resizing, scaling, flattening
I find the following on my gtx970:
For network 1, it takes ~0s per epoch.
For network 2, it takes ~36s per epoch if the preprocessing is done in the data generator.
For network 2, it takes ~13s per epoch if preprocessing is done in a first-pass outside of the data generator.
Is this likely the speed limit for ImageDataGenerator (13s seems like the usual 10-100x difference between disk and ram...)? Are there approaches/mechanisms better suited for training on larger-than-memory-data when using Keras?
e.g. Perhaps there is way to get the ImageDataGenerator in Keras to save its processed images after the first epoch?
Thanks!
I assume you already might have solved this, but nevertheless...
Keras image preprocessing has the option of saving the results by setting the save_to_dir argument in the flow() or flow_from_directory() function:
https://keras.io/preprocessing/image/
In my understanding, problem is that augmented images are used only once in a training cycle of a model, not even across several epochs. So it's a huge waste of GPU cycles while CPU is struggling.
I found following solution:
I generate as many augmentations in RAM as I can
I use them for training across a frame of epochs, 10 to 30, whatever it takes to get a noticeable convergence
after that I generate new batch of augmented images (by implementing on_epoch_end) and process goes on.
This approach most of the time keeps GPU busy, while being able to benefit from data augmentation. I use custom Sequence subclass to generate augmentation and fix classes imbalance at the same time.
EDIT: adding some code to clarify the idea
from pyutilz.string import read_config_file
from tqdm.notebook import tqdm
from gc import collect
import numpy as np
import tensorflow
import random
import cv2
class StoppingFromFile(tensorflow.keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs=None):
if read_config_file('control.ini','ML','stop',globals()):
if stop is not None:
if stop==True or stop=='True':
logging.warning(f'Model should be stopped according to the control fole')
self.model.stop_training = True
class AugmentedBalancedSequence(tensorflow.keras.utils.Sequence):
def __init__(self, images_and_classes:dict,input_size:tuple,class_sizes:list, augmentations_fn:object, preprocessing_fn:object, batch_size:int=10,
num_class_samples=100, frame_length:int=5, aug_p:float=0.1,aug_pipe_p:float=0.2,is_validation:bool=False,
disk_saving_prob:float=.01,disk_example_nfiles:int=50):
"""
From a dict of file paths grouped by class label, creates each N epochs augmented balanced training set.
If current class is too scarce, ensures that current frame has no duplicate final images.
If it's rich enough, ensures that current frame has no duplicate base images.
"""
logging.info(f'Got {len(images_and_classes)} classes.')
self.disk_example_nfiles=disk_example_nfiles;self.disk_saving_prob=disk_saving_prob;self.cur_example_file=0
self.images_and_classes=images_and_classes
self.num_class_samples=num_class_samples
self.augmentations_fn=augmentations_fn
self.preprocessing_fn=preprocessing_fn
self.is_validation=is_validation
self.frame_length=frame_length
self.batch_size = batch_size
self.class_sizes=class_sizes
self.input_size=input_size
self.aug_pipe_p=aug_pipe_p
self.aug_p=aug_p
self.images=None
self.epoch = 0
#print(f'got frame_length={self.frame_length}')
self._generate_data()
def __len__(self):
return int(np.ceil(len(self.images)/ float(self.batch_size)))
def __getitem__(self, idx):
a=idx * self.batch_size;b=a+self.batch_size
return self.images[a:b],self.labels[a:b]
def on_epoch_end(self):
import ast
self.epoch += 1
mydict={}
import pathlib
fname='control.json'
p = pathlib.Path(fname)
if p.is_file():
try:
with open (fname) as f:
mydict=json.load(f)
for var,val in mydict.items():
if hasattr(self,var):
converted = val #ast.literal_eval(val)
if converted is not None:
if getattr(self, var)!=converted:
setattr(self, var, converted)
print(f'{var} became {val}')
except Exception as e:
logging.error(str(e))
if self.epoch % self.frame_length == 0:
#print('generating data...')
self._generate_data()
def _add_sample(self,image,label):
from random import random
idx=self.indices[self.img_sent]
if self.disk_saving_prob>0:
if random()<self.disk_saving_prob:
self.cur_example_file+=1
if self.cur_example_file>self.disk_example_nfiles:
self.cur_example_file=1
Path(r'example_images/').mkdir(parents=True, exist_ok=True)
cv2.imwrite(f'example_images/test{self.cur_example_file}.jpg',cv2.cvtColor(image,cv2.COLOR_RGB2BGR))
if self.preprocessing_fn:
self.images[idx]=self.preprocessing_fn(image)
else:
self.images[idx]=image
self.labels[idx]=label
self.img_sent+=1
def _generate_data(self):
logging.info('Generating new set of augmented data...')
collect()
#del self.images
#del self.labels
#collect()
if self.num_class_samples:
expected_length=len(self.images_and_classes)*self.num_class_samples
else:
expected_length=sum(self.class_sizes.values())
if self.images is None:
self.images=np.empty((expected_length,)+(self.input_size[1],)+(self.input_size[0],)+(3,))
self.labels=np.empty((expected_length),np.int32)
self.indices=np.random.choice(expected_length, expected_length, replace=False)
self.img_sent=0
collect()
relaxed_augmentation_pipeline=self.augmentations_fn(p=self.aug_p,pipe_p=self.aug_pipe_p)
maxed_out_augmentation_pipeline=self.augmentations_fn(p=self.aug_p,pipe_p=1.0)
#for each class
x,y=[],[]
nartificial=0
for label,images in tqdm(self.images_and_classes.items()):
if self.num_class_samples is None:
#Just all native samples without augmentations
for image in images:
self._add_sample(image,label)
else:
#if there are enough native samples
if len(images)>=self.num_class_samples:
#randomly select samples of this class which will participate in this frame of epochs
indices=np.random.choice(len(images), self.num_class_samples, replace=False)
#apply albumentations pipeline to selected samples
for idx in indices:
if not self.is_validation:
self._add_sample(relaxed_augmentation_pipeline(image=images[idx])['image'],label)
else:
self._add_sample(images[idx],label)
else:
#------------------------------------------------------------------------------------------------------------------------------------------------------------------
# Randomly pick next image from existing. try applying augmentation pipeline (with maxed out probability) till we get num_class_samples DIFFERENT images
#------------------------------------------------------------------------------------------------------------------------------------------------------------------
hashes=set()
norig=0
while len(hashes)<self.num_class_samples:
if self.is_validation and norig<len(images):
#just include all originals first
image=images[norig]
else:
image=maxed_out_augmentation_pipeline(image=random.choice(images))['image']
next_hash=np.sum(image)
if next_hash not in hashes or (self.is_validation and norig<=len(images)):
#print(f'Adding orig {norig} out of {self.num_class_samples}, hashes={hashes}')
self._add_sample(image,label)
if next_hash in hashes:
norig+=1
hashes.add(norig)
else:
hashes.add(next_hash)
nartificial+=1
#self.images=self.images[indices];self.labels=self.labels[indices]
logging.info(f'Generated {self.img_sent} samples ({nartificial} artificial)')
once I have images and classes loaded,
train_datagen = AugmentedBalancedSequence(images_and_classes=images_and_classes_train,
input_size=INPUT_SIZE,class_sizes=class_sizes_train,num_class_samples=UPSCALE_SAMPLES,
augmentations_fn=get_albumentations_pipeline,aug_p=AUG_P,aug_pipe_p=AUG_PIPE_P,preprocessing_fn=preprocess_input, batch_size=BATCH_SIZE,frame_length=FRAME_LENGTH,disk_saving_prob=0.05)
val_datagen = AugmentedBalancedSequence(images_and_classes=images_and_classes_val,
input_size=INPUT_SIZE,class_sizes=class_sizes_val,num_class_samples=None,
augmentations_fn=get_albumentations_pipeline,preprocessing_fn=preprocess_input, batch_size=BATCH_SIZE,frame_length=FRAME_LENGTH,is_validation=True)
and after the model is instantiated, I do
model.fit(train_datagen,epochs=600,verbose=1,
validation_data=(val_datagen.images,val_datagen.labels),validation_batch_size=BATCH_SIZE,
callbacks=[checkpointer,StoppingFromFile()],validation_freq=1)