How to initialize the weights of a `torch.nn.Transformer` module? - neural-network

I am using a vanilla transformer architecture from the "Attention Is All You Need" paper for a sequence-to-sequence task, as shown in the following code.
Assuming that I would like to use the torch.nn.init.kaiming_uniform_ initialization method, how would one go about initializing the weights of the nn.Transformer?
Is it necessary to use custom encoder and decoder classes for that to happen?
import math

import torch
import torch.nn as nn
from torch import Tensor


# Helper module that adds positional encoding to the token embedding
# to introduce a notion of word order.
class PositionalEncoding(nn.Module):
    def __init__(self,
                 emb_size: int,
                 dropout: float,
                 maxlen: int = 20):
        super(PositionalEncoding, self).__init__()
        den = torch.exp(-torch.arange(0, emb_size, 2) * math.log(10000) / emb_size)
        pos = torch.arange(0, maxlen).reshape(maxlen, 1)
        pos_embedding = torch.zeros((maxlen, emb_size))
        pos_embedding[:, 0::2] = torch.sin(pos * den)
        pos_embedding[:, 1::2] = torch.cos(pos * den)
        pos_embedding = pos_embedding.unsqueeze(-2)

        self.dropout = nn.Dropout(dropout)
        self.register_buffer('pos_embedding', pos_embedding)

    def forward(self, token_embedding: Tensor):
        return self.dropout(token_embedding + self.pos_embedding[:token_embedding.size(0), :])


# Helper module to convert a tensor of input indices into the
# corresponding tensor of token embeddings.
class TokenEmbedding(nn.Module):
    def __init__(self, vocab_size: int, emb_size: int):
        super(TokenEmbedding, self).__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        # Initialize weights with He initialization (in place).
        nn.init.kaiming_uniform_(self.embedding.weight)
        self.emb_size = emb_size

    def forward(self, tokens: Tensor):
        return self.embedding(tokens.long()) * math.sqrt(self.emb_size)


# Seq2Seq network
class Transformer(nn.Module):
    def __init__(self,
                 src_vocab_size: int,
                 tgt_vocab_size: int,
                 num_encoder_layers: int = 1,
                 num_decoder_layers: int = 1,
                 emb_size: int = 300,
                 nhead: int = 3,
                 dim_feedforward: int = 512,
                 dropout: float = 0.1,
                 activation_function: str = 'relu'):
        super().__init__()
        self.src_tok_emb = TokenEmbedding(src_vocab_size, emb_size)
        self.tgt_tok_emb = TokenEmbedding(tgt_vocab_size, emb_size)
        self.positional_encoding = PositionalEncoding(emb_size, dropout=dropout)
        self.transformer = torch.nn.Transformer(d_model=emb_size,
                                                nhead=nhead,
                                                num_encoder_layers=num_encoder_layers,
                                                num_decoder_layers=num_decoder_layers,
                                                dim_feedforward=dim_feedforward,
                                                dropout=dropout,
                                                batch_first=True,
                                                activation=activation_function)
        self.generator = nn.Linear(emb_size, tgt_vocab_size)

    def init_weights(self):
        nn.init.kaiming_uniform_(self.generator.weight)

    def forward(self,
                src: Tensor,
                trg: Tensor,
                src_mask: Tensor,
                tgt_mask: Tensor,
                src_padding_mask: Tensor,
                tgt_padding_mask: Tensor,
                memory_key_padding_mask: Tensor):
        # The .permute() calls are necessary since the positional encoder
        # expects tensors of shape (seq_len, batch_size, emb_size).
        src_emb = self.positional_encoding(self.src_tok_emb(src).permute(1, 0, 2)).permute(1, 0, 2)
        tgt_emb = self.positional_encoding(self.tgt_tok_emb(trg).permute(1, 0, 2)).permute(1, 0, 2)
        outs = self.transformer(src_emb, tgt_emb, src_mask, tgt_mask, None,
                                src_padding_mask, tgt_padding_mask, memory_key_padding_mask)
        return self.generator(outs)

    def encode(self, src: Tensor, src_mask: Tensor):
        return self.transformer.encoder(self.positional_encoding(
            self.src_tok_emb(src)), src_mask)

    def decode(self, tgt: Tensor, memory: Tensor, tgt_mask: Tensor):
        return self.transformer.decoder(self.positional_encoding(
            self.tgt_tok_emb(tgt)), memory,
            tgt_mask)
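
One hedged approach, sketched below rather than taken from the original post: torch.nn.Transformer exposes all of its parameters, so they can be re-initialized after construction without custom encoder or decoder classes. kaiming_uniform_ applies only to tensors with at least two dimensions, so 1-D parameters such as biases and LayerNorm weights are skipped here; the vocabulary sizes are assumed purely for illustration.

    # Minimal sketch using the Transformer class defined above;
    # vocabulary sizes are assumed for illustration.
    model = Transformer(src_vocab_size=1000, tgt_vocab_size=1000)
    for p in model.transformer.parameters():
        if p.dim() > 1:  # skip biases and other 1-D parameters
            nn.init.kaiming_uniform_(p)

This mirrors the loop nn.Transformer itself runs internally with xavier_uniform_ when it resets its parameters.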

Related

I use LSTM with attention but the model does not learn. How can I improve the model?

def __init__(self):
    super().__init__()
    self.lstm = nn.LSTM(input_dim,
                        hidden_dim,
                        num_layers=num_layers,
                        bidirectional=bidirectional,
                        dropout=dropout,
                        batch_first=True)
    self.fc = nn.Linear(hidden_dim * 2, num_classes)

def attention_net(self, lstm_output, final_state):
    hidden = final_state.unsqueeze(2)
    attn_weights = torch.bmm(lstm_output, hidden).squeeze(2)
    soft_attn_weights = F.softmax(attn_weights, 1)
    context = torch.bmm(lstm_output.transpose(1, 2),
                        soft_attn_weights.unsqueeze(2)).squeeze(2)
    return context, soft_attn_weights.cpu().data.numpy()

def forward(self, text):
    output, (hn, cn) = self.lstm(text)
    hn = torch.cat((hn[-2,:,:], hn[-1,:,:]), dim=1)
    attn_output, attention = self.attention_net(output, hn)
    return self.fc(attn_output), attention
I use LSTM + attention. The model has 3 classes but does not learn; it gives me only one class all the time.
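
For reference, a minimal shape walkthrough of the attention step above; all dimensions are assumed purely for illustration:

    import torch

    batch, seq_len, hidden_dim = 4, 10, 64
    lstm_output = torch.randn(batch, seq_len, 2 * hidden_dim)  # bidirectional LSTM outputs
    final_state = torch.randn(batch, 2 * hidden_dim)           # concatenated last hidden states
    hidden = final_state.unsqueeze(2)                          # (batch, 2*hidden_dim, 1)
    attn_weights = torch.bmm(lstm_output, hidden).squeeze(2)   # (batch, seq_len)
    soft_attn_weights = torch.softmax(attn_weights, 1)         # distribution over time steps
    context = torch.bmm(lstm_output.transpose(1, 2),
                        soft_attn_weights.unsqueeze(2)).squeeze(2)
    print(context.shape)                                       # torch.Size([4, 128])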

Scala Spark: How do I bootstrap sample from a column of a Spark Dataframe?

I am looking to sample values, with replacement, from a column of a Spark DataFrame, using the Scala programming language in a Jupyter Notebook setting in a cluster environment. How do I do this?
I tried the following function that I found online:
import scala.util

def bootstrapMean(originalData: Array[Double]): Double = {
  val n = originalData.length
  def draw: Double = originalData(util.Random.nextInt(n))
  // a tail-recursive loop to randomly draw and add a value to the accumulating sum
  def drawAndSumValues(current: Int, acc: Double = 0D): Double = {
    if (current == 0) acc
    else drawAndSumValues(current - 1, acc + draw)
  }
  drawAndSumValues(n) / n
}
Like so:
val data = stack.select("column_with_values").collect.map(_.toSeq).flatten
val m = 10
val bootstraps = Vector.fill(m)(bootstrapMean(data))
But I get the error:
An error was encountered:
<console>:47: error: type mismatch;
found : Array[Any]
required: Array[Double]
Note: Any >: Double, but class Array is invariant in type T.
You may wish to investigate a wildcard type such as `_ >: Double`. (SLS 3.2.10)
val bootstraps = Vector.fill(m)(bootstrapMean(data))
I'm not sure how to debug this, or whether I should bother to, or instead try another approach. I'm looking for ideas/documentation/code. Thanks.
Update:
How do I put user mck's solution below in a for loop? I tried the following:
var bootstrap_container = Seq()
var a = 1
for (a <- 1 until 3) {
  var sampled = stack_b.select("diff_hours").sample(withReplacement = true, fraction = 0.5, seed = a)
  var smpl_average = sampled.select(avg("diff_hours")).collect()(0)(0)
  var bootstrap_smpls = bootstrap_container.union(Seq(smpl_average)).collect()
}
bootstrap_smpls
but that gives an error:
<console>:49: error: not enough arguments for method collect: (pf: PartialFunction[Any,B])(implicit bf: scala.collection.generic.CanBuildFrom[Seq[Any],B,That])That.
Unspecified value parameter pf.
var bootstrap_smpls = bootstrap_container.union(Seq(smpl_average)).collect()
You can use the sample method of DataFrames; for example, if you want to sample with replacement and with a fraction of 0.5:
val sampled = stack.select("column_with_values").sample(true, 0.5)
To get the mean, you can do:
val col_average = sampled.select(avg("column_with_values")).collect()(0)(0)
EDIT:
var bootstrap_container = List[Double]()
var a = 1
for (a <- 1 until 3) {
  var sampled = stack_b2.select("diff_hours").sample(withReplacement = true, fraction = 0.5, seed = a)
  var smpl_average = sampled.select(avg("diff_hours")).collect()(0)(0)
  bootstrap_container = bootstrap_container :+ smpl_average.asInstanceOf[Double]
}
var mean_bootstrap = bootstrap_container.reduce(_ + _) / bootstrap_container.length

Custom activation function in PyTorch - prediction stays fixed

I read this post about custom activation functions, but I still can't implement my code. My activation function can be expressed as a combination of existing PyTorch functions, and it works fine as function_pytorch(prediction, Q_sample). [Q_samples is some variable I need; it doesn't need a gradient.]
My activation function should receive the output of the NN, apply function_pytorch, and pass its output to the loss function. So:
class Activation_fun(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, input, Q_samples):
        return function_pytorch(input, Q_samples)
In my NN I have:
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(NeuralNet, self).__init__()
        self.BN0 = nn.BatchNorm1d(input_size)
        self.l1 = nn.Linear(input_size, hidden_size)
        self.tan = nn.Tanh()
        self.BN = nn.BatchNorm1d(output_size)  # assumes hidden_size == output_size
        # custom activation
        self.l2 = Activation_fun()

    def forward(self, x, q):
        out = self.BN0(x)
        out = self.l1(out)
        out = self.tan(out)
        out = self.BN(out)
        out = self.l2(out, q)
        return out

model = NeuralNet(input_size, hidden_size, output_size)
and in my training epochs:
outputs = model(inputs, q_samples)
The problem is that my predictions remain fixed when I apply my customized activation function.
Is there any problem in my implementation?
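
One way to narrow this down, as a self-contained sketch rather than a fix: if the predictions never change, a common cause is that gradients do not flow through the custom activation. The toy function_pytorch below is an assumed stand-in for the asker's function; any combination of existing PyTorch ops preserves gradients as long as the prediction input is not detached.

    import torch

    # Assumed toy stand-in for the asker's function_pytorch.
    def function_pytorch(prediction, Q_samples):
        return torch.sigmoid(prediction) * Q_samples

    x = torch.randn(5, 3, requires_grad=True)  # stands in for the NN output
    Q = torch.rand(5, 3)                       # Q_samples: no gradient needed
    function_pytorch(x, Q).sum().backward()
    print(x.grad is not None)                  # True: gradients reach the input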

Use generator with ruamel.yaml

I would like to have a bunch of generators in my config dict. So I tried this:
import numpy as np
import ruamel.yaml

yaml = ruamel.yaml.YAML()

@yaml.register_class
class UniformDistribution:
    yaml_tag = '!uniform'

    @classmethod
    def from_yaml(cls, a, node):
        for x in node.value:
            if x[0].value == 'min':
                min_ = float(x[1].value)
            if x[0].value == 'max':
                max_ = float(x[1].value)

        def f():
            while True:
                yield np.random.uniform(min_, max_)

        g = f()
        return g
However, the parser never returns, because generators are used internally to resolve references like &A and *A. Something like returning (g,) is a fairly simple workaround, but I would prefer a solution where I don't need the additional and very confusing index-0 term in next(config['position_generator'][0]).
Any ideas?
This wrapper, adapted from a different question, did exactly what I was looking for.
from collections.abc import Generator

class GeneratorWrapper(Generator):
    def __init__(self, function, *args):
        self.function = function
        self.args = args

    def send(self, ignored_arg):
        return self.function(*self.args)

    def throw(self, typ=None, val=None, tb=None):
        raise StopIteration

@yaml.register_class
class UniformDistribution:
    yaml_tag = '!uniform'

    @classmethod
    def from_yaml(cls, constructor, node):
        for x in node.value:
            value = float(x[1].value)
            if x[0].value == 'min':
                min_ = value
            if x[0].value == 'max':
                max_ = value
        return GeneratorWrapper(np.random.uniform, min_, max_)
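
A minimal usage sketch, assuming the two classes above are defined and that yaml is a ruamel.yaml.YAML() instance; the key position_generator is taken from the question:

    config = yaml.load("""
    position_generator: !uniform
      min: 0.0
      max: 1.0
    """)

    print(next(config['position_generator']))  # a fresh uniform sample
    print(next(config['position_generator']))  # another one, no [0] indexing needed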

Bitstream Library in Scala

I need to compress/decompress some data with an old, in-house developed algorithm.
There I have a lot of operations like:
if the next bit is 0, take the following 6 bits and interpret them as an Int
if the next bits are 10, take the following 9 bits and interpret them as an Int
etc.
Does somebody know of something like a "Bitstream" class in Scala? (I didn't find anything and hope that I don't have to implement it myself.)
Thanks
Edit:
I combined the answer with http://www.scala-lang.org/node/8413 ("The Architecture of Scala Collections"). If somebody needs the same thing:
abstract class Bit
object Bit {
  val fromInt: Int => Bit = Array(Low, High)
  val toInt: Bit => Int = Map(Low -> 0, High -> 1)
}
case object High extends Bit
case object Low extends Bit

import collection.IndexedSeqLike
import collection.mutable.{Builder, ArrayBuffer}
import collection.generic.CanBuildFrom
import collection.IndexedSeq

// IndexedSeqLike implements all concrete methods of IndexedSeq
// with newBuilder (methods like take, filter, drop).
final class BitSeq private (val bits: Array[Int], val length: Int)
    extends IndexedSeq[Bit]
    with IndexedSeqLike[Bit, BitSeq] {

  import BitSeq._

  // Mandatory for IndexedSeqLike
  override protected[this] def newBuilder: Builder[Bit, BitSeq] =
    BitSeq.newBuilder

  // Mandatory for IndexedSeq
  def apply(idx: Int): Bit = {
    if (idx < 0 || length <= idx)
      throw new IndexOutOfBoundsException
    Bit.fromInt(bits(idx / N) >> (idx % N) & M)
  }
}

object BitSeq {
  // Bits per Int
  private val N = 32
  // Bitmask to isolate a bit
  private val M = 0x01

  def fromSeq(buf: Seq[Bit]): BitSeq = {
    val bits = new Array[Int]((buf.length + N - 1) / N)
    for (i <- 0 until buf.length) {
      bits(i / N) |= Bit.toInt(buf(i)) << (i % N)
    }
    new BitSeq(bits, buf.length)
  }

  def apply(bits: Bit*) = fromSeq(bits)

  def newBuilder: Builder[Bit, BitSeq] = new ArrayBuffer mapResult fromSeq

  // Needed for map etc. (mapping a BitSeq with a Bit => Bit function should return a BitSeq)
  implicit def canBuilderFrom: CanBuildFrom[BitSeq, Bit, BitSeq] =
    new CanBuildFrom[BitSeq, Bit, BitSeq] {
      def apply(): Builder[Bit, BitSeq] = newBuilder
      def apply(from: BitSeq): Builder[Bit, BitSeq] = newBuilder
    }
}
There isn't any existing class that I'm aware of, but you can leverage the existing collection classes to help out with almost all of the difficult operations. The trick is to turn your data into a stream of Ints (or Bytes if there wouldn't be enough memory). You can then use all the handy collections methods (e.g. take) and are only left with the problem of turning bits into memory. But that's easy if you pack the bits in MSB order.
object BitExample {
  def bitInt(ii: Iterator[Int]): Int = (0 /: ii)((i, b) => (i << 1) | b)
  def bitInt(ii: Iterable[Int]): Int = bitInt(ii.iterator)

  class ArrayBits(bytes: Array[Byte]) extends Iterator[Int] {
    private[this] var buffer = 0
    private[this] var index, shift = -1

    def hasNext = (shift > 0) || (index + 1 < bytes.length)

    def next = {
      if (shift <= 0) {
        index += 1
        buffer = bytes(index) & 0xFF
        shift = 7
      }
      else shift -= 1
      (buffer >> shift) & 0x1
    }
  }
}
And then you do things like:
import BitExample._

val compressed = new ArrayBits(Array[Byte](14, 29, 126)).toStream
val headless = compressed.dropWhile(_ == 0)
val (test, rest) = headless.splitAt(3)
if (bitInt(test) > 4) println(bitInt(rest.take(6)))
(You can decide whether you want to use the iterator directly or as a stream, list, or whatever.)