I am looking through the Caffe prototxt for deep residual networks and have noticed the appearance of a "Scale" layer.
layer {
  bottom: "res2b_branch2b"
  top: "res2b_branch2b"
  name: "scale2b_branch2b"
  type: "Scale"
  scale_param {
    bias_term: true
  }
}
However, this layer is not available in the Caffe layer catalogue. Can someone explain the functionality of this layer and the meaning of its parameters, or point to up-to-date documentation for Caffe?
You can find detailed documentation on Caffe here.
Specifically, for "Scale" layer the doc reads:
Computes a product of two input Blobs, with the shape of the latter Blob "broadcast" to match the shape of the former. Equivalent to tiling the latter Blob, then computing the elementwise product.
The second input may be omitted, in which case it's learned as a parameter of the layer.
It seems that, in your case (a single "bottom"), this layer learns a scale factor by which to multiply "res2b_branch2b". Moreover, scale_param { bias_term: true } means the layer learns not only a multiplicative scaling factor but also a constant term. So the forward pass computes:
res2b_branch2b <- res2b_branch2b * \alpha + \beta
During training the net tries to learn the values of \alpha and \beta.
There's also some documentation on it in the caffe.proto file; you can search for 'ScaleParameter'.
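To make the broadcasting concrete, here is a minimal numpy sketch (not Caffe code) of what this forward pass computes; with the layer's defaults (axis: 1, num_axes: 1), \alpha and \beta have one entry per channel, and the shapes below are hypothetical:

import numpy as np

# Hypothetical blob of shape (N, C, H, W)
N, C, H, W = 2, 64, 28, 28
x = np.random.randn(N, C, H, W).astype(np.float32)
alpha = np.random.randn(C).astype(np.float32)  # learned scale (one per channel)
beta = np.random.randn(C).astype(np.float32)   # learned bias (bias_term: true)

# Broadcast alpha/beta over N, H and W -- equivalent to tiling them to the
# full blob shape and then computing the elementwise product and sum.
y = x * alpha[None, :, None, None] + beta[None, :, None, None]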
Thanks a heap for your post :) The Scale layer was exactly what I was looking for. In case anyone wants an example of a layer that scales by a scalar (0.5) and then adds -2 (and those values shouldn't change):
layer {
  name: "scaleAndAdd"
  type: "Scale"
  bottom: "bot"
  top: "scaled"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  scale_param {
    filler {
      value: 0.5
    }
    bias_term: true
    bias_filler {
      value: -2
    }
  }
}
(The decay_mult settings are probably unnecessary here, since lr_mult: 0 already switches learning off for each blob, but they do no harm.)
Other than that:
lr_mult: 0 switches off learning for that parameter blob. The first "param {" block refers to the layer's first internal blob (here, the scale weights), the second to the bias (lr_mult is not ScaleLayer-specific).
filler: a "FillerParameter" [see caffe.proto] telling how to fill the omitted second blob. The default is one constant "value: ...".
bias_filler: a parameter telling how to fill the optional bias blob.
bias_term: whether there is a bias blob.
All taken from caffe.proto. And: I only tested the layer above with both filler values = 1.2.
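As a sanity check, the layer above should compute y = 0.5 * x - 2 elementwise. A hypothetical pycaffe snippet to verify this, assuming the layer sits in a file named net.prototxt with an input blob "bot" (both names illustrative):

import numpy as np
import caffe

net = caffe.Net('net.prototxt', caffe.TEST)
x = np.random.randn(*net.blobs['bot'].data.shape).astype(np.float32)
net.blobs['bot'].data[...] = x
net.forward()
# The output blob "scaled" should hold 0.5 * x - 2.
assert np.allclose(net.blobs['scaled'].data, 0.5 * x - 2, atol=1e-5)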
I want an array of 1-dimensional data (essentially an array of arrays) that is created at run time, so its size is not known at compile time. I want to easily send that array of data to a kernel shader using setTextures, so the kernel can accept a single argument like texture1d_array, which binds each element texture to a different kernel index automatically, regardless of how many there are.
The question is how to actually create a 1D MTLTexture. All the MTLTextureDescriptor options seem to focus on 2D or 3D. Is it as simple as creating a 2D texture with a height of 1? Would that then be a 1D texture that the kernel would accept?
I.e.
let textureDescriptor = MTLTextureDescriptor
    .texture2DDescriptor(pixelFormat: .r16Uint,
                         width: length,
                         height: 1,
                         mipmapped: false)
If my data is actually just one-dimensional (not actually pixel data), is there an equally convenient way to use an ordinary buffer instead of MTLTexture, with the same flexibility of sending an arbitrarily-sized array of these buffers to the kernel as a single kernel argument?
You can recreate all of those factory methods yourself. I would just use an initializer though unless you run into collisions.
public extension MTLTextureDescriptor {
    /// A 1D texture descriptor.
    convenience init(
        pixelFormat: MTLPixelFormat = .r16Uint,
        width: Int
    ) {
        self.init()
        textureType = .type1D
        self.pixelFormat = pixelFormat
        self.width = width
    }
}
MTLTextureDescriptor(width: 512)
This is no different than texture2DDescriptor with a height of 1, but it's not as much of a lie. (I.e. yes, a texture can be thought of as infinite-dimensional, with a magnitude of 1 in everything but the few that matter. But we throw out all the dimensions with 1s when saying "how dimensional" something is.)
let descriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .r16Uint,
    width: 512,
    height: 1,
    mipmapped: false
)
descriptor.textureType = .type1D
descriptor == MTLTextureDescriptor(width: 512) // true
——
textureType = .typeTextureBuffer takes care of your second question, but why not use textureBufferDescriptor then?
For a better understanding of this question, let me show a working example from a TradingView webpage, in which the chart combines Momentum (area chart) and ADX (line chart):
As you can see, there are two vertical axes: the left one is for the ADX, whose scale usually goes from 0 to about 60, while the right one can grow much larger.
In my attempt to achieve the same here, the momentum (area chart) has a huge range while the ADX (line chart) goes from 0 to 60, so when both series are fitted under the same scale, my blue line looks almost like zero compared with the area chart values (mousing over the blue line shows it is currently at 43).
So I think you get the point: would it be possible to have two scales/vAxes, one for each series of values?
I already checked the documentation, but nothing there seems to refer to what I mention:
https://developers.google.com/chart/interactive/docs/gallery/combochart
And just in case you need them, these are the options provided to the chart; nothing advanced, just the basic example:
options: {
  seriesType: 'area',
  series: {
    0: { type: 'line', color: 'blue' }
  }
};
Use the targetAxisIndex option in the series option. Zero is the default v-axis:
targetAxisIndex: 0
To create a second v-axis, assign one of the series to index 1:
series: {
  1: {
    targetAxisIndex: 1
  }
}
Found the solution in this part of the documentation; the concept is known as Dual-Y charts:
https://developers.google.com/chart/interactive/docs/gallery/columnchart#dual-y-charts
I am converting two trained Keras models to Metal Performance Shaders. I have to reshape the output of the first graph and use it as input to the second graph. The first graph's output is an MPSImage with "shape" (1,1,8192), and the second graph's input is an MPSImage of "shape" (4,4,512).
I cast graph1's output image.texture as a float16 array, and pass it to the following function to copy the data into "midImage", a 4x4x512 MPSImage:
func reshapeTexture(imageArray: [Float16]) -> MPSImage {
    let image = imageArray
    image.withUnsafeBufferPointer { ptr in
        let width = midImage.texture.width
        let height = midImage.texture.height
        for slice in 0..<128 {
            for w in 0..<width {
                for h in 0..<height {
                    let region = MTLRegion(origin: MTLOriginMake(w, h, 0),
                                           size: MTLSizeMake(1, 1, 1))
                    midImage.texture.replace(region: region,
                                             mipmapLevel: 0,
                                             slice: slice,
                                             withBytes: ptr.baseAddress!.advanced(by: (slice * 4 * width * height) + ((w + h) * 4)),
                                             bytesPerRow: MemoryLayout<Float16>.stride * 4,
                                             bytesPerImage: 0)
                }
            }
        }
    }
    return midImage
}
When I pass midImage to graph2, the output of the graph is a square with 3/4 garbled noise, 1/4 black in the bottom right corner. I think I am not understanding something about the MPSImage slice property for storing extra channels. Thanks!
Metal 2D texture arrays are nearly always stored in a Morton or "Z" ordering of some kind. Certainly MPS always allocates them that way, though I suppose on macOS there may be a means to make a linear 2D texture array and wrap an MPSImage around it. So, without undue care, direct access of a 2D texture array backing store is going to result in sadness and confusion.
The right way to do this is to write a simple Metal copy kernel. This gives you storage order independence and you don’t have to wait for the command buffer to complete before you can do the operation.
A feature request in Radar might also be warranted. Please also look in the latest macOS / iOS seed to see if Apple recently added a reshape filter for you.
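For reference, and setting aside Metal's internal storage order (which is the whole point of the answer above), the logical mapping such a copy kernel would implement is just a row-major reshape, assuming the Keras graph flattened a channels-last (4, 4, 512) tensor into the 8192 values; a numpy sketch:

import numpy as np

flat = np.arange(8192, dtype=np.float16)  # stand-in for graph1's (1, 1, 8192) output
grid = flat.reshape(4, 4, 512)            # (height, width, channels), C order
# grid[h, w, c] == flat[(h * 4 + w) * 512 + c]
# A Metal copy kernel would implement this index math while letting the
# runtime handle how (h, w) and the 4-channel slices are actually stored.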
I have written a custom layer and would like to output both the accuracy and the loss at the same time. Is this possible to accomplish in Caffe in the following manner?
Something similar to:
layer {
  name: ""
  bottom: ""
  top: "loss1"
  top: "loss2"
  top: "accuracy"
}
You can have as many "top"s as you want for your layer.
First, you need to define the number of "top"s your layer computes. This is done by overriding ExactNumTopBlobs().
Your LayerSetUp and Reshape methods should also take into account the new number of "top"s, and set up and reshape these "top"s as well.
Note that since your layer is a loss layer, you'll have to provide a loss_weight value for each "top":
layer {
  name: "my_new_layer"
  type: "MyNewLayer"
  bottom: "x"
  top: "loss1"
  top: "loss2"
  top: "accuracy"
  loss_weight: 1
  loss_weight: 1.3  # you might want loss2 to have a bit more impact
  loss_weight: 0    # accuracy should not affect gradients...
}
And your layer class should be derived from the LossLayer<Dtype> class, rather than the more abstract Layer<Dtype> class.
For more information on how to implement new layers in caffe, see this page.
Also note that the "SoftmaxWithLoss" layer has an optional second "top"; you might want to look at the code of that layer to see how this is implemented.
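If you would rather prototype in Python than C++, the same idea can be sketched with Caffe's "Python" layer type; the class name and the loss computations below are placeholders, not your actual layer:

import numpy as np
import caffe

class MultiTopLayer(caffe.Layer):
    """Illustrative Python layer with three tops: loss1, loss2, accuracy."""

    def setup(self, bottom, top):
        if len(bottom) != 1 or len(top) != 3:
            raise Exception("expects one bottom and exactly three tops")

    def reshape(self, bottom, top):
        # All three tops are scalars here.
        for t in top:
            t.reshape(1)

    def forward(self, bottom, top):
        x = bottom[0].data
        top[0].data[...] = np.mean(x ** 2)      # loss1 (placeholder)
        top[1].data[...] = np.mean(np.abs(x))   # loss2 (placeholder)
        top[2].data[...] = 0.0                  # accuracy (placeholder)

    def backward(self, top, propagate_down, bottom):
        pass  # compute bottom[0].diff for the loss tops when training

In the prototxt you would then reference it with type: "Python" and a python_param block, plus the same three loss_weight entries as above.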
I am working on implementing Hinton's knowledge distillation paper. The first step is to store the soft targets of a "cumbersome model" with a higher temperature (i.e. I don't need to train the network; I just need to do a forward pass per image and store the soft targets with a temperature T).
Is there a way I can get the soft-target outputs of AlexNet or GoogLeNet, but with a different temperature?
I need to modify the softmax to p_i = exp(z_i / T) / sum_j exp(z_j / T), i.e. divide the outputs of the final fully connected layer by a temperature T. I only need this for the forward pass (not for training).
I believe there are four options to solve this problem:
1. Implement your own Softmax layer with a temperature parameter. It should be quite straightforward to modify the code of softmax_layer.cpp to take a "temperature" T into account. You might need to tweak caffe.proto as well to allow parsing a Softmax layer with an extra parameter.
2. Implement the layer as a Python layer.
3. If you only need a forward pass, i.e. "extracting features", then you can simply output as features the "top" of the layer before the softmax layer and do the softmax with temperature outside Caffe altogether (see the numpy sketch after the example below).
4. You can add a Scale layer before the top Softmax layer:
layer {
  type: "Scale"
  name: "temperature"
  bottom: "zi"
  top: "zi/T"
  scale_param {
    filler { type: "constant" value: 1/T }  # replace "1/T" with the actual value of 1/T
  }
  param { lr_mult: 0 decay_mult: 0 }  # make sure temperature is fixed
}
layer {
  type: "Softmax"
  name: "prob"
  bottom: "zi/T"
  top: "pi"
}