I am new to Caffe, and now I need to modify the threshold values in ReLU layers in a convolution neural network. The way I am using now to modify thresholds is to edit the C++ source code in caffe/src/caffe/layers/relu_layer.cpp, and recompile it. However, this will change the threshold value to a specified value every time when ReLU is called. Is there any way to use different value as threshold in each ReLU layers in a network? By the way, I am using the pycaffe interface and I cannot find such a way to do so.
Finally, sorry for my poor English, if there are something unclear, just let me know, I'll try to describe it in detail.
If I understand correctly your "ReLU with threshold" is basically
f(x) = x-threshold if x>threshold, 0 otherwise
You can easily implement it by adding a "Bias" layer which subtract threshold from the input just prior to a regular "ReLU" layer
Yes, you can. In src/caffe/proto, add a line:
message ReLUParameter {
...
optional float threshold = 3 [default = 0]; #add this line
...
}
and in src/caffe/layers/relu_layer.cpp, make some small modifications as:
template <typename Dtype>
void ReLULayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) {
...
Dtype threshold = this->layer_param_.relu_param().threshold(); //add this line
for (int i = 0; i < count; ++i) {
top_data[i] = (bottom_data[i] > threshold) ? (bottom_data[i] - threshold) :
(negative_slope * (bottom_data[i] - threshold));
}
}
template <typename Dtype>
void ReLULayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down,
const vector<Blob<Dtype>*>& bottom) {
if (propagate_down[0]) {
...
Dtype threshold = this->layer_param_.relu_param().threshold(); //this line
for (int i = 0; i < count; ++i) {
bottom_diff[i] = top_diff[i] * ((bottom_data[i] > threshold)
+ negative_slope * (bottom_data[i] <= threshold));
}
}
}
and similarly in src/caffe/layers/relu_layer.cu the code should be like this.
And after compiling your caffe and pycaffe, in your net.prototxt, you can write a relu layer like:
layer {
name: "threshold_relu"
type: "ReLU"
relu_param: {threshold: 1 #e.g. you want this relu layer to have a threshold 1}
bottom: "input"
top: "output"
}
Related
I need to find the shortest set of paths to connect each element of Set A with at least one element of Set B. Repetitions in A OR B are allowed (but not both), and no element can be left unconnected. Something like this:
I'm representing the elements as integers, so the "cost" of a connection is just the absolute value of the difference. I also have a cost for crossing paths, so if Set A = [60, 64] and Set B = [63, 67], then (60 -> 67) incurs an additional cost. There can be any number of elements in either set.
I've calculated the table of transitions and costs (distances and crossings), but I can't find the algorithm to find the lowest-cost solution. I keep ending up with either too many connections (i.e., repetitions in both A and B) or greedy solutions that omit elements (e.g., when A and B are non-overlapping). I haven't been able to find examples of precisely this kind of problem online, so I hoped someone here might be able to help, or at least point me in the right direction. I'm not a graph theorist (obviously!), and I'm writing in Swift, so code examples in Swift (or pseudocode) would be much appreciated.
UPDATE: The solution offered by #Daniel is almost working, but it does occasionally add unnecessary duplicates. I think this may be something to do with the sorting of the priorityQueue -- the duplicates always involve identical elements with identical costs. My first thought was to add some kind of "positional encoding" (yes, Transformer-speak) to the costs, so that the costs are offset by their positions (though of course, this doesn't guarantee unique costs). I thought I'd post my Swift version here, in case anyone has any ideas:
public static func voiceLeading(from chA: [Int], to chB: [Int]) -> Set<[Int]> {
var result: Set<[Int]> = Set()
let im = intervalMatrix(chA, chB: chB)
if im.count == 0 { return [[0]] }
let vc = voiceCrossingCostsMatrix(chA, chB: chB, cost: 4)
// NOTE: cm contains the weights
let cm = VectorUtils.absoluteAddMatrix(im, toMatrix: vc)
var A_links: [Int:Int] = [:]
var B_links: [Int:Int] = [:]
var priorityQueue: [Entry] = []
for (i, a) in chA.enumerated() {
for (j, b) in chB.enumerated() {
priorityQueue.append(Entry(a: a, b: b, cost: cm[i][j]))
if A_links[a] != nil {
A_links[a]! += 1
} else {
A_links[a] = 1
}
if B_links[b] != nil {
B_links[b]! += 1
} else {
B_links[b] = 1
}
}
}
priorityQueue.sort { $0.cost > $1.cost }
while priorityQueue.count > 0 {
let entry = priorityQueue[0]
if A_links[entry.a]! > 1 && B_links[entry.b]! > 1 {
A_links[entry.a]! -= 1
B_links[entry.b]! -= 1
} else {
result.insert([entry.a, (entry.b - entry.a)])
}
priorityQueue.remove(at: 0)
}
return result
}
Of course, since the duplicates have identical scores, it shouldn't be a problem to just remove the extras, but it feels a bit hackish...
UPDATE 2: Slightly less hackish (but still a bit!); since the requirement is that my result should have equal cardinality to max(|A|, |B|), I can actually just stop adding entries to my result when I've reached the target cardinality. Seems okay...
UPDATE 3: Resurrecting this old question, I've recently had some problems arise from the fact that the above algorithm doesn't fulfill my requirement |S| == max(|A|, |B|) (where S is the set of pairings). If anyone knows of a simple way of ensuring this it would be much appreciated. (I'll obviously be poking away at possible changes.)
This is an easy task:
Add all edges of the graph in a priority_queue, where the biggest priority is the edge with the biggest weight.
Look each edge e = (u, v, w) in the priority_queue, where u is in A, v is in B and w is the weight.
If removing e from the graph doesn't leave u or v isolated, remove it.
Otherwise, e is part of the answer.
This should be enough for your case:
#include <bits/stdc++.h>
using namespace std;
struct edge {
int u, v, w;
edge(){}
edge(int up, int vp, int wp){u = up; v = vp; w = wp;}
void print(){ cout<<"("<<u<<", "<<v<<")"<<endl; }
bool operator<(const edge& rhs) const {return w < rhs.w;}
};
vector<edge> E; //edge set
priority_queue<edge> pq;
vector<edge> ans;
int grade[5] = {3, 3, 2, 2, 2};
int main(){
E.push_back(edge(0, 2, 1)); E.push_back(edge(0, 3, 1)); E.push_back(edge(0, 4, 4));
E.push_back(edge(1, 2, 5)); E.push_back(edge(1, 3, 2)); E.push_back(edge(1, 4, 0));
for(int i = 0; i < E.size(); i++) pq.push(E[i]);
while(!pq.empty()){
edge e = pq.top();
if(grade[e.u] > 1 && grade[e.v] > 1){
grade[e.u]--; grade[e.v]--;
}
else ans.push_back(e);
pq.pop();
}
for(int i = 0; i < ans.size(); i++) ans[i].print();
return 0;
}
Complexity: O(E lg(E)).
I think this problem is "minimum weighted bipartite matching" (although searching for " maximum weighted bipartite matching" would also be relevant, it's just the opposite)
To be clear, I'm very new to STM32 and MBED programming, but is it possible to create valid VGA signal, using STM32 nucleo-F070RB board? I've picked up this standard, and my goal is to display "something" on screen. With this i mean I should have control which pixel i want to, well, turn on.
For demo, here is my (very crude) sketch:
#include <mbed.h>
DigitalOut led(LED1);
DigitalOut h_sync(PC_3);
DigitalOut v_sync(PC_2);
DigitalOut c_red(PC_0);
int main() {
h_sync = 1;
v_sync = 1;
int line_count = 0;
int color_red = 0;
while(1) {
wait_us(21);
h_sync = 0;
wait_us(3);
h_sync = 1;
wait_ns(2);
line_count++;
color_red++;
if (color_red == 16) { color_red = 0; c_red = !c_red; }
if (line_count == 601) v_sync = 0;
if (line_count == 605) v_sync = 1;
if (line_count == 628) { line_count = 0; c_red = 0; color_red = 0; }
}
}
I have connected V-Sync to V-Sync of my monitor, same with H-Sync and c_red (color red) through 560Ohm resistor to RED signal. And it (kind of) worked! It displayed red strips every 16 lines. Perfect, but I need to be able to control every pixel (if it's possible). I've seen some VGA libraries (maybe), but i really need to write it myself - something very crude. I just only want to have some sort of control over pixels, not the some super-super something (Just for me to learn :) ). And because I don't have much experience with STMs, after hours i was not able to "convince" my board to generate such "high-speed" signals, so, it is after all possible?
I was using MBED's Ticker function to generate the timing for each pixel, but it did not work - the fastest the Ticker went for me was something around few miliseconds, far too much. Can I use timer interrupts? Or something else?
I'm looking for an algorithm, I am programming in swift now but pseudocode or any reasonably similar "C family" syntax will do.
Imagine a large list of values, such as pixels in a bitmap. You want to pick each one in a visually random order, one at a time, and never pick the same one twice, and always end up picking them all.
I used it before in a Fractal generator so that it was not just rendering line by line, but built it up slowly in a stochastic way, but that was long ago, in a Java applet, and I no longer have the code.
I do not believe it used any pseudo-random number generator, and the main thing I liked about it is that it did not make the rendering time take longer than the just line by line approach. Any of the shuffling algorithms I looked at would make the rendering take longer with such a large number of values to deal with, unless I'm missing something.
EDIT: I used the shuffling an array approach. I shuffle once when the app loads, and it does not take that long anyway. Here is the code for my "Dealer" class.
import Foundation
import Cocoa
import Quartz
class Dealer: NSObject
{
//########################################################
var deck = [(CGFloat,CGFloat)]()
var count = 0
//########################################################
init(_ w:Int, _ h:Int)
{
super.init()
deck.reserveCapacity((w*h)+1)
for y in 0...h
{
for x in 0...w
{
deck.append((CGFloat(x),CGFloat(y)))
}
}
self.shuffle()
}
//########################################################
func shuffle()
{
var j:Int = 0
let total:Int = deck.count-1
for i:Int in 0...total
{
j = Int(arc4random_uniform(UInt32(total)))
deck.swapAt(i, j)
}
}
//########################################################
func deal() -> (CGFloat,CGFloat)
{
let result = deck[count]
let total:Int = deck.count-1
if(count<total) { count=count+1 } else { count=0 }
return(result)
}
//########################################################
}
The init is called once, and it calls shuffle, but if you want you can call shuffle again if needed.
Each time you need a "card" you call Deal. It loops to the beginning when the "deck" is done.
if you got enough memory space to store all the pixel positions you can shuffle them:
const int xs=640; // image resolution
const int ys=480;
color pixel[sz]; // image data
const int sz=xs*ys; // image size
int adr[sz],i,j;
for (i=0;i<sz;i++) adr[i]=i; // ordered positions
for (i=0;i<sz;i++) // shuffle them
{
j = random(sz); // pseudo-randomness with uniform distribution
swap(pixel[i],pixel[j]);
}
this way you got guaranteed that each pixel is used once and most likely all of them are shuffled ...
You need to implement a pseudo-random number generator with a theoretically known period, which is greater than but very close to the number of elements in your list. Suppose R() is a function that implements such a RNG.
Then:
for i = 1...N
do
idx = R()
while idx > N
output element(idx)
end
If the period of the RNG is greater than N, this algorithm is guaranteed to finish, and never output the same element twice
If the period of the RNG is close to N, this algorithm will be fast (i.e. the do-while loop will mostly do 1 iteration).
If the RNG has good quality, the visual output will look pleasant; here you have to do experiments and decide what is good enough for you
To find a RNG that has an exactly-known period, you should examine theory on RNGs, which is very extensive (maybe too extensive); Wikipedia has useful links.
Start with Linear congruential generators: they are very simple, and there is a chance they will be of good enough quality.
Here's a working example based on linear feedback shift registers. Since an n-bit LFSR has a maximal sequence length of 2n−1 steps, this will work best when the number of pixels is one less than a power of 2. For other sizes, the pseudo-random coordinates are discarded until one is obtained that lies within the specified range of coordinates. This is still reasonably efficient; in the worst case (where w×h is a power of 2), there will be an average of two LSFR iterations per coordinate pair.
The following code is in Javascript, but it should be easy enough to port this to Swift or any other language.
Note: For large canvas areas like 1920×1024, it would make more sense to use repeated tiles of a smaller size (e.g., 128×128). The tiling will be imperceptible.
var lsfr_register, lsfr_mask, lsfr_fill_width, lsfr_fill_height, lsfr_state, lsfr_timer;
var lsfr_canvas, lsfr_canvas_context, lsfr_blocks_per_frame, lsfr_frame_rate = 50;
function lsfr_setup(width, height, callback, duration) {
// Maximal length LSFR feedback terms
// (sourced from http://users.ece.cmu.edu/~koopman/lfsr/index.html)
var taps = [ -1, 0x1, 0x3, 0x5, 0x9, 0x12, 0x21, 0x41, 0x8E, 0x108, 0x204, 0x402,
0x829, 0x100D, 0x2015, 0x4001, 0x8016, 0x10004, 0x20013, 0x40013,
0x80004, 0x100002, 0x200001, 0x400010, 0x80000D, 0x1000004, 0x2000023,
0x4000013, 0x8000004, 0x10000002, 0x20000029, 0x40000004, 0x80000057 ];
nblocks = width * height;
lsfr_size = nblocks.toString(2).length;
if (lsfr_size > 32) {
// Anything longer than about 21 bits would be quite slow anyway
console.log("Unsupposrted LSFR size ("+lsfr_size+")");
return;
}
lsfr_register = 1;
lsfr_mask = taps[lsfr_size];
lsfr_state = nblocks;
lsfr_fill_width = width;
lsfr_fill_height = height;
lsfr_blocks_per_frame = Math.ceil(nblocks / (duration * lsfr_frame_rate));
lsfr_timer = setInterval(callback, Math.ceil(1000 / lsfr_frame_rate));
}
function lsfr_step() {
var x, y;
do {
// Generate x,y pairs until they are within the bounds of the canvas area
// Worst-case for an n-bit LSFR is n iterations in one call (2 on average)
// Best-case (where w*h is one less than a power of 2): 1 call per iteration
if (lsfr_register & 1) lsfr_register = (lsfr_register >> 1) ^ lsfr_mask;
else lsfr_register >>= 1;
y = Math.floor((lsfr_register-1) / lsfr_fill_width);
} while (y >= lsfr_fill_height);
x = (lsfr_register-1) % lsfr_fill_width;
return [x, y];
}
function lsfr_callback() {
var coords;
for (var i=0; i<lsfr_blocks_per_frame; i++) {
// Fetch pseudo-random coordinates and fill the corresponding pixels
coords = lsfr_step();
lsfr_canvas_context.fillRect(coords[0],coords[1],1,1);
if (--lsfr_state <= 0) {
clearInterval(lsfr_timer);
break;
}
}
}
function start_fade() {
var w = document.getElementById("w").value * 1;
var h = document.getElementById("h").value * 1;
var dur = document.getElementById("dur").value * 1;
lsfr_canvas = document.getElementById("cv");
lsfr_canvas.width = w;
lsfr_canvas.height = h;
lsfr_canvas_context = lsfr_canvas.getContext("2d");
lsfr_canvas_context.fillStyle = "#ffff00";
lsfr_canvas_context.fillRect(0,0,w,h);
lsfr_canvas_context.fillStyle = "#ff0000";
lsfr_setup(w, h, lsfr_callback, dur);
}
Size:
<input type="text" size="3" id="w" value="320"/>
×
<input type="text" size="3" id="h" value="240"/>
in
<input type="text" size="3" id="dur" value="3"/>
secs
<button onclick="start_fade(); return 0">Start</button>
<br />
<canvas id="cv" width="320" height="240" style="border:1px solid #ccc"/>
I am implementing this neural network for some classification problem. I initially tried back propagation but it takes longer to converge. So I though of using RPROP. In my test setup RPROP works fine for AND gate simulation but never converges for OR and XOR gate simulation.
How and when should I update bias for RPROP?
Here my weight update logic:
for(int l_index = 1; l_index < _total_layers; l_index++){
Layer* curr_layer = get_layer_at(l_index);
//iterate through each neuron
for (unsigned int n_index = 0; n_index < curr_layer->get_number_of_neurons(); n_index++) {
Neuron* jth_neuron = curr_layer->get_neuron_at(n_index);
double change = jth_neuron->get_change();
double curr_gradient = jth_neuron->get_gradient();
double last_gradient = jth_neuron->get_last_gradient();
int grad_sign = sign(curr_gradient * last_gradient);
//iterate through each weight of the neuron
for(int w_index = 0; w_index < jth_neuron->get_number_of_weights(); w_index++){
double current_weight = jth_neuron->give_weight_at(w_index);
double last_update_value = jth_neuron->give_update_value_at(w_index);
double new_update_value = last_update_value;
if(grad_sign > 0){
new_update_value = min(last_update_value*1.2, 50.0);
change = sign(curr_gradient) * new_update_value;
}else if(grad_sign < 0){
new_update_value = max(last_update_value*0.5, 1e-6);
change = -change;
curr_gradient = 0.0;
}else if(grad_sign == 0){
change = sign(curr_gradient) * new_update_value;
}
//Update neuron values
jth_neuron->set_change(change);
jth_neuron->update_weight_at((current_weight + change), w_index);
jth_neuron->set_last_gradient(curr_gradient);
jth_neuron->update_update_value_at(new_update_value, w_index);
double current_bias = jth_neuron->get_bias();
jth_neuron->set_bias(current_bias + _learning_rate * jth_neuron->get_delta());
}
}
}
In principal you don't treat the bias differently than before when you did backpropagation. It's learning_rate * delta which you seem to be doing.
One source of error may be that the sign of the weight change depends on how you calculate your error. There's different conventions and (t_i-y_i) instead of (y_i - t_i) should result in returning (new_update_value * sgn(grad)) instead of -(new_update_value * sign(grad)) so try switching the sign. I'm also unsure about how you specifically implemented everything since a lot is not shown here. But here's a snippet of mine in a Java implementation that might be of help:
// gradient didn't change sign:
if(weight.previousErrorGradient * errorGradient > 0)
weight.lastUpdateValue = Math.min(weight.lastUpdateValue * step_pos, update_max);
// changed sign:
else if(weight.previousErrorGradient * errorGradient < 0)
{
weight.lastUpdateValue = Math.max(weight.lastUpdateValue * step_neg, update_min);
}
else
weight.lastUpdateValue = weight.lastUpdateValue; // no change
// Depending on language, you should check for NaN here.
// multiply this with -1 depending on your error signal's sign:
return ( weight.lastUpdateValue * Math.signum(errorGradient) );
Also, keep in mind that 50.0, 1e-6 and especially 0.5, 1.2 are empirically gathered values so they might need to be adjusted. You should definitely print out the gradients and weight changes to see if there's something weird going on (e.g. exploding gradients->NaN although you're only testing AND/XOR). Your last_gradient value should also be initialized to 0 at the first timestep.
For my doctoral thesis I am building a 3D printer based loosely off of one from the University of Twente:
http://pwdr.github.io/
So far, everything has gone relatively smoothly. The hardware part took longer than expected, but the electronics frighten me a little bit. I can sucessfully jog all the motors and, mechanically, everything does what is supposed to do.
However, now that I am working on the software side, I am getting headaches.
The Pwder people wrote a code that uses Processing to take an .STL file and slice it into layers. Upon running the code, a Processing GUI opens where I can load a model. The model loads fine (I'm using the Utah Teapot) and shows that it will take 149 layers.
Upon hitting "convert" the program is supposed to take the .STL file and slice it into layers, followed by writing a text file that I can then upload to an SD card. The printer will then print directly from the SD card.
However, when I hit "convert" I get an "Array Index Out of Bounds" error. I'm not quite sure what this means.. can anyone enlighten me?
The code can be found below, along with a picture of the error.
Thank you.
// Convert the graphical output of the sliced STL into a printable binary format.
// The bytes are read by the Arduino firmware
PrintWriter output, outputUpper;
int loc;
int LTR = 0;
int lowernozzles = 8;
int uppernozzles = 4;
int nozzles = lowernozzles+uppernozzles;
int printXcoordinate = 120+280; // Left margin 120
int printYcoordinate = 30+190; // Top margin 30
int printWidth = 120; // Total image width 650
int printHeight = 120; // Total image height 480
int layer_size = printWidth * printHeight/nozzles * 2;
void convertModel() {
// Create config file for the printer, trailing comma for convenience
output = createWriter("PWDR/PWDRCONF.TXT"); output.print(printWidth+","+printHeight/nozzles+","+maxSlices+","+inkSaturation+ ",");
output.flush();
output.close();
int index = 0;
byte[] print_data = new byte[layer_size * 2];
// Steps of 12 nozzles in Y direction
for (int y = printYcoordinate; y < printYcoordinate+printHeight; y=y+nozzles ) {
// Set a variable to know wheter we're moving LTR of RTL
LTR++;
// Step in X direction
for (int x = 0; x < printWidth; x++) {
// Clear the temp strings
String[] LowerStr = {""};
String LowerStr2 = "";
String[] UpperStr = {""};
String UpperStr2 = "";
// For every step in Y direction, sample the 12 nozzles
for ( int i=0; i<nozzles; i++) {
// Calculate the location in the pixel array, use total window width!
// Use the LTR to determine the direction
if (LTR % 2 == 1){
loc = printXcoordinate + printWidth - x + (y+i) * width;
} else {
loc = printXcoordinate + x + (y+i) * width;
}
if (brightness(pixels[loc]) < 100) {
// Write a zero when the pixel is white (or should be white, as the preview is inverted)
if (i<uppernozzles) {
UpperStr = append(UpperStr, "0");
} else {
LowerStr = append(LowerStr, "0");
}
} else {
// Write a one when the pixel is black
if (i<uppernozzles) {
UpperStr = append(UpperStr, "1");
} else {
LowerStr = append(LowerStr, "1");
}
}
}
LowerStr2 = join(LowerStr, "");
print_data[index] = byte(unbinary(LowerStr2));
index++;
UpperStr2 = join(UpperStr, "");
print_data[index] = byte(unbinary(UpperStr2));
index++;
}
}
if (sliceNumber >= 1 && sliceNumber < 10){
String DEST_FILE = "PWDR/PWDR000"+sliceNumber+".DAT";
File dataFile = sketchFile(DEST_FILE);
if (dataFile.exists()){
dataFile.delete();
}
saveBytes(DEST_FILE, print_data); // Savebytes directly causes bug under Windows
} else if (sliceNumber >= 10 && sliceNumber < 100){
String DEST_FILE = "PWDR/PWDR00"+sliceNumber+".DAT";
File dataFile = sketchFile(DEST_FILE);
if (dataFile.exists()){
dataFile.delete();
}
saveBytes(DEST_FILE, print_data); // Savebytes directly causes bug under Windows
} else if (sliceNumber >= 100 && sliceNumber < 1000){
String DEST_FILE = "PWDR/PWDR0"+sliceNumber+".DAT";
File dataFile = sketchFile(DEST_FILE);
if (dataFile.exists()){
dataFile.delete();
}
saveBytes(DEST_FILE, print_data); // Savebytes directly causes bug under Windows
} else if (sliceNumber >= 1000) {
String DEST_FILE = "PWDR/PWDR"+sliceNumber+".DAT";
File dataFile = sketchFile(DEST_FILE);
if (dataFile.exists()){
dataFile.delete();
}
saveBytes(DEST_FILE, print_data); // Savebytes directly causes bug under Windows
}
sliceNumber++;
println(sliceNumber);
}
What's happening is that print_data is smaller than index. (For example, if index is 123, but print_data only has 122 elements.)
Size of print_data is layer_size * 2 or printWidth * printHeight/nozzles * 4 or 4800
Max size of index is printHeight/nozzles * 2 * printWidth or 20*120 or 2400.
This seems alright, so I probably missed something, and it appears to be placing data in element 4800, which is weird. I suggest a bunch of print statements to get the size of print_data and the index.