Different frequencies for different cores in gem5 simulation? - multicore

Is it possible to set different frequencies for the CPU cores in a simulation? That is, suppose I have two cores, C1 and C2: can I set C1 to 1 GHz and C2 to 2 GHz? In the group, I found that it might be possible using big/little cores with the ARM ISA. However, the gem5 official site mentions that gem5 does not support heterogeneous cores. I want varying frequencies for the x86 and Alpha ISAs; is this possible in SE or full-system mode? Kindly advise.
Thank you
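
For reference, a minimal sketch of how per-core clock domains can be expressed in a gem5 Python configuration script. The memory system, interconnect and workload setup are omitted, and the object names follow stock gem5 configs as I recall them, so treat this as an untested illustration rather than a verified answer:

```python
# Sketch only: per-CPU clock domains in a gem5 config script.
# Assumes SrcClockDomain / VoltageDomain / TimingSimpleCPU from m5.objects;
# memory, buses and workload setup are omitted for brevity.
from m5.objects import System, SrcClockDomain, VoltageDomain, TimingSimpleCPU

system = System()
system.voltage_domain = VoltageDomain(voltage='1.0V')

# System-wide clock (buses, memory controller, ...)
system.clk_domain = SrcClockDomain(clock='1GHz',
                                   voltage_domain=system.voltage_domain)

# Two CPUs, each with its own clock domain: C1 at 1 GHz, C2 at 2 GHz
system.cpu = [TimingSimpleCPU(cpu_id=i) for i in range(2)]
system.cpu[0].clk_domain = SrcClockDomain(clock='1GHz',
                                          voltage_domain=system.voltage_domain)
system.cpu[1].clk_domain = SrcClockDomain(clock='2GHz',
                                          voltage_domain=system.voltage_domain)
```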

Related

Detect if two audio files are generated by the same instrument

What I'm trying to do is to detect, in a small set of audio samples, whether any are generated by the same instrument. If so, those are considered duplicates and filtered out.
Listen to this file of ten concatenated samples. You can hear that the first five are all generated by the same instrument (an electric piano) so four of them are to be deemed duplicates.
What algorithm or method can I use to solve this problem? Note that I don't need full-fledged instrument detection as I'm only interested in whether the instrument is or isn't the same. Note also that I don't mean literally "the same instrument" but rather "the same acoustic flavor just different pitches."
Task Formulation
What you need is a Similarity Metric (a type of Distance Metric) that scores two samples of the same instrument / instrument type as very similar (low score) and two samples of different instruments as quite different (high score), and that does so regardless of which note is being played. So it should be sensitive to timbre, and not sensitive to musical content.
Learning setup
The task can be referred to as Similarity Learning. A popular and effective approach for neural networks is Triplet Loss. Here is a blog-post introducing the concept in the context of image similarity. It has been applied successfully to audio before.
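
A minimal sketch of the triplet loss itself, here on plain NumPy vectors; the margin value and the squared Euclidean distance are common defaults, not something prescribed by the posts linked above:

```python
# Minimal sketch of the triplet loss on embedding vectors (NumPy).
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """anchor/positive: same instrument; negative: a different instrument."""
    d_pos = np.sum((anchor - positive) ** 2)   # distance to same-instrument sample
    d_neg = np.sum((anchor - negative) ** 2)   # distance to different-instrument sample
    return max(0.0, d_pos - d_neg + margin)    # push d_pos below d_neg by at least margin

# Toy usage with random "embeddings"
rng = np.random.default_rng(0)
a, p, n = rng.normal(size=(3, 128))
print(triplet_loss(a, p, n))
```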
Model architecture
The primary model architecture I would consider is a Convolutional Neural Network on log-mel spectrograms. Try first to use a generic model like OpenL3 as a feature extractor. It produces a fixed-length vector called an Audio Embedding, which you can train a triplet-loss model on top of.
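
A hedged sketch of the feature-extractor route, assuming the openl3 and soundfile packages and two hypothetical files clip_a.wav and clip_b.wav; the decision threshold on the distance would have to be chosen on your labeled validation set:

```python
# Sketch: OpenL3 embeddings + cosine distance as a simple timbre similarity check.
import numpy as np
import openl3
import soundfile as sf

def embed(path):
    audio, sr = sf.read(path)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)                 # downmix to mono
    # get_audio_embedding returns (embeddings, timestamps); average over time
    emb, _ = openl3.get_audio_embedding(audio, sr, content_type="music")
    return emb.mean(axis=0)

def cosine_distance(x, y):
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Small distance -> likely the same instrument "flavor"
d = cosine_distance(embed("clip_a.wav"), embed("clip_b.wav"))
print(d)
```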
Datasets
The key to success for your application will be to have a suitable dataset. You might be able to utilize the NSynth dataset. Maybe training on that alone can give OK performance, or you may be able to use it as a training set and then fine-tune on your own training set.
You will, at a minimum, need to create a validation/test set from your own audio clips in order to evaluate the performance of the model - at least some 10-100 labeled examples of each instrument type of interest.

NEAT - What is a good Compatibility Threshold

I am learning about NEAT (NeuroEvolution of Augmenting Topologies) and am trying to implement it in C++, but I have no idea what a good compatibility threshold would be. Please can you recommend one, along with c1, c2 and c3? (See the distance function (δ) in the paper, page 13: http://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf)
The compatibility threshold, along with the coefficients c1, c2 and c3, should all be chosen based on your problem and the other variables you've set for your NEAT implementation. The larger your compatibility threshold is, the fewer species you'll have. If your population size is small this is probably what you want, since you don't want your already small population to be divided much further. However, if your population size is really big you can afford to have more species. Another thing to note is that, in general, c1 and c2 should be set to the same value, since there isn't any difference in the way that disjoint and excess genes behave. All you have to do then is determine how much you want the weights of each network to factor into speciation. This, in my experience, can only be adjusted through trial and error.
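
For concreteness, here is a sketch of the compatibility distance from the paper, written in Python for brevity (the same structure carries over to C++). The coefficients c1 = c2 = 1.0, c3 = 0.4 and a threshold of 3.0 are the values I recall from the original experiments; treat them only as a starting point for the trial and error described above:

```python
# Sketch of the NEAT compatibility distance:
#   delta = c1*E/N + c2*D/N + c3*W_bar
def compatibility_distance(excess, disjoint, avg_weight_diff, n_genes,
                           c1=1.0, c2=1.0, c3=0.4):
    # N normalizes for genome size; the paper suggests N = 1 for small genomes
    n = n_genes if n_genes >= 20 else 1
    return c1 * excess / n + c2 * disjoint / n + c3 * avg_weight_diff

def same_species(excess, disjoint, avg_weight_diff, n_genes, threshold=3.0):
    return compatibility_distance(excess, disjoint, avg_weight_diff, n_genes) <= threshold

print(same_species(excess=2, disjoint=3, avg_weight_diff=0.5, n_genes=25))
```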

Does tensorflow convnet only duplicate model across multiple GPUs?

I am currently running a TensorFlow convnet for image recognition and I am considering buying new GPUs to enable more complex graphs, larger batch sizes, and larger input dimensions. I have read posts like this that do not recommend using AWS GPU instances to train convnets, but more opinions are always welcome.
I've read TensorFlow's guide 'Training a Model Using Multiple GPU Cards', and it seems that the graph is duplicated across the GPUs. I would like to know: is this the only way to use parallel GPUs in a TensorFlow convnet?
The reason I am asking is that if TensorFlow can only duplicate graphs across multiple GPUs, each GPU must have at least the memory my model requires for one batch. (For example, if the minimum memory required is 5 GB, two cards of 4 GB each would not do the job.)
Thank you in advance!
No, it is definitely possible to place different variables on different GPUs.
For every variable and every layer that you declare, you can choose the device on which it is placed.
And in the specific case where you want multiple GPUs only to duplicate your model and increase the effective batch_size for faster training, you would still need to explicitly build your model using the concept of shared parameters and manage how those parameters communicate.
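
A minimal TF1-style sketch of what variable placement looks like in practice (model parallelism by device placement rather than graph replication); the device strings and layer sizes are illustrative only:

```python
# Sketch: placing different variables/layers on different GPUs (TF1-style).
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 1024])

with tf.device('/gpu:0'):
    w1 = tf.get_variable('w1', [1024, 4096])   # first half of the model lives on GPU 0
    h1 = tf.nn.relu(tf.matmul(x, w1))

with tf.device('/gpu:1'):
    w2 = tf.get_variable('w2', [4096, 10])     # second half lives on GPU 1
    logits = tf.matmul(h1, w2)                 # activations cross the GPUs here

# allow_soft_placement lets TF fall back to another device if a GPU is missing
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
sess.run(tf.global_variables_initializer())
```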

Clocking Issue in FPGA with MATLAB HDL Coder

So I am using Simulink to generate a series of upsampling filters. My input is a sine wave sampled at 44.1 kHz and the output sine wave is at 11.2 MHz. For this I use a set of 4 FIR Interpolation Filters from Simulink: the first with an upsampling factor of 32 and the rest with a factor of 2.
The problem is with the Fmax (the highest frequency at which the circuit can be clocked). I get an Fmax that is really low, below 50 MHz. I did some optimizations and got it up to this point, but I want to raise it further. If anyone can help me I can attach the Simulink file I have.
I am using MATLAB HDL Coder and Altera Quartus II for synthesis.
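
(As a quick sanity check on the rate chain, in Python rather than Simulink: the overall factor is 32 x 2 x 2 x 2 = 256, so 44.1 kHz ends up at 11.2896 MHz, which is presumably the "11.2 MHz" quoted above.)

```python
# Sanity check of the multirate chain: 44.1 kHz upsampled by 32, 2, 2, 2.
fs = 44.1e3
for factor in (32, 2, 2, 2):
    fs *= factor
    print(f"after x{factor}: {fs / 1e6:.4f} MHz")   # ends at 11.2896 MHz
```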
First of all, I do not understand why you would upsample by 32 and then three times by 2. You should analyze the slowest path.
If the addition is the bottleneck, that would be in the 32x upsampling stage, and a more balanced split of the factors (e.g. 8, 8, 4) would be better. However, it all depends on the implementation, which I can't guess from here.
I would advise having a look at the FIR filters themselves. Reducing the number of FIR stages (taps) will increase your speed at the cost of a reduced SNR, which may or may not be tolerable. You could take a filter with a very short impulse response.
You could also reduce the number of bits used to represent the samples. This will again decrease the SNR but consume less logic, and it is likely to be faster.
You could also consider whether or not to use hard multiplier blocks, if they are available in the technology you are targeting.
Otherwise, have a look at parallel FIR filter implementations. Though I bet you'll have to implement that one yourself.
And of course, as you pointed out yourself, realistic constraints are required.
Good luck. Please consider liking my post.
Thanks for the answer. Yes, I need the 4 stages of upsampling because of my project requirements. My input sampling frequency varies and my output should always be 11.2 MHz, so that's why I need those 4 different stages, in order to generate the output from 4 different input rates.
I optimized the FIR filters by using pipeline registers and reduced the number of multipliers in the 32x upsampling filter using the partly-serial architecture.
I guess the problem was that I was not using an SDC file, as needed for timing analysis by Altera. Now that I have configured a simple SDC file, I get a positive slack value and a restricted Fmax of 24.5 MHz; as my output needs to be 11.2 MHz, I guess this is good enough.
If you have some more suggestions on this, please let me know. I did not quite understand the point about the SNR.

In what order should we tune hyperparameters in Neural Networks?

I have a fairly simple ANN using TensorFlow and AdamOptimizer for a regression problem, and I am now at the point of tuning all the hyperparameters.
For now, I see many different hyperparameters that I have to tune:
Learning rate: initial learning rate, learning rate decay
The AdamOptimizer needs 4 arguments (learning-rate, beta1, beta2, epsilon) so we need to tune them - at least epsilon
batch-size
Number of iterations
Lambda L2-regularization parameter
Number of neurons, number of layers
What kind of activation function for the hidden layers and for the output layer
dropout parameter
I have 2 questions:
1) Do you see any other hyperparameter I might have forgotten ?
2) For now, my tuning is quite "manual" and I am not sure I am doing everything in a proper way.
Is there a special order in which to tune the parameters? E.g. learning rate first, then batch size, then ...
I am not sure that all these parameters are independent - in fact, I am quite sure that some of them are not. Which ones are clearly independent and which ones are clearly not? Should we then tune those together?
Is there any paper or article which talks about properly tuning all the parameters in a special order ?
EDIT :
Here are the graphs I got for different initial learning rates, batch sizes and regularization parameters. The purple curve looks completely weird to me, because its cost decreases much more slowly than the others, yet it got stuck at a lower accuracy. Is it possible that the model is stuck in a local minimum?
[Accuracy and cost plots]
For the learning rate, I used the decay:
LR(epoch) = LR_initial / sqrt(epoch)
Thanks for your help!
Paul
My general order is:
Batch size, as it will largely affect the training time of future experiments.
Architecture of the network:
Number of neurons in the network
Number of layers
Rest (dropout, L2 reg, etc.)
Dependencies:
I'd assume that the optimal values of
learning rate and batch size
learning rate and number of neurons
number of neurons and number of layers
strongly depend on each other. I am not an expert on that field though.
As for your hyperparameters:
For the Adam optimizer: "Recommended values in the paper are eps = 1e-8, beta1 = 0.9, beta2 = 0.999." (source) - see the sketch after these notes.
For the learning rate with Adam and RMSProp, I found values around 0.001 to be optimal for most problems.
As an alternative to Adam, you can also use RMSProp, which reduces the memory footprint by up to 33%. See this answer for more details.
You could also tune the initial weight values (see All you need is a good init). Although, the Xavier initializer seems to be a good way to prevent having to tune the weight inits.
I don't tune the number of iterations / epochs as a hyperparameter. I train the net until its validation error converges. However, I give each run a time budget.
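
A minimal TF1-style sketch of those Adam settings (matching the tf.train.AdamOptimizer the question refers to); the toy variable and loss are only there to make the snippet self-contained:

```python
# Sketch: Adam with the recommended defaults (lr ~1e-3, beta1=0.9, beta2=0.999, eps=1e-8).
import tensorflow as tf

w = tf.Variable(5.0)                 # toy parameter
loss = tf.square(w - 2.0)            # toy loss, minimized at w = 2

train_op = tf.train.AdamOptimizer(learning_rate=1e-3, beta1=0.9,
                                  beta2=0.999, epsilon=1e-8).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        sess.run(train_op)
    print(sess.run(w))               # moves toward 2.0
```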
Get TensorBoard running and plot the error there. You'll need to create subdirectories in the path where TensorBoard looks for the data to plot; I do that subdirectory creation in the script. So I change a parameter in the script, give the trial a name there, run it, and plot all the trials in the same chart. You'll very soon get a feel for the most effective settings for your graph and data.
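
A sketch of that per-trial subdirectory scheme with TF1-style summaries; the run name, paths and toy loss are illustrative only:

```python
# Sketch: one TensorBoard log subdirectory per hyperparameter trial.
import os
import tensorflow as tf

learning_rate, batch_size = 1e-3, 64
run_name = f"lr{learning_rate}_bs{batch_size}"
logdir = os.path.join("logs", run_name)          # e.g. logs/lr0.001_bs64

w = tf.Variable(5.0)
loss = tf.square(w - 2.0)                        # toy loss so the snippet runs
tf.summary.scalar("loss", loss)
summary_op = tf.summary.merge_all()
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter(logdir, sess.graph)
    for step in range(100):
        _, s = sess.run([train_op, summary_op])
        writer.add_summary(s, step)              # all trials end up in one chart
    writer.close()
# Then: tensorboard --logdir logs
```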
For parameters that are less important you can probably just pick a reasonable value and stick with it.
Like you said, the optimal values of these parameters all depend on each other. The easiest thing to do is to define a reasonable range of values for each hyperparameter, then randomly sample a value from each range and train a model with that setting. Repeat this a bunch of times and then pick the best model. If you are lucky, you will be able to analyze which hyperparameter settings worked best and draw some conclusions from that.
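
A minimal sketch of that random-search loop; the ranges are illustrative, and train_and_evaluate() is a hypothetical stand-in for building and training your TensorFlow model and returning its validation error:

```python
# Sketch: random search over hyperparameter ranges.
import random

def sample_config():
    return {
        "learning_rate": 10 ** random.uniform(-4, -2),   # log-uniform 1e-4 .. 1e-2
        "batch_size": random.choice([32, 64, 128, 256]),
        "l2": 10 ** random.uniform(-6, -3),
        "n_layers": random.randint(1, 4),
        "n_neurons": random.choice([32, 64, 128, 256]),
        "dropout": random.uniform(0.0, 0.5),
    }

def train_and_evaluate(config):
    # Hypothetical placeholder: train your model with `config` and
    # return the validation error.
    return random.random()

trials = [(train_and_evaluate(cfg), cfg) for cfg in (sample_config() for _ in range(20))]
best_error, best_cfg = min(trials, key=lambda t: t[0])
print(best_error, best_cfg)
```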
I don't know of any tool specific to TensorFlow, but the best strategy is to start with basic hyperparameter values such as a learning rate of 0.01 or 0.001 and a weight_decay of 0.005 or 0.0005, and then tune them. Doing it manually will take a lot of time. If you are using Caffe, the following is the best option: it will take the hyperparameters from a set of input values and give you back the best set.
https://github.com/kuz/caffe-with-spearmint
For more information, you can follow this tutorial as well:
http://fastml.com/optimizing-hyperparams-with-hyperopt/
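
If you go the hyperopt route from the tutorial above, the core loop looks roughly like this (assuming the hyperopt package; the objective below is a toy stand-in for training your network and returning the validation loss):

```python
# Sketch: TPE search over learning rate and weight decay with hyperopt.
import math
from hyperopt import fmin, tpe, hp

space = {
    "learning_rate": hp.loguniform("learning_rate", math.log(1e-4), math.log(1e-1)),
    "weight_decay": hp.loguniform("weight_decay", math.log(1e-5), math.log(1e-2)),
}

def objective(params):
    # Replace with: train the model with `params`, return the validation loss.
    return (math.log10(params["learning_rate"]) + 3) ** 2   # toy bowl around lr = 1e-3

best = fmin(objective, space, algo=tpe.suggest, max_evals=50)
print(best)
```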
For the number of layers, what I suggest is to first build a smaller network and increase the data, and after you have sufficient data, increase the model complexity.
Before you begin:
Set the batch size to the maximum (or the maximum power of 2) that works on your hardware. Simply increase it until you get a CUDA out-of-memory error (or system RAM usage > 90%) - see the sketch at the end of this answer.
Set regularizers to low values.
For the architecture and the exact numbers of neurons and layers, use known architectures as inspiration and adjust them to your specific performance requirements: more layers and neurons -> possibly a stronger, but slower, model.
Then, if you want to do it one by one, I would go like this:
Tune learning rate in a wide range.
Tune other parameters of the optimizer.
Tune regularizers (dropout, L2, etc.).
Fine-tune the learning rate - it's the most important hyperparameter.
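
A rough TF1-style sketch of point 1 above (keep doubling the batch size until it no longer fits in GPU memory); the toy dense layer in try_batch() stands in for a forward/backward pass of your real model, so treat this purely as an illustration:

```python
# Sketch: find the largest batch size that fits by doubling until an OOM error.
import numpy as np
import tensorflow as tf

def try_batch(batch_size):
    tf.reset_default_graph()
    x = tf.placeholder(tf.float32, [None, 4096])
    w = tf.get_variable("w", [4096, 4096])        # toy layer standing in for the model
    loss = tf.reduce_mean(tf.matmul(x, w))
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(train_op, {x: np.zeros((batch_size, 4096), np.float32)})

batch_size = 32
for _ in range(20):                               # cap the search
    try:
        try_batch(batch_size * 2)
        batch_size *= 2                           # it fits, keep doubling
    except tf.errors.ResourceExhaustedError:      # CUDA out-of-memory
        break
print("largest batch size that fits:", batch_size)
```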