Clocking Issue in FPGA with MATLAB HDL Coder - matlab

I am using Simulink to generate a series of upsampling filters. My input is a sine wave sampled at 44.1 kHz and my output is a sine wave sampled at 11.2 MHz. For this I use a chain of 4 FIR Interpolation Filter blocks from Simulink: the first upsamples by 32 and the rest upsample by 2.
The problem is with the Fmax (the highest frequency at which the circuit can be clocked). I get an Fmax which is really low, below 50 MHz. I did some optimizations to get it up to this point, and I want to raise it further. If anyone can help, I can attach the Simulink file I have.
I am using MATLAB HDL Coder and Altera Quartus II for synthesis.

First of all, I do not understand why you would upsample by 32 and then 4 times by 2. You should analyze the slowest path.
If the addition is the bottleneck, it would be in the 32x upsampling stage, and 8, 8, 8 would be better. However, it all depends on the implementation, which I can't guess from here.
I would advise having a look at the FIR filters. Reducing the number of FIR stages will increase your speed at the cost of a reduced SNR, which may or may not be tolerable. You could take one with a very short impulse response.
You could also reduce the number of bits used to represent the samples. This will again decrease the SNR, but it consumes less logic and is likely to be faster.
You could also consider whether or not to use hard multiplier blocks, if they are available in the technology you are targeting.
Otherwise, have a look at parallel FIR filter implementations, though I bet you'll have to implement that one yourself.
And of course, as you pointed out yourself, realistic constraints are required.
Good luck. Please consider liking my post.

Thanks for the answer. Yes, I need the 4 stages of upsampling because of my project requirements. My input sampling frequency varies and my output should always be 11.2 MHz, so that's why I need those 4 different stages in order to generate output for 4 different stages.
I optimized the FIR filters by using pipeline registers, and reduced the number of multipliers in the 32x upsampling filter by using the partly serial architecture.
I guess the problem was that I was not using an SDC file, which Altera needs for timing analysis. Now that I have configured a simple SDC file, I get a positive slack value and a restricted Fmax of 24.5 MHz. As my output needs to be 11.2 MHz, I guess this is good enough.
If you have any more suggestions on this, please let me know. I did not quite understand the point about the SNR.
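For reference, the sample rate after each interpolation stage follows from the numbers in the question (44.1 kHz input, factors 32, 2, 2, 2). A quick Python check, with variable names chosen only for illustration:

```python
# Sample rate after each interpolation stage, using the numbers from the question.
input_rate_hz = 44_100
factors = [32, 2, 2, 2]  # upsampling factor of each FIR interpolation stage

rates_hz = []
rate = input_rate_hz
for factor in factors:
    rate *= factor
    rates_hz.append(rate)

output_rate_hz = rates_hz[-1]
restricted_fmax_hz = 24.5e6  # restricted Fmax reported by the timing analysis

for factor, stage_rate in zip(factors, rates_hz):
    print(f"x{factor:<2} -> {stage_rate / 1e6:.4f} MHz")
print("Fmax / output rate:", restricted_fmax_hz / output_rate_hz)
```

The exact output rate is 44.1 kHz x 256 = 11.2896 MHz. Whether 24.5 MHz leaves enough margin may also depend on how many clock cycles per sample the partly serial architecture needs, which the thread does not state.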

In what order should we tune hyperparameters in Neural Networks?

I have a quite simple ANN using Tensorflow and AdamOptimizer for a regression problem and I am now at the point to tune all the hyperparameters.
For now, I saw many different hyperparameters that I have to tune :
Learning rate : initial learning rate, learning rate decay
The AdamOptimizer needs 4 arguments (learning-rate, beta1, beta2, epsilon) so we need to tune them - at least epsilon
batch-size
nb of iterations
Lambda L2-regularization parameter
Number of neurons, number of layers
what kind of activation function for the hidden layers, for the output layer
dropout parameter
I have 2 questions :
1) Do you see any other hyperparameter I might have forgotten ?
2) For now, my tuning is quite "manual" and I am not sure I am doing everything in a proper way.
Is there a special order to tune the parameters ? E.g learning rate first, then batch size, then ...
I am not sure that all these parameters are independent - in fact, I am quite sure that some of them are not. Which ones are clearly independent and which ones are clearly not independent ? Should we then tune them together ?
Is there any paper or article which talks about properly tuning all the parameters in a special order ?
EDIT :
Here are the graphs I got for different initial learning rates, batch sizes and regularization parameters. The purple curve looks completely weird to me: its cost decreases much more slowly than the others, but it gets stuck at a lower accuracy. Is it possible that the model is stuck in a local minimum?
[Figures: accuracy and cost curves]
For the learning rate, I used the decay :
LR(t) = LRI/sqrt(epoch)
Thanks for your help !
Paul
My general order is:
Batch size, as it will largely affect the training time of future experiments.
Architecture of the network:
Number of neurons in the network
Number of layers
Rest (dropout, L2 reg, etc.)
Dependencies:
I'd assume that the optimal values of
learning rate and batch size
learning rate and number of neurons
number of neurons and number of layers
strongly depend on each other. I am not an expert in that field though.
As for your hyperparameters:
For the Adam optimizer: "Recommended values in the paper are eps = 1e-8, beta1 = 0.9, beta2 = 0.999." (source)
For the learning rate with Adam and RMSProp, I found values around 0.001 to be optimal for most problems.
As an alternative to Adam, you can also use RMSProp, which reduces the memory footprint by up to 33%. See this answer for more details.
You could also tune the initial weight values (see All you need is a good init). Although, the Xavier initializer seems to be a good way to prevent having to tune the weight inits.
I don't tune the number of iterations / epochs as a hyperparameter. I train the net until its validation error converges. However, I give each run a time budget.
Get Tensorboard running. Plot the error there. You'll need to create subdirectories in the path where TB looks for the data to plot. I do that subdir creation in the script. So I change a parameter in the script, give the trial a name there, run it, and plot all the trials in the same chart. You'll very soon get a feel for the most effective settings for your graph and data.
For parameters that are less important you can probably just pick a reasonable value and stick with it.
Like you said, the optimal values of these parameters all depend on each other. The easiest thing to do is to define a reasonable range of values for each hyperparameter. Then randomly sample a parameter from each range and train a model with that setting. Repeat this a bunch of times and then pick the best model. If you are lucky you will be able to analyze which hyperparameter settings worked best and make some conclusions from that.
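A minimal sketch of that random-search loop in Python. The ranges and the validation_loss function are hypothetical placeholders: in practice validation_loss would train a model with the given setting and return its validation error.

```python
import math
import random

def sample_config(rng):
    """Draw one hyperparameter setting; the ranges are illustrative assumptions."""
    return {
        "learning_rate": 10 ** rng.uniform(-5, -1),  # log-uniform between 1e-5 and 1e-1
        "batch_size": rng.choice([16, 32, 64, 128, 256]),
        "l2_lambda": 10 ** rng.uniform(-6, -2),
        "dropout": rng.uniform(0.0, 0.5),
    }

def validation_loss(config):
    """Hypothetical stand-in for 'train a model and return its validation loss'."""
    return (math.log10(config["learning_rate"]) + 3) ** 2 + config["dropout"]

def random_search(n_trials, seed=0):
    """Sample n_trials settings and return (loss, config) pairs, best first."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((validation_loss(config), config))
    trials.sort(key=lambda trial: trial[0])
    return trials

trials = random_search(n_trials=50)
best_loss, best_config = trials[0]
print(best_loss, best_config)
```

Keeping the full sorted list of trials, rather than only the best one, is what makes it possible to look afterwards at which ranges of each hyperparameter tended to work.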
I don't know of any tool specific to TensorFlow, but the best strategy is to start with basic values for the hyperparameters, such as a learning rate of 0.01 or 0.001 and a weight_decay of 0.005 or 0.0005, and then tune them. Doing it manually will take a lot of time. If you are using Caffe, the following is the best option: it takes the hyperparameters from a set of input values and gives you the best set.
https://github.com/kuz/caffe-with-spearmint
for more information, you can follow this tutorial as well:
http://fastml.com/optimizing-hyperparams-with-hyperopt/
For the number of layers, what I suggest is to first build a smaller network and increase the data, and once you have sufficient data, increase the model complexity.
Before you begin:
Set batch size to maximal (or maximal power of 2) that works on your hardware. Simply increase it until you get a CUDA error (or system RAM usage > 90%).
Set regularizers to low values.
The architecture and exact numbers of neurons and layers - use known architectures as inspirations and adjust them to your specific performance requirements: more layers and neurons -> possibly a stronger, but slower model.
Then, if you want to do it one by one, I would go like this:
Tune learning rate in a wide range.
Tune other parameters of the optimizer.
Tune regularizers (dropout, L2 etc).
Fine tune learning rate - it's the most important hyper-parameter.

Modifying Sound Input to Determine Frequency

I'm working on a project and I've hit a snag that is past my understanding. My goal is to create an artificial neural network which is fed information from a sound file which is then ported through the system, resulting in a labeling of the chord. I'm hoping to make this to help in music transcription -- not to actually do the transcription itself, but to help in the harmonization aspect. I digress.
I've read as much as I can on the Goertzel and the FFT function, but I'm unsure if these functions are what I'm looking for. I'm not looking for any particular frequency in the sound sample, but rather, I'm hoping to find the higher, middle, and low range frequencies of the sample.
I know the Goertzel algorithm returns a high number if a particular frequency is found, but it seems computationally wasteful to run the algorithm for all possible tones in a given sample. Any ideas on what to use?
Or, if this is impossible, I'd love to know that too before spending too much time on this one project.
Thank you for your time!
Probably better suited to DSP StackExchange.
Suppose you FFT a single 110 Hz tone to get a spectrum; you'll see evenly spaced peaks at 110, 220, 330, etc. Hz (the harmonics). 110 Hz is the fundamental.
Suppose you have 3 tones. Already it's going to look quite messy in the frequency domain. Especially if you have a chord containing e.g. A110 and A220.
On account of this, I think a neural network is a good approach.
Feed in FFT output.
It would be a good idea to use a neural network that accepts complex-valued inputs, as the FFT outputs a complex number for each frequency bin.
http://www.eagle.tamut.edu/faculty/igor/PRESENTATIONS/IJCNN-0813_Tutorial.pdf
It may seem computationally wasteful to extract so many frequencies with FFT, but FFT algorithms are extremely efficient nowadays. You should probably use a bit strength of 10, so 2^10 inputs -> 2^9 = 512 complex bins.
FFT is the right solution. Basically, when you have the FFT of an input signal that consists only of sine waves, you can determine the chord by mapping the frequencies that are present to specific tones in whichever musical temperament you want to use, then looking up the chord specified by those tones. If you don't have sine waves as input, then using a neural network is a valid attempt at solving the problem, provided that you have enough samples to train it.
FFT is the right way. Harmonics don't bother you: since they are integer multiples of the fundamental frequency, they're just higher 'octaves' of the same note. And to recognize a chord, transpositions of notes over whole octaves don't matter.
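For the pure-sine case described above, the mapping from FFT peaks to note names can be sketched as follows. This is only an illustration under assumptions: 12-tone equal temperament with A4 = 440 Hz, a made-up peak threshold, and a synthetic A major triad as input. The function names are not from any library.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_name(freq_hz, a4_hz=440.0):
    """Nearest pitch class in 12-tone equal temperament (A4 = 440 Hz assumed)."""
    semitones_from_a4 = int(round(12 * np.log2(freq_hz / a4_hz)))
    return NOTE_NAMES[(semitones_from_a4 + 9) % 12]

def pitch_classes(signal, sample_rate, rel_threshold=0.2):
    """Pitch classes of spectral peaks above a fraction of the largest peak."""
    windowed = signal * np.hanning(len(signal))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # A bin counts as a peak if it exceeds both neighbours and the threshold.
    is_peak = (magnitude[1:-1] > magnitude[:-2]) & (magnitude[1:-1] > magnitude[2:])
    is_peak &= magnitude[1:-1] > rel_threshold * magnitude.max()
    return {note_name(f) for f in freqs[1:-1][is_peak]}

# Synthetic A major triad made of pure sines: A3, C#4, E4.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # one second of samples
chord = sum(np.sin(2 * np.pi * f * t) for f in (220.0, 277.18, 329.63))
print(sorted(pitch_classes(chord, sample_rate)))
```

Reducing each peak to a pitch class (note name without octave) is what makes octave transpositions irrelevant. With real instrument sounds the harmonics add extra peaks, such as the third harmonic, which is a fifth above the fundamental, so this simple lookup would need more care there.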

1024 pt fft on a large set of data points

I have a signal that may not be periodic. We take about 15 secs worth of samples (at a 10 kHz sampling rate) and we need to do the FFT on that signal to get the frequency content.
The problem is that we are implementing this FFT on an embedded system (DSP) which provides a library FFT of max. 1024 length. That is, it takes in 1024 data points as input and provides a 1024 point output.
What is the correct way of obtaining an FFT on the full 150000 point input?
You could run the FFT on each 1024-point block and average them to get an average power spectrum on the lower-resolution 1024-point frequency axis (512 samples from 0 to the Nyquist frequency, fs/2, so about 10 Hz resolution for your 10 kHz sampling). You should average the magnitudes of the component FFTs (i.e., sqrt(re^2+im^2)), otherwise the average will be sensitive to the drifting phase within each subwindow, which will depend on the precise frequency of the sinusoid.
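A minimal sketch of this block-averaging approach, using NumPy's FFT as a stand-in for the DSP library's 1024-point routine. The test signal (a 1234 Hz sine plus noise) is made up for illustration.

```python
import numpy as np

def averaged_magnitude_spectrum(signal, block_len=1024):
    """Average |FFT| over consecutive non-overlapping blocks of block_len samples."""
    n_blocks = len(signal) // block_len  # a trailing partial block is dropped
    blocks = np.reshape(signal[: n_blocks * block_len], (n_blocks, block_len))
    magnitudes = np.abs(np.fft.rfft(blocks, axis=1))  # sqrt(re^2 + im^2) per bin
    return magnitudes.mean(axis=0)

fs = 10_000                   # sampling rate from the question
t = np.arange(150_000) / fs   # 15 s of samples
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 1234.0 * t) + 0.5 * rng.standard_normal(t.size)

spectrum = averaged_magnitude_spectrum(signal)
freqs = np.fft.rfftfreq(1024, d=1.0 / fs)
print(freqs[np.argmax(spectrum)])  # centre frequency of the strongest bin
```

Because only magnitudes are averaged, the phase of the sinusoid in each block does not matter, and the noise floor becomes smoother as more blocks are averaged.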
If you think the periodic component may be at a low frequency, such that it will show up in a 15 sec sample but not complete any cycles in a 1024/10k ~ 100ms sample (i.e., below 10 Hz or so), you could downsample your input. You could try something as crude as averaging every 100 points to get a somewhat-distorted signal at 100 Hz sampling rate, then pack 10.24 sec worth into your 1024 pt sequence to pass to the FFT.
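The crude downsampling described here could look like the sketch below. NumPy's FFT again stands in for the library routine, and the 3 Hz test tone is an assumption chosen to sit below the roughly 10 Hz resolution of a single 1024-point block at 10 kHz.

```python
import numpy as np

fs = 10_000        # original sampling rate
decimation = 100   # average every 100 points -> 100 Hz effective rate

t = np.arange(150_000) / fs
signal = np.sin(2 * np.pi * 3.0 * t)  # synthetic low-frequency component

# Crude decimation: mean of each group of 100 consecutive samples.
n_out = len(signal) // decimation
decimated = signal[: n_out * decimation].reshape(n_out, decimation).mean(axis=1)

block = decimated[:1024]  # 10.24 s worth of data at 100 Hz
spectrum = np.abs(np.fft.rfft(block))
freqs = np.fft.rfftfreq(1024, d=decimation / fs)

peak_hz = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
print(peak_hz)
```

The frequency axis now runs from 0 to 50 Hz in steps of about 0.1 Hz. Plain averaging is a weak anti-aliasing filter, which is the distortion the answer warns about.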
You could combine these two approaches by using a smaller downsampling factor and then do the magnitude-averaging of successive windows.
I'm confused why the system provides an FFT only up to 1024 points - is there something about the memory that makes it harder to access larger blocks?
Calculating a 128k point FFT using a 1k FFT as a subroutine is possible, but you'd end up recoding a lot of the FFT yourself. Maybe you should forget about the system library and use some other FFT implementation, without the length limitation, that will compile on your target. It may not incorporate all the optimizations of the system-provided one, but you're likely to lose a lot of that advantage when you embed it within the custom code needed to use the partial outputs of the multiple shorter FFTs to produce the long FFT.
Probably the quickest way to do the hybrid FFT (1024 points using the library, then added code to combine them into a 128k point FFT) would be to take an existing full FFT routine (a radix-2, decimation-in-time (DIT) routine for instance), but then modify it to use the system library for what would have been the first 10 stages, which amount to calculating 128 individual 1024-point FFTs on different subsets of the original signal (not, unfortunately, successive windows, but the partial-bit-reversed subsets), then let the remaining 7 stages of butterflies operate on those partial outputs. You'd want to get a pretty solid understanding of how the DIT FFT works to implement this.
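To illustrate the idea, here is a sketch of the Cooley-Tukey split that builds one long FFT out of short ones. It is not a port to any particular DSP library: np.fft.fft stands in for the 1024-point library routine, and the combining step is also done with np.fft.fft for brevity, where the answer has hand-written butterfly stages.

```python
import numpy as np

def long_fft_from_short(x, short_len):
    """FFT of len(x) = r * short_len points built from r short FFTs."""
    n = len(x)
    r = n // short_len
    assert r * short_len == n, "length must be a multiple of short_len"
    # Row j holds the strided subset x[j], x[j + r], x[j + 2r], ... (not a successive window).
    subsets = np.asarray(x, dtype=complex).reshape(short_len, r).T
    short_ffts = np.fft.fft(subsets, axis=1)  # stand-in for the library FFT
    j = np.arange(r)[:, None]
    k = np.arange(short_len)[None, :]
    twiddled = short_ffts * np.exp(-2j * np.pi * j * k / n)
    # Combining across the r subsets is a length-r DFT per column
    # (the remaining butterfly stages in a radix-2 routine).
    combined = np.fft.fft(twiddled, axis=0)
    return combined.reshape(n)

rng = np.random.default_rng(0)
x = rng.standard_normal(128 * 1024)  # 131072 points
X = long_fft_from_short(x, short_len=1024)
print(np.allclose(X, np.fft.fft(x)))  # check against a direct long FFT
```

The strided rows are the "different subsets of the original signal" mentioned above: each 1024-point FFT sees every 128th sample, and the twiddle factors plus the final length-128 transforms stitch the partial results together.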

How to decide the cutoff frequencies of a filter when using an ADC (Flow: analog signal to ADC to bits to fir_filter to filtered_output)

FIR filter has to be used for removing the noise.
I don't know the frequencies of the noise that might be adding up into the analog feedback signal I am taking.
My apparatus consists of an analog feedback signal, which I digitize with an ADC. I then have to apply an FIR filter to remove the noise. I am not sure which noise I am dealing with: the noise added to the analog signal by the environment, or some sort of noise introduced by the ADC?
I have to code this in VHDL (this part is easy, I can do that).
My main problem is in deciding the frequencies.
Thanks in advance!
I am tagging VHDL as some people who work in VHDL might know about the filter.
Let me start by stating the obvious: an ADC samples at a fixed rate and cannot represent any frequency higher than the Nyquist frequency.
Step one: understand aliasing, and that any frequency higher than the Nyquist will alias into your signal as noise. Once you get this you understand that you need an anti aliasing filter in your hardware, in your analog signal path before you digitize it. Depending on the noise requirements of the application you may implement a very complicated 4 pole filter using op-amps; the simplest is to use an RC filter.
Step two: setting the filter cutoff. Don't set the cutoff right at the Nyquist frequency; make sure the filter is cutting well before the Nyquist frequency (1/2x ... 1/10x, depending on how clean the signal is and how much noise is present).
So now you're actually kind of oversampling your signal: the filter is cutting above your signal, and the sample rate is high enough that the Nyquist frequency is sufficiently higher. Oversampling gives you extra data that you captured with the intent of filtering further, and possibly even decimating (keeping one in N samples and throwing the rest out).
Step three: use a filter to further remove the noise between the initial cutoff of the anti-aliasing filter and the Nyquist frequency. This is a science of its own really, but let me start by suggesting a good decimation filter: averaging 2 values. It's a box-car filter of order 2, also known as a SINC filter, and it can be re-applied N times. After N times it is the equivalent of an FIR filter using the values of the Nth row in Pascal's triangle (divided by their sum).
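The Pascal's triangle claim can be checked with a few lines of Python. This is just a verification sketch (the helper names are made up), cascading the 2-sample average 4 times and comparing with row 4 of Pascal's triangle divided by its sum:

```python
from math import comb

def convolve(a, b):
    """Full linear convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def cascaded_average_taps(n_passes):
    """Impulse response of the 2-sample average applied n_passes times."""
    taps = [1.0]
    for _ in range(n_passes):
        taps = convolve(taps, [0.5, 0.5])
    return taps

n_passes = 4
taps = cascaded_average_taps(n_passes)
pascal_row = [comb(n_passes, k) for k in range(n_passes + 1)]  # 1 4 6 4 1
binomial_taps = [c / sum(pascal_row) for c in pascal_row]
print(taps)
print(binomial_taps)
```

In hardware the cascade is attractive because each pass is one adder and a shift, with no multipliers needed.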
Again, the filter choice is a science of its own really. At the extreme are the decimation filters of a sigma-delta ADC. The CS5376A datasheet clearly explains what they're doing; I learned quite a bit just from reading that datasheet!

ANN-based navigation system

I am currently working on an indoor navigation system using a Zigbee WSN in star topology.
I currently have signal strength data for 60 positions in an area of approximately 15 m by 10 m. I want to use an ANN to help predict the coordinates for other positions. After going through a number of threads, I realized that normalizing the data would give me better results.
I tried that and re-trained my network a few times. I managed to get the goal parameter in the nntool of MATLAB down to the value .000745, but when I give a training sample as a test input and then scale the output back, it still gives a value that is way off.
A value of .000745 means that my data has been very closely fit, right? If yes, why this anomaly? I am dividing by the maximum value to normalize and multiplying by it to scale the value back.
Can someone please explain to me where I might be going wrong? Am I using the wrong training parameters? (I am using TRAINRP, 4 layers with 15 neurons in each layer, and giving a goal of 1e-8, gradient of 1e-6 and 100000 epochs.)
Should I consider methods other than ANN for this purpose?
Please help.
For spatial data you can always use Gaussian Process Regression. With a proper kernel you can predict pretty well, and GP regression is a pretty simple thing to do (just matrix inversion and matrix-vector multiplication). You don't have much data, so exact GP regression can easily be done. For a nice source on GP Regression check this.
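A minimal NumPy sketch of exact GP regression, to show how little is involved. Everything specific here is an assumption for illustration: the squared-exponential kernel and its parameters, the noise level, the three hypothetical node locations, and the toy path-loss model used to fabricate signal strengths.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=3.0, variance=1.0):
    """Squared-exponential kernel between two sets of points (rows are points)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean of a zero-mean GP: K_* (K + noise * I)^-1 y."""
    k_train = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_cross = rbf_kernel(x_test, x_train)
    weights = np.linalg.solve(k_train, y_train)  # solve instead of an explicit inverse
    return k_cross @ weights

# Synthetic stand-in for the survey: 60 positions in a 15 m x 10 m area.
rng = np.random.default_rng(0)
positions = rng.uniform([0.0, 0.0], [15.0, 10.0], size=(60, 2))
anchors = np.array([[0.0, 0.0], [15.0, 0.0], [7.5, 10.0]])  # hypothetical node locations
dist = np.linalg.norm(positions[:, None, :] - anchors[None, :, :], axis=-1)
rssi = -20.0 * np.log10(dist + 1.0)                          # toy path-loss model

train, test = np.arange(50), np.arange(50, 60)
mean_pos = positions[train].mean(axis=0)  # centre targets for the zero-mean GP
pred = gp_predict(rssi[train], positions[train] - mean_pos, rssi[test]) + mean_pos
print(np.abs(pred - positions[test]).mean())  # mean absolute error in metres
```

Holding out 10 of the 60 points, as done here, also gives an honest error estimate. With real measurements the kernel length scale and noise level would need to be tuned to the data.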
What did you scale? Inputs or outputs? Did you scale input and output for your training set but only the output while testing?
What kind of error measure do you use? I assume your "goal parameter" is an error measure. Is it SSE (sum of squared errors) or MSE (mean squared errors)? 0.000745 seems to be very small and usually you should have almost no error on your training data.
Your ANN architecture might be too deep with too few hidden units for an initial test. Try different architectures like 40-20 hidden units, 60 HU, 30-20-10 HU, ...
You should generate a test set to verify your ANN's generalization. Otherwise overfitting might be a problem.