Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I am looking for a hash table implementation that I can use in CUDA code. Are there any good ones out there? Something like the Python dictionary. I will use strings as my keys.
Alcantara et al. have demonstrated a data-parallel algorithm for building hash tables on the GPU. I believe the implementation was made available as part of CUDPP.
That said, you may want to reconsider your original choice of a hash table. Sorting your data by key and then performing lots of queries en masse should yield much better performance in a massively parallel setting. What problem are you trying to solve?
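To illustrate the sort-then-query idea, here is a minimal serial sketch in plain C, not a GPU implementation. The Entry type and the function names are made up for the example. The point is only that once the keys are sorted, each lookup becomes an independent binary search, which is the kind of work that maps naturally onto one thread per query.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type: a string key with an integer payload. */
typedef struct {
    const char *key;
    int value;
} Entry;

/* Comparison callback shared by qsort and bsearch: order entries by key. */
int compare_entries(const void *a, const void *b) {
    const Entry *ea = (const Entry *)a;
    const Entry *eb = (const Entry *)b;
    return strcmp(ea->key, eb->key);
}

/* Sort the table once so that every later lookup is a binary search. */
void sort_entries(Entry *entries, size_t count) {
    qsort(entries, count, sizeof(Entry), compare_entries);
}

/* Answer a whole batch of queries; results[i] is the matching entry or NULL.
 * Each iteration is independent of the others. */
void lookup_batch(const Entry *entries, size_t count,
                  const char *const *queries, size_t nqueries,
                  const Entry **results) {
    for (size_t i = 0; i < nqueries; i++) {
        Entry probe = { queries[i], 0 };
        results[i] = bsearch(&probe, entries, count, sizeof(Entry),
                             compare_entries);
    }
}
```

Call sort_entries once after loading the data, then pass all pending keys to lookup_batch in one go rather than querying one at a time.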
When I wrote an OpenCL kernel to create a simple hash table for strings, I used the hash algorithm from Java's String.hashCode(), and then just modded that over the number of rows in the table to get a row index.
Hashing function
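uint getWordHash(__global char* str, uint len) {
    // Computes the sum of str[i] * 31^(len-1-i), the same polynomial as Java's String.hashCode().
    uint hash = 0, multiplier = 1;
    for (int i = len - 1; i >= 0; i--) {
        hash += str[i] * multiplier;
        // Multiply by 31 without a multiplication: (multiplier << 5) - multiplier.
        uint shifted = multiplier << 5;
        multiplier = shifted - multiplier;
    }
    return hash;
}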
Indexing
uint hash = getWordHash(word, len);
uint row = hash % nRows;
I handled collisions manually of course, and this approach worked well when I knew the number of strings ahead of time.
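For anyone wondering what handling collisions by hand can look like, below is a minimal host-side sketch in plain C using linear probing. This is an assumption for illustration, not necessarily the scheme used in the kernel above; N_ROWS, Row, table_put and table_get are made-up names. It is serial code: a GPU version would additionally need atomic operations when claiming a row.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N_ROWS 8u   /* hypothetical table size, fixed ahead of time */

typedef struct {
    const char *key;   /* NULL marks an empty row */
    int value;
} Row;

/* Same polynomial as Java's String.hashCode(): sum of s[i] * 31^(len-1-i). */
unsigned int word_hash(const char *str, size_t len) {
    unsigned int hash = 0;
    for (size_t i = 0; i < len; i++) {
        hash = hash * 31u + (unsigned char)str[i];
    }
    return hash;
}

/* Insert or update a key; returns 0 on success, -1 if the table is full. */
int table_put(Row *rows, const char *key, int value) {
    unsigned int start = word_hash(key, strlen(key)) % N_ROWS;
    for (unsigned int step = 0; step < N_ROWS; step++) {
        Row *row = &rows[(start + step) % N_ROWS];   /* linear probing */
        if (row->key == NULL || strcmp(row->key, key) == 0) {
            row->key = key;
            row->value = value;
            return 0;
        }
    }
    return -1;
}

/* Return a pointer to the stored value, or NULL if the key is absent. */
const int *table_get(const Row *rows, const char *key) {
    unsigned int start = word_hash(key, strlen(key)) % N_ROWS;
    for (unsigned int step = 0; step < N_ROWS; step++) {
        const Row *row = &rows[(start + step) % N_ROWS];
        if (row->key == NULL) return NULL;   /* reached an empty row: not present */
        if (strcmp(row->key, key) == 0) return &row->value;
    }
    return NULL;
}
```

The strings "Aa" and "BB" hash to the same value under this function, so they make a handy test pair: the second one inserted simply lands in the next free row.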
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 2 years ago.
module sillyfunction(input logic [3:0] d0, d1, input logic s, output logic [3:0] y, z);
  assign y = d0; // does this use 1 bit or all bits of the bus?
  anothersillyfunction inst(d1, z); // when this module is fed these signals, does it use 1 bit or all bits of each bus?
endmodule
My question: when we want to operate on specific bits, we write something like "assign y[1:0] = d0[1:0];". If we don't specify any bits, what does Vivado assume? In other words, are "y" and "y[3:0]" the same? Are "assign y[3:0] = d0[3:0];" and "assign y = d0;" the same? How does the tool treat a bus when it is referred to by name only?
You should not use a part select if your intent is to select the entire range. In fact, y is not the same as y[3:0] when it comes to signed arithmetic: a part select of a variable is always unsigned, even when it covers the full range. If you declared it as
logic signed [3:0] y;
...
if (y[3:0] < 0) ... // this can never be true
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 7 years ago.
A MATLAB .mat file will be shared between 40-50 people and it will include cost numbers. This .mat is used for some elaborate calculation however the cost numbers should not be openly revealed except for a very few (1-2 people out of 40-50).
So the 1-2 people would like to keep the 'exposed' version of this .mat file
a.dim.a = 1
a.dim.b = 2
a.dim.c = 3
a.cost.x = 11
a.cost.y = 12
and then place the 'hidden' version on the shared drive for everyone else.
a.dim.a = 1
a.dim.b = 2
a.dim.c = 3
a.cost.x = ADSAUJ#$#I
a.cost.y = SDHAUWH##$
Be mindful that m-scripts work on this .mat file, so key-pair encryption isn't the right fit: this is not a situation where we're trying to keep third parties from snooping on our data. It's about making some people's lives a bit more difficult, even though they could expose the numbers if they worked hard at it. So I'd like to ask: what, in your opinion, is the best way of doing this?
The fact that the data is in a structure is not really relevant. The question is how to encrypt data, and unfortunately MATLAB doesn't have encryption functions built in. But fear not, as they are available in Java, which can be accessed from MATLAB.
You can adapt the following to your requirement:
import javax.crypto.Cipher;
% The text to encrypt.
plaintext = 'foobar';
% Use RSA
cipher = Cipher.getInstance('RSA');
% Generate a key pair
keygen = java.security.KeyPairGenerator.getInstance('RSA');
keyPair = keygen.genKeyPair();
cipher.init(Cipher.ENCRYPT_MODE, keyPair.getPrivate());
% Convert your input to bytes
plaintextUnicodeVals = uint16(plaintext);
plaintextBytes = typecast(plaintextUnicodeVals, 'int8');
% Encrypt
ciphertext = cipher.doFinal(plaintextBytes)'
% And decrypt again...
cipher.init(Cipher.DECRYPT_MODE, keyPair.getPublic());
decryptedBytes = cipher.doFinal(ciphertext);
decryptedText = char(typecast(decryptedBytes, 'uint16'))'
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I have mathematical data for which it would be very convenient to have the index start from zero, like
a=sparse([],[],[],30,1);
>> a(0)=someValueHere
Subscript indices must either be real positive integers or logicals.
but MATLAB only offers indexing that starts from 1. Is there an easy hack or trick by which I could still assign a(0), so that I don't need to create a dummy variable a0 for the value or append the value at the end?
So how can I get an assignment such as a(0) in MATLAB? Could every zero-index access catch the error and return someValueHere instead?
To get MATLAB's index to start from 0 you'll need to make a large set of object classes that emulate the regular numeric classes but behave differently with functions such as subsasgn(), subsref() etc.
Maybe someone was crazy enough to do it somewhere, but I'd expect this to take weeks to months of work to get working properly.
There is a discussion on the matlab index issue: http://www.mathworks.cn/matlabcentral/newsreader/view_thread/285566
Maybe you can write a function like
function t=C_index(x)
t = x + 1;
Then you can write something like y(C_index(0)) to get the first value in vector y.
In addition, an anonymous function such as
t=@(x) x+1
y(t(0))
should work.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
Given the signals:
f1[n] = sinc[n] {1[n+5]-1[n-5]}
f2[n] = 1-rect[n]
f3[n] = 1[n]-1[n-5]
write a program in MATLAB in which you check the following properties:
1)sinc[n]:=sin(phi*n)/phi*n;
2)(f1*f2)[n] = (f2*f1)[n];
3)f1[n]*{ f2[n] + f3[n] } = f1[n]*f2[n] + f1[n]*f3[n];
4)(f1*delta)[n] = (delta*f1)[n] = f1[n];
I'd be really grateful for any tips/ideas on how to solve this problem. :)
sinc[n]:=sin(phi*n)/phi*n;
That certainly isn't Matlab syntax, and the ; at the end makes it not look much like a question either. Anyway, you have two options. Either plot the functions to visually assess equivalence or else check the vectors. I'll demonstrate with this one, then you can try for all the others.
Firstly you need to make a sample n vector which will be your domain over which to test equivalence (i.e. the x values of your plot). I'm going to arbitrarily choose:
n = -10:0.01:10;
Also, I'm going to assume that by phi you actually meant pi, based on the Matlab definition of sinc: http://www.mathworks.com/help/signal/ref/sinc.html
So now we have two functions:
a = sinc(n);
b = sin(pi*n)./(pi*n);
b(n == 0) = 1; % sinc(0) is defined as 1; the division gives 0/0 = NaN there
a and b are now vectors with a corresponding "y" value for each element of n. You'll also notice I used a . before the /, this means element wise divide i.e. divide each element by each corresponding element rather than matrix division which is inversion followed by matrix multiplication.
Now let's plot them:
plot(n, a, n, b, 'r')
and finally to check numerical equivalence we could do this:
all(a == b)
But (and this is probably a bit out of scope for your question, but important to know) you should never check floating point numbers for exact equality like that, because the two calculations round their intermediate results differently (a consequence of how your computer stores floating point numbers). Instead, it is good practice to check that the absolute difference between the two numbers is less than some tiny threshold.
all(abs(a - b) < 0.000001)
I'll leave the rest up to you.
I'm currently working on the iPhone with Audio Units and I'm playing four tracks simultaneously. To improve the performance of my setup, I thought it would be a good idea to minimize the number of Audio Units / threads, by mixing down the four tracks into one.
With the following code I'm processing the next buffer by adding up the samples of the four tracks, keep them in the SInt16 range and add them to a temporary buffer, which will later on be copied into the ioData.mBuffers of the Audio Unit.
Although it works, I don't have the impression that this is the most efficient way to do this.
SInt16* buffer = bufferToWriteTo;
int reads = bufferSize/sizeof(SInt16);
SInt16** files = circularBuffer->files;
float tempValue;
SInt16 values[reads];
int k,j;
int numFiles=4;
for (k=0; k<reads; k++)
{
tempValue=0.f;
for (j=0; j<numFiles; j++)
{
tempValue += files[j][packetNumber];
}
if (tempValue > 32767.f) tempValue = 32767.f;
else if (tempValue < -32768.f) tempValue =- 32768.f;
values[k] = (SInt16) tempValue;
values[k] += values[k] << 16;
packetNumber++;
if (packetNumber >= totalPackets) packetNumber=0;
}
memcpy(buffer,values,bufferSize);
Any ideas or pointers to speed this up? Am I right?
The biggest improvement you can get from this code would be by not using floating point arithmetic. While the arithmetic by itself is fast, the conversions which happen in the nested loops, take a long time, especially on the ARM processor in the iPhone. You can achieve exactly the same results by using 'SInt32' instead of 'float' for the 'tempValue' variable.
Also, see if you can get rid of the memcpy() in the last line: perhaps you can fill 'buffer' directly, without going through the temporary buffer called 'values'. That saves one copy, which would be a significant improvement for such a function.
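Putting the two suggestions above together, a minimal sketch might look like the following. It is plain C with int16_t/int32_t standing in for SInt16/SInt32, the function name mix_clamped is made up, and it assumes all tracks are already aligned at sample 0, so the circular packetNumber wrap-around from the question is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mix num_tracks tracks straight into out[], accumulating in 32-bit
 * integers. int16_t/int32_t stand in for Core Audio's SInt16/SInt32. */
void mix_clamped(int16_t *out, const int16_t *const *tracks,
                 size_t num_tracks, size_t num_samples) {
    for (size_t k = 0; k < num_samples; k++) {
        int32_t sum = 0;                 /* wide enough for many 16-bit samples */
        for (size_t j = 0; j < num_tracks; j++) {
            sum += tracks[j][k];
        }
        /* Clamp to the 16-bit range, as the original float code did. */
        if (sum > INT16_MAX) sum = INT16_MAX;
        else if (sum < INT16_MIN) sum = INT16_MIN;
        out[k] = (int16_t)sum;           /* written directly, no temporary buffer */
    }
}
```

Here out would be the buffer handed to the render callback, so no memcpy() is needed afterwards.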
Other notes: the last two lines of the loop probably belong outside of the loop and the body of the nested loop should use 'k' as a second index, instead of 'packetNumber', but I'm not sure about this logic.
And the last note: you're clipping the peaks of your resulting sound. While this seems like a good idea, it will sound pretty rough. You probably want to scale the result down instead of clipping it. That is, instead of this code
for (j=0; j<numFiles; j++)
{
tempValue += files[j][packetNumber];
}
if (tempValue > 32767.f) tempValue = 32767.f;
else if (tempValue < -32768.f) tempValue =- 32768.f;
you probably want something like this:
for (j=0; j<numFiles; j++)
{
tempValue += files[j][packetNumber] / numFiles;
}
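As a slight variation on the scaling idea, here is a sketch in plain C that sums in a 32-bit integer and divides once per output sample instead of once per input sample. The name mix_averaged is made up, int16_t/int32_t stand in for SInt16/SInt32, and the tracks are assumed to be aligned at sample 0. Since the average of 16-bit samples always fits in 16 bits, no clamping branch is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mix by averaging: sum all tracks in a 32-bit integer, then scale down once.
 * The result always fits in 16 bits, so there is nothing to clamp. */
void mix_averaged(int16_t *out, const int16_t *const *tracks,
                  size_t num_tracks, size_t num_samples) {
    assert(num_tracks > 0);              /* avoid dividing by zero */
    for (size_t k = 0; k < num_samples; k++) {
        int32_t sum = 0;
        for (size_t j = 0; j < num_tracks; j++) {
            sum += tracks[j][k];
        }
        out[k] = (int16_t)(sum / (int32_t)num_tracks);
    }
}
```

The trade-off is that the mix gets quieter as tracks are added, even when only one of them is actually playing.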
Edit: and please do not forget to measure the performance before and after, to see which of the improvements had the biggest impact. This is the best way to learn about performance: trial and measurement.
A couple of pointers, even though I'm not really familiar with iPhone development.
You could unwind the inner loop. You don't need a for loop to add 4 numbers together although it might be your compiler will do this for you.
Write directly to the buffer in your for loop. memcpy at the end will do another loop to copy the buffers.
Don't use a float for tempvalue. Depending on the hardware integer math is quicker and you don't need floats for summing channels.
Remove the if/endif. Digital clipping will sound horrible anyway so try to avoid it before summing the channels together. Branching inside a loop like this should be avoided if possible.
One thing I found when writing the audio mixing routines for my app is that incrementing pointers worked much faster than indexing. Some compilers may sort this out for you (I'm not sure about the iPhone), but it certainly gave my app a big boost in these tight loops (about 30% if I recall).
eg: instead of this:
for (k=0; k<reads; k++)
{
// Use buffer[k]
}
do this:
SInt16* p=buffer;
SInt16* pEnd=buffer+reads;
while (p!=pEnd)
{
// Use *p
p++;
}
Also, I believe iPhone has some sort of SIMD (single instruction multiple data) support called VFP. This would let you perform math on a number of samples in one instruction but I know little about this on iPhone.