Emulate AVX512 VPCOMPRESSB byte packing without AVX512_VBMI2 - x86-64

I have populated a zmm register with an array of byte integers from 0-63. The numbers serve as indices into a matrix. Non-zero elements represent rows in the matrix that contain data. Not all rows will contain data.
I am looking for an instruction (or instructions) to do the same thing with a byte array in a zmm register as VPCOMPRESSQ does with a qword array in a zmm register.
Consider a situation where rows 3, 6, 7, 9 and 12 contain data and all the other rows (a total of 64 rows or fewer) are empty. Mask register k1 now contains 64 bits set to 001001001001000 ... 0.
With the VPCOMPRESSQ instruction, I can use k1 to compress the zmm register so that non-zero elements are arranged contiguously from the start of the register, but there is no equivalent instruction for bytes.
PSHUFB appears as a candidate, but the shuffle control mask must be integers, not bits. I can get an integer array 0 0 3 0 0 6, etc, but I still have elements set to zero, and I need them arranged contiguously from the start of a zmm register.
So my question is, given a zmm register with a byte array where k1 is set as shown above, how can I get the non-zero elements arranged contiguously from the beginning of the zmm register, like VPCOMPRESSQ does for qwords?
AVX512VBMI2 has VPCOMPRESSB, but that's only available in Ice Lake and later. What can be done on Skylake-avx512 / Cascade Lake?
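To pin down the behaviour being asked for, here is a minimal scalar reference model in Python. It is not an AVX-512 implementation and says nothing about which instructions to use on Skylake-avx512; it only states the expected result, so a candidate emulation can be checked against it. The function name `compress_bytes` is hypothetical, and the model assumes the zero-masking form (unused upper bytes become 0) with mask bit i corresponding to byte element i.

```python
def compress_bytes(src, mask):
    """Scalar model of a byte compress: keep src[i] where mask bit i is set,
    pack the kept bytes at the low end, and zero-fill the rest (assumed
    zero-masking behaviour)."""
    kept = [b for i, b in enumerate(src) if (mask >> i) & 1]
    return kept + [0] * (len(src) - len(kept))


# 64 byte elements holding their own index, as in the question.
indices = list(range(64))

# Rows 3, 6, 7, 9 and 12 contain data; build the mask with bit i = row i.
rows_with_data = [3, 6, 7, 9, 12]
k1 = sum(1 << row for row in rows_with_data)

packed = compress_bytes(indices, k1)
print(packed[:8])  # [3, 6, 7, 9, 12, 0, 0, 0]
```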

Related

How do I write MATLAB code for a DAC converter?

In the first step I generated a sequence of bits (0, 1).
I used the randi command x = randi([0 1],1,3) to generate random bits.
I am stuck on these two steps:
Divide the sequence into groups of 3 bits, giving 8 levels
[000, 001, 010, 011, 100, 101, 110, 111]
For each quantization level, assign an amplitude value from the range [-2, 2]
I won't provide the full source code to leave a bit of the homework for you, but I will give you some hints:
randi() creates a sequence of 0 and 1 values stored as floating point numbers.
Look at the documentation of the function bitpack. It lets you pack bits from array elements into a single byte. Be aware that you need to provide an 8-element array of "bits" to fill a byte. Use 'uint8' as the class argument.
Before passing the array of floating point numbers to bitpack, you have to convert it to a logical array using the logical() function.
Look at the documentation of linspace() to create an array with 8 elements containing your equally spaced amplitude values.
Look up the amplitude value in this array for each "digital" value.

Is there a way to write a non 8bit aligned data to a binary file in Matlab?

I need to write data to a binary file. The data as a whole is aligned to bytes, but it is made of several fragments that are not byte-aligned:
The total size of the data is 96 bits comprised of:
RGB color: 3*8 bit numbers (24bit),
1st property value: 7bit number
2nd property value: 7bit number
number of objects: 26bit number
memory offset: 32bit number.
totaling 96 bits, or 12 bytes
The reason that the data is split this way is that each number has significance and it's easier to create the data by putting the numbers separately in their correct order. I'm using fwrite for this, but the function only allows to write numbers in sizes of Bytes. I found a way around it by using a "hack":
num=red;
num=num*2^8+green;
num=num*2^8+blue;
num=num*2^7+first_prop_val;
num=num*2^7+second_prop_val;
num=num*2^26+number_of_objects;
fwrite(fid, num,'uint64');
fwrite(fid, memory_offset,'uint32');
This works because all the numbers are positive, but it's ugly. Is there a less "hacky" way to accomplish what I need?
* The property values are 7 bits wide because they range from 0 to 100; giving them an extra bit just to align the data would leave fewer bits for the object count.
For a signed 32-bit integer, you can get the binary representation using:
bi=dec2bin(typecast(int32(-23),'uint32'),32)=='1'
Signed: you can remove the leading n bits if they are all equal to the (n+1)th bit.
bw=7
assert(all(bi(1:end-bw+1)) || all(~bi(1:end-bw+1)))
bi=bi(end-bw+1:end)
For an unsigned one, use:
bi=dec2bin(uint32(23),32)=='1'
Unsigned: you can remove the leading zeros.
assert(all(~bi(1:end-bw)))
bi=bi(end-bw+1:end)
Put this into a function, convert all integers, concatenate to one binary array, cut in 32bit parts and write using uint32-format.
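For comparison, the same record layout can be sketched in Python, where integers have arbitrary precision, so a shift-and-or accumulator can hold all 96 bits at once instead of a concatenated bit array. This is only an illustration of the field layout from the question: the helper name `pack_fields`, the sample values, and the big-endian byte order are assumptions, and the byte order may differ from what fwrite produces on a given machine.

```python
def pack_fields(fields):
    """Pack (value, bit_width) pairs, most significant field first,
    into a bytes object. The total width must be a multiple of 8."""
    acc = 0
    total_bits = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        acc = (acc << width) | value  # append the field below the earlier ones
        total_bits += width
    if total_bits % 8 != 0:
        raise ValueError("total width is not a whole number of bytes")
    return acc.to_bytes(total_bits // 8, "big")  # big-endian is an assumption


# Hypothetical sample values for the 96-bit record described above.
record = pack_fields([
    (255, 8),   # red
    (0, 8),     # green
    (128, 8),   # blue
    (100, 7),   # 1st property value
    (5, 7),     # 2nd property value
    (1, 26),    # number of objects
    (16, 32),   # memory offset
])
print(len(record), record.hex())  # 12 ff0080c81400000100000010
```

The range check per field plays the same role as the assert calls in the MATLAB snippets: it rejects a value that would silently spill into its neighbour.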

How to sum matrix entries indexed by another vector's values?

Suppose I have vectors B=[1 1 2 2] and A=[5 6 7 4], where B says which entries of A need to be summed together. That is, we sum 5 and 6 to get the first entry of the result and sum 7 and 4 to get the second. If B is [1 2 1 2], then the first element of the result is 5+7 and the second is 6+4.
How can I do this in MATLAB in a general way?
A flexible and general approach would be to use accumarray().
accumarray(B',A')
The function accumulates the values in A into the positions specified by B.
Since the documentation is not simple to understand I will summarize why it is flexible. You can:
choose your accumulating function (sum by default)
specify the positions as a set of coordinates for accumulation into ND arrays
preset the dimension of the accumulated array (by default it expands to max position)
pad the non-accumulated positions with a custom value (0 by default)
make the accumulated array sparse, potentially avoiding out-of-memory errors
[sum(A(1:2:end));sum(A(2:2:end))]

Difference between Bloom filters and FM-sketches

What is the difference between bloom filters and hash sketches (also FM-sketches) and what is their use?
Hash sketches/Flajolet-Martin Sketches
Flajolet, P./Martin, G. (1985): Probabilistic counting algorithms for data base applications, in: Journal of Computer and System Sciences, Vol. 31, No. 2 (September 1985), pp. 182-209.
Durand, M./Flajolet, P. (2003): Loglog Counting of Large Cardinalities, in: Springer LNCS 2832, Algorithms ESA 2003, pp. 605–617.
Hash sketches are used to count the number of distinct elements in a set.
given:
a bit array B[] of length l
a (single) hash function h() that maps to [0, 1, ..., 2^l)
a function r() that gives the position of the least-significant 1-bit in the binary representation of its input (e.g. 000101 returns 1, 001000 returns 4)
insertion of element x:
pn := h(x) returns a pseudo-random number
apply r(pn) to get the position of the bit array to set to 1
since the output of h() is pseudo-random, counting positions from 1 as r() does, bit i is set to 1 about n/(2^i) times (n/(2^(i+1)) if positions are counted from 0)
number of distinct elements in the set:
find the position p of the right-most 0 in the bit array
p = log2(n), solve for n to get the number of distinct elements in the set;
the result might be up to 1.83 magnitudes off
usage:
in Data Mining, P2P/distributed applications, estimation of the document frequency, etc.
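The insertion and estimation steps above can be sketched in Python as follows. This is a minimal single-bitmap illustration, not the full algorithm from the cited papers: the bitmap length of 32, the use of SHA-256 as h(), and the function names are all assumptions. It follows the simplified rule p = log2(n) from the list, with p counted from 0 at the lowest position; the 1985 paper additionally applies a correction constant, which is omitted here.

```python
import hashlib

L_BITS = 32  # assumed bitmap length l


def h(x):
    """Pseudo-random hash of x in [0, 2^L_BITS); SHA-256 is an assumption."""
    digest = hashlib.sha256(str(x).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def r(value):
    """1-based position of the least-significant 1-bit, as in the text
    (000101 -> 1, 001000 -> 4). A zero input maps to the last position."""
    if value == 0:
        return L_BITS
    return (value & -value).bit_length()


def insert(bitmap, x):
    """Set the bit selected by r(h(x)); repeated items change nothing."""
    bitmap[r(h(x)) - 1] = 1


def estimate(bitmap):
    """Return 2^p, where p is the 0-based index of the lowest 0 bit."""
    p = bitmap.index(0) if 0 in bitmap else len(bitmap)
    return 2 ** p


bitmap = [0] * L_BITS
for i in range(1000):
    insert(bitmap, f"item-{i}")
print(estimate(bitmap))  # a rough power-of-two estimate of 1000
```

Because every duplicate hashes to the same bit, inserting an element twice leaves the bitmap unchanged, which is why the sketch counts distinct elements rather than insertions.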
Bloom filters
Bloom, B. H. (1970): Space/time trade-offs in hash coding with allowable errors, in: Communications of the ACM, Vol. 13, No. 7 (July 1970), pp. 422-426.
Bloom filters are used to test whether an element is a member of a set.
given:
a bit array B[] of length m
k different hash functions h_k() that map to [0,...,m-1], i.e. to one of the positions of the m-bit array
insertion of element x:
apply h_k to x (h_k(x)), for all k, i.e. you get k values
set the resulting bits in the array B to 1 (if already set to 1, don't change anything)
check if y is already in the set:
get the positions p_k to check using all the hash functions h_k (h_k(y)), i.e. for each function h_k you get a position p_k
if one of the positions p_k is set to 0 in the array B, the element y is definitely not in the set
if all positions given by p_k are 1, the element y might (!) be in the set
false positive rate is approximately (1 - e^(-kn/m))^k, no false negatives are possible!
increasing the number of hash functions decreases the false positive rate only up to a point, and every extra function makes the Bloom filter slower; the false positive rate is lowest at k = (m/n)ln(2)
usage:
in the beginning used as a cheap filter in databases to filter out elements that do not match a query
various applications today, e.g. in Google BigTable, but also in networking for IP lookups, etc.
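As a rough illustration of the insertion and lookup steps above, here is a small Python sketch. It is an assumption-laden toy, not a production filter: the class and helper names are hypothetical, and the k hash functions are simulated by salting SHA-256 with the function index.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, m, k):
        self.m = m            # number of bits in the array B
        self.k = k            # number of hash functions
        self.bits = [0] * m

    def _positions(self, item):
        """Yield k positions in [0, m-1], one per simulated hash function."""
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1  # already-set bits stay set

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[p] for p in self._positions(item))


def optimal_k(m, n):
    """k = (m/n) ln 2, rounded to a whole number of hash functions."""
    return max(1, round(m / n * math.log(2)))


def false_positive_rate(m, n, k):
    """Approximation (1 - e^(-kn/m))^k from the text."""
    return (1 - math.exp(-k * n / m)) ** k


m, n = 1000, 100
k = optimal_k(m, n)  # 7 for m/n = 10
bf = BloomFilter(m, k)
for i in range(n):
    bf.add(f"user-{i}")
print(bf.might_contain("user-42"))        # True: inserted items are never missed
print(false_positive_rate(m, n, k))       # about 0.008
```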
The Bloom Filter is a data structure used for Membership lookup while FM Sketch is primarily used for counting of elements. These two data structures provide the respective solutions optimizing over the space required to perform the lookup/computation and the trade off is the accuracy of the result.

How to generate all possible combinations of n-bit strings?

Given a positive integer n, I want to generate all possible n-bit combinations in MATLAB.
For example, if n=3, then the answer should be
000
001
010
011
100
101
110
111
How do I do it?
I actually want to store them in a matrix. I tried
for n=1:2^4
r(n)=dec2bin(n,5);
end;
but that gave the error "In an assignment A(:) = B, the number of elements in A and B must be the same."
Just loop over all integers in [0, 2^n) and print each number in binary. If you always want n digits (i.e. with leading zeros), pass the width to dec2bin:
for ii=0:2^n-1
fprintf('%s\n', dec2bin(ii, n));
end
Edit: there are a number of ways to put the results in a matrix. The easiest is to use
x = dec2bin(0:2^n-1);
which will produce a 2^n-by-n matrix of type char. Each row is one of the bit strings.
If you really want to store each string separately, use a cell array (note that MATLAB indices start at 1, hence ii+1):
x = cell(1, 2^n);
for ii=0:2^n-1
x{ii+1} = dec2bin(ii, n);
end
However, if you're looking for efficient processing, you should remember that integers are already stored in memory in binary! So, the vector:
x = 0 : 2^n-1;
contains the binary patterns in the most memory-efficient and CPU-efficient way possible. The only trade-off is that you will not be able to represent patterns with more than 32 or 64 bits using this compact representation.
This is a one-line answer to the question which gives you a double array of all 2^n bit combinations:
bitCombs = dec2bin(0:2^n-1) - '0'
There are many ways to generate these combinations. If you want to implement it with an array counter: keep an array of counters, each going from 0 to 1, one for each of the three positions (2^0, 2^1, 2^2). Let the starting number be 000 (stored in the array). Increment the counter in the first place (2^0); the number is now 001. On the next step, reset the counter at position 2^0, carry into the counter at 2^1, and keep looping until all the counters have run through their values.
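The array-counter idea can be sketched in Python as below. This is one possible reading of the description, with a hypothetical function name `bit_strings`: each array slot is a one-bit counter, and an increment resets every slot that is already 1 and carries into the first slot that is 0.

```python
def bit_strings(n):
    """Generate all n-bit strings in counting order with an array of
    one-bit counters; counter[0] is the 2^0 place."""
    counter = [0] * n
    result = []
    while True:
        # Most significant place is printed first.
        result.append("".join(str(bit) for bit in reversed(counter)))
        pos = 0
        while pos < n and counter[pos] == 1:
            counter[pos] = 0      # reset this place and carry onward
            pos += 1
        if pos == n:              # carry ran off the end: all patterns done
            return result
        counter[pos] = 1


print(bit_strings(3))
# ['000', '001', '010', '011', '100', '101', '110', '111']
```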