I am about to create a vector of size n containing zeros and ones. I want to encrypt each element of the vector, but I am wondering whether the ciphertexts reveal which elements are zeros and which are ones. Is there a specific cryptosystem in which encryptions of 0 and 1 are indistinguishable in their ciphertext form?
I think I found the answer. If we encrypt the elements of the vector with a probabilistic encryption algorithm, such as RSA with randomized padding (e.g., OAEP), we get a different ciphertext each time we encrypt the same plaintext. So we can encrypt the elements of the vector with these kinds of algorithms, and encryptions of 0 and 1 are indistinguishable; this property is called semantic security (IND-CPA).
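A classic scheme built exactly for this task is Goldwasser-Micali encryption, which encrypts single bits so that ciphertexts of 0 and 1 are indistinguishable. Here is a toy sketch; the primes are illustrative only and far too small to be secure:

```python
import random

# Toy Goldwasser-Micali encryption (illustrative only: tiny, insecure primes).
# Each bit is encrypted with fresh randomness, so two encryptions of the same
# bit almost surely produce different ciphertexts.

p, q = 19, 23      # secret key: two primes
N = p * q          # public modulus

def is_qr(a, prime):
    # Euler's criterion: a is a quadratic residue mod an odd prime
    return pow(a, (prime - 1) // 2, prime) == 1

# Public key: y, a quadratic non-residue modulo both p and q
y = next(a for a in range(2, N) if not is_qr(a, p) and not is_qr(a, q))

def encrypt(bit):
    # Fresh randomness r on every call
    r = random.randrange(2, N)
    while r % p == 0 or r % q == 0:
        r = random.randrange(2, N)
    return (pow(y, bit, N) * pow(r, 2, N)) % N

def decrypt(c):
    # c is a quadratic residue mod p exactly when the bit was 0
    return 0 if is_qr(c, p) else 1

vector = [0, 1, 1, 0, 1]
ciphertexts = [encrypt(b) for b in vector]
print([decrypt(c) for c in ciphertexts])    # recovers the original vector
print(encrypt(1), encrypt(1))               # same bit, different ciphertexts
```

The key point is the random factor r^2: without it (as in textbook, unpadded RSA), encryption is deterministic and an observer could trivially tell the 0-ciphertexts from the 1-ciphertexts by encrypting 0 and 1 themselves and comparing.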
I read
Hashing is the transformation of an arbitrary-size input into a fixed-size value. We use hashing algorithms to perform hashing operations, i.e., to generate the hash value of an input.
Vector embeddings do something superficially similar: they convert an input into a vector of fixed dimension. I'm trying to understand the difference between them.
A hash function can be any function that maps a string to an effectively random fixed-size number, with no regard for the string's meaning. When creating vector embeddings, by contrast, we use domain knowledge and the context in which the string occurred in the corpus, so that similar or related inputs are mapped to nearby vectors.
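The contrast can be sketched in a few lines. A cryptographic hash deliberately destroys similarity (the avalanche effect), while an embedding tries to preserve it; `toy_embed` below is a made-up stand-in (plain letter counts), not a real learned embedding:

```python
import hashlib

def toy_embed(s):
    # Hypothetical "embedding": one count per letter a-z.
    # A real embedding would be learned from a corpus.
    vec = [0] * 26
    for ch in s.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1
    return vec

# Near-identical inputs, completely unrelated digests:
for s in ["cat", "cats"]:
    print(s, hashlib.sha256(s.encode()).hexdigest()[:16])

# The "embeddings" differ in a single coordinate -- similarity survives:
diff = sum(abs(a - b) for a, b in zip(toy_embed("cat"), toy_embed("cats")))
print(diff)  # 1
```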
The SEAL (Simple Encrypted Arithmetic Library) uses Galois Automorphisms to enable batch computations (i.e., the addition and multiplication of many ciphertexts in parallel in one single operation).
The batching procedure is described in sections 5.6 Galois Automorphisms and 7.4 CRT Batching of the SEAL 2.3.1 manual.
In particular, the two sections above state that the following rings are isomorphic.
\prod_{i=0}^{n-1} \mathbb{Z}_t \cong \prod_{i=0}^{n-1} \mathbb{Z}_t[\zeta^{2i+1}] \cong \mathbb{Z}_t[x]/(x^n+1)
where \zeta is a primitive 2n-th root of unity modulo t.
The same sections also state that mapping plaintext tuples in \prod_{i=0}^{n-1} \mathbb{Z}_t to \mathbb{Z}_t[x]/(x^n+1) can be done using Galois automorphisms.
More precisely, an n-dimensional \mathbb{Z}_t-vector plaintext can be thought of as a 2-by-(n/2) matrix, and the Galois automorphisms correspond to rotations of the columns and rows of that matrix.
By applying the Galois automorphisms to the plaintext vector (rotating its rows and columns), one obtains a corresponding element of \mathbb{Z}_t[x]/(x^n+1), which is then used for batch computations.
My questions are the following.
1- Why is \mathbb{Z}_t[\zeta^{2i+1}] isomorphic to \mathbb{Z}_t ?
2- How are the Galois Automorphisms used precisely to map n-dimensional \mathbb{Z}_t-vector plaintexts to elements in \mathbb{Z}_t[x]/(x^n+1)?
Or stated differently, how does the Compose operation work? And how do you use Galois Automorphisms (row and column rotations) to compute it?
========================================================================
The isomorphism simply evaluates a polynomial at a root of unity to obtain an element of Z_t; this works because the relevant root of unity is itself in Z_t. The entire batching system is just a big Chinese Remainder Theorem: the batching slots are the reductions of the plaintext polynomial modulo x - zeta^(2i+1) for different i. Going back requires a standard CRT reconstruction.
In practice the CRT is implemented through the Number Theoretic Transform (FFT over a finite field) and its inverse. The Galois automorphism acts on the roots of unity by permuting them, forming two orbits. If we order the plaintext matrix slots in a way that the batching slot value corresponding to the next Galois conjugate of a primitive root is always to the left (or right) of the slot value corresponding to that primitive root, then the Galois action will permute the rows of the matrix cyclically. The two orbits can also be interchanged, which corresponds to the column rotation (swap).
Matters are further complicated by the fact that the NTT algorithm that SEAL uses results in a so-called "bit reversed" output order. This needs to be taken into account when the correct ordering of the batching values is determined before any NTT or inverse NTT can be performed.
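The CRT structure can be checked numerically with toy parameters. Everything below is an illustrative assumption, not SEAL's actual implementation: n = 4, t = 17, and zeta = 2, which is a primitive 2n-th (8th) root of unity mod 17 because 2^4 = 16 = -1 (mod 17). Decoding is evaluation at the odd powers of zeta (the roots of x^4 + 1 mod 17), encoding is the Lagrange/CRT reconstruction, and the Galois map x -> x^3 permutes the slots:

```python
n, t, zeta = 4, 17, 2
roots = [pow(zeta, 2 * i + 1, t) for i in range(n)]  # zeta^(2i+1), roots of x^4+1 mod 17

def evaluate(poly, x):
    # Evaluate poly (coefficients, lowest degree first) at x, mod t
    return sum(c * pow(x, j, t) for j, c in enumerate(poly)) % t

def decode(poly):
    # Polynomial -> slots: reduce mod (x - zeta^(2i+1)), i.e. evaluate at each root
    return [evaluate(poly, r) for r in roots]

def mul_linear(poly, s):
    # Multiply poly by (x - s), mod t
    out = [0] * (len(poly) + 1)
    for k, c in enumerate(poly):
        out[k] = (out[k] - s * c) % t
        out[k + 1] = (out[k + 1] + c) % t
    return out

def encode(slots):
    # Slots -> polynomial: CRT (Lagrange) reconstruction mod t
    poly = [0] * n
    for i, v in enumerate(slots):
        basis, denom = [1], 1
        for j, s in enumerate(roots):
            if j != i:
                basis = mul_linear(basis, s)      # vanishes at every root but roots[i]
                denom = denom * (roots[i] - s) % t
        scale = v * pow(denom, t - 2, t) % t      # v / denom mod t (Fermat inverse)
        for k, c in enumerate(basis):
            poly[k] = (poly[k] + scale * c) % t
    return poly

def ring_mul(pa, pb):
    # Multiplication in Z_t[x]/(x^n + 1): negacyclic convolution (x^n = -1)
    out = [0] * n
    for i, a in enumerate(pa):
        for j, b in enumerate(pb):
            if i + j < n:
                out[i + j] = (out[i + j] + a * b) % t
            else:
                out[i + j - n] = (out[i + j - n] - a * b) % t
    return out

def apply_galois(poly, g):
    # The automorphism x -> x^g (g odd), reducing x^k via x^n = -1
    out = [0] * n
    for j, c in enumerate(poly):
        k = (g * j) % (2 * n)
        if k < n:
            out[k] = (out[k] + c) % t
        else:
            out[k - n] = (out[k - n] - c) % t
    return out

a, b = [1, 2, 3, 4], [5, 6, 7, 8]
print(decode(ring_mul(encode(a), encode(b))))  # [5, 12, 4, 15]: pointwise products mod 17
print(decode(apply_galois(encode(a), 3)))      # a permutation of a's slots
```

One polynomial multiplication in the ring multiplies all slots at once, which is the whole point of batching; the Galois map rearranges slot values without decrypting.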
On page 3 of "Lecture 8, White Noise and Power Spectral Density" it is mentioned that rand and randn create pseudo-random numbers. Please correct me if I am wrong: a truly random sequence is one that is not reproducible; two runs never produce exactly the same values.
Pseudo-random numbers, by contrast, are deterministic: two sequences generated from the same seed are identical.
How can I create random numbers rather than pseudo-random numbers? I was under the impression that Matlab's rand and randn functions generate independent, identically distributed random numbers, but the slides say they create pseudo-random numbers, and searching for how to generate random numbers just returns the rand and randn functions.
The reason for distinguishing random numbers from pseudo-random numbers is that I need to compare the performance of a cryptographic system with (A) a random signal with white-noise characteristics and (B) a pseudo-random signal with white-noise characteristics, so (A) must be different from (B). I would be grateful for any code and the correct way to generate both random and pseudo-random numbers.
Generating "true" random numbers is a tricky exercise; see the Wikipedia article on random number generation and the tests of randomness (http://en.wikipedia.org/wiki/Random_number_generation). This site offers an RNG based on atmospheric noise (http://www.random.org/).
As mentioned above, it is really difficult (probably impossible) to create truly random numbers with computer software alone. There are numerous projects on the internet that provide real random numbers generated by physical processes (for example, the one Kostya mentioned). A particularly interesting one is the one from HU Berlin.
That being said, for experiments like the one you want to perform, Matlab's pseudo-RNGs are more than fine. Matlab's algorithms include the Mersenne Twister, one of the best-known pseudo-RNGs (I would suggest you look up the Mersenne Twister's properties). See the Matlab rng documentation.
Since you did not mention which type of system you want to simulate, one simple approach would be to use a good RNG (the Mersenne Twister) for process A and a not-so-good one for process B.
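The reproducibility distinction is easy to demonstrate (shown here in Python rather than Matlab, as an illustration; Python's random module also uses the Mersenne Twister as its core generator):

```python
import os
import random

# Pseudo-random: the same seed reproduces the identical sequence
random.seed(42)
a = [random.random() for _ in range(3)]
random.seed(42)
b = [random.random() for _ in range(3)]
print(a == b)  # True: fully deterministic given the seed

# "True" randomness must come from outside the algorithm. os.urandom draws
# from the OS entropy pool (hardware/environmental noise) -- there is no seed
# to replay, so the output differs on every run:
print(os.urandom(8).hex())
```

In Matlab the analogous seeded case is rng(42) followed by rand; there is no built-in non-deterministic source, which is why the physical-noise services mentioned above exist.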
I need to create a 16 bit hash from a 32 bit number, and I'm trying to determine if a simple modulus 2^16 is appropriate.
The hash will be used in a 2^16 entry hash table for fast lookup of the 32 bit number.
My understanding is that if the data space has a fairly even distribution, a simple mod 2^16 is fine - it shouldn't result in too many collisions.
In this case, my 32 bit number is the result of a modified adler32 checksum, using 2^16 as M.
So, in a general sense, is my understanding correct, that it's fine to use a simple mod n (where n is hashtable size) as a hashing function if I have an even data distribution?
And specifically, will adler32 give a random enough distribution for this?
Yes, if your 32-bit numbers are uniformly distributed over all possible values, then a modulo n of those will also be uniformly distributed over the n possible values.
Whether the results of your modified checksum algorithm are uniformly distributed is an entirely different question. That will depend on whether the data you are applying the algorithm to has enough data to roll over the sums several times. If you are applying the algorithm to short strings that don't roll over the sums, then the result will not be uniformly distributed.
If you want a hash function, then you should use a hash function. Neither Adler-32 nor any CRC is a good hash function. There are many very fast and effective hash functions available in the public domain. You can look at CityHash.
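The short-input problem is easy to see experimentally. This sketch uses the standard, unmodified zlib.adler32 as a stand-in for the modified checksum: the low 16 bits of Adler-32 are just 1 plus the byte sum, which never rolls over for short keys, so they occupy only a narrow band of a 2^16-entry table, while a real hash spreads them out:

```python
import hashlib
import zlib

# 5000 short keys, bucketed into a 2^16-entry table two different ways
strings = [f"key{i}" for i in range(5000)]

# Low 16 bits of Adler-32: for short inputs this is 1 + sum of the bytes,
# so all 5000 keys land in a narrow range of buckets
adler_buckets = {zlib.adler32(s.encode()) & 0xFFFF for s in strings}

# First 16 bits of a real hash function: close to uniform over 2^16
sha_buckets = {int.from_bytes(hashlib.sha256(s.encode()).digest()[:2], "big")
               for s in strings}

print(len(adler_buckets), len(sha_buckets))  # Adler fills far fewer buckets
```

With long, varied inputs that roll the sums over many times, the gap shrinks; the point is that distribution quality depends on the data, which is exactly why a purpose-built hash function is the safer choice.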
I'm working on lossless data compression in MATLAB. I wish to encode a signal about 60000 samples long. Here's my code:
function dsig = huffman(Y, fs)
% Get the array of unique symbol values
Z = unique(Y);
% Count the occurrences of each element
countElY = histc(Y, Z);
% Probability distribution of the symbols
p = countElY / numel(Y);
[dict, avglen] = huffmandict(Z, p); % Create dictionary
comp = huffmanenco(Y, dict);        % Encode the data
dsig = huffmandeco(comp, dict);     % Decode the data
sound(dsig, fs)
Problem is, for a signal of that length I exceed MATLAB's 500-step recursion limit, and the error occurs while creating the dictionary. I have already tried breaking the signal into parts, but that took a very long time even for a small portion of it. Any ideas how to make it work, apart from extending the recursion limit, which is rather pointless and time-consuming?
First you need to determine why you think it's possible to compress the data. Is the signal smooth? Is the range limited? Is the quantization limited? What makes it compressible will determine how to compress it.
Simply applying Huffman coding to a series of real values will not compress the data, since each of the values appears once, or maybe a few appear twice. Huffman depends on taking advantage of many occurrences of the same symbol, and a bias in the frequency, where some symbols are much more common than others.
Compressing a waveform would use different approaches. First would be to convert each sample to as few bits as are significant and that cover the range of inputs. Second would be to take differences between samples or use more advanced predictors to take advantage of smoothness in the waveform (if it is smooth). Third would be to find a way to group differences to encode more likely ones in fewer bits. That last step might use Huffman coding, but only after you've constructed something that Huffman coding can take advantage of.
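The effect of the second step can be sketched numerically (in Python, with a made-up smooth test signal): differencing a quantized smooth waveform concentrates the values near zero, producing exactly the skewed symbol distribution Huffman coding needs:

```python
import math
from collections import Counter

def entropy(vals):
    # Shannon entropy in bits/sample: a lower bound on Huffman's average code length
    n = len(vals)
    return -sum(c / n * math.log2(c / n) for c in Counter(vals).values())

# Hypothetical smooth, quantized signal of the length the question mentions
signal = [round(100 * math.sin(i / 20)) for i in range(60000)]

# First-order prediction: encode the differences between consecutive samples
deltas = [b - a for a, b in zip(signal, signal[1:])]

print(f"raw:   {entropy(signal):.2f} bits/sample")
print(f"delta: {entropy(deltas):.2f} bits/sample")  # far fewer symbols, lower entropy
```

The raw samples span a couple hundred distinct values, while the deltas fall in a band of about a dozen values around zero, so an entropy coder applied after differencing needs far fewer bits per sample. This also sidesteps the original error: the delta alphabet is small, so huffmandict builds a small tree instead of one node per distinct sample value.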