How to customize the output of the Postgres Pseudo Encrypt function? - postgresql

I would like to use the pseudo_encrypt function mentioned a few times on StackOverflow to make my IDs look more random: https://wiki.postgresql.org/wiki/Pseudo_encrypt
How can I customize this to output unique "random" numbers for just me? I read somewhere that you can just change the 1366.0 constant, but I don't want to take any risks with my IDs, as any duplicate IDs would cause major issues.
I really have no idea what each constant actually does, so I don't want to mess around with it unless I get some direction. Does anyone know which constants I can safely change?
Here it is:
CREATE OR REPLACE FUNCTION "pseudo_encrypt"("VALUE" int) RETURNS int IMMUTABLE STRICT AS $function_pseudo_encrypt$
DECLARE
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
BEGIN
l1:= ("VALUE" >> 16) & 65535;
r1:= "VALUE" & 65535;
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
r1 := l2;
l1 := r2;
i := i + 1;
END LOOP;
RETURN ((l1::int << 16) + r1);
END;
$function_pseudo_encrypt$ LANGUAGE plpgsql;
For bigints:
CREATE OR REPLACE FUNCTION "pseudo_encrypt"("VALUE" bigint) RETURNS bigint IMMUTABLE STRICT AS $function_pseudo_encrypt$
DECLARE
l1 bigint;
l2 bigint;
r1 bigint;
r2 bigint;
i int:=0;
BEGIN
l1:= ("VALUE" >> 32) & 4294967295::bigint;
r1:= "VALUE" & 4294967295;
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767*32767)::bigint;
r1 := l2;
l1 := r2;
i := i + 1;
END LOOP;
RETURN ((l1::bigint << 32) + r1);
END;
$function_pseudo_encrypt$ LANGUAGE plpgsql;

Alternative solution: use different ciphers
Other cipher functions are now available on postgres wiki. They're going to be significantly slower, but aside from that, they're better candidates for generating customized random-looking series of unique numbers.
For 32-bit outputs, Skip32 in plpgsql encrypts its input with a 10-byte key, so you just have to choose your own secret key to get your own specific permutation (the particular order in which the 2^32 unique values will come out).
For 64-bit outputs, XTEA in plpgsql works similarly, but with a 16-byte key.
Otherwise, to just customize pseudo_encrypt, see below:
Explanations about pseudo_encrypt's implementation:
This function has 3 properties:
global uniqueness of the output values
reversibility
pseudo-random effect
The first and second properties come from the Feistel network, and as already explained in @CodesInChaos's answer, they don't depend on the choice of these constants: 1366, 150889 and 714025.
Make sure when changing f(r1) that it stays a function in the mathematical sense, that is, x=y implies f(x)=f(y); in other words, the same input must always produce the same output. Breaking this would break the uniqueness.
The purpose of these constants and this formula for f(r1) is to produce a reasonably good pseudo-random effect. Using postgres' built-in random() or a similar method is not possible because it's not a mathematical function as described above.
Why these arbitrary constants? In this part of the function:
r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
The formula and the values 1366, 150889 and 714025 come from Numerical Recipes in C (1992, by William H. Press, 2nd ed.), chapter 7: Random Numbers, specifically p. 284 and 285.
The book is not directly indexable on the web but is readable through an interface here: http://apps.nrbook.com/c/index.html. It's also cited as a reference in various source code implementing PRNGs.
Among the algorithms discussed in this chapter, the one used above is very simple and relatively effective. The formula to get a new random number from a previous one (jran) is:
jran = (jran * ia + ic) % im;
ran = (float) jran / (float) im; /* normalize into the 0..1 range */
where jran is the current random integer.
This generator will necessarily loop over itself after a certain number of values (the "period"), so the constants ia, ic and im have to be chosen carefully for that period to be as large as possible. The book provides a table p.285 where constants are suggested for various lengths of the period.
ia=1366, ic=150889 and im=714025 is one of the entries for a period of 2^29 bits, which is way more than needed.
Finally, the multiplication by 32767 (or 2^15-1) is not part of the PRNG but is meant to produce a positive half-integer from the 0..1 pseudo-random float value. Don't change that part, unless to widen the block size of the algorithm.
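To make the formula above concrete, here is a rough Python sketch of the generator step and of the f(r1) term built on it. The function names lcg_next and round_function are made up for this illustration, and int(x + 0.5) only approximates the ::int cast, so individual values may differ slightly from what PostgreSQL returns. The point is only that the same r1 always gives the same result, in the range 0..32767.

```python
# Constants quoted in the text above (ia, ic, im in the book's notation).
IA, IC, IM = 1366, 150889, 714025


def lcg_next(jran: int) -> int:
    """One generator step: jran = (jran * ia + ic) % im."""
    return (jran * IA + IC) % IM


def round_function(r1: int) -> int:
    """Approximate port of the f(r1) term of pseudo_encrypt (assumption:
    int(x + 0.5) stands in for the ::int cast, ties may round differently)."""
    ran = lcg_next(r1) / IM          # normalized into the 0..1 range
    return int(ran * 32767 + 0.5)    # scaled to an integer in 0..32767


if __name__ == "__main__":
    for r1 in (0, 1, 2, 65535):
        print(r1, lcg_next(r1), round_function(r1))
```

Because round_function depends on nothing but its argument, it satisfies the "same input, same output" requirement described earlier.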

This function looks like a blockcipher based on a Feistel network - but it's lacking a key.
The Feistel construction is bijective, i.e. it guarantees that there are no collisions. The interesting part is: r2 := l1 # f(r1). As long as f(r1) only depends on r1 the pseudo_encrypt will be bijective, no matter what the function does.
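A small Python sketch can illustrate that claim. It uses a toy 16-bit block (two 8-bit halves) so the whole domain can be enumerated, and a round function that mixes in a key through SHA-256. The names feistel, feistel_inverse and round_function, the block size and the keyed round function are all assumptions of this example, not part of pseudo_encrypt, and the result is not meant to be a secure cipher.

```python
import hashlib

HALF_BITS = 8                      # toy block: two 8-bit halves, 16 bits total
MASK = (1 << HALF_BITS) - 1


def round_function(r: int, key: bytes) -> int:
    """Any deterministic function of r (and a fixed key) would do here."""
    digest = hashlib.sha256(key + bytes([r])).digest()
    return digest[0] & MASK


def feistel(value: int, key: bytes, rounds: int = 3) -> int:
    l, r = (value >> HALF_BITS) & MASK, value & MASK
    for _ in range(rounds):
        # swap the halves and XOR the old left half with f(right half)
        l, r = r, l ^ round_function(r, key)
    return (l << HALF_BITS) | r


def feistel_inverse(value: int, key: bytes, rounds: int = 3) -> int:
    l, r = (value >> HALF_BITS) & MASK, value & MASK
    for _ in range(rounds):
        # undo one round: the current left half is the previous right half
        l, r = r ^ round_function(l, key), l
    return (l << HALF_BITS) | r


if __name__ == "__main__":
    key = b"my secret"
    outputs = {feistel(v, key) for v in range(1 << 16)}
    print(len(outputs))            # one distinct output per input
```

Enumerating all 65536 inputs yields 65536 distinct outputs, and feistel_inverse recovers each input, whatever round_function computes.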
The lack of a key means that anybody who knows the source code can recover the sequential ID. So you're relying on security-through-obscurity.
The alternative is using a block cipher which takes a key. For 32 bit blocks there are relatively few choices, I know of Skip32 and ipcrypt. For 64 bit blocks there are many ciphers to choose from, including 3DES, Blowfish and XTEA.

Related

Ada Hash_type to integer

I'm trying to implement my own hash function as a wrapper around Strings.Hash
with an upper bound. Here is my code:
function hash (k : in tcodigo; b : in positive) return natural is
hash: Ada.Containers.Hash_Type;
begin
hash := Ada.Strings.Hash(String(k));
return integer(hash) mod b;
end hash;
My problem is that I don't know how to cast hash to Integer so I can apply the mod operation.
According to the Reference Manual (A.18.1), Hash_Type is a modular type. Therefore, a simple Integer(Hash) should work, as long as the value fits in Integer (otherwise the conversion raises Constraint_Error). Actually, I guess you can take the mod of the hash directly by first converting b to Hash_Type and then converting the result to Natural. Remember to add use type Ada.Containers.Hash_Type.
I recommend using the hashing function from the notes; if you don't have it at hand, I'll leave it here:
function hash(k: in string; b: in positive) return natural is
   s, p: natural;
begin
   s := 0;
   for i in k'range loop
      p := character'pos(k(i)); -- ASCII code
      s := (s + p) mod b;       -- keep the running sum below b
   end loop;
   return s;
end hash;
See you on Monday at the exam; good luck and study hard!

Minizinc: declare explicit set in decision variable

I'm trying to implement the 'Sport Scheduling Problem' (with a Round-Robin approach to break symmetries). The actual problem is of no importance. I simply want to declare the value at x[1,1] to be the set {1,2} and base the sets in the same column upon the first set. This is modelled as in the code below. The output is included in a screenshot below it. The problem is that the first set is not printed as a set but rather some sort of range while the values at x[2,1] and x[3,1] are indeed printed as sets and x[4,1] again as a range. Why is this? I assume that in the declaration of x that set of 1..n is treated as an integer but if it is not, how to declare it as integers?
EDIT: ONLY the first column of the output is of importance.
int: n = 8;
int: nw = n-1;
int: np = n div 2;
array[1..np, 1..nw] of var set of 1..n: x;
% BEGIN FIX FIRST WEEK $
constraint(
x[1,1] = {1, 2}
);
constraint(
forall(t in 2..np) (x[t,1] = {t+1, n+2-t} )
);
solve satisfy;
output[
"\(x[p,w])" ++ if w == nw then "\n" else "\t" endif | p in 1..np, w in 1..nw
]
Backend solver: Gecode
(Here's a summary of my comments above.)
The range syntax is simply a shorthand for contiguous values in a set: 1..8 is shorthand for the set {1,2,3,4,5,6,7,8}, and 5..6 is shorthand for the set {5,6}.
The reason for this shorthand is probably that it's often, arguably, easier to read than the full list, especially for a long list of integers, e.g. 1..1024. It also saves space in the output of solutions.
For two-element sets, e.g. {1,2}, the explicit enumeration might be clearer to read than 1..2, though I tend to prefer the shorthand version in all cases.

Split IEC 61131-3 DINT into two INT variables (PLC structured text)

I want to publish a DINT variable (dintTest) over MODBUS on a PLC to read it with the Matlab Instrument Control Toolbox. It turns out Matlab can read Modbus variables, but only as INT16. So I want to split the DINT variable into two INT variables in IEC. I found this solution, but it only allows values in the range +-(0 ... 32767^2):
dintTest := -2;
b := dintTest MOD 32767;
a := dintTest / 32767;
result := 32767 * a + b;
c := DINT_TO_INT(b); // publish over modbus
d := DINT_TO_INT(a); // publish over modbus
What would be the solution for the whole range of DINT?
Thanks!
edit:
I read with a matlab function block in simulink (requires Instrument Control Toolbox):
function Check = MBWriteHoldingRegs(Values,RegAddr)
coder.extrinsic('modbus');
m = modbus('tcpip', '192.169.237.17');
coder.extrinsic('write');
write(m,'holdingregs',RegAddr,double(Values),'int16');
Check = Values;
I would rather split the DINT into two WORDs:
VAR
diInt: DINT := -2;
dwTemp: DWORD;
w1: WORD;
w2: WORD;
END_VAR
dwTemp := DINT_TO_DWORD(diInt);
w1 := DWORD_TO_WORD(dwTemp);
w2 := DWORD_TO_WORD(SHR(dwTemp, 16));
And then I could rebuild it in Matlab.
The point here is to use bit masks and shifts rather than arithmetic.
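The split and the rebuild on the receiving side can be sketched as follows. This is Python rather than Structured Text or Matlab, and split_dint and join_dint are hypothetical helper names, but the masks and shifts carry over directly. The rebuild also accepts words that arrive as signed 16-bit values, which is an assumption about how the client delivers them.

```python
def split_dint(value: int) -> tuple:
    """Split a signed 32-bit value into (low_word, high_word), each 0..65535."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value does not fit in a DINT")
    raw = value & 0xFFFFFFFF              # two's-complement bit pattern
    return raw & 0xFFFF, (raw >> 16) & 0xFFFF


def join_dint(low_word: int, high_word: int) -> int:
    """Rebuild the signed 32-bit value from two 16-bit words.

    Masking with 0xFFFF means the words may be given either as unsigned
    (0..65535) or as signed int16 values (-32768..32767).
    """
    raw = ((high_word & 0xFFFF) << 16) | (low_word & 0xFFFF)
    return raw - 2**32 if raw >= 2**31 else raw


if __name__ == "__main__":
    low, high = split_dint(-2)
    print(hex(low), hex(high), join_dint(low, high))
```

For -2 the two words are 0xFFFE and 0xFFFF, and joining them gives -2 back, so the whole DINT range survives the round trip.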

Average value of input Data in Mitsubishi GX Works 2

I need to obtain an average value of my input signals from mitsubishi input module Q64AD. I'm working in GX Works 2 in structured text.
This is how i used to obtain average value in Codesys:
timer_sr(IN:= NOT timer_sr.Q , PT:= T#5s );
SUM1:= SUM1 + napr1;
Nsum:=Nsum + 1;
IF timer_sr.Q THEN
timer_sr(IN:= NOT timer_sr.Q , PT:= T#5s);
outsr := SUM1 /Nsum;
Nsum := 0;
SUM1 := 0;
END_IF;
napr1 - is value from module
This piece of code is not working in GX Works 2, and I think it's because SUM1 is not an INT data type, but just a Word[signed] type.
Is there a way to make SUM1 an INT type or may be there is another logic to that solution?
On other platforms it should work, but the compiler gives a warning, so I guess it will still compile? Of course, if the value is negative there will be problems.
You can convert a WORD to an INT with the IEC function WORD_TO_INT. I'm not sure how well your system follows the standard, but if it does, try the following:
WORD_TO_INT(SUM1). If SUM1 > 32767 there will be problems, as the upper bound of INT is 32767.
If this doesn't help, could you provide more details? How is it not working?
PS. WORD is an unsigned data type, not signed as you wrote.

Generate unique random numbers in Postgresql with fixed length

I need to generate unique random numbers in Postgresql with a fixed length of 13 digits.
I've found a similar thread where a sequence encrypted with "pseudo_encrypt" was used, but the returned number did not have a fixed length.
So, what I need is an encrypted random sequence with a fixed length of 13 digits, where the min value is 0000000000001 and the max value is 9999999999999.
Is it possible? If leading zeros are not possible, that's not a big problem (I think); I can add them programmatically when reading from the db, but it would be great if Postgresql could do it by itself.
-- EDIT --
Having realized a few useful things, I must change the question to better explain what I need:
I need to generate unique random numbers (bigint) in Postgresql with a fixed max length of 13 digits. Currently I'm trying to use the pseudo_encrypt function (64 bit), but the returned number obviously does not have a fixed max length of 13: in the 32-bit case the max length is 10 digits (int), and in the 64-bit case it is 19 (bigint).
So, how can I get an encrypted random sequence with a fixed max length of 13 digits, where the min value is 1 and the max value is 9999999999999?
Is it possible to modify the 64-bit pseudo_encrypt function to get this result? Or, if that's not possible, are there other methods to get a unique sequence with these requirements?
Pseudo Encrypt function (64bit)
CREATE OR REPLACE FUNCTION pseudo_encrypt(VALUE bigint) returns bigint AS $$
DECLARE
l1 bigint;
l2 bigint;
r1 bigint;
r2 bigint;
i int:=0;
BEGIN
l1:= (VALUE >> 32) & 4294967295::bigint;
r1:= VALUE & 4294967295;
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767*32767)::int;
l1 := l2;
r1 := r2;
i := i + 1;
END LOOP;
RETURN ((l1::bigint << 32) + r1);
END;
$$ LANGUAGE plpgsql strict immutable;
Tweaking the existing function for N < 64 bits values
It's relatively simple to tweak the bigint variant to reduce the output to 2^N values, where N is even and less than 64.
To get 13 decimal digits, consider the maximum even N for which 2^N has at most 13 digits. That's N=42, with 2^42=4398046511104.
The algorithm works by breaking the input value into two halves with an equal number of bits and making them flow through the Feistel network, essentially XOR'ing with the result of the round function and swapping halves at each iteration.
If, at every stage of the process, each half is limited to 21 bits, then the result combining both halves is guaranteed not to exceed 42 bits.
So here's my proposed variant:
CREATE OR REPLACE FUNCTION pseudo_encrypt42(VALUE bigint) returns bigint
AS $$
DECLARE
l1 bigint;
l2 bigint;
r1 bigint;
r2 bigint;
i int:=0;
b21 int:=(1<<21)-1; -- 21 bits mask for a half-number => 42 bits total
BEGIN
l1:= VALUE >> 21;
r1:= VALUE & b21;
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # (((((1366*r1+150889)%714025)/714025.0)*32767*32767)::int & b21);
l1 := l2;
r1 := r2;
i := i + 1;
END LOOP;
RETURN ((l1::bigint << 21) + r1);
END;
$$ LANGUAGE plpgsql strict immutable;
The input must not exceed (2^42)-1, otherwise the outputs will collide, as pseudo_encrypt42(x) = pseudo_encrypt42(x mod 2^42).
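As a quick sanity check of the 21-bit masking, here is a rough Python port of the variant above. It is only a sketch: round() stands in for the ::int cast, so individual output values may not match PostgreSQL exactly, and the explicit range check on the input is an addition of this example. Neither affects the two properties being tested, namely that outputs stay below 2^42 and that distinct inputs give distinct outputs.

```python
HALF_BITS = 21
B21 = (1 << HALF_BITS) - 1            # 21-bit mask, same role as b21 above


def round_function(r1: int) -> int:
    # Assumption: round() approximates the ::int cast of the SQL version.
    scaled = ((1366 * r1 + 150889) % 714025) / 714025.0 * 32767 * 32767
    return round(scaled) & B21


def pseudo_encrypt42(value: int) -> int:
    if not 0 <= value < (1 << 42):
        raise ValueError("input must be in 0 .. 2^42 - 1")
    l1, r1 = value >> HALF_BITS, value & B21
    for _ in range(3):
        # swap halves; the new right half is l1 XOR f(r1), at most 21 bits
        l1, r1 = r1, l1 ^ round_function(r1)
    return (l1 << HALF_BITS) + r1


if __name__ == "__main__":
    sample = [pseudo_encrypt42(x) for x in range(1, 11)]
    print([str(v).zfill(13) for v in sample])   # zero-padded to 13 digits
```

Every output is below 2^42 = 4398046511104, so it fits in 13 decimal digits, and zfill(13) shows one way to add the leading zeros on the client side.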
What can be done about the missing numbers between 2^42 and 10^13 ?
10^13 - 2^42 = 5601953488896, so that's quite a lot of missing numbers.
I don't know how to help with that in a single pass with the Feistel network. One workaround that might be acceptable, though, is to generate another set of unique values in 0..M and add 2^42 to them, so there's no risk of collision.
This other set could be obtained with the same function, just with the offset added. 4398046511104 + pseudo_encrypt42(x) is guaranteed to yield unique values between 4398046511104 and 2*4398046511104 = 8796093022208, so that's closer to the goal. The same technique could be applied with several other ranges, not even necessarily of the same size.
However, this workaround degrades the random-looking behavior: instead of having a single output range where every number can be between 0 and X, you'd get N distinct output ranges of X/N numbers. With several distinct partitions like that, it's easy to guess which partition the output will be in, just not which value inside the partition.
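The offset workaround can be sketched in Python as below. To keep the example short and quick to verify, toy_permutation is a made-up stand-in for pseudo_encrypt42 (multiplying by an odd constant modulo 2^bits is a bijection), and the names partitioned, bits and partitions are assumptions of this illustration.

```python
def toy_permutation(x: int, bits: int) -> int:
    """Stand-in for pseudo_encrypt42: a bijection on 0 .. 2^bits - 1.

    An odd multiplier is invertible modulo 2^bits, so no two inputs collide.
    """
    return (x * 2654435761 + 12345) % (1 << bits)


def partitioned(x: int, bits: int, partitions: int) -> int:
    """Permute 0 .. partitions * 2^bits - 1, one block of 2^bits at a time."""
    size = 1 << bits
    block, offset = divmod(x, size)
    if not 0 <= block < partitions:
        raise ValueError("input outside the covered range")
    # the block number becomes the added offset, so blocks never overlap
    return block * size + toy_permutation(offset, bits)


if __name__ == "__main__":
    bits, partitions = 10, 2
    outputs = {partitioned(x, bits, partitions) for x in range(partitions << bits)}
    print(len(outputs))
```

With bits = 42 and partitions = 2 the covered range would be 0 .. 8796093022207, which still fits in 13 digits. The example also makes the drawback visible: an output always lands in the same block as its input.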