Horner's method of hashing

I know how to compute the hash of a string with Horner's method, which takes three parameters, String str, int p (a prime) and int M, like this:
hashVal = ( str(0) + str(1)*M + ... + str(n)*M^n ) % p
The problem is how to recover the string str given only hashVal, p and M.
For example, if I give you hashVal = 7, p = 11 and M = 2, you have to give me a string such as "hello" (not the right answer, just a suggestion to aid understanding).
I mean that I don't know how to compute the inverse.
Thanks for your help.

You can't get a unique input from the hash you describe, since the modulo operation throws information away. If you knew the hashval was 7, M=2 and p=11, as in your example, you wouldn't know whether the sumOf(...) was 7, or 18, or so on.
Even if you did know the sum, say it was 5, then for a 2-character string you still couldn't tell whether str(0) was 1 and str(1) was 2 (1 + 2*2 = 5), or str(0) was 5 and str(1) was 0 (5 + 0*2 = 5).
Hashes are usually hard or impossible to reverse, especially uniquely. The simplest approach is brute force: hash all possible inputs and compare their outputs with the target. You'll end up with many inputs sharing the same hashVal (if p in your example is 3, there are only 3 different hashes).
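A minimal brute-force sketch of that idea in Python. It assumes (hypothetically) that each character contributes its ord() code point and that candidate strings come from a small fixed alphabet; the names horner_hash and preimages are illustrative and not from the question.

```python
from collections import defaultdict
from itertools import product


def horner_hash(codes, M, p):
    """Return (codes[0] + codes[1]*M + ... + codes[n]*M**n) % p via Horner's rule."""
    total = 0
    for c in reversed(codes):
        total = total * M + c
    return total % p


def preimages(alphabet, length, M, p):
    """Hash every string of the given length and group the strings by hash value."""
    buckets = defaultdict(list)
    for chars in product(alphabet, repeat=length):
        s = "".join(chars)
        # Assumption: a character's numeric value is its code point.
        buckets[horner_hash([ord(ch) for ch in s], M, p)].append(s)
    return buckets


# The two decompositions of 5 mentioned above collide, as expected.
same = horner_hash([1, 2], 2, 11) == horner_hash([5, 0], 2, 11)

# 16 candidate strings, but only 11 possible hash values: collisions are unavoidable.
buckets = preimages("ab", 4, M=2, p=11)
candidates_for_7 = buckets[7]
```

Every string in candidates_for_7 is an equally valid answer for hashVal = 7; nothing in the hash singles one of them out.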

SPSS/macro: split string into multiple variables

I am trying to split a string variable into multiple dummy coded variables. I used these sources to get an idea of how one would achieve this task in SPSS:
https://www.ibm.com/support/pages/making-multiple-string-variables-single-multiply-coded-field
https://www.spss-tutorials.com/spss-split-string-variable-into-separate-variables/
But when I try to adapt the first one to my needs or when I try to convert the second one to a macro, I fail.
In my dataset I have (multiple) variables that contain a comma-separated string representing different combinations of selected items (as well as missing values). For each item of a specific variable I want to create a dummy variable. If the item was selected, the new dummy variable should be 1 for that case. If it was not selected, it should be 0.
Different input variables can contain different numbers of items.
For example:
ID | VAR1 | VAR2    | DMMY1_1 | DMMY1_2 | DMMY1_3
1  | 1, 2 | 8       | 1       | 1       | 0
2  | 1    | 1, 3    | 1       | 0       | 0
3  | 3, 1 | 2, 3, 1 | 1       | 0       | 1
4  |      | 2, 8    | 0       | 0       | 0
Here is what I came up with so far ...
* DEFINE DATA.
DATA LIST /ID 1 (F) VAR1 2-5 (A) VAR2 6-12 (A).
BEGIN DATA
11, 28
21   1, 3
33, 12, 3, 1
4    2, 8
END DATA.
* MACRO SYNTAX.
* DEFINE VARIABLES (in the long run these should/will be inside the macro function, but for now I will leave them outside).
NUMERIC v1 TO v3 (F1).
VECTOR v = v1 TO v3.
STRING #char (A1).
DEFINE split_var(vr = TOKENS(1)).
!DO !#pos=1 !TO char.length(!vr).
COMPUTE #char = char.substr(!vr, !#pos, 1).
!IF (!#char !NE "," !AND !#char !NE " ") !THEN
COMPUTE v(NUMBER(!#char, F1)) = 1.
!IFEND.
!DOEND.
!ENDDEFINE.
split_var vr=VAR1.
EXECUTE.
As I got more errors than I can count, it's hard to narrow down my problem. But I think the problem has something to do with the way I use the char.length() function (and I am a bit confused when to use the bang operator).
If anyone has some insights, I would really appreciate some help :)
There is a fundamental issue to understand about SPSS macro - the macro does not read or interact in any way with the data. All the macro does is manipulate text to write syntax. The syntax created will later work on the actual data when you run it.
So, for example, your first error is using char.length(!vr) in the macro loop. You are trying to get the macro to read the data, calculate the length and use it, but that simply can't be done: the macro can only work with what you gave it.
Another example in your code: you calculate #char and then try to use it in the macro as !#char. That won't work. ! precedes only macro functions or arguments. #char, in your code, is neither, and it can't become one, because you can't read the data into the macro.
To give you a little push forward: I understand you want the macro loop to run a different number of times for each variable, but you can't use char.length(!vr) for that. I suggest instead having the macro loop as many times as necessary to be sure you can deal with the longest variable you'll need to work with.
And another general strategy hint - first, create syntax to deal with one specific variable and one specific delimiter. Once this works, start working on a macro, keeping in mind that the only purpose of the macro is to recreate the same working syntax, only changing the parameters of variable name and delimiter.
With my new understanding of the SPSS macro logic (thanks to @eli-k) the problem was quite easy to solve. Here is the working solution.
* DEFINE DATA.
DATA LIST /ID 1 (F) VAR1 2-5 (A) VAR2 6-12 (A).
BEGIN DATA
11, 28
21   1, 3
33, 12, 3, 1
4    2, 8
END DATA.
* DEFINE MACRO.
DEFINE #split_var(src_var = !TOKENS(1)
/dmmy_var_label = !DEFAULT(dmmy) !TOKENS(1)
/dmmy_var_lvls = !TOKENS(1))
NUMERIC !CONCAT(!dmmy_var_label,1) TO !CONCAT(!dmmy_var_label, !dmmy_var_lvls) (F1).
VECTOR #dmmy_vec = !CONCAT(!dmmy_var_label,1) TO !CONCAT(!dmmy_var_label, !dmmy_var_lvls).
STRING #char (A1).
LOOP #pos=1 TO char.length(!src_var).
COMPUTE #char = char.substr(!src_var, #pos, 1).
DO IF (#char NE "," AND #char NE " ").
COMPUTE #index = NUMBER(#char, F1).
COMPUTE #dmmy_vec(#index) = 1.
END IF.
END LOOP.
RECODE !CONCAT(!dmmy_var_label,1) TO !CONCAT(!dmmy_var_label, !dmmy_var_lvls) (SYSMIS=0) (ELSE=COPY).
EXECUTE.
!ENDDEFINE.
* CALL MACRO.
#split_var src_var=VAR2 dmmy_var_lvls=8.

Minizinc: declare explicit set in decision variable

I'm trying to implement the 'Sport Scheduling Problem' (with a Round-Robin approach to break symmetries). The actual problem is of no importance. I simply want to declare the value at x[1,1] to be the set {1,2} and base the sets in the same column upon the first set. This is modelled as in the code below. The output is included in a screenshot below it. The problem is that the first set is not printed as a set but rather some sort of range while the values at x[2,1] and x[3,1] are indeed printed as sets and x[4,1] again as a range. Why is this? I assume that in the declaration of x that set of 1..n is treated as an integer but if it is not, how to declare it as integers?
EDIT: ONLY the first column of the output is of importance.
int: n = 8;
int: nw = n-1;
int: np = n div 2;
array[1..np, 1..nw] of var set of 1..n: x;
% BEGIN FIX FIRST WEEK $
constraint(
x[1,1] = {1, 2}
);
constraint(
forall(t in 2..np) (x[t,1] = {t+1, n+2-t} )
);
solve satisfy;
output[
"\(x[p,w])" ++ if w == nw then "\n" else "\t" endif | p in 1..np, w in 1..nw
]
Backend solver: Gecode
(Here's a summary of my comments above.)
The range syntax is simply a shorthand for contiguous values in a set: 1..8 is shorthand for the set {1,2,3,4,5,6,7,8}, and 5..6 is shorthand for the set {5,6}.
The reason for this shorthand is probably that it is often, though arguably, easier to read than the full list, especially for a long list of integers, e.g. 1..1024. It also saves space in the output of solutions.
For two-element sets, e.g. {1,2}, the explicit enumeration might be clearer to read than 1..2, though I tend to prefer the shorthand version in all cases.
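To see why only some of the first-column sets show up as ranges, here is a small Python sketch of the display rule described above. The formatter show_set is hypothetical and only mimics the idea (contiguous sets print as lo..hi); it is not MiniZinc's actual implementation, and the exact output formatting is an assumption.

```python
def show_set(values):
    """Print a contiguous set as lo..hi and any other set as an explicit list."""
    items = sorted(values)
    if items and items == list(range(items[0], items[-1] + 1)):
        return f"{items[0]}..{items[-1]}"
    return "{" + ",".join(str(v) for v in items) + "}"


n = 8
n_periods = n // 2  # plays the role of np in the model

# First-week sets from the model: x[1,1] = {1,2}, x[t,1] = {t+1, n+2-t} for t in 2..np.
first_week = [{1, 2}] + [{t + 1, n + 2 - t} for t in range(2, n_periods + 1)]
shown = [show_set(s) for s in first_week]
```

Only the first and last sets, {1,2} and {5,6}, are contiguous, which matches the rows reported as ranges in the question.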

Scientific notation in MATLAB

Say I have an array that contains the following elements:
1.0e+14 *
1.3325 1.6485 2.0402 1.0485 1.2027 2.0615 1.7432 1.9709 1.4807 0.9012
Now, is there a way to grab 1.0e+14 * (base and exponent) individually?
If I do arr(10), then this will return 9.0120e+13 instead of 0.9012e+14.
Assuming the question is to grab any elements in the array with coefficient less than one. Is there a way to obtain 1.0e+14, so that I could just do arr(i) < 1.0e+14?
I assume you want string output.
Let a denote the input numeric array. You can do it this way, if you don't mind using evalc (a variant of eval, which is considered bad practice):
s = evalc('disp(a)');
s = regexp(s, '[\de+-\.]+', 'match');
This produces a cell array with the desired strings.
Example:
>> a = [1.2e-5 3.4e-6]
a =
1.0e-04 *
0.1200 0.0340
>> s = evalc('disp(a)');
>> s = regexp(s, '[\de+-\.]+', 'match')
s =
'1.0e-04' '0.1200' '0.0340'
Here is the original answer from Alain.
Basic math can tell you that:
floor(log10(N))
The log base 10 of a number tells you approximately how many digits before the decimal are in that number.
For instance, 99987123459823754 is 9.998E+016
log10(99987123459823754) is 16.9999441, the floor of which is 16 - which can basically tell you "the exponent in scientific notation is 16, very close to being 17".
Now you have the exponent of the scientific notation. This should allow you to get to whatever your goal is ;-).
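Applied to the array from the question, the idea can be sketched in Python as follows. The assumption here, not stated in the answer, is that the displayed common factor follows the largest magnitude in the array; common_exponent is an illustrative name.

```python
import math


def common_exponent(values):
    """Exponent of the largest magnitude in values, computed as floor(log10(.))."""
    largest = max(abs(v) for v in values)
    return math.floor(math.log10(largest))


arr = [1.3325e14, 1.6485e14, 2.0402e14, 1.0485e14, 1.2027e14,
       2.0615e14, 1.7432e14, 1.9709e14, 1.4807e14, 0.9012e14]

exponent = common_exponent(arr)   # assumed to match the exponent of the display factor
scale = 10.0 ** exponent
# Elements whose coefficient relative to the common factor is below one.
below_one = [v for v in arr if v < scale]
```

The same two steps carry over to MATLAB with floor(log10(max(abs(arr)))) and a comparison against 10^exponent.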
And depending on what you want to do with your exponent and the number, you could also define your own method. An example is described in this thread.

Turn off Warning: Extension: Conversion from LOGICAL(4) to INTEGER(4) at (1) for gfortran?

I am intentionally casting an array of boolean values to integers but I get this warning:
Warning: Extension: Conversion from LOGICAL(4) to INTEGER(4) at (1)
which I don't want. Can I either
(1) Turn off that warning in the Makefile?
or (more favorably)
(2) Explicitly make this cast in the code so that the compiler doesn't need to worry?
The code will look something like this:
A = (B.eq.0)
where A and B are both size (n,1) integer arrays. B will be filled with integers ranging from 0 to 3. I need to use this type of command again later with something like A = (B.eq.1) and I need A to be an integer array where it is 1 if and only if B is the requested integer, otherwise it should be 0. These should act as boolean values (1 for .true., 0 for .false.), but I am going to be using them in matrix operations and summations where they will be converted to floating point values (when necessary) for division, so logical values are not optimal in this circumstance.
Specifically, I am looking for the fastest, most vectorized version of this command. It is easy to write a wrapper for testing elements, but I want this to be a vectorized operation for efficiency.
I am currently compiling with gfortran, but would like whatever methods are used to also work in ifort as I will be compiling with intel compilers down the road.
update:
Both merge and where work perfectly for the example in question. I will look into performance metrics on these and select the best for vectorization. I am also interested in how this will work with matrices, not just arrays, but that was not my original question so I will post a new one unless someone wants to expand their answer to how this might be adapted for matrices.
I have not found a compiler option to solve (1).
However, the type conversion is pretty simple. The documentation for gfortran specifies that .true. is mapped to 1, and .false. to 0.
Note that the conversion is not specified by the standard, and different values could be used by other compilers. Specifically, you should not depend on the exact values.
A simple merge will do the trick for scalars and arrays:
program test
integer :: int_sca, int_vec(3)
logical :: log_sca, log_vec(3)
log_sca = .true.
log_vec = [ .true., .false., .true. ]
int_sca = merge( 1, 0, log_sca )
int_vec = merge( 1, 0, log_vec )
print *, int_sca
print *, int_vec
end program
To address your updated question, this is trivial to do with merge:
A = merge(1, 0, B == 0)
This can be performed on scalars and arrays of arbitrary dimensions. For the latter, this can easily be vectorized by the compiler. You should consult the manual of your compiler for that, though.
The where statement in Casey's answer can be extended in the same way.
Since you convert them to floats later on, why not assign them as floats right away? Assuming that A is real, this could look like:
A = merge(1., 0., B == 0)
Another method to complement @AlexanderVogt's answer is to use the where construct.
program test
implicit none
integer :: int_vec(5)
logical :: log_vec(5)
log_vec = [ .true., .true., .false., .true., .false. ]
where (log_vec)
int_vec = 1
elsewhere
int_vec = 0
end where
print *, log_vec
print *, int_vec
end program test
This will assign 1 to the elements of int_vec that correspond to true elements of log_vec and 0 to the others.
The where construct will work for any rank array.
For this particular example you could avoid the logical altogether:
A=(3-B)/3
With integer division, (3-B)/3 is 1 only when B is 0, and 0 for B from 1 to 3.
Of course this is not so good for readability, but it might be OK performance-wise.
Edit: in performance tests this is 2-3x faster than the where construct, and of course it is fully standard-conforming. In fact you can throw in an absolute value and generalize it as:
integer,parameter :: h=huge(1)
A=(h-abs(B))/h
and still beat the where loop.
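A quick check of the integer-division arithmetic, sketched in Python. Here // stands in for Fortran's integer division, which agrees with it for the non-negative operands used; eq_zero_mask is an illustrative name. The quotient (bound - b) // bound is 1 exactly when b is 0, and subtracting it from 1 gives the complementary mask.

```python
def eq_zero_mask(values, bound=3):
    """Return 1 where an entry is 0 and 0 elsewhere, assuming 0 <= entry <= bound."""
    # (bound - b) // bound equals 1 only for b == 0; any b in 1..bound gives 0.
    return [(bound - b) // bound for b in values]


B = [0, 1, 2, 3, 0]
is_zero = eq_zero_mask(B)              # mask for B == 0
is_nonzero = [1 - a for a in is_zero]  # complementary mask for B /= 0
```

To test for another value k, the same trick can be applied to abs(B - k), as long as bound is at least as large as every shifted entry.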

Is this the simplified version of this boolean expression? Or is this reviewer wrong

I tried drawing the truth tables, but one expression has 3 variables and the other has 4, so I got confused.
F = (A+B+C)(A+B+D')+B'C;
and this is the simplified version
F = A + B + C
http://www.belley.org/etc141/Boolean%20Sinplification%20Exercises/Boolean%20Simplification%20Exercise%20Questions.pdf
I think there might be something wrong with this reviewer. Or is it accurate?
By the way, is simplification different from minimizing from Sum of Minterms to Sum of Products?
Yes, it is the same.
Draw the truth table for both expressions, assuming that there are four input variables in both. The value of D will not play into the second truth table: values in cells with D=1 will match values in cells with D=0. In other words, you can think of the second expression as
F = A + B + C + (0)(D)
You will see that both tables match: the (A+B+C)(A+B+D') subexpression has zeros at ABCD = {0000, 0001, 0011}; (A+B+C) has zeros only at {0000, 0001}. Adding B'C patches the zero at 0011 in the first subexpression, so the results are equivalent.
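The equivalence can also be checked exhaustively. A small Python sketch (the function names are illustrative) that compares both expressions over all 16 combinations of A, B, C and D:

```python
from itertools import product


def original(a, b, c, d):
    """F = (A+B+C)(A+B+D') + B'C"""
    return ((a or b or c) and (a or b or not d)) or ((not b) and c)


def simplified(a, b, c, d):
    """F = A + B + C; D is accepted but ignored."""
    return a or b or c


rows = list(product([False, True], repeat=4))
mismatches = [r for r in rows if bool(original(*r)) != bool(simplified(*r))]
# Inputs where the first factor pair alone is 0, written as ABCD bit strings.
pair_zeros = ["".join(str(int(x)) for x in r)
              for r in rows
              if not ((r[0] or r[1] or r[2]) and (r[0] or r[1] or not r[3]))]
```

An empty mismatches list means the two truth tables agree on every row.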