Palindromic permutated substrings - substring

I was asked this question in a HackerEarth test and I couldn't wrap my head around even forming the algorithm.
The question is -
Count the number of substrings of a string, such that any of their permutations is a palindrome.
So, for aab, the answer is 5 - a, a, b, aa and aab (which can be permuted to form aba).
I feel this is dynamic programming, but I can't find what kind of relations the subproblems might have.
Edit:
So I think the recursive relation might be
dp[i] = dp[i-1] + 1 if str[i] has already appeared before and
substring ending at i-1 has at most 2 characters with odd frequency
else dp[i] = dp[i-1]
No idea if this is right.

I can think of O(n^2) - traverse substrings of length > 1, from indexes (0, 1) up to (0, n-1), then from (1, n-1) down to (1, 3), then from (2, 3) up to (2, n-2), then from (3, n-2) down to (3, 5)...etc.
While traversing, maintain a map of current frequency for each character, as well as totals of the number of characters with odd counts and the number of characters with even counts. Update those on each iteration and add to the total count of palindromic permuted substrings if we are on a substring with (1) odd length and only one character with odd frequency, or (2) even length and no character with odd frequency.
(Add the string length for the count of single character palindromes.)
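The counting idea above can be sketched in Python as follows. This is a minimal sketch, not the answerer's exact traversal: it restarts the odd-frequency bookkeeping at every start index instead of zig-zagging, and the function name is hypothetical. It relies on the fact that a substring can be permuted into a palindrome exactly when at most one character has an odd frequency, which covers both the odd-length and even-length cases.

```python
def count_palindromic_permutation_substrings(s: str) -> int:
    """Count substrings in which at most one character has an odd frequency."""
    total = 0
    for start in range(len(s)):
        odd = set()  # characters with odd frequency in s[start:end + 1]
        for end in range(start, len(s)):
            ch = s[end]
            # Toggle the parity of the newly added character.
            if ch in odd:
                odd.remove(ch)
            else:
                odd.add(ch)
            if len(odd) <= 1:
                total += 1
    return total
```

For the string aab from the question this counts a, a, b, aa and aab. Single characters are counted inside the loop, so the string length does not need to be added separately.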

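If I did not misunderstand your question, I tend to believe this is a math problem. Say the length of the string is n; then the answer should be n * (n+1) / 2, which is the sum of the finite series 1 + 2 + ... + n. See https://en.wikipedia.org/wiki/1_%2B_2_%2B_3_%2B_4_%2B_%E2%8B%AF for its partial sums. This is the total number of substrings, so it is the answer only if every substring counts.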
For example, string abcde, we can get substrings
a, b, c, d, e,
ab, bc, cd, de,
abc, bcd, cde,
abcd, bcde,
abcde .
You may find the answer from the way I listed the substrings.

So here is my solution that may help you.
You can get a list of every possible substring of the input by running a nested loop, and for every substring you have to check whether it can form a palindrome.
Now, how to check whether a string/substring can form a palindrome:
If more than one letter in the substring occurs an odd number of times, then it can't form a palindrome. Here is the code:
bool stringCanbeFormAPalindrome(string s)
{
    // Both must start at zero; uninitialized locals hold garbage.
    int oddValues = 0, alphabet[26] = {0};
    for(int i = 0; i < s.length(); i++)
    {
        alphabet[s[i]-'a']++; // assumes lowercase 'a'..'z' only
    }
    for(int i = 0; i < 26; i++)
    {
        if(alphabet[i] % 2 == 1)
        {
            oddValues++;
            if(oddValues > 1) return false;
        }
    }
    return true;
}
Hope that helps.

You can do it easily in O(N) time and O(N) space (treating the alphabet size, 26, as a constant).
Notice that the only thing that determines whether some permutation of a substring is a palindrome is the parity of each character's count in it. So keep a bitmask of the parity of every character over the prefix seen so far. For a substring to be valid, the mask of the prefix just before it may differ from the current mask in at most 1 bit. So at each position, add the number of earlier prefixes with the same mask, then iterate over which single bit is different and add the corresponding counts.
Here's the C++ code (assuming unordered_map is O(1) per query):
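string s;
cin>>s;
int n=s.length();
long long ans=0; // the count can reach n*(n+1)/2, which overflows int for large n
unordered_map<int,int>um;
um[0]=1; // the empty prefix has even parity for every letter
int mask=0;
for(int i=0;i<n;++i){
mask^=1<<(s[i]-'a'); // flip the parity bit of the current letter
ans+=um[mask]; // substrings ending here with no odd-count letter
for(int j=25;j>=0;--j){ // 26 lowercase letters: bits 0..25
ans+=um[mask^(1<<j)]; // substrings ending here where only letter j has an odd count
}
um[mask]++;
}
cout<<ans;
Take care of integer overflow: the answer can be as large as n*(n+1)/2, hence the long long.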

Related

Hash an 8 digit number that contains non repetitive digits from 1 to 8 only

Given that a number can contain only digits from 1 to 8 (with no repetition), and is of length 8, how can we hash such numbers without using a hashSet?
We can't just directly use the value of the number as the hash value, as the stack size of the program is limited. (By this, I mean that we can't directly make the index of an array represent our number.)
Therefore, this 8 digit number needs to be mapped to, at maximum, a 5 digit number.
I saw this answer. The hash function returns a 8-digit number, for a input that is an 8-digit number.
So, what can I do here?
There's a few things you can do. You could subtract 1 from each digit and parse it as an octal number, which will map one-to-one every number from your domain to the range [0,16777216) with no gaps. The resulting number can be used as an index into a very large array. An example of this could work as below:
function hash(num) {
  return parseInt(num
    .toString()
    .split('')
    .map(x => x - 1)
    .join(''), 8); // join first: parseInt would stop at the first comma of a stringified array
}
const set = new Array(8**8);
set[hash(12345678)] = true;
// 12345678 is in the set
Or, if you want to conserve some space and grow the data structure as you add elements, you can use a tree structure with 8 branches at every node and a maximum depth of 8. I'll leave it up to you to figure out whether you think it's worth the trouble.
Edit:
After seeing the updated question, I began thinking about how you could probably map the number to its position in a lexicographically sorted list of the permutations of the digits 1-8. That would be optimal because it gives you the theoretical 5-digit hash you want (under 40320). I had some trouble formulating the algorithm to do this on my own, so I did some digging. I found this example implementation that does just what you're looking for. I've taken inspiration from this to implement the algorithm in JavaScript for you.
function hash(num) {
const digits = num
.toString()
.split('')
.map(x => x - 1);
const len = digits.length;
const seen = new Array(len);
let rank = 0;
for(let i = 0; i < len; i++) {
seen[digits[i]] = true;
rank += numsBelowUnseen(digits[i], seen) * fact(len - i - 1);
}
return rank;
}
// count unseen digits less than n
function numsBelowUnseen(n, seen) {
let count = 0;
for(let i = 0; i < n; i++) {
if(!seen[i]) count++;
}
return count;
}
// factorial function
function fact(x) {
return x <= 0 ? 1 : x * fact(x - 1);
}
kamoroso94 gave me the idea of representing the number in octal. The number remains unique if we remove the first digit from it. So, we can make an array of length 8^7=2097152, and thus use the 7-digit octal version as index.
If this array size is bigger than the stack, then we can use only 6 digits of the input and convert them to their octal value. That gives 8^6=262144, which is pretty small. We can make a 2D array of size 8^6 by 2, so the total space used will be in the order of 2*(8^6). The second dimension has two entries: the first index means that, of the two remaining digits, the smaller one comes first, and the second index means that the bigger one comes first.
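As a rough illustration of the 7-digit variant described above, here is a minimal Python sketch. The function name is hypothetical, and it assumes the input really is an 8-digit number using each of the digits 1 to 8 exactly once. Dropping the first digit loses nothing, because it is the only digit missing from the other seven.

```python
def index_from_last_seven(num: int) -> int:
    """Map a permutation of the digits 1..8 to an index in [0, 8**7)."""
    digits = [int(c) - 1 for c in str(num)]  # shift 1..8 down to 0..7
    index = 0
    for d in digits[1:]:  # skip the first digit; the other seven determine it
        index = index * 8 + d  # accumulate as a base-8 number
    return index
```

The result can be used directly as an index into an array of length 8^7.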

Optimal String comparison method swift

What is the best algorithm to use to get a percentage similarity between two strings. I have been using Levenshtein so far, but it's not sufficient. Levenshtein gives me the number of differences, and then I have to try and compute that into a similarity by doing:
100 - (no.differences/no.characters_in_scnd_string * 100)
For example, if I test how similar "ab" is to "abc", I get around 66% similarity, which makes sense, as "ab" is 2/3 similar to "abc".
The problem I encounter, is when I test "abcabc" to "abc", I get a similarity of 100%, as "abc" is entirely present in "abcabc". However, I want the answer to be 50%, because 50% of "abcabc" is the same as "abc"...
I hope this makes some sense... The second string is constant, and I want to test the similarity of different strings to that string. By similar, I mean "cat dog" and "dog cat" have an extremely high similarity despite the difference in word order.
Any ideas?
This library implements the Damerau–Levenshtein distance and Levenshtein distance algorithms.
You can check whether these StringMetric algorithms have what you need:
https://github.com/autozimu/StringMetric.swift
Using the Levenshtein algorithm with input:
case1 - distance(abcabc, abc)
case2 - distance(cat dog, dog cat)
Output is:
distance(abcabc, abc) = 3 // which is OK if you compute the percentage from `abcabc`
distance(cat dog, dog cat) = 6 // should be 0
So in the case of abcabc and abc we get 3, which is 50% of the longer word abcabc, exactly what you want to achieve.
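To make the percentage computation concrete, here is a minimal Python sketch (the thread is about Swift, so treat it as pseudocode to port). It assumes the similarity is normalized by the length of the longer string, as suggested above; the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity_percent(a: str, b: str) -> float:
    """Similarity in percent, relative to the longer of the two strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # assumption: two empty strings count as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)
```

With this normalization, "abcabc" against "abc" comes out at 50% and "ab" against "abc" at roughly 66.7%.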
The second case, with cats and dogs: my suggestion is to split these strings into words, compare all possible combinations of them, and choose the smallest result.
UPDATE:
The second case I will describe with pseudo code, because I'm not very familiar with Swift.
get(cat dog) and split to array of words ('cat' , 'dog') //array1
get(dog cat) and split to array of words ('dog' , 'cat') //array2
var minValue = 0;
for every i-th element of `array1`
    var temp = maxIntegerValue // smallest 'distance(i, j)' found so far
    index = 0 // remember the index of the smallest temp
    for every j-th element of `array2`
        if (distance(i, j) < temp)
            temp = distance(i, j)
            index = j
    // here we have found the element of 'array2' closest to i
    // now we should delete it from 'array2'
    delete element `index` from array2
    // add temp to minValue
    minValue = minValue + temp
The workflow will be like this:
In the first iteration of the outer for statement (for the value 'cat' of array1) we get 0, because i = 0 and j = 1 are identical. Then j = 1 is removed from array2, after which array2 contains only the element dog.
In the second iteration of the outer for statement (for the value 'dog' of array1) we also get 0, because it is identical to dog in array2.
At least now you have an idea of how to deal with your problem. It is up to you how exactly you implement it; you will probably use a different data structure.
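The word-matching idea above can be sketched in Python like this. It is a greedy sketch under stated assumptions: each word of the first string is paired with its closest remaining word of the second, and words left without a partner are charged their full length. That last rule is an assumption of this sketch, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def word_order_insensitive_distance(a: str, b: str) -> int:
    """Greedily match each word of `a` to its closest unused word of `b`."""
    remaining = b.split()
    total = 0
    for word in a.split():
        if not remaining:
            total += len(word)  # assumption: an unmatched word costs its length
            continue
        # Index of the closest remaining word.
        best = min(range(len(remaining)),
                   key=lambda j: levenshtein(word, remaining[j]))
        total += levenshtein(word, remaining[best])
        del remaining[best]  # each word of `b` may be used only once
    total += sum(len(w) for w in remaining)  # leftover words of `b`
    return total
```

For "cat dog" and "dog cat" this gives 0, as desired. Greedy matching is not guaranteed to find the best overall pairing, so it is only a starting point.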

binary to decimal in objective-c

I want to convert the decimal number 27 into binary in such a way that first the digit 2 is converted and its binary value is placed in an array, and then the digit 7 is converted and its binary value is placed in that array. What should I do?
thanks in advance
That's called binary-coded decimal. It's easiest to work right-to-left. Take the value modulo 10 (% operator in C/C++/ObjC) and put it in the array. Then integer-divide the value by 10 (/ operator in C/C++/ObjC). Continue until your value is zero. Then reverse the array if you need most-significant digit first.
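The right-to-left procedure described above can be sketched as follows, in Python rather than Objective-C, so treat it as pseudocode to port. It assumes a non-negative integer and represents each decimal digit as a 4-bit string; the function name is hypothetical.

```python
def to_bcd(value: int) -> list:
    """Return the binary-coded-decimal digits of `value`, most significant first."""
    if value == 0:
        return ["0000"]
    digits = []
    while value > 0:
        digits.append(format(value % 10, "04b"))  # lowest decimal digit as 4 bits
        value //= 10                               # integer-divide to drop that digit
    digits.reverse()  # digits were collected least significant first
    return digits
```

For the number 27 from the question, this yields the digit 2 as 0010 followed by the digit 7 as 0111.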
If I understand your question correctly, you want to go from 27 to an array that looks like {0010, 0111}.
If you understand how base systems work (specifically the decimal system), this should be simple.
First, you find the remainder of your number when divided by 10. Your number 27 in this case would result with 7.
Then you integer divide your number by 10 and store it back in that variable. Your number 27 would result in 2.
How many times do you do this?
You do this until you have no more digits.
How many digits can you have?
Well, if you think about the number 100, it has 3 digits because the number needs to remember that one 10^2 exists in the number. On the other hand, 99 does not.
The answer to the previous question is 1 + floor of Log base 10 of the input number.
Log of 100 is 2, plus 1 is 3, which equals number of digits.
Log of 99 is a little less than 2, but flooring it is 1, plus 1 is 2.
In java it is like this:
int input = 27;
int number = 0;
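int numDigits = (int) Math.floor(Log(10, input)) + 1; // cast needed: Math.floor returns a double; floating-point rounding in Log can undercount for exact powers of 10 such as 1000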
int[] digitArray = new int [numDigits];
for (int i = 0; i < numDigits; i++) {
number = input % 10;
digitArray[numDigits - i - 1] = number;
input = input / 10;
}
return digitArray;
Java doesn't have a Log function that is portable for any base (it has it for base e), but it is trivial to make a function for it.
double Log( double base, double value ) {
return Math.log(value)/Math.log(base);
}
Good luck.

Generate a hash sum for several integers

I am facing the problem of having several integers, and I have to generate one using them. For example.
Int 1: 14
Int 2: 4
Int 3: 8
Int 4: 4
Hash Sum: 43
I have some restrictions on the values: the maximum value that an attribute can have is 30, and the sum of all of them is always 30. The attributes are always positive.
The key is that I want to generate the same hash sum for similar integers, for example if I have the integers, 14, 4, 10, 2 then I want to generate the same hash sum, in the case above 43. But of course if the integers are very different (4, 4, 2, 20) then I should have a different hash sum. Also it needs to be fast.
Ideally I would like the output of the hash sum to be between 0 and 512, and it should be evenly distributed. With my restrictions I can have around 5K different possibilities, so what I would like to have is around 10 per bucket.
I am sure there are many algorithms that do this, but I could not find a way of googling this thing. Can anyone please post an algorithm to do this?.
Some more information
The whole thing with this is that those integers are attributes for a function. I want to store the values of the function in a table, but I do not have enough memory to store all the different options. That is why I want to generalize between similar attributes.
The reason why 10, 5, 15 is totally different from 5, 10, 15 is that, if you imagine this in 3D, they are two totally different points.
Some more information 2
Some answers try to solve the problem using hashing, but I do not think it is that complex. Thanks to one of the comments I have realized that this is a clustering problem. If we have only 3 attributes and we imagine the problem in 3D, all I need is to divide the space into blocks.
In fact this can be solved with rules of this type
if (att[0] < 5 && att[1] < 5 && att[2] < 5 && att[3] < 5)
Block = 21
if ( (5 < att[0] < 10) && (5 < att[1] < 10) && (5 < att[2] < 10) && (5 < att[3] < 10))
Block = 45
The problem is that I need a fast and general way to generate those ifs; I cannot write out all the possibilities.
The simple solution:
Convert the integers to strings separated by commas, and hash the resulting string using a common hashing algorithm (md5, sha, etc).
If you really want to roll-your-own, I would do something like:
Generate large prime P
Generate random numbers 0 < a[i] < P (for each dimension you have)
To generate hash, calculate: sum(a[i] * x[i]) mod P
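A minimal Python sketch of the roll-your-own scheme above. The prime 8191, the fixed seed and the final reduction to 512 buckets are assumptions of this sketch rather than part of the answer, and the function name is hypothetical. Note that a hash like this spreads inputs out; it does not by itself put similar inputs into the same bucket.

```python
import random

P = 8191  # a prime (2**13 - 1); any sufficiently large prime would do


def make_hash(dimensions: int, buckets: int = 512, seed: int = 0):
    """Build a hash of the form sum(a[i] * x[i]) mod P, reduced to `buckets`."""
    rng = random.Random(seed)  # fixed seed so the coefficients are reproducible
    coeffs = [rng.randrange(1, P) for _ in range(dimensions)]  # 0 < a[i] < P

    def hash_values(xs):
        return sum(a * x for a, x in zip(coeffs, xs)) % P % buckets

    return hash_values
```

A hash for the four attributes would be created once with make_hash(4) and then called with each list of attributes.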
Given the inputs a, b, c, and d, each ranging in value from 0 to 30 (5 bits), the following will produce a number in the range of 0 to 255 (8 bits).
bucket = ((a & 0x18) << 3) | ((b & 0x18) << 1) | ((c & 0x18) >> 1) | ((d & 0x18) >> 3)
Whether the general approach is appropriate depends on how the question is interpreted. The 3 least significant bits are dropped, grouping 0-7 in the same set, 8-15 in the next, and so forth.
0-7,0-7,0-7,0-7 -> bucket 0
0-7,0-7,0-7,8-15 -> bucket 1
0-7,0-7,0-7,16-23 -> bucket 2
...
24-30,24-30,24-30,24-30 -> bucket 255
Trivially tested with:
for (int a = 0; a <= 30; a++)
for (int b = 0; b <= 30; b++)
for (int c = 0; c <= 30; c++)
for (int d = 0; d <= 30; d++) {
int bucket = ((a & 0x18) << 3) |
((b & 0x18) << 1) |
((c & 0x18) >> 1) |
((d & 0x18) >> 3);
printf("%d, %d, %d, %d -> %d\n",
a, b, c, d, bucket);
}
You want a hash function that depends on the order of inputs and where similar sets of numbers will generate the same hash? That is, you want 50 5 5 10 and 5 5 10 50 to generate different values, but you want 52 7 4 12 to generate the same hash as 50 5 5 10? A simple way to do something like this is:
long hash = 13;
for (int i = 0; i < array.length; i++) {
hash = hash * 37 + array[i] / 5;
}
This is imperfect, but should give you an idea of one way to implement what you want. It will treat the values 50 - 54 as the same value, but it will treat 49 and 50 as different values.
If you want the hash to be independent of the order of the inputs (so the hash of 5 10 20 and 20 10 5 are the same) then one way to do this is to sort the array of integers into ascending order before applying the hash. Another way would be to replace
hash = hash * 37 + array[i] / 5;
with
hash += array[i] / 5;
EDIT: Taking into account your comments in response to this answer, it sounds like my attempt above may serve your needs well enough. It won't be ideal, nor perfect. If you need high performance you have some research and experimentation to do.
To summarize, order is important, so 5 10 20 differs from 20 10 5. Also, you would ideally store each "vector" separately in your hash table, but to handle space limitations you want to store some groups of values in one table entry.
An ideal hash function would return a number evenly spread across the possible values based on your table size. Doing this right depends on the expected size of your table and on the number of and expected maximum value of the input vector values. If you can have negative values as "coordinate" values then this may affect how you compute your hash. If, given your range of input values and the hash function chosen, your maximum hash value is less than your hash table size, then you need to change the hash function to generate a larger hash value.
You might want to try using vectors to describe each number set as the hash value.
EDIT:
Since you haven't described why you don't want to run the function itself, I'm guessing it's long-running. You also haven't described the breadth of the argument set.
If every value is expected then a full lookup table in a database might be faster.
If you're expecting repeated calls with the same arguments and little overall variation, then you could look at memoizing so only the first run for a argument set is expensive, and each additional request is fast, with less memory usage.
You would need to define what you mean by "similar". Hashes are generally designed to create unique results from unique input.
One approach would be to normalize your input and then generate a hash from the results.
Generating the same hash sum is called a collision, and is a bad thing for a hash to have. It makes it less useful.
If you want similar values to give the same output, you can divide the input by however close you want them to count. If the order makes a difference, use a different divisor for each number. The following function does what you describe:
int SqueezedSum( int a, int b, int c, int d )
{
return (a/11) + (b/7) + (c/5) + (d/3);
}
This is not a hash, but does what you describe.
You want to look into geometric hashing. In "standard" hashing you want
1. a short key
2. inverse resistance
3. collision resistance
With geometric hashing you substitute number 3 with something which is almost the opposite; namely, close initial values give close hash values.
Another way to view my problem is using multidimensional scaling (MS). In MS we start with a matrix of items, and what we want is to assign each item a location in an N-dimensional space, reducing the number of dimensions in this way.
http://en.wikipedia.org/wiki/Multidimensional_scaling

how can I count the number of set bits in a uint in specman?

I want to count the number of set bits in a uint in Specman:
var x: uint;
gen x;
var x_set_bits: uint;
x_set_bits = ?;
What's the best way to do this?
One way I've seen is:
x_set_bits = pack(NULL, x).count(it == 1);
pack(NULL, x) converts x to a list of bits.
count acts on the list and counts all the elements for which the condition holds. In this case the condition is that the element equals 1, which comes out to the number of set bits.
I don't know Specman, but another way I've seen this done looks a bit cheesy, but tends to be efficient: Keep a 256-element array; each element of the array consists of the number of bits corresponding to that value. For example (pseudocode):
bit_count = [0, 1, 1, 2, 1, ...]
Thus, bit_count[2] == 1, because the value 2, in binary, has a single "1" bit. Similarly, bit_count[255] == 8.
Then, break the uint into bytes, use the byte values to index into the bit_count array, and add the results. Pseudocode:
total = 0
for byte in list_of_bytes
total = total + bit_count[byte]
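The table lookup above can be sketched in Python (not Specman) as follows. It builds the 256-entry table programmatically instead of writing it out, and assumes a non-negative integer input; the names are hypothetical.

```python
# Entry v holds the number of 1 bits in the byte value v.
BIT_COUNT = [bin(v).count("1") for v in range(256)]


def count_set_bits(x: int) -> int:
    """Count set bits by looking up one byte at a time."""
    total = 0
    while x:
        total += BIT_COUNT[x & 0xFF]  # look up the lowest byte
        x >>= 8                       # move on to the next byte
    return total
```

For a 32-bit uint the loop runs at most four times.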
EDIT
This issue shows up in the book Beautiful Code, in the chapter by Henry S. Warren. Also, Matt Howells shows a C-language implementation that efficiently calculates a bit count. See this answer.