Separate Chaining: How many key comparisons do unsuccessful searches take? - hash

I can't find any clear information on how many key comparisons an unsuccessful search takes (using linked lists for chaining).
Maybe someone can explain it to me with the help of the following example:
h(k) = k mod m; m = 4
0 || - |
1 || 9 |
2 || > | 6 | 10 |
3 || > | 7 |
Let's say I search for 4, does it take 0 or 1 key comparisons to realise the value is not in the table?
If I searched for 14 how many key comparisons would that take? 2? 3?
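A minimal sketch of the counting, assuming h(k) = k mod m (which matches the slots shown above), unsorted chains, and the common convention that only comparisons against stored keys are counted. Some textbooks also count the check of the empty slot or the terminating null pointer, which would add one to each figure. The function names are made up for illustration.

```python
def build_table(keys, m):
    """Insert keys into m chains using h(k) = k mod m (append at the tail)."""
    table = [[] for _ in range(m)]
    for k in keys:
        table[k % m].append(k)
    return table


def search(table, key):
    """Return (found, comparisons), counting only comparisons with stored keys."""
    chain = table[key % len(table)]
    comparisons = 0
    for stored in chain:
        comparisons += 1
        if stored == key:
            return True, comparisons
    # The whole chain was walked without a match.
    return False, comparisons


table = build_table([9, 6, 10, 7], m=4)
print(search(table, 4))   # slot 0 is empty      -> (False, 0)
print(search(table, 14))  # slot 2 holds 6 and 10 -> (False, 2)
```

Under this convention an unsuccessful search costs exactly the length of the chain it lands in.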

Relational database design to represent similarity between rows of same table

For background purposes: I'm using PostgreSQL with SQLAlchemy (Python).
Given a table of unique references as such:
references_table
-----------------------
id | reference_code
-----------------------
1 | CODEABCD1
2 | CODEABCD2
3 | CODEWXYZ9
4 | CODEPOIU0
...
In a typical scenario, I would have a separate items table:
items_table
-----------------------
id | item_descr
-----------------------
1 | `Some item A`
2 | `Some item B`
3 | `Some item C`
4 | `Some item D`
...
In such typical scenario, the many-to-many relationship between references and items is set in a junction table:
references_to_items
-----------------------
ref_id (FK) | item_id (FK)
-----------------------
1 | 4
2 | 1
3 | 2
4 | 1
...
In that scenario, it is easy to model and obtain all references that are associated to the same item, for instance item 1 has references 2 and 4 as per table above.
However, in my scenario, there is no items_table. But I would still want to model the fact that some references refer to the same (non-represented) item.
I see a possibility to model that via a many-to-many junction table as such (associating FKs of the references table):
reference_similarities
-----------------------
ref_id (FK) | ref_id_similar (FK)
-----------------------
2 | 4
2 | 8
2 | 9
...
Where references with ID 2, 4, 8 and 9 would be considered 'similar' for the purposes of my data model.
However, the inconvenience here is that this model requires choosing one reference (id=2 above) as a 'pivot', to which the others are declared 'similar' in the reference_similarities table. The remaining similarities are only implied: ref 2 is similar to 4 and ref 2 is similar to 8 ==> thus 4 is similar to 8.
So the question is: is there a better design that doesn't involve having a 'pivot' FK as above?
Ideally, I would store the 'similarity' as an Array of FKs as such:
reference_similarities
------------------------
id | ref_ids (Array of FKs)
------------------------
1 | [2, 4, 8, 9]
2 | [1, 3, 5]
..but I understand from https://dba.stackexchange.com/questions/60132/foreign-key-constraint-on-array-member that it is currently not possible to have foreign keys in PostgreSQL arrays. So I'm trying to figure out a better design for this model.
As I understand it, you want to group items into a set and be able to query the whole set from any item in it.
You can hash the set with a hash function and then use the hash as the pivot value.
For example, the set of values (2, 4, 8, 9) is hashed like this:
hash = (((31*1 + 2)*31 + 4)*31 + 8)*31 + 9 = 987204
See Arrays.hashCode in Java for how to hash a list of values:
int result = 1;
for (Object element : a)
result = 31 * result + (element == null ? 0 : element.hashCode());
Table reference_similarities:
reference_similarities
-----------------------
ref_id (FK) | hash_value
-----------------------
2 | hash(2, 4, 8, 9) = 987204
4 | 987204
8 | 987204
9 | 987204
To query the set, first look up the hash_value for a ref_id, then fetch all ref_ids that share that hash_value.
The drawback of this solution is that every time you add a new value to a set, you have to rehash the set and update the hash_value of every row in it.
Another solution is to write a function in Python that produces a unique hash_value whenever a new set is created.
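A rough Python sketch of the idea, assuming integer ref_ids. The names set_hash and similar_refs are made up for illustration, and the rows are held in a plain list rather than a real table. Unlike Java's int arithmetic, Python integers do not wrap at 32 bits, so the values only agree with Arrays.hashCode while they stay small.

```python
def set_hash(ref_ids):
    """Hash a group of ids the way Arrays.hashCode combines elements.

    The ids are sorted first so the same set always gives the same hash,
    regardless of the order in which it was supplied.
    """
    result = 1
    for ref_id in sorted(ref_ids):
        result = 31 * result + ref_id
    return result


def similar_refs(rows, ref_id):
    """Two-step lookup: ref_id -> hash_value -> all ref_ids with that hash."""
    hash_by_ref = dict(rows)
    if ref_id not in hash_by_ref:
        return []
    wanted = hash_by_ref[ref_id]
    return sorted(r for r, h in rows if h == wanted)


# Stand-in for the reference_similarities table: (ref_id, hash_value) rows.
rows = []
for group in ([2, 4, 8, 9], [1, 3, 5]):
    h = set_hash(group)
    rows.extend((ref_id, h) for ref_id in group)

print(set_hash([9, 2, 8, 4]))   # 987204
print(similar_refs(rows, 8))    # [2, 4, 8, 9]
```

If the hash only serves as a group label, any value that is unique per group would do just as well and would avoid the rehash when a member is added.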

TypeAhead - begins-with full text search

I'm implementing a simple search in postgresql that will be used to retrieve typeahead results on a web page. So, I need the last argument to use starts-with matching, since the user may not have finished typing the word. When I construct my tsquery, I'm adding :* to the last argument. Here's a sample query:
SELECT id, key, name
FROM principal,
to_tsvector(key || ' ' || name) vector,
to_tsquery('investig:*') query
WHERE vector @@ query
ORDER BY ts_rank(vector, query) DESC
While typing the word "investigate", I get the following behavior:
Input | Result Count
==========================
i | 0
in | 0
inv | 8
inve | 8
inves | 8
invest | 8
investi | 7
investig | 7
investiga | 0
investigat | 0
investigate | 7
This is better than if I omit the :*, but not good enough. Why do I get 0 results for investiga when investigate returns 7 results? Is there a better way to construct my query to make sure I get everything that begins with a search term?
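For reference, the query-string construction described above (append :* to the last term only) could be sketched on the application side like this. The helper name is hypothetical, the input is assumed to be plain words, and the result is meant to be passed to to_tsquery as a bound parameter. It only illustrates the construction step and does not by itself explain the 0-result rows in the table.

```python
import re


def build_prefix_tsquery(user_input: str) -> str:
    """Turn raw typeahead input into a tsquery string.

    All words are AND-ed together and only the last word gets the
    prefix-match marker ':*', since it may still be incomplete.
    """
    # Keep word characters only, so tsquery operators typed by the user
    # (&, |, !, :, parentheses) never reach to_tsquery.
    words = re.findall(r"\w+", user_input.lower())
    if not words:
        return ""
    words[-1] += ":*"
    return " & ".join(words)


print(build_prefix_tsquery("investig"))           # investig:*
print(build_prefix_tsquery("Fraud  investiga"))   # fraud & investiga:*
```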

PostgreSQL - random primary key

I need a primary key for a PostgreSQL table. The ID should consist of a number made up of about 20 numbers.
I am a beginner with databases and have not worked with PostgreSQL before. I found some examples of random IDs, but those examples used characters and I need only an integer.
Can anyone help me to resolve this problem?
I'm guessing you actually mean random 20 digit numbers, because a random number between 1 and 20 would rapidly repeat and cause collisions.
What you need probably isn't actually a random number, it's a number that appears random, while actually being a non-repeating pseudo-random sequence. Otherwise your inserts will randomly fail when there's a collision.
When I wanted to do something like this a while ago I asked the pgsql-general list and got a very useful piece of advice: use a Feistel cipher over a normal sequence. See this useful wiki example. Credit to Daniel Vérité for the implementation.
Example:
postgres=# SELECT n, pseudo_encrypt(n) FROM generate_series(1,20) n;
n | pseudo_encrypt
----+----------------
1 | 1241588087
2 | 1500453386
3 | 1755259484
4 | 2014125264
5 | 124940686
6 | 379599332
7 | 638874329
8 | 898116564
9 | 1156015917
10 | 1410740028
11 | 1669489846
12 | 1929076480
13 | 36388047
14 | 295531848
15 | 554577288
16 | 809465203
17 | 1066218948
18 | 1326999099
19 | 1579890169
20 | 1840408665
(20 rows)
These aren't 20 digits, but you can pad them by multiplying them and truncating the result, or you can modify the Feistel cipher function to produce larger values.
To use this for key generation, just write:
CREATE SEQUENCE mytable_id_seq;
CREATE TABLE mytable (
id bigint primary key default pseudo_encrypt(nextval('mytable_id_seq')),
....
);
ALTER SEQUENCE mytable_id_seq OWNED BY mytable.id;
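To illustrate why a Feistel construction never collides, here is a small Python sketch. It is not the wiki's pseudo_encrypt and will not reproduce the numbers above: the function name and the round function are arbitrary assumptions. The point is only that each round maps (left, right) to (right, left XOR f(right)), which is reversible, so distinct inputs always give distinct outputs.

```python
def feistel_permute(value: int, rounds: int = 3) -> int:
    """Map a 32-bit integer to another 32-bit integer, one-to-one."""
    if not 0 <= value < 2**32:
        raise ValueError("value must fit in 32 bits")
    left = (value >> 16) & 0xFFFF
    right = value & 0xFFFF
    for _ in range(rounds):
        # Arbitrary 16-bit round function (an assumption for this sketch);
        # it does not need to be invertible for the network to be.
        f = ((1366 * right + 150889) % 714025) & 0xFFFF
        left, right = right, left ^ f
    return (left << 16) | right


# Feeding it a plain counter gives scrambled-looking but unique ids.
ids = [feistel_permute(n) for n in range(1, 21)]
print(ids)
print(len(set(ids)) == len(ids))  # True: no collisions
```

Because the mapping is a permutation of the 32-bit range, uniqueness of the underlying sequence carries over to the generated keys.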

Check if field value is in a list of strings in SSRS report

I'm using SSRS (VS2008) and creating a report of work orders. In the detail line of the report table, I have the following columns (with some fake data)
WONUM | A | B | Hours
ABC123 | 3 | 0 | 3
SPECIAL| 0 | 6 | 6
DEF456 | 5 | 0 | 5
GHI789 | 4 | 0 | 4
OTHER | 0 | 2 | 2
As you can kind of see, all work orders have a work order number (WONUM) as well as a total # of hours (HOURS). I need to put the hours into either column A or column B based on WONUM. I have a list of specifically named work orders (in the example, they would be "SPECIAL" and "OTHER") which would cause the HOURS value to be put in column B. If the WONUM is NOT a special named one, then it goes in column A. Here's what I WANTED to put as the expression for column A and column B:
Column A: =IIF(Fields!WONUM.Value IN ("SPECIAL","OTHER"), 0, Fields!Hours.Value)
Column B: =IIF(Fields!WONUM.Value IN ("SPECIAL","OTHER"), Fields!Hours.Value, 0)
But as you're probably aware, Fields!WONUM.Value IN ("SPECIAL","OTHER") is not a valid method of doing this! What is the best way to make this work? I cannot flag it in the SQL query in any other way for other reasons so it must be done in the table.
Thanks in advance for any and all help!
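Try this, using the InStr() function. Note that InStr() matches substrings, so a WONUM that merely contains "SPECIAL" or "OTHER" would also match:
Column A: =IIF(InStr(Fields!WONUM.Value,"SPECIAL")>0 OR InStr(Fields!WONUM.Value,"OTHER")>0, 0, Fields!Hours.Value)
Column B: =IIF(InStr(Fields!WONUM.Value,"SPECIAL")>0 OR InStr(Fields!WONUM.Value,"OTHER")>0, Fields!Hours.Value, 0)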
If it's just the two WONUMs then you can do this:
Column A:
=IIF((Fields!WONUM.Value <> "SPECIAL") AND (Fields!WONUM.Value <> "OTHER"), Fields!Hours.Value, 0)
Column B:
=IIF((Fields!WONUM.Value = "SPECIAL") OR (Fields!WONUM.Value = "OTHER"), Fields!Hours.Value, 0)
or use the same formula in each column for consistency and swap the field/0 at the end.
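The column logic itself, separate from SSRS expression syntax, can be sketched in Python to show the difference between the exact-match and substring-match variants above. SPECIAL_WONUMS and split_hours are illustrative names, and "NOTSPECIAL1" is a made-up work order number.

```python
SPECIAL_WONUMS = {"SPECIAL", "OTHER"}


def split_hours(wonum, hours):
    """Return (column_a, column_b) for one work order row."""
    if wonum in SPECIAL_WONUMS:   # exact match, like the = / <> version
        return 0, hours
    return hours, 0


def split_hours_substring(wonum, hours):
    """Same split, but using substring matching like InStr() > 0."""
    if any(name in wonum for name in SPECIAL_WONUMS):
        return 0, hours
    return hours, 0


print(split_hours("ABC123", 3))                 # (3, 0)
print(split_hours("SPECIAL", 6))                # (0, 6)
print(split_hours("NOTSPECIAL1", 4))            # (4, 0)
print(split_hours_substring("NOTSPECIAL1", 4))  # (0, 4)
```

The two variants only disagree on work order numbers that contain one of the special names as part of a longer string.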

How to remove column-duplicates from the query result using entity-framework?

On my database table I have
Key | Value
a | 1
a | 2
b | 11
c | 1
d | 2
b | 3
But I just need to get the items which keys are not duplicates of the previous rows. The desired result should be:
Key | Value
a | 1
b | 11
c | 1
d | 2
How could we get the desired result using entity-framework?
Note: we need the first value. Thank you very much.
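var q = from e in Context.MyTable
        group e by e.Key into g
        select new
        {
            Key = g.Key,
            // OrderBy(v => v.Value) picks the row with the smallest Value per key
            // (for b that is 3, not 11); returning the first row as stored would
            // need a column that records row order.
            Value = g.OrderBy(v => v.Value).FirstOrDefault()
        };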
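For comparison, the "first row per key" behaviour the question asks for can be sketched outside of Entity Framework in a few lines of Python. This assumes the rows arrive in the order shown in the question, which a database will not guarantee without an explicit ordering column. The function name is made up for illustration.

```python
def first_per_key(rows):
    """Keep only the first (key, value) pair seen for each key."""
    first = {}
    for key, value in rows:
        # setdefault stores the value only if the key is not present yet.
        first.setdefault(key, value)
    return list(first.items())


rows = [("a", 1), ("a", 2), ("b", 11), ("c", 1), ("d", 2), ("b", 3)]
print(first_per_key(rows))
# [('a', 1), ('b', 11), ('c', 1), ('d', 2)]
```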
You should look at either writing a View in the database and mapping your entity to that, or creating a DefiningQuery in the storage-model part of your EDMX (aka the bit that ends up in the SSDL file).
See Tip 34 for more information.
Conceptually both approaches allow you to write a view that excludes the 'duplicate rows'. The difference is just where the view lives.
If you have control over the database, I'd put the view in the database.
If not, you can put the view in a DefiningQuery inside your EDMX and then map to that.
Hope this helps
Alex