TypeAhead - begins-with full text search - postgresql

I'm implementing a simple search in postgresql that will be used to retrieve typeahead results on a web page. So, I need the last argument to use starts-with matching, since the user may not have finished typing the word. When I construct my tsquery, I'm adding :* to the last argument. Here's a sample query:
SELECT id, key, name
FROM principal,
to_tsvector(key || ' ' || name) vector,
to_tsquery('investig:*') query
WHERE vector @@ query
ORDER BY ts_rank(vector, query) DESC
While typing the word "investigate", I get the following behavior:
Input | Result Count
i | 0
in | 0
inv | 8
inve | 8
inves | 8
invest | 8
investi | 7
investig | 7
investiga | 0
investigat | 0
investigate | 7
This is better than if I omit the :*, but not good enough. Why do I get 0 results for investiga when investigate returns 7 results? Is there a better way to construct my query to make sure I get everything that begins with a search term?


ADF - Dataflow, using Join to send new values

There are two tables.
tbl_1 is the source data:
ID | Submission_id
1 | A00_1
2 | A00_2
3 | A00_3
4 | A00_4
5 | A00_5
6 | A00_6
7 | A00_7
tbl_2 is the destination. In this table, Submission_id is a unique key.
ID | Submission_id
1 | A00_1
2 | A00_2
3 | A00_3
4 | A00_4
tbl_1 is the input and tbl_2 is the destination (sink). The expected result is that only A00_5, A00_6 and A00_7 are sent to tbl_2. This is the Join:
[image: Join settings]
And this is the AlterRow:
[image: AlterRow settings]
Expected output:
ID | Submission_id
1 | A00_1
2 | A00_2
3 | A00_3
4 | A00_4
5 | A00_5 -->(new)
6 | A00_6 -->(new)
7 | A00_7 -->(new)
But the output of the AlterRow contains every Submission_id. It should contain only the rows that satisfy the not-equal comparison stated in the AlterRow condition:
notEquals(DC__Submission_ID_BigInt, SrcStgDestination@{_Submission_ID})
How can I solve this problem in an Azure Data Flow using a Join?
I tried doing the same procedure and got the same result (all rows getting inserted). We were able to perform join in the desired way but couldn’t proceed further to get the required output. You can use the approach given below instead, which is achieved using JOINS.
In general, when we want to get records from table1 which are not present in table2, we execute the following query (in sql server).
select t1.id,t1.submission_id from t1 left outer join t2 on t1.submission_id = t2.submission_id where t2.submission_id is NULL
In the Dataflow, we were able to achieve the join successfully (same procedure as yours). Now, instead of using the alter row transformation, I used a filter transformation (to achieve the t2.submission_id is NULL condition). I used the following expression (condition) to filter:
isNull(d1@submission_id) && isNull(d1@id)
Now proceed to configure the sink (tbl_2). The data preview should show only the records that are missing from tbl_2.
Publish and run the dataflow activity in your pipeline to get the desired results.
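The LEFT OUTER JOIN ... IS NULL pattern from the SQL query above can be checked locally before wiring up the data flow. The following is only a minimal sketch: it runs the same query against an in-memory SQLite database from Python, using the sample rows from the question. The table names t1 and t2 follow the query above; the function name new_submissions is made up for this illustration, and SQLite stands in for SQL Server here.

```python
import sqlite3


def new_submissions(source_rows, sink_rows):
    """Return the rows of t1 whose submission_id is not present in t2."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (id INTEGER, submission_id TEXT)")
    con.execute("CREATE TABLE t2 (id INTEGER, submission_id TEXT UNIQUE)")
    con.executemany("INSERT INTO t1 VALUES (?, ?)", source_rows)
    con.executemany("INSERT INTO t2 VALUES (?, ?)", sink_rows)
    # Anti-join: keep only the t1 rows that found no partner in t2.
    rows = con.execute(
        """
        SELECT t1.id, t1.submission_id
        FROM t1
        LEFT OUTER JOIN t2 ON t1.submission_id = t2.submission_id
        WHERE t2.submission_id IS NULL
        ORDER BY t1.id
        """
    ).fetchall()
    con.close()
    return rows


# Sample data from the question: tbl_1 has A00_1..A00_7, tbl_2 has A00_1..A00_4.
source = [(i, f"A00_{i}") for i in range(1, 8)]
sink = [(i, f"A00_{i}") for i in range(1, 5)]
print(new_submissions(source, sink))
# [(5, 'A00_5'), (6, 'A00_6'), (7, 'A00_7')]
```

The filter transformation plays the role of the WHERE ... IS NULL clause: after a left join, the sink-side columns are NULL exactly for the rows that do not exist in the destination yet.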

Separate Chaining: How many key comparisons do unsuccessful searches take?

I can't find any clear information on how many key comparisons an unsuccessful search counts as (using linked lists for chaining).
Maybe someone can explain it to me with the help of the following example:
h(k)=k; m=4
0 || - |
1 || 9 |
2 || > | 6 | 10 |
3 || > | 7 |
Let's say I search for 4, does it take 0 or 1 key comparisons to realise the value is not in the table?
If I searched for 14 how many key comparisons would that take? 2? 3?
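One way to make the question concrete is to count comparisons in code. The sketch below is an illustration under stated assumptions, not an authoritative answer: it assumes the hash function is really h(k) = k mod m (which is what the slots in the example imply, since 9 sits in slot 1), and it counts only comparisons between the search key and a stored key, not the check for the end of a chain. The names build_table and search are made up for this example.

```python
def build_table(keys, m):
    """Build a chained hash table with m slots, assuming h(k) = k mod m."""
    table = [[] for _ in range(m)]
    for k in keys:
        table[k % m].append(k)  # append at the end of the chain
    return table


def search(table, key):
    """Return (found, comparisons), counting only key-to-key comparisons."""
    comparisons = 0
    for stored in table[key % len(table)]:
        comparisons += 1
        if stored == key:
            return True, comparisons
    return False, comparisons


# The table from the question: slot 1 holds 9, slot 2 holds 6 -> 10, slot 3 holds 7.
table = build_table([9, 6, 10, 7], 4)
print(search(table, 4))   # (False, 0): slot 0 is empty, nothing to compare against
print(search(table, 14))  # (False, 2): compared with 6 and 10 in slot 2
```

Under this convention an unsuccessful search costs as many key comparisons as the chain is long. If the end-of-chain check is counted as well, each figure goes up by one, so it is worth checking which convention your course or textbook uses.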

Postgres Query for Beginners

Ok, I deleted previous post and will try this again. I am sure I don't know the topic and I'm not sure if this is a loop or if I should use a stored function or how to get what I'm looking for. Here's sample data and expected output;
I have a single table A. The table has the following fields: date created, unique person key, type, location.
I need a Postgres query that, for a given month (a parameter, based on date created) and a given location (a parameter, based on the location field), returns the fields below for rows whose unique person key also appears within + or - 30 days of the date created, for the same type but across all locations.
Example Data
Date Created | Unique Person | Type | Location
2/5/2017 | 1 | Admit | Hospital1
2/6/2017 | 2 | Admit | Hospital2
2/15/2017 | 1 | Admit | Hospital2
2/28/2017 | 3 | Admit | Hospital2
3/3/2017 | 2 | Admit | Hospital1
3/15/2017 | 3 | Admit | Hospital3
3/20/2017 | 4 | Admit | Hospital1
4/1/2017 | 1 | Admit | Hospital2
Output for the month of March for Hospital1:
DateCreated| UniquePerson | Type | Location | +-30days | OtherLoc.
3/3/2017 | 2 | Admit| Hospital1 | 2/6/2017 | Hospital2
Output for the month of March for Hospital2:
None, because no one was seen at Hospital2 in March
Output for the month of March for Hospital3:
DateCreated| UniquePerson | Type | Location | +-30days | otherLoc.
3/15/2017 | 3 | Admit| Hospital3 | 2/28/2017 | Hospital2
Version 1
I would use a WITH clause. Please notice that I've added a column id that is a primary key to simplify the query. It's just there to prevent the rows from being matched with themselves.
WITH x AS (
    SELECT * FROM a
    WHERE
        location = 'Hospital1' AND
        date_trunc('month', date_created) = date_trunc('month', '2017-03-01'::date)
)
SELECT
    x.*,
    a.date_created AS "+-30days",
    a.location AS other_location
FROM x
JOIN a USING (unique_person_id, type)
WHERE
    x.id != a.id AND
    abs(x.date_created - a.date_created) <= 30;
Now a little bit of explanations:
First we select the reference rows with a WITH clause. Think of it as a temporary table that we can reference in the main query. In the given example these are the "main visits" at the given hospital.
Then we join the "main visits" with other visits of the same person and type (JOIN condition) that fall within 30 days of them (WHERE condition).
Notice that the WITH query has the limits you want to check (location and date). I use date_trunc function that truncates the date to specified precision (a month in this case).
Version 2
As @Laurenz Albe suggested, there is no special need to use a WITH clause. Right, so here is a second version.
SELECT
    x.*,
    a.date_created AS "+-30days",
    a.location AS other_location
FROM a
JOIN a AS x
    USING (unique_person_id, type)
WHERE
    x.location = 'Hospital1' AND
    date_trunc('month', x.date_created) = date_trunc('month', '2017-03-01'::date) AND
    x.id != a.id AND
    abs(x.date_created - a.date_created) <= 30;
This version is shorter than the first one but, in my opinion, the first is easier to understand. I don't have a big enough data set to test with, so I can't say which one runs faster (the query planner shows similar values for both).
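To check the matching rules against the sample data from the question without a database, the same logic can be sketched in plain Python. This is only an illustration, not the query itself: the tuple layout, the VISITS list and the function name related_visits are assumptions made for this example, and the id values are the surrogate key mentioned above.

```python
from datetime import date

# Sample rows from the question: (id, date_created, unique_person_id, type, location).
VISITS = [
    (1, date(2017, 2, 5), 1, "Admit", "Hospital1"),
    (2, date(2017, 2, 6), 2, "Admit", "Hospital2"),
    (3, date(2017, 2, 15), 1, "Admit", "Hospital2"),
    (4, date(2017, 2, 28), 3, "Admit", "Hospital2"),
    (5, date(2017, 3, 3), 2, "Admit", "Hospital1"),
    (6, date(2017, 3, 15), 3, "Admit", "Hospital3"),
    (7, date(2017, 3, 20), 4, "Admit", "Hospital1"),
    (8, date(2017, 4, 1), 1, "Admit", "Hospital2"),
]


def related_visits(visits, year, month, location, window_days=30):
    """Pair each visit in the given month and location with the same person's
    other visits of the same type that lie within window_days of it."""
    results = []
    for x_id, x_date, x_person, x_type, x_loc in visits:
        # "Main visits": the month and location filter.
        if x_loc != location or (x_date.year, x_date.month) != (year, month):
            continue
        for a_id, a_date, a_person, a_type, a_loc in visits:
            same_person_and_type = (a_person, a_type) == (x_person, x_type)
            within_window = abs((x_date - a_date).days) <= window_days
            if same_person_and_type and a_id != x_id and within_window:
                results.append((x_date, x_person, x_type, x_loc, a_date, a_loc))
    return results


for row in related_visits(VISITS, 2017, 3, "Hospital1"):
    print(row)
```

For March at Hospital1 this pairs person 2's visit on 3/3/2017 with the visit on 2/6/2017 at Hospital2 (25 days apart), which matches the expected output in the question. Hospital2 yields no rows for March.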

Return only first rows where array contains an element and possibly leave out any elements seen earlier

Given the following data set:
| page | sentence_ids |
| 1 | { 1, 2, 3 } |
| 2 | { 1, 2 } |
| 3 | { 3, 4 } |
I'd like to do a query that would return the pages where the sentence id occurred first. Preferably with sentence_ids only occurring once in the dataset and the least amount of pages. In this case:
| page | sentence_ids |
| 1 | { 1, 2, 3 } |
| 3 | { 4 } |
Is this even possible? The relation is denormalized because the pages can end up in the 10 thousands, and the sentences in the 100 thousands.
Right now we load all the pages with all the sentences and filter in the code. Terribly inefficient. Hope someone can help.
The only practical way* is to first unnest the array of sentence_ids and then pick the combination of page, sentence that matches the latter to the lowest page; you can do this with a window function by partitioning on the sentence and finding a rank after ordering by page. The record with rank=1 is the combination of interest. You then aggregate the result back into an array:
SELECT page, array_agg(sentence) AS sentence_ids
FROM (
  SELECT page, sentence, rank() OVER (PARTITION BY sentence ORDER BY page) AS rnk
  FROM (
    SELECT page, unnest(sentence_ids) AS sentence
    FROM page_sentences) p_s
  ) p_s_r
WHERE rnk = 1
GROUP BY page;
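To see what the unnest, rank and aggregate steps produce on the sample rows, here is a small Python sketch of the same logic. It is only meant to illustrate the idea on the three example pages, not to replace doing the work in the database; the function name first_occurrences and the dict layout are assumptions made for this example.

```python
from collections import defaultdict


def first_occurrences(pages):
    """pages maps page number -> list of sentence ids.
    Return page -> sentence ids whose first occurrence is on that page."""
    first_page = {}
    for page in sorted(pages):  # visit pages in ascending order
        for sentence in pages[page]:  # the "unnest" step
            first_page.setdefault(sentence, page)  # keep only the lowest page
    grouped = defaultdict(list)
    for sentence, page in first_page.items():  # the "array_agg ... GROUP BY page" step
        grouped[page].append(sentence)
    return {page: sorted(ids) for page, ids in sorted(grouped.items())}


pages = {1: [1, 2, 3], 2: [1, 2], 3: [3, 4]}
print(first_occurrences(pages))
# {1: [1, 2, 3], 3: [4]}
```

Page 2 disappears from the result because both of its sentences already occurred on page 1, which is the behavior asked for in the question.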
Given the size of your data, this query may not be very fast, but it is very likely to beat pulling all the data and then filtering in code.
* "Practical" is here loosely defined as "anything that avoids having to follow Craig's advice". (Sorry Craig...)

Check if field value is in a list of strings in SSRS report

I'm using SSRS (VS2008) and creating a report of work orders. In the detail line of the report table, I have the following columns (with some fake data)
WONUM | A | B | Hours
ABC123 | 3 | 0 | 3
SPECIAL| 0 | 6 | 6
DEF456 | 5 | 0 | 5
GHI789 | 4 | 0 | 4
OTHER | 0 | 2 | 2
As you can kind of see, all work orders have a work order number (WONUM) as well as a total # of hours (HOURS). I need to put the hours into either column A or column B based on WONUM. I have a list of specifically named work orders (in the example, they would be "SPECIAL" and "OTHER") which would cause the HOURS value to be put in column B. If the WONUM is NOT a special named one, then it goes in column A. Here's what I WANTED to put as the expression for column A and column B:
Column A: =IIF(Fields!WONUM.Value IN ("SPECIAL","OTHER"), 0, Fields!Hours.Value)
Column B: =IIF(Fields!WONUM.Value IN ("SPECIAL","OTHER"), Fields!Hours.Value, 0)
But as you're probably aware, Fields!WONUM.Value IN ("SPECIAL","OTHER") is not a valid method of doing this! What is the best way to make this work? I cannot flag it in the SQL query in any other way for other reasons so it must be done in the table.
Thanks in advance for any and all help!
Try this (using the InStr() function):
Column A: =IIF(InStr(Fields!WONUM.Value,"SPECIAL")>0 OR InStr(Fields!WONUM.Value,"OTHER")>0, 0, Fields!Hours.Value)
Column B: =IIF(InStr(Fields!WONUM.Value,"SPECIAL")>0 OR InStr(Fields!WONUM.Value,"OTHER")>0, Fields!Hours.Value, 0)
If it's just the two WONUMs then you can do this:
Column A:
=IIF((Fields!WONUM.Value <> "SPECIAL") AND (Fields!WONUM.Value <> "OTHER"), Fields!Hours.Value, 0)
Column B:
=IIF((Fields!WONUM.Value = "SPECIAL") OR (Fields!WONUM.Value = "OTHER"), Fields!Hours.Value, 0)
or use the same formula in each column for consistency and swap the field/0 at the end.