How to reference current search results in a subquery [duplicate] - postgresql

This question already has answers here:
Find sentences with two words adjacent to each other in Pg
(3 answers)
Closed 8 years ago.
I am trying to find sentences with two words adjecent to each other, using Postgres directly, not some command language extension. My tables are:
TABLE word (
spelling text,
wordid serial
)
TABLE sentence (
sentenceid serial
)
TABLE item (
sentenceid integer,
position smallint,
wordid integer
)
I have a simple query to find sentences with a single word:
SELECT DISTINCT sentence.sentenceid
FROM item, word, sentence
WHERE
word.spelling = 'word1' AND
item.wordid = word.wordid AND
sentence.sentenceid = item.sentenceid
and what I want is to filter that in turn by some word whose corresponding item has a item.sentenceid = the current query result's item (or sentence).sentenceid and
item.position = the current query result's item.position + 1
I don't see that ALIAS's work, and I don't know how to reference current state of the query in a subquery.

select distinct sentenceid
from
item i
inner join
item i2 using (sentenceid)
inner join
word w on w.wordid = i.wordid
inner join
word w2 on w2.wordid = i2.wordid
where
w.spelling = 'word1' and
w2.spelling = 'word2' and
i.position = i2.position - 1

Related

Return closest timestamp from Table B based on timestamp from Table A with matching Product IDs

Goal: Create a query to pull the closest cycle count event (Table C) for a product ID based on the inventory adjustments results sourced from another table (Table A).
All records from Table A will be used, but is not guaranteed to have a match in Table C.
The ID column will be present in both tables, but is not unique in either, so that pair of IDs and Timestamps together are needed for each table.
Current simplified SQL
SELECT
A.WHENOCCURRED,
A.LPID,
A.ITEM,
A.ADJQTY,
C.WHENOCCURRED,
C.LPID,
C.LOCATION,
C.ITEM,
C.QUANTITY,
C.ENTQUANTITY
FROM
A
LEFT JOIN
C
ON A.LPID = C.LPID
WHERE
A.facility = 'FACID'
AND A.WHENOCCURRED > '23-DEC-22'
AND A.ADJREASONABBREV = 'CYCLE COUNTS'
ORDER BY A.WHENOCCURRED DESC
;
This is currently pulling the first hit on C.WHENOCCURRED on the LPID matches. Want to see if there is a simpler JOIN solution before going in a direction that creates 2 temp tables based on WHENOCCURRED.
I have a functioning INDEX(MATCH(MIN()) solution in Excel but that requires exporting a couple system reports first and is extremely slow with X,XXX row tables.
If you are using Oracle 12 or later, you can use a LATERAL join and FETCH FIRST ROW ONLY:
SELECT A.WHENOCCURRED,
A.LPID,
A.ITEM,
A.ADJQTY,
C.WHENOCCURRED,
C.LPID,
C.LOCATION,
C.ITEM,
C.QUANTITY,
C.ENTQUANTITY
FROM A
LEFT OUTER JOIN LATERAL (
SELECT *
FROM C
WHERE A.LPID = C.LPID
AND A.whenoccurred <= c.whenoccurred
ORDER BY c.whenoccurred
FETCH FIRST ROW ONLY
) C
ON (1 = 1) -- The join condition is inside the lateral join
WHERE A.facility = 'FACID'
AND A.WHENOCCURRED > DATE '2022-12-23'
AND A.ADJREASONABBREV = 'CYCLE COUNTS'
ORDER BY A.WHENOCCURRED DESC;

Comparing two text array columns using % and ordering by number of matches

I have a user table that contains a "skills" column which is a text array. Given some input array, I would like to find all the users whose skills % one or more of the entries in the input array, and order by number of matches (according to the % operator from pg_trgm).
For example, I have Array['java', 'ruby', 'postgres'] and I want users who have these skills ordered by the number of matches (max is 3 in this case).
I tried unnest() with an inner join. It looked like I was getting somewhere, but I still have no idea how I can capture the count of the matching array entries. Any ideas on what the structure of the query may look like?
Edit: Details:
Here is what my programmers table looks like:
id | skills
----+-------------------------------
1 | {javascript,rails,css}
2 | {java,"ruby on rails",adobe}
3 | {typescript,nodejs,expressjs}
4 | {auth0,c++,redis}
where skills is a text array.
Here is what I have so far:
SELECT * FROM programmers, unnest(skills) skill_array(x)
INNER JOIN unnest(Array['ruby', 'node']) search(y)
ON skill_array.x % search.y;
which outputs the following:
id | skills | x | y
----+-------------------------------+---------------+---------
2 | {java,"ruby on rails",adobe} | ruby on rails | ruby
3 | {typescript,nodejs,expressjs} | nodejs | node
3 | {typescript,nodejs,expressjs} | expressjs | express
*Assuming pg_trgm is enabled.
For an exact match between the user skills and the searched skills, you can proceed like this :
You put the searched skills in the target_skills text array
You filter the users from the table user_table whose user_skills array has at least one common element with the target_skills array by using the && operator
For each of the selected users, you select the common skills by using unnest and INTERSECT, and you calculate the number of these common skills
You order the result by the number of common skills DESC
In this process, the users with skill "ruby" will be selected for the target skill "ruby", but not the users with skill "ruby on rails".
This process can be implemented as follow :
SELECT u.user_id
, u.user_skills
, inter.skills
FROM user_table AS u
CROSS JOIN LATERAL
( SELECT array( SELECT unnest(u.user_skills)
INTERSECT
SELECT unnest(target_skills)
) AS skills
) AS inter
WHERE u.user_skills && target_skills
ORDER BY array_length(inter.skills, 1) DESC
or with this variant :
SELECT u.user_id
, u.user_skills
, array_agg(t_skill) AS inter_skills
FROM user_table AS u
CROSS JOIN LATERAL unnest(target_skills) AS t_skill
WHERE u.user_skills && array[t_skill]
GROUP BY u.user_id, u.user_skills
ORDER BY array_length(inter_skills, 1) DESC
This query can be accelerated by creating a GIN index on the user_skills column of the user_table.
For a partial match between the user skills and the target skills (ie the users with skill "ruby on rails" must be selected for the target skill "ruby"), you need to use the pattern matching operator LIKE or the regular expression, but it is not possible to use them with text arrays, so you need first to transform your user_skills text array into a simple text with the function array_to_string. The query becomes :
SELECT u.user_id
, u.user_skills
, array_agg(t_skill) AS inter_skills
FROM user_table AS u
CROSS JOIN unnest(target_skills) AS t_skill
WHERE array_to_string(u.user_skills, ' ') ~ t_skill
GROUP BY u.user_id, u.user_skills
ORDER BY array_length(inter_skills, 1) DESC ;
Then you can accelerate the queries by creating the following GIN (or GiST) index :
DROP INDEX IF EXISTS user_skills ;
CREATE INDEX user_skills
ON user_table
USING gist (array_to_string(user_skills, ' ') gist_trgm_ops) ; -- gin_trgm_ops and gist_trgm_ops indexes are compliant with the LIKE operator and the regular expressions
In any case, managing the skills as text will ever fail if there are typing errors or if the skills list is not normalized.
I accepted Edouard's answer, but I thought I'd show something else I adapted from it.
CREATE OR REPLACE FUNCTION partial_and_and(list1 TEXT[], list2 TEXT[])
RETURNS BOOLEAN AS $$
SELECT EXISTS(
SELECT * FROM unnest(list1) x, unnest(list2) y
WHERE x % y
);
$$ LANGUAGE SQL IMMUTABLE;
Then create the operator:
CREATE OPERATOR &&% (
LEFTARG = TEXT[],
RIGHTARG = TEXT[],
PROCEDURE = partial_and_and,
COMMUTATOR = &&%
);
And finally, the query:
SELECT p.id, p.skills, array_agg(t_skill) AS inter_skills
FROM programmers AS p
CROSS JOIN LATERAL unnest(Array['ruby', 'java']) AS t_skill
WHERE p.skills &&% array[t_skill]
GROUP BY p.id, p.skills
ORDER BY array_length(inter_skills, 1) DESC;
This will output an error saying column 'inter_skills' does not exist (not sure why), but oh well point is the query seems to work. All credit goes to Edouard.

How to get the matching column name when searching across multiple columns in PostgreSQL?

Is there a way to get the matching column name when searching across multiple columns in PostgreSQL?
Say I have the following table structure and query:
CREATE TABLE document (
id serial PRIMARY KEY,
document_content VARCHAR
);
CREATE TABLE story (
id serial PRIMARY KEY,
headline VARCHAR
);
-----
SELECT
"document".*,
story.id,
story.headline
FROM
"document"
INNER JOIN story_document AS Documents_join ON "document".id = Documents_join.document_id
INNER JOIN story ON "story".id = Documents_join.story_id
WHERE to_tsvector(document_content) ## to_tsquery('foo')
OR to_tsvector(headline) ## to_tsquery('foo');
I was thinking of concatenating the value of the two columns, run the full text search, then create a sub query for both columns and re-run the search individually and record the result as a reference, but this would mean executing the search 3x:
SELECT
"document".*,
story.id AS story_id,
story.headline
(SELECT "document".id WHERE to_tsvector(document_content) ## to_tsquery('foo')) AS "matching_document_id",
(SELECT story_id WHERE to_tsvector(headline) ## to_tsquery('foo')) AS "matching_story_id"
FROM
"document"
INNER JOIN story_document AS Documents_join ON "document".id = Documents_join.document_id
RIGHT JOIN story ON "story".id = Documents_join.story_id
WHERE to_tsvector(document_content || ' ' || headline) ## to_tsquery('foo');
How could I get a reference to the column: document_content or headline, where the keyword "foo" was found in one query?
Thanks!
Since the columns are in different tables the best you can do is translate the OR into a UNION:
SELECT
"document".*,
story.id,
story.headline
FROM
"document"
INNER JOIN story_document AS Documents_join ON "document".id = Documents_join.document_id
INNER JOIN story ON "story".id = Documents_join.story_id
WHERE to_tsvector(document_content) ## to_tsquery('foo')
UNION
SELECT
"document".*,
story.id,
story.headline
FROM
"document"
INNER JOIN story_document AS Documents_join ON "document".id = Documents_join.document_id
INNER JOIN story ON "story".id = Documents_join.story_id
WHERE to_tsvector(headline) ## to_tsquery('foo');
Then PostgreSQL doesn't have to build the complete join just to filter out most of the rows. My variant will be fast if the conditions are selective and indexed and you have indexes on the join conditions as well, so that you can get fast nested loop joins.
Here is some more about dealing with OR.

Query table by a value in the second dimension of a two dimensional array column

WHAT I HAVE
I have a table with the following definition:
CREATE TABLE "Highlights"
(
id uuid,
chunks numeric[][]
)
WHAT I NEED TO DO
I need to query the data in the table using the following predicate:
... WHERE id = 'some uuid' and chunks[????????][1] > 10 chunks[????????][3] < 20
What should I put instead of [????????] in order to scan all items in the first dimension of the array?
Notes
I'm not entirely sure that chunks[][1] even close to something I need.
All I need is to test a row, whether its chunks column contains a two dimensional array, that has in any of its tuples some specific values.
May be there's better alternative, but this might do - you just go over first dimension of each array and testing your condition:
select *
from highlights as h
where
exists (
select
from generate_series(1, array_length(h.chunks, 1)) as tt(i)
where
-- your condition goes here
h.chunks[tt.i][1] > 10 and h.chunks[tt.i][3] < 20
)
db<>fiddle demo
update as #arie-r pointed out, it'd be better to use generate_subscripts function:
select *
from highlights as h
where
exists (
select *
from generate_subscripts(h.chunks, 1) as tt(i)
where
h.chunks[tt.i][3] = 6
)
db<>fiddle demo

PostgreSQL and word games

In a word game similar to Ruzzle or Letterpress, where users have to construct words out of a given set of letters:
I keep my dictionary in a simple SQL table:
create table good_words (
word varchar(16) primary key
);
Since the game duration is very short I do not want to check every entered word by calling a PHP script, which would look that word up in the good_words table.
Instead I'd like to download all possible words by one PHP script call before the round starts - since all letters are known.
My question is: if there is a nice SQLish way to find such words?
I.e. I could run a longer-taking script once to add a column to good_words table, which would have same letters as in the word columnt, but sorted alphabetically... But I still can't think of a way to match for it given a set of letters.
And doing the word matching inside of a PHP script (vs. inside the database) would probably take too long (because of bandwidth: would have to fetch every row from the database to the PHP script).
Any suggestions or insights please?
Using postgresql-8.4.13 with CentOS Linux 6.3.
UPDATE:
Other ideas I have:
Create a constantly running script (cronjob or daemon) which would prefill an SQL table with precompiled letters board and possible words - but still feels like a waste of bandwidth and CPU, I would prefer to solve this inside the database
Add integer columns a, b, ... , z and whenever I store a word into good_words, store the letter occurences there. I wonder if it is possible to create an insert trigger in Pl/PgSQL for that?
Nice question, I upvoted.
What you're up to is a list of all possible permutations of the given letters of a given length. As described in the PostgreSQL wiki, you can create a function and call it like this (matches highlighted letters in your screenshot):
SELECT * FROM permute('{E,R,O,M}'::text[]);
Now, to query the good_words use something like:
SELECT gw.word, gw.stamp
FROM good_words gw
JOIN permute('{E,R,O,M}'::text[]) s(w) ON gw.word=array_to_string(s.w, '');
This could be a start, except that it doesn't check if we have enough letters, only if he have the right letters.
SELECT word from
(select word,generate_series(0,length(word)) as s from good_words) as q
WHERE substring(word,s,1) IN ('t','h','e','l','e','t','t','e','r','s')
GROUP BY word
HAVING count(*)>=length(word);
http://sqlfiddle.com/#!1/2e3a2/3
EDIT:
This query select only the valid words though it seems a bit redundant. It's not perfect but certainly proves it can be done.
WITH words AS
(SELECT word, substring(word,s,1) as sub from
(select word,generate_series(1,length(word)) as s from good_words) as q
WHERE substring(word,s,1) IN ('t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s'))
SELECT w.word FROM
(
SELECT word,words.sub,count(DISTINCT s) as cnt FROM
(SELECT s, substring(array_to_string(l, ''),s,1) as sub FROM
(SELECT l, generate_subscripts(l,1) as s FROM
(SELECT ARRAY['t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s'] as l)
as q)
as q) as let JOIN
words ON let.sub=words.sub
GROUP BY words.word,words.sub) as let
JOIN
(select word,sub,count(*) as cnt from words
GROUP BY word, sub)
as w ON let.word=w.word AND let.sub=w.sub AND let.cnt>=w.cnt
GROUP BY w.word
HAVING sum(w.cnt)=length(w.word);
Fiddle with all possible 3+ letters words (485) for that image: http://sqlfiddle.com/#!1/2fc66/1
Fiddle with 699 words out of which 485 are correct: http://sqlfiddle.com/#!1/4f42e/1
Edit 2:
We can use array operators like so to get a list of words that contain the letters we want:
SELECT word as sub from
(select word,generate_series(1,length(word)) as s from good_words) as q
GROUP BY word
HAVING array_agg(substring(word,s,1)) <# ARRAY['t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s'];
So we can use it to narrow down the list of words we need to check.
WITH words AS
(SELECT word, substring(word,s,1) as sub from
(select word,generate_series(1,length(word)) as s from
(
SELECT word from
(select word,generate_series(1,length(word)) as s from good_words) as q
GROUP BY word
HAVING array_agg(substring(word,s,1)) <# ARRAY['t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s']
)as q) as q)
SELECT DISTINCT w.word FROM
(
SELECT word,words.sub,count(DISTINCT s) as cnt FROM
(SELECT s, substring(array_to_string(l, ''),s,1) as sub FROM
(SELECT l, generate_subscripts(l,1) as s FROM
(SELECT ARRAY['t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s'] as l)
as q)
as q) as let JOIN
words ON let.sub=words.sub
GROUP BY words.word,words.sub) as let
JOIN
(select word,sub,count(*) as cnt from words
GROUP BY word, sub)
as w ON let.word=w.word AND let.sub=w.sub AND let.cnt>=w.cnt
GROUP BY w.word
HAVING sum(w.cnt)=length(w.word) ORDER BY w.word;
http://sqlfiddle.com/#!1/4f42e/44
We can use GIN indexes to work on arrays so we probably could create a table that would store the arrays of letters and make words point to it (act, cat and tact would all point to array [a,c,t]) so probably that would speed things up but that's up for testing.
Create a table that has entries (id, char), be n the number of characters you are querying for.
select id, count(char) AS count from chartable where (char = x or char = y or char = z ...) and count = n group by id;
OR (for partial matching)
select id, count(char) AS count from chartable where (char = x or char = y or char = z ...) group by id order by count;
The result of that query has all the word-id's that fit the specifications. Cache the result in a HashSet and simple do a lookup whenever a word is entered.
You can add the column with sorterd letters formatted like '%a%c%t%'. Then use query:
select * from table where 'abcttx' like sorted_letters
to find words that can be built from letters 'abcttx'. I don't know about performance, but simplicity probably can't be beaten :)
Here is a query that finds the answers that can be found by walking through adjacent fields.
with recursive
input as (select '{{"t","e","s","e"},{"r","e","r","o"},{"r","e","m","a"},{"s","d","s","s"}}'::text[] as inp),
dxdy as(select * from (values(-1,-1),(-1,0),(-1,1),(0,1),(0,-1),(1,-1),(1,0),(1,1)) as v(dx, dy)),
start_position as(select * from generate_series(1,4) x, generate_series(1,4) y),
work as(select x,y,inp[y][x] as word from start_position, input
union
select w.x + dx, w.y + dy, w.word || inp[w.y+dy][w.x+dx]
from dxdy cross join input cross join work w
inner join good_words gw on gw.word like w.word || '%'
)
select distinct word from work
where exists(select * from good_words gw where gw.word = work.word)
(other answers don't take this into account).
Sql fiddle link: http://sqlfiddle.com/#!1/013cc/14 (notice You need an index with varchar_pattern_ops for the query to be reasonably fast).
Does not work in 8.4. Probably 9.1+ only. SQL Fidlle
select word
from (
select unnest(string_to_array(word, null)) c, word from good_words
intersect all
select unnest(string_to_array('TESTREROREMASDSS', null)) c, word from good_words
) s
group by word
having
array_agg(c order by c) =
(select array_agg(c order by c) from unnest(string_to_array(word, null)) a(c))
My own solution is to create an insert trigger, which writes letter frequencies into an array column:
create table good_words (
word varchar(16) primary key,
letters integer[26]
);
create or replace function count_letters() returns trigger as $body$
declare
alphabet varchar[];
i integer;
begin
alphabet := regexp_split_to_array('abcdefghijklmnopqrstuvwxyz', '');
new.word := lower(new.word);
for i in 1 .. array_length(alphabet, 1)
loop
-- raise notice '%: %', i, alphabet[i];
new.letters[i] := length(new.word) - length(replace(new.word, alphabet[i], ''));
end loop;
return new;
end;
$body$ language plpgsql;
create trigger count_letters
before insert on good_words
for each row execute procedure count_letters();
Then I generate similar array for the random board string tesereroremasdss
and compare both arrays using the array contains operator #>
Any new ideas or improvements are always welcome!