Need help with a SELECT statement - select

I express the relationship between records and searchtags that can be attached to records like so:
TABLE RECORDS
id
name
TABLE SEARCHTAGS
id
recordid
name
I want to be able to SELECT records based on the searchtags that they have. For example, I want to be able to SELECT all records that have searchtags:
(1 OR 2 OR 5) AND (6 OR 7) AND (10)
Using the above data structure, I am uncertain how to structure the SQL to accomplish this.
Any suggestions?
Thanks!

You may want to try the following:
SELECT r.id, r.name
FROM records r
WHERE EXISTS (SELECT NULL FROM searchtags WHERE recordid = r.id AND id IN (1, 2, 5)) AND
EXISTS (SELECT NULL FROM searchtags WHERE recordid = r.id AND id IN (6, 7)) AND
EXISTS (SELECT NULL FROM searchtags WHERE recordid = r.id AND id IN (10));
Test case: Note that only records 1 and 4 will satisfy the query criteria.
CREATE TABLE records (id int, name varchar(10));
CREATE TABLE searchtags (id int, recordid int);
INSERT INTO records VALUES (1, 'a');
INSERT INTO records VALUES (2, 'b');
INSERT INTO records VALUES (3, 'c');
INSERT INTO records VALUES (4, 'd');
INSERT INTO searchtags VALUES (1, 1);
INSERT INTO searchtags VALUES (2, 1);
INSERT INTO searchtags VALUES (6, 1);
INSERT INTO searchtags VALUES (10, 1);
INSERT INTO searchtags VALUES (1, 2);
INSERT INTO searchtags VALUES (2, 2);
INSERT INTO searchtags VALUES (3, 2);
INSERT INTO searchtags VALUES (1, 3);
INSERT INTO searchtags VALUES (10, 3);
INSERT INTO searchtags VALUES (5, 4);
INSERT INTO searchtags VALUES (7, 4);
INSERT INTO searchtags VALUES (10, 4);
Result:
+------+------+
| id | name |
+------+------+
| 1 | a |
| 4 | d |
+------+------+
2 rows in set (0.01 sec)

SELECT
id, name
FROM
records
WHERE
EXISTS (
SELECT 1 FROM searchtags WHERE recordid = records.id AND id IN (1, 2, 5)
)
AND EXISTS (
SELECT 1 FROM searchtags WHERE recordid = records.id AND id IN (6, 7)
)
AND EXISTS (
SELECT 1 FROM searchtags WHERE recordid = records.id AND id IN (10)
)

not sure how to do it in mysql, but in t-sql, you could do something like:
SELECT id, name FROM RECORDS where id in (SELECT recordid from SEARCHTAGS where id in (1,2,5,6,7,10))
I may not be understanding your question entirely... but I gave it my best.

Try:
SELECT R.*
FROM RECORDS R, SEARCHTAGS S
WHERE R.id == S.recordid
AND S.name in (1,2,5,6,7,10);
Don't know if you need S.name or S.id, but this is an example.

select RECORDS.name
from RECORDS join SEARCHTAGS
on RECORDS.id = SEARCHTAGS.recordid
where RECORDS.id in (1,2,...)

I misread the question first time around, and thought it was asking for
(1 AND 2 AND 5) OR (6 AND 7) OR (10)
instead of the correct
(1 OR 2 OR 5) AND (6 OR 7) AND (10)
All the answers so far have concentrated on answering the specific example, rather than addressing the more general question "and suppose I want a different set of criteria next time".
Actual question
I can't do much better than the selected answer for the actual question (Query 1):
SELECT r.id, r.name
FROM Records AS r
WHERE EXISTS(SELECT * FROM SearchTags AS s
WHERE r.id = s.recordid AND s.id IN (1, 2, 5))
AND EXISTS(SELECT * FROM SearchTags AS s
WHERE r.id = s.recordid AND s.id IN (6, 7))
AND EXISTS(SELECT * FROM SearchTags AS s
WHERE r.id = s.recordid AND s.id IN (10));
This could be written as a join to 3 aliases for the SearchTags table.
Alternative question
There are several ways to answer the alternative - I think this is the most nearly neat and extensible. Clearly, the one item (10) is easy (Query 2):
SELECT r.id, r.name
FROM records AS r JOIN searchtags AS t ON r.id = t.recordid
WHERE t.id IN (10) -- or '= 10' but IN is consistent with what follows
The two items (6 or 7) can be done with (Query 3):
SELECT r.id, r.name
FROM records AS r JOIN searchtags AS t ON r.id = t.recordid
WHERE t.id IN (6, 7)
GROUP BY r.id, r.name
HAVING COUNT(*) = 2
The three items (1, 2, 5) can be done with (Query 4):
SELECT r.id, r.name
FROM records AS r JOIN searchtags AS t ON r.id = t.recordid
WHERE t.id IN (1, 2, 5)
GROUP BY r.id, r.name
HAVING COUNT(*) = 3
And the whole collection can be a UNION of the three terms.
Generalizing the solutions
The downside of this solution is that the SQL must be manually crafted for each set of items.
If you want to automate the 'SQL generation', you need the control data - the sets of interesting search tags - in a table:
CREATE TABLE InterestingTags(GroupID INTEGER, TagID INTEGER);
INSERT INTO InterestingTags(1, 1);
INSERT INTO InterestingTags(1, 2);
INSERT INTO InterestingTags(1, 5);
INSERT INTO InterestingTags(2, 6);
INSERT INTO InterestingTags(2, 7);
INSERT INTO InterestingTags(3, 10);
For the query asking for '(1 OR 2 OR 5) AND (...)' (conjunctive normal form), you can write (Query 5):
SELECT r.id, r.name
FROM records AS r JOIN
searchtags AS s ON r.id = s.recordID JOIN
interestingtags AS t ON s.id = t.tagID
GROUP BY r.id, r.name
HAVING COUNT(DISTINCT t.GroupID) = (SELECT COUNT(DISTINCT GroupID)
FROM InterestingTags);
This checks that the number of distinct 'interesting groups of tags' for a given record is equal to the total number of 'interesting groups of tags'.
For the query asking for '(1 AND 2 AND 5) OR (...)' (disjunctive normal form), you can write a join with InterestingTags and check that the Record has as many entries as the group of tags (Query 6):
SELECT i.id, i.name
FROM (SELECT q.id, q.name, c.GroupSize,
COUNT(DISTINCT t.GroupID) AS GroupCount
FROM records AS q JOIN
searchtags AS s ON q.id = s.recordID JOIN
interestingtags AS t ON s.id = t.tagID JOIN
(SELECT GroupID, COUNT(*) AS GroupSize
FROM InterestingTags
GROUP BY GroupID) AS c ON c.GroupID = t.GroupID
GROUP BY q.id, q.name, c.GroupSize
) AS i
WHERE i.GroupCount = i.GroupSize;
Test Data
I took the test data from Daniel Vassalo's answer and augmented it with some extra values:
CREATE TABLE records (id int, name varchar(10));
CREATE TABLE searchtags (id int, recordid int);
INSERT INTO records VALUES (1, 'a');
INSERT INTO records VALUES (2, 'b');
INSERT INTO records VALUES (3, 'c');
INSERT INTO records VALUES (4, 'd');
INSERT INTO records VALUES (11, 'A11');
INSERT INTO records VALUES (21, 'B12');
INSERT INTO records VALUES (31, 'C13');
INSERT INTO records VALUES (41, 'D14');
INSERT INTO records VALUES (51, 'E15');
INSERT INTO records VALUES (61, 'F16');
INSERT INTO searchtags VALUES (1, 1);
INSERT INTO searchtags VALUES (2, 1);
INSERT INTO searchtags VALUES (6, 1);
INSERT INTO searchtags VALUES (10, 1);
INSERT INTO searchtags VALUES (1, 2);
INSERT INTO searchtags VALUES (2, 2);
INSERT INTO searchtags VALUES (3, 2);
INSERT INTO searchtags VALUES (1, 3);
INSERT INTO searchtags VALUES (10, 3);
INSERT INTO searchtags VALUES (5, 4);
INSERT INTO searchtags VALUES (7, 4);
INSERT INTO searchtags VALUES (10, 4);
INSERT INTO searchtags VALUES (1, 11);
INSERT INTO searchtags VALUES (2, 11);
INSERT INTO searchtags VALUES (5, 11);
INSERT INTO searchtags VALUES (6, 21);
INSERT INTO searchtags VALUES (7, 21);
INSERT INTO searchtags VALUES (10, 31);
INSERT INTO searchtags VALUES (1, 41);
INSERT INTO searchtags VALUES (6, 41);
INSERT INTO searchtags VALUES (10, 41);
INSERT INTO searchtags VALUES (2, 51);
INSERT INTO searchtags VALUES (5, 51);
INSERT INTO searchtags VALUES (10, 51);
INSERT INTO searchtags VALUES (7, 61);
INSERT INTO searchtags VALUES (2, 61);
INSERT INTO searchtags VALUES (1, 61);
CREATE TABLE InterestingTags(GroupID INTEGER, TagID INTEGER);
INSERT INTO InterestingTags VALUES(1, 1);
INSERT INTO InterestingTags VALUES(1, 2);
INSERT INTO InterestingTags VALUES(1, 5);
INSERT INTO InterestingTags VALUES(2, 6);
INSERT INTO InterestingTags VALUES(2, 7);
INSERT INTO InterestingTags VALUES(3, 10);
Test results
The outputs that I got were:
Query 1
1 a
4 d
41 D14
Query 2
1 a
3 c
4 d
31 C13
41 D14
51 E15
Query 3
21 B12
Query 4
11 A11
Query 5
1 a
41 D14
4 d
Query 6
4 d
31 C13
3 c
1 a
41 D14
51 E15
Clearly, if I wanted the output in a specific order, I would add an ORDER BY clause to the queries.

Related

PostgreSQL WITH RECURSIVE order by in non recursive term

I am trying to create a recursive CTE and I wanted to fetch the row in the non recursive term from the table using ORDER BY but it seems impossible to do. Is there any workaround on this?
Example:
CREATE TABLE mytable (
id BIGSERIAL PRIMARY KEY,
ref_id BIGINT NOT NULL,
previous_id BIGINT REFERENCES mytable(id),
some_name TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
INSERT INTO mytable (id, previous_id, ref_id, some_name) VALUES (1, NULL, 1, 'Barry');
INSERT INTO mytable (id, previous_id, ref_id, some_name) VALUES (2, NULL, 1, 'Nick');
INSERT INTO mytable (id, previous_id, ref_id, some_name) VALUES (3, 1, 2, 'Janet');
INSERT INTO mytable (id, previous_id, ref_id, some_name) VALUES (4, 1, 1, 'John');
INSERT INTO mytable (id, previous_id, ref_id, some_name) VALUES (5, 2, 7, 'Ron');
INSERT INTO mytable (id, previous_id, ref_id, some_name) VALUES (6, 1, 1, 'Aaron');
INSERT INTO mytable (id, previous_id, ref_id, some_name) VALUES (7, 4, 1, 'Anna');
The query I am trying to construct
WITH RECURSIVE my_path AS (
SELECT * FROM mytable
WHERE ref_id = 1 AND some_name = 'Anna'
ORDER BY created_at DESC
LIMIT 1
UNION ALL
SELECT ph.* FROM my_path hp
INNER JOIN mytable ph ON hp.previous_id = ph.id
)
SELECT * FROM my_path;
SQLFIDDLE
Just move it into a starter CTE:
updated fiddle
WITH RECURSIVE base_record as (
SELECT * FROM mytable
WHERE ref_id = 1 AND some_name = 'Anna'
ORDER BY created_at DESC
LIMIT 1
), my_path AS (
SELECT * FROM base_record
UNION ALL
SELECT ph.* FROM my_path hp
INNER JOIN mytable ph ON hp.previous_id = ph.id
)
SELECT * FROM my_path;

Postgresql find by count, joined table

Given 3 tables. I need to build SQL query to find two actors who CAST TOGETHER THE MOST and list the titles of those movies. Sort alphabetically
https://www.db-fiddle.com/f/r2Y9CpH8n7MHTeBaqEHe9S/0
The data for reproducing below:
create table film_actor
(
actor_id integer,
film_id integer
)
;
create table film
(
film_id integer,
title varchar
)
;
create table actor
(
actor_id integer,
first_name varchar,
last_name varchar
)
;
INSERT INTO public.film_actor (actor_id, film_id) VALUES (1, 1);
INSERT INTO public.film_actor (actor_id, film_id) VALUES (1, 2);
INSERT INTO public.film_actor (actor_id, film_id) VALUES (1, 3);
INSERT INTO public.film_actor (actor_id, film_id) VALUES (2, 1);
INSERT INTO public.film_actor (actor_id, film_id) VALUES (2, 2);
INSERT INTO public.film_actor (actor_id, film_id) VALUES (2, 3);
INSERT INTO public.film_actor (actor_id, film_id) VALUES (3, 1);
INSERT INTO public.film (film_id, title) VALUES (1, 'First');
INSERT INTO public.film (film_id, title) VALUES (2, 'Second');
INSERT INTO public.film (film_id, title) VALUES (3, 'Third');
INSERT INTO public.film (film_id, title) VALUES (4, 'Fourth');
INSERT INTO public.actor (actor_id, first_name, last_name) VALUES (1, 'John', 'Snow');
INSERT INTO public.actor (actor_id, first_name, last_name) VALUES (2, 'Spider', 'Man');
INSERT INTO public.actor (actor_id, first_name, last_name) VALUES (3, 'Mike', 'Kameron');
Is this what you are looking for?
with acting_pairs as (
select a1.actor_id as a1_id, a2.actor_id as a2_id
from film_actor a1
join film_actor a2 on a1.film_id = a2.film_id
where a1.actor_id < a2.actor_id
)
select a1_id, a2_id, count(*) as total
from acting_pairs
group by (a1_id, a2_id)
order by total desc
limit 1
Giving us expected output for the example input would be nice.

Transpose records from 2 "one to many" tables into one record

Basically, I would like to get a recordset similar to this:
CustomerID, CustomerName, OrderNumbers
1 John Smith 112, 113, 114, 115
2 James Smith 116, 117, 118
Currently I am using an Sql Server UDF to concatenate order #s.
Are there more efficient solutions ?
1) There are allot of solutions.
2) Example (SQL2005+):
DECLARE #Customer TABLE (
CustomerID INT PRIMARY KEY,
Name NVARCHAR(50) NOT NULL
);
INSERT #Customer VALUES (1, 'Microsoft');
INSERT #Customer VALUES (2, 'Macrosoft');
INSERT #Customer VALUES (3, 'Appl3');
DECLARE #Order TABLE (
OrderID INT PRIMARY KEY,
CustomerID INT NOT NULL -- FK
);
INSERT #Order VALUES (1, 1);
INSERT #Order VALUES (2, 2);
INSERT #Order VALUES (3, 2);
INSERT #Order VALUES (4, 1);
INSERT #Order VALUES (5, 2);
SELECT c.CustomerID, c.Name, oa.OrderNumbers
FROM #Customer AS c
/*CROSS or */OUTER APPLY (
SELECT STUFF((
SELECT ',' + CONVERT(VARCHAR(11), o.OrderID) FROM #Order AS o
WHERE o.CustomerID = c.CustomerID
-- ORDER BY o.OrderID -- Uncomment of order nums must be sorted
FOR XML PATH(''), TYPE
).value('.', 'VARCHAR(MAX)'), 1, 1, '') AS OrderNumbers
) oa;
Output:
CustomerID Name OrderNumbers
----------- --------- ------------
1 Microsoft 1,4
2 Macrosoft 2,3,5
3 Appl3 NULL

where exists - all - group by?

I use SQL Server 2008 R2.
I have a weird problem as following. I have a table as shown in
I need to write such a query like:
SELECT DISTINCT Field1
FROM MYTABLE
WHERE Field2 IN (96,102)
in this query, WHERE Field2 IN (96,102) gives me 96 or 102 or both!
More over, I would like to return rows that contains 96 and 102 at the same time!
Is there any suggestion? please write result oriented...
I have made a sqlfiddle for this..
create table a (id int, val int)
go
insert into a select 1, 22
insert into a select 1, 122
insert into a select 2, 22
insert into a select 3, 122
insert into a select 4, 22
insert into a select 4, 122
then select like this
select count(distinct id), id
from a
where val in (22, 122)
group by id
having count(id) > 1
EDIT: count(distinct id) will only show distinct counts..
EDIT:
Here's a sqlfiddle example (thanks to Mark Kremers):
http://sqlfiddle.com/#!3/df201/1
create table mytable (field1 int, field2 int)
go
insert into mytable values (199201, 84)
insert into mytable values (199201, 96)
insert into mytable values (199201, 102)
insert into mytable values (199201, 103)
insert into mytable values (581424, 96)
insert into mytable values (581424, 84)
insert into mytable values (581424, 106)
insert into mytable values (581424, 122)
insert into mytable values (687368, 79)
insert into mytable values (687368, 96)
insert into mytable values (687368, 102)
insert into mytable values (687368, 104)
insert into mytable values (687368, 106)
Here's the query:
select distinct a.field1 from
( select field1 from mytable where field2=96) a
inner join
( select field1 from mytable where field2=102) b
on a.field1 = b.field1
And here are the results:
FIELD1
199201
687368
Finally, here's a simplified version of the query (thans to pst):
select distinct a.field1 from mytable a
inner join mytable b
on a.field1 = b.field1
where a.field2=96 and b.field2=102
Use a self-join? Not the most tidy, but I think it works well for 2 values
SELECT *
FROM T R1
JOIN T R2 -- join table with itself
ON R1.F1 = R2.F1 -- where the first field is the same
WHERE R1.F2 = 96 AND R2.F2 = 102 -- and each has one of the required values
(T = Table, Rx = Relation Alias, Fx = Field)
If there can be an arbitrary number of fields, this can be solved as
CREATE TABLE #T (id int, val int)
GO
INSERT INTO #T (id, val)
VALUES
(1, 22), (1, 22), -- no, only 22 (but 2 records)
(2, 22), (2, 122), -- yes, both values (only)
(3, 122), -- no, only 122
(4, 22), (4,122), -- yes, both values ..
(4, 444), (4, null), -- and extra values
(5, 555) -- no, neither value
GO
-- Using DISTINCT over filtered results first, as
-- SQL Server 2008 does not support HAVING COUNT(DISTINCT F1, F2)
SELECT id
FROM (SELECT DISTINCT id, val
FROM #T
WHERE val IN (22, 122)) AS R1
GROUP BY id
HAVING COUNT(id) >= 2 -- or 3 or ..
GO
-- Or a similar variation, as can COUNT(DISTINCT ..)
-- in the SELECT of a GROUP BY
SELECT id
FROM (SELECT id, COUNT(DISTINCT val) as ct
FROM #T
WHERE val IN (22, 122)
GROUP BY id) AS R1
WHERE ct >= 2 -- or 3 or ..
GO
For larger IN (..) sizes, say above 20 values, it may be advisable to use a separate table or table-value and a JOIN for performance reasons.
Try from your original query:
SELECT DISTINCT Field1
FROM MYTABLE
WHERE rtrim(ltrim(cast(Field2 as varchar))) IN ('96','102')

SQL Server Conditional Comparing

I have two tables:
CREATE TABLE #HOURS
(DAY INTEGER,
HOUR INTEGER)
CREATE TABLE #PERSONS
(DAY INTEGER, HOUR INTEGER,
Name NVARCHAR(50))
GO
INSERT #HOURS VALUES (1, 5)
INSERT #HOURS VALUES (1, 6)
INSERT #HOURS VALUES (1, 8)
INSERT #HOURS VALUES (1, 10)
INSERT #HOURS VALUES (1, 14)
INSERT #HOURS VALUES (1, 15)
INSERT #HOURS VALUES (1, 16)
INSERT #HOURS VALUES (1, 17)
INSERT #HOURS VALUES (1, 18)
INSERT #PERSONS VALUES (1, 5, 'Steve')
INSERT #PERSONS VALUES (1, 6, 'Steve')
INSERT #PERSONS VALUES (1, 7, 'Steve')
INSERT #PERSONS VALUES (1, 8, 'Steve')
INSERT #PERSONS VALUES (1, 10, 'Steve')
INSERT #PERSONS VALUES (1, 14, 'Steve')
INSERT #PERSONS VALUES (1, 15, 'Steve')
INSERT #PERSONS VALUES (1, 16, 'Steve')
INSERT #PERSONS VALUES (1, 17, 'Steve')
INSERT #PERSONS VALUES (1, 10, 'Jim')
INSERT #PERSONS VALUES (1, 11, 'Jim')
INSERT #PERSONS VALUES (1, 12, 'Jim')
INSERT #PERSONS VALUES (1, 13, 'Jim')
GO
Hours shows the work hours and #Persons shows the persons that entered the system on hourly base.
I'd like to find the persons whose work hours matches the hours table. But he or she can skip two work hours.
I've tried this:
select t.Day, sum(t.Nulls)
from
(select h.Hour, h.Day
, Case
WHEN p.Hour is null Then 1 ELSE 0 END Nulls
from #HOURS h
left join #PERSONS P on h.Hour = p.Hour AND h.Day = p.Day) t
group by t.Day
HAVING sum(t.Nulls) < 2
But this only works when there is not different persons on the same day ;)
Any suggestions?
What I think you mean is that for each combination of day and person, you want to return it if the number of valid hours that that person worked on that day is within 2 hours of the number of valid hours that exist for that day. If so, this should do the trick:
SELECT
Day
, Name
, HoursWorked
, HoursInDay
FROM (
SELECT
p.Day
, p.Name
, COUNT(*) HoursWorked
, (SELECT COUNT(*) FROM #Hours H2 WHERE H2.Day = P.Day) HoursInDay
FROM
#Persons P INNER JOIN #Hours H
ON P.Day = H.Day And P.Hour = H.Hour
GROUP BY
p.Day, p.Name
) Data
WHERE
HoursWorked + 2 >= HoursInDay
I think you still need to clarify a bit further what you'd expect as a result. In the mean time, would this get you started?
SELECT p.DAY, p.Name, WORKED = COUNT(*), WORKHOURS = AVG(hc.WORKHOURS)
FROM #PERSONS p
INNER JOIN #HOURS h ON h.DAY = p.DAY AND h.HOUR = p.HOUR
LEFT OUTER JOIN (
SELECT DAY, WORKHOURS = COUNT(*)
FROM #HOURS
GROUP BY DAY
) hc ON hc.DAY = p.DAY
GROUP BY p.DAY, p.Name