Match ALL (not ANY) values from a WHERE clause [duplicate] - tsql

TL;DR How do I find rows in a table that match ALL (not ANY) rows from another table?
This seems so simple, but I don't know the correct terminology, so I'm seeing dozens of answers that use INNER JOIN, INTERSECT, EXISTS or ALL but don't achieve what I need. The other questions are either about PostgreSQL, build the SQL dynamically in the application, or are unanswered.
Take the following people who like different colors:
DECLARE @tbl TABLE (
    FirstName nvarchar(50),
    Color nvarchar(50)
);
INSERT INTO @tbl
    (FirstName, Color)
VALUES
    ('Bob', 'Purple'),
    ('Bob', 'Red'),
    ('Bob', 'Yellow'),
    ('Fred', 'Purple'),
    ('Fred', 'Red'),
    ('Fred', 'Yellow'),
    ('Greg', 'Orange'),
    ('Greg', 'Red'),
    ('Harry', 'Red');
I need to find people who like ALL of the colors I'm searching for.
DECLARE @SearchColors TABLE (SearchColor nvarchar(50));
INSERT INTO @SearchColors (SearchColor) VALUES ('Red'), ('Yellow');
So I would only expect to see Bob and Fred in the results, because only those two people like ALL of the colors I'm searching for. I don't want people who like only one of the colors; however, it doesn't matter if someone likes more than those two (e.g. Bob likes 3 colors, including the two I need).
Reading through books online, I found ALL, which appeared close to what I need, but actually finds nothing (unless I'm using it wrong):
SELECT
    *
FROM
    @tbl
WHERE
    (Color = ALL ( SELECT SearchColor FROM @SearchColors ));

What about getting the total count of colors you are looking for, filtering your main table for those, then counting names that have that many entries?
DECLARE @colorcount INT = (SELECT COUNT(DISTINCT SearchColor) FROM @SearchColors);

SELECT firstname
FROM @tbl
WHERE color IN (SELECT searchcolor FROM @SearchColors)
GROUP BY firstname
HAVING COUNT(DISTINCT color) = @colorcount;
This will also handle dupes: it still works if ('Greg', 'Red') is in your @tbl twice, or if your @SearchColors table has ('Red') twice.
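Another pattern worth knowing for this kind of "relational division" is a double NOT EXISTS: keep each person for whom there is no search color they don't like. A sketch against the same table variables (not part of the original answer, but it returns Bob and Fred for this data):

SELECT DISTINCT t.FirstName
FROM @tbl t
WHERE NOT EXISTS (
    -- a search color...
    SELECT 1
    FROM @SearchColors s
    WHERE NOT EXISTS (
        -- ...that this person does not like
        SELECT 1
        FROM @tbl t2
        WHERE t2.FirstName = t.FirstName
          AND t2.Color = s.SearchColor
    )
);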

Related

Smart way to filter out unnecessary rows from Query

So I have a query that shows a huge number of mutations in Postgres. The quality of the data is bad and I have "cleaned" it as much as possible.
To make my report as user-friendly as possible I want to filter out some rows that I know the customer doesn't need.
I have the following columns: id, change_type, atr, module, value_old and value_new.
For change_type = update I always want to show every row.
For the rest of the rows I want to build some kind of logic based on a combination of atr and module.
For example, if the change_type <> 'update' and the concatenation of atr and module is 'weightperson', then I don't want to show that row.
In this case ids 3 and 11 are worthless and should not be shown.
Is this the best way to solve this, or does anyone have another idea?
select * from t1
where concat(atr,module) not in ('weightperson','floorrentalcontract')
In the end my "not in" part will be filled with over 100 combinations and the query will not look good. Maybe a solution with a CTE would make it look prettier, and I'm also concerned about the performance.
CREATE TABLE t1(id integer, change_type text, atr text, module text, value_old text, value_new text) ;
INSERT INTO t1 VALUES
(1,'create','id','person',null ,'9'),
(2,'create','username','person',null ,'abc'),
(3,'create','weight','person',null ,'60'),
(4,'update','id','order','4231' ,'4232'),
(5,'update','filename','document','first.jpg' ,'second.jpg'),
(6,'delete','id','rent','12' ,null),
(7,'delete','cost','rent','600' ,null),
(8,'create','id','rentalcontract',null ,'110'),
(9,'create','tenant','rentalcontract',null ,'Jack'),
(10,'create','rent','rentalcontract',null ,'420'),
(11,'create','floor','rentalcontract',null ,'1')
You could put the list of combinations in a separate table and join with that table, or have them listed directly in a with-clause like this:
with combinations_to_remove as (
    select *
    from (values
        ('weight', 'person'),
        ('floor', 'rentalcontract')
    ) as t (atr, module)
)
select t1.*
from t1
left join combinations_to_remove using (atr, module)
where combinations_to_remove.atr is null
I guess it would be cleaner and easier to maintain if you put them in a separate table!
Read more on with-queries if that sounds strange to you.
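The question also says that rows with change_type = 'update' should always be shown; a sketch of the same query with that condition added (the answer above does not include it):

with combinations_to_remove as (
    select *
    from (values
        ('weight', 'person'),
        ('floor', 'rentalcontract')
    ) as t (atr, module)
)
select t1.*
from t1
left join combinations_to_remove using (atr, module)
where t1.change_type = 'update'
   or combinations_to_remove.atr is null;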

Optimizing a query with multiple IN

I have a query like this:
SELECT * FROM table
WHERE department='param1' AND type='param2' AND product='param3'
AND product_code IN (10-30 alphanumerics) AND unit_code IN (10+ numerics)
AND first_name || last_name IN (10-20 names)
AND sale_id LIKE ANY(list of regex string)
Runtime was too high so I was asked to optimize it.
The list of parameters varies for the code columns for different users.
Each user provides their list of codes and then loops over product.
product used to be an IN clause list as well but it was split up.
Things I tried
By adding an index on (department, type, product) I was able to get a 4x improvement.
Currently some values of product take only 2-3 seconds, while others take 30s.
I tried creating a pre-concatenated column of first_name || last_name, but the runtime improvement was too small to be worth it.
Is there some way I can improve the performance of the other clauses, such as the "IN" clauses or the LIKE ANY clause?
In my experience, replacing large IN lists with a JOIN to a VALUES clause often improves performance.
So instead of:
SELECT *
FROM table
WHERE department='param1'
AND type='param2'
AND product='param3'
AND product_code IN (10-30 alphanumerics)
Use:
SELECT *
FROM table t
JOIN ( values (1),(2),(3) ) as x(code) on x.code = t.product_code
WHERE department='param1'
AND type='param2'
AND product='param3'
But you have to make sure you don't have any duplicates in the VALUES list.
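If the list might contain duplicates, one option (a sketch of mine, not from the answer) is to de-duplicate it inline before joining:

SELECT *
FROM table t
JOIN ( SELECT DISTINCT code
       FROM ( values (1),(2),(3) ) AS v(code)
     ) AS x ON x.code = t.product_code
WHERE department='param1'
AND type='param2'
AND product='param3'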
The concatenation is also wrong, because comparing the concatenated value is not the same as comparing each value individually; e.g. ('alexander', 'son') would be treated the same as ('alex', 'anderson').
You should use:
and (first_name, last_name) in ( ('fname1', 'lname1'), ('fname2', 'lname2'))
This can also be written as a join
SELECT *
FROM table t
JOIN ( values (1),(2),(3) ) as x(code) on x.code = t.product_code
JOIN (
values ('fname1', 'lname1'), ('fname2', 'lname2')
) as n(fname, lname) on (n.fname, n.lname) = (t.first_name, t.last_name)
WHERE department='param1'
AND type='param2'
AND product='param3'
You generally don't have to do anything special to enable an index for it to be used with multiple IN-lists, other than keep the table well vacuumed and analyzed. A btree index on (department, type, product, product_code, unit_code, (first_name || last_name)) should work well. If it doesn't, please show an EXPLAIN (ANALYZE, BUFFERS) for it, preferably with track_io_timing turned on. If the selectivities of each of your conditions are not mostly independent of each other, that might lead to planning problems.
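For illustration, the suggested index might look roughly like this (the index name is made up, and "table" stands in for the real table name as in the question's query):

CREATE INDEX idx_lookup
    ON table (department, type, product, product_code, unit_code, (first_name || last_name));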

Split on a specific character - T-SQL

I have a table with a column Country_City which contains combinations of countries and cities separated with :, for example Egypt: Cairo, and I want to split them into 2 different columns, Country & City.
I managed to fulfill this task with the SUBSTRING & CHARINDEX functions, but I'm searching for another solution if there is one.
Any opinions? Thanks in advance.
There are several approaches, but - to be honest - only one good choice: You should never ever store these values in one single column.
If you have to stick with this (legacy issue) or if you need this code in order to clean this bad structure, you may check one of these:
First a mockup table to simulate your issue:
DECLARE @tbl TABLE(ID INT IDENTITY, Country_Region NVARCHAR(1000));
INSERT INTO @tbl VALUES('Egypt: Cairo'),('Germany: Berlin');
--Fastest in most cases will be this:
SELECT t.*
      ,TRIM(LEFT(t.Country_Region,A.PosColon-1)) AS Country
      ,TRIM(SUBSTRING(t.Country_Region,A.PosColon+1,1000)) AS Region
FROM @tbl t
CROSS APPLY(SELECT CHARINDEX(':',t.Country_Region) PosColon) A;
--Easy to read and good to use with more than two items per string (but rather slow)
SELECT t.*
      ,A.CastedToXml.value('/x[1]','nvarchar(max)') AS Country
      ,A.CastedToXml.value('/x[2]','nvarchar(max)') AS Region
FROM @tbl t
CROSS APPLY(SELECT CAST('<x>' + REPLACE(t.Country_Region,': ','</x><x>') + '</x>' AS XML) CastedToXml) A;
--Needs SQL Server 2016+ (for JSON support), but is very fast, easy to read and easy to scale up
SELECT t.*
      ,JSON_VALUE(A.AsJSON,'$[0]') AS Country
      ,JSON_VALUE(A.AsJSON,'$[1]') AS Region
FROM @tbl t
CROSS APPLY(SELECT CONCAT('["',REPLACE(t.Country_Region,': ','","'),'"]') AsJSON) A;
All of them produce the same output:
ID  Country_Region    Country  Region
1   Egypt: Cairo      Egypt    Cairo
2   Germany: Berlin   Germany  Berlin
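One more option sometimes used when there are exactly two parts (not covered in the answer above): abusing PARSENAME, which splits on periods. It only works if neither part contains a period and there are at most four parts:

SELECT t.*
      ,PARSENAME(REPLACE(t.Country_Region,': ','.'),2) AS Country
      ,PARSENAME(REPLACE(t.Country_Region,': ','.'),1) AS Region
FROM @tbl t;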

TSQL order by but first show these

I'm researching a dataset.
And I just wonder if there is a way to order like below in 1 query
Select * From MyTable where name ='international%' order by id
Select * From MyTable where name != 'international%' order by id
So first show all international items, then the names that don't start with international.
My question is not about adding columns to make this work, using multiple DBs, or a larger T-SQL script to clone a DB into a new order.
I just wonder if anything after the WHERE or ORDER BY can be tricked into doing this.
You can use expressions in the ORDER BY:
Select * From MyTable
order by
    CASE
        WHEN name like 'international%' THEN 0
        ELSE 1
    END,
    id
(From your narrative, it also sounded like you wanted like, not =, so I changed that too)
Another way (slightly cleaner and a tiny bit faster)
-- Sample Data
DECLARE @mytable TABLE (id INT IDENTITY, [name] VARCHAR(100));
INSERT @mytable([name])
VALUES('international something'),('ACME'),('international waffles'),('ABC Co.');
-- solution
SELECT t.*
FROM @mytable AS t
ORDER BY -PATINDEX('international%', t.[name]);
Note too that you can add a persisted computed column for -PATINDEX('international%', t.[name]) to speed things up.
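PATINDEX returns 1 for names that start with 'international' and 0 otherwise, so negating it sorts the matches (-1) ahead of the rest (0). On a permanent table that could look roughly like this (dbo.MyTable and its columns are assumed here, as a sketch only):

ALTER TABLE dbo.MyTable
    ADD intl_first AS (-PATINDEX('international%', [name])) PERSISTED;

CREATE INDEX IX_MyTable_intl_first ON dbo.MyTable (intl_first, id);

SELECT t.*
FROM dbo.MyTable AS t
ORDER BY t.intl_first, t.id;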

PostgreSQL - Update rows in table with generate_series()

I have the following table:
create table test(
    id serial primary key,
    firstname varchar(32),
    lastname varchar(64),
    id_desc char(8)
);
I need to insert 100 rows of data. Getting the names is no problem - I have two tables, one containing ten rows of first names and the other containing ten last names. By doing an insert-select query with a cross join I am able to get 100 rows of data (10x10 cross join).
id_desc consists of eight characters (the fixed size is mandatory). It always starts with the same pattern (e.g. abcde) followed by 001, 002, etc. up to 999. I have tried to achieve this with the following statement:
update test set id_desc = 'abcde' || num.id
from (select * from generate_series(1, 100) as id) as num
where num.id = (select id from test where id = num.id);
The statement executes but affects zero rows. I know that the where-clause probably does not make much sense; I have been trying to finally get this to work and just started trying a couple of things. Didn't want to omit it though when posting here because I know it is definitely required.
Laurenz's suggestion fits this specific case very well. I recommend using it.
The rest of this is for the more general case where that simplification is not appropriate.
In my tests, the update as written doesn't work this way.
I think you are better off using a WITH clause and a window function.
WITH ranked_ids (id, rank) AS (
    select id, row_number() OVER (rows unbounded preceding)
    FROM test
)
update test set id_desc = 'abcde' || ranked_ids.rank
from ranked_ids WHERE test.id = ranked_ids.id;
It should be as simple as
UPDATE test SET id_desc = 'abcde' || to_char(id, 'FM099');
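Run against ids 1 through 100, to_char(id, 'FM099') zero-pads the number so the result fills the char(8) column, e.g.:

SELECT id, id_desc FROM test ORDER BY id LIMIT 3;
-- id | id_desc
--  1 | abcde001
--  2 | abcde002
--  3 | abcde003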