Smart way to filter out unnecessary rows from Query - postgresql

So I have a query that shows a huge amount of mutations in postgres. The quality of data is bad and i have "cleaned" it as much as possible.
To make my report so user-friendly as possible I want to filter out some rows that I know the customer don't need.
I have following columns id, change_type, atr, module, value_old and value_new
For change_type = update i always want to show every row.
For the rest of the rows i want to build some kind of logic with a combination of atr and module.
For example if the change_type <> 'update' and concat atr and module is 'weightperson' than i don't want to show that row.
In this case id 3 and 11 are worthless and should not be shown.
Is this the best way to solve this or does anyone have another idea?
select * from t1
where concat(atr,module) not in ('weightperson','floorrentalcontract')
In the end my "not in" part will be filled with over 100 combinations and the query will not look good. Maybe a solution with a cte would make it look prettier and im also concerned about the perfomance..
CREATE TABLE t1(id integer, change_type text, atr text, module text, value_old text, value_new text) ;
INSERT INTO t1 VALUES
(1,'create','id','person',null ,'9'),
(2,'create','username','person',null ,'abc'),
(3,'create','weight','person',null ,'60'),
(4,'update','id','order','4231' ,'4232'),
(5,'update','filename','document','first.jpg' ,'second.jpg'),
(6,'delete','id','rent','12' ,null),
(7,'delete','cost','rent','600' ,null),
(8,'create','id','rentalcontract',null ,'110'),
(9,'create','tenant','rentalcontract',null ,'Jack'),
(10,'create','rent','rentalcontract',null ,'420'),
(11,'create','floor','rentalcontract',null ,'1')
Fiddle

You could put the list of combinations in a separate table and join with that table, or have them listed directly in a with-clause like this:
with combinations_to_remove as (
select *
from (values
('weight', 'person'),
('floor' ,'rentalcontract')
) as t (atr, module)
)
select t1.*
from t1
left join combinations_to_remove using(atr, module)
where combinations_to_remove.atr is null
I guess it would be cleaner and easier to maintain if you put them in a separate table!
Read more on with-queries if that sounds strange to you.

Related

Optimizing a query with multiple IN

I have a query like this:
SELECT * FROM table
WHERE department='param1' AND type='param2' AND product='param3'
AND product_code IN (10-30 alphanumerics) AND unit_code IN (10+ numerics)
AND first_name || last_name IN (10-20 names)
AND sale_id LIKE ANY(list of regex string)
Runtime was too high so I was asked to optimize it.
The list of parameters varies for the code columns for different users.
Each user provides their list of codes and then loops over product.
product used to be an IN clause list as well but it was split up.
Things I tried
By adding an index on (department, type and product) I was able to get a 4x improvement.
Current runtime is that some values of product only take 2-3 seconds, while others take 30s.
Tried creating a pre-concat'd column of first_name || last_name, but the runtime improvement was too small to be worth it.
Is there some way I can improve the performance of the other clauses, such as the "IN" clauses or the LIKE ANY clause?
In my experience replacing large IN lists, with a JOIN to a VALUES clause often improves performance.
So instead of:
SELECT *
FROM table
WHERE department='param1'
AND type='param2'
AND product='param3'
AND product_code IN (10-30 alphanumerics)
Use:
SELECT *
FROM table t
JOIN ( values (1),(2),(3) ) as x(code) on x.code = t.product_code
WHERE department='param1'
AND type='param2'
AND product='param3'
But you have to make sure you don't have any duplicates in the values () list
The concatenation is also wrong because the concatenated value is something different then comparing each value individually, e.g. ('alexander', 'son') would be treated identical to ('alex', 'anderson')`
You should use:
and (first_name, last_name) in ( ('fname1', 'lname1'), ('fname2', 'lname2'))
This can also be written as a join
SELECT *
FROM table t
JOIN ( values (1),(2),(3) ) as x(code) on x.code = t.product_code
JOIN (
values ('fname1', 'lname1'), ('fname2', 'lname2')
) as n(fname, lname) on (n.fname, n.lname) = (t.first_name, t.last_name)
WHERE department='param1'
AND type='param2'
AND product='param3'
You generally don't have to do anything special to enable an index for it to be used with multiple IN-lists, other than keep the table well vacuumed and analyzed. A btree index on (department, type, product, product_code, unit_code, (first_name || last_name)) should work well. If it doesn't, please show an EXPLAIN (ANALYZE, BUFFERS) for it, preferably with track_io_timing turned on. If the selectivities of each of your conditions are not mostly independent of each other, that might lead to planning problems.

TSQL order by but first show these

I'm researching a dataset.
And I just wonder if there is a way to order like below in 1 query
Select * From MyTable where name ='international%' order by id
Select * From MyTable where name != 'international%' order by id
So first showing all international items, next by names who dont start with international.
My question is not about adding columns to make this work, or use multiple DB's, or a largerTSQL script to clone a DB into a new order.
I just wonder if anything after 'Where or order by' can be tricked to do this.
You can use expressions in the ORDER BY:
Select * From MyTable
order by
CASE
WHEN name like 'international%' THEN 0
ELSE 1
END,
id
(From your narrative, it also sounded like you wanted like, not =, so I changed that too)
Another way (slightly cleaner and a tiny bit faster)
-- Sample Data
DECLARE #mytable TABLE (id INT IDENTITY, [name] VARCHAR(100));
INSERT #mytable([name])
VALUES('international something' ),('ACME'),('international waffles'),('ABC Co.');
-- solution
SELECT t.*
FROM #mytable AS t
ORDER BY -PATINDEX('international%', t.[name]);
Note too that you can add a persisted computed column for -PATINDEX('international%', t.[name]) to speed things up.

Postgres subquery has access to column in a higher level table. Is this a bug? or a feature I don't understand?

I don't understand why the following doesn't fail. How does the subquery have access to a column from a different table at the higher level?
drop table if exists temp_a;
create temp table temp_a as
(
select 1 as col_a
);
drop table if exists temp_b;
create temp table temp_b as
(
select 2 as col_b
);
select col_a from temp_a where col_a in (select col_a from temp_b);
/*why doesn't this fail?*/
The following fail, as I would expect them to.
select col_a from temp_b;
/*ERROR: column "col_a" does not exist*/
select * from temp_a cross join (select col_a from temp_b) as sq;
/*ERROR: column "col_a" does not exist
*HINT: There is a column named "col_a" in table "temp_a", but it cannot be referenced from this part of the query.*/
I know about the LATERAL keyword (link, link) but I'm not using LATERAL here. Also, this query succeeds even in pre-9.3 versions of Postgres (when the LATERAL keyword was introduced.)
Here's a sqlfiddle: http://sqlfiddle.com/#!10/09f62/5/0
Thank you for any insights.
Although this feature might be confusing, without it, several types of queries would be more difficult, slower, or impossible to write in sql. This feature is called a "correlated subquery" and the correlation can serve a similar function as a join.
For example: Consider this statement
select first_name, last_name from users u
where exists (select * from orders o where o.user_id=u.user_id)
Now this query will get the names of all the users who have ever placed an order. Now, I know, you can get that info using a join to the orders table, but you'd also have to use a "distinct", which would internally require a sort and would likely perform a tad worse than this query. You could also produce a similar query with a group by.
Here's a better example that's pretty practical, and not just for performance reasons. Suppose you want to delete all users who have no orders and no tickets.
delete from users u where
not exists (select * from orders o where o.user_d = u.user_id)
and not exists (select * from tickets t where t.user_id=u.ticket_id)
One very important thing to note is that you should fully qualify or alias your table names when doing this or you might wind up with a typo that completely messes up the query and silently "just works" while returning bad data.
The following is an example of what NOT to do.
select * from users
where exists (select * from product where last_updated_by=user_id)
This looks just fine until you look at the tables and realize that the table "product" has no "last_updated_by" field and the user table does, which returns the wrong data. Add the alias and the query will fail because no "last_updated_by" column exists in product.
I hope this has given you some examples that show you how to use this feature. I use them all the time in update and delete statements (as well as in selects-- but I find an absolute need for them in updates and deletes often)

TSQL Keyword Previous or Last or something similar

This question is geared for those who have more SQL experience than me.
I am writing a query(that will eventually be a Stored Procedure but this should be irrelevant) where I want to select the count of rows if the most recent entry's is equivalent to the one that was just entered before. And i want to continue to do this until it hits an entry that has a different value. (Poorly explained so I will show the example)
In my table I have a column 'Product_Id' and when this query is run i want it take the product_id and compare it to the previously entered product Id, if its the same I want to add one, and I want it to keep checking the previously entered product_id until it runs into a different product_id
I'm hoping it sounds more complicated than it is, and the query would look something like
Select count(Product_ID)
FROM dbo.myTable
Where Product_Id = previous(Product_Id)
Now, i know that previous isn't a keyword in TSQL, and neither was Last, but I'm hoping of someone who knows a keyword that does what I am asking.
Edit for Sam
USE DbName;
GO
WITH OrderedCount as
(
select ROW_NUMBER() OVER (Order by dbo.Line_Production.Run_Date DESC) as RowNumber,
Line_Production.Product_ID
From dbo.Line_Production
)
Select RowNumber, COUNT(OrderedCount.Product_ID) as PalletCount
From OrderedCount
WHERE OrderedCount.RowNumber + 1 = RowNumber
and Product_ID = Product_ID
Group by RowNumber
The OrderedCount portion works, and it returns the data back how I want it, I'm now having trouble comparing the Product_ID's for different RowNumbers
my Where Clause is wrong
There's no keyword. That would be a nice magic solution, but it doesn't exist, at least in part because there is no guaranteed ordering (okay, you could have the keyword only if there is an ORDER BY...). I can write you a query, but that'll take time, so for now I'll give you a few steps and I'll come back and see if you still need help in a bit.
Figure out an ORDER BY, otherwise no order is guaranteed. If there is a time entered field, that's a good choice, or an index, that works too.
Learn to use Row_Number.
Compare the table (with Row_Number) to itself where instance1.row - 1 = instance2.row.
If product_id is an identity column, couldn't you just do product_id - 1? In other words, if it's sequential, it's the same as using ROW_NUMBER mentioned in the previous comment.

Dynamic number of fields in table

I have a problem with TSQL. I have a number of tables, each table contain different number of fielsds with different names.
I need dynamically take all this tables, read all records and manage each record into string list, where each value separated by commas. And do smth. with this string.
I think that I need to use CURSORS, but I can't FETCH em without knowing A concrete amount of fields with names and types. Maybe I can create a table variable with dynamic number of fields?
Thanks a lot!
Makarov Artem.
I would repurpose one of the many T-SQL scripts written to generate INSERT statements. They do exactly what you require. Namely
Reverse engineer a given table to determine columns names and types
Generate a delimited string of values
The most complete example I've found is here
But just a simple Google search for "INSERT STATEMENT GENERATOR" will yield several examples that you can repurpose to fit your needs.
Best of luck!
SELECT
ORDINAL_POSITION
,COLUMN_NAME
,DATA_TYPE
,CHARACTER_MAXIMUM_LENGTH
,IS_NULLABLE
,COLUMN_DEFAULT
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
TABLE_NAME = 'MYTABLE'
ORDER BY
ORDINAL_POSITION ASC;
from http://weblogs.sqlteam.com/joew/archive/2008/04/27/60574.aspx
Perhaps you can do something with this.
select T2.X.query('for $i in *
return concat(data($i), ",")'
).value('.', 'nvarchar(max)') as C
from (
select *
from YourTable
for xml path('Row'),elements xsinil, type
) as T1(X)
cross apply T1.X.nodes('/Row') T2(X)
It will give you one row for each row in YourTable with each value in YourTable separated by a comma in the column C.
This builds an XML for the entire table and then parses that XML. Might get you into trouble if you have tables with a lot of rows.
BTW: I saw from a comment that you can "use only pure SQL". I really don't think this qualifies as "pure SQL" :).