In PostgreSQL, how can I optimize a query that finds the differences between the current row and the immediately previous one?

I have this audit table:
User  date        text   text 2
u1    2023-01-01  hi     yes
u1    2022-12-20  hi     no
u1    2022-12-01  hello  maybe
And I need a result like this:
User  date        text  text 2
u1    2023-01-01  null  x
u1    2022-12-20  x     x
u1    2022-12-01  null  null
So I can see which columns changed since the previous entry.
Something like this works, but I think there may be a way to optimize it, or at least to write a query that is easier to read? (I need this information for almost 20 columns, not only 3.)
SELECT
ta.audit_date,
ta.audit_user,
CASE
WHEN ta.audit_operation = 'I' THEN 'Insert'
WHEN ta.audit_operation = 'U' THEN 'Update'
END AS action,
CASE WHEN ta.column1 <> (SELECT column1
FROM audit_table ta1
WHERE ta1.id = 9207 AND ta1.audit_date < ta.audit_date
ORDER BY ta1.audit_date DESC
LIMIT 1)
THEN 'X' ELSE null END column1,
CASE WHEN ta.column2 <> (SELECT column2
FROM audit_table ta1
WHERE ta1.id = 9207 AND ta1.audit_date < ta.audit_date
ORDER BY ta1.audit_date DESC
LIMIT 1)
THEN 'X' ELSE null END column2,
CASE WHEN ta.column3 <> (SELECT column3
FROM audit_table ta1
WHERE ta1.id = 9207 AND ta1.audit_date < ta.audit_date
ORDER BY ta1.audit_date DESC
LIMIT 1)
THEN 'X' ELSE null END column3
FROM
audit_table ta
WHERE
ta.id = 9207
ORDER BY
audit_date DESC
Thank you!

I think you can just use the LAG() analytic function here. If I understand correctly:
SELECT *, CASE WHEN text != LAG(text) OVER (ORDER BY date) THEN 'x' END AS text_label,
CASE WHEN text2 != LAG(text2) OVER (ORDER BY date) THEN 'x' END AS text2_label
FROM yourTable
ORDER BY date;
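Applied to the query from the question, the same idea might look like the sketch below. It assumes the table and column names shown in the question (audit_table, audit_date, audit_user, audit_operation, column1 to column3) and that audit_date is unique per id; with duplicate dates LAG() picks one of the tied rows, whereas the original subqueries skip rows with an equal date. The named WINDOW clause is only there to avoid repeating the window definition for each of the roughly 20 columns.

```sql
SELECT
    ta.audit_date,
    ta.audit_user,
    CASE ta.audit_operation
        WHEN 'I' THEN 'Insert'
        WHEN 'U' THEN 'Update'
    END AS action,
    -- LAG() returns the value from the previous row in window order.
    -- On the oldest row it is NULL, so the comparison is NULL and no 'X' is shown.
    CASE WHEN ta.column1 <> LAG(ta.column1) OVER w THEN 'X' END AS column1,
    CASE WHEN ta.column2 <> LAG(ta.column2) OVER w THEN 'X' END AS column2,
    CASE WHEN ta.column3 <> LAG(ta.column3) OVER w THEN 'X' END AS column3
FROM audit_table ta
WHERE ta.id = 9207
-- One window definition shared by every LAG() call above.
WINDOW w AS (ORDER BY ta.audit_date)
ORDER BY ta.audit_date DESC;
```

If a change from or to NULL should also be marked, IS DISTINCT FROM can replace <>, but then the oldest row would be marked whenever its value is not NULL, so it would need its own condition.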

Related

PostgreSQL: set ORDER BY DESC or ASC depending on a variable passed into a function

I have a function that takes product pricing data from today and yesterday, works out the difference, orders it by price_delta_percentage and then limits the result to 5 rows. Currently I order by price_delta_percentage DESC, which returns the top 5 products that have increased in price since yesterday.
I would like to pass in a variable, sort, so that the function sorts either DESC or ASC. I have tried IF statements, which give syntax errors, and CASE statements, which fail with an error saying that price_delta_percentage doesn't exist.
Script:
RETURNS TABLE(
product_id varchar,
name varchar,
price_today numeric,
price_yesterday numeric,
price_delta numeric,
price_delta_percentage numeric
)
LANGUAGE 'sql'
COST 100
STABLE STRICT PARALLEL SAFE
AS $BODY$
WITH cte AS (
SELECT
product_id,
name,
SUM(CASE WHEN rank = 1 THEN trend_price ELSE NULL END) price_today,
SUM(CASE WHEN rank = 2 THEN trend_price ELSE NULL END) price_yesterday,
SUM(CASE WHEN rank = 1 THEN trend_price ELSE 0 END) - SUM(CASE WHEN rank = 2 THEN trend_price ELSE 0 END) as price_delta,
ROUND(((SUM(CASE WHEN rank = 1 THEN trend_price ELSE NULL END) / SUM(CASE WHEN rank = 2 THEN trend_price ELSE NULL END) - 1) * 100), 2) as price_delta_percentage
FROM (
SELECT
magic_sets_cards.name,
pricing.product_id,
pricing.trend_price,
pricing.date,
RANK() OVER (PARTITION BY product_id ORDER BY date DESC) AS rank
FROM pricing
JOIN magic_sets_cards_identifiers ON magic_sets_cards_identifiers.mcm_id = pricing.product_id
JOIN magic_sets_cards ON magic_sets_cards.id = magic_sets_cards_identifiers.card_id
JOIN magic_sets ON magic_sets.id = magic_sets_cards.set_id
WHERE date BETWEEN CURRENT_DATE - days AND CURRENT_DATE
AND magic_sets.code = set_code
AND pricing.trend_price > 0.25) p
WHERE rank IN (1,2)
GROUP BY product_id, name
ORDER BY price_delta_percentage DESC)
SELECT * FROM cte WHERE (CASE WHEN price_today IS NULL OR price_yesterday IS NULL THEN 'NULL' ELSE 'VALID' END) !='NULL'
LIMIT 5;
$BODY$;
CASE Statement:
ORDER BY CASE WHEN sort = 'DESC' THEN price_delta_percentage END DESC, CASE WHEN sort = 'ASC' THEN price_delta_percentage END ASC)
Error:
ERROR: column "price_delta_percentage" does not exist
LINE 42: ORDER BY CASE WHEN sort = 'DESC' THEN price_delta_percenta...
You can't use CASE to decide between ASC and DESC like that. Those keywords are not data; they are part of the SQL grammar. To switch them you would need to build the query text as a string and execute it as a dynamic query, which means using PL/pgSQL rather than SQL.
But since your column is numeric, you could just order by the product of the column and an indicator variable that is either 1 or -1.
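A minimal sketch of the multiplier idea follows. The cte and params rows are made-up stand-ins so that the statement runs on its own; in the function, the rows would come from the existing cte and sort would be the function parameter. The sort is placed in the outer query on purpose: in PostgreSQL an output-column name in ORDER BY has to stand alone and cannot be used inside an expression, which would explain the "column does not exist" error, whereas the outer query sees price_delta_percentage as an ordinary column of cte.

```sql
-- Stand-in data; in the function these rows would come from the existing cte.
WITH cte (product_id, price_delta_percentage) AS (
    VALUES ('a'::text, 12.50), ('b'::text, -3.20), ('c'::text, 4.75)
),
-- Stand-in for the function's sort parameter (hypothetical wiring).
params (sort) AS (
    VALUES ('DESC'::text)
)
SELECT cte.product_id, cte.price_delta_percentage
FROM cte
CROSS JOIN params
-- Multiplying by -1 reverses the numeric order,
-- so a single ascending sort serves both directions.
ORDER BY cte.price_delta_percentage
         * CASE WHEN params.sort = 'ASC' THEN 1 ELSE -1 END
LIMIT 5;
```

With 'DESC' the rows come back as a, c, b; with 'ASC' as b, c, a. Sorting in the same query level as the LIMIT also avoids relying on the row order of the CTE, which the outer SELECT is not guaranteed to preserve.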

How to perform Grouping equivalent like Informatica?

I have an Informatica function that I want to convert into a query to be used in Spring Batch code.
I have an EMPLOYEE table with 15 fields (I want all of them in the SELECT). Informatica has a Router function that creates one group for STATUS_CD = 'A' and a default group (all other records, where the status is anything other than A).
How can we do this in Postgres?
I have all the employees, and I want to check whether the combination of EMPLOYEE_CD and EMPLOYEE_ID is unique and simply return its count.
Query1
SELECT EMPLOYEE_CD AS EMPLOYEE_CD,
EMPLOYEE_ID AS EMPLOYEE_ID,
COUNT (*) AS CNT
FROM EMPLOYEE
GROUP BY EMPLOYEE_CD, EMPLOYEE_ID
HAVING COUNT (*) > 1;
Query 2
SELECT EMPLOYEE_ID, EMPLOYEE_NAME, EMPLOYEE_EMAIL, EMPLOYEE_PHONE, EMPLOYEE_ADDRESS, (Create Count Field here)
FROM EMPLOYEE
Query 3 - I need to group (this is my original question) or create ACTIVE and NON_ACTIVE columns as part of the query results: where EMPLOYEE_STAT_CD = 'A', the ACTIVE column should say YES, and where EMPLOYEE_STAT_CD is anything other than A, NON_ACTIVE should say YES.
How can I merge Query 1, Query 2 and Query 3 into a single query?
If I understood the question, your code would be something like:
SELECT EMPLOYEE_ID, EMPLOYEE_NAME, EMPLOYEE_EMAIL, EMPLOYEE_PHONE, EMPLOYEE_ADDRESS,
COUNT(*)OVER(PARTITION BY EMPLOYEE_CD, EMPLOYEE_ID) AS counter_from_sql1,
CASE WHEN EMPLOYEE_STAT_CD = 'A' THEN 'YES' ELSE NULL END AS ACTIVE,
CASE WHEN EMPLOYEE_STAT_CD <> 'A' THEN 'YES' ELSE NULL END AS NON_ACTIVE
FROM EMPLOYEE;
or
SELECT * FROM (
SELECT EMPLOYEE_ID, EMPLOYEE_NAME, EMPLOYEE_EMAIL, EMPLOYEE_PHONE, EMPLOYEE_ADDRESS,
COUNT(*)OVER(PARTITION BY EMPLOYEE_CD, EMPLOYEE_ID) AS counter_from_sql1,
CASE WHEN EMPLOYEE_STAT_CD = 'A' THEN 'YES' ELSE NULL END AS ACTIVE,
CASE WHEN EMPLOYEE_STAT_CD <> 'A' THEN 'YES' ELSE NULL END AS NON_ACTIVE
FROM EMPLOYEE
) z
WHERE counter_from_sql1 > 1;
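One detail worth checking in both variants: EMPLOYEE_STAT_CD <> 'A' evaluates to NULL when the status itself is NULL, so such rows get neither ACTIVE nor NON_ACTIVE. If rows without a status should fall into the default group as well (an assumption about the intended behaviour, not something the question states), a NULL-safe comparison could be used, roughly like this:

```sql
SELECT EMPLOYEE_ID, EMPLOYEE_NAME, EMPLOYEE_EMAIL, EMPLOYEE_PHONE, EMPLOYEE_ADDRESS,
       -- Number of rows sharing the same (EMPLOYEE_CD, EMPLOYEE_ID) pair.
       COUNT(*) OVER (PARTITION BY EMPLOYEE_CD, EMPLOYEE_ID) AS counter_from_sql1,
       CASE WHEN EMPLOYEE_STAT_CD = 'A' THEN 'YES' END AS ACTIVE,
       -- IS DISTINCT FROM treats NULL as an ordinary value,
       -- so a NULL status counts as "other than A".
       CASE WHEN EMPLOYEE_STAT_CD IS DISTINCT FROM 'A' THEN 'YES' END AS NON_ACTIVE
FROM EMPLOYEE;
```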

hive window function - row change in value

I have data with id, flag and date fields.
I need to populate the flag_date field as follows:
login_date id flag flag_date
5/1/2018 100 N NULL
5/2/2018 100 N NULL
5/3/2018 100 Y 5/3/2018
5/4/2018 100 Y 5/3/2018
5/5/2018 100 Y 5/3/2018
5/6/2018 100 N NULL
5/7/2018 100 N NULL
5/8/2018 100 Y 5/8/2018
5/9/2018 100 Y 5/8/2018
5/10/2018 100 Y 5/8/2018
When the flag value changes from N to Y, the flag_date value changes accordingly.
Please help.
select login_date
,id
,flag
-- flag is part of the partition because grp values of 'Y' and 'N' runs can coincide
,case when flag = 'Y' then min(login_date) over(partition by id,flag,grp) end as flag_date
from (select login_date,id,flag
,row_number() over(partition by id order by login_date) -
row_number() over(partition by id,flag order by login_date) as grp
from tbl
) t
First classify rows into groups, i.e. runs of consecutive 'Y's and 'N's, starting a new group whenever the run is broken. This can be done with a difference of row numbers approach. (Run the inner query to see how group numbers are assigned.)
Once the groups are assigned, it is trivial to compute flag_date with conditional aggregation.
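As an illustration, here is the inner query on its own, with the grp values it yields for the sample rows from the question worked out by hand in the comments. It assumes login_date is a real date (or another type that sorts chronologically), not a string like '5/10/2018'.

```sql
select login_date, id, flag
      ,row_number() over(partition by id order by login_date)          -- position within the id
     - row_number() over(partition by id, flag order by login_date)    -- position within the same flag
       as grp
from tbl;
-- login_date  flag  overall  per flag  grp
-- 5/1/2018    N     1        1         0
-- 5/2/2018    N     2        2         0
-- 5/3/2018    Y     3        1         2
-- 5/4/2018    Y     4        2         2
-- 5/5/2018    Y     5        3         2
-- 5/6/2018    N     6        3         3
-- 5/7/2018    N     7        4         3
-- 5/8/2018    Y     8        4         4
-- 5/9/2018    Y     9        5         4
-- 5/10/2018   Y     10       6         4
```

Within one flag value, grp stays constant for as long as the run continues. Across flag values the numbers can coincide: for the sequence Y, N, Y the grp values are 0, 1, 1, so grp identifies a run only in combination with flag.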
One more approach to solve this involves generating a new group whenever an 'N' value is encountered. Only the inner query changes. Note that the last 'N' row before a run of 'Y's gets the same grp as that run, so the outer query partitions by flag as well to keep that row out of the minimum.
select login_date
,id
,flag
,case when flag = 'Y' then min(login_date) over(partition by id,flag,grp) end as flag_date
from (select login_date,id,flag
,sum(case when flag = 'N' then 1 else 0 end) over(partition by id order by login_date) as grp
from tbl
) t

T-SQL check to see if date in one table is between two dates in another table then set value

I have two tables, shown below. I want to create a new variable (VALUE) based on the logic below and show the results in a third table. How can I do this in T-SQL?
TABLE_1
ID, DATE
TABLE_2
ID, DATE1, DATE2
Logic to set VALUE:
FOR ALL TABLE_1.ID
IF TABLE_1.DATE IS BETWEEN TABLE_2.DATE1 AND TABLE_2.DATE2
THEN VALUE = 1
ELSE VALUE = 0
IF TABLE_1.ID NOT IN TABLE_2
THEN VALUE = NULL
If you want to see the results for all rows where table_1.id = table_2.id (and table_1 rows that do not have a match on id), then we can use a left join and a case expression:
select
t.id
, t.date
, IsBetween = case
when t.date between t2.Date1 and t2.Date2
then 1
when t2.id is null
then null
else 0
end
, t2.*
from table_1 as t
left join table_2 as t2
on t.id = t2.id
If you only want one row for each row in table_1, and want to know whether table_1.date is between the dates of any corresponding row in table_2, then we can use an outer apply to select top 1 and a case expression:
select
t.id
, t.date
, IsBetween = case
when t.date between x.Date1 and x.Date2
then 1
when x.id is null
then null
else 0
end
from table_1 t
outer apply (
select top 1 t2.*
from table_2 t2
where t2.id = t.id
order by case
when t.date between t2.Date1 and t2.Date2
then 0
else 1
end
) as x
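Since the question also asks for the results in a third table, here is a hedged sketch of one way to do that with EXISTS and SELECT ... INTO. The target name table_3 is hypothetical, and the column is written as [VALUE] in brackets simply to keep the name from the question unambiguous.

```sql
select
    t.id,
    t.date,
    case
        -- No row at all for this id in table_2: VALUE stays NULL.
        when not exists (select 1 from table_2 t2 where t2.id = t.id)
            then null
        -- At least one matching row whose range contains the date.
        when exists (select 1
                     from table_2 t2
                     where t2.id = t.id
                       and t.date between t2.DATE1 and t2.DATE2)
            then 1
        else 0
    end as [VALUE]
into table_3   -- hypothetical name; SELECT ... INTO creates this table
from table_1 as t;
```

SELECT ... INTO creates the target table and fails if a table with that name already exists; for an existing table, INSERT INTO ... SELECT would be the usual alternative.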

unexplained error in sql execution

UPDATE amc_machine b
SET with_parts = a.with_parts,
amc_validity_upto = a.amc_validity_upto
FROM (SELECT CASE
WHEN count(*) > 0 THEN (SELECT DISTINCT ON (machine_id) with_parts, amc_validity_upto, machine_id
FROM amc_amcdetail
WHERE machine_id = 2 AND id != 1
ORDER BY machine_id, amc_validity_upto DESC)
WHEN count(*) = 0 THEN (SELECT FALSE AS with_parts, NULL AS amc_validity_upto, 2 AS machine_id)
END AS a
FROM (SELECT DISTINCT ON (machine_id) with_parts, amc_validity_upto, machine_id
FROM amc_amcdetail
WHERE machine_id = 2
ORDER BY machine_id, amc_validity_upto
) AS T) AS foo
WHERE a.machine_id = b.id
The error shown is
ERROR: subquery must return only one column
LINE 5: WHEN count(*) > 0 THEN (SELECT DISTINCT ON (machine_id) w...
Can anyone tell me what the problem is?
Basically, the query should update table b with data from table a if it exists, and otherwise with NULL and FALSE as appropriate.
The query executes when run standalone. I am using Postgres 9.3, but deployment will be on Postgres 9.1.
The subquery returns 3 columns:
SELECT DISTINCT ON (machine_id) with_parts, amc_validity_upto, machine_id
Make it return only one:
SELECT DISTINCT ON (machine_id) with_parts
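Following that idea, the whole UPDATE could be restructured so that each assigned column gets its own single-column subquery. This is only a sketch under an assumed reading of the intent: take the latest amc_amcdetail row for the machine (excluding id 1), and fall back to FALSE and NULL when no such row exists.

```sql
UPDATE amc_machine b
SET with_parts = COALESCE(
        -- Scalar subquery: one column, at most one row.
        (SELECT d.with_parts
         FROM amc_amcdetail d
         WHERE d.machine_id = b.id AND d.id != 1
         ORDER BY d.amc_validity_upto DESC
         LIMIT 1),
        FALSE),                       -- no detail row: default to FALSE
    amc_validity_upto =
        -- Returns NULL by itself when no detail row exists.
        (SELECT d.amc_validity_upto
         FROM amc_amcdetail d
         WHERE d.machine_id = b.id AND d.id != 1
         ORDER BY d.amc_validity_upto DESC
         LIMIT 1)
WHERE b.id = 2;
```

Two caveats: COALESCE also turns a stored NULL in with_parts into FALSE, and if several detail rows share the same amc_validity_upto the two subqueries are not guaranteed to pick the same row unless a tie-breaker column is added to both ORDER BY clauses.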