PostgreSQL group by change in criteria and timestamp bins - postgresql

I have a table with four columns: name, criteria, timestamp1 and timestamp2. I also have the query below, which groups rows by each change in criteria per name when sorted by timestamp1:
SELECT name, criteria, min(timestamp1), max(timestamp1), min(timestamp2), max(timestamp2), grp
FROM (
    SELECT name, timestamp1, criteria, timestamp2,
           row_number() OVER (PARTITION BY name ORDER BY timestamp1)
           - row_number() OVER (PARTITION BY name, criteria ORDER BY timestamp1) AS grp
    FROM table
) foo
GROUP BY name, criteria, grp
Unfortunately, the data contains overlapping timestamp1 and name values (due to timestamp1 errors) that can only be separated by timestamp2, which is when the record was added to the DB.
What I want to do is order by name and timestamp1 and start a new grp whenever the criteria changes or timestamp2 differs from the first timestamp2 in the group by more than two weeks.
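The rule described above depends on the first timestamp2 of the current group, so the grouping is sequential. A minimal procedural sketch in Python may make the intended behaviour concrete. The function name assign_groups, the tuple layout, the use of timestamp2 as a tie-breaker in the sort, and the sample rows are all assumptions made for illustration; they are not from the original post.

```python
from datetime import datetime, timedelta


def assign_groups(rows, max_gap=timedelta(weeks=2)):
    """Pair each row with a group number.

    rows are (name, criteria, timestamp1, timestamp2) tuples.
    A new group starts when the name or criteria changes, or when
    timestamp2 is more than max_gap away from the first timestamp2
    of the current group.
    """
    # Sort by name, then timestamp1; timestamp2 breaks ties (an assumption).
    ordered = sorted(rows, key=lambda r: (r[0], r[2], r[3]))
    result = []
    grp = 0
    prev_name = prev_criteria = first_ts2 = None
    for row in ordered:
        name, criteria, _ts1, ts2 = row
        new_group = (
            name != prev_name
            or criteria != prev_criteria
            or abs(ts2 - first_ts2) > max_gap
        )
        if new_group:
            grp += 1
            first_ts2 = ts2  # remember the anchor timestamp2 of this group
        prev_name, prev_criteria = name, criteria
        result.append((row, grp))
    return result


# Invented sample data for illustration only.
sample_rows = [
    ("a", "x", datetime(2020, 1, 1), datetime(2020, 1, 2)),
    ("a", "x", datetime(2020, 1, 2), datetime(2020, 1, 5)),
    ("a", "x", datetime(2020, 1, 3), datetime(2020, 2, 1)),  # > 2 weeks after Jan 2
    ("a", "y", datetime(2020, 1, 4), datetime(2020, 2, 2)),  # criteria changes
    ("b", "y", datetime(2020, 1, 1), datetime(2020, 2, 2)),  # name changes
]
group_ids = [g for _, g in assign_groups(sample_rows)]
print(group_ids)
```

Once each row carries a group number, the min/max aggregation from the query above can be applied per name, criteria and group.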

Related

Postgresql: FIRST_VALUE as aggregate function

In Postgres, we want to use the window function as an aggregate function.
We have a table, where every line consists of two timestamps and a value. We first extend the table by adding a column with the difference between timestamps - only a few results are possible. Then we group data by timestamp1 and timediff. In each group, there can be more than one line. We need to choose in each group one value, the one that has the smallest timestamp2.
SELECT
timestamp1,
timediff,
FIRST_VALUE(value) OVER (ORDER BY timestamp2) AS value
FROM (
SELECT
timestamp1,
timestamp2,
value,
timestamp2 - timestamp1 AS timediff
FROM forecast_table WHERE device = 'TEST'
) sq
GROUP BY timestamp1,timediff
ORDER BY timestamp1
Error: column "sq.value" must appear in the GROUP BY clause or be used in an aggregate function
You can work around this by aggregating into an array and then picking the first array element:
SELECT
timestamp1,
timediff,
(array_agg(value ORDER BY timestamp2))[1] AS value
FROM (
SELECT
timestamp1,
timestamp2,
value,
timestamp2 - timestamp1 AS timediff
FROM forecast_table
WHERE device = 'TEST'
) sq
GROUP BY timestamp1,timediff
ORDER BY timestamp1
Or you may use DISTINCT ON with custom ORDER BY.
SELECT DISTINCT ON (timestamp1, timediff)
timestamp1, timestamp2, value,
timestamp2 - timestamp1 AS timediff
FROM forecast_table WHERE device = 'TEST'
ORDER BY timestamp1, timediff, timestamp2;
There is no need for GROUP BY if you are not actually doing any aggregation.
You can get what you want if you define PARTITION BY timestamp1, timestamp2 - timestamp1 inside FIRST_VALUE():
SELECT DISTINCT timestamp1,
FIRST_VALUE(value) OVER (PARTITION BY timestamp1, timestamp2 - timestamp1 ORDER BY timestamp2) AS value,
timestamp2 - timestamp1 AS timediff
FROM forecast_table
WHERE device = 'TEST'
ORDER BY timestamp1, timediff;

Find difference between longest tenure and least tenure employee(s) SQL

I want a query to find the number of days between the longest-tenured and least-tenured employees still working for the company. The output should include the number of employees with the longest tenure, the number of employees with the least tenure, and the number of days between the longest-tenured and least-tenured hiring dates.
This is what I have so far; I am having trouble getting the difference to appear in the output:
select a.count, date_tenure
from
(
select distinct hire_date ,current_date - hire_date as date_tenure, count(id) over (partition by hire_date) as count, rank() over (order by current_date-hire_date) as rank_asc,
rank() over (order by current_date-hire_date desc) as rank_desc
from employees
where termination_date is null
order by 2 desc) a
where rank_desc = 1
or rank_asc = 1
table : employees
id
hire_date
termination_date
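One possible way to get all three figures in a single row is to compute the earliest and latest hire dates among current employees first, then count the employees at each end. The sketch below runs that idea with Python's built-in sqlite3 module standing in for the real database (an assumption), on invented sample rows. The CTE names active and bounds and the output column names are hypothetical. In PostgreSQL, subtracting two date values gives the number of days directly, so the julianday() calls used here for SQLite would not be needed there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, hire_date TEXT, termination_date TEXT)"
)
# Invented sample data: two employees share the earliest hire date.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "2010-01-01", None),
        (2, "2010-01-01", None),
        (3, "2015-06-15", "2018-01-01"),  # terminated, ignored
        (4, "2020-01-11", None),
        (5, "2009-01-01", "2012-01-01"),  # terminated, ignored
    ],
)

query = """
WITH active AS (
    -- only employees still working for the company
    SELECT hire_date FROM employees WHERE termination_date IS NULL
),
bounds AS (
    SELECT MIN(hire_date) AS first_hire, MAX(hire_date) AS last_hire FROM active
)
SELECT
    (SELECT COUNT(*) FROM active WHERE hire_date = bounds.first_hire) AS longest_count,
    (SELECT COUNT(*) FROM active WHERE hire_date = bounds.last_hire)  AS shortest_count,
    CAST(julianday(bounds.last_hire) - julianday(bounds.first_hire) AS INTEGER) AS days_between
FROM bounds
"""
tenure_row = conn.execute(query).fetchone()
print(tenure_row)
```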

postgres - get top category purchased by customer

I have a denormalized table with the columns:
buyer_id
order_id
item_id
item_price
item_category
I would like to return something that returns 1 row per buyer_id
buyer_id, sum(item_price), item_category
-- but ONLY for the category with the highest rank of sales along that specific buyer_id.
I can't get row_number() or partition to work because I need to order by the sum of item_price relative to item_category relative to buyer. Am I overlooking anything obvious?
You need a few layers of fudging here:
SELECT buyer_id, item_sum, item_category
FROM (
    SELECT buyer_id,
           rank() OVER (PARTITION BY buyer_id ORDER BY item_sum DESC) AS rnk,
           item_sum, item_category
    FROM (
        SELECT buyer_id, sum(item_price) AS item_sum, item_category
        FROM my_table
        GROUP BY 1, 3) AS sub2) AS sub
WHERE rnk = 1;
In sub2 you calculate the sum of 'item_price' for each 'item_category' for each 'buyer_id'. In sub you rank these with a window function by 'buyer_id', ordering by 'item_sum' in descending order (so the highest 'item_sum' comes first). In the main query you select those rows where rnk = 1.
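The three layers described above can be tried on a small data set. The following sketch uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption; window functions need SQLite 3.25 or newer), with invented sample rows. It spells out the GROUP BY columns by name and adds an outer ORDER BY so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "buyer_id INTEGER, order_id INTEGER, item_id INTEGER, "
    "item_price REAL, item_category TEXT)"
)
# Invented sample data: buyer 1 spends most on books, buyer 2 on toys.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 100, 5.0, "books"),
        (1, 10, 101, 7.0, "books"),
        (1, 11, 102, 10.0, "toys"),
        (2, 12, 103, 3.0, "toys"),
        (2, 13, 104, 2.0, "books"),
    ],
)

query = """
SELECT buyer_id, item_sum, item_category
FROM (
    -- rank each buyer's categories by total spend, highest first
    SELECT buyer_id, item_sum, item_category,
           rank() OVER (PARTITION BY buyer_id ORDER BY item_sum DESC) AS rnk
    FROM (
        -- total spend per buyer and category
        SELECT buyer_id, item_category, sum(item_price) AS item_sum
        FROM my_table
        GROUP BY buyer_id, item_category
    ) AS sub2
) AS sub
WHERE rnk = 1
ORDER BY buyer_id
"""
top_categories = conn.execute(query).fetchall()
print(top_categories)
```

Because rank() assigns the same rank to ties, a buyer whose top two categories have equal sums would return two rows; row_number() could be used instead if exactly one row per buyer is required.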

t-sql how to select records without a duplicated one column

I want to select rows for all employees without repeating the data in one column.
For example, I have two rows where salary (before raise) is displayed. How can I display only the largest figure without duplication?
You can use the ROW_NUMBER function.
Here is some sample code:
select * from (
select *,
row_number() over (partition by empid, name, department order by salary desc) as rn
from employee
) employee where rn = 1
You can find Row_Number() with Partition By clause sample at http://www.kodyaz.com
If I'm understanding the question correctly, then a simple MAX function and GROUP BY would work.
SELECT EmployeeId, OtherColumns, MAX(Salary)
FROM tblEmployees
GROUP BY EmployeeId, OtherColumns

ORDER BY items must appear in the select list if SELECT DISTINCT is specified

declare @userid int
set @userid=9846
SELECT TOP 10
    tagid,
    [Description]
FROM tagsuggestion
WHERE tagid IN (SELECT DISTINCT TOP 10 tagid
                FROM (SELECT ulpt.tagid,
                             createddate
                      FROM userlocationposttag ulpt
                      WHERE ulpt.UserID=@userid
                      UNION ALL
                      SELECT ult.tagid,
                             createddate
                      FROM userlocationtag ult
                      WHERE ult.UserID=@userid
                      UNION ALL
                      SELECT upt.tagid,
                             createddate
                      FROM userprofiletag upt
                      WHERE upt.UserID=@userid
                     ) T
                ORDER BY createddate DESC)
The above query fails with the error "ORDER BY items must appear in the select list if SELECT DISTINCT is specified", but if I remove DISTINCT the query runs.
I want to ensure distinctness and also keep the sort order by createddate DESC.
A related post of mine is:
Last created 10 records from all of the 3 tables
You're basically telling the query to sort the output by a field that isn't there.
You must either put createddate in the select statement, or you could group by tagid, [Description] AND createddate and order by createddate.
Now if you feel you may get duplicate tagids and descriptions when you include distinct createddate, then wrap another query around that and do your top 10 distinct on just those two fields.
Okay, so we only care about each tag based on its most recent created date, so we can structure the inner query as:
SELECT tagid, MAX(createddate) as createddate FROM (
    SELECT ulpt.tagid,
           createddate
    FROM userlocationposttag ulpt
    WHERE ulpt.UserID=@userid
    UNION ALL
    SELECT ult.tagid,
           createddate
    FROM userlocationtag ult
    WHERE ult.UserID=@userid
    UNION ALL
    SELECT upt.tagid,
           createddate
    FROM userprofiletag upt
    WHERE upt.UserID=@userid
) t
GROUP BY tagid
And then we can wrap that in another subselect, and apply the top ten:
SELECT TOP 10 tagid FROM (
    SELECT tagid, MAX(createddate) as createddate FROM (
        SELECT ulpt.tagid,
               createddate
        FROM userlocationposttag ulpt
        WHERE ulpt.UserID=@userid
        UNION ALL
        SELECT ult.tagid,
               createddate
        FROM userlocationtag ult
        WHERE ult.UserID=@userid
        UNION ALL
        SELECT upt.tagid,
               createddate
        FROM userprofiletag upt
        WHERE upt.UserID=@userid
    ) t
    GROUP BY tagid
) t ORDER BY createddate desc
And we no longer need DISTINCT (because GROUP BY and MAX already ensure that each tag appears only once).
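The GROUP BY plus MAX idea can be checked on a small data set. This sketch uses Python's built-in sqlite3 module rather than SQL Server (an assumption), so LIMIT takes the place of TOP. The single table tag_events is a hypothetical stand-in for the UNION ALL of the three tag tables, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the UNION ALL of the three tag tables.
conn.execute("CREATE TABLE tag_events (tagid INTEGER, createddate TEXT)")
conn.executemany(
    "INSERT INTO tag_events VALUES (?, ?)",
    [
        (1, "2024-01-01"),
        (2, "2024-01-05"),
        (1, "2024-01-10"),  # tag 1 appears twice; only its latest date matters
        (3, "2024-01-03"),
        (2, "2024-01-02"),
    ],
)

query = """
SELECT tagid FROM (
    -- one row per tag, carrying its most recent createddate
    SELECT tagid, MAX(createddate) AS createddate
    FROM tag_events
    GROUP BY tagid
) AS t
ORDER BY createddate DESC
LIMIT 2
"""
latest_tags = [row[0] for row in conn.execute(query)]
print(latest_tags)
```

Each tag appears once in the result even though tags 1 and 2 occur twice in the input, and the order follows each tag's most recent createddate.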