How to get latest value from table with self inner join - tsql

Please see http://sqlfiddle.com/#!6/9254d/3/0
I have two tables, Person and Values; PersonID is the link between them. Each person in the Values table has multiple values per day, one for every hour. I need to get the latest value for each user. I had a look on SO, and what I could find was to get MAX(ValueDate) and then join on that, but that doesn't work. Joining on PersonID didn't work either, and I'm not sure what else to try.
The output I need is
Name Value
1fn 1ln 2
2fn 2ln 20
3fn 3ln 200
I don't need the greatest value, I need the latest value for each person. Please share if you have any ideas. Thanks.

Try this:
SQL Fiddle example
DECLARE @Org nvarchar(3)
SELECT @Org = 'aaa'
DECLARE @MyDate date
SELECT @MyDate = CONVERT(date, '2014-09-12')
SELECT a.Name,
       a.Value AS Revenue
FROM (
    SELECT p.FName + ' ' + p.LName AS Name,
           vt.Value,
           ROW_NUMBER() OVER (PARTITION BY vt.PersonID ORDER BY vt.ValueDate DESC) AS rnk
    FROM Person p
    LEFT JOIN ValueTable vt
        ON vt.PersonID = p.PersonID
    WHERE vt.ValueDate < DATEADD(day, 1, @MyDate)
      AND vt.ValueDate >= @MyDate
      AND vt.Org = @Org) a
WHERE a.rnk = 1
ORDER BY a.Name ASC
Result:
| NAME | REVENUE |
|---------|---------|
| 1fn 1ln | 2 |
| 2fn 2ln | 20 |
| 3fn 3ln | 200 |
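The ranking idea can be tried without a SQL Server instance. The following is a minimal sketch, assuming Python's built-in sqlite3 module (SQLite 3.25 or newer, for window functions) as a stand-in for T-SQL. The sample rows and the `latest_values` helper are made up for illustration, and the Org and date filters from the answer are left out.

```python
import sqlite3

def latest_values(conn):
    """Return (name, value) for each person's most recent ValueTable row."""
    query = """
        SELECT name, value
        FROM (
            SELECT p.FName || ' ' || p.LName AS name,
                   vt.Value AS value,
                   ROW_NUMBER() OVER (PARTITION BY vt.PersonID
                                      ORDER BY vt.ValueDate DESC) AS rnk
            FROM Person p
            JOIN ValueTable vt ON vt.PersonID = p.PersonID
        ) AS ranked
        WHERE rnk = 1
        ORDER BY name
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, FName TEXT, LName TEXT);
    CREATE TABLE ValueTable (PersonID INTEGER, Value INTEGER, ValueDate TEXT);
    INSERT INTO Person VALUES (1, '1fn', '1ln'), (2, '2fn', '2ln'), (3, '3fn', '3ln');
    -- Hypothetical hourly readings; the later row per person is the latest.
    INSERT INTO ValueTable VALUES
        (1, 5,   '2014-09-12 08:00:00'), (1, 2,   '2014-09-12 09:00:00'),
        (2, 50,  '2014-09-12 08:00:00'), (2, 20,  '2014-09-12 09:00:00'),
        (3, 500, '2014-09-12 08:00:00'), (3, 200, '2014-09-12 09:00:00');
""")
rows = latest_values(conn)
```

Note that the latest value (2, 20, 200) is returned even though an earlier, larger value exists for each person, which is the difference between "latest" and "greatest" that the question asks about.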

Related

Finding duplicate records posted within a lapse of time, in PostgreSQL

I'm trying to find duplicate rows in a large database (300,000 records). Here's an example of how it looks:
| id | title | thedate |
|----|---------|------------|
| 1 | Title 1 | 2021-01-01 |
| 2 | Title 2 | 2020-12-24 |
| 3 | Title 3 | 2021-02-14 |
| 4 | Title 2 | 2021-05-01 |
| 5 | Title 1 | 2021-01-13 |
I found this excellent (i.e. fast) answer here: Find duplicate rows with PostgreSQL
-- adapted from @MatthewJ answering in https://stackoverflow.com/questions/14471179/find-duplicate-rows-with-postgresql/14471928#14471928
select * from (
SELECT id, title, TO_DATE(thedate,'YYYY-MM-DD'),
ROW_NUMBER() OVER(PARTITION BY title ORDER BY id asc) AS Row
FROM table1
) dups
where
dups.Row > 1
Which I'm trying to use as a base to solve my specific problem: I need to find duplicates according to column values like in the example, but only for records posted within 15 days of each other (the date of record insertion in the column "thedate" in my DB).
I reproduced it in this fiddle http://sqlfiddle.com/#!15/ae109/2, where id 5 (same title as id 1, and posted within 15 days of each other) should be the only acceptable answer.
How would I implement that condition in the query?
With the LAG function you can get the date from the previous row with the same title and then filter based on the time difference.
WITH with_prev AS (
SELECT
*,
LAG(thedate, 1) OVER (PARTITION BY title ORDER BY thedate) AS prev_date
FROM table1
)
SELECT id, title, thedate
FROM with_prev
WHERE thedate::timestamp - prev_date::timestamp < INTERVAL '15 days'
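As a rough way to check the LAG approach against the sample rows from the question, here is a sketch that assumes Python's built-in sqlite3 module (SQLite 3.25 or newer) rather than PostgreSQL. Because SQLite has no `::timestamp` cast or INTERVAL type, `julianday()` is swapped in to compute the difference in days; that substitution is an assumption of this sketch, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, title TEXT, thedate TEXT);
    INSERT INTO table1 VALUES
        (1, 'Title 1', '2021-01-01'), (2, 'Title 2', '2020-12-24'),
        (3, 'Title 3', '2021-02-14'), (4, 'Title 2', '2021-05-01'),
        (5, 'Title 1', '2021-01-13');
""")

query = """
    WITH with_prev AS (
        SELECT id, title, thedate,
               LAG(thedate) OVER (PARTITION BY title ORDER BY thedate) AS prev_date
        FROM table1
    )
    SELECT id, title, thedate
    FROM with_prev
    -- julianday() turns an ISO date into a day count; rows whose
    -- prev_date is NULL (first row per title) drop out of the comparison
    WHERE julianday(thedate) - julianday(prev_date) < 15
"""
near_duplicates = conn.execute(query).fetchall()
```

Only id 5 comes back: it shares a title with id 1 and was posted 12 days later, while the two "Title 2" rows are months apart.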
You don't necessarily need window functions for this; you can use a plain old self-join, like:
select p.id, p.thedate, n.id, n.thedate, p.title
from table1 p
join table1 n on p.title = n.title and p.thedate < n.thedate
where n.thedate::date - p.thedate::date < 15
http://sqlfiddle.com/#!15/a3a73a/7
This has the advantage that it might use some of your indexes on the table, and also, you can decide if you want to use the data (i.e. the ID) of the previous row or the next row from each pair.
If your date column however is not unique, you'll need to be a little more specific in your join condition, like:
select p.id, p.thedate, n.id, n.thedate, p.title
from table1 p
join table1 n on p.title = n.title and p.thedate <= n.thedate and p.id <> n.id
where n.thedate::date - p.thedate::date < 15

Count With Conditional on PostgreSQL

I have a table with people and another with visits. I want to count all visits but if the person signed up with 'emp' or 'oth' on ref_signup then remove the first visit. Example:
These are my tables:
PEOPLE:
id | ref_signup
---------------------
20 | emp
30 | oth
23 | fri
VISITS
id | date
-------------------------
20 | 10-01-2019
20 | 10-05-2019
23 | 10-09-2019
23 | 10-10-2019
30 | 09-10-2019
30 | 10-07-2019
In this example the visit count should be 4, because the persons with ids 20 and 30 have ref_signup set to emp or oth, so their first visit should be excluded and only the second and later visits counted.
This is what I have as a query:
SELECT COUNT(*) as visit_count FROM visits
LEFT JOIN people ON people.id = visits.people_id
WHERE visits.group_id = 1
Would using a CASE inside the count help here? I just want to remove one visit, not all of the visits, from the person.
Subtract from COUNT(*) the distinct number of person.ids with person.ref_signup IN ('emp', 'oth'):
SELECT
COUNT(*) -
COUNT(DISTINCT CASE WHEN p.ref_signup IN ('emp', 'oth') THEN p.id END) as visit_count
FROM visits v LEFT JOIN people p
ON p.id = v.id
See the demo.
Result:
| visit_count |
| ----------- |
| 4 |
Note: this code and demo fiddle use the column names of your sample data.
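To sanity-check the subtraction, here is a minimal sketch that loads the question's sample rows into an in-memory database. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL; the query itself is unchanged apart from being embedded in a string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, ref_signup TEXT);
    CREATE TABLE visits (id INTEGER, date TEXT);
    INSERT INTO people VALUES (20, 'emp'), (30, 'oth'), (23, 'fri');
    INSERT INTO visits VALUES
        (20, '10-01-2019'), (20, '10-05-2019'), (23, '10-09-2019'),
        (23, '10-10-2019'), (30, '09-10-2019'), (30, '10-07-2019');
""")

query = """
    SELECT COUNT(*) -
           COUNT(DISTINCT CASE WHEN p.ref_signup IN ('emp', 'oth') THEN p.id END)
    FROM visits v
    LEFT JOIN people p ON p.id = v.id
"""
# Six visits in total, minus one for each of the two emp/oth people who visited.
(visit_count,) = conn.execute(query).fetchone()
```

The distinct count only sees people who have at least one visit row, so someone who signed up with 'emp' but never visited does not reduce the total.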
Premise: select the count of visits for each person, along with a synthetic column that contains 1 if the referral was from emp or oth and 0 otherwise. Then select the sum of the counts minus the sum of that column.
SELECT SUM(count) - SUM(ignore_first) AS visit_count FROM (
SELECT COUNT(*) AS count, CASE WHEN people.ref_signup IN ('emp', 'oth') THEN 1 ELSE 0 END AS ignore_first
FROM visits
LEFT JOIN people ON people.id = visits.people_id
WHERE visits.group_id = 1 GROUP BY people.id, people.ref_signup) a
Where's "people_id" in your example?
SELECT COUNT(*) as visit_count
FROM visits v
JOIN people p ON p.id = v.people_id
WHERE p.ref_signup IN ('emp','oth');
Regarding "then remove the first visit":
You cannot select the count and delete the first visit in the same statement.
DELETE FROM visits
WHERE id IN (
SELECT v.id
FROM visits v
JOIN people p ON p.id = v.people_id
WHERE p.ref_signup IN ('emp','oth')
ORDER BY v.id
LIMIT 1
);
edit: typos
First, I create the tables
create table people (id int primary key, ref_signup varchar(3));
insert into people (id, ref_signup) values (20, 'emp'), (30, 'oth'), (23, 'fri');
create table visits (people_id int not null, visit_date date not null);
insert into visits (people_id, visit_date) values (20, '10-01-2019'), (20, '10-05-2019'), (23, '10-09-2019'), (23, '10-10-2019'), (30, '09-10-2019'), (30, '10-07-2019');
You can use the row_number() window function to mark which visit is "visit number one":
select
*,
row_number() over (partition by people_id order by visit_date) as visit_num
from people
join visits
on people.id = visits.people_id
Once you have that, you can do another query on those results, and use the filter clause to count up the correct rows that match the condition where visit_num > 1 or ref_signup = 'fri':
-- wrap the first query in a WITH clause
with joined_visits as (
select
*,
row_number() over (partition by people_id order by visit_date) as visit_num
from people
join visits
on people.id = visits.people_id
)
select count(1) filter (where visit_num > 1 or ref_signup = 'fri')
from joined_visits;
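The numbering-then-counting idea can be tried locally with the sketch below. It assumes Python's built-in sqlite3 module (SQLite 3.25 or newer) instead of PostgreSQL, stores the dates as ISO text so that they sort chronologically, and uses a portable `SUM(CASE ...)` in place of the `FILTER` clause. The condition is also written as `NOT IN ('emp', 'oth')` rather than `= 'fri'`, which is an assumption about how other signup types should be treated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, ref_signup TEXT);
    CREATE TABLE visits (people_id INTEGER NOT NULL, visit_date TEXT NOT NULL);
    INSERT INTO people VALUES (20, 'emp'), (30, 'oth'), (23, 'fri');
    -- ISO dates so that text ordering matches chronological ordering
    INSERT INTO visits VALUES
        (20, '2019-10-01'), (20, '2019-10-05'), (23, '2019-10-09'),
        (23, '2019-10-10'), (30, '2019-09-10'), (30, '2019-10-07');
""")

query = """
    WITH joined_visits AS (
        SELECT p.ref_signup,
               ROW_NUMBER() OVER (PARTITION BY v.people_id
                                  ORDER BY v.visit_date) AS visit_num
        FROM people p
        JOIN visits v ON p.id = v.people_id
    )
    -- Count a visit unless it is visit number one of an emp/oth person.
    SELECT SUM(CASE WHEN visit_num > 1 OR ref_signup NOT IN ('emp', 'oth')
                    THEN 1 ELSE 0 END)
    FROM joined_visits
"""
(visit_count,) = conn.execute(query).fetchone()
```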
-- First get the corrected counts for all users
WITH grouped_visits AS (
SELECT
COUNT(visits.*) -
CASE WHEN people.ref_signup IN ('emp', 'oth') THEN 1 ELSE 0 END
AS visit_count
FROM visits
INNER JOIN people ON (people.id = visits.id)
GROUP BY people.id, people.ref_signup
)
-- Then sum them
SELECT SUM(visit_count)
FROM grouped_visits;
This should give you the result you're looking for.
On a side note, I can't help but think clever use of a window function could do this in a single shot without the CTE.
EDIT: No, it can't, since window functions are evaluated after the WHERE, GROUP BY and HAVING clauses.

Sum with different condition for every line

In my Postgresql 9.3 database I have a table stock_rotation:
+----+-----------------+---------------------+------------+---------------------+
| id | quantity_change | stock_rotation_type | article_id | date |
+----+-----------------+---------------------+------------+---------------------+
| 1 | 10 | PURCHASE | 1 | 2010-01-01 15:35:01 |
| 2 | -4 | SALE | 1 | 2010-05-06 08:46:02 |
| 3 | 5 | INVENTORY | 1 | 2010-12-20 08:20:35 |
| 4 | 2 | PURCHASE | 1 | 2011-02-05 16:45:50 |
| 5 | -1 | SALE | 1 | 2011-03-01 16:42:53 |
+----+-----------------+---------------------+------------+---------------------+
Types:
SALE has negative quantity_change
PURCHASE has positive quantity_change
INVENTORY resets the actual number in stock to the given value
In this implementation, to get the current value that an article has in stock, you need to sum up all quantity changes since the latest INVENTORY for the specific article (including the inventory value). I do not know why it is implemented this way and unfortunately it would be quite hard to change this now.
My question now is how to do this for more than a single article at once.
My latest attempt was this:
WITH latest_inventory_of_article as (
SELECT MAX(date)
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
)
SELECT a.id, sum(quantity_change)
FROM stock_rotation sr
INNER JOIN article a ON a.id = sr.article_id
WHERE sr.date >= (COALESCE(
(SELECT date FROM latest_inventory_of_article),
'1970-01-01'
))
GROUP BY a.id
But the date for the latest stock_rotation of type INVENTORY can be different for every article.
I was trying to avoid looping over multiple article ids to find this date.
In this case I would use a different internal query to get the max inventory per article. You are effectively using stock_rotation twice but it should work. If it's too big of a table you can try something else:
SELECT sr.article_id, sum(quantity_change)
FROM stock_rotation sr
LEFT JOIN (
SELECT article_id, MAX(date) AS date
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
GROUP BY article_id) AS latest_inventory
ON latest_inventory.article_id = sr.article_id
WHERE sr.date >= COALESCE(latest_inventory.date, '1970-01-01')
GROUP BY sr.article_id
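A quick way to check the per-article subquery is to run it on the question's sample rows. The sketch below assumes Python's built-in sqlite3 module in place of PostgreSQL 9.3 and adds a hypothetical article 2 that has never had an INVENTORY row, to exercise the COALESCE fallback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock_rotation (
        id INTEGER PRIMARY KEY, quantity_change INTEGER,
        stock_rotation_type TEXT, article_id INTEGER, date TEXT);
    INSERT INTO stock_rotation VALUES
        (1, 10, 'PURCHASE',  1, '2010-01-01 15:35:01'),
        (2, -4, 'SALE',      1, '2010-05-06 08:46:02'),
        (3,  5, 'INVENTORY', 1, '2010-12-20 08:20:35'),
        (4,  2, 'PURCHASE',  1, '2011-02-05 16:45:50'),
        (5, -1, 'SALE',      1, '2011-03-01 16:42:53'),
        -- Hypothetical second article that has never been inventoried
        (6,  3, 'PURCHASE',  2, '2011-01-10 09:00:00'),
        (7, -1, 'SALE',      2, '2011-01-11 09:00:00');
""")

query = """
    SELECT sr.article_id, SUM(sr.quantity_change)
    FROM stock_rotation sr
    LEFT JOIN (
        SELECT article_id, MAX(date) AS date
        FROM stock_rotation
        WHERE stock_rotation_type = 'INVENTORY'
        GROUP BY article_id
    ) AS latest_inventory ON latest_inventory.article_id = sr.article_id
    WHERE sr.date >= COALESCE(latest_inventory.date, '1970-01-01')
    GROUP BY sr.article_id
    ORDER BY sr.article_id
"""
# Maps article_id to the current stock level.
stock = dict(conn.execute(query).fetchall())
```

Article 1 ends at 5 + 2 - 1 = 6, counting from its latest INVENTORY row, and article 2 sums all of its rows because it has no INVENTORY date to cut off at.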
You can use DISTINCT ON together with ORDER BY to get the latest INVENTORY row for each article_id in the WITH clause.
Then you can join that with the original table to get all later rows and add the values:
WITH latest_inventory as (
SELECT DISTINCT ON (article_id) id, article_id, date
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
ORDER BY article_id, date DESC
)
SELECT article_id, sum(sr.quantity_change)
FROM stock_rotation sr
JOIN latest_inventory li USING (article_id)
WHERE sr.date >= li.date
GROUP BY article_id;
Here is my take on it: First, build the list of products at their last inventory state, using a window function. Then, join it back to the entire list, filtering on operations later than the inventory date for the item.
with initial_inventory as
(
select article_id, date, quantity_change from
(select article_id, date, quantity_change, rank() over (partition by article_id order by date desc)
from stock_rotation
where stock_rotation_type = 'INVENTORY'
) a
where rank = 1
)
-- left join keeps articles that have no movements after their last inventory
select ii.article_id, ii.quantity_change + coalesce(sum(sr.quantity_change), 0)
from initial_inventory ii
left join stock_rotation sr on ii.article_id = sr.article_id and sr.date > ii.date
group by ii.article_id, ii.quantity_change

sorting DDMMYYYY monthly using postgreSQL

I am writing a query using PostgreSQL to count something but I want to sort the date (DDMMYYYY) properly.
With the following code,
WITH dis_id AS (SELECT
DISTINCT ON (source_user_id) source_user_id,
created_at
FROM public.info_scammers )
SELECT d.date, count(dis_id.source_user_id)
FROM (SELECT to_char(date_trunc('day',(current_date - offs)), 'DD-MM-YYYY') AS date
FROM generate_series(0,365,1) AS offs
) d LEFT OUTER JOIN
dis_id
ON (d.date = to_char(date_trunc('day',dis_id.created_at),'YYYY-MM-DD'))
GROUP BY d.date
The result is
Date | Count
01-01-2017 | 0
01-02-2017 | 0
01-03-2017 | 0
What I want is
Date | Count
01-01-2017 | 0
02-01-2017 | 0
03-01-2017 | 0
I have looked up the existing problems. But most of them do not use PostgreSQL
Thank you
Leave d.date as type date in the inner SELECT (don't convert it to text with to_char), then add ORDER BY d.date and do the conversion to text in the outer SELECT.
Something like:
WITH dis_id AS (...)
SELECT to_char(d.date, 'DD-MM-YYYY'), count(...)
FROM (SELECT date_trunc(...) AS date
FROM ...
) d
LEFT OUTER JOIN ...
GROUP BY to_char(d.date, 'DD-MM-YYYY')
ORDER BY d.date;
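The difference between sorting the formatted text and sorting the underlying date can be seen in a small experiment. This sketch assumes Python's built-in sqlite3 module, with `strftime('%d-%m-%Y', ...)` standing in for PostgreSQL's `to_char(..., 'DD-MM-YYYY')`; the `days` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE days (d TEXT);  -- ISO dates, deliberately inserted out of order
    INSERT INTO days VALUES ('2017-02-01'), ('2017-01-02'), ('2017-01-01'), ('2017-01-03');
""")

# Sorting the formatted text compares the day of month first, which is the problem.
by_text = [r[0] for r in conn.execute(
    "SELECT strftime('%d-%m-%Y', d) AS label FROM days ORDER BY label")]

# Sorting on the underlying date and formatting only for display fixes it.
by_date = [r[0] for r in conn.execute(
    "SELECT strftime('%d-%m-%Y', d) AS label FROM days ORDER BY d")]
```

Here `by_text` puts 01-02-2017 right after 01-01-2017, as in the question's output, while `by_date` walks through January before reaching February.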

Select query for selecting columns from those records from the inner query . where inner query and outer query have different columns

I have a group by query which fetches me some records. What if I wish to find other column details representing those records.
Suppose I have a query as follows: Select id,max(date) from records group by id;
to fetch the most recent entry in the table.
I wish to fetch another column representing those records.
I want to do something like this (this incorrect query is just an example):
Select type from (Select id,max(date) from records group by id), but here type doesn't exist in the inner query.
I am not able to define the question in a simpler manner. I apologise for that.
Any help is appreciated.
EDIT :
Column | Type | Modifiers
--------+-----------------------+-----------
id | integer |
rdate | date |
type | character varying(20) |
Sample Data :
id | rdate | type
----+------------+------
1 | 2013-11-03 | E1
1 | 2013-12-12 | E1
2 | 2013-12-12 | A3
3 | 2014-01-11 | B2
1 | 2014-01-15 | A1
4 | 2013-12-23 | C1
5 | 2014-01-05 | C
7 | 2013-12-20 | D
8 | 2013-12-20 | D
9 | 2013-12-23 | A1
While I was trying something like this (I'm no good at sql) : select type from records as r1 inner join (Select id,max(rdate) from records group by id) r2 on r1.rdate = r2.rdate ;
or
select type from records as r1 ,(Select id,max(rdate) from records group by id) r2 inner join r1 on r1.rdate = r2.rdate ;
You can easily do this with a window function:
SELECT id, rdate, type
FROM (
SELECT id, rdate, type, rank() OVER (PARTITION BY id ORDER BY rdate DESC) rnk
FROM records
) foo
WHERE rnk = 1
ORDER BY id;
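The window definition OVER (PARTITION BY id ORDER BY rdate DESC) takes all records with the same id value, then sorts them from most recent to least recent rdate and assigns a rank to each row. The row with rank 1 is the most recent, so it corresponds to max(rdate). The filter on rnk has to sit in the outer query, because the alias is not visible in the WHERE clause of the query that defines it.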
If I've understood the question right, then this should work (or at least get you something you can work with):
SELECT
b.id, b.maxdate, a.type
FROM
records a -- this is the records table, where you'll get the type
INNER JOIN -- now join it to the group by query
(select id, max(rdate) as maxdate FROM records GROUP BY id) b
ON -- join on both rdate and id, otherwise you'll get lots of duplicates
b.id = a.id
AND b.maxdate = a.rdate
Note that if you have records with different types for the same id and rdate combination you'll get duplicates.
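To see the join-back pattern on the sample data from the question, here is a minimal sketch assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER, rdate TEXT, type TEXT);
    INSERT INTO records VALUES
        (1, '2013-11-03', 'E1'), (1, '2013-12-12', 'E1'), (2, '2013-12-12', 'A3'),
        (3, '2014-01-11', 'B2'), (1, '2014-01-15', 'A1'), (4, '2013-12-23', 'C1'),
        (5, '2014-01-05', 'C'),  (7, '2013-12-20', 'D'),  (8, '2013-12-20', 'D'),
        (9, '2013-12-23', 'A1');
""")

query = """
    SELECT b.id, b.maxdate, a.type
    FROM records a
    INNER JOIN (SELECT id, MAX(rdate) AS maxdate FROM records GROUP BY id) b
        ON b.id = a.id AND b.maxdate = a.rdate
    ORDER BY b.id
"""
# One row per id, carrying the type from that id's most recent record.
latest = conn.execute(query).fetchall()
```

For id 1, which has three records, only the 2014-01-15 row with type A1 survives the join.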