I am writing a query in PostgreSQL to count something, but I want the dates (DD-MM-YYYY) sorted properly.
With the following code,
WITH dis_id AS (SELECT
DISTINCT ON (source_user_id) source_user_id,
created_at
FROM public.info_scammers )
SELECT d.date, count(dis_id.source_user_id)
FROM (SELECT to_char(date_trunc('day',(current_date - offs)), 'DD-MM-YYYY') AS date
FROM generate_series(0,365,1) AS offs
) d LEFT OUTER JOIN
dis_id
ON (d.date = to_char(date_trunc('day',dis_id.created_at),'YYYY-MM-DD'))
GROUP BY d.date
The result is
Date | Count
01-01-2017 | 0
01-02-2017 | 0
01-03-2017 | 0
What I want is
Date | Count
01-01-2017 | 0
02-01-2017 | 0
03-01-2017 | 0
I have looked at the existing questions, but most of them do not use PostgreSQL.
Thank you
Leave d.date as type date in the inner SELECT (don't convert it to text with to_char), then add ORDER BY d.date and do the conversion to text in the outer SELECT.
Something like:
WITH dis_id AS (...)
SELECT to_char(d.date, 'DD-MM-YYYY'), count(...)
FROM (SELECT date_trunc(...) AS date
FROM ...
) d
LEFT OUTER JOIN ...
GROUP BY d.date -- group on the date itself so that ORDER BY d.date is allowed
ORDER BY d.date;
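To see why the text column sorts wrongly, here is a minimal sketch in plain Python rather than SQL. The six sample days are an assumption chosen for illustration; the point is only that sorting 'DD-MM-YYYY' strings compares characters from the left, while sorting the date values first and formatting afterwards (the equivalent of ORDER BY d.date with to_char in the outer SELECT) keeps chronological order.

```python
from datetime import date, timedelta

# Hypothetical sample: the first three days of January and February 2017.
days = [date(2017, 1, 1) + timedelta(days=n) for n in (0, 1, 2, 31, 32, 33)]

# Sorting the formatted text compares characters left to right, so the day
# of the month dominates and the two months end up interleaved.
by_text = sorted(d.strftime("%d-%m-%Y") for d in days)

# Sorting the date values first and formatting afterwards keeps the
# chronological order.
by_date = [d.strftime("%d-%m-%Y") for d in sorted(days)]

print(by_text)
print(by_date)
```

The first list starts with 01-01-2017, 01-02-2017 (the unwanted order from the question), the second with 01-01-2017, 02-01-2017.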
I have a Postgres table that contains a date and status field. I want to create a query that will return the date, plus the total number of records and then the total number of records for each status on that date.
Source Table:
job_id, process_datetime, process_status
The results I would like:
process_date | total_925_jobs | total_completed_925_jobs
2022-01-02 | 50 | 45
2022-01-03 | 150 | 135
I tried to join two subqueries, but it does not like the calculated date field.
SELECT
date(all_records.create_datetime) AS process_date,
total_jobs.total_925_jobs,
total_completed.total_completed_925_jobs
from "925-FilePreprocessing"
all_records
INNER JOIN
( SELECT
date("925-FilePreprocessing".create_datetime) AS total_process_date,
"925-FilePreprocessing".process_status,
COUNT("925-FilePreprocessing".file_preprocessing_id) as total_925_jobs
FROM
"925-FilePreprocessing"
where
"925-FilePreprocessing".create_datetime > '2022-01-01'
GROUP BY
total_process_date, process_status
) as "total_jobs"
ON date(all_records.create_datetime) = date(total_jobs.total_process_date)
INNER JOIN
(SELECT
date("925-FilePreprocessing".create_datetime) AS completed_process_date,
COUNT("925-FilePreprocessing".file_preprocessing_id) as total_completed_925_jobs
FROM
"925-FilePreprocessing"
where
"925-FilePreprocessing".create_datetime > '2022-01-01'
and ("925-FilePreprocessing".process_status = 'completed'
or "925-FilePreprocessing".process_status = 'completed-duplicated'
or "925-FilePreprocessing".process_status = 'completed-duplicated-published'
or "925-FilePreprocessing".process_status = 'completed-not_a_drawing'
)
GROUP BY
completed_process_date
) as "total_completed"
ON all_records.process_date = total_completed.completed_process_date
ORDER BY
process_date
I get an error:
ERROR: column all_records.process_date does not exist
LINE 42: ON all_records.process_date = total_completed.completed_pro...
^
A conditional count may be useful here.
Old way (using sum), before PostgreSQL 9.4:
select
a.process_datetime::DATE,
count(*) total_925_jobs,
sum ( case when a.process_status in ('completed',
'completed-duplicated',
'completed-duplicated-published',
'completed-not_a_drawing')
then 1
else 0 end) total_completed_925_jobs
from "925-FilePreprocessing" a
where a.process_datetime::DATE >= '2021-01-01'
group by a.process_datetime::DATE
New way, from PostgreSQL 9.4 on (using filter):
select
a.process_datetime::DATE,
count(*) total_925_jobs,
count(*) filter (where a.process_status in ('completed', 'completed-duplicated', 'completed-duplicated-published', 'completed-not_a_drawing')) total_completed_925_jobs
from "925-FilePreprocessing" a
where a.process_datetime::DATE >= '2021-01-01'
group by a.process_datetime::DATE
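The sum/case form can be tried out on a tiny in-memory table. This is only a sketch: SQLite (through Python's sqlite3 module) stands in for PostgreSQL so that it runs without a server, and the table name jobs and its five rows are made up for illustration.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; table name and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (job_id INTEGER, process_datetime TEXT, process_status TEXT)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [
        (1, "2022-01-02 08:00:00", "completed"),
        (2, "2022-01-02 09:30:00", "failed"),
        (3, "2022-01-02 11:15:00", "completed-duplicated"),
        (4, "2022-01-03 10:00:00", "completed-not_a_drawing"),
        (5, "2022-01-03 12:45:00", "pending"),
    ],
)

# One pass over the table: count(*) counts every row of the day, while the
# CASE expression contributes 1 only for the "completed" family of statuses.
rows = conn.execute(
    """
    SELECT date(process_datetime) AS process_date,
           count(*) AS total_jobs,
           sum(CASE WHEN process_status IN ('completed',
                                            'completed-duplicated',
                                            'completed-duplicated-published',
                                            'completed-not_a_drawing')
                    THEN 1 ELSE 0 END) AS total_completed_jobs
    FROM jobs
    GROUP BY date(process_datetime)
    ORDER BY process_date
    """
).fetchall()

print(rows)
```

Each day comes back as one row with both totals, so no join between subqueries is needed.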
Going back to your query: I get the error "column 925-FilePreprocessing.create_datetime does not exist", which is different from yours. Please check that the table definition you posted is complete.
The result you would like:
process_date | total_925_jobs | total_completed_925_jobs
2022-01-02 | 50 | 45
2022-01-03 | 150 | 135
Since total_completed has far fewer rows than total_jobs, this suggests that only two dates are later than '2022-01-01'.
The following query should give you that result. I removed a lot of unnecessary code.
GROUP BY 1 groups by the first expression in the select list; see https://www.cybertec-postgresql.com/en/postgresql-group-by-expression/
WITH total_jobs AS (
SELECT
create_datetime::date AS total_process_date,
process_status,
COUNT(file_preprocessing_id) AS total_925_jobs
FROM
"925-FilePreprocessing"
WHERE
create_datetime::date > '2022-01-01'::date
GROUP BY
1,
2
),
total_completed AS (
SELECT
date("925-FilePreprocessing".create_datetime) AS completed_process_date,
COUNT(file_preprocessing_id) AS total_completed_925_jobs
FROM
"925-FilePreprocessing"
WHERE
create_datetime::date > '2022-01-01'
AND process_status IN ('completed', 'completed-duplicated', 'completed-duplicated-published', 'completed-not_a_drawing')
GROUP BY
1
)
SELECT
tk.*,
tp.total_completed_925_jobs
FROM
total_jobs tk
JOIN total_completed tp ON tk.total_process_date = tp.completed_process_date
I'm trying to find duplicate rows in a large database (300,000 records). Here's an example of how it looks:
| id | title | thedate |
|----|---------|------------|
| 1 | Title 1 | 2021-01-01 |
| 2 | Title 2 | 2020-12-24 |
| 3 | Title 3 | 2021-02-14 |
| 4 | Title 2 | 2021-05-01 |
| 5 | Title 1 | 2021-01-13 |
I found this excellent (i.e. fast) answer here: Find duplicate rows with PostgreSQL
-- adapted from @MatthewJ's answer at https://stackoverflow.com/questions/14471179/find-duplicate-rows-with-postgresql/14471928#14471928
select * from (
SELECT id, title, TO_DATE(thedate,'YYYY-MM-DD'),
ROW_NUMBER() OVER(PARTITION BY title ORDER BY id asc) AS Row
FROM table1
) dups
where
dups.Row > 1
Which I'm trying to use as a base to solve my specific problem: I need to find duplicates according to column values like in the example, but only for records posted within 15 days of each other (the date of record insertion in the column "thedate" in my DB).
I reproduced it in this fiddle http://sqlfiddle.com/#!15/ae109/2, where id 5 (same title as id 1, and posted within 15 days of each other) should be the only acceptable answer.
How would I implement that condition in the query?
With the LAG function you can get the date from the previous row with the same title and then filter based on the time difference.
WITH with_prev AS (
SELECT
*,
LAG(thedate, 1) OVER (PARTITION BY title ORDER BY thedate) AS prev_date
FROM table1
)
SELECT id, title, thedate
FROM with_prev
WHERE thedate::timestamp - prev_date::timestamp < INTERVAL '15 days'
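The logic of the LAG-based query can be traced in plain Python on the five sample rows from the question. This is a sketch of the idea, not of how PostgreSQL executes it; the function name near_duplicates and the max_days parameter are made up for illustration.

```python
from datetime import date
from itertools import groupby

# The sample rows from the question: (id, title, thedate).
rows = [
    (1, "Title 1", date(2021, 1, 1)),
    (2, "Title 2", date(2020, 12, 24)),
    (3, "Title 3", date(2021, 2, 14)),
    (4, "Title 2", date(2021, 5, 1)),
    (5, "Title 1", date(2021, 1, 13)),
]


def near_duplicates(rows, max_days=15):
    """Return rows dated less than max_days after the previous row with the same title."""
    # Equivalent of PARTITION BY title ORDER BY thedate.
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    found = []
    for _, group in groupby(ordered, key=lambda r: r[1]):
        prev_date = None  # LAG() yields NULL for the first row of a partition
        for row in group:
            if prev_date is not None and (row[2] - prev_date).days < max_days:
                found.append(row)
            prev_date = row[2]
    return found


print(near_duplicates(rows))
```

Only id 5 is returned: it is 12 days after id 1 with the same title, while the two "Title 2" rows are months apart.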
You don't necessarily need window functions for this; you can use a plain old self-join, like:
select p.id, p.thedate, n.id, n.thedate, p.title
from table1 p
join table1 n on p.title = n.title and p.thedate < n.thedate
where n.thedate::date - p.thedate::date < 15
http://sqlfiddle.com/#!15/a3a73a/7
This has the advantage that it might use some of your indexes on the table, and also, you can decide if you want to use the data (i.e. the ID) of the previous row or the next row from each pair.
If your date column however is not unique, you'll need to be a little more specific in your join condition, like:
select p.id, p.thedate, n.id, n.thedate, p.title
from table1 p
join table1 n on p.title = n.title and p.thedate <= n.thedate and p.id <> n.id
where n.thedate::date - p.thedate::date < 15
In my Postgresql 9.3 database I have a table stock_rotation:
+----+-----------------+---------------------+------------+---------------------+
| id | quantity_change | stock_rotation_type | article_id | date |
+----+-----------------+---------------------+------------+---------------------+
| 1 | 10 | PURCHASE | 1 | 2010-01-01 15:35:01 |
| 2 | -4 | SALE | 1 | 2010-05-06 08:46:02 |
| 3 | 5 | INVENTORY | 1 | 2010-12-20 08:20:35 |
| 4 | 2 | PURCHASE | 1 | 2011-02-05 16:45:50 |
| 5 | -1 | SALE | 1 | 2011-03-01 16:42:53 |
+----+-----------------+---------------------+------------+---------------------+
Types:
SALE has negative quantity_change
PURCHASE has positive quantity_change
INVENTORY resets the actual number in stock to the given value
In this implementation, to get the current value that an article has in stock, you need to sum up all quantity changes since the latest INVENTORY for the specific article (including the inventory value). I do not know why it is implemented this way and unfortunately it would be quite hard to change this now.
My question now is how to do this for more than a single article at once.
My latest attempt was this:
WITH latest_inventory_of_article as (
SELECT MAX(date)
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
)
SELECT a.id, sum(quantity_change)
FROM stock_rotation sr
INNER JOIN article a ON a.id = sr.article_id
WHERE sr.date >= (COALESCE(
(SELECT date FROM latest_inventory_of_article),
'1970-01-01'
))
GROUP BY a.id
But the date for the latest stock_rotation of type INVENTORY can be different for every article.
I was trying to avoid looping over multiple article ids to find this date.
In this case I would use a different inner query to get the latest inventory date per article. This effectively reads stock_rotation twice, but it should work (if the table is too big for that, you can try something else):
SELECT sr.article_id, sum(quantity_change)
FROM stock_rotation sr
LEFT JOIN (
SELECT article_id, MAX(date) AS date
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
GROUP BY article_id) AS latest_inventory
ON latest_inventory.article_id = sr.article_id
WHERE sr.date >= COALESCE(latest_inventory.date, '1970-01-01')
GROUP BY sr.article_id
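A quick way to check this approach is to run it against the rows from the question. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, with dates stored as text; article 2 (rows 6 and 7) is a made-up article without any INVENTORY row, added to exercise the COALESCE fallback.

```python
import sqlite3

# SQLite stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_rotation ("
    "id INTEGER, quantity_change INTEGER, stock_rotation_type TEXT, "
    "article_id INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO stock_rotation VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "PURCHASE", 1, "2010-01-01 15:35:01"),
        (2, -4, "SALE", 1, "2010-05-06 08:46:02"),
        (3, 5, "INVENTORY", 1, "2010-12-20 08:20:35"),
        (4, 2, "PURCHASE", 1, "2011-02-05 16:45:50"),
        (5, -1, "SALE", 1, "2011-03-01 16:42:53"),
        # Hypothetical article 2, which never had an INVENTORY row.
        (6, 7, "PURCHASE", 2, "2011-01-10 09:00:00"),
        (7, -3, "SALE", 2, "2011-01-11 09:00:00"),
    ],
)

# The subquery finds the latest INVENTORY date per article; the LEFT JOIN
# keeps articles without one, and COALESCE then sums all of their rows.
stock = dict(conn.execute(
    """
    SELECT sr.article_id, sum(sr.quantity_change)
    FROM stock_rotation sr
    LEFT JOIN (
        SELECT article_id, max(date) AS date
        FROM stock_rotation
        WHERE stock_rotation_type = 'INVENTORY'
        GROUP BY article_id
    ) AS latest_inventory ON latest_inventory.article_id = sr.article_id
    WHERE sr.date >= coalesce(latest_inventory.date, '1970-01-01')
    GROUP BY sr.article_id
    """
).fetchall())

print(stock)
```

Article 1 comes out as 5 + 2 - 1 = 6 (everything from the last inventory on), and article 2 as 7 - 3 = 4.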
You can use DISTINCT ON together with ORDER BY to get the latest INVENTORY row for each article_id in the WITH clause.
Then you can join that with the original table to get all later rows and add the values:
WITH latest_inventory as (
SELECT DISTINCT ON (article_id) id, article_id, date
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
ORDER BY article_id, date DESC
)
SELECT article_id, sum(sr.quantity_change)
FROM stock_rotation sr
JOIN latest_inventory li USING (article_id)
WHERE sr.date >= li.date
GROUP BY article_id;
Here is my take on it: First, build the list of products at their last inventory state, using a window function. Then, join it back to the entire list, filtering on operations later than the inventory date for the item.
with initial_inventory as
(
select article_id, date, quantity_change from
(select article_id, date, quantity_change, rank() over (partition by article_id order by date desc)
from stock_rotation
where stock_rotation_type = 'INVENTORY'
) a
where rank = 1
)
select ii.article_id, ii.quantity_change + sum(sr.quantity_change)
from initial_inventory ii
join stock_rotation sr on ii.article_id = sr.article_id and sr.date > ii.date
group by ii.article_id, ii.quantity_change
I was looking for a way to select the first item from a GROUP BY in PostgreSQL, until I found this Stack Overflow question: Select first row in each GROUP BY group?
There, I saw that the WITH command was used.
I'm trying to understand some of the more "advanced" SQL commands, like PARTITION, WITH, ROW_NUMBER, etc. Until two or three months ago, I knew only the basic commands (SELECT, INNER JOIN, LEFT JOIN, ORDER BY, GROUP BY, etc.).
I have a small problem (solved, but I don't know if this is the best way* to do it).
*best way = I'm more concerned with clean SQL code than with performance; this is just for a report that will be executed once a day, over no more than 5000 records.
I have two tables, in PostgreSQL:
+----------------------------------------------+
| TABLE NAME: point |
+--------+---------------+----------+----------+
| km | globalid | lat | long |
+--------+---------------+----------+----------+
| 36600 | 1553E2AB-B2F8 | -1774.44 | -5423.58 |
| 364000 | 25EB2465-1B8A | -1773.42 | -5422.03 |
| 362000 | 5FFDE611-88DF | -1771.80 | -5420.37 |
+--------+---------------+----------+----------+
+---------------------------------------------------------+
| TABLE NAME: photo |
+--------------+---------------+------------+-------------+
| attachmentid | rel_globalid | date | filename |
+--------------+---------------+------------+-------------+
| 1 | 1553E2AB-B2F8 | 2015-02-24 | photo01.jpg |
| 2 | 1553E2AB-B2F8 | 2015-02-24 | photo02.jpg |
| 405 | 25EB2465-1B8A | 2015-02-12 | photo03.jpg |
| 406 | 25EB2465-1B8A | 2015-02-12 | photo04.jpg |
| 407 | 25EB2465-1B8A | 2015-02-13 | photo06.jpg |
| 3 | 5FFDE611-88DF | 2015-02-12 | photo07.jpg |
+--------------+---------------+------------+-------------+
So, for the problem:
Every point has one or more photos, but I only need the point data plus the first and the last photo. If a point has only one photo, I need only the first photo. If a point has three photos, I need only the first and the third photo.
Here is how I solved it:
First, I need the first photo of every point, so I partitioned by rel_globalid and numbered the photos within each group:
WITH photos_numbered AS (
SELECT
rel_globalid,
date,
filename,
ROW_NUMBER()
OVER (
PARTITION BY rel_globalid
ORDER BY date
) AS photo_num
FROM
photo
)
With this code, I can get the 2nd, 3rd and so on too.
OK, so now I want to get the first photo (still using the WITH above):
SELECT *
FROM
photos_numbered
WHERE
photo_num = 1
And to get the last photo, I used the following SQL:
SELECT
p1.*
FROM
photos_numbered p1
JOIN (
SELECT
rel_globalid,
max(photo_num) photo_num
FROM
photos_numbered
GROUP BY
rel_globalid
) p2
ON
p1.rel_globalid = p2.rel_globalid AND
p1.photo_num = p2.photo_num
WHERE
p1.photo_num > 1
The WHERE p1.photo_num > 1 is there because if a point has only one photo, that photo will appear as the first photo and the last photo will be NULL.
OK, now I must "convert" the SELECTs for the first photo and the last photo into WITH clauses, and do a simple SELECT with an INNER JOIN for the first photo and a LEFT JOIN for the last photo:
WITH photos_numbered AS (
SELECT
rel_globalid,
date,
filename,
ROW_NUMBER()
OVER (
PARTITION BY rel_globalid
ORDER BY date
) AS photo_num
FROM
photo
), first_photo AS (
SELECT *
FROM
photos_numbered
WHERE
photo_num = 1
), last_photo AS (
SELECT p1.*
FROM
photos_numbered p1
JOIN (
SELECT
rel_globalid,
max(photo_num) photo_num
FROM
photos_numbered
GROUP BY
rel_globalid
) p2
ON p1.rel_globalid = p2.rel_globalid AND
p1.photo_num = p2.photo_num
WHERE
p1.photo_num > 1
)
SELECT DISTINCT
point.km,
point.globalid,
point.lat,
point."long",
first_photo.date AS fp_date,
first_photo.filename AS fp_filename,
last_photo.date AS lp_date,
last_photo.filename AS lp_filename
FROM
point
INNER JOIN
first_photo
ON
first_photo.rel_globalid = point.globalid
LEFT JOIN
last_photo
ON
last_photo.rel_globalid = point.globalid
ORDER BY
km
I think this SQL is huge for such a 'simple thing'!
Does it work? Yes, but I would like some advice, some documentation that I can read to understand this better, and some commands that I could use to write "better" SQL (as I said, two or three months ago I didn't even know the PARTITION and WITH commands).
I tried to put a SQLFiddle link here, but SQLFiddle never worked for me (it always returns an 'oops' message).
If you are looking for clean SQL, then try a lateral left join together with the first_value and last_value window functions instead of a common table expression (WITH clause):
select *
from point po
left join lateral
(
select first_value( date ) over( order by ph.date) as first_photo_date,
first_value( filename ) over( order by ph.date) as first_photo_filename,
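-- last_value needs a frame covering the whole partition; the default frame ends at the current row
last_value( date ) over( order by ph.date rows between unbounded preceding and unbounded following ) as last_photo_date,
last_value( filename ) over( order by ph.date rows between unbounded preceding and unbounded following ) as last_photo_filename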
from photo ph
where po.globalid = ph.rel_globalid
limit 1
) q on true
;
An additional count(*) over() with a case expression can be used to "clean" the values of the last photo when there is only one record:
select *
from point po
left join lateral
(
select first_value( date ) over( order by ph.date) as first_photo_date,
first_value( filename ) over( order by ph.date) as first_photo_filename,
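case when count(*) over () > 1
then last_value( date ) over( order by ph.date rows between unbounded preceding and unbounded following )
end as last_photo_date,
case when count(*) over () > 1
then last_value( filename ) over( order by ph.date rows between unbounded preceding and unbounded following )
end as last_photo_filename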
from photo ph
where po.globalid = ph.rel_globalid
limit 1
) q on true
;
Using the answer from krokodilko, I made a new SQL query without LEFT JOIN LATERAL, because I'm using PostgreSQL 9.2 (which does not support LEFT JOIN LATERAL).
SELECT DISTINCT
po.km,
po.globalid,
po.lat,
po."long",
ph.fp_date,
ph.fp_filename,
ph.lp_date,
ph.lp_filename
FROM
point po
INNER JOIN
(
SELECT DISTINCT
rel_globalid,
first_value(date) OVER (PARTITION BY ph.rel_globalid ORDER BY ph.date) AS fp_date,
first_value(filename) OVER (PARTITION BY ph.rel_globalid ORDER BY ph.date) AS fp_filename,
CASE WHEN count(*) OVER (PARTITION BY ph.rel_globalid) > 1 THEN
last_value(date) OVER (PARTITION BY ph.rel_globalid ORDER BY ph.date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
END AS lp_date,
CASE WHEN count(*) OVER (PARTITION BY ph.rel_globalid) > 1 THEN
last_value(filename) OVER (PARTITION BY ph.rel_globalid ORDER BY ph.date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
END AS lp_filename
FROM
photo ph
ORDER BY
rel_globalid
) ph
ON ph.rel_globalid = po.globalid
The only thing I don't like is the OVER (PARTITION BY ...) repeated in almost every field of the INNER JOIN subquery.
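One possible way to cut down on that repetition is a named window (the WINDOW clause), which as far as I know PostgreSQL has supported since window functions were introduced. The sketch below is an assumption-laden illustration, not a drop-in replacement: it runs on SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, uses only the photo rows from the question, and adds attachmentid as a tie-breaker in ORDER BY, which the original query does not have.

```python
import sqlite3

# SQLite stands in for PostgreSQL; the rows are the photo table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photo (attachmentid INTEGER, rel_globalid TEXT, date TEXT, filename TEXT)"
)
conn.executemany(
    "INSERT INTO photo VALUES (?, ?, ?, ?)",
    [
        (1, "1553E2AB-B2F8", "2015-02-24", "photo01.jpg"),
        (2, "1553E2AB-B2F8", "2015-02-24", "photo02.jpg"),
        (405, "25EB2465-1B8A", "2015-02-12", "photo03.jpg"),
        (406, "25EB2465-1B8A", "2015-02-12", "photo04.jpg"),
        (407, "25EB2465-1B8A", "2015-02-13", "photo06.jpg"),
        (3, "5FFDE611-88DF", "2015-02-12", "photo07.jpg"),
    ],
)

# The window is defined once as w and reused by every window function.
# The explicit frame makes last_value and count(*) see the whole partition.
rows = conn.execute(
    """
    SELECT DISTINCT
           rel_globalid,
           first_value(filename) OVER w AS fp_filename,
           CASE WHEN count(*) OVER w > 1
                THEN last_value(filename) OVER w
           END AS lp_filename
    FROM photo
    WINDOW w AS (
        PARTITION BY rel_globalid
        ORDER BY date, attachmentid
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    )
    ORDER BY rel_globalid
    """
).fetchall()

for row in rows:
    print(row)
```

The point with a single photo gets NULL (None in Python) as its last photo, and the other two get their earliest and latest photo by date.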
Please see http://sqlfiddle.com/#!6/9254d/3/0
I have two tables, Person and Values; PersonID is the link between them. Each person in the Values table has multiple values per day, for every hour. I need to get the latest value for each user. I had a look on SO, and what I could find was to get MAX(ValueDate) and then join on that, but that doesn't work. Joining on PersonID didn't work either, and I'm not sure what else to try.
The output I need is
Name Value
1fn 1ln 2
2fn 2ln 20
3fn 3ln 200
I don't need the greatest value, I need the latest value for each person. Please share if you have any ideas. Thanks.
Try this:
SQL Fiddle example
DECLARE @Org nvarchar(3)
SELECT @Org = 'aaa'
DECLARE @MyDate date
SELECT @MyDate = CONVERT(date, '2014-09-12')
SELECT a.Name,
a.Value as Revenue
FROM(
SELECT p.FName + ' ' + p.LName AS Name,
vt.Value,
ROW_NUMBER()OVER(PARTITION BY vt.PersonID ORDER BY vt.ValueDate desc) as rnk
FROM Person p
LEFT JOIN ValueTable vt
ON vt.PersonID = p.PersonID
WHERE vt.ValueDate < DATEADD(day,1,@MyDate)
AND vt.ValueDate >= @MyDate
AND vt.Org = @Org)a
WHERE a.rnk = 1
ORDER BY a.Name ASC
Result:
| NAME | REVENUE |
|---------|---------|
| 1fn 1ln | 2 |
| 2fn 2ln | 20 |
| 3fn 3ln | 200 |
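The ROW_NUMBER pattern from this answer can be checked on a small made-up data set. In this sketch SQLite (through Python's sqlite3 module) stands in for SQL Server, so string concatenation uses || instead of +, the date and Org filters are left out, and all rows are hypothetical. They are chosen so that the latest value of each person is not the greatest one.

```python
import sqlite3

# SQLite stands in for SQL Server; all rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (PersonID INTEGER, FName TEXT, LName TEXT)")
conn.execute("CREATE TABLE ValueTable (PersonID INTEGER, Value INTEGER, ValueDate TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [(1, "1fn", "1ln"), (2, "2fn", "2ln"), (3, "3fn", "3ln")],
)
conn.executemany(
    "INSERT INTO ValueTable VALUES (?, ?, ?)",
    [
        (1, 5, "2014-09-12 08:00:00"),
        (1, 2, "2014-09-12 09:00:00"),
        (2, 50, "2014-09-12 08:00:00"),
        (2, 20, "2014-09-12 09:00:00"),
        (3, 500, "2014-09-12 08:00:00"),
        (3, 200, "2014-09-12 09:00:00"),
    ],
)

# Number each person's rows from newest to oldest and keep only row 1.
rows = conn.execute(
    """
    SELECT a.Name, a.Value
    FROM (
        SELECT p.FName || ' ' || p.LName AS Name,
               vt.Value,
               row_number() OVER (PARTITION BY vt.PersonID
                                  ORDER BY vt.ValueDate DESC) AS rnk
        FROM Person p
        JOIN ValueTable vt ON vt.PersonID = p.PersonID
    ) AS a
    WHERE a.rnk = 1
    ORDER BY a.Name
    """
).fetchall()

print(rows)
```

Each person keeps the value with the most recent ValueDate (2, 20 and 200), not the largest value.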