How to select only one result per condition met inside an individual table (no joins)? - postgresql

I have a table containing all the trips taken by different cars. I've filtered down this table to trips that had multiple stops specifically. Now all i want to do is get the first stop that each car had.
What i've got is:
Car ID
Date_depart
Date_arrive
Count (from a previous table creation)
I've filtered this table by using Car ID + Date Depart and making a count where there are multiple date_arrives for a single date_depart. Now i'm trying to figure out how to only get back the first stop but am completely stuck. Outside of doing the lateral join X, order by Z limit 1 etc method; i have no idea how to get back only the first result in this table.
Here's some sample data:
Car ID Date_depart Date_arrive Count
949 2017-01-01 2017-01-05 2
949 2017-01-01 2017-01-09 2
1940 2017-01-09 2017-01-11 3
1940 2017-01-09 2017-01-14 3
1940 2017-01-09 2017-01-28 3
949 2018-04-19 2018-04-23 2
949 2018-04-19 2018-04-26 2
and the expected result would be:
Car ID Date_depart Date_arrive Count
949 2017-01-01 2017-01-05 2
1940 2017-01-09 2017-01-11 3
949 2018-04-19 2018-04-23 2
Any help?

You need DISTINCT ON
SELECT DISTINCT ON (date_depart, car_id)
*
FROM
trips
ORDER BY date_depart, car_id, date_arrive
This gives you the first (ordered) row of each group (date_depart, car_id)
demo: db<>fiddle

Related

How do i write a group by query in PostgreSQL

I'm getting errors with PostgreSQL when am writing a group by query,
am sure someone will tell me to put all the columns I've selected in group by, but that will not give me the correct results.
Am writing a query that will select all the vehicles in the database and group the results by vehicles, giving me the total distance and cost for a given period.
Here is how am doing the query.
SELECT i.vehicle AS vehicle,
i.costcenter AS costCenter,
i.department AS department,
SUM(i.quantity) AS liters,
SUM(i.totalcost) AS Totalcost,
v.model AS model,
v.vtype AS vtype
FROM fuelissuances AS i
LEFT JOIN vehicles AS v ON i.vehicle = v.id
WHERE i.dates::text LIKE '%2019-03%' AND i.deleted_at IS NULL
GROUP BY i.vehicle;
If I put all the columns that are in the select in the group bt, the results will not be correct.
How do i go about this without putting all the columns in group by and creating sub-queries?
The fuel table looks like:
vehicle dates department quantity totalcost
1 2019-01-01 102 12 1200
1 2019-01-05 102 15 1500
1 2019-01-13 102 18 1800
1 2019-01-22 102 10 1000
2 2019-01-01 102 12 1260
2 2019-01-05 102 19 1995
2 2019-01-13 102 28 2940
Vehicle Table
id model vtype
1 1 2
2 4 6
2 5 7
This is the results i expect from the query
vehicle dates department quantity totalcost model vtype
1 2019-01-01 102 12 1200 1 2
1 2019-01-05 102 15 1500 1 2
1 2019-01-13 102 18 1800 1 2
1 2019-01-22 102 10 1000 1 2
1 2019-01-18 102 10 1000 1 2
1 65 6500
2 2019-01-01 102 12 1260 5 7
2 2019-01-05 102 19 1995 5 7
2 2019-01-13 102 28 2940 5 7
1 45 6195
Your query doesn't really make sense. Apparently there can be multiple departments and costcenters per vehicle in the fuelissuances table - which of those should be returned?
One way to deal with that, is to return all of them, e.g. as an array:
SELECT i.vehicle,
array_agg(i.costcenter) as costcenters,
array_agg(i.department) as departments,
SUM(i.quantity) AS liters,
SUM(i.totalcost) AS Totalcost,
v.model,
v.vtype
FROM fuelissuances AS i
LEFT JOIN vehicles AS v ON i.vehicle = v.id
WHERE i.dates >= date '2019-03-01'
and i.date < date '2019-04-01'
AND i.deleted_at IS NULL
group by i.vehicle, v.model, v.vtype;
Instead of an array, you could also return a comma separated lists of those values, e.g. string_agg(i.costcenter, ',') as costcenters.
Adding the columns v.model and v.vtype won't (shouldn't) change anything as the group by i.vehicle will only return a single vehicle anyway and thus the model and vtype won't change for that in the group.
Note that I removed the useless aliases and replaced the condition on the date with a proper range condition that can make use of an index on the dates column.
Edit
Based on your new sample data, you want a running total, rather than a "regular" aggregation. This can easily be done using window functions
SELECT i.vehicle,
i.costcenter,
i.department,
SUM(i.quantity) over (w) AS liters,
SUM(i.totalcost) over (w) AS Totalcost,
v.model,
v.vtype
FROM fuelissuances AS i
LEFT JOIN vehicles AS v ON i.vehicle = v.id
WHERE i.dates >= date '2019-01-01'
and i.dates < date '2019-02-01'
AND i.deleted_at IS NULL
window w as (partition by i.vehicle order by i.dates)
order by i.vehicle, i.dates;
I would not create those "total" lines using SQL, but rather in your front end that display the data.
Online example: https://rextester.com/CRJZ27446
You need to use a nested query to get those SUM you want inside that query.
SELECT i.vehicle AS vehicle,
i.costcenter AS costCenter,
i.department AS department,
(SELECT SUM(i.quantity) FROM TABLES WHERE CONDITIONS GROUP BY vehicle) AS liters,
(SELECT SUM(i.totalcost) FROM TABLES WHERE CONDITIONS GROUP BY vehicle) AS Totalcost,
v.model AS model,
v.vtype AS vtype
FROM fuelissuances AS i
LEFT JOIN vehicles AS v ON i.vehicle = v.id
WHERE i.dates::text LIKE '%2019-03%' AND i.deleted_at IS NULL;

Running Count Total with PostgresQL

I'm fairly close to this solution, but I just need a little help getting over the end.
I'm trying to get a running count of the occurrences of client_ids regardless of the date, however I need the dates and ids to still appear in my results to verify everything.
I found part of the solution here but have not been able to modify it enough for my needs.
Here is what the answer should be, counting if the occurrences of the client_ids sequentially :
id client_id deliver_on running_total
1 138 2017-10-01 1
2 29 2017-10-01 1
3 138 2017-10-01 2
4 29 2013-10-02 2
5 29 2013-10-02 3
6 29 2013-10-03 4
7 138 2013-10-03 3
However, here is what I'm getting:
id client_id deliver_on running_total
1 138 2017-10-01 1
2 29 2017-10-01 1
3 138 2017-10-01 1
4 29 2013-10-02 3
5 29 2013-10-02 3
6 29 2013-10-03 1
7 138 2013-10-03 2
Rather than counting the times the client_id appears sequentially, the code counts the time the id appears in the previous date range.
Here is my code and any help would be greatly appreciated.
Thank you,
SELECT n.id, n.client_id, n.deliver_on, COUNT(n.client_id) AS "running_total"
FROM orders n
LEFT JOIN orders o
ON (o.client_id = n.client_id
AND n.deliver_on > o.deliver_on)
GROUP BY n.id, n.deliver_on, n.client_id
ORDER BY n.deliver_on ASC
* EDIT WITH ANSWER *
I ending up solving my own question. Here is the solution with comments:
-- Set "1" for counting to be used later
WITH DATA AS (
SELECT
orders.id,
orders.client_id,
orders.deliver_on,
COUNT(1) -- Creates a column of "1" for counting the occurrences
FROM orders
GROUP BY 1
ORDER BY deliver_on, client_id
)
SELECT
id,
client_id,
deliver_on,
SUM(COUNT) OVER (PARTITION BY client_id
ORDER BY client_id, deliver_on
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) -- Counts the sequential client_ids based on the number of times they appear
FROM DATA
Just the answer posted to close the question:
-- Set "1" for counting to be used later
WITH DATA AS (
SELECT
orders.id,
orders.client_id,
orders.deliver_on,
COUNT(1) -- Creates a column of "1" for counting the occurrences
FROM orders
GROUP BY 1
ORDER BY deliver_on, client_id
)
SELECT
id,
client_id,
deliver_on,
SUM(COUNT) OVER (PARTITION BY client_id
ORDER BY client_id, deliver_on
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) -- Counts the sequential client_ids based on the number of times they appear
FROM DATA

Optimized querying in PostgreSQL

Assume you have a table named tracker with following records.
issue_id | ingest_date | verb,status
10 2015-01-24 00:00:00 1,1
10 2015-01-25 00:00:00 2,2
10 2015-01-26 00:00:00 2,3
10 2015-01-27 00:00:00 3,4
11 2015-01-10 00:00:00 1,3
11 2015-01-11 00:00:00 2,4
I need the following results
10 2015-01-26 00:00:00 2,3
11 2015-01-11 00:00:00 2,4
I am trying out this query
select *
from etl_change_fact
where ingest_date = (select max(ingest_date)
from etl_change_fact);
However, this gives me only
10 2015-01-26 00:00:00 2,3
this record.
But, I want all unique records(change_id) with
(a) max(ingest_date) AND
(b) verb columns priority being (2 - First preferred ,1 - Second preferred ,3 - last preferred)
Hence, I need the following results
10 2015-01-26 00:00:00 2,3
11 2015-01-11 00:00:00 2,4
Please help me to efficiently query it.
P.S :
I am not to index ingest_date because I am going to set it as "distribution key" in Distributed Computing setup.
I am newbie to Data Warehouse and querying.
Hence, please help me with optimized way to hit my TB sized DB.
This is a typical "greatest-n-per-group" problem. If you search for this tag here, you'll get plenty of solutions - including MySQL.
For Postgres the quickest way to do it is using distinct on (which is a Postgres proprietary extension to the SQL language)
select distinct on (issue_id) issue_id, ingest_date, verb, status
from etl_change_fact
order by issue_id,
case verb
when 2 then 1
when 1 then 2
else 3
end, ingest_date desc;
You can enhance your original query to use a co-related sub-query to achieve the same thing:
select f1.*
from etl_change_fact f1
where f1.ingest_date = (select max(f2.ingest_date)
from etl_change_fact f2
where f1.issue_id = f2.issue_id);
Edit
For an outdated and unsupported Postgres version, you can probably get away using something like this:
select f1.*
from etl_change_fact f1
where f1.ingest_date = (select f2.ingest_date
from etl_change_fact f2
where f1.issue_id = f2.issue_id
order by case verb
when 2 then 1
when 1 then 2
else 3
end, ingest_date desc
limit 1);
SQLFiddle example: http://sqlfiddle.com/#!15/3bb05/1

How to group by one column and summation of second?

tsql newbie here.
I have a table, similar to this one:
CarId CarName UserId RentedTimes CrashedTimes
`````````````````````````````````````````````````````````
1 Ferrari 1 2 0
2 DB9 1 5 0
3 Ferrari 2 4 0
4 Audi 3 1 0
5 Audi 1 1 0
Assuming the table is called 'Cars', I am trying to select total number of times each of cars were rented. According to the table above, Ferrari was rented total of 6 times, DB9 five times and Audi twice.
I tried doing this:
select CarName, SUM(RentedTimes)
from Cars
group by CarName, RentedTimes
order by RentedTimes desc
but, it is returning two rows of ferrari with 2,4 as rented times and so on..
How do I select all cars, and total times each were rented, please?
Thanks
Edited the query to include sort order, sorry.
select CarName, SUM(RentedTimes)
from Cars
group by CarName
ORDER BY SUM(RentedTimes) DESC
Try this way.. removed RentedTimes from group by

T-SQL Determine Status Changes in History Table

I have an application which logs changes to records in the "production" table to a "history" table. The history table is basically a field for field copy of the production table, with a few extra columns like last modified date, last modified by user, etc.
This works well because we get a snapshot of the record anytime the record changes. However, it makes it hard to determine unique status changes to a record. An example is below.
BoxID StatusID SubStatusID ModifiedTime
1 4 27 2011-08-11 15:31
1 4 11 2011-08-11 15:28
1 4 11 2011-08-10 09:07
1 5 14 2011-08-09 08:53
1 5 14 2011-08-09 08:19
1 4 11 2011-08-08 14:15
1 4 9 2011-07-27 15:52
1 4 9 2011-07-27 15:49
1 2 8 2011-07-26 12:00
As you can see in the above table (data comes from the real system with other fields removed for brevity and security) BoxID 1 has had 9 changes to the production record. Some of those updates resulted in statuses being changed and some did not, which means other fields (those not shown) have changed.
I need to be able, in TSQL, to extract from this data the unique status changes. The output I am looking for, given the above input table, is below.
BoxID StatusID SubStatusID ModifiedTime
1 4 27 2011-08-11 15:31
1 4 11 2011-08-10 09:07
1 5 14 2011-08-09 08:19
1 4 11 2011-08-08 14:15
1 4 9 2011-07-27 15:49
1 2 8 2011-07-26 12:00
This is not as easy as grouping by StatusID and SubStatusID and taking the min(ModifiedTime) then joining back into the history table since statuses can go backwards as well (see StatusID 4, SubStatusID 11 gets set twice).
Any help would be greatly appreciated!
Does this do work for you
;WITH Boxes_CTE AS
(
SELECT Boxid, StatusID, SubStatusID, ModifiedTime,
ROW_NUMBER() OVER (PARTITION BY Boxid ORDER BY ModifiedTime) AS SEQUENCE
FROM Boxes
)
SELECT b1.Boxid, b1.StatusID, b1.SubStatusID, b1.ModifiedTime
FROM Boxes_CTE b1
LEFT OUTER JOIN Boxes_CTE b2 ON b1.Boxid = b2.Boxid
AND b1.Sequence = b2.Sequence + 1
WHERE b1.StatusID <> b2.StatusID
OR b1.SubStatusID <> b2.SubStatusID
OR b2.StatusID IS NULL
ORDER BY b1.ModifiedTime DESC
;
Select BoxID,StatusID,SubStatusID FROM Staty CurrentStaty
INNER JOIN ON
(
Select BoxID,StatusID,SubStatusID FROM Staty PriorStaty
)
Where Staty.ModifiedTime=
(Select Max(PriorStaty.ModifiedTime) FROM PriorStaty
Where PriortStaty.ModifiedTime<Staty.ModifiedTime)
AND Staty.BoxID=PriorStaty.BoxID
AND NOT (
Staty.StatusID=PriorStaty.StatusID
AND
Staty.SubStatusID=PriorStaty.StatusID
)