Output all values of a group, even if null, for each day of the year in PostgreSQL

My problem is the following:
I have a movement table that records whenever a car is moved from one station to another or a station is supplied with more cars. The table's columns are: id, productId, quantity, updatedAt, date, locationId, createdAt.
I want to display the number of cars available for each day of the year, grouped by productId and locationId.
with you as(
SELECT "productId"
, "locationId"
, "date" as start_date
, SUM(SUM("Movement"."quantity")) OVER (PARTITION BY "productId", "locationId" ORDER BY "date") as schau
, "quantity"
, LEAD ("date") OVER (PARTITION BY "productId", "locationId" ORDER BY "date") as end_date
FROM "Movement"
GROUP BY 1,2,3,5
Order BY "date"),
calendar as (
select date '2022-01-01' + (n || ' days')::interval calendar_date
from generate_series(0, 365) n
)
SELECT * FROM calendar
left join you on calendar.calendar_date between start_date and end_date
The SUM(SUM(...)) OVER window delivers the running total of the products available. Check. The LEAD function lets me define a period per movement, which I need in order to match the calendar I get from generate_series. This way I get a date for every day of the year. Check.
The only problem is that my output won't show products that don't have an entry for a given day. Is there any way to ALWAYS show all members of a group, even if there is no entry for them? In my case I want to show every possible productId and locationId combination, even if my table has no entry for that date (I want to output 0 in that case).
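One way to get a row for every group on every day is to build the full grid first (calendar CROSS JOIN distinct groups) and LEFT JOIN the movements onto it, treating missing days as 0. The following is a minimal sketch, not a drop-in solution: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL so it can be run as-is, the table and column names (movement, product_id, location_id) are simplified assumptions, and the recursive CTE plays the role that generate_series would play in PostgreSQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE movement (product_id INT, location_id INT, date TEXT, quantity INT);
INSERT INTO movement VALUES
  (1, 10, '2022-01-01', 5),
  (1, 10, '2022-01-03', -2),
  (2, 10, '2022-01-02', 4);
""")

rows = con.execute("""
WITH RECURSIVE calendar(calendar_date) AS (
  -- four days only, to keep the output short
  SELECT '2022-01-01'
  UNION ALL
  SELECT date(calendar_date, '+1 day') FROM calendar
  WHERE calendar_date < '2022-01-04'
),
groups AS (
  -- every product/location pair that ever appears
  SELECT DISTINCT product_id, location_id FROM movement
),
daily AS (
  -- net quantity moved per group and day
  SELECT product_id, location_id, date, SUM(quantity) AS qty
  FROM movement
  GROUP BY product_id, location_id, date
)
SELECT c.calendar_date, g.product_id, g.location_id,
       -- running total; days without movements contribute 0
       SUM(COALESCE(d.qty, 0)) OVER (
         PARTITION BY g.product_id, g.location_id
         ORDER BY c.calendar_date
       ) AS available
FROM calendar c
CROSS JOIN groups g
LEFT JOIN daily d
  ON d.product_id = g.product_id
 AND d.location_id = g.location_id
 AND d.date = c.calendar_date
ORDER BY g.product_id, c.calendar_date
""").fetchall()

for row in rows:
    print(row)
```

Product 2 has no movement on 2022-01-01, yet it still appears for that day with 0 available, because the grid rather than the movement table drives the output.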

Related

How to join on closest date in Postgresql

Suppose I have the following tables:
product_prices:
product|price|date
-------+-----+----------
apple |10 |2014-03-01
-------+-----+----------
apple |20 |2014-05-02
-------+-----+----------
egg |2 |2014-03-03
-------+-----+----------
egg |4 |2015-10-12
purchases:
user|product|date
----+-------+----------
John|apple |2014-03-02
----+-------+----------
John|apple |2014-06-03
----+-------+----------
John|egg |2014-08-13
----+-------+----------
John|egg |2016-08-13
What I need is table similar to this:
name|product|purchase date |price date|price
----+-------+--------------+----------+-----
John|apple |2014-03-02 |2014-03-01|10
----+-------+--------------+----------+-----
John|apple |2014-06-03 |2014-05-02|20
----+-------+--------------+----------+-----
John|egg |2014-08-13 |2014-03-03|2
----+-------+--------------+----------+-----
John|egg |2016-08-13 |2015-10-12|4
In other words: "what was the price of the product on that day?", where the price is determined by the dates in the product_prices table.
On the real DB I tried something similar to:
SELECT name, product, pu.date, pp.date, pp.price
FROM purchases AS pu
LEFT JOIN product_prices AS pp
ON pu.date = (
SELECT date
FROM product_prices
ORDER BY date DESC LIMIT 1);
But I keep getting either only the left part of the table (with (null) instead of price dates and prices) or many rows with every combination of prices and dates.
I would suggest changing product_prices table to use a daterange column instead (or at least a start_date and an end_date).
You can use an exclusion constraint to make sure you never have overlapping ranges for one product and an insert trigger that "closes" the "current" prices and creates a new unbounded range for the newly inserted price.
A daterange can efficiently be indexed and with that in place the query gets as easy as:
SELECT name, product, pu.date, pp.valid_during, pp.price
FROM purchases AS pu
LEFT JOIN product_prices AS pp ON pp.product = pu.product AND pu.date <@ pp.valid_during
(assuming the range column is named valid_during)
The exclusion constraint would only work however if the product was an integer (not a varchar) - but I guess your real product_purchases table uses a foreign key to some product table anyway (which is an integer).
The new table definitions could look something like this:
create table purchase_prices
(
product_id integer not null references products,
price numeric(16,4) not null,
valid_during daterange not null
);
And the constraint that prevents overlapping ranges:
alter table purchase_prices
add constraint check_price_range
exclude using gist (product_id with =, valid_during with &&);
The constraint needs the btree_gist extension.
As always improving query speed comes with a price and in this case it's the higher maintenance costs for the GiST index. You would need to run some tests to see if the easier (and most probably much faster) query outweighs the slower insert performance on purchase_prices.
Look at your scalar sub-query very closely. It is not correlated back to the outer query. In other words, it will return the same result every time: the latest date in the product_prices table. Period. Think about the query out of context:
SELECT date
FROM product_prices
ORDER BY date DESC LIMIT 1
There are two problems with it:
It will return 2015-10-12 for every row in the join and ultimately, nothing was purchased on that date, hence, null.
Your approximation of closest is that the dates are equal. Unless you have a product_prices row for every product for every single date, you'll always have misses. "Closest" implies distance and ranking.
WITH close_prices_by_purchase AS (
SELECT
p."user",
p.product,
p.date AS purchase_date,
pp.date AS price_date,
pp.price,
-- rank the candidate prices for each purchase, most recent price date first
row_number() OVER (PARTITION BY p."user", p.product, p.date ORDER BY pp.date DESC) AS distance
FROM purchases AS p
INNER JOIN product_prices AS pp ON pp.product = p.product
WHERE pp.date <= p.date -- only prices already in effect on the purchase date
)
SELECT "user" AS name, product, purchase_date, price_date, price
FROM close_prices_by_purchase
WHERE distance = 1; -- shortest distance
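The point about the uncorrelated subquery can also be addressed directly, by correlating the subquery with the outer row so that it returns the latest price date on or before each purchase. This is an alternative to the row_number() approach, sketched here with Python's built-in sqlite3 as a stand-in for PostgreSQL and the sample data from the question; the aliases p2 and priced are illustrative names only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product_prices (product TEXT, price INT, date TEXT);
INSERT INTO product_prices VALUES
  ('apple', 10, '2014-03-01'), ('apple', 20, '2014-05-02'),
  ('egg', 2, '2014-03-03'), ('egg', 4, '2015-10-12');
CREATE TABLE purchases ("user" TEXT, product TEXT, date TEXT);
INSERT INTO purchases VALUES
  ('John', 'apple', '2014-03-02'), ('John', 'apple', '2014-06-03'),
  ('John', 'egg', '2014-08-13'), ('John', 'egg', '2016-08-13');
""")

priced = con.execute("""
SELECT pu."user", pu.product, pu.date, pp.date, pp.price
FROM purchases AS pu
LEFT JOIN product_prices AS pp
  ON pp.product = pu.product
 AND pp.date = (
       -- correlated: evaluated once per purchase row
       SELECT max(p2.date)
       FROM product_prices AS p2
       WHERE p2.product = pu.product
         AND p2.date <= pu.date
     )
ORDER BY pu.date
""").fetchall()

for row in priced:
    print(row)
```

Because the subquery now depends on pu.product and pu.date, each purchase is matched to its own most recent price instead of the single latest date in the whole table.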
You can try something like this, although I am sure there's a better way:
with diffs as (
select
a.*,
b."date" as bdate,
b.price,
b."date" - a."date" as diffdays,
row_number() over (
partition by "user", a."product", a."date"
order by "user", a."product", a."date", b."date" - a."date" desc
) as sr
from purchases a
inner join product_prices b on a.product = b.product
where b."date" - a."date" < 1
)
select
"user" as "name",
product,
"date" as "purchase date",
bdate as "price date",
price
from diffs
where sr = 1
Example: https://www.db-fiddle.com/f/dwQ9EXmp1SdpNpxyV1wc6M/0
Explanation
I attempted to join both tables and find the difference between dates of purchase and price, and ranked them by closest date prior to the purchase. Rank of 1 will go to the closest date. Then, data with rank of 1 was extracted.
This is a great place to use date ranges! We know the start date of the price range and we can use a window function to get the next date. At that point, it's really easy to figure out the price on any day.
with price_ranges as
(select product,
price,
date as price_date,
daterange(date, lead(date, 1)
OVER (partition by product order by date), '[)'
) as valid_price_range from product_prices
)
select "user" as name,
purchases.product,
purchases.date,
price_date,
price
from purchases
join price_ranges on purchases.product = price_ranges.product
and purchases.date <@ price_ranges.valid_price_range
order by purchases.date;

How to include three or more aggregators in a sql query?

I have a table called retail which stores items and their price along with date of purchase. I want to find out total monthly count of unique items sold.
This is the sql query I tried
select date_trunc('month', date) as month, sum(count(distinct(items))) as net_result from retail group by month order by date;
But I get the following error
ERROR: aggregate function calls cannot be nested
I searched for similar Stack Overflow posts, one of which is "postgres aggregate function calls may not be nested", but I am unable to adapt it to create the correct SQL query.
What am I doing wrong?
From your description, it doesn't seem like you need to nest the aggregate functions; the count(distinct items) construction will give you a count of distinct items sold, like so:
select date_trunc('month', date) as month
, count(distinct items) as unique_items_sold
, count(items) as total_items_sold
from retail
group by "month"
order by "month" ;
If you had a column called item_count (say there was one row in the table per item type in a sale, but a sale might include, say, three widgets), you could sum that column instead:
select date_trunc('month', date) as month
, count(distinct items) as unique_items_sold
, sum(item_count) as total_items_sold
from retail
group by "month"
order by "month" ;
Use subqueries:
select month, sum(citems) as net_result
from
(select
date_trunc('month', date) as month,
count(distinct items) as citems
from
retail
group by date_trunc('month', date)
) as monthly
group by month
order by month;
I suspect your GROUP BY will throw an error because month is a computed column alias and cannot be referenced at the same level of the query, so use the full expression instead.
select
month,
sum(disct_item) as net_results
from
(select
date_trunc('month', date) as month,
count(distinct items) as disct_item
from
retail
group by
date_trunc('month', date)) as tbl
group by
month
order by
month;
You cannot nest aggregate functions, so you wrap the first count in a subquery and then apply the sum in the outer query.
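The two-level pattern described above can be tried end to end with a small runnable sketch. It uses Python's built-in sqlite3 in place of PostgreSQL, so strftime('%Y-%m', date) stands in for date_trunc('month', date); the sample rows and the names per_month, monthly and total are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE retail (items TEXT, price REAL, date TEXT);
INSERT INTO retail VALUES
  ('pen', 1.0, '2021-01-05'),
  ('pen', 1.0, '2021-01-20'),
  ('ink', 3.0, '2021-01-21'),
  ('pen', 1.0, '2021-02-02');
""")

# Inner level: one aggregate per month, no nesting needed.
monthly = con.execute("""
SELECT strftime('%Y-%m', date) AS month,
       count(DISTINCT items) AS unique_items
FROM retail
GROUP BY strftime('%Y-%m', date)
ORDER BY month
""").fetchall()

# Outer level: the second aggregate runs over the subquery's result.
total = con.execute("""
SELECT sum(unique_items)
FROM (
  SELECT strftime('%Y-%m', date) AS month,
         count(DISTINCT items) AS unique_items
  FROM retail
  GROUP BY strftime('%Y-%m', date)
) AS per_month
""").fetchone()[0]

print(monthly, total)
```

If only the monthly count of unique items is wanted, the inner query alone is enough; the outer sum is only needed when the per-month counts must be added up.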

How to get the minimum date of same column in DB2

I need to get the order quantity for the minimum ADATE.
I'm using the query below and getting 12 records. Now I want to select the Order_Qty for the minimum ADATE, which is 06-NOV-2018 (2018-11-06).
For every customer (each has multiple records), I need to get the Order_Qty of the row with the minimum ADATE.
select
Customer ,
OrderID ,
LocationID ,
Order_Qty,Sent_date ,ADATE
from
(
select
OrderID ,
LocationID ,
Sent_date ,
Order_Qty ,
Customer ,
TimeStampA
from ARC_TBL
)
obn
inner join
(
select
ADATE ,TimeStampA
from trackTBL snt
)snt
on obn.TimeStampA = snt.TimeStampA
where Customer='ABC' and OrderID='XYZ100' and Sent_date='2018-11-18' and LocationID='250';
SELECT QTY, ADATE
FROM table
ORDER BY ADATE
FETCH FIRST 1 ROW ONLY
Explain your question in more detail and you will get better answers.
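FETCH FIRST 1 ROW ONLY returns a single row overall. If the earliest ADATE is needed for every customer, one common approach is to number each customer's rows by ADATE and keep the first. The sketch below is only an illustration under assumptions: it runs on Python's built-in sqlite3 rather than DB2 (ROW_NUMBER() OVER is available in both), and the orders table and its sample rows are hypothetical stand-ins for the joined ARC_TBL/trackTBL result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (customer TEXT, order_qty INT, adate TEXT);
INSERT INTO orders VALUES
  ('ABC', 40, '2018-11-18'),
  ('ABC', 25, '2018-11-06'),
  ('DEF', 7,  '2018-12-01'),
  ('DEF', 9,  '2018-11-30');
""")

earliest = con.execute("""
SELECT customer, order_qty, adate
FROM (
  SELECT customer, order_qty, adate,
         -- 1 marks the row with the smallest adate for each customer
         ROW_NUMBER() OVER (PARTITION BY customer ORDER BY adate) AS rn
  FROM orders
) AS ranked
WHERE rn = 1
ORDER BY customer
""").fetchall()

for row in earliest:
    print(row)
```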

Cassandra error - Order By only supported when partition key is restricted by EQ or IN

Here is the table I'm creating. It contains information about the players who played in the last World Cup.
CREATE TABLE players (
group text, equipt text, number int, position text, name text,
day int, month int, year int,
club text, liga text, capitan text,
PRIMARY key (name, day, month, year));
I need to run the following query:
Get the names of the 5 oldest players who were captain of their national team.
Here is my query:
SELECT name FROM players WHERE captain='YES' ORDER BY year DESC LIMIT 5;
And I am getting this error:
Order By only supported when partition key is restricted by EQ or IN
I think the problem is in the table I'm creating, but I don't know how to solve it.
Thanks.
Your table definition is incorrect for the query you're trying to run.
You've defined a table with partition key "name", clustering columns "day", "month", "year", and various other columns.
In Cassandra all SELECT queries must specify a partition key with EQ or IN. You're permitted to include some or all of the clustering columns, using the equality and inequality operators you're used to in SQL.
The clustering columns must be included in the order they're defined. An ORDER BY clause can only include clustering columns that aren't already specified by an EQ, again in the order they're defined.
For example, you can write the query
select * from players where name = 'fiticida' and day < 5 order by month desc;
or
select * from players where name = 'fiticida' and day = 10 and month > 2 order by month asc;
but not
select * from players where name = 'fiticida' and year = 2017;
which doesn't include "day" or "month"
and not
select * from players where name = 'fiticida' and day = 5 order by year desc;
which doesn't include "month".
Here is the official documentation on the SELECT query.
To satisfy your query, the table needs
A partition key specified by EQ or IN: "captain" will work
An ORDER BY clause using the leftmost clustering column: put "year" to the left of "month" and "day" in your primary key definition

array_agg group by and null

Given this table:
SELECT * FROM CommodityPricing order by dateField
"SILVER";60.45;"2002-01-01"
"GOLD";130.45;"2002-01-01"
"COPPER";96.45;"2002-01-01"
"SILVER";70.45;"2003-01-01"
"GOLD";140.45;"2003-01-01"
"COPPER";99.45;"2003-01-01"
"GOLD";150.45;"2004-01-01"
"MERCURY";60;"2004-01-01"
"SILVER";80.45;"2004-01-01"
As of 2004, COPPER was dropped and mercury introduced.
How can I get the value of (array_agg(value order by date desc) ) [1] as NULL for COPPER?
select commodity,(array_agg(value order by date desc) ) --[1]
from CommodityPricing
group by commodity
"COPPER";"{99.45,96.45}"
"GOLD";"{150.45,140.45,130.45}"
"MERCURY";"{60}"
"SILVER";"{80.45,70.45,60.45}"
SQL Fiddle
select
commodity,
array_agg(
case when commodity = 'COPPER' then null else price end
order by date desc
)
from CommodityPricing
group by commodity
;
To "pad" missing rows with NULL values in the resulting array, build your query on a full grid of rows and LEFT JOIN the actual values to the grid.
Given this table definition:
CREATE TEMP TABLE price (
commodity text
, value numeric
, ts timestamp -- using ts instead of the inappropriate name date
);
I use generate_series() to get a list of timestamps representing the years and CROSS JOIN to a unique list of all commodities (SELECT DISTINCT ...).
SELECT commodity, (array_agg(value ORDER BY ts DESC)) AS years
FROM generate_series ('2002-01-01 00:00:00'::timestamp
, '2004-01-01 00:00:00'::timestamp
, '1y') t(ts)
CROSS JOIN (SELECT DISTINCT commodity FROM price) c(commodity)
LEFT JOIN price p USING (ts, commodity)
GROUP BY commodity;
Result:
COPPER {NULL,99.45,96.45}
GOLD {150.45,140.45,130.45}
MERCURY {60,NULL,NULL}
SILVER {80.45,70.45,60.45}
SQL Fiddle.
I cast the array to text in the fiddle, because the display sucks and would swallow NULL values otherwise.
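The grid-then-LEFT-JOIN idea can be reproduced in a small runnable sketch. It runs on Python's built-in sqlite3 instead of PostgreSQL, so two substitutions are assumptions of this sketch rather than part of the answer: the list of years comes from SELECT DISTINCT ts instead of generate_series, and the per-commodity arrays are assembled in Python instead of with array_agg.

```python
import sqlite3
from collections import defaultdict

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE price (commodity TEXT, value REAL, ts TEXT)")
con.executemany("INSERT INTO price VALUES (?, ?, ?)", [
    ("SILVER", 60.45, "2002-01-01"), ("GOLD", 130.45, "2002-01-01"),
    ("COPPER", 96.45, "2002-01-01"), ("SILVER", 70.45, "2003-01-01"),
    ("GOLD", 140.45, "2003-01-01"), ("COPPER", 99.45, "2003-01-01"),
    ("GOLD", 150.45, "2004-01-01"), ("MERCURY", 60.0, "2004-01-01"),
    ("SILVER", 80.45, "2004-01-01"),
])

grid = con.execute("""
SELECT c.commodity, p.value
FROM (SELECT DISTINCT ts FROM price) AS t                   -- every year
CROSS JOIN (SELECT DISTINCT commodity FROM price) AS c      -- every commodity
LEFT JOIN price AS p
  ON p.ts = t.ts AND p.commodity = c.commodity              -- NULL where no price exists
ORDER BY c.commodity, t.ts DESC
""").fetchall()

# Collect the values per commodity, newest year first; gaps stay as None.
by_commodity = defaultdict(list)
for commodity, value in grid:
    by_commodity[commodity].append(value)

for commodity, values in by_commodity.items():
    print(commodity, values)
```

Every commodity ends up with exactly one slot per year, so the first element is None for COPPER, which has no 2004 price.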