Why is this nested INNER JOIN not working in POSTGRESQL? - postgresql

I am using the Nothwind data base & working in pgAdmin, and my query is looking like this at the moment
SELECT
TO_CHAR (o.ShippedDate, 'yyyy.MM') AS Month
,o.OrderID
,Total
,SUM (Total) OVER PARTITION BY TO_CHAR (ShippedDate,
‘yyyy.MM’) ORDER BY O.OrderID) AS Running_Total
FROM public.orders O
INNER JOIN (
SELECT OrderID, SUM(Quantity*UnitPrice) AS Total
FROM public.order_details
GROUP BY OrderID
ORDER BY OrderID
) OD ON O.OrderID = OD.OrderID
WHERE
TO_CHAR (o.ShippedDate, 'yyyy.MM') IS NOT NULL
And is is not working, it says:
ERROR: column "o.shippeddate" must appear in the GROUP BY clause or be used in an aggregate function
LINE 2: TO_CHAR (o.ShippedDate, 'yyyy.MM') AS Month
Can you help me out what could be the issue? Thanks!
I fixed the query, so it is now the correct one.

Related

Facing issue in DB2 Sql error with SQLCODE=-181, SQLSTATE=22007, SQLERRMC=0;*N, DRIVER=3.61.75

I am facing an issue while using date formatting in where clause, while the same formatting works fine for another select query.
Working query using following condition in where clause:
select t1.x,t1.y,t2.z
from t1
inner join t2
where
TIMESTAMP(SUBSTR(20||t1.TRANSACTION_DATE,1,4)||'-'||SUBSTR(t1.TRANSACTION_DATE,3,2)||'-'||SUBSTR(t1.TRANSACTION_DATE,5,2)||' '||SUBSTR(t1.TRANSACTION_TIME,1,2)||':'||SUBSTR(t1.TRANSACTION_TIME,3,2)||':'||SUBSTR(t1.TRANSACTION_TIME,5,2))
BETWEEN '2018-06-01 00:00:00' AND '2018-06-18 12:01:00';
When the same query is used for t1 table and t3 table like:
select t1.x,t1.y,t3.z
from t1
inner join t3
where
TIMESTAMP(SUBSTR(20||t1.TRANSACTION_DATE,1,4)||'-'||SUBSTR(t1.TRANSACTION_DATE,3,2)||'-'||SUBSTR(t1.TRANSACTION_DATE,5,2)||' '||SUBSTR(t1.TRANSACTION_TIME,1,2)||':'||SUBSTR(t1.TRANSACTION_TIME,3,2)||':'||SUBSTR(t1.TRANSACTION_TIME,5,2))
BETWEEN '2018-06-01 00:00:00' AND '2018-06-18 12:01:00';
It does not work for the timestamp part.
Note: Transaction_date value is in '180618' format(yymmdd) in the table t1. Also the transaction_time is in 123030(hhmmss) format
Your Timestamp values have an error. You are trying to calculate the timestamp of 201806-06-18 12:30:30. That just won't work.
Change SUBSTR(20||t1.TRANSACTION_DATE,1,4) to SUBSTR(20||t1.TRANSACTION_DATE,1,2) in each query.
or you could replace that whole long substring with
timestamp_format(digits(t1.transaction_date) || digits(t1.transaction_time), 'YYMMDDHH24MISS')

Postgres : Need distinct records count

I have a table with duplicate entries and the objective is to get the distinct entries based on the latest time stamp.
In my case 'serial_no' will have duplicate entries but I select unique entries based on the latest time stamp.
Below query is giving me the unique results with the latest time stamp.
But my concern is I need to get the total of unique entries.
For example assume my table has 40 entries overall. With the below query I am able to get 20 unique rows based on the serial number.
But the 'total' is returned as 40 instead of 20.
Any help on this pls?
SELECT
*
FROM
(
SELECT
DISTINCT ON (serial_no) id,
serial_no,
name,
timestamp,
COUNT(*) OVER() as total
FROM
product_info
INNER JOIN my.account ON id = accountid
WHERE
lower(name) = 'hello'
ORDER BY
serial_no,
timestamp DESC OFFSET 0
LIMIT
10
) AS my_info
ORDER BY
serial_no asc
product_info table intially has this data
serial_no name timestamp
11212 pulp12 2018-06-01 20:00:01
11213 mango 2018-06-01 17:00:01
11214 grapes 2018-06-02 04:00:01
11215 orange 2018-06-02 07:05:30
11212 pulp12 2018-06-03 14:00:01
11213 mango 2018-06-03 13:00:00
After the distict query I got all unique results based on the latest
timestamp:
serial_no name timestamp total
11212 pulp12 2018-06-03 14:00:01 6
11213 mango 2018-06-03 13:00:00 6
11214 grapes 2018-06-02 04:00:01 6
11215 orange 2018-06-02 07:05:30 6
But total is appearing as 6 . I wanted the total to be 4 since it has
only 4 unique entries.
I am not sure how to modify my existing query to get this desired
result.
Postgres supports COUNT(DISTINCT column_name), so if I have understood your request, using that instead of COUNT(*) will work, and you can drop the OVER.
What you could do is move the window function to a higher level select statement. This is because window function is evaluated before distinct on and limit clauses are applied. Also, you can not include DISTINCT keyword within window functions - it has not been implemented yet (as of Postgres 9.6).
SELECT
*,
COUNT(*) OVER() as total -- here
FROM
(
SELECT
DISTINCT ON (serial_no) id,
serial_no,
name,
timestamp
FROM
product_info
INNER JOIN my.account ON id = accountid
WHERE
lower(name) = 'hello'
ORDER BY
serial_no,
timestamp DESC
LIMIT
10
) AS my_info
Additionally, offset is not required there and one more sorting is also superfluous. I've removed these.
Another way would be to include a computed column in the select clause but this would not be as fast as it would require one more scan of the table. This is obviously assuming that your total is strictly connected to your resultset and not what's beyond that being stored in the table, but gets filtered out.
select count(*), serial_no from product_info group by serial_no
will give you the number of duplicates for each serial number
The most mindless way of incorporating that information would be to join in a sub query
SELECT
*
FROM
(
SELECT
DISTINCT ON (serial_no) id,
serial_no,
name,
timestamp,
COUNT(*) OVER() as total
FROM
product_info
INNER JOIN my.account ON id = accountid
WHERE
lower(name) = 'hello'
ORDER BY
serial_no,
timestamp DESC OFFSET 0
LIMIT
10
) AS my_info
join (select count(*) as counts, serial_no from product_info group by serial_no) as X
on X.serial_no = my_info.serial_no
ORDER BY
serial_no asc

Unable to get Percentile_Cont() to work in Postgresql

I am trying to calculate a percentile using the percentile_cont() function in PostgreSQL using common table expressions. The goal is find the top 1% of accounts regards to their balances (called amount here). My logic is to find the 99th percentile which will return those whose account balances are greater than 99% of their peers (and thus finding the 1 percenters)
Here is my query
--ranking subquery works fine
with ranking as(
select a.lname,sum(c.amount) as networth from customer a
inner join
account b on a.customerid=b.customerid
inner join
transaction c on b.accountid=c.accountid
group by a.lname order by sum(c.amount)
)
select lname, networth, percentile_cont(0.99) within group
order by networth over (partition by lname) from ranking ;
I keeping getting the following error.
ERROR: syntax error at or near "order"
LINE 2: ...ame, networth, percentile_cont(0.99) within group order by n..
I am thinking that perhaps I forgot a closing brace etc. but I can't seem to figure out where. I know it could be something with the order keyword but I am not sure what to do. Can you please help me to fix this error?
This tripped me up, too.
It turns out percentile_cont is not supported in postgres 9.3, only in 9.4+.
https://www.postgresql.org/docs/9.4/static/release-9-4.html
So you have to use something like this:
with ordered_purchases as (
select
price,
row_number() over (order by price) as row_id,
(select count(1) from purchases) as ct
from purchases
)
select avg(price) as median
from ordered_purchases
where row_id between ct/2.0 and ct/2.0 + 1
That query care of https://www.periscopedata.com/blog/medians-in-sql (section: "Median on Postgres")
You are missing the brackets in the within group (order by x) part.
Try this:
with ranking
as (
select a.lname,
sum(c.amount) as networth
from customer a
inner join account b on a.customerid = b.customerid
inner join transaction c on b.accountid = c.accountid
group by a.lname
order by networth
)
select lname,
networth,
percentile_cont(0.99) within group (
order by networth
) over (partition by lname)
from ranking;
I want to point out that you don't need a subquery for this:
select c.lname, sum(t.amount) as networth,
percentile_cont(0.99) within group (order by sum(t.amount)) over (partition by lname)
from customer c inner join
account a
on c.customerid = a.customerid inner join
transaction t
on a.accountid = t.accountid
group by c.lname
order by networth;
Also, when using table aliases (which should be always), table abbreviations are much easier to follow than arbitrary letters.

Creating 'Empty' Records for Days of the Month Without Records

I have a very simpl postgres (9.3) query that looks like this:
SELECT a.date, b.status
FROM sis.table_a a
JOIN sis.table_b b ON a.thing_id = b.thing_id
WHERE EXTRACT(MONTH FROM a.date) = 06
AND EXTRACT(YEAR FROM a.date) = 2015
Some days of the month of June do not exist in table_a and thus are obviously not joined to table_b. What is the best way to create records for these not represented days and assign a placeholder (e.g. 'EMPTY') to their 'status' column? Is this even possible to do using pure SQL?
Basically, you need LEFT JOIN and it looks like you also need generate_series() to provide the full set of days:
SELECT d.date
, a.date IS NOT NULL AS a_exists
, COALESCE(b.status, 'status_missing') AS status
FROM (
SELECT date::date
FROM generate_series('2015-06-01'::date
, '2015-06-30'::date
, interval '1 day') date
) d
LEFT JOIN sis.table_a a USING (date)
LEFT JOIN sis.table_b b USING (thing_id)
ORDER BY 1;
Use sargable WHERE conditions. What you had cannot use a plain index on date and has to default to a much more expensive sequential scan. (There are no more WHERE conditions in my final query.)
Aside: don't use the basic type name (and reserved word in standard SQL) date as identifier.
Related (2nd chapter):
PostgreSQL: running count of rows for a query 'by minute'

array_agg group by and null

Given this table:
SELECT * FROM CommodityPricing order by dateField
"SILVER";60.45;"2002-01-01"
"GOLD";130.45;"2002-01-01"
"COPPER";96.45;"2002-01-01"
"SILVER";70.45;"2003-01-01"
"GOLD";140.45;"2003-01-01"
"COPPER";99.45;"2003-01-01"
"GOLD";150.45;"2004-01-01"
"MERCURY";60;"2004-01-01"
"SILVER";80.45;"2004-01-01"
As of 2004, COPPER was dropped and mercury introduced.
How can I get the value of (array_agg(value order by date desc) ) [1] as NULL for COPPER?
select commodity,(array_agg(value order by date desc) ) --[1]
from CommodityPricing
group by commodity
"COPPER";"{99.45,96.45}"
"GOLD";"{150.45,140.45,130.45}"
"MERCURY";"{60}"
"SILVER";"{80.45,70.45,60.45}"
SQL Fiddle
select
commodity,
array_agg(
case when commodity = 'COPPER' then null else price end
order by date desc
)
from CommodityPricing
group by commodity
;
To "pad" missing rows with NULL values in the resulting array, build your query on full grid of rows and LEFT JOIN actual values to the grid.
Given this table definition:
CREATE TEMP TABLE price (
commodity text
, value numeric
, ts timestamp -- using ts instead of the inappropriate name date
);
I use generate_series() to get a list of timestamps representing the years and CROSS JOIN to a unique list of all commodities (SELECT DISTINCT ...).
SELECT commodity, (array_agg(value ORDER BY ts DESC)) AS years
FROM generate_series ('2002-01-01 00:00:00'::timestamp
, '2004-01-01 00:00:00'::timestamp
, '1y') t(ts)
CROSS JOIN (SELECT DISTINCT commodity FROM price) c(commodity)
LEFT JOIN price p USING (ts, commodity)
GROUP BY commodity;
Result:
COPPER {NULL,99.45,96.45}
GOLD {150.45,140.45,130.45}
MERCURY {60,NULL,NULL}
SILVER {80.45,70.45,60.45}
SQL Fiddle.
I cast the array to text in the fiddle, because the display sucks and would swallow NULL values otherwise.