Get the first live record on each quarter - postgresql

We have a pricing table, and I need to get the first live record in each quarter. The table structure is:
record_id (int)
start_date (date)
price (decimal)
live (boolean)
I need to be able to get the first "live" record in each quarter.
So far, I've been able to do this:
SELECT EXTRACT(QUARTER FROM start_date::TIMESTAMP) AS quarter,
       EXTRACT(YEAR FROM start_date::TIMESTAMP) AS year,
       record_instance_uid,
       start_date,
       live
FROM record_pricing rp
GROUP BY year, quarter, record_instance_uid, start_date, live
ORDER BY year, quarter;
I get this (result screenshot omitted):
As you can see, there are both live and non-live records in the results; I just need the first live record in each quarter, as was highlighted in the screenshot.

You can use ROW_NUMBER(). Two things to note: the live filter has to be applied before ranking, and the ranking query has to be wrapped in a derived table, because a window-function alias cannot be referenced in the WHERE clause of the same SELECT level:
SELECT *
FROM (
    SELECT EXTRACT(QUARTER FROM start_date::TIMESTAMP) AS quarter,
           EXTRACT(YEAR FROM start_date::TIMESTAMP) AS year,
           record_instance_uid, live, start_date,
           ROW_NUMBER() OVER (PARTITION BY EXTRACT(YEAR FROM start_date::TIMESTAMP),
                                           EXTRACT(QUARTER FROM start_date::TIMESTAMP)
                              ORDER BY start_date ASC) AS rnk
    FROM record_pricing rp
    WHERE live
) tab
WHERE tab.rnk = 1;
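To sanity-check the approach, here is a minimal sketch of the same filter-then-rank idea using Python's built-in sqlite3, so it runs without a PostgreSQL server. The sample data is invented, and SQLite's strftime stands in for PostgreSQL's EXTRACT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record_pricing (
    record_id  INTEGER,
    start_date TEXT,     -- ISO date string
    price      REAL,
    live       INTEGER   -- 0/1 standing in for boolean
);
INSERT INTO record_pricing VALUES
    (1, '2023-01-05',  9.99, 0),  -- not live, must be skipped
    (2, '2023-01-10', 10.99, 1),  -- first live record of 2023-Q1
    (3, '2023-02-01', 11.99, 1),
    (4, '2023-04-03', 12.99, 1),  -- first live record of 2023-Q2
    (5, '2023-05-01', 13.99, 1);
""")

rows = conn.execute("""
SELECT year, quarter, record_id, start_date, price FROM (
    SELECT *,
           CAST(strftime('%Y', start_date) AS INTEGER) AS year,
           (CAST(strftime('%m', start_date) AS INTEGER) + 2) / 3 AS quarter,
           ROW_NUMBER() OVER (
               PARTITION BY strftime('%Y', start_date),
                            (CAST(strftime('%m', start_date) AS INTEGER) + 2) / 3
               ORDER BY start_date
           ) AS rn
    FROM record_pricing
    WHERE live = 1            -- keep only live records *before* ranking
) ranked
WHERE rn = 1
ORDER BY year, quarter
""").fetchall()

for row in rows:
    print(row)
```

Because the WHERE live filter runs before the window function, the non-live January 5th row never competes for rank 1.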

Related

postgresql get weekly average of cases with daily data

I have a table called Table1. I am trying to get the weekly average, but I only have daily data. My table contains the following attributes: caseID, date, status, and some other (irrelevant) attributes. With the following query, I made the following table, which comes close to what I want:
However, I would like to add an average per week of the number of cases. I have looked everywhere, but I am not sure how to include that. Does anybody have any clues on how to add that?
Thanks.
To expand on @Luuk's answer...
SELECT
    date,
    COUNT(id) AS countcase,
    EXTRACT(WEEK FROM date) AS weeknbr,
    AVG(COUNT(id)) OVER (PARTITION BY EXTRACT(WEEK FROM date)) AS weeklyavg
FROM table1
GROUP BY date, weeknbr
ORDER BY date, weeknbr;
This is possible as the Aggregation / GROUP BY is applied before the window/analytic function.
select
    date,
    countcase,
    extract(week from date) as weeknbr,
    avg(countcase) over (partition by extract(week from date)) as weeklyavg
from (
    select date, count(id) as countcase
    from table1
    group by date
) daily;
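As a quick check of the window-over-daily-counts idea, here is a sketch using Python's sqlite3 with invented sample data; SQLite's strftime('%W', ...) plays the role of EXTRACT(WEEK FROM ...), and the daily counts are pre-aggregated in a subquery:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, date TEXT);
INSERT INTO table1 VALUES
    (1, '2023-01-02'), (2, '2023-01-02'),  -- week 1: 2 cases this day
    (3, '2023-01-03'),                     -- week 1: 1 case this day
    (4, '2023-01-09'), (5, '2023-01-09'),
    (6, '2023-01-09');                     -- week 2: 3 cases this day
""")

rows = conn.execute("""
SELECT date,
       countcase,
       weeknbr,
       AVG(countcase) OVER (PARTITION BY weeknbr) AS weeklyavg
FROM (
    -- daily counts, exactly what the GROUP BY date produces
    SELECT date, COUNT(id) AS countcase, strftime('%W', date) AS weeknbr
    FROM table1
    GROUP BY date
) daily
ORDER BY date
""").fetchall()

for row in rows:
    print(row)
```

The first week has daily counts of 2 and 1, so its weekly average is 1.5 on both rows; the second week has a single day with 3 cases, so its average is 3.0.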

HQL: Max date of previous month

Good morning,
I have a problem I've been trying to solve but am getting nowhere.
I need to find the max date of the previous month. Normally I would just use the following to find the last day of the previous month: last_day(add_months(current_date, -1))
However, this particular data set doesn't always have the last day with data. E.g. the last day in the data for May was May 30th. Obviously, if I try using the syntax above it would return no data, because it would be looking for 5/31.
So is there a way to find the "max" day available in the data of the previous month? Or the month prior etc.?
For example like this (two scans of the table: one in a subquery to find the max date, and one in the main query). Note that Hive has no first_day() function; trunc(..., 'MM') returns the first day of the month:
select *
from mytable
where as_of_date in (select max(as_of_date)
                     from mytable
                     where as_of_date between trunc(add_months(current_date, -1), 'MM')
                                          and last_day(add_months(current_date, -1)));
Or (single scan + analytic function) like this:
select col1 ... colN
from
(
    select t.*, rank() over (partition by month(t.as_of_date) order by t.as_of_date desc) rnk
    from mytable t
    where --If you have a partition on date, this WHERE may improve performance
          t.as_of_date between trunc(add_months(current_date, -1), 'MM')
                           and last_day(add_months(current_date, -1))
) s
where rnk = 1
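The single-scan rank() variant can be sketched with Python's sqlite3, since Hive isn't runnable here; the table, its sample data, and the hard-coded May date range (standing in for the add_months/last_day arithmetic) are all assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (as_of_date TEXT, val INTEGER);
INSERT INTO mytable VALUES
    ('2023-05-15', 1),
    ('2023-05-30', 2),   -- the max date present in May (no 5/31 row)
    ('2023-05-30', 3),
    ('2023-06-01', 4);
""")

# Pretend "today" is in June 2023, so the previous month is May.
rows = conn.execute("""
SELECT as_of_date, val FROM (
    SELECT t.*,
           RANK() OVER (ORDER BY as_of_date DESC) AS rnk
    FROM mytable t
    WHERE as_of_date >= '2023-05-01' AND as_of_date < '2023-06-01'
) s
WHERE rnk = 1
""").fetchall()

for row in rows:
    print(row)
```

RANK() (rather than ROW_NUMBER()) keeps every row sharing the max date, which matches the "all rows as of the latest available date" behavior of the Hive answer.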

Condition and max reference in redshift window function

I have a list of dates, accounts, and sources of data. I'm taking the latest max date for each account and using that number in my window reference.
In my window reference, I'm using row_number () to assign unique rows to each account and sources of data that we're receiving and sorting it by the max date for each account and source of data. The end result should list out one row for each unique account + source of data combination, with the max date available in that combination. The record with the highest date will have 1 listed.
I'm trying to set a condition on my window function where only rows that populate with 1 are listed in the query, while the other ones are not shown at all. This is what I have below and where I get stuck:
SELECT
    date,
    account,
    data_source,
    MAX(date) max_date,
    ROW_NUMBER() OVER (PARTITION BY account ORDER BY max_date) rownum
FROM table
GROUP BY
    date,
    account,
    data_source
Any help is greatly appreciated. I can elaborate on anything if necessary
If I understood your question correctly, this SQL should do the trick. Note the DESC in the window's ORDER BY (so the most recent date gets row number 1, per your requirement), the alias on the derived table (which Redshift requires), and that the outer query just passes max_date through rather than re-aggregating:
SELECT
    date,
    account,
    data_source,
    max_date
FROM (
    SELECT
        date,
        account,
        data_source,
        MAX(date) max_date,
        ROW_NUMBER() OVER (PARTITION BY account ORDER BY MAX(date) DESC) rownum
    FROM table
    GROUP BY
        date,
        account,
        data_source
) t
WHERE rownum = 1
If you do not need the row number for anything other than uniqueness then a query like this should work:
select distinct t.account, data_source, date
from table t
join (select account, max(date) max_date from table group by account) m
on t.account=m.account and t.date=m.max_date
This can still generate two records for one account if two records for different data sources have the identical date. If that is a possibility then mdem7's approach is probably best.
It's a bit unclear from the question, but if you want each combination of account and data_source with its max date, making sure there are no duplicates, then GROUP BY alone is enough (a DISTINCT would be redundant):
select account, data_source, max(date) max_date
from table t
group by account, data_source
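Putting the pieces together, here is a small runnable sketch of the partition-by-account-and-source, order-by-date-descending approach, using Python's sqlite3 with invented table and sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (date TEXT, account TEXT, data_source TEXT);
INSERT INTO events VALUES
    ('2023-01-01', 'acct1', 'api'),
    ('2023-03-01', 'acct1', 'api'),   -- latest 'api' row for acct1
    ('2023-02-01', 'acct1', 'csv'),   -- latest 'csv' row for acct1
    ('2023-01-15', 'acct2', 'api');
""")

rows = conn.execute("""
SELECT account, data_source, date AS max_date FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY account, data_source
               ORDER BY date DESC      -- DESC: newest row gets rn = 1
           ) AS rn
    FROM events
) t
WHERE rn = 1
ORDER BY account, data_source
""").fetchall()

for row in rows:
    print(row)
```

Each account + data_source pair comes back exactly once, carrying its most recent date, which is the result the question describes.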

Specific number of quarters from date using HiveQL

I am trying to bring a specific number (8) of quarters using a transaction date from the table. The date format is YYYYMMDD.
I could write a select using CASE to display a specific quarter depending on the current month.
I could find the beginning of the month using the trunc function, but could not find the logic to bring the last 8 quarters of data.
Convert the date to Hive date format first.
Then use DENSE_RANK() to number rows by quarter (ordering by year desc and quarter desc), and filter by rnk <= 8:
select * from
(
    --calculate DENSE_RANK
    select s.*, dense_rank() over(order by year(your_date) desc, quarter(your_date) desc) as rnk
    from
    (
        --parse the YYYYMMDD string column (here called date_col) into yyyy-MM-dd format
        select t.*, from_unixtime(unix_timestamp(t.date_col, 'yyyyMMdd'), 'yyyy-MM-dd') your_date
        from table_name t
        --Also restrict your dataset here to only the last few years,
        --because you do not need to scan all the data,
        --so add a WHERE clause here
    ) s
) d
where rnk <= 8;
See the manual on functions here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
You can optimize this query, knowing your data and how your table is partitioned, to restrict the dataset. Also add a PARTITION BY clause to the OVER() if you need the last 8 quarters for each key.
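The same convert-then-dense-rank pattern can be sketched with Python's sqlite3, parsing the YYYYMMDD strings with substr instead of Hive's unix_timestamp, and keeping only the last two quarters for brevity (the question's rnk <= 8 works identically); table and column names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE txn (raw_date TEXT);   -- YYYYMMDD strings, as in the question
INSERT INTO txn VALUES
    ('20221115'),                   -- 2022 Q4: dropped by the filter
    ('20230110'), ('20230220'),     -- 2023 Q1
    ('20230405');                   -- 2023 Q2
""")

rows = conn.execute("""
SELECT raw_date, year, quarter FROM (
    SELECT s.*,
           DENSE_RANK() OVER (ORDER BY year DESC, quarter DESC) AS rnk
    FROM (
        -- parse YYYYMMDD by position instead of Hive's unix_timestamp()
        SELECT raw_date,
               CAST(substr(raw_date, 1, 4) AS INTEGER) AS year,
               (CAST(substr(raw_date, 5, 2) AS INTEGER) + 2) / 3 AS quarter
        FROM txn
    ) s
) d
WHERE rnk <= 2                      -- the question would use rnk <= 8
ORDER BY raw_date
""").fetchall()

for row in rows:
    print(row)
```

DENSE_RANK() gives every row in the same (year, quarter) the same rank with no gaps, which is why the filter keeps whole quarters rather than a fixed number of rows.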

PostgreSQL subquery not working

What's wrong with this query?
select extract(week from created_at) as week,
count(*) as received,
(select count(*) from bugs where extract(week from updated_at) = a.week) as done
from bugs as a
group by week
The error message is:
column a.week does not exist
UPDATE:
following the suggestion of the first comment, I tried this:
select a.extract(week from created_at) as week,
count(*) as received, (select count(*)
from bugs
where extract(week from updated_at) = a.week) as done from bugs as a group by week
But it doesn't seem to work:
ERROR: syntax error at or near "from"
LINE 1: select a.extract(week from a.created_at) as week, count(*) a...
As far as I can tell you don't need the sub-select at all:
select extract(week from created_at) as week,
count(*) as received,
sum( case when extract(week from updated_at) = extract(week from created_at) then 1 end) as done
from bugs
group by week
This counts all bugs per week and counts those that are updated in the same week as "done".
Note that your query will only report correct values if you never have more than one year in your table.
If you have more than one year of data in the table you need to include the year in the comparison as well:
select to_char(created_at, 'iyyy-iw') as week,
count(*) as received,
sum( case when to_char(created_at, 'iyyy-iw') = to_char(updated_at, 'iyyy-iw') then 1 end) as done
from bugs
group by week
Note that I used IYYY and IW to cater for the ISO definitions of the year and the week around the year end/start.
Maybe a little explanation on why your original query did not work would be helpful:
The "outer" query uses two aliases
a table alias for bugs named a
a column alias for the expression extract(week from created_at) named week
The only place where the column alias week can be used is in the group by clause.
To the sub-select (select count(*) from bugs where extract(week from updated_at) = a.week) the alias a is visible, but not the alias week (that's how the SQL standard is defined).
To get your subselect working (in terms of column visibility) you would need to reference the full expression of the "outer" column:
(select count(*) from bugs b where extract(week from b.updated_at) = extract(week from a.created_at))
Note that I introduced another table alias b in order to make it clear which column stems from which alias.
But even then you'd have a problem with the grouping as you can't reference an ungrouped column like that.
This could work as well:
with origin as (
    select extract(week from created_at) as week, count(*) as received
    from bugs
    group by week
)
select week, received,
       (select count(*) from bugs where origin.week = extract(week from updated_at)) as done
from origin;
It should perform well.
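For a concrete check of the conditional-count idea, here is a sketch in Python's sqlite3 with invented sample bugs; strftime('%W', ...) plays the role of extract(week from ...):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bugs (created_at TEXT, updated_at TEXT);
INSERT INTO bugs VALUES
    ('2023-01-02', '2023-01-03'),   -- created and updated in the same week
    ('2023-01-03', '2023-01-10'),   -- updated the following week
    ('2023-01-09', '2023-01-09');
""")

rows = conn.execute("""
SELECT strftime('%W', created_at) AS week,
       COUNT(*) AS received,
       SUM(CASE WHEN strftime('%W', updated_at) = strftime('%W', created_at)
                THEN 1 ELSE 0 END) AS done
FROM bugs
GROUP BY week
ORDER BY week
""").fetchall()

for row in rows:
    print(row)
```

Week '01' received two bugs but only one was updated within the same week, so received = 2 and done = 1; no correlated sub-select is needed.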