Condition and max reference in redshift window function - amazon-redshift

I have a list of dates, accounts, and sources of data. I'm taking the latest max date for each account and using that number in my window reference.
In my window reference, I'm using row_number () to assign unique rows to each account and sources of data that we're receiving and sorting it by the max date for each account and source of data. The end result should list out one row for each unique account + source of data combination, with the max date available in that combination. The record with the highest date will have 1 listed.
I'm trying to set a condition on my window function where only rows that populate with 1 are listed in the query, while the other ones are not shown at all. This is what I have below and where I get stuck:
SELECT
date,
account,
data source,
MAX(date) max_date,
ROW_NUMBER () OVER (PARTITION BY account ORDER BY max_date) ROWNUM
FROM table
GROUP BY
date,
account,
data source
Any help is greatly appreciated. I can elaborate on anything if necessary

If I understood your question correctly this SQL would do the trick
SELECT
date,
account,
data source,
MAX(date) max_date
FROM (
SELECT
date,
account,
data source,
MAX(date) max_date,
ROW_NUMBER () OVER (PARTITION BY account ORDER BY max_date) ROWNUM
FROM table
GROUP BY
date,
account,
data source
)
where ROWNUM = 1

If you do not need the row number for anything other than uniqueness then a query like this should work:
select distinct t.account, data_source, date
from table t
join (select account, max(date) max_date from table group by account) m
on t.account=m.account and t.date=m.max_date
This can still generate two records for one account if two records for different data sources have the identical date. If that is a possibility then mdem7's approach is probably best.
It's a bit unclear from the question but if you want each combination of account and data_source with its max date making sure there are no duplicates, then distinct should be enough:
select distinct account, data_source, max(date) max_date
from table t
group by account, data_source

Related

How to get latest data for a column when using grouping in postgres

I am using postgres alongside sequelize. I have encountered a case where I need to write a coustom query which groups the records are a particular field. I know for the remaning columns that are not used for grouping, I need to use a aggregate function like SUM. But the problem is that for some columns I need to get the one what is the latest one (DESC sorted by created_at). I see no function in sql to do so. Is my only option to write subqueries or is there a better way? Thanks?
For better understanding, If you look at the below picture, I want the group the records with address. So after the query there should only be two records, one with sydney and the other with new york. But when it comes to the distance, I want the result of the query to contain the distance form the row that was most recently created, i.e with the latest created_at.
so the final two query results should be:
sydney 100 2022-09-05 18:14:53.492131+05:45
new york 40 2022-09-05 18:14:46.23328+05:45
select address, distance, created_at
from(
select address, distance, created_at, row_number() over(partition by address order by created_at DESC) as rn
from table) x
where rn = 1

How to extend dynamic schema with views in Hasura and Postgres?

So I am trying and struggling for few days to extend the schema with the custom groupby using something like this
I have a table with few fields like id, country, ip, created_at.
Then I am trying to get them as groups. For example, group the data based on date, hourly of date, or based on country, and based on country with DISTINCT ip.
I am zero with SQLs honestly. But I tried to play around and get what I want. Here's an example.
SELECT Hour(created_at) AS date,
COUNT(*) AS count
FROM session where CAST(created_at AS date) = '2021-04-05'
GROUP BY Hour(created_at)
ORDER BY date;
SELECT country,
count(*) AS count from (SELECT * FROM session where CAST(created_at AS date) <= '2021-05-12' GROUP BY created_at) AS T1
GROUP BY country;
SELECT country, COUNT(*) as count
FROM (SELECT DISTINCT ip, country FROM session) AS T1
GROUP BY country;
SELECT DATE(created_at) AS date,
COUNT(*) AS count
FROM session
GROUP BY DATE(created_at)
ORDER BY date;
Now I am struggling with two things.
How do I make the date as variables? I mean, if I want to group them for a particular date range/ or today's data hourly, or per quarter gap (more of configurable), how do I add the variables in Hasura's Raw SQL?
Also for this approach I have to add schema for each one of them? Like this
CREATE
OR REPLACE VIEW "public"."unique_session_counts_date" AS
SELECT
date(session.created_at) AS date,
count(*) AS count
FROM
session
GROUP BY
(date(session.created_at))
ORDER BY
(date(session.created_at));
Is there a way to make it more generalized? What I mean is, if it
was in Nodejs I could have done something like
return rawQuery(
`
select ${field} x, count(*) y
from ${table}
where website_id=$1
and created_at between $2 and $3
${domainFilter}
${urlFilter}
group by 1
order by 2 desc
`,
params,
);
In this case, based on whatever field and where clause I send, one query would do the trick for me. Can do something similar in hasura?
Thank you so much in advance.
How do I make the date as variables? I mean, if I want to group them for a particular date range/ or today's data hourly, or per quarter gap (more of configurable), how do I add the variables in Hasura's Raw SQL?
My first thought is this. If you're thinking about passing in variables via a GraphQL for example, the GraphQL would look something like:
query MyQuery {
unique_session_counts_date(where: {created_at: {_gte: "<start date here>", _lte: "<end date here>"}}) {
<...any fields, rollups, etc here...>
}
}
The underlying view/query would follow the group by and order by that you've detailed. Then you'd be able to submit a query of the graphql query and just pass in the pertinent parameters like the $1, $2, and $3 in the raqQuery call.
Also for this approach I have to add schema for each one of them?
The schema? The view? I don't think a view specifically would be required, if a multilevel select or similar query can handle it and perform then a view wouldn't particularly be needed.
That's my first stab at the problem. I'm going to try to work through this problem in a few hours via a Twitch stream # HasuraHQ if you can join, happy to walk through it live.

SQL SSRS aggregate fuctions

I am trying to figure out the aggregate functions in SQL SSRS to give me to sum of total sales for the given information by YEAR. I need to combine the year, the months within that year and provide the total sum of sales for that year. For example: for 2018 I need to combine month's 2-12 and provide the total sum, for 2019 combine 1-12 and provide total sum and so on.
enter image description here
I'm not sure where to begin on this one as I am new to SQL SSRS. Any help would be appreciated!
UPDATE:
Ideally I want this to be the end result:
id Year Price
102140 2019 ($XXXXX.XX)
102140 2018 ($XXXXX.XX)
102140 2017 ($XXXXX.XX)
And so on.
your query:
Select customer_id
, year_ordered
--, month_ordered
--, extended_price
--, SUM(extended_price) OVER (PARTITION BY year_ordered) AS year_total
, SUM(extended_price) AS year_total
From customer_order_history
Where customer_id = '101646'
Group By
customer_id
, year_ordered
, extended_price
--, month_ordered
Provides this:
enter image description here
multiple "years_ordered" because it is still using each month and that months SUM of price.
There are two approaches.
Do this in your dataset query:
SELECT Customer_id, year_ordered, SUM(extended_price) AS Price
FROM myTable
GROUP BY Customer_id, year_ordered
This option is best when you will never need the month values themselves in the report (i.e. you don't intend to have a drill down to the month data)
Do this in SSRS
By default you will get a RowGroup called "Details" (look under the main design area and you will row groups and column groups).
You can right-click this and add grouping for both customer_id and year_ordered. You can then change the extended_price textbox's value property to =SUM(Fields!extended_price.Value)
You could use a window function in your SQL:
select [year], [month], [price], SUM(PRICE) OVER (PARTITION BY year) as yearTotal
from myTable

Extract amount for the minimum date in Postgres

Very basic: I have a table with dates, account and amount done by a particular account on that date. I am stuck on a very basic problem - get the amount for the minimum date per account.
Input:
Desired:
If I do the query below it obviously returns the grouping by the amount, too.
SELECT account_ref AS account_alias,
Min(timestamp_made) AS reg_date,
amount
FROM stg_payment_mysql
GROUP BY account_ref,
amount
Perhaps the most performant way to do this would be to use DISTINCT ON:
SELECT DISTINCT ON (account) account, date, amount
FROM stg_payment_mysql
ORDER BY account, date;
A more general ANSI SQL approach to this would use ROW_NUMBER:
WITH cte AS (
SELECT account, date, amount,
ROW_NUMBER() OVER (PARTITION BY account ORDER BY date) rn
FROM stg_payment_mysql
)
SELECT account, date, amount
FROM cte
WHERE rn = 1
ORDER BY account;

Calculate previous order date and status in Postgres

I have a simple table of orders, and I need to calculate some stats for each order. Essentially I have a Postgres db with fields:
Order_ID (unique), User_ID, Created_at (date), City, Total
I want to write a query that will generate, for each Order_ID:
1) the Created_at date of the user's most recent order prior to the current Order_ID (so if a customer placed order with Order_ID=200005b on 9/20/14, what is the date of that user's most recent previous order?)
2) another field showing a user's "Status" based on this date, given the following cases:
-- if this is user's first order, Status="new";
-- if most recent previous order date <= 60 days before the given/current order, Status="active";
-- if most recent previous order date > 60 days before the given/current order, Status="reactivated"
I think there's a way to write this query using some nested SELECTS, and maybe a self-join, but I don't know PostgreSQL well enough to understand the ordering of queries. I have been able to generate an "Order_N" field using the following query that I could use to lookup (Order_N)-1 to find the date, but I get stuck once trying to use that in nesting.
SELECT
user_id,
order_id,
created_at,
row_number() over (partition by user_id order by created_at ) as order_n
order by user_id, created_at;
Does anyone have any ideas?