I have the following query code:
query = """
with double_entry_book as (
SELECT to_address as address, value as value
FROM `bigquery-public-data.crypto_ethereum.traces`
WHERE to_address is not null
AND block_timestamp < '2022-01-01 00:00:00'
AND status = 1
AND (call_type not in ('delegatecall', 'callcode', 'staticcall') or call_type is null)
union all
-- credits
SELECT from_address as address, -value as value
FROM `bigquery-public-data.crypto_ethereum.traces`
WHERE from_address is not null
AND block_timestamp < '2022-01-01 00:00:00'
AND status = 1
AND (call_type not in ('delegatecall', 'callcode', 'staticcall') or call_type is null)
)
SELECT address,
sum(value) / 1000000000000000000 as balance
from double_entry_book
group by address
order by balance desc
LIMIT 15000000
"""
In the last part, I want to drop rows where "balance" is less than, say, 0.02, and then group, order, etc. I imagine this should be simple to do. Any help will be appreciated!
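One common way to filter on an aggregated value such as balance is a HAVING clause, which is applied after GROUP BY. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for BigQuery. The tiny traces table, its rows, and the values already expressed in whole units (no division by 10^18) are assumptions made purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified stand-in for the public traces table.
conn.execute("CREATE TABLE traces (from_address TEXT, to_address TEXT, value REAL)")
conn.executemany("INSERT INTO traces VALUES (?, ?, ?)", [
    (None, "a", 5.0),   # "a" receives 5
    ("a", "b", 1.0),    # "a" sends 1 to "b"
    ("a", "c", 0.01),   # "a" sends 0.01 to "c"
])

rows = conn.execute("""
    WITH double_entry_book AS (
        SELECT to_address AS address, value
        FROM traces WHERE to_address IS NOT NULL
        UNION ALL
        SELECT from_address AS address, -value
        FROM traces WHERE from_address IS NOT NULL
    )
    SELECT address, SUM(value) AS balance
    FROM double_entry_book
    GROUP BY address
    HAVING SUM(value) >= 0.02      -- filter applied to the aggregated balance
    ORDER BY balance DESC
""").fetchall()

for address, balance in rows:
    print(address, balance)
```

In this sketch address "c" ends with a balance of 0.01 and is therefore left out of the result. In the original query the HAVING condition would presumably go between "group by address" and "order by balance desc", using the same expression as the balance column.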
We can delete in a CTE and use RETURNING to get the ids of the rows being deleted, but the main query still sees those rows, because the deletion only becomes visible once the statement has completed.
CREATE TABLE t (
id serial,
variale int);
insert into t (variale) values
(1),(2),(3),(4),(5);
5 rows affected
with del as
(delete from t
where variale < 3
returning id)
select
t.id,
t.variale,
del.id ids_being_deleted
from t
left join del
on t.id = del.id;
id | variale | ids_being_deleted
-: | ------: | ----------------:
1 | 1 | 1
2 | 2 | 2
3 | 3 | null
4 | 4 | null
5 | 5 | null
select * from t;
id | variale
-: | ------:
3 | 3
4 | 4
5 | 5
db<>fiddle here
The code below gives me the following results
Early: 7738
Late: 6586
On Time: 1720
How would I take this a step further and add a third column that finds the percentages?
Here is a link to the ERD and database set-up: https://www.postgresqltutorial.com/postgresql-sample-database/
WITH
t1
AS
(
SELECT *, DATE_PART('day', return_date - rental_date) AS days_rented
FROM rental
),
t2
AS
(
SELECT rental_duration, days_rented,
CASE WHEN rental_duration > days_rented THEN 'Early'
WHEN rental_duration = days_rented THEN 'On Time'
ELSE 'Late'
END AS rental_return_status
FROM film f, inventory i, t1
WHERE f.film_id = i.film_id AND t1.inventory_id = i.inventory_id
)
SELECT rental_return_status, COUNT(*) AS total_films_rented
FROM t2
GROUP BY 1
ORDER BY 2 DESC;
You can use a window function with one CTE table (instead of 2):
WITH raw_status AS (
SELECT rental_duration - DATE_PART('day', return_date - rental_date) AS days_remaining
FROM rental r
JOIN inventory i ON r.inventory_id=i.inventory_id
JOIN film f on f.film_id=i.film_id
)
SELECT CASE WHEN days_remaining > 0 THEN 'Early'
WHEN days_remaining = 0 THEN 'On Time'
ELSE 'Late' END AS rental_status,
count(*),
(100*count(*))/sum(count(*)) OVER () AS percentage
FROM raw_status
GROUP BY 1;
rental_status | count | percentage
---------------+-------+---------------------
Early | 7738 | 48.2298678633757168
On Time | 1720 | 10.7205185739217153
Late | 6586 | 41.0496135627025679
(3 rows)
Disclosure: I work for EnterpriseDB (EDB)
Use a window function to get the sum of the count column (sum(count(*)) over ()), then just divide the count by that (count(*)/sum(count(*)) over ()). Multiply by 100 to make it a percentage.
psql (12.1 (Debian 12.1-1))
Type "help" for help.
testdb=# CREATE TABLE faket2 AS (
SELECT 'early' AS rental_return_status UNION ALL
SELECT 'early' UNION ALL
SELECT 'ontime' UNION ALL
SELECT 'late');
SELECT 4
testdb=# SELECT
rental_return_status,
COUNT(*) as total_films_rented,
(100*count(*))/sum(count(*)) over () AS percentage
FROM faket2
GROUP BY 1
ORDER BY 2 DESC;
rental_return_status | total_films_rented | percentage
----------------------+--------------------+---------------------
early | 2 | 50.0000000000000000
late | 1 | 25.0000000000000000
ontime | 1 | 25.0000000000000000
(3 rows)
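As a sanity check on the arithmetic, the same window-function idea can be run against the three counts from the question. This is only a sketch: it uses Python's sqlite3 module (window functions need SQLite 3.25 or later) rather than PostgreSQL, and the status_counts table is a hypothetical stand-in that holds the already aggregated counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the counts reported in the question.
conn.execute("CREATE TABLE status_counts (status TEXT, n INTEGER)")
conn.executemany(
    "INSERT INTO status_counts VALUES (?, ?)",
    [("Early", 7738), ("Late", 6586), ("On Time", 1720)],
)

# SUM(n) OVER () is the grand total, repeated on every row.
# 100.0 (not 100) forces floating-point division in SQLite.
percentages = dict(conn.execute("""
    SELECT status, 100.0 * n / SUM(n) OVER () AS percentage
    FROM status_counts
""").fetchall())

for status, pct in percentages.items():
    print(f"{status}: {pct:.2f}%")
```

The rounded values agree with the PostgreSQL output shown earlier in this thread (about 48.23, 41.05 and 10.72 percent).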
I have data:
id | price | date
1 | 25 | 2019-01-01
2 | 35 | 2019-01-01
1 | 27 | 2019-02-01
2 | 37 | 2019-02-01
Is it possible to write a query that returns only the first row of each window? Something like LIMIT 1, but for the window OVER( date )?
I expect the following result:
id | price | date
1 | 25 | 2019-01-01
1 | 27 | 2019-02-01
Or ignore the whole window if its first row has a NULL price:
id | price | date
1 | NULL | 2019-01-01
2 | 35 | 2019-01-01
1 | 27 | 2019-02-01
2 | 37 | 2019-02-01
result:
1 | 27 | 2019-02-01
Order the rows by date and id, and take only the first row per date.
Then remove those where the price is NULL.
SELECT *
FROM (SELECT DISTINCT ON (date)
id, price, date
FROM mytable
ORDER BY date, id
) AS q
WHERE price IS NOT NULL;
@Laurenz, let me provide a bit more explanation.
select distinct on (<fldlist>) * from <table> order by <fldlist+>;
is equivalent to the much more complex query:
select * from (
select row_number() over (partition by <fldlist> order by <fldlist+>) as rn,*
from <table>) as q
where rn = 1;
Here <fldlist> must be a leading prefix of (or equal to) <fldlist+>.
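The row_number() form can be tried on the sample data from the question. The sketch below uses Python's sqlite3 module as a stand-in, since SQLite has no DISTINCT ON but does support ROW_NUMBER() from version 3.25 on. The helper name first_row_per_date is hypothetical.

```python
import sqlite3

def first_row_per_date(rows):
    """Return the first row (lowest id) per date, skipping NULL prices."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, price INTEGER, date TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = conn.execute("""
        SELECT id, price, date
        FROM (
            SELECT id, price, date,
                   ROW_NUMBER() OVER (PARTITION BY date ORDER BY id) AS rn
            FROM mytable
        ) AS q
        WHERE rn = 1 AND price IS NOT NULL
        ORDER BY date
    """).fetchall()
    conn.close()
    return result

sample = [(1, 25, "2019-01-01"), (2, 35, "2019-01-01"),
          (1, 27, "2019-02-01"), (2, 37, "2019-02-01")]
# Same data, but the first row of the first window has a NULL price.
with_null = [(1, None, "2019-01-01")] + sample[1:]

print(first_row_per_date(sample))
print(first_row_per_date(with_null))
```

With the first data set this yields one row per date; with the second, the whole 2019-01-01 window is dropped because its first row has no price.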
As Myon on IRC said:
if you want to use a window function in WHERE, you need to put it into a subselect first
So the target query is:
select * from (
select
*,
agg_function( my_field ) OVER( PARTITION BY other_field ) as agg_field
from sometable
) x
WHERE agg_field <condition>
In my case I have the following query:
SELECT * FROM (
SELECT *,
FIRST_VALUE( p.price ) over( PARTITION BY crate.app_period ORDER BY st.DEPTH ) AS first_price,
ROW_NUMBER() over( PARTITION BY crate.app_period ORDER BY st.DEPTH ) AS row_number
FROM st
LEFT JOIN price p ON <COND>
LEFT JOIN currency_rate crate ON <COND>
) p
WHERE p.row_number = 1 AND p.first_price IS NOT null
Here I select only the first row of each group, and only where its price IS NOT NULL.
I have a query which returns monthly averages from the same table, but for different pressure_level values:
SELECT some_id, avg(exposure_value) monthly_avg_1000
FROM mytable
WHERE pressure_level = 1000
AND some_id = 7
GROUP BY some_id, date_trunc('month', measurement_time)
I then have the same query, but for a different pressure_level:
SELECT some_id, avg(exposure_value) monthly_avg_925
FROM mytable
WHERE pressure_level = 925
AND some_id = 7
GROUP BY some_id, date_trunc('month', measurement_time)
Both queries return 12 rows (1 per month) with the ID and the average value for the month:
some_id | monthly_avg_1000
--------------------------
1 | 0.000023
1 | 0.000051
1 | 0.000009
some_id | monthly_avg_925
--------------------------
1 | 0.000014
1 | 0.000007
1 | 0.000131
I would like to combine the two queries so that the monthly_avg_* columns all appear in the final table:
some_id | monthly_avg_1000 | monthly_avg_925
--------------------------
1 | 0.000023 | 0.000014
1 | 0.000051 | 0.000007
1 | 0.000009 | 0.000131
How can I do this?
If both result sets share the same id, you can try a join:
with a as (
SELECT some_id, avg(exposure_value) monthly_avg_1000,date_trunc('month', measurement_time) d
FROM mytable
WHERE pressure_level = 1000
AND some_id = 7
GROUP BY some_id, date_trunc('month', measurement_time)
)
, b as (
SELECT some_id, avg(exposure_value) monthly_avg_925, date_trunc('month', measurement_time) d
FROM mytable
WHERE pressure_level = 925
AND some_id = 7
GROUP BY some_id, date_trunc('month', measurement_time)
)
select distinct a.some_id, monthly_avg_1000,monthly_avg_925
from a
join b on a.some_id = b.some_id and a.d = b.d
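A different technique from the join above is conditional aggregation: scan the table once and let each AVG look only at its own pressure_level. The sketch below runs on SQLite through Python's sqlite3 module, so strftime('%Y-%m', ...) stands in for date_trunc('month', ...), and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mytable (
        some_id INTEGER, pressure_level INTEGER,
        measurement_time TEXT, exposure_value REAL)
""")
# Invented sample measurements for two months and two pressure levels.
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", [
    (7, 1000, "2019-01-05", 2.0), (7, 1000, "2019-01-20", 4.0),
    (7, 925,  "2019-01-07", 1.0),
    (7, 1000, "2019-02-03", 10.0),
    (7, 925,  "2019-02-10", 6.0), (7, 925,  "2019-02-25", 8.0),
])

monthly = conn.execute("""
    SELECT some_id,
           strftime('%Y-%m', measurement_time) AS month,
           -- CASE yields NULL for other levels, and AVG ignores NULLs
           AVG(CASE WHEN pressure_level = 1000 THEN exposure_value END) AS monthly_avg_1000,
           AVG(CASE WHEN pressure_level = 925  THEN exposure_value END) AS monthly_avg_925
    FROM mytable
    WHERE some_id = 7 AND pressure_level IN (1000, 925)
    GROUP BY some_id, strftime('%Y-%m', measurement_time)
    ORDER BY month
""").fetchall()

for row in monthly:
    print(row)
```

Each month comes back as a single row carrying both averages, so no join on the month column is needed.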
UPDATE:
My original attempt to use FULL OUTER JOIN did not work correctly. I have updated the question to reflect the true issue. Sorry for presenting a classic XY problem.
I'm trying to retrieve a dataset from multiple tables, all in one query that is grouped by the year and month of the data.
The final result should look like this:
| Year | Month | Col1 | Col2 | Col3 |
|------+-------+------+------+------|
| 2012 | 11 | 231 | - | - |
| 2012 | 12 | 534 | 12 | 13 |
| 2013 | 1 | - | 22 | 14 |
Coming from data that looks like this:
Table 1:
| Year | Month | Data |
|------+-------+------|
| 2012 | 11 | 231 |
| 2012 | 12 | 534 |
Table 2:
| Year | Month | Data |
|------+-------+------|
| 2012 | 12 | 12 |
| 2013 | 1 | 22 |
Table 3:
| Year | Month | Data |
|------+-------+------|
| 2012 | 12 | 13 |
| 2013 | 1 | 14 |
I tried using FULL OUTER JOIN, but this doesn't quite work: no matter which table I select 'Year' and 'Month' from in my SELECT clause, there are null values.
SELECT
COALESCE(t1.year,t2.year,t3.year)
,COALESCE(t1.month,t2.month,t3.month)
,t1.data as col1
,t2.data as col2
,t3.data as col3
From t1
FULL OUTER JOIN t2
on t1.year = t2.year and t1.month = t2.month
FULL OUTER JOIN t3
on t1.year = t3.year and t1.month = t3.month
The result is something like this (it is too confusing to reproduce exactly what I would get using this demo data):
| Year | Month | Col1 | Col2 | Col3 |
|------+-------+------+------+------|
| 2012 | 11 | 231 | - | - |
| 2012 | 12 | 534 | 12 | 13 |
| 2013 | 1 | - | 22 | |
| - | 1 | - | - | 14 |
If your data allows it (not 100 columns), this is usually a clean way of doing it:
select year, month, sum(col1) as col1, sum(col2) as col2, sum(col3) as col3
from (
SELECT t1.year, t1.month, t1.data as col1, 0 as col2, 0 as col3
From t1
union all
SELECT t2.year, t2.month, 0 as col1, t2.data as col2, 0 as col3
From t2
union all
SELECT t3.year, t3.month, 0 as col1, 0 as col2, t3.data as col3
From t3
) as data
group by year, month
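The UNION ALL plus SUM approach can be checked against the demo tables from the question. This is a sketch on SQLite through Python's sqlite3 module, not the asker's actual database. Table and column names follow the question; the subquery alias u is an arbitrary choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Demo data copied from the question.
demo = {
    "t1": [(2012, 11, 231), (2012, 12, 534)],
    "t2": [(2012, 12, 12), (2013, 1, 22)],
    "t3": [(2012, 12, 13), (2013, 1, 14)],
}
for name, rows in demo.items():
    conn.execute(f"CREATE TABLE {name} (year INTEGER, month INTEGER, data INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)

combined = conn.execute("""
    SELECT year, month, SUM(col1) AS col1, SUM(col2) AS col2, SUM(col3) AS col3
    FROM (
        SELECT year, month, data AS col1, 0 AS col2, 0 AS col3 FROM t1
        UNION ALL
        SELECT year, month, 0, data, 0 FROM t2
        UNION ALL
        SELECT year, month, 0, 0, data FROM t3
    ) AS u
    GROUP BY year, month
    ORDER BY year, month
""").fetchall()

for row in combined:
    print(row)
```

Note that months missing from a table show up as 0 here rather than as an empty cell. Using NULL instead of 0 as the placeholder should keep them empty, since SUM skips NULLs.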
If you are using SQL Server 2005 or later version, you could also try this PIVOT solution:
SELECT
Year,
Month,
Col1,
Col2,
Col3
FROM (
SELECT Year, Month, 'Col1' AS Col, Data FROM t1
UNION ALL
SELECT Year, Month, 'Col2' AS Col, Data FROM t2
UNION ALL
SELECT Year, Month, 'Col3' AS Col, Data FROM t3
) f
PIVOT (
SUM(Data) FOR Col IN (Col1, Col2, Col3)
) p
;
This query can be tested and played with at SQL Fiddle.
Perhaps you are looking for the COALESCE function? It takes a list of arguments and returns the first one that is NOT NULL, or NULL if all arguments are null. In your example, you would do something like this:
SELECT COALESCE(t1.data, t2.data)
You would still need to join tables in this case. It would just cut down on the case statements.
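The behaviour of COALESCE described above is easy to verify in isolation. A tiny sketch, using SQLite through Python's sqlite3 module and literal values chosen only for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The first non-NULL argument wins.
first_non_null = conn.execute("SELECT COALESCE(NULL, 2012, 2013)").fetchone()[0]

# If every argument is NULL, the result is NULL (None in Python).
all_null = conn.execute("SELECT COALESCE(NULL, NULL)").fetchone()[0]

print(first_non_null, all_null)
```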
You could derive the complete list of years and months from all the tables, then join every table to that list (using a left join):
SELECT
f.Year,
f.Month,
t1.data AS col1,
t2.data AS col2,
t3.data AS col3
FROM (
SELECT Year, Month FROM t1
UNION
SELECT Year, Month FROM t2
UNION
SELECT Year, Month FROM t3
) f
LEFT JOIN t1 ON f.year = t1.year and f.month = t1.month
LEFT JOIN t2 ON f.year = t2.year and f.month = t2.month
LEFT JOIN t3 ON f.year = t3.year and f.month = t3.month
;
You can see a live demonstration of this query at SQL Fiddle.
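For a quick local check without SQL Fiddle, the same query shape can be run on the question's demo data. This sketch uses SQLite through Python's sqlite3 module as a stand-in for the original database. Table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Demo data copied from the question.
demo = {
    "t1": [(2012, 11, 231), (2012, 12, 534)],
    "t2": [(2012, 12, 12), (2013, 1, 22)],
    "t3": [(2012, 12, 13), (2013, 1, 14)],
}
for name, rows in demo.items():
    conn.execute(f"CREATE TABLE {name} (year INTEGER, month INTEGER, data INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)

joined = conn.execute("""
    SELECT f.year, f.month, t1.data AS col1, t2.data AS col2, t3.data AS col3
    FROM (
        -- UNION (without ALL) removes duplicate year/month pairs
        SELECT year, month FROM t1
        UNION
        SELECT year, month FROM t2
        UNION
        SELECT year, month FROM t3
    ) AS f
    LEFT JOIN t1 ON f.year = t1.year AND f.month = t1.month
    LEFT JOIN t2 ON f.year = t2.year AND f.month = t2.month
    LEFT JOIN t3 ON f.year = t3.year AND f.month = t3.month
    ORDER BY f.year, f.month
""").fetchall()

for row in joined:
    print(row)
```

Because every table is joined to the derived key list rather than to t1, the year and month columns are never NULL, and missing values stay NULL instead of turning into 0.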
If you are looking for the non-null values from either table, then you will have to add t1.data IS NOT NULL as well. I hope that I understand your question.
CREATE VIEW joined_SALES
AS SELECT t1.year, t1.month, t1.data AS col1, t2.data AS col2
FROM table1 t1, table2 t2
WHERE
t1.year = t2.year
and t1.month = t2.month
and t1.data IS NOT NULL;
This might be a better way, especially if you are going to do something with the data before returning it. Basically you are translating the table the data came from into a typeId.
declare @temp table
([year] int,
[month] int,
typeId int,
data decimal)
insert into @temp
SELECT t1.year, t1.month, 1, sum(t1.data)
From t1
group by t1.year, t1.month
insert into @temp
SELECT t2.year, t2.month, 2, sum(t2.data)
From t2
group by t2.year, t2.month
insert into @temp
SELECT t3.year, t3.month, 3, sum(t3.data)
From t3
group by t3.year, t3.month
select t.year, t.month,
sum(case when t.typeId = 1 then t.data end) as col1,
sum(case when t.typeId = 2 then t.data end) as col2,
sum(case when t.typeId = 3 then t.data end) as col3
from @temp t
group by t.year, t.month