Count occurrences of value in field for a particular ID using Redshift


I want to count the occurrences of particular values in a certain field for an ID. So what I have is this:
| Location ID | Group |
|:----------- |:---------|
| 1 | Group A |
| 2 | Group B |
| 3 | Group C |
| 4 | Group A |
| 4 | Group B |
| 4 | Group C |
| 3 | Group A |
| 2 | Group B |
| 1 | Group C |
| 2 | Group A |
And what I would hope to yield through some computer magic is this:
| Location ID | Group A Count | Group B Count | Group C count|
|:----------- |:--------------|:--------------|:-------------|
| 1 | 1 | 0 | 1 |
| 2 | 1 | 2 | 0 |
| 3 | 1 | 0 | 1 |
| 4 | 1 | 1 | 1 |
Is there some sort of pivoting function I can use in Redshift to achieve this?

This requires conditional aggregation: a CASE expression inside SUM, combined with a GROUP BY clause, as in this example:
SELECT l_id,
       SUM(CASE WHEN l_group = 'Group A' THEN 1 ELSE 0 END) AS a,
       SUM(CASE WHEN l_group = 'Group B' THEN 1 ELSE 0 END) AS b -- and so on for each group
FROM location
GROUP BY l_id;
This should give you a result like this:
| l_id | a | b |
|------|---|---|
| 4 | 1 | 1 |
| 1 | 1 | 0 |
| 3 | 1 | 0 |
| 2 | 1 | 2 |
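
To check the approach against the sample data with all three groups, here is a minimal sketch that runs the same conditional aggregation through Python's built-in sqlite3 module. SQLite is only a stand-in for Redshift here (an assumption made so the example is self-contained); the table name location and the columns l_id and l_group follow the answer above, and the column alias c is added for Group C.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Redshift (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE location (l_id INTEGER, l_group TEXT)")
conn.executemany(
    "INSERT INTO location VALUES (?, ?)",
    [(1, "Group A"), (2, "Group B"), (3, "Group C"), (4, "Group A"),
     (4, "Group B"), (4, "Group C"), (3, "Group A"), (2, "Group B"),
     (1, "Group C"), (2, "Group A")],
)

# One SUM(CASE ...) per value to pivot; each matching row contributes 1.
PIVOT_SQL = """
SELECT l_id,
       SUM(CASE WHEN l_group = 'Group A' THEN 1 ELSE 0 END) AS a,
       SUM(CASE WHEN l_group = 'Group B' THEN 1 ELSE 0 END) AS b,
       SUM(CASE WHEN l_group = 'Group C' THEN 1 ELSE 0 END) AS c
FROM location
GROUP BY l_id
ORDER BY l_id
"""

pivot_rows = conn.execute(PIVOT_SQL).fetchall()
for row in pivot_rows:
    print(row)
```

The ORDER BY is only there to make the output order deterministic; without it, the row order is unspecified.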

Related

Postgres distinct rows whilst also summing

I have a dataset that is similar to this. I need to pick out the most recent metadata (greater execution time = more recent) for a client including the sum of quantities and the latest execution time and meta where the quantity > 0
| Name | Quantity | Metadata | Execution time |
| -------- | ---------|----------|----------------|
| Neil | 1 | [1,3] | 4 |
| James | 1 | [2,18] | 5 |
| Neil | 1 | [4, 1] | 6 |
| Mike | 1 | [5, 42] | 7 |
| James | -1 | Null | 8 |
| Neil | -1 | Null | 9 |
Eg the query needs to return:
| Name | Summed Quantity | Metadata | Execution time |
| -------- | ----------------|----------|----------------|
| James | 0 | [2,18] | 5 |
| Neil | 1 | [4, 1] | 6 |
| Mike | 1 | [5, 42] | 7 |
My query doesn't quite work as it's not returning the sum of the quantities correctly.
SELECT
distinct on (name) name,
(
SELECT
cast(
sum(quantity) as int
)
) as summed_quantity,
meta,
execution_time
FROM
table
where
quantity > 0
group by
name,
meta,
execution_time
order by
name,
execution_time desc;
This query gives a result of
| Name | Summed Quantity | Metadata | Execution time |
| -------- | ----------------|----------|----------------|
| James | 1 | [2,18] | 5 |
| Neil | 1 | [4, 1] | 6 |
| Mike | 1 | [5, 42] | 7 |
That is, it only takes the rows with quantity > 0 from the WHERE clause and never adds up the remaining quantities in the subquery (I assume because of the DISTINCT clause). I'm unsure how to fix my query to produce the desired output.

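This can be achieved using window functions, and therefore with a single pass over the data. The sum is computed over every row of a name before the quantity > 0 filter is applied in the outer query: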
select
name
, sum_qty
, metadata
, execution_time
from (
select
*
, sum(Quantity) over(partition by name) sum_qty
, row_number() over(partition by name, case when quantity > 0 then 1 else 0 end
order by Execution_time DESC) as rn
from mytable
) d
where rn = 1 and quantity > 0
order by name
Result:
+-------+---------+----------+----------------+
| name | sum_qty | metadata | execution_time |
+-------+---------+----------+----------------+
| James | 0 | [2,18] | 5 |
| Mike | 1 | [5,42] | 7 |
| Neil | 1 | [4,1] | 6 |
+-------+---------+----------+----------------+
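
As a rough check of why the order of filtering and summing matters, the sketch below runs both variants on the sample rows using Python's sqlite3 module. SQLite (3.25 or later, for window functions) stands in for PostgreSQL here, which is an assumption made to keep the example self-contained; the table name mytable follows the answer above.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (name TEXT, quantity INTEGER, "
    "metadata TEXT, execution_time INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [("Neil", 1, "[1,3]", 4), ("James", 1, "[2,18]", 5),
     ("Neil", 1, "[4,1]", 6), ("Mike", 1, "[5,42]", 7),
     ("James", -1, None, 8), ("Neil", -1, None, 9)],
)

# Filtering before aggregating drops the negative rows from the sum.
filtered_rows = conn.execute(
    "SELECT name, SUM(quantity) FROM mytable "
    "WHERE quantity > 0 GROUP BY name ORDER BY name"
).fetchall()

# Window version: the sum sees every row of a name; the filter runs afterwards.
WINDOW_SQL = """
SELECT name, sum_qty, metadata, execution_time
FROM (
    SELECT *,
           SUM(quantity) OVER (PARTITION BY name) AS sum_qty,
           ROW_NUMBER() OVER (
               PARTITION BY name, CASE WHEN quantity > 0 THEN 1 ELSE 0 END
               ORDER BY execution_time DESC
           ) AS rn
    FROM mytable
) d
WHERE rn = 1 AND quantity > 0
ORDER BY name
"""
window_rows = conn.execute(WINDOW_SQL).fetchall()

print(filtered_rows)
print(window_rows)
```

With the filter applied first, James and Neil come out as 1 and 2; with the window sum they come out as 0 and 1, matching the desired output.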

postgresql: query two tables with same column names and show the result side by side ordered their column names, which occur in both tables

Given two tables (table1, table2) with the same column names (generation, parent), the desired output is the combination of all columns of both tables. The rows of table2 should be placed next to the rows of table1 that have the same generation. Within each generation, the parent values should be sorted in ascending order for table1 as well as for table2. The number of rows in the query result should equal the number of rows in table1.
Given the following tables
table1:
| generation | parent |
|:----------:|:------:|
| 0 | 1 |
| 0 | 2 |
| 0 | 3 |
| 1 | 3 |
| 1 | 2 |
| 1 | 1 |
| 2 | 2 |
| 2 | 1 |
| 2 | 3 |
table2:
| generation | parent |
|:----------:|:------:|
| 1 | 3 |
| 1 | 1 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
The following statements create and populate the two sample tables shown above:
create table table1(generation integer, parent integer);
insert into table1 (generation, parent) values(0,1),(0,2),(0,3),(1,3),(1,2),(1,1),(2,2),(2,1),(2,3);
create table table2(generation integer, parent integer);
insert into table2 (generation, parent) values(1,3),(1,1),(1,3),(2,1),(2,2),(2,3);
The query I am looking for should produce the following result:
| table1_generation | table1_parent | table2_generation | table2_parent |
|:-----------------:|:-------------:|:-----------------:|:-------------:|
| 0 | 1 | | |
| 0 | 2 | | |
| 0 | 3 | | |
| 1 | 1 | 1 | 1 |
| 1 | 2 | 1 | 3 |
| 1 | 3 | 1 | 3 |
| 2 | 1 | 2 | 1 |
| 2 | 2 | 2 | 2 |
| 2 | 3 | 2 | 3 |
Current query looks as follows:
with
p as (
select
generation,
parent
from
table1
order by
generation,
parent
), o as(
select
generation,
parent
from
table2
order by
generation,
parent
)
select
p.generation as table1_generation,
p.parent as table1_parent,
o.generation as table2_generation,
o.parent as table2_parent
from
p
left join o on
o.generation=p.generation;
Which leads to the following result:
| table1_generation | table1_parent | table2_generation | table2_parent |
|:-----------------:|:-------------:|:-----------------:|:-------------:|
| 0 | 1 | | |
| 0 | 2 | | |
| 0 | 3 | | |
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 3 |
| 1 | 1 | 1 | 3 |
| 1 | 2 | 1 | 1 |
| 1 | 2 | 1 | 3 |
| 1 | 2 | 1 | 3 |
| 1 | 3 | 1 | 1 |
| 1 | 3 | 1 | 3 |
| 1 | 3 | 1 | 3 |
| 2 | 1 | 2 | 1 |
| 2 | 1 | 2 | 2 |
| 2 | 1 | 2 | 3 |
| 2 | 2 | 2 | 1 |
| 2 | 2 | 2 | 2 |
| 2 | 2 | 2 | 3 |
| 2 | 3 | 2 | 1 |
| 2 | 3 | 2 | 2 |
| 2 | 3 | 2 | 3 |
From what I have read, a join might not be what is needed here. But a union only appends rows, so it is unclear to me how the desired result can be achieved.
Any help is highly appreciated. Thanks in advance!

The current output is exactly what a join is defined to produce: a join filters the Cartesian product of the two tables, so joining on generation alone pairs every table1 row with every table2 row of the same generation.
As the title says, what you actually want is to put the two tables side by side. That works here because every generation that occurs in both tables has the same number of rows (three) in each.
The following select returns the output you want.
I created artificial join columns with row_number() OVER (order by generation, parent) as rnum and shifted the second table by three, because the three generation 0 rows of table1 have no counterpart in table2. I hope this helps you:
with
p as (
select
row_number() OVER (order by generation, parent) as rnum,
generation,
parent
from
table1
order by
generation,
parent
), o as(
select
row_number() OVER (order by generation, parent) as rnum,
generation,
parent
from
table2
order by
generation,
parent
)
select
p.generation as table1_generation,
p.parent as table1_parent,
o.generation as table2_generation,
o.parent as table2_parent
from
p
left join o on
o.rnum+3=p.rnum
order by 1,2,3,4;
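Output:
| table1_generation | table1_parent | table2_generation | table2_parent |
|:-----------------:|:-------------:|:-----------------:|:-------------:|
| 0 | 1 | (null) | (null) |
| 0 | 2 | (null) | (null) |
| 0 | 3 | (null) | (null) |
| 1 | 1 | 1 | 1 |
| 1 | 2 | 1 | 3 |
| 1 | 3 | 1 | 3 |
| 2 | 1 | 2 | 1 |
| 2 | 2 | 2 | 2 |
| 2 | 3 | 2 | 3 |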
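
The fixed offset of three depends on this particular data. A possible variant, sketched here and not part of the original answer, numbers the rows within each generation and joins on both the generation and that per-generation row number, so no offset is needed. It is run through Python's sqlite3 module, with SQLite (3.25 or later) as an assumed stand-in for PostgreSQL; the alias rn is hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (generation INTEGER, parent INTEGER)")
conn.execute("CREATE TABLE table2 (generation INTEGER, parent INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(0, 1), (0, 2), (0, 3), (1, 3), (1, 2), (1, 1), (2, 2), (2, 1), (2, 3)],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [(1, 3), (1, 1), (1, 3), (2, 1), (2, 2), (2, 3)],
)

# Number rows inside each generation, then pair row n with row n.
SIDE_BY_SIDE_SQL = """
WITH p AS (
    SELECT generation, parent,
           ROW_NUMBER() OVER (PARTITION BY generation ORDER BY parent) AS rn
    FROM table1
), o AS (
    SELECT generation, parent,
           ROW_NUMBER() OVER (PARTITION BY generation ORDER BY parent) AS rn
    FROM table2
)
SELECT p.generation AS table1_generation,
       p.parent     AS table1_parent,
       o.generation AS table2_generation,
       o.parent     AS table2_parent
FROM p
LEFT JOIN o ON o.generation = p.generation AND o.rn = p.rn
ORDER BY p.generation, p.parent
"""

side_by_side = conn.execute(SIDE_BY_SIDE_SQL).fetchall()
for row in side_by_side:
    print(row)
```

If table2 had more rows than table1 for some generation, the extra table2 rows would be dropped by the left join, which is consistent with the requirement that the result has as many rows as table1.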

PostgreSQL limit by group, only show first 2 store options

For a given product, I need to select the first 2 rows of each store, excluding one given store_name.
id | store_name | prod_name
----+------------+------
1 | 1 | A
2 | 1 | B
3 | 1 | C
4 | 1 | A
5 | 2 | E
6 | 2 | A
7 | 3 | G
8 | 2 | A
9 | 1 | A
10 | 3 | A
(10 rows)
For store_name <> 3 AND prod_name = 'A', the result should be:
id | store_name | prod_name
----+------------+------
1 | 1 | A
4 | 1 | A
6 | 2 | A
8 | 2 | A

Use the row_number() window function to accomplish this.
Query #1
with first_two as (
select *,
row_number() over (partition by store_name
order by id) as rn
from store_product
where store_name <> 3
and prod_name = 'A'
)
select id, store_name, prod_name
from first_two
where rn <= 2;
| id | store_name | prod_name |
| --- | ---------- | --------- |
| 1 | 1 | A |
| 4 | 1 | A |
| 6 | 2 | A |
| 8 | 2 | A |

Comparing Subqueries

I have two subqueries. Here is the output of subquery A....
id | date_lat_lng | stat_total | rnum
-------+--------------------+------------+------
16820 | 2016_10_05_10_3802 | 9 | 2
15701 | 2016_10_05_10_3802 | 9 | 3
16821 | 2016_10_05_11_3802 | 16 | 2
17861 | 2016_10_05_11_3802 | 16 | 3
16840 | 2016_10_05_12_3683 | 42 | 2
17831 | 2016_10_05_12_3767 | 0 | 2
17862 | 2016_10_05_12_3802 | 11 | 2
17888 | 2016_10_05_13_3683 | 35 | 2
17833 | 2016_10_05_13_3767 | 24 | 2
16823 | 2016_10_05_13_3802 | 24 | 2
and here is the output of subquery B, in which date_lat_lng and stat_total have values in common with subquery A, but id does not:
id | date_lat_lng | stat_total | rnum
-------+--------------------+------------+------
17860 | 2016_10_05_10_3802 | 9 | 1
15702 | 2016_10_05_11_3802 | 16 | 1
17887 | 2016_10_05_12_3683 | 42 | 1
15630 | 2016_10_05_12_3767 | 20 | 1
16822 | 2016_10_05_12_3802 | 20 | 1
16841 | 2016_10_05_13_3683 | 35 | 1
15632 | 2016_10_05_13_3767 | 23 | 1
17863 | 2016_10_05_13_3802 | 3 | 1
16842 | 2016_10_05_14_3683 | 32 | 1
15633 | 2016_10_05_14_3767 | 12 | 1
Both subquery A and B pull data from the same table. I want to delete the rows in that table that share the same ID as subquery A but only where date_lat_lng and stat_total have a shared match in subquery B.
Effectively I need:
DELETE FROM table WHERE
id IN
(SELECT id FROM (subqueryA) WHERE
subqueryA.date_lat_lng=subqueryB.date_lat_lng
AND subqueryA.stat_total=subqueryB.stat_total)
Except I'm not sure where to place subquery B, or if I need an entirely different structure.

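The ids differ between the two subqueries, so join them on the shared columns only and take the id from subquery A. Something like this:
DELETE FROM table WHERE
id IN (
    SELECT DISTINCT a.id
    FROM (subqueryA) AS a
    JOIN (subqueryB) AS b
    USING (date_lat_lng, stat_total)
);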
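
Here is a small sketch of that kind of delete, joining the two subquery results on date_lat_lng and stat_total only, since the question says id is not shared between them. It uses Python's sqlite3 module with a few of the sample rows. The table names stats, sub_a and sub_b are hypothetical, and the two subqueries are materialized as plain tables purely to keep the example short.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
cols = "(id INTEGER, date_lat_lng TEXT, stat_total INTEGER)"
for name in ("stats", "sub_a", "sub_b"):  # hypothetical table names
    conn.execute(f"CREATE TABLE {name} {cols}")

rows_a = [  # a few rows from subquery A
    (16820, "2016_10_05_10_3802", 9),
    (15701, "2016_10_05_10_3802", 9),
    (17831, "2016_10_05_12_3767", 0),
    (17862, "2016_10_05_12_3802", 11),
]
rows_b = [  # a few rows from subquery B
    (17860, "2016_10_05_10_3802", 9),
    (15630, "2016_10_05_12_3767", 20),
    (16822, "2016_10_05_12_3802", 20),
]
conn.executemany("INSERT INTO sub_a VALUES (?, ?, ?)", rows_a)
conn.executemany("INSERT INTO sub_b VALUES (?, ?, ?)", rows_b)
conn.executemany("INSERT INTO stats VALUES (?, ?, ?)", rows_a + rows_b)

# Delete the A-side ids whose (date_lat_lng, stat_total) pair also occurs in B.
conn.execute("""
DELETE FROM stats
WHERE id IN (
    SELECT DISTINCT a.id
    FROM sub_a AS a
    JOIN sub_b AS b USING (date_lat_lng, stat_total)
)
""")

remaining_ids = [r[0] for r in conn.execute("SELECT id FROM stats ORDER BY id")]
print(remaining_ids)
```

Only ids 16820 and 15701 are removed: their pair (2016_10_05_10_3802, 9) also appears in B, while the other A rows differ from B in stat_total.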

left join 2 tables not working

I have 2 tables:
Table1: 'op_ats'
| ID1 | numero |id_cofre | id_chave | estadoAT
| 1 | 111 | 1 | 3 | 1
| 2 | 222 | 3 | 3 | 2
| 3 | 333 | 1 | 4 | 2
| 4 | 444 | 1 | 2 | 3
Table_2: 'op_ats_cofres_chaves'
| ID2 | num_chave |
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 5 | E |
I have this SQL:
SELECT chaves.*, ats.numero numAT, ats.estadoAT
FROM op_ats_cofres_chaves chaves
LEFT JOIN op_ats ats ON ats.id_chave_cofre = chaves.id AND ats.id_cofre = 1
With this I get the following result:
| ID2 | num_chave | numAT | estadoAT |
| 1 | A | 444 | 3 |
| 2 | B | NULL | NULL |
| 3 | C | 111 | 1 |
| 4 | D | 333 | 2 |
| 5 | E | NULL | NULL |
Now the problem is that I only want to join the Table1 rows whose 'estadoAT' column is 1 or 2. I tried adding the line
WHERE op_ats.estadoAT = 1 OR op_ats.estadoAT = 2
But this produces the following result:
| ID2 | num_chave | numAT | estadoAT |
| 1 | A | 444 | 3 |
| 3 | C | 111 | 1 |
| 4 | D | 333 | 2 |
To summarize:
My intention is to get ALL rows in the Table2 and join the Table1 rows that have the 'id_cofre = 1' and '(estadoAT = 1 OR estadoAT = 2)'.
Any help is appreciated.

You have to move the condition into the JOIN clause instead of the WHERE clause. Refer to the table by its alias ats, and wrap the OR in parentheses, because AND binds more tightly than OR:
SELECT chaves.*, ats.numero numAT, ats.estadoAT
FROM op_ats_cofres_chaves chaves
LEFT JOIN op_ats ats ON ats.id_chave_cofre = chaves.id AND ats.id_cofre = 1
AND (ats.estadoAT = 1 OR ats.estadoAT = 2);
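
To illustrate the difference between filtering in ON and filtering in WHERE, here is a minimal sketch using Python's sqlite3 module and the sample rows as listed in the question. SQLite stands in for the asker's database, and the column names id and id_chave are assumptions taken from the table listings (the query above uses chaves.id and id_chave_cofre), so the output reflects the listed rows only.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the asker's database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE op_ats (id INTEGER, numero INTEGER, id_cofre INTEGER, "
    "id_chave INTEGER, estadoAT INTEGER)"
)
conn.execute("CREATE TABLE op_ats_cofres_chaves (id INTEGER, num_chave TEXT)")
conn.executemany(
    "INSERT INTO op_ats VALUES (?, ?, ?, ?, ?)",
    [(1, 111, 1, 3, 1), (2, 222, 3, 3, 2), (3, 333, 1, 4, 2), (4, 444, 1, 2, 3)],
)
conn.executemany(
    "INSERT INTO op_ats_cofres_chaves VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E")],
)

BASE = """
SELECT chaves.id, chaves.num_chave, ats.numero, ats.estadoAT
FROM op_ats_cofres_chaves chaves
LEFT JOIN op_ats ats
       ON ats.id_chave = chaves.id
      AND ats.id_cofre = 1
"""

# Filter inside ON: unmatched left rows are kept and padded with NULLs.
on_rows = conn.execute(
    BASE + " AND ats.estadoAT IN (1, 2) ORDER BY chaves.id"
).fetchall()

# Filter in WHERE: the NULL-padded rows fail the test and are discarded.
where_rows = conn.execute(
    BASE + " WHERE ats.estadoAT IN (1, 2) ORDER BY chaves.id"
).fetchall()

print(on_rows)
print(where_rows)
```

The ON version returns all five rows of op_ats_cofres_chaves, while the WHERE version returns only the two rows that have a matching op_ats row, which is effectively an inner join.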
