Group by and sum depending on cases in Google BigQuery - group-by

The data looks like:
A_value B_value C_value Type
1 null null A
2 null null A
null 3 null B
null 4 null B
null null 5 C
null null 6 C
When Type is 'A' I want to sum 'A_value' and store the result in a new column called 'Type_value'; when Type is 'B' I want to sum 'B_value' into the same 'Type_value' column, and similarly for 'C'.
Expected results:
Type_value Type
3 A
7 B
11 C
How to achieve this result?

Below is for BigQuery Standard SQL
#standardSQL
SELECT SUM(CASE Type
WHEN 'A' THEN A_value
WHEN 'B' THEN B_value
WHEN 'C' THEN C_value
ELSE 0
END) AS Type_value, Type
FROM `project.dataset.table`
GROUP BY Type
Applied to the sample data in your question, the result is:
Row Type_value Type
1 3 A
2 7 B
3 11 C
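If you want to test this without creating a table, you can feed the sample rows in through a WITH clause (a quick sketch; sample is just a hypothetical inline table):
#standardSQL
WITH sample AS (
  SELECT 1 AS A_value, CAST(NULL AS INT64) AS B_value, CAST(NULL AS INT64) AS C_value, 'A' AS Type UNION ALL
  SELECT 2, NULL, NULL, 'A' UNION ALL
  SELECT NULL, 3, NULL, 'B' UNION ALL
  SELECT NULL, 4, NULL, 'B' UNION ALL
  SELECT NULL, NULL, 5, 'C' UNION ALL
  SELECT NULL, NULL, 6, 'C'
)
SELECT SUM(CASE Type
  WHEN 'A' THEN A_value
  WHEN 'B' THEN B_value
  WHEN 'C' THEN C_value
  ELSE 0
END) AS Type_value, Type
FROM sample
GROUP BY Type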
Another option is to exploit the fact that your data only ever has a value in the column that matches its Type. If that holds, you can use the version below:
#standardSQL
SELECT SUM(IFNULL(A_value, 0) + IFNULL(B_value, 0) + IFNULL(C_value, 0)) AS Type_value, Type
FROM `project.dataset.table`
GROUP BY Type
with the same result, obviously

Related

Postgres NOT IN does not work as expected

I am trying to add the below condition in my query to filter data.
SELECT *
FROM dump
WHERE letpos NOT IN ('0', '(!)','NA','N/A') ;
I need only the records with ids 1, 2, 3 and 6, but the query does not return ids 3 and 6. I get only 1 and 2.
TABLE:
id  name  letpos  num
1   AAA   A       60
2   BBB   B
3   CCC           50
4   DDD   0
5   EEE   (!)     70
6   FFF           70
I am not sure what is missing. Could anyone advise on how to resolve this?
Thanks
In the row with id = 3 the value of letpos is (I suspect) NULL, so the boolean expression in the WHERE clause is:
WHERE NULL NOT IN ('0', '(!)','NA','N/A');
Comparing NULL with operators like IN, NOT IN, =, > etc. always yields NULL and is never TRUE.
So you don't get this row in the results.
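You can see the three-valued logic directly with a one-liner (e.g. in psql):
SELECT NULL NOT IN ('0', '(!)', 'NA', 'N/A') AS result;
-- result is NULL, not TRUE, so the WHERE clause rejects the row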
Check for NULL also in the WHERE clause:
SELECT *
FROM dump
WHERE letpos IS NULL
OR letpos NOT IN ('0', '(!)', 'NA', 'N/A');
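For reference, a minimal reproduction of the problem (the column types and the NULLs are my assumption, based on the table above):
CREATE TABLE dump (id int, name text, letpos text, num int);
INSERT INTO dump VALUES
  (1, 'AAA', 'A', 60),
  (2, 'BBB', 'B', NULL),
  (3, 'CCC', NULL, 50),
  (4, 'DDD', '0', NULL),
  (5, 'EEE', '(!)', 70),
  (6, 'FFF', NULL, 70);
-- the original query returns only ids 1 and 2;
-- with the IS NULL check added it returns ids 1, 2, 3 and 6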

How to multiply decimal numbers in a column within a group by?

I have a SQL table that looks like this:
date id value type
2020-01-01 1 1.03 a
2020-01-01 1 1.02 a
2020-01-02 2 1.06 a
2020-01-02 2 1.2 a
2020-01-03 3 1.09 b
I need to build a query that groups by date, id, and type, multiplying the value column wherever type = 'a'.
What the new table should look like:
date id value type
2020-01-01 1 1.0506 a
2020-01-02 2 1.272 a
2020-01-03 3 1.09 b
Currently I am building this query:
select
date, id, value, type
from my_table
where date between 'some date' and 'some date'
and trying to fit in EXP(SUM(LOG(value))),
but, how do I do the multiplication only where type = 'a' in a group by?
edit:
There are more than 2 values in the type column.
I am using Redshift, not PostgreSQL.
select date
     , id
     -- use a 'case' expression to check the type (type is in the GROUP BY, so it is constant per group)
     , case when type = 'a' then exp(sum(ln(value::float))) -- multiply via exp/ln; ln (natural log) is what pairs with exp (Redshift's log() is base 10)
            else max(value) -- use min or max to pick the single value
       end as value
from my_table
where date between 'some date' and 'some date'
group by date, id, type
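As a quick sanity check of the exp/ln trick on the first group (1.03 * 1.02 = 1.0506), using hypothetical inline rows:
SELECT EXP(SUM(LN(v))) AS product
FROM (SELECT 1.03::float AS v
      UNION ALL
      SELECT 1.02) t;
-- product is approximately 1.0506
Note that LN() requires strictly positive inputs; if value can be zero or negative, the trick needs extra handling.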

SQL Renumbering index after group by

I have the following input table:
Seq Group GroupSequence
1 0
2 4 A
3 4 B
4 4 C
5 0
6 6 A
7 6 B
8 0
Output table is:
Line NewSeq GroupSequence
1 1
2 2 A
3 2 B
4 2 C
5 3
6 4 A
7 4 B
8 5
The rules for the input table are:
Any positive integer in the Group column indicates that the rows are grouped together. The entire field may be NULL or blank. A null or 0 indicates that the row is processed on its own. In the above example there are two groups and three 'single' rows.
the GroupSequence column is a single character that sorts within the group. NULL, blank, 'A', 'B', 'C', 'D' are the only values allowed.
if Group has a positive integer, there must be an alphabetic character in GroupSequence.
I need a query that creates the output table with a new column that sequences as shown.
External apps need to iterate through this table in either Line or NewSeq order (same order, different values).
I've tried variations on GROUP BY, PARTITION BY, OVER(), etc. with no success.
Any help much appreciated.
Perhaps this will help.
The only trick here is Flg, which flags the start of a new group sequence (values are 1 or 0). Then it is a small matter of sum(Flg) via a window function.
Edit - Updated Flg method
Example
Declare @YourTable Table ([Seq] int,[Group] int,[GroupSequence] varchar(50))
Insert Into @YourTable Values
(1,0,null)
,(2,4,'A')
,(3,4,'B')
,(4,4,'C')
,(5,0,null)
,(6,6,'A')
,(7,6,'B')
,(8,0,null)
Select Line = Row_Number() over (Order by Seq)
,NewSeq = Sum(Flg) over (Order By Seq)
,GroupSequence
From (
Select *
,Flg = case when [Group] = lag([Group],1) over (Order by Seq) then 0 else 1 end
From @YourTable
) A
Order By Line
Returns
Line NewSeq GroupSequence
1 1 NULL
2 2 A
3 2 B
4 2 C
5 3 NULL
6 4 A
7 4 B
8 5 NULL
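One edge case to be aware of: with the Flg expression above, two adjacent standalone rows that both have Group = 0 compare equal, so the second would get Flg = 0 and the two rows would share a NewSeq. The sample data never has two such rows in a row, but if yours can, a defensive variant (a sketch; a drop-in replacement for the inner Select) only suppresses the flag inside real groups:
Select *
      ,Flg = case when [Group] > 0
                   and [Group] = lag([Group],1) over (Order by Seq)
                  then 0 else 1 end
From @YourTable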

Postgres - bind results of equal type by year - long to wide data

Please excuse my not very proper way of asking this, as I am new to Postgres...
Having the following two tables:
CREATE TABLE pub (
id int
, time timestamp
);
id  time
1   2010-02-10 01:00:00
2   2011-02-10 01:00:00
3   2012-02-10 01:00:00
And
CREATE TABLE val (
id int
, type text
, val int
);
id  type  val
1   A     1
1   B     2
1   C     3
2   A     4
2   B     5
3   D     6
I would like to get the following output (for id <= 2):
type  2010  2011
A     1     4
B     2     5
C     3     NULL
So type is the superset of all types present in table val.
NULL means that there is no value for label C.
Ideally the column headings are the years of the time column; alternatively, the id itself...
There are at least two ways to do this.
If your table does not have many categories, you can use a CTE:
WITH x AS (
SELECT type,
sum(val) FILTER (WHERE date_part('year', time) = 2010) AS "2010",
sum(val) FILTER (WHERE date_part('year', time) = 2011) AS "2011"
FROM pub AS p JOIN val AS v ON (v.id = p.id)
GROUP BY type
)
SELECT * FROM x
WHERE "2010" is NOT NULL OR "2011" IS NOT NULL
ORDER BY type
;
But if you have many or dynamic categories, you must use crosstab:
CREATE EXTENSION tablefunc;
SELECT * FROM crosstab(
$$
SELECT type,
date_part('year', time)::text as time,
sum(val) AS val
FROM pub AS p JOIN val AS v ON (v.id = p.id)
GROUP BY type, 2
ORDER BY 1, 2
$$,
$$VALUES ('2010'::text), ('2011'), ('2012') $$
) AS ct (type text, "2010" int, "2011" int, "2012" int);
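Run against the sample data, the crosstab version returns all four types with one column per year, roughly:
type  2010  2011  2012
A     1     4     NULL
B     2     5     NULL
C     3     NULL  NULL
D     NULL  NULL  6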

combining results of CTEs

I have several CTEs. CTE1A counts the number of type A shops in area 1, CTE1B counts the number of type B shops in area 1, and so on up to CTE1D. Similarly, CTE2B counts the number of type B shops in area 2, and so on. The shop_types CTE selects all types of shops: A, B, C, D. How do I display a table that shows, for each area (column), how many shops of each type there are (rows)?
For example:
1 2 3 4 5
A 0 7 4 0 0
B 2 3 8 2 9
C 8 5 8 1 6
D 7 1 5 4 3
Database has 2 tables:
Table regions: shop_id, region_id
Table shops: shop_id, shop_type
WITH
shop_types AS (SELECT DISTINCT shops.shop_type AS type FROM shops WHERE shops.shop_type!='-9999' AND shops.shop_type!='Other'),
cte1A AS (
SELECT regions.region_id, COUNT(regions.shop_id) AS shops_number, shops.shop_type
FROM regions
RIGHT JOIN shops
ON shops.shop_id=regions.shop_id
WHERE regions.region_id=1
AND shops.shop_type='A'
GROUP BY shops.shop_type,regions.region_id)
SELECT * FROM cte1A
I'm not entirely sure I understand what you are after, but it seems you are looking for something like this:
select sh.shop_type,
count(case when r.region_id = 1 then 1 end) as region_1_count,
count(case when r.region_id = 2 then 1 end) as region_2_count,
count(case when r.region_id = 3 then 1 end) as region_3_count
from shops sh
left join regions r on r.shop_id = sh.shop_id
group by sh.shop_type
order by sh.shop_type;
You need to add one case statement for each region you want to have in the output.
If you are using Postgres 9.4+, you can replace the case expressions with a filter condition, which (I think) makes the intention a bit easier to understand:
count(*) filter (where r.region_id = 1) as region_1_count,
count(*) filter (where r.region_id = 2) as region_2_count,
...
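Put together, the FILTER version of the whole query would look roughly like this (a sketch; same output as the case variant above):
select sh.shop_type,
       count(*) filter (where r.region_id = 1) as region_1_count,
       count(*) filter (where r.region_id = 2) as region_2_count,
       count(*) filter (where r.region_id = 3) as region_3_count
from shops sh
  left join regions r on r.shop_id = sh.shop_id
group by sh.shop_type
order by sh.shop_type;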
SQLFiddle: http://sqlfiddle.com/#!1/98391/1
And before you ask: no, you can't make the number of columns "dynamic" based on a select statement. The column list of a query must be defined before the statement is actually executed.