count distinct concat in BigQuery - postgresql

In PostgreSQL I use count(distinct (col1,col2,col3,col4,col5)).
In BigQuery I have tried count(distinct concat(col1,col2,col3,col4,col5)).
I need BigQuery to return the same result as PostgreSQL.
This works for 3 columns, but for 5 columns I do not get the same value as PostgreSQL.
Sample query:
select col1,
count(distinct concat(col1,col2,col3,col4,col5))
from table A
group by col1
When I remove distinct and concat, a simple count(col1,col2,col3,col4,col5) gives exactly the value PostgreSQL returns. But I need the distinct count over these columns. Is there any way to achieve this? And does BigQuery's concat work differently?

Below are a few options for BigQuery Standard SQL:
#standardSQL
SELECT col1,
COUNT(DISTINCT TO_JSON_STRING((col1,col2,col3,col4,col5)))
FROM A
GROUP BY col1
OR
#standardSQL
SELECT col1,
COUNT(DISTINCT FORMAT('%T', [col1,col2,col3,col4,col5]))
FROM A
GROUP BY col1

An alternative suitable for the many databases that don't support that form of COUNT DISTINCT:
SELECT COUNT(*)
FROM (
SELECT DISTINCT Origin, Dest, Reporting_Airline
FROM `fh-bigquery.flights.ontime_201908`
WHERE FlightDate_year = "2018-01-01"
)
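My guess on why CONCAT didn't work in your sample: do you have any NULLs in those columns? In BigQuery, CONCAT returns NULL if any argument is NULL, and COUNT(DISTINCT ...) ignores NULL values, so every row with a NULL in one of the five columns drops out of the count.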
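To make the two failure modes concrete, here is a minimal Python sketch rather than SQL. The sample rows and the helper names concat and count_distinct are made up for illustration: concat mimics BigQuery's NULL-propagating CONCAT, distinct tuples stand in for counting distinct rows, and repr stands in for a serialization such as TO_JSON_STRING that keeps column boundaries. It is only a model of the behaviour, not either engine's implementation.

```python
# Rows as (col1, col2, col3) tuples; None stands in for SQL NULL.
rows = [
    ("ab", "c", "x"),
    ("a", "bc", "x"),   # a different row, but it also concatenates to "abcx"
    ("a", None, "x"),   # contains a NULL
]

def concat(*parts):
    """Mimic BigQuery CONCAT: the result is NULL if any argument is NULL."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)

def count_distinct(values):
    """Mimic COUNT(DISTINCT ...): NULL values are not counted."""
    return len({v for v in values if v is not None})

by_tuple = len(set(rows))                              # distinct whole rows
by_concat = count_distinct(concat(*r) for r in rows)   # boundaries and NULL rows lost
by_repr = count_distinct(repr(r) for r in rows)        # boundaries and NULLs preserved

print(by_tuple, by_concat, by_repr)
```

With plain concatenation the first two rows collide and the third disappears, so the count falls from 3 to 1. Adding columns gives more chances for both effects, which may be why the mismatch only showed up at 5 columns.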

Related

How do I select only 1 record per user id using ROW_NUMBER without a subquery?

My current method of de-duping is really dumb.
select col1, col2 ... col500 from
(select col1, col2 ... col500, ROW_NUMBER() OVER(PARTITION BY uid) as row_num)
where row_num=1;
Is there a way to do this without a subquery? SELECT DISTINCT is not an option, as there can be small variations in the columns that are not significant for this output.
In Postgres, distinct on () is typically faster than the equivalent solution using a window function and also doesn't require a subquery:
select distinct on (uid) *
from the_table
order by uid, something
You have to supply an order by (which is something you should have done with row_number() as well) to get stable results; otherwise the chosen row is "random". The order by must start with the distinct on expression (uid here), followed by whatever column decides which row is kept.
The above is true for Postgres. You also tagged your question with amazon-redshift - I have no idea if Redshift (which is in fact a very different DBMS) supports the same thing nor if it is as efficient.
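For readers without a Postgres instance at hand, the "first row per key after ordering" semantics can be sketched in plain Python. This only emulates what the query returns, not how Postgres executes it, and the sample rows and the seen and name columns are invented for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: several per uid, with small variations between them.
rows = [
    {"uid": 2, "seen": 5, "name": "b-late"},
    {"uid": 1, "seen": 3, "name": "a-late"},
    {"uid": 1, "seen": 1, "name": "a-early"},
    {"uid": 2, "seen": 4, "name": "b-early"},
]

# Order by the key first, then by the tie-breaking column.
ordered = sorted(rows, key=itemgetter("uid", "seen"))

# Keep only the first row of each uid group.
first_per_uid = [next(group) for _, group in groupby(ordered, key=itemgetter("uid"))]

for row in first_per_uid:
    print(row)
```

Without the secondary sort key, which row survives for each uid would depend on the input order, which is the instability the answer warns about.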

select all except for a specific column

I have a table with more than 20 columns. I want to select all of them except for one, which I'll use in a conditional expression instead.
SELECT s.* (BUT NOT column1),
CASE WHEN column1 is null THEN 1 ELSE 2 END AS column1
from tb_sample s;
Can I achieve this in PostgreSQL given the logic above?
It may not be ideal, but you can query information_schema for the table's columns and exclude the unwanted column in the where clause.
That gives you a comma-separated list of all the column names you DO want, which you can copy/paste into your select query:
select string_agg(column_name::text, ', ' order by ordinal_position)
from information_schema.columns
where table_name = 'table_name' and column_name != 'column_to_exclude';
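The same idea can be automated from client code instead of copy/pasting. The sketch below uses Python with SQLite so that it is self-contained, which means it reads the column names from SQLite's PRAGMA table_info rather than from information_schema. The table contents and the id and column2 columns are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb_sample (id INTEGER, column1 TEXT, column2 TEXT);
    INSERT INTO tb_sample VALUES (1, NULL, 'x'), (2, 'set', 'y');
""")

# PRAGMA table_info returns rows of (cid, name, type, notnull, dflt_value, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(tb_sample)")]
kept = [name for name in columns if name != "column1"]

# Build the select list from the kept columns and add the CASE expression.
query = (
    f"SELECT {', '.join(kept)}, "
    "CASE WHEN column1 IS NULL THEN 1 ELSE 2 END AS column1 "
    "FROM tb_sample ORDER BY id"
)
rows = conn.execute(query).fetchall()
print(rows)
```

The column names here come from the database catalog rather than from user input, so interpolating them into the statement is acceptable in this sketch; anything user-supplied would need proper identifier quoting.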

How do I control the execution order in unioned queries?

I have a series of queries joined by union. Example:
SELECT
SUM(WHOS) [CRITERIA]
FROM ONFIRST
UNION
SELECT
COUNT(WHATS) [CRITERIA]
FROM ONSECOND
UNION
SELECT
IDONTKNOW [CRITERIA]
FROM ONTHIRD
etc.
The query results don't always come back in the same order and I want the results to be in the same order I have the queries written.
Example: Sometimes I get the SUM of WHOS first, sometimes I get the COUNT of WHATS first.
What's the best way to accomplish this?
You can control this easily by using a dummy order column, and ordering by that value:
;With Cte As
(
Select Sum(WHOS) CRITERIA
, 1 As Ord
From ONFIRST
Union
Select Count(WHATS) CRITERIA
, 2 As Ord
From ONSECOND
Union
Select IDONTKNOW CRITERIA
, 3 As Ord
From ONTHIRD
)
Select CRITERIA
From Cte
Order By Ord Asc;
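A runnable sketch of the ordering-column technique, using Python's sqlite3 module with made-up data. SQLite stands in for the asker's database here, and the sketch uses UNION ALL because the distinct ord values already keep the branches apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE onfirst (whos INTEGER);
    CREATE TABLE onsecond (whats INTEGER);
    INSERT INTO onfirst VALUES (10), (20);
    INSERT INTO onsecond VALUES (1), (2), (3);
""")

# Each branch carries a constant ord column; the outer query sorts on it.
query = """
    SELECT criteria FROM (
        SELECT SUM(whos) AS criteria, 1 AS ord FROM onfirst
        UNION ALL
        SELECT COUNT(whats) AS criteria, 2 AS ord FROM onsecond
    ) AS u
    ORDER BY ord
"""
results = [row[0] for row in conn.execute(query)]
print(results)
```

The sum from the first branch always comes back before the count from the second, regardless of how the engine evaluates the union.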

tableau handle partitioned tables

I have a load of partitioned tables which I would like to consume in Tableau. This worked really well with Qlik Sense, because it would load each table into its own memory and then process it.
In Tableau I can't see a way to UNION tables (though you can UNION files). If I try to union it as custom sql, it just loads for hours, so I'm assuming it's just pulling all the data at once, which is 7GB of data and won't perform well on the db or Tableau. Database is PostgreSQL.
The partitions are pre-aggregated, so when I do the custom query union it looks like this:
SELECT user_id, grapes, day FROM steps.steps_2016_04_02 UNION
SELECT user_id, grapes, day FROM steps.steps_2016_04_03 UNION
SELECT user_id, grapes, day FROM steps.steps_2016_04_04 UNION
If you can guarantee that the rows of each table are unique, then don't use UNION, because it has to do extra work to remove duplicate rows from the combined result.
Use UNION ALL instead, which is basically an append of rows. UNION (or UNION DISTINCT, which is the same thing), as in your query, is roughly equivalent to:
SELECT DISTINCT * FROM (
SELECT user_id, grapes, day FROM steps.steps_2016_04_02 UNION ALL
SELECT user_id, grapes, day FROM steps.steps_2016_04_03 UNION ALL
SELECT user_id, grapes, day FROM steps.steps_2016_04_04
) t;
And the DISTINCT can be a very slow operation.
Another simpler option is to use PostgreSQL's partitioning with table inheritance and work on Tableau as a single table.
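The difference between the two operators is easy to check on a toy example. This is a sketch using Python's sqlite3 module with two invented partition tables, steps_day1 and steps_day2, that share one identical row. It shows the semantics only and says nothing about PostgreSQL's timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE steps_day1 (user_id INTEGER, grapes INTEGER);
    CREATE TABLE steps_day2 (user_id INTEGER, grapes INTEGER);
    INSERT INTO steps_day1 VALUES (1, 5), (2, 7);
    INSERT INTO steps_day2 VALUES (1, 5), (3, 9);
""")

def row_count(operator):
    """Count the rows produced by combining both tables with the given set operator."""
    sql = f"""
        SELECT COUNT(*) FROM (
            SELECT user_id, grapes FROM steps_day1
            {operator}
            SELECT user_id, grapes FROM steps_day2
        ) AS t
    """
    return conn.execute(sql).fetchone()[0]

union_rows = row_count("UNION")          # duplicates removed
union_all_rows = row_count("UNION ALL")  # plain append
print(union_rows, union_all_rows)
```

UNION returns 3 rows and UNION ALL returns 4. If the partitions also select the day column and each table holds a different day, no rows can collide, so the de-duplication work done by UNION is wasted.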

how to create an extra column and rows using select query in postgresql

The following is a sample of what I am trying to achieve (never mind the select query, it is just there to show my actual problem).
for example,
select col1 from(
select 'tab09' as col1
union
select 'tab09_01'
union
select 'tab09_02'
union
select 'tab09_03'
union
select 'tab09_04'
) t order by col1
will return
col1
----------
tab09
tab09_01
tab09_02
tab09_03
tab09_04
So, which PostgreSQL function will help to get a result like the one below?
col1      | col2
----------+----------
tab09     | tab10
tab09_01  | tab10_01
tab09_02  | tab10_02
tab09_03  | tab10_03
tab09_04  | tab10_04
select col1,overlay(col1 placing '10' from 4 for 2) col2 from(
--your select query goes here
) t order by col1
See the PostgreSQL documentation for the overlay string function.
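To see what overlay(col1 placing '10' from 4 for 2) does to each value, here is a small Python emulation. The overlay helper below is a hypothetical stand-in written for this sketch, not a PostgreSQL API. It replaces length characters starting at the 1-based position start.

```python
def overlay(text, replacement, start, length):
    """Emulate overlay(text placing replacement from start for length); start is 1-based."""
    begin = start - 1
    return text[:begin] + replacement + text[begin + length:]

names = ["tab09", "tab09_01", "tab09_02", "tab09_03", "tab09_04"]
pairs = [(name, overlay(name, "10", 4, 2)) for name in names]

for col1, col2 in pairs:
    print(col1, col2)
```

Characters 4 and 5 ("09") are replaced by "10" and the rest of the string is kept, which gives the col2 values asked for in the question.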
Uh oh, I see a MAJOR problem here. UNION is definitely not what you want. There is a major difference between UNION and UNION ALL: UNION automatically filters out duplicates, which is not your goal, while UNION ALL simply appends. This is a very common mistake among SQL users. Here are some examples; I hope they help: http://www.cybertec.at/common-mistakes-union-vs-union-all/
Usually a UNION vs UNION ALL problem reaches my desk disguised as a "performance problem".