I have a table with 2 columns (ref1, ref2) referring to another table. I am looking for a SELECT query to generate a summary of how often values in ref1 or ref2 exist, counting twice if a value is in both columns:
SELECT ref1,ref2 from table;
(100,100),
(100,101),
(100,NULL)
Should lead to:
(100,4),
(101,1)
The only approach I can think of is a UNION, but I hope there is another way. Any ideas ?
union removes duplicates, so that won't work - you need the union all operator:
SELECT ref, COUNT(*)
FROM (SELECT ref1 AS ref FROM mytable
UNION ALL
SELECT ref2 AS ref FROM mytable) t
Related
I have two tables that am using union to combine the result-set of them my problem here is Each SELECT statement within UNION must have the same number of columns and data types, I don't have the same number of columns so am creating null columns
select
d.deal_id as order_id,
EXISTS(select * from table1 c where d.user_id = c.user_id) as IsUser, --this returns boolean value
from table 1 c
union
select
cast(o.id as varchar) as order_id,
coalesce('No_user'::text,'0'::text) as IsUser, --i get an error :UNION types boolean and character varying cannot be matched
from table2 o
how can I create a null column in table2 that matches the boolean data type of the table1
how can I create a null column in table2 that matches the boolean data type of the table1
By putting NULL into the SELECT list of the second query.
Generally (in SQL) the datatype for the column is dictated by the first query in the union set and other queries must match. You don't need column aliases for subsequent queries either, but they may help make a query more readable:
select
d.deal_id as order_id,
EXISTS(select * from table1 c where d.user_id = c.user_id) as IsUser, --this returns boolean value
from table 1 c
union
select
cast(o.id as varchar) as order_id,
null --IsUser
from table2 o
If you have a case where the type of the column in the first query is different to what you want in the output, you cast the first column:
select 1::boolean as boolcol
UNION
select true
This will ensure that the column is boolean, rather than giving a "integer and bool cannot be matched". Remember also that, should you need to, NULL can be cast to a type, e.g. select null::boolean as bolcol
I have two tables:
table_a with fields item_id,rank, and 50 other fields.
table_b with fields item_id, and the same 50 fields as table_a
I need to write a SELECT query that adds the rows of table_b to table_a but with rank set to a specific value, let's say 4.
Currently I have:
SELECT * FROM table_a
UNION
SELECT item_id, 4 rank, field_1, field_2, ...
How can I join the two tables together without writing out all of the fields and without using an INSERT query?
EDIT:
My idea is to join table_b to table_a somehow with the rank field remaining empty, then simply replace the null rank fields. The rank field is never null, but item_id can be duplicated and table_a may have item_id values that are not in table_b, and vice-versa.
I am not sure I understand why you need this, but you can use jsonb functions:
select (jsonb_populate_record(null::table_a, row)).*
from (
select to_jsonb(a) as row
from table_a a
union
select to_jsonb(b) || '{"rank": 4}'
from table_b b
) s
order by item_id;
Working example in rextester.
I'm pretty sure I've got it. The predefined rank column can be inserted into table_b by joining to the subset of itself with only the columns left of the column behind which you want to insert.
WITH
_leftcols AS ( SELECT item_id, 4 rank FROM table_b ),
_combined AS ( SELECT * FROM table_b JOIN _leftcols USING (item_id) )
SELECT * FROM _combined
UNION
SELECT * FROM table_a
I've got a table with a lot of columns in it and I want to run a query to find the most common value in each column.
Ordinarily for a single column, I'd run something like:
SELECT country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1
Does PostgreSQL have a built in function for doing this or can anyone suggest a query I could run to achieve this?
Using the same query, for more than one column you should do:
SELECT *
FROM
(
SELECT country
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) country
,(
SELECT city
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) city
This works for any type and will return all the values in the same row, with the columns having its original name.
For more columns just had more subquerys as:
,(
SELECT someOtherColumn
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) someOtherColumn
Edit:
You could reach it with window functions also. However it will not be better in performance nor in readability.
Starting from PG 9.4 there is aggregate function for this:
mode() WITHIN GROUP (ORDER BY sort_expression)
returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)
And for earlier versions, you could create one...
CREATE OR REPLACE FUNCTION mode_array(anyarray)
RETURNS anyelement AS
$BODY$
SELECT a FROM unnest($1) a GROUP BY 1 ORDER BY COUNT(1) DESC, 1 LIMIT 1;
$BODY$
LANGUAGE SQL IMMUTABLE;
CREATE AGGREGATE mode(anyelement)(
SFUNC = array_append, --Function to call for each row. Just builds the array
STYPE = anyarray,
FINALFUNC = mode_array, --Function to call after everything has been added to array
INITCOND = '{}'--Initialize an empty array when starting
) ;
Usage: SELECT mode(column) FROM table;
If I were doing this, I'd write a query like this one:
SELECT 'country', country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1
UNION ALL
SELECT 'city', city
FROM USERS
GROUP BY city
ORDER BY count(*) DESC
LIMIT 1
-- etc.
It should be noted this only works if all the columns are of compatible types. If they are not, you'll probably need a different solution.
This window function version will read the users table and the computed table once each. The correlated subquery version will read the users table once for each of the columns. If the columns are many as in the OPs case then my guess is that this is faster. SQL Fiddle
select distinct on (country_count, age_count) *
from (
select
country,
count(*) over(partition by country) as country_count,
age,
count(*) over(partition by age) as age_count
from users
) s
order by country_count desc, age_count desc
limit 1
I have a field called "Users", and I want to run SUM() on that field that returns the sum of all DISTINCT records. I thought that this would work:
SELECT SUM(DISTINCT table_name.users)
FROM table_name
But it's not selecting DISTINCT records, it's just running as if I had run SUM(table_name.users).
What would I have to do to add only the distinct records from this field?
Use count()
SELECT count(DISTINCT table_name.users)
FROM table_name
SQLFiddle demo
This code seems to indicate sum(distinct ) and sum() return different values.
with t as (
select 1 as a
union all
select '1'
union all
select '2'
union all
select '4'
)
select sum(distinct a) as DistinctSum, sum(a) as allSum, count(distinct a) as distinctCount, count(a) as allCount from t
Do you actually have non-distinct values?
select count(1), users
from table_name
group by users
having count(1) > 1
If not, the sums will be identical.
You can see for yourself that distinct works with the following example. Here I create a subquery with duplicate values, then I do a sum distinct on those values.
select DistinctSum=sum(distinct x), RegularSum=Sum(x)
from
(
select x=1
union All
select 1
union All
select 2
union All
select 2
) x
You can see that the distinct sum column returns 3 and the regular sum returns 6 in this example.
You can use a sub-query:
select sum(users)
from (select distinct users from table_name);
SUM(DISTINCTROW table_name.something)
It worked for me (innodb).
Description - "DISTINCTROW omits data based on entire duplicate records, not just duplicate fields." http://office.microsoft.com/en-001/access-help/all-distinct-distinctrow-top-predicates-HA001231351.aspx
;WITH cte
as
(
SELECT table_name.users , rn = ROW_NUMBER() OVER (PARTITION BY users ORDER BY users)
FROM table_name
)
SELECT SUM(users)
FROM cte
WHERE rn = 1
SQL Fiddle
Try here yourself
TEST
DECLARE #table_name Table (Users INT );
INSERT INTO #table_name Values (1),(1),(1),(3),(3),(5),(5);
;WITH cte
as
(
SELECT users , rn = ROW_NUMBER() OVER (PARTITION BY users ORDER BY users)
FROM #table_name
)
SELECT SUM(users) DisSum
FROM cte
WHERE rn = 1
Result
DisSum
9
If circumstances make it difficult to weave a "distinct" into the sum clause, it will usually be possible to add an extra "where" clause to the entire query - something like:
select sum(t.ColToSum)
from SomeTable t
where (select count(*) from SomeTable t1 where t1.ColToSum = t.ColToSum and t1.ID < t.ID) = 0
May be a duplicate to
Trying to sum distinct values SQL
As per Declan_K's answer:
Get the distinct list first...
SELECT SUM(SQ.COST)
FROM
(SELECT DISTINCT [Tracking #] as TRACK,[Ship Cost] as COST FROM YourTable) SQ
I made two queries that I thought should have the same result:
SELECT COUNT(*) FROM (
SELECT DISTINCT ON (id1) id1, value
FROM (
SELECT table1.id1, table2.value
FROM table1
JOIN table2 ON table1.id1=table2.id
WHERE table2.value = '1')
AS result1 ORDER BY id1)
AS result2;
SELECT COUNT(*) FROM (
SELECT DISTINCT ON (id1) id1, value
FROM (
SELECT table1.id1, table2.value
FROM table1
JOIN table2 ON table1.id1=table2.id
)
AS result1 ORDER BY id1)
AS result2
WHERE value = '1';
The only difference being that one had the WHERE clause inside SELECT DISTINCT ON, and the other outside that, but inside SELECT COUNT. But the results were not the same. I don't understand why the position of the WHERE clause should make a difference in this case. Can anyone explain? Or is there a better way to phrase this question?
here's a good way to look at this:
SELECT DISTINCT ON (id) id, value
FROM (select 1 as id, 1 as value
union
select 1 as id, 2 as value) a;
SELECT DISTINCT ON (id) id, value
FROM (select 1 as id, 1 as value
union
select 1 as id, 2 as value) a
WHERE value = 2;
The problem has to do with the unique conditions and what is visible where. It is behavior by design.