This is blowing my mind.
All I want to do is basic string comparison on a long varchar field.
I have a table of approx. 12M records.
If I query for MY_FIELD='a string', I get a count of 25947, which seems about right.
If I query for MY_FIELD!='a string', I get a count of 989.
Shouldn't these 2 counts add up to the full table size of 12M?
And in how many of those rows is MY_FIELD set to NULL?
a. select count(*) from mytable;
b. select count(*) from mytable where my_field is null;
c. select count(*) from mytable where my_field is not null;
d. select count(*) from mytable where my_field = 'some value';
e. select count(*) from mytable where my_field != 'some value';
NULL is neither equal nor unequal to any value (including NULL), so I would expect d + e to equal c, and b + c to equal a.
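To see all four buckets in one pass (a sketch, assuming PostgreSQL 9.4+ for the FILTER clause, with the table and column names from the question):
SELECT count(*) AS total,
count(*) FILTER (WHERE my_field = 'a string') AS eq_count,
count(*) FILTER (WHERE my_field != 'a string') AS ne_count,
count(*) FILTER (WHERE my_field IS NULL) AS null_count
FROM mytable;
-- eq_count + ne_count + null_count should equal total
And if you want NULL rows counted as "not equal", IS DISTINCT FROM treats NULL like an ordinary value:
SELECT count(*) FROM mytable WHERE my_field IS DISTINCT FROM 'a string';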
I have two tables and I am using UNION to combine their result sets. My problem is that each SELECT statement within a UNION must have the same number of columns and data types, and I don't have the same number of columns, so I am creating null columns:
select
d.deal_id as order_id,
EXISTS(select * from table1 c where d.user_id = c.user_id) as IsUser --this returns a boolean value
from table1 d
union
select
cast(o.id as varchar) as order_id,
coalesce('No_user'::text,'0'::text) as IsUser --I get an error: UNION types boolean and character varying cannot be matched
from table2 o
How can I create a null column in table2 that matches the boolean data type of table1?
By putting NULL into the SELECT list of the second query.
Generally (in SQL) the data type of each column is dictated by the first query in the union, and the other queries must match it. You don't need column aliases for subsequent queries either, but they may help make a query more readable:
select
d.deal_id as order_id,
EXISTS(select * from table1 c where d.user_id = c.user_id) as IsUser --this returns a boolean value
from table1 d
union
select
cast(o.id as varchar) as order_id,
null --IsUser
from table2 o
If you have a case where the type of the column in the first query is different to what you want in the output, you cast the first column:
select 1::boolean as boolcol
UNION
select true
This will ensure that the column is boolean, rather than failing with "UNION types integer and boolean cannot be matched". Remember also that, should you need to, NULL can be cast to a type, e.g. select null::boolean as boolcol
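A minimal illustration of both points (assuming PostgreSQL):
SELECT true AS IsUser
UNION
SELECT null; -- works: the bare NULL adopts boolean from the first branch

SELECT null::boolean AS IsUser; -- NULL cast to an explicit type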
I have a PostgreSQL function similar to this:
CREATE OR REPLACE FUNCTION dbo.MyTestFunction(
_ID INT
)
RETURNS dbo.MyTable AS
$$
SELECT *,
(SELECT Name FROM dbo.MySecondTable WHERE RecordID = PersonID)
FROM dbo.MyTable
WHERE PersonID = _ID
$$ LANGUAGE SQL STABLE;
I would really like to NOT have to replace the RETURNS dbo.MyTable AS with something like:
RETURNS TABLE(
col1 INT,
col2 TEXT,
col3 BOOLEAN,
col4 TEXT
) AS
and list out all the columns of MyTable and Name of MySecondTable. Is this something that can be done? Thanks.
--EDIT--
To clarify I have to return ALL columns in MyTable and 1 column from MySecondTable. If MyTable has >15 columns, I don't want to have to list out all the columns in a RETURNS TABLE (col1.. coln).
You just list the columns that you want returned in the SELECT portion of your SQL statement:
SELECT t1.column1, t1.column2,
(SELECT Name FROM dbo.MySecondTable WHERE RecordID = PersonID)
FROM dbo.MyTable t1
WHERE PersonID = _ID
Now you'll just get column1, column2, and Name returned.
Furthermore, you'll probably find better performance using a LEFT OUTER JOIN in your FROM portion of the SQL statement as opposed to the correlated subquery you have now:
SELECT t1.column1, t1.column2, t2.Name
FROM dbo.MyTable t1
LEFT OUTER JOIN dbo.MySecondTable t2 ON
t2.RecordID = t1.PersonID
WHERE t1.PersonID = _ID
Took a bit of a guess on where RecordID and PersonID were coming from, but that's the general idea.
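If you really do need every column of MyTable plus Name without spelling them out, one possibility (a sketch, assuming PostgreSQL; the view name is invented for illustration) is to wrap the join in a view and return the view's row type:
CREATE VIEW dbo.MyTableWithName AS
SELECT t1.*, t2.Name
FROM dbo.MyTable t1
LEFT OUTER JOIN dbo.MySecondTable t2 ON t2.RecordID = t1.PersonID;

CREATE OR REPLACE FUNCTION dbo.MyTestFunction(_ID INT)
RETURNS SETOF dbo.MyTableWithName AS
$$
SELECT * FROM dbo.MyTableWithName WHERE PersonID = _ID;
$$ LANGUAGE SQL STABLE;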
I've got a table with a lot of columns in it and I want to run a query to find the most common value in each column.
Ordinarily for a single column, I'd run something like:
SELECT country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1
Does PostgreSQL have a built in function for doing this or can anyone suggest a query I could run to achieve this?
Using the same kind of query, for more than one column you can do:
SELECT *
FROM
(
SELECT country
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) country
,(
SELECT city
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) city
This works for any type and returns all the values in the same row, with each column keeping its original name.
For more columns, just add more subqueries:
,(
SELECT someOtherColumn
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) someOtherColumn
Edit:
You could achieve this with window functions as well, but it would be no better in performance or readability.
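For what it's worth, a sketch of the window-function variant for one column (assuming PostgreSQL) could be:
SELECT DISTINCT first_value(country) OVER (ORDER BY count(*) DESC) AS country
FROM users
GROUP BY country;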
Starting with PostgreSQL 9.4 there is an aggregate function for this:
mode() WITHIN GROUP (ORDER BY sort_expression)
returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)
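For example, with the users table from the question:
SELECT mode() WITHIN GROUP (ORDER BY country) AS top_country,
mode() WITHIN GROUP (ORDER BY city) AS top_city
FROM users;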
And for earlier versions, you could create one...
CREATE OR REPLACE FUNCTION mode_array(anyarray)
RETURNS anyelement AS
$BODY$
SELECT a FROM unnest($1) a GROUP BY 1 ORDER BY COUNT(1) DESC, 1 LIMIT 1;
$BODY$
LANGUAGE SQL IMMUTABLE;
CREATE AGGREGATE mode(anyelement)(
SFUNC = array_append, --Function to call for each row. Just builds the array
STYPE = anyarray,
FINALFUNC = mode_array, --Function to call after everything has been added to array
INITCOND = '{}'--Initialize an empty array when starting
) ;
Usage: SELECT mode(column) FROM table;
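So against the question's table, several columns can be handled in one scan:
SELECT mode(country) AS country, mode(city) AS city FROM users;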
If I were doing this, I'd write a query like this one:
(SELECT 'country', country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1)
UNION ALL
(SELECT 'city', city
FROM users
GROUP BY city
ORDER BY count(*) DESC
LIMIT 1)
-- etc.
Note the parentheses around each branch: they are required so that each ORDER BY ... LIMIT 1 applies to its own SELECT rather than to the union as a whole. It should also be noted that this only works if all the columns are of compatible types; if they are not, you'll probably need a different solution.
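One such workaround, when the types differ but can all be cast, is to cast each value to text in every branch, e.g. (using the age column from the answer below):
(SELECT 'age', age::text
FROM users
GROUP BY age
ORDER BY count(*) DESC
LIMIT 1)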
This window function version reads the users table (and the derived table) once each, whereas the correlated subquery version reads the users table once for each column. If the columns are many, as in the OP's case, my guess is that this version is faster.
select distinct on (country_count, age_count) *
from (
select
country,
count(*) over(partition by country) as country_count,
age,
count(*) over(partition by age) as age_count
from users
) s
order by country_count desc, age_count desc
limit 1
I made two queries that I thought should have the same result:
SELECT COUNT(*) FROM (
SELECT DISTINCT ON (id1) id1, value
FROM (
SELECT table1.id1, table2.value
FROM table1
JOIN table2 ON table1.id1=table2.id
WHERE table2.value = '1')
AS result1 ORDER BY id1)
AS result2;
SELECT COUNT(*) FROM (
SELECT DISTINCT ON (id1) id1, value
FROM (
SELECT table1.id1, table2.value
FROM table1
JOIN table2 ON table1.id1=table2.id
)
AS result1 ORDER BY id1)
AS result2
WHERE value = '1';
The only difference is that one has the WHERE clause inside the SELECT DISTINCT ON subquery, and the other outside it, inside the SELECT COUNT. But the results were not the same. I don't understand why the position of the WHERE clause should make a difference in this case. Can anyone explain? Or is there a better way to phrase this question?
Here's a good way to look at this:
SELECT DISTINCT ON (id) id, value
FROM (select 1 as id, 1 as value
union
select 1 as id, 2 as value) a;
SELECT DISTINCT ON (id) id, value
FROM (select 1 as id, 1 as value
union
select 1 as id, 2 as value) a
WHERE value = 2;
The difference comes down to when the filter is applied relative to the deduplication. With the filter inside, DISTINCT ON only ever chooses among rows whose value matches. With the filter outside, DISTINCT ON first picks one arbitrary row per id (nothing in the ORDER BY ties down which one), and the outer WHERE then discards every picked row whose value happens not to match. It is behavior by design.
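The OP's filter-outside case can be reproduced with the same data (a sketch; the ORDER BY is there only to make the otherwise arbitrary pick deterministic):
SELECT * FROM (
SELECT DISTINCT ON (id) id, value
FROM (select 1 as id, 1 as value
union
select 1 as id, 2 as value) a
ORDER BY id, value DESC -- DISTINCT ON keeps the value = 2 row
) s
WHERE value = 1; -- returns no rows: the kept row was then filtered away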
Let's say I have a table with ID int, VALUE string:
ID | VALUE
1  | abc
2  | abc
3  | def
4  | abc
5  | abc
6  | abc
If I do select value, count(*) from mytable group by value I should get:
VALUE | COUNT
abc   | 5
def   | 1
Now the tricky part: where the count is 1, I need to get that ID from the first table. Should I be using a CTE? Creating a result set with a null ID column and then running an update b.ID = a.ID where count = 1?
Or is there another easier way?
EDIT:
I want to have a result table like this:
ID   | VALUE | count
null | abc   | 5
3    | def   | 1
If your ID values are unique, you can simply check whether max(id) = min(id). If so, use either one; otherwise return null. Like this:
Select Case When Min(id) = Max(id) Then Min(id) Else Null End As Id,
Value, Count(*) As [Count]
From YourTable
Group By Value
Since you are already performing an aggregate, including the MIN and MAX functions is not likely to take any extra (noticeable) time. I encourage you to give this a try.
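Against the sample data above, that returns exactly the desired result:
Id   | Value | Count
null | abc   | 5
3    | def   | 1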
The way I would do it would indeed be with a CTE:
WITH grouped AS (
SELECT value, COUNT(*) AS cnt
FROM MyTable
GROUP BY value
HAVING COUNT(*) = 1
)
SELECT MyTable.ID, grouped.value, grouped.cnt
FROM MyTable
JOIN grouped ON grouped.value = MyTable.value
When using GROUP BY, you can follow it with a HAVING clause. And since each group kept by HAVING COUNT(*) = 1 contains exactly one row, MIN([ID]) is that row's ID:
SELECT MIN([ID]) AS [ID]
FROM table
GROUP BY [VALUE]
HAVING COUNT(*) = 1
Edit: with regard to your edited question, this uses some fun joins and unions:
CREATE TABLE #table
(ID int IDENTITY,
VALUE varchar(3))
INSERT INTO #table (VALUE)
VALUES('abc'),('abc'),('def'),('abc'),('abc'),('abc')
SELECT * FROM (
SELECT Null as ID,VALUE, COUNT(*) as [Count]
FROM #table
GROUP BY VALUE
HAVING COUNT(*) > 1
UNION ALL
SELECT t.ID,t.VALUE,p.Count FROM
#table t
JOIN
(SELECT VALUE, COUNT(*) as [Count]
FROM #table
GROUP BY VALUE
HAVING COUNT(*) = 1) p
ON t.VALUE=p.VALUE
) a
DROP TABLE #table
Maybe not the most efficient, but something like this works (the COUNT(*) filter belongs in a HAVING clause, not in WHERE):
SELECT MAX(Id) AS ID, Value
FROM Table
GROUP BY Value
HAVING COUNT(*) = 1