ERROR: array must not contain nulls (PostgreSQL)

My query is:
SELECT
    id,
    ARRAY_AGG(session_os)::integer[]
FROM t
GROUP BY id
HAVING ARRAY_AGG(session_os)::integer[] && ARRAY[1,NULL]
It's giving ERROR: array must not contain nulls
What I actually want is to get rows like:
 id  | Session_OS
-----+-------------
 641 | {1, 2}
 642 | {NULL, 2}
 643 | {NULL}
Kindly check the sample data here
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=7793fa763a360bf7334787e4249d6107

The && operator does not support NULL values, so you need another approach. For example, you could join the data to the table first. This gives you the ids which are linked to your required data. In a second step you can aggregate all values for these ids.
Step-by-step demo: db<>fiddle
SELECT
    id,
    ARRAY_AGG(session_os)                         -- 4
FROM t
WHERE id IN (                                     -- 3
    SELECT id
    FROM t
    JOIN (
        SELECT unnest(ARRAY[1, null]) AS a        -- 1
    ) s ON s.a IS NOT DISTINCT FROM t.session_os  -- 2
)
GROUP BY id
1. Create a table or query result which contains your relevant data, including the NULL value.
2. Join the data, including the NULL value, using the IS NOT DISTINCT FROM operator, which takes NULL into account.
3. Now you have fetched the relevant id values, which can be used in the WHERE filter.
4. Finally you can do your aggregation.
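Run against the sample data from the question, the query should then return the desired rows, i.e. something like:
 id  | array_agg
-----+-----------
 641 | {1,2}
 642 | {NULL,2}
 643 | {NULL}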

The extension intarray installs its own && operator for int[]; this operator doesn't allow NULLs, and it takes precedence over the built-in && operator.
If you are not using intarray, you can just uninstall it (except on dbfiddle, where you can't). If you only use it occasionally, I think it is best to install it in its own schema which is not in your search path; then you need to schema-qualify its operators when you do need them.
Alternatively, you can leave intarray in place and schema-qualify the normal built-in operators when you need those specifically, as shown below.
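A minimal sketch of that schema qualification, using PostgreSQL's OPERATOR() syntax to pin the built-in array overlap operator from pg_catalog. Note that the built-in && accepts NULL elements but never treats NULL as matching NULL, so a {NULL} row still won't match ARRAY[1,NULL]; for that, see the IS NOT DISTINCT FROM approach above:
SELECT
    id,
    ARRAY_AGG(session_os)::integer[]
FROM t
GROUP BY id
-- OPERATOR(pg_catalog.&&) bypasses intarray's && and uses the built-in
-- anyarray overlap operator, which does not reject NULL elements
HAVING ARRAY_AGG(session_os)::integer[] OPERATOR(pg_catalog.&&) ARRAY[1, NULL];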


Postgres Union not union'ing everything

This is using Postgres 12
I have a table that stages an import from a spreadsheet. It has a structure like
 code   | yes  | no
--------+------+------
 XX1000 | yes  | null
 ZX1001 | null | no
I am trying to get all the results in a query so I can do other stuff with it.
When I run
select substring(code, 3), 'Y' from table1 where yes = 'yes' and no is null
I get the correct number of results (let's say 100).
When I run
select substring(code, 3), 'N' from table1 where yes is null and no is not null
I get the proper number of results (let's say 6).
If I run
select substring(code, 3), 'Y' from table1 where yes = 'yes' and no is null
UNION
select substring(code, 3), 'N' from table1 where yes is null and no is not null
I get 102 results, with 4 results missing from the yes query. Extracting all the results and comparing in Excel, I can see that no values in the substring results are duplicated across both queries (e.g. each id is in the result set of the yes query once). I can also guarantee there are no overlapping substring(code, 3) values, since the spreadsheet is populated from a different system where the values after the first two characters are the id column in the table (I also verified that the second query, run separately, returns distinct values compared to the first query).
Running a UNION ALL gives me 106 results that are all unique.
What is going on here? I am so confused why the UNION is dropping unique results.
According to the documentation, UNION effectively appends the result of query2 to the result of query1 (although there is no guarantee that this is the order in which the rows are actually returned). Furthermore, it eliminates duplicate rows from its result, in the same way as DISTINCT, unless UNION ALL is used.
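Note that this deduplication also collapses duplicate rows coming from the same input query, not only rows that appear in both branches, so the four rows missing from the yes branch are worth checking for repeated (substring(code, 3), 'Y') pairs. A minimal illustration with made-up values:
SELECT x FROM (VALUES ('a'), ('a'), ('b')) AS t(x)
UNION
SELECT x FROM (VALUES ('c')) AS t(x);
-- 3 rows: the duplicate 'a' inside the first branch is removed

SELECT x FROM (VALUES ('a'), ('a'), ('b')) AS t(x)
UNION ALL
SELECT x FROM (VALUES ('c')) AS t(x);
-- 4 rows: UNION ALL keeps both 'a' rows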

Sphinx seems to force an order on ID?

I added a new field to my index (weight), which is an integer-based value I want to sort on.
I added it to the select and declared it as sql_attr_uint.
When I call it in my query it shows up. However, when I try to sort on it I get strange behavior: it always sorts on the record ID instead, so ordering on ID is identical to ordering on Weight.
I've checked the index pretty thoroughly and can't find a reason why. Does Sphinx auto-sort on record ID somehow?
I know the details are fairly sparse, yet I'm hoping there is some basic explanation I'm missing before asking anyone to delve in further.
As an update: I don't believe an ID sort has been "imposed" on the index in any way inadvertently, since I can order by a number of other fields, both integer and text, and the results are returned independent of the ID values (e.g. sorting on last name, record #100 Adams will come before record #1 Wyatt).
Yet ordering on Weight always returns the same order as ordering by ID, whether ascending or descending. There is no error about the field or index not existing or not being sortable, and the order request isn't ignored (desc and asc both work); it just ignores that particular field value and uses the ID instead.
Further update: the Weight value is indexed via a join to the main table indexed by Sphinx, in the following manner:
sql_attr_multi = uint value_Weight from ranged-query; \
SELECT j.id AS ID, IF(s.Weight > 0, 1, 0) AS Weight \
FROM Customers j \
INNER JOIN CustomerSources s ON j.customer_id = s.customer_id \
AND j.id BETWEEN $start AND $end \
ORDER BY s.id; \
SELECT MIN(id), MAX(id) FROM Customers
Once indexed, sorting on id and on value_Weight returns the same order, whereas Weight and ID are unrelated.
Ah yes, from
http://sphinxsearch.com/docs/current/mva.html
"Filtering and group-by (but not sorting) on MVA attributes is supported."
You can't sort by an MVA attribute (which, as noted in the comments, makes sense: MVAs usually contain many values, and sorting by many values is rather 'tricky').
When you try, it simply fails, so sorting falls back on the 'natural' order of the index, which is usually by ID.
Use sql_attr_uint instead:
http://sphinxsearch.com/docs/current/conf-sql-attr-uint.html
(but this will probably mean rewriting the sql_query to perform the JOIN on CustomerSources, as sketched below)
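A rough sketch of that rewrite, under the assumption that the join can be folded into the main query (the column list besides id and the derived flag is illustrative): the weight then arrives as a single scalar per document, declared with sql_attr_uint = value_Weight in place of the sql_attr_multi block:
SELECT j.id,
       -- ...whatever other fields/attributes the index already selects...
       IF(s.Weight > 0, 1, 0) AS value_Weight
FROM Customers j
INNER JOIN CustomerSources s ON j.customer_id = s.customer_id
A plain uint attribute is sortable, so ORDER BY value_Weight should then behave like your other integer fields.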

PostgreSQL - must appear in the GROUP BY clause or be used in an aggregate function

I am getting this error in pg production mode, but it works fine in sqlite3 development mode.
ActiveRecord::StatementInvalid in ManagementController#index
PG::Error: ERROR: column "estates.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "estates".* FROM "estates" WHERE "estates"."Mgmt" = ...
^
: SELECT "estates".* FROM "estates" WHERE "estates"."Mgmt" = 'Mazzey' GROUP BY user_id
@myestate = Estate.where(:Mgmt => current_user.Company).group(:user_id).all
If user_id is the PRIMARY KEY then you need to upgrade PostgreSQL; newer versions will correctly handle grouping by the primary key.
If user_id is neither unique nor the primary key for the 'estates' relation in question, then this query doesn't make much sense, since PostgreSQL has no way to know which value to return for each column of estates where multiple rows share the same user_id. You must use an aggregate function that expresses what you want, like min, max, avg, string_agg, array_agg, etc or add the column(s) of interest to the GROUP BY.
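For example, a possible rewrite of the failing query using an aggregate (column names taken from the error message; adjust to whatever you actually need per group):
-- aggregate the per-group values instead of selecting estates.*
SELECT user_id, array_agg(id) AS estate_ids
FROM estates
WHERE "Mgmt" = 'Mazzey'
GROUP BY user_id;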
Alternately you can rephrase the query to use DISTINCT ON and an ORDER BY if you really do want to pick a somewhat arbitrary row, though I really doubt it's possible to express that via ActiveRecord.
Some databases - including SQLite and MySQL - will just pick an arbitrary row. This is considered incorrect and unsafe by the PostgreSQL team, so PostgreSQL follows the SQL standard and considers such queries to be errors.
If you have:
 col1 | col2
------+------
 fred |   42
 bob  |    9
 fred |   44
 fred |   99
and you do:
SELECT col1, col2 FROM mytable GROUP BY col1;
then it's obvious that you should get the row:
bob 9
but what about the result for fred? There is no single correct answer to pick, so the database will refuse to execute such unsafe queries. If you wanted the greatest col2 for any col1 you'd use the max aggregate:
SELECT col1, max(col2) AS max_col2 FROM mytable GROUP BY col1;
I recently moved from MySQL to PostgreSQL and encountered the same issue. Just for reference, the best approach I've found is to use DISTINCT ON as suggested in this SO answer:
Elegant PostgreSQL Group by for Ruby on Rails / ActiveRecord
This will let you get one record for each unique value in your chosen column that matches the other query conditions:
MyModel.where(:some_col => value).select("DISTINCT ON (unique_col) *")
I prefer DISTINCT ON because I can still get all the other column values in the row. DISTINCT alone will only return the value of that specific column.
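For reference, a hedged sketch of the SQL that kind of select corresponds to, using the estates columns from this question (the first ORDER BY expression must match the DISTINCT ON expression; any further ones decide which row is kept per user_id):
SELECT DISTINCT ON (user_id) *
FROM estates
WHERE "Mgmt" = 'Mazzey'
ORDER BY user_id, id;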
After often receiving the error myself, I realised that Rails (I am using Rails 4) automatically adds an ORDER BY id at the end of your grouping query. This often results in the error above, so make sure you append your own .order(:group_by_column) at the end of your Rails query. Hence you will have something like this:
@problems = Problem.select('problems.username, sum(problems.weight) as weight_sum').group('problems.username').order('problems.username')
@myestate1 = Estate.where(:Mgmt => current_user.Company)
@myestate = @myestate1.select("DISTINCT(user_id)")
This is what I did.

SqlAlchemy: count of distinct over multiple columns

I can't do:
>>> session.query(
...     func.count(distinct(Hit.ip_address, Hit.user_agent))).first()
TypeError: distinct() takes exactly 1 argument (2 given)
I can do:
session.query(
    func.count(distinct(func.concat(Hit.ip_address, Hit.user_agent)))).first()
Which is fine (count of unique users in a 'pageload' db table).
This isn't correct in the general case; e.g. it will give a count of 1 instead of 2 for the following table:
col_a | col_b
----------------
xx | yy
xxy | y
Is there any way to generate the following SQL (which is valid in postgresql at least)?
SELECT count(distinct (col_a, col_b)) FROM my_table;
distinct() accepts more than one argument when appended to the query object:
session.query(Hit).distinct(Hit.ip_address, Hit.user_agent).count()
It should generate something like:
SELECT count(*) AS count_1
FROM (SELECT DISTINCT ON (hit.ip_address, hit.user_agent)
hit.ip_address AS hit_ip_address, hit.user_agent AS hit_user_agent
FROM hit) AS anon_1
which is even a bit closer to what you wanted.
The exact query can be produced using the tuple_() construct:
from sqlalchemy import distinct, func, tuple_

session.query(
    func.count(distinct(tuple_(Hit.ip_address, Hit.user_agent)))).scalar()
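This should emit the SQL you asked about:
SELECT count(DISTINCT (hit.ip_address, hit.user_agent)) AS count_1
FROM hit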
It looks like sqlalchemy's distinct() accepts only one column or expression.
Another way around it is to use group_by and count. This should be more efficient than using concat of two columns: with group by, the database can use indexes if they exist:
session.query(Hit.ip_address, Hit.user_agent).\
group_by(Hit.ip_address, Hit.user_agent).count()
The generated query would still look different from what you asked about:
SELECT count(*) AS count_1
FROM (SELECT hittable.user_agent AS hittable_user_agent,
             hittable.ip_address AS hittable_ip_address
      FROM hittable
      GROUP BY hittable.user_agent, hittable.ip_address) AS anon_1
You can add a separator character in the concat function in order to keep the combinations distinct. Taking your example as reference, it would be:
session.query(
    func.count(distinct(func.concat(Hit.ip_address, "-", Hit.user_agent)))).first()

Strange result when using a WHERE filter in CQL (Cassandra)

I have a column family using counters, created with the command below (I use bigint for KEY so I can filter on it when querying):
CREATE TABLE BannerCount (
KEY bigint PRIMARY KEY
) WITH
comment='' AND
comparator=text AND
read_repair_chance=0.100000 AND
gc_grace_seconds=864000 AND
default_validation=counter AND
min_compaction_threshold=4 AND
max_compaction_threshold=32 AND
replicate_on_write='true' AND
compaction_strategy_class='SizeTieredCompactionStrategy' AND
compression_parameters:sstable_compression='SnappyCompressor';
But when I insert data into this column family and SELECT with a WHERE clause to filter the data, the results I retrieve are very strange, like this:
Using the query:
select count(1) From BannerCount where KEY > -1;
 count
-------
    71
Using the query:
select count(1) From BannerCount where KEY > 0;
 count
-------
     3
Using the query:
select count(1) From BannerCount;
 count
-------
   122
What is happening with my query? Can anyone tell me why I get these results?
To understand the reason for this, you should understand Cassandra's data model. You're probably using RandomPartitioner here, so each of these KEY values in your table is being hashed to a token value, and the rows get stored in a distributed way around your ring.
So finding all rows whose key has a higher value than X isn't the sort of query Cassandra is optimized for. You should probably be keying your rows on some other value, and then using either wide rows for your bigint values (since columns are sorted) or put them in a second column, and create an index on it.
To explain in a little more detail why your results seem strange: CQL 2 implicitly turns "KEY >= X" into "token(KEY) >= token(X)", so that a querier can iterate through all the rows in a somewhat-efficient way. So really, you're finding all the rows whose hash is greater than the hash of X. See CASSANDRA-3771 for how that confusion is being resolved in CQL 3. That said, the proper fix for you is to structure your data according to the queries you expect to be running on it.
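In CQL 3 that hashing becomes explicit: you query by token() ranges, which is the only kind of key-range scan the partitioner really supports. A small illustration against the table above (assuming CQL 3):
-- compare partitioner tokens instead of raw key values; this walks rows
-- in ring order rather than numeric key order
SELECT COUNT(1) FROM BannerCount WHERE token(KEY) > token(0);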