The query below returns an incorrect number of rows:
SELECT keyName.left(keyName.indexOf('_Configuration/')) AS keyValue,
address.zip AS zip,
customer_map AS customerMap
FROM CUSTOMERS
WHERE customer_group = 'xyz'
GROUP BY keyValue, customerMap, zip
The column data types are as follows:
keyName String
address EMBEDDEDMAP
customer_map EMBEDDEDMAP
However, if the GROUP BY is changed to use the column names instead of the aliases, the query returns the correct number of rows:
SELECT keyName.left(keyName.indexOf('_Configuration/')) AS keyValue,
address.zip AS zip,
customer_map AS customerMap
FROM CUSTOMERS
WHERE customer_group = 'xyz'
GROUP BY keyValue, customer_map, address.zip
Interestingly, using an alias for keyName.left(...) does not affect GROUP BY, but a function such as
ifnull(keyName, 'ABC') AS keyValue
makes the query return an incorrect number of rows.
Note that the first query does not produce any errors or warnings; it just returns an incorrect number of rows.
Is that the expected behaviour of GROUP BY?
Unfortunately, the OrientDB docs do not give many details about GROUP BY.
The query execution flow is as follows:
1. find query target (indexes first, then fallback on cluster iterators)
2. iterate over the target and filter based on WHERE condition (excluding conditions already matched by indexes)
3. calculate projections on filtered records
4. apply UNWIND and EXPAND
5. apply Group By
6. apply Order By
7. apply SKIP and LIMIT
You can use a subquery and group in the outer query:
select keyValue,zip,customerMap from (SELECT keyName.left(keyName.indexOf('_Configuration/')) AS keyValue,
address.zip AS zip,
customer_map AS customerMap
FROM CUSTOMERS
WHERE customer_group = 'xyz' )
GROUP BY keyValue, customerMap, zip
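The stage order listed above can be sketched as a toy pipeline. This is only an illustration of the ordering in plain Python, not OrientDB's implementation; the function run_query, its parameters, and the sample records are all hypothetical, and the UNWIND/EXPAND stage is left out.

```python
def run_query(records, where, project, group_keys, order_key=None, skip=0, limit=None):
    """Toy pipeline mirroring the stage order described above (hypothetical helper)."""
    # Iterate over the target and filter on the WHERE condition.
    filtered = [r for r in records if where(r)]
    # Calculate projections on the filtered records; aliases come into existence here.
    projected = [project(r) for r in filtered]
    # Group By: keep the first row seen for each distinct combination of grouping keys.
    groups = {}
    for row in projected:
        key = tuple(row[k] for k in group_keys)
        groups.setdefault(key, row)
    rows = list(groups.values())
    # Order By (optional in this sketch).
    if order_key is not None:
        rows.sort(key=lambda row: row[order_key])
    # SKIP and LIMIT.
    end = None if limit is None else skip + limit
    return rows[skip:end]


def project(record):
    # Plain-Python stand-in for keyName.left(keyName.indexOf('_Configuration/')) and address.zip.
    key_name = record["keyName"]
    return {
        "keyValue": key_name[:key_name.index("_Configuration/")],
        "zip": record["address"]["zip"],
    }


# Made-up sample records, for illustration only.
customers = [
    {"keyName": "A_Configuration/1", "customer_group": "xyz", "address": {"zip": "10001"}},
    {"keyName": "A_Configuration/2", "customer_group": "xyz", "address": {"zip": "10001"}},
    {"keyName": "B_Configuration/1", "customer_group": "xyz", "address": {"zip": "20002"}},
    {"keyName": "C_Configuration/1", "customer_group": "abc", "address": {"zip": "30003"}},
]

result = run_query(
    customers,
    where=lambda r: r["customer_group"] == "xyz",
    project=project,
    group_keys=("keyValue", "zip"),
    order_key="keyValue",
)
```

In this sketch the grouping stage only ever sees projected rows, which is the same situation the outer GROUP BY is in when it runs over the subquery's result.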
I have the table below and subsequent query.
TABLE: sales
This works (every selected field is either in the GROUP BY or aggregated):
SELECT
brand,
SUM (total)
FROM
sales
GROUP BY
brand;
What if I want to select both brand and segment, yet group by brand alone? Like below:
SELECT
brand,
segment,
SUM (total)
FROM
sales
GROUP BY
brand;
Suppose you have a group of rows like the following:
brand | segment | total
aaa   | parent  | 100
aaa   | student | 50
And you run the query you suggest:
SELECT
brand,
segment,
SUM (total)
FROM
sales
GROUP BY
brand;
By grouping, it returns exactly one row for the group defined by brand = 'aaa'.
What should it return for the segment column? It can only return one value, either 'parent' or 'student'. Which one? How can SQL know which one you want?
The single-value rule in SQL is that when you run a query with GROUP BY, all columns of the select-list must be either in aggregate functions, or else in the GROUP BY. Otherwise the column is ambiguous, because SQL cannot guess which value from the group you want.
So this query is okay, because it uses the aggregate function MAX() to pick one value from the segment column:
SELECT
brand,
MAX(segment),
SUM (total)
FROM
sales
GROUP BY
brand;
Or this query is okay because it outputs multiple rows, one for each distinct pair of brand & segment.
SELECT
brand,
segment,
SUM (total)
FROM
sales
GROUP BY
brand, segment;
Or this query is okay because it concatenates all the non-null values in segment into a comma-separated string. The comma-separated string is therefore a single string value, and that is okay by the single-value rule.
SELECT
brand,
STRING_AGG(segment, ','),
SUM (total)
FROM
sales
GROUP BY
brand;
(STRING_AGG() is a PostgreSQL function. Other brands of database have similar functions but by another name. MySQL and SQLite for example use GROUP_CONCAT().)
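The three valid queries above can be tried against the two sample rows with a small script. This is a sketch that assumes an in-memory SQLite database driven from Python, so it uses GROUP_CONCAT() rather than STRING_AGG(). Note that SQLite itself is lenient about bare columns in grouped queries, so it is not a good engine for reproducing the error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (brand TEXT, segment TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("aaa", "parent", 100), ("aaa", "student", 50)],
)

# One row per brand; MAX() picks a single segment value for the group.
max_rows = conn.execute(
    "SELECT brand, MAX(segment), SUM(total) FROM sales GROUP BY brand"
).fetchall()

# One row per distinct (brand, segment) pair.
pair_rows = conn.execute(
    "SELECT brand, segment, SUM(total) FROM sales "
    "GROUP BY brand, segment ORDER BY segment"
).fetchall()

# One row per brand; all segments are folded into one comma-separated string.
# The order of the values inside the string is not guaranteed.
concat_rows = conn.execute(
    "SELECT brand, GROUP_CONCAT(segment, ','), SUM(total) FROM sales GROUP BY brand"
).fetchall()
```

Each result satisfies the single-value rule: every output column is either a grouping column or the result of an aggregate.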
For Example:
SELECT * FROM Customers
WHERE Country IN ('Germany', 'France', 'UK')
I want to LIMIT 1 for each of the countries in my IN clause so that I only see a total of 3 rows: one customer per country (1 Germany, 1 France, 1 UK). Is there a simple way to do that?
Normally a simple GROUP BY would suffice for this kind of problem. However, because you want to include ALL of the columns in the result, we can use the ROW_NUMBER() window function to provide a value to filter on.
As a general rule, specify the column to sort on (ORDER BY) for all windowing or paged queries so that the result is repeatable.
As no schema has been supplied, I have used Name as the field to sort on for the window. Please replace it with any other field you prefer (or update the question); the PK is a good candidate if you have nothing else to go on.
SELECT * FROM
(
SELECT *
, ROW_NUMBER() OVER(PARTITION BY Country ORDER BY Name) AS _rn
FROM Customers
WHERE Country IN ('Germany', 'France', 'UK')
) AS ranked
WHERE _rn = 1
The PARTITION BY forces the ROW_NUMBER to be counted across all records with the same Country value, starting at 1, so in this case we only select the rows that get a row number (aliased as _rn) of 1.
The Country filter could instead go in the outer query if you prefer, but the _rn filter cannot move inward: ROW_NUMBER() can only be specified in the SELECT or ORDER BY clauses of a query, so to use it as a filter criterion we are forced to wrap the results in some way.
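As a quick check of the ROW_NUMBER() approach, here is a minimal sketch run against an in-memory SQLite database from Python (window functions need SQLite 3.25 or later). The table layout and the customer names are made up for illustration, since the question supplied no schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Name TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("Hans", "Germany"),
        ("Anna", "Germany"),
        ("Luc", "France"),
        ("Claire", "France"),
        ("Tom", "UK"),
        ("Maria", "Spain"),  # filtered out by the IN clause
    ],
)

rows = conn.execute(
    """
    SELECT Name, Country
    FROM (
        SELECT Name, Country,
               ROW_NUMBER() OVER (PARTITION BY Country ORDER BY Name) AS _rn
        FROM Customers
        WHERE Country IN ('Germany', 'France', 'UK')
    ) AS ranked
    WHERE _rn = 1       -- keep only the first row of each country partition
    ORDER BY Country
    """
).fetchall()
```

With this sample data the query keeps the alphabetically first Name in each country, because that is the ordering given to the window.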
I'm trying to count the number of unique pool operators for every permit # in a table but am having trouble putting this value in a new column dedicated to that count.
So I have 2 tables: doh_analysis; doh_pools.
Both of these tables have a "permit" column (TEXT), but doh_analysis has about 1000 rows with duplicates in the permit column and occasionally different values in the "pooloperator" column (TEXT).
I'm trying to fill a column "operator_count" in the table "doh_pools" with a count of unique values in "pooloperator" for each permit #.
So I tried the following code but am getting a syntax error at or near "(":
update doh_pools
set operator_count = select count(distinct doh_analysis.pooloperator)
from doh_analysis
where doh_analysis.permit ilike doh_pools.permit;
When I remove the "select" from before the "count" I get "SQL Error [42803]: ERROR: aggregate functions are not allowed in UPDATE".
I can successfully query a list of distinct permit-pooloperator pairs using:
select distinct permit, pooloperator
from doh_analysis;
And I can query the # of unique pooloperators per permit one at a time using:
select count(distinct pooloperator)
from doh_analysis
where permit ilike '52-60-03054';
But I'm struggling to insert a count of unique pairs for each permit # in the operator_count column.
Is there a way to do this?
There is certainly a better way of doing this, but I accomplished my goal by creating 2 intermediate tables and then updating the target table with values from the 2nd intermediate table, like so:
select distinct permit, pooloperator
into doh_pairs
from doh_analysis;
select permit, count(distinct pooloperator)
into doh_temp
from doh_pairs
group by permit;
select count(distinct permit)
from doh_temp;
update doh_pools
set operator_count = doh_temp.count
from doh_temp
where doh_pools.permit ilike doh_temp.permit
and doh_pools.permit is not NULL
returning count;
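For comparison, the count can also be written as a single UPDATE with a parenthesized, correlated scalar subquery; the original attempt failed because the subquery was not wrapped in parentheses. The sketch below assumes an in-memory SQLite database driven from Python, with made-up sample rows, and compares permits with = because SQLite has no ILIKE. In PostgreSQL the ilike comparison from the question could be kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE doh_analysis (permit TEXT, pooloperator TEXT);
    CREATE TABLE doh_pools (permit TEXT, operator_count INTEGER);

    INSERT INTO doh_analysis VALUES
        ('52-60-03054', 'Alice'),
        ('52-60-03054', 'Bob'),
        ('52-60-03054', 'Alice'),
        ('52-60-00001', 'Carol');

    INSERT INTO doh_pools (permit) VALUES
        ('52-60-03054'), ('52-60-00001'), ('52-60-99999');
    """
)

# The subquery is evaluated once per doh_pools row and must be in parentheses.
conn.execute(
    """
    UPDATE doh_pools
    SET operator_count = (
        SELECT COUNT(DISTINCT a.pooloperator)
        FROM doh_analysis AS a
        WHERE a.permit = doh_pools.permit
    )
    """
)

counts = dict(conn.execute("SELECT permit, operator_count FROM doh_pools").fetchall())
```

One difference from the intermediate-table version: permits with no matching rows in doh_analysis are set to 0 here, whereas the UPDATE ... FROM form leaves them untouched.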
Say I have a query like this:
SELECT
car.id,
car.make,
car.model,
car.vin,
car.year,
car.color
FROM car GROUP BY car.make
I want to group the result by make so I can eliminate any duplicate makes. I'm essentially trying to do a SELECT DISTINCT. But I get this error:
ERROR column must appear in the GROUP BY clause or be used in an aggregate function
It seems silly to group by each column when I don't want to see any of them in a group. How do I get around this?
Instead of GROUP BY, use DISTINCT ON:
SELECT DISTINCT ON (c.make) c.*
FROM car c
ORDER BY c.make;
This returns one row for each make. Without further ordering, which row you get is arbitrary. You can include a second key in the ORDER BY to determine the particular row you want (cheapest, oldest, etc.).
All column names in the SELECT list must appear in the GROUP BY clause unless the name is used only in an aggregate function. PostgreSQL only lets you omit from the GROUP BY clause columns that are functionally dependent on columns that are in the GROUP BY (for example, when grouping by the table's primary key).
I have a table in PostgreSQL with names (more than 1 million rows), but it also contains many duplicates. I select 3 fields: id, name, metadata.
I want to select them randomly with ORDER BY RANDOM() and LIMIT 1000, and I do this in many steps to save some memory in my PHP script.
But how can I do that so that it only gives me a list with no duplicate names?
For example [1,"Michael Fox","2003-03-03,34,M,4545"] will be returned but not [2,"Michael Fox","1989-02-23,M,5633"]. The name field is the most important and must be unique in the list every time I do the select, and the selection must be random.
I tried GROUP BY name, but then it expects me to have id and metadata in the GROUP BY as well, or in an aggregate function, and I don't want them to be filtered or aggregated in any way.
Does anyone know how to fetch many columns but do a distinct on only one column?
To do a distinct on only one (or n) column(s):
select distinct on (name)
name, col1, col2
from names
This will return any of the rows containing the name. If you want to control which of the rows will be returned you need to order:
select distinct on (name)
name, col1, col2
from names
order by name, col1
This returns, for each name, the first row when ordered by col1.
From the PostgreSQL documentation on DISTINCT ON:
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.
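The "first row of each set" semantics described above can be mimicked in a few lines of plain Python: sort by the DISTINCT ON key followed by the tie-breaking key, then keep the first row of each run of equal keys. This is only a sketch of the semantics, not how PostgreSQL implements it; the helper distinct_on and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def distinct_on(rows, on, order_by):
    """Keep the first row per distinct `on` value after sorting by (on, order_by)."""
    # The DISTINCT ON key leads the sort, mirroring the rule that it must match
    # the leftmost ORDER BY expression.
    ordered = sorted(rows, key=lambda r: (r[on], r[order_by]))
    # groupby yields consecutive runs of equal keys; next() takes the first row of each run.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(on))]


# Made-up sample rows, for illustration only.
names = [
    {"name": "Michael Fox", "col1": 2, "col2": "b"},
    {"name": "Michael Fox", "col1": 1, "col2": "a"},
    {"name": "Ann Lee", "col1": 5, "col2": "c"},
]

result = distinct_on(names, on="name", order_by="col1")
```

Without the secondary sort key, which of the two "Michael Fox" rows survives would depend only on input order, which is the unpredictability the documentation warns about.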
Does anyone know how to fetch many columns but do a distinct on only one column?
You want the DISTINCT ON clause.
You didn't provide sample data or a complete query so I don't have anything to show you. You want to write something like:
SELECT DISTINCT ON (name) id, name, metadata FROM the_table;
This will return an unpredictable (but not "random") set of rows. If you want to make it predictable, add an ORDER BY per Clodaldo's answer. If you want the row chosen for each name to be truly random, use ORDER BY name, random(); the DISTINCT ON expression has to come first in the ORDER BY.
To do a distinct on n columns:
select distinct on (col1, col2) col1, col2, col3, col4 from names
SELECT NAME,MAX(ID) as ID,MAX(METADATA) as METADATA
from SOMETABLE
GROUP BY NAME
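Note that with MAX() applied to each column separately, the ID and METADATA in one output row can come from different source rows. If whole rows need to stay intact, one alternative is to number the rows of each name in random order and keep the first. The sketch below assumes an in-memory SQLite database driven from Python (window functions need SQLite 3.25 or later); the table name names and the third sample row are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT, metadata TEXT)")
source_rows = [
    (1, "Michael Fox", "2003-03-03,34,M,4545"),
    (2, "Michael Fox", "1989-02-23,M,5633"),
    (3, "Ann Lee", "1990-01-01,F,1111"),  # invented sample row
]
conn.executemany("INSERT INTO names VALUES (?, ?, ?)", source_rows)

rows = conn.execute(
    """
    SELECT id, name, metadata
    FROM (
        SELECT id, name, metadata,
               -- number the rows of each name in random order
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY random()) AS rn
        FROM names
    ) AS picked
    WHERE rn = 1          -- one intact row per name
    ORDER BY random()     -- shuffle the unique names
    LIMIT 1000
    """
).fetchall()
```

Every returned row is an unmodified row of the table, and each name appears at most once; which of the duplicate rows is kept changes from run to run.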