Hive: Expression Not In Group By Key

I created a table in Hive.
It has the following columns:
id bigint, rank bigint, date string
I want to get avg(rank) per month. This command works:
select a.lens_id, avg(a.rank)
from tableA a
group by a.lens_id, year(a.date_saved), month(a.date_saved);
However, I also want to get date information. I use this command:
select a.lens_id, avg(a.rank), a.date_saved
from lensrank_archive a
group by a.lens_id, year(a.date_saved), month(a.date_saved);
It complains: Expression Not In Group By Key

The full error message should be in the format Expression Not In Group By Key [value].
The [value] will tell you what expression needs to be in the Group By.
Just looking at the two queries, I'd say that you need to add a.date_saved explicitly to the Group By.

A workaround is to put the additional field in a collect_set and return the first element of the set. For example:
select a.lens_id, avg(a.rank), collect_set(a.date_saved)[0]
from lensrank_archive a
group by a.lens_id, year(a.date_saved), month(a.date_saved);

This is because each group can contain more than one ‘date_saved’ value. You can collect these ‘date_saved’ values into an array and output that.
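To make the "more than one date_saved per group" point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Hive, so strftime replaces year()/month(), and MIN(date_saved) is used instead of collect_set(...)[0] to pick one date per group deterministically. The sample rows are invented for illustration.

```python
import sqlite3

# SQLite stands in for Hive here; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lensrank_archive (lens_id INTEGER, rank INTEGER, date_saved TEXT)"
)
conn.executemany(
    "INSERT INTO lensrank_archive VALUES (?, ?, ?)",
    [
        (1, 10, "2012-01-05"),
        (1, 20, "2012-01-20"),  # same lens, same month: two dates in one group
        (1, 30, "2012-02-03"),
        (2, 40, "2012-01-11"),
    ],
)

# Every selected column is either a grouping key or wrapped in an aggregate.
monthly = conn.execute(
    """
    SELECT lens_id,
           AVG(rank)                  AS avg_rank,
           MIN(date_saved)            AS first_saved,
           COUNT(DISTINCT date_saved) AS distinct_dates
    FROM lensrank_archive
    GROUP BY lens_id, strftime('%Y', date_saved), strftime('%m', date_saved)
    ORDER BY lens_id, first_saved
    """
).fetchall()

for row in monthly:
    print(row)
```

The distinct_dates column shows how many different dates fall into each group, which is why the database cannot return a bare date_saved without being told which one to use.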

Related

Why do I need to group by columns that I don't need to group by?

Say I have a query like this:
SELECT
car.id,
car.make,
car.model,
car.vin,
car.year,
car.color
FROM car GROUP BY car.make
I want to group the result by make so I can eliminate any duplicate makes. I'm essentially trying to do a SELECT DISTINCT. But I get this error:
ERROR column must appear in the GROUP BY clause or be used in an aggregate function
It seems silly to group by each column when I don't want to see any of them in a group. How do I get around this?
Instead of GROUP BY, use DISTINCT ON:
SELECT DISTINCT ON (c.make) c.*
FROM car c
ORDER BY c.make;
This returns one row for each make. Without a further sort key, which row you get is arbitrary. You can include a second key in the ORDER BY to determine the particular row you want (cheapest, oldest, etc.).
Every column in the SELECT list must appear in the GROUP BY clause unless it is used only inside an aggregate function. PostgreSQL only lets you omit from the GROUP BY clause columns that are functionally dependent on the columns that are in it.
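DISTINCT ON is specific to PostgreSQL. As a rough, portable illustration of the same idea (one fully populated row per make), the sketch below joins the table against a per-make MIN(id). It runs on Python's built-in sqlite3 module; the trimmed-down car table and its rows are assumptions made up for the example.

```python
import sqlite3

# SQLite stands in for PostgreSQL; columns and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE car (id INTEGER PRIMARY KEY, make TEXT, model TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO car VALUES (?, ?, ?, ?)",
    [
        (1, "Ford", "Focus", 2010),
        (2, "Ford", "Fiesta", 2008),
        (3, "Honda", "Civic", 2012),
        (4, "Honda", "Accord", 2011),
    ],
)

# The subquery groups by make only; the join then pulls the full row
# for the chosen id, so no non-grouped column is selected directly.
one_per_make = conn.execute(
    """
    SELECT c.id, c.make, c.model, c.year
    FROM car AS c
    JOIN (SELECT make, MIN(id) AS first_id FROM car GROUP BY make) AS pick
      ON pick.first_id = c.id
    ORDER BY c.make
    """
).fetchall()

for row in one_per_make:
    print(row)
```

Here the row with the lowest id wins for each make. Swapping MIN(id) for another rule changes which row is kept, much like adding a second ORDER BY key to DISTINCT ON.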

GROUP BY and ordering by date that was extracted as timestamp

I have a rather simple query:
SELECT table.foo, array_agg(ARRAY[EXTRACT(epoch FROM table.date), table.bar]) AS array
FROM table
GROUP BY table.foo
ORDER BY table.date ASC;
When I run this query I get an error:
ERROR: column "table.date" must appear in the GROUP BY clause or be used in an aggregate function
I don't quite understand why that is happening, because date appears in an aggregate function. Is there any way to achieve that grouping?
You can't order by a column that does not exist in the grouped result. If you want to order the values inside the aggregation, use:
SELECT table.foo, array_agg(ARRAY[EXTRACT(epoch FROM table.date), table.bar] ORDER BY table.date ASC) AS array
FROM table
GROUP BY table.foo;
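To show what an ORDER BY inside the aggregate does, as opposed to one on the outer query, here is a small plain-Python sketch of the same grouping. The rows are hypothetical, and the code only mimics the semantics; it is not how PostgreSQL implements array_agg.

```python
from collections import defaultdict

# Hypothetical rows of (foo, date as epoch seconds, bar).
rows = [
    ("a", 300.0, 3.0),
    ("b", 100.0, 9.0),
    ("a", 100.0, 1.0),
    ("a", 200.0, 2.0),
]

# GROUP BY foo: collect one [epoch, bar] pair per input row.
groups = defaultdict(list)
for foo, epoch, bar in rows:
    groups[foo].append([epoch, bar])

# ORDER BY inside the aggregate: sort the pairs within each group by date.
result = {
    foo: sorted(pairs, key=lambda pair: pair[0])
    for foo, pairs in groups.items()
}

for foo, pairs in sorted(result.items()):
    print(foo, pairs)
```

After grouping there is one output row per foo, so a single table.date no longer exists to sort the outer result by. Only the elements inside each array can still be ordered by date.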

Column must appear in the GROUP BY clause

I have this query:
SELECT
"EventReadingListItem"."id"
, "EventReadingListItem"."UserId"
FROM "EventReadingListItems" AS "EventReadingListItem"
group by "EventReadingListItem"."EventId";
When I run it I get the error
Column "EventReadingListItem"."id" must appear in the GROUP BY clause or be used in an aggregate function.
Why? I have read similar questions but I don't really get why this simple group by is not working. Is it because the field in group by is not known as "EventReadingListItem" yet?
Based on your comment, this should work for you.
It returns one row per EventId, namely the one with the smallest id (the identifiers are quoted because PostgreSQL folds unquoted names to lowercase):
select DISTINCT ON ("EventId") "EventId", "id", "UserId"
from "EventReadingListItems"
order by "EventId", "id";

group by date aggregate function in postgresql

I'm getting an error running this query
SELECT date(updated_at), count(updated_at) as total_count
FROM "persons"
WHERE ("persons"."updated_at" BETWEEN '2012-10-17 00:00:00.000000' AND '2012-11-07 12:25:04.082224')
GROUP BY date(updated_at)
ORDER BY persons.updated_at DESC
I get the error ERROR: column "persons.updated_at" must appear in the GROUP BY clause or be used in an aggregate function LINE 5: ORDER BY persons.updated_at DESC
This works if I remove the date( function from the GROUP BY clause. However, I'm using the date function because I want to group by date, not datetime.
Any ideas?
At the moment it is unclear what you want Postgres to return. You say it should order by persons.updated_at but you do not retrieve that field from the database.
I think, what you want to do is:
SELECT date(updated_at), count(updated_at) as total_count
FROM "persons"
WHERE ("persons"."updated_at" BETWEEN '2012-10-17 00:00:00.000000' AND '2012-11-07 12:25:04.082224')
GROUP BY date(updated_at)
ORDER BY count(updated_at) DESC -- this line changed!
Now you are explicitly telling the DB to sort by the resulting value of the COUNT aggregate. You could also use ORDER BY 2 DESC, which tells the database to sort by the second column in the result set. However, I strongly prefer naming the column explicitly for clarity.
Note that I'm currently unable to test this query, but I do think this should work.
The problem is that, because you are grouping by date(updated_at), the value of updated_at may not be unique within a group: different values of updated_at can return the same value for date(updated_at). You need to tell the database which of the possible values it should use, or alternatively order by the value you grouped by. Probably one of:
SELECT date(updated_at) FROM persons GROUP BY date(updated_at)
ORDER BY date(updated_at)
or
SELECT date(updated_at) FROM persons GROUP BY date(updated_at)
ORDER BY min(updated_at)
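The first of these two forms can be tried out with Python's built-in sqlite3 module, which also has a date() function. This is only a sketch: SQLite stands in for PostgreSQL, and the timestamps are sample values invented for the example.

```python
import sqlite3

# SQLite stands in for PostgreSQL; the timestamps are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, updated_at TEXT)")
conn.executemany(
    "INSERT INTO persons (updated_at) VALUES (?)",
    [
        ("2012-10-17 08:00:00",),
        ("2012-10-17 21:30:00",),  # same day as the row above
        ("2012-10-18 09:15:00",),
        ("2012-11-07 12:25:04",),
    ],
)

# Group by the day and order by that same grouped expression.
per_day = conn.execute(
    """
    SELECT date(updated_at) AS day, COUNT(updated_at) AS total_count
    FROM persons
    GROUP BY date(updated_at)
    ORDER BY date(updated_at) DESC
    """
).fetchall()

for row in per_day:
    print(row)
```

The two rows from 2012-10-17 collapse into one group, which is exactly why ordering by the raw updated_at is ambiguous while ordering by date(updated_at) is not.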

Create a query to select two columns; (Company, No. of Films) from the database

I have created a database as part of a university assignment and I have hit a snag with the question in the title.
Most likely I am being asked to find out how many films each company has made, which suggests a GROUP BY query to me. But I have no idea where to begin. It is only a two-mark question, but the syntax is not clicking in my head.
My schema is:
CREATE TABLE Movie
(movieID CHAR(3) ,
title CHAR(36),
year NUMBER,
company CHAR(50),
totalNoms NUMBER,
awardsWon NUMBER,
DVDPrice NUMBER(5,2),
discountPrice NUMBER(5,2))
There are other tables but at first glance I don't think they are relevant to this question.
I am using sqlplus10
The answer you need comes from three basic SQL concepts. I'll step through them with you. If you need more assistance to create an answer from these hints, let me know and I can try to keep guiding you.
Group By
As you mentioned, SQL offers a GROUP BY clause that can help you.
A SQL Query utilizing GROUP BY would look like the following.
SELECT list, fields, aggregate(value)
FROM tablename
--WHERE goes here, if you need to restrict your result set
GROUP BY list, fields
A GROUP BY query can only return fields listed in the GROUP BY clause, or aggregate functions acting on each group.
Aggregate Functions
Your homework question also needs an Aggregate function called Count. This is used to count the results returned. A simple query like the following returns the count of all records returned.
SELECT Count(*)
FROM tablename
The two can be combined, allowing you to get the Count of each group in the following way.
SELECT list, fields, count(*)
FROM tablename
GROUP BY list, fields
Column Aliases
Another answer also tried to introduce you to SQL column aliases, but they did not use SQLPLUS syntax.
SELECT Count(*) as count
...
SQLPLUS column alias syntax is shown below.
SELECT Count(*) "count"
...
I'm not going to provide you the SQL, but instead a way to think about it.
What you want to do is select where the company matches and count the total rows returned. That count is the number of films made by the specified company.
Hope that points you in the right direction.
Select company, count(*) AS count
from Movie
group by company
select * group by company won't work in Oracle.
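For anyone who wants to see the grouped count run end to end, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle/SQL*Plus. The Movie table is cut down to three columns, and the film and company names are placeholders, not data from the assignment.

```python
import sqlite3

# SQLite stands in for Oracle; table is simplified and rows are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (movieID TEXT, title TEXT, company TEXT)")
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?)",
    [
        ("001", "Film A", "Studio X"),
        ("002", "Film B", "Studio X"),
        ("003", "Film C", "Studio Y"),
    ],
)

# company is the grouping key; COUNT(*) is the aggregate over each group.
films_per_company = conn.execute(
    """
    SELECT company, COUNT(*) AS film_count
    FROM Movie
    GROUP BY company
    ORDER BY company
    """
).fetchall()

for company, film_count in films_per_company:
    print(company, film_count)
```

Only company and the aggregate appear in the SELECT list, which is what makes the query valid under the GROUP BY rule described above.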