GROUP BY and ordering by date that was extracted as timestamp

I have a rather simple query:
SELECT table.foo, array_agg(ARRAY[EXTRACT(epoch FROM table.date), table.bar]) AS array
FROM table
GROUP BY table.foo
ORDER BY table.date ASC;
When I run this query I get an error:
ERROR: column "table.date" must appear in the GROUP BY clause or be used in an aggregate function
I don't understand why this happens, because date appears inside an aggregate function. Is there any way to achieve this grouping?

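You can't order by a column that doesn't exist in the grouped result. After GROUP BY table.foo, each output row stands for many input rows, each with its own date, so an ORDER BY table.date outside the aggregate has no single value to sort on. If you want to order the values inside the aggregation, put the ORDER BY inside array_agg:
SELECT table.foo, array_agg(ARRAY[EXTRACT(epoch FROM table.date), table.bar] ORDER BY table.date ASC) AS array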
FROM table
GROUP BY table.foo;
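To make the point concrete, here is a small Python sketch of what array_agg(... ORDER BY table.date ASC) does per group. It only models the semantics (it is not how PostgreSQL evaluates the query), and the sample rows and the function name array_agg_ordered are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical sample rows (foo, date, bar); not from the original question.
rows = [
    ("a", datetime(2020, 1, 3, tzinfo=timezone.utc), 30),
    ("a", datetime(2020, 1, 1, tzinfo=timezone.utc), 10),
    ("b", datetime(2020, 1, 2, tzinfo=timezone.utc), 20),
    ("a", datetime(2020, 1, 2, tzinfo=timezone.utc), 20),
]


def array_agg_ordered(rows):
    """Model of: SELECT foo, array_agg(ARRAY[epoch(date), bar] ORDER BY date)
    FROM t GROUP BY foo.

    Each group keeps all of its rows; the ORDER BY sorts rows *inside* the
    group, so no single `date` value is needed for the output row.
    """
    groups = defaultdict(list)
    for foo, date, bar in rows:
        groups[foo].append((date, bar))
    result = {}
    for foo, members in groups.items():
        members.sort(key=lambda m: m[0])  # ORDER BY date ASC inside the aggregate
        result[foo] = [[m[0].timestamp(), m[1]] for m in members]
    return result


print(array_agg_ordered(rows))
```

The outer query still has one row per foo; only the elements within each array are ordered.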

Related

Why do I need to group by columns that I don't need to group by?

Say I have a query like this:
SELECT
car.id,
car.make,
car.model,
car.vin,
car.year,
car.color
FROM car GROUP BY car.make
I want to group the result by make so I can eliminate any duplicate makes. I'm essentially trying to do a SELECT DISTINCT. But I get this error:
ERROR column must appear in the GROUP BY clause or be used in an aggregate function
It seems silly to group by each column when I don't want to see any of them in a group. How do I get around this?

Instead of GROUP BY, use DISTINCT ON:
SELECT DISTINCT ON (c.make) c.*
FROM car c
ORDER BY c.make;
This returns one row per make. Which row? An arbitrary one, unless you add a second key to the ORDER BY to choose the particular row you want (cheapest, oldest, etc.).
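The rule "sort by the ORDER BY keys, then keep the first row of each DISTINCT ON group" can be sketched in Python as follows. This is a model of the semantics only; the car rows (with a hypothetical price column) and the function distinct_on_make are illustrative assumptions, and the second key picks the cheapest car per make.

```python
# Hypothetical car rows: (id, make, model, price). Not from the question.
cars = [
    (1, "Ford", "Focus", 18000),
    (2, "Ford", "Fiesta", 15000),
    (3, "Honda", "Civic", 21000),
    (4, "Honda", "Fit", 16000),
    (5, "Kia", "Rio", 14000),
]


def distinct_on_make(rows, second_key):
    """Model of: SELECT DISTINCT ON (make) * FROM car ORDER BY make, <second_key>.

    Rows are sorted by the ORDER BY keys and only the first row seen for
    each make is kept.
    """
    ordered = sorted(rows, key=lambda r: (r[1], second_key(r)))
    seen = set()
    kept = []
    for row in ordered:
        if row[1] not in seen:  # first row for this make wins
            seen.add(row[1])
            kept.append(row)
    return kept


cheapest = distinct_on_make(cars, second_key=lambda r: r[3])
for row in cheapest:
    print(row)
```

Without the second key, rows that tie on make can come out in any order, which is exactly why the row chosen is arbitrary.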

All columns in the SELECT list must appear in the GROUP BY clause unless they are used only inside an aggregate function. PostgreSQL only lets you omit from the GROUP BY clause columns that are functionally dependent on columns that are in the GROUP BY (for example, the other columns of a table whose primary key is grouped, supported since PostgreSQL 9.1).

postgres(redshift) query including to_char and group by returns some errors

I'm using Redshift now.
I'd like to run a query like this:
SELECT to_char(created_at, 'HH24') AS hour, to_char(created_at, 'YYYY-MM-DD HH24') AS tmp FROM log GROUP BY tmp;
This returns an error, although the equivalent query seems to work in MySQL.
The error is:
ERROR: column "log.created_at" must appear in the GROUP BY clause or be used in an aggregate function
When I change the clause to "GROUP BY created_at", it returns results, but the list contains duplicates.
Is this due to Redshift?

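If you use a GROUP BY clause, every column in your SELECT list must either appear in that clause or be wrapped in an aggregate function that says how to combine it. Here, hour is to_char(created_at, 'HH24'), which is not in the GROUP BY, and the database does not infer that it is determined by tmp.
In your case, you seem to be trying to aggregate your log entries by hour. I suggest using the PostgreSQL date manipulation functions, for example: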
SELECT created_at::date AS date,
extract('HOUR' FROM created_at) as hour
FROM log
GROUP BY date, hour;
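The point of grouping by both derived values is that every selected column is then a grouping key, so each output row has exactly one value for each. A minimal Python model of that grouping, assuming hypothetical log timestamps and a hypothetical helper group_by_date_hour:

```python
from datetime import datetime

# Hypothetical log timestamps (created_at); not from the question.
log = [
    datetime(2015, 3, 1, 9, 5),
    datetime(2015, 3, 1, 9, 40),
    datetime(2015, 3, 1, 10, 1),
    datetime(2015, 3, 2, 9, 15),
]


def group_by_date_hour(timestamps):
    """Model of: SELECT created_at::date AS date, extract(HOUR FROM created_at) AS hour
    FROM log GROUP BY date, hour.

    Both output columns are grouping keys, so each group yields exactly one
    (date, hour) pair and duplicates collapse.
    """
    keys = {(ts.date(), ts.hour) for ts in timestamps}
    return sorted(keys)


for day, hour in group_by_date_hour(log):
    print(day, hour)
```

The two 09:xx entries on 2015-03-01 collapse into one row, which is the de-duplication that GROUP BY created_at alone could not give.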

group by date aggregate function in postgresql

I'm getting an error running this query
SELECT date(updated_at), count(updated_at) as total_count
FROM "persons"
WHERE ("persons"."updated_at" BETWEEN '2012-10-17 00:00:00.000000' AND '2012-11-07 12:25:04.082224')
GROUP BY date(updated_at)
ORDER BY persons.updated_at DESC
I get the error ERROR: column "persons.updated_at" must appear in the GROUP BY clause or be used in an aggregate function LINE 5: ORDER BY persons.updated_at DESC
This works if I remove the date() function from the GROUP BY, but I'm using date() because I want to group by date, not by datetime.
Any ideas?

At the moment it is unclear what you want Postgres to return. You say it should order by persons.updated_at, but that column is neither grouped nor retrieved, so there is no single updated_at value per output row.
I think what you want to do is:
SELECT date(updated_at), count(updated_at) as total_count
FROM "persons"
WHERE ("persons"."updated_at" BETWEEN '2012-10-17 00:00:00.000000' AND '2012-11-07 12:25:04.082224')
GROUP BY date(updated_at)
ORDER BY count(updated_at) DESC -- this line changed!
Now you are explicitly telling the DB to sort by the resulting value of the COUNT aggregate. You could also use ORDER BY 2 DESC, which tells the database to sort by the second column in the result set. However, I strongly prefer stating the expression explicitly, for clarity.
Note that I'm currently unable to test this query, but I do think this should work.
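As a quick check of the suggested query, the sketch below runs it through Python's built-in sqlite3 module. SQLite is only a stand-in for PostgreSQL here (an assumption: both accept these particular queries, but SQLite's date() returns 'YYYY-MM-DD' text rather than a date value), and the sample rows are hypothetical. It also confirms that ORDER BY 2 DESC gives the same order.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption, see above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, updated_at TEXT)")
conn.executemany(
    "INSERT INTO persons (updated_at) VALUES (?)",
    [
        ("2012-10-17 08:00:00",),
        ("2012-10-17 15:30:00",),
        ("2012-10-17 23:59:00",),
        ("2012-10-18 12:00:00",),
        ("2012-10-19 09:00:00",),
        ("2012-10-19 10:00:00",),
    ],
)

# Order by the aggregate itself.
by_count = conn.execute(
    """SELECT date(updated_at), count(updated_at) AS total_count
       FROM persons
       GROUP BY date(updated_at)
       ORDER BY count(updated_at) DESC"""
).fetchall()

# Same query, ordering by column position instead.
by_position = conn.execute(
    """SELECT date(updated_at), count(updated_at) AS total_count
       FROM persons
       GROUP BY date(updated_at)
       ORDER BY 2 DESC"""
).fetchall()

print(by_count)
```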

The problem is that, because you are grouping by date(updated_at), the value of updated_at is not unique within a group: different values of updated_at can produce the same value of date(updated_at). You need to tell the database which of the possible values it should use, or alternatively order by the value the GROUP BY produces, probably with one of:
SELECT date(updated_at) FROM persons GROUP BY date(updated_at)
ORDER BY date(updated_at)
or
SELECT date(updated_at) FROM persons GROUP BY date(updated_at)
ORDER BY min(updated_at)
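The ambiguity described above can be shown with a few lines of Python that model GROUP BY date(updated_at). The timestamps are hypothetical ISO strings; because ISO strings sort the same way lexicographically and chronologically, plain string comparison stands in for timestamp comparison here.

```python
from collections import defaultdict

# Hypothetical updated_at values (ISO strings); not from the question.
updated_at = [
    "2012-10-17 08:00:00",
    "2012-10-17 15:30:00",
    "2012-10-18 12:00:00",
]

# Model of GROUP BY date(updated_at): the key is just the date part.
groups = defaultdict(list)
for ts in updated_at:
    groups[ts[:10]].append(ts)

# Groups holding several updated_at values: "ORDER BY updated_at" is ambiguous for them.
ambiguous = {day: vals for day, vals in groups.items() if len(vals) > 1}

# The two fixes: order by the grouping key itself, or by an aggregate such as min().
by_day_desc = sorted(groups, reverse=True)
by_min_desc = sorted(groups, key=lambda day: min(groups[day]), reverse=True)

print(ambiguous)
print(by_day_desc, by_min_desc)
```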

hive Expression Not In Group By Key

I created a table in Hive.
It has the following columns:
id bigint, rank bigint, date string
I want to get avg(rank) per month. I can use this command. It works.
select a.lens_id, avg(a.rank)
from tableA a
group by a.lens_id, year(a.date_saved), month(a.date_saved);
However, I also want to get date information. I use this command:
select a.lens_id, avg(a.rank), a.date_saved
from lensrank_archive a
group by a.lens_id, year(a.date_saved), month(a.date_saved);
It complains: Expression Not In Group By Key

The full error message should be in the format Expression Not In Group By Key [value].
The [value] will tell you what expression needs to be in the Group By.
Just looking at the two queries, I'd say that you need to add a.date_saved explicitly to the GROUP BY, although that makes each group a single date_saved value rather than a month.

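A workaround is to put the additional field in a collect_set and return the first element of the set. Because collect_set does not guarantee any element order, that element is an arbitrary date from the group. For example: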
select a.lens_id, avg(a.rank), collect_set(a.date_saved)[0]
from lensrank_archive a
group by a.lens_id, year(a.date_saved), month(a.date_saved);
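What collect_set(a.date_saved)[0] returns per (lens_id, year, month) group can be modelled in Python as below. This is a sketch of the semantics, not Hive's implementation: the rows and the function avg_rank_with_one_date are hypothetical, and since Hive makes no ordering promise for collect_set, the sketch simply keeps first-seen order to stay deterministic.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows (lens_id, rank, date_saved); not from the question.
rows = [
    (1, 10, date(2014, 5, 3)),
    (1, 20, date(2014, 5, 17)),
    (1, 30, date(2014, 6, 1)),
    (2, 5, date(2014, 5, 9)),
]


def avg_rank_with_one_date(rows):
    """Model of: SELECT lens_id, avg(rank), collect_set(date_saved)[0]
    FROM t GROUP BY lens_id, year(date_saved), month(date_saved).

    Element [0] is just *some* date from the group; this sketch uses
    first-seen order, whereas Hive guarantees no particular order.
    """
    groups = defaultdict(list)
    for lens_id, rank, saved in rows:
        groups[(lens_id, saved.year, saved.month)].append((rank, saved))
    result = []
    for (lens_id, _year, _month), members in groups.items():
        ranks = [r for r, _ in members]
        date_set = list(dict.fromkeys(s for _, s in members))  # de-duplicated, like a set
        result.append((lens_id, sum(ranks) / len(ranks), date_set[0]))
    return result


for row in avg_rank_with_one_date(rows):
    print(row)
```

If a deterministic date is wanted, an ordinary aggregate over date_saved is the more predictable choice; collect_set(...)[0] only silences the error.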

This happens because each group contains more than one date_saved value. You can collect these date_saved values into arrays (for example with collect_set) and output them.

sql date order by problem

I have an image table in which two or more rows have the same date. When I ORDER BY created_date DESC, it works and the rows come back in the same positions, but when I change the query and run it again, those rows come back in different positions. I don't have any other ORDER BY column, so I'm a bit confused about why it's doing this and how I can fix it.
Can you please help with this?

To get reproducible results you need to have columns in your order by clause that together are unique. Do you have an ID column? You can use that to tie-break:
ORDER BY created_date DESC, id

I suspect that this is happening because MySQL is given no ordering information other than ORDER BY created_date DESC, so for rows with equal dates it returns whatever order is most convenient given its complicated inner workings (caching, indexing, etc.). Assuming you have a unique key id, you could do:
SELECT * FROM table t ORDER BY t.created_date DESC, t.id ASC
This gives you the same result every time, as long as id is unique: the key after the comma is a secondary sort rule that is applied only when two rows are equal on the first key.
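A small reproduction of the tie-break, using Python's built-in sqlite3 module as a stand-in database (an assumption: the question's database engine isn't confirmed, and the image rows below are hypothetical). Two pairs of rows share a created_date, and the id key fixes their order.

```python
import sqlite3

# Hypothetical image table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image (id INTEGER PRIMARY KEY, created_date TEXT)")
conn.executemany(
    "INSERT INTO image (id, created_date) VALUES (?, ?)",
    [(1, "2013-01-01"), (2, "2013-01-02"), (3, "2013-01-02"), (4, "2013-01-01")],
)

# created_date alone has ties (ids 2 and 3, ids 1 and 4); id breaks them.
rows = conn.execute(
    "SELECT id, created_date FROM image ORDER BY created_date DESC, id ASC"
).fetchall()
print(rows)
```

With only ORDER BY created_date DESC, the relative order of ids 2 and 3 (and of 1 and 4) would be left to the engine.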

To get consistent results, you need to add at least one more column to the ORDER BY clause. Since the values in the created_date column are not unique, the order of rows with equal dates is not defined. Storing created_date as a timestamp with a time of day makes ties less likely, but only a unique column (or combination of columns) guarantees a stable order.