How can I sort results by aggregate in Cloudwatch Log Insights? - amazon-cloudwatchlogs

I have a pretty straightforward query:
fields @timestamp, req.url, msg
| sort @timestamp desc
| filter msg = "request completed"
| stats count() by req.url
It presents all requests served by my app aggregated by url. However, I would also like to sort the results by the value of aggregate count() - but both | sort count desc and | sort "count()" desc don't work. How can I achieve that?

Turns out, all I had to do was to use an alias and then sort by it:
fields @timestamp, msg, req.url
| filter msg="request completed"
| stats count() as count by req.url
| sort count desc

Related

Postgres: subquery uses ungrouped column from outer query

So I am trying to create a query where I group a subquery in an array.
I am not an expert in Postgres, especially not with arrays in queries, so I am trying to learn the more advanced features.
I want to reach the following output:
Date | Jobs
-----------------------------------------------------------------------------
21/11/2022 | {{TestJob1,1500},{TestJob2,1100},{TestJob3,500}}
20/11/2022 | {{TestJob1,1300},{TestJob2,100},{TestJob3,500}}
19/11/2022 | {{TestJob1,1400},{TestJob2,1900}}
18/11/2022 | {{TestJob1,1200},{TestJob2,1700},{TestJob3,800},{TestJob4,500}}
I already started experimenting and this is how far I got:
SELECT j."Start time"::date AS "Date",
(select array["Job name", count(*)::varchar] from amdw."Job runs" where "Start time"::date = j."Start time"::date group by "Job name") as "Jobs"
FROM amdw."Job runs" j
GROUP BY "Date"
ORDER BY "Date" DESC;
But with this query I get the following error:
SQL Error [42803]: ERROR: subquery uses ungrouped column "j.Start time" from outer query
Position: 135
Does anybody have an idea how to solve this query issue and get the output I want?
After adapting the query as @a_horse_with_no_name suggested:
SELECT j."Start time"::date AS "Date",
array["Job name", count(*)::varchar] as "Jobs"
FROM amdw."Job runs" j
GROUP BY "Date", "Job name"
ORDER BY "Date" DESC;
Date | Jobs
-----------------------------------------------------------------------------
21/11/2022 | {{TestJob1,1500}
21/11/2022 | {TestJob2,1100}
21/11/2022 | {TestJob3,500}}
20/11/2022 | {{TestJob1,1300}
20/11/2022 | {TestJob2,100}
20/11/2022 | {TestJob3,500}}
So I now need to find a way to only show the date once and create a second dimension in the array...
You can aggregate this in two steps: first, a common table expression (CTE) counts the runs of each job name per start date; then ARRAY_AGG() collects those per-job arrays into one array per date.
-- CTE: one row per (date, job name) with its run count
WITH agg_by_date_and_job_name AS (
    SELECT
        "Start time"::date AS "Date",
        ARRAY["Job name", count(*)::text] AS "Job"
    FROM
        amdw."Job runs"
    GROUP BY
        "Date",
        "Job name"
)
-- Collapse the per-job rows into one two-dimensional array per date
SELECT
    "Date",
    ARRAY_AGG("Job") AS "Jobs"
FROM
    agg_by_date_and_job_name
GROUP BY
    "Date"
ORDER BY
    "Date" DESC;

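The two-step idea above can be tried without a Postgres server. The following is only a rough sketch: it assumes SQLite (via Python's built-in sqlite3 module) as a stand-in, uses a hypothetical job_runs table with made-up rows, and does the second step in Python because SQLite has no array type or ARRAY_AGG().

```python
import sqlite3
from collections import defaultdict

# Hypothetical stand-in for amdw."Job runs"; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_runs (job_name TEXT, start_time TEXT)")
conn.executemany(
    "INSERT INTO job_runs VALUES (?, ?)",
    [
        ("TestJob1", "2022-11-21 08:00:00"),
        ("TestJob1", "2022-11-21 09:00:00"),
        ("TestJob2", "2022-11-21 10:00:00"),
        ("TestJob1", "2022-11-20 08:00:00"),
    ],
)

# Step 1 (SQL): one row per (date, job name) with its run count.
rows = conn.execute(
    """
    WITH agg_by_date_and_job_name AS (
        SELECT date(start_time) AS run_date, job_name, COUNT(*) AS runs
        FROM job_runs
        GROUP BY date(start_time), job_name
    )
    SELECT run_date, job_name, runs
    FROM agg_by_date_and_job_name
    ORDER BY run_date DESC, job_name
    """
).fetchall()

# Step 2 (Python, standing in for ARRAY_AGG): one list of [job, count] per date.
jobs_by_date = defaultdict(list)
for run_date, job_name, runs in rows:
    jobs_by_date[run_date].append([job_name, runs])
jobs_by_date = dict(jobs_by_date)

print(jobs_by_date)
```

On real Postgres the second step stays in SQL, as in the answer; the Python loop only mimics what ARRAY_AGG() does per group.
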
Aggregate function to extract all fields based on maximum date

In one table I have duplicate values that I would like to group, exporting only those fields where the value in the "published_at" field is the most recent (the latest date possible). Do I understand correctly that when I use the MAX aggregate function, the other fields I extract will come from the row holding the maximum, or will they come from the first row found in the table?
Let me demonstrate this with a simple example (in the real-world case I am also joining two different tables). I would like to group by id and extract all fields, but only from the row with the max published_at. My query would be:
SELECT "t1"."id", "t1"."field", MAX("t1"."published_at") as "published_at"
FROM "t1"
GROUP By "t1"."id"
| id | field | published_at |
---------------------------------
| 1 | document1 | 2022-01-10 |
| 1 | document2 | 2022-01-11 |
| 1 | document3 | 2022-01-12 |
The result I want is:
1 - document3 - 2022-01-12
Also one question - why am I getting this error "ERROR: column "t1"."field" must appear in the GROUP BY clause or be used in an aggregate function". Can I use MAX function on string type column?
If you want the latest row for each id, you can use DISTINCT ON. For example:
select distinct on (id) *
from t
order by id, published_at desc
If you just want the latest row in the whole result set you can use LIMIT. For example:
select *
from t
order by published_at desc
limit 1
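DISTINCT ON is specific to Postgres. As a rough, portable sketch of the same "latest row per id" idea, a correlated subquery on MAX(published_at) also works. The example below assumes SQLite through Python's sqlite3 module as a stand-in database, reuses the sample rows from the question, and adds one hypothetical row for a second id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, field TEXT, published_at TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        (1, "document1", "2022-01-10"),
        (1, "document2", "2022-01-11"),
        (1, "document3", "2022-01-12"),
        (2, "document4", "2022-01-05"),  # hypothetical extra row, not in the question
    ],
)

# Keep only the rows whose published_at equals the maximum for their id.
latest = conn.execute(
    """
    SELECT outer_row.id, outer_row.field, outer_row.published_at
    FROM t1 AS outer_row
    WHERE outer_row.published_at = (
        SELECT MAX(inner_row.published_at)
        FROM t1 AS inner_row
        WHERE inner_row.id = outer_row.id
    )
    ORDER BY outer_row.id
    """
).fetchall()

print(latest)
```

Unlike DISTINCT ON, this form returns several rows for an id when two rows share the same maximum published_at, so it is only equivalent when the dates are unique per id.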

group by in postgres sql with error must appear in the GROUP BY clause or be used in an aggregate function [duplicate]

I've been migrating some of my MySQL queries to PostgreSQL to use Heroku. Most of my queries work fine, but I keep having a similar recurring error when I use group by:
ERROR: column "XYZ" must appear in the GROUP BY clause or be used in
an aggregate function
Could someone tell me what I'm doing wrong?
MySQL which works 100%:
SELECT `availables`.*
FROM `availables`
INNER JOIN `rooms` ON `rooms`.id = `availables`.room_id
WHERE (rooms.hotel_id = 5056 AND availables.bookdate BETWEEN '2009-11-22' AND '2009-11-24')
GROUP BY availables.bookdate
ORDER BY availables.updated_at
PostgreSQL error:
ActiveRecord::StatementInvalid: PGError: ERROR: column
"availables.id" must appear in the GROUP BY clause or be used in an
aggregate function:
SELECT "availables".* FROM "availables" INNER
JOIN "rooms" ON "rooms".id = "availables".room_id WHERE
(rooms.hotel_id = 5056 AND availables.bookdate BETWEEN E'2009-10-21'
AND E'2009-10-23') GROUP BY availables.bookdate ORDER BY
availables.updated_at
Ruby code generating the SQL:
expiration = Available.find(:all,
:joins => [ :room ],
:conditions => [ "rooms.hotel_id = ? AND availables.bookdate BETWEEN ? AND ?", hostel_id, date.to_s, (date+days-1).to_s ],
:group => 'availables.bookdate',
:order => 'availables.updated_at')
Expected Output (from working MySQL query):
+-----+-------+-------+------------+---------+---------------+---------------+
| id | price | spots | bookdate | room_id | created_at | updated_at |
+-----+-------+-------+------------+---------+---------------+---------------+
| 414 | 38.0 | 1 | 2009-11-22 | 1762 | 2009-11-20... | 2009-11-20... |
| 415 | 38.0 | 1 | 2009-11-23 | 1762 | 2009-11-20... | 2009-11-20... |
| 416 | 38.0 | 2 | 2009-11-24 | 1762 | 2009-11-20... | 2009-11-20... |
+-----+-------+-------+------------+---------+---------------+---------------+
3 rows in set
MySQL's totally non standards compliant GROUP BY can be emulated by Postgres' DISTINCT ON. Consider this:
MySQL:
SELECT a,b,c,d,e FROM table GROUP BY a
This delivers 1 row per value of a (which one, you don't really know). Well actually you can guess, because MySQL doesn't know about hash aggregates, so it will probably use a sort... but it will only sort on a, so the order of the rows could be random. Unless it uses a multicolumn index instead of sorting. Well, anyway, it's not specified by the query.
Postgres:
SELECT DISTINCT ON (a) a,b,c,d,e FROM table ORDER BY a,b,c
This delivers 1 row per value of a, this row will be the first one in the sort according to the ORDER BY specified by the query. Simple.
Note that here, it's not an aggregate I'm computing. So GROUP BY actually makes no sense. DISTINCT ON makes a lot more sense.
Rails is married to MySQL, so I'm not surprised that it generates SQL that doesn't work in Postgres.
PostgreSQL is more SQL-compliant than MySQL. Every field in the output, except fields computed with an aggregate function, must be present in the GROUP BY clause.
MySQL's GROUP BY can be used without an aggregate function (which is contrary to the SQL standard) and returns one row from the group (I don't know based on what criteria), while PostgreSQL requires an aggregate function (MAX, SUM, etc.) on every selected column that is not listed in the GROUP BY clause.
Correct, the solution to fixing this is to use :select and to select each field that you wish to decorate the resulting object with and group by them.
Nasty - but it is how group by should work as opposed to how MySQL works with it by guessing what you mean if you don't stick fields in your group by.
If I remember correctly, in PostgreSQL you have to add every column you fetch from the table where the GROUP BY clause applies to the GROUP BY clause.
Not the prettiest solution, but changing the group parameter to output every column in model works in PostgreSQL:
expiration = Available.find(:all,
:joins => [ :room ],
:conditions => [ "rooms.hotel_id = ? AND availables.bookdate BETWEEN ? AND ?", hostel_id, date.to_s, (date+days-1).to_s ],
:group => Available.column_names.collect{|col| "availables.#{col}"},
:order => 'availables.updated_at')
According to MySQL's "Debunking GROUP BY Myths" (http://dev.mysql.com/tech-resources/articles/debunking-group-by-myths.html), SQL (2003 version of the standard) doesn't require columns referenced in the SELECT list of a query to also appear in the GROUP BY clause.
For others looking for a way to order by any field, including a joined field, in PostgreSQL, use a subquery:
SELECT * FROM (
    -- no backticks here: PostgreSQL does not accept MySQL-style identifier quoting
    SELECT DISTINCT ON (availables.bookdate) availables.*
    FROM availables INNER JOIN rooms ON rooms.id = availables.room_id
    WHERE (rooms.hotel_id = 5056
    AND availables.bookdate BETWEEN '2009-11-22' AND '2009-11-24')
) AS distinct_selected
-- the outer query only sees the subquery alias, not the availables table
ORDER BY distinct_selected.updated_at
or arel:
subquery = SomeRecord.select("distinct on(xx.id) xx.*, jointable.order_field")
.where("").joins("")
result = SomeRecord.select("*").from("(#{subquery.to_sql}) AS distinct_selected").order(" xx.order_field ASC, jointable.order_field ASC")
I think that .uniq [1] will solve your problem.
[1] Available.select('...').uniq
Take a look at http://guides.rubyonrails.org/active_record_querying.html#selecting-specific-fields

Having(count) in a Cakephp 3 query fails on PostgreSQL

I have the following table structure with matching relations:
,---------. ,--------------. ,---------.
| Threads | | ThreadsUsers | | Users |
|---------| |--------------| |---------|
| id | | id | | id |
'---------' | thread_id | '---------'
| user_id |
'--------------'
This custom query in ThreadsTable is meant to find threads with a given number of participants. It works fine on MySQL:
public function findWithUserCount(Query $query, array $options)
{
return $query
->matching('Users')
->select([
'Threads.id',
'count' => 'COUNT(Users.id)'
])
->group('Threads.id HAVING count = ' . $options['count']);
}
However, it fails on PostgreSQL with the following error:
PDOException: SQLSTATE[42703]: Undefined column: 7
ERROR: column "count" does not exist
LINE 1: ...ThreadsUsers.user_id)) GROUP BY Threads.id HAVING count = 2
The HAVING clause cannot reference column aliases defined in the SELECT clause. The documentation says:
Each column referenced in condition must unambiguously reference a grouping column, unless the reference appears within an aggregate function or the ungrouped column is functionally dependent on the grouping columns.
Since count is neither a "grouping column" (i.e. the subject of the GROUP BY clause) nor an aggregate function, it can't be used there.
So the correct form would presumably be (I don't know CakePHP, and the fact that you can inject SQL into the group call at all seems like a massively broken design for a query builder):
->group('Threads.id HAVING COUNT(Users.id) = ' . $options['count']);
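To make the rule concrete without depending on CakePHP, here is a small sketch of the same query shape with the aggregate expression repeated in HAVING and the count passed as a bound parameter rather than concatenated into the SQL string. It assumes SQLite via Python's sqlite3 module as a stand-in database; the threads_users rows and the threads_with_user_count helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE threads_users (thread_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO threads_users VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10), (3, 10), (3, 11), (3, 12)],  # invented sample data
)


def threads_with_user_count(conn, count):
    """Return (thread_id, user_count) for threads with exactly `count` users."""
    return conn.execute(
        """
        SELECT thread_id, COUNT(user_id) AS user_count
        FROM threads_users
        GROUP BY thread_id
        HAVING COUNT(user_id) = ?   -- repeat the aggregate, not the alias
        ORDER BY thread_id
        """,
        (count,),
    ).fetchall()


print(threads_with_user_count(conn, 2))
```

Repeating COUNT(user_id) in HAVING is the form that works on both MySQL and PostgreSQL, and binding the value avoids building SQL from user input.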

Postgres - is it possible to group by substring of one of my fields?

This is my table:
id | integer | not null default nextval('frontend_prescription_id_seq'::regclass)
actual_cost | double precision | not null
chemical_id | character varying(9) | not null
practice_id | character varying(6) | not null
I'd like to query results for a particular practice_id, and then sum the actual_cost by date and by the first two characters of the chemical_id. Is this possible in Postgres?
In other words, I'd like the output to look something like this:
processing_date | cost | chemical_id_substr
01-01-2010      | 1234 | 01
01-02-2010      | 4366 | 01
01-01-2010      | 3827 | 02
01-02-2010      | 8768 | 02
This is my current query, but it groups by the whole of chemical_id, not the substring:
query = "SELECT SUM(actual_cost) as cost, processing_date, "
query += "chemical_id as id FROM frontend_items"
query += " WHERE practice_id=%s "
query += "GROUP BY processing_date, chemical_id"
cursor.execute(query, (practice_id,))
I'm not sure how to change this to group by substring, or whether I should add a functional index, or whether I should just denormalise my table and add a new column. Thanks for any help.
You can do this, but you also need to make sure the substring is used in the select list, not the complete column:
SELECT SUM(actual_cost) as cost,
processing_date,
left(chemical_id,2) as id --<< use the same expression here as in the GROUP BY
FROM frontend_items
WHERE practice_id= %s
GROUP BY processing_date, left(chemical_id,2);
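A quick way to check the grouping logic is to run it against a throwaway database. The sketch below assumes SQLite via Python's sqlite3 module rather than Postgres, so it uses substr(chemical_id, 1, 2) in place of left(chemical_id, 2); the rows are made up, and the processing_date column is assumed to exist as in the question's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE frontend_items (
        actual_cost REAL,
        chemical_id TEXT,
        practice_id TEXT,
        processing_date TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO frontend_items VALUES (?, ?, ?, ?)",
    [
        (10.0, "0101010A0", "P00001", "2010-01-01"),  # invented sample rows
        (5.0, "0102020B0", "P00001", "2010-01-01"),
        (7.0, "0201010C0", "P00001", "2010-01-01"),
        (3.0, "0101010A0", "P00002", "2010-01-01"),
    ],
)

# The same expression appears in the select list and in GROUP BY.
result = conn.execute(
    """
    SELECT processing_date,
           substr(chemical_id, 1, 2) AS chemical_id_substr,
           SUM(actual_cost) AS cost
    FROM frontend_items
    WHERE practice_id = ?
    GROUP BY processing_date, substr(chemical_id, 1, 2)
    ORDER BY processing_date, chemical_id_substr
    """,
    ("P00001",),
).fetchall()

print(result)
```

The two chemical ids starting with "01" collapse into a single row per date, which is the behaviour the question asks for; no extra column or denormalisation is needed for the query itself.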