Postgres: subquery uses ungrouped column from outer query

So I am trying to create a query where I group a subquery into an array.
I am not an expert in Postgres, especially not arrays in queries, so I am trying to learn the more advanced stuff.
I want to reach the following output:
Date | Jobs
-----------------------------------------------------------------------------
21/11/2022 | {{TestJob1,1500},{TestJob2,1100},{TestJob3,500}}
20/11/2022 | {{TestJob1,1300},{TestJob2,100},{TestJob3,500}}
19/11/2022 | {{TestJob1,1400},{TestJob2,1900}}
18/11/2022 | {{TestJob1,1200},{TestJob2,1700},{TestJob3,800},{TestJob4,500}}
I already started experimenting and this is how far I got:
SELECT j."Start time"::date AS "Date",
(select array["Job name", count(*)::varchar] from amdw."Job runs" where "Start time"::date = j."Start time"::date group by "Job name") as "Jobs"
FROM amdw."Job runs" j
GROUP BY "Date"
ORDER BY "Date" DESC;
But with this query I get the following error:
SQL Error [42803]: ERROR: subquery uses ungrouped column "j.Start time" from outer query
Position: 135
Does anybody have an idea how to solve this issue and get the output I want?
After adapting the query as @a_horse_with_no_name suggested:
SELECT j."Start time"::date AS "Date",
["Job name", count(*)::varchar] as "Jobs"
FROM amdw."Job runs" j
GROUP BY "Date", "Job name"
ORDER BY "Date" DESC;
Date | Jobs
-----------------------------------------------------------------------------
21/11/2022 | {TestJob1,1500}
21/11/2022 | {TestJob2,1100}
21/11/2022 | {TestJob3,500}
20/11/2022 | {TestJob1,1300}
20/11/2022 | {TestJob2,100}
20/11/2022 | {TestJob3,500}
So I now need to find a way to only show the date once and create a second dimension in the array...

You can aggregate this in two steps: first a CTE that counts each "Job name" per date, then an outer query that collects those arrays with ARRAY_AGG():
-- CTE counting runs per date and job name
WITH agg_by_date_and_job_name AS (
SELECT
"Start time" AS "Date",
ARRAY ["Job name", count(*)::text] AS "Job"
FROM
amdw."Job runs"
GROUP BY
"Date",
"Job name"
)
SELECT
"Date",
ARRAY_AGG("Job") AS "Jobs"
FROM
agg_by_date_and_job_name
GROUP BY
"Date"
ORDER BY
"Date" DESC;

Related

How to select data by id and a date range on a timestamp in Postgres

I have a problem making a query to view data in PostgreSQL. I want to view data with two conditions:
where employeeId
and between a date range
Here's my query:
Select *
from employee
where employeeId = 3
and date(created_at) = between '2022-08-29' and '2022-08-31'
I ran that query but it shows an error:
Reason:
SQL Error [42601]: ERROR: syntax error at or near "date" Position: 1
The data type of the created_at column is timestamp.
My question is: what is the correct query for viewing data with those conditions?
Remove the = operator from your query; BETWEEN does not require the =:
Select * from employee where employeeId = 3 and date(created_at) between '2022-08-29' and '2022-08-31'
Alternatively, you can use plain comparison operators with a half-open range:
Select *
from employee
where employeeId = 3
and created_at>='2022-08-29'
and created_at< '2022-09-01'
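One note on the second form: because it compares the raw created_at column, the half-open range can use a plain index, whereas date(created_at) would need an expression index. A hypothetical index (the name is assumed) for this table:
-- A plain b-tree index; the range predicate above can use it directly
CREATE INDEX idx_employee_created_at ON employee (created_at);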

Aggregate function to extract all fields based on maximum date

In one table I have duplicate values that I would like to group, exporting only the fields from the row where the value in the "published_at" field is the most recent (the latest date possible). Do I understand correctly that if I use the MAX aggregate function, the other fields I extract will come from the row holding that max, or will they come from the first row found in the table?
Let me demonstrate this with a simple example (in the real-world case I am also joining two different tables). I would like to group by id and extract all fields, but only those belonging to the max published_at. My query would be:
SELECT "t1"."id", "t1"."field", MAX("t1"."published_at") as "published_at"
FROM "t1"
GROUP By "t1"."id"
| id | field | published_at |
---------------------------------
| 1 | document1 | 2022-01-10 |
| 1 | document2 | 2022-01-11 |
| 1 | document3 | 2022-01-12 |
The result I want is:
1 - document3 - 2022-01-12
Also one question: why am I getting the error "ERROR: column "t1"."field" must appear in the GROUP BY clause or be used in an aggregate function"? Can I use the MAX function on a string-type column?
If you want the latest row for each id, you can use DISTINCT ON. (As for the side question: yes, MAX() works on a string column, but it compares text lexicographically, so it returns the alphabetically greatest value, not the value from the row with the latest date.) For example:
select distinct on (id) *
from t
order by id, published_at desc
If you just want the latest row in the whole result set you can use LIMIT. For example:
select *
from t
order by published_at desc
limit 1
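Applied to the sample table from the question (assuming it is named t1), the DISTINCT ON form looks like this and yields exactly the row from the expected result:
SELECT DISTINCT ON (id) id, field, published_at
FROM t1
ORDER BY id, published_at DESC;
-- returns: 1 | document3 | 2022-01-12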

select ID and sql query (count query) from table and write ID and result of count query to target table

My source table has a query id and a SQL query.
Using Talend, I need to run each CUSTOM_SQL query against the database and load a target table with the counts.
source table:
QUERY_ID|CUSTOM_SQL |
--------+----------------------------------------------------------------+
1|select count(1) as ROW_COUNT from SYSTEM_PRIVILEGE_MAP |
2|select count(1) as ROW_COUNT from OGIS_SPATIAL_REFERENCE_SYSTEMS|
3|select count(1) as ROW_COUNT from SDO_COORD_SYS |
4|select count(1) as ROW_COUNT from SDO_COORD_REF_SYS |
5|select count(1) as ROW_COUNT from SDO_PREFERRED_OPS_SYSTEM |
6|select count(1) as ROW_COUNT from SDO_TIN_PC_SYSDATA_TABLE |
expected output in target table:
QUERY_ID|QUERY_RESULT |
--------+-------------+
1|290 |
2|322 |
3|784 |
4|8484 |
5|743 |
I created a job that looks as follows, but it is not complete:
tDBInput -> tFlowToIterate -> tDBInput -> tMap -> tDBOutput
With the above design I'm able to run the CUSTOM_SQL and capture the result from the second tDBInput, but I am unable to capture and propagate the QUERY_ID.
How do I propagate both the query_id and the query result in one row to the target table? What components should I use?
Please note that each CUSTOM_SQL always returns one row and one column, so this is a very specific use case.
I simplified my scenario by using some dummy data.
I would appreciate any help on this.
Thank you!
With your first tDBInput component, make sure you extract both QUERY_ID and CUSTOM_SQL (select QUERY_ID, CUSTOM_SQL from source_table). You should then get two entries in the global variables of your tFlowToIterate (something like ((String)globalMap.get("row1.QUERY_ID")), with row1 being the name of the flow between tDBInput and tFlowToIterate). You can also check in the "Outline" view whether those variables appear under the tFlowToIterate_1 component.
Then in tMap, you can just access this query_id global variable (with the above syntax) and push it to your target table.
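For example (a sketch: row1 and the column names are taken from the job design above, the exact wiring is an assumption), the second tDBInput's Query field would be set to the iterated SQL:
(String)globalMap.get("row1.CUSTOM_SQL")
and in tMap you would map an extra output column QUERY_ID from the expression:
(String)globalMap.get("row1.QUERY_ID")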

postgresql groupby count where date is between

So I have a table where I want the count of rows where the customer is McDonald's and "Date" > 2021-06-30.
I am trying:
select "Customer",
Count("Customer")
FROM
public.master_environmental_data
WHERE "Customer" = 'McDonald''s' AND "Date" > '2021-06-30';
However I am getting this error:
column "master_environmental_data.Customer" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select "Customer",
^
SQL state: 42803
Character: 8
What is the correct query?
You should add a GROUP BY at the end of the query:
select "Customer",
Count("Customer")
FROM
public.master_environmental_data
WHERE "Customer" = 'McDonald''s' AND "Date" > '2021-06-30'
GROUP BY "Customer";
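Note that since the WHERE clause already pins "Customer" to a single value, a grouping-free variant returns the same count (a minor simplification sketch, not required):
SELECT count(*) AS "Count"
FROM public.master_environmental_data
WHERE "Customer" = 'McDonald''s' AND "Date" > '2021-06-30';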

group by in postgres sql with error must appear in the GROUP BY clause or be used in an aggregate function [duplicate]

I've been migrating some of my MySQL queries to PostgreSQL to use Heroku. Most of my queries work fine, but I keep having a similar recurring error when I use group by:
ERROR: column "XYZ" must appear in the GROUP BY clause or be used in
an aggregate function
Could someone tell me what I'm doing wrong?
MySQL which works 100%:
SELECT `availables`.*
FROM `availables`
INNER JOIN `rooms` ON `rooms`.id = `availables`.room_id
WHERE (rooms.hotel_id = 5056 AND availables.bookdate BETWEEN '2009-11-22' AND '2009-11-24')
GROUP BY availables.bookdate
ORDER BY availables.updated_at
PostgreSQL error:
ActiveRecord::StatementInvalid: PGError: ERROR: column
"availables.id" must appear in the GROUP BY clause or be used in an
aggregate function:
SELECT "availables".* FROM "availables" INNER
JOIN "rooms" ON "rooms".id = "availables".room_id WHERE
(rooms.hotel_id = 5056 AND availables.bookdate BETWEEN E'2009-10-21'
AND E'2009-10-23') GROUP BY availables.bookdate ORDER BY
availables.updated_at
Ruby code generating the SQL:
expiration = Available.find(:all,
:joins => [ :room ],
:conditions => [ "rooms.hotel_id = ? AND availables.bookdate BETWEEN ? AND ?", hostel_id, date.to_s, (date+days-1).to_s ],
:group => 'availables.bookdate',
:order => 'availables.updated_at')
Expected Output (from working MySQL query):
+-----+-------+-------+------------+---------+---------------+---------------+
| id | price | spots | bookdate | room_id | created_at | updated_at |
+-----+-------+-------+------------+---------+---------------+---------------+
| 414 | 38.0 | 1 | 2009-11-22 | 1762 | 2009-11-20... | 2009-11-20... |
| 415 | 38.0 | 1 | 2009-11-23 | 1762 | 2009-11-20... | 2009-11-20... |
| 416 | 38.0 | 2 | 2009-11-24 | 1762 | 2009-11-20... | 2009-11-20... |
+-----+-------+-------+------------+---------+---------------+---------------+
3 rows in set
MySQL's totally non-standards-compliant GROUP BY can be emulated by Postgres's DISTINCT ON. Consider this:
MySQL:
SELECT a,b,c,d,e FROM table GROUP BY a
This delivers 1 row per value of a (which one, you don't really know). Well actually you can guess, because MySQL doesn't know about hash aggregates, so it will probably use a sort... but it will only sort on a, so the order of the rows could be random. Unless it uses a multicolumn index instead of sorting. Well, anyway, it's not specified by the query.
Postgres:
SELECT DISTINCT ON (a) a,b,c,d,e FROM table ORDER BY a,b,c
This delivers 1 row per value of a, this row will be the first one in the sort according to the ORDER BY specified by the query. Simple.
Note that here, it's not an aggregate I'm computing. So GROUP BY actually makes no sense. DISTINCT ON makes a lot more sense.
Rails is married to MySQL, so I'm not surprised that it generates SQL that doesn't work in Postgres.
PostgreSQL is more SQL-compliant than MySQL: every field in the output, except fields computed with an aggregate function, must be present in the GROUP BY clause.
MySQL's GROUP BY can be used without an aggregate function (which is contrary to the SQL standard) and returns the first row of each group (I don't know based on what criteria), while in PostgreSQL every selected column must either appear in the GROUP BY clause or be wrapped in an aggregate function (MAX, SUM, etc).
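To make that concrete, here is a minimal compliant example against the question's availables table (column names taken from the expected output above): every selected column is either listed in GROUP BY or wrapped in an aggregate:
SELECT room_id, MAX(updated_at) AS last_update
FROM availables
GROUP BY room_id;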
Correct. The solution is to use :select, selecting each field that you wish to decorate the resulting object with, and to group by them.
Nasty, but that is how GROUP BY should work, as opposed to MySQL's approach of guessing what you mean if you don't put fields in your GROUP BY.
If I remember correctly, in PostgreSQL every column you fetch from the table the GROUP BY clause applies to has to be added to the GROUP BY clause.
Not the prettiest solution, but changing the group parameter to output every column in model works in PostgreSQL:
expiration = Available.find(:all,
:joins => [ :room ],
:conditions => [ "rooms.hotel_id = ? AND availables.bookdate BETWEEN ? AND ?", hostel_id, date.to_s, (date+days-1).to_s ],
:group => Available.column_names.collect{|col| "availables.#{col}"},
:order => 'availables.updated_at')
According to MySQL's "Debunking GROUP BY Myths" (http://dev.mysql.com/tech-resources/articles/debunking-group-by-myths.html), SQL (the 2003 version of the standard) doesn't require columns referenced in the SELECT list of a query to also appear in the GROUP BY clause.
For others looking for a way to order by any field, including joined fields, in PostgreSQL, use a subquery:
SELECT * FROM (
SELECT DISTINCT ON (availables.bookdate) availables.*
FROM availables INNER JOIN rooms ON rooms.id = availables.room_id
WHERE (rooms.hotel_id = 5056
AND availables.bookdate BETWEEN '2009-11-22' AND '2009-11-24')
ORDER BY availables.bookdate
) AS distinct_selected
ORDER BY distinct_selected.updated_at
or Arel:
subquery = SomeRecord.select("distinct on(xx.id) xx.*, jointable.order_field")
.joins(:jointable)
result = SomeRecord.select("*").from("(#{subquery.to_sql}) AS distinct_selected").order("distinct_selected.order_field ASC")
I think that .uniq [1] will solve your problem.
[1] Available.select('...').uniq
Take a look at http://guides.rubyonrails.org/active_record_querying.html#selecting-specific-fields