Aggregate function to extract all fields based on maximum date - postgresql

In one table I have duplicate values that I would like to group, exporting only the fields from the row whose "published_at" value is the most recent (the latest date). Do I understand correctly that when I use the MAX aggregate function, the other fields I extract will come from the row where the maximum was found, or will they be taken from the first row found in the table?
Let me demonstrate this with a simple example (in the real-world case I am also joining two different tables). I would like to group by id and extract all fields, but only from the row with the max published_at. My query would be:
SELECT "t1"."id", "t1"."field", MAX("t1"."published_at") as "published_at"
FROM "t1"
GROUP By "t1"."id"
| id | field | published_at |
---------------------------------
| 1 | document1 | 2022-01-10 |
| 1 | document2 | 2022-01-11 |
| 1 | document3 | 2022-01-12 |
The result I want is:
1 - document3 - 2022-01-12
One more question: why am I getting the error "ERROR: column "t1"."field" must appear in the GROUP BY clause or be used in an aggregate function"? Can I use the MAX function on a string-type column?

If you want the latest row for each id, you can use DISTINCT ON. For example:
select distinct on (id) *
from t
order by id, published_at desc
If you just want the latest row in the whole result set you can use LIMIT. For example:
select *
from t
order by published_at desc
limit 1
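If DISTINCT ON is not available, the same "latest row per id" result can be sketched portably by joining the table back to its per-id maximum. The snippet below is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database so it can be run anywhere (an assumption for the demo, not PostgreSQL itself), and the sample rows are the ones from the question.

```python
import sqlite3

# In-memory SQLite database, used only so the sketch is runnable;
# the JOIN-on-MAX pattern itself is plain SQL and also works in PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, field TEXT, published_at TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        (1, "document1", "2022-01-10"),
        (1, "document2", "2022-01-11"),
        (1, "document3", "2022-01-12"),
    ],
)

# The subquery finds the latest published_at per id; joining back to t1
# picks up the other columns of exactly that row.
latest_rows = conn.execute(
    """
    SELECT t1.id, t1.field, t1.published_at
    FROM t1
    JOIN (
        SELECT id, MAX(published_at) AS max_published_at
        FROM t1
        GROUP BY id
    ) AS latest
      ON latest.id = t1.id
     AND latest.max_published_at = t1.published_at
    ORDER BY t1.id
    """
).fetchall()

print(latest_rows)  # [(1, 'document3', '2022-01-12')]
```

Unlike DISTINCT ON, this form returns several rows for an id when two rows share the same maximum published_at.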

Related

ADF - Dataflow, using Join to send new values

There are two tables.
tbl_1 is the source data:
ID | Submission_id
--------------------
1 | A00_1
2 | A00_2
3 | A00_3
4 | A00_4
5 | A00_5
6 | A00_6
7 | A00_7
tbl_2 is the destination. In this table, Submission_id is a unique key.
ID | Submission_id
--------------------
1 | A00_1
2 | A00_2
3 | A00_3
4 | A00_4
tbl_1 is the input and tbl_2 is the destination (sink). The expected result is that only A00_5, A00_6 and A00_7 are sent to tbl_2.
[Screenshot: Join settings]
[Screenshot: AlterRow settings]
Expected output:
tbl_2
ID | Submission_id
--------------------
1 | A00_1
2 | A00_2
3 | A00_3
4 | A00_4
5 | A00_5 -->(new)
6 | A00_6 -->(new)
7 | A00_7 -->(new)
However, the AlterRow output contains every Submission_id. It should contain only the rows that satisfy the not-equal comparison stated in the alter row condition,
notEquals(DC__Submission_ID_BigInt, SrcStgDestination@{_Submission_ID}).
How can I solve this problem in an Azure Data Flow using 'Join'?
I tried doing the same procedure and got the same result (all rows getting inserted). We were able to perform join in the desired way but couldn’t proceed further to get the required output. You can use the approach given below instead, which is achieved using JOINS.
In general, when we want to get records from table1 which are not present in table2, we execute the following query (in sql server).
select t1.id,t1.submission_id from t1 left outer join t2 on t1.submission_id = t2.submission_id where t2.submission_id is NULL
In the Dataflow, we were able to achieve the join successfully (same procedure as yours). Now, instead of using the alter row transformation, I used a filter transformation (to achieve the t2.submission_id is NULL condition). I used the following expression (condition) to filter.
isNull(d1@submission_id) && isNull(d1@id)
Now proceed to configure the sink (tbl_2). The data preview should then show only the records that are missing from tbl_2.
Publish and run the dataflow activity in your pipeline to get the desired results.
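The left-join-plus-IS-NULL filter described above can be checked outside of Data Factory. The following is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database (an assumption made only so the example is runnable) and the sample rows from the question; it is not Data Flow code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_1 (id INTEGER, submission_id TEXT);
    CREATE TABLE tbl_2 (id INTEGER, submission_id TEXT UNIQUE);
    """
)
# tbl_1 holds A00_1 .. A00_7, tbl_2 already holds A00_1 .. A00_4.
conn.executemany("INSERT INTO tbl_1 VALUES (?, ?)", [(i, f"A00_{i}") for i in range(1, 8)])
conn.executemany("INSERT INTO tbl_2 VALUES (?, ?)", [(i, f"A00_{i}") for i in range(1, 5)])

# Anti-join: keep the left rows for which the join found no partner,
# i.e. the right-hand columns came back as NULL.
new_rows = conn.execute(
    """
    SELECT t1.id, t1.submission_id
    FROM tbl_1 AS t1
    LEFT OUTER JOIN tbl_2 AS t2
      ON t1.submission_id = t2.submission_id
    WHERE t2.submission_id IS NULL
    ORDER BY t1.id
    """
).fetchall()

print(new_rows)  # [(5, 'A00_5'), (6, 'A00_6'), (7, 'A00_7')]
```

Only the three unmatched rows survive the filter, which is the set that should reach the sink.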

group by in postgres sql with error must appear in the GROUP BY clause or be used in an aggregate function [duplicate]

I've been migrating some of my MySQL queries to PostgreSQL to use Heroku. Most of my queries work fine, but I keep having a similar recurring error when I use group by:
ERROR: column "XYZ" must appear in the GROUP BY clause or be used in
an aggregate function
Could someone tell me what I'm doing wrong?
MySQL which works 100%:
SELECT `availables`.*
FROM `availables`
INNER JOIN `rooms` ON `rooms`.id = `availables`.room_id
WHERE (rooms.hotel_id = 5056 AND availables.bookdate BETWEEN '2009-11-22' AND '2009-11-24')
GROUP BY availables.bookdate
ORDER BY availables.updated_at
PostgreSQL error:
ActiveRecord::StatementInvalid: PGError: ERROR: column
"availables.id" must appear in the GROUP BY clause or be used in an
aggregate function:
SELECT "availables".* FROM "availables" INNER
JOIN "rooms" ON "rooms".id = "availables".room_id WHERE
(rooms.hotel_id = 5056 AND availables.bookdate BETWEEN E'2009-10-21'
AND E'2009-10-23') GROUP BY availables.bookdate ORDER BY
availables.updated_at
Ruby code generating the SQL:
expiration = Available.find(:all,
:joins => [ :room ],
:conditions => [ "rooms.hotel_id = ? AND availables.bookdate BETWEEN ? AND ?", hostel_id, date.to_s, (date+days-1).to_s ],
:group => 'availables.bookdate',
:order => 'availables.updated_at')
Expected Output (from working MySQL query):
+-----+-------+-------+------------+---------+---------------+---------------+
| id | price | spots | bookdate | room_id | created_at | updated_at |
+-----+-------+-------+------------+---------+---------------+---------------+
| 414 | 38.0 | 1 | 2009-11-22 | 1762 | 2009-11-20... | 2009-11-20... |
| 415 | 38.0 | 1 | 2009-11-23 | 1762 | 2009-11-20... | 2009-11-20... |
| 416 | 38.0 | 2 | 2009-11-24 | 1762 | 2009-11-20... | 2009-11-20... |
+-----+-------+-------+------------+---------+---------------+---------------+
3 rows in set
MySQL's totally non standards compliant GROUP BY can be emulated by Postgres' DISTINCT ON. Consider this:
MySQL:
SELECT a,b,c,d,e FROM table GROUP BY a
This delivers 1 row per value of a (which one, you don't really know). Well actually you can guess, because MySQL doesn't know about hash aggregates, so it will probably use a sort... but it will only sort on a, so the order of the rows could be random. Unless it uses a multicolumn index instead of sorting. Well, anyway, it's not specified by the query.
Postgres:
SELECT DISTINCT ON (a) a,b,c,d,e FROM table ORDER BY a,b,c
This delivers 1 row per value of a, this row will be the first one in the sort according to the ORDER BY specified by the query. Simple.
Note that here, it's not an aggregate I'm computing. So GROUP BY actually makes no sense. DISTINCT ON makes a lot more sense.
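To make the "first row in the sort" rule concrete, here is a small pure-Python sketch of what DISTINCT ON (a) ... ORDER BY a, b, c does. The helper name distinct_on and the sample rows are made up for illustration; this is not how Postgres implements it.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows with columns a, b, c.
rows = [
    {"a": 1, "b": 20, "c": "x"},
    {"a": 2, "b": 5, "c": "y"},
    {"a": 1, "b": 10, "c": "z"},
    {"a": 2, "b": 7, "c": "w"},
]


def distinct_on(rows, key, order_by):
    """Keep the first row of each `key` group after sorting by `order_by`.

    As in Postgres, `order_by` is expected to start with `key`.
    """
    ordered = sorted(rows, key=itemgetter(*order_by))
    # groupby only merges adjacent rows, which the sort above guarantees.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


first_per_a = distinct_on(rows, key="a", order_by=("a", "b", "c"))
print(first_per_a)
# [{'a': 1, 'b': 10, 'c': 'z'}, {'a': 2, 'b': 5, 'c': 'y'}]
```

Which row wins is decided entirely by the sort columns after the key, so changing the ORDER BY changes the result in a predictable way.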
Rails is married to MySQL, so I'm not surprised that it generates SQL that doesn't work in Postgres.
PostgreSQL is more SQL compliant than MySQL. Every field in the output, except computed fields that use an aggregate function, must be present in the GROUP BY clause.
MySQL's GROUP BY can be used without an aggregate function (which is contrary to the SQL standard), and returns the first row in the group (I don't know based on what criteria), while PostgreSQL requires an aggregate function (MAX, SUM, etc.) on every selected column that is not listed in the GROUP BY clause.
Correct, the solution to fixing this is to use :select and to select each field that you wish to decorate the resulting object with and group by them.
Nasty - but it is how group by should work as opposed to how MySQL works with it by guessing what you mean if you don't stick fields in your group by.
If I remember correctly, in PostgreSQL you have to add every column you fetch from the table where the GROUP BY clause applies to the GROUP BY clause.
Not the prettiest solution, but changing the group parameter to output every column in model works in PostgreSQL:
expiration = Available.find(:all,
:joins => [ :room ],
:conditions => [ "rooms.hotel_id = ? AND availables.bookdate BETWEEN ? AND ?", hostel_id, date.to_s, (date+days-1).to_s ],
:group => Available.column_names.collect{|col| "availables.#{col}"},
:order => 'availables.updated_at')
According to MySQL's "Debunking GROUP BY Myths" (http://dev.mysql.com/tech-resources/articles/debunking-group-by-myths.html), SQL (the 2003 version of the standard) doesn't require columns referenced in the SELECT list of a query to also appear in the GROUP BY clause.
For others looking for a way to order by any field, including joined field, in postgresql, use a subquery:
SELECT * FROM (
SELECT DISTINCT ON (availables.bookdate) availables.*
FROM availables INNER JOIN rooms ON rooms.id = availables.room_id
WHERE (rooms.hotel_id = 5056
AND availables.bookdate BETWEEN '2009-11-22' AND '2009-11-24')
) AS distinct_selected
-- the outer query only sees the subquery alias, not availables
ORDER BY distinct_selected.updated_at
or arel:
subquery = SomeRecord.select("distinct on(xx.id) xx.*, jointable.order_field")
.where("").joins("")
result = SomeRecord.select("*").from("(#{subquery.to_sql}) AS distinct_selected").order(" xx.order_field ASC, jointable.order_field ASC")
I think that .uniq [1] will solve your problem.
[1] Available.select('...').uniq
Take a look at http://guides.rubyonrails.org/active_record_querying.html#selecting-specific-fields

Postgres Query for Beginners

OK, I deleted my previous post and will try this again. I don't know this topic well, and I'm not sure whether this calls for a loop or a stored function, or how else to get what I'm looking for. Here are sample data and the expected output.
I have a single table A with the following fields: date created, unique person key, type, location.
I need a Postgres query that, for a given month (a parameter, based on date created) and a given location (a parameter, based on the location field), returns the fields below for rows where the unique person key also appears within plus or minus 30 days of the date created, for the same type but across all locations.
Example Data
Date Created | Unique Person | Type | Location
---------------------------------------------------
2/5/2017 | 1 | Admit | Hospital1
2/6/2017 | 2 | Admit | Hospital2
2/15/2017 | 1 | Admit | Hospital2
2/28/2017 | 3 | Admit | Hospital2
3/3/2017 | 2 | Admit | Hospital1
3/15/2017 | 3 | Admit | Hospital3
3/20/2017 | 4 | Admit | Hospital1
4/1/2017 | 1 | Admit | Hospital2
Output for the month of March for Hospital1:
DateCreated| UniquePerson | Type | Location | +-30days | OtherLoc.
------------------------------------------------------------------------
3/3/2017 | 2 | Admit| Hospital1 | 2/6/2017 | Hospital2
Output for the month of March for Hospital2:
None, because no one was seen at Hospital2 in March
Output for the month of March for Hospital3:
DateCreated| UniquePerson | Type | Location | +-30days | otherLoc.
------------------------------------------------------------------------
3/15/2017 | 3 | Admit| Hospital3 | 2/28/2017 | Hospital2
Version 1
I would use a WITH clause. Please notice that I've added a column id that is a primary key to simplify the query. It's just there to prevent rows from being matched with themselves.
WITH x AS (
SELECT
id,
date_created,
unique_person_id,
type,
location
FROM
a
WHERE
location = 'Hospital1' AND
date_trunc('month', date_created) = date_trunc('month', '2017-03-01'::date)
)
SELECT
x.date_created,
x.unique_person_id,
x.type,
x.location,
a.date_created AS "+-30days",
a.location AS other_location
FROM
x
JOIN a
USING (unique_person_id, type)
WHERE
x.id != a.id AND
abs(x.date_created - a.date_created) <= 30;
Now a little explanation:
First we select the reference data with a WITH clause. Think of it as a temporary table that we can reference in the main query. In this example it holds the "main visits" to the given hospital.
Then we join the "main visits" with other visits of the same person and type (JOIN condition) that happened within 30 days (WHERE condition).
Notice that the WITH query contains the limits you want to check (location and date). I use the date_trunc function, which truncates a date to the specified precision (a month in this case).
Version 2
As @Laurenz Albe suggested, there is no special need to use a WITH clause. Right, so here is a second version.
SELECT
x.date_created,
x.unique_person_id,
x.type,
x.location,
a.date_created AS "+-30days",
a.location AS other_location
FROM
a AS x
JOIN a
USING (unique_person_id, type)
WHERE
x.location = 'Hospital1' AND
date_trunc('month', x.date_created) = date_trunc('month', '2017-03-01'::date) AND
x.id != a.id AND
abs(x.date_created - a.date_created) <= 30;
This version is shorter than the first one but, in my opinion, the first is easier to understand. I don't have a big enough data set to test with, so I don't know which one runs faster (the query planner shows similar values for both).
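For anyone who wants to try the self-join against the sample data without a Postgres server, here is a rough sketch using Python's built-in sqlite3 module. Several things are assumptions made for the demo: the dates are stored as ISO strings, SQLite's strftime and julianday stand in for date_trunc and date subtraction, and the function name repeat_visits is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE a (id INTEGER PRIMARY KEY, date_created TEXT, "
    "unique_person_id INTEGER, type TEXT, location TEXT)"
)
conn.executemany(
    "INSERT INTO a (date_created, unique_person_id, type, location) VALUES (?, ?, ?, ?)",
    [
        ("2017-02-05", 1, "Admit", "Hospital1"),
        ("2017-02-06", 2, "Admit", "Hospital2"),
        ("2017-02-15", 1, "Admit", "Hospital2"),
        ("2017-02-28", 3, "Admit", "Hospital2"),
        ("2017-03-03", 2, "Admit", "Hospital1"),
        ("2017-03-15", 3, "Admit", "Hospital3"),
        ("2017-03-20", 4, "Admit", "Hospital1"),
        ("2017-04-01", 1, "Admit", "Hospital2"),
    ],
)


def repeat_visits(conn, month, location):
    """Visits at `location` in `month` ('YYYY-MM') that have another visit
    by the same person and type within 30 days, at any location."""
    return conn.execute(
        """
        SELECT x.date_created, x.unique_person_id, x.type, x.location,
               a.date_created, a.location
        FROM a AS x
        JOIN a
          ON a.unique_person_id = x.unique_person_id
         AND a.type = x.type
        WHERE x.location = ?
          AND strftime('%Y-%m', x.date_created) = ?
          AND x.id != a.id
          AND abs(julianday(x.date_created) - julianday(a.date_created)) <= 30
        ORDER BY x.date_created, a.date_created
        """,
        (location, month),
    ).fetchall()


print(repeat_visits(conn, "2017-03", "Hospital1"))
# [('2017-03-03', 2, 'Admit', 'Hospital1', '2017-02-06', 'Hospital2')]
print(repeat_visits(conn, "2017-03", "Hospital2"))
# []
```

Passing the month and location as bound parameters matches the "parameter" wording in the question, and the results line up with the expected output listed there.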

Having(count) in a Cakephp 3 query fails on PostgreSQL

I have the following table structure with matching relations:
,---------. ,--------------. ,---------.
| Threads | | ThreadsUsers | | Users |
|---------| |--------------| |---------|
| id | | id | | id |
'---------' | thread_id | '---------'
| user_id |
'--------------'
This custom query in ThreadsTable is meant to find threads with a given number of participants. It works fine on mysql
public function findWithUserCount(Query $query, array $options)
{
return $query
->matching('Users')
->select([
'Threads.id',
'count' => 'COUNT(Users.id)'
])
->group('Threads.id HAVING count = ' . $options['count']);
}
However it fails on postgresql with the following error
PDOException: SQLSTATE[42703]: Undefined column: 7
ERROR: column "count" does not exist
LINE 1: ...ThreadsUsers.user_id)) GROUP BY Threads.id HAVING count = 2
The HAVING clause cannot reference column aliases defined in the SELECT clause. The documentation says:
Each column referenced in condition must unambiguously reference a grouping column, unless the reference appears within an aggregate function or the ungrouped column is functionally dependent on the grouping columns.
Since count is neither a "grouping column" (i.e. the subject of the GROUP BY clause) nor an aggregate function, it can't be used there.
So the correct form would presumably be (I don't know CakePHP, and the fact that you can inject SQL into the group call at all seems like a massively broken design for a query builder):
->group('Threads.id HAVING COUNT(Users.id) = ' . $options['count']);
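Stripped of the query builder, the portable form repeats the aggregate in HAVING. Below is a small sketch using Python's built-in sqlite3 module; the table contents and the function name threads_with_user_count are invented for the demo. Note that SQLite, like MySQL, happens to accept the alias in HAVING, so this only demonstrates the form that PostgreSQL also accepts, not the error itself. The count is passed as a bound parameter rather than concatenated into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE threads_users (id INTEGER PRIMARY KEY, thread_id INTEGER, user_id INTEGER)"
)
# Thread 1 has 2 users, thread 2 has 3 users, thread 3 has 1 user.
conn.executemany(
    "INSERT INTO threads_users (thread_id, user_id) VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10), (2, 11), (2, 12), (3, 10)],
)


def threads_with_user_count(conn, count):
    """Return (thread_id, user_count) for threads with exactly `count` users."""
    return conn.execute(
        """
        SELECT thread_id, COUNT(user_id) AS user_count
        FROM threads_users
        GROUP BY thread_id
        HAVING COUNT(user_id) = ?  -- repeat the aggregate, not the alias
        ORDER BY thread_id
        """,
        (count,),
    ).fetchall()


two_participant_threads = threads_with_user_count(conn, 2)
print(two_participant_threads)  # [(1, 2)]
```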

Search inside full search column using certain letters

I want to search inside a fulltext-search column using only the first few letters of a word. For example:
select "Name","Country","_score" from datatable where match("Country", 'China');
This returns many rows, which is fine. My question is how I can search with a prefix, for example:
select "Name","Country","_score" from datatable where match("Country", 'Ch');
I want to see China, Chile, etc.
I think that the match_type phrase_prefix could be the answer, but I don't know the correct syntax for it.
The match predicate supports different match types through the syntax using match_type [with (match_parameter = [value])].
So in your example using the phrase_prefix match type:
select "Name","Country","_score" from datatable where match("Country", 'Ch') using phrase_prefix;
gives you your desired results.
See the match predicate documentation: https://crate.io/docs/en/latest/sql/fulltext.html?#match-predicate
If you just need to match the beginning of a string column, you don't need a fulltext analyzed column. You can use the LIKE operator instead, e.g.:
cr> create table names_table (name string, country string);
CREATE OK (0.840 sec)
cr> insert into names_table (name, country) values ('foo', 'China'), ('bar','Chile'), ('foobar', 'Austria');
INSERT OK, 3 rows affected (0.049 sec)
cr> select * from names_table where country like 'Ch%';
+---------+------+
| country | name |
+---------+------+
| Chile | bar |
| China | foo |
+---------+------+
SELECT 2 rows in set (0.037 sec)