Updating records in Postgres using nested sub-selects - postgresql

I have a table where I have added a new column, and I want to write a SQL statement to update that column based on existing information. Here are the two tables and the relevant columns
'leagues'
=> id
=> league_key
=> league_id (this is the new column)
'permissions'
=> id
=> league_key
Now, what I want to do, in plain English, is this
Set leagues.league_id to be permissions.id for each value of permissions.league_key
I had tried SQL like this:
UPDATE leagues
SET league_id =
(SELECT id FROM permissions WHERE league_key =
(SELECT distinct(league_key) FROM leagues))
WHERE league_key = (SELECT distinct(league_key) FROM leagues)
but I am getting an error message that says
ERROR: more than one row returned by a subquery used as an expression
Any help for this would be greatly appreciated

Based on your requirements of
Set leagues.league_id to be permissions.id for each value of permissions.league_key
This does that.
UPDATE leagues
SET league_id = permissions_id
FROM permissions
WHERE permissions.league_key = leagues.league_key;

When you do a subquery as an expression, it can't return a result set. Your subquery must evaluate to a single result. The error that you are seeing is because one of your subqueries returns more than one value.
Here is the relevant documentation for pg84:

Related

SQL update statements updates wrong fields

I have the following code in Postgres
select op.url from identity.legal_entity le
join identity.profile op on le.legal_entity_id =op.legal_entity_id
where op.global_id = '8wyvr9wkd7kpg1n0q4klhkc4g'
which returns 1 row.
Then I try to update the url field with the following:
update identity.profile
set url = 'htpp:sam'
where identity.profile.url in (
select op.url from identity.legal_entity le
join identity.profile op on le.legal_entity_id =op.legal_entity_id
where global_id = '8wyvr9wkd7kpg1n0q4klhkc4g'
);
But the above ends up updating more than 1 row, actually all of the rows of the identity table.
I would assume since the first postgres statement returns one row, only one row at most can be updated, but I am getting the wrong effect where all of the rows are being updated. Why ?? Please help a nubie fix the above update statement.
Instead of using profile.url to identify the row you want to update, use the primary key. That is what it is there for.
So if the primary key column is called id, the statement could be modified to:
UPDATE identity.profile
SET ...
WHERE identity.profile.id IN (SELECT op.id FROM ...);
But you can do this much simpler in PostgreSQL with
UPDATE identity.profile op
SET url = 'htpp:sam'
FROM identity.legal_entity le
WHERE le.legal_entity_id = op.legal_entity_id
AND le.global_id = '8wyvr9wkd7kpg1n0q4klhkc4g';

Postgres Update Using Select Passing In Parent Variable

I need to update a few thousand rows in my Postgres table using the result of a array_agg and spatial lookup.
The query needs to take the geometry of the parent table, and return an array of the matching row IDs in the other table. It may return no IDs or potentially 2-3 IDs.
I've tried to use an UPDATE FROM but I can't seem to pass into the subquery the parent table geom column for the SELECT. I can't see any way of doing a JOIN between the 2 tables.
Here is what I currently have:
UPDATE lrc_wales_data.records
SET lrc_array = subquery.lrc_array
FROM (
SELECT array_agg(wales_lrcs.gid) AS lrc_array
FROM layers.wales_lrcs
WHERE st_dwithin(records.geom_poly, wales_lrcs.geom, 0)
) AS subquery
WHERE records.lrc = 'nrw';
The error I get is:
ERROR: invalid reference to FROM-clause entry for table "records"
LINE 7: WHERE st_dwithin(records.geom_poly, wales_lrcs.geom, 0)
Is this even possible?
Many thanks,
Steve
Realised there was no need to use SET FROM. I could just use a sub query directly in the SET:
UPDATE lrc_wales_data.records
SET lrc_array = (
SELECT array_agg(wales_lrcs.gid) AS lrc
FROM layers.wales_lrcs
WHERE st_dwithin(records.geom_poly, wales_lrcs.geom, 0)
)
WHERE records.lrc = 'nrw';

How can I orderby id when I use distinct on Web2py

I'm using postgresql database and have a log table. I want to show the ticket information from this log table and want to sort by id. But ticket id has duplicate data, so I use distinct to filter, and then I can't sort by id when I use distinct.
How can I solve this issue? Thanks!
rows = db(db.log.ticket_id != '').select(db.log.ALL, orderby=~db.log.id, distinct=db.log.ticket_id , limitby=((page-1) * PAGE_ROWS, (page*PAGE_ROWS)))
I got the error message:
SELECT DISTINCT ON expressions must match initial ORDER BY expressions
And I try to:
rows = db(db.log.ticket_id != '').select(db.log.ALL, orderby=~db.log.id|db.log.ticket_id, distinct=db.log.ticket_id , limitby=((page-1) * PAGE_ROWS, (page*PAGE_ROWS)))
But still can't work...

Select most reviewed courses starting from courses having at least 2 reviews

I'm using Flask-SQLAlchemy with PostgreSQL. I have the following two models:
class Course(db.Model):
id = db.Column(db.Integer, primary_key = True )
course_name =db.Column(db.String(120))
course_description = db.Column(db.Text)
course_reviews = db.relationship('Review', backref ='course', lazy ='dynamic')
class Review(db.Model):
__table_args__ = ( db.UniqueConstraint('course_id', 'user_id'), { } )
id = db.Column(db.Integer, primary_key = True )
review_date = db.Column(db.DateTime)#default=db.func.now()
review_comment = db.Column(db.Text)
rating = db.Column(db.SmallInteger)
course_id = db.Column(db.Integer, db.ForeignKey('course.id') )
user_id = db.Column(db.Integer, db.ForeignKey('user.id') )
I want to select the courses that are most reviewed starting with at least two reviews. The following SQLAlchemy query worked fine with SQlite:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).group_by(models.Review.course_id).\
having(func.count(models.Review.course_id) >1) \ .order_by(func.count(models.Review.course_id).desc()).all()
But when I switched to PostgreSQL in production it gives me the following error:
ProgrammingError: (ProgrammingError) column "review.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT review.id AS review_id, review.review_date AS review_...
^
'SELECT review.id AS review_id, review.review_date AS review_review_date, review.review_comment AS review_review_comment, review.rating AS review_rating, review.course_id AS review_course_id, review.user_id AS review_user_id, count(review.course_id) AS count_1 \nFROM review GROUP BY review.course_id \nHAVING count(review.course_id) > %(count_2)s ORDER BY count(review.course_id) DESC' {'count_2': 1}
I tried to fix the query by adding models.Review in the GROUP BY clause but it did not work:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).group_by(models.Review.course_id).\
having(func.count(models.Review.course_id) >1) \.order_by(func.count(models.Review.course_id).desc()).all()
Can anyone please help me with this issue. Thanks a lot
SQLite and MySQL both have the behavior that they allow a query that has aggregates (like count()) without applying GROUP BY to all other columns - which in terms of standard SQL is invalid, because if more than one row is present in that aggregated group, it has to pick the first one it sees for return, which is essentially random.
So your query for Review basically returns to you the first "Review" row for each distinct course id - like for course id 3, if you had seven "Review" rows, it's just choosing an essentially random "Review" row within the group of "course_id=3". I gather the answer you really want, "Course", is available here because you can take that semi-randomly selected Review object and just call ".course" on it, giving you the correct Course, but this is a backwards way to go.
But once you get on a proper database like Postgresql you need to use correct SQL. The data you need from the "review" table is just the course_id and the count, nothing else, so query just for that (first assume we don't actually need to display the counts, that's in a minute):
most_rated_course_ids = session.query(
Review.course_id,
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
order_by(func.count(Review.course_id).desc()).\
all()
but that's not your Course object - you want to take that list of ids and apply it to the course table. We first need to keep our list of course ids as a SQL construct, instead of loading the data - that is, turn it into a derived table by converting the query into a subquery (change the word .all() to .subquery()):
most_rated_course_id_subquery = session.query(
Review.course_id,
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
order_by(func.count(Review.course_id).desc()).\
subquery()
one simple way to link that to Course is to use an IN:
courses = session.query(Course).filter(
Course.id.in_(most_rated_course_id_subquery)).all()
but that's essentially going to throw away the "ORDER BY" you're looking for and also doesn't give us any nice way of actually reporting on those counts along with the course results. We need to have that count along with our Course so that we can report it and also order by it. For this we use a JOIN from the "course" table to our derived table. SQLAlchemy is smart enough to know to join on the "course_id" foreign key if we just call join():
courses = session.query(Course).join(most_rated_course_id_subquery).all()
then to get at the count, we need to add that to the columns returned by our subquery along with a label so we can refer to it:
most_rated_course_id_subquery = session.query(
Review.course_id,
func.count(Review.course_id).label("count")
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
subquery()
courses = session.query(
Course, most_rated_course_id_subquery.c.count
).join(
most_rated_course_id_subquery
).order_by(
most_rated_course_id_subquery.c.count.desc()
).all()
A great article I like to point out to people about GROUP BY and this kind of query is SQL GROUP BY techniques which points out the common need for the "select from A join to (subquery of B with aggregate/GROUP BY)" pattern.

Update columns based on subquery containing "select unnest" clause

I am attempting to update a boolean column in one table based upon the values in a second.
UPDATE channels
SET contains_photos = TRUE
WHERE id IN (SELECT unnest(ancestors)
FROM channel_tree WHERE id = 11329);
The channel_tree.ancestors column contains an array of channel IDs. The above is failing with the following error:
ERROR: cannot TRUNCATE "channel_tree" because it is being used by active queries in this session
The overriding goal is to set the contains_photos column to true for all ancestors of a given channel. Any one know how best to alleviate this error, or even an alternative solution?
No idea why your error says TRUNCATE. It sounds like you have a trigger or rule that is doing a truncate that we can't see.
Here's some alternative ways of doing that same query:
UPDATE channels
SET contains_photos = TRUE
WHERE id = ANY (SELECT ancestors
FROM channel_tree WHERE id = 11329);
Or with a join:
UPDATE channels
SET contains_photos = TRUE
FROM channel_tree
WHERE channels.id = ANY (channel_tree.ancestors)
AND channel_tree.id = 11329;