Multiple conditions in postgresql_where?

postgresql_where is useful to get around the way Postgres defines uniqueness (wrongly, in my opinion, but apparently it follows the SQL standard), where NULL values never compare equal to one another, so a unique index permits any number of rows that differ only in a NULL column. A typical example is shown below, where no item can have identical name+purpose+batch_id values (and None/NULL is treated as one unique value thanks to the second Index).
class Item(StoredObject, Base):
    batch_id = Column(Integer, ForeignKey('batch.id'))
    group_id = Column(Integer, ForeignKey('group.id'))
    name = Column(Text, nullable=False)
    purpose = Column(Text, nullable=False, default="")

    __table_args__ = (
        Index('idx_batch_has_value',
              'group_id', 'name', 'purpose', 'batch_id',
              unique=True,
              postgresql_where=(batch_id.isnot(None))),
        Index('idx_batch_has_no_value',
              'group_id', 'name', 'purpose',
              unique=True,
              postgresql_where=(batch_id.is_(None))),
    )
However, I want the same behaviour across two ids (batch_id and group_id), that is to say that no item can have identical name+purpose+batch_id+group_id values (None/NULL is treated as one unique value in both batch_id and group_id).
I can work around this by creating a 'default' batch/group object with a fixed ID (say 0). This means I'd have to ensure that batch/group object always exists, cannot be deleted, and that its id never gets re-appropriated for a 'real' batch/group object (not to mention I'd have to remember to subtract one in every function that counts how many batches/groups I have).
Doable, and I'm about to do it now, but there must be a better way! Is there something like:
postgresql_where = (batch_id.isnot(None) AND group_id.isnot(None))
That would solve the problem where, in my opinion, it is meant to be solved: in the DB, rather than in my model and/or initialization code.
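(For what it's worth, combining conditions like that does appear to be possible: postgresql_where accepts an arbitrary SQL expression, so several conditions can be joined with sqlalchemy.and_() or the & operator. A minimal sketch reusing the model above; the index name is made up for illustration:

    from sqlalchemy import and_

    Index('idx_batch_and_group_have_values',  # illustrative name
          'group_id', 'name', 'purpose', 'batch_id',
          unique=True,
          postgresql_where=and_(batch_id.isnot(None),
                                group_id.isnot(None)))

Treating NULL as one value in both columns would then take four partial indexes in total, one per NULL/NOT NULL combination of batch_id and group_id. On PostgreSQL 15+, a unique index created with NULLS NOT DISTINCT would avoid the partial indexes entirely.)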

Related

How to end up with a new Postgresql column value if there is a mismatch

INSERT INTO main_parse_user ("user_id", "group_id", "username", "bio", "first_name")
VALUES (%s, %s, %s, %s, %s)
ON CONFLICT (user_id) DO UPDATE
SET (group_id, username, bio, first_name) =
    (EXCLUDED.group_id,
     EXCLUDED.username,
     coalesce(main_parse_user.bio, EXCLUDED.bio),
     EXCLUDED.first_name)
Here is the code I have now; on conflict it updates everything except bio (bio is only updated if the existing value is empty).
A new requirement has come up: when new data arrives, compare it with the old value; if the values differ, supplement the old value, and if they do not differ, just leave it as it is.
EXAMPLE
OLD:   id 1, bio "qwerty"
NEW:   id 1, bio "qwerty1"
AFTER: id 1, bio "qwerty qwerty1"
And if the old and new bios are the same, the row should not be touched.
What you are getting is precisely what the coalesce() function does. Since the new requirement is to supplement (I assume that means append to the existing value), you can replace coalesce(...) with:
case
    when main_parse_user.bio is distinct from EXCLUDED.bio
         and EXCLUDED.bio is not null
    then concat(trim(main_parse_user.bio), ' ', trim(EXCLUDED.bio))
    else main_parse_user.bio
end
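Putting it together, the full statement might look like this sketch (assuming psycopg2, as the %s placeholders in the original suggest; dsn and the bound value variables are placeholders for illustration):

    import psycopg2

    UPSERT_SQL = """
        INSERT INTO main_parse_user (user_id, group_id, username, bio, first_name)
        VALUES (%s, %s, %s, %s, %s)
        ON CONFLICT (user_id) DO UPDATE
        SET (group_id, username, bio, first_name) = (
            EXCLUDED.group_id,
            EXCLUDED.username,
            -- append the new bio only when it is non-null and actually different
            CASE WHEN main_parse_user.bio IS DISTINCT FROM EXCLUDED.bio
                      AND EXCLUDED.bio IS NOT NULL
                 THEN concat(trim(main_parse_user.bio), ' ', trim(EXCLUDED.bio))
                 ELSE main_parse_user.bio
            END,
            EXCLUDED.first_name
        )
    """

    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:  # dsn: placeholder
        # the bind values below stand in for whatever your code supplies
        cur.execute(UPSERT_SQL, (user_id, group_id, username, bio, first_name))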

Querying for overlapping time ranges in SQLAlchemy and Postgres

I'm using Flask-SQLAlchemy to describe a Postgres database. Three related tables look like this (in part):
from sqlalchemy.dialects.postgresql import TSTZRANGE

class Shift(Base):
    __tablename__ = "shifts"
    id = db.Column(db.Integer, primary_key=True)
    hours = db.Column(TSTZRANGE, nullable=False)

class Volunteer(Base):
    __tablename__ = "volunteers"
    id = db.Column(db.Integer(), primary_key=True)
    shifts = db.relationship(
        "Shift",
        secondary="shift_assignments",
        backref=db.backref("volunteers", lazy="dynamic"),
    )

class ShiftAssignment(Base):
    __tablename__ = "shift_assignments"
    __table_args__ = (db.UniqueConstraint('shift_id', 'volunteer_id', name='_shift_vol_uc'),)
    id = db.Column(db.Integer, primary_key=True)
    shift_id = db.Column("shift_id", db.Integer(), db.ForeignKey("shifts.id"))
    volunteer_id = db.Column(
        "volunteer_id", db.Integer(), db.ForeignKey("volunteers.id")
    )
Now, I'm assigning a Volunteer to a new Shift and want to make sure that the volunteer isn't already committed to a different Shift at the same time.
I have tried this in a Volunteer instance method, but it's not working:
new_shift = db.session.get(Shift, new_shift_id)
if new_shift not in self.shifts:
    for shift in self.shifts:
        overlap = db.session.scalar(shift.hours.overlaps(new_shift.hours))
This results in the following exception:
'DateTimeTZRange' object has no attribute 'overlaps'
It seems like I shouldn't be doing this by iterating over the list anyway, but should be querying the DB directly to do the date-overlap math. So I guess I need to join the volunteers and shifts tables and then filter to find out whether any assigned shifts overlap with the target shift. But I can't figure out how to do that, and examples of overlaps and its RangeOperators friends are really thin on the ground.
Would appreciate a hand here.
It was much easier than I was making it. Again, this is in a Volunteer instance method.
new_shift = db.session.get(Shift, new_shift_id)
overlapping_shift = (
    db.session.query(Shift, ShiftAssignment)
    .join(ShiftAssignment)
    .filter(ShiftAssignment.volunteer_id == self.id)
    .filter(Shift.hours.overlaps(new_shift.hours))
    .first()
)
if overlapping_shift:
    print("overlap found")
Note that the query returns a (Shift, ShiftAssignment) tuple. We join the two appropriate tables and then filter twice, and are left with any overlapping shifts assigned to the current volunteer. (The original attempt failed because shift.hours on a loaded instance is a plain DateTimeTZRange value, which has no overlaps() method; the overlaps() operator lives on the column expression Shift.hours.)
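If all you need is a yes/no answer, the same check can also be phrased as an EXISTS query, so no rows are loaded at all. A sketch against the same models, again as a Volunteer instance method (has_overlapping_shift is a made-up name):

    from sqlalchemy import and_, exists

    def has_overlapping_shift(self, new_shift):  # hypothetical helper
        # True if any shift already assigned to this volunteer overlaps the new one.
        return db.session.query(
            exists().where(
                and_(
                    ShiftAssignment.volunteer_id == self.id,
                    ShiftAssignment.shift_id == Shift.id,
                    Shift.hours.overlaps(new_shift.hours),
                )
            )
        ).scalar()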

Double inserting records in flask SqlAlchemy connected with PostgreSql?

Very rarely, I run into a problem where a record I insert into the table Tbl_CUSTOMER ends up duplicated, each copy getting its own auto-generated ID from Postgres.
I have no idea why, but I suspected it could be caused by the Postgres vacuum running at the same time. To confirm that, I tried running a vacuum while inserting records, but the problem did not occur, so I have not been able to reproduce the issue and find the root cause.
models.py
class Tbl_CUSTOMER(db.Model):
    ID = db.Column(db.Numeric(25, 9), primary_key=True, autoincrement=True)
    PotentialCustomer = db.Column(db.String(12))
    FirstNameEn = db.Column(db.String(35))
    LastNameEn = db.Column(db.String(35))
    FirstNameKh = db.Column(db.String(35))
    LastNameKh = db.Column(db.String(35))
    Salutation = db.Column(db.String(4))
    Gender = db.Column(db.String(6))
    DateOfBirth = db.Column(db.String(10))
    CountryOfBirth = db.Column(db.String(2))
    Nationality = db.Column(db.String(2))
    ProvinceOfBirth = db.Column(db.String(3))
views.py
dataInsert = Tbl_CUSTOMER(
    PotentialCustomer=request.form['PotentialCustomer'],
    FirstNameEn=request.form['FirstNameEn'],
    LastNameEn=request.form['LastNameEn'],
    FirstNameKh=request.form['FirstNameKh'],
    LastNameKh=request.form['LastNameKh'],
    Salutation=request.form['Salutation'],
    Gender=request.form['Gender'],
    DateOfBirth=request.form['DateOfBirth'],
    CountryOfBirth=request.form['CountryOfBirth'],
    Nationality=request.form['Nationality'],
    ProvinceOfBirth=request.form['ProvinceOfBirth'],
)
db.session.add(dataInsert)
db.session.commit()
This problem does not happen frequently. So, what is the problem, and how can I fix it to prevent it from happening in the future? Thanks.
If you create a unique key (or replace your primary key) with a hash value computed from all the values of the row, that may help you see when this problem is happening. Using this hash column you will be able to decide what should happen when your system gets the same value (same hash). One option, for example, is simply to ignore the new row, keeping the old one; another is to overwrite it, etc.
The chance of getting the same hash value from two different rows is so small that I would not even consider it. See this thread https://crypto.stackexchange.com/questions/1170/best-way-to-reduce-chance-of-hash-collisions-multiple-hashes-or-larger-hash if you want more details about that.
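A sketch of that idea on top of the model above (the row_hash column, the row_hash_for() helper, and the particular fields hashed are all illustrative assumptions, not a prescription):

    import hashlib
    from sqlalchemy.exc import IntegrityError

    # models.py: add a hash column to Tbl_CUSTOMER
    #     row_hash = db.Column(db.String(64), unique=True, nullable=False)  # SHA-256 hex

    def row_hash_for(form):  # illustrative helper
        # Hash the business fields that define what counts as "the same" record.
        key = '|'.join(form[f] for f in (
            'PotentialCustomer', 'FirstNameEn', 'LastNameEn', 'DateOfBirth'))
        return hashlib.sha256(key.encode('utf-8')).hexdigest()

    # views.py: let the unique index reject duplicates instead of storing them
    dataInsert = Tbl_CUSTOMER(
        row_hash=row_hash_for(request.form),
        PotentialCustomer=request.form['PotentialCustomer'],
        # ... remaining fields as in views.py above ...
    )
    try:
        db.session.add(dataInsert)
        db.session.commit()
    except IntegrityError:
        db.session.rollback()  # same hash already present: keep the old row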

Updating two tables with same data where there is no relationship (PK,FK) between those tables

update Claim
set first_name = random_name(7),
    Last_name = random_name(6),
    t2.first_name = random_name(7),
    t2.last_name = random_name(6)
from Claim t1
inner join tbl_ecpremit t2
    on t1.first_name = t2.first_name
I am getting the error below:
column "t2" of relation "claim" does not exist
You can do this with a so-called data-modifying CTE:
WITH c AS (
    UPDATE claim
    SET first_name = random_name(7), last_name = random_name(6)
    WHERE <put your condition here>
    RETURNING *
)
UPDATE tbl_ecpremit
SET last_name = c.last_name
FROM c
WHERE first_name = c.first_name;
This assumes that random_name() is a function you define; it is not part of PostgreSQL as far as I know.
The nifty trick here is that the UPDATE in the WITH query returns the updated record from the first table via the RETURNING clause. You can then use that record in the main UPDATE statement to put exactly the same data into the second table.
This is all very precarious though, because you are both linking on and modifying the "first_name" column with some random value; note that RETURNING hands back the new, already-randomized first_name, so the second UPDATE matches against the new values. In real life this will only work well if you have some more logic regarding the names and conditions.

Select most reviewed courses starting from courses having at least 2 reviews

I'm using Flask-SQLAlchemy with PostgreSQL. I have the following two models:
class Course(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    course_name = db.Column(db.String(120))
    course_description = db.Column(db.Text)
    course_reviews = db.relationship('Review', backref='course', lazy='dynamic')

class Review(db.Model):
    __table_args__ = (db.UniqueConstraint('course_id', 'user_id'), {})
    id = db.Column(db.Integer, primary_key=True)
    review_date = db.Column(db.DateTime)  # default=db.func.now()
    review_comment = db.Column(db.Text)
    rating = db.Column(db.SmallInteger)
    course_id = db.Column(db.Integer, db.ForeignKey('course.id'))
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
I want to select the courses that are most reviewed, starting with courses having at least two reviews. The following SQLAlchemy query worked fine with SQLite:
most_rated_courses = db.session.query(
    models.Review, func.count(models.Review.course_id)
).\
    group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).\
    all()
But when I switched to PostgreSQL in production it gives me the following error:
ProgrammingError: (ProgrammingError) column "review.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT review.id AS review_id, review.review_date AS review_...
^
'SELECT review.id AS review_id, review.review_date AS review_review_date, review.review_comment AS review_review_comment, review.rating AS review_rating, review.course_id AS review_course_id, review.user_id AS review_user_id, count(review.course_id) AS count_1 \nFROM review GROUP BY review.course_id \nHAVING count(review.course_id) > %(count_2)s ORDER BY count(review.course_id) DESC' {'count_2': 1}
I tried to fix the query by adding models.Review to the GROUP BY clause, but it did not work:
most_rated_courses = db.session.query(
    models.Review, func.count(models.Review.course_id)
).\
    group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).\
    all()
Can anyone please help me with this issue? Thanks a lot.
SQLite and MySQL both have the behavior of allowing a query that has aggregates (like count()) without applying GROUP BY to all other columns, which is invalid in standard SQL: if more than one row is present in an aggregated group, the database has to pick one essentially at random for the non-aggregated columns.
So your query for Review basically returns the first "Review" row for each distinct course id. For course id 3, say, if you had seven "Review" rows, it just chooses an essentially random "Review" row within the group of "course_id=3". I gather the answer you really want, "Course", is available here because you can take that semi-randomly selected Review object and just call ".course" on it, giving you the correct Course, but this is a backwards way to go.
Once you get onto a proper database like PostgreSQL, you need to use correct SQL. The data you need from the "review" table is just the course_id and the count, nothing else, so query for just that (assume for the moment that we don't need to display the counts; that comes in a minute):
most_rated_course_ids = session.query(
    Review.course_id,
).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    order_by(func.count(Review.course_id).desc()).\
    all()
but that's not your Course object - you want to take that list of ids and apply it to the course table. We first need to keep our list of course ids as a SQL construct, instead of loading the data - that is, turn it into a derived table by converting the query into a subquery (change the word .all() to .subquery()):
most_rated_course_id_subquery = session.query(
    Review.course_id,
).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    order_by(func.count(Review.course_id).desc()).\
    subquery()
one simple way to link that to Course is to use an IN:
courses = session.query(Course).filter(
    Course.id.in_(most_rated_course_id_subquery)).all()
but that's essentially going to throw away the "ORDER BY" you're looking for and also doesn't give us any nice way of actually reporting on those counts along with the course results. We need to have that count along with our Course so that we can report it and also order by it. For this we use a JOIN from the "course" table to our derived table. SQLAlchemy is smart enough to know to join on the "course_id" foreign key if we just call join():
courses = session.query(Course).join(most_rated_course_id_subquery).all()
then to get at the count, we need to add that to the columns returned by our subquery along with a label so we can refer to it:
most_rated_course_id_subquery = session.query(
    Review.course_id,
    func.count(Review.course_id).label("count")
).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    subquery()

courses = session.query(
    Course, most_rated_course_id_subquery.c.count
).join(
    most_rated_course_id_subquery
).order_by(
    most_rated_course_id_subquery.c.count.desc()
).all()
A great article I like to point out to people about GROUP BY and this kind of query is SQL GROUP BY techniques, which points out the common need for the "select from A, join to (subquery of B with aggregate/GROUP BY)" pattern.