Querying for overlapping time ranges in SQLAlchemy and Postgres

I'm using Flask-SQLAlchemy to describe a Postgres database. Three related tables look like this (in part):
from sqlalchemy.dialects.postgresql import TSTZRANGE

class Shift(Base):
    __tablename__ = "shifts"
    id = db.Column(db.Integer, primary_key=True)
    hours = db.Column(TSTZRANGE, nullable=False)

class Volunteer(Base):
    __tablename__ = "volunteers"
    id = db.Column(db.Integer(), primary_key=True)
    shifts = db.relationship(
        "Shift",
        secondary="shift_assignments",
        backref=db.backref("volunteers", lazy="dynamic"),
    )

class ShiftAssignment(Base):
    __tablename__ = "shift_assignments"
    __table_args__ = (db.UniqueConstraint('shift_id', 'volunteer_id', name='_shift_vol_uc'),)
    id = db.Column(db.Integer, primary_key=True)
    shift_id = db.Column("shift_id", db.Integer(), db.ForeignKey("shifts.id"))
    volunteer_id = db.Column(
        "volunteer_id", db.Integer(), db.ForeignKey("volunteers.id")
    )
Now, I'm assigning a Volunteer to a new Shift and want to make sure that the volunteer isn't already committed to a different Shift at the same time.
I have tried this in a Volunteer instance method, but it's not working:
new_shift = db.session.get(Shift, new_shift_id)
if new_shift not in self.shifts:
    for shift in self.shifts:
        overlap = db.session.scalar(shift.hours.overlaps(new_shift.hours))
This results in the following exception:
'DateTimeTZRange' object has no attribute 'overlaps'
(Presumably shift.hours on an instance is a plain DateTimeTZRange value rather than a SQL column expression, so it has no overlaps method.) It seems like I shouldn't be doing this by iterating over the list anyway, but should instead be querying the DB directly to do the date-overlap math. So I guess I need to join the volunteers and shifts tables and then filter to find out whether any of the volunteer's shifts overlap with the target shift. But I can't figure out how to do that, and examples of overlaps and its RangeOperators friends are really thin on the ground.
Would appreciate a hand here.

It was much easier than I was making it. Again, this is in a Volunteer instance method.
new_shift = db.session.get(Shift, new_shift_id)
overlapping_shift = (
    db.session.query(Shift, ShiftAssignment)
    .join(ShiftAssignment)
    .filter(ShiftAssignment.volunteer_id == self.id)
    .filter(Shift.hours.overlaps(new_shift.hours))
    .first()
)
if overlapping_shift:
    print("overlap found")
Note that the query returns (Shift, ShiftAssignment) tuples. We join the two relevant tables and then filter twice, and are left with any overlapping shifts assigned to the current volunteer.
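Since the result is only used as a boolean here, one variant worth knowing (a sketch, not part of the original answer) is to have the database return a single EXISTS boolean instead of loading rows:

# Sketch: same join and filters as the answer above, but wrapped in
# EXISTS so Postgres returns one boolean and no rows are loaded.
conflict_exists = (
    db.session.query(Shift)
    .join(ShiftAssignment)
    .filter(ShiftAssignment.volunteer_id == self.id)
    .filter(Shift.hours.overlaps(new_shift.hours))
    .exists()
)
has_conflict = db.session.query(conflict_exists).scalar()

For a guarantee that also holds under concurrent writers, Postgres additionally offers exclusion constraints (ExcludeConstraint in SQLAlchemy's postgresql dialect), though using one here would mean storing the time range on the assignment rows themselves.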

Related

How to find relationships in the SNOMED Postgres SQL database

Problem Statement:
Extract all parents, grandparents, children, and grandchildren from the SNOMED CT database
Description:
I am trying to set up the SNOMED database on my local box to extract relationships (all parents and children) for a particular concept (using concept_id).
I have downloaded snomed data from https://download.nlm.nih.gov/umls/kss/IHTSDO20190131/SnomedCT_InternationalRF2_PRODUCTION_20190131T120000Z.zip
Then I imported data into Postgres SQL DB using a script which I found here https://github.com/IHTSDO/snomed-database-loader/tree/master/PostgreSQL
But I couldn't find the relationships between these tables that would let me fetch parents, grandparents, children, and grandchildren for a particular concept id (I tried with lung cancer, 93880001).
(Screenshot of the table structure omitted.)
I really appreciate any help or suggestions.
According to the NHS CT Browser, which may not be accessible from everywhere, 93880001 has three parents:
Malignant tumor of lung (disorder)
Primary malignant neoplasm of intrathoracic organs (disorder)
Primary malignant neoplasm of respiratory tract (disorder)
and 31 children:
Carcinoma of lung parenchyma (disorder)
Epithelioid hemangioendothelioma of lung (disorder)
Non-Hodgkin's lymphoma of lung (disorder)
Non-small cell lung cancer (disorder)
and so on...
The way to find higher and lower levels of the hierarchy is to use relationship_f.sourceid and relationship_f.destinationid. However, the raw tables are not user-friendly, so I would suggest making some views. I have taken the code from the Oracle .sql files in the GitHub repo linked above.
First, we make a view with concept IDs and preferred names:
create view conceptpreferredname as
SELECT distinct c.id conceptId, d.term preferredName, d.id descriptionId
FROM postgres.snomedct.concept_f c
inner JOIN postgres.snomedct.description_f d
ON c.id = d.conceptId
AND d.active = '1'
AND d.typeId = '900000000000013009'
inner JOIN postgres.snomedct.langrefset_f l
ON d.id = l.referencedComponentId
AND l.active = '1'
AND l.refSetId = '900000000000508004' -- GB English
AND l.acceptabilityId = '900000000000548007';
Then we make a view of relationships:
CREATE VIEW relationshipwithnames AS
SELECT id, effectiveTime, active,
moduleId, cpn1.preferredName moduleIdName,
sourceId, cpn2.preferredName sourceIdName,
destinationId, cpn3.preferredName destinationIdName,
relationshipGroup,
typeId, cpn4.preferredName typeIdName,
characteristicTypeId, cpn5.preferredName characteristicTypeIdName,
modifierId, cpn6.preferredName modifierIdName
from postgres.snomedct.relationship_f relationship,
conceptpreferredname cpn1,
conceptpreferredname cpn2,
conceptpreferredname cpn3,
conceptpreferredname cpn4,
conceptpreferredname cpn5,
conceptpreferredname cpn6
WHERE moduleId = cpn1.conceptId
AND sourceId = cpn2.conceptId
AND destinationId = cpn3.conceptId
AND typeId = cpn4.conceptId
AND characteristicTypeId = cpn5.conceptId
AND modifierId = cpn6.conceptId;
So a query to print out the names and ids of the three parent concepts would be:
select *
from relationshipwithnames r
where r.sourceId = '93880001'
and r.active = '1'
and r.typeIdName = 'Is a';
Note that this actually returns three extra concepts, which the online SNOMED browser thinks are obsolete. I am not sure why.
To print out the names and ids of child concepts, filter on destinationId instead of sourceId:
select *
from relationshipwithnames r
where r.destinationId = '93880001'
and r.active = '1'
and r.typeIdName = 'Is a';
Note that this actually returns sixteen extra concepts, which the online SNOMED browser thinks are obsolete. Again, I cannot find a reliable way to exclude only these sixteen from the results.
From here, queries to get grandparents and grandchildren are straightforward.
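For instance, grandparents are the parents of each parent, one extra self-join on the view; swapping sourceId and destinationId in both places yields grandchildren. A sketch run from Python with SQLAlchemy (the relationshipwithnames view above is assumed to exist; the connection URL is a placeholder for your local setup):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://postgres@localhost/postgres")  # placeholder URL

# Grandparents: follow the "Is a" edge twice (parent of a parent).
grandparents = text("""
    SELECT DISTINCT r2.destinationId, r2.destinationIdName
    FROM relationshipwithnames r1
    JOIN relationshipwithnames r2 ON r2.sourceId = r1.destinationId
    WHERE r1.sourceId = :concept_id
      AND r1.active = '1' AND r1.typeIdName = 'Is a'
      AND r2.active = '1' AND r2.typeIdName = 'Is a'
""")

with engine.connect() as conn:
    for row in conn.execute(grandparents, {"concept_id": "93880001"}):
        print(row)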

Double inserting records in Flask-SQLAlchemy connected with PostgreSQL?

Very rarely, I hit a problem where a record I inserted into the table Tbl_CUSTOMER is duplicated, each copy with its own auto-generated ID from Postgres.
I have no idea why, but I suspected it might be related to Postgres vacuum timing. To confirm that, I ran a vacuum at the same time as inserting records, but could not make the problem happen; therefore, I cannot reproduce the issue to find the root cause and fix it.
models.py
class Tbl_CUSTOMER(db.Model):
    ID = db.Column(db.Numeric(25, 9), primary_key=True, autoincrement=True)
    PotentialCustomer = db.Column(db.String(12))
    FirstNameEn = db.Column(db.String(35))
    LastNameEn = db.Column(db.String(35))
    FirstNameKh = db.Column(db.String(35))
    LastNameKh = db.Column(db.String(35))
    Salutation = db.Column(db.String(4))
    Gender = db.Column(db.String(6))
    DateOfBirth = db.Column(db.String(10))
    CountryOfBirth = db.Column(db.String(2))
    Nationality = db.Column(db.String(2))
    ProvinceOfBirth = db.Column(db.String(3))
views.py
dataInsert = Tbl_CUSTOMER(
    PotentialCustomer=request.form['PotentialCustomer'],
    FirstNameEn=request.form['FirstNameEn'],
    LastNameEn=request.form['LastNameEn'],
    FirstNameKh=request.form['FirstNameKh'],
    LastNameKh=request.form['LastNameKh'],
    Salutation=request.form['Salutation'],
    Gender=request.form['Gender'],
    DateOfBirth=request.form['DateOfBirth'],
    CountryOfBirth=request.form['CountryOfBirth'],
    Nationality=request.form['Nationality'],
    ProvinceOfBirth=request.form['ProvinceOfBirth']
)
db.session.add(dataInsert)
db.session.commit()
This problem does not happen frequently. So, what is the problem, and how can I fix it to prevent it from happening in the future? Thanks.
If you create a unique key (or replace your primary key) with a hash computed from all the values of the row, that will let you see when this problem is happening. Using this hash column, you can decide what should happen when your system gets the same value (same hash): one option, for example, is to simply ignore the new row and keep the old one; another is to overwrite it, etc.
The chance of getting the same hash value from different rows is so small that I would not even consider it. See this thread https://crypto.stackexchange.com/questions/1170/best-way-to-reduce-chance-of-hash-collisions-multiple-hashes-or-larger-hash if you want more details about that.
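A minimal sketch of that idea against the model above (RowHash and compute_row_hash are made-up names, only a few columns are shown, the primary key is simplified to Integer, and SHA-256 over the business fields is assumed to be an acceptable fingerprint):

import hashlib

class Tbl_CUSTOMER(db.Model):
    ID = db.Column(db.Integer, primary_key=True)  # simplified from the Numeric PK above
    PotentialCustomer = db.Column(db.String(12))
    FirstNameEn = db.Column(db.String(35))
    LastNameEn = db.Column(db.String(35))
    # Unique fingerprint of the business fields: a duplicate insert now
    # raises IntegrityError instead of silently creating a second row.
    RowHash = db.Column(db.String(64), unique=True, nullable=False)

    def compute_row_hash(self):
        payload = "|".join(str(v) for v in (
            self.PotentialCustomer, self.FirstNameEn, self.LastNameEn,
        ))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

Set dataInsert.RowHash = dataInsert.compute_row_hash() before db.session.add(dataInsert), and catch IntegrityError on commit to decide whether to ignore the new row or overwrite the old one.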

Multiple conditions in postgresql_where?

postgresql_where is useful to get around the way Postgres defines uniqueness (wrong in my opinion, but apparently the SQL standard mandates it), whereby NULL values are always considered distinct. A typical example is shown below, where no item can have identical name+purpose+batch_id values (None/NULL is treated as one unique value thanks to the second index).
class Item(StoredObject, Base):
    batch_id = Column(Integer, ForeignKey('batch.id'))
    group_id = Column(Integer, ForeignKey('group.id'))
    name = Column(Text, nullable=False)
    purpose = Column(Text, nullable=False, default="")
    __table_args__ = (
        Index('idx_batch_has_value',
              'group_id', 'name', 'purpose', 'batch_id',
              unique=True,
              postgresql_where=(batch_id.isnot(None))),
        Index('idx_batch_has_no_value',
              'group_id', 'name', 'purpose',
              unique=True,
              postgresql_where=(batch_id.is_(None))),
    )
However, I want the same behaviour across two ids (batch_id and group_id), that is to say that no item can have identical name+purpose+batch_id+group_id values (None/Null is considered one unique value in both batch_id and group_id).
I can work around this by creating a 'default' batch/group object with a fixed ID (say 0). This means I'd have to ensure that that batch/group object exists, cannot be deleted, and that its id doesn't get re-appropriated for a 'real' batch/group object (not to mention I'd have to remember to reduce all counts by one when using/writing functions that count how many batches/groups I have).
Doable, and I'm about to do it now, but there must be a better way! Is there something like:
postgresql_where = (batch_id.isnot(None) AND group_id.isnot(None))
That would solve the problem where, in my opinion, it is meant to be solved, in the DB rather than in my model and/or initialization code.
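For reference, SQLAlchemy combines boolean column expressions with and_() (or the & operator) rather than Python's and keyword, and postgresql_where accepts any such expression. A sketch of what the extra index could look like inside Item.__table_args__ above (note that fully emulating NULL-as-one-value across two nullable columns takes one partial unique index per NULL/NOT NULL combination, four in all):

from sqlalchemy import and_

Index('idx_batch_and_group_have_values',
      'group_id', 'name', 'purpose', 'batch_id',
      unique=True,
      # Both columns must be non-NULL for a row to enter this index.
      postgresql_where=and_(batch_id.isnot(None),
                            group_id.isnot(None)))

The remaining three NULL combinations get their own indexes with is_()/isnot() swapped accordingly.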

Select most reviewed courses starting from courses having at least 2 reviews

I'm using Flask-SQLAlchemy with PostgreSQL. I have the following two models:
class Course(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    course_name = db.Column(db.String(120))
    course_description = db.Column(db.Text)
    course_reviews = db.relationship('Review', backref='course', lazy='dynamic')

class Review(db.Model):
    __table_args__ = (db.UniqueConstraint('course_id', 'user_id'), {})
    id = db.Column(db.Integer, primary_key=True)
    review_date = db.Column(db.DateTime)  # default=db.func.now()
    review_comment = db.Column(db.Text)
    rating = db.Column(db.SmallInteger)
    course_id = db.Column(db.Integer, db.ForeignKey('course.id'))
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
I want to select the courses that are most reviewed, starting with at least two reviews. The following SQLAlchemy query worked fine with SQLite:
most_rated_courses = db.session.query(
    models.Review, func.count(models.Review.course_id)
).group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
But when I switched to PostgreSQL in production it gives me the following error:
ProgrammingError: (ProgrammingError) column "review.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT review.id AS review_id, review.review_date AS review_...
^
'SELECT review.id AS review_id, review.review_date AS review_review_date, review.review_comment AS review_review_comment, review.rating AS review_rating, review.course_id AS review_course_id, review.user_id AS review_user_id, count(review.course_id) AS count_1 \nFROM review GROUP BY review.course_id \nHAVING count(review.course_id) > %(count_2)s ORDER BY count(review.course_id) DESC' {'count_2': 1}
I tried to fix the query by adding models.Review in the GROUP BY clause but it did not work:
most_rated_courses = db.session.query(
    models.Review, func.count(models.Review.course_id)
).group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
Can anyone please help me with this issue? Thanks a lot.
SQLite and MySQL both have the behavior that they allow a query that has aggregates (like count()) without applying GROUP BY to all other columns - which in terms of standard SQL is invalid, because if more than one row is present in that aggregated group, it has to pick the first one it sees for return, which is essentially random.
So your query for Review basically returns to you the first "Review" row for each distinct course id - like for course id 3, if you had seven "Review" rows, it's just choosing an essentially random "Review" row within the group of "course_id=3". I gather the answer you really want, "Course", is available here because you can take that semi-randomly selected Review object and just call ".course" on it, giving you the correct Course, but this is a backwards way to go.
But once you get on a proper database like Postgresql you need to use correct SQL. The data you need from the "review" table is just the course_id and the count, nothing else, so query just for that (first assume we don't actually need to display the counts, that's in a minute):
most_rated_course_ids = session.query(
    Review.course_id,
).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    order_by(func.count(Review.course_id).desc()).\
    all()
but that's not your Course object - you want to take that list of ids and apply it to the course table. We first need to keep our list of course ids as a SQL construct, instead of loading the data - that is, turn it into a derived table by converting the query into a subquery (change the word .all() to .subquery()):
most_rated_course_id_subquery = session.query(
    Review.course_id,
).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    order_by(func.count(Review.course_id).desc()).\
    subquery()
one simple way to link that to Course is to use an IN:
courses = session.query(Course).filter(
    Course.id.in_(most_rated_course_id_subquery)).all()
but that's essentially going to throw away the "ORDER BY" you're looking for and also doesn't give us any nice way of actually reporting on those counts along with the course results. We need to have that count along with our Course so that we can report it and also order by it. For this we use a JOIN from the "course" table to our derived table. SQLAlchemy is smart enough to know to join on the "course_id" foreign key if we just call join():
courses = session.query(Course).join(most_rated_course_id_subquery).all()
then to get at the count, we need to add that to the columns returned by our subquery along with a label so we can refer to it:
most_rated_course_id_subquery = session.query(
    Review.course_id,
    func.count(Review.course_id).label("count")
).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    subquery()

courses = session.query(
    Course, most_rated_course_id_subquery.c.count
).join(
    most_rated_course_id_subquery
).order_by(
    most_rated_course_id_subquery.c.count.desc()
).all()
A great article I like to point out to people about GROUP BY and this kind of query is SQL GROUP BY techniques which points out the common need for the "select from A join to (subquery of B with aggregate/GROUP BY)" pattern.

Self m:n Relation

I have persons and a person can contact multiple other persons, so basically the "default" tables would be:
persons (id)
contacts (person1_id, person2_id)
With this schema, I'd have to issue queries like
SELECT *
FROM contacts c
WHERE (person1_id = *id of person1* AND person2_id = *id of person2*)
   OR (person1_id = *id of person2* AND person2_id = *id of person1*)
to get the relation between two persons when I insert such a relation only once.
What is the common practice to deal with this situation?
Insert data once and do such an OR query
Insert the relation twice so that person1_id = id of person1 AND person2_id = id of person2 is enough
An entirely different approach?
Assuming:
The m:n table actually contains additional data, so if I create a relation for both ways, I'd have to duplicate the data
This is a core part of the application and most non-trivial queries involve at least a sub query that determines whether or not such a relation exists
If you write your insert logic such that person1_id < person2_id is true for all rows, then you can just write
SELECT *
FROM contacts c
WHERE person1_id = least(*id_of_person_1*, *id_of_person_2*)
  AND person2_id = greatest(*id_of_person_1*, *id_of_person_2*)
(LEAST/GREATEST are Postgres's scalar counterparts of two-argument min/max.)
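The insert side of that convention is a one-line normalization in application code; a sketch with a hypothetical helper:

def normalized_pair(a_id, b_id):
    # Canonical order: smaller id first, so each relation is stored once
    # and looked up with a plain equality pair instead of a symmetric OR.
    return (min(a_id, b_id), max(a_id, b_id))

person1_id, person2_id = normalized_pair(42, 7)  # -> (7, 42)

A CHECK (person1_id < person2_id) constraint on contacts keeps the invariant honest at the database level.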
Why don't you use a JOIN between the tables? Something like this:
SELECT *
FROM contact c INNER JOIN person p ON p.id = c.person1_id
Then add the WHERE and GROUP BY clauses you need to complete your query =)
Take a look here at how the results will be shown:
http://www.w3schools.com/Sql/sql_join_inner.asp
Regards,
Elkas
Try this one, mate =)
SELECT c.person1_id AS id_person_1, c.person2_id AS id_person_2,
       p1.name AS name_person_1, p2.name AS name_person_2
FROM contact c
LEFT JOIN person p1 ON p1.id = c.person1_id
RIGHT JOIN person p2 ON p2.id = c.person2_id;
I don't know if it will work... but give it a try, mate =)
"Insert the relation twice so that person1_id = id of person1 AND person2_id = id of person2 is enough"
That is how I'd do it, personally. It allows you to deal with the situation where A has the contact details of B but not the other way around (e.g. a girl gives a guy her number at the bar saying "call me" as she walks out). It also makes the queries simpler.
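If you do store the relation twice, wrapping both directions in one transaction keeps them consistent; a sketch with a hypothetical Contact model mapped to the contacts table:

def add_contact(session, a_id, b_id, **shared_data):
    # Contact is a hypothetical mapped class for the contacts table.
    # Both directions go in one transaction so they stay in sync; note that
    # any extra per-relation data is duplicated, as the question anticipates.
    session.add_all([
        Contact(person1_id=a_id, person2_id=b_id, **shared_data),
        Contact(person1_id=b_id, person2_id=a_id, **shared_data),
    ])
    session.commit()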