Problem Statement:
Extract all parents, grandparents, children, and grandchildren from the SNOMED CT database
Description:
I am trying to set up the SNOMED CT database on my local machine to extract relationships (all parents and children) for a particular concept (using concept_id).
I have downloaded the SNOMED data from https://download.nlm.nih.gov/umls/kss/IHTSDO20190131/SnomedCT_InternationalRF2_PRODUCTION_20190131T120000Z.zip
Then I imported the data into a PostgreSQL database using a script I found here: https://github.com/IHTSDO/snomed-database-loader/tree/master/PostgreSQL
But I couldn't find any relationship between these tables that would let me fetch the parents, grandparents, children, and grandchildren for a particular concept id (I tried with lung cancer, 93880001).
Following image contains table structure:
I really appreciate any help or suggestions.
According to the NHS CT Browser, which may not be accessible from everywhere, 93880001 has three parents:
Malignant tumor of lung (disorder)
Primary malignant neoplasm of intrathoracic organs (disorder)
Primary malignant neoplasm of respiratory tract (disorder)
and 31 children:
Carcinoma of lung parenchyma (disorder)
Epithelioid hemangioendothelioma of lung (disorder)
Non-Hodgkin's lymphoma of lung (disorder)
Non-small cell lung cancer (disorder)
and so on...
The way to find higher and lower levels of the hierarchy is to use relationship_f.sourceid and relationship_f.destinationid. However, the raw tables are not user friendly so I would suggest making some views. I have taken the code from the Oracle .sql files in this GitHub repo.
First, we make a view with concept IDs and preferred names:
create view conceptpreferredname as
SELECT distinct c.id conceptId, d.term preferredName, d.id descriptionId
FROM postgres.snomedct.concept_f c
inner JOIN postgres.snomedct.description_f d
ON c.id = d.conceptId
AND d.active = '1'
AND d.typeId = '900000000000013009'
inner JOIN postgres.snomedct.langrefset_f l
ON d.id = l.referencedComponentId
AND l.active = '1'
AND l.refSetId = '900000000000508004' -- GB English
AND l.acceptabilityId = '900000000000548007';
Then we make a view of relationships:
CREATE VIEW relationshipwithnames AS
SELECT id, effectiveTime, active,
moduleId, cpn1.preferredName moduleIdName,
sourceId, cpn2.preferredName sourceIdName,
destinationId, cpn3.preferredName destinationIdName,
relationshipGroup,
typeId, cpn4.preferredName typeIdName,
characteristicTypeId, cpn5.preferredName characteristicTypeIdName,
modifierId, cpn6.preferredName modifierIdName
from postgres.snomedct.relationship_f relationship,
conceptpreferredname cpn1,
conceptpreferredname cpn2,
conceptpreferredname cpn3,
conceptpreferredname cpn4,
conceptpreferredname cpn5,
conceptpreferredname cpn6
WHERE moduleId = cpn1.conceptId
AND sourceId = cpn2.conceptId
AND destinationId = cpn3.conceptId
AND typeId = cpn4.conceptId
AND characteristicTypeId = cpn5.conceptId
AND modifierId = cpn6.conceptId;
So a query to print out the names and ids of the three parent concepts would be:
select *
from relationshipwithnames r
where r.sourceId = '93880001'
and r.active = '1'
and r.typeIdName = 'Is a';
Note that this actually returns three extra concepts, which the online SNOMED browser thinks are obsolete. I am not sure why.
To print out the names and ids of the child concepts, filter on destinationId instead of sourceId:
select *
from relationshipwithnames r
where r.destinationId = '93880001'
and r.active = '1'
and r.typeIdName = 'Is a';
Note that this actually returns sixteen extra concepts, which the online SNOMED browser thinks are obsolete. Again, I cannot find a reliable way to exclude only these sixteen from the results.
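One possible explanation for the extra rows, assuming the `_f` tables hold the RF2 Full release (every historical version of each row): a relationship that was active in an older version and later inactivated still has an old row with active = '1'. Below is a minimal sketch of keeping only the most recent version of each relationship before testing `active`. It uses Python's sqlite3 with made-up rows; the relationship ids, child ids, and dates are illustrative, not real SNOMED data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE relationship_f (
        id TEXT, effectiveTime TEXT, active TEXT,
        sourceId TEXT, destinationId TEXT, typeId TEXT
    )
""")
IS_A = "116680003"  # SNOMED CT concept id for "Is a"
rows = [
    # r1: a single version, still active
    ("r1", "20020131", "1", "child_a", "93880001", IS_A),
    # r2: active in its first version, inactivated in a later one
    ("r2", "20020131", "1", "child_b", "93880001", IS_A),
    ("r2", "20100131", "0", "child_b", "93880001", IS_A),
]
conn.executemany("INSERT INTO relationship_f VALUES (?, ?, ?, ?, ?, ?)", rows)

# Naive filter: any version with active = '1' matches, including stale ones.
naive = conn.execute("""
    SELECT sourceId FROM relationship_f
    WHERE destinationId = '93880001' AND active = '1' AND typeId = ?
    ORDER BY sourceId
""", (IS_A,)).fetchall()

# Snapshot-style filter: a row counts only if it is the latest version
# of its relationship id and that latest version is active.
latest = conn.execute("""
    SELECT r.sourceId
    FROM relationship_f r
    WHERE r.destinationId = '93880001' AND r.typeId = ? AND r.active = '1'
      AND r.effectiveTime = (SELECT MAX(r2.effectiveTime)
                             FROM relationship_f r2
                             WHERE r2.id = r.id)
    ORDER BY r.sourceId
""", (IS_A,)).fetchall()

print(naive)   # both children, including the inactivated one
print(latest)  # only the child whose relationship is currently active
```

If the loader also created Snapshot tables, querying those instead would avoid the correlated subquery.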
From here, queries to get grandparents and grandchildren are straightforward.
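As a sketch of what such queries could look like, the example below walks the "Is a" hierarchy with a recursive common table expression, which PostgreSQL also supports through WITH RECURSIVE. It uses Python's sqlite3 and a tiny hypothetical `isa` table (child, parent) with made-up concept names rather than the real `relationship_f` table, so the names and the hierarchy are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE isa (child TEXT, parent TEXT)")
conn.executemany("INSERT INTO isa VALUES (?, ?)", [
    # Hypothetical hierarchy, not real SNOMED content
    ("lung_cancer", "malignant_tumor_of_lung"),
    ("malignant_tumor_of_lung", "tumor_of_lung"),
    ("non_small_cell_lung_cancer", "lung_cancer"),
    ("squamous_nsclc", "non_small_cell_lung_cancer"),
])

def ancestors(concept):
    """Return (concept, depth) pairs; depth 1 = parent, 2 = grandparent."""
    return conn.execute("""
        WITH RECURSIVE up(concept, depth) AS (
            SELECT parent, 1 FROM isa WHERE child = ?
            UNION
            SELECT isa.parent, up.depth + 1
            FROM isa JOIN up ON isa.child = up.concept
        )
        SELECT concept, depth FROM up ORDER BY depth, concept
    """, (concept,)).fetchall()

def descendants(concept):
    """Return (concept, depth) pairs; depth 1 = child, 2 = grandchild."""
    return conn.execute("""
        WITH RECURSIVE down(concept, depth) AS (
            SELECT child, 1 FROM isa WHERE parent = ?
            UNION
            SELECT isa.child, down.depth + 1
            FROM isa JOIN down ON isa.parent = down.concept
        )
        SELECT concept, depth FROM down ORDER BY depth, concept
    """, (concept,)).fetchall()

print(ancestors("lung_cancer"))
print(descendants("lung_cancer"))
```

Filtering the result on `depth = 2` gives only grandparents or grandchildren; dropping the filter gives the whole chain.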
I have 3 tables in our ERP database holding all delivery data (table documents holds one row for each delivery note, documentpos holds all positions on delivery notes, documentserialnumbers holds all serial numbers for delivered items).
I want to show all items, with their serial numbers, that have been delivered to the customer and are still there.
However, the output of the following query shows that one delivered item was later returned (red marks). The return delivery note has the document number 527419 (dark red mark) and refers to the delivery note 319821 (green), which is highlighted in yellow.
The correct list would consequently show only items that are still at the customer's site, without the returned items (see below).
How do I have to change the query to exclude the returned items from the output?
The upper table in the image shows the output of my query; the lower table shows how it should look.
select a.BelID, c.ReferenzBelID, a.itemnumber, a.itemname, c.deliverynotenumber,c.documenttype, c.documentmark, b.serialnumber
from dbo.documentpos a
inner join dbo.documentserialnumbers b on a.BelPosID = b.BelPosID
inner join dbo.documents c on a.BelID = c.BelID
inner join sysdba.customers d on d.account = c.A0Name1
where d.AccountID = 'customername' and c.documenttype like '%delivery%'
order by a.BelID
You can exclude positions that are referenced by any "return" delivery note, like this (edited):
select a.BelID, c.ReferenzBelID, a.itemnumber, a.itemname, c.deliverynotenumber,c.documenttype, c.documentmark, b.serialnumber
from dbo.documentpos a
inner join dbo.documentserialnumbers b on a.BelPosID = b.BelPosID
inner join dbo.documents c on a.BelID = c.BelID
inner join sysdba.customers d on d.account = c.A0Name1
where d.AccountID = 'customername' and c.documenttype like '%delivery%'
and not exists (select 1
from dbo.documents cc
where cc.documenttype like '%delivery%'
and c.ReferenzBelID=cc.BelID
and c.documentmark='VLR')
and not exists (select 1
from dbo.documents ccc
join dbo.documentpos aa on aa.BelID = ccc.BelID
where ccc.ReferenzBelID=c.BelID
and ccc.documentmark='VLR'
and a.itemnumber=aa.itemnumber)
order by a.BelID
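The effect of the NOT EXISTS filter can be checked on a toy data set. The sketch below uses Python's sqlite3 with a trimmed-down schema (only the columns the filter needs); the 'VL' mark for ordinary delivery notes and the sample rows are assumptions, and the first condition is simplified to a plain test on documentmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (BelID INTEGER, ReferenzBelID INTEGER, documentmark TEXT);
    CREATE TABLE documentpos (BelID INTEGER, itemnumber TEXT);
    -- Delivery note 1 ships items A and B; note 2 is a return ('VLR') of item B.
    INSERT INTO documents VALUES (1, NULL, 'VL'), (2, 1, 'VLR');
    INSERT INTO documentpos VALUES (1, 'A'), (1, 'B'), (2, 'B');
""")

still_at_customer = conn.execute("""
    SELECT a.BelID, a.itemnumber
    FROM documentpos a
    JOIN documents c ON a.BelID = c.BelID
    WHERE c.documentmark <> 'VLR'          -- drop the return notes themselves
      AND NOT EXISTS (                     -- drop positions that were returned
          SELECT 1
          FROM documents ret
          JOIN documentpos rp ON rp.BelID = ret.BelID
          WHERE ret.ReferenzBelID = c.BelID
            AND ret.documentmark = 'VLR'
            AND rp.itemnumber = a.itemnumber)
    ORDER BY a.BelID, a.itemnumber
""").fetchall()

print(still_at_customer)  # only item A of delivery note 1 remains
```

Matching on itemnumber alone treats all units of an item as returned; if several units with different serial numbers can sit on one note, the inner query would also need to compare serial numbers.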
I was having some trouble with a SQL 2k sproc, which we moved to SQL 2k5 so we could use table-valued UDFs instead of scalar UDFs.
This is simplified, but this is my problem.
I have a temporary table that I fill up with product information. I then pass that product information into a UDF and return the information back to my main results set. It doesn't seem to work.
Am I not allowed to pass temporary table values into a CROSS APPLY'd table-valued UDF?
--CREATE AND FILL #brandInfo
SELECT sku, upc, prd_id, cp.customerPrice
FROM products p
JOIN #brandInfo b ON p.brd_id=b.brd_id
CROSS APPLY f_GetCustomerPrice(b.priceAdjustmentValue, b.priceAdjustmentAmount, p.Price) cp
--f_GetCUstomerPrice uses the AdjValue, AdjAmount, and Price to calculate users actual price
When I put dummy values in for b.priceAdjustmentValue and b.priceAdjustmentAmount it works great. But as soon as I try to load the temp table values in it bombs.
Msg 207, Level 16, State 1, Line 140
Invalid column name 'b.priceAdjustmentValue'.
Msg 207, Level 16, State 1, Line 140
Invalid column name 'b.priceAdjustmentAmount'.
Have you tried:
--CREATE AND FILL #brandInfo
SELECT sku, upc, prd_id, cp.customerPrice
FROM products p
JOIN #brandInfo b ON p.brd_id=b.brd_id
CROSS APPLY (
SELECT *
FROM f_GetCustomerPrice(b.priceAdjustmentValue, b.priceAdjustmentAmount, p.Price)
) cp
--f_GetCUstomerPrice uses the AdjValue, AdjAmount, and Price to calculate users actual price
Giving the UDF the proper context in order to resolve the column references?
EDIT:
I have built the following UDF in my local Northwind 2005 database:
CREATE FUNCTION dbo.f_GetCustomerPrice(@adjVal DECIMAL(28,9), @adjAmt DECIMAL(28,9), @price DECIMAL(28,9))
RETURNS TABLE
AS RETURN
(
SELECT Level = 'One', AdjustValue = @adjVal, AdjustAmount = @adjAmt, Price = @price
UNION
SELECT Level = 'Two', AdjustValue = 2 * @adjVal, AdjustAmount = 2 * @adjAmt, Price = 2 * @price
)
GO
And referenced it in the following query without issue:
SELECT p.ProductID,
p.ProductName,
b.CompanyName,
f.Level
FROM Products p
JOIN Suppliers b
ON p.SupplierID = b.SupplierID
CROSS APPLY dbo.f_GetCustomerPrice(p.UnitsInStock, p.ReorderLevel, p.UnitPrice) f
Are you certain that your definition of #brandInfo has the priceAdjustmentValue and priceAdjustmentAmount columns defined on it? More importantly, if you are putting this in a stored procedure as you mentioned, does there exist a #brandInfo table already without those columns defined? I know #brandInfo is a temporary table, but if it exists at the time you attempt to create the stored procedure and it lacks the columns, the parsing engine may be getting tripped up. Oddly, if the table doesn't exist at all, the parsing engine simply glides past the missing table and creates the SP for you.
I'm using Flask-SQLAlchemy with PostgreSQL. I have the following two models:
class Course(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    course_name = db.Column(db.String(120))
    course_description = db.Column(db.Text)
    course_reviews = db.relationship('Review', backref='course', lazy='dynamic')

class Review(db.Model):
    __table_args__ = (db.UniqueConstraint('course_id', 'user_id'), {})
    id = db.Column(db.Integer, primary_key=True)
    review_date = db.Column(db.DateTime)  # default=db.func.now()
    review_comment = db.Column(db.Text)
    rating = db.Column(db.SmallInteger)
    course_id = db.Column(db.Integer, db.ForeignKey('course.id'))
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
I want to select the most-reviewed courses, counting only those with at least two reviews. The following SQLAlchemy query worked fine with SQLite:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
But when I switched to PostgreSQL in production it gives me the following error:
ProgrammingError: (ProgrammingError) column "review.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT review.id AS review_id, review.review_date AS review_...
^
'SELECT review.id AS review_id, review.review_date AS review_review_date, review.review_comment AS review_review_comment, review.rating AS review_rating, review.course_id AS review_course_id, review.user_id AS review_user_id, count(review.course_id) AS count_1 \nFROM review GROUP BY review.course_id \nHAVING count(review.course_id) > %(count_2)s ORDER BY count(review.course_id) DESC' {'count_2': 1}
I tried to fix the query by adding models.Review in the GROUP BY clause but it did not work:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
Can anyone please help me with this issue. Thanks a lot
SQLite and MySQL both have the behavior that they allow a query that has aggregates (like count()) without applying GROUP BY to all other columns - which in terms of standard SQL is invalid, because if more than one row is present in that aggregated group, it has to pick the first one it sees for return, which is essentially random.
So your query for Review basically returns to you the first "Review" row for each distinct course id - like for course id 3, if you had seven "Review" rows, it's just choosing an essentially random "Review" row within the group of "course_id=3". I gather the answer you really want, "Course", is available here because you can take that semi-randomly selected Review object and just call ".course" on it, giving you the correct Course, but this is a backwards way to go.
But once you get on a proper database like PostgreSQL you need to use correct SQL. The data you need from the "review" table is just the course_id and the count, nothing else, so query just for that (first assume we don't actually need to display the counts; we'll get to that in a minute):
most_rated_course_ids = session.query(
        Review.course_id,
    ).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    order_by(func.count(Review.course_id).desc()).\
    all()
But that's not your Course object: you want to take that list of ids and apply it to the course table. We first need to keep our list of course ids as a SQL construct instead of loading the data; that is, turn it into a derived table by converting the query into a subquery (change .all() to .subquery()):
most_rated_course_id_subquery = session.query(
        Review.course_id,
    ).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    order_by(func.count(Review.course_id).desc()).\
    subquery()
One simple way to link that to Course is to use an IN:
courses = session.query(Course).filter(
Course.id.in_(most_rated_course_id_subquery)).all()
But that essentially throws away the "ORDER BY" you're looking for, and it also doesn't give us any nice way of reporting those counts along with the course results. We need to have that count along with our Course so that we can report it and also order by it. For this we use a JOIN from the "course" table to our derived table. SQLAlchemy is smart enough to know to join on the "course_id" foreign key if we just call join():
courses = session.query(Course).join(most_rated_course_id_subquery).all()
Then, to get at the count, we need to add it to the columns returned by our subquery, along with a label so we can refer to it:
most_rated_course_id_subquery = session.query(
        Review.course_id,
        func.count(Review.course_id).label("count")
    ).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    subquery()
courses = session.query(
        Course, most_rated_course_id_subquery.c.count
    ).join(
        most_rated_course_id_subquery
    ).order_by(
        most_rated_course_id_subquery.c.count.desc()
    ).all()
A great article I like to point out to people about GROUP BY and this kind of query is SQL GROUP BY techniques which points out the common need for the "select from A join to (subquery of B with aggregate/GROUP BY)" pattern.
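For reference, here is roughly what that pattern looks like in plain SQL. This is a sketch run through Python's sqlite3 with trimmed-down `course` and `review` tables (only the columns the query needs) and made-up rows; the SQL that SQLAlchemy actually emits will differ in aliases and labels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (id INTEGER PRIMARY KEY, course_name TEXT);
    CREATE TABLE review (id INTEGER PRIMARY KEY, course_id INTEGER, user_id INTEGER);
    INSERT INTO course VALUES (1, 'Algebra'), (2, 'Biology'), (3, 'Chemistry');
    INSERT INTO review (course_id, user_id) VALUES
        (1, 10), (1, 11),
        (2, 10),
        (3, 10), (3, 11), (3, 12);
""")

# Aggregate in a derived table first, then join the one-row-per-course
# result back to course. No non-grouped column is selected next to COUNT.
most_reviewed = conn.execute("""
    SELECT course.course_name, counts.review_count
    FROM course
    JOIN (SELECT course_id, COUNT(*) AS review_count
          FROM review
          GROUP BY course_id
          HAVING COUNT(*) > 1) AS counts
      ON counts.course_id = course.id
    ORDER BY counts.review_count DESC
""").fetchall()

print(most_reviewed)  # Chemistry (3 reviews), then Algebra (2); Biology is filtered out
```

Because the inner query selects only the grouped column and the aggregate, the same statement is valid on PostgreSQL as well as SQLite.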
I'm building an ecommerce system with products and variants, where each has between 1 and 5 images that are stored on Amazon S3. Is it considered best practice to have a separate images table where I store the S3 URLs, or is it acceptable to just add 5 image columns to each of the products and variants tables? Having a separate images table means that on import I need to do 6 SELECTs and then INSERTs (to make sure the product and each of its images don't already exist and then to import them) rather than 1. And, on retrieval, I need to join the images table to the products table 5 times to have it return the images with the product, like this:
SELECT prd."id" AS id, prd."title" AS title, prd."description" AS description,
prd."createdAt" AS productcreatedate,
prdPic1."url" AS productpic1,
prdPic2."url" AS productpic2,
prdPic3."url" AS productpic3,
prdPic4."url" AS productpic4,
prdPic5."url" AS productpic5,
brd."name" AS brandname, brd."id" AS brandid,
cat."name" AS categoryname, cat."id" AS categoryid,
prt."name" AS partnername, prt."id" AS partnerid
FROM "Products" prd
LEFT OUTER JOIN "Pictures" prdPic1 ON prdPic1."entityId" = prd."id" AND prdPic1."entity" = '1'
AND prdPic1."sortOrder" = 1
LEFT OUTER JOIN "Pictures" prdPic2 ON prdPic2."entityId" = prd."id" AND prdPic2."entity" = '1'
AND prdPic2."sortOrder" = 2
LEFT OUTER JOIN "Pictures" prdPic3 ON prdPic3."entityId" = prd."id" AND prdPic3."entity" = '1'
AND prdPic3."sortOrder" = 3
LEFT OUTER JOIN "Pictures" prdPic4 ON prdPic4."entityId" = prd."id" AND prdPic4."entity" = '1'
AND prdPic4."sortOrder" = 4
LEFT OUTER JOIN "Pictures" prdPic5 ON prdPic5."entityId" = prd."id" AND prdPic5."entity" = '1'
AND prdPic5."sortOrder" = 5
INNER JOIN "Brands" brd ON brd."id" = prd."BrandId"
INNER JOIN "Categories" cat ON cat."id" = prd."CategoryId"
INNER JOIN "Partners" prt ON prt."id" = brd."PartnerId";
The value of normalizing Brands, Categories, and Partners is clear to me to reduce redundancy. I'm less clear on the value for an images table. Explain Analyze on Postgres says this query takes 3310 msec to return 22000 rows. However, I haven't created indexes on Pictures yet, so that's not a fair analysis.
In this case, I think I would add a single column of type text[] (array of text). That way you have the images where you need them, and you won't be stuck when you need to add more images per product.
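If a separate pictures table is kept instead, the five self-joins are not strictly required either: a single join ordered by sortOrder returns one row per picture, and the application can group them. A minimal sketch with Python's sqlite3, using simplified hypothetical `products` and `pictures` tables and made-up file names (in PostgreSQL the grouping could be done in SQL with array_agg, which is not shown here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE pictures (entityId INTEGER, sortOrder INTEGER, url TEXT);
    INSERT INTO products VALUES (1, 'Lamp'), (2, 'Chair');
    -- Product 1 has two pictures, product 2 has none.
    INSERT INTO pictures VALUES (1, 2, 'a2.jpg'), (1, 1, 'a1.jpg');
""")

# One LEFT JOIN instead of five: each product row repeats once per picture.
rows = conn.execute("""
    SELECT p.id, pic.url
    FROM products p
    LEFT JOIN pictures pic ON pic.entityId = p.id
    ORDER BY p.id, pic.sortOrder
""").fetchall()

# Group the picture URLs per product, keeping products without pictures.
images = {}
for product_id, url in rows:
    urls = images.setdefault(product_id, [])
    if url is not None:
        urls.append(url)

print(images)
```

This works for any number of pictures per product, so the query does not change if the limit of five is raised later.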
The following code section is returning multiple rows for a few records.
SELECT a.ClientID,ltrim(rtrim(c.FirstName)) + ' ' +
case when c.MiddleName <> '' then
ltrim(rtrim(c.MiddleName)) + '. '
else ''
end +
ltrim(rtrim(c.LastName)) as ClientName, a.MISCode, b.Address, b.City, dbo.ClientGetEnrolledPrograms(CONVERT(int,a.ClientID)) as Abbreviation
FROM ClientDetail a
JOIN Address b on(a.PersonID = b.PersonID)
JOIN Person c on(a.PersonID = c.PersonID)
LEFT JOIN ProgramEnrollments d on(d.ClientID = a.ClientID and d.Status = 'Enrolled' and d.HistoricalPKID is null)
LEFT JOIN Program e on(d.ProgramID = e.ProgramID and e.HistoricalPKID is null)
WHERE a.MichiganWorksData=1
I've isolated the issue to the ProgramEnrollments table.
This table holds one-to-many relationships where each ClientID can be enrolled in many programs. So for each program a client is enrolled in, there is a record in the table.
The final result set is therefore returning a row for each row in the ProgramEnrollments table based on these joins.
I presume my join is the issue but I don't see the problem.
Thoughts/Suggestions?
Thanks,
Chuck
The JOIN is not the issue; it's doing what it's meant to do with a one-to-many relationship.
You could use a GROUP BY statement on your query, or alternatively use a sub-select to return DISTINCT values from the ProgramEnrollments/Program tables.
You don't seem to be using data from the ProgramEnrollments or Program tables so are they needed in the query (I presume they are, just thought I'd ask the question).
You don't actually appear to be using any columns in ProgramEnrollments or Program, so try removing those 2 JOINs.
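The row multiplication and both suggested fixes can be reproduced on a small example. The sketch below uses Python's sqlite3 with cut-down versions of ClientDetail and ProgramEnrollments; the sample rows are made up, and only the columns relevant to the join are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ClientDetail (ClientID INTEGER, MISCode TEXT);
    CREATE TABLE ProgramEnrollments (ClientID INTEGER, ProgramID INTEGER, Status TEXT);
    INSERT INTO ClientDetail VALUES (1, 'M1'), (2, 'M2');
    -- Client 1 is enrolled in two programs, client 2 in none.
    INSERT INTO ProgramEnrollments VALUES (1, 100, 'Enrolled'), (1, 200, 'Enrolled');
""")

joined_sql = """
    SELECT {distinct} a.ClientID, a.MISCode
    FROM ClientDetail a
    LEFT JOIN ProgramEnrollments d
           ON d.ClientID = a.ClientID AND d.Status = 'Enrolled'
    ORDER BY a.ClientID
"""

# One output row per matching enrollment: client 1 appears twice.
with_join = conn.execute(joined_sql.format(distinct="")).fetchall()

# Fix 1: collapse the duplicates, since no enrollment column is selected.
with_distinct = conn.execute(joined_sql.format(distinct="DISTINCT")).fetchall()

# Fix 2: drop the join entirely when none of its columns are used.
without_join = conn.execute(
    "SELECT ClientID, MISCode FROM ClientDetail ORDER BY ClientID"
).fetchall()

print(with_join)
print(with_distinct)
print(without_join)
```

Dropping the join is the cheaper of the two fixes when the enrollment data really is unused, because DISTINCT still has to build and then deduplicate the multiplied rows.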