ETL Mapping LOINC vocabulary on OMOP Common Data Model

ETL Mapping LOINC vocabulary on OMOP Common Data Model - postgresql

I am working on the lab test values mapping (MEASUREMENT table of the OMOP CDM).
My local mapping table (handmade) has my measurement name (in French) and the associated LOINC code.
The LOINC vocabulary has been loaded from Athena (OHDSI community tool) https://athena.ohdsi.org/search-terms/
I load my local concepts into the CONCEPT table, then use an SQL query to associate the equivalent LOINC concept_id (from concept_code mapping/LOINC source codes).
I realise that the link is not made on the LOINC concept_code.
Indeed, when I filter the CONCEPT table on a LOINC concept_code (ex 34714-6) I find no result.
select *
from omop.concept
where concept_code in ('34714-6');
When I filter on the corresponding concept_id (3032080) I find the result with the desired concept_code.
select *
from omop.concept
where concept_id in ('3032080');
I have tested concept_code like '34714__' which returns the expected line.
This is not due to the encoding because when I copy/paste the resulting concept_code (filtering on concept_id = ‘3032080’) into my query concept_code in ('34714-6') I get the same problem.
However other LOINC codes work:
select *
from omop.concept
where concept_code in ('14646-4');
When I check what symbol exacty is being used :
select ASCII(substr(concept_code,1,1))
,ASCII(substr(concept_code,2,1))
,ASCII(substr(concept_code,3,1))
,ASCII(substr(concept_code,4,1))
,ASCII(substr(concept_code,5,1))
,ASCII(substr(concept_code,6,1))
,ASCII(substr(concept_code,7,1))
from omop.concept
where concept_id = 3032080 ;
I also checked/removed the whitespaces.
The same process works on drugs (concept_code from ATC).
Can you tell me where this error comes from?
Thank you for your help.

please check that you're using the latest sql-client and JDBC driver versions

Related

format issue in scala, while having wildcards characters

I have a sql query suppose
sqlQuery="select * from %s_table where event like '%holi%'"
listCity=["Bangalore","Mumbai"]
for (city<- listCity){
print(s.format(city))
}
Expected output:
select * from Bangalore_table where event like '%holi%'
select * from Mumbai_table where event like '%holi%'
Actual output:
unknown format conversion exception: Conversion='%h'
Can anyone let me how to solve this, instead of holi it could be anything iam looking for a generic solution in scala.

If you want the character % in a formatting string you need to escape it by repeating it:
sqlQuery = "select * from %s_table where event like '%%holi%%'"
More generally I would not recommend using raw SQL. Instead, use a library to access the database. I use Slick but there are a number to choose from.
Also, having different tables named for different cities is really poor database design and will cause endless problems. Create a single table with an indexed city column and use WHERE to select one or more cities for inclusion in the query.

Redshift Spectrum table doesnt recognize array

I have ran a crawler on json S3 file for updating an existing external table.
Once finished I checked the SVL_S3LOG to see the structure of the external table and saw it was updated and I have new column with Array<int> type like expected.
When I have tried to execute select * on the external table I got this error: "Invalid operation: Nested tables do not support '*' in the SELECT clause.;"
So I have tried to detailed the select statement with all columns names:
select name, date, books.... (books is the Array<int> type)
from external_table_a1
and got this error:
Invalid operation: column "books" does not exist in external_table_a1;"
I have also checked under "AWS Glue" the table external_table_a1 and saw that column "books" is recognized and have the type Array<int>.
Can someone explain why my simple query is wrong?
What am I missing?

Querying JSON data is a bit of a hassle with Redshift: when parsing is enabled (eg using the appropriate SerDe configuration) the JSON is stored as a SUPER type. In your case that's the Array<int>.
The AWS documentation on Querying semistructured data seems pretty straightforward, mentioning that PartiQL uses "dotted notation and array subscript for path navigation when accessing nested data". This doesn't work for me, although I don't find any reasons in their SUPER Limitations Documentation.
Solution 1
What I have to do is set the flags set json_serialization_enable to true; and set json_serialization_parse_nested_strings to true; which will parse the SUPER type as JSON (ie back to JSON). I can then use JSON-functions to query the data. Unnesting data gets even crazier because you can only use the unnest syntax select item from table as t, t.items as item on SUPER types. I genuinely don't think that this is the supposed way to query and unnest SUPER objects but that's the only approach that worked for me.
They described that in some older "Amazon Redshift Developer Guide".
Solution 2
When you are writing your query or creating a query Redshift will try to fit the output into one of the basic column data types. If the result of your query does not match any of those types, Redshift will not process the query. Hence, in order to convert a SUPER to a compatible type you will have to unnest it (using the rather peculiar Redshift unnest syntax).
For me, this works in certain cases but I'm not always able to properly index arrays, not can I access the array index (using my_table.array_column as array_entry at array_index syntax).

How to build a select using Zend with a DISTINCT specific column?

I'm using Zend Framework for my website and I'd like to retrieve some data from my PostgreSQL database.
I have a request like :
SELECT DISTINCT ON(e.id) e.*, f.*, g.* FROM e, f, g
WHERE e.id = f.id_e AND f.id = g.id_f
This request works well but I don't know how to convert the DISTINCT ON(e.id) with Zend.
It seems that I can get DISTINCT rows but no distinct columns.
$select->distinct()->from("e")->join("f", "e.id = f.id_e")
->join("g", "f.id = g.id_f");
Any idea on how to make a select with distinct column ?
Thanks for help

You probably can't do this with Zend Framework since distinct on is not part of the SQL standard (end of page in Postgres documentation). Although Postgres supports it, I would assume its not part of Zend Framework because you could in theory configure another database connection which does not offer support.
If you know in advance that you're developing for a specific database (Postgres in this case), you could use manually written statements instead. You'll gain more flexibility within the queries and better performance at the cost of no longer being able to switch databases.
You would then instantiate a Zend_Db_Apdapter for Postgres. There a various methods available to get results for SQL queries which are described in the frameworks documentation starting at section Reading Query Results. If you choose to go this route I'd recommend to create an own subclass of the Zend_Db_Adapter_Pgsql class. This is to be able to convert data types and throw exceptions in case of errors instead of returning ambiguous null values and hiding error causes.

Why does SQL Server 2000 treat SELECT test.* and SELECT t.est.* the same?

I butter-fingered a query in SQL Server 2000 and added a period in the middle of the table name:
SELECT t.est.* FROM test
Instead of:
SELECT test.* FROM test
And the query still executed perfectly. Even SELECT t.e.st.* FROM test executes without issue.
I've tried the same query in SQL Server 2008 where the query fails (error: the column prefix does not match with a table name or alias used in the query). For reasons of pure curiosity I have been trying to figure out how SQL Server 2000 handles the table names in a way that would allow the butter-fingered query to run, but I haven't had much luck so far.
Any sql gurus know why SQL Server 2000 ran the query without issue?
Update: The query appears to work regardless of the interface used (e.g. Enterprise Manager, SSMS, OSQL) and as Jhonny pointed out below it bizarrely even works when you try:
SELECT TOP 1000 dbota.ble.* FROM dbo.table

Maybe table names are constructed from a naive concatenation of prefix and base name.
't' + 'est' == 'test'
And maybe in the later versions of SQL Server, the distinction was made more semantic/more rigorously.
{ owner = t, table = est } != { table = test }

SQL Server 2005 and up has a "proper" implementation of schemas. SQL 2000 and earlier did not. The details escape me (its been years since I used SQL 2000), all I recall clearly is that you'd be nuts to create anything that wasn't owned by "dbo". It all ties into users and object ownership, but the 2000 and earlier model was pretty confusticated. Hopefully someone will read up on BOL, do some experimentation, and post their results here.

S-SQL reference manual:
"[dot] Can be used to combine multiple names into a name of the form A.B to refer to a column in a table, or a table in a schema. Note that you calso just use a symbol with a dot in it."
So I think if you referenced tblTest as tblT.est it would work OK as long as there isn't a column called 'est' in tblTest.
If it can't find a column name referenced with the dot I imagine it checks the parent of the object.

I found a reference to it being a bug
Note: as a result of a comparison
algorithm bug in SQL Server 2000, dot
symbols themselves have no effect on
matching, so "dbo.t" will successfully
match with tables "dbot", "d.b.o.t",
etc
from Link
It's been fixed in SQL Server 2005. Same link > Changes introduced in SQL Server 2005
Dot-related comparison bug has been fixed.

Is it in the "Open table" view of SSMS or via Enterprise Manager or via an SSMS Query Window?
There is/was a SQL Server 2005 issue with SSMS so how you run the query affects how it behaves.

This is a bug.
It has to do with internal representation of column names in SQL server 2000 that leaked out.
You will also not be able to create tablecolumn with a name which collides with table+column concatenation with another column, like, if you have tables User and UserDetail, you won't be able to have columns DetailAge and Age in these tables, respectively.

MVA attributes in Sphinx

Can anybody help me understand the expected format of data for creating MVA (multi-value)
attributes in Sphinx?
I have a MySQL function which returns a row of comma-separated integers, collated with
GROUP_CONCAT, as a blob. I have two further MVA attributes which collate the results of a
JOIN statement, with GROUP_CONCAT, as a blob (as generated by ThinkingSphinx). These are all included in my sql_query in my sphinx.conf.
I've tried running the SQL on a small result set in the console, and it works: for all
the MVA columns, the results are a blob containing data such as:
2432,35345,342347,8975,453645
and so on. The two MVA attributes generated with the JOIN/GROUP_CONCAT combination index correctly. However, the MVA attribute generated with the MySQL function causes the
indexing to fail silently (seemingly little or no data is indexed). This is despite the query working absolutely fine in the console..
So the data format seems to be identical, but Sphinx is rejecting one of the columns. Does anybody know of any gotchas with defining MVA attributes which might help me debug
this?

I've never used thinking-sphinx (being a PHP shop here), but I don't think you should be group_concat'ing your results. From a working example in one of my sphinx.conf files:
sql_attr_multi = uint categories from query; SELECT entry_id, cat_id FROM exp_category_posts

I solved this problem eventually. It was happening because of something
which seemed unrelated: a 'sql_attr_str2ordinal' attribute which seemed to be affected
(or effect) the SQL query/indexing in ways I don't fully understand.
See: http://www.sphx.org/forum/view.html?id=2867
Fortunately, in my case I was able to remove it entirely, and indexing now seems to work.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse