XML Node query to get all valuenames - tsql

I have a table (Table1) with a column named XMLColoumn. Part of the XML looks like this:
<StudentSubjects>
<ValueName>Maths</ValueName>
<ValueName>Science</ValueName>
<ValueName>History</ValueName>
<ValueName>Calculus</ValueName>
</StudentSubjects>
My table (Table1) looks something like this:
StudentNo XMLColoumn(textfile)
112 (above XML)
1445 (same structure as above XML)
I am trying to get output like this:
StudentNo Subjects
112 Maths
112 Science
112 History
112 Calculus
What I have so far is:
SELECT
CONVERT(XML, CAST(XMLCOLOUMN AS nvarchar(max))).value('(//StudentSubjects/ValueName/text())[1]', 'nvarchar(max)'), StudentNo FROM Table1
This returns only the first value, i.e. Maths. How can I get all the <ValueName> elements?
I searched a lot but could not find a solution. Please help!

You need to use nodes() to shred the XML to rows.
select StudentNo,
N.value('.', 'nvarchar(max)') as Subjects
from Table1
cross apply XMLColumn.nodes('/StudentSubjects/ValueName') as X(N)
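To make the row-per-node behaviour of nodes() combined with cross apply concrete, here is a minimal Python sketch of the same shredding idea. It is only an illustration, not T-SQL: the list table1 is a hypothetical stand-in for Table1, and the second student simply reuses the same sample XML.

```python
import xml.etree.ElementTree as ET

# Sample XML taken from the question.
SAMPLE_XML = (
    "<StudentSubjects>"
    "<ValueName>Maths</ValueName>"
    "<ValueName>Science</ValueName>"
    "<ValueName>History</ValueName>"
    "<ValueName>Calculus</ValueName>"
    "</StudentSubjects>"
)

# Hypothetical stand-in for Table1: (StudentNo, XML text) pairs.
table1 = [(112, SAMPLE_XML), (1445, SAMPLE_XML)]


def shred(rows):
    """Emit one (StudentNo, Subject) pair per <ValueName> element."""
    out = []
    for student_no, xml_text in rows:
        root = ET.fromstring(xml_text)  # root is <StudentSubjects>
        # Each matching child node becomes its own output row,
        # paired with the parent row's StudentNo.
        for node in root.findall("ValueName"):
            out.append((student_no, node.text))
    return out


result = shred(table1)
```

Every parent row is repeated once per matching node, which is why the original .value(...)[1] call, which picks a single node, returned only Maths.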

Related

More efficient way to fuzzy match large datasets in SAS

I have a dataset with over 33 million records that includes a name field. I need to flag records where this name field value also appears in a second dataset that includes about 5 million records. For my purposes, a fuzzy match would be both acceptable and beneficial.
I wrote the following program to do this. It works, but it has been running for 4 days so far, so I'd like to find a more efficient way to write it.
proc sql noprint;
create table INDIV_MATCH as
select A.NAME, SPEDIS(A.NAME, B.NAME) as SPEDIS_VALUE, COMPGED(A.NAME,B.NAME) as COMPGED_SCORE
from DATASET1 A join DATASET2 B
on COMPGED(A.NAME, B.NAME) le 400 and SPEDIS(A.NAME, B.NAME) le 10
order by A.name;
quit;
Any help would be much appreciated!

Merge multiple tables having different columns

I have 4 tables, each with a different number of columns, as listed below.
tableA - 34
tableB - 47
tableC - 26
tableD - 16
Every table has a common column called id. I need to perform a union, but since the tables have different numbers of columns and the columns are entirely different, I can't use UNION directly.
The details from each table can only be looked up by id, so how should I approach this?
What is the most efficient way to solve this? I tried a full join, but that takes too much time.
What I have tried so far:
SELECT * FROM tableA FULL JOIN
tableB FULL JOIN
tableC FULL JOIN
tableD
USING (id)
WHERE tableA.id = 123 OR
tableB.id = 123 OR
tableC.id = 123 OR
tableD.id = 123

Snowflake does have a declared limitation in use of Set operators (such as UNION):
When using these operators:
Make sure that each query selects the same number of columns.
[...]
However, since the column names are well known, it is possible to come up with a superset of all unique column names required in the final result and project them explicitly from each query.
There's not enough information in the question on how many columns overlap (47 unique columns?), or if they are all different (46 + 33 + 25 + 15 = 119 unique columns?). The answer to this would determine the amount of effort required to write out each query, as it would involve adapting a query from the following form:
SELECT * FROM t1
Into an explicit form with dummy columns defined with acceptable defaults that match the data type on tables where they are present:
SELECT
present_col1,
NULL AS absent_col2,
0.0 AS absent_col3,
present_col4,
[...]
FROM
t1
You can also use some metaprogramming with stored procedures to generate such an altered query: inspect each individual result's column names using the Statement::getColumnCount(), Statement::getColumnName(), etc. APIs, then form a superset UNION version with default/empty values.
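The superset idea can be sketched outside of a stored procedure as well. The following Python sketch builds the explicit query text from a mapping of table names to column lists. The function name build_union_query and the column names a1, a2, b1 are hypothetical, and projecting a plain NULL for absent columns is an assumption: depending on the column types, explicit casts or other defaults may be needed.

```python
def build_union_query(tables):
    """Build a UNION ALL query that projects the superset of all columns.

    `tables` maps a table name to the list of its column names.
    Columns missing from a table are projected as NULL (an assumption;
    a typed default may be more appropriate in practice).
    """
    # Superset of column names, in first-seen order.
    all_cols = []
    for cols in tables.values():
        for col in cols:
            if col not in all_cols:
                all_cols.append(col)

    selects = []
    for name, cols in tables.items():
        present = set(cols)
        projected = [c if c in present else f"NULL AS {c}" for c in all_cols]
        selects.append(f"SELECT {', '.join(projected)} FROM {name}")
    return "\nUNION ALL\n".join(selects)


# Hypothetical, shortened schemas for illustration only.
schemas = {
    "tableA": ["id", "a1", "a2"],
    "tableB": ["id", "b1"],
}
query = build_union_query(schemas)
```

With real schemas, the column lists would come from metadata rather than being typed by hand, which is what the getColumnName()-style inspection above is for.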

select query with where condition using variables instead of column names in q kdb

I have a table with columns sym, px and size:
t:([] sym:`GOOG`IBM`APPL; px:10 20 30; size:1000 2000 3000)
Now, if I assign sym column to variable ab
ab:`sym
Then running the queries below does not give the expected output:
select [ab],px from t where [ab]=`IBM / returns empty table
?[t;(=;`sym;`IBM);0b; [ab]`px![ab]`px]/ type
I gained some understanding here and here, but could not create a working query.

The answer above is close but there are some things to consider. The query you are running is basically:
q)parse"select sym,px from t where sym=`IBM"
?
`t
,,(=;`sym;,`IBM)
0b
`sym`px!`sym`px
The key thing here is that a leading comma (,) in the parse output usually indicates that a term needs to be enlisted. Additionally, for the dictionary of column names you just need to join the value of ab to `px. With all that in mind I have modified your query above:
q)?[t;enlist(=;`sym;enlist`IBM);0b;(ab,`px)!ab,`px]
sym px
------
IBM 20
And assuming the where clause should also refer to ab:
q)?[t;enlist(=;ab;enlist`IBM);0b;(ab,`px)!ab,`px]
sym px
------
IBM 20

Querying a table using KDB

How can I query a table using kdb?
I created a table using the following code:
q)name:`Iain`Nathan`Ryan`Ross
q)number:98 42 126 98
q)table:([] name; number)
This creates a table:
name number
Iain 98
Nathan 42
Ryan 126
Ross 98
How can I query this table to return the rows where number is equal to 98,
or where name is equal to Iain?
This is what I had been using

You can do this using a Q-SQL statement
select from table where number=98
The documentation for this form of querying can be found at https://code.kx.com/q/ref/qsql/

distinct key word only for one column

I'm using PostgreSQL as my database, and I'm stuck getting the desired results from a query.
What I have in my table is something like the following:
nid date_start date_end
1 20 25
1 20 25
2 23 26
2 23 26
What I want is the following:
nid date_start date_end
1 20 25
2 23 26
For that I used SELECT DISTINCT nid,date_start,date_end from table_1, but this returns duplicate entries. How can I get distinct nids with their corresponding date_start and date_end?
Can anyone help me with this?
Thanks a lot!

Based on your sample data and sample output, your query should work fine. I'll assume your sample input/output is not accurate.
If you want to get distinct values of a certain column, along with values from other corresponding columns, then you need to determine WHICH value from the corresponding columns to display (your question and query would otherwise not make sense). For this you need to use aggregates and group by. For example:
SELECT
nid,
MAX(date_start),
MAX(date_end)
FROM
table_1
GROUP BY
nid
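To see how the aggregate picks one value per nid, here is a small runnable sketch using Python's built-in sqlite3 module. SQLite stands in for PostgreSQL purely so the example is self-contained. The first four rows come from the question; the row (1, 21, 27) is a hypothetical addition, included so that the duplicates are not all identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 (nid INTEGER, date_start INTEGER, date_end INTEGER)"
)
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?)",
    [
        (1, 20, 25),
        (1, 20, 25),
        (2, 23, 26),
        (2, 23, 26),
        (1, 21, 27),  # hypothetical extra row with different dates
    ],
)

# One output row per nid; MAX decides which date values are shown.
rows = conn.execute(
    """
    SELECT nid, MAX(date_start), MAX(date_end)
    FROM table_1
    GROUP BY nid
    ORDER BY nid
    """
).fetchall()
conn.close()
```

Note that MAX is evaluated per column, so the two dates shown for a given nid need not come from the same source row.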

That query should work unless you are selecting more columns.
Or maybe you are getting the same nid with a different start and/or end date.

Try distinct on:
select distinct on (col1) col1, col2 from table;

DISTINCT can't result in duplicate entries. Removing duplicates is exactly what it does.
Is your posted data incorrect? Exactly what are your data and output?