I have the below XML in each row with different data.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:Declaration xmlns="urn:wco:datamodel:WCO:Declaration_DS:DMS:2" xmlns:ns2="urn:wco:datamodel:WCO:DEC-DMS:2" xmlns:ns3="urn:wco:datamodel:WCO:WCO_DEC_EDS_AUTHORISATION:1">
<ns2:FunctionCode>9</ns2:FunctionCode>
<ns2:ProcedureCategory>B1</ns2:ProcedureCategory>
<ns2:FunctionalReferenceID>LRNU4YZHFFG</ns2:FunctionalReferenceID>
<ns2:IssueDateTime>
<DateTimeString formatCode="304">20210816084322+01</DateTimeString>
</ns2:IssueDateTime>
<ns2:TypeCode>EXA</ns2:TypeCode>
<ns2:GoodsItemQuantity>2</ns2:GoodsItemQuantity>
<ns2:DeclarationOfficeID>ABCd</ns2:DeclarationOfficeID>
<ns2:TotalGrossMassMeasure unitCode="KGM">33000.000</ns2:TotalGrossMassMeasure>
<ns2:TotalPackageQuantity>400</ns2:TotalPackageQuantity>
<ns2:Submitter>
<ns2:Name>ABC</ns2:Name>
<ns2:ID>ABC</ns2:ID>
</ns2:Submitter>
</ns2:Declaration>
ID
messagebody
1
<Xml...
2
<Xml...
what I would like to have is a query that extracts some elements from the XML and put them in a table as below
ID
messagebody
FunctionalReferenceID
ProcedureCategory
1
<Xml...
LRR....
B1
2
<Xml...
LR1....
B2
I'm using the below sql to extract only 1 path
select u.val::text
from sw_customs_message scm
cross join unnest(xpath('//ns2:ProcedureCategory/text()',
scm.messagebody::xml,
array[array['ns2','urn:wco:datamodel:WCO:DEC-DMS:2']])) as u(val)
where u.val::text = 'H7'
How i can use the xmltable() ?
Using xmltable() is typically easier for multiple columns (and rows), especially if namespaces come into play:
select scm.id, mb.*
from sw_customs_message scm
cross join xmltable(xmlnamespaces ('urn:wco:datamodel:WCO:DEC-DMS:2' as ns2,
'urn:wco:datamodel:WCO:WCO_DEC_EDS_AUTHORISATION:1' as ns3),
'/ns2:Declaration'
passing cast(messagebody as xml)
columns
functional_reference_id text path 'ns2:FunctionalReferenceID',
procedure_category text path 'ns2:ProcedureCategory',
function_code int path 'ns2:FunctionCode',
good_items_quantity int path 'ns2:GoodsItemQuantity'
) as mb
where mb.procedure_category = ...
messagebody should really be defined as xml so that you don't need to cast it each time you want to do something with the XML.
Related
Consider the following xml document that is stored in a PostgreSQL field:
<E_sProcedure xmlns="http://www.minushabens.com/2008/FMSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" modelCodeScheme="Emo_ex" modelCodeSchemeVersion="01" modelCodeValue="EMO_E_PROCEDURA" modelCodeMeaning="Section" sectionID="11">
<tCatSnVsn_Pmax modelCodeScheme="Emodinamica_referto" modelCodeSchemeVersion="01" modelCodeValue="tCat4" modelCodeMeaning="My text"><![CDATA[1]]></tCatSnVsn_Pmax>
</E_sProcedure>
If I run the following query I get the correct result for Line 1, while Line 2 returns nothing:
SELECT
--Line 1
TRIM(BOTH FROM array_to_string((xpath('//child::*[#modelCodeValue="tCat4"]/text()', t.xml_element)),'')) as tCatSnVsn_Pmax_MEANING
--Line2
,TRIM(BOTH FROM array_to_string((xpath('/tCatSnVsn_Pmax/text()', t.xml_element)),'')) as tCatSnVsn_Pmax
FROM (
SELECT unnest(xpath('//x:E_sProcedure', s.XMLDATA::xml, ARRAY[ARRAY['x', 'http://www.minushabens.com/2008/FMSchema']])) AS xml_element
FROM sr_data as s)t;
What's wrong in the xpath of Line 2?
Your second xpath() doesn't return anything because of two problems. First: you need to use //tCatSnVsn_Pmax as the xml_element still starts with <E_sProcedure>. The path /tCatSnVsn_Pmax tries to select a top-level element with that name.
But even then, the second one won't return anything because of the namespace. You need to pass the same namespace definition to the xpath(), so you need something like this:
SELECT (xpath('/x:tCatSnVsn_Pmax/text()', t.xml_element, ARRAY[ARRAY['x', 'http://www.minushabens.com/2008/FMSchema']]))[1] as tCatSnVsn_Pmax
FROM (
SELECT unnest(xpath('//x:E_sProcedure', s.XMLDATA::xml, ARRAY[ARRAY['x', 'http://www.minushabens.com/2008/FMSchema']])) AS xml_element
FROM sr_data as s
)t;
With modern Postgres versions (>= 10) I prefer using xmltable() for anything nontrivial. It makes passing namespaces easier and accessing multiple attributes or elements.
SELECT xt.*
FROM sr_data
cross join
xmltable(xmlnamespaces ('http://www.minushabens.com/2008/FMSchema' as x),
'/x:E_sProcedure'
passing (xmldata::xml)
columns
sectionid text path '#sectionID',
pmax text path 'x:tCatSnVsn_Pmax',
model_code_value text path 'x:tCatSnVsn_Pmax/#modelCodeValue') as xt
For your sample XML, the above returns:
sectionid | pmax | model_code_value
----------+------+-----------------
11 | 1 | tCat4
I have a SQL query and I would like to insert a hashtag between one column and another to be able to reference in Excel, using an import option in fields delimited by #. Anyone have an idea how to do it? A query is as follows:
SELECT FC.folha, folha->folhames,folha->folhaano, folha->folhaseq, folha->folhadesc, folha->TipoCod as Tipo_Folha,
folha->FolhaFechFormatado as Folha_Fechada, folha->DataPagamentoFormatada as Data_Pgto,
Servidor->matricula, Servidor->nome, FC.rubrica,
FC.Rubrica->Codigo, FC.Rubrica->Descricao, FC.fator, FC.TipoRubricaFormatado as TipoRubrica,
FC.ValorFormatado,FC.ParcelaAtual, FC.ParcelaTotal
FROM RHFolCalculo FC WHERE folha -> FolhaFech = 1
AND folha->folhaano = 2018
and folha->folhames = 06
and folha->TipoCod->codigo in (1,2,3,4,6,9)
You are generating delimited output from the query, so the first row should be a header row, with all following rows the data rows. You will really only have one column due to concat. So remove the alias from the columns, output the first row like so (using the alias here) . . .
SELECT 'folha#folhames#folhaano#folhaseq#folhadesc#Tipo_Folha#
Folha_Fechada#Data_Pgto#
matricula#nome#rubrica#
Codigo#Descricao#fator#TipoRubrica#
ValorFormatado#ParcelaAtual#ParcelaTotal'
UNION
SELECT FC.folha || '#' || folha->folhames || '#' || folha->folhaano . . .
The UNION will give the remaining rows. Note some conversion may be necessary on the columns data if not all strings.
I have the following heap of text:
"BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,
URLConnectionSample,ShortVersion,1.0,Version,1.0,BundleSize,155648,DynamicSize,
16384,Identifier,com.IdentifierForVendor3,Name,IdentifierForVendor3,ShortVersion,
1.0,Version,1.0,".
What I'd like to do is extract data from this in the following manner:
BundleSize:155648
DynamicSize:204800
Identifier:com.URLConnectionSample
Name:URLConnectionSample
ShortVersion:1.0
Version:1.0
BundleSize:155648
DynamicSize:16384
Identifier:com.IdentifierForVendor3
Name:IdentifierForVendor3
ShortVersion:1.0
Version:1.0
All tips and suggestions are welcome.
It isn't quite clear what do you need to do with this data. If you really need to process it entirely in the database (looks like the task for your favorite scripting language instead), one option is to use hstore.
Converting records one by one is easy:
Assuming
%s =
BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,URLConnectionSample,ShortVersion,1.0,Version,1.0
SELECT * FROM each(hstore(string_to_array(%s, ',')));
Output:
key | value
--------------+-------------------------
Name | URLConnectionSample
Version | 1.0
BundleSize | 155648
Identifier | com.URLConnectionSample
DynamicSize | 204800
ShortVersion | 1.0
If you have table with columns exactly matching field names (note the quotes, populate_record is case-sensitive to key names):
CREATE TABLE data (
"BundleSize" integer, "DynamicSize" integer, "Identifier" text,
"Name" text, "ShortVersion" text, "Version" text);
You can insert hstore records into it like this:
INSERT INTO data SELECT * FROM
populate_record(NULL::data, hstore(string_to_array(%s, ',')));
Things get more complicated if you have comma-separated values for more than one record.
%s = BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,URLConnectionSample,ShortVersion,1.0,Version,1.0,BundleSize,155648,DynamicSize,16384,Identifier,com.IdentifierForVendor3,Name,IdentifierForVendor3,ShortVersion,1.0,Version,1.0,
You need to break up an array into chunks of number_of_fields * 2 = 12 elements first.
SELECT hstore(row) FROM (
SELECT array_agg(str) AS row FROM (
SELECT str, row_number() OVER () AS i FROM
unnest(string_to_array(%s, ',')) AS str
) AS str_sub
GROUP BY (i - 1) / 12) AS row_sub
WHERE array_length(row, 1) = 12;
Output:
"Name"=>"URLConnectionSample", "Version"=>"1.0", "BundleSize"=>"155648", "Identifier"=>"com.URLConnectionSample", "DynamicSize"=>"204800", "ShortVersion"=>"1.0"
"Name"=>"IdentifierForVendor3", "Version"=>"1.0", "BundleSize"=>"155648", "Identifier"=>"com.IdentifierForVendor3", "DynamicSize"=>"16384", "ShortVersion"=>"1.0"
And inserting this into the aforementioned table:
INSERT INTO data SELECT (populate_record(NULL::data, hstore(row))).* FROM ...
the rest of the query is the same.
I want to create a sitemap xml file (including images) directly from the database without another process (like transformation or another trick).
My query is:
;WITH XMLNAMESPACES(
DEFAULT 'http://www.sitemaps.org/schemas/sitemap/0.9',
'http://www.google.com/schemas/sitemap-image/1.1' as [image] )
SELECT
(SELECT
'mysite' as [loc],
(select
'anotherloc'
as [image:loc]
for XML path('image:image'), type
)
for xml path('url'), type
)
for xml path('urlset'), type
Returns:
<urlset xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<loc>mysite</loc>
<image:image xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<image:loc>anotherloc</image:loc>
</image:image>
</url>
</urlset>
But I need this output, without repeated namespace declaration:
<urlset xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>mysite</loc>
<image:image>
<image:loc>anotherloc</image:loc>
</image:image>
</url>
</urlset>
I'm sure you realise that the additional otiose namespace declarations don't change the meaning of the XML document, so if the result is going to be consumed by an XML-conformant tool, they shouldn't matter. Nevertheless I know there are some tools out there which don't do XML Namespaces correctly, and in a large XML instance superfluous repeated namespace declarations can bloat the size of the result significantly, which may cause its own problems.
In general there is no getting around the fact that each SELECT...FOR XML statement within the scope of a WITH XMLNAMESPACES prefix will generate namespace declarations on the outermost XML element(s) in its result set, in all XML-supporting versions of SQL Server up to SQL Server 2012.
In your specific example, you can get fairly close to the desired XML by separating the SELECTs rather than nesting them, and using the ROOT syntax for the enveloping root element, thus:
DECLARE #inner XML;
WITH XMLNAMESPACES('http://www.google.com/schemas/sitemap-image/1.1' as [image])
SELECT #inner =
(
SELECT
'anotherloc' AS [image:loc]
FOR XML PATH('image:image'), TYPE
)
;WITH XMLNAMESPACES(
DEFAULT 'http://www.sitemaps.org/schemas/sitemap/0.9'
)
SELECT
'mysite' AS [loc],
#inner
FOR XML PATH('url'), ROOT('urlset'), TYPE
The result being:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>mysite</loc>
<image:image xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns="">
<image:loc>anotherloc</image:loc>
</image:image>
</url>
</urlset>
But this approach doesn't provide a completely general solution to the problem.
You can use UDF. Example:
ALTER FUNCTION [dbo].[udf_get_child_section] (
#serviceHeaderId INT
)
RETURNS XML
BEGIN
DECLARE #result XML;
SELECT #result =
(
SELECT 1 AS 'ChildElement'
FOR XML PATH('Child')
)
RETURN #result
END
GO
DECLARE #Ids TABLE
(
ID int
)
INSERT INTO #Ids
SELECT 1 AS ID
UNION ALL
SELECT 2 AS ID
;WITH XMLNAMESPACES (DEFAULT 'http://www...com/content')
SELECT
[dbo].[udf_get_child_section](ID)
FROM
#Ids
FOR XML PATH('Parent')
Result:
<Parent xmlns="http://www...com/content">
<Child xmlns="">
<ChildElement>1</ChildElement>
</Child>
</Parent>
<Parent xmlns="http://www...com/content">
<Child xmlns="">
<ChildElement>1</ChildElement>
</Child>
</Parent>
Maybe too late for answer, but this is a quick solution.
`DECLARE #PageNumber Int = 1;
DECLARE #siteMapXml XML ;
;WITH XMLNAMESPACES (
'http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd' as "schemaLocation",
'http://www.w3.org/2001/XMLSchema-instance' as xsi,
'http://www.google.com/schemas/sitemap-image/1.1' as [image],
DEFAULT 'http://www.sitemaps.org/schemas/sitemap/0.9'
)
SELECT #siteMapXml = (
SELECT
Slug loc,
convert(varchar(300),[Image]) as [image:image/image:loc]
,
Convert(char(10), UpdatedOnUtc, 126) as lastmod,
'hourly' as changefreq,
'0.5' as priority
FROM Products(NOLOCK)
WHERE Pagenumber = #PageNumber
FOR XML PATH ('url'), ROOT ('urlset'))
SELECT #siteMapXml = REPLACE(CAST(#siteMapXml AS NVARCHAR(MAX)), ' xmlns:schemaLocation=', ' xsi:schemaLocation=')
SELECT #siteMapXml`
declare #myDoc xml
set #myDoc = '<Form xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.mydomain.org/MySchema.xsd" SectionId="ABCD" Description="Some stuff">
<ProductDescription ProductID="1" ProductName="Road Bike">
<Features>
<Warranty>1 year parts and labor</Warranty>
<Maintenance>3 year parts and labor extended maintenance is available</Maintenance>
</Features>
</ProductDescription>
</Form>'
;WITH XMLNAMESPACES( 'http://www.w3.org/2001/XMLSchema-instance' as xsi, 'http://www.w3.org/2001/XMLSchema' as xsd, DEFAULT 'http://www.mydomain.org/MySchema.xsd' )
SELECT #myDoc.value('/Form[#SectionId][0]', 'varchar')
I need to obtain the attribute value of SectionId as a nvarchar ? how do I do it ?...
T and R
Mark
You could write it even simpler:
;WITH XMLNAMESPACES(DEFAULT 'http://www.mydomain.org/MySchema.xsd')
SELECT #myDoc.value('(/Form/#SectionId)[1]', 'VARCHAR(100)') AS SectionId
Since you're never using/referring to any of the xsi or xsd namespaces, there's no need to declare those.
And since you're only fetching one attribute from one element, there's really no point in using the .nodes() function to create an internal "dummy table", either.
;WITH XMLNAMESPACES( 'http://www.w3.org/2001/XMLSchema-instance' as xsi, 'http://www.w3.org/2001/XMLSchema' as xsd, DEFAULT 'http://www.mydomain.org/MySchema.xsd' )
SELECT Node.value('#SectionId', 'VARCHAR(100)') AS SectionId
FROM #myDoc.nodes('/Form') TempXML (Node);