XSLT transform with MSXML doesn't use proper encoding


I'm using IXMLDOMDocument::transformNode from MSXML 3.0 to apply XSLT transforms. Each of the transforms has an xsl:output directive that specifies UTF-8 as the encoding. For example,
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
...
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:str="http://exslt.org/strings"
xmlns:math="http://exslt.org/math"
extension-element-prefixes="str math">
<xsl:output encoding="UTF-8" indent="yes" method="xml" />
...
</xsl:stylesheet>
Yet the transformed result is always UTF-16 (and the encoding attribute says UTF-16).
<?xml version="1.0" encoding="UTF-16"?>
Is this a bug in MSXML?
For various reasons, I'd really like to have UTF-8. Is there a workaround? Or do I have to convert the transformed result to UTF-8 myself and patch up the encoding attribute?
Update: I've worked around the problem by accepting the UTF-16 encoding and prepending a byte-order mark, which satisfies the downstream users of the transformed result, but I'm still interested in how to get UTF-8 output.
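If you do convert the result yourself, the manual route the question describes has two steps: re-encode the characters and rewrite the declaration. Here is a minimal Python sketch. It assumes the transformed result is already available as a string. The name `to_utf8_bytes` is hypothetical, and the regex only handles a standard XML declaration at the very start of the text.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the text, e.g. <?xml version="1.0" encoding="UTF-16"?>
_DECL_UTF16 = re.compile(
    r'^(<\?xml[^>]*?\bencoding\s*=\s*)(["\'])UTF-16\2', re.IGNORECASE
)


def to_utf8_bytes(xml_text: str) -> bytes:
    """Re-encode a serialized XML string as UTF-8 and patch its declaration."""
    # A leading BOM character is meaningless once the text is re-encoded.
    xml_text = xml_text.lstrip("\ufeff")
    # Rewrite encoding="UTF-16" (either quote style) to encoding="UTF-8".
    patched = _DECL_UTF16.sub(r"\g<1>\g<2>UTF-8\g<2>", xml_text, count=1)
    return patched.encode("utf-8")


result = '<?xml version="1.0" encoding="UTF-16"?>\n<eins>Käse</eins>'
print(to_utf8_bytes(result))
```

Text without a declaration passes through unchanged, apart from being encoded as UTF-8.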

You're probably sending the output to a DOM tree or to a character stream, not to a byte stream. If that's the case, then it's not MSXML that's doing the encoding, and whatever does do the final encoding has no knowledge of the xsl:output directive (or indeed, of XSLT).
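Other XML libraries draw the same line. The following is a rough Python analogue using ElementTree, not MSXML. Serializing to a character string makes no encoding decision at all. Serializing to bytes lets the serializer encode the output itself and write a matching declaration. This assumes Python 3.8 or later, where `tostring` accepts `xml_declaration`.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<Urmel><eins>Käse</eins></Urmel>")

# Character output: a Python str with no XML declaration. Whatever later
# writes these characters to a file decides the byte encoding.
as_text = ET.tostring(root, encoding="unicode")

# Byte output: the serializer encodes the document itself and writes an
# XML declaration naming the encoding it actually used.
as_bytes = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

print(as_text)
print(as_bytes)
```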

Supplementing what Michael Kay said (which is spot on, of course), here's a JScript example of how to transform to a stream, using the XSLT serialization in the process:
// command line args
var args = WScript.Arguments;
if (args.length != 3) {
    WScript.Echo("usage: cscript msxsl.js in.xml ss.xsl out.xml");
    WScript.Quit(1);
}
var xmlFile = args(0);
var xslFile = args(1);
var resFile = args(2);
// DOM objects (MSXML 6.0); the source document is an empty clone of the same type
var xsl = new ActiveXObject("MSXML2.DOMDocument.6.0");
var xml = xsl.cloneNode(false);
// source document
xml.validateOnParse = false;
xml.async = false;
xml.load(xmlFile);
if (xml.parseError.errorCode != 0) {
    WScript.Echo("XML Parse Error : " + xml.parseError.reason);
    WScript.Quit(1);
}
// stylesheet document
xsl.validateOnParse = false;
xsl.async = false;
xsl.resolveExternals = true;
//xsl.setProperty("AllowDocumentFunction", true);
//xsl.setProperty("ProhibitDTD", false);
//xsl.setProperty("AllowXsltScript", true);
xsl.load(xslFile);
if (xsl.parseError.errorCode != 0) {
    WScript.Echo("XSL Parse Error : " + xsl.parseError.reason);
    WScript.Quit(1);
}
// output object: a binary ADODB.Stream, so the XSLT serializer writes the
// bytes itself and applies the encoding from <xsl:output>
var adTypeBinary = 1;
var adSaveCreateOverWrite = 2;
var stream = WScript.CreateObject("ADODB.Stream");
stream.type = adTypeBinary;
stream.open();
xml.transformNodeToObject(xsl, stream);
stream.saveToFile(resFile, adSaveCreateOverWrite);
stream.close();
You may test using this input:
<Urmel>
<eins>Käse</eins>
<deux>café</deux>
<tre>supplì</tre>
</Urmel>
And this stylesheet:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output encoding="UTF-8"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
I think it'll be easy for you to adapt the JScript example to C++.

As you noted, BSTRs are all UTF-16. However, I think Michael Ludwig might be on to something here. Have you tried using this method?
HRESULT IXMLDOMDocument::transformNodeToObject(
IXMLDOMNode *stylesheet,
VARIANT outputObject);
You should be able to just use CreateStreamOnHGlobal, stash the resultant IStream ptr into a VARIANT, and pass that in as the outputObject parameter. Theoretically. I haven't actually tried this, though :)

Related

Convert XML like String to PySpark Dataframe

I'm using azure.storage.queue's receive_messages() function in Databricks to pull messages from an Azure queue. The response looks like XML, but it is really just a string:
<?xml version="1.0" encoding="utf-16"?>
<root>
<col1>123</col1>
<col2>1</col2>
<col3>Unknown</col3>
<col4>Dog</col4>
<col5>Owner</col5>
<col6>-1</col6>
<col7>Owner</col7>
<col8></col8>
</root>
When I write the response to a list, it looks like:
'<root>\r\n <col1>123</col1>\r\n <col2>1</col2>\r\n <col3>Unknown</col3>\r\n <col4>Dog</col4>\r\n <col5>Owner</col5>\r\n <col6>-1</col6>\r\n <col7>Owner</col7>\r\n <col8></col8>\r\n</root>'
I know that I can split on \r\n with something like:
l = [x.strip().split(' ') for x in a[0].split('\r\n')]
l
This gives:
[['<root>'],
 ['<col1>123</col1>'],
 ['<col2>1</col2>'],
 ['<col3>Unknown</col3>'],
 ['<col4>Dog</col4>'],
 ['<col5>Owner</col5>'],
 ['<col6>-1</col6>'],
 ['<col7>Owner</col7>'],
 ['<col8></col8>'],
 ['</root>']]
I'm not sure if this is the best route and I don't want to hard code each value into the spark dataframe, because I need to iterate through all messages in the queue. Looking for a solution that converts each 'col' into a header and then grabs the value between 'tags'.

Here is an answer:
from bs4 import BeautifulSoup

data = []
for message in response:
    # print(message.content)
    soup = BeautifulSoup(message.content, "xml")
    c = soup.find_all('col1')
    c1 = soup.find_all('col2')
    c2 = soup.find_all('col3')
    c3 = soup.find_all('col4')
    c4 = soup.find_all('col5')
    c5 = soup.find_all('col6')
    c6 = soup.find_all('col7')
    c7 = soup.find_all('col8')
    # one row per occurrence of <col1>, reading the other columns by position
    for i in range(0, len(c)):
        rows = [c[i].get_text(),
                c1[i].get_text(),
                c2[i].get_text(),
                c3[i].get_text(),
                c4[i].get_text(),
                c5[i].get_text(),
                c6[i].get_text(),
                c7[i].get_text()]
        data.append(rows)
# print(data)
out_df = spark.createDataFrame(data, schema=['c', 'c1', 'c2', 'c3', 'c4',
                                             'c5', 'c6', 'c7'])

This was faster, but requires the response to always be in the same order, which mine is.
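# assuming lxml's etree: it rejects str input that carries an encoding
# declaration, hence the .encode('utf-16') below
from lxml import etree

data = []
for message in response:
    # print(message.content)
    root = etree.fromstring(message.content.encode('utf-16'))
    arr = []
    # child values in document order, so every message must use the same order
    for child in root:
        r = child.text
        arr.append(r)
    data.append(arr)
out_df = spark.createDataFrame(data, schema=['c', 'c1', 'c2', 'c3', 'c4',
                                             'c5', 'c6', 'c7'])
display(out_df)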
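If the element order can vary between messages, one option is to key each value by its element name rather than by its position. The sketch below uses only the standard library's xml.etree.ElementTree. The helper names are hypothetical, and `response` and `spark` are assumed to come from the Databricks notebook, as in the answers above.

```python
import xml.etree.ElementTree as ET


def message_to_row(content: str) -> dict:
    """Map each child element's tag to its text, e.g. {'col1': '123', ...}."""
    # ElementTree accepts a str even though the declaration says utf-16:
    # the already-decoded characters are parsed directly.
    root = ET.fromstring(content)
    # Empty elements such as <col8></col8> have text None; keep them as "".
    return {child.tag: (child.text or "") for child in root}


def messages_to_rows(contents):
    """Return (columns, rows) with values aligned by element name."""
    dicts = [message_to_row(c) for c in contents]
    # Columns in first-seen order across all messages.
    columns = list(dict.fromkeys(tag for d in dicts for tag in d))
    # A tag missing from a message becomes "" in that row.
    rows = [[d.get(col, "") for col in columns] for d in dicts]
    return columns, rows
```

In the notebook, this could be used as `columns, rows = messages_to_rows(m.content for m in response)`, followed by `spark.createDataFrame(rows, schema=columns)`.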

Mirth Add Segment HL7 V3

I am new to Mirth/JavaScript. I have a project where I need to add a segment to an incoming HL7 v3 XML file. I have tried the following JavaScript in the destination transformer:
tmp = msg.copy();
tmp.createSegment('templateId', ClinicalDocument, 1);
tmp.ClinicalDocument['templateId'][1]['@root'] = "2.16.840.1.113883.10.20.22.1.1";
This generates an error.
Also I need to place this new segment before the existing templateID segment.
Currently this is what we receive –
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:mif="urn:hl7-org:v3/mif" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:sdtc="urn:hl7-org:sdtc">
<realmCode code="US" />
<typeId root="2.16.840.1.113883.1.3" extension="POCD_HD000040" />
<templateId root="2.16.840.1.113883.10.20.22.1.2" extension="2015-08-01" />
We want to add
Transformed output desired:
Any help on how to accomplish this will be greatly appreciated.
Thank You

As I understand your requirement, you need to add
<templateId root="2.16.840.1.113883.10.20.22.1.1" extension="2015-08-01" />
immediately before the existing templateId
<templateId root="2.16.840.1.113883.10.20.22.1.2" extension="2015-08-01" />
If that's the requirement, this Java code will work:
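import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public class AddTemplateId { // class name is arbitrary
    public static void main(String[] args) throws SAXException, IOException, ParserConfigurationException, TransformerException {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        // Source XML
        File fXmlFile = new File("C:\\Labs\\POC\\Import_Export\\TEST.xml");
        Document doc = dBuilder.parse(fXmlFile);
        // Get the element after which the template ID has to be added
        Node nodeName = doc.getElementsByTagName("typeId").item(0);
        // Create the new template ID element
        Element newTemplateID = doc.createElement("templateId");
        newTemplateID.setAttribute("root", "2.16.840.1.113883.10.20.22.1.1");
        newTemplateID.setAttribute("extension", "2015-08-01");
        // Insert it directly after typeId, i.e. before the existing templateId
        nodeName.getParentNode().insertBefore(newTemplateID, nodeName.getNextSibling());
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        DOMSource source = new DOMSource(doc);
        // Destination XML
        StreamResult result = new StreamResult(new File("C:\\Labs\\POC\\Import_Export\\TESTNew.xml"));
        transformer.transform(source, result);
    }
}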
You will get XML like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:mif="urn:hl7-org:v3/mif" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:sdtc="urn:hl7-org:sdtc">
<realmCode code="US"/>
<typeId extension="POCD_HD000040" root="2.16.840.1.113883.1.3"/>
<templateId extension="2015-08-01" root="2.16.840.1.113883.10.20.22.1.1"/>
<templateId extension="2015-08-01" root="2.16.840.1.113883.10.20.22.1.2"/>
</ClinicalDocument>
If you port the code above to a Mirth transformer, you will get the same output. Otherwise, if you simply want to add the extra tag at the end of the XML, you can use this code in a Mirth script:
var addTemplateId = new XML("<templateId></templateId>");
addTemplateId['@root'] = '2.16.840.1.113883.10.20.22.1.1';
addTemplateId['@extension'] = '2015-08-01';
var newValue = msg.appendChild(addTemplateId);
msg = newValue;
This appends the new tag after the existing children of the message, so your output will look like this:
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:mif="urn:hl7-org:v3/mif" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:sdtc="urn:hl7-org:sdtc">
<realmCode code="US"/>
<typeId extension="POCD_HD000040" root="2.16.840.1.113883.1.3"/>
<templateId extension="2015-08-01" root="2.16.840.1.113883.10.20.22.1.2"/>
<templateId extension="2015-08-01" root="2.16.840.1.113883.10.20.22.1.1"/>
</ClinicalDocument>
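The Java version positions the new element relative to typeId, and the E4X version appends it at the end. An alternative is to anchor on the existing templateId itself: find its index among the root's children in the HL7 v3 namespace and insert the new element at that index. The following is a language-neutral illustration in Python's ElementTree, not Mirth's E4X API. The function name `insert_template_id` is hypothetical.

```python
import xml.etree.ElementTree as ET

HL7_NS = "urn:hl7-org:v3"
# Serialize urn:hl7-org:v3 as the default namespace instead of "ns0:".
ET.register_namespace("", HL7_NS)


def insert_template_id(root, oid, extension):
    """Insert <templateId> just before the first existing templateId child
    (or at the end if there is none) and return the new element."""
    tag = f"{{{HL7_NS}}}templateId"
    children = list(root)
    index = next(
        (i for i, child in enumerate(children) if child.tag == tag),
        len(children),
    )
    new_el = ET.Element(tag, {"root": oid, "extension": extension})
    root.insert(index, new_el)
    return new_el


doc = ET.fromstring(
    '<ClinicalDocument xmlns="urn:hl7-org:v3">'
    '<realmCode code="US"/>'
    '<typeId root="2.16.840.1.113883.1.3" extension="POCD_HD000040"/>'
    '<templateId root="2.16.840.1.113883.10.20.22.1.2" extension="2015-08-01"/>'
    "</ClinicalDocument>"
)
insert_template_id(doc, "2.16.840.1.113883.10.20.22.1.1", "2015-08-01")
print(ET.tostring(doc, encoding="unicode"))
```

When re-serialized, ElementTree only writes namespace declarations that are actually used. Unused declarations such as xmlns:mif and xmlns:sdtc in the real document would therefore not be written back.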

How to export xml or json from typo3

I need to export content from a TYPO3 site to a web app. I am considering using XML or JSON, but I haven't figured out how to do it.
I'm new to TYPO3 development, so I would like to know if anyone has suggestions on how to do this.
Regards,

This highly depends on your requirements ;)
As a starting point, you can define a new page type and disable all header code to generate XML, e.g.:
xml = PAGE
xml {
typeNum = 123
config {
disableAllHeaderCode = 1
xhtml_cleaning = none
admPanel = 0
metaCharset = utf-8
additionalHeaders = Content-Type:text/xml;charset=utf-8
}
10 = COA
10 {
wrap = <?xml version="1.0" encoding="UTF-8" standalone="yes" ?><your_root_tag>|</your_root_tag>
# add code here to generate xml content
10 = ...
}
}
If you browse to http://example.com/index.php?type=123 you'll get the XML content.
But if things get more complex, writing an extension may be the better approach.

Problem while configuring the file Delimeter("\t") in app.config(C#3.0)

In my app.config file I made the following setting:
<add key = "Delimeter" value ="\t"/>
Now I read it in the program using the code below:
string delimiter = ConfigurationManager.AppSettings["Delimeter"];
StreamWriter streamWriter = null;
streamWriter = new StreamWriter(fs);
streamWriter.BaseStream.Seek(0, SeekOrigin.End);
Enumerable
.Range(0, outData.Length)
.ToList().ForEach(i => streamWriter.Write(outData[i].ToString() + delimiter));
streamWriter.WriteLine();
streamWriter.Flush();
I am getting this output:
18804\t20100326\t5.59975381254617\t
18804\t20100326\t1.82599797249479\t
But if I assign "\t" directly to the delimiter variable, I get the correct output:
18804 20100326 5.59975381254617
18804 20100326 1.82599797249479
I found that when I specify "\t" in the config file and read it into the delimiter variable, it becomes "\\t", which is the problem.
I even tried with but with no luck.
I am using C#3.0.
Need help

You need to use the XML character reference that represents a tab, which I believe is &#9;, rather than the C# representation (which is "\t", as you already know).
<add key="Delimeter" value="&#9;"/>
Or you could always just take the easy way out:
// allow for <add key="Delimeter" value="\t"/>
if (delimiter == @"\t")
delimiter = "\t";
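To see why "\t" arrives as two characters: XML itself has no backslash escapes, and only character references such as &#9; are decoded by the parser. The .NET configuration system also reads app.config as XML. The small Python sketch below uses the standard ElementTree parser as a stand-in for that reader. The helper `read_value` is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_value(config_xml: str) -> str:
    """Return the value attribute of an <add> element, as an XML parser sees it."""
    return ET.fromstring(config_xml).get("value")


# A backslash is not special in XML: this value is two characters, "\" and "t".
backslash_t = read_value(r'<add key="Delimeter" value="\t"/>')

# A character reference is resolved by the parser into a real tab character.
char_ref = read_value('<add key="Delimeter" value="&#9;"/>')

print(repr(backslash_t), repr(char_ref))
```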

Parsing an XML string containing "&#x20;" (which must be preserved)

I have code that is passed a string containing XML. This XML may contain one or more instances of &#x20; (a character reference for the space character). I have a requirement that these references should not be resolved (i.e. they should not be replaced with an actual space character).
Is there any way for me to achieve this?
Basically, given a string containing the XML:
<pattern value="[A-Z0-9&#x20;]" />
I do not want it to be converted to:
<pattern value="[A-Z0-9 ]" />
(What I am actually trying to achieve is simply to take an XML string and write it to a "pretty-printed" file. This has the side effect of resolving occurrences of &#x20; in the string to a single space character, and these need to be preserved. The reason for this requirement is that the written XML document must conform to an externally defined specification.)
I have tried creating a sub-class of XmlTextReader to read from the XML string and overriding the ResolveEntity() method, but this isn't called. I have also tried assigning a custom XmlResolver.
I have also tried, as suggested, to "double encode". Unfortunately, this has not had the desired effect, as the &amp; is not decoded by the parser. Here is the code I used:
string schemaText = @"...<pattern value=""[A-Z0-9&amp;#x20;]"" />...";
XmlWriterSettings writerSettings = new XmlWriterSettings();
writerSettings.Indent = true;
writerSettings.NewLineChars = Environment.NewLine;
writerSettings.Encoding = Encoding.Unicode;
writerSettings.CloseOutput = true;
writerSettings.OmitXmlDeclaration = false;
writerSettings.IndentChars = "\t";
StringBuilder writtenSchema = new StringBuilder();
using ( StringReader sr = new StringReader( schemaText ) )
using ( XmlReader reader = XmlReader.Create( sr ) )
using ( TextWriter tr = new StringWriter( writtenSchema ) )
using ( XmlWriter writer = XmlWriter.Create( tr, writerSettings ) )
{
XPathDocument doc = new XPathDocument( reader );
XPathNavigator nav = doc.CreateNavigator();
nav.WriteSubtree( writer );
}
The written XML ends up with:
<pattern value="[A-Z0-9&amp;#x20;]" />

If you want it to be preserved, you need to double-encode it: &amp;#x20;. The XML reader will translate entity and character references; that's more or less how XML works.

<pattern value="[A-Z0-9&amp;#x20;]" />
What I did above is replace "&" with "&amp;", thereby escaping the ampersand.
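The answer's point can be checked with any conforming parser. The small Python sketch below uses the standard library's ElementTree as a stand-in for XmlReader/XPathDocument. It shows that &#x20; becomes an ordinary space once parsed. It also shows that a double-encoded &amp;#x20; survives parsing as literal text, but is escaped again on output.

```python
import xml.etree.ElementTree as ET

single = ET.fromstring('<pattern value="[A-Z0-9&#x20;]" />')
double = ET.fromstring('<pattern value="[A-Z0-9&amp;#x20;]" />')

# The parser resolves the character reference to a plain space, so the
# serializer has no way of knowing it was once written as &#x20;.
single_out = ET.tostring(single, encoding="unicode")

# Double-encoding survives parsing as the literal text "&#x20;" ...
double_value = double.get("value")
# ... but the serializer must escape that "&" again on output.
double_out = ET.tostring(double, encoding="unicode")

print(single_out)    # <pattern value="[A-Z0-9 ]" />
print(double_value)  # [A-Z0-9&#x20;]
print(double_out)    # <pattern value="[A-Z0-9&amp;#x20;]" />
```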

