Building DOM with xerces and Java - how to prevent escaping of ampersand - xml-serialization

I am using Xerces in Java to build a DOM. For one of the fields that becomes a text node in the DOM, the data is delivered from a source that has already turned any non-ASCII and/or XML special characters into their entity names or numbers, e.g. "Banana&#174;".
I know the design of the system is wrong, in that the data source shouldn't be doing this, but that is out of my control. What I am wondering is whether there is a way to prevent this from being escaped and turned into "Banana&amp;#174;" without decoding it first. (I know the serializer will implicitly escape any characters it needs to, so I could pass in the raw character after decoding.)
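Example code:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
// Build the DOM; the text node receives the already-escaped value.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document dom = db.newDocument();
Element root = dom.createElement("Companies");
dom.appendChild(root);
Element company = dom.createElement("Company");
Text t = dom.createTextNode("Banana&#174;");
company.appendChild(t);
root.appendChild(company);
// Serialize with DOM Level 3 Load and Save.
DOMImplementationRegistry dir = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl =
    (DOMImplementationLS) dir.getDOMImplementation("LS");
LSSerializer writer = impl.createLSSerializer();
LSOutput output = impl.createLSOutput();
output.setByteStream(System.out);
writer.write(dom, output);
Example Output:
<?xml version="1.0" encoding="UTF-8"?>
<Companies><Company>Banana&amp;#174;</Company></Companies>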

If you could somehow declare it in a CDATA section, it should be passed through as is.
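A minimal sketch of the CDATA idea, assuming the JDK's built-in DOM and Load/Save implementation. The class name CdataExample and the method serialize are hypothetical names used only for illustration. It replaces createTextNode with createCDATASection, so the serializer writes the characters verbatim instead of escaping the ampersand.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical helper class, for illustration only.
class CdataExample {
    // Builds <Companies><Company>...</Company></Companies> with the company
    // name held in a CDATA section rather than a plain text node.
    static String serialize(String preEscapedName) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = dom.createElement("Companies");
        dom.appendChild(root);
        Element company = dom.createElement("Company");
        // CDATA content is written as-is, so "&" is not turned into "&amp;".
        company.appendChild(dom.createCDATASection(preEscapedName));
        root.appendChild(company);

        DOMImplementationLS impl = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        LSSerializer writer = impl.createLSSerializer();
        return writer.writeToString(dom);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize("Banana&#174;"));
    }
}
```

The Company element should then contain <![CDATA[Banana&#174;]]>. Note that a consumer parsing this output sees the literal characters "&#174;" rather than the ® character, because references are not expanded inside CDATA. If the consumer needs the real character, decoding before building the DOM is still required.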

Related

NSDocument XML read Issue

I am working on an NSDocument-based Mac app that imports .xml files. It works fine for some XML files, but a few have issues.
The issue is that read() modifies the data when we import a file; I need to keep the original data as it is.
What do I need to do to make sure I get the original XML data in read()?
I am using the function below to read the file:
override func read(from data: Data, ofType typeName: String) throws {
    // XMLDocument(data:options:) throws on failure, so the error propagates to the caller directly.
    let xmlDocument1 = try XMLDocument(data: data, options: [.nodePreserveWhitespace])
    // xmlDocument1 is parsed from here on.
}
I then parse xmlDocument1 to read all the XML information.
Issue: done this way, Swift modifies the document, as shown below.
Example 1:
Original:
<iws:attr-option name="1 - Poor" />
<iws:attr-option name="2 - Needs Improvement" />
Data I get from read() (note the closing tags added automatically):
<iws:attr-option name="1 - Poor"></iws:attr-option>
<iws:attr-option name="2 - Needs Improvement"></iws:attr-option>
Example 2:
Original:
<source>
<ph id="12" x="&lt;/span>">{12}</ph>
</source>
Data I get from read() (note that the ">" symbol is replaced with "&gt;"):
<source>
<ph id="12" x="&lt;/span&gt;">{12}</ph>
</source>
Example 3:
I am not able to paste the code here because the special character does not even display, so I am adding an image.
The left side is the original and the right side is what I get in read(); the special character is missing.
Code sample (I am not sure whether we can post code directly here):
https://drive.google.com/drive/folders/1WWGE7fFJPKvs5KU5f_PlwWtoqCVxTcS0?usp=sharing
The drive folder above contains a sample XML file and the code.
"DevelopingADocumentBasedApp" is the code; just open "DocumentBasedApp.xcodeproj" and run it.
Once it runs, click Menu->File->Open and open the provided XML file.
In content.swift, set a breakpoint at "print(xmlDocument!)".
Here we can see that the document has been modified by NSDocument and differs from the original.
Edit:
@matt Thank you for helping me understand the real problem. Initially I thought I had an issue with NSDocument's read(), but the issue is that XMLDocument() does not return the exact data. I need to find a solution for that.
Reading is not changing your document.
You make an xml document, with XMLDocument(data:...). You are asking for a new valid XML document based on your original, and that is exactly what you get. The resulting structure is not a big string, like your original data; it is an elaborate node tree reflecting the structure of your XML. That node tree is identical to the structure described by your original. That fact does not affect in any way your ability to parse the document; indeed, it is why you are able to parse the document. If you think it does cause an inability to parse the document, your parsing code is wrong (but you didn't show that, so no more can be said).
Also note that your evidence for what is "in" the XML document is indirect; the XML document is a node tree, but the strings you display are the output of a secondary rendering into a string. That rendering representation is arbitrary and malleable; it obeys its own rules of formatting. (And again, you didn't show anything about how you obtain that rendering. Perhaps we are talking about your print statement?)
The point is, you seem to have some sort of expectation that passing into an XMLDocument and then back out of it will "round trip" your original string in such a way that the output looks just like the original. That expectation is incorrect. That's not what XMLDocument does.
And merely reading the original data into an XMLDocument did not change the data, I can promise you that.
So don't worry, be happy; as far as the validity of your XML is concerned, everything is fine, and the data you started with has not been altered in any way.
Here's a demonstration:
let xmlstring = """
<testing>
<fun whatever="thingy" />
</testing>
"""
print(xmlstring)
let xmldata = xmlstring.data(using: .utf8)!
let xml = try? XMLDocument(data: xmldata, options: [])
print("=======")
print(xml!)
The output is:
<testing>
<fun whatever="thingy" />
</testing>
=======
<?xml version="1.0"?><testing><fun whatever="thingy"></fun></testing>
As you can see, the output from the print is not the same as the input string. But it is a valid XML representation of the original string, and that's all that matters. And the original xmlstring and xmldata that I started with are, I assure you, completely untouched.
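The same point can be checked programmatically in other DOM implementations. As a rough sketch using Java's built-in DOM API (not Foundation's XMLDocument), two strings that differ only in how empty elements and attribute characters are spelled parse to equal node trees. The class name SameTreeDifferentText and its methods are hypothetical names for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, for illustration only.
class SameTreeDifferentText {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // True when both strings describe the same element tree, regardless of
    // how empty elements or attribute characters are written in the text.
    static boolean sameTree(String a, String b) throws Exception {
        return parse(a).getDocumentElement()
                .isEqualNode(parse(b).getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        // Self-closing tag versus explicit open/close pair.
        System.out.println(sameTree(
                "<testing><fun whatever=\"thingy\" /></testing>",
                "<testing><fun whatever=\"thingy\"></fun></testing>"));
        // ">" written literally versus written as "&gt;" in an attribute.
        System.out.println(sameTree(
                "<ph id=\"12\" x=\"&lt;/span>\">{12}</ph>",
                "<ph id=\"12\" x=\"&lt;/span&gt;\">{12}</ph>"));
    }
}
```

Both comparisons should report equal trees, which is the sense in which the re-rendered text is "the same document" even though the characters differ.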

Combine two PDF-a documents using ITextSharp

I'm hoping that someone can see the flaw in my code to merge two PDF/A documents using iTextSharp. Currently it complains about missing metadata, which PDF/A requires.
Document document = new Document();
MemoryStream ms = new MemoryStream();
using (PdfACopy pdfaCopy = new PdfACopy(document, ms, PdfAConformanceLevel.PDF_A_1A))
{
    document.Open();
    using (PdfReader reader = new PdfReader("Doc1.pdf"))
    {
        pdfaCopy.AddDocument(reader);
    }
    using (PdfReader reader = new PdfReader("doc2.pdf"))
    {
        pdfaCopy.AddDocument(reader);
    }
}
The exact error received is
Unhandled Exception: iTextSharp.text.pdf.PdfAConformanceException: The document catalog dictionary of a PDF/A conforming file shall contain the Metadata key
I was hoping that the 'document catalog dictionary' would be copied as well, but I guess the 'new Document()' creates an empty non-conforming document or something.
Thanks! Hope you can help
Wouter
You need to add this line (copy is your PdfACopy instance, pdfaCopy in your code):
copy.CreateXmpMetadata();
This will create some default XMP metadata. Of course: if you want to create your own XMP file containing info about the documents you're about to merge, you can also use:
copy.XmpMetadata = myMetaData;
where myMetaData is a byte array containing a correct XMP stream.
I hope you understand that iText can't automatically create the correct metadata. Providing metadata is something that needs human attention.

Sending email with attachment using scala and Liftweb

This is the first time I am integrating an email service with Lift.
I want to send email with attachments (such as documents, images, and PDFs).
My code looks like this:
case class CSVFile(bytes: Array[Byte],
                   filename: String = "file.csv",
                   mime: String = "text/csv; charset=utf8; header=present")
val attach = CSVFile(fileupload.mkString.getBytes("utf8"))
val body = <p>Please research the enclosed.</p>
val msg = XHTMLPlusImages(body,
  PlusImageHolder(attach.filename, attach.mime, attach.bytes))
Mailer.sendMail(
  From("vyz@gmail.com"),
  Subject(subject(0)),
  To(to(0)),
  msg)
This code is taken from the Lift Cookbook, but it does not do what I need.
It works, but only the attachment's file name arrives (file.csv), with no data in it (I uploaded the file gsy.docx).
Best Regards
GSY
You don't specify what type fileupload is, but assuming it is of type net.liftweb.http.FileParamHolder, the issue is that you can't just call mkString and expect it to have any data, since there is no data in the object, just a fileStream method for retrieving it (either from disk or memory).
The easiest way to accomplish what you want would be to use a ByteArrayOutputStream and copy the data to it. I haven't tested it, but the code below should solve your issue. For brevity, it uses Apache Commons IO to copy the streams, but you could just as easily do it natively.
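import java.io.ByteArrayOutputStream
import org.apache.commons.io.IOUtils
// Copy the uploaded file's stream into memory and take the resulting bytes.
val data = {
  val os = new ByteArrayOutputStream()
  IOUtils.copy(fileupload.fileStream, os)
  os.toByteArray
}
val attach = CSVFile(data)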
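For the "natively" route, a minimal sketch without Commons IO might look like the following. It is written in Java, but it uses only java.io classes that are equally callable from Scala. StreamBytes and toByteArray are hypothetical names, and the sketch assumes fileStream returns a java.io.InputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class, for illustration only.
class StreamBytes {
    // Reads the whole stream into memory using only java.io.
    static byte[] toByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // read() returns -1 once the end of the stream is reached.
        while ((n = in.read(buffer)) != -1) {
            os.write(buffer, 0, n);
        }
        return os.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = toByteArray(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        System.out.println(data.length); // prints 3
    }
}
```

From the Scala side this would be used as StreamBytes.toByteArray(fileupload.fileStream), with the result passed to CSVFile.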
BTW, you say you are uploading a Word (DOCX) file and expecting it to automatically be CSV when the extension is changed? You will just get a DOCX file with a csv extension unless you actually do some conversion.

Serializing OpenXML Parts Elements Storing in VARBINARY SQL 2005

I am building a solution that allows users to pick and choose sections from a Word template, populate those sections with content from a database, and assemble the new data into a new .docx document.
So far, I have successful methodologies for locating content and transplanting that content into a new document. I am using the OpenXML SDK 2.0 to locate content by Styles and Content Controls. I am able to create IEnumerable objects containing elements such as Paragraphs, SdtBlocks, Run, etc.
I need to find an elegant way to serialize these element blocks so I can store them as whole blocks of type VARBINARY in a SQL 2005 database. Can someone please point me to a viable example for serializing these OpenXML parts/elements?
I am working on Excel at the moment but I think your problem is similar in nature.
From the code below I can extract the XML code of the row and then store it.
private string GetContents(uint rowIndex)
{
    return GetExistingRow(rowIndex).OuterXml;
}
private Row GetExistingRow(uint rowIndex)
{
    return SheetData
        .Elements<Row>()
        .Where(r => r.RowIndex == rowIndex)
        .FirstOrDefault();
}
Please note that the SheetData object is obtained as follows:
this.SheetData = WorksheetPart.Worksheet.GetFirstChild<SheetData>();
I hope this helps.

War file deployment

I wrote a JSP application. If I generate the WAR file with Eclipse on Windows XP (Traditional Chinese) and deploy it to WebLogic,
I get the following errors:
inputAdministrator.jsp:251:11: This type name is ambiguous because it matches more than one '*'-import, including 'java.io.*' and 'admin.iguard.businessObject.*'.
DataInput d = (DataInput) dataInput;
^-------^
inputAdministrator.jsp:252:29: Type java.io.DataInput contains no methods named getDept1.
String dept1 = d.getDept1();
^------^
inputAdministrator.jsp:253:26: No match was found for method trim() in type <error>.
String emp2 = d.getEmp2().trim();
^----------------^
inputAdministrator.jsp:253:28: Type java.io.DataInput contains no methods named getEmp2.
String emp2 = d.getEmp2().trim();
^-----^
inputAdministrator.jsp:254:29: Type java.io.DataInput contains no methods named getDept2.
String dept2 = d.getDept2();
^------^
inputAdministrator.jsp:255:33: Type java.io.DataInput contains no methods named getDept_code.
String dept_code = d.getDept_code();
^----------^
inputAdministrator.jsp:256:32: Type java.io.DataInput contains no methods named getStaff_no.
String staff_no = d.getStaff_no();
^---------^
inputAdministrator.jsp:257:32: Type java.io.DataInput contains no methods named getEmp2_por.
String emp2_por = d.getEmp2_por();
^---------^
If I generate the WAR file on Windows XP (Simplified Chinese) and deploy it to WebLogic, everything is OK.
I don't know how the "text file encoding" setting affects the generated WAR file,
or how I can make sure that all these things are in sync.
Does anyone have a better solution?
Any suggestions will be appreciated.
Thanks in advance!
Did you check whether the text encoding differs between the two J2EE exports of the WAR file?
Window --> Preferences --> General --> Workspace --> Text file encoding
It defaults to Cp1252.
What is the value of the text file encoding setting under Simplified Chinese compared to Traditional Chinese?
Maybe the "text file encoding" setting triggers some kind of recompilation that makes the issue visible.
In any case, could you first try to disambiguate the DataInput usage by:
adding "java.io." in front of DataInput everywhere in that source where it actually is the java.io type (leaving a plain DataInput for the businessObject usages), and
not using import java.io.* (use CTRL+SHIFT+O to reorganize the imports instead)?
Would that solve the problem, whatever the "text file encoding" is?
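A small sketch of that disambiguation, using java.util.Date and java.sql.Date as stand-ins for the two DataInput types, since the admin.iguard.businessObject package is not available here. The class name ImportDisambiguation is hypothetical. With only the two on-demand imports, the simple name Date would be ambiguous in the same way as DataInput in the error above; a single-type import takes precedence over on-demand imports, and the other type is written fully qualified.

```java
import java.sql.*;      // on-demand import: makes java.sql.Date visible
import java.util.*;     // on-demand import: makes java.util.Date visible
import java.util.Date;  // single-type import: plain "Date" now means java.util.Date

// Hypothetical class, for illustration only.
class ImportDisambiguation {
    // The simple name resolves through the single-type import.
    static String simpleNameType() {
        Date d = new Date(0L);
        return d.getClass().getName();
    }

    // The other type is spelled out with its package name.
    static String qualifiedNameType() {
        java.sql.Date d = new java.sql.Date(0L);
        return d.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(simpleNameType());    // prints java.util.Date
        System.out.println(qualifiedNameType()); // prints java.sql.Date
    }
}
```

In the JSP, the equivalent would be an explicit import of admin.iguard.businessObject.DataInput in the page directive, with java.io.DataInput written out in full wherever the java.io type is meant.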