iTextSharp: Mistake in Loading XMP into a PDF with C#

I am using iTextSharp to load XMP metadata into a PDF file.
Reference: Is it possible to load XMP file in PDF using iTextSharp?
Following the instructions above, I loaded the XMP data into the PDF file, but there is one problem.
In the Keywords field, a semicolon followed by a single space ("; ") is added as a prefix by default, as shown in the screenshot below.
PDF Properties Window:
XMP Sample I used to load:
I looked through the source code to sort out this problem but couldn't, and I am still searching. I wanted to let the iTextSharp author know about it first, which is why I am posting this question.
Note:
If I set the Keywords entry through the Info dictionary instead:
Dictionary<String, String> info = reader.Info;
info.Add("Keywords", ",key1; key2");
it works fine.

The problem is probably caused by the XMP file you are adding. Adobe Reader is adding extra stuff to the keywords you define in the <dc:subject> based on what is present or missing in the pdf:Keywords attribute.
Please take a look at this example: xmp_metadata_added.pdf
This is what the XMP file looks like:
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
xmlns:xmp="http://ns.adobe.com/xap/1.0/"
dc:format="application/pdf"
pdf:Keywords="Metadata, iText, PDF"
pdf:Producer="iText® 5.5.1 ©2000-2014 iText Group NV (AGPL-version); modified using iText® 5.5.1 ©2000-2014 iText Group NV (AGPL-version)"
xmp:CreateDate="2014-05-16T17:04:59+01:00"
xmp:CreatorTool="My program using iText"
xmp:ModifyDate="2014-05-16T17:04:59+01:00"
xmp:MetadataDate="2014-05-16T17:04:59+01:00">
<dc:description>
<rdf:Alt>
<rdf:li xml:lang="x-default">This example shows how to add metadata</rdf:li>
</rdf:Alt>
</dc:description>
<dc:creator>
<rdf:Seq>
<rdf:li>Bruno Lowagie</rdf:li>
</rdf:Seq>
</dc:creator>
<dc:subject>
<rdf:Bag>
<rdf:li>Metadata</rdf:li>
<rdf:li>iText</rdf:li>
<rdf:li>PDF</rdf:li>
</rdf:Bag>
</dc:subject>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">Hello World example</rdf:li>
</rdf:Alt>
</dc:title>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
You recognize:
<dc:subject>
<rdf:Bag>
<rdf:li>Metadata</rdf:li>
<rdf:li>iText</rdf:li>
<rdf:li>PDF</rdf:li>
</rdf:Bag>
</dc:subject>
But do you see:
pdf:Keywords="Metadata, iText, PDF"
You need that part too.
This is a screen shot with that part:
When I remove pdf:Keywords="Metadata, iText, PDF", I can reproduce your problem:
This proves that your problem is caused by your XMP file, not by iText.
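Since the fix is to make sure pdf:Keywords is present next to dc:subject, it can help to check an XMP packet for this before loading it with iTextSharp. Below is a minimal Java sketch using only the standard-library DOM. The class and method names are hypothetical, and the ", " separator in `keywordsFromSubjects` simply mirrors the example XMP above rather than any rule from the XMP specification.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class XmpKeywordsCheck {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String DC = "http://purl.org/dc/elements/1.1/";
    static final String PDF = "http://ns.adobe.com/pdf/1.3/";

    /** Parses an XMP packet; namespace awareness is required for the *NS lookups below. */
    static Document parse(String xmp) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XMP", e);
        }
    }

    /** Collects the rdf:li values inside dc:subject, in document order. */
    static List<String> subjectItems(Document doc) {
        List<String> items = new ArrayList<>();
        NodeList subjects = doc.getElementsByTagNameNS(DC, "subject");
        for (int i = 0; i < subjects.getLength(); i++) {
            NodeList lis = ((Element) subjects.item(i)).getElementsByTagNameNS(RDF, "li");
            for (int j = 0; j < lis.getLength(); j++) {
                items.add(lis.item(j).getTextContent().trim());
            }
        }
        return items;
    }

    /** Returns pdf:Keywords (attribute on rdf:Description, or element form), or null if absent. */
    static String pdfKeywords(Document doc) {
        NodeList descriptions = doc.getElementsByTagNameNS(RDF, "Description");
        for (int i = 0; i < descriptions.getLength(); i++) {
            Element d = (Element) descriptions.item(i);
            if (d.hasAttributeNS(PDF, "Keywords")) {
                return d.getAttributeNS(PDF, "Keywords");
            }
        }
        NodeList elements = doc.getElementsByTagNameNS(PDF, "Keywords");
        return elements.getLength() > 0 ? elements.item(0).getTextContent() : null;
    }

    /** Builds a pdf:Keywords value from the dc:subject items (separator taken from the example). */
    static String keywordsFromSubjects(List<String> items) {
        return String.join(", ", items);
    }
}
```

If `pdfKeywords` returns null, add a pdf:Keywords attribute (for example, the value of `keywordsFromSubjects`) to the rdf:Description before loading the XMP.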

Related

keep/copy XMP with libexif

I am trying to add a thumbnail to a JPEG picture using libexif.
For now I'm borrowing the code from exif (the command line tool that is shipped by the libexif team).
However, I noticed that the XMP tags get deleted from the metadata. There is an old bug report about this here.
I tried to see how to achieve this anyway with libexif but I don't really understand how to get the XMP from input file and put it in the output file. I just want to copy all XMP data, I don't need to extract anything of it.
I saw there is a TAG EXIF_TAG_XML_PACKET in exif_tag.h but couldn't figure out how to read/write this tag.
A related solution is in this SO answer, but it looks complicated. I'm not familiar with coding in C.
Is it actually possible to keep all XMP when using only libexif API? Have things changed in recent years on that? How would you write this in code?
Thanks
I believe it should be somewhat straightforward. XMP fields are described in the ISO/Adobe standard. Regular Kotlin/Java/Android file I/O and some string manipulation should be all that is required.
I would start out by becoming intimately familiar with ISO 16684-1:2019. Then, write a method for your JPEG file class that grabs all the XMP fields. Store those fields in a temp file (to prevent hard-to-recover data loss in the event of your code or libexif crashing). Hand the file off to libexif and generate the thumbnail. Finally, when that's done, restore the XMP fields. If the thumbnail is stored in an XMP field as well (and it sounds like it is), it may be easier to concatenate that field with the other ones which were already grabbed, updating the temp file so that it contains EVERY XMP field, before adding all of the XMP fields back to the JPEG.
Unfortunately, I do not currently have the time to read a 50 page ISO standard, synthesize the information, and then write the code to implement the solution. Here's a link to the standard at least, to get you started.
https://www.iso.org/obp/ui/#iso:std:iso:16684:-1:ed-2:v1:en
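The grab-then-restore approach described above could look roughly like this if you work on the raw JPEG bytes instead of going through libexif (no libexif calls are used here). This is a Java sketch under stated assumptions: the XMP packet sits in a single APP1 segment whose payload begins with the standard identifier `http://ns.adobe.com/xap/1.0/` followed by a NUL byte, and Extended XMP split over several segments is not handled. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class JpegXmpCopy {
    // Identifier that starts the payload of an XMP APP1 segment in JPEG files.
    private static final byte[] XMP_ID =
            "http://ns.adobe.com/xap/1.0/\0".getBytes(StandardCharsets.US_ASCII);

    /** Reads the 16-bit big-endian segment length (it counts the two length bytes themselves). */
    private static int segmentLength(byte[] jpeg, int pos) {
        return ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
    }

    private static void requireSoi(byte[] jpeg) {
        if (jpeg.length < 2 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("Not a JPEG (missing SOI marker)");
        }
    }

    /** Returns the whole XMP APP1 segment (marker, length, payload), or null if there is none. */
    static byte[] extractXmpSegment(byte[] jpeg) {
        requireSoi(jpeg);
        int pos = 2; // skip SOI
        while (pos + 4 <= jpeg.length && (jpeg[pos] & 0xFF) == 0xFF) {
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xDA || marker == 0xD9) {
                break; // start of scan or end of image: no more metadata segments
            }
            int end = pos + 2 + segmentLength(jpeg, pos);
            if (end > jpeg.length) {
                break; // truncated segment
            }
            boolean isXmp = marker == 0xE1
                    && end - (pos + 4) >= XMP_ID.length
                    && Arrays.equals(Arrays.copyOfRange(jpeg, pos + 4, pos + 4 + XMP_ID.length), XMP_ID);
            if (isXmp) {
                return Arrays.copyOfRange(jpeg, pos, end);
            }
            pos = end;
        }
        return null;
    }

    /** Inserts a segment after the leading APPn segments, so APP0 (JFIF) / APP1 (Exif) stay first. */
    static byte[] insertAfterAppSegments(byte[] jpeg, byte[] segment) {
        requireSoi(jpeg);
        int pos = 2;
        while (pos + 4 <= jpeg.length && (jpeg[pos] & 0xFF) == 0xFF) {
            int marker = jpeg[pos + 1] & 0xFF;
            int end = pos + 2 + segmentLength(jpeg, pos);
            if (marker < 0xE0 || marker > 0xEF || end > jpeg.length) {
                break;
            }
            pos = end;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(jpeg.length + segment.length);
        out.write(jpeg, 0, pos);
        out.write(segment, 0, segment.length);
        out.write(jpeg, pos, jpeg.length - pos);
        return out.toByteArray();
    }
}
```

Usage idea: read the original file with `Files.readAllBytes`, keep the result of `extractXmpSegment`, let libexif write its output, then pass the output bytes and the saved segment to `insertAfterAppSegments`. Copying the segment verbatim avoids having to parse individual XMP fields at all.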

Why does DITA Open Toolkit PDF plugin rename image href attributes?

I'm sorry if this doesn't have enough information. I don't typically ask for help online like this.
I'm using DITA Open Toolkit 3.4 on Windows. I generated a plugin called "vcr2" using Jarno's (very excellent and helpful) PDF Plugin Generator and then made a handful of customizations. The plugin uses the pdf2 plugin as a base. When I try to use the vcr2 plugin, my images are not working. I've tracked the problem down to malformed image filenames in the image's href attribute.
For example:
In my source file (a DITA Task), the markup for one of my images looks like this:
<image href="MyRemindersChooseReminder.png"/>
If I run a transform with the pdf2 plugin, the images work fine. In the merged stage1.xml file in the Temp folder, the XML for that same image looks like this:
<image class="- topic/image " href="df2d132af27436c59c5c8c4282e112d62bec8201.png" placement="inline" xtrc="image:1;10:66" xtrf="file:/V:/Vasont/Extract/t12340879-minimal/t12340879.xml"/>
It is processed into a file Topic.fo, and looks like this:
<fo:external-graphic
 src="url('file:/V:/Vasont/Extract/t12340879-minimal/MyRemindersChooseReminder.png')"/>
Everything works fine and the image looks fine.
If I run the same file through my 'vcr2' plugin, which just calls the same pdf2 plugin with some overrides, all the images get broken:
stage1.xml
<image class="- topic/image " href="df2d132af27436c59c5c8c4282e112d62bec8201.png" placement="inline" xtrc="image:1;10:66" xtrf="file:/V:/Vasont/Extract/t12340879-minimal/t12340879.xml"/>
Topic.fo
<fo:external-graphic
 src="url('file:/V:/Vasont/Extract/t12340879-minimal/df2d132af27436c59c5c8c4282e112d62bec8201.png')"
/>
As I track this down further, it appears that somewhere in the map-reader Ant task, this filename gets changed to that cryptic string of pseudo-hexadecimal. I think later on it's supposed to be changed back or resolved to a complete URI or something.
So, the two-part question is: Why does Open Toolkit change my filenames, and what's supposed to change them back?
DITA-OT's preprocess uses hashes for temporary filenames because it allows the code to not deal with directory structures. This enables preprocess to work in so-called "map-first" mode, where it first processes all DITA map resources and only then starts to process DITA topic and image resources.
The preprocess has a step called clean-preprocess that can rewrite the temporary file names to match source resource files names. However, this rewrite operation is disabled for PDF output because the original file names are not used for anything in that output type.
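To illustrate why hashed temporary names let preprocess ignore directory structures, here is a hypothetical Java sketch: each source URI is hashed into a flat name that keeps the file extension, and a lookup table records the mapping so a later step could rewrite names back. This is not DITA-OT's implementation. Using SHA-1 over the full source URI is an assumption; the 40-hex-digit names in stage1.xml are consistent with a SHA-1 digest, but that is only an inference.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

final class HashedTempNames {
    private final Map<String, String> tempToSource = new HashMap<>();

    /** Flat temp name: hash of the whole source URI plus the original extension. */
    String tempNameFor(String sourceUri) {
        int slash = sourceUri.lastIndexOf('/');
        int dot = sourceUri.lastIndexOf('.');
        String ext = dot > slash ? sourceUri.substring(dot) : "";
        String name = sha1Hex(sourceUri) + ext;
        tempToSource.put(name, sourceUri); // remembered so names can be mapped back later
        return name;
    }

    /** Reverse lookup, the kind of information a rewrite step would need. */
    String sourceFor(String tempName) {
        return tempToSource.get(tempName);
    }

    static String sha1Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }
}
```

Two images with the same file name in different folders get different flat names, so no directories are needed in the temp folder; the cost is that something must map the names back whenever the output refers to the original files.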

Can the Open XML SDK be used to create XML files?

Is it possible to use the Open XML SDK to generate an XML file that contains some metadata of a particular .docx file?
Details: I have a .docx file from which I want to extract some metadata (using Open XML), output it as an XML file, and later use jQuery to present it in a more readable form.
You can use the SDK to extract info from the various properties parts that may be present in the docx (for example, the core properties part, which includes Dublin Core-type info).
You can extract it in its native XML form:
<cp:coreProperties
xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
xmlns:dc="http://purl.org/dc/elements/1.1/" .. >
<dc:creator>Joe</dc:creator>
<cp:lastModifiedBy>Joe</cp:lastModifiedBy>
<cp:revision>1</cp:revision>
<dcterms:created xsi:type="dcterms:W3CDTF">2010-11-10T00:32:00Z</dcterms:created>
<dcterms:modified xsi:type="dcterms:W3CDTF">2010-11-10T00:33:00Z</dcterms:modified>
</cp:coreProperties>
or, in some other XML dialect of your own choosing.
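Because a .docx is a ZIP package, the core properties can also be read without the SDK, which makes the structure easy to see. The following Java sketch uses only the standard library; it assumes the part is at the conventional path `docProps/core.xml` (strictly, it should be located through the package relationship in `_rels/.rels`), and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class DocxCoreProperties {
    // Conventional location of the core properties part (assumption: not resolved via _rels/.rels).
    static final String CORE_PART = "docProps/core.xml";

    /** Returns the raw XML of the core properties part, or null if it is not at the usual path. */
    static String readCoreXml(byte[] docx) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docx))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                if (CORE_PART.equals(entry.getName())) {
                    return new String(zip.readAllBytes(), StandardCharsets.UTF_8); // reads this entry only
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps each child element of cp:coreProperties (qualified name, e.g. "dc:creator") to its text. */
    static Map<String, String> values(String coreXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(coreXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Map<String, String> out = new LinkedHashMap<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    out.put(n.getNodeName(), n.getTextContent().trim());
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed core properties XML", e);
        }
    }
}
```

`readCoreXml` gives you the native XML form; `values` is a starting point if you prefer to emit your own XML dialect for the jQuery side.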
I know this question was posted a long time ago, but the first result of a Google search sent me here. So for anyone else looking for a solution, there is a snippet on the MSDN website: https://msdn.microsoft.com/en-us/library/office/cc489219.aspx
The short answer is to use XmlTextWriter; as far as I know, this applies to Office 2013:
using System.IO;
using System.Xml;
using DocumentFormat.OpenXml.Packaging;

// Assumes wordDoc is a WordprocessingDocument that has been created or opened for editing.
// Add a CoreFilePropertiesPart to the new word processing document.
var coreFilePropPart = wordDoc.AddCoreFilePropertiesPart();
using (XmlTextWriter writer = new XmlTextWriter(coreFilePropPart.GetStream(FileMode.Create), System.Text.Encoding.UTF8))
{
    writer.WriteRaw("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<cp:coreProperties xmlns:cp=\"http://schemas.openxmlformats.org/package/2006/metadata/core-properties\"></cp:coreProperties>");
    writer.Flush();
}

Regarding external-document in FOP

I am creating a PDF file using XML, XSL and FOP. I want the PDF to display the contents of an external file, such as a Word document.
I know which element is used to display an image in a PDF, but what tag should I use to display the contents of file types other than PDF?
There's a FOP extension that claims to be able to do this:
jeremias-maerki.ch/development/fop/index.html
Also see xmlgraphics.apache.org/fop/1.0/extensions.html#external-document
When I used it this way:
xmlns:fox="http://xmlgraphics.apache.org/fop/extensions"
content-type="pdf" src="C:\temp\reports\p2.pdf"/>
I am getting this exception:
org.apache.fop.apps.FOPException: Error(Unknown location): No element mapping definition found for fox:external-document
Can you tell me the reason?
Thanks in advance.
I'd say you're probably using an old Apache FOP version that doesn't have the fox:external-document extension yet. Please upgrade to FOP 1.0 (or at least 0.95).
Change the namespace from:
http://xml.apache.org/fop/extensions
to
http://xmlgraphics.apache.org/fop/extensions
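Both answers come down to FOP not recognizing the element: either the FOP version predates the extension, or the fox prefix is bound to the old namespace. The Java sketch below (standard-library DOM only; the class name is hypothetical) scans a generated FO document for external-document elements bound to any namespace other than `http://xmlgraphics.apache.org/fop/extensions`, so the namespace cause can be ruled out before upgrading. It cannot detect the FOP version.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class FoxNamespaceCheck {
    static final String FOX_NS = "http://xmlgraphics.apache.org/fop/extensions";

    /** Returns the namespace URI of every external-document element that is not bound to FOX_NS. */
    static List<String> wrongNamespaces(String fo) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(fo.getBytes(StandardCharsets.UTF_8)));
            List<String> wrong = new ArrayList<>();
            // "*" matches the local name in any namespace.
            NodeList hits = doc.getElementsByTagNameNS("*", "external-document");
            for (int i = 0; i < hits.getLength(); i++) {
                String ns = hits.item(i).getNamespaceURI();
                if (!FOX_NS.equals(ns)) {
                    wrong.add(String.valueOf(ns));
                }
            }
            return wrong;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XSL-FO", e);
        }
    }
}
```

Run it on the FO produced by your XSLT, not on the stylesheet itself, since the binding that matters is the one FOP sees.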

Is there an equivalent of Response.AppendHeader in a Windows application?

I came across this technique for converting a DataTable to Excel:
http://www26.brinkster.com/mvark/dyna/downloadasexcel.html
Is there an equivalent of Response.AppendHeader for a Windows application in C#?
Regards
Hema
The trick in the code sample that you have mentioned to dynamically generate an Excel file is based on the fact that documents can be converted from Word/Excel to HTML (File->Save As) and vice versa. Essentially, an HTML page containing Office XML is created, and in a web application a file download is triggered with the help of the following Response.AppendHeader statements:
Response.AppendHeader("Content-Type", "application/vnd.ms-excel");
Response.AppendHeader("Content-disposition", "attachment; filename=my.xls");
If you want to use this technique in a Winforms application, just save the string content as a text file and give the file an ".xls" extension. Replace the last 3 lines of the sample's Page_Load method with this line:
System.IO.File.WriteAllText(@"C:\Report.xls", strBody);
HTH
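In a desktop application the trick reduces to "build the HTML table string, then write it to disk with an .xls extension"; no headers are involved because there is no HTTP download. A Java sketch of that idea follows; the class and method names are hypothetical and not taken from the linked sample. Newer Excel versions may warn that the file's contents don't match its extension, because the file is really HTML.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class HtmlAsXls {
    /** Escapes the characters that matter inside HTML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders rows (first row = header) as a minimal HTML table that Excel can open. */
    static String toHtmlTable(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=utf-8\"></head><body><table border=\"1\">");
        for (int r = 0; r < rows.size(); r++) {
            String cell = r == 0 ? "th" : "td";
            sb.append("<tr>");
            for (String value : rows.get(r)) {
                sb.append('<').append(cell).append('>').append(escape(value))
                  .append("</").append(cell).append('>');
            }
            sb.append("</tr>");
        }
        return sb.append("</table></body></html>").toString();
    }

    /** Desktop counterpart of the download: save the HTML under an .xls file name. */
    static void save(Path target, List<List<String>> rows) throws IOException {
        Files.write(target, toHtmlTable(rows).getBytes(StandardCharsets.UTF_8));
    }
}
```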