Regarding external-document in FOP - apache-fop

I am creating a PDF file from XML and XSL using FOP. I want the PDF to display the contents of an external file, such as a Word document.
I know that to display an image in a PDF we use <fo:external-graphic>, but which tag should we use to display the contents of other file types?
There's a FOP extension that claims to be able to do this:
jeremias-maerki.ch/development/fop/index.html
Also see xmlgraphics.apache.org/fop/1.0/extensions.html#external-document
When I used it this way:
<fox:external-document xmlns:fox="http://xmlgraphics.apache.org/fop/extensions"
content-type="pdf" src="C:\temp\reports\p2.pdf"/>
I get this exception:
org.apache.fop.apps.FOPException: Error(Unknown location): No element mapping definition found for fox:external-document
Please let me know the reason.
Thanks in advance.

I'd say you're probably using an old Apache FOP version which doesn't have the fox:external-document extension, yet. Please upgrade to FOP 1.0 (or at least 0.95).

Change the namespace from:
http://xml.apache.org/fop/extensions
to
http://xmlgraphics.apache.org/fop/extensions

Related

Why does DITA Open Toolkit PDF plugin rename image href attributes?

I'm sorry if this doesn't have enough information. I don't typically ask for help online like this.
I'm using DITA Open Toolkit 3.4 on Windows. I generated a plugin called "vcr2" using Jarno's (very excellent and helpful) PDF Plugin Generator and then made a handful of customizations. The plugin uses the pdf2 plugin as a base. When I try to use the vcr2 plugin, my images are not working. I've tracked the problem down to malformed image filenames in the image's href attribute.
For example:
In my source file (a DITA Task), the markup for one of my images looks like this:
<image href="MyRemindersChooseReminder.png"/>
If I run a transform with the pdf2 plugin, the images work fine. In the merged stage1.xml file in the Temp folder, the XML for that same image looks like this:
<image class="- topic/image " href="df2d132af27436c59c5c8c4282e112d62bec8201.png" placement="inline" xtrc="image:1;10:66" xtrf="file:/V:/Vasont/Extract/t12340879-minimal/t12340879.xml"/>
It is processed into a file Topic.fo, and looks like this:
<fo:external-graphic
 src="url('file:/V:/Vasont/Extract/t12340879-minimal/MyRemindersChooseReminder.png')"/>
Everything works fine and the image looks fine.
If I run the same file through my 'vcr2' plugin, which just calls the same pdf2 plugin with some overrides, all the images get broken:
stage1.xml
<image class="- topic/image " href="df2d132af27436c59c5c8c4282e112d62bec8201.png" placement="inline" xtrc="image:1;10:66" xtrf="file:/V:/Vasont/Extract/t12340879-minimal/t12340879.xml"/>
Topic.fo
<fo:external-graphic
 src="url('file:/V:/Vasont/Extract/t12340879-minimal/df2d132af27436c59c5c8c4282e112d62bec8201.png')"
/>
As I track this down further, it appears that somewhere in the map-reader Ant task, this filename gets changed to that cryptic string of pseudo-hexadecimal. I think later on it's supposed to be changed back or resolved to a complete URI or something.
So, the two-part question is: Why does Open Toolkit change my filenames, and what's supposed to change them back?
DITA-OT's preprocess uses hashes for temporary filenames because it allows the code to not deal with directory structures. This enables preprocess to work in so-called "map-first" mode, where it first processes all DITA map resources and only then starts to process DITA topic and image resources.
The preprocess has a step called clean-preprocess that can rewrite the temporary file names to match the source resource file names. However, this rewrite operation is disabled for PDF output because the original file names are not used for anything in that output type.
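To illustrate why hashed temporary names remove the need to deal with directory structures, here is a minimal Python sketch. It assumes a SHA-1 digest of the source URI plus the original extension; the exact input that DITA-OT hashes is not stated above, so temp_name, the example paths, and the lookup table are hypothetical and will not reproduce the hash shown in the question.

```python
import hashlib
import posixpath


def temp_name(source_uri):
    """Flat temporary file name: hex digest of the URI plus the original extension."""
    digest = hashlib.sha1(source_uri.encode("utf-8")).hexdigest()
    extension = posixpath.splitext(source_uri)[1]
    return digest + extension


# Two images with the same file name in different (hypothetical) directories
# get distinct flat names, so no directory tree is needed in the temp folder.
sources = [
    "file:/V:/docs/images/MyRemindersChooseReminder.png",
    "file:/V:/docs/other/dir/MyRemindersChooseReminder.png",
]

# A lookup table like this is what a later step would need in order to
# resolve a temporary name back to the original resource.
temp_to_source = {temp_name(uri): uri for uri in sources}
```

In this sketch, resolving a temporary name is a dictionary lookup; if that resolution step is skipped or overridden, the hashed name is what ends up in the output.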

How to download a file (csv.gz) from a url using Python 3.7

As with others who have posted in the past, I cannot figure out how to download a csv.gz file from a URL in Python 3.7. I have seen posts about this, but the approaches they describe only produce a 2 KB file.
I am a complete newbie with Python. What follows is the code for one file that I am trying to obtain, and I can't even get that to work. The final goal is to request all files that start with 2019* using Python. Please try the code below to save the file. As others have stated, the saved file has the right name but not the true content. Ref: Downloading a csv.gz file from url in Python
import requests
url = 'https://public.bitmex.com/?prefix=data/trade/20191026.csv.gz'
r = requests.get(url, allow_redirects=True)
open('20191026.csv.gz', 'wb').write(r.content)
Yields:
Out[40]:
1245
I've also tried wget and urllib.request with urlretrieve.
I wish I could add a screenshot or attach a file. The file created is 2 KB and is not even a csv.gz file, while the real file that I can download from a web browser is 78 MB. The file is 20191026.csv.gz, not that it matters, as they all behave the same way. The location is https://public.bitmex.com/?prefix=data/trade/
Again, a way to obtain all the files matching a filter such as 2019*csv.gz would be fantastic.
You are trying to download the files from https://public.bitmex.com/?prefix=data/trade/.
To achieve your final goal of downloading all the files whose names start with 2019, you need three steps:
1) Read the content of https://public.bitmex.com/?prefix=data/trade/
2) Convert the content into a list, and keep only the file names that start with 2019.
3) Download each file in the resulting list using the example you referred to.
I hope this approach helps.
Happy coding.
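The three steps above could be sketched roughly as follows. This is a minimal sketch using only the standard library, assuming the file listing can be obtained as an S3-style XML document with <Key> elements and that each file can be fetched from a base URL plus its key. LISTING_URL and BASE_URL in the usage comment are placeholders, and the function names are hypothetical; none of this is confirmed for the site in the question.

```python
import shutil
import urllib.request
import xml.etree.ElementTree as ET


def extract_keys(listing_xml):
    """Step 2a: return the text of every <Key> element, ignoring XML namespaces."""
    root = ET.fromstring(listing_xml)
    return [
        el.text
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "Key" and el.text
    ]


def select_files(keys, prefix="2019", suffix=".csv.gz"):
    """Step 2b: keep keys whose file name starts with prefix and ends with suffix."""
    selected = []
    for key in keys:
        name = key.rsplit("/", 1)[-1]
        if name.startswith(prefix) and name.endswith(suffix):
            selected.append(key)
    return selected


def download(url, filename):
    """Step 3: stream the response to disk so a large file is never held in memory."""
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)


# Hypothetical usage; LISTING_URL and BASE_URL must be replaced with real values:
# keys = extract_keys(urllib.request.urlopen(LISTING_URL).read())
# for key in select_files(keys):
#     download(BASE_URL + key, key.rsplit("/", 1)[-1])
```

Note that a URL of the form ?prefix=... usually returns a listing page rather than the file itself, which would explain a saved file of only a couple of kilobytes; the download URL has to point at the file.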

itextSharp: Mistake in Loading XMP in PDF with C#

I am using iTextSharp to load XMP into a PDF file.
Reference: Is it possible to load XMP file in PDF using iTextSharp?
Following the instructions above, I loaded the XMP data into the PDF file, but there is one problem:
In the Keywords section, a semicolon and a single space ("; ") are added as a prefix by default, as shown in the screenshot below.
PDF Properties Window:
XMP Sample I used to load:
I went through the source code to track this problem down but could not find the cause, and I am still searching. I wanted to let the iTextSharp author know, which is why I am posting this question.
Note:
If I set the keywords through the info dictionary like this:
Dictionary<String, String> info = reader.Info;
info.Add("Keywords", ",key1; key2");
it works fine.
The problem is probably caused by the XMP file you are adding. Adobe Reader is adding extra stuff to the keywords you define in the <dc:subject> based on what is present or missing in the pdf:Keywords attribute.
Please take a look at this example: xmp_metadata_added.pdf
This is what the XMP file looks like:
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
xmlns:xmp="http://ns.adobe.com/xap/1.0/"
dc:format="application/pdf"
pdf:Keywords="Metadata, iText, PDF"
pdf:Producer="iText® 5.5.1 ©2000-2014 iText Group NV (AGPL-version); modified using iText® 5.5.1 ©2000-2014 iText Group NV (AGPL-version)"
xmp:CreateDate="2014-05-16T17:04:59+01:00"
xmp:CreatorTool="My program using iText"
xmp:ModifyDate="2014-05-16T17:04:59+01:00"
xmp:MetadataDate="2014-05-16T17:04:59+01:00">
<dc:description>
<rdf:Alt>
<rdf:li xml:lang="x-default">This example shows how to add metadata</rdf:li>
</rdf:Alt>
</dc:description>
<dc:creator>
<rdf:Seq>
<rdf:li>Bruno Lowagie</rdf:li>
</rdf:Seq>
</dc:creator>
<dc:subject>
<rdf:Bag>
<rdf:li>Metadata</rdf:li>
<rdf:li>iText</rdf:li>
<rdf:li>PDF</rdf:li>
</rdf:Bag>
</dc:subject>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">Hello World example</rdf:li>
</rdf:Alt>
</dc:title>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
You recognize:
<dc:subject>
<rdf:Bag>
<rdf:li>Metadata</rdf:li>
<rdf:li>iText</rdf:li>
<rdf:li>PDF</rdf:li>
</rdf:Bag>
</dc:subject>
But do you see:
pdf:Keywords="Metadata, iText, PDF"
You need that part too.
This is a screen shot with that part:
When I remove pdf:Keywords="Metadata, iText, PDF", I can reproduce your problem:
This proves that your problem is caused by your XMP file, not by iText.
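One way to catch this before adding the XMP to a PDF is to check that pdf:Keywords is present and agrees with dc:subject. The following is a minimal Python sketch using only the standard library, not iTextSharp. The function name check_keywords is hypothetical, and the ", " join convention is an assumption taken from the example above.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}


def check_keywords(xmp_text):
    """Return (subjects, pdf_keywords, consistent) for an XMP packet."""
    root = ET.fromstring(xmp_text)

    # Keywords as listed in the dc:subject bag.
    subjects = [
        li.text or ""
        for li in root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)
    ]

    # pdf:Keywords may be an attribute (as in the example) or a child element.
    pdf_keywords = None
    for desc in root.iter("{%s}Description" % NS["rdf"]):
        value = desc.get("{%s}Keywords" % NS["pdf"])
        if value is None:
            child = desc.find("pdf:Keywords", NS)
            value = child.text if child is not None else None
        if value is not None:
            pdf_keywords = value

    consistent = pdf_keywords == ", ".join(subjects)
    return subjects, pdf_keywords, consistent
```

If the third value is False, the packet is either missing pdf:Keywords or lists different keywords than dc:subject, which is the situation reproduced above.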

Replacing aspose JRPptExporterParameter.PPT_TEMPLATE_PRESENTATION in jasperreports 5.0

I am generating a ppt report using jasperreports and aspose library (ppt exporter from aspose). I'm trying to eliminate aspose from the project and use the ppt exporter from jasperreports 5.0. The problem is that at the moment the generated report needs an external .pot file which is added using aspose:
com.aspose.slides.jasperreports.JRPptExporter exporter = new com.aspose.slides.jasperreports.JRPptExporter();
......
exporter.setParameter(com.aspose.slides.jasperreports.JRPptExporterParameter.PPT_TEMPLATE_PRESENTATION, pptTemplate);
exporter.exportReport();
I didn't find any similar parameter in JRExporterParameter from jasperreports, and I haven't found an efficient solution yet. Is there any way to use an external .pot file? I was thinking about creating a second JasperPrint object from the .pot file and then exporting both JasperPrint objects by setting JRExporterParameter.JASPER_PRINT_LIST.
Not sure if that fits for you, but I've written a custom PPTX exporter (only pptx, not binary ppt), which is based on Apache POI. The POI element can be initialized with your own template pptx (not yet implemented in my version).
https://code.google.com/p/pptx-shape-exporter/
Drop me a line, if this sounds interesting to you.

can open xml sdk be used in creating xml files?

Is it possible to use the Open XML SDK to generate an XML file that contains some metadata of a particular docx file?
Details: I have a docx file from which I want to extract some metadata (using Open XML) and output it as an XML file, and later use jQuery to present it in a more readable form.
You can use the SDK to extract info from the various properties parts which may be present in the docx (for example, the core properties part, which includes Dublin Core type info).
You can extract it in its native XML form:
<cp:coreProperties
xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
xmlns:dc="http://purl.org/dc/elements/1.1/" .. >
<dc:creator>Joe</dc:creator>
<cp:lastModifiedBy>Joe</cp:lastModifiedBy>
<cp:revision>1</cp:revision>
<dcterms:created xsi:type="dcterms:W3CDTF">2010-11-10T00:32:00Z</dcterms:created>
<dcterms:modified xsi:type="dcterms:W3CDTF">2010-11-10T00:33:00Z</dcterms:modified>
</cp:coreProperties>
or, in some other XML dialect of your own choosing.
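Because a .docx file is a ZIP package, the same core properties part can also be read without the SDK. The following is a minimal sketch in Python (standard library only, not the Open XML SDK), assuming the part lives at the conventional path docProps/core.xml. Strictly speaking the location is given by the package relationships, so the hard-coded path and the function name read_core_properties are assumptions.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"  # conventional location; an assumption here


def read_core_properties(docx):
    """Return {local element name: text} for the core properties part.

    docx may be a file path, a file-like object, or the raw bytes of the package.
    """
    if isinstance(docx, bytes):
        docx = io.BytesIO(docx)
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read(CORE_PART))
    # Strip the namespace from each tag, e.g. "{...}creator" becomes "creator".
    return {child.tag.rsplit("}", 1)[-1]: (child.text or "") for child in root}
```

The resulting dictionary can then be written out in whatever XML or JSON shape the jQuery front end expects.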
I know this question was posted a long time ago, but the first result of a Google search sent me here. So, for others looking for a solution to this, there is a snippet on the MSDN website: https://msdn.microsoft.com/en-us/library/office/cc489219.aspx
The short answer is to use XmlTextWriter; as far as I know, it applies to Office 2013:
// Add the CoreFilePropertiesPart part in the new word processing document.
var coreFilePropPart = wordDoc.AddCoreFilePropertiesPart();
using (XmlTextWriter writer = new XmlTextWriter(coreFilePropPart.GetStream(FileMode.Create), System.Text.Encoding.UTF8))
{
    writer.WriteRaw("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<cp:coreProperties xmlns:cp=\"http://schemas.openxmlformats.org/package/2006/metadata/core-properties\"></cp:coreProperties>");
    writer.Flush();
}