Pdfa's with xfa and iText

Pdfa's with xfa and iText - itext

We have a pdf form with xfa init.
I grab and save the xfa data locally to a xml file.
<?xml version="1.0" encoding="UTF-8"?><xfa:data xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
<form1>
<Name>J Motiwala</Name>
<Title>Senior Software Engineer</Title>
<Deptartment>development</Deptartment>
<Phone>678-751-9448</Phone>
<Date>2017-03-10</Date>
<DateNeeded>2017-03-10</DateNeeded>
<Reason>Training course</Reason>
<Payee>Safari</Payee>
<Amount>125.00000000</Amount>
<Date/>
<DateNeeded/>
<Reason/>
<Payee/>
<Amount/>
<Date/>
<DateNeeded/>
<Reason/>
<Payee/>
<Amount/>
<Date/>
<DateNeeded/>
<Reason/>
<Payee/>
<Amount/>
<Date/>
<DateNeeded/>
<Reason/>
<Payee/>
<Amount/>
<Date/>
<DateNeeded/>
<Reason/>
<Payee/>
<Amount/>
<DeliveryInstructions>please send a cheque</DeliveryInstructions>
<Comments>training needed asap</Comments>
<AmountPaid/>
<CheckNo/>
<DateReceived/>
</form1>
</xfa:data>
Now it is possible this could also be
<?xml version="1.0" encoding="UTF-8"?><xfa:data xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/"/>
My question is if form1 tag a standard xfa tag, I could not find any documentation that states that it is.
My question is if a form1 tag is not found could I generate it via code and simply update the xml?
Also can the tag be named anything other than form1.

<form1> is not standard XFA. In XFA, the designer of the form can use any XSD he wants for the <xfa:data>. That's an advantage of XFA; people don't have to adapt their data to the form. Normally, there's also a data description part stored in the XFA XML. The syntax for this description looks somewhat like XSD, but it's not.
If you have an existing PDF, you cannot just "invent" new tags, because there's a data-binding between tag names and field names. If you introduce a tag that isn't known by the form, the corresponding data won't show up anywhere in the form.

Related

How to read form fragment references in a pdf document?

I am working with adobe LiveCycle ES4 and I am attempting to make a custom LiveCycle component (in java) that counts references to a specified form fragment. However I am having some difficulty finding documentation on form fragments in pdf files. So my question is how would I go about reading form fragment references from a pdf document?
Also any documentation, API, or library that may help me with this task would be greatly appreciated.
--Form Fragments--
A form fragment is a collection of form objects (fields, buttons, shapes, tables, etc) as well as related style/formatting that is saved as a separate .xsd file in a form fragment library (usually a directory on the livecycle server that holds the .xsd files). References to form fragments can be inserted into a form when working in LiveCycle Designer.This is particularly useful for creating many similar forms (such as a form fragment with fields for contact information). When a form fragment is edited the changes are reflected across all forms that hold a reference to that fragment (when the pdf is opened and has access to the form fragment library.

The PDF specification currently is an official ISO standard - ISO 32000. You should be able to get the document from the ISO organisation or from your country's standards organisation.
However, before being an ISO standard, PDF was developed and maintained by Adobe and they still have the specification available on their site: http://www.adobe.com/devnet/pdf/pdf_reference.html
There are differences between this specification and the ISO 32000 specification document, but they are largely from an editorial manner so for your purpose I'd look at the Adobe document.
Using the additional information in the comments, your test file and a low level PDF document browser (pdfToolbox in this case - attention, I'm affiliated with this product), I found following information:
In the "Catalog" object for your test PDF, you'll find a key named "AcroForm" that points to a dictionary with information about your form.
In that "AcroForm" dictionary you'll find a key called "XFA" that contains what appears to be almost all information about the XFA form that LifeCycle designer generated.
That "XFA" key points to an array which seems to consist of pairs of information. Element 0 is a string called "preamble", element 1 seems to be the data belonging to that string. So each pair of elements is a bit of information.
The information in that array consists of "preamble", "config", "template", "localeset", "xmpmeta" and "postamble". If you look at the element for "template" (the 6th element in the array if you calculate 1-based), you'll find the data you are looking for. The data is stored as a FlateDecoded stream that you'll have to uncompress - then it's just XML data that should be fairly easy to parse. In it are three lines that for you should be of particular interest:
<subform x="6.35mm" y="6.35mm" name="TestFragment1"
<subform x="3.175mm" y="34.925mm" name="TestFragment2"
<subform x="0.125in" y="2.75in" name="TestFragment2"
I'm assuming the XFA specification pointed to by mkl contains more information about these things, but it seems like simply looking for "subform" elements in the XML should get you the references to the form fragments pretty easily.

In the XFA xml data in the sample PDF you provided there are processing instructions like this:
<?templateDesigner expand 1?><?designerFragmentSource CjxzdWJmb3JtIHVzZWhyZWY9Ii4uXC4uXC4uXEFkb2JlXEFkb2JlIExpdmVDeWNsZSBFUzRcZm9y
bV9mcmFnbWVudHNcVGVzdEZyYWdtZW50MS54ZHAjc29tKCR0ZW1wbGF0ZS5mb3JtMS5UZXN0RnJh
Z21lbnQxKSIgeD0iNi4zNW1tIiB5PSI2LjM1bW0iIHhtbG5zPSJodHRwOi8vd3d3LnhmYS5vcmcv
c2NoZW1hL3hmYS10ZW1wbGF0ZS8zLjMvIgo+PD90ZW1wbGF0ZURlc2lnbmVyIGV4cGFuZCAxPz48
L3N1YmZvcm0KPg==?>
Decoding the base64 encoded argument in there one gets:
<subform usehref="..\..\..\Adobe\Adobe LiveCycle ES4\form_fragments\TestFragment1.xdp#som($template.form1.TestFragment1)" x="6.35mm" y="6.35mm" xmlns="http://www.xfa.org/schema/xfa-template/3.3/"
><?templateDesigner expand 1?></subform
>
So it looks like your
custom LiveCycle component (in java) that counts references to a specified form fragment
should parse the XFA XML, look for these designerFragmentSource processing instructions, and analyze them.
Please be aware, though, that these processing instructions most likely are proprietary Adobe stuff (I at least did not find them in the current XFA specification). Thus, as soon some third party tool touches the XFA XML, the PIs might not be accurate anymore. You actually even cannot be sure what happens in different Adobe software versions.

XML import of IndexCat files

the National Library of Medicine (NLM) has made available to the public last April its Index Medicus collection. This collection is composed of 5 series, each series containing several volumes. These files are available as XML files at this website: http://www.nlm.nih.gov/hmd/indexcat/indexcatxml.html
The same website also has a file for the Document Type Definition (DTD).
I am trying to import these XML files into FileMaker Pro 12 Advanced, but so have been unsuccessful. I realize I need to specify a XSLT style sheet that transforms the XML into the proper grammar. I do not know how to do that. I used the example that comes with FileMaker (called msdso_elem.xslt). I also modified the top of one the XML files as indicated below.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE fmresultset PUBLIC "-//FMI//DTD fmresultset//EN"
"http://www.nlm.nih.gov/databases/dtd/nlmindexcataloguerecordset_130401.dtd">
With these two modifications, I am able to import all records in a series, but the fields do not correspond to the original fields of the XML file. In particular, I have fewer fields in the newly created FileMaker file, and the information they contain is not well separated (for example the author first name is smashed to the author last name, rather than being in its own field).
Can anybody help me to modify the XSLT style sheet to achieve a proper import of the NLM XML files?
Many thanks
Patrick

You can try the universal XSLT at http://jensteich.de/fmfaq/export/universal-xslt-fur-den-import-beliebiger-xml-daten/

Creating a working copy for Plone 4 custom content types

I have created a custom Plone content type in my package i.e. my.product.
I am in need of integrating a working copy support: so that a "published" document (in my case, a published content type) stays online while it is being edited. Basically, I want to take advantage of 'Working Copy Support (Iterate)' provided by plone.app.iterate to achieve what is explained here. This will provide me with ability to check-in/check-out my changes.
Is this possible in Plone 4 with custom content types using Archetypes? How would one go about it if yes?

I added the following two files inside my.product/my/product/profiles/default folder and it appears to work:
diff_tool.xml
<?xml version="1.0"?>
<object>
<difftypes>
<type portal_type="MyCustomType">
<field name="any" difftype="Compound Diff for AT types"/>
</type>
</difftypes>
</object>
repositorytool.xml
<?xml version="1.0"?>
<repositorytool>
<policymap>
<type name="MyCustomType">
<policy name="at_edit_autoversion"/>
<policy name="version_on_revert"/>
</type>
</policymap>
</repositorytool>

I have never used plone.app.iterate, but this is the generic approach how to solve the problem.
Actions are installed by plone.app.iterate GenericSetup profile. You can see actions here:
https://github.com/plone/plone.app.iterate/blob/master/plone/app/iterate/profiles/default/actions.xml
Pay note to the line *available_expr* which tells when to show the action or not. It points to helper view with the conditition.
The view is defined here
https://github.com/plone/plone.app.iterate/blob/master/plone/app/iterate/browser/configure.zcml#L7
The checks that are performed for the content item if it's archiveable
https://github.com/plone/plone.app.iterate/blob/master/plone/app/iterate/browser/control.py#L47
Most likely the failure comes from if not interfaces.IIterateAware.providedBy condition. Your custom contennt must declare this interface. However, you can confirm this putting a pdb breakpoint in checkin_allowed(self) and step it though line-by-line and see what happens with your content type.

Exporting data in enhanced Grid to csv or xml format using dojo

In my project we are using dojo framework in UI. We are having a functionality to exporting the data in the enhanced grid into excel/csv files. In the dojo toolkit, they are binding the id in the textarea but i need those values in the excel/csv file...can any one help in this issue...? if possible pls tell me how to export the enhanced grid data to excel/csv files...

If you are already using the Enhanced Data Grid, you should be able to include the exporter plugin - dojox.grid.enhanced.plugins.exporter.CSVWriter - to get the CSV text.
This will give you access to two main functions exportGrid and exportSelected that will take the contents and export them as CSV text.
Unfortunately that doesn't get them as a separate file (click to download), just the formatted text in a textarea (or whatever).
To get a "click to download CSV function), you could write a servlet/jsp proxy, which would take a POST from your page with the CSV text (from the plugin above) as part of the form and simply copy it back out with the correct headers to make it appear as an attachment.
response.setContentType("text/csv"); response.setHeader("Content-Disposition","attatchment;filename=name.csv")
This would require something server side though.. and at that point, you may want to consider having a servlet simply produce the CSV text directly.
http://dojotoolkit.org/reference-guide/dojox/grid/EnhancedGrid/plugins/Exporter.html

Ignore CDATA while xml parsing

I am new to iphone development.I want to ignore CDATA tag while parsing because it consider the HTML tag following it as text.Since i want to display the content alone ,i want my parser to ignore CDATA tag.My source code is
[CDATA[<br /><p class="author"><span class="by">By: </span>By Sydney Ember</p><br><p>In the week since an </p>]].
Is there any way to ignore CDATA tag?
Is there any way to parse my source twice so it displays only the content?
Please give me some sample code.Please help me out.Thanks.

If you treat the CDATA content as XML instead of CDATA then your parser will throw an error (since your HTML is a weird mix of XHTML and HTML and is not well formed).
If you want to get the HTML, then parse the XML, extract the text content of the node, then parse that text as HTML.

There is no way to ignore the CDATA tag - it's part of the xml spec and parsers should honour it.
If you don't like the idea of this answer to your earlier question, you could get the contents of the CDATA section and parse it as XML again. However, this is highly not recommended! You don't know that the contents of the CDATA are going to be valid xml (they're probably not).
If you can 100% guarentee that the CDATA section contains the form you have above, you could probably use some string manipulation to get the data out (i.e. string replace '<span class="by">By: </span>' with '') but again, this will almost certainly break if the CDATA contents change.
Where is the xml coming from? It's a better idea to talk to owner of the service and get them to send you instead of description something like
<description>
<author>By Sydney Ember</autho>
<text>In the week since an </text>
</description>
S

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse