I know some similar issues exist (Find the field names of inputtable form fields in a PDF document?) but my question is different:
I have all the field names (in fdf file).
I wish I could visually identify directly on the PDF.
With acrobat I should be able to right click on a field and then select "display the name of the field" but I can find no such thing.
Can someone help me ?
Ok. I have found pdf editor where this is possible. Probably acrobat pro too...
http://www.pdfescape.com/
Right click on the field : unlock. Right click again : get properties.
If you're using Apache PDFBox to fill the form automatically, you can use it to fill all text fields with their name:
final PDDocument document = PDDocument.load(in);
final PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
final Iterator<PDField> it = acroForm.getFieldIterator();
for (PDField f : acroForm.getFields()) {
System.out.println(f.toString());
if (f instanceof PDTextField) {
f.setValue(f.getFullyQualifiedName());
}
};
document.save(...);
When you open the generated PDF, you'll be able to identify each field immediately like you asked.
There's a free tool that does this job quite well.
sudo apt install pdftk
You can use pdftk's dump_data_fields to get all fields like this:
pdftk sample.pdf dump_data_fields output fields.txt
Dump would look something like this:
FieldType: Text
FieldName: CRIssue
FieldFlags: 8388608
FieldValue: Issue with something ---- if it is filled
FieldJustification: Center
If you're looking for a tool where you can load your editable pdf file, click on the input (text or checkbox) field and get the basic info like Name: https://code-industry.net/masterpdfeditor/. It's available cross-platform.
Related
First off, sorry if this is really basic, but I've been working with fields in a word document for the past few days and I'm finding them quite counterintuitive. I have a document with over 100 images, and I am sourceing those images using the INCLUDEPICTURE field. Inside that field there is a DOCVARIABLEwhich contains the path to the image. I set this up to display all 1000 images. I then copied this word file and made a new one because I had a second set of images to display. SoI copied and pasted a section of the image name in the field codes and replaced it with a new name, e.g. all "image_a" instances were replaced with "image_b" so instead of seeing "image_a_1.png" and "image_a_2-png", the field codes now show "image_b_1.png" and "image_b_2.png" etc. and this has successfully retrieved the correct images so the document looks good.
However after doing this I have noticed that the codes in the fields has now changed. beforehand at the start the appeared like this:
{ INCLUDEPICTURE "{ DOCVARIABLE "var_doc_path" }folderwithpics\\image_a_1.pgn" \d }
now however after the copy and paste this is what appears:
{ INCLUDEPICTURE "folderwithpics\\image_b_1.pgn" \* MERGEFORMAT \d }
The doc variable is no longer displayed. What's weird that is that the correct image is still sourced and displayed in the word document, so it seems that the docvarible which is essential for the field to reference the correct path, is still active.
There is a problem though, which is that in a new word document, I need to use INCLUDEIMAGE to source all of the 1000 images again into this new document, and they aren't getting displayed. I need to go back and manually enter in the full path for each of the images in order for the new word document to access those image.
I think this must have something to do with the fact that the correct path is no longer displayed. Can anyone help me? I think I need to get the document to display { DOCVARIABLE "var_doc_path" } in the INCLUDEPICTURE field again.
As a side note if anyone has a good guide they can reccommend on working with fields I think that would be a great help. Thanks!
Unless you copied the document via Windows or via SaveAs, rather than simply copying & pasting content from one document to another, the new document will not contain the Document Variable. By using the \d switch, Word is referencing a copy of the image stored in the document metadata rather than the one in the filepath it can no longer access via the DOCVARIABLE field.
FWIW, the \* MERGEFORMAT switch does nothing useful in an INCLUDEPICTURE field.
I am using Word interop to build a Word plugin. In this plugin I have a case where I want to examine all
Field objects in the document and when that field is a cross-reference to another place in the same document I need to be able to capture the text in the paragraph that the field is referring to.
I was able to get the name of the field object but there were no bookmarks defined in the Document although in Word I could click on the field to get to the other location.
Example field
Example field as code
referenced text I need to get
No Bookmark objects are defined
I tried to simulate the user clicking on the field by invoking DoClick() on it and then I accessed V_V_Scalar_Document_Generic.Application.Selection.Range.Text
but it gave nothing. I also tried the GoTo approach below but still didn't reach the referenced text.
System.Collections.Generic.List<string> L_V_List_String_Fields = new System.Collections.Generic.List<string>();
foreach (Field L_V_Scalar_Field_Item in V_V_Scalar_Document_Generic.Range.Fields)
{
try
{
if (L_V_Scalar_Field_Item.Type == WdFieldType.wdFieldRef)
// L_V_Scalar_Field_Item.Data --> gives COM exception
// L_V_Scalar_Field_Item.Code.ID --> blanks
// L_V_Scalar_Field_Item.DoClick() 'will not help because fields are not always hyperlinks
// L_V_Scalar_Field_Item.Result.Text --> gives the text of the field itself
// all variations I tried for the target parameter in the line below (last param) are not working
// V_V_Scalar_Document_Generic.[GoTo](Microsoft.Office.Interop.Word.WdGoToItem.wdGoToField, System.Type.Missing, System.Type.Missing, "_Ref28680085")
// Dim L_V_Scalar_String_Source as string = V_V_Scalar_Document_Generic.Application.Selection.Range.Text
L_V_List_String_Fields.Add($"CodeText:{L_V_Scalar_Field_Item.Code.Text} |FieldType:{L_V_Scalar_Field_Item.Type} |FieldKind:{L_V_Scalar_Field_Item.Kind} |SourceText:{"source text ??"}");
}
catch (Exception L_V_Scalar_Exception_Generic)
{
}
}
The bookmarks are not listed because Word has a convention that bookmarks with names starting with an underscore ("_") are "hidden". In the Insert->Links->Bookmark dialog box, you can see them if you check the "Hidden Bookmarks" box, but in the Find and Replace box, you have to enter the name manually.
Even when Bookmarks are hidden, you can reference them. So for example you should be able to do something like this (this is VBA syntax):
Dim TargetText As String
TargetText = ActiveDocument.Bookmarks("_Ref28680085").Range.Text
to get the text "covered" by the bookmark. In theory, you could use Goto, by using wdGotoBookmark instead of wdGotoField, except that I think it will only have a chance of working with the Selection object, not a Range object.
Depending on what type of cross-reference the user inserts, Word "covers" different parts of the referenced material. So you may need to construct the Range you really need, e.g. using the Bookmark's Range.Start to tell you which paragraph the reference is pointing at.
Document properties are fields that can be used throughout microsoft word: https://support.office.com/en-us/article/view-or-change-the-properties-for-an-office-file-21d604c2-481e-4379-8e54-1dd4622c6b75
Typically I use these fields throughout my document so changing the document property will change all instances of that field throughout my document. This is great, however, when one of the fields is "Company Email" I not only want the text to change to employee#company.com but I would like to have that field hyperlinked to employee#company.com so that the user can click it in order to send an email.
Edit: (Info from comments)
I have tried to insert a hyperlink, hit alt-f and embed a docproperty via the instructions from a similar thread: stackoverflow.com/questions/17428891/…. But could not get that to work properly.
The following works for me, in a quick test (you'll need to substitute the name of the Document Property you're using):
{ HYPERLINK "mailto:{ DocProperty Email }" }
When I Ctrl+Click this it creates a new Email to the address.
For readers unfamiliar with inserting field codes: The wavy bracket pairs must be inserted using Ctrl+F9. Alt+F9 after to switch from field code to field display.
When I try to using
pdftk my.pdf dump_data_fields >result.txt
have empty data result
Your file my.pdf may not be compatible with pdftk. Convert the file first using the following command:
>pdftk my.pdf output my_converted.pdf
Then try,
>pdftk my_converted.pdf dump_data_fields > result.txt
I've taken this from the following http://www.fpdf.org/en/script/script93.php where the converting process is suggested when the fields won't write to the pdf file so converting before dumping the fields may not help.
If your pdf has fields you it should be fillable in your pdf viewer. If in isn't fillable then it would seem that it has no fields.
This is most likely because the pdf you are using doesn't have any data fields to dump! Use a tool like Adobe Acrobat to open the pdf, go to wherever you need to to Edit Fields, and add fields anywhere you need them to show up. Make sure they are named so you can utilize them by using the attributes[] call in pdftk.
I recommend using snake case (i.e. text box named 'first_name') and then you should have access to it using attributes[:first_name] = 'your text'.
Hope this helps, let me know if you have any other questions/issues.
I want to create a word 2007 document without using object model. So I would prefer to create it using open xml format. So far I have been able to create the document. Now I want to add a content control in it and map it to xml. Can anybody guide me regarding the same???
Anoop,
You said that you are able to creat the document using OpenXmlSdk. With that assumption, you can use the following code to create the content control to add to the Wordprocessing.Body element of your Document.
//praragraph to be added to the rich text content control
Run run = new Run(new Text("Insert any text Here") { Space = StaticTextConstants.Preserve });
Paragraph paragraph = new Paragraph(run);
SdtProperties sdtPr = new SdtProperties(
new Alias { Val = "MyContentCotrol" },
new Tag { Val = "_myContentControl" });
SdtContentBlock sdtCBlock = new SdtContentBlock(paragraph);
SdtBlock sdtBlock = new SdtBlock(sdtPr, sdtCBlock);
//add this content control to the body of the word document
WordprocessingDocument wDoc = WordprocessingDocument.Open(path, true); //path is where your word 2007 file is
Body mBody = wDoc.MainDocumentPart.Document.Body;
mBody.AppendChild(sdtBlock);
wDoc.MainDocumentPart.Document.Save();
wDoc.Dispose();
I hope this answers a part of your question. I did not understand what you ment by "Map it to XML". Did you mean to say you want to create CustomXmlBlock and add the ContentControl to it?
Have a look for the Word Content Control Toolkit on www.codeplex.com.
Here is a very brief explanation on how to do what you are attempting.
You need to have access to the developer tab on the Word ribbon. To get this working click on the Office (Round thingy) in the top left hand corner and Select Word Options at the bottom of the menu. On the first options page there is a checkbox to show the developer toolbar.
Use the developer toolbar to add the Content controls you want on the page. Click the properties button in the Content controls section of the developer bar and set the name and tag properties (I stick to naming the name and tag fields with the same name).
Save and close the word document.
Open the Content control toolkit and then open your document with the toolkit. Use the left hand pain to create some custom xml to link to your controls.
Now use the bind view to drag and drop the mappings between your custom xml and the custom controls that are displayed in the right panel of the toolkit.
You can use the openxml sdk 1.0 or 2.0 (still in ctp) to open your word document in code and access the custom xml file that is contained as part of the word document.
If you want to have a look at how your word document looks as xml. Make a copy of your word document and then rename it to say "a.zip". Double click on the zip file and then navigate the folder structure. The main content of the word document is held under the word folder in a file called "document.xml". The custom xml part of the document is held under the customXml folder and is generally found in the file named "item1.xml".
I hope this brief explanation get you up and running.