Replace or remove tags in Adobe Acrobat?

I am trying to automate the removal of tags from a PDF document's structure (the "Artifact" container, the "Figure" container, etc.).
I cannot find anything related to tags in the operation wizard.
I mean the tags that appear in the content of the document.
Alternatively, what program can work with these tags (find and replace, or remove)?
I need this for batch processing.


This file has Custom XML elements that are no longer supported in Word. Saving the file will remove these elements permanently

One of my customers has reported seeing the following message when working in documents created from templates that I provided:
This file has Custom XML elements that are no longer supported in Word. Saving the file will remove these elements permanently.
I understand that this message relates to Custom XML Markup (https://learn.microsoft.com/en-us/office/troubleshoot/word/custom-xml-elements-not-supported). I can confirm that there is no Custom XML Markup in the document.
Any idea what could be causing this error to display?
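One way to double-check the "no Custom XML Markup" claim is to scan every XML part under word/ in the package, not just the main body. Headers, footers and the glossary (building blocks) part can carry w:customXml elements too. Below is a minimal sketch using only Python's standard library, assuming the .docx is read in as bytes. `find_custom_xml_markup` is a hypothetical helper. Custom XML *parts* (files under customXml/ in the package, used for example for data binding) are a separate feature, so the scan deliberately looks only for the w:customXml element inside document parts.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
CUSTOM_XML = f"{{{W_NS}}}customXml"


def find_custom_xml_markup(docx_bytes):
    """Return {part_name: count} of w:customXml elements found in every
    XML part under word/ (body, headers, footers, glossary, ...)."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as z:
        for name in z.namelist():
            # Skip non-XML parts and .rels files
            if not (name.startswith("word/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(z.read(name))
            count = sum(1 for _ in root.iter(CUSTOM_XML))
            if count:
                found[name] = count
    return found


# Usage (hypothetical file name):
#   with open("suspect.docx", "rb") as f:
#       print(find_custom_xml_markup(f.read()))
```

An empty result means no in-document custom XML markup was found in any part. That is a stronger check than inspecting the visible body alone.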

Manipulating Microsoft Word DOCX files that have links and track changes using Python

I have been using the excellent python-docx package to read, modify, and write Microsoft Word files. The package supports extracting the text from each paragraph. It also allows accessing a paragraph a "run" at a time, where the run is a set of characters that have the same font information. Unfortunately, when you access a paragraph by runs, you lose the links, because the package does not support links. The package also does not support accessing change tracking information.
My problem is that I need to access change tracking information. Or, more specifically, I need to copy paragraphs that have change tracking indicated from one document to another.
I've tried doing this at the XML level. For example, this code snippet appends the contents of file1.docx to file2.docx:
from docx import Document
doc1 = Document("file1.docx")
doc2 = Document("file2.docx")
doc2.element.body.append(doc1.element.body)
doc2.save("file2-appended.docx")
For complicated files, I get this error when I try to open the result on my Mac:
But if I click OK, the contents are there. The manipulation also works without problem for very simple files.
What am I missing?
The .element attribute is really an "internal" interface and should be named ._element. In most other places I have named it that. What you're getting there is the root element of the document part. You can see what it is by calling:
print(doc2.element.xml)
That element has one and only one w:body element below it, which is what you get with doc2.element.body (.xml will work on that too, btw, if you want to inspect that element).
What your code is doing is appending one w:body element as the last child of another w:body element, thereby forming invalid XML. The WordprocessingML vocabulary is quite strict about which elements can follow one another, how many of each there can be, and so forth. The only surprise for me is that it actually sometimes works for you, I take it :)
If you want to manipulate the XML directly, which is what the ._element attribute is there for, you need to do it carefully, in view of the (complex) WordprocessingML XML Schema.
Unlike when you stick to the published API, there's no safety net once ._element (or .element) appears in your code.
The body XML can also contain references (relationship IDs) to other document parts, such as images and hyperlinks. These references are only valid within the document in which they appear, which might explain why some files can be repaired.
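Here is a minimal sketch of what "doing it carefully" could look like, assuming the goal is only to copy block-level content (paragraphs, tables) from one body into another. It copies the *children* of the source w:body rather than the w:body itself. It skips the source's w:sectPr and inserts the copies before the destination's trailing w:sectPr. `append_body_content` is a hypothetical helper, not part of python-docx. It uses only ElementTree-style calls (iteration, `.tag`, `insert`), so it should work on the lxml elements that python-docx exposes. It does not address the relationship issue just mentioned: r:id references for images or hyperlinks still point at the source document and would have to be remapped separately.

```python
import copy

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SECT_PR = f"{{{W_NS}}}sectPr"


def append_body_content(src_body, dst_body):
    """Copy the block-level children of src_body (paragraphs, tables, ...)
    into dst_body, instead of nesting one w:body inside another.

    The source's own w:sectPr is skipped, and the copies are inserted before
    the destination's w:sectPr, which the schema requires to be the last
    child of w:body.
    """
    # Insert position: just before the destination's sectPr, or at the end.
    insert_at = len(dst_body)
    for i, child in enumerate(dst_body):
        if child.tag == SECT_PR:
            insert_at = i
            break
    for child in src_body:
        if child.tag == SECT_PR:
            continue  # keep only the destination's page setup
        # deepcopy so the source document is left untouched
        dst_body.insert(insert_at, copy.deepcopy(child))
        insert_at += 1
    return dst_body


# Usage with python-docx (file names from the question):
#   from docx import Document
#   doc1 = Document("file1.docx")
#   doc2 = Document("file2.docx")
#   append_body_content(doc1.element.body, doc2.element.body)
#   doc2.save("file2-appended.docx")
```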

Office 2013 JavaScript API for Word - Content Control questions

Is it possible to insert a content control into a Word document, then get some sort of handle or context to the content control, and then insert HTML into it?
Essentially, the scenario that I am trying to create with the Office JavaScript API is to, upon the user's request, insert a rich text content control, and then populate it with HTML.
I am able to insert the content control from the JavaScript API using the approach suggested at http://social.msdn.microsoft.com/Forums/en-US/appsforoffice/thread/8c4809c7-743c-4388-aef0-bc6a6855c882. It requires a coercionType of ooxml. However, the content I want to put into the content control is HTML. So I try to insert a content control with the following ooxml:
...Boiler ooxml to create content control...
<w:r><w:t><h1>Test header</h1><h2>Test subheader</h2><p>Test paragraph text</p></w:t></w:r>
The insert attempt fails. I'm assuming that's because you can't mix ooxml and html when inserting this into the document with a coercionType of ooxml.
Since this ooxml approach is the only way you can insert a content control, how can I then set the content control with HTML text? I have looked over the Document object help content at http://msdn.microsoft.com/en-us/library/fp142295.aspx, but I'm unsure how I can do this still, or if it's feasible.
Thanks
Though I have not tried this with JS, it should be possible nonetheless.
Try adding an altChunk element; it can contain other Open XML or HTML. I have used it a few times with success.
A few links on the topic:
http://blogs.msdn.com/b/brian_jones/archive/2008/12/08/the-easy-way-to-assemble-multiple-word-documents.aspx
http://blogs.msdn.com/b/ericwhite/archive/2008/10/27/how-to-use-altchunk-for-document-assembly.aspx
You should, however, try to use "strict" XML; otherwise the above might not be possible.
I just found this example (sorry, it's in German, but there should be an English version somewhere as well), in which coercionType is used like this:
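The altChunk mechanism from the linked posts has three pieces. There is a w:altChunk element in the body, a relationship of type aFChunk from the document part, and the target part holding the HTML. The sketch below illustrates them by building a standalone minimal .docx with Python's zipfile. It does not use the Office JavaScript API, and whether an add-in can insert such a chunk through an ooxml coercion is not confirmed here. The part name chunk1.html and the id htmlChunk1 are arbitrary choices.

```python
import io
import zipfile

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOC = R_NS + "/officeDocument"
AFCHUNK = R_NS + "/aFChunk"  # relationship type for altChunk targets

CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Default Extension="html" ContentType="text/html"/>
  <Override PartName="/word/document.xml"
    ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""


def build_altchunk_docx(html, chunk_id="htmlChunk1"):
    """Return the bytes of a minimal .docx whose body holds a w:altChunk
    pointing at an HTML part; Word converts the HTML when it opens the file."""
    package_rels = (
        f'<Relationships xmlns="{PKG_REL_NS}">'
        f'<Relationship Id="rId1" Type="{OFFICE_DOC}" Target="word/document.xml"/>'
        "</Relationships>"
    )
    document_rels = (
        f'<Relationships xmlns="{PKG_REL_NS}">'
        f'<Relationship Id="{chunk_id}" Type="{AFCHUNK}" Target="chunk1.html"/>'
        "</Relationships>"
    )
    document = (
        f'<w:document xmlns:w="{W_NS}" xmlns:r="{R_NS}"><w:body>'
        f'<w:altChunk r:id="{chunk_id}"/>'  # placeholder replaced by the HTML
        "<w:p/>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("[Content_Types].xml", CONTENT_TYPES)
        z.writestr("_rels/.rels", package_rels)
        z.writestr("word/document.xml", document)
        z.writestr("word/_rels/document.xml.rels", document_rels)
        z.writestr("word/chunk1.html", html)
    return buf.getvalue()


# Usage:
#   html = "<html><body><h1>Test header</h1><p>Test paragraph text</p></body></html>"
#   with open("altchunk.docx", "wb") as f:
#       f.write(build_altchunk_docx(html))
```

The HTML stays in its own part rather than inside w:t. That is the difference from the failing snippet in the question.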
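// booksToRead is assumed to hold an HTML string, e.g. "<h1>Test header</h1><p>Test paragraph text</p>"
Office.context.document.setSelectedDataAsync(
    booksToRead,
    { coercionType: Office.CoercionType.Html },
    function (result) {
        // Access the results, if necessary.
    });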
This might do the trick as well.

Implement a tag system in SugarCRM

I'd like to implement a tag system to a SugarCRM that has the following characteristics:
User can create any new tag just by writing the new tag into the tag field
Existing tags are suggested for user to choose, in order to avoid having similar tags (e.g. if user tried to enter "sugar" as tag, it should be suggested for them to choose "SugarCRM")
Search by tag should return only records that have that tag, not records that merely contain it as a keyword in the title or description fields
All records in all modules should have a tag field
It would be nice if a search by tag could be made throughout all modules
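The requirements above do not depend on SugarCRM internals. The following is an illustration in Python rather than PHP, not tied to any SugarCRM API or the SugarTAGS module. A hypothetical `TagIndex` stores tags apart from title/description fields, so a tag search cannot match free text. It suggests existing tags by substring and fuzzy matching (difflib) and indexes records from every module in one place.

```python
import difflib
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory tag store; tags live apart from text fields."""

    def __init__(self):
        self._tags = {}  # lowercase key -> canonical spelling
        self._records = defaultdict(set)  # lowercase key -> {(module, record_id)}

    def suggest(self, text, limit=3):
        """Existing tags similar to `text`, so the user can reuse one."""
        key = text.strip().lower()
        # Substring hits first ("sugar" -> "SugarCRM")
        hits = [k for k in self._tags if key and key in k]
        # Then fuzzy matches for typos ("sugrcrm" -> "SugarCRM")
        for k in difflib.get_close_matches(key, list(self._tags), n=limit, cutoff=0.6):
            if k not in hits:
                hits.append(k)
        return [self._tags[k] for k in hits[:limit]]

    def tag(self, module, record_id, text):
        """Attach a tag to a record, creating the tag if it is new."""
        key = text.strip().lower()
        self._tags.setdefault(key, text.strip())
        self._records[key].add((module, record_id))

    def search(self, text, module=None):
        """Records carrying exactly this tag; all modules unless one is given."""
        hits = self._records.get(text.strip().lower(), set())
        return {r for r in hits if module is None or r[0] == module}
```

In a real module the two dictionaries would correspond to a tags table and a tag-to-record link table keyed by module and record id. Searching that link table is what keeps title and description keywords out of the results.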
Does anyone know if there is already some implementation for this?
There are several on SugarForge. I've used SugarTAGS from CARRENET (http://www.sugarforge.org/projects/sugartags/) as a starting point before.

Purpose for Word Open XML and content controls binding

For Word report generation, I am looking at binding XML to content controls to see whether it is any easier than using Word Interop and hard-coding index references to content controls in order to assign values to them.
However, I don't really understand how to do it.
My workflow is to enter information in Excel and then generate an XML file so that the content controls are populated from the XML. However, what I have read describes the opposite direction: the Word Content Control Toolkit, and descriptions where the XML is populated by the user entering information in Word, after which the programmer unzips the docx file to retrieve the XML file.
How can I populate content controls with XML?
There are samples of generating Word documents from Word templates, XML and data-bound content controls at http://worddocgenerator.codeplex.com/
Set up the mapped content controls in the 'template' docx using the content control toolkit or similar. Do this using a sample XML file containing your Excel data.
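For the Excel-to-XML step, here is a minimal sketch assuming the sheet has been saved as CSV (reading .xlsx directly would need a third-party library). Each row becomes one element, and each column header becomes a child element name, so headers must be valid XML names. The names `report` and `item` are hypothetical. They must match whatever XPaths the content controls were mapped to in the template, for example /report/item[1]/name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_xml(csv_text, root="report", row_tag="item"):
    """Turn CSV text (e.g. a sheet saved from Excel as CSV) into an XML
    string: one <row_tag> per row, one child element per column."""
    root_el = ET.Element(root)
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(root_el, row_tag)
        for column, value in row.items():
            ET.SubElement(item, column).text = value
    return ET.tostring(root_el, encoding="unicode")
```

For example, `rows_to_xml("name,amount\nAcme,10\n")` produces `<report><item><name>Acme</name><amount>10</amount></item></report>`. A file like this can serve as the sample used when mapping the controls, and later as the instance data.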
Once you have that template document, at run time you can inject your XML file into it (i.e. replace the custom XML part it contains with your instance data), in C# or Java or whatever.
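The injection step can be sketched with Python's zipfile. The sketch assumes the template keeps its bound data in a part named customXml/itemN.xml, which is the naming Word usually uses but is not guaranteed. It also assumes the new XML has the same root element and structure as the sample used when mapping. The matching itemPropsN.xml, which carries the storeItemID that the w:dataBinding elements refer to, is copied unchanged so the bindings keep pointing at the part. `replace_custom_xml_part` is a hypothetical helper. Pass `root_tag` in Clark notation ("{namespace}name") if the data is namespaced.

```python
import io
import re
import xml.etree.ElementTree as ET
import zipfile

# customXml/item1.xml, item2.xml, ... but not itemProps1.xml
ITEM_RE = re.compile(r"customXml/item\d+\.xml$")


def replace_custom_xml_part(docx_bytes, new_xml, root_tag):
    """Return a copy of the .docx in which the custom XML part whose root
    element is `root_tag` is replaced by `new_xml` (bytes). All other parts,
    including customXml/itemPropsN.xml, are copied unchanged."""
    out_buf = io.BytesIO()
    replaced = False
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as out:
        for info in src.infolist():
            data = src.read(info.filename)
            if not replaced and ITEM_RE.match(info.filename):
                if ET.fromstring(data).tag == root_tag:
                    data = new_xml  # swap in the instance data
                    replaced = True
            out.writestr(info, data)
    if not replaced:
        raise ValueError(f"no custom XML part with root {root_tag!r}")
    return out_buf.getvalue()


# Usage (hypothetical file names):
#   with open("template.docx", "rb") as f, open("data.xml", "rb") as d:
#       result = replace_custom_xml_part(f.read(), d.read(), "report")
#   with open("report.docx", "wb") as f:
#       f.write(result)
```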
When the user opens the document in Word 2007/2010, the information in the custom XML part will automatically be copied into the bound controls, and visible to the user.
Note that content control data binding doesn't easily support repeating data (e.g. populating table rows) in Word 2007/2010, though there are ways to do it.