Copy from one Word document to another without breaking styling - ms-word

We have two Word documents: one that is used as an order template and another that contains our terms and conditions (T&C).
We would like to merge both documents into one without breaking the styling, and have not been able to do so despite considerable effort.
Essentially, the T&C document has a bunch of custom styles, some of which overlap with the default Word styles. This means that when we copy and paste the T&C document into the order template, the result is a complete mess.
Is there a way of combining the two Word files while maintaining the existing format? (We would like the T&C to be copied into the order template, not the other way around.)
I have tried using the following Macro (found online) to remove the styles of the T&C document and maintain the format:
Sub DirectFormat()
    Dim para As Paragraph
    Dim fnt As Font
    Dim pfmt As ParagraphFormat
    For Each para In ActiveDocument.Paragraphs
        With para
            If .Style <> ActiveDocument.Styles("Normal") Then
                Set fnt = .Style.Font
                Set pfmt = .Style.ParagraphFormat
                .Style = ActiveDocument.Styles("Normal")
                .Range.Font = fnt
                .Range.ParagraphFormat = pfmt
            End If
        End With
    Next
End Sub
However, after running it, the document looks completely different.

The simplest and most effective way is to ensure you don't have conflicting Style definitions and that the Style definitions in the source document, especially, haven't been overridden with direct formatting.
Other than that, instead of using copy/paste for content replication, you might use something like:
wdRngTgt.FormattedText = wdRngSrc.FormattedText
where wdRngTgt defines the destination range and wdRngSrc defines the source range.
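To check the first point before merging, you could list the styles that both files define differently. Below is a minimal Python sketch, assuming both files are .docx packages (zip archives containing word/styles.xml). The function names are made up for illustration, and the comparison is purely textual, so two styles that render the same but are written differently will still be reported.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_styles(docx):
    """Map each style ID to the canonical XML of its definition."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/styles.xml"))
    styles = {}
    for style in root.iter(f"{{{W}}}style"):
        xml = ET.tostring(style, encoding="unicode")
        # Canonical form, so indentation differences do not count as conflicts.
        styles[style.get(f"{{{W}}}styleId")] = ET.canonicalize(xml, strip_text=True)
    return styles


def conflicting_styles(source_docx, target_docx):
    """Return the IDs of styles present in both files with different definitions."""
    src = read_styles(source_docx)
    tgt = read_styles(target_docx)
    return sorted(sid for sid in src.keys() & tgt.keys() if src[sid] != tgt[sid])
```

Styles reported here are the ones most likely to change appearance when the T&C content lands in the order template. Renaming them in the source document first is one way to avoid the clash.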

Related

iText - Manipulate existing PDF - add dashes to end of each paragraph

I need to manipulate an existing PDF in iText to add dashes to the end of each paragraph. Something like this:
I would make this in Word with tab leaders.
Is this possible to do with iText on an existing document?
Any help would be greatly appreciated.
Thanks!
Edit for clarifications
The iText version is 5.5.x, but I guess we can upgrade it if the task would be easier with a newer version.
There could be some paragraphs that do not need dashes, but I have some control over the original PDF. It is assembled from a different system and I could add some kind of marker to the paragraphs that need leaders (i.e. I can add text like "~tab~" at the end of such paragraphs).
At the moment the documents that need this kind of editing have headers and footers, nothing but text, and one column with justified alignment.
Edit for even more clarification
I can even (by configuration) set where the dashes have to end (i.e. at 10px) for a specific document. We know every document type (and its structure) that needs to be manipulated this way.
This is insanely hard.
You should think of a PDF document as a container of instructions, rather than a WYSIWYG format. So finding out where lines are (let alone paragraphs) is very hard.
High level plan:
1. Use IEventListener to process events from the PDF being parsed.
2. Look out for TextRenderInfo events and store them.
3. Sort the TextRenderInfo events to ensure your list of events is in logical reading order.
4. Merge items in your list if they appear on the same line and are less than a certain distance apart (for instance, the width of 3 spaces in the font specified by TextRenderInfo). Now you should have lines.
5. Merge lines if they appear in close vertical proximity of each other and they overlap horizontally. How close they should be, and how much they should overlap, is something you'll have to figure out, and might differ from page to page and document to document. Now you should have paragraphs.
6. Figure out the bounding box of each paragraph or, more accurately, the convex hull. There is a good algorithm for this called the gift-wrapping algorithm.
7. Now you can simply insert lines by inspecting your convex hull. This is the easy step.
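The two merging steps can be sketched independently of iText. The Python below is only an illustration of the grouping logic, not iText code: `Box` is a made-up stand-in for whatever you keep from each TextRenderInfo, the thresholds are guesses you would have to tune, and it assumes a single column where chunks on one line share (almost) the same baseline.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x0: float   # left edge
    x1: float   # right edge
    y: float    # baseline in PDF coordinates (y grows upward)
    text: str


def group_lines(chunks, y_tol=1.0, max_gap=10.0):
    """Merge chunks that share a baseline and sit close together horizontally."""
    lines = []
    # Reading order for a single column: top to bottom, then left to right.
    for c in sorted(chunks, key=lambda c: (-c.y, c.x0)):
        last = lines[-1] if lines else None
        if last and abs(last.y - c.y) <= y_tol and c.x0 - last.x1 <= max_gap:
            # Joining with a space is a simplification; chunks may split words.
            lines[-1] = Box(last.x0, max(last.x1, c.x1), last.y,
                            last.text + " " + c.text)
        else:
            lines.append(Box(c.x0, c.x1, c.y, c.text))
    return lines


def group_paragraphs(lines, max_leading=16.0):
    """Merge consecutive lines that are vertically close and overlap horizontally."""
    paragraphs = []
    for line in lines:  # expects the top-to-bottom output of group_lines
        if paragraphs:
            prev = paragraphs[-1][-1]
            close = 0 < prev.y - line.y <= max_leading
            overlap = min(prev.x1, line.x1) > max(prev.x0, line.x0)
            if close and overlap:
                paragraphs[-1].append(line)
                continue
        paragraphs.append([line])
    return paragraphs
```

With justified text the gaps between words vary from line to line, so `max_gap` probably needs to be derived from the font size rather than hard-coded.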
If you can insert markers, you can easily do this using iText7. iText7 has an implementation of IEventListener that allows you to look for regular expressions within a PDF document. It returns the locations where the regular expression was found. If you can ensure your markers always satisfy some kind of regular expression, you can easily look for them, get their coordinates, and insert a line at the calculated position.
Of course, then you need to get rid of the marker text.
For that you can use pdfSweep.
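To make the marker idea concrete, here is a small Python sketch of the coordinate calculation only. It is not iText code: `Chunk` is a hypothetical record of a text chunk and its horizontal extent, `right_edge` stands for the configured position where the dashes should end, and the start of the marker is estimated by assuming the glyphs are spread evenly across the chunk.

```python
import re
from typing import NamedTuple


class Chunk(NamedTuple):
    text: str
    x0: float   # left edge of the chunk
    x1: float   # right edge of the chunk
    y: float    # baseline


# The marker from the question, anchored to the end of the chunk.
MARKER = re.compile(r"~tab~\s*$")


def leader_segments(chunks, right_edge):
    """Return (x_start, x_end, y) for every chunk that ends with the marker."""
    segments = []
    for c in chunks:
        m = MARKER.search(c.text)
        if not m:
            continue
        # Assumption: evenly spaced glyphs, so the marker starts at the same
        # fraction of the chunk width as of the string length.
        frac = m.start() / len(c.text)
        x_start = c.x0 + frac * (c.x1 - c.x0)
        segments.append((x_start, right_edge, c.y))
    return segments
```

A library that reports the match rectangle directly gives you `x_start` without the even-spacing estimate; the rest of the calculation stays the same.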

Word Add in VSTO - How to get multiple ranges of text copied on multi-select

I am trying to store the text a user copies and highlight it when they reopen the Word file. When they copy one paragraph, I am able to highlight it (I store the copied information, e.g. range values, in an XML file), but when they copy content from multiple paragraphs selected with the Ctrl key, I am unable to get the individual range values.
Could you guys help on this?
What you are trying to do is not supported by (the current versions of) Word because programmatic access to discontiguous selections is limited. In particular, you cannot access the different ranges in that selection (you can only access the last subrange).
The limitations are listed in detail in this MSDN article:
Limited programmatic access to Word discontiguous selections

Accurately Reading Document Content with Position Using Open XML (Word)

I need to retrieve a string that exactly matches the content of a Word document. That is to say, at position 1000 of this string, I should find exactly the text at position 1000 in the document.
We have been through various iterations of reading in the document content and adjusting for field codes, tables, pictures, inline shapes etc. by padding in the right places. This approach does work (well), but we want to move to Open XML for speed.
We have Power Tools for Open XML installed, and have been looking at ways to recreate this string using Open XML. We can get all the text by going through the runs (as per Eric White's blogs), but we also need everything else: \r's, \t's etc. I see things like "TabChar" in runs, and w:fldChar, but I am unclear how to use this information to generically get what we want.
For example, "TabChar" in our string should be \t. We presumably need to interpret w:fldChar begin, separate and end in a certain way (maybe by adding spaces). The problem is that we don't want to have to find every possibility and code for each one [If run = "TabChar" append "\t", etc.] because a) it's inefficient and b) it is unsafe.
Can anyone help with a method to reproduce this string with complete accuracy?
Thanks
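For illustration, the table-driven variant of the "If run = TabChar" idea could look like the Python sketch below, working directly on word/document.xml. It is a sketch under assumptions, not a complete solution. The mapping covers only tabs and manual line breaks, the characters chosen for them (\t, \v, and \r per paragraph) are assumptions about what Word reports, and fields, table cell marks, pictures and other position-bearing content are not handled.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Assumed mapping from run children to single characters; extend as needed.
CHAR_FOR = {
    W + "tab": "\t",
    W + "br": "\v",
}


def flatten(document_xml):
    """Concatenate run text, mapped special elements and paragraph marks."""
    root = ET.fromstring(document_xml)
    out = []
    for para in root.iter(W + "p"):
        # Only look at direct children of runs: w:tab also occurs inside
        # w:pPr/w:tabs as a tab stop definition, which is not a character.
        for run in para.iter(W + "r"):
            for child in run:
                if child.tag == W + "t":
                    out.append(child.text or "")
                elif child.tag in CHAR_FOR:
                    out.append(CHAR_FOR[child.tag])
        out.append("\r")  # assumed paragraph mark
    return "".join(out)
```

Keeping the mapping in one dictionary at least makes the unsafe part explicit: any element that is missing from `CHAR_FOR` silently shifts every later position.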

Making Word documents independent of their linked templates

At the office we are using several custom Word 2007 / 2010 templates. When we send doc files out to clients, they sometimes appear quite messy and badly formatted, as the clients do not have these templates on their machines.
Is there a way to embed templates into Word documents, or to "flatten" these documents so that they no longer depend on the templates and have formatting, images etc. all contained within the Word doc file itself?
By the way, I know that printing the doc to a PDF and sending that would be a workaround, but we need to keep it in Word, as clients have to be able to edit the documents.
This macro will break the link between a document and its template:
Sub DivorceFromTemplate()
    ' Disassociates the document from its Word template
    With ActiveDocument
        .UpdateStylesOnOpen = False
        .AttachedTemplate = ""
    End With
End Sub

OO:Doc - Perl module for OpenOffice

I want to automate some Writer tasks. I need to create an .odt Writer document with oo:doc using methods such as create paragraph and append paragraph. The problem is that append paragraph and create paragraph do not allow text to start in the middle of the page or at a certain column, i.e.
Name Surname Address
When I unzip the "master" document I want to create and inspect the content.xml file, I see that the XML equivalent is
" <text:p text:style-name="Text_20_body"><text:s text:c="115"/><text:span text:style-name="T1"><text:s/>Hallo how are you today</text:span></text:p><text:p text:style-name="P1"><text:s text:c="116"/>I hope you are well also</text:p><text:p text:style-name="P1""
How do I set the text:c and text:s element(s) from within oo:doc?
Question 2:
How do I set the formatting of a paragraph so that it only extends from, e.g., column 20 to column 80?
Thanks
Those elements represent runs of consecutive space characters, which would otherwise be collapsed into one. The text:c attribute says how many spaces there are.
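To see what the markup in the question expands to, here is a short Python sketch that turns a text:p element back into plain text. The helper name is made up, and it only handles text:s plus nested elements such as text:span. A text:s without a text:c attribute counts as one space.

```python
import xml.etree.ElementTree as ET

TEXT = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}"


def paragraph_text(element):
    """Return the plain text of a paragraph, expanding text:s into spaces."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == TEXT + "s":
            # text:c is the repeat count; it defaults to a single space.
            parts.append(" " * int(child.get(TEXT + "c", "1")))
        else:
            parts.append(paragraph_text(child))
        parts.append(child.tail or "")
    return "".join(parts)
```

Applied to the first paragraph from the question, this yields 116 spaces followed by the greeting, which shows that the "column" position there is produced by padding rather than by paragraph formatting.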
That doesn't strike me as a solution to what you want, which is to change the margins and position of a paragraph, yes?
Do you have a document that you want to use as a template, where the text will be inserted? Or are you trying to create the entire page from scratch?
I think you want to use OpenOffice.org to create a Writer document that has the structure you want, then look at the XML to see what the markup is that accomplishes that. Look at paragraph-level styles or even frames if that is what is used. You might be able to create insertion points for your generated content by then adding magic-text phrases that you can scan for.
Then figure out how to get that done with the perl module.
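The magic-text idea itself is independent of the module used. As a rough Python illustration (the {{name}} placeholder syntax and the function name are invented here), filling placeholders in the content.xml of a template could look like this. It assumes each placeholder was typed in one go, so that the editor did not split it across several text:span elements.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical placeholder syntax: {{name}}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_placeholders(content_xml, values):
    """Replace known {{name}} placeholders with XML-escaped values."""
    def replace(match):
        key = match.group(1)
        if key not in values:
            return match.group(0)  # leave unknown placeholders untouched
        return escape(str(values[key]))
    return PLACEHOLDER.sub(replace, content_xml)
```

Because the paragraph and its style come from the template, the generated text inherits the margins and position set up in Writer, which avoids padding with spaces.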