merge word documents to a single document - ms-word

I used the code in the link mentioned below to merge word files into a single file
http://devpinoy.org/blogs/keithrull/archive/2007/06/09/updated-how-to-merge-multiple-microsoft-word-documents.aspx
However, seeing the output file i realized that it was unable to copy header image in the first document. How do we merge documents preserving format and content.

I will suggest to use GroupDocs.Merger Cloud for merging multiple word document to a single word document, it keeps the formatting and contents of the source documents. It is a platform independent REST API solution without depending on any third-party tool or software.
Sample C# code:
var configuration = new GroupDocs.Merger.Cloud.Sdk.Client.Configuration(MyAppSid, MyAppKey);
var apiInstance_Document = new GroupDocs.Merger.Cloud.Sdk.Api.DocumentApi(configuration);
var apiInstance_File = new GroupDocs.Merger.Cloud.Sdk.Api.FileApi(configuration);
var pathToSourceFiles = #"C:/Temp/input/";
var remoteFolder = "Temp/";
var joinItem_list = new List<JoinItem>();
try
{
DirectoryInfo dir = new DirectoryInfo(pathToSourceFiles);
System.IO.FileInfo[] files = dir.GetFiles();
foreach (System.IO.FileInfo file in files)
{
var request_upload = new GroupDocs.Merger.Cloud.Sdk.Model.Requests.UploadFileRequest(remoteFolder + file.Name, File.Open(file.FullName, FileMode.Open));
var response_upload = apiInstance_File.UploadFile(request_upload);
var item = new JoinItem
{
FileInfo = new GroupDocs.Merger.Cloud.Sdk.Model.FileInfo
{ FilePath = remoteFolder + file.Name }
};
joinItem_list.Add(item);
}
var options = new JoinOptions
{
JoinItems = joinItem_list,
OutputPath = remoteFolder + "Merged_Document.docx"
};
var request = new JoinRequest(options);
var response = apiInstance_Document.Join(request);
Console.WriteLine("Output file path: " + response.Path);
}
catch (Exception e)
{
Console.WriteLine("Exception while Merging Documents: " + e.Message);
}

That code is inserting a page break after each file.
Since sections control headers, if a second or subsequent document has a header, you'll probably be wanting to keep the original section properties, and insert those after your first document.
If you look at your original document as a docx, you'll probably see that your section is a document level section properties element.
The easiest way around your problem may be to create a second section properties element inside the last paragraph (which contains the header information). Then this should just stay there when the documents are merged (ie other paragraphs added after it).
That's the theory. See also http://www.pcreview.co.uk/forums/thread-898133.php
But I haven't tried it; it assumes InsertFile behaves as I expect it should.

Related

How can I create a function that automatically takes data from Google Sheets and replaces the tags in a Slides template?

I am new to Google Apps Script and coding in general and wasn't entirely sure how to do this. I want to create code that allows me to create a new set of Google Slides based on a Slides template using the relevant rows from a Google Sheets document.
function generateNewSlides() {
var wsID = "would insert worksheet URL ID here";
var ws = SpreadsheetApp.openById(wsID).getSheetByName("Data");
var data = ws.getRange(2, 1, ws.getLastRow()-1, 5).getValues();
>the above should get the relevant table from the sheet
data.forEach(function(info){
if(info[0]){
var firstname = info[0];
var surname = info[1];
var email = info[2];
var phone = info[3];
var image = info[4];
var presName = info[5];
>the above are columns where the different pieces of data would be taken from for the placeholders in the Slides template
var slidesTemplateID = "would insert slides template URL ID here";
var slidesTemplate = SlidesApp.openById(slidesTemplateID);
var template = slidesTemplate.getSlides();
var folderID = "would insert desired folder ID for saving in here";
>the above should get me the Slides template
template.makeCopy(presName,DriveApp.getFolderById(folderID)); **>line where error occurred**
var newPresentation = DriveApp.getFilesByName(presName).next().getUrl();
var Presentation = SlidesApp.openByUrl(newPresentation);
>the above should create a copy and then open it
var shapes = (Presentation.getShapes());
shapes.forEach(function(shape){
shape.getText().replaceAllText('{{firstname}}',firstname);
shape.getText().replaceAllText('{{surname}}',surname);
shape.getText().replaceAllText('{{email}}',email);
shape.getText().replaceAllText('{{phone}}',phone);
shape.getText().replaceAllText('{{presname}}', presName)
});
>the above should replace all the placeholder tags in the template with the row data
}
});
}
Above is the code I have so far. The worksheet I am extracting data from has columns: first name, surname, email address, phone number, image (URL), and presentation name. When I try to run it I encounter an error on line 37 where it says template.makeCopy is not a function, however I am certain .makeCopy should be able to create a copy for it, no?
My main questions are:
1) What should I change to make it work, generating a new set slides for each row in the worksheet?
2) How can I add images to it replacing placeholder tags I've added in squares (not textboxes) in the template?
Thanks in advance!
Issue 1. makeCopy:
makeCopy(name, destination) is a method of the class File, which belongs to the Drive Service, not to the Slides Service. In your code, template is a list of Slides (you retrieve it by calling the method getSlides() from a Presentation). makeCopy cannot work here.
In order to make a copy of a Presentation, you should be using the Drive Service instead. You should replace these lines:
var slidesTemplate = SlidesApp.openById(slidesTemplateID);
var template = slidesTemplate.getSlides();
With this one:
var template = DriveApp.getFileById(slidesTemplateID);
Issue 2. Iterating through all shapes:
Next, you want to iterate through all shapes in your Presentation, and replace all placeholder tags with your desired text. In order to do that, you are using Presentation.getShapes(), which cannot work, since getShapes() is not a method of Presentation, but of Slide.
You should first iterate through all Slides in the Presentation, and for each Slide, iterate through all Shapes. You should replace these lines:
var shapes = (Presentation.getShapes());
shapes.forEach(function(shape){
// Replacing text lines
});
With these ones:
Presentation.getSlides().forEach(function(slide) {
slide.getShapes().forEach(function(shape) {
// Replacing text lines
})
});
Note:
In order to retrieve the copied presentation, you are currently doing this:
template.makeCopy(presName,DriveApp.getFolderById(folderID));
var newPresentation = DriveApp.getFilesByName(presName).next().getUrl();
var Presentation = SlidesApp.openByUrl(newPresentation);
There is no need to do this, you can just retrieve the ID of the created template, and open by ID, like this:
var copiedTemplate = template.makeCopy(presName,DriveApp.getFolderById(folderID));
var Presentation = SlidesApp.openById(copiedTemplate.getId());
Reference:
Slides Service
Drive Service

Word OpenXml Word Found Unreadable Content

We are trying to manipulate a word document to remove a paragraph based on certain conditions. But the word file produced always ends up being corrupted when we try to open it with the error:
Word found unreadable content
The below code corrupts the file but if we remove the line:
Document document = mdp.Document;
The the file is saved and opens without issue. Is there an obvious issue that I am missing?
var readAllBytes = File.ReadAllBytes(#"C:\Original.docx");
using (var stream = new MemoryStream(readAllBytes))
{
using (WordprocessingDocument wpd = WordprocessingDocument.Open(stream, true))
{
MainDocumentPart mdp = wpd.MainDocumentPart;
Document document = mdp.Document;
}
}
File.WriteAllBytes(#"C:\New.docx", readAllBytes);
UPDATE:
using (WordprocessingDocument wpd = WordprocessingDocument.Open(#"C:\Original.docx", true))
{
MainDocumentPart mdp = wpd.MainDocumentPart;
Document document = mdp.Document;
document.Save();
}
Running the code above on a physical file we can still open Original.docx without the error so it seems limited to modifying a stream.
Here's a method that reads a document into a MemoryStream:
public static MemoryStream ReadAllBytesToMemoryStream(string path)
{
byte[] buffer = File.ReadAllBytes(path);
var destStream = new MemoryStream(buffer.Length);
destStream.Write(buffer, 0, buffer.Length);
destStream.Seek(0, SeekOrigin.Begin);
return destStream;
}
Note how the MemoryStream is instantiated. I am passing the capacity rather than the buffer (as in your own code). Why is that?
When using MemoryStream() or MemoryStream(int), you are creating a resizable MemoryStream instance, which you will want in case you make changes to your document. When using MemoryStream(byte[]) (as in your code), the MemoryStream instance is not resizable, which will be problematic unless you don't make any changes to your document or your changes will only ever make it shrink in size.
Now, to read a Word document into a MemoryStream, manipulate that Word document in memory, and end up with a consistent MemoryStream, you will have to do the following:
// Get a MemoryStream.
// In this example, the MemoryStream is created by reading a file stored
// in the file system. Depending on the Stream you "receive", it makes
// sense to copy the Stream to a MemoryStream before processing.
MemoryStream stream = ReadAllBytesToMemoryStream(#"C:\Original.docx");
// Open the Word document on the MemoryStream.
using (WordprocessingDocument wpd = WordprocessingDocument.Open(stream, true)
{
MainDocumentPart mdp = wpd.MainDocumentPart;
Document document = mdp.Document;
// Manipulate document ...
}
// After having closed the WordprocessingDocument (by leaving the using statement),
// you can use the MemoryStream for whatever comes next, e.g., to write it to a
// file stored in the file system.
File.WriteAllBytes(#"C:\New.docx", stream.GetBuffer());
Note that you will have to reset the stream.Position property by calling stream.Seek(0, SeekOrigin.Begin) whenever your next action depends on that MemoryStream.Position property (e.g., CopyTo, CopyToAsync). Right after having left the using statement, the stream's position will be equal to its length.

Protractor - Create a txt file as report with the "Expect..." result

I'm trying to create a report for my scenario, I want to execute some validations and add the retults in a string, then, write this string in a TXT file (for each validation I would like to add the result and execute again till the last item), something like this:
it ("Perform the loop to search for different strings", function()
{
browser.waitForAngularEnabled(false);
browser.get("http://WebSite.es");
//strings[] contains 57 strings inside the json file
for (var i = 0; i == jsonfile.strings.length ; ++i)
{
var valuetoInput = json.Strings[i];
var writeInFile;
browser.wait;
httpGet("http://website.es/search/offers/list/"+valuetoInput+"?page=1&pages=3&limit=20").then(function(result) {
writeInFile = writeInFile + "Validation for String: "+ json.Strings[i] + " Results is: " + expect(result.statusCode).toBe(200) + "\n";
});
if (i == jsonfile.strings.length)
{
console.log("Executions finished");
var fs = require('fs');
var outputFilename = "Output.txt";
fs.writeFile(outputFilename, "Validation of Get requests with each string:\n " + writeInFile, function(err) {
if(err)
{
console.log(err);
}
else {
console.log("File saved to " + outputFilename);
}
});
}
};
});
But when I check my file I only get the first row writen in the way I want and nothing else, could you please let me know what am I doing wrong?
*The validation works properly in the screen for each of string in my file used as data base
**I'm a newbie with protractor
Thank you a lot!!
writeFile documentation
Asynchronously writes data to a file, replacing the file if it already
exists
You are overwriting the file every time, which is why it only has 1 line.
The easiest way would probably (my opinion) be appendFile. It writes to a file without overwriting existing data and will also create the file if it doesnt exist in the first place.
You could also re-read that log file, store that data in a variable, and re-write to that file with the old AND new data included in it. You could also create a writeStream etc.
There are quite a few ways to go about it and plenty of other answers
on SO specifically on those functions that can provide more info.
Node.js Write a line into a .txt file
Node.js read and write file lines
Final note, if you are using Jasmine you can also create a custom jasmine reporter. They have methods that contain exactly what you want (status Pass/Fail, actual vs expected values etc) and it's fairly easy to set up with Protractor

How to edit pasted content using the Open XML SDK

I have a custom template in which I'd like to control (as best I can) the types of content that can exist in a document. To that end, I disable controls, and I also intercept pastes to remove some of those content types, e.g. charts. I am aware that this content can also be drag-and-dropped, so I also check for it later, but I'd prefer to stop or warn the user as soon as possible.
I have tried a few strategies:
RTF manipulation
Open XML manipulation
RTF manipulation is so far working fairly well, but I'd really prefer to use Open XML as I expect it to be more useful in the future. I just can't get it working.
Open XML Manipulation
The wonderfully-undocumented (as far as I can tell) "Embed Source" appears to contain a compound document object, which I can use to modify the copied content using the Open XML SDK. But I have been unable to put the modified content back into an object that lets it be pasted correctly.
The modification part seems to work fine. I can see, if I save the modified content to a temporary .docx file, that the changes are being made correctly. It's the return to the clipboard that seems to be giving me trouble.
I have tried assigning just the Embed Source object back to the clipboard (so that the other types such as RTF get wiped out), and in this case nothing at all gets pasted. I've also tried re-assigning the Embed Source object back to the clipboard's data object, so that the remaining data types are still there (but with mismatched content, probably), which results in an empty embedded document getting pasted.
Here's a sample of what I'm doing with Open XML:
using OpenMcdf;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
...
object dataObj = Forms.Clipboard.GetDataObject();
object embedSrcObj = dateObj.GetData("Embed Source");
if (embedSrcObj is Stream)
{
// read it with OpenMCDF
Stream stream = embedSrcObj as Stream;
CompoundFile cf = new CompoundFile(stream);
CFStream cfs = cf.RootStorage.GetStream("package");
byte[] bytes = cfs.GetData();
string savedDoc = Path.GetTempFileName() + ".docx";
File.WriteAllBytes(savedDoc, bytes);
// And then use the OpenXML SDK to read/edit the document:
using (WordprocessingDocument openDoc = WordprocessingDocument.Open(savedDoc, true))
{
OpenXmlElement body = openDoc.MainDocumentPart.RootElement.ChildElements[0];
foreach (OpenXmlElement ele in body.ChildElements)
{
if (ele is Paragraph)
{
Paragraph para = (Paragraph)ele;
if (para.ParagraphProperties != null && para.ParagraphProperties.ParagraphStyleId != null)
{
string styleName = para.ParagraphProperties.ParagraphStyleId.Val;
Run run = para.LastChild as Run; // I know I'm assuming things here but it's sufficient for a test case
run.RunProperties = new RunProperties();
run.RunProperties.AppendChild(new DocumentFormat.OpenXml.Wordprocessing.Text("test"));
}
}
// etc.
}
openDoc.MainDocumentPart.Document.Save(); // I think this is redundant in later versions than what I'm using
}
// repackage the document
bytes = File.ReadAllBytes(savedDoc);
cf.RootStorage.Delete("Package");
cfs = cf.RootStorage.AddStream("Package");
cfs.Append(bytes);
MemoryStream ms = new MemoryStream();
cf.Save(ms);
ms.Position = 0;
dataObj.SetData("Embed Source", ms);
// or,
// Clipboard.SetData("Embed Source", ms);
}
Question
What am I doing wrong? Is this just a bad/unworkable approach?

OpenXml ChangeDocumentType

I need to convert a powerpoint template from potx to pptx. As seen here: http://www.codeproject.com/Tips/366463/Create-PowerPoint-presentation-using-PowerPoint-te I have tried with the following code. However the resulting pptx document is invalid and can't be opened by Office Powerpoint. If I skip the line newDoc.ChangeDocumentType then the resulting document is valid, but not converted to pptx.
templateContentBytes is a byte array containing the content of the potx document.
And temppath points to its local version.
using (var stream = new MemoryStream())
{
stream.Write(templateContentBytes, 0, templateContentBytes.Length);
using (var newdoc = PresentationDocument.Open(stream, true))
{
newdoc.ChangeDocumentType(PresentationDocumentType.Presentation);
PresentationPart presentationPart = newdoc.PresentationPart;
presentationPart.PresentationPropertiesPart.AddExternalRelationship(
"http://schemas.openxmlformats.org/officeDocument/2006/" + "relationships/attachedTemplate",
new Uri(tempPath, UriKind.Absolute));
presentationPart.Presentation.Save();
File.WriteAllBytes(tempPathResult, stream.ToArray());
I had the same problem, just move
File.WriteAllBytes(tempPathResult, stream.ToArray());
outside of the using