Office web addin - GetFileAsync is not giving proper data - ms-word

I am working on an Office Word web add-in. In it, I am using Office.context.document.getFileAsync(Office.FileType.Compressed, { sliceSize: 10000 }) to get the whole document content.
Once we have all the byte arrays from the slices, we convert them to a string, but the result is always unreadable, whatever content the document contains.
Please let me know how to overcome this.
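With Office.FileType.Compressed, the returned bytes are the document's Office Open XML package (a .docx file). That package is a ZIP archive, so decoding the bytes directly as text gives binary data starting with the ZIP signature "PK", not the document text. The Python sketch below shows what to do with the bytes instead: concatenate the slices in index order, check the ZIP signature, and read word/document.xml from the archive, or Base64-encode the bytes if they have to travel as a string. It is an illustration only. In the add-in the same steps would be written in JavaScript, and the function names here are hypothetical.

```python
import base64
import io
import zipfile

ZIP_SIGNATURE = b"PK\x03\x04"  # local file header that starts every ZIP archive

def assemble_slices(slices):
    """Concatenate slice payloads (lists of byte values) in slice-index order."""
    return b"".join(bytes(s) for s in slices)

def read_document_xml(package: bytes) -> str:
    """Return the main document part of a .docx package as XML text."""
    if not package.startswith(ZIP_SIGNATURE):
        raise ValueError("not a ZIP/OOXML package")
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return archive.read("word/document.xml").decode("utf-8")

def to_base64(package: bytes) -> str:
    """Encode the binary package as text, e.g. before posting it to a server."""
    return base64.b64encode(package).decode("ascii")
```

The text you are after is inside word/document.xml, in w:t elements. The raw bytes as a whole are never text.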

Related

Can OpenXML be used to launch a new Word instance?

I'm able to generate Word documents without issue. I save the resulting *.docx file to a temporary location and then need to launch the file in Word.
The requirement is not to "open" the file in Word (easily done with Process.Start) but to have it loaded into Word as a new, unsaved document. This is because certain proprietary integrations for Word need to take over when a user saves the file, and they don't kick in if the file is already saved to a location on disk.
I've achieved this by using Interop calls to the Word application, adding the new document to Word's workspace. My problem is with Interop, which tends to break on various client machines, particularly when Office upgrades take place (say a client had 32-bit Office but upgraded to a 64-bit version).
I'm somewhat new to OpenXML, but can it be used to automate Word, or is Interop my only real option?
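// Documents.Add(Template, NewTemplate, DocumentType, Visible): passing the
// generated .docx as the template creates a new, unsaved document based on it.
object oFilename = tmpFileName;
object oNewTemplate = false;   // create a document, not a template
object oDocumentType = 0;      // 0 = wdNewBlankDocument
object oVisible = true;
Document document = _application.Documents.Add(ref oFilename, ref oNewTemplate, ref oDocumentType, ref oVisible);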
No, the Open XML technology has no way of interacting with the Office (Word) application - it's for file creation/manipulation, only. The interop is required in order to do anything with the Word application.
There is a partial way around this, and it's only possible with Word (no other Office application has it): convert the Open XML content to the OPC flat-file (Flat OPC) format. This "concatenates" the various parts that make up the ZIP package into a pure text string, essentially a single XML file.
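As a rough illustration of that conversion, the Python sketch below walks a .docx ZIP package and writes each part as a pkg:part element of a single pkg:package document (namespace http://schemas.microsoft.com/office/2006/xmlPackage). It takes content types from [Content_Types].xml. This is a simplified sketch, not a complete converter. It assumes XML parts can be embedded as-is under pkg:xmlData and Base64-encodes everything else under pkg:binaryData. The function name is hypothetical. In the question's .NET setup the same conversion would be done in C# before handing the string to Range.InsertXML.

```python
import base64
import io
import zipfile
import xml.etree.ElementTree as ET

PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"

def docx_to_flat_opc(docx_bytes: bytes) -> str:
    """Convert a .docx package into one Flat OPC XML string (simplified)."""
    ET.register_namespace("pkg", PKG_NS)
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        types = ET.fromstring(archive.read("[Content_Types].xml"))
        # <Default> maps file extensions to content types ...
        defaults = {d.get("Extension").lower(): d.get("ContentType")
                    for d in types.findall(f"{{{CT_NS}}}Default")}
        # ... and <Override> maps individual part names.
        overrides = {o.get("PartName"): o.get("ContentType")
                     for o in types.findall(f"{{{CT_NS}}}Override")}

        package = ET.Element(f"{{{PKG_NS}}}package")
        for name in archive.namelist():
            if name == "[Content_Types].xml" or name.endswith("/"):
                continue  # content types become attributes; skip folder entries
            part_name = "/" + name
            base = name.rsplit("/", 1)[-1]
            ext = base.rsplit(".", 1)[1].lower() if "." in base else ""
            content_type = overrides.get(part_name) or defaults.get(ext, "application/octet-stream")
            part = ET.SubElement(package, f"{{{PKG_NS}}}part", {
                f"{{{PKG_NS}}}name": part_name,
                f"{{{PKG_NS}}}contentType": content_type,
            })
            data = archive.read(name)
            if content_type.endswith("xml"):
                # XML parts are embedded as elements under pkg:xmlData
                ET.SubElement(part, f"{{{PKG_NS}}}xmlData").append(ET.fromstring(data))
            else:
                # everything else (images, fonts, ...) becomes Base64 text
                ET.SubElement(part, f"{{{PKG_NS}}}binaryData").text = (
                    base64.b64encode(data).decode("ascii"))
    # Word-saved Flat OPC files typically start with this mso-application instruction
    return ('<?xml version="1.0" standalone="yes"?>\n'
            '<?mso-application progid="Word.Document"?>\n'
            + ET.tostring(package, encoding="unicode"))
```

Namespace prefixes inside the embedded parts may be renamed (for example to ns0) by ElementTree. That does not change the XML's meaning.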
XML content in the OPC flat-file format can then be written to an already opened (even newly created) Word document using the Range.InsertXML method via "the interop". In a way, this "streams" the Open XML content into the opened Word document.
The problem with this approach is that certain document-level properties are not written to the target document, so not all aspects of the opened document can be changed: for example page size, orientation, headers and footers. If these kinds of settings also need to be changed, interop is required for them.

How to make a section optional when mapped to optional data in a Word OpenXml Part?

I'm using the OpenXml SDK to generate Word 2013 files. I'm running on a server (as part of a server solution), so automation is not an option.
Basically, I have an XML file that is output from a backend system. Here's a very simplified example:
<my:Data
    xmlns:my="https://schemas.mycorp.com">
  <my:Customer>
    <my:Details>
      <my:Name>Customer Template</my:Name>
    </my:Details>
    <my:Orders>
      <my:Count>2</my:Count>
      <my:OrderList>
        <my:Order>
          <my:Id>1</my:Id>
          <my:Date>19/04/2017 10:16:04</my:Date>
        </my:Order>
        <my:Order>
          <my:Id>2</my:Id>
          <my:Date>20/04/2017 10:16:04</my:Date>
        </my:Order>
      </my:OrderList>
    </my:Orders>
  </my:Customer>
</my:Data>
Then I use Word's XML Mapping pane to map this data to content controls.
I simply duplicate the Word file and write new XML data when generating new files.
This is working as expected: when I update the XML part, the document reflects the data from my backend.
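The question's own injection step is a PowerShell script that is not reproduced here. As a hedged sketch of the same idea in Python, the function below copies a .docx package and swaps the contents of the custom XML part whose root element is in a given namespace (here https://schemas.mycorp.com). It assumes the part lives under customXml/, where Word normally stores custom XML data parts (for example customXml/item1.xml). The function name is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def replace_custom_xml(docx_bytes: bytes, new_xml: bytes, namespace: str) -> bytes:
    """Copy a .docx package, swapping the custom XML part whose root is in `namespace`."""
    out = io.BytesIO()
    replaced = False
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            data = src.read(name)
            # Data parts are customXml/itemN.xml; itemPropsN.xml holds their properties.
            is_item = (name.startswith("customXml/item")
                       and not name.startswith("customXml/itemProps")
                       and name.endswith(".xml"))
            if is_item and ET.fromstring(data).tag.startswith("{" + namespace + "}"):
                data, replaced = new_xml, True
            dst.writestr(name, data)  # every other part is copied unchanged, in order
    if not replaced:
        raise ValueError("no custom XML part found for namespace " + namespace)
    return out.getvalue()
```

Matching on the root namespace rather than a fixed item number avoids depending on which itemN.xml Word happened to assign.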
However, there's one case that does not work: if a customer has no orders, the template content is kept in the document. The XML data is:
<my:Data
    xmlns:my="https://schemas.mycorp.com">
  <my:Customer>
    <my:Details>
      <my:Name>Some customer</my:Name>
    </my:Details>
    <my:Orders>
      <my:Count>0</my:Count>
      <my:OrderList>
      </my:OrderList>
    </my:Orders>
  </my:Customer>
</my:Data>
(Note the empty order list.)
In Word, the XML Mapping pane reflects the correct data (there is no Order node), but the template content is still there.
Basically, I'd like to hide the order list when there are no orders (or at least show an empty table).
How can I do that?
PS: If it helps, I uploaded the Word and XML files and a small PowerShell script that injects the data: repro.zip
Thanks for sharing your files so we can better help you.
I had a difficult time trying to solve your problem with your existing Word content controls, XML files and the PowerShell script that adds the XML to the Word document. I found what seemed to be Microsoft's VSTO example solution to your problem, but I couldn't get it to work cleanly.
I was, however, able to write a simple C# console application that generates a Word file based on your XML data. The OpenXML code that builds the Word file was generated with the Open XML SDK Productivity Tool. I then added some logic to read your XML file and generate the rows of the second table dynamically, depending on how many orders there are in the data. I have uploaded the code for you to use if you are interested in this solution. Note: the XML data file should be in c:\temp, and the generated Word files will also be written to c:\temp.
Another bonus of this solution: if you put all of the customer data into one XML file, the application will create a separate Word file for each customer in your temp directory, like so:
customer_<name1>.docx
customer_<name2>.docx
customer_<name3>.docx
etc.
Here is the document generated from the first xml file
Here is the document generated from the second xml file with the empty row
Hope this helps.
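The core idea of that answer is to generate one table row per order, and nothing at all when the order list is empty. That idea can be sketched in Python with only the standard library. This is not the uploaded C# code. The helpers are hypothetical and emit a bare WordprocessingML w:tbl fragment without table properties, grid or styling, which a real document would need. The fragment would still have to be placed into the document body. The code reads my:Order elements in the question's data format.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

MY_NS = {"my": "https://schemas.mycorp.com"}
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def read_orders(data_xml: str):
    """Return (id, date) tuples for every my:Order in the backend data."""
    root = ET.fromstring(data_xml)
    return [(o.findtext("my:Id", "", MY_NS), o.findtext("my:Date", "", MY_NS))
            for o in root.findall("my:Customer/my:Orders/my:OrderList/my:Order", MY_NS)]

def _cell(text: str) -> str:
    """One table cell containing a single paragraph with a single run of text."""
    return f"<w:tc><w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p></w:tc>"

def orders_table(data_xml: str) -> str:
    """Build a w:tbl with a header row plus one row per order, or "" if there are none."""
    orders = read_orders(data_xml)
    if not orders:
        return ""  # omit the whole section instead of leaving template rows behind
    rows = ["<w:tr>" + _cell("Id") + _cell("Date") + "</w:tr>"]
    rows += ["<w:tr>" + _cell(i) + _cell(d) + "</w:tr>" for i, d in orders]
    return f'<w:tbl xmlns:w="{W_NS}">' + "".join(rows) + "</w:tbl>"
```

Because the table is built from the data rather than bound to it, an empty my:OrderList simply produces no table, which is the behavior the question asks for.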

itextsharp PDF to text dump

I am looking for a way to get the contents of the file itself dumped in its text format. I don't want a dictionary object, and I don't want some sort of extraction strategy option; I just want the same text document that iTextSharp uses to parse... the WHOLE thing as a string or StringBuilder...
I have not yet found a way to do this using any tools whatsoever. My problem is that I am trying to read a dynamic PDF into a C# application, and we all know that those darn dynamic PDFs can't be parsed by iTextSharp (AcroForm and AcroFields always come up empty). So I figured that if I can get the actual text dump of the entire file, I can see what it looks like and parse it myself for this specific task (e.g., make a class for each document I know I can receive, and build a map there based on what I see).
If anyone can help me do that, or even better, find a way in C# to extract the XML source of the PDF instead (like clicking the XML Source tab in LiveCycle), it would be greatly appreciated.
Thanks!
Matt
If you are looking for the actual operators and commands of each page in raw text format, try the following code:
using System.IO;
using iTextSharp.text.pdf;

var reader = new PdfReader("test.pdf");
int intPageNum = reader.NumberOfPages;
// Page numbers are 1-based; GetPageContent returns the bytes of the
// page's content stream, which are written to one file per page.
for (int i = 1; i <= intPageNum; i++)
{
    byte[] contentBytes = reader.GetPageContent(i);
    File.WriteAllBytes("page-" + i + ".txt", contentBytes);
}
reader.Close();
I am looking for a way to actually get the contents of the file itself, in its text format, dumped. E.g.: i don't want a dictionary object, i don't want some sort of extractionstrategy option, i just want the same text document that itextsharp uses to parse... the WHOLE thing as a string or stringbuilder...
Unfortunately, the data iTextSharp parses is not yet text: the operators in that data are given in a textual format, but the actual glyphs may be given in a completely arbitrary, ad hoc encoding. That being said, some standard encoding is often used, as it is the simplest solution for the components involved. You cannot count on that in general, though. The answer by VahidN shows you how to access the starting points for that content; quite often, though, the page content data he extracts only contains references to resources that are stored in different objects.
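To make that distinction concrete, here is a hedged Python sketch (not iTextSharp code). It scans an already-decompressed page content stream, such as the page-N.txt files written by the GetPageContent loop in the earlier answer, and collects the string operands of the text-showing operators Tj, TJ, ' and ". The operators are readable ASCII, but the operands come back as raw byte codes. Mapping those codes to characters depends on each font's encoding, which is defined in resources outside the content stream. The function names are hypothetical and the tokenizer is deliberately simplified: it handles literal strings with common escapes and balanced parentheses, and hex strings, but not inline images or line-continuation escapes.

```python
import re

_WORD = re.compile(rb"<<|>>|/?[^\s()<>\[\]{}/%]+")
_ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f"}

def _read_string(data: bytes, pos: int):
    """Read a literal (...) or hex <...> string at data[pos]; return (bytes, next_pos)."""
    if data[pos:pos + 1] == b"<":
        end = data.index(b">", pos)
        digits = re.sub(rb"\s", b"", data[pos + 1:end])
        if len(digits) % 2:
            digits += b"0"  # an odd final digit is treated as followed by 0
        return bytes.fromhex(digits.decode("ascii")), end + 1
    out, depth, pos = bytearray(), 1, pos + 1
    while pos < len(data):
        ch = data[pos:pos + 1]
        if ch == b"\\":
            octal = re.match(rb"[0-7]{1,3}", data[pos + 1:pos + 4])
            if octal:  # \ddd octal character code
                out.append(int(octal.group(), 8) & 0xFF)
                pos += 1 + len(octal.group())
            else:
                nxt = data[pos + 1:pos + 2]
                out += _ESCAPES.get(nxt, nxt)  # \( \) \\ stand for themselves
                pos += 2
            continue
        if ch == b"(":
            depth += 1
        elif ch == b")":
            depth -= 1
            if depth == 0:
                return bytes(out), pos + 1
        out += ch
        pos += 1
    raise ValueError("unterminated string")

def shown_strings(content: bytes):
    """Return the raw operands of the Tj, TJ, ' and " operators, in stream order."""
    shown, operands, pos = [], [], 0
    while pos < len(content):
        ch = content[pos:pos + 1]
        if ch.isspace() or ch in (b"[", b"]"):
            pos += 1  # whitespace and TJ array brackets carry no text
        elif ch == b"%":
            end = content.find(b"\n", pos)
            pos = len(content) if end == -1 else end + 1  # skip a comment
        elif ch == b"(" or (ch == b"<" and content[pos + 1:pos + 2] != b"<"):
            raw, pos = _read_string(content, pos)
            operands.append(raw)
        else:
            m = _WORD.match(content, pos)
            if m is None:
                pos += 1  # unexpected delimiter such as a stray ')'
                continue
            word, pos = m.group(), m.end()
            if word in (b"Tj", b"TJ", b"'", b'"'):
                shown.extend(operands)  # text-showing operator: keep its strings
                operands = []
            elif word[:1].isalpha():
                operands = []  # any other operator discards pending strings
    return shown
```

For a font with a simple standard encoding the returned bytes often look like readable text; for embedded subset fonts they usually do not, which is exactly the limitation described above.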
my problem is that i am trying to read a dynamic PDF into a C# application... and we all know that those darn dynamic PDFs can't be parsed by iTextSharp (AcroForm and AcroFields always comes up empty),
This sounds as if you actually have a completely different task at hand. Dynamic forms and their contents are not part of the page content but instead stored in a separate XML Forms Architecture stream.
iText in Action, 2nd edition, chapter 8, gives you some information on how to access the XFA stream data; for a first glimpse, look at the sample XfaMovie.cs.
You might also want to look at the iText XML Worker project for easier manipulation of XFA streams.
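Once the XFA XML has been extracted from the PDF (for example by following the chapter mentioned above), the filled-in form values normally live in the datasets packet. As a hedged sketch in Python, assuming you already have the complete XFA XML (the xdp:xdp document) as a string, the hypothetical helpers below return the form data element under xfa:datasets/xfa:data and flatten it into path/value pairs. The namespace URI is the standard XFA data namespace; check it against your own files.

```python
import xml.etree.ElementTree as ET

XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/"

def xfa_form_data(xdp_xml: str):
    """Return the top-level element under xfa:datasets/xfa:data, or None if absent."""
    root = ET.fromstring(xdp_xml)
    data = root.find(f"{{{XFA_DATA_NS}}}datasets/{{{XFA_DATA_NS}}}data")
    if data is None or len(data) == 0:
        return None
    return data[0]

def flatten_fields(element, prefix=""):
    """Map slash-separated element paths to their text, e.g. 'form1/Name'.

    Repeated elements (such as repeating subforms) overwrite each other here.
    """
    path = f"{prefix}/{element.tag}" if prefix else element.tag
    children = list(element)
    if not children:
        return {path: (element.text or "").strip()}
    fields = {}
    for child in children:
        fields.update(flatten_fields(child, path))
    return fields
```

A path-to-value map like this is one way to build the per-document mapping the question describes, without parsing page content at all.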
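If you just want to dump the text, try this:
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

PdfReader reader = new PdfReader(pdfFileName);
StringBuilder text = new StringBuilder();
int nPages = reader.NumberOfPages;
// Page numbers in iTextSharp are 1-based
for (int i = 1; i <= nPages; i++)
{
    text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
}
reader.Close();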

Trying to figure out what {s: ;} tags mean and where they come from

I am working on migrating posts from the RightNow infrastructure to another service called ZenDesk. I noticed that whenever users added files or even URL links, the XML data I pull from RightNow contains a lot of weird codes like this:
{s:3:""url"";s:45:""/files/56f5be6c1/MUG_presso.pdf"";s:4:""name"";s:27:""MUG presso.pdf"";s:4:""size"";s:5:""2.1MB"";}
It wasn't too hard to write something that parses them into normal URLs and links, but I was wondering whether this is something specific to the RightNow service or a general tag system. I tried googling for this but got some weird results, so I thought Stack Overflow might have someone who has run into this.
So, does anyone know what these {s: ;} tags are called and whether there are any particular tools for reading them?
Any answers appreciated!
This resembles partial PHP serialized data, as returned by the serialize() call. It looks like someone may have turned each " into "", which could prevent it from parsing properly. If the {s: section is preceded by text like the following, it's almost certainly PHP:
a:6:{i:1;a:10:{s:
These letters/numbers mean things like "an array with six elements follows", "a string of length 20 follows", etc.
You can use any PHP instance with unserialize() to handle the data. If those doubled quotes are indeed returned by the API, you might need to replace :"" with :" and ""; with "; before parsing.
Parsing modules exist for other languages like Python. You can find more information in this answer.
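To illustrate the format described above, here is a minimal, hedged Python sketch of a parser for the common type codes (N, b, i, d, s and a). It is not PHP's own implementation and skips objects (O:) and references. String lengths are byte counts and the parser trusts them, so input whose lengths don't match is rejected. The sample in the question appears to have been edited: s:45 and s:27 do not match the lengths of the strings that follow them (31 and 14 bytes). The fragment would also need its a:3: prefix (three key/value pairs) to parse. The helper names are hypothetical.

```python
def fix_doubled_quotes(data: str) -> str:
    """Undo the doubled quoting seen in the export: :""x""; becomes :"x";."""
    return data.replace(':""', ':"').replace('"";', '";')

def php_unserialize(data: str):
    """Parse PHP serialize() output containing N, b, i, d, s and a values."""
    raw = data.encode("utf-8")  # s:<n> counts bytes, not characters
    value, pos = _parse(raw, 0)
    if pos != len(raw):
        raise ValueError(f"trailing data at byte {pos}")
    return value

def _parse(raw: bytes, pos: int):
    kind = raw[pos:pos + 1]
    if kind == b"N":                      # N;
        return None, pos + 2
    if kind in (b"b", b"i", b"d"):        # b:1;  i:42;  d:0.5;
        end = raw.index(b";", pos)
        text = raw[pos + 2:end].decode("ascii")
        if kind == b"b":
            return text == "1", end + 1
        return (int(text) if kind == b"i" else float(text)), end + 1
    if kind in (b"s", b"a"):
        colon = raw.index(b":", pos + 2)
        count = int(raw[pos + 2:colon])
        pos = colon + 2                   # skip the ':"' or ':{' after the count
        if kind == b"s":                  # s:3:"url";
            value = raw[pos:pos + count].decode("utf-8")
            if raw[pos + count:pos + count + 2] != b'";':
                raise ValueError(f"string length mismatch near byte {pos}")
            return value, pos + count + 2
        result = {}                       # a:2:{key;value;key;value;}
        for _ in range(count):
            key, pos = _parse(raw, pos)
            result[key], pos = _parse(raw, pos)
        if raw[pos:pos + 1] != b"}":
            raise ValueError(f"expected '}}' at byte {pos}")
        return result, pos + 1
    raise ValueError(f"unsupported type {kind!r} at byte {pos}")
```

For example, php_unserialize('a:1:{s:3:"url";s:5:"x.pdf";}') gives {'url': 'x.pdf'}.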

Do we have any Equivalent of Response.AppendHeader in windows application

I came across this technique for converting a DataTable to Excel:
http://www26.brinkster.com/mvark/dyna/downloadasexcel.html
Do we have any equivalent of Response.AppendHeader in a Windows application in C#?
Regards
Hema
The trick in the code sample you mentioned for dynamically generating an Excel file is based on the fact that documents can be converted from Word/Excel to HTML (File -> Save As) and vice versa. Essentially, an HTML page containing Office XML is created, and in a web application a file download is triggered with the help of the following Response.AppendHeader statements:
Response.AppendHeader("Content-Type", "application/vnd.ms-excel");
Response.AppendHeader("Content-disposition", "attachment; filename=my.xls");
If you want to use this technique in a WinForms application, there are no HTTP headers to set: just save the string content as a text file with an ".xls" extension. Replace the last three lines of the sample's Page_Load method with this line:
System.IO.File.WriteAllText(@"C:\Report.xls", strBody);
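The same idea outside ASP.NET can be sketched in Python: build an HTML table from a header row plus data rows and write it to a file with an .xls extension, which Excel opens as a spreadsheet. This is a hypothetical helper under those assumptions, not the linked sample's code. HTML-escaping the cell values keeps characters such as < and & from breaking the markup. Newer Excel versions may warn that the file's format and extension don't match before opening it.

```python
import html
import os
import tempfile

def table_to_excel_html(columns, rows) -> str:
    """Render a header row plus data rows as an HTML table that Excel can open."""
    head = "".join(f"<th>{html.escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return ('<html><head><meta charset="utf-8"></head><body>'
            f'<table border="1"><tr>{head}</tr>{body}</table></body></html>')

def save_as_xls(path: str, columns, rows) -> None:
    """Desktop counterpart of the download: no headers, just a file named *.xls."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(table_to_excel_html(columns, rows))

if __name__ == "__main__":
    out = os.path.join(tempfile.gettempdir(), "Report.xls")
    save_as_xls(out, ["Name", "Amount"], [["A & B", 10], ["<C>", 2.5]])
    print("wrote", out)
```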
HTH