Parsing OpenXML document [Windows Store Apps/Metro Apps] - microsoft-metro

I would rather not reinvent the wheel for Windows Store Apps. Is there a readily available way to parse an OpenXML document quickly in a Windows Store App?
I have been using the OpenXML SDK (currently 2.5), but it depends on WindowsBase, which I can't reference in my project.
I read somewhere that I should be looking at Windows.IO.Packaging. I understand that an OpenXML document is a container of files, but working at that level already feels like reinventing the wheel to me: I would need to go down to the file access level to read the WorkbookPart and WorksheetPart (and so on).
Thanks!~

As it turned out, I took the chance to learn OpenXML and was able to parse the .xlsx file on my own. It wasn't as difficult as I initially feared. With ZipArchive to locate the corresponding sheet and LINQ to XML to read it, a custom ExcelReader class takes about an hour to write.
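The same idea (open the package as a ZIP, then read the sheet XML) can be sketched in Python for illustration. This is not the ExcelReader class from the answer. The function names are hypothetical, and the default part path xl/worksheets/sheet1.xml is an assumption: a real reader would resolve the sheet through xl/workbook.xml and its relationships file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the SpreadsheetML parts inside an .xlsx package.
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}


def read_sheet(xlsx_bytes, sheet_path="xl/worksheets/sheet1.xml"):
    """Return {cell reference: text} for one worksheet part (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as package:
        # Shared strings are stored once and referenced by index from cells.
        shared = []
        if "xl/sharedStrings.xml" in package.namelist():
            root = ET.fromstring(package.read("xl/sharedStrings.xml"))
            for si in root.findall("m:si", NS):
                shared.append("".join(t.text or "" for t in si.iter("{%s}t" % MAIN)))
        sheet = ET.fromstring(package.read(sheet_path))

    cells = {}
    for c in sheet.iter("{%s}c" % MAIN):
        v = c.find("m:v", NS)
        if v is None:
            continue  # empty cell
        text = v.text or ""
        if c.get("t") == "s":  # t="s" means the value is a shared-string index
            text = shared[int(text)]
        cells[c.get("r")] = text
    return cells


def build_sample_xlsx():
    """Build a tiny in-memory package with just the two parts read above."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("xl/sharedStrings.xml",
                   '<sst xmlns="%s"><si><t>Fruit</t></si></sst>' % MAIN)
        z.writestr("xl/worksheets/sheet1.xml",
                   '<worksheet xmlns="%s"><sheetData><row r="1">'
                   '<c r="A1" t="s"><v>0</v></c><c r="B1"><v>42</v></c>'
                   '</row></sheetData></worksheet>' % MAIN)
    return buf.getvalue()
```

Calling read_sheet(build_sample_xlsx()) yields the cell map for the sample. Inline strings, dates and formulas are not handled in this sketch.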

Related

Can I convert .docx Word documents using the DocX .NET Library?

I am currently attempting to convert a couple of .NET desktop applications that I have developed into a web application harnessing AngularJS and RESTful services.
One of the key components of these applications is in their ability to generate Word documents on the fly using a .dotx Word template. I am currently exploring the possibility of using a third party library called DocX to generate these Word documents without resorting to using a template.
I guess my question is: Can I use this library to read an existing Word document in .docx format and generate a source code representation of the document? If this is possible could someone point me in the direction of any code samples that I could use? I have looked around and have been unable to find anything that could help me get started.
Generating a code representation of the document and using it with DocX seems like a time-consuming effort to me. Why not use a template instead and fill it with data at runtime?
I have some experience with Docentric, which is a third-party OpenXML toolkit. It features a Word add-in for template design and libraries for document generation and manipulation. It took me less than a week to generate pretty complex documents. If I were in your shoes, I would definitely try some third-party toolkits. They cost money but save time, so do the math and see if they can be useful for you.
It is possible to read an existing Word document in .docx format with the following code:
DocX document = DocX.Load(filename);
However, it is not possible to generate a source code representation of a document.
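For readers who only need to inspect what an existing .docx contains, a minimal Python sketch can read the paragraph text straight from the package, without DocX. The function names are hypothetical. It reads only word/document.xml and ignores headers, footers and formatting. Paragraphs nested inside text boxes may be reported more than once.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML main document part.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(docx_bytes):
    """Return the text of each w:p element in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        body = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for p in body.iter("{%s}p" % W):
        # A paragraph's text is split across runs; join every w:t inside it.
        paragraphs.append("".join(t.text or "" for t in p.iter("{%s}t" % W)))
    return paragraphs


def build_sample_docx():
    """Build a tiny in-memory package containing only the main document part."""
    xml = ('<w:document xmlns:w="%s"><w:body>'
           '<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>'
           '<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>'
           '</w:body></w:document>' % W)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", xml)
    return buf.getvalue()
```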

Building an automated script to copy a page of text, translate it in Google Translator and paste it in Word

So, as the title says, I would like to make an automated script that is going to take all the text from one PDF page, copy it, paste it into Google Translate and then copy the translated text into another Microsoft Word document.
Since that PDF has a lot of pages (150+), I thought it may be easier to make an automated script to do that.
What language should I use, would it be complicated, and would I actually save time with this script, given that I have to learn the language first? I have some programming experience (I know C++, JavaScript, PHP), but I do not have a strong grasp of algorithms (like flood fill).
Thanks in advance!
EDIT: I found that I could use AutoIt for scripting, but I don't know whether I would be better off using AutoIt or PowerShell. I also want to learn something that would enable me to create other scripts (for example, to automate some processes I do in Camtasia Studio). So, AutoIt or PowerShell?
As an AutoIt user I would say AutoIt.
Copying text out of PDFs is not quite as simple as you might imagine. Your mileage will vary depending on how the PDF was created, and there are several methods you can use:
Most PDFs will have most of the text in the file itself, allowing you to get the text using a simple method like this
This method uses zlib on the PDF, presumably to decompress its content streams, though I'm not sure as I've never tried it.
There are a variety of examples of using third-party programs to do this, which may be better. There is one using Debenu and another using XPDF.
Automating other programs such as Acrobat should be possible. Acrobat has an API that can be used, though I'm not aware of it already being wrapped in AutoIt.
As for the rest of the requirements, there is a UDF to translate with Google Translate here, and the Word UDF is a standard one that comes with the AutoIt installation.
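To give an idea of what a zlib-based method might be doing, here is a rough Python sketch (not the AutoIt code referred to above). It assumes the page text sits in Flate-compressed content streams and is drawn with simple `(text) Tj` operators. The function name is hypothetical. Escaped parentheses, TJ arrays, custom font encodings and other stream filters are not handled, which is why results vary so much between PDFs.

```python
import re
import zlib


def extract_flate_text(pdf_bytes):
    """Decompress Flate streams and collect literal strings shown with Tj."""
    found = []
    for match in re.finditer(rb"stream\r?\n(.*?)endstream", pdf_bytes, re.S):
        try:
            # decompressobj tolerates the end-of-line bytes before "endstream".
            content = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not a Flate stream (image data, another filter, ...)
        for literal in re.findall(rb"\((.*?)\)\s*Tj", content, re.S):
            found.append(literal.decode("latin-1"))
    return found


def build_sample_pdf_fragment():
    """A fragment that only imitates one PDF stream object; not a valid PDF."""
    content = zlib.compress(b"BT /F1 12 Tf (Hello) Tj (World) Tj ET")
    header = b"1 0 obj\n<< /Length %d /Filter /FlateDecode >>\n" % len(content)
    return b"%PDF-1.4\n" + header + b"stream\n" + content + b"\nendstream\nendobj\n"
```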

docx - markup / markup - docx conversion

I have to store some documents in the docx format, but I can't stand using MS Word. I would like to edit some kind of plain-text markup (anything except XML-based formats, which I don't like either) and convert between that markup and docx.
Are there any options for this?
EDIT: since people think this is not programming related, I'll extend my question. What libraries do you suggest for writing a complete TeX-to-docx/docx-to-TeX converter?
If you're talking .NET, I'd check out the OpenXML toolkit first. There are lots of "libraries" on the internet to do this, but they all seem to be just thin wrappers around the OpenXML stuff.
You might also check out
http://openxmldeveloper.org/
Aspose.Words for .NET allows you to create DOCX files from scratch using text or other content and then convert DOCX files to text etc. It doesn't require MS Office to be installed on the system. And the component is a simple .NET assembly with an easy to learn and implement API. Please try and see if it helps in your scenario.
Disclosure: I work as developer evangelist at Aspose.
You can try the DocxEditorKit http://java-sl.com/docx_editor_kit.html
Set the editor kit to JEditorPane, add styled text and store the document in docx format.

Is there an alternative to open-xml sdk to generate word documents

I'm trying to generate Word documents using the Open XML SDK. When the documents are small this is no problem (and rather easy). When the documents become larger (500+ pages), I notice that performance (duration, memory usage, ...) degrades significantly.
Googling this problem, I came across some posts that point out the same issue. For Excel there is a solution in SpreadsheetGear.
I would like to know if there is a Word alternative to this, or if there are other solutions for generating Word documents.
Thanks,
Jelle
I've written a blog post series on generating Open XML WordprocessingML documents. The approach that I take is that you create a template Word document, insert content controls, and then write XPath expressions in those content controls to specify the XML to pull from a source XML data file. I've also explored another approach where you write C# code in Open XML content controls. That approach also works.
http://ericwhite.com/blog/map/generating-open-xml-wordprocessingml-documents-blog-post-series/
-Eric
You might look at http://docx.codeplex.com/
On Java, you could use docx4j. If you were brave, you could create DLLs for it via IKVM...
I decided to go with Aspose.Words. It is really fast and not very demanding on resources (CPU, memory). Its disadvantage is that it is quite expensive. I also investigated SoftArtisans OfficeWriter. The possibilities are the same, but because the company I'm currently working for already used other Aspose components, we decided to go with Aspose.Words.

Own data format for the iPhone

I would like to create my own data format for an iPhone app. The files should be structured similarly to, e.g., Apple's iWork files (.pages). That means I have a folder with some files in it:
The file 'Juicy.fruit' contains:
Fruits
---> Apple.xml
---> Banana.xml
---> Pear.xml
---> PreviewPicture.png
This folder "Fruits" should be packed into a handy file 'Juicy.fruit'. Compression isn't necessary. How could I achieve this? I've discovered some open source ZIP libraries. However, I would like to build my own data format with the iPhone's built-in libs (if possible).
Best regards,
Stefan
Okay, so there are three ways I can read your question. Here's my best guess on each one:
1. You want your .fruit files to be associated with your app via Safari/SMS/some network connection (i.e. when someone wants to download files made for your app or made by your app).
In this case, you can register a protocol for your app, as discussed here:
iPhone file extension app association
2. You want the iPhone to globally associate .fruit files with your app, in which case you want to look into Uniform Type Identifiers. Basically, you set up this association in your app's Info.plist file.
3. You want to know how you can go from having a folder with files in it to that folder being a single file (package) with your .fruit extension.
If that's the case, there are many options out there and I don't see a purpose in rolling your own. Both Microsoft and Adobe simply use a standard zip compression method with their own extension (instead of .zip). If you drop any Office 2007 document, such as a .docx, or Adobe's experimental .pdfxml file into an archive utility (I like 7z, but any decent one will do), you will get a folder with several XML files, just like you're describing for your situation. (This is also how Java's jar file type works, FYI.) So unless you have a great reason to avoid standard compression methods (I vote gzip), I would follow the industry lead on this one.
I can definitely appreciate the urge to go DIY at every level possible, but you're basically asking (if it's #3) how you can create your own packaging algorithm, and after reading how some of the most basic compression methods work, I would leave that one alone. Plus, I really doubt that Apple has built-in libraries for doing something that most people will just use standard methods for.
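To make the "ZIP container with your own extension" idea concrete, here is a minimal Python sketch. It is an illustration only: an iPhone app would need an Objective-C ZIP library, and the function names are hypothetical. ZIP_STORED is used because the question says compression isn't necessary.

```python
import os
import tempfile
import zipfile


def pack_folder(folder, package_path):
    """Store every file under `folder` in one uncompressed ZIP container."""
    parent = os.path.dirname(os.path.abspath(folder))
    with zipfile.ZipFile(package_path, "w", compression=zipfile.ZIP_STORED) as package:
        for root, _dirs, files in os.walk(folder):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Member names are relative to the parent, e.g. "Fruits/Apple.xml".
                member = os.path.relpath(os.path.abspath(full), parent)
                package.write(full, member.replace(os.sep, "/"))


def read_member(package_path, member):
    """Pull a single file out of the package without unpacking the rest."""
    with zipfile.ZipFile(package_path) as package:
        return package.read(member)


def demo():
    """Create a sample Fruits folder, pack it as Juicy.fruit, read one file back."""
    base = tempfile.mkdtemp()
    folder = os.path.join(base, "Fruits")
    os.mkdir(folder)
    for name in ("Apple.xml", "Banana.xml", "Pear.xml"):
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("<fruit/>")
    package_path = os.path.join(base, "Juicy.fruit")
    pack_folder(folder, package_path)
    return read_member(package_path, "Fruits/Banana.xml")
```

Because the ZIP central directory indexes each member, a single file such as Banana.xml can be read without loading the others.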
One last note:
If you are really gunning to do it from scratch (still suggest not), since your files are all XML, you could just create a new XML file that will act as a wrapper of sorts, and have each file go into that wrapper file. But this would be really redundant when it came time to unwrap, as it would have to load the whole file every time. But it would be something like:
Juicy.fruit --
<fruit-wrapper>
  <fruit>
    <apple>
      ... content from apple.xml
    </apple>
  </fruit>
  <fruit>
    <banana>
      ... content from banana.xml
    </banana>
  </fruit>
  <fruit>
    <pear>
      ... content from pear.xml
    </pear>
  </fruit>
  <picture>
    ...URL-encoded binary of preview picture
  </picture>
</fruit-wrapper>
But with this idea, you either have to choose to unpack it, and thus risk losing track of the files, overwriting some but not all, etc etc, or you always treat it like one big file, in which case, unlike with archives, you have to load all of the data each time to pull anything out, instead of just pulling the file you want from the archive.
But it could work, if you're determined.
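For the determined, the single-wrapper idea could look roughly like this Python sketch. The function names are hypothetical, and the picture is Base64-encoded here instead of URL-encoded, which is a substitution on my part because Base64 is the more common way to embed binary data in XML.

```python
import base64
import xml.etree.ElementTree as ET


def wrap(xml_parts, picture_bytes):
    """Combine several XML documents and one binary picture into one wrapper."""
    wrapper = ET.Element("fruit-wrapper")
    for name, xml_text in xml_parts.items():
        fruit = ET.SubElement(wrapper, "fruit")
        holder = ET.SubElement(fruit, name)
        holder.append(ET.fromstring(xml_text))  # embed the original document
    picture = ET.SubElement(wrapper, "picture")
    picture.text = base64.b64encode(picture_bytes).decode("ascii")
    return ET.tostring(wrapper, encoding="unicode")


def unwrap_picture(wrapper_text):
    """Recover the picture; note that the whole wrapper has to be parsed first."""
    root = ET.fromstring(wrapper_text)
    return base64.b64decode(root.findtext("picture"))
```

The drawback described above is visible in unwrap_picture: getting one item back out means parsing the entire file.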
Also, if you are interested, there is a binary encoding of XML designed for mobile called WBXML (WAP Binary XML). I'm not sure if it is still taken seriously, but if there is an iPhone library for it, it may be worth researching.