Conversion to PDF - iphone

I want to do a project to convert different file formats to PDF. The file formats are:
o Images: .jpg, .tiff, .gif
o Microsoft Word: .doc and .docx
o Microsoft PowerPoint: .ppt and .pptx
o Microsoft Excel: .xls and .xlsx
o Web pages: .htm and .html
o Apple Keynote: .key
o Apple Numbers: .numbers
o Apple Pages: .pages
o Text: .txt
Images and text can be converted to PDF and then stored, but can the other file formats be converted as well? Can anyone help with a link or something that explains how to do this? Any help is appreciated.
Thanks

Yes. .xlsx, .docx and .pptx files are XML files packaged in a ZIP archive (http://en.wikipedia.org/wiki/Office_Open_XML), so you can parse and render the files yourself.
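To see that structure for yourself, here is a minimal Python sketch (standard library only; the helper name `list_ooxml_parts` is illustrative, not from the answer) that opens the package bytes with `zipfile` and lists the XML parts inside. Actually rendering those parts to PDF is a much larger job and is not attempted here.

```python
import io
import zipfile
from typing import List


def list_ooxml_parts(data: bytes) -> List[str]:
    """Return the names of the parts inside an Office Open XML package.

    .docx, .xlsx and .pptx files are ZIP archives, so the standard
    zipfile module can open them directly from their bytes.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return package.namelist()
```

For a real .docx you would expect entries such as `[Content_Types].xml` and `word/document.xml`; an .xlsx keeps its workbook parts under `xl/`.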

Related

Read the pageviews.gz files from wikipedia

I wrote a script to download the pageviewsXXXXX.gz files from Wikipedia. So far so good.
When I unzip the files, the content is illegible. Does anyone know how to read the content of the pageviews .gz files? Is there an API, or any idea on how to do it?
Thanks in advance
I don't know what software you used to decompress the .gz files; I just used 7-Zip on a 64-bit Windows 10 machine with success. Having done that, I found that https://dumps.wikimedia.org/other/pagecounts-raw/ describes the format of the lines in the uncompressed file.
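7-Zip is not strictly needed: as a sketch, assuming Python 3 and a dump whose text is UTF-8, the standard `gzip` module can decompress the file while reading it. `read_pagecounts` is a hypothetical helper name.

```python
import gzip
from typing import Iterator


def read_pagecounts(path: str) -> Iterator[str]:
    """Yield the decompressed text lines of a pagecounts/pageviews .gz dump.

    gzip.open decompresses on the fly, so no separate unzip step is needed.
    """
    # errors="replace" keeps one malformed byte from aborting the whole file
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            yield line.rstrip("\n")
```

Because it is a generator, even a large hourly dump is processed line by line without loading it all into memory.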
The line
de Stadio_Arena_Garibaldi_-_Romeo_Anconetani 1 11820
is from the de (German) Wikipedia, page 'Stadio_Arena_Garibaldi_-_Romeo_Anconetani', which was requested once in the hour-long period covered by the gzipped file; the server returned 11,820 bytes.
This line looks like gibberish.
ar %D9%85%D8%B7%D9%8A%D8%A7%D9%81%D9%8A%D8%A9 1 16742
The first two characters, however, indicate that it refers to the Arabic Wikipedia. The '%' items are the percent-encoded UTF-8 bytes of non-ASCII characters.
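Combining the line format with percent-decoding gives a small parser. This is a hedged sketch that assumes the four space-separated fields described on the pagecounts-raw page (project, title, request count, bytes); the newer pageviews dumps may differ in detail, so check their own documentation. `PageCount` and `parse_line` are illustrative names.

```python
from typing import NamedTuple
from urllib.parse import unquote


class PageCount(NamedTuple):
    project: str     # e.g. "de" = German Wikipedia, "ar" = Arabic Wikipedia
    title: str       # page title with percent-escapes decoded
    requests: int    # number of requests in the hour covered by the file
    bytes_sent: int  # total size of the content returned by the server


def parse_line(line: str) -> PageCount:
    # Titles use underscores instead of spaces, so whitespace splits cleanly
    project, raw_title, count, size = line.split()
    # Titles are percent-encoded UTF-8; unquote turns "%D9%85..." back into text
    return PageCount(project, unquote(raw_title, encoding="utf-8"), int(count), int(size))
```

Applied to the Arabic line above, `parse_line` returns project `ar`, one request, 16742 bytes, and the title decoded back into readable Arabic text.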

Why aren't word processing program documents stored as plaintext?

Whenever an MS Word (or LibreOffice or other word processor) document is opened in its respective program, the words appear normally on the page, but when the document is opened in a text editor, most of it is unreadable gibberish.
I can understand why the document might have some parts that aren't legible, like bullet points or metadata, but why isn't at least some of the content stored as plaintext? Does every letter get encoded?
The current Microsoft Word format, .docx, is a ZIP archive of XML files in which the text is stored as plain text. You can rename the .docx file to .zip, extract it, and open word/document.xml in Notepad. So the content is stored as plain text, just wrapped in XML markup and compressed.
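Instead of renaming the file and opening it in Notepad, the same XML can be read programmatically. This is a minimal sketch that only collects the text of the `<w:t>` runs in `word/document.xml`; it ignores headers, footnotes, fields and formatting. `docx_plain_text` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_plain_text(data: bytes) -> str:
    """Return the paragraph text stored in word/document.xml of a .docx file."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text may be split across several <w:t> runs
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```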
I suspect it is partly a branding thing. If you want, you can export the document to a text file:
in Word, go to File > Export > Change File Type > Plain Text (*.txt).

Stream file converted from iText PDF does not open in MS Word

Our project has a requirement to generate the final report both as a PDF and as an MS Word document. We use iTextSharp to dynamically generate the tables and rows in the report. Finally, we upload the file to the server as PDF and as MS Word: both are converted to a byte array/stream and saved as a PDF and an MS Word document. The uploaded PDF works as expected, but the MS Word file gives an error and does not open (screenshot attached).
iTextSharp doesn't produce MS Word documents, so this isn't an actual iText question. When I look at your screen shot, I see that you are trying to import a PDF file into Word. Since Word can't interpret PDF syntax, it shows you the syntax of the PDF file:
%PDF-1.4
%âãÏÓ
1 0 obj
<</Type/Font...
I think the premise of your question is wrong. You are not using iTextSharp to create a PDF file and an MS Word file; you are using iTextSharp to create a PDF file only.
There is no such thing as "Save a PDF as MS Word file" in iTextSharp, and it will be extremely difficult to find another tool that can convert a PDF document to a Word document in an acceptable way. (There are such tools, but the quality is suboptimal for PDFs that weren't made to be converted to another format.)
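One way to catch this mistake early is to check the leading "magic" bytes of the byte array before saving it. The original project is C#, so this Python version only illustrates the check; `sniff_format` is a hypothetical helper and covers just three well-known signatures: PDF files start with `%PDF-`, ZIP-based formats such as .docx with `PK\x03\x04`, and legacy binary Office files with the OLE2 header `D0 CF 11 E0 A1 B1 1A E1`.

```python
def sniff_format(data: bytes) -> str:
    """Guess a document's real format from its leading magic bytes.

    Only a few signatures are checked; anything else is reported as "unknown".
    """
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):
        return "zip"   # .docx/.xlsx/.pptx (and OpenDocument) are ZIP packages
    if data.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "ole2"  # legacy binary .doc/.xls/.ppt
    return "unknown"
```

A buffer that begins with `%PDF-1.4`, like the one in the screenshot, is reported as `pdf`, which shows that the "Word" file is really the PDF under a different name.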

Starting format for text file to be converted to other formats

I need to write a document containing images, text, hyperlinks, etc., and then convert it to PDF and DOC (in the future it may need to be converted to more file formats).
What's the best "starting format" for this document?
.doc or .docx might be the best file format for creating a document containing images, text, hyperlinks and many other elements. Once created, a .doc/.docx file can be converted into other formats, such as images, PDF or HTML, with tooling built on Open XML or with a commercial library like Spire.Doc.
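The libraries named in the answer are .NET-oriented. As an alternative the answer does not mention, LibreOffice can convert .doc/.docx (and other Office formats) from the command line in headless mode. A minimal Python sketch, assuming LibreOffice is installed and `soffice` is on PATH; `soffice_command` and `convert` are hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def soffice_command(source: str, out_dir: str, target: str = "pdf") -> List[str]:
    """Build a LibreOffice headless conversion command line."""
    return ["soffice", "--headless", "--convert-to", target, "--outdir", out_dir, source]


def convert(source: str, out_dir: str, target: str = "pdf") -> Path:
    """Run the conversion; assumes target is a plain extension such as "pdf"."""
    if shutil.which("soffice") is None:
        raise RuntimeError("LibreOffice (soffice) was not found on PATH")
    subprocess.run(soffice_command(source, out_dir, target), check=True)
    # LibreOffice names the output after the input file, with the new extension
    return Path(out_dir) / (Path(source).stem + "." + target)
```

Keeping the command builder separate from `convert` makes the exact arguments easy to inspect or log before anything is executed.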

How to determine the file type of the content of a saved Office document?

The documentation of getFileAsync says the file will always be returned in Office Open XML (OOXML) format (.pptx or .docx).
Since Office 2016 this no longer holds true if the file is saved in OpenDocument format (*.odt).
How can I get the information about the file type? The name ends with *.odt, but in Word 2013 the name also ended with *.odt while the content was transferred as *.docx.
Example:
In the following case, the type of the binary file content cannot be determined:
Create an empty file in Word
Insert your TaskpaneApp
Save the file as *.odt to your PC in Word
Call getFileAsync(Compressed), and
in Word 2016, get ODT content rather than DOCX, with the name ending in .odt
in Word 2013, get DOCX content, with the name ending in .odt
For Word 2013 I fixed the problem by appending .docx to the provided name. Exactly this fix causes problems in Word 2016, where the file really is an *.odt.
The first parameter of the getFileAsync method specifies precisely the file type you get back, independently of the format in which you saved the file.
Office.js supports three file types: compressed (.docx, .pptx, etc.), text (plain text) and PDF. ODT is not a file type supported by the getFileAsync method. Check the article you referred to for which formats are supported in which Office applications.
Hope this clarification helps.
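If you do end up with bytes of uncertain type, the ZIP package itself says what it is: an OpenDocument file contains a `mimetype` entry (for example `application/vnd.oasis.opendocument.text`), while an Office Open XML file contains `[Content_Types].xml`. A hedged Python sketch, assuming the slices returned by `getFileAsync` have already been joined into one byte buffer (for example on a server); `package_kind` is a hypothetical helper, and non-ZIP input raises `zipfile.BadZipFile`.

```python
import io
import zipfile


def package_kind(data: bytes) -> str:
    """Tell an OpenDocument package from an Office Open XML one by its entries.

    Returns the ODF media type, "ooxml", or "unknown".
    """
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        names = package.namelist()
        if "mimetype" in names:
            # ODF stores its media type as plain ASCII text in this entry
            return package.read("mimetype").decode("ascii").strip()
        if "[Content_Types].xml" in names:
            return "ooxml"
    return "unknown"
```

Checking the content this way avoids relying on the file name, which in the Word 2013 case ends in .odt even though the bytes are .docx.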