WOPI corrupting files on edit

I have a WOPI host running in a Blazor server application with all of the .wopitest tests passing for the desired functionality (others skipped).
When I upload a Word document, I can view it with no issues. I can also edit it; however, when I try to edit the document a second time, I get an error.
The error doesn't appear to be handled and seems to originate in the Office Online JavaScript file.
[Screenshot: error on attempting a second edit]
Following the error, I am still able to open the document for viewing. The behaviour is the same whether I use the 'Editing' button in the Office Online page or navigate directly to the editing page using an edit action URL.
Supplementary information:
Using ngrok to debug locally
.NET 6
Using SQLite database for holding file information (including path to file)
Using local folders for storing file contents (e.g. 'data' folder containing all files)
Similar issues with .xlsx files being corrupted on editing and requiring a 'repair' when opened with Excel. The repair removes cells containing text and reports that it removed the theme.
Viewing a Word document gives the following console errors: [Screenshot: view document error]
The first edit of a Word document gives the following console errors: [Screenshot: edit document error]
I was expecting to be able to repeatedly edit the document.
I tried opening the file in the desktop version of Word and got the following error: [Screenshot: desktop Word recover prompt]
After a recovery, the document appears to work as expected in desktop Word but still won't open for editing through WOPI.
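Both .docx and .xlsx files are Office Open XML packages, which are ZIP archives. A quick way to tell whether the copy stored in the 'data' folder is already damaged, before involving WOPI at all, is to open those bytes as a ZIP archive. The Python sketch below is illustrative and not part of the poster's .NET host. `check_ooxml_package` is a hypothetical helper name. It only catches structural damage such as truncation. Python's `zipfile` may still accept an archive that has extra bytes appended, so a clean result does not prove the file is intact.

```python
import io
import zipfile
from typing import Optional


def check_ooxml_package(data: bytes) -> Optional[str]:
    """Hypothetical helper: return a description of the first problem found,
    or None if the bytes look like a structurally intact OOXML package."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as package:
            # testzip() reads every member and checks its CRC; it returns the
            # name of the first bad member, or None if all members are fine.
            bad_member = package.testzip()
            if bad_member is not None:
                return f"CRC mismatch in member {bad_member!r}"
            # Every OOXML package has a [Content_Types].xml part at its root.
            if "[Content_Types].xml" not in package.namelist():
                return "missing [Content_Types].xml"
    except zipfile.BadZipFile as exc:
        # Raised when the central directory cannot be found or parsed,
        # which is typical of a truncated upload.
        return f"not a readable ZIP archive: {exc}"
    return None
```

To use it, read the stored file's bytes (for example, via the path held in the SQLite record) and pass them in. Compare the result before and after an edit round trip.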

It turns out the problem was the way the body of the POST HTTP request was being saved.
I'm still not certain exactly what went wrong, but the file was corrupted somewhere between writing the stream into a buffer and saving that buffer to a file.
I suspect the file stream was either truncating or adding a few bytes.
Interestingly, Office Online was still able to view the file.
This suggests that the viewer tolerates some malformed files.
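In WOPI, edited contents arrive as the body of the PutFile request, which is a POST carrying the `X-WOPI-Override: PUT` header. There are two classic ways to damage that body while buffering it. The first is assuming a single read returns everything, which truncates the file. The second is writing a whole fixed-size buffer when only part of it was filled, which adds extra bytes. The Python sketch below is not the poster's code. It only illustrates the safe pattern: read until end-of-stream, write exactly the bytes read, and optionally compare the total with an expected length. `save_request_body` and `TrickleStream` are hypothetical names.

```python
import io
from typing import BinaryIO, Optional

CHUNK_SIZE = 64 * 1024  # arbitrary buffer size for the copy loop


def save_request_body(body, dest: BinaryIO,
                      expected_length: Optional[int] = None) -> int:
    """Hypothetical helper: copy every byte of `body` into `dest`.

    `body` only needs a read(n) method. A single read(n) may legally return
    fewer than n bytes, so this loops until read() returns b"" (end of stream).
    """
    written = 0
    while True:
        chunk = body.read(CHUNK_SIZE)
        if not chunk:
            break
        # Write exactly what was read, never a whole fixed-size buffer.
        dest.write(chunk)
        written += len(chunk)
    if expected_length is not None and written != expected_length:
        raise ValueError(
            f"body length mismatch: expected {expected_length}, got {written}")
    return written


class TrickleStream:
    """Hypothetical test stream that returns at most `step` bytes per read,
    like a network stream delivering data in small pieces."""

    def __init__(self, data: bytes, step: int = 3):
        self._buffer = io.BytesIO(data)
        self._step = step

    def read(self, n: int = -1) -> bytes:
        if n is None or n < 0:
            n = self._step
        return self._buffer.read(min(n, self._step))
```

With this stream, `TrickleStream(data).read(len(data))` returns only the first three bytes. That is the kind of truncation a single-read approach would save to disk. In a .NET 6 host, the equivalent is typically a single `await Request.Body.CopyToAsync(fileStream)`, which reads until the end of the stream in the same way.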

Related

Convert HTML to a Word-document which can be edited in Word Online

Our users write in a rich-text field, pretty much like this one, and we would like for them to be able to export this as a Word Document in their OneDrive, preserving the formatting, and being able to open the file in Word Online.
I have no trouble creating new files using the https://graph.microsoft.com/v1.0/ API. The problem is the conversion to .docx. The Google API provides this conversion automatically, but I did not find an equivalent for Microsoft. I tried using html-docx-js, and it almost works perfectly.
The file is created successfully.
However, when I open the file, a dialog pops up saying it contains "Objects that Word Online doesn't support".
Opening in read-only mode works, and the file shows with the correct formatting.
Downloading the file and opening in desktop word works perfectly (i.e. editing as well).
The HTML content I use is a simple div with a few p tags, so whatever triggers the "Objects that Word Online doesn't support" message probably comes from html-docx-js.
Here's an example Word file created this way. It opens normally in desktop Word but only in read-only mode in Word Online:
https://1drv.ms/w/s!AqpUGtnMiyurgwE543OscH7PdLnY
Any ideas?
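According to its README, html-docx-js relies on Word's "altchunks" feature. The HTML, wrapped as MHT, is embedded in the package, and Word converts it to native markup when the file is opened. It is an assumption, not confirmed here, that this embedded chunk is the object Word Online refuses to edit, but it is easy to check. The Python sketch below lists which XML parts of a .docx mention a given marker. `find_parts_containing` is a hypothetical helper. Searching for the text `altChunk` is a heuristic, not a full OOXML parse.

```python
import io
import zipfile
from typing import List


def find_parts_containing(docx_bytes: bytes, marker: str = "altChunk") -> List[str]:
    """Hypothetical helper: names of XML parts whose text contains `marker`.

    Heuristic only: a plain substring search on each .xml/.rels part,
    rather than a structural parse of the Office Open XML markup.
    """
    hits = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        for name in package.namelist():
            if not name.endswith((".xml", ".rels")):
                continue  # skip binary or embedded parts such as .mht files
            text = package.read(name).decode("utf-8", errors="replace")
            if marker in text:
                hits.append(name)
    return sorted(hits)
```

You could run it on the file html-docx-js produces and on the same file after it has been opened and re-saved in desktop Word. If the README's description holds, this shows whether the altChunk reference disappears once Word has done the conversion.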

iText PdfReader not working on Unix

I have some code that reads PDF files. It fails here:
at iTextSharp.text.pdf.PRTokeniser.CheckPdfHeader()
at iTextSharp.text.pdf.PdfReader.ReadPdf()
I know from other posts that this issue comes from invalid formatting in the PDF. However, I'm not in a position to tell my users to redo their PDFs. Is there another way around this issue that would allow the PDF to be read despite the problem?
If a file doesn't start with %PDF-, then there's nothing to fix: the file isn't a PDF file.
However, there may be another problem: maybe you're trying to read a file that has zero length because something went wrong while creating the InputStream. Another context in which I've seen this happen is a PDF loaded from a server, where the server returned a 404 message in HTML instead of a PDF file ;-)
Whenever that exception happens, you should store the bytes somewhere, and examine them. Without those bytes, nobody will be able to give you useful advice.
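Following that advice, a first step is to classify the bytes you actually received before handing them to iText. The Python sketch below is illustrative only. The names `diagnose_pdf_bytes` and `save_for_inspection` are hypothetical, and the HTML check is a simple heuristic. It distinguishes four cases: an empty payload, an HTML page such as a server error, a proper %PDF- header, and a header preceded by other bytes. It also writes the raw bytes to disk so they can be examined.

```python
import hashlib
from pathlib import Path

HEADER = b"%PDF-"


def diagnose_pdf_bytes(data: bytes) -> str:
    """Hypothetical helper: roughly classify a payload expected to be a PDF."""
    if not data:
        return "empty"  # e.g. a stream that was never filled
    if data.startswith(HEADER):
        return "pdf"
    head = data[:1024]
    text = head.lstrip().lower()
    if text.startswith(b"<!doctype html") or text.startswith(b"<html"):
        return "html"  # e.g. a 404 page returned instead of the file
    if HEADER in head:
        # Other bytes before the header: some readers tolerate this, others do not.
        return "pdf-with-leading-bytes"
    return "unknown"


def save_for_inspection(data: bytes, directory: Path) -> Path:
    """Hypothetical helper: write the raw bytes to `directory`, named by SHA-256."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / (hashlib.sha256(data).hexdigest() + ".bin")
    path.write_bytes(data)
    return path
```

Naming dumps by their hash means repeated failures on the same input produce a single file rather than many copies.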

Opening OneDrive file in desktop Word

I'm trying to open a file from Office 365 OneDrive for editing in the desktop version of Word (I'm logged in with my Office 365 account) using the ms-word protocol. I have noticed several possible outcomes:
Sometimes the file opens in edit mode. I can edit it and save it directly to OneDrive with Ctrl + S, without any additional prompts.
Sometimes the file opens in read-only mode. I can switch to editing, but when I try to save the file, I'm prompted to specify a save location (the default is the OneDrive directory containing the file).
Sometimes Word asks me to log in to my Office 365 account (even though I'm already logged in with this account in Word), then opens the file in read-only mode, after which it behaves like the second case.
I would like the file to open as in the first case, so the user doesn't have to take any additional actions.
My current scenario is:
The user calls an API to create a file.
The API creates the file in the user's OneDrive using Microsoft Graph.
The API returns a direct URL to the file, and I open this file in Word using the ms-word protocol.
By direct URL I mean: https://domain-my.sharepoint.com/personal/account/Documents/Apps/Microsoft Graph/appname/directoryname/filename.docx
The URL used to open the file looks like:
ms-word:ofe|u|<file path specified above>
As described at the beginning, the file opens in one of those three ways, seemingly at random.
I have also noticed something else. When I open my file in Word Online (using the web URL to the file) and then press Edit in Word, it uses exactly the same file URL that I created and returned to the user. From there, however, the file always opens as in the first scenario.
Do you have any idea why opening the file manually with the ms-word protocol behaves differently from Word Online using the ms-word protocol with exactly the same URL?
I would like the file from the user's OneDrive to always open in desktop Word so that the user doesn't have to take any additional steps to edit it and save it back to OneDrive.
(I don't have enough reputation to comment, so I will try again with a partial answer.)
There is always a chance that the credentials will have to be refreshed, so there is no way to completely prevent Office apps from prompting for credentials. However, it should be relatively uncommon.
As for opening in edit mode versus Protected View: there are a variety of reasons why some files open in Protected View: https://support.office.com/en-us/article/What-is-Protected-View-d6f09ac7-e6b9-4495-8e43-2bbcdbcb6653
If you have a file that inconsistently opens in edit mode versus some form of read-only mode or Protected View, please use answers.microsoft.com, where the conversation doesn't have to fit the Stack Overflow model.
"when I open my file in Word Online (using web url to file) and then I press Edit in Word it uses exactly the same file URL"
You suggest that the URLs are identical, but my first thought was that the difference may be that the Word Online link uses the driveItem's webDavUrl property rather than the baseItem's webUrl.
https://learn.microsoft.com/en-us/onedrive/developer/rest-api/resources/driveitem#json-representation
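To try the answer's suggestion, the ms-word link can be built from the driveItem's webDavUrl when Graph returns one, falling back to webUrl otherwise. The Python sketch below assumes `item` is the driveItem JSON already parsed into a dict. `office_open_uri` is a hypothetical helper, and the extension table covers only common formats. It passes the URL through unchanged and does not decide whether further encoding is needed.

```python
from pathlib import PurePosixPath
from typing import Any, Dict

# Office URI scheme names per file extension (common formats only).
SCHEMES = {".docx": "ms-word", ".doc": "ms-word",
           ".xlsx": "ms-excel", ".xls": "ms-excel",
           ".pptx": "ms-powerpoint", ".ppt": "ms-powerpoint"}


def office_open_uri(item: Dict[str, Any]) -> str:
    """Hypothetical helper: build an 'open for edit' (ofe) Office URI
    for a Microsoft Graph driveItem that has been parsed into a dict."""
    # Prefer webDavUrl (a driveItem property) over webUrl (a baseItem property).
    url = item.get("webDavUrl") or item["webUrl"]
    suffix = PurePosixPath(item["name"]).suffix.lower()
    scheme = SCHEMES.get(suffix)
    if scheme is None:
        raise ValueError(f"no Office URI scheme known for {suffix!r}")
    return f"{scheme}:ofe|u|{url}"
```

Generating links both ways for the same file and opening each one several times would show whether the choice of URL property explains the inconsistent behaviour.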

Editing Word Documents from Web Server

I have looked for a solution to this but all I have found are products that are close but not what I need.
We have a program that creates a Word document on the fly from data in our database and stores it on our server. The user can then download this file to print, email, or file away.
I need something that will allow the user to open the existing document from the server, edit it, and save it back to the server.
I need this to work in all browsers, so ActiveX isn't a full solution.
This link is a proof of concept of using CKEditor to do what you describe.
The focus is on ensuring that the "long tail" of possible docx content is preserved across the editing process.
For example, take a look at the Microsoft demo docx, which they use to compare their web apps with Google Docs, at
google-documents-vs-word-web-app
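One rough way to test the "long tail" concern for any candidate editor is to compare the package parts of a .docx before and after a round trip through it. The Python sketch below is an illustration, not the proof of concept's own check. `part_names` and `dropped_parts` are hypothetical helpers. They compare only part names inside the ZIP package. Content that survives as a part but is altered inside document.xml would need a deeper comparison.

```python
import io
import zipfile
from typing import List, Set


def part_names(docx_bytes: bytes) -> Set[str]:
    """Hypothetical helper: the set of part names in an OOXML package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return set(package.namelist())


def dropped_parts(original: bytes, edited: bytes) -> List[str]:
    """Hypothetical helper: parts present in the original but missing after editing."""
    return sorted(part_names(original) - part_names(edited))
```

A non-empty result (for example, a theme or customXml part) is a cheap early warning that the editor does not preserve everything Word put into the file.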
