XLSX file via OpenXml SDK Both Valid and Invalid - openxml

I have a program which exports a System.Data.DataTable to an XLSX / OpenXml Spreadsheet. Finally have it mostly working. However when opening the Spreadsheet in Excel, Excel complains about the file being invalid, and needing repair, giving this message...
We found a problem with some content in . Do you want us to
try to recover as much as we can? If you trust the source of the
workbook, clik Yes.
If I click Yes, it comes back with this message...
Clicking the log file and opening that, just shows this...
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>error268360_01.xml</logFileName>
<summary>Errors were detected in file 'C:\Users\aabdi\AppData\Local\Temp\data.20190814.152538.xlsx'</summary>
<repairedRecords>
<repairedRecord>Repaired Records: Cell information from /xl/worksheets/sheet1.xml part</repairedRecord>
</repairedRecords>
</recoveryLog>
Obviously, we don't want to deploy this into a production environment like this. So I've been trying to figure out how to fix this. I threw together a quick little sample to validate the XML and show the errors, based on this link from MSDN. But when I run the program and load the exact same XLSX document that Excel complains about, the Validator comes back saying that the file is perfectly Valid. So I'm not sure where else to go from there.
Any better tools for trying to validate my XLSX XML? Following is the complete code I'm using to generate the XLSX file. (Yes, it's in VB.NET, it's a legacy app.)
If I comment out the line in the For Each dr As DataRow loop, then the XLSX file opens fine in Excel, (just without any data). So it's something with the individual cells, but I'm not really DOING much with them. Setting a value and data type, and that's it.
I also tried replacing the For Each loop in ConstructDataRow with the following, but it still outputs the same "bad" XML...
rv.Append(
(From dc In dr.Table.Columns
Select ConstructCell(
NVL(dr(dc.Ordinal), String.Empty),
MapSystemTypeToCellType(dc.DataType)
)
).ToArray()
)
Also tried replacing the call to Append with AppendChild for each cell too, but that didn't help either.
The zipped up XLSX file (erroring, with dummy data) is available here:
https://drive.google.com/open?id=1KVVWEqH7VHMxwbRA-Pn807SXHZ32oJWR
Full DataTable to Excel XLSX Code
#Region " ToExcel "
<Extension>
Public Function ToExcel(ByVal target As DataTable) As Attachment
Dim filename = Path.GetTempFileName()
Using doc As SpreadsheetDocument = SpreadsheetDocument.Create(filename, DocumentFormat.OpenXml.SpreadsheetDocumentType.Workbook)
Dim data = New SheetData()
Dim wbp = doc.AddWorkbookPart()
wbp.Workbook = New Workbook()
Dim wsp = wbp.AddNewPart(Of WorksheetPart)()
wsp.Worksheet = New Worksheet(data)
Dim sheets = wbp.Workbook.AppendChild(New Sheets())
Dim sheet = New Sheet() With {.Id = wbp.GetIdOfPart(wsp), .SheetId = 1, .Name = "Data"}
sheets.Append(sheet)
data.AppendChild(ConstructHeaderRow(target))
For Each dr As DataRow In target.Rows
data.AppendChild(ConstructDataRow(dr)) '// THIS LINE YIELDS THE BAD PARTS
Next
wbp.Workbook.Save()
End Using
Dim attachmentname As String = Path.Combine(Path.GetDirectoryName(filename), $"data.{Now.ToString("yyyyMMdd.HHmmss")}.xlsx")
File.Move(filename, attachmentname)
Return New Attachment(attachmentname, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
End Function
Private Function ConstructHeaderRow(dt As DataTable) As Row
Dim rv = New Row()
For Each dc As DataColumn In dt.Columns
rv.Append(ConstructCell(dc.ColumnName, CellValues.String))
Next
Return rv
End Function
Private Function ConstructDataRow(dr As DataRow) As Row
Dim rv = New Row()
For Each dc As DataColumn In dr.Table.Columns
rv.Append(ConstructCell(NVL(dr(dc.Ordinal), String.Empty), MapSystemTypeToCellType(dc.DataType)))
Next
Return rv
End Function
Private Function ConstructCell(value As String, datatype As CellValues) As Cell
Return New Cell() With {
.CellValue = New CellValue(value),
.DataType = datatype
}
End Function
Private Function MapSystemTypeToCellType(t As System.Type) As CellValues
Dim rv As CellValues
Select Case True
Case t Is GetType(String)
rv = CellValues.String
Case t Is GetType(Date)
rv = CellValues.Date
Case t Is GetType(Boolean)
rv = CellValues.Boolean
Case IsNumericType(t)
rv = CellValues.Number
Case Else
rv = CellValues.String
End Select
Return rv
End Function
#End Region

For anyone else coming in and finding this, I finally tracked this down to the Cell.DataType
Setting a value of CellValues.Date will cause Excel to want to "fix" the document.
(apparently for dates, the DataType should be NULL, and Date was only used in Office 2010).
Also, if you specify a DataType of CellValues.Boolean, then the CellValue needs to be either 0 or 1. "true" / "false" will also cause Excel to want to "fix" your spreadsheet.
Also, Microsoft has a better validator tool already built for download here:
https://www.microsoft.com/en-us/download/details.aspx?id=30425

Related

Redemption in MS Access / Outlook (trying to include a separate message file in email)

This code is in MS Access (2010) VBA, using the Redemption library with Microsoft Outlook 2010.
I had this process working before, but we recently had a Citrix upgrade that I guess reset something in my Outlook and now the process no longer works.
I have a folder of .msg files which are basically pre-made email templates with all the proper formatting, images, text, etc.
This is what I was doing before:
Dim outlookApp As Object, namespace As Object
Dim oItem, MyItem
Set outlookApp = CreateObject("Outlook.Application")
Set namespace = outlookApp.GetNamespace("MAPI")
namespace.Logon
Set MyItem = outlookApp.CreateItemFromTemplate(path_to_dot_msg_file)
'And then there are many calls like this:
MyItem.HTMLBody = Replace(MyItem.HTMLBody, "Dear Person,", "Dear " & name)
'...
Set safeItem = CreateObject("Redemption.SafeMailItem")
Set oItem = MyItem
safeItem.Item = oItem
'this next line displays the email, and as of this line, it looks correct
'safeItem.Display
'but as of this line, the issue occurs
safeItem.HTMLBody = "<p>This is an extra message that shows up before the .msg file</p>" & safeItem.HTMLBody
safeItem.Recipients.ResolveAll
safeItem.Send
Now when the email is sent, the .msg contents aren't present at all -- the only thing that shows up is the "extra message" that I prepended to the HTMLBody.
What do I need to change or update? Is this something I need to change in the code, or in my Outlook settings, etc?
Extra: body insertion:
Function insertStringBodyTag(htmlBody As String, stringToInsert As String)
Dim s As String
Dim i As Integer
s = htmlBody
i = InStr(1, s, "<body")
i = InStr(i, s, ">")
s = Left(s, i) & stringToInsert & Right(s, Len(s) - i)
insertStringBodyTag = s
End Function
'Called with safeItem.htmlBody = insertStringBodyTag(safeItem.htmlBody, prefix_string)
You cannot concatenate 2 HTML strings and expect a valid HTML string back - the two must be merged - find the position of the "<body"substring in the original HTML body, then find the positon of the following ">" (this way you take care of the body element with attributes), then insert your HTML string following that ">".

How to edit pasted content using the Open XML SDK

I have a custom template in which I'd like to control (as best I can) the types of content that can exist in a document. To that end, I disable controls, and I also intercept pastes to remove some of those content types, e.g. charts. I am aware that this content can also be drag-and-dropped, so I also check for it later, but I'd prefer to stop or warn the user as soon as possible.
I have tried a few strategies:
RTF manipulation
Open XML manipulation
RTF manipulation is so far working fairly well, but I'd really prefer to use Open XML as I expect it to be more useful in the future. I just can't get it working.
Open XML Manipulation
The wonderfully-undocumented (as far as I can tell) "Embed Source" appears to contain a compound document object, which I can use to modify the copied content using the Open XML SDK. But I have been unable to put the modified content back into an object that lets it be pasted correctly.
The modification part seems to work fine. I can see, if I save the modified content to a temporary .docx file, that the changes are being made correctly. It's the return to the clipboard that seems to be giving me trouble.
I have tried assigning just the Embed Source object back to the clipboard (so that the other types such as RTF get wiped out), and in this case nothing at all gets pasted. I've also tried re-assigning the Embed Source object back to the clipboard's data object, so that the remaining data types are still there (but with mismatched content, probably), which results in an empty embedded document getting pasted.
Here's a sample of what I'm doing with Open XML:
using OpenMcdf;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
...
object dataObj = Forms.Clipboard.GetDataObject();
object embedSrcObj = dateObj.GetData("Embed Source");
if (embedSrcObj is Stream)
{
// read it with OpenMCDF
Stream stream = embedSrcObj as Stream;
CompoundFile cf = new CompoundFile(stream);
CFStream cfs = cf.RootStorage.GetStream("package");
byte[] bytes = cfs.GetData();
string savedDoc = Path.GetTempFileName() + ".docx";
File.WriteAllBytes(savedDoc, bytes);
// And then use the OpenXML SDK to read/edit the document:
using (WordprocessingDocument openDoc = WordprocessingDocument.Open(savedDoc, true))
{
OpenXmlElement body = openDoc.MainDocumentPart.RootElement.ChildElements[0];
foreach (OpenXmlElement ele in body.ChildElements)
{
if (ele is Paragraph)
{
Paragraph para = (Paragraph)ele;
if (para.ParagraphProperties != null && para.ParagraphProperties.ParagraphStyleId != null)
{
string styleName = para.ParagraphProperties.ParagraphStyleId.Val;
Run run = para.LastChild as Run; // I know I'm assuming things here but it's sufficient for a test case
run.RunProperties = new RunProperties();
run.RunProperties.AppendChild(new DocumentFormat.OpenXml.Wordprocessing.Text("test"));
}
}
// etc.
}
openDoc.MainDocumentPart.Document.Save(); // I think this is redundant in later versions than what I'm using
}
// repackage the document
bytes = File.ReadAllBytes(savedDoc);
cf.RootStorage.Delete("Package");
cfs = cf.RootStorage.AddStream("Package");
cfs.Append(bytes);
MemoryStream ms = new MemoryStream();
cf.Save(ms);
ms.Position = 0;
dataObj.SetData("Embed Source", ms);
// or,
// Clipboard.SetData("Embed Source", ms);
}
Question
What am I doing wrong? Is this just a bad/unworkable approach?

How to retrieve the value of an Input field and use it to modify a placeholder in a LibreOffice Basic macro?

I've spent two days on this and I'm still not able to figure it out 8-)
I have a LibreOffice Writer document with some Placeholders (Insert -> Fields -> More Fields -> Functions -> Placeholder -> Image) and Input fields (Insert -> Fields -> More Fieds -> Functions -> Input field) and I need to retrieve the value of an Input field and use it to replace a specified Placeholder in the same document.
To be more precise. I have an Input field where I enter for example 123
and somewhere in the document is a button, which triggers a macro, and this macro should:
retrieve the current value of the specified (named?) Input field ("123"),
"replace" a specified (named?) Placeholder with an image loaded from http://domain.tld/image/123.png
Is this somehow possible? Would be great, because I'm trying to to insert externally generated barcodes into my document...
These are both "Text fields", and some information and macro examples are in Andrew Pitonyak's book OpenOffice Macros Explained (available as a free pdf download from http://www.pitonyak.org/oo.php). The wiki page also has some good background.
Form controls (from the toolbar "Form controls") are named, so they have an advantage when working with macros. Text fields, however - the kind you have in your document - are not named, so you have to cycle through all the fields in a document, or highlight a particular run of text and cycle through the field within the highlighted area to find the one you are after. The Pitonyak document has examples of both methods.
Assuming the document has only one input field, this StarBasic code will print its current value:
Sub DisplayFields
Dim oEnum As Object
Dim oField As Object
oEnum = ThisComponent.getTextFields().createEnumeration()
Do While oEnum.hasMoreElements()
oField = oEnum.nextElement()
If oField.getPresentation(True) = "Input field" Then
Print "Input field contents: " & oField.getPresentation(False)
Exit Do
End If
Loop
End Sub
As far as I can tell, there is no API to replace a placeholder with its designated content. There might be a way with the dispatcher - the list of dispatch commands tantalizingly includes "FieldDialog" - but I wasn't able to find any documentation or examples.
I think what you'd have to do is find the field, put your cursor there, insert the image, then delete the placeholder field. Some more StarBasic code (again, assuming there's only a single placeholder field in the document):
Sub InsertImage
Dim oEnum As Object
Dim oField As Object
Dim oAnchor As Object
Dim oText As Object
Dim oCursor As Object
Dim FileName As String
Dim FileURL As String
Dim objTextGraphicObject As Object
oEnum = ThisComponent.getTextFields().createEnumeration()
Do While oEnum.hasMoreElements()
oField = oEnum.nextElement()
If oField.getPresentation(True) = "Placeholder" Then
oAnchor = oField.Anchor
oText = oAnchor.getText()
oCursor = oText.createTextCursorByRange(oAnchor.getEnd)
FileName = "C:\after zoo.JPG"
FileURL = convertToURL(FileName)
objTextGraphicObject = ThisComponent.createInstance("com.sun.star.text.TextGraphicObject")
REM Optional to set the size
' Dim objSize as New com.sun.star.awt.Size
' objSize.Width = 3530
' objSize.Height = 1550
' objTextGraphicObject.setSize(objSize)
objTextGraphicObject.GraphicURL = FileURL
oText.insertTextContent(oCursor.Start, objTextGraphicObject, false)
oField.dispose()
Exit Do
End If
Loop
End Sub

Annot doesnt have action

I'm trying to change all annotations link destination to fit page and also i need to validate the links.
This is the code i'm trying
Dim PdfReader As PdfReader
'Dim myBookmarks As New List(Of Dictionary(Of String, Object))()
Dim pg As Integer
PdfReader = New PdfReader("D:\\Annot_Testing.pdf")
Dim PageCount As Integer = PdfReader.NumberOfPages
For pg = 1 To PdfReader.NumberOfPages
Dim PageDictionary As PdfDictionary = PdfReader.GetPageN(pg)
Dim Annots As PdfArray = PageDictionary.GetAsArray(PdfName.ANNOTS)
If (Annots Is Nothing) OrElse (Annots.Length = 0) Then
Continue For
End If
''//Loop through each annotation
For Each A In Annots.ArrayList
''//I do not completely understand this but I think this turns an Indirect Reference into an actual object, but I could be wrong
''//Anyway, convert the itext-specific object as a generic PDF object
Dim AnnotationDictionary = DirectCast(PdfReader.GetPdfObject(A), PdfDictionary)
''//Make sure this annotation has a link
If Not AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK) Then Continue For
''//Make sure this annotation has an ACTION
If AnnotationDictionary.Get(PdfName.ACTION) Is Nothing Then Continue For
''//Get the ACTION for the current annotation
Dim AnnotationAction = DirectCast(AnnotationDictionary.Get(PdfName.A), PdfDictionary)
''//Test if it is a named actions such as /FIRST, /LAST, etc
If AnnotationAction.Get(PdfName.S).Equals(PdfName.NAMED) Then
MsgBox("Yes named")
''//Otherwise see if its a GOTO page action
ElseIf AnnotationAction.Get(PdfName.S).Equals(PdfName.PAGE) Then
MsgBox("page")
ElseIf AnnotationAction.Get(PdfName.S).Equals(PdfName.GOTOR) Then
MsgBox("gotoR")
ElseIf AnnotationAction.Get(PdfName.S).Equals(PdfName.GOTO) Then
''//Make sure that it has a destination
If AnnotationAction.GetAsArray(PdfName.D) Is Nothing Then Continue For
Dim AnnotationReferencedPage = PdfReader.GetPdfObject(DirectCast(AnnotationAction.GetAsArray(PdfName.D).ArrayList(0), PRIndirectReference))
''//Re-loop through all of the pages in the main document comparing them to this page
For J = 1 To PageCount
If AnnotationReferencedPage.Equals(PdfReader.GetPageN(J)) Then
Trace.WriteLine(J)
Exit For
End If
Next
ElseIf AnnotationAction.Get(PdfName.S).Equals(PdfName.URI) Then
Else
MsgBox("else type")
End If
Next
Next
'bookmarkGlobalSetting(myBookmarks)
PdfReader.Close()
In the sample document attached having 3 annots, but its not showing action on it while reading via code, but its availed when seeing this in acrobat. Please suggest me on this..
please download from here... https://drive.google.com/file/d/0B72uT6TzR_cdWGczRkY3ZmNpeTg/edit?usp=sharing
Those annotations do not have any A action entry:
Your expectation to find an action with a link is wrong because it may alternatively have a destination, cf. section 12.5.6.5 Link Annotations of ISO 32000-1:
A link annotation represents either a hypertext link to a destination elsewhere in the document (see 12.3.2, “Destinations”) or an action to be performed (12.6, “Actions”). Table 173 shows the annotation dictionary entries specific to this type of annotation.
Your code only deals with the latter type (with an action to be performed) and not the former (with a hypertext link to a destination elsewhere in the document).
Thus, you have to improve your code to also deal with the former type. You'll find details in Table 173 – Additional entries specific to a link annotation of ISO 32000-1.
PS: I assume your
If AnnotationDictionary.Get(PdfName.ACTION) Is Nothing Then Continue For
should have been
If AnnotationDictionary.Get(PdfName.A) Is Nothing Then Continue For

Custom clipboard data format accross RDC (.NET)

I'm trying to copy a custom object from a RDC window into host (my local) machine. It fails.
Here's the code that i'm using to 1) copy and 2) paste:
1) Remote (client running on Windows XP accessed via RDC):
//copy entry
IDataObject ido = new DataObject();
XmlSerializer x = new XmlSerializer(typeof(EntryForClipboard));
StringWriter sw = new StringWriter();
x.Serialize(sw, new EntryForClipboard(entry));
ido.SetData(typeof(EntryForClipboard).FullName, sw.ToString());
Clipboard.SetDataObject(ido, true);
2) Local (client running on local Windows XP x64 workstation):
//paste entry
IDataObject ido = Clipboard.GetDataObject();
DataFormats.Format cdf = DataFormats.GetFormat(typeof(EntryForClipboard).FullName);
if (ido.GetDataPresent(cdf.Name)) //<- this always returns false
{
//can never get here!
XmlSerializer x = new XmlSerializer(typeof(EntryForClipboard));
string xml = (string)ido.GetData(cdf.Name);
StringReader sr = new StringReader(xml);
EntryForClipboard data = (EntryForClipboard)x.Deserialize(sr);
}
It works perfectly on the same machine though.
Any hints?
There are a couple of things you could look into:
Are you sure the serialization of the object truely converts it into XML? Perhaps the outputted XML have references to your memory space? Try looking at the text of the XML to see.
If you really have a serialized XML version of the object, why not store the value as plain-vanilla text and not using typeof(EntryForClipboard) ? Something like:
XmlSerializer x = new XmlSerializer(typeof(EntryForClipboard));
StringWriter sw = new StringWriter();
x.Serialize(sw, new EntryForClipboard(entry));
Clipboard.SetText(sw.ToString(), TextDataFormat.UnicodeText);
And then, all you'd have to do in the client-program is check if the text in the clipboard can be de-serialized back into your object.
Ok, found what the issue was.
Custom format names get truncated to 16 characters when copying over RDC using custom format.
In the line
ido.SetData(typeof(EntryForClipboard).FullName, sw.ToString());
the format name was quite long.
When i was receiving the copied data on the host machine the formats available had my custom format, but truncated to 16 characters.
IDataObject ido = Clipboard.GetDataObject();
ido.GetFormats(); //used to see available formats.
So i just used a shorter format name:
//to copy
ido.SetData("MyFormat", sw.ToString());
...
//to paste
DataFormats.Format cdf = DataFormats.GetFormat("MyFormat");
if (ido.GetDataPresent(cdf.Name)) {
//this not works
...