Converting programmatically created openXML word documents to HTML errors with nulls

Converting programmatically created openXML word documents to HTML errors with nulls - openxml

I'm creating WordProcessingDocuments using openxml (which works fine and the produced word doc is exactly what I want), now I'm trying to convert these newly created docs to HTML using the openxml Powertools. I'm new to this so I'm hoping thats it's something stupid that I'm missing but was hoping someone could point me in the right direction with these nullable errors I'm receiving.
This is the exact error...
System.NullReferenceException: Object reference not set to an instance of an object.
at OpenXmlPowerTools.HtmlConverter.ConvertToHtmlTransform(WordprocessingDocument wordDoc, HtmlConverterSettings settings, XNode node, Func2 imageHandler)
at OpenXmlPowerTools.HtmlConverter.<>c__DisplayClass37.<ConvertToHtmlTransform>b__1d(XElement e)
at System.Linq.Enumerable.WhereSelectEnumerableIterator2.MoveNext()
at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
at System.Xml.Linq.XElement..ctor(XName name, Object content)
at OpenXmlPowerTools.HtmlConverter.ConvertToHtmlTransform(WordprocessingDocument wordDoc, HtmlConverterSettings settings, XNode node, Func2 imageHandler)
at OpenXmlPowerTools.HtmlConverter.<>c__DisplayClass37.<ConvertToHtmlTransform>b__1c(XElement e)
at System.Linq.Enumerable.WhereSelectEnumerableIterator2.MoveNext()
at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
at System.Xml.Linq.XElement..ctor(XName name, Object[] content)
at OpenXmlPowerTools.HtmlConverter.ConvertToHtmlTransform(WordprocessingDocument wordDoc, HtmlConverterSettings settings, XNode node, Func`2 imageHandler)
I'm using the exact same code you can find on Eric Whites blog.
public static void PrintHTML(string file)
{
byte[] byteArray = File.ReadAllBytes(file);
using (MemoryStream memoryStream = new MemoryStream())
{
memoryStream.Write(byteArray, 0, byteArray.Length);
using (WordprocessingDocument doc =
WordprocessingDocument.Open(memoryStream, true))
{
HtmlConverterSettings settings = new HtmlConverterSettings()
{
//PageTitle = "some title"
};
XElement html = HtmlConverter.ConvertToHtml(doc, settings);
File.WriteAllText(#"C:\\Temp\Test.html", html.ToStringNewLineOnAttributes());
}
}
}
I know the code works because if i pass it a normal worddoc that I haven't created it works fine and converts to html fine. If i create a word doc using openxml then manually copy the contents into a new word file, save it, then pass it through the conversion code, that will work as well. So I'm thinking it must be something to do with the way I'm createing the word doc in openxml initially. Maybe im not adding a part to the file that is required.
Using the openxml sdk I have compared a working and non working file and they appear to have the same components/parts.
From the errors I've posted does anyone have any ideas of where the problem could be, ie, what is null? I can post the creation code for the word doc but it's quite extensive and it might just confuse people more.

I finally got to the bottom of this. I had to dig out the source code for the HtmlConverter in the openxmlpower tools, after some debuging I found that this line in the code was erroring...
line 371
styleId = (string)wordDoc.MainDocumentPart.StyleDefinitionsPart
.GetXDocument().Root.Elements(W.style)
.Where(e => (string)e.Attribute(W.type) == "paragraph" &&
(string)e.Attribute(W._default) == "1")
.FirstOrDefault().Attributes(W.styleId).FirstOrDefault();
basically in my debugging the
(string)e.Attribute(W._default)
was returning as True or False
so i changed the following line
.Where(e => (string)e.Attribute(W.type) == "paragraph" &&
(string)e.Attribute(W._default) == "1")
to
.Where(e => (string)e.Attribute(W.type) == "paragraph" && (
(string)e.Attribute(W._default) == "1" || (string)e.Attribute(W._default) == "true"))
and now works as expected

Had the same issue where I was saving a reportbuilder report to OpenWordXML and could not convert the bytes to html.
Had to add the following line of code for it to work correctly with version 2.8.1.0
private static IEnumerable<XElement> ParaStyleParaPropsStack(XDocument stylesXDoc,
string paraStyleName, XElement para)
{
if (stylesXDoc == null)
yield break;
var localParaStyleName = paraStyleName;
while (localParaStyleName != null)
{
XElement paraStyle = stylesXDoc.Root.Elements(W.style).FirstOrDefault(s
=>
**s.Attribute(W.type) != null &&**
s.Attribute(W.type).Value == "paragraph" &&
s.Attribute(W.styleId).Value == localParaStyleName);
s.Attribute(W.type) != null && // the liner that was added

Related

Salesforce trigger-Not able to understand

Below is the code written by my collegue who doesnt work in the firm anymore. I am inserting records in object with data loader and I can see success message but I do not see any records in my object. I am not able to understand what below trigger is doing.Please someone help me understand as I am new to salesforce.
trigger DataLoggingTrigger on QMBDataLogging__c (after insert) {
Map<string,Schema.RecordTypeInfo> recordTypeInfo = Schema.SObjectType.QMB_Initial_Letter__c.getRecordTypeInfosByName();
List<QMBDataLogging__c> logList = (List<QMBDataLogging__c>)Trigger.new;
List<Sobject> sobjList = (List<Sobject>)Type.forName('List<'+'QMB_Initial_Letter__c'+'>').newInstance();
Map<string, QMBLetteTypeToVfPage__c> QMBLetteTypeToVfPage = QMBLetteTypeToVfPage__c.getAll();
Map<String,QMBLetteTypeToVfPage__c> mapofLetterTypeRec = new Map<String,QMBLetteTypeToVfPage__c>();
set<Id>processdIds = new set<Id>();
for(string key : QMBLetteTypeToVfPage.keyset())
{
if(!mapofLetterTypeRec.containsKey(key)) mapofLetterTypeRec.put(QMBLetteTypeToVfPage.get(Key).Letter_Type__c, QMBLetteTypeToVfPage.get(Key));
}
for(QMBDataLogging__c log : logList)
{
Sobject logRecord = (sobject)log;
Sobject QMBLetterRecord = new QMB_Initial_Letter__c();
if(mapofLetterTypeRec.containskey(log.Field1__c))
{
string recordTypeId = recordTypeInfo.get(mapofLetterTypeRec.get(log.Field1__c).RecordType__c).isAvailable() ? recordTypeInfo.get(mapofLetterTypeRec.get(log.Field1__c).RecordType__c).getRecordTypeId() : recordTypeInfo.get('Master').getRecordTypeId();
string fieldApiNames = mapofLetterTypeRec.containskey(log.Field1__c) ? mapofLetterTypeRec.get(log.Field1__c).FieldAPINames__c : '';
//QMBLetterRecord.put('Letter_Type__c',log.Name);
QMBLetterRecord.put('RecordTypeId',tgh);
processdIds.add(log.Id);
if(string.isNotBlank(fieldApiNames) && fieldApiNames.contains(','))
{
Integer i = 1;
for(string fieldApiName : fieldApiNames.split(','))
{
string logFieldApiName = 'Field'+i+'__c';
fieldApiName = fieldApiName.trim();
system.debug('fieldApiName=='+fieldApiName);
Schema.DisplayType fielddataType = getFieldType('QMB_Initial_Letter__c',fieldApiName);
if(fielddataType == Schema.DisplayType.Date)
{
Date dateValue = Date.parse(string.valueof(logRecord.get(logFieldApiName)));
QMBLetterRecord.put(fieldApiName,dateValue);
}
else if(fielddataType == Schema.DisplayType.DOUBLE)
{
string value = (string)logRecord.get(logFieldApiName);
Double dec = Double.valueOf(value.replace(',',''));
QMBLetterRecord.put(fieldApiName,dec);
}
else if(fielddataType == Schema.DisplayType.CURRENCY)
{
Decimal decimalValue = Decimal.valueOf((string)logRecord.get(logFieldApiName));
QMBLetterRecord.put(fieldApiName,decimalValue);
}
else if(fielddataType == Schema.DisplayType.INTEGER)
{
string value = (string)logRecord.get(logFieldApiName);
Integer integerValue = Integer.valueOf(value.replace(',',''));
QMBLetterRecord.put(fieldApiName,integerValue);
}
else if(fielddataType == Schema.DisplayType.DATETIME)
{
DateTime dateTimeValue = DateTime.valueOf(logRecord.get(logFieldApiName));
QMBLetterRecord.put(fieldApiName,dateTimeValue);
}
else
{
QMBLetterRecord.put(fieldApiName,logRecord.get(logFieldApiName));
}
i++;
}
}
}
sobjList.add(QMBLetterRecord);
}
if(!sobjList.isEmpty())
{
insert sobjList;
if(!processdIds.isEmpty()) DeleteDoAsLoggingRecords.deleteTheProcessRecords(processdIds);
}
Public static Schema.DisplayType getFieldType(string objectName,string fieldName)
{
SObjectType r = ((SObject)(Type.forName('Schema.'+objectName).newInstance())).getSObjectType();
DescribeSObjectResult d = r.getDescribe();
return(d.fields.getMap().get(fieldName).getDescribe().getType());
}
}

You might be looking in the wrong place. Check if there's an unit test written for this thing (there should be one, especially if it's deployed to production), it should help you understand how it's supposed to be used.
You're inserting records of QMBDataLogging__c but then it seems they're immediately deleted in DeleteDoAsLoggingRecords.deleteTheProcessRecords(processdIds). Whether whatever this thing was supposed to do succeeds or not.
This seems to be some poor man's CSV parser or generic "upload anything"... that takes data stored in QMBDataLogging__c and creates QMB_Initial_Letter__c out of it.
QMBLetteTypeToVfPage__c.getAll() suggests you could go to Setup -> Custom Settings, try to find this thing and examine. Maybe it has some values in production but in your sandbox it's empty and that's why essentially nothing works? Or maybe some values that are there are outdated?
There's some comparison if what you upload into Field1__c can be matched to what's in that custom setting. I guess you load some kind of subtype of your QMB_Initial_Letter__c in there. Record Type name and list of fields to read from your log record is also fetched from custom setting based on that match.
Then this thing takes what you pasted, looks at the list of fields in from the custom setting and parses it.
Let's say the custom setting contains something like
Name = XYZ, FieldAPINames__c = 'Name,SomePicklist__c,SomeDate__c,IsActive__c'
This thing will look at first record you inserted, let's say you have the CSV like that
Field1__c,Field2__c,Field3__c,Field4__c
XYZ,Closed,2022-09-15,true
This thing will try to parse and map it so eventually you create record that a "normal" apex code would express as
new QMB_Initial_Letter__c(
Name = 'XYZ',
SomePicklist__c = 'Closed',
SomeDate__c = Date.parse('2022-09-15'),
IsActive__c = true
);
It's pretty fragile, as you probably already know. And because parsing CSV is an art - I expect it to absolutely crash and burn when text with commas in it shows up (some text,"text, with commas in it, should be quoted",more text).
In theory admin can change mapping in setup - but then they'd need to add new field anyway to the loaded file. Overcomplicated. I guess somebody did it to solve issue with Record Type Ids - but there are better ways to achieve that and still have normal CSV file with normal columns and strong type matching, not just chucking everything in as strings.
In theory this lets you have "jagged" csv files (row 1 having 5 fields, row 2 having different record type and 17 fields? no problem)
Your call whether it's salvageable or you'd rather ditch it and try normal loading of QMB_Initial_Letter__c records. (get back to your business people and ask for requirements?) If you do have variable number of columns at source - you'd need to standardise it or group the data so only 1 "type" of records (well, whatever's in that "Field1__c") goes into each file.

How to merge two ppt by poi

I want to merge multiple ppts. I use POI realize most functions, but there are still some problems. Some elements are not generated. I tested several groups of ppts.
Case 1: If there is only one slide in the PPT, the result is right. If there are multiple slides, will throw exception.
Below is the exception stack:
java.lang.ClassCastException: org.apache.poi.ooxml.POIXMLDocumentPart cannot be cast to org.apache.poi.xslf.usermodel.XSLFPictureData
at org.apache.poi.xslf.usermodel.XSLFSheet.importBlip(XSLFSheet.java:649)
at org.apache.poi.xslf.usermodel.XSLFPictureShape.copy(XSLFPictureShape.java:378)
at org.apache.poi.xslf.usermodel.XSLFSheet.wipeAndReinitialize(XSLFSheet.java:454)
at org.apache.poi.xslf.usermodel.XSLFSheet.importContent(XSLFSheet.java:433)
at org.apache.poi.xslf.usermodel.XSLFSlide.importContent(XSLFSlide.java:294)
at com.office.MergingMultiplePresentations.main(MergingMultiplePresentations.java:38)
Case 2: I tested another PPT, and when I opened it, it prompted “there is a problem with the content, you can try to repair it“”. When I click repair, the some slide of the PPT was deleted. Is there something that hasn't been copied?
Here is my code:
XMLSlideShow ppt = new XMLSlideShow();
//taking the two presentations that are to be merged
String path = "E:\\prj\\test\\";
String file1 = "1.pptx";
String file2 = "2.pptx";
String[] inputs = {file1,file2};
for(String arg : inputs){
FileInputStream inputstream = new FileInputStream(path+arg);
XMLSlideShow src = new XMLSlideShow(inputstream);
for(XSLFSlide srcSlide : src.getSlides()) {
try {
XSLFSlideLayout srcLayout = srcSlide.getSlideLayout();
XSLFSlideMaster srcMaster = srcSlide.getSlideMaster();
XSLFSlide slide = ppt.createSlide();
XSLFSlideLayout layout = slide.getSlideLayout();
XSLFSlideMaster master = slide.getSlideMaster();
layout.importContent(srcLayout);
master.importContent(srcMaster);
slide.importContent(srcSlide);
}
catch (Exception e){
e.printStackTrace();
}
}
}
String file3 = "3.pptx";
//creating the file object
FileOutputStream out = new FileOutputStream(path+file3);
// saving the changes to a file
ppt.write(out);
out.close();

The operation of merging presentations using POI looks a bit cumbersome due to the fact that you have to take care of the layouts and masters yourself. It's easier to use Aspose.Slides for Java for this. The following code example shows you how to merge presentations using that library. Slide layouts and slide masters will be merged automatically.
String file1 = "1.pptx";
String file2 = "2.pptx";
String[] inputs = {file1, file2};
// Prepare a new empty presentation.
Presentation ppt = new Presentation();
ppt.getSlides().removeAt(0); // removes the first empty slide
ppt.getSlideSize().setSize(SlideSizeType.Widescreen, SlideSizeScaleType.Maximize);
// Merge the input presentations.
for (String file : inputs) {
Presentation source = new Presentation(file);
for (ISlide slide : source.getSlides()) {
ppt.getSlides().addClone(slide);
}
source.dispose();
}
ppt.save("3.pptx", SaveFormat.Pptx);
ppt.dispose();
This is a paid product, but you can get a temporary license to try it out.
Alternatively, you could use Aspose.Slides Cloud SDK for Java. This product provides a REST-based API that allows you to make 150 free API calls per month for API learning and presentation processing. The following code example shows you how to do the same using Aspose.Slides Cloud:
SlidesApi slidesApi = new SlidesApi("my_client_id", "my_client_secret");
String file1 = "1.pptx";
String file2 = "2.pptx";
String outFile = "3.pptx";
// Prepare a new empty presentation.
slidesApi.createPresentation(outFile, null, null, null, null, null);
slidesApi.deleteSlide(outFile, 1, null, null, null); // removes the first empty slide
SlideProperties slideProperties = new SlideProperties();
slideProperties.setSizeType(SlideProperties.SizeTypeEnum.WIDESCREEN);
slideProperties.setScaleType(SlideProperties.ScaleTypeEnum.MAXIMIZE);
slidesApi.setSlideProperties(outFile, slideProperties, null, null, null);
// Merge the input presentations.
PresentationsMergeRequest mergeRequest = new PresentationsMergeRequest();
mergeRequest.setPresentationPaths(Arrays.asList(file1, file2));
slidesApi.merge(outFile, mergeRequest, null, null, null);
Sometimes it is necessary to merge presentations without any code. For such cases, you can use the free Aspose Online Merger.
I work as a Support Developer at Aspose.

How to edit pasted content using the Open XML SDK

I have a custom template in which I'd like to control (as best I can) the types of content that can exist in a document. To that end, I disable controls, and I also intercept pastes to remove some of those content types, e.g. charts. I am aware that this content can also be drag-and-dropped, so I also check for it later, but I'd prefer to stop or warn the user as soon as possible.
I have tried a few strategies:
RTF manipulation
Open XML manipulation
RTF manipulation is so far working fairly well, but I'd really prefer to use Open XML as I expect it to be more useful in the future. I just can't get it working.
Open XML Manipulation
The wonderfully-undocumented (as far as I can tell) "Embed Source" appears to contain a compound document object, which I can use to modify the copied content using the Open XML SDK. But I have been unable to put the modified content back into an object that lets it be pasted correctly.
The modification part seems to work fine. I can see, if I save the modified content to a temporary .docx file, that the changes are being made correctly. It's the return to the clipboard that seems to be giving me trouble.
I have tried assigning just the Embed Source object back to the clipboard (so that the other types such as RTF get wiped out), and in this case nothing at all gets pasted. I've also tried re-assigning the Embed Source object back to the clipboard's data object, so that the remaining data types are still there (but with mismatched content, probably), which results in an empty embedded document getting pasted.
Here's a sample of what I'm doing with Open XML:
using OpenMcdf;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
...
object dataObj = Forms.Clipboard.GetDataObject();
object embedSrcObj = dateObj.GetData("Embed Source");
if (embedSrcObj is Stream)
{
// read it with OpenMCDF
Stream stream = embedSrcObj as Stream;
CompoundFile cf = new CompoundFile(stream);
CFStream cfs = cf.RootStorage.GetStream("package");
byte[] bytes = cfs.GetData();
string savedDoc = Path.GetTempFileName() + ".docx";
File.WriteAllBytes(savedDoc, bytes);
// And then use the OpenXML SDK to read/edit the document:
using (WordprocessingDocument openDoc = WordprocessingDocument.Open(savedDoc, true))
{
OpenXmlElement body = openDoc.MainDocumentPart.RootElement.ChildElements[0];
foreach (OpenXmlElement ele in body.ChildElements)
{
if (ele is Paragraph)
{
Paragraph para = (Paragraph)ele;
if (para.ParagraphProperties != null && para.ParagraphProperties.ParagraphStyleId != null)
{
string styleName = para.ParagraphProperties.ParagraphStyleId.Val;
Run run = para.LastChild as Run; // I know I'm assuming things here but it's sufficient for a test case
run.RunProperties = new RunProperties();
run.RunProperties.AppendChild(new DocumentFormat.OpenXml.Wordprocessing.Text("test"));
}
}
// etc.
}
openDoc.MainDocumentPart.Document.Save(); // I think this is redundant in later versions than what I'm using
}
// repackage the document
bytes = File.ReadAllBytes(savedDoc);
cf.RootStorage.Delete("Package");
cfs = cf.RootStorage.AddStream("Package");
cfs.Append(bytes);
MemoryStream ms = new MemoryStream();
cf.Save(ms);
ms.Position = 0;
dataObj.SetData("Embed Source", ms);
// or,
// Clipboard.SetData("Embed Source", ms);
}
Question
What am I doing wrong? Is this just a bad/unworkable approach?

TagLib-Sharp - Can it read/write chapter marks?

I am trying to read chapters from mp4 video files. I don't see this in the File.Tags list, but I was hoping there was a way to get them via requesting the chap atom.
I did try mp4chap, but it only gets me the first chapter. I think it may be meant for audio files only.

I am the author of mp4chap lib.
Yes, you are correct. mp4chap library is meant to use for audio files only.
Library is open source and really simple, you are free to modify it.
Here library analyze only first track (see code below). You can play with this code and get more data from mp4 tracks.
http://mp4chap.codeplex.com/SourceControl/latest#Mp4Chapters/ChapterExtractor.cs
private void ReadChapters(MoovInfo moovBox)
{
var soundBox = moovBox.Tracks.Where(b => b.Type == "soun").ToArray();
if (soundBox.Length == 0) return;
if (soundBox[0].Chaps != null && soundBox[0].Chaps.Length > 0)
{
var cb = new HashSet<uint>(soundBox[0].Chaps);
var textBox = moovBox.Tracks.Where(b => b.Type == "text" && cb.Contains(b.Id)).ToArray();
if (textBox.Length == 0) return;
ReadChaptersText(textBox[0]);
}
}

How do I unlock a content control using the OpenXML SDK in a Word 2010 document?

I am manipulating a Word 2010 document on the server-side and some of the content controls in the document have the following Locking properties checked
Content control cannot be deleted
Contents cannot be edited
Can anyone advise set these Locking options to false or remove then altogether using the OpenXML SDK?

The openxml SDK provides the Lock class and the LockingValues enumeration
for programmatically setting the options:
Content control cannot be deleted and
Contents cannot be edited
So, to set those two options to "false" (LockingValues.Unlocked),
search for all SdtElement elements in the document and set the Val property to
LockingValues.Unlocked.
The code below shows an example:
static void UnlockAllSdtContentElements()
{
using (WordprocessingDocument wordDoc =
WordprocessingDocument.Open(#"c:\temp\myword.docx", true))
{
IEnumerable<SdtElement> elements =
wordDoc.MainDocumentPart.Document.Descendants<SdtElement>();
foreach (SdtElement elem in elements)
{
if (elem.SdtProperties != null)
{
Lock l = elem.SdtProperties.ChildElements.First<Lock>();
if (l == null)
{
continue;
}
if (l.Val == LockingValues.SdtContentLocked)
{
Console.Out.WriteLine("Unlock content element...");
l.Val = LockingValues.Unlocked;
}
}
}
}
}
static void Main(string[] args)
{
UnlockAllSdtContentElements();
}

Just for the ones who copy this code, keep in mind that if there is no Locks associated to the content control, then there won't be a Lock property associated to it, so when the code executes the following instruction, it will return a exception since there is no element found:
Lock l = elem.SdtProperties.ChildElements.First<Lock>();
The way to fix this is do the FirstOrDefault instead of First.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse