OpenXML: Issue adding images to documents

OpenXML: Issue adding images to documents - openxml

Up until now, this block of code has been using to build documents with text for several months with no snags. I am now trying to dynamically add images. I've spent about two days staring at code and researching and am at an end. I suspect the issue is that relationships are not being created (more details below.) Maybe not?
//set stuff up...
WordprocessingDocument doc = WordprocessingDocument.Open(fsPat, true, new OpenSettings(){
AutoSave = true,
MarkupCompatibilityProcessSettings = new MarkupCompatibilityProcessSettings(MarkupCompatibilityProcessMode.ProcessAllParts,
DocumentFormat.OpenXml.FileFormatVersions.Office2007),
MaxCharactersInPart = long.MaxValue
});
MainDocumentPart mainPart = doc.MainDocumentPart;
.
.Other stuff goes here
.
//now the fun...
Run r2 = new Run();
// Add an ImagePart.
ImagePart ip = mainPart.AddImagePart(ImagePartType.Png);
string imageRelationshipID = mainPart.CreateRelationshipToPart(ip); //
using (Stream imgStream = ip.GetStream())
{
System.Drawing.Bitmap b = new System.Drawing.Bitmap("myfile.png");
b.Save(imgStream, System.Drawing.Imaging.ImageFormat.Png);
}
Drawing drawing = BuildImage(imageRelationshipID, "name"+imageRelationshipID.ToString(), 17, 17);
r2.Append(drawing);
p.Append(r2);
The image part is essentially copied from http://blog.stuartwhiteford.com/?p=33) and is running in a loop presently. I also copied his BuildImage() function and use it as-is.
When I open the resulting docx, I see red Xs where the images are saying "This image cannot currently be displayed."
When I open the zip, the images will appear in root/media, but not root/word/media as I'd expect. I also cannot find the images referenced in any of the relationship files. Ideally they'd be in root/word/_rels/document.xml.rels. You'll notice I changed how imageRelationshipID is set hoping to fix this. It didn't.
Please help. Thank you.

So... It seems like OpenXML just hates me. I copied AddImagePart code from like 3-4 places among trying other things--none of which lasted long--and just could not get relationships to form. The implication I see is that they happen automatically with the AddImagePart function.
I ended up doing a complete workaround where I add all the pictures I might want to put and remove the Drawing nodes' parents of the ones I didn't want (Run nodes, generally.) Since these are very small pictures, it's feasible and in ways more elegant than trying to add them as necessary since I don't have to keep track of where images are stored on disk.

Related

Javascript DOM addressing into a sub-window DOM element

Given this screenshot of a Firefox DOM rendering, I'm interested in reading that highlighted element down a ways there and writing to the "hidden" attribute 3 lines above it. I don't know the Javascript hierarchy nomenclature to traverse through that index "0" subwindow that shows in the first line under window indexed "3" which is the root context of my code's hierarchy. That innerText element I'm after does not appear anywhere else in the DOM, at least that I can find...and I've looked and looked for it elsewhere.
Just looking at this DOM, I would say I could address that info as follows: Window[3].Window[0].contentDocument.children[0].innerText (no body, interestingly enough).
How this DOM came about is a little strange in that Window[0] is generated by the following code snippet located inside an onload event. It makes a soft EMBED element, so that Window[0] and everything inside is transient. FWIW, the EMBED element is simply a way for the script to offload the task of asynchronously pulling in the next .mp4 file name from the server while the previous .mp4 is playing so it will be ready instantly onended; no blocking necessary to get it.
if (elmnt.contentDocument.body.children[1] == 'undefined' || elmnt.contentDocument.body.children[1] == null)
{
var mbed = document.createElement("EMBED");
var attsrc = document.createAttribute("src")
mbed.setAttributeNode(attsrc);
var atttyp = document.createAttribute("type")
mbed.setAttributeNode(atttyp);
var attwid = document.createAttribute("width")
mbed.setAttributeNode(attwid);
var atthei = document.createAttribute("height")
mbed.setAttributeNode(atthei);
elmnt.contentDocument.body.appendChild(mbed);
}
elmnt.contentDocument.body.children[1].src=elmnt.contentDocument.body.children[0].currentSrc + '\?nextbymodifiedtime'
elmnt.contentDocument.body.children[1].type='text/plain'
I know better than to think Window[3].Window[0]...... is valid. Can anyone throw me a clue how to address the DOM steps into the contentDocument of that Window[0]? Several more of those soft Windows from soft EMBED elements will eventually exist as I develop the code, so keep that in mind. Thank you!

elmnt.contentWindow[0].document.children[0].innerText does the trick

How can I get Excel to close when I'm done with it?

This is in a COM API Word AddIn. And yes normally Hans Passant's advice to let .NET clean everything up works.
But it is not working for the following case - and I have tested running normally (no debugger) and have narrowed it down to this specific code:
private Chart chart;
private bool displayAlerts;
private Application xlApp;
Chart chart = myShape.Chart;
ChartData chartData = chart.ChartData;
chartData.Activate();
WorkbookData = (Workbook)chartData.Workbook;
xlApp = WorkbookData.Application;
displayAlerts = xlApp.DisplayAlerts;
xlApp.Visible = false;
xlApp.DisplayAlerts = false;
WorksheetData = (Worksheet)WorkbookData.Worksheets[1];
WorksheetDataName = WorksheetData.Name;
WorksheetData.UsedRange.Clear();
// ... do a bunch of stuff including writing to the worksheet
xlApp.DisplayAlerts = displayAlerts;
WorkbookData.Close(true);
I think the problem is likely Word is giving me this workbook and so who knows what it is doing to instantiate Excel. But even after I exit Word, the Excel instance is still running.
Again, in Word (not Excel), accessing a chart object to update the data in the chart.

COM objects need to be released completely, else "orphaned" objects can keep an application in memory, even after the code that called it goes out-of-scope.
This particular case may be special (compared to other code you've used previously) due to using xlApp. By default, an Excel Application object is not needed or used when manipulating charts with the object model introduced in Office 2007 (I think it was). It's used in the code in the question in order to hide the Excel window, which is visible by default (and by design). But the object model isn't designed to handle cleaning that up - it assumes it isn't present...
In my tests, the object is released correctly when (referencing the code in the question):
All Excel objects are set to null in the reverse order they are instantiated, being sure to quit the Excel application before trying to set that object to null:
WorksheetData = null;
WorkbookData = null;
xlApp.Quit();
xlApp = null;
Then, C# has a tendency to create objects behind the scenes when COM dot-notation is used - these don't always get cleaned up (released) properly. So it's good practice to create an object for each level of the hierarchy being used. (Note: VBA doesn't have this problem, so any code picked up from a VBA example or the macro recorder needs to be re-worked in this respect.) From the code in the question this affects WorksheetData.UsedRange.Clear();
Excel.Range usedRng = WorksheetData.UsedRange;
usedRng.Clear();
usedRng = null;
And the standard clean up, to make sure everything is released at a predictable moment:
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
GC.WaitForPendingFinalizers();
When things like this crop up, I always refer to ".NET Development for Office" by Andrew Whitechapel. That's really "bare bones" and some of it no longer relevant, given the changes to C# over the years (to make it "easier to use" in the way VB.NET is "easier"). But how COM interacts with .NET hasn't changed, deep down...

Remove PdfImageOject from a PDF

I have 1000th of PDF generated from emails containing .png (I am not owner of the generator). For some reasons, those PDF are very very slow to render with the Imaging system I am using (I am not the developer of that system and may not change it).
If I use iTextSharp and implement a IRenderListener to count the Images to be rendered, there are thousands per page (99% being 1 or 2 pixels only). But if I count the Images in the resources of the PDF, there are only a few (~tens).
I am counting the images in the resources, per page, with the code here after
var dict = pdfReader.GetPageN(currentPage)
PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(dict.Get(PdfName.RESOURCES));
PdfDictionary xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
if (xobj != null)
{
foreach (PdfName name in xobj.Keys)
{
PdfObject obj = xobj.Get(name);
if ((obj.IsIndirect()))
{
PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
PdfName subtype = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
if (PdfName.IMAGE.Equals(subtype))
{
Count++
And my IRenderListener looks like this:
class ImageRenderListener : IRenderListener
{
public void RenderImage(iTextSharp.text.pdf.parser.ImageRenderInfo renderInfo)
{
PdfImageObject image = renderInfo.GetImage();
if (image == null) return;
var refObj = renderInfo.GetRef();
if (refObj == null)
Count++; // but why no ref ??
else
Count++;
}
I just started to learn about PDF specification and iTextSharp this evening, to analyze my PDF and understand what could be wrong... if I am correct, I see that many images to be rendered that are not referencing a resource (refObj == null) and that they are .png (image.streamContentType.FileExtension = "png"). So, I think those are the images making the rendering so slow...
For testing purpose, I would like to delete those images from the PDF but don't find how to proceed.
I only found code samples to remove image that are in the resources... but the images I want to delete are not :/
Is there any code sample somewhere to help me ? I did google on "iTextSharp remove object", etc... but there was nothing similar to my case :(

Let me start with the blunt observation that you have a shitty PDF.
The image you see when opening the PDF in a PDF viewer seems to be composed of several small 1- or 2-pixel images. The drawing operations to show these pixels one by one is suboptimal, no matter which imaging system you use: you are faced with a bad PDF.
In your first snippet, I see that you loop over all of the indirect objects stored in the the XObject resources of each page in search of images. You count these images, resulting in a number of Image XObjects stored in the PDF. If you add up all the Count values for all the pages, this number can be higher than the actual number of Image XObject stored in the PDF as you don't take into account that some images can be reused on different pages.
You do not count the inline images that are stored in the content streams. I'm biased. In the ISO committees for PDF, I'm on the side of the group of people saying that "inline images are evil" and "inline images should die". For now, we didn't succeed in getting rid of inline images, but we introduced some substantial limitations that should reduce the (ab)use of inline images in PDF that conform to ISO-32000-2 (the PDF 2.0 spec that is due in 2016).
You've already discovered that your PDF has inline images. Those are the images where refObj == null. They are not stored as indirect objects; they are stored inline, in the content stream of the page. As you can imagine based on my feelings towards inline images, I consider your PDF being a bad PDF for this reason (although it does conform to ISO-32000-1).
The presence of inline images is a first explanation why you have a different image count: when you loop over the indirect objects you only find part of the images. When you parse the document for images, you also find the inline images.
A second explanation could be the fact that the Image XObject are used more than once. That's the whole point of not using inline images. For instance: if you have an image that represents a logo that needs to be repeated on every page, one could use inline images. That would be a bad idea: the same image bytes would be present in the PDF as many times as there are pages. One should use an Image XObject. In this case, the image bytes of the logo are stored only once in an indirect object. There's a reference to this object from every page, so that the image bytes are stored in the document only once. In a 10-page document, you can see 10 identical images on 10 pages, but when looking inside the document, you'll find only one image that is referenced from every page.
If you remove Image XObjects by removing the indirect objects containing the image stream objects, you have to be very careful: are you sure you're not corrupting your document? Because there's a reference to the Image XObject in the content stream of your page. This reference points to an entry in the /XObjects entry of the page's /Resources. This /XObject references to the stream object with the image bytes. If you remove that indirect object without removing the references (e.g. from the content stream), you break your PDF. Some viewers will ignore those errors, but at some point in time some tool (or some body) is going to complain that your PDF is corrupt.
If you want to remove inline images, you have to parse all the content streams in your PDF: page content streams as well as Form XObject content streams. You have to rewrite all these streams and make sure all inline images are removed. That is: all objects that that start with the BI operator (Begin Image) and end with the EI operator (End Image).
That's a task for a PDF specialist who knows both iTextSharp and ISO-32000-1 inside-out. The solution to your problem probably doesn't fit into an answering window on StackOverflow.
I'm the original author of iText. From a certain point of view, iText is like a sharp knife. A sharp knife is a very good tool that can be used for many good things. However, you can also seriously cut your fingers when you're not using the knife in a correct way. I hope you'll be careful and that you're not going to create a whole series of damaged PDF files.
For instance: you assume that some of the files in the PDF are PNGs because iText suggests to store them as PNGs. However: PNG is not supported by ISO-32000-1, so your assumption that your PDF contains PNGs is wrong. I honestly worry when I see questions like yours.

Importing multiple .xlsx files taken from a network of files

I'm a Matlab novice and have been struggling with this particular task for weeks now.
I'm trying to create a nested for loop for uploading all of my data into Matlab. I need the code to go into the file for subject 1, go into the file for the first exercise, upload 3 files (EMG, Kinetic, and kinematic data), then go back and enter the file for the second exercise and upload the data in that file, repeat this for all 5 exercises, and then repeat this whole process for all 12 subjects. I have created code for uploading data from just one file using information I've read off the internet, but to create this programme to get all of the data from all these files has proven to be very difficult.
Below is the code I have currently written:
clear all;
Subjects = dir('C:\Users\pricep\Desktop\JuggaData');
Exercise = dir('C:\Users\pricep\Desktop\JuggaData\Subject1');
Trialdata = dir('C:\Users\pricep\Desktop\JuggaData\Subject1\*.xlsx');
subjectnum = numel(Subjects);
exercisenum = numel(Exercise);
datanum = numel(Trialdata);
myData = cell(datanum,1);
for k = 1:subjectnum
for j = 1:exercisenum
for i = 1:datanum
filename = sprintf(Trialdata(i).name);
myData{k} = importdata(filename);
end
end
end
No error message appears, but no data appears either.
As you can tell I'm a complete novice so any help would be greatly appreciated.

Verify that subjectnum, exercisenum and datanum are above zero, probably one of the files is zero causing nothing to happen. Besides this, there are other issues:
exercisenum is constant, which assumes that every subject has the same amount of exercises. Might be wrong.
dir returns the file names .. (parent directory) and . (current directory) as well. You don't filter these.
importdata is called with a relative file path, which is not possible.
myData = cell(datanum,1); allocates not enough space, assuming there is more than one subject and exercise.

How to delete a file while the application is running

I have developed a C# application, in the application the users choose a photo for each record. However the user should also be able to change the pre-selected photo with a newer one. When the user changes the photo the application first deletes the old photo from the application directory then copies the new photo, but when it does that the application gives an exception because the file is used by the application so it cannot be deleted while the application is running. Does any one have a clue how to sort this out? I appreciate your help
This is the exception
The process cannot access the file
'D:\My
Projects\Hawkar'sProject\Software\Application\bin\Debug\Photos\John
Smith.png' because it is being used by
another process.
//defining a string where contains the file source path
string fileSource = Open.FileName;
//defining a string where it contains the file name
string fileName = personNameTextBox.Text + ".png" ;
//defining a string which specifies the directory of the destination file
string fileDest = dir + #"\Photos\" + fileName;
if (File.Exists(fileDest))
{
File.Delete(fileDest);
//this is a picturebox for showing the images
pbxPersonal.Image = Image.FromFile(dir + #"\Photos\" + "No Image.gif");
File.Copy(fileSource, fileDest);
}
else
{
File.Copy(fileSource, fileDest);
}
imageIDTextBox.Text = fileDest;

First of all, you code is not good.
The new image is only copied if there is currently no image (else).
But if there is an old image, you only delete this image, but never copy the newer one (if).
The code should better look like this:
if (File.Exists(fileDest))
{
File.Delete(fileDest);
}
File.Copy(fileSource, fileDest);
imageIDTextBox.Text = fileDest;
This code should work, but if you are getting an exception that the file is already in use, you should check "where" you use the file. Perhaps you are reading the file at program start. Check all parts of your program you are accessing these user files if there are some handles open.

Thanks a lot for your help and sorry for the mistake in the code I just saw it, my original code is just like as you have written but I don't know maybe when I posted accidentally I've put like this. when the application runs there is a picturebox which the image of each record is shown, that is why the application is giving an exception when I want to change the picture because it has been used once by the picturebox , however I've also tried to load another picture to the picture box before deleting the original one but still the same. I've modified the above could if you want to examine it

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse