Set the text of a control via macro in Libre Office Draw - libreoffice

I have designed a document in Libre Office Draw, and now need to personalize it by filling certain controls (mainly labels) with names read from a text file.
Reading from a text file was trivial, but I am facing difficulties in obtaining a reference to a control placed in a Libre Office Draw document; all the functions I found mentioned were related to controls placed on a dialog, and did not seem applicable in this case.
This might be the first lead into reaching my goal:
document = ThisComponent.CurrentController.Frame
dispatcher = createUnoService("com.sun.star.frame.DispatchHelper")
But then, how to find a control placed on 'document' named, say, "MyLabel1"? Once the label is filled, the document would need to be exported to PDF.
Thanks a lot!

To export a LO Draw document to PDF from Basic you can use the following code.
Sub ExportToPDF
sURL = convertToURL("d:\temp\lo_draw.pdf")
dim mFileType(0)
mFileType(0) = createUnoStruct("com.sun.star.beans.PropertyValue")
mFileType(0).Name = "FilterName"
mFileType(0).Value = "draw_pdf_Export"
thisComponent.storeToURL(sURL, mFileType())
End Sub
To figure out how to get access to the "labels" please provide a sample LO Draw document.

Related

Report generator only showing names of the objects instead of their content

I have code that generates a full report, but I want to add an image in the header, so I created a template and tried it out. Now the Word file only shows the names of the objects in the report and not the actual content, like this:
Does anyone know where this problem comes from?
My code is
%% init
import mlreportgen.report.*
import mlreportgen.dom.*
rpt = Document('Report','docx', 'template.dotx');
moveToNextHole(rpt)
%% chapter
ChapterRegression = Chapter;
ChapterRegression.Title = 'Summary';
append(rpt, ChapterRegression)
close(rpt)
rptview(rpt)
My dotx template has just one Rich Text Content Control hole.
Thank you in advance!
My question has been answered on the MathWorks forum, but let me sum it up here as well:
I was using the DOM Document object, and Report API objects such as Chapter or TitlePage cannot be added to a DOM Document. Instead (and this is what I was doing at first), I should use mlreportgen.report.Report rather than the DOM Document. Both DOM and Report API objects can be added to a Report object.
Then, to use the functions that belong to the Document object, such as moveToNextHole(), use the DOM Document object that underlies the Report object.
Use it like this for example:
doc = rpt.Document;
doc.moveToNextHole();

iTextSharp extracts wrapped cell contents into new lines - how do you identify to which column a given wrapped piece of data belongs now?

I am using iTextSharp to extract data from pdfs.
I stumbled across the following problem, depicted by the scenario below:
I created a sample excel file to illustrate. Here is what it looks like:
I convert it to a pdf, using one of the many free online converters available out there, which generates a pdf looking like (when I generated the pdf I did not apply the styling to the excel):
Now, using iTextSharp to extract the data from the pdf, returns me the following string as the data extracted:
As you can see, wrapped cell data generates new lines, where each wrapped piece of data is separated by a single white space.
The problem: how does one now identify which column a given piece of wrapped data belongs to? If only iTextSharp preserved as many white spaces as columns...
In my example, how can I identify which column 111 belongs to?
Update 1:
A similar problem occurs whenever a field has more than one word (i.e., contains white spaces). For example, considering the 1st line of the sample above:
say it looked like
---A--- ---B--- ---C--- ---D---
aaaaaaa bb b cccc
iText again would generate the extraction for this one as:
aaaaaaa bb b cccc
Same problem here, in having to determine the borders of each column.
Update 2:
A sample of the real pdf file I am working with:
This is how the pdf data looks like.
In addition to Chris' generic answer, some background in iText(Sharp) content parsing...
iText(Sharp) provides a framework for content extraction in the namespace iTextSharp.text.pdf.parser / package com.itextpdf.text.pdf.parser. This framework reads the page content, keeps track of the current graphics state, and forwards information on pieces of content to the IExtRenderListener or IRenderListener / ExtRenderListener or RenderListener the user (i.e. you) provides. In particular it does not interpret structure into this information.
This render listener may be a text extraction strategy (ITextExtractionStrategy / TextExtractionStrategy), i.e. a special render listener which is predominantly designed to extract a pure text stream without formatting or layout information. And for this special case iText(Sharp) additionally provides two sample implementations, the SimpleTextExtractionStrategy and the LocationTextExtractionStrategy.
For your task you need a more sophisticated render listener which either
exports the text with coordinates (Chris in one of his answers has provided an extended LocationTextExtractionStrategy which can additionally provide positions and bounding boxes of text chunks) allowing you in additional code to analyse tabular structures; or
does the analysis of tabular data itself.
I do not have an example for the latter variant because generically recognizing and parsing tables is a whole project in itself. You might want to look into the Tabula project for inspiration; this project is surprisingly good at the task of table extraction.
PS: If you feel more at home with trying to extract structured content from a pure string representation of the content which nonetheless tries to reflect the original layout, you might try something like what is proposed in this answer, a variant of the LocationTextExtractionStrategy working similar to the pdftotext -layout tool; only the changes to be applied to the LocationTextExtractionStrategy are shown there.
PPS: Extraction of data from very specific PDF tables may be much easier; for example have a look at this answer which demonstrates that after some PDF analysis the specific way a given table is created might give rise to a simple custom render listener for extracting the table data. This can make sense for a single PDF with a table spanning many many pages like in the case of that answer, or it can make sense if you have many PDFs identically created by the same software.
This is why I asked for a representative sample file in a comment to your question.
Concerning your comments
Still with the pdf example above, both with an implementation from scratch of ITextExtractionStrategy and with extending LocationExtractionStrategy, I see that each RenderText is called at the following chunks: Fi, el, d, A, Fi, el, d... and so on. Can this be changed?
The chunks of text you get as separate RenderText calls are not separated by accident or some random decision of iText. They are the very strings drawn separately in the page content!
In your sample "Fi", "el", "d", and "A" come in different RenderText calls because the content stream contains operations in which first "Fi" is drawn, then "el", then "d", then "A".
This may sound weird at first. A common cause for such torn up words is that PDF does not use the kerning information from fonts; to apply kerning, therefore, the PDF generating software has to insert tiny forward or backward jumps between characters which should be farther from or nearer to each other than without kerning. Thus, words often are torn apart between kerning pairs.
So this cannot be changed, you will get those pieces, and it is the job of the text extraction strategy to put them together.
By the way, there are worse PDFs, some PDF generators position each and every glyph separately, foremost such generators which predominantly build GUIs but can as a feature automatically export GUI canvasses as PDFs.
I would expect that in entering the realm of "adding my own implementation" I would have control over how to determine what is a "chunk" of text.
You can... well, you have to decide which of the incoming pieces belong together and which don't. E.g. do glyphs with the same y coordinate form a single line? Or do they form separate lines in different columns which just happen to be located next to each other.
So yes, you decide which glyphs you interpret as a single word or as content of a single table cell, but your input consists of the groups of glyphs used in the actual PDF content stream.
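As a rough illustration of that decision, here is a minimal Python sketch (not iTextSharp code) that stitches separately drawn pieces back together. The `Chunk` type, the `merge_chunks` function, the coordinates and both tolerance values are hypothetical; a real strategy would take positions from the render info it receives and would need tolerances tuned to the font size.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """One separately drawn piece of text (hypothetical stand-in for render info)."""
    text: str
    x_start: float
    x_end: float
    y: float  # baseline; PDF y coordinates grow upward


def merge_chunks(chunks: List[Chunk], y_tol: float = 1.0, gap_tol: float = 1.5) -> List[str]:
    """Group chunks into lines by baseline, then join neighbours separated by a small gap."""
    lines: List[List[Chunk]] = []
    # Walk from the top of the page down and start a new line when the baseline changes.
    for chunk in sorted(chunks, key=lambda c: -c.y):
        if lines and abs(lines[-1][0].y - chunk.y) <= y_tol:
            lines[-1].append(chunk)
        else:
            lines.append([chunk])

    result = []
    for line in lines:
        line.sort(key=lambda c: c.x_start)
        pieces = [line[0].text]
        for prev, cur in zip(line, line[1:]):
            # A gap wider than gap_tol is treated as a word break;
            # smaller (or negative) gaps are assumed to be kerning adjustments.
            if cur.x_start - prev.x_end > gap_tol:
                pieces.append(" ")
            pieces.append(cur.text)
        result.append("".join(pieces))
    return result


sample = [
    Chunk("Fi", 10.0, 18.0, 700.0),
    Chunk("el", 18.2, 26.0, 700.0),
    Chunk("d", 25.9, 30.0, 700.0),
    Chunk("A", 34.0, 40.0, 700.0),
    Chunk("Yes", 10.0, 25.0, 688.0),
]
print(merge_chunks(sample))
```

The same gap test is where a table-aware strategy would differ: a very wide gap could be treated as a column boundary rather than as a single space.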
Not only that, in none of the interface's methods I can "spot" how/where it deals with non-text data/images - so I could intercede with the spacing issue (RenderImage is not called)
RenderImage will be called for embedded bitmap images, JPEGs etc. If you want to be informed about vector graphics, your strategy will also have to implement IExtRenderListener which provides methods ModifyPath, RenderPath and ClipPath.
This isn't really an answer but I needed a spot to show some things that might help you understand things.
First "conversion" from Excel, Word, PowerPoint, HTML or whatever to PDF is almost always going to be a destructive change. The destructive part is very important and it happens because you are taking data from a program that has very specific knowledge of what that data represents (Excel) and you are turning it into drawing commands in a very generic universal format (PDF) that only cares about what the data looks like, not the data itself. Unless the data is "tagged" (and it almost never is these days still) then there is no context for the drawing commands. There are no paragraphs, there are no sentences, there are no columns, rows, tables, etc. There's literally just draw this letter at x,y and draw this word at a,b.
Second, imagine your Excel file had the following data and for some reason the last column was narrower than the others when the PDF was made:
Column A | Column B | Column
C
Data #1 Data #2 Data
#3
You and I have context so we know that the second and fourth lines are really just the continuation of the first and third lines. But since iText doesn't have any context during extraction it doesn't think like that and it sees four lines of text. In fact, since it doesn't have context it doesn't even see columns, just the lines themselves.
Third, although a very small thing you need to understand that you don't draw spaces in PDF. Imagine the three column table below:
Column A | Column B | Column C
Yes
If you extracted that from a PDF you'd get this data:
Column A | Column B | Column C
Yes
Inside the PDF the word "Yes" will be just drawn at a certain x coordinate that you and I consider to be under the third column and it won't have a bunch of spaces in front of it.
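Because only the x coordinate tells you where "Yes" belongs, one possible approach is to record the left edge of each column header and assign every later piece of text to the nearest column edge at or to its left. The following Python sketch illustrates the idea only; `column_index`, `to_row` and all coordinates are made up for the example, and it assumes the text positions have already been extracted and that the columns are left-aligned.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def column_index(x: float, column_starts: Sequence[float]) -> int:
    """Index of the last column whose left edge is at or before x (clamped to column 0)."""
    return max(bisect_right(column_starts, x) - 1, 0)


def to_row(chunks: Sequence[Tuple[str, float]], column_starts: Sequence[float]) -> List[str]:
    """Place (text, x) pairs into a row with one cell per column."""
    row = [""] * len(column_starts)
    for text, x in chunks:
        i = column_index(x, column_starts)
        row[i] = (row[i] + " " + text).strip()
    return row


# Hypothetical left edges of "Column A", "Column B" and "Column C".
starts = [50.0, 150.0, 250.0]
print(to_row([("Yes", 252.0)], starts))
```

A wrapped continuation line can be handled the same way: compute its row, then append each non-empty cell to the matching cell of the previous row.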
As I said at the beginning, this isn't much of an answer but hopefully it will explain to you the problem that you are trying to solve. If your PDF is tagged then it will have context and you can use that context during extraction. Context isn't universal, however, so there usually isn't just a magic "insert context" checkbox. Excel actually does have a checkbox (if I remember correctly) to make a tagged PDF during export and it ultimately creates a tagged PDF using HTML-like tags for tables. It is very primitive, but it works. However, it will be up to you to parse this context.
Leaving here an alternative strategy for extracting the data. It does not solve the problem of how spaces are treated or can be treated, but it gives you somewhat more control over the extraction by letting you specify the geometric areas you want to extract text from. Taken from here.
public static System.util.RectangleJ GetRectangle(float distanceInPixelsFromLeft, float distanceInPixelsFromBottom, float width, float height)
{
return new System.util.RectangleJ(
distanceInPixelsFromLeft,
distanceInPixelsFromBottom,
width,
height);
}
public static void Strategy2()
{
// In this example, I'll declare a pageNumber integer variable to
// only capture text from the page I'm interested in
int pageNumber = 1;
var text = new StringBuilder();
List<Tuple<string, int>> result = new List<Tuple<string, int>>();
// The PdfReader object implements IDisposable.Dispose, so you can
// wrap it in the using keyword to automatically dispose of it
using (var pdfReader = new PdfReader("D:/Example.pdf"))
{
float distanceInPixelsFromLeft = 20;
//float distanceInPixelsFromBottom = 730;
float width = 300;
float height = 10;
for (int i = 800; i >= 0; i -= 10)
{
var rect = GetRectangle(distanceInPixelsFromLeft, i, width, height);
var filters = new RenderFilter[1];
filters[0] = new RegionTextRenderFilter(rect);
ITextExtractionStrategy strategy =
new FilteredTextRenderListener(
new LocationTextExtractionStrategy(),
filters);
var currentText = PdfTextExtractor.GetTextFromPage(
pdfReader,
pageNumber,
strategy);
currentText =
Encoding.UTF8.GetString(Encoding.Convert(
Encoding.Default,
Encoding.UTF8,
Encoding.Default.GetBytes(currentText)));
//text.Append(currentText);
result.Add(new Tuple<string, int>(currentText, currentText.Length));
}
}
// You'll do something else with it, here I write it to a console window
//Console.WriteLine(text.ToString());
foreach (var line in result.Distinct().Where(r => !string.IsNullOrWhiteSpace(r.Item1)))
{
Console.WriteLine("Text: [{0}], Length: {1}", line.Item1, line.Item2);
}
//Console.WriteLine("", string.Join("\r\n", result.Distinct().Where(r => !string.IsNullOrWhiteSpace(r.Item1))));
}
Outputs:
PS.: We are still left with the problem of how to deal with spaces/non text data.

PDF Image Flattening

I am trying to flatten bitmap images in a PDF document as opposed to keeping separate layers. For example, let's say I have a document with two square images that partially overlap each other. I would like to merge them so that the user cannot individually select one of the squares to copy it out of the document. They'll be able to select both, I would think, but I don't want them to be able to isolate one of them. My client has a more complicated reason for wanting this restriction, but this is the simplest explanation. I would like to solve this with iTextSharp, but another product would be fine with me. I have used iTextSharp for form flattening, but I can't figure out how to flatten images. Thank you.
Edit
I realized another solution might just be to prevent selection within the document, which would hopefully prevent copying and pasting. I would guess that all document readers would not have to abide by my command to prevent selection, but as long as Adobe Reader (and maybe Foxit Reader) do abide by it, that should be good enough.
As you say you can use other products, I'll show a way to merge layers using ABCpdf:
Dim oDoc As New WebSupergoo.ABCpdf7.Doc
Using oDoc
oDoc.Read("D:\example.pdf")
Dim iTotal As Integer = oDoc.PageCount()
For i As Integer = 1 To iTotal
oDoc.PageNumber = 1
oDoc.Rendering.Save("D:\" & i & ".JPG")
oDoc.Delete(oDoc.Page)
Next
For i As Integer = 1 To iTotal
oDoc.AddPage()
oDoc.AddImage("D:\" & i & ".JPG")
oDoc.Flatten()
Next
oDoc.Save("D:\example_abc.pdf")
End Using
Original:
https://encodable.com/cgi-bin/filechucker.cgi?action=landing&path=/SOabcpdf/&file=example.pdf
Processed:
https://encodable.com/cgi-bin/filechucker.cgi?action=landing&path=/SOabcpdf/&file=example_abc.pdf
You will have to adjust the rendering quality; see the ABCpdf Help:
http://www.websupergoo.com/helppdfnet/source/4-examples/19-rendering.htm
Edit:
As you want not only to merge the images but also copy protection, ABCpdf has this:
Dim oDoc As New WebSupergoo.ABCpdf7.Doc
Using oDoc
oDoc.Read("D:\example.pdf")
oDoc.Encryption.Type = 2
oDoc.Encryption.CanCopy = False
oDoc.Encryption.OwnerPassword = "password"
oDoc.Save("D:\example_abc.pdf")
End Using

Add Hyperlinks to Powerpoint from Matlab using ActiveX

Does anyone know how I can use matlab and activeX to add hyperlinks to powerpoint files?
There are two helpful posts on MatlabCentral, but they don't give me everything I need. The first explains how to create a powerpoint file using matlab: "Create Powerpoint Files with Matlab"
and the second shows how to use ActiveX to insert hyperlinks into Excel:"Add Hyperlink in Excel from Matlab" (See the second answer by Kaustubha)
I tried to merge the two answers. In powerpoint the slide objects have the .Hyperlinks attribute, but there is no .Add method for .Hyperlinks as there is in Excel.
Here is the code I have so far. I would like the link to appear in a table:
ppt = actxserver('PowerPoint.Application');
op = invoke(ppt.Presentations,'Add');
slide = invoke(op.Slides,'Add',1,1);
sH = op.PageSetup.SlideHeight; % slide height
sW = op.PageSetup.SlideWidth; % slide width
table = invoke(slide.Shapes, 'AddTable', 1, 3, 0.05*sW, sH*.2, 0.9*sW, sH*.60);
table.Table.Cell(1,1).Shape.TextFrame.TextRange.Text = 'www.stackoverflow.com';
% Add hyperlink to text in table using ActiveX
% slide.Hyperlinks - this exists but there is no add feature
invoke(op,'Save');
invoke(op,'Close');
invoke(ppt,'Quit');
delete(ppt);
Slide objects have a .Hyperlinks collection that you can examine to learn how many hyperlinks there are, where they point and so forth. To add a hyperlink you have to work with individual shapes or text ranges.
Sub AddAHyperlink()
Dim oSh As Shape
' As an example we're going to add hyperlinks to the
' currently selected shape.
' You could use any other method of getting a reference to
' a shape that you like, however:
Set oSh = ActiveWindow.Selection.ShapeRange(1)
' Add a hyperlink to the shape itself:
With oSh
.ActionSettings(1).Hyperlink.Address = "http://www.pptfaq.com"
' you can also add a subaddress if required
End With
' Or add the hyperlink to the text within the shape:
With oSh.TextFrame.TextRange
.Text = "Hyperlink me, daddy, 8 to the click"
.ActionSettings(1).Hyperlink.Address = "http://www.pptools.com"
End With
End Sub
To access the text within a table cell you'd do as you're already doing:
table.Table.Cell(1,1).Shape.TextFrame.TextRange
or
table.Table.Cell(1,1).Shape
or
Set oSh = table.Table.Cell(1,1).Shape
then use the same code as I've shown above
Not sure if this is still an active request from anyone and potentially the program capability has changed; but for anyone else that might be interested it does appear to be possible to add links.
Here is an example from some code I wrote...The slide number/item references will need to get updated for your task but I think it covers the key points. In this example the goal was to add a hyperlink to another slide within the presentation.
hyperlink_text = sprintf('%0.0f, %0.0f, %s', Presentation.Slides.Range.Item(3+i).SlideID, Presentation.Slides.Range.Item(3+i).SlideIndex,Presentation.Slides.Range.Item(3+i).Shapes.Item(2).TextFrame.TextRange.Text);
The hyperlink text will look something like this, as a text string. '250, 4, Slide Title'
Presentation.Slides.Range.Item(3).Shapes.Item(2).Table.Cell(1+i,1).Shape.TextFrame.TextRange.ActionSettings.Item(1).Hyperlink.SubAddress = hyperlink_text;
For internal links the Hyperlink.Address field can be left blank.
It appears that the only thing that was missing from the prior answers was that when using Matlab to execute the powerpoint VBA you need to use ActionSettings.Item(1) to refer to the mouseclick action instead of ActionSettings(1) that was shown from basic powerpoint VBA.
Hopefully this can be helpful for anyone else still looking.
Note that I am currently using Matlab R2017A and Powerpoint in Microsoft 365 ProPlus

How to add content control in a Word 2007 document using OpenXML

I want to create a word 2007 document without using object model. So I would prefer to create it using open xml format. So far I have been able to create the document. Now I want to add a content control in it and map it to xml. Can anybody guide me regarding the same???
Anoop,
You said that you are able to create the document using the OpenXml SDK. With that assumption, you can use the following code to create the content control and add it to the Wordprocessing.Body element of your Document.
//paragraph to be added to the rich text content control
Run run = new Run(new Text("Insert any text Here") { Space = SpaceProcessingModeValues.Preserve });
Paragraph paragraph = new Paragraph(run);
SdtProperties sdtPr = new SdtProperties(
new Alias { Val = "MyContentControl" },
new Tag { Val = "_myContentControl" });
SdtContentBlock sdtCBlock = new SdtContentBlock(paragraph);
SdtBlock sdtBlock = new SdtBlock(sdtPr, sdtCBlock);
//add this content control to the body of the word document
WordprocessingDocument wDoc = WordprocessingDocument.Open(path, true); //path is where your word 2007 file is
Body mBody = wDoc.MainDocumentPart.Document.Body;
mBody.AppendChild(sdtBlock);
wDoc.MainDocumentPart.Document.Save();
wDoc.Dispose();
I hope this answers a part of your question. I did not understand what you meant by "Map it to XML". Did you mean to say you want to create a CustomXmlBlock and add the ContentControl to it?
Have a look for the Word Content Control Toolkit on www.codeplex.com.
Here is a very brief explanation on how to do what you are attempting.
You need to have access to the developer tab on the Word ribbon. To get this working click on the Office (Round thingy) in the top left hand corner and Select Word Options at the bottom of the menu. On the first options page there is a checkbox to show the developer toolbar.
Use the developer toolbar to add the Content controls you want on the page. Click the properties button in the Content controls section of the developer bar and set the name and tag properties (I stick to naming the name and tag fields with the same name).
Save and close the word document.
Open the Content control toolkit and then open your document with the toolkit. Use the left-hand pane to create some custom xml to link to your controls.
Now use the bind view to drag and drop the mappings between your custom xml and the custom controls that are displayed in the right panel of the toolkit.
You can use the openxml sdk 1.0 or 2.0 (still in ctp) to open your word document in code and access the custom xml file that is contained as part of the word document.
If you want to have a look at how your word document looks as xml. Make a copy of your word document and then rename it to say "a.zip". Double click on the zip file and then navigate the folder structure. The main content of the word document is held under the word folder in a file called "document.xml". The custom xml part of the document is held under the customXml folder and is generally found in the file named "item1.xml".
I hope this brief explanation gets you up and running.
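The rename-to-zip inspection described above can also be done from a script. The following Python sketch is only an illustration: `build_sample_package`, `list_parts` and `read_part` are hypothetical helper names, and the sample archive is a tiny stand-in that reuses the part names mentioned above, not a valid Word document. With a real file you would pass its bytes instead.

```python
import io
import zipfile
from typing import List


def build_sample_package() -> bytes:
    """Build a tiny stand-in archive with the two part names mentioned above."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<document/>")
        archive.writestr("customXml/item1.xml", "<root><name>Anoop</name></root>")
    return buffer.getvalue()


def list_parts(package: bytes) -> List[str]:
    """Return the names of all parts stored in the package, sorted."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return sorted(archive.namelist())


def read_part(package: bytes, name: str) -> str:
    """Return one part of the package decoded as UTF-8 text."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return archive.read(name).decode("utf-8")


sample = build_sample_package()
print(list_parts(sample))
print(read_part(sample, "customXml/item1.xml"))
```

For a document on disk, `open(path, "rb").read()` would supply the bytes; no renaming is needed because the zip reader does not care about the file extension.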