Reading and writing the TOC for a PDF - itext

My requirement is to prefix a PDF generated by some other process with a single 'cover page'.
I've written a relatively simple Ant task to concatenate a list of PDF documents, and this works well, but the TOC is not preserved in the process. By TOC I mean the bookmark/outline panel shown in the left-hand pane of the PDF viewer.
Ideally, both TOCs would be preserved and adjusted for their new page offsets, but I can see no means of reading or writing the TOC (the left-hand one) in the iText API. Can anyone point me to one?
M.

I'm the author of the book about iText, and in the context of this book I've written an example that explains how to concatenate PDFs while preserving the bookmarks (aka the outline tree). You can find this example here. As you can see, you need the SimpleBookmark object to extract the bookmarks from the existing documents. Make sure that you shift the page numbers, or your links will point at the wrong pages. Finally, add the new ArrayList<HashMap<String, Object>> to the PdfCopy object using the setOutlines() method.
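In outline, that approach looks like this (a minimal sketch against the iText 5 API with placeholder file names, not the book's exact example):

import java.io.FileOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.SimpleBookmark;

public class ConcatenateWithBookmarks {
    public static void main(String[] args) throws Exception {
        Document document = new Document();
        PdfCopy copy = new PdfCopy(document, new FileOutputStream("merged.pdf"));
        document.open();
        List<HashMap<String, Object>> allBookmarks = new ArrayList<HashMap<String, Object>>();
        int pageOffset = 0;
        for (String path : new String[] { "cover.pdf", "report.pdf" }) {
            PdfReader reader = new PdfReader(path);
            // extract this document's outline tree and shift it by the pages copied so far
            List<HashMap<String, Object>> bookmarks = SimpleBookmark.getBookmark(reader);
            if (bookmarks != null) {
                SimpleBookmark.shiftPageNumbers(bookmarks, pageOffset, null);
                allBookmarks.addAll(bookmarks);
            }
            for (int page = 1; page <= reader.getNumberOfPages(); page++) {
                copy.addPage(copy.getImportedPage(reader, page));
            }
            pageOffset += reader.getNumberOfPages();
            copy.freeReader(reader);
            reader.close();
        }
        // attach the merged, shifted outline tree to the copied document
        copy.setOutlines(allBookmarks);
        document.close();
    }
}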
Once again, we've found proof that reading documentation saves time ;-)

Related

Is it possible to use a data frame in r-exams?

I would like to include a data frame from the R environment in the LaTeX part (question or solution part) when creating exercises in r-exams. Later the exercises will be imported into Moodle. Is that possible in r-exams? We saw that it is possible when the object is a matrix, via $\Sexpr{toLatex(matrix_obj)}$, but a similar approach does not seem to work with data frames. Thank you!
A data.frame would usually be included as a {tabular} in LaTeX, and there are various packages for automatic conversion, such as xtable, or the function kable() in knitr. For PDF output this works nicely, including any vertical and/or horizontal lines in the table. For HTML-based output (as for Moodle), however, the table itself is converted correctly but without any lines.
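For a PDF/LaTeX exercise the conversion can be done directly in a code chunk, for example (a rough, untested sketch assuming an Rnw exercise and a data frame called dat; in Rmd exercises knitr::kable(dat) can be used analogously):

<<echo=FALSE, results=tex>>=
library("xtable")
## render dat as a plain {tabular}, without a floating table environment
print(xtable(dat), include.rownames = FALSE, floating = FALSE)
@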
An overview of a couple of solutions is available as:
Different copies of question with table for Moodle with R-Exams
Moreover, Kenji Sato has proposed to inject some dedicated CSS code to handle the table formatting in HTML. We are currently working on some automated way of including this in R/exams:
https://www.kenjisato.jp/en/post/2020/07/moodle-bordered-table/

How do you get around Cloned Templates losing Element References?

I noticed that hyperHTML preserves references I make to elements:
let div = document.createElement("div");
div.textContent = "Before Update";
hyperHTML.bind(document.body)`static1 - ${div} - static2`;
div.textContent = "After Update";
Above will produce a page that says:
static1 - After Update - static2
It is my understanding that hyperHTML ultimately clones an HTML <template> element to render the final output. However, don't you typically lose references when cloning an HTML template (like the variable "div" in the example above)?
Therefore, on the initial render, does hyperHTML somehow replace cloned elements with their originals after cloning the HTML template?
Here's how I think it works:
1. Create an HTML template from the original template literal, replacing all interpolations with comments.
2. Clone the HTML template with the comments left in.
3. Make elements or document fragments out of each interpolation originally received.
4. Replace each comment in the clone with its processed interpolation.
Is this correct?
I am not sure what the question is here, but there is a documentation page, and various examples too, that explain how to use hyperHTML, which is not exactly the way you are using it.
In fact, there's no need to keep any reference there, because hyperHTML is declarative, so you'd rather write:
function update(text) {
  var render = hyperHTML.bind(document.body);
  render`static1 - <div>${text}</div> - static2`;
}
and call update("any text") any time you need.
Here's how I think it works ... Is this correct?
No, it's not. hyperHTML doesn't clone anything in the way you described: once per unique template tag it associates a sanitized version of the template with the output and finds all the interpolated holes in it.
The part of the library that does this is called domtagger, and the mapping per template literal is based on the fact that, per the standard, these are unique per scope:
const templates = [];
function addTemplate(template, value) {
  templates.push(template);
  return template.join(value);
}
function asTemplate(value) {
  return addTemplate`number ${value}!`;
}
asTemplate(1);
asTemplate(2);
asTemplate(Math.random());
templates[0] === templates[1]; // true
templates[1] === templates[2]; // true
// it is always the same template object!
After that, any other element that uses that very same template tag will get a clone of that fragment, together with a map used to find the holes once, plus some logic that avoids replacing anything that is already known, whether that is text, attributes, events, or any other kind of node.
hyperHTML never removes the comments: it uses them as pins, and then uses domdiff to update the nodes related to those pins whenever anything needs updating.
Domdiff is a vDOM-less implementation of the petit-dom algorithm, which in turn is based on E. W. Myers's paper "An O(ND) Difference Algorithm and Its Variations".
Whenever you have DOM nodes in the holes, hyperHTML understands that and fills those holes with the nodes. If you repeatedly pass the same node, hyperHTML won't do anything, because it is full of algorithms and smart decisions, all described in the documentation, to get the best performance out of its abstraction.
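For instance (a small illustrative sketch, not taken from the hyperHTML docs), keeping a node across renders shows that behaviour:

const render = hyperHTML.bind(document.body);
const logo = document.createElement('img'); // the same node is passed into the same hole every time
function view(title) {
  render`<h1>${title}</h1>${logo}`;
}
view('first');  // creates the static parts once and inserts the `logo` node
view('second'); // only the title hole is diffed; `logo` is already known, so it is left untouched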
All these things, and much more, normalized for every browser out there, make hyperHTML weigh roughly 7K once minified and gzipped, but it also offers:
Custom Elements-like hooks through onconnected/disconnected listeners
lightweight components through hyperHTML.Component
SVG manipulation as content or via wire
easy Custom Elements definition through HyperHTMLElement class
In summary, if you need these simplifications and you don't want to reinvent the wheel, I suggest you give it a proper try.
If you are instead just trying to understand how it works, there's no need to assume anything, because the project is fully open source.
So far, all I've read from your questions here and there suggests that you only believe you understand how it works, so I hope this reply puts together all the missing pieces you need to fully understand it.
Do you want to write your own lit/hyperHTML library? Go ahead; feel free to use the domtagger or the domdiff library too, a few others are already doing the same.

Most efficient way to change the value of a specific tag in a DICOM file using GDCM

I need to go through a set of DICOM files and modify certain tags so that they are current with the data maintained in the database of an external system. I am looking to use GDCM, but I am new to it. A search through Stack Overflow posts shows that the Anonymizer class can be used to change tag values.
Generating a simple CT DICOM image using GDCM
My question is whether this is the best use of the GDCM API, or whether there is a better approach for changing the values of individual tags such as patient name or accession number. I am unfamiliar with all of the API options but have a link to the API documentation. It looks like the DataElement SetValue member could be used, but there doesn't appear to be a suitable constructor for doing this in the Value class. Any assistance would be appreciated. This is my current approach:
Anonymizer anon = new Anonymizer();
anon.SetFile(myFile);
anon.Replace(new Tag(0x0010, 0x0010), "BUGS^BUNNY");
Quite late, but maybe it will still be useful. You have not mentioned whether you are writing in C++ or C#, but I assume the latter, as you do not use pointers. Generally, your approach is correct (as long as you use gdcm.File rather than System.IO.File). The value (the second parameter of the Replace function) has to be a plain string, so no special constructor is needed. You should probably start with the Doxygen documentation of GDCM; there is, in particular, one complete example. It is in C++, but there should be no problem translating it.
There are two different ways to set DICOM tag values:
Anonymizer
gdcm::Anonymizer anon;
anon.SetFile(file);
anon.Replace(gdcm::Tag(0x0002, 0x0013), "Implementation Version Name"); // sets (0002,0013) Implementation Version Name
DataElement (via gdcm::Attribute)
gdcm::DataSet &ds = file.GetDataSet(); // the data set of the gdcm::File being modified
gdcm::Attribute<0x0018, 0x0088> ss;    // (0018,0088) Spacing Between Slices
ss.SetValue(10.0);
ds.Insert(ss.GetAsDataElement());
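To persist either kind of change, the modified gdcm::File has to be written back with a gdcm::Writer. A rough, untested C++ sketch (GDCM 2.x; file names are placeholders):

#include "gdcmReader.h"
#include "gdcmWriter.h"
#include "gdcmAnonymizer.h"
#include "gdcmTag.h"

int main()
{
  // read the original data set
  gdcm::Reader reader;
  reader.SetFileName("input.dcm");
  if (!reader.Read()) return 1;

  // replace Patient's Name (0010,0010) in the loaded file
  gdcm::Anonymizer anon;
  anon.SetFile(reader.GetFile());
  anon.Replace(gdcm::Tag(0x0010, 0x0010), "BUGS^BUNNY");

  // write the modified data set to a new file
  gdcm::Writer writer;
  writer.SetFileName("output.dcm");
  writer.SetFile(reader.GetFile());
  return writer.Write() ? 0 : 1;
}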

iTextSharp StyleSheet equivalent

When using the HTMLWorker to convert HTML into PDF elements, we can provide a StyleSheet instance that is used to style the generated elements.
Unfortunately the CSS-to-PDF conversion is quite limited (it doesn't seem possible to indent a list, for example), so I wondered whether there is an equivalent iTextSharp "PDF stylesheet" we can declare, which will be used when elements are written to the document?
Alternatively are there any events we can hook into in order to walk the element tree and apply our styles, before the document is written?
As documented in many places (especially on SO), HTMLWorker is deprecated in favor of XML Worker. XML Worker reads CSS from a file, from the header, inline, etc. Read the documentation for more info about the Java version. For the C# version, take a look at the test apps.
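As a rough illustration of the C# usage (an untested sketch; it assumes the itextsharp.xmlworker assembly is referenced, and the HTML/CSS file names are placeholders):

using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

public static class HtmlToPdf
{
    public static void Convert()
    {
        var document = new Document();
        var writer = PdfWriter.GetInstance(document, new FileStream("out.pdf", FileMode.Create));
        document.Open();
        using (var html = new FileStream("input.html", FileMode.Open))
        using (var css = new FileStream("styles.css", FileMode.Open))
        {
            // XML Worker parses the HTML and applies the CSS (from file, header or inline)
            XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, html, css);
        }
        document.Close();
    }
}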

How does an addin retrieve and process data from the AddinRoot?

I'm planning to use Mono.Addins in my C#/.NET project.
For that, I've read the Programming Guide and Reference Manual on codeplex.com, downloaded the latest version of the source code from github.com, and successfully built all the samples included in the source package. However, both the online documents and the sample projects only demonstrate how to extend an AddinHost by creating new instances of an ExtensionNode. Something seems to be missing about how to retrieve and process data from the AddinHost.
For example, say I have a text editor that processes RTF documents, and I want to give addins the possibility to find/replace in the document in their own way (for example, Regex / Forward / Backward / Whole document / Current Line...), so the addin needs to get the content from the AddinHost first. This is the question I need an answer for.
Any ideas?
If I understood correctly, you have to maintain a reference to the RTF document; providing it through some initialization code for your plugin could be one way to obtain it.
Or you can have a sort of "IFindReplaceAddin" with a method, say "FindReplace", that accepts the RTF document as an argument and returns the processed document.
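A sketch of that second suggestion, following the IFindReplaceAddin idea above (hypothetical names; the attributes and AddinManager calls come from Mono.Addins):

using Mono.Addins;

// Host side: declare the extension point and pass the document to each addin.
[TypeExtensionPoint]
public interface IFindReplaceAddin
{
    // receives the RTF content and returns the processed document
    string FindReplace(string rtfContent, string pattern, string replacement);
}

public class Editor
{
    public string Rtf { get; set; }

    public void RunFindReplace(string pattern, string replacement)
    {
        // assumes AddinManager.Initialize() and Registry.Update() were called at startup
        foreach (IFindReplaceAddin addin in AddinManager.GetExtensionObjects<IFindReplaceAddin>())
        {
            Rtf = addin.FindReplace(Rtf, pattern, replacement);
        }
    }
}

// Addin side (in a separate add-in assembly): implement and export the interface.
[Extension]
public class RegexFindReplaceAddin : IFindReplaceAddin
{
    public string FindReplace(string rtfContent, string pattern, string replacement)
    {
        return System.Text.RegularExpressions.Regex.Replace(rtfContent, pattern, replacement);
    }
}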