FOP: unwanted line breaks in nested <fo:block> elements

I have a problem concerning FOP. I am using CKEditor to create an XSL-FO string and transform it into a PDF using FOP. Everything works fine, but when using nested blocks, I get line breaks in the PDF that should not be there.
XSL-FO:
...<fo:block>ONE<fo:block font-weight="bold">TWO</fo:block><fo:block font-style="italic">THREE</fo:block><fo:block text-decoration="underline">vier</fo:block><fo:block class="linebreak"/></fo:block>...
(The XSL-FO is not complete; the root element and other things are missing. But as other things like tables and lists are working just fine, there should not be any errors in the document structure.)
The resulting PDF looks somewhat like this:
ONE
TWO
THREE
I just have no idea why.
Thanks in advance for any help :)

fo:block always takes a whole line. If you need several items on one line, you can use fo:inline (to change font, colour, etc.) or tables (if you also need to control the width of the items).
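As a rough illustration of the fo:inline suggestion, here is a minimal Python sketch that rewrites nested text-carrying fo:block elements as fo:inline before the string is handed to FOP. The helper name `inline_nested_blocks` is hypothetical, and the sketch assumes the CKEditor output is well-formed XML with the standard XSL-FO namespace declared; empty blocks such as the line-break placeholder are left alone.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def inline_nested_blocks(outer):
    """Rename text-carrying fo:block descendants of `outer` to fo:inline (in place)."""
    block_tag = f"{{{FO_NS}}}block"
    inline_tag = f"{{{FO_NS}}}inline"
    # Materialise the list first so renaming does not interfere with iteration
    for child in list(outer.iter(block_tag)):
        if child is not outer and (child.text or "").strip():
            child.tag = inline_tag
    return outer


snippet = (
    f'<fo:block xmlns:fo="{FO_NS}">ONE'
    '<fo:block font-weight="bold">TWO</fo:block>'
    '<fo:block font-style="italic">THREE</fo:block>'
    '<fo:block class="linebreak"/>'
    '</fo:block>'
)
root = inline_nested_blocks(ET.fromstring(snippet))
result = ET.tostring(root, encoding="unicode")
print(result)
```

With the inner elements turned into fo:inline, ONE, TWO and THREE should end up on the same line, while the empty block still forces the break at the end.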

iText - Manipulate existing PDF - add dashes to end of each paragraph

I need to manipulate existing PDF in iText to add dashes to the end of each paragraph. Something like this:
I would make this in Word with tab leaders.
Is this possible to do with iText on an existing document?
Any help would be greatly appreciated.
Thanks!
Edit for clarifications
iText version is 5.5.x, but I guess we can upgrade it if the task would be easier with newer version.
There could be some paragraphs that do not need dashes, but I have some control over the original PDF. It is assembled by a different system, and I could add some kind of marker to the paragraphs that need leaders (i.e. I can add text like "~tab~" at the end of such paragraphs).
At the moment the documents that need this kind of editing have headers and footers, and otherwise nothing but text in a single column with justified alignment.
Edit for even more clarification
I can even (by configuration) set where the dashes have to end (e.g. at 10px) for a specific document. We know every document type (and its structure) that needs to be manipulated this way.
This is insanely hard.
You should think of a PDF document as a container of instructions, rather than a WYSIWYG format. So finding out where lines are (let alone paragraphs) is very hard.
High level plan:
Use IEventListener to process events from the PDF being parsed.
Look out for TextRenderInfo events and store them.
Sort the TextRenderInfo events to ensure your list of events is in logical reading order.
Merge items in your list if they appear on the same line and are less than a certain distance apart (for instance, the width of 3 spaces in the font specified by TextRenderInfo).
Now you should have lines.
Merge lines if they are in close vertical proximity to each other and overlap horizontally. How close they should be, and how much they should overlap, is something you'll have to figure out, and it might differ from page to page and from document to document.
Now you should have paragraphs.
Figure out the bounding box of each paragraph or, more accurately, the convex hull. There is a good algorithm for this called the gift-wrapping algorithm.
Now you can simply insert lines by inspecting your convex hull. This is the easy step.
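The merging steps above can be sketched independently of iText. The following Python sketch is only an illustration under stated assumptions: the `Chunk` type, the tolerance values, and the sample coordinates are all hypothetical stand-ins for whatever the TextRenderInfo events provide, and it computes a plain bounding box rather than the convex hull.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    x: float      # left edge
    y: float      # baseline (PDF y grows upwards)
    width: float


def group_into_lines(chunks, y_tol=2.0, max_gap=15.0):
    """Sort chunks into reading order and merge neighbours on the same baseline."""
    lines = []
    # Bucket baselines by the tolerance, then read top-to-bottom, left-to-right
    for c in sorted(chunks, key=lambda c: (-round(c.y / y_tol), c.x)):
        last = lines[-1] if lines else None
        gap = c.x - (last.x + last.width) if last else None
        if last and abs(last.y - c.y) <= y_tol and gap <= max_gap:
            last.text += (" " if gap > 1.0 else "") + c.text
            last.width = c.x + c.width - last.x
        else:
            lines.append(Chunk(c.text, c.x, c.y, c.width))
    return lines


def group_into_paragraphs(lines, max_leading=16.0):
    """Merge consecutive lines that are vertically close and overlap horizontally."""
    paragraphs = []
    for line in lines:
        if paragraphs:
            prev = paragraphs[-1][-1]
            close = prev.y - line.y <= max_leading
            overlap = line.x < prev.x + prev.width and prev.x < line.x + line.width
            if close and overlap:
                paragraphs[-1].append(line)
                continue
        paragraphs.append([line])
    return paragraphs


def bounding_box(paragraph):
    """Return (left, bottom, right, top) over the baselines of a paragraph's lines."""
    return (
        min(l.x for l in paragraph),
        min(l.y for l in paragraph),
        max(l.x + l.width for l in paragraph),
        max(l.y for l in paragraph),
    )


chunks = [
    Chunk("world", 45, 700, 30),
    Chunk("Hello", 10, 700, 30),
    Chunk("second line", 10, 688, 60),
    Chunk("New paragraph", 10, 650, 80),
]
lines = group_into_lines(chunks)
paragraphs = group_into_paragraphs(lines)
print([l.text for l in lines], len(paragraphs), bounding_box(paragraphs[0]))
```

The thresholds are exactly the part that needs tuning per document; real pages with multiple columns, rotated text, or mixed font sizes will need more than this.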
If you can insert markers, you can easily do this using iText7. iText7 has an implementation of IEventListener that allows you to look for regular expressions within a PDF document. It returns the locations where the regular expression was found. If you can ensure your markers always satisfy some kind of regular expression, you can easily look for them, get their coordinates, and insert a line at the calculated position.
Of course, then you need to get rid of the marker text.
For that you can use pdfSweep.

Accurately Reading Document Content with Position Using Open XML (Word)

I need to retrieve a string which perfectly matches the content of a Word document. That is to say, at position 1000 of this string, I should find exactly the text at position 1000 in the document.
We have been through various iterations of reading in the document context text and adjusting for field codes/tables/pictures/inline shapes etc by padding in the right places. This approach does work (well) but we want to move towards Open XML instead for speed.
We have Power Tools for Open Xml installed, and have been looking at ways to recreate this string using Open Xml. We can get all the text by going through the runs (as per Eric White's blogs), but we also need everything else. \r's, \t's etc. I see things like "TabChar" in runs, and wdfldChar, but I am unclear how to use this information to generically get what we want.
For example, "TabChar" in our string should be \t. We presumably need to interpret wdfldChar begin, separate, and end in a certain way (maybe by adding spaces). The problem is that we don't want to have to find every possibility and code each one by hand [if run = "TabChar", append "\t", etc.], because (a) it's inefficient and (b) it's unsafe.
Can anyone help with a method to reproduce this string with complete accuracy?
Thanks
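To make the idea in the question concrete, here is a minimal, table-driven Python sketch that walks the runs of a WordprocessingML body and maps special run children to characters. It is not a complete or verified reproduction of Word's own text positions: the mapping table (tab to \t, line break to \v, paragraph end to \r) is an assumption to adjust against real documents, field codes and deleted text are simply skipped, and nested content such as text boxes is not handled.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumed mapping from run children to characters; extend as needed
SPECIAL = {
    f"{{{W}}}tab": "\t",
    f"{{{W}}}br": "\v",
    f"{{{W}}}cr": "\r",
}


def paragraph_text(p):
    """Concatenate the text of a paragraph's runs, translating special children."""
    out = []
    # Only look at direct children of runs, so tab-stop definitions
    # inside w:pPr/w:tabs are not mistaken for tab characters
    for run in p.iter(f"{{{W}}}r"):
        for node in run:
            if node.tag == f"{{{W}}}t":
                out.append(node.text or "")
            elif node.tag in SPECIAL:
                out.append(SPECIAL[node.tag])
    return "".join(out)


def document_text(xml):
    """Return the body text with an assumed '\\r' after every paragraph."""
    body = ET.fromstring(xml).find(f"{{{W}}}body")
    return "".join(paragraph_text(p) + "\r" for p in body.iter(f"{{{W}}}p"))


sample = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p><w:pPr><w:tabs><w:tab w:val="left" w:pos="720"/></w:tabs></w:pPr>'
    '<w:r><w:t>Name</w:t><w:tab/><w:t>Value</w:t></w:r></w:p>'
    '<w:p><w:r><w:t xml:space="preserve">Line one</w:t><w:br/>'
    '<w:t>Line two</w:t></w:r></w:p>'
    '</w:body></w:document>'
)
text = document_text(sample)
print(repr(text))
```

Keeping the mapping in one dictionary at least makes the "every possibility" problem visible in a single place, though it does not remove the need to decide how fields, pictures, and tables should be padded.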

jasper text field getting truncated

I'm having a font issue with my jasper report where one of my more wordy text fields (the last one in a detail band) is getting cut off in the PDF and PDF Preview but not in the Internal Preview.
e.g.
Internal Preview:
Here is a fake description. It fits
perfectly, fitting just in the lines.
PDF Preview
Here is a fake description. It
fits perfectly, fitting just in the
Jasper is (seemingly) using some algorithm to figure out how tall the field should be, my text is barely fitting, then when the PDF is generated the text wraps and disappears on the next line.
I'm not using custom fonts (just the default/implicit "SansSerif"), and not using any custom styles beyond bold/italic. This behavior is demonstrable in both iReport's PDF Preview and my code's generated PDF on Windows and MacOS (Linux likely still has the issue, but my example text didn't exhibit the behavior on Ubuntu).
I've played with Stretch Type, Position Type, and Stretch with Overflow, as well as moved this text field to its own band but none fixes this bug (and several of them cause others).
I've had luck changing the font to the other built-in fonts, but this just tells me my example doesn't work for that particular font, not that I've fixed the bug.
Any tips would be greatly appreciated.
Update 1
I tried upgrading from Jasper Reports 5.2.0 to 6.2.0 and Jasper Fonts 4.0.0 to 6.0.0... no change.
Update 2
Tried editing my src/main/resources/jasperreports_extension.properties and adding
net.sf.jasperreports.export.pdf.force.linebreak.policy=true
... no change.
(Notably though in my use-case I can't use isStretchWithOverflow="true", so this may be why it didn't work.)
Update 3
I tried embedding the font by editing src/main/resources/jasperreports_extension.properties and adding:
net.sf.jasperreports.extension.registry.factory.fonts=net.sf.jasperreports.engine.fonts.SimpleFontExtensionsRegistryFactory
net.sf.jasperreports.extension.simple.font.families.arialFontFamily=fonts/customFontFamilies.xml
customFontFamilies.xml:
<?xml version="1.0" encoding="UTF-8"?>
<fontFamilies>
  <fontFamily name="ArialEM">
    <normal><![CDATA[fonts/Arial/Arial.ttf]]></normal>
    <bold><![CDATA[fonts/Arial/Arial Bold.ttf]]></bold>
    <italic><![CDATA[fonts/Arial/Arial Italic.ttf]]></italic>
    <boldItalic><![CDATA[fonts/Arial/Arial Bold Italic.ttf]]></boldItalic>
    <pdfEncoding><![CDATA[Cp1252]]></pdfEncoding>
    <pdfEmbedded><![CDATA[true]]></pdfEmbedded>
  </fontFamily>
</fontFamilies>
... no dice. (Though this did help with an issue where Firefox's PDF renderer wouldn't render bold fonts.)
Update 4
I noticed that in all the test-cases I was able to create that the first line was blank, so I changed the particular cell to be vertical-align top, which worked, but of course made that one cell misalign when there wasn't much text in it.
Scrapped that as a solution, but may work for someone.
Update 5
At this point hopefully it's clear I've tried the "real" solutions and watched them all die a horrible death. Thus, we enter the realm of the hack solution. First I tried #wmmci's solution, but his answer changes the height of my box (due to it being dynamically calculated by Dynamic Jasper). I noticed that all of the examples I could create involved intra-word periods in the string, e.g. "foo...bar". That might not be your case, but it was for me. So I injected a "hair space" (U+200A) after the intra-word periods.
This is obviously not a real solution, just a temporary work-around until I'm able to find more examples of the bug.
Update 6
I checked and I don't have #KarolisŠarapnickis's issue with the printOrder. Ah well. I shall soldier on. ;-)
I had same issue and I tried all possible configurations - didn't work. Finally as a workaround I appended a new line character to the field and it worked.
Something like this: $F{description} + "\n"
I had the same issue with text being truncated, and nothing seemed to work.
Luckily, I found out that my root XML element had the following attribute:
printOrder="Horizontal"
Removing it solved my issues.
Well, I'm not sure if you're struggling with the exact same problem I was.
But my solution was setting the property "net.sf.jasperreports.print.keep.full.text" of the field to "true".
In my case, I had really long text in a single text field. Adding a line break would solve the issue for some cells, but not for the really long ones that spanned pages. To finally solve it, I had to set the text field to stretch to RELATIVE_TO_BAND_HEIGHT. Previously, it was set to RELATIVE_TO_TALLEST_OBJECT. My guess is that, RELATIVE_TO_TALLEST_OBJECT was being calculated incorrectly (lower than needed).
This did the trick:
textField.setStretchType( StretchTypeEnum.RELATIVE_TO_BAND_HEIGHT );
It seems the only working solution is to add some text formatting characters, as #wmmcii said. A different text renderer is then used (discussed here). However, the newline \n is not ideal, because it has an unwanted influence on the output document. A better solution seems to be putting a tab character \t at the end of the line. To avoid additional problems when using Horizontal Alignment = Justified, also put a space before the tab character. For example:
$F{my_text} + " \t"

Is there an option to control output page orientation (using knitr->pander->pandoc->docx)

I am playing with Tal's intro to producing Word tables with as little overhead as possible in real-world situations. (Please see there for reproducible examples. Thanks, Tal!) In real applications, tables are too wide to print on a portrait-oriented page, but you might not want to split them.
Sorry if I have overlooked this in the pandoc or pander documentation, but how do I control page orientation (portrait/landscape) when writing from R to a Word .docx file?
I should maybe add that I started using knitr+markdown, and I am not yet familiar with LaTeX syntax. But I'm trying to pick up as much as possible while getting my stuff done.
I am pretty sure the docx writer has no section breaks implemented. Also, as far as I understand, --reference-docx allows for customizing styles and not the page layout (but I might be wrong here). This is from pandoc's guide on --reference-docx:
--reference-docx=FILE
Use the specified file as a style reference in producing a docx file.
For best results, the reference docx should be a modified version of a
docx file produced using pandoc. The contents of the reference docx
are ignored, but its stylesheets are used in the new docx. If no
reference docx is specified on the command line, pandoc will look for
a file reference.docx in the user data directory (see --data-dir). If
this is not found either, sensible defaults will be used. The
following styles are used by pandoc: [paragraph] Normal, Title,
Authors, Date, Heading 1, Heading 2, Heading 3, Heading 4, Heading 5,
Block Quote, Definition Term, Definition, Body Text, Table Caption,
Image Caption; [character] Default Paragraph Font, Body Text Char,
Verbatim Char, Footnote Ref, Link.
Which are styles that are saved in the /word/styles.xml component of the docx document.
The page layout on the other hand is saved in the /word/document.xml component in the <w:sectPr> tag, but pandoc's docx writer ignores this part as far as I can tell.
By default, the docx writer builds a continuous document with elements such as headers, paragraphs, simple tables and so on, much like HTML output.
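For orientation, this is roughly what a landscape <w:sectPr> looks like. In WordprocessingML, the properties of a section are stored in the <w:pPr> of that section's last paragraph, and page sizes are given in twentieths of a point. The Python sketch below only builds such a fragment as a string; the function name is hypothetical, the defaults assume US Letter, and nothing here is something pandoc emits on its own.

```python
def landscape_section_paragraph(short_edge=12240, long_edge=15840):
    """Build an empty paragraph whose w:sectPr closes a landscape section.

    Sizes are in twentieths of a point; the defaults are US Letter.
    """
    return (
        "<w:p><w:pPr><w:sectPr>"
        # Landscape: the long edge becomes the page width
        f'<w:pgSz w:w="{long_edge}" w:h="{short_edge}" w:orient="landscape"/>'
        "</w:sectPr></w:pPr></w:p>"
    )


section_xml = landscape_section_paragraph()
print(section_xml)
```

A fragment like this would have to be injected into /word/document.xml after the content that should be landscape, with a matching portrait section break before it.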
Option #1 (doesn't solve the page orientation problem):
The only page layout option that you can define through styles is the pageBreakBefore which will add a page break before a certain style
Option #2 (seems elegant but hasn't been tested):
Recently the custom writer has been added that allows for a custom lua script, where you should be able to define how certain Pandoc blocks will be written into the output file ... meaning you could potentially define section breaks and page layout for a specific block inserting the sectPr tag into the document. I haven't tried this out but it would be worth investigating. On pandoc github you can check out a sample lua script file for custom html output.
However, this means you need to have Lua installed and learn the language, and it is up to you whether you think it's worth the time investment.
Option #3 (a couple of clicks in Word might just do):
As you will probably spend quite some time setting up how to insert sections, working out the right size and margins, and figuring out how to fit the table into such a layout, I recommend that you use pandoc to write your document.docx, then open it in Word and do the layout by hand:
select the table you want on the landscape page
go to Layout > Margins
> select Apply to: Selected text
> choose Page Setup > select Landscape
Now a new section with a landscape orientation should surround your table.
You will probably also want to style the table and table caption a little (font size, ...) to achieve the best result (all text styling can already be applied with pandoc, where --reference-docx comes in handy).
Option #4 (in situation when you can just use pdf instead of docx):
As far as I could figure out, pandoc does a good job with tables in md -> docx (alignment, style, ...), while in tex -> docx it sometimes had some trouble. However, if PDF output is an option for you, LaTeX will be your greatest friend. For example, your problem is solved as easily as just using
\usepackage{pdflscape}
and adding this around your table
\begin{landscape}
...
\end{landscape}
These are the options that I could think of so far.
I would always recommend using the pdf format for reports, as you can style it to your liking with latex and the layout will stay the way you want it to be.
However, I also know that for various reasons Word documents are still the main way of reviewing manuscripts in many fields, so I would most likely just go with my suggested option 3, mostly because it is a lazy and quick solution and because I usually don't have many documents with tons of giant tables with awkward placement and styling.
Good luck ;-)
Based on Taleb's answer here and some officer package functions, I created a little gist that one can use like this:
---
title: "Example"
author: "Dan Chaltiel"
output:
  word_document:
    pandoc_args:
      '--lua-filter=page-break.lua'
---
I'm in portrait
\endLandscape
I'm in landscape
\endPortrait
I'm in portrait again
With page-breaks.lua being the file hosted here: https://gist.github.com/DanChaltiel/e7505e62341093cfdc489265963b6c8f
This is far from perfect (for instance it won't work without the last portrait section), but it is quite useful sometimes.

Highlight pdf line

Can anyone please help me? I am really stuck: I don't know how to highlight a particular line of a PDF. It would be great if anyone could provide sample code or pseudocode.
Thanks
This is not trivial.
To do this, I'd render the PDF contents into one layer, and somehow get the position of the said line/object using the CoreGraphics PDF parser (or some other way). After that, you highlight the said object using your own drawing code.
Just highlighting a particular line is quite difficult.
If you need search and highlight, please try FastPDFKit. I played with it for a while and it's quite good as a pdf reader.
http://mobfarm.eu/fastpdfkit
I'm working on the same thing at the moment and it's not trivial indeed.
From what I can figure out, you need to load the text and arrange it in lines first. If you are using Poppler, Poppler.Page.textList() will provide you with a list of TextBoxes, and TextBox.hasSpaceAfter() will tell you where a line ends by returning False.
I am using the Qt4 frontend, so each TextBox has a QRect from which I can figure out where to highlight a word. Highlighting a line is more or less firstWordOfLine.geometry().united(lastWordOfLine.geometry()), which will provide the geometry of the line to highlight.
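The line-geometry idea can be sketched without Poppler or Qt. In the Python sketch below, `Rect` and `Word` are hypothetical stand-ins for QRect and Poppler's TextBox, and `has_space_after` plays the role of TextBox.hasSpaceAfter(); a word without a following space is assumed to end its line.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float

    def united(self, other):
        """Smallest rectangle containing both rectangles."""
        return Rect(
            min(self.left, other.left),
            min(self.top, other.top),
            max(self.right, other.right),
            max(self.bottom, other.bottom),
        )


class Word(NamedTuple):
    text: str
    rect: Rect
    has_space_after: bool


def line_rects(words):
    """Return one rectangle per line, spanning its first and last word."""
    rects, first, last = [], None, None
    for w in words:
        if first is None:
            first = w
        last = w
        if not w.has_space_after:  # last word on this line
            rects.append(first.rect.united(w.rect))
            first = None
    if first is not None:  # trailing words without a line terminator
        rects.append(first.rect.united(last.rect))
    return rects


words = [
    Word("Hello", Rect(10, 10, 40, 22), True),
    Word("world", Rect(45, 10, 80, 22), False),
    Word("again", Rect(10, 30, 42, 42), False),
]
highlights = line_rects(words)
print(highlights)
```

Each resulting rectangle is the area to paint for one line; how to persist those coordinates in the document is a separate problem.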
Now what I can't figure out is how to save the coordinates of the highlights in the document.