iTextSharp does not retrieve the TAB character - itext

I'm reading a PDF file with iTextSharp, but the following code does not return the TAB character, only line breaks (ENTER):
var rect = new System.util.RectangleJ(x, y, width, height);
var filters = new RenderFilter[1];
filters[0] = new RegionTextRenderFilter(rect);
ITextExtractionStrategy strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filters);
var currentText = PdfTextExtractor.GetTextFromPage(pdfReader, pageNumber, strategy);
Can someone help me?
Thank you.

Nobody can answer your question as asked, because it rests on a wrong assumption: a PDF content stream has no concept of a TAB character.
There is no such thing as a TAB character between two words. TABs are created by defining distances between words. Text is added at absolute positions and if two snippets of text need to be separated by tab space, the coordinates are adapted in accordance with this requirement. There are no TAB characters! Only differences in distances between text snippets.
iTextSharp can give you detailed information about the position of text snippets that are stored inside a PDF. You can find some code in the accepted answer to this question: PDF Reading highlighed text (highlight annotations) using C#
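To make the idea concrete, here is a minimal, language-neutral sketch in Python of how tab-like separators could be inferred from positions once you have them. The `Chunk` class, the `join_with_tabs` helper and both thresholds are hypothetical and not part of iTextSharp; real values would depend on the font size and layout of your PDF.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical text snippet with its horizontal extent on one line."""
    text: str
    x_start: float
    x_end: float


def join_with_tabs(chunks, space_width=3.0, tab_gap=20.0):
    """Join the chunks of one line, guessing separators from horizontal gaps.

    Both thresholds are assumptions chosen for this example only.
    """
    ordered = sorted(chunks, key=lambda c: c.x_start)
    parts = []
    previous = None
    for chunk in ordered:
        if previous is not None:
            gap = chunk.x_start - previous.x_end
            if gap >= tab_gap:
                parts.append("\t")  # wide gap: treat it as a tab stop
            elif gap >= space_width:
                parts.append(" ")   # narrow gap: an ordinary word space
        parts.append(chunk.text)
        previous = chunk
    return "".join(parts)


line = [Chunk("Name", 50, 78), Chunk("Qty", 200, 218), Chunk("Price", 222, 250)]
result = join_with_tabs(line)
print(repr(result))
```

The same logic could be ported to C# and fed with the start and end coordinates that a custom extraction strategy collects for each snippet.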
We've demonstrated the concept of text extraction at our iText Summit in Cologne on June 17, 2014. These are the slides that will help you on your way: http://www.slideshare.net/iTextPDF/itext-summit-2014-talk-unstructured-pdf

Related

How to autoupdate an MS word field based on typed text?

Is there a way to automatically update the content of a Word field based on text I type?
For example, I have a table with two cells. The left cell contains a QR code generated by
{ Displaybarcode "Just a Text " QR \s 40 \t }
The right cell contains "Just a text".
Is there a way to update the QR code (actually a Word field) based on what I type in the right cell?
So, if I change the text to "I just changed that text!", the QR code would update to encode the new text.
I do not mind pressing Ctrl-F9, but I would not want to edit the field itself.
Is that possible?
Dan
I would put the area for the text in a mapped Content Control and use a copy of that control in the DisplayBarCode field as the text portion. The field would need to be updated.
Here is another Add-In to produce Mapped Content Controls by MVP Graham Mayor.
Here is a video by Laura Townsend on how to do it yourself.
Here is the Walkthrough page on mapping to an XML part from Microsoft.

Aspose.Words - Format a single word in a paragraph

I'm new to Aspose.Words for .NET, and I'm working on recreating some documents for a customer. I need to make a single word in a paragraph bold and underlined. I'm trying to achieve this by creating separate paragraph runs for the text before the bold word, the bold word itself, and the text after it. Then I format the bold word's run and append everything to the paragraph. This seems overly complicated. Is there a simple way to achieve this from within DocumentBuilder.Writeln("some text")?
I kept working at it, and achieved the desired result by using DocumentBuilder.Write() instead of DocumentBuilder.Writeln():
builder.Write("The start of the paragraph ");
builder.Font.Bold = true;
builder.Font.Underline = Underline.Single;
builder.Write("underlined bolded text");
builder.Font.Bold = false;
builder.Font.Underline = Underline.None;
builder.Write(" the end of the paragraph.");

String output to MS-WORD has too many new lines

This is my program window. The boxes in question are the ones labeled "INVESTIGATION" and "DISPOSITION".
When text is entered into the multiline textbox, it is output to the DOCX file with extra new lines. If a line is entered directly below the previous one, a single extra new line appears; if the lines are separated by one blank line, two extra new lines appear. I figure it is something with the textbox properties, or the output argument not playing well with the string. I have included pictures of the form as it looks to the user, the builder code, the Word document that shows the problem, and the textbox properties. I want new lines in the investigation and disposition textboxes to display as they do in the GUI, without the extra new lines. Any pointers?
I can't post images yet, but you can view them on the DOCX discussion board here: https://docx.codeplex.com/discussions/658603
I hope it's clear that I want it to display just as it is in the GUI:
A
A
A
should not look like it does in the last image, where an extra blank line follows each line:
A
(blank line)
A
(blank line)
A
Thanks for all the help!
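One possible cause, which is an assumption and not something the post confirms, is that the textbox separates lines with "\r\n" and the output code treats "\r" and "\n" as two separate breaks. That would match the symptoms described: one extra line for adjacent lines, two extra for lines separated by a blank. A minimal sketch of normalizing line endings before writing, shown in Python for brevity, with a hypothetical `split_paragraphs` helper:

```python
def split_paragraphs(text):
    """Collapse CRLF and lone CR into LF, then split into one entry per line."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.split("\n")


# Value as a Windows multiline textbox would typically return it (assumed).
textbox_value = "A\r\nA\r\n\r\nA"
paragraphs = split_paragraphs(textbox_value)
print(paragraphs)
```

Each entry would then be written as its own paragraph, so an intentional blank line stays a single empty paragraph.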

DataDynamics.ActiveReports replacing text during runtime changes original format

I'm almost out of time and am facing a problem with DataDynamics.ActiveReports.
I have to replace some text in 500 reports, so I'm automating the task in code at run time.
The major problem is that replacing the text changes the formatting: bold text becomes normal, center-justified text becomes left-justified, and Arial Narrow text changes to Arial.
Is there any way to replace text without disturbing the original format?
Here is the piece of code:
var textBox = (DataDynamics.ActiveReports.RichTextBox)reportSection.Controls[controlIdx];
if (textBox.Text.Contains("Babu"))
{
    MessageBox.Show(textBox.Text);
    // Same control as textBox, fetched a second time.
    var modifiedtext = (DataDynamics.ActiveReports.RichTextBox)reportSection.Controls[controlIdx];
    modifiedtext.Text = modifiedtext.Text.Replace("Babu", "Mannu");
    MessageBox.Show(modifiedtext.Text);
}
The modified report has a different format than the original. How can I fix this?
It's rich text, not plain text, and every piece of rich text has formatting associated with it.
Try editing the original RTF that you are loading into the RTB control. This is what I would recommend.
Or, another approach could be to use richtextbox.rtf.replace instead of richtextbox.text.
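To illustrate why replacing inside the RTF keeps the formatting, here is a small sketch in plain Python with a hand-written RTF string. The sample markup is made up for this example and is not taken from the reports in question.

```python
# Hypothetical RTF: Arial Narrow font, centered (\qc), with "Babu" in bold (\b ... \b0).
rtf = r"{\rtf1\ansi{\fonttbl{\f0 Arial Narrow;}}\f0\qc\b Babu\b0  report\par}"

# Replacing inside the markup leaves every control word untouched.
updated_rtf = rtf.replace("Babu", "Mannu")
print(updated_rtf)
```

One caveat: this only works when the search text is stored contiguously in the RTF. If control words split the word (for example, when only part of it is bold), a plain string replace will not find it.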
At what point in the report processing are you doing this?

What is the "up-pointing" version of U+25BE?

I'm not even sure it exists, but I'm using this Unicode character as a down indicator: http://www.charbase.com/25be-unicode-black-down-pointing-small-triangle (▾). I need the "up" version. Any ideas?
U+25B4 (▴) is technically the up-pointing version of (▾), but it's not exactly the same.
I was using it for showing whether a dropdown element was open and needed it to be exact. So I ended up using CSS transform to flip it. (In my case it was a pseudo-element).
.dropdown::after {
    content: '\25BE';
    display: inline-block; /* transforms do not apply to inline boxes */
}
.dropdown.active::after {
    transform: rotate(180deg);
}
For this character, http://www.unicode.org/charts/PDF/U25A0.pdf contains U+25BE (▾) and related characters.
Looking at the PDF shows U+25B4 (▴) as the black small up-pointing triangle (formally BLACK UP-POINTING SMALL TRIANGLE).
In general, go to http://www.unicode.org/charts and enter the hex number for a character (e.g. 25B4) and it will show you which PDF file describes the related characters. View the PDF; in this case, a quick scan upwards from the down-pointing arrow found the related character code, and the next page shows the formal name and related details.
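If you have Python at hand, the same lookup can be done programmatically with the standard `unicodedata` module. This is just a sketch that assumes the counterpart's formal name differs only in the direction word, which happens to hold for this character.

```python
import unicodedata

down = "\u25be"
down_name = unicodedata.name(down)  # formal name of the down-pointing triangle

# Assumption: the counterpart's name differs only in the direction word.
up_name = down_name.replace("DOWN-POINTING", "UP-POINTING")
up = unicodedata.lookup(up_name)    # raises KeyError if no such character exists
up_code = f"U+{ord(up):04X}"
print(up_code, up_name, up)
```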
Do you want U+25B4 BLACK UP-POINTING SMALL TRIANGLE (▴)?
If you know the codepoint of a character and you're trying to find similar ones, try searching the code charts by hex code.
U+25B4 = BLACK UP-POINTING SMALL TRIANGLE. Isn't there a character map you can use installed on your system? I have one (gucharmap - the GNOME [Unicode] Character Map) specifically for occasions like this. Just a suggestion. :-)