Extracting text from a PDF with custom fonts gives unreadable results - iText

I need to extract text from a PDF that uses custom fonts, but those fonts prevent copying/pasting, searching, or extracting the text in a clear, readable form with the iText library. The resulting text consists of spaces or non-human-readable characters.
The PDF properties are:
Author: User
Creator: Compart Docponent API
Producer: Compart MFFPDF I/O Filter 2013-03-09 00:51:11
CreationDate: 04/21/16 11:26:59
ModDate: 06/09/16 10:02:16
Tagged: no
Form: none
Pages: 6
Encrypted: no
Page size: 595.2 x 841.92 pts (A4) (rotated 0 degrees)
File size: 312703 bytes
Optimized: yes
PDF version: 1.4
The font information (from running the pdffonts command-line tool) is the same for each font: name: [none]; type: [Type 3]; emb: [yes]; sub: [no]; uni: [yes]
So the PDF seems to have a ToUnicode map, but that is not enough.
How can I read the text in a clear way?
thanks in advance
G.G.

Related

How to wrap long text onto multiple lines when adding an annotation to an image

How can I wrap long text onto multiple lines when adding an annotation to an image,
and make every line the same length, like a rectangle?
image temp := GetFrontImage()
temp.ShowImage()
imageDisplay disp = temp.ImageGetImageDisplay(0)
number x, y
GetSize(temp, x, y)
number le = x*2/3
number to1 = y*80/100
// string1, string2 and string3 are defined elsewhere
component text1 = NewTextAnnotation(le,to1,string1+","+string2+","+string3,100)
I tried to add 3 strings to an image. Each string has more than 15 characters, more than 50 in total.
If I put all 3 strings into one text annotation on a single line, it is too long.
If I add them as 3 separate text annotations, the lines do not contain exactly the same number of characters, so it looks ugly.
Is there any way to display the text on multiple lines so that the text background of each line has the same length?
Alternatively, can the 3 text annotations be given the same length? I mean that the text backgrounds should have the same length even when the number of characters differs, for example 16 characters in the 1st annotation, 20 in the 2nd, and 14 in the 3rd.
Thanks,
As with any string, you can add line breaks by inserting the newline escape sequence \n.
Example script:
number sx = 512
number sy = 512
image img := RealImage("test",4,sx,sy)
img = icol
img.ShowImage()
imageDisplay disp = img.ImageGetImageDisplay(0)
number l = sx * 2/3
number t = sy * 80/100
String mLstr = "Some text line\nSome more text lines\nShort text"
number fontSize = 12
Component Line = disp.NewTextAnnotation(l,t,mLstr,fontSize)
Line.TextAnnotationSetAlignment(1) // 1=Left, 2=Center, 3=Right
Line.ComponentSetDrawingMode(1) // 1=with background, 2=without background
Line.ComponentSetBackgroundColor(0.5,0.0,0)
Line.ComponentSetForegroundColor(0,1,0)
disp.ComponentAddChildAtEnd(line)
A note: When creating the new component, there are two different variants of the NewTextAnnotation command:
Component NewTextAnnotation( Component ref_par_comp, Number left, Number top, String text, Number size )
Component NewTextAnnotation( Number left, Number top, String text, Number size )
The first one takes an additional "parent" component. If you use that one, the font size scales with the default display size of the parent component on screen, i.e. the text will not look different for images of different sizes.
To test: run the above script with different sx and sy values. Then do the same without the "disp." prefix in the line that creates the annotation.

Red updating text in VID using reactive method

Red [needs: 'view]
num: ["1^/"]
k: num/1
view [
    size 600x600
    txt: text 30x50 k
    ar: area 300x400 "" focus on-change [
        txt/size: ar/size
        len: length? split face/text newline
        either (len - face/data) > 0 [
            append num append form (len + 1) newline
            face/data: len
        ][
            remove back tail num
            face/data: face/data - 1
        ]
        txt/text: form num
    ]
    do [ar/data: 0]
]
This Red program contains a "text face" and an "area face". The text face contains a vertical list of serial numbers. When a newline is added in the area face, the serial number will increase as per number of lines. And when a line is removed in the area face, the serial number will decrease as well.
This is using a non-reactive method. Is there a reactive approach to do it?
I believe you are looking for the react function. The reactive framework was introduced in this blog post, which contains a very similar example of converting code that uses on-change to its reactive version.
Anyhow, I have been reading a lot about Red lately and was looking for a first exercise. My to-list implementation could probably be improved, but the view declaration is more compact now:
Red [needs: 'view]
to-list: function [text][
    ; converts text area string to list of numbers separated by newlines
    txt: copy text
    append txt "dummy" ; handle empty lines
    len: length? split txt newline
    x: copy ""
    repeat i len [append x mold i append x newline]
]
view [
    size 600x600
    text 30x600 react [
        face/text: to-list text-area/text
    ]
    text-area: area 300x400 ""
]

Change text color in knitr for Word

I have an R Markdown file that outputs a data table into Word (unfortunately, my company's security settings won't allow it to interface with LaTeX) and I would like to color code the text in the table based on a classification factor to make it more readable. For example:
library(knitr)
df <- data.frame(c(1:5), runif(5, 100, 200), c(rep("A", 3), rep("B", 2)))
colnames(df) <- c("N", "Value", "Category")
kable(df, format = "markdown", row.names = FALSE, align = 'l')
I would like whatever has a Category value of B, for example to be red in the output. I haven't found a command in knitr for Word that allows colors to change (I think this problem would be doable with LaTeX) which may stop me right there. Any thoughts or ideas how to get this to work?

How to set two different colors for a single string in itext

I have a string like the one below, and I can't split it.
String result="Developed By : Mr.XXXXX";
I can create a paragraph in iText and set a font with a color like this:
Font dataGreenFont = FontFactory.getFont("Garamond", 10,Color.GREEN);
preface.add(new Paragraph(result, dataGreenFont));
This sets the green color for the entire text in result, but I want to color only the Mr.XXXXX part. How do I do this?
First this: you are using an obsolete version of iText. Please upgrade!
As for your question: a Paragraph consists of a series of Chunk objects. A Chunk is an atomic part of text in which all the glyphs are in the same font, have the same font size, color, etc...
Hence you need to split your String in two parts:
Font dataGreenFont = FontFactory.getFont("Garamond", 10, BaseColor.GREEN);
Font dataBlackFont = FontFactory.getFont("Garamond", 10, BaseColor.BLACK);
Paragraph p = new Paragraph();
// the label stays black; only the name is green, as asked in the question
p.add(new Chunk("Developed By : ", dataBlackFont));
p.add(new Chunk("Mr.XXXXX", dataGreenFont));
document.add(p);
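If the text arrives as one String rather than two literals, it has to be cut in two before the chunks can be built. The following is only a minimal sketch in plain Java with no iText calls; the class name LabelValueSplit, the helper splitAfter and the choice of " : " as separator are assumptions made for illustration, not part of the iText API.

```java
public class LabelValueSplit {

    /**
     * Cuts text right after the first occurrence of separator.
     * Returns {label including separator, remainder}; if the separator
     * is missing, the whole text is returned as the label.
     */
    public static String[] splitAfter(String text, String separator) {
        int idx = text.indexOf(separator);
        if (idx < 0) {
            return new String[] { text, "" };
        }
        int cut = idx + separator.length();
        return new String[] { text.substring(0, cut), text.substring(cut) };
    }

    public static void main(String[] args) {
        String[] parts = splitAfter("Developed By : Mr.XXXXX", " : ");
        // parts[0] would go into one Chunk, parts[1] into another
        System.out.println("[" + parts[0] + "] [" + parts[1] + "]");
    }
}
```

Each of the two resulting strings can then be wrapped in its own Chunk with its own Font.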

Changing text line spacing

I'm creating a PDF document consisting of text only, where all the text is the same point size and font family but each character could potentially be a different color. Everything seems to work fine using the code snippet below, but the default space between the lines is slightly greater than I consider ideal. Is there a way to control this? (FYI, type "ColoredText" in the code below merely contains a string and its color. Also, the reason I am treating the newline character separately is that for some reason it doesn't cause a newline if it's in a Chunk.)
Thanks,
Ray
List<byte[]> pdfFilesAsBytes = new List<byte[]>();
iTextSharp.text.Document document = new iTextSharp.text.Document();
MemoryStream memStream = new MemoryStream();
iTextSharp.text.pdf.PdfWriter.GetInstance(document, memStream);
document.SetPageSize(isLandscape ? iTextSharp.text.PageSize.LETTER.Rotate() : iTextSharp.text.PageSize.LETTER);
document.Open();
foreach (ColoredText coloredText in coloredTextList)
{
    Font font = new Font(Font.FontFamily.COURIER, pointSize, Font.NORMAL, coloredText.Color);
    if (coloredText.Text == "\n")
        document.Add(new Paragraph("", font));
    else
        document.Add(new Chunk(coloredText.Text, font));
}
document.Close();
pdfFilesAsBytes.Add(memStream.ToArray());
According to the PDF specification, the distance between the baselines of two consecutive lines is called the leading. In iText, the default leading is 1.5 times the font size. For instance, the default font size is 12 pt, hence the default leading is 18 pt.
You can change the leading of a Paragraph by using one of the other constructors. See for instance: public Paragraph(float leading, String string, Font font)
You can also change the leading using one of the methods that sets the leading:
paragraph.SetLeading(fixed, multiplied);
The first parameter is the fixed leading: if you want a leading of 15 no matter which font size is used, you can choose fixed = 15 and multiplied = 0.
The second parameter is a factor: for instance, if you want the leading to be twice the font size, you can choose fixed = 0 and multiplied = 2. In this case, the leading for a paragraph with font size 12 will be 24, for font size 10 it will be 20, and so on.
You can also combine fixed and multiplied leading.
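The arithmetic described above can be sketched in plain Java. This is only an illustration: it assumes the fixed part and the multiplied part are simply added, as the description implies. The class LeadingSketch and the method totalLeading are hypothetical names, not iText API.

```java
public class LeadingSketch {

    /**
     * Effective leading under the assumption stated above:
     * a fixed part plus a multiple of the font size.
     */
    public static float totalLeading(float fixed, float multiplied, float fontSize) {
        return fixed + multiplied * fontSize;
    }

    public static void main(String[] args) {
        System.out.println(totalLeading(15f, 0f, 12f));   // fixed leading only
        System.out.println(totalLeading(0f, 2f, 12f));    // twice the font size
        System.out.println(totalLeading(0f, 2f, 10f));    // same factor, smaller font
        System.out.println(totalLeading(3f, 1.5f, 10f));  // fixed and multiplied combined
    }
}
```

To tighten the line spacing from the question, a factor below the default of 1.5 (with fixed = 0) would be the natural value to try.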
private static Paragraph addSpace(int size = 1)
{
    Font LineBreak = FontFactory.GetFont("Arial", size);
    Paragraph paragraph = new Paragraph("\n\n", LineBreak);
    return paragraph;
}