Getting bytearray from PdfDictionary - itext

Please refer to this picture below taken from itext url https://itextpdf.com/en/resources/faq/technical-support/itext-5/how-add-alternative-text-image-tagged-pdf. I am trying to get the bytearray of the images in the pdf, while traversing all the branches, looking for structural element marked as /Figure. Goal is to insert a tag (alternate text) to the image as well get the byte array of that specific image to which I am adding the alternate text.
Here is the code
static void manipulate(PdfDictionary element)
{
if (element == null)
return;
var t = element.GetBytes();
if (PdfName.FIGURE.Equals(element.Get(PdfName.S)))
{
element.Put(PdfName.ALT, new PdfString("Figure without ALT"));
var byteaarray = element.GetBytes(); //// Here bytearray is null
}
var kids = element.GetAsArray(PdfName.K);
if (kids == null)
return;
for(int i = 0; i < kids.Size; i++)
{
manipulate(kids.GetAsDict(i));
}
}
The program successfully adds alternate text "Figure without ALT" to all the images. However, the bytes of that figure is null.
Can you please tell what am I doing wrong here?
Thank you.

Related

iTextSharp automatically shrinking images when end of the page reach

I'm using iTextSharp to display images in a pdf report. Here I want display two images in a row and it's working as expected but having a issue when end of the page reaches. The issue is that last row images get shrink to fit in same page, it doesn't automatically add it to the next page. All images having same dimension and resolution.
Please, provide us with the code.
I wrote the test below (although it's in java, there should be no problem) and the results seem to be correct.
public void tableWithImagesTest01() throws IOException, InterruptedException {
String testName = "tableWithImagesTest01.pdf";
String outFileName = destinationFolder + testName;
String cmpFileName = sourceFolder + "cmp_" + testName;
PdfDocument pdfDoc = new PdfDocument(new PdfWriter(outFileName));
Document doc = new Document(pdfDoc, PageSize.A3);
Image image1 = new Image(ImageDataFactory.create(sourceFolder + "itis.jpg"));
Table table = new Table(2);
for (int i = 0; i < 20; i++) {
table.addCell(new Cell().add(image1));
table.addCell(new Cell().add(image1));
table.addCell(new Cell().add(new Paragraph("Hello")));
table.addCell(new Cell().add(new Paragraph("World")));
}
doc.add(table);
doc.close();
Assert.assertNull(new CompareTool().compareByContent(outFileName, cmpFileName, destinationFolder, "diff"));
}
The result pdf looks like this:
Maybe you use summat image1.setAutoScale(true);? Still we need your code to look at.
The easiest solution (considering all images have the same dimension and resolution) would be to manually insert a new page and pagebreak every time you have inserted the maximum number of images to a page.
Taken from a comment below, the solution that works is, on the individual images you need to set:
image.ScaleToFitHeight = false;
Likely to happen when keeping rows together

How to use existing font in itext [duplicate]

There are tips in "iText In Action" that cover setting fonts, as well as the "FontFactory.RegisterDirectories" method (which is, as the book says...an expensive call). However, in my case, the font that I want to use for new fields is already embedded in the document (in an existing Acrofield). With no guarantee that the same font will exist on the user's machine (or on a web server)....is there a way that I can register that already-embedded font, so that I can re-use it for other objects? In the code below, Acrofield "TheFieldIWantTheFontFrom" has the font that I want to re-use for a field named "my_new_field". Any help would be greatly appreciated!
using (MemoryStream output = new MemoryStream())
{
// Use iTextSharp PDF Reader, to get the fields and send to the
//Stamper to set the fields in the document
PdfReader pdfReader = new PdfReader(#"C:\MadScience\MSE_030414.pdf");
// Initialize Stamper (ms is a MemoryStream object)
PdfStamper pdfStamper = new PdfStamper(pdfReader, output);
// Get Reference to PDF Document Fields
AcroFields pdfFormFields = pdfStamper.AcroFields;
//*** CODE THAT HAVE NOT YET BEEN ABLE TO MAKE USE OF TO ASSIST WITH MY FONT ISSUE
//*** MIGHT BE HELP?
//List<object[]> fonts = BaseFont.GetDocumentFonts(pdfReader);
//BaseFont[] baseFonts = new BaseFont[fonts.Count];
//string[] fn = new string[fonts.Count];
//for (int i = 0; i < fonts.Count; i++)
//{
// Object[] obj = (Object[])fonts[i];
// baseFonts[i] = BaseFont.CreateFont((PRIndirectReference)(obj[1]));
// fn[i] = baseFonts[i].PostscriptFontName.ToString();
// //Console.WriteLine(baseFonts[i].FamilyFontName[0][1].ToString());
// //FontFactory.RegisteredFonts.Add(fn[i]);
// //FontFactory.Register(
// Console.WriteLine(fn[i]);
//}
//ICollection<string> registeredFonts = iTextSharp.text.FontFactory.RegisteredFonts;
//foreach (string s in registeredFonts)
//{
// Console.WriteLine("pre-registered: " + s);
//}
if (!FontFactory.Contains("georgia-bold"))
{
FontFactory.RegisterDirectories();
Console.WriteLine("had to register everything"); }
//registeredFonts = iTextSharp.text.FontFactory.RegisteredFonts;
//foreach (string s in registeredFonts)
//{
// Console.WriteLine("post-registered: " + s);
//}
Font myfont = FontFactory.GetFont("georgia-bold");
string nameOfField = "my_field";
AcroFields.Item fld = pdfFormFields.GetFieldItem(nameOfField);
//set the text of the form field
pdfFormFields.SetField(nameOfField, "test stuff");
pdfFormFields.SetField("TheFieldIWantTheFontFrom", "test more stuff");
bool madeit = pdfFormFields.SetFieldProperty(nameOfField, "textfont", myfont.BaseFont, null);
bool madeit2 = pdfFormFields.SetFieldProperty(nameOfField, "textsize", 8f, null);
pdfFormFields.RegenerateField(nameOfField);
// Set the flattening flag to false, so the document can continue to be edited
pdfStamper.FormFlattening = true;
// close the pdf stamper
pdfStamper.Close();
//get the bytes from the MemoryStream
byte[] content = output.ToArray();
using (FileStream fs = File.Create(#"C:\MadScience\MSE_Results.pdf"))
{
//byte[] b = outList[i];
fs.Write(content, 0, (int)content.Length);
fs.Flush();
}
}
Yes you can re-use fonts and the PDF specification actually encourages it. You should, however, keep in mind that some fonts may be embedded as subsets only.
The below code is adapted from this post (be careful, that site has nasty popups sometimes). See the comments in the code for more information. This code was tested against iTextSharp 5.4.4.
/// <summary>
/// Look for the given font name (not file name) in the supplied PdfReader's AcroForm dictionary.
/// </summary>
/// <param name="reader">An open PdfReader to search for fonts in.</param>
/// <param name="fontName">The font's name as listed in the PDF.</param>
/// <returns>A BaseFont object if the font is found or null.</returns>
static BaseFont findFontInForm(PdfReader reader, String fontName) {
//Get the document's acroform dictionary
PdfDictionary acroForm = (PdfDictionary)PdfReader.GetPdfObject(reader.Catalog.Get(PdfName.ACROFORM));
//Bail if there isn't one
if (acroForm == null) {
return null;
}
//Get the resource dictionary
var DR = acroForm.GetAsDict(PdfName.DR);
//Get the font dictionary (required per spec)
var FONT = DR.GetAsDict(PdfName.FONT);
//Look for the actual font and return it
return findFontInFontDict(FONT, fontName);
}
/// <summary>
/// Helper method to look at a specific font dictionary for a given font string.
/// </summary>
/// <remarks>
/// This method is a helper method and should not be called directly without knowledge of
/// the internals of the PDF spec.
/// </remarks>
/// <param name="fontDict">A /FONT dictionary.</param>
/// <param name="fontName">Optional. The font's name as listed in the PDF. If not supplied then the first font found is returned.</param>
/// <returns>A BaseFont object if the font is found or null.</returns>
static BaseFont findFontInFontDict(PdfDictionary fontDict, string fontName) {
//This code is adapted from http://osdir.com/ml/java.lib.itext.general/2004-09/msg00018.html
foreach (var internalFontName in fontDict.Keys) {
var internalFontDict = (PdfDictionary)PdfReader.GetPdfObject(fontDict.Get(internalFontName));
var baseFontName = (PdfName)PdfReader.GetPdfObject(internalFontDict.Get(PdfName.BASEFONT));
//// compare names, ignoring the initial '/' in the baseFontName
if (fontName == null || baseFontName.ToString().IndexOf(fontName) == 1) {
var iRef = (PRIndirectReference)fontDict.GetAsIndirectObject(internalFontName);
if (iRef != null) {
return BaseFont.CreateFont(iRef);
}
}
}
return null;
}
And here's the test code that runs this. It first creates a sample document with an embedded font and then it creates a second document based upon that and re-uses that font. In your code you'll need to actually know beforehand what the font name is that you're searching for. If you don't have ROCK.TTF (Rockwell) installed you'll need to pick a different font file to run this.
//Test file that we'll create with an embedded font
var file1 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test.pdf");
//Secondary file that we'll try to re-use the font above from
var file2 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test2.pdf");
//Path to font file that we'd like to use
var fontFilePath = System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ROCK.TTF");
//Create a basefont object
var font = BaseFont.CreateFont(fontFilePath, BaseFont.WINANSI, true);
//Get the name that we're going to be searching for later on.
var searchForFontName = font.PostscriptFontName;
//Step #1 - Create sample document
//The below block creates a sample PDF file with an embedded font in an AcroForm, nothing too special
using (var fs = new FileStream(file1, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
//Create our field, set the font and add it to the document
var tf = new TextField(writer, new iTextSharp.text.Rectangle(50, 50, 400, 150), "first-name");
tf.Font = font;
writer.AddAnnotation(tf.GetTextField());
doc.Close();
}
}
}
//Step #2 - Look for font
//This uses a stamper to draw on top of the existing PDF using a font already embedded
using (var fs = new FileStream(file2, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var reader = new PdfReader(file1)) {
using (var stamper = new PdfStamper(reader, fs)) {
//Try to get the font file
var f = findFontInForm(reader, searchForFontName);
//Make sure we found something
if (f != null) {
//Draw some text
var cb = stamper.GetOverContent(1);
cb.BeginText();
cb.MoveText(200, 400);
cb.SetFontAndSize(f, 72);
cb.ShowText("Hello!");
cb.EndText();
}
}
}
}
EDIT
I made a small modification to the findFontInFontDict method above. The second parameter is now optional. If null it returns the first font object that it finds in the supplied dictionary. This change allows me to introduce the below method which looks for a specific field by name and gets the font.
static BaseFont findFontByFieldName(PdfReader reader, String fieldName) {
//Get the document's acroform dictionary
PdfDictionary acroForm = (PdfDictionary)PdfReader.GetPdfObject(reader.Catalog.Get(PdfName.ACROFORM));
//Bail if there isn't one
if (acroForm == null) {
return null;
}
//Get the fields array
var FIELDS = acroForm.GetAsArray(PdfName.FIELDS);
if (FIELDS == null || FIELDS.Length == 0) {
return null;
}
//Loop through each field reference
foreach (var fieldIR in FIELDS) {
var field = (PdfDictionary)PdfReader.GetPdfObject(fieldIR);
//Check the field name against the supplied field name
if (field.GetAsString(PdfName.T).ToString() == fieldName) {
//Get the resource dictionary
var DR = acroForm.GetAsDict(PdfName.DR);
//Get the font dictionary (required per spec)
var FONT = DR.GetAsDict(PdfName.FONT);
return findFontInFontDict(FONT);
}
}
return null;
}

Eclipse RCP ListProvider tweaking

I have problem with changing this image list provider in to thumbnail provider. In case of need I will post View for it too.
public Object[] getElements(Object inputElement) {
if (iDirname == null)
return null;
File dir = new File(iDirname);
FilenameFilter filter = new FilenameFilter() {
public boolean accept(File directory, String filename) {
if (filename.endsWith("jpg") || (filename.endsWith("bmp")) || (filename.endsWith("png") || (filename.endsWith("JPG") || (filename.endsWith("BMP")) || (filename.endsWith("PNG")))))
return true;
else
return false;
}
};
String[] dirList = null;
if (dir.isDirectory()) {
dirList = dir.list(filter);
for (int i=0; i<dirList.length;++i){
//dirList2[i] = new Image(device, dirList2[i]); added this to try passing array of Images - failed.
dirList[i] = iDirname + File.separatorChar + dirList[i];
}
}
return dirList;
}
And the view
public void createPartControl(Composite parent) {
iViewer = new ListViewer(parent);
iViewer.setContentProvider(new DirListProvider());
getSite().setSelectionProvider(iViewer);
makeActions();
hookContextMenu();
contributeToActionBars();
}
I don't know how to change provided path lists to the thumbnail displaying. Should I get the provided content in to Array and iterate through it creating Images? If so how?
Thanks in advance for your help.
EDIT:
I added
ImageDescriptor[] dirList = null;
if (dir.isDirectory()) {
String[] dirList2 = dir.list(filter);
for (int i=0; i<dirList2.length;++i){
dirList[i] = ImageDescriptor.createFromImageData(new ImageData(iDirname + File.separatorChar + dirList2[i]));
//dirList[i] = iDirname + File.separatorChar + dirList[i];
}
}
return dirList;
but this is not showing anything at all.
When you are telling me to use Composite, is it my parent variable? I still don't know how to display the images from paths passed by ListProvider. I am really green in this :/
What you are missing here is a LabelProvider. You can use a LabelProvider to provide an image for each element in your viewer's input.
However, Francis Upton is right, I don't think ListViewer will really suit your needs as you will end up with a single column of images. Although you won't be able to add the images directly to your Composite, you will need to set them as the background image of a label.
There are a couple of other things to consider:
You need to dispose() of your Images once you're done with them as they use up System handles. Therefore you need to keep track of the Images you create in your getElements(Object) method.
If the directories you are reading the images from do not already contain thumbnails, you will need to scale the images before presenting them on your UI.
Remember, the array type you return from your ContentProvider's getElements(Object) method defines the type that will get passed into your LabelProvider's methods. So you started off returning an array of strings representing paths to the images. Your LabelProvider would need to load these into images to be returned from the provider's getImage method - but bear in mind what I said about disposing of these images! Then you switched to returning an Array of image descriptors, in this case you would need to cast your incoming Object to an ImageDescriptor and use that to create the Image in the getImage method. Maybe once you have this working you can think about whether this meets your needs, and then possibly look at doing a different implementation, such as the composite/gridlayout/label approach.
I would not use a ListViewer for this. I would just create a Composite and then using GridLayout set up the number of columns you want and margins and so forth, and then just add the images directly to the composite. As far as I know you cannot put arbitrary things like imagines in an SWT List, so the ListViewer is not going to help you. You can do all of this in the createPartControl method.

Insert an Image in PDF using ITextSharp

I have to insert a an image in a pdf. That is, wherever I see a text 'Signature', I have to insert an signature image there . I can do by saying absolute positions .
But, I am looking for how to find the position of the word 'Signature' in the pdf and insert the image.
Appreciate ur help!
This is the working code:
using (Stream inputImageStream = new FileStream(#"C:\signature.jpeg", FileMode.Open, FileAccess.Read, FileShare.Read))
using (Stream outputPdfStream = new FileStream(#"C:\test\1282011\Result.pdf", FileMode.Create, FileAccess.Write, FileShare.None))
{
var reader = new PdfReader(#"C:\Test\1282011\Input.pdf");
var stamper = new PdfStamper(reader, outputPdfStream);
var count = reader.NumberOfPages;
iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(inputImageStream);
image.SetAbsolutePosition(300, 200); // Absolute position
image.ScaleToFit(200, 30);
PRTokeniser pkt = null;
string strpages = string.Empty;
System.Text.StringBuilder build = new System.Text.StringBuilder();
for (int i = 1; i <= count; i++)
{
var pdfContentByte = stamper.GetOverContent(i);
if (pdfContentByte != null)
{
pkt = new PRTokeniser(stamper.Reader.GetPageContent(i));
while (pkt.NextToken())
{
if (pkt.TokenType == PRTokeniser.TokType.STRING)
{
if (pkt.StringValue == "Signature")
{
pdfContentByte.AddImage(image);
}
}
}
}
}
stamper.Close();
}
}
After some googling, I found out that I could absolute position of text as follows:
extSharp.text.pdf.AcroFields fields = stamper.AcroFields;
IList<iTextSharp.text.pdf.AcroFields.FieldPosition> signatureArea = fields.GetFieldPositions("Signature");
iTextSharp.text.Rectangle rect= signatureArea.First().position;
iTextSharp.text.Rectangle logoRect = new iTextSharp.text.Rectangle(rect);
image.SetAbsolutePosition(logoRect.Width ,logoRect .Height );
But the variable , signatureArea is null all the time even when the pdf contains the word 'Signature'.
Any input..? :)
Jaleel
Check out PdfTextExtractor and specifically the LocationTextExtractionStrategy. Create a class in your project with the exact code for the LocationTextExtractionStrategy and put a breakpoint on the line return sb.ToString(); (line 131 in SVN) and take a look at the contents of the variable locationalResult. You'll see pretty much exactly what you're looking for, a collection of text with start and end locations. If your search word isn't on a line by itself you might have to dig a little deeper but this should point you in the right direction.
That was perfect Chris. I am able to find the text position and insert the signature. What I understood is , there is a list List<TextChunk> LocationalResult in the LocationTextExtractionStrategy class. The RenderText() method in LocationTextExtractionStrategy will add each text to the LocationalResult list.
Actually the list LocationalResult is a private list, I made it public to access it from outside.
I loop through each page of PDF document and call PdfTextExtractor.GetTextFromPage(reader, i, locationStrat); where i is the pagenumber. At this time all text in the page will be added to the LocationalResult with all the position information.
This is what I done . And it works perfect.

iText AddImage() to specific page

I'm having a problem trying to locate a PdfContentByte directly into an specific page. My problem is: I need to add an Image for each page (That works) and need to add a QRCode to each of the pages at the right bottom corner but this works only for the first Page and I don't know how to repeat it on the other ones.
This is my code:
public string GeneratePDFDocument(Atomic.Development.Montenegro.Data.Entities.Document document, Stamp stamp)
{
string filename = #"C:\Users\Sheldon\Desktop\Pdf.Pdf";
FileStream fs = new FileStream(filename, FileMode.Create);
iTextSharp.text.Document pdfDocument = new iTextSharp.text.Document(PageSize.LETTER, PAGE_LEFT_MARGIN, PAGE_RIGHT_MARGIN, PAGE_TOP_MARGIN, PAGE_BOTTOM_MARGIN);
iTextSharp.text.pdf.PdfWriter writer = iTextSharp.text.pdf.PdfWriter.GetInstance(pdfDocument, fs);
pdfDocument.Open();
int count = document.Pages.Count;
foreach (Page page in document.Pages)
{
Image img = Image.GetInstance(page.Image);
img.ScaleToFit(PageSize.LETTER.Width-(PAGE_LEFT_MARGIN + PAGE_RIGHT_MARGIN), PageSize.LETTER.Height-(PAGE_TOP_MARGIN + PAGE_BOTTOM_MARGIN));
pdfDocument.Add(img);
PlaceCodeBar(writer);
}
pdfDocument.Close();
writer.Close();
fs.Close();
return filename;
}
private static void PlaceCodeBar(iTextSharp.text.pdf.PdfWriter writer)
{
String codeText = "TEXT TO ENCODE";
iTextSharp.text.pdf.BarcodePDF417 pdf417 = new iTextSharp.text.pdf.BarcodePDF417();
pdf417.SetText(codeText);
Image img = pdf417.GetImage();
iTextSharp.text.pdf.BarcodeQRCode qrcode = new iTextSharp.text.pdf.BarcodeQRCode(codeText, 1, 1, null);
img = qrcode.GetImage();
iTextSharp.text.pdf.PdfContentByte cb = writer.DirectContent;
cb.SaveState();
cb.BeginText();
img.SetAbsolutePosition(PageSize.LETTER.Width-PAGE_RIGHT_MARGIN-img.ScaledWidth, PAGE_BOTTOM_MARGIN);
cb.AddImage(img);
cb.EndText();
cb.RestoreState();
}
So add it in your foreach (Page...) loop:
foreach (Page page in document.Pages)
{
Image img = Image.GetInstance(page.Image);
img.ScaleToFit(PageSize.LETTER.Width-(PAGE_LEFT_MARGIN + PAGE_RIGHT_MARGIN), PageSize.LETTER.Height-(PAGE_TOP_MARGIN + PAGE_BOTTOM_MARGIN));
pdfDocument.Add(img);
PlaceCodeBar(writer);
}
If this is a second pass on the same PDF (you've closed it then opened it again), use a PdfStamper rather than a PdfWriter. You can then get the direct content of each page rather than the one direct content that is reused (and reset) for each page.
PS: Drop the BeginText() and EndText() calls. Those operators should only be used when actually drawing text/setting fonts/etc. No line art. No images. The SaveState()/RestoreState() are good though. Definitely keep those.
I just figure out how to solve the problem. Just delete the cb.SaveState() and cb.RestoreState() and it put the image on the page is actually active.