How to replace text of Paragraph in Itext? - itext

I want to set page number when I merge pdf files. The page number paragraph will defind by someone to custom style what he want. Now I can add text to the paragraph(like doc.add(paragraph.add(text))), but I can not replace it.
public static byte[] mergePdf(Map<String, PdfDocument> filesToMerge, Paragraph paragraph) {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfDocument pdfDoc = new PdfDocument(new PdfWriter(baos));
Document doc = new Document(pdfDoc);
pdfDoc.initializeOutlines();
PdfPageFormCopier formCopier = new PdfPageFormCopier();
int page = 1;
for (Map.Entry<String, PdfDocument> entry : filesToMerge.entrySet()) {
String title = entry.getKey();
PdfDocument srcDoc = entry.getValue();
int numberOfPages = srcDoc.getNumberOfPages();
for (int i = 1; i <= numberOfPages; i++, page++) {
Text text = new Text(String.format("page %d", page));
srcDoc.copyPagesTo(i, i, pdfDoc, formCopier);
if (i == 1) {
text.setDestination("p" + page);
PdfOutline rootOutLine = pdfDoc.getOutlines(false);
PdfOutline outline = rootOutLine.addOutline(title);
outline.addDestination(PdfDestination.makeDestination(new PdfString("p" + page)));
}
// I want do like "doc.add(paragraph.set(text))";
// paragraph already have been set position,font,fontSize and so on. SO I dont want to "new Paragraph(text)"
doc.add(paragraph.add(text));
}
}
for (PdfDocument srcDoc : filesToMerge.values()) {
srcDoc.close();
}
doc.close();
return baos.toByteArray();
}

I am the questioner. At last I resolve the question do like this:
public class MyParagraph extends Paragraph {
public MyParagraph() {
}
public void setContent(Text text) {
List<IElement> children = this.getChildren();
if (!children.isEmpty()) {
children.clear();
}
children.add(text);
}
}

Related

Multi-Line Text Fitting in IText 7

This question is a follow-up from this
After the previous post I managed to create the following method that fits text in certain spaces in paragraphs.
public static void getPlainFill2(String str, Document doc, PdfDocument document, Paragraph root,
Paragraph space, boolean isCentred) {
// System.out.println("prevText: "+prev.getText());
float width = doc.getPageEffectiveArea(PageSize.A4).getWidth();
float height = doc.getPageEffectiveArea(PageSize.A4).getHeight();
if (str.isEmpty() || str.isBlank()) {
str = "________";
}
IRenderer spaceRenderer = space.createRendererSubTree().setParent(doc.getRenderer());
LayoutResult spaceResult = spaceRenderer
.layout(new LayoutContext(new LayoutArea(1, new Rectangle(width, height))));
Rectangle rectSpaceBox = ((ParagraphRenderer) spaceRenderer).getOccupiedArea().getBBox();
float writingWidth = rectSpaceBox.getWidth();
float writingHeight = rectSpaceBox.getHeight();
Rectangle remaining = doc.getRenderer().getCurrentArea().getBBox();
float yReal = remaining.getTop() + 2f;// orig 4f
float sizet = 0;
for (int i = 0; i < root.getChildren().size(); i++) {
IElement e = root.getChildren().get(i);
if (e.equals(space)) {
break;
}
IRenderer ss = e.createRendererSubTree().setParent(doc.getRenderer());
LayoutResult ss2 = ss.layout(new LayoutContext(new LayoutArea(1, new Rectangle(width, height))));
sizet += ss.getOccupiedArea().getBBox().getWidth();
System.out.println("width: " + width + " current: " + sizet);
}
float start = sizet+doc.getLeftMargin();
if(isCentred)
start = (width - getRealWidth(doc, root,width,height))/2+doc.getLeftMargin()+sizet;
Rectangle towr = new Rectangle(start, yReal, writingWidth, writingHeight);// sizet+doc.getLeftMargin()
PdfCanvas pdfcanvas = new PdfCanvas(document.getFirstPage());
Canvas canvas = new Canvas(pdfcanvas, towr);
canvas.setTextAlignment(TextAlignment.CENTER);
canvas.setHorizontalAlignment(HorizontalAlignment.CENTER);
Paragraph paragraph = new Paragraph(str).setTextAlignment(TextAlignment.CENTER).setBold();//.setMultipliedLeading(0.9f);
Div lineDiv = new Div();
lineDiv.setVerticalAlignment(VerticalAlignment.MIDDLE);
lineDiv.add(paragraph);
float fontSizeL = 1f;
float fontSizeR = 12;
int adjust = 0;
while (Math.abs(fontSizeL - fontSizeR) > 1e-1) {
float curFontSize = (fontSizeL + fontSizeR) / 2;
lineDiv.setFontSize(curFontSize);
// It is important to set parent for the current element renderer to a root
// renderer
IRenderer renderer = lineDiv.createRendererSubTree().setParent(canvas.getRenderer());
LayoutContext context = new LayoutContext(new LayoutArea(1, towr));
if (renderer.layout(context).getStatus() == LayoutResult.FULL) {
// we can fit all the text with curFontSize
fontSizeL = curFontSize;
} else {
fontSizeR = curFontSize;
}
if(adjust>=2) {
writingHeight -=1.3f;
yReal += 1.4f;
adjust= 0;
}
}
lineDiv.setFontSize(fontSizeL);
canvas.add(lineDiv);
// border
// PdfCanvas(document.getFirstPage()).rectangle(towr).setStrokeColor(ColorConstants.BLACK).stroke();
canvas.close();
}
public static float getRealWidth (Document doc, Paragraph root,float width,float height) {
float sizet = 0;
for(int i = 0;i<root.getChildren().size();i++) {
IElement e = root.getChildren().get(i);
IRenderer ss = e.createRendererSubTree().setParent(doc.getRenderer());
LayoutResult ss2 = ss.layout(new LayoutContext(new LayoutArea(1, new Rectangle(width,height))));
sizet +=ss.getOccupiedArea().getBBox().getWidth();
}
return sizet;}
Now this works almost decent, there are minor issues when text scales to lower sizes and it goes like:
https://i.ibb.co/MkxfwjQ/Screenshot-from-2021-06-14-18-27-09.png (I can't post images because I have no rep.)
but the main issue is that you have to write Paragraphs line by line to work. As next example:
Cell cell3 = new Cell();
LineCountingParagraph line3 = new LineCountingParagraph("");
Text ch07 = new Text("Paragraph Prev ");
line3.add(ch07);
Paragraph nrZile = getEmptySpace(15);
line3.add(nrZile);
Text ch08 = new Text("afterStr, textasdsadasdas ");
line3.add(ch08);
Paragraph data = getEmptySpace(18);
line3.add(data);
Text ch09 = new Text(".\n");
line3.add(ch09);
line3.setTextAlignment(TextAlignment.CENTER);
cell3.add(line3);
doc.add(cell3);
getPlainFill2("thisisalongstring", doc, document, line3, nrZile, true);
getPlainFill2("1333", doc, document, line3, data, true);
Cell cell4 = new Cell();
LineCountingParagraph line4 = new LineCountingParagraph("");
Paragraph loc2 = getEmptySpace(30);
line4.add(loc2);
Text pr32 = new Text(" aasdbsadasd ");
line4.add(pr32);
Paragraph nr2 = getEmptySpace(8);
line4.add(nr2);
Text pr33 = new Text(" asdasdasdasd.\n");
line4.add(pr33);
line4.setTextAlignment(TextAlignment.CENTER);
cell4.add(line4);
doc.add(cell4);
getPlainFill2("1333", doc, document, line4, nr2, true);
If you need more code, I'll upload it somewhere.
Now is there a way to insert text within the same paragraph on multiple lines ? because there seems I cannot find a way to detect line break in IText 7.1.11.
Full code:
package pdfFill;
import java.io.File;
import java.io.IOException;
import com.itextpdf.kernel.colors.ColorConstants;
import com.itextpdf.kernel.geom.PageSize;
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.canvas.PdfCanvas;
import com.itextpdf.layout.Canvas;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Cell;
import com.itextpdf.layout.element.Div;
import com.itextpdf.layout.element.IElement;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.element.Text;
import com.itextpdf.layout.layout.LayoutArea;
import com.itextpdf.layout.layout.LayoutContext;
import com.itextpdf.layout.layout.LayoutResult;
import com.itextpdf.layout.property.HorizontalAlignment;
import com.itextpdf.layout.property.TextAlignment;
import com.itextpdf.layout.property.VerticalAlignment;
import com.itextpdf.layout.renderer.DrawContext;
import com.itextpdf.layout.renderer.IRenderer;
import com.itextpdf.layout.renderer.ParagraphRenderer;
public class Newway4 {
public static void main(String[] args) {
PdfWriter writer;
try {
writer = new PdfWriter(new File("test4.pdf"));
PdfDocument document = new PdfDocument(writer);
document.getDocumentInfo().addCreationDate();
document.getDocumentInfo().setAuthor("Piri");
document.getDocumentInfo().setTitle("Test_Stackoverflow");
document.setDefaultPageSize(PageSize.A4);
Document doc = new Document(document);
doc.setFontSize(12);
final Paragraph titlu = new Paragraph();
final Text t1 = new Text("\n\n\n\nTest Stackoverflow\n\n\n").setBold().setUnderline();
titlu.setHorizontalAlignment(HorizontalAlignment.CENTER);
titlu.setTextAlignment(TextAlignment.CENTER);
titlu.add(t1).setBold();
doc.add(titlu);
Cell cell1 = new Cell();
LineCountingParagraph line1 = new LineCountingParagraph("");
line1.add( addTab());
Text ch01 = new Text("This is the 1st example ");
line1.add(ch01);
Paragraph name = getEmptySpace(42);
line1.add(name);// cnp new line
Text ch02 = new Text(" that works ");
line1.add(ch02);
Paragraph domiciliu = getEmptySpace(63);
line1.add(domiciliu);
/* Text ch03 = new Text("\njudet");
line1.add(ch03);
Paragraph judet = getEmptySpace(12);
line1.add(judet);*/
Text ch031 = new Text("\n");
line1.add(ch031);
cell1.add(line1);
doc.add(cell1);
getPlainFill2("with insertion str", doc, document, line1, name, false);
getPlainFill2("because is writtin line by line", doc, document, line1, domiciliu, false);
Cell cell2 = new Cell();
LineCountingParagraph line2 = new LineCountingParagraph("");
Text p51 = new Text("as you can see in this");
line2.add(p51);
Paragraph localitatea = getEmptySpace(30);
line2.add(localitatea);
Text p7 = new Text(" and ");
line2.add(p7);
Paragraph nrCasa =getEmptySpace(8);
line2.add(nrCasa);
Text p09 = new Text(" of text scalling ");
line2.add(p09);
Paragraph telefon = getEmptySpace(22);
line2.add(telefon);
Text p11 = new Text(".");
line2.add(p11);
line2.setTextAlignment(TextAlignment.CENTER);
cell2.add(line2);
doc.add(cell2);
getPlainFill2("sentence", doc, document, line2, localitatea, true);
getPlainFill2("example", doc, document, line2, nrCasa, true);
getPlainFill2("text scalling bla bla", doc, document, line2, telefon, true);
doc.add(new Paragraph("\n\n\n"));
LineCountingParagraph paragraphTest = new LineCountingParagraph("");
paragraphTest.add(addTab());
Text testch01 = new Text("This is the 2nd example ");
paragraphTest.add(testch01);
Paragraph emptyTest01 = getEmptySpace(42);
paragraphTest.add(emptyTest01);
Text testch02 = new Text(" that doesn't work ");
paragraphTest.add(testch02);
Paragraph emptyTest02 = getEmptySpace(53);
paragraphTest.add(emptyTest02);
Text testch04 = new Text(" this next goes to the next line but ");
paragraphTest.add(testch04);
Paragraph emptyTest03 = getEmptySpace(42);
paragraphTest.add(emptyTest03);
Text testch05 = new Text(" won't appear !!");
paragraphTest.add(testch05);
doc.add(paragraphTest);
getPlainFill2("with insertion str", doc, document, paragraphTest, emptyTest01, false);
getPlainFill2("because next text goes next line", doc, document, paragraphTest, emptyTest02, false);
getPlainFill2("this text", doc, document, paragraphTest, emptyTest03, false);
doc.close();
writer.flush();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static String getStrWithDots(final int dots, final String str) {
final int strSize = str.length();
final StringBuilder sb = new StringBuilder();
int dotsRemained;
if (strSize > dots) {
dotsRemained = 0;
} else {
dotsRemained = dots - strSize;
}
for (int i = 0; i < dotsRemained; ++i) {
if (i == dotsRemained / 2) {
sb.append(str);
}
sb.append(".");
}
return sb.toString();
}
public static void getPlainFill2(String str, Document doc, PdfDocument document, Paragraph root,
Paragraph space, boolean isCentred) {
// System.out.println("prevText: "+prev.getText());
float width = doc.getPageEffectiveArea(PageSize.A4).getWidth();
float height = doc.getPageEffectiveArea(PageSize.A4).getHeight();
if (str.isEmpty() || str.isBlank()) {
str = "________";
}
IRenderer spaceRenderer = space.createRendererSubTree().setParent(doc.getRenderer());
LayoutResult spaceResult = spaceRenderer
.layout(new LayoutContext(new LayoutArea(1, new Rectangle(width, height))));
Rectangle rectSpaceBox = ((ParagraphRenderer) spaceRenderer).getOccupiedArea().getBBox();
float writingWidth = rectSpaceBox.getWidth();
float writingHeight = rectSpaceBox.getHeight();
Rectangle remaining = doc.getRenderer().getCurrentArea().getBBox();
float yReal = remaining.getTop() + 2f;// orig 4f
float sizet = 0;
for (int i = 0; i < root.getChildren().size(); i++) {
IElement e = root.getChildren().get(i);
if (e.equals(space)) {
break;
}
IRenderer ss = e.createRendererSubTree().setParent(doc.getRenderer());
LayoutResult ss2 = ss.layout(new LayoutContext(new LayoutArea(1, new Rectangle(width, height))));
sizet += ss.getOccupiedArea().getBBox().getWidth();
}
float start = sizet+doc.getLeftMargin();
if(isCentred)
start = (width - getRealWidth(doc, root,width,height))/2+doc.getLeftMargin()+sizet;
Rectangle towr = new Rectangle(start, yReal, writingWidth, writingHeight);// sizet+doc.getLeftMargin()
PdfCanvas pdfcanvas = new PdfCanvas(document.getFirstPage());
Canvas canvas = new Canvas(pdfcanvas, towr);
canvas.setTextAlignment(TextAlignment.CENTER);
canvas.setHorizontalAlignment(HorizontalAlignment.CENTER);
Paragraph paragraph = new Paragraph(str).setTextAlignment(TextAlignment.CENTER).setBold();//.setMultipliedLeading(0.9f);//setbold oprtional
Div lineDiv = new Div();
lineDiv.setVerticalAlignment(VerticalAlignment.MIDDLE);
lineDiv.add(paragraph);
float fontSizeL = 0.0001f, fontSizeR= 10000;
int adjust = 0;
while (Math.abs(fontSizeL - fontSizeR) > 1e-1) {
float curFontSize = (fontSizeL + fontSizeR) / 2;
lineDiv.setFontSize(curFontSize);
// It is important to set parent for the current element renderer to a root
// renderer
IRenderer renderer = lineDiv.createRendererSubTree().setParent(canvas.getRenderer());
LayoutContext context = new LayoutContext(new LayoutArea(1, towr));
if (renderer.layout(context).getStatus() == LayoutResult.FULL) {
// we can fit all the text with curFontSize
fontSizeL = curFontSize;
if (++adjust>1)
towr.setHeight(towr.getHeight()-0.90f);
} else {
fontSizeR = curFontSize;
}
}
lineDiv.setFontSize(fontSizeL);
canvas.add(lineDiv);
new PdfCanvas(document.getFirstPage()).rectangle(towr).setStrokeColor(ColorConstants.BLACK).stroke();
canvas.close();
}
public static Text addTab() {
StringBuilder sb = new StringBuilder();
for(int i = 0;i<8;i++)
sb.append("\u00a0");
return new Text(sb.toString());
}
public static float getRealWidth (Document doc, Paragraph root,float width,float height) {
float sizet = 0;
for(int i = 0;i<root.getChildren().size();i++) {
IElement e = root.getChildren().get(i);
IRenderer ss = e.createRendererSubTree().setParent(doc.getRenderer());
LayoutResult ss2 = ss.layout(new LayoutContext(new LayoutArea(1, new Rectangle(width,height))));
sizet +=ss.getOccupiedArea().getBBox().getWidth();
}
return sizet;
}
private static Paragraph getEmptySpace(int size) {
Paragraph space = new Paragraph();
space.setMaxWidth(size);
for(int i=0;i<size;i++) {
// par.add("\u00a0");
space.add("\u00a0");
}
return space;
}
private static class LineCountingParagraph extends Paragraph {
private int linesWritten = 0;
public LineCountingParagraph(String text) {
super(text);
}
public void addWrittenLines(int toAdd) {
linesWritten += toAdd;
}
public int getNumberOfWrittenLines() {
return linesWritten;
}
#Override
protected IRenderer makeNewRenderer() {
return new LineCountingParagraphRenderer(this);
}
}
private static class LineCountingParagraphRenderer extends ParagraphRenderer {
public LineCountingParagraphRenderer(LineCountingParagraph modelElement) {
super(modelElement);
}
#Override
public void drawChildren(DrawContext drawContext) {
((LineCountingParagraph)modelElement).addWrittenLines(lines.size());
super.drawChildren(drawContext);
}
#Override
public IRenderer getNextRenderer() {
return new LineCountingParagraphRenderer((LineCountingParagraph) modelElement);
}
}
}
The issue: in the top half of the PDF you can see the result of two LineCountingParagraph instances being created, one per line. In the bottom half of the PDF you can see the result when only one instance of LineCountingParagraph is created. So fitting the text in boxes does not work well in case content of the paragraph wraps to the next line.
You have unnecessarily complicated things in such a way that we have to start from scratch :)
So the goal is to be able to create paragraphs with boxed intrusions of fixed width, where we need to copy-fit (place) some text, making sure the font size is selected in such a way that the text fits into that box.
The result should look similar to this picture:
The idea is that we will add paragraphs of fixed width into our main paragraph (iText allows adding block elements and a paragraph is a block element - into paragraphs). The fixed width will be guaranteed by the contents of our paragraph - it will just contain non-breakable spaces. Our paragraph will actually be backed by another paragraph with the actual content we want to fit into our wrapping paragraph. During the layout of the wrapping paragraph we will know its effective boundary and we will just use that area to determine the right font size for our content paragraph using binary search algorithm. Once the right font size has been determined we will just make sure the paragraph with the content gets drawn right next to our wrapping paragraph.
The code for our wrapping paragraph is pretty simple. It just expects the underlying paragraph with real content as the parameter. As always with iText layout, we should customize the renderer of our autoscaling paragraph:
private static class AutoScalingParagraph extends Paragraph {
Paragraph innerParagraph;
public AutoScalingParagraph(Paragraph innerParagraph) {
this.innerParagraph = innerParagraph;
}
#Override
protected IRenderer makeNewRenderer() {
return new AutoScalingParagraphRenderer(this);
}
}
private static class AutoScalingParagraphRenderer extends ParagraphRenderer {
private IRenderer innerRenderer;
public AutoScalingParagraphRenderer(AutoScalingParagraph modelElement) {
super(modelElement);
}
#Override
public LayoutResult layout(LayoutContext layoutContext) {
LayoutResult baseResult = super.layout(layoutContext);
this.innerRenderer = ((AutoScalingParagraph)modelElement).innerParagraph.createRendererSubTree().setParent(this);
if (baseResult.getStatus() == LayoutResult.FULL) {
float fontSizeL = 0.0001f, fontSizeR= 10000;
while (Math.abs(fontSizeL - fontSizeR) > 1e-1) {
float curFontSize = (fontSizeL + fontSizeR) / 2;
this.innerRenderer.setProperty(Property.FONT_SIZE, UnitValue.createPointValue(curFontSize));
if (this.innerRenderer.layout(new LayoutContext(getOccupiedArea().clone())).getStatus() == LayoutResult.FULL) {
// we can fit all the text with curFontSize
fontSizeL = curFontSize;
} else {
fontSizeR = curFontSize;
}
}
this.innerRenderer.setProperty(Property.FONT_SIZE, UnitValue.createPointValue(fontSizeL));
this.innerRenderer.layout(new LayoutContext(getOccupiedArea().clone()));
}
return baseResult;
}
#Override
public void drawChildren(DrawContext drawContext) {
super.drawChildren(drawContext);
innerRenderer.draw(drawContext);
}
#Override
public IRenderer getNextRenderer() {
return new AutoScalingParagraphRenderer((AutoScalingParagraph) modelElement);
}
}
Now we add the helper function for creating our wrapper paragraphs that just accepts the desired paragraph width in spaces and the underlying content we want to fit into that space:
private static Paragraph createAdjustableParagraph(int widthInSpaces, Paragraph innerContent) {
AutoScalingParagraph paragraph = new AutoScalingParagraph(innerContent);
paragraph.setBorder(new SolidBorder(1));
StringBuilder sb = new StringBuilder();
for(int i=0;i<widthInSpaces;i++) {
sb.append("\u00a0");
}
paragraph.add(sb.toString());
return paragraph;
}
Finally, the main code:
PdfWriter writer = new PdfWriter(new File("test4.pdf"));
PdfDocument document = new PdfDocument(writer);
Document doc = new Document(document);
Paragraph paragraphTest = new Paragraph();
Text testch01 = new Text("This is the 2nd example ");
paragraphTest.add(testch01);
paragraphTest.add(createAdjustableParagraph(42, new Paragraph("with insertion str")));
Text testch02 = new Text(" that doesn't work ");
paragraphTest.add(testch02);
paragraphTest.add(createAdjustableParagraph(53, new Paragraph("because next text goes next line")));
Text testch04 = new Text(" this next goes to the next line but ");
paragraphTest.add(testch04);
paragraphTest.add(createAdjustableParagraph(42, new Paragraph("this text")));
Text testch05 = new Text(" won't appear !!");
paragraphTest.add(testch05);
doc.add(paragraphTest);
doc.close();
Which gives us the following result:
So we just have the one main paragraph which contains some content and our paragraph wrappers, which in tern have the underlying content we want to fit.
Hint: centering the text is very easy, you don't need to calculate coordinates etc. Just set the right property to the paragraph with the content that you feed to your wrapper paragraph:
paragraphTest.add(createAdjustableParagraph(42, new Paragraph("this text").setTextAlignment(TextAlignment.CENTER)));
And you get the result:

How I can make dynamic header column and dynamic data to export to Excel/CSV/PDF using MongoDB, Spring Boot, and apache poi

I want to make export function using Spring Boot, I have data on MongoDB NoSQL, and then want to export my document on MongoDB Dynamically using Apache POI ( If any better dependency you can recommend to me).
I don't want to declare header column, entity model, etc., I want to export data dynamically as shown as in my Document database, any one can give me an example for it?
Please try these code may be help for you.
->add Gson Depandencies in porm.xml
Controller code
MasterController.java
#Autowired
MasterServiceImpl masterServiceImpl;
#GetMapping(value="/dynamicfile/{flag}/{fileType}/{fileName}")
public ResponseEntity<InputStreamResource> downloadsFiles(#PathVariable("flag") int flag,#PathVariable("fileType") String fileType,#PathVariable("fileName") String fileName) throws IOException{
List<?> objects=new ArrayList<>();
if(flag==1) {
objects=masterServiceImpl.getData();
}
ByteArrayInputStream in = masterServiceImpl.downloadsFiles(objects,fileType);
HttpHeaders headers = new HttpHeaders();
if(fileType.equals("Excel")) {
headers.add("Content-Disposition", "attachment; filename="+fileName+".xlsx");
}else if(fileType.equals("Pdf")){
headers.add("Content-Disposition", "attachment; filename="+fileName+".pdf");
}else if(fileType.equals("Csv")) {
headers.add("Content-Disposition", "attachment; filename="+fileName+".csv");
}
return ResponseEntity.ok().headers(headers).body(new InputStreamResource(in));
}
Service code :
MasterServiceImpl.java
public static List<HashMap<Object, Object>> getListOfObjectToListOfHashMap(List<?> objects) {
List<HashMap<Object,Object>> list=new ArrayList<>();
for(int i=0;i<objects.size();i++) {
HashMap<Object,Object> map=new HashMap<>();
String temp=new Gson().toJson(objects.get(i)).toString();
String temo1= temp.substring(1, temp.length()-1);
String[] temp2=temo1.split(",");
for(int j=-1;j<temp2.length;j++) {
if(j==-1) {
map.put("SrNo",i+1);
}else {
String tempKey=temp2[j].toString().split(":")[0].toString();
String tempValue=temp2[j].toString().split(":")[1].toString();
char ch=tempValue.charAt(0);
if(ch=='"') {
map.put(tempKey.substring(1, tempKey.length()-1), tempValue.substring(1, tempValue.length()-1));
}else {
map.put(tempKey.substring(1, tempKey.length()-1), tempValue);
}
}
}
list.add(map);
}
return list;
}
public static ByteArrayInputStream downloadsFiles(List<?> objects,String fileType) throws IOException {
ByteArrayOutputStream out = new ByteArrayOutputStream();
List<HashMap<Object, Object>> list = getListOfObjectToListOfHashMap(objects);
String[] COLUMNs = getColumnsNameFromListOfObject(objects);
try{
if(fileType.equals("Excel")) {
generateExcel(list, COLUMNs,out);
}else if(fileType.equals("Pdf")) {
generatePdf(list, COLUMNs,out);
}else if(fileType.equals("Csv")) {
generatePdf(list, COLUMNs,out);
}
}catch(Exception ex) {
System.out.println("Error occurred:"+ ex);
}
return new ByteArrayInputStream(out.toByteArray());
}
public static final String[] getColumnsNameFromListOfObject(List<?> objects) {
String strObjects=new Gson().toJson(objects.get(0)).toString();
String[] setHeader= strObjects.substring(1, strObjects.length()-1).split(",");
String header="SrNo";
for(int i=0;i<setHeader.length;i++) {
String str=setHeader[i].toString().split(":")[0].toString();
header=header+","+str.substring(1, str.length()-1);
}
return header.split(",");
}
public static final void generateExcel(List<HashMap<Object, Object>> list, String[] COLUMNs,ByteArrayOutputStream out) throws IOException {
Workbook workbook = new XSSFWorkbook();
Sheet sheet = workbook.createSheet("Excelshit");
Font headerFont = workbook.createFont();
headerFont.setBold(true);
headerFont.setColor(IndexedColors.BLUE.getIndex());
CellStyle headerCellStyle = workbook.createCellStyle();
headerCellStyle.setFont(headerFont);
Row headerRow = sheet.createRow(0);
for (int col = 0; col < COLUMNs.length; col++) {
Cell cell = headerRow.createCell(col);
cell.setCellValue(COLUMNs[col]);
cell.setCellStyle(headerCellStyle);
}
int rowIdx = 1;
for(int k = 0; k < list.size(); k++){
Row row =sheet.createRow(rowIdx++);
for (Map.Entry<Object, Object> entry : list.get(k).entrySet()){
Object key = entry.getKey();
Object value = entry.getValue();
for (int col = 0; col < COLUMNs.length; col++) {
if(key.toString().equals(COLUMNs[col].toString())) {
row.createCell(col).setCellValue(value.toString());
}
}
}
}
workbook.write(out);
System.out.println(workbook);
}
private static final void generatePdf(List<HashMap<Object, Object>> list, String[] COLUMNs,ByteArrayOutputStream out) throws DocumentException {
Document document = new Document();
com.itextpdf.text.Font headerFont =FontFactory.getFont(FontFactory.HELVETICA_BOLD,8);
com.itextpdf.text.Font dataFont =FontFactory.getFont(FontFactory.TIMES_ROMAN,8);
PdfPCell hcell=null;
PdfPTable table = new PdfPTable(COLUMNs.length);
for (int col = 0; col < COLUMNs.length; col++) {
hcell = new PdfPCell(new Phrase(COLUMNs[col], headerFont));
hcell.setHorizontalAlignment(Element.ALIGN_CENTER);
table.addCell(hcell);
}
for(int index = 0; index < list.size(); index++){
PdfPCell cell = null;
for (Map.Entry<Object, Object> entry : list.get(index).entrySet()){
Object key = entry.getKey();
Object value = entry.getValue();
for (int col = 0; col < COLUMNs.length; col++) {
if(key.toString().equals(COLUMNs[col].toString())) {
cell = new PdfPCell(new Phrase(value.toString(),dataFont));
cell.setVerticalAlignment(Element.ALIGN_MIDDLE);
cell.setHorizontalAlignment(Element.ALIGN_CENTER);
}
}
table.addCell(cell);
}
}
PdfWriter.getInstance(document, out);
document.open();
document.add(table);
document.close();
}
public List<TblDepartment> getData() {
List<TblDepartment> list = new ArrayList<>();
departmentRepository.findAll().forEach(list::add);
return list;
}

Manipulate paths, color etc. in iText

I need to analyze path data of PDF files and manipulate content with iText 7. Manipulations include deletion/replacemant and coloring.
I can analyze the graphics alright with something like the following code:
public class ContentParsing {
public static void main(String[] args) throws IOException {
new ContentParsing().inspectPdf("testdata/test.pdf");
}
public void inspectPdf(String path) throws IOException {
File file = new File(path);
PdfDocument pdf = new PdfDocument(new PdfReader(file.getAbsolutePath()));
PdfDocumentContentParser parser = new PdfDocumentContentParser(pdf);
for (int i=1; i<=pdf.getNumberOfPages(); i++) {
parser.processContent(i, new PathEventListener());
}
pdf.close();
}
}
public class PathEventListener implements IEventListener {
public void eventOccurred(IEventData eventData, EventType eventType) {
PathRenderInfo pathRenderInfo = (PathRenderInfo) eventData;
for ( Subpath subpath : pathRenderInfo.getPath().getSubpaths() ) {
for ( IShape segment : subpath.getSegments() ) {
// Here goes some path analysis code
System.out.println(segment.getBasePoints());
}
}
}
public Set<EventType> getSupportedEvents() {
Set<EventType> supportedEvents = new HashSet<EventType>();
supportedEvents.add(EventType.RENDER_PATH);
return supportedEvents;
}
}
Now, what's the way to go with manipulating things and writing them back to the PDF? Do I have to construct an entirely new PDF document and copy everything over (in manipulated form), or can I somehow manipulate the read PDF data directly?
Now, what's the way to go with manipulating things and writing them back to the PDF? Do I have to construct an entirely new PDF document and copy everything over (in manipulated form), or can I somehow manipulate the read PDF data directly?
In essence you are looking for a class which is not merely parsing a PDF content stream and signaling the instructions in it like the PdfCanvasProcessor (the PdfDocumentContentParser you use is merely a very thin wrapper for PdfCanvasProcessor) but which also creates the content stream anew with the instructions you forward back to it.
A generic content stream editor class
For iText 5.5.x a proof-of-concept for such a content stream editor class can be found in this answer (the Java version is a bit further down in the answer text).
This is a port of that proof-of-concept to iText 7:
public class PdfCanvasEditor extends PdfCanvasProcessor
{
/**
* This method edits the immediate contents of a page, i.e. its content stream.
* It explicitly does not descent into form xobjects, patterns, or annotations.
*/
public void editPage(PdfDocument pdfDocument, int pageNumber) throws IOException
{
if ((pdfDocument.getReader() == null) || (pdfDocument.getWriter() == null))
{
throw new PdfException("PdfDocument must be opened in stamping mode.");
}
PdfPage page = pdfDocument.getPage(pageNumber);
PdfResources pdfResources = page.getResources();
PdfCanvas pdfCanvas = new PdfCanvas(new PdfStream(), pdfResources, pdfDocument);
editContent(page.getContentBytes(), pdfResources, pdfCanvas);
page.put(PdfName.Contents, pdfCanvas.getContentStream());
}
/**
* This method processes the content bytes and outputs to the given canvas.
* It explicitly does not descent into form xobjects, patterns, or annotations.
*/
public void editContent(byte[] contentBytes, PdfResources resources, PdfCanvas canvas)
{
this.canvas = canvas;
processContent(contentBytes, resources);
this.canvas = null;
}
/**
* <p>
* This method writes content stream operations to the target canvas. The default
* implementation writes them as they come, so it essentially generates identical
* copies of the original instructions the {#link ContentOperatorWrapper} instances
* forward to it.
* </p>
* <p>
* Override this method to achieve some fancy editing effect.
* </p>
*/
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
PdfOutputStream pdfOutputStream = canvas.getContentStream().getOutputStream();
int index = 0;
for (PdfObject object : operands)
{
pdfOutputStream.write(object);
if (operands.size() > ++index)
pdfOutputStream.writeSpace();
else
pdfOutputStream.writeNewLine();
}
}
//
// constructor giving the parent a dummy listener to talk to
//
public PdfCanvasEditor()
{
super(new DummyEventListener());
}
//
// Overrides of PdfContentStreamProcessor methods
//
#Override
public IContentOperator registerContentOperator(String operatorString, IContentOperator operator)
{
ContentOperatorWrapper wrapper = new ContentOperatorWrapper();
wrapper.setOriginalOperator(operator);
IContentOperator formerOperator = super.registerContentOperator(operatorString, wrapper);
return formerOperator instanceof ContentOperatorWrapper ? ((ContentOperatorWrapper)formerOperator).getOriginalOperator() : formerOperator;
}
//
// members holding the output canvas and the resources
//
protected PdfCanvas canvas = null;
//
// A content operator class to wrap all content operators to forward the invocation to the editor
//
class ContentOperatorWrapper implements IContentOperator
{
public IContentOperator getOriginalOperator()
{
return originalOperator;
}
public void setOriginalOperator(IContentOperator originalOperator)
{
this.originalOperator = originalOperator;
}
#Override
public void invoke(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
if (originalOperator != null && !"Do".equals(operator.toString()))
{
originalOperator.invoke(processor, operator, operands);
}
write(processor, operator, operands);
}
private IContentOperator originalOperator = null;
}
//
// A dummy event listener to give to the underlying canvas processor to feed events to
//
static class DummyEventListener implements IEventListener
{
#Override
public void eventOccurred(IEventData data, EventType type)
{ }
#Override
public Set<EventType> getSupportedEvents()
{
return null;
}
}
}
(PdfCanvasEditor.java)
The explanations from the iText 5 answer still apply, the parsing framework has not changed much from iText 5.5.x to iText 7.0.x.
Usage examples
Unfortunately you wrote in very vague terms about how exactly you want to change the contents. Thus I simply ported some iText 5 samples which made use of the original iText 5 content stream editor class:
Watermark removal
These are ports of the use cases in this answer.
testRemoveBoldMTTextDocument
This example drops all text written in a font the name of which ends with "BoldMT":
try ( InputStream resource = getClass().getResourceAsStream("document.pdf");
PdfReader pdfReader = new PdfReader(resource);
OutputStream result = new FileOutputStream(new File(RESULT_FOLDER, "document-noBoldMTText.pdf"));
PdfWriter pdfWriter = new PdfWriter(result);
PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
PdfCanvasEditor editor = new PdfCanvasEditor()
{
#Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
String operatorString = operator.toString();
if (TEXT_SHOWING_OPERATORS.contains(operatorString))
{
if (getGraphicsState().getFont().getFontProgram().getFontNames().getFontName().endsWith("BoldMT"))
return;
}
super.write(processor, operator, operands);
}
final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
};
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
{
editor.editPage(pdfDocument, i);
}
}
(EditPageContent.java test method testRemoveBoldMTTextDocument)
testRemoveBigTextDocument
This example drops all text written with a large font size:
try ( InputStream resource = getClass().getResourceAsStream("document.pdf");
PdfReader pdfReader = new PdfReader(resource);
OutputStream result = new FileOutputStream(new File(RESULT_FOLDER, "document-noBigText.pdf"));
PdfWriter pdfWriter = new PdfWriter(result);
PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
PdfCanvasEditor editor = new PdfCanvasEditor()
{
#Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
String operatorString = operator.toString();
if (TEXT_SHOWING_OPERATORS.contains(operatorString))
{
if (getGraphicsState().getFontSize() > 100)
return;
}
super.write(processor, operator, operands);
}
final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
};
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
{
editor.editPage(pdfDocument, i);
}
}
(EditPageContent.java test method testRemoveBigTextDocument)
Text color change
This is a port of the use case in this answer.
testChangeBlackTextToGreenDocument
This example changes the color of black text to green.
try ( InputStream resource = getClass().getResourceAsStream("document.pdf");
PdfReader pdfReader = new PdfReader(resource);
OutputStream result = new FileOutputStream(new File(RESULT_FOLDER, "document-blackTextToGreen.pdf"));
PdfWriter pdfWriter = new PdfWriter(result);
PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
PdfCanvasEditor editor = new PdfCanvasEditor()
{
#Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
String operatorString = operator.toString();
if (TEXT_SHOWING_OPERATORS.contains(operatorString))
{
if (currentlyReplacedBlack == null)
{
Color currentFillColor = getGraphicsState().getFillColor();
if (Color.BLACK.equals(currentFillColor))
{
currentlyReplacedBlack = currentFillColor;
super.write(processor, new PdfLiteral("rg"), Arrays.asList(new PdfNumber(0), new PdfNumber(1), new PdfNumber(0), new PdfLiteral("rg")));
}
}
}
else if (currentlyReplacedBlack != null)
{
if (currentlyReplacedBlack instanceof DeviceCmyk)
{
super.write(processor, new PdfLiteral("k"), Arrays.asList(new PdfNumber(0), new PdfNumber(0), new PdfNumber(0), new PdfNumber(1), new PdfLiteral("k")));
}
else if (currentlyReplacedBlack instanceof DeviceGray)
{
super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
}
else
{
super.write(processor, new PdfLiteral("rg"), Arrays.asList(new PdfNumber(0), new PdfNumber(0), new PdfNumber(0), new PdfLiteral("rg")));
}
currentlyReplacedBlack = null;
}
super.write(processor, operator, operands);
}
Color currentlyReplacedBlack = null;
final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
};
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
{
editor.editPage(pdfDocument, i);
}
}
(EditPageContent.java test method testChangeBlackTextToGreenDocument)

Copying fields in iTextSharp 5.4.5.0

I was under the impression that it is now possible to copy AcroFields using PdfCopy. In the release notes for iText 5.4.4.0 this is listed as possible now. However, when I try to do so it appears all the annotations (I think I am using that term correctly, still fairly new to iText...) for the fields are stripped out. It looks like the fields are there (meaning I can see the blue boxes that indicate an editable field), but they are not editable. If I try to bring the PDF up in Acrobat I get a message saying that "there are no fields, would you like Acrobat to discover them?" and most are found and marked and fields properly (check boxes aren't, but the text fields are).
I assume there is an additional step somewhere along the lines to re-add the annotations to the PdfCopy object, but I do not see a way to get the annotations from the PdfReader. I also cannot seem to find any documentation on how to do this (since AcroFields were for so long not supported in PdfCopy most of what I find is along that vein).
Due to sensitivity I cannot provide a copy of the PDF's in question, but using an altered version of a test program used earlier you can see the issue with the following code. It should generate a table with some check boxes in the four right columns. If I use the exact same code with PdfCopyFields in the MergePdfs method instead of PdfCopy it works as expected. This code does not produce any text fields, but in my main project they are part of the original parent PDF that is used as a template.
(Sorry for the long example, it has been cherry picked from a much larger application. You will need a PDF with a field named "TableStartPosition" somewhere in it and update RunTest with the correct paths for your local machine to get this to work.)
Has the PdfCopy functionality not made it into iTextSharp yet? I am using version 5.4.5.0.
class Program
{
Stream _pdfTemplateStream;
MemoryStream _pdfResultStream;
PdfReader _pdfTemplateReader;
PdfStamper _pdfResultStamper;
static void Main(string[] args)
{
Program p = new Program();
try
{
p.RunTest();
}
catch (Exception f)
{
Console.WriteLine(f.Message);
Console.ReadLine();
}
}
internal void RunTest()
{
FileStream fs = File.OpenRead(#"C:\temp\a\RenameFieldTest\RenameFieldTest\Library\CoverPage.pdf");
_pdfTemplateStream = fs;
_pdfResultStream = new MemoryStream();
//PDFTemplateStream = new FileStream(_templatePath, FileMode.Open);
_pdfTemplateReader = new PdfReader(_pdfTemplateStream);
_pdfResultStamper = new PdfStamper(_pdfTemplateReader, _pdfResultStream);
#region setup objects
List<CustomCategory> Categories = new List<CustomCategory>();
CustomCategory c1 = new CustomCategory();
c1.CategorySizesInUse.Add(CustomCategory.AvailableSizes[1]);
c1.CategorySizesInUse.Add(CustomCategory.AvailableSizes[2]);
Categories.Add(c1);
CustomCategory c2 = new CustomCategory();
c2.CategorySizesInUse.Add(CustomCategory.AvailableSizes[0]);
c2.CategorySizesInUse.Add(CustomCategory.AvailableSizes[1]);
Categories.Add(c2);
List<CustomObject> Items = new List<CustomObject>();
CustomObject co1 = new CustomObject();
co1.Category = c1;
co1.Title = "Object 1";
Items.Add(co1);
CustomObject co2 = new CustomObject();
co2.Category = c2;
co2.Title = "Object 2";
Items.Add(co2);
#endregion
FillCoverPage(Items);
_pdfResultStamper.Close();
_pdfTemplateReader.Close();
List<MemoryStream> pdfStreams = new List<MemoryStream>();
pdfStreams.Add(new MemoryStream(_pdfResultStream.ToArray()));
MergePdfs(#"C:\temp\a\RenameFieldTest\RenameFieldTest\Library\Outfile.pdf", pdfStreams);
_pdfResultStream.Dispose();
_pdfTemplateStream.Dispose();
}
internal void FillCoverPage(List<CustomObject> Items)
{
//Before we start we need to figure out where to start adding the table
var fieldPositions = _pdfResultStamper.AcroFields.GetFieldPositions("TableStartPosition");
if (fieldPositions == null)
{ throw new Exception("Could not find the TableStartPosition field. Unable to determine point of origin for the table!"); }
_pdfResultStamper.AcroFields.RemoveField("TableStartPosition");
var fieldPosition = fieldPositions[0];
// Get the position of the field
var targetPosition = fieldPosition.position;
//First, get all the available card sizes
List<string> availableSizes = CustomCategory.AvailableSizes;
//Generate a table with the number of available card sizes + 1 for the device name
PdfPTable table = new PdfPTable(availableSizes.Count + 1);
float[] columnWidth = new float[availableSizes.Count + 1];
for (int y = 0; y < columnWidth.Length; y++)
{
if (y == 0)
{ columnWidth[y] = 320; }
else
{ columnWidth[y] = 120; }
}
table.SetTotalWidth(columnWidth);
table.WidthPercentage = 100;
PdfContentByte canvas;
List<PdfFormField> checkboxes = new List<PdfFormField>();
//Build the header row
table.Rows.Add(new PdfPRow(this.GetTableHeaderRow(availableSizes)));
//Insert the global check boxes
PdfPCell[] globalRow = new PdfPCell[availableSizes.Count + 1];
Phrase tPhrase = new Phrase("Select/Unselect All");
PdfPCell tCell = new PdfPCell();
tCell.BackgroundColor = BaseColor.LIGHT_GRAY;
tCell.AddElement(tPhrase);
globalRow[0] = tCell;
for (int x = 0; x < availableSizes.Count; x++)
{
tCell = new PdfPCell();
tCell.BackgroundColor = BaseColor.LIGHT_GRAY;
PdfFormField f = PdfFormField.CreateCheckBox(_pdfResultStamper.Writer);
string fieldName = string.Format("InkSaver.Global.chk{0}", availableSizes[x].Replace(".", ""));
//f.FieldName = fieldName;
string js = string.Format("hideAll(event.target, '{0}');", availableSizes[x].Replace(".", ""));
f.Action = PdfAction.JavaScript(js, _pdfResultStamper.Writer);
tCell.CellEvent = new ChildFieldEvent(_pdfResultStamper.Writer, f, fieldName);
globalRow[x + 1] = tCell;
checkboxes.Add(f);
}
table.Rows.Add(new PdfPRow(globalRow));
int status = 0;
int pageNum = 1;
for (int itemIndex = 0; itemIndex < Items.Count; itemIndex++)
{
tCell = new PdfPCell();
Phrase p = new Phrase(Items[itemIndex].Title);
tCell.AddElement(p);
tCell.HorizontalAlignment = Element.ALIGN_LEFT;
PdfPCell[] cells = new PdfPCell[availableSizes.Count + 1];
cells[0] = tCell;
for (int availCardSizeIndex = 0; availCardSizeIndex < availableSizes.Count; availCardSizeIndex++)
{
if (Items[itemIndex].Category.CategorySizesInUse.Contains(availableSizes[availCardSizeIndex]))
{
string str = availableSizes[availCardSizeIndex];
tCell = new PdfPCell();
tCell.PaddingLeft = 10f;
tCell.PaddingRight = 10f;
cells[availCardSizeIndex + 1] = tCell;
cells[availCardSizeIndex].HorizontalAlignment = Element.ALIGN_CENTER;
PdfFormField f = PdfFormField.CreateCheckBox(_pdfResultStamper.Writer);
string fieldName = string.Format("InkSaver.chk{0}.{1}", availableSizes[availCardSizeIndex].Replace(".", ""), itemIndex + 1);
//f.FieldName = fieldName; <-- This causes the checkbox to be double-named (i.e. InkSaver.Global.chk0.InkSaver.Global.chk0
string js = string.Format("hideCardSize(event.target, {0}, '{1}');", itemIndex + 1, availableSizes[availCardSizeIndex]);
f.Action = PdfAction.JavaScript(js, _pdfResultStamper.Writer);
tCell.CellEvent = new ChildFieldEvent(_pdfResultStamper.Writer, f, fieldName);
checkboxes.Add(f);
}
else
{
//Add a blank cell
tCell = new PdfPCell();
cells[availCardSizeIndex + 1] = tCell;
}
}
//Test if the column text will fit
table.Rows.Add(new PdfPRow(cells));
canvas = _pdfResultStamper.GetUnderContent(pageNum);
ColumnText ct2 = new ColumnText(canvas);
ct2.AddElement(new PdfPTable(table));
ct2.Alignment = Element.ALIGN_LEFT;
ct2.SetSimpleColumn(targetPosition.Left, 0, targetPosition.Right, targetPosition.Top, 0, 0);
status = ct2.Go(true);
if ((status != ColumnText.NO_MORE_TEXT) || (itemIndex == (Items.Count - 1)))
{
ColumnText ct3 = new ColumnText(canvas);
ct3.AddElement(table);
ct3.Alignment = Element.ALIGN_LEFT;
ct3.SetSimpleColumn(targetPosition.Left, 0, targetPosition.Right, targetPosition.Top, 0, 0);
ct3.Go();
foreach (PdfFormField f in checkboxes)
{
_pdfResultStamper.AddAnnotation(f, pageNum);
}
checkboxes.Clear();
if (itemIndex < (Items.Count - 1))
{
pageNum++;
_pdfResultStamper.InsertPage(pageNum, _pdfTemplateReader.GetPageSize(1));
table = new PdfPTable(availableSizes.Count + 1);
table.SetTotalWidth(columnWidth);
table.WidthPercentage = 100;
table.Rows.Add(new PdfPRow(this.GetTableHeaderRow(availableSizes)));
}
}
}
}
private PdfPCell[] GetTableHeaderRow(List<string> AvailableSizes)
{
PdfPCell[] sizeHeaders = new PdfPCell[AvailableSizes.Count + 1];
Phrase devName = new Phrase("Device Name");
PdfPCell deviceHeader = new PdfPCell(devName);
deviceHeader.HorizontalAlignment = Element.ALIGN_CENTER;
deviceHeader.BackgroundColor = BaseColor.GRAY;
sizeHeaders[0] = deviceHeader;
for (int x = 0; x < AvailableSizes.Count; x++)
{
PdfPCell hCell = new PdfPCell(new Phrase(AvailableSizes[x]));
hCell.HorizontalAlignment = Element.ALIGN_CENTER;
hCell.BackgroundColor = BaseColor.GRAY;
sizeHeaders[x + 1] = hCell;
}
return sizeHeaders;
}
public void MergePdfs(string filePath, List<MemoryStream> pdfStreams)
{
//Create output stream
FileStream outStream = new FileStream(filePath, FileMode.Create);
Document document = null;
if (pdfStreams.Count > 0)
{
try
{
int PageCounter = 0;
//Create Main reader
PdfReader reader = new PdfReader(pdfStreams[0]);
PageCounter = reader.NumberOfPages;//This is if we have multiple pages in the cover page, we need to adjust the offset.
//rename fields in the PDF. This is required because PDF's cannot have more than one field with the same name
RenameFields(reader, PageCounter++);
//Create Main Doc
document = new Document(reader.GetPageSizeWithRotation(1));
//Create main writer
PdfCopy Writer = new PdfCopy(document, outStream);
//PdfCopyFields Writer = new PdfCopyFields(outStream);
//Open document for writing
document.Open();
////Add pages
Writer.AddDocument(reader);
//For each additional pdf after first combine them into main document
foreach (var PdfStream in pdfStreams.Skip(1))
{
PdfReader reader2 = new PdfReader(PdfStream);
//rename PDF fields
RenameFields(reader2, PageCounter++);
// Add content
Writer.AddDocument(reader);
}
//Writer.AddJavaScript(PostProcessing.GetSuperscriptJavaScript());
Writer.Close();
}
catch (Exception ex)
{
Console.WriteLine(ex.ToString());
}
finally
{
if (document != null)
document.Close();
foreach (var Strm in pdfStreams)
{
try { if (null != Strm) Strm.Dispose(); }
catch { }
}
//pdfStamper.Close();
outStream.Close();
}
}
}
private void RenameFields(PdfReader reader, int PageNum)
{
int tempPageNum = 1;
//rename all fields
foreach (string field in reader.AcroFields.Fields.Keys)
{
if (((reader.AcroFields.GetFieldType(field) == 1) || (reader.AcroFields.GetFieldType(field) == 2)) && (field.StartsWith("InkSaver")))
{
//This is a InkSaver button, set the name so its subclassed
string classPath;
if (reader.AcroFields.GetFieldType(field) == 2)
{
classPath = field.Substring(0, field.LastIndexOf("."));
if (field.StartsWith("InkSaver.chk"))
{
int a = field.LastIndexOf(".");
string sub = field.Substring(a + 1, (field.Length - a - 1));
int pageNum = int.Parse(sub);
int realPageNum = pageNum + tempPageNum;//PostProcessing.Instance.CoverPageLength;
PageNum = realPageNum;
}
}
else
{
classPath = field.Substring(0, field.LastIndexOf("."));
}
string newID = classPath + ".page" + PageNum.ToString();
bool ret = reader.AcroFields.RenameField(field, newID);
}
else
{
reader.AcroFields.RenameField(field, field + "_" + PageNum.ToString());// field + Guid.NewGuid().ToString("N"));
}
}
}
}
public class ChildFieldEvent : IPdfPCellEvent
{
protected PdfWriter writer;
protected PdfFormField parent;
protected string checkBoxName;
internal ChildFieldEvent(PdfWriter writer, PdfFormField parent, string CheckBoxName)
{
this.writer = writer;
this.parent = parent;
this.checkBoxName = CheckBoxName;
}
public void CellLayout(PdfPCell cell, Rectangle rect, PdfContentByte[] cb)
{
createCheckboxField(rect);
}
private void createCheckboxField(Rectangle rect)
{
RadioCheckField bt = new RadioCheckField(this.writer, rect, this.checkBoxName, "Yes");
bt.CheckType = RadioCheckField.TYPE_SQUARE;
bt.Checked = true;
this.parent.AddKid(bt.CheckField);
}
}
internal class CustomCategory
{
internal static List<string> AvailableSizes
{
get
{
List<string> retVal = new List<string>();
retVal.Add("1");
retVal.Add("2");
retVal.Add("3");
retVal.Add("4");
return retVal;
}
}
internal CustomCategory()
{
CategorySizesInUse = new List<string>();
}
internal List<string> CategorySizesInUse { get; set; }
}
internal class CustomObject
{
internal string Title { get; set; }
internal CustomCategory Category { get;set; }
}
Please take a look at the MergeForms example. Your example is too long for me to read, but at first sight, I'm missing the following line:
copy.setMergeFields();
By the way, in MergeForms2, the fields are also renamed before the form is merged.

Extract images using iTextSharp

I have been using this code with great success to pull out the first image found in each page of a PDF. However, it is now not working with some new PDFs for an uknown reason. I have used other tools (Datalogics, etc) that do pull out the images fine with these new PDFs. However, I do not want to buy Datalogics or any tool if I can use iTextSharp. Can anybody tell me why this code is not finding the images in the PDF?
Knowns: my PDFs only have 1 image per page and nothing else.
using iTextSharp.text;
using iTextSharp.text.pdf;
...
public static void ExtractImagesFromPDF(string sourcePdf, string outputPath)
{
// NOTE: This will only get the first image it finds per page.
PdfReader pdf = new PdfReader(sourcePdf);
RandomAccessFileOrArray raf = new iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf);
try
{
for (int pageNumber = 1; pageNumber <= pdf.NumberOfPages; pageNumber++)
{
PdfDictionary pg = pdf.GetPageN(pageNumber);
PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
PdfDictionary xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
if (xobj != null)
{
foreach (PdfName name in xobj.Keys)
{
PdfObject obj = xobj.Get(name);
if (obj.IsIndirect())
{
PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
PdfName type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
if (PdfName.IMAGE.Equals(type))
{
int XrefIndex = Convert.ToInt32(((PRIndirectReference)obj).Number.ToString(System.Globalization.CultureInfo.InvariantCulture));
PdfObject pdfObj = pdf.GetPdfObject(XrefIndex);
PdfStream pdfStrem = (PdfStream)pdfObj;
byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)pdfStrem);
if ((bytes != null))
{
using (System.IO.MemoryStream memStream = new System.IO.MemoryStream(bytes))
{
memStream.Position = 0;
System.Drawing.Image img = System.Drawing.Image.FromStream(memStream);
// must save the file while stream is open.
if (!Directory.Exists(outputPath))
Directory.CreateDirectory(outputPath);
string path = Path.Combine(outputPath, String.Format(#"{0}.jpg", pageNumber));
System.Drawing.Imaging.EncoderParameters parms = new System.Drawing.Imaging.EncoderParameters(1);
parms.Param[0] = new System.Drawing.Imaging.EncoderParameter(System.Drawing.Imaging.Encoder.Compression, 0);
System.Drawing.Imaging.ImageCodecInfo jpegEncoder = Utilities.GetImageEncoder("JPEG");
img.Save(path, jpegEncoder, parms);
break;
}
}
}
}
}
}
}
}
catch
{
throw;
}
finally
{
pdf.Close();
raf.Close();
}
}
I found that my problem was that I was not recursively searching inside of forms and groups for images. Basically, the original code would only find images that were embedded at the root of the pdf document. Here is the revised method plus a new method (FindImageInPDFDictionary) that recursively searches for images in the page. NOTE: the flaws of only supporting JPEG and non-compressed images still applies. See R Ubben's code for options to fix those flaws. HTH someone.
public static void ExtractImagesFromPDF(string sourcePdf, string outputPath)
{
// NOTE: This will only get the first image it finds per page.
PdfReader pdf = new PdfReader(sourcePdf);
RandomAccessFileOrArray raf = new iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf);
try
{
for (int pageNumber = 1; pageNumber <= pdf.NumberOfPages; pageNumber++)
{
PdfDictionary pg = pdf.GetPageN(pageNumber);
// recursively search pages, forms and groups for images.
PdfObject obj = FindImageInPDFDictionary(pg);
if (obj != null)
{
int XrefIndex = Convert.ToInt32(((PRIndirectReference)obj).Number.ToString(System.Globalization.CultureInfo.InvariantCulture));
PdfObject pdfObj = pdf.GetPdfObject(XrefIndex);
PdfStream pdfStrem = (PdfStream)pdfObj;
byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)pdfStrem);
if ((bytes != null))
{
using (System.IO.MemoryStream memStream = new System.IO.MemoryStream(bytes))
{
memStream.Position = 0;
System.Drawing.Image img = System.Drawing.Image.FromStream(memStream);
// must save the file while stream is open.
if (!Directory.Exists(outputPath))
Directory.CreateDirectory(outputPath);
string path = Path.Combine(outputPath, String.Format(#"{0}.jpg", pageNumber));
System.Drawing.Imaging.EncoderParameters parms = new System.Drawing.Imaging.EncoderParameters(1);
parms.Param[0] = new System.Drawing.Imaging.EncoderParameter(System.Drawing.Imaging.Encoder.Compression, 0);
System.Drawing.Imaging.ImageCodecInfo jpegEncoder = Utilities.GetImageEncoder("JPEG");
img.Save(path, jpegEncoder, parms);
}
}
}
}
}
catch
{
throw;
}
finally
{
pdf.Close();
raf.Close();
}
}
private static PdfObject FindImageInPDFDictionary(PdfDictionary pg)
{
PdfDictionary res =
(PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
PdfDictionary xobj =
(PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
if (xobj != null)
{
foreach (PdfName name in xobj.Keys)
{
PdfObject obj = xobj.Get(name);
if (obj.IsIndirect())
{
PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
PdfName type =
(PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
//image at the root of the pdf
if (PdfName.IMAGE.Equals(type))
{
return obj;
}// image inside a form
else if (PdfName.FORM.Equals(type))
{
return FindImageInPDFDictionary(tg);
} //image inside a group
else if (PdfName.GROUP.Equals(type))
{
return FindImageInPDFDictionary(tg);
}
}
}
}
return null;
}
Here is a simpler solution:
iTextSharp.text.pdf.parser.PdfImageObject pdfImage =
new iTextSharp.text.pdf.parser.PdfImageObject(imgPRStream);
System.Drawing.Image img = pdfImage.GetDrawingImage();
The following code incorporates all of Dave and R Ubben's ideas above, plus it returns a full list of all the images and also deals with multiple bit depths. I had to convert it to VB for the project I'm working on though, sorry about that...
Private Sub getAllImages(ByVal dict As pdf.PdfDictionary, ByVal images As List(Of Byte()), ByVal doc As pdf.PdfReader)
Dim res As pdf.PdfDictionary = CType(pdf.PdfReader.GetPdfObject(dict.Get(pdf.PdfName.RESOURCES)), pdf.PdfDictionary)
Dim xobj As pdf.PdfDictionary = CType(pdf.PdfReader.GetPdfObject(res.Get(pdf.PdfName.XOBJECT)), pdf.PdfDictionary)
If xobj IsNot Nothing Then
For Each name As pdf.PdfName In xobj.Keys
Dim obj As pdf.PdfObject = xobj.Get(name)
If (obj.IsIndirect) Then
Dim tg As pdf.PdfDictionary = CType(pdf.PdfReader.GetPdfObject(obj), pdf.PdfDictionary)
Dim subtype As pdf.PdfName = CType(pdf.PdfReader.GetPdfObject(tg.Get(pdf.PdfName.SUBTYPE)), pdf.PdfName)
If pdf.PdfName.IMAGE.Equals(subtype) Then
Dim xrefIdx As Integer = CType(obj, pdf.PRIndirectReference).Number
Dim pdfObj As pdf.PdfObject = doc.GetPdfObject(xrefIdx)
Dim str As pdf.PdfStream = CType(pdfObj, pdf.PdfStream)
Dim bytes As Byte() = pdf.PdfReader.GetStreamBytesRaw(CType(str, pdf.PRStream))
Dim filter As String = tg.Get(pdf.PdfName.FILTER).ToString
Dim width As String = tg.Get(pdf.PdfName.WIDTH).ToString
Dim height As String = tg.Get(pdf.PdfName.HEIGHT).ToString
Dim bpp As String = tg.Get(pdf.PdfName.BITSPERCOMPONENT).ToString
If filter = "/FlateDecode" Then
bytes = pdf.PdfReader.FlateDecode(bytes, True)
Dim pixelFormat As System.Drawing.Imaging.PixelFormat
Select Case Integer.Parse(bpp)
Case 1
pixelFormat = Drawing.Imaging.PixelFormat.Format1bppIndexed
Case 24
pixelFormat = Drawing.Imaging.PixelFormat.Format24bppRgb
Case Else
Throw New Exception("Unknown pixel format " + bpp)
End Select
Dim bmp As New System.Drawing.Bitmap(Int32.Parse(width), Int32.Parse(height), pixelFormat)
Dim bmd As System.Drawing.Imaging.BitmapData = bmp.LockBits(New System.Drawing.Rectangle(0, 0, Int32.Parse(width), Int32.Parse(height)), System.Drawing.Imaging.ImageLockMode.WriteOnly, pixelFormat)
Marshal.Copy(bytes, 0, bmd.Scan0, bytes.Length)
bmp.UnlockBits(bmd)
Using ms As New MemoryStream
bmp.Save(ms, System.Drawing.Imaging.ImageFormat.Png)
bytes = ms.GetBuffer
End Using
End If
images.Add(bytes)
ElseIf pdf.PdfName.FORM.Equals(subtype) Or pdf.PdfName.GROUP.Equals(subtype) Then
getAllImages(tg, images, doc)
End If
End If
Next
End If
End Sub
This is just another rehash of others' ideas, but the one that worked for me. Here I use #Malco's image grabbing snippet with R Ubben's looping:
private IList<System.Drawing.Image> GetImagesFromPdfDict(PdfDictionary dict, PdfReader doc)
{
List<System.Drawing.Image> images = new List<System.Drawing.Image>();
PdfDictionary res = (PdfDictionary)(PdfReader.GetPdfObject(dict.Get(PdfName.RESOURCES)));
PdfDictionary xobj = (PdfDictionary)(PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT)));
if (xobj != null)
{
foreach (PdfName name in xobj.Keys)
{
PdfObject obj = xobj.Get(name);
if (obj.IsIndirect())
{
PdfDictionary tg = (PdfDictionary)(PdfReader.GetPdfObject(obj));
PdfName subtype = (PdfName)(PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE)));
if (PdfName.IMAGE.Equals(subtype))
{
int xrefIdx = ((PRIndirectReference)obj).Number;
PdfObject pdfObj = doc.GetPdfObject(xrefIdx);
PdfStream str = (PdfStream)(pdfObj);
iTextSharp.text.pdf.parser.PdfImageObject pdfImage =
new iTextSharp.text.pdf.parser.PdfImageObject((PRStream)str);
System.Drawing.Image img = pdfImage.GetDrawingImage();
images.Add(img);
}
else if (PdfName.FORM.Equals(subtype) || PdfName.GROUP.Equals(subtype))
{
images.AddRange(GetImagesFromPdfDict(tg, doc));
}
}
}
}
return images;
}
De c# version:
private IList<System.Drawing.Image> GetImagesFromPdfDict(PdfDictionary dict, PdfReader doc){
List<System.Drawing.Image> images = new List<System.Drawing.Image>();
PdfDictionary res = (PdfDictionary)(PdfReader.GetPdfObject(dict.Get(PdfName.RESOURCES)));
PdfDictionary xobj = (PdfDictionary)(PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT)));
if (xobj != null)
{
foreach (PdfName name in xobj.Keys)
{
PdfObject obj = xobj.Get(name);
if (obj.IsIndirect())
{
PdfDictionary tg = (PdfDictionary)(PdfReader.GetPdfObject(obj));
pdf.PdfName subtype = (pdf.PdfName)(pdf.PdfReader.GetPdfObject(tg.Get(pdf.PdfName.SUBTYPE)));
if (pdf.PdfName.IMAGE.Equals(subtype))
{
int xrefIdx = ((pdf.PRIndirectReference)obj).Number;
pdf.PdfObject pdfObj = doc.GetPdfObject(xrefIdx);
pdf.PdfStream str = (pdf.PdfStream)(pdfObj);
byte[] bytes = pdf.PdfReader.GetStreamBytesRaw((pdf.PRStream)str);
string filter = tg.Get(pdf.PdfName.FILTER).ToString();
string width = tg.Get(pdf.PdfName.WIDTH).ToString();
string height = tg.Get(pdf.PdfName.HEIGHT).ToString();
string bpp = tg.Get(pdf.PdfName.BITSPERCOMPONENT).ToString();
if (filter == "/FlateDecode")
{
bytes = pdf.PdfReader.FlateDecode(bytes, true);
System.Drawing.Imaging.PixelFormat pixelFormat;
switch (int.Parse(bpp))
{
case 1:
pixelFormat = System.Drawing.Imaging.PixelFormat.Format1bppIndexed;
break;
case 24:
pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb;
break;
default:
throw new Exception("Unknown pixel format " + bpp);
}
var bmp = new System.Drawing.Bitmap(Int32.Parse(width), Int32.Parse(height), pixelFormat);
System.Drawing.Imaging.BitmapData bmd = bmp.LockBits(new System.Drawing.Rectangle(0, 0, Int32.Parse(width),
Int32.Parse(height)), System.Drawing.Imaging.ImageLockMode.WriteOnly, pixelFormat);
Marshal.Copy(bytes, 0, bmd.Scan0, bytes.Length);
bmp.UnlockBits(bmd);
using (var ms = new MemoryStream())
{
bmp.Save(ms, System.Drawing.Imaging.ImageFormat.Png);
bytes = ms.GetBuffer();
}
}
images.Add(System.Drawing.Image.FromStream(new MemoryStream(bytes)));
}
else if (pdf.PdfName.FORM.Equals(subtype) || pdf.PdfName.GROUP.Equals(subtype))
{
images.AddRange(GetImagesFromPdfDict(tg, doc));
}
}
}
}
return images;
}
The above will only work with JPEGs. Excluding inline images and embedded files, you need to go through the objects of subtype IMAGE, then look at the filter and take the appropriate action. Here's an example, assuming we have a PdfObject of subtype IMAGE:
PdfReader pdf = new PdfReader("c:\\temp\\exp0.pdf");
int xo=pdf.XrefSize;
for (int i=0;i<xo;i++)
{
PdfObject obj=pdf.GetPdfObject(i);
if (obj!=null && obj.IsStream())
{
PdfDictionary pd=(PdfDictionary)obj;
if (pd.Contains(PdfName.SUBTYPE) && pd.Get(PdfName.SUBTYPE).ToString()=="/Image")
{
string filter=pd.Get(PdfName.FILTER).ToString();
string width=pd.Get(PdfName.WIDTH).ToString();
string height=pd.Get(PdfName.HEIGHT).ToString();
string bpp=pd.Get(PdfName.BITSPERCOMPONENT).ToString();
string extent=".";
byte [] img=null;
switch (filter)
{
case "/FlateDecode":
byte[] arr=PdfReader.FlateDecode(PdfReader.GetStreamBytesRaw((PRStream)obj),true);
Bitmap bmp=new Bitmap(Int32.Parse(width),Int32.Parse(height),PixelFormat.Format24bppRgb);
BitmapData bmd=bmp.LockBits(new Rectangle(0,0,Int32.Parse(width),Int32.Parse(height)),ImageLockMode.WriteOnly,
PixelFormat.Format24bppRgb);
Marshal.Copy(arr,0,bmd.Scan0,arr.Length);
bmp.UnlockBits(bmd);
bmp.Save("c:\\temp\\bmp1.png",ImageFormat.Png);
break;
default:
break;
}
}
}
}
This will mess the color up because of the Microsoft BGR, of course, but I wanted to keep it short. Do something similar for "/CCITTFaxDecode", etc.