iTextSharp - how to find out how many columns a page has [duplicate]

My code below gets lost when it opens a PDF file that has only one column on the first page and more than one column on the other pages.
Can someone tell me what I'm doing wrong?
My code is below:
PdfReader pdfreader = new PdfReader(pathNmArq);
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
for (int page=1; page <= lastPage; page++)
{
    extractText = PdfTextExtractor.GetTextFromPage(pdfreader, page, strategy);
    extractText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(extractText)));
    // ...
}

You use the SimpleTextExtractionStrategy. This strategy assumes that the text drawing instructions in the PDF are sorted in reading order, which does not seem to be true for your document.
If you cannot count on the PDF containing drawing operations in reading order and want to use only the text extraction strategies that ship with iText, you have to know the areas that each constitute a single column. If a page contains multiple columns, use a RegionTextRenderFilter to restrict extraction to one column at a time and combine it with the LocationTextExtractionStrategy.
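To illustrate the idea only, here is a minimal plain-Java sketch that does not use iText at all. It assumes the text has already been reduced to positioned chunks (the `Chunk` class and `extractColumn` method are hypothetical names made up for this example), keeps the chunks that start inside one column's x range, and then sorts them by position instead of trusting the drawing order. The real RegionTextRenderFilter and LocationTextExtractionStrategy work on iText's own render information and are more careful than this.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ColumnTextSketch {
    /** Hypothetical stand-in for a piece of text and the start of its baseline, in PDF user units. */
    static final class Chunk {
        final float x, y;
        final String text;
        Chunk(float x, float y, String text) { this.x = x; this.y = y; this.text = text; }
    }

    /** Returns the text of all chunks whose x lies in [xMin, xMax), sorted into reading order. */
    static String extractColumn(List<Chunk> chunks, float xMin, float xMax) {
        // Step 1: restrict to the column region (the role a region filter plays)
        List<Chunk> inColumn = new ArrayList<>();
        for (Chunk c : chunks) {
            if (c.x >= xMin && c.x < xMax) inColumn.add(c);
        }
        // Step 2: sort by location; PDF y grows upwards, so higher y comes first, then left to right
        inColumn.sort(Comparator.comparingDouble((Chunk c) -> -c.y).thenComparingDouble(c -> c.x));
        StringBuilder sb = new StringBuilder();
        Float lastY = null;
        for (Chunk c : inColumn) {
            // Same baseline: same line; different baseline: new line
            if (lastY != null) sb.append(c.y == lastY ? " " : "\n");
            sb.append(c.text);
            lastY = c.y;
        }
        return sb.toString();
    }
}
```

Calling `extractColumn` once per known column region and concatenating the results gives the page text column by column, regardless of the order in which the chunks were drawn.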
PS: What exactly is your intention in that
extractText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(extractText)));
line?

Related

maximum page size in itextpdf

I'm creating an ADF project and generating reports using itextpdf 5.1.3.
For example, my customer table has 3000 rows. My jspx page has a customer table and a button with a file download listener, where I simply call the report.
The report is generated for only the first 50 rows (4 pages at most).
Why are the remaining rows missing from the report?
private void generatePDFFile(FacesContext facesContext, java.io.OutputStream outputStream) {
try {
DCBindingContainer dcBindings = (DCBindingContainer)BindingContext.getCurrent().getCurrentBindingsEntry();
DCIteratorBinding iterBind1 = (DCIteratorBinding)dcBindings.get("CustomerView1Iterator");
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(FILE)); // file path on local disk: D:\mywork\customer.pdf
document.open();
Row[] rows= iterBind1.getAllRowsInRange();
for(Row row:rows){
custcode = (String) row.getAttribute("CustomerCode");
custname = (String) row.getAttribute("CustomerNameE");
addgroup(document);
}
document.close();
facesContext = facesContext.getCurrentInstance();
ServletContext context = (ServletContext)facesContext.getExternalContext().getContext();
File file = new File(FILE);
FileInputStream fileInputStream;
byte[] b;
System.out.println(file.getCanonicalPath());
System.out.println(file.getAbsolutePath());
fileInputStream = new FileInputStream(file);
b = new byte[8192];
int result;
// Copy the file in fixed-size chunks and write only the bytes actually read
while ((result = fileInputStream.read(b)) != -1) {
outputStream.write(b, 0, result);
}
outputStream.flush();
} catch (Exception e) {
e.printStackTrace();
}
}
private static void addgroup(Document document) throws DocumentException{
Paragraph preface1 = new Paragraph();
Paragraph preface2 = new Paragraph();
preface1.add(new Chunk("\n"));
preface1.add(new Chunk("Customer : "+custcode+" "+custname,BlueFont));
preface1.add(new Chunk("\n"));
document.add(preface1);
document.add(preface2);
}
The maximum size of a page in a PDF document is 14,400 by 14,400 user units. See the blog post Help, I only see blank pages in my PDF. This is not a limitation introduced by iText; it's a limitation that exists in PDF viewers such as Adobe Reader. Other viewers (such as Apple Preview) have no problem displaying PDFs with larger pages.
This answers the question in the Subject-line of your post.
The body of your post contains a completely different question. Maybe you aren't asking about the page size, but rather about the file size. That question has also been answered in the past. See What is the size limit of pdf file?
Allow me to summarize that answer:
The maximum size of a PDF with a version older than PDF 1.5 is about 10 gigabytes.
The maximum size of a PDF with a version PDF 1.5 or higher using a compressed cross-reference stream depends only on the limitations of the software processing the PDF.
The maximum size of a PDF created with iText versions before 5.3 is 2 gigabytes. The maximum size of a PDF created with iText versions 5.3 and higher is 1 terabyte.
Since you are using iText 5.1.3, you can create a PDF of 2 GBytes. It beats me why you are using a version of iText that dates from November 2011 instead of a more recent version, but there is no reason why you wouldn't be able to put a table containing 3000 rows in a PDF that can be as large as 2 GBytes.
Chances are that your code is really bad. Unfortunately, you don't show us your code. Did you read the documentation on how to create Large Tables?
I can't comment on the part where you say:
My jspx page has a customer table and a button with a file download listener, where I simply call the report.
That doesn't seem to be relevant in the context of your question.

Multiple Jasper reports - Single request

I have a requirement where I can select multiple records and choose an action that generates a zipped report containing one document (a PDF in my case) for each of the selected entries.
For example, Employee1, Employee2 and Employee3 would be selected, and when I choose to generate the report, 3 reports should be generated, one for each employee, and the output has to be zipped and downloaded.
For now I generate the JasperPrint and export the report to a ZipOutputStream with a new ZipEntry for each employee.
This means running the queries three times and adding each output stream to the zip.
Is there a better way of doing it?
Your solution is normally the correct way; the alternatives below are only worth considering if you need to avoid multiple queries.
Load all your data into a List<Employee> and, for each Employee, pass a new JRBeanCollectionDataSource built from that list. (This decreases the number of queries but increases memory usage.)
If you have control of the pages (for example, 1 employee per page), you can generate 1 PDF with all employees and then use iText to split it into multiple PDFs.
PdfReader reader = new PdfReader("nameOfReport.pdf");
int n = reader.getNumberOfPages();
// iText page numbers are 1-based
for (int i = 1; i <= n; i++) {
    // Use the size of the page being copied, not always that of page 1
    Document document = new Document(reader.getPageSizeWithRotation(i));
    PdfCopy writer = new PdfCopy(document, new FileOutputStream("Employee_" + i + ".pdf"));
    document.open();
    PdfImportedPage page = writer.getImportedPage(reader, i);
    writer.addPage(page);
    // Closing the document also closes the writer
    document.close();
}
reader.close();
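For comparison, the approach from the question (one zip entry per report) can be sketched with the JDK alone. This is only a sketch under assumptions: the byte arrays stand in for reports that have already been exported to PDF, and `ReportZipSketch` and `writeZip` are hypothetical names, not part of JasperReports.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ReportZipSketch {
    /**
     * Writes one zip entry per report.
     * Keys are entry names (e.g. "Employee1.pdf"); values are the already exported report bytes.
     */
    static void writeZip(Map<String, byte[]> reports, OutputStream target) throws IOException {
        // try-with-resources finishes the zip and closes the target stream
        try (ZipOutputStream zip = new ZipOutputStream(target)) {
            for (Map.Entry<String, byte[]> report : reports.entrySet()) {
                zip.putNextEntry(new ZipEntry(report.getKey()));
                zip.write(report.getValue());
                zip.closeEntry();
            }
        }
    }
}
```

In a real application the exporter would probably write straight into the open zip entry instead of buffering each report in memory first; the map is used here only to keep the sketch self-contained.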

OpenXML editing docx

I have a dynamically generated docx file.
I need to write text strictly at the end of the page.
With Microsoft.Interop I insert paragraphs before the text:
int kk = objDoc.ComputeStatistics(WdStatistic.wdStatisticPages, ref wMissing);
while (objDoc.ComputeStatistics(WdStatistic.wdStatisticPages, ref wMissing) != kk + 1)
{
objWord.Selection.TypeParagraph();
}
objWord.Selection.TypeBackspace();
But I can't use the same code with Open XML, because the page count is calculated only by Word.
Using Interop is not an option, because it is far too slow.
There are 2 options for doing this in Open XML.
Create a content placeholder from the Developer tab in Microsoft Office at the end of your document. You can then access this content placeholder programmatically and place any text in it.
Append the text directly to your Word document, where it will be inserted at the end of your text. With this approach you have to write everything else to your document first; once you are done, you can append to your document the following way:
public void WriteTextToWordDocument()
{
using(WordprocessingDocument doc = WordprocessingDocument.Open(documentPath, true))
{
MainDocumentPart mainPart = doc.MainDocumentPart;
Body body = mainPart.Document.Body;
Paragraph paragraph = new Paragraph();
Run run = new Run();
Text myText = new Text("Append this text at the end of the word document");
run.Append(myText);
paragraph.Append(run);
body.Append(paragraph);
// Don't forget to save your changes; the using block then disposes (closes) the document
mainPart.Document.Save();
}
}
I haven't tested the above code, but I hope it gives you an idea of how to work with a Word document in Open XML.

Getting line locations with iText

How can one find where lines are located in a document with iText?
Suppose I have a table in a PDF document and want to read its contents; I would like to find exactly where the cells are located. To do that, I thought I might find the intersections of the lines.
I think your only option using iText will be to parse the PDF tokens manually. Before doing that I would have a copy of the PDF spec handy.
(I'm a .Net guy so I use iTextSharp but other than some capitalization differences and property declarations they're almost 100% the same.)
You can get the individual tokens using the PRTokeniser object which you feed bytes into from calling getPageContent(pageNum) on your PdfReader.
//Get bytes for page 1
byte[] pageBytes = reader.getPageContent(1);
//Get the tokens for page 1
PRTokeniser tokeniser = new PRTokeniser(pageBytes);
Then just loop through the PRTokeniser:
PRTokeniser.TokType tokenType;
String tokenValue;
while (tokeniser.nextToken()) {
    tokenType = tokeniser.getTokenType();
    tokenValue = tokeniser.getStringValue();
    //...check tokenValue, do something with it
}
As for tokenValue, you'd probably want to look for re and l, the operators for rectangle and line. If you see an re, look at the previous 4 values (x, y, width, height); if you see an l, look at the previous 2 values (the x and y of the line's end point). This also means that you need to store each tokenValue in an array so you can look back later.
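The look-back idea can be tried out without iText. The following plain-Java sketch is a simplification, not how PRTokeniser works: it assumes the content stream is already available as a string whose tokens are separated by whitespace, which ignores strings, names, arrays and comments that a real tokeniser handles. `PathOperatorSketch` and `findOperands` are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class PathOperatorSketch {
    /**
     * Collects the operands of every occurrence of the given operator,
     * e.g. ("re", 4) for rectangles (x y w h) or ("l", 2) for line segments (x y).
     */
    static List<float[]> findOperands(String content, String operator, int operandCount) {
        List<float[]> found = new ArrayList<>();
        List<Float> buf = new ArrayList<>(); // numbers seen since the last operator
        for (String token : content.trim().split("\\s+")) {
            Float number = tryParse(token);
            if (number != null) {
                buf.add(number);
                continue;
            }
            // Not a number: treat it as an operator and look back in the buffer
            if (token.equals(operator) && buf.size() >= operandCount) {
                float[] operands = new float[operandCount];
                for (int i = 0; i < operandCount; i++) {
                    operands[i] = buf.get(buf.size() - operandCount + i);
                }
                found.add(operands);
            }
            buf.clear(); // operands belong to the operator that follows them
        }
        return found;
    }

    /** Returns the token as a Float, or null if it is not numeric. */
    static Float tryParse(String token) {
        try {
            return Float.valueOf(token);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

For a stream such as `72 700 200 50 re S`, calling `findOperands(content, "re", 4)` yields one array holding x, y, width and height. Clearing the buffer after every operator keeps operands of earlier operators from being mistaken for those of a later one.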
Depending on what you used to create the PDF with you might get some interesting results. For instance, I created a 4 cell table with Microsoft Word and saved as a PDF. For some reason there are two sets of 10 rectangles with many duplicates, but the general idea still works.
Below is C# code targeting iTextSharp 5.1.1.0. You should be able to convert it to Java and iText very easily. I noted the one line that has .Net-specific code that needs to be adjusted from a generic list (List<string>) to a Java equivalent, probably an ArrayList. You'll also need to adjust some casing: .Net uses Object.Method() whereas Java uses Object.method(). Lastly, .Net accesses properties without gets and sets, so Object.Property is both the getter and the setter, compared to Java's Object.getProperty and Object.setProperty.
Hopefully this gets you started at least!
//Source file to read from
string sourceFile = "c:\\Hello.pdf";
//Bind a reader to our PDF
PdfReader reader = new PdfReader(sourceFile);
//Create our buffer for previous token values. For Java users, List<string> is a generic list, probably most similar to an ArrayList
List<string> buf = new List<string>();
//Get the raw bytes for the page
byte[] pageBytes = reader.GetPageContent(1);
//Get the raw tokens from the bytes
PRTokeniser tokeniser = new PRTokeniser(pageBytes);
//Create some variables to set later
PRTokeniser.TokType tokenType;
string tokenValue;
//Loop through each token
while (tokeniser.NextToken()) {
    //Get the type and value
    tokenType = tokeniser.TokenType;
    tokenValue = tokeniser.StringValue;
    //If the type is a numeric type
    if (tokenType == PRTokeniser.TokType.NUMBER) {
        //Store it in our buffer for later use
        buf.Add(tokenValue);
    //Otherwise we only care about raw commands which are categorized as "OTHER"
    } else if (tokenType == PRTokeniser.TokType.OTHER) {
        //Look for a rectangle token
        if (tokenValue == "re") {
            //Sanity check, make sure we have enough items in the buffer
            if (buf.Count < 4) throw new Exception("Not enough elements in buffer for a rectangle");
            //Read and convert the values; PDF numbers always use "." as the decimal separator
            var inv = System.Globalization.CultureInfo.InvariantCulture;
            float x = float.Parse(buf[buf.Count - 4], inv);
            float y = float.Parse(buf[buf.Count - 3], inv);
            float w = float.Parse(buf[buf.Count - 2], inv);
            float h = float.Parse(buf[buf.Count - 1], inv);
            //..do something with them here
        }
        //The buffered numbers belonged to this operator, so start collecting afresh
        buf.Clear();
    }
}

Controlling table row height for document in poi

I am trying to draw a table with POI's XWPF component using the code below:
// InputStream in= Temp.class.getResourceAsStream("Sample.docx");
XWPFDocument doc = new XWPFDocument();
XWPFTable table=doc.createTable(2,2);
// CTDecimalNumber rowSize = table.getCTTbl().getTblPr().getTblStyleRowBandSize();
// rowSize.setVal(new BigInteger("10"));
// table.getRow(1).setHeight(0);
table.getRow(0).getCell(0).setText("Row-0-Cell-0");
table.getRow(0).getCell(1).setText("Row-0-Cell-1");
table.getRow(1).getCell(0).setText("Row-1-Cell-0");
table.getRow(1).getCell(1).setText("Row-1-Cell-1");
FileOutputStream out = new FileOutputStream("simpleTable.docx");
doc.write(out);
out.close();
It draws the table properly, but the height of the cells is too big and the width does not fall into place properly either. I saw in a note that tables are supposed to auto-fit to their content. I tried a couple of things:
Tried setting the height as shown in the commented code above, but that did not work.
Tried reading an existing doc as shown in the commented InputStream code. That threw an exception saying it could not read a poi-ooxml file.
Tried using TblStyleRowBandSize, but that always remains null. I'm not sure how to create a new instance of CTDecimalNumber or TblStyleRowBandSize.
Thanks in advance.
Some more insight:
When I create an empty table and add rows and columns using the create methods, it works fine and does not insert the formatting character. But making an empty table results in a cell being created at the beginning, and I am still trying to find a way to remove that first column. New code:
XWPFTable table=doc.createTable();
XWPFTableRow row1 = table.createRow();
row1.createCell().setText("Row-0-Cell-0");
row1.createCell().setText("Row-0-Cell-1");
XWPFTableRow row2 = table.createRow();
row2.createCell().setText("Row-1-Cell-0");
row2.createCell().setText("Row-1-Cell-1");
I'm not sure about most of your query, but I can help with the last bit. You'll want something like:
// The table properties live on the underlying CTTbl, reached via getCTTbl()
CTTblPr tblPr = table.getCTTbl().getTblPr();
if (!tblPr.isSetTblStyleRowBandSize()) {
    tblPr.addNewTblStyleRowBandSize();
}
System.out.println("Was " + tblPr.getTblStyleRowBandSize().getVal());
tblPr.getTblStyleRowBandSize().setVal(new BigInteger("12345"));
System.out.println("Now " + tblPr.getTblStyleRowBandSize().getVal());