I want to do the following:
1. Find a word in a PDF.
2. Highlight all occurrences of it in that PDF.
3. Save the highlighted PDF as images of its pages.
How can I do this?
Any help will be appreciated.
Have a look at com.itextpdf.text.pdf.parser.LocationTextExtractionStrategy.
It'll give you the baseline, ascent, and descent of every piece of text on the page. It's up to you to build words and rectangles from that information.
Not easy, but possible.
After that, you just need to call GhostScript or PDFBox or something that can render PDFs. Hardly "the easy part", but it's a solved problem many times over.
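To make the "build words and rectangles" step a bit more concrete, here is a minimal sketch in plain Java. It does not call iText at all: Chunk, WordBoxes, MAX_GAP and LINE_TOLERANCE are hypothetical names and values made up for illustration. It assumes you have already collected the text pieces in reading order, each with a box derived from its baseline ends (left/right) and its descent/ascent (bottom/top), and that no single piece contains more than one word.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one piece of text and its box in PDF user space.
// left/right come from the baseline ends, bottom/top from descent/ascent.
final class Chunk {
    final String text;
    final Rectangle2D.Double box;

    Chunk(String text, double left, double bottom, double right, double top) {
        this.text = text;
        this.box = new Rectangle2D.Double(left, bottom, right - left, top - bottom);
    }
}

final class WordBoxes {
    // Assumed tuning values in points; they are not taken from iText.
    static final double MAX_GAP = 1.0;        // largest horizontal gap inside one word
    static final double LINE_TOLERANCE = 2.0; // largest vertical drift inside one line

    // Returns one rectangle per occurrence of `word`.
    // Chunks must already be sorted in reading order.
    static List<Rectangle2D> find(List<Chunk> chunks, String word) {
        List<Rectangle2D> hits = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        Rectangle2D current = null;
        for (Chunk c : chunks) {
            boolean blank = c.text.trim().isEmpty();
            // A chunk continues the current word if it sits on the same line
            // and starts right where the word so far ends.
            boolean joins = current != null && !blank
                    && Math.abs(c.box.getMinY() - current.getMinY()) <= LINE_TOLERANCE
                    && c.box.getMinX() - current.getMaxX() <= MAX_GAP;
            if (!joins) {
                // The word so far is complete: record it if it matches, then reset.
                if (current != null && text.toString().equalsIgnoreCase(word)) {
                    hits.add(current);
                }
                text.setLength(0);
                current = null;
            }
            if (!blank) {
                text.append(c.text);
                current = (current == null)
                        ? (Rectangle2D) c.box.clone()
                        : current.createUnion(c.box);
            }
        }
        // Do not forget the last word on the page.
        if (current != null && text.toString().equalsIgnoreCase(word)) {
            hits.add(current);
        }
        return hits;
    }
}
```

Each returned rectangle is in the same coordinates as the input boxes, so it can be drawn as a highlight before the page is rendered to an image. Real PDFs will need more care (hyphenation, rotated text, punctuation attached to words), which this sketch ignores.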
We are working on uploading a Word document to Google Docs from .NET. We have a template that contains tables, and we set the tables' border style to none so that all the borders are invisible. But after we upload it to Google Docs, the borders appear in black. We tried uploading the template to Google Docs manually, and the borders appear then too, so I think our code is correct. Does the Google Docs API allow us to change the table border style after converting a .docx to a Google Doc? Or is there any other way to keep the borders invisible?
I've tried making the table borders white (the paper color); the borders then stay hidden when I upload the file to Google Docs without conversion. But when I try to edit it, the table borders appear again. I guess that's because the Google viewer converts the .docx to a Google Doc when I try to edit it.
I've tried setting the table border to none in Word, but the borders still appear after conversion. Is this a bug in Google's document conversion? It should set the border to zero when the table border is set to none in Word, but it doesn't. Can anybody help me with this issue? Many thanks.
To answer your first question, there is no way of modifying the document content after uploading. You may, however, get better fidelity by converting to HTML or PDF and uploading those formats.
Otherwise, you should raise a bug report on the issue tracker, so that the conversion can be improved.
This is not an answer to your problem, but it's a workaround that I'm currently using: with the border colour set to white, the black borders are no longer displayed.
I'm developing a book app. My client has provided all the content as a PDF.
I have already put all the PDF content into the book.
But he now wants to be able to highlight text in that PDF.
The user should be able to highlight text, as if reading a paper book.
Is this possible? Can anyone help me on this, please?
Thanks in advance.
Theoretically yes. It depends on a bit of hacking and on other things, for example the fonts used in the PDF. Have a look at PdfKitten; their demo project can find text in a PDF and highlight it. That should give you a first pointer on how to highlight. If you want the user to highlight by touch, you would need to transform the touch locations into PDF coordinates to determine exactly where the user touched, but it should be possible.
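For the touch-to-PDF transformation, the arithmetic is the same on any platform, so here is a rough sketch of it in plain Java rather than Objective-C. TouchMapper is a hypothetical name, not part of PdfKitten. It assumes the page is drawn unrotated, scaled uniformly to fill the view's width, with the page origin at (0, 0) and no crop offset; the view's y axis points down while the PDF's points up.

```java
import java.awt.geom.Point2D;

// Hypothetical helper: maps a touch in view coordinates (origin top-left,
// y down) to PDF user space (origin bottom-left, y up).
final class TouchMapper {
    private final double scale;      // view units per PDF point
    private final double pageHeight; // page height in PDF points

    TouchMapper(double viewWidth, double pageWidth, double pageHeight) {
        this.scale = viewWidth / pageWidth;
        this.pageHeight = pageHeight;
    }

    Point2D.Double toPdf(double viewX, double viewY) {
        // Undo the scaling, then flip the y axis.
        return new Point2D.Double(viewX / scale, pageHeight - viewY / scale);
    }
}
```

With the touch expressed in PDF coordinates, you can test it against the rectangles of the text on the page to decide which word or line to highlight. A rotated or cropped page would need an extra rotation or offset step that is not shown here.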
I have about twenty UITextViews and corresponding .txt files. What I want is to make each UITextView take the contents of the corresponding file and display it.
Moreover, the text is formatted and it contains some formulas (copy/pasted from Grapher). I've heard about displaying formatted text in UIWebView, but I haven't found a clear explanation anywhere.
Thanks in advance!
Text files normally don't contain formatted text.
If by "formatted" you mean "html" then yes, you will want to use UIWebView. Basically you will convert the text to an HTML document, and then use the web view to display that document. There are several example projects available from Apple that show you how to use UIWebView.
Displaying formulas in a UITextView will be difficult, as the character rules for formulas are completely different from those for ordinary language text. You could generate HTML to display them, but that is difficult as well.
I think your best bet would be to draw each formula to an image and then display the image. That is the traditional way to handle the display of formulas.
I want to write an application to validate a PDF file. The validation required is that all text and images in the PDF stay inside a 0.5" margin on the left and a 0.5" margin on the right. If any text goes outside these margins, the application should catch it.
I searched iText for this but couldn't find anything useful for my purpose.
Can somebody help me write this code in .NET (C#)?
Thanks,
Praveen
In addition to R Ubben's answer: reader.getPageSize(pageNumber) is exactly the same as reader.getBoxSize(pageNumber,"media").
That's how it's implemented in iTextSharp. You can see it in source code.
Extract:
    public Rectangle GetPageSize(PdfDictionary page) {
        PdfArray mediaBox = page.GetAsArray(PdfName.MEDIABOX);
        return GetNormalizedRectangle(mediaBox);
    }
Use SetMarginMirroring(true).
The PDF standard doesn't really have the concept of margins, since a PDF is supposed to be device independent. What it can have is five boxes designed to constrain output: media box, crop box, bleed box, art box, and trim box. Usually the other four boxes are the same size or smaller than the media box.
If a media box is present in your PDFs, you could retrieve it and check that it is 0.5" smaller on each side than the page. Try comparing the results of reader.getPageSize(pageNumber) and reader.getBoxSize(pageNumber,"media"). Very likely they will be the same.
What you can do is rewrite the PDFs to make sure there are 1/2 inch margins. The easiest way to do this is to shrink the page.
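If you do manage to collect the bounding boxes of the text and images on a page, the margin test itself is simple arithmetic: PDF user space defaults to 72 points per inch, so 0.5" is 36 points. Here is a minimal sketch in plain Java (it should translate to C# almost line for line). MarginCheck and violations are hypothetical names, and collecting the content boxes from the PDF is assumed to happen elsewhere.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: flags content boxes that cross the left or right margin.
final class MarginCheck {
    static final double POINTS_PER_INCH = 72.0; // default PDF user-space unit

    // `page` is the page rectangle and `content` the boxes found on it,
    // all in points. Returns the boxes that violate the side margins.
    static List<Rectangle2D> violations(Rectangle2D page,
                                        List<Rectangle2D> content,
                                        double marginInches) {
        double margin = marginInches * POINTS_PER_INCH;
        double minX = page.getMinX() + margin; // left limit
        double maxX = page.getMaxX() - margin; // right limit
        List<Rectangle2D> bad = new ArrayList<>();
        for (Rectangle2D r : content) {
            if (r.getMinX() < minX || r.getMaxX() > maxX) {
                bad.add(r);
            }
        }
        return bad;
    }
}
```

For a US Letter page (612 x 792 points) and a 0.5" margin, anything starting left of x = 36 or ending right of x = 576 is reported.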
I'm using JasperReports to generate a Word (docx) document, but I have a problem when I try to print the document: the exporter messes up the page margins. Does anyone know how to prevent that from happening?
I know how to set the margins in iReport, but that only moves the data further from the page borders; the margins in Word, which can be adjusted at the top of the page, still lie right at the edge.
Has anyone had this problem?
Are you able to specify the page size? It's possible that using this in combination with the margin settings will help with the export problem.
If you are not too heavily invested in your Jasper solution, Docmosis and JODReports let you lay out the document visually using Word or Writer and then render the report in various formats. This may save you time in the long run, but all reporting systems have quirks. Hope you find a solution, especially when your output is not PDF.