PdfUtilities.convertPdf2Png automatically creates images in my directory - itext

I've written some code to perform OCR on a PDF using Tesseract (Tess4J):
public void DoOCRAnalyse(String From) throws FileNotFoundException {
    Tesseract instance = Tesseract.getInstance(); // JNA Interface Mapping
    File[] files = PdfUtilities.convertPdf2Png(new File(From));
    for (File f : files) {
        try {
            String result = instance.doOCR(f);
            /*String result = instance.doOCR(take File or BufferedImage); */
            SearchForSVHC(result, SvhcList);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }
}
It recognizes text, which is great, but my problem is that it needs the images to be in a directory on disk. How can I pass a BufferedImage or File to the doOCR() method without needing the files on disk?

You are passing a File object to doOCR. When you call convertPdf2Png, it invokes Ghostscript to convert the PDF file into one or more PNG files on disk. You can certainly delete them after OCR if you want, e.g., by calling f.delete() in a finally block.
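A minimal sketch of that delete-in-finally pattern, using only the standard library. The names OcrTempFileCleanup, processAndDelete and createPlaceholderPages are made up for illustration, and the ocr function is a stand-in for instance.doOCR(f); it is not part of Tess4J.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class OcrTempFileCleanup {

    // Runs `ocr` (a hypothetical stand-in for instance.doOCR) on every file
    // and deletes the file afterwards, whether or not the OCR step failed.
    static List<String> processAndDelete(File[] files, Function<File, String> ocr) {
        List<String> results = new ArrayList<>();
        for (File f : files) {
            try {
                results.add(ocr.apply(f));
            } catch (RuntimeException e) {
                System.err.println(e.getMessage());
            } finally {
                // The finally block runs on success and on failure alike.
                if (f.exists() && !f.delete()) {
                    System.err.println("Could not delete " + f);
                }
            }
        }
        return results;
    }

    // Creates empty placeholder files that play the role of the generated PNG pages.
    static File[] createPlaceholderPages(int count) {
        File[] pages = new File[count];
        try {
            for (int i = 0; i < count; i++) {
                pages[i] = File.createTempFile("page" + i + "-", ".png");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pages;
    }

    public static void main(String[] args) {
        File[] pages = createPlaceholderPages(2);
        List<String> texts = processAndDelete(pages, f -> "text of " + f.getName());
        System.out.println(texts.size() + " pages processed");
        for (File page : pages) {
            System.out.println(page.getName() + " still on disk: " + page.exists());
        }
    }
}
```

In the real method you would call instance.doOCR(f) inside the try and keep the existing catch for TesseractException; only the finally block is new.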

How to get files from selected folder with GtkFileChooserButton?

I am making a GTK+3 app with GJS where users select a folder from a GtkFileChooserButton (action property set to select-folder). I want to find all image files in the folder the user has selected, so I can display one of the images.
I tried this._folderChooseButton.get_files() and this._folderChooseButton.get_uris(), but they only return one file, which is the path to the selected folder. Like this:
_init(application) {
    this._folderChooseButton.connect('file-set', () => {
        this._onFolderChosen();
    });
}
_onFolderChosen() {
    let folder = this._folderChooseButton.get_file();
    // somehow get files from the folder here
    this._image.set_from_file(files[1]);
}
It is not really clear to me from the API how to find out which image files are inside the user's selected directory (and its subdirectories).
OK, after help from Patrick, Georges and Matthias at GUADEC, here is what I got.
The get_file() function I tried returns a GFile, which in this case is a folder (in UNIX, folders are also files). In order to get the files within the directory path, we need to call enumerate_children_async() on our GFile, returned by the get_file() function.
The enumerate_children_async() function takes five parameters:
A comma-separated attribute list. In our case, since we want the names of the children in the directory, we use the attribute called standard::name.
FileQueryInfoFlags: This controls whether symbolic links are followed. In this case, we will use Gio.FileQueryInfoFlags.NONE, which sets no flags, so symbolic links are followed (pass NOFOLLOW_SYMLINKS instead if you do not want that).
io_priority: The priority of the I/O operation (we will use GLib.PRIORITY_DEFAULT).
cancellable: A cancellable, which is a way to cancel this operation; in this case we will leave it as null.
callback: This is the function you want to run once the enumeration is ready.
More info on this function is at GJS-Docs at GNOME.org
Inside the callback, enumerate_children_finish() returns a GFileEnumerator, which we can use to retrieve a number of the files by calling next_files_async(), which takes these arguments:
num_files: How many files you want to retrieve. In this case, we use 1.
io_priority and cancellable (same as above).
callback: Where we can run a function or code to actually retrieve the file.
Below is the final code for doing this.
const { Gio, GLib, GObject, Gtk } = imports.gi; // import the Gio and GLib APIs at the top of your file

_onFolderChosen() {
    let folder = this._folderChooseButton.get_file();
    // Start listing the folder asynchronously; the enumerator arrives in the callback.
    folder.enumerate_children_async(
        'standard::name',
        Gio.FileQueryInfoFlags.NONE,
        GLib.PRIORITY_DEFAULT,
        null,
        (source, result) => {
            this._fileEnumerator = null;
            try {
                this._fileEnumerator = folder.enumerate_children_finish(result);
            } catch (e) {
                log('(Error) Could not retrieve list of files! Error: ' + e);
                return;
            }
            this._readNextFile();
        });
}

_readNextFile() {
    if (!this._fileEnumerator)
        return;
    this._fileEnumerator.next_files_async(
        1, // number of files to fetch
        GLib.PRIORITY_DEFAULT,
        null,
        (source, result) => {
            let fileInfos;
            try {
                fileInfos = this._fileEnumerator.next_files_finish(result);
            } catch (e) {
                log('Could not retrieve the next file! Error: ' + e);
                return;
            }
            // An empty list means the folder has no (more) entries.
            if (fileInfos.length === 0)
                return;
            let file = fileInfos[0].get_name();
            let filePath = GLib.build_filenamev([this._folderChooseButton.get_filename(), file]);
            this._carousselImage.set_from_file(filePath);
        });
}

In Eclipse, how to refactor a filename in a project programmatically

We can refactor project names, but I need help with renaming/refactoring a file in Eclipse programmatically. We have a folder that contains a file xxx.zzz, and we want to rename/refactor this file.
Try right-clicking the file, selecting Refactor and entering your desired file name. You may additionally be asked whether to update references and similarly named variables. Check both (recommended), and your file and all its references will be updated.
If the file is a Java source file (a compilation unit), you can try the code below:
RefactoringContribution contribution = RefactoringCore.getRefactoringContribution(IJavaRefactorings.RENAME_COMPILATION_UNIT);
RenameJavaElementDescriptor descriptor = (RenameJavaElementDescriptor) contribution.createDescriptor();
descriptor.setProject(cu.getResource().getProject().getName());
descriptor.setNewName(newFileName);
descriptor.setJavaElement(cu);
descriptor.setUpdateReferences(true);
RefactoringStatus status = new RefactoringStatus();
try {
    RenameRefactoring refactoring = (RenameRefactoring) descriptor.createRefactoring(status);
    IProgressMonitor monitor = new NullProgressMonitor();
    RefactoringStatus status1 = refactoring.checkInitialConditions(monitor);
    if (!status1.hasFatalError()) {
        RefactoringStatus status2 = refactoring.checkFinalConditions(monitor);
        if (!status2.hasFatalError()) {
            Change change = refactoring.createChange(monitor);
            change.perform(monitor);
        }
    }
} catch (CoreException e) {
    e.printStackTrace();
} catch (Exception e) {
    e.printStackTrace();
}
cu is of type ICompilationUnit; you can get a compilation unit from an IPackageFragment.
You can also replace IJavaRefactorings.RENAME_COMPILATION_UNIT with whatever you need.

How to recognize PDF watermark and remove it using PDFBox

I'm trying to extract all text except the watermark text from PDF files with the Apache PDFBox library, so I want to remove the watermark first; the rest is what I want. Unfortunately, neither PDMetadata nor PDXObject can recognize the watermark. Any help will be appreciated. I found the code below.
// Open PDF document
PDDocument document = null;
try {
    document = PDDocument.load(PATH_TO_YOUR_DOCUMENT);
} catch (IOException e) {
    e.printStackTrace();
}
// Get all pages and loop through them
List pages = document.getDocumentCatalog().getAllPages();
Iterator iter = pages.iterator();
while( iter.hasNext() ) {
    PDPage page = (PDPage)iter.next();
    PDResources resources = page.getResources();
    Map images = null;
    // Get all Images on page
    try {
        images = resources.getImages();//How to specify watermark instead of images??
    } catch (IOException e) {
        e.printStackTrace();
    }
    if( images != null ) {
        // Check all images for metadata
        Iterator imageIter = images.keySet().iterator();
        while( imageIter.hasNext() ) {
            String key = (String)imageIter.next();
            PDXObjectImage image = (PDXObjectImage)images.get( key );
            PDMetadata metadata = image.getMetadata();
            System.out.println("Found a image: Analyzing for Metadata");
            if (metadata == null) {
                System.out.println("No Metadata found for this image.");
            } else {
                InputStream xmlInputStream = null;
                try {
                    xmlInputStream = metadata.createInputStream();
                } catch (IOException e) {
                    e.printStackTrace();
                }
                try {
                    System.out.println("--------------------------------------------------------------------------------");
                    String mystring = convertStreamToString(xmlInputStream);
                    System.out.println(mystring);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            // Export the images
            String name = getUniqueFileName( key, image.getSuffix() );
            System.out.println( "Writing image:" + name );
            try {
                image.write2file( name );
            } catch (IOException e) {
                // TODO Auto-generated catch block
                //e.printStackTrace();
            }
            System.out.println("--------------------------------------------------------------------------------");
        }
    }
}
Contrary to your assumption, a PDF has no explicit watermark object, so there is no generic way to recognize watermarks in arbitrary PDFs.
Watermarks can be applied to a PDF page in many ways; each PDF creating library or application has its own way to add watermarks, some even offer multiple ways.
Watermarks can be
anything (Bitmap graphics, vector graphics, text, ...) drawn early in the content and, therefore, forming a background on which the rest of the content is drawn;
anything (Bitmap graphics, vector graphics, text, ...) drawn late in the content with transparency, forming a transparent overlay;
anything (Bitmap graphics, vector graphics, text, ...) drawn in the content stream of a watermark annotation which shall be used to represent graphics that shall be printed at a fixed size and position on a page, regardless of the dimensions of the printed page (cf. section 12.5.6.22 of the PDF specification ISO 32000-1).
Sometimes even mixed forms are used. Have a look at this answer for an example; at the bottom you find a 'watermark' drawn above graphics but beneath text (to allow for easy reading).
The last option (the watermark annotation) is obviously easy to remove, but it is also the least often used, most likely because it is so easy to remove; people applying watermarks generally don't want their watermarks to get lost. Furthermore, annotations are sometimes handled incorrectly by PDF viewers, and code copying page content often ignores annotations.
If, on the other hand, you do not handle generic documents but one specific type of document (all generated alike), the way the watermarks are applied in them can probably be recognized, and an extraction routine might be feasible. If you have such a use case, please share a sample PDF for inspection.

EncodedImage.getEncodedImageResource fails to load images with the same name in different subfolders in Eclipse (Blackberry plugin)

I'm using the Blackberry JDE Plugin v1.3 for Eclipse and I'm trying this code to create a BitmapField and I've always done it this way:
this.bitmap = EncodedImage.getEncodedImageResource("ico_01.png");
this.bitmap = this.bitmap.scaleImage32(
this.conf.getWidthScale(), this.conf.getHeightScale());
this.imagenLoad = new BitmapField(this.bitmap.getBitmap(), this.style);
It works fine with no errors, but now I have this set of images with the same name but in different subfolders like this:
I made it smaller than it actually is for explanatory reasons. I wouldn't want to rename the files so they're all different. I would like to know how to access the different subfolders. "res/img/on/ico_01.jpg", "img/on/ico_01.jpg", "on/ico_01.jpg" are some examples that I tried and failed.
It appears that EncodedImage.getEncodedImageResource(filename) will retrieve the first instance of filename regardless of where it is in your resource directory tree.
This is not very helpful if you have the images with the same filename in different directories (as you have).
The solution I have used is to create my own method which can return an image based on a path and filename.
public static Bitmap getBitmapFromResource(String resourceFilename) {
    Bitmap imageBitmap = null;
    // get the image as a byte stream
    InputStream imageStream = getInstance().getClass().getResourceAsStream(resourceFilename);
    if (imageStream == null) {
        // getResourceAsStream returns null when the resource is not found
        Logger.log("Resource not found: " + resourceFilename);
        return null;
    }
    try {
        // load it into memory
        byte[] imageBytes = IOUtilities.streamToBytes(imageStream);
        // create the bitmap
        imageBitmap = Bitmap.createBitmapFromBytes(imageBytes, 0, imageBytes.length, 1);
    } catch (IOException e) {
        Logger.log("Error loading: " + resourceFilename + ". " + e.getMessage());
    }
    return imageBitmap;
}
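A note on the path you pass in: in standard Java, Class.getResourceAsStream treats a name with a leading "/" as absolute from the resource root, while a name without it is resolved relative to the class's package. The sketch below is plain Java SE, not BlackBerry code; ResourcePaths and its methods are hypothetical helpers, and I am assuming the resource root is the folder that contains img/.

```java
import java.io.InputStream;

class ResourcePaths {

    // Builds an absolute resource name such as "/img/on/ico_01.png"
    // from a folder and a file name, tolerating stray slashes on the folder.
    static String absoluteResourceName(String folder, String fileName) {
        String trimmed = folder.replaceAll("^/+|/+$", "");
        return trimmed.isEmpty() ? "/" + fileName : "/" + trimmed + "/" + fileName;
    }

    // Opens the resource by its absolute name; returns null if it does not exist.
    static InputStream open(String folder, String fileName) {
        return ResourcePaths.class.getResourceAsStream(absoluteResourceName(folder, fileName));
    }

    public static void main(String[] args) {
        // Same file name, two different subfolders, two distinct resource names.
        System.out.println(absoluteResourceName("img/on", "ico_01.png"));
        System.out.println(absoluteResourceName("img/off", "ico_01.png"));
    }
}
```

With a helper like this, the "on" and "off" variants of ico_01.png are addressed by their full paths instead of by file name alone, which is what lets them coexist.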

Eclipse plugin: create a new file

I'm trying to create a new file in an eclipse plugin. It's not necessarily a Java file, it can be an HTML file for example.
Right now I'm doing this:
IProject project = ...;
IFile file = project.getFile("/somepath/somefilename"); // such that file.exists() == false
String contents = "Whatever";
InputStream source = new ByteArrayInputStream(contents.getBytes());
file.create(source, false, null);
The file gets created, but the problem is that it doesn't get recognized as any type; I can't open it in any internal editor. That's until I restart Eclipse (refresh or close then open the project doesn't help). After a restart, the file is perfectly usable and opens in the correct default editor for its type.
Is there any method I need to call to get the file outside of that "limbo" state?
That thread does mention the createFile call, but also refers to a FileEditorInput to open it:
Instead of java.io.File, you should use IFile.create(..) or IFile.createLink(..). You will need to get an IFile handle from the project using IProject.getFile(..) first, then create the file using that handle.
Once the file is created you can create FileEditorInput from it and use IWorkbenchPage.openEditor(..) to open the file in an editor.
Now, would that kind of method (from this AbstractExampleInstallerWizard) be of any help in this case?
protected void openEditor(IFile file, String editorID) throws PartInitException
{
    IEditorRegistry editorRegistry = getWorkbench().getEditorRegistry();
    if (editorID == null || editorRegistry.findEditor(editorID) == null)
    {
        editorID = getWorkbench().getEditorRegistry().getDefaultEditor(file.getFullPath().toString()).getId();
    }
    IWorkbenchPage page = getWorkbench().getActiveWorkbenchWindow().getActivePage();
    page.openEditor(new FileEditorInput(file), editorID, true, IWorkbenchPage.MATCH_ID);
}
See also this SDOModelWizard opening an editor on a new IFile:
// Open an editor on the new file.
//
try
{
    page.openEditor
        (new FileEditorInput(modelFile),
         workbench.getEditorRegistry().getDefaultEditor(modelFile.getFullPath().toString()).getId());
}
catch (PartInitException exception)
{
    MessageDialog.openError(workbenchWindow.getShell(), SDOEditorPlugin.INSTANCE.getString("_UI_OpenEditorError_label"), exception.getMessage());
    return false;
}