Loading JDBC connector in Eclipse Jython plugin

I'm working with an Eclipse-based tool which provides an interactive Jython shell for scripting and data analysis against an internal data model.
I'm trying to write a script which exports results to some form of database, so I'm trying to use the built-in com.ziclix.python.sql package in Jython to provide the interface and the Xerial JDBC connector for SQLite (https://github.com/xerial/sqlite-jdbc) to provide the backend.
The script below works perfectly when run outside the third-party tool using a standard command-line Jython interpreter. It relies on the importJar() hack, which is commonly used to work around Jython not always honouring the user CLASSPATH when launched with java -jar <blah>:
from com.ziclix.python.sql import zxJDBC
from java.net import URL, URLClassLoader
from java.lang import ClassLoader
from java.io import File

JDBC_URL = "jdbc:sqlite:test.db"
JDBC_DRIVER = "org.sqlite.JDBC"
JDBC_JAR = "E:/sqlite-jdbc-3.21.0.jar"

# Import Jar file into local class path
def importJar(jarFile):
    m = URLClassLoader.getDeclaredMethod("addURL", [URL])
    m.accessible = 1
    m.invoke(ClassLoader.getSystemClassLoader(), [File(jarFile).toURL()])

def main():
    try:
        importJar(JDBC_JAR)
        dbConn = zxJDBC.connect(JDBC_URL, None, None, JDBC_DRIVER)
        cursor = dbConn.cursor()
        # Do something useful
        cursor.close()
        dbConn.close()
    except zxJDBC.DatabaseError, msg:
        print msg

if __name__ == '__main__':
    main()
... but fails when run from the plugin inside Eclipse, where the zxJDBC.connect() call errors with:
driver [org.sqlite.JDBC] not found
If I add the Jar file to the JYTHONPATH environment variable I can do import org.sqlite.JDBC successfully in the Python script, but the connect call still fails on the Java side, in the JDBC driver manager.
For the sake of completeness, the full path to the Jar file is on the CLASSPATH, PYTHONPATH, and JYTHONPATH environment variables.
Any ideas?
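One thing that may be worth checking (a diagnostic sketch of my own, not from the original post; checkDriver() is a hypothetical helper) is whether the system class loader that importJar() patches is the one the Eclipse plugin actually resolves JDBC drivers through:
from java.lang import Class, ClassLoader, ClassNotFoundException

def checkDriver(driverName):
    # Ask the system class loader (the one importJar() modified) for the
    # driver class. If this fails inside Eclipse, the plugin is probably
    # running Jython behind a different, plugin-scoped class loader.
    try:
        Class.forName(driverName, True, ClassLoader.getSystemClassLoader())
        print "Driver %s is visible to the system class loader" % driverName
    except ClassNotFoundException, e:
        print "Driver not visible:", e

importJar(JDBC_JAR)
checkDriver(JDBC_DRIVER)
If the driver is still not visible after importJar(), that would be consistent with the plugin's DriverManager living behind a class loader the hack never touches.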

Related

How can I run pytesseract / tesseract in Foundry Code Repositories?

I am trying to use the function image_to_string from the library pytesseract in a repository to perform OCR of PDFs. However, I am getting the following error:
From the checks I would assume the library was loaded correctly:
Does anyone have an idea how to troubleshoot here?
It seems that Foundry is not respecting / running the environment activation script
https://github.com/conda-forge/tesseract-feedstock/blob/main/recipe/activate.sh
which sets the TESSDATA_PREFIX environment variable automatically. However, we can infer the value manually and pass it to the pytesseract API calls.
Define the following helper function:
def _get_tessdata_directory_path():
    import sys
    from pathlib import Path
    env_root = Path(sys.executable).parent.parent
    share_dir = env_root / 'share' / 'tessdata'
    assert share_dir.exists(), 'tessdata directory does not exist in <envroot>/share/tessdata'
    return str(share_dir)
and use it as shown in the following snippet:
tessdata_dir_config = f'--tessdata-dir "{_get_tessdata_directory_path()}"'
pytesseract.image_to_string(image, ..., config=tessdata_dir_config)
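Alternatively (my own sketch, not part of the original answer), since activate.sh would normally export TESSDATA_PREFIX, you can set that variable yourself from the same inferred path before calling pytesseract, assuming the manual export mirrors what the activation script would do:
import os
import pytesseract

# Assumption: exporting TESSDATA_PREFIX by hand mirrors what activate.sh would do.
os.environ['TESSDATA_PREFIX'] = _get_tessdata_directory_path()
text = pytesseract.image_to_string(image)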

Postgres PL/JAVA: java.lang.ClassNotFoundException error after loading JAR file in database

I am getting the java.lang.ClassNotFoundException error inside Postgres when running a function that calls a JAR file I have loaded. I have installed and configured PL/JAVA (including the delivered examples) in my database and can run the examples successfully. I am now attempting to load/install my first JAR, but I am doing something wrong.
My host controls the OS version: CentOS 6.8. Postgres is version 8.4.
I am attempting to install my own very simple java class, which is a derivative of the delivered example Parameters.addOne class. All my code is in /tmp. Here are the steps I've followed:
Doug.java:
package com.msmetric;

import java.math.BigDecimal;
import java.sql.Date;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Time;
import java.sql.Timestamp;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.TimeZone;
import java.util.logging.Logger;

public class Doug {
    public static int addOne(int value) {
        return value + 1;
    }
}
Compiling Doug.java using 'javac Doug.java' succeeds.
Creating the JAR file with the Doug.class file in it using 'jar -cvf Doug.jar Doug.class' works fine.
Now I load the JAR file into Postgres (public schema), change the classpath, create the function that calls the JAR, then attempt to run it at the psql prompt.
Run sqlj.install_jar from psql:
select sqlj.install_jar('file:/tmp/Doug.jar','Doug',false);
Set the classpath inside Postgres (from psql prompt postgres=#):
select sqlj.set_classpath('public','Doug');
Create the function that calls the JAR. This create function code is taken directly from the examples.ddr file that came with PL/JAVA. I simply changed org.postgres to com.msmetric.
create or replace function addone(int) returns int as 'com.msmetric.Doug.addOne(java.lang.Integer)' language java;
Now with the JAR loaded and function created, I attempt to run it. This function should simply add 1 to the number provided.
select addone(3);
Results:
ERROR: java.lang.ClassNotFoundException: com.msmetric.Doug
Thoughts?
I'm very sorry I didn't see your question sooner. Underneath all the exotic details (PostgreSQL, PL/Java, schemas, classpaths...), there's just a bit of basic Java going on here: if a jar file contains a class Doug.class in package com.msmetric, its path within the jar has to reflect that: it has to be com/msmetric/Doug.class. Otherwise, it won't be found.
You can set up that whole structure step by step:
javac Doug.java
mkdir com
mkdir com/msmetric
mv Doug.class com/msmetric/
jar -cvf Doug.jar com/msmetric/Doug.class
Or, you can let javac do more of the work for you:
mkdir classes
javac -d classes Doug.java
jar -cvf Doug.jar -C classes .
When you give javac a -d directory option, instead of just writing class files next to their .java sources, it will put them all in their proper places under the directory you named, and then you can just tell jar to change into that directory and slurp them all up (don't overlook the . at the end of that jar command).
Once you fix that, if you retry your original steps, you'll see that you now get a different error:
ERROR: Unable to find static method com.msmetric.Doug.addOne with signature (Ljava/lang/Integer;)I
That happens because you declared the function in Doug.java with int addOne(int value) (that is, taking a primitive int argument), but you declared it in SQL with returns int as 'com.msmetric.Doug.addOne(java.lang.Integer)' taking an Integer object.
Once you correct that:
create or replace function addone(int) returns int as 'com.msmetric.Doug.addOne(int)' language java;
you'll be able to see:
# select addone(3);
 addone
--------
      4
(1 row)
If you happen to see this belated answer, may I ask what version of PL/Java you are using? That's one detail you didn't mention. If it is older than 1.5.0, newer releases have features that can help you out. For one, you can just annotate that function:
@Function
public static int addOne(int value) {
    return value + 1;
}
and have javac spit out not only the Doug.class file but also a pljava.ddr file with your SQL function declaration already written correctly (no mixing up argument types!). There is a way to include that .ddr file into the jar you create so that you can just call sqlj.install_jar with the last parameter true so it runs the commands in the .ddr and your functions are ready to use. There's a Hello, world example in the docs that shows more of how it's done.
Cheers,
-Chap

How to import modules in IPython Clusters

I am trying to import some of my personal modules into my IPython Clusters. I am using Anaconda on Windows Vista 64-bit.
from IPython.parallel import Client

rc = Client()
dview = rc[:]

with dview.sync_imports():
    import lib.rf
It is giving me this error:
No module named 'lib.rf'
I can import the module in the rest of my IPython notebook, as I have this .bat file to start ipython notebook:
cd C:\Users\Jon\workspace\bf
set PYTHONPATH=%PYTHONPATH%;C:\Users\Jon\workspace\bf
C:\Anaconda\envs\p33\scripts\ipython notebook
I am using this similar script to start my IPython clusters:
cd C:\Users\Jon\workspace\bf
set PYTHONPATH=%PYTHONPATH%;C:\Users\Jon\workspace\bf
C:\Anaconda\envs\p33\Scripts\ipcluster start --n=7
Why is this not working?
More info:
If I print out sys.path, I get a list that contains C:\Users\Jon\workspace\bf
If I print out the paths of my clusters, I get the same list:
%px sys.path
['',
'',
'',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\distribute-0.6.28-py3.3.egg',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\pykalman-0.9.5-py3.3.egg',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\patsy-0.2.1-py3.3.egg',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\joblib-0.8.3_r1-py3.3.egg',
'C:\\Users\\Jon\\workspace\\bf',
'C:\\Users\\Jon\\workspace\\bf\\my_numba',
'C:\\Anaconda\\envs\\p33\\python33.zip',
'C:\\Anaconda\\envs\\p33\\DLLs',
'C:\\Anaconda\\envs\\p33\\lib',
'C:\\Anaconda\\envs\\p33',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\Sphinx-1.2.3-py3.3.egg',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\win32',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\win32\\lib',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\Pythonwin',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\runipy-0.1.1-py3.3.egg',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\setuptools-7.0-py3.3.egg',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\IPython\\extensions']
Further analysis:
%px lib.__path__
Out[0:11]: _NamespacePath(['C:\\Anaconda\\envs\\p33\\lib\\site-packages\\win32\\lib'])
lib.__path__
Out[57]: ['.\\lib']
Looks like the ipcluster and notebook are looking at lib in different places. I have tried renaming lib to mylib. It has not helped.
It seems that the with dview.sync_imports() block is being run someplace other than your IPython Notebook environment and is therefore relying on a different PYTHONPATH. It is definitely not being run on one of the cluster engines, so I wouldn't expect it to pick up the PYTHONPATH settings you give the cluster.
I think you'll need to have that directory in the PYTHONPATH (not the PATH) of the calling Python environment, because that is the location from which you are importing the modules.
The impact of the bit you have about setting the PYTHONPATH in the DOS shell from which you invoke ipcluster isn't clear to me. I can see that one might expect this to let the engines know about your directory, but I'm wondering if that PYTHONPATH gets initialized to the environment from which you call IPython.parallel.Client.
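As a practical workaround (a sketch of my own, not from the original answer), you can push the project directory onto sys.path on every engine before calling sync_imports(), which sidesteps the question of which PYTHONPATH the engines inherited:
from IPython.parallel import Client

rc = Client()
dview = rc[:]

# Directory that contains the 'lib' package (taken from the question).
module_dir = r'C:\Users\Jon\workspace\bf'

# Prepend the directory to sys.path on every engine, then retry the import.
dview.execute("import sys; sys.path.insert(0, %r)" % module_dir, block=True)

with dview.sync_imports():
    import lib.rf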

How to access static resources in jar (that correspond to src/main/resources folder)?

I have a Spark Streaming application built with Maven (as jar) and deployed with the spark-submit script. The application project layout follows the standard directory layout:
myApp
  src
    main
      scala
        com.mycompany.package
          MyApp.scala
          DoSomething.scala
          ...
      resources
        aPerlScript.pl
        ...
    test
      scala
        com.mycompany.package
          MyAppTest.scala
          ...
  target
    ...
  pom.xml
In the DoSomething.scala object I have a method (let's call it doSomething()) that tries to execute a Perl script -- aPerlScript.pl, from the resources folder -- using scala.sys.process.Process, passing two arguments to the script (the first is the absolute path to a binary file used as input, the second is the path/name of the output file to produce). I then call DoSomething.doSomething().
The issue is that I am not able to access the script: not with absolute paths, not with relative paths, not with getClass.getClassLoader.getResource or getClass.getResource, even though I have specified the resources folder in my pom.xml. None of my attempts succeeded, and I don't know how to find the stuff I put in src/main/resources.
I will appreciate any help.
SIDE NOTES:
I use an external Process instead of a Spark pipe because, at this step of my workflow, I must handle binary files as input and output.
I'm using Spark-streaming 1.1.0, Scala 2.10.4 and Java 7. I build the jar with "Maven install" from within Eclipse (Kepler)
When I use the getClass.getClassLoader.getResource "standard" method to access resources I find that the actual classpath is the spark-submit script's one.
There are a few solutions. The simplest is to use Scala's process infrastructure:
import scala.sys.process._

object RunScript {
  val arg = "some argument"
  val stream = RunScript.getClass.getClassLoader.getResourceAsStream("aPerlScript.pl")
  val ret: Int = (s"/usr/bin/perl - $arg" #< stream).!
}
In this case, ret is the return code for the process and any output from the process is directed to stdout.
A second (longer) solution is to copy the file aPerlScript.pl from the jar file to some temporary location and execute it from there. This code snippet should have most of what you need.
import java.io.{File, FileOutputStream}
import java.nio.channels.Channels

object RunScript {
  // Set up copy destination from the Java temporary directory. This is /tmp on Linux.
  val destDir = System.getProperty("java.io.tmpdir") + "/"
  // Get a stream to the script in the resources dir
  val source = Channels.newChannel(RunScript.getClass.getClassLoader.getResourceAsStream("aPerlScript.pl"))
  val fileOut = new File(destDir, "aPerlScript.pl")
  val dest = new FileOutputStream(fileOut)
  // Copy file to temporary directory
  dest.getChannel.transferFrom(source, 0, Long.MaxValue)
  source.close()
  dest.close()

  // Schedule the file for deletion for when the JVM quits
  sys.addShutdownHook {
    new File(destDir, "aPerlScript.pl").delete
  }

  // Now you can execute the script.
}
This approach allows you to bundle native libraries in JAR files. Copying them out allows the libraries to be loaded at runtime for whatever JNI mischief you have planned.

java.lang.NoClassDefFoundError: Could not initialize class sun.nio.ch.FileChannelImpl

I am working on an application that executes a Jython 2.5.3 script from Java 1.6.0_27. The script just opens a file using the codecs library, and it looks like this:
try:
    from codecs import open as codecs_open
except ImportError:
    print 'ERROR', 'Could not import.'

CODECS_LIST = ['latin-1', 'utf-8', 'utf-16', '1250', '1252']

def open_file(filename, mode):
    '''
    DOC
    '''
    for encoding in CODECS_LIST:
        try:
            f = codecs_open(filename, mode, encoding)
            f.read()
            f.close()
            print 'INFO', "File %s supports encoding %s." % (filename.split("\\")[-1], encoding)
            ...
        except:
            ...
When I execute this script while debugging in Eclipse, everything works OK, but when I execute the part of the Java application that invokes this script, I get this error:
Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class sun.nio.ch.FileChannelImpl
at java.io.RandomAccessFile.getChannel(RandomAccessFile.java:253)
at org.python.core.io.FileIO.fromRandomAccessFile(FileIO.java:173)
at org.python.core.io.FileIO.<init>(FileIO.java:79)
at org.python.core.io.FileIO.<init>(FileIO.java:57)
at org.python.core.PyFile.<init>(PyFile.java:135)
at org.python.core.PyTraceback.getLine(PyTraceback.java:65)
at org.python.core.PyTraceback.tracebackInfo(PyTraceback.java:38)
at org.python.core.PyTraceback.dumpStack(PyTraceback.java:109)
at org.python.core.PyTraceback.dumpStack(PyTraceback.java:120)
at org.python.core.Py.displayException(Py.java:1080)
at org.python.core.PySystemState.excepthook(PySystemState.java:1242)
at org.python.core.PySystemStateFunctions.__call__(PySystemState.java:1421)
at org.python.core.Py.printException(Py.java:1053)
at org.python.core.Py.printException(Py.java:1012)
at org.python.util.jython.run(jython.java:264)
at org.python.util.jython.main(jython.java:129)
The Java application is able to execute other similar Jython scripts. I have found that the class sun.nio.ch.FileChannelImpl is in the library rt.jar, which is inside the /bin/common/ folder and is included in my classpath via the jvm.cfg file:
...
#LIBRARY PATH
./bin/common;...
...
This is the same way I include other libraries, and they work fine.
I have been stuck on this problem for a few days, so any help will be appreciated.
Thank you
The problem was resolved by reinstalling the Java Runtime Environment, in my case version jre-6u45.
This happened to me because the mysql package that I had installed was installed globally and required root privileges, so when running java, I had to include sudo to get it to work correctly.