How to add custom JDBC dialects in PySpark - scala

I have a custom JDBC dialect in Scala, which works flawlessly through the registerDialect method in the Scala Spark API. I was hoping to use the same class in PySpark by accessing it through
sc._jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(sc._jvm.com.me.MyJDBCDialect)
But I receive this error message:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1124, in __call__
File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1094, in _build_args
File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 289, in get_command_part
File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1363, in __getattr__
py4j.protocol.Py4JError: com.me.MyJDBCDialect._get_object_id does not exist in the JVM
I'm totally unfamiliar with Py4J, but it sounds like the _get_object_id error is raised because sc._jvm.com.me.MyJDBCDialect is a Python object and I'm trying to pass it to sc._jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect, which is a Java(?) construct. How do I get around this problem?

This worked for me:
Make sure that your dialect is declared as a class, not an object, then register it through the Py4J gateway:
from py4j.java_gateway import java_import

gw = spark.sparkContext._gateway
java_import(gw.jvm, "com.me.MyJDBCDialect")
gw.jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(
    gw.jvm.com.me.MyJDBCDialect())
Note the () - it calls the class constructor for your dialect.
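Once registered, any JDBC read whose URL matches the dialect's canHandle prefix should pick it up. A minimal sketch of such a read; the URL, table name, and driver class below are placeholders, not taken from the question:
# Hypothetical read that would exercise the registered dialect.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:mydb://host:1234/dbname")  # placeholder URL matching the dialect's prefix
      .option("dbtable", "some_table")                # placeholder table
      .option("driver", "com.me.Driver")              # placeholder driver class
      .load())
df.printSchema()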

Related

How to get Metadata in python ONVIF?

Currently, I'm looking for metadata access functions in python-onvif.
I want to get the coordinates of "BoundingBox" inside the red box.
How do I access this data?
https://www.onvif.org/ver20/analytics/wsdl/analytics.wsdl#op.GetSupportedMetadata
I used this operation, but an AttributeError occurred:
from onvif import ONVIFCamera
cam = ONVIFCamera('192.168.100.133', 80, 'ID', 'P/W')
cam.create_analytics_service()
meta = cam.analytics.GetSupportedMetadata()
print(meta)
result:
Traceback (most recent call last):
File "C:\Users\User\anaconda3\envs\py310\lib\site-packages\zeep\proxy.py", line 97, in __getitem__
return self._operations[key]
KeyError: 'GetSupportedMetadata'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\User\AppData\Roaming\JetBrains\PyCharmCE2021.2\scratches\meta_data.py", line 24, in <module>
meta = cam.analytics.GetSupportedMetadata()
File "C:\Users\User\anaconda3\envs\py310\lib\site-packages\onvif\client.py", line 167, in __getattr__
return self.service_wrapper(getattr(self.ws_client, name))
File "C:\Users\User\anaconda3\envs\py310\lib\site-packages\zeep\proxy.py", line 88, in __getattr__
return self[key]
File "C:\Users\User\anaconda3\envs\py310\lib\site-packages\zeep\proxy.py", line 99, in __getitem__
raise AttributeError("Service has no operation %r" % key)
AttributeError: Service has no operation 'GetSupportedMetadata'
I need your help.
I'm not using exactly the same onvif package you are, and I'm not entirely sure myself how to achieve this, but this is what I've got so far:
Both the python-onvif package you are using and the Valkka-inspired implementation I'm using rely on a WSDL folder that contains fairly old versions of the ONVIF operations. It seems we were both using version 2.2, while the current version is 20.12.
So what I did was download the newer versions from their repository and replace the contents of the WSDL folder with the contents of the new WSDL folder.
I also had to adjust how the paths were built, since there is now some folder hierarchy in place to reach the WSDL files, but after that I was able to call GetSupportedMetadata successfully.
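If the client you are using lets you point it at a different WSDL directory (python-onvif's ONVIFCamera accepts a wsdl_dir argument), something like the following should then work; the directory path is a placeholder for wherever you unpacked the newer WSDL files:
from onvif import ONVIFCamera

# wsdl_dir points at the directory holding the updated WSDL files (placeholder path).
cam = ONVIFCamera('192.168.100.133', 80, 'ID', 'P/W',
                  wsdl_dir='/path/to/new/wsdl')
cam.create_analytics_service()
meta = cam.analytics.GetSupportedMetadata()
print(meta)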

Why does StringIndexer have no outputCols?

I am using Apache Zeppelin. My Anaconda version is conda 4.8.4 and my Spark version is:
%spark2.pyspark
spark.version
u'2.3.1.3.0.1.0-187'
When I run my code, it throws the following error:
Exception AttributeError: "'StringIndexer' object has no attribute '_java_obj'" in <object repr() failed> ignored
Fail to execute line 4: indexerFeatures = StringIndexer(inputCols=catColumns, outputCols=catIndexedColumns, handleInvalid="keep")
Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-66369397479549554.py", line 375, in <module>
exec(code, _zcUserQueryNameSpace)
File "<stdin>", line 4, in <module>
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/__init__.py", line 105, in wrapper
return func(self, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'outputCols'
I ran the same code in Databricks and everything worked fine. I also checked the imported StringIndexer with the help() function and it didn't include the outputCols argument.
It should be outputCol, not outputCols.
For Spark 2.3.1, you can refer to: https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.feature.StringIndexer
class pyspark.ml.feature.StringIndexer(inputCol=None, outputCol=None, handleInvalid='error', stringOrderType='frequencyDesc')
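Multi-column support (inputCols/outputCols) only arrived in later Spark releases, so on 2.3.1 a common workaround is to build one StringIndexer per column and chain them in a Pipeline. A minimal sketch; the column names and the DataFrame df are placeholders:
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer

catColumns = ["colA", "colB"]                          # placeholder column names
catIndexedColumns = [c + "_idx" for c in catColumns]

# One indexer per column, since 2.3.1 only supports the singular inputCol/outputCol.
indexers = [
    StringIndexer(inputCol=c, outputCol=o, handleInvalid="keep")
    for c, o in zip(catColumns, catIndexedColumns)
]
pipeline = Pipeline(stages=indexers)
# model = pipeline.fit(df)
# indexed = model.transform(df)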

regarding a py4j exception

I have installed Java 11 and Python 3 on CentOS. I'm trying to run code that worked perfectly fine in a Windows environment. I'm getting this exception:
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/usr/lib/python3.4/site-packages/py4j/java_gateway.py", line 1188, in
send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.4/site-packages/py4j/java_gateway.py", line 1014, in send_command
response = connection.send_command(command)
File "/usr/lib/python3.4/site-packages/py4j/java_gateway.py", line 1193, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
File "WordInformation.py", line 493, in
status = read_from_source("../Corpora/Bhandarkar Oriental Research Books")
File "WordInformation.py", line 473, in read_from_source
author, year)
File "WordInformation.py", line 381, in fetch_from_hwn
return read_store_properties(word, file, sentence, source, category, author,
year);
File "WordInformation.py", line 79, in read_store_properties
properties["synsets"] = get_other_props(word)
File "WordInformation.py", line 226, in get_other_props
output = gateway.jvm.Properties.getProperties(word)
File "/usr/lib/python3.4/site-packages/py4j/java_gateway.py", line 1286, in
call
answer, self.gateway_client, self.target_id, self.name)
File "/usr/lib/python3.4/site-packages/py4j/protocol.py", line 336, in
get_return_value
format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling
z:in.ac.iitb.cfilt.jhwnl.examples.Properties.getProperties
Initialised the gateway as follows:
gateway = JavaGateway.launch_gateway(classpath="/home/gayatri/Code/hindiwn.jar")
Is this because of some dependency? I have set the JAVA_HOME and updated the PATH variable.
I don't have the reputation to comment, but "Answer from Java side is empty" indicates that the Java code is not reachable.
Just to verify the basic steps:
1) Be sure that the Java program is running.
2) Be sure that you run the Python script after the Java code is running.
3) Be sure the Java program keeps running the entire time.
If you are doing those things, then the issue may be that the operating system is already using the port.
You can try fixing the port explicitly on both sides.
Java:
GatewayServer server = new GatewayServerBuilder().javaPort(1001).build();
server.start();
Python:
from py4j.java_gateway import JavaGateway, GatewayParameters

java = JavaGateway(gateway_parameters=GatewayParameters(port=1001))
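As a quick way to confirm the Python side can actually reach the JVM (steps 1-3 above), a small sanity check; it assumes the Java gateway from the snippet above is already listening on port 1001:
from py4j.java_gateway import JavaGateway, GatewayParameters

java = JavaGateway(gateway_parameters=GatewayParameters(port=1001))
# A trivial call into the JVM; if this raises Py4JNetworkError,
# the gateway is not reachable on that port.
print(java.jvm.java.lang.System.getProperty("java.version"))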

Issue running psycopg2 inside AWS Lambda Function

I'm getting the following error when trying to run psycopg2 in an AWS Lambda function:
/var/task/functions/../vendored/psycopg2/_psycopg.so: ELF file's phentsize not the expected size: ImportError
Traceback (most recent call last):
File "/var/task/functions/refresh_mv.py", line 64, in execute
session = SessionFactoryGraphQL.get_session(app=item['app'])
File "/var/task/lib/session_factory.py", line 22, in get_session
engine = create_engine(conn_string, poolclass=NullPool)
File "/var/task/functions/../vendored/sqlalchemy/engine/__init__.py", line 387, in create_engine
return strategy.create(*args, **kwargs)
File "/var/task/functions/../vendored/sqlalchemy/engine/strategies.py", line 80, in create
dbapi = dialect_cls.dbapi(**dbapi_args)
File "/var/task/functions/../vendored/sqlalchemy/dialects/postgresql/psycopg2.py", line 554, in dbapi
import psycopg2
File "/var/task/functions/../vendored/psycopg2/__init__.py", line 50, in <module>
from psycopg2._psycopg import ( # noqa
ImportError: /var/task/functions/../vendored/psycopg2/_psycopg.so: ELF file's phentsize not the expected size
The weird thing is: everything was working fine until yesterday (for more than 5 months), and it suddenly stopped working. None of the libraries have been updated.
I tried to build from scratch, as in https://github.com/jkehler/awslambda-psycopg2, but I'm still getting the same error.
Can someone help me with it?
The problem is in the latest version of the Serverless Framework. I assume that you are using Serverless to deploy your Lambda function. Remove the deployed service and pin an older release:
serverless remove
npm install serverless@1.20.2 -g
This should work.

PicklingError caused by p4a.calendar.interfaces.ICalendarSupport when migrating from 3.3.5 to 4.0.7

I am trying to migrate a Plone 3.3.5 installation to 4.0.7.
Before migrating, it was decided that Plone4Artists Calendar should be removed.
We followed the procedure proposed here: http://plone.org/documentation/kb/cleaning-p4a, applying it to each of the following interfaces (sketched just after the list):
from p4a.calendar.interfaces import ICalendarEnhanced
from p4a.calendar.interfaces import IPossibleCalendar
from p4a.calendar.interfaces import ICalendarConfig
from p4a.calendar.interfaces import IEventProvider
from p4a.calendar.interfaces import IEvent
from p4a.calendar.interfaces import IBasicCalendarSupport
from p4a.calendar.interfaces import ICalendarSupport
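For reference, the removal step in that procedure presumably boils down to walking the objects that still provide each marker interface and dropping it. A rough sketch, not the exact kb script; the portal_catalog query and the portal variable are assumptions:
from zope.interface import noLongerProvides
from p4a.calendar.interfaces import ICalendarSupport

# Sketch: find objects still providing the marker interface and remove it.
catalog = portal.portal_catalog   # "portal" is the Plone site object (assumption)
for brain in catalog(object_provides=ICalendarSupport.__identifier__):
    obj = brain.getObject()
    noLongerProvides(obj, ICalendarSupport)       # raises if not directly provided
    obj.reindexObject(idxs=['object_provides'])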
Apparently, it works (except for the IPossibleCalendar interface, for which I get a "Can only remove directly provided interfaces." message). In order to remove that, we uninstalled the product (following good advice from here).
Now, after moving the Data.fs from 3.3.5 to the 4.0.7 install and trying to activate the migration at /Plone/@@plone-upgrade, I get the following traceback:
Traceback (most recent call last):
File "d:\plone-4.0.7-teste-20110927\buildout-cache\eggs\plone-4.0.7-py2.6.egg\Products\CMFPlone\MigrationTool.py", line 175, in upgrade
step['step'].doStep(setup)
File "d:\plone-4.0.7-teste-20110927\buildout-cache\eggs\products.genericsetup-1.6.3-py2.6.egg\Products\GenericSetup\upgrade.py", line 142, in doStep
self.handler(tool)
File "d:\plone-4.0.7-teste-20110927\buildout-cache\eggs\plone.app.upgrade-1.0.7-py2.6.egg\plone\app\upgrade\v40\alphas.py", line 443, in migrateFolders
transaction.savepoint(optimistic=True)
File "d:\plone-4.0.7-teste-20110927\buildout-cache\eggs\transaction-1.0.1-py2.6.egg\transaction\_manager.py", line 99, in savepoint
return self.get().savepoint(optimistic)
File "d:\plone-4.0.7-teste-20110927\buildout-cache\eggs\transaction-1.0.1-py2.6.egg\transaction\_transaction.py", line 253, in savepoint
self._saveAndRaiseCommitishError() # reraises!
File "d:\plone-4.0.7-teste-20110927\buildout-cache\eggs\transaction-1.0.1-py2.6.egg\transaction\_transaction.py", line 250, in savepoint
savepoint = Savepoint(self, optimistic, *self._resources)
File "d:\plone-4.0.7-teste-20110927\buildout-cache\eggs\transaction-1.0.1-py2.6.egg\transaction\_transaction.py", line 647, in __init__
savepoint = savepoint()
File "d:\plone-4.0.7-teste-20110927\buildout-cache\eggs\zodb3-3.9.5-py2.6-win32.egg\ZODB\Connection.py", line 1128, in savepoint
self._commit(None)
File "d:\plone-4.0.7-teste-20110927\buildout-cache\eggs\zodb3-3.9.5-py2.6-win32.egg\ZODB\Connection.py", line 606, in _commit
self._store_objects(ObjectWriter(obj), transaction)
File "d:\plone-4.0.7-teste-20110927\buildout-cache\eggs\zodb3-3.9.5-py2.6-win32.egg\ZODB\Connection.py", line 640, in _store_objects
p = writer.serialize(obj) # This calls __getstate__ of obj
File "d:\plone-4.0.7-teste-20110927\buildout-cache\eggs\zodb3-3.9.5-py2.6-win32.egg\ZODB\serialize.py", line 422, in serialize
return self._dump(meta, obj.__getstate__())
File "d:\plone-4.0.7-teste-20110927\buildout-cache\eggs\zodb3-3.9.5-py2.6-win32.egg\ZODB\serialize.py", line 431, in _dump
self._p.dump(state)
PicklingError: Can't pickle <class 'p4a.calendar.interfaces.ICalendarSupport'>: import of module p4a.calendar.interfaces failed
Please note that this is a test run: it is a bogus and clean Data.fs, and the only product involved is P4A Calendar, which was installed (and uninstalled before moving the Data.fs to the 4.0.7 install) only in the 3.3.5 buildout.
I also tried to follow the advice from here, and the SiteManager for the Plone object does not have any adapter or subscriber with the "p4a" string. The __dict__ attribute of the global SiteManager does not report any occurrence of the "p4a" string either.
Can anyone give me a light on this? Where should I look for "p4a.calendar.interfaces.ICalendarSupport" in the Data.fs?