Delete a file from HDFS using PySpark

I am trying to delete a file from HDFS using PySpark.
Note that HDFS and Spark are running in different Docker containers.
I'm trying to run: subprocess.call(["hadoop", "fs", "-rm", "-f", PATH])
The error I get is:
File not found
Traceback (most recent call last):
File "/spark/bin/testPyspark.py", line 63, in <module>
deleteDataset()
File "/spark/bin/testPyspark.py", line 14, in deleteDataset
subprocess.call(["hadoop", "fs", "-rm", "-f", PATH])
File "/usr/lib/python2.7/subprocess.py", line 172, in call
return Popen(*popenargs, **kwargs).wait()
File "/usr/lib/python2.7/subprocess.py", line 394, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1047, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
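The OSError here is raised by subprocess itself: [Errno 2] at this point usually means the hadoop executable could not be found on the PATH inside the Spark container, not that the HDFS file is missing. One way to avoid shelling out to the CLI entirely is to call the Hadoop FileSystem API through Spark's JVM gateway. A minimal sketch, assuming an active SparkSession named spark, that PATH holds the HDFS path to remove, and that fs.defaultFS in the Hadoop configuration points at the HDFS container:

# Delete an HDFS path via the Hadoop FileSystem API instead of the hadoop CLI.
# Assumes `spark` is an active SparkSession and PATH is the HDFS path to remove.
hadoop_conf = spark._jsc.hadoopConfiguration()              # Hadoop Configuration from the JVM
jvm = spark.sparkContext._jvm
fs = jvm.org.apache.hadoop.fs.FileSystem.get(hadoop_conf)   # filesystem resolved from fs.defaultFS
path = jvm.org.apache.hadoop.fs.Path(PATH)
if fs.exists(path):
    fs.delete(path, False)  # second argument: recursive delete

This goes through the internal _jsc/_jvm handles, so treat it as a sketch rather than a stable public API.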

Related

Running supervisord on a read-only filesystem

I'm trying to run supervisord on a read-only filesystem.
I have tried to stop supervisord from writing log and pid files using the following configuration:
[supervisord]
nodaemon=true
user=root
logfile=/dev/stdout
logfile_maxbytes=0
pidfile=/dev/null
However, when I attempt to start, I still receive the following error:
Traceback (most recent call last):
File "/usr/bin/supervisord", line 11, in <module>
load_entry_point('supervisor==3.3.4', 'console_scripts', 'supervisord')()
File "/usr/lib/python2.7/site-packages/supervisor/supervisord.py", line 349, in main
options = ServerOptions()
File "/usr/lib/python2.7/site-packages/supervisor/options.py", line 428, in __init__
existing_directory, default=tempfile.gettempdir())
File "/usr/lib/python2.7/tempfile.py", line 275, in gettempdir
tempdir = _get_default_tempdir()
File "/usr/lib/python2.7/tempfile.py", line 217, in _get_default_tempdir
("No usable temporary directory found in %s" % dirlist))
IOError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/']
Is it possible to start/run supervisord on a read-only filesystem?
Set the TMPDIR environment variable to point at a writable (rw) volume mount; the tempfile.gettempdir() call shown in the traceback consults TMPDIR (then TEMP and TMP) before falling back to the directories listed in the error message.
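For example (a sketch; the /scratch mount point is hypothetical and should be whatever writable volume you mount into the container):

TMPDIR=/scratch supervisord -c /etc/supervisord.conf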

docker-compose up returns OSError: Can not read file in context: .../data/mmaps/5332641.mmtile

I followed the instructions from here: http://www.azerothcore.org/wiki/Install-with-Docker
I used the v8 data.
When I run docker-compose up I get the following:
Building ac-worldserver
Traceback (most recent call last):
File "site-packages/docker/utils/build.py", line 97, in create_archive
File "tarfile.py", line 1972, in addfile
File "tarfile.py", line 250, in copyfileobj
File "tempfile.py", line 481, in func_wrapper
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "bin/docker-compose", line 6, in <module>
File "compose/cli/main.py", line 72, in main
File "compose/cli/main.py", line 128, in perform_command
File "compose/cli/main.py", line 1077, in up
File "compose/cli/main.py", line 1073, in up
File "compose/project.py", line 548, in up
File "compose/service.py", line 367, in ensure_image_exists
File "compose/service.py", line 1106, in build
File "site-packages/docker/api/build.py", line 160, in build
File "site-packages/docker/utils/build.py", line 31, in tar
File "site-packages/docker/utils/build.py", line 100, in create_archive
OSError: Can not read file in context: /home/azerothcore/wotlk/azerothcore-wotlk/docker/worldserver/data/mmaps/5332641.mmtile
[21981] Failed to execute script docker-compose
It is likely disk-space related; I had the same error, and the exception above it ("No space left on device") indicates the build ran out of disk space. It worked after clearing space; in my case it used over 10 GB.
I had the same error when I tried to mount a large file. The solution for me was to create a .dockerignore file containing the name of the directory where the large file was saved.
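As a sketch of that approach here (the exact entry is an assumption based on the path in the error message), a .dockerignore next to the Dockerfile in the build context could exclude the large map data from the context sent to the Docker daemon:

# .dockerignore in the build context (e.g. docker/worldserver/)
data/mmaps/

Note that files excluded this way are no longer available to COPY/ADD instructions, so this only helps if the data reaches the container some other way, for example via a volume mount.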

Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog'

I am unable to run Hive queries from PySpark.
I tried copying hive-site.xml into Spark's conf directory, but despite doing that it throws the same error.
Full error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/spark-2.4.0/python/pyspark/sql/context.py", line 358, in sql
return self.sparkSession.sql(sqlQuery)
File "/usr/local/spark-2.4.0/python/pyspark/sql/session.py", line 767, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/usr/local/spark-2.4.0/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/local/spark-2.4.0/python/pyspark/sql/utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':"
In my test with Oozie, I had to add the Hive-related jars that Spark needs. Try adding the same jars in Spark's conf.
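As a rough sketch of that idea in PySpark (the jar path below is a placeholder, and the exact jars depend on your Hive and Spark versions), you can add the jars and enable Hive support when building the session:

from pyspark.sql import SparkSession

# Placeholder jar path -- replace with the Hive jars matching your installation.
spark = (SparkSession.builder
         .appName("hive-test")
         .config("spark.jars", "/path/to/hive-jars/*.jar")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("SHOW DATABASES").show()

If the real cause is a metastore or connection problem rather than missing jars, the full stack trace behind the IllegalArgumentException should point to it.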

Apache Beam Python wordcount example errors on Windows 10

I am running Anaconda - conda virtual env with Python 2.7
I have followed Apache Beam Python SDK Quickstart
When I run:
'python -m apache_beam.examples.wordcount --input C:\Users\simon_6dagkya\OneDrive\ProgrammingCore\Apache Beam\examples\wordcount\kinglear.txt --output C:\Users\simon_6dagkya\OneDrive\ProgrammingCore\Apache Beam\examples\wordcount\output.txt'
I get the following error:
INFO:root:Missing pipeline option (runner). Executing pipeline using the default runner: DirectRunner.
Traceback (most recent call last):
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\site-packages\apache_beam\examples\wordcount.py", line 136, in <module>
run()
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\site-packages\apache_beam\examples\wordcount.py", line 90, in run
lines = p | 'read' >> ReadFromText(known_args.input)
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\site-packages\apache_beam\io\textio.py", line 524, in __init__
skip_header_lines=skip_header_lines)
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\site-packages\apache_beam\io\textio.py", line 119, in __init__
validate=validate)
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\site-packages\apache_beam\io\filebasedsource.py", line 121, in __init__
self._validate()
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\site-packages\apache_beam\options\value_provider.py", line 133, in _f
return fnc(self, *args, **kwargs)
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\site-packages\apache_beam\io\filebasedsource.py", line 181, in _validate
'No files found based on the file pattern %s' % pattern)
IOError: No files found based on the file pattern C:\Users\simon_6dagkya\OneDrive\ProgrammingCore\Apache
Any help most appreciated.
IOError: No files found based on the file pattern C:\Users\simon_6dagkya\OneDrive\ProgrammingCore\Apache
Your input path has a space in it ("Apache Beam"), so everything after the space is treated as a separate argument and the file pattern is truncated, as the error message shows. Wrap the --input and --output paths in quotes.
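Concretely, the same command with the paths quoted:

python -m apache_beam.examples.wordcount --input "C:\Users\simon_6dagkya\OneDrive\ProgrammingCore\Apache Beam\examples\wordcount\kinglear.txt" --output "C:\Users\simon_6dagkya\OneDrive\ProgrammingCore\Apache Beam\examples\wordcount\output.txt"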

IPython Notebook (Jupyter) Bash Kernel raising FileNotFoundError with Python3

I recently upgraded to Anaconda Python 3, but now when I try to launch a new notebook with the bash kernel I get the traceback below, indicating that it is still looking for my previous Python interpreter. I'm not sure how this can be updated to point to my new Python in the anaconda3 folder. Any help would be appreciated.
[E 10:40:58.086 NotebookApp] Unhandled error in API request
Traceback (most recent call last):
File "/ebs/anaconda3/lib/python3.4/site-packages/IPython/html/base/handlers.py", line 365, in wrapper
result = yield gen.maybe_future(method(self, *args, **kwargs))
File "/ebs/anaconda3/lib/python3.4/site-packages/IPython/html/services/sessions/handlers.py", line 53, in post
model = sm.create_session(path=path, kernel_name=kernel_name)
File "/ebs/anaconda3/lib/python3.4/site-packages/IPython/html/services/sessions/sessionmanager.py", line 66, in create_session
kernel_name=kernel_name)
File "/ebs/anaconda3/lib/python3.4/site-packages/IPython/html/services/kernels/kernelmanager.py", line 84, in start_kernel
kernel_name=kernel_name, **kwargs)
File "/ebs/anaconda3/lib/python3.4/site-packages/IPython/kernel/multikernelmanager.py", line 112, in start_kernel
km.start_kernel(**kwargs)
File "/ebs/anaconda3/lib/python3.4/site-packages/IPython/kernel/manager.py", line 240, in start_kernel
**kw)
File "/ebs/anaconda3/lib/python3.4/site-packages/IPython/kernel/manager.py", line 189, in _launch_kernel
return launch_kernel(kernel_cmd, **kw)
File "/ebs/anaconda3/lib/python3.4/site-packages/IPython/kernel/launcher.py", line 213, in launch_kernel
proc = Popen(cmd, **kwargs)
File "/ebs/anaconda3/lib/python3.4/subprocess.py", line 859, in __init__
restore_signals, start_new_session)
File "/ebs/anaconda3/lib/python3.4/subprocess.py", line 1457, in _execute_child
raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: '/root/anaconda/bin/python'
Use a symbolic link to achieve this. ln -s takes the target first and the link name second, so create the link at the old path and point it at the new binary:
ln -s /path/to/new/binary /path/to/old/binary
If you receive an insufficient-permissions error, prepend sudo to the command.
In your case, create /root/anaconda/bin/python as a symlink to the Python binary in your anaconda3 folder.
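A concrete sketch, assuming the new interpreter is at /ebs/anaconda3/bin/python3 (check the real path first, e.g. with ls /ebs/anaconda3/bin/):

# create the path the kernel manager expects as a link to the new interpreter
# (you may need to create /root/anaconda/bin first with mkdir -p)
sudo ln -s /ebs/anaconda3/bin/python3 /root/anaconda/bin/python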