Apache Beam Python wordcount example errors on Windows10 - apache-beam

I am running Anaconda with a conda virtual env on Python 2.7.
I have followed the Apache Beam Python SDK Quickstart.
When I run:
'python -m apache_beam.examples.wordcount --input C:\Users\simon_6dagkya\OneDrive\ProgrammingCore\Apache Beam\examples\wordcount\kinglear.txt --output C:\Users\simon_6dagkya\OneDrive\ProgrammingCore\Apache Beam\examples\wordcount\output.txt'
I get the following error:
INFO:root:Missing pipeline option (runner). Executing pipeline using the default runner: DirectRunner.
Traceback (most recent call last):
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\site-packages\apache_beam\examples\wordcount.py", line 136, in <module>
run()
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\site-packages\apache_beam\examples\wordcount.py", line 90, in run
lines = p | 'read' >> ReadFromText(known_args.input)
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\site-packages\apache_beam\io\textio.py", line 524, in __init__
skip_header_lines=skip_header_lines)
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\site-packages\apache_beam\io\textio.py", line 119, in __init__
validate=validate)
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\site-packages\apache_beam\io\filebasedsource.py", line 121, in __init__
self._validate()
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\site-packages\apache_beam\options\value_provider.py", line 133, in _f
return fnc(self, *args, **kwargs)
File "C:\Users\simon_6dagkya\Anaconda3\envs\apachebeam\lib\site-packages\apache_beam\io\filebasedsource.py", line 181, in _validate
'No files found based on the file pattern %s' % pattern)
IOError: No files found based on the file pattern C:\Users\simon_6dagkya\OneDrive\ProgrammingCore\Apache
Any help most appreciated.

IOError: No files found based on the file pattern C:\Users\simon_6dagkya\OneDrive\ProgrammingCore\Apache
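Your input path has a space in it (in "Apache Beam"), so it is cut off at the space, as the truncated pattern in the error shows. Put double quotes around the --input and --output paths.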
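To illustrate why the pattern in the error stops at "Apache", here is a minimal Python sketch of how an unquoted space turns one path into two arguments. It assumes the example script reads --input and --output with argparse's parse_known_args; the helper name parse_wordcount_args and the short paths are made up for illustration.

```python
import argparse


def parse_wordcount_args(argv):
    """Parse --input/--output roughly the way a wordcount-style script might."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--input')
    parser.add_argument('--output')
    # parse_known_args returns (namespace, leftover_args) instead of failing
    # on tokens it does not recognise.
    return parser.parse_known_args(argv)


# Unquoted: the shell hands Python two tokens for the one path.
unquoted = ['--input', r'C:\data\Apache', r'Beam\kinglear.txt']
# Quoted: the whole path arrives as a single token.
quoted = ['--input', r'C:\data\Apache Beam\kinglear.txt']

args_unquoted, leftover_unquoted = parse_wordcount_args(unquoted)
args_quoted, leftover_quoted = parse_wordcount_args(quoted)

print(args_unquoted.input, leftover_unquoted)
print(args_quoted.input, leftover_quoted)
```

On the command line this corresponds to writing --input "C:\Users\simon_6dagkya\OneDrive\ProgrammingCore\Apache Beam\examples\wordcount\kinglear.txt", and likewise for --output.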

Related

Dataflow command doesn't execute

I'm trying to create a Dataflow pipeline, but I get this error in my PowerShell console:
Traceback (most recent call last):
File "load_pole_emploi.py", line 107, in <module>
run()
File "load_pole_emploi.py", line 92, in run
gcs_bucket_name + file_pattern)
File "E:\Utilisateurs\SI084342\venv\gcp\lib\site-packages\apache_beam\io\textio.py", line 675, in __init__
escapechar=escapechar)
File "E:\Utilisateurs\SI084342\venv\gcp\lib\site-packages\apache_beam\io\textio.py", line 132, in __init__
validate=validate)
File "E:\Utilisateurs\SI084342\venv\gcp\lib\site-packages\apache_beam\io\filebasedsource.py", line 124, in __init__
self._validate()
File "E:\Utilisateurs\SI084342\venv\gcp\lib\site-packages\apache_beam\options\value_provider.py", line 193, in _f
return fnc(self, *args, **kwargs)
File "E:\Utilisateurs\SI084342\venv\gcp\lib\site-packages\apache_beam\io\filebasedsource.py", line 185, in _validate
match_result = FileSystems.match([pattern], limits=[1])[0]
File "E:\Utilisateurs\SI084342\venv\gcp\lib\site-packages\apache_beam\io\filesystems.py", line 203, in match
filesystem = FileSystems.get_filesystem(patterns[0])
File "E:\Utilisateurs\SI084342\venv\gcp\lib\site-packages\apache_beam\io\filesystems.py", line 106, in get_filesystem
'e.g., pip install apache-beam[gcp]. Path specified: %s' % path)
ValueError: Unable to get filesystem from specified path, please use the correct path or ensure the required dependency is installed, e.g., pip install apache-beam[gcp]. Path specified: gs://bck-fr-fichiers-manuel-dev/de_par_categorie_et_code_rome/file.csv
I have reinstalled apache-beam[gcp], but the problem remains.
Any help would be appreciated, thanks.
Thank you, it turned out to be related to the missing Apache Beam test and docs.
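One way to narrow down this kind of error is to check, from the same interpreter that runs the pipeline, whether the relevant modules import at all. The sketch below is only an illustration: try_import is a hypothetical helper, and the module path apache_beam.io.gcp.gcsfilesystem is my assumption about where gs:// support lives in the installed version.

```python
import importlib


def try_import(module_name):
    """Return (True, None) if the module imports, else (False, error text)."""
    try:
        importlib.import_module(module_name)
        return True, None
    except ImportError as exc:
        return False, str(exc)


# Assumed module names: adjust to the packages your pipeline relies on.
for name in ['apache_beam', 'apache_beam.io.gcp.gcsfilesystem']:
    ok, error = try_import(name)
    print('%s: %s' % (name, 'OK' if ok else 'FAILED: %s' % error))
```

If a module fails here, the error text usually names the dependency that is missing from that environment.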

ckan.plugins.core.PluginNotFoundException: pdf_view Error

I installed CKAN 2.8.5 from package on Ubuntu 16.04. I activated ckanext-pdfview. Everything is OK, but when I run paster commands I receive the error below.
Traceback (most recent call last):
File "/usr/lib/ckan/default/bin/paster", line 8, in <module>
sys.exit(run())
File "/usr/lib/ckan/default/lib/python2.7/site-packages/paste/script/command.py", line 102, in run
invoke(command, command_name, options, args[1:])
File "/usr/lib/ckan/default/lib/python2.7/site-packages/paste/script/command.py", line 141, in invoke
exit_code = runner.run(args)
File "/usr/lib/ckan/default/lib/python2.7/site-packages/paste/script/command.py", line 236, in run
result = self.command()
File "/usr/lib/ckan/default/src/ckan/ckan/lib/cli.py", line 1108, in command
self._load_config()
File "/usr/lib/ckan/default/src/ckan/ckan/lib/cli.py", line 330, in _load_config
self.site_user = load_config(self.options.config, load_site_user)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/cli.py", line 237, in load_config
load_environment(conf.global_conf, conf.local_conf)
File "/usr/lib/ckan/default/src/ckan/ckan/config/environment.py", line 116, in load_environment
p.load_all()
File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 140, in load_all
load(*plugins)
File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 154, in load
service = _get_service(plugin)
File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 257, in _get_service
raise PluginNotFoundException(plugin_name)
ckan.plugins.core.PluginNotFoundException: pdf_view
I found the solution. Inside the virtual environment, I ran pip using its full path: first I found the path with the which pip command in the virtual env, and then I ran the pip command with that path.
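The same check can be sketched in Python: find which pip is first on PATH and see whether it lives inside the active environment. This is a rough sketch, not part of CKAN; find_on_path and is_inside are hypothetical helpers, and it assumes a Unix-like layout where the environment root is sys.prefix.

```python
import os
import sys


def find_on_path(name):
    """Return the first executable called `name` on PATH, or None."""
    for directory in os.environ.get('PATH', '').split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def is_inside(path, prefix):
    """Return True if `path` is located under the directory `prefix`."""
    path = os.path.abspath(path)
    prefix = os.path.abspath(prefix)
    return path == prefix or path.startswith(prefix.rstrip(os.sep) + os.sep)


pip_path = find_on_path('pip')
print('interpreter: %s' % sys.executable)
print('pip on PATH: %s' % pip_path)
if pip_path is not None:
    print('pip belongs to this environment: %s' % is_inside(pip_path, sys.prefix))
```

If the last line prints False, packages installed with a bare pip command land outside the environment that paster runs in, which would be consistent with the plugin not being found.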

Librabbitmq 2.0.0 with Python 3 gives TypeError: can't pickle memoryview objects

I am using the latest master branch of the git repo https://github.com/celery/librabbitmq and installing librabbitmq==2.0.0 for Python 3.6 by following the instructions in the readme
Using the development version
You can clone the repository by doing the following:
$ git clone git://github.com/celery/librabbitmq.git
Then install it by doing the following:
$ cd librabbitmq
$ make install # or make develop
This works fine (after installing certain binaries for C compilation in the OS), but when I then create a small add task (a + b) and call it with add.delay(2,2), it fails with the following error. I looked it up and saw that Celery 4 uses json as the serializer, so clearly it is not because of pickle serialization.
Changing the broker from librabbitmq to pyamqp works normally.
The exact same situation occurs on both macOS and Ubuntu 16.
[2018-04-30 23:40:02,956: CRITICAL/MainProcess] Unrecoverable error: SystemError(' returned a result with an error set',)
Traceback (most recent call last):
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/kombu/messaging.py", line 624, in _receive_callback
return on_m(message) if on_m else self.receive(decoded, message)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 570, in on_task_received
callbacks,
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/strategy.py", line 145, in task_message_handler
handle(req)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/worker.py", line 221, in _process_task_sem
return self._quick_acquire(self._process_task, req)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/kombu/async/semaphore.py", line 62, in acquire
callback(*partial_args, **partial_kwargs)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/worker.py", line 226, in _process_task
req.execute_using_pool(self.pool)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/request.py", line 531, in execute_using_pool
correlation_id=task_id,
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/concurrency/base.py", line 155, in apply_async
**options)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/billiard/pool.py", line 1486, in apply_async
self._quick_put((TASK, (result._job, None, func, args, kwds)))
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/concurrency/asynpool.py", line 813, in send_job
body = dumps(tup, protocol=protocol)
TypeError: can't pickle memoryview objects
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/worker.py", line 203, in start
self.blueprint.start(self)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/bootsteps.py", line 370, in start
return self.obj.start()
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 320, in start
blueprint.start(self)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 596, in start
c.loop(*c.loop_args())
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/loops.py", line 88, in asynloop
next(loop)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/kombu/async/hub.py", line 354, in create_loop
cb(*cbargs)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/kombu/transport/base.py", line 236, in on_readable
reader(loop)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/kombu/transport/base.py", line 218, in _read
drain_events(timeout=0)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/librabbitmq-2.0.0-py3.6-macosx-10.6-intel.egg/librabbitmq/__init__.py", line 227, in drain_events
self._basic_recv(timeout)
SystemError: returned a result with an error set
This library is not recommended as the RabbitMQ transport for Celery. Please try py-amqp instead; it is better maintained and less buggy.
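A minimal sketch of switching transports through the broker URL. As far as I know, a plain amqp:// URL lets kombu pick librabbitmq when that package is installed, while pyamqp:// selects py-amqp explicitly. The helper force_pyamqp and the example credentials are hypothetical.

```python
def force_pyamqp(broker_url):
    """Rewrite an amqp:// or librabbitmq:// broker URL to use pyamqp://."""
    scheme, sep, rest = broker_url.partition('://')
    if sep and scheme in ('amqp', 'librabbitmq'):
        return 'pyamqp://' + rest
    # Any other scheme (or a string without '://') is returned unchanged.
    return broker_url


broker = force_pyamqp('amqp://guest:guest@localhost:5672//')
print(broker)

# With Celery installed, the URL would be passed when creating the app, e.g.:
#   from celery import Celery
#   app = Celery('tasks', broker=broker)
```

Uninstalling librabbitmq from the environment should have a similar effect, since amqp:// then has nothing else to fall back to.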

IPython Notebook (Jupyter) Bash Kernel raising FileNotFoundError with Python3

I recently upgraded to Anaconda Python 3, but now when I try to launch a new notebook with the bash kernel I get the traceback below, which indicates that it is still looking for my previous Python interpreter. I am not sure how to update this to point to the new Python in my anaconda3 folder. Any help would be appreciated.
[E 10:40:58.086 NotebookApp] Unhandled error in API request
Traceback (most recent call last):
File "/ebs/anaconda3/lib/python3.4/site-packages/IPython/html/base/handlers.py", line 365, in wrapper
result = yield gen.maybe_future(method(self, *args, **kwargs))
File "/ebs/anaconda3/lib/python3.4/site-packages/IPython/html/services/sessions/handlers.py", line 53, in post
model = sm.create_session(path=path, kernel_name=kernel_name)
File "/ebs/anaconda3/lib/python3.4/site-packages/IPython/html/services/sessions/sessionmanager.py", line 66, in create_session
kernel_name=kernel_name)
File "/ebs/anaconda3/lib/python3.4/site-packages/IPython/html/services/kernels/kernelmanager.py", line 84, in start_kernel
kernel_name=kernel_name, **kwargs)
File "/ebs/anaconda3/lib/python3.4/site-packages/IPython/kernel/multikernelmanager.py", line 112, in start_kernel
km.start_kernel(**kwargs)
File "/ebs/anaconda3/lib/python3.4/site-packages/IPython/kernel/manager.py", line 240, in start_kernel
**kw)
File "/ebs/anaconda3/lib/python3.4/site-packages/IPython/kernel/manager.py", line 189, in _launch_kernel
return launch_kernel(kernel_cmd, **kw)
File "/ebs/anaconda3/lib/python3.4/site-packages/IPython/kernel/launcher.py", line 213, in launch_kernel
proc = Popen(cmd, **kwargs)
File "/ebs/anaconda3/lib/python3.4/subprocess.py", line 859, in __init__
restore_signals, start_new_session)
File "/ebs/anaconda3/lib/python3.4/subprocess.py", line 1457, in _execute_child
raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: '/root/anaconda/bin/python'
Use a symbolic link so that the old path points to the new interpreter:
ln -s /path/to/new/binary /path/to/old/binary
If you receive an insufficient permissions error, prepend sudo to the command.
In your case, create the link at /root/anaconda/bin/python and point it to the python binary in your anaconda3 folder.
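The argument order is easy to get backwards, so here is a small Python sketch of the same idea using os.symlink in a throwaway directory. The helper link_old_path_to_new and the directory names are hypothetical stand-ins, not the real paths.

```python
import os
import tempfile


def link_old_path_to_new(new_binary, old_path):
    """Create a symlink at old_path that points to the existing new_binary."""
    parent = os.path.dirname(old_path)
    if not os.path.isdir(parent):
        os.makedirs(parent)
    # os.symlink(src, dst): dst is the link that gets created, src is its target.
    os.symlink(new_binary, old_path)


# Demonstrate in a temporary directory rather than touching real paths.
root = tempfile.mkdtemp()
new_binary = os.path.join(root, 'anaconda3', 'bin', 'python')
old_path = os.path.join(root, 'anaconda', 'bin', 'python')
os.makedirs(os.path.dirname(new_binary))
open(new_binary, 'w').close()  # empty file standing in for the interpreter

link_old_path_to_new(new_binary, old_path)
print(os.path.islink(old_path), os.path.realpath(old_path))
```

The link is created at the old location (the one the kernel still asks for) and resolves to the file that actually exists.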

how to properly resolve gsutil error

I just installed gsutil on OS X, following Google's instructions exactly, and am seeing errors of the following format when running any gsutil command:
Traceback (most recent call last):
File "/Users//gsutil/gsutil", line 22, in <module>
gsutil.RunMain()
File "/Users//gsutil/gsutil.py", line 101, in RunMain
sys.exit(gslib.__main__.main())
File "/Users//gsutil/gslib/__main__.py", line 175, in main
command_runner = CommandRunner()
File "/Users//gsutil/gslib/command_runner.py", line 107, in __init__
self.command_map = self._LoadCommandMap()
File "/Users//gsutil/gslib/command_runner.py", line 113, in _LoadCommandMap
__import__('gslib.commands.%s' % module_name)
File "/Users//gsutil/gslib/commands/disablelogging.py", line 16, in <module>
from gslib.command import COMMAND_NAME
ImportError: cannot import name COMMAND_NAME
This error occurs on several modules in the commands directory. The only thing I could do to get rid of these errors is to remove the following modules from the directory which reference COMMAND_NAME: disablelogging, enablelogging, getacl, getcors, getdefacl, getlogging, setacl, setcors, setdefacl.
Did I do the right thing here? Is this a bug in gsutil?