Using Beautiful Soup in a Heroku application deployed from GitHub

I am trying to deploy a bot I made in Python using the following libraries:
requests, beautifulsoup4, discord.
This is to be deployed using GitHub and Heroku, I believe. The bot deploys successfully; however, when I check the logs, the bot has crashed. Here is the error message:
2020-05-17T23:17:42.624634+00:00 app[api]: Deploy 83c32a30 by user ****************************
2020-05-17T23:17:42.624634+00:00 app[api]: Release v12 created by user ****************************
2020-05-17T23:17:43.134443+00:00 heroku[worker.1]: State changed from crashed to starting
2020-05-17T23:17:48.338694+00:00 heroku[worker.1]: State changed from starting to up
2020-05-17T23:17:51.764352+00:00 heroku[worker.1]: State changed from up to crashed
2020-05-17T23:17:51.660991+00:00 app[worker.1]: Traceback (most recent call last):
2020-05-17T23:17:51.661016+00:00 app[worker.1]: File "BocoBot_Version1.py", line 126, in <module>
2020-05-17T23:17:51.661182+00:00 app[worker.1]: soup = BeautifulSoup(source, 'lxml')
2020-05-17T23:17:51.661184+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.6/site-packages/bs4/__init__.py", line 245, in __init__
2020-05-17T23:17:51.661401+00:00 app[worker.1]: % ",".join(features))
2020-05-17T23:17:51.661423+00:00 app[worker.1]: bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
2020-05-17T23:17:57.000000+00:00 app[api]: Build succeeded
I believe this is the issue in question:
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
But I do not know what I need to do to resolve it. My guess is that it has to do with my requirements.txt file, where I tell it what packages to add. But no matter what changes I make to beautifulsoup4, it continues not to work.
Here is the requirements.txt file information:
git+https://github.com/Rapptz/discord.py
PyNaCl==1.3.0
pandas
beautifulsoup4
requests
discord
dnspython==1.16.0
async-timeout==3.0.1
Any suggestions would be greatly appreciated and I will be happy to add more information.

Try adding lxml to your requirements.txt. beautifulsoup4 does not pull in lxml on its own, so BeautifulSoup(source, 'lxml') fails on the Heroku dyno unless lxml is listed as a dependency.
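If you would rather not add the lxml dependency, here is a minimal sketch of a fallback on the Python side (the URL is a placeholder; only the parser choice matters here):
```
import requests
from bs4 import BeautifulSoup, FeatureNotFound

source = requests.get("https://example.com").text  # placeholder URL

try:
    soup = BeautifulSoup(source, "lxml")           # fast parser, needs the lxml package installed
except FeatureNotFound:
    soup = BeautifulSoup(source, "html.parser")    # stdlib parser, no extra requirements.txt entry
```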

Related

TypeError when using the Slurm queue to submit pyiron jobs

I'm facing some issues while running pyiron jobs on my HPC via the pysqa adapter. I had accidentally erased the main pyiron directory containing the pyiron, projects, and resources folders, so I copied all three from another cluster. The only thing that I think will cause a problem is the sqlite.db file in the resources folder. Previously, I had no issues running interactive VASP jobs through the adapter. I'm guessing something happened after the deletion incident.
The pyiron version I'm using is 0.2.17.
Here is a minimal example using an interactive VASP job that I have tried:
from pyiron import Project
pr = Project('Al-test')
structure = pr.create_structure('Al', 'fcc', 4.05)
pr.remove_jobs(recursive=True)
from pysqa import QueueAdapter
sqa = QueueAdapter(directory='~/pyiron/resources/queues/')
sqa.queue_view
pr.job_table()
job = pr.create_job(pr.job_type.Vasp, 'job_int')
job.structure = structure
job.server.run_mode.interactive = True
job.executable.executable_path = '~/pyiron/resources/vasp/bin/run_vasp_5.4.4_std_mpi.sh'
job.input.incar['NCORE']=4
job.server.queue = 'slurm'
job.server.cores=16
job.server.view_queues()
sqa.get_queue_status()
job.run(run_again=True)
End of the error log:
~/pyiron/pyiron/pyiron/base/server/generic.py in queue_id(self, qid)
208 qid (int): queue ID
209 """
--> 210 self._queue_id = int(qid)
211
212 @property
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
Some inputs/feedback on this would be greatly appreciated.
Thanks!
We updated the queuing system interface in pyiron 0.3.X; you can read more about this here:
https://pyiron.org/news/releases/2020/09/06/pyiron-0-3-X-HPC-release.html
For pyiron 0.3.X we have a detailed installation guide available on readthedocs.org:
https://pyiron.readthedocs.io/en/latest/source/installation.html#remote-hpc-cluster
So I highly recommend updating to pyiron 0.3.13.
Apart from this, the error message basically says that the submission was not successful. If you navigate to the job's working directory (job.working_directory), you should find a run_queue.sh script there. This is the script pyiron uses to submit the job to the queuing system. You can try to submit it manually using sbatch run_queue.sh; if successful, this prints the queue ID, otherwise it prints the error message from your queuing system.
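If it helps, here is a minimal sketch of that manual check driven from Python, assuming the job object from the example above (subprocess.run with capture_output needs Python 3.7+):
```
import os
import subprocess

wd = job.working_directory                       # directory where pyiron wrote run_queue.sh
print(os.listdir(wd))                            # run_queue.sh should be listed here

result = subprocess.run(["sbatch", "run_queue.sh"],
                        cwd=wd, capture_output=True, text=True)
print(result.stdout)                             # "Submitted batch job <id>" on success
print(result.stderr)                             # otherwise the error from the queuing system
```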

IBM Language Translator returns 403 Forbidden upon identify()

I followed the official documentation to create a multilingual Watson assistant outlined here:
https://github.com/with-watson/multilingual-chatbot
However, after deploying the function on IBM Cloud and testing the deployed function via IBM Cloud CLI with the below command, I am getting an error (logs below):
bx wsk action invoke translator --result --param text "Hallo, ich habe eine Frage."
{
"error": "The action did not return a dictionary."
}
"2020-01-13T12:54:57.787506Z stderr: Traceback (most recent call last):",
"2020-01-13T12:54:57.787554Z stderr: File \"pythonrunner.py\", line 88, in run",
"2020-01-13T12:54:57.787560Z stderr: exec('fun = %s(param)' % self.mainFn, self.global_context)",
"2020-01-13T12:54:57.787564Z stderr: File \"<string>\", line 1, in <module>",
"2020-01-13T12:54:57.787568Z stderr: File \"__main__.py\", line 98, in main",
"2020-01-13T12:54:57.787571Z stderr: response = translator.identify( text )",
"2020-01-13T12:54:57.787575Z stderr: File \"/action/virtualenv/lib/python3.6/site-packages/watson_developer_cloud/language_translator_v3.py\", line 193, in identify",
"2020-01-13T12:54:57.787579Z stderr: accept_json=True)",
"2020-01-13T12:54:57.787583Z stderr: File \"/action/virtualenv/lib/python3.6/site-packages/watson_developer_cloud/watson_service.py\", line 587, in request",
"2020-01-13T12:54:57.787587Z stderr: info=error_info, httpResponse=response)",
"2020-01-13T12:54:57.787591Z stderr: watson_developer_cloud.watson_service.WatsonApiException: Error: Forbidden, Code: 403",
"2020-01-13T12:54:57.788Z stderr: The action did not initialize or run as expected. Log data might be missing."
It looks like the API key is recognized but is not permitted to be used for this action; however, the key being used does return the right values when used via cURL.
The code executed in main is the same as provided in the GitHub repo above; I did not make any changes.
Any ideas on how to fix this issue? Thanks!
The key string used by curl is a bearer token. The API key needed by the cloud function is probably one provided by Identity and Access Management (IAM).
In the https://cloud.ibm.com console GUI, click Manage > Access (IAM) at the top, then select IBM Cloud API keys on the left and create an API key. This creates an API key that represents you, just like your login name and credentials. This is the simplest way to get this to work, but it is not great for production.
For production, consider using a Service ID, probably in combination with an Access Group.
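For reference, a rough sketch of passing an IAM API key to the SDK shown in the traceback (watson_developer_cloud); the key and service URL are placeholders, and the constructor argument names can differ between SDK versions, so treat this as an assumption:
```
from watson_developer_cloud import LanguageTranslatorV3

translator = LanguageTranslatorV3(
    version='2018-05-01',
    iam_apikey='YOUR_IAM_API_KEY',  # placeholder: key created under Manage > Access (IAM)
    url='https://gateway.watsonplatform.net/language-translator/api'  # placeholder service URL
)

print(translator.identify('Hallo, ich habe eine Frage.'))
```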
Here's what worked for me, with additional changes.
I ran the command below to update the packages mentioned in the environment.yml file:
conda update --all
The conda version on my machine is 4.8.1, and the cloud-functions/wsk/functions/fn plugin version is 1.0.36.
While creating the Language Translator instance, make sure to choose the right region.
It worked for me after I changed it.

Flow server and VS Code Flow extension breaking after updating to Catalina and Mojave

Has anyone encountered this issue?
Connection to server got closed. Server will not be restarted.
I get this when I check out an old commit in locus-dashboard (where we have an old version of Flow) and then switch back to the current one. Then it starts throwing the error Connection to server got closed. Server will not be restarted.
These are the Flow logs:
[Info - 12:03:15 PM - locus-dashboard-v2/.flowconfig] Found flow using option `useNPMPackagedFlow`
[Info - 12:03:16 PM - locus-dashboard-v2/.flowconfig] Using flow '/Users/shubanusharma/workspace/locus-dashboard-v2/node_modules/flow-bin/flow-osx-v0.111.3/flow' (v0.111.3)
Unhandled exception: (Sys_error "/tmp/daemon_param688afa.bin: Permission denied")
Raised by primitive operation at file "stdlib.ml", line 316, characters 29-55
Called from file "filename.ml", line 259, characters 7-73
Re-raised at file "filename.ml", line 261, characters 30-37
Called from file "hack/utils/sys/daemon.ml", line 267, characters 2-53
Called from file "hack/utils/jsonrpc/jsonrpc.ml", line 215, characters 4-357
Called from file "src/lsp/flowLsp.ml", line 1555, characters 15-36
Called from file "src/commands/commandUtils.ml", line 13, characters 4-32
[Error - 12:03:16 PM] Connection to server got closed. Server will not be restarted.
I've tried cleaning up node_modules, clearing the yarn and npm caches, and reinstalling the extension.
This seems to be a Catalina permissions issue: running Flow with sudo works for Flow itself, but the VS Code extension still has the same problem.
Not a permanent solution, but changing the /tmp folder permissions to 777 fixes this:
Go to your repo dir and stop the server: yarn flow stop
Change the /tmp dir permissions: sudo chmod 777 /tmp
Start the Flow server: yarn flow start
Restart the VS Code Flow client: press cmd+shift+p (Windows: ctrl+shift+p), type restart client, and press enter
[EDIT 15-Jun-2021]
In most cases just giving the permission is enough; there is no need to stop the server.
Also, this does not only happen on Catalina; it happens from Catalina onwards.
macOS Catalina
macOS Mojave
I've found that both have the issue.

Airflow scheduler is throwing an error - 'DisabledBackend' object has no attribute '_get_task_meta_for'

I am trying to install Airflow (distributed mode) in WSL; I have set up the Airflow webserver, Airflow scheduler, Airflow worker, Celery (3.1), and RabbitMQ.
While running the Airflow scheduler, it throws this error (below) even though the result backend is set up.
ERROR
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", line 92, in sync
state = task.state
File "/usr/local/lib/python3.6/dist-packages/celery/result.py", line 398, in state
return self._get_task_meta()['status']
File "/usr/local/lib/python3.6/dist-packages/celery/result.py", line 341, in _get_task_meta
return self._maybe_set_cache(self.backend.get_task_meta(self.id))
File "/usr/local/lib/python3.6/dist-packages/celery/backends/base.py", line 288, in get_task_meta
meta = self._get_task_meta_for(task_id)
AttributeError: 'DisabledBackend' object has no attribute '_get_task_meta_for'
https://issues.apache.org/jira/browse/AIRFLOW-1840
This is the exact error I am getting, but I couldn't find a solution.
Result backend:
result_backend = db+postgresql://postgres:****@localhost:5432/postgres
broker_url = amqp://rabbitmq_user_name:rabbitmq_password@localhost/rabbitmq_virtual_host_name
Please help; I have gone through almost all the documentation but couldn't find a solution.
I was facing the same issue on Celery version 3.1.26.post2 (with RabbitMQ, PostgreSQL, and Airflow). The reason for this issue is that the dictionary used in Celery's base.py file (at lib/python3.5/site-packages/celery/app/base.py) does not capture the Celery backend at the key CELERY_RESULT_BACKEND; instead it captures it at the key result_backend.
So the solution is to go to the _get_config function in that base.py file and, at the end of the function before the dictionary s is returned, add the line below:
s['CELERY_RESULT_BACKEND'] = s['result_backend']  # code to be added
return s
This solved the problem.
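To double-check that the backend is picked up after the change, something like the snippet below should work; it assumes Airflow 1.x's Celery executor module exposes its Celery app as app, which is an assumption about your installed version:
```
# Assumption: airflow.executors.celery_executor defines a module-level Celery app named `app`.
from airflow.executors.celery_executor import app

print(type(app.backend))                      # expect a database backend, not DisabledBackend
print(app.conf.get('CELERY_RESULT_BACKEND'))  # should show the db+postgresql:// URI
```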

PyPI error when trying to install gcsfs in Google Composer (Airflow)

I use google composer-1.0.0-airflow-1.9.0. I used Dask in one of my DAGs and wanted to set up Composer to use Dask. One of the required packages for this DAG is gcsfs. When I tried to install it via the Web UI, I got the below error:
Composer Backend timed out. Currently running tasks are [stage: CP_COMPOSER_AGENT_RUNNING description: "Composer Agent Running. Latest Agent Stage: stage: DEPLOYMENTS_UPDATED\n ." response_timestamp { seconds: 1540331648 nanos: 860000000 } ].
Updated:
The error is coming from this line of code, where Dask tries to read a file from a GCS bucket: dd.read_csv(bucket)
Log:
[2018-10-24 22:25:12,729] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/dask/bytes/core.py", line 350, in get_fs_token_paths
[2018-10-24 22:25:12,733] {base_task_runner.py:98} INFO - Subtask: fs, fs_token = get_fs(protocol, options)
[2018-10-24 22:25:12,735] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/dask/bytes/core.py", line 473, in get_fs
[2018-10-24 22:25:12,740] {base_task_runner.py:98} INFO - Subtask: "Need to install `gcsfs` library for Google Cloud Storage support\n"
[2018-10-24 22:25:12,741] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/dask/utils.py", line 94, in import_required
[2018-10-24 22:25:12,748] {base_task_runner.py:98} INFO - Subtask: raise RuntimeError(error_msg)
[2018-10-24 22:25:12,751] {base_task_runner.py:98} INFO - Subtask: RuntimeError: Need to install `gcsfs` library for Google Cloud Storage support
[2018-10-24 22:25:12,756] {base_task_runner.py:98} INFO - Subtask: conda install gcsfs -c conda-forge
[2018-10-24 22:25:12,758] {base_task_runner.py:98} INFO - Subtask: or
[2018-10-24 22:25:12,762] {base_task_runner.py:98} INFO - Subtask: pip install gcsfs
When I tried to install gcsfs via PyPI in the Google Composer UI, I got the below error:
{
insertId: "17ks763f726w1i"
logName: "projects/xxxxxxxxx/logs/airflow-worker"
receiveTimestamp: "2018-10-25T15:42:24.935880717Z"
resource: {…}
severity: "ERROR"
textPayload: "Traceback (most recent call last):
File "/usr/local/bin/gcsfuse", line 7, in <module>
from gcsfs.cli.gcsfuse import main
File "/usr/local/lib/python2.7/site-
packages/gcsfs/cli/gcsfuse.py", line 3, in <module>
fuse import FUSE
ImportError: No module named fuse
"
timestamp: "2018-10-25T15:41:53Z"
}
Unfortunately, your error message doesn't mean much to me.
gcsfs is pure Python code, so it is very unlikely that anything is going wrong with installing it, which is done very commonly with pip or conda. The dependency libraries are a bunch of Google ones, some of which may require compilation (I don't know), so I would suggest trying to find out from the logs which one is stalling and taking it up with them. On the other hand, this kind of thing can often be a network/intermittent problem, so waiting may also fix things.
For the future, I recommend basing installations around conda, which never needs to compile anything and is generally better at dependency tracking.
This has to do with the fact that Composer and Airflow have silent dependencies that are not kept in sync, so if the gcsfs installation conflicts with an Airflow dependency, we get this error. More details here. The only workarounds (other than updating to the Nov 28 release of Composer) are:
Source: Thanks to Jake Biesinger (jake.biesinger#infusionsoft.com)
Use a separate Kubernetes Pod for running various jobs, but it's a large change and requires infra we're not very familiar with (GKE).
This particular issue can also be solved by installing dbt in a PythonVirtualenvOperator, then having the python_callable re-use the virtualenv's bin dir, something like:
```
import os, subprocess, sys

def _run_cmd_in_virtual_env(cmd):
    # resolve cmd against the virtualenv's bin dir (sys.argv[0] lives inside it)
    subprocess.check_call(os.path.join(os.path.split(sys.argv[0])[0], cmd))

task = PythonVirtualenvOperator(
    python_callable=_run_cmd_in_virtual_env,
    op_args=('dbt',),
)  # this will call the temporarily-installed dbt binary, something like /tmp/virtualenv-asdasd/bin/dbt
```
I haven't tried this, but this might help you out.
In general, installing arbitrary system packages (like fuse, or whatever else becomes a dependency of what you are trying to install) is not supported by Google Composer, as discussed here: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!searchin/cloud-composer-discuss/sugimiyanto%7Csort:date/cloud-composer-discuss/jpxAGCPFkZo/mCx_P1LPCQAJ
However, you may be able to do this by uploading the package folder you have installed locally (i.e. fuse) into your Google Cloud Storage bucket, for example gs://<your_bucket_name>/libs, so that it becomes a shared library.
Then, you can set the LD_LIBRARY_PATH environment variable in Google Composer to /home/airflow/gcs/libs, to make GCC look for shared libraries in that directory.
Then, try to reinstall gcsfs via PyPI in Google Composer.