Scrapinghub can't connect? - deployment

I am trying to deploy a Scrapy spider to Scrapinghub, following the instructions they provide. For some reason, the deploy looks specifically for a Python 3.6 directory, when it should accept any Python 3.x directory. My spider is written for Python 3.5, so this is a problem. Scrapinghub says that specifying "scrapy:1.4-py3" will work for any Python 3.x, but that does not appear to be true.
Also, for some reason, it can't seem to find my spider in the project. Is this related to the issue with the 3.6 directory?
Finally, I have installed everything needed from the requirements file.
C:\Users\Desktop\Empery Code\YahooScrape>shub deploy
Packing version 1.0
Deploying to Scrapy Cloud project "205357"
Deploy log last 30 lines:
Deploy log location: C:\Users\AppData\Local\Temp\shub_deploy_of5_m4qg.log
Error: Deploy failed: b'{"status": "error", "message": "Internal build error"}'
    _run(args, settings)
  File "/usr/local/lib/python3.6/site-packages/sh_scrapy/crawl.py", line 103, in _run
    _run_scrapy(args, settings)
  File "/usr/local/lib/python3.6/site-packages/sh_scrapy/crawl.py", line 111, in _run_scrapy
    execute(settings=settings)
  File "/usr/local/lib/python3.6/site-packages/scrapy/cmdline.py", line 148, in execute
    cmd.crawler_process = CrawlerProcess(settings)
  File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 243, in __init__
    super(CrawlerProcess, self).__init__(settings)
  File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 134, in __init__
    self.spider_loader = _get_spider_loader(settings)
  File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 330, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "/usr/local/lib/python3.6/site-packages/scrapy/spiderloader.py", line 61, in from_settings
    return cls(settings)
  File "/usr/local/lib/python3.6/site-packages/scrapy/spiderloader.py", line 25, in __init__
    self._load_all_spiders()
  File "/usr/local/lib/python3.6/site-packages/scrapy/spiderloader.py", line 47, in _load_all_spiders
    for module in walk_modules(name):
  File "/usr/local/lib/python3.6/site-packages/scrapy/utils/misc.py", line 63, in walk_modules
    mod = import_module(path)
  File "/usr/local/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 978, in _gcd_import
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 948, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'YahooScrape.spiders'
{"message": "list-spiders exit code: 1", "details": null, "error": "build_error"}
{"status": "error", "message": "Internal build error"}
C:\Users\Desktop\Empery Code\YahooScrape>\
scrapy.cfg file:
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# https://scrapyd.readthedocs.org/en/latest/deploy.html
[settings]
default = YahooScrape.settings
[deploy]
#url = http://localhost:6800/
project = YahooScrape
scrapinghub.yml file:
project: -----
requirements:
  file: requirements.txt
stacks:
  default: scrapy:1.4-py3

Make sure your directory tree looks like this:
$ tree
.
├── YahooScrape
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── yahoo.py
│       └── __init__.py
├── requirements.txt
├── scrapinghub.yml
├── scrapy.cfg
└── setup.py
Pay special attention to YahooScrape/spiders/. It should contain an __init__.py file (an empty one is fine) and your spiders, usually as separate .py files.
Otherwise, YahooScrape.spiders cannot be imported as a Python package, hence the "ModuleNotFoundError: No module named 'YahooScrape.spiders'" message.
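A quick way to check the layout before deploying is to look for directories that hold .py files but lack an __init__.py. The following is only a minimal sketch using the standard library; the helper name dirs_missing_init is hypothetical and is not part of Scrapy or shub.

```python
from pathlib import Path


def dirs_missing_init(package_root):
    """Return directories under package_root that hold .py files but no __init__.py.

    Hypothetical helper for illustration; paths are reported relative to the
    parent of package_root, e.g. "YahooScrape/spiders".
    """
    root = Path(package_root)
    candidates = [root, *sorted(p for p in root.rglob("*") if p.is_dir())]
    missing = []
    for directory in candidates:
        if directory.name == "__pycache__":
            continue  # bytecode caches are not packages
        has_python = any(directory.glob("*.py"))
        if has_python and not (directory / "__init__.py").exists():
            missing.append(directory.relative_to(root.parent).as_posix())
    return missing
```

Run from the folder that contains scrapy.cfg, dirs_missing_init("YahooScrape") should return an empty list when every package directory has its __init__.py.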

Related

Testdriven.io: Scalable FastAPI Applications on AWS. pytest not found main module

While working through this course, https://testdriven.io/courses/scalable-fastapi-aws/, I ran into a problem.
When I run
talk-booking/services/talk_booking $ poetry run pytest tests/integration
the following error is raised:
=============================================== test session starts ================================================
platform darwin -- Python 3.11.1, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /Users/blarblar/Workspace/talk-booking/services/talk_booking
plugins: cov-4.0.0, anyio-3.6.2
collected 1 item / 1 error
====================================================== ERRORS ======================================================
___________________________ ERROR collecting tests/integration/test_web_app/test_main.py ___________________________
ImportError while importing test module '/Users/blarblar/Workspace/talk-booking/services/talk_booking/tests/integration/test_web_app/test_main.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/integration/test_web_app/test_main.py:4: in <module>
from web_app.main import app
E ModuleNotFoundError: No module named 'web_app'
============================================= short test summary info ==============================================
ERROR tests/integration/test_web_app/test_main.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================= 1 error in 0.29s =================================================
My directory structure is:
.
├── README.md
└── services
    └── talk_booking
        ├── poetry.lock
        ├── pyproject.toml
        ├── tests
        │   ├── __init__.py
        │   └── integration
        │       └── test_web_app
        │           ├── __init__.py
        │           └── test_main.py
        └── web_app
            ├── __init__.py
            └── main.py
I checked that the health-check API works well at 127.0.0.1:8000.
When I write the test code directly in main.py, it works well.
If I could resolve this import issue, everything would be fine.

Import File Mismatch in pytest with same test names

This is a frequently asked question, but none of the solutions mentioned on SO have worked so far.
The folder structure is as follows:
project/
├── tests/
│   ├── conftest.py
│   ├── __init__.py
│   └── int_tests/
│       └── test_device.py
└── project_core/
    └── tests/
        ├── conftest.py
        ├── __init__.py
        └── int_tests/
            └── test_device.py
import file mismatch:
imported module 'test_device' has this __file__ attribute:
/home/.../project/project_core/tests/int_tests/test_device.py
which is not the same as the test file we want to collect:
/home/.../project/tests/int_tests/test_device.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules
Steps tried so far:
Removing __pycache__ and .pyc files.
Adding __init__.py to each folder (as stated in pytest's Good Integration Practices).
Removing __init__.py from each folder.
Do I need __init__.py files in each tests subfolder?
The same error occurs with conftest.py as well. This error is not limited to the vscode-pytest plugin; it also occurs in the terminal.
PS: For CI purposes, the system is configured with Docker and tox. Development is done in a venv.

How to access templates part of a package from a script within a package

I have trouble creating a package with setuptools. I have a repository which I'm cleaning up to make it a package. The directory structure looks something like this
my-proj
├── setup.py
├── MANIFEST.in
├── MakeFile
├── README.rst
├── setup.py
└── myproj
    ├── __init__.py
    ├── my_code.py
    ├── templates
    │   ├── template1.yaml
    │   ├── template2.yaml
The initial version of "my_code.py" had a code snippet that directly referenced the files within the templates folder to do some processing. To package this using setuptools, I provide the following information in these files:
MANIFEST.in:
include README.rst
include requirements.txt
include LICENSE.txt
recursive-include myproj/templates *
setup.py:
setup(
    name='myproj',
    package_dir={'testbed_init': 'testbed_init'},
    package_data={'templates': ['templates/*'], 'configs': ['configs/*']},
    include_package_data=True,
)
My question is as follows. In "my_code.py" I used to reference the templates directly without any problem, because I ran the script from the myproj folder. Once this is packaged, how can I make sure the templates are included in the package, and how can the script open them relative to where the package is installed?
Code snippet from my_code.py:
if _type == "a":
    temp_file = f"templates/template1.yaml"
else:
    temp_file = f"templates/template2.yaml"
build_config(deploy_esx_file, output_file, data)
Code snippet of what happens in build_config:
def build_config(template_file, output_file, inputs):
    templateLoader = jinja2.FileSystemLoader(searchpath="./")
    templateEnv = jinja2.Environment(loader=templateLoader)
    template = templateEnv.get_template(template_file)
    outputText = template.render(inputs)
    with open(output_file, 'w') as h:
        h.write(outputText)

Learning Python packaging, the old ModuleNotFoundError

What am I doing wrong here???
My structure :-
├── tst
│   ├── setup.py
│   └── tst
│       ├── __init__.py
│       ├── mre.py
│       └── start.py
contents of start.py
from mre import mre
def proc1():
    mre.more()
    return ('ran proc1')
if __name__ == "__main__":
    print('test')
    print(proc1())
contents of mre.py
class mre(object):
    def more():
        print('this is some more')
contents of setup.py
from setuptools import setup
setup(name='tst',
      version='0.1',
      description='just a test',
      author='Mr Test',
      author_email='test@example.com',
      entry_points={'console_scripts': ['tst=tst.start:proc1']},
      license='MIT',
      packages=['tst'],
      zip_safe=False)
nothing in __init__.py
When I run this from the command line all is fine, runs as expected.
However, when I package this up using pip and run it using tst, I get:
Traceback (most recent call last):
File "/home/simon/.local/bin/tst", line 5, in <module>
from tst.start import proc1
File "/home/simon/.local/lib/python3.8/site-packages/tst/start.py", line 1, in <module>
from mre import mre
ModuleNotFoundError: No module named 'mre'
I've read numerous posts and I just can't seem to figure this out, if I go into the installed code and change the line
from mre import mre
to
from tst.mre import mre
then it works, but then that doesn't work when running it from the dir for development purposes... I'm obviously missing something obvious :) is it a path issue or am I missing a command in the setup.py?
If someone could point me in the right direction?
edit: do I need to do something different while developing a module that's going to be packaged, perhaps call the code some different way?
cheers
From my point of view, the absolute import (from tst.mre import mre) is the right choice. You could also use the relative import (from .mre import mre), but the absolute import is safer.
For development purposes, use pip's editable mode:
path/to/pythonX.Y -m pip install --editable .
This is similar to the setuptools develop mode, which is slowly heading towards deprecation:
path/to/pythonX.Y setup.py develop
Then run the console script or the executable module:
tst
path/to/pythonX.Y -m tst.start
Without installation, it is often still possible to run the executable module from the directory that contains the tst package:
path/to/pythonX.Y -m tst.start
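To see why the absolute import works with "-m tst.start" while the bare import does not, here is a small sketch that rebuilds a throwaway copy of the tst package in a temporary directory and runs it both ways. The helper name run_as_module is hypothetical; the file contents mirror the question, and only the import line varies.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_as_module(import_line):
    """Create a temporary tst package and run `python -m tst.start` from its parent."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "tst"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "mre.py").write_text(
            "class mre(object):\n"
            "    def more():\n"
            "        print('this is some more')\n"
        )
        (pkg / "start.py").write_text(
            f"{import_line}\n"
            "def proc1():\n"
            "    mre.more()\n"
            "    return 'ran proc1'\n"
            "if __name__ == '__main__':\n"
            "    print(proc1())\n"
        )
        # With -m, the current directory (not tst/) is placed on sys.path.
        result = subprocess.run(
            [sys.executable, "-m", "tst.start"],
            cwd=tmp, capture_output=True, text=True,
        )
        return result.stdout + result.stderr


absolute_output = run_as_module("from tst.mre import mre")
bare_output = run_as_module("from mre import mre")
print(absolute_output)
print(bare_output)
```

The first run should print the proc1 result, while the second should end in the same ModuleNotFoundError as the installed console script, since in both cases the tst directory itself is not on the import path.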

Cython project structure with dependent extension classes

I'm getting to the point with a project where I need a proper directory structure. I'm trying to arrange this and getting ImportErrors when using my cython extension classes.
The directory structure looks like:
.
├── __init__.py
├── Makefile
├── README.rst
├── setup.py
├── src
│   ├── foo.pxd
│   ├── foo.pyx
│   ├── __init__.py
│   └── metafoo.pyx
└── test
    ├── test_foo.py
    └── test_metafoo.py
The contents of all files can be found in this GitHub repo (commit e635617 at the time of writing).
My setup.py looks like the following:
from setuptools import setup, Extension, Command
from Cython.Build import cythonize
SRC_DIR = "src"
PACKAGES = [SRC_DIR]
ext_foo = Extension(SRC_DIR + ".foo",
                    [SRC_DIR + "/foo.pyx"]
                    )
ext_meta = Extension(SRC_DIR + ".metafoo",
                     [SRC_DIR + "/metafoo.pyx"]
                     )
EXTENSIONS = cythonize([ext_foo, ext_meta])
setup(
    name = 'minimalcriminal',
    packages=PACKAGES,
    ext_modules=EXTENSIONS
)
The complexity seems to lie in that extension classes in metafoo.pyx use extension classes from foo.pyx.
After building with python setup.py build_ext --inplace, the test_foo.py program runs ok:
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
import src.foo as foo
somefoo = foo.Foo(2)
somefoo.special_print()
When run from both the cyproj/test and cyproj directories:
/cyproj$ python test/test_foo.py
The value of somefield is: 2
and
/cyproj/test$ python test_foo.py
The value of somefield is: 2
But the test_metafoo.py crashes when run in the cyproj/test directory:
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
import src.foo as foo
import src.metafoo as metafoo
lotsafoo = [foo.Foo(i) for i in range(4)]
mf = metafoo.MetaFoo(lotsafoo)
mf.special_print()
With the message:
ubuntu@ubuntu-UX21E:/projects/cyproj/test$ python test_metafoo.py
Traceback (most recent call last):
File "test_metafoo.py", line 6, in <module>
import src.metafoo as metafoo
File "cyproj/src/foo.pxd", line 6, in init cyproj.src.metafoo (src/metafoo.c:1154)
ImportError: No module named cyproj.src.foo
But runs properly from the parent cyproj directory:
/cyproj$ python test/test_metafoo.py
The value of somefield is: 0
The value of somefield is: 1
The value of somefield is: 2
The value of somefield is: 3
I don't really get what's driving the different behaviour of these errors. If I can't use import src.foo in test_metafoo.py why does it work in test_foo.py?
Similarly if I open up an interactive session in the parent directory and try to import all:
In [1]: from src import *
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-7b8bc2c1dfb9> in <module>()
----> 1 from src import *
/projects/cyproj/cyproj/src/foo.pxd in init cyproj.src.metafoo (src/metafoo.c:1154)()
ImportError: No module named cyproj.src.foo
When src/__init__.py looks like:
__all__ = ["foo", "metafoo"]
Which I thought would allow importing all...
I was able to compile and test your package after removing the __init__.py file from the project root directory and changing the sys.path lines in test_foo.py and test_metafoo.py to:
sys.path.append(os.path.abspath("."))
sys.path.append(os.path.abspath("../"))
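Both abspath calls above depend on the directory the script is started from. As an alternative (not what the answer itself does), the project root can be derived from the location of the test file, so the result is the same from cyproj and from cyproj/test. This is a minimal sketch; the helper names project_root_for and add_project_root are hypothetical.

```python
import sys
from pathlib import Path


def project_root_for(test_file):
    """Return the directory one level above the folder that contains test_file."""
    return Path(test_file).resolve().parent.parent


def add_project_root(test_file):
    """Append the project root to sys.path (once) and return it as a string."""
    root = str(project_root_for(test_file))
    if root not in sys.path:
        sys.path.append(root)
    return root
```

In test/test_foo.py this would be called as add_project_root(__file__) before the "import src.foo as foo" line.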