Pytest: load and run tests from a notebook

In the official Databricks example, a notebook is created which runs tests with unittest:
The notebook imports another notebook (containing the test_* methods) with %run.
The tests are launched in the notebook with unittest.main(...).
I would like to do the same thing with pytest:
I imported the notebook containing the test_* methods using %run.
I tried to launch the tests with retcode = pytest.main([]), but I always get "no tests ran".
Notebook containing the tests (notebook_my_test)
def test_trivial2():
    myvar = True
    assert myvar == True
Main notebook:
%run ./notebook_my_test
import pytest
retcode = pytest.main([])
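For reference, a plausible explanation is that pytest collects tests from files on disk, not from functions that %run has defined in the notebook's namespace, so pytest.main([]) finds nothing to collect. A sketch of a workaround (assuming pytest is installed on the cluster; the file and directory names below are made up for illustration) is to write the tests to a real test_*.py file first and point pytest at it:

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Write the notebook's test function to a real test_*.py file so pytest
# can collect it from the filesystem.
tmpdir = tempfile.mkdtemp()
test_file = os.path.join(tmpdir, "test_notebook.py")
with open(test_file, "w") as f:
    f.write(textwrap.dedent("""
        def test_trivial2():
            myvar = True
            assert myvar == True
    """))

# Point pytest at the file; the return code is 0 when all tests pass.
retcode = subprocess.run([sys.executable, "-m", "pytest", test_file, "-v"]).returncode
print(retcode)
```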

Unit testing in Databricks notebooks

The following code is intended to run unit tests in Databricks notebooks, using pytest.
import pytest
import os
import sys
repo_name = "Databricks-Code-Repo"
# Get the path to this notebook, for example "/Workspace/Repos/{username}/{repo-name}".
notebook_path = dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
# Get the repo's root directory name.
repo_root = os.path.dirname(os.path.dirname(notebook_path))
# Prepare to run pytest from the repo.
os.chdir(f"/Workspace/{repo_root}/{repo_name}")
print(os.getcwd())
# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True
# Run pytest.
retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])
# Fail the cell execution if there are any test failures.
assert retcode == 0, "The pytest invocation failed. See the log for details."
This code snippet is in the guide provided by Databricks.
However, it produces the following error:
PermissionError: [Errno 1] Operation not permitted: '/Workspace//Repos/<email_address>/Databricks-Code-Repo/Databricks-Code-Repo'
This notebook is inside Databricks Repos. I have two other notebooks:
functions (where I have defined three data transformation functions);
test_functions (where I have defined test function for each of the data transformation functions from the previous notebook).
I get that the error has something to do with permissions, but I can't figure out what is causing it. I would appreciate any suggestions.
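One way to narrow this down is to trace the snippet's path arithmetic with a sample notebook path (the e-mail and folder names below are made up). If the notebook sits two levels below the repo root, os.path.dirname applied twice already returns a path ending in the repo name, so appending repo_name again produces exactly the doubled segment from the error message:

```python
import os

# Hypothetical notebook path, two levels below the repo root:
notebook_path = "/Repos/someone@example.com/Databricks-Code-Repo/tests/run_tests"

# The snippet's path arithmetic:
repo_root = os.path.dirname(os.path.dirname(notebook_path))
target = f"/Workspace/{repo_root}/Databricks-Code-Repo"

# repo_root already ends in the repo name, so the repo name is doubled
# (and the leading "/" in repo_root also yields the "//" after /Workspace):
print(target)
# /Workspace//Repos/someone@example.com/Databricks-Code-Repo/Databricks-Code-Repo
```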

How to pytest an app that can use IPython embed as an arg parameter?

I have a Python application that has an option "-y" to end its procedure in an IPython terminal, with all created objects ready for interactive manipulation.
I'm trying to work out how to design a pytest test that would let me interact with this terminal somehow, check that objects are there in the Python session, exit, and then capture the results for an assert (I know how to use capsys, for example).
During my attempts (all failed so far) I got a suggestion to use the pytest -s option, which is not what I need.
So I have this example:
go_to_python.py
import argparse
import random

parser = argparse.ArgumentParser()
parser.add_argument(
    "-y",
    "--ipython",
    action="store_true",
    dest="ipython",
    help="start iPython interpreter")
args = parser.parse_args()

if __name__ == "__main__":
    randomlist = []
    for i in range(0, 5):
        n = random.randint(1, 30)
        randomlist.append(n)
    if args.ipython:
        import IPython
        IPython.embed(colors="neutral")
How could I create a test that could assert that randomlist is inside the ipython session?
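One sketch of an approach is to start the program as a subprocess, pipe REPL commands to its stdin, and assert on the captured stdout. To keep the sketch self-contained (no IPython dependency), the stdlib's code.interact stands in for IPython.embed, and the script below is a made-up miniature of go_to_python.py:

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Miniature stand-in for go_to_python.py: builds randomlist, then drops
# into an interactive session (code.interact instead of IPython.embed).
script = textwrap.dedent("""
    import code
    import random
    randomlist = [random.randint(1, 30) for _ in range(5)]
    code.interact(banner="", local=locals())
""")
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(script)
    path = f.name

# Drive the embedded REPL by piping commands to stdin; on EOF the
# session ends and the process exits.
result = subprocess.run(
    [sys.executable, path],
    input="print(len(randomlist))\n",
    capture_output=True, text=True, timeout=60)
os.unlink(path)

# stdout contains the REPL's response, proving randomlist existed
# inside the interactive session.
assert "5" in result.stdout
```

With real IPython the same subprocess-plus-stdin pattern should apply, though IPython's prompt output may land on stdout or stderr depending on its configuration.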

How to force pytest to return an error code

I have the following structure:
A Koholo job that calls a Python script; the script returns an exit code (1 - failed, 0 - passed) as it ends. Koholo waits for the exit code before continuing to the next job step (the next scripts).
Now, instead of a Python script, I'm running pytest scripts (with the command python -m pytest test_name), but pytest is not returning an exit code, so the Koholo job times out.
Please let me know if there is a way to have pytest return an exit code when it finishes.
For example, you can pass any pytest argument that you normally pass on the CLI; I'm just using markers as an example:
import sys
import pytest
results = pytest.main(["-m", "my_marker"])
sys.exit(results)
If you want more details:
https://docs.pytest.org/en/7.1.x/reference/exit-codes.html
When pytest finishes, it calls the pytest_sessionfinish(session, exitstatus) hook.
Try adding sys.exit(exitstatus) to this method:
import sys

def pytest_sessionfinish(session, exitstatus):
    """Whole test run finishes."""
    sys.exit(exitstatus)
You can also verify the exit code by running this (Windows) script:
start /wait python -m pytest test_name
echo %errorlevel%
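The underlying mechanism is ordinary process exit codes: whatever integer sys.exit() receives becomes the process's return code, which the job runner reads. A minimal stdlib-only sketch (no pytest or Koholo involved):

```python
import subprocess
import sys

# A child process that exits with a nonzero status, the way a pytest
# wrapper would after sys.exit(pytest.main([...])):
result = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])

# The parent (here standing in for the job runner) reads the code back:
print(result.returncode)
# 3
```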

ModuleNotFoundError: No module named 'pyspark.dbutils' while running multiple .py files/notebooks on job clusters in Databricks

I am working in a TravisCI, MLflow and Databricks environment, where .travis.yml sits at the git master branch and detects any change in a .py file. Whenever a file gets updated, TravisCI runs an MLflow command to run the .py file in the Databricks environment.
my MLProject file looks as following:
name: mercury_cltv_lib
conda_env: conda-env.yml

entry_points:
  main:
    command: "python3 run-multiple-notebooks.py"
The workflow is as follows:
TravisCI detects a change in the master branch --> triggers a build which runs the MLflow command, spinning up a job cluster in Databricks to run the .py file from the repo.
It worked fine with one .py file, but when I tried to run multiple notebooks using dbutils, it threw:
File "run-multiple-notebooks.py", line 3, in <module>
    from pyspark.dbutils import DBUtils
ModuleNotFoundError: No module named 'pyspark.dbutils'
Please find below the relevant code section from run-multiple-notebooks.py
def get_spark_session():
    from pyspark.sql import SparkSession
    return SparkSession.builder.getOrCreate()

def get_dbutils(self, spark = None):
    try:
        if spark == None:
            spark = spark
        from pyspark.dbutils import DBUtils  # error line
        dbutils = DBUtils(spark)             # error line
    except ImportError:
        import IPython
        dbutils = IPython.get_ipython().user_ns["dbutils"]
    return dbutils

def submitNotebook(notebook):
    print("Running notebook %s" % notebook.path)
    spark = get_spark_session()
    dbutils = get_dbutils(spark)
I tried all the options, including
https://stackoverflow.com/questions/61546680/modulenotfounderror-no-module-named-pyspark-dbutils
as well. It is not working :(
Can someone please suggest a fix for the above-mentioned error while running a .py file on a job cluster? My code works fine inside a local Databricks notebook, but running it from outside using TravisCI and MLflow isn't working, and that is a must-have requirement for pipeline automation.

How can I debug my python unit tests within Tox with PUDB?

I'm trying to debug a Python codebase that uses tox for unit tests. One of the failing tests is proving difficult to figure out, and I'd like to use pudb to step through the code.
At first thought, one would think to just pip install pudb, then in the unit test code add import pudb and pudb.set_trace(). But that results in a ModuleNotFoundError:
>       import pudb
E       ModuleNotFoundError: No module named 'pudb'

tests/mytest.py:130: ModuleNotFoundError
ERROR: InvocationError for command '/Users/me/myproject/.tox/py3/bin/pytest tests' (exited with code 1)
Noticing the .tox project folder leads one to realize there's a site-packages folder within it, which makes sense since the point of tox is to manage testing under different virtualenv scenarios. This also means there's a tox.ini configuration file, with a deps section that may look like this:
[tox]
envlist = lint, py3

[testenv]
deps =
    pytest
commands = pytest tests
Adding pudb to the deps list should solve the ModuleNotFoundError, but leads to another error:
self = <_pytest.capture.DontReadFromInput object at 0x103bd2b00>

    def fileno(self):
>       raise UnsupportedOperation("redirected stdin is pseudofile, "
                                   "has no fileno()")
E       io.UnsupportedOperation: redirected stdin is pseudofile, has no fileno()

.tox/py3/lib/python3.6/site-packages/_pytest/capture.py:583: UnsupportedOperation
So, I'm stuck at this point. Is it not possible to use pudb instead of pdb within Tox?
There's a package called pytest-pudb which overrides the pudb entry points within an automated test environment like tox to successfully jump into the debugger.
To use it, just make your tox.ini file have both the pudb and pytest-pudb entries in its testenv dependencies, similar to this:
[tox]
envlist = lint, py3

[testenv]
deps =
    pytest
    pudb
    pytest-pudb
commands = pytest tests
Using the original PDB (not PUDB) can work too. At least it works with the Django and Nose test runners. Without changing tox.ini, simply add a pdb breakpoint wherever you need, with:
import pdb; pdb.set_trace()
Then, when it gets to that breakpoint, you can use the regular PDB commands:
w - print stacktrace
s - step into
n - step over
c - continue
p - print an argument value
a - print arguments of current function