TemplateNotFound in Airflow - kubernetes

I have the following dir structure
.
├── ConfigSpark.yaml
├── project1
│   ├── dags
│   │   └── dag_1.py
│   └── sparkjob
│       └── spark_1.py
└── sparkutils
I'm trying to load the ConfigSpark.yaml file in my SparkKubernetesOperator using:
job = SparkKubernetesOperator(
    task_id='job',
    params=dict(
        app_name='job',
        mainApplicationFile='/opt/airflow/dags/project1/sparkjob/spark_1.py',
        driverCores=1,
        driverCoreRequest='250m',
        driverCoreLimit='500m',
        driverMemory='2G',
        executorInstances=1,
        executorCores=2,
        executorCoreRequest='1000m',
        executorCoreLimit='1000m',
        executorMemory='2G'
    ),
    application_file='/opt/airflow/dags/ConfigSpark.yaml',
    kubernetes_conn_id='conn_prd_eks',
    do_xcom_push=True
)
My dag is returning the following error:
jinja2.exceptions.TemplateNotFound: /opt/airflow/dags/ConfigSpark.yaml
I've noticed that if the DAG is in the same directory as ConfigSpark.yaml, my tasks run perfectly. Why does my task not run when I place my DAG in a subfolder?
I've checked my values.yaml file and airflowHome is /opt/airflow and defaultAirflowRepository is apache/airflow.
What is happening?

Airflow searches for the template file (ConfigSpark.yaml in your case) relative to the directory in which the DAG file is stored, so it doesn't find it automatically with your code.
If you store the template file in the same folder as your DAG file (/project1/dags), or in a nested folder inside it, you can specify the relative path in your task:
job = SparkKubernetesOperator(
    ...,
    application_file='/path/to/ConfigSpark.yaml',
    ...
)
Which would read the template file from /project1/dags/path/to/ConfigSpark.yaml.
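For example, with the template stored under the DAG folder, the layout would look like this (the path/to placement is hypothetical):

project1/dags/
├── dag_1.py
└── path
    └── to
        └── ConfigSpark.yaml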
However, if the folder your template file is stored in is not a child of the folder your DAG file is stored in, the above won't work. In that case you can specify template_searchpath on the DAG-level:
with DAG(..., template_searchpath="/opt/airflow/dags/repo/dags") as dag:
    job = SparkKubernetesOperator(
        task_id='job',
        application_file='ConfigSpark.yaml',
        ...,
    )
This path (for example /opt/airflow/dags) is added to the Jinja searchpath and that way ConfigSpark.yaml will be found.
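Putting it together, a minimal sketch of the DAG file (the import path assumes the cncf.kubernetes provider package; the dag_id, start_date, and schedule are placeholders):

from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import (
    SparkKubernetesOperator,
)

# template_searchpath points Jinja at the folder holding ConfigSpark.yaml,
# so application_file can be given relative to it.
with DAG(
    dag_id="spark_job",               # placeholder name
    start_date=datetime(2023, 1, 1),  # placeholder date
    schedule_interval=None,
    template_searchpath="/opt/airflow/dags",
) as dag:
    job = SparkKubernetesOperator(
        task_id="job",
        application_file="ConfigSpark.yaml",  # resolved against the searchpath
        kubernetes_conn_id="conn_prd_eks",
        do_xcom_push=True,
    )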


How does DVC store differences on the directory level into the DVC cache?

Can someone explain how DVC stores differences on the directory level in the DVC cache?
I understand that the DVC files (.dvc) are metafiles to track data and models and to reproduce pipeline stages. However, it is not clear to me how the process of creating branches, committing them, and switching back to master is actually saved as differences.
Short version:
The .dvc file contains info (an md5) about a JSON file inside the cache that describes the current state of the directory.
When the directory gets updated, there is a new md5 in the .dvc file and a new JSON file is created with the updated state of the directory.
In Git, you store the .dvc file, so that DVC knows (based on the md5) where to look for the information about the directory.
Longer version:
Let me try to break down the particular steps of directory handling with DVC.
Let's assume we have some data directory that you want to put under DVC control:
data
├── 1
└── 2
You use dvc add data to make DVC track your directory. As a result, DVC produces a data.dvc file. As you noted, this file contains the metadata required to connect your Git repository with your data storage. Inside this file (besides other things) you can see:
outs:
- md5: f437247ec66d73ba66b0ade0246fcb49.dir
  path: data
The md5 part is used to store information about the directory in the DVC cache (.dvc/cache):
(dvc3.7) ➜ repo$ tree .dvc/cache
.dvc/cache
├── 26
│   └── ab0db90d72e28ad0ba1e22ee510510
├── b0
│   └── 26324c6904b2a9cb4b88d6d61c81d1
└── f4
    └── 37247ec66d73ba66b0ade0246fcb49.dir
If you open the file with the .dir suffix, you will see that it contains a description of the current state of data:
(dvc3.7) ➜ repo$ cat .dvc/cache/f4/37247ec66d73ba66b0ade0246fcb49.dir
[{"md5": "b026324c6904b2a9cb4b88d6d61c81d1", "relpath": "1"},
{"md5": "26ab0db90d72e28ad0ba1e22ee510510", "relpath": "2"}]
As you can see, the particular files (1 and 2) are described by entries in this file.
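To make the cache layout concrete, here is a minimal Python sketch (not DVC's actual code) of the mapping: the first two hex characters of an md5 become the subdirectory, the rest the file name.

import hashlib
import os

def cache_path(cache_dir, md5):
    """Map an md5 hash to its location in the DVC cache."""
    return os.path.join(cache_dir, md5[:2], md5[2:])

def file_md5(path):
    """Compute the md5 of a file's contents, as DVC does for outputs."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

print(cache_path(".dvc/cache", "b026324c6904b2a9cb4b88d6d61c81d1"))
# .dvc/cache/b0/26324c6904b2a9cb4b88d6d61c81d1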
When you change your directory:
(dvc3.7) ➜ repo$ echo 3 >> data/3
(dvc3.7) ➜ repo$ dvc commit data.dvc
The content of data.dvc will be updated:
outs:
- md5: 12f4b7d54a32e58818e27fba28376fba.dir
  path: data
And there is a new file inside the cache:
├── 12
│   └── f4b7d54a32e58818e27fba28376fba.dir
...
(dvc3.7) ➜ repo$ cat .dvc/cache/12/f4b7d54a32e58818e27fba28376fba.dir
[{"md5": "b026324c6904b2a9cb4b88d6d61c81d1", "relpath": "1"},
{"md5": "26ab0db90d72e28ad0ba1e22ee510510", "relpath": "2"},
{"md5": "6d7fce9fee471194aa8b5b6e47267f03", "relpath": "3"}]
From the perspective of Git, the only change is inside data.dvc
(assuming you did git commit after adding the data with 1 and 2 inside):
diff --git a/data.dvc b/data.dvc
index 098aec5..88d1a90 100644
--- a/data.dvc
+++ b/data.dvc
@@ -1,6 +1,6 @@
-md5: a427c5bf8680fbf8d1951806b28b82fe
+md5: 1b674d61c195eea7a6b14f176c020b9c
 outs:
-- md5: f437247ec66d73ba66b0ade0246fcb49.dir
+- md5: 12f4b7d54a32e58818e27fba28376fba.dir
   path: data
   cache: true
   metric: false
NOTE: The first md5 is the md5 of the .dvc file itself, so it had to change along with the directory md5.
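Putting the pieces together, the updated data.dvc reconstructed from the diff above looks like:

md5: 1b674d61c195eea7a6b14f176c020b9c
outs:
- md5: 12f4b7d54a32e58818e27fba28376fba.dir
  path: data
  cache: true
  metric: false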

Is it possible to auto-generate documentation for pytest tests?

I have a project which contains only pytest tests, without modules or classes, that test a remote project.
E.g. the structure is:
.
├── __init__.py
├── test_basic_auth_app.py
├── test_basic_auth_user.py
├── test_advanced_app_id.py
├── test_advanced_user.py
└── test_oauth_auth.py
Tests look like this:
"""
Service requires credentials (app_id, app_key) to be passed using the Basic Auth
"""
import base64
import pytest
import authorising.auth
from authorising.resources import Service


@pytest.fixture(scope="module")
def service_settings(service_settings):
    """Set auth mode to app_id/app_key."""
    service_settings.update({"backend_version": Service.Auth_app})
    return service_settings


def test_basic_auth_app_id_key(application):
    """Test client access with Basic HTTP Auth using app id and app key.

    Configure Api/Service to use App ID / App Key Authentication
    and Basic HTTP Auth to pass the credentials.
    """
    credentials = application.authobj.credentials
    encoded = base64.b64encode(
        f"{credentials['app_id']}:{credentials['app_key']}".encode("utf-8")).decode("utf-8")
    response = application.test_request()
    assert response.status_code == 200
    assert response.request.headers["Auth"] == "Basic %s" % encoded
Is it possible to auto-generate documentation from the docstrings, e.g. using Sphinx?
You can use sphinx-apidoc to generate the test documentation automatically from the Python docstrings.
For instance, if you have a directory structure like below:
.
├── docs
│   ├── rst
│   └── html
└── tests
    ├── __init__.py
    ├── test_basic_auth_app.py
    ├── test_basic_auth_user.py
    ├── test_advanced_app_id.py
    ├── test_advanced_user.py
    └── test_oauth_auth.py
run:
sphinx-apidoc -o docs/rst tests
sphinx-build -a -b html docs/rst docs/html -j auto
All your HTML documentation files will then be under docs/html.
sphinx-apidoc supports multiple options; see the documentation: https://www.sphinx-doc.org/en/master/man/sphinx-apidoc.html
When using Sphinx, you should add your tests folder to the Python path in the conf.py file (together with the os and sys imports):
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..', 'tests')))
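For reference, a minimal conf.py sketch combining the path setup with the autodoc extension that the automodule directive needs (the project name is made up):

import os
import sys

# Make the tests package importable so automodule can read the docstrings.
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..', 'tests')))

project = 'My Test Suite'            # hypothetical name
extensions = ['sphinx.ext.autodoc']  # provides the automodule directive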
Then in each rst file you can simply write:
.. automodule:: test_basic_auth_app
   :members:
If you also want to document the test results, take a look at Sphinx-Test-Reports.

'length error in the tickerplant kdb+/q

When I start up tick.q (with sym.q) and feed.q, using the files provided, as follows:
q tick.q sym -p 5010
q feed.q
Github links: https://github.com/KxSystems/cookbook/tree/master/start/tick ,
https://github.com/KxSystems/kdb-tick
The tickerplant process prints a 'length error on every update, which usually occurs when an incorrect number of elements is passed: https://code.kx.com/wiki/Errors
I suspect that this happens when the feed process calls .u.upd
Any suggestions as to how to solve this problem?
Entering \e 1 on the command line will suspend execution and run the debugger when an error is hit, allowing you to see what failed and query the variables, which should help pinpoint what is causing the issue.
More about debugging here https://code.kx.com/q/ref/debug/
If you are using the plain vanilla tick setup from KX there is no reason for that error to appear.
Also, I think you need to start the feed as q feed.q -t 200, otherwise you will get no data coming through.
Usually the 'length error appears when the data does not match the table schema. So if you have the sym.q file (and it is loaded correctly) you should not have that issue.
Just to confirm this is the structure of your directory:
.
├── feed.q
├── README.md
├── tick
│   ├── r.q
│   ├── sym.q
│   └── u.q
└── tick.q
The sym.q file contains your table schema. If you change something in the feedhandler, the table schema in sym.q must match that change (i.e. if you add a column in the feed, you must also add a holder in the table for that column).
To debug, open a new q session on some port (e.g. 9999), add your schema definition there, and define .u.upd as an upsert that also captures its inputs, like this:
.u.upd:{[t;d]
  .test.t:t;
  .test.d:d;
  t upsert d
 }
Now point your feed at this q session and stream some data; this will let you analyse the .test variables in case of errors.

How patching works in Yocto

I am using a BeagleBone Black (BBB) to understand the Yocto Project. I am not sure how patching works. This is my project directory:
├── meta-testlayer
├── poky
The meta-testlayer contains a "helloworld" example:
├── conf
│   └── layer.conf
├── COPYING.MIT
├── README
└── recipes-hello
    └── helloworld
        ├── helloworld-0.1
        │   ├── helloworld.c
        │   ├── helloworld.patch
        │   └── newhelloworld.c
        └── helloworld_0.1.bb
"helloworld.c" and "newhelloworld.c" differ by only one printf() statement. Here is the content of "helloworld.c":
#include <stdio.h>

int main(int argc, char **argv)
{
    printf("Hi, this is my first custom recipe. Have a good day\n");
    return 0;
}
The content of "newhelloworld.c":
#include <stdio.h>

int main(int argc, char **argv)
{
    printf("Let see if patch works\n");
    printf("Hi, this patch is from the test-layer\n");
    return 0;
}
Here is the patch I created using the diff helloworld.c newhelloworld.c > helloworld.patch command:
6c6,7
< printf("Hi, this is my first custom recipe. Have a good day\n");
---
> printf("Let see if patch works\n");
> printf("Hi, this patch is from the test-layer\n");
This is the content of the "helloworld_0.1.bb" file:
SUMMARY = "Simple helloworld application"
SECTION = "examples"
LICENSE = "MIT"
LIC_FILES_CHKSUM = "file://${COMMON_LICENSE_DIR}/MIT;md5=0835ade698e0bcf8506ecda2f7b4f302"
#here we specify the source we want to build
SRC_URI = "file://helloworld.c"
SRC_URI += "file://helloworld.patch"
#here we specify the source directory, where we can do all the building and expect sources to be placed
S = "${WORKDIR}"
#bitbake task
do_compile() {
${CC} ${LDFLAGS} helloworld.c -o helloworld
}
#bitbake task
do_install() {
install -d ${D}${bindir}
install -m 0755 helloworld ${D}${bindir}
}
This is the error message when I run bitbake -c patch helloworld:
NOTE: Executing SetScene Tasks
NOTE: Executing RunQueue Tasks
ERROR: helloworld-0.1-r0 do_patch: Command Error: 'quilt --quiltrc /home/guest/yocto_practice/poky/build-beaglebone/tmp/work/cortexa8hf-neon-poky-linux-gnueabi/helloworld/0.1-r0/recipe-sysroot-native/etc/quiltrc push' exited with 0 Output:
Applying patch helloworld.patch
patch: **** Only garbage was found in the patch input.
Patch helloworld.patch does not apply (enforce with -f)
ERROR: helloworld-0.1-r0 do_patch: Function failed: patch_do_patch
ERROR: Logfile of failure stored in: /home/guest/yocto_practice/poky/build-beaglebone/tmp/work/cortexa8hf-neon-poky-linux-gnueabi/helloworld/0.1-r0/temp/log.do_patch.22267
ERROR: Task (/home/guest/yocto_practice/meta-testlayer/recipes-hello/helloworld/helloworld_0.1.bb:do_patch) failed with exit code '1'
NOTE: Tasks Summary: Attempted 11 tasks of which 8 didn't need to be rerun and 1 failed.
Summary: 1 task failed:
/home/guest/yocto_practice/meta-testlayer/recipes-hello/helloworld/helloworld_0.1.bb:do_patch
Summary: There were 2 ERROR messages shown, returning a non-zero exit code.
First, create the patch in unified format:
diff -u helloworld.c newhelloworld.c > helloworld.patch
or using Git (replace x with the number of commits you want to extract patches from):
git format-patch -x
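The "Only garbage" error in the question typically means patch could not parse the input as a usable patch: plain diff produces a "normal"-format diff with no filename headers, while the Yocto/quilt tooling expects a unified diff that names the file it modifies. With -u the patch carries headers and hunk markers, roughly like this (hunk numbers depend on your exact files):

--- helloworld.c
+++ newhelloworld.c
@@ -1,7 +1,8 @@
 #include <stdio.h>
 
 int main(int argc, char **argv)
 {
-    printf("Hi, this is my first custom recipe. Have a good day\n");
+    printf("Let see if patch works\n");
+    printf("Hi, this patch is from the test-layer\n");
     return 0;
 }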
There are two ways to apply the patch:
1. If the recipe is in your own layer, put the patch next to the sources and add a line to the .bb file:
SRC_URI += "file://helloworld.patch"
2. If the recipe lives in a layer that isn't yours (meta-oe, meta-fsl, meta-qt...), create a .bbappend file for that recipe and add:
FILESEXTRAPATHS:prepend := "${THISDIR}/${PN}:"
SRC_URI += "file://your.patch"
On releases older than honister (3.4), the override syntax uses an underscore instead of a colon: FILESEXTRAPATHS_prepend.

Custom CoffeeScript compilation with GruntJS

I have a Gruntfile, which takes all *.coffee files from a certain folder, and compiles them to JS files, keeping the same folder structure (if any).
So with a folder structure like:
scripts
|--widgets
| |--a.coffee
|--vendor
| |--b.coffee
|--c.coffee
|--d.coffee
It will generate the same folder structure, but with JS files instead of CoffeeScript files. I would like to have a separate rule for the widgets folder and the c.coffee file, i.e. I want to compile all of the contents of widgets plus c.coffee into one single file.
How can I exclude one file and one folder from the files property of the grunt object? This is the code I'm currently running:
files: [{
    expand: true,
    cwd: '<%= params.app %>/scripts',
    src: '{,*/}*.{coffee,litcoffee,coffee.md}',
    dest: '.tmp/scripts',
    ext: '.js'
}]
I've also seen that there are two syntaxes for declaring files: one as an object, and one as an array (what I have above). What is the difference, and would the other declaration suit my case better?
The documentation on configuring Grunt tasks covers what you want. There are actually four ways to define a files property, one of which is deprecated.
Here is a Gruntfile.coffee, because it is shorter. It uses the Files Array Format with exclusion patterns for the compile subtask and the Compact Format for the compileJoined subtask. I hope you are using grunt-contrib-coffee; grunt-coffee has been unmaintained for almost two years now and doesn't seem to have a join option.
module.exports = (grunt) ->
  grunt.initConfig
    params: app: '.' # ignore this, it's just that this file works as expected.
    coffee:
      compile:
        files: [
          cwd: '<%= params.app %>/scripts'
          expand: yes
          src: ['**/*.{coffee,litcoffee,coffee.md}' # everything coffee in the scripts dir
                '!c.coffee'      # exclude this
                '!widgets/**/*'] # and these
          dest: '.tmp/scripts'
          ext: '.js'
          extDot: 'first' # to make .js files from .coffee.md files
        ]
      compileJoined:
        options: join: yes
        # sadly you can't use expand here, so you'll have to do cwd "by hand".
        src: [
          '<%= params.app %>/scripts/c.coffee'
          '<%= params.app %>/scripts/widgets/**/*.{coffee,litcoffee,coffee.md}'
        ]
        dest: '.tmp/special.js'

  grunt.loadNpmTasks 'grunt-contrib-coffee'
Here's a small session transcript; it seems to work:
$ tree scripts
scripts
├── c.coffee
├── d.coffee
├── vendor
│   └── b.coffee
└── widgets
    └── a.coffee
2 directories, 4 files
$ rm -rf .tmp
$ grunt coffee
Running "coffee:compile" (coffee) task
>> 2 files created.
Running "coffee:compileJoined" (coffee) task
>> 1 files created.
Done, without errors.
$ tree .tmp
.tmp
├── scripts
│   ├── d.js
│   └── vendor
│       └── b.js
└── special.js
2 directories, 3 files
$ cat scripts/c.coffee
variableInC_coffee = "a variable"
$ cat scripts/widgets/a.coffee
variableInC_coffee = variableInC_coffee.replace /\s+/, '_'
$ cat .tmp/special.js
(function() {
  var variableInC_coffee;
  variableInC_coffee = "a variable";
  variableInC_coffee = variableInC_coffee.replace(/\s+/, '_');
}).call(this);