Airflow HTTP callback sensor

Our Airflow implementation sends HTTP requests to ask services to perform tasks. We want those services to let Airflow know when they complete their task, so we send each service a callback URL that it calls when the task is done. I can't seem to find a callback sensor, however. How do people normally handle this?

There is no such thing as a callback or webhook sensor in Airflow. The documentation defines sensors as follows:
Sensors are a certain type of operator that will keep running until a certain criterion is met. Examples include a specific file landing in HDFS or S3, a partition appearing in Hive, or a specific time of the day. Sensors are derived from BaseSensorOperator and run a poke method at a specified poke_interval until it returns True.
This means that a sensor is an operator that performs polling behavior on external systems. In that sense, your external services should have a way of keeping state for each executed task - either internally or externally - so that a polling sensor can check on that state.
With that in place you can use, for example, the airflow.operators.HttpSensor, which polls an HTTP endpoint until a condition is met. Better still, write your own custom sensor, which lets you do more complex processing and keep state.
Alternatively, if the service writes its output to a storage system, you can use a sensor that polls that system, for example a database.
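The state-keeping idea above can be sketched in plain Python, without Airflow. This is only an illustration under stated assumptions: TaskStateStore and the state names 'running', 'success' and 'failed' are hypothetical, and a real setup would keep the state in a database or behind an HTTP endpoint that the service's callback updates. The poke function mirrors the contract of a sensor's poke method: return True when done, False to be polled again.

```python
import threading


class TaskStateStore:
    """Hypothetical in-memory stand-in for wherever the service records task state."""

    def __init__(self):
        self._lock = threading.Lock()
        self._states = {}

    def set_state(self, task_id, state):
        # Called by the service (or its callback handler) when the state changes.
        with self._lock:
            self._states[task_id] = state

    def get_state(self, task_id):
        with self._lock:
            return self._states.get(task_id, 'unknown')


def poke(store, task_id):
    """Return True when the task is done, False to poll again; raise on failure."""
    state = store.get_state(task_id)
    if state == 'success':
        return True
    if state == 'failed':
        raise RuntimeError('Task %s failed' % task_id)
    return False
```

In Airflow the same check would live in the poke method of a custom sensor, and the scheduler-side machinery would call it every poke_interval seconds.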
I'm attaching a custom operator example that I've written for integrating with the Apache Livy API. The operator does two things: a) it submits a Spark job through the REST API and b) it waits for the job to complete.
The operator extends the SimpleHttpOperator and also implements the polling behavior of the HttpSensor, thus combining both functionalities.
import json
from datetime import datetime
from time import sleep

# Import paths as in Airflow 1.10.
from airflow.exceptions import AirflowException, AirflowSensorTimeout
from airflow.hooks.http_hook import HttpHook
from airflow.operators.http_operator import SimpleHttpOperator
from airflow.utils.decorators import apply_defaults


class LivyBatchOperator(SimpleHttpOperator):
    """
    Submits a new Spark batch job through
    the Apache Livy REST API.
    """

    template_fields = ('args',)
    ui_color = '#f4a460'

    @apply_defaults
    def __init__(self,
                 name,
                 className,
                 file,
                 executorMemory='1g',
                 driverMemory='512m',
                 driverCores=1,
                 executorCores=1,
                 numExecutors=1,
                 args=None,
                 conf=None,
                 timeout=120,
                 http_conn_id='apache_livy',
                 *arguments, **kwargs):
        """
        If xcom_push is True, response of an HTTP request will also
        be pushed to an XCom.
        """
        super(LivyBatchOperator, self).__init__(
            endpoint='batches', *arguments, **kwargs)

        self.http_conn_id = http_conn_id
        self.method = 'POST'
        self.endpoint = 'batches'
        self.name = name
        self.className = className
        self.file = file
        self.executorMemory = executorMemory
        self.driverMemory = driverMemory
        self.driverCores = driverCores
        self.executorCores = executorCores
        self.numExecutors = numExecutors
        # avoid sharing mutable default arguments between instances
        self.args = args if args is not None else []
        self.conf = conf if conf is not None else {}
        self.timeout = timeout
        self.poke_interval = 10

    def execute(self, context):
        """
        Executes the task
        """
        payload = {
            "name": self.name,
            "className": self.className,
            "executorMemory": self.executorMemory,
            "driverMemory": self.driverMemory,
            "driverCores": self.driverCores,
            "executorCores": self.executorCores,
            "numExecutors": self.numExecutors,
            "file": self.file,
            "args": self.args,
            "conf": self.conf
        }
        print(payload)

        headers = {
            'X-Requested-By': 'airflow',
            'Content-Type': 'application/json'
        }

        http = HttpHook(self.method, http_conn_id=self.http_conn_id)

        self.log.info("Submitting batch through Apache Livy API")

        response = http.run(self.endpoint,
                            json.dumps(payload),
                            headers,
                            self.extra_options)

        # parse the JSON response
        obj = json.loads(response.content)

        # get the new batch Id
        self.batch_id = obj['id']

        self.log.info('Batch successfully submitted with Id %s', self.batch_id)

        # start polling the batch status
        started_at = datetime.utcnow()
        while not self.poke(context):
            if (datetime.utcnow() - started_at).total_seconds() > self.timeout:
                raise AirflowSensorTimeout('Snap. Time is OUT.')

            sleep(self.poke_interval)

        self.log.info("Batch %s has finished", self.batch_id)

    def poke(self, context):
        '''
        Function that the sensors defined while deriving this class should
        override.
        '''
        http = HttpHook(method='GET', http_conn_id=self.http_conn_id)

        self.log.info("Calling Apache Livy API to get batch status")

        # call the API endpoint
        endpoint = 'batches/' + str(self.batch_id)
        response = http.run(endpoint)

        # parse the JSON response
        obj = json.loads(response.content)

        # get the current state of the batch
        state = obj['state']

        # check the batch state
        if (state == 'starting') or (state == 'running'):
            # if state is 'starting' or 'running'
            # signal a new polling cycle
            self.log.info('Batch %s has not finished yet (%s)',
                          self.batch_id, state)
            return False
        elif state == 'success':
            # if state is 'success' exit
            return True
        else:
            # for all other states
            # raise an exception and
            # terminate the task
            raise AirflowException(
                'Batch ' + str(self.batch_id) + ' failed (' + state + ')')
Hope this will help you a bit.

Related

ForkingPickler: TypeError: cannot pickle 'memoryview' object

I am trying to send and receive pickled versions of a random value generated by the producer. I am using the multiprocess (not multiprocessing) module and ForkingPickler to pickle and queue the generated value. However, when I run the sample program below, I get the error shown. The reason for using ForkingPickler is that I want to pickle socket objects in the future; for now I am testing with a sample version. Is this a feasible way to go about pickling socket objects?
rv = reduce(self.proto)
TypeError: cannot pickle 'memoryview' object
# The question did not show its imports; these are the likely ones.
from random import random
from time import sleep

from multiprocess import JoinableQueue, Process
from multiprocess.reduction import ForkingPickler


def producer(queue):
    print('Producer: Running', flush=True)
    # generate work
    for i in range(10):
        # generate a value
        value = random()
        # block
        sleep(value)
        # add to the queue
        fork_value = ForkingPickler.dumps(value)
        queue.put(fork_value)
    # all done
    queue.put(None)
    print(f'Queue Size Consumer: {queue.qsize()}', flush=True)
    print('Producer: Done', flush=True)


# consume work
def consumer(queue):
    print('Consumer: Running', flush=True)
    # consume work
    while True:
        print(f'Queue Size Consumer: {queue.qsize()}', flush=True)
        # get a unit of work
        fork_value = queue.get()
        item = ForkingPickler.loads(fork_value)
        # check for stop
        if item is None:
            break
        # report
        print(f'>got {item}', flush=True)
    # all done
    print('Consumer: Done', flush=True)


# entry point
if __name__ == '__main__':
    # create the shared queue
    queue = JoinableQueue()
    # start the consumer
    consumer_process = Process(target=consumer, args=(queue,))
    consumer_process.start()
    # start the producer
    producer_process = Process(target=producer, args=(queue,))
    producer_process.start()
    # wait for all processes to finish
    consumer_process.join()
    producer_process.join()
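A possible reading of the error, offered as a sketch rather than a confirmed diagnosis: in the standard library's multiprocessing.reduction, ForkingPickler.dumps returns a memoryview, and queue.put() then tries to pickle that memoryview a second time, which pickle refuses. The snippet below uses the standard-library multiprocessing and assumes the multiprocess fork behaves the same way; dumps_to_bytes is a hypothetical helper name.

```python
import pickle
from multiprocessing.reduction import ForkingPickler


def dumps_to_bytes(value):
    """Pickle with ForkingPickler and copy the result into a plain bytes object."""
    return bytes(ForkingPickler.dumps(value))


raw = ForkingPickler.dumps(0.25)       # a memoryview, which pickle cannot serialize
payload = dumps_to_bytes(0.25)         # plain bytes, safe to put on a queue
restored = ForkingPickler.loads(payload)
print(type(raw).__name__, type(payload).__name__, restored)
```

Note that the None sentinel in the program above is put on the queue unpickled, so the consumer would also need to check for None before calling ForkingPickler.loads.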

gRPC parallel stream communication leads to error: AkkaNettyGrpcClientGraphStage

I have two services: one sends stream data and the other receives it, using akka-grpc for communication. When source data is provided, service one is called to process it and send it to service two via the gRPC client. Multiple instances of service one may run at the same time when several sets of source data are provided at once. In a long-running test of my application, I see the error below in service one:
ERROR i.a.g.application.actors.DbActor - GraphStage [akka.grpc.internal.AkkaNettyGrpcClientGraphStage$$anon$1@59d40805] terminated abruptly, caused by for example materializer or act
akka.stream.AbruptStageTerminationException: GraphStage [akka.grpc.internal.AkkaNettyGrpcClientGraphStage$$anon$1@59d40805] terminated abruptly, caused by for example materializer or actor system termination.
I never shut down the actor systems; I only kill actors after they have done their job. I use proto3 and HTTP/2 for request binding. Here is a piece of my code in service one:
//////////////////// server http binding /////////
val service: HttpRequest => Future[HttpResponse] =
  ServiceOneServiceHandler(new ServiceOneServiceImpl(system))
val bound = Http().bindAndHandleAsync(
  service,
  interface = config.getString("akka.grpc.server.interface"),
  port = config.getString("akka.grpc.server.default-http-port").toInt,
  connectionContext = HttpConnectionContext(http2 = Always))
bound.foreach { binding =>
  logger.info(s"gRPC server bound to: ${binding.localAddress}")
}
//////////////////// client /////////
def send2Server[A](data: ListBuffer[A]): Future[ResponseDTO] = {
  val reply = {
    val thisClient = interface.initialize()
    interface.call(client = thisClient, req = data.asInstanceOf[ListBuffer[StoreRequest]].toList)
  }
  reply
}
///////////////// grpc communication //////////
def send2GrpcServer[A](data: ListBuffer[A]): Unit = {
  val reply = send2Server(data)
  Await.ready(reply, Duration.Inf) onComplete {
    case util.Success(response: ResponseDTO) =>
      logger.info(s"got reply message: ${response.description}")
      // check response content and stop application if desired result not found in response
    case util.Failure(exp) =>
      // stop application
      throw exp.getCause
  }
}
The error occurs exactly while waiting for the response from service two:
Await.ready(reply, Duration.Inf)
I can't find the cause of the error.
UPDATE
I found that some streams go missing: service one sends a stream and waits indefinitely for the response, while service two never receives anything to reply to. I still don't know why the stream is lost.
I also updated the akka-grpc plugin, but it made no difference:
addSbtPlugin("com.lightbend.akka.grpc" % "sbt-akka-grpc" % "0.6.1")
addSbtPlugin("com.lightbend.sbt" % "sbt-javaagent" % "0.1.4")

Remove trailing bits from hex pyModBus

I want to build a function that sends a request from Modbus to serial in hex. I more or less have a working function but have two issues.
Issue 1
[b'\x06', b'\x1c', b'\x00!', b'\r', b'\x1e', b'\x1d\xd3', b'\r', b'\n', b'\x1e', b'\x1d']
I can't remove the b'\r', b'\n' part using the .split('\r \n') method since it's not a string.
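One way to approach Issue 1, as a minimal sketch: since the data is a list of bytes objects rather than a string, the carriage-return and line-feed chunks can be filtered out with a list comprehension. The helper name drop_line_endings is hypothetical, and the sketch assumes that every bare b'\r' or b'\n' chunk should be dropped, which may not hold if the protocol needs some of them.

```python
def drop_line_endings(parts):
    """Return the chunks of `parts` that are not a bare CR or LF byte."""
    return [p for p in parts if p not in (b'\r', b'\n')]


frame = [b'\x06', b'\x1c', b'\x00!', b'\r', b'\x1e', b'\x1d\xd3',
         b'\r', b'\n', b'\x1e', b'\x1d']
cleaned = drop_line_endings(frame)
print(cleaned)
```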
Issue 2
When I read the value from holding register 40 (33) and use the .to_bytes() method, I keep getting b'\x00!', b'\r' and I'm expecting b'\x21'
r = client.read_holding_registers(40)
re = r.registers[0]
req = re.to_bytes(2, 'big')
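A short sketch of what is probably happening in Issue 2: to_bytes(2, 'big') pads the value to two bytes, and Python's repr shows byte 0x21 as the printable ASCII character '!'. So b'\x00!' is 0x00 followed by 0x21, and a width of 1 yields the single byte that was expected. Whether the receiving device wants one byte or two is an assumption to check against the protocol.

```python
value = 33  # 0x21, which is ASCII '!'

two_bytes = value.to_bytes(2, 'big')   # zero-padded to two bytes
one_byte = value.to_bytes(1, 'big')    # a single byte

print(two_bytes, one_byte, one_byte.hex())
```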
My functions to generate the request and send it through pyserial:
def scanned_code():
    code = client.read_holding_registers(0)
    # code2= client.re
    r = code.registers[0]
    return r


def send_request(data):
    """ Takes input from create_request() and sends data to serial port"""
    try:
        # data is a list of bytes objects; write them one by one
        for packet in data:
            serial_client.write(packet)
            # serial_client.writelines(packet)
    except Exception:
        print('could not send the packet <<<--------------------')


def create_request(job):
    """ Request type is 33 looks for job
    [06]
    [1c]
    req=33[0d][0a]
    job=30925[0d][0a][1e]
    [1d]
    """
    r = client.read_holding_registers(40)
    re = r.registers[0]
    req = re.to_bytes(2, 'big')
    num = job.to_bytes(2, 'big')
    data = [
        b'\x06',
        b'\x1C',
        req,
        b'\x0D',
        b'\x1E',
        num,
        b'\x0D',
        b'\x0A',
        b'\x1E',
        b'\x1D'
    ]
    print(data)
    # return the packet list so send_request() can consume it
    return data


while True:
    # verify order_trigger() is True.
    while order_trigger() != False:
        print('inside while loop')
        # set flag coil back to 0
        reset_trigger()
        # get Job no.
        job = scanned_code()
        # check for JOB No. dif. than 0
        if job != 0:
            print(scanned_code())
            send_request(create_request(job))
            # send job request to host to get job data
            # send_request()
            # if TRUE send job request by serial to DVI client
            # get job request data
            # translate job request data to modbus
            # send data to plc
        else:
            print(' no scanned code')
            break
        time.sleep(INTERNAL_SLEEP_TIME)
    print('outside loop')
    time.sleep(EXTERNAL_SLEEP_TIME)
As an additional question, is this the proper way of doing things?

Kubernetes API call equivalent to 'kubectl apply'

I am trying to use the master API to update resources.
In 1.2, to update a deployment resource I run kubectl apply -f new updateddeployment.yaml
How do I perform the same action with the API?
I checked the code in pkg/kubectl/cmd/apply.go and I think the following lines of code show what happens behind the scenes when you run kubectl apply -f:
// Compute a three way strategic merge patch to send to server.
patch, err := strategicpatch.CreateThreeWayMergePatch(original, modified, current,
    versionedObject, true)
helper := resource.NewHelper(info.Client, info.Mapping)
_, err = helper.Patch(info.Namespace, info.Name, api.StrategicMergePatchType, patch)
And here is the code helper.Patch:
func (m *Helper) Patch(namespace, name string, pt api.PatchType, data []byte) (runtime.Object, error) {
    return m.RESTClient.Patch(pt).
        NamespaceIfScoped(namespace, m.NamespaceScoped).
        Resource(m.Resource).
        Name(name).
        Body(data).
        Do().
        Get()
}
This API is not really convincingly designed, since it forces us to reimplement such basic stuff at the client side...
Anyway, here is my attempt to reinvent the hexagonal wheel in Python...
Python module kube_apply
Usage is like kube_apply.fromYaml(myStuff)
It can read strings or opened file streams (via the yaml library)
It handles YAML files with several concatenated objects
The implementation is rather braindead: it first attempts to insert the resource. If this fails, it tries a patch, and if this also fails, it deletes the resource and inserts it anew.
File: kube_apply.py
#!/usr/bin/python3
# coding: utf-8
# __________   ________________________________________________   #
# kube_apply - apply Yaml similar to kubectl apply -f file.yaml   #
#                                                                 #
# (C) 2019 Hermann Vosseler <Ichthyostega@web.de>                 #
# This is OpenSource software; licensed under Apache License v2+  #
# ############################################################### #
'''
Utility for the official Kubernetes python client: apply Yaml data.
While still limited to some degree, this utility attempts to provide
functionality similar to `kubectl apply -f`
- load and parse Yaml
- try to figure out the object type and API to use
- figure out if the resource already exists, in which case
  it needs to be patched or replaced altogether.
- otherwise just create a new resource.

Based on inspiration from `kubernetes/utils/create_from_yaml.py`

@since: 2/2019
@author: Ichthyostega
'''

import re
import yaml
import logging

import kubernetes.client


def runUsageExample():
    ''' demonstrate usage by creating a simple Pod through default client
    '''
    logging.basicConfig(level=logging.DEBUG)
    #
    # KUBECONFIG = '/path/to/special/kubecfg.yaml'
    # import kubernetes.config
    # client = kubernetes.config.new_client_from_config(config_file=KUBECONFIG)
    # # --or alternatively--
    # kubernetes.config.load_kube_config(config_file=KUBECONFIG)

    fromYaml('''
kind: Pod
apiVersion: v1
metadata:
  name: dummy-pod
  labels:
    blow: job
spec:
  containers:
  - name: sleepr
    image: busybox
    command:
    - /bin/sh
    - -c
    - sleep 24000
''')


def fromYaml(rawData, client=None, **kwargs):
    ''' invoke the K8s API to create or replace an object given as YAML spec.
        @param rawData: either a string or an opened input stream with a
                        YAML formatted spec, as you'd use for `kubectl apply -f`
        @param client: (optional) preconfigured client environment to use for invocation
        @param kwargs: (optional) further arguments to pass to the create/replace call
        @return: response object from Kubernetes API call
    '''
    # safe_load_all avoids constructing arbitrary Python objects from the YAML
    for obj in yaml.safe_load_all(rawData):
        createOrUpdateOrReplace(obj, client, **kwargs)


def createOrUpdateOrReplace(obj, client=None, **kwargs):
    ''' invoke the K8s API to create or replace a kubernetes object.
        The first attempt is to create(insert) this object; when this is rejected because
        of an existing object with same name, we attempt to patch this existing object.
        As a last resort, if even the patch is rejected, we *delete* the existing object
        and recreate from scratch.
        @param obj: complete object specification, including API version and metadata.
        @param client: (optional) preconfigured client environment to use for invocation
        @param kwargs: (optional) further arguments to pass to the create/replace call
        @return: response object from Kubernetes API call
    '''
    k8sApi = findK8sApi(obj, client)
    try:
        res = invokeApi(k8sApi, 'create', obj, **kwargs)
        logging.debug('K8s: %s created -> uid=%s', describe(obj), res.metadata.uid)
    except kubernetes.client.rest.ApiException as apiEx:
        if apiEx.reason != 'Conflict': raise
        try:
            # asking for forgiveness...
            res = invokeApi(k8sApi, 'patch', obj, **kwargs)
            logging.debug('K8s: %s PATCHED -> uid=%s', describe(obj), res.metadata.uid)
        except kubernetes.client.rest.ApiException as apiEx:
            if apiEx.reason != 'Unprocessable Entity': raise
            try:
                # second attempt... delete the existing object and re-insert
                logging.debug('K8s: replacing %s FAILED. Attempting deletion and recreation...', describe(obj))
                res = invokeApi(k8sApi, 'delete', obj, **kwargs)
                logging.debug('K8s: %s DELETED...', describe(obj))
                res = invokeApi(k8sApi, 'create', obj, **kwargs)
                logging.debug('K8s: %s CREATED -> uid=%s', describe(obj), res.metadata.uid)
            except Exception as ex:
                message = 'K8s: FAILURE updating %s. Exception: %s' % (describe(obj), ex)
                logging.error(message)
                raise RuntimeError(message)
    return res


def patchObject(obj, client=None, **kwargs):
    k8sApi = findK8sApi(obj, client)
    try:
        res = invokeApi(k8sApi, 'patch', obj, **kwargs)
        logging.debug('K8s: %s PATCHED -> uid=%s', describe(obj), res.metadata.uid)
        return res
    except kubernetes.client.rest.ApiException as apiEx:
        if apiEx.reason == 'Unprocessable Entity':
            message = 'K8s: patch for %s rejected. Exception: %s' % (describe(obj), apiEx)
            logging.error(message)
            raise RuntimeError(message)
        else:
            raise


def deleteObject(obj, client=None, **kwargs):
    k8sApi = findK8sApi(obj, client)
    try:
        res = invokeApi(k8sApi, 'delete', obj, **kwargs)
        logging.debug('K8s: %s DELETED. uid was: %s', describe(obj), res.details and res.details.uid or '?')
        return True
    except kubernetes.client.rest.ApiException as apiEx:
        if apiEx.reason == 'Not Found':
            logging.warning('K8s: %s does not exist (anymore).', describe(obj))
            return False
        else:
            message = 'K8s: deleting %s FAILED. Exception: %s' % (describe(obj), apiEx)
            logging.error(message)
            raise RuntimeError(message)


def findK8sApi(obj, client=None):
    ''' Investigate the object spec and lookup the corresponding API object
        @param client: (optional) preconfigured client environment to use for invocation
        @return: a client instance wired to the appropriate API
    '''
    grp, _, ver = obj['apiVersion'].partition('/')
    if ver == '':
        ver = grp
        grp = 'core'
    # Strip 'k8s.io', camel-case-join dot separated parts. rbac.authorization.k8s.io -> RbacAuthorization
    grp = ''.join(part.capitalize() for part in grp.rsplit('.k8s.io', 1)[0].split('.'))
    ver = ver.capitalize()

    k8sApi = '%s%sApi' % (grp, ver)
    return getattr(kubernetes.client, k8sApi)(client)


def invokeApi(k8sApi, action, obj, **args):
    ''' find a suitable function and perform the actual API invocation.
        @param k8sApi: client object for the invocation, wired to correct API version
        @param action: either 'create' (to inject a new object) or 'replace','patch','delete'
        @param obj: the full object spec to be passed into the API invocation
        @param args: (optional) extraneous arguments to pass
        @return: response object from Kubernetes API call
    '''
    # transform ActionType from Yaml into action_type for swagger API
    kind = camel2snake(obj['kind'])
    # determine namespace to place the object in, supply default
    try: namespace = obj['metadata']['namespace']
    except: namespace = 'default'

    functionName = '%s_%s' %(action,kind)
    if hasattr(k8sApi, functionName):
        # namespace agnostic API
        function = getattr(k8sApi, functionName)
    else:
        functionName = '%s_namespaced_%s' %(action,kind)
        function = getattr(k8sApi, functionName)
        args['namespace'] = namespace
    if not 'create' in functionName:
        args['name'] = obj['metadata']['name']
    if 'delete' in functionName:
        from kubernetes.client.models.v1_delete_options import V1DeleteOptions
        obj = V1DeleteOptions()

    return function(body=obj, **args)


def describe(obj):
    return "%s '%s'" % (obj['kind'], obj['metadata']['name'])


def camel2snake(string):
    string = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', string)
    string = re.sub('([a-z0-9])([A-Z])', r'\1_\2', string).lower()
    return string


if __name__=='__main__':
    runUsageExample()
You could install the kubectl binary and invoke it from within your Python program a la:
exec(f"kubectl apply -f - <<EOF{yaml_manifests}EOF --prune")
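The line above reads as shorthand rather than working Python. A minimal sketch of the same idea using subprocess, assuming kubectl is installed and on the PATH: the manifests are piped to kubectl on stdin via `-f -`. The function names build_apply_command and kubectl_apply are hypothetical. Because kubectl expects a label selector (or --all) together with --prune, the sketch only adds --prune when a selector is given.

```python
import subprocess


def build_apply_command(prune_selector=None):
    """Build the kubectl argument list; manifests are read from stdin ('-f -')."""
    cmd = ["kubectl", "apply", "-f", "-"]
    if prune_selector is not None:
        # --prune is combined with a label selector here
        cmd += ["--prune", "-l", prune_selector]
    return cmd


def kubectl_apply(yaml_manifests, prune_selector=None):
    """Pipe the YAML text to kubectl and return its stdout; raises on failure."""
    result = subprocess.run(
        build_apply_command(prune_selector),
        input=yaml_manifests,
        text=True,
        capture_output=True,
        check=True,
    )
    return result.stdout
```

Usage would be along the lines of kubectl_apply(yaml_manifests), where yaml_manifests may contain several documents separated by `---`.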
Once server-side apply is ready this problem should get a bit easier, as there will effectively be a k8s API endpoint you can hit (though it still doesn't sound like it will be resource-agnostic, i.e. you will still have to PATCH /api/v1/some-k8s-resource specifically, whereas with kubectl apply you can input some heterogeneous list of resources).

Check status of build request sent by "buildbot sendchange" command

I have a case where I am able to successfully trigger a build in buildbot by using sendchange command. While this works, I am unable to find a command to check if the build that was triggered by sendchange has finished.
Is there a way to achieve this in buildbot?
Thanks!
Since buildbot is asynchronous, you will need to poll the builder for builds that match your sendchange, and then poll that build for build status. Using e.g. python, it's fairly trivial using requests (https://pypi.python.org/pypi/requests) to retrieve a build's json and examine the state from the command line.
The "API" in this case is to use requests.get(url).json() and traverse the buildbot builds looking for your change request. The buildbot json is documented in the "REST API" section of the docs (http://docs.buildbot.net/latest/developer/rest.html), you'll have to hunt to figure out how change requests are stored.
Here's some code that will get you started:
import pprint
import requests


def get_url_base(serv, port):
    return 'http://%(serv)s:%(port)d' % locals()


def get_bldr_json(serv, port, bldr):
    url = 'http://%(serv)s:%(port)d/json/builders/%(bldr)s' % locals()
    print("get_bldr_json: %s ..." % url)
    jdata = requests.get(url).json()
    print("DEBUG: get_bldr_json:", pprint.pformat(jdata))
    return jdata


def get_bld_json(serv, port, bldr, bnum):
    url = 'http://%(serv)s:%(port)d/json/builders/%(bldr)s/builds/%(bnum)s' % locals()
    print("get_bld_json: %s ..." % url)
    jdata = requests.get(url).json()
    print("DEBUG: get_bld_json:", pprint.pformat(jdata))
    return jdata


# you'll have to set these values for your buildbot
serv, port, bldr = ('hexbotserver', 8010, 'buildername')

bldr_json = get_bldr_json(serv, port, bldr)
for bnum in bldr_json['cachedBuilds']:
    bld_json = get_bld_json(serv, port, bldr, bnum)
    print("build properties:")
    pprint.pprint(dict(bld_json)['properties'])
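Once the matching build number is known, the polling step described above could be sketched as follows. This is a hedged illustration: the names is_finished and wait_for_build are hypothetical, and the check assumes that the build JSON carries a 'times' pair of [start, end] whose end stays null while the build is running. Verify that against the JSON your buildbot version actually returns.

```python
import time


def is_finished(build_json):
    """Assumption: a finished build has a non-null end time in 'times'."""
    times = build_json.get('times') or [None, None]
    return len(times) > 1 and times[1] is not None


def wait_for_build(fetch, interval=5.0, timeout=600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until the build is finished or the timeout expires."""
    deadline = clock() + timeout
    while True:
        build = fetch()
        if is_finished(build):
            return build
        if clock() >= deadline:
            raise TimeoutError('build did not finish within %s seconds' % timeout)
        sleep(interval)
```

With the functions above, a call might look like wait_for_build(lambda: get_bld_json(serv, port, bldr, bnum)).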