watchman: I am missing file deletions happening before subscription

I am missing deletes in watchman. Version 4.9.0, inotify.
My test code:
#!/usr/bin/env python3
import pathlib
import pywatchman
w = pywatchman.client()
w.query('watch', '/tmp/z')
clock = w.query('clock', '/tmp/z')['clock']
print(clock)
q = w.query('subscribe', '/tmp/z', 'Buffy', {'expression': ["since", clock],
    "fields": ["name", "exists", "oclock", "ctime_ns", "new", "mode"]})
print(q)
f = pathlib.Path('/tmp/z/xx')
f.touch()
data = w.receive()
clock = data['clock']
print()
print('Touch file:')
print(data)
print('Clock:', clock)
f.unlink()
print()
print('Delete file:')
print(w.receive())
w.close()
w = pywatchman.client(timeout=99999)
q = w.query('subscribe', '/tmp/z', 'Buffy', {'expression': ["since", clock],
    "fields": ["name", "exists", "oclock", "ctime_ns", "new", "mode"]})
print(q)
print()
print('We request changes since', clock)
print(w.receive())
w.close()
What I am seeing:
We create the file. We receive the notification of the new file and the directory change. GOOD. We take note of the "clock" of this notification.
We delete the file. We get the notification of the file deletion. GOOD. We DO NOT get the notification of the directory change.
Now imagine that the process crashes BEFORE it can update its internal state, but it remembers the changes notified in step 1 (directory update and creation of a new file). That is, transaction 1 is processed, but the program crashes before transaction 2 is processed.
We now open a new subscription to watchman (remember, we are simulating a crash) and request changes since step 1. I am simulating a recovery, where the program reboots, notices that transaction 1 was OK (the file is present) and requests further changes (it should get the deletion).
I would expect to get a file deletion but I get... NOTHING. CATASTROPHIC.
Transcript:
$ ./watchman-bug.py
c:1517109517:10868:3:23
{'clock': 'c:1517109517:10868:3:23', 'subscribe': 'Buffy', 'version': '4.9.0'}
Touch file:
{'unilateral': True, 'subscription': 'Buffy', 'root': '/tmp/z', 'files': [{'name': 'xx', 'exists': True, 'oclock': 'c:1517109517:10868:3:24', 'ctime_ns': 1517114230070245747, 'new': True, 'mode': 33188}], 'is_fresh_instance': False, 'version': '4.9.0', 'since': 'c:1517109517:10868:3:23', 'clock': 'c:1517109517:10868:3:24'}
Clock: c:1517109517:10868:3:24
Delete file:
{'unilateral': True, 'subscription': 'Buffy', 'root': '/tmp/z', 'files': [{'name': 'xx', 'exists': False, 'oclock': 'c:1517109517:10868:3:25', 'ctime_ns': 1517114230070245747, 'new': False, 'mode': 33188}], 'is_fresh_instance': False, 'version': '4.9.0', 'since': 'c:1517109517:10868:3:24', 'clock': 'c:1517109517:10868:3:25'}
{'clock': 'c:1517109517:10868:3:25', 'subscribe': 'Buffy', 'version': '4.9.0'}
We request changes since c:1517109517:10868:3:24
The process hangs expecting the deletion notification.
What am I doing wrong?
Thanks for your time and knowledge!

The issue is that you're using a since expression term rather than informing watchman to use the since generator (the recency index).
What's the difference? You can think of this as the difference between the FROM and WHERE clauses in SQL. The expression field is similar in intent to the WHERE clause: it applies to the matched results and filters them down, but what you wanted to do is specify the FROM clause by setting the since field in the query spec. This is admittedly a subtle difference.
The solution is to remove the expression term and add the generator term like this:
q = w.query('subscribe', '/tmp/z', 'Buffy',
            {"since": clock,
             "fields": ["name", "exists", "oclock",
                        "ctime_ns", "new", "mode"]})
While we don't really have any documentation on the use of the pywatchman API, you can borrow the concepts from the slightly better documented nodejs API; here's a relevant snippet:
https://facebook.github.io/watchman/docs/nodejs.html#subscribing-only-to-changed-files
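Applied to the recovery half of the script in the question, the resumed subscription becomes something like this (a sketch only; the clock value is the one printed in the transcript just before the simulated crash):
#!/usr/bin/env python3
import pywatchman

# Clock saved from the last processed notification (from the transcript above),
# i.e. the point the simulated crash "remembers".
clock = 'c:1517109517:10868:3:24'

w = pywatchman.client(timeout=99999)
# Use the "since" generator in the query spec (the FROM clause) rather than
# a ["since", clock] expression term (the WHERE clause).
q = w.query('subscribe', '/tmp/z', 'Buffy',
            {"since": clock,
             "fields": ["name", "exists", "oclock", "ctime_ns", "new", "mode"]})
print(q)
print('We request changes since', clock)
# The first unilateral response should now report the deletion,
# e.g. {'name': 'xx', 'exists': False, ...}
print(w.receive())
w.close()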


JupyterHub - log current user

I use a custom logger to log who is currently doing any kind of stuff in JupyterHub.
logging_config: dict = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "company": {
            "()": lambda: MyFormatter(user=os.environ.get("JUPYTERHUB_USER", "Unknown"))
        },
    },
    ....
}
c.Application.logging_config = logging_config
Output:
{"asctime": "2022-06-29 14:13:43,773", "level": "WARNING", "name": "JupyterHub", "message": "Updating Hub route http://127.0.0.1:8081 \u2192 http://jupyterhub:8081", "user": "Unknown"
The logger itself works fine, but I am not able to log who was performing the action. In the image I start, there is a JUPYTERHUB_USER env variable available. This seems to get passed from JupyterHub (I don't know how this is done exactly), but in JupyterHub itself I don't have this variable available.
Is there a way to use it in JupyterHub, not just in the JupyterLab container?
This doesn't get you all the way there but it's a start - we add extra pod annotations/labels through KubeSpawner's extra_annotations using the cluster_options hook (see our helm chart for our complete daskhub setup):
dask-gateway:
  gateway:
    extraConfig:
      optionHandler: |
        from dask_gateway_server.options import Options, String, Select, Mapping, Float, Bool
        from math import ceil

        def cluster_options(user):
            def option_handler(options):
                extra_annotations = {
                    "hub.jupyter.org/username": user.name
                }
                default_extra_labels = {
                    "hub.jupyter.org/username": user.name,
                }
            return Options(
                Select(
                    ...
                ),
                ...,
                handler=option_handler,
            )
        c.Backend.cluster_options = cluster_options
You can then poll pods with these labels to get real time usage. There may be a more direct way to do this though - not sure.
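For the polling part, here is a minimal sketch using the official kubernetes Python client; the namespace "daskhub" and the username value are assumptions, and the label key is the one set in the option handler above:
# Sketch: list the pods carrying the per-user label added by the option handler above.
# Assumes kubeconfig credentials; the namespace and username below are made up.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in-cluster
v1 = client.CoreV1Api()

pods = v1.list_namespaced_pod(
    namespace="daskhub",
    label_selector="hub.jupyter.org/username=some-user",
)
for pod in pods.items:
    print(pod.metadata.name, pod.status.phase)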

How to create a schedule in PagerDuty using the REST API and Python

I went through the documentation of PagerDuty but am still not able to understand what parameters to send in the request body, and I'm also having trouble understanding how to make the API request. If anyone can share sample code for creating a PagerDuty schedule, that would help me a lot.
Below is the sample code to create schedules in PagerDuty.
Each list can have multiple items (to add more users / layers)
import requests
url = "https://api.pagerduty.com/schedules?overflow=false"
payload = {
    "schedule": {
        "schedule_layers": [
            {
                "start": "<dateTime>",  # Start time of layer | "start": "2021-01-01T00:00:00+05:30",
                "users": [
                    {
                        "user": {
                            "id": "<string>",  # ID of user to add in layer
                            "summary": "<string>",
                            "type": "<string>",  # "type": "user"
                            "self": "<url>",
                            "html_url": "<url>"
                        }
                    }
                ],
                "rotation_virtual_start": "<dateTime>",  # Start of layer | "rotation_virtual_start": "2021-01-01T00:00:00+05:30",
                "rotation_turn_length_seconds": "<integer>",  # Layer rotation, for multiple user switching | "rotation_turn_length_seconds": <seconds>,
                "id": "<string>",  # Auto-generated. Only needed if you want to update an existing schedule layer
                "end": "<dateTime>",  # End time of layer | "end": "2021-01-01T00:00:00+05:30",
                "restrictions": [
                    {
                        "type": "<string>",  # To restrict a shift to certain timings (daily, weekly, etc.) | "type": "daily_restriction",
                        "duration_seconds": "<integer>",  # Duration of restriction | "duration_seconds": "300",
                        "start_time_of_day": "<partial-time>",  # Start time of restriction | "start_time_of_day": "00:00:00",
                        "start_day_of_week": "<integer>"
                    }
                ],
                "name": "<string>",  # Name to give layer
            }
        ],
        "time_zone": "<activesupport-time-zone>",  # Timezone to set for layer and its timings | "time_zone": "Asia/Kolkata",
        "type": "schedule",
        "name": "<string>",  # Name to give schedule
        "description": "<string>",  # Description to give schedule
        "id": "<string>",  # Auto-generated. Only needed if you want to update an existing schedule
    }
}
headers = {
    'Authorization': 'Token token=<Your token here>',
    'Accept': 'application/vnd.pagerduty+json;version=2',
    'Content-Type': 'application/json'
}
response = requests.request("POST", url, headers=headers, json=payload)
print(response.text)
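For concreteness, here is the same request boiled down to a minimal sketch, with the placeholders filled in using the example values from the inline comments above; the schedule name, the one-day rotation length and the user ID "PXXXXXX" are made-up placeholders:
import requests

# Minimal sketch of the request above with example values filled in.
# "PXXXXXX", the schedule name and the token are placeholders.
minimal_payload = {
    "schedule": {
        "name": "Primary On-Call",
        "type": "schedule",
        "time_zone": "Asia/Kolkata",
        "schedule_layers": [
            {
                "start": "2021-01-01T00:00:00+05:30",
                "rotation_virtual_start": "2021-01-01T00:00:00+05:30",
                "rotation_turn_length_seconds": 86400,  # rotate users once a day
                "users": [
                    {"user": {"id": "PXXXXXX", "type": "user"}}
                ],
            }
        ],
    }
}

headers = {
    "Authorization": "Token token=<Your token here>",
    "Accept": "application/vnd.pagerduty+json;version=2",
    "Content-Type": "application/json",
}

response = requests.post("https://api.pagerduty.com/schedules",
                         headers=headers, json=minimal_payload)
print(response.status_code, response.text)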
The best way to do this is to get the Postman collection for PagerDuty and edit the request to your liking. Once you get a successful response, convert it into code using Postman's built-in feature.
Using the PagerDuty API for scheduling is not easy. Creating a new schedule is okay-ish, but if you decide to update a schedule it is definitely not trivial. You'll probably run into a bunch of limitations: the number of restrictions per layer, having to reuse current layers, etc.
As an option, you can use the Python library pdscheduling: https://github.com/skrypka/pdscheduling

Correct way to invoke the copy module with module param 'content'

I have a custom action plugin and I need to write out returned variable data on the controller to a file. I'm trying this locally right now.
copy_module_args = dict()
copy_module_args["content"] = 'test'
copy_module_args["dest"] = dest
copy_module_args["owner"] = owner
copy_module_args["group"] = group
copy_module_args["mode"] = mode
try:
    result = merge_hash(result, self._execute_module(
        module_name="copy",
        module_args=copy_module_args,
        task_vars=task_vars))
except (AnsibleError, TypeError) as err:
    err_msg = "Failed to do stuff"
    raise AnsibleActionFail(to_text(err_msg), to_text(err))
The result of ._execute_module is
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Source None not found"}
The value of result is
{'msg': 'Source None not found', 'failed': True, 'invocation': {'module_args': {'content': 'VALUE_SPECIFIED_IN_NO_LOG_PARAMETER', 'dest': '/home/me/testfile', 'owner': 'me', 'group': 'me', 'mode': None, 'backup': False, 'force': True, 'follow': False, 'src': None, '_original_basename': None, 'validate': None, 'directory_mode': None, 'remote_src': None, 'local_follow': None, 'checksum': None, 'seuser': None, 'serole': None, 'selevel': None, 'setype': None, 'attributes': None, 'regexp': None, 'delimiter': None, 'unsafe_writes': None}}, '_ansible_parsed': True}
This invocation is trying to use the "src" param even though I'm only passing the "content" param. I know this because when I add "src" the failure message changes. I expected, from the docs and from reading the copy module and template module source, that at a bare minimum my implementation would be equivalent to:
- name: Copy using inline content
copy:
content: 'test'
dest: /home/me/testfile
Does anyone know what I'm missing or why "src" is being preferred over "content" even though it's not being specified?
The content: argument is just syntactic sugar for writing it out to a temp file, so I would guess you will need to take charge of that yourself, or find a way to invoke the copy action, which apparently runs before the copy module.
I was able to see that "content" was being handled in the action plugin, not the module. I've adapted what I found to fit my needs. I call the action plugin, instead of the module directly.
copy_module_args = dict()
copy_module_args["content"] = 'test'
copy_module_args["dest"] = dest
copy_module_args["owner"] = owner
copy_module_args["group"] = group
copy_module_args["mode"] = mode
copy_module_args["follow"] = True
copy_module_args["force"] = False
copy_action = self._task.copy()
copy_action.args.update(copy_module_args)
# Removing args passed in via the playbook that aren't meant for
# the copy module
for remove in ("arg1", "arg2", "arg3", "arg4"):
    copy_action.args.pop(remove, None)
try:
    copy_action = self._shared_loader_obj.action_loader.get(
        'copy',
        task=copy_action,
        connection=self._connection,
        play_context=self._play_context,
        loader=self._loader,
        templar=self._templar,
        shared_loader_obj=self._shared_loader_obj)
    result = merge_hash(result, copy_action.run(task_vars=task_vars))
This allows me to leverage copy how I originally intended, by utilising its idempotency and checksumming without having to write my own.
changed: [localhost] => {"changed": true, "checksum": "00830d74b4975d59049f6e0e7ce551477a3d9425", "dest": "/home/me/testfile", "gid": 1617705057, "group": "me", "md5sum": "6f007f4188a0d35835f4bb84a2548b66", "mode": "0644", "owner": "me", "size": 9, "src": "/home/me/.ansible/tmp/ansible-tmp-1560715301.737494-249856394953357/source", "state": "file", "uid": 1300225668}
And running it again,
ok: [localhost] => {"changed": false, "dest": "/home/me/testfile", "src": "/home/me/testfile/.ansible/tmp/ansible-local-9531902t7jt3/tmp_nq34zm5"}

Extracting Client ID as a custom dimension with audience dimensions through API

I'm looking to get some answers here regarding this matter; there is no formal documentation available. Currently in Analytics, I have the Client ID set up as a custom dimension with session scope, and I'm trying to match this Client ID with other dimensions via the Analytics Reporting API v4. (The reason for doing so is that in order for the Client ID to be available outside of User Explorer in Analytics, one has to set it up as a custom dimension.)
It's come to my attention that when I try to match the Client ID with an audience dimension, such as Affinity, nothing comes up. But if I do so with another dimension, like pagePath + Affinity, the table exists. So I know it is possible to pull audience dimensions with other dimensions, and it's possible to pull the Client ID together with other dimensions. What I'm trying to understand is why I can't pull the Client ID together with audience dimensions.
Some clarification on the matter would truly be appreciated, thanks.
For example (I can't show everything, but this is the response body of the Python script), in the case that I try to match my custom dimension (Client ID, session scope) with Affinity:
request_report = {
'viewId': VIEW_ID,
'pageSize' : 100000,
'dateRanges': [{'startDate': '2018-12-14',
'endDate': 'today'}],
'metrics': [{'expression': 'ga:users'}
],
'dimensions': [{'name': 'ga:dateHour'},
{'name':'ga:dimension1'},
{'name': 'ga:interestAffinityCategory'}
]
}
response = api_client.reports().batchGet(
body={
'reportRequests': request_report
}).execute()
Output:
ga:dateHour ga:dimension1 ga:interestAffinityCategory ga:users
Changing my dimensions, to pagePath + Affinity
request_report = {
'viewId': VIEW_ID,
'pageSize' : 100000,
'dateRanges': [{'startDate': '2018-12-14',
'endDate': 'today'}],
'metrics': [{'expression': 'ga:users'}
],
'dimensions': [{'name': 'ga:dateHour'},
{'name': 'ga:pagePath'},
{'name': 'ga:interestAffinityCategory'}
]
}
response = api_client.reports().batchGet(
body={
'reportRequests': request_report
}).execute()
Output:
ga:dateHour ga:pagePath ga:interestAffinityCategory ga:users
2018121415 homepage Business Professionals 10
2019011715 join-beta Beauty Mavens 16
2019011715 join-beta Frequently Visits Salons 21
Now say I change my combination to custom dimension + device category
request_report = {
'viewId': VIEW_ID,
'pageSize' : 100000,
'dateRanges': [{'startDate': '2018-12-14',
'endDate': 'today'}],
'metrics': [{'expression': 'ga:users'}
],
'dimensions': [{'name': 'ga:dateHour'},
{'name': 'ga:dimension1'},
{'name': 'ga:adContent'},
{'name': 'ga:deviceCategory'}
]
}
response = api_client.reports().batchGet(
body={
'reportRequests': request_report
}).execute()
Output:
ga:dateHour ga:dimension1 ga:adContent ga:deviceCategory ga:users
2018121410 10 ad1 desktop 1
2018121410 111 ad1 mobile 1
2018121410 119 ad4 mobile 1
2018121410 15 ad3 desktop 1
2018121410 157 ad3 mobile 1
In conclusion:
What I'd like to achieve is to pair my custom dimension (Client ID) with audience dimensions in order to do segmentation. But first things first: if this combination is not possible, I would like to understand why. Is this a limitation on the API side, or is it a policy thing (taking a guess here, as I understand that there are identity-protection policies)?
The BigQuery integration does not contain demographics/affinities, and the user interface contains a variety of mechanisms to prevent you from isolating individual users, so in short: no.

Sensu remediation does not work

I have configured the following check:
"cron": {
"command": "check-process.rb -p cron",
"subscribers": [],
"handlers": [
"mailer",
"flowdock",
"remediator"],
"interval": 10,
"occurences": 3,
"refresh": 600,
"standalone": false,
"remediation": {
"light_remediation": {
"occurrences": [1, 2],
"severities": [2]
}
}
},
"light_remediation": {
"command": "touch /tmp/test",
"subscribers": [],
"handlers": ["flowdock"],
"publish": false,
"interval": 10
},
The mailer and flowdock handlers are being executed as expected, so I am receiving e-mails and Flowdock notifications when the cron service is not running. The problem is that the remediator check is not working and I have no idea why. I have used this: https://github.com/nstielau/sensu-community-plugins/blob/remediation/handlers/remediation/sensu.rb
I ran into similar issues but finally managed to get it working with some modifications.
First off, the gotchas:
Each server (client.json.template) needs to subscribe to a channel $HOSTNAME
"subscribers": ["$HOSTNAME"],
You don't have a "trigger_on" section, which is in the code but not the example; you want to set that up to trigger on $HOSTNAME as well.
my_check.json.template
"trigger_on": ["$HOSTNAME"]
The remediation checks need to subscribe to $HOSTNAME as well (so you need to template the checks out as well)
"subscribers": ["$HOSTNAME"],
At this point, you should be able to trigger your remediation from the sensu server manually.
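Putting those gotchas together, the templated check definitions from the question would end up looking roughly like this (a sketch only, trimmed to the relevant fields: the $HOSTNAME placeholders are substituted per host when the templates are rendered, and the remaining values are copied from the question's config):
"cron": {
    "command": "check-process.rb -p cron",
    "subscribers": ["$HOSTNAME"],
    "trigger_on": ["$HOSTNAME"],
    "handlers": ["mailer", "flowdock", "remediator"],
    "interval": 10,
    "standalone": false,
    "remediation": {
        "light_remediation": {
            "occurrences": [1, 2],
            "severities": [2]
        }
    }
},
"light_remediation": {
    "command": "touch /tmp/test",
    "subscribers": ["$HOSTNAME"],
    "handlers": ["flowdock"],
    "publish": false,
    "interval": 10
},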
Lastly, the example code listed in sensu.rb is broken... The occurrences check needs to be up one level in the loop, and the trigger_on is not inside the remediations section, it's outside.
subscribers = @event['check']['trigger_on'] ? [@event['check']['trigger_on']].flatten : [client]
...
      # Check remediations matching the current severity
      next unless (conditions["severities"] || []).include?(severity)
      remediations_to_trigger << check
    end
  end
  remediations_to_trigger
end
After that, it should work for you.
Oh, and one last gotcha. In your client.json.template
"safe_mode": true
It defaults to false...