How do I get and process the JSON change object? - watchman

I'm using Facebook watchman on Linux under bash to track file system
changes.
I'm confused about how to get the change JSON object. All I seem to
get is a list of the changed files. I setup a watch and a trigger as
below:
watchman watch /Users/osx/Applications/docker/tests watchman --
trigger /Users/osx/Applications/docker/tests 'file-sync' \
-- /Users/osx/Applications/docker/filewatcher/file-sync.sh
However when I query the arguments past to the file-sync.sh script it
looks like just a text field rather than a JSON object.
Do I need to do anything else to actually get the full details of all
the changes that are happening on the root?
watchman trigger-list shows the following results: { "version":
"4.9.0", "triggers": [
{
"command": [
"/Users/osx/Applications/docker/filewatcher/file-sync.sh"
],
"stdin": [
"name",
"exists",
"new",
"size",
"mode"
],
"append_files": true,
"name": "file-sync",
"empty_on_fresh_instance": true
} ] }
From what I understand this should give me a JSON object with the
name,exists,new,size and mode components.
In the system log i can see the following:
2019-01-15T22:28:49,191: [trigger file-sync
/Volumes/UserData/osx/Applications/docker/tests] input_json: sending
json object to stm
What is stm?
In the system log I can see the following:
2019-01-15T22:28:49,191: [trigger file-sync
/Volumes/UserData/osx/Applications/docker/tests] input_json: sending
`enter code here`json object to stm
In my bash script I use jq to dump the JSON output to stdout
$(echo jq '.' $1)

Watchman will pass a list of file names to your trigger program in the argument vector, and the json data will be supplied to it on its stdin stream.
You can change your bash script to do:
jq '.'
and it should show you the data of interest.
Consider using watchman-make rather than triggers
I'd like to note that triggers are a bit hard to use and debug because they log to the watchman server log and are hard to see into. For that reason I tend to push folks towards using watchman-make, which runs in the foreground and has clear messaging about what is happening. For example, assuming that you are going to trigger an rsync, you can probably get away with doing something like:
cd /Users/osx/Applications/docker/tests && \
watchman-make -p '**/*' \
--run /Users/osx/Applications/docker/filewatcher/file-sync.sh
and then file-sync.sh can be something like:
#!/bin/bash
rsync /Users/osx/Applications/docker/tests somewhere:else
Note that this pattern "loses" the list of changed files, but rsync is going to compare the directory structures anyway, and most directory trees tend to be small
enough that you won't mind.
If you want or need to make it incremental you can have your script call back into watchman; something like this will list the files changed since the last time it was run:
watchman since /Users/osx/Applications/docker/tests n:file-sync
The n:file-sync is a named cursor; the server will keep track of the internal clock data on your behalf each time you query using it in a since query. You can find more information about this here: https://facebook.github.io/watchman/docs/clockspec.html

Related

wget appends query string to resulting file

I'm trying to retrieve working webpages with wget and this goes well for most sites with the following command:
wget -p -k http://www.example.com
In these cases I will end up with index.html and the needed CSS/JS etc.
HOWEVER, in certain situations the url will have a query string and in those cases I get an index.html with the query string appended.
Example
www.onlinetechvision.com/?p=566
Combined with the above wget command will result in:
index.html?page=566
I have tried using the --restrict-file-names=windows option, but that only gets me to
index.html#page=566
Can anyone explain why this is needed and how I can end up with a regular index.html file?
UPDATE: I'm sort of on the fence on taking a different approach. I found out I can take the first filename that wget saves by parsing the output. So the name that appears after Saving to: is the one I need.
However, this is wrapped by this strange character รข - rather than just removing that hardcoded - where does this come from?
If you try with parameter "--adjust-extension"
wget -p -k --adjust-extension www.onlinetechvision.com/?p=566
you come closer. In www.onlinetechvision.com folder there will be file with corrected extension: index.html#p=566.html or index.html?p=566.html on *NiX systems. It is simple now to change that file to index.html even with script.
If you are on Microsoft OS make sure you have latter version of wget - it is also available here: https://eternallybored.org/misc/wget/
To answer your question about why this is needed, remember that the web server is likely to return different results based on the parameters in the query string. If a query for index.html?page=52 returns different results from index.html?page=53, you probably wouldn't want both pages to be saved in the same file.
Each HTTP request that uses a different set of query parameters is quite literally a request for a distinct resource. wget can't predict which of these changes is and isn't going to be significant, so it's doing the conservative thing and preserving the query parameter URLs in the filename of the local document.
My solution is to do recursive crawling outside wget:
get directory structure with wget (no file)
loop to get main entry file (index.html) from each dir
This works well with wordpress sites. Could miss some pages tho.
#!/bin/bash
#
# get directory structure
#
wget --spider -r --no-parent http://<site>/
#
# loop through each dir
#
find . -mindepth 1 -maxdepth 10 -type d | cut -c 3- > ./dir_list.txt
while read line;do
wget --wait=5 --tries=20 --page-requisites --html-extension --convert-links --execute=robots=off --domain=<domain> --strict-comments http://${line}/
done < ./dir_list.txt
The query string is required because of the website design what the site is doing is using the same standard index.html for all content and then using the querystring to pull in the content from another page like with script on the server side. (it may be client side if you look in the JavaScript).
Have you tried using --no-cookies it could be storing this information via cookie and pulling it when you hit the page. also this could be caused by URL rewrite logic which you will have little control over from the client side.
use -O or --output-document options. see http://www.electrictoolbox.com/wget-save-different-filename/

using wget to overwrite file but use temporary filename until full file is received, then rename

I'm using wget in a cron job to fetch a .jpg file into a web server folder once per minute (with same filename each time, overwriting). This folder is "live" in that the web server also serves that image from there. However if someone web-browses to that page during the time the image is being fetched, it is considered a jpg with errors and says so in the browser. So what I need to do is, similar to when Firefox is downloading a file, wget should write to a temporary file, either in /var or in the destination folder but with a temporary name, until it has the whole thing, then rename in an atomic (or at least negligible-duration) step.
I've read the wget man page and there doesn't seem to be a command line option for this. Have I missed it? Or do I need to do two commands in my cron job, a wget and a move?
There is no way to do this purely with GNU Wget.
wget's job is to download files and it does that. A simple one line script can achieve what you're looking for:
$ wget -O myfile.jpg.tmp example.com/myfile.jpg && mv myfile.jpg{.tmp,}
Since mv is atomic, atleast on Linux, you get the atomic update of a ready file.
Just wanted to share my solution:
alias wget='func(){ (wget --tries=0 --retry-connrefused --timeout=30 -O download_pkg.tmp "$1" && mv download_pkg.tmp "${1##*/}") || rm download_pkg.tmp; unset -f func; }; func
it creates a function that receives a parameter "url" to download the file to a temporary name. If it is successful, it is renamed to the correct filename extracted from parameter $1 with ${1##*/}. and if it fails, deletes the temp file. If the operation is aborted, the temp file will be replace on the next run. after all, unset -f removes the function definition as the alias is executed.

dpkg: How to use trigger?

I wrote a little CDN server that rebuilds its registry pool when new pool-content-packages are installed into that registry pool.
Instead of having each pool-content-package call the init.d of the cdn-server, I'd like to use triggers. That way it would restart the server only once at the end of an installation run, after all packages were installed.
What have I to do to use triggers in my packages with debhelper support?
What you are looking for is dpkg-triggers.
One solution with use of debhelper to build the debian packages is this:
Step 1)
Create file debian/<serverPackageName>.triggers (replace <serverPackageName> with name of your server package).
Step 1a)
Define a trigger that watch the directory of your pool. The content of file would be:
interest /path/to/my/pool
Step 1b)
But you can also define a named trigger, which have to be fired explicit (see step 3).
content of file:
interest cdn-pool-changed
The name of the trigger cdn-pool-changed is free. You can take what ever you want.
Step 2)
Add handler for trigger to file debian/<serverPackageName>.postinst (replace <serverPackageName> with name of your server package).
Example:
#!/bin/sh
set -e
case "$1" in
configure)
;;
triggered)
#here is the handler
/etc/init.d/<serverPackageName> restart
;;
abort-upgrade|abort-remove|abort-deconfigure)
;;
*)
echo "postinst called with unknown argument \`$1'" >&2
exit 1
;;
esac
#DEBHELPER#
exit 0
Replace <serverPackageName> with name of your server package.
Step 3) (only for named triggers, step 1b) )
Add in every content package the file debian/<contentPackageName>.triggers (replace <contentPackageName> with names of your content packages).
content of file:
activate cdn-pool-changed
Use same name for trigger you defined in step 1.
More detailed Information
The best description for dpkg-triggers I could found is "How to use dpkg triggers". The corresponding git repository with examples you can get here:
git clone git://anonscm.debian.org/users/seanius/dpkg-triggers-example.git
I had a need and read and re-read the docs many times. I think that the process is not clearly explain or rather what goes where is not clearly explained. Here I hope to clarify the use of Debian package triggers.
Service with Configuration Directory
A service reading its settings in a specific directory can mark that directory as being of interest.
Say I create a new service which reads settings from /usr/share/my-service/config/...
That service gets two additions:
In its debian directory I add my-service.triggers
And here are the contents:
# my-service.triggers
interest /usr/share/my-service/config
This means if any other package installs or removes a file from that directory, the trigger enters its "needs to be run" state.
In its debian directory I also add my-service.postinst
And I have a script as follow to check whether the trigger happened and run a process as required:
# my-service.postinst
if [ "$1" = "triggered" ]
then
if [ "$2" = "/usr/share/my-service/config" ]
then
# this may or may not be what you need to do, but this is often
# how you handle a change in your service config files
#
systemctl restart my-service
fi
exit 0
fi
That's it.
Now packages adding extensions to your service can add their own configuration file(s) under /usr/share/my-service/config (or a directory under /etc/my-service/my-service.d/... or /var/lib/my-service/..., although that last one should be reserved for dynamic files, not files installed from a package) and dpkg automatically calls your postinst script with:
postinst triggered /usr/share/my-service/config
# where /usr/share/my-service/config is your <interest-path>
This call happens only once and after all the packages were installed, hence the advantage of having a trigger in the first place. This way each package does not need to know that it has to restart my-service and it does not happen more than once, which could cause all sorts of side effects (i.e. the service tries to listen on a TCP port and get error: address already in use).
IMPORTANT: keep in mind that the postinst should include a line with #DEBHELPER#.
So you do not have to do anything special in other packages. Only make sure to install the configuration files in the correct directory and dpkg picks up from there (i.e. in my example under /usr/share/my-service/config).
I have an extension to BIND9 called ipmgr which makes use of .ini files saved in a specific folder. It uses the files to generate DNS zones (way less errors that way! and it includes support for getting letsencrypt certificates and settings for dmarc/dkim). This package uses this case: a simple directory where configuration files get installed. Other packages do not need to do anything other than install files in the right place (/usr/share/ipmgr/zones, for this package).
Service without a Configuration Folder
In some (rare?) cases, you may need to trigger something in a service which is not driven by the installation of a new configuration file.
In this case, you can use an arbitrary name (it should include your package name to make sure it is unique since this name is global to the entire Debian/Ubuntu system).
To make this one work, you need three files, one of which is a trigger in the other packages.
State the Interest
As above, we have an interest. In this case, the interest is stated as a name on its own. The dpkg system distinguish between a name and a path because a name cannot include a slash (/) character. Names are limited to ASCII except control characters and spaces. I would suggest you stick to a-z, 0-9 and dashes (-).
# my-service.triggers
interest my-service-settings
This is useful if you cannot simply track a folder. For example, the settings could come from a network connection that a package offers once installed.
Listen for the Triggers
Again, as above, you need a postinst script in your Service Package. This captures the trigger and allows you to run a command. The script is the same, only you test for the name instead of the folder (note that you can have any number of triggers, so you could also have both: a folder as above and a special name as here).
# my-service.postinst
if [ "$1" = "triggered" ]
then
if [ "$2" = "my-service-settings" ]
then
# this may or may not what you need to do, but this is often
# how you handle a change in your service config files
#
systemctl restart my-service
fi
exit 0
fi
The Trigger
As mentioned above, we need a third file. An arbitrary name is not going to be triggered automatically by dpkg. It wouldn't know whether your other package needs to trigger something just like that (although it is fairly automated as it is already).
So in other packages, you create a trigger file which looks like this:
# other-package.triggers
activate my-service-settings
Now we recognize the name, it is the same as the interest stated above.
In other words, if the trigger needs to run for something other than just the installation of files in a given location, use a special name and add this triggers file with the activate keyword.
Other Features
I have not tested the other features of the dpkg-trigger(1) tool. There are other keywords support in the triggers files:
interest
interest-await
interest-noawait
activate
activate-await
activate-noawait
The deb-triggers manual page has additional information about those. I am not too sure what the await/noawait implies other than the trigger may happen at any time when nowait is used.
Automatic Trigger Added
The build system on Ubuntu (probably Debian too) automatically adds a triggers file with the following when your package includes a library:
$ cat triggers
# Triggers added by dh_makeshlibs/11.1.6ubuntu2
activate-noawait ldconfig
I suggest you exercise caution if your package includes libraries. If you have your own triggers file, I do not know whether this addition will still happen automatically.
This also shows us a special case where it wants to use the noawait. If I understand correctly, it has to run the ldconfig trigger ASAP so your commands will work as expected after the unpack. Otherwise ldd will not know anything about your newly installed library.

Unsure of how to proceed with creating ec2-consisten-snapshot

I was just put on a task to try and debug and figure out why our ec2-consistent-snapshot script isn't working.
Our lead programmer followed this blog post.
We have a .sh script that we'd like to take the snapshot and it looks like this:
#!/bin/sh
/opt/aws/bin/ec2-consistent-snapshot --aws-access-key-id MYACCESSKEY --aws-secret-access-key MYSECRETKEY --freeze-filesystem /vol --mysql --mysql-host localhost --mysql-socket /var/lib/mysql/mysql.sock --mysql-username USERNAME --mysql-password PASSWORD --description "Demo MySQL data volume: $(date +%c)" vol-MYVOL
If I run this by doing sudo ./snapshot_script.sh I get a single error:
ec2-consistent-snapshot: ERROR: create_snapshot: File does not exist: at /usr/share/perl5/vendor_perl/Net/Amazon/EC2.pm line 232
I of course followed this error and line 232 in EC2.pm is this:
my $ref = $xs->XMLin($xml);
I have 0 perl experience and I don't know what this could be doing.
Any help would be wonderful.
The Net::Amazon::EC2 that I'm looking at on CPAN has that line at 252, not 232 so perhaps you are not on the latest version. Looking above that line, the program has attempted to do a "query to sign" using lots of the security parms. I suspect there is a problem with the authentication keys you are using. There is a debug flag, you might want to turn that on to generate more messages.
If you go to this page, you will see that XMLin() is a function of XML::Simple, and it takes a file as an argument. So, $xml is presumably a variable that contains an xml file name. That file does not exist.
The next step would be to trace the error back into the source code of ec2-consistent-snapshot, in order to see how it is calling XML::Simple and where the bad value gets passed in.

CVS command to get brief history of repository

I am using following command to get a brief history of the CVS repository.
cvs -d :pserver:*User*:*Password*#*Repo* rlog -N -d "*StartDate* < *EndDate*" *Module*
This works just fine except for one small problem. It lists all tags created on each file in that repository. I want the tag info, but I only want the tags that are created in the date range specified. How do I change this command to do that.
I don't see a way to do that natively with the rlog command. Faced with this problem, I would write a Perl script to parse the output of the command, correlate the tags to the date range that I want and print them.
Another solution would be to parse the ,v files directly, but I haven't found any robust libraries for doing that. I prefer Perl for that type of task, and the parsing modules don't seem to be very high quality.