Does osquery install inotify watches on directories or files?

I am using osquery to monitor files and folders to get events on any operation on those files. There is a specific syntax for osquery configuration:
"/etc/": watches the entire directory at a depth of 1.
"/etc/%": watches the entire directory at a depth of 1.
"/etc/%%": watches the entire tree recursively with /etc/ as the root.
I am trying to evaluate the memory usage in case of watching a lot of directories. In this process I found the following statistics:
"/etc", "/etc/%", "/etc/%.conf": only 1 inotify handle is found registered in the name of osquery.
"/etc/%%: a few more than 289 inotify handles found which are registered in the name of osquery, given that there are a total of 285 directories under the tree. When checking the entries in /proc/$PID/fdinfo, all the inodes listed in the file points to just folders.
E.g., for "/etc/%.conf":
$ grep -r "^inotify" /proc/$PID/fdinfo/
18:inotify wd:1 ino:120001 sdev:800001 mask:3ce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:01001200bc0f1cab
$ printf "%d\n" 0x120001
1179649
$ sudo debugfs -R "ncheck 1179649" /dev/sda1
debugfs 1.43.4 (31-Jan-2017)
Inode Pathname
1179649 //etc
The inotify watch is established on the whole directory here, yet events are only reported for the matching files /etc/*.conf. My assumption is that osquery filters the events based on the supplied file_paths patterns, but I am not sure.
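The suspected pattern is easy to demonstrate outside osquery. Here is a minimal sketch with inotifywait from the inotify-tools package (my illustration, not anything osquery itself uses): a single directory-level watch, with the *.conf match done in userspace.
# Watch /etc with one inotify watch; report only events whose filename
# matches *.conf, filtering in userspace just as osquery is suspected to.
inotifywait -m /etc 2>/dev/null | while read -r dir events file; do
    case "$file" in
        *.conf) echo "matched: $events on $dir$file" ;;
    esac
done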
Another experiment I performed to support this claim: I took the example program from inotify(7) and ran a watcher on a single file. The list of inotify watchers then shows just:
$ ./a.out /tmp/inotify.cc &
$ cat /proc/$PID/fdinfo/3
...
inotify wd:1 ino:1a1 sdev:800001 mask:38 ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:a1010000aae325d7
$ sudo debugfs -R "ncheck 417" /dev/sda1
debugfs 1.43.4 (31-Jan-2017)
Inode Pathname
417 /tmp/inotify.cc
So, according to this experiment, establishing a watch on a single file is possible (which is clear from the inotify man page). This supports the claim that osquery is doing some sort of filtering based on the supplied file patterns.
Could someone verify the claim or present otherwise?
My osquery config:
{
"options": {
"host_identifier": "hostname",
"schedule_splay_percent": 10
},
"schedule": {
"file_events": {
"query": "SELECT * FROM file_events;",
"interval": 5
}
},
"file_paths": {
"sys": ["/etc/%.conf"]
}
}
$ osqueryd --version
osqueryd version 3.3.2
$ uname -a
Linux lab 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux

It sounds like some great sleuthing!
I think the comments in the source code support that conclusion. It's worth skimming; I think the relevant files are:
https://github.com/osquery/osquery/blob/master/osquery/tables/events/linux/file_events.cpp
https://github.com/osquery/osquery/blob/master/osquery/events/linux/inotify.cpp

btrfs shows strange dir names in send-receive dump

Today I found out that just showing a diff in btrfs is extremely complicated.
While in ZFS it's simply zfs diff, in btrfs one has to use either btrfs subv find-new <SNAPNAME> <last-gen> (and this find-new never shows me files I created with touch, or empty directories I created),
or one has to use btrfs send --no-data <SNAP1> <SNAP2> | btrfs receive --dump
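A minimal sketch of the bdiff wrapper used below (my assumption: it simply feeds the two snapshot arguments into that send/receive pipeline):
#!/usr/bin/env bash
# bdiff OLD_SNAP NEW_SNAP: dump the btrfs change stream between two
# read-only snapshots, omitting file data.
btrfs send --no-data -p "$1" "$2" | btrfs receive --dump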
Now, what I did to test this: I created an empty directory called 'blah2', and this is what the "diff" shows me:
[andreas@archlinux data]$ ./bdiff .snaps/Fotos-20220910c .snaps/Fotos-20220910b
At subvol .snaps/Fotos-20220910b
snapshot ./Fotos-20220910b uuid=5b07b5f0-ba94-5a47-b586-6d36805b0c9e transid=6783 parent_uuid=1f25f808-28a4-f34d-98e0-fcb54acf4a8a parent_transid=6788
utimes ./Fotos-20220910b/ atime=2022-09-10T16:10:50+0200 mtime=2022-09-10T16:08:07+0200 ctime=2022-09-10T16:08:07+0200
rmdir ./Fotos-20220910b/blah2
utimes ./Fotos-20220910b/ atime=2022-09-10T16:10:50+0200 mtime=2022-09-10T16:08:07+0200 ctime=2022-09-10T16:08:07+0200
[andreas@archlinux data]$ ./bdiff .snaps/Fotos-20220910b .snaps/Fotos-20220910c
At subvol .snaps/Fotos-20220910c
snapshot ./Fotos-20220910c uuid=1f25f808-28a4-f34d-98e0-fcb54acf4a8a transid=6788 parent_uuid=5b07b5f0-ba94-5a47-b586-6d36805b0c9e parent_transid=6783
utimes ./Fotos-20220910c/ atime=2022-09-10T16:10:50+0200 mtime=2022-09-10T16:16:55+0200 ctime=2022-09-10T16:16:55+0200
mkdir ./Fotos-20220910c/o4139-6788-0
rename ./Fotos-20220910c/o4139-6788-0 dest=./Fotos-20220910c/blah2
utimes ./Fotos-20220910c/ atime=2022-09-10T16:10:50+0200 mtime=2022-09-10T16:16:55+0200 ctime=2022-09-10T16:16:55+0200
chown ./Fotos-20220910c/blah2 gid=1000 uid=1000
chmod ./Fotos-20220910c/blah2 mode=755
utimes ./Fotos-20220910c/blah2 atime=2022-09-10T16:16:55+0200 mtime=2022-09-10T16:16:55+0200 ctime=2022-09-10T16:16:55+0200
[andreas@archlinux data]$
Why the hell does it not simply report a mkdir blah2, but instead a mkdir o4139-6788-0 followed by a rename? Yet in the other direction, it reports just one rmdir?

How to update modules.conf for SELINUX in BUILDROOT?

I am looking to disable some SELinux modules (set them to off) and create others in modules.conf. I don't see an obvious way of updating modules.conf in Buildroot. I tried adding my changes as a modules.conf patch, but it failed because the modules.conf file gets built rather than downloaded by Buildroot, so it is not available for patching like other things under the refpolicy directory:
Build window output:
>>> refpolicy 2.20190609 Patching
Applying 0001-refpolicy-update-modules-conf.patch using patch:
can't find file to patch at input line 3
I did see in the log that support/sedoctool.py autogenerates the policy/modules.conf file, which explains why the file is NOT patchable like most other things in the refpolicy tree.
The relevant section of the buildroot/output/build/refpolicy-2.20190609/Makefile:
# policy building support tools
support := support
genxml := $(PYTHON) $(support)/segenxml.py
gendoc := $(PYTHON) $(support)/sedoctool.py
<...snip...>
########################################
#
# Create config files
#
conf: $(mod_conf) $(booleans) generate

$(booleans) $(mod_conf): conf.intermediate

.INTERMEDIATE: conf.intermediate
conf.intermediate: $(polxml)
	@echo "Updating $(booleans) and $(mod_conf)"
	$(verbose) $(gendoc) -b $(booleans) -m $(mod_conf) -x $(polxml)
Part of the hsmlinux build.log showing the sedoctool.py (gendoc) being run:
Updating policy/booleans.conf and policy/modules.conf
.../build-buildroot-sawshark/buildroot/output/host/usr/bin/python3 support/sedoctool.py -b policy/booleans.conf -m policy/modules.conf -x doc/policy.xml
I'm sure there is a standard way of doing this, just doesn't seem to be documented anywhere I can find.
Thanks.
Turns out that the sedoctool.py script is reading the doc/policy.xml. Looking at sedoctool.py:
#modules enabled and disabled values
MOD_BASE = "base"
MOD_ENABLED = "module"
MOD_DISABLED = "off"
<...snip...>
def gen_module_conf(doc, file_name, namevalue_list):
    """
    Generates the module configuration file using the XML provided and the
    previous module configuration.
    """
    # If file exists, preserve settings and modify if needed.
    # Otherwise, create it.
    <...snip...>
        mod_name = node.getAttribute("name")
        mod_layer = node.parentNode.getAttribute("name")
        <...snip...>
        if mod_name and mod_layer:
            file_name.write("# Layer: %s\n# Module: %s\n" % (mod_layer, mod_name))
            if required:
                file_name.write("# Required in base\n")
            file_name.write("#\n")
            if [mod_name, MOD_DISABLED] in namevalue_list:
                file_name.write("%s = %s\n\n" % (mod_name, MOD_DISABLED))
            # If the module is set as enabled.
            elif [mod_name, MOD_ENABLED] in namevalue_list:
                file_name.write("%s = %s\n\n" % (mod_name, MOD_ENABLED))
            # If the module is set as base.
            elif [mod_name, MOD_BASE] in namevalue_list:
                file_name.write("%s = %s\n\n" % (mod_name, MOD_BASE))
So sedoctool.py has the nice feature noted in its comments: "If file exists, preserve settings and modify if needed." That means modules.conf can simply be added whole, via a complete-file patch creating refpolicy-2.20190609/policy/modules.conf with the undesired modules set to "off", and the script will update it as needed based on the desired policy.
One more detail: in the next stage of the refpolicy Makefile (Building), the updated modules.conf is deleted at the start, which clashes with sedoctool's ability to preserve the patched version of modules.conf, so I also patched the removal out of the Building stage of the Makefile.
>>> refpolicy 2.20190609 Building
<...snip...>
rm -f policy/modules.conf
The Makefile in refpolicy-2.20190609 has this line, which I patched out because we are patching in our own modules.conf:
bare: clean
<...snip...>
$(verbose) rm -f $(mod_conf)
That patch looks like:
--- BUILDROOT/Makefile 2020-08-17 13:25:06.963804709 -0400
+++ FIX/Makefile 2020-08-17 19:25:29.540607763 -0400
@@ -636,7 +636,6 @@
$(verbose) rm -f $(modxml)
$(verbose) rm -f $(tunxml)
$(verbose) rm -f $(boolxml)
- $(verbose) rm -f $(mod_conf)
$(verbose) rm -f $(booleans)
$(verbose) rm -fR $(htmldir)
$(verbose) rm -f $(tags)
BTW,
Creating a patch with a complete new file in pp1:
diff -crB --new-file pp0 pp1 > pp0.patch
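For completeness, a sketch of the whole sequence (pp0/pp1 and my-modules.conf are just illustrative names for the pristine tree, the modified copy, and the desired complete file):
# pp0: pristine tree; pp1: copy containing the complete desired modules.conf
cp -r refpolicy-2.20190609 pp0
cp -r refpolicy-2.20190609 pp1
cp my-modules.conf pp1/policy/modules.conf
# --new-file makes diff treat files absent from pp0 as empty, so the
# resulting patch creates policy/modules.conf from scratch
diff -crB --new-file pp0 pp1 > pp0.patch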

Can you get the number of lines of code from a GitHub repository?

In a GitHub repository you can see “language statistics”, which displays the percentage of the project that’s written in a language. It doesn’t, however, display how many lines of code the project consists of. Often, I want to quickly get an impression of the scale and complexity of a project, and the count of lines of code can give a good first impression. 500 lines of code implies a relatively simple project, 100,000 lines of code implies a very large/complicated project.
So, is it possible to get the lines of code written in the various languages from a GitHub repository, preferably without cloning it?
The question “Count number of lines in a git repository” asks how to count the lines of code in a local Git repository, but:
You have to clone the project, which could be massive. Cloning a project like Wine, for example, takes ages.
You would count lines in files that wouldn't necessarily be code, like i18n files.
If you count just (for example) Ruby files, you'd potentially miss massive amounts of code in other languages, like JavaScript. You'd have to know beforehand which languages the project uses, and you'd have to repeat the count for every one of them.
All in all, this is potentially far too time-intensive for “quickly checking the scale of a project”.
You can run something like
git ls-files | xargs wc -l
which will give you the total count.
You can also add more instructions, like looking at just the JavaScript files:
git ls-files | grep '\.js$' | xargs wc -l
Or use this handy little tool: https://line-count.herokuapp.com/
A shell script, cloc-git
You can use this shell script to count the number of lines in a remote Git repository with one command:
#!/usr/bin/env bash
git clone --depth 1 "$1" temp-linecount-repo &&
printf "('temp-linecount-repo' will be deleted automatically)\n\n\n" &&
cloc temp-linecount-repo &&
rm -rf temp-linecount-repo
Installation
This script requires CLOC (“Count Lines of Code”) to be installed. cloc can probably be installed with your package manager – for example, brew install cloc with Homebrew. There is also a docker image published under mribeiro/cloc.
You can install the script by saving its code to a file cloc-git, running chmod +x cloc-git, and then moving the file to a folder in your $PATH such as /usr/local/bin.
Usage
The script takes one argument, which is any URL that git clone will accept. Examples are https://github.com/evalEmpire/perl5i.git (HTTPS) or git@github.com:evalEmpire/perl5i.git (SSH). You can get this URL from any GitHub project page by clicking “Clone or download”.
Example output:
$ cloc-git https://github.com/evalEmpire/perl5i.git
Cloning into 'temp-linecount-repo'...
remote: Counting objects: 200, done.
remote: Compressing objects: 100% (182/182), done.
remote: Total 200 (delta 13), reused 158 (delta 9), pack-reused 0
Receiving objects: 100% (200/200), 296.52 KiB | 110.00 KiB/s, done.
Resolving deltas: 100% (13/13), done.
Checking connectivity... done.
('temp-linecount-repo' will be deleted automatically)
171 text files.
166 unique files.
17 files ignored.
http://cloc.sourceforge.net v 1.62 T=1.13 s (134.1 files/s, 9764.6 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
Perl 149 2795 1425 6382
JSON 1 0 0 270
YAML 2 0 0 198
-------------------------------------------------------------------------------
SUM: 152 2795 1425 6850
-------------------------------------------------------------------------------
Alternatives
Run the commands manually
If you don’t want to bother saving and installing the shell script, you can run the commands manually. An example:
$ git clone --depth 1 https://github.com/evalEmpire/perl5i.git
$ cloc perl5i
$ rm -rf perl5i
Linguist
If you want the results to match GitHub’s language percentages exactly, you can try installing Linguist instead of CLOC. According to its README, you need to gem install linguist and then run linguist. I couldn’t get it to work (issue #2223).
I created an extension for the Google Chrome browser, GLOC, which works for public and private repos.
Counts the number of lines of code of a project from:
project detail page
user's repositories
organization page
search results page
trending page
explore page
If you go to the graphs/contributors page, you can see a list of all the contributors to the repo and how many lines they've added and removed.
Unless I'm missing something, subtracting the aggregate number of lines deleted from the aggregate number of lines added among all contributors should yield the total number of lines of code in the repo. (EDIT: it turns out I was missing something after all. Take a look at orbitbot's comment for details.)
UPDATE:
This data is also available in GitHub's API. So I wrote a quick script to fetch the data and do the calculation:
'use strict';
async function countGithub(repo) {
const response = await fetch(`https://api.github.com/repos/${repo}/stats/contributors`)
const contributors = await response.json();
const lineCounts = contributors.map(contributor => (
contributor.weeks.reduce((lineCount, week) => lineCount + week.a - week.d, 0)
));
const lines = lineCounts.reduce((lineTotal, lineCount) => lineTotal + lineCount);
window.alert(lines);
}
countGithub('jquery/jquery'); // or count anything you like
Just paste it in a Chrome DevTools snippet, change the repo and click run.
Disclaimer (thanks to lovasoa):
Take the results of this method with a grain of salt, because for some repos (sorich87/bootstrap-tour) it results in negative values, which might indicate there's something wrong with the data returned from GitHub's API.
UPDATE:
Looks like this method to calculate total line numbers isn't entirely reliable. Take a look at orbitbot's comment for details.
You can clone just the latest commit using git clone --depth 1 <url> and then perform your own analysis using Linguist, the same software Github uses. That's the only way I know you're going to get lines of code.
Another option is to use the API to list the languages the project uses. It doesn't give them in lines but in bytes. For example...
$ curl https://api.github.com/repos/evalEmpire/perl5i/languages
{
"Perl": 274835
}
Though take that with a grain of salt, that project includes YAML and JSON which the web site acknowledges but the API does not.
Finally, you can use code search to ask which files match a given language. This example asks which files in perl5i are Perl. https://api.github.com/search/code?q=language:perl+repo:evalEmpire/perl5i. It will not give you lines, and you have to ask for the file size separately using the returned url for each file.
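A rough sketch of that two-step lookup with curl and jq (the jq filtering and the loop are my additions; note that GitHub's code-search API generally requires an authenticated request):
# List the Perl files in perl5i, then fetch each item's metadata to get
# its size in bytes; the search API reports neither lines nor sizes itself.
curl -s 'https://api.github.com/search/code?q=language:perl+repo:evalEmpire/perl5i' |
  jq -r '.items[].url' |
  while read -r url; do
    curl -s "$url" | jq -r '"\(.path): \(.size) bytes"'
  done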
Not currently possible on GitHub.com or their APIs
I have talked to customer support and confirmed that this can not be done on github.com. They have passed the suggestion along to the Github team though, so hopefully it will be possible in the future. If so, I'll be sure to edit this answer.
Meanwhile, Rory O'Kane's answer is a brilliant alternative based on cloc and a shallow repo clone.
From @Tgr's comment, there is an online tool:
https://codetabs.com/count-loc/count-loc-online.html
You can use tokei:
cargo install tokei
git clone --depth 1 https://github.com/XAMPPRocky/tokei
tokei tokei/
Output:
===============================================================================
Language Files Lines Code Comments Blanks
===============================================================================
BASH 4 48 30 10 8
JSON 1 1430 1430 0 0
Shell 1 49 38 1 10
TOML 2 78 65 4 9
-------------------------------------------------------------------------------
Markdown 4 1410 0 1121 289
|- JSON 1 41 41 0 0
|- Rust 1 47 38 5 4
|- Shell 1 19 16 0 3
(Total) 1517 95 1126 296
-------------------------------------------------------------------------------
Rust 19 3750 3123 119 508
|- Markdown 12 358 5 302 51
(Total) 4108 3128 421 559
===============================================================================
Total 31 6765 4686 1255 824
===============================================================================
Tokei has support for badges:
Count Lines
[![](https://tokei.rs/b1/github/XAMPPRocky/tokei)](https://github.com/XAMPPRocky/tokei)
By default the badge shows the repo's LoC (Lines of Code); you can also tell it to show a different category by using the ?category= query string. It can be one of: code, blanks, files, lines, comments.
Count Files
[![](https://tokei.rs/b1/github/XAMPPRocky/tokei?category=files)](https://github.com/XAMPPRocky/tokei)
You can use the GitHub API to get the SLOC with a function like the following:
function getSloc(repo, tries) {
//repo is the repo's path
if (!repo) {
return Promise.reject(new Error("No repo provided"));
}
//GitHub's API may return an empty object the first time it is accessed
//We can try several times then stop
if (tries === 0) {
return Promise.reject(new Error("Too many tries"));
}
let url = "https://api.github.com/repos" + repo + "/stats/code_frequency";
return fetch(url)
.then(x => x.json())
.then(x => x.reduce((total, changes) => total + changes[1] + changes[2], 0))
.catch(err => getSloc(repo, tries - 1));
}
Personally, I made a Chrome extension that shows the number of SLOC on both the GitHub project list and the project detail page. You can also set your personal access token to access private repositories and bypass the API rate limit.
You can download it from https://chrome.google.com/webstore/detail/github-sloc/fkjjjamhihnjmihibcmdnianbcbccpnn
The source code is available at https://github.com/martianyi/github-sloc
Hey all, this is ridiculously easy:
Create a new branch from your first commit
When you want to find out your stats, create a new PR from main
The PR will show you the number of changed lines - as you're doing a PR from the first commit all your code will be counted as new lines
And the added benefit is that if you don't approve the PR and just leave it in place, the stats (No of commits, files changed and total lines of code) will simply keep up-to-date as you merge changes into main. :) Enjoy.
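A hedged sketch of setting up that baseline branch (assuming the repository has a single root commit, which git rev-list --max-parents=0 finds):
# Point a branch at the very first commit and push it; then open a PR on
# GitHub with base = baseline and compare = main, so every line counts as added.
git branch baseline "$(git rev-list --max-parents=0 HEAD)"
git push origin baseline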
Firefox add-on Github SLOC
I wrote a small firefox addon that prints the number of lines of code on github project pages: Github SLOC
npm install sloc -g
git clone --depth 1 https://github.com/vuejs/vue/
sloc ".\vue\src" --format cli-table
rm -rf ".\vue\"
Instructions and Explanation
Install sloc from npm, a command line tool (Node.js needs to be installed).
npm install sloc -g
Clone shallow repository (faster download than full clone).
git clone --depth 1 https://github.com/facebook/react/
Run sloc and specify the path that should be analyzed.
sloc ".\react\src" --format cli-table
sloc supports formatting the output as a cli-table, as json or csv. Regular expressions can be used to exclude files and folders (Further information on npm).
Delete repository folder (optional)
Powershell: rm -r -force ".\react\" or on Mac/Unix: rm -rf ".\react\"
It is also possible to get details for every file with the --details option:
sloc ".\react\src" --format cli-table --details
Open terminal and run the following:
curl -L "https://api.codetabs.com/v1/loc?github=username/reponame"
If the question is "can you quickly get NUMBER OF LINES of a github repo", the answer is no as stated by the other answers.
However, if the question is "can you quickly check the SCALE of a project", I usually gauge a project by looking at its size. Of course the size will include deltas from all active commits, but it is a good metric as the order of magnitude is quite close.
E.g.
How big is the "docker" project?
In your browser, enter api.github.com/repos/ORG_NAME/PROJECT_NAME
i.e. api.github.com/repos/docker/docker
In the response hash, you can find the size attribute:
{
...
size: 161432,
...
}
This should give you an idea of the relative scale of the project. The number seems to be in KB, but when I checked it on my computer it's actually smaller, even though the order of magnitude is consistent. (161432KB = 161MB, du -s -h docker = 65MB)
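From a shell, the same lookup is a one-liner (jq is my addition; -L follows GitHub's rename redirects):
# Fetch the repo metadata and print the size field (reported in KB).
curl -sL https://api.github.com/repos/docker/docker | jq .size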
Pipe the output from the number of lines in each file to sort to organize files by line count.
git ls-files | xargs wc -l | sort -n
This is easy if you are using VS Code and you clone the project first. Just install the Lines of Code (LOC) VS Code extension and then run LineCount: Count Workspace Files from the Command Palette.
The extension shows summary statistics by file type and it also outputs result files with detailed information by each folder.
There is another online tool that counts lines of code for public and private repos without having to clone/download them: https://klock.herokuapp.com/
None of the answers here satisfied my requirements. I only wanted to use existing utilities. The following script will use basic utilities:
Git
GNU or BSD awk
GNU or BSD sed
Bash
Get total lines added to a repository (subtracts lines deleted from lines added).
#!/bin/bash
git diff --shortstat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD | \
sed 's/[^0-9,]*//g' | \
awk -F, '!($2 > 0) {$2="0"};!($3 > 0) {$3="0"}; {print $2-$3}'
Get lines of code filtered by specified file types of known source code (e.g. *.py files or add more extensions, etc).
#!/bin/bash
git diff --shortstat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- *.{py,java,js} | \
sed 's/[^0-9,]*//g' | \
awk -F, '!($2 > 0) {$2="0"};!($3 > 0) {$3="0"}; {print $2-$3}'
4b825dc642cb6eb9a060e54bf8d69288fbee4904 is the id of the "empty tree" in Git and it's always available in every repository.
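You can verify that hash yourself: it is the object id of an empty tree, which is derived from content and therefore identical in every SHA-1 repository:
$ git hash-object -t tree /dev/null
4b825dc642cb6eb9a060e54bf8d69288fbee4904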
Sources:
My own scripting
How to get Git diff of the first commit?
Is there a way of having git show lines added, lines changed and lines removed?
shields.io has a badge that can count up all the lines for you; their example badge counts the Raycast extensions repo.
You can use Sourcegraph, an open-source code search engine. It can connect to your GitHub account and index the content, and then in the admin section you will see the number of lines of code indexed.
I made an NPM package specifically for this use case. It provides a CLI tool that takes a directory path plus the folders/files to ignore.
It goes like this:
npm i -g @quasimodo147/countlines
to get the $ countlines command in your terminal. Then you can do
countlines . node_modules build dist

Where to find logs for a cloud-init user-data script?

I'm initializing spot instances running a derivative of the standard Ubuntu 13.04 AMI by pasting a shell script into the user-data field.
This works. The script runs. But it's difficult to debug because I can't figure out where the output of the script is being logged, if anywhere.
I've looked in /var/log/cloud-init.log, which seems to contain a bunch of stuff that would be relevant to debugging cloud-init, itself, but nothing about my script. I grepped in /var/log and found nothing.
Is there something special I have to do to turn logging on?
The default location for cloud-init user-data output is /var/log/cloud-init-output.log, in AWS, DigitalOcean, and most other cloud providers. You don't need to set up any additional logging to see the output.
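So debugging a user-data script usually amounts to following that file while the instance boots:
# Follow the user-data script output as it is written.
sudo tail -f /var/log/cloud-init-output.log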
You could create a cloud-config file (with "#cloud-config" at the top) for your userdata, use runcmd to call the script, and then enable output logging like this:
output: {all: '| tee -a /var/log/cloud-init-output.log'}
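Putting those pieces together, a minimal sketch of such a user-data file (the runcmd entry is only a placeholder for a call to your actual script):
#cloud-config
# Send output from all cloud-init stages to the log file.
output: {all: '| tee -a /var/log/cloud-init-output.log'}
runcmd:
  # placeholder; replace with the call to your real script
  - echo "user-data ran at $(date -R)"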
So I tried to replicate your problem. Usually I work with cloud-config, so I just created a simple test user-data script like this:
#!/bin/sh
echo "Hello World. The time is now $(date -R)!" | tee /root/output.txt
echo "I am out of the output file...somewhere?"
yum search git # just for fun
ls
exit 0
Notice that, with CloudInit shell scripts, the user-data "will be executed at rc.local-like level during first boot. rc.local-like means 'very late in the boot sequence'"
After logging in into my instance (a Scientific Linux machine) I first went to /var/log/boot.log and there I found:
Hello World. The time is now Wed, 11 Sep 2013 10:21:37 +0200!
I am out of the file. Log file somewhere?
Loaded plugins: changelog, kernel-module, priorities, protectbase, security, tsflags, versionlock
126 packages excluded due to repository priority protections
9 packages excluded due to repository protections
epel/pkgtags                                             | 581 kB 00:00
=============================== N/S Matched: git ===============================
GitPython.noarch : Python Git Library
cgit.x86_64 : A fast web interface for git
...
... (more yum search output)
...
bin   etc   lib    lost+found  mnt   proc  sbin  srv      tmp  var
boot  dev   home   lib64       media opt   root  selinux  sys  usr
(other unrelated stuff)
So, as you can see, my script ran and was rightly logged.
Also, as expected, I had my forced log 'output.txt' in /root/output.txt with the content:
Hello World. The time is now Wed, 11 Sep 2013 10:21:37 +0200!
So... I am not really sure what is happening in your script.
Make sure you're exiting the script with
exit 0 #or some other code
If it still doesn't work, you should provide more info, like your script, your boot.log, your /etc/rc.local, and your cloudinit.log.
BTW: what is your cloud-init version?

wget files from FTP-like listings

So, a site that used to use FTP now has an HTTP front-end and won't allow FTP connections. The site in question (for an example directory) shows a page with links to different dates. Inside each date directory there are many files, and I typically just need some file matching a clear pattern, e.g. *h17v04*.hdf. I thought this could work:
wget -I "${PLATFORM}/${PRODUCT}/${YEAR}.*" -r -l 4 \
--user-agent="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1" \
--verbose -c -np -nc -nd \
-A "*h17v04*.hdf" http://e4ftl01.cr.usgs.gov/$PLATFORM/$PRODUCT/
where PLATFORM=MOLT, PRODUCT=MOD09GA.005 and YEAR=2004, for example. This seems to start looking into all the useful dates, finds the index.html, and then just skips to the next directory, without downloading the relevant hdf file:
--2013-06-14 13:09:18-- http://e4ftl01.cr.usgs.gov/MOLT/MOD09GA.005/2004.01.01/
Reusing existing connection to e4ftl01.cr.usgs.gov:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `e4ftl01.cr.usgs.gov/MOLT/MOD09GA.005/2004.01.01/index.html'
[ <=> ] 174,182 134K/s in 1.3s
2013-06-14 13:09:20 (134 KB/s) - `e4ftl01.cr.usgs.gov/MOLT/MOD09GA.005/2004.01.01/index.html' saved [174182]
Removing e4ftl01.cr.usgs.gov/MOLT/MOD09GA.005/2004.01.01/index.html since it should be rejected.
--2013-06-14 13:09:20-- http://e4ftl01.cr.usgs.gov/MOLT/MOD09GA.005/2004.01.02/
[...]
If I omit the -A option, only the index.html file is downloaded to my system, but it appears it's not parsed and the links are not followed. I don't really know what more is required to make this work, as I can't see why it doesn't!
SOLUTION
In the end, the problem was due to an old bug in the local version of wget. However, I ended up writing my own script for downloading MODIS data from the server above. The script is pure Python, and is available from here.
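If upgrading wget is not an option, the same result can be had with a small shell loop. This is only a sketch, under the assumption that the server presents plain HTML directory listings; the patterns match the example variables from the question:
#!/usr/bin/env bash
# Fetch each date directory's listing, pull out the names matching the
# tile pattern, and download just those files.
base="http://e4ftl01.cr.usgs.gov/$PLATFORM/$PRODUCT"
for dir in $(wget -qO- "$base/" | grep -oE "$YEAR\.[0-9]{2}\.[0-9]{2}/" | sort -u); do
    for f in $(wget -qO- "$base/$dir" | grep -oE '[A-Za-z0-9._]*h17v04[A-Za-z0-9._]*\.hdf' | sort -u); do
        wget -nc "$base/$dir$f"
    done
done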
Consider using pyModis instead of wget. It is a free and open-source Python-based library for working with MODIS data. It offers bulk download for user-selected time ranges, mosaicking of MODIS tiles, reprojection from sinusoidal to other projections, and conversion from HDF to other formats. See
http://www.pymodis.org/