wget files from FTP-like listings

A site that used to offer FTP now has an HTTP front-end and no longer allows FTP connections. The site in question (for an example directory) shows a page with links to different dates. Inside each of these date directories there are many files, and I typically just need the files matching some clear pattern, e.g. *h17v04*.hdf. I thought this could work:
wget -I "${PLATFORM}/${PRODUCT}/${YEAR}.*" -r -l 4 \
--user-agent="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1" \
--verbose -c -np -nc -nd \
-A "*h17v04*.hdf" http://e4ftl01.cr.usgs.gov/$PLATFORM/$PRODUCT/
where PLATFORM=MOLT, PRODUCT=MOD09GA.005 and YEAR=2004, for example. This seems to start looking into all the useful dates, finds the index.html, and then just skips to the next directory, without downloading the relevant hdf file:
--2013-06-14 13:09:18-- http://e4ftl01.cr.usgs.gov/MOLT/MOD09GA.005/2004.01.01/
Reusing existing connection to e4ftl01.cr.usgs.gov:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `e4ftl01.cr.usgs.gov/MOLT/MOD09GA.005/2004.01.01/index.html'
[ <=> ] 174,182 134K/s in 1.3s
2013-06-14 13:09:20 (134 KB/s) - `e4ftl01.cr.usgs.gov/MOLT/MOD09GA.005/2004.01.01/index.html' saved [174182]
Removing e4ftl01.cr.usgs.gov/MOLT/MOD09GA.005/2004.01.01/index.html since it should be rejected.
--2013-06-14 13:09:20-- http://e4ftl01.cr.usgs.gov/MOLT/MOD09GA.005/2004.01.02/
[...]
If I omit the -A option, only the index.html file is downloaded to my system, but it is apparently not parsed and the links are not followed. I don't really know what more is required to make this work, as I can't see why it doesn't.
SOLUTION
In the end, the problem was due to an old bug in the local version of wget. However, I ended up writing my own script for downloading MODIS data from the server above. The script is pure Python, and is available from here.
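For reference, the -A accept list used above is an ordinary shell-style pattern, not a regex; its intended effect can be sketched locally against hypothetical file names from one date directory (the names below are made up for illustration):

```shell
# hypothetical file names scraped from one date directory's listing
listing="MOD09GA.A2004001.h17v04.005.hdf
MOD09GA.A2004001.h18v03.005.hdf
MOD09GA.A2004001.h17v04.005.hdf.xml"
# keep only names matching *h17v04*.hdf, as wget's -A option should
wanted=$(printf '%s\n' "$listing" | grep 'h17v04.*\.hdf$')
echo "$wanted"
```

Note the .hdf.xml sidecar file is excluded because the pattern is anchored to end in .hdf.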

Consider using pyModis instead of wget; it is a free and open-source Python library for working with MODIS data. It offers bulk download for user-selected time ranges, mosaicking of MODIS tiles, reprojection from Sinusoidal to other projections, and conversion of HDF files to other formats. See
http://www.pymodis.org/

Related

postgresql pgbadger error - can not load incompatible binary data, binary file is from version < 4.0

I am trying to use pgbadger to generate an HTML report from PostgreSQL slow-query log files. My Postgres log files are in csvlog format in the pg_log folder. I transferred all log files
(80 files, about 10 MB each) to my local Windows machine and am trying to generate a single HTML report for all of them. I combined them into one file as follows:
type postgresql-2020-06-18_075333.csv > postgresql.csv
type postgresql-2020-06-18_080011.csv >> postgresql.csv
....
....
type postgresql-2020-06-18_094812.csv >> postgresql.csv
I downloaded pgbadger-11.2 and tried the command below, but I get an error:
D:\pgbadger-11.2>perl --version
This is perl 5, version 28, subversion 1 (v5.28.1) built for MSWin32-x64-multi-thread
D:\pgbadger-11.2>perl pgbadger "D:\June-Logs\postgresql.csv" -o postgresql.html
[========================>] Parsed 923009530 bytes of 923009530 (100.00%), queries: 1254764, events: 53
can not load incompatible binary data, binary file is from version < 4.0.
LOG: Ok, generating html report...
postgresql.html is created, but there is no data in any tab. It works, however, when I create a separate report for each individual CSV, like below:
D:\pgbadger-11.2>perl pgbadger "D:\June-Logs\postgresql-2020-06-18_075333.csv" -o postgresql-2020-06-18_075333.html
D:\pgbadger-11.2>perl pgbadger "D:\June-Logs\postgresql-2020-06-18_080011.csv" -o postgresql-2020-06-18_080011.html
...
D:\pgbadger-11.2>perl pgbadger "D:\June-Logs\postgresql-2020-06-18_094812.csv" -o postgresql-2020-06-18_094812.html
Please suggest how I can fix this issue.
I am going to say this is due to:
type postgresql-2020-06-18_075333.csv > postgresql.csv
type postgresql-2020-06-18_080011.csv >> postgresql.csv
Pretty sure that is introducing Windows line endings, and pgBadger is looking for Unix line endings. Can you do the concatenation on the server?
UPDATE. Hmm. Ran across this
https://github.com/darold/pgbadger/releases
"This new release breaks backward compatibility with old binary or JSON
files. This also mean that incremental mode will not be able to read
old binary file [...] Add a warning about version and skip loading incompatible binary file.
Update code formatter to pgFormatter 4.0."
Not sure why it is failing on CSV logs; still, what version of pgBadger generated the binary data it is complaining about?
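If Windows line endings from the type concatenation are indeed the culprit, stripping the carriage returns before handing the combined file to pgBadger is a quick way to test the theory (pgBadger can also be given several input files in one invocation, which avoids concatenation entirely). A minimal sketch of the CRLF clean-up, using made-up log content:

```shell
tmp=$(mktemp -d)
# simulate csvlog lines that picked up Windows (CRLF) line endings
printf 'log line one\r\nlog line two\r\n' > "$tmp/postgresql.csv"
# strip the carriage returns (what dos2unix would do)
tr -d '\r' < "$tmp/postgresql.csv" > "$tmp/postgresql.unix.csv"
cat "$tmp/postgresql.unix.csv"
```

The cleaned file can then be fed to pgbadger as before.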

How to load multiple osm files into Nominatim

I need to figure out the process to load multiple OSM files into a Nominatim database. I have everything setup and can load a single file with no issues.
Basically, what I'm trying to do is load some of the Geofabrik OSM files for only part of the world. So I'm grabbing, say, the North America and South America OSM files, or any two on their site.
For the first load I use the setup.php:
./utils/setup.php --osm-file file.osm --all --osm2pgsql-cache 4000
I'm not sure if I have another file (file2.osm) how to load this into the database and keep the original data.
Basically, I just want pieces of the world and I only need to load data like every six months or so. I don't need daily updates/ etc...
I need to split the files up because it just takes too long to load and I want to manage it better.
Can I use update.php? I'm not sure what parameters to use, though.
I thought about loading all the data with update and the no-index option, then maybe building the index afterwards.
I did try to re-run setup.php for the second file, but it just hung for a long time.
For second file
./utils/setup.php --import-data --osm-file file2.osm --osm2pgsql-cache 4000
But this just hangs on Setting up table: planet_osm_ways. (I tested very small OSM files that should finish within minutes but it just hangs).
The files I'm using are all non-intersecting, so they are not truly updates. So I have a North America file and a South America file... How do I load both into Nominatim separately?
Thanks
The answer can be found at help.openstreetmap.org.
First you need to import it via the update script: ./utils/update.php --import-file <yourfile>. Then you need to trigger a re-indexing of the data: ./utils/update.php --index
But according to lonvia (one of the Nominatim developers) this will be very slow and it is better if you merge all your files first and then import it as one large file.
Sample Merging Code, merging Andorra, Malta and Liechtenstein,
curl -L 'http://download.geofabrik.de/europe/andorra-latest.osm.pbf' --create-dirs -o /srv/nominatim/src/andorra.osm.pbf
curl -L 'http://download.geofabrik.de/europe/malta-latest.osm.pbf' --create-dirs -o /srv/nominatim/src/malta.osm.pbf
curl -L 'http://download.geofabrik.de/europe/liechtenstein-latest.osm.pbf' --create-dirs -o /srv/nominatim/src/liechtenstein.osm.pbf
osmconvert /srv/nominatim/src/andorra.osm.pbf -o=/srv/nominatim/src/andorra.o5m
osmconvert /srv/nominatim/src/malta.osm.pbf -o=/srv/nominatim/src/malta.o5m
osmconvert /srv/nominatim/src/liechtenstein.osm.pbf -o=/srv/nominatim/src/liechtenstein.o5m
osmconvert /srv/nominatim/src/andorra.o5m /srv/nominatim/src/malta.o5m /srv/nominatim/src/liechtenstein.o5m -o=/srv/nominatim/src/data.o5m
osmconvert /srv/nominatim/src/data.o5m -o=/srv/nominatim/src/data.osm.pbf;
More about OsmConvert -> https://wiki.openstreetmap.org/wiki/Osmconvert
Once merged, you can,
sudo -u nominatim /srv/Nominatim/build/utils/setup.php \
--osm-file /srv/nominatim/src/data.osm.pbf \
--all \
--threads ${BUILD_THREADS} \
--osm2pgsql-cache ${OSM2PGSQL_CACHE}
# e.g. BUILD_THREADS=16, OSM2PGSQL_CACHE=24000

make wget download a file directly to disk from bash

On a website, after logging in with my credentials, I am able to download data by changing the URL to variations of this:
https://data.somewhere.com/DataDownload/getfile.jsp?ccy=AUDUSD&df=BBO&year=2014&month=02&dllater=Download
This puts a zip file in my download directory.
If I try to automate it with wget using:
wget "https://data.somewhere.com/DataDownload/getfile.jsp?ccy=AUDUSD&df=BBO&year=2014&month=02&dllater=Download" --no-check-certificate --ignore-length
$ ~/dnloadHotSpot.sh
--2014-03-22 16:05:16-- https://data.somewhere.com/DataDownload/getfile.jsp?ccy=AUDUSD&df=BBO&year=2014&month=02&dllater=Download
Resolving data.somewhere.com (data.somewhere.com)... 209.191.250.173
Connecting to data.somewhere.com (data.somewhere.com)|209.191.250.173|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: ignored [text/html]
Saving to: `getfile.jsp#ccy=AUDUSD&df=BBO&year=2014&month=02&dllater=Download'
[ <=> ] 8,925 --.-K/s in 0.001s
2014-03-22 16:05:18 (14.4 MB/s) - `getfile.jsp#ccy=AUDUSD&df=BBO&year=2014&month=02&dllater=Download' saved [8925]
What else do I need to add to make wget actually download the file?
If you want to specify the name of the output file into which wget places the contents of the file it is downloading, use the capital -O parameter, something like:
wget -O myfilename ......
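Putting it together for the URL in the question: the query string must stay quoted so the shell does not treat & as a background operator, and -O fixes the output name. A sketch (the host and the derived file name are assumptions based on the question):

```shell
ccy=AUDUSD; year=2014; month=02
url="https://data.somewhere.com/DataDownload/getfile.jsp?ccy=${ccy}&df=BBO&year=${year}&month=${month}&dllater=Download"
out="${ccy}_${year}_${month}.zip"
printf '%s\n' "$url" "$out"
# actual download (not run here):
#   wget -O "$out" "$url" --no-check-certificate
```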

Download latest GitHub release

I'd like to have a "Download Latest Version" button on my website which would link to the latest release (stored at GitHub Releases). I tried to create a release tag named "latest", but it became complicated when I uploaded a new release (confusion with tag creation dates, tag interchanging, etc.). Updating the download links on my website manually is also a time-consuming and tedious task. The only way I see is to redirect all download buttons to some HTML page, which in turn redirects to the actual latest release.
Note that my website is hosted at GitHub Pages (static hosting), so I simply can't use server-side scripting to generate links. Any ideas?
You don't need any scripting to generate a download link for the latest release. Simply use this format:
https://github.com/:owner/:repo/zipball/:branch
Examples:
https://github.com/webix-hub/tracker/zipball/master
https://github.com/iDoRecall/selection-menu/zipball/gh-pages
If for some reason you want to obtain a link to the latest release download, including its version number, you can obtain that from the get latest release API:
GET /repos/:owner/:repo/releases/latest
Example:
$.get('https://api.github.com/repos/idorecall/selection-menu/releases/latest', function (data) {
$('#result').attr('href', data.zipball_url);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<a id="result">Download latest release (.ZIP)</a>
GitHub now provides a "Latest release" button on the releases page of a project, once you have created your first release.
In the example you gave, this button links to https://github.com/reactiveui/ReactiveUI/releases/latest
You can use the following where:
${Organization} as the GitHub user or organization
${Repository} is the repository name
curl -L https://api.github.com/repos/${Organization}/${Repository}/tarball > ${Repository}.tar.gz
The top level directory in the .tar.gz file has the sha hash of the commit in the directory name which can be a problem if you need an automated way to change into the resulting directory and do something.
The method below will strip this out, and leave the files in a folder with a predictable name.
mkdir ${Repository}
curl -L https://api.github.com/repos/${Organization}/${Repository}/tarball | tar -zxv -C ${Repository} --strip-components=1
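The effect of --strip-components=1 can be verified locally with a synthetic tarball whose top-level directory mimics GitHub's hash-suffixed name (all names below are hypothetical):

```shell
tmp=$(mktemp -d)
# mimic the tarball layout: a single top-level dir named owner-repo-<sha>
mkdir -p "$tmp/src/owner-repo-0a1b2c3"
echo hello > "$tmp/src/owner-repo-0a1b2c3/README"
tar -C "$tmp/src" -czf "$tmp/repo.tar.gz" owner-repo-0a1b2c3
# extract while dropping the unpredictable top-level directory
mkdir "$tmp/out"
tar -xzf "$tmp/repo.tar.gz" -C "$tmp/out" --strip-components=1
ls "$tmp/out"
```

The files land directly in the destination directory, with no hash-named folder in between.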
Since February 18th, 2015, the GitHub v3 API has offered a "get the latest release" endpoint:
GET /repos/:owner/:repo/releases/latest
See also "Linking to releases".
Still, the name of the asset can be tricky.
Git-for-Windows, for instance, requires a command like:
curl -IkLs -o NUL -w %{url_effective} \
https://github.com/git-for-windows/git/releases/latest|\
grep -o "[^/]*$"| sed "s/v//g"|\
xargs -I T echo \
https://github.com/git-for-windows/git/releases/download/vT/PortableGit-T-64-bit.7z.exe \
-o PortableGit-T-64-bit.7z.exe| \
sed "s/.windows.1-64/-64/g"|sed "s/.windows.\(.\)-64/.\1-64/g"|\
xargs curl -kL
The first 3 lines extract the latest version 2.35.1.windows.2
The rest will build the right URL
https://github.com/git-for-windows/git/releases/download/
v2.35.1.windows.2/PortableGit-2.35.1.2-64-bit.7z.exe
^^^^^^^^^^^^^^^^^ ^^^^^^^^^
Maybe you could use some client-side scripting and dynamically generate the target of the link by invoking the GitHub API, through some jQuery magic?
The Releases API exposes a way to retrieve the list of all the releases of a repository. For instance, this link returns a JSON-formatted list of all the releases of the ReactiveUI project.
Extracting the first one would return the latest release.
Within this payload:
The html_url attribute will hold the first part of the url to build (ie. https://github.com/{owner}/{repository}/releases/{version}).
The assets array will list the downloadable archives. Each asset bears a name attribute.
Building the target download url is only a few string operations away.
Insert the download/ keyword between the releases/ segment from the html_url and the version number
Append the name of the asset to download
Resulting url will be of the following format: https://github.com/{owner}/{repository}/releases/download/{version}/name_of_asset
For instance, regarding the Json payload from the link ReactiveUI link above, we've got html_url: "https://github.com/reactiveui/ReactiveUI/releases/5.99.0" and one asset with name: "ReactiveUI.6.0.Preview.1.zip".
As such, the download url is https://github.com/reactiveui/ReactiveUI/releases/download/5.99.0/ReactiveUI.6.0.Preview.1.zip
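Those two string operations can be sketched in the shell, using the values from the ReactiveUI payload above:

```shell
html_url="https://github.com/reactiveui/ReactiveUI/releases/5.99.0"
name="ReactiveUI.6.0.Preview.1.zip"
# insert "download/" after the releases/ segment, then append the asset name
download_url="$(printf '%s' "$html_url" | sed 's|releases/|releases/download/|')/$name"
echo "$download_url"
```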
If you are using PHP, try the following code:
function getLatestTagUrl($repository, $default = 'master') {
    $file = @json_decode(@file_get_contents("https://api.github.com/repos/$repository/tags", false,
        stream_context_create(['http' => ['header' => "User-Agent: Vestibulum\r\n"]])
    ));
    return sprintf("https://github.com/$repository/archive/%s.zip", $file ? reset($file)->name : $default);
}
Function usage example
echo 'Download';
As I didn't see this answered here, and it was quite helpful for me while running continuous integration tests: this one-liner, which only requires curl, lets you search a GitHub repo's releases and download the latest version
https://gist.github.com/steinwaywhw/a4cd19cda655b8249d908261a62687f8
I use it to run PHPSTan on our repository using the following script
https://gist.github.com/rvanlaak/7491f2c4f0c456a93f90e31774300b62
If you are trying to download from any Linux (even old or tiny versions), or from a bash script, then the foolproof way is to use this command:
wget https://api.github.com/repos/$OWNER/$REPO/releases/latest -O - | awk -F \" -v RS="," '/browser_download_url/ {print $(NF-1)}' | xargs wget
Do not forget to replace $OWNER and $REPO with the right owner and repository names. The command downloads a JSON document with the data of the latest release, then awk extracts the value of the browser_download_url key.
If you are on a really old Linux or a tiny embedded system with a small wget, the download name can be a problem. In that case you can always use the ultra-reliable:
URL=$(wget https://api.github.com/repos/$OWNER/$REPO/releases/latest -O - | awk -F \" -v RS="," '/browser_download_url/ {print $(NF-1)}'); wget $URL -O $(basename "$URL")
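The awk part of these commands can be exercised offline against a trimmed, hypothetical release payload to see what it extracts:

```shell
# minimal stand-in for the JSON returned by /releases/latest
json='{"assets":[{"browser_download_url":"https://github.com/o/r/releases/download/v1.0/app.tar.gz"}]}'
# split records on commas, fields on double quotes, and print the value
# following a browser_download_url key
url=$(printf '%s' "$json" | awk -F \" -v RS="," '/browser_download_url/ {print $(NF-1)}')
echo "$url"
```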
As noted by @Dan Dascalescu in a comment on the accepted answer, some projects (roughly 30%) do not bother to file formal releases, so neither the "Latest release" button nor the /releases/latest API call would return useful data.
To reliably fetch the latest release for a GitHub project, you can use lastversion.

Is there a way to tell django compressor to create source maps

I want to be able to debug minified, compressed JavaScript code on my production site. Our site uses django compressor to create minified and compressed JS files. I recently read about Chrome being able to use source maps to help debug such JavaScript; however, I don't know how (or whether it is possible) to tell django compressor to create source maps when compressing the JS files.
I don't have a good answer regarding outputting separate source map files, however I was able to get inline working.
Prior to adding source maps my settings.py file used the following precompilers
COMPRESS_PRECOMPILERS = (
('text/coffeescript', 'coffee --compile --stdio'),
('text/less', 'lessc {infile} {outfile}'),
('text/x-sass', 'sass {infile} {outfile}'),
('text/x-scss', 'sass --scss {infile} {outfile}'),
('text/stylus', 'stylus < {infile} > {outfile}'),
)
After a quick
$ lessc --help
you find out you can embed the Less source and the source map inline in the output CSS file. So my new text/less precompiler entry looks like:
('text/less', 'lessc --source-map-less-inline --source-map-map-inline {infile} {outfile}'),
Hope this helps.
Edit: Forgot to add, lessc >= 1.5.0 required for this, to upgrade use
$ [sudo] npm update -g less
While I couldn't get this to work with django-compressor (though it should be possible, I think I just had issues getting the app set up correctly), I was able to get it working with django-assets.
You'll need to add the appropriate command-line argument to the less filter source code as follows:
diff --git a/src/webassets/filter/less.py b/src/webassets/filter/less.py
index eb40658..a75f191 100644
--- a/src/webassets/filter/less.py
+++ b/src/webassets/filter/less.py
@@ -80,4 +80,4 @@ class Less(ExternalTool):
def input(self, in_, out, source_path, **kw):
# Set working directory to the source file so that includes are found
with working_directory(filename=source_path):
- self.subprocess([self.less or 'lessc', '-'], out, in_)
+ self.subprocess([self.less or 'lessc', '--line-numbers=mediaquery', '-'], out, in_)
Aside from that tiny addition:
make sure you've got the node -- not the ruby gem -- less compiler (>=1.3.2 IIRC) available in your path.
turn on the Sass source-maps option buried away in Chrome's web inspector config pages. (Yes, 'Sass', not Less: Less tweaked their debug-info format to match Sass's, since Sass had already implemented a Chrome-compatible mapping and their formats weren't that different to begin with anyway...)
Not out of the box but you can extend a custom filter:
from compressor.filters import CompilerFilter
class UglifyJSFilter(CompilerFilter):
    command = "uglifyjs -c -m " \
              "--source-map-root={relroot}/ " \
              "--source-map-url={name}.map.js " \
              "--source-map={relpath}/{name}.map.js -o {output}"