How do I get the raw version of a gist from GitHub?

I need to load a shell script from a raw gist, but I can't find a way to get the raw URL.
curl -L address-to-raw-gist.sh | bash

There is a way: look for the Raw button (at the top right of the source code).
The raw URL should look like this:
https://gist.githubusercontent.com/{user}/{gist_hash}/raw/{commit_hash}/{file}
Note: it is possible to get the latest version by omitting the {commit_hash} part, as shown below:
https://gist.githubusercontent.com/{user}/{gist_hash}/raw/{file}
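For example, fetching and running the latest revision of a shell script in one step would look something like this ({user}, {gist_hash} and {file} are placeholders to replace with your own gist's values):
# -L follows redirects, -f fails on HTTP errors instead of piping an error page into bash
curl -fsSL "https://gist.githubusercontent.com/{user}/{gist_hash}/raw/{file}" | bash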

February 2014: the raw url just changed.
See "Gist raw file URI change":
The raw host for all Gist files is changing immediately.
This change was made to further isolate user content from trusted GitHub applications.
The new host is
https://gist.githubusercontent.com.
Existing URIs will redirect to the new host.
Before it was https://gist.github.com/<username>/<gist-id>/raw/...
Now it is https://gist.githubusercontent.com/<username>/<gist-id>/raw/...
For instance:
https://gist.githubusercontent.com/VonC/9184693/raw/30d74d258442c7c65512eafab474568dd706c430/testNewGist
KrisWebDev adds in the comments:
If you want the latest version of a Gist document, just remove the <commit>/ part from the URL:
https://gist.githubusercontent.com/VonC/9184693/raw/testNewGist

One can simply use the GitHub API.
https://api.github.com/gists/$GIST_ID
Reference: https://miguelpiedrafita.com/github-gists
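For example, a minimal sketch that pulls the raw URL of every file out of the API response (this assumes jq is installed, $GIST_ID is set, and that the response keeps its current .files[].raw_url shape):
# list the raw_url of each file in the gist
curl -s "https://api.github.com/gists/$GIST_ID" | jq -r '.files[].raw_url'
# or fetch the raw content of the first file directly
curl -fsSL "$(curl -s "https://api.github.com/gists/$GIST_ID" | jq -r '.files[].raw_url' | head -n 1)"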

GitLab snippets provide short, concise URLs, are easy to create, and go well with the command line.
Example: enable bash completion by patching /etc/bash.bashrc
sudo su -
(curl -s https://gitlab.com/snippets/21846/raw && echo) | patch -s /etc/bash.bashrc


File not found error when using cyberduck CLI for OneDrive

I want to upload encrypted backups to OneDrive using Cyberduck, to avoid local copies. Given a local file called file.txt that I want to upload into the Backups folder in the OneDrive root, I used this command:
duck --username <myUser> --password <myPassword> --upload onedrive://Backups .\file.txt
Transfer incomplete…
File not found. /. Please contact your web hosting service provider for assistance.
It's not even possible to get the directory content using the duck --username <myUser> --password <myPassword> --list onedrive://Backups command. That also causes a File not found error.
What am I doing wrong?
I followed the documentation exactly and have no clue why this is not working. Cyberduck was installed using Chocolatey; the current version is Cyberduck 6.6.2 (28219).
Just testing this out, it looks like OneDrive assigns a unique identifier to the root folder. You can find it either by inspecting the value of the cid parameter in the URL of your OneDrive site, or by using the following command:
duck --list OneDrive:///
Note that having three slashes is important: it would appear the first two are part of the protocol prefix and the third specifies that you want the root. The result should be a unique id of some sort, like 36d25d24238f8242, which you can then use to upload your files:
duck --upload onedrive://36d25d24238f8242/Backups .\file.txt
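Putting the two steps together with the credentials from the question, the whole flow would look roughly like this (36d25d24238f8242 is just the example id from above; yours will differ):
# 1. find the unique id of the OneDrive root (note the three slashes)
duck --username <myUser> --password <myPassword> --list onedrive:///
# 2. upload using that id as the start of the remote path
duck --username <myUser> --password <myPassword> --upload onedrive://36d25d24238f8242/Backups .\file.txt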
Didn't see any of that in the docs... just tinkering with it. So I might recommend opening a bug with duck to update their docs if this works for you.
What happens if you use the full path to the file? It looks like it is just complaining about not finding the file to upload, so it could be that you are running the command from a different directory and it needs the full path to the source file.

What am I screwing up trying to download particular file types with wget?

I am attempting to regularly archive a few file types hosted on a community website where our admin has been MIA for years, in case he dies or just stops paying for the hosting.
I am able to download all of the files I need using wget -r -np -nd -e robots=off -l 0 URL but this leaves me with about 60,000 extra files to waste time both downloading and deleting.
I am really only looking for files with the extensions "tbt" and "zip". When I add -A tbt,zip to the command, wget then only downloads a single file, "index.html.tmp". It immediately deletes this file because it doesn't match the specified file types, and then the process stops entirely, with wget announcing that it is finished. It does not attempt to download any of the other files that it grabs when the -A flag is not included.
What am I doing wrong? Why does specifying file types in the way that I did cause it to finish after only looking at one file?
Possibly you're hitting the same problem I've hit when trying to do something similar. When using --accept, wget determines whether a link refers to a file or a directory based on whether or not it ends with a /.
For example, say I have a directory named files, and a web page that has:
<a href="files">Lots o' files!</a>
If I were to request this with wget -r, then wget would happily GET /files, see that it was an HTML document containing a bunch of links, and continue to download those links.
However, if I add -A zip to my command line, and run wget with --debug, I see:
appending ‘http://localhost:8080/files’ to urlpos.
[...]
Deciding whether to enqueue "http://localhost:8080/files".
http://localhost:8080/files (files) does not match acc/rej rules.
Decided NOT to load it.
In other words, wget thinks this is a file (no trailing /) and it doesn't match our acceptance criteria, so it gets rejected.
If I modify the remote file so that it looks like...
<a href="files/">Lots o' files!</a>
...then wget will follow the link and download files as desired.
I don't think there's a great solution to this problem if you need to use wget. As I mentioned in my comment, there are other tools available that may handle this situation more gracefully.
It's also possible you're experiencing a different issue; the output from adding --debug to your command line should clarify things in that case.
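For the command in the question, that would be something like the following (URL is a placeholder; wget-debug.log is just an example filename, since wget writes its debug output to stderr):
wget -r -np -nd -e robots=off -l 0 -A tbt,zip --debug URL 2> wget-debug.log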
I also experienced this issue, on a page where all the download links looked something like this: filedownload.ashx?name=file.mp3. The solution was to match both the linked file and the downloaded file, so my wget accept flag looked like this: -A 'ashx,mp3'. I also used the --trust-server-names flag. This catches all the .ashx links on the web page; then, when wget does the second check, all the mp3 files that were downloaded will stay.
As an alternative to --trust-server-names, you may also find the --content-disposition flag helpful. Both flags help rename the file that gets downloaded from filedownload.ashx?name=file.mp3 to just file.mp3.
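Combining the flags mentioned above with the recursion options from the question, the full command would look something like this (URL is a placeholder; --content-disposition can be swapped in for --trust-server-names):
wget -r -np -nd -e robots=off -A 'ashx,mp3' --trust-server-names URL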

How to download a file from box using wget?

I've created a direct link to a file in Box:
The previous link points to the browser web interface, so I then shared it with a direct link:
However, if I download the file with wget, I receive garbage.
How can I download the file with wget?
I was able to download the file by making the link public, then replacing /s/ in the URL with /shared/static.
So my final command was:
curl -L https://MYUNI.box.com/shared/static/EXAMPLEtzwosac6pz --output myfile.zip
This can probably be modified for wget.
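A wget equivalent would presumably look like this; wget follows redirects by default, so only the output filename needs to be given:
wget -O myfile.zip https://MYUNI.box.com/shared/static/EXAMPLEtzwosac6pz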
I might be a bit late to the party, but FWIW:
I tried to do the same thing in order to download a folder.
I went to the box UI and opened the browser's network tab on the developer tools.
Then I clicked on download and copied the first request generated as cURL; it was something like this (many headers and options removed for readability):
curl 'https://app.box.com/index.php?folder_id=122215143745&rm=box_v2_zip_folder'
The response of this request is a json object containing a link for downloading the folder:
{
"use_zpdl": "true",
"result": "success",
"download_url": <somg long url>,
"progress_reporting_url": <some other url>
}
I then executed wget -L <download_url> and was able to download the file using wget
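A rough sketch of that flow, assuming the copied-as-cURL request (with its original headers and cookies) has been saved to a script named request.sh and that jq is available to parse the JSON:
# run the copied request and pull download_url out of the JSON response
download_url=$(bash request.sh | jq -r '.download_url')
# then fetch the folder as a zip
wget -O folder.zip "$download_url"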
The solution was to add the -L option to follow the HTTP redirect:
wget -v -O myfile.tgz -L https://ibm.box.com/shared/static/xxxxx.tgz
What you can do in 2022 is something like this:
wget "https://your_university.app.box.com/index.php?rm=box_download_shared_file&vanity_name=your_private_name&file_id=f_your_file_id"
You can find this link in the POST request shown in Google Chrome's network tab (use an incognito window). Note that the double quotes are needed so the shell doesn't interpret the & characters in the URL.

wget appends query string to resulting file

I'm trying to retrieve working webpages with wget and this goes well for most sites with the following command:
wget -p -k http://www.example.com
In these cases I will end up with index.html and the needed CSS/JS etc.
HOWEVER, in certain situations the URL will have a query string, and in those cases I get an index.html with the query string appended.
Example
www.onlinetechvision.com/?p=566
Combined with the above wget command, this will result in:
index.html?page=566
I have tried using the --restrict-file-names=windows option, but that only gets me to
index.html#page=566
Can anyone explain why this is needed and how I can end up with a regular index.html file?
UPDATE: I'm sort of on the fence about taking a different approach. I found out I can take the first filename that wget saves by parsing the output, so the name that appears after Saving to: is the one I need.
However, this name is wrapped in a strange â character. Rather than just stripping that character in a hardcoded way, where does it come from?
If you try the --adjust-extension parameter
wget -p -k --adjust-extension www.onlinetechvision.com/?p=566
you come closer. In the www.onlinetechvision.com folder there will be a file with a corrected extension: index.html#p=566.html, or index.html?p=566.html on *nix systems. It is now simple to rename that file to index.html, even with a script.
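For example, a tiny rename along these lines should be enough (the exact saved filename may differ, so check the wget output or ls first):
cd www.onlinetechvision.com
# rename whichever variant was saved
mv 'index.html?p=566.html' index.html 2>/dev/null || mv 'index.html#p=566.html' index.html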
If you are on a Microsoft OS, make sure you have a recent version of wget; it is also available here: https://eternallybored.org/misc/wget/
To answer your question about why this is needed, remember that the web server is likely to return different results based on the parameters in the query string. If a query for index.html?page=52 returns different results from index.html?page=53, you probably wouldn't want both pages to be saved in the same file.
Each HTTP request that uses a different set of query parameters is quite literally a request for a distinct resource. wget can't predict which of these changes is and isn't going to be significant, so it's doing the conservative thing and preserving the query parameter URLs in the filename of the local document.
My solution is to do the recursive crawling outside wget:
get the directory structure with wget (spider only, no files)
loop over each directory to fetch its main entry page (index.html)
This works well with WordPress sites, though it could miss some pages.
#!/bin/bash
#
# get directory structure
#
wget --spider -r --no-parent http://<site>/
#
# loop through each dir
#
find . -mindepth 1 -maxdepth 10 -type d | cut -c 3- > ./dir_list.txt
while read -r line; do
    wget --wait=5 --tries=20 --page-requisites --html-extension --convert-links --execute=robots=off --domains=<domain> --strict-comments "http://${line}/"
done < ./dir_list.txt
The query string is required because of the website design: the site uses the same standard index.html for all content and then uses the query string to pull in the content from another page, e.g. with a script on the server side (it may be client side if you look at the JavaScript).
Have you tried using --no-cookies? The site could be storing this information in a cookie and pulling it in when you hit the page. This could also be caused by URL-rewrite logic, which you will have little control over from the client side.
Use the -O or --output-document option. See http://www.electrictoolbox.com/wget-save-different-filename/
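For the example above, that would be something like:
wget -O index.html 'http://www.onlinetechvision.com/?p=566'
Note that -O writes everything wget downloads into that single file, so it doesn't combine well with -p.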

CVS command to get brief history of repository

I am using the following command to get a brief history of the CVS repository.
cvs -d :pserver:*User*:*Password*@*Repo* rlog -N -d "*StartDate* < *EndDate*" *Module*
This works just fine except for one small problem: it lists all tags created on each file in that repository. I want the tag info, but I only want the tags that were created in the specified date range. How do I change this command to do that?
I don't see a way to do that natively with the rlog command. Faced with this problem, I would write a Perl script to parse the output of the command, correlate the tags to the date range that I want and print them.
Another solution would be to parse the ,v files directly, but I haven't found any robust libraries for doing that. I prefer Perl for that type of task, and the parsing modules don't seem to be very high quality.