wget mirror with "pretty URLs" structure

I'm using the following command to grab a site:
wget --mirror --page-requisites --convert-links --adjust-extension --span-hosts --restrict-file-names=windows --no-parent https://example.com
I want to host the static site on GitHub pages, so for URLs like /content/foo I'd like to get /content/foo/index.html. Currently I get /content/foo.html.
My question is how to get that index.html added, since GitHub Pages will know how to serve /content/foo/ when the index.html is present.
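wget itself has no option to write foo/index.html instead of foo.html, but the mirror can be post-processed with a small shell function (the name prettify_mirror is mine; note that moving each page one directory deeper can break relative links that --convert-links rewrote, so check those afterwards):

```shell
# Convert every foo.html in the mirror into foo/index.html,
# leaving files already named index.html where they are.
prettify_mirror() {
  find "$1" -type f -name '*.html' ! -name 'index.html' | while IFS= read -r f; do
    dir="${f%.html}"        # strip the .html extension to get the directory name
    mkdir -p "$dir"
    mv "$f" "$dir/index.html"
  done
}
```

Run e.g. `prettify_mirror example.com` after the wget command finishes.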

Related

Download released asset from private github repo

I'm trying to download a single added asset from a release in a private repository, but all I get is a 404 response. The download URL of the asset is https://github.com/<user>/<repo>/releases/download/20211022/file.json
I've tried several different ways of specifying the username and personal access token, but all give the same result. When I use the access token against api.github.com directly, it works.
I've tried the following formats in curl
curl -i -u <user>:<token> <url>
curl -i "https://<user>:<token>@github.com/ ...."
curl -i -H "Authorization: token <token>" <url>
I can download the source code (zip) from the release, but this has a different url: https://github.com/<user>/<repo>/archive/refs/tags/20211022.zip
What am I doing wrong?
As described in the GitHub REST API documentation:
You have to use the API URL /repos/{owner}/{repo}/releases/assets/{asset_id}, not the browser_download_url, and set the Accept request header to application/octet-stream to download the asset's binary content.
And don't forget to add the Authorization header too.
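Concretely, that amounts to two curl calls: one to look up the release by its tag and read the asset id from the response, and one to fetch the asset itself (<user>, <repo>, <token>, and <asset_id> are placeholders):

```shell
# Look up the release by tag; the assets array in the JSON response
# contains the id of each asset.
curl -s -H "Authorization: token <token>" \
  "https://api.github.com/repos/<user>/<repo>/releases/tags/20211022"

# Download the asset by id, asking for the binary content.
curl -L -o file.json \
  -H "Authorization: token <token>" \
  -H "Accept: application/octet-stream" \
  "https://api.github.com/repos/<user>/<repo>/releases/assets/<asset_id>"
```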
The alternative is to use the GitHub CLI gh, with the command gh release download.
Since gh 2.19.0 (Nov. 2022), you can download a single asset with gh release download --output (after a gh auth login, for authentication):
In your case:
gh release download 20211022 --repo <user>/<repo> --pattern 'file.json' --output file.json

gsutil setmeta AccessDeniedException: 403 Forbidden

I am trying to change the cache control for my CDN by doing:
gsutil setmeta -h "Cache-Control: max-age=0, s-maxage=86400" gs://<BUCKET>/*
However when I do I get the error: AccessDeniedException: 403 Forbidden
I've also tried changing the ACL on the bucket, but I get the same error:
gsutil acl set -R public-read gs://<BUCKET>/
Ideally, I'd like this cache control to be on every bucket I create and have a default ACL to allow this.
Does anyone know how I can make this AccessDeniedException go away? I am signed in as the owner of the project.
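A common cause of this 403 is that gsutil is authenticated as a different account than the one signed into the console. A troubleshooting sketch (the email and bucket are placeholders); note that Cache-Control has no bucket-level default, so it has to be set per object, e.g. at upload time:

```shell
# Check which account gsutil/gcloud is actually using; it may differ
# from the console login that owns the project.
gcloud auth list

# Re-authenticate if the wrong account is active.
gcloud auth login

# Grant that account object-update rights on the bucket.
gsutil iam ch user:you@example.com:objectAdmin gs://<BUCKET>

# A default object ACL can be set per bucket...
gsutil defacl set public-read gs://<BUCKET>

# ...but Cache-Control must be applied per object, e.g. when uploading.
gsutil -h "Cache-Control: max-age=0, s-maxage=86400" cp file.html gs://<BUCKET>/
```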

Facebook does not show thumbnail

I am using Facebook's Graph API and I am trying to share a link to a page with an embedded SWF video using the following curl request.
curl -i -X POST \
-d "height=405" \
-d "link=https%3A%2F%2Fbit.ly%2F1SptrHN" \
-d "message=Test" \
-d "name=The%20Last%20Witch%20Hunter" \
-d "picture=https%3A%2F%2Fstatic1.webvideocore.net%2Fi%2Fstores%2F2%2Fitems%2Fbg%2F9%2F9c%2F9ce4632ed7b89b5a36638cdd6392914d.jpg" \
-d "source=https%3A%2F%2Fplay.streamingvideoprovider.com%2Fplayer3.swf%3Fclip_id%3Dar7hgx038sw8%26autoStart%3D1%26native_fs%3D1%26noControls%3D%26repeatVideo%3D%26stretch_video%3D%26brandNW%3D1%26start_img%3D1%26start_volume%3D100%26autoHide%3D1%26skinAlpha%3D80%26colorBase%3D%2523202020%26colorIcon%3D%2523FFFFFF%26colorHighlight%3D%2523fcad37%26viewers_limit%3D0%26cc_position%3Dbottom%26cc_positionOffset%3D70%26cc_multiplier%3D0.03%26cc_textColor%3D%2523ffffff%26cc_textOutlineColor%3D%2523000000%26cc_bkgColor%3D%2523000000%26cc_bkgAlpha%3D0.7" \
-d "type=link" \
-d "width=720" \
-d "access_token=CAAEl5c0JLDABAJpu3DJVbndfcmZCrr9xnk5zoWn5Ik9KEwS14autS1ZAc4ceDdzr4eTIqqzH6z8ePvkZA1gOVUZCKrInECJiFaZCgM1Y0JDocgfyyg9BLSpNzLtMZCOhiPpRPkk0URyCRDedQxQEx3yodXKiyzRJq7RKPZAVKrb77mlxA8fuUvRDZCcGgwgdZAuNZCWnLvtDly8wZDZD" \
"https://graph.facebook.com/v2.0/me/feed"
As you can see, I am setting picture, and that image is publicly available. However, the Graph API Explorer returns the following URL for the thumbnail, which points to the SWF:
https://external.xx.fbcdn.net/safe_image.php?d=AQCRoO4J0CcrVO2M&w=130&h=130&url=https%3A%2F%2Fplay.streamingvideoprovider.com%2Fplayer3.swf%3Fclip_id%3Dar7hgx038sw8%26autoStart%3D1%26native_fs%3D1%26noControls%26repeatVideo%26stretch_video%26brandNW%3D1%26start_img%3D1%26start_volume%3D100%26autoHide%3D1%26skinAlpha%3D80%26colorBase%3D%2523202020%26colorIcon%3D%2523FFFFFF%26colorHighlight%3D%2523fcad37%26viewers_limit%3D0%26cc_position%3Dbottom%26cc_positionOffset%3D70%26cc_multiplier%3D0.03%26cc_textColor%3D%2523ffffff%26cc_textOutlineColor%3D%2523000000%26cc_bkgColor%3D%2523000000%26cc_bkgAlpha%3D0.7&cfs=1
Furthermore, sharing the link manually from the page shows the correct thumbnail.
What could be the reason for the wrong thumbnail?
https://developers.facebook.com/docs/graph-api/reference/v2.6/user/feed#publish lists the valid parameters for creating posts via that endpoint. It doesn’t mention source at all.
Although source seems to be a valid parameter for the Feed dialog, you are making a post via the API, and that's something different.
If you want to share a link with a video, then I’d recommend embedding the video via the Open Graph meta tags – https://developers.facebook.com/docs/sharing/webmasters#video
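For the SWF case above, the relevant Open Graph tags on the shared page would look roughly like this (a sketch using the image and player URLs from the question, with the player query string trimmed to its clip id):

```html
<!-- Open Graph tags in the <head> of the page being shared -->
<meta property="og:title"        content="The Last Witch Hunter" />
<meta property="og:image"        content="https://static1.webvideocore.net/i/stores/2/items/bg/9/9c/9ce4632ed7b89b5a36638cdd6392914d.jpg" />
<meta property="og:video"        content="https://play.streamingvideoprovider.com/player3.swf?clip_id=ar7hgx038sw8" />
<meta property="og:video:type"   content="application/x-shockwave-flash" />
<meta property="og:video:width"  content="720" />
<meta property="og:video:height" content="405" />
```

Facebook's crawler reads these tags from the linked page itself, which is also why sharing the link manually picks up the correct thumbnail.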

Deployment from private github repository

I am new to Git and GitHub (I used Subversion before). I cannot find a way to export only the master branch from my private repository to my production server.
I need an automated solution (called via Fabric). I found that Git has an archive command, but it doesn't work with GitHub (I have set up SSH keys):
someuser@ews1:~/sandbox$ git archive --format=tar --remote=git@github.com:someuser/somerepository.git master
Invalid command: 'git-upload-archive 'someuser/somerepository.git''
You appear to be using ssh to clone a git:// URL.
Make sure your core.gitProxy config option and the
GIT_PROXY_COMMAND environment variable are NOT set.
fatal: The remote end hung up unexpectedly
So I need another way to do this. I don't want any Git metadata in the export. If I clone, I get all of that in the .git directory (which I don't want), and it downloads more data than I actually need.
Is there a way to do this over SSH, or do I have to download the zip over HTTPS?
I'm not sure I fully understood your question.
I use this command to pull the current master version to my server:
curl -sL --user "user:pass" https://github.com/<organisation>/<repository>/archive/master.zip > master.zip
Does this help?
As I explained in my SO answer on downloading GitHub archives from a private repo, after creating a GitHub access token, you can use wget or curl to obtain the desired <version> of your repo (e.g. master, HEAD, a commit SHA-1, ...).
Solution with wget:
wget --output-document=<version>.tar.gz \
https://api.github.com/repos/<owner>/<repo>/tarball/<version>?access_token=<OAUTH-TOKEN>
Solution with curl:
curl -L https://api.github.com/repos/<owner>/<repo>/tarball/<version>?access_token=<OAUTH-TOKEN> \
> <version>.tar.gz
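Note that GitHub has since deprecated and removed authentication via the access_token query parameter; passing the token in a header works instead (same placeholders as above):

```shell
# Fetch a tarball of <version>, authenticating via header rather than
# the deprecated ?access_token= query parameter.
curl -L \
  -H "Authorization: token <OAUTH-TOKEN>" \
  "https://api.github.com/repos/<owner>/<repo>/tarball/<version>" \
  -o <version>.tar.gz
```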

wget download for offline viewing including absolute references

I'm trying to download an entire webpage using the following command
wget -p -k www.myspace.com/
This does download the page and any images or scripts under that directory, but I'm trying to figure out how to download that page for completely offline viewing. How would I get every image, script, and style sheet linked within the source for www.myspace.com including external links?
wget -e robots=off -H -p -k http://www.myspace.com/
The -H (--span-hosts) flag is necessary for a complete mirror, as the page is likely to include content on hosts outside the www.myspace.com domain. Ignoring robots.txt with -e robots=off is added for good measure.
wget -mk http://www.myspace.com/
works for me. I am not sure about myspace, or whatever site you are trying to mirror specifically, but sometimes you have to pass some extra options to get around the no-robots policy. I am not going to say how, because it would mean you are doing something you shouldn't be doing. Although it is definitely possible.