wget page and insert source domain

I'm attempting to create a short script to cache a dynamically generated web page on another server.
I only want the one page so using
wget http://domain/file
could be enough, but the file contains relative links, which aren't valid on the other host.
Any ideas of an easy way to replace the relative links with absolute ones pointing to the original server/domain?

You are looking for the -k option:
Links to files that have been downloaded will be changed to refer to those files as relative links.
Links to files that have not been downloaded will be changed to include the host name and absolute path of the location they point to.
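Since you're only downloading the one page, every link on it falls into the second case and gets rewritten to an absolute URL on the original domain, for example:
wget -k http://domain/file
(-k is the short form of --convert-links.)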

Relative Link to Repo's Root from Markdown file

I need to have a relative link to the root of my repo from a markdown file
(I need it to work for any fork).
So it looks like the only way is to provide a link to some file in the root:
the [Root](/README.md)
or
the [Root](../README.md)
(if it's located at /doc/README.md for instance)
At the same time, I can refer to any folder without referring to a file:
the [Doc](/doc)
But if I try to put a link to the root folder:
the [real root](/)
the [real root](../)
I'll get a link such as
https://github.com/UserName/RepoName/blob/master
which, unlike
https://github.com/UserName/RepoName/blob/master/doc
returns a 404.
So if I don't want to refer to README.md in the root (I might not have one at all),
Is there any way to have such a link?
After some research I've found this solution:
[the real relative root of any fork](/../../)
It always points to the default branch. For me that's OK, so it's up to you.
PS
With this trick you can also access the following:
[test](/../../tree/test) - link to another branch
[doc/readme.md](/../../edit/master/doc/readme.md) - open in editor
[doc/readme.md](/../../delete/master/doc/readme.md) - ask to delete file
[doc/readme.md](/../../commits/master/doc/readme.md) - history
[doc/readme.md](/../../blame/master/doc/readme.md) - blame mode
[doc/readme.md](/../../raw/master/doc/readme.md) - raw mode (will redirect)
[doc/](/../../new/master/doc/) - ask to create new file
[doc/](/../../upload/master/doc/) - ask to upload file
[find](/../../find/test) - find file
You can either link directly to the file (../README.md), or simply use a full absolute URL to link directly to the repo root: https://github.com/UserName/RepoName
Using relative links doesn't work so well on GitHub. Notice the difference between the following two URLs:
https://github.com/UserName/RepoName/tree/master/somedir
https://github.com/UserName/RepoName/blob/master/somedir/somefile
Notice that the first points to a directory and the second points to a file. Yet, after "RepoName" we have either tree (for a directory) or blob (for a file). Therefore relative links between the two won't work properly. On GitHub, you can't use relative links to link between a file and a directory. However, you can link between two files (as both URLs contain blob). Therefore, if you wanted to link from somefile back to README.md in the root, you could do:
[README](../README.md)
That would give you the URL:
https://github.com/UserName/RepoName/blob/master/somedir/../README.md
which would get normalized to
https://github.com/UserName/RepoName/blob/master/README.md
However, if you just want to point to the root of your Repo (or any other dir), then it is probably best to use a full URL. After all, if someone has downloaded your repo and is viewing the source locally, the relative URL to the Repo root will be different than when viewing the file on GitHub. In that case, you probably want to point them to GitHub anyway. Therefore, you should use:
[root](https://github.com/UserName/RepoName)
Another advantage of that is that if your documentation ever gets published elsewhere (perhaps a documentation hosting service), the link will still point to the GitHub repo, not some random page on the hosting service. After all, the README at your project root is not likely to get included with the contents of the docs/ dir on said hosting service.
Perhaps it would help to understand how GitHub's URL scheme presumably works. I say "presumably" as I have no inside knowledge, just a general understanding of how these types of systems are generally designed.
GitHub is not serving flat files. Rather their server is taking the URL apart, and uses the various pieces to return the proper response. The URL structure looks something like this:
https://github.com/<username>/<repository name>/<resource type>/<branch>/<resource path>
The username, repository name, resource type, and branch are rather arbitrary and are just ways for GitHub to ensure it is pulling information from the correct location.
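For example, a URL such as
https://github.com/UserName/RepoName/blob/master/doc/README.md
would break down as username UserName, repository RepoName, resource type blob, branch master, and resource path doc/README.md.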
The resource type matters, as they are likely not pulling files from a working tree. Rather, they are pulling the files and directory listings directly from the repo itself at a lower level. In that case, obtaining a file is very different from obtaining a directory listing and requires a different code path. Therefore, you can't request a blob (file) with a resource path that points to a tree (directory) or vice versa. The server gets confused and returns an error.
The point is that GitHub's server works on a slightly different set of rules. You can use relative URLs to move around within the resource path part of the URL, but once a link crosses between a file and a directory, GitHub's scheme breaks unless the resource type in the URL changes as well. However, browsers (and HTML and Markdown) have no knowledge of that, and relative URLs don't compensate for it. Therefore, you can't reliably use relative URLs to move around within a GitHub repo unless you understand all of the subtleties. Sometimes it's just better to use absolute links.

DNN - Redirecting specific file types

I've taken on the webmaster role for a website that uses DNN version 07.02.02. Most of the links to my PDF files are broken. The PDFs were in a folder called "/pdfs"; now they're in a new folder, "/docs/pdfs".
A few quick things:
I only have ftp access to the web site files. No access to web.config so rewrite rules are out.
I don't want to copy the old files back to "/pdfs" because it would mean managing two different pdf copies (there are over 500 pdfs).
Creating directories with a .pdf extension and then adding an index.asp file with a redirect (i.e. "/pdfs/file_1001.pdf/index.asp") led to an error page, because there's an override that doesn't allow site directory pages to be exposed.
Using a DNN module where I'd have to enter 500 files to redirect seems redundant when I only want to move a directory.
Any solutions to try?
In DNN, if you have HOST-level access, you can modify config files through the Host/Configuration Manager page.
There you could modify the web.config file.
You might also look at the siteurls.config file (also accessible there), in which you can define URL rewrite rules; it might be as easy as:
<RewriterRule>
  <LookFor>/pdfs/(.*)</LookFor>
  <SendTo>/docs/pdfs/$1</SendTo>
</RewriterRule>
The above rule is completely untested; I'm not positive whether it will do what you need.
I did a little more testing, and it looks like this won't work out of the box, as there seems to be a default setting that tells the rewriter NOT to rewrite PDF requests, but I can't find the source code for that currently.
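For reference, if you do experiment with it, the rule would sit inside the existing RewriterConfig/Rules block of siteurls.config, roughly like this (an untested sketch, assuming the stock DNN request rewriter):
<RewriterConfig>
  <Rules>
    <RewriterRule>
      <LookFor>/pdfs/(.*)</LookFor>
      <SendTo>/docs/pdfs/$1</SendTo>
    </RewriterRule>
    <!-- keep any rules that are already in the file -->
  </Rules>
</RewriterConfig>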

With filepicker.io, is there a way to get the original file path?

I am using filepicker.io, and especially the "computer" source as the main service.
I would like to get the original file path of a file uploaded through the API.
For example, if I upload a file located at /my/path/in/my/computer/file.zip, I will get in the FPFile object the filepicker.io URL but not the original file path.
Is there a way to get it?
PS: I have tried retrieving the file's stat info too, without success.
Due to browser security limitations, the real local path of the file is never exposed to the JavaScript application. For more information, see http://davidwalsh.name/fakepath
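As a quick illustration (plain DOM, independent of Filepicker), this is roughly all a browser will hand over from a file input; the element id here is just a hypothetical example:
<input type="file" id="picker">
<script>
document.getElementById('picker').addEventListener('change', function () {
  // The real directory is hidden: browsers report a sanitized value...
  console.log(this.value);         // e.g. "C:\fakepath\file.zip"
  // ...and the File object only carries the name, size, type, etc.
  console.log(this.files[0].name); // "file.zip" - no path information
});
</script>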

Is there any way to have a text file on the server be readable only by the browser?

I have a few pages on my web server that extract data from text files that each contain a JSON string. The pages use $.get to load them.
Is there any way to allow only the server/webpages access the files? I would prefer to not have people going to the file path and saving the JSON data to their computer.
If I set permissions to deny access to the default IUSR, then people visiting the site won't be able to load them.
Any tricks around this?
I put such files in a directory tree outside the one the web server can see. E.g., HTML pages accessible by the browser go into /var/www/public_html/filename.php, but files that should not be seen go into /var/privateFiles/anotherfile.txt. The web server root is /var/www, so the web server cannot serve anotherfile.txt directly, but filename.php can include it using its full path name.
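A minimal sketch of that idea in PHP, using the example paths above: the browser still calls the page with $.get, but the JSON itself never lives under the web root, so its path can't be requested directly.
<?php
// filename.php - reachable by the browser because it sits under the web root.
// Any access checks you need (session, auth, referer) would go here, before the echo.
header('Content-Type: application/json');
// The data file sits outside the web root, so only server-side code can read it.
echo file_get_contents('/var/privateFiles/anotherfile.txt');
The pages would then request filename.php with $.get exactly as before, instead of fetching the text file itself.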

Relative links issue with Visio 2007

When using the "Use relative path for hyperlink" feature, it appears to only apply to network paths unless the target file happens to reside in the same directory.
If your relative path contains any forward slashes, e.g. path/to/file.htm, Microsoft converts them all to backslashes and the relative path can't be viewed in the browser...
I would like to be able to do this so I can move the folder anywhere without having to update all the links.
Is there any way around this? Thanks
Why would a relative path contain forward slashes?
So something down in a subdirectory would be .\someSubDirectory
and something up a directory would be
..\MyParentDirectory
and you can combine those
..\..\MyParentsParent and .\SubDir\SubSubDir.
Do you have a specific example I could help with?
The solution was setting the Hyperlink Base to the base URL shared by all the pages in the website.