I am building an S3 URL redirect, nothing special, just a bunch of zero-length objects with the WebsiteRedirectLocation metadata filled out. The S3 bucket is set to serve static websites, the bucket policy is set to public, etc. It works just fine.
HOWEVER - I also want to lock down certain files in the bucket - specifically some HTML files that serve to manage the redirects (like adding new redirects). With the traditional setup, I can both use the redirects and also serve the HTML page just fine. But in order to lock it down, I need to use CloudFront and Lambda@Edge like in these posts:
https://douglasduhaime.com/posts/s3-lambda-auth.html
http://kynatro.com/blog/2018/01/03/a-step-by-step-guide-to-creating-a-password-protected-s3-bucket/
I have modified the Lambda@Edge script to only prompt for a password IF the admin page (or its assets like CSS/JS) is requested. If the requested path is something else (presumably a redirect file), the user is not prompted for a password. And yes, I could also set a behavior rule in CloudFront to decide when to use the Lambda function to prompt for a password.
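Roughly, the check looks like this (a Python viewer-request sketch rather than my actual handler; the protected prefixes and the credentials are placeholders):

import base64

# Placeholder credentials and protected prefixes - not my real setup.
EXPECTED = "Basic " + base64.b64encode(b"admin:secret").decode()
PROTECTED_PREFIXES = ("/admin", "/index.html")

def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    uri = request["uri"]

    # Only challenge requests for the admin page and its assets;
    # anything else (presumably a redirect object) passes through.
    if not uri.startswith(PROTECTED_PREFIXES):
        return request

    headers = request.get("headers", {})
    auth = headers.get("authorization", [{}])[0].get("value", "")
    if auth == EXPECTED:
        return request

    # Ask the browser for Basic auth credentials.
    return {
        "status": "401",
        "statusDescription": "Unauthorized",
        "headers": {
            "www-authenticate": [
                {"key": "WWW-Authenticate", "value": 'Basic realm="Admin"'}
            ]
        },
    }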
And it kind of works. When I follow these instructions and visit my site via the CloudFront URL, I do indeed get prompted for a password when I go to the root of my site - the admin page. However, the redirects will not work. If I try to load a redirect, the browser just downloads it instead.
Now, in another post someone suggested that I change my CloudFront distribution endpoint to the S3 bucket WEBSITE endpoint - which I think also means changing the bucket policy back to website mode and public, which sucks because now it's accessible outside of CloudFront, which I do not want. Additionally, CloudFront no longer automatically serves the specified index file, which isn't the worst thing.
SO - is it possible to lock down my bucket, then serve it entirely through CloudFront with Lambda@Edge, BUT also have CloudFront respect those redirects instead of just prompting a download? Is there a setting in CloudFront to respect the headers? Should I set up different behavior rules for the different files (HTML vs redirects)?
Instead of using the WebsiteRedirectLocation metadata, which is specific to S3 static website hosting and has no effect when CloudFront fetches objects through the bucket's REST endpoint, replace your empty objects with HTML objects that contain a meta refresh tag pointing at the desired redirect target:
<meta http-equiv="Refresh" content="0; url=https://www.example.com" />
The number before the semicolon is the delay before the redirect, in seconds, where 0 is immediate.
Don't forget to also change the Content-Type metadata of the objects to text/html.
And if you want to support old browsers that might not handle the Refresh directive correctly, add an anchor link as explained here.
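If you create these objects with a script, something along these lines would do it (a boto3 sketch; the bucket name, key and target URL are placeholders):

import boto3

s3 = boto3.client("s3")

# Placeholder target URL and redirect page body (meta refresh plus an
# anchor link fallback for old browsers).
target = "https://www.example.com"
body = (
    "<!DOCTYPE html><html><head>"
    f'<meta http-equiv="Refresh" content="0; url={target}" />'
    f'</head><body><a href="{target}">Click here if you are not '
    "redirected.</a></body></html>"
)

# Upload the redirect page with the correct Content-Type so browsers
# render it instead of downloading it.
s3.put_object(
    Bucket="my-redirect-bucket",
    Key="old-path",
    Body=body.encode("utf-8"),
    ContentType="text/html",
)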
Say I have a website:
https://www.example.com
This website has many different HTML pages such as:
https://www.example.com/page.html
The website is hosted on AWS Amplify and has a variety of 301 redirects which are handled with JSON. Below is an example:
[
  {
    "source": "https://www.example.com/page.html",
    "target": "https://www.example.com/page",
    "status": "301",
    "condition": null
  }
]
So, as a result, my page always shows /page instead of /page.html on the client side, as expected. I read a lot about canonical URLs today and learned:
For the quickest effect, use 3xx HTTP (also known as server-side) redirects.
Suppose your page can be reached in multiple ways:
https://example.com/home
https://home.example.com
https://www.example.com
Pick one of those URLs as your canonical URL, and use redirects to send traffic from the other URLs to your preferred URL.
From: How to specify a canonical with rel="canonical" and other methods | Google Search Central | Documentation | Google Developers
Which is what I did with the JSON in AWS. I also found that using <link rel="canonical" href="desired page"> in the <head> of my HTML is the best practice for telling Google (Analytics, etc.) which page is the desired canonical, so I have since updated all my pages with it.
Now the main problem is that whenever you hover over a link or copy the link address, it includes the .html extension on the client side. As soon as this link is pasted and entered, the server redirects to the version without the .html extension. My question is: what is the best practice for excluding the extension, so that the target address is displayed when copying the link address or hovering over it (the href shown in the bottom left, Chrome 110.0.5481.77 on macOS)?
I've seen sites using absolute paths that include the full domain. This isn't a problem in itself; however, most of the development of the site is done on localhost. Doing this would make that a hassle, as I would have to type in the full local path each time, including the .html extension, to get an accurate representation locally. Is there a standard way to do this, and which way is correct?
*Most of this is new information to me, so if something I'm saying is invalid, please correct me.
I have a GitHub Pages Jekyll blog at blog.Antrikshy.com. I have been meaning to move it to code.Antrikshy.com for a while now. I made the new address a CNAME for antrikshy.github.io on Amazon and wired everything up correctly to make it work. Now my blog.Antrikshy.com URL is broken. How can I set it to redirect to the new subdomain?
I'm new to this. Comment if you want any more information.
Preferably I'd like to do a 301 redirect and also retain the entire path, but that's not very important. I just want it to work, even if it means that users are redirected to the new home page.
You could accomplish this with an S3 website redirect [1]:
Create a new S3 bucket named blog.antrikshy.com.
Enable S3 static website hosting on it.
Create a DNS alias pointing blog.antrikshy.com at that bucket's website endpoint.
Enable redirects in the bucket's website configuration.
You can create redirects per page as well, by creating a key for each page, or redirect everything to your homepage.
[1] http://aws.amazon.com/about-aws/whats-new/2012/10/04/web-page-redirects-on-amazon-s3-hosted-websites/
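If you prefer to script it, redirecting the whole hostname (and keeping the path) looks roughly like this with boto3 - a sketch, assuming the bucket already exists and is named after the old hostname:

import boto3

s3 = boto3.client("s3")

# Redirect every request for the old hostname to the new one,
# preserving the request path. Hostnames are taken from the question;
# adjust as needed.
s3.put_bucket_website(
    Bucket="blog.antrikshy.com",
    WebsiteConfiguration={
        "RedirectAllRequestsTo": {
            "HostName": "code.antrikshy.com",
            "Protocol": "https",
        }
    },
)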
A domain name that I do not own is redirecting to my domain. I don't know who owns it or why it is redirecting to my domain.
This domain, however, is showing up in Google's search results. When doing a whois, it also returns this message:
"Domain:http://[baddomain].com webserver returns 307 Temporary Redirect"
Since I do not own this domain, I cannot set a 301 redirect or disable it. When clicking the bad domain in Google, it shows the content of my website, but baddomain.com stays visible in the URL bar.
My question is: How can I stop Google from indexing and showing this bad domain in the search results and only show my website instead?
Thanks.
Some thoughts:
You cannot directly stop Google from indexing other sites, but what you could do is add the canonical tag to your pages so Google can see that the original content is located on your domain and not on the "bad domain".
For example, check out: https://support.google.com/webmasters/answer/139394?hl=en
Other actions can be taken SEO-wise if the 'baddomain' is outscoring you in the search rankings, because then it sounds like your site could use some optimizing.
The better your site and domain rank in the SERPs, the less likely it is that people will see the scraped content and 'baddomain'.
You could, however, also look at the referrer of the request, and if it is the 'baddomain' you should be able to redirect to your own domain, change the content, etc., because the code is being run on your own server.
But that might be more trouble than it's worth, as you'd need to investigate how the 'baddomain' is doing things and code accordingly. (Probably an iframe or similar from what you describe, but that can still be circumvented using scripts.)
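If you do go down that road, the referrer check could look something like this - a sketch assuming your site runs a Python/Flask app and that the framed requests arrive with the bad domain in the Referer header (both are assumptions, not facts about your setup):

from flask import Flask, redirect, request

app = Flask(__name__)

@app.before_request
def bounce_bad_referrer():
    # "baddomain.com" is a placeholder for the offending domain.
    referrer = request.headers.get("Referer", "")
    if "baddomain.com" in referrer:
        # Send the visitor to your own domain with a permanent redirect.
        return redirect("https://www.example.com" + request.path, code=301)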
Depending on what country you and the 'baddomain' owner are located in, there are also legal actions, so-called DMCA complaints. This, however, can also be quite a task, and well - it's often not worth it, because a new domain will just pop up.
I'm looking to move an existing website to Google Cloud Storage. However, that existing website has changed its URL structure a few times in the past. These changes are currently handled by Apache: for example, the URL /days/000233.html redirects to /days/new-post-name and /days/new-post-name redirects to /days/2002/01/01/new-post-name. Similarly, /index.rss redirects to /feed.xml, and so on.
Is there a way of marking an object in GCS so that it acts as a "symlink" to another GCS object in the same bucket? That is, when I add website configuration to a bucket, requesting an object (ideally) generates a 301 redirect header to a different object, or (less ideally) serves the content of the other object as its own?
I don't want to simply duplicate the object at each URL, because that would triple my storage space. I also can't use meta refresh tags inside the object content, because some of the redirected objects are not HTML documents (they are images, or RSS feeds). For similar reasons, I can't handle this inside the NotFound 404.html with JavaScript.
Unfortunately, symlink functionality is currently not supported by Google Cloud Storage. It's a good idea, though, and worth considering as a future feature.
We need a CDN (like Rackspace Files or AWS) that allows us to set up 301 redirects to old files.
E.g. if we decide to delete http://cdn.example.com/mportant-case-study.pdf, we'll want to redirect that old asset to our case studies page, http://www.example.com/case-studies/. Or maybe we noticed there was a typo in the original file name after we had already shared it in an email and via Twitter; then we would redirect it to http://cdn.example.com/important-case-study.pdf (notice the "i" isn't forgotten this time).
So the question, in case it wasn't already clear: What CDNs offer this?
Do you want the CDN to get the redirects from your origin servers or not?
Akamai can cache redirects from your origin. Alternately you can modify your Akamai configuration to generate the redirects at the edge.
If you have a lot of redirects, it's probably easier to generate them at your origin; otherwise you end up modifying the Akamai config every time you need to add/change/delete redirects, which takes time.
Here's an example where the redirect is generated at the origin but cached by the edge server (I've inserted a header called "FakeDate" from the origin so I can prove the redirect is cached):
http://cdn1.lapthorn.com/testing/redirect-cached.php
Or here's a redirect generated by the edge server which will never come back to the origin:
http://cdn1.lapthorn.com/microsoft-test-redirect
You can go direct to the origin at origin-www.lapthorn.com to see what headers I'm sending.
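For illustration, an origin-generated redirect that the edge is allowed to cache can be as simple as this - a Python/Flask sketch, not my actual PHP origin; the paths and target are the ones from the question above:

from flask import Flask, redirect

app = Flask(__name__)

@app.route("/mportant-case-study.pdf")
def old_typo_path():
    # Permanent redirect to the corrected file name.
    response = redirect(
        "http://cdn.example.com/important-case-study.pdf", code=301
    )
    # The Cache-Control header is what lets the CDN edge cache the 301.
    response.headers["Cache-Control"] = "public, max-age=86400"
    return response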