Redirect S3 subfolder to another domain with Cloudfront - redirect

I have a static showcase website hosted on S3 and using CloudFront, and an online shop (Prestashop) and a blog (Wordpress), both hosted on OVH servers.
I want to make a hidden redirection on two subfolders of my static website so it acts like my 3 websites are on the same host, using the following pattern :
mysite.com/ --> normal behaviour
mysite.com/blog/ --> myblog.com/
mysite.com/store/ --> mystore.com/
Of course, I need every request to be handled that way, eventually having something like that :
mysite.com/store/fr/1-myproduct.html
returns what
mystore.com/fr/1-myproduct.html
would have returned.
This seems really tricky, since I've found no real solution to my problem, and at this point I doubt it may even be possible to do such a thing.
I considered using a proxy but wouldn't that be like using a sledgehammer to get rid of a fly ?
I have searched for any possible redirection and I was only able to find subdomain/domain redirections...
So my question would be "How can I do that ?"
But right now I'm wondering "Can one do that ?"
P.S : It's my first post ever. I usually search for a long time before posting and I always end up finding a solution, except this time. Any suggestion is welcome.

I'll check about proxies since it's my last hope
Wait.
I have a static showcase website hosted on S3 and using CloudFront
CloudFront is a reverse proxy.
Depending on how much flexibility you have with the other two sites, CloudFront can potentially take you where you want to go, combining multiple independent sites under one hostname.
This is done by creating additional origin servers for your distribution and then creating additional cache behaviors, with path patterns matching the additional paths, such as /blog and /blog/*, that send requests to the alternate origins.
There is, however, a catch. CloudFront can't remove the matched pattern, so mainsite.example.com/blog/hello-world, matching the pattern /blog/* will be forwarded to blog.example.com/blog/hello-world -- not to blog.example.com/hello-world.¹ This will require changes to the other sites in order to integrate them in this way.
Unless...
If you already have unique path patterns, no problem, but if the extra sites' content is in the root of each individual site, you see the issue here. Not insurmountable, but still an issue.
Your only alternative will be a reverse proxy behind CloudFront to rewrite those paths and send the requests on to the alternate servers. Truly not a big deal either, since HAProxy, Nginx, and Varnish all offer such functionality and can handle a large number of proxied requests on surprisingly small hardware.
The recently (2017) released Lambda@Edge service allows you to rewrite paths on the fly, as requests are processed, if necessary.
But the bottom line is that the reason you have not found a real solution other than a proxy is that there is no alternative -- every path at a given hostname must be handled in one logical place -- one group of one or more identically-configured endpoints. In the case of CloudFront, the logical place is physically distributed globally.
¹ CloudFront, natively, can actually prepend onto the path before forwarding the request, so requests for mainsite.example.com/bar/fizz can be forwarded to foosite.example.com/foo/bar/fizz by setting the origin path to /foo when you configure the origin. But it can't remove path parts or otherwise modify the path without also using Lambda@Edge. In the scenario discussed above, you would leave the origin path blank when configuring the additional origin servers.

Single S3 bucket with the following behavior :
domain.com-> serves the files from root of bucket
domain.com/blog -> serves the files from subfolder in S3 bucket (this is not default behavior)
How to :
https://aws.amazon.com/ru/blogs/compute/implementing-default-directory-indexes-in-amazon-s3-backed-amazon-cloudfront-origins-using-lambdaedge/
Lambda edge code:
'use strict';
exports.handler = (event, context, callback) => {
    // Extract the request from the CloudFront event that is sent to Lambda@Edge
    var request = event.Records[0].cf.request;
    // Extract the URI from the request
    var olduri = request.uri;
    // Match any '/' that occurs at the end of a URI. Replace it with a default index
    var newuri = olduri.replace(/\/$/, '\/index.html');
    // Log the URI as received by CloudFront and the new URI to be used to fetch from origin
    console.log("Old URI: " + olduri);
    console.log("New URI: " + newuri);
    // Replace the received URI with the URI that includes the index page
    request.uri = newuri;
    // Return to CloudFront
    return callback(null, request);
};
Summary of the code above:
Lambda@Edge rewrites the path "/blog/" to "/blog/index.html".

Related

REST API allow update of resource depending on state of resource

I have recently read the guide on implementing RESTful API's in Spring Boot from the official Spring.io tutorials website (link to tutorial: https://spring.io/guides/tutorials/rest/)
However, something in the guide seemed to contradict my understanding of how REST API's should be built. I am now wondering if my understanding is wrong or if the guide is not of as high a quality as I expected it to be.
My problem is with this implementation of a PUT method to update the status of an order:
@PutMapping("/orders/{id}/complete")
ResponseEntity<?> complete(@PathVariable Long id) {
    Order order = orderRepository.findById(id) //
            .orElseThrow(() -> new OrderNotFoundException(id));
    if (order.getStatus() == Status.IN_PROGRESS) {
        order.setStatus(Status.COMPLETED);
        return ResponseEntity.ok(assembler.toModel(orderRepository.save(order)));
    }
    return ResponseEntity //
            .status(HttpStatus.METHOD_NOT_ALLOWED) //
            .header(HttpHeaders.CONTENT_TYPE, MediaTypes.HTTP_PROBLEM_DETAILS_JSON_VALUE) //
            .body(Problem.create() //
                    .withTitle("Method not allowed") //
                    .withDetail("You can't complete an order that is in the " + order.getStatus() + " status"));
}
From what I read at https://restfulapi.net/rest-put-vs-post/ a PUT method should be idempotent; meaning that you should be able to call it multiple times in a row without it causing problems. However, in this implementation only the first PUT request would have an effect and all further PUT requests to the same resource would result in an error message.
Is this okay according to RESTful API's? If not, what would be a better method to use? I don't think POST would be any better.
Also, in the same guide, they use the DELETE method in a similar way to change the status of an order to cancelled:
@DeleteMapping("/orders/{id}/cancel")
ResponseEntity<?> cancel(@PathVariable Long id) {
    Order order = orderRepository.findById(id) //
            .orElseThrow(() -> new OrderNotFoundException(id));
    if (order.getStatus() == Status.IN_PROGRESS) {
        order.setStatus(Status.CANCELLED);
        return ResponseEntity.ok(assembler.toModel(orderRepository.save(order)));
    }
    return ResponseEntity //
            .status(HttpStatus.METHOD_NOT_ALLOWED) //
            .header(HttpHeaders.CONTENT_TYPE, MediaTypes.HTTP_PROBLEM_DETAILS_JSON_VALUE) //
            .body(Problem.create() //
                    .withTitle("Method not allowed") //
                    .withDetail("You can't cancel an order that is in the " + order.getStatus() + " status"));
}
This looks very wrong to me. We are not deleting anything here, it is basically the same as the previous PUT method just with a different state we want to move to. Am I correct to assume that this part of the tutorial is bogus?
TL;DR: what HTTP method is right to use when you want to advance the status of a resource to the next stage without giving an option of going back to an earlier stage? Basically an update/patch that will invalidate its own pre-conditions.
something in the guide seemed to contradict my understanding of how REST API's should be built. I am now wondering if my understanding is wrong or if the guide is not of as high a quality as I expected it to be.
I wouldn't consider this guide to be a reliable authority - the described resource model has some very questionable choices.
From what I read at https://restfulapi.net/rest-put-vs-post/ a PUT method should be idempotent; meaning that you should be able to call it multiple times in a row without it causing problems. However, in this implementation only the first PUT request would have an effect and all further PUT requests to the same resource would result in an error message.
The authoritative definition of idempotent semantics in HTTP is currently RFC 7231.
A request method is considered "idempotent" if the intended effect on the server of multiple identical requests with that method is the same as the effect for a single such request.
Note: "effect", not "response".
PUT /orders/12345/complete
means "please replace the current representation of /orders/12345/complete with the representation in the payload". In other words "save this file on top of your current copy". Saving the same file two or three times in a row produces the same effect as saving the file once, so that's "idempotent".
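To make the "effect, not response" distinction concrete, here is a minimal sketch of the tutorial's handler logic against an in-memory store. The orders map and the completeOrder function are stand-ins invented for illustration, not part of the Spring guide.

```javascript
'use strict';

// Hypothetical in-memory store standing in for the order repository.
const orders = new Map([[4, { id: 4, status: 'IN_PROGRESS' }]]);

// Mirrors the logic of the tutorial's PUT /orders/{id}/complete handler.
function completeOrder(id) {
  const order = orders.get(id);
  if (!order) {
    return { status: 404 };
  }
  if (order.status === 'IN_PROGRESS') {
    order.status = 'COMPLETED';
    return { status: 200, body: { ...order } };
  }
  return {
    status: 405,
    detail: `You can't complete an order that is in the ${order.status} status`,
  };
}

// Send the same request twice and record the server state after each call.
const first = completeOrder(4);
const stateAfterFirst = orders.get(4).status;
const second = completeOrder(4);
const stateAfterSecond = orders.get(4).status;

console.log(first.status, second.status);        // the responses differ
console.log(stateAfterFirst, stateAfterSecond);  // the server state does not
```

The two responses are different, but the order ends up in the same state whether the request is sent once or twice, which is all that idempotency asks for.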
HTTP does not define exactly how a PUT method affects the state of an origin server beyond what can be expressed by the intent of the user agent request and the semantics of the origin server response. It does not define what a resource might be, in any sense of that word, beyond the interface provided via HTTP. It does not define how resource state is "stored", nor how such storage might change as a result of a change in resource state, nor how the origin server translates resource state into representations. Generally speaking, all implementation details behind the resource interface are intentionally hidden by the server. -- RFC 7231
So in their CURL example
PUT /orders/4/complete HTTP/1.1
Host: localhost:8080
User-Agent: curl/7.54.0
Accept: */*
The meaning of this message is "replace the current representation of /orders/4/complete with an empty representation". But the origin server gets to choose how to do that, and which standardized responses to return to the client.
So this is fine.
All work is transacted by politely placing documents in in-trays, and then some side effect of placing that document in an in-tray causes some business activity to occur -- Jim Webber, 2011.
In this case, the document we are putting into the "in-tray" happens to be blank.
@DeleteMapping("/orders/{id}/cancel")
I would never approve that choice in a code review. DELETE (like PUT) has semantics in the "transfer of documents over a network domain".
The DELETE method requests that the origin server remove the association between the target resource and its current functionality. In effect, this method is similar to the rm command in UNIX: it expresses a deletion operation on the URI mapping of the origin server rather than an expectation that the previously associated information be deleted.
Trying to hijack the method because the spelling is kind of like the domain action is the wrong heuristic to use in choosing methods.
Relatively few resources allow the DELETE method -- its primary use is for remote authoring environments, where the user has some direction regarding its effect.
The point being that we have a general purpose document manipulation interface, and we are using that interface as a facade that allows us to drive business activity. So we should be using our standardized message semantics the same way every other page on the web does.
@PutMapping would be defensible, using the same justification as we did for /complete.
what HTTP method is right to use when you want to advance the status of a resource to the next stage without giving an option of going back to an earlier stage? Basically an update/patch that will invalidate its own pre-conditions.
PUT, PATCH, and POST are all appropriate methods to use when editing the representation of a resource. Use PUT or PATCH when you are sending a replacement representation for the resource, use POST when you are asking the server to calculate what the edit to the representation should be.

What is the "Restful" way to command a server?

I have a REST endpoint to create an application configuration as such
POST /applications
With a body
{
"appName" : "my-new-app"
}
It returns a newly created application configuration:
{
"appName": "my-new-app",
"appId": "2ed17ff700664dad9bb32e400d39dc68",
"apiKey": "$2a$10$XVDH9F3Ix4lx2LdxeJ4ZOe7H.bw/Me5qAmaIGF.95lUgkerfTG7NW",
"masterKey": "$2a$10$XVDH9F3Ix4lx2LdxeJ4ZOeSZLR1hVSXk2We/DqQahyOFFY6nOfbHS",
"dateCreated": "2021-03-28T11:00:07.340+00:00",
"dateUpdated": "2021-03-28T11:00:07.340+00:00"
}
Note: The keys are auto-generated in the server and not passed from the client.
My question here is, what's the RESTful way to command the server to reset the keys for example:
PUT /applications/my-new-app/update_keys is not noun-based and thus, not restful, also passing a command as query parameter does not also seem to be restful since this is not a GET method rather it's a PUT (update) method.
Here's one way to send a command that is as much as possible RESTful:
Endpoint:
POST /application/:appName/actions
Example Payload:
{
  "actions" : [
    {
      "action" : "name_of_command",
      "arguments" : {
        "arg1" : "param1"
      }
    },
    {
      "action" : "reset_keys",
      "arguments" : {
      }
    }
  ]
}
Actions would be nouns that are part of the endpoint, and the server will process actions that are submitted (or posted) within the endpoint. And an array of actions would be best suited to allow multiple actions to be sent. And each action having arguments would also be desirable for future actions that would need arguments.
what's the RESTful way to command the server to reset the keys for example:
How would you do it with a web site?
You would be looking at some web page like /www/applications/my-new-app; within the data or the metadata you would find a link. Following that link would bring you to a form; the form would have input controls describing what fields you need to provide to send the message, in addition to any "hidden" inputs. When you click the submit button, your user agent would collect your inputs, construct from them the appropriate message body, then use the form metadata to determine what request method and uri to use.
The client never has to guess what URI to use, because the server is providing links to guide the way.
Hypertext is at the heart of the uniform interface
REST is defined by four interface constraints: identification of resources; manipulation of resources through representations; self-descriptive messages; and, hypermedia as the engine of application state.
Because the server is providing the URI for each of the links, you've got some freedom to choose which resource "handles" which message.
One interesting way to resolve this is to look at HTTP's rules for cache invalidation. The short version is that successful unsafe requests (PATCH/POST/PUT) invalidate the representations of the target-uri.
In other words, we take advantage of cache-invalidation by sending the command to the resource that we are trying to change.
So, assuming that retrieving the representation of the app occurred via a request like:
GET /applications/my-new-app HTTP/x.y
Then we would ask the server to change that resource by sending a request with that same target-uri. Something analogous to:
POST /applications/my-new-app HTTP/x.y
Content-Type: text/plain
Please rotate the keys
Form submissions on the web are usually a representation of key/value pairs, so a more likely spelling would be:
POST /applications/my-new-app HTTP/x.y
Content-Type: application/x-www-form-urlencoded
action=Please%20rotate%20the%20keys
Your form that describes this request may have an "action" input control that accepts text from the client, or more likely in this case action would be a hidden control with a pre-defined value.
Note: if we have multiple actions that should invalidate the /applications/my-new-app representations, we would probably use POST for all of them, and resolve the ambiguity at the server based on the request body (if our routing framework gives us the degree of control we need, we can use that, but more common would be to have a single POST handler for each Content-Type, and parse the request body "by hand").
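A rough sketch of what parsing the body "by hand" in a single POST handler might look like. Everything here is illustrative: handlePost, the actions table, and the way new keys are generated are assumptions, not something prescribed by HTTP or by any particular framework.

```javascript
'use strict';
const { randomBytes } = require('crypto');

// Hypothetical action table, keyed by the value of the form's "action" field.
const actions = new Map([
  ['Please rotate the keys', (app) => ({
    ...app,
    apiKey: randomBytes(16).toString('hex'),
    masterKey: randomBytes(16).toString('hex'),
  })],
]);

// Single handler for POST /applications/{appName}:
// check the Content-Type, parse the form body, then dispatch on "action".
function handlePost(contentType, body, app) {
  if (contentType !== 'application/x-www-form-urlencoded') {
    return { status: 415 }; // unsupported media type
  }
  const action = new URLSearchParams(body).get('action');
  const handler = actions.get(action);
  if (!handler) {
    return { status: 400 }; // missing or unknown action
  }
  return { status: 200, app: handler(app) };
}
```

Because every action is sent to the same target URI as the GET, a successful response lets general-purpose caches invalidate their stored copy of the application representation.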
POST serves many useful purposes in HTTP, including the general purpose of “this action isn’t worth standardizing.” -- Fielding 2009
PUT /applications/my-new-app/update_keys is not noun-based and thus, not restful,
That's not true: REST doesn't care what spelling conventions you use for your resource identifiers. For example
https://www.merriam-webster.com/dictionary/get
https://www.merriam-webster.com/dictionary/post
https://www.merriam-webster.com/dictionary/put
https://www.merriam-webster.com/dictionary/update
These all work fine, just like every other resource on the web.
You absolutely can design your resource model so that editing the update_keys document also modifies the my-new-app document.
The potential difficulty is that general purpose components are not going to know what is going on. HTTP PUT means "update the representation of the target resource", and every general purpose component knows that; the origin server is allowed to modify other resources as a consequence of the changes to the "update-keys" resource.
But we don't have a great language for communicating to the general purpose components all of the side effects that may have happened. Without some special magic, previously cached copies of my-new-app, with the original, unrotated, keys, will be left lying around. So the client may be left with a stale copy of the document that describes the app.
(An example of "some special magic" would be Linked Cache Invalidation, which affords describing caching relationships between resources using web linking. Unfortunately, LCI has not been adopted as a standard, and you won't find the described link relations in the IANA registry.)

How can I use an Azure Front Door Rules Engine match condition to only match requests to the root of a site?

I'm trying to set up a set of rules on my Azure Front door to redirect all requests to the root of a site to a set of language based subfolders based on the location match of the incoming request.
Doing the Geo-location part is fairly straightforward, but I'm not having much success limiting the requests to only the root of the site - or at least when I try to do so, my rules don't appear to match and I don't get the redirect I'm expecting.
I've tried setting the following conditions:
IF "Request Path" EQUAL "/"
AND IF "Remote address" "Geo Match" "Switzerland, CH"
THEN "Routing Configuration" "Redirect" "307"
Host: Preserve;
Destination Path: Replace: "/de-ch/"
However I don't appear to be getting the redirect when requesting the root of the site from a browser based in Switzerland.
I can't find any actual examples for using the Rules Engine with either Path or matching, so I'm wondering if I should be using "Request URL" (and therefore I'll need to put the scheme and host in there, which is less than ideal as ruleset may be working with multiple front end hosts), or should what I'm doing work?
The "Request Path" match condition appears to match on the path after the initial /, for example given a request for:
https://www.example.com/folder/page.html
The following values are used in the match conditions:
Request Path: folder/page.html
Request URL: https://www.example.com/folder/page.html
Request File Extension: html
Request Filename: page.html
I therefore had to use the Request URL condition and limit my rules to the specific domain in the request to ensure that we were only matching the root requests.
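As a rough illustration only, the observed values could be reproduced like this. The matchValues function is a made-up model of the behaviour described above, not Azure's actual implementation, and the property names are invented.

```javascript
'use strict';

// Model of the match-condition values as observed in this answer (an assumption,
// not documented Azure behaviour): the Request Path has no leading slash.
function matchValues(requestUrl) {
  const { pathname } = new URL(requestUrl);
  const requestPath = pathname.replace(/^\//, '');
  const requestFilename = requestPath.split('/').pop();
  const dot = requestFilename.lastIndexOf('.');
  return {
    requestPath,
    requestUrl,
    requestFilename,
    requestFileExtension: dot >= 0 ? requestFilename.slice(dot + 1) : '',
  };
}

console.log(matchValues('https://www.example.com/folder/page.html'));
console.log(matchValues('https://www.example.com/'));
```

Under that model, a request for the site root yields an empty Request Path rather than "/", which would be consistent with the EQUAL "/" condition never matching.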
I have not tried specifying an operator of Not Any yet, although that could also be a solution (we needed more than 25 rules, which is a further limitation, so ended up using a different solution).
Zhaph said they have not tried the Not Any operator at the time of writing.
I've just used it and I can confirm Not Any works for matching just the root of the domain/subdomain. Definitely takes the hassle out of creating multiple match conditions on Request URL.

Rest API designing PUT vs PATCH

I am developing 2 REST APIs which edit and pause something at my backend.
For editing I was using:
PUT /video/1
What is the best way to develop a pause video service? Should I use PATCH or PUT for this? Input would be just the id. If I use PUT then how can I differentiate between edit and pause? And if I have another API to be developed, e.g. video restart, how can I accommodate these verbs in a REST API?
Distinguishing the state using the HTTP method only is a poor idea. What you can do is:
Introduce state, and then use PATCH to change the state:
PATCH /videos/1
{
"state": "PLAYING|PAUSED|STOPPED" // what you need here
}
Mind that you don't patch like an idiot, although it is common to do so.
Introduce new endpoints that will reflect the operation invoked on the resource - this is not fully RESTful, but it is also common:
POST /videos/1/play/
POST /videos/1/stop/
POST /videos/1/pause/
PUT for editing is ok of course, however remember that PUT is idempotent and requires the resource to be sent.
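A minimal sketch of the first option, assuming the server keeps a state field on the video. The patchVideo function and the choice of 400 for an invalid state are illustrative assumptions.

```javascript
'use strict';

// The states named in the PATCH example above.
const ALLOWED_STATES = new Set(['PLAYING', 'PAUSED', 'STOPPED']);

// Apply a PATCH /videos/{id} body of the form { "state": "..." }.
// Returns a new video object and leaves the input untouched.
function patchVideo(video, patch) {
  if (!ALLOWED_STATES.has(patch.state)) {
    return { status: 400, video };
  }
  return { status: 200, video: { ...video, state: patch.state } };
}

const paused = patchVideo({ id: 1, state: 'PLAYING' }, { state: 'PAUSED' });
console.log(paused.status, paused.video.state);
```

Pause, play and stop all go through the same endpoint here; only the requested state in the body differs.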
I do not agree with @Opal's answer here, hence I post this answer. I do feel you are using the wrong tools (or terms) to achieve what you want. REST is more than just an HTTP invocation via a cleanly designed URI. As proposed by @Opal in a comment on his answer, WebSockets might be what you are looking for, though REST may be able to serve your needs as well (as plain HTTP would, too).
Pausing a video
It should not be the task of the HTTP server to stop the video, but of the client. Usually partial GET requests are sent to the server, each retrieving only a portion of the resource and adding it to a buffer which the client reads. In the background, the client side will issue further partial requests to keep the buffer filled while the client is reading it. If the client wants to pause, it simply stops reading the buffer and optionally stops sending further partial GET requests to the server.
This allows spreading the actual video across multiple servers and letting the client talk to any of these and still get the correct responses. If the server has to maintain the client state, you need to ensure that the state is also replicated to all the other serving nodes. Sure, this is possible, but it comes with higher overhead!
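As a small sketch of the client side of this, the next partial GET could be derived purely from what the client has already buffered. The nextRangeHeader function and its parameter names are made up for illustration; only the bytes=start-end form of the Range header is standard HTTP.

```javascript
'use strict';

// Compute the Range header value for the next partial GET, or null when
// the whole video has been fetched. Byte positions in Range are inclusive.
function nextRangeHeader(bytesBuffered, chunkSize, totalBytes) {
  if (bytesBuffered >= totalBytes) {
    return null;
  }
  const lastByte = Math.min(bytesBuffered + chunkSize, totalBytes) - 1;
  return `bytes=${bytesBuffered}-${lastByte}`;
}

console.log(nextRangeHeader(0, 1000, 2500));    // bytes=0-999
console.log(nextRangeHeader(2000, 1000, 2500)); // bytes=2000-2499
console.log(nextRangeHeader(2500, 1000, 2500)); // null
```

Pausing then needs no server call at all: the client just stops asking for the next range, and any server holding the file can answer when it resumes.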
Updating videos
As you are obviously creating a video-editing system, you have two options here, as also suggested by the PUT definition:
Partial content updates are possible by targeting a separately identified resource with state that overlaps a portion of the larger resource, or by using a different method that has been specifically defined for partial updates (for example, the PATCH method defined in RFC5789).
Separate the resource into smaller resources
Use another method like PATCH
As already pointed out by @Opal in his answer, when you use PATCH to partially update a resource you should not only provide the new content within the body but also instruct the server what it should do with it.
The separation into smaller resources however does feel more natural to me for a video-editing system though. A video can be seen as a sequence of scenes which consist of numerous pictures and maybe an attached soundfile.
A movie therefore could be represented like this in pseudo Json-HAL:
Movie : {
    title: The Matrix,
    release_year: 1999,
    actors: [Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, Joe Pantoliano],
    ...
    link: {
        self: http://...,
        ...
    },
    embedded: {
        Scenes : [
            {
                description: Trinity chased by police,
                links: [
                    self: http://...,
                    video: http://.../scene01.vid
                ]
            },
            {
                description: Thomas Anderson get notified to follow the white rabbit,
                start_offset: 5091,
                end_offset: 193920,
                links: [
                    self: http://...,
                    video: http://.../scene02.vid
                ]
            },
            ...
        ]
    }
}
Instead of having all the bytes in one file you could maintain each scene separately. The movie representation combines the scenes to a full movie if played from scene 1 to scene n.
If one scene is now edited and the whole scene file should be replaced, a simple PUT request is enough. If you want to trim the first or last few seconds off the video, you could introduce a start and end offset for the respective scene and, instead of re-uploading the full scene, tell the client that it should start at the suggested offset or stop at the suggested position.
The client can use these parameters in the partial GET request to retrieve only the necessary bytes. These fields should then of course be modified via a PATCH request in order to prevent altering the video bytes or its URI. In order for a client to learn the total bytes of a video, it can first issue a HEAD request to the URI and use the content length returned in the response.
This, of course, screams for its own media type, but this is what REST is actually all about. I don't know why so many misuse the REST term for plain URI design or think that a neat URI API is more RESTful when REST doesn't actually care much about the URI layout.

Is it possible to use wildcards or catch-all paths in AWS API Gateway

I am trying to redirect all traffic for one domain to another. Rather than running a server specifically for this job I was trying to use AWS API Gateway with lambda to perform the redirect.
I have this working ok for the root path "/" but any requests for sub-paths e.g. /a are not handled. Is there a way to define a "catch all" resource or wildcard path handler?
As of last week, API Gateway now supports what they call “Catch-all Path Variables”.
Full details and a walk-through here: API Gateway Update – New Features Simplify API Development
You can create a resource with path like /{thepath+}. Plus sign is important.
Then in your Lambda function you can access the value with either
event.path - always contains the full path
or event.pathParameters.thepath - contains the part defined by you. Another possible use case: define a resource like /images/{imagepath+} to only match paths with a certain prefix. The variable will contain only the subpath.
You can debug all the values passed to your function with: JSON.stringify(event)
Full documentation
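For the redirect use case in the question, a handler behind the /{thepath+} resource might look roughly like this. It is a sketch that assumes Lambda proxy integration (so that event.path is populated and the returned object is used as the HTTP response); TARGET_ORIGIN and buildRedirect are hypothetical names, and the target domain is a placeholder.

```javascript
'use strict';

// Assumption: the domain all traffic should be redirected to.
const TARGET_ORIGIN = 'https://new-domain.example';

// Build a permanent redirect that preserves the requested path.
function buildRedirect(event) {
  const path = event.path || '/';
  return {
    statusCode: 301,
    headers: { Location: TARGET_ORIGIN + path },
    body: '',
  };
}

// Lambda entry point, used for both the root method and the catch-all resource.
exports.handler = async (event) => buildRedirect(event);
```

The same function can be attached to the existing root "/" method and to the catch-all resource, so every path ends up at the equivalent location on the new domain.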
Update: As of last week, API Gateway now supports what they call “Catch-all Path Variables”. See API Gateway Update – New Features Simplify API Development.
You will need to create a resource for each level unfortunately. The reason for this is API Gateway allows you to access those params via an object.
For example: method.request.path.XXXX
So if you did just /{param} you could access that with: method.request.path.param but if you had a nested path (params with slashes), it wouldn't work. You'd also get a 404 for the entire request.
If method.request.path.param was an array instead... then it could get params by position when not named. For example method.request.path.param[] ... Named params could even be handled under there, but accessing them wouldn't really be easy. It would require using some sort of JSON path mapping (think like what you can do with their mapping templates). Sadly this is not how it's handled in API Gateway.
I think it's ok though because this might make configuring API Gateway even more complex. However, it does also limit API Gateway and to handle this situation you will ultimately end up with a more confusing configuration anyway.
So, you can go the long way here. Create the same method for multiple resources and do something like: /{1}/{2}/{3}/{4}/{5}/{6}/{7} and so on. Then you can handle each path parameter level if need be.
If the number of parameters is always the same, then you're a bit luckier and only need to set up a bunch of resources, but one method at the end.
source: https://forums.aws.amazon.com/thread.jspa?messageID=689700&#689700
In the HTTP API that AWS introduced recently, $default is used as a wildcard for catching all routes that don't match a defined pattern.
For more details, refer to: aws blogs
You can create a resource with path variable /{param}, and you can treat this as wildcard path handler.
Thanks,
- Ka Hou