Varnish 3.0.3 req.hash_always_miss vs Vary - hash

I'm trying to build a system that can purge and regenerate URLs as required for a particular system. I was previously having issues with purging when the system located the object by hash but missed the variant, because I didn't have a "purge;" in my vcl_miss (only in my vcl_hit; some guides/example VCL files do not mention this need, but the main documentation does).
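For reference, the purge arrangement described above looks roughly like this in Varnish 3 (a sketch following the documented purging pattern; the purgers ACL is an assumption):

acl purgers { "127.0.0.1"; }

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purgers) { error 405 "Not allowed."; }
        return (lookup);
    }
}
sub vcl_hit {
    if (req.request == "PURGE") { purge; error 200 "Purged."; }
}
sub vcl_miss {
    # Without this, a PURGE that finds the hash but misses the variant does nothing.
    if (req.request == "PURGE") { purge; error 200 "Purged."; }
}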
What I'm trying to figure out is whether I need to do something similar for a REGEN call. From my understanding, "set req.hash_always_miss = true;" means the lookup deliberately misses the old object and a new one is generated and inserted. Subsequent calls will find the new object, but may still miss if there is not an appropriate variant in the cache.
Could someone confirm for me whether a subsequent request missing the variant in the new object will lead directly to a cache miss and fetch, rather than finding any of the variants from the previous object?

hash_always_miss will only influence the current/ongoing request and the cache contents that it replaces. A fetch will always happen, and the object will be put into the cache using the same rules as any other miss/fetch sequence.
The "old" other variants of the same hash are still valid objects and will be served to a client indicating request headers matching the varied headers.
hash_always_miss will replace the current variant, and nothing else.
To answer your question, the second part of your sentence is most correct.
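A minimal vcl_recv sketch of the REGEN behavior discussed here (the REGEN method name and the purgers ACL are assumptions, not part of the thread):

sub vcl_recv {
    if (req.request == "REGEN") {
        if (!client.ip ~ purgers) { error 405 "Not allowed."; }
        # Deliberately miss the lookup: a fresh copy is fetched and inserted,
        # replacing only the variant matching this request's varied headers.
        set req.hash_always_miss = true;
        set req.request = "GET";
    }
}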

Related

Read YAML config through REST API

I have a really complicated system which uses multiple languages and frameworks (Java, Python, Scala, Bash). In each module I need to retrieve configuration values which are similar and change frequently. Currently I'm maintaining multiple conf files which hold lots of duplicates.
I wonder if there is an out-of-the-box REST API which can retrieve variables on demand from a remote location.
All I have managed to find so far are ways to load the entire file from a remote source, which is only half a solution for me:
YAML.parse(open('https://link_to_file/file.yaml'))
My goal, which I have failed to find a lead on, is to make a direct call:
MyRemoteAPI.get("level1.level2.x")
P.S. YAML is not a mandatory solution for me; I'm open to suggestions.
I don't know about an out-of-the-box API, but it's fairly trivial to build. Make a service that will read the YAML file and traverse to the appropriate key. e.g. using a dynamic language like Ruby (+Rails), you could do something like
def value
  config = YAML.load_file '/local/path/to/config.yaml'
  render plain: config.dig(*params[:key].split('.'))
end
dig essentially traverses a structure and safely returns nil if a key isn't found, so this returns the value at the "leaf" of the requested path.
You might also want to cache the structure in memory to prevent constantly reading from the file, e.g. something like @@config ||= YAML.parse(open('https://link_to_file/file.yaml')) or config = Rails.cache.fetch('config', expires_in: 1.hour) { ... }. And/or cache the API's HTTP response.
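Putting those pieces together, a minimal sketch (the controller name, route, and file path are illustrative):

class ConfigController < ApplicationController
  def value
    render plain: config.dig(*params[:key].split('.'))
  end

  private

  # Cache the parsed YAML for an hour so each request doesn't re-read the file.
  def config
    Rails.cache.fetch('config', expires_in: 1.hour) do
      YAML.load_file('/local/path/to/config.yaml')
    end
  end
end

A request like GET /config/value?key=level1.level2.x would then return the value at that path.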

Is there a simple way to determine the 'last' PDF signature and lock it?

I am carefully reading the Digital Signatures white paper and iText in Action, Chapter 12: Protecting your PDF.
I have successfully added multiple signatures in append mode to a source PDF, and I have a client who will add 2, 3, or 4 signatures as a method of approving a source as a change management document.
Question:
Is there a way to treat the 'last' chosen signature as somehow final? We will already be using the field name as the signing person's ID, the Location as the persistent ID of the signing machine, and the Reason as the reason for signing.
This is for internal purposes, so we are OK with using the computer's clock, and at the moment the only method I have come up with is to create all detached signatures as CMS except the last as CAdES, so that if the last signature in the current file is ETSI rather than ADBE, I will not allow more signatures. However, this feels not very elegant, and if the starting PDF has a validated timestamp then this basic methodology will fail. It also relies on text parsing, which also feels a little flimsy.
I have read the section on attaching actions but this seems a huge hammer to crack what should, in theory at least, be a much simpler exercise.
Did you get a chance to read section 2.5.5, "Locking fields and documents after signing"?
In this case, the dictionary defining the signature field has a /Lock entry of which the value is a signature lock dictionary. One of the lock permissions could be LockPermissions.NO_CHANGES_ALLOWED.
The result would then be what you can see in figure 2.31 (locked fields after final approval). In this screenshot, you can see that sig4 locks the document.
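For illustration, a sketch of creating such a field with iText 5 (the field name, placement, and method wrapper are assumptions, not from the book):

import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.*;

PdfFormField createFinalSignatureField(PdfWriter writer) throws Exception {
    // Signing this field will prohibit any further changes to the document.
    PdfSigLockDictionary lock =
        new PdfSigLockDictionary(PdfSigLockDictionary.LockPermissions.NO_CHANGES_ALLOWED);
    PdfFormField field = PdfFormField.createSignature(writer);
    field.setFieldName("sig4");
    field.put(PdfName.LOCK, writer.addToBody(lock).getIndirectReference());
    field.setWidget(new Rectangle(36, 700, 144, 732), null); // illustrative placement
    field.setFlags(PdfAnnotation.FLAGS_PRINT);
    return field;
}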

RESTful archiving of entities in WebAPI

I've implemented CRUD functionality pretty restfully in my WebAPI project. I'm now trying to implement Archiving of objects (not quite deleting) - if only there were an ARCHIVE HTTP method.
I see two options:
1) Have isArchived as a property of every archive-able entity, which must be included in PUT and POST requests even if archiving isn't relevant to the request. Archiving an entity would be a matter of calling PUT /api/object/id with isArchived set to true. Seems bulky on the wire but restful.
2) Have an RPC-ish url like PUT /api/object/id/archive that doesn't require a body. Seems the most efficient but not restful.
What's everyone doing in the "archive my stuff via an api call" space?
This is an excellent question, but I suspect it may eventually be marked as opinionated because I don't see a correct answer... as the OP also stated.
I would recommend treating the archive as a separate object store (or, even, different object types), if that makes sense for your system. Object design should not depend upon how the DB persists your data.
Thus, this is the most RESTful design I can come up with right now (assuming archiving and updating are always separate actions -- which they should be):
Typical (everybody knows this):
GET /api/object - get all current objects
POST /api/object - new current object
PUT /api/object/id - update current object
DELETE /api/object/id - delete current object
GET /api/object/id - get current object
The weirdness:
POST /api/object/id/archive - move object to archive (makes some REST sense)
POST /api/object/id - move object from archive (muddy)
The archive:
GET /api/object/archive - get all archive objects
PUT /api/object/id/archive - update archive object (if possible)
DELETE /api/object/id/archive - delete archive object (tempting for unarchive)
GET /api/object/id/archive - get archive object
Or, maybe one of these mods for archive URLs:
GET /api/object/archive/id - get archive object
GET /api/objectarchive/id - get archive object
But......
The above feels pretty muddy (not very self-documenting) for moving objects in and out of the archive. It also leads to some REST API design pain where update/delete/get of an archived object probably don't need archive-specific functions. Thus, I ultimately settled on this:
GET /api/object - get all objects
GET /api/object?archived=false - get all current objects
GET /api/object?archived=true - get all archive objects
POST /api/object - new current object; returns all current objects*
PUT /api/object/id - update object (current or archived; cannot change archive state)
DELETE /api/object/id - delete object (current or archived); returns objects of same archive state as deleted*
GET /api/object/id - get object (current or archived)*
PUT /api/object/id/archive, body: {archived: true} - move object to archive; returns all current objects*
PUT /api/object/id/archive, body: {archived: false} - move object from archive; returns all archive objects*
* Return could be expanded/overridden with a query string if design calls for it.
Admittedly, this is mostly a reversal from my earlier statement of treating the archive as a separate object store. Yet, that thought process is what ultimately led to this compromise in design. This feels good to me on most fronts.
I, personally, don't agree with using the query string for anything but... uh... queries. So, I don't. Payload for data changes -- no matter how small -- should go in the body (when it doesn't fit with a REST verb and URL, that is).
If you always archive a particular resource and never delete it, I would repurpose DELETE to actually archive. If you really need to differentiate between delete and archive, I would either do
GET /foo/33
200 OK
<foo id="33">blah</foo>
POST /archive
<foo id="33">blah</foo>
201 Created
Location: http://example.org/archive/foo/33
or just
POST /archive?target=http://example.org/foo/33
201 Created
Location: http://example.org/archive/foo/33
I'd use the /api/object/id?archive=true approach.
But whether you should use PUT or POST depends. If you use PUT, any subsequent calls to the same URL with the same body should not change anything further about the resource. If you use POST, the caller expects that any subsequent calls to that URL may indeed change the state again. (Don't ask me how, I'm assuming that you will use the PUT verb on this one.)
This is due to the fact that PUT operations should be idempotent. See section 9.1.2 here: http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
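A hypothetical illustration of the difference:

PUT /api/object/42/archive
{ "archived": true }
--> 200 OK (sending this twice leaves the object in the same state)

POST /api/object/42/archive
--> 201 Created (sending this twice might create two archive records)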
I would probably use the /api/object/id endpoint with a query parameter, so it looks something like /api/object/id?isArchived=true. You can still use whatever HTTP verb you were using.

Rest Standard: Path parameters or Request parameters

I am creating a new REST service.
What is the standard for passing parameters to REST services? Different REST implementations in Java let you configure parameters as part of the path or as request parameters. For example,
Path parameters
http://www.rest.services.com/item/b
Request parameters
http://www.rest.services.com/get?item=b
Does anyone know the advantages/disadvantages of each method of passing parameters? It seems that passing the parameters as part of the path coincides better with the notion of the REST protocol. That is, a single location signifies a unique response, correct?
Paths tend to be cached, parameters tend to not be, as a general rule.
So...
GET /customers/bob
vs
GET /customers?name=bob
The first is more likely to be cached (assuming proper headers, etc.) whereas the latter is likely not to be cached.
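For example (header values illustrative), the path form can be made explicitly cacheable:

GET /customers/bob
--> 200 OK
Cache-Control: public, max-age=3600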
tl;dr: You might want both.
Item #42 exists:
GET /items/42
Accept: application/vnd.foo.item+json
--> 200 OK
{
  "id": 42,
  "bar": "baz"
}
GET /items?id=42
Accept: application/vnd.foo.item-list+json
--> 200 OK
[
  {
    "id": 42,
    "bar": "baz"
  }
]
Item #99 doesn't exist:
GET /items/99
Accept: application/vnd.foo.item+json
--> 404 Not Found
GET /items?id=99
Accept: application/vnd.foo.item-list+json
--> 200 OK
[]
Explanations & comments
/items/{id} returns an item while /items?id={id} returns an item-list.
Even if there is only a single element in a filtered item-list, a list of a single element is still returned for consistency (as opposed to the element itself).
It just so happens that id is a unique property. If we were to filter on other properties, this would still work in exactly the same way (see the example after this list).
Elements of a collection resource can only be named using unique properties (e.g. keys as a subresource of the collection) for obvious reasons (they're normal resources and URIs uniquely identify resources).
If the element is not found when using a filter, the response is still OK and still contains a list (albeit empty). Just because we're requesting a filtered list containing an item that doesn't exist doesn't mean the list itself doesn't exist.
Because they're so different and independently useful, you might want both. The client will want to differentiate between all cases (e.g. whether the list is empty or the list itself doesn't exist, in which case you should return a 404 for /items?...).
Disclaimer: This approach is by no means "standard". It makes so much sense to me though that I felt like sharing.
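For instance, filtering on a hypothetical non-unique bar property works identically to filtering on id:

GET /items?bar=baz
Accept: application/vnd.foo.item-list+json
--> 200 OK
[
  {
    "id": 42,
    "bar": "baz"
  }
]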
PS: Naming the item collection "get" is a code smell; prefer "items" or similar.
Your second example of "request parameters" is not correct, because "get" is included as part of the path. GET is the request method; it should not be part of the path.
There are 4 main types of requests:
GET
PUT
POST
DELETE
GET requests should always be able to be completed without any information in the request body. Additionally, GET requests should be "safe", meaning that no significant data is modified by the request.
Besides the caching concern mentioned above, parameters in the URL path tend to be required and/or expected because they are also part of your routing, whereas parameters passed in the query string are more variable and don't affect which part of your application the request is routed to. You could, though, potentially also pass a variable-length set of parameters through the URL:
GET somedomain.com/states/Virginia,California,Mississippi/
A good book to read as a primer on this topic is "Restful Web Services". Though I will warn you to be prepared to skim over some redundant information.
I think it depends. One URL for one resource. If you want to receive that resource in a slightly different way, give it a query string. But for a value that would deliver a different resource, put it in the path.
So in your example, the variable's value is directly related to the resource being returned. So it makes more sense in the path.
The first variation is a little cleaner, and allows you to reserve the request parameters for things like sort order and page, as in
http://www.rest.services.com/items/b?sort=ascending;page=6
This is a great fundamental question. I've recently come to the conclusion to stay away from using path parameters. They lead to ambiguous resource resolution. The URL is basically the 'method name' of a piece of code running somewhere on a server. I prefer not to mix variable names with method names. The name of your method is apparently 'customer' (which IMHO is a rotten name for a method, but REST folks love this pattern). The parameter you're passing to this method is the name of the customer. A query parameter works well for that, and this resource and query-parameter value can even be cached if desired.
There is no physical IT customer resource. There is likely no file on disk under a customer folder that's named after the customer. This is a web-service that performs some kind of database transaction. The 'resource' is your service, not the customer.
This obsession over REST and web-verbs reminds me of the early days of Object Oriented programming where we attempted to cram our code into virtual representations of physical objects. Then we realized that objects are usually virtual concepts in a system. OO is still useful when done the right way. REST is also useful if you realize that RESTful resources are services, not objects.

Is there a working example of simple Net::OpenID::Consumer::Lite CGI script?

I have seen the examples of Net::OpenID::Consumer::Lite on CPAN, but I was hoping to get a single script that uses the POST method. If nobody has this, then I will post my solution back here once I get it working.
This seems to be the only applicable test in the manifest, and it doesn't seem too useful:
http://cpansearch.perl.org/src/TOKUHIROM/Net-OpenID-Consumer-Lite-0.02/xt/001_mixi.t
Apache2::AuthMixi also uses it a bit.
This module simply delegates to LWP::UserAgent. I don't like that; it should subclass LWP::UserAgent instead of delegating. You can find the docs for LWP::UserAgent on CPAN, and you can access the underlying copy through the hidden method _ua (though, by convention, the leading underscore tells you it isn't supported and is supposed to be kept secret):
my $csr = Net::OpenID::Consumer::Lite->new();
$csr->_ua->post(); # same as LWP::UserAgent::post()
It seems as if you're supposed to use only handle_server_response(), which calls _check_authentication(), which calls _get(), which delegates to ->_ua->get().
check_authentication() wants a HashRef jump-table with 5 events: not_openid, setup_required, cancelled, verified, and error. In addition, I believe it wants a bunch of openid.-prefixed keys and their values.
Per the code, for a request to be sent, $request->{'openid.mode'} must exist in the $request and be set (preferably to check_authentication), and not set to 'cancel'. The openid.user_setup_url key must logically not be set, or it will just call the respective callback. It must also have an op_endpoint.endpoint key set, which is where the request is destined to go.
This code isn't hard to read; I'd suggest taking a look. The author also seems to have a bunch of modules, which is a good sign. I don't like jump-tables with data like that; it seems kind of weird from a UI perspective.
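For what it's worth, a minimal sketch of that jump-table, following the module's synopsis ($request is assumed to be the HashRef of OpenID response parameters your CGI received):

use Net::OpenID::Consumer::Lite;

my $csr = Net::OpenID::Consumer::Lite->new();
$csr->handle_server_response(
    $request => (
        not_openid     => sub { die "Not an OpenID message" },
        setup_required => sub {
            my $setup_url = shift;
            # redirect the user to $setup_url
        },
        cancelled      => sub {
            # the user hit cancel at the OpenID provider
        },
        verified       => sub {
            my $vident = shift;
            # log the user in as $vident
        },
        error          => sub {
            my $err = shift;
            die $err;
        },
    ),
);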