Caching associations in active model serializers - memcached

Currently, I have the following serializers:
class UserSerializer < ActiveModel::Serializer
  cached
  delegate :cache_key, to: :object
  has_many :profiles
end
class ProfileSerializer < ActiveModel::Serializer
  cached
  delegate :cache_key, to: :object
  has_many :pages
end
class PageSerializer < ActiveModel::Serializer
  cached
  delegate :cache_key, to: :object
  has_many :posts
end
For various reasons, I need to serialize all of the associated profiles, pages, and posts when serializing my User model. Unfortunately, this results in a rather large JSON hash that is difficult to cache efficiently - my local memcached server can store only around 75 serialized users. Is there a way to set up the serializers so that instead of caching the output of the entire user model JSON, I only cache the unique parts of the JSON and commit another cache fetch to retrieve the serialized data for the associated profiles, pages, and posts?

I don't think there's a way to do that within ActiveModel::Serializers. You could reduce the cache size at the cost of complicating your code. Something like this would work:
user_json = UserSerializer.new(@user).as_json
user_json[:profiles] = @user.profiles.map { |profile| ProfileSerializer.new(profile).as_json }
# etc.
In your example, there would of course be more nesting required.
This isn't a great solution - if you're bound to returning deeply nested JSON, I think the best short-term option would be to add some more memcache capacity. Long-term it might be worth rethinking this approach, as it might not be sustainable to return everything at once.
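To make the per-level idea a bit more concrete, here is a minimal plain-Ruby sketch. It does not use the ActiveModel::Serializers API: TinyCache (a Hash standing in for memcached), the Struct models, and the *_json helpers are all hypothetical names made up for illustration, and the nesting stops at pages to keep it short. Each record's own attributes are cached under its own key, and the nested JSON is assembled from those entries at read time.

```ruby
# Stand-in for memcached (hypothetical): a Hash with fetch-or-compute.
class TinyCache
  attr_reader :store

  def initialize
    @store = {}
  end

  # Returns the cached value for key, computing and storing it on a miss.
  def fetch(key)
    @store.fetch(key) { @store[key] = yield }
  end
end

# Hypothetical models; real ones would be ActiveRecord objects whose
# cache_key also changes when the record is updated.
Page = Struct.new(:id, :title) do
  def cache_key
    "pages/#{id}"
  end
end

Profile = Struct.new(:id, :pages) do
  def cache_key
    "profiles/#{id}"
  end
end

User = Struct.new(:id, :name, :profiles) do
  def cache_key
    "users/#{id}"
  end
end

CACHE = TinyCache.new

# Each helper caches only the record's own attributes, then merges in
# the children, which are fetched from their own cache entries.
def page_json(page)
  CACHE.fetch(page.cache_key) { { id: page.id, title: page.title } }
end

def profile_json(profile)
  own = CACHE.fetch(profile.cache_key) { { id: profile.id } }
  own.merge(pages: profile.pages.map { |page| page_json(page) })
end

def user_json(user)
  own = CACHE.fetch(user.cache_key) { { id: user.id, name: user.name } }
  own.merge(profiles: user.profiles.map { |profile| profile_json(profile) })
end

user = User.new(42, "Ann", [Profile.new(7, [Page.new(1, "Home")])])
full = user_json(user)   # nested hash built from three small cache entries
```

The trade-off is one cache round trip per record instead of one per user, so with a real memcached client you would probably want a multi-get for the children.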


Dependency between data store

TL;DR
What's the best way to handle dependency between types of data that is loaded asynchronously from different backend endpoints?
Problem
My app fetches data from a backend, for each entity I have an endpoint to fetch all instances.
For example api.myserver.com/v1/users for User model and api.myserver.com/v1/things for Thing model.
This data is parsed and placed into data store objects (e.g. UserDataStore and ThingDataStore) that serve these models to the rest of the app.
Question
What should I do if the data that comes from /things depends on data that comes from /users, and the fetch operations are async? In my case /things returns the id of the user that created each thing. This means that if /things returns before /users, I won't have enough data to create the Thing model.
Options
1. Have /things also return the relevant /users data, nested.
   This is bad because:
   - I'll then have multiple User model instances for the same actual user: one that came from /users and one that came nested in /things.
   - It increases the total payload size transferred.
   - In a system with a permission policy, the data returned for /users can differ from the data nested in /things, which allows partially populated models into the app.
2. Create an operational dependency between the two data stores, so that ThingDataStore has to wait for UserDataStore to be populated before it attempts to load its own data.
   This is also bad because:
   - Design-wise, this dependency is not welcome.
   - Operationally, it quickly becomes complicated once you add other data stores (e.g. dependency cycles).
What is the best solution for my problem and in general?
This is obviously not platform / language dependent.
I see two possible solutions:
Late initialization of UserDataStore in ThingDataStore. You will have to allow the creation of an object that is not fully valid, and you will also need a method that tells you whether UserDataStore is initialized. Not perfect, because for some time an invalid instance will exist.
Create some kind of proxy, or maybe a builder object, for ThingDataStore that holds all the information about a particular thing and creates the ThingDataStore object as soon as the UserDataStore related to this instance is received.
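A rough sketch of the second (builder-like) idea in Ruby, under some assumptions: the ThingStore class, its method names, and the payload keys (:id, :user_id) are invented for illustration and are not from the question. Raw /things records wait in a pending list until the user they reference arrives, so no half-valid Thing is ever created.

```ruby
# Hypothetical models, loosely following the question.
User  = Struct.new(:id, :name)
Thing = Struct.new(:id, :creator)

# Holds raw /things payloads until the referenced user is known,
# then builds fully valid Thing objects.
class ThingStore
  attr_reader :things

  def initialize
    @users = {}                                     # user id => User
    @things = {}                                    # thing id => Thing
    @pending = Hash.new { |hash, key| hash[key] = [] } # user id => raw payloads
  end

  # Called whenever a user arrives from /users, in any order.
  def user_loaded(user)
    @users[user.id] = user
    # Hash#delete returns nil when nothing was waiting for this user.
    @pending.delete(user.id)&.each { |raw| build(raw, user) }
  end

  # Called for each record from /things.
  def thing_loaded(raw)
    user = @users[raw[:user_id]]
    if user
      build(raw, user)
    else
      @pending[raw[:user_id]] << raw
    end
  end

  # Number of things still waiting for their user.
  def pending_count
    @pending.values.sum(&:size)
  end

  private

  def build(raw, user)
    @things[raw[:id]] = Thing.new(raw[:id], user)
  end
end

store = ThingStore.new
store.thing_loaded({ id: 1, user_id: 10 })   # arrives before its user
store.user_loaded(User.new(10, "Ann"))       # pending thing is built now
```

The two fetches stay independent; only the final assembly step knows about both kinds of data.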
Maybe it will help you. Good luck!

REST API Design - Collections as arrays with ETag

Take the following URL template:
~/pupil/{id}/subjects
If the subjects are a collection represented in the traditional way, as if each item stands alone like the pupil items, then is this the correct way to expose them?
It begins to feel wrong when you consider updating the collection in terms of the pupil and concurrently with another API caller.
Suddenly, you cannot synchronize access since there's no ETag to cover the set and you'll end up interleaving the changes and getting in a tangle.
A different design could see the subjects incorporated as a sub array in the entity of the pupil and the /subjects URL is just for read access.
Perhaps the subjects should be returned as a single array set entity with a discrete ETag, and POSTing individual subjects should be disabled, with updates made via a POST/PUT of the whole set. But what if the list is very long? Does it need paging?
Maybe the design decision is case-by-case, not a sweeping guideline. Not sure.
Thoughts?
It depends on whether you want to treat "subjects" as a single resource or not.
If, as you say, consumers of your API are going to want to add, remove or modify individual subjects, then yes, the traditional way of representing them would be correct with regard to REST patterns:
~/pupil/{id}/subjects
would return a list of resources at
~/pupil/{id}/subjects/{subjectId}
Unless there's a strong reason to optimize bulk operations or caching, this is the most RESTful and straightforward implementation.
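If concurrent bulk updates do matter, the whole-set ETag idea from the question could look roughly like this. It is only an in-memory sketch: SubjectSet, its replace method, and the hashing scheme are assumptions made up for illustration, and the return values merely mimic HTTP status codes (412 is what a failed If-Match precondition would produce).

```ruby
require "digest"

# Hypothetical in-memory resource behind ~/pupil/{id}/subjects.
class SubjectSet
  attr_reader :subjects

  def initialize(subjects)
    @subjects = subjects.sort
  end

  # One ETag covering the entire set, derived from its contents.
  def etag
    digest = Digest::SHA256.hexdigest(@subjects.join("\n"))
    "\"#{digest[0, 16]}\""
  end

  # PUT of the whole set; succeeds only if the caller's If-Match value
  # is the ETag of the current version. Returns an HTTP-like status.
  def replace(new_subjects, if_match:)
    return 412 unless if_match == etag   # Precondition Failed
    @subjects = new_subjects.sort
    200
  end
end

set = SubjectSet.new(%w[maths physics])
seen = set.etag                                           # both callers read this version
first  = set.replace(%w[maths physics art], if_match: seen) # accepted
second = set.replace(%w[maths], if_match: seen)             # stale ETag, rejected
```

The second caller has to re-GET the set and reapply its change, which is exactly the interleaving protection that per-item resources alone do not give you.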

What are the benefits of using the fields option when querying in Meteor

SomeCollection.find({}, { fields: { exerciseId: 1} })
After running a time test, it looks like this query takes longer than just find() without any arguments. Are there any benefits of limiting the field? I guess, it will use less local memory but when is that worth it?
In my experience the benefits of using fields on the client are limited. But you should absolutely use it when defining your publications on the server for two reasons:
Security: hide fields the client has no business seeing, especially fields about other users
Performance: fewer fields means fewer bytes on the wire. Especially important when a subscription includes many documents.
The benefit is that it becomes less reactive, which you can benefit from in Tracker.autorun and when you use observe or observeChanges.
EDIT
I'm not 100% sure about Tracker.autorun.
Explicitly specifying fields does save memory, sends less data over the wire, and might give you a small performance boost. Another important use for it is as a security measure. Imagine that you want your client side to subscribe to the Users collection, but you don't want the user's password or any other field containing sensitive information to be visible on the client. In that case, when you're publishing the collection, you'd do it this way:
Meteor.publish('secureUsers', function() {
  if (!this.userId) return null;
  return Meteor.users.find(this.userId, {
    fields: {
      someNonSensitiveField: 1, /* avatar information, for example */
      otherNonSensitiveField: 1
    }
  });
});
This way only the specified fields will be included in the collection objects, leaving out the other data, which might be sensitive. Another approach is to specify which fields are sensitive and exclude them explicitly.
HTH!

Is it RESTful to create complex objects in a single POST?

I have a form where users create Person records. Each Person can have several attributes -- height, weight, etc. But they can also have lists of associated data such as interests, favorite movies, etc.
I have a single form where all this data is collected. To me it seems like I should POST all of this data in a single request. But is that RESTful? My reading suggests that the interests, favorite movies and other lists should be added in separate POST requests. But I don't think that makes sense because one of those could fail and then there would be a partial insert of the Person and it may be missing their interests or favorite movies.
I'd say that it depends entirely upon the addressability and uniqueness of the dependent data.
If your user-associated data is dependent upon the user (i.e., a "distinct" string, e.g. an attribute such as a string representing an (unvalidated) name of a movie), then it should be included in the POST creation of the user representation; however, if the data is independent of the user (where the data can be addressed independently of the user, e.g. a reference, such as a movie from a set of movies) then it should be added independently.
The reasoning behind this is that reference addition when bundled with the original POST implies transactionality; that is, if another user deletes the movie reference for the "favorite" movie between when it is chosen on the client and when the POST goes through, the user add will (should by that design) fail, whereas if the "favorite" movie is not associative but is just an attribute, there's nothing to fail on (attributes (presumably) cannot be invalidated by a third party).
And again, this goes very much to your specific needs, but I fall on the side of allowing the partial inserts and indicating the failures. The proper way to handle this sort of thing if you really want to not allow partial inserts is to just implement transactions on the back end; they're the only way to truly handle a situation where a critical associated resource is removed mid-process.
The real restriction in REST is that for a modifiable resource that you GET, you can also turn around and PUT the same representation back to change its state. Or POST. Since it's reasonable (and very common) to GET resources that are big bundles of other things, it's perfectly reasonable to PUT big bundles of things, too.
Think of resources in REST very broadly. They can map one-to-one with database rows, but they don't have to. An addressable resource can embed other addressable resources, or include links to them. As long as you're honoring your representation and the semantics of the underlying protocol's operations (i.e. HTTP GET POST PUT etc.), REST doesn't have anything to say about other design considerations that might make your life easier or harder.
I don't think there is a problem with adding all the data in one request, as long as it's inherently associated with the main resource (i.e. the person in your case). If interests, favorite movies, etc. are resources of their own, they should also be handled as such.
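For the "implement transactions on the back end" route mentioned above, here is a minimal sketch of an all-or-nothing create for a single POST. Everything in it is hypothetical: PeopleStore, its snapshot-based transaction method, and the payload keys are invented for illustration, and a real backend would rely on its database's transactions instead.

```ruby
# Hypothetical in-memory store; a real backend would use a DB transaction.
class PeopleStore
  attr_reader :people, :favorites

  def initialize(known_movie_ids)
    @known_movie_ids = known_movie_ids
    @people = []
    @favorites = []
  end

  # Runs the block and restores the previous state if anything raises.
  def transaction
    snapshot = [@people.dup, @favorites.dup]
    yield
  rescue StandardError
    @people, @favorites = snapshot
    raise
  end

  # Handles the single POST: the person and all favorite-movie
  # references are written together or not at all.
  def create_person(payload)
    transaction do
      @people << { name: payload[:name], height: payload[:height] }
      payload[:favorite_movies].each do |movie_id|
        unless @known_movie_ids.include?(movie_id)
          raise ArgumentError, "unknown movie #{movie_id}"
        end
        @favorites << { person: payload[:name], movie_id: movie_id }
      end
    end
  end
end

store = PeopleStore.new([1, 2])
begin
  # Movie 99 was deleted by someone else in the meantime.
  store.create_person({ name: "Ann", height: 170, favorite_movies: [1, 99] })
rescue ArgumentError => e
  warn "rejected: #{e.message}"   # no partial Person is left behind
end
```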

How should I deal with object hierarchies in a RESTful API?

I am currently designing the API for an existing PHP application, and to this end am investigating REST as a sensible architectural approach.
I believe I have a reasonable grasp of the key concepts, but I'm struggling to find anybody that has tackled object hierarchies and REST.
Here's the problem...
In the [application] business object hierarchy we have:
Users
 └ which have one-to-many Channel objects
 └ which have one-to-many Member objects
In the application itself we use a lazy load approach to populate the User object with arrays of these objects as required. I believe in OO terms this is object aggregation, but I have seen various naming inconsistencies and do not care to start a war about the precise naming convention </flame war>.
For now, consider I have some loosely coupled objects that I may / may not populate depending on application need.
From a REST perspective, I am trying to ascertain what the approach should be. Here is my current thinking (considering GET only for the time being):
Option 1 - fully populate the objects:
GET api.example.com/user/{user_id}
Read the User object (resource) and return the User object with all possible Channel and Member objects pre-loaded and encoded (JSON or XML).
PROS: reduces the number of objects, no traversal of object hierarchies required
CONS: objects must be fully populated (expensive)
Option 2 - populate the primary object and include links to the other object resources:
GET api.example.com/user/{user_id}
Read the User object (resource) and return the User object with the User data populated and two lists.
Each list references the appropriate (sub) resource i.e.
api.example.com/channel/{channel_id}
api.example.com/member/{member_id}
I think this is close to (or exactly) the implications of hypermedia - the client can get the other resources if it wants (as long as I tag them sensibly).
PROS: client can choose to load the subordinates or otherwise, better separation of the objects as REST resources
CONS: further trip required to get the secondary resources
Option 3 - enable recursive retrieves
GET api.example.com/user/{user_id}
Read the User object and include links to lists of the sub-objects i.e.
api.example.com/user/{user_id}/channels
api.example.com/user/{user_id}/members
the /channels call would return a list of channel resources in the form (as above):
api.example.com/channel/{channel_id}
PROS: primary resources expose where to go to get the subordinates but not what they are (more RESTful?), no requirement to get the subordinates up front, the subordinate list generators (/channels and /members) provide interfaces (method like) making the response more service like.
CONS: three calls now required to fully populate the object
Option 4 - (re)consider the object design for REST
I am re-using the [existing] application object hierarchy and trying to apply it to REST - or perhaps more directly, provide an API interface to it.
Perhaps the REST object hierarchy should be different, or perhaps the new RESTful thinking is exposing limitations of the existing object design.
Any thoughts on the above welcomed.
There's no reason not to combine these.
api.example.com/user/{user_id} – return a user representation
api.example.com/channel/{channel_id} – return a channel representation
api.example.com/user/{user_id}/channels – return a list of channel representations
api.example.com/user/{user_id}/channel_list – return a list of channel ids (or links to their full representations, using the above links)
When in doubt, think about how you would display the data to a human user without "API" concerns: a user wants both index pages ({user_id}/channel_list) and full views ({user_id}/channels).
Once you have that, just support JSON instead of (or in addition to) HTML as the representation format, and you have REST.
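As an illustration of the difference between the two list endpoints, here is a small Ruby sketch. The CHANNELS data and the helper names are made up; only the URL shapes come from the lists above.

```ruby
require "json"

# Hypothetical data; a real app would load this from its own models.
CHANNELS = {
  1 => { id: 1, name: "news",  user_id: 31 },
  2 => { id: 2, name: "sport", user_id: 31 },
  3 => { id: 3, name: "music", user_id: 32 }
}.freeze

def channel_url(id)
  "api.example.com/channel/#{id}"
end

def channels_of(user_id)
  CHANNELS.values.select { |channel| channel[:user_id] == user_id }
end

# user/{user_id}/channel_list: links only (the "index page").
def channel_list(user_id)
  channels_of(user_id).map { |channel| { href: channel_url(channel[:id]) } }
end

# user/{user_id}/channels: full representations (the "full view").
def channels(user_id)
  channels_of(user_id).map { |channel| channel.merge(href: channel_url(channel[:id])) }
end

puts JSON.pretty_generate(channel_list(31))
puts JSON.pretty_generate(channels(31))
```

A client that only needs to render a menu can stop at channel_list, while one that needs names and details asks for channels and skips the follow-up requests.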
The best advice I can give is to try and avoid thinking about your REST api as exposing your objects. The resources you create should support the use cases you need. If necessary you might create resources for all three options:
api.example.com/completeuser/{id}
api.example.com/linkeduser/{id}
api.example.com/lightweightuser/{id}
Obviously my names are a bit goofy, but it really doesn't matter what you call them. The idea is that you use the REST api to present data in the most logical way for the particular usage scenario. If there are multiple scenarios, create multiple resources, if necessary. I like to think of my resources more like UI models rather than business entities.
I would recommend Restful Objects, which is a standard for exposing domain models RESTfully.
The idea of Restful Objects is to provide a standard, generic RESTful interface for domain object models, exposing representations of their structure using JSON and enabling interactions with domain object instances using HTTP GET, POST, PUT and DELETE.
According to the standard, the URIs will be like:
api.example.com/object/user/31
api.example.com/object/user/31/properties/username
api.example.com/object/user/31/collections/channels
api.example.com/object/user/31/collections/members
api.example.com/object/user/31/actions/someFunction
api.example.com/object/user/31/actions/someFunction/invoke
There are also other resources
api.example.com/services
api.example.com/domain-types
The specification defines a few primary representations:
object (which represents any domain object or service)
list (of links to other objects)
property
collection
action
action result (typically containing either an object or a list, or just feedback messages)
and also a small number of secondary representations such as home, and user
This is interesting as you’ll see that representations are fully self-describing, opening up the possibility of generic viewers to be implemented if required.
Alternatively, the representations can be consumed directly by a bespoke application.
Here are my conclusions from many hours of searching and with input from the responders here:
Where I have an object that is effectively a multi-part object, I need to treat it as a single resource. Thus if I GET the object, all the subordinates should be present. This is required so that the resource is cacheable. If I partially load the object (and provide an ETag stamp), then other requestors may receive a partial object when they expected a full one. Conclusion: objects should be fully populated if they are being made available as resources.
Associated object relationships should be made available as links to other (primary) resources. In this way the objects are discoverable by traversing the API.
Also, the object hierarchy that made sense for the main application site may appear not to be what you need to act in a RESTful manner, but this is more likely revealing problems with the existing hierarchy. Having said this, the API may require more specialised use cases than had previously been envisaged, and specialised resources may be required.
Hope that helps someone