What's the difference between Hazelcast IMap and ICache?

I want to know when it's better to use IMap and when to use ICache.
There is a lot of ambiguity in choosing between the two.
I know this question may be a duplicate of Difference in IMap and ICache in Hazelcast, but is it possible to get more details on when to choose each of them?
Both interfaces have some methods in common, which makes the choice difficult.
Thanks

Do you want to query the objects in your cache? Do you want more efficiency-oriented capabilities, for example pipelining and asynchronous map operations? Do you want more eviction policies, even custom ones? Would you rather send your processing to the cache than be limited to shuffling data back and forth? If you answered yes to any of these, use Hazelcast Maps (IMap).

REST design principles: Referencing related objects vs Nesting objects

My team and I are refactoring a REST-API and I have run into a question.
For the sake of brevity, let us assume that we have an SQL database with 4 tables: Teachers, Students, Courses and Classrooms.
Right now all the relations between items are represented in the REST-API by referencing the URL of the related item. For example, for a course we could have the following:
{ "id":"Course1", "teacher": "http://server.com/teacher1", ... }
In addition, if I ask for a list of courses through a GET call to /courses, I get a list of references as shown below:
{
... //pagination details
"items": [
{"href": "http://server1.com/course1"},
{"href": "http://server1.com/course2"}...
]
}
All this is nice and clean, but if I want a list of all the course titles with the teachers' names and I have 2000 courses and 500 teachers, I have to do the following:
Make approximately 2500 queries just to read the data.
Implement the join between the teachers and courses.
Optimize with caching etc. so that it runs as fast as possible.
My problem is that this method creates a lot of network traffic with thousands of REST-API calls and that I have to re-implement the natural join that the database would do way more efficiently.
Colleagues say that this approach is the standard way of implementing a REST-API, but then a relatively simple query becomes a big hassle.
My question therefore is:
1. Is it wrong if we nest the teacher information in the courses?
2. Should the listing of items e.g. GET /courses return a list of references or a list of items?
Edit: After some research I would say the model I have in mind corresponds mainly to the one shown in jsonapi.org. Is this a good approach?
My problem is that this method creates a lot of network traffic with thousands of REST-API calls and that I have to re-implement the natural join that the database would do way more efficiently. Colleagues say that this approach is the standard way of implementing a REST-API but then a relatively simple query becomes a big hassle.
Your colleagues have lost the plot.
Here's your heuristic - how would you support this use case on a web site?
You would probably do it by defining a new web page that produces the report you need. You'd run the query, you'd use the result set to generate a bunch of HTML, and ta-da! The client has the information that they need in a standardized representation.
A REST-API is the same thing, with more emphasis on machine readability. Create a new document, with a schema so that your clients can understand the semantics of the document you return to them, tell the clients how to find the target uri for the document, and voila.
Creating new resources to handle new use cases is the normal approach to REST.
Yes, I totally think you should design something similar to jsonapi.org. As a rule of thumb, I would say "prefer a solution that requires fewer network calls". This is especially true if the number of network calls drops by an order of magnitude.
Of course, this doesn't eliminate the need to limit the request/response size if it becomes unreasonable.
Real-life solutions must strike a proper balance. A clean API is nice as long as it works.
So in your case I would do something like:
GET /courses?include=teachers
Or
GET /courses?includeTeacher=true
Or
GET /courses?includeTeacher=brief|full
In the last one the response can have only the teacher's id for brief and full teacher details for full.
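As a rough illustration of the brief|full variant, here is a minimal Python sketch. The in-memory data, the BASE_URL constant and the list_courses function are all hypothetical stand-ins for a real database and request handler; treating a missing parameter as "return the teacher's URL as before" is likewise an assumption.

```python
# Hypothetical in-memory data standing in for the Teachers and Courses tables.
BASE_URL = "http://server.com"
TEACHERS = {
    "teacher1": {"id": "teacher1", "name": "Ada"},
    "teacher2": {"id": "teacher2", "name": "Grace"},
}
COURSES = [
    {"id": "course1", "title": "Algebra", "teacher": "teacher1"},
    {"id": "course2", "title": "Logic", "teacher": "teacher2"},
]


def list_courses(include_teacher=None):
    """Build the body for GET /courses?includeTeacher=brief|full.

    None  -> each course references its teacher by URL
    brief -> each course carries only the teacher's id
    full  -> the full teacher record is embedded in each course
    """
    if include_teacher not in (None, "brief", "full"):
        raise ValueError(f"unsupported includeTeacher value: {include_teacher!r}")
    items = []
    for course in COURSES:
        teacher_id = course["teacher"]
        if include_teacher == "full":
            teacher = dict(TEACHERS[teacher_id])  # copy so callers cannot mutate the store
        elif include_teacher == "brief":
            teacher = {"id": teacher_id}
        else:
            teacher = f"{BASE_URL}/{teacher_id}"
        items.append({"id": course["id"], "title": course["title"], "teacher": teacher})
    return {"items": items}
```

With "full", the client gets titles and teacher names in a single call, and the join happens once on the server side instead of in thousands of follow-up requests.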
My problem is that this method creates a lot of network traffic with thousands of REST-API calls and that I have to re-implement the natural join that the database would do way more efficiently. Colleagues say that this approach is the standard way of implementing a REST-API but then a relatively simple query becomes a big hassle.
Have you actually measured the overhead generated by each request? If not, how do you know that the overhead will be too high? From an object-oriented programmer's perspective it may sound bad to perform each call on its own. Your design, however, lacks one important asset that helped the Web grow to its current size: caching.
Caching can occur on multiple levels. You can do it on the API level, the client might do it, or an intermediary server might do it. Fielding even made it a constraint of REST! So, if you want to comply with the REST architectural style, you should also support caching of responses. Caching helps to reduce the number of requests that have to be computed or even processed by a single server. With the help of stateless communication you might even introduce a multitude of servers that together handle billions of requests while appearing as one cohesive system to the client. An intermediary cache may further significantly reduce the number of requests that actually reach the server.
A URI as a whole (including any path, matrix or query parameters) is actually a key for a cache. Upon receiving a GET request, for example, an application checks whether its cache already contains a stored response for that URI and, if the stored data is "fresh enough", returns the stored response directly to the client on behalf of the server. If the stored data has already exceeded the freshness threshold, the application throws it away and routes the request to the next hop in line (which might be the actual server or a further intermediary).
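The lookup described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the UriCache class, its fetch callable and the fixed max_age_seconds threshold are hypothetical, and a real HTTP cache would derive freshness from response headers rather than from a single constant.

```python
import time


class UriCache:
    """Tiny GET cache keyed by the full URI (path plus query string)."""

    def __init__(self, fetch, max_age_seconds, clock=time.monotonic):
        self._fetch = fetch            # callable that forwards to the next hop
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so the sketch is easy to test
        self._store = {}               # uri -> (stored_at, response)

    def get(self, uri):
        entry = self._store.get(uri)
        if entry is not None:
            stored_at, response = entry
            if self._clock() - stored_at < self._max_age:
                return response        # fresh enough: answer on behalf of the server
            del self._store[uri]       # stale: throw the stored response away
        response = self._fetch(uri)    # route the request to the next hop
        self._store[uri] = (self._clock(), response)
        return response
```

Because the whole URI is the key, /courses?page=1 and /courses?page=2 are cached independently, which is one reason small, stable resources tend to cache better than large embedded responses.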
Spotting resources that are ideal for caching might not be easy at times, though most data doesn't change so quickly that caching should be neglected completely. Thus it should be of general interest to introduce caching, all the more so the more traffic your API produces.
While certain media-types such as HAL JSON, jsonapi, ... allow you to embed content gathered from related resources into the response, embedding content has some potential drawbacks such as:
Utilization of the cache might be low due to mixing data that changes quickly with data that is more static
Server might calculate data the client won't need
One server calculates the whole response
If related resources are only linked to instead of directly embedded, a client certainly has to fire off a further request to obtain that data, but that request is more likely to be (partly) served by a cache, which, as mentioned a couple of times throughout this post, reduces the workload on the server. A positive side effect could be that you gain more insight into what the clients are actually interested in (for instance, if you run an intermediary cache yourself).
Is it wrong if we nest the teacher information in the courses?
It is not wrong, but it might not be ideal, as explained above.
Should the listing of items e.g. GET /courses return a list of references or a list of items?
It depends. There is no right or wrong.
As REST is just a generalization of the interaction model used on the Web, basically the same concepts apply to REST as well. Depending on the size of the "item", it might be beneficial to return a short summary of the item's content and add a link to the item. Similar things are done on the Web as well. For a list of students enrolled in a course, the summary might be the student's name and matriculation number, together with a link through which further details of that student can be requested. The link is accompanied by a link-relation name that gives it some semantic context, which a client can use to decide whether invoking the URI makes sense or not.
Such link-relation names are either standardized by IANA, taken from common vocabularies such as Dublin Core or schema.org, or custom extensions as defined in RFC 8288 (Web Linking). For the above-mentioned list of students enrolled in a course you could, for example, use the about relation name to hint to a client that further information on the current item can be found by following the link. If you want to enable pagination, first, next, prev and last can and probably should be used as well, and so forth.
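To make the summary-plus-links idea concrete, here is a small Python sketch. The students_page function, the field names and the /students URL layout are assumptions made for illustration; only the relation names about, first, last, prev and next come from the text above.

```python
def students_page(students, page, page_size, base_url):
    """Return one page of student summaries plus pagination links."""
    last_page = max(1, -(-len(students) // page_size))  # ceiling division
    if not 1 <= page <= last_page:
        raise ValueError("page out of range")
    start = (page - 1) * page_size
    items = [
        {
            "name": s["name"],
            "matriculationNumber": s["matriculationNumber"],
            # "about" hints that more detail on this item is behind the link.
            "links": [{"rel": "about", "href": f"{base_url}/students/{s['id']}"}],
        }
        for s in students[start:start + page_size]
    ]

    def page_url(n):
        return f"{base_url}/students?page={n}"

    links = [
        {"rel": "first", "href": page_url(1)},
        {"rel": "last", "href": page_url(last_page)},
    ]
    if page > 1:
        links.append({"rel": "prev", "href": page_url(page - 1)})
    if page < last_page:
        links.append({"rel": "next", "href": page_url(page + 1)})
    return {"items": items, "links": links}
```

A client that understands the relation names can page through the list and fetch details on demand without hard-coding any URL structure.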
This is actually what HATEOAS is all about: linking data together and giving the links meaningful relation names to span a kind of semantic net between resources. If things are simply embedded into a response, such semantic graphs might be harder to build and maintain.
In the end, whether you embed or reference resources basically boils down to an implementation choice. I hope I could shed some light on the usefulness of caching and the benefits it can yield, especially in large-scale systems, as well as on the benefit of providing link-relation names for URIs, which enhance the semantic context of the relations used within your API.

How do I design a RESTful query that retrieves the last data from a resource to which I want to apply a filter?

What should a query look like when I want to retrieve the last measurements from installations that haven't been removed?
Something like this?
/my-web-service/installations/measurements/last?removed=false
The thing is, I don't want to retrieve last measurements that weren't removed from installations. I want to retrieve last measurements from installations that weren't removed.
I see a couple possibilities here:
If you need to read the data from the endpoint transactionally, the way you designed it is the way to go. What I'd change is the name of the param from removed to installationRemoved, since it's more descriptive, and I'd shorten the endpoint to /my-web-service/measurements/, since with installations in the path it's unclear in which scope the client operates. Also, don't you need a since param to filter the last measurements?
If there's a chance to split this into two endpoints, I'd add:
/my-web-service/installations/?removed=false
/my-web-service/measurements/?since=timestamp&installations=<array>
It does not make the API better (or worse, for that matter), but easier and more predictable for the users.
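A minimal Python sketch of how a client might combine the two endpoints follows. The in-memory lists and all three functions are hypothetical; they only mimic what the two calls would return and how their results can be joined on the client.

```python
# Hypothetical data standing in for what the two endpoints would serve.
INSTALLATIONS = [
    {"id": 1, "removed": False},
    {"id": 2, "removed": True},
    {"id": 3, "removed": False},
]
MEASUREMENTS = [
    {"installation": 1, "timestamp": 100, "value": 20.5},
    {"installation": 1, "timestamp": 200, "value": 21.0},
    {"installation": 2, "timestamp": 250, "value": 19.0},
    {"installation": 3, "timestamp": 150, "value": 18.2},
]


def get_installations(removed):
    """Mimics GET /my-web-service/installations/?removed=<bool>."""
    return [i for i in INSTALLATIONS if i["removed"] == removed]


def get_measurements(since, installations):
    """Mimics GET /my-web-service/measurements/?since=<timestamp>&installations=<array>."""
    wanted = set(installations)
    return [
        m for m in MEASUREMENTS
        if m["timestamp"] >= since and m["installation"] in wanted
    ]


def last_measurements_of_active_installations(since):
    """Client-side composition: newest measurement per non-removed installation."""
    ids = [i["id"] for i in get_installations(removed=False)]
    latest = {}
    for m in get_measurements(since, ids):
        current = latest.get(m["installation"])
        if current is None or m["timestamp"] > current["timestamp"]:
            latest[m["installation"]] = m
    return latest
```

The filter on removed applies to installations only, which matches the intent in the question: measurements are selected because their installation still exists, not because of any flag on the measurement itself.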
In general, try to add more general endpoints with filtering options rather than highly dedicated ones that do one particular thing. Highly dedicated endpoints lead to a loose API that is hard to use. Also, on filtering.
And a final note: your API is good when your clients use it not because they have to but because they like it ;)
According to this best practices article, you could use "aliases for common queries":
To make the API experience more pleasant for the average consumer,
consider packaging up sets of conditions into easily accessible
RESTful paths. For example, the recently closed tickets query above
could be packaged up as GET /tickets/recently_closed
So, in your case, it could be:
/my-web-service/installations/non_removed/measurements/last
where non_removed would be an alias for querying installations that weren't removed.
Hope it helps!

Best practice for mapping an OrientDB ORecordId onto a RESTful-friendly ID representation

We are looking into OrientDB as our persistence solution behind a RESTful web service, because a graph DB would be a perfect match for our use case. One of the things we have noticed is that entities (both vertices and edges) are uniquely identified by an ORecordId, containing the '#${clusterId}:${clusterPosition}'. In a RESTful API, based on my personal experience with relational DBs, you typically have several ways to identify entities uniquely, for example:
UUIDs, generated in code and persisted on DB level
Long/Int values, generated on DB level incrementally
etc...
The problem is that the format "#${clusterId}:${clusterPosition}" is not really URL/REST friendly (example: .../api/user/[#${clusterId}:${clusterPosition}]/address). Do you have any advice/experience on how you would deal with this, keeping in mind that you need a bi-directional mapping between the ORecordId and the "RestFulFriendlyId"?
Any hints and best practices based on experience would be truly appreciated.
Best regards,
Bart
We're looking into using Hashids: http://hashids.org/
We still have some minor concerns, but theoretically Hashids should give you an obfuscated RID that can be converted back, so it won't take up more storage space (as a UUID would). It will just take a small bit of CPU time.
Please note that this little tool is not in any way a true hash, in the sense of being very hard to crack. It is more about good obfuscation. If you are at all worried about the RIDs becoming known, this isn't a proper solution.
Scott
Actually, I'd say the RIDs are very RESTful, if you do this:
.../domain.com/other-segments/{cluster}/{position}/...
Since clusters are a "superset" of a specific class (i.e. one class will have one or more clusters), this can be thought of as identifying the target data object by type/record. I'm not sure what backend you're using, but extracting those two URL segments and recombining them to #x:y should be a fairly simple (and maybe mostly automatic) task.
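The extraction and recombination could look roughly like the following Python sketch. The function names are made up for illustration, and the sketch assumes persistent record ids with non-negative cluster and position numbers.

```python
import re

# Assumption: persistent RIDs of the form '#<cluster>:<position>', both non-negative.
_RID_PATTERN = re.compile(r"#([0-9]+):([0-9]+)")
_SEGMENT_PATTERN = re.compile(r"[0-9]+")


def rid_to_path(rid):
    """'#12:345' -> '12/345' (cluster and position as two URL segments)."""
    match = _RID_PATTERN.fullmatch(rid)
    if match is None:
        raise ValueError(f"not a record id: {rid!r}")
    cluster, position = match.groups()
    return f"{cluster}/{position}"


def path_to_rid(cluster, position):
    """('12', '345') -> '#12:345'; rejects non-numeric URL segments."""
    if not (_SEGMENT_PATTERN.fullmatch(cluster) and _SEGMENT_PATTERN.fullmatch(position)):
        raise ValueError("cluster and position must be non-negative integers")
    return f"#{int(cluster)}:{int(position)}"
```

Validating the segments before rebuilding the RID keeps arbitrary URL input from being passed straight into a database query.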

ZeroMQ Subscriber filtering on multipart messages

With ZeroMQ PUB/SUB model, is it possible for a subscriber to filter based on contents of more than just the first frame?
For example, if we have a multi-frame message that contains three frames, 1) data type, 2) instrument, and then 3) the actual data, is it possible to subscribe to a specific (data type, instrument) pair?
(All of the examples I've seen only show filtering based on the first frame of a multipart message.)
Different modes of ZeroMQ PUB/SUB filtering
The initial ZeroMQ model used subscriber-side, subscription-based filtering.
That was easier for the publisher, as it did not need any subscriber-specific handling and simply fed all data to all SUBs (yes, at the cost of wasted network traffic and of a SUB-side workload to process all incoming BLOBs, all the way up from the PHY media, even for topics the SUB did not subscribe to. Yes, that is pretty expensive in low-latency designs).
PUB-side filtering was initially proposed for later implementation.
Still, this mode does not allow your idea to fly just by using the PUB/SUB Scalable Formal Communication Pattern.
The ZeroMQ protocol design strives to inspire users on how to achieve a just-enough design of a distributed system's behaviour.
Given your 1-2-3 intention, custom logic should be just enough to achieve the desired processing.
So, how to approach the solution?
Do not hesitate to set up several messaging / control relays in parallel in your domain-specific solution. It works much better for any tailor-made solution and is way safer than trying to "bend" a library's primitive archetype (in this case the trivial PUB/SUB topic-based filtering) into something a bit different from what it was originally designed for.
This is all the more true if your domain-specific use is FOREX / equity trading, where latency is your worst enemy, so a successful approach has to minimise stream decoding and alternative branching as much as possible.
Nanoseconds do matter.
If one reads the details of multi-frame composition (a sender-side BLOB assembly), there is no latency-wise advantage, as the whole BLOB gets onto the wire only after it has been completed. So there is no advantage for your SUB side in case your idea was to handle the initial frame's content for signalling, as the whole BLOB arrives "together".
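One common form of such custom logic, not spelled out in the answer above, is to fold the data type and the instrument into a single composite topic in the first frame, since SUB subscriptions are prefix matches on that frame. The Python sketch below only simulates the prefix match and does not use a ZeroMQ binding; make_message, matches and the dot-separated topic layout are assumptions for illustration.

```python
def make_message(data_type, instrument, payload):
    """Build a multipart message whose first frame is a composite topic."""
    # The trailing dot keeps 'quote.EUR.' from matching 'quote.EURUSD.'.
    topic = f"{data_type}.{instrument}.".encode()
    return [topic, payload]


def matches(subscriptions, message):
    """Mimic SUB-side filtering: prefix match on the first frame only."""
    first_frame = message[0]
    return any(first_frame.startswith(prefix) for prefix in subscriptions)
```

A subscriber interested in one pair would subscribe to a prefix such as b"quote.EURUSD.", while b"quote." would select every instrument of that data type; filtering on the instrument alone would still need a different topic layout or a second socket.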

GWT : Type of Container

I see that there are two ways of transferring objects from server to client:
Use the same domain object (Contact.java) as used in the service layer. (I do not use Hibernate.)
Use a HashMap to send the domain object's field values in the form of a Map, with the help of the BeanUtilsBean class. For multiple objects, use a List of such Maps. Similarly, use a Map to submit form values from client to server.
Is there any performance advantage for option 1 over 2?
Is there a way to hide the class name/package name that is sent to the browser if we use option 1?
Thanks!
You have to understand that whichever option you choose, it will need to get converted to JavaScript (+ some wrappers, etc.). This takes more time and space/bandwidth than, say, JSON (note: I haven't done any benchmarks, this is just a [reasonable] conclusion I came up with ;)). But if you used JSON, you would have to recreate the object on the server side, so it's not a silver bullet. In the end, it all depends on how much of an issue performance is for you. For more insight, see this question.
I'd go with option 1: just leave it to the GWT team to pack your domain objects and transfer them between client and server. In the future (GWT 2.1), we'll have some really nice things, including a more lightweight transfer protocol. See this year's presentation from Google I/O on architecting GWT apps; it's something worth keeping in mind.
PS: It's always good to do benchmarks yourself in this kind of situation: your configuration, the type of objects, etc. might yield results different from what you expect.