I am creating a REST service that has two methods: one is GetAll and the other is GetById. In my scenario, a database request is very costly, so I want to store the output of GetAll somewhere (a cache) and use it for subsequent GetById requests.
One of the characteristics of REST is statelessness: a request cannot depend on a past request, and a service treats each request independently.
I want to understand the ideal approach for handling such scenarios, and how to design for this requirement in REST.
The proper way to achieve what you want is by using caching, like MemoryCache.
You create a separate, private function which fetches all the data and caches it in memory. Then you can have both GetAll and GetById use that function.
Your service will remain stateless.
MemoryCache usage example
using System.Runtime.Caching; // MemoryCache lives in this namespace

MemoryCache cache = MemoryCache.Default;
string cacheName = "MyCache";
if (!cache.Contains(cacheName) || cache[cacheName] == null)
{
    // get data (the one expensive database call)
    var data = ...
    // cache the data with a one-day sliding expiration
    cache.Set(cacheName, data, new CacheItemPolicy() { SlidingExpiration = TimeSpan.FromDays(1) });
}
return cache[cacheName];
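To tie it together, both GetAll and GetById can share that single cached fetch. Here is a minimal sketch of the idea; the Order type and FetchAllFromDatabase are illustrative placeholders, not part of the original question:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.Caching;

public class Order { public int Id { get; set; } /* other fields */ }

public class OrderService
{
    private const string CacheKey = "AllOrders";

    // Both public methods delegate to this cached fetch, so the expensive
    // database call happens at most once per sliding-expiration window.
    private List<Order> GetAllCached()
    {
        var cache = MemoryCache.Default;
        if (!(cache[CacheKey] is List<Order> data))
        {
            data = FetchAllFromDatabase(); // hypothetical expensive DB call
            cache.Set(CacheKey, data, new CacheItemPolicy { SlidingExpiration = TimeSpan.FromDays(1) });
        }
        return data;
    }

    public List<Order> GetAll() => GetAllCached();

    // Served from the same cached list -- no extra database round trip
    public Order GetById(int id) => GetAllCached().FirstOrDefault(o => o.Id == id);

    private List<Order> FetchAllFromDatabase() { /* expensive query */ return new List<Order>(); }
}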
The requirement for statelessness in REST is that the service should appear to be stateless. It doesn't matter if the service maintains some state internally. That's just an implementation detail.
There are many solutions to this. One solution I would recommend is the following:
Configure Infinispan and use it as your cache. In it, store a map whose key is the query and whose value is the result set from your database.
The second time you want to GetAll, check whether the query exists as a key in your cache; if yes, retrieve the value from the cache, otherwise contact the database and then insert the result into the cache.
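The check-then-populate pattern described above is independent of Infinispan. Here is a minimal sketch of the same idea using an in-process ConcurrentDictionary as the cache; the QueryCache type and the fetch delegate are illustrative, not an Infinispan API:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public class QueryCache<TRow>
{
    // key: the query text, value: the cached result set
    private readonly ConcurrentDictionary<string, List<TRow>> _cache =
        new ConcurrentDictionary<string, List<TRow>>();
    private readonly Func<string, List<TRow>> _fetchFromDatabase;

    public QueryCache(Func<string, List<TRow>> fetchFromDatabase)
    {
        _fetchFromDatabase = fetchFromDatabase;
    }

    // Runs the database fetch only when the query is not yet cached.
    // (Under contention the factory may run more than once; for a cache
    // that is usually acceptable.)
    public List<TRow> Get(string query) => _cache.GetOrAdd(query, _fetchFromDatabase);
}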
We have implemented the Drools engine in our platform in order to evaluate rules against streams.
In our use case we have a change detection stream which contains the changes of multiple entities.
Rules need to be evaluated for each entity from the stream over a period of time, and each entity's state must evolve apart from the other entities' state (sessions). Those rules produce alerts based on the state of each entity. For this reason entities should be kept within boundaries, so the state of one entity does not interfere with the others.
To achieve this, we create a session as a Spring bean for each entity id and store it in an in-memory HashMap. So every time an entity arrives, we try to find its session in the in-memory map by its id. If we get a null return, we create the session.
This doesn't seem like the right way to accomplish it, because it offers neither a disaster recovery strategy nor good memory management.
We could use some kind of in-memory database such as Redis or Memcached, but I don't think it would be able to recover a stateful session precisely.
Does someone know the right way to achieve disaster recovery and good memory management with embedded Drools and multiple sessions? Does the platform offer some solution?
Thanks very much for your attention and support
The answer is not to try to persist and reuse sessions, but rather to persist an object that models the current state of the entity.
Your current workflow is this:
Entity arrives at your application (from change detection stream or elsewhere)
You do a lookup on a hashmap to get a Session which has the entity's state stored
You fire the rules, which updates the session (and possibly the entity)
You persist the session in-memory.
What your workflow should be is this:
(same) Entity arrives at your application
You do a look-up on an external data source for the entity's state -- for example from a database or data store
You fire the rules, passing in the entity state. Instead of updating the session, you update the state instance.
You persist the state to your external data source.
If you add appropriate write-through caches you can guarantee both performance and consistency. This will also allow you to scale your application sideways if you implement appropriate locking / transaction handling for your data source.
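The write-through idea itself is language-agnostic: every read goes through the cache, and every write updates both the cache and the backing store in one operation. A minimal sketch, with a hypothetical IStateStore standing in for your database or data store:

using System.Collections.Concurrent;

public interface IStateStore<TKey, TState>
{
    TState Load(TKey key);
    void Save(TKey key, TState state);
}

// Reads are served from memory when possible; writes always hit
// both the backing store and the cache, keeping them consistent.
public class WriteThroughCache<TKey, TState>
{
    private readonly ConcurrentDictionary<TKey, TState> _cache = new ConcurrentDictionary<TKey, TState>();
    private readonly IStateStore<TKey, TState> _store;

    public WriteThroughCache(IStateStore<TKey, TState> store) { _store = store; }

    public TState Get(TKey key) => _cache.GetOrAdd(key, k => _store.Load(k)); // read-through on miss

    public void Put(TKey key, TState state)
    {
        _store.Save(key, state); // write to the store first
        _cache[key] = state;     // then refresh the cache
    }
}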
Here's a toy example.
Let's say we have an application modelling a Library where a user is allowed to check out books. A user is only allowed to check out a total of 3 books at a time.
The 'event' we receive models a book check-in or check-out event:
class BookBorrowEvent {
int userId;
int bookId;
EventType eventType; // EventType.CHECK_IN or EventType.CHECK_OUT
}
In an external data source we maintain a UserState record -- maybe as a distinct record in a traditional RDBMS or an aggregate; how we store it isn't really relevant to the example. But let's say our UserState record as returned from the data source looks something like this:
class UserState {
int userId;
int[] borrowedBookIds;
}
When we receive the event, we'll first retrieve the user state from the external data store (or an internally-managed write-through cache), then add the UserState to the rule inputs. We should be appropriately handling our sessions (disposing of them after use, using session pools as needed), of course.
public void handleBookBorrow(BookBorrowEvent event) {
    // Look up the entity's current state in the external data source
    UserState state = getUserStateFromStore(event.getUserId());
    KieSession kieSession = ...; // obtain a fresh session, e.g. from a pool
    kieSession.insert( event );
    kieSession.insert( state );
    kieSession.fireAllRules();
    kieSession.dispose(); // sessions are designed to be disposed of after use
    // Persist the updated state back to the external data source
    persistUserStateToStore(state);
}
Your rules would then do their work against the UserState instance, instead of storing values in local variables.
Some example rules:
rule "User borrows a book"
when
BookBorrowEvent( eventType == EventType.CHECK_OUT,
$bookId: bookId != null )
$state: UserState( $checkedOutBooks: borrowedBookIds not contains $bookId )
Integer( this < 3 ) from $checkedOutBooks.length
then
modify( $state ) { ... }
end
rule "User returns a book"
when
BookBorrowEvent( eventType == EventType.CHECK_IN,
$bookId: bookId != null )
$state: UserState( $checkedOutBooks: borrowedBookIds contains $bookId )
then
modify( $state ) { ... }
end
Obviously a toy example, but you could easily add additional rules for cases like a user attempting to check out a duplicate copy of a book, a user trying to return a book they hadn't checked out, returning an error when the user exceeds the 3-book borrowing limit, adding time-based logic for the length of checkout allowed, and so on.
Even if you were using stream-based processing so you can take advantage of the temporal operators, this workflow still works, because you would be passing the state instance into the evaluation stream as you receive it. Of course, in this case it would be more important to properly implement a write-through cache for performance reasons (unless your temporal operators are permissive enough to allow for some data source transaction latency). The only change you need to make is to refocus your rules to persist their data to the state object instead of the session itself -- which isn't generally recommended anyway, since sessions are designed to be disposed of.
I currently have a ReliableActor for every user in the system. This actor is appropriately named User, and for the sake of this question has a Location property. What would be the recommended approach for querying Users by Location?
My current thought is to create a ReliableService that contains a ReliableDictionary. The data in the dictionary would be a projection of the User data. If I did that, then I would need to:
Query the dictionary. After GA, this seems like the recommended approach.
Keep the dictionary in sync. Perhaps through Pub/Sub or IActorEvents.
Another alternative would be to have a persistent store outside Service Fabric, such as a database. This feels wrong, as it goes against some of the ideals of using Service Fabric. If I did, I would assume something similar to the above, but using a stateless service?
Thank you very much.
I'm personally exploring the use of Actors as the main datastore (i.e. the source of truth) for my entities. As Actors are added, updated or deleted, I use MassTransit to publish events. I then have reliable stateful services subscribed to these events. The services receive the events and update their internal IReliableDictionary instances. The services can then be queried to find the entities required by the client. Each service keeps only the entity data that it requires to perform its queries.
I'm also exploring the use of EventStore to publish the events as well. That way, if in the future I decide I need to query the entities in a new way, I could create a new service and replay all the events to it.
These Pub/Sub methods do mean the query services are only eventually consistent, but in a distributed system, this seems to be the norm.
While the standard recommendation is definitely as in Vaclav's response, if querying is the exception then Actors could still be appropriate. For me, whether they're suitable or not is defined by the normal way of accessing them; if it's by key (and for a user record it presumably would be), then Actors work well.
It is possible to iterate over Actors, but it's quite a heavy task, so as I say it's only appropriate in the exceptional case. The following code builds up a collection of actor references; you can then iterate over that collection to fetch the actors and use LINQ or similar on it.
ContinuationToken continuationToken = null;
var actorServiceProxy = ActorServiceProxy.Create(new Uri("fabric:/MyActorApp/MyActorService"), partitionKey);
var actorInformation = new List<ActorInformation>();
do
{
    // Page through the actors in this partition until no continuation token remains
    var queryResult = actorServiceProxy.GetActorsAsync(continuationToken, cancellationToken).GetAwaiter().GetResult();
    actorInformation.AddRange(queryResult.Items);
    continuationToken = queryResult.ContinuationToken;
} while (continuationToken != null);
TLDR: It's not always advisable to query over actors, but it can be achieved if required. Code above will get you started.
If you find yourself needing to query across a data set by some data property, like User.Location, then Reliable Collections are the right answer. Reliable Actors are not meant to be queried over this way.
In your case, a user could simply be a row in a Reliable Dictionary.
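As a rough sketch of what that could look like inside a stateful service, assuming a hypothetical UserProjection type kept in sync from the actors (error handling omitted):

// Inside a stateful service (Microsoft.ServiceFabric.Data APIs)
var users = await StateManager.GetOrAddAsync<IReliableDictionary<long, UserProjection>>("users");

// Upsert the projection whenever a User actor publishes a change
using (var tx = StateManager.CreateTransaction())
{
    await users.SetAsync(tx, userId, new UserProjection { Id = userId, Location = location });
    await tx.CommitAsync();
}

// Query by Location: enumerate and filter within the partition
var matches = new List<UserProjection>();
using (var tx = StateManager.CreateTransaction())
{
    var enumerable = await users.CreateEnumerableAsync(tx);
    var enumerator = enumerable.GetAsyncEnumerator();
    while (await enumerator.MoveNextAsync(cancellationToken))
    {
        if (enumerator.Current.Value.Location == "Seattle") // example predicate
            matches.Add(enumerator.Current.Value);
    }
}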
TL;DR
What's the best way to handle dependency between types of data that is loaded asynchronously from different backend endpoints?
Problem
My app fetches data from a backend, for each entity I have an endpoint to fetch all instances.
For example api.myserver.com/v1/users for User model and api.myserver.com/v1/things for Thing model.
This data is parsed and placed into data store objects (e.g. UserDataStore and ThingDataStore) that serve these models to the rest of the app.
Question
What should I do if the data that comes from /things depends on data that comes from /users, and the fetch operations are async? In my case /things returns the id of the user that created each thing. This means that if /things returns before /users, I won't have enough data to create the Thing model.
Options
Have /things return also relevant /users data nested.
This is bad because:
I'll then have multiple User model instances for the same actual user: one that came from /users and one that came nested in /things.
It increases the total payload size transferred.
In a system with some permission policy, the data returned for /users can differ from the data returned nested in /things, which would allow partially populated models into the app.
Create an operational dependency between the two data stores, so that ThingsDataStore will have to wait for UserDataStore to be populated before it attempts to load its own data.
This is also bad because:
Design-wise this dependency is not welcome.
Operationally, it will very quickly become complicated once you throw in more data stores (dependency cycles, etc.).
What is the best solution for my problem and in general?
This is obviously not platform / language dependent.
I see two possible solutions:
Late initialization of UserDataStore in ThingDataStore. You will have to allow for the creation of an object that is not fully valid, and you will also need to add a method that tells you whether the UserDataStore is initialized or not. Not perfect, because for some time an invalid instance will exist.
Create some kind of proxy, or maybe a builder object, for ThingDataStore that holds all the information about a particular thing and creates the ThingDataStore object as soon as the UserDataStore data related to this instance has been received; a sketch of this follows.
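A minimal sketch of that builder idea; all of the types here (RawThing, Thing, User, UserDataStore) are illustrative stand-ins for your own models:

using System.Collections.Generic;

class User { public int Id; }
class RawThing { public int Id; public int CreatorId; } // parsed /things payload
class Thing
{
    public RawThing Raw; public User Creator;
    public Thing(RawThing raw, User creator) { Raw = raw; Creator = creator; }
}

class UserDataStore
{
    private readonly Dictionary<int, User> _users = new Dictionary<int, User>();
    public void Add(User u) { _users[u.Id] = u; }
    public User FindById(int id) { return _users.TryGetValue(id, out var u) ? u : null; }
}

// Buffers a raw /things payload until the user it references is available,
// so no partially valid Thing instance ever escapes into the app.
class ThingBuilder
{
    private readonly RawThing _raw;
    public ThingBuilder(RawThing raw) { _raw = raw; }

    // Returns null until the referenced user has been loaded.
    public Thing TryBuild(UserDataStore users)
    {
        var creator = users.FindById(_raw.CreatorId);
        return creator == null ? null : new Thing(_raw, creator);
    }
}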
Maybe it will help you. Good luck!
I am currently creating a RESTful API with the ASP.NET Web API technology. I have 2 questions related to Web API.
I have done the following:
Created the method below in the controller class:
public HttpResponseMessage PostOrderData(OrderParam OrderInfo)
Based on the parameter OrderInfo, queried SQL Server to get the list of orders.
Set the Response.Content with the collection object:
List<Orders> ordList = new List<Orders>();
//filled the ordList from SQL query result
var response = Request.CreateResponse<List<Orders>>(HttpStatusCode.OK, ordList);
On the client side:
OrderParam ordparam = new OrderParam();
response = client.PostAsJsonAsync("api/order", ordparam).Result;
if (response.IsSuccessStatusCode)
{
List<Orders> mydata = response.Content.ReadAsAsync<List<Orders>>().Result;
}
So the question: is it fine to POST data to the server in order to GET data, i.e. is the usage of POST instead of GET correct here? Is there any disadvantage to this approach? (One disadvantage: I will not be able to query directly from the browser.) I have used POST here because the parameter "OrderParam" might grow in the future, and that could cause problems from the increased length of the URL.
The second question: I have used classes for the parameter and for the returned objects, i.e. OrderParam and Orders. The consumers (clients) of this Web API are different customers, and they will consume the API through .NET (C#) or through jQuery/JS. So do we need to pass the class file containing the definitions of the OrderParam and Orders classes manually to each client, and send it to every client again whenever these classes change?
Thanks in advance
Shah
Typically no.
POST is neither safe nor idempotent, and as such cannot be cached. It is meant to be used for cases where you are changing state on the server.
If you have big criteria, you need to redesign, but in most cases URL fragments or querystring params work. Have a look at OData, which uses the querystring for very complex queries and uses GET.
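For instance, a complex parameter can usually stay on a GET by binding it from the query string. A sketch using Web API's [FromUri] attribute, reusing the OrderParam and Orders types from the question (the query string keys and QueryOrders are hypothetical):

// GET api/order?CustomerId=42&Status=Open
public HttpResponseMessage GetOrderData([FromUri] OrderParam orderInfo)
{
    // [FromUri] binds the complex OrderParam type from query string values,
    // keeping the request cacheable and runnable straight from a browser.
    List<Orders> ordList = QueryOrders(orderInfo); // hypothetical DB query
    return Request.CreateResponse(HttpStatusCode.OK, ordList);
}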
With regard to the second question, also no. The server can expose a schema (similar to WSDL) or docs, but should not know about the client.
Yes you can. REST has nothing to do with security; it is just a convention, and for your Web API you can use POST here, since you do not need any caching for this call.
I'm using Web Api and have a scenario where clients are sending a heartbeat notification every n seconds. There is a heartbeat object which is sent in a POST rather than a PUT, because as I see it they are creating a new heartbeat rather than updating an existing heartbeat.
Additionally, the clients have a requirement that calls for them to retrieve all of the other currently online clients and the number of unread messages that individual client has. It seems to me that I have two options:
Perform the POST followed by a GET, which to me seems cleaner from a pure REST standpoint. I am doing a creation and a retrieval and I think the SOLID principles would prefer to split them accordingly. However, this approach means two round trips.
Have the POST return an object which contains the same information that the GET would otherwise have done. This consolidates everything into a single request, but I'm concerned that this approach would be considered ill-advised. It's not a pure POST.
Option #2 stubbed out looks like this:
public HeartbeatEcho Post(Heartbeat heartbeat)
{
}
HeartbeatEcho is a class which contains properties for the other online clients and the number of unread messages.
Web Api certainly supports option #2, but just because I can do something doesn't mean I should. Is option #2 an abomination, premature optimization, or pragmatism?
Option 2 is not an abomination at all. A POST request creates a new resource, but it's quite common for the resource itself to be returned to the caller. For example, if your resources are items in a database (e.g., a Person), the POST request would send the members required for the INSERT operation (e.g., name, age, address), and the response would contain a Person object which, in addition to the parameters passed as input, also has an identifier (the DB primary key) that can be used to uniquely identify the object.
Notice that it's also perfectly valid for the POST request to return only the id of the newly created resource; that's a choice you have, depending on the requirements of the client.
public HttpResponseMessage Post(Person p)
{
    // Insert the new record and capture the generated primary key
    var id = InsertPersonInDBAndReturnId(p);
    p.Id = id;
    // 201 Created, with the full resource in the body...
    var result = this.Request.CreateResponse(HttpStatusCode.Created, p);
    // ...and a Location header pointing at the newly created resource
    result.Headers.Location = new Uri(this.Request.RequestUri, id.ToString());
    return result;
}
Whichever way solves your business problem will work. You're correct: POST for a new record vs. PUT for an update to an existing record.
SUGGESTION:
One thing you may want to consider is adding Redis to your stack. The apps can POST very fast, and then you could use the Pub/Sub functionality for the echo part, or BLPOP (blocking until a record matches the criteria). It's super fast, may help you scale, and is well suited to what you are trying to do.
See: http://redis.io/topics/pubsub/
See: http://redis.io/commands/blpop
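For illustration, a minimal sketch of the Pub/Sub part using the StackExchange.Redis client; the channel name and payload here are made up:

using System;
using StackExchange.Redis;

var redis = ConnectionMultiplexer.Connect("localhost");
var sub = redis.GetSubscriber();

// Listener side: receive heartbeat echoes as they are published
sub.Subscribe("heartbeat:echo", (channel, message) =>
{
    Console.WriteLine($"echo received: {message}");
});

// API side: after handling the heartbeat POST, publish the echo payload
sub.Publish("heartbeat:echo", "client-42:online");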
I've used Redis for similar scenarios, but also RabbitMQ; with RabbitMQ we added a socket.io connection to "stream" the heartbeat in real time without the need for long polling.