I am working on a web application which needs to get data from some local and some non-local resources and then display it. Since it could take an arbitrary amount of time to get the data from these resources, I am thinking of using the actor model, so that each actor is responsible for getting data from its respective resource. The request thread will wait for each actor to finish its task and then use Ajax to update only the portion of the web page that depends on that data. This way the user will start seeing the data as soon as it is received, rather than waiting for all of it to arrive before getting a first look.
I am planning to look into the Scala/Lift framework for this. I have read some articles on the web about Scala/Lift and want to explore whether this is the correct way to approach the problem, and whether Scala/Lift is a good platform choice. I have worked in Java and C# previously. Any opinions, comments, or suggestions are welcome.
Thanks,
gary
Take a look at message queue technology like Java's JMS. Message queues allow you to handle long-running background tasks asynchronously and reliably. This is the technique sites like Flickr and YouTube use to do media transcoding asynchronously. You can use a Java EE server, or a standalone JMS broker like Apache ActiveMQ, and then layer your Scala/Lift code on top of it.
Richard Monson-Haefel's book on JMS covers it well.
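As a rough illustration, here is what enqueuing and consuming a long-running task might look like from Scala using ActiveMQ's JMS client (the broker URL and queue name are assumptions; in practice the producer and the consuming daemon would live in separate processes):

```scala
import javax.jms.{Session, TextMessage}
import org.apache.activemq.ActiveMQConnectionFactory

object QueueSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical broker URL; adjust for your environment.
    val factory = new ActiveMQConnectionFactory("tcp://localhost:61616")
    val connection = factory.createConnection()
    connection.start()

    val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
    val queue = session.createQueue("fetch-tasks")

    // Producer side: enqueue a long-running fetch task and return immediately.
    val producer = session.createProducer(queue)
    producer.send(session.createTextMessage("fetch:http://example.com/resource"))

    // Consumer side: a back-end daemon would normally run this in a loop.
    val consumer = session.createConsumer(queue)
    val msg = consumer.receive(5000).asInstanceOf[TextMessage] // null if nothing arrives in 5s; ignored in this sketch
    println(s"Processing task: ${msg.getText}")

    connection.close()
  }
}
```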
For more general help with web site scaling and construction, take a look at Todd Hoff's excellent blog, highscalability.com/. There are some good pointers to using message queues to offload long-running tasks this way.
BTW, Twitter uses Scala for something much like what you're considering. Here's an interview with some of their developers; they describe one way they use Scala:
Robey Pointer: A lot of our architecture is based on letting Rails do what it does best, which is the AJAX, the web front ends, the website—what the user sees. Anything we can offload out of the request/response cycle, we do. So we queue those tasks into a messaging system and have back-end daemons handle them.
If the non-local resources originate at some other service or system, Event Driven Architecture might work for you. Instead of pulling from the non-local resources you could set up this web-application as a subscriber to the events published by these services. Upon receiving a message regarding part of its functionality it would cache locally the data it's interested in. This should let you escape the issue of asynchronous update of parts of the page (all data would be accessible locally).
Udi Dahan blogs about this approach a lot and is also an author of a .NET message bus (NServiceBus) that can be used in such scenarios. See for example http://msdn.microsoft.com/en-us/architecture/aa699424.aspx
Actors would be a good way to go. You're essentially setting up a lightweight version of JMS. And Lift does the Comet stuff very well.
In addition to the Scala actors and the Lift actors, you also have Akka actors. When Scala Swarm becomes production-ready, you'll be set for that too.
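A minimal sketch of the one-actor-per-resource idea, here using Akka classic actors (the message types and URL are made up):

```scala
import akka.actor.{Actor, ActorSystem, Props}

// Hypothetical messages for this sketch.
case class Fetch(url: String)
case class Result(url: String, body: String)

// One actor per resource: a slow resource delays only its own
// portion of the page, never the others.
class FetcherActor extends Actor {
  def receive = {
    case Fetch(url) =>
      val body = scala.io.Source.fromURL(url).mkString // blocking fetch, acceptable for a sketch
      sender() ! Result(url, body)
  }
}

object FetcherDemo extends App {
  val system = ActorSystem("fetchers")
  val fetcher = system.actorOf(Props[FetcherActor], "fetcher")
  // In a real app the sender would be the page's comet actor,
  // which pushes the Result into the browser when it arrives.
  fetcher ! Fetch("http://example.com/data")
}
```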
If the delayed information is distinct from the information that needs to display immediately, with Lift you can use a LazyLoad snippet that performs the longer-running computation (calling the web service) as part of its logic. Lift will take care of inserting the result into the page when it's ready.
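A rough sketch of what that might look like; the snippet class and the slow call are hypothetical, while `lazy-load` is Lift's built-in wrapper snippet:

```scala
import net.liftweb.util.Helpers._
import scala.xml.NodeSeq

// Template side (shown as a comment):
//   <div data-lift="lazy-load">
//     <div data-lift="SlowData">
//       <span id="slow-data">loading...</span>
//     </div>
//   </div>
// Lift renders a placeholder immediately and swaps in the snippet's
// output once render has finished.
class SlowData {
  def render: NodeSeq => NodeSeq = {
    val result = callRemoteService() // hypothetical long-running web-service call
    "#slow-data *" #> result
  }

  private def callRemoteService(): String = {
    Thread.sleep(3000) // simulate a slow remote resource
    "data from the remote service"
  }
}
```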
Related
I am unsure how to make use of event-driven architecture in real-world scenarios. Let's say there is a route planning platform consisting of the following back-end services:
user-service (manages user data and roles)
map-data-service (roads & addresses, only modified by admins)
planning-tasks-service (accepts new route planning tasks, keeps track of background tasks, stores results)
The public website will usually request data from all three of those services. map-data-service needs information about user roles on a data change request. planning-tasks-service needs information about users, as well as about map data, to validate new tasks.
Right now those services would just make synchronous requests to each other to get the needed data. What would be the best way to translate this basic structure into an event-driven architecture? Can dependencies be reduced by making use of events? And how would the public website get the data it needs?
Cosmin is 100% correct in that you need something to do some orchestration.
One approach to take, if you have a client that needs data from multiple services, is an Experience API.
Clients call the experience API, which performs the orchestration - pulling data from different sources and providing it back to the client. The design of the experience API is heavily, and deliberately, biased towards what the client needs.
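To make the orchestration concrete, here is a small sketch using Scala futures; the three back-end clients and their payload types are made up for illustration:

```scala
import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global

// Hypothetical back-end clients; names and payloads are assumptions.
def fetchUser(id: String): Future[String]            = Future { "user profile" }
def fetchMapData(region: String): Future[String]     = Future { "map tiles" }
def fetchTasks(userId: String): Future[List[String]] = Future { Nil }

// The experience API starts all three calls up front so they run in
// parallel, then shapes one client-friendly response.
def pageData(userId: String, region: String): Future[(String, String, List[String])] = {
  val userF  = fetchUser(userId)
  val mapF   = fetchMapData(region)
  val tasksF = fetchTasks(userId)
  for { u <- userF; m <- mapF; t <- tasksF } yield (u, m, t)
}
```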
Based on the details you've given so far, I can't see anything that cries out for an event-based architecture. The communication between the client and the experience API can be a mix of sync and async, as can the communication between the experience API and the services.
And for what it's worth, putting all of that on an API gateway is not a bad idea, in that gateways are designed to host APIs and therefore provide the desirable controls and observability for managing them.
Update based on OP Comment
I was really interested in how an event-driven architecture could reduce dependencies between my microservices, as is often stated
Having components (or systems) talk via events is sort of the asynchronous equivalent of Inversion of Control: the event consumers are not tightly coupled to the thing that emits the events. That's how the dependencies are reduced.
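A toy in-process sketch of that decoupling; a real system would use a broker, and the event type, roles, and cache are all made up:

```scala
// Hypothetical domain event; the publisher knows nothing about who listens.
case class UserCreated(userId: String, roles: Set[String])

object EventBus {
  // Not thread-safe; good enough for a toy.
  private var handlers = List.empty[UserCreated => Unit]
  def subscribe(h: UserCreated => Unit): Unit = handlers ::= h
  def publish(e: UserCreated): Unit = handlers.foreach(_(e))
}

// planning-tasks-service caches the user data it needs locally, so
// validating a task no longer requires a synchronous call to user-service.
object PlanningTasksService {
  private val userRoles = scala.collection.mutable.Map.empty[String, Set[String]]
  EventBus.subscribe(e => userRoles(e.userId) = e.roles)

  def canSubmitTask(userId: String): Boolean =
    userRoles.get(userId).exists(_.contains("planner"))
}
```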
One thing you could do, as a little side project and learning exercise, is take a snapshot of your code and do a rough-and-ready conversion to an event-based design, just to see how it goes: not so much an attempt to event-a-cise your solution as a way to see what putting events into a real-world solution looks like. If you have the time, of course.
The missing piece in your architecture is the API Gateway, which should be the only entry point into your system, used directly by the public website.
The API Gateway plays the role of an orchestrator: it decides which services to route each request to, and it assembles the final response needed by the frontend.
For scalability purposes, the communication between the API Gateway and individual microservices should be done asynchronously through an event-bus (or message queue).
However, the most important step in creating a scalable event-driven architecture that leverages microservices is to properly define the bounded contexts of your system and understand the boundaries of each piece of functionality.
More details about this architecture can be found here
Event storming is the first thing you need to do to identify domain events (changes of state in your system), for example 'userCreated', 'userModified', 'locationCreated', 'routeCreated', 'routeCompleted', etc. Then you can define topics that carry these events. Interested parties consume the events by subscribing to the published topics/channels and then act accordingly. An implementation of an event-driven architecture is often composed of loosely coupled microservices that communicate asynchronously through a message broker like Apache Kafka. The free EDA book is an excellent resource covering most aspects of EDA.
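For a feel of the publishing side, here is a minimal Kafka producer sketch in Scala; the broker address, topic name, and JSON payload are assumptions:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Publish a hypothetical 'userCreated' domain event to a Kafka topic;
// interested services subscribe to the topic and react independently.
object DomainEventPublisher {
  private val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092") // assumed broker address
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  private val producer = new KafkaProducer[String, String](props)

  def publishUserCreated(userId: String): Unit =
    producer.send(new ProducerRecord("userCreated", userId, s"""{"userId":"$userId"}"""))
}
```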
Tutorial: Event-driven architecture pattern
I have to build a component that runs in a jvm, uses MongoDB as database and doesn't need a UI. It will be integrated into other products. I'm planning to build this using scala and related tools.
My first thought is to just expose a REST API and let other products integrate using it. While this is acceptable for some products, it isn't for others, for performance reasons. So I have to enable other components to communicate with this one via HTTP, IPC, or message queues. How can I achieve this without much duplication of business logic?
Would the Play framework be the right choice for this, even though there is no UI involved and the component has to accept messages via HTTP, IPC, or message queues?
Using Play for that is OK, but there are frameworks better suited to what you are planning to do; as you already said, Play has a lot of support for frontend features you don't need.
It will not affect runtime speed so much as the time you need for programming, compiling, building, and deployment.
There are some frameworks that might fit your needs better:
Scalatra: nice, easy to use, integrates well with the Java EE stack (see the sketch after this list). http://www.scalatra.org/
Finatra: cool if you have the Twitter stack running; metrics and other goodies almost for free. http://finatra.info/
Skinny Framework: looks nice, never tried it myself
Spray: cool features to come, a little elitist
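To show how lean such a service can be, here is a minimal Scalatra sketch; the route and `ItemService` are hypothetical. The point is that the business logic sits in a shared module that a message-queue consumer can call just as easily as this HTTP route, avoiding duplication:

```scala
import org.scalatra.ScalatraServlet

// Hypothetical shared business logic, reused by every entry point
// (HTTP here, a queue consumer elsewhere).
object ItemService {
  def find(id: String): Option[String] = Some(s"item-$id")
}

class ComponentServlet extends ScalatraServlet {
  get("/items/:id") {
    ItemService.find(params("id")) match {
      case Some(item) => item
      case None       => halt(404, "not found")
    }
  }
}
```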
As a Java developer, I'm used to using Spring Batch for batch processing, generally with a streaming library to export large XML files, with StAX for example.
I'm now developing a Scala application and wonder whether there is any framework, tool, or guideline for batch processing.
My Scala application uses the Cake Pattern and I'm not sure how I could integrate it with Spring Batch. Also, I'd like to follow the guidelines described in Functional Programming in Scala and try to preserve functional purity, using things like the IO monad...
I know this is kind of an open question, but I have never read anything about this...
Has anyone here already done functional batch processing? How did it work? Am I supposed to have a main that builds a batch processing operation in an IO monad and runs it? Is there any tool or guideline for monitoring and restartability, the way we use Spring Batch in Java?
Do you use Spring Batch in Scala?
How do you handle the integration part, for example waiting for a JMS/AMQP message to start the processing that produces an XML file?
Any feedback on the subject is welcome.
You don't mention what kind of app you are developing with Scala, so I'm going to take a wild guess and suppose you are doing a server-side one. Going further with the wild guessing, let's say you are using Akka... because you are using it, aren't you? :)
In that case, I guess what you are looking for is the Akka Quartz Scheduler, the official Quartz extension and utilities for cron-style scheduling in Akka. I haven't tried it myself, but from your requirements it seems that Akka plus this module would be a good fit. Take into account that Akka already provides hooks for handling the restart of failed actors, and I don't think it would be that difficult to add monitoring of batch processes by leveraging the lifecycle callbacks built into actors.
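A sketch of how the wiring might look, based on the extension's documented usage; the schedule name, cron expression, and trigger message are assumptions:

```scala
import akka.actor.{Actor, ActorSystem, Props}
import com.typesafe.akka.extension.quartz.QuartzSchedulerExtension

case object RunBatch // hypothetical trigger message

class BatchActor extends Actor {
  def receive = {
    case RunBatch =>
      // stream the export, write the XML, etc.
      println("batch started")
  }
}

object BatchApp extends App {
  val system = ActorSystem("batch")
  val batch  = system.actorOf(Props[BatchActor], "batch")
  // "NightlyExport" must be declared in application.conf, e.g.:
  //   akka.quartz.schedules.NightlyExport.expression = "0 0 3 * * ?"
  QuartzSchedulerExtension(system).schedule("NightlyExport", batch, RunBatch)
}
```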
Regarding interaction with JMS/AMQP messaging, you could use the Akka Camel module, which provides support for sending and receiving messages over a lot of protocols, including JMS. Using this module you could have a consumer actor receiving messages from some JMS endpoint and firing whatever process you want from there, probably by forwarding or sending a new message to the actor responsible for that process. If the process can be fired either by a cron-style timer or by an incoming message, you can reuse the same actor for both.
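A sketch of such a consumer actor with akka-camel; the endpoint URI and the downstream processing are assumptions:

```scala
import akka.camel.{CamelMessage, Consumer}

// A consumer actor bound to a JMS endpoint. Each incoming message
// kicks off the batch run, e.g. by messaging the actor that owns
// the export process.
class JmsTrigger extends Consumer {
  def endpointUri = "jms:queue:batch-start" // assumed queue name

  def receive = {
    case msg: CamelMessage =>
      println(s"starting batch for: ${msg.bodyAs[String]}")
      // forward to the batch actor here
  }
}
```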
Web applications have undergone a great paradigm shift over the last few years.
A decade ago (and unfortunately even nowadays), web applications lived only in heavyweight servers, which processed everything from data to presentation formats and sent the output to dumb clients (browsers) that only rendered it.
Then AJAX joined the game, and web applications started to turn into something that lived between the server and the browser.
During the climax of AJAX, web-application logic started to live entirely in the browser. I think this was when HTTP RESTful APIs started to emerge. Suddenly every new service had its kind-of RESTful API, and JavaScript MV* frameworks started popping up like popcorn. The use of mobile devices also increased greatly, and REST fits these scenarios just fine. I say "kind-of RESTful" here because almost every API that claims to be REST isn't. But that's an entirely different story.
In fact, I became a sort of a "REST evangelist".
Just when I thought that web applications couldn't evolve much more, a new era seems to be dawning: stateful, persistent-connection web applications.
Meteor is an example of a brilliant framework for that kind of application. Then I saw this video, in which Matt Debergalis talks about Meteor; both do a fantastic job!
However, he somewhat puts down REST APIs for these purposes in favor of persistent real-time connections.
I would very much like to have real-time model updates, for example, while still keeping all the REST awesomeness.
Streaming REST APIs seem like what I need (firehose.io and Twitter's API, for example), but there is very little information on this new kind of API.
So my question is:
Is web-based real-time communication incompatible with REST paradigm?
(Sorry for the long introductory text, but I thought that this question would only make sense with some context)
Stateful persistent TCP/IP connections for web applications are great, as long as you are not moving around.
I have developed a real-time web-based framework, and in my experience I found that when using a browser on a mobile device, the IP address kept changing as I moved from tower to tower, or from wi-fi to wi-fi.
When IP addresses keep changing, the notion that it is a persistent connection evaporates rather quickly.
A framework for real-time web apps has to be architected with the assumption that connections will be transient, and it must implement its own notion of a session while the underlying IP connection to the back end keeps changing.
Once a session has been defined and carried in all requests and responses between clients and servers, one essentially has a 'web connection'. And then one can develop real-time web-based apps using the REST paradigm.
The back-end server of the framework has to be intelligent enough to queue up updates while IP addresses are in transition, and then sync up once a TCP/IP connection has been re-established.
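A toy sketch of that queue-and-drain idea; this is a hypothetical design, not any particular framework's API. The session id travels with every request, so a reconnect from a new IP address drains the same queue:

```scala
import scala.collection.concurrent.TrieMap
import scala.collection.mutable

// Server-side per-session outbox: updates accumulate while the client
// is between connections and are delivered on the next request.
object SessionOutbox {
  private val pending = TrieMap.empty[String, mutable.Queue[String]]

  def enqueue(sessionId: String, update: String): Unit =
    pending.getOrElseUpdate(sessionId, mutable.Queue.empty[String]).enqueue(update)

  // Called when the client re-establishes its connection,
  // possibly from a completely different IP address.
  def drain(sessionId: String): List[String] =
    pending.remove(sessionId).map(_.toList).getOrElse(Nil)
}
```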
The short answer is: yes, you can build real-time web-based apps using the REST paradigm.
If you want to play with one, let me know.
I'm very interested in this subject too. This post has a few links to papers that discuss some of the troubles with poorly-designed RPC:
http://thomasdavis.github.com/2012/04/11/the-obligatory-refutation-of-rpc.html
I am not saying Meteor is poorly designed, because I do not know much about Meteor.
In any case, I think I want the best of both worlds. I want to benefit from REST and all that it affords with the constrained generic interface, addressability, statelessness, and so on.
And, I don't want to get left behind in this "real-time" web revolution either! It's definitely very awesome.
I am wondering if there is not a hybrid approach that can work:
RESTful endpoints can allow a client to enter a space and follow links to related documents, as HATEOAS calls for. But for the "stream of updates" to a resource, perhaps the "subscription name" could itself be a URI. Browsed to in a point-in-time single request, such as through the web browser's address bar or curl, that URI would return either a representation of the "current state", or a list of links with the hrefs of prior states of the resource and/or a way to query the discrete "events" that have occurred against the object.
In this way, if you start with "version 1" of the entity and then replay each of the events against it, you can mutate it up to its "current state", and those events could be streamed to a client that does not want to receive complete representations just because one small part of an entity has changed. This is basically the concept of an "event store", which is covered in lots of the CQRS material out there.
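A toy event-store replay in Scala; the event and entity names are illustrative. Current state is just a left fold of the events over the initial version:

```scala
// Hypothetical domain events for a single entity.
sealed trait EntityEvent
case class Created(name: String)    extends EntityEvent
case class Renamed(newName: String) extends EntityEvent

case class Entity(name: String) {
  def apply(e: EntityEvent): Entity = e match {
    case Created(n) => Entity(n)
    case Renamed(n) => copy(name = n)
  }
}

object Replay {
  // GET host/entity would return this fold's result;
  // GET host/entity/events would return (or stream) its input.
  def currentState(events: Seq[EntityEvent]): Entity =
    events.foldLeft(Entity(""))(_ apply _)
}
```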
As far as being REST-compatible, I believe this approach has been done (though I'm not sure about the streaming side of it), but I cannot remember whether it was in this book http://shop.oreilly.com/product/9780596805838.do (REST in Practice), or in a presentation Vaughn Vernon gave in this recorded talk at QCon 2010: http://www.infoq.com/presentations/RESTful-SOA-DDD.
He talked about a URI design something like this (I don't remember exactly):
host/entity <-- current version of a resource
host/entity/events <-- list of events that have happened to mutate the object into its current state
Example:
host/entity/events/1 <-- this would correspond to the creation of the entity
host/entity/events/2 <-- this would correspond to the second event ever against the entity
He may have also had something there for history, the complete moment-in-time state, like:
host/entity/version/2 <-- this would be the entire state of the entity after the event 2 above.
Vaughn recently published a book, Implementing Domain-Driven Design, which from the table of contents looks like it covers REST and event-driven architecture: http://www.amazon.com/gp/product/0321834577
I'm the author of http://firehose.io/, a realtime framework based on the premise that streaming RESTful APIs can and should exist. From the project website:
Firehose is a minimally invasive way of building realtime web apps without complex protocols or rewriting your app from scratch. It's a dirt-simple pub/sub server that keeps client-side JavaScript models in sync with the server code via WebSockets or HTTP long polling. It fully embraces RESTful design patterns, which means you'll end up with a nice API after you build your app.
It's my hope that this framework keeps the Internet from going back to the RPC dark ages, but we'll see what happens. We do use Firehose.io in production at Poll Everywhere to push millions of messages per day to people on all sorts of different devices. It works.
Is there a way to use a web service (REST in this case) as the data source for a Lift application? I can find a number of tutorials/examples of using Lift to provide a REST API, but in my case the data is hosted elsewhere and exposed as a REST web service. Pointers to documentation are greatly appreciated.
Thanks,
Jeff
This is not related to Lift, in fact. There are already a lot of different pieces of information out there:
the HttpClient library, as was suggested already (see the sketch after this list),
or the Dispatch Scala library for accessing HTTP services,
information on how to cache data in Scala in various ways, in case you need it.
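For the HttpClient option, a minimal Scala wrapper might look like this (the endpoint URL is a placeholder; this uses the Apache HttpClient 4.x API):

```scala
import org.apache.http.client.methods.HttpGet
import org.apache.http.impl.client.HttpClients
import org.apache.http.util.EntityUtils

// Fetch a document from the remote REST service as a string.
object RestSource {
  def fetch(uri: String): String = {
    val client = HttpClients.createDefault()
    try {
      val response = client.execute(new HttpGet(uri))
      try EntityUtils.toString(response.getEntity)
      finally response.close()
    } finally client.close()
  }
}

// Usage: RestSource.fetch("http://api.example.com/data")
```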
Think caching through thoroughly; it is generally a good choice if your application generates a lot of requests and you can afford slightly stale data (a minimal cache sketch follows the list). Caching will let you achieve several goals:
decrease response time, as you no longer depend on the remote service (if you process the data synchronously)
avoid denial of service in case the remote service dies; otherwise your application may open many sockets to read data and exhaust resources (sockets, threads, or something else)
stay within the SLA of the remote service, as many services constrain the number of requests you are allowed to perform per unit of time.
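As promised above, a minimal time-based cache sketch; the TTL policy and generic shape are arbitrary choices for illustration:

```scala
import scala.collection.concurrent.TrieMap

// Caches computed values for ttlMillis; expired entries are recomputed.
class TtlCache[K, V](ttlMillis: Long) {
  private val store = TrieMap.empty[K, (V, Long)]

  def getOrElseUpdate(key: K)(compute: => V): V = {
    val now = System.currentTimeMillis()
    store.get(key) match {
      case Some((v, t)) if now - t < ttlMillis => v // still fresh
      case _ =>
        val v = compute // e.g. RestSource.fetch(...)
        store.put(key, (v, now))
        v
    }
  }
}
```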
So you can just sit down and put these things together, and that's it.
If you really want to be fancy, you can create a Record implementation for a REST-based data source. There's already one of these in existence that works with CouchDB. Using the lift-couchdb module, the interactions with CouchDB are abstracted away and all you deal with is the Scala code. There is a short wiki page with instructions on how to get started with lift-couchdb here:
http://www.assembla.com/wiki/show/liftweb/CouchDB
The pertinent source code files are available here:
http://github.com/lift/lift/tree/master/framework/lift-persistence/lift-couchdb/src/main/scala/net/liftweb/couchdb/
Using the Record interface gives you access to lots of traits that provide functionality with minimal code-writing, such as creating HTML forms, providing lifecycle-based calls, and easy hooks for validation.
I've put a Scala layer over HttpClient and then use that. I've been meaning to put it on GitHub for some time.
I use Dispatch (which is a wrapper around HttpClient) for making REST calls. It looks nice and simple.