Using Akka to separate a Lucene service from a website - scala

I'm developing a website with a Lucene backend. Lucene connects directly to index files, making it difficult to develop the website from machines other than the index machine. Traditional databases have a server running to provide an intermediary between the raw data and the application.
I would like to create such an intermediary between Lucene and my web application. On first thought, Akka seems like the right tool, and I think I would use Akka futures or typed actors to perform the call. However, the Akka Typed Actors page warns:
"A bit more background: TypedActors can very easily be abused as RPC, and that is an abstraction which is well-known to be leaky. Hence TypedActors are not what we think of first when we talk about making highly scalable concurrent software easier to write correctly. They have their niche, use them sparingly."
I think the point is that RPC promotes centralization, but is my plan a good one or an abuse of Akka?
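To make the plan concrete, this is roughly what I have in mind (a minimal sketch; the message types, field names, and remote address are all illustrative, and error handling is omitted):

import akka.actor.{Actor, ActorSystem}
import akka.pattern.ask
import akka.util.Timeout
import org.apache.lucene.queryparser.classic.QueryParser
import org.apache.lucene.search.IndexSearcher
import scala.concurrent.Future
import scala.concurrent.duration._

// Message protocol between the website and the index service
case class Search(queryString: String, maxHits: Int)
case class SearchResults(titles: Seq[String])

// The actor owns the IndexSearcher; all index access is serialized
// through its mailbox, so no extra locking is needed
class LuceneSearchActor(searcher: IndexSearcher, parser: QueryParser) extends Actor {
  def receive = {
    case Search(q, n) =>
      val top = searcher.search(parser.parse(q), n)
      val titles = top.scoreDocs.toSeq.map(sd => searcher.doc(sd.doc).get("title")) // assumes a stored "title" field
      sender() ! SearchResults(titles)
  }
}

// On the website side the call becomes an asynchronous ask, returning a Future
object SearchClient {
  implicit val timeout: Timeout = Timeout(5.seconds)

  def search(system: ActorSystem, q: String): Future[SearchResults] = {
    // With akka-remote configured, this path would point at the index machine
    val service = system.actorSelection("akka.tcp://search@indexhost:2552/user/lucene")
    (service ? Search(q, 10)).mapTo[SearchResults]
  }
}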

Why not use Solr? It provides a server for managing your Lucene indexes (it is basically Lucene with an application layered on top to interact with the data). It would be easier than dealing with actors, and it should provide everything you need.


What is the real benefit of using GraphQL?

I have been reading articles on the web about the benefits of GraphQL, but so far I have not been able to find a single real benefit of it.
The most common benefits mentioned in those articles are below:
No overfetching with GraphQL.
Reducing the number of calls made from the client side.
Data-load control granularity.
Evolving your API without versions.
All of the above make sense, but it is not GraphQL itself that provides these benefits. Any second-layer API written in Java, Python, or any other language would be able to provide them too. It basically introduces another layer of abstraction above the data-retrieval systems (REST or whatever) and decouples the client side from that layer. Once you do that, everything you can do with GraphQL can also be done in any other language.
Anyone can implement, say, a Scala server that retrieves the data from various APIs, integrates it, creates objects internally, and feeds the client with only the relevant part of the data, with total control over the data. This API can easily be versioned and released accordingly. Considering the syntax of GraphQL, how cumbersome it is, and the difficulty of building a good cache around it, I can't see why you would really use it.
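For example, such a hand-rolled second layer might be no more than this (a Scala sketch; the service names and shapes are made up):

import scala.concurrent.{ExecutionContext, Future}

// Hypothetical upstream services, each wrapping some REST API
case class User(id: String, name: String)
trait UserService  { def fetch(id: String): Future[User] }
trait OrderService { def recentFor(userId: String): Future[Seq[String]] }

// The "second layer": fans out to the upstreams, merges the results,
// and returns only what the client asked for
case class UserProfile(id: String, name: String, recentOrders: Seq[String])

class AggregatingApi(users: UserService, orders: OrderService)
                    (implicit ec: ExecutionContext) {
  def profile(id: String, includeOrders: Boolean): Future[UserProfile] =
    for {
      u  <- users.fetch(id)
      os <- if (includeOrders) orders.recentFor(id)
            else Future.successful(Seq.empty[String])
    } yield UserProfile(u.id, u.name, os)
}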
So the overall question is: are there any benefits of GraphQL that the application gets from GraphQL itself, and not from the fact that you have implemented another layer of abstraction between your applications and your APIs?
The best practices known as REST existed earlier, too.
GraphQL is more standardized than REST, safer (no injections), and its syntax gives great flexibility in the face of quickly changing client needs.
It's just a good standard of best practices.
I feel GraphQL is another example of overengineering. I would say the best standards and practices are "keeping it simple."
Breaking down an object and building a custom one before sending it to the client is very basic.

ArangoDB Foxx as a REST back-end

I am working on an app that would greatly benefit from ArangoDB's multi-model capabilities. Considering the app's back-end needs, I have concluded that most, if not all, of it could be served through a REST API, so as to aid cleaner design for future development and integration with others. The API would then be consumed by several web and mobile front-end frameworks to handle the rest of the logic. The project will be developed in JavaScript for the whole stack, using the Node.js ecosystem.
The question itself:
Should and could one use ArangoDB + Foxx to create the complete back-end stack for serving a REST API, thus avoiding another layer/component (e.g. Express/hapi/LoopBack) in the stack?
Major back-end requirements:
Authentication with roles
Sessions
Encryption
Complex querying (the root of my initial thought, so as to avoid multiple hops between the DB and back-end)
Entry parsing, validation and sanitization
Scheduled tasks
Mainly looking for:
Known design advantages
Known design limitations
"Hidden" bottlenecks
Other possible future regrets
Side question (that might answer some of the above): Could Foxx utilise some of the node middleware available via npm?
Thanks in advance for your time!
You can use ArangoDB Foxx as the sole backend of your application; however, it is important to keep the limitations of Foxx (compared to a general-purpose JS environment like Node.js) in mind when doing this.
You mention encryption. While ArangoDB does support some cryptography (e.g. HMAC signing and PBKDF2 key derivation for passwords), the support is not as exhaustive and extensible as in Node.js. Also, computationally expensive cryptography will affect the performance of the database (because, unlike Node.js, Foxx is strictly synchronous, and thus all operations should be considered blocking).
ArangoDB does not support role-based authentication out of the box but it is perfectly reasonable to implement it within ArangoDB using Foxx (just like you would implement it in Node.js, except you don't need to leave the database).
For sessions there are generally two possible approaches: you can either use a collection with session documents (using ArangoDB as your session backend) or you can keep your services stateless by using signed tokens (Foxx comes with JWT support out of the box).
Complex/stored queries and input validation (using the joi schema library originally written for hapi) are actually some of the main use cases of Foxx so those shouldn't be any problem whatsoever.
Foxx comes with its own mechanism for queueing tasks, which can also be scheduled ahead of time or recur periodically. However, depending on your requirements, an external job or message queue may be a better fit. The good thing is that you can get started with the built-in job queue right away and still move to a dedicated solution if the need arises during development.
As for middleware and NPM packages: Foxx is not fully compatible with Node.js code. While we provide a lot of compatibility code and try to keep the core modules compatible where possible, a big difference is that Node.js is generally used to perform asynchronous operations while in ArangoDB all operations are synchronous.
If you have Node.js modules that don't use crypto, file or network I/O and don't use asynchronous APIs (e.g. setTimeout, promises) they may be compatible with Foxx. A lot of utility libraries like lodash work with no problems at all. Even if you find that a module doesn't work it may be possible to write an adapter for it like we have done with mocha (integrated into Foxx) and GraphQL (via the graphql-sync package on NPM).
In my experience it is a good approach to put your Foxx service behind a thin layer of Node.js (e.g. a simple express application that mostly just proxies to your Foxx API) and/or to delegate some parts of your backend to standalone Node.js microservices (e.g. integration with non-HTTP services like e-mail or LDAP) which can be integrated in Foxx via HTTP.
One more thing: while a lot of existing express middleware likely isn't compatible with Foxx because of Node-specific dependencies and async logic, ArangoDB 3 will bring a new version of Foxx with support for middleware using a functionally express-compatible API.
I'm just starting to port my Sails application to a Foxx application, so I can answer some of your questions.
Role-based authorization in ArangoDB is probably at a higher level than you want. In our case, we use an external service to authorize various web- and service-based applications at a very fine-grained level (much lower than a vertex or an edge). My feeling is that authorization at that level will require you to write it yourself in JavaScript. If it's just CRUD on a per-collection basis, then it shouldn't require much effort.
For authorization and sessions, I would look at the Foxx authorization-session example.
It's not clear what you're asking about encryption. If you're talking about SSL connections, then that is natively supported (see ArangoDB endpoints). As for internal encryption, there is a JavaScript crypto module (see the ArangoDB crypto docs).
Entry validation, etc. is supported by the JavaScript joi package.
Complex querying... Absolutely and getting even better in ArangoDB version 3.x. Traversals can be chained (go down using one edge collection, then up using another).
You're right on the ball when thinking about efficiency. This is the main reason we're going from Sails to Foxx. In our case, we filter query results based on permissions from our external service. This means that we can't use ArangoDB's native skip and limit support if these attributes are specified by the client. In Sails, we have to bring back results in chunks and collect them until we hit the appropriate skip and limit values. By moving to Foxx, we save a lot of network and other resources. We tested this by having Sails forward the request to our prototype Foxx implementation. This scaled much better than the Sails post-processing setup.
You can use npm modules, with restrictions. See JavaScript Modules.

Play framework to build an application with no UI that needs to accept requests using REST and IPC and/or message queues

I have to build a component that runs in a JVM, uses MongoDB as its database, and doesn't need a UI. It will be integrated into other products. I'm planning to build this using Scala and related tools.
My first thought is to just have it expose a REST API and let other products integrate using that API. While this is acceptable for some products, it isn't for others, due to performance reasons. So I have to enable other components to communicate with it using either HTTP or IPC or message queues. How can I achieve this without much duplication of business logic?
Would the Play framework be the right choice for this, even though there is no UI involved and there is a need to accept messages via HTTP or IPC or message queues?
Using Play for that is OK, but there are frameworks better suited to what you are planning to do; as you already said, Play has a lot of support for frontend features you don't need.
It will not so much affect the runtime speed as the time you will need for programming, compiling, building and deployment.
There are some frameworks that might fit your needs better:
Scalatra: nice, easy to use, integrates well with the Java EE stack. http://www.scalatra.org/
Finatra: cool if you have the Twitter stack running; metrics and other stuff almost for free. http://finatra.info/
Skinny Framework: looks nice, never tried it myself.
Spray: cool features to come, a little elitist.
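Whichever framework you choose, the duplication concern from the question is usually addressed by keeping the business logic in a plain, transport-agnostic service and making HTTP and the message queue thin adapters over it. A rough sketch using the classic Play API (all names are illustrative, and the queue client is a stand-in, not any specific library):

import play.api.mvc.{Action, Controller}
import scala.concurrent.{ExecutionContext, Future}

// Transport-agnostic business logic
trait WidgetService {
  def find(id: String): Future[Option[String]]  // say it returns a JSON string
}

// HTTP transport: a thin Play controller that only translates HTTP <-> service calls
class WidgetController(service: WidgetService)(implicit ec: ExecutionContext)
    extends Controller {
  def get(id: String) = Action.async {
    service.find(id).map {
      case Some(json) => Ok(json).as("application/json")
      case None       => NotFound
    }
  }
}

// Queue transport: the same service invoked from a consumer callback
class WidgetQueueConsumer(service: WidgetService)(implicit ec: ExecutionContext) {
  def onMessage(id: String, reply: String => Unit): Unit =
    service.find(id).foreach(r => reply(r.getOrElse("""{"error":"not found"}""")))
}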

GWT non-Java backend application structure

Does anyone know of a good example of a GWT application with a non-Java backend? Something like the "Contacts" application on the official GWT pages. I'm interested in the following topics specifically (this is how I see the "back" part of the application, but it may be different, of course):
DTO serialization.
Communication layer (the very back one). Should it use generics and work with any abstract DTO? Is there any other proven approach?
Service layer, which uses the communication layer with specific DTOs.
Requests caching. Should it be implemented on the service or communication level?
Good abstraction, so we can easily substitute parts for testing and other purposes, like using different serializers (e.g. XML, JSON) or different server behaviors when managing user sessions (the URL might change once the user is logged in).
I know, there are many similar topics here, but I haven't found one, which is focused on the structure of the client part.
Use REST on both sides. PHP/Python REST server-side. GWT client-side.
Don't use GWT-RPC.
The following gives a guideline for Java-to-Java REST, but there is leeway to move out of server-side Java. It explains why REST is appropriate.
http://h2g2java.blessedgeek.com/2011/11/gwt-with-jax-rs-aka-rpcrest-part-0.html
REST is an industry-established pattern (Google, Yahoo, in fact every stable-minded establishment deploys REST services).
REST is abstracted as an HTTP-level data structure, which Java, PHP, and Python all have established libraries for, to ensure DTO integrity.
Communication layer (the very back one). Should it use generics?
I don't understand the question or why it exists. Just use the REST pattern to provide integrity across the non-homogeneous languages of server and client, and to the HTTP request/response.
If you are using Java on the back end, there is no escape from using generics. Generics save code. But using generics extensively requires the programmer to have an equally extensive capacity to visualize them. If your back end is in PHP or Python, are there generics for PHP? Python generics? You might as well stay in Java land or C# land and forget about a Java-free service provider.
Did you mean DTO polymorphism? Don't try polymorphism, or decide on it, until after you have established your service. Then adaptively, and with agility, introduce polymorphism into your DTOs if you really see the need. But try to avoid it, because with JSON data interchange it gets rather confusing between server and client, especially if they don't speak the same programming language.
If you are asking about HTTP-level generics: I don't know of any framework, not SOAP, not REST, where you could have generics carried by the XML or JSON.
Service layer? REST.
Requests caching? Cache at every appropriate opportunity. Have service-provider cache query results for items common and static to all sessions like menu, menu/drop-down box choices, labels, etc. Cache your history and places.
On the GWT side, cache records so that the forward/backward buttons will not trigger inadvertent queries. Use the MVP pattern and history to manage history traversal that might trigger a redisplay of info.
If you are talking about unified info abstraction, you should start your project with JAX-RS to define/test the API and perform the data abstraction, without performing any business logic.
Then, once your HTTP-level APIs and DTOs are defined, convert the server side to the language of your choice and proceed to write more complex code.
BTW, I don't dig your terminology "Backend".
We normally use the terminology client-side for service consumer, server-side for service provider, backend for data repository/persistence access, mid-tier or middle-ware for intervening/auxiliary software required to provide mathematical/scientific/graphical analysis/synthesis.
If our terminologies did not coincide, I probably answered this question wrong.

When should I use RequestFactory vs GWT-RPC?

I am trying to figure out if I should migrate my GWT-RPC calls to the new GWT 2.1 RequestFactory calls.
Google's documentation vaguely mentions that RequestFactory is a better client-server communication method for "data-oriented services".
What I can distill from the documentation is that there is a new Proxy class that simplifies the communication (you don't pass the actual entity back and forth, just the proxy, so it is lighter weight and easier to manage).
Is that the whole point or am I missing something else in the big picture?
The big difference between GWT RPC and RequestFactory is that the RPC system is "RPC-by-concrete-type" while RequestFactory is "RPC-by-interface".
RPC is more convenient to get started with, because you write fewer lines of code and use the same class on both the client and the server. You might create a Person class with a bunch of getters and setters and maybe some simple business logic for further slicing-and-dicing of the data in the Person object. This works quite well until you wind up wanting to have server-specific, non-GWT-compatible, code inside your class. Because the RPC system is based on having the same concrete type on both the client and the server, you can hit a complexity wall based on the capabilities of your GWT client.
To get around the use of incompatible code, many users wind up creating a peer PersonDTO that shadows the real Person object used on the server. The PersonDTO has just a subset of the getters and setters of the server-side, "domain", Person object. Now you have to write code that marshals data between the Person and PersonDTO objects, and do the same for every other object type that you want to pass to the client.
RequestFactory starts off by assuming that your domain objects aren't going to be GWT-compatible. You simply declare the properties that should be read and written by the client code in a Proxy interface, and the RequestFactory server components take care of marshaling the data and invoking your service methods. For applications that have a well-defined concept of "Entities" or "Objects with identity and version", the EntityProxy type is used to expose the persistent identity semantics of your data to the client code. Simple objects are mapped using the ValueProxy type.
With RequestFactory, you pay an up-front startup cost to accommodate more complicated systems than GWT RPC easily supports. RequestFactory's ServiceLayer provides significantly more hooks to customize its behavior by adding ServiceLayerDecorator instances.
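GWT itself is all Java, but the structural difference is easy to sketch in a couple of type declarations (Scala here purely as illustration; none of this is actual GWT API):

// What the RPC route tends to grow into: a hand-written DTO shadowing the
// domain class, plus marshalling code repeated for every transferred type
class Person(var name: String, var email: String) { /* server-only logic lives here */ }
case class PersonDTO(name: String, email: String)  // GWT-compatible shadow
object Mappers {
  def toDto(p: Person): PersonDTO = PersonDTO(p.name, p.email)  // one of many such mappers
}

// What the RequestFactory style asks for instead: declare the client-visible
// properties once, as an interface, and let the framework do the mapping
trait PersonProxy {
  def name: String
  def email: String
}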
I went through a transition from RPC to RF. First I have to say my experience is limited in that I used exactly zero EntityProxies.
Advantages of GWT RPC:
It's very easy to set up, understand, and LEARN!
Same class-based objects are used on the client and on the server.
This approach saves tons of code.
Ideal when the same model objects (and POJOs) are used on both client and server: POJOs == MODEL OBJECTS == DTOs.
Easy to move stuff from the server to client.
Easy to share implementation of common logic between client and server (this can turn out as a critical disadvantage when you need a different logic).
Disadvantages of GWT RPC:
Impossible to have different implementations of some methods for server and client; e.g. you might need to use a different logging framework on client and server, or a different equals method.
REALLY BAD implementation that is not further extensible: most of the server functionality is implemented as static methods on an RPC class. THAT REALLY SUCKS.
e.g. it is impossible to add server-side error obfuscation.
Some security XSS concerns that are not quite elegantly solvable, see docs (I am not sure whether this is more elegant for RequestFactory)
Disadvantages of RequestFactory:
REALLY HARD to understand from the official docs what the merit of it is! It starts right with the completely misleading term PROXIES - these are actually the DTOs of RF, created by RF automatically. Proxies are defined by interfaces, e.g. @ProxyFor(Journal.class). The IDE checks whether corresponding methods exist on Journal. So much for the mapping.
RF will not do much for you in terms of commonalities between client and server, because
On the client you need to convert "PROXIES" to your client domain objects and vice versa. This is completely ridiculous. It could be done in a few lines of code declaratively, but there's NO SUPPORT FOR THAT! If only we could map our domain objects to proxies more elegantly; something like the JavaScript method JSON.stringify(...) is MISSING from the RF toolbox.
Don't forget you are also responsible for setting the transferable properties of your domain objects on the proxies, and so on recursively.
POOR ERROR HANDLING on the server: stack traces are omitted by default on the server, and you get empty, useless exceptions on the client. Even when I set a custom error handler, I was not able to get low-level stack traces! Terrible.
Some minor bugs in IDE support and elsewhere. I filed two bug requests that were accepted. No Einstein was needed to figure out that those were actually bugs.
DOCUMENTATION SUCKS. As I mentioned, proxies should be better explained; the term is MISLEADING. For the basic, common problems that I was solving, the DOCS ARE USELESS. Another example of misunderstanding from the docs is the connection of JPA annotations to RF. From the succinct docs it looks like they kinda play together, and yes, there is a corresponding question on Stack Overflow. I recommend forgetting any JPA 'connection' before understanding RF.
Advantages of RequestFactory:
Excellent forum support.
IDE support is pretty good (but it is not an advantage over RPC).
Flexibility of your client and server implementation (loose coupling)
Fancy stuff connected to EntityProxies, beyond simple DTOs: caching, partial updates, very useful for mobile.
You can use ValueProxies as the simplest replacement for DTOs (but you have to do all the not-so-fancy conversions yourself).
Support for Bean Validation (JSR-303).
Considering other disadvantages of GWT in general:
Impossible to run integration tests (GWT client code + remote server) with the provided JUnit support <= all JSNI has to be mocked (e.g. localStorage); SOP is an issue.
No support for testing setup - headless browser + remote server <= no simple headless testing for GWT, SOP.
Yes, it is possible to run selenium integration tests (but that's not what I want)
JSNI is very powerful, but at those shiny conference talks they do not mention that writing JSNI code has its own rules as well. Again, figuring out how to write a simple callback was a task worthy of a true researcher.
In summary, the transition from GWT RPC to RequestFactory is far from a WIN-WIN situation when RPC mostly fits your needs. You end up writing tons of conversions from client domain objects to proxies and vice versa. But you get some flexibility and robustness in your solution. And support on the forum is excellent, on Saturdays as well!
Considering all the advantages and disadvantages I just mentioned, it pays really well to think in advance about whether either of these approaches actually brings improvement to your solution and to your development setup without big trade-offs.
I find the idea of creating Proxy classes for all my entities quite annoying. My Hibernate/JPA POJOs are auto-generated from the database model. Why do I now need to create a second mirror of those for RPC? We have a nice "estivation" framework that takes care of "de-hibernating" the POJOs.
Also, the idea of defining service interfaces that don't quite implement the server-side service as a Java contract but do implement the methods, sounds very J2EE 1.x/2.x to me.
Unlike RequestFactory, which has poor error-handling and testing capabilities (since it processes most of the stuff under the hood of GWT), RPC allows you to use a more service-oriented approach. RequestFactory implements a more modern, dependency-injection-styled approach that can be useful if you need to invoke complex polymorphic data structures. When using RPC your data structures will need to be flatter, as this allows your marshaling utilities to translate between your JSON/XML and Java models. Using RPC also allows you to implement a more robust architecture, as quoted from the GWT dev section on Google's website.
"Simple Client/Server Deployment
The first and most straightforward way to think of service definitions is to treat them as your application's entire back end. From this perspective, client-side code is your "front end" and all service code that runs on the server is "back end." If you take this approach, your service implementations would tend to be more general-purpose APIs that are not tightly coupled to one specific application. Your service definitions would likely directly access databases through JDBC or Hibernate or even files in the server's file system. For many applications, this view is appropriate, and it can be very efficient because it reduces the number of tiers.
Multi-Tier Deployment
In more complex, multi-tiered architectures, your GWT service definitions could simply be lightweight gateways that call through to back-end server environments such as J2EE servers. From this perspective, your services can be viewed as the "server half" of your application's user interface. Instead of being general-purpose, services are created for the specific needs of your user interface. Your services become the "front end" to the "back end" classes that are written by stitching together calls to a more general-purpose back-end layer of services, implemented, for example, as a cluster of J2EE servers. This kind of architecture is appropriate if you require your back-end services to run on a physically separate computer from your HTTP server."
Also note that setting up a single RequestFactory service requires creating around 6 or so Java classes, whereas RPC only requires 3. More code == more errors and complexity in my book.
RequestFactory also has a little more overhead during request processing, as it has to marshal data between the proxies and the actual Java models. This added indirection adds extra processing cycles, which can really add up in an enterprise or production environment.
I also do not believe that RequestFactory services are serializable in the way RPC services are.
All in all, after using both for some time now, I always go with RPC, as it's more lightweight and easier to test and debug, and faster than using RequestFactory. Although RequestFactory might be more elegant and extensible than its RPC counterpart, the added complexity does not necessarily make it a better tool.
My opinion is that the best architecture is to use two web apps, one client and one server. The server is a simple, lightweight, generic Java webapp that uses the servlet.jar library. The client is GWT. You make RESTful requests via GWT-RPC into the server side of the client web application. The server side of the client is just a pass-through to Apache HttpClient, which uses a persistent tunnel into the request handler you have running as a single servlet in your server servlet web application. The servlet web application should contain your database application layer (Hibernate, Cayenne, SQL, etc.). This allows you to fully divorce the database object models from the actual client, providing a much more extensible and robust way to develop and unit test your application. Granted, it requires a tad bit of initial setup time, but in the end it allows you to create a dynamic request factory sitting outside of GWT. This allows you to leverage the best of both worlds. Not to mention being able to test and make changes to your server side without having to have the GWT client compiled or built.
I think it's really helpful if you have heavy POJOs on the client side, for example if you use Hibernate or JPA entities.
We adopted another solution, using a Django style persistence framework with very light entities.
The only caveat I would put in is that RequestFactory uses the binary data transport (deRPC maybe?) and not the normal GWT-RPC.
This only matters if you are doing heavy testing with SyncProxy, Jmeter, Fiddler, or any similar tool that can read/evaluate the contents of the HTTP request/response (like GWT-RPC), but would be more challenging with deRPC or RequestFactory.
We have a very large implementation of GWT-RPC in our project.
Actually, we have 50 service interfaces with many methods each, and we have problems with the size of the TypeSerializers generated by the compiler, which makes our JS code huge.
So we are analyzing a move towards RequestFactory.
I have been reading for a couple of days, digging into the web and trying to find out what other people are doing.
The most important drawback I saw, and maybe I could be wrong, is that with RequestFactory you are no longer in control of the communication between your server domain objects and your client ones.
What we need is to apply the load/save pattern in a controlled way. I mean, for example, the client receives the whole object graph belonging to a specific transaction, makes its updates, and then sends the whole thing back to the server. The server is responsible for validation, comparing old with new values, and persistence. If two users from different sites get the same transaction and make updates, the resulting transaction shouldn't be a merge; one of the updates should fail in my scenario.
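Concretely, the conflict rule we need on save is just a version check, nothing GWT-specific (an illustrative sketch):

case class Txn(id: String, version: Long, data: Map[String, String])

// In-memory stand-in for the persistence layer
class TxnRepository(private var stored: Map[String, Txn]) {
  // A save succeeds only if the client edited the version it originally
  // loaded; a second writer working from a stale copy fails instead of
  // being silently merged
  def save(updated: Txn): Either[String, Txn] = synchronized {
    stored.get(updated.id) match {
      case Some(current) if current.version != updated.version =>
        Left(s"conflict: ${updated.id} was modified by another user")
      case _ =>
        val next = updated.copy(version = updated.version + 1)
        stored += (next.id -> next)
        Right(next)
    }
  }
}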
I don't see that RequestFactory helps supporting this kind of processing.
Regards
Daniel
Is it fair to say that, when considering a limited MIS application, say with 10-20 CRUD'able business objects, each with ~1-10 properties, it really comes down to personal preference which route to go with?
If so, then perhaps projecting how your application is going to scale could be the key to choosing between GWT RPC and RequestFactory:
My application is expected to stay with that relatively limited number of entities, but their record counts will increase massively: 10-20 objects * 100,000 records.
My application is going to increase significantly in the breadth of entities, but the relative numbers of each will remain low: 5,000 objects * 100 records.
My application is expected to stay with that relatively limited number of entities AND will stay at relatively low record counts, e.g. 10-20 objects * 100 records.
In my case, I'm at the very starting point of trying to make this decision. It is further complicated by having to change the UI client-side architecture as well as making the transport choice. My previous (significantly) large-scale GWT UI used the Hmvc4Gwt library, which has been superseded by the GWT MVP facilities.