Enterprise NoSQL Stack Solution for Mobile/Web

I've been tasked with investigating a full-stack solution for our firm in which we'll use a NoSQL database backend. It will most likely be fed from a data warehouse and/or operational data store of some type in near-realtime (hopefully :). It will be used mainly by our mobile and web applications via REST.
A few requirements/assumptions:
It will be read-only (in the near term) and consumed by clients in REST format
It has to be scalable
Fast response time
Enterprise support - or, if actual support is lacking, something industry-proven if open source (basically, management wants to hold someone accountable if something in the stack fails)
Minimal client data transformations - i.e., data should be stored in as close to a ready-to-use format as possible
Service API management of some sort will most likely be needed (e.g., 3scale)
Services will be used internally, but the solution shouldn't prevent us from exposing them externally as a long-term goal
Micro-services are preferable (provided sufficient API management is in place)
We have in-house expertise in Java and Grails for our mobile/portal solutions
Some of the options I was tossing around were:
CouchDB: inherently speaks REST - no need for a translation layer; as long as clients speak REST, we're all good (see the sketch after this list)
MongoDB: needs a REST layer between the client and the DB - I haven't found a widely used one in my investigation (the ones on MongoDB's site all seem to be in their infancy, e.g., RESTHeart)
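To make the CouchDB option concrete, here is a minimal sketch of what "speaks REST natively" buys you: a plain HTTP GET returns the stored document as JSON, with no translation layer in between. The host, database name, and document ID are hypothetical placeholders.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CouchDbGet {
        public static void main(String[] args) throws Exception {
            // CouchDB exposes every document at /{db}/{docid}.
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:5984/mydb/some-doc-id"))
                    .header("Accept", "application/json")
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            // The body is the stored JSON document, ready for a REST client to consume.
            System.out.println(response.statusCode() + ": " + response.body());
        }
    }

Any mobile or web client that can issue this GET can consume the data directly; the trade-off, as the answers below point out, is what you give up by not having a layer in between.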
Some questions I have:
Do I need an app server, or any layer between the client and the DB for performance/caching reasons? I was thinking a reverse proxy like nginx would be a good fit here?
Why not use CouchDB in this solution if it supports REST out of the box?
I'm struggling to decide which NoSQL DB to use and whether or not I need a REST translation layer, an app server, etc. I've read the pros and cons of each, and mostly they say go with Mongo - but for what I'm trying to do, the lack of a mature REST layer is concerning.
I'm just looking for some ideas, tips, lessons learned that anyone out there would be willing to share.
Thanks!

The problem with exposing the database directly to the client is that most databases do not support permission control as fine-grained as you want it to be. You often cannot allow a client to view and edit its own data while also forbidding it from viewing and editing the data of other users or, even worse, of the server itself. At least not if you still want a sane database schema.
You will also often find yourself in the situation where you have a document with several fields, of which only some are supposed to be under the control of the user. I can, for example, edit the content of this answer, but I cannot edit the time it was posted, the name it was posted under, or its voting score. So far I have never seen a database system which can handle permissions for individual fields (if anyone has: feel free to post in the comments).
You might think about trying to handle this on the client and simply not offering any user interface for editing said fields. But that will only work in a trusted environment. When you have untrusted users, they could create a clone of your client-side application which does expose this functionality. There is no way for you to tell the difference between the genuine client and a clone, especially not when you don't have a smart application server (and even then it is practically impossible).
For that reason it is almost always necessary to have an application server between the clients and the database which handles authentication and permission management and forwards only permitted requests to the persistence layer.
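As a concrete illustration of the field-level point, here is a minimal sketch of what such an application-server check might look like: the server accepts the client's payload but copies over only the fields the user is allowed to change. The class, field names, and permission model are hypothetical.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    public class AnswerUpdateHandler {
        // Only these fields may be changed by the author; postedAt, author,
        // and score are server-controlled and silently ignored if sent.
        private static final Set<String> EDITABLE_FIELDS = Set.of("content", "tags");

        public Map<String, Object> applyClientEdit(Map<String, Object> stored,
                                                   Map<String, Object> clientPayload) {
            Map<String, Object> updated = new HashMap<>(stored);
            for (String field : EDITABLE_FIELDS) {
                if (clientPayload.containsKey(field)) {
                    updated.put(field, clientPayload.get(field));
                }
            }
            return updated; // only now is this handed to the persistence layer
        }
    }

This is exactly the kind of per-field rule that is trivial in an application server and very hard to express in the database itself.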

I totally agree with the answer from @Philipp. In the case of using CouchDB, you will at minimum want a proxy server in front to enable SSL.
Almost all of your requirements can be fulfilled by CouchDB. Especially the upcoming v2 will cover your "datacenter needs".
But it's simply very complex to say what the right tool for your purpose is. If you get business-model requirements on top, like, say, throttling, then you will definitely need application-server middleware like restify (http://mcavage.me/node-restify/).
Maybe it's a good idea to spend some money on professionals like http://www.neighbourhood.ie/couchdb-support/? (I'm not involved.)

Related

Best architecture for Kafka consumer

I'm creating an application (a web application) that needs to consume data (update client transactions) from a Kafka broker, but I'm not sure what the best way to approach this is.
I can think of three different scenarios to process each update:
Install the Kafka consumer directly in my app; then I can just start another instance of it (I'm using Docker, so another container) and make the required updates there (I think this is the fastest one; a minimal consumer sketch follows this list).
Create a separate service that consumes from Kafka and make the required updates in the app database. It seems to be pretty much the same as option 1, but a smaller app and more maintenance (2 apps instead of 1).
Create a separate service that consumes from Kafka and sends the updates to a REST endpoint in my app. This would be a tiny, very specific service, and the processing remains in the app; but the app will receive more requests.
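For scale, here is roughly what option 1 amounts to with the plain Kafka Java client: a consumer loop embedded in the app. The broker address, group id, and topic name are placeholders, and the update logic is stubbed out.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class TransactionUpdateConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
            props.put("group.id", "transaction-updaters");      // scale by adding instances
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("client-transactions"));
                while (true) {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // In option 1 this calls straight into the app's update logic.
                        System.out.printf("updating transaction %s -> %s%n",
                                record.key(), record.value());
                    }
                }
            }
        }
    }

Because consumers in the same group share partitions, starting another container with the same group.id is all it takes to spread the load, which is what makes option 1 attractive operationally.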
So, what are the pros/cons of each solution? Are all of them valid, or are some of them a complete no-go? What drawbacks/risks should I be aware of?
I'm not looking just for a recommendation; I am more interested in understanding which solution works best in a given scenario.
Thank you.
With option 3 you are splitting your application into multiple services. When you distribute your code across multiple services, you increase the level of indirection. The more indirection you have in your codebase, the harder it is for one person to work across the entire codebase: they have to keep more things in their head, working across network boundaries requires a lot more code than working across files, and it's harder to debug across a network API.
Now, this doesn't mean that it's bad to split your application into multiple services. Doing so will help you scale your application as you can scale only the pieces that need scaling. Perhaps more importantly, splitting your application into multiple services makes it easier for more people to work on the codebase at the same time, since they have to adhere to the API contracts between the services, and are less likely to be working on the same files at the same time.
So 3 is a good choice if you have scaling issues, either for load on your application, or the number of developers that will work on it.
1 is a good choice if you want to move as quickly as possible and can put off scaling concerns for some time.
2 is the worst of both worlds. Your two services will be coupled by the database schema and will share the same database instance. The separation of code means you have extra indirection, the schema coupling means you won't fully get the people-scaling benefits, and since most applications are bottlenecked by the database, sharing the DB instance will deprive you of the ability to scale independently for performance.
Personal rule of thumb:
If you have control of the REST API code, then the first one.
If the API has specific validation before reaching the database, don't do the second one unless you plan on copying that code into the consumer. If you want to write directly to a database, then Kafka Connect is the suggested framework for that rather than a plain consumer, anyway.
If you don't control the API code (it's a third-party API), then you are left with option 3.

Is our Microservices Design Wrong?

In our project we are using microservices, and each microservice makes REST calls to the same third-party service. As far as I know, we should make REST calls like ms1 -> ms2, where ms2 -> third-party API; instead we are getting data directly via ms1 -> third-party API.
So, are we doing it wrong? Or is my assumption wrong?
All our microservices are hitting the same third-party API. Do you think we can apply the concepts of event sourcing and CQRS in this case? Or do you think we can apply a messaging-system concept here in order to improve the architecture?
What's your suggestion for the architecture?
EDIT:
3) The reason I asked the above is that most of my microservices use for loops and make REST calls to the third-party API inside each loop, which means performance suffers. I am thinking that if each of our microservices had its own DB, then whenever we make a GET call to the third-party REST API we could save the response in that DB. Later, when making further requests, we would first check whether the data exists in the DB; if yes, just return the data from the DB, otherwise make the REST call and save the new data in that microservice's DB. Is my thinking wrong?
4) Is making a REST call faster, or is retrieving data from a DB faster? Sorry if it's a dumb query.
While I don't think what you're doing is inherently wrong, there are a number of things you need to consider:
Temporal coupling / availability: if during a single process you make multiple repeated calls to a third-party service, you run the risk of increased unavailability.
Service boundaries: if one service needs data from another, is the data ownership really correct? Does the service holding the data need or use all of it, or only a small subset?
I recommend you read about why microservices should be autonomous, which covers the above points in more detail.
As for messaging, I strongly recommend it where availability is of high importance. Using asynchronous communication will solve the issue of temporal coupling (one service not being able to fulfill its function while the other is not responding), which immediately increases availability, at the cost of larger windows of inconsistency or responding with stale data. The actual business impact of working with slightly stale data is (from experience) non-existent, but you will need to be aware of this fact to avoid surprises.
Event sourcing is a methodology I recommend highly in certain contexts, but it does come at a cost, and with a lot of gotchas, so I don't recommend it as a default approach. It can yield surprisingly accurate results, with a very small number of bugs appearing after launch, but you need to have:
Direct access to domain experts, who are happy to engage the development team
A development team who is happy to work side-by-side with domain experts
A shift in mentality away from how we think about data when using normal-form or document stores
While DDD isn't necessarily a requirement, it does make the discovery process much easier and safer, so I'd strongly recommend you are at least aware of it.
UPDATE (to address edit)
Regarding your point 4: this depends on your design/architecture, but most probably it would be faster to hit your local DB than to make a remote call. You can use the same caching strategies to access the data (if that is needed for performance) in both microservices, but you'll pay the cost of an additional network hop to reach the remote microservice.
Regarding your point 3: nothing changes from what I already suggested. Temporal coupling has a greater impact the more times you call the remote service, so you're making your situation worse by keeping the data outside of your service. The boundary argument again applies to a similar extent: if you need the data in another service, maybe you should look into the possibility that the data should belong to the microservice which uses it, or that some parts of the data belong to one microservice and other parts to the other.
In general, I was (and still am) suggesting that you keep a local copy of the subset of data that you need to do your operation, especially if this is a common operation.
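To make the local-copy suggestion concrete, here is a minimal cache-aside sketch. The third-party URL is hypothetical, and an in-memory map stands in for the microservice's own DB; a real service would persist and expire these entries.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class ThirdPartyClient {
        private final HttpClient http = HttpClient.newHttpClient();
        // Stand-in for the microservice's local DB.
        private final ConcurrentMap<String, String> localStore = new ConcurrentHashMap<>();

        public String getResource(String id) throws Exception {
            String cached = localStore.get(id);
            if (cached != null) {
                return cached; // local hit: no remote call, no temporal coupling
            }
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://third-party.example.com/resources/" + id))
                    .GET()
                    .build();
            HttpResponse<String> response =
                    http.send(request, HttpResponse.BodyHandlers.ofString());
            localStore.put(id, response.body()); // save for subsequent requests
            return response.body();
        }
    }

Note that this inherits the staleness trade-off discussed above: you would still need an expiry or invalidation strategy appropriate to your data.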
A short answer to your question:
There is nothing correct or wrong in your design, but if you want to significantly improve it, use a message broker like Apache Kafka (a cost-effective alternative could be Hazelcast). This will definitely make your architecture more robust and failure-resistant.
A second alternative: use a design pattern like facade, combined with the capability of making async calls.
Note: I am new to this platform and in the initial stages of answering, so any suggested improvements will be appreciated. Thanks.

Is it a good practice to publicly document an open API?

Everybody is aware of the benefits and drawbacks of an open API.
But is it a good or a bad practice to publicly document the open API (which requires authentication for its requests)?
By publicly document I mean creating a documentation showing the structure of the body of the request that the API can receive and by giving descriptions for all these fields.
E.g., given an endpoint my-public.url/myendpoint/myresource with available PUT, POST, DELETE, and GET HTTP requests, there is a static page my-public.url/document/myendpoint which shows all the acceptable HTTP requests along with descriptions of the headers and the body of the request that are needed in order to perform it.
On the one hand, this will help external developers use the API easily, but on the other hand, if somebody gains access somehow, it would be easy for them to make requests and corrupt the system, since the whole structure of the API is given.
You can look at this from the risk perspective. Providing public documentation for an API presents a risk, for the reason you mentioned: it may help an attacker. On the other hand, security is always a balance, and providing documentation helps (or is even necessary for) your users.
Also, you shouldn't rely on security by obscurity, i.e., how things work should be considered known to attackers - though it's true that in reality that's often not the case.
As providing public documentation is a risk, you then have to treat it somehow. You can do several things with risk, for example you can accept it (~do nothing), eliminate it (~not provide documentation in this case), or mitigate it.
Mitigating this risk would mean additional things you do to make an exploit less likely, or to decrease its impact. Likelihood can be reduced by, for example, stronger controls around how you develop your software, adding automated testing around authentication and authorization features, adding static code analyzers to the mix, and so on. Impact can be reduced by good architecture that separates logical layers, intrusion detection/prevention systems, or even going single-tenant instead of multi-tenant.
In the end it all comes down to what risk you want to accept, and that entirely depends on you. With proper controls, it is OK to provide public docs - how else could you expect users to be able to use your API? The question is what "proper" controls are, and that depends on your risk appetite.
The risk in your application's APIs is not increased at all by publicly documenting them.
Here are some reasons that come to mind:
If you have a browser-based client, anyone who knows how to inspect the browser developer tools' Network tab can find your API details.
If a mobile client uses your API, the request information can easily be viewed using applications like Wireshark. The popular API testing application Postman also supports such functionality.
With the help of the above tools, anyone can most likely discover your API details anyway.
Advantages of public documentation
API consumers can just visit your public API documentation, which clearly saves a lot of time in developer-consumer communication. (Versioned APIs and documentation will be very helpful when existing endpoints change.)
Creating API demo/test kits like Postman Collections will help API consumers test and use the APIs easily.
Here are some measures you can take to reduce the risks in your application:
Authentication - Login credentials / Access token / API Keys
Authorization - Access check for any resource which is being accessed / modified.
API rate limiting - to avoid a DoS attack (a minimal sketch follows).
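On the rate-limiting point, here is a minimal token-bucket sketch; the capacity and refill rate are arbitrary example values, and a real deployment would keep one bucket per API key or client.

    public class TokenBucket {
        private final long capacity;
        private final double refillPerNano;
        private double tokens;
        private long lastRefill;

        public TokenBucket(long capacity, double tokensPerSecond) {
            this.capacity = capacity;
            this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
            this.tokens = capacity;
            this.lastRefill = System.nanoTime();
        }

        // Returns true if the request is allowed, false if the caller is over the limit.
        public synchronized boolean tryAcquire() {
            long now = System.nanoTime();
            tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
            lastRefill = now;
            if (tokens >= 1.0) {
                tokens -= 1.0;
                return true;
            }
            return false;
        }
    }

A request filter would call tryAcquire() on the caller's bucket and reject with HTTP 429 when it returns false.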

EventStore vs. MongoDB

I would like to know what advantages there are to using EventStore (http://geteventstore.com) over implementing event sourcing yourself on MongoDB.
The reason I ask is that our company has a number of people who work with MongoDB daily. They don't work with event sourcing, though. While they are not completely in the dark about the subject, they aren't about to start implementing it anywhere either.
I am about to start a project, that is perfectly suited for Event Sourcing. There are about 16 very well defined events, and about 7 well defined projections. I say "about" because I know there will be demand for more projections and events once they see the product in use.
The approach is going to be API-first, with a REST API that other parts of our organisation are going to consume.
While I have read a lot about Event Sourcing the way Greg Young defines it, I have never actually implemented an Event Sourcing solution.
This is a greenfield project. There are no technology restrictions, since we are going to expose everything as a REST interface. So if anyone has working experience with EventStore or event sourcing with MongoDB, please enlighten me.
Also, an almost totally unrelated question about event sourcing:
Do you ever query the event store directly? Or would you always create new projections and replay events to populate those projections?
Disclaimer: I am Greg Young (if you can't read my name :))
I am going to answer this question even though I believe it will likely get deleted anyway. This question alone is a bit odd to me, but the answers are fairly bizarre. I won't take the time to answer each reply individually, but will instead put all of my comments in this reply.
1) There is a comment that we only run on a custom version of Mono, which is a detail, but... this is not the case (and has not been for over a year). We were waiting for critical patches we made to Mono (for example, threadpool.c) to hit their master. This has happened.
2) EventStore is 3-clause BSD licensed. Not sure how you could claim we are not Open Source. We also have a company behind it and provide commercial support.
3) Someone mentioned us going on to version 3 in September. Version 1 was released 2 years ago. Version 2 added clustering (obviously with some breaking changes vs single node). Version 3 is adding a ton of stuff, including the ability to have competing consumers. Very little has changed in terms of the actual client protocol over this time (especially for those using the HTTP API).
What is really disturbing for me in the recommendations, however, is that they don't seem to understand what they are comparing. It would be roughly the equivalent of me asking, "Which should I use, neo4j or LevelDB?". You could build yourself a graph database on top of LevelDB, but it would be quite a bit of work.
Mongo in this case would be the storage engine underneath an event store the OP would have to write him/herself. Writing a production-quality event store on top of a storage engine is a non-trivial exercise if you want even the most basic operations.
I wrote this in response to the mailing list equivalent of this question:
How will you do the following with Mongo?:
Write and read events to/from streams with ordering/optimistic concurrency/etc. (a sketch of what just this point takes on Mongo follows below)
Then:
Your projections don't want to read from streams in the same way they were written; projections are normally interested in event types and want all events of type T, regardless of the stream they were written to, in proper order.
You probably also want, for instance, the ability to switch live from pushed event notifications to handling pulled information (e.g., polling), etc.
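To illustrate just the first point: one common way to get ordering plus optimistic concurrency on Mongo is a unique index on (streamId, version), so that a concurrent writer appending at the same expected version fails with a duplicate-key error. This is a sketch of the idea under assumed collection and field names, not production code, and it deliberately covers only that first point.

    import com.mongodb.MongoWriteException;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.IndexOptions;
    import com.mongodb.client.model.Indexes;
    import org.bson.Document;

    public class MongoEventStream {
        private final MongoCollection<Document> events;

        public MongoEventStream(MongoClient client) {
            events = client.getDatabase("es").getCollection("events");
            // A unique (streamId, version) index makes concurrent appends at the
            // same expected version fail with a duplicate-key error.
            events.createIndex(Indexes.ascending("streamId", "version"),
                    new IndexOptions().unique(true));
        }

        // Appends one event, expecting the stream to currently be at expectedVersion.
        public void append(String streamId, long expectedVersion, Document payload) {
            Document event = new Document("streamId", streamId)
                    .append("version", expectedVersion + 1)
                    .append("payload", payload);
            try {
                events.insertOne(event);
            } catch (MongoWriteException e) {
                throw new IllegalStateException(
                        "Concurrency conflict on stream " + streamId, e);
            }
        }
    }

Even with this in place, the projection-ordering and live-subscription requirements above remain unsolved, which is exactly the point: they are what turn a storage engine into an event store.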
It would make more sense if Kafka, Datomic, and Event Store were being compared.
Seeing as the other replies don't talk about the tooling or benefits of EventStore and only refer to the benefits of MongoDB, I'll chime in. But note that my experience is limited.
I'll start with the cons...
There are a lot of check-ins, which leaves you deciding which version you are going to actively support yourself. While the team has been solidifying their releases, the fact that they arrived at version 3 less than 18 months after the initial release should be an indicator that you will have to swap the version you support for a more recent one (which can also impact the platform you choose to deploy to).
It's not going to work easily on every platform (especially if you're trying to move to a cloud environment or a Docker-based LXC container). Some of this is due to the community surrounding other DBs such as Mongo. But the team seems to have been working their butts off on read/write performance while maintaining cross-platform stability. As time presses on, I've found that you don't want to deviate too far from a bare-metal OS installation, which in this day and age is not attractive.
It uses a special version of Mono, and finding support for older versions of Mono only serves to make the process more of a root canal.
To make the most of EventStore's performance you really need to think about your architecture. EventStore outputs to flat files, and event data can grow pretty quickly. What's the failure rate of the disks you are persisting your data to? How are things compressed? Archived? You have a lot of control, and the control is geared towards storing your data as events. However, while I'm sure Greg Young himself could quote me to my grave the features that optimize and save your disks in the long term, I'll more than likely find a mature Mongo community that has had experience running into similar cases.
And the Pros...
RESTful - it's AtomPub. Is your stream not specific enough? Create another and do HTTP GETs to your heart's content (a sketch of such a GET follows this list). Concerned about routing? Do an HTTP forward. Concerned about security? Put an HTTP proxy in front. Simple!
You have a nice suite of tools and a UI for testing out and building your projections as your events start to generate new data (e.g., use the Chrome browser as a way to debug your projections... yes, they're written in JavaScript).
Read performance - since the application outputs to flat files, you can get kernel-level caching and expose events via HTTP at the drop of a hat. Also, indexes across your streams allow querying projections against larger data sets (though I really get the feeling index performance will creep up on you over time).
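As an illustration of the AtomPub point above, reading a stream is just an HTTP GET against the stream's URL. The stream name is hypothetical; 2113 is the usual out-of-the-box HTTP port, and the Accept header asks for the Atom feed in JSON form.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class ReadStream {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            // Fetches the head of the stream as an Atom feed; follow the feed's
            // paging links to walk back through older events.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:2113/streams/my-stream"))
                    .header("Accept", "application/vnd.eventstore.atom+json")
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        }
    }

Because it's plain HTTP, everything mentioned above (proxies, forwards, kernel-level caching) applies with no special client library.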
I personally would not use this for a core, mission-critical, or growing application! However, if you have a side case for keeping your evented environment interesting, then I'd give it a go! I personally have to stick to Mongo for now.

Own Backend vs BaaS

I am trying to decide between two development firms. One wants to go with Parse, while the other wants to build a backend. I would like to get feedback and reasons why building a backend or using a BaaS such as Parse or StackMob is better in terms of scalability and performance.
For example, let's take Snapchat, a heavily used app that handles millions of users and data requests. What would happen if a newly created app were to experience a large increase in users and data requests? Would the backend be able to handle this? Would I be looking at having it fixed shortly after the increase in users?
Something like Parse.com gives you a lot of value for very little capital investment. With BaaS, all of the gory details of infrastructure management are hidden. Deployment, system capacity issues, system availability, system security, database administration, and a myriad of other tasks simply go away when using a good BaaS. Parse.com, for instance, uses Amazon Web Services and elastic load balancing to dynamically add more capacity to the system as usage increases. This is the nirvana of capacity management.
Parse.com is a special kind of BaaS. Parse.com's intended purpose is to be a lightweight backend for mobile apps. I believe Parse.com is a very good mobile backend-as-a-service (MBaaS - link to a Forrester article on the subject).
That said, there are times when Parse.com is not the right solution. Estimate the number of users for the application and the number of HTTP requests an average user would send in a day. Parse.com charges by the number of transactions. The Pro Plan has these limits:
15 million HTTP requests per month
Burst limit of 40 requests per second
Many small transactions can result in a higher cost to the app owner. For example, if there are 4,500 users, each sending 125 HTTP requests to Parse.com per day, then you are already looking at 4,500 × 125 × 30 = 16,875,000 requests every 30 days. Parse.com also offers a higher level of service called Parse Enterprise. Details about this plan are not published.
The services provided by a BaaS/MBaaS save much time and energy on the part of the application developer, but impose some constraints. For example, the response time of Parse.com might be too slow for your needs. Unless you upgrade to their Enterprise plan, you have no control over response times. You currently have no control over where your app is hosted (Parse apps are presently run out of Amazon's data centers in Virginia, I believe).
The BaaS providers I have looked at do not provide quality-of-service metrics. Even if they did, there is no community agreement on what metrics would be meaningful. You just get what you get and hope it is good enough for your needs.
An application is a good candidate for an MBaaS if:
It is simple or the application logic can run entirely on the client (phone, tablet...)
It is impossible to estimate the number of users or the number of users could be huge.
You don't want a big upfront capital investment.
You don't want to hire infrastructure specialists to handle capacity/security/data/recovery/network engineering.
Your application does not have strict response time requirements.
Parse's best use case is the iPhone developer who wrote a game and needs to store users' high scores, but knows nothing about servers. That said, complex applications like Hipmunk are using Parse. Have a look at Parse.com's portfolio of case studies. Can you imagine your application in that portfolio, or is it very different from those apps?
Even if a BaaS is not the right solution, a PaaS or IaaS might be. Look at Rackspace and AWS. In this day and age, buying hardware and running a data center is tough to justify.
BaaS providers like ApiOmat or Parse have to handle the requests of thousands of apps, and every app can have lots of users. The providers are forced to make the system absolutely secure and scalable, because any issue on either of those points would be the end of their business... Building scalable, secure backends on your own is not as easy as you would expect. The companies mentioned above have invested many man-years in that.