Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I'm creating a web application that needs to consume data (client transaction updates) from a Kafka broker, but I'm not sure what the best way to approach this is.
I can think of three different scenarios to process each update:
1. Install the Kafka consumer directly in my app. Then I can just start another instance of the app (I'm using Docker, so another container) and make the required updates there (I think this is the fastest option).
2. Create a separate service that consumes from Kafka and makes the required updates in the app database. It seems to be pretty much the same as option 1, but with a smaller app and more maintenance (two apps instead of one).
3. Create a separate service that consumes from Kafka and sends the updates to a REST endpoint in my app. This would be a tiny, very specific service, and the processing stays in the app; but the app will receive more requests.
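The following library-agnostic sketch shows how options 1 and 3 differ. A list of `(offset, value)` pairs stands in for records from a real Kafka consumer. `TransactionStore`, the two handlers and the `/transactions/{id}` endpoint are hypothetical. In practice you would use a Kafka client library and commit offsets through it.

```python
import json
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Tuple

# (offset, raw message value): a stand-in for a record returned by a real Kafka consumer.
Record = Tuple[int, bytes]


@dataclass
class TransactionStore:
    """Hypothetical persistence layer of the web app (stands in for its database)."""
    rows: Dict[str, float] = field(default_factory=dict)

    def apply_update(self, update: dict) -> None:
        # Validation lives once, inside the app, where its REST API would also use it.
        if update.get("amount") is None:
            raise ValueError("update without amount")
        # Upsert keyed by transaction id, so replaying a record is harmless.
        self.rows[update["transaction_id"]] = update["amount"]


def in_process_handler(store: TransactionStore) -> Callable[[dict], None]:
    """Option 1: the consumer runs inside the app and calls its logic directly."""
    return store.apply_update


def rest_handler(base_url: str) -> Callable[[dict], None]:
    """Option 3: a small service forwards each update to a hypothetical app endpoint."""
    def handle(update: dict) -> None:
        request = urllib.request.Request(
            f"{base_url}/transactions/{update['transaction_id']}",
            data=json.dumps(update).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        with urllib.request.urlopen(request, timeout=5) as response:
            response.read()  # urlopen raises HTTPError on 4xx/5xx responses
    return handle


def consume(records: Iterable[Record], handle: Callable[[dict], None], start_offset: int = 0) -> int:
    """Handle records in order and return the next offset to commit.

    The offset only advances past records that were handled successfully, giving
    at-least-once processing: after a failure, processing resumes from the returned
    offset, so handlers must tolerate seeing a record twice.
    """
    next_offset = start_offset
    for offset, value in records:
        try:
            handle(json.loads(value))
        except Exception:
            break  # stop here; this record will be retried from next_offset
        next_offset = offset + 1
    return next_offset
```

To write to the database safely, option 2 would need its own copy of the validation in `apply_update`. That is the duplication risk the answers raise.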
So, what are the pros and cons of each solution? Are all of them valid, or are some of them a definite no? What drawbacks/risks should I be aware of?
I'm not looking just for a recommendation, I am more interested in understanding which solution works best for a given scenario.
Thank you.
With 3 you are splitting your application into multiple services. When you distribute your code across multiple services, you increase the level of indirection. The more indirection your codebase has, the harder it is for one person to work across all of it. They have to keep more things in their head, working across network boundaries requires a lot more code than working across files, and debugging across a network API is harder.
Now, this doesn't mean that it's bad to split your application into multiple services. Doing so will help you scale your application as you can scale only the pieces that need scaling. Perhaps more importantly, splitting your application into multiple services makes it easier for more people to work on the codebase at the same time, since they have to adhere to the API contracts between the services, and are less likely to be working on the same files at the same time.
So 3 is a good choice if you have scaling issues, either for load on your application, or the number of developers that will work on it.
1 is a good choice if you want to move as quickly as possible and can put off scaling concerns for some time.
2 is the worst of both worlds. Your two services will be coupled by the database schema and will be sharing the same database instance. The separation of code means that you have extra indirection, the database schema coupling means that you won't fully get the people scaling benefits, and since most applications are bottlenecked by the database, the sharing of the db instance will deprive you of scaling independently for performance.
Personal rules of thumb:
If you control the REST API code, go with the first option.
If the API does specific validation before reaching the database, don't do the second one unless you plan on copying that code into the consumer. If you want to write directly to a database, Kafka Connect is the suggested framework for that anyway, not a plain consumer.
If you don't control the API code (it's a third-party API), you are left with option 3.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
In our project we are using microservices, and each microservice makes REST calls to the same third-party service. As far as I know, we should route calls like ms1 -> ms2, where ms2 -> third-party API. Instead, we are getting data directly from ms1 -> third-party API.
So, are we doing it wrong? Or is my assumption wrong?
All our microservices are hitting the same third-party API. Do you think we can apply the concepts of event sourcing and CQRS in this case? Or could we apply a messaging system to improve the architecture?
What's your suggestion for the architecture?
EDIT:
3) I asked the above because most of my microservices use for loops and make REST calls to the third-party API inside each loop, which reduces performance. I am thinking of adding a database to each of our microservices. Whenever we make a GET call to the third-party REST API, we would save the response in that database. Later requests would first check whether the data exists in the database. If it does, we return it from there; otherwise we make the REST call and save the new data in that microservice's database. Is my thinking wrong?
4) Is making a REST call faster, or is retrieving data from the DB faster? Sorry if it's a dumb question.
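The approach in point 3 is commonly called the cache-aside pattern. The sketch below uses the standard library's `sqlite3` as the per-service store. `fetch` stands in for the real (hypothetical) third-party REST call. The time-to-live is an added assumption so that stale entries eventually get refreshed.

```python
import json
import sqlite3
import time
from typing import Callable


class CacheAside:
    """Check the local DB first; on a miss or an expired entry, call the remote API and store the result."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.time):
        self.fetch = fetch            # hypothetical third-party client call
        self.ttl = ttl_seconds
        self.clock = clock            # injectable so expiry can be tested
        self.db = sqlite3.connect(":memory:")  # a real service would use its own database
        self.db.execute(
            "CREATE TABLE cache (key TEXT PRIMARY KEY, body TEXT, fetched_at REAL)"
        )

    def get(self, key: str) -> dict:
        row = self.db.execute(
            "SELECT body, fetched_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None and self.clock() - row[1] < self.ttl:
            return json.loads(row[0])  # cache hit: no network call
        body = self.fetch(key)         # miss or stale entry: go to the remote API
        self.db.execute(
            "INSERT OR REPLACE INTO cache (key, body, fetched_at) VALUES (?, ?, ?)",
            (key, json.dumps(body), self.clock()),
        )
        self.db.commit()
        return body
```

Within the TTL, callers may see data that the third party has already changed. That staleness is the price of skipping the network call.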
While I don't think what you're doing is inherently wrong, there are a number of things you need to consider:
Temporal coupling / Availability: If during a single process you make multiple repeated calls to a third party service, you run the risk of increased unavailability
Service boundaries: If one service needs data from another, is the data ownership really correct? Does the service holding the data need or use all of it, or only a small subset?
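The temporal-coupling point can be put into rough numbers. Assume each call to the third-party service succeeds independently with availability a. A process that needs n sequential calls then succeeds only if every call does:

```latex
A_{\text{process}} = a^{n}
\qquad\text{e.g.}\qquad
a = 0.999,\; n = 10 \;\Rightarrow\; A_{\text{process}} = 0.999^{10} \approx 0.990
```

So ten calls to a 99.9%-available dependency turn roughly 0.1% unavailability into roughly 1%. Real failures are often correlated, so treat this as an illustration of the trend rather than a prediction.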
I recommend reading "Microservices should be autonomous", which covers the above points in more detail.
As for messaging, I strongly recommend it where availability is of high importance. Asynchronous communication solves the issue of temporal coupling (one service being unable to fulfill its function while the other is not responding). That immediately increases availability, at the cost of larger windows of inconsistency or responses with stale data. In my experience, the actual business impact of working on slightly stale data is non-existent, but you need to be aware of it to avoid surprises.
Event sourcing is a methodology I recommend highly in certain contexts, but it comes at a cost and with a lot of gotchas, so I don't recommend it as a default approach. It can yield surprisingly accurate results, with very few bugs appearing after launch, but you need to have:
Direct access to domain experts, who are happy to engage the development team
A development team who is happy to work side-by-side with domain experts
A shift in mentality away from how we think about data when using normal form stores or document stores
While DDD isn't necessarily a requirement, it does make the discovery process much easier and safer, so I'd strongly recommend you at least be aware of it.
UPDATE (to address edit)
Regarding your point 4: this depends on your design/architecture, but most probably it would be faster to hit your local DB than to make a remote call. You can use the same caching strategies to access the data (if that is needed for performance) in both microservices, but you'll pay the cost of an additional network call to reach the remote microservice.
Regarding your point 3: nothing changes from what I already suggested. Temporal coupling has a greater impact the more times you call the remote service, so you're making your situation worse by keeping the data outside of your service. The boundary argument applies to a similar extent. If you need the data in another service, consider whether the data should belong to the second microservice, which uses it. Alternatively, part of the data may belong to one microservice and other parts to the other.
In general, I was (and still am) suggesting that you keep a local copy of the subset of data that you need to do your operation, especially if this is a common operation.
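The sketch below keeps a local copy of only the subset of data a service needs. The copy is updated asynchronously from events published by the service that owns the data, instead of being fetched on every request. A `queue.Queue` stands in for the message broker. The `customer_updated` event and its fields are hypothetical.

```python
import queue
import threading


class CustomerReadModel:
    """Local, possibly slightly stale copy of the customer fields this service needs."""

    def __init__(self) -> None:
        self._data = {}
        self._lock = threading.Lock()

    def apply(self, event: dict) -> None:
        if event["type"] != "customer_updated":
            return  # ignore events this service does not care about
        # Keep only the subset of fields this service actually uses.
        subset = {"name": event["name"], "credit_limit": event["credit_limit"]}
        with self._lock:
            self._data[event["id"]] = subset

    def get(self, customer_id: str):
        with self._lock:
            return self._data.get(customer_id)


def run_subscriber(events: queue.Queue, model: CustomerReadModel) -> None:
    """Drain events until a None sentinel arrives (stands in for a broker subscription)."""
    while True:
        event = events.get()
        if event is None:
            break
        model.apply(event)
```

Reads never leave the service. If the owning service is down, requests still succeed with the last known data. The cost is the window of staleness described earlier.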
The short answer to your question:
There is nothing right or wrong in your design, but if you want to significantly improve it, use a message broker like Apache Kafka (a cost-effective alternative could be Hazelcast). This will definitely make your architecture more robust and failure resistant.
A second alternative is to use a design pattern such as a facade, combined with the ability to make asynchronous calls.
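One possible reading of the facade suggestion is sketched below with `asyncio`. A single facade owns access to the third-party API. It issues per-item calls concurrently rather than one at a time inside a `for` loop, and a semaphore bounds how many requests are in flight. `ThirdPartyFacade`, the `fetch_item` callable and the limit are hypothetical. A real implementation would wrap an async HTTP client.

```python
import asyncio
from typing import Awaitable, Callable, List


class ThirdPartyFacade:
    """Single entry point to the third-party API for the rest of the system."""

    def __init__(self, fetch_item: Callable[[str], Awaitable[dict]], max_concurrency: int = 10):
        self._fetch_item = fetch_item  # hypothetical async call to the third-party API
        # Caps in-flight requests so concurrency does not turn into flooding the provider.
        # Create the facade inside a running event loop (e.g. within asyncio.run).
        self._semaphore = asyncio.Semaphore(max_concurrency)

    async def _fetch_one(self, item_id: str) -> dict:
        async with self._semaphore:
            return await self._fetch_item(item_id)

    async def fetch_many(self, item_ids: List[str]) -> List[dict]:
        """Replaces a sequential for loop of calls; results come back in input order."""
        return list(await asyncio.gather(*(self._fetch_one(i) for i in item_ids)))


async def main() -> None:
    async def fake_fetch(item_id: str) -> dict:
        await asyncio.sleep(0.01)  # pretend network latency
        return {"id": item_id}

    facade = ThirdPartyFacade(fake_fetch, max_concurrency=3)
    print(await facade.fetch_many(["a", "b", "c", "d"]))


if __name__ == "__main__":
    asyncio.run(main())
```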
Note: I am new to this platform and just starting to answer questions, so any suggestions for improvement are appreciated. Thanks.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I'm tasked with investigating for our firm a full-stack solution where we'll be using a NoSQL database backend. It'll most likely be fed from a data warehouse and/or operational data store of some type in near-realtime (hopefully :). It will be used mainly by our mobile and web applications via REST.
A few requirements/assumptions:
It will be read-only (in the near term) and consumed by clients in REST format
It has to be scalable
Fast response time
Enterprise support - or if lacking actual support, something industry proven if open-source (basically management wants to hold someone accountable if something in the stack fails)
Minimal client data transformations - i.e., data should be stored in as close to a ready-to-use format as possible
Service API management of some sort will most likely be needed (e.g. 3scale)
Services will be used internally, but the solution shouldn't prevent us from exposing them externally as a long-term goal
Micro-services are preferable (provided sufficient API management is in place)
We have in-house expertise in Java and Grails for our mobile/portal solutions
Some of the options I was tossing around were:
CouchDB: inherently returns REST, so no translation layer is needed; as long as clients speak REST, we're all good
MongoDB: needs a REST layer between the client and the DB; based on my investigation I haven't found a widely used one (the ones on Mongo's site all seem to be in their infancy, e.g. RestHeart)
Some questions I have:
Do I need an appserver? Or any layer in between the client and DB for performance/caching reasons? I was thinking a reverse-proxy like nginx would be a good idea for this?
Why not use CouchDB in this solution if it supports REST out of the box?
I'm struggling with deciding between which NoSQL DB to use, whether or not I need a REST translation layer, appserver, etc. I've read the pros and cons of each and mostly they say go Mongo - but for what I'm trying to do the lack of a mature REST layer is concerning.
I'm just looking for some ideas, tips, lessons learned that anyone out there would be willing to share.
Thanks!
The problem with exposing the database directly to the client is that most databases do not support permission control as fine-grained as you want it to be. You often cannot allow a client to view and edit its own data while also forbidding it from viewing and editing any data of other users, or worse, of the server itself. At least not while keeping a sane database schema.
You will also often find yourself in the situation where you have a document with several fields, of which only some are supposed to be under the control of the user. I can, for example, edit the content of this answer, but I cannot edit the time it was posted, the name it was posted under, or its voting score. So far I have never seen a database system that can handle permissions for individual fields (if anyone has, feel free to post in the comments).
You might think about handling this on the client by simply not offering any user interface for editing those fields. But that only works in a trusted environment. When you have untrusted users, they could create a clone of your client-side application that does expose this functionality. There is no way for you to tell the difference between the genuine client and a clone, especially not without a smart application server (and even then it is practically impossible).
For that reason it is almost always required to have an application server between clients and database which handles authentication and permission management of the clients and only forwards those requests to the persistence layer which are permitted.
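The sketch below shows the kind of check such an application server performs before forwarding a write to the database. It verifies ownership per document and allows changes only to whitelisted fields. This mirrors the answer's example of editing an answer's content but not its timestamp, author or score. The document shape and `EDITABLE_FIELDS` are hypothetical.

```python
from typing import Any, Dict

# Fields a regular user may change on their own post; everything else is server-controlled.
EDITABLE_FIELDS = frozenset({"content"})


def apply_user_edit(doc: Dict[str, Any], user_id: str, changes: Dict[str, Any]) -> Dict[str, Any]:
    """Return an updated copy of `doc`, or raise PermissionError if the edit is not allowed."""
    if doc["owner_id"] != user_id:
        raise PermissionError("users may only edit their own documents")
    forbidden = set(changes) - EDITABLE_FIELDS
    if forbidden:
        raise PermissionError(f"fields not editable by users: {sorted(forbidden)}")
    updated = dict(doc)
    updated.update(changes)
    return updated  # only now would the server forward the write to the database
```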
I totally agree with the answer from @Philipp. If you use CouchDB, you will at minimum want a proxy server in front of it to enable SSL.
Almost all of your requirements can be fulfilled by CouchDB. Especially the upcoming v2 will give you the "datacenter-needs".
But it's very hard to say what the right tool for your purpose is. If you get some business requirements on top, let's say throttling, then you will definitely need an application server middleware like http://mcavage.me/node-restify/
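Throttling of this kind is usually implemented with a token bucket per client. Below is a generic sketch of the technique, not restify's API. The rate and capacity are placeholders. The clock is injectable so the behaviour can be tested deterministically.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Throttle:
    """One bucket per client key (e.g. API key or IP); a middleware would answer HTTP 429 when allow() is False."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self._make = lambda: TokenBucket(rate, capacity, clock)
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, client: str) -> bool:
        if client not in self._buckets:
            self._buckets[client] = self._make()
        return self._buckets[client].allow()
```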
Maybe it's a good idea to spend some money on professionals like http://www.neighbourhood.ie/couchdb-support/ ? (I'm not involved)
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
I would like to know what advantages there are to using EventStore (http://geteventstore.com) over implementing event sourcing yourself in a MongoDb.
The reason I ask, is that our company has a number of people that work with MongoDb daily. They don't work with Event Sourcing though. While they are not completely in the dark about the subject, they aren't about to start implementing it anywhere either.
I am about to start a project, that is perfectly suited for Event Sourcing. There are about 16 very well defined events, and about 7 well defined projections. I say "about" because I know there will be demand for more projections and events once they see the product in use.
The approach is going to be API first, with a REST Api that other parts of our organisation are going to consume.
While I have read a lot about Event Sourcing the way Greg Young defines it, I have never actually implemented an Event Sourcing solution.
This is a greenfield project with no technology restrictions, since we are going to expose everything as a REST interface. So if anyone has working experience with EventStore or event sourcing with MongoDb, please enlighten me.
Also, an almost totally unrelated question about event sourcing:
Do you ever query the event store directly? Or would you always create new projections and replay event to populate those projections?
Disclaimer: I am Greg Young (if you can't read my name :))
I am going to answer this question though I believe it will likely get deleted anyways. This question alone for me is a bit odd, but the answers are fairly bizarre. I won't take the time to answer each reply individually but will instead put all of my comments in this reply.
1) There is a comment that we only run on a custom version of Mono. That is a detail, but... it is not the case (and has not been for over a year). We were waiting on critical patches we made to Mono (for example, threadpool.c) to land in their master. This has happened.
2) EventStore is 3-clause BSD licensed. Not sure how you could claim we are not Open Source. We also have a company behind it and provide commercial support.
3) Someone mentioned us going on to version 3 in Sept. Version 1 was released 2 years ago. Version 2 added Clustering (obviously some breaking changes vs single node). Version 3 is adding a ton of stuff including ability to have competing consumers. Very little has changed in terms of the actual client protocol over this time (especially for those using the HTTP API).
What is really disturbing for me in the recommendations however is that they don't seem to understand what they are comparing. It would be roughly the equivalent of me saying "Which should I use neo4j or leveldb?". You could build yourself a graph database on top of leveldb but that would be quite a bit of work.
Mongo in this case would be a storage engine on the event store the OP would have to write him/herself. The writing of a production quality event store is a non-trivial exercise on top of a storage engine if you want to have even the most basic operations.
I wrote this in response to the mailing list equivalent of this question:
How will you do the following with Mongo?:
Write and read events to/from streams with ordering/optimistic concurrency/etc
Then:
Your projections don't want to read from streams in the same way they were written. Projections are normally interested in event types: they want all events of type T, regardless of which stream they were written to, and in proper order.
You probably also want, for instance, the ability to switch live from pushed event notifications to pulled information (e.g. polling), etc.
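To make the list above concrete, here is a deliberately tiny in-memory sketch of the first two requirements. It covers appending to a stream with optimistic concurrency (an expected version), and reading all events of one type across streams in global order for a projection. Everything here is hypothetical and single-process. Durability, concurrent writers, subscriptions and the push/pull switch are exactly the parts that make a production event store hard, whatever storage engine sits underneath.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


class WrongExpectedVersion(Exception):
    """Raised when the writer's view of the stream is out of date."""


@dataclass(frozen=True)
class Event:
    stream: str
    version: int      # position within its own stream, starting at 0
    position: int     # position in the global log across all streams
    event_type: str
    data: dict


class InMemoryEventStore:
    """Single-process toy: no durability, no concurrent writers, no subscriptions."""

    def __init__(self) -> None:
        self._log: List[Event] = []
        self._streams: Dict[str, List[Event]] = {}

    def append(self, stream: str, expected_version: int,
               events: List[Tuple[str, dict]]) -> int:
        """Append (event_type, data) pairs; use expected_version=-1 for a new stream."""
        current = self._streams.setdefault(stream, [])
        actual_version = len(current) - 1
        if actual_version != expected_version:
            raise WrongExpectedVersion(
                f"{stream}: expected version {expected_version}, actual {actual_version}")
        for event_type, data in events:
            event = Event(stream, len(current), len(self._log), event_type, data)
            current.append(event)
            self._log.append(event)
        return len(current) - 1  # the stream's new version

    def read_stream(self, stream: str) -> List[Event]:
        return list(self._streams.get(stream, []))

    def read_all_of_type(self, event_type: str, from_position: int = 0) -> Iterator[Event]:
        """What a projection wants: every event of one type, across streams, in global order."""
        for event in self._log[from_position:]:
            if event.event_type == event_type:
                yield event
```

A projection can then be a fold over `read_all_of_type(...)` that remembers the last `position` it processed, so it can resume.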
It would make more sense if Kafka, datomic, and Event Store were being compared.
Seeing as the other replies don't talk about the tooling or benefits in EventStore and only refer to the benefits of MongoDB I'll chime in. But note that my experience is limited.
I'll start with the cons...
There are a lot of check-ins, which forces you to decide which version you are going to actively support yourself. The team has been solidifying their releases. Still, reaching version 3 less than 18 months after the first release should tell you that you will have to keep upgrading the version you support to a more recent one (which can also affect the platform you deploy to).
It's not going to work easily on every platform (especially if you're trying to move to a cloud environment or a Docker-based LXC container). Some of this is due to the community surrounding other DBs such as Mongo. But the team seems to have been working their butts off on read/write performance while maintaining cross-platform stability. As time goes on, I've found that you don't want to deviate too far from a bare-metal OS deployment, which in this day and age is not attractive.
Uses a special version of Mono. Finding support for older versions of Mono only makes the process more of a root canal.
To get the most performance out of EventStore, you really need to think about your architecture. EventStore writes to flat files, and event data can grow pretty quickly. What's the failure rate of the disks you are persisting your data to? How are things compressed? Archived? You have a lot of control, and that control is geared towards storing your data as events. I'm sure Greg Young himself could quote me to my grave the features that optimize and save your disks in the long term. Even so, I'm more likely to find a mature Mongo community that has already run into similar cases.
And the Pros...
RESTful - It's AtomPub. Is your stream not specific enough? Create another and do HTTP GETs to your heart's content. Concerned about routing? Do an HTTP forward. Concerned about security? Put an HTTP proxy in front. Simple!
You have a nice suite of tools and a UI for testing and building your projections as your events start to generate new data (e.g. use the Chrome browser to debug your projections; yes, they're written in JavaScript).
Read performance - Since the application writes to flat files, you get kernel-level caching and can expose them via HTTP at the drop of a hat. Indexes also span your streams for querying projections against larger data sets (but I really get the feeling index performance will creep up on you over time).
I personally would not use this for a core, mission-critical, or growing application! However, if you have a side case for keeping your evented environment interesting, then I'd give it a go! I personally have to stick with Mongo for now.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
I am trying to decide between two development firms. One wants to go with Parse, while the other wants to build a backend. I would like to get feedback and reasons why building a backend or using a BaaS such as Parse or StackMob is better in terms of scalability and performance.
For example, take Snapchat, a heavily used app that handles millions of users and data requests. What would happen if a newly created app experienced a large increase in users and data requests? Would the backend be able to handle this? Would I need to have it fixed shortly after the increase in users?
Something like Parse.com gives you a lot of value for very little capital investment. With a BaaS, all of the gory details of infrastructure management are hidden. Deployment, system capacity issues, system availability, system security, database administration and a myriad of other tasks simply go away when using a good BaaS. Parse.com, for instance, uses Amazon Web Services and Elastic Load Balancing to dynamically add capacity to the system as usage increases. This is the nirvana of capacity management.
Parse.com is a special kind of BaaS. Its intended purpose is to be a lightweight back end for mobile apps. I believe Parse.com is a very good mobile backend-as-a-service (MBaaS, a category Forrester has written about).
That said, there are times when Parse.com is not the right solution. Estimate the number of users for the application and the number of HTTP requests an average user would send in a day. Parse.com charges by the number of transactions. The Pro Plan has these limits:
15 million HTTP requests per month
Burst limit of 40 requests per second
Many small transactions can result in a higher cost to the app owner. For example, if there are 4,500 users, each sending 125 HTTP requests to Parse.com per day, then you are already looking at 16,875,000 requests every 30 days. Parse.com also offers a higher level of service called Parse Enterprise; details about this plan are not published.
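This arithmetic is easy to rerun for your own estimates. The small sketch below checks a usage estimate against the Pro Plan limits quoted above (15 million requests per month, and a burst limit of 40 requests per second). The 30-day month and the peak-to-average factor are assumptions that you should replace with your own numbers.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class PlanLimits:
    monthly_requests: int
    burst_per_second: float


PRO_PLAN = PlanLimits(monthly_requests=15_000_000, burst_per_second=40)


def monthly_requests(users: int, requests_per_user_per_day: int, days: int = 30) -> int:
    return users * requests_per_user_per_day * days


def check_plan(users: int, requests_per_user_per_day: int, plan: PlanLimits,
               peak_to_average: float = 5.0, days: int = 30) -> dict:
    """Compare an estimate with plan limits; peak_to_average is a guessed traffic-shape factor."""
    total = monthly_requests(users, requests_per_user_per_day, days)
    average_rps = users * requests_per_user_per_day / SECONDS_PER_DAY
    peak_rps = average_rps * peak_to_average
    return {
        "monthly_requests": total,
        "within_monthly_limit": total <= plan.monthly_requests,
        "average_rps": average_rps,
        "within_burst_limit": peak_rps <= plan.burst_per_second,
    }
```

Take the answer's example inputs (4,500 users, 125 requests per day each). The monthly total exceeds the 15 million limit, even though the average rate is only about 6.5 requests per second, well under the burst limit.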
The services provided by a BaaS/MBaaS save much time and energy on the part of the application developer, but impose some constraints. For example, the response time of Parse.com might be too slow for your needs. Unless you upgrade to their Enterprise plan, you have no control over response times. You currently have no control over where your app is hosted (Parse apps are presently run out of Amazon's data centers in Virginia, I believe).
The BaaS providers I have looked at do not provide quality-of-service metrics. Even if they did, there is no community agreement on what metrics would be meaningful. You just get what you get and hope it is good enough for your needs.
An application is a good candidate for an MBaaS if:
It is simple or the application logic can run entirely on the client (phone, tablet...)
It is impossible to estimate the number of users or the number of users could be huge.
You don't want a big upfront capital investment.
You don't want to hire infrastructure specialists to handle capacity/security/data/recovery/network engineering.
Your application does not have strict response time requirements.
Parse's best use case is the iPhone developer who wrote a game and needs to store the user's high scores, but knows nothing about servers. That said, complex applications like Hipmunk are using Parse. Have a look at Parse.com's portfolio of case studies. Can you imagine your application in that portfolio, or is it very different from those apps?
Even if a BaaS is not the right solution, a PaaS or IaaS might be. Look at Rackspace and AWS. In this day and age, buying hardware and running a data center is tough to justify.
BaaS providers like apiOmat or Parse have to handle the requests of thousands of apps, and every app can have lots of users. The providers are forced to make their systems absolutely secure and scalable, because any issue on either point would be the end of their business. Building scalable, secure backends on your own is not as easy as you would expect. The companies mentioned above have invested several man-years in it.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
We have written a software package for a particular niche industry. This package has been pretty successful, to the extent that we have signed up several different clients in the industry, who use us as a hosted solution provider, and many others are knocking on our doors. If we achieve the kind of success that we're aiming for, we will have literally hundreds of clients, each with their own web site hosted on our servers.
Trouble is, each client comes in with their own little customizations and tweaks that they need for their own local circumstances and conditions, often (but not always) based on local state or even county legislation or bureaucracy. So while probably 90-95% of the system is the same across all clients, we're going to have to build and support these little customizations.
Moreover, the system is still very much a work in progress. There are enhancements and bug fixes happening continually on the core system that need to be applied across all clients.
We are writing code in .NET (ASP, C#), MS-SQL 2005 is our DB server, and we're using SourceGear Vault as our source control system. I have worked with branching in Vault before, and it's great if you only need to keep 2 or 3 branches synchronized - but we're looking at maintaining hundreds of branches, which is just unthinkable.
My question is: How do you recommend we manage all this?
I expect answers will be addressing things like object architecture, web server architecture, source control management, developer teams etc. I have a few ideas of my own, but I have no real experience in managing something like this, and I'd really appreciate hearing from people who have done this sort of thing before.
Thanks!
I would recommend against maintaining separate code branches per customer. Keeping working code in sync with your Core that way is a nightmare.
I do recommend you implement the Strategy Pattern and cover your "customer customizations" with automated tests (e.g. unit and functional) whenever you change your Core.
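The sketch below applies the Strategy Pattern to per-customer differences. The Core depends on one interface, a default implementation serves most customers, and a customer-specific class overrides only what differs. The fee rule, class names and customer IDs are hypothetical stand-ins for the kind of local-legislation tweaks described in the question.

```python
from abc import ABC, abstractmethod
from decimal import ROUND_HALF_UP, Decimal
from typing import Dict


class FeeStrategy(ABC):
    """The one point where customers are allowed to differ."""

    @abstractmethod
    def filing_fee(self, amount: Decimal) -> Decimal: ...


class DefaultFees(FeeStrategy):
    def filing_fee(self, amount: Decimal) -> Decimal:
        return (amount * Decimal("0.02")).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


class MinimumCountyFee(DefaultFees):
    """Hypothetical customer whose county mandates a minimum fee."""

    def filing_fee(self, amount: Decimal) -> Decimal:
        return max(super().filing_fee(amount), Decimal("25.00"))


STRATEGIES: Dict[str, FeeStrategy] = {"acme-county": MinimumCountyFee()}


def fees_for(customer_id: str) -> FeeStrategy:
    # Unknown customers get the core behaviour; no per-customer branches in source control.
    return STRATEGIES.get(customer_id, DefaultFees())


def invoice_total(customer_id: str, amount: Decimal) -> Decimal:
    """Core code: identical for every customer, delegates only the variable part."""
    return amount + fees_for(customer_id).filing_fee(amount)
```

Automated tests that run every customer's strategy against the Core are what catch a Core change that silently breaks one customer.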
UPDATE:
Before you get too many customers, I recommend establishing a system for creating and updating each of their websites. How involved you get will of course be balanced by your current revenue stream, but you should have an end in mind.
For example, when you sign up Customer X (hopefully all via the web), their website should be created within XX minutes and the customer sent an email stating it's ready.
You definitely want to set up a Continuous Integration (CI) environment. TeamCity is a great tool, and free.
With this in place, you'll be able to check your updates in a staging environment and can then apply those patches across your production instances.
Bottom line: once you get beyond a handful of customers, you need to start treating the automation of your operations and deployment as an application in its own right.
UPDATE: This post highlights the negative effects of branching per customer.
Our software has very similar requirements and I've picked up a few things over the years.
First of all, such customizations will cost you both in the short and long-term. If you have control over it, place some checks and balances such that sales & marketing do not over-zealously sell customizations.
I agree with the other posters that say NOT to use source control to manage this. It should be built into the project architecture wherever possible. When I first began working for my current employer, source control was being used for this and it quickly became a nightmare.
We use a separate database for each client, mainly because for many of our clients, the law or the client themselves require it due to privacy concerns, etc...
I would say that the business logic differences have probably been the least difficult part of the experience for us (your mileage may vary depending on the nature of the customizations required). For us, most variations in business logic can be broken down into a set of configuration values which we store in an xml file that is modified upon deployment (if machine specific) or stored in a client-specific folder and kept in source control (explained below). The business logic obtains these values at runtime and adjusts its execution appropriately. You can use this in concert with various strategy and factory patterns as well -- config fields can contain names of strategies etc... . Also, unit testing can be used to verify that you haven't broken things for other clients when you make changes. Currently, adding most new clients to the system involves simply mixing/matching the appropriate config values (as far as business logic is concerned).
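The sketch below shows this configuration-driven approach using the standard library's `xml.etree.ElementTree`. A per-client XML file supplies plain values plus the name of a strategy, and a small factory maps that name to an implementation. The XML layout, setting names and strategy names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical per-client config, e.g. kept in the client's folder in source control.
CLIENT_CONFIG = """
<client id="county-a">
  <setting name="late_fee_days">30</setting>
  <setting name="rounding_strategy">bankers</setting>
</client>
"""

# Factory table: strategy names used in config map to implementations in code.
ROUNDING_STRATEGIES: Dict[str, Callable[[float], int]] = {
    "bankers": round,  # Python's round() rounds halves to the nearest even integer
    "half_up": lambda x: int(x + 0.5) if x >= 0 else -int(-x + 0.5),
}

DEFAULTS = {"late_fee_days": "14", "rounding_strategy": "half_up"}


def load_settings(xml_text: str) -> Dict[str, str]:
    """Start from the defaults and override with whatever the client file sets."""
    root = ET.fromstring(xml_text.strip())
    settings = dict(DEFAULTS)
    for element in root.iter("setting"):
        settings[element.get("name")] = (element.text or "").strip()
    return settings


def rounding_for(settings: Dict[str, str]) -> Callable[[float], int]:
    name = settings["rounding_strategy"]
    if name not in ROUNDING_STRATEGIES:
        raise ValueError(f"unknown rounding strategy in config: {name}")
    return ROUNDING_STRATEGIES[name]
```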
More of a problem for us is managing the content of the site itself including the pages/style sheets/text strings/images, all of which our clients often want customized. The current approach that I've taken for this is to create a folder tree for each client that mirrors the main site - this tree is rooted at a folder named "custom" that is located in the main site folder and deployed with the site. Content placed in the client-specific set of folders either overrides or merges with the default content (depending on file type). At runtime the correct file is chosen based on the current context (user, language, etc...). The site can be made to serve multiple clients this way. Efficiency may also be a concern - you can use caching, etc... to make it faster (I use a custom VirtualPathProvider). The largest problem we run into is the burden of visually testing all of these pages when we need to make changes. Basically, to be 100% sure you haven't broken something in a client's custom setup when you have changed a shared stylesheet, image, etc... you would have to visually inspect every single page after any significant design change. I've developed some "feel" over time as to what changes can be comfortably made without breaking things, but it's still not a foolproof system by any means.
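The override-folder lookup can be sketched with `pathlib`. The function looks for the file under the client's folder inside `custom`, which mirrors the main site's layout, and falls back to the default copy. This is a hypothetical Python illustration of the idea only. It skips the merge case for file types that combine rather than override, and the answer's own implementation is an ASP.NET `VirtualPathProvider`.

```python
from pathlib import Path


def resolve_content(site_root: Path, client_id: str, relative_path: str) -> Path:
    """Return the client's override if it exists, otherwise the default file."""
    if not client_id or client_id in (".", "..") or "/" in client_id or "\\" in client_id:
        raise ValueError("invalid client id")
    relative = Path(relative_path)
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError("content paths must stay inside the site")
    override = site_root / "custom" / client_id / relative
    if override.is_file():
        return override
    return site_root / relative
```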
In my case I also have no control other than offering my opinion over which visual/code customizations are sold so MANY more of them than I would like have been sold and implemented.
This is not something that you want to solve with source control management, but within the architecture of your application.
I would come up with some sort of plugin-like architecture. Which plugins to use for which website then becomes a configuration issue, not a source control issue.
This allows you to use branches, etc. for what they are intended for: parallel development of code between (or maybe even across) releases. Each plugin becomes a separate project (or subproject) within your source code system. This also allows you to combine all plugins and your main application into one Visual Studio solution to help with dependency analysis, etc.
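The sketch below illustrates the plugin idea. Plugins register under a name, and each website's configuration lists which ones run. Onboarding a customer then becomes a configuration change rather than a new branch. The registry, the order-submission hook, the plugin names and the site names are all hypothetical.

```python
from typing import Callable, Dict, List

Hook = Callable[[dict], dict]
PLUGINS: Dict[str, Hook] = {}


def plugin(name: str) -> Callable[[Hook], Hook]:
    """Decorator that registers a hook implementation under a name."""
    def register(fn: Hook) -> Hook:
        PLUGINS[name] = fn
        return fn
    return register


@plugin("county_surcharge")
def county_surcharge(order: dict) -> dict:
    return {**order, "total": order["total"] + 5}


@plugin("require_permit_number")
def require_permit_number(order: dict) -> dict:
    if not order.get("permit_number"):
        raise ValueError("permit number required")
    return order


# Per-site configuration: which plugins run, and in which order.
SITE_CONFIG: Dict[str, List[str]] = {
    "site-a": ["county_surcharge"],
    "site-b": ["require_permit_number", "county_surcharge"],
}


def submit_order(site: str, order: dict) -> dict:
    """Core code path shared by every site; only the configured plugins differ."""
    for name in SITE_CONFIG.get(site, []):
        order = PLUGINS[name](order)
    return order
```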
Loosely coupling the various components in your application is the best way to go.
As mentioned before, source control does not sound like a good solution for your problem. To me it sounds like you are better off with a single code base using a multi-tenant architecture. This way you get a lot of benefits in terms of managing your application, load on the service, scalability, etc.
Our product uses this approach. We have some (a lot) of core functionality that is the same for all clients, plus custom modules that are used by one or more clients. At the core, the "customization" is a simple workflow engine that uses different workflows for different clients. Each client gets the core functionality, its own workflow(s), and an extended set of modules that are either client-specific or generalized for more than one client.
Here's something to get you started on multi-tenancy architecture:
Multi-Tenant Data Architecture
SaaS database tenancy patterns
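One of the models those articles describe is a shared database and schema in which every row carries a tenant key. The sketch below uses `sqlite3`, and the data-access class, rather than each caller, applies the tenant filter. The table and column names are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, description TEXT)"
    )
    # Index on the tenant key, since nearly every query filters on it.
    conn.execute("CREATE INDEX idx_orders_tenant ON orders (tenant_id, id)")


class TenantScopedOrders:
    """Data access that always filters by the current tenant."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str):
        self.conn = conn
        self.tenant_id = tenant_id

    def add(self, description: str) -> None:
        self.conn.execute(
            "INSERT INTO orders (tenant_id, description) VALUES (?, ?)",
            (self.tenant_id, description),
        )

    def fetch_all(self) -> List[Tuple[int, str]]:
        return self.conn.execute(
            "SELECT id, description FROM orders WHERE tenant_id = ? ORDER BY id",
            (self.tenant_id,),
        ).fetchall()
```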
Without more info, such as the types of client-specific customization, one can only guess how deep or superficial the changes are. Some simple/standard approaches to consider:
If you can keep a central config specifying the uniqueness from client to client
If you can centralize the business rules to one class or group of classes
If you can store the business rules in the database and pull out based on client
If the business rules can all be DB/SQL based (each client having their own DB)
Overall, hard-coding differences based on client name/ID is very problematic, and keeping different code bases per client is costly (think of the complete testing/retesting time required for the 90% that doesn't change). I think more info is required to answer properly (give some specifics).
Layer the application. One of those layers contains customizations and should be removable at any time without affecting the rest of the system. Application- and DB-level "triggers" (quoted because they may or may not employ actual DB triggers) that call customer-specific code or are parametrized with customer keys are very helpful.
Core should never be customized, but you must layer it in somewhere, even if it is simplistic web filtering.
What we have is a core database that has the functionality all clients get. Then each client has a separate database that contains the customizations for that client. This is expensive in terms of maintenance. The other problem is that when two clients ask for similar functionality, it is often implemented differently by the two separate teams. Currently little is done to share customizations between clients or to make common ones part of the core application. Each client has their own application portal, so we don't have to worry about a change for one client affecting another client.
Right now we are looking at changing to a process using a rules engine, but there is some concern that the performance won't be there for the number of records we need to be able to process. However, in your circumstances, this might be a viable alternative.
I've used some applications that offered the following customizations:
Web pages were configurable - we could drag fields out of view, position them where we wanted with our own name for the field label.
Add our own views or stored procedures and use them in: data grids (along with an update proc) and reports. Each client would need their own database.
Custom mapping of Excel files to import data into system.
Add our own calculated fields.
Ability to run custom scripts on forms during various events.
Identify our own custom fields.
If your clients are larger companies, you're almost certainly going to need your own SDK, APIs, etc.