Getting Beyond 50 Replica Set Members in MongoDB

I’m looking to build a distributed Access Control system for a microservice platform. I’m considering using MongoDB as my database technology. My system design objectives are as follows:
Policy Enforcement should be distributed - If any given Policy Enforcement Point (PEP) experiences downtime, only the application that the PEP serves should be affected.
Policy Decisions should be distributed - We don’t want the whole platform to experience downtime because a central Policy Decision Point (PDP) is experiencing downtime. We only want it to affect the application that it serves.
Policy Administration should be centralized - Creating a centralized policy administration interface provides the ability for any system (including a UI) to understand what rights an individual has, and by establishing a common interface it allows us to more easily audit changes to access across a whole platform.
Policy Information (context) is distributed - We don’t get to choose this if we are building a distributed microservice platform. We can centralize the retrieval of additional context by aggregating data that is needed to make access control decisions into a single place, but the data sources are still distributed.
I’m considering building a system like the one shown below. The idea is that Access Policies are administered by a central Policy Admin API. This API manages Policies that are persisted to a MongoDB cluster with a 3 member replica set backing it. I would like each of the other APIs in the platform to have a dedicated policy-query-api (Policy Decision Point) deployed alongside it to make Access Control decisions pertinent to that API. The idea is that if any one of the policy-query-apis goes down, only the API that it serves will be affected.
I want changes to Policies to be governed by the Policy Admin API and I would like the changes to be replicated across each mongo instance that is used by each of the policy-query-apis. I don’t want the mongo replicas for each policy-query-api to affect a write to the primaries.
I also don’t need immediate data consistency (less than 5 sec latency), but I would like the data replication to be handled at the database layer if possible. The technology is already built to handle this and I don’t want to reinvent the wheel at the application layer if possible.
I’ve looked at the documentation on Replica Set Members and I’ve pretty thoroughly reviewed the documentation on Replica Sets in Mongo. It seems like having a Hidden Member or Delayed Member would be a good fit for my use case. Do you agree? Also, I’m concerned about the 50 member replica set limit. Since each one of these replicas would serve an API in my platform, if there were more than 50 microservices (which is quite likely), how would I manage replication like this?

Just so that I understand, you are asking about:
one standalone (?? your picture suggests standalone but you are asking about 50 node RS limit) node per application, data mirrored to standalone from the master RS
the application only queries its local standalone
MongoDB provides read preference nearest for the use case of reading data from local nodes. Importantly the nearest read preference still provides availability if your local node is unavailable - the next closest (roughly) node will be used in this case. Your proposed architecture would take the application down every time its local database node needs to be restarted for version upgrades.
You may also look into tag sets.
Additionally, MongoDB allows specifying priorities on nodes for election purposes. If you put all of your MongoDB nodes into the same RS, you can use priorities so that one of the 3 designated "main" servers is the primary whenever any of them is available.
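To make the suggestion above concrete, here is a minimal sketch in Python of what such a setup could look like: a replica set configuration document in which only the three "main" servers can vote and become primary, and each per-application node is a tagged, non-voting, priority-0 member, plus a connection string that uses the nearest read preference with a tag set. The set name, host names and the `app` tag are made up for illustration. The helper functions are hypothetical and only build plain data; they do not talk to a server.

```python
MAX_MEMBERS = 50         # replica set member limit mentioned in the question
MAX_VOTING_MEMBERS = 7   # at most 7 members of a replica set may vote


def build_replica_set_config(set_name, main_hosts, app_hosts):
    """Build a replica set configuration document.

    main_hosts: list of "host:port" strings for the electable "main" servers.
    app_hosts:  dict mapping an application name to its co-located "host:port".
    """
    if len(main_hosts) > MAX_VOTING_MEMBERS:
        raise ValueError("at most 7 members may vote")
    if len(main_hosts) + len(app_hosts) > MAX_MEMBERS:
        raise ValueError("a replica set holds at most 50 members")

    members = []
    for host in main_hosts:
        members.append({"_id": len(members), "host": host,
                        "priority": 1, "votes": 1})
    for app, host in sorted(app_hosts.items()):
        members.append({
            "_id": len(members),
            "host": host,
            "priority": 0,          # never eligible to become primary
            "votes": 0,             # takes no part in elections
            "tags": {"app": app},   # hypothetical tag name
        })
    return {"_id": set_name, "members": members}


def build_uri(hosts, set_name, app):
    """Connection string: prefer the member tagged for this app, else any member."""
    options = [
        f"replicaSet={set_name}",
        "readPreference=nearest",
        f"readPreferenceTags=app:{app}",
        "readPreferenceTags=",      # empty tag set: fall back to any member
    ]
    return f"mongodb://{','.join(hosts)}/?{'&'.join(options)}"
```

With this shape, an application reads from its own tagged member when it is up and falls back to another member when it is not. Note that the 50 member limit still applies to the whole set, so this sketch does not by itself answer what to do beyond 50 services.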


Multi region high availability on GKE - what to do with the PostgreSQL database?

Google has this cool tool, kubemci (a command line tool to configure L7 load balancers using multiple Kubernetes clusters), with which you can basically have an HA multi-region Kubernetes setup. Which is kind of cool.
But let's say we have a basic architecture like this:
Front end is implemented as SPA and uses json API to talk to backend
Backend is a set of microservices which use PostgreSQL as a DB storage engine.
So I can create two Kubernetes Clusters on GKE, put both backend and frontend on them (e.g. let's say in London and Belgium) and all looks fine.
Until we think about the database. PostgreSQL is single-master only, so it must be placed in only one of the regions. And if the backend in the London region starts to talk to PostgreSQL in the Belgium region, performance will be really poor considering the 6ms+ latency between those regions.
So that whole HA setup kind of doesn't make any sense? Or am I missing something? One option to slightly mitigate the issue would be to have a read-only replica in the "slave" region and direct read-only queries there (is that even possible with PostgreSQL?)
This is a classic architecture scenario that has no easy solution. Making data available in multiple regions is a challenging problem that major companies spend a lot of time and money to solve.
PostgreSQL does not natively support multi-master writes. Your idea of a replica located in the other region, with logic in your app to read from and write to the correct database, would work. This will give you fast local reads, but slower writes in one region. It also means more complicated code in your app and more work to handle failover of the master. Bandwidth and costs can also be problems with heavy updates.
Use 3rd-party solutions for multi-master Postgres (like Postgres-BDR by 2ndQuadrant) to offload the work to the database layer. This can get expensive and your application still has to manage data conflicts from two regions overwriting the same data at the same time.
Choose another database that supports multi-regional replication with multi-master writes. Cassandra (or ScyllaDB) is a good choice, or hosted options like Google Spanner, Azure CosmosDB, AWS DynamoDB Global Tables, and others. An interesting option is CockroachDB which supports the PostgreSQL protocol but is a scalable relational database and supports multiple regions.
If none of these options work, you'll have to create your own replication system. Some companies do this with an event-sourced / CQRS architecture where every write is a message sent to a central log, then applied in every location. This is more work but provides the most flexibility. At this point you're also basically building your own database replication system.
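As a rough illustration of the read-replica option described above, the application-side routing logic could look something like the following Python sketch. The class name, the DSNs and the prefix-based check are all assumptions made for the example. A real application would more likely mark queries as read-only explicitly than inspect SQL text.

```python
from dataclasses import dataclass

# Statements treated as safe to send to a read replica (a simplification).
READ_ONLY_PREFIXES = ("select", "show")


@dataclass(frozen=True)
class RegionalRouter:
    primary_dsn: str   # the single writable server (possibly in another region)
    replica_dsn: str   # the read-only replica in the local region

    def dsn_for(self, sql: str, in_transaction: bool = False) -> str:
        """Return the server a statement should be sent to."""
        words = sql.split(None, 1)
        first_word = words[0].lower() if words else ""
        # Anything inside a transaction stays on the primary so that the
        # transaction sees its own writes.
        if not in_transaction and first_word in READ_ONLY_PREFIXES:
            return self.replica_dsn
        return self.primary_dsn
```

Two caveats apply to this sketch. Reads served from the replica can lag behind the primary. A SELECT that takes row locks or calls a writing function still has to go to the primary, which this simple prefix check does not detect.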
If you have multi cluster ingress set up on two clusters in different regions, then the multi cluster ingress will only send traffic to the closest region to the user.
If the closest region is down, this is when traffic will be routed to the cluster in the other region.
So using the example you have provided, if there is traffic being sent to the backend and this user is closer to London, then traffic sent by this user will always be sent to London as long as the Region is up and running.
With regard to latency, you will have to live with it in this case, as you cannot create a read replica within another region.
The benefit of this functionality (multi-cluster ingress) is that if one region goes down, then you have another region to route the traffic to.
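The routing behaviour described above (closest region first, the other region only when the closest one is down) can be modelled in a few lines of Python. This is only a toy model of the behaviour, not how the load balancer is implemented. The region names and latency figures are invented.

```python
def pick_region(latency_ms, healthy):
    """Return the lowest-latency healthy region, or None if none is healthy.

    latency_ms: dict of region name -> latency from the user, in milliseconds.
    healthy:    dict of region name -> bool (missing regions count as down).
    """
    candidates = [region for region in latency_ms if healthy.get(region, False)]
    return min(candidates, key=latency_ms.get, default=None)
```

For a user close to London, `pick_region({"london": 5, "belgium": 11}, {"london": True, "belgium": True})` selects London, and Belgium is chosen only once London is marked unhealthy.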

Does this make sense for Orleans or SF and if so guidance please

We’re working to take our software to Azure cloud and looking at Orleans and Service Fabric (SF) as potential frameworks. We need to:
Populate our analysis engines with lots of data (e.g., 100MB to 2GB) per engine instance.
Maintain that state, and if an engine instance goes idle for say 20 minutes or more, we’d like to unload it (i.e., and not pay for the engine instance resource).
Each engine instance will support one to several end users with a specific data set.
Each engine instance can be highly interactive generating lots of plot data near realtime. We’re maintaining state as we don’t want to pay the price to populate engine instance for each engine interaction.
An engine instance action can take a few seconds, a few minutes, to even tens of minutes. We’ll want some feedback.
Users may access an engine instance every few seconds (e.g., to steer the engine towards a result based on feedback) and will want live plot data.
Each user will want to talk to a specific engine instance.
As a user expresses interest in running a simulation (i.e., standing up an engine instance), ideally we want him to choose small/medium/large computing resource to run his engine instance (i.e., based on the problem he’s trying to solve he may want more or less computing/memory power).
We’re considering Orleans and SF but we’re having difficulty specifying architecture based on above requirements. We’ve considered:
Trying to think about an SF partition, or an Orleans silo as an ‘engine instance’ described above.
Leveraging both Orleans and SF notion of fault tolerance through replication.
Leveraging local (i.e., to partition or silo) storage to store results and maintain state (i.e., for long periods or until idle for 20 minutes).
We’ve not understood how to:
Limit a silo or a partition to a single engine instance so that we can control resourcing of the engine instance.
Keep a user’s engine instance data separate from another user’s engine instance data.
Direct a request from a user (e.g., through a web API) to a particular engine instance.
Does this make sense for Orleans, does it make more sense for SF? Any pointers on how to implement the above would be helpful.
When you say SF, I assume you mean SF Actors, right?
You can use them the way you want, but neither looks like the right solution for your problem, because:
Actors are single-threaded. If you plan to share the same instance among multiple clients, each one has to wait for the previous one to finish before its request starts processing. If you need to monitor the status of a running actor, the actor has to publish updates to external subscribers.
Actor state is isolated, so you can't access the state of other actors directly. The way to do it is to provide a method that returns it, but if the actor is running a command you have to wait for it to complete, unless you create a separate state service to hold the processed data.
You can't limit the resources available to an actor. In Service Fabric you specify the resources needed for a service, but you can't do that for individual actors, and you can't limit the resources they use. When they hit the limit, Service Fabric will try to balance the resources for you, but nothing prevents the process from consuming more memory than requested.
Both actor frameworks communicate using the ask approach, so the caller "blocks" waiting for an answer. It is asynchronous, but you still have to keep the caller waiting. (There is no fire-and-forget option like the Tell approach in Akka, which delivers the message and forgets about it.)
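The single-threaded, ask-style behaviour described above can be illustrated with a small Python asyncio model. This is a toy actor written for the example, not the Service Fabric or Orleans API. The class and message names are hypothetical. It shows a quick "status" request queuing behind a long-running command.

```python
import asyncio


class EngineActor:
    """Toy actor: one mailbox, one worker, messages handled one at a time."""

    def __init__(self):
        self._mailbox = asyncio.Queue()
        self._worker = None
        self.log = []

    def start(self):
        self._worker = asyncio.create_task(self._run())

    async def _run(self):
        while True:
            name, duration, reply = await self._mailbox.get()
            self.log.append(f"start {name}")
            await asyncio.sleep(duration)   # stands in for engine work
            self.log.append(f"end {name}")
            reply.set_result(name)

    async def ask(self, name, duration):
        """Send a message and wait for the reply (the 'ask' pattern)."""
        reply = asyncio.get_running_loop().create_future()
        await self._mailbox.put((name, duration, reply))
        return await reply

    async def stop(self):
        self._worker.cancel()
        try:
            await self._worker
        except asyncio.CancelledError:
            pass


async def demo():
    actor = EngineActor()
    actor.start()
    # Two clients call the same actor at (almost) the same time.
    await asyncio.gather(actor.ask("long-run", 0.05), actor.ask("status", 0.0))
    await actor.stop()
    return actor.log
```

Running `asyncio.run(demo())` returns a log in which "status" only starts after "long-run" has ended. This is why live feedback from a busy actor has to be published to something outside the actor.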
Based on some of your requirements, I think containers would be a better approach, because:
You can limit the resource consumption for each container
The data is isolated inside the container and not visible to others
But on containers you have to manage the replication and partitioning by yourself, so in this case I would recommend the best of both worlds:
Create SF services to host the data sets shared between the users
SF Service+Actor to only store the results of users simulations.
Containers to run the simulations and send updates to actors
This is just an example, it all will depend on your requirements, architecture and how data will be isolated from each other.

Orleans - What happens when system storage is down or inaccessible?

I'm evaluating Microsoft Orleans as the base for a custom distributed cache (among other features).
I was able to create a non-reliable cluster for evaluation purposes using MembershipTableGrain. All was working as described within the documentation.
Now I'm planning to set up a reliable cluster using on-premise servers (Azure is not an option). I'm leaning towards using the Relational Storage (SqlServer/ADO.net) Membership provider
https://dotnet.github.io/orleans/Documentation/Runtime-Implementation-Details/Relational-Storage.html
https://dotnet.github.io/orleans/Documentation/Advanced-Concepts/Configuring-SQL-Tables.html
My question is:
What happens to the status of the silos within the cluster if the Silo Membership database is down or it is not accessible (server outage, network issues, etc.)? I would assume it would affect the whole cluster as far as I understand the Orleans Membership Protocol.
You can read about it here:
http://dotnet.github.io/orleans/Documentation/Runtime-Implementation-Details/Cluster-Management.html
Basically, all existing silos and clients will keep working as is, and will not get impacted, but new silos or clients will not be able to join. Also, if a silo dies, it will not be excluded from the membership and thus some proportion of traffic will be failing until the membership is up.
But as long as no one else fails or joins, failures/unavailability of the storage is completely transparent. That was a deliberate design choice.
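The behaviour described in this answer can be pictured with a toy model in Python. None of the names below come from Orleans. They only mimic the idea that each silo keeps working from its last known membership view while the storage is unreachable, whereas joining and declaring another silo dead both need the storage.

```python
class MembershipTable:
    """Stand-in for the shared membership storage (e.g. a SQL table)."""

    def __init__(self):
        self.available = True
        self.active = set()

    def _check(self):
        if not self.available:
            raise ConnectionError("membership storage unreachable")

    def read(self):
        self._check()
        return set(self.active)

    def join(self, silo):
        self._check()
        self.active.add(silo)

    def mark_dead(self, silo):
        self._check()
        self.active.discard(silo)


class Silo:
    def __init__(self, name, table):
        self.name = name
        self.table = table
        self.view = set()   # last membership snapshot this silo has seen

    def join(self):
        self.table.join(self.name)   # raises if the storage is down
        self.refresh()

    def refresh(self):
        try:
            self.view = self.table.read()
        except ConnectionError:
            pass   # keep routing with the last known view

    def report_dead(self, other):
        """Try to remove a dead silo; returns False if storage is down."""
        try:
            self.table.mark_dead(other)
        except ConnectionError:
            return False
        self.refresh()
        return True
```

In this model, setting `table.available = False` leaves existing silos with their current view, makes `join()` fail for a new silo, and makes `report_dead()` return False, so a dead silo stays in everyone's view until the storage comes back.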

MongoDB: What's the use of a secondary in a sharding scheme?

I've read Google Cloud Platform's article on deploying MongoDB. Using a sharding scheme, it is clear that the application will never read from a secondary MongoDB server:
Because the production application never reads data from a secondary server, the application never needs to handle the complexity of stale reads and eventual consistency.
My questions are:
Are secondary servers only useful for fault tolerance, i.e. as a backup in case of a failure in the primary server? Or are there performance benefits to having secondaries within the same shard region?
If so, considering the following:
Compute Engine disks have built-in redundancy to protect data against failures and to ensure data availability through maintenance events
Why are secondary servers needed at all on a fault-tolerant platform like Google Cloud?
Thank you!
To answer the two questions:
Other Benefits of Replica Sets
Replica sets also allow you to perform rolling updates to MongoDB, so they are useful when upgrading a deployment without taking it offline.
It is also possible to allow some applications (e.g. a reporting application) to read from a secondary which reduces some load on the primary. Some details and use cases are available on the MongoDB site - https://docs.mongodb.com/v3.2/core/read-preference/
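As an illustration of the rolling update mentioned above, the usual order is to upgrade the secondaries one at a time, then step the primary down and upgrade it last. The Python sketch below only computes that order from a hypothetical map of member states. The function name and step labels are invented for the example, and it does not connect to any server.

```python
def rolling_upgrade_plan(members):
    """Return ordered (action, host) steps for a rolling upgrade.

    members: dict of host -> "PRIMARY" or "SECONDARY".
    """
    primaries = [host for host, state in members.items() if state == "PRIMARY"]
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary")

    # Secondaries first, one at a time, so the set keeps serving traffic.
    steps = [("upgrade", host)
             for host, state in sorted(members.items())
             if state == "SECONDARY"]
    # Then hand the primary role to an already-upgraded member
    # (e.g. with rs.stepDown() in the mongo shell) and upgrade the old primary.
    steps.append(("step_down", primaries[0]))
    steps.append(("upgrade", primaries[0]))
    return steps
```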
Requirement for Secondary Servers
The Google article states:
Barring a catastrophic outage, the MongoDB primary server should always be in this region
By having multiple members in a replica set you are protecting yourself from this kind of catastrophic outage. If you require very high availability then you want this level of protection.
MongoDB's own database as a service (Atlas) deploys replica set members to different Amazon Web Services Availability Zones to protect against this kind of catastrophic outage.

Amazon aws hints

I have to setup a server environment for a web application. I have to use aws, and so far it looks good for that purpose.
I need:
a scalable Tomcat 7 webapp server
session replication!
a mongodb database cluster(?)
As far as I think it could work with:
The scalable Tomcat 7 I can do easily with Elastic Beanstalk.
The session replication could work with ElastiCache
It seems like I have to do the MongoDB cluster "manually", so I created some EC2 instances to do so.
I have some problems with that.
the costs would be quite high. The minimum setup would be 2 ec2 instances and one for the elasticache
The only thing that autoscales is Elastic Beanstalk, which means that I have to take care of scaling the rest myself, too. (Well, for the mongodb instances, I could use a balancer, too)
In case of the mongodb ec2 instances, I need to setup each instance by myself
Do you have any idea how to:
lower the costs (Especially in the beginning, it would be a little much, no?)?
make the administration easier?
If you install mongo, have a look at: AWS_NoSQL_MongoDB.pdf. It is very customary to have one member of the replica set outside the availability zone you use, or even outside of the same region, so you can have a hot backup in case of a failure.
About prices - experiment (load test) and find the smallest instance type that fits you. Also, remember to shut down unused instances.
About management - there is the AWS console and many 3rd party products. Also, Netflix have released some nice management tools.
Autoscaling for your web server is easily done with Elastic Beanstalk, but you can use it independently. Check out the documentation for autoscaling here: http://aws.amazon.com/autoscaling/
It has a couple of features that will help you save most of your computation costs:
one is the ability to scale out (and most importantly in), by automatically changing the number of web servers you are using, based on your load. You can define the minimum number (for example, 1) and the maximum number. The system can watch a set of metrics (number of requests, CPU load...) and decide when to add or subtract instances.
the second is the ability to change the scale policy and increase or decrease the size of the machines, based on your usage. You can use medium size instances and switch to large or extra large ones, if you find it more cost effective. You are encouraged to try out different sizes to see what fits you best.
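The first feature above (scaling out and in between a minimum and a maximum, driven by a watched metric) boils down to a decision like the one in this Python sketch. The thresholds, the one-instance step and the function name are assumptions chosen for illustration. They are not AWS defaults or an AWS API.

```python
def desired_instances(current, cpu_percent, min_size=1, max_size=4,
                      scale_out_above=70.0, scale_in_below=30.0):
    """Return the instance count a simple scaling policy would ask for."""
    if cpu_percent > scale_out_above:
        target = current + 1      # scale out under load
    elif cpu_percent < scale_in_below:
        target = current - 1      # scale in when idle, to save cost
    else:
        target = current
    # Never leave the configured [min_size, max_size] range.
    return max(min_size, min(max_size, target))
```

Starting with a minimum of 1 keeps the cost low in the beginning, which is the concern raised in the question, while the maximum caps the bill under heavy load.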
Using ElastiCache can help you with session replication and also lower the load on your DB machine. You can cache the output of your frequent queries there (front page, main category page...), get better performance, and use fewer DB instances. It supports Memcached clients, which makes it very easy to develop against in almost any programming language.
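Caching the output of frequent queries, as suggested above, usually follows a "look in the cache first, otherwise load and store" pattern. The Python sketch below uses an in-process dictionary as a stand-in for a Memcached client so that it runs on its own. The class name and the TTL handling are assumptions for the example.

```python
import time


class QueryCache:
    """Tiny cache-aside helper; a dict stands in for the Memcached server."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}   # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader() only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()     # e.g. the expensive database query
        self._entries[key] = (now + self._ttl, value)
        return value
```

A page handler would call something like `cache.get_or_load("front-page", run_front_page_query)`, where `run_front_page_query` is whatever function queries the database, so the database is hit at most once per TTL for that key.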
You should check Couchbase instead of MongoDB (see comparison). It is more robust and more reliable in scale.
Elastic Beanstalk is the best way to reduce the management overhead for web/app servers. However, if you haven't taken the development too far, I would recommend using DynamoDB for ease of administration.
However, keep a check on cost as well. Performance and management are really awesome in DynamoDB.