Can we switch to using authenticated access to MongoDB with no downtime?

We currently have a fairly complex MongoDB environment with multiple query routers and data servers in different AWS regions, using sharding and replication so that data is initially written to a primary shard on a local data server and then replicated to all regions.
When we first set this up we didn't add any security to the Mongo infrastructure, and we are using unauthenticated access for reads and writes. We now need to enable authentication so that the platform components that read and write data can use a single identity, and our system administrators can use their own user accounts for admin functionality.
The question is whether and how we can switch to using authentication without taking any downtime in the backend. We can change connection strings on the fly in the components that read from and write to the DB, and we can roll components in and out of load balancers if we do need a restart. The concern is on the Mongo side.
Can we enable authentication without having to restart?
Can we continue to allow open access from an anonymous user after enabling authentication (to allow backward compatibility while we update the connection strings)?
If not, can we change the connection strings before we enable authentication and have Mongo accept the connection requests even though it isn't authenticating?
Can we add authorization to our DBs and Collections after the fact?
Will there be any risk to replication as we go through this process? We have a couple of TB of data and if things get out of sync it's very difficult to force a resync.
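For context, the client-side change we expect to make is just swapping in a connection string that carries credentials. Below is a minimal sketch with the MongoDB Java driver; the user name, password, hosts, and authSource are placeholders for whatever we end up creating:

```java
import com.mongodb.ConnectionString;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;

public class AuthenticatedConnectionExample {
    public static void main(String[] args) {
        // Current (unauthenticated) connection string:
        //   mongodb://mongos1.example.com:27017,mongos2.example.com:27017/
        // Planned (authenticated) connection string: same topology, plus
        // credentials and an authSource. Hosts and credentials are placeholders.
        ConnectionString uri = new ConnectionString(
                "mongodb://platformWriter:CHANGE_ME@mongos1.example.com:27017,"
                        + "mongos2.example.com:27017/?authSource=admin");

        try (MongoClient client = MongoClients.create(uri)) {
            MongoDatabase db = client.getDatabase("platform");
            System.out.println("Connected; collections:");
            db.listCollectionNames().forEach(System.out::println);
        }
    }
}
```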
I'm sure I'm missing some things, so any thoughts here will be much appreciated.
Thanks,
Ian

Related

SymmetricDS simple configuration guidance

Over the past couple of weeks I have been prototyping some examples in SymmetricDS. I'm looking for some guidance and examples because I am really running into some walls here. I have used the server and Android examples successfully, so I don't need any assistance with getting the basics set up and working. It is a complex tool and I'm still learning it.
So I am trying to set up an environment where all the clients running on Android devices sync up to a server. I know it's fairly straightforward to do a 1 master <-> multiple clients setup, as the examples they provide do.
What I am trying to do is multiple masters to multiple clients. Essentially I want a database on the server for each client. I'll attach a diagram to help explain, but I want a database for each store, so store #1 has a master DB on the server and it syncs both ways with the client device.
(server diagram attached)
SymmetricDS requires a central node to store the configuration. I would recommend having a central node with a group of databases that connect to the central database, and connecting each Android application to its own database. This topology lets you configure what data syncs from the central node down to each of those databases and what goes back.
On the router from client to server you can set the target catalog to a variable: $(sourceExternalId). This will use the client's external id as the database name on your server.
If you also need to replicate data back down, you can set the external select on the triggers at the server. This would be an expression on your server database that evaluates the current database. It fires when a change occurs on the server database and populates the external_data column on sym_data during capture with the database the change occurred in. You would then change the router from server to client to a column match router type, with the router expression EXTERNAL_DATA=:EXTERNAL_ID. This ensures the data is only sent to the appropriate client.

What possibilities are there to connect to a postgres database

I was wondering: What possibilities are there to connect to a postgres database?
I know of at least two possibilities off the top of my head.
The first possibility is a brute-force one: open a port and let users anonymously make changes.
The second way is to create a website that communicates with Postgres using SQL commands.
I couldn't find any more options on the internet, so I was wondering if there are others, because maybe one of them is the best solution for communicating with Postgres over the internet.
This is more of a networking/security type question, I think.
You can have your database fully exposed to the internet which is generally a bad idea unless you are just screwing around for fun and don't mind it being completely hosed at some point. I assume this is what you mean by option 1 of your question.
You can have a firewall in front that only exposes it to certain incoming IPs. This is a little better, but still feels a little exposed for a database, especially if there is sensitive data in it.
If you have a limited number of folks that need to interact with the DB, you can have it completely firewalled but allow SSH connections to the internal network (possibly the same server) and then port-forward through the SSH tunnel. This is generally the best way if you need to give full DB access to folks outside the DB's network, since SSH can be made much more secure than a direct DB connection by using a public/private key pair for each incoming connection. You can also allow SSH only from specific IPs through your firewall as an added layer of security.
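For example, once a tunnel is up, the client side is just a normal local JDBC connection. A rough sketch (the tunnel command, host names, and credentials below are made-up examples, and the PostgreSQL JDBC driver is assumed to be on the classpath):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TunneledPostgresExample {
    public static void main(String[] args) throws Exception {
        // Assumes a tunnel has already been opened on this machine, e.g.:
        //   ssh -N -L 5433:localhost:5432 dbuser@db-host.example.com
        // The database itself is firewalled; only SSH is reachable from
        // outside, ideally restricted to known IPs and key-based auth.
        String url = "jdbc:postgresql://localhost:5433/appdb";

        try (Connection conn = DriverManager.getConnection(url, "appuser", "CHANGE_ME");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT version()")) {
            if (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```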
Similar to SSH, you could stand up a VPN and allow access to the LAN upon which the DB sits and control access through the VPN.
If you have a wider audience, you can allow no external access to the database at all (except for you or a DBA/administrator type through SSH tunneling or VPN), and instead build access through a website where all communication with the DB happens in server-side code (PHP, Node.js, Rails, .NET, what have you). This is the usual setup that every site with a database behind it uses. I assume that's what you mean by option 2 of your question.
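As a toy illustration of that last pattern, the sketch below uses the JDK's built-in HTTP server plus JDBC. The appdb database, the users table, and the credentials are hypothetical; the point is only that the browser talks to the web tier and never to Postgres directly:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class WebFrontedPostgresExample {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/users/count", exchange -> {
            String body;
            // The database listens only on localhost / the private network,
            // so all queries happen here on the server side.
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:postgresql://localhost:5432/appdb", "webapp", "CHANGE_ME");
                 Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT count(*) FROM users")) {
                rs.next();
                body = "user count: " + rs.getLong(1);
            } catch (Exception e) {
                body = "error: " + e.getMessage();
            }
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, bytes.length);
            exchange.getResponseBody().write(bytes);
            exchange.close();
        });
        server.start();
    }
}
```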

MongoDB connection overhead on client side

We are evaluating different alternatives for multi-tenancy in our platform. We think that one database per customer is the way to go as data structure and requirements are completely different from one customer to another, and we want to keep them as isolated as possible.
However we are facing the question of how to manage the connection to multiple databases. We don't want to have one app instance per customer. Instead we want to have a pool of app instances handling requests for all our customers and use the correct database depending on the customer.
Our concern is whether keeping connections open to many (maybe thousands of) databases will cause a performance issue. We are mainly worried about memory usage, so we are wondering what the overhead is on the client side when opening a connection to a MongoDB server.
We are also thinking about moving the database access to a separate service that would be responsible for handling the database connections for all customers. In that case, is there an existing tool that allows that kind of "multiplexing" of MongoDB databases?
Some additional notes:
We ruled out sharding; it doesn't fit our needs. We need separate databases.
The databases will be on different servers with reserved resources. This means each database runs in its own mongod process, so we need separate connections.
We use the Java driver.
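To make the question concrete, this is roughly the kind of client-side handling we have in mind: one MongoClient per mongod host, shared by all tenants on that host. This is only a sketch; the tenant-to-host lookup and the names are hypothetical:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TenantDatabaseRegistry {

    // One MongoClient per mongod host: each client owns a connection pool and
    // a few monitoring threads, so we want to share it across the tenants on
    // the same server rather than open one client per tenant.
    private final Map<String, MongoClient> clientsByHost = new ConcurrentHashMap<>();

    public MongoDatabase databaseFor(String tenantId) {
        String host = lookupHostFor(tenantId); // hypothetical tenant -> host mapping
        MongoClient client = clientsByHost.computeIfAbsent(
                host, h -> MongoClients.create("mongodb://" + h + ":27017"));
        // MongoDatabase objects are cheap handles; no extra connections are opened here.
        return client.getDatabase("tenant_" + tenantId);
    }

    private String lookupHostFor(String tenantId) {
        // Placeholder: in reality this would come from a tenant registry.
        return "mongo-" + tenantId + ".internal.example.com";
    }
}
```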

Multitenancy using LDAP Integration

I need your suggestions on the following multitenancy question:
I need to achieve multitenancy for my app. I've got it working with the traditional approach of a separate DB/schema per tenant (e.g. a separate schema for each tenant).
Now I need to integrate user validation from LDAP along with the multitenancy.
What I am thinking is to store user info plus DB/schema info (DB connectivity details) in the LDAP server to make the app more dynamic, since that would let me connect to any DB/schema irrespective of its physical location.
What's your opinion of this approach? Would it really be feasible?
If you see any cons, please share.
Thanks & Regards.
It sounds like you are trying to host multiple clients' systems on your system, and each client may have multiple connections from multiple locations. From your question it sounds like you are putting one customer per database, though the databases may not all be on the same cluster.
The first thing to do is to lock down PostgreSQL appropriately in pg_hba.conf and expose only those database/user combinations you want to expose. If you are going this route, LDAP sounds sane to me for publishing the connection info, since it means you can control who can access which accounts. Another option, possibly closely tied, would be to issue SSL certs and have clients authenticate with certificates, so each client can be issued a cert and use it to connect. You could even authenticate PostgreSQL against LDAP.
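If it helps, here is a rough sketch of what the lookup-then-connect flow could look like from Java. The LDAP attribute names (dbHost, dbName, dbUser, dbPassword) and the directory layout are assumptions for illustration, not a standard schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class LdapBackedConnectionLookup {

    public static Connection connectForTenant(String tenantId) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://ldap.example.com:389");
        env.put(Context.SECURITY_PRINCIPAL, "cn=appreader,dc=example,dc=com");
        env.put(Context.SECURITY_CREDENTIALS, "CHANGE_ME");

        DirContext ctx = new InitialDirContext(env);
        try {
            // Hypothetical per-tenant entry holding that tenant's DB coordinates.
            Attributes attrs = ctx.getAttributes(
                    "cn=" + tenantId + ",ou=tenants,dc=example,dc=com",
                    new String[] {"dbHost", "dbName", "dbUser", "dbPassword"});

            String url = "jdbc:postgresql://" + attrs.get("dbHost").get()
                    + "/" + attrs.get("dbName").get();
            return DriverManager.getConnection(url,
                    (String) attrs.get("dbUser").get(),
                    (String) attrs.get("dbPassword").get());
        } finally {
            ctx.close();
        }
    }
}
```

Whether you store the DB credentials themselves in LDAP, or only the host/schema and authenticate PostgreSQL directly against LDAP, is a design choice worth weighing against who can read those entries.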
So there are a lot of options here for using these two together once you have things otherwise properly secured. Best of luck.

Memcached/other key-value engine isolation

I have a bunch of web servers (frontends) behind a load balancer. Each Apache process runs with its own user for each virtual host. The code Apache runs is PHP, and it is not trusted code.
I need shared (between web servers) session storage, and I need to limit each user (vhost) to accessing only its own session storage. In other words, I want to prevent one tenant from purging or corrupting data another tenant has stored in memcached.
So basically I'm looking for a solution that authenticates users and provides private buckets.
I know the MySQL route is always available, but I want to avoid the performance penalty introduced by the SQL layer.
Any solutions come to mind so far?
I found a product called Couchbase which fully complies with my requirements. It has buckets along with a memcached-compatible caching layer and access protocol. It has SASL authentication and, as a bonus, load balancing and failure tolerance.
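For anyone landing here later, this is roughly how per-tenant bucket access looks with the Couchbase Java SDK (the same idea applies from the PHP SDK). The bucket name and credentials are placeholders, and I'm assuming each tenant gets a user whose roles are limited to its own bucket:

```java
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.json.JsonObject;
import com.couchbase.client.java.kv.GetResult;

public class TenantSessionStoreExample {
    public static void main(String[] args) {
        // The credentials belong to one tenant and only grant access to that
        // tenant's bucket, so one vhost cannot read or flush another's sessions.
        Cluster cluster = Cluster.connect("couchbase://cb.example.com",
                "tenant42_app", "CHANGE_ME");
        Bucket bucket = cluster.bucket("tenant42_sessions");
        Collection sessions = bucket.defaultCollection();

        // Store and read back a session document.
        sessions.upsert("session::abc123",
                JsonObject.create().put("userId", 7).put("cart", "empty"));
        GetResult result = sessions.get("session::abc123");
        System.out.println(result.contentAsObject());

        cluster.disconnect();
    }
}
```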