Sync two offline masters when network available - mongodb

I have a use case where I need to set up two physical stations at a venue. Each station will be running a couple of app servers and a MongoDB server.
I can't rely on the venue's internet access, so I need my app to be able to work offline and "sync" the DBs every once in a while.
I initially thought about having two masters that would somehow sync with a remote one, but TIL that master-master replication is not possible with MongoDB.
I've read about the active-active approach; however, that won't let me write to a different shard while offline.
I'm running out of ideas, any recommendation would be greatly appreciated.
------ Update on what I'm trying to achieve:
I'm working with a venue that has two entrances. The idea is to be able to capture some information from people attending the events (name, email, etc). After getting registered we will print a name tag with some of the info.
Everything sounds pretty easy; however, if possible, I would like to not rely on the venue's network (internet). That's where I started struggling to figure out what the best approach is. I guess what I want is to have a remote MongoDB instance, but if the network goes down, to somehow keep saving records locally and send them to the remote instance when the network is available again.
Extra considerations:
- Events last a couple of days. Some people lose their name tag overnight; they should be able to go to either of the entrances and get it reprinted. So we should be able to find their info even if they registered at entrance A but are asking for a reprint at entrance B.
More questions:
- Am I overthinking it? Maybe the venue's network plus a 4G/LTE modem as a backup should be enough? I would prefer not to rely on it, though.

I believe you're overthinking things. Here's what I would do if faced with a similar situation:
From the description, it doesn't sound like the two sites need to be connected in real time at all. I would create a server at entrance A, another at entrance B, and consolidate their data at the end of each day if required. This is because:
It's unlikely that one person will register at both sites within a single day. If they lose their tag on that same day, I'd just tell them to go back to where they registered earlier and get it reprinted there. Worst case, you'll create a duplicate entry (it should be obvious which one is the duplicate, since no one loses their tag within seconds), but I would not anticipate hundreds of people all losing their tags within a day.
If the attendee loses their tag overnight, both servers will have synced by then and either one should be able to reprint it.
If you're concerned about the venue's Wi-Fi, just run cables from the server to the printing stations.
Personally, I would argue that the overnight sync is not really needed at all (see the likelihood of people registering twice above). I would just collect the data from both servers after the event ends. That is, unless you have a specific need for the combined data from both entrances during the second day.
Note: please make sure you're running a minimum three-node replica set. Running a standalone instance in a production environment is not recommended; hardware/disk corruption is a common event.
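To make the end-of-day consolidation concrete, a small script can copy every registration from one entrance's MongoDB into the other's, keyed on email so a reprint works at either entrance the next morning. This is only a minimal sketch assuming the Node.js driver; the connection strings, database/collection names, and the choice of email as the dedupe key are illustrative, not anything from the original post.

```typescript
// Minimal end-of-day consolidation sketch (names and URIs are placeholders).
import { MongoClient } from "mongodb";

async function consolidate(sourceUri: string, targetUri: string): Promise<void> {
  const source = new MongoClient(sourceUri);
  const target = new MongoClient(targetUri);
  await Promise.all([source.connect(), target.connect()]);

  const from = source.db("venue").collection("attendees");
  const into = target.db("venue").collection("attendees");

  // Copy every registration; $setOnInsert keeps the existing record if the
  // same email was already registered on the target side.
  for await (const doc of from.find({})) {
    const { _id, ...fields } = doc;
    await into.updateOne(
      { email: fields.email },
      { $setOnInsert: fields },
      { upsert: true }
    );
  }

  await Promise.all([source.close(), target.close()]);
}

// Run once each evening, in both directions if both entrances should hold
// the full attendee list overnight:
consolidate("mongodb://entrance-b:27017", "mongodb://entrance-a:27017")
  .then(() => consolidate("mongodb://entrance-a:27017", "mongodb://entrance-b:27017"))
  .catch(console.error);
```

The same loop works against a remote MongoDB as the target whenever the venue's internet (or the 4G backup) happens to be up.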

Related

How do you sync many writable MongoDB databases?

What I mean by writable is that you can CRUD on each database, and it automatically syncs with the others so that all of them stay in sync all the time (as much as possible).
I want to start a project for a company, and it has a few tricky requirements.
The company is present in many locations (at least 5) and wants the app to run locally (with a local database), but when there's a change (create, update, or delete), the change should be propagated to the other databases.
The goal is to have them all synced at every moment, but with the possibility that if the internet connection is lost at one site, that site can keep using the app properly, since it is actually connected to the local database. That's why they don't want a fully online database.
They use MongoDB.
I have looked at replica sets, but since they have a single master, that approach seems complicated for this.
Please can you share solutions to such a situation?
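To make the kind of propagation being described a bit more concrete: assuming each site runs its own small replica set (change streams require one), a forwarder process per site could watch the local database and replay each change against a peer. This is purely an illustrative sketch with made-up names, and it deliberately ignores conflict resolution, which is the genuinely hard part of a multi-writer setup.

```typescript
// Illustrative only: forward local changes to one peer site. Connection
// strings, database and collection names are placeholders. Requires the
// local MongoDB to be a replica set (change streams do not work on a
// standalone instance).
import { MongoClient } from "mongodb";

async function forwardChanges(): Promise<void> {
  const local = new MongoClient("mongodb://localhost:27017");
  const peer = new MongoClient("mongodb://other-site:27017");
  await Promise.all([local.connect(), peer.connect()]);

  const source = local.db("app").collection("items");
  const target = peer.db("app").collection("items");

  // fullDocument: "updateLookup" gives us the whole document after an update,
  // so we can simply replace it on the peer.
  const stream = source.watch([], { fullDocument: "updateLookup" });

  for await (const change of stream) {
    switch (change.operationType) {
      case "insert":
      case "update":
      case "replace":
        if (change.fullDocument) {
          await target.replaceOne(
            { _id: change.documentKey._id },
            change.fullDocument,
            { upsert: true }
          );
        }
        break;
      case "delete":
        await target.deleteOne({ _id: change.documentKey._id });
        break;
    }
  }
}

forwardChanges().catch(console.error);
```

In practice you would need one forwarder per peer (or a central hub), a way to resume from the last change token after an outage (the resumeAfter option), and an agreed rule for which write wins when the same document is edited at two sites while they are disconnected.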

Stuck with understanding how to build a scalable system

I need some guidance on how to properly build out a system that will be able to scale. I will give you some information about what I am trying to do and then ask my specific question.
I have a site where I want visitors to send some data to be processed. They input the data into a textarea or upload it in a file. Simple. The data is somewhat preprocessed on the client side before a POST request is made to a REST endpoint.
What I am stuck on is this: what is a good way to take this posted data, store it, and associate an ID with it that references the user, given that I cannot process the data fast enough to return the result in a reasonable amount of time?
This question is a bit vague and open to opinion, I admit. I just need a push in the right direction to keep moving. What I have been considering is throwing the data into a message queue, having some workers process it elsewhere, and, when the processing is done, alerting the user where to find the result with some sort of link to an S3 bucket or just a URL to a file. The other idea was to loop client-side and run a request for each item against another endpoint that already processes individual records. The problem with this idea is as follows:
To process the data it may take somewhere from 30 minutes to 2 hours depending upon the amount that they want processed. It's not ideal for them to just sit there and wait for that to finish depending on the amount of records they need processed, so I have ruled this out mostly.
Any guidance would be very much appreciated as I don't have any coworkers to bounce things off of, nor do I know many people with the domain knowledge that I could freely ask. If this isn't the right place to ask this, could you point me in the right direction as to where it should be asked?
Chris
If I've got you right, your pipeline is:
Accept item from user
Possibly preprocess/validate it (?)
Put into some queue
Process data
Return result.
You can use one or several queues at stage (3). An entity from a user gets added to one of the queues. If it's big enough, it can be stored in S3 or similar storage, with only info about it put into the queue: a link, the add date, and a user ID (or email, or the like). Processors then pull items from the queue and give feedback to the users.
If you have no strict requirements on ordering, things get much simpler: you don't need any sync between them. Treat all the components (upload acceptors, queues, storage, and processors) as independent pools of processes. Monitor each pool separately. If a pool becomes a bottleneck, add machines to it.
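To illustrate the "store the payload, queue only a pointer" idea, here is a rough sketch using S3 and SQS from the AWS SDK v3 (any blob store and queue would do). The bucket name, queue URL, and message fields are placeholders, not anything prescribed by the answer.

```typescript
// Sketch: accept an upload, park the payload in object storage, queue a pointer.
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";
import { randomUUID } from "node:crypto";

const s3 = new S3Client({});
const sqs = new SQSClient({});

const BUCKET = "uploads-bucket";                                           // placeholder
const QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"; // placeholder

export async function enqueueJob(userId: string, payload: string): Promise<string> {
  const jobId = randomUUID();

  // Large payloads go to object storage; only a small reference rides the queue.
  await s3.send(new PutObjectCommand({
    Bucket: BUCKET,
    Key: `jobs/${jobId}`,
    Body: payload,
  }));

  await sqs.send(new SendMessageCommand({
    QueueUrl: QUEUE_URL,
    MessageBody: JSON.stringify({
      jobId,
      userId,
      s3Key: `jobs/${jobId}`,
      submittedAt: new Date().toISOString(),
    }),
  }));

  // Return the id right away; the user polls (or gets notified) for the result.
  return jobId;
}
```

The worker pool is the mirror image: poll the queue, fetch the object from S3, process it, write the result somewhere the user can reach (another S3 object, a URL), and delete the message.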

Change value after period of time in Firebase

I have a lobby in which I want the users to be in sync. So when a user turns off his internet while the app is running, he should be removed. I know Firebase does not support server-side coding, so the code needs to be client-side. The answers from How to delete firebase data after "n" days and Delete firebase data older than 2 hours do not answer this question, since they assume the user is online and has an internet connection. So my question is whether it is possible to remove users when they have no internet. I thought maybe it would be an idea to let each user update a value every 5 seconds, and when that update stops coming, the other users in that lobby remove the player. That approach is not good, since every player would need to retrieve and upload a lot of data every 5 seconds. What is the best way to solve this?
Edit: to make it short, let's say each user has an image. The image should be green when the user is connected, and grey when disconnected.
Edit 2: after thinking it over, it is really hard to accurately present the connected users client-side only. That is why, if nobody has a different solution, I should add another server that can execute server-side code. Because of the large number of server options, I would like to know which one I should use. The server should run a simple function which only checks whether the users are connected or disconnected and can communicate with Firebase. If I am correct, it should look like this:
But the server also needs to communicate with the users directly. I have absolutely no idea where to start.
If I'm not completely wrong, you should be able to use onDisconnect.
From the Firebase documentation:
How onDisconnect() works:
When an onDisconnect() operation is established, it lives on the Firebase Realtime Database server. The server checks security to make sure the user can perform the write event requested, and informs the client if it is invalid. The server then monitors the connection. If at any point it times out, or is actively closed by the client, the server checks security a second time (to make sure the operation is still valid) and then invokes the event.
In an app in production I'm using onDisconnectRemoveValue, and when I close the app, the user removes himself from the lobby. Not sure how it works when you put the device into airplane mode, but from the documentation it seems there should be no problem.
One thing: when you test it, better do it on a real device; the simulator has issues with turning the connection off and on, at least the one I have installed.
Edit: So I checked onDisconnect with the device in airplane mode and it works! The catch is that it removes the user after roughly 1:30 min, so if you read the documentation or ask support, you may (and only may) be able to find a way to set the timeout you want.
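For the green/grey indicator from the edits above, the usual client-side pattern combines the .info/connected flag with onDisconnect(). Below is a minimal sketch with the Firebase Web SDK (v9 modular API); the /status/<uid> path and the "online"/"offline" values are just one convention, not something Firebase mandates.

```typescript
// Presence sketch: mark a user online, and let the server flip them to
// offline when the connection drops or the app closes.
import { initializeApp } from "firebase/app";
import {
  getDatabase, ref, onValue, onDisconnect, set, serverTimestamp,
} from "firebase/database";

const app = initializeApp({ /* your Firebase config */ });
const db = getDatabase(app);

export function trackPresence(uid: string): void {
  const statusRef = ref(db, `status/${uid}`);
  const connectedRef = ref(db, ".info/connected");

  onValue(connectedRef, (snap) => {
    if (snap.val() === false) return; // not (yet) connected to the backend

    // Register the server-side cleanup first, then mark ourselves online.
    onDisconnect(statusRef)
      .set({ state: "offline", lastChanged: serverTimestamp() })
      .then(() => set(statusRef, { state: "online", lastChanged: serverTimestamp() }));
  });
}

// Other clients in the lobby listen on status/<uid> and switch the image:
export function watchPresence(uid: string, render: (online: boolean) => void): void {
  onValue(ref(db, `status/${uid}`), (snap) => render(snap.val()?.state === "online"));
}
```

The roughly 1:30 delay mentioned above still applies when the connection dies silently (airplane mode), because the server only runs the onDisconnect write after the connection times out; a clean app close or sign-out is detected immediately.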

Unique ids in MongoDB

I'm about to deploy my first production version of a web service that uses MongoDB.
This web service could be prone to attacks (hackers).
I have been using the built-in ObjectId as the unique identifier for each document, and this is exposed publicly (or at least to authenticated users).
Could this be a problem, considering that it's built from data such as the object creation timestamp, machine and process IDs, etc. (http://docs.mongodb.org/manual/core/object-id/)?
Could it be that I'm giving away too much information about when the object was created, how many machines are used, etc.?
What would your recommendations be?
Not really. It can't be used to map out your network or your machines, to find hidden objects, or even to chart your traffic and load times (as I have found out myself from trying).
For example, unlike an auto-incrementing ID, you cannot easily guess what timestamp, PID, or machine ID will be used to create the next ObjectID. This makes it very hard to crawl for hidden objects, especially if you don't publicly link to them anywhere.
The PID and machine ID are not very good at identifying anything about your network, and the PID can change almost any time it likes: whenever the process restarts, whenever you restart the server, or, if you are using a language like PHP, every time a new connection comes in.
The machine ID is another piece of mostly useless information that doesn't give meaningful results to anyone but your own computer. I don't believe it is derived from the network interface's ID any more (some drivers used to do that), so it cannot be used to identify the machine externally.
So in short, not really.
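To see what an ObjectId does and doesn't expose, you can take one apart with the bson package that ships with the Node.js driver. The id below is an arbitrary example; note that in current drivers the non-timestamp bytes are a per-process random value plus a counter rather than an explicit machine ID and PID.

```typescript
// The first 4 bytes of an ObjectId encode its creation time; the remaining
// bytes are a random value plus an incrementing counter in current drivers.
import { ObjectId } from "bson";

const id = new ObjectId("507f1f77bcf86cd799439011"); // arbitrary example value

console.log(id.getTimestamp());            // creation time, to the second
console.log(id.toHexString().slice(0, 8)); // "507f1f77" -> the timestamp bytes
console.log(id.toHexString().slice(8));    // random + counter, opaque to outsiders
```

So the only thing an outsider can reliably read off a leaked id is roughly when the document was created; the rest is not practically useful for mapping your infrastructure or guessing neighbouring ids.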

PostgreSQL availability and merges

Is there a PostgreSQL HA solution that can handle a split-brain situation gracefully? To elaborate, the system I'm working on is expected to run in several areas, with users close to the servers there, and connectivity between the zones is known to be questionable. I'd like the users to be able to continue using the system in a degraded state (without updates from disconnected zones) and to get a sensible merge once the zones come back online.
If you're prepared to live with a time delay, there should be some log shipping solutions that you could implement with a scheduled job. Basically, you send slices of the transaction log to the backup server. Here are some links with a better description:
http://developer.postgresql.org/pgdocs/postgres/warm-standby.html
http://developer.postgresql.org/~wieck/slony1/adminguide-1.1.rc1/logshipping.html
http://www.network-theory.co.uk/docs/postgresql/vol3/RecordbasedLogShipping.html
Note that a full implementation of Slony-I may be clunky (at least I found it that way a couple of years ago; it may have drastically improved since).