Getting PostgreSQL notifications into an AWS SQS Queue

An application that I'm working on uses AFTER INSERT/UPDATE/DELETE triggers to send pg_notify notifications when certain actions occur within the system. Currently, we have a small Node.js application that LISTENs for the events and immediately posts them to an AWS SNS topic, which forwards them to our SQS event queue. From that queue, we trigger all sorts of things based on the event (emails, SMS messages, Lambda functions, long-running jobs, etc.).
This architecture works well, but the Node.js application that sits between the PostgreSQL instance and the SNS topic seems a bit fragile. I can't really run two copies in two availability zones, because messages would be duplicated.
I'm looking for a better way to get these Postgres notifications into SQS. Are there any options for this? If Aurora PostgreSQL has something, we might consider that.

Use your current strategy of a small application that LISTENs for events. Just introduce a deduplication step between that app and your event subscribers. This will allow you to run several instances of your app.
For example, you could use an SQS FIFO queue, which drops any message whose deduplication ID matches one sent within the previous five minutes. Since standard SNS topics cannot deliver to FIFO queues, you would need to send messages directly to the queue instead of going through SNS.
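A minimal sketch of the FIFO approach, assuming each listener instance builds the SendMessage parameters itself: the deduplication ID is a SHA-256 hash of the channel and payload, so two instances that receive the same NOTIFY compute the same ID and SQS keeps only the first copy. The function name fifo_send_params and the choice of the channel name as MessageGroupId are illustrative assumptions. The resulting dict would be passed to boto3's SQS send_message call, which is deliberately not made here so the sketch runs without AWS credentials.

```python
import hashlib
import json


def fifo_send_params(queue_url: str, channel: str, payload: str) -> dict:
    """Build keyword arguments for an SQS FIFO SendMessage call (hypothetical helper).

    The deduplication ID depends only on the notification itself, so every
    listener instance that receives the same NOTIFY produces the same ID.
    """
    # sort_keys makes the serialized body (and therefore the hash) deterministic
    body = json.dumps({"channel": channel, "payload": payload}, sort_keys=True)
    # 64 hex characters, well under SQS's 128-character limit for this field
    dedup_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        # Assumption: one ordered message group per NOTIFY channel
        "MessageGroupId": channel,
        "MessageDeduplicationId": dedup_id,
    }
```

One design caveat: two genuinely distinct events with identical payloads inside the five-minute window would be collapsed as well, so the trigger should put something unique per event (for example the row's primary key plus a change timestamp) into the payload.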
Alternatively, you could use DynamoDB to store checksums of your recent messages and if your app encounters a duplicate, drop it manually (make sure to use conditional writes to prevent race conditions).
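The DynamoDB variant can be sketched the same way. Here a thread-safe in-memory set stands in for the table, and claim() plays the role of a conditional write (in DynamoDB, a PutItem with a ConditionExpression such as attribute_not_exists(pk), where pk is a hypothetical key attribute). InMemoryDedupStore and forward_once are illustrative names, not part of any AWS library.

```python
import hashlib
import threading
from typing import Callable


class InMemoryDedupStore:
    """Stand-in for a DynamoDB table keyed by message checksum (hypothetical).

    claim() mimics a conditional put: it succeeds only for the first caller.
    """

    def __init__(self) -> None:
        self._seen: set = set()
        self._lock = threading.Lock()  # makes check-and-insert atomic

    def claim(self, key: str) -> bool:
        with self._lock:
            if key in self._seen:
                return False  # equivalent of the conditional check failing
            self._seen.add(key)
            return True


def forward_once(message: str, store: InMemoryDedupStore,
                 publish: Callable[[str], None]) -> bool:
    """Publish the message only if this listener instance wins the claim."""
    checksum = hashlib.sha256(message.encode("utf-8")).hexdigest()
    if store.claim(checksum):
        publish(message)
        return True
    return False  # another instance already forwarded this message
```

Only the instance whose claim succeeds publishes; the other sees the condition fail and drops its copy. To keep only recent checksums, a DynamoDB TTL attribute on the items can expire old entries.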

Some options I've found:
Continue with the current method
I could keep the current small application that forwards events from my PostgreSQL RDS instance to SNS->SQS, and deploy it in a single region in an Auto Scaling group with min 1/max 1 to make sure no more than one copy runs at a time.
Ditch my RDS and use a self hosted database
I could ditch RDS, run PostgreSQL on an EC2 instance, and use PL/Python with the AWS SDK to call SNS instead of using pg_notify. I don't like this idea, because I would lose the ease of use that comes with RDS.
For now, I'll stick with the current method unless someone has other ideas I could explore. I'm sure more options will appear in the future (for example, if Aurora PostgreSQL adds support for invoking Lambda functions, as Aurora MySQL already does).

Related

Is it me or are DynamoDb Streams just really lacking?

I have an application running in multiple AWS regions that reads from global DynamoDB table(s). Updates happen in the background via another process, and I want to monitor for these updates so the application can invalidate its cache (I'm not using DAX).
I was thinking I could use DynamoDB Streams for this; however, I ran into a number of roadblocks with the Spring Cloud Stream Kinesis Binder. For example, it requires two tables (SpringIntegrationMetadataStore and SpringIntegrationLockRegistry) to be created, and my company doesn't allow dynamic creation of tables (that was fun to hunt down, as I couldn't find any mention of it in the docs 🤷‍♀️, maybe I missed it). Now I think I have found out that only one application can listen to a Kinesis stream at a time?
Is that true?
Is there a way for multiple applications that only read from DynamoDB to get notified when an update occurs? I was thinking I could use DynamoDB Streams so that each app would monitor the stream for updates and invalidate its cache. If the above is true, then I need something more involved (SNS/SQS, ElastiCache, Redis, Kafka), which just seems like overkill for this scenario.
"e.g. the fact that it requires 2 tables [SpringIntegrationMetadataStore & SpringIntegrationLockRegistry]"
Well, that's how consumer group management is handled by the Spring Cloud Stream Kinesis Binder. Even if you used only the KCL, it would still require an extra table in DynamoDB. So your concern sounds more like a lack of confidence in the cloud services you use.
"Now I think I have found out that only 1 application can listen to a Kinesis stream at a time?"
That's not true if all your consumer applications are configured for different consumer groups.
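To illustrate the consumer-group point, here is a toy in-process model, not the binder's actual implementation: a shared append-only stream where each group keeps its own checkpoint, so every group receives every record. In the Kinesis binder those per-group checkpoints live in the DynamoDB metadata table, which is one reason that table is required. SharedStream and the group names are hypothetical.

```python
from collections import defaultdict


class SharedStream:
    """Toy append-only stream in which each consumer group has its own checkpoint."""

    def __init__(self) -> None:
        self.records: list = []
        # group name -> index of the next record that group has not yet read
        self.checkpoints: dict = defaultdict(int)

    def append(self, record: str) -> None:
        self.records.append(record)

    def poll(self, group: str) -> list:
        start = self.checkpoints[group]
        batch = self.records[start:]
        # Advance only this group's checkpoint; other groups are unaffected
        self.checkpoints[group] = len(self.records)
        return batch
```

Instances that share a group split the work instead (each record goes to only one of them), so for per-instance cache invalidation each application instance would need its own group. Also note that the DynamoDB Streams documentation advises against having more than two processes read from the same shard at the same time, because additional readers can be throttled.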
Please familiarize yourself with Spring Cloud Stream and its model: https://docs.spring.io/spring-cloud-stream/docs/3.1.1/reference/html/spring-cloud-stream.html#_main_concepts
Another option might be an AWS Lambda trigger for DynamoDB Streams: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.html
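A minimal sketch of such a Lambda in Python, assuming the function only needs the primary keys of changed items: DynamoDB Streams delivers a batch under Records, and each record has an eventName (INSERT, MODIFY or REMOVE) and a dynamodb.Keys map in DynamoDB's typed JSON, e.g. {"pk": {"S": "user#1"}}. How the keys then reach each application (for example an SNS topic that every instance subscribes to) is left as a hypothetical step.

```python
from typing import Any


def changed_keys(event: dict) -> list:
    """Extract plain primary keys from a DynamoDB Streams batch."""
    keys_out = []
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY", "REMOVE"):
            continue
        typed_keys = record["dynamodb"]["Keys"]
        # {"pk": {"S": "user#1"}} -> {"pk": "user#1"}; each typed value has one type tag
        keys_out.append({name: next(iter(typed.values()))
                         for name, typed in typed_keys.items()})
    return keys_out


def handler(event: dict, context: Any) -> None:
    """Lambda entry point (Python runtime signature: handler(event, context))."""
    keys = changed_keys(event)
    # Hypothetical next step: publish `keys` to an SNS topic that every
    # application instance subscribes to, so each can evict them from its cache.
    print(f"{len(keys)} item(s) changed: {keys}")
```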

Firebase centered architecture

This is an architectural question.
I am currently designing a web application, and I am used to a basic frontend, API, database, and microservices setup.
For the sake of saving money and making my architecture a little bit more modern than what I am used to I decided to look into serverless.
The two main parts I am interested in are Google Cloud Functions and Firebase. My understanding is that Cloud Functions can be triggered when a database entry in Firebase is modified.
The way I used to communicate between services was through message queues such as RabbitMQ, but it seems to me that with Firebase and Cloud Functions you can build communication through the database without message queues. By communication I mean that one service could react to the execution of another service by seeing that an entry in the database was changed.
My question, therefore, is: what are the upsides and downsides of routing all "communication" between microservices through Firebase instead of message queues, and is this a commonly used architecture?
AFAIK, Cloud Functions triggers for Firestore are a beta feature, and according to the docs, there are some limitations for Firestore trigger events:
It can take up to 10 seconds for a function to respond to changes in Cloud Firestore.
Ordering is not guaranteed. Rapid changes can trigger function invocations in an unexpected order.
Events are delivered at least once, but a single event may result in multiple function invocations. Avoid depending on exactly-once mechanics, and write idempotent functions.
Cloud Firestore triggers for Cloud Functions is available only for Cloud Firestore in Native mode. It is not available for Cloud Firestore in Datastore mode.
The most concerning limitation here is the first one. 10 seconds for an update is a long time if you need that update to be visible to the user.
Another disadvantage I see is that it may get out of control (in terms of system design) as complexity increases. You may be tempted to add events for everything, and it may be hard to partition them by category, for example (in message queues, you can use topics for that).
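Also, the docs list a Cloud Functions rate limit of 16 calls per 100 seconds. Check which calls it covers (it appears to apply to the Cloud Functions API's call method rather than to trigger-driven invocations), because a limit like that could quickly be reached once your app gets some traffic.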
I would use trigger-events for isolated scenarios and use a message queue for the backbone communication between microservices.
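The at-least-once and unordered delivery caveats listed above mean a trigger handler should tolerate duplicate and stale events. A minimal Python sketch, assuming (hypothetically) that every document carries a version number the producer increments on each write: the handler applies an update only if its version is newer than the last one applied for that document. In a real Cloud Function the applied map would have to live in durable storage, such as a Firestore transaction, because function instances do not share memory.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedApplier:
    """Applies each document update at most once and never moves backwards."""

    applied: dict = field(default_factory=dict)  # doc id -> last applied version
    state: dict = field(default_factory=dict)    # doc id -> last applied document

    def handle(self, doc_id: str, doc: dict) -> bool:
        version = doc["version"]  # assumed producer-maintained counter
        if version <= self.applied.get(doc_id, -1):
            return False  # duplicate or out-of-order (stale) delivery: ignore it
        self.applied[doc_id] = version
        self.state[doc_id] = doc
        return True
```

Because replaying an already-applied event is a no-op, the handler is idempotent, and a late-arriving older update cannot overwrite a newer one.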

how to design a realtime database update system?

I am designing a WhatsApp-like messenger application for the desktop using WPF and .NET. When a user creates a group, I want the other members of the group to receive a notification that they were added to it. My frontend is built in C#/.NET and connects to a RESTful web service (Ruby on Rails). I am using Postgres for the database, and I also have a Redis layer to cache my Rails models.
I am considering the following options.
1) Use Postgres's built-in NOTIFY/LISTEN mechanism, which the clients can subscribe to directly. I foresee two issues here:
i) Postgres might not be able to handle tens of thousands of directly subscribed clients.
ii) There is no delivery guarantee if a client is disconnected.
2) Use Redis' Pub/Sub mechanism, which the clients can subscribe to. I am still concerned about the lack of delivery guarantees here.
3) Use a message queue like RabbitMQ. The producer for this queue would be Postgres, pushing messages through triggers. The consumers, of course, would be the .NET clients.
So far, I am inclined to use the 3rd option.
Does anyone have any suggestions on how to design this?
In an application like WhatsApp itself, the client running in your phone is an integral part of a large and complex event-based, distributed system.
Without more context, it would be impossible to point in the right direction. That said:
For option 1: You seem to imply that each client (as in a WhatsApp client) would communicate with Postgres as an event bus, directly or through some web service. That is not sound and would not scale, because you can only have ONE Postgres instance (the primary) handling it.
For option 2: You have the same problem as in option 1, with worse failure modes.
For option 3: RabbitMQ seems like a reasonable ally here. It is distributed by nature and scales well. As a matter of fact, it runs on Erlang, just as most of WhatsApp does. Using triggers inside Postgres to publish messages, however, does not make a lot of sense.
You need a message bus because you would have lots of updates to do in the background, not to directly connect your users to each other. As you said, clients can be offline.
Architecture is more about deferring decisions than taking them.
I suggest that you start simple. Build a small, monolithic, synchronous system first that pushes updates as persisted data to all the involved users. For example, in a group of n users, just write n records to a table. Reliably keeping track of who has received and read what is already complicated enough.
These heavy group updates can later be moved to long-running processes using RabbitMQ or similar, but a system with several thousand users can work very well without such a thing, especially because a simple message from user A to user B does not need many writes.

How to send instance-wide notifications in PostgreSQL, across different databases?

I'm using PostgreSQL's NOTIFY command to send asynchronous events that inform external programs of changes happening inside a database. It works perfectly, but now I have a new scenario: I need several databases within one PostgreSQL instance.
As I've read in the documentation and confirmed by testing, NOTIFY does not go beyond the borders of a database (to other databases within the same PostgreSQL instance):
"Whenever the command NOTIFY channel is invoked, either by this session or another one connected to the same database, all the sessions currently listening on that notification channel are notified, and each will in turn notify its connected client application."
This means I have to listen for notifications in each database separately. And since I'm planning to let my users create their own databases on demand, I would also have to open a new listener connection for each new database. That poses a challenge; I would much prefer a constant number of listener connections, regardless of the number of databases.
Does anyone know how to send notifications across databases in PostgreSQL, or of some other feature I can use?

How to connect meteor to an existing backend?

I recently discovered Meteor, and I really love the simplicity that it brings to programming new apps. My question is: how do you connect it to an existing back-end? We have a substantial amount of existing Clojure code, also running with MongoDB. What I would like to do is use Meteor to build the front-end of my app. I guess I could connect my Meteor app directly to the MongoDB instance of the back-end, but this does not seem like a good practice... or is it?
Another option I imagined was to access the DB from either the webapp or the Clojure code and create a separate way of communication between the two with a queue mechanism, or sockets. Any hint or pointer to relevant documentation would be helpful!
Take a look at Meteor's environment variable settings. By setting these variables you can easily point Meteor at an external MongoDB instance. In particular:
$ export MONGO_URL="mongodb://yourmongodbserver/your-db"
There is a screencast on eventedmind.com covering this specific topic, https://eventedmind.com/feed/sg3ejYnmhxpBNoWan, which is quite helpful.
Regarding how to point them at the same database, Michael's answer is spot on: just point your Meteor web servers at the same MongoDB.
Regarding whether or not you should, that depends on your situation. Having everything run off the same DB certainly simplifies things.
Having separate databases can potentially reduce the load on your database tier, since you could selectively choose which writes/updates to replicate between the Clojure and Meteor databases.
One issue with either method is how quickly changes are noticed. Currently, Meteor servers poll the DB every 10 seconds to detect changes. Happily, once the oplog branch is merged into master, it will greatly speed up how quickly external changes made in the DB (as opposed to changes made directly through a Meteor server) are reflected in Meteor clients. Oplog support will let Meteor servers emulate a replica-set member and tail the oplog, which means practically instant notification of DB changes.
Using a queue as a middleware layer introduces complexity and adds another point of failure. It also increases notification latency. These issues can be mitigated, though, and other pieces of your infrastructure may benefit from such a middleware queue in the future. For example, other interested systems could register with the queue to receive change notifications without querying, or even knowing about, your DB. You can also scale your MongoDB instances independently and tune the queue to decide what "eventually" means in the "eventually consistent" guarantee.
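The point about other systems registering with the queue can be illustrated with a toy in-process bus: each subscriber gets its own queue, so the publisher never needs to know who is listening, and subscribers never query the database. This is only a sketch of the idea; in RabbitMQ the rough equivalent is an exchange with one queue bound per consumer. ChangeBus and the topic names are hypothetical.

```python
import queue
from collections import defaultdict


class ChangeBus:
    """Toy broker: every subscriber to a topic gets its own copy of each change."""

    def __init__(self) -> None:
        self._subscribers: dict = defaultdict(list)  # topic -> list of queues

    def subscribe(self, topic: str) -> queue.Queue:
        q: queue.Queue = queue.Queue()
        self._subscribers[topic].append(q)
        return q

    def publish(self, topic: str, change: dict) -> int:
        """Deliver the change to every subscriber queue; return how many got it."""
        subscribers = self._subscribers[topic]
        for q in subscribers:
            q.put(change)
        return len(subscribers)
```

Adding a new consumer (say, a billing service next to the Meteor app) is just another subscribe call; neither the publisher nor the database changes.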
I think the questions to ask are:
how much overlap is there between the Clojure dataset and the Meteor dataset
how quickly do you need changes to be reflected between the two
will a middleware queue be useful in other circumstances as you grow
Regarding possible queue technologies to look into, I've heard very good things about RabbitMQ. The Oct. 2013 talk at the Clojure NYC meetup included a description of switching to RabbitMQ from Amazon SQS due to latency issues with SQS and anecdotally RabbitMQ has been rock-solid for them.