Architecting a configurable user notification service

I am building an application which needs to send notifications to users at a fixed time of day. Users can choose which time of day they would like to be notified, and which days they would like to be notified. For example, a user might like to be notified at 6am every day, or 7am only on week days.
On the back-end, I am unsure how to architect the service that sends these notifications. The solution needs to handle:
concurrency, so I can scale my servers (notifications should not be duplicated)
system restarts
if a user changes their preferences, pending notifications should be rescheduled
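To make the preference model above concrete, here is a minimal sketch of how "a time of day plus a set of days" could be turned into the next notification time. The `NotificationPreference` class and `next_fire_time` function are hypothetical names invented for this illustration, and the sketch assumes naive local datetimes (no time zone or daylight-saving handling).

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class NotificationPreference:
    """Hypothetical per-user preference: a time of day plus allowed weekdays."""
    user_id: int
    at: time                 # local time of day, e.g. time(7, 0)
    weekdays: frozenset      # datetime.weekday() values: 0 = Monday ... 6 = Sunday


def next_fire_time(pref, now):
    """Return the first datetime strictly after `now` that matches `pref`."""
    if not pref.weekdays:
        raise ValueError("preference must allow at least one weekday")
    # Looking 8 days ahead covers the case where today is the only allowed
    # day and today's slot has already passed.
    for offset in range(8):
        candidate = datetime.combine(now.date() + timedelta(days=offset), pref.at)
        if candidate > now and candidate.weekday() in pref.weekdays:
            return candidate
    raise AssertionError("unreachable: a matching day exists within 8 days")


weekday_7am = NotificationPreference(1, time(7, 0), frozenset({0, 1, 2, 3, 4}))
# Friday 2024-03-01 at 08:00 is already past 7am, so the next slot is Monday.
print(next_fire_time(weekday_7am, datetime(2024, 3, 1, 8, 0)))
```

Because the next time is always derived from the stored preference, a preference change only requires recomputing this value; nothing has to be tracked per pending notification.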

Using a message broker such as RabbitMQ and task scheduler such as Celery may meet your requirements.
Asynchronous, or non-blocking, processing is a method of separating the execution of certain tasks from the main flow of a program. This provides you with several advantages, including allowing your user-facing code to run without interruption.
Message passing is a method which program components can use to communicate and exchange information. It can be implemented synchronously or asynchronously and can allow discrete processes to communicate without problems. Message passing is often implemented as an alternative to traditional databases for this type of usage because message queues often implement additional features, provide increased performance, and can reside completely in-memory.
Celery is a task queue built on an asynchronous message-passing system. It can be used as a bucket into which programming tasks are dumped. The program that passed the task can continue to execute and stay responsive, and later it can poll Celery to see whether the computation is complete and retrieve the result.
While Celery is written in Python, its protocol can be implemented in any language. The worker that ships with Celery is an implementation in Python. If your language has an AMQP client, there shouldn't be much work involved in creating a worker in that language. A Celery worker is just a program connecting to the broker to process messages.
There is also another way to be language independent: use REST tasks, where your tasks are URLs instead of functions. With this approach you can even create simple web servers that enable preloading of code. Simply expose an endpoint that performs an operation, and create a task that just performs an HTTP request to that endpoint.
Here is the Python example from the official documentation:
from celery import Celery
from celery.schedules import crontab
app = Celery()
@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Calls test('hello') every 10 seconds.
    sender.add_periodic_task(10.0, test.s('hello'), name='add every 10')
    # Calls test('world') every 30 seconds.
    sender.add_periodic_task(30.0, test.s('world'), expires=10)
    # Executes every Monday morning at 7:30 a.m.
    sender.add_periodic_task(
        crontab(hour=7, minute=30, day_of_week=1),
        test.s('Happy Mondays!'),
    )
@app.task
def test(arg):
    print(arg)

As I see it, you need three types of entities: users (to store an email address or some other way to reach the user), notifications (to store what you want to send to the user: text, etc.) and schedules (to store when the user wants to get notifications). You need to store entities of those types in some kind of database.
A schedule should be connected to a user; a notification should be connected to a user and a schedule.
Assume you have a cron job that starts a script every minute. This script fetches all notifications connected with a schedule for the current time (the job's start time). Don't forget to implement some kind of overlap prevention, so that two runs of the script never handle the same minute.
The script then places tasks (with all the needed data: type of notification, users to notify, etc.) in a queue (beanstalkd or something similar). You can create as many workers as you want (even on different physical instances) to serve this queue without thinking about duplication, which gives you a lot of scalability.
If a user changes their schedule, the change affects all of their notifications at once. There are no pending notifications, because notifications are only looked up at the moment they should be sent.
This is a very high-level description. Many things depend on the language, database(s), queue server, and worker implementation.
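The every-minute script and its overlap prevention could be sketched roughly as follows. This is only an illustration under stated assumptions: the table layout, the `dispatch_due` function, and the `dispatch_log` table are invented for the example, SQLite stands in for whatever database you use, and a plain Python list stands in for the queue.

```python
import sqlite3
from datetime import datetime

# Hypothetical schema: one row per user schedule, plus a log of what was sent.
SCHEMA = """
CREATE TABLE schedules (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    hour INTEGER NOT NULL,
    minute INTEGER NOT NULL,
    weekdays TEXT NOT NULL          -- e.g. '0,1,2,3,4' with 0 = Monday
);
CREATE TABLE dispatch_log (
    schedule_id INTEGER NOT NULL,
    fire_minute TEXT NOT NULL,      -- 'YYYY-MM-DDTHH:MM'
    PRIMARY KEY (schedule_id, fire_minute)
);
"""


def dispatch_due(conn, now, enqueue):
    """Enqueue one task per schedule due in the minute containing `now`.

    Returns the number of tasks enqueued by this call.
    """
    fire_minute = now.strftime("%Y-%m-%dT%H:%M")
    rows = conn.execute(
        "SELECT id, user_id, weekdays FROM schedules WHERE hour = ? AND minute = ?",
        (now.hour, now.minute),
    ).fetchall()
    sent = 0
    for schedule_id, user_id, weekdays in rows:
        if str(now.weekday()) not in weekdays.split(","):
            continue
        # The primary key makes this insert the overlap guard: only the first
        # dispatcher to record (schedule, minute) sees rowcount == 1.
        cur = conn.execute(
            "INSERT OR IGNORE INTO dispatch_log VALUES (?, ?)",
            (schedule_id, fire_minute),
        )
        if cur.rowcount == 1:
            enqueue({"user_id": user_id, "schedule_id": schedule_id})
            sent += 1
    conn.commit()
    return sent


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO schedules VALUES (1, 42, 7, 0, '0,1,2,3,4')")
queue = []                                   # stand-in for beanstalkd or similar
monday_7am = datetime(2024, 3, 4, 7, 0, 30)
print(dispatch_due(conn, monday_7am, queue.append))   # first run enqueues
print(dispatch_due(conn, monday_7am, queue.append))   # overlapping run is a no-op
```

Since the schedule is read fresh on every run, a changed preference takes effect on the next minute. Note that this sketch enqueues before committing, so a crash between the two steps could still produce a duplicate; whether to favour duplicates or drops in that window is a design choice.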

Related

non-stop workers in celery

I'm creating a distributed web crawler that crawls multiple social media at the same time. This system is designed to distribute the available resources to the different social media based on their current post rate.
For example, if social media 1 has 10 new posts per hour and social media 2 has 5 posts per hour, 2 crawlers focus on social media 1 and 1 crawler focus on social media 2 (if we are allowed to have just three crawlers).
I have decided to implement this project via Celery, Flask, rabbitMQ, and Kubernetes as the resource manager.
I have some questions regarding the deployment:
How can I tell celery to keep a fixed number of tasks in rabbitMQ? This crawler should never stop crawling and should create a new task based on the social media's post rates (which is gathered from the previous crawling data), but the problem is, I don't have a task submitter for this process. Usually, there is a task submitter for celery that submits tasks, but there is no such thing as a task submitter in this project. We have a list of social media and the number of workers they need (stored in Postgres) and need celery to put a task in rabbitMQ as soon as a task is finished.
I have tried the solution to submit a task at the end of every job (Crawling Process), but this approach has a problem and is not scalable. In this case, the submitted job would be the last in the rabbitMQ queue.
I need a system to manage the free workers and assign tasks to them immediately. The system I want should check the free and busy workers and database post rates and give a task to the worker. I think using rabbitMQ (or even Redis) might not be good because they are message brokers which assign a worker to a task in the queue but here, I don't want to have a queue; I want to start a task immediately when a free worker is found. The main reason queueing is not good is that the task should be decided when the job is starting, not before that.
My insights on your problem.
"I need a system to manage the free workers and assign tasks to them immediately."
Celery does this job for you.
"The system I want should check the free and busy workers and database post rates and give a task to the worker."
Celery is a task distribution system; it will distribute the tasks as you expect.
"I think using rabbitMQ (or even Redis) might not be good because they are message brokers which assign a worker to a task in the queue"
With Celery you definitely need a broker. The broker just holds your messages; Celery polls the queues and distributes the messages to the right workers (priority, timeout, soft handling, retries).
"but here, I don't want to have a queue; I want to start a task immediately when a free worker is found. The main reason queueing is not good is that the task should be decided when the job is starting, not before that."
This is a kind of chain reaction, like triggering a new job once the previous one is done. If this is the case, you don't even need Celery or a distributed producer-consumer system.
Identify the problem:
1. Do you need a periodic task to be executed at a point in time? ---> Go with a cron job or celery beat (a cron-style scheduler for Celery).
2. Do you require multiple tasks to be executed without blocking the other running tasks? ---> You need a producer-consumer system (Celery as an out-of-the-box solution, or native Python consumers for RabbitMQ/Redis).
3. If the same task should trigger the new task, there is no need for multiple workers. What would you achieve with multiple workers if the work is just a single thread?
Outcome: [Celery, RabbitMQ, and Kubernetes: a good combination for a distributed, orchestrated system] or [a webhook model] or [a recursive Python script]
Reply to your comment below, @alavi:
One way of doing it is to write a periodic job (running every second, minute, hour, or at whatever rate you need) using celery beat, which acts as a producer or parent task. It can iterate over all media sites from the DB and spawn a new task for crawling each one. The work status can also be maintained in the DB, and new tasks can be spawned based on that status. As a start, this parent task can check whether the previous job is still running, or check the progress of the last task, and decide what to do based on that. We can even think about splitting the crawl job into micro tasks that are triggered from the parent job. You can refine this further during development or as performance data comes in.
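For the step where the parent task decides how many crawl tasks each site should get, one possible sketch is a proportional split of the available crawlers by post rate. The `allocate_crawlers` function and the site names are hypothetical, and the largest-remainder rounding used here is just one reasonable choice, not something prescribed by Celery.

```python
def allocate_crawlers(post_rates, total_crawlers):
    """Split `total_crawlers` across sites in proportion to their post rates.

    Uses the largest-remainder method so the counts always sum to the total.
    """
    total_rate = sum(post_rates.values())
    if total_rate <= 0:
        raise ValueError("at least one site must have a positive post rate")
    # Ideal (fractional) share of crawlers for each site.
    exact = {
        site: total_crawlers * rate / total_rate
        for site, rate in post_rates.items()
    }
    # Start from the rounded-down shares...
    counts = {site: int(share) for site, share in exact.items()}
    leftover = total_crawlers - sum(counts.values())
    # ...then hand the remaining crawlers to the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - counts[s], reverse=True)
    for site in by_remainder[:leftover]:
        counts[site] += 1
    return counts


# The example from the question: 10 and 5 posts per hour, three crawlers.
print(allocate_crawlers({"social_media_1": 10, "social_media_2": 5}, 3))
```

The periodic parent task could compare these target counts with the number of crawl tasks currently recorded as running in the DB and spawn only the difference.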

Recurring function at date/time

I'm trying to call a function when my macOS application is in any state, including terminated. Here is what I'm trying to accomplish:
Schedule a function (much like DispatchQueue.main.asyncAfter()) to run daily at a given time (let's say 9AM). I would like to add a feature to my application that allows a user to pick a time of day, and have an Alamofire POST request run at that time every day.
I have tried using a Runloop, and more recently Grand Central Dispatch:
DispatchQueue.main.asyncAfter(wallDeadline: DispatchWallTime.now() + .seconds(60)) {
    // Alamofire
}
I can easily accomplish this while the application is running with a timer, but have yet to find a way to accomplish this in the background, with the app running.
This may be pretty heavy to implement (i.e. not straightforward), but if you want a task to run even if your app is terminated, you might need to consider writing your own LaunchAgent.
The trick here would be for the agent to be able to interact with your application (retrieving or sending shared information).

Scheduling/delaying of jobs/tasks in Play framework 2.x app

In a typical web application, there are some things that I would prefer to run as delayed jobs/tasks. They tend to have some or all of the following properties:
Takes a long time (anywhere from multiple seconds to multiple minutes to multiple hours).
Occupy some resource heavily (CPU, network, disk, external API limits, etc.)
Result not immediately necessary. Can complete HTTP response without it. OK (and possibly even preferable) to delay until later.
Can be (and possibly preferable to) run on (a) different machine(s) than web server(s). The machine(s) are potentially dedicated job/task runners.
Should be run in response to other event(s), or started periodically.
What would be the preferred way(s) to set up, enqueue, schedule, and run delayed jobs/tasks in a Scala + Play Framework 2.x app?
For more details...
The pattern I have used in the past, and which I would like to replicate if applicable, is:
In handler of web request, or in cron-like call, enqueue job(s)
In job runner(s), repeatedly dequeue and run one job at a time
Possibly handle recording job results
This seems to be a relatively simple yet still relatively flexible pattern.
Examples I have encountered in the past include:
Updating derived data in DB
Analytics/tracking API calls for a web request
Delete expired sessions or other stale/outdated DB records
Periodic batch ETLs
In other languages/frameworks, I would typically use a job/task framework. Examples include:
Resque in a Ruby + Rails app
Celery in a Python + Django app
I have found the following existing materials, but unfortunately, I don't think they fit my use case directly.
Play 1.x asynchronous jobs API (+ various SO questions referencing it). Appears to have been removed in 2.x line. No reference to what replaced it.
Play 2.x Akka integration. Seems very general-purpose. I'd imagine it's possible to use Akka for the above, but I'd prefer not to write a jobs/tasks framework if one already exists. Also, no info on how to separate the job runner machine(s) from your web server(s).
This SO answer. Seems potentially promising for the "short to medium duration IO bound" case, e.g. analytics calls, but not necessarily for the "CPU bound" case (probably shouldn't tie up CPU on web server, prefer to ship off to different node), the "lots of network" case, or the "multiple hour" case (probably shouldn't leave that in the background on the web server, even if it isn't eating up too many resources).
This SO question, and related questions. Similar to above, it seems to me that this covers only the cases where it would be appropriate to run on the same web server.
Some further clarification on use-cases (as per commenters' request). There are two main use-cases that I have experienced with something like resque or celery that I am trying to replicate here:
Some event on the site (Most often, an incoming web request causes task to be enqueued.)
Task should run periodically. (Most often, this is implemented as: periodically, enqueue task to be run as above.)
In the case of resque or celery, the tasks enqueued by both use-cases enter queues the same way and are treated the same way by the runner/worker process. Barring other Scala or Play-specific considerations, that would be my initial guess for how to approach this.
Some further clarification on why I do not believe the Akka scheduler fits my use case out-of-the-box (as per commenters' request):
While it is no doubt possible to construct a fitting solution using some combination of the Akka scheduler (for periodic jobs), akka-remote and akka-cluster (for communicating between the job caller and the job runner), that approach requires a certain amount of glue code which is almost a delayed job framework in and of itself. If it exists, I would prefer to use an existing out-of-the-box solution rather than reinvent the wheel.

Sending Reminders for Tasks

I have recently been thinking about possible architecture for a simple task reminder system. User will schedule a task and reminder in form of SMS/email/android needs to be sent to all stakeholders at some x minutes before the task is scheduled to be performed(much in the same way google calendar works). The problem here is to send the reminder at that precise point in time. Here are the two possible approaches I can think of:
Cron: I can setup a cron to run every minute. This will scan the table for notifications which need to be sent in the next minute and simply sends the notifications. But, precision is lost as there is always the chance of that +/-1 min error.
Work Queues: I can simply put a message with the appropriate delay in a queue at the time the task is scheduled. Workers will send the notification as and when they receive the message. I can add as many workers as I want in case my real-time behavior starts getting affected by load. There are still a few issues. How do I choose the appropriate work queue? I have evaluated RabbitMQ and Beanstalk. While RabbitMQ follows the standard AMQP protocol and is widely suggested, it doesn't provide delay functionality out of the box. There are ways to simulate this using dead-letter exchanges, but this will not work in my case because the delay needs to be variable. Beanstalk supports this, but the problem is that the beanstalk queue resides entirely in memory, which I don't like (but can live with). Any possible alternatives?
Third Approach: ??????. I am sure a simple desktop notification tool does neither of the two. What technology do they use to achieve the same thing?
We had the same scenario, and we use Redis for long-range schedules; we currently hold reminders up to 2 years out. You can use a Sorted Set where the timestamp is the score.
We use Beanstalkd delayed jobs for reminders that we know are relatively short term (a couple of hours) and will not be cancelled. To remove a delayed message from beanstalkd you need to keep the job id in a database for later removal, and that is not viable.
Although you mention the memory limitation, we use persistence on both Redis and Beanstalkd.
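The sorted-set approach can be sketched as below. This is an illustration rather than our production code: the `reminders` key and the helper function names are made up, and `InMemorySortedSets` is only a stand-in so the example runs without a server. Its three methods mirror the `zadd`, `zrangebyscore`, and `zrem` calls of the redis-py client, which is assumed to be what a real deployment would pass in (note that redis-py returns members as bytes unless `decode_responses=True` is set).

```python
import time


class InMemorySortedSets:
    """Stand-in exposing the three sorted-set calls used below.

    Method names and argument shapes follow redis-py's `zadd`,
    `zrangebyscore`, and `zrem`; this is not a full Redis emulation.
    """

    def __init__(self):
        self._sets = {}

    def zadd(self, name, mapping):
        self._sets.setdefault(name, {}).update(mapping)

    def zrangebyscore(self, name, min, max):
        members = self._sets.get(name, {})
        hits = [(score, m) for m, score in members.items() if min <= score <= max]
        return [m for score, m in sorted(hits)]

    def zrem(self, name, *values):
        members = self._sets.get(name, {})
        return sum(1 for v in values if members.pop(v, None) is not None)


REMINDERS = "reminders"   # hypothetical key name


def schedule_reminder(client, reminder_id, fire_at):
    """Store the reminder with its Unix timestamp as the score."""
    client.zadd(REMINDERS, {reminder_id: fire_at})


def cancel_reminder(client, reminder_id):
    """Cancelling is just removing the member; no job id bookkeeping needed."""
    return client.zrem(REMINDERS, reminder_id) == 1


def pop_due(client, now):
    """Return reminders whose timestamp has passed, claiming each one."""
    due = []
    for reminder_id in client.zrangebyscore(REMINDERS, 0, now):
        # zrem returns how many members it removed, so only the worker that
        # actually removes the entry gets to send it.
        if client.zrem(REMINDERS, reminder_id) == 1:
            due.append(reminder_id)
    return due


client = InMemorySortedSets()
now = time.time()
schedule_reminder(client, "task:1", now - 5)        # already due
schedule_reminder(client, "task:2", now + 3600)     # due in an hour
schedule_reminder(client, "task:3", now - 1)
cancel_reminder(client, "task:3")                   # cancelled before firing
print(pop_due(client, now))
```

A worker would call `pop_due` on a short interval; the precision of the reminder is then bounded by that polling interval.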

Queue suggestions for deferred execution for a one-off task

I'm looking for a lightweight system that will let me queue up a one-off (non-recurring) task and have it execute at a specific time in the future.
This is for the backend of a game where the user does tasks that are time-based. I need the server to check the status of the user's "job" at the completion time and perform the necessary housekeeping on their game state.
I'm somewhat familiar with Redis, Celery, Beanstalkd, ZeroMQ, et al., but I haven't found any info on scheduling a single unit of work to be executed in the future. (or pop off the queue at a set time) Celerybeat has a scheduler for cron-type recurring tasks, but I didn't see anything for one-off.
I've also seen the "at" command in *nix, but I'm not aware of any frontend for it that can help me manage the jobs.
I realize there are some easy solutions such as ordering keys in Redis and doing a blocking pop, but I'd like to not have to continuously poll a queue to see if the next job is ready.
The closest I've found is the deferred library on GAE, but I was hoping for something that runs on my own Linux box along with my other components.
I'd appreciate any suggestions!
Celery lets you specify a countdown or an ETA when you call a task, so that the task is executed at that point in the future.
The documentation says it best:
http://docs.celeryproject.org/en/latest/userguide/calling.html#eta-and-countdown