C# What is my best option? Service/Application/Multiple Applications [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
I am developing a solution that requires a number of tasks to be completed at various times. Example:
Task 1 - Monitor mailbox, process mail items
Task 2 - Monitor mailbox (different folder), process mail items
Task 3 - Generate PDF reports
Task 4 - Monitor folder, distribute files via email as attachments when new ones arrive.
I have already implemented a solution; however, it was basically just a quick fix to get the thing running. Now that it is up, I want to revisit the current setup and make it as efficient as possible.
For the current solution I created a separate application for each task and used the Task Scheduler to execute them at specific times.
Task 1 is a console application that runs as a scheduled task every 5 minutes.
Task 2 is a console application that runs as a scheduled task every 5 minutes, starting 2 minutes after the first application, because Task 1 moves emails into the folder Task 2 is monitoring.
Task 3 runs at 5am every day as a run-once application on a scheduled task.
Task 4 runs indefinitely.
My question is: does this seem like a reasonable approach for this type of solution? Would some of the tasks be better off as a service rather than an application?

I think I'd probably use a single service which can be easily configured to run the various tasks (so that if you want to separate them later, you can do so).
Scheduling specific applications is okay and certainly a simpler way of working, but this feels more like a service to me. Of course, if you separate out the "doing stuff" logic from the "invocation" side of things, you can easily switch from one to the other.
The efficiency side of things is unlikely to change much by this decision. Do you have good grounds to be worried about the overall efficiency at the moment? Have you profiled your applications to work out where any bottlenecks are? I'd say they're unlikely to be in the scheduling side of things.
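The suggestion above, separating the "doing stuff" logic from the "invocation" side, can be sketched language-agnostically (Python here for brevity; the same shape applies in C#). All names are illustrative:

```python
# Sketch of separating task logic from invocation, so the same functions
# can be driven by scheduled console apps today or by one service later.
# All names here are illustrative.

def process_mailbox(folder):
    """Task 1 / Task 2 logic: monitor a mailbox folder, process items."""
    return f"processed {folder}"

def generate_reports():
    """Task 3 logic: generate the daily PDF reports."""
    return "reports generated"

# Invocation side: a thin registry that a scheduler, a console entry
# point, or a service timer loop can all call into.
TASKS = {
    "inbox": lambda: process_mailbox("Inbox"),
    "processed": lambda: process_mailbox("Processed"),
    "reports": generate_reports,
}

def invoke(name):
    return TASKS[name]()
```

With this split, switching from four scheduled console apps to one configurable service only changes who calls `invoke`, not the task logic itself.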

A service sounds like the right way to approach this.
Long-running subtasks such as PDF generation are well suited to asynchronous execution, i.e. using worker threads that call back to the parent thread upon completion. That way the monitoring tasks can run independently of the action tasks.
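A rough sketch of that worker-thread-with-callback shape (in Python for brevity; `generate_pdf` and the callback names are illustrative):

```python
# Worker threads perform the long-running work; a completion callback
# fires when each finishes, so the monitoring side never blocks.
from concurrent.futures import ThreadPoolExecutor

completed = []

def generate_pdf(report_id):
    # stand-in for a long-running PDF generation job
    return f"report-{report_id}.pdf"

def on_done(future):
    # called back upon completion of the worker's task
    completed.append(future.result())

with ThreadPoolExecutor(max_workers=2) as pool:
    for report_id in (1, 2):
        pool.submit(generate_pdf, report_id).add_done_callback(on_done)
# leaving the 'with' block waits for all workers to finish
```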

Non-stop workers in Celery

I'm creating a distributed web crawler that crawls multiple social media sites at the same time. The system is designed to distribute the available resources across the different sites based on their current post rates.
For example, if social media 1 has 10 new posts per hour and social media 2 has 5 posts per hour, two crawlers focus on social media 1 and one crawler focuses on social media 2 (assuming we are allowed just three crawlers).
I have decided to implement this project with Celery, Flask, RabbitMQ, and Kubernetes as the resource manager.
I have some questions regarding the deployment:
How can I tell Celery to keep a fixed number of tasks in RabbitMQ? This crawler should never stop crawling, and it should create new tasks based on each site's post rate (gathered from previous crawling data). The problem is that I don't have a task submitter for this process. Usually there is a separate producer that submits tasks to Celery, but there is no such thing in this project. We have a list of social media sites and the number of workers each needs (stored in Postgres), and we need Celery to put a new task in RabbitMQ as soon as a task finishes.
I have tried submitting a task at the end of every job (crawling process), but this approach has a problem and is not scalable: the submitted job ends up last in the RabbitMQ queue.
I need a system that manages the free workers and assigns tasks to them immediately. The system I want should check the free and busy workers and the post rates in the database, then give a task to a free worker. I think using RabbitMQ (or even Redis) might not be good, because they are message brokers that assign a queued task to a worker, and here I don't want a queue; I want to start a task immediately when a free worker is found. The main reason queueing is not good is that the task should be decided when the job is starting, not before that.
My insights on your problem:
I need a system to manage the free workers and assign tasks to them immediately.
-- Celery does this job for you.
The system I want should check the free and busy workers and database post rates and give a task to the worker.
-- Celery is a task-distribution system; it will distribute the tasks as you expect.
I think using rabbitMQ (or even Redis) might not be good because they are message brokers which assign a worker to a task in the queue
-- Using Celery, you definitely need a broker; the broker just holds your messages, and Celery polls the queues and distributes tasks to the right workers (with priorities, timeouts, graceful handling, retries).
but here, I don't want to have a queue; I want to start a task immediately when a free worker is found. The main reason queueing is not good is that the task should be decided when the job is starting, not before that.
-- This is kind of a chain reaction, like triggering a new job once the previous one is done. If that is the case, you don't even need Celery or a distributed producer-consumer system.
Identify the problem:
1. Do you need a periodic task executed at a point in time? Go with a cron job or celery-beat (a cron-based Celery scheduler).
2. Do you need multiple tasks executed without blocking the other running tasks? You need a producer-consumer system (Celery out of the box, or native Python consumers on RabbitMQ/Redis).
3. If the same task should trigger the new task, there is no need for multiple workers; what would we gain from multiple workers if the work is just a single thread?
Outcome: [Celery, RabbitMQ, and Kubernetes: a good combo for a distributed, orchestrated system] or [a webhook model] or [a recursive Python script].
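The "decide the task only when a worker becomes free" requirement can be sketched with the standard library alone (no Celery); the rate table and `crawl` function below are hypothetical stand-ins for the Postgres data and real crawling work:

```python
# Each worker picks its next site only at the moment it becomes free,
# weighting the choice by current post rates (here a hard-coded dict;
# in the real system this would be read from Postgres).
import random
import threading
import time
from collections import Counter

post_rates = {"site1": 10, "site2": 5}   # posts per hour, per site
crawl_counts = Counter()
lock = threading.Lock()

def pick_site():
    sites, weights = zip(*post_rates.items())
    return random.choices(sites, weights=weights)[0]

def crawl(site):
    time.sleep(0.001)                    # stand-in for real crawling work
    with lock:
        crawl_counts[site] += 1

def worker(n_jobs):
    for _ in range(n_jobs):
        crawl(pick_site())               # task decided when the worker is free

threads = [threading.Thread(target=worker, args=(50,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the choice happens at dispatch time, updating `post_rates` between jobs immediately shifts how the workers spend their effort, which is the behaviour the question asks for.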
Reply to your comment below, #alavi:
One way of doing it: write a periodic job (it can run every second, minute, hour, or whatever rate you need) using celery-beat, which acts as a producer or parent task. It can iterate over all media sites from the DB and spawn a new crawling task for each. Work status can be maintained in the DB, and new tasks spawned based on that status. For a start, this parent task can check whether the previous job is still running, or check the progress of the last task and decide based on that; we could even think about splitting the crawl job into micro-tasks triggered from the parent job. You can refine the details further during development or based on performance.
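A bounded stdlib sketch of that periodic parent task (in production this would be a celery-beat schedule entry whose task iterates the DB and calls something like `crawl.delay(site)`; all names here are illustrative):

```python
# A parent "tick" that fires on a schedule and spawns one crawl task per
# media site; bounded to three ticks so the example terminates.
import sched
import time

spawned = []          # stands in for tasks submitted to the broker
s = sched.scheduler(time.time, time.sleep)

def parent_tick(remaining):
    for site in ("site1", "site2"):      # would be read from the DB
        spawned.append(site)             # stands in for crawl.delay(site)
    if remaining > 1:
        s.enter(0.01, 1, parent_tick, (remaining - 1,))

s.enter(0.0, 1, parent_tick, (3,))
s.run()
```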

CQRS - where should I place requests to external systems? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I'm wondering where requests to external systems (to be specific: a Webservice) should be placed in a CQRS-based system.
For example, given a system that sends a booking-request to an external flight service:
Should this be in the domain object, in the command handler for "bookFlight"? Or should this be in a saga, as a reaction to a domain object event "flightBookingPlaced"?
I'll make some assumptions:
The external request is part of the "transaction".
The external request is core to the behaviour of the command.
The external system's response is synchronous insofar as it either responds or fails; there are no callbacks or polling involved.
I would say it can belong in the command or as a series of commands.
Hide the external service behind an ACL or facade, and make that a dependency of the command. The command will then represent the transition from "not booked" to "booked". Ignoring the complexities of the command effectively "blocking" until complete, that will cover what you need.
If you wanted to support a more granular approach, the small series of commands approach feels like it fits best:
not booked -> booking pending -> booked
Launch the event and trigger a RequestBookingCommand, which changes the booking state from "not booked" to "booking pending" and commits the transaction. This can then trigger the next command, ExternalBookingCommand, which can work in the background without initially needing the domain object. The booking can be performed on the external system and, if successful, take you from "booking pending" to "booked". If it fails, you can retry or move the booking to "booking failed".
This then at least allows you to start putting validation around not attempting to double-book, etc.
I can't speak to sagas specifically, but I would like to think you could represent the protocol of "booking commands" as a little saga, mapping you from one domain state (not booked) to the eventual state (booked), with as many stops as you need in between.
In either approach, what is important is defending domain state and ensuring any transactions are integral. Going more granular with the states and events might help also because you can use better language (one of DDD's tenets) to describe what is occurring, such as RequestBookingCommand leaving you in a BookingRequested state, following onto a PerformExternalBooking command starting with a BookingRequested state and leaving you in a Booked or BookingFailed state. You can also then introduce domain events such as SuccessfullyBooked or BookingRequestedOnFoo.
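As an illustrative sketch of that granular two-command flow (all names here, `Booking`, `request_booking`, `perform_external_booking`, are hypothetical, and the external service sits behind a facade passed in as a callable):

```python
# Two-step booking: RequestBookingCommand moves "not booked" ->
# "booking pending"; PerformExternalBooking then calls the external
# service and lands in "booked" or "booking failed".
class Booking:
    def __init__(self):
        self.state = "not booked"

    def request_booking(self):
        if self.state != "not booked":
            raise ValueError("already requested")   # guards double-booking
        self.state = "booking pending"              # commit transaction here

    def perform_external_booking(self, external_service):
        if self.state != "booking pending":
            raise ValueError("no pending booking")
        try:
            external_service()                      # call behind the facade
            self.state = "booked"
        except Exception:
            self.state = "booking failed"           # eligible for retry
```

The state guards are where the double-booking validation mentioned above naturally lives, and each transition maps to a domain event (BookingRequested, Booked, BookingFailed).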
My approach to these situations, usually, is to try not to overthink it and first build a model that matches how I describe it verbally. Frameworks and infrastructure can help you combat technical considerations (such as transactions or concurrency).
If this is not an internal microservice (really fast and stable), I would do this in a Saga/Process Manager/Gateway: an async actor with its own state machine.
With external services you will want error processing, retries, and timeouts, all async, so your aggregate is not blocked.

Scheduling/delaying of jobs/tasks in Play framework 2.x app

In a typical web application, there are some things that I would prefer to run as delayed jobs/tasks. They tend to have some or all of the following properties:
Takes a long time (anywhere from multiple seconds to multiple minutes to multiple hours).
Occupy some resource heavily (CPU, network, disk, external API limits, etc.)
Result not immediately necessary. Can complete HTTP response without it. OK (and possibly even preferable) to delay until later.
Can be (and may preferably be) run on (a) different machine(s) than the web server(s). The machine(s) are potentially dedicated job/task runners.
Should be run in response to other event(s), or started periodically.
What would be the preferred way(s) to set up, enqueue, schedule, and run delayed jobs/tasks in a Scala + Play Framework 2.x app?
For more details...
The pattern I have used in the past, and which I would like to replicate if applicable, is:
In handler of web request, or in cron-like call, enqueue job(s)
In job runner(s), repeatedly dequeue and run one job at a time
Possibly handle recording job results
This seems to be a relatively simple yet still relatively flexible pattern.
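That pattern, enqueue from the request handler, then dequeue and run one job at a time in a runner, is small enough to sketch with a thread and a queue (Python for brevity; a Play/Scala version would have the same shape):

```python
# One runner thread dequeues and executes jobs one at a time; the web
# request handler only enqueues. A None sentinel shuts the runner down.
import queue
import threading

jobs = queue.Queue()
results = []                      # stands in for recording job results

def runner():
    while True:
        job = jobs.get()
        if job is None:
            break                 # sentinel: shut down cleanly
        results.append(job())     # run one job and record its result

t = threading.Thread(target=runner)
t.start()
jobs.put(lambda: 2 + 2)           # enqueued from a request handler
jobs.put(None)
t.join()
```

Replacing the in-process `Queue` with a shared broker is what lets the runner live on a machine separate from the web server.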
Examples I have encountered in the past include:
Updating derived data in DB
Analytics/tracking API calls for a web request
Delete expired sessions or other stale/outdated DB records
Periodic batch ETLs
In other languages/frameworks, I would typically use a job/task framework. Examples include:
Resque in a Ruby + Rails app
Celery in a Python + Django app
I have found the following existing materials, but unfortunately, I don't think they fit my use case directly.
Play 1.x asynchronous jobs API (plus various SO questions referencing it). Appears to have been removed in the 2.x line, with no reference to what replaced it.
Play 2.x Akka integration. Seems very general-purpose. I'd imagine it's possible to use Akka for the above, but I'd prefer not to write a jobs/tasks framework if one already exists. Also, no info on how to separate the job runner machine(s) from your web server(s).
This SO answer. Seems potentially promising for the "short to medium duration IO bound" case, e.g. analytics calls, but not necessarily for the "CPU bound" case (probably shouldn't tie up CPU on web server, prefer to ship off to different node), the "lots of network" case, or the "multiple hour" case (probably shouldn't leave that in the background on the web server, even if it isn't eating up too many resources).
This SO question, and related questions. Similar to above, it seems to me that this covers only the cases where it would be appropriate to run on the same web server.
Some further clarification on use-cases (as per commenters' request). There are two main use-cases that I have experienced with something like resque or celery that I am trying to replicate here:
Some event on the site (Most often, an incoming web request causes task to be enqueued.)
Task should run periodically. (Most often, this is implemented as: periodically, enqueue task to be run as above.)
In the case of resque or celery, the tasks enqueued by both use-cases enter queues the same way and are treated the same way by the runner/worker process. Barring other Scala or Play-specific considerations, that would be my initial guess for how to approach this.
Some further clarification on why I do not believe the Akka scheduler fits my use case out-of-the-box (as per commenters' request):
While it is no doubt possible to construct a fitting solution using some combination of the Akka scheduler (for periodic jobs), akka-remote and akka-cluster (for communicating between the job caller and the job runner), that approach requires a certain amount of glue code which is almost a delayed job framework in and of itself. If it exists, I would prefer to use an existing out-of-the-box solution rather than reinvent the wheel.

Operating System Overhead [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am working on a time-consuming computation algorithm and want to run it as fast as possible.
How much does the presence of an operating system (Windows or Linux) underneath the running algorithm slow the process down?
Is there any example of an "OS" implemented specifically to run a predefined program?
First of all, I'd like to mention that I am also working on a very similar time-consuming computation algorithm! So much in common here, or maybe just a coincidence...
Now, let's proceed to the answer:
How much the OS affects your running process depends on the daemons and other user programs waiting in the ready queue, and on the scheduling algorithm applied by your OS. Daemons are generally always running, and some system processes preempt lower-priority processes (possibly including yours, since system processes and daemons generally preempt all other processes). The very presence of the OS (Windows or Linux), considering only the kernel, doesn't itself slow the process: the kernel is just the manager of all processes and tasks. Daemons and system processes, however, are heavyweight, and they do affect your program significantly. I also wish we could simply disable all daemons, but they are needed for the efficient working of the OS (mouse control, power efficiency, etc.).
Just as an example, on Linux and Unix-based systems the top command provides an ongoing, real-time look at processor activity. It displays a listing of the most CPU-intensive tasks on the system.
So if you run it on a Linux system, you'll see all the heavy processes that are intensely consuming memory. You'll find that apart from your own memory-hungry process there are several daemons, like powerd and moused, and other system processes, like Xorg and kdeinit4, which do affect user processes.
One thing is clear, though: no single process or daemon will generally occupy more memory than your intense computation process; the ratio will be smaller, maybe one-eighth or one-quarter.
UPDATE BASED ON COMMENTS:
If you specifically want the process to run on the native hardware, without an OS installed underneath, you have two choices.
Either develop the code in machine-level or assembly language, or another low-level language that runs your process directly on the hardware, without an OS managing memory sections and without system processes and daemons.
The second solution is to develop or use a very minimal OS comprising only what your algorithmic program/process requires. Such a minimal OS won't be a complete OS, so it will lack the daemons and the many system calls of major OSes like Windows, Linux, and Unix.
Nazar554 provided a useful link in the comment section; to quote him:
if you really want to remove any possible overhead you can try: BareMetal OS
In your case, it seems you prefer the first option, but you can achieve your task either way!
LATEST EDIT:
Just some feedback from my side, as I couldn't follow your requirements completely: it would be better to ask the same question on the Operating Systems Beta site, where several experts answer queries about OS development and functionality. There you'll receive a stronger response on every detail relevant to your topic that I might have missed.
Best wishes from my side...
The main idea behind giving the processor to a task is the same among all major operating systems. I've provided a diagram demonstrating it. First let me describe the diagram; then I'll answer your question.
Diagram Description
When an operating system wants to execute several tasks simultaneously, it cannot give the processor to all of them at once, because the processor can perform only one operation at a time and cannot process more than one task simultaneously. Because of this, the OS shares the processor among all tasks, time slot by time slot. In other words, each task is allowed to use the processor only in its own time slot, and it must give the processor back to the OS once its time slot finishes.
Operating systems use a dispatcher component to select a pending task and dispatch it, i.e. give it the processor. What differs among operating systems is how the dispatcher works. What does a typical dispatcher do? In simple words:
Pick the next pending task from the queues based on a scheduling algorithm
Perform the context switch
Decide where the task removed from the processor should go
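The time-slot sharing above can be illustrated with a toy round-robin dispatcher (a deliberate simplification; real dispatchers use priorities and multiple queues):

```python
# Toy round-robin dispatcher: each task gets one quantum of processor
# time, then is preempted and re-queued until its work is done.
from collections import deque

def round_robin(tasks, quantum):
    """tasks: name -> remaining work units; returns completion order."""
    ready = deque(tasks.items())
    finished = []
    while ready:
        name, remaining = ready.popleft()    # dispatcher picks next task
        remaining -= quantum                 # task runs for its time slot
        if remaining > 0:
            ready.append((name, remaining))  # preempted: back of the queue
        else:
            finished.append(name)            # done: leaves the system
    return finished
```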
Answer to your question
How much presence (running algorithm under it) of Operating System (Windows or Linux) slows the process?
It depends on:
The dispatcher algorithm (i.e. which OS you use)
The current load on the system (i.e. how many applications and daemons are running now)
What priority your task has (i.e. real-time priority, UI priority, regular priority, low, ...)
How much I/O your task is going to do (I/O-requesting tasks are usually scheduled in a separate queue)
Excuse my English issues; English isn't my native language.
Hope this helps.
Try booting in single-user mode.
From debian-administration.org and debianadmin.com:
Run Level 1 is known as 'single user' mode. A more apt description would be 'rescue', or 'trouble-shooting' mode. In run level 1, no daemons (services) are started. Hopefully single user mode will allow you to fix whatever made the transition to rescue mode necessary.
I guess "no daemons" is not entirely true, with wiki.debian.org claiming:
For example, a daemon can be configured to run only when the computer is in single-user mode (runlevel 1) or, more commonly, when in multi-user mode (runlevels 2-5).
But I suppose single-user mode will surely kill most of your daemons.
It's a bit of a hack, but it may just do the job for you.

Quartz job fires multiple times

I have a building block which sets up a Quartz job to send out emails every morning. The job is fired three times every morning instead of once. We have a hosted instance of Blackboard, which I am told runs on three virtual servers. I am guessing this is what is causing the problem, as the building block was previously working fine on a single server installation.
Does anyone have Quartz experience, or could suggest how one might prevent the job from firing multiple times?
Thanks,
You didn't describe in detail how your Quartz instance(s) are being instantiated and started, but be aware that undefined behavior will result if you run multiple Quartz instances against the same job store database at the same time, unless you enable clustering (see http://www.quartz-scheduler.org/docs/configuration/ConfigJDBCJobStoreClustering.html).
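For reference, the key settings from the clustering docs look roughly like this in quartz.properties (a sketch; consult the linked page for the full, version-specific list):

```properties
# Enable Quartz clustering against a shared JDBC job store, so only
# one node in the cluster fires each trigger.
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.isClustered = true
org.quartz.scheduler.instanceId = AUTO
```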
I guess I'm a little late responding to this, but we have a similar sort of scenario with our application. We have 4 servers running jobs, some of which can run on multiple servers concurrently, and some should only be run once. As Will's response said, you can look into the clustering features of Quartz.
Our approach was a bit different, as we had a home-grown solution in place before we switched to Quartz. Our jobs use a database table that stores the cron triggers and other job information; each server then "locks" the entry for a job so that none of the other servers can execute it. This keeps jobs from running multiple times across the servers and has been fairly effective so far.
Hope that helps.
I had the same issue, but I discovered that I was calling scheduler.scheduleJob(job, trigger); to update the job data while the job was running, which randomly triggered the job 5-6 times each run. I had to update the job data without updating the trigger, using scheduler.addJob(job, true); instead.