Good architecture for a Quartz-based email processor

I need to write a Windows service to send emails. The emails will likely be stored in a database table, and they should be sent as early as convenient. It would be advantageous to have multiple threads sending messages, as there will be bursts at certain times of the day; however, it is not good to send the same message multiple times.
So I'm having a little bit of trouble understanding how, in this kind of scenario, I can best leverage Quartz.NET to alleviate some of the queueing and concurrency issues. My architecture questions are:
1. For this kind of scenario, is it best for a job to check whether there are emails to send, or should a job be responsible for sending one specific email?
2. If the answer to 1) is to check for emails to be sent, then that would leave me with a concurrency issue, and I would need to use DisallowConcurrentExecution, which would result in only one email being sent at a time?
3. If the answer to 1) is to send a single email, then I take it the job details would need to reflect the specific ID of the email to be sent?
4. In either case, two web users could trigger the creation of the same email job concurrently. So it doesn't seem that Quartz really helps solve my problem - it might provide a nice architecture for a unit of work and for controlling polling frequency, but not really the core of my problem? Or am I missing / over-thinking something?
Finally, just to be clear, each email relates to a specific Order, so there is an ID and state to work with. Two web users triggering the same email at the same instant in time should not result in two emails being sent.
Look forward to any advice.
Thanks
Josh

Quartz.NET would meet your scheduling needs.
However, you have two needs in tension: you want more than one thread sending the emails, but you also do not want duplicate emails.
The DisallowConcurrentExecution attribute will prevent multiple instances of the same job from running at the same time. However, even if you have only one instance of the job running, you still don't know which individual emails have or haven't been sent.
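For reference, a minimal sketch of how the attribute is applied (the job class and its body are made up for illustration):

```csharp
using System.Threading.Tasks;
using Quartz;

// [DisallowConcurrentExecution] stops two instances of *this* job from
// overlapping; it says nothing about which individual emails were sent.
[DisallowConcurrentExecution]
public class EmailSweepJob : IJob
{
    public Task Execute(IJobExecutionContext context)
    {
        // ...query for unsent emails and send them here...
        return Task.CompletedTask;
    }
}
```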
If you only keep "these emails have been sent, and these haven't" in memory... you're always at risk of sending duplicates.
You can solve this, but you're going to have to keep a "pessimistic" flag on which emails have already been sent - at the database level.
So if you want multiple threads to send emails, that's OK. But your "get some emails to send" code is going to have to "mark" the emails it is working on (so the next thread doesn't pick them up), and then mark them again right after they are sent.
Quartz is good for scheduling the "when" your jobs run. But it doesn't have the ability to "track" which emails you need to send and which ones have already been sent. That's gonna be your responsibility.
I had a similar problem, where many, many users were trying to "get at" a bunch of to-do items. That's why I wrote this blog entry for SQL Server: I needed to "mark" the rows, but I also had to order them before I marked them.
http://granadacoder.wordpress.com/2009/07/06/update-top-n-order-by-example/
I also added some table hints:
WITH (UPDLOCK, READPAST, ROWLOCK) --<< optional hints
because so many different users were trying to "get-at" the items.
(Think about how T1cket M#ster has to work.......there has to be some pessimistic locking on the tickets....and they have a timer that releases the locks if you don't buy the tickets in time).
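To make the "mark, send, mark again" idea concrete, here is a rough sketch of what such a Quartz.NET job could look like, combining the hints above. The PendingEmails table, its columns, the batch size, and the SendAsync stub are all assumptions, not anything Quartz gives you:

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Threading.Tasks;
using Quartz;

public class SendPendingEmailsJob : IJob
{
    private const string ConnString = "...your connection string...";

    public async Task Execute(IJobExecutionContext context)
    {
        var claimed = new List<int>();
        using (var conn = new SqlConnection(ConnString))
        {
            await conn.OpenAsync();

            // First mark: claim a batch atomically. READPAST makes a
            // parallel worker skip rows this statement has locked.
            using (var cmd = new SqlCommand(
                @"UPDATE TOP (10) dbo.PendingEmails
                      WITH (UPDLOCK, READPAST, ROWLOCK)
                  SET Status = 'Sending'
                  OUTPUT inserted.Id
                  WHERE Status = 'Pending'", conn))
            using (var reader = await cmd.ExecuteReaderAsync())
            {
                while (await reader.ReadAsync())
                    claimed.Add(reader.GetInt32(0));
            }

            foreach (var id in claimed)
            {
                await SendAsync(id); // your actual SMTP/API call

                // Second mark: record that the email really went out.
                using (var done = new SqlCommand(
                    "UPDATE dbo.PendingEmails SET Status = 'Sent' WHERE Id = @id", conn))
                {
                    done.Parameters.AddWithValue("@id", id);
                    await done.ExecuteNonQueryAsync();
                }
            }
        }
    }

    private static Task SendAsync(int id) => Task.CompletedTask; // stub
}
```

Note that rows stuck in 'Sending' (because a worker died mid-send) would need a timeout sweep, much like the ticket-lock release mentioned above.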
Hope that helps.


Instant Messaging Schema design advice

I'm trying to build Instant Messaging functionality in my app as part of a bigger project.
Chats can have more than 2 participants (group chats)
If participant A deletes a message, it should still be visible to participant B (that's why I used the Message Participants table)
The same applies to Conversations.
By the same logic, if all participants delete the conversation/message, it should be erased from the DB.
Questions :
I'm afraid that this schema is too cumbersome, meaning the queries will become too slow once the app passes a certain traffic mark (1K active users? I'm guessing)
Message Participants will have multiple records for each message - one for each participant in the chat. Instant messaging means those writes will come in with very tight timing. Wouldn't that be a problem?
Should I add a Redis layer to manage an active chat session's messaging? It would store the recent messages and actively sync them into the PostgreSQL DB (perhaps using the async transaction functionality PostgreSQL has?)
UPDATED schema: (diagram not reproduced here)
I would also gladly hear ideas for having a "read" status functionality. I'm assuming it's much more complex with Group chats, so at least offering that for 1:1 chats would be nice.
I am a little confused by your diagram. Shouldn't the Conversation Participants be linked to the Conversations instead of the Message? The FKs look all right, just the lines appear wrong.
I wouldn't be worried about performance yet. The Premature Optimization Anti-Pattern warns us not to give up a clean design for performance reasons until we know whether we are going to have a performance problem. You anticipate 1000 users - that's not much for a modern information system. Even if they are all active at the same time and enter a message every 10 seconds, this will just mean 100 transactions per second, which is nothing to be afraid of. Of course, I don't know the platform on which you are going to run this. But it should be an easy task to set up those tables and write a simple test program that inserts those records as fast as possible.
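If you want to try that, a throwaway probe along these lines gives a first feel for raw insert throughput. It assumes Npgsql and a deliberately simplified messages table; both are illustrative stand-ins for your actual schema:

```csharp
using System;
using System.Diagnostics;
using Npgsql;

class InsertProbe
{
    static void Main()
    {
        const int count = 10_000;
        using var conn = new NpgsqlConnection(
            "Host=localhost;Database=chat;Username=chat;Password=...");
        conn.Open();

        using var cmd = new NpgsqlCommand(
            "INSERT INTO messages (conversation_id, sender_id, body) VALUES (@c, @s, @b)",
            conn);
        cmd.Parameters.AddWithValue("c", 1);
        cmd.Parameters.AddWithValue("s", 1);
        cmd.Parameters.AddWithValue("b", "hello");
        cmd.Prepare();

        var sw = Stopwatch.StartNew();
        for (var i = 0; i < count; i++)
            cmd.ExecuteNonQuery();
        sw.Stop();

        // Compare this number against the ~100/sec you actually need.
        Console.WriteLine($"{count / sw.Elapsed.TotalSeconds:F0} inserts/sec");
    }
}
```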
Your second question makes me wonder how "instant" you expect your message passing to be. Shall all viewers of a message receive each keystroke of the text within a millisecond? Or do they just need to see each message appear right after it was posted? Anyway, the limiting factor for user responsiveness will probably be the network, not the database.
Maybe this is not mainly a database design issue. Let's assume you will have a tremendous rate of postings and viewings. Even then, not all conversations will be busy all the time. If the need arises - but not earlier - it might be necessary to hold the currently busy conversations in memory and use the database just as a backup for the times when they aren't busy any more.
Concerning your additional comments:
100k users: This is a topic not for this forum, but concerning business development of a startup. Many founders of startup companies imagine huge masses of users being attracted to their site, while in reality most startups just fail or only reach very few. So beware of investments (in money, but also in design and implementation effort) that will only pay in the highly improbable case that your company will be the next Whatsapp.
In case you don't really anticipate such masses of users but just want to imagine this as a programming exercise, you still have a difficult task. You won't have the platform to simulate the traffic, so there is no way to make measurements on where you actually have a performance problem to solve. That's one of the reasons for the Premature Optimization warning: Unless you know positively where you have a bottleneck, you - and all of us - will be just guessing and probably make the wrong decisions.
Marking a message as read is easy: introduce a boolean attribute read on Message Participants, and set it to true as soon as, well, the user has read the message. It's up to your business requirements in which cases, and to whom, you show this.
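For the write side, a tiny sketch (Npgsql; the table and column names are assumed from the schema discussion above):

```csharp
using Npgsql;

public static class ReadStatus
{
    // Set the per-participant flag; "read" is quoted because it can
    // collide with SQL keywords on some platforms.
    public static void MarkRead(NpgsqlConnection conn, long messageId, long participantId)
    {
        using var cmd = new NpgsqlCommand(
            @"UPDATE message_participants
              SET ""read"" = TRUE
              WHERE message_id = @m AND participant_id = @p", conn);
        cmd.Parameters.AddWithValue("m", messageId);
        cmd.Parameters.AddWithValue("p", participantId);
        cmd.ExecuteNonQuery();
    }
}
```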

Processing e-mail in Odoo 11 without previous threads – is this possible?

This Odoo company we're working with basically sends a lot of e-mail. One e-mail thread can turn into 100+ e-mails, with different people brought into the conversation (CC'd) at different times. Because of the complexity of their e-mail management, they want to keep using their Gmail interface (Google hosted), CC an e-mail into Odoo, and have it tracked in a thread. I've basically already done that: they have an e-mail address like odoo+res.partner-432#domain.com (although it's hashed so it isn't easily readable) - they CC this, and the full body of the thread gets included in chatter (mail.message) under the respective model / id.
The challenge with this: the chatter messages can get huge very fast, because each e-mail includes the main reply plus all the previous history on the thread. I've looked into some systems that have a "reply above this line" convention - they just take the latest message. In those systems, e.g. ticketing systems such as Zendesk or Help Scout, I believe the teams are using the ticketing system itself (not Gmail), so there is much more control over the inbox and incoming e-mail (not to mention that those e-mails are usually 1-to-1, not involving groups).
My questions:
Is there any other workaround that you see here to have Odoo pull in only the last e-mail reply and not the full e-mail thread? I could probably build something like this: https://github.com/zapier/email-reply-parser - and hook it into Odoo's e-mail parsing, but that works on plain-text e-mails only (not HTML). So it's not bulletproof, and I'm not sure it's worth it.
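For what it's worth, the core heuristic of such a parser is small. Here's a rough sketch - in C# rather than the Python an actual Odoo hook would use, and with marker patterns that are far from exhaustive:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplyTrimmer
{
    // Common markers that begin the quoted history in a reply.
    private static readonly Regex[] Markers =
    {
        new Regex(@"^-+\s*Original Message\s*-+", RegexOptions.IgnoreCase | RegexOptions.Multiline),
        new Regex(@"^On .{1,200}wrote:", RegexOptions.Multiline),
        new Regex(@"^From: .+$", RegexOptions.Multiline),
        new Regex(@"^>", RegexOptions.Multiline), // quoted lines
    };

    // Keep only the text above the earliest marker found.
    public static string LatestReply(string body)
    {
        var cut = body.Length;
        foreach (var marker in Markers)
        {
            var m = marker.Match(body);
            if (m.Success && m.Index < cut)
                cut = m.Index;
        }
        return body.Substring(0, cut).TrimEnd();
    }
}
```

As noted above, this only works on a plain-text rendering of the body, and interleaved (inline) replies will defeat it.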
Even if this client DID use odoo 100%, I still don't think it would be possible to get it to work the way they want without major customizations (eg. Odoo's default behavior is to include all past e-mail threads)
I'm curious whether anyone here sees any other solutions; otherwise, I doubt there is something here I haven't seen. :) (But I'm very open to being proven incorrect!)

Stuck with understanding how to build a scalable system

I need some guidance on how to properly build out a system that will be able to scale. I will give you some information about what I am trying to do and then ask my specific question.
I have a site where I want visitors to send some data to be processed. They input the data into a textarea or upload it in a file. Simple. The data is somewhat preprocessed on the client side before a POST request is made to a REST endpoint.
What I am stuck on is this: what is a good way to take this posted data, store it, and associate an ID with it that references the user, given that I cannot process the data fast enough for it to be returned to the user in a reasonable amount of time?
This question is a bit vague and open to opinion, I admit. I just need a push in the right direction to keep moving. What I have been considering is throwing the data into a message queue, having some workers process the data elsewhere, and, when the data is processed, alerting the user as to where to find it - some sort of link to an S3 bucket, or just a URL to a file. The other idea was to run each item to be processed against another endpoint that already processes individual records, in some sort of client-side loop. The problem with this idea is as follows:
Processing may take anywhere from 30 minutes to 2 hours, depending upon the amount of data they want processed. It's not ideal for them to just sit there and wait for that to finish, so I have mostly ruled this out.
Any guidance would be very much appreciated as I don't have any coworkers to bounce things off of, nor do I know many people with the domain knowledge that I could freely ask. If this isn't the right place to ask this, could you point me in the right direction as to where it should be asked?
Chris
If I've got you right, your pipeline is:
1. Accept an item from the user
2. Possibly preprocess/validate it (?)
3. Put it into some queue
4. Process the data
5. Return the result
You may use one or several queues at stage (3). An entity from a user gets added to one of the queues. If it's big enough, it can be stored in S3 or similar storage, with only info about it put into the queue: a link, the add date, a user ID (or e-mail, or the like). Processors can then pull items from the queue and give feedback to users.
If you have no strict requirements on ordering, things get much simpler: you don't need any sync between them. Treat all the components - upload acceptors, queues, storages, and processors - as independent pools of processes. Monitor each pool separately. If there is a bottleneck somewhere, add machines to that pool.
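As a toy illustration of that shape, here is an in-process sketch using System.Threading.Channels as a stand-in for a real broker (RabbitMQ, SQS, and the like); WorkItem, the queue capacity, and the notification call are all invented:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public record WorkItem(Guid Id, string UserId, string PayloadUrl);

public class Pipeline
{
    private readonly Channel<WorkItem> _queue = Channel.CreateBounded<WorkItem>(1000);

    // Called by the REST endpoint: store the payload somewhere durable
    // (e.g. S3), enqueue only a reference, and return the id at once.
    public async Task<Guid> AcceptAsync(string userId, string payloadUrl)
    {
        var item = new WorkItem(Guid.NewGuid(), userId, payloadUrl);
        await _queue.Writer.WriteAsync(item);
        return item.Id; // the user polls / gets notified using this id
    }

    // One of several independent workers; grow this pool to scale.
    public async Task WorkerLoopAsync()
    {
        await foreach (var item in _queue.Reader.ReadAllAsync())
        {
            // ...the 30-minute-to-2-hour processing happens here...
            NotifyUser(item.UserId, $"https://bucket.example/results/{item.Id}");
        }
    }

    private static void NotifyUser(string userId, string resultUrl) =>
        Console.WriteLine($"notify {userId}: {resultUrl}");
}
```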

How can I search for past sent emails with SendGrid?

As SendGrid's documentation makes clear, their web GUI activity page is only searchable for the past 7 days.
How do I search for activity from farther in the past?
Web API documentation is here, but I can't find anything about just plain searching for info on sent emails. All I see are endpoints for seeing particular categories of emails' various fates, like blocks, bounces, invalid emails, and "filters", which seem like actions and not like filters.
It's got to be possible to just find info about some particular sent email, right?
It's not possible. As you noted, the documentation clearly states that:
"Email activity only shows the most recent 7 days. To access data in real time, we recommend that you consider implementing our Event Webhook."
If you want to record all the history associated with your account, you should record and save it yourself. You can record all the emails you send, provided you have an endpoint to do so. See here: https://sendgrid.com/docs/User_Guide/Settings/parse.html
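As a sketch of what "record it yourself" might look like, here is a minimal ASP.NET Core receiver for the Event Webhook. The event field names follow SendGrid's documented event JSON; the endpoint path, the record type, and the Console.WriteLine "storage" are stand-ins for your own code:

```csharp
// Assumes an ASP.NET Core web project (implicit usings enabled).
using System.Text.Json.Serialization;

var app = WebApplication.CreateBuilder(args).Build();

// SendGrid's Event Webhook POSTs a JSON array of events.
app.MapPost("/sendgrid/events", (SendGridEvent[] events) =>
{
    foreach (var e in events)
    {
        // Persist to your own store (database, ELK, flat files...).
        Console.WriteLine($"{e.Timestamp} {e.Email} {e.EventType} {e.MessageId}");
    }
    return Results.Ok(); // a 2xx response tells SendGrid not to retry
});

app.Run();

public record SendGridEvent(
    [property: JsonPropertyName("email")] string Email,
    [property: JsonPropertyName("event")] string EventType,
    [property: JsonPropertyName("timestamp")] long Timestamp,
    [property: JsonPropertyName("sg_message_id")] string MessageId);
```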
Later Edit:
"real time" means "as it happens", it does not mean "history searchable at any point in time".
When you use an API, as a developer, the responsibility to log all API calls and responses lies with you. While it's true that bounces aren't necessarily reported in the API call response, the SendGrid API offers several ways in which you can be notified. Personal opinion: I know this functionality is often omitted in the MVP because you need to go to market as soon as possible, but an ELK stack is not that hard to set up.
There are several ways you can look for bounces and other events as you can see here: https://sendgrid.com/docs/Classroom/Track/Bounces/bounce_reports_how_can_i_be_notified.html
Webhook for events: http://sendgrid.com/docs/API_Reference/Webhooks/event.html
Enabling Bounce Forwarding on your account
Bounce API: https://sendgrid.com/docs/API_Reference/Web_API_v3/bounces.html
If you really need to find out what happened on day X with email send Y, you can contact their Support team. They can probably look it up for you.
Personal opinion:
That 7-day window is not a random number. I'm willing to bet that SendGrid does in fact log every call you make, but serving that history back up isn't practical. When you use the Facebook API, Twitter API, etc., you don't expect them to provide historical data for every API call you've made. This is an ungodly amount of data. We're talking about an API that is used to send probably upwards of millions of emails per day, maybe even more. I believe they actually did the math, and recalling older historical data would put an unnecessary strain on the system; it would take a long time to answer such a request.
I'm sorry if I went on a bit of a rant, but people often don't think about the volume of data needed to store such things, or how much it would cost to search it.

PHPlist - running it from localhost with tracking via live website

I've got a significantly large mailing list - 50K+ subscribers. To avoid stressing my servers, I would like to avoid sending emails through a component embedded in my website. Sending through there sends the CPU usage through the roof, so I'd like to be able to send emails locally. I can easily send emails through mandrillapp, so sending the emails out is not a problem. However, I've hit a bit of a snag.
PHPlist seems to assume it is living on a public site, and it inserts tracking info which routes users to a phplist directory on my site - which (obviously) does not exist.
Question 1: First of all, I'd like to avoid embedding this tracking - is that possible? Or else, is there some way to include it and avoid the 404 error? Would I have to install PHPlist on the server anyway?
Question 2: I've already got AcyMailing to handle unsubscribes, so is it actually possible to keep this in place - just to make sure AcyMailing remains my point of reference?
Question 3: How do people handle sending out large mailing lists? I know about Campaign Monitor, MailChimp, etc., but these get a bit expensive for my situation. I'd like to keep sending "internally", so to speak. Is there an elegant solution which will NOT kill my server but is not too expensive? I know I want to have my cake and eat it too, but it would be nice to hear what people out there are doing.
TIA
David