Efficient Way To Design Database For My Specific Use Case - postgresql

I am building a website where users can view emails that are fetched from my gmail account.
Users can read emails, change their labels & archive them. Each email has metadata associated with it, and users can search through the emails based on the metadata. Furthermore, each user is associated with an organization. Changes made to an email (e.g., if the email is archived, or if the tags are changed) by any one user gets reflected across the organization.
Right now, I store all emails in a single table along with their metadata. However, the problem is that I now have over 20,000 emails in the database, and searching through them based on the metadata takes too much time.
Now one way to optimize this is that when a user runs a search, the system should only search through emails that are in the inbox, not archived or deleted. But the issue is that where one organization might have archived an email, another might not have, so I cannot create separate tables for Inbox & Archive. By default, emails also get auto-archived after some time (this option can be disabled), so the inbox generally has around 4,000 emails, whereas the archive has many times that.
My question is: does it make sense to create separate Inbox & Archive tables for each organization and just copy all new incoming emails to those tables? Since organizations only join by invitation, I do not expect the total number to cross 100. Or would this just explode and become too difficult to handle in the code later on, with so many tables?
I am using PostgreSQL for this.

If your operational workflow says "upon adding a new customer, create such-and-such a table," then you have a serious database design problem. When you have more than about 50 customers, things will slow down due to per-table overhead. In other words, when you start to succeed in business you will start to fail in performance. Not good.
You have a message entity. It, no doubt, contains the message's text, subject, timestamp, from, to, and other attributes that form part of the original message. Each message will have a unique (primary key) message_id. But the entity should not contain attributes like inbox and archive, because those attributes relate to the organization.
You need an org entity. Each organization has a unique org_id, a name, and other attributes of the organization.
Then you need an org_message table. Its primary key contains both org_id and message_id. It will contain Boolean attributes like archived and read, and a VARCHAR attribute naming the message's current folder. So each org's window into your message table is defined by its rows in org_message.
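A minimal sketch of those three tables, assuming PostgreSQL 10+ identity columns; everything beyond the keys and the org_message flags is an illustrative assumption:
CREATE TABLE message (
    message_id  BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    subject     TEXT,
    body        TEXT,
    sender      TEXT,
    recipient   TEXT,
    sent_at     TIMESTAMPTZ
);

CREATE TABLE org (
    org_id  BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name    TEXT NOT NULL
);

-- One row per (org, message) pair, created lazily when an org first acts on a message.
CREATE TABLE org_message (
    org_id      BIGINT NOT NULL REFERENCES org,
    message_id  BIGINT NOT NULL REFERENCES message,
    read        BOOLEAN NOT NULL DEFAULT FALSE,
    archived    BOOLEAN NOT NULL DEFAULT FALSE,
    folder      VARCHAR(50) NOT NULL DEFAULT 'inbox',
    PRIMARY KEY (org_id, message_id)
);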
If you start with an organization named, for example, shipping, and you want to see all its messages, you use a query like this:
SELECT org.org_id, org.name,
       message.*,
       COALESCE(org_message.read, FALSE) AS read,
       COALESCE(org_message.archived, FALSE) AS archived,
       COALESCE(org_message.folder, 'inbox') AS folder
FROM org
CROSS JOIN message
LEFT JOIN org_message
       ON org_message.org_id = org.org_id
      AND org_message.message_id = message.message_id
WHERE org.name = 'shipping';
The CROSS JOIN pairs every message with the org, and the LEFT JOIN and COALESCEs set each org's defaults for each message: unread, not archived, and in the inbox folder. That way you don't have to create a row in org_message for each organization and each message until the org handles the message.
If you want to mark a message as read and archived for a particular org, you INSERT a row into org_message, using ON CONFLICT ... DO UPDATE (an "upsert"):
INSERT INTO org_message (org_id, message_id, read, archived, folder)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (org_id, message_id) DO UPDATE
SET read = EXCLUDED.read,
    archived = EXCLUDED.archived,
    folder = EXCLUDED.folder;
That either sets or updates the org's attributes for the message.
If you find that searching these tables is too slow, you'll need indexes. That's the subject of a different question.
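That said, as a hedged starting point (the right indexes depend entirely on your actual queries, and these names are illustrative): the composite primary key on org_message already covers lookups by org, and a partial index can target the common "in the inbox, not archived" search.
-- Illustrative sketches; adjust to the metadata columns you actually filter on.
CREATE INDEX org_message_inbox_idx
    ON org_message (org_id, message_id)
    WHERE NOT archived AND folder = 'inbox';

-- Assuming searches often filter messages by time:
CREATE INDEX message_sent_at_idx ON message (sent_at);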

Related

Approval workflow tables with 3NF form in PostgreSQL

I am working on an HRMS application. There is a Transfers workflow in the application. As part of it, an employee can be transferred from one department/office to another department/office.
The transfer process works as below:
An employee checks for a vacancy in the application and applies for a transfer if one is available. Once the employee submits the transfer form, it goes for approval to the employee's department/office. There are 5 approvers in the approval flow. In PostgreSQL, to achieve 3NF, I have decided to create 3 separate tables as below:
1) One for the employee's transfer request
2) One for the transfer request approval
3) One for the transfer request approval details
So in the image above you can see that for the approval request I have created two separate tables, i.e. transfer_request_approval and transfer_request_approval_details. transfer_request_approval references transfer_request_id as an FK, and transfer_request_approval_details contains transfer_request_approval_id as an FK plus all 5 approvers' records for that particular approval request.
e.g. We have a transfer request with id = 1, so there would be one row in the transfer_request_approval table (e.g. transfer_request_approval_id = 1, transfer_request_id = 1 as FK). In transfer_request_approval_details there would be 5 rows for the 5 approvers' records.
(Note: approval_status is either Approved, Reject, or Rework; note is simply a remark the user can add. If an approver sets the Rework status, the request goes back to Approver1 again, e.g. if Approver5 sets Rework it goes to Approver1 again for the same process, so there would be 10 records for that entire approval request in the transfer_request_approval_details table.)
Question: Do I really need to have the transfer_request_approval_details table, or can I include all those approver records in the transfer_request_approval table?
In order to maintain normal form, you need the approval details table to be separate; otherwise you end up with duplicate values of status for each approver. However, I see no need to have separate transfer_request and transfer_request_approval tables. Just add status to transfer_request and link it to transfer_request_approval_details.
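A rough sketch of that merged design (table and column names beyond those mentioned above are assumptions):
CREATE TABLE transfer_request (
    transfer_request_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    employee_id         BIGINT NOT NULL,
    status              VARCHAR(20) NOT NULL DEFAULT 'Pending'  -- overall request status
);

-- One row per approver decision; a Rework loop simply adds more rows.
CREATE TABLE transfer_request_approval_details (
    approval_detail_id  BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    transfer_request_id BIGINT NOT NULL REFERENCES transfer_request,
    approver_id         BIGINT NOT NULL,
    approval_status     VARCHAR(10) NOT NULL
        CHECK (approval_status IN ('Approved', 'Reject', 'Rework')),
    note                TEXT
);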
I think you can include all those approvers records in the transfer_request_approval table.
There is no meaning in storing data in a separate 2 tables.

Mixpanel: Merge duplicate people profiles and also merge events

I have duplicate profiles due to switching of the identifier in the code. I would like to merge the duplicate profiles now and also merge the events / activity feed.
I got the API working and by calling
deduplicate_people(prop_to_match='$email',merge_props=True,case_sensitive=False,backup=True,backup_file=None)
Duplicates are in fact removed, but the events / activity feed is not merged, so I'd lose many events.
Is there a way to remove duplicates and merge the events / activity feed at the same time?
Duplicates happen because some people use an ID and others an email as distinct_id, due to the change of identifier. The events are linked to the corresponding person by that ID or email.
So here is what I ended up doing to re-create the identity mapping for people and their events:
I used Mixpanel's API (export_people / export_events) to create a backup of people and events. I wrote a script that creates a mapping "distinct_id <-> email" for people that use an actual ID as distinct_id and not an email (each person has an $email field regardless of the content of the $distinct_id).
Then I went over all exported events. For each event that had an ID as distinct_id, I used the mapping to change that distinct_id to the email. Updated events were saved in a JSON file, thus creating the reference from events to people using email as distinct_id (the events that would otherwise get lost).
Then I went ahead and used the de-duplicate API from Mixpanel to delete all duplicates, thus losing some events. Finally, I imported the events from the step before, which gave me back those missing events.
Three open questions to consider before using this approach:
I believe events are not actually deleted on deduplication. So by importing them again, there are probably duplicate events in the system that are just not linked to a person and may show up at some point.
The deduplication by $email kept the people that use an email as distinct_id and removed the ones with the actual ID. I don't know if this is true every time or may have been a coincidence. My approach will fail for people that still use an ID as distinct_id.
I suppose it's generally discouraged to hack around the distinct_id like that, because making a mistake may result in data loss. So make sure to get it right.

How to import users in CRM 2011 with source GUID

We have three organization tenants, Dev, Test and Live, all hosted on premise (CRM 2011 [5.0.9690.4376] [DB 5.0.9690.4376]).
Because of the way dialogs use GUIDs to reference records in Lookups, we aim to keep the GUIDs of static records the same across all three tenants.
While all other entities are working fine, I am failing to import USERS and also maintain their GUIDs. I am using Export/Import to get the data from the master tenant (Dev) into the Test and Live tenants. It is very similar to what the 'configuration migration tool' does in CRM 2013.
The issue I am facing is that for all other entities I can see the GUID field and hence can map it in the import wizard, but no such field shows up for the SystemUser entity when running the import wizard. For example, with Account, I will export an Account, amend the CSV file, and import it into the target tenant. When I do this, I map AccountId (from the target) to the Account of the source, and as a result this account's AccountId will be the same in both source and target.
At this point I am about to give up, but that will mean all dialogs that use a User lookup will fail.
Thank you for your help,
Try the following steps. I would strongly recommend trying this on an old, out-of-use tenant before trying it on a live system. I am not sure if this is supported by MS, but it works for me. (Another thing: you will have to manually assign the BU and Roles after the import.)
Create an Advanced Find. Include all required fields for the SystemUser record. Add criteria that select the list of users you would like to move across.
Export
Save the file as CSV (this will show the first few hidden columns in Excel)
Rename the Primary Key field (in this case User) and remove all other fields marked Do Not Modify.
Import file and map this User column (with GUID) to the User from CRM
Import file and check GUIDs in both tenants.
Good luck.
My only suggestion is that you could try to write a small console application that connects to both your source and destination organisations.
Using that, you can duplicate the user records from the source to the destination, preserving the IDs in the process.
I can't say 100% it'll work, but I can't immediately think of a reason why it wouldn't. This is assuming all of the users you're copying over don't already exist in your target environments.
I prefer to resolve these issues by creating custom workflow activities. For example, you could create a custom workflow activity that returns a user record for an input domain name given as a string.
This means your dialogs contain only shared configuration values, e.g. mydomain\james.wood, which are used to dynamically find the record you need. Your dialog is then linked to a specific record, but without having to encode the source GUID.

Message library

The scenario is: a user sends messages to a group of people.
I was thinking of creating one row for that specific conversation in one class, where that row contains information such as "sender name" and "receiver", and in addition a column (PFRelation) which connects this specific row to another class where all messages between the user and the receiver would be saved.
So this action will happen every time the user starts a new conversation.
The benefit of this approach:
Privacy, because the only conversations being saved are those between the user and the receiver group.
The downside of this approach:
We all know that Parse only provides 30 reqs/s for free, which means that 1 min = 1800 reqs. So every time I create a new class to keep track of a conversation, am I using a lot of requests?
I am looking for suggestions and thoughts on the ideal approach before I implement this messenger library.
It sounds like you have come up with something that is similar to what I have used before to implement messaging in an app with Parse as a backend. It's also important to think about how your UI will be querying for data. In general, it's most important to ensure that it is very easy and fast to read data. For most social apps, the following quote from Facebook's engineering team on Haystack is particularly relevant.
Haystack is an object store that we designed for sharing photos on Facebook where data is written once, read often, never modified, and rarely deleted.
The crucial piece of information here is written once, read often, never modified, and rarely deleted. No matter what approach you decide to take, keep that in mind while engineering your solution. The approach that I have used before to implement a messaging system using Parse is described below.
Overview
Each row (object) of the Message class corresponds to an individual text, picture, or video message that was posted. Each Message belongs to a Group. A Group can be as small as 2 Users (a private conversation) or grow as large as you like.
The RecentMessage class is the solution I came up with to quickly and easily populate the UI. Each RecentMessage object corresponds to a Group that a given User belongs to. Each User in a Group will have their own RecentMessage object, which is kept up to date using beforeSave/afterSave cloud code triggers. Whenever a new Message is created, the afterSave trigger updates all of the RecentMessage objects that belong to the Group.
You will most likely have a table in your app which displays all of the conversations that the user is part of. This is easily achieved by querying for all of that user's RecentMessage objects, which already contain all of the Group information needed to load the rest of the messages when selected, and also contain the most recent message's data (hence the name) to display in the table. Alternatively, RecentMessage could contain a pointer to the most recent Message; however, I decided that copying the data was a beneficial tradeoff since it streamlines future queries.
Message
group (pointer to group which message is part of)
user (pointer to user who created it)
text (string)
picture (optional file)
video (optional file)
RecentMessage
group (group pointer)
user (user pointer)
lastMessage (string containing the text of most recent Message)
lastUser (pointer to the User who posted the most recent Message)
Group
members (array of user pointers)
name or whatever other info you want
Security/Privacy
Security and privacy are imperative when creating messaging functionality in your app. Make sure to read through the Parse Engineering security blog posts, and take your time to let it all soak in: Part I, Part II, Part III, Part IV, Part V.
Most important in our case is Part III, which describes ACLs, or Access Control Lists. Group objects will have an ACL which corresponds to all of their member Users. RecentMessage objects will have a read/write ACL restricted to their owner User. Message objects will inherit the ACL of the Group to which they belong, allowing all of the Group members to read them. I recommend disabling the write ACL in the afterSave trigger so messages cannot be modified.
General Remarks
With regards to Parse and the request limit, you need to accept the fact that you will very quickly surpass the 30 req/s free tier. As a general rule of thumb, it's much better to focus on building the best possible user experience than to focus too much on scalability. By and large, issues of scalability rarely come into play, because most apps fail. I'm not saying that to be discouraging, just something to keep in mind to prevent you from falling into the trap of over-engineering at the cost of time :)

run one crystal report multiple times with different parameters

I am using the BusinessObjects Enterprise server and I have a report that uses "department" as a parameter field to control the selection of records. There are 20 different departments.
I want to schedule this report to run 20 times, with a single new department selected each time. Is there a way to do this without scheduling the report 20 times?
Thanks for any help.
Yes, you can. A bit of a process:
Create a Group for each department
Add users to groups as desired; ensure that they have an email address
Create a Profile; add a Profile Value for each Group (one Profile Value for each Group/Department ID combination); the Profile Values will be strings (important)
Create a Publication; add your report to the Source Document; add the Groups that you created earlier to the Enterprise-Recipient list
Now define the Personalization (the key part of this); you can either add a Filter (set TABLE.FIELD or FORMULA to your Profile in the Report Field & Enterprise Recipient Mapping columns) OR set the Department ID parameter to the appropriate Enterprise Recipient Mapping value (your parameter needs to be a string for this to work; note the comment earlier).
set Destination to Email
set other properties (e.g. Format) as desired
Save & Close
You can also schedule this Publication to occur on a recurring basis.
Notes:
This solution uses the Publication Job Server (runs the Publication), the Crystal Reports Job Server (runs the report), the Adaptive Processing Server (does the bursting), and the Destination Job Server (sends the email messages). You may want to create a separate set of these services and package them into their own server group, then force Publications to use only this server group.
Related to the earlier point, you may want to create a server group just for scheduled reports and force recurring instances to use it. Why? Publications don't seem to do a good job of waiting for reports in a queue; if a Crystal Reports Job Server isn't available, the Publication will fail. Forcing scheduled-report instances to generate on their own server group helps to eliminate this issue.
If you make significant changes to the report (e.g. adding a parameter), you may need to remove and then re-add the report to the Source-Document list to ensure that it has the most recent definition; other changes to the report (e.g. adding a column) don't seem to require this attention. YOUR MILEAGE MAY VARY.
You can design the report with the department as a group.
Have a new page after each group and be sure to print the records from the department group section, not the details.
This is assuming you are getting all the departments inside your database fields.