Parse Platform on iOS: Relations, Joins, or Arrays for Large Many-to-Many?

In the Parse.com API reference for Swift on iOS, it is very clear when to use the different kinds of One-to-Many relationships, based on the expected size of the Many side.
But I find it less clear on what kind of Many-to-Many relationships to use when both sides could be very large.
In my case, I have a Charity object that my Users can make small (often one-dollar) contributions to--so each User could conceivably make thousands of these contributions, and each Charity could have thousands of Users making contributions to it.
The Many-to-Many options listed for this kind of thing are Parse Relations, Join Tables, and Arrays, of which the docs explain:
Arrays should be used when the relationship will reliably include under 100 references, which is very clear and helpful guidance that I should not use Arrays.
The docs say Parse Relations could be used, for instance, to connect Books with multiple Authors and Authors with multiple Books--a situation in which a given Book is unlikely to have over 100 Authors, and only rarely will an Author have over 100 Books--so it's unclear if this is appropriate when both sides could be very large, as in my case.
The docs say Join Tables should be used when extra metadata should be attached to each relationship, so for one thing, I don't at present have an explicit need for this, and for another, the docs don't seem to even mention anything about how or if it matters how large each side of the Many-to-Many relationship is.
In the absence of any other information, it looks like I should use Join Tables, but only because the docs don't imply that I shouldn't, and not for the reason the docs say I should.
Which seems like a flimsy rationale.
I would greatly appreciate any guidance anyone can give.

Behind the scenes, when you use a Relation, Parse Server automatically creates a Join Table for you and provides APIs for easily managing and fetching its data. So, in terms of performance, the two should be very similar.
The downside of a Relation is that you cannot add new fields to the Join Table it creates. So, if you need, for example, to store the charities that each user likes, a Relation between User and Charity would be a good fit, because you only need to record that the relation exists and do not need to store any extra information.
On the other hand, if you need to store the donations that each user made to each charity, I'd create a Join Table called Donation or UserCharity with a pointer to the User class, a pointer to the Charity class, and the value of the donation. In this case, a Relation is not a fit because you need to store the donation value.
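To make the shape of such a Join Table concrete, here is a minimal, language-neutral sketch in plain Python rather than the Parse SDK. The Donation record, its field names, and the two helper functions are assumptions for illustration only; in Parse the two id fields would be pointers to the User and Charity classes.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Donation:
    """One row of a hypothetical Donation join class."""
    user_id: str      # stands in for a pointer to the User class
    charity_id: str   # stands in for a pointer to the Charity class
    amount: Decimal   # the extra field that a plain Relation cannot hold


def total_by_charity(donations):
    """Sum donation amounts per charity."""
    totals = defaultdict(Decimal)  # Decimal() is zero
    for d in donations:
        totals[d.charity_id] += d.amount
    return dict(totals)


def charities_for_user(donations, user_id):
    """Return the set of charities a user has donated to."""
    return {d.charity_id for d in donations if d.user_id == user_id}


donations = [
    Donation("u1", "c1", Decimal("1.00")),
    Donation("u1", "c2", Decimal("1.00")),
    Donation("u2", "c1", Decimal("2.50")),
    Donation("u1", "c1", Decimal("1.00")),
]
```

Each donation is its own row, so neither side of the many-to-many ever holds a growing list; both directions are answered by filtering the join rows on one of the two pointers.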

Related

Are 2 additional tables better than one with metadata?

I have a question about architecture: I have 2 subjects, DocumentLetter and DocumentOther, and both should be approved by managers.
Which would be better: to use 2 additional models, DocumentLetterApprove and DocumentOtherApprove, with entity relations, OR one additional table without relations that contains info about the model identity (columns ModelName and ModelID)?
Another example is attachments for different documents.
Letter and contract are 2 different tables, and each should have its own attachments.
Should I use an additional table for each model (one for letter and one for contract), or create one table with the fields ModelName and ModelID?
Personally, I would favor keeping the separate entities with the relationships if there is any possibility that the related entities (approvals) could be in any way different depending on what they are applied to. I avoid ambiguously linked tables unless they represent a large 1-to-many entity that might be associated with one of a number of other entities.
The problem with using something like a "ParentType" + "ParentId" is that you cannot leverage any form of FK constraint between the related tables. This also means you cannot leverage EF relationships, which matters because there will probably be times when you load one of the documents and want to know whether it is approved, along with details from the approval.
If an Approval for the different document types is expected to be identical then I would sooner declare a common Approval table/Entity and put an ApprovalId on each of the document type tables to establish a many-to-1 relationship from the document to the approval.
If an approval is identical and can form a many to many, then a suitable many-to-many relation table DocumentLetter - DocumentLetterApproval (FKs) - Approval (Approval details) can be employed.
If a Letter approval vs. other approval could be different then: DocumentLetter - DocumentLetterApproval (Approval details)
Design decisions like this usually come from considerations around DRY (Don't Repeat Yourself). The advice I can give is that KISS (Keep It Stupidly Simple) should trump DRY, and that DRY should only apply to logic/structure that is proven to be identical (not merely expected to be identical, or worse, expected to be similar). DRY should be a re-factoring consideration for constant improvement, not an up-front design decision. Coding for DRY too early ends up costing you time when you paint yourself into corners. By keeping code fluid these relationships can be proven, and then, if they are proven to be identical, re-factored into a single entity. Time is still spent re-factoring, but it is spent making the code structure better rather than making the code worse by working around design assumptions.
An example where I might consider an ambiguous, loosely linked table would be something like File Attachments. I might have several entities that can hold references to 1 or more attachments. Attachments are not something I would need to link to often, but rather through an explicit action that I could fire off an additional query for anyway, since I'm not about to pre-load attachment details when loading a document. In this case an attachment table might have a ParentType and ParentId indexed so that I can quickly get details for a particular document or other entity. I would never try to do something like Context.Documents.Include(x => x.Attachments) or the like; there would be no such reference available. Attachments would always be accessed by single document, so I would resort to Context.Attachments.Where(x => x.ParentType == ParentTypes.DocumentLetter && x.ParentId == documentLetterId).ToList();
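The loosely linked attachment lookup described above can be sketched without any ORM. This is only an illustration in plain Python: ParentType, Attachment, and AttachmentStore are hypothetical names, and the dictionary keyed on (parent type, parent id) plays the role of the index on ParentType and ParentId.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class ParentType(Enum):
    DOCUMENT_LETTER = "DocumentLetter"
    DOCUMENT_OTHER = "DocumentOther"


@dataclass(frozen=True)
class Attachment:
    parent_type: ParentType
    parent_id: int
    file_name: str


class AttachmentStore:
    """In-memory stand-in for an attachment table indexed on (ParentType, ParentId)."""

    def __init__(self):
        self._by_parent = defaultdict(list)

    def add(self, attachment):
        key = (attachment.parent_type, attachment.parent_id)
        self._by_parent[key].append(attachment)

    def for_parent(self, parent_type, parent_id):
        # Explicit lookup for one parent; there is no navigation property to follow.
        return list(self._by_parent.get((parent_type, parent_id), []))


store = AttachmentStore()
store.add(Attachment(ParentType.DOCUMENT_LETTER, 7, "scan.pdf"))
store.add(Attachment(ParentType.DOCUMENT_LETTER, 7, "signature.png"))
store.add(Attachment(ParentType.DOCUMENT_OTHER, 7, "invoice.pdf"))

letter_files = [a.file_name for a in store.for_parent(ParentType.DOCUMENT_LETTER, 7)]
```

Note that nothing here checks that parent 7 actually exists in either document table. That is the missing FK constraint mentioned earlier, and it is why this layout is best kept for data that is only ever fetched by an explicit per-document query.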
I have experience working on systems that were designed solely with these types of ambiguously linked tables. They are not only extremely slow as they scale to any reasonable size, but they are also extremely error prone as systems evolve and the nature of the relationships change. Records have a tendency to get out of sync with the expected rules.

Modeling many to many relations with postgreSQL

I work in cattle production and I am learning about database design with postgreSQL. I am working on an entity-relationship model for a database that records the allocation of the pastures in which cattle graze. In the logic of this business an animal can be assigned to several grazing groups during its life. Each grazing group in turn has a duration and is composed of several pastures in which the animals graze according to a rotation calendar. In this way, at a specific time, animals graze in a pasture that is part of a grazing group.
I have a situation in which many grazing groups can be assigned to many animals as well as many pastures. Trying to model this problem I find a fan trap because there are two one-to-many relationships for a single table. According to this, I would like to ask you about how one can deal with this type of relationship in which one entity relates to two others in the form of many-to-many relationships.
I have attached a diagram of the problem.
[Image: model diagram]
Thanks
Traditionally, using a link table (the ones you call assignment) between two tables has been the right way to do many-to-many relationships. Other choices include having an ARRAY of animal ids in grazing group, using JSONB fields etc. Those might prove to be problematic later, so I'd recommend going the old way.
If you want to keep track of history, you can add an active boolean field (to the link table probably) to indicate which assignment is current or have a start date and end date for each assignment. This also makes it possible to plan future assignments. To make things easier, make VIEWs showing only current assignment and further VIEWs to show JOINed tables.
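As a rough sketch of the link table with start and end dates plus a "current assignment" view, the following uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table, column, and view names are assumptions (the original diagram is not available), and the PostgreSQL version would use DATE columns and CURRENT_DATE instead of text dates and date('now').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE animal (id INTEGER PRIMARY KEY, tag TEXT NOT NULL);
CREATE TABLE grazing_group (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Link table: one row per assignment of an animal to a grazing group.
CREATE TABLE animal_assignment (
    animal_id        INTEGER NOT NULL REFERENCES animal(id),
    grazing_group_id INTEGER NOT NULL REFERENCES grazing_group(id),
    start_date       TEXT NOT NULL,   -- ISO dates compare correctly as text
    end_date         TEXT,            -- NULL means open-ended
    PRIMARY KEY (animal_id, grazing_group_id, start_date)
);

-- View showing only assignments in effect today, already joined.
CREATE VIEW current_assignment AS
    SELECT a.tag, g.name AS grazing_group, s.start_date
    FROM animal_assignment s
    JOIN animal a ON a.id = s.animal_id
    JOIN grazing_group g ON g.id = s.grazing_group_id
    WHERE s.start_date <= date('now')
      AND (s.end_date IS NULL OR s.end_date > date('now'));
""")

conn.executemany("INSERT INTO animal (id, tag) VALUES (?, ?)",
                 [(1, "A-001"), (2, "A-002")])
conn.executemany("INSERT INTO grazing_group (id, name) VALUES (?, ?)",
                 [(1, "North"), (2, "South")])
conn.executemany(
    "INSERT INTO animal_assignment VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2020-01-01", "2021-01-01"),  # past assignment, kept as history
        (1, 2, "2021-01-01", None),          # current
        (2, 1, "2020-06-01", None),          # current
        (2, 2, "9999-01-01", None),          # planned future assignment
    ],
)

rows = conn.execute(
    "SELECT tag, grazing_group FROM current_assignment ORDER BY tag"
).fetchall()
```

The same pattern would apply to a second link table between grazing groups and pastures, so each many-to-many relationship gets its own table instead of both hanging off one.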
Since there's no clear question in your post, I'd just say you are going the right way.

When to use Core Data relationships in Swift?

I've read through a bunch of tutorials to the best of my ability, but I'm still stumped on how to handle my current application. I just can't quite grasp it.
My application is simply a read-only directory that lists employees by their company, department, or sorted in alphabetical order.
I am pulling down JSON data in the form of:
Employee
    Company name
    Department name
    First name
    Last name
    Job title
    Phone number
Company
    Company name
Department
    Company name
    Department name
As you can see, the information here is pretty redundant. I do not have control over the API and it will remain structured this way. I should also add that not every employee has a department, and not every company has departments.
I need to store this data, so that it persists. I have chosen Core Data to do this (which I'm assuming was the right move), but I do not know how to structure the model in this instance. I should add that I'm very new to databases.
This leads me to some questions:
Every example I've seen online uses relationships so that the information can be updated appropriately upon deletion of an object - this will not be the case here since this is read-only. Do I even need relationships for this case then? These 3 sets of objects are obviously related, so I am just assuming that I should structure it this way. If it is still advised to create relationships, then what do I gain out of creating those relationships in a read-only application? (For instance, does it make searching my data easier and cleaner? etc.)
The tutorials I've looked at don't seem to have all of this redundant data. As you can see, "company name" appears as a property in each set of objects. If it would be advised that I create relationships amongst my entities (which are Employee, Company, Department), can someone show me how this should look so that I may get an idea of what to do? (This is of course assuming that I should use relationships in my model.)
And I would imagine that this would be the set of rules:
Each company has many or no departments
Each department has 1 or many employees
Each employee has 1 company and 1 (or no) department
Please let me know if I'm on the right track here. If you need clarification, I will try my best.
Yes, use relationships. Make them bi-directional.
The redundant information in your feed doesn't matter, ignore it. If you received partial data it could be used to build the relationships, but you don't need to use it.
You say this data comes from an API, so it isn't read-only as far as the app is concerned. Worry more about how you're going to use the data in the app than how it comes from the server when designing your data model.
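To illustrate what bi-directional relationships buy you, here is a small sketch in plain Python (not Core Data) that turns the flat employee rows into linked objects. The class layout, the dictionary keys, and build_graph are assumptions for illustration; in Core Data the same links would be declared as relationships with inverses in the model editor.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Company:
    name: str
    departments: list = field(default_factory=list)
    employees: list = field(default_factory=list)


@dataclass(eq=False)
class Department:
    name: str
    company: Company
    employees: list = field(default_factory=list)


@dataclass(eq=False)
class Employee:
    first_name: str
    last_name: str
    company: Company
    department: Optional[Department] = None


def build_graph(rows):
    """Build linked objects from flat employee rows; the redundant names become keys."""
    companies, departments, employees = {}, {}, []
    for row in rows:
        company = companies.get(row["company"])
        if company is None:
            company = companies[row["company"]] = Company(row["company"])

        department = None
        dept_name = row.get("department")  # not every employee has a department
        if dept_name:
            key = (company.name, dept_name)
            department = departments.get(key)
            if department is None:
                department = departments[key] = Department(dept_name, company)
                company.departments.append(department)

        employee = Employee(row["first"], row["last"], company, department)
        company.employees.append(employee)
        if department is not None:
            department.employees.append(employee)
        employees.append(employee)
    return companies, employees


companies, employees = build_graph([
    {"company": "Acme", "department": "Engineering", "first": "Ada", "last": "Lovelace"},
    {"company": "Acme", "department": "Engineering", "first": "Bob", "last": "Stone"},
    {"company": "Acme", "department": None, "first": "Cy", "last": "Young"},
    {"company": "Globex", "first": "Dee", "last": "Dunn"},
])
```

With the links in place, "all employees of a company" or "all employees of a department" is a property access rather than a string comparison over every employee.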

Best practices to design classes to represent database tables

This may be a dumb question, but I've always wondered what's the best way to do this.
Suppose we have a database with two tables: Users and Orders (one user can have many orders), and in any OOP language you have two classes to represent those tables User and Order. In the database it's evident that the 'order' will have the 'user' ID because it's a one to many relationship (because one user can have many orders) and the user won't have any order ID. But in code what's the best practice out of the following three?
a) Should the user have an array of Orders?
b) Should the order have the user ID?
c) Should the order have a reference to the user object?
Or are there more efficient ways to tackle this? I've always done it in different ways, they all have both pros and cons, but I've never asked an expert's opinion.
Thanks in advance!
In this instance, the User could have an array of orders if you're performing operations on the User that also involves orders that they own.
Whenever I design my classes, objects that are related contain pointers to each other, so I can access the Orders from the User and the User from an Order.
I don't believe there is a best practice as it really depends on what you're trying to accomplish. With Users and Orders, I could see you starting with an Order and needing to access the User and vice versa; therefore, in your situation it sounds like you should map the objects both ways.
One word of warning, just be careful not to create a circular reference. If you delete both objects without removing the reference, it could create a memory leak.
You are asking about what is known as "object relational mapping" (ORM). I think the best way to learn what you want to learn is to look at some well established ORM libraries [such as ActiveRecord(Ruby) or Hibernate (Java)] and see how they do it.
With that in mind:
a) If the application requires it, there should be access to an array (or similar enumeration) of objects representing the user's orders through the user object. However, this will usually best involve lazy loading (i.e. the orders will usually not be pulled from the database when the user is pulled from the database; the orders will be queried later, when the application needs access to them). After objects are lazy loaded they can be cached by the ORM to eliminate the need for further queries on that instantiation.
b) Unless for performance reasons you only pull specific columns you're usually going to pull all columns when pulling an order. So it would include the user id.
c) Answer a applies to this as well.
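The lazy loading and caching described in (a) can be sketched in a few lines. This is a hand-rolled illustration, not how any particular ORM implements it; FakeOrderTable and the load_orders callable are hypothetical stand-ins for a database query.

```python
class FakeOrderTable:
    """Hypothetical stand-in for the Orders table; counts how often it is queried."""

    def __init__(self, rows):
        self._rows = rows          # list of (order_id, user_id) tuples
        self.query_count = 0

    def orders_for_user(self, user_id):
        self.query_count += 1
        return [order_id for order_id, uid in self._rows if uid == user_id]


class User:
    def __init__(self, user_id, load_orders):
        self.user_id = user_id
        self._load_orders = load_orders  # callable that runs the query
        self._orders = None              # nothing loaded yet

    @property
    def orders(self):
        # First access runs the query; later accesses reuse the cached list.
        if self._orders is None:
            self._orders = self._load_orders(self.user_id)
        return self._orders


table = FakeOrderTable([(101, 1), (102, 2), (103, 1)])
user = User(1, table.orders_for_user)

queries_before = table.query_count   # loading the user did not touch orders
first = user.orders
second = user.orders
queries_after = table.query_count
```

The order side works the same way in reverse: an Order keeps the user id it was loaded with, as in (b), and can resolve the User object on first access.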

When should I use properties instead of object references?

For example, suppose I have an Author class and a Book class, defined independently. We all know an author writes a book.
What I would love to know is when it's best to include the book as a reference object in the Author class, and when to just include the book name.
The reason for this question ties mainly to flexibility and easy maintenance.
Update:
What design patterns should I read up on that relate to this type of issue?
I would generally store the reference to the book, and the book object is therefore readily accessible from the author. If you store the property (name) in this scenario, then some questions are:
is the name unique ?
is it costly to retrieve the book from the author (e.g. do you have to go to the database) ?
If you don't want the whole book object in memory (perhaps storing all authors with all their books consumes huge amounts of resource), then perhaps you want a book placeholder object referenced from the author class. That placeholder would store the book's unique key and can retrieve the book upon demand. It may implement a book interface and thus be indistinguishable from the real book. The downside is that the book still has to be populated upon, or prior to, a request for info.
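A minimal sketch of such a placeholder, assuming the ISBN as the unique key and a hypothetical load_book function standing in for the database fetch:

```python
class Book:
    def __init__(self, isbn, title):
        self.isbn = isbn
        self.title = title


class BookPlaceholder:
    """Holds only the key; fetches the real Book the first time details are needed."""

    def __init__(self, isbn, loader):
        self.isbn = isbn
        self._loader = loader   # callable taking an ISBN and returning a Book
        self._book = None

    @property
    def title(self):
        if self._book is None:
            self._book = self._loader(self.isbn)
        return self._book.title


class Author:
    def __init__(self, name, books):
        self.name = name
        self.books = books      # may mix real Book objects and placeholders


load_calls = []


def load_book(isbn):
    # Hypothetical stand-in for a database lookup.
    load_calls.append(isbn)
    return Book(isbn, {"111": "First Book", "222": "Second Book"}[isbn])


author = Author("J. Smith", [Book("111", "First Book"), BookPlaceholder("222", load_book)])
calls_before = list(load_calls)
titles = [book.title for book in author.books]
```

Because both classes expose isbn and title, code that walks author.books does not need to know which kind of object it is holding.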
The name would be a very bad idea; it's quite possible for two different books to have the same title.
I'd use an object reference (inside some sort of collection, of course, since an author can write more than one book) - that's what they're for. This is certainly flexible and maintainable.
There may be exceptional circumstances where this causes problems, and only then would I consider keeping some sort of unique ID (in the case of books, the ISBN would be the prime candidate) instead of a reference.
It really depends if author is ever going to need access to book information beyond that of just the title. If really never, then ok maybe you only need the title as a String. However, if additional book info is likely to be needed, you need to think about how you are going to access that. In this case it probably makes sense to use a book class.
Having the book name inside the author object will create duplication if a book has many authors. If somehow you need to modify the book name then you will have to go through all the authors collection in order to determine where you need to change.
Having a single book object referenced by all its authors is much simpler. Just change the book name in one place and that's all.
I don't think a structure with unneeded duplicates is easy to maintain or flexible.
EDIT: With duplication of the name you can also end up with inconsistencies. Just imagine what happens if the title of a book with two authors gets modified for just one author.
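The difference is easy to see in a tiny sketch (class and variable names here are made up for illustration):

```python
class Book:
    def __init__(self, title):
        self.title = title


class Author:
    def __init__(self, name):
        self.name = name
        self.books = []        # object references
        self.book_titles = []  # duplicated names, kept only for comparison


alice, bob = Author("Alice"), Author("Bob")

# Variant 1: both authors reference the same Book object.
shared = Book("Old Title")
alice.books.append(shared)
bob.books.append(shared)
shared.title = "New Title"  # one change, visible through both authors

# Variant 2: each author stores a copy of the title.
alice.book_titles.append("Old Title")
bob.book_titles.append("Old Title")
alice.book_titles[0] = "New Title"  # Bob's copy is now stale
```

After the rename, alice.books[0].title and bob.books[0].title agree, while alice.book_titles and bob.book_titles do not.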
It depends on what you want to do with the information. Sometimes it might be easier to have your list of book names stored in the author, other times you might need the full book information (ISBN, publisher etc.)
In a database you would have an Author table with all of their details, and a book table with all details of the book, then probably a many-to-many relationship table to tie books to authors and vice-versa.
Oh, and properties / object references aren't really mutually exclusive. A property CAN be an object reference. It sounds more like you are asking if you should store the full object information or just the piece you might need immediately.
Option B. Avoid having such references - it will quickly render your program into a mess of dependencies. Keep the two concepts independent, and if you need to, use some higher-order abstraction to link between the two.
Your question is a part of a complex saga known as "object-relational mappings", there's a lot of material on design patterns for that on the web.
Object-oriented design is when each object has references to its best friends – not to their names. There are certain downsides to it; for example, it is notoriously hard to keep track of back-references.
However, the problem calls for a more general solution than a best-practice pattern.
As long as you're using OO to design your application at all, I'd opt to be consistent and keep direct references to your objects.
It really depends on your domain. Most book store databases have some discrepancies between names of authors between books, and no information on whether a given name is shared by more than one author. So in that case a book would have a list of author names, and there would be no object identity mapped to that data.
On the other hand, if your domain is a publishing house, you have a very good idea which of the authors John Smith ( client number 19024982 ), J. Henry Smith ( client number 19024982 ) and John Smith ( client number 773829 ) are the same or different authors, and which books those two authors have created, and so using object references for author and book identity would be a good mapping of the domain.