The idea of the SaaS tool is to have dynamic tables with dynamic custom fields and values of different types. We considered following the "force.com/salesforce.com" example, but it seems too complicated to maintain going forward and it pushes reporting behind a huge abstraction layer, so we came up with a simple idea of our own; we just want to be sure it is a sound approach.
This is the architecture we have today (in a few steps).
Each tenant has its own separate database on the cluster (Postgres 12).
A TABLE table keeps all of those tables as references; this entity has a ManyToOne relation to the META table and a OneToMany relation with the DATA table.
The META table holds the metadata configuration and has a OneToMany relation with FIELDS (which stores each field's name, its type, e.g. TEXT/INTEGER/BOOLEAN/DATETIME etc., and the attribute column it maps to, kept as a string reference).
The DATA table has a ManyToOne relation to TABLES and 50 character-varying columns named attribute1...attribute50, all NULL-able.
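For concreteness, one of those entities looks roughly like this (a simplified sketch; identifiers are illustrative, and the composite key and CreatedAt column are the ones described further down):

CREATE TABLE data_50 (
    id          bigint      NOT NULL,
    tableid     bigint      NOT NULL REFERENCES tables (id),
    createdat   timestamptz NOT NULL DEFAULT now(),
    attribute1  varchar,    -- NULL-able; logical name and type live in META > FIELDS
    attribute2  varchar,
    -- ... attribute3 through attribute49 ...
    attribute50 varchar,
    PRIMARY KEY (id, tableid)
);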
Example flow today:
When a user wants to open a TABLE's DATA, e.g. "CARS", we load the META table with all of its FIELDS (to get the fields for this query). Say the user wants to query against the Brand, Class, Year and Price columns.
Our logic looks up the references for Brand, Class, Year and Price in the META > FIELDS table, so we know that Brand = attribute2, Class = attribute5, Year = attribute6 and Price = attribute7.
We parse the request into a query, e.g. SELECT [attr...2,5,6,7] FROM DATA, and show the results to the user. If the user then decides to filter on this data, e.g. Year > 2017 AND Class = 'A', we use SQL's CAST() functionality, for example: SELECT CAST(attribute6 AS int), attribute5 FROM DATA WHERE CAST(attribute6 AS int) > 2017 AND attribute5 = 'A'; this way we can support most principles of SQL.
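Spelled out, the two steps look roughly like this (a sketch; table, column and placeholder names are simplified, not our exact schema):

-- Step 1: resolve logical field names to physical attribute columns.
SELECT f.name, f.attribute_ref            -- e.g. 'Year' -> 'attribute6'
  FROM meta m
  JOIN fields f ON f.meta_id = m.id
 WHERE m.table_id = :cars_table_id
   AND f.name IN ('Brand', 'Class', 'Year', 'Price');

-- Step 2: the generated data query, with casts taken from the field types.
SELECT attribute2 AS brand,
       attribute5 AS class,
       CAST(attribute6 AS int) AS year,
       attribute7 AS price
  FROM data_50
 WHERE tableid = :cars_table_id
   AND CAST(attribute6 AS int) > 2017
   AND attribute5 = 'A';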
However, moving forward, we are a bit scared about:
Managing such an environment as tenants grow and tables multiply (e.g. 50 tables per customer, each with roughly 1-5 million rows; 5 million is the maximum we allow, and bigger data goes to BigQuery). That adds up to 50-250 million rows in a single DATA_X table, which might hurt query performance, especially since we expose simple WHERE statements (less-than, equals, null, etc.) through an abstraction language, e.g. GET CARS [BRAND,CLASS,PRICE...] FILTER [EQ(CLASS,A),MT(YEAR,2017)], designed to be similar to JQL (Jira Query Language).
Transaction locks: since we allow batch-uploading CSVs into DATA_X, loading e.g. 1 GB of data effectively locks the table for other systems that need to access DATA.
Keeping many NULL columns, which can waste some space. (For now we are not too worried: at TABLE creation the customer decides how many columns they want, and based on that we assign the TABLE to one of the hardcoded entities DATA_5, DATA_10, DATA_15, DATA_20, DATA_30 or DATA_50, where the number is the limit of attribute columns and each entity is distinct. We also support migration if they decide to switch from 5 to 10 attributes, etc.)
We are at a very early stage, so we can and should make these changes before we scale. We knew this was most likely not the best approach, but we kept it to get the project running for small customers, for whom it works just fine so far.
We also considered JSONB objects, but that is not an option, as we want to keep retrieving the data simple.
What do you think about this solution? (FYI: DATA has a composite PRIMARY KEY of two columns, (ID, TABLEID), plus a built-in CreatedAt column used by most of the queries, so there will be at most 3 indexes.)
If it seems bad, what would you recommend as an alternative, based on the details I shared (basically a schema-less RDBMS)?
IMHO, I anticipate issues when you want to join tables, use casts, and so on.
We followed the approach below; I hope it is of help to you.
We have a table called Cars along with companion tables like CarsMeta and CarsExtension. The underlying Cars table holds the common fields shared by all tenants. The CarsMeta table records which column types are available for extending the Cars entity, and the CarsExtension table has columns like StringCol1...5, IntCol1...5, LongCol1...10.
This way, you can also filter the data easily:
If the filter is on the base table, perform the search there; if results are found, match the ids against the CarsExtension table to get the extended rows for this entity.
If the filter is on the extended fields, search the extension table and match against the base entity ids.
The extension table is organized like this:
id - UniqueId
entityid - UniqueId (points to the primary key of the entity)
StringCol1 - string,
...
IntCol1 - int,
...
This makes it easy to join to the entity and fetch the data along with its extension fields.
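For example, a combined filter might look like this (filter column names are just for illustration):

-- Filter on a base column and an extended column, returning the entity
-- together with its extension fields.
SELECT c.*, e.StringCol1, e.IntCol1
  FROM Cars c
  JOIN CarsExtension e ON e.entityid = c.id
 WHERE c.brand = 'BMW'        -- filter on the base table
   AND e.IntCol1 > 2017;      -- filter on an extended field, e.g. a custom Year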
If, instead, both the table metadata and the data itself have to be inferred from separate tables, it will be difficult to maintain over a long period of time and with a huge volume of data.
HTH
So, my Swift app lets a user choose sports teams to see historic match information for. Currently, a user selects team(s) and the JSON data file of historic matches is scanned.
If a historic match includes the name of a selected team, the details of the match are stored in a Core Data entity, which feeds my main Table View.
However, this presents an issue I can't get my head around solving.
If a user selects teams A and B, and the database contains a match where team A and B played EACH OTHER, two objects are created for the same match, so the Table View cell is created twice: once for team A being found in the match, and again for team B.
Is there an easy and efficient way to trim the duplicates this causes? I don't know whether to handle it at object-creation time or just to find a way of removing the duplicated cells from my Table View.
Thanks so much.
I think you should redesign your setup: have all the records to be searched stored in Core Data.
If you have a hardcoded JSON file, import it on first start. If the JSON is retrieved remotely, insert or update the new or changed elements in your Core Data object graph.
You would have a Match or Game entity, and each match would be retrieved only once. The fetch predicate would be something like
NSPredicate(format: "homeTeam = %@ || guestTeam = %@", selectedTeam, selectedTeam)
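A minimal sketch of the whole fetch (entity and attribute names like Match, homeTeam, guestTeam and date are assumed, not taken from your project):

import CoreData

// Each match is stored and fetched exactly once, so a game where two
// selected teams played each other still produces a single object and
// therefore a single table view cell.
func matches(for selectedTeams: [String],
             in context: NSManagedObjectContext) throws -> [Match] {
    let request = NSFetchRequest<Match>(entityName: "Match")
    request.predicate = NSPredicate(format: "homeTeam IN %@ OR guestTeam IN %@",
                                    selectedTeams, selectedTeams)
    request.sortDescriptors = [NSSortDescriptor(key: "date", ascending: false)]
    return try context.fetch(request)
}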
I am using Entity Framework and have come across a weird problem.
I am trying to save a collection to the database (say: Collection of Rounds).
Now each item in this collection in turn has a collection of child elements (say: Collection of Events).
Which would look something like this:
Round 1
(No Child Elements)
Round 2
Event 1
Event 2
Event 3
Round 3
(No Child Elements)
Round 4
Event 1
Event 2
As shown above, the parent object will not always have a child collection.
Now here is the problem:
My requirement is that I want to save the data in the same order I added it to the collection.
But while saving, EF saves the items that have a child collection first, so the order is modified upon saving.
So, in the database, Round 2 is saved first and then the others are saved in an arbitrary order.
Is there any way to force EF to save the Rounds collection in the order I constructed it?
It should always save starting from Round 1 and should end with saving Round 4.
Thanks guys : )
Neither the database nor EF guarantees ordering. In the same way, when you query the database you are not guaranteed to get elements back in the expected order. If you want an exact order, you must add an additional column to hold each record's ordering value and use the OrderBy extension method when retrieving the data.
The order of the operations executed by SaveChanges is fully under EF's control; you cannot change it.
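A sketch of the workaround (the SortOrder property is an assumed addition, not something EF provides):

// EF and SQL never guarantee physical insert order, so store the
// position explicitly and sort on it when reading back.
public class Round
{
    public int Id { get; set; }
    public int SortOrder { get; set; }
    public virtual ICollection<Event> Events { get; set; }
}

// Saving: number the rounds in the order the collection was built.
int position = 0;
foreach (var round in rounds)
    round.SortOrder = position++;
context.Rounds.AddRange(rounds);   // on older EF versions, Add each item
context.SaveChanges();

// Reading: Round 1 comes first and Round 4 last, regardless of how
// SaveChanges flushed the rows.
var ordered = context.Rounds.OrderBy(r => r.SortOrder).ToList();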
My iPhone application is using a SQLite database with the following schema:
items(id, name, ...) -> this table contains 50 records
tags(id, name) -> this table contains 50 records
item_tags(id, item_id, tag_id, user_id)
similarities(id, item1_id, item2_id, score)
The items, tags, item_tags and similarities tables are populated with pre-defined records, so the similarities between the different items have already been calculated offline (using the cosine similarity algorithm on the items' tags).
Users are able to add additional tags to items and to remove their custom tags later on. Whenever this happens the similarity scores between the items should be updated locally, i.e. without contacting the server application.
My question now is the following:
What is the most efficient way to do this? So far, on startup of the iPhone application, I compute a term-document matrix over all the items and tags (reflecting the tag frequencies for each item) and keep this matrix in memory for as long as the application is running. Whenever a tag is added or removed, I use this matrix to update the similarities in the database. However, this is rather inefficient. Do you have any suggestions?
Thanks!
This presentation might help you:
http://www.slideshare.net/jnvms/incremental-itembased-collaborative-filtering-4095306
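The gist of the incremental approach, mapped onto your schema, is to cache the pairwise dot products and the per-item norms so that a tag change only touches the pairs that share the changed tag. A rough SQLite sketch, assuming binary tag vectors and two hypothetical cache tables, pair_dots(item1_id, item2_id, dot) and norms(item_id, norm_sq); note that sqrt() needs a SQLite build with the math functions enabled, otherwise do the final division in application code:

-- A tag :tag was just added to item :item.
BEGIN;

-- 1. Dot products change only for pairs whose other item also has :tag.
UPDATE pair_dots
   SET dot = dot + 1
 WHERE (item1_id = :item AND item2_id IN
          (SELECT item_id FROM item_tags
            WHERE tag_id = :tag AND item_id <> :item))
    OR (item2_id = :item AND item1_id IN
          (SELECT item_id FROM item_tags
            WHERE tag_id = :tag AND item_id <> :item));

-- 2. The squared norm of :item grows by one tag.
UPDATE norms SET norm_sq = norm_sq + 1 WHERE item_id = :item;

-- 3. Re-derive the cosine only for the pairs involving :item.
UPDATE similarities
   SET score = (SELECT d.dot FROM pair_dots d
                 WHERE d.item1_id = similarities.item1_id
                   AND d.item2_id = similarities.item2_id)
             / (SELECT sqrt(n1.norm_sq * n2.norm_sq)
                  FROM norms n1, norms n2
                 WHERE n1.item_id = similarities.item1_id
                   AND n2.item_id = similarities.item2_id)
 WHERE item1_id = :item OR item2_id = :item;

COMMIT;

Removing a tag is the mirror image (subtract instead of add).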
I'm using searchlogic to filter and order my results, but it removes records from my results when I order by an association that is not present for all records.
For example, say I have a User model which can have one Vehicle model but does not have to. If I have a results table where you can order by the users' vehicle make, I would hope all users without a vehicle record would be treated as empty strings, and therefore ordered together at the beginning, followed by the other user records, which have vehicles, ordered by the make name.
Unfortunately, all the user records which do not have a vehicle are removed from the results.
Is there any way round this while still using searchlogic? I find it extremely useful.
I think you'll have to explicitly assign a default vehicle that has an empty name
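A sketch of that suggestion (model and attribute names assumed from the question):

# Give each vehicle-less user a placeholder vehicle with an empty make,
# so the join behind ordering by vehicle make matches every user and the
# empty makes sort to the front.
User.all.each do |user|
  user.create_vehicle(:make => "") if user.vehicle.nil?
end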