Track history of ManyToMany Relationship table with extra fields - postgresql

I have a many-to-many relationship table (named UserLabel) in a Postgres database with some extra fields. I want to be able to track the history of changes to this table. I came up with the following structure, and I'd like to know if there's a better way of implementing it.
User
    id
Label
    id
UserLabel
    id
    user_id
    label_id
    label_info (jsonb)
    is_deleted (true or false)
UserLabel can contain more than one record with same user_id and label_id but with different label_info. At any point of time, if I want to query for all the labels for a given user I can do that using this table. Now, updates could occur on this table on label_id or label_info or is_deleted fields. I want to be able to know at any given point of time, what were the labels and label info of a user. For this, I'm using the below table.
UserLabelEvent
    id
    user_label_id
    user_id
    label_id
    label_info
    change_type (one of create, update, delete)
    created_timestamp
If I want to check the user labels for any user at any time, I just have to query on user_id and created_timestamp and order the records by created_timestamp and loop over the records to construct the user labels at any given time.
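The replay described above can be sketched as follows. This is a minimal sketch, not the poster's actual code: it assumes each UserLabelEvent row is available as a dict keyed by the column names, that an update event carries the full new label_id and label_info, and the function name labels_at is made up for illustration.

```python
from datetime import datetime

def labels_at(events, user_id, as_of):
    """Replay UserLabelEvent rows for one user up to `as_of`.

    Returns {user_label_id: (label_id, label_info)} for the rows that
    existed (were not deleted) at that moment.
    """
    state = {}
    relevant = [
        e for e in events
        if e["user_id"] == user_id and e["created_timestamp"] <= as_of
    ]
    # Order by timestamp, using the event id as a tie-breaker.
    for e in sorted(relevant, key=lambda e: (e["created_timestamp"], e["id"])):
        if e["change_type"] == "delete":
            state.pop(e["user_label_id"], None)
        else:  # "create" or "update": the event holds the full new values
            state[e["user_label_id"]] = (e["label_id"], e["label_info"])
    return state

# Hypothetical sample data: one label created, recoloured, then deleted.
events = [
    {"id": 1, "user_label_id": 10, "user_id": 1, "label_id": 5,
     "label_info": {"color": "red"}, "change_type": "create",
     "created_timestamp": datetime(2020, 1, 1)},
    {"id": 2, "user_label_id": 10, "user_id": 1, "label_id": 5,
     "label_info": {"color": "blue"}, "change_type": "update",
     "created_timestamp": datetime(2020, 2, 1)},
    {"id": 3, "user_label_id": 10, "user_id": 1, "label_id": 5,
     "label_info": {"color": "blue"}, "change_type": "delete",
     "created_timestamp": datetime(2020, 3, 1)},
]
```

Keying the state by user_label_id rather than label_id is what allows several rows with the same user_id and label_id to coexist.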
The problems in my current approach:
By default, anyone looking at the schema of the UserLabel table would assume there cannot be more than one record with the same user_id and the same label_id.
Looking at UserLabelEvent, it's not obvious how that table works.
I need to do some post processing to find out the user labels at any given time. By post processing, I mean, loop over the query results and construct the user labels.
Please do suggest any other problems you find with this approach. I will update the post with new inputs.

Related

How do I set my hasura permission to only see the rows of my table corresponding to a user?

Here's the thing. I have the 3 tables depicted here:
People on my application can place orders. Then, I want
a user with rex permission to see all the orders table's rows
a user with delivery permission to only see the rows of the orders table that have the zip column set to the delivery user's zip
From the orders table, I can get for each order a zip. With the table zip_user, I can get a user_id out of a zip. Out of that user_id, I can get the delivery user from the users table.
While it is trivial to get the rex to see all of the orders table, I have not yet been able to configure the permissions for the delivery user. What do I need to do?
In other words, given the user performing a select on the orders table has x-hasura-user-id set to some user id and x-hasura-role set to delivery, how does that user get only the rows from the orders table that match with the zips associated with that user's user_id?
Hasura has the concept of relationships. If you have foreign keys, it makes the relationships automatically; if not, you can make them yourself in the UI. Once the relationships have been set up, you will be able to set deep permissions, so on the orders table, you'll be able to use users.id.
Start here: https://hasura.io/docs/1.0/graphql/manual/schema/relationships/index.html

Filter and display database audit / changelog (activity stream)

I'm developing an application with SQLAlchemy and PostgreSQL. Users of the system modify data in 8 or so tables. Consider this contrived example schema:
I want to add visible logging to the system to record what has changed, but not necessarily how it has changed. For example: "User A modified product Foo", "User A added user B" or "User C purchased product Bar". So basically I want to store:
Who made the change
A message describing the change
Enough information to reference the object that changed, e.g. the product_id and customer_id when an order is placed, so the user can click through to that entity
I want to show each user a list of recent and relevant changes when they log in to the application (a bit like the main timeline in Facebook etc). And I want to store subscriptions, so that users can subscribe to changes, e.g. "tell me when product X is modified", or "tell me when any products in store S are modified".
I have seen the audit trigger recipe, but I'm not sure it's what I want. That audit trigger might do a good job of recording changes, but how can I quickly filter it to show recent, relevant changes to the user? Options that I'm considering:
Have one column per ID type in the log and subscription tables, with an index on each column
Use full text search, combining the ID types as a tsvector
Use an hstore or json column for the IDs, and index the contents somehow
Store references as URIs (strings) without an index, and walk over the logs in reverse date order, using application logic to filter by URI
Any insights appreciated :)
Edit: It seems what I'm talking about is an activity stream. The suggestion in this answer to filter by time first is sounding pretty good.
Since the objects all use uuid for the id field, I think I'll create the activity table like this:
Have a generic reference to the target object, with a uuid column with no foreign key, and an enum column specifying the type of object it refers to.
Have an array column that stores generic uuids (maybe as text[]) of the target object and its parents (e.g. parent categories, store and organisation), and search the array for matching subscriptions. That way a subscription for a parent category can match a child in one step (denormalised).
Put a btree index on the date column, and (maybe) a GIN index on the array UUID column.
I'll probably filter by time first to reduce the amount of searching required. Later, if needed, I'll look at using GIN to index the array column (this partially answers my question "Is there a trick for indexing an hstore in a flexible way?")
Update: this is working well. The SQL to fetch a timeline looks something like this:
SELECT *
FROM (
    SELECT DISTINCT ON (activity.created, activity.id)
        *
    FROM activity
    LEFT OUTER JOIN unnest(activity.object_ref) WITH ORDINALITY AS act_ref
        ON true
    LEFT OUTER JOIN subscription
        ON subscription.object_id = act_ref.act_ref
    WHERE activity.created BETWEEN :lower_date AND :upper_date
        AND subscription.user_id = :user_id
    ORDER BY activity.created DESC,
        activity.id,
        act_ref.ordinality DESC
) AS sub
WHERE sub.subscribed = true;
Joining with unnest(...) WITH ORDINALITY, ordering by ordinality, and selecting distinct on the activity ID filters out activities that have been unsubscribed from at a deeper level. If you don't need to do that, then you could avoid the unnest and just use the array containment @> operator, and no subquery:
SELECT *
FROM activity
JOIN subscription
    ON activity.object_ref @> ARRAY[subscription.object_id]
WHERE subscription.user_id = :user_id
    AND activity.created BETWEEN :lower_date AND :upper_date
ORDER BY activity.created DESC;
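To make the matching rule concrete, here is a small in-memory sketch of the simpler (containment-only) timeline. It assumes activities and subscriptions are plain dicts with the column names used above, and the function name timeline is hypothetical. Unlike the SQL join, it returns each activity once even if several subscriptions match it.

```python
from datetime import date

def timeline(activities, subscriptions, user_id, lower, upper):
    """Return the activities a user is subscribed to, newest first.

    An activity matches when any id in its object_ref list (the target
    object plus its parents) equals the object_id of one of the user's
    subscriptions.
    """
    wanted = {s["object_id"] for s in subscriptions if s["user_id"] == user_id}
    hits = [
        a for a in activities
        if lower <= a["created"] <= upper and wanted.intersection(a["object_ref"])
    ]
    return sorted(hits, key=lambda a: a["created"], reverse=True)

# Hypothetical sample data: ids are shortened strings instead of real UUIDs.
activities = [
    {"id": 1, "created": date(2024, 1, 5), "title": "Product Foo modified",
     "object_ref": ["prod-foo", "cat-tools", "store-s"]},
    {"id": 2, "created": date(2024, 1, 9), "title": "Product Bar modified",
     "object_ref": ["prod-bar", "cat-toys", "store-t"]},
    {"id": 3, "created": date(2024, 1, 7), "title": "Product Baz modified",
     "object_ref": ["prod-baz", "cat-tools", "store-s"]},
]
subscriptions = [
    {"user_id": "u1", "object_id": "store-s"},   # subscribed to a parent
    {"user_id": "u2", "object_id": "prod-bar"},  # subscribed to one product
]

recent = timeline(activities, subscriptions, "u1", date(2024, 1, 1), date(2024, 1, 31))
```

Because the parents are stored alongside the target, a single subscription to store-s picks up every product in that store without walking the hierarchy.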
You could also join with the other object tables to get the object titles - but instead, I decided to add a title column to the activity table. This is denormalised, but it doesn't require a complex join with many tables, and it tolerates objects being deleted (which might be the action that triggered the activity logging).

FileMaker Pro 12 Auto-populating Tables

I'm new to FileMaker and need some advice on auto-populating tables.
Part 1:
I have TableA which includes many records with client information. I want a separate TableB which is identical to TableA except that it is "de-identified"; that is, it does not contain two of the fields, first name and last name.
I would like the two tables to interact such that if I add a new record to TableA, that same record (sans first and last name) appears automatically in TableB.
Part 2:
In addition to the above functionality, I would also like it to depend on the value of a specific field in TableA. For example, I enter a new record, which has a "status" field set to "active," into TableA. I then want that record to be auto-populated into TableB; however, if I add another record with a "status" of "inactive," I want that record auto-populated into a TableC but not into TableB.
FileMaker can perform this with script triggers so long as every layout where TableA will be edited has a layout script trigger of OnRecordCommit connected to it. When the record is committed (which can happen in a number of ways), the attached script will run, which you can use to create the appropriate record in the appropriate table.
The script could create the record in a number of ways. If the primary keys for both records are the same, you could use lookups. You could export the record in TableA and then import it into the correct table. You could pass the field information as a parameter to the script. The best choice really depends on your needs.
Having said that, I would question the wisdom of this approach. It brings up a few questions that would seem to complicate matters. For example, what happens when the status changes? When a record in TableA is deleted? When fields in TableA are modified? Each of these contingencies (and others) will require thought and more complicated scripts.
So I would ask what problem you're really trying to solve. My best guess is that you are trying to keep the name information private from certain users. User accounts and privileges with dedicated layouts for each privilege can solve this without the need for duplicate tables. FileMaker privilege sets can be quite granular.
For example, you can specify that users with PrivilegeA can create records and view names, but PrivilegeB users can only view records if the status is "active" and the name fields are not available to them, while PrivilegeC users can view records if the status is "inactive" and the name fields are also not available to them.
I would definitely use filters and permissions on the "status" field to achieve this and not two mirroring tables. Unless the inactive information is drastically different, you would be complicating your solution and creating more possible pitfalls.

How do you store and display if a user has voted or not on something?

I'm working on a voting site and I'm wondering how I should handle votes.
For example, on SO when you vote for a question (or answer) your vote is stored, and each time I go back to the page I can see that I already voted for this question because the up/down buttons are colored.
How do you do that? I have several ideas, but I'm wondering whether they would put a heavy load on the database.
Here are my ideas:
1. Write a helper which checks, for every question, whether a vote has been cast.
   That means the number of queries will depend on the number of items displayed on the page (usually ~20).
2. Loop over my items, collect the ids, and for each page run one query which returns, for each item, whether a vote has been cast or NULL.
   Looks OK because there is only one query no matter how many items are on the page, but it may break some MVC/Domain Model design, dunno.
3. When a user logs in (or a guest for whom an anonymous user is created), retrieve all votes and store them in the session; if a new vote is cast, just add it to the session.
   Looks nice because no queries are needed at all except the first one. However, depending on the number of votes cast (maybe a bunch for each user), that first query can increase the size of the session for each user and potentially make authentication slow.
How do you do it? Any other ideas?
For example, let's assume you store each vote, together with the user who cast it, in a user_votes table with a structure something like the one below.
id of type int autoincrement
user_id type int, Foreign key representing users table
question_id type of int, Foreign key representing questions table
Now, as the user will be logged in, when you fetch the questions, do a left join against the user_votes table on the user_id.
Something like:
SELECT q.id, q.question, uv.id
FROM questions AS q
LEFT JOIN user_votes AS uv
    ON uv.question_id = q.id
    AND uv.user_id = <logged_in_user_id>
WHERE <Your criteria>
From the view you can check whether the id is present. If so mark voted, else not.
You may need to adjust the field names to match your own tables. I am assuming you store questions in a questions table, users in a users table, and so on, each with a primary key named id.
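The left join above can be tried end to end with a small in-memory database. This is only a sketch under assumptions: it uses SQLite from Python's standard library rather than your actual database, the sample rows are made up, and the UNIQUE constraint and the helper name questions_with_vote_flag are additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questions (id INTEGER PRIMARY KEY, question TEXT);
CREATE TABLE user_votes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    question_id INTEGER NOT NULL REFERENCES questions(id),
    UNIQUE (user_id, question_id)  -- assumption: one vote per user per question
);
INSERT INTO questions (id, question) VALUES (1, 'First?'), (2, 'Second?');
INSERT INTO user_votes (user_id, question_id) VALUES (7, 2);
""")

def questions_with_vote_flag(conn, user_id):
    """One query for the whole page: every question plus a 'voted' flag."""
    rows = conn.execute(
        """
        SELECT q.id, q.question, uv.id IS NOT NULL AS voted
        FROM questions AS q
        LEFT JOIN user_votes AS uv
               ON uv.question_id = q.id AND uv.user_id = ?
        ORDER BY q.id
        """,
        (user_id,),
    ).fetchall()
    return [(qid, text, bool(voted)) for qid, text, voted in rows]

page = questions_with_vote_flag(conn, 7)
```

Putting the user_id condition in the ON clause rather than the WHERE clause is what keeps questions the user has not voted on in the result.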
Thanks
You could use a combination of your suggested strategies.
Retrieve all the votes made by the logged in user for recent/active questions only and store them in the session.
You then have the ones that are more likely to be needed while still reducing the amount you need to store in the session.
In the less likely event that you need other results, query for just those as and when you need to.
This strategy will reduce the amount you need to store in the session and also reduce the number of calls you make to your database.
Just based on the information that you've given so far, I would take the second approach: get the IDs of all the items on the page, and then do a single query to get all the user's votes for that list of item IDs. Then pass the collection of the user's item votes to your view, so it can render items differently when the user has voted for that item.
The other two approaches seem like they would tend to be less efficient, if I understood you correctly. Using a view helper to initiate an individual query for each item to check if the user has voted on it could lead to a lot of unnecessary queries. And preloading all the user's voting history at login seems to add unnecessary overhead, getting data that isn't always needed and adding the burden of keeping it up to date for the duration of the session.
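A minimal sketch of that second approach, assuming a votes table with user_id and item_id columns (both names are hypothetical) and using SQLite from Python's standard library as a stand-in for the real database:

```python
import sqlite3

def voted_item_ids(conn, user_id, item_ids):
    """Return the subset of item_ids the user has voted on, in one query."""
    if not item_ids:
        return set()  # avoid generating an empty IN () clause
    placeholders = ",".join("?" for _ in item_ids)
    rows = conn.execute(
        f"SELECT item_id FROM votes "
        f"WHERE user_id = ? AND item_id IN ({placeholders})",
        (user_id, *item_ids),
    )
    return {row[0] for row in rows}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE votes (user_id INTEGER, item_id INTEGER, "
    "PRIMARY KEY (user_id, item_id))"
)
conn.executemany("INSERT INTO votes VALUES (?, ?)", [(1, 10), (1, 12), (2, 11)])

page_item_ids = [10, 11, 12, 13]        # ids of the items shown on the page
voted = voted_item_ids(conn, 1, page_item_ids)
```

The view can then test each item with a simple membership check such as item_id in voted, so the number of queries stays at one regardless of page size.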

Ways to implement data versioning in PostgreSQL

Can you share your thoughts on how you would implement data versioning in PostgreSQL? (I've asked similar questions regarding Cassandra and MongoDB. If you have any thoughts on which database is better for this, please share.)
Suppose that I need to version records in a simple address book. Address book records are stored in one table without relations for simplicity. I expect that the history:
will be used infrequently
will be used all at once to present it in a "time machine" fashion
will not exceed a few hundred versions per record
will not expire
I'm considering the following approaches:
Create a new object table to store history of records with a copy of schema of addressbook table and add timestamp and foreign key to address book table.
Create a kind of schema less table to store changes to address book records. Such table would consist of: AddressBookId, TimeStamp, FieldName, Value. This way I would store only changes to the records and I wouldn't have to keep history table and address book table in sync.
Create a table to store serialized (JSON) address book records or changes to address book records. Such a table would look as follows: AddressBookId, TimeStamp, Object (varchar).
Again, this is schema-less, so I wouldn't have to keep the history table in sync with the address book table.
(This is modelled after Simple Document Versioning with CouchDB)
I do something like your second approach: have the table with the actual working set and a history with changes (timestamp, record_id, property_id, property_value). This includes the creation of records. A third table describes the properties (id, property_name, property_type), which helps in data conversion higher up in the application. So you can also track very easily changes of single properties.
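Reading a record back from such a history table means replaying its property changes up to the moment of interest. The sketch below is an illustration under assumptions, not the answerer's code: rows are plain tuples, property names stand in for property_id to keep it readable, integers stand in for timestamps, and the function name record_at is hypothetical.

```python
def record_at(history, record_id, as_of):
    """Rebuild one record as it looked at `as_of`.

    `history` holds (timestamp, record_id, property, value) rows, including
    the rows written when the record was created. Later rows override
    earlier ones for the same property.
    """
    record = {}
    for ts, rid, prop, value in sorted(history, key=lambda row: row[0]):
        if rid == record_id and ts <= as_of:
            record[prop] = value
    return record

# Hypothetical sample data: record 42 is created at t=1, then edited twice.
history = [
    (1, 42, "name", "Ann"),
    (1, 42, "phone", "555-0100"),
    (5, 42, "phone", "555-0199"),
    (9, 42, "name", "Ann Lee"),
]
```

Tracking a single property over time is then just a filter on the property column of the same rows.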
Instead of a timestamp you could also have an integer which you increment for every change per record_id, so you have an actual version number.
You could have start_date and end_date.
When end_date is NULL, it's the current record.
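A minimal sketch of the start_date / end_date idea, using SQLite from Python's standard library as a stand-in for PostgreSQL. The table name address_version, its columns other than start_date and end_date, and both helper functions are assumptions for illustration; dates are ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE address_version (
        address_book_id INTEGER NOT NULL,
        address TEXT NOT NULL,
        start_date TEXT NOT NULL,
        end_date TEXT            -- NULL marks the current version
    )
""")

def save_version(conn, address_book_id, address, when):
    """Close the current version (if any) and open a new one."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE address_version SET end_date = ? "
            "WHERE address_book_id = ? AND end_date IS NULL",
            (when, address_book_id),
        )
        conn.execute(
            "INSERT INTO address_version (address_book_id, address, start_date) "
            "VALUES (?, ?, ?)",
            (address_book_id, address, when),
        )

def version_at(conn, address_book_id, when):
    """Return the address that was valid at `when`, or None."""
    row = conn.execute(
        "SELECT address FROM address_version "
        "WHERE address_book_id = ? AND start_date <= ? "
        "AND (end_date IS NULL OR end_date > ?)",
        (address_book_id, when, when),
    ).fetchone()
    return row[0] if row else None

save_version(conn, 1, "1 Old Road", "2020-01-01")
save_version(conn, 1, "2 New Street", "2021-06-01")
```

Treating each version as valid from start_date up to but not including end_date means a lookup exactly on a change date returns the newer version.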
I'm versioning glossary data, and my approach was pretty successful for my needs. Basically, for records you need versioning, you divide the fieldset into persistent fields and version-dependent fields, thus creating two tables. Some of the first set should also be the unique key for the first table.
Address
    id [pk]
    fullname [uk]
    birthday [uk]
Version
    id [pk]
    address_id [uk]
    timestamp [uk]
    address
In this fashion, you get an address subject identified by fullname and birthday (which should not change between versions) and versioned records containing the addresses. address_id should reference Address.id through a foreign key. Each entry in the Version table gives you a new version for the subject with Address.id = address_id at a specific timestamp, which gives you a history you can refer back to.