I am attempting to create a DB for my app, and one thing I'd like to find the best way to do is create a one-to-many relationship between my Users and Items tables.
I know I can make a third table, ReviewedItems, whose columns are a User id and an Item id, but I'd like to know if it's possible to make a column in Users, say reviewedItems, which is an integer array containing foreign keys to Items that the User has reviewed.
If PostgreSQL can do this, please let me know! If not, I'll just go down my third table route.
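For reference, the third-table route I'd fall back to looks something like this (a minimal sketch; it assumes users(id) and items(id), and the names are mine):

CREATE TABLE reviewed_items (
    user_id integer REFERENCES users (id),
    item_id integer REFERENCES items (id),
    PRIMARY KEY (user_id, item_id)  -- each user reviews an item at most once
);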
It may soon be possible to do this: https://commitfest.postgresql.org/17/1252/ - Mark Rofail has been doing some excellent work on this patch!
The patch will (once complete) allow:

CREATE TABLE PKTABLEFORARRAY (
    ptest1 float8 PRIMARY KEY,
    ptest2 text
);

CREATE TABLE FKTABLEFORARRAY (
    ftest1 int[],
    FOREIGN KEY (EACH ELEMENT OF ftest1) REFERENCES PKTABLEFORARRAY,
    ftest2 int
);
However, the author currently needs help rebasing the patch (beyond my own ability), so if anyone reading this knows the Postgres internals, please help if you can.
No, this is not possible.
PostgreSQL is a relational DBMS, operating most efficiently on properly normalized data models. Arrays are not relational data structures: relations are by definition sets, while arrays are ordered and may contain duplicates. The SQL standard supports defining foreign keys on array elements, but PostgreSQL currently does not. There is an effort to implement this (apparently dormant: no activity on the commitfest since February 2021) - see this answer to this same question - so the functionality may one day be supported.
For the time being you can, however, build a perfectly fine database with array elements linking to primary keys in other tables. Those array elements cannot be declared as foreign keys, though, so the DBMS will not maintain referential integrity for them. With an appropriate set of triggers - on both the referenced and the referencing table, since a change to either would have to trigger a check and possibly an update on the other - you could in principle enforce referential integrity over the array elements yourself, but the performance is unlikely to be stellar (because indexes would not be used, for instance).
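Here's a minimal sketch of the referencing side of such a trigger, assuming hypothetical tables items(item_id int PRIMARY KEY) and users(..., reviewed_items int[]); the companion trigger on items, which would have to block deletes that orphan array elements, is omitted:

CREATE OR REPLACE FUNCTION check_reviewed_items() RETURNS trigger AS $$
BEGIN
    -- Reject the row if any array element has no matching item.
    IF EXISTS (
        SELECT 1
        FROM unnest(NEW.reviewed_items) AS elem
        WHERE NOT EXISTS (SELECT 1 FROM items WHERE item_id = elem)
    ) THEN
        RAISE EXCEPTION 'reviewed_items contains a value not present in items';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_reviewed_items_check
    BEFORE INSERT OR UPDATE ON users
    FOR EACH ROW EXECUTE PROCEDURE check_reviewed_items();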
The application we're designing has a function where users can dynamically add new elements to an entity that then need to be efficiently searched. The number of these elements is essentially unlimited. Our team has been looking at DynamoDB as a data store option, and we've been wrestling with the key/value model and how to get this dynamic data under an index for efficient querying.
I think I have a single-table solution that handles the problem elegantly and also allows for querying on any given attribute in the data store, but am disturbed that I can't find an example of it anywhere else. Hopefully it's not fundamentally flawed in some way - I would appreciate any critique!
The model is essentially the Entity-Attribute-Value approach used for adding dynamic or sparse data to RDBMSs. So instead of storing different entities/objects in a DynamoDB table like so:
PK      SK   SK-1    SK-2    SK-3    SK-N...
        Key  Key     Key     Key
Entity  Id   Value   Value   Value   Value

which for a concrete record might look like:

PK      SK   SK-1   SK-N...
        Name Money
Person  22   Fred   30000
...which lets me query things like "all persons where name = Fred", but where you would eventually run out of sort key indexes and would need to know which index goes with which key before you query. Instead, the data could be stored in EAV format like so:
PK   SK & GSI-PK   GSI-SK
Id   Entity#Key    Value

which for the same record becomes:

PK   SK & GSI-PK    GSI-SK
22   Person#Name    Fred
22   Person#Money   30000
22   Person#Sex     M
22   Person#DOB     09/00
Now, with one global secondary index (GSI-1 PK over Entity.Key and GSI-1 SK over Value) I can do a range search on any value for any key and get a list of Ids that match. Users can add their attributes or even entirely new entities and have them persisted in a way that's instantly indexed without us having to revamp the DynamoDB schema.
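As an illustration of that range search in boto3 (the table name app-data, the attribute names EntityKey/Value, and the index name GSI-1 are all hypothetical placeholders, not part of the design above):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("app-data")  # hypothetical name

# "All persons whose Name sorts between F and G", answered by the GSI:
resp = table.query(
    IndexName="GSI-1",
    KeyConditionExpression=(
        Key("EntityKey").eq("Person#Name") & Key("Value").between("F", "G")
    ),
)
matching_ids = [item["PK"] for item in resp["Items"]]  # entity Ids that match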
The one major downside to this approach that I can think of is that data returned from a query on an Entity#Key only contains values for that key and the entity Id, not the entire entity. That's fine for charts and graphs, but a problem if you want a grid-type result from a single query. I also worry about hot partition keys on the index, but hopefully we could solve that with intelligent write sharding.
That's pretty much it. With a few tweaks the model can be extended to support logging all changes on each key, allowing some nice time-series queries against those changes. My question is whether anyone has found it useful to take an EAV-type approach to a KV store like DynamoDB, or whether there's another way to handle querying a dynamic schema?
You can use the entity's id as the pk, and then a sort key of {attributeName}. You may still want to keep a base entity item with fields like createdAt, etc.
So you might have:
PK           SORT               Attributes:
#Entity#22   #Entity#Details    createdAt=2020
#Entity#22   #Attribute#name    key=name value=Fred
#Entity#22   #Attribute#money   key=money value=30000
To get all the attributes of an entity, you simply issue a Query on pk = {id} with no filter. You cannot, however, dynamically sort by every given attribute - that is exactly what DynamoDB is not good at, and exactly the kind of case NoSQL performs poorly at.
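For illustration, with boto3 that Query might look like this (the table name is hypothetical):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("app-data")  # hypothetical name

# One Query returns the base item and every attribute item for the entity:
resp = table.query(KeyConditionExpression=Key("PK").eq("#Entity#22"))
entity_items = resp["Items"]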
What you can do is use DynamoDB Streams to do aggregation. So you can, for instance, store the top 10 wealthiest people:
PK               SORT   Attributes:
#Money#Highest   #1     id=#Entity#22 value=30000
#Money#Highest   #2     id=#Entity#52 value=30000
...which you would calculate in a DynamoDB Streams handler. But you couldn't dynamically index values: DynamoDB works by effectively copying data from one form into another so that it can be retrieved efficiently. So you would be copying your entire database for each new attribute you wanted to search by - or else you would have to use Scans, which would make no sense, because you would get no benefit from DynamoDB if all you ever did was Scan.
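A rough sketch of what such a stream handler could look like - everything here (the Lambda wiring, attribute names, and the naive rank-1-only upsert) is a placeholder, not a working ranking implementation:

import boto3

table = boto3.resource("dynamodb").Table("app-data")  # hypothetical name

def handler(event, context):
    # Invoked by a DynamoDB stream on the main table.
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        img = record["dynamodb"]["NewImage"]  # raw attribute-value format
        if img.get("key", {}).get("S") != "money":
            continue
        # A real handler would read the current top-10 partition and
        # rewrite only the ranks that changed; this just upserts rank 1.
        table.put_item(Item={
            "PK": "#Money#Highest",
            "SORT": "#1",
            "id": img["PK"]["S"],
            "value": img["value"]["N"],
        })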
Your access patterns need to be very well understood to make good use of DynamoDB. If you want to index data at will and run all sorts of different queries, you probably want an SQL database or Elasticsearch.
What's the optimal way to store values for a select list in a web-app with Postgres?
If I use an enum, it has the benefit of acting as a constraint on whatever column is set to that type (only allowing possible values), and I can write fairly normal queries to pull those values to populate the select... but it has the drawback of requiring the option text and value to be identical.
If I create a table to store these, I can have columns for both value and text, and perhaps even a third (comment/description, whatever). However, it means a full table for every set of values, of which I expect several dozen throughout the web app. I'm not sure why this feels like a "heavier" solution than enums, but it does (a "create enum" statement plus a possible "alter enum" later, versus a "create table" plus many initial insert statements and maybe more in the future).
Nor can I create a single table for all dropdown lists, because then I would need convoluted constraint logic in the various tables that reference it.
Is there a code pattern that is ideal for this problem that I'm unaware of?
The solution doesn't need to be portable to other database engines... I'm more than happy to use a postgres-only solution.
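To make the two alternatives concrete, here's a sketch only - the order_status name and the columns are invented for illustration:

-- Alternative 1: an enum. The value stored and the text shown in the
-- select list are necessarily the same string.
CREATE TYPE order_status AS ENUM ('pending', 'shipped', 'cancelled');
CREATE TABLE orders (
    id     serial PRIMARY KEY,
    status order_status NOT NULL
);

-- Alternative 2: a lookup table. Value and display text can differ, at
-- the cost of one extra table (and its inserts) per list.
CREATE TABLE order_status_lookup (
    value       text PRIMARY KEY,
    label       text NOT NULL,
    description text
);
CREATE TABLE orders2 (  -- named orders2 only to avoid clashing with the above
    id     serial PRIMARY KEY,
    status text NOT NULL REFERENCES order_status_lookup (value)
);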
I want to ORDER BY time/date and paginate through all items in a table. Scan seems designed to paginate through everything, but doesn't seem to have an ASC/DESC equivalent. Query has ScanIndexForward, but requires a specific primary key. (Is there no way to SELECT *?)
Based on the first comment on this question, I'm thinking the only way to achieve this is to use a common primary (hash) key (!?) and then Query based on that, focusing on the range key. Is this really how it's supposed to work? I'd have to make a whole separate table with mirrored attributes if I wanted to Query an individual item by a unique primary key.
Please excuse my NoSQL noobness. I'm a front-end dev who's only dabbled in MySQL and SimpleDB.
Yes, this is what Query is for. The hash key identifies the list of things to page over, and the range key indicates the position within the list. If you can tolerate the latency hit, all you need to store in that table is the primary keys of the items being paged over; you can then issue a BatchGetItem to read a pageful of data in parallel.
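A sketch of that pattern with boto3 (the events table name, the ALL hash-key value, and the attribute names are all hypothetical):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("events")  # hypothetical name

kwargs = {
    # Every paged item shares this hash key; the range key is a timestamp.
    "KeyConditionExpression": Key("PK").eq("ALL"),
    "ScanIndexForward": False,  # descending range-key order (newest first)
    "Limit": 25,                # page size
}
while True:
    resp = table.query(**kwargs)
    for item in resp["Items"]:
        print(item)
    if "LastEvaluatedKey" not in resp:
        break  # no more pages
    kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]  # next page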
Duplicate data isn't the sin in NoSQL that it is in the relational model; you're essentially crafting a MySQL-style index by hand.
I'm interested in using the following audit mechanism in an existing PostgreSQL database.
http://wiki.postgresql.org/wiki/Audit_trigger
but would like (if possible) to make one modification. I would also like to log the primary key's value so that it can be queried later. So, I would like to add a field named something like "record_id" to the "logged_actions" table. The problem is that every table in the existing database has a different primary key field name. The good news is that the database has a very consistent naming convention: it's always <table name>_id. So, if a table is named "employee", the primary key is "employee_id".
Is there any way to do this? Basically, I need something like OLD.FieldByName(x) or OLD[x] to get the value out of the id field and put it into the record_id field in the new audit record.
I do understand that I could just create a separate, custom trigger for each table that I want to keep track of, but it would be nice to have it be generic.
edit: I also understand that the key value does get logged in the old/new data fields. But what I would like is to make querying the history easier and more efficient. In other words,
select * from audit.logged_actions where table_name = 'xxxx' and record_id = 12345;
another edit: I'm using PostgreSQL 9.1
Thanks!
You didn't mention your version of PostgreSQL, which is very important when writing answers to questions like this.
If you're running PostgreSQL 9.0 or newer (or able to upgrade) you can use this approach as documented by Pavel:
http://okbob.blogspot.com/2009/10/dynamic-access-to-record-fields-in.html
In general, what you want is to reference a dynamically named field in a record-typed PL/pgSQL variable like NEW or OLD. This has historically been annoyingly hard; it is still awkward, but at least possible in 9.0.
Your other alternative - which may be simpler - is to write your audit triggers in plperlu, where dynamic field references are trivial.
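For a flavour of the 9.0+ approach from Pavel's post, here's a minimal sketch using hstore (requires CREATE EXTENSION hstore; the logged_actions column list is simplified, and the <table>_id convention from the question is assumed):

CREATE OR REPLACE FUNCTION audit_log_with_id() RETURNS trigger AS $$
DECLARE
    pk_col text := TG_TABLE_NAME || '_id';  -- e.g. employee -> employee_id
    rec_id text;
BEGIN
    -- Pull the dynamically named key column out of OLD/NEW via hstore.
    IF TG_OP = 'DELETE' THEN
        rec_id := (hstore(OLD)) -> pk_col;
    ELSE
        rec_id := (hstore(NEW)) -> pk_col;
    END IF;
    -- Column list simplified relative to the wiki's logged_actions table.
    INSERT INTO audit.logged_actions (table_name, record_id, action)
    VALUES (TG_TABLE_NAME, rec_id, TG_OP);
    RETURN NULL;  -- AFTER trigger
END;
$$ LANGUAGE plpgsql;

-- Attach to each audited table, e.g.:
-- CREATE TRIGGER employee_audit AFTER INSERT OR UPDATE OR DELETE
--     ON employee FOR EACH ROW EXECUTE PROCEDURE audit_log_with_id();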
This may be a very simplistic question, so apologies in advance, but I am very new to database usage.
I'd like to have Postgres run its full text search across multiple joined tables. Imagine something like a model User, with related models UserProfile and UserInfo. The search would only be for Users, but would include information from UserProfile and UserInfo.
I'm planning on using a GIN index for the search. I'm unclear, however, on whether I need a separate tsvector column in the User table to hold the aggregated tsvectors from across the tables, with triggers to keep it up to date - or whether it's possible to create an index without a tsvector column that keeps itself up to date whenever any of the relevant fields in any of the relevant tables change. Any tips on the syntax for creating all this would be much appreciated as well.
Your best answer is probably to have a separate tsvector column in each table (with an index on it, of course). If you aggregate the data up into a shared tsvector, that will generate a lot of updates on the shared column whenever any of the individual sources change.
You will need one index per table. When you query, you then need multiple WHERE clauses, one for each field. PostgreSQL will automatically figure out which combination of indexes to use to give you the quickest results - likely using bitmap scans. This makes your queries a little more complex to write (since you need multiple column-matching clauses), but it keeps the flexibility to query only some of the fields when you want to.
You cannot create one index that tracks multiple tables. To do that you need the separate tsvector column and triggers on each table to update it.
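Here's a minimal sketch of the per-table setup; the user_profile/user_info names and the bio column are invented for illustration:

-- One tsvector column + GIN index per table.
ALTER TABLE user_profile ADD COLUMN search tsvector;
UPDATE user_profile SET search = to_tsvector('english', coalesce(bio, ''));
CREATE INDEX user_profile_search_idx ON user_profile USING gin (search);

-- The built-in trigger keeps the column current on INSERT/UPDATE.
CREATE TRIGGER user_profile_search_update
    BEFORE INSERT OR UPDATE ON user_profile
    FOR EACH ROW EXECUTE PROCEDURE
    tsvector_update_trigger(search, 'pg_catalog.english', bio);

-- Repeat the same pattern for user_info, then query both in one statement;
-- each table's clause can use that table's own GIN index.
SELECT u.*
FROM users u
JOIN user_profile p ON p.user_id = u.id
JOIN user_info    i ON i.user_id = u.id
WHERE p.search @@ to_tsquery('english', 'foo')
  AND i.search @@ to_tsquery('english', 'foo');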