Many-to-Many in Postgres? - postgresql

I went with PostgreSQL because it is an ORDMBS rather than a standard relational DBMS. I have a class/object (below) that I would like to implement into the database.
class User{
int id;
String name;
ArrayList<User> friends;
}
Now, a user has many friends, so, logically, the table should be declared like so:
CREATE TABLE user_table(
id INT,
name TEXT,
friends TYPEOF(user_table)[]
)
However, to my knowledge, it is not possible to use a row of a table as a type (-10 points for postgreSQL), so, instead, my array of friends is stored as integers:
CREATE TABLE user_table(
id INT,
name TEXT,
friends INT[]
)
This is an issue because elements of an array cannot reference - only the array itself can. Added to this, there seems to be no way to import the whole user (that is to say, the user and all the user's friends) without doing multiple queries.
Am I using postgreSQL wrong? It seems to me that the only efficient way to use it is by using a relational approach.
I want a cleaner object-oriented approach similar to that of Java.

I'm afraid you are indeed using PostgreSQL wrong, and possibly misunderstanding the purpose of Object-relational databases as opposed to classic relational databases. Both classes of database are still inherently relational, but the former provides allowances for inheritance and user-defined types that the latter does not.
This answer to one of your previous questions provides you with some great pointers to achieve what you're trying to do using the Postgres pattern.

Well, first off PostgreSQL absolutely supports arrays of complex types like you describe (although I don't think it has a TYPEOF operator). How would the declaration you describe work, though? You are trying to use the table type in the declaration of the table. If what you want is a composite type in an array (and I'm not really sure that it is) you would declare this in two steps:
CREATE TYPE ima_type AS ( some_id integer, some_val text);
CREATE TABLE ima_table
( some_other_id serial NOT NULL
, friendz ima_type []
)
;
That runs fine. You can also create arrays of table types, because every table definition is a type definition in Postgres.
However, in a relational database, a more traditional model would use two tables:
CREATE TABLE persons
( person_id serial NOT NULL PRIMARY KEY
, person_name text NOT NULL
)
;
CREATE TABLE friend_lookup
( person_id integer FOREIGN KEY REFERENCES persons
, friend_id integer FOREIGN KEY REFERENCES persons(person_id)
, CONSTRAINT uq_person_friend UNIQUE (person_id, friend_id)
)
;
Ignoring the fact that the persons table has absolutely no way to prevent duplicate persons (what about misspellings, middle initials, spacing, honorifics, etc?; also two different people can have the same name), this will do what you want and allow for a simple query that lists all friends.

Related

PostgreSQL - CREATE TABLE - Apply constraints like PRIMARY KEY in attributes that are inside composite types

I want to implement an object-relational database, using PostgreSQL. I do not want to use ORACLE.
Can I create a composite type, and then use it in a table, adding a restriction for example of a primary key in one of its attributes?
Below I leave an example:
CREATE TYPE teamObj AS (
idnumeric,
name character varying,
city character varying,
estadiumName character varying,
jugadores playerObj[]
);
CREATE TABLE teamTable (
equipo equipoobj,
PRIMARY KEY (equipo.id)
);
The line PRIMARY KEY (equipo.id) gives an error, and I've read a lot of documentation of this topic, and I don't find the solution, maybe PostgreSQL has not implemented yet, or will never implement it, or I don't understand how runs PostgreSQL...
Does somebody have a solution?
Thank you.
No, you cannot do that, and I recommend that you do not create tables like that.
Your table definition should look pretty much like your type definition does. There is no need for an intermediate type definition.
Reasons:
that will not make your schema more readable
that violates the first normal form for relational databases
that does not match an object oriented design any better than the simple table definition
As a comfort, PostgreSQL will implicitly define a composite type of the same name whenever you define a table, so you can do things like
CAST(ROW(12, 'Raiders', 'Wolfschoaßing', 'Dorfwiesn', NULL) AS team)
or
CREATE FUNCTION getteam(id) RETURNS team

Postgres ENUM data type or CHECK CONSTRAINT?

I have been migrating a MySQL db to Pg (9.1), and have been emulating MySQL ENUM data types by creating a new data type in Pg, and then using that as the column definition. My question -- could I, and would it be better to, use a CHECK CONSTRAINT instead? The MySQL ENUM types are implemented to enforce specific values entries in the rows. Could that be done with a CHECK CONSTRAINT? and, if yes, would it be better (or worse)?
Based on the comments and answers here, and some rudimentary research, I have the following summary to offer for comments from the Postgres-erati. Will really appreciate your input.
There are three ways to restrict entries in a Postgres database table column. Consider a table to store "colors" where you want only 'red', 'green', or 'blue' to be valid entries.
Enumerated data type
CREATE TYPE valid_colors AS ENUM ('red', 'green', 'blue');
CREATE TABLE t (
color VALID_COLORS
);
Advantages are that the type can be defined once and then reused in as many tables as needed. A standard query can list all the values for an ENUM type, and can be used to make application form widgets.
SELECT n.nspname AS enum_schema,
t.typname AS enum_name,
e.enumlabel AS enum_value
FROM pg_type t JOIN
pg_enum e ON t.oid = e.enumtypid JOIN
pg_catalog.pg_namespace n ON n.oid = t.typnamespace
WHERE t.typname = 'valid_colors'
enum_schema | enum_name | enum_value
-------------+---------------+------------
public | valid_colors | red
public | valid_colors | green
public | valid_colors | blue
Disadvantages are, the ENUM type is stored in system catalogs, so a query as above is required to view its definition. These values are not apparent when viewing the table definition. And, since an ENUM type is actually a data type separate from the built in NUMERIC and TEXT data types, the regular numeric and string operators and functions don't work on it. So, one can't do a query like
SELECT FROM t WHERE color LIKE 'bl%';
Check constraints
CREATE TABLE t (
colors TEXT CHECK (colors IN ('red', 'green', 'blue'))
);
Two advantage are that, one, "what you see is what you get," that is, the valid values for the column are recorded right in the table definition, and two, all native string or numeric operators work.
Foreign keys
CREATE TABLE valid_colors (
id SERIAL PRIMARY KEY NOT NULL,
color TEXT
);
INSERT INTO valid_colors (color) VALUES
('red'),
('green'),
('blue');
CREATE TABLE t (
color_id INTEGER REFERENCES valid_colors (id)
);
Essentially the same as creating an ENUM type, except, the native numeric or string operators work, and one doesn't have to query system catalogs to discover the valid values. A join is required to link the color_id to the desired text value.
As other answers point out, check constraints have flexibility issues, but setting a foreign key on an integer id requires joining during lookups. Why not just use the allowed values as natural keys in the reference table?
To adapt the schema from punkish's answer:
CREATE TABLE valid_colors (
color TEXT PRIMARY KEY
);
INSERT INTO valid_colors (color) VALUES
('red'),
('green'),
('blue');
CREATE TABLE t (
color TEXT REFERENCES valid_colors (color) ON UPDATE CASCADE
);
Values are stored inline as in the check constraint case, so there are no joins, but new valid value options can be easily added and existing values instances can be updated via ON UPDATE CASCADE (e.g. if it's decided "red" should actually be "Red", update valid_colors accordingly and the change propagates automatically).
One of the big disadvantages of Foreign keys vs Check constraints is that any reporting or UI displays will have to perform a join to resolve the id to the text.
In a small system this is not a big deal but if you are working on a HR or similar system with very many small lookup tables then this can be a very big deal with lots of joins taking place just to get the text.
My recommendation would be that if the values are few and rarely changing, then use a constraint on a text field otherwise use a lookup table against an integer id field.
PostgreSQL has enum types, works as it should. I don't know if an enum is "better" than a constraint, they just both work.
From my point of view, given a same set of values
Enum is a better solution if you will use it on multiple column
If you want to limit the values of only one column in your application, a check constraint is a better solution.
Of course, there is a whole lot of other parameters which could creep in your decision process (typically, the fact that builtin operators are not available), but I think these two are the most prevalent ones.
I'm hoping somebody will chime in with a good answer from the PostgreSQL database side as to why one might be preferable to the other.
From a software developer point of view, I have a slight preference for using check constraints, since PostgreSQL enum's require a cast in your SQL to do an update/insert, such as:
INSERT INTO table1 (colA, colB) VALUES('foo', 'bar'::myenum)
where "myenum" is the enum type you specified in PostgreSQL.
This certainly makes the SQL non-portable (which may not be a big deal for most people), but also is just yet another thing you have to deal with while developing applications, so I prefer having VARCHARs (or other typical primitives) with check constraints.
As a side note, I've noticed that MySQL enums do not require this type of cast, so this is something particular to PostgreSQL in my experience.

postgresql hstore key/value vs traditional SQL performance

I need to develop a key/value backend, something like this:
Table T1 id-PK, Key - string, Value - string
INSERT into T1('String1', 'Value1')
INSERT INTO T1('String1', 'Value2')
Table T2 id-PK2, id2->external key to id
some other data in T2, which references data in T1 (like users which have those K/V etc)
I heard about PostgreSQL hstore with GIN/GIST. What is better (performance-wise)?
Doing this the traditional way with SQL joins and having separate columns(Key/Value) ?
Does PostgreSQL hstore perform better in this case?
The format of the data should be any key=>any value.
I also want to do text matching e.g. partially search for (LIKE % in SQL or using the hstore equivalent).
I plan to have around 1M-2M entries in it and probably scale at some point.
What do you recommend ? Going the SQL traditional way/PostgreSQL hstore or any other distributed key/value store with persistence?
If it helps, my server is a VPS with 1-2GB RAM, so not a pretty good hardware. I was also thinking to have a cache layer on top of this, but I think it rather complicates the problem. I just want good performance for 2M entries. Updates will be done often but searches even more often.
Thanks.
Your question is unclear because your not clear about your objective.
The key here is the index (pun intended) - if your dealing with a large amount of keys you want to be able to retrieve them with a the least lookups and without pulling up unrelated data.
Short answer is you probably don't want to use hstore, but lets look into more detail...
Does each id have many key/value pairs (hundreds+)? Don't use hstore.
Will any of your values contain large blocks of text (4kb+)? Don't use hstore.
Do you want to be able to search by keys in wildcard expressions? Don't use hstore.
Do you want to do complex joins/aggregation/reports? Don't use hstore.
Will you update the value for a single key? Don't use hstore.
Multiple keys with the same name under an id? Can't use hstore.
So what's the use of hstore? Well, one good scenario would be if you wanted to hold key/value pairs for an external application where you know you always want to retrive all key/values and will always save the data back as a block (ie, it's never edited in-place). At the same time you do want some flexibility to be able to search this data - albiet very simply - rather than storing it in say a block of XML or JSON. In this case since the number of key/value pairs are small you save on space because your compressing several tuples into one hstore.
Consider this as your table:
CREATE TABLE kv (
id /* SOME TYPE */ PRIMARY KEY,
key_name TEXT NOT NULL,
key_value TEXT,
UNIQUE(id, key_name)
);
I think the design is poorly normalized. Try something more like this:
CREATE TABLE t1
(
t1_id serial PRIMARY KEY,
<other data which depends on t1_id and nothing else>,
-- possibly an hstore, but maybe better as a separate table
t1_props hstore
);
-- if properties are done as a separate table:
CREATE TABLE t1_properties
(
t1_id int NOT NULL REFERENCES t1,
key_name text NOT NULL,
key_value text,
PRIMARY KEY (t1_id, key_name)
);
If the properties are small and you don't need to use them heavily in joins or with fancy selection criteria, and hstore may suffice. Elliot laid out some sensible things to consider in that regard.
Your reference to users suggests that this is incomplete, but you didn't really give enough information to suggest where those belong. You might get by with an array in t1, or you might be better off with a separate table.

When to use inherited tables in PostgreSQL?

In which situations you should use inherited tables? I tried to use them very briefly and inheritance didn't seem like in OOP world.
I thought it worked like this:
Table users has all fields required for all user levels. Tables like moderators, admins, bloggers, etc but fields are not checked from parent. For example users has email field and inherited bloggers has it now too but it's not unique for both users and bloggers at the same time. ie. same as I add email field to both tables.
The only usage I could think of is fields that are usually used, like row_is_deleted, created_at, modified_at. Is this the only usage for inherited tables?
There are some major reasons for using table inheritance in postgres.
Let's say, we have some tables needed for statistics, which are created and filled each month:
statistics
- statistics_2010_04 (inherits statistics)
- statistics_2010_05 (inherits statistics)
In this sample, we have 2.000.000 rows in each table. Each table has a CHECK constraint to make sure only data for the matching month gets stored in it.
So what makes the inheritance a cool feature - why is it cool to split the data?
PERFORMANCE: When selecting data, we SELECT * FROM statistics WHERE date BETWEEN x and Y, and Postgres only uses the tables, where it makes sense. Eg. SELECT * FROM statistics WHERE date BETWEEN '2010-04-01' AND '2010-04-15' only scans the table statistics_2010_04, all other tables won't get touched - fast!
Index size: We have no big fat table with a big fat index on column date. We have small tables per month, with small indexes - faster reads.
Maintenance: We can run vacuum full, reindex, cluster on each month table without locking all other data
For the correct use of table inheritance as a performance booster, look at the postgresql manual.
You need to set CHECK constraints on each table to tell the database, on which key your data gets split (partitioned).
I make heavy use of table inheritance, especially when it comes to storing log data grouped by month. Hint: If you store data, which will never change (log data), create or indexes with CREATE INDEX ON () WITH(fillfactor=100); This means no space for updates will be reserved in the index - index is smaller on disk.
UPDATE:
fillfactor default is 100, from http://www.postgresql.org/docs/9.1/static/sql-createtable.html:
The fillfactor for a table is a percentage between 10 and 100. 100 (complete packing) is the default
"Table inheritance" means something different than "class inheritance" and they serve different purposes.
Postgres is all about data definitions. Sometimes really complex data definitions. OOP (in the common Java-colored sense of things) is about subordinating behaviors to data definitions in a single atomic structure. The purpose and meaning of the word "inheritance" is significantly different here.
In OOP land I might define (being very loose with syntax and semantics here):
import life
class Animal(life.Autonomous):
metabolism = biofunc(alive=True)
def die(self):
self.metabolism = False
class Mammal(Animal):
hair_color = color(foo=bar)
def gray(self, mate):
self.hair_color = age_effect('hair', self.age)
class Human(Mammal):
alcoholic = vice_boolean(baz=balls)
The tables for this might look like:
CREATE TABLE animal
(name varchar(20) PRIMARY KEY,
metabolism boolean NOT NULL);
CREATE TABLE mammal
(hair_color varchar(20) REFERENCES hair_color(code) NOT NULL,
PRIMARY KEY (name))
INHERITS (animal);
CREATE TABLE human
(alcoholic boolean NOT NULL,
FOREIGN KEY (hair_color) REFERENCES hair_color(code),
PRIMARY KEY (name))
INHERITS (mammal);
But where are the behaviors? They don't fit anywhere. This is not the purpose of "objects" as they are discussed in the database world, because databases are concerned with data, not procedural code. You could write functions in the database to do calculations for you (often a very good idea, but not really something that fits this case) but functions are not the same thing as methods -- methods as understood in the form of OOP you are talking about are deliberately less flexible.
There is one more thing to point out about inheritance as a schematic device: As of Postgres 9.2 there is no way to reference a foreign key constraint across all of the partitions/table family members at once. You can write checks to do this or get around it another way, but its not a built-in feature (it comes down to issues with complex indexing, really, and nobody has written the bits necessary to make that automatic). Instead of using table inheritance for this purpose, often a better match in the database for object inheritance is to make schematic extensions to tables. Something like this:
CREATE TABLE animal
(name varchar(20) PRIMARY KEY,
ilk varchar(20) REFERENCES animal_ilk NOT NULL,
metabolism boolean NOT NULL);
CREATE TABLE mammal
(animal varchar(20) REFERENCES animal PRIMARY KEY,
ilk varchar(20) REFERENCES mammal_ilk NOT NULL,
hair_color varchar(20) REFERENCES hair_color(code) NOT NULL);
CREATE TABLE human
(mammal varchar(20) REFERENCES mammal PRIMARY KEY,
alcoholic boolean NOT NULL);
Now we have a canonical reference for the instance of the animal that we can reliably use as a foreign key reference, and we have an "ilk" column that references a table of xxx_ilk definitions which points to the "next" table of extended data (or indicates there is none if the ilk is the generic type itself). Writing table functions, views, etc. against this sort of schema is so easy that most ORM frameworks do exactly this sort of thing in the background when you resort to OOP-style class inheritance to create families of object types.
Inheritance can be used in an OOP paradigm as long as you do not need to create foreign keys on the parent table. By example, if you have an abstract class vehicle stored in a vehicle table and a table car that inherits from it, all cars will be visible in the vehicle table but a foreign key from a driver table on the vehicle table won't match theses records.
Inheritance can be also used as a partitionning tool. This is especially usefull when you have tables meant to be growing forever (log tables etc).
Main use of inheritance is for partitioning, but sometimes it's useful in other situations. In my database there are many tables differing only in a foreign key. My "abstract class" table "image" contains an "ID" (primary key for it must be in every table) and PostGIS 2.0 raster. Inherited tables such as "site_map" or "artifact_drawing" have a foreign key column ("site_name" text column for "site_map", "artifact_id" integer column for the "artifact_drawing" table etc.) and primary and foreign key constraints; the rest is inherited from the the "image" table. I suspect I might have to add a "description" column to all the image tables in the future, so this might save me quite a lot of work without making real issues (well, the database might run little slower).
EDIT: another good use: with two-table handling of unregistered users, other RDBMSs have problems with handling the two tables, but in PostgreSQL it is easy - just add ONLY when you are not interrested in data in the inherited "unregistered user" table.
The only experience I have with inherited tables is in partitioning. It works fine, but it's not the most sophisticated and easy to use part of PostgreSQL.
Last week we were looking the same OOP issue, but we had too many problems with Hibernate - we didn't like our setup, so we didn't use inheritance in PostgreSQL.
I use inheritance when I have more than 1 on 1 relationships between tables.
Example: suppose you want to store object map locations with attributes x, y, rotation, scale.
Now suppose you have several different kinds of objects to display on the map and each object has its own map location parameters, and map parameters are never reused.
In these cases table inheritance would be quite useful to avoid having to maintain unnormalised tables or having to create location id’s and cross referencing it to other tables.
I tried some operations on it, I will not point out if is there any actual use case for database inheritance, but I will give you some detail for making your decision. Here is an example of PostgresQL: https://www.postgresql.org/docs/15/tutorial-inheritance.html
You can try below SQL script.
CREATE TABLE IF NOT EXISTS cities (
name text,
population real,
elevation int -- (in ft)
);
CREATE TABLE IF NOT EXISTS capitals (
state char(2) UNIQUE NOT NULL
) INHERITS (cities);
ALTER TABLE cities
ADD test_id varchar(255); -- Both table would contains test col
DROP TABLE cities; -- Cannot drop because capitals depends on it
ALTER TABLE cities
ADD CONSTRAINT fk_test FOREIGN KEY (test_id) REFERENCES sometable (id);
As you can see my comments, let me summarize:
When you add/delete/update fields -> the inheritance table would also be affected.
Cannot drop the parent table.
Foreign keys would not be inherited.
From my perspective, in growing applications, we cannot easily predict the changes in the future, for me I would avoid applying this to early database developing.
When features are stable as well and we want to create some database model which much likely the same as the existing one, we can consider that use case.
Use it as little as possible. And that usually means never, it boiling down to a way of creating structures that violate the relational model, for instance by breaking the information principle and by creating bags instead of relations.
Instead, use table partitioning combined with proper relational modelling, including further normal forms.

Database Schema for Machine Tags?

Machine tags are more precise tags: http://www.flickr.com/groups/api/discuss/72157594497877875. They allow a user to basically tag anything as an object in the format
object:property=value
Any tips on a rdbms schema that implements this? Just wondering if anyone
has already dabbled with this. I imagine the schema is quite similar to implementing
rdf triples in a rdbms
Unless you start trying to get into some optimisation, you'll end up with a table with Object, Property and Value columns Each record representing a single triple.
Anything more complicated, I'd suggested looking the documentation for Jena, Sesame, etc.
If you want to continue with the RDBMS approach then the following schema might work
CREATE TABLE predicates (
id INT PRIMARY KEY,
namespace VARCHAR(255),
localName VARCHAR(255)
)
CREATE TABLE values (
subject INT,
predicate INT,
value VARCHAR(255)
)
The table predicates holds the tag definitions and values the values.
But Mat is also right. If there are more requirements then it's probably feasible to use an RDF engine with SQL persistence support.
I ended up implementing this schema