Database Schema for Machine Tags? - tags

Machine tags are more precise tags: http://www.flickr.com/groups/api/discuss/72157594497877875. They allow a user to basically tag anything as an object in the format
object:property=value
Any tips on a rdbms schema that implements this? Just wondering if anyone
has already dabbled with this. I imagine the schema is quite similar to implementing
rdf triples in a rdbms

Unless you start trying to get into some optimisation, you'll end up with a table with Object, Property and Value columns Each record representing a single triple.
Anything more complicated, I'd suggested looking the documentation for Jena, Sesame, etc.

If you want to continue with the RDBMS approach then the following schema might work
CREATE TABLE predicates (
id INT PRIMARY KEY,
namespace VARCHAR(255),
localName VARCHAR(255)
)
CREATE TABLE values (
subject INT,
predicate INT,
value VARCHAR(255)
)
The table predicates holds the tag definitions and values the values.
But Mat is also right. If there are more requirements then it's probably feasible to use an RDF engine with SQL persistence support.

I ended up implementing this schema

Related

PostgreSQL - CREATE TABLE - Apply constraints like PRIMARY KEY in attributes that are inside composite types

I want to implement an object-relational database, using PostgreSQL. I do not want to use ORACLE.
Can I create a composite type, and then use it in a table, adding a restriction for example of a primary key in one of its attributes?
Below I leave an example:
CREATE TYPE teamObj AS (
idnumeric,
name character varying,
city character varying,
estadiumName character varying,
jugadores playerObj[]
);
CREATE TABLE teamTable (
equipo equipoobj,
PRIMARY KEY (equipo.id)
);
The line PRIMARY KEY (equipo.id) gives an error, and I've read a lot of documentation of this topic, and I don't find the solution, maybe PostgreSQL has not implemented yet, or will never implement it, or I don't understand how runs PostgreSQL...
Does somebody have a solution?
Thank you.
No, you cannot do that, and I recommend that you do not create tables like that.
Your table definition should look pretty much like your type definition does. There is no need for an intermediate type definition.
Reasons:
that will not make your schema more readable
that violates the first normal form for relational databases
that does not match an object oriented design any better than the simple table definition
As a comfort, PostgreSQL will implicitly define a composite type of the same name whenever you define a table, so you can do things like
CAST(ROW(12, 'Raiders', 'Wolfschoaßing', 'Dorfwiesn', NULL) AS team)
or
CREATE FUNCTION getteam(id) RETURNS team

Many-to-Many in Postgres?

I went with PostgreSQL because it is an ORDMBS rather than a standard relational DBMS. I have a class/object (below) that I would like to implement into the database.
class User{
int id;
String name;
ArrayList<User> friends;
}
Now, a user has many friends, so, logically, the table should be declared like so:
CREATE TABLE user_table(
id INT,
name TEXT,
friends TYPEOF(user_table)[]
)
However, to my knowledge, it is not possible to use a row of a table as a type (-10 points for postgreSQL), so, instead, my array of friends is stored as integers:
CREATE TABLE user_table(
id INT,
name TEXT,
friends INT[]
)
This is an issue because elements of an array cannot reference - only the array itself can. Added to this, there seems to be no way to import the whole user (that is to say, the user and all the user's friends) without doing multiple queries.
Am I using postgreSQL wrong? It seems to me that the only efficient way to use it is by using a relational approach.
I want a cleaner object-oriented approach similar to that of Java.
I'm afraid you are indeed using PostgreSQL wrong, and possibly misunderstanding the purpose of Object-relational databases as opposed to classic relational databases. Both classes of database are still inherently relational, but the former provides allowances for inheritance and user-defined types that the latter does not.
This answer to one of your previous questions provides you with some great pointers to achieve what you're trying to do using the Postgres pattern.
Well, first off PostgreSQL absolutely supports arrays of complex types like you describe (although I don't think it has a TYPEOF operator). How would the declaration you describe work, though? You are trying to use the table type in the declaration of the table. If what you want is a composite type in an array (and I'm not really sure that it is) you would declare this in two steps:
CREATE TYPE ima_type AS ( some_id integer, some_val text);
CREATE TABLE ima_table
( some_other_id serial NOT NULL
, friendz ima_type []
)
;
That runs fine. You can also create arrays of table types, because every table definition is a type definition in Postgres.
However, in a relational database, a more traditional model would use two tables:
CREATE TABLE persons
( person_id serial NOT NULL PRIMARY KEY
, person_name text NOT NULL
)
;
CREATE TABLE friend_lookup
( person_id integer FOREIGN KEY REFERENCES persons
, friend_id integer FOREIGN KEY REFERENCES persons(person_id)
, CONSTRAINT uq_person_friend UNIQUE (person_id, friend_id)
)
;
Ignoring the fact that the persons table has absolutely no way to prevent duplicate persons (what about misspellings, middle initials, spacing, honorifics, etc?; also two different people can have the same name), this will do what you want and allow for a simple query that lists all friends.

postgresql hstore key/value vs traditional SQL performance

I need to develop a key/value backend, something like this:
Table T1 id-PK, Key - string, Value - string
INSERT into T1('String1', 'Value1')
INSERT INTO T1('String1', 'Value2')
Table T2 id-PK2, id2->external key to id
some other data in T2, which references data in T1 (like users which have those K/V etc)
I heard about PostgreSQL hstore with GIN/GIST. What is better (performance-wise)?
Doing this the traditional way with SQL joins and having separate columns(Key/Value) ?
Does PostgreSQL hstore perform better in this case?
The format of the data should be any key=>any value.
I also want to do text matching e.g. partially search for (LIKE % in SQL or using the hstore equivalent).
I plan to have around 1M-2M entries in it and probably scale at some point.
What do you recommend ? Going the SQL traditional way/PostgreSQL hstore or any other distributed key/value store with persistence?
If it helps, my server is a VPS with 1-2GB RAM, so not a pretty good hardware. I was also thinking to have a cache layer on top of this, but I think it rather complicates the problem. I just want good performance for 2M entries. Updates will be done often but searches even more often.
Thanks.
Your question is unclear because your not clear about your objective.
The key here is the index (pun intended) - if your dealing with a large amount of keys you want to be able to retrieve them with a the least lookups and without pulling up unrelated data.
Short answer is you probably don't want to use hstore, but lets look into more detail...
Does each id have many key/value pairs (hundreds+)? Don't use hstore.
Will any of your values contain large blocks of text (4kb+)? Don't use hstore.
Do you want to be able to search by keys in wildcard expressions? Don't use hstore.
Do you want to do complex joins/aggregation/reports? Don't use hstore.
Will you update the value for a single key? Don't use hstore.
Multiple keys with the same name under an id? Can't use hstore.
So what's the use of hstore? Well, one good scenario would be if you wanted to hold key/value pairs for an external application where you know you always want to retrive all key/values and will always save the data back as a block (ie, it's never edited in-place). At the same time you do want some flexibility to be able to search this data - albiet very simply - rather than storing it in say a block of XML or JSON. In this case since the number of key/value pairs are small you save on space because your compressing several tuples into one hstore.
Consider this as your table:
CREATE TABLE kv (
id /* SOME TYPE */ PRIMARY KEY,
key_name TEXT NOT NULL,
key_value TEXT,
UNIQUE(id, key_name)
);
I think the design is poorly normalized. Try something more like this:
CREATE TABLE t1
(
t1_id serial PRIMARY KEY,
<other data which depends on t1_id and nothing else>,
-- possibly an hstore, but maybe better as a separate table
t1_props hstore
);
-- if properties are done as a separate table:
CREATE TABLE t1_properties
(
t1_id int NOT NULL REFERENCES t1,
key_name text NOT NULL,
key_value text,
PRIMARY KEY (t1_id, key_name)
);
If the properties are small and you don't need to use them heavily in joins or with fancy selection criteria, and hstore may suffice. Elliot laid out some sensible things to consider in that regard.
Your reference to users suggests that this is incomplete, but you didn't really give enough information to suggest where those belong. You might get by with an array in t1, or you might be better off with a separate table.

Creating a "table of tables" in PostgreSQL or achieving similar functionality?

I'm just getting started with PostgreSQL, and I'm new to database design.
I'm writing software in which I have various plugins that update a database. Each plugin periodically updates its own designated table in the database. So a plugin named 'KeyboardPlugin' will update the 'KeyboardTable', and 'MousePlugin' will update the 'MouseTable'. I'd like for my database to store these 'plugin-table' relationships while enforcing referential integrity. So ideally, I'd like a configuration table with the following columns:
Plugin-Name (type 'text')
Table-Name (type ?)
My software will read from this configuration table to help the plugins determine which table to update. Originally, my idea was to have the second column (Table-Name) be of type 'text'. But then, if someone mistypes the table name, or an existing relationship becomes invalid because of someone deleting a table, we have problems. I'd like for the 'Table-Name' column to act as a reference to another table, while enforcing referential integrity.
What is the best way to do this in PostgreSQL? Feel free to suggest an entirely new way to setup my database, different from what I'm currently exploring. Also, if it helps you answer my question, I'm using the pgAdmin tool to setup my database.
I appreciate your help.
I would go with your original plan to store the name as text. Possibly enhanced by additionally storing the schema name:
addin text
,sch text
,tbl text
Tables have an OID in the system catalog (pg_catalog.pg_class). You can get those with a nifty special cast:
SELECT 'myschema.mytable'::regclass
But the OID can change over a dump / restore. So just store the names as text and verify the table is there by casting it like demonstrated at application time.
Of course, if you use each tables for multiple addins it might pay to make a separate table
CREATE TABLE tbl (
,tbl_id serial PRIMARY KEY
,sch text
,name text
);
and reference it in ...
CREATE TABLE addin (
,addin_id serial PRIMARY KEY
,addin text
,tbl_id integer REFERENCES tbl(tbl_id) ON UPDATE CASCADE ON DELETE CASCADE
);
Or even make it an n:m relationship if addins have multiple tables. But be aware, as #OMG_Ponies commented, that a setup like this will require you to execute a lot of dynamic SQL because you don't know the identifiers beforehand.
I guess all plugins have a set of basic attributes and then each plugin will have a set of plugin-specific attributes. If this is the case you can use a single table together with the hstore datatype (a standard extension that just needs to be installed).
Something like this:
CREATE TABLE plugins
(
plugin_name text not null primary key,
common_int_attribute integer not null,
common_text_attribute text not null,
plugin_atttributes hstore
)
Then you can do something like this:
INSERT INTO plugins
(plugin_name, common_int_attribute, common_text_attribute, hstore)
VALUES
('plugin_1', 42, 'foobar', 'some_key => "the fish", other_key => 24'),
('plugin_2', 100, 'foobar', 'weird_key => 12345, more_info => "10.2.4"');
This creates two plugins named plugin_1 and plugin_2
Plugin_1 has the additional attributes "some_key" and "other_key", while plugin_2 stores the keys "weird_key" and "more_info".
You can index those hstore columns and query them very efficiently.
The following will select all plugins that have a key "weird_key" defined.
SELECT *
FROM plugins
WHERE plugin_attributes ? 'weird_key'
The following statement will select all plugins that have a key some_key with the value the fish:
SELECT *
FROM plugins
WHERE plugin_attributes #> ('some_key => "the fish"')
Much more convenient than using an EAV model in my opinion (and most probably a lot faster as well).
The only drawback is that you lose type-safety with this approach (but usually you'd lose that with the EAV concept as well).
You don't need an application catalog. Just add the application name to the keys of the table. This of course assumes that all the tables have the same structure. If not: use the application name for a table name, or as others have suggested: as a schema name( which also would allow for multiple tables per application).
EDIT:
But the real issue is of course that you should first model your data, and than build the applications to manipulate it. The data should not serve the code; the code should serve the data.

When to use inherited tables in PostgreSQL?

In which situations you should use inherited tables? I tried to use them very briefly and inheritance didn't seem like in OOP world.
I thought it worked like this:
Table users has all fields required for all user levels. Tables like moderators, admins, bloggers, etc but fields are not checked from parent. For example users has email field and inherited bloggers has it now too but it's not unique for both users and bloggers at the same time. ie. same as I add email field to both tables.
The only usage I could think of is fields that are usually used, like row_is_deleted, created_at, modified_at. Is this the only usage for inherited tables?
There are some major reasons for using table inheritance in postgres.
Let's say, we have some tables needed for statistics, which are created and filled each month:
statistics
- statistics_2010_04 (inherits statistics)
- statistics_2010_05 (inherits statistics)
In this sample, we have 2.000.000 rows in each table. Each table has a CHECK constraint to make sure only data for the matching month gets stored in it.
So what makes the inheritance a cool feature - why is it cool to split the data?
PERFORMANCE: When selecting data, we SELECT * FROM statistics WHERE date BETWEEN x and Y, and Postgres only uses the tables, where it makes sense. Eg. SELECT * FROM statistics WHERE date BETWEEN '2010-04-01' AND '2010-04-15' only scans the table statistics_2010_04, all other tables won't get touched - fast!
Index size: We have no big fat table with a big fat index on column date. We have small tables per month, with small indexes - faster reads.
Maintenance: We can run vacuum full, reindex, cluster on each month table without locking all other data
For the correct use of table inheritance as a performance booster, look at the postgresql manual.
You need to set CHECK constraints on each table to tell the database, on which key your data gets split (partitioned).
I make heavy use of table inheritance, especially when it comes to storing log data grouped by month. Hint: If you store data, which will never change (log data), create or indexes with CREATE INDEX ON () WITH(fillfactor=100); This means no space for updates will be reserved in the index - index is smaller on disk.
UPDATE:
fillfactor default is 100, from http://www.postgresql.org/docs/9.1/static/sql-createtable.html:
The fillfactor for a table is a percentage between 10 and 100. 100 (complete packing) is the default
"Table inheritance" means something different than "class inheritance" and they serve different purposes.
Postgres is all about data definitions. Sometimes really complex data definitions. OOP (in the common Java-colored sense of things) is about subordinating behaviors to data definitions in a single atomic structure. The purpose and meaning of the word "inheritance" is significantly different here.
In OOP land I might define (being very loose with syntax and semantics here):
import life
class Animal(life.Autonomous):
metabolism = biofunc(alive=True)
def die(self):
self.metabolism = False
class Mammal(Animal):
hair_color = color(foo=bar)
def gray(self, mate):
self.hair_color = age_effect('hair', self.age)
class Human(Mammal):
alcoholic = vice_boolean(baz=balls)
The tables for this might look like:
CREATE TABLE animal
(name varchar(20) PRIMARY KEY,
metabolism boolean NOT NULL);
CREATE TABLE mammal
(hair_color varchar(20) REFERENCES hair_color(code) NOT NULL,
PRIMARY KEY (name))
INHERITS (animal);
CREATE TABLE human
(alcoholic boolean NOT NULL,
FOREIGN KEY (hair_color) REFERENCES hair_color(code),
PRIMARY KEY (name))
INHERITS (mammal);
But where are the behaviors? They don't fit anywhere. This is not the purpose of "objects" as they are discussed in the database world, because databases are concerned with data, not procedural code. You could write functions in the database to do calculations for you (often a very good idea, but not really something that fits this case) but functions are not the same thing as methods -- methods as understood in the form of OOP you are talking about are deliberately less flexible.
There is one more thing to point out about inheritance as a schematic device: As of Postgres 9.2 there is no way to reference a foreign key constraint across all of the partitions/table family members at once. You can write checks to do this or get around it another way, but its not a built-in feature (it comes down to issues with complex indexing, really, and nobody has written the bits necessary to make that automatic). Instead of using table inheritance for this purpose, often a better match in the database for object inheritance is to make schematic extensions to tables. Something like this:
CREATE TABLE animal
(name varchar(20) PRIMARY KEY,
ilk varchar(20) REFERENCES animal_ilk NOT NULL,
metabolism boolean NOT NULL);
CREATE TABLE mammal
(animal varchar(20) REFERENCES animal PRIMARY KEY,
ilk varchar(20) REFERENCES mammal_ilk NOT NULL,
hair_color varchar(20) REFERENCES hair_color(code) NOT NULL);
CREATE TABLE human
(mammal varchar(20) REFERENCES mammal PRIMARY KEY,
alcoholic boolean NOT NULL);
Now we have a canonical reference for the instance of the animal that we can reliably use as a foreign key reference, and we have an "ilk" column that references a table of xxx_ilk definitions which points to the "next" table of extended data (or indicates there is none if the ilk is the generic type itself). Writing table functions, views, etc. against this sort of schema is so easy that most ORM frameworks do exactly this sort of thing in the background when you resort to OOP-style class inheritance to create families of object types.
Inheritance can be used in an OOP paradigm as long as you do not need to create foreign keys on the parent table. By example, if you have an abstract class vehicle stored in a vehicle table and a table car that inherits from it, all cars will be visible in the vehicle table but a foreign key from a driver table on the vehicle table won't match theses records.
Inheritance can be also used as a partitionning tool. This is especially usefull when you have tables meant to be growing forever (log tables etc).
Main use of inheritance is for partitioning, but sometimes it's useful in other situations. In my database there are many tables differing only in a foreign key. My "abstract class" table "image" contains an "ID" (primary key for it must be in every table) and PostGIS 2.0 raster. Inherited tables such as "site_map" or "artifact_drawing" have a foreign key column ("site_name" text column for "site_map", "artifact_id" integer column for the "artifact_drawing" table etc.) and primary and foreign key constraints; the rest is inherited from the the "image" table. I suspect I might have to add a "description" column to all the image tables in the future, so this might save me quite a lot of work without making real issues (well, the database might run little slower).
EDIT: another good use: with two-table handling of unregistered users, other RDBMSs have problems with handling the two tables, but in PostgreSQL it is easy - just add ONLY when you are not interrested in data in the inherited "unregistered user" table.
The only experience I have with inherited tables is in partitioning. It works fine, but it's not the most sophisticated and easy to use part of PostgreSQL.
Last week we were looking the same OOP issue, but we had too many problems with Hibernate - we didn't like our setup, so we didn't use inheritance in PostgreSQL.
I use inheritance when I have more than 1 on 1 relationships between tables.
Example: suppose you want to store object map locations with attributes x, y, rotation, scale.
Now suppose you have several different kinds of objects to display on the map and each object has its own map location parameters, and map parameters are never reused.
In these cases table inheritance would be quite useful to avoid having to maintain unnormalised tables or having to create location id’s and cross referencing it to other tables.
I tried some operations on it, I will not point out if is there any actual use case for database inheritance, but I will give you some detail for making your decision. Here is an example of PostgresQL: https://www.postgresql.org/docs/15/tutorial-inheritance.html
You can try below SQL script.
CREATE TABLE IF NOT EXISTS cities (
name text,
population real,
elevation int -- (in ft)
);
CREATE TABLE IF NOT EXISTS capitals (
state char(2) UNIQUE NOT NULL
) INHERITS (cities);
ALTER TABLE cities
ADD test_id varchar(255); -- Both table would contains test col
DROP TABLE cities; -- Cannot drop because capitals depends on it
ALTER TABLE cities
ADD CONSTRAINT fk_test FOREIGN KEY (test_id) REFERENCES sometable (id);
As you can see my comments, let me summarize:
When you add/delete/update fields -> the inheritance table would also be affected.
Cannot drop the parent table.
Foreign keys would not be inherited.
From my perspective, in growing applications, we cannot easily predict the changes in the future, for me I would avoid applying this to early database developing.
When features are stable as well and we want to create some database model which much likely the same as the existing one, we can consider that use case.
Use it as little as possible. And that usually means never, it boiling down to a way of creating structures that violate the relational model, for instance by breaking the information principle and by creating bags instead of relations.
Instead, use table partitioning combined with proper relational modelling, including further normal forms.