NoSQL conception with Cassandra - how to insert? - triggers

I have to evaluate the opportunity to manage some data in a NoSQL base. I chose Cassandra (Apache) to do this.
I created a base:
CREATE KEYSPACE cv WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 1};
with only one little table:
CREATE TABLE element (Id_el int, name varchar, Fk int, PRIMARY KEY(Id_el));
-When trying to insert sample values:
INSERT INTO element (Id_el, name, Fk) VALUES (1, 'smith', 2);
Cassandra.bat answer me:
Trigger directory doesn't exist, please create it and try again.
-When I go int:
C:\var\lib\cassandra\data\cv\
there is element directory, but it is void - not even a schema of table.
P.S.:
-I know there is no foreign key in NoSQL. It is only to evaluated the opportunity to make some Evolved requests in a columns oriented NoSQL.
-I have made other researches on Internet, with few disparate results. Also red french tutorials on développez.net.

Related

PostgreSQL - on conflict update for GENERATED ALWAYS AS IDENTITY

I have a table of values which I want to manage with my application ...
let's say this is the table
CREATE TABLE student (
id_student int4 NOT NULL GENERATED ALWAYS AS IDENTITY,
id_teacher int2
student_name varchar(255),
age int2
CONSTRAINT provider_pk PRIMARY KEY (id_student)
);
In the application, each teacher can see the list of all his/her students .. and they can edit or add new students
I am trying to figure out how to UPSERT data in the table in PostgreSQL ... what I am doing now is for each teacher (after the manipulation in the app) they are allowed to edit on FE only in JS (without the necessity of saving each change individually)... so after the edit, they click SAVE button and that's the time I need to store the changes and new records in the DB ...
what I do now, is I delete all records for that particular teacher and store the new object/array they created (by editing, adding, .. whatever) - so it's easy and I don't have to check for changes and new records ... the drawbacks is a brutal waste of the sequence for ID_STUDENT (autogenerated on the DB side) and of course a huge overhead on indexes while inserting (=rebuilding) considering there will be a lot of teachers saving a lot of their students .. that might cause some perf issues ... not to mention the fragmenting (HWM) so I would have to VACUUM regularly on this table
In Oracle, I could easily use MERGE INTO (which is fantastic for this use case) but the MERGE is not in the PostgreSQL :(
the only thing I know about is the INSERT ON CONFLICT UPDATE ... but the problem is, how am I supposed to apply this on GENERATED ALWAYS AS IDENTITY key? I do not provide this sequence (on top of that I don't even know the latest number) and therefore I cannot trigger the ON CONFLICT (id_student) ....
is there any nice way out of this sh*t ? Or DELETE / INSERT is really the way to go?
You shouldn't be too worried about the data churn – after all, an UPDATE also writes a new version of the row, so it wouldn't be that much different. And the sequence is no problem, because you used bigint for the primary key, right (anything else would have been a mistake)?
If you want to use INSERT ... ON CONFLICT in combination with an auto-generated sequence, you need some way beside the primary key to identify a row, that is, you need a UNIQUE constraint that you can use with ON CONFLICT. If there is no candidate for such a constraint, how can you identify the records for the teacher?

Neo4j Best Practice for Index on Relationship Property?

I'm trying to migrate a PostgreSQL database to Neo4j and have the following m-n relationship in PostgreSQL:
user-(blocked)-user
So in PostgreSQL I have an extra table "blocked" that has the following columns: userid1, userid2, blockedsince, blockeduntil
I also have an index on blocked.blockeduntil to search for rows that eventually must be removed when the blocking is over.
How can I build this relationship including the index in Neo4j?
I already included all user-rows in a node type "user" in Neo4j.
Next step would have been to create a relationship called "blocked" from user to user.
So basically like this: (user)-[:blocked]->(user)
I would have added the 2 relationship properties "blockeduntil" and "blockedsince" to the relationship. But it does not seem to be possible to create an index on the relationship property blocked.blockeduntil
Original code for PostgreSQL:
CREATE TABLE user (
UserId bigserial NOT NULL PRIMARY KEY,
...
);
CREATE TABLE blocked(
UserId1 bigint NOT NULL references user(UserId),
UserId2 bigint NOT NULL references user(UserId),
BlockedSince timestamp NOT NULL,
BlockedUntil timestamp NOT NULL,
PRIMARY KEY (UserId1, UserId2)
);
CREATE INDEX "IdxBlocked" on blocked(blockeduntil);
My first approach in Neo4j:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///blocked.csv" AS row
MATCH (u1:user {userid: toInteger(row.userid1)})
MATCH (u2:user {userid: toInteger(row.userid2)})
MERGE (u1)-[b:BLOCKED]->(u2)
ON CREATE SET b.blockedsince = localdatetime(replace(row.blockedsince, " ", "T")), b.blockeduntil = localdatetime(replace(row.blockeduntil, " ", "T"));
What is the best practice to achieve this relationship including the index on blockeduntil? Would it be better to create a new node type called "blocked" like this?
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///blocked.csv" AS row
CREATE (:blocked{userid1: toInteger(row.userid1), userid2: toInteger(row.userid2), blockedsince: localdatetime(replace(row.blockedsince, " ", "T")),
blockeduntil: localdatetime(replace(row.blockeduntil, " ", "T"))});
And then create an index on blocked.blockeduntil like this?
CREATE INDEX ON :blocked(blockeduntil);
During my research I stumbled upon explicit indexes Explicit Indexes but they seem to be deprecated. Also I'm not sure if Full Text Indexes are the right choice here.
Ok, it seems that I found an official answer from a Neo4J staff member.
https://community.neo4j.com/t/how-can-i-use-index-in-relationship/1627/2
From the post:
We instead recommend refactoring your model. If you need to perform an index lookup of something, that usually suggests that thing would be better modeled as a node. As a node you are able to create the indexes and constraints you need for quick lookup.
So I will model "blocked" as a node and create an index on blocked.blockeduntil.

How to maintain record history on table with one-to-many relationships?

I have a "services" table for detailing services that we provide. Among the data that needs recording are several small one-to-many relationships (all with a foreign key constraint to the service_id) such as:
service_owners -- user_ids responsible for delivery of service
service_tags -- e.g. IT, Records Management, Finance
customer_categories -- ENUM value
provider_categories -- ENUM value
software_used -- self-explanatory
The problem I have is that I want to keep a history of updates to a service, for which I'm using an update trigger on the table, that performs an insert into a history table matching the original columns. However, if a normalized approach to the above data is used, with separate tables and foreign keys for each one-to-many relationship, any update on these tables will not be recognised in the history of the service.
Does anyone have any suggestions? It seems like I need to store child keys in the service table to maintain the integrity of the service history. Is a delimited text field a valid approach here or, as I am using postgreSQL, perhaps arrays are also a valid option? These feel somewhat dirty though!
Thanks.
If your table is:
create table T (
ix int identity primary key,
val nvarchar(50)
)
And your history table is:
create table THistory (
ix int identity primary key,
val nvarchar(50),
updateType char(1), -- C=Create, U=Update or D=Delete
updateTime datetime,
updateUsername sysname
)
Then you just need to put an update trigger on all tables of interest. You can then find out what the state of any/all of the tables were at any point in history, to determine what the relationships were at that time.
I'd avoid using arrays in any database whenever possible.
I don't like updates for the exact reason you are saying here...you lose information as it's over written. My answer is quite simple...don't update. Not sure if you're at a point where this can be implemented...but if you can I'd recommend using the main table itself to store historical (no need for a second set of history tables).
Add a column to your main header table called 'active'. This can be a character or a bit (0 is off and 1 is on). Then it's a bit of trigger magic...when an update is preformed, you insert a row into the table identical to the record being over-written with a status of '0' (or inactive) and then update the existing row (this process keeps the ID column on the active record the same, the newly inserted record is the inactive one with a new ID).
This way no data is ever lost (admittedly you are storing quite a few rows...) and the history can easily be viewed with a select where active = 0.
The pain here is if you are working on something already implemented...every existing query that hits this table will need to be updated to include a check for the active column. Makes this solution very easy to implement if you are designing a new system, but a pain if it's a long standing application. Unfortunately existing reports will include both off and on records (without throwing an error) until you can modify the where clause

Creating a "table of tables" in PostgreSQL or achieving similar functionality?

I'm just getting started with PostgreSQL, and I'm new to database design.
I'm writing software in which I have various plugins that update a database. Each plugin periodically updates its own designated table in the database. So a plugin named 'KeyboardPlugin' will update the 'KeyboardTable', and 'MousePlugin' will update the 'MouseTable'. I'd like for my database to store these 'plugin-table' relationships while enforcing referential integrity. So ideally, I'd like a configuration table with the following columns:
Plugin-Name (type 'text')
Table-Name (type ?)
My software will read from this configuration table to help the plugins determine which table to update. Originally, my idea was to have the second column (Table-Name) be of type 'text'. But then, if someone mistypes the table name, or an existing relationship becomes invalid because of someone deleting a table, we have problems. I'd like for the 'Table-Name' column to act as a reference to another table, while enforcing referential integrity.
What is the best way to do this in PostgreSQL? Feel free to suggest an entirely new way to setup my database, different from what I'm currently exploring. Also, if it helps you answer my question, I'm using the pgAdmin tool to setup my database.
I appreciate your help.
I would go with your original plan to store the name as text. Possibly enhanced by additionally storing the schema name:
addin text
,sch text
,tbl text
Tables have an OID in the system catalog (pg_catalog.pg_class). You can get those with a nifty special cast:
SELECT 'myschema.mytable'::regclass
But the OID can change over a dump / restore. So just store the names as text and verify the table is there by casting it like demonstrated at application time.
Of course, if you use each tables for multiple addins it might pay to make a separate table
CREATE TABLE tbl (
,tbl_id serial PRIMARY KEY
,sch text
,name text
);
and reference it in ...
CREATE TABLE addin (
,addin_id serial PRIMARY KEY
,addin text
,tbl_id integer REFERENCES tbl(tbl_id) ON UPDATE CASCADE ON DELETE CASCADE
);
Or even make it an n:m relationship if addins have multiple tables. But be aware, as #OMG_Ponies commented, that a setup like this will require you to execute a lot of dynamic SQL because you don't know the identifiers beforehand.
I guess all plugins have a set of basic attributes and then each plugin will have a set of plugin-specific attributes. If this is the case you can use a single table together with the hstore datatype (a standard extension that just needs to be installed).
Something like this:
CREATE TABLE plugins
(
plugin_name text not null primary key,
common_int_attribute integer not null,
common_text_attribute text not null,
plugin_atttributes hstore
)
Then you can do something like this:
INSERT INTO plugins
(plugin_name, common_int_attribute, common_text_attribute, hstore)
VALUES
('plugin_1', 42, 'foobar', 'some_key => "the fish", other_key => 24'),
('plugin_2', 100, 'foobar', 'weird_key => 12345, more_info => "10.2.4"');
This creates two plugins named plugin_1 and plugin_2
Plugin_1 has the additional attributes "some_key" and "other_key", while plugin_2 stores the keys "weird_key" and "more_info".
You can index those hstore columns and query them very efficiently.
The following will select all plugins that have a key "weird_key" defined.
SELECT *
FROM plugins
WHERE plugin_attributes ? 'weird_key'
The following statement will select all plugins that have a key some_key with the value the fish:
SELECT *
FROM plugins
WHERE plugin_attributes #> ('some_key => "the fish"')
Much more convenient than using an EAV model in my opinion (and most probably a lot faster as well).
The only drawback is that you lose type-safety with this approach (but usually you'd lose that with the EAV concept as well).
You don't need an application catalog. Just add the application name to the keys of the table. This of course assumes that all the tables have the same structure. If not: use the application name for a table name, or as others have suggested: as a schema name( which also would allow for multiple tables per application).
EDIT:
But the real issue is of course that you should first model your data, and than build the applications to manipulate it. The data should not serve the code; the code should serve the data.

When to use inherited tables in PostgreSQL?

In which situations you should use inherited tables? I tried to use them very briefly and inheritance didn't seem like in OOP world.
I thought it worked like this:
Table users has all fields required for all user levels. Tables like moderators, admins, bloggers, etc but fields are not checked from parent. For example users has email field and inherited bloggers has it now too but it's not unique for both users and bloggers at the same time. ie. same as I add email field to both tables.
The only usage I could think of is fields that are usually used, like row_is_deleted, created_at, modified_at. Is this the only usage for inherited tables?
There are some major reasons for using table inheritance in postgres.
Let's say, we have some tables needed for statistics, which are created and filled each month:
statistics
- statistics_2010_04 (inherits statistics)
- statistics_2010_05 (inherits statistics)
In this sample, we have 2.000.000 rows in each table. Each table has a CHECK constraint to make sure only data for the matching month gets stored in it.
So what makes the inheritance a cool feature - why is it cool to split the data?
PERFORMANCE: When selecting data, we SELECT * FROM statistics WHERE date BETWEEN x and Y, and Postgres only uses the tables, where it makes sense. Eg. SELECT * FROM statistics WHERE date BETWEEN '2010-04-01' AND '2010-04-15' only scans the table statistics_2010_04, all other tables won't get touched - fast!
Index size: We have no big fat table with a big fat index on column date. We have small tables per month, with small indexes - faster reads.
Maintenance: We can run vacuum full, reindex, cluster on each month table without locking all other data
For the correct use of table inheritance as a performance booster, look at the postgresql manual.
You need to set CHECK constraints on each table to tell the database, on which key your data gets split (partitioned).
I make heavy use of table inheritance, especially when it comes to storing log data grouped by month. Hint: If you store data, which will never change (log data), create or indexes with CREATE INDEX ON () WITH(fillfactor=100); This means no space for updates will be reserved in the index - index is smaller on disk.
UPDATE:
fillfactor default is 100, from http://www.postgresql.org/docs/9.1/static/sql-createtable.html:
The fillfactor for a table is a percentage between 10 and 100. 100 (complete packing) is the default
"Table inheritance" means something different than "class inheritance" and they serve different purposes.
Postgres is all about data definitions. Sometimes really complex data definitions. OOP (in the common Java-colored sense of things) is about subordinating behaviors to data definitions in a single atomic structure. The purpose and meaning of the word "inheritance" is significantly different here.
In OOP land I might define (being very loose with syntax and semantics here):
import life
class Animal(life.Autonomous):
metabolism = biofunc(alive=True)
def die(self):
self.metabolism = False
class Mammal(Animal):
hair_color = color(foo=bar)
def gray(self, mate):
self.hair_color = age_effect('hair', self.age)
class Human(Mammal):
alcoholic = vice_boolean(baz=balls)
The tables for this might look like:
CREATE TABLE animal
(name varchar(20) PRIMARY KEY,
metabolism boolean NOT NULL);
CREATE TABLE mammal
(hair_color varchar(20) REFERENCES hair_color(code) NOT NULL,
PRIMARY KEY (name))
INHERITS (animal);
CREATE TABLE human
(alcoholic boolean NOT NULL,
FOREIGN KEY (hair_color) REFERENCES hair_color(code),
PRIMARY KEY (name))
INHERITS (mammal);
But where are the behaviors? They don't fit anywhere. This is not the purpose of "objects" as they are discussed in the database world, because databases are concerned with data, not procedural code. You could write functions in the database to do calculations for you (often a very good idea, but not really something that fits this case) but functions are not the same thing as methods -- methods as understood in the form of OOP you are talking about are deliberately less flexible.
There is one more thing to point out about inheritance as a schematic device: As of Postgres 9.2 there is no way to reference a foreign key constraint across all of the partitions/table family members at once. You can write checks to do this or get around it another way, but its not a built-in feature (it comes down to issues with complex indexing, really, and nobody has written the bits necessary to make that automatic). Instead of using table inheritance for this purpose, often a better match in the database for object inheritance is to make schematic extensions to tables. Something like this:
CREATE TABLE animal
(name varchar(20) PRIMARY KEY,
ilk varchar(20) REFERENCES animal_ilk NOT NULL,
metabolism boolean NOT NULL);
CREATE TABLE mammal
(animal varchar(20) REFERENCES animal PRIMARY KEY,
ilk varchar(20) REFERENCES mammal_ilk NOT NULL,
hair_color varchar(20) REFERENCES hair_color(code) NOT NULL);
CREATE TABLE human
(mammal varchar(20) REFERENCES mammal PRIMARY KEY,
alcoholic boolean NOT NULL);
Now we have a canonical reference for the instance of the animal that we can reliably use as a foreign key reference, and we have an "ilk" column that references a table of xxx_ilk definitions which points to the "next" table of extended data (or indicates there is none if the ilk is the generic type itself). Writing table functions, views, etc. against this sort of schema is so easy that most ORM frameworks do exactly this sort of thing in the background when you resort to OOP-style class inheritance to create families of object types.
Inheritance can be used in an OOP paradigm as long as you do not need to create foreign keys on the parent table. By example, if you have an abstract class vehicle stored in a vehicle table and a table car that inherits from it, all cars will be visible in the vehicle table but a foreign key from a driver table on the vehicle table won't match theses records.
Inheritance can be also used as a partitionning tool. This is especially usefull when you have tables meant to be growing forever (log tables etc).
Main use of inheritance is for partitioning, but sometimes it's useful in other situations. In my database there are many tables differing only in a foreign key. My "abstract class" table "image" contains an "ID" (primary key for it must be in every table) and PostGIS 2.0 raster. Inherited tables such as "site_map" or "artifact_drawing" have a foreign key column ("site_name" text column for "site_map", "artifact_id" integer column for the "artifact_drawing" table etc.) and primary and foreign key constraints; the rest is inherited from the the "image" table. I suspect I might have to add a "description" column to all the image tables in the future, so this might save me quite a lot of work without making real issues (well, the database might run little slower).
EDIT: another good use: with two-table handling of unregistered users, other RDBMSs have problems with handling the two tables, but in PostgreSQL it is easy - just add ONLY when you are not interrested in data in the inherited "unregistered user" table.
The only experience I have with inherited tables is in partitioning. It works fine, but it's not the most sophisticated and easy to use part of PostgreSQL.
Last week we were looking the same OOP issue, but we had too many problems with Hibernate - we didn't like our setup, so we didn't use inheritance in PostgreSQL.
I use inheritance when I have more than 1 on 1 relationships between tables.
Example: suppose you want to store object map locations with attributes x, y, rotation, scale.
Now suppose you have several different kinds of objects to display on the map and each object has its own map location parameters, and map parameters are never reused.
In these cases table inheritance would be quite useful to avoid having to maintain unnormalised tables or having to create location id’s and cross referencing it to other tables.
I tried some operations on it, I will not point out if is there any actual use case for database inheritance, but I will give you some detail for making your decision. Here is an example of PostgresQL: https://www.postgresql.org/docs/15/tutorial-inheritance.html
You can try below SQL script.
CREATE TABLE IF NOT EXISTS cities (
name text,
population real,
elevation int -- (in ft)
);
CREATE TABLE IF NOT EXISTS capitals (
state char(2) UNIQUE NOT NULL
) INHERITS (cities);
ALTER TABLE cities
ADD test_id varchar(255); -- Both table would contains test col
DROP TABLE cities; -- Cannot drop because capitals depends on it
ALTER TABLE cities
ADD CONSTRAINT fk_test FOREIGN KEY (test_id) REFERENCES sometable (id);
As you can see my comments, let me summarize:
When you add/delete/update fields -> the inheritance table would also be affected.
Cannot drop the parent table.
Foreign keys would not be inherited.
From my perspective, in growing applications, we cannot easily predict the changes in the future, for me I would avoid applying this to early database developing.
When features are stable as well and we want to create some database model which much likely the same as the existing one, we can consider that use case.
Use it as little as possible. And that usually means never, it boiling down to a way of creating structures that violate the relational model, for instance by breaking the information principle and by creating bags instead of relations.
Instead, use table partitioning combined with proper relational modelling, including further normal forms.