Check current TTL on collection columns in Cassandra - cql3

Let's assume I have a column family with the following schema:
CREATE TABLE users (
user_id timeuuid,
name varchar,
last_name varchar,
children list<varchar>,
phone_numbers map<varchar, varchar>,
PRIMARY KEY(user_id)
);
Then I insert a row into this column family USING TTL 60000. When I try to verify whether any of these columns still has a TTL set, I get the error: "Cannot use selection function ttl on collections".
My question is: how do I get the TTL on elements of a column that is defined as a collection?
Cheers!

I reproduced your problem and naturally got the very same result. The issue is twofold: (1) in collections, TTLs are element-wise (one TTL per entry in the collection), and (2) I found no way of selecting individual entries from maps or lists.
Of course I can delete one element, but selecting it or its TTL was not possible. Even DataStax's CQL driver v2 does not provide the metadata for that.
So you may need to change your data structure. If this was 'just' for testing purposes, you will have to trust Cassandra to handle the TTLs correctly.
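For reference, here is a minimal cqlsh session reproducing the behaviour (assuming the users table above; the sample values are illustrative):

```sql
-- Insert a row with a 60000-second TTL
INSERT INTO users (user_id, name, last_name, children, phone_numbers)
VALUES (now(), 'Jane', 'Doe', ['Tom', 'Ann'], {'home': '555-0100'})
USING TTL 60000;

-- TTL() works on scalar columns:
SELECT TTL(name), TTL(last_name) FROM users;

-- ...but not on collection columns:
SELECT TTL(children) FROM users;
-- InvalidRequest: Cannot use selection function ttl on collections
```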


Neo4j Best Practice for Index on Relationship Property?

I'm trying to migrate a PostgreSQL database to Neo4j and have the following m-n relationship in PostgreSQL:
user-(blocked)-user
So in PostgreSQL I have an extra table "blocked" that has the following columns: userid1, userid2, blockedsince, blockeduntil
I also have an index on blocked.blockeduntil to search for rows that eventually must be removed when the blocking is over.
How can I build this relationship including the index in Neo4j?
I already included all user-rows in a node type "user" in Neo4j.
Next step would have been to create a relationship called "blocked" from user to user.
So basically like this: (user)-[:blocked]->(user)
I would have added the 2 relationship properties "blockeduntil" and "blockedsince" to the relationship. But it does not seem to be possible to create an index on the relationship property blocked.blockeduntil
Original code for PostgreSQL:
CREATE TABLE "user" (
UserId bigserial NOT NULL PRIMARY KEY,
...
);
CREATE TABLE blocked(
UserId1 bigint NOT NULL REFERENCES "user"(UserId),
UserId2 bigint NOT NULL REFERENCES "user"(UserId),
BlockedSince timestamp NOT NULL,
BlockedUntil timestamp NOT NULL,
PRIMARY KEY (UserId1, UserId2)
);
CREATE INDEX "IdxBlocked" on blocked(blockeduntil);
My first approach in Neo4j:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///blocked.csv" AS row
MATCH (u1:user {userid: toInteger(row.userid1)})
MATCH (u2:user {userid: toInteger(row.userid2)})
MERGE (u1)-[b:BLOCKED]->(u2)
ON CREATE SET b.blockedsince = localdatetime(replace(row.blockedsince, " ", "T")), b.blockeduntil = localdatetime(replace(row.blockeduntil, " ", "T"));
What is the best practice to achieve this relationship including the index on blockeduntil? Would it be better to create a new node type called "blocked" like this?
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///blocked.csv" AS row
CREATE (:blocked{userid1: toInteger(row.userid1), userid2: toInteger(row.userid2), blockedsince: localdatetime(replace(row.blockedsince, " ", "T")),
blockeduntil: localdatetime(replace(row.blockeduntil, " ", "T"))});
And then create an index on blocked.blockeduntil like this?
CREATE INDEX ON :blocked(blockeduntil);
During my research I stumbled upon explicit indexes, but they seem to be deprecated. Also, I'm not sure whether full-text indexes are the right choice here.
OK, it seems I found an official answer from a Neo4j staff member.
https://community.neo4j.com/t/how-can-i-use-index-in-relationship/1627/2
From the post:
We instead recommend refactoring your model. If you need to perform an index lookup of something, that usually suggests that thing would be better modeled as a node. As a node you are able to create the indexes and constraints you need for quick lookup.
So I will model "blocked" as a node and create an index on blocked.blockeduntil.
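As a sketch, the refactored import could look like this (the relationship types :BLOCKS and :BLOCKED_USER are my own naming, not from the post, and the CREATE INDEX statement has to run separately):

```cypher
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///blocked.csv" AS row
MATCH (u1:user {userid: toInteger(row.userid1)})
MATCH (u2:user {userid: toInteger(row.userid2)})
CREATE (b:blocked {blockedsince: localdatetime(replace(row.blockedsince, " ", "T")),
                   blockeduntil: localdatetime(replace(row.blockeduntil, " ", "T"))})
CREATE (u1)-[:BLOCKS]->(b)-[:BLOCKED_USER]->(u2);

CREATE INDEX ON :blocked(blockeduntil);
```

The periodic cleanup job can then do an indexed lookup on blocked.blockeduntil and delete the expired blocked nodes together with their relationships.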

I need the name of the enterprise to be the same as it was when it was registered and not the value it currently has

I will explain the problem with an example:
I am designing a specific case of referential integrity in a table. In the model there are two tables, enterprise and document. We register the companies, and then someone inserts the documents associated with them. The name of the enterprise can change over time. When it comes to retrieving the documents, I need the name of the enterprise to be the same as it was when the document was registered, not the value it currently has. The solution I thought of was to register the company again on each change, with the same code but the updated name; that would give the expected result, but I am not sure it is the best solution. Can someone make a suggestion?
There are several possible solutions and it is hard to determine which one will be the easiest.
Side comment: your question is limited to managing names efficiently, but I would like to point out that your DB is sensitive to files being moved, renamed or deleted. Your database will not be able to keep records up to date if anything happens at the OS level. You should consider doing something about that too.
Amongst the few solutions I considered, the one that is best normalized is the schema below:
CREATE TABLE Enterprise
(
IdEnterprise SERIAL PRIMARY KEY
, Code VARCHAR(4) UNIQUE
, IdName INTEGER DEFAULT -1 /* This will be used to get a single active name */
);
CREATE TABLE EnterpriseName (
IDName SERIAL PRIMARY KEY
, IdEnterprise INTEGER NOT NULL REFERENCES Enterprise(IdEnterprise) ON UPDATE NO ACTION ON DELETE CASCADE
, Name TEXT NOT NULL
);
ALTER TABLE Enterprise ADD FOREIGN KEY (IdName) REFERENCES EnterpriseName(IdName) ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED;
CREATE TABLE Document
(
IdDocument SERIAL PRIMARY KEY
, IdName INTEGER NOT NULL REFERENCES EnterpriseName(IDName) ON UPDATE NO ACTION ON DELETE NO ACTION
, FilePath TEXT NOT NULL
, Description TEXT
);
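With this schema, registering an enterprise and its first name might look like the following (the sequence names are the ones PostgreSQL generates for the SERIAL columns; the values are illustrative). Because the foreign key on Enterprise.IdName is DEFERRABLE INITIALLY DEFERRED, it is only checked at COMMIT, which is what lets the two mutually referencing rows be created inside one transaction:

```sql
BEGIN;
INSERT INTO Enterprise (Code) VALUES ('ACME');  -- IdName is still the -1 default here
INSERT INTO EnterpriseName (IdEnterprise, Name)
VALUES (currval('enterprise_identerprise_seq'), 'Acme Corp');
UPDATE Enterprise
SET IdName = currval('enterprisename_idname_seq')
WHERE Code = 'ACME';
COMMIT;  -- the deferred FK on Enterprise.IdName is validated here
```

Renaming the company is then just inserting a new EnterpriseName row and repointing Enterprise.IdName; existing Document rows keep referencing the old name.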
Using flags and/or timestamps, or moving the enterprise name into the document table, are appealing solutions, but only at first glance.
In particular, ensuring that a company always has one, and only one, "active" name is no easy thing to do.
Add a date range to your enterprise: valid_from, valid_to. Initialise them to -infinity and +infinity. When you change the name of an enterprise, instead of updating it in place, set valid_to = now() on the existing rows where valid_to = +infinity, and insert the new name with valid_from = now() and valid_to = +infinity.
Add a date field to the document, something like create_date. Then, when joining to enterprise, join on the ID and d.create_date between e.valid_from and e.valid_to.
This is a simplistic approach and breaks things like uniqueness for your id and code. To handle that, you could record the name in a separate table with the columns id, from, to and name, leaving your original table with just the id and code for uniqueness.
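A rough sketch of that separate name table and the join (table and column names are illustrative):

```sql
CREATE TABLE enterprise_name (
    id integer NOT NULL,           -- references enterprise(id)
    name text NOT NULL,
    valid_from timestamp NOT NULL DEFAULT '-infinity',
    valid_to   timestamp NOT NULL DEFAULT 'infinity'
);

-- On rename: close the current row, then open a new one
UPDATE enterprise_name SET valid_to = now()
WHERE id = 1 AND valid_to = 'infinity';
INSERT INTO enterprise_name (id, name, valid_from)
VALUES (1, 'New Name', now());

-- Documents pick up the name that was valid when they were created
SELECT d.*, n.name
FROM document d
JOIN enterprise_name n
  ON n.id = d.enterprise_id
 AND d.create_date BETWEEN n.valid_from AND n.valid_to;
```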

Does SELECT DISTINCT on a Cassandra index column work?

Can the Cassandra SELECT DISTINCT operation be used to find all the unique values of a column if that column has an index on it?
My question is not the same as simply asking how to find the distinct values of a non-primary-key column. I realize that Cassandra does not allow queries that would require a table scan, because they would be inefficient; here, the presence of an index eliminates the need for a table scan.
If I have a table thus:
CREATE TABLE thing (
id uuid,
version bigint,
name text,
... data columns ...
PRIMARY KEY ((id),version)
);
CREATE INDEX ON thing(name);
I can SELECT DISTINCT id FROM thing; to get all the thing IDs. That requires one response from each node in my cluster, with each response returning the keys for its node.
But can I SELECT DISTINCT name FROM thing; to get all the thing names? That should also require only one response from each node in my cluster, with each response constructed only by examining the portion of the index on its node. And if name is a good column on which to have an index, each response would be smaller than the response for the primary-key query (there should be fewer names than partition keys).
At least to me the documentation suggests that I should be able to select distinct values of any column:
DISTINCT selection_list
selection_list is one of:
A list of partition keys (used with DISTINCT)
selector AS alias, selector AS alias, ...| *
where selector is a column name. The documentation places no restriction on what the column name can be.
As a matter of fact, you can only use DISTINCT with partition key columns (C* 2.2.4). Using it on anything else will yield an error:
cqlsh:stresscql> SELECT distinct name FROM thing ;
InvalidRequest: code=2200 [Invalid query] message="SELECT DISTINCT queries must only request partition key columns and/or static columns (not name)"
I don't have any in-depth understanding of the workings of secondary indexes, but I also have the feeling that allowing a DISTINCT query on an indexed column should be no worse, in terms of reads incurred, than querying the index for a particular value.
But as indexed values repeat across nodes it would be worse in terms of memory and network overhead relative to the result size as the coordinator would condense down the nodes' responses to only contain unique values.
Though, for replication factors > 1 this is also the case for partition key values.
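Since DISTINCT only works on partition keys, the usual workaround is to maintain a second, denormalized table in which the value you want to enumerate is itself the partition key (a sketch of the pattern, not something from the answer above):

```sql
-- Hypothetical lookup table: name is the partition key,
-- so SELECT DISTINCT is allowed on it
CREATE TABLE thing_by_name (
    name text,
    id uuid,
    PRIMARY KEY ((name), id)
);

-- Write to both tables on every insert into thing; then:
SELECT DISTINCT name FROM thing_by_name;
```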

Setting constraint for two unique fields in PostgreSQL

I'm new to Postgres. I wonder, what is the PostgreSQL way to set a constraint over a pair of fields (so that each pair of values would be unique)? Should I create an INDEX on the bar and baz fields, like this?
CREATE UNIQUE INDEX foo ON table_name(bar, baz);
If not, what is the right way to do that? Thanks in advance.
If each field needs to be unique unto itself, then create a unique index on each field. If they only need to be unique in combination, then create a single unique index across both fields.
Don't forget to set each field NOT NULL if it should be. NULLs are never considered equal to each other, so they don't collide in a unique index, and something like this can happen:
create table test (a int, b int);
create unique index test_a_b_unq on test (a,b);
insert into test values (NULL,1);
insert into test values (NULL,1);
and get no error, because the two NULLs are not considered duplicates.
You can do what you are already thinking of: create a unique constraint on both fields. This way a unique index will be created behind the scenes, and you will get the behavior you need. Plus, that information can be picked up from information_schema for metadata inferring, if necessary, on the fact that the pair needs to be unique. I would recommend this option. You could also use triggers for this, but a unique constraint is way better for this specific requirement.
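Declared as a table constraint, that would look like this (the constraint name is illustrative):

```sql
ALTER TABLE table_name
  ADD CONSTRAINT table_name_bar_baz_key UNIQUE (bar, baz);
-- PostgreSQL creates the backing unique index automatically, and the
-- constraint is visible in information_schema.table_constraints.
```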

How does 'UQ' function if another field is 'PK' in MySQL workbench?

I'm creating a schema for a database using MySQL's workbench. One of my tables contains fields for a personId, as well as a national id number if they have one (which they may not).
The personId field is the one used as a unique identifier throughout the schema, so I've ticked the "PK" and "NN" options for it. Now I'd like to be able to ensure that the system won't allow a new insert with a different personId if it has the same national id as an entity that already exists. However, national ids are not primary keys and may in fact be null.
I've been looking at the 'UQ' option, but I can't find clear documentation on what it actually does. I'm worried it'll create the numbers automatically when I actually want them to be inserted by a user or left null. Does anyone know?
UQ tags a field as a unique key. This enforces uniqueness within the field, except for NULLs. It does not generate values automatically; that is the AI (auto-increment) option. So this is exactly what is needed for the national id field.
From http://dev.mysql.com/doc/refman/5.5/en/create-table.html :
A UNIQUE index creates a constraint such that all values in the index must be distinct. An error occurs if you try to add a new row with a key value that matches an existing row. For all engines, a UNIQUE index permits multiple NULL values for columns that can contain NULL.
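In DDL terms, the ticked options roughly translate to something like the following (table and column names and sizes are illustrative):

```sql
CREATE TABLE person (
    personId INT NOT NULL,          -- PK + NN
    nationalId VARCHAR(20) NULL,    -- UQ: unique, but NULLs allowed
    PRIMARY KEY (personId),
    UNIQUE KEY uq_person_national_id (nationalId)
);

-- Two people without a national id are accepted:
INSERT INTO person (personId, nationalId) VALUES (1, NULL), (2, NULL);
-- A duplicate non-NULL national id is rejected (duplicate-key error):
INSERT INTO person (personId, nationalId) VALUES (3, 'AB123');
INSERT INTO person (personId, nationalId) VALUES (4, 'AB123');
```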