Cassandra data model query - nosql

I'm new to Cassandra and am trying to understand the data model. I know how to insert a row when "bob" follows "james". I also know how to query for a list of all the people who follow "bob", and for a list of who "bob" is following.
My question is: given the schema below, what does the query look like to find out whether "bob" is following "james"? (Yes or no.)
Is this the right query?
SELECT * FROM followers WHERE username='bob' AND following='james';
Do I need to set a secondary index on FOLLOWING to be able to execute the above query?
-- User storage
CREATE TABLE users (username text PRIMARY KEY, password text);

-- Users this user is following
CREATE TABLE following (
    username text,
    followed text,
    PRIMARY KEY (username, followed)
);

-- Users who follow this user
CREATE TABLE followers (
    username text,
    following text,
    PRIMARY KEY (username, following)
);

No need for a secondary index in this case. You can always test quick ideas like this using the cqlsh shell.
cqlsh> use ks;
cqlsh:ks> CREATE TABLE followers (
... username text,
... following text,
... PRIMARY KEY(username, following)
... );
cqlsh:ks> INSERT INTO followers (username, following ) VALUES ( 'bob', 'james' );
cqlsh:ks> SELECT * FROM followers WHERE username='bob' and following='james';
 username | following
----------+-----------
      bob |     james
The reason you don't have to create a secondary index (nor should you, if you want to perform this sort of query at scale) is that 'following' is specified as a clustering key. This means 'following' describes the layout of the data within the partition, so we can filter on 'following' very quickly.
As an aside, if a frequently performed query requires a secondary index (or ALLOW FILTERING), that is an indication that you should rethink your data model.
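To make that aside concrete, here is a sketch of the contrast (the second statement is a hypothetical query, not something from the question):

-- Efficient: the partition key narrows the read to one partition,
-- and the clustering key 'following' is filtered within it.
SELECT * FROM followers WHERE username = 'bob' AND following = 'james';

-- Inefficient at scale: no partition key, so Cassandra would have to
-- scan every partition; it refuses unless you add ALLOW FILTERING.
SELECT * FROM followers WHERE following = 'james' ALLOW FILTERING;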

Related

Select rows using part of UUID4

I have a table of users with a UUIDv4 primary key.
How can I select all rows whose id starts with 'e2eb5'?
I tried the following select:
SELECT * FROM "user" WHERE "id" LIKE 'e2eb5%';
My application has fewer than one thousand users, so the first part of the UUID should be all the information needed to identify them.
Therefore I want the user detail to live at a URL like this:
/users/e2eb5
Instead of:
/users/3b0fbfd6-0661-4880-b5c5-4659ed85fa96
Edit:
Querying it as suggested here: How to query UUID for postgres
where some_uuid between 'e99aec55-0000-0000-0000-000000000000'
                    and 'e99aec55-ffff-ffff-ffff-ffffffffffff'
is not a viable solution, as it requires either a fixed-length UUID prefix or a more complex query.
You can cast the id to text and compare it the way you did in your question:
SELECT *
FROM "user"
WHERE "id"::text LIKE 'e2eb5%';
This has to convert each row's id to text, so it can be slow with tons of rows. But with fewer than 1,000 it should be fine.
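If the table ever did grow, one option (a sketch, not from the original answer; the index name is an assumption) is an expression index over the text form of the id. The text_pattern_ops operator class lets LIKE 'prefix%' use the index regardless of collation:

-- Hypothetical index on the text form of the id.
CREATE INDEX user_id_text_idx ON "user" ((id::text) text_pattern_ops);

-- The same prefix query can now use the index:
SELECT * FROM "user" WHERE id::text LIKE 'e2eb5%';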

Fine on SQLite, broken in PostgreSQL: column must appear in the GROUP BY clause or be used in an aggregate function

I have a query which works fine on SQLite, but when I run it against the same data in PostgreSQL I get:
column "role.id" must appear in the GROUP BY clause or be used in an aggregate function
I have three tables: people, exhibitions, and a table that links the two ("one person in one exhibition performing a particular role", such as "Artist" or "Curator"):
CREATE TABLE "person" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"name" varchar(255));
CREATE TABLE "exhibition" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"name" varchar(255));
CREATE TABLE "role" (`id` integer NOT NULL PRIMARY KEY AUTOINCREMENT,
`name` varchar(30) NOT NULL,
`exhibition_id` integer NOT NULL,
`person_id` integer NOT NULL,
FOREIGN KEY(`exhibition_id`) REFERENCES `exhibition`(`id`),
FOREIGN KEY(`person_id`) REFERENCES `person`(`id`));
I want to display the people involved in an exhibition, ordered by how many things they've done. So I get the IDs of the people in an exhibition (1, 2, 3, 4) and then do this:
SELECT
    *,
    COUNT(person.id) AS role_count
FROM person
INNER JOIN role
    ON person.id = role.person_id
WHERE person.id IN (1, 2, 3, 4)
GROUP BY person.id
ORDER BY role_count DESC
That orders the people by role_count, which is the number of roles they've had across all exhibitions.
It works fine on SQLite, but not in PostgreSQL. I've tried putting role.id into the GROUP BY (instead of, and as well as, person.id), but that changes the results.
You know when you struggle for ages, post an SO question, and then immediately stumble on the answer?
From this answer I realised that I couldn't select role.id (which the SELECT * is implicitly doing) as it wasn't in the GROUP BY.
I couldn't add it to the GROUP BY (because that changes the results) so the solution was to not select it.
So I changed the SELECT part to:
SELECT
    person.*,
    COUNT(person.id) AS role_count
FROM person
...
Now role.id is not being selected. And that works.
If I needed other fields from the role table, like name, I couldn't list them bare (they would trigger the same error, since they aren't functionally dependent on person.id); I'd have to wrap them in an aggregate:
SELECT
    person.*,
    array_agg(role.name) AS role_names,
    COUNT(person.id) AS role_count
FROM person
...
Just like the error says, standard SQL doesn't let you SELECT anything other than the GROUP BY columns or calls to aggregate functions. (For a logical reason: how would the RDBMS know which role.id to pick when a group contains several rows?) PostgreSQL actually enforces this rule, with one relaxation: columns that are functionally dependent on the GROUP BY columns are allowed, which is why SELECT person.* works when you group by the primary key person.id. SQLite ignores the rule and just returns data from an arbitrary row in the group.
As you discovered, omitting role.id from the SELECT fixes the error. But if you do want SQLite's behavior of picking the ID from an arbitrary row, you can simply wrap it in an aggregate function, e.g., SELECT MAX(role.id) instead of just SELECT role.id.
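Putting that together with the original query (a sketch; MAX is an arbitrary but deterministic choice of which role id to keep per person):

SELECT
    person.*,
    MAX(role.id) AS a_role_id,  -- one role id per group, here the highest
    COUNT(person.id) AS role_count
FROM person
INNER JOIN role
    ON person.id = role.person_id
WHERE person.id IN (1, 2, 3, 4)
GROUP BY person.id
ORDER BY role_count DESC;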

How do I represent an array of tuples in PostgreSQL?

Here's the easiest way I can think of to explain this. Imagine a user wants to bookmark a bunch of webpages. There's a url table with a UrlID and the actual url. I'd like each user to have a list of unique UrlIDs (though I don't need the constraint), each paired with a 32-bit int value such as an epoch date. The only two things I care about are: 1) checking whether a UrlID is in this list or not, and 2) getting the entire list sorted by date (the second value).
If it helps, I'm expecting no more than 8K bookmarks, but most likely it will be <128.
If you really want to avoid an extra table to express the relationship, you can do something like this:
CREATE TABLE "user" (
id integer primary key,
name text not null,
bookmarks integer[] not null
);
CREATE TABLE url (
id integer primary key,
time timestamp with time zone not null,
val text not null
);
Then finding all bookmarks for a particular user (say, the one with id 66) would involve something like this:
SELECT url.val, url.time
FROM (SELECT bookmarks FROM "user" WHERE id = 66) u
JOIN url ON url.id = ANY (bookmarks)
ORDER BY url.time;
Now, here's why I don't like this schema. First, adding a new bookmark requires rewriting the bookmarks array, and hence the entire user row (so adding n bookmarks one after the other takes Θ(n^2) time; see the UPDATE sketch below). Second, you cannot put foreign keys on the elements of the array. Third, many queries become more complicated to write; for example, to retrieve all bookmarks for all users you have to do something like this:
SELECT "user".id,"user".name,url.val,url.time
FROM "user",
LATERAL unnest((SELECT bookmarks)) b
LEFT JOIN url ON b = url.id;
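As for the append cost mentioned above, here is what adding a single bookmark looks like with the array schema (a sketch; 42 stands in for a new url id):

UPDATE "user"
SET bookmarks = bookmarks || 42  -- appends one element, but rewrites the entire array value
WHERE id = 66;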
Edit: So here's the schema I would use, which I think fits best with the relational paradigm:
CREATE TABLE "user" (
id integer primary key,
name text not null
);
CREATE TABLE url (
id integer primary key,
val text not null
);
CREATE TABLE bookmark (
user_id integer not null REFERENCES "user",
url_id integer REFERENCES url,
time timestamp with time zone not null,
UNIQUE (user_id,url_id)
);
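With the normalized schema, both operations from the question stay simple (a sketch, reusing user 66 and a hypothetical url id 42):

-- 1) Is a given url bookmarked by the user?
SELECT EXISTS (
    SELECT 1 FROM bookmark WHERE user_id = 66 AND url_id = 42
);

-- 2) The user's whole list, sorted by date
SELECT url.val, bookmark.time
FROM bookmark
JOIN url ON url.id = bookmark.url_id
WHERE bookmark.user_id = 66
ORDER BY bookmark.time;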

nosql cassandra - how to create an update query based on a select

I tried to create a query that updates a column with the row count of another table:
UPDATE Albums (NumOfphotos)
VALUES (
    SELECT COUNT(*)
    FROM photos
    WHERE AlbumName = 'nature'
)
WHERE AlbumName = 'nature';
This does not seem to be the right syntax; what is the right way to do it (maybe by autokey)?
This type of query is deliberately not supported by Cassandra or CQL.
If you review the DataStax CQL documentation for UPDATE, you will see that the "where specification" only takes either a primary key or an IN (keys) clause:
primary key name = key_value
primary key name IN (key_value, ...)
So your best bet is to run the count as a separate read, for example
SELECT COUNT(*) FROM photos WHERE AlbumName = 'nature';
and follow it with an UPDATE that sets the correct value from your application.
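A minimal sketch of that two-step pattern (42 stands in for whatever count the read returned; AlbumName is assumed to be the partition key of Albums):

-- step 1: the application reads the count (say it returns 42)
SELECT COUNT(*) FROM photos WHERE AlbumName = 'nature';

-- step 2: the application writes the value back
UPDATE Albums SET NumOfphotos = 42 WHERE AlbumName = 'nature';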

PostgreSQL duplicate key violates unique constraint

I know this has been posted many times, but I didn't find an answer to my problem. I have a table with an "id" column that should simply hold unique numbers. The column's type is serial, and the next value after each insert comes from a sequence, so everything seems fine, yet sometimes this error still shows up and I don't know why. The documentation says the sequence is foolproof and always works. Would adding a UNIQUE constraint to that column help?
I have worked with Postgres many times before, but this error is showing up for the first time. I did everything as usual and never had this problem. Can you help me find an answer that can be applied in the future to all tables that will be created? Let's say we have something simple like this:
CREATE TABLE comments
(
    id serial NOT NULL,
    some_column text NOT NULL,
    CONSTRAINT id_pkey PRIMARY KEY (id)
)
WITH (
    OIDS=FALSE
);
ALTER TABLE interesting.comments OWNER TO postgres;
If I add:
ALTER TABLE comments ADD CONSTRAINT id_id_key UNIQUE(id)
will it be enough, or is there something else that should be done?
This article explains that your sequence might be out of sync and that you have to manually bring it back in sync.
An excerpt from the article in case the URL changes:
If you get this message when trying to insert data into a PostgreSQL
database:
ERROR: duplicate key violates unique constraint
That likely means that the primary key sequence in the table you're
working with has somehow become out of sync, likely because of a mass
import process (or something along those lines). Call it a "bug by
design", but it seems that you have to manually reset the primary
key index after restoring from a dump file. At any rate, to see if
your values are out of sync, run these two commands:
SELECT MAX(the_primary_key) FROM the_table;
SELECT nextval('the_primary_key_sequence');
If the first value is higher than the second value, your sequence is
out of sync. Back up your PG database (just in case), then run this command:
SELECT setval('the_primary_key_sequence', (SELECT MAX(the_primary_key) FROM the_table)+1);
That will set the sequence to the next available value that's higher
than any existing primary key in the sequence.
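Applied to the comments table from the question (comments_id_seq is the name PostgreSQL generates by default for a serial column):

SELECT MAX(id) FROM comments;
SELECT nextval('comments_id_seq');

-- if MAX(id) is >= the nextval result, resync:
SELECT setval('comments_id_seq', (SELECT MAX(id) FROM comments)+1);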
Intro
I also encountered this problem, and the solution proposed by @adamo is basically the right one. However, I had to invest a lot of time in the details, which is why I am writing a new answer to save that time for others.
Case
My case was as follows: there was a table that was filled with data by an app. Now a new entry had to be inserted manually via SQL. After that, the sequence was out of sync and no more records could be inserted via the app.
Solution
As mentioned in the answer from @adamo, the sequence must be synchronized manually, and for that you need the sequence's name. In Postgres, the name of the sequence can be determined with the function PG_GET_SERIAL_SEQUENCE. Most examples use lower-case table names; in my case the tables were created by an ORM middleware (like Hibernate or Entity Framework Core), so their names all started with a capital letter.
An e-mail from 2004 (link) gave me the right hint.
(Let's assume for all examples that Foo is the table's name and Foo_id the related column.)
Command to get the sequence name:
SELECT PG_GET_SERIAL_SEQUENCE('"Foo"', 'Foo_id');
So, the table name must be in double quotes, surrounded by single quotes.
1. Validate that the sequence is out of sync
SELECT CURRVAL(PG_GET_SERIAL_SEQUENCE('"Foo"', 'Foo_id')) AS "Current Value", MAX("Foo_id") AS "Max Value" FROM "Foo";
When the Current Value is less than the Max Value, your sequence is out of sync.
2. Correction
SELECT SETVAL((SELECT PG_GET_SERIAL_SEQUENCE('"Foo"', 'Foo_id')), (SELECT (MAX("Foo_id") + 1) FROM "Foo"), FALSE);
Replace table_name below with the actual name of your table.
1. Get the current last id for the table; note it for the next step:
SELECT MAX(id) FROM table_name;
2. Get the next value of the sequence according to PostgreSQL, and make sure it is higher than the max id from step 1:
SELECT nextval('"table_name_id_seq"');
3. If it's not higher, use this to update the sequence:
SELECT setval('"table_name_id_seq"', (SELECT MAX(id) FROM table_name)+1);
The primary key is already protecting you from inserting duplicate values, as you're experiencing when you get that error. Adding another unique constraint isn't necessary to do that.
The "duplicate key" error is telling you that the work was not done because it would have produced a duplicate key, not that it discovered a duplicate key already committed to the table.
If you want such inserts to be skipped instead of raising the error, use ON CONFLICT DO NOTHING.
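A minimal sketch of that clause against the comments table from the question:

-- the insert is silently skipped if a row with id 1 already exists
INSERT INTO comments (id, some_column)
VALUES (1, 'hello')
ON CONFLICT (id) DO NOTHING;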
Reference - https://www.calazan.com/how-to-reset-the-primary-key-sequence-in-postgresql-with-django/
I had the same problem; try this:
python manage.py sqlsequencereset app_name
E.g.:
python manage.py sqlsequencereset auth
You need to run this with your production settings (if you have them), and you need Postgres installed to run it on the server.
From http://www.postgresql.org/docs/current/interactive/datatype.html
Note: Prior to PostgreSQL 7.3, serial implied UNIQUE. This is no longer automatic. If you wish a serial column to be in a unique constraint or a primary key, it must now be specified, same as with any other data type.
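To make that note concrete, here is roughly what id serial expands to in a modern PostgreSQL, per the documentation; no UNIQUE or PRIMARY KEY constraint appears unless you declare one yourself:

CREATE SEQUENCE comments_id_seq;
CREATE TABLE comments (
    id integer NOT NULL DEFAULT nextval('comments_id_seq'),
    some_column text NOT NULL
);
ALTER SEQUENCE comments_id_seq OWNED BY comments.id;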
In my case the create table script is:
CREATE TABLE public."Survey_symptom_binds"
(
    id integer NOT NULL DEFAULT nextval('"Survey_symptom_binds_id_seq"'::regclass),
    survey_id integer,
    "order" smallint,
    symptom_id integer,
    CONSTRAINT "Survey_symptom_binds_pkey" PRIMARY KEY (id)
)
So:
SELECT nextval('"Survey_symptom_binds_id_seq"'::regclass),
       MAX(id)
FROM public."Survey_symptom_binds";
showed that nextval was less than MAX(id)!
To fix the problem:
SELECT setval('"Survey_symptom_binds_id_seq"', (SELECT MAX(id) FROM public."Survey_symptom_binds")+1);
Good luck, everyone!
I had the same problem. It was because of the type of my relations. I had a property table related to both states and cities. At first I had a OneToOne relation from property to states, and the same for cities, and I got the same error, "duplicate key violates unique constraint". That meant I could only have one property related to one state and one city. But that doesn't make sense, because a city can have multiple properties. So the problem was the relation: it should be ManyToOne, i.e., many properties to one city.
If your tables were created by an ORM middleware (like Hibernate or Entity Framework Core), the table names may start with a capital letter, so the identifiers must be quoted:
SELECT setval('"Table_name_Id_seq"', (SELECT MAX("Id") FROM "Table_name") + 1)
WHERE NOT EXISTS (
    SELECT *
    FROM (SELECT CURRVAL(PG_GET_SERIAL_SEQUENCE('"Table_name"', 'Id')) AS seq,
                 MAX("Id") AS max_id
          FROM "Table_name") AS seq_table
    WHERE seq > max_id
)
Or try this one-liner; it's just a suggestion to enhance adamo's code (thanks a lot, adamo):
SELECT setval('tableName_columnName_seq', (SELECT MAX(columnName) FROM tableName));
For a programmatic solution in Django: based on Paolo Melchiorre's answer, I wrote this chunk as a function to be called before any .save():
from django.db import connection

def setSqlCursor(db_table):
    sql = """SELECT pg_catalog.setval(pg_get_serial_sequence('""" + db_table + """', 'id'), MAX(id)) FROM """ + db_table + """;"""
    with connection.cursor() as cursor:
        cursor.execute(sql)
I had a similar problem, but I solved it by removing all the foreign keys in my PostgreSQL database.