One to Many equivalent in Cassandra and data model optimization - nosql

I am modeling my database in Cassandra, coming from RDBMS. I want to know how can I create a one-to-many relationship which is embedded in the same Column Name and model my table to fit the following query needs.
For example:
Boxes:{
23442:{
belongs_to_user: user1,
box_title: 'the box title',
items:{
1: {
name: 'itemname1',
size: 44
},
2: {
name: 'itemname2',
size: 24
}
}
},
{ ... }
}
I read that its preferable to use composite columns instead of super columns, so I need an example of the best way to implement this. My queries are like:
Get items for box by Id
get top 20 boxes with their items (for displaying a range of boxes with their items on the page)
update items size by item id (increment size by a number)
get all boxes by userid (all boxes that belongs to a specific user)
I am expecting lots of writes to change the size of each item in the box. I want to know the best way to implement it without the need to use super columns. Furthermore, I don't mind getting a solution that takes Cassandra 1.2 new features into account, because I will use that in production.
Thanks

This particular model is somewhat challenging, for a number of reasons.
For example, with the box ID as a row key, querying for a range of boxes will require a range query in Cassandra (as opposed to a column slice), which means the use of an ordered partitioner. An ordered partitioner is almost always a Bad Idea.
Another challenge comes from the need to increment the item size, as this calls for the use of a counter column family. Counter column families store counter values only.
Setting aside the need for a range of box IDs for a moment, you could model this using multiple tables in CQL3 as follows:
CREATE TABLE boxes (
id int PRIMARY KEY,
belongs_to_user text,
box_title text,
);
CREATE INDEX useridx on boxes (belongs_to_user);
CREATE TABLE box_items (
id int,
item int,
size counter,
PRIMARY KEY(id, item)
);
CREATE TABLE box_item_names (
id int PRIMARY KEY,
item int,
name text
);
BEGIN BATCH
INSERT INTO boxes (id, belongs_to_user, box_title) VALUES (23442, 'user1', 'the box title');
INSERT INTO box_items (id, item, name) VALUES (23442, 1, 'itemname1');
INSERT INTO box_items (id, item, name) VALUES (23442, 1, 'itemname2');
UPDATE box_items SET size = size + 44 WHERE id = 23442 AND item = 1;
UPDATE box_items SET size = size + 24 WHERE id = 23442 AND item = 2;
APPLY BATCH
-- Get items for box by ID
SELECT size FROM box_items WHERE id = 23442 AND item = 1;
-- Boxes by user ID
SELECT * FROM boxes WHERE belongs_to_user = 'user1';
It's important to note that the BATCH mutation above is both atomic, and isolated.
Technically speaking, you could also denormalize all of this into a single table. For example:
CREATE TABLE boxes (
id int,
belongs_to_user text,
box_title text,
item int,
name text,
size counter,
PRIMARY KEY(id, item, belongs_to_user, box_title, name)
);
UPDATE boxes set size = item_size + 44 WHERE id = 23442 AND belongs_to_user = 'user1'
AND box_title = 'the box title' AND name = 'itemname1' AND item = 1;
SELECT item, name, size FROM boxes WHERE id = 23442;
However, this provides no guarantees of correctness. For example, this model makes it possible for items of the same box to have different users, or titles. And, since this makes boxes a counter column family, it limits how you can evolve the schema in the future.

I think in PlayOrm's objects first, then show the column model below....
Box {
#NoSqlId
String id;
#NoSqlEmbedded
List<Item> items;
}
User {
#NoSqlId
TimeUUID uuid;
#OneToMany
List<Box> boxes;
}
The User then is a row like so
rowkey = uuid=<someuuid> boxes.fkToBox35 = null, boxes.fktoBox37=null, boxes.fkToBox38=null
Note, the form of the above is columname=value where some of the columnnames are composite and some are not.
The box is more interesting and say Item has fields name and idnumber, then box row would be
rowkey = id=myid, items.item23.name=playdo, items.item23.idnumber=5634, itesm.item56.name=pencil, items.item56.idnumber=7894
I am not sure what you meant though on get the top 20 boxes? top boxes meaning by the number of items in them?
Dean

You can use Query-Driven Methodology, for data modeling.You have the three broad access paths:
1) partition per query
2) partition+ per query (one or more partitions)
3) table or table+ per query
The most efficient option is the “partition per query”. This article can help you in this case, step-by-step. it's sample is exactly a one-to-many relation.
And according to this, you will have several tables with some similar columns. You can manage this, by Materialized View or batch-log(as alternative approach).

Related

Prisma Model with unique one to many

A Product can have various sizes so I added one to many relation. But Each product can have only 1 unique size, i.e, in many to many the sizes should not repeat.
For instance, Product X can have size 1, 2, 3 but if user tries to add size 2 which already exists, it should fail (and it does) but now the issue is when Product Y is added, adding size 1 fails because it was unique but It should be unique per variant not overall. Any way to do this while modelling the DB or I have to add a manual check and throw error if the size of the same variant already exists.
Database - Postgresql
Using Prisma
In Prisma, you can model your relations like this:
model Product {
id String #id #default(uuid())
name String
description String
sizes ProductSize[]
}
model ProductSize {
id String #id #default(uuid())
name String
description String
product Product #relation(fields: [productId], references: [id])
productId String
##unique([productId, name])
}
This contains a unique index over a combination of two fields (productId and name). So for each product there can only be a unique named size.
Example - (productId - A, Size - 1), (productId - A, Size - 2). Adding another record with (productId - A, Size - 1) would throw an error but (productId - B, Size - 1) will be allowed.
You can create unique indexes using multiple fields in postgres. https://www.postgresqltutorial.com/postgresql-indexes/postgresql-unique-index/ scroll down a bit for the multiple fields section.
Without knowing your existing table structure I can't say with confidence that this code would work but you'll want something like this.
CREATE UNIQUE INDEX idx_products_size ON products(id, size);

PostgreSQL array of data composite update element using where condition

I have a composite type:
CREATE TYPE mydata_t AS
(
user_id integer,
value character(4)
);
Also, I have a table, uses this composite type as an array of mydata_t.
CREATE TABLE tbl
(
id serial NOT NULL,
data_list mydata_t[],
PRIMARY KEY (id)
);
Here I want to update the mydata_t in data_list, where mydata_t.user_id is 100000
But I don't know which array element's user_id is equal to 100000
So I have to make a search first to find the element where its user_id is equal to 100000 ... that's my problem ... I don't know how to make the query .... in fact, I want to update the value of the array element, where it's user_id is equal to 100000 (Also where the id of tbl is for example 1) ... What will be my query?
Something like this (I know it's wrong !!!)
UPDATE "tbl" SET "data_list"[i]."value"='YYYY'
WHERE "id"=1 AND EXISTS (SELECT ROW_NUMBER() OVER() AS i
FROM unnest("data_list") "d" WHERE "d"."user_id"=10000 LIMIT 1)
For example, this is my tbl data:
Row1 => id = 1, data = ARRAY[ROW(5,'YYYY'),ROW(6,'YYYY')]
Row2 => id = 2, data = ARRAY[ROW(10,'YYYY'),ROW(11,'YYYY')]
Now i want to update tbl where id is 2 and set the value of one of the tbl.data elements to 'XXXX' where the user_id of element is equal to 11
In fact, the final result of Row2 will be this:
Row2 => id = 2, data = ARRAY[ROW(10,'YYYY'),ROW(11,'XXXX')]
If you know the value value, you can use the array_replace() function to make the change:
UPDATE tbl
SET data_list = array_replace(data_list, (11, 'YYYY')::mydata_t, (11, 'XXXX')::mydata_t)
WHERE id = 2
If you do not know the value value then the situation becomes more complex:
UPDATE tbl SET data_list = data_arr
FROM (
-- UPDATE doesn't allow aggregate functions so aggregate here
SELECT array_agg(new_data) AS data_arr
FROM (
-- For the id value, get the data_list values that are NOT modified
SELECT (user_id, value)::mydata_t AS new_data
FROM tbl, unnest(data_list)
WHERE id = 2 AND user_id != 11
UNION
-- Add the values to update
VALUES ((11, 'XXXX')::mydata_t)
) x
) y
WHERE id = 2
You should keep in mind, though, that there is an awful lot of work going on in the background that cannot be optimised. The array of mydata_t values has to be examined from start to finish and you cannot use an index on this. Furthermore, updates actually insert a new row in the underlying file on disk and if your array has more than a few entries this will involve substantial work. This gets even more problematic when your arrays are larger than the pagesize of your PostgreSQL server, typically 8kB. All behind the scene so it will work, but at a performance penalty. Even though array_replace sounds like changes are made in-place (and they indeed are in memory), the UPDATE command will write a completely new tuple to disk. So if you have 4,000 array elements that means that at least 40kB of data will have to be read (8 bytes for the mydata_t type on a typical system x 4,000 = 32kB in a TOAST file, plus the main page of the table, 8kB) and then written to disk after the update. A real performance killer.
As #klin pointed out, this design may be more trouble than it is worth. Should you make data_list as table (as I would do), the update query becomes:
UPDATE data_list SET value = 'XXXX'
WHERE id = 2 AND user_id = 11
This will have MUCH better performance, especially if you add the appropriate indexes. You could then still create a view to publish the data in an aggregated form with a custom type if your business logic so requires.

Create column with aggregated value with calculation in PBI

Imagine you have two tables:
Table User:
ID, Name
Table Orders:
ID, UserID
I'm trying to create a new column in table User which should contain aggregated values of distinct count of Order.IDs.
Calculated column:
OrderCount = CALCULATE(DISTINCTCOUNT(Orders[Id]))
Alternatively if you don't/can't have a relationship between the two tables:
OrderCount2 = CALCULATE(DISTINCTCOUNT(Orders[Id]),FILTER(Orders, Orders[UserId] = User[Id]))
If all you need is to display it in some visualisation, you can use Orders[Id] directly by setting the aggregate option to Count (Distinct) in Values under Visualizations side pane.

updatexml for particular rows only

Context: I want to increase the allowance value of some employees from £1875 to £7500, and update their balance to be £7500 minus whatever they have currently used.
My Update statement works for one employee at a time, but I need to update around 200 records, out of a table containing about 6000.
I am struggling to workout how to modify the below to update more than one record, but only the 200 records I need to update.
UPDATE employeeaccounts
SET xml = To_clob(Updatexml(Xmltype(xml),
'/EmployeeAccount/CurrentAllowance/text()',187500,
'/EmployeeAccount/AllowanceBalance/text()',
750000 - (SELECT Extractvalue(Xmltype(xml),
'/EmployeeAccount/AllowanceBalance',
'xmlns:ts=\"http://schemas.com/\", xmlns:xt=\"http://schemas.com\"'
)
FROM employeeaccounts
WHERE id = '123456')))
WHERE id = '123456'
Example of xml column (stored as clob) that I want to update. Table has column ID that hold PK of employees ID EG 123456
<EmployeeAccount>
<LastUpdated>2016-06-03T09:26:38+01:00</LastUpdated>
<MajorVersion>1</MajorVersion>
<MinorVersion>2</MinorVersion>
<EmployeeID>123456</EmployeeID>
<CurrencyID>GBP</CurrencyID>
<CurrentAllowance>187500</CurrentAllowance>
<AllowanceBalance>100000</AllowanceBalance>
<EarnedDiscount>0.0</EarnedDiscount>
<NormalDiscount>0.0</NormalDiscount>
<AccountCreditLimit>0</AccountCreditLimit>
<AccountBalance>0</AccountBalance>
</EmployeeAccount>
You don't need a subquery to get the old balance, you can use the value from the current row; which means you don't need to correlate that subquery and can just use an in() in the main statement:
UPDATE employeeaccounts
SET xml = To_clob(Updatexml(Xmltype(xml),
'/EmployeeAccount/CurrentAllowance/text()',187500,
'/EmployeeAccount/AllowanceBalance/text()',
750000 - Extractvalue(Xmltype(xml),
'/EmployeeAccount/AllowanceBalance',
'xmlns:ts=\"http://schemas.com/\", xmlns:xt=\"http://schemas.com\"')
))
WHERE id in (123456, 654321, ...);

Cassandra: Update multiple rows?

In mysql, I can do the following query, UPDATE mytable SET price = '100' WHERE manufacturer='22'
Can this be done with cassandra?
Thanks!
For that you will need to maintain your own index in a separate column family (you need a way to find all products (product ids) that are manufactured by a certatin manufacturer. When you have all the product ids doing a batch mutation is simple).
E.g
ManufacturerToProducts = { // this is a ColumnFamily
'22': { // this is the key to this Row inside the CF
// now we have an infinite # of columns in this row
'prod_1': "prod_1", //for simplicity I'm only showing the value (but in reality the values in the map are the entire Column.)
'prod_1911' : null
}, // end row
'23': { // this is the key to another row in the CF
// now we have another infinite # of columns in this row
'prod_1492': null,
},
}
(disclaimer: thanks Arin for the annotation used above)