Cassandra/NoSQL newbie: the right way to model?


As the title says, I am fairly (read: completely) new to NoSQL databases such as Cassandra. Like many others, I learned RDBMSs first. I did a little reading on 'WTF is a super column' and other obvious Google hits, but I am still not sure how to model this:
Say I want to save Users, as in username/password/name/etc. What if a user has, say, a mobile phone and a landline telephone? Is this the 'right' way to do it? (Using the same abbreviated JSON style seen on other sites.)
Users: { // <-- this is the Users SuperColumnFamily, keyed by username
    myuser: { // <-- this is a User SuperColumn
        username = "myuser", // <-- this is the username Column
        email = "myuser@googlemail.com",
        ...
    },
    ...
}
Phone: { // <-- this is where the users' phone numbers are stored
    myuser: {
        mobile = "0129386835235",
        landline = "123876912384",
    },
    ...
}
Opinions/corrections, please.

First things first, don't use super columns. See:
http://www.quora.com/Cassandra-database/Why-is-it-bad-to-use-supercolumns-in-Cassandra
Now to answer your question. The example you described is easily modeled with just a regular column family:
Users: { <- this is the name of the column family
    username1: { <- this is a row key within the column family; it is one of your usernames
        email: user@email.com <- these are the columns within this row; they correspond to attributes of this user
        mobile: ...
        landline: ...
    }
    username2: { <- another row for a different user
        email: diff@email.com
    }
}
Note the flexible schema above: each row can have a different set of columns describing that user.
For more information on the Cassandra data model, I recommend reading http://www.datastax.com/docs/1.0/ddl/index
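As a rough illustration of that flexible schema, here is a minimal JavaScript sketch that uses a plain in-memory Map as a stand-in for the Users column family. It is not Cassandra client code; the `getColumn` helper is hypothetical and the phone numbers are sample values.

```javascript
// In-memory stand-in for a column family: row key -> { column name: value }.
// This only illustrates the shape of the data; it does not talk to Cassandra.
const users = new Map([
  ['username1', {
    email: 'user@email.com',
    mobile: '0129386835235',   // sample values, not real numbers
    landline: '123876912384',
  }],
  // A second row is free to carry a different set of columns.
  ['username2', { email: 'diff@email.com' }],
]);

// Hypothetical helper: read one column of one row, or undefined if absent.
function getColumn(columnFamily, rowKey, columnName) {
  const row = columnFamily.get(rowKey);
  return row === undefined ? undefined : row[columnName];
}

console.log(getColumn(users, 'username1', 'mobile'));   // 0129386835235
console.log(getColumn(users, 'username2', 'mobile'));   // undefined
```

The phone numbers live in the same row as the other user attributes, so a single lookup by username returns everything about that user.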

Related

DynamoDB - How to upsert nested objects with updateItem

Hi, I am new to DynamoDB. Below is the schema of the DynamoDB table:
{
    "user_id":1, // partition key
    "dob":"1991-09-12", // sort key
    "movies_watched":{
        "1":{
            "movie_name":"twilight",
            "movie_released_year":"1990",
            "movie_genre":"action"
        },
        "2":{
            "movie_name":"harry potter",
            "movie_released_year":"1996",
            "movie_genre":"action"
        },
        "3":{
            "movie_name":"lalaland",
            "movie_released_year":"1998",
            "movie_genre":"action"
        },
        "4":{
            "movie_name":"serendipity",
            "movie_released_year":"1999",
            "movie_genre":"action"
        }
    }
    ..... 6 more attributes
}
I want to insert a new item if the item (that user_id with that dob) does not exist; otherwise, I want to add the movies to the existing movies_watched map, after checking that each movie is not already present in the map.
Currently, I am trying to use the update(params) method.
Below is my approach:
function getInsertQuery (item) {
  const exp = {
    UpdateExpression: 'set',
    ExpressionAttributeNames: {},
    ExpressionAttributeValues: {}
  }
  // Top-level attributes, skipping the key attributes and the nested map
  Object.entries(item).forEach(([key, value]) => {
    if (key !== 'user_id' && key !== 'dob' && key !== 'movies_watched') {
      exp.UpdateExpression += ` #${key} = :${key},`
      exp.ExpressionAttributeNames[`#${key}`] = key
      exp.ExpressionAttributeValues[`:${key}`] = value
    }
  })
  // One document-path assignment per movie in the nested map
  let i = 0
  Object.entries(item.movies_watched).forEach(([movieId, movie]) => {
    exp.UpdateExpression += ` movies_watched.#uniqueID${i} = :uniqueID${i},`
    exp.ExpressionAttributeNames[`#uniqueID${i}`] = movieId
    exp.ExpressionAttributeValues[`:uniqueID${i}`] = movie
    i++
  })
  // Drop the trailing comma
  exp.UpdateExpression = exp.UpdateExpression.slice(0, -1)
  return exp
}
The method above just builds an update expression, with expression attribute names and values, for all top-level attributes as well as the nested attributes (using document paths).
It works well when the item already exists: the movies_watched map is updated. But it throws an exception when the item does not exist and has to be inserted. The exception is:
The document path provided in the update expression is invalid for update
Also, I am still not sure how to check for duplicate movies in the movies_watched map.
Could someone point me in the right direction? Any help is highly appreciated!
Thanks in advance.

There is no way to do this, given your model, without reading the item from DynamoDB before the update (at which point the process is trivial). If you don't want to spend the additional read capacity on every update, you would need to redesign your data model:
You can change movies_watched to be a Set that holds references to movies. The caveat is that a Set can contain only Numbers, Strings, or Binary values, so you would store the movie id or name, or keep the full data as JSON strings in the Set and parse them back into JSON on read. With a Set you can perform an ADD operation on the movies_watched attribute. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.UpdateExpressions.html#Expressions.UpdateExpressions.ADD
You can go with a single-table design and store the watched movies as separate items (PK: userId, SK: movie_id). To get a user, you would run a query that specifies only PK = userId; the result is a collection in which one item is the user record and the others are the watched movies. If you are new to DynamoDB and are learning the ropes, I would suggest this approach. https://www.alexdebrie.com/posts/dynamodb-single-table/
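To make the two redesigns more concrete, here is a minimal sketch that only builds the request parameters. The function names, the `PK`/`SK` attribute names and the `USER#`/`MOVIE#` prefixes are hypothetical, and the first function assumes a document client that converts a JavaScript `Set` into a DynamoDB set.

```javascript
// Option 1: movies_watched as a string set. UpdateItem creates the item when
// the key is new, and ADD creates the set when the attribute is missing, so
// one call covers both insert and update. Sets also ignore duplicates.
function buildAddMoviesParams(tableName, userId, dob, movieIds) {
  if (movieIds.length === 0) {
    throw new Error('DynamoDB does not accept empty sets');
  }
  return {
    TableName: tableName,
    Key: { user_id: userId, dob: dob },
    UpdateExpression: 'ADD #mw :ids',
    ExpressionAttributeNames: { '#mw': 'movies_watched' },
    // Assumption: the client marshals a JS Set to a DynamoDB string set.
    ExpressionAttributeValues: { ':ids': new Set(movieIds.map(String)) },
  };
}

// Option 2: one item per watched movie (PK/SK names are hypothetical).
// The condition makes the put fail instead of overwriting an existing movie.
function buildMoviePutParams(tableName, userId, movieId, movie) {
  return {
    TableName: tableName,
    Item: { PK: `USER#${userId}`, SK: `MOVIE#${movieId}`, ...movie },
    ConditionExpression: 'attribute_not_exists(PK)',
  };
}

const addParams = buildAddMoviesParams('users', 1, '1991-09-12', ['5', '6']);
console.log(addParams.UpdateExpression);   // ADD #mw :ids
```

With the AWS SDK for JavaScript v3, objects like these would typically be passed to `UpdateCommand` and `PutCommand` from `@aws-sdk/lib-dynamodb`. Neither option needs a read before the write.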

Mongodb changing the unique key

I have made the user's email the unique key for my entire users database:
var usersSchema = new Schema({
    _id: String, // User's unique email address
    name: String, // User's name
    phone: String, // User's phone number
    country: String, // Country
    type: String, // tenant/agent/manager/admin
    username: String, // Username for login
    password: String, // Password string
    trello: Object, // Trello auth settings
    settings: Object, // Settings for manager and other things
    createDate: Number, // Date user was created
    contactDate: Number, // Date user was last contacted
    activityDate: Number // Date of last activity on this user (update/log/etc)
});
So what if the user changes their email address?
Is my only option to delete the record and create it again?
Or is there a smarter way?
Also, users._id (the email) is referenced from 16 other tables.
For example, the booking table:
var bookingSchema = new Schema({
    _id: String, // Unique booking ID
    user: String, // User ID --> users._id
    property: String, // Property ID --> property._id
    checkin: Number, // Check in Date
    checkout: Number // Check out Date
});
One user can have a LOT of bookings.
What I would do is find all records that match the email, loop over them with for (i = 0; i < booking.length; i++), and update the email on each record.
Is there a smarter way to update all matching emails using only one mongo call?
(There are so many relations that my loop seems like a very primitive way of doing it.)

I would say it's much cleaner to create a separate field for the email and create a unique index on it.
Unfortunately, relationships like those in relational databases still aren't supported! According to the latest talks, there are plans to add this feature natively.
The best solution for you would be to think about how to use sub-documents to make things more consistent.
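Some background for the sketch below: MongoDB does not allow the `_id` of an existing document to be changed, which is why an email stored in `_id` forces a delete and re-insert. If the email instead lives in its own uniquely indexed field and other collections reference a stable `_id`, an email change touches one document. If the email stays the foreign key, one `updateMany` per referencing collection replaces the per-record loop. The helper name and the collection-to-field map are hypothetical, and the code only builds the operations rather than connecting to a database.

```javascript
// Hypothetical map: referencing collection -> field that stores the user's email.
// Only bookings.user comes from the question; the other 15 would be listed here too.
const EMAIL_REFERENCES = { bookings: 'user' };

// Build one bulk update per referencing collection.
function buildEmailChangeOps(oldEmail, newEmail, references) {
  return Object.entries(references).map(([collection, field]) => ({
    collection,
    filter: { [field]: oldEmail },
    update: { $set: { [field]: newEmail } },
  }));
}

const ops = buildEmailChangeOps('old@example.com', 'new@example.com', EMAIL_REFERENCES);
console.log(JSON.stringify(ops[0].filter));   // {"user":"old@example.com"}

// With the MongoDB Node.js driver, each op would map onto a single call:
//   await db.collection(op.collection).updateMany(op.filter, op.update)
// and the unique email field would be indexed once with:
//   await db.collection('users').createIndex({ email: 1 }, { unique: true })
```

This is still one call per collection, not one call overall, so the updates are not atomic across collections unless they are wrapped in a transaction.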

Grails 3 & relational domain mapping with Mongo

I'm trying to figure out whether there is a way to relate these two domain objects, similar to what I would do if I were connected to an Oracle DB.
gradle.properties
grailsVersion=3.2.9
gradleWrapperVersion=2.9
gormVersion=6.1.3.RELEASE
build.gradle
compile "org.grails.plugins:mongodb:6.1.3"
compile "org.mongodb:mongodb-driver:3.4.2"
Domain objects:
class Store {
    Long id
    // other properties
    Long sellerId
}
class Seller {
    Long id
    // other properties
}
I thought to do something like this:
class Store {
    Long id
    // other properties
    Long sellerId
    Seller seller
    Seller getSeller() {
        Seller.findById(this.sellerId)
    }
}
In the case above, only sellerId is persisted to Mongo since it is not marked as embedded. This works great if I reference it in grails code - giving me valid values for all of the properties in store.seller. However, if I return a store from a controller, store.seller does not come through fully. The response JSON for store looks like this (notice how seller ONLY has the id property):
{
    id: 1,
    // other properties
    seller: {
        id: 22
    }
}
I have also tried something like this but afterLoad never gets hit:
class Store {
    Long id
    // other properties
    Long sellerId
    Seller seller
    def afterLoad() {
        seller = Seller.findById(this.sellerId)
    }
}
Is there a better way to go about doing this?

Querying in Firebase by child of child

I have a structure of objects in Firebase looking like this:
-KBP27k4iOTT2m873xSE
    categories
        Geography: true
        Oceania: true
    correctanswer: "Yaren (de facto)"
    languages: "English"
    question: "Nauru"
    questiontype: "Text"
    wronganswer1: "Majuro"
    wronganswer2: "Mata-Utu"
    wronganswer3: "Suva"
I'm trying to find objects by category; for instance, I want all objects that have the category "Oceania" set.
I'm using Swift and I can't really seem to grasp the concept of how to query the data.
My query right now looks like this:
ref.queryEqualToValue("", childKey: "categories").queryOrderedByChild("Oceania")
Where ref is the reference to Firebase in that specific path.
However, whatever I try, I keep getting ALL data returned instead of only the objects with the category Oceania.
My data is structured like this: baseurl/questions/
As you can see in the object example one question can have multiple categories added, so from what I've understood it's best to have a reference to the categories inside your objects.
I could change my structure to baseurl/questions/oceania/uniqueids/, but then I would get multiple entries covering the same data, but with different uniqueid, because the question would be present under both the categories oceania and geography.
By using the structure baseurl/questions/oceania/ and baseurl/questions/geography I could also just add unique ids under oceania and geography that points to a specific unique id inside baseurl/questions/uniqueids instead, but that would mean I'd have to keep track of a lot of references. Making a relations table so to speak.
I wonder if that's the way to go. Should I restructure my data? The app isn't in production yet, so I can restructure the data completely with no major consequences, other than having to rewrite the code that pushes data to Firebase.
Let me know, if all of this doesn't make sense and sorry for the wall of text :-)

Adding some additional code to Tim's answer for future reference.
Just use a deep query. The parent object key is not what is queried so it's 'ignored'. It doesn't matter whether it's a key generated by autoId or a dinosaur name - the query is on the child objects and the parent (key) is returned in snapshot.key.
Based on your Firebase structure, this will retrieve each child node where Oceania is true, one at a time:
let questionsRef = Firebase(url:"https://baseurl/questions")
questionsRef.queryOrderedByChild("categories/Oceania").queryEqualToValue(true)
    .observeEventType(.ChildAdded, withBlock: { snapshot in
        print(snapshot)
    })
Edit: A question came up about loading all of the values at once (.Value) instead of one at a time (.ChildAdded):
let questionsRef = Firebase(url:"https://baseurl/questions")
questionsRef.queryOrderedByChild("categories/Oceania").queryEqualToValue(true)
    .observeSingleEventOfType(.Value, withBlock: { snapshot in
        print(snapshot)
    })
This results in the following (my Firebase structure is a little different, but you get the idea). uid_1 did not have Oceania = true, so it was omitted from the query results.
Snap (users) {
    "uid_0" = {
        categories = {
            Oceania = 1;
        };
        email = "dude@thing.com";
        "first_name" = Bill;
    };
    "uid_2" = {
        categories = {
            Oceania = 1;
        };
        "first_name" = Peter;
    };
}
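To make the semantics of the deep query concrete, here is a plain JavaScript sketch with no Firebase SDK involved. It mimics what ordering by `categories/Oceania` and matching `true` selects: the filter looks inside each child, and the parent key is simply carried along. The helper names are hypothetical, and the `uid_1` record is invented for the example.

```javascript
// Resolve a "a/b/c" path inside a nested object, or undefined if missing.
function childAt(node, path) {
  return path.split('/').reduce(
    (current, key) =>
      current !== null && typeof current === 'object' ? current[key] : undefined,
    node
  );
}

// Keep only the children whose value at `path` equals `value`.
function filterByChild(children, path, value) {
  return Object.fromEntries(
    Object.entries(children).filter(([, child]) => childAt(child, path) === value)
  );
}

const sample = {
  uid_0: { categories: { Oceania: true }, first_name: 'Bill' },
  uid_1: { categories: { Geography: true }, first_name: 'Leroy' }, // invented
  uid_2: { categories: { Oceania: true }, first_name: 'Peter' },
};

console.log(Object.keys(filterByChild(sample, 'categories/Oceania', true)));
// [ 'uid_0', 'uid_2' ]
```

The real query does this filtering on the Firebase server rather than on the client, which is the reason to express it as a query instead of downloading every question.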

I think this should work:
ref.queryOrderedByChild("categories/Oceania").queryEqualToValue(true)

count multiple relations results with a single Parse query

I have a very simple setup: the _User entity has a likes Relation with itself (reflexive).
A common use case is listing users.
I'm listing very few users (e.g. 15), but I would also like to display the number of likes each one has.
Following the standard technique suggested by Parse.com, that would require a query for each of the 15 _Users.
I don't think this is acceptable; maybe 2 queries are enough:
the first one getting the first 15 _Users
the second one getting the number of likes each _User has
But I have no idea whether that's even possible with the Parse API, so I'm asking for help ;)

If the column is a relation, then yes, getting the count will require a query per user.
If you expect the number of likes per user to be low (<100 is my semi-arbitrary rule of thumb), you could instead model likes as an array of pointers.
With that, you can know the count just by having the record in hand (i.e. someUser.get("likes").length). Even better, query include will eagerly fetch the related users...
userQuery.include("likes");
userQuery.find().then(function(users) {
    if (users.length) {
        var someUser = users[0];
        var likes = someUser.get("likes");
        if (likes.length) { // see, we can get the count without query
            var firstLike = likes[0]; // we can even get those other users!
            var firstLikeEmail = firstLike.get("email");
        }
    }
});
Otherwise, using relations, you're stuck with another query...
userQuery.find().then(function(users) {
    if (users.length) {
        var someUser = users[0];
        var likes = someUser.get("likes");
        return likes.query().count();
    } else {
        return 0;
    }
}).then(function(count) {
    console.log("the first user has " + count + " likes");
});
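For the array-of-pointers model, the count for every listed user can be read from the results of the single find() call. The sketch below assumes only that each user exposes `id` and `get(key)` the way Parse objects do. `likeCounts` and `fakeUser` are hypothetical names, and `fakeUser` exists only so the example runs without the Parse SDK.

```javascript
// Works on anything exposing an `id` and a `get(key)` method.
function likeCounts(users) {
  return users.map((user) => {
    const likes = user.get('likes') || [];   // the column may be unset
    return { id: user.id, likes: likes.length };
  });
}

// Tiny stand-in for a Parse object so the sketch runs without the SDK.
function fakeUser(id, likes) {
  const attrs = { likes };
  return { id, get: (key) => attrs[key] };
}

const listed = [fakeUser('a', ['b', 'c']), fakeUser('b', undefined)];
console.log(likeCounts(listed));
// [ { id: 'a', likes: 2 }, { id: 'b', likes: 0 } ]
```

In real code, `likeCounts(users)` would be called inside the `then` callback of `userQuery.find()`, so listing 15 users with their counts costs one query.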
