How to model Couchbase schema to store app-specific metadata?

Similar to 42matters
We are planning to construct the following structure to query the documents based on bundleid and value.
{
    "doc-type": "App-Metadata",      // from the App Store API
    "bundleid": "com.whatsapp",
    "value": {
    }
}
{
    "doc-type": "App-Looklike",      // from the App Store API
    "bundleid": "com.whatsapp",
    "value": {
    }
}
{
    "doc-type": "Internal-Metadata", // from MyApp
    "bundleId": "com.whatsapp",
    "value": {
    }
}
Is there a better way to model this schema?

I don't see anything specifically wrong with that schema. You might want to name the field that specifies each document's type "type", as that's the usual practice.
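The lookup this schema enables can be sketched as follows. This is a minimal, illustrative simulation in Python of what a Couchbase view or N1QL query over these documents would return; the function name and sample data are assumptions, with the suggested rename of "doc-type" to "type" applied:

```python
# Illustrative sketch: filtering typed documents by "type" and "bundleid",
# as a Couchbase view or N1QL query would. Sample docs mirror the question.
docs = [
    {"type": "App-Metadata", "bundleid": "com.whatsapp", "value": {}},
    {"type": "App-Looklike", "bundleid": "com.whatsapp", "value": {}},
    {"type": "Internal-Metadata", "bundleid": "com.whatsapp", "value": {}},
]

def find_by_type_and_bundleid(docs, doc_type, bundleid):
    """Return all documents matching the given type and bundleid."""
    return [d for d in docs
            if d.get("type") == doc_type and d.get("bundleid") == bundleid]

matches = find_by_type_and_bundleid(docs, "App-Metadata", "com.whatsapp")
```

Keeping one common "type" field means every such query follows the same shape regardless of which document kind it targets.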

Related

MongoDB design speculation

I have been digging through the MongoDB-related questions and answers, but one thing is still not obvious.
Let's consider the following Product collection:
{
"manufacturer": "Man1",
"model": "Model1"
}
Let's say we have 1,000,000 products, and I would like to create a dropdown of manufacturers (which would be at most 50 options).
In that case I would have to call the .distinct() function on that huge product collection every time.
Is this the right way to do it?
I am a bit concerned about the performance.
Or should I create a separate collection for manufacturers and keep it synced?
UPDATE
Thanks for all the answers; I am still considering them.
And what if I do the following:
Manufacturer:
{
    "name": "Man1",
    "models": [
        {
            "name": "Model1",
            "products": [Product1, Product2]
        }
    ]
}
and Product:
{
    "manufacturer": "Man1",
    "model": "Model1",
    "manufacturer_id": Manufacturer1,
    "model_id": Model1
}
First: if you have a large number of records, you never want to load all the data in a single request to populate a list, dropdown, or anything else. Instead, implement something like "load more" options, just like pagination.
Fetch, say, 20-40 records per request and perform any optimization on those small chunks of data.
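The load-more pattern above can be sketched like this. It is a Python stand-in for a database cursor, where an in-memory list simulates the collection; in MongoDB the same idea maps to find().skip(page * page_size).limit(page_size):

```python
# Illustrative "load more" pagination over a product list.
# An in-memory list stands in for the MongoDB collection.
products = [{"manufacturer": f"Man{i % 50}", "model": f"Model{i}"}
            for i in range(200)]

def fetch_page(collection, page, page_size=20):
    """Return one page of results, like skip/limit on a cursor."""
    start = page * page_size
    return collection[start:start + page_size]

first_page = fetch_page(products, 0)
second_page = fetch_page(products, 1)
```

Each request then only touches one small slice of the data, no matter how large the collection grows.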
You can create a separate collection for manufacturers; you just have to keep it in sync after every addition, update, or deletion in the products collection.
You could also consider designing your Product collection like this:
{
    "manufacturer": "Man1",
    "model": "Model1",
    "Product": [Product1, Product2]
}
Having an index on "manufacturer" will also optimize your query to get the list of manufacturers.
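The sync between the products collection and a separate manufacturers collection can be sketched as follows. This is an illustrative Python version using in-memory dicts in place of MongoDB collections; the reference-count approach is one possible way to know when a manufacturer's last product disappears, not something from the original answer:

```python
# Keep a "manufacturers" lookup in sync with product add/remove operations.
# A reference count per manufacturer lets us drop the entry once its last
# product is gone. In-memory structures stand in for MongoDB collections.
products = []           # stand-in for the products collection
manufacturers = {}      # manufacturer name -> number of its products

def add_product(doc):
    products.append(doc)
    name = doc["manufacturer"]
    manufacturers[name] = manufacturers.get(name, 0) + 1

def remove_product(doc):
    products.remove(doc)
    name = doc["manufacturer"]
    manufacturers[name] -= 1
    if manufacturers[name] == 0:
        del manufacturers[name]   # keep the dropdown list clean

p1 = {"manufacturer": "Man1", "model": "Model1"}
p2 = {"manufacturer": "Man1", "model": "Model2"}
add_product(p1)
add_product(p2)
remove_product(p1)
```

Populating the dropdown is then a cheap read of the small manufacturers collection instead of a .distinct() scan over a million products.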

Inserting multiple key-value pairs under a single _id in Cloudant at various times?

My requirement is to store JSON pairs arriving from an MQTT subscriber at different times under a single _id in Cloudant, but I'm facing an error when trying to insert a new JSON pair into an existing _id: it simply replaces the old one. I need at least 10 JSON pairs under one _id, injected at different times.
First, you should double-check the architectural decision to update a particular document multiple times. In general this is discouraged, though it depends on your application. Instead, you could insert each new piece of information as a separate document and then use a map-reduce view to reflect the state of your application.
For example (I'm going to assume that you have multiple "devices", each with some kind of unique identifier, that need to add data to a cloudant DB)
PUT
{
    "info_a": "data a",
    "device_id": 123
}
{
    "info_b": "data b",
    "device_id": 123
}
{
    "info_a": "message a",
    "device_id": 1234
}
Then you'll need a map function like
_design/device/_view/state
function (doc) {
    emit(doc.device_id, 1);
}
Then you can GET the results of that view to see all of the "info_X" data that is associated with the particular device.
GET account.cloudant.com/databasename/_design/device/_view/state
{"total_rows":3,"offset":0,"rows":[
{"id":"28324b34907981ba972937f53113ac3f","key":123,"value":1},
{"id":"d50553d206d722b960fb176f11841974","key":123,"value":1},
{"id":"eaa710a5fa1ff4ba6156c997ddf6099b","key":1234,"value":1}
]}
Then you can use the query parameters to control the output, for example
GET account.cloudant.com/databasename/_design/device/_view/state?key=123&include_docs=true
{"total_rows":3,"offset":0,"rows":[
{"id":"28324b34907981ba972937f53113ac3f","key":123,"value":1,"doc":
{"_id":"28324b34907981ba972937f53113ac3f",
"_rev":"1-bac5dd92a502cb984ea4db65eb41feec",
"info_b":"data b",
"device_id":123}
},
{"id":"d50553d206d722b960fb176f11841974","key":123,"value":1,"doc":
{"_id":"d50553d206d722b960fb176f11841974",
"_rev":"1-a2a6fea8704dfc0a0d26c3a7500ccc10",
"info_a":"data a",
"device_id":123}}
]}
And now you have the complete state for device_id:123.
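The merge that this view enables can be sketched like this. It is an illustrative Python version of what your application would do after querying the view with include_docs=true; the document ids are taken from the example output above:

```python
# Combine the separate per-device documents into one "state" per device,
# as you would after reading the view results with include_docs=true.
docs = [
    {"_id": "28324b34907981ba972937f53113ac3f",
     "info_b": "data b", "device_id": 123},
    {"_id": "d50553d206d722b960fb176f11841974",
     "info_a": "data a", "device_id": 123},
    {"_id": "eaa710a5fa1ff4ba6156c997ddf6099b",
     "info_a": "message a", "device_id": 1234},
]

def device_state(docs, device_id):
    """Merge every info_* key from the device's documents into one dict."""
    state = {}
    for doc in docs:
        if doc["device_id"] == device_id:
            state.update({k: v for k, v in doc.items()
                          if k not in ("_id", "device_id")})
    return state
```

Each MQTT message stays its own immutable document, so there is nothing to conflict on; the combined state is computed at read time.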
Timing
Another issue is the rate at which you're updating your documents.
Bottom line recommendation: if you are only updating the document once per ~minute or less frequently, it can be reasonable for your application to update a single document. That is, you'd add new key-value pairs to the same document with the same _id value. In order to do that, however, you'll need to GET the full doc, add the new key-value pair, and then PUT that document back to the database. You must make sure that you are providing the most recent _rev of that document, and you should also check for conflicts that could occur if the document is being updated by multiple devices.
If you are acquiring new data for a particular device at a high rate, you'll likely run into conflicts very frequently -- because cloudant is a distributed document store. In this case, you should follow something like the example I gave above.
Example flow for the second approach outlined by @gadamcox, for use cases where document updates are not required very frequently:
[...] you'd add new key-value pairs to the same document with the same _id value. In order to do that, however, you'll need to GET the full doc, add the new key-value pair, and then PUT that document back to the database.
Your application first fetches the existing document by id: (https://docs.cloudant.com/document.html#read)
GET /$DATABASE/100
{
"_id": "100",
"_rev": "1-2902191555...",
"No": ["1"]
}
Then your application updates the document in memory
{
"_id": "100",
"_rev": "1-2902191555...",
"No": ["1","2"]
}
and saves it in the database by specifying the _id and _rev (https://docs.cloudant.com/document.html#update)
PUT /$DATABASE/100
{
"_id": "100",
"_rev": "1-2902191555...",
"No":["1","2"]
}

Mongoose: How to handle versioning?

Today I came to know about the versioning concept in MongooseJS (here: Mongoose v3 part 1 :: Versioning). I like the versioning feature of Mongoose, but what should I do when my schema changes?
For example, initially my schema looks like,
{
"_id": String,
"title": String,
"description": String
}
Since I didn't know about versioning, I didn't add any versionKey option and just used the default versionKey, __v.
I created a few documents with the above schema. Later I modified the schema to:
{
"_id": String,
"title": String,
"description": String,
"comments": Array
}
Here comes the problem: if I create a new document after this schema change, I am able to add/push comments to it.
But if I try to add/push comments to a document that was created with the initial schema, I can't; it throws VersionError: No matching document found.
Is there any way to overcome this problem without disabling or skipping versioning?

Best way to represent multilingual database on mongodb

I have a MySQL database to support a multilingual website where the data is represented as the following:
table1
    id
    is_active
    created
table1_lang
    table1_id
    name
    surname
    address
What's the best way to achieve the same on mongo database?
You can design a schema where you either reference or embed documents. Let's look at the first option, embedded documents. With your above application, you might store the information in a document as follows:
// db.table1 schema
{
"_id": 3, // table1_id
"is_active": true,
"created": ISODate("2015-04-07T16:00:30.798Z"),
"lang": [
{
"name": "foo",
"surname": "bar",
"address": "xxx"
},
{
"name": "abc",
"surname": "def",
"address": "xyz"
}
]
}
In the example schema above, you have essentially embedded the table1_lang information within the main table1 document. This design has its merits, one of them being data locality: since MongoDB stores data contiguously on disk, putting all the data you need in one document means spinning disks take less time to seek to a particular location. If your application frequently accesses table1 information along with the table1_lang data, then you'll almost certainly want to go the embedded route. The other advantage of embedded documents is atomicity and isolation when writing data. To illustrate, say you want to remove a document whose lang array contains a "name" of "foo"; this can be done with one single (atomic) operation:
db.table.remove({"lang.name": "foo"});
For more details on data modelling in MongoDB, please read the docs Data Modeling Introduction, specifically Model One-to-Many Relationships with Embedded Documents
The other design option is referencing documents where you follow a normalized schema. For example:
// db.table1 schema
{
    "_id": 3,
    "is_active": true,
    "created": ISODate("2015-04-07T16:00:30.798Z")
}
// db.table1_lang schema
/*
1
*/
{
"_id": 1,
"table1_id": 3,
"name": "foo",
"surname": "bar",
"address": "xxx"
}
/*
2
*/
{
"_id": 2,
"table1_id": 3,
"name": "abc",
"surname": "def",
"address": "xyz"
}
The above approach gives increased flexibility in performing queries. For instance, retrieving all child table1_lang documents for the main parent entity table1 with id 3 is straightforward: simply query the table1_lang collection:
db.table1_lang.find({"table1_id": 3});
The above normalized, document-reference schema also has an advantage when you have one-to-many relationships with very unpredictable arity. If you have hundreds or thousands of table1_lang documents per table1 entity, embedding has significant drawbacks as far as space constraints are concerned, because the larger the document, the more RAM it uses, and MongoDB documents have a hard size limit of 16 MB.
The general rule of thumb is that if your application's query pattern is well known and data tends to be accessed in only one way, an embedded approach works well. If your application queries data in many ways, or you are unable to anticipate the query patterns, a more normalized document-referencing model will be appropriate.
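The two access patterns can be put side by side in a short sketch. This is an illustrative Python version using in-memory structures in place of MongoDB collections; the data mirrors the examples above, and the helper names are assumptions:

```python
# Embedded: the lang entries travel inside the parent document.
embedded = {
    "_id": 3, "is_active": True,
    "lang": [
        {"name": "foo", "surname": "bar", "address": "xxx"},
        {"name": "abc", "surname": "def", "address": "xyz"},
    ],
}

# Referenced: lang entries live in their own collection, keyed by table1_id.
table1 = [{"_id": 3, "is_active": True}]
table1_lang = [
    {"_id": 1, "table1_id": 3, "name": "foo", "surname": "bar", "address": "xxx"},
    {"_id": 2, "table1_id": 3, "name": "abc", "surname": "def", "address": "xyz"},
]

def langs_embedded(doc):
    return doc["lang"]                       # one read, data already local

def langs_referenced(lang_coll, table1_id):  # like db.table1_lang.find(...)
    return [d for d in lang_coll if d["table1_id"] == table1_id]
```

The embedded read is a single document fetch; the referenced read is a second query, but the lang documents can grow without bloating the parent.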
Ref:
MongoDB Applied Design Patterns: Practical Use Cases with the Leading NoSQL Database By Rick Copeland

CouchBase Member Login

Could anyone provide me with a good explanation of, or tutorial on, a user-management Couchbase database?
With SQL, login would be
SELECT * FROM users WHERE email = :email AND password = :password
How would this work in couchbase, assuming my user document looks like this
{
    "uid": "3",
    "email": "dummy@example.com",
    "password": "6a1644c781989cb3f47c8a38a0e75c6c",
    "name": "John Doe",
    "jsonType": "user"
}
My View at the moment is
function (doc, meta) {
    if (doc.jsonType == "user") {
        emit([doc.email, doc.password], [doc]);
    }
}
And I can go to
http://localhost:8092/default/_design/dev_users/_view/login?stale=false&connection_timeout=60000&limit=10&skip=0&key=[%22dummy@example.com%22,%226a1644c781989cb3f47c8a38a0e75c6c%22]
which returns
{
    "total_rows": 3, "rows":
    [{
        "id": "user::3",
        "key": ["dummy@example.com", "6a1644c781989cb3f47c8a38a0e75c6c"],
        "value":
        [{
            "uid": "3",
            "email": "dummy@example.com",
            "password": "6a1644c781989cb3f47c8a38a0e75c6c",
            "name": "John Doe",
            "jsonType": "user"
        }]
    }]
}
I would just like advice on whether I am approaching this the correct way. I am very new to Couchbase. I have googled this but can't seem to find exactly what I'm looking for. Any help appreciated to get me off to a good start.
In NoSQL (key/value) you have to pay special attention to KEY structure design. In your case, you should identify what your login ID is. Is it the "email" field? If so, you can create a key based on that, e.g.
KEY: "dummy@example.com"
VALUE: the user JSON document
So when the user enters a user id/password, you can simply call one GET operation against Couchbase.
If no such login ID exists, you will not get any JSON value.
If the login ID exists, you will get back the JSON document, which you can use to check the password and also (in case the password matches) populate user-specific data from that JSON into your login session.
NOTE: I assume you are not storing clear-text passwords in the JSON, but instead a password digest with salt.
So no view functionality is required here.
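The key-value login flow can be sketched as follows. This is an illustrative Python simulation with a dict standing in for the Couchbase bucket; the salt field, digest scheme, and function names are assumptions added for the sketch, matching the note above that passwords should be stored as salted digests:

```python
import hashlib

# Simulate the Couchbase bucket as a dict keyed by email, so login is a
# single GET. Passwords are stored as salted digests, never in the clear.
def digest(password, salt):
    return hashlib.sha256((salt + password).encode()).hexdigest()

bucket = {
    "dummy@example.com": {
        "uid": "3",
        "name": "John Doe",
        "salt": "s3cr3t-salt",                          # per-user salt
        "password": digest("correct horse", "s3cr3t-salt"),
        "jsonType": "user",
    }
}

def login(email, password):
    """One GET by key; return the user doc only if the password matches."""
    doc = bucket.get(email)                 # no view or index needed
    if doc is None:
        return None                         # no such login ID
    if doc["password"] != digest(password, doc["salt"]):
        return None                         # wrong password
    return doc
```

One document fetch answers both "does this user exist?" and "is the password right?", which is the strength of keying directly on the login ID.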
As for views usage, I highly recommend reading through "Basic Couchbase querying for SQL people"
also read on Creating an e-commerce platform with Couchbase 2.0
I would suggest not including the whole document in the index - i.e. change your emit to something like:
emit([doc.email, doc.password])
This will minimise the size of (and hence time taken to operate on) the index. If you later need the actual doc contents you can use a normal get operation to fetch it, using the id field of the query row. Some of the SDKs provide a method to perform this for you, for example setIncludeDocs() in the Java SDK.
Other than that this looks pretty reasonable. A detailed overview of Views is included in the Couchbase developer guide. Another good resource for complete example applications (above the standard getting started tutorials) is: http://couchbasemodels.com
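The slimmed-down index plus follow-up get can be sketched like this. It is an illustrative Python simulation (mirroring what setIncludeDocs() automates in the Java SDK); the in-memory bucket and helper names are assumptions:

```python
# Build a key-only index, as the slimmed-down emit would, and fetch the
# full document afterwards with a plain key-value get.
bucket = {
    "user::3": {"email": "dummy@example.com",
                "password": "6a1644c781989cb3f47c8a38a0e75c6c",
                "name": "John Doe", "jsonType": "user"},
}

# Index rows hold only the emitted key and the document id - no doc body,
# so the index stays small and cheap to operate on.
index = [{"key": [doc["email"], doc["password"]], "id": doc_id}
         for doc_id, doc in bucket.items() if doc["jsonType"] == "user"]

def login(email, password_digest):
    for row in index:
        if row["key"] == [email, password_digest]:
            return bucket[row["id"]]     # normal get by the row's id field
    return None
```

The trade-off is one extra get per hit in exchange for a much smaller index, which usually pays off once documents are non-trivial in size.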