DynamoDB PartiQL Query For Specific Map Element From List - select

I have a DynamoDB with data that looks like this:
{
  "userId": {
    "S": "someuserID"
  },
  "listOfMaps": {
    "L": [
      {
        "M": {
          "neededVal": {
            "S": "ThisIsNeeded1"
          },
          "id": {
            "S": "1"
          }
        }
      },
      {
        "M": {
          "neededVal": {
            "S": "ThisIsNeeded2"
          },
          "id": {
            "S": "2"
          }
        }
      },
      ...
    ]
  },
  "userName": {
    "S": "someuserName"
  }
}
The listOfMaps can contain more than just 2 maps, as is denoted by the ellipsis. Is there a PartiQL query I can put together that will let me get the neededVal based on the userId and the id of the item in the map itself?
I know that I can query for the n-th item's neededVal like this:
SELECT "listOfMaps"[N]."neededVal"
FROM "table-name"
WHERE "userId" = 'someuserID'
But is it possible to make it do something like this:
SELECT "listOfMaps"."neededVal"
FROM "table-name"
WHERE "userId" = 'someuserID' AND "listOfMaps"."id" = '4'

It looks like you're modeling the one-to-many relationship using a complex attribute (e.g. a list of objects). This is a completely valid approach to modeling one-to-many relationships and is best used when 1) the nested data doesn't change (or doesn't change often) and 2) you don't have any access patterns around the data within the complex attribute.
However, since you do want to perform searches based on data within the complex attribute, you'd be better off modeling the data differently.
For example, you might consider modeling each entry in the user partition with PK=USER#user_id and SK=neededVal#id. This would allow you to fetch items by user ID (Query where PK = USER#user_id and SK begins_with neededVal#).
I don't know the specifics around your access patterns, but I can say that you'll need to move these entries into their own items if you want to support access patterns around the data within your complex attribute.
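For illustration, here's a rough sketch of what the remodeled items and the matching PartiQL could look like (the PK/SK attribute names, key formats and table name below are assumptions about the new design, not your current schema):
{ "PK": "USER#someuserID", "SK": "neededVal#1", "neededVal": "ThisIsNeeded1" }
{ "PK": "USER#someuserID", "SK": "neededVal#2", "neededVal": "ThisIsNeeded2" }
A single entry could then be fetched by its id:
SELECT "neededVal"
FROM "table-name"
WHERE "PK" = 'USER#someuserID' AND "SK" = 'neededVal#4'
and all of a user's entries with a begins_with condition on the sort key:
SELECT "neededVal"
FROM "table-name"
WHERE "PK" = 'USER#someuserID' AND begins_with("SK", 'neededVal#')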

Related

Update Array Children Sorted Order

I have a collection containing objects with the following structure
{
  "dep_id": "some_id",
  "departament": "dep name",
  "employees": [{
    "name": "emp1",
    "age": 31
  }, {
    "name": "emp2",
    "age": 35
  }]
}
I would like to sort and save the array of employees for the object with id "some_id", by employees.age, descending. The best outcome would be to do this atomically using mongodb's query language. Is this possible?
If not, how can I rearrange the subdocuments without affecting the parent's other data or the data of the subdocuments? In case I have to download the data from the database and save back the sorted array of children, what would happen if something else performs an update to one of the children or children are added or removed in the meantime?
In the end, the data should be persisted to the database like this:
{
  "dep_id": "some_id",
  "departament": "dep name",
  "employees": [{
    "name": "emp2",
    "age": 35
  }, {
    "name": "emp1",
    "age": 31
  }]
}
The best way to do this is to actually apply the $sort modifier as you add items to the array. As you say in your comment, your actual objects have "rank" and "created_at" fields, which really should have been in the question itself rather than a contrived case.
So for "sorting" by multiple properties, the following reference would adjust like this:
db.collection.update(
    { },
    { "$push": { "employees": { "$each": [], "$sort": { "rank": -1, "created_at": -1 } } } },
    { "multi": true }
)
But to update all the data you presently have "as is shown in the question", then you would sort on "age" with:
db.collection.update(
    { },
    { "$push": { "employees": { "$each": [], "$sort": { "age": -1 } } } },
    { "multi": true }
)
It may seem odd to use $push to "modify" an array, but that's exactly what happens: the empty $each means we aren't actually adding anything new, while the $sort modifier is applied to the existing array in place and "re-orders" it.
Of course this would then explain how "new" updates to the array should be written in order to apply that $sort and ensure that the "largest age" is always "first" in the array:
db.collection.update(
    { "dep_id": "some_id" },
    { "$push": {
        "employees": {
            "$each": [{ "name": "emp3", "age": 32 }],
            "$sort": { "age": -1 }
        }
    }}
)
So what happens here is as you add the new entry to the array on update, the $sort modifier is applied and re-positions the new element between the two existing ones since that is where it would sort to.
This is a common pattern with MongoDB and is typically used in combination with the $slice modifier in order to keep arrays at a "maximum" length as new items are added, yet retain "ordered" results. And quite often "ranking" is the exact usage.
So overall, you can actually "update" your existing data and re-order it with "one simple atomic statement". No looping or collection renaming required. Furthermore, you now have a simple atomic method to "update" the data and maintain that order as you add new array items, or remove them.
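For example, a sketch of that combined pattern, capping the array at the ten "oldest" employees (the new entry and the cap of 10 are made up for illustration; positive $slice values with $push need MongoDB 2.6 or later):
db.collection.update(
    { "dep_id": "some_id" },
    { "$push": {
        "employees": {
            "$each": [{ "name": "emp4", "age": 40 }],
            "$sort": { "age": -1 },
            "$slice": 10   // after sorting by age descending, keep only the first 10 elements
        }
    }}
)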
In order to get what you want you can use the following query:
db.collection.aggregate([
    { "$unwind": "$employees" },           // flatten the employees array
    { "$sort": { "employees.age": -1 } },  // sort all documents by employee age (descending)
    { "$group": {                          // restore the previous structure
        "_id": "$_id",
        "dep_id": { "$first": "$dep_id" },
        "departament": { "$first": "$departament" },
        "employees": { "$push": "$employees" }
    }},
    { "$out": "output" }                   // write everything out to a separate collection
])
After this step you would want to drop your source collection and rename the "output" collection to match your source collection name.
This solution will, however, not deal with the concurrency issue. So you should remove write access from the collection first so nobody modifies it during the process and then restore it once you're done with the migration.
You could alternatively query all data first, then sort the employees array on the client side, and then use either single update queries or - faster but more complicated - a bulk write operation with all the individual update calls in order to update the existing documents. Here, you could use the entire document that you initially read as the filter for the update operation. So if an individual update does not modify any document, you know straight away that some other change must have modified the document you read before. Those cases you'd need to retry later (or straight away, until the update does actually modify a document).
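A sketch of that "read, sort, conditionally write back" idea (mongosh-style shell; the retry handling is only indicated):
var doc = db.collection.findOne({ "dep_id": "some_id" });
var sorted = doc.employees.slice().sort(function (a, b) { return b.age - a.age; });
var res = db.collection.updateOne(
    // matching on the exact array we read makes the update a no-op if anything changed it in between
    { "_id": doc._id, "employees": doc.employees },
    { "$set": { "employees": sorted } }
);
if (res.matchedCount === 0) {
    // something else modified the document in the meantime; re-read and retry
}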

Should I use selector or views in Cloudant?

I'm having confusion about whether to use selector or views, or both, when trying to get a result for the following scenario:
I need to do a wildcard search for a book and return the result of the books plus the price and the details of the store branch name.
So I tried using a selector to do the wildcard search using regex:
"selector": {
"_id": {
"$gt": null
},
"type":"product",
"product_name": {
"$regex":"(?i)"+search
}
},
"fields": [
"_id",
"_rev",
"product_name"
]
I am able to get the result. The idea after getting the result is to use all the _id's from the result set and query the views to get more details, like price and store branch name, from other documents, which feels kind of odd, and I'm not certain it's the correct way to do it.
Below is just the idea: once I get the result of _id's, I insert one as a "productId" variable.
var input = {
    method : 'GET',
    returnedContentType : 'json',
    path : 'test/_design/app/_view/find_price' + "?keys=[\"" + productId + "\"]",
};
return WL.Server.invokeHttp(input);
So I'm asking for input from an expert regarding this.
Another question is how to get the store_branch_name? Can it be done in a single view where we can get the product detail, prices and store branch name? Or do I need to have several views to achieve this?
Expected result:
product_name (from book document) : Book 1
branch_name (from branch array in Store document) : store 1 branch one
price ( from relationship document) : 79.9
References:
Book
"_id": "book1",
"_rev": "1...b",
"product_name": "Book 1",
"type": "book"
"_id": "book2",
"_rev": "1...b",
"product_name": "Book 2 etc",
"type": "book"
relationship
"_id": "c...5",
"_rev": "3...",
"type": "relationship",
"product_id": "book1",
"store_branch_id": "Store1_branch1",
"price": "79.9"
Store
{
  "_id": "store1",
  "_rev": "1...2",
  "store_name": "Store 1 Name",
  "type": "stores",
  "branch": [
    {
      "branch_id": "store1_branch1",
      "branch_name": "store 1 branch one",
      "address": {
        "street": "some address",
        "postalcode": "33490",
        "type": "addresses"
      },
      "geolocation": {
        "coordinates": [
          42.34493,
          -71.093232
        ],
        "type": "point"
      },
      "type": "storebranch"
    },
    {
      "branch_id": "store1_branch2",
      "branch_name":
      **details ommit...**
    }
  ]
}
In Cloudant Query, you can specify two different kinds of indexes, and it's important to know the differences between the two.
For the first part of your question, if you're using Cloudant Query's $regex operator for wildcard searches like that, you might be better off creating a Cloudant Query index of type "text" instead of type "json". It's in the Cloudant docs, but see the intro blog post for details: https://cloudant.com/blog/cloudant-query-grows-up-to-handle-ad-hoc-queries/ There's also a more advanced post that covers the tradeoffs between the two types of indexes: https://cloudant.com/blog/mango-json-vs-text-indexes/
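For example, a text index over the field you're matching on could be defined through the _index endpoint along these lines (a sketch; the database and index names are assumptions):
POST /yourdb/_index
{
  "name": "product-name-text",
  "type": "text",
  "index": {
    "fields": [
      { "name": "product_name", "type": "string" }
    ]
  }
}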
It's harder to address the second part of your question without understanding how your application interacts with your data, but here are a couple of pieces of advice.
1) Consider denormalizing some of this information so you're not doing the JOINs to begin with.
2) Inject more logic into your document keys, and use the traditional MapReduce View indexing system to emit a compound key (an array), that you can use to emulate a JOIN by taking advantage of the CouchDB/Cloudant index sorting rules.
That second one's a mouthful, but check out this example on YouTube: https://youtu.be/0al1KnCKjlA?t=23m39s
Here's a preview (example map function) of what I'm talking about:
'map' : function(doc)
{
    if (doc.type === "user") {
        emit( [doc._id], null );
    }
    else if (doc.type === "edge:follower") {
        emit( [doc.user, doc.follows], { "_id": doc.follows } );
    }
}
The resulting secondary index here would take advantage of the rules outlined in http://wiki.apache.org/couchdb/View_collation -- that strings sort before arrays, and arrays sort before objects. You could then issue range queries to emulate the results you'd get with a JOIN.
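For example, with the map function above, one range query per user pulls the user row plus all of its "follows" rows in one request, and because the emitted value contains an _id, adding include_docs=true makes the server also return each followed document. A sketch (the database, design document and view names are assumptions, and the key parameters would need URL-encoding in practice):
GET /yourdb/_design/app/_view/follows?startkey=["user123"]&endkey=["user123",{}]&include_docs=true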
I think that's as much detail that's appropriate for here. Hope it helps!

How to get (or aggregate) distinct keys of array in MongoDB

I'm trying to get MongoDB to aggregate for me over an array with different key-value pairs, without knowing the keys (just a simple sum would be OK).
Example docs:
{data: [{a: 3}, {b: 7}]}
{data: [{a: 5}, {c: 12}, {f: 25}]}
{data: [{f: 1}]}
{data: []}
So basically each doc (or its array, really) can have 0 or many entries, and I don't know the keys for those objects, but I want to sum and average the values over those keys.
Right now I'm just loading a bunch of docs and doing it myself in Node, but I'd like to offload that work to MongoDB.
I know I can unwind those first, but how to proceed from there? How to sum/avg/min/max the values if I don't know the keys?
If you do not know the keys or cannot make a reasonable educated guess, then you are basically stuck and cannot go any further with the aggregation framework. You could supply "all of the keys" for consideration, but I suspect your actual data looks more like this:
{ "data": [{ "film": 10 }, { "television": 5 }, { "boardGames": 1 }] }
So there would be little point in finding out all the "key names" and then throwing them at an aggregation statement.
For the record though, "this is why you do not structure your data storage like this". Information like "film" here should not be used as a "key" name, because it is useful "data" that could be searched upon and most importantly "indexed" in a database system.
So your data should really look like this:
{
  "data": [
    { "type": "film", "value": 10 },
    { "type": "television", "value": 5 },
    { "type": "boardGames", "value": 1 }
  ]
}
Then the aggregation statement is simple, as are many other things:
db.collection.aggregate([
    { "$unwind": "$data" },
    { "$group": {
        "_id": null,
        "sum": { "$sum": "$data.value" },
        "avg": { "$avg": "$data.value" }
    }}
])
But since the key names are constantly changing between documents and do not have a uniform structure, you need JavaScript processing on the server to traverse the keys, and that means mapReduce:
db.collection.mapReduce(
    function() {
        this.data.forEach(function(data) {
            Object.keys(data).forEach(function(key) {
                emit(null, data[key]); // emit the value regardless of key name
            });
        });
    },
    function(key, values) {
        return Array.sum(values); // just summing, for example
    },
    { "out": { "inline": 1 } }
)
And of course the JavaScript execution here will work much more slowly than the native coded operators available to the aggregation framework.
So this should be an object lesson as to why you do not use "data" as "key names" when storing data in a database. The aggregation framework works with standard structures and is fast; falling back to JavaScript processing is more flexible, but the cost is mostly in speed and other features.

Multi-language attributes in MongoDB

I'm trying to design a schema paradigm in MongoDB which would support multilingual values for variable attributes in documents.
For example, I would have a product catalog where each product may require storing its name, title or any other attribute in various languages.
This same paradigm should probably hold for other locale-specific properties, such as price/currency variations.
I've been considering a key-value approach where key is the language code and value is the corresponding value:
{
  sku: "1011",
  name: { "en": "cheese", "de": "Käse", "es": "queso", etc... },
  price: { "usd": 30.95, "eur": 20, "aud": 40, etc... }
}
The problem is I believe this would prevent me from using indexes on the multilingual fields.
Eventually, I'd like a generic, yet intuitive, index-able design.
Any suggestion would be appreciated, thanks.
Wholesale recommendations about your schema design may be a bit too broad a topic for discussion here. I can, however, suggest that you consider putting the elements you are showing into an array of sub-documents, rather than a singular sub-document with a field for each item.
{
  sku: "1011",
  name: [{ "en": "cheese" }, { "de": "Käse" }, { "es": "queso" }, etc... ],
  price: [{ "usd": 30.95 }, { "eur": 20 }, { "aud": 40 }, etc... ]
}
The main reason for this is consideration of the access paths to your elements, which should make things easier to query. I went through this in some detail here, which may be worth reading.
It could also be a possibility to expand on this for something like your name field:
name: [
  { "lang": "en", "value": "cheese" },
  { "lang": "de", "value": "Käse" },
  { "lang": "es", "value": "queso" },
  etc...
]
All of this would depend on your indexing and access requirements. It really comes down to what exactly your application needs, and the beauty of MongoDB is that it allows you to structure your documents to your needs.
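For example, with the lang/value form, indexing and querying a specific translation could look something like this (a sketch; the collection name products is an assumption):
// compound index over the translation sub-documents
db.products.createIndex({ "name.lang": 1, "name.value": 1 })
// find the product whose German name is "Käse"
db.products.find({ "name": { "$elemMatch": { "lang": "de", "value": "Käse" } } })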
P.S. As for anything where you are storing money values, I suggest you do some reading, starting maybe with this post:
MongoDB - What about Decimal type of value?

How to store option data in mongo?

I have a site that I'm using Mongo on. So far everything is going well. I've got several fields that are static option data, for example a field for animal breeds and another field for animal registrars.
Breeds
Arabian
Quarter Horse
Saddlebred
Registrars
AQHA
American Arabians
There are maybe 5 or 6 different collections like this that range from 5-15 elements.
What is the best way to put these in Mongo? Right now, I've got a separate collection for each group. That is a breeds collection, a registrars collection etc.
Is that the best way, or would it make more sense to have a single static data collection with a "type" field specifying the option type?
Or something else completely different?
Since this data is static, it's better to just embed it in your documents; that way you don't have to do manual joins.
Also store it in a separate collection (one or several, doesn't matter, choose whatever's easier) to facilitate presentation (rendering combo-boxes, etc.).
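A rough sketch of what that could look like (collection and field names here are assumptions):
// one document per option group, read once (and cached) to render the combo-boxes
db.options.insert({ "type": "breeds", "values": ["Arabian", "Quarter Horse", "Saddlebred"] })
db.options.insert({ "type": "registrars", "values": ["AQHA", "American Arabians"] })
// the animal document embeds the chosen values directly, so reading it needs no join
db.animals.insert({ "name": "Some Horse", "breed": "Arabian", "registrar": "AQHA" })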
I believe creating multiple collections has size implications on disk (something about MongoDB creating each new data file at twice the size of the previous one: db.0 = 64MB, db.1 = 128MB, and so on).
Here's what I can think of:
1. Storing as a single collection
The benefits here are:
You only need one call to Mongo to fetch, and if you can cache the call you quickly have the data.
You avoid duplication: create a single schema that deals with all your options. You can just nest suboptions if there are any.
Of course, you also avoid duplication in statics/methods to modify options.
I have something similar on a project that I'm working on. I have categories and subcategories all stored in one collection. Here's a JSON/BSON dump as example:
In all the data where I need to store my 'options' (station categories in my case) I simply use the _id.
{
  "status": {
    "code": 200
  },
  "response": {
    "categories": [
      {
        "cat": "Taxi",
        "_id": "50b92b585cf34cbc0f000004",
        "subcat": []
      },
      {
        "cat": "Bus",
        "_id": "50b92b585cf34cbc0f000005",
        "subcat": [
          {
            "cat": "Bus Rapid Transit",
            "_id": "50b92b585cf34cbc0f00000b"
          },
          {
            "cat": "Express Bus Service",
            "_id": "50b92b585cf34cbc0f00000a"
          },
          {
            "cat": "Public Transport Bus",
            "_id": "50b92b585cf34cbc0f000009"
          },
          {
            "cat": "Tour Bus",
            "_id": "50b92b585cf34cbc0f000008"
          },
          {
            "cat": "Shuttle Bus",
            "_id": "50b92b585cf34cbc0f000007"
          },
          {
            "cat": "Intercity Bus",
            "_id": "50b92b585cf34cbc0f000006"
          }
        ]
      },
      {
        "cat": "Rail",
        "_id": "50b92b585cf34cbc0f00000c",
        "subcat": [
          {
            "cat": "Intercity Train",
            "_id": "50b92b585cf34cbc0f000012"
          },
          {
            "cat": "Rapid Transit/Subway",
            "_id": "50b92b585cf34cbc0f000011"
          },
          {
            "cat": "High-speed Rail",
            "_id": "50b92b585cf34cbc0f000010"
          },
          {
            "cat": "Express Train Service",
            "_id": "50b92b585cf34cbc0f00000f"
          },
          {
            "cat": "Passenger Train",
            "_id": "50b92b585cf34cbc0f00000e"
          },
          {
            "cat": "Tram",
            "_id": "50b92b585cf34cbc0f00000d"
          }
        ]
      }
    ]
  }
}
I have a call to my API that gets me that document (app.ly/api/v1/stationcategories). I find this much easier to code with.
In your case you could have something like:
{
  "option": "Breeds",
  "_id": "xxx",
  "suboption": [
    {
      "option": "Arabian",
      "_id": "xxx"
    },
    {
      "option": "Quarter Horse",
      "_id": "xxx"
    },
    {
      "option": "Saddlebred",
      "_id": "xxx"
    }
  ]
},
{
  "option": "Registrars",
  "_id": "xxx",
  "suboption": [
    {
      "option": "AQHA",
      "_id": "xxx"
    },
    {
      "option": "American Arabians",
      "_id": "xxx"
    }
  ]
}
Whenever you need them, either loop through them, or pull specific options from your collection.
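For example, pulling the whole "Breeds" group, or just one suboption from it, could look like this (a sketch; the collection name options is an assumption):
// fetch the whole "Breeds" group
db.options.findOne({ "option": "Breeds" })
// or project only the matching suboption
db.options.find(
    { "option": "Breeds" },
    { "suboption": { "$elemMatch": { "option": "Arabian" } } }
)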
2. Storing as a static JSON document
This, as @Sergio mentioned, is a viable and simpler approach. You can then either have separate docs for separate options, or put them all in one document.
You do lose some flexibility here because you can't reference options by id (which I prefer, because changing an option's name then doesn't affect all your other data).
It's also prone to typos (though if you know what you're doing this shouldn't be a problem).
For Node.js users: this might leave you with a require('../../../options.json') headache, similar to PHP.
The reader will note that I'm being negative about this approach; it works, but is rather inflexible.
Though we're discouraged from using joins unnecessarily on MongoDB, referencing by ObjectId is sometimes useful and extensible.
An example: say your website becomes popular in one region of the world and people from Poland start accounting for 50% of your site visits, so you decide to add Polish translations. With a static JSON file you would need to go back to all your documents and add Polish names (where they exist) to your options. Using approach 1, it's as easy as adding a Polish name to your options and then plucking it from your options collection at runtime.
I could only think of two options other than storing each option as a collection.
UPDATE: If someone has positives or negatives for either approach, please add them. My bias might be unhelpful to some people, as there are benefits to storing static JSON files.
MongoDB is schemaless and JOINs are not supported, so you have to move away from the RDBMS mindset of normalization; this is simply a different kind of database.
Here are a few rules you can apply as guidelines while designing. Of course, you still have the choice of keeping things in a separate collection when needed.
Static Master/Reference Data:
Always embed this data in your documents wherever required. Since the data is not going to change, it is not at all a bad idea to keep it in the same collection. If the data is too large, group it and store it in a separate collection instead of creating multiple collections for this master data itself.
NOTE: When embedding data as sub-documents, always make sure that you are never going to exceed the 16MB limit. That is the limit (at this point) for each document in MongoDB.
Dynamic Master/Reference Data:
Try to keep it in a separate collection, as this master data tends to change often.
Always remember: there is no join support, so keep the data in a way that you can easily access it without querying the database too many times.
So there is no single suggested best way; it always depends on your needs, and the design can go either way.