Build a CouchDB view to index all documents whose ID starts with various three or four characters?

I'm new to NoSQL and views. I'm wondering if someone could show me how to build an index that returns all the documents matching several different keys. An example is below.
I have many documents that all follow this naming convention:
AABA_August-11-2017_2017-06-29_10
BBY_August-11-2017_2017-06-29_10
CECO_January-19-2018_2017-06-08_19
GEL_December-15-2017_2017-06-08_1
Etc.
I'd like a view that lets me query on "starts with BBY", for example, and returns all the documents whose IDs start with BBY. Maybe even "BBY_December", "BBY_August", etc.
I'm wondering if this is possible and what it would look like. I'm using CouchDB, which uses Mango to build indexes. If someone could just point me in the right direction, that would help too.
Thanks

You could write such a view like this:
function(doc) {
  var docId = doc._id;
  var p = docId.substring(0, 3); // first three characters; use however many you need
  if (p === 'BBY') emit(doc._id, doc); // or whatever kind of key you want
}
Then write similar views for the other prefixes. Views also accept query parameters similar to those of the _all_docs endpoint (http://docs.couchdb.org/en/2.0.0/api/ddoc/views.html).
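Instead of one view per prefix, a single view can emit the prefix itself as the key, so one index serves every prefix. A minimal sketch, assuming the prefix is everything before the first underscore and the month is the first part of the second segment, as in the IDs from the question. The `emit` stub only stands in for CouchDB's built-in so the map function can be exercised outside CouchDB; in a design document you would store just the map function.

```javascript
// Collected rows; inside CouchDB, emit() is provided by the view server.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: key each document on [prefix, month] taken from its ID,
// e.g. "BBY_August-11-2017_2017-06-29_10" -> ["BBY", "August"].
// Assumes IDs follow the PREFIX_Month-Day-Year_... convention.
var map = function (doc) {
  var parts = doc._id.split("_");
  if (parts.length < 2) return;          // skip IDs that don't match the convention
  var month = parts[1].split("-")[0];    // "August-11-2017" -> "August"
  emit([parts[0], month], null);         // null value: fetch docs with include_docs=true
};

// Exercise the map function against the sample IDs from the question.
[
  "AABA_August-11-2017_2017-06-29_10",
  "BBY_August-11-2017_2017-06-29_10",
  "CECO_January-19-2018_2017-06-08_19",
  "GEL_December-15-2017_2017-06-08_1"
].forEach(function (id) { map({ _id: id }); });

console.log(rows.map(function (r) { return r.key.join("/"); }));
```

With such a view, `?key=["BBY","August"]` would return the BBY August documents, and `?startkey=["BBY"]&endkey=["BBY",{}]` all BBY documents, since `{}` collates after strings in CouchDB.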
I think the only benefit of using a view instead of what you have done is that you can filter unnecessary fields, do some basic transformations, etc.
Given how similar retrieval from _all_docs is to retrieval from a view, the _all_docs endpoint looks like just an index similar to a custom view, but I'm not sure about that.
I'm not sure how to do the same with Mango.

My current naming convention required no new index. I used Futon to find:
ip:port/DB/_all_docs?inclusive_end=true&start_key="BBY_Aug"&end_key="BBY_Auh"
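The same _all_docs range trick works for any prefix. A small sketch; the helper name `prefixRangeUrl`, the base URL and the database name are hypothetical. Instead of bumping the last character by hand ("Aug" to "Auh"), it appends the high code point `\ufff0` to the prefix for the end key, an idiom the CouchDB collation docs describe for prefix matching. Keys must be JSON-encoded (quoted) and then URL-encoded.

```javascript
// Build an _all_docs URL that returns every document whose ID starts with `prefix`.
function prefixRangeUrl(baseUrl, db, prefix) {
  var startKey = JSON.stringify(prefix);
  // "\ufff0" sorts after ordinary characters, so it closes the prefix range.
  var endKey = JSON.stringify(prefix + "\ufff0");
  return baseUrl + "/" + encodeURIComponent(db) + "/_all_docs" +
    "?start_key=" + encodeURIComponent(startKey) +
    "&end_key=" + encodeURIComponent(endKey) +
    "&include_docs=true";
}

console.log(prefixRangeUrl("http://localhost:5984", "mydb", "BBY_Aug"));
```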

Related

Finding the number of different values of an attribute

I am writing an ASP.NET MVC 5 application in C#, using a MongoDB database. Suppose I have a MongoDatabase object called my_db, which contains a MongoCollection of Label objects, called labels. Each Label object has a few attributes, one of which is a string called tag. Each tag value may be shared across different Labels, such that some Label objects will have the same value for tag.
I want to find out how many different values for tag there are in this collection, and store these values in an array of some sort.
I'm fairly new to MongoDB, so I don't really know how to do this. All I have done so far is get labels:
var labels = my_db.GetCollection<Label>("labels");
But I'm stuck as to what I need to do now. I could manually iterate through each Label in labels, and check whether that Label's tag attribute has already been seen before. But is there a neater way to do this with a MongoDB function? Thanks!
There is a MongoDB method for this, distinct, which should be available in any driver's API.
As you are doing this in an MVC 5 C# application, note that the MongoDB C# driver supports LINQ, which lets you query MongoDB using LINQ.
http://docs.mongodb.org/ecosystem/tutorial/use-linq-queries-with-csharp-driver/
// Requires: using System.Linq; using MongoDB.Driver.Linq;
var tags = (from e in labels.AsQueryable<Label>()
            select e.tag).Distinct().ToArray();
Hope this helps.
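In the mongo shell the equivalent query is `db.labels.distinct("tag")`. To show what that returns, the in-memory JavaScript below computes the same set of unique tag values; the sample labels and the `distinctTags` helper are made up for illustration, and the server may return the values in a different order.

```javascript
// Hypothetical sample data shaped like the Label documents in the question.
var labels = [
  { _id: 1, tag: "red" },
  { _id: 2, tag: "blue" },
  { _id: 3, tag: "red" },
  { _id: 4, tag: "green" }
];

// In-memory equivalent of db.labels.distinct("tag"): keep each tag value once
// (in order of first appearance here) and return them as an array.
function distinctTags(docs) {
  return Array.from(new Set(docs.map(function (d) { return d.tag; })));
}

var tags = distinctTags(labels);
console.log(tags.length, tags); // 3 distinct tags: red, blue, green
```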

How to change live data?

How do I change a live data schema in MongoDB?
For example, if I have a "Users" collection with the following document:
var user = {
_id:123312,
name:"name",
age:12,
address:{
country:"",
city:"",
location:""
}
};
Now, in a new version of my application, suppose I add a new property to the "User" entity, say weight, height or adult (based on the user's age). How do I change all the current live data that does not have the adult property? I read about MapReduce and the group aggregation command, but they seem suited to analytics or other calculations. Or am I wrong?
So what is the best way to change your current running data schema in MongoDB ?
It really depends on your programming language. MongoDB is really good at having a dynamic schema. I think your current way of thinking is too SQL-oriented: you assume that all rows, even those that don't have a value yet, must have the new field.
The reality is quite different. Rows that have nothing meaningful to put in the field don't need it, and your application can simply check whether the returned document has a value; if not, it can assume the value is null, just as it would with a fixed SQL schema.
This is one area where MongoDB shines: you don't have to apply the new field to every document at once. Instead, you can lazily fill it in as users enter data.
So just code the field into your application and let the users do the work for you.
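On the application side, lazy filling just means treating a missing field as a default when a document is read. A minimal sketch; the `withDefaults` helper and the 18-year cutoff for `adult` are hypothetical choices, not part of MongoDB or the question.

```javascript
// Assumed cutoff for "adult"; adjust to your own business rule.
var ADULT_AGE = 18;

// Fill in fields that older documents may lack, instead of migrating them.
// `doc` is a user document as returned by the driver (a plain object here).
function withDefaults(doc) {
  var user = Object.assign({}, doc);     // copy, so the original object is not mutated
  if (user.adult === undefined) {
    user.adult = typeof user.age === "number" ? user.age >= ADULT_AGE : null;
  }
  if (user.weight === undefined) {
    user.weight = null;                  // unknown until the user supplies it
  }
  return user;
}

var oldDoc = { _id: 123312, name: "name", age: 12 };
var user = withDefaults(oldDoc);
console.log(user.adult, user.weight, "adult" in oldDoc); // false null false
```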
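If you do want to backfill the field, the best way is to write a loop in the mongo shell, run on the primary of your replica set (if you have one; otherwise just on the server), like so:
db.users.find({ weight: { $exists: false } }).forEach(function(doc){
  // Only documents still missing the field are touched, so the loop can be re-run safely.
  doc.weight = '44 stone';
  db.users.save(doc); // legacy mongo shell; in mongosh use updateOne({_id: doc._id}, {$set: {weight: doc.weight}})
});
That is currently the best way to do something like what you're asking.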

Most efficient way to store nested categories (or hierarchical data) in Mongo?

We have nested categories for several products (e.g., Sports -> Basketball -> Men's, Sports -> Tennis -> Women's ) and are using Mongo instead of MySQL.
We know how to store nested categories in a SQL database like MySQL, but would appreciate any advice on what to do for Mongo. The operation we need to optimize for is quickly finding all products in one category or subcategory, which could be nested several layers below a root category (e.g., all products in the Men's Basketball category or all products in the Women's Tennis category).
This Mongo doc suggests one approach, but it says it doesn't work well when operations are needed for subtrees, which we need (since categories can reach multiple levels).
Any suggestions on the best way to efficiently store and search nested categories of arbitrary depth?
The first thing you want to decide is exactly what kind of tree you will use.
The big thing to consider is your data and access patterns. You have already stated that 90% of all your work will be querying and by the sounds of it (e-commerce) updates will only be run by administrators, most likely rarely.
So you want a schema that lets you query children quickly through a path, e.g. Sports -> Basketball -> Men's or Sports -> Tennis -> Women's, and that doesn't really need to scale for updates.
As you so rightly pointed out, MongoDB does have a good documentation page for this: https://docs.mongodb.com/manual/applications/data-models-tree-structures/ where 10gen describes different models and schema methods for trees, along with the main ups and downs of each.
The one that should catch the eye if you are looking to query easily is materialised paths: https://docs.mongodb.com/manual/tutorial/model-tree-structures-with-materialized-paths/
This is a very interesting way to build trees, since to query your example above for "Womens" in "Tennis" you could simply use a prefix regex (which can use the index: http://docs.mongodb.org/manual/reference/operator/regex/) that matches the path itself or anything below it, like so:
db.products.find({category: /^Sports,Tennis,Womens(,|$)/})
to find all products listed under a certain path of your tree.
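Category names can contain regex metacharacters, so it helps to build the prefix regex programmatically. A sketch with hypothetical helpers `escapeRegExp` and `subtreeRegex`; the resulting RegExp could be passed as the `category` value in a `find()` filter, but here it is only tested against sample path strings.

```javascript
// Escape regex metacharacters so category names are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build an anchored prefix regex for a materialized path: it matches the path
// itself and every path beneath it, but not siblings that merely share a prefix.
function subtreeRegex(pathParts) {
  return new RegExp("^" + pathParts.map(escapeRegExp).join(",") + "(,|$)");
}

var re = subtreeRegex(["Sports", "Tennis", "Womens"]);
var paths = [
  "Sports,Tennis,Womens",
  "Sports,Tennis,Womens,Shoes",
  "Sports,Tennis,WomensPlus",
  "Sports,Basketball,Mens"
];
console.log(paths.filter(function (p) { return re.test(p); }));
// matches only the first two paths
```

Because the regex starts with `^` followed by literal text, MongoDB can use an index on the path field to narrow the scan.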
Unfortunately this model is really bad at updating: if you move a category or change its name, you have to update all its products, and there could be thousands of products under one category.
A better method would be to house a cat_id on the product and then separate the categories into a separate collection with the schema:
{
_id: ObjectId(),
name: 'Women\'s',
path: 'Sports,Tennis,Womens',
normed_name: 'all_special_chars_and_spaces_and_case_sensitive_letters_taken_out_like_this'
}
So now your queries only involve the categories collection which should make them much smaller and more performant. The exception to this is when you delete a category, the products will still need touching.
So an example of changing "Tennis" to "Badmin":
db.categories.find({path: /^Sports,Tennis(,|$)/}).forEach(function(doc){
  // Replace only the leading "Sports,Tennis" segment, not a later ",Tennis".
  doc.path = doc.path.replace(/^Sports,Tennis/, "Sports,Badmin");
  db.categories.save(doc);
});
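The path rewrite inside such a loop can be tried out in plain JavaScript before running it against the database. A sketch on hypothetical in-memory documents; `renameSubtree` is an illustrative helper, and in the shell the same logic would sit inside the `forEach` over the matching categories. It only rewrites paths, not the renamed category's `name` field.

```javascript
// Hypothetical category documents using the materialized-path schema above.
var categories = [
  { name: "Tennis", path: "Sports,Tennis" },
  { name: "Women's", path: "Sports,Tennis,Womens" },
  { name: "Men's", path: "Sports,Basketball,Mens" }
];

// Rename the node at `oldPath` to `newName`, rewriting its own path and the
// paths of all its descendants. Only the leading `oldPath` prefix is replaced,
// so a segment with the same name elsewhere in a path is left alone.
function renameSubtree(docs, oldPath, newName) {
  var parts = oldPath.split(",");
  parts[parts.length - 1] = newName;
  var newPath = parts.join(",");
  docs.forEach(function (doc) {
    if (doc.path === oldPath || doc.path.indexOf(oldPath + ",") === 0) {
      doc.path = newPath + doc.path.slice(oldPath.length);
    }
  });
  return docs;
}

renameSubtree(categories, "Sports,Tennis", "Badmin");
console.log(categories.map(function (c) { return c.path; }));
```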
Unfortunately MongoDB provides no in-query document reflection at the moment so you do have to pull them out client side which is a little annoying, however hopefully it shouldn't result in too many categories being brought back.
And this is basically how it works really. It is a bit of a pain to update but the power of being able to query instantly on any path using an index is more fitting for your scenario I believe.
Of course the added benefit is that this schema is compatible with nested set models: http://en.wikipedia.org/wiki/Nested_set_model which I have found time and time again are just awesome for e-commerce sites, for example, Tennis might be under both "Sports" and "Leisure" and you want multiple paths depending on where the user came from.
The schema for materialised paths easily supports this: just add another path. It's that simple.
Hope it makes sense, quite a long one there.
If all categories are distinct, think of them as tags. The hierarchy doesn't need to be encoded in the items, because you don't need it when you query for items; the hierarchy is a presentational thing. Tag each item with all the categories in its path, so "Sport > Baseball > Shoes" could be saved as {..., categories: ["sport", "baseball", "shoes"], ...}. If you want all items in the "Sport" category, search for {categories: "sport"}; if you want just the shoes, search for {categories: "shoes"}.
This doesn't capture the hierarchy, but if you think about it that doesn't matter. If the categories are distinct, the hierarchy doesn't help you when you query for items. There will be no other "baseball", so when you search for that you will only get things below the "baseball" level in the hierarchy.
My suggestion relies on categories being distinct, and I guess they aren't in your current model. However, there's no reason why you can't make them distinct. You've probably chosen to use the strings you display on the page as category names in the database. If you instead use symbolic names like "sport" or "womens_shoes" and a lookup table to find the string to display on the page (which will also save you hours of work if a category name ever changes, and will make translating the site easier if you ever need to), you can easily keep them distinct, because they have nothing to do with what is displayed on the page. So if you have two "Shoes" in the hierarchy (for example "Tennis > Women's > Shoes" and "Tennis > Men's > Shoes"), you can just add a qualifier to make them distinct (for example "womens_shoes" and "mens_shoes", or "tennis_womens_shoes"). The symbolic names are arbitrary and can be anything; you could even use numbers and take the next number in the sequence every time you add a category.
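The tag approach with symbolic names and a display lookup table could look roughly like this. A sketch on hypothetical in-memory data; `inCategory` mimics what a query such as `{categories: "tennis"}` does, since MongoDB matches a scalar against any element of an array field.

```javascript
// Hypothetical lookup table: symbolic category name -> label shown on the page.
var categoryLabels = {
  sport: "Sport",
  tennis: "Tennis",
  tennis_womens: "Women's",
  tennis_mens: "Men's",
  tennis_womens_shoes: "Shoes",
  tennis_mens_shoes: "Shoes"
};

// Each item is tagged with every category on its path, using symbolic names.
var items = [
  { name: "Court shoe W", categories: ["sport", "tennis", "tennis_womens", "tennis_womens_shoes"] },
  { name: "Court shoe M", categories: ["sport", "tennis", "tennis_mens", "tennis_mens_shoes"] },
  { name: "Racket", categories: ["sport", "tennis"] }
];

// In-memory equivalent of find({categories: tag}).
function inCategory(docs, tag) {
  return docs.filter(function (d) { return d.categories.indexOf(tag) !== -1; });
}

console.log(inCategory(items, "tennis").length);               // 3
console.log(inCategory(items, "tennis_womens_shoes")[0].name); // Court shoe W
console.log(categoryLabels.tennis_womens_shoes);               // Shoes (display label)
```

Both shoe categories display as "Shoes", yet their symbolic names stay distinct, which is what keeps the flat tag query unambiguous.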

Interactive sorting of grouped bar chart in D3js

I have generated a grouped bar chart based on the example provided in the D3js.org example repository. Now I am trying to introduce an interactive sorting option based on another example from the D3js example sets.
I have three variables grouped per state. I was hoping to provide an interaction where the reader can sort (descending) based on:
1. Any one of the variables (but whole group should move)
3. Three different sorting options, one for each variable (complicated and less important)
I am new to javascript and D3js so I am not sure of the way moving forward. Any suggestion would be greatly appreciated.
Without seeing your code, I can only offer a very vague direction:
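you may need a function that gets called each time the viewer changes the sorting option. Inside the function, you will need to specify a different accessor:
var update = function(_value){
  // _value is the name of the variable to sort by, so use bracket notation;
  // b._value would look up a property literally called "_value".
  data.sort(function(a, b) { return b[_value] - a[_value]; });
  // add a transition with the newly sorted data
};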
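The key point is that the groups (states) are sorted as whole objects, so each state's bars move together. A sketch of just the ordering step, on hypothetical data; `sortGroupsBy` and the scale name `x0` (the outer band scale in the D3 grouped bar chart example) are assumptions, and the D3 calls are only described in comments.

```javascript
// Hypothetical grouped data: one object per state, one property per variable.
var data = [
  { state: "CA", a: 30, b: 12, c: 5 },
  { state: "TX", a: 18, b: 25, c: 9 },
  { state: "NY", a: 22, b: 7, c: 14 }
];

// Sort the groups (descending) by the chosen variable and return the new order
// of group names. A copy is sorted, so the original data array is left as-is.
function sortGroupsBy(rows, key) {
  var sorted = rows.slice().sort(function (x, y) { return y[key] - x[key]; });
  return sorted.map(function (d) { return d.state; });
}

// In the chart, this order would become the outer band scale's domain,
// e.g. x0.domain(order), followed by a transition on each group's transform.
var order = sortGroupsBy(data, "b");
console.log(order); // TX, CA, NY
```

Calling it once per variable ("a", "b", "c") from three buttons would cover the three sorting options.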

CouchDB query using :group_level and :key

I am using CouchDB 1.1.1 for my web app, and everything has worked great so far (saving/retrieving documents, saving/querying views, etc.), but I am stuck on querying a view for a particular key at a particular group level.
The map function in my view outputs keys with the following format: ["Thing 1", "Thing 2"]. I have a reduce function that works fine and outputs correct values for group level 1 (i.e. by "Thing 1") and group level 2 (i.e. by "Thing 1" and "Thing 2").
Now, when I query CouchDB I CAN grab just one particular key when I set reduce=true (the default), group_level=2 (or group=true, which is the same in this case since I only have 2 levels) and key="desiredkeyhere". I can also query multiple keys with keys=["key1", "key2"].
HOWEVER, I really want to be able to grab a particular key for group_level=1, and I cannot get that to work. It seems to return nothing, or if I use a POST request, it returns everything. Never just the one key that I need.
Here's a link to the CouchDB HTTP view API (querying options) that I've been using:
http://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options
It contains the following sentence:
"Note: Multiple keys request to a reduce function only supports group=true and NO group_level (identical to group_level=exact). The resulting error is "Multi-key fetchs for reduce view must include group=true""
I'm not sure if this means that I cannot do what I have described above (grab a particular key for a particular group_level). That would seem like a huge problem with CouchDB, so I'm assuming I'm doing something wrong.
Any ideas? Thanks
I have hit this too. I am not sure if it is a bug, though.
Try using your startkey and endkey in the normal (2-item) format. You want a result for ["Thing 1", *] (obviously pseudocode, the star represents anything). Reducing with group_level=1 will boil all of that down to one row.
So, query basically everything in the Thing 1 "namespace," so to speak. Since the "smallest" value to collate is null and the "greatest" value is the object {}, those make good bookends for your range.
?group_level=1&startkey=["Thing 1",null]&endkey=["Thing 1",{}]
Does that give you the result you need?
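When sending that query from code, the array keys have to be JSON-encoded and then URL-encoded. A small sketch; the helper `groupLevel1Query` and the view URL are hypothetical.

```javascript
// Build a reduce-view query that returns the single group_level=1 row for `first`.
// [first, null] .. [first, {}] spans every key whose first element is `first`,
// because in CouchDB collation null sorts before, and {} after, any string or
// number used as the second element.
function groupLevel1Query(viewUrl, first) {
  var params = [
    "group_level=1",
    "startkey=" + encodeURIComponent(JSON.stringify([first, null])),
    "endkey=" + encodeURIComponent(JSON.stringify([first, {}]))
  ];
  return viewUrl + "?" + params.join("&");
}

console.log(groupLevel1Query("http://localhost:5984/db/_design/app/_view/by_things", "Thing 1"));
```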