CouchDB and querying not by document ID - rest

Is my understanding correct that documents in CouchDB can only be queried in a RESTful way by document ID, but not by attributes (e.g. something like .../person?firstname=john)?
For every query that is not by document ID, do I need a view (which means writing a map function)?
Greetings Kudi

Yes, CouchDB does not support ad-hoc queries; instead you define views by writing a map function. You are in full control of what data the map should produce: Id, Key, Value. You can then query the view in a range of ways using key, keys, key ranges (startkey/endkey) etc.
http://guide.couchdb.org/editions/1/en/views.html
http://docs.couchdb.org/en/latest/ddocs.html#view-functions
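
For the firstname example from the question, a minimal sketch of what such a view could look like is below. The design document name `people`, the view name `by_firstname`, the database name `person` and the host are all assumptions made for illustration; the map function itself is JavaScript stored as a string, and the Python code only builds the design document body and the query URL.

```python
import json
from urllib.parse import quote, urlencode

# Map function (JavaScript, stored as a string inside the design document).
# It emits the firstname as the key, so the view can be queried by firstname.
MAP_BY_FIRSTNAME = """function (doc) {
  if (doc.firstname) {
    emit(doc.firstname, null);
  }
}"""


def design_document(name="people"):
    """Build the design document body; `name` and the view name are hypothetical."""
    return {
        "_id": f"_design/{name}",
        "views": {"by_firstname": {"map": MAP_BY_FIRSTNAME}},
    }


def view_url(base, db, ddoc, view, **params):
    """Build a view query URL; parameter values are JSON-encoded (e.g. key="john")."""
    query = urlencode({k: json.dumps(v) for k, v in params.items()})
    path = f"{base}/{quote(db)}/_design/{quote(ddoc)}/_view/{quote(view)}"
    return f"{path}?{query}" if query else path


if __name__ == "__main__":
    print(json.dumps(design_document(), indent=2))
    print(view_url("http://localhost:5984", "person", "people", "by_firstname", key="john"))
```

The design document would be stored with a PUT to /person/_design/people, and the view is then read with a GET on the generated URL. Passing startkey and endkey instead of key should give a range query over the emitted keys.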

Related

Does length of indexed field matter while searching?

The chat app schema that I have is something like below.
1. conversations {participants[user_1, user_2], conversation_id}
2. messages {sender: user_1, conversation_id, timestamps}
I want to map this relationship using the existing _id:ObjectId, which is already indexed.
But if I want to get all conversations of user_1, I first have to find the conversations that user is involved in, get each conversation's _id, and then search messages again using that conversation _id.
So my questions are -
Does the length of the indexed field (here _id) matter while searching?
Should I create another, shorter indexed field?
Also if there is any better alternative schema please suggest.
I would suggest maintaining the data as sub-documents instead of an array. The advantage is that you can build another index (only) on the conversation_id field, which is the one you want to query to find out the user's involvement.
When you maintain it as an array, you cannot index the conversation_id field separately. Instead you have to build a multikey index, which indexes all the elements of the array (the sender and timestamps fields), which you are never going to use for querying, and it also increases the index size.
Answering your questions:
Does the length of the indexed field (here _id) matter while searching? - Not really
Should I create another, shorter indexed field? - Create a sub-document and index conversation_id
Also if there is any better alternative schema please suggest. - Maintain the array fields as sub-documents
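
To make the two-step lookup from the question concrete, here is a rough in-memory sketch. It is not MongoDB code: the two dictionaries only stand in for an index on participants and an index on conversation_id, and the sample data and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data following the schema in the question.
conversations = [
    {"_id": "c1", "participants": ["user_1", "user_2"]},
    {"_id": "c2", "participants": ["user_2", "user_3"]},
    {"_id": "c3", "participants": ["user_1", "user_3"]},
]
messages = [
    {"sender": "user_1", "conversation_id": "c1", "timestamp": 10},
    {"sender": "user_3", "conversation_id": "c2", "timestamp": 20},
    {"sender": "user_3", "conversation_id": "c3", "timestamp": 30},
    {"sender": "user_2", "conversation_id": "c1", "timestamp": 40},
]


def build_indexes(conversations, messages):
    """Build two lookup tables that play the role of database indexes."""
    by_participant = defaultdict(list)   # user -> conversation ids
    for conv in conversations:
        for user in conv["participants"]:
            by_participant[user].append(conv["_id"])
    by_conversation = defaultdict(list)  # conversation id -> messages
    for msg in messages:
        by_conversation[msg["conversation_id"]].append(msg)
    return by_participant, by_conversation


def messages_for_user(user, by_participant, by_conversation):
    """Step 1: find the user's conversations. Step 2: fetch their messages."""
    found = []
    for conv_id in by_participant.get(user, []):
        found.extend(by_conversation.get(conv_id, []))
    return sorted(found, key=lambda m: m["timestamp"])


if __name__ == "__main__":
    by_p, by_c = build_indexes(conversations, messages)
    for msg in messages_for_user("user_1", by_p, by_c):
        print(msg)
```

Both steps are keyed lookups, so the cost depends mainly on having the right field indexed rather than on how long the key values are.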

Mongo: Sort objects in a collection with a timestamp in a different collection

I have some collection A with _id, content, timestamp as fields and some collection B with A_id, _id, content, timestamp as fields. A_id refers to some object in A.
I want to sort the objects in A based on their latest timestamps in B.
I can get it to work by re-architecting my db design (e.g. storing a latest_B_timestamp in A), BUT is there a simple way to do this directly with Mongo?
Thanks!
I doubt there is any good way to do that with mongo. Your current solution seems ok and natural in mongo. Duplication is the way to go.
No.
MongoDB has no joins, so if two collections have related data, the relationship has to be handled in the application layer.
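
The duplication approach mentioned in the question (storing a latest_B_timestamp in A) could be handled in the application layer roughly like this. The sketch uses plain Python dicts and lists in place of the two collections, and the helper names are assumptions for illustration, not driver calls.

```python
def insert_b(coll_a, coll_b, b_doc):
    """Insert a B document and keep the parent's latest_B_timestamp up to date."""
    coll_b.append(b_doc)
    parent = coll_a[b_doc["A_id"]]
    # Only move the duplicated timestamp forward, never backward.
    if b_doc["timestamp"] > parent.get("latest_B_timestamp", float("-inf")):
        parent["latest_B_timestamp"] = b_doc["timestamp"]


def a_sorted_by_latest_b(coll_a):
    """Return A documents, newest B activity first; A docs without any B come last."""
    return sorted(
        coll_a.values(),
        key=lambda a: a.get("latest_B_timestamp", float("-inf")),
        reverse=True,
    )


if __name__ == "__main__":
    coll_a = {
        "a1": {"_id": "a1", "content": "first", "timestamp": 1},
        "a2": {"_id": "a2", "content": "second", "timestamp": 2},
    }
    coll_b = []
    insert_b(coll_a, coll_b, {"_id": "b1", "A_id": "a1", "content": "x", "timestamp": 5})
    insert_b(coll_a, coll_b, {"_id": "b2", "A_id": "a2", "content": "y", "timestamp": 9})
    print([a["_id"] for a in a_sorted_by_latest_b(coll_a)])
```

With the timestamp duplicated into A, the sort becomes a single-collection query on latest_B_timestamp, which can also be indexed.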

Change size of Objectid

In MongoDb ObjectId is a 12-byte BSON type.
Is there any way to reduce the size of objectID?
No. It's a BSON data type. It's like asking a 32-bit integer to shrink itself.
Every object must have _id property, but you are not restricted to ObjectId.
Every document in a MongoDB collection needs to have a unique _id but the value does not have to be an ObjectId. Therefore, if you are looking to reduce the size of documents in your collection you have two choices:
1. Pick one of the unique properties of your documents and use it as the _id field. For example, if you have an accounts collection where the account ID--provided externally--is part of your data model, you could store the account ID in the _id field.
2. Manage primary keys for the collection yourself. Many drivers support custom primary key factories. As @assylias suggests, going with an int will give you good space savings but, still, you will use more space than if you can use one of the fields in your model as the _id.
BTW, the value of an _id field can be composite: you can use an Object/hash/map/dictionary. See, for example, this SO question.
If you are using some type of object/model framework on top of Mongo, I'd be careful with (1). Some frameworks have a hard time with developers overriding id generation. For example, I've had bad experience with Mongoid in Ruby. In that case, (2) may be the safer way to go as the generation happens at the driver layer.
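
To get a feel for the space trade-off, here is a back-of-the-envelope sketch of how many bytes the _id value itself takes in BSON for the options above. The sizes follow the BSON encoding (12 bytes for an ObjectId, 4 or 8 bytes for an integer, and a 4-byte length prefix plus the UTF-8 bytes plus a terminating null for a string). It assumes the driver picks the smallest integer type that fits, and the function names and the sample account ID are made up for illustration.

```python
OBJECT_ID_SIZE = 12  # bytes, fixed by the BSON ObjectId type


def int_id_size(value):
    """Bytes for an integer _id value, assuming int32 when it fits, else int64."""
    return 4 if -2**31 <= value < 2**31 else 8


def string_id_size(value):
    """Bytes for a string _id value: int32 length + UTF-8 bytes + trailing null."""
    return 4 + len(value.encode("utf-8")) + 1


if __name__ == "__main__":
    print("ObjectId:", OBJECT_ID_SIZE)
    print("int 42:", int_id_size(42))
    print("string 'ACC-1001':", string_id_size("ACC-1001"))
```

Note that a string key only saves space when it is short: by this count, anything longer than 7 bytes of text is already larger than an ObjectId, although reusing a field you store anyway still avoids keeping a separate key.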

best NoSQL for inverted index

I'm working on a small project where I need to build an inverted index and apply similarity algorithms based on a user query - basic information retrieval. What's the best NoSQL product for building and searching inverted indices?
Thanks,
J
Since an inverted index is all about storing the relationship between words and their locations within a document, I'm not sure this is really a good use case for NoSQL. Traditional SQL will work better here. For example, try a data structure like this:
Documents (DocumentID primary key, DocumentText text)
Words (WordID primary key, Word text)
Instances (InstanceID primary key, WordID foreign key, DocumentID foreign key, WordIndex integer)
With this structure, as you insert a document into the Documents table, you parse out each word and add it to the Words table if it's new or retrieve the existing WordID if it already exists, and then add the associated data to the Instances table.
If you're intent on using NoSQL, you can do it with something like MongoDB: put all your documents in one collection and all the words in another collection. Inside each Word document, include an Instances array, which would be an array of objects with the ObjectID of the associated document and the word index in that document. However, I'm not sure if MongoDB is optimized for handling such large arrays within documents. Common words like 'a' and 'the' could even end up going over the document size limit (4MB at the time of writing, 16MB in later versions), depending on how much data you have.
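
As a rough illustration of the relational layout and insert procedure described above, here is a small sketch using SQLite from the Python standard library. The table and column names follow the answer; the function names, the whitespace tokenizer and the lowercasing are simplifying assumptions, and a real system would need proper tokenization and indexes on the foreign keys.

```python
import sqlite3


def build_index(docs):
    """Load a list of document strings into an in-memory SQLite inverted index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Documents (DocumentID INTEGER PRIMARY KEY, DocumentText TEXT);
        CREATE TABLE Words (WordID INTEGER PRIMARY KEY, Word TEXT UNIQUE);
        CREATE TABLE Instances (
            InstanceID INTEGER PRIMARY KEY,
            WordID INTEGER REFERENCES Words(WordID),
            DocumentID INTEGER REFERENCES Documents(DocumentID),
            WordIndex INTEGER
        );
    """)
    for text in docs:
        doc_id = conn.execute(
            "INSERT INTO Documents (DocumentText) VALUES (?)", (text,)
        ).lastrowid
        for position, word in enumerate(text.lower().split()):
            # Add the word if it is new, then fetch its WordID either way.
            conn.execute("INSERT OR IGNORE INTO Words (Word) VALUES (?)", (word,))
            word_id = conn.execute(
                "SELECT WordID FROM Words WHERE Word = ?", (word,)
            ).fetchone()[0]
            conn.execute(
                "INSERT INTO Instances (WordID, DocumentID, WordIndex) VALUES (?, ?, ?)",
                (word_id, doc_id, position),
            )
    return conn


def lookup(conn, word):
    """Return (DocumentID, WordIndex) pairs for every occurrence of a word."""
    rows = conn.execute(
        """SELECT i.DocumentID, i.WordIndex
           FROM Instances i JOIN Words w ON w.WordID = i.WordID
           WHERE w.Word = ?
           ORDER BY i.DocumentID, i.WordIndex""",
        (word.lower(),),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_index(["the quick fox", "the lazy dog the end"])
    print(lookup(conn, "the"))
```

Similarity scoring would then be computed on top of the Instances rows, for example by counting occurrences per document for each query term.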
see Elasticsearch
Distributed, scalable, and highly available
Real-time search and analytics capabilities
Sophisticated RESTful API

mongoDB - URL as document ID

Considering I want to create mongoDB documents for a bunch of distinct URLs: what would be the pros and cons (if any) of using the actual URL as the documents _id value instead of the default BSON ObjectId. Thanks in advance!
Cheers,
Greg
An overview of the subject here: http://www.mongodb.org/display/DOCS/Object+IDs
It has to be unique, so you could potentially put yourself in the position of having to resolve collisions yourself. Better to leave the default _id alone and simply query against a field you're storing in the document, just how God (10gen) intended.
From http://www.mongodb.org/display/DOCS/BSON
The element name "_id" is reserved for use as a primary key id, but
you can store anything that is unique in that field. The database
expects that drivers will prevent users from creating documents that
violate these constraints.
From #mongodb
stupid _id values will probably make querying slow, but that's about it
And another user from #mongodb
Tell him the collisions will result in garbage data