MongoDB: Project field to a different type - mongodb

I have a collection whose documents have string fields that sometimes contain only numeric data. I would like to use the aggregation framework to filter, project, and group the data, but this fails because the field I want to $avg is a string. Is there any way to project a field to a different type (in this case, from string to double)?
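One possible approach, sketched here under the assumption of MongoDB 4.0 or later (where the $convert and $toDouble aggregation operators exist), is to convert the field in a $project stage before grouping. The field names "category" and "amount" are hypothetical placeholders, not taken from the question. The pipeline is built as a plain object so the snippet runs without a server.

```javascript
// Field names "category" and "amount" are hypothetical placeholders.
const pipeline = [
  {
    // Project the string field as a double. Values that cannot be
    // converted (or are missing) become null instead of raising an error.
    $project: {
      category: 1,
      amountNum: {
        $convert: { input: "$amount", to: "double", onError: null, onNull: null },
      },
    },
  },
  // $avg ignores non-numeric values, so the nulls do not skew the result.
  { $group: { _id: "$category", avgAmount: { $avg: "$amountNum" } } },
];

// With a live connection this would be passed to aggregate(), e.g.
//   db.items.aggregate(pipeline)   // "items" is a hypothetical collection name
console.log(JSON.stringify(pipeline, null, 2));
```

$toDouble is a shorthand for the same conversion, but it raises an error on strings that are not numeric, which is why $convert with onError is used in this sketch.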

Related

Sphinx Search MVA attributes

We are looking into upgrading our Sphinx to version 3.3.1 (currently on 2.2.11).
Now, we are running into an issue with some MVA attributes which are used for facets.
Snippet of our sphinx config:
sql_attr_multi = uint applicantids from field applicantids
The applicantids column in the db is a string containing a comma-separated list; some records are an empty string (not null).
This is the error we receive when making a query with applicantids as a facet request:
column 'applicantids' (alias 'applicantids') has incompatible types across shards
We were wondering if this could be caused by the empty records being handled differently in the new Sphinx version?
This sounds like you've defined the 'applicantids' attribute differently in different indexes (a shard is another name for a part of an index).
You should use sql_attr_multi in every part of the index. Even when it indexes an empty string, it should index that as an empty list.

How to create an index exemption on Firestore subdocuments?

We have a database structured as follows:
Collection foo
  Documents
    Collection bar
      Documents with many fields (approaching the 1 MB limit)
When I try to write a document with 34571 fields to the bar collection, I get (from the Go API):
rpc error: code = InvalidArgument desc = too many builtin index entries for entity
OK, fine, it seems I need to add an exemption:
Large array or map fields
Large array or map fields can approach the limit of 20,000 index entries per document. If you are not querying based on a large array or map field, you should exempt it from indexing.
But how? The console only lets me set a single collection name and a single field path, and slashes aren't accepted.
I tried other combinations, but / isn't accepted in either the Collection ID or the Field path, and using ., while not clearly forbidden, results in a generic error when trying to save the exemption. I'm also not sure if * is allowed.
Index exemptions are based on collection ID and not collection path. In this case, you can enter bar as the collection ID. This also means the exemption applies to all collections with ID bar, regardless of hierarchy.
As for the fields, you can specify only a single field path per exemption. The "*" all-selector is not supported. There is a limit of 200 index exemptions so you wouldn't be able to exempt all 34571 fields. If possible, I suggest moving your fields into a map. Then you could disable indexing on the map field.
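A minimal sketch of the "move your fields into a map" suggestion, shown as plain JavaScript with no Firestore client involved. The helper name nestUnderMap and the field name "payload" are hypothetical. The idea is only that the document ends up with one top-level map field, so a single exemption on that field path can cover everything inside it.

```javascript
// Wrap every field of a flat document under one map field.
// "payload" is a hypothetical field name chosen for this sketch.
function nestUnderMap(flatDoc, mapField = "payload") {
  return { [mapField]: { ...flatDoc } };
}

// A tiny stand-in for the document with 34571 fields.
const flat = { f00001: 1.5, f00002: "abc", f00003: true };
const nested = nestUnderMap(flat);

console.log(Object.keys(nested));                 // top-level field names after nesting
console.log(Object.keys(nested.payload).length);  // number of fields inside the map
```

The exemption would then be created for collection ID bar and field path payload (again, a made-up name). The shape of the document is what matters here, not the client language.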

Can't find documents by criteria containing string objectId value

I have a collection of documents like this:
{username: 'somename',
friendId: '57d725d6b8b144044602bf74' <-- This is a reference (ObjectId) to another doc
}
When I query the collection with the criteria {friendId : '57d725d6b8b144044602bf74'}, I get no results back.
Any other field query works fine.
I tried converting the value to ObjectId('57d725d6b8b144044602bf74'), even though the value is just a string, but still no luck.
Why can't I search by that kind of string?
You are trying to do a 'self-join' in MongoDB, and you seem to rely on the _id field generated by MongoDB.
I would suggest supplying custom _id values for the documents.
e.g.:
{_id:"alex", name:"alex", friendID:""}
{_id:"john", name:"john", friendID:"alex"}
Then you can easily run your queries on the friendID field. Give it a shot and see if this makes sense for your requirements.

Distinguish array from single value in a document

I have two types of documents in a MongoDB collection:
one where key sessions has a simple value:
{"sessions": NumberLong("10000000000001")}
one where key sessions has an array of values.
{"sessions": [NumberLong("10000000000001")]}
Is there any way to retrieve all documents from the second category, i.e. only documents whose value is an array and not a simple value?
You can use this kind of query for that:
db.collectionName.find( { $where : "Array.isArray(this.sessions)" } );
but it would be better to convert all the records to one type to keep things consistent.
The query can be as simple as this:
db.c.find({sessions:{$gte:[]}});
Explanation:
You only want documents whose sessions value is an array. $gte compares only values of the same data type (if the two operands have different data types, it returns false; Double, Integer32, and Integer64 are treated as the same data type), so giving an empty array as the operand retrieves exactly the required documents.
$gt, $lt, and $lte behave the same way in standard queries (note that the operators with the same names in aggregation pipeline expressions behave differently). I verified this in practice on MongoDB 2.4.8 and 2.6.4.
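As a further option, and assuming MongoDB 3.6 or later (where $type: "array" tests the field itself rather than its elements), the same selection can be written as a $type filter. The sketch below builds that filter and uses a small local function as a stand-in for the server, so it runs without a connection. The sample documents use plain numbers in place of NumberLong.

```javascript
// Filter that selects documents whose "sessions" field is itself an array.
// Assumes MongoDB 3.6+ semantics for $type: "array".
const arrayOnlyFilter = { sessions: { $type: "array" } };

// Local stand-in for what the filter is meant to select.
function sessionsIsArray(doc) {
  return Array.isArray(doc.sessions);
}

const docs = [
  { _id: 1, sessions: 10000000000001 },   // simple value
  { _id: 2, sessions: [10000000000001] }, // array of values
];

const matchedIds = docs.filter(sessionsIsArray).map((d) => d._id);
console.log(matchedIds);

// With a live connection: db.collectionName.find(arrayOnlyFilter)
```

Unlike $where, a $type filter does not evaluate JavaScript on the server for each document.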

Referencing Other Documents by String rather than ObjectId

Let's say I have two collections:
Products and Categories.
The former collection's documents have 2 fields:
_id (BSON ObjectId)
Name (String)
The latter collection's documents have 3 fields:
_id (BSON ObjectId)
Name (String)
Products (Array of Strings)
Assume I have the following Product document:
{ "_id" : ObjectId("AAA"), "name" : "Shovel" }
Let's say I have the following Category document:
{ "_id" : ObjectId("BBB"), "Name" : "Gardening", "Products" : ["AAA"] }
For purposes of this example, assume that AAA and BBB are legitimate ObjectIds (for example, ObjectId("523c7df5c30cc960b235ddee")) and that each string equals the hex string inside the corresponding ObjectId.
Should the Products field be stored as ObjectIds rather than as Strings?
I don't think it really matters that much.
I'm pretty sure that the ObjectId format encodes a hex number, so it is probably slightly more efficient with memory and bandwidth. I have done it both ways. As long as you decide, for each field, how you are going to encode it, either will work just fine.
As long as you consistently use the same type (so that comparisons work correctly), the differences are:
An ObjectId cannot be compared to a String representation of the same ObjectId value. Thus, ObjectId("523c7df5c30cc960b235ddee") is not equal to "523c7df5c30cc960b235ddee".
An ObjectId, when stored natively, takes 12 bytes, plus the field name.
An ObjectId, when stored as a string, typically takes 24 bytes (as it is converted to a hexadecimal string), plus the field name.
Comparisons are SLIGHTLY more efficient with the 12-byte value, as fewer bytes are compared. It won't matter in most types of usage, so it's a micro-optimization (but something you should know).
Bonus -- if you don't use short, abbreviated field names, the size benefit of storing an ObjectId natively as 12 bytes really won't matter, as the field names will far outweigh the bytes saved compared with storing it as a string.
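The 12-byte versus 24-byte figures above can be checked with a short sketch. It assumes Node.js (for the global Buffer) and counts only the value payload. BSON adds a small length prefix and terminator to strings, so the real gap is slightly larger.

```javascript
// The ObjectId value quoted in the answer, as its 24-character hex string.
const hex = "523c7df5c30cc960b235ddee";

// Stored as a string: one byte per hex character.
const asStringBytes = Buffer.byteLength(hex, "utf8");

// Stored natively: each pair of hex characters packs into one byte.
const asRawBytes = Buffer.from(hex, "hex").length;

console.log({ asStringBytes, asRawBytes });
```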
I'd recommend storing them as native ObjectIds. Some drivers can optionally and transparently translate an ObjectId to a String and back so that the client code can more easily manipulate it. The C# driver, for example, can do this, and I've used it so that when serializing to JSON, the ObjectId is in a simple format that is easily consumed in JavaScript.
This will matter most when you try to find the details of a product starting from the Categories collection.
Since there is no server-side JOIN in Mongo, your code will have to match documents together. ObjectIDs are encoded as 12 bytes, which you can easily compare in any language. Using either strings or ObjectIDs does not really matter.
The real issue you are facing is one of data normalization (or lack thereof). If you store the Name field in your Categories documents, instead of the ObjectID, you will be able to return the product names in a single call (instead of multiple calls, one for each product in the category).
It feels wrong the first time you do it. After all, you will have to update many documents if you ever change the name of a product, which might or might not be frequent. You have to model your data by thinking of the way your application will use it.
Finally, index the Name attribute in the Products collection. Getting the details of a product, starting with the string you found in a Categories document, will be fast.
Another way to do it is not to have a Categories collection at all, but to add a Category attribute to your Products documents. You can then find the documents that match {'Category':'Gardening'}. Indexing the Category field will probably be a good idea.
Again, ObjectID or String does not matter much. It is about modeling your data thinking of how your application will use it.
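The single-collection alternative described above can be sketched as follows. The sample documents, the second product, and the matches helper are hypothetical. The helper is only a local stand-in for an equality filter, so the snippet runs without a server.

```javascript
// Products carry their category directly; there is no Categories collection.
const products = [
  { _id: "AAA", name: "Shovel", Category: "Gardening" },
  { _id: "CCC", name: "Hammer", Category: "Tools" }, // made-up second product
];

// The filter from the answer.
const filter = { Category: "Gardening" };

// Local stand-in for a simple equality match on top-level fields.
function matches(doc, criteria) {
  return Object.entries(criteria).every(([key, value]) => doc[key] === value);
}

const gardeningNames = products.filter((d) => matches(d, filter)).map((d) => d.name);
console.log(gardeningNames);

// With a live connection, the equivalent shell calls would be:
//   db.Products.createIndex({ Category: 1 })
//   db.Products.find(filter)
```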