I have a collection with a single document which I use as a "source of truth", and I want to be able to remove and add values in it:
{
"_id" : ObjectId("61012ada8d2ccb252be87551"),
"language" : "english",
"gui-ipv6" : "disable",
"gui-certificates" : "enable",
"gui-custom-language" : "disable",
"gui-display-hostname" : "disable",
"admin-https-ssl-versions" : "tlsv1-1 tlsv1-2 tlsv1-3",
}
I want to be able to delete a single field, say language for example. Whenever I try to do this, it removes the entire document, since my query matches the whole document, e.g.:
db.my_collection.remove({'language': {$exists:true}})
Any ideas on how to solve this?
Thank you.
Use the $unset update operator to delete a field:
https://docs.mongodb.com/manual/reference/operator/update/unset/
db.my_collection.update(
    {},
    { $unset: { language: "" } },
    false, true
)
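A sketch of the same fix with the newer shell helpers (assuming a reasonably recent MongoDB; the _id is the one from the question). updateOne avoids the positional upsert/multi flags:

db.my_collection.updateOne(
    { _id: ObjectId("61012ada8d2ccb252be87551") },
    { $unset: { language: "" } }  // "" is a conventional placeholder; the value is ignored
)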
For example: I have some documents whose toppings do not contain the word "cheese", some that contain "cheese", and others with "extra cheese". $regex is an operator I found that can search for a particular word, so I think I could use it to exclude some values.
Here is one of my objects:
{
"_id" : ObjectId("5ebf0cd0b14985ef2b48af46"),
"meat" : "chicken",
"cheese" : false,
"toppings" : [
"mac and cheese",
"biscuit"
]
}
Cheese is included in "mac and cheese", so that value should not be returned. I used db.burger.find({}, {meat:1, toppings:1}), but it still returns the value "mac and cheese".
Thanks in advance.
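A sketch of the $regex-style exclusion hinted at above (assuming the goal is to skip documents whose toppings mention "cheese" anywhere in the array):

db.burger.find(
    { toppings: { $not: /cheese/ } },  // matches only documents where no toppings element contains "cheese"
    { meat: 1, toppings: 1 }
)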
I'm a beginner with MongoDB 2.6 and am exploring text indexing.
My collection has the documents below.
{ "_id" : ObjectId("54961bfa913a9f096e9390a3"), "Comments" : "David went to Park today" }
{ "_id" : ObjectId("54961e5b913a9f096e9390a7"), "Comments" : "David went to Park today", "Toldby" : "How are You" }
{ "_id" : ObjectId("54961be4913a9f096e9390a1"), "Comments" : "Park in Irvine are beautiful"}
I have created a text index on the Comments field.
db.textcollection.find({$text:{$search:"Park"}}) --> this command returns all three documents.
But when I replace "Park" with "in", I get no output. Shouldn't it return the last document? Please correct me if my understanding is wrong.
The most common words of the text index's configured language (english, by default) are known as "stop words" and are excluded from the index. Examples from your strings are words like "to", "in", and "are". As such, you won't get any results if you search on those words.
If you actually need those words included, then you can set the text index's language to "none" which disables all the smarts of stop words and word stemming.
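A sketch of rebuilding the index that way (the index name Comments_text is an assumption based on the default naming; a collection allows only one text index, so the old one has to be dropped first; ensureIndex is the 2.6-era helper, later renamed createIndex):

db.textcollection.dropIndex("Comments_text")
db.textcollection.ensureIndex(
    { Comments: "text" },
    { default_language: "none" }  // disables stop-word removal and stemming
)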
Is there any way to make Solr index embedded MongoDB documents? We can already index top-level values of keys in a Mongo document via mongo-connector, which pushes the data to Solr.
However, in situations like this structure, which represents a post:
{
author: "someone",
post_text : "some really long text which is already indexed by solr",
    comments : [
        {
            author: "someone else",
            comment_text: "some quite long comment, which I do not know how to index in Solr"
        },
        {
            author: "me",
            comment_text: "another quite long comment, which I do not know how to index in Solr"
        }
    ]
}
This is just an example structure. In our project we handle more complicated structures, and sometimes the text we want to index is nested two or three levels deep.
I believe there is a community of MongoDB + Solr users, so this issue must have been addressed before, but I was unable to find good material covering the problem: whether there is a clean way to handle it, or whether there is no solution yet and workarounds have to be found (and maybe you could provide me with one).
For a better understanding: one of our structures has a top-level key whose value is an array of several analysis results, one of which in turn holds an array of singular values that are parts of the result. We need to index these values. E.g. (this is not the actual data structure we use):
{...
Analysis_performed: [
{
User_tags:
[
{
tag_name: "awesome",
tag_score: 180
},
{
tag_name: "boring",
tag_score: 10
}
]
}
]
}
In this case we would need to index the tag names. It's possible that our structure for storing this data is bad, but we thought hard about it and think it's quite good. However, even if we switch to less nested information, we will most likely hit at least one situation where we have to index information stored in embedded documents inside an array, and that is this question's main focus. Can we index such data with Solr somehow?
I had a question like this a couple of months ago. My solution was to use a doc_manager.
You can use solr_doc_manager (its upsert method) to modify the document posted to Solr. For example, if you have
ACL: {
Read: [ id1, id2 ... ]
}
you can handle it with something like:
from bson.objectid import ObjectId  # needed for the id conversion below

def upsert(self, doc):
    if ("ACL" in doc) and ("Read" in doc["ACL"]):
        # Build a new flat, multivalued field from the nested ids
        doc["ACL.Read"] = []
        for item in doc["ACL"]["Read"]:
            if not isinstance(item, dict):
                oid = ObjectId(item)
                doc["ACL.Read"].append(str(oid))
    self.solr.add([doc], commit=False)
It adds a new field, ACL.Read. This field is multivalued and stores the list of ids from ACL : { Read: [ ... ] }.
If you do not want to write your own handlers for nested documents, you can try another mongo connector. GitHub project page: https://github.com/SelfishInc/solr-mongo-connector. It supports nested documents out of the box.
The official 10gen mongo connector now supports flattening of arrays and indexing of subdocuments.
See https://github.com/10gen-labs/mongo-connector
However, for arrays it does something unpleasant: it would transform this document:
{
"hashtagEntities" : [
{
"start" : "66",
"end" : "81",
"text" : "startupweekend"
},
{
"start" : "82",
"end" : "90",
"text" : "startup"
},
{
"start" : "91",
"end" : "100",
"text" : "startups"
},
{
"start" : "101",
"end" : "108",
"text" : "london"
}
]
}
into this:
{
"hashtagEntities.0.start" : "66",
"hashtagEntities.0.end" : "81",
"hashtagEntities.0.text" : "startupweekend",
"hashtagEntities.1.start" : "82",
"hashtagEntities.1.end" : "90",
"hashtagEntities.1.text" : "startup",
....
}
The above is very difficult to index in Solr, even more so if you have no stable schema for your documents. We wanted something more like this:
{
"hashtagEntities.xArray.start": [
"66",
"82",
"91",
"101"
],
"hashtagEntities.xArray.text": [
"startupweekend",
"startup",
"startups",
"london"
],
"hashtagEntities.xArray.end": [
"81",
"90",
"100",
"108"
]
}
I have implemented an alternative solr_doc_manager.py.
If you want to use this, just edit the flatten_doc function in your doc_manager to the following, to support such functionality:
def flattened(doc):
    return dict(flattened_kernel(doc, []))

def flattened_kernel(doc, path):
    # Walk the document depth-first, yielding ("dotted.path", value) pairs.
    for k, v in doc.items():
        path.append(k)
        if isinstance(v, dict):
            for inner_k, inner_v in flattened_kernel(v, path):
                yield inner_k, inner_v
        elif isinstance(v, list):
            for inner_k, inner_v in flattened_list(v, path).items():
                yield inner_k, inner_v
            path.pop()  # pop the "xArray" segment left on the path by flattened_list
        else:
            yield ".".join(path), v
        path.pop()  # pop k

def flattened_list(v, path):
    # Merge all list elements into multivalued fields under an "xArray" path segment.
    # Note: "xArray" is left on the path deliberately; the caller pops it.
    tem = dict()
    path.append("xArray")
    for lv in v:
        if isinstance(lv, dict):
            for dk, dv in flattened_kernel(lv, path):
                got = tem.get(dk, list())
                if isinstance(dv, list):
                    got.extend(dv)
                else:
                    got.append(dv)
                tem[dk] = got
        else:
            # Plain (non-document) elements are collected under an extra ".ROOT" key
            got = tem.get(".".join(path) + ".ROOT", list())
            if isinstance(lv, list):
                got.extend(lv)
            else:
                got.append(lv)
            tem[".".join(path) + ".ROOT"] = got
    return tem
In case you do not want to lose data from arrays whose elements are not sub-documents, this implementation will place that data into a ".ROOT" attribute. See here:
{
"array" : [
{
"innerArray" : [
{
"c" : 1,
"d" : 2
},
{
"ahah" : "asdf"
},
42,
43
]
},
1,
2
]
}
into:
{
"array.xArray.ROOT": [
"1.0",
"2.0"
],
"array.xArray.innerArray.xArray.ROOT": [
"42.0",
"43.0"
],
"array.xArray.innerArray.xArray.c": [
"1.0"
],
"array.xArray.innerArray.xArray.d": [
"2.0"
],
"array.xArray.innerArray.xArray.ahah": [
"asdf"
]
}
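A quick demo of calling the flattener on that input (Python, matching the snippet above). Note that flattened itself keeps the original value types; the "1.0"-style strings shown above are presumably produced by a later conversion step in the doc manager:

doc = {"array": [{"innerArray": [{"c": 1, "d": 2}, {"ahah": "asdf"}, 42, 43]}, 1, 2]}
print(flattened(doc))
# {'array.xArray.innerArray.xArray.c': [1],
#  'array.xArray.innerArray.xArray.d': [2],
#  'array.xArray.innerArray.xArray.ahah': ['asdf'],
#  'array.xArray.innerArray.xArray.ROOT': [42, 43],
#  'array.xArray.ROOT': [1, 2]}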
I had the same issue: I wanted to index/store complicated documents in Solr. My approach was to modify the JsonLoader to accept complicated JSON documents with arrays/objects as values.
It stores the object/array, then flattens it and indexes the fields.
E.g., a basic example document:
{
"titles_json":{"FR":"This is the FR title" , "EN":"This is the EN title"} ,
"id": 1000003,
"guid": "3b2f2998-85ac-4a4e-8867-beb551c0b3c6"
}
It will store
titles_json:{
"FR":"This is the FR title" ,
"EN":"This is the EN title"
}
and then index the fields
titles.FR:"This is the FR title"
titles.EN:"This is the EN title"
Not only will you be able to index the child documents, but when you perform a search on Solr you will also receive the original, complicated structure of the document that you indexed.
If you want to check the source code, installation, and integration details with your existing Solr, see
http://www.solrfromscratch.com/2014/08/20/embedded-documents-in-solr/
Please note that I have tested this with Solr 4.9.0.
M.
I'm trying to get the values from an array of objects for the keys that match certain criteria. For the objects in the array, the keys will be longs and the values strings. Here's a sample MongoDB document:
"_id" : ObjectId("509eba6d84f30613b4aee1ca"),
"timestamps" : [
{
"1234" : "ABC"
},
{
"2345" : "DEF"
},
{
"3456" : "GHI"
},
{
"4567" : [
"JKL",
"ABC"
]
},
{
"5678" : "GHI"
}
],
"word" : "foo"
For example, I'd like to retrieve the values of all "timestamps" entries whose key is less than 3000 (i.e. "ABC" and "DEF" above). I've only had luck finding which documents in the collection have specific keys, using coll.find({"timestamps.4567":{$exists:true}}), but I get no results when trying things like coll.find({"timestamps":{$lt:3000}}). I'm obviously missing something that would check whether the timestamps' keys are less than 3000, rather than the value of timestamps itself.
Maybe I got it all wrong... but it looks like you need to alter the structure of your documents a bit:
"_id" : ObjectId("509eba6d84f30613b4aee1ca"),
"timestamps" : [
{
"key": "1234",
"val": "ABC"
},
{
"key": "2345",
"val": "DEF"
},
"word" : "foo"
and then you can query using $elemMatch:
db.test.find({timestamps: {$elemMatch: {'key': {$gt: '1234'}}}})
Make sure you have an index on timestamps.key.
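For the original requirement (keys below 3000), a sketch assuming the keys stay equal-length numeric strings, so that lexicographic comparison agrees with numeric order:

db.test.find(
    { timestamps: { $elemMatch: { key: { $lt: "3000" } } } },
    { "timestamps.$": 1 }  // the positional projection returns only the first matching element per document
)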
HTH