Querying for specific objects in a document & Visualization - MongoDB [duplicate]

This question already has answers here:
Retrieve only the queried element in an object array in MongoDB collection
(18 answers)
Closed 5 years ago.
I have a complex GeoJSON document stored in my MongoDB. My goal is to retrieve only the objects that match my condition, e.g.:
I want to retrieve the objects that contain "avenue" in the 'features.properties.name' field. I have tried this: db.LineString.find({'features.properties.name' : "Avenue"}) which returns:
As you can see, this returns the entire document. My goal is to return just the objects (like the highlighted object 0) that fulfil the given condition. Also, could the results be visualized somehow?

The find(arg1) command you are using returns whole documents stored in the collection: it can match on nested documents, but it cannot return only part of a top-level document. It always returns the entire document.
If your documents have a regular structure, use find(arg1, arg2) instead to limit the returned fields: https://docs.mongodb.org/manual/tutorial/project-fields-from-query-results/
If your documents have an irregular structure, write a script in your favourite programming language.
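For example, a minimal sketch using the collection and field names from the question (the positional $ in the projection returns only the first array element that matched the query condition):
db.LineString.find(
    { "features.properties.name" : /avenue/i },
    { "features.$" : 1 }
)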

After much research, I found a way of actually querying specific objects and visualizing them using RoboMongo and Google Drag and Drop GeoJSON.
For the query part we can use the below technique as specified here:
// [Thematic] Finds all the LineString documents that contain "Avenue" in their name.
db.LineString.aggregate([
    { $match : { "features.properties.name" : /avenue/i } },
    { $unwind : "$features" },
    { $match : { "features.properties.name" : /avenue/i } }
])
Here we use the $unwind aggregation stage to deconstruct the features array into one document per element. After that, the second $match keeps only the elements that satisfy the condition. Note that the first $match can take advantage of an index (text, spatial, etc.) and speed the query up considerably. By right-clicking on the document in RoboMongo we can view and store the generated JSON.
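As an aside, on MongoDB 3.2+ a similar result can be obtained without $unwind by filtering the array inside a projection; a rough sketch (the $regexMatch operator additionally needs MongoDB 4.2+):
db.LineString.aggregate([
    { $match : { "features.properties.name" : /avenue/i } },
    { $project : {
        features : {
            $filter : {
                input : "$features",
                as : "feature",
                cond : { $regexMatch : { input : "$$feature.properties.name", regex : /avenue/i } }
            }
        }
    } }
])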
However, when working with spatial data you will probably want to see your data on a map, especially when working with complex structures. In order to do so, you will need to convert your data from JSON to GeoJSON. Below I will show the regular expressions I used to convert my file.
Algorithm for converting the JSON file (generated from MongoDB) to GeoJSON:
Erase: "features" : \{.* AND "coordinates"[^}]*\K\} AND "type" : "Feature" AND "ok" : 1.0000000000000000 AND "id".*, AND "_id" : ObjectId(.*
Replace: "type" : "FeatureCollection",[^}]*\}, WITH "type" : "Feature", AND "result" : [ WITH "features" : [
The converted file should then start with a FeatureCollection header like this:
{
"type": "FeatureCollection",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features" : [
I run these regular expressions using Notepad++:
"features" : \{.*|"coordinates"[^}]*\K\}|"type" : "Feature"|"ok" : 1.0000000000000000|"id".*,|"_id" : ObjectId\(.*,
"type" : "FeatureCollection",[^}]*\}, replace with "type" : "Feature",
Disclaimer: Please note that your file might follow a different structure, since the way you project the data obviously affects the output file. In that case, extra modification is required.
Now that we have the GeoJSON file, we can drag and drop it onto Google's GeoJSON API. The above query gives us the roads of Vancouver that contain "Avenue" in their names:
Thoughts:
I believe that this job could be done directly from RoboMongo, since the produced JSON can be converted to GeoJSON with only minor changes. Also, please note that this regex approach is quite complicated; if you are interested in a more robust solution, I would suggest using a JSON library (e.g. Jackson) or a scripting environment like Node.js to generate a brand new file. I came up with this solution and it worked perfectly for my case.
Enjoy :)

Related

How to search values in real time on a badly designed database?

I have a collection named Company which has the following structure:
{
    "_id" : ObjectId("57336ea1a7454c0100d889e4"),
    "currentMonth" : 62,
    "variables1" : { ... },
    ...
    "variables61" : { ... },
    "variables62" : {
        "name" : "Test",
        "email" : "email#test.com",
        ...
    },
    "country" : "US"
}
My need is to be able to search for companies by name with up-to-date data. I don't have permission to change this data structure because many applications still use it. For the moment I haven't found a way to index these variables with this data structure, which makes the search slow.
Today each of these documents can be several megabytes in size and there are over 20,000 of them in this collection.
The system I want to implement uses a search engine to index the names of companies, but for that it needs to be able to detect changes in the collection.
MongoDB's change stream seems like a viable option but I'm not sure how to make it scalable and efficient.
Do you have any suggestions that would help me solve this problem? Any suggestion on the steps needed to set up the above system?
Usually with MongoDB you can add new fields to documents and existing applications would simply ignore the extra fields (though they naturally would not be populated by old code). Therefore:
1. Create a regularly executed task that goes through all documents in your collection, figures out the name for each document from its fields, then writes the name into a top-level field.
2. Add an index on that field.
3. In your search code, look up documents by the values of that field.
4. Compare the calculated name to the source-of-truth name. If they differ, discard the document.
If names don't change once set, step 1 only needs to go through documents that are missing the top-level name and step 4 is not needed.
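A minimal mongo-shell sketch of steps 1 and 2, assuming the name lives under a "name" key inside one of the variablesN subdocuments (as in the example document) and is copied into a hypothetical top-level companyName field:
db.Company.find({ companyName : { $exists : false } }).forEach(function (doc) {
    var name = null;
    Object.keys(doc).forEach(function (key) {
        // look for a "name" key inside any variablesN subdocument
        if (/^variables\d+$/.test(key) && doc[key] && doc[key].name) {
            name = doc[key].name;
        }
    });
    if (name !== null) {
        db.Company.updateOne({ _id : doc._id }, { $set : { companyName : name } });
    }
});
// index the new top-level field so lookups by name are fast
db.Company.createIndex({ companyName : 1 });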
Using the change detection pattern with monstache, I was able to synchronise MongoDB with Elasticsearch in real time, filtering on the current month and then mapping the result of the variables to be indexed 🎊
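For reference, the change stream mentioned in the question is the primitive such tools build on; a minimal shell sketch (MongoDB 3.6+, requires a replica set):
var watchCursor = db.Company.watch([
    { $match : { operationType : { $in : ["insert", "update", "replace"] } } }
]);
while (!watchCursor.isExhausted()) {
    if (watchCursor.hasNext()) {
        // each change event carries the _id of the modified company document
        printjson(watchCursor.next().documentKey);
    }
}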

Best way to build MongoDB indices to cover all types of queries

I am building a query engine whose underlying database is DocumentDB (with MongoDB 3.6 compatibility), and I was wondering what the best way is to build indexes for it in order to get the best query performance possible. While certain queries will definitely be more common than others, users will be building these queries themselves, so the goal is to have good (enough) performance across any combination of attributes to query by.
A document in this collection would have a structure similar to this:
{ "ContainerName" : "hello",
"description" : "test",
"timestamp" : 1000000,
"isActive" : true,
(15 more attributes of strings, booleans, numbers),
"events": [
{ "eventId" : "test",
(10 more attributes of strings, booleans, numbers)
},
{ "eventId" : "test2",
(10 more attributes of strings, booleans, numbers)
}],
"resources": [
{ "resourceId" : "test",
(8 more attributes of strings, booleans, numbers)
}]
}
I want to be able to query all combinations of attributes, including the embedded ones. For instance: get me the container whose name is hello and that has an event with an eventId of test. If the number of attributes were small enough, maybe I could have covered all possible combinations with compound indexes, but that isn't feasible here.
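As a concrete illustration (db.containers is just a placeholder collection name), that example corresponds to a query like this, together with one compound index that could serve this particular shape:
// query: container named "hello" with an event whose eventId is "test"
db.containers.find({ ContainerName : "hello", "events.eventId" : "test" })
// one compound (multikey) index covering exactly this combination
db.containers.createIndex({ ContainerName : 1, "events.eventId" : 1 })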
In addition, if I wanted to use a regex for "string contains" on some fields, would MongoDB use indexes on the attributes of the filter that it can, and then filter the remaining documents to satisfy the regex, or does using a regex eliminate the use of indexes for the query completely?

MongoDB search via index of documents containing JSON

Say I have objects in a MongoDB collection:
{
...
"json" : "{\"things\":[2494090781803658355,5114030115038563045,3035856943768375362,8931213615561493991,7574631742057150605,480863244020297489]}"
}
It's an Azure "MongoDB" so doesn't support all the features, but suppose it does.
This search will find that document:
db.coll.find({"json" : {$regex : "5114030115038563045|8931213615561493991"}})
Of course, this scans the whole collection to pull those records out. What's an efficient/faster way to find documents where the list of "things" contains any of a list of "things" in a query? It seems like a search engine such as Solr or ElasticSearch would solve this, and perhaps another Azure offering such as Data Lake storage would make this more searchable, so I'm considering those options. They're outside the scope of this question though; I'd like to know if there's a Mongo-ish way to search this collection by index.
The only option you have available to you if you're storing a JSON string is to use a text index with a $text operator.
If this document structure isn't set in stone, however, you might also consider storing the JSON as a nested subdocument (with the appropriate sanitization, of course). This would allow you to construct an index on json.things while still keeping the JSON string, and it would let you perform a query such as "json.things": {$in: [ "5114030115038563045", "8931213615561493991" ]}
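A rough sketch of both suggestions in the shell (field and index choices are illustrative):
// Option 1: keep the JSON string and put a text index on it;
// the tokenizer splits on punctuation, so each id becomes a searchable token
db.coll.createIndex({ json : "text" })
db.coll.find({ $text : { $search : "5114030115038563045 8931213615561493991" } })
// Option 2: store the parsed JSON as a real subdocument (here under the same
// "json" field; the raw string could live in another field) and index the array
db.coll.createIndex({ "json.things" : 1 })
db.coll.find({ "json.things" : { $in : [ "5114030115038563045", "8931213615561493991" ] } })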

Solr Increase relevance of search result based on a map of word:value

Let's say we have a structure like this per entry that goes to Solr. The document is first amended and then saved. The way it is amended at the moment means we lose the connection between the number and the score. However, we could change that into something else if necessary.
"keywords" : [
{
"score" : 1,
"content" : "great finisher"
},
{
"score" : 1,
"content" : "project"
},
{
"score" : 1,
"content" : "staying"
},
{
"score" : 1,
"content" : "staying motivated"
}
]
What we want is to give a boost to a solr query result to a document using the "score" value in case the query contains the word/collocation to which the score is associated.
So each document has a different "map" of keywords with scores. The relevancy would be computed as Solr normally does now, but with a boost according to this map and the words present in the query.
From what I've seen we can boost results according to some criteria, but this criterion is very dynamic and context-dependent. I'm not sure how to implement it or where to start.
At the moment there is no built-in support in Solr to do anything like this. The most ideal way would be to have each term in a multiValued field boosted separately, but this is currently not possible (the progress (although there is none) is tracked in SOLR-2499).
There are, however, ways of working around this; two are suggested in the issue tracker above. I can't say much about using payloads and a custom BoostingTermQuery, but using dynamic fields is a possibility. The drawback is managing your cache sizes if you have many different field names and you query/sort by most of them. A small index with few terms will work, but a larger one (in the high five or six digits) with many dynamic fields will eat up your memory quickly, as each sort/query keeps a lookup cache containing an int/long array the same size as your document count.
Another suggestion would be to look at using function queries together with a boost. If you reference the field here instead, you might avoid the cache issue. Try it!

Multiple nested arrays in MongoDB

I am having difficulties figuring out an effective way of working with a multiple nested document. It looks like the following:
{ "_id" :
{ "$oid" : "53ce46e3f0c25036e7b0ddd8"} ,
"someid" : 7757099 ,
"otherids" :
[ { "id" : 100 ,
"line" : "test" ,
"otherids" :
[ { "id" : 129}]}
]}
and there will be another level of array in addition.
I cannot find a way to query this structure any deeper than the "otherids" array. Is it possible to do this in an effective way at all?
These arrays might grow a bit, but not hugely.
My thought was to use it like this since it will be effective to fetch a lot of data in one go. But this data also needs to be updated quite often. Is this a hopeless solution with mongoDB?
Regards mongoDB newb
EDIT:
I would like to do it as simply and fast as possible :-)
Like: someid.4.otherids.2.line -> somevalue
I know that I would probably have to do a query to check whether values exist, but it would be nice to do it as an upsert. Right now I only work with objects in Java, and it takes 14 seconds to insert 10,000 records. Most of these inserts are "leaf nodes", meaning I have to query, find out what is already there, modify the document, then update the whole root. This takes too long.
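For what it's worth, a leaf can be addressed by array position in an update, much like the path in the edit above; a minimal shell sketch (the collection name and array indices are placeholders):
// set nested leaves by position, without rewriting the whole root document
db.coll.updateOne(
    { someid : 7757099 },
    { $set : { "otherids.0.line" : "somevalue", "otherids.0.otherids.0.id" : 130 } }
)
// or match an element by value and update it via the positional $ operator
db.coll.updateOne(
    { someid : 7757099, "otherids.id" : 100 },
    { $set : { "otherids.$.line" : "somevalue" } }
)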