Bulk remove a special character in a collection - MongoDB

Let's say I have a Mongo database with a collection, call it products. In this collection I want to remove all special characters, say all dots, from all entries in a certain field, say price.
Also, how would I replace, for example, the info entries of all of my objects?
How would I do this through the mongo shell?
Example:
{
  _id: 123324erwerew,
  name: 'moisture cream',
  price: 30.00,
  info: 'Good Cream'
}
{
  _id: 343324erwerew,
  name: 'moisture cream two',
  price: 40.00,
  info: 'Good Cream also'
}
Let's say info in both of them should become "best cream ever" and the dots should be gone from both prices.

If your prices were 30.11 and 40.11, the following pipeline update command would make the prices "3011" and "4011", as well as set info to "best cream ever":
db.products.updateMany({}, [
  {
    $set: {
      price: {
        $reduce: {
          input: { $split: [{ $toString: "$price" }, "."] },
          initialValue: "",
          in: { $concat: ["$$value", "$$this"] }
        }
      }
    }
  },
  {
    $set: {
      info: "best cream ever"
    }
  }
])
Explanation of the price update:
the value of price is converted to a string
the resulting string is split on the delimiter "."
the resulting parts are then concatenated back into a single string using $reduce
In MongoDB 4.4+ this is much simpler with the $replaceOne operator.
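The same split-and-concatenate idea can be sketched in plain JavaScript to see what the $reduce stage computes (this is only an illustration of the logic, not something the server runs):

```javascript
// Sketch of the pipeline's price transformation in plain JavaScript:
// stringify the value, split it on ".", and concatenate the parts
// back together (what $toString, $split and $reduce/$concat do).
function stripDots(price) {
  return String(price)
    .split(".")                               // "30.11" -> ["30", "11"]
    .reduce((acc, part) => acc + part, "");   // -> "3011"
}
```

So stripDots(30.11) yields "3011", mirroring what the update writes into price.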

Related

Searching for a phrase in MongoDB: weird behaviour

I have the following collection:
books> db.books.find()
[
{
_id: ObjectId("61ab85b0056b5357b5e23e6b"),
fields: 'hello good morning'
},
{ _id: ObjectId("61ab85b5056b5357b5e23e6c"), fields: 'good morning' },
{
_id: ObjectId("61ab8679056b5357b5e23e6d"),
fields: 'hello good morning guys'
},
{
_id: ObjectId("61ab8684056b5357b5e23e6e"),
fields: 'good morning guys'
}
]
Then I run this query:
db.books.find({$text : {$search : "\"good morning\" hello"}})
and i get:
[
{ _id: ObjectId("61ab85b5056b5357b5e23e6c"), fields: 'good morning' },
{
_id: ObjectId("61ab8684056b5357b5e23e6e"),
fields: 'good morning guys'
},
{
_id: ObjectId("61ab85b0056b5357b5e23e6b"),
fields: 'hello good morning'
},
{
_id: ObjectId("61ab8679056b5357b5e23e6d"),
fields: 'hello good morning guys'
}
]
Can you help me understand the output? The first result (document _id: ...23e6c) doesn't make much sense to me, as it doesn't contain the string "hello".
I read the first answer to this question. Does it mean that, in my case, MongoDB is searching for
(good morning) AND (good OR morning OR hello)?
That would explain the output, but I can't find an exact reference to this in the MongoDB documentation.
Thanks in advance
Directly from the docs:
A string of terms that MongoDB parses and uses to query the text index. MongoDB performs a logical OR search of the terms unless specified as a phrase. See Behavior for more information on the field.
This is just the behavior the text index allows. What you can do is mark every single expression as a "phrase", like so:
{$text : {$search : "\"good morning\" \"hello\""}}
This will enforce $and logic.
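If you build such queries programmatically, the quoting can be done with a small helper. A minimal sketch (assuming plain terms that contain no embedded double quotes; the function name is my own):

```javascript
// Wrap each term in escaped double quotes so the text search treats
// every term as a phrase; phrases are AND-ed together by $text.
function buildAndSearch(terms) {
  return terms.map(t => `"${t}"`).join(" ");
}
```

buildAndSearch(["good morning", "hello"]) produces the string '"good morning" "hello"', which you can pass as the $search value.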

MongoDB match the most array elements

I have a use case where I'm not sure whether it can be solved with MongoDB in any reasonably efficient way.
The DB contains consultants; each consultant has a set of available weeks (an array of week numbers).
I now want to find the consultants with the best overlap with a given set of weeks.
e.g. consultants:
{
_id: ....
name: "James",
weeks: [1,2,3,4,8,9,13]
}
{
_id: ....
name: "Anna",
weeks: [2,3,4,20,23]
}
Search data: [1,2,4]
The more the overlap, the higher I want to rank the consultant in the search result.
James matches all three entries: 1, 2, 4. Anna matches 2 and 4.
Is this even possible using Mongo?
You can calculate a weight for each consultant as the size of the $setIntersection between your search array and the weeks array:
db.consultants.aggregate([
  {
    $addFields: {
      weight: {
        $size: { $setIntersection: [ "$weeks", [1,2,4] ] }
      }
    }
  },
  { $sort: { weight: -1 } }
])
The larger the intersection, the more weeks matched, so you can $sort by this weight field.
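To make the weighting concrete, here is the same logic in plain JavaScript against the two example consultants (an illustration of what $setIntersection/$size compute, not server code):

```javascript
// Count how many of the searched weeks a consultant actually has
// (the plain-JS equivalent of $size of $setIntersection).
function overlapWeight(weeks, searched) {
  return weeks.filter(w => searched.includes(w)).length;
}

const consultants = [
  { name: "James", weeks: [1, 2, 3, 4, 8, 9, 13] },
  { name: "Anna",  weeks: [2, 3, 4, 20, 23] },
];

// Rank descending by overlap with [1, 2, 4], like the $sort stage.
const ranked = consultants
  .map(c => ({ ...c, weight: overlapWeight(c.weeks, [1, 2, 4]) }))
  .sort((a, b) => b.weight - a.weight);
```

James ends up first with weight 3 and Anna second with weight 2, matching the expected ranking.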

Mongo get results in order of maximum matches of value

I have documents like this:
{
  name: 'The best book'
},
{
  name: 'The book is the best on Sachin'
},
{
  name: 'Best book on Sachin Tendulkar'
}
I have this regex Mongo query:
db.getCollection('books').find({ name: { $in: [/sachin/i, /tendulkar/i, /best/i, /book/i] } })
It's giving results, but per my requirement it should return results sorted by number of matches:
{ name: 'Best book on Sachin Tendulkar' }   (4 matches)
{ name: 'The book is the best on Sachin' }  (3 matches)
{ name: 'The best book' }                   (2 matches)
I'm new to mongo. Please help me in writing the mongo query for getting the results.
Your best bet may be to use the aggregation framework (https://docs.mongodb.com/v3.2/reference/operator/aggregation/) in this case.
I'd do it like this.
1. Split the text into an array of words.
2. Intersect the array of tags you want to match with the array produced in step 1.
3. Project the size of the intersection into a field.
4. Sort by the field projected in step 3.
Something along these lines
db.books.aggregate([
  { $match: {} },
  { $project: {
      name: { $toLower: "$name" },
      ... any other amount of fields ...
  }},
  { $project: {
      name: true,
      ... any other amount of fields ...
      wordArray: { $split: ["$name", " "] }
  }},
  { $project: {
      name: true,
      ... any other amount of fields ...
      wordArray: true,
      numberOfMatches: {
        $size: {
          $setIntersection: ["$wordArray", ["best", "book"]]
        }
      }
  }},
  { $sort: {
      numberOfMatches: -1
  }}
]);
Keep in mind that you can put a condition where $match: {} is and filter the initial set of books you're classifying.
I'm not sure whether this works with regular expressions, though, so I added the first $project stage to ensure you're always comparing lowercase to lowercase.
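The core of the pipeline (lowercase, split into words, intersect with the tag list, sort by the intersection size) can be sketched in plain JavaScript against the three example books:

```javascript
// Plain-JS equivalent of the pipeline: lowercase the name, split it
// into words, intersect with the tag list, and count distinct hits
// (what $toLower, $split, $setIntersection and $size compute).
function countMatches(name, tags) {
  const words = name.toLowerCase().split(" ");
  return new Set(words.filter(w => tags.includes(w))).size;
}

const books = [
  { name: "The best book" },
  { name: "The book is the best on Sachin" },
  { name: "Best book on Sachin Tendulkar" },
];
const tags = ["sachin", "tendulkar", "best", "book"];

// Sort descending by match count, like the final $sort stage.
const sorted = books
  .map(b => ({ ...b, numberOfMatches: countMatches(b.name, tags) }))
  .sort((a, b) => b.numberOfMatches - a.numberOfMatches);
```

This yields 'Best book on Sachin Tendulkar' first (4 matches), then 3 and 2, the ordering the question asks for.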

Get last record for several items at once with mongo

In my mongo database, I have basically 2 collections:
pupils
{_id: ObjectId("539ab7ffefbb93120c9697f7"), firstname: 'Arnold', lastname: 'Smith'}
{_id: ObjectId("539ab7ffefbb93120c5473c3"), firstname: 'Steven', lastname: 'Jens'}
marks
{ date: '2014-06-12', value: 12, pupilID: 539ab7ffefbb93120c9697f7}
{ date: '2014-06-05', value: 9, pupilID: 539ab7ffefbb93120c9697f7}
{ date: '2014-05-10', value: 17, pupilID: 539ab7ffefbb93120c9697f7}
{ date: '2014-05-10', value: 7, pupilID: 539ab7ffefbb93120c5473c3}
Is there a way with the mongo shell to get the last mark for each pupil without manually looping through the list of pupils and querying the last mark for each one?
Currently I loop through each pupil and run:
db.marks.find({pupilID: pupilID}).sort({_id: -1}).limit(1)
But I'm quite concerned about performance if the marks collection contains a large number of items.
Well, your dates are not the best example here, as they are strings. You should convert them to proper Date types, but at least they sort lexically.
This is not the "join" you seem to be implicitly looking for, but you can get the $last mark for each pupil from your marks collection, which should go some way toward your result:
db.marks.aggregate([
  { "$sort": { "date": 1 } },
  { "$group": {
    "_id": "$pupilID",
    "date": { "$last": "$date" },
    "value": { "$last": "$value" }
  }}
])
That will give you the last mark value by date for each pupilID. Joining the data back to the pupils is up to you, but this is better than looping over whole collections or firing off one query per pupil.
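The $sort + $group/$last combination can be sketched in plain JavaScript over the example marks (an illustration of the logic only; the aggregation runs server-side):

```javascript
// Plain-JS equivalent of $sort by date ascending followed by
// $group with $last: the last mark seen per pupilID wins.
function lastMarkPerPupil(marks) {
  const sorted = [...marks].sort((a, b) => a.date.localeCompare(b.date));
  const last = {};
  for (const m of sorted) {
    last[m.pupilID] = { date: m.date, value: m.value };
  }
  return last;
}

const marks = [
  { date: "2014-06-12", value: 12, pupilID: "539ab7ffefbb93120c9697f7" },
  { date: "2014-06-05", value: 9,  pupilID: "539ab7ffefbb93120c9697f7" },
  { date: "2014-05-10", value: 17, pupilID: "539ab7ffefbb93120c9697f7" },
  { date: "2014-05-10", value: 7,  pupilID: "539ab7ffefbb93120c5473c3" },
];
```

For the first pupil this keeps the 2014-06-12 mark (value 12), and for the second the single 2014-05-10 mark (value 7).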

Query running very slow on big MongoDB db

I have a MongoDB db with a single rather large collection of documents (13GB for about 2M documents) sitting on a single server with 8GB RAM. Each document has a text field that can be relatively large (it can be a whole blog post) and the other fields are data about the text content and the text author. Here's what the schema looks like:
{
text: "Last night there was a storm in San Francisco...",
author: {
name: "Firstname Lastname",
website_url: "http://..."
},
date: "201403075612",
language: "en",
concepts: [
{name: "WeatherConcept", hit: "storm", start: 23, stop: 28},
{name: "LocationConcept", hit: "San Francisco", start: 32, stop: 45}
],
location: "us",
coordinates: []
}
I'm planning to query the data in different ways:
Full-text search on the "text" field. So let's say my text search query is q:
db.coll.aggregate([
{
$match:{
$text: {
$search:q
}
}
}
])
Aggregate documents by author:
db.coll.aggregate([
{
$project: {
name: "$author.name",
url: "$author.website_url"
}
},
{
$group: {
_id: "$name",
size: {
$sum:1
},
url: {
$first: "$url"
}
}
},
{
$sort:{
size:-1
}
}
])
Aggregate documents by concepts:
db.coll.aggregate([
{
$unwind: "$concepts"
},
{
$group: {
_id: "$concepts.name",
size: {
$sum:1
}
}
},
{
$sort:{
size:-1
}
}
])
These three queries may also include filtering on the following fields: date, location, coordinates, language, author.
I don't have indexes in place yet, so the queries run very slowly. But since the indexes would be very different for the different ways I hit the data, does that rule out indexing as a solution? Or is there a way to index all these cases without having to shard the collection? Basically, my questions are:
What would be a good indexing strategy in this case?
Do I need to create separate collections for authors and concepts?
Should I somehow restructure my data?
Do I need to shard my collection or is my 8GB single-server powerful enough to handle that data?
Do you have any indexes on your collection?
Have a look at the following:
http://docs.mongodb.org/manual/indexes/
If you do have indexes, make sure they are being hit by running:
db.CollectionName.find({"Concept":"something"}).explain();
You also need to give us more information about your setup. How much RAM does the server have? I've worked with a MongoDB deployment holding 200GB across 3 shards, so 13GB on one server shouldn't be an issue.
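As a starting point, indexes matching the three access patterns described in the question might look like the following. These field choices are assumptions based only on the queries shown, not a tested recommendation; you'd verify each with explain() under your real workload:

```javascript
// Illustrative index definitions for the mongo shell; adjust to the
// filters you actually use. Note that $group stages don't use indexes
// directly, but $match and $sort stages before them can.
db.coll.createIndex({ text: "text" });          // full-text search on "text"
db.coll.createIndex({ "author.name": 1 });      // filtering by author
db.coll.createIndex({ "concepts.name": 1 });    // multikey index on concepts
db.coll.createIndex({ date: 1, language: 1 });  // common filter fields
```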