MongoDB - Is it possible to only insert a record when the record doesn't exist

I have a database with custom ids and I only want to insert a new record if the id is different from the other ids. If the id exists I don't want to update the value (so I think upsert isn't a solution for me).
I'm using the pymongo connector.
Example Database:
[
{"_id": 1, "name": "john"},
{"_id": 2, "name": "paul"}
]

Trap and ignore the DuplicateKeyError, e.g.:
from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError
db = MongoClient()['mydatabase']
records = [
    {"_id": 1, "name": "john"},
    {"_id": 2, "name": "paul"},
    {"_id": 3, "name": "ringo"}
]
for record in records:
    try:
        db.mycollection.insert_one(record)
        print(f'Inserted {record}')
    except DuplicateKeyError:
        # _id already exists: leave the stored document untouched
        print(f'Skipped duplicate {record}')
Result (something like):
Skipped duplicate {'_id': 1, 'name': 'john'}
Skipped duplicate {'_id': 2, 'name': 'paul'}
Inserted {'_id': 3, 'name': 'ringo'}
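An alternative worth considering is an upsert that only uses $setOnInsert, which writes fields when a document is created and leaves an existing document unchanged. The sketch below only builds the filter and update documents; the helper name insert_if_absent_args is hypothetical, and it assumes every record has an _id plus at least one other field.

```python
def insert_if_absent_args(record):
    """Build (filter, update) for an upsert that never changes an existing document."""
    # Everything except _id goes under $setOnInsert; _id is taken from the filter.
    fields = {k: v for k, v in record.items() if k != "_id"}
    return {"_id": record["_id"]}, {"$setOnInsert": fields}


flt, update = insert_if_absent_args({"_id": 3, "name": "ringo"})
print(flt)
print(update)

# With a live collection (not executed here):
# db.mycollection.update_one(flt, update, upsert=True)
```

Unlike the try/except approach, this does not tell you directly whether the record was skipped; the result object of update_one would have to be inspected for that.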

Related

Get items of array by index in MongoDB

So I have a data structure in a Mongo collection (v. 4.0.18) that looks something like this…
{
"_id": ObjectId("242kl4j2lk23423"),
"name": "Doug",
"kids": [
{
"name": "Alice",
"age": 15,
},
{
"name": "James",
"age": 13,
},
{
"name": "Michael",
"age": 10,
},
{
"name": "Sharon",
"age": 8,
}
]
}
In Mongo, how would I get back a projection of this object with only the first two kids? I want the output to look like this:
{
"_id": ObjectId("242kl4j2lk23423"),
"name": "Doug",
"kids": [
{
"name": "Alice",
"age": 15,
},
{
"name": "James",
"age": 13,
}
]
}
It seems like I should easily be able to get them by index, but I'm not seeing anything in the docs about how to do that. The real-world problem I'm trying to solve has nothing to do with kids, and the array could be quite lengthy. I'm trying to break it up and process it in batches without having to load the whole thing into memory in my application.
EDIT (non-sequential indexes):
I noticed that since I asked about item 1 & 2 that $slice would suffice…however, what if I wanted items 1 & 3? Is there a way I can specify specific array indexes to return?
Any ideas or pointers for how to accomplish that?
Thanks!
You are looking for the $slice projection operator, provided the elements you want are adjacent.
https://docs.mongodb.com/manual/reference/operator/projection/slice/
This returns the first two kids:
client.db.collection.find({"name":"Doug"}, { "kids": { "$slice": 2 } })
returns
{'_id': ObjectId('5f85f682a45e15af3a907f51'), 'name': 'Doug', 'kids': [{'name': 'Alice', 'age': 15}, {'name': 'James', 'age': 13}]}
This skips the first kid and returns the next two (the second and third):
client.db.collection.find({"name":"Doug"}, { "kids": { "$slice": [1, 2] } })
returns
{'_id': ObjectId('5f85f682a45e15af3a907f51'), 'name': 'Doug', 'kids': [{'name': 'James', 'age': 13}, {'name': 'Michael', 'age': 10}]}
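Since the goal is to process a long array in batches, the [skip, limit] form of $slice can be generated for each batch. This is a rough sketch: the helper slice_projections is hypothetical, and it assumes the array length is already known to the application.

```python
def slice_projections(field, total, batch_size):
    """Yield one $slice projection per batch of an array field."""
    for skip in range(0, total, batch_size):
        # [skip, limit]: skip `skip` elements, then return up to `batch_size`.
        yield {field: {"$slice": [skip, batch_size]}}


projections = list(slice_projections("kids", 4, 2))
for projection in projections:
    print(projection)

# With a live collection (not executed here):
# for projection in slice_projections("kids", 4, 2):
#     doc = client.db.collection.find_one({"name": "Doug"}, projection)
```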
Edit:
Arbitrary selections such as items 1 and 3 probably need an aggregation pipeline rather than a simple query. Performance shouldn't differ much, assuming you have an index on the $match field.
Hate to say RTFM, but getting acquainted with the pipeline operators will be super helpful here:
https://docs.mongodb.com/manual/reference/operator/aggregation/
Your pipeline should:
1. $match on your desired query.
2. $set a new field kid_selection to element 1 (the second element) and element 3 (the fourth element), since counting starts at 0. Note the $ prefix on "$kids" in the kid_selection setter: when referencing a field of the document being processed, you prefix its name with $. The $set stage requires MongoDB 4.2 or later; on 4.0, as in the question, use $addFields, which takes the same arguments.
3. $project the whole document minus the original kids field that we've selected from.
client.db.collection.aggregate([
    {"$match": {"name": "Doug"}},
    {"$set": {"kid_selection": [
        {"$arrayElemAt": ["$kids", 1]},
        {"$arrayElemAt": ["$kids", 3]}
    ]}},
    {"$project": {"kids": 0}}
])
returns
{
'_id': ObjectId('5f86038635649a988cdd2ade'),
'name': 'Doug',
'kid_selection': [
{'name': 'James', 'age': 13},
{'name': 'Sharon', 'age': 8}
]
}
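The same pipeline can be generated for any list of indexes. The sketch below is an illustration only: the helper index_selection_pipeline and its parameter names are hypothetical, and it uses $addFields instead of $set so that it should also be accepted by a 4.0 server.

```python
def index_selection_pipeline(match, array_field, indexes, out_field="kid_selection"):
    """Build a pipeline that keeps only the given (0-based) array positions."""
    # One $arrayElemAt expression per requested index; "$" + name references the field.
    picks = [{"$arrayElemAt": ["$" + array_field, i]} for i in indexes]
    return [
        {"$match": match},
        {"$addFields": {out_field: picks}},
        {"$project": {array_field: 0}},  # drop the original, possibly long, array
    ]


pipeline = index_selection_pipeline({"name": "Doug"}, "kids", [1, 3])
for stage in pipeline:
    print(stage)

# With a live collection (not executed here):
# client.db.collection.aggregate(pipeline)
```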

MongoDB Merge Equivalent Collections

I have two collections with the same structure. I want to merge them in an aggregation result, then query and sort over the merged documents.
E.g.;
First collection:
{_id: "123", "name": "sercan"}
Second collection:
{_id: "456", "name": "hakan"}
What I want;
[{_id: "123", "name": "sercan"}, {_id: "456", "name": "hakan"}]
What I tried;
{"from":"secondCollection",pipeline: [],"as":"seconds"}
// result
{_id: "123", "name": "sercan", seconds: [{_id: "456", "name": "hakan"}]}
The attempt above puts all documents of the second collection into seconds when it contains more than one.
Thanks in advance
One option I have used is the $merge stage in MongoDB.
It writes the results of the aggregation into another collection, and creates that output collection if it does not exist.
Reference:
https://www.mongodb.com/docs/manual/reference/operator/aggregation/merge/
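Because $merge writes into a collection, it may be more than is needed if the aim is only to read both collections as one result set. On MongoDB 4.4 or later, the $unionWith stage is another option to evaluate. The sketch below only builds such a pipeline; the helper union_pipeline is hypothetical, and the match and sort arguments are placeholders for whatever query is actually required.

```python
def union_pipeline(second_collection, match=None, sort=None):
    """Build a pipeline that appends another collection, then filters and sorts."""
    stages = [{"$unionWith": {"coll": second_collection}}]
    if match:
        stages.append({"$match": match})  # applies to documents from both collections
    if sort:
        stages.append({"$sort": sort})
    return stages


union_stages = union_pipeline("secondCollection", sort={"name": 1})
for stage in union_stages:
    print(stage)

# With a live database (not executed here), run it on the first collection:
# db.firstCollection.aggregate(union_stages)
```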

Is it possible to do a subquery to return an array for the $nin operator in MongoDB?

I have a data set that looks something like:
{"key": "abc", "val": 1, "status": "np"}
{"key": "abc", "val": 2, "status": "p"}
{"key": "def", "val": 3, "status": "np"}
{"key": "ghi", "val": 4, "status": "np"}
{"key": "ghi", "val": 5, "status": "p"}
I want a query that returns the document(s) with status="np", but only when no other document with the same key has a status of "p". For the data set above, that is the document with key="def": "abc" has an "np" document but also a "p" document, and the same is true for "ghi". I came up with something close, but I don't think the $nin operator supports a distinct query.
db.test2.find({$and: [{"status":"np"}, {"key": {$nin:[<distinct value query>]}}]})
If I were to hardcode the value in the $nin array, it would work:
db.test2.find({$and: [{"status":"np"}, {"key": {$nin:['abc', 'ghi']}}]})
I just need to be able to write a find inside the square brackets. I could do something like:
var res=[];
res = db.test2.distinct("key", {"status": "p"});
db.test2.find({$and: [{"status":"np"}, {"key": {$nin:res}}]});
But the problem with this is that in the time between the two queries, another process may update the "status" of a document and then I'd have inconsistent data.
Try grouping by key and collecting the statuses; a key qualifies when its group contains "np" but not "p":
db.so.aggregate([
    {$group: {'_id': '$key', 'st': {$push: '$status'}}},
    {$project: {st: 1, hasNp: {$in: ['np', '$st']}, hasP: {$in: ['p', '$st']}}},
    {$match: {hasNp: true, hasP: false}}
]);
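The group-then-filter logic can be checked in plain Python against the sample data from the question. This is only an in-memory illustration of what the pipeline is meant to compute, not a replacement for running it on the server; the function name keys_with_np_but_no_p is made up for the example.

```python
from collections import defaultdict

docs = [
    {"key": "abc", "val": 1, "status": "np"},
    {"key": "abc", "val": 2, "status": "p"},
    {"key": "def", "val": 3, "status": "np"},
    {"key": "ghi", "val": 4, "status": "np"},
    {"key": "ghi", "val": 5, "status": "p"},
]


def keys_with_np_but_no_p(documents):
    """Group statuses by key, then keep keys that have "np" and lack "p"."""
    statuses = defaultdict(set)
    for doc in documents:
        statuses[doc["key"]].add(doc["status"])
    return sorted(k for k, s in statuses.items() if "np" in s and "p" not in s)


matching_keys = keys_with_np_but_no_p(docs)
print(matching_keys)  # ['def']
```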

Cannot encode object in pymongo

I have this query in mongodb:
db.getCollection('users').find(
{"first_name": {$in: ['Alex', 'Andrew']}},
{'id': 1, '_id': 0}
)
And have results from this query.
But when I try to run this query in python with pymongo:
select_users_id = collection_users.find(
{"first_name": {"$in: ['Alex', 'Andrew']"}},
{"id": 1, "_id": 0}
)
for i in select_users_id:
print i.values()
I receive error message:
bson.errors.InvalidDocument: Cannot encode object: set(["$in: ['Alex', 'Andrew']"])
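You need to check your query. The closing quote of the $in operator is in the wrong place: {"$in: ['Alex', 'Andrew']"} is a Python set containing one string, not a dict, and BSON cannot encode a set. Close the quote right after $in: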
select_users_id = collection_users.find(
{"first_name": {"$in": ['Alex', 'Andrew']}},
{"id": 1, "_id": 0}
)
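The difference is easy to see without a database. This small check just compares the types Python assigns to the two literals; the variable names are only for illustration.

```python
# Quote closes after the list: one string inside braces, so Python builds a set.
broken = {"$in: ['Alex', 'Andrew']"}

# Quote closes after $in: a key and a value, so Python builds a dict.
fixed = {"$in": ["Alex", "Andrew"]}

print(type(broken).__name__)
print(type(fixed).__name__)
```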

Pymongo and items in DB

I am writing an article, its contents, and its keywords to a MongoDB database using Python. When the user gives me a keyword, I need to find the articles that have that keyword.
I am writing to the DB as below:
myrecord = {"Link": link,
"Title": title,
"HeadLine": headline,
"BodyText":innerBodyText,
"Keywords":keywords,
"date": datetime.datetime.utcnow()
}
try:
print("Inserting the record in the DB")
result = my_collection.insert_one(myrecord, False)
keywords is a list of two-element tuples:
[("africa",3),("content",5),...]
I want to know how to implement the above use case. I need to traverse all records in the DB to find the articles that have a particular keyword.
I wrote the query below for this:
def getArticlesbyKeywords(self,keyword,showBody=False):
    client = pymongo.MongoClient(
        "mongodb://mahdi:Isentia#aws-ap-southeast-1-portal.2.dblayer.com:15312,aws-ap-southeast-1-portal.0.dblayer.com:15312/BBCArticles?ssl=true",
        ssl_cert_reqs=ssl.CERT_NONE)
    mydb = client['BBCArticles']
    my_collection = mydb['Articles']
    my_collection.create_index([("Keywords.key", "text")])
    print 'Articles containing higher occurences of the keyword is sorted as follow:'
    for doc in my_collection.find({"$text": {"$search": keyword}}).sort({"score": {"$meta": "textScore"}}):
        print(doc)
I get below error:
Traceback (most recent call last):
File "api_access.py", line 21, in <module>
api.getArticlesbyKeywords("BBC")
File "api_access.py", line 15, in getArticlesbyKeywords
for doc in my_collection.find({"$text": {"$search": keyword}}).sort({"score": {"$meta": "textScore"}}):
File "C:\Python27\lib\site-packages\pymongo\cursor.py", line 660, in sort
keys = helpers._index_list(key_or_list, direction)
File "C:\Python27\lib\site-packages\pymongo\helpers.py", line 63, in _index_list
raise TypeError("if no direction is specified, "
TypeError: if no direction is specified, key_or_list must be an instance of list
A sample record in my mongo DB is as follow:
Keywords: "[{'count': 20, 'key': 'north'}, {'count': 13, 'key': 'image'}, {'count': 13, 'key': 'korean'}, {'count': 10, 'key': 'malaysian'}, {'count': 9, 'key': 'kim'}]"
You need a slightly different schema in order to make this data queryable. Insert an array of documents instead of an array of pairs:
my_collection.insert_one({
    "Keywords": [{"key": "africa", "score": 3},
                 {"key": "content", "score": 5}]
})
Then you can query like:
for doc in my_collection.find({"Keywords.key": "africa"}):
    print(doc)
Make sure you create an index:
my_collection.create_index([("Keywords.key", 1)])
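If you want more sophisticated querying, use a text index. In PyMongo, project the text score and pass sort() a list of (key, direction) pairs rather than a dict:
my_collection.create_index([("Keywords.key", "text")])
for doc in my_collection.find(
        {"$text": {"$search": "africa"}},
        {"score": {"$meta": "textScore"}}
).sort([("score", {"$meta": "textScore"})]):
    print(doc)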
See MongoDB Text Indexes and sort by meta.
Use $elemMatch to search within the array:
db.test1.find({"items":{"$elemMatch" : {"$elemMatch": {"$in": ["a"]}}}})
{ "_id" : ObjectId("58a9a9805cfd72c8efd8f315"), "name" : "a", "items" : [ [ "a", 1 ], [ "b", 2 ] ] }
Why not use subdocuments, like
keywords: [{
    kw: "africa",
    count: 3
},...]
Then you can use dot notation, e.g. {"keywords.kw": "africa"}, to search.
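Converting the existing list of tuples into the subdocument shape suggested above takes one comprehension. This is a minimal sketch: the helper keywords_to_subdocuments is hypothetical, and the field names key and count are taken from the sample record in the question; adjust them to whichever names the index and queries use.

```python
def keywords_to_subdocuments(pairs):
    """Turn [("africa", 3), ...] into [{"key": "africa", "count": 3}, ...]."""
    return [{"key": word, "count": count} for word, count in pairs]


keyword_docs = keywords_to_subdocuments([("africa", 3), ("content", 5)])
print(keyword_docs)

# When building the record (not executed here):
# myrecord["Keywords"] = keywords_to_subdocuments(keywords)
```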