Mongodb bulk write error - mongodb

I'm executing a bulk write:
bulk = new_packets.initialize_ordered_bulk_op()
bulk.insert(packet)
output = bulk.execute()
and getting an error that I interpret to mean that packet is not a dict. However, I do know that it is a dict. What could be the problem?
Here is the error:
BulkWriteError Traceback (most recent call last)
<ipython-input-311-93f16dce5714> in <module>()
2
3 bulk.insert(packet)
----> 4 output = bulk.execute()
C:\Users\e306654\AppData\Local\Continuum\Anaconda\lib\site-packages\pymongo\bulk.pyc in execute(self, write_concern)
583 if write_concern and not isinstance(write_concern, dict):
584 raise TypeError('write_concern must be an instance of dict')
--> 585 return self.__bulk.execute(write_concern)
C:\Users\e306654\AppData\Local\Continuum\Anaconda\lib\site-packages\pymongo\bulk.pyc in execute(self, write_concern)
429 self.execute_no_results(generator)
430 elif client.max_wire_version > 1:
--> 431 return self.execute_command(generator, write_concern)
432 else:
433 return self.execute_legacy(generator, write_concern)
C:\Users\e306654\AppData\Local\Continuum\Anaconda\lib\site-packages\pymongo\bulk.pyc in execute_command(self, generator, write_concern)
296 full_result['writeErrors'].sort(
297 key=lambda error: error['index'])
--> 298 raise BulkWriteError(full_result)
299 return full_result
300
BulkWriteError: batch op errors occurred

There can be many reasons. The best approach is to catch the exception (try/except) and inspect the errors it carries:
from pymongo.errors import BulkWriteError
try:
    bulk.execute()
except BulkWriteError as bwe:
    print(bwe.details)
    # you can also take this component and do more analysis
    # werrors = bwe.details['writeErrors']
    raise
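Once the details dictionary is in hand, the individual failures can be listed one per line. The following is a minimal sketch, assuming details has the shape shown elsewhere in this thread (a writeErrors list whose entries carry index, code and errmsg). The helper name describe_write_errors and the sample data are made up for illustration and are not part of pymongo.

```python
def describe_write_errors(details):
    """Return one human-readable line per entry in details['writeErrors']."""
    lines = []
    for err in details.get("writeErrors", []):
        lines.append(
            "op #{index}: code {code}: {errmsg}".format(
                index=err.get("index"),
                code=err.get("code"),
                errmsg=err.get("errmsg", "<no message>"),
            )
        )
    return lines


# Hypothetical sample, shaped like BulkWriteError.details.
sample_details = {
    "nInserted": 1,
    "writeErrors": [
        {"index": 1, "code": 11000, "errmsg": "E11000 duplicate key error"},
    ],
}

for line in describe_write_errors(sample_details):
    print(line)
```

In real code the argument would be bwe.details from the except block above.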

OK, the problem was that I was assigning _id explicitly, and it turns out that the string was larger than the 12-byte limit. My bad.

You should check two things:
Duplicates, if you are defining your own key.
Custom types. In my case I was trying to pass a hash-type object that could not be converted into a valid ObjectId, which led me back to the first point, and I fell into a vicious circle (I solved it by converting myObject to a string).
Inserting the documents one by one will give you an idea of what is happening.

In addition to the above, check your unique indexes. If you're bulk inserting and have declared a unique index on a field that doesn't exist in your data, you will get this error.
For example, I had accidentally specified name as a unique index, and the data I was inserting had no keys called name. After the first entry is inserted into mongo, it will throw this error because you're technically inserting another document with a unique name of null.
Here's a part of my model definition where I'm declaring a unique index:
self.conn[self.collection_name].create_index(
    [("name", ASCENDING)],
    unique=True,
)
And here are the details of the error being thrown:
{'writeErrors': [{'index': 1, 'code': 11000, 'keyPattern': {'name': 1},
'keyValue': {'name': None}, 'errmsg': 'E11000 duplicate key error collection:
troposphere.temp index: name_1 dup key: { name: null }'
...
more resources:
MongoDB E11000 duplicate key error
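To see why the second document is the one that fails, here is a pure-Python sketch (no MongoDB involved) that mimics the behavior described above: a missing field is treated as null, so two documents without name collide. The function name find_unique_violations and the sample documents are hypothetical, and the sketch assumes the indexed values are hashable.

```python
def find_unique_violations(docs, field):
    """Return positions of docs whose `field` value repeats an earlier one.

    A missing field is read as None, mirroring the 'name: null' key
    reported in the duplicate key error above.
    """
    seen = set()
    violations = []
    for position, doc in enumerate(docs):
        key = doc.get(field)  # missing field -> None
        if key in seen:
            violations.append(position)
        else:
            seen.add(key)
    return violations


# Hypothetical batch: the first two documents have no "name" key at all.
batch = [{"temp": 21.5}, {"temp": 19.0}, {"name": "station-1", "temp": 18.2}]
print(find_unique_violations(batch, "name"))
```

Running a check like this on a batch before inserting it can point at the offending positions, which correspond to the 'index' values in writeErrors.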

I was trying to insert two documents with the same "_id" (and other keys).
Solution:
Use a different "_id" for each document, OR
remove the "_id" and one is generated for you automatically.

Try using a debugger: it should give you the errmsg with the exact error, and the op object it was trying to insert.

Related

Mongo .find() returning duplicate documents (with same _id) (!)

Mongo appears to be returning duplicate documents for the same query, i.e. it returns more documents than there are unique _ids in the returned documents:
lobby-brain> count_iterated = 0; ids = {}
{}
lobby-brain> db.the_collection.find({
    'a_boolean_key': true
}).forEach((el) => {
    count_iterated += 1;
    ids[el._id] = (ids[el._id]||0) + 1;
})
lobby-brain> count_iterated
278
lobby-brain> Object.keys(ids).length
251
That is, the number of unique _id returned is 251 -- but there were 278 documents returned by the cursor.
Investigating further:
lobby-brain> ids
{
'60cb8cb92c909a974a96a430': 1,
'61114dea1a13c86146729f21': 1,
'6111513a1a13c861467d3dcf': 1,
...
'61114c491a13c861466d39cf': 2,
'61114bcc1a13c861466b9f8e': 2,
...
}
lobby-brain> db.the_collection.find({
_id: ObjectId("61114c491a13c861466d39cf")
}).forEach((el) => print("foo"));
foo
>
That is, there aren't actually duplicate documents with the same _id -- it's just an issue with the .find().
I tried restarting the database, and rebuilding an index involving 'a_boolean_key', with the same results.
I've never seen this before and this seems impossible... what is causing this and how can I fix it?
Version info:
Using MongoDB: 5.0.5
Using Mongosh: 1.0.4
It is a stand-alone database, no replica set or sharding or anything like that.
Further Info
One thing to note is, there is a compound index with a_boolean_key as the first index, and a datetime field as the second. The boolean key is rarely updated on the database (~once/day), but the datetime field is frequently updated.
Maybe these updates are causing the duplicate return values?
Update Feb 15, 2022: I added a Mongo JIRA task here.
Try checking whether you have an index on the a_boolean_key field.
When performing a count, MongoDB can return the count using only the index.
So maybe the index does not cover all documents, and the count method's result is therefore not equal to your manual count.
According to Louis Williams over at Mongo JIRA, this is not a bug but expected behavior.
Learn something new every day!
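The thread does not spell out why this is expected. One documented cause is a document being updated while the cursor is open, so that it moves within the index being scanned and is returned again. Assuming that is what happens here, a client-side workaround is to drop repeats by _id while iterating. The sketch below is plain Python with a list standing in for the cursor; unique_by_id is a hypothetical helper and it assumes _id values are hashable.

```python
def unique_by_id(docs):
    """Yield each document once, keyed on '_id', keeping the first occurrence."""
    seen = set()
    for doc in docs:
        doc_id = doc["_id"]
        if doc_id in seen:
            continue  # already yielded a document with this _id
        seen.add(doc_id)
        yield doc


# Stand-in for a cursor that returned the document "a" twice.
rows = [{"_id": "a", "v": 1}, {"_id": "b", "v": 2}, {"_id": "a", "v": 3}]
deduped = list(unique_by_id(rows))
print(len(rows), len(deduped))
```

Note that this keeps whichever version of the document arrived first, which may not be the most recent one.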

Updating JSONB object using another table

I am trying to update a JSONB field in one table with data from another table. For example,
update ms
set data = data || '{"COMMERCIAL": 3.4, "PCT" : medi_percent}'
from mix
where mix.id = mss.data_id
and data_id = 6000
and set_id = 20
This is giving me the following error -
Invalid input syntax for type json
DETAIL: Token "medi_percent" is invalid.
When I change medi_percent to a number, I don't get this error.
{"COMMERCIAL": 3.4, "PCT" : medi_percent} is not a valid JSON text. Notice there is no string interpolation happening here. You might be looking for
json_build_object('COMMERCIAL', 3.4, 'PCT', medi_percent)
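instead, where medi_percent is now an expression (that will presumably refer to a column of your mix table). Because data is JSONB, jsonb_build_object is the variant whose result can be concatenated with || directly.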
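The same distinction between a JSON text and an object built from values can be seen outside PostgreSQL. The Python sketch below is only an analogy: json.loads rejects the literal text from the question for the same reason the database does, while building the object from values and serializing it works. The value assigned to medi_percent is made up.

```python
import json

medi_percent = 12.5  # hypothetical value standing in for the column

# The literal text from the question: medi_percent is a bare token, not JSON.
bad_text = '{"COMMERCIAL": 3.4, "PCT" : medi_percent}'
try:
    json.loads(bad_text)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False

# Building the object from values, then serializing, yields valid JSON.
good_text = json.dumps({"COMMERCIAL": 3.4, "PCT": medi_percent})

print(parsed_ok, good_text)
```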

Neo4j REST Cypher query to match node: errors

I am sending cypher query using REST API as shown below:
MATCH (user:Profile)-[:HAS_SEARCHED]-(term{name:"TV"})
WITH [x in collect(user)| id(x) ] AS userIDs
MATCH(user:Profile) where id(user) in userIDs
MATCH (user)-[r:HAS_SEARCHED]->(term:SearchTerm)
return term.name
Although the query executes fine when run on the server directly, it gives the error below when sent from my code in Eclipse:
{"results":[],"errors":[{"code":"Neo.ClientError.Request.InvalidFormat",
"message":"Unable to deserialize request:
Unexpected character ('T' (code 84)): was expecting comma to separate OBJECT entries\n at
[Source: HttpInputOverHTTP#2543f0f2; line: 1, column: 85]"}]}
Please help!! Thanks
Regarding your query, you don't need to match the users twice:
MATCH (user:Profile)-[:HAS_SEARCHED]-(term:SearchTerm {name:"TV"})
WITH distinct user
MATCH (user)-[r:HAS_SEARCHED]->(term:SearchTerm)
RETURN term.name, count(*) as freq
or even:
MATCH (:SearchTerm {name:"TV"})<-[:HAS_SEARCHED]-(:Profile)-[:HAS_SEARCHED]->(term:SearchTerm)
RETURN term.name, count(*) as freq
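As for the InvalidFormat error itself, the message suggests (the thread does not confirm it) that the request body was assembled by hand and the double quotes around TV ended the JSON string early. A minimal sketch of building the body with a JSON serializer instead, so that the inner quotes are escaped automatically. It assumes the transactional HTTP endpoint, whose body has the form {"statements": [{"statement": ...}]}; no request is actually sent here.

```python
import json

# The query from the answer above, kept as an ordinary Python string.
cypher = (
    'MATCH (user:Profile)-[:HAS_SEARCHED]-(term:SearchTerm {name:"TV"}) '
    "WITH distinct user "
    "MATCH (user)-[r:HAS_SEARCHED]->(term:SearchTerm) "
    "RETURN term.name, count(*) as freq"
)

# json.dumps escapes the embedded double quotes for us.
body = json.dumps({"statements": [{"statement": cypher}]})
print(body)
```

The resulting string can be sent as the POST body with a Content-Type of application/json.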

Finding number of inserted documents in a bulk insert with duplicate keys

I'm doing a bulk-insert into a mongodb database. I know that 99% of the records inserted will fail because of a duplicate key error. I would like to print after the insert how many new records were inserted into the database. All this is being done in python through the tornado motor mongodb driver, but probably this doesn't matter much.
try:
    bulk_write_result = yield db.collections.probe.insert(dataarray, continue_on_error=True)
    nr_inserts = bulk_write_result["nInserted"]
except pymongo.errors.DuplicateKeyError as e:
    nr_inserts = ???? <--- what should I put here?
Since an exception was thrown, bulk_write_result is empty. Obviously I can (except for concurrency issues) do a count of the full collection before and after the insert, but I don't like the extra roundtrips to the database for just a line in the logfile. So is there any way I can discover how many records were actually inserted?
It is not clear to me why you yield your insert result. But, concerning the bulk inserts:
you should use insert_many as insert is deprecated;
when setting the ordered keyword to False, your inserts will continue in case of error;
in case of error, insert_many will raise a BulkWriteError, that you can query to obtain the number of inserted documents.
All of this leads to something like this:
try:
    insert_many_result = db.collections.probe.insert_many(dataarray, ordered=False)
    nr_inserts = len(insert_many_result.inserted_ids)
except pymongo.errors.BulkWriteError as bwe:
    nr_inserts = bwe.details["nInserted"]
If you need to identify the reason behind the write error, you will have to examine the bwe.details['writeErrors'] array. A code value of 11000 means "Duplicate key error":
>>> pprint(e.details['writeErrors'])
[{'code': 11000,
  'errmsg': 'E11000 duplicate key error index: test.w.$k_1 dup key: { : 1 }',
  'index': 0,
  'op': {'_id': ObjectId('555465cacf96c51208587eac'), 'k': 1}},
 {'code': 11000,
  'errmsg': 'E11000 duplicate key error index: test.w.$k_1 dup key: { : 3 }',
  'index': 1,
  'op': {'_id': ObjectId('555465cacf96c51208587ead'), 'k': 3}}]
Here, as you can see, I tried to insert two documents in the w collection of the test db. Both inserts failed because of a duplicate key error.
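If the log line should also say how many documents were skipped as duplicates and whether anything else went wrong, the details dictionary can be split by error code. This is a sketch under the assumption that details has the nInserted and writeErrors keys shown above; summarize_bulk_details and the sample dictionary are hypothetical names for illustration.

```python
DUPLICATE_KEY = 11000  # code value for "Duplicate key error", as noted above


def summarize_bulk_details(details):
    """Split a BulkWriteError-style details dict into counts for a log line."""
    errors = details.get("writeErrors", [])
    duplicates = [e for e in errors if e.get("code") == DUPLICATE_KEY]
    others = [e for e in errors if e.get("code") != DUPLICATE_KEY]
    return {
        "inserted": details.get("nInserted", 0),
        "duplicates": len(duplicates),
        "other_errors": others,  # anything here probably deserves a closer look
    }


# Hypothetical sample: one insert succeeded, two were duplicates.
sample = {
    "nInserted": 1,
    "writeErrors": [
        {"index": 0, "code": 11000, "errmsg": "E11000 duplicate key error"},
        {"index": 1, "code": 11000, "errmsg": "E11000 duplicate key error"},
    ],
}
summary = summarize_bulk_details(sample)
print("inserted=%(inserted)d duplicates=%(duplicates)d" % summary)
```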
Regular insert with continue_on_error can't report the info you want. If you're on MongoDB 2.6 or later, however, we have a high-performance solution with good error reporting. Here's a complete example using Motor's BulkOperationBuilder:
import pymongo.errors
from tornado import gen
from tornado.ioloop import IOLoop
from motor import MotorClient
db = MotorClient()
dataarray = [{'_id': 0},
             {'_id': 0},  # Duplicate.
             {'_id': 1}]
@gen.coroutine
def my_insert():
    try:
        bulk = db.collections.probe.initialize_unordered_bulk_op()
        # Prepare the operation on the client.
        for doc in dataarray:
            bulk.insert(doc)
        # Send to the server all at once.
        bulk_write_result = yield bulk.execute()
        nr_inserts = bulk_write_result["nInserted"]
    except pymongo.errors.BulkWriteError as e:
        print(e)
        nr_inserts = e.details['nInserted']
    print('nr_inserts: %d' % nr_inserts)
IOLoop.instance().run_sync(my_insert)
Full documentation: http://motor.readthedocs.org/en/stable/examples/bulk.html
Heed the warning about poor bulk insert performance on MongoDB before 2.6! It'll still work but requires a separate round-trip per document. In 2.6+, the driver sends the whole operation to the server in one round trip, and the server reports back how many succeeded and how many failed.

Fabricating hstore values for Postgresql

As in the title, I am trying to fabricate a hash into an hstore-type column.
I have seen the question "fabricator with hstore attribute", but the solution there does not work for me.
My hstore column is named "status", and I want to set three flags in it: "processed", "duplicate", "eol". I'm using sequel (4.14.0) as ORM, fabrication (2.8.1), Ruby 2.1.2 and Postgresql of course ;)
case 1:
status {eol: true, duplicate: false, processed: true}
result:
syntax error
case 2:
status {"heol"=>"true", "hduplicate"=>"false", "hprocessed"=>"true"}
result:
syntax error
case 3:
status do
  {"heol"=>"true", "hduplicate"=>"false", "hprocessed"=>"true"}
end
result:
Sequel::DatabaseError:
PG::DatatypeMismatch: ERROR: column "status" is of type hstore but expression is of type boolean
LINE 1: ...23.0, '2000-01-01', (('heol' = '...
HINT: You will need to rewrite or cast the expression.
case 4:
status do
  {status: "heol:true"}
end
result:
Failure/Error: Fabricate(:entry)
Sequel::DatabaseError:
PG::UndefinedColumn: ERROR: column "status" does not exist
LINE 1: ...123.0, '2000-01-01', ("status" =...
HINT: There is a column named "status" in table "entries", but it cannot be referenced from this part of the query.
case 5:
status do
  {'status' => "heol:true"}
end
result:
Failure/Error: Fabricate(:entry)
Sequel::DatabaseError:
PG::DatatypeMismatch: ERROR: column "status" is of type hstore but expression is of type boolean
LINE 1: ...123.0, '2000-01-01', ('status' =...
HINT: You will need to rewrite or cast the expression.
case 6:
gave up ;)
result:
this question
With FactoryGirl everything works as expected, and syntax is straightforward:
FactoryGirl.define do
  factory :entry do
    status {{ flag_processed: true, flag_duplicate: false }}
  end
end
Promise to make good use of the correct syntax in Fabrication =)
Thanks!
Lucas.
Cases 1 and 2 are definitely not what you want. The Hash needs to be specified within a block, which is the same as what FactoryGirl is doing with your example containing double braces. Cases 3, 4, and 5 would normally work but don't, because Sequel has a special syntax for assigning hstore columns and Fabrication is not automatically translating it for you (because before you brought it up I had no idea this was a thing).
If you change it to this, I think you'll find success:
status do
  Sequel.hstore("heol"=>"true", "hduplicate"=>"false", "hprocessed"=>"true")
end