MongoDB: If document exists VS If document does not exist

I'm a beginner at MongoDB and I'm currently pairing it with PHP to try to make the following work:
I want to create a database to store information that can be updated at any time, but each document needs to keep a "document added on:" date.
In sum:
IF document exists:
-Update everything in the document except the "document added on" date entry.
ELSE
-Create a document with the data + a "document added on: XXXXX" date.
In a case of a database with this format:
Database{ document{ User_ID: "12345", Name: "Joe", More_Info: "", Date_Added_To_DB: "1372291496", Last_Updated: "1372291556"}}
I've researched and asked around and the best I've got so far is a function that will update a whole document if it exists and create a new document if it does not.
db.Database.update({'User_ID': userID}, {$set: {'fieldName': 'data'}}, {upsert: true})

The question is how you determine that "the document" exists. Usually, you'd do this using a unique id. Here MongoDB's ObjectId comes to the rescue, because it already contains a timestamp of when it was generated. The ObjectId can also be used in queries.
Instead of using a separate User_ID field, you might want to consider calling the field _id and using the ObjectId data type, so you get that functionality for free.
> db.test.insert({"foo" : "test"});
{ "_id" : ObjectId("51cb763e58bb4077aea65b3d"), "foo" : "test" }
> var foo = db.test.findOne();
> foo._id.getTimestamp();
ISODate("2013-06-26T23:16:14Z")
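As a rough sketch of the two ideas in this thread: the first function decodes the creation time from an ObjectId's leading 4 bytes without a server, and the second builds the arguments for an upsert that writes the "added on" date only when the document is created, using MongoDB's $setOnInsert operator (available since MongoDB 2.4). The helper names objectIdToDate and buildUserUpsert are hypothetical; the field names are taken from the question, and storing the dates as strings of epoch seconds is an assumption based on the sample document.

```javascript
// Decode the creation time embedded in the first 4 bytes of an ObjectId
// (seconds since the Unix epoch). Hypothetical helper, plain JavaScript.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex ObjectId string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Build the arguments for an upsert that refreshes the data fields on every
// call but sets Date_Added_To_DB only when the document is first inserted.
// Hypothetical helper; "fields" must not contain Date_Added_To_DB itself,
// because the same path may not appear in both $set and $setOnInsert.
function buildUserUpsert(userId, fields, nowSeconds) {
  return {
    filter: { User_ID: userId },
    update: {
      $set: Object.assign({}, fields, { Last_Updated: String(nowSeconds) }),
      $setOnInsert: { Date_Added_To_DB: String(nowSeconds) },
    },
    options: { upsert: true },
  };
}

console.log(objectIdToDate("51cb763e58bb4077aea65b3d").toISOString());
// -> 2013-06-26T23:16:14.000Z

const userUpsert = buildUserUpsert("12345", { Name: "Joe", More_Info: "" }, 1372291556);
// In the mongo shell the result would be passed on as:
//   db.Database.update(userUpsert.filter, userUpsert.update, userUpsert.options)
console.log(JSON.stringify(userUpsert.update));
```

With this shape, an existing document only has its $set fields changed, while a newly created one also receives Date_Added_To_DB.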

Related

Mongo Upsert $in values from Find That were Not found

I have to run mongo updates for an employee id linked to a supervisorId (that's an oversimplification, but it's the gist of what I need). The find uses a list of employee ids, and if an employee id is not found, I have an upsert to add the data. However, the issue is that I'm not getting the employee id in the upserted document. Is it possible to upsert values from the $in array?
Eg:
db.collection.update(
  { empId: { $in: ['EMP1', 'EMP2', 'EMP3'] } },
  { $set: { SupervisorId: 'SUP25' } },
  { upsert: true, multi: true }
)
If I run this query I get one new doc with _id and SupervisorId:
{
"_id" : ObjectId("5e56ece8b0ae3537d0261ghg"),
"SupervisorId" : "SUP25"
}
I would like to get this in my upserts:
{
"_id" : ObjectId("5e56ece8b0ae3537d0261ghg"),
"SupervisorId" : "SUP25",
"EmpId": "EMP1"
}
I'm still getting comfortable with mongo, but is this even possible with an upsert? If not possible, are there any options for something similar?
MongoDB version 2.9.1
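One possible approach, sketched below: an upsert copies equality conditions from the filter into the new document, but not values from an $in list, so the ids can be split into one upsert per employee. The helper buildSupervisorUpserts is hypothetical, and the sketch assumes a server that supports bulkWrite (MongoDB 3.2 or later); on older servers the same filter and update documents could be sent as individual update calls in a loop.

```javascript
// Build one upsert operation per employee id. Because each filter is an
// equality match on empId, an inserted document receives that empId value.
// Hypothetical helper; field names follow the question's query.
function buildSupervisorUpserts(empIds, supervisorId) {
  return empIds.map((empId) => ({
    updateOne: {
      filter: { empId: empId },
      update: { $set: { SupervisorId: supervisorId } },
      upsert: true,
    },
  }));
}

const supervisorOps = buildSupervisorUpserts(["EMP1", "EMP2", "EMP3"], "SUP25");
// In the mongo shell the array would be passed on as:
//   db.collection.bulkWrite(supervisorOps)
console.log(JSON.stringify(supervisorOps[0]));
```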

How can I update only new data or changed value in mongodb?

The update command updates the key with the provided JSON. I want to insert only the objects that are not yet present in the DB and update only the values that have changed. How can I do that?
"data" : [
{
"_id" : "5bb6253d861d057857ec3ff0",
"name" : "C"
},
{
"_id" : "5bb625fc861d057857ec3ff1",
"name" : "B"
},
{
"_id" : "5bb625fe861d057857ec3ff2",
"name" : "A"
}
]
My data looks like this. So, if a JSON array arrives that contains these three objects plus two new ones, the two new objects should be inserted alongside the existing three.
Update the object that is not present in DB:
Use upsert: an upsert creates a new document when no document matches the query criteria. Alternatively, you can add null checks to your query, e.g. { user_id: null }. This allows you to update the data where a record for the user is not present in the DB.
Update changed value:
This can be implemented by maintaining a key that stores last_updated_at. If the last_updated_at value does not match the previously_updated_at value, that record can be treated as modified.
You can also implement Change Streams, introduced in MongoDB 3.6, from which you can receive real-time changes on your data. You can receive only the data that was changed by filtering on the "update" operation. Furthermore, you can also receive newly inserted data by filtering on the "insert" operation. Please see Change Streams.
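If the comparison is done on the client before writing, a minimal sketch could look like the following. It assumes the existing documents have already been read into an array, that documents are flat (only top-level scalar fields, as in the question), and that _id is a plain string; planChanges and shallowEqual are hypothetical helpers, and the "new-1"/"new-2" ids are made up for illustration.

```javascript
// Compare two flat documents field by field.
function shallowEqual(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return (
    keysA.length === keysB.length &&
    keysA.every((k) => Object.prototype.hasOwnProperty.call(b, k) && a[k] === b[k])
  );
}

// Split incoming documents into those missing from the DB (toInsert)
// and those whose values differ from the stored copy (toUpdate).
function planChanges(existing, incoming) {
  const byId = new Map(existing.map((doc) => [doc._id, doc]));
  const toInsert = [];
  const toUpdate = [];
  for (const doc of incoming) {
    const current = byId.get(doc._id);
    if (current === undefined) {
      toInsert.push(doc);
    } else if (!shallowEqual(current, doc)) {
      toUpdate.push(doc);
    }
  }
  return { toInsert, toUpdate };
}

const stored = [
  { _id: "5bb6253d861d057857ec3ff0", name: "C" },
  { _id: "5bb625fc861d057857ec3ff1", name: "B" },
  { _id: "5bb625fe861d057857ec3ff2", name: "A" },
];
const arriving = [
  { _id: "5bb6253d861d057857ec3ff0", name: "C" },
  { _id: "5bb625fc861d057857ec3ff1", name: "B2" }, // changed value
  { _id: "new-1", name: "D" },                     // hypothetical new id
  { _id: "new-2", name: "E" },                     // hypothetical new id
];
const plan = planChanges(stored, arriving);
console.log(plan.toInsert.length, plan.toUpdate.length);
```

Unchanged documents end up in neither list, so only the new and modified ones need to be written.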

how to use filter function of MongoDB in Pycharm?

I am new to database operations.
When I use MongoDB in PyCharm, it's hard for me to find data manually.
But I found some fields like 'Filter', 'Sort', and 'Projection', so how do I use them?
In the Filter field you can enter (without "): "{field: 'value'}"
So for instance if you have a collection 'transactions' with the following documents:
{ "_id" : <some unique id>,
"type" : <'TE' or 'MT' or 'SE'>,
"value": <some value>
}
You can find all documents with type 'TE' by entering "{type: 'TE'}" in the Filter field.
For an explanation of the Projection field see: https://docs.mongodb.com/manual/tutorial/project-fields-from-query-results/

index Elasticsearch document with existing "id" field

I have documents that I want to index into Elasticsearch with an existing unique "id" field.
I get an array of documents from a REST API endpoint (e.g. http://some.url/api/products) in no particular order, and if a document with the _id already exists in Elasticsearch, it should update and reindex the document.
I want to create a new document if no document with the _id in Elasticsearch exists and then update a document, if it matches with an existing document in Elasticsearch.
This could be done with:
PUT products/product/un1qu3-1d-b718-105973677e95
{
"id": "un1qu3-1d-b718-105973677e95",
"state": "packaged"
}
The basic idea is to use the provided "id" field to create or update a document. Extraction of _id from document fields seems to be deprecated. However, indexing or reindexing documents with the "id" field can be done manually very easily with the Kibana dev tools, with Postman, or with a cURL request.
I want to achieve this (re-)indexing of documents that I receive over this api endpoint programmatically.
Is it possible to achieve this with logstash or a simple cronjob? Does Elasticsearch provide any functionality for this? Or do I need to write some custom backend to achieve this?
I thought of either:
1) index the document into Elasticsearch with the "id" field of my document or
2) find an Elasticsearch query that first searches for the document with the specific "id" field and then updates the document.
I was unable to find a solution either way and have no clue what a good approach would look like.
Can anyone point me into the right direction on how to achieve this, suggest a better approach or provide a solution?
Any help much appreciated!
Update
I solved the problem with the help of the accepted answer. I used Logstash, the Http_poller input plugin, this article: https://www.elastic.co/blog/new-way-to-ingest-part-1 and this elastic.co question: https://discuss.elastic.co/t/upsert-with-logstash/59116
My output of logstash looks like this at the moment:
output {
  elasticsearch {
    index => "products"
    document_type => "product"
    pipeline => "rename_id"
    document_id => "%{id}"
    doc_as_upsert => true
    action => "update"
  }
}
Update 2
Just for the sake of completeness, I added the "rename_id" pipeline:
{
"rename_id": {
"description": "_description",
"processors": [
{
"set": {
"field": "_id",
"value": "{{id}}"
}
}
]
}
}
It works this way!
Thanks a lot!
Peter,
If I understand correctly, you want to ingest your documents into Elasticsearch and will have some updates to these documents in the future?
If that's the case,
- Use your document's primary key as the id for the Elasticsearch documents.
- You can ingest the entire document with the updated values; Elasticsearch will replace the previous document with the new one, given that the primary key is the same. The old document with the same id will be deleted.
We use this approach for our search data.
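As a sketch of how the array from the REST endpoint could be turned into a single request under this approach, the function below builds a newline-delimited body for the Elasticsearch _bulk API, using each document's "id" field as the _id of an index action. The helper name toBulkBody is hypothetical, and the sketch assumes a typeless index; on versions that still use mapping types (as in the `products/product` example) the action line would also need a `_type` entry.

```javascript
// Build an NDJSON body for POST /_bulk: one "index" action line followed by
// the document itself, for every document. An index action with an existing
// _id replaces the stored document; otherwise it creates a new one.
function toBulkBody(indexName, docs) {
  const lines = [];
  for (const doc of docs) {
    lines.push(JSON.stringify({ index: { _index: indexName, _id: doc.id } }));
    lines.push(JSON.stringify(doc));
  }
  // The bulk body must be terminated by a newline.
  return lines.join("\n") + "\n";
}

const bulkBody = toBulkBody("products", [
  { id: "un1qu3-1d-b718-105973677e95", state: "packaged" },
]);
console.log(bulkBody);
```

The resulting string would be sent with the Content-Type `application/x-ndjson`.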
You can use ingest pipelines to extract the id from the body and the _create endpoint to only create a document if it does not exist. Minor note: if you can specify the id on the client side, indexing will be faster, as adding a pipeline adds a certain overhead.
PUT _ingest/pipeline/my_pipeline
{
"description": "_description",
"processors": [
{
"set": {
"field": "_id",
"value": "{{id}}"
}
}
]
}
PUT twitter/tweet/1?op_type=create&pipeline=my_pipeline
{
"foo" : "bar",
"id" : "123"
}
GET twitter/tweet/123
# this call will fail
PUT twitter/tweet/1?op_type=create&pipeline=my_pipeline
{
"foo" : "bar",
"id" : "123"
}
You can use a script to upsert (update or insert) your document:
POST /products/product/un1qu3-1d-b718-105973677e95/_update
{
"script": {
"inline": "ctx._source.state = \"packaged\"",
"lang": "painless"
},
"upsert": {
"id": "un1qu3-1d-b718-105973677e95",
"state": "packaged"
}
}
The above query looks for the document with _id = "un1qu3-1d-b718-105973677e95".
If it finds one, it updates state to "packaged"; otherwise it creates a new document with the fields "id" and "state" (you can insert as many fields as you want).

Get position of selected document in collection [mongoDB]

How to get position (index) of selected document in mongo collection?
E.g.
this document: db.myCollection.find({"id":12345})
has index 3 in myCollection
myCollection:
id: 12340, name: 'G'
id: 12343, name: 'V'
id: 12345, name: 'A'
id: 12348, name: 'N'
If your requirement is to find the position of the document irrespective of any order, that is not possible, as MongoDB does not store documents in a specific order.
However, if you want to know the index based on some field, say _id, you can use this method.
If you strictly follow auto-increments in your _id field, you can count all the documents that have a value less than that _id, say n; then n + 1 is the index of the document based on _id.
n = db.myCollection.find({"id": {"$lt": 12345}}).count();
This would also be valid if documents are deleted from the collection.
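To make the arithmetic concrete, here is a small in-memory sketch of the same counting rule, using the ids from the question. It only emulates what the count() query returns; positionById is a hypothetical helper, and the ids are assumed to be unique and strictly increasing.

```javascript
// Position of targetId = (number of ids smaller than targetId) + 1.
// Mirrors: db.myCollection.find({"id": {"$lt": targetId}}).count() + 1
function positionById(ids, targetId) {
  const n = ids.filter((id) => id < targetId).length;
  return n + 1;
}

console.log(positionById([12340, 12343, 12345, 12348], 12345)); // -> 3

// After deleting 12343 the rule still gives the position among the
// remaining documents.
console.log(positionById([12340, 12345, 12348], 12345)); // -> 2
```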
As far as I know, there is no single command to do this, and it is impossible in the general case (see Derick's answer). However, using count() on a query over an ordered id field seems to work. Warning: this assumes that there is a reliably ordered field, which is difficult to achieve with concurrent writers. In this example _id is used; however, this will only work in the single-writer case:
MongoDB shell version: 2.0.1
connecting to: test
> use so_test
switched to db so_test
> db.example.insert({name: 'A'})
> db.example.insert({name: 'B'})
> db.example.insert({name: 'C'})
> db.example.insert({name: 'D'})
> db.example.insert({name: 'E'})
> db.example.insert({name: 'F'})
> db.example.find()
{ "_id" : ObjectId("4fc5f040fb359c680edf1a7b"), "name" : "A" }
{ "_id" : ObjectId("4fc5f046fb359c680edf1a7c"), "name" : "B" }
{ "_id" : ObjectId("4fc5f04afb359c680edf1a7d"), "name" : "C" }
{ "_id" : ObjectId("4fc5f04dfb359c680edf1a7e"), "name" : "D" }
{ "_id" : ObjectId("4fc5f050fb359c680edf1a7f"), "name" : "E" }
{ "_id" : ObjectId("4fc5f053fb359c680edf1a80"), "name" : "F" }
> db.example.find({_id: ObjectId("4fc5f050fb359c680edf1a7f")})
{ "_id" : ObjectId("4fc5f050fb359c680edf1a7f"), "name" : "E" }
> db.example.find({_id: {$lte: ObjectId("4fc5f050fb359c680edf1a7f")}}).count()
5
>
This should also be fairly fast if the queried field is indexed. The example is in mongo shell, but count() should be available in all driver libs as well.
This method might be very slow, but it is straightforward. You can pass your usual query here. I simply loop over all the documents and check a condition to match the record. Here I am checking the _id field, but you can use any other single field or multiple fields for the check.
var docIndex = 0;
var found = false;
db.url_list.find({}, {"_id": 1}).forEach(function (doc) {
    if (found) { return; }  // forEach cannot be stopped early, so skip the rest
    docIndex++;
    if ("5801ed58a8242ba30e8b46fa" == doc["_id"]) {
        print('document position is...' + docIndex);
        found = true;
    }
});
There is no way that MongoDB can return this, as it does not keep documents in order in the database, just as MySQL, for example, does not number rows.
The ObjectID trick from jhonkola will only work if a single client creates new elements, as ObjectIDs are generated on the client side, with the first part being a timestamp. There is no guaranteed order if different clients talk to the same server. Still, I would not rely on this.
I also don't quite understand what you are trying to do, so perhaps mention that in your question? I can then update the answer.
Restructure your collection to include the position of each entry, i.e. {'id': 12340, 'name': 'G', 'position': 1}, then search the collection (myCollection) using the desired position as the query.
The queries I use that return the entire collection all use sort to get a reproducible order; find.sort.forEach works with the script above to get the correct index.