I built a REST service and found that the JSON string generated from an ObjectId by Gson is in a different format than the one produced by Spring Boot. If I send an ObjectId of an existing document's _id field in the Gson format to my REST service and save it to the collection with MongoRepository's save function, a new document with a duplicate _id is still inserted, even though a unique index is set on that field. But if I send the ObjectId in the format produced by Spring Boot, everything works perfectly. I'm wondering what causes this problem?
"timestamp": 1558461711,
"machineIdentifier": 5077764,
"processIdentifier": 21816,
"counter": 13546695,
"date": "2019-05-21T18:01:51.000+0000",
"time": 1558461711000,
"timeSecond": 1558461711(generated by spring-boot)
"counter": 13546695,
"randomValue1": 9256029,
"randomValue2": 856,
"timestamp": 1558461711(by GSON)
If you are working with MongoDB, it is better to use org.bson.Document (provided by the mongodb driver dependency) or some other MongoDB class to convert a document to JSON, rather than Gson.
import org.bson.Document;
import org.bson.types.ObjectId;

Document document = new Document();
document.put("_id", new ObjectId());
String json = document.toJson();
document.toJson() should stringify the ObjectId in the right way.
Actually the output of the code above will be:
{ "_id" : { "$oid" : "5ce51fb47dda11a8507087eb" } }
This is a valid format for MongoDB; I'm not sure how Spring Boot will react to it.
Anyway, hope it helps.
Well, I haven't used MongoDB in a while, so I don't know how to handle this situation.
The scenario is: I have data inserted in MongoDB. I exported the data in JSON format, and it looks something like this:
[
  {
    "_id": { "$oid": "60ff324f41c4d5b96054390d" },
    "field": {
      "due_date": { "$date": "2021-11-03T00:09:18.271Z" }
    }
  }
]
You can see that:
_id is { "$oid": "60ff324f41c4d5b96054390d" } and
date is { "$date": "2021-11-03T00:09:18.271Z" }.
So, the problem comes when trying to insert into another DB using Mongoose. I want to test some functionality in isolation, so I want these values in another environment, and I used insertMany() with the JSON previously exported.
(Maybe there is a more elegant way to import a JSON file, but this is only for testing purposes.)
const fs = require('fs');
await model.insertMany(JSON.parse(fs.readFileSync('./data.json').toString()));
But here is the problem: it throws an error because my schema says that _id is an ObjectId and due_date is a Date, but they are actually read as plain objects, {$oid: ""} and {$date: ""}:
ValidationError: model validation failed: _id: Cast to ObjectId failed for value "{ '$oid': '60ff324f41c4d5b96054390d' }" (type Object) at path "_id", field.due_date: Cast to date failed for value "{ '$date': '2021-11-03T00:09:18.271Z' }" (type Object) at path "field.due_date"
So the question is: is there any way to cast the $oid and $date objects using Mongoose while inserting?
Also, I have managed to insert the values using a very ugly script that iterates over each value, checks whether the object is $oid or $date, and modifies the values... but it works.
So the question is not "Is there a workaround to do this?" but "Is there a way to do it directly with Mongoose functions such as insertMany, with the values cast automatically?"
Thanks.
The {"$oid": ...} and {"$date":...} constructs are MongoDB extended JSON notation, since JSON does not have any type for Date or ObjectId.
In order to insert those values, you will need to use a JSON parser that knows about this extended format.
One possibility is the json_util that is included with bson.
Or you can use mongoimport to read the JSON file and do the inserts for you.
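In Node.js/Mongoose terms, a minimal sketch of that idea, assuming the bson package (which ships with the MongoDB Node.js driver) and the data.json file and model from the question:

const fs = require('fs');
const { EJSON } = require('bson');

// EJSON.parse understands MongoDB extended JSON, so { "$oid": ... } comes back
// as a real ObjectId and { "$date": ... } as a Date, both of which Mongoose
// can cast at the _id and field.due_date paths.
const docs = EJSON.parse(fs.readFileSync('./data.json', 'utf8'));
await model.insertMany(docs);

Alternatively, something like mongoimport --db test --collection items --file data.json --jsonArray parses extended JSON natively and does the inserts for you (the database and collection names here are placeholders).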
I have used mongoimport to import data into MongoDB from CSV files. I am trying to retrieve data from a MongoDB Realm service. The returned data for the entry is as follows:
{
  "_id": "6124edd04543fb222e",
  "Field1": "some string",
  "Field2": {
    "$numberDouble": "145.81"
  },
  "Field3": {
    "$numberInt": "0"
  },
  "Field4": {
    "$numberInt": "15"
  },
  "Field5": {
    "$numberInt": "0"
  }
}
How do I convert this into normal JSON by removing $numberInt and $numberDouble, like this:
{
  "_id": "6124edd04543fb222e",
  "Field1": "some string",
  "Field2": 145.81,
  "Field3": 0,
  "Field4": 15,
  "Field5": 0
}
The fields also differ between documents, so I cannot use Mongoose directly. Are there any solutions to this?
It would also help to know why the numbers are being stored as $numberInt: "".
Edit:
For anyone with the same problem this is how I solved it.
The array of documents is in EJSON format instead of plain JSON, as the upvoted answer says. To convert it back into normal JSON, I used JSON.stringify to first turn each document I got from the map function into a string, and then parsed it with EJSON.parse using the {strict: false} option (this option is important) to convert it into normal JSON.
const plainRestaurants = restaurants.map((restaurant) =>
  EJSON.parse(JSON.stringify(restaurant), { strict: false })
);
The module to install and import is mongodb-extjson; see its EJSON.parse documentation.
The format with $numberInt etc. is called (MongoDB) Extended JSON.
You are getting it on the output side either because this is how you inserted your data (meaning the inserted data was incorrect, and you need to fix the ingestion side) or because you requested extended JSON serialization.
If the data in the database is correct and you want non-extended JSON output, you generally need to write your own serializer to JSON, since there are multiple possibilities for how to format the data. MongoDB's JSON output format is the extended JSON you're seeing in your first quote.
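As a sketch of such a conversion in Node.js, assuming the bson package (its { relaxed: true } option plays the role of mongodb-extjson's { strict: false }):

const { EJSON } = require('bson');

// Round-trip through a string so the $-prefixed wrappers are interpreted;
// relaxed mode turns { "$numberInt": "15" } into the plain number 15 and
// { "$numberDouble": "145.81" } into 145.81.
function toPlainJson(extendedDoc) {
  return EJSON.parse(JSON.stringify(extendedDoc), { relaxed: true });
}

console.log(toPlainJson({ Field2: { "$numberDouble": "145.81" } }));
// -> { Field2: 145.81 }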
I have been trying to use ReactiveMongo to insert some documents into a MongoDB collection with a few BSON types.
I am using the Play JSON library to parse and manipulate some documents in extended JSON; here is one example:
{
  "_id": { "$oid": "5f3403dc7e562db8e0aced6b" },
  "some_datetime": { "$date": 1597841586927 }
}
I'm using reactivemongo-play-json, and so I have to import the following so that my JsObject is automatically converted to a ReactiveMongo BSONDocument when passing it to collection.insert.one:
import reactivemongo.play.json.compat._
import json2bson._
Unfortunately, once I open my mongo shell and look at the document I just inserted, this is the result:
{
  "_id" : ObjectId("5f3403dc7e562db8e0aced6b"),
  "some_datetime" : {
    "$date" : NumberLong("1597244282116")
  }
}
Only the _id has been understood as a BSON type described using extended JSON; I'd expect the some_datetime field to be something like an ISODate(), just as I'd expect to see UUID()-type values instead of their extended JSON description, which looks like this:
{'$binary': 'oKQrIfWuTI6JpPbPlYGYEQ==', '$type': '04'}
How can I make sure this extended JSON is actually converted to proper BSON types?
It turns out the problem is that what I thought was extended JSON actually is not; my datetime should be formatted as:
{"$date": {"$numberLong": "1597841586927"}}
instead of
{"$date": 1597841586927}
The wrong format was introduced by my data source: a Kafka Connect Mongo source connector does not serialize documents to proper extended JSON by default (see this Stack Overflow post).
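Putting the two together, the document from the question would presumably need to look like this in canonical extended JSON:

{
  "_id": { "$oid": "5f3403dc7e562db8e0aced6b" },
  "some_datetime": { "$date": { "$numberLong": "1597841586927" } }
}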
I have documents that I want to index into Elasticsearch with an existing unique "id" field.
I get an array of documents from a REST API endpoint (e.g. http://some.url/api/products) in no particular order, and if a document with that _id already exists in Elasticsearch, it should be updated and reindexed.
I want to create a new document if no document with the _id exists in Elasticsearch, and update a document if it matches an existing document in Elasticsearch.
This could be done with:
PUT products/product/un1qu3-1d-b718-105973677e95
{
  "id": "un1qu3-1d-b718-105973677e95",
  "state": "packaged"
}
The basic idea is to use the provided "id" field to create or update a document. Extraction of _id from document fields seems deprecated (link), but indexing/reindexing documents with the "id" field can be done manually very easily with the Kibana dev tools, with Postman, or with a cURL request.
I want to achieve this (re)indexing of the documents I receive over this API endpoint programmatically.
Is it possible to achieve this with Logstash or a simple cronjob? Does Elasticsearch provide any functionality for this? Or do I need to write a custom backend to achieve it?
I thought of either:
1) indexing the document into Elasticsearch with the "id" field of my document, or
2) finding an Elasticsearch query that first searches for the document with the specific "id" field and then updates it.
I was unable to find a solution for either approach and have no clue what a good approach would look like.
Can anyone point me in the right direction on how to achieve this, suggest a better approach, or provide a solution?
Any help much appreciated!
Update
I solved the problem with the help of the accepted answer. I used Logstash with the Http_poller input plugin, this article: https://www.elastic.co/blog/new-way-to-ingest-part-1 and this elastic.co question: https://discuss.elastic.co/t/upsert-with-logstash/59116
My Logstash output looks like this at the moment:
output {
  elasticsearch {
    index => "products"
    document_type => "product"
    pipeline => "rename_id"
    document_id => "%{id}"
    doc_as_upsert => true
    action => "update"
  }
}
Update 2
Just for the sake of completeness, I added the "rename_id" pipeline:
{
  "rename_id": {
    "description": "_description",
    "processors": [
      {
        "set": {
          "field": "_id",
          "value": "{{id}}"
        }
      }
    ]
  }
}
It works this way!
Thanks a lot!
Peter,
If I understand correctly, you want to ingest your documents into Elasticsearch and will have some updates for these documents in the future?
If that's the case:
- Use your document's primary key as the id for the Elasticsearch documents.
- You can ingest the entire document with updated values; Elasticsearch will replace the previous document with the new one, given that the primary key is the same. The old document with the same id will be deleted.
We use this approach for our search data.
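For example (reusing the index, type, and id from the question; the changed "state" value is only for illustration), re-issuing a plain index request with the same id replaces the stored document:

PUT products/product/un1qu3-1d-b718-105973677e95
{
  "id": "un1qu3-1d-b718-105973677e95",
  "state": "shipped"
}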
You can use ingest pipelines to extract the id from the body, and the _create endpoint to only create a document if it does not already exist. Minor note: if you could specify the id on the client side, indexing would be faster, as adding a pipeline adds a certain overhead.
PUT _ingest/pipeline/my_pipeline
{
  "description": "_description",
  "processors": [
    {
      "set": {
        "field": "_id",
        "value": "{{id}}"
      }
    }
  ]
}

PUT twitter/tweet/1?op_type=create&pipeline=my_pipeline
{
  "foo" : "bar",
  "id" : "123"
}

GET twitter/tweet/123

# this call will fail
PUT twitter/tweet/1?op_type=create&pipeline=my_pipeline
{
  "foo" : "bar",
  "id" : "123"
}
You can use a script to UPSERT (update or insert) your document:
POST /products/product/un1qu3-1d-b718-105973677e95/_update
{
  "script": {
    "inline": "ctx._source.state = \"packaged\"",
    "lang": "painless"
  },
  "upsert": {
    "id": "un1qu3-1d-b718-105973677e95",
    "state": "packaged"
  }
}
The above query finds the document with _id = "un1qu3-1d-b718-105973677e95".
If it finds one, it updates state to "packaged"; otherwise it creates a new document with the fields "id" and "state" (you can insert as many fields as you want).
I am inserting a JSON file into MongoDB (with Scala/Play Framework) and also getting/downloading it to my view page for another requirement, but it comes back with an "_id" parameter in the JSON file.
But I need only my actual JSON file, without any "_id" parameter. I have read in the MongoDB tutorial that by default it stores every collection document with an _id.
Please let me know how I can get my actual JSON file from MongoDB without any _id.
This is the JSON result that is stored in the database (I don't need that "_id" parameter):
{
  "testjson": [{
    "key01": "value1",
    "key02": "value02",
    "key03": "value03"
  }],
  "_id": 1
}
If you have a look at the ReactiveMongo dev guide and its API, you can see that it supports projection in a similar way to the MongoDB shell.
So you can do:
collection.find(selector = BSONDocument(), projection = BSONDocument("_id" -> 0))
Or, as you are using JSON serialization:
collection.find(selector = Json.obj(), projection = Json.obj("_id" -> 0))
You can use this query in the shell:
db.testtable.find({},{"_id" : false})
Here we are telling mongoDB not to return _id from the collection.
You can also use 0 instead of false, like this:
db.testtable.find({},{"_id" : 0})
For Scala, you need to convert it as per the driver syntax, as shown above.