We are migrating data from Oracle to MongoDB using the Talend tool, and we need to add an ObjectId to each object inside an array. We have tried to use the attribute #type with the fixed value ObjectId, but it didn't work.
We need the output as below:
{
"_id":"12243",
"name": "ABCD",
"city":"XYZ",
"requests":[
{
"_id" : ObjectId("5efdcf15ea9355c419fc9699"), // How to generate this ObjectId using talend tool in Mongo
"type":"department",
"value":"Science"
},
{
"_id" : ObjectId("K279kkqasj8ac023878hjc"), // How to generate this ObjectId using talend tool in Mongo
"type":"department",
"value":"Commerce"
}
]
}
Based on your needs, I assume that manually generating the ObjectId should be enough. I propose to use:
either the standard MongoDB BSON Java library (recommended),
or to generate this element yourself, as long as you follow the official MongoDB conventions: https://docs.mongodb.com/manual/reference/bson-types/#objectid
The recommended way means that you have to add the MongoDB BSON Java library to your Talend project, most likely by including it as a JAR (see the links below); how to do that is out of scope here. Then simply do the following to add a correct _id to your embedded elements:
import org.bson.types.ObjectId;

ObjectId id = new ObjectId();
// or
ObjectId id = ObjectId.get();
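For illustration only (outside of any specific Talend component), here is a minimal sketch of the idea, assuming the document and field names from the desired output above; org.bson.Document ships with the same BSON library:

import java.util.Arrays;
import java.util.List;
import org.bson.Document;
import org.bson.types.ObjectId;

// Build the "requests" array so that each embedded element gets its own, freshly
// generated ObjectId before the parent document is written to MongoDB.
List<Document> requests = Arrays.asList(
        new Document("_id", new ObjectId())   // unique per element, generated client-side
                .append("type", "department")
                .append("value", "Science"),
        new Document("_id", new ObjectId())
                .append("type", "department")
                .append("value", "Commerce"));

Document doc = new Document("_id", "12243")
        .append("name", "ABCD")
        .append("city", "XYZ")
        .append("requests", requests);
// "doc" can then be handed to the MongoDB output component / driver insert.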
Related materials:
https://docs.mongodb.com/manual/reference/method/ObjectId/
How to generate unique object id in mongodb
https://mongodb.github.io/mongo-java-driver/4.0/bson/installation-guide/
https://mvnrepository.com/artifact/org.mongodb/bson
I have an AWS DocumentDB with a schema my-schema and a table called my-table which has a structure something like
{
"_id": { "FIELD_1" : "001", "FIELD_2" : "A1" },
"FIELD_1": "001",
"FILED_2": "A1",
.
.
.
}
As you can see, the _id contains FIELD_1 & FIELD_2. The combination of these two fields is unique across all records; they formed the composite primary key in the original Oracle DB, which is why AWS DMS chose to put them into _id when we migrated from Oracle to DocumentDB.
Now the problem is, we need _id to be a MongoDB ObjectId instead of a nested JSON object.
What I have tried is:
create a source endpoint with my DocumentDB (which contains this bad _id data, in schema my-schema).
create a target endpoint with the same DocumentDB but with a new schema my-new-schema and the same table name my-table.
Then I migrate the data from my-schema to my-new-schema using transformations (remove column _id).
But it still replicates the same nested _id into the target table.
I have tried both document metadata mode & table metadata mode.
In table metadata mode, it doesn't even transfer the data, because after it flattens the _id into _id.FIELD_1 & _id.FIELD_2, DMS throws the exception "Document can't have '.' in field names".
I know that I can do this easily using code, but if it is somehow possible to achieve my goal using DMS, I would prefer that.
Or can we achieve this using MongoDB commands directly?
Not sure about DMS, but I think you can do this using an aggregation query with a $out stage. Project the fields you need and exclude _id; the documents in the new collection will then be inserted with the usual ObjectId. Something like this:
db.collection.aggregate([
  { $project: {
      _id: 0,
      FIELD_1: 1,
      FIELD_2: 1
  } },
  { $out: 'new_collection' }
])
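If it is easier to run this from application code, the same pipeline can be expressed with the MongoDB Java driver; a rough sketch (connection string, database and collection names are assumptions):

import java.util.Arrays;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Aggregates;
import com.mongodb.client.model.Projections;
import org.bson.Document;

try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
    MongoCollection<Document> source = client.getDatabase("my-schema").getCollection("my-table");
    source.aggregate(Arrays.asList(
            // Drop the nested _id and keep the fields you need; $out then inserts the
            // documents into the new collection, where a fresh ObjectId is assigned.
            Aggregates.project(Projections.fields(
                    Projections.excludeId(),
                    Projections.include("FIELD_1", "FIELD_2"))),
            Aggregates.out("new_collection")
    )).toCollection();   // toCollection() forces the $out pipeline to execute
}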
I have Presto set up locally and am able to query data from MongoDB collections. At the start, I created a presto_schema collection in MongoDB to let Presto understand the collection details it needs in order to query, and I added one collection entry into my presto_schema. However, I noticed later that any new collection in MongoDB which was not added into presto_schema is still accessible from Presto, and upon the first query the new collection's details are automatically added into the presto_schema collection with the relevant schema details.
But for collections with a nested schema, it does not automatically add all the nested fields; it only adds what it identifies from the initial query.
For example, consider my newly created collection (new_collection) with the content below:
{
"_id" : "13ec5e2a-ef04-4d05-b971-ef8e65638f83",
"name" : "npt",
"client" : "npt_client",
"attributes" : {
"level" : 697,
"country" : "SC",
"doy" : 2022
}
}
And say if my first query from Presto is as below:
presto:mydb> select count(*) from new_collection where attributes.level > 200;
The presto_schema automatically gets a new entry for this new collection; it adds all the non-nested fields and the nested fields that appear in the initial query, but it fails to add the other nested fields. So for queries on those other nested fields, Presto does not recognize them. I could go ahead and amend presto_schema with all the missing nested fields, but I am wondering if there is another, automated way, so that we don't have to keep amending it manually for every new field added to the collection (consider a scenario with completely dynamic fields being added to the collection's nested object).
I would recommend upgrading to Trino (formerly PrestoSQL), because the MongoDB connector (version >= 360) supports mapping fields to the JSON type. This type mapping is unavailable in prestodb.
https://trino.io/download.html
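For illustration, once on Trino you could also declare the whole nested object as a single json column in your schema collection, so that new keys under attributes do not require further schema edits. The sketch below is an assumption based on the connector's documented schema-collection layout (table / fields entries with name, type, hidden) and reuses the names from the question; verify the exact format against your Trino version:

import java.util.Arrays;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import org.bson.Document;

// Hypothetical sketch: map "attributes" to the json type in the schema collection
// ("presto_schema" here, as in the question).
try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
    client.getDatabase("mydb").getCollection("presto_schema").replaceOne(
            new Document("table", "new_collection"),
            new Document("table", "new_collection").append("fields", Arrays.asList(
                    new Document("name", "_id").append("type", "varchar").append("hidden", true),
                    new Document("name", "name").append("type", "varchar").append("hidden", false),
                    new Document("name", "client").append("type", "varchar").append("hidden", false),
                    // one json column instead of enumerating every nested key
                    new Document("name", "attributes").append("type", "json").append("hidden", false))));
}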
I am trying to query a binary field in mongo db. The data looks like this:
{"_id":"WE8fSixi8EuWnUiThhZdlw=="}
I've tried a lot of things for example:
{ '_id': new Binary( 'WE8fSixi8EuWnUiThhZdlw==', Binary.SUBTYPE_DEFAULT) }
{ '_id': Binary( 'WE8fSixi8EuWnUiThhZdlw==', 0) }
etc
Nothing seems to be working; I have exhausted Google and the Mongo documentation. Any help would be amazing.
UPDATE:
Now you should be able to query UUID and BinData from MongoDB Compass v1.20+ (COMPASS-1083). For example: {"field": BinData(0, "valid_base64")}.
PREVIOUS:
I see that you're using MongoDB Compass to query the field. Unfortunately, the current version of MongoDB Compass (v1.16.x) does not support querying binary data.
You can utilise mongo shell to query the data instead. For example:
db.collection.find({'_id':BinData(0, "WE8fSixi8EuWnUiThhZdlw==")});
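If you need the same lookup from application code, an equivalent sketch with the MongoDB Java driver (connection, database and collection names are assumptions):

import java.util.Base64;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;
import org.bson.types.Binary;

try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
    MongoCollection<Document> coll = client.getDatabase("mydb").getCollection("collection");
    // BinData(0, "...") in the shell corresponds to subtype 0 plus the decoded base64 bytes.
    Binary id = new Binary((byte) 0, Base64.getDecoder().decode("WE8fSixi8EuWnUiThhZdlw=="));
    Document doc = coll.find(Filters.eq("_id", id)).first();
    System.out.println(doc);
}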
Please note that the field name _id is reserved for use as the primary key; its value must be unique in the collection and is immutable. Depending on the value of the binary you're storing in _id, I would suggest storing the binary in another field and keeping an ObjectId as the value of _id.
How to get field information of a collection in MongoDB?
The information I am looking for is:
field name
data type
You will need to loop over all the documents and figure out which field names are used, and which types each specific field uses. MongoDB does not enforce a schema, so there is no shortcut to fetch this. Also be aware that a given field's value can have totally different data types from document to document, which is another one of MongoDB's strengths.
To figure out some statistics, such as field names, the following script can help:
mr = db.runCommand({
  "mapreduce": "things",
  "map": function() {
    for (var key in this) { emit(key, null); }
  },
  "reduce": function(key, stuff) { return null; },
  "out": "things" + "_keys"
})
Then run distinct on the resulting collection so as to find all the keys:
db[mr.result].distinct("_id");
But there is no way to also include the field types with a Map/Reduce job like this.
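As a sketch of the "loop over every document" approach from the start of this answer, which does capture types, here is a rough version with the MongoDB Java driver (database and collection names are assumptions; it scans the whole collection, so limit or sample it for large data sets):

import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import org.bson.BsonDocument;
import org.bson.BsonValue;

try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
    // Collect every top-level field name together with the BSON types observed for it.
    Map<String, Set<String>> fieldTypes = new TreeMap<>();
    for (BsonDocument doc : client.getDatabase("mydb")
                                  .getCollection("things", BsonDocument.class)
                                  .find()) {
        for (Map.Entry<String, BsonValue> entry : doc.entrySet()) {
            fieldTypes.computeIfAbsent(entry.getKey(), k -> new TreeSet<>())
                      .add(entry.getValue().getBsonType().name());
        }
    }
    fieldTypes.forEach((field, types) -> System.out.println(field + " -> " + types));
}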
You can't determine the schema of a collection. Each of the objects in a collection might have a different schema; you should be aware of this.
I asked a similar question a few months ago; in that post you can find how to retrieve the schema of an object using the Java programming language. However, to the best of my knowledge, there is no way to retrieve the data types other than trying to cast the objects (this is the way the BasicBsonObjects do it).
MongoDB supports dynamic schemas, and there is no built-in feature for schema introspection or analysis as of MongoDB 2.4.
However, it is possible to infer the schema by running a Map/Reduce job across either a sample of documents or the entire collection.
There are a few open source tools which package this approach up in a helpful interface, for example:
Schema.js - extends the mongo shell with collection.schema() prototypes
Variety - runs as a standalone script
I like the approach of schema.js, and include it in my ~/.mongorc.js startup file so it is available in my mongo shell sessions.
By default schema.js analyzes up to 50 documents in a collection and returns the results inline. There is a limit option to inspect more (or even all) documents in a collection, and it supports the Map/Reduce out options so results can optionally be saved or merged with an output collection.
For testing purposes I need to manually create some objects in a MongoDB. My class has a reference field to another class. The referenced object already exists.
I tried to put the Mongo ID of my existing object as the value in my new object, but I get the following error:
A ReferenceField only accepts DBRef: ['attribute'])
Now my question: where do I get or find this DBRef?
An example:
I have a user in my db. I want to create a group which has the existing user as "creator". When I put the user ID into the creator field I get the error...
Edit:
I just found this link MongoDB - DBRef but the solution does not work for me...
item : {"$ref" : "fruit", "$id" : "1"}
My code is like this:
{ "name" : "MyGroup", "created_at" : "2011-05-22T00:46:38", "creator": { "$ref": "user", "$id": "501bd5ac32f28a1278e54435" } }
Another edit:
Even the Mongo docs say I'm using the right format... http://www.mongodb.org/display/DOCS/Mongo+Extended+JSON. But it is still not working.
In the question you referenced, the user is using a numeric string as their document ID. In your case, it looks like you're working with the more common ObjectId but inserting it as a string. Assuming you're using PyMongo, you probably want to use the ObjectId class for the $id property of the DBRef.
If you know all such references are going to point to the same DB and collection, it may make sense to use manual references (just storing the target document's _id) instead of DBRef objects. This is explained in more detail in the Database References documentation.
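For illustration, here is how both options look with the MongoDB Java driver (the MongoEngine/PyMongo equivalent is analogous: wrap the hex string in an ObjectId instead of passing it as a plain string); collection and field names follow the example above:

import com.mongodb.DBRef;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import org.bson.types.ObjectId;

try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
    MongoCollection<Document> groups = client.getDatabase("mydb").getCollection("group");
    // The key point: the referenced id is an ObjectId, not the raw hex string.
    ObjectId creatorId = new ObjectId("501bd5ac32f28a1278e54435");

    // Option 1: a DBRef, i.e. { "$ref": "user", "$id": ObjectId("...") }
    groups.insertOne(new Document("name", "MyGroup")
            .append("created_at", "2011-05-22T00:46:38")
            .append("creator", new DBRef("user", creatorId)));

    // Option 2: a manual reference, storing just the target document's _id
    groups.insertOne(new Document("name", "MyOtherGroup")
            .append("creator_id", creatorId));
}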