How to define a projection in the MongoDB Input step in PDI [duplicate]

Following is the structure of a document I have in a collection in MongoDB:
{
  "_id": {
    "$oid": "5f48e358d43721376c397f53"
  },
  "heading": "this is heading",
  "tags": ["tag1", "tag2", "tag3"],
  "categories": ["projA", "projectA2"],
  "content": ["This", "is", "the", "content", "of", "the", "document"],
  "timestamp": 1598612312.506219,
  "lang": "en"
}
I only want to select the ID and heading fields (like writing the query "select id, heading from collection"). The projection entries are not working; I have tried a few different approaches after searching the internet. Filtering, sorting, and grouping are working, but projection is not.
How do I define a projection in the MongoDB Input step in PDI to select certain fields from a collection? I have tried specifying the projection in the query and in the field expression, but it is not working.
I am also having trouble concatenating strings in the same step, so I would appreciate it if someone could help me with that as well.

Solved the problem in this step by specifying the projection like this:
[{$project: {heading: 1}}]
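For reference, a rough equivalent of that projection in the mongo shell would be either a find with a projection document or the same aggregation pipeline (the collection name here is just a placeholder):
// Return only _id (included by default) and heading for every document.
db.myCollection.find({}, { heading: 1 })
// The aggregation form, matching the pipeline used in the PDI step above.
db.myCollection.aggregate([ { $project: { heading: 1 } } ])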

Related

How do you update a mongoDB document by inserting a field as an object? [duplicate]

My goal is to insert a field in my database as an object, using the shell, as below.
reviewers: Object
  reviewer1: "Alpha"
  reviewer2: "Beta"
  reviewer3: "Gamma"
I've tried several queries, such as the following, but it entered the field as an array of objects:
db.movieDetails.update({"title": "The Martian"}, {$push:{"reviewers":{$each:[{"reviewer1":"Alpha"},{"reviewer2":"Beta"},{"reviewer3":"Gamma"}]}}})
What query would do exactly what I'm looking for?
Your input will be very helpful.
If you want to append to the existing object without removing the old fields, you can use dot notation in this way:
db.collection.update({
  "title": "The Martian"
},
{
  "$set": {
    "reviewers.reviewer1": "Alpha",
    "reviewers.reviewer2": "Beta",
    "reviewers.reviewer3": "Gamma"
  }
})
But if you just want to set the entire object, then use $set with the object you want to insert into reviewers.
db.collection.update({
  "title": "The Martian"
},
{
  "$set": {
    "reviewers": {
      "reviewer1": "Alpha",
      "reviewer2": "Beta",
      "reviewer3": "Gamma"
    }
  }
})
How about
db.movieDetails.update({"title": "The Martian"}, {"reviewers":{"reviewer1":"Alpha","reviewer2":"Beta","reviewer3":"Gamma"}})
Edit: This will replace the entire document.
I tried the following:
db.regexPractice.update({"Employee id": 22}, {$set: {"reviewers": {"reviewer1": "Alpha", "reviewer2": "Beta", "reviewer3": "Gamma"}}})
And it worked correctly.
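As a side note, in newer MongoDB shells and drivers the updateOne helper is generally preferred over update for modifying a single document; a minimal sketch of the same $set, assuming the movieDetails collection from the question:
// Set reviewers as an embedded object on the first matching document.
db.movieDetails.updateOne(
  { "title": "The Martian" },
  { "$set": { "reviewers": { "reviewer1": "Alpha", "reviewer2": "Beta", "reviewer3": "Gamma" } } }
)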

How to get embedded document in mongodb? [duplicate]

I tried to get embedded documents from a collection using MongoDB, but it's not working.
dev collection
[{
"id":1,
"data":[{"id":1,"name":true},{"id":2,"name":true},{"id":3,"name":false},{"id":1,"name":true}]
}]
Query
db.dev.find({"data.name": true})
Expected output
[{
"id":1,
"data":[{"id":1,"name":true},{"id":2,"name":true}]
}]
The output I got
[{
"id":1,
"data":[{"id":1,"name":true},{"id":2,"name":true},{"id":3,"name":"false"},{"id":1,"name":true}]
}]
How do I write the query to match the expected output? Can you give sample code?
Try out this solution
let whereClause = { "data.name":true };
await db.dev.find(whereClause);
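Note that a plain find returns the whole matching document, including the array elements where name is false. If the goal is the expected output above (only the matching array elements), one option is an aggregation with $filter; a sketch using the field names from the question:
db.dev.aggregate([
  { $match: { "data.name": true } },
  { $project: {
      id: 1,
      data: {
        $filter: {
          input: "$data",
          as: "item",
          cond: { $eq: [ "$$item.name", true ] }
        }
      }
  } }
])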

mongodb how to get a document which has max value of each "group with the same key" [duplicate]

I have a collection:
{'_id':'008','name':'ada','update':'1504501629','star':3.6,'desc':'ok', ...}
{'_id':'007','name':'bob','update':'1504501614','star':4.2,'desc':'gb', ...}
{'_id':'005','name':'ada','update':'1504501532','star':3.2,'desc':'ok', ...}
{'_id':'003','name':'bob','update':'1504501431','star':4.5,'desc':'bg', ...}
{'_id':'002','name':'ada','update':'1504501378','star':3.4,'desc':'no', ...}
{'_id':'001','name':'ada','update':'1504501325','star':3.6,'desc':'ok', ...}
{'_id':'000','name':'bob','update':'1504501268','star':4.3,'desc':'gg', ...}
...
The result I want is, for each 'name', the document with the max value of 'update' (i.e. the newest document for that 'name'), returned as the whole document:
{'_id':'008','name':'ada','update':'1504501629','star':3.6,'desc':'ok', ...}
{'_id':'007','name':'bob','update':'1504501614','star':4.2,'desc':'gb', ...}
...
How do I do this most effectively?
What I do now in Python is:
result = []
for name in db.collection.distinct('name'):
    result.append(db.collection.find({'name': name}).sort('update', -1)[0])
Doesn't this do 'find' too many times?
=====
I do this to crawl data by 'name' and collect many other keys, and every time I insert a document I set a key named 'update'.
When I use the database, I want the newest document for a specific 'name', so it looks like I cannot just use $group.
What should I do? Redesign the DB structure, or is there a better way to do the find?
=====
Improved!
I've tried creating an index on 'name' & 'update', and the process is shortened from half an hour to 30 seconds!
But I still welcome a better solution ^_^
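For reference, a compound index like the one described could be created along these lines (the key order and sort directions here are an assumption, chosen to support sorting by name and newest update first):
db.collection.createIndex({ name: 1, update: -1 })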
Your use case suits aggregation really well. As I see in your question, you already know that but can't figure out how to use $group and take the whole document that has the max update. If you $sort your documents before $group, you can use the $first operator. So there is no need to send a find query for each name.
db.collection.aggregate([
  { $sort: { "name": 1, "update": -1 } },
  { $group: { _id: "$name", "update": { $first: "$update" }, "doc_id": { $first: "$_id" } } }
])
I did not add an extra $project stage to the aggregation; you can just add the fields you want in the result to $group with the $first operator.
Additionally, if you look closer at the $sort stage, you can see it uses your newly created index, so you did well to add that; otherwise I would have recommended it too :)
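For example, carrying some of the other fields from the sample documents through $group with $first could look like this (a sketch; add whichever fields you need in the result):
db.collection.aggregate([
  { $sort: { "name": 1, "update": -1 } },
  { $group: {
      _id: "$name",
      "update": { $first: "$update" },
      "star": { $first: "$star" },
      "desc": { $first: "$desc" },
      "doc_id": { $first: "$_id" }
  } }
])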
Update: for your question in the comment:
You should write all keys in $group. But if you think it will look bad, or new fields will come in the future and you do not want to rewrite $group each time, I would do this:
First get all the _id fields of the desired documents in the aggregation, and then get those documents in one find query with the $in operator.
db.collection.find( { "_id": { $in: [<ids returned by the aggregation>] } } )
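Putting the two steps together in the shell could look roughly like this (a sketch; it assumes the doc_id field produced by the $group stage above):
// Step 1: one _id per name, i.e. the newest document for that name.
var ids = db.collection.aggregate([
  { $sort: { "name": 1, "update": -1 } },
  { $group: { _id: "$name", "doc_id": { $first: "$_id" } } }
]).toArray().map(function(doc) { return doc.doc_id; });
// Step 2: fetch the full documents in a single find.
db.collection.find({ "_id": { $in: ids } })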

MongoDB: distinct tuples

Suppose to have a collection of MongoDB documents with the following structure:
{
id_str: "some_value",
text: "some_text",
some_field: "some_other_value"
}
I would like to filter such documents so as to obtain the ones with distinct text values.
I learned from the MongoDB documentation how to extract unique field values from a collection, using the distinct operation. Thus, by performing the following query:
db.myCollection.distinct("text")
I would obtain an array containing the distinct text values:
["first_distinct_text", "second_distinct_text",...]
However, this is not the result that I would like to obtain. Instead, I would like to have the following:
{ "id_str": "a_sample_of_id_having_first_distinct_text",
"text": "first_distinct_text"}
{ "id_str": "a_sample_of_id_having_second_distinct_text",
"text": "second_distinct_text"}
I am not sure if this can be done with a single query.
I found a similar question which, however, does not fully solve my problem.
Do you have any hint on how to solve this problem?
Thanks.
You should look into making an aggregate query using the $group stage, and probably using the $first operator.
Maybe something along the lines of:
db.myCollection.aggregate([
  { $group: {
      _id: { text: "$text" },
      id_str: { $first: "$id_str" }
  } }
])
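A sketch of a slightly fuller pipeline that reshapes the result into exactly the documents shown in the question (one sample id_str per distinct text):
db.myCollection.aggregate([
  { $group: { _id: "$text", id_str: { $first: "$id_str" } } },
  { $project: { _id: 0, id_str: 1, text: "$_id" } }
])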
try:
db.myCollection.aggregate({$group: {_id: {'text': "$text", 'id_str': '$id_str'}}})
More information here: http://docs.mongodb.org/manual/reference/method/db.collection.aggregate/

How do I manage a sublist in Mongodb?

I have different types of data that would be difficult to model and scale with a relational database (e.g., a product type)
I'm interested in using Mongodb to solve this problem.
I am referencing the documentation at mongodb's website:
http://docs.mongodb.org/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/
For the data type that I am storing, I need to also maintain a relational list of id's where this particular product is available (e.g., store location id's).
In their example regarding "one-to-many relationships with embedded documents", they have the following:
{
name: "O'Reilly Media",
founded: 1980,
location: "CA",
books: [12346789, 234567890, ...]
}
I am currently importing the data with a spreadsheet, and want to use a batchInsert.
To avoid duplicates, I assume that:
1) Do I need to do an ensureIndex on the ID, and ignore errors on the insert?
2) Do I then need to loop through all the IDs to insert a new related ID into books?
Your question could possibly be defined a little better, but let's consider the case that you have rows in a spreadsheet or other source that are all de-normalized in some way. So in a JSON representation the rows would be something like this:
{
"publisher": "O'Reilly Media",
"founded": 1980,
"location": "CA",
"book": 12346789
},
{
"publisher": "O'Reilly Media",
"founded": 1980,
"location": "CA",
"book": 234567890
}
So in order to get those sorts of row results into the structure you want, one way to do this would be to use the "upsert" functionality of the .update() method.
Assuming you have some way of looping over the input values, and they are identified with some structure, then an analog to this would be something like:
books.forEach(function(book) {
  db.publishers.update(
    { "name": book.publisher },
    {
      "$setOnInsert": {
        "founded": book.founded,
        "location": book.location
      },
      "$addToSet": { "books": book.book }
    },
    { "upsert": true }
  );
});
This essentially simplifies the code so that MongoDB does all of the data collection work for you. Where the "name" of the publisher is considered to be unique, what the statement does is first search for a document in the collection that matches the query condition given, i.e. the "name".
In the case where that document is not found, a new document is inserted. Either the database or the driver will take care of creating the new _id value for this document, and your "condition" is also automatically inserted into the new document since it was an implied value that should exist.
The usage of the $setOnInsert operator is to say that those fields will only be set when a new document is created. The final part uses $addToSet in order to "push" the book values that have not already been found into the "books" array (or set).
The reason for the separation is for when a document is actually found to exist with the specified "publisher" name. In this case, all of the fields under the $setOnInsert will be ignored as they should already be in the document. So only the $addToSet operation is processed and sent to the server in order to add the new entry to the "books" array (set) and where it does not already exist.
So that would be simplified logic compared to aggregating the new records in code before sending a new insert operation. However, it is not very "batch"-like, as you are still performing one operation against the server for each row.
This is fixed in MongoDB version 2.6 and above as there is now the ability to do "batch" updates. So with a similar analog:
var batch = [];
books.forEach(function(book) {
  batch.push({
    "q": { "name": book.publisher },
    "u": {
      "$setOnInsert": {
        "founded": book.founded,
        "location": book.location
      },
      "$addToSet": { "books": book.book }
    },
    "upsert": true
  });
  if ( ( batch.length % 500 ) == 0 ) {
    db.runCommand({ "update": "publishers", "updates": batch });
    batch = [];
  }
});

// Flush any remaining operations.
if ( batch.length > 0 ) {
  db.runCommand({ "update": "publishers", "updates": batch });
}
So what this is doing is setting up all of the constructed update statements in a single call to the server, with a sensible number of operations sent in each batch, in this case once every 500 items processed. The actual limit is the BSON document maximum of 16MB, so this can be altered as appropriate for your data.
If your MongoDB version is lower than 2.6, then you can either use the first form or do something similar to the second form using the existing batch insert functionality. But if you choose to insert, then you need to do all the pre-aggregation work within your code.
All of the methods are of course supported with the PHP driver, so it is just a matter of adapting this to your actual code and which course you want to take.