Get Max id mongodb in Pentaho spoon - pentaho-spoon

I want to get the maximum id in MongoDB using PDI Spoon.
I have these fields in my collection:
Id String
Genre String
Before I insert a new record, I need to get the maximum Id.
Can you help me get the maximum Id?

You can use the MongoDB Input step.
Use the following query:
{
  $group: {
    _id: '',
    theMaxId: {
      $max: "$_id"
    }
  }
}
Note: This is an aggregation framework query, so you must check the "Query is aggregation pipeline" box at the bottom of the query box on the Query tab. If the value you need is in the Id field from your question rather than in _id, use "$Id" in place of "$_id".
As a side note: if you are using MongoDB's own _id values (ObjectIds), they are generated automatically, so you don't need to find the max, increment it, and use it for your new record.
Hope that helps,
Mark
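One caveat, given that the question stores Id as a String: string values compare lexicographically, so the maximum of string ids is not necessarily the largest number. A minimal plain-JavaScript sketch of the difference, using made-up sample values (the array below is an assumption, not data from the question):

```javascript
// Hypothetical sample values; the question only states that Id is a String.
const ids = ["2", "9", "10", "100"];

// Lexicographic maximum: this is how string values are compared.
const maxAsString = ids.reduce((a, b) => (a > b ? a : b));

// Numeric maximum after converting each value to a number.
const maxAsNumber = Math.max(...ids.map(Number));

console.log(maxAsString);
console.log(maxAsNumber);
```

If the ids are meant to be numeric, storing them as numbers (or converting them before taking the maximum) avoids this mismatch.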

Related

How to make a unique number in the _id

Unfortunately, I can't figure out how to replace the ObjectId in _id with a unique, incrementing number.
For example:
Right now the generated documents look like this:
[
{
_id: 'adioj2ouro21jr9o3',
// ...
}
]
And we need this:
[
{
id: 1,
// ...
}
]
The built-in mechanism in MongoDB that auto-generates an ObjectId() is a simple and reliable way to get a unique _id. The ObjectId also contains the insertion date, which sometimes makes troubleshooting easier.
You cannot replace the default _id key with id, but you can have both _id and id.
However, you can store a different value in _id instead of the default ObjectId().
If you want _id to be a number, you can read max(_id) and insert the new document with max(_id) + 1, but this is not a scalable solution: as your write volume increases, it can become a bottleneck.
Finally, it is recommended to leave the default ObjectId() as your auto-generated _id.
https://www.mongodb.com/basics/mongodb-auto-increment
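To illustrate the point that an ObjectId carries its insertion time: the first 4 bytes (8 hex characters) of an ObjectId are a Unix timestamp in seconds. The sketch below decodes that part in plain JavaScript. The helper name objectIdToDate is hypothetical, and the sample id is only an example value.

```javascript
// Return the creation time encoded in the first 4 bytes (8 hex characters)
// of a 24-character ObjectId hex string.
function objectIdToDate(hexId) {
  const seconds = parseInt(hexId.substring(0, 8), 16); // seconds since the Unix epoch
  return new Date(seconds * 1000);
}

// Sample ObjectId hex string, used purely for illustration.
const created = objectIdToDate("615869eac849ec00205aa115");
console.log(created.toISOString());
```

In the mongo shell, calling getTimestamp() on an ObjectId should give the same information without manual decoding.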
I have not used MongoDB myself. It appears that MongoDB doesn't support auto-increment the way MySQL does.
Could you use JavaScript to add 1 to the last id in the collection and manually create the id field as needed?

range operator not returning expected results via mongodb database

I've now tested this via the shell, Studio 3T IDE and within my API itself.
The first query looks something like this:
Notification.find({
userId: ObjectId("...")
}).limit(20).sort('-createdAt')
which returns the following documents sorted by their createdAt timestamp (which is also embedded in the _id field):
I then run the next query which I would expect to return the results starting at _id: ObjectId("615869eac849ec00205aa112"):
Notification.find({
userId: ObjectId("..."),
_id: { $lt: ObjectId("615869eac849ec00205aa115") }
}).limit(20).sort('-createdAt')
I would expect this command to get me my next 20 results sorted in the same descending order as the original query above. However, I get the following:
which includes 3 results from the original query. The _id values of the incorrectly returned results clearly differ from the _id I use as a cursor, but on inspection their createdAt timestamp is exactly the same as the createdAt timestamp of the document whose _id I use for the range query.
The problem is that you are querying on an unsorted field and expecting that value to identify a specific point in the result set.
Note that in the first result set entries 17 through 29 all have the same epoch timestamp in the _id value, and that those 13 entries are not in any particular order.
As luck would have it, entry 20 has the greatest _id of that group, so all 12 of the others are lesser, even the ones that happened to come before.
To make this work, also sort on _id like:
.sort({createdAt: -1, _id: -1})
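The effect of the tie-breaker can be sketched in plain JavaScript with an in-memory array. Everything here is an assumption made for illustration: the documents are made up, numeric _id values stand in for ObjectIds, and the helpers byCreatedAtThenIdDesc and getPage are hypothetical names. It also assumes _id order is consistent with createdAt order.

```javascript
// Hypothetical documents; several share the same createdAt value.
const docs = [
  { _id: 4, createdAt: 200 },
  { _id: 1, createdAt: 100 },
  { _id: 6, createdAt: 300 },
  { _id: 3, createdAt: 200 },
  { _id: 5, createdAt: 200 },
  { _id: 2, createdAt: 100 },
];

// Comparator equivalent to .sort({ createdAt: -1, _id: -1 }).
function byCreatedAtThenIdDesc(a, b) {
  if (a.createdAt !== b.createdAt) return b.createdAt - a.createdAt;
  return b._id - a._id;
}

// Return one page of documents with _id below the cursor
// (or the first page when no cursor is given).
function getPage(allDocs, limit, beforeId) {
  return allDocs
    .filter((d) => beforeId === undefined || d._id < beforeId)
    .sort(byCreatedAtThenIdDesc)
    .slice(0, limit);
}

const page1 = getPage(docs, 2);
const page2 = getPage(docs, 2, page1[page1.length - 1]._id);
console.log(page1.map((d) => d._id), page2.map((d) => d._id));
```

Because ties on createdAt are broken by _id, the last _id of one page marks an unambiguous position, and the next page does not repeat earlier documents.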

How to add text to a MongoDB count query

I'm trying to add a text label to a query that counts the number of employed personnel in a DB. I'm using the existence of an income field as the indicator that a person is employed.
My input is
db.collection.countDocuments({income{$exists:true)}, {as: {"Number Employed:"}})
Any thoughts on why this isn't working?
The countDocuments function returns a number, not a document. The second argument is an options object, not a projection. The valid options are limit, skip, hint, and maxTimeMS.
If you need to return a document with a specific field name, use aggregation:
db.collection.aggregate([
{$match: {income: {$exists: true}}},
{$count: "Number Employed"}
])
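The difference between the two result shapes can be sketched in plain JavaScript without a database. The sample documents below are made up, and the filter is only a stand-in for the $exists match, not MongoDB's implementation.

```javascript
// Hypothetical sample documents; only some have an income field.
const people = [
  { name: "A", income: 1000 },
  { name: "B" },
  { name: "C", income: 2500 },
];

// Plain-JavaScript stand-in for matching on {income: {$exists: true}}.
const employed = people.filter((p) => "income" in p);

// countDocuments-style result: a bare number.
const count = employed.length;

// $count-style result: a document whose field name is the label.
const labelled = { "Number Employed": count };

console.log(count);
console.log(labelled);
```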

Elasticsearch and subsequent Mongodb queries

I am implementing search functionality using Elasticsearch.
I receive a set of usernames from Elasticsearch, after which I need to query a collection in MongoDB for the latest comment of each user in that set.
Question: Let's say I receive ~100 usernames every time I query Elasticsearch. What would be the fastest way to query MongoDB to get the latest comment of each user? Is querying MongoDB 100 times in a for loop using .findOne() the only option?
(Note: Because the latest comment of a user changes very often, I don't want to store it in Elasticsearch, as that would trigger the retrieve-change-reindex process for the entire document far too frequently.)
This answer assumes the following schema for the documents stored in your comments collection:
{
"_id" : ObjectId("5788b71180036a1613ac0e34"),
"username": "abc",
"comment": "Best"
}
Assuming usernames is the list of users you get from Elasticsearch, you can run the following aggregation:
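a = [
  // Keep only comments by the users returned from Elasticsearch
  { $match: { username: { $in: usernames } } },
  // Newest first (ObjectIds increase with creation time)
  { $sort: { _id: -1 } },
  // The first document in each group is that user's latest comment
  {
    $group: {
      _id: "$username",
      latestcomment: { $first: "$comment" }
    }
  }
]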
db.comments.aggregate(a)
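What the pipeline computes can be sketched in plain JavaScript over an in-memory array, which may help when checking the result shape. The sample comments, the numeric _id values, and the helper name latestComments are all assumptions for illustration.

```javascript
// Hypothetical comments; a larger _id means a newer comment.
const comments = [
  { _id: 1, username: "abc", comment: "First" },
  { _id: 2, username: "xyz", comment: "Hello" },
  { _id: 3, username: "abc", comment: "Best" },
  { _id: 4, username: "other", comment: "Ignored" },
];
const usernames = ["abc", "xyz"];

function latestComments(allComments, names) {
  const wanted = new Set(names);
  const latest = new Map();
  const sorted = allComments
    .filter((c) => wanted.has(c.username)) // like $match with $in
    .sort((a, b) => b._id - a._id); // like $sort {_id: -1}
  for (const c of sorted) {
    // like $group with $first: keep only the newest comment per username
    if (!latest.has(c.username)) latest.set(c.username, c.comment);
  }
  return latest;
}

const result = latestComments(comments, usernames);
console.log(Object.fromEntries(result));
```

The point of the aggregation is the same as in this sketch: one pass over the matching comments replaces roughly 100 separate findOne() calls.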
You can try this:
db.foo.find().sort({_id:1}).limit(100);
A value of 1 sorts ascending (old to new) and -1 sorts descending (new to old).

Exclude _id field in mongodb while inserting

Is it possible to have a MongoDB collection without the _id field?
I don't want it because I need to load MongoDB data into Apache Pig, which does not support _id.
So I just don't want the _id field in my MongoDB collections.
Can anyone please help?
Thanks in advance.
No, you can't. The _id field is required for internal purposes in MongoDB. It is the MongoDB equivalent of a primary key in a relational database. Every document must have a unique _id field. It does not need to be an ObjectId, but it must be a value unique within the collection. You can, however, query data without the _id field:
db.yourCollection.find({ ...query... }, { _id: false } );
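If documents have already been fetched with _id included, the field can also be dropped on the client before handing the data to another tool. A minimal plain-JavaScript sketch with a made-up document:

```javascript
// Hypothetical document as it might come back from a find() without a projection.
const doc = { _id: "5788b71180036a1613ac0e34", username: "abc", comment: "Best" };

// Destructure _id away and keep the remaining fields.
const { _id, ...withoutId } = doc;

console.log(withoutId);
```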