I need to find the nth highest salary in MongoDB from the Employees collection.
It would also be really helpful if someone could give me an idea of how to apply joins in MongoDB.
This should work:
db.Employees.find({}).sort({"Emp salary":-1}).limit(1) // first highest salary
db.Employees.find({}).sort({"Emp salary":-1}).skip(1).limit(1) // second highest salary
Similarly, for the nth highest salary you can do db.Employees.find({}).sort({"Emp salary":-1}).skip(nthVariable - 1).limit(1).
Try this out:
db.salary.find({}).sort({s:-1}).skip(1).limit(1);
For your second requirement: MongoDB is a NoSQL database, not a relational one, so it does not support joins in the SQL sense.
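That said, MongoDB 3.2+ does offer a left outer join between collections through the $lookup aggregation stage. A minimal sketch, assuming a hypothetical Departments collection and an assumed deptId reference field on Employees:
db.Employees.aggregate([
    {
        $lookup: {
            from: "Departments",    // hypothetical collection to join with
            localField: "deptId",   // assumed reference field on Employees
            foreignField: "_id",    // assumed key on Departments
            as: "department"        // joined documents land in this array
        }
    }
])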
I found a two-step process for this; it will work even in the scenario where there are multiple records with a salary equal to the highest.
Records I have
{ "_id" : ObjectId("5cc04b02536dc2e493697b4e"), "name" : "Ankit" }
{ "_id" : ObjectId("5cc0504a536dc2e493697b50"), "name" : "Ankit", "salary" : 1000, "email" : "a#b.com", "joining_date" : ISODate("2019-04-24T12:02:18.528Z") }
{ "_id" : ObjectId("5cc0504a536dc2e493697b51"), "name" : "Priya", "salary" : 1300, "email" : "p#b.com", "joining_date" : ISODate("2019-04-24T12:02:18.528Z") }
{ "_id" : ObjectId("5cc0504a536dc2e493697b52"), "name" : "Raj", "salary" : 1200, "email" : "rj#b.com", "joining_date" : ISODate("2019-04-24T12:02:18.528Z") }
{ "_id" : ObjectId("5cc0504a536dc2e493697b53"), "name" : "Vishu", "salary" : 1500, "email" : "v#b.com", "joining_date" : ISODate("2019-04-24T12:02:18.528Z") }
{ "_id" : ObjectId("5cc0504a536dc2e493697b54"), "name" : "Rahul", "salary" : 2000, "email" : "ra#b.com", "joining_date" : ISODate("2019-04-24T12:02:18.528Z") }
{ "_id" : ObjectId("5cc08b5d536dc2e493697b57"), "name" : "Tushar", "salary" : 2000, "email" : "tu#b.com", "joining_date" : ISODate("2019-04-24T16:14:21.061Z") }
Find distinct salaries and store in a variable
sal = db.employee.distinct("salary").sort(function (a, b) { return a - b }) // numeric sort; the default sort() compares elements as strings
Output: [ 1000, 1200, 1300, 1500, 2000 ]
You can get the second highest salary from this array itself. The query below will give you the record with that salary:
db.employee.find({salary:{$lt:sal[sal.length-1]}}).sort({"salary":-1}).limit(1)
Output:
{ "_id" : ObjectId("5cc0504a536dc2e493697b53"), "name" : "Vishu", "salary" : 1500, "email" : "v#b.com", "joining_date" : ISODate("2019-04-24T12:02:18.528Z") }
I see this question has been asked in many technical interviews.
In PyMongo:
from pymongo import MongoClient

client = MongoClient()  # assumes a local mongod on the default port
OBJ = client.my_db.employee_table
OBJ.find({}).sort('salary', -1).limit(1)
Passing 1 as the second argument to sort() means ascending order;
passing -1 means descending order.
Since we want to find the highest salary in the table, we must pass -1.
To find the nth highest salary in the table:
OBJ.find({}).sort('salary', -1).skip(n - 1).limit(1)
To skip rows we use OFFSET in MySQL/SQL; likewise, we use skip() in MongoDB.
For the first and second maximum salary when there may be multiple records per salary in MongoDB:
METHOD I : (if NO duplicate salaries exist)
db.details.find({}).sort({"salary":-1}).limit(1) ==> First Highest Salary
db.details.find({}).sort({"salary":-1}).skip(1).limit(1) ==> Second Highest Salary
METHOD II : (if DUPLICATE salaries exist)
Second Maximum Salary :
sal = db.details.distinct("salary").sort(function (a, b) { return a - b }) ==> sal = [1000, 1400, 1500, 1700, 2000]
db.details.find({salary:{$lt:sal[sal.length-1]}}).sort({"salary":-1}).limit(1)
db.sales.aggregate([{$group:{_id:'$salary'}},{$sort:{_id:-1}},{$skip: 1},{$limit:1}])
Note that after the $group stage the salary is stored in _id, so the $sort key must be _id. In $skip you can pass n - 1 to reach the nth highest distinct salary.
I am trying to count all car dealers whose total fleet units are over 1000. This is the code I wrote to do it; however, it returns 0, and I know for a fact there are quite a few records in this data set that are over 1000.
db.Car_Dealership.find({Totalfleetunits : {$gte: 1000} }).count()
This is a sample of what's in my database; both records have total fleet units over 1000. Any ideas why it returns 0?
"_id" : ObjectId("5a203ab0b9574375830354d4"),
"2016rank" : 6,
"Dealershipgroupname" : "Hendrick Automotive Group",
"Address" : "6000 Monroe Road",
"City/State/Zip" : "Charlotte, NC 28212",
"Phone" : "(704) 568-5550",
"Companywebsite" : "www.hendrickauto.com",
"Topexecutive" : "Rick Hendrick",
"Topexecutivetitle" : "chairman",
"Totalnewretailunits" : "117,946",
"Totalusedunits" : "88,458",
"Totalfleetunits" : "4,646",
"Totalwholesaleunits" : "56,569",
"Total_units" : "267,619",
"Total_number_of _dealerships" : 103,
"Grouprevenuealldepartments*" : "$8,551,253,132",
"2015rank" : 6
}
{
"_id" : ObjectId("5a203ab0b9574375830354d5"),
"2016rank" : 5,
"Dealershipgroupname" : "Sonic Automotive Inc.?",
"Address" : "4401 Colwick Road",
"City/State/Zip" : "Charlotte, NC 28211",
"Phone" : "(704) 566-2400",
"Companywebsite" : "www.sonicautomotive.com",
"Topexecutive" : "B. Scott Smith",
"Topexecutivetitle" : "CEO",
"Totalnewretailunits" : "134,288",
"Totalusedunits" : "119,174",
"Totalfleetunits" : "1,715",
"Totalwholesaleunits" : "35,098",
"Total_units" : "290,275",
"Total_number_of _dealerships" : 112,
"Grouprevenuealldepartments*" : "$9,731,778,000",
"2015rank" : 4
That happens because the value of Totalfleetunits is a string.
To solve your problem you have two options.
option 1:
You can change your schema so that Totalfleetunits is of type Number, and change every document's Totalfleetunits value from a string to a number. For example,
"Totalfleetunits" : "4,646" needs to be changed to "Totalfleetunits" : 4646
option 2:
You can use JavaScript in your query (a $where clause) to first remove the commas from the value and then check the Totalfleetunits value for greater than or equal to ( >= ). You only need to change a single line of code, as given below.
db.Car_Dealership.find("Number(this.Totalfleetunits.replace(/,/g, '')) >= 1000").count()
I understand there are similar questions to this one on Stack Overflow, but the issue is that I cannot actually change the structure of the Mongo Schema. I have a document resembling the following:
{
"_id" : ObjectId("asdfghjkl"),
"recordId" : "0000_11111",
"__v" : 0,
"userid" : "0000",
"date" : ISODate("2017-08-07T07:34:19.505Z"),
"username" : "batman",
"countries" : {
"Philippines" : 1,
"Lebanon" : 1,
"Andorra" : 1,
"Vanuatu" : 1,
"China" : 2,
"Greenland" : 2,
"Denmark" : 1,
"Hong Kong" : 1
}
}
The list of countries is much larger than the example above; imagine up to 100 countries with numbers summing up to 500,000. I need an aggregation that can take the values of the countries subdocument and sum them up. I tried grouping and $sum but with no success.
Any quick solutions for this?
Thanks!
AK
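Since the country names are dynamic keys, one approach that leaves the schema untouched is to convert the subdocument into an array of key/value pairs and sum the values. A sketch, assuming MongoDB 3.4.4+ (where $objectToArray is available) and using the field names from the sample document:
db.collection.aggregate([
    { $match: { userid: "0000" } },  // optional filter, using the sample's userid
    { $project: {
        recordId: 1,
        // turn countries into [{k: "Philippines", v: 1}, ...] and sum the v values
        total: { $sum: { $map: { input: { $objectToArray: "$countries" }, as: "c", in: "$$c.v" } } }
    } }
])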
In the following model a product is owned by a customer and cannot be ordered by other customers, so I know that an order by customer 1 can only contain products owned by customer 1.
To give you an idea here is a simple version of the data model:
Orders:
{
'customer' : 1,
'products' : [
{'productId' : 'a'},
{'productId' : 'b'}
]
}
Products:
{
'id' : 'a',
'name' : 'somename',
'customer' : 1
}
I need to find orders that contain certain products. I know the product id and the customer id, and I'm free to add or change indexes on my database.
Now my question is: is it faster to just add a single-field index on the product id and query using only that id, or should I go for a compound index with customer and product id?
I'm not sure if this matters, but in my real model the list of products is actually a list of objects with an amount and a DBRef to the product, and the customer is also a DBRef.
Here is a full order object:
{
"_id" : 0,
"_class" : "nl.pfa.myprintforce.models.Order",
"orderNumber" : "e35f1fa8-b4c4-4d53-89c9-66abe94a3553",
"status" : "ERROR",
"created" : ISODate("2017-03-30T11:50:50.292Z"),
"finished" : false,
"orderTime" : ISODate("2017-01-12T12:50:50.292Z"),
"expectedDelivery" : ISODate("2017-03-30T11:50:50.292Z"),
"totalItems" : 19,
"orderItems" : [
{
"amount" : 4,
"product" : {
"$ref" : "product",
"$id" : NumberLong(16)
}
},
{
"amount" : 7,
"product" : {
"$ref" : "product",
"$id" : NumberLong(26)
}
},
{
"amount" : 8,
"product" : {
"$ref" : "product",
"$id" : NumberLong(7)
}
}
],
"stateList" : [
{
"timestamp" : ISODate("2017-03-28T11:50:50.074Z"),
"status" : "NEW",
"message" : ""
},
{
"timestamp" : ISODate("2017-03-29T11:50:50.075Z"),
"status" : "IN_PRODUCTION",
"message" : ""
},
{
"timestamp" : ISODate("2017-03-30T11:50:50.075Z"),
"status" : "ERROR",
"message" : "Something went wrong"
}
],
"customer" : {
"$ref" : "customer",
"$id" : ObjectId("58dcf11a71571a24c475c044")
}
}
When I have the following indexes:
1: {"customer" : 1, "orderItems.product" : 1}
2: {"orderItems.product" : 1}
both count queries (I use count() to force examining all matching documents without the network transfer):
a: db.getCollection('order').find({
'orderItems.product' : DBRef('product',113)
}).count()
b: db.getCollection('order').find({
'customer' : DBRef('customer',ObjectId("58de009671571a07540a51d5")),
'orderItems.product' : DBRef('product',113)
}).count()
Both run in the same time of ~0.007 seconds on a set of 200k documents.
When I add 1000k records for a different customer (and different products) it does not affect the time at all.
An extended explain shows that:
query a just uses index 2;
query b uses index 2 but also considered index 1. Perhaps index intersection is used here?
Because if I drop index 1 the results are:
Query a: 0.007 seconds
Query b: 0.035 seconds (5x as long!)
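For reference, the chosen plan can be inspected like this (a sketch; the exact output fields vary by server version):
db.getCollection('order').find({
    'customer' : DBRef('customer', ObjectId("58de009671571a07540a51d5")),
    'orderItems.product' : DBRef('product', 113)
}).explain("executionStats")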
So my conclusion is that with the right indexing both methods work about equally fast. However, if you do not need the compound index for anything else, it is just a waste of space and write speed.
So: a single-field index is better in my case.
I have a mongo document which has structure like
{
"_id" : "THIS_IS_A_DHP_USER_ID+2014-11-26",
"_class" : "weight",
"items" : [
{
"dateTime" : ISODate("2014-11-26T08:08:38.716Z"),
"value" : 98.5
},
{
"dateTime" : ISODate("2014-11-26T08:18:38.716Z"),
"value" : 95.5
},
{
"dateTime" : ISODate("2014-11-26T08:28:38.663Z"),
"value" : 90.5
}
],
"source" : "MANUAL",
"to" : ISODate("2014-11-26T08:08:38.716Z"),
"from" : ISODate("2014-11-26T08:08:38.716Z"),
"userId" : "THIS_IS_A_DHP_USER_ID",
"createdDate" : ISODate("2014-11-26T08:38:38.776Z")
}
{
"_id" : "THIS_IS_A_DHP_USER_ID+2014-11-25",
"_class" : "weight",
"items" : [
{
"dateTime" : ISODate("2014-11-25T08:08:38.716Z"),
"value" : 198.5
},
{
"dateTime" : ISODate("2014-11-25T08:18:38.716Z"),
"value" : 195.5
},
{
"dateTime" : ISODate("2014-11-25T08:28:38.716Z"),
"value" : 190.5
}
],
"source" : "MANUAL",
"to" : ISODate("2014-11-25T08:08:38.716Z"),
"from" : ISODate("2014-11-25T08:08:38.716Z"),
"userId" : "THIS_IS_A_DHP_USER_ID",
"createdDate" : ISODate("2014-11-26T08:38:38.893Z")
}
The query I want to fire on this document structure should:
find documents for a particular user id,
unwind the embedded array,
group the documents on _id, while
summing the items.value of the embedded array and
getting the minimum of the items.dateTime of the embedded array.
Note: I want the sum and min as an object, i.e. { value : <sum>, dateTime : <min of items.dateTime> }, inside an array of items.
Can this be achieved in a single aggregation call using $push or some other technique?
When you group over a particular _id and apply accumulators such as $min and $sum, there exists only one record per group (_id), which holds the sum and the minimum date for that group. So there is no way to obtain a different sum and a different minimum date for the same _id, which would also make no sense logically.
What you would want to do is:
db.collection.aggregate([
{$match:{"userId":"THIS_IS_A_DHP_USER_ID"}},
{$unwind:"$items"},
{$group:{"_id":"$_id",
"values":{$sum:"$items.value"},
"dateTime":{$min:"$items.dateTime"}}}
])
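Run against the two sample documents above (both share that userId), this should produce something like the following; the sums and minimum dates are computed by hand from the samples:
{ "_id" : "THIS_IS_A_DHP_USER_ID+2014-11-26", "values" : 284.5, "dateTime" : ISODate("2014-11-26T08:08:38.716Z") }
{ "_id" : "THIS_IS_A_DHP_USER_ID+2014-11-25", "values" : 584.5, "dateTime" : ISODate("2014-11-25T08:08:38.716Z") }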
But when you do not query for a particular userId, you will have multiple groups, each with its own sum and minimum date. Then it makes sense to accumulate all these results together in an array using the $push operator.
db.collection.aggregate([
{$unwind:"$items"},
{$group:{"_id":"$_id",
"result":{$sum:"$items.value"},
"dateTime":{$min:"$items.dateTime"}}},
{$group:{"_id":null,"result":{$push:{"value":"$result",
"dateTime":"$dateTime",
"id":"$_id"}}}},
{$project:{"_id":0,"result":1}}
])
You could use the following aggregation; it may work for you:
db.collectionName.aggregate([
    {"$unwind" : "$items"},
    {"$match" : {"userId" : "THIS_IS_A_DHP_USER_ID"}},
    {"$group" : {"_id" : "$_id", "sum" : {"$sum" : "$items.value"},
                 "minDate" : {"$min" : "$items.dateTime"}}}
])
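A small efficiency note on the pipeline above: filtering before unwinding avoids expanding documents that would be discarded anyway, so a reordered sketch of the same pipeline is:
db.collectionName.aggregate([
    {"$match" : {"userId" : "THIS_IS_A_DHP_USER_ID"}},  // filter first so $unwind touches fewer documents
    {"$unwind" : "$items"},
    {"$group" : {"_id" : "$_id", "sum" : {"$sum" : "$items.value"},
                 "minDate" : {"$min" : "$items.dateTime"}}}
])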
Using MongoDB version 2.4.4, I have a profile collection containing profile documents.
I have the following query:
Query: { "loc" : { "$near" : [ 32.08290052711715 , 34.80888522811172] , "$maxDistance" : 0.0089992800575954}}
Fields: { "friendsCount" : 1 , "tappsCount" : 1 , "imageUrl" : 1 , "likesCount" : 1 , "lastActiveTime" : 1 , "smallImageUrl" : 1 , "loc" : 1 , "pid" : 1 , "firstName" : 1}
Sort: { "lastActiveTime" : -1}
Limited to 100 documents.
loc - an embedded document containing the keys (lat, lon)
I am getting the exception:
org.springframework.data.mongodb.UncategorizedMongoDbException: too much data for sort() with no index. add an index or specify a smaller limit;
As stated in the exception, when I downsize the limit to 50 it works, but that is not an option for me.
I have the following 2 relevant indexes on the profile collection:
{'loc':'2d'}
{'lastActiveTime':-1}
I have also tried compound index as below but without success.
{'loc':'2d', 'lastActiveTime':-1}
This is an example document (with the relevant keys):
{
"_id" : "5d5085601208aa918bea3c1ede31374d",
"gender" : "female",
"isCreated" : true,
"lastActiveTime" : ISODate("2013-04-08T11:30:56.615Z"),
"loc" : {
"lat" : 32.082230499955806,
"lon" : 34.813542940344945,
"locTime" : NumberLong(0)
}
}
There are other fields in the profile documents; the average profile document size is about 0.5 MB. Correct me if I am wrong, but since I am specifying only the relevant response fields (as I do), that should not be the cause of the problem.
I don't know if it helps, but when I downsize the limit to 50 and the query succeeds,
I have the following explain information (via the MongoVUE client):
cursor : GeoSearchCursor
isMultiKey : False
n : 50
nscannedObjects : 50
nscanned : 50
nscannedObjectsAllPlans : 50
nscannedAllPlans : 50
scanAndOrder : True
indexOnly : False
nYields : 0
nChunkSkips : 0
millis : 10
indexBounds :
This is a blocker for me and I would appreciate your help. What am I doing wrong? How can I make the query run with the needed limit?
Try creating a compound index instead of two indexes.
db.collection.ensureIndex( { 'loc':'2d','lastActiveTime':-1 } )
You can also hint the query about which index to use:
db.collection.find(...).hint('myIndexName')
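hint() also accepts the index key pattern instead of a name. A sketch with the compound index above (the profile collection name is assumed):
db.profile.find(
    { "loc" : { "$near" : [32.08290052711715, 34.80888522811172], "$maxDistance" : 0.0089992800575954 } }
).sort({ "lastActiveTime" : -1 }).limit(100).hint({ 'loc' : '2d', 'lastActiveTime' : -1 })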