How to use the Spark MongoDB Connector with a conditional query?


Recently I tried to use the MongoDB Connector, following the official documentation:
JavaMongoRDD<Document> rdd = MongoSpark.load(jsc);
But the demo loads all the data from my collection.
I just want to run this command from my Java or Scala code:
db.pointer.find({"inserttime":{$lt:new Date(2018,0,4,7,0,10),$gte:new Date(2018,0,4,7,0,0)}},{"inserttime":1,})
I know I can use RDD.filter() to get the data I want,
but that still queries all the data first, which is not what I want.
Thanks in advance.
EDIT:
Does the connector provide any method for conditional queries that reduces the result data at query time?
Something like the Java driver's filters:
find(and(eq("status", "A"),or(lt("qty", 30), regex("item", "^p"))));

The documentation uses aggregations to filter data at the database level, so you can do the same.
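// the following example was not tested
// java.util.Date's deprecated constructor counts years from 1900, so 118 means 2018
BasicDBObject range = new BasicDBObject("$lt", new Date(118, 0, 4, 7, 0, 10));
range.put("$gte", new Date(118, 0, 4, 7, 0, 0));
// a pipeline stage must name its operator, so wrap the range in a $match on the field
BasicDBObject match = new BasicDBObject("$match", new BasicDBObject("inserttime", range));
JavaMongoRDD<Document> aggregatedRdd = rdd.withPipeline(singletonList(match));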
Date range query trick came from this answer
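For PySpark users, the same idea can be sketched by building the $match stage as a JSON string. This is a minimal sketch under several assumptions: the timestamps from the question are treated as UTC, the helper names to_millis and build_match_pipeline are hypothetical, and the commented-out read call (format name "mongo", option name "pipeline") depends on the connector version, so check it against the documentation for the version you run.

```python
import json
from datetime import datetime, timezone


def to_millis(dt):
    """Milliseconds since the Unix epoch for a timezone-aware datetime."""
    return int(dt.timestamp() * 1000)


def build_match_pipeline(field, start, end):
    """Return a one-stage pipeline matching start <= field < end, as a JSON string."""
    stage = {
        "$match": {
            field: {
                # Dates are written as extended JSON {"$date": <epoch millis>}.
                "$gte": {"$date": to_millis(start)},
                "$lt": {"$date": to_millis(end)},
            }
        }
    }
    return json.dumps([stage])


# Assumption: the timestamps from the question are interpreted as UTC.
start = datetime(2018, 1, 4, 7, 0, 0, tzinfo=timezone.utc)
end = datetime(2018, 1, 4, 7, 0, 10, tzinfo=timezone.utc)
pipeline = build_match_pipeline("inserttime", start, end)
print(pipeline)

# Hypothetical usage with an existing SparkSession named `spark` (not run here;
# the format and option names are assumptions that vary by connector version):
# df = spark.read.format("mongo").option("pipeline", pipeline).load()
```

Because the filter is part of the pipeline, it is meant to be evaluated by MongoDB rather than by Spark after loading the whole collection.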

Related

MongoDB doesn't work as expected (Realm.findAll)

I am new to MongoDB Realm. I followed this guide to get started (https://www.mongodb.com/docs/realm/sdk/java/quick-start-sync/).
This is the implementation to fetch all employees from MongoDB:
val employeeRealmConfig = SyncConfiguration.Builder(
realmApp.currentUser()!!,
AppConfigs.MONGODB_REALM_USER_PARTITION_ID
).build()
backGroundRealm = Realm.getInstance(employeeRealmConfig)
val queryEmployeesTask = backGroundRealm.where<Employee>().findAll()
I print out the size of queryEmployeesTask, but each time I run my application a different result is printed, and the size is always less than 25000. I checked the database with MongoDB Compass, and there are 25000 records for partition AppConfigs.MONGODB_REALM_USER_PARTITION_ID.
I want to get all 25000 records. Could you help me resolve this problem?
Thanks in advance.

After checking the documentation carefully, I realized that the Employee object in the client had a different schema from the MongoDB Atlas schema. After correcting this, val queryEmployeesTask = backGroundRealm.where<Employee>().findAll() returns the correct value.
I hope this helps someone who has the same problem.

multiple aggregations on same column using agg in pyspark

I am not able to get multiple metrics using agg as below.
table.select("date_time")\
.withColumn("date",to_timestamp("date_time"))\
.agg({'date_time':'max', 'date_time':'min'}).show()
I see that the second aggregation overwrites the first.
Can someone help me get multiple aggregations on the same column?

I can't replicate this to make sure that it works, but I would suggest that, instead of using a dict for your aggregations, you try it like this:
from pyspark.sql import functions as F  # F.min/F.max are Spark aggregates, not Python's built-ins
table.select("date_time")\
.withColumn("date", F.to_timestamp("date_time"))\
.agg(F.min("date_time"), F.max("date_time")).show()
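The overwriting in the question happens in Python itself, before Spark sees the argument: a dict literal cannot hold the same key twice, so the later value wins. A short sketch with plain Python and no Spark (the variable names are only illustrative):

```python
# A dict literal with a repeated key keeps only the last value for that key.
aggs = {'date_time': 'max', 'date_time': 'min'}
print(aggs)       # the 'max' entry is gone
print(len(aggs))  # only one entry remains

# A list of (column, function) pairs can repeat a column name without loss.
wanted = [('date_time', 'max'), ('date_time', 'min')]
print(len(wanted))
```

This is why passing separate column expressions to agg, rather than a dict, is the usual way to request several aggregates of one column.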

mongodb: how can I see the execution time for the aggregate command?

I execute the following MongoDB command in the mongo shell
db.coll.aggregate(...)
and I see the list of results. But is it possible to see the query
execution time? Is there an equivalent of the explain method for aggregation queries?

var before = new Date()
// run the aggregation query here
var after = new Date()
execution_mills = after - before

You can add a time function to your .mongorc.js file (in your home directory):
function time(command) {
const t1 = new Date();
const result = command();
const t2 = new Date();
print("time: " + (t2 - t1) + "ms");
return result;
}
and then you can use it like so:
time(() => db.coll.aggregate(...))
Caution
This method doesn't give relevant results for db.collection.find(), because find() returns a cursor and the query is not executed until the cursor is iterated.

I see that in MongoDB it is possible to use these two commands:
db.setProfilingLevel(2)
and then, after the query, you can use db.system.profile.find() to see the query execution time and other details.

Or you can install the excellent mongo-hacker, which automatically times every query, pretty()fies it, colorizes the output, sorts the keys, and more.

I will write an answer to explain this better.
Basically there is no explain() functionality for the aggregation framework yet: https://jira.mongodb.org/browse/SERVER-4504
However, there is a way to measure on the client side, but not without its downsides:
You are not measuring the database
You are measuring the application
There are too many unknowns in the steps in between to get an accurate reading. For example, you can't say that it took 0.04ms for the result documents to be formulated by the MongoDB server, serialised, sent over the wire, de-serialised by the app, and then stored into a hash, which would allow you to subtract that sum from the total to get an aggregation benchmark.
That being said, you might be able to get a reasonably accurate result by doing it in the MongoDB console on the same server as the mongos / mongod. This leaves very few steps in between: still too many, but few enough to maybe get a reading you could roughly trust. As such, you could use @Zagorulkin's answer in that position.
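The client-side measurement described above can be sketched in Python as well. This is a minimal sketch, not a benchmark tool: timed is a hypothetical helper, and fake_aggregate is a stand-in generator so the example runs without a server. With a real driver you would pass a callable that runs your aggregation instead. Note that the result includes the time spent reading and de-serialising every document, which is exactly the caveat raised above.

```python
import time


def timed(command):
    """Run command(), fully consume its result, and return (items, elapsed_ms)."""
    start = time.perf_counter()
    # list() forces lazy cursors or generators to be read to the end,
    # so the measurement covers fetching the results, not just issuing the call.
    items = list(command())
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return items, elapsed_ms


def fake_aggregate():
    """Stand-in for a database call; yields three small documents."""
    for n in range(3):
        yield {"_id": n}


rows, ms = timed(fake_aggregate)
print("time: %.3fms for %d documents" % (ms, len(rows)))
```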

Spring data mongoDB GeoNear query with excluding fields

I don't know if I am doing something wrong or it is a bug.
I have the following code:
Query criteria = new Query(Criteria.where("locationTime").gte(
"date-time"));
criteria.fields().exclude("friends");
NearQuery query = NearQuery.near(point).maxDistance(maxDistance)
.num(limit).query(criteria);
GeoResults<Profile> result = mongoTemplate
.geoNear(query, Profile.class);
When I execute the query, the nearby profiles are retrieved correctly according to distance and the "locationTime" criteria, but it seems to ignore the excluded field and retrieves the profiles with their friends.
When I use a simple query, excluding and including fields works perfectly.
I looked everywhere and could not find a similar use case. Please let me know if I am doing something wrong.
Thanks.

There's no way to limit the fields with a geoNear command, as far as I know.
I looked into calling executeCommand to try to work around the limitations of Spring Data, but it looks like they don't even have a way to do it from the raw command.

pymongo sort grouped results

I need to group and sort by date_published some documents stored in MongoDB, using pymongo.
The group part went just fine :) but when I add .sort() to the query it keeps failing, no matter what I try :(
Here is my query:
db.activities.group(keyf_code,cond,{},reduce_code)
I want to sort by a field called "published" (a timestamp).
I tried
db.activities.group(keyf_code,cond,{},reduce_code).sort({"published": -1})
and many more variations, without any success.
Any ideas?

You can't currently do sort with group in MongoDB. You can use MapReduce instead which does support a sort option. There is also an enhancement request to support group with sort here.

Although MongoDB doesn't do what you want, you can always use Python to do the sorting:
from operator import itemgetter
result = db.activities.group(keyf_code,cond,{},reduce_code)
result = sorted(result, key=itemgetter("published"), reverse=True)
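To show what the sorting step does without a database, here is a small sketch that applies the same sorted call to a hand-written list. The sample documents and their field values are made up for illustration and only stand in for whatever the group call returns.

```python
from operator import itemgetter

# Made-up stand-in for grouped results; "published" is a Unix timestamp.
grouped = [
    {"user": "alice", "published": 1300000200, "count": 2},
    {"user": "bob", "published": 1300000500, "count": 5},
    {"user": "carol", "published": 1300000100, "count": 1},
]

# Sort newest first, the client-side equivalent of sort({"published": -1}).
newest_first = sorted(grouped, key=itemgetter("published"), reverse=True)

for doc in newest_first:
    print(doc["user"], doc["published"])
```

Every document must contain the "published" key, otherwise itemgetter raises a KeyError.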
