spark-dataframe/mongo - append data - mongodb

I need to append data to mongodb using spark-dataframe. For example, let's say there are 100k stocks in a portfolio:
Stock A
Jan 2018
Profit: $30k
Stock B
Jan 2018
Profit: -$10k
MongoDB:
_id: ObjectId('XXX1')
stock: Stock A
monthlyProfit: Array
0: Object
Month: Jan 2018
Profit: 30k
_id: ObjectId('XXX2')
stock: Stock B
monthlyProfit: Array
0: Object
Month: Jan 2018
Profit: -10k
If I were to append February's profit, how do I add an element to the existing array and push it to MongoDB without a performance problem, given that the same update has to happen to all 100k documents in the collection?
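One possible direction, sketched here rather than taken from the thread: express each monthly append as an updateOne with $push and send the updates in unordered batches through bulkWrite, so only the new array element travels to the server instead of the whole document. The helper name buildPushOps, the collection name portfolio, and the February figures are made up for illustration; the field names follow the sample documents above. From Spark, the same batches could be issued per partition, and an index on stock is assumed so that each filter is a cheap lookup.

```javascript
// Hypothetical helper: turn one month's rows into bulkWrite operations.
// Each operation appends a { Month, Profit } object to monthlyProfit.
function buildPushOps(rows) {
  return rows.map((row) => ({
    updateOne: {
      filter: { stock: row.stock },
      update: {
        $push: { monthlyProfit: { Month: row.month, Profit: row.profit } },
      },
    },
  }));
}

// Made-up February rows, only to show the shape of the input.
const febRows = [
  { stock: "Stock A", month: "Feb 2018", profit: "12k" },
  { stock: "Stock B", month: "Feb 2018", profit: "-3k" },
];

const ops = buildPushOps(febRows);

// In the mongo shell or the Node.js driver the batch would be sent as:
// db.portfolio.bulkWrite(ops, { ordered: false });
```

Sending the operations unordered lets the server continue past an individual failure, which is usually what a large batch of independent appends needs.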

Related

Get a month's data excluding one day

I want to get all the data for a month in MongoDB.
Let's say I want all the data from September, except 23 Sep.
I thought of:
createdAt >= 1 Sep AND createdAt not in 23, 24 Sep
but only the "createdAt not in 23, 24 Sep" condition is executed.
Are there other ways?
db.getCollection('myTest').find({
"$and":[{
createdAt:{ "$gte": ISODate("2019-09-01T16:00:00.000Z")},
createdAt:{ "$nin": [ISODate("2019-09-23T16:00:00.000Z"), ISODate("2019-09-24T16:00:00.000Z")]}
}]
})
db.test.aggregate([
{$project: {date: "$date", month: {$month: "$date"}, year: {$year: "$date"}}},
{$match: {$and: [{date: {$nin: [ISODate("2019-09-23T00:00:00Z")]}}, {month: 9}, {year: 2019}]}},
{$project: {_id: 1, date: "$date"}}
])
The $month and $year operators return the month and year numbers of a date.
$nin excludes the listed dates, and matching on the month and year numbers selects a particular month of a year.
The final $project keeps only the required fields.
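Note that $nin compares whole timestamps, so it only excludes documents stamped at exactly the listed instant. If the goal is to drop everything that falls on a given calendar day, a range-based filter may be a better fit. The following is a minimal sketch, not part of the original answer: the helper name monthExceptDay is hypothetical, and it assumes day boundaries in UTC (the question's own values are shifted to 16:00Z, so the boundaries would need the same shift).

```javascript
// Hypothetical helper: build a filter for one month minus one calendar day.
// month is 1-based (9 = September); all boundaries are computed in UTC.
function monthExceptDay(year, month, day) {
  const monthStart = new Date(Date.UTC(year, month - 1, 1));
  const monthEnd = new Date(Date.UTC(year, month, 1)); // first instant of next month
  const dayStart = new Date(Date.UTC(year, month - 1, day));
  const dayEnd = new Date(Date.UTC(year, month - 1, day + 1));
  return {
    $and: [
      // Inside the month...
      { createdAt: { $gte: monthStart, $lt: monthEnd } },
      // ...and either before the excluded day or after it.
      { $or: [{ createdAt: { $lt: dayStart } }, { createdAt: { $gte: dayEnd } }] },
    ],
  };
}

const filter = monthExceptDay(2019, 9, 23);

// Usage in the shell: db.getCollection('myTest').find(filter)
```

Each condition sits in its own object inside $and. Writing the same key twice in a single object literal keeps only the last one, which is why only the $nin part took effect in the question's query.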

How do I use the $gte and $lte for dates in mongodb?

I have data in a collection. Each document is stored with a timestamp indicating when it was inserted into the collection.
I have an issue formulating a query that shows the data stored between the beginning and the end of a specific month.
Kindly help me with what is wrong with my query.
Find below the content of my collection.
Meteor.users.find({}).fetch();
(2) [{…}, {…}]
0: {_id: "ZwT3hcmabytf82wyK", createdAt: Mon Dec 03 2018 11:16:53 GMT+0300 (East Africa Time)}
1: {_id: "SEXZoHzt3ryKC2a4r", createdAt: Mon Dec 03 2018 11:16:34 GMT+0300 (East Africa Time)}
length: 2
Kindly note that createdAt is the time the data was inserted, here Monday 3rd December 2018.
Now note my query below. The beginningOfMonth and endOfMonth variables are both set to November (beginningOfMonth.setMonth(10, 1) and endOfMonth.setMonth(10, 28)), and yet the result of the query is wrong:
var beginningOfMonth = new Date();
var endOfMonth = new Date();
beginningOfMonth.setMonth(10, 1);
endOfMonth.setMonth(10, 28);
Meteor.users.find({}, { createdAt: { $gte: beginningOfMonth, $lt: endOfMonth} }).fetch();
[{…}]
0:{_id: "SEXZoHzt3ryKC2a4r",
createdAt: Mon Dec 03 2018 11:16:34 GMT+0300 (East Africa Time),
length: 1__proto__: Array(0)
Note that the results of the query are wrong. The query seems to ignore the date condition createdAt: { $gte: beginningOfMonth, $lt: endOfMonth}.
Kindly help and show me what I am doing wrong.
Thanks in advance
show me what I am doing wrong.
You're not using the Meteor API properly. Pass your criteria (the selector) as the first parameter; the second parameter is for options.
Meteor.users.find({ createdAt: { $gte: beginningOfMonth, $lt: endOfMonth} }).fetch();
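As a side note, setMonth on new Date() keeps the current year and time of day, and an upper bound on the 28th leaves out the last days of November. A minimal sketch of computing whole-month bounds instead, assuming UTC boundaries are acceptable; the helper name monthRange is hypothetical:

```javascript
// Hypothetical helper: half-open range [first instant of the month, first instant of the next).
// monthIndex is 0-based, as in the Date API (10 = November).
function monthRange(year, monthIndex) {
  const start = new Date(Date.UTC(year, monthIndex, 1));
  // Date.UTC rolls a month index of 12 over into January of the next year.
  const end = new Date(Date.UTC(year, monthIndex + 1, 1));
  return { $gte: start, $lt: end };
}

const selector = { createdAt: monthRange(2018, 10) };

// Usage in Meteor: Meteor.users.find(selector).fetch();
```

Using $lt against the first instant of the following month avoids having to know how many days the month has.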

Mongo db $group dynamic expression

I have a set of logs with timestamps and need to group those logs by a non-existent 'virtual session'.
A new session begins when there is at least half an hour between the last log of the previous session and the first log of this one.
For example we have following set of data:
[
{
id: "b4f0d0d7-495b-48db-95bf-d5ac0c8c9e9b",
time: 1461872894322,
timestamp: "Apr 28, 2016 7:48:14 PM"
},
{
id: "bf55ca2f-b544-406c-bed6-766a1204683d",
time: 1461872937941,
timestamp: "Apr 28, 2016 7:48:57 PM"
},
{
id: "7f2ab420-0434-46f8-9444-6e2ffa73aea8",
time: 1461873088155,
timestamp: "Apr 28, 2016 7:51:28 PM"
},
{
id: "dd31124c-0375-454a-acca-c239465a2b22",
time: 1461839257257,
timestamp: "Apr 28, 2016 10:27:37 AM"
},
{
id: "a4370974-bfea-408f-aa69-973961e9f058",
time: 1461839281324,
timestamp: "Apr 28, 2016 10:28:01 AM"
}
]
This should be grouped into two virtual sessions. Once the logs are grouped I can get the min and max time for each group with $group in a Mongo aggregate, but how do I write the correct grouping expression?
Expected answer is something like
[
{min: 1461872894322, max: 1461873088155},
{min: 1461839257257, max: 1461839281324}
]
Unfortunately, there is no way to do this with a Mongo query alone, as there is no handle on the previous row (as you would have with CTEs, common table expressions, in SQL).
To solve this problem you need to process the data client side (or with JavaScript in the mongo console, like a stored procedure in the SQL world): iterate over all documents in time order, check the time gap, and add a grouping indicator to each document in the collection.
Then you will be able to group by the added grouping indicator.
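The client-side pass can be sketched as follows. This is an illustration under assumptions, not the answerer's code: the function name groupSessions is hypothetical, a gap of more than 30 minutes is taken to start a new session, and instead of writing an indicator back to the collection it computes each session's min and max directly.

```javascript
const GAP_MS = 30 * 60 * 1000; // half an hour in milliseconds

// Hypothetical helper: sort logs by time and start a new session
// whenever the gap to the previous log exceeds gapMs.
function groupSessions(logs, gapMs = GAP_MS) {
  const sorted = [...logs].sort((a, b) => a.time - b.time);
  const sessions = [];
  for (const log of sorted) {
    const current = sessions[sessions.length - 1];
    if (current && log.time - current.max <= gapMs) {
      current.max = log.time; // still inside the current session
    } else {
      sessions.push({ min: log.time, max: log.time }); // gap too large: new session
    }
  }
  return sessions;
}

// The time values from the sample data in the question.
const logs = [
  { time: 1461872894322 },
  { time: 1461872937941 },
  { time: 1461873088155 },
  { time: 1461839257257 },
  { time: 1461839281324 },
];

// Sessions come back in ascending time order.
const sessions = groupSessions(logs);
```

To follow the answer literally, the index of the session could be stored on each document as the grouping indicator and then used as the _id of a $group stage.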
I was thinking of using $let, as it can access an external variable, but that access is read-only, so we cannot rely on it.
Have fun!
Any comments welcome.

Between query in mongoDb by using start date and end Date (IST)

I am able to get data for an integer field from my MongoDB collection with the queries below, using the umongo UI.
{ sequence: { $gt: 4035 } }
{sequence:{$gte:4035,$lte:4035}}
Both of the queries above work fine,
but in the same collection I have a field named schedules, and inside schedules there are startDate and endDate fields.
Using these fields I want to run a "between" clause on the start date and end date, like below, but no records are shown.
{'schedules.start': {'$gte': "1-1-2014",'$lt': "14-1-2016"}}
or
{'schedules.start': {$gte: '1-1-2014',$lt: '14-1-2016'}}
or
{'schedules.start': {$gte: 1-1-2014,$lt: 14-1-2016}}
or
{ 'schedules': {'$gt': 'date':'Wed Jan 01 05:30:00 IST 2014', '$lt': 'date' :'Wed Dec 31 05:30:00 IST 2014'}}
So I need a "between" query based on startDate and endDate.
My table structure:
site--> schedules-->star(date):Wed Jan 01 05:30:00 IST 2014
end(date) :Wed Dec 31 05:30:00 IST 2014
Here I am attaching screen shot of my table.
Thank you in advance
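Can you wrap it with an $and clause?
{
$and: [
{'schedules.startDate': {$gte: ISODate("2014-01-01T00:00:00Z")}},
{'schedules.endDate': {$lt: ISODate("2016-01-14T00:00:00Z")}}
]
}
Obviously you'll need to adjust the field names to your model. The bounds must be real date values: an unquoted 1-1-2014 is evaluated as arithmetic (-2014), and a quoted string does not match a stored date.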
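The d-m-yyyy strings from the question can be converted to Date values before they go into the query. A minimal sketch, assuming the strings mean midnight UTC (which is what the stored 05:30 IST values correspond to); the helper name parseDayMonthYear is hypothetical and the field path schedules.start is taken from the question:

```javascript
// Hypothetical helper: parse a "d-m-yyyy" string into a Date at midnight UTC.
function parseDayMonthYear(text) {
  const [day, month, year] = text.split("-").map(Number);
  return new Date(Date.UTC(year, month - 1, day)); // Date months are 0-based
}

const query = {
  "schedules.start": {
    $gte: parseDayMonthYear("1-1-2014"),
    $lt: parseDayMonthYear("14-1-2016"),
  },
};

// Usage in the shell, with a hypothetical collection name: db.site.find(query)
```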

Date range queries mongodb

I have two collections: one is 'random' and the other is 'msg'.
In the message collection I have a document like
{ "message": "sssss", postedAt: Fri Jul 17 2015 09:03:43 GMT+0530 (IST) }
For the random collection, there is a script which generates a random string every minute,
like
{ "randomStr": "sss", postedAt: Fri Jul 17 2015 09:03:43 GMT+0530 (IST) }
{ "randomStr": "xxx", postedAt: Fri Jul 17 2015 09:04:43 GMT+0530 (IST) }
{ "randomStr": "yyy", postedAt: Fri Jul 17 2015 09:05:43 GMT+0530 (IST) }
Notice the change in timings: every minute there is a new record.
Now, my issue is this:
when I query the message collection, I get one record.
Let's say this one:
{ "message": "sssss", postedAt: Fri Jul 17 2015 09:03:13 GMT+0530 (IST) }
Now I want to get the record from the random collection that was posted in that exact minute.
This message was posted at 09:03, so I want the record from the random collection whose postedAt is also 09:03.
How do I do that? Any help appreciated. Thanks.
Note: I'm doing this in meteor
Edit
Added image for first comment
So the point here is to "actually use a range", and also to be aware of the Date value you get in return. As a basic example of the principle:
The time right now as I execute this is:
var date = new Date()
ISODate("2015-07-17T04:02:04.471Z")
If you presume that the actual timestamp in your document is "not exactly to the minute" (like the one above), and that the "random" record is unlikely to be either, then the first thing to do is "round" it down to the minute:
date = new Date(date.valueOf() - date.valueOf() % ( 1000 * 60 ))
ISODate("2015-07-17T04:02:00Z")
And of course the "end date" is just one minute after that:
var endDate = new Date(date.valueOf() + ( 1000 * 60 ))
ISODate("2015-07-17T04:03:00Z")
Then when you query "random" you just ask for the "range", with the $gte and $lt operators:
db.random.find({ "postedAt": { "$gte": date, "$lt": endDate } })
This retrieves your single "written once a minute" item from "random", whatever its exact value within that minute.
So basically:
Round the date you retrieved down to the minute
Search on the "range" between that value and the next minute
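The two steps can be wrapped in one small function. This is a sketch built from the expressions in the answer; the helper name minuteRange and the Meteor collection name Random are assumptions:

```javascript
const MINUTE_MS = 1000 * 60;

// Hypothetical helper: round a date down to the minute and
// return the half-open range covering that minute.
function minuteRange(date) {
  const start = new Date(date.valueOf() - (date.valueOf() % MINUTE_MS));
  const end = new Date(start.valueOf() + MINUTE_MS);
  return { $gte: start, $lt: end };
}

// postedAt would come from the message document; this value is the one used in the answer.
const postedAt = new Date("2015-07-17T04:02:04.471Z");
const selector = { postedAt: minuteRange(postedAt) };

// Usage in Meteor, with a hypothetical collection object: Random.findOne(selector)
```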