How to update a collection and increment hours for an ISO date - MongoDB

I have ISO dates in my collection documents:
"start" : ISODate("2015-07-25T17:35:00Z"),
"end" : ISODate("2015-09-01T23:59:00Z"),
Currently they are in GMT+0 and I need them to be in GMT+8, so I need to add 8 hours to the existing fields. How do I do this via a MongoDB query?
Advice appreciated.
Updated Code Snippet
var offset = 8,
    bulk = db.collection.initializeUnorderedBulkOp(),
    count = 0;

db.collection.find().forEach(function(doc) {
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "startDateTime": new Date(
            doc.startDateTime.valueOf() + ( 1000 * 60 * 60 * offset )
        ) }
    });
    count++;
    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.collection.initializeUnorderedBulkOp();
    }
});

if ( count % 1000 != 0 )
    bulk.execute();
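As an aside, not part of the original question or answers: on MongoDB 4.2+ an update with an aggregation pipeline can do the same shift server-side without the client loop. A minimal sketch, using the field names from the question:
// $add on a Date and a number of milliseconds yields a shifted Date;
// this rewrites both fields in one statement.
db.collection.updateMany({}, [
    { $set: {
        start: { $add: [ "$start", 8 * 60 * 60 * 1000 ] },
        end:   { $add: [ "$end",   8 * 60 * 60 * 1000 ] }
    }}
])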

I agree wholeheartedly with the answer provided by Ewan here in that you really should keep all times in a database in UTC, and all the sentiments there are correct. I am only adding practical examples to that.
As a working example, let's say I have two people using the data, one in New York and one in Sydney, at UTC-5 and UTC+10 respectively. Now consider the following data:
{ "date": ISODate("2015-08-01T04:40:03.389Z") }
Based on that, this is the time the actual "event" takes place. From the perspective of the user in Sydney the event takes place on the 1st of August, whereas to the person in New York it is still occurring on the 31st of July.
If, however, I construct a "localized" time for Sydney as follows, the UTC representation is still correct:
new Date("2015/08/01")
ISODate("2015-07-31T14:00:00Z")
This enforces the time difference as it should by converting from the local timezone to UTC, so a localized date will select the correct values in UTC. The Sydney user's perspective of the start of the 1st of August therefore includes all times from 2pm on the 31st of July, and the end date of a range selection is adjusted similarly. With data in UTC, this assertion from the client end is correct, and from their perspective the selected data is in the expected range.
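As a minimal sketch of that range selection (the collection and field names here are assumptions for illustration):
// Construct the day's bounds in local time; Date values are stored and
// compared as UTC, so the shifted boundaries select the correct documents.
var start = new Date("2015/08/01"); // local midnight -> 2015-07-31T14:00:00Z in Sydney
var end = new Date("2015/08/02");

db.collection.find({ "date": { "$gte": start, "$lt": end } });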
In the case where you are "aggregating" results for a given day, you build the "time difference" math into the expression. So for UTC+10 you would do:
var offset = 10;
db.collection.aggregate([
    { "$group": {
        "_id": {
            "$subtract": [
                { "$add": [
                    { "$subtract": [ "$date", new Date(0) ] },
                    offset * 1000 * 60 * 60
                ]},
                { "$mod": [
                    { "$add": [
                        { "$subtract": [ "$date", new Date(0) ] },
                        offset * 1000 * 60 * 60
                    ]},
                    1000 * 60 * 60 * 24
                ]}
            ]
        },
        "count": { "$sum": 1 }
    }}
])
This takes the "offset" for the locale into consideration when reporting back the "dates" from the perspective of the client viewing the data. So anything that occurred on an "adjusted date" falling on a different day, such as the 31st of August, would be aggregated into the correct grouping by this adjustment.
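Note that the grouped _id comes back as a plain number of milliseconds since epoch (offset-adjusted). A small client-side step, sketched here as an assumption rather than part of the original answer, turns it back into a date:
// Each _id is offset-adjusted milliseconds since epoch; adding it onto
// the epoch Date reconstructs the local-day boundary as a Date object.
db.collection.aggregate([ /* pipeline as above */ ]).forEach(function(doc) {
    printjson({ "day": new Date(doc._id), "count": doc.count });
});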
The fact that your data may very well be used by people in different timezones is exactly the reason why you should keep dates in UTC format. The client will do the work, or you can adjust accordingly where needed.
In short:
Client: Construct in local time, send in UTC
Server: Provide TZ Offset and adjust from UTC to local on return
Leave your dates in the correct format they are already in and use the methods described here to report on them.
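As a minimal sketch of that client-side convention (assumed, for illustration):
// JavaScript Dates hold UTC epoch milliseconds internally, so a Date
// constructed in local time serializes to the correct UTC instant.
var localStart = new Date("2015/08/01");  // interpreted in the client's local timezone
var payload = { "start": localStart.toISOString() }; // "2015-07-31T14:00:00.000Z" in Sydney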
But if you made a mistake
If, however, you made a mistake in the construction of your data and all times are actually "local" times but represented as UTC, i.e.:
ISODate("2015-08-01T11:10:43.569Z") // actually meant to be 11am in UTC+10 :(
Where it should be:
ISODate("2015-08-01T01:10:43.569Z") // This is 11am UTC+10 :)
Then you would correct this as follows:
var offset = 10,
    bulk = db.collection.initializeUnorderedBulkOp(),
    count = 0;

db.collection.find().forEach(function(doc) {
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "date": new Date(
            doc.date.valueOf() - ( 1000 * 60 * 60 * offset )
        ) }
    });
    count++;
    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.collection.initializeUnorderedBulkOp();
    }
});

if ( count % 1000 != 0 )
    bulk.execute();
This reads each document to get the "date" value, adjusts it accordingly, and writes the updated date value back to the document.

By default MongoDB stores all DateTimes as UTC.
There are 2 ways of doing this:
App side (Recommended)
When extracting the start and end from the database, just change each value from a UTC datetime to a local datetime in your language of choice.
For a good example in Python, check out this answer.
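The same idea as a rough mongo shell sketch (field names taken from the question; the shift is for display only, the stored values stay in UTC):
// Shift copies of the stored UTC values to GMT+8 for display;
// the documents themselves are left untouched.
var doc = db.collection.findOne();
var offsetMs = 8 * 60 * 60 * 1000;
var localStart = new Date(doc.start.getTime() + offsetMs);
var localEnd = new Date(doc.end.getTime() + offsetMs);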
Database side (Not recommended)
The other option is to write a MongoDB query which adds 8 hours onto your start and end like you originally wanted. However, this stores a time that is marked as UTC but is actually 8 hours in the future, which is illogical for other developers and when parsing app-side.
This requires updating based on another value in your document, so you'll have to loop through each document as described here.

Related

Calculating moving average for every 5 seconds in MongoDB

I want to calculate a moving average for my data in MongoDB. My data structure is as below:
{
    "_id" : NUUID("54ab1171-9c72-57bc-ba20-0a06b4f858b3"),
    "DateTime" : ISODate("2018-05-30T21:31:05.957Z"),
    "Type" : 3,
    "Value" : NumberDecimal("15.905414991993847")
}
I want to calculate the average of the values for each Type, within 2 days, for every 5 seconds. In this case I put Type in the $match stage, but I would prefer to group and separate the results by Type. What I did is as below:
var start = new Date("2018-05-30T21:31:05.957Z");
var end = new Date("2018-06-01T21:31:05.957Z");
var arr = new Array();
for (var i = 0; i < 34560; i++) {
    start.setSeconds(start.getSeconds() + 5);
    if (start <= end) {
        var a = new Date(start);
        arr.push(a);
    }
}
db.Data.aggregate([
    { $match: { "DateTime": { $gte: new Date("2018-05-30T21:31:05.957Z"),
                              $lte: new Date("2018-06-01T21:31:05.957Z") },
                "Type": 3 } },
    { $bucket: {
        groupBy: "$DateTime",
        boundaries: arr,
        default: "Other",
        output: {
            "count": { $sum: 1 },
            "Value": { $avg: "$Value" }
        }
    }}
])
It seems to be working, but the performance is too slow. How can I make this faster?
I reproduced the behavior you describe with 2 days' worth of 1-second observations in the DB and a $match that pulls just one day's worth. The agg works "fine" if you bucket by, say, 60 seconds. But 15-second buckets took 6 times as long, at 30 seconds. And every 5 seconds? 144 seconds. 5-second buckets yield an array of 17280 buckets. Yep.
So I went client-side, dragged all 43200 docs to the client, and created a naive linear-search bucket slot finder and calculation in JavaScript.
// Assumes osv/endv (the query range bounds), an arr of bucket boundary
// Dates, an empty buckets object, and a findSlot(arr, date) helper that
// returns the bucket index for a given date.
var c = db.foo.aggregate([
    { $match: { "date": { $gte: new Date(osv), $lte: new Date(endv) } } }
]);

c.forEach(function(r) {
    var x = findSlot(arr, r['date']);
    if (buckets[x] == undefined) {
        buckets[x] = { lb: arr[x], ub: arr[x + 1], n: 0, v: 0 };
    }
    var zz = buckets[x];
    zz['n']++;
    zz['v'] += r['val'];
});
This actually ran somewhat faster but in the same order of performance, about 92 seconds.
Next, I changed the linear search in findSlot to a bisection search. The 5-second bucket run went from 144 seconds to 0.750 seconds: almost 200x faster. This includes dragging the 43200 records and running the forEach and bucketing logic above. So it stands to reason that $bucket may not be using a great algorithm and suffers when the bucket array is more than a couple hundred entries long.
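The answer doesn't show the bisection version; the following is a hedged sketch of what findSlot might look like (the helper name comes from the answer, the body is an assumption):
// Binary-search the sorted boundary array for the slot x such that
// arr[x] <= d < arr[x+1].
function findSlot(arr, d) {
    var lo = 0, hi = arr.length - 1;
    while (lo < hi) {
        var mid = (lo + hi) >> 1;
        if (arr[mid + 1] <= d) {
            lo = mid + 1;       // d lies in a later slot
        } else if (arr[mid] > d) {
            hi = mid;           // d lies in an earlier slot
        } else {
            return mid;         // arr[mid] <= d < arr[mid+1]
        }
    }
    return lo;
}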
Acknowledging this, we can instead make use of $floor of the delta between the start time and the observation time to bucket the data:
// Assumes now (a Date marking the start of the range), endv (the end of
// the range), and secondsBucket (e.g. 5) are already defined.
db.foo.aggregate([
    { $match: { "date": { $gte: now, $lte: new Date(endv) } } }
    // Bucket by turning the offset from "now" into a floored division by the
    // number of seconds of grouping. In this way, the resulting number becomes
    // the slot into the virtual buckets, e.g.:
    // date            now             diff/1000   floor @ 5 seconds:
    // 1514764800000   1514764800000   0           0
    // 1514764802000   1514764800000   2           0
    // 1514764804000   1514764800000   4           0
    // 1514764806000   1514764800000   6           1
    // 1514764808000   1514764800000   8           1
    // 1514764810000   1514764800000   10          2
    , { $addFields: { "ff": { $floor: { $divide: [ { $divide: [ { $subtract: [ "$date", now ] }, 1000.0 ] }, secondsBucket ] } } } }
    // Now just group by the numeric slot number!
    , { $group: { _id: "$ff", n: { $sum: 1 }, tot: { $sum: "$val" }, avg: { $avg: "$val" } } }
    // Get it in 0-n order....
    , { $sort: { _id: 1 } }
]);
found 17280 in 204 millis
So we now have a server-side solution that takes just 0.204 seconds, or 700x faster. And you don't have to sort the input because $group will take care of bundling the slot numbers. The $sort after the $group is optional (but sort of handy...)

Execution time in mongo with different amounts of returned results is the same

I query MongoDB with pymongo to get results within a timestamp date range.
My query looks like this:
start_TW = time.time()
startforTF = datetime(2016, 6, 1, 5, 0, 0).isoformat()
endforTF = datetime(2016, 6, 2, 5, 0, 0).isoformat()
pipeline = [
    {"$match": {"properties.timestamp": {"$gte": startforTF, "$lte": endforTF}}},
    {"$project": {"properties.vessel_hash": "$properties.vessel_hash",
                  "geometry.coordinates": "$geometry.coordinates", "_id": 0}}
]
fetchPosOfShipInTimewindow = db.samplecol.aggregate(pipeline, allowDiskUse=True)
end_TW = time.time()
print "Time to fetch all positions inside time frame: ", (end_TW - start_TW)
The query seems to work fine, but I observe something odd: the execution time to fetch the results for a range of one day is pretty much the same as the execution time for a range of ten days.
For a range of one day the results are:
--execution time -> 0.45 s
--records returned from dataset: 1,142,316
For a range of ten days the results are:
--execution time -> 0.32 s
--records returned from dataset: 14,309,233
I do not understand how Mongo manages to achieve this. Did Mongo cache some results?
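One thing worth noting about the measurement itself (an observation added here, not from the original post): aggregate() in PyMongo returns a lazy CommandCursor, so the timing above covers only creating the cursor, not fetching the documents. Draining the cursor inside the timed block would show the real difference:
import time

start_TW = time.time()
# list() forces the cursor to be fully iterated, so the fetch is timed too
results = list(db.samplecol.aggregate(pipeline, allowDiskUse=True))
end_TW = time.time()
print "Time including fetching all positions: ", (end_TW - start_TW)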

How to aggregate OHLC 5 min from 1 min nested array data (mongodb, mongoose)

I have a MongoDB data store with 1-minute OHLCV data like below (time, open, high, low, close, volume), stored using mongoose in nodejs.
{
    "_id": 1,
    "__v": 0,
    "data": [
        [
            1516597690510,
            885000,
            885000,
            885000,
            885000,
            121.2982
        ],
        [
            1516597739868,
            885000,
            885000,
            885000,
            885000,
            121.2982
        ]
        ...
    ]
}
I need to extract data in the same format for 5-minute intervals. I could not find out how to do that in MongoDB/mongoose, even after several hours of searching, as I am a newbie. Kindly help. It is confusing, especially because it is a nested array without named fields inside the arrays.
NOTE: Suppose for 5 min data you have 4 samples (arrays) of 1 min data from the database; then, using the [time, open, high, low, close, volume] order above:
time : time (1st) element of the last 1 min data array (of that 5 min interval)
open : open (2nd) element of the first 1 min data array (of that 5 min interval)
high : max of the high (3rd) elements across all 1 min data arrays (of that 5 min interval)
low : min of the low (4th) elements across all 1 min data arrays (of that 5 min interval)
close : close (5th) element of the last 1 min data array (of that 5 min interval)
volume : volume (last) element of the last 1 min data array (of that 5 min interval)
Please check the visual representation here
The idea is to be able to extract 5 min, 10 min, 30 min, 1 hour, 4 hour, and 1 day intervals in the same manner from the base 1 min database.
You need to use the aggregation pipeline for this: compare the first element in the data array (which is stored as epoch time), get the epoch times of your start and end interval, and use those values in the query.
// start and end hold the interval bounds as epoch milliseconds
// (illustrative values; substitute your own interval)
var start = 1516597690000, end = 1516597990000;

db.col.aggregate([
    { $project: {
        data: {
            $filter: {
                input: "$data",
                as: "d",
                cond: { $and: [
                    { $lt: [ { $arrayElemAt: [ "$$d", 0 ] }, end ] },
                    { $gte: [ { $arrayElemAt: [ "$$d", 0 ] }, start ] }
                ]}
            }
        }
    }}
]).pretty()
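The filter above only narrows the data to the interval. To actually fold the 1-minute samples into 5-minute OHLCV bars, one hedged sketch (assuming MongoDB 3.4+ for $addFields, and the [time, open, high, low, close, volume] layout from the question) is to unwind the nested arrays and group by a 5-minute slot:
db.col.aggregate([
    { $unwind: "$data" },                                    // one doc per 1-min sample
    { $addFields: { t: { $arrayElemAt: [ "$data", 0 ] } } }, // epoch ms of the sample
    { $sort: { t: 1 } },                                     // so $first/$last see time order
    { $group: {
        _id:    { $floor: { $divide: [ "$t", 1000 * 60 * 5 ] } }, // 5-min slot number
        time:   { $last:  { $arrayElemAt: [ "$data", 0 ] } },
        open:   { $first: { $arrayElemAt: [ "$data", 1 ] } },
        high:   { $max:   { $arrayElemAt: [ "$data", 2 ] } },
        low:    { $min:   { $arrayElemAt: [ "$data", 3 ] } },
        close:  { $last:  { $arrayElemAt: [ "$data", 4 ] } },
        volume: { $last:  { $arrayElemAt: [ "$data", 5 ] } }
    }},
    { $sort: { _id: 1 } }
])
Swapping 1000 * 60 * 5 for another interval length gives the 10 min, 30 min, 1 hour, 4 hour, and 1 day variants in the same manner.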

How to create huge random document in MongoDB

I am a newbie with MongoDB. I'm trying to create a database which will contain 10,000 documents. Each document will contain a "username" and a "Birthday".
I want to create 10,000 documents with random usernames and birthdays. Is there a fast way to create this kind of database?
Thank you so much for your help!
Here are some functions that will help you create names and random dates between 1950 and 2000 and insert them into MongoDB:
function getRandomInt(min, max) {
    return Math.floor(Math.random() * (max - min + 1)) + min;
}

function getRandomDate() {
    // approx. number of days from 1970 until 2000: 30 years * 365 days
    var nr_days1 = 30 * 365;
    // approx. number of days from 1950 until 1970: 20 years * 365 days
    var nr_days2 = -20 * 365;
    // milliseconds in one day
    var one_day = 1000 * 60 * 60 * 24;
    // get a random number of days passed between 1950 and 2000
    var days = getRandomInt(nr_days2, nr_days1);
    return new Date(days * one_day);
}

for (var i = 1; i <= 10000; i++) {
    db.test.insert({
        name: "name" + i,
        birthday: getRandomDate()
    });
}
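For 10,000 documents, batching the writes is typically much faster than 10,000 single inserts. A minimal sketch (insertMany is available from MongoDB 3.2, reusing getRandomDate from above):
// Build all documents in memory, then insert them in one batched call;
// the server splits the batch internally as needed.
var docs = [];
for (var i = 1; i <= 10000; i++) {
    docs.push({ name: "name" + i, birthday: getRandomDate() });
}
db.test.insertMany(docs);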
The best way would be to read the MongoDB docs on generating test data:
https://docs.mongodb.com/v2.6/tutorial/generate-test-data/
You could also use a dedicated service to generate random data.
For example:
https://www.mockaroo.com/
I have tried mgeneratejs and it is very easy to use: mgeneratejs
Here is a sample command: mgeneratejs prints data to stdout, then mongoimport imports the data into mongod:
mongodb-osx-x86_64-4.0.1 $ mgeneratejs '{"name": "$name", "age": "$age", "emails": {"$array": {"of": "$email", "number": 3}}}' -n 5 | mongoimport --uri mongodb://localhost:27017/test --collection user --mode insert
2018-08-09T16:19:13.295+0800 connected to: localhost
2018-08-09T16:19:14.544+0800 imported 5 documents

D3.js- How to format tick values as quarters instead of months

I have a set of data in quarters. Here is the array:
var dataGDP = [
    {date: "Q1-2008", GDPreal: "2.8"},
    {date: "Q2-2008", GDPreal: "0.6"},
    {date: "Q3-2008", GDPreal: "-2.1"},
    {date: "Q4-2008", GDPreal: "-4.3"},
    {date: "Q1-2009", GDPreal: "-6.8"},
    {date: "Q2-2009", GDPreal: "-6.3"},
    {date: "Q3-2009", GDPreal: "-5"}
];
How do I get these dates to show up on my X axis like 1Q 2008, 2Q 2008, 3Q 2008, etc.? My X axis uses a time-based scale, and I'm not sure there is a way to parse these dates as they are now using d3.time.format. I can, however, parse them if I use months instead, like 01/2008, 04/2008..., by using: parseDate = d3.time.format("%m/%Y").parse;
Should I write my dates in the array as months and then write a function to convert the months into quarters? Or is there a way to keep the Q1 etc. in the array as it is now and parse the dates?
Here's how I solved this for my data.
Note that I took 10 seconds off the date (x.getTime() - 10000) to account for the data treating 3/31/2015 as midnight on 4/1/2015, which throws off the calculation. Depending on your data, you may or may not have to do this.
var xAxis = d3.svg.axis()
    .scale(x)
    .ticks(d3.time.months, 3)
    .tickFormat(function(x) {
        // get the milliseconds since Epoch for the date
        var milli = (x.getTime() - 10000);
        // calculate new date 10 seconds earlier. Could be one second,
        // but I like a little buffer for my neuroses
        var vanilli = new Date(milli);
        // calculate the month (0-11) based on the new date
        var mon = vanilli.getMonth();
        var yr = vanilli.getFullYear();
        // return appropriate quarter for that month
        if (mon <= 2) {
            return "Q1 " + yr;
        } else if (mon <= 5) {
            return "Q2 " + yr;
        } else if (mon <= 8) {
            return "Q3 " + yr;
        } else {
            return "Q4 " + yr;
        }
    })
    .orient("bottom");
D3 doesn't support quarters (neither parsing nor formatting them). Unless you explicitly need the time-related functionality, you could simply leave those values as they are and use an ordinal scale.
There's a good read here on how to parse quarters manually, but d3 doesn't automate the process:
Google group link
The quarter format was recently added
https://github.com/d3/d3-time-format/pull/58
%q yields an integer in the range [1,4]
To create a format function that yields 'Q1 2020', 'Q2 2020', etc., do the following:
import { utcFormat } from 'd3-time-format';
const quarterFormat = d => `Q${utcFormat('%q %Y')(d)}`;
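A quick usage check (dates assumed for illustration):
// '%q' formats the quarter number, '%Y' the year; the template literal
// prepends the 'Q'.
quarterFormat(new Date(Date.UTC(2020, 0, 15))); // "Q1 2020"
quarterFormat(new Date(Date.UTC(2020, 6, 1)));  // "Q3 2020"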