MongoDB query dates get no results

I maintain a database called "human" with more than 300,000 news documents, but when I query for documents with a date after 2018-01-04 it always returns 0 results.
> db.human.count({"crawled_time":{"$gte":new Date("2018-01-04")}})
0
> db.human.count({"crawled_time":{"$gte":new Date("2018-01-04T00:00:00.000Z")}})
0
> db.human.count({"version_created":{"$gte":new Date("2018-01-04T00:00:00.000Z")}})
0
> db.human.count({"version_created":{$gte:new Date("2018-01-04T00:00:00.000Z")}})
0
> db.human.count({"version_created":{$gte:new Date("2018-01-04T00:00:00Z")}})
0
> db.human.count({"version_created":{"$gte":ISODate("2018-01-04T00:00:00.0000Z")}})
0
A sample of the JSON documents looks like this:
{"_id":"21adb21dc225406182f031c8e67699cc","_class":"com.pats.reuters.pojo.NewsData","alt_id":"nWGB30349","audiences":["NP:MNI"],"body":"ISSUER: City of Spencer, IA\nAMOUNT: $1,500,000\nDESCRIPTION: General Obligation Corporate Purpose Bonds, Series 2018\n------------------------------------------------------------------------\nSELLING: Feb 5 TIME: 11:00 AM., EST\nFINANCIAL ADVISOR: PFM Fin Advisors\n------------------------------------------------------------------------\n ","first_created":"2018-01-30T06:12:05.000Z","headline":"SEALED BIDS: City of Spencer, IA, $1.5M Ult G.O. On Feb 5","instances_of":[],"language":"en","message_type":2,"mime_type":"text/plain","provider":"NS:RTRS","pub_status":"stat:usable","subjects":["A:R","E:T","E:5I","E:S","A:95","A:85","M:1QD","N2:MUNI","N2:PS1","N2:SL1","N2:CM1","N2:IA1","N2:GOS"],"takeSequence":1,"urgency":3,"version_created":"2018-01-30T06:12:05.000Z","source_id":"WGB30349__180130279kIQIcAh81BiGVmb/Js54Wg3naQC6GXEu9+H","crawled_time":"2018-01-30 14:12:05"}
{"_id":"8ba08c4af9464c6b23cc508645d5bf03","_class":"com.pats.reuters.pojo.NewsData","alt_id":"nWGB3034a","audiences":["NP:MNI"],"body":"ISSUER: City of Long Branch, NJ\nAMOUNT: $31,629,415\nDESCRIPTION: Bond Anticipation Notes, Consisting of $22,629,415 Bond Anticipation Notes, Series 2018A and\n------------------------------------------------------------------------\nSELLING: Feb 1 TIME: 11:30 AM., EST\nFINANCIAL ADVISOR: N.A.\n------------------------------------------------------------------------\n ","first_created":"2018-01-30T06:12:06.000Z","headline":"SEALED BIDS: City of Long Branch, NJ, $31.629M Ult G.O. On Feb 1","instances_of":[],"language":"en","message_type":2,"mime_type":"text/plain","provider":"NS:RTRS","pub_status":"stat:usable","subjects":["G:6J","A:R","E:T","E:5I","E:S","A:9M","E:U","M:1QD","N2:US","N2:MUNI","N2:PS1","N2:SL1","N2:CM1","N2:NJ1","N2:NT1"],"takeSequence":1,"urgency":3,"version_created":"2018-01-30T06:12:06.000Z","source_id":"WGB3034a__1801302ksv4Iy0zSP5cscas0FlZgu1TpQ4Zh25VKCtSt","crawled_time":"2018-01-30 14:12:06"}
{"_id":"537f70076ef056c9a43d30c89500353a","_class":"com.pats.reuters.pojo.NewsData","alt_id":"nWGB3034b","audiences":["NP:MNI"],"body":"ISSUER: Independent School District No. 76 of Canadian County (Calumet), OK\nAMOUNT: $1,630,000\nDESCRIPTION: Combined Purpose Bonds of 2018\n------------------------------------------------------------------------\nSELLING: Feb 12 TIME: 05:00 PM., EST\nFINANCIAL ADVISOR: Stephen H. McDonald\n------------------------------------------------------------------------\n ","first_created":"2018-01-30T06:12:07.000Z","headline":"SEALED BIDS: Independent School District No. 76 of Canadian County (Calumet), OK, $1.63M Ult G.O. On Feb 12","instances_of":[],"language":"en","message_type":2,"mime_type":"text/plain","provider":"NS:RTRS","pub_status":"stat:usable","subjects":["A:R","E:T","E:5I","E:S","A:9R","M:1QD","N2:MUNI","N2:PS1","N2:SL1","N2:CM1","N2:OK1"],"takeSequence":1,"urgency":3,"version_created":"2018-01-30T06:12:07.000Z","source_id":"WGB3034b__1801302ev7DqID2Wr/BAJHrC/plpNKBQhrfuHBnlSldz","crawled_time":"2018-01-30 14:12:07"}

Your field values are not Date objects; they are strings, so comparing them against new Date("2018-01-04") never matches. Compare against a string instead:
db.human.count({"crawled_time":{"$gte":"2018-01-04"}})
db.human.count({"version_created":{"$gte":"2018-01-04"}})
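Why the string comparison works: the stored timestamps are zero-padded, most-significant-part-first strings, and BSON compares strings lexicographically, so string order matches chronological order. A quick check in Python (sample values taken from the documents above):

```python
# ISO-style timestamp strings are zero-padded with the most significant
# unit first, so lexicographic (string) order equals chronological order.
version_created = "2018-01-30T06:12:05.000Z"  # sample value from the data
crawled_time = "2018-01-30 14:12:05"          # sample value from the data
cutoff = "2018-01-04"

print(version_created >= cutoff)   # True
print(crawled_time >= cutoff)      # True
print("2017-12-31" >= cutoff)      # False
```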


daily refresh of MongoDB Collection with Insert and update

I have a MongoDB collection where certain fields need to be refreshed every night. The target collection has 3 extra custom fields that hold end-user input for the respective documents.
When the daily refresh happens overnight, the data source can send new documents or updated data for existing documents. There can be up to 10,000 documents.
I am using PyMongo and MongoDB to achieve this. My problem: how do I identify which records need to be updated and which need to be inserted with those 3 extra custom fields, without impacting the end-user data?
For Example:
Data Source:
Manufacture Name Model Year Units
BMW 5Series 2019 10
BMW 5Series 2020 5
AUDI A4 2020 20
AUDI A7 2019 3
TOYOTA COROLLA 2020 5
TOYOTA CAMRY 2020 6
HONDA ACCORD 2020 10
HONDA PILOT 2019 15
HONDA CRV 2019 20
Once loaded, the App table has 1 custom column (Location) for user input
Manufacture Name Model Year Location Units
BMW 5Series 2019 London 10
BMW 5Series 2020 New York 5
AUDI A4 2020 Melbourne 20
AUDI A7 2019 London 3
TOYOTA COROLLA 2020 New York 5
TOYOTA CAMRY 2020 London 6
HONDA ACCORD 2020 Sydney 10
HONDA PILOT 2019 Tokyo 15
HONDA CRV 2019
On the second day, we get new data as below
Manufacture Name Model Year Units
BMW 5Series 2019 10
BMW 5Series 2020 **35**
**BMW 7Series 2020 12**
AUDI A4 2020 20
AUDI A7 2019 3
**AUDI A6 2019 1**
TOYOTA COROLLA 2020 5
TOYOTA CAMRY 2020 6
HONDA ACCORD 2020 10
HONDA PILOT 2019 15
*HONDA CRV 2019 20* *(deleted in the second refresh)*
The data can be up to 10,000 records. How can I achieve this with PyMongo or MongoDB? I wrote PyMongo code up to the point of retrieving the source data and storing the cursor contents in a dictionary. I am not sure how to proceed from there using an upsert or bulk write while preserving/updating the Location column for existing records and assigning NULL values for new records.
Thanks
Finally, this is achieved as below:
import pymongo
from pymongo import UpdateOne
from datetime import datetime

# Define MongoDB connection (adjust server/credentials to your environment)
server = 'mongodb://localhost:27017'
client = pymongo.MongoClient(server, username='User', password='password',
                             authSource='DBName', authMechanism='SCRAM-SHA-256')
db = client['DBName']

# Source collection
collection2 = db['cars2']
count123 = collection2.count_documents({})
print("Cars2 - Count in source:", count123)
source_cursor = collection2.find()

# Target collection
collection = db['cars']
tgt_count = collection.count_documents({})
print("Cars Collection - Count before Insert:", tgt_count)

# Since this is a MongoDB Cursor object, we need to push it into a list
sourcedata = list(source_cursor)
source_cursor.close()

# Add the new columns to the data before writing to MongoDB
for item in sourcedata:
    item['Location'] = None
    item['last_refresh'] = datetime.now()

ops = []
if tgt_count == 0:
    print("Loading for the first time:")
    for rec in sourcedata:
        # Load data for the first time, including the new fields
        ops.append(UpdateOne(
            {'name': rec['name'], 'model': rec['model']},
            {'$set': {'name': rec['name'], 'model': rec['model'],
                      'year': rec['year'], 'units': rec['units'],
                      'Location': rec['Location'],
                      'last_refresh': rec['last_refresh']}},
            upsert=True))
else:
    print("Updating the load:")
    for rec in sourcedata:
        # The Location field is left out here so existing user-entered
        # values are not replaced with NULL
        ops.append(UpdateOne(
            {'name': rec['name'], 'model': rec['model']},
            {'$set': {'name': rec['name'], 'model': rec['model'],
                      'year': rec['year'], 'units': rec['units'],
                      'last_refresh': rec['last_refresh']}},
            upsert=True))

result = collection.bulk_write(ops)
print("Matched Count:", result.matched_count)
print("Modified Count:", result.modified_count)
print("Upserted Count:", result.upserted_count)

# Because "Location" was not included in the update branch, newly upserted
# records do not have a "Location" field, so update the collection once
# more to add it (querying {'Location': None} also matches missing fields)
nullfld_result = collection.update_many({'Location': None},
                                        {'$set': {'Location': None}})

count2 = collection.count_documents({})
print("Count after Insert:", count2)
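For what it's worth, the second pass can be avoided: MongoDB's $setOnInsert operator applies its fields only when the upsert inserts a new document. A minimal sketch (assuming the same name/model key fields as above) that builds the filter/update pairs:

```python
def build_refresh_ops(sourcedata, refresh_time):
    """Build (filter, update) pairs for an upsert-based nightly refresh.

    $set overwrites the refreshed fields on every matched document, while
    $setOnInsert initialises Location only for brand-new documents, so
    user-entered Location values on existing records are never touched.
    """
    ops = []
    for rec in sourcedata:
        flt = {'name': rec['name'], 'model': rec['model']}
        upd = {'$set': {'year': rec['year'], 'units': rec['units'],
                        'last_refresh': refresh_time},
               '$setOnInsert': {'Location': None}}
        ops.append((flt, upd))
    return ops
```

Each pair can then be wrapped as UpdateOne(flt, upd, upsert=True) and passed to collection.bulk_write() in a single call.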

How to add blank records for grouping based on formula in Crystal Reports

I have one table and group its records using a formula based on a string field that holds a time (HH:mm:ss).
The formula is as follows:
select Minute (TimeValue ({MASTER.Saat}))
case 0 to 14: ReplicateString ("0", 2-len(TOTEXT(Hour (TimeValue ({MASTER.Saat})),0))) & TOTEXT(Hour (TimeValue ({MASTER.Saat})),0) & ":00:00"
case 15 to 29: ReplicateString ("0", 2-len(TOTEXT(Hour (TimeValue ({MASTER.Saat})),0))) & TOTEXT(Hour (TimeValue ({MASTER.Saat})),0) & ":15:00"
case 30 to 44: ReplicateString ("0", 2-len(TOTEXT(Hour (TimeValue ({MASTER.Saat})),0))) & TOTEXT(Hour (TimeValue ({MASTER.Saat})),0) & ":30:00"
case 45 to 59: ReplicateString ("0", 2-len(TOTEXT(Hour (TimeValue ({MASTER.Saat})),0))) & TOTEXT(Hour (TimeValue ({MASTER.Saat})),0) & ":45:00"
Actually, the grouping works fine, but my problem is that if there is no data in the table for a period, I cannot show that period in the report.
As an example;
Let my data has 5 records as following:
11:01:03
11:16:07
11:28:16
12:18:47
12:22:34
My report gives the result as following:
Period | Total Records
11:00:00 | 1
11:15:00 | 2
12:15:00 | 2
In this situation, I cannot show the periods that are missing from the table with 0 for Total Records. I need to show them as follows:
Period | Total Records
11:00:00 | 1
11:15:00 | 2
11:30:00 | 0
11:45:00 | 0
12:00:00 | 0
12:15:00 | 2
Thanks for all suggestions.
You can't group something that's not there. One way to solve this is to use a table that provides all of the intervals you want to look at (called a date, time, or number table).
For your case, create a table that contains all the period values (24 x 4 = 96 rows). Join the records you want to count to this table, and in Crystal Reports group by the period values: your result set will then contain the periods without any joined records, and you can detect this and output a 0.
You may want to look at this question; it is similar to yours.
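The same zero-filling idea can be sketched outside Crystal Reports as well. A minimal Python sketch (the quarter-hour bucketing mirrors the formula above): generate all 96 period labels, bucket each record's time into its period, and report 0 for empty periods.

```python
from collections import Counter

def period_of(hhmmss):
    """Map an 'HH:mm:ss' string to its quarter-hour period label."""
    h, m, _ = hhmmss.split(':')
    return f"{int(h):02d}:{(int(m) // 15) * 15:02d}:00"

def counts_by_period(times):
    """Count records per quarter-hour period, with 0 for empty periods."""
    hits = Counter(period_of(t) for t in times)
    periods = [f"{h:02d}:{q:02d}:00" for h in range(24) for q in (0, 15, 30, 45)]
    return {p: hits.get(p, 0) for p in periods}
```

With the question's five sample times, counts_by_period yields 1 for "11:00:00", 2 for "11:15:00", 0 for "11:30:00" through "12:00:00", and 2 for "12:15:00".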

Merging average of time series corresponding to time span in a different data set

I have two datasets, one with contracts and one with market prices. The gist of what I am trying to accomplish is to find the average value of a time series that corresponds to a period of time in a cross-sectional data set. Please see below.
Example Dataset 1:
Beginning Ending Price
1/1/2014 5/15/2014 $19.50
3/2/2012 10/9/2015 $20.31
...
1/1/2012 1/8/2012 $19.00
In the example above there are several contracts, the first spanning from January 2014 to May 2014, the second from March 2012 to October 2015. Each one has a single price. The second dataset has weekly market prices.
Example Dataset 2:
Date Price
1/1/2012 $18
1/8/2012 $17.50
....
1/15/2015 $21.00
I would like to find the average "market price" (i.e. the average of the price in dataset 2) between the beginning and ending period for each contract on dataset 1. So, for the third contract from 1/1/2012 to 1/8/2012, from the second dataset the output would be (18+17.50)/2 = 17.75. Then merge this value back to the original dataset.
I work with Stata, but can also work with R or Excel.
Also, if you have a better suggestion for a title I would really appreciate it!
You can cross the contract cross-section data with the time series, which forms every pairwise combination, then drop the prices that fall outside each contract's date range and calculate the mean, like this:
/* Fake Data */
tempfile ts ccs
clear
input str9 d p_daily
"1/1/2012" 18
"1/8/2012" 17.50
"1/15/2015" 21.00
end
gen date = date(d,"MDY")
format date %td
drop d
rename date d
save `ts'
clear
input id str8 bd str9 ed p_contract
1 "1/1/2014" "5/15/2014" 19.50
2 "3/2/2012" "10/9/2015" 20.31
3 "1/1/2012" "1/8/2012" 19.00
end
foreach var of varlist bd ed {
    gen date = date(`var',"MDY")
    format date %td
    drop `var'
    rename date `var'
}
save `ccs'
/* Calculate Mean Prices and Merge Contracts Back In */
cross using `ts'
sort id d
keep if d >= bd & d <= ed
collapse (mean) mean_p = p_daily, by(id bd ed p_contract)
merge 1:1 id using `ccs', nogen
sort id
This gets you something like this:
id p_contract bd ed mean_p
1 19.5 01jan2014 15may2014 .
2 20.31 02mar2012 09oct2015 21
3 19 01jan2012 08jan2012 17.75
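Since R is also an option, here is the same cross-join, filter, and collapse logic in plain Python for comparison (a sketch using the question's example data):

```python
from datetime import date
from statistics import mean

def contract_mean_prices(contracts, market):
    """Average the market prices whose date falls inside each contract's
    [begin, end] span; None when no market dates are in range."""
    out = []
    for begin, end, contract_price in contracts:
        in_range = [p for d, p in market if begin <= d <= end]
        out.append((begin, end, contract_price,
                    mean(in_range) if in_range else None))
    return out

# The example data from the question:
market = [(date(2012, 1, 1), 18.0),
          (date(2012, 1, 8), 17.5),
          (date(2015, 1, 15), 21.0)]
contracts = [(date(2014, 1, 1), date(2014, 5, 15), 19.50),
             (date(2012, 3, 2), date(2015, 10, 9), 20.31),
             (date(2012, 1, 1), date(2012, 1, 8), 19.00)]
```

As in the Stata output, the third contract averages to 17.75, the second to 21 (only one market date in range), and the first has no matching dates (missing).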

Qlikview - Data between dates; filter out data past or future data depending on selected date

I've seen threads where the document has Start Date and End Date "widgets" where users type in their dates; however, I'm looking for a dynamic solution. For example, with the table below, when I select a date, say 1/1/2004, I only want to see active players (this would exclude Michael Jordan only).
Jersey# Name RookieYr RetirementYr Average PPG
23 Michael Jordan 1/1/1984 1/1/2003 24
33 Scotty Pippen 1/1/1987 1/1/2008 15
1 Derrick Rose 1/1/2008 1/1/9999 16
25 Vince Carter 1/1/1998 1/1/9999 18
The most flexible way is to IntervalMatch the RookieYr/RetirementYr ranges against a table of all dates. See http://qlikviewcookbook.com/recipes/download-info/count-days-in-a-transaction-using-intervalmatch/ for a complete example.
Here's the interval match for your data. You can obviously create your calendar however you want.
STATS:
load * inline [
Jersey#, Name, RookieYr, RetirementYr, Average, PPG
23, Michael Jordan, 1/1/1984, 1/1/2003, 24
33, Scotty Pippen, 1/1/1987, 1/1/2008, 15
1, Derrick Rose, 1/1/2008, 1/1/9999, 16
25, Vince Carter, 1/1/1998, 1/1/9999, 18
];
let zDateMin=37000;
let zDateMax=40000;
DATES:
LOAD
Date($(zDateMin) + IterNo() - 1) as [DATE],
year( Date($(zDateMin) + IterNo() - 1)) as YEAR,
month( Date($(zDateMin) + IterNo() - 1)) as MONTH
AUTOGENERATE 1
WHILE $(zDateMin)+IterNo()-1<= $(zDateMax);
INTERVAL:
IntervalMatch (DATE)
load RookieYr, RetirementYr resident STATS;
left join (DATES)
load * resident INTERVAL;
drop table INTERVAL;
There's not much to it: load two tables, one with the start and end dates and one with the calendar dates, then interval-match the date field to the start and end fields; from there it will work. The last join is just to tidy up a bit.
The result of all of that is visible in the table viewer (Ctrl-T). Don't worry about the Syn key; it is required to maintain the interval matching.
With a date selected, only the active players remain. Derrick Rose is also excluded, since he had not started by 1/1/2004.
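Conceptually, IntervalMatch just tests each date against every start/end interval. A small Python sketch of that idea, using the question's data:

```python
from datetime import date

# (name, rookie year, retirement year); 9999 marks a still-active player
players = [
    ("Michael Jordan", date(1984, 1, 1), date(2003, 1, 1)),
    ("Scotty Pippen", date(1987, 1, 1), date(2008, 1, 1)),
    ("Derrick Rose", date(2008, 1, 1), date(9999, 1, 1)),
    ("Vince Carter", date(1998, 1, 1), date(9999, 1, 1)),
]

def active_on(selected):
    """Keep players whose [rookie, retirement] interval contains the date."""
    return [name for name, start, end in players if start <= selected <= end]
```

For 1/1/2004 this keeps Scotty Pippen and Vince Carter: Jordan had retired, and Rose had not yet started.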

Aggregation framework for MongoDB

I have following schema:
Customer ID
Location Name
Time of Visit
The above stores information about every customer visit at various locations.
I would like to know if there's a way to write an aggregation query in MongoDB that gives the total visitor count by section of the day, per day, per location.
Sections of the day:
EDIT:
12 am - 8 am
8 am - 11 am
11 am - 1 pm
1 pm - 4 pm
4 pm - 8 pm
8 pm - 12 am
If a customer visits a location on the same day and same section of the day more than once, it should be counted just once. However, if that customer visits a location on the same day but for different sections of the day, it should be counted exactly once for each of the section of the day he has appeared in.
Example:
Customer 1 visits store A on day 1 at 9:30 AM
Customer 1 visits store A on day 1 at 10:30 PM
Customer 1 visits store B on day 2 at 9:30 AM
Customer 1 visits store B on day 2 at 11:30 AM
Customer 1 visits store B on day 2 at 2:45 PM
Customer 2 visits store A on day 1 at 9:45 AM
Customer 2 visits store B on day 1 at 11:00 AM
Customer 2 visits store B on day 2 at 9:45 AM
Final output of repeat visits:
Store B, Day 1, Section (00:00 - 08:00) : 0 Visitors
Store B, Day 1, Section (08:00 - 16:00) : 2 Visitors
Store B, Day 1, Section (16:00 - 24:00) : 1 Visitors
Store B, Day 2, Section (00:00 - 08:00) : 0 Visitors
Store B, Day 2, Section (08:00 - 16:00) : 2 Visitors
Store B, Day 2, Section (16:00 - 24:00) : 0 Visitors
Is there any way the above kind of query could be done using aggregation framework for MongoDB?
Yes, this can be done quite simply. It's very similar to the query I describe in the answer to your previous question, but rather than aggregating by day, you need to aggregate by day-hour combinations.
To start with, before grouping you will need to project a new date part, transforming your "Time of Visit" field into the appropriate hour bucket. Let's look at one way to do it:
{ $project : {
    newDate : {
        y : { $year : "$tov" },
        m : { $month : "$tov" },
        d : { $dayOfMonth : "$tov" },
        h : { $subtract : [
            { $hour : "$tov" },
            { $mod : [ { $hour : "$tov" }, 8 ] }
        ] }
    },
    customerId : 1,
    locationId : 1
} }
As you can see, this generates year, month, day, and hour, but the hour is truncated modulo 8, so you get 0 (midnight), 8 (8 am), or 16 (4 pm).
Next we can do the same steps we did before, but now we are aggregating to a different level of time granularity.
There are other ways of achieving the same thing, you can see some examples of date manipulation on my blog.
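Putting the whole thing together, here is a sketch of the full pipeline in PyMongo terms (the field names tov, customerId, and locationId are assumed, as in the answer): project the 8-hour section, group once to deduplicate repeat visits within a section, then group again to count distinct visitors.

```python
def visitor_pipeline():
    """Build the aggregation pipeline for visitors per location/day/section."""
    section = {
        "y": {"$year": "$tov"}, "m": {"$month": "$tov"},
        "d": {"$dayOfMonth": "$tov"},
        # Truncate the hour to its 8-hour section: 0, 8, or 16
        "h": {"$subtract": [{"$hour": "$tov"},
                            {"$mod": [{"$hour": "$tov"}, 8]}]},
    }
    return [
        {"$project": {"newDate": section, "customerId": 1, "locationId": 1}},
        # One doc per customer per location per section: dedupes repeat visits
        {"$group": {"_id": {"d": "$newDate", "c": "$customerId",
                            "l": "$locationId"}}},
        # Count distinct customers per location per section
        {"$group": {"_id": {"d": "$_id.d", "l": "$_id.l"},
                    "visitors": {"$sum": 1}}},
    ]
```

It would be run as db.visits.aggregate(visitor_pipeline()) against a collection of visit documents.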