How to use Aggregation along with group by and sum - mongodb

How to get the sum of purchased deal's price (current year data) group by week, day, year using purchased_at field
My collection data:
{
"_id": ObjectId("5a66d619042e9f3a070d6864"),
"name": "Deal1",
"price": "2000",
"status": true,
"purchased_at": ISODate("2018-01-23T06:28:41.0Z")
}
{
"_id": ObjectId("5a66d619042e9f3a070d6872"),
"name": "Deal2",
"price": "500",
"status": true,
"purchased_at": ISODate("2018-01-13T06:28:41.0Z")
}
{
"_id": ObjectId("5a66d619042e9f3a070d6880"),
"name": "Deal3",
"price": "1000",
"status": true,
"purchased_at": ISODate("2018-02-13T06:28:41.0Z")
}
{
"_id": ObjectId("5a66d619042e9f3a070d6880"),
"name": "Deal4",
"price": "1000",
"status": false,
"purchased_at": ISODate("2018-01-11T06:28:41.0Z")
}
Can someone please help?

Since you're using non standard date format, you need to use filter():
$lastWeekSum = $collection->filter(function($i) {
Carbon::parse($i['purchased_at'])->gt(now()->subWeek());
})->sum('price');
If $i['purchased_at'] returns an object, you should convert it to a string like 2018-01-11T06:28:41.0Z before parsing it.

Related

How to exclude some item when query with typeorm in Nestjs and Postgres

I want to build a query when i send username to server so i can exclude the record has the excludeUsers has the same username i sent.
Here is my data : `
"id": "f4830220-9912-4cbb-b685-edf4aaaf8fd5",
"createdAt": "2022-03-24T10:19:48.096Z",
"updatedAt": "2022-03-24T10:26:42.487Z",
"isDeleted": false,
"username": "vietphuongthoa98",
"link": "",
"type": "follow",
"current": 13456,
"target": 13556,
"totalPurchase": 100,
"purchasedPacks": [
{
"id": "fbb3079b-32c9-4c4b-a297-16741d1f5485",
"name": "Some packets",
"type": "follow",
"count": 2,
"price": 100,
"follows": 50,
"createdAt": "2022-03-23T07:18:22.898Z",
"isDeleted": false,
"updatedAt": "2022-03-23T09:49:30.192Z",
"description": "Some packages description.",
"originPrice": 0
}
],
"excludeUsers": [
{
"id": "9d8c25d2-8f03-46b3-92fa-b9489b943a56",
"deviceName": "iPhone 12 Pro Max",
"username": "hoaa.hanassii",
"platform": "ios",
"timeZone": 0,
"deviceAge": 21,
"subscribed": true,
"coins": 19600,
"subscriptionExpiration": "2022-04-27T00:00:00.000Z",
"isBlocked": false
}
]
Here is my query builder :
queryBuilder
.leftJoin('featured_user.excludeUsers', 'excludeUsers')
.andWhere('excludeUsers.username != :username', {
username: filter.username,
});
The actual query result is :
SELECT "featured_user"."id" AS "featured_user_id",
"featured_user"."tenant_id" AS "featured_user_tenant_id",
"featured_user"."created_at" AS "featured_user_created_at",
"featured_user"."updated_at" AS "featured_user_updated_at",
"featured_user"."is_deleted" AS "featured_user_is_deleted",
"featured_user"."username" AS "featured_user_username",
"featured_user"."link" AS "featured_user_link",
"featured_user"."type" AS "featured_user_type",
"featured_user"."current" AS "featured_user_current",
"featured_user"."target" AS "featured_user_target",
"featured_user"."total_purchase" AS "featured_user_total_purchase",
"featured_user"."purchased_packs" AS "featured_user_purchased_packs",
"featured_user"."user_id" AS "featured_user_user_id",
"excludeUsers"."username" AS "excludeUsers_username",
"excludeUsers"."id" AS "excludeUsers_id"
FROM "featured_users" "featured_user"
LEFT JOIN "featured_users_exclude_users_users" "featured_user_excludeUsers" ON "featured_user_excludeUsers"."featured_users_id"="featured_user"."id"
LEFT JOIN "users" "excludeUsers" ON "excludeUsers"."id"="featured_user_excludeUsers"."users_id"
WHERE "excludeUsers"."username" != $1 -- PARAMETERS: ["hoaa.hanassii"]
The problem is nothing response. Any ideal ? Thanks
can you add getMany() end of querybuilder and try
queryBuilder
.leftJoin('featured_user.excludeUsers', 'excludeUsers')
.andWhere('excludeUsers.username != :username', {
username: filter.username,
}).getMany()

Updating Mongo DB collection field from object to array of objects

I had to change one of the fields of my collection in mongoDB from an object to array of objects containing a lot of data. New documents get inserted without any problem, but when attempted to get old data, it never maps to the original DTO correctly and runs into errors.
subject is the field that was changed in Students collection.
I was wondering is there any way to update all the records so they all have the same data type, without losing any data.
The old version of Student:
{
"_id": "5fb2ae251373a76ae58945df",
"isActive": true,
"details": {
"picture": "http://placehold.it/32x32",
"age": 17,
"eyeColor": "green",
"name": "Vasquez Sparks",
"gender": "male",
"email": "vasquezsparks#orbalix.com",
"phone": "+1 (962) 512-3196",
"address": "619 Emerald Street, Nutrioso, Georgia, 6576"
},
"subject":
{
"id": 0,
"name": "math",
"module": {
"name": "Advanced",
"semester": "second"
}
}
}
This needs to be updated to the new version like this:
{
"_id": "5fb2ae251373a76ae58945df",
"isActive": true,
"details": {
"picture": "http://placehold.it/32x32",
"age": 17,
"eyeColor": "green",
"name": "Vasquez Sparks",
"gender": "male",
"email": "vasquezsparks#orbalix.com",
"phone": "+1 (962) 512-3196",
"address": "619 Emerald Street, Nutrioso, Georgia, 6576"
},
"subject": [
{
"id": 0,
"name": "math",
"module": {
"name": "Advanced",
"semester": "second"
}
},
{
"id": 1,
"name": "history",
"module": {
"name": "Basic",
"semester": "first"
}
},
{
"id": 2,
"name": "English",
"module": {
"name": "Basic",
"semester": "second"
}
}
]
}
I understand there might be a way to rename old collection, create new and insert data based on old one in to new one. I was wondering for some direct way.
The goal is to turn subject into an array of 1 if it is not already an array, otherwise leave it alone. This will do the trick:
update args are (predicate, actions, options).
db.foo.update(
// Match only those docs where subject is an object (i.e. not turned into array):
{$expr: {$eq:[{$type:"$subject"},"object"]}},
// Actions: set subject to be an array containing $subject. You MUST use the pipeline version
// of the update actions to correctly substitute $subject in the expression!
[ {$set: {subject: ["$subject"] }} ],
// Do this for ALL matches, not just first:
{multi:true});
You can run this converter over and over because it will ignore converted docs.
If the goal is to convert and add some new subjects, preserving the first one, then we can set up the additional subjects and concatenate them into one array as follows:
var mmm = [ {id:8, name:"CORN"}, {id:9, name:"DOG"} ];
rc = db.foo.update({$expr: {$eq:[{$type:"$subject"},"object"]}},
[ {$set: {subject: {$concatArrays: [["$subject"], mmm]} }} ],
{multi:true});

MongoDB - MongoImport of JSON (jsonl) - Rename, change types and add fields

i'm new to the topic MongoDB and have 4 different problems importing a big (16GB) file (jsonl) into my MongoDB (simple PSA-Cluster).
Below attached you will find a sample entry from the mentiond JSON-Dump.
With this file which i get from an external provider I actually have 4 problems.
"hotel_id" is the key and should normally be (re-)named as "_id"
"hotel_id" should not be treated as string rather than as Number
"location" is not properly formatted (if i understood correctly the MongoDB Manual) as GeoJSON as it should be like
"location": {
"type": "Point",
"coordinates": [-93.26838,37.15845]
}
instead of
"location": {
"coordinates": {
"latitude": 37.15845,
"longitude": -93.26838
}
}
"dates" can this be used to efficiently update just the records which needs to be updated?
So my challenge is now to transform the data according to my needs before importing the data or at time of import, but in both cases of course as quickly as possible.
Therefore i searched a lot for hints and best practices, but i was not able to find a solution yet, maybe due to the fact that i'm a beginner with MongoDB.
I played around with "jq" to adjust the data and for example add the type which seems to be necessary for the location (point 3), but wasn't really successful.
cat dump.jsonl | ./bin/jq --arg typeOfField Point '.location + {type: $typeOfField}'
Beside that i was injecting a sample dump of round-about 500MB which took 1,5 mins when importing it the first time (empty database). If i run it in "upsert" mode it will take round-about 12 hours. So i was also wondering what is the best practice to import such a big JSON-dump?
Any help is appreciated!! :-)
Kind regards,
Lumpy
{
"hotel_id": "12345",
"name": "Test Hotel",
"address": {
"line_1": "123 Test St",
"line_2": "Apt A",
"city": "Test City",
},
"ratings": {
"property": {
"rating": "3.5",
"type": "Star"
},
"guest": {
"count": 48382,
"average": "3.1"
}
},
"location": {
"coordinates": {
"latitude": 22.54845,
"longitude": -90.11838
}
},
"phone": "555-0153",
"fax": "555-7249",
"category": {
"id": 1,
"name": "Hotel"
},
"rank": 42,
"dates": {
"added": "1998-07-19T05:00:00.000Z",
"updated": "2018-03-22T07:23:14.000Z"
},
"statistics": {
"11": {
"id": 11,
"name": "Total number of rooms - 220",
"value": "220"
},
"12": {
"id": 12,
"name": "Number of floors - 7",
"value": "7"
}
},
"chain": {
"id": -2,
"name": "Test Hotels"
},
"brand": {
"id": 2,
"name": "Test Brand"
}
}

How to perform date arithmetic between nested and unnested dates in Elasticsearch?

Consider the following Elasticsearch (v5.4) object (an "award" doc type):
{
"name": "Gold 1000",
"date": "2017-06-01T16:43:00.000+00:00",
"recipient": {
"name": "James Conroy",
"date_of_birth": "1991-05-30"
}
}
The mapping type for both award.date and award.recipient.date_of_birth is "date".
I want to perform a range aggregation to get a list of the age ranges of the recipients of this award ("Under 18", "18-24", "24-30", "30+"), at the time of their award. I tried the following aggregation query:
{
"size": 0,
"query": {"match_all": {}},
"aggs": {
"recipients": {
"nested": {
"path": "recipient"
},
"aggs": {
"age_ranges": {
"range": {
"script": {
"inline": "doc['date'].date - doc['recipient.date_of_birth'].date"
},
"keyed": true,
"ranges": [{
"key": "Under 18",
"from": 0,
"to": 18
}, {
"key": "18-24",
"from": 18,
"to": 24
}, {
"key": "24-30",
"from": 24,
"to": 30
}, {
"key": "30+",
"from": 30,
"to": 100
}]
}
}
}
}
}
}
Problem 1
But I get the following error due to the comparison of dates in the script portion:
Cannot apply [-] operation to types [org.joda.time.DateTime] and [org.joda.time.MutableDateTime].
The DateTime object is the award.date field, and the MutableDateTime object is the award.recipient.date_of_birth field. I've tried doing something like doc['recipient.date_of_birth'].date.toDateTime() (which doesn't work despite the Joda docs claiming that MutableDateTime has this method inherited from a parent class). I've also tried doing something further like this:
"script": "ChronoUnit.YEARS.between(doc['date'].date, doc['recipient.date_of_birth'].date)"
Which sadly also doesn't work :(
Problem 2
I notice if I do this:
"aggs": {
"recipients": {
"nested": {
"path": "recipient"
},
"aggs": {
"award_years": {
"terms": {
"script": {
"inline": "doc['date'].date.year"
}
}
}
}
}
}
I get 1970 with a doc_count that happens to equal the total number of docs in ES. This leads me to believe that accessing a property outside of the nested object simply does not work and gives me back some default like the epoch datetime. And if I do the opposite (aggregating dates of birth without nesting), I get the exact same thing for all the dates of birth instead (1970, epoch datetime). So how can I compare those two dates?
I am racking my brain here, and I feel like there's some clever solution that is just beyond my current expertise with Elasticsearch. Help!
If you want to set up a quick environment for this to help me out, here is some curl goodness:
curl -XDELETE http://localhost:9200/joelinux
curl -XPUT http://localhost:9200/joelinux -d "{\"mappings\": {\"award\": {\"properties\": {\"name\": {\"type\": \"string\"}, \"date\": {\"type\": \"date\", \"format\": \"yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ\"}, \"recipient\": {\"type\": \"nested\", \"properties\": {\"name\": {\"type\": \"string\"}, \"date_of_birth\": {\"type\": \"date\", \"format\": \"yyyy-MM-dd\"}}}}}}}"
curl -XPUT http://localhost:9200/joelinux/award/1 -d '{"name": "Gold 1000", "date": "2016-06-01T16:43:00.000000+00:00", "recipient": {"name": "James Conroy", "date_of_birth": "1991-05-30"}}'
curl -XPUT http://localhost:9200/joelinux/award/2 -d '{"name": "Gold 1000", "date": "2017-02-28T13:36:00.000000+00:00", "recipient": {"name": "Martin McNealy", "date_of_birth": "1983-01-20"}}'
That should give you a "joelinux" index with two "award" docs to test this out ("James Conroy" and "Martin McNealy"). Thanks in advance!
Unfortunately, you can't access nested and non-nested fields within the same context. As a workaround, you can change your mapping to automatically copy date from nested document to root context using copy_to option:
{
"mappings": {
"award": {
"properties": {
"name": {
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
},
"type": "text"
},
"date": {
"type": "date"
},
"date_of_birth": {
"type": "date" // will be automatically filled when indexing documents
},
"recipient": {
"properties": {
"name": {
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
},
"type": "text"
},
"date_of_birth": {
"type": "date",
"copy_to": "date_of_birth" // copy value to root document
}
},
"type": "nested"
}
}
}
}
}
After that you can access date of birth using path date, though the calculations to get number of years between dates are slightly tricky:
Period.between(LocalDate.ofEpochDay(doc['date_of_birth'].date.getMillis() / 86400000L), LocalDate.ofEpochDay(doc['date'].date.getMillis() / 86400000L)).getYears()
Here I convert original JodaTime date objects to system.time.LocalDate objects:
Get number of milliseconds from 1970-01-01
Convert to number of days from 1970-01-01 by dividing it to 86400000L (number of ms in one day)
Convert to LocalDate object
Create date-based Period object from two dates
Get number of years between two dates.
So, the final aggregation query looks like this:
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"age_ranges": {
"range": {
"script": {
"inline": "Period.between(LocalDate.ofEpochDay(doc['date_of_birth'].date.getMillis() / 86400000L), LocalDate.ofEpochDay(doc['date'].date.getMillis() / 86400000L)).getYears()"
},
"keyed": true,
"ranges": [
{
"key": "Under 18",
"from": 0,
"to": 18
},
{
"key": "18-24",
"from": 18,
"to": 24
},
{
"key": "24-30",
"from": 24,
"to": 30
},
{
"key": "30+",
"from": 30,
"to": 100
}
]
}
}
}
}

MongoDB Database Structure and Best Practices Help

I'm in the process of developing Route Tracking/Optimization software for my refuse collection company and would like some feedback on my current data structure/situation.
Here is a simplified version of my MongoDB structure:
Database: data
Collections:
“customers” - data collection containing all customer data.
[
{
"cust_id": "1001",
"name": "Customer 1",
"address": "123 Fake St",
"city": "Boston"
},
{
"cust_id": "1002",
"name": "Customer 2",
"address": "123 Real St",
"city": "Boston"
},
{
"cust_id": "1003",
"name": "Customer 3",
"address": "12 Elm St",
"city": "Boston"
},
{
"cust_id": "1004",
"name": "Customer 4",
"address": "16 Union St",
"city": "Boston"
},
{
"cust_id": "1005",
"name": "Customer 5",
"address": "13 Massachusetts Ave",
"city": "Boston"
}, { ... }, { ... }, ...
]
“trucks” - data collection containing all truck data.
[
{
"truckid": "21",
"type": "Refuse",
"year": "2011",
"make": "Mack",
"model": "TerraPro Cabover",
"body": "Mcneilus Rear Loader XC",
"capacity": "25 cubic yards"
},
{
"truckid": "22",
"type": "Refuse",
"year": "2009",
"make": "Mack",
"model": "TerraPro Cabover",
"body": "Mcneilus Rear Loader XC",
"capacity": "25 cubic yards"
},
{
"truckid": "12",
"type": "Dump",
"year": "2006",
"make": "Chevrolet",
"model": "C3500 HD",
"body": "Rugby Hydraulic Dump",
"capacity": "15 cubic yards"
}
]
“drivers” - data collection containing all driver data.
[
{
"driverid": "1234",
"name": "John Doe"
},
{
"driverid": "4321",
"name": "Jack Smith"
},
{
"driverid": "3421",
"name": "Don Johnson"
}
]
“route-lists” - data collection containing all predetermined route lists.
[
{
"route_name": "monday_1",
"day": "monday",
"truck": "21",
"stops": [
{
"cust_id": "1001"
},
{
"cust_id": "1010"
},
{
"cust_id": "1002"
}
]
},
{
"route_name": "friday_1",
"day": "friday",
"truck": "12",
"stops": [
{
"cust_id": "1003"
},
{
"cust_id": "1004"
},
{
"cust_id": "1012"
}
]
}
]
"routes" - data collections containing data for all active and completed routes.
[
{
"routeid": "1",
"route_name": "monday1",
"start_time": "04:31 AM",
"status": "active",
"stops": [
{
"customerid": "1001",
"status": "complete",
"start_time": "04:45 AM",
"finish_time": "04:48 AM",
"elapsed_time": "3"
},
{
"customerid": "1010",
"status": "complete",
"start_time": "04:50 AM",
"finish_time": "04:52 AM",
"elapsed_time": "2"
},
{
"customerid": "1002",
"status": "incomplete",
"start_time": "",
"finish_time": "",
"elapsed_time": ""
},
{
"customerid": "1005",
"status": "incomplete",
"start_time": "",
"finish_time": "",
"elapsed_time": ""
}
]
}
]
Here is the process thus far:
Each day drivers begin by Starting a New Route. Before starting a new route drivers must first input data:
driverid
date
truck
Once all data is entered correctly the Start a New Route will begin:
Create new object in collection “routes”
Query collection “route-lists” for “day” + “truck” match and return "stops"
Insert “route-lists” data into “routes” collection
As driver proceeds with his daily stops/tasks the “routes” collection will update accordingly.
On completion of all tasks the driver will then have the ability to Complete the Route Process by simply changing “status” field to “active” from “complete” in the "routes" collection.
That about sums it up. Any feedback, opinions, comments, links, optimization tactics are greatly appreciated.
Thanks in advance for your time.
You database schema looks like for me as 'classic' relational database schema. Mongodb good fit for data denormaliztion. I guess when you display routes you loading all related customers, driver, truck.
If you want make your system really fast you may embedd everything in route collection.
So i suggest following modifications of your schema:
customers - as-is
trucks - as-is
drivers - as-is
route-list:
Embedd data about customers inside stops instead of reference. Also embedd truck. In this case schema will be:
{
"route_name": "monday_1",
"day": "monday",
"truck": {
_id = 1,
// here will be all truck data
},
"stops": [{
"customer": {
_id = 1,
//here will be all customer data
}
}, {
"customer": {
_id = 2,
//here will be all customer data
}
}]
}
routes:
When driver starting new route copy route from route-list and in addition embedd driver information:
{
//copy all route-list data (just make new id for the current route and leave reference to routes-list. In this case you will able to sync route with route-list.)
"_id": "1",
route_list_id: 1,
"start_time": "04:31 AM",
"status": "active",
driver: {
//embedd all driver data here
},
"stops": [{
"customer": {
//all customer data
},
"status": "complete",
"start_time": "04:45 AM",
"finish_time": "04:48 AM",
"elapsed_time": "3"
}]
}
I guess you asking yourself what do if driver, customer or other denormalized data changed in main collection. Yeah, you need update all denormalized data within other collections. You will probably need update billions of documents (depends on your system size) and it's okay. You can do it async if it will take much time.
What benfits in above data structure?
Each document contains all data that you may need to display in your application. So, for instance, you no need load related customers, driver, truck when you need display routes.
You can make any difficult queries to your database. For example in your schema you can build query that will return all routes thats contains stops in stop of customer with name = "Bill" (you need load customer by name first, get id, and look by customer id in your current schema).
Probably you asking yourself that your data can be unsynchronized in some cases, but to solve this you just need build a few unit test to ensure that you update your denormolized data correctly.
Hope above will help you to see the world from not relational side, from document database point of view.