Percentage calculation in Druid

I have data in the following format. I am having a hard time calculating the profit percentage of a particular item or brand with respect to all products on a particular day.
Date Item Brand Profit
15-08-2019 A Nike 5
15-08-2019 B Nike 10
15-08-2019 C Nike 12
15-08-2019 D Nike 6
15-08-2019 E Nike 9
15-08-2019 F Adidas 4
15-08-2019 G Adidas 3
15-08-2019 H Adidas 7
16-08-2019 A Nike 8
16-08-2019 B Nike 4
16-08-2019 C Nike 6
16-08-2019 D Nike 7
16-08-2019 E Nike 9
16-08-2019 F Adidas 5
16-08-2019 G Adidas 4
16-08-2019 H Adidas 9
Percentage profit of product A on 15 August = profit of A / sum of all profits on 15 August (5/56).
Percentage profit of Nike on 16 August = Nike's profit / sum of all profits on 16 August (34/52).
I need to do this calculation in a single query.
If I run it in two parts, the first query fetches the data with the date and dimension filters, the second fetches the data with only the date filter, and then I divide the two results.
I cannot figure out a way to combine these into one query and do the calculation.

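In your case, the profit percentage of a particular item or brand can be calculated in one query with a filtered aggregator for the numerator, a plain aggregator for the total, and an arithmetic post-aggregator that divides the two.
For the brand 'Nike', the following JSON query will work: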
{
"queryType": "timeseries",
"dataSource": "<your_datasource>",
"granularity": "day",
"aggregations": [
{
"type" : "filtered",
"filter" : {
"type" : "selector",
"dimension" : "Brand",
"value" : "Nike"
},
"aggregator" : {
"type" : "longSum",
"name" : "brand_sum",
"fieldName" : "Profit"
}
},
{ "type": "longSum", "name": "total_sum", "fieldName": "Profit" }
],
"postAggregations": [
{ "type": "arithmetic",
"name": "brand_profit_ratio",
"fn": "/",
"fields": [
{ "type": "fieldAccess", "name": "brand_sum", "fieldName": "brand_sum" },
{ "type": "fieldAccess", "name": "total_sum", "fieldName": "total_sum" }
]
}
],
"intervals": [ "2019-08-15/2019-08-17" ]
}
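Because "granularity" is "day", this returns one row per day (here 15 and 16 August), each holding the ratio of the brand's profit (here 'Nike') to the total profit for that day: 42/56 on the 15th and 34/52 on the 16th. Multiply by 100 for a percentage; for a single item, filter on the Item dimension instead. Note that Druid intervals are ISO 8601 and the end is exclusive, so to cover both days the interval must end on 2019-08-17.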
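To sanity-check the numbers the query should return, here is a minimal Python sketch (plain Python, not Druid code) that mimics the filtered longSum, the unfiltered longSum, and the "/" arithmetic post-aggregator over the sample rows from the question. The `profit_share` helper and its predicate argument are illustrative names, not part of any Druid API.

```python
from collections import defaultdict

# Sample rows from the question: (date, item, brand, profit)
ROWS = [
    ("15-08-2019", "A", "Nike", 5), ("15-08-2019", "B", "Nike", 10),
    ("15-08-2019", "C", "Nike", 12), ("15-08-2019", "D", "Nike", 6),
    ("15-08-2019", "E", "Nike", 9), ("15-08-2019", "F", "Adidas", 4),
    ("15-08-2019", "G", "Adidas", 3), ("15-08-2019", "H", "Adidas", 7),
    ("16-08-2019", "A", "Nike", 8), ("16-08-2019", "B", "Nike", 4),
    ("16-08-2019", "C", "Nike", 6), ("16-08-2019", "D", "Nike", 7),
    ("16-08-2019", "E", "Nike", 9), ("16-08-2019", "F", "Adidas", 5),
    ("16-08-2019", "G", "Adidas", 4), ("16-08-2019", "H", "Adidas", 9),
]


def profit_share(rows, predicate):
    """Per-day ratio of matching profit to total profit (hypothetical helper)."""
    matched = defaultdict(int)  # plays the role of the filtered longSum
    total = defaultdict(int)    # plays the role of the unfiltered longSum
    for date, item, brand, profit in rows:
        total[date] += profit
        if predicate(item, brand):
            matched[date] += profit
    # Druid's "/" post-aggregator returns 0 when dividing by 0; mirror that here.
    return {d: (matched[d] / total[d] if total[d] else 0.0) for d in total}


item_a = profit_share(ROWS, lambda item, brand: item == "A")
nike = profit_share(ROWS, lambda item, brand: brand == "Nike")
print(round(item_a["15-08-2019"] * 100, 1))  # → 8.9 (5/56)
print(round(nike["16-08-2019"] * 100, 1))    # → 65.4 (34/52)
```

The key point the sketch makes is that numerator and denominator are computed in the same pass over the same rows, which is what lets Druid return both in one query.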

Related

How to write a script to read data from a MongoDB database and produce a JSON file in a given format

I have four collections.
CollegeLocation -- Json format
{
"MDTC": {
"collegeName": "College of Medicine - Tucson",
"1002": 10,
"1001": 234
},
"SCNC": {
"collegeName": "College of Science",
"1002": 24,
"1003": 21,
"1001": 16
},
"LAWC": {
"collegeName": "James E Rogers College of Law",
"1002": 234,
"1003": 213
}
}
LocationEmployeeDetails -- Json format
{
"1002": {
"loc_id": "1002",
"kmapids": [
"palmerjo",
"cwesterl",
"lumbee"
]
},
"1001": {
"loc_id": "1001",
"kmapids": [
"skaib",
"dcorso",
"ghuck",
"macmccallum"
]
},
"1003": {
"loc_id": "1003",
"kmapids": [
"ghuck",
"cwesterl",
"dcorso",
"witte",
"lumbee"
]
}
}
LocationDetails - json
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
32.0,
49.0
]
},
"properties": {
"country": "Ukraine",
"city": "",
"name": "Ukraine",
"state": "",
"loc_id": "1001"
}
},
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
-4.0,
40.0
]
},
"properties": {
"country": "Spain",
"city": "",
"name": "Spain",
"state": "",
"loc_id": "1002"
}
},
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
-73.087749,
41.6032207
]
},
"properties": {
"country": "United States",
"city": "",
"name": "Connecticut",
"state": "Connecticut",
"loc_id": "1003"
}
}
]
}
EmployeeDetails - csv
name,kmapId,college,fullname,collegeid
J Palmer,palmerjo,College of Medicine - Tucson,John Palmer,MDTC
C Westerland,cwesterl,College of Social and Behav Sci,Chad Westerland,SBSC
R Williams,lumbee,James E Rogers College of Law,Robert Williams,LAWC
S Kaib,skaib,College of Medicine - Phoenix,Susan Kaib,MDPX
M Witte,witte,College of Medicine - Tucson,Marlys Witte,MDTC
S Dougherty,shonad,College of Medicine - Tucson,Shona Dougherty,MDTC
D Corso,dcorso,College of Fine Arts,Dawn Corso,FNRT
G Huckleberry,ghuck,College of Science,Gary Huckleberry,SCNC
D Mccallum,macmccallum,James E Rogers College of Law,David Mccallum,LAWC
location_details.json: contains location details, keyed by location ID (loc_id).
location_employee_details.json: contains the list of employees (kmapids) associated with each location.
employee_details.csv: contains each employee's kmapId, name, and college.
colleges_location_values.json: contains the college values for each location.
sample_output.json: an example of the output format.
Question 1: How do I write a script to import location_details.json, location_employee_details.json, employee_details.csv, and colleges_location_values.json into a MongoDB database?
Question 2: How do I write a script to read the data from the MongoDB database and produce a JSON file in the format given in sample_output.json? The code should work for data of variable length.
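The join needed for Question 2 can be sketched in plain Python. The sketch assumes each JSON file was imported as a single document (for example with pymongo's `insert_one`) and read back as a dict with `_id` removed, and that employee_details.csv is comma-separated. Because sample_output.json is not shown, the output shape here (one record per location with its colleges and employees) is a hypothetical stand-in, and `load_employees`, `build_output`, and `write_output` are illustrative names. Since it only loops over whatever keys are present, it does not depend on the number of locations, colleges, or employees.

```python
import csv
import io
import json


def load_employees(csv_text):
    """Parse employee_details.csv (assumed comma-separated) into a dict keyed by kmapId."""
    return {row["kmapId"]: row for row in csv.DictReader(io.StringIO(csv_text))}


def build_output(college_location, location_employees, location_details, employees):
    """Join the four sources into one record per location (hypothetical output shape)."""
    output = []
    for feature in location_details["features"]:
        props = feature["properties"]
        loc_id = props["loc_id"]
        # CollegeLocation is keyed by college id; keep colleges that have a value for this location.
        colleges = [
            {"collegeid": cid, "collegeName": info["collegeName"], "value": info[loc_id]}
            for cid, info in college_location.items()
            if loc_id in info
        ]
        # Resolve each kmapId listed for this location against the CSV rows; skip unknown ids.
        kmapids = location_employees.get(loc_id, {}).get("kmapids", [])
        staff = [
            {"kmapId": k, "fullname": employees[k]["fullname"], "collegeid": employees[k]["collegeid"]}
            for k in kmapids
            if k in employees
        ]
        output.append({
            "loc_id": loc_id,
            "name": props["name"],
            "coordinates": feature["geometry"]["coordinates"],
            "colleges": colleges,
            "employees": staff,
        })
    return output


def write_output(records, path):
    """Write the joined records as a pretty-printed JSON file."""
    with open(path, "w") as fh:
        json.dump(records, fh, indent=2)
```

Keeping the join separate from the database calls makes it easy to test on small in-memory samples before wiring it to MongoDB.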

How do you combine data in a Scala dataframe and output it as JSON objects?

This is slightly hard for me to explain, so I'll do my best. Here is the data set:
Name | Car Brand | Car Model | Car Color | Year Bought
Tom  | Toyota    | Corolla   | Black     | 2009
Tom  | Hyundai   | Kona      | Blue      | 2010
Tom  | Kia       | Soul      | Red       | 2011
Bob  | Mazda     | CX-30     | Red       | 2008
Bob  | BMW       | X1        | Blue      | 2014
I want to group this data set by name, collect each person's cars into a list, and write the result to a file as JSON objects, one per line. For the above data set, the output should look like this:
{
"name": "Tom",
"Cars": [{
"CarSpecifications": {
"Brand": "Toyota",
"Model": "Corolla",
"Color": "Black"
},
"YearBought":2009
},
{
"CarSpecifications": {
"Brand": "Hyundai",
"Model": "Kona",
"Color": "Blue"
},
"YearBought":2010
},
{
"CarSpecifications": {
"Brand": "Kia",
"Model": "Soul",
"Color": "Red"
},
"YearBought":2011
}]
}
{
"name": "Bob",
"Cars": [{
"CarSpecifications": {
"Brand": "Mazda",
"Model": "CX-30",
"Color": "Red"
},
"YearBought":2008
},
{
"CarSpecifications": {
"Brand": "BMW",
"Model": "X1",
"Color": "Blue"
},
"YearBought":2014
}]
}
How could I accomplish these transformations using Scala and Scala Dataframes?
You can aggregate the data set with groupBy and collect_list, then generate JSON strings with toJSON:
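import org.apache.spark.sql.functions.{collect_list, struct}
import spark.implicits._ // enables the $"column" syntax (spark is the SparkSession, as in spark-shell)

// Group by name and collect each person's cars as nested structs
val grouped = df.groupBy("Name").agg(
  collect_list(
    struct(
      struct(
        $"Car Brand".as("Brand"),
        $"Car Model".as("Model"),
        $"Car Color".as("Color")
      ).as("CarSpecifications"),
      $"Year Bought".as("YearBought")
    )
  ).as("Cars")
)

// Print each row as a JSON string
grouped.toJSON.show(false)

// To write one JSON object per line to files instead (the path is a placeholder):
// grouped.write.json("/path/to/output")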
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"Name":"Tom","Cars":[{"CarSpecifications":{"Brand":"Toyota","Model":"Corolla","Color":"Black"},"YearBought":"2009"},{"CarSpecifications":{"Brand":"Hyundai","Model":"Kona","Color":"Blue"},"YearBought":"2010"},{"CarSpecifications":{"Brand":"Kia","Model":"Soul","Color":"Red"},"YearBought":"2011"}]}|
|{"Name":"Bob","Cars":[{"CarSpecifications":{"Brand":"Mazda","Model":"CX-30","Color":"Red"},"YearBought":"2008"},{"CarSpecifications":{"Brand":"BMW","Model":"X1","Color":"Blue"},"YearBought":"2014"}]} |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
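For readers without a Spark session, the same grouping can be sketched in plain Python (a language swap for illustration, not Spark code): rows are grouped by name, each car is nested as CarSpecifications plus YearBought, and each person becomes one JSON object per line. `group_cars` and `to_json_lines` are hypothetical helpers. Unlike this sketch, Spark's collect_list does not guarantee the order of collected elements after a shuffle, so sort explicitly if order matters. Year Bought is kept as an integer here so it serializes as a number, as in the desired output; the quoted "2009" in the Spark output above suggests that column was read as a string.

```python
import json

# The question's rows: (Name, Car Brand, Car Model, Car Color, Year Bought)
ROWS = [
    ("Tom", "Toyota", "Corolla", "Black", 2009),
    ("Tom", "Hyundai", "Kona", "Blue", 2010),
    ("Tom", "Kia", "Soul", "Red", 2011),
    ("Bob", "Mazda", "CX-30", "Red", 2008),
    ("Bob", "BMW", "X1", "Blue", 2014),
]


def group_cars(rows):
    """Group rows by name, nesting each car like groupBy + collect_list(struct(...))."""
    grouped = {}  # dicts keep insertion order (Python 3.7+), so names stay in input order
    for name, brand, model, color, year in rows:
        car = {
            "CarSpecifications": {"Brand": brand, "Model": model, "Color": color},
            "YearBought": year,
        }
        grouped.setdefault(name, []).append(car)
    return [{"name": name, "Cars": cars} for name, cars in grouped.items()]


def to_json_lines(records):
    """One JSON object per line, like the files Spark's write.json produces."""
    return "\n".join(json.dumps(record) for record in records)


print(to_json_lines(group_cars(ROWS)))
```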

Remove numeric key from MongoDB collection

For example, I have the document below in my collection.
I want to remove fields with numeric keys from the document, for example "0" in the document below.
I have tried the code below:
$manager = Core_BP_BaseTableMongo::connection();
$bulk = new MongoDB\Driver\BulkWrite;
$bulk->update($wh, ['$unset' => array("0" => true)]);
$result = $manager->executeBulkWrite(Core_BP_BaseTableMongo::getDatabaseName().'.'.$collection_name, $bulk);
Document
{
"0": {
"datetimestamp_id": "1613044151UyZKNjrs7l",
"product_id": ObjectId("5fe155e4045a855ae211e2a4"),
"image": "",
"squence": 0,
"slug": "ks1215",
"name": "KS1215",
"dimensions": "",
"brand_id": ObjectId("5fe12bbbab6506780bd925a3"),
"brand": "Scihome",
"brand_slug": "scihome",
"country": "China",
"description": "<p>The backrest has a front and rear pushable design that opens to give a bed. The middle of the armrest and the feet are made of stainless steel for a modern look. The two color combinations are more compatible with the style of different living rooms. The activity is on the head to satisfy more sitting and crowd use.</p>\r\n\r\n<p><strong>Dimensions :</strong></p>\r\n\r\n<ul>\r\n\t<li>3 left: L - 1860 mm x W - 1130/1370 mm x H - 835 mm</li>\r\n\t<li>1 no : L - 770 mm x W - 1130/1370 mm x H - 835 mm</li>\r\n\t<li>lying right : L - 1090 mm x W - 1800/2040 mm x H - 835 mm</li>\r\n\t<li>footstool : L - 1080 mm x W - 770 mm x H - 410 mm</li>\r\n\t<li>coffee table : L - 350 mm x W - 1070 mm x H - 645 mm</li>\r\n</ul>",
"qty": 1,
"uom": "Each",
"area": "Drawing hall",
"notes": "No leather"
},
"_id": ObjectId("602519b69613e33c1576866d"),
"title": "XYZ",
"project_type": "Duplex Apartments",
"customer": {
"customer_id": ObjectId("5fe31123045a855ae22a29e8"),
"name": "Jay",
"email": "bd02#xyz.com"
},
"status": "Pending",
"request_date": "2021-02-11 17:19:10",
"rfq": false,
"products": [
{
"datetimestamp_id": "1613044151UyZKNjrs7l",
"product_id": ObjectId("5fe155e4045a855ae211e2a4"),
"image": "",
"squence": 0,
"slug": "ks1215",
"name": "KS1215",
"dimensions": "",
"brand_id": ObjectId("5fe12bbbab6506780bd925a3"),
"brand": "Scihome",
"brand_slug": "scihome",
"country": "China",
"description": "<p>The backrest has a front and rear pushable design that opens to give a bed. The middle of the armrest and the feet are made of stainless steel for a modern look. The two color combinations are more compatible with the style of different living rooms. The activity is on the head to satisfy more sitting and crowd use.</p>\r\n\r\n<p><strong>Dimensions :</strong></p>\r\n\r\n<ul>\r\n\t<li>3 left: L - 1860 mm x W - 1130/1370 mm x H - 835 mm</li>\r\n\t<li>1 no : L - 770 mm x W - 1130/1370 mm x H - 835 mm</li>\r\n\t<li>lying right : L - 1090 mm x W - 1800/2040 mm x H - 835 mm</li>\r\n\t<li>footstool : L - 1080 mm x W - 770 mm x H - 410 mm</li>\r\n\t<li>coffee table : L - 350 mm x W - 1070 mm x H - 645 mm</li>\r\n</ul>",
"qty": 1,
"uom": "Each",
"area": "Drawing hall",
"notes": "No leather"
}
],
"shared": [
{
"customer_id": ObjectId("5fe31123045a855ae22a29e8"),
"name": "Jay",
"email": "bd02#xyz.com",
"permission": "Owner"
}
],
"audit_created_by": "Jay",
"audit_created_date": {
"sec": 1613044150
},
"audit_ip": "172.18.0.1",
"audit_updated_by": null,
"audit_updated_date": {
"sec": 1618221803
},
"is_deleted": false
}
I tried $unset as well, but it gives me this error: "Modifiers operate on fields but we found type array instead. For example: {$mod: {<field>: ...}} not {$unset: [ true ]}"
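One likely cause of the error is PHP itself: a numeric string key such as "0" is converted to the integer key 0, so array("0" => true) is a list, and the MongoDB PHP driver encodes a list as a BSON array, which matches the {$unset: [ true ]} in the message. Casting to an object, for example ['$unset' => (object) ["0" => true]], should make the driver send a document instead. The minimal Python sketch below (a language swap, not the PHP driver) shows the shape of the update that needs to reach the server, built for every top-level key made only of digits; `numeric_key_unset` is a hypothetical helper, and the pymongo call appears only as a comment.

```python
import re

# Matches keys made only of ASCII digits, such as "0" or "12".
NUMERIC_KEY = re.compile(r"[0-9]+")


def numeric_key_unset(doc):
    """Build a {"$unset": {...}} update for every top-level numeric key, or None."""
    keys = [k for k in doc if isinstance(k, str) and NUMERIC_KEY.fullmatch(k)]
    if not keys:
        return None
    # MongoDB ignores the $unset value; "" is the conventional placeholder.
    return {"$unset": {k: "" for k in keys}}


# With pymongo this could be applied as (assumes `collection` and `doc` exist):
# update = numeric_key_unset(doc)
# if update:
#     collection.update_one({"_id": doc["_id"]}, update)
```

The important part is that the value of $unset is a document mapping field names to placeholders, never an array.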

Post-aggregation example query for Druid in JSON

I am trying to use a post-aggregation. I have used an aggregation to count the number of rows that match the given filter. The following is the query with the post-aggregation:
{
"queryType": "groupBy",
"dataSource": "datasrc1",
"intervals": ["2020-09-16T21:15/2020-09-16T22:30"],
"pagingSpec":{ "threshold":100},
"dimensions": ["city", "zip_code", "country"],
"filter": {
"fields": [
{
"type": "selector",
"dimension": "bankId",
"value": "<bank id>"
}
]
},
"granularity": "all",
"aggregations": [
{ "type": "count", "name": "row"}
],
"postAggregations": [
{ "type": "arithmetic",
"name": "sum_rows",
"fn": "+",
"fields": [
{ "type": "fieldAccess", "fieldName": "row" }
]
}
]
}
If I remove the post-aggregation part, it returns a result like this:
[ {
"version" : "v1",
"timestamp" : "2020-09-16T21:15:00.000Z",
"event" : {
"city": "Sunnyvale",
"zip_code": "94085",
"country": "US",
"row" : 1
}
}, {
"version" : "v1",
"timestamp" : "2020-09-16T21:15:00.000Z",
"event" : {
"city": "Sunnyvale",
"zip_code": "94080",
"country": "US",
"row" : 1
}
} ]
If I add the post-aggregation part, I get a parser exception:
{
"error" : "Unknown exception",
"errorMessage" : "Instantiation of [simple type, class io.druid.query.aggregation.post.ArithmeticPostAggregator] value failed: Illegal number of fields[%s], must be > 1 (through reference chain: java.util.ArrayList[0])",
"errorClass" : "com.fasterxml.jackson.databind.JsonMappingException",
"host" : null
}
I want to add up the 'row' column across all the rows returned by the aggregation query and put the total in "sum_rows".
I don't understand what I am missing in postAggregations. Any help is appreciated.
I confess that I spend most of my time in the SQL API, not the native API (!!), but I think your issue is that you're supplying only one field to your post-aggregator. See these examples:
https://druid.apache.org/docs/latest/querying/post-aggregations.html#example-usage
If you need the sum of rows, perhaps you need a normal aggregator to sum the row count? Post-aggregators compute new values from the aggregated values within each result row; they do not combine values across rows.
The error message says the ArithmeticPostAggregator requires more than one field ("must be > 1"); the example query supplies only one. There's an example of this post-aggregator at the bottom of this answer.
However, the example query doesn't have multiple numeric aggregations to perform an arithmetic post-aggregation on. Maybe the goal is to "combine" the two output rows into one?
To change the two-row result into a single row with the total row count (for all database records matching the query filter and interval), one way is to remove zip_code from the dimension list.
Removing zip_code from dimensions would produce one result like this:
[
{
"version" : "v1",
"timestamp" : "2020-09-16T21:15:00.000Z",
"event" : {
"city": "Sunnyvale",
"country": "US",
"row" : 2
}
}
]
As you can see, by submitting a groupBy query with aggregations, Druid will do this aggregation for you dynamically (based on the dimension values in the database at the time the query is run) without needing post aggregations.
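The effect of dropping zip_code can be reproduced with a small Python sketch (plain Python, not a Druid API): re-grouping the question's two result rows by the remaining dimensions and summing the count yields a single row with row = 2. `regroup` and the in-memory rows are illustrative only; in practice Druid does this grouping on the server.

```python
def regroup(events, dimensions, metric="row"):
    """Sum `metric` over events that share the same values for `dimensions`."""
    totals = {}
    for event in events:
        key = tuple(event[d] for d in dimensions)
        totals[key] = totals.get(key, 0) + event[metric]
    return [dict(zip(dimensions, key), **{metric: total}) for key, total in totals.items()]


# The two "event" objects from the question's groupBy result
events = [
    {"city": "Sunnyvale", "zip_code": "94085", "country": "US", "row": 1},
    {"city": "Sunnyvale", "zip_code": "94080", "country": "US", "row": 1},
]
print(regroup(events, ["city", "country"]))  # → [{'city': 'Sunnyvale', 'country': 'US', 'row': 2}]
```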
Example arithmetic post aggregator:
{
"type": "arithmetic",
"name": "my_output_sum",
"fn": "+",
"fields": [
{"fieldName": "input_addend_1", "type":"fieldAccess"},
{"fieldName": "input_addend_2", "type":"fieldAccess"}
]
}
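A rough Python model of how the arithmetic post-aggregator treats one result row may make the error clearer: it needs more than one input, combines the inputs left to right within that row, and never looks at other rows. This is an illustration with hypothetical names (`arithmetic_post_agg`, the sample `row` values), covering only the +, -, * and / functions; the authoritative rules are in the Druid post-aggregation docs linked above.

```python
import operator

# Left-to-right operators used by this sketch ("/" is handled separately below).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def arithmetic_post_agg(fn, values):
    """Apply fn across values left to right; mimics the >1 field check and /-by-zero rule."""
    if len(values) <= 1:
        raise ValueError("Illegal number of fields[%d], must be > 1" % len(values))
    result = values[0]
    for v in values[1:]:
        if fn == "/":
            # Druid's "/" returns 0 when the divisor is 0, regardless of the numerator.
            result = 0 if v == 0 else result / v
        else:
            result = OPS[fn](result, v)
    return result


row = {"input_addend_1": 3, "input_addend_2": 4}
print(arithmetic_post_agg("+", [row["input_addend_1"], row["input_addend_2"]]))  # → 7
```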

Search an entire collection without knowing the structure

How do I do a regex search on a schema that is unstructured?
This is my schema:
var phoneSchema = mongoose.Schema({
name: String,
img: String,
specs: []
});
This is an example of the data:
{
"name": "Apple iPad mini Wi-Fi + Cellular",
"img": "http://cdn2.gsmarena.com/vv/bigpic/apple-ipad-mini-final.jpg",
"_id": {
"$oid": "543f40c5427c1034a905d5af"
},
"specs": [
{
"description": "[{\"criteria\":\"2G Network\",\"description\":\"GSM 850 / 900 / 1800 / 1900 - A1454; A1455\"},{\"criteria\":\" \",\"description\":\"CDMA 800 / 1900 / 2100 - A1455\"},{\"criteria\":\"3G Network\",\"description\":\"HSDPA 850 / 900 / 1900 / 2100 - A1454; A1455\"},{\"criteria\":\"4G Network\",\"description\":\"LTE 700 MHz Class 17 / 1700 / 2100 - A1454\"},{\"criteria\":\" \",\"description\":\"LTE 700 / 850 / 1800 / 1900 / 2100 - A1455\"},{\"criteria\":\"SIM\",\"description\":\"Nano-SIM\"},{\"criteria\":\"Announced\",\"description\":\"2012, October\"},{\"criteria\":\"Status\",\"description\":\"Available. Released 2012, November\"}]",
"title": "General"
},
{
"description": "[{\"criteria\":\"Dimensions\",\"description\":\"200 x 134.7 x 7.2 mm (7.87 x 5.30 x 0.28 in)\"},{\"criteria\":\"Weight\",\"description\":\"312 g (11.01 oz)\"}]",
"title": "Body"
},
{
"description": "[{\"criteria\":\"Type\",\"description\":\"LED-backlit IPS LCD capacitive touchscreen, 16M colors\"},{\"criteria\":\"Size\",\"description\":\"768 x 1024 pixels, 7.9 inches (~162 ppi pixel density)\"},{\"criteria\":\"Multitouch\",\"description\":\"Yes\"},{\"criteria\":\"Protection\",\"description\":\"Oleophobic coating\"}]",
"title": "Display"
},
{
"description": "[{\"criteria\":\"Alert types\",\"description\":\"N/A\"},{\"criteria\":\"Loudspeaker \",\"description\":\"Yes, with stereo speakers\"},{\"criteria\":\"3.5mm jack \",\"description\":\"Yes\"}]",
"title": "Sound"
},
{
"description": "[{\"criteria\":\"Card slot\",\"description\":\"No\"},{\"criteria\":\"Internal\",\"description\":\"16/32/64 GB, 512 MB RAM\"}]",
"title": "Memory"
},
{
"description": "[{\"criteria\":\"GPRS\",\"description\":\"Yes\"},{\"criteria\":\"EDGE\",\"description\":\"Yes\"},{\"criteria\":\"Speed\",\"description\":\"DC-HSDPA, 42 Mbps; HSDPA, 21 Mbps; HSUPA, 5.76 Mbps, LTE, 100 Mbps; EV-DO Rev. A, up to 3.1 Mbps\"},{\"criteria\":\"WLAN\",\"description\":\"Wi-Fi 802.11 a/b/g/n, dual-band\"},{\"criteria\":\"Bluetooth\",\"description\":\"v4.0, A2DP, EDR\"},{\"criteria\":\"USB\",\"description\":\"v2.0\"}]",
"title": "Data"
},
{
"description": "[{\"criteria\":\"Primary\",\"description\":\"5 MP, 2592 х 1944 pixels, autofocus\"},{\"criteria\":\"Features\",\"description\":\"Geo-tagging, touch focus, face detection\"},{\"criteria\":\"Video\",\"description\":\"1080p#30fps\"},{\"criteria\":\"Secondary\",\"description\":\"1.2 MP, 720p#30fps, face detection, FaceTime over Wi-Fi or Cellular\"}]",
"title": "Camera"
},
{
"description": "[{\"criteria\":\"OS\",\"description\":\"iOS 6, upgradable to iOS 7.1.2, upgradable to iOS 8.0.2\"},{\"criteria\":\"Chipset\",\"description\":\"Apple A5\"},{\"criteria\":\"CPU\",\"description\":\"Dual-core 1 GHz Cortex-A9\"},{\"criteria\":\"GPU\",\"description\":\"PowerVR SGX543MP2\"},{\"criteria\":\"Sensors\",\"description\":\"Accelerometer, gyro, compass\"},{\"criteria\":\"Messaging\",\"description\":\"iMessage, Email, Push Email, IM\"},{\"criteria\":\"Browser\",\"description\":\"HTML5 (Safari)\"},{\"criteria\":\"Radio\",\"description\":\"No\"},{\"criteria\":\"GPS\",\"description\":\"Yes, with A-GPS, GLONASS\"},{\"criteria\":\"Java\",\"description\":\"No\"},{\"criteria\":\"Colors\",\"description\":\"Black/Slate, White/Silver\"},{\"criteria\":\" \",\"description\":\"- Siri natural language commands and dictation\\n\\n- iCloud cloud service\\n\\n- Twitter and Facebook integration\\n\\n- Maps\\n\\n- Audio/video player/editor\\n\\n- Photo viewer/editor\\n\\n- Voice memo\\n\\n- TV-out\\n\\n- Document viewer\\n\\n- Predictive text input\"}]",
"title": "Features"
},
{
"description": "[{\"criteria\":\" \",\"description\":\"Non-removable Li-Po 4490 mAh battery (16.7 Wh)\"},{\"criteria\":\"Stand-by\",\"description\":\"\"},{\"criteria\":\"Talk time\",\"description\":\"Up to 10 h (multimedia)\"}]",
"title": "Battery"
},
{
"description": "[{\"criteria\":\"Price group\",\"description\":\"http://cdn2.gsmarena.com/vv/price/pg7.gif\"}]",
"title": "Misc"
},
{
"description": "[{\"criteria\":\"Display\",\"description\":\"\\nContrast ratio: 812:1 (nominal)\"},{\"criteria\":\"Loudspeaker\",\"description\":\"\\nVoice 68dB / Noise 65dB / Ring 75dB\"},{\"criteria\":\"Audio quality\",\"description\":\"\\nNoise -82.8dB / Crosstalk -80.8dB\"}]",
"title": "Tests"
}
],
"__v": 0
}
Let's say my query is "Noise 65dB"; it should return this phone.
Could someone help me with the query structure, given that specs is an array and I don't know anything about its internal content?
Thanks
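One way to approach this without knowing the shape of specs is to walk every nested value and test each string against the pattern. The Python sketch below does that client-side on plain dicts (for example documents returned by pymongo's `find()`); `contains_match` and `search` are hypothetical helpers. It scans every document, so on a large collection a server-side option such as a wildcard text index (`{"$**": "text"}`) is probably worth evaluating instead. In this data each `description` is itself a JSON-encoded string, so a substring match works without decoding it.

```python
import re


def contains_match(value, pattern):
    """Return True if any string nested anywhere in value matches pattern."""
    if isinstance(value, str):
        return pattern.search(value) is not None
    if isinstance(value, dict):
        return any(contains_match(v, pattern) for v in value.values())
    if isinstance(value, (list, tuple)):
        return any(contains_match(v, pattern) for v in value)
    return False  # numbers, booleans, None, ObjectIds, ...


def search(docs, query, fields=("specs",)):
    """Return docs whose given fields contain query (matched literally, case-insensitive)."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [d for d in docs if any(contains_match(d.get(f), pattern) for f in fields)]
```

re.escape makes the user's text match literally; drop it if callers should be allowed to pass real regular expressions.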