Databricks Flatten Read / Unpack JSON Array - pyspark

I'm not entirely sure of the correct term to use here, whether it's JSON unpacking, JSON explode, or simply JSON flattening. However, I'm trying to read columns from a nested JSON array in Databricks.
For example, I would like to pull out the customerId 'cust5001', which is held in the following key/value pairs:
array
0: {"customerId": "cust5001", "orderDate": "2021-12-24 00.00.00.000", "orderDetails": [{"productId": "prd9001", "quantity": 2, "sequence": 1, "totalPrice": {"gross": 550, "net": 500, "tax": 50}}, {"productId": "prd9002", "quantity": 3, "sequence": 2, "totalPrice": {"gross": 300, "net": 240, "tax": 60}}], "orderId": "ord1001", "shipmentDetails": {"city": "Delhi", "country": "India", "postalCode": "110040", "state": "New Delhi", "street": "M.G.Road"}}
1: {"customerId": "cust5002", "orderDate": "2021-12-25 00.00.00.000", "orderDetails": [{"productId": "prd9001", "quantity": 1, "sequence": 1, "totalPrice": {"gross": 275, "net": 250, "tax": 25}}, {"productId": "prd9004", "quantity": 4, "sequence": 2, "totalPrice": {"gross": 1000, "net": 900, "tax": 100}}], "orderId": "ord1002", "shipmentDetails": {"city": "Mumbai", "country": "India", "postalCode": "400064", "state": "Maharastra", "street": "Malad"}}
I believe I need to explode the column 'datasets' and then query "customerId":"cust5001".
To be totally straight, I would like to flatten both rows completely, but if someone could show me how to pull out "customerId":"cust5001", it would be a start.
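No answer is recorded for this question, so what follows is only a sketch of what "exploding" and "flattening" would mean for this data, written in plain Python rather than PySpark. The helper names `flatten` and `explode_orders` and the underscore-joined column names are assumptions made for illustration. In PySpark the analogous step would typically use `explode` from `pyspark.sql.functions` on the array column, followed by dot-path selection of the nested fields.

```python
# First record from the question, copied as sample data.
orders = [
    {
        "customerId": "cust5001",
        "orderDate": "2021-12-24 00.00.00.000",
        "orderDetails": [
            {"productId": "prd9001", "quantity": 2, "sequence": 1,
             "totalPrice": {"gross": 550, "net": 500, "tax": 50}},
            {"productId": "prd9002", "quantity": 3, "sequence": 2,
             "totalPrice": {"gross": 300, "net": 240, "tax": 60}},
        ],
        "orderId": "ord1001",
        "shipmentDetails": {"city": "Delhi", "country": "India",
                            "postalCode": "110040", "state": "New Delhi",
                            "street": "M.G.Road"},
    },
]


def flatten(record, prefix=""):
    """Turn nested dicts into one flat dict with underscore-joined keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "_"))
        else:
            flat[name] = value
    return flat


def explode_orders(orders):
    """Emit one flat row per element of each order's orderDetails array."""
    rows = []
    for order in orders:
        base = {k: v for k, v in order.items() if k != "orderDetails"}
        for detail in order["orderDetails"]:
            rows.append(flatten({**base, **detail}))
    return rows


# Pulling out a single top-level field needs no explode at all.
customer_ids = [order["customerId"] for order in orders]

# Fully flattened: two rows for cust5001, one per order line.
rows = explode_orders(orders)
```

Each resulting row repeats the order-level fields (customerId, orderId, shipment details) next to one product line, which is the usual shape after exploding an array column.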

Related

Flutter/Dart how to combine elements by attribute inside a list and make operations over them

I have a list of objects like this:
[
{"name": 'olive oil', "quantity": "1", "unit": "spoons"},
{"name": 'olive oil', "quantity": "2", "unit": "spoons"},
{"name": 'parmesan cheese', "quantity": "20", "unit": "slices"},
{"name": 'parmesan cheese', "quantity": "3/4", "unit": "cup"},
{"name": 'salt', "quantity": "1", "unit": "teaspoon"},
];
And I would like to get something like this:
{"name": 'olive oil', "total": "3 spoons"},
{"name": 'parmesan cheese', "total": "20 slices + 3/4 cup"},
{"name": 'salt', "total": "1 teaspoon"},
Here, elements with the same name are merged, and their quantities and units are combined into a new "total" attribute.
Check this code
void main() {
  List<Map<String, String>> data = [
    {"name": 'olive oil', "quantity": "1", "unit": "spoons"},
    {"name": 'olive oil', "quantity": "2", "unit": "spoons"},
    {"name": 'parmesan cheese', "quantity": "20", "unit": "slices"},
    {"name": 'parmesan cheese', "quantity": "3/4", "unit": "cup"},
    {"name": 'salt', "quantity": "1", "unit": "teaspoon"},
  ];
  for (int i = 0; i < data.length; i++) {
    data[i]['quantity'] = data[i]['quantity'].toString() + ' ' + data[i]['unit'].toString();
    print(data[i]);
  }
}
It will give you output like this:
{name: olive oil, quantity: 1 spoons, unit: spoons}
{name: olive oil, quantity: 2 spoons, unit: spoons}
{name: parmesan cheese, quantity: 20 slices, unit: slices}
{name: parmesan cheese, quantity: 3/4 cup, unit: cup}
{name: salt, quantity: 1 teaspoon, unit: teaspoon}
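Now you can use the quantity value anywhere. Note that this only appends the unit to each quantity; it does not merge entries that share a name.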
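The loop above leaves entries with the same name separate, so the merged result the question asks for still needs a grouping step. A minimal sketch of that step follows, in Python rather than Dart, assuming quantities are whole numbers or simple fractions such as "3/4" and that only quantities with the same unit can be added. The function name `merge_by_name` is hypothetical.

```python
from fractions import Fraction

items = [
    {"name": "olive oil", "quantity": "1", "unit": "spoons"},
    {"name": "olive oil", "quantity": "2", "unit": "spoons"},
    {"name": "parmesan cheese", "quantity": "20", "unit": "slices"},
    {"name": "parmesan cheese", "quantity": "3/4", "unit": "cup"},
    {"name": "salt", "quantity": "1", "unit": "teaspoon"},
]


def merge_by_name(items):
    """Group items by name, summing quantities that share a unit."""
    totals = {}  # name -> {unit -> summed quantity}; insertion order is kept
    for item in items:
        units = totals.setdefault(item["name"], {})
        quantity = Fraction(item["quantity"])  # parses "2" and "3/4" alike
        units[item["unit"]] = units.get(item["unit"], Fraction(0)) + quantity
    return [
        {"name": name,
         "total": " + ".join(f"{qty} {unit}" for unit, qty in units.items())}
        for name, units in totals.items()
    ]


merged = merge_by_name(items)
```

The same two-level grouping (name, then unit) carries over to a Dart `Map<String, Map<String, ...>>`; only the fraction parsing would need a Dart equivalent.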

How to update specific field in mongoDB given conditions?

Given the following MongoDB structure, how can I update the field isAvailable to false for the product whose slug is "67626dae-1537-40d8-837d-483e5759ada0" in the shop whose shopName is "jmart"? This is my query, but it does not work: Shop.find({ shopName: shopName}).update({products: {$elemMatch: {slug: slug}}}, { $set: { isAvailable: req.body.isAvailable} } Thanks!
"shopName": "jmart",
"products": [{
"id": 1,
"name": "Clean and Clear Deep Clean Cleanser 100g",
"slug": "8d1c895c-6911-4fc8-a34c-89c6948233d7",
"price": 4.5,
"discount_price": 0,
"category": "Health and Beauty",
"sale": false,
"subcategory": "personal care",
"color": "black",
"article": "Clean and Clear",
"quantity": 9,
"img": "https://firebasestorage.googleapis.com/v0/b/swifty-products.appspot.com/o/Jmart%2FBeauty%2FClean%20and%20Clear%20Deep%20Clean%20Cleanser%20100g.jpg?alt=media",
"vendor": {
"id": 1,
"name": "Clean and Clear"
},
"ratings": {
"star_ratings": 0,
"votes": 0
},
"isAvailable": true
}, {
"id": 2,
"name": "Colgate Total Pro Breath Health",
"slug": "67626dae-1537-40d8-837d-483e5759ada0",
"price": 4.5,
"discount_price": 0,
"category": "Health and Beauty",
"sale": false,
"subcategory": "personal care",
"color": "black",
"article": "Colgate",
"quantity": 9,
"img": "https://firebasestorage.googleapis.com/v0/b/swifty-products.appspot.com/o/Jmart%2FBeauty%2FColgate%20Total%20Pro%20Breath%20Health.jpg?alt=media",
"vendor": {
"id": 2,
"name": "Colgate"
},
"ratings": {
"star_ratings": 0,
"votes": 0
},
"isAvailable": true
},
]
In your case, you are trying to update only the matching subdocuments.
When used in a projection, the $elemMatch operator returns only the first matching subdocument.
When used in a find filter, the $elemMatch operator selects the whole document that contains a matching element, so on its own it does not tell $set which array element to change.
In your case, the solution might be the following MongoDB query:
db.Shop.update({"shopName":"jmart","products.slug":"67626dae-1537-40d8-837d-483e5759ada0"}, {$set: {"products.$[i].isAvailable": false}}, {arrayFilters: [{"i.slug": "67626dae-1537-40d8-837d-483e5759ada0"}]})
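For readers calling MongoDB from Python, the three arguments of the query above can be built as plain dictionaries. This is a sketch only: the helper name `build_availability_update` and the identifier `p` are hypothetical, and the commented-out call assumes a PyMongo collection object named `shops`, which is not part of the original question.

```python
def build_availability_update(shop_name, slug, is_available):
    """Return (filter, update, array_filters) for a filtered positional update."""
    # Match the shop document that contains the product.
    filter_doc = {"shopName": shop_name, "products.slug": slug}
    # $[p] refers to every array element matched by the array filter below.
    update_doc = {"$set": {"products.$[p].isAvailable": is_available}}
    array_filters = [{"p.slug": slug}]
    return filter_doc, update_doc, array_filters


filter_doc, update_doc, array_filters = build_availability_update(
    "jmart", "67626dae-1537-40d8-837d-483e5759ada0", False
)

# With PyMongo (assumed, not run here), these would be passed as:
# shops.update_one(filter_doc, update_doc, array_filters=array_filters)
```

Keeping the slug in both the filter and the array filter means the update is skipped for shops that do not carry the product at all.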

Sum value of elements and update field Mongo db

How can I sum Price over the "Elements" array and set the result on the document's Value field?
I know how to do it in SQL, but I'm a beginner in MongoDB.
{
"Document": [
{
"Id": 1,
"Type": "FV",
"Number": 34521,
"Year": 2020,
"Date": "2020-01-01T00:00:00",
"Value": 27.68,
"Elements": [
{
"Id": 1,
"DocumentId": 1,
"ProductId": 1,
"Quantity": 5.00,
"Price": 17.50,
"Task": 0.23
},
{
"Id": 2,
"DocumentId": 1,
"ProductId": 2,
"Quantity": 3.00,
"Price": 24.50,
"Task": 0.23
},
]
},
If you are using MongoDB 4.2 or later, you can use $reduce to calculate the sum in the pipeline form of update.
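A sketch of what such a pipeline update might look like, written as a Python structure. It assumes each stored document has Elements and Value at the top level, as in the excerpt, and the commented-out call assumes a PyMongo collection named `documents`. The plain-Python sum is included only to show the value the pipeline is expected to produce.

```python
# Pipeline-form update: recompute Value as the sum of Elements[].Price.
pipeline = [
    {
        "$set": {
            "Value": {
                "$reduce": {
                    "input": "$Elements",
                    "initialValue": 0,
                    # $$value is the running total, $$this the current element.
                    "in": {"$add": ["$$value", "$$this.Price"]},
                }
            }
        }
    }
]

# With PyMongo (assumed, not run here):
# documents.update_many({}, pipeline)

# The same computation in plain Python, on the prices from the question.
doc = {"Elements": [{"Price": 17.50}, {"Price": 24.50}]}
expected_value = sum(element["Price"] for element in doc["Elements"])
```

An aggregation expression such as {"$sum": "$Elements.Price"} would be a shorter alternative to $reduce for a plain sum.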

How to encode table based data in Vega-Lite?

First of all, it is hard to describe what I exactly mean by "table based data", because in some way all the input data for vega is "table-ish", but this example should make it clear:
Most (if not all) of the Vega-Lite examples for multi line charts use data like,
"data": {
"values": [
{"id": 0, "symbol": "A", "value": 4},
{"id": 1, "symbol": "A", "value": 2},
{"id": 0, "symbol": "B", "value": 3},
{"id": 1, "symbol": "B", "value": 8}
]
}
which makes it simple to color the lines of A and B with an encoding like this,
"mark": "line",
"encoding": {
"x": {"field": "id", "type": "quantitative"},
"y": {"field": "value", "type": "quantitative"},
"color": {"field": "symbol", "type": "nominal"}
}
But what if I want to produce the same result with a table based form of data like this,
"data": {
"values": [
{"id": 0, "A": 4, "B": 3},
{"id": 1, "A": 2, "B": 8}
]
}
1. How can I encode table based data into one colored multi line chart?
A basic encoding could be to create line charts for every field and layer them on top of each other like this,
"encoding": {
"x": {"field": "id", "type": "quantitative"}
},
"layer": [
{
"mark": "line",
"encoding": {
"y": {"field": "A", "type": "quantitative"}
}
},
{
"mark": "line",
"encoding": {
"y": {"field": "B", "type": "quantitative"}
}
}
]
But with this I don't know how to color the lines differently or how to create a legend.
2. Is this type of input data idiomatic to the way vega/vega-lite is designed?
The data that vega-lite works with is often known as "long-form" or "column-oriented" data. The type of data you're asking about is often known as "wide-form" or "row-oriented" data. This is discussed briefly in the documentation for Altair, a Python wrapper for vega-lite: https://altair-viz.github.io/user_guide/data.html#long-form-vs-wide-form-data
In the current release of Vega-Lite (v2.X) your only option is to modify the data source to be column-oriented with an external tool. This will change in the v3.0 release of Vega-Lite, which adds the Fold transform which is designed to convert row-oriented data to column-oriented within a chart specification.
So, in Vega-Lite 3, you could use the fold transform like this:
{
"data": {"values": [{"id": 0, "A": 4, "B": 3}, {"id": 1, "A": 2, "B": 8}]},
"transform": [{"fold": ["A", "B"]}],
"mark": "line",
"encoding": {
"x": {"field": "id", "type": "quantitative"},
"y": {"field": "value", "type": "quantitative"},
"color": {"field": "key", "type": "nominal"}
}
}
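For Vega-Lite versions without fold, the same reshaping can be done outside the chart specification. The following is a minimal Python sketch of that external step. The function name `fold` and its defaults are illustrative, and unlike the built-in transform this version drops the folded columns from the output rows.

```python
def fold(rows, fields, key="key", value="value"):
    """Convert wide-form rows to long-form: one output row per (row, field)."""
    long_rows = []
    for row in rows:
        # Columns that are not being folded are copied to every output row.
        rest = {k: v for k, v in row.items() if k not in fields}
        for field in fields:
            long_rows.append({**rest, key: field, value: row[field]})
    return long_rows


wide = [{"id": 0, "A": 4, "B": 3}, {"id": 1, "A": 2, "B": 8}]
long_form = fold(wide, ["A", "B"])
```

The result can be placed under "data": {"values": ...} and colored with "color": {"field": "key", "type": "nominal"}, exactly as in the spec above.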
Another solution (a bit tedious) is to use layer and create n layers for n columns:
{
"$schema": "https://vega.github.io/schema/vega-lite/v3.json",
"data": {"url": "data/seattle-weather.csv", "format": {"type": "csv"}},
"layer": [{
"mark": {"type": "line", "color": "orange"},
"encoding": {
"x": {"timeUnit": "yearmonthdate", "field": "date", "type": "temporal"},
"y": {"field": "temp_max", "type": "quantitative"}
}
}, {
"mark": {"type": "line", "color": "red"},
"encoding": {
"x": {"timeUnit": "yearmonthdate", "field": "date", "type": "temporal"},
"y": {"field": "temp_min", "type": "quantitative"}
}
}]
}
Future support for layer repeat (https://github.com/vega/vega-lite/issues/1274) may make this a more reasonable solution.

Tax lines won't automatically generate in Shopify API order

I am posting the following to the Shopify API order endpoint:
{
"order": {
"email": "some#email.com",
"financial_status": "paid",
"fulfillment_status": null,
"send_receipt": true,
"send_fulfillment_receipt": true,
"note": "Created by somename",
"line_items": [
{
"variant_id": 21718275463,
"quantity": 1,
"price": 99,
"requires_shipping": true,
"product_id": 6820646151
},
{
"variant_id": 21717700871,
"quantity": 1,
"price": 1000,
"requires_shipping": true,
"product_id": 6820646151
},
{
"variant_id": 21717690055,
"quantity": 1,
"price": 555,
"requires_shipping": true,
"product_id": 6821668807
}
],
"processing_method": "offsite",
"shipping_address": {
"first_name": "Chris",
"address1": "10101 Musick Road",
"phone": "9999999999",
"city": "St. Louis",
"zip": "63123",
"province": "MO",
"country": "United States",
"last_name": "Becker",
"name": "Chris Becker",
"country_code": "US",
"province_code": "MO"
},
"source_name": "somename",
"taxes_included": false,
"shipping_lines": [
{
"title": "standard",
"price": 0.00,
"code": null,
"source": "brand owner on shopify",
"carrier_identifier": null,
"tax_lines": null
}
],
"tags": "some Order"
}
}
and receiving a response whose tax lines are not filled in. I have seen on the Shopify forum that the tax lines are supposed to be automatically calculated and filled in by Shopify. I tried doing it with a customer as well, but that didn't work either.
The Orders API will not auto-calculate the taxes but if your app knows how much they are then you can include this data using tax_lines and total_tax:
{
"order": {
"line_items": [{
"title": "Big Brown Bear Boots",
"price": 74.99,
"quantity": 3,
"tax_lines": [{
"price": 13.50,
"rate": 0.06,
"title": "State tax"
}]
}],
"total_tax": 13.50
}
}
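Since the app has to supply the tax amounts itself, one way to prepare the payload is to compute them before posting. The sketch below assumes a single flat rate applied to every line item and half-up rounding to the cent. Both are simplifications for illustration, and the helper name `add_tax_lines` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def add_tax_lines(order, rate, title):
    """Attach a tax_lines entry to each line item and set total_tax."""
    rate = Decimal(rate)
    total_tax = Decimal("0")
    for item in order["line_items"]:
        # Go through str() so float prices do not carry binary rounding noise.
        subtotal = Decimal(str(item["price"])) * item["quantity"]
        tax = (subtotal * rate).quantize(CENT, rounding=ROUND_HALF_UP)
        item["tax_lines"] = [
            {"price": str(tax), "rate": float(rate), "title": title}
        ]
        total_tax += tax
    order["total_tax"] = str(total_tax)
    return order


order = add_tax_lines(
    {"line_items": [
        {"title": "Big Brown Bear Boots", "price": 74.99, "quantity": 3}
    ]},
    rate="0.06",
    title="State tax",
)
```

For the boots example this reproduces the 13.50 shown above (74.99 × 3 × 0.06 = 13.4982, rounded to the cent).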