How to change a DataFrame column name in a struct to a column value - Scala

df.withColumn("storeInfo", struct($"store", struct($"inhand", $"storeQuantity")))
.groupBy("sku").agg(collect_list("storeInfo").as("info"))
.show(false)
+---+---------------------------------------------------+
|sku|info                                               |
+---+---------------------------------------------------+
|1  |[{2222, {3, 34}}, {3333, {5, 45}}]                 |
|2  |[{4444, {5, 56}}, {5555, {6, 67}}, {6666, {7, 67}}]|
+---+---------------------------------------------------+
When I send it to Couchbase, it looks like this:
{
  "SKU": "1",
  "info": [
    {
      "col2": {
        "inhand": "3",
        "storeQuantity": "34"
      },
      "Store": "2222"
    },
    {
      "col2": {
        "inhand": "5",
        "storeQuantity": "45"
      },
      "Store": "3333"
    }
  ]
}
Can we rename col2 to the value of store? I want it to look something like the example below, so that the key of every struct is the value of the store column.
{
  "SKU": "1",
  "info": [
    {
      "2222": {
        "inhand": "3",
        "storeQuantity": "34"
      },
      "Store": "2222"
    },
    {
      "3333": {
        "inhand": "5",
        "storeQuantity": "45"
      },
      "Store": "3333"
    }
  ]
}

Simply put, we can't construct a column the way you want. There are two limitations:
1. The field name of a struct type must be fixed. We can change col2 to another name (e.g. fixedFieldName in demo 1), but it can't be dynamic (similar to a Java class field name).
2. The key of a map type can be dynamic, but all values of a map must be of the same type; see the exception in demo 2.
Maybe you should change the schema instead; see the outputs of demos 1 and 3.
demo 1
df.withColumn(
"storeInfo", struct($"store", struct($"inhand", $"storeQuantity").as("fixedFieldName"))).
groupBy("sku").agg(collect_list("storeInfo").as("info")).
toJSON.show(false)
// output:
//+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
//|value |
//+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
//|{"sku":1,"info":[{"store":2222,"fixedFieldName":{"inhand":3,"storeQuantity":34}},{"store":3333,"fixedFieldName":{"inhand":5,"storeQuantity":45}}]} |
//|{"sku":2,"info":[{"store":4444,"fixedFieldName":{"inhand":5,"storeQuantity":56}},{"store":5555,"fixedFieldName":{"inhand":6,"storeQuantity":67}},{"store":6666,"fixedFieldName":{"inhand":7,"storeQuantity":67}}]}|
//+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
demo 2
df.withColumn(
"storeInfo",
map($"store", struct($"inhand", $"storeQuantity"), lit("Store"), $"store")).
groupBy("sku").agg(collect_list("storeInfo").as("info")).
toJSON.show(false)
// output exception:
// The given values of function map should all be the same type, but they are [struct<inhand:int,storeQuantity:int>, int]
demo 3
df.withColumn(
"storeInfo",
map($"store", struct($"inhand", $"storeQuantity"))).
groupBy("sku").agg(collect_list("storeInfo").as("info")).
toJSON.show(false)
//+---------------------------------------------------------------------------------------------------------------------------------------------+
//|value |
//+---------------------------------------------------------------------------------------------------------------------------------------------+
//|{"sku":1,"info":[{"2222":{"inhand":3,"storeQuantity":34}},{"3333":{"inhand":5,"storeQuantity":45}}]} |
//|{"sku":2,"info":[{"4444":{"inhand":5,"storeQuantity":56}},{"5555":{"inhand":6,"storeQuantity":67}},{"6666":{"inhand":7,"storeQuantity":67}}]}|
//+---------------------------------------------------------------------------------------------------------------------------------------------+

Related

How do I find a value in the fields "key1, key2, key3" in the example I provided using PostgreSQL, assuming the value is not known

{
"KEY1": {
"NEW_SIZE": 9,
"NEW_VALUE": 1
},
"KEY2": {
"AGE": 35,
"LAST_NAME": "DOE",
"FIRST_NAME": "JOHN",
"MIDDLE_NAME": null,
"BIRTH_MONTH_INT": 9
},
"KEY3": {
"NEW_SIZE": 11,
"NEW_VALUE": 5
}
}
Once I was corrected to use JSONB, I was able to use:
SELECT jsonb_object_keys(contents::jsonb) as data
FROM example
WHERE id = 1;
This displayed the keys key1, key2, key3 as data, so I could then display any value in those fields.
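If the keys then need to be paired with their values in a single query, something along these lines should also work (a sketch against the same example table and contents column):
SELECT k.key, contents::jsonb -> k.key AS value
FROM example,
     jsonb_object_keys(contents::jsonb) AS k(key)
WHERE id = 1;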

Parse JSON in DB2 with ID fields that have no tag

I have the following JSON response and I'm trying to parse it in DB2 v11.
The problem is that I'm not able to do so.
The values 3e00a201d9d1b89732bf8c7a00aa7477ac7212354172ad7780e5296803ad6bbb and 4e00a201d9d1b89732bf8c7a00aa7477ac7212354172ad7780e5296803ad62cd
are IDs, and there can be up to 60 of them in the response.
To do this, I would need some kind of tag for those IDs in order to be able to select them.
Is there any way to do this, please? Thank you.
{
"3e00a201d9d1b89732bf8c7a00aa7477ac7212354172ad7780e5296803ad62cd": {
"found": true,
"signature": "FhNzQ3N2FjNzIxMjM1NDE3MmFkNzc4MGU1Mjk2OD111111111",
"sectors": [
"1",
"2"
]
},
"4e00a201d9d1b89732bf8c7a00aa7477ac7212354172ad7780e5296803ad62cd": {
"found": false,
"signature": "FhNzQ3N2FjNzIxMjM1NDE3MmFkNzc4MGU1Mjk2O2222222",
"sectors": []
}
}
I tried this:
INSERT INTO tablejson
SELECT '22222',
SYSTOOLS.JSON2BSON('
{
"3e00a201d9d1b89732bf8c7a00aa7477ac7212354172ad7780e5296803ad62cd": {
"found": true,
"signature": "MTY0NDI0MDA5NTUxOTozZTAwYTIwMWQ5ZDFiODk3MzJiZjhjN2EwMGFhNzQ3N2FjNzIxMjM1NDE3MmFkNzc4MGU1Mjk2ODAzYWQ2MmNkOjEsMiwzLDQsNSw2LDcsODpmOGU4YmNlN2I3OGZhZWY3NzVlNDNjM2ZhYzZjNWMzZGRkYzgyMzAzNjI5ZDhjYTc2MDFiODIzYTc0MDRjZWNl",
"sectors": [
"1",
"2",
"3",
"4",
"5",
"6",
"7",
"8"
]
},
"4e00a201d9d1b89732bf8c7a00aa7477ac7212354172ad7780e5296803ad62cd": {
"found": false,
"signature": "MTY0NDI0MDA5NTUxOTo0ZTAwYTIwMWQ5ZDFiODk3MzJiZjhjN2EwMGFhNzQ3N2FjNzIxMjM1NDE3MmFkNzc4MGU1Mjk2ODAzYWQ2MmNkOjphNTk2OTk1YjQ0ZTVmNGM4YTdiMGMxN2MzMzgyMmQyMzZkNDc2YTcyODA4ZTMyM2YxODI2Y2E5NWZjNjU2MWE0",
"sectors": []
}
}
')
FROM SYSIBM.SYSDUMMY1;

SELECT ID, SYSTOOLS.BSON2JSON(jsonfield) AS JSON_INFO FROM tablejson;

SELECT JSON_VAL(jsonfield, '*.signature', 's:1000') FROM tablejson;
But what does work is writing the ID directly; however, that doesn't help much, as the query should be generic.
SELECT JSON_VAL(jsonfield, '4e00a201d9d1b89732bf8c7a00aa7477ac7212354172ad7780e5296803ad62cd.found', 's:1000')
FROM tablejson
As the output, I would like something like 3 fields:
IDs                      found   signature
3e00a201d9d1b89732bf8... true    FhNzQ3N2FjNzIxMjM1ND...
4e00a201d9d1b89732bf8... false   FhNzQ3N2FjNzIxMjM1ND...
Thank you.

Scala/Spark: flattening multiple JSON objects in an RDD using Scala/Spark, but getting invalid data

My code (preferred in Scala) for flattening multiple JSON objects:
val data = sc.textFile("/user/cloudera/spark/sample.json")
val nospace = data.map(x => x.trim())
val nospaces = nospace.filter(x => x != "")
val local = nospaces.collect
var vline = ""
var eline: List[String] = List()
var lcnt = 0
var rcnt = 0
local.map { x =>
  vline += x
  if (x == "[") lcnt += 1
  if (x == "]") rcnt += 1
  if (lcnt == rcnt) {
    eline ++= List(vline)
    lcnt = 0
    rcnt = 0
    vline = ""
  }
}
My input (a file containing multiple JSON records):
[
{
"Year": "2013",
"First Name": "JANE",
"County": "A",
"Sex": "F",
"Count": "27"
},{
"Year": "2013",
"First Name": "JADE",
"County": "B",
"Sex": "M",
"Count": "26"
},{
"Year": "2013",
"First Name": "JAMES",
"County": "C",
"Sex": "M",
"Count": "21"
}
]
The input JSON as read from the file:
root#ubuntu:/home/sathya/Desktop/stackoverflo/data# cat /home/sathya/Desktop/stackoverflo/data/sample.json
[
{
"Year": "2013",
"First Name": "JANE",
"County": "A",
"Sex": "F",
"Count": "27"
},{
"Year": "2013",
"First Name": "JADE",
"County": "B",
"Sex": "M",
"Count": "26"
},{
"Year": "2013",
"First Name": "JAMES",
"County": "C",
"Sex": "M",
"Count": "21"
}
]
Code to read the JSON and flatten it into DataFrame columns:
spark.read.option("multiline","true").json("file:////home/sathya/Desktop/stackoverflo/data/sample.json").show()
+-----+------+----------+---+----+
|Count|County|First Name|Sex|Year|
+-----+------+----------+---+----+
|   27|     A|      JANE|  F|2013|
|   26|     B|      JADE|  M|2013|
|   21|     C|     JAMES|  M|2013|
+-----+------+----------+---+----+

jsonb searching for keys in array and returning position

I have the following JSON object stored in a jsonb column:
{
"msrp": 6000,
"data": [
{
"supplier": "a",
"price": 5775
},
{
"supplier": "b",
"price": 6129
},
{
"supplier": "c",
"price": 5224
},
{
"supplier": "d",
"price": 5775
}
]
}
There are a few things I'm trying to do but am completely stuck on :(
Check if a supplier exists inside this array. So, for example, I'm looking up whether "supplier": "e" is in here. Here's what I tried, but it didn't work: where data #> '{"supplier": "e"}'
(Optional, but really nice to have) Before returning results from a select *, inject a "price_diff" into each array element so that I can see the difference between msrp and the supplier price, like this:
{
"supplier": "d",
"price": 5775,
"price_diff": 225
}
where data #> '{"supplier": "e"}'
Do you have a column named data? You can't just treat a JSONB key name as if it were a column name.
Containment starts from the root.
colname @> '{"data":[{"supplier": "e"}]}'
You can redefine the 'root' dynamically though:
colname->'data' @> '[{"supplier": "e"}]'
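For the optional price_diff part, one possible sketch (assuming the table is named products and the jsonb column colname, since the question doesn't name them) is to explode the data array, append the computed field to each element, and aggregate the elements back:
SELECT jsonb_set(
         colname,
         '{data}',
         (SELECT jsonb_agg(
                   -- append the msrp/price difference to each array element
                   elem || jsonb_build_object(
                     'price_diff',
                     (colname ->> 'msrp')::numeric - (elem ->> 'price')::numeric))
            FROM jsonb_array_elements(colname -> 'data') AS t(elem))
       ) AS with_price_diff
FROM products
-- containment check as in the answer above
WHERE colname -> 'data' @> '[{"supplier": "a"}]';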

Is there a magic function which can extract all selected keys/nested keys, including arrays, from jsonb

Given a jsonb value and a set of keys, how can I get a new jsonb containing only the required keys?
I've tried extracting key-value pairs into a text[] and then using jsonb_object(text[]). It works well, but the problem comes when a key holds an array of JSON objects.
create table my_jsonb_table
(
data_col jsonb
);
insert into my_jsonb_table (data_col) Values ('{
"schemaVersion": "1",
"Id": "20180601550002",
"Domains": [
{
"UID": "29aa2923",
"quantity": 1,
"item": "book",
"DepartmentDomain": {
"type": "paper",
"departId": "10"
},
"PriceDomain": {
"Price": 79.00,
"taxA": 6.500,
"discount": 0
}
},
{
"UID": "bbaa2923",
"quantity": 2,
"item": "pencil",
"DepartmentDomain": {
"type": "wood",
"departId": "11"
},
"PriceDomain": {
"Price": 7.00,
"taxA": 1.5175,
"discount": 1
}
}
],
"finalPrice": {
"totalTax": 13.50,
"total": 85.0
},
"MetaData": {
"shopId": "1405596346",
"locId": "95014",
"countryId": "USA",
"regId": "255",
"Date": "20180601"
}
}
')
This is what I am trying to achieve:
SELECT some_magic_fun(data_col,'Id,Domains.UID,Domains.DepartmentDomain.departId,finalPrice.total')::jsonb FROM my_jsonb_table;
I am trying to create that magic function which extracts the given keys in jsonb format. As of now I am able to extract scalar items, put them in a text[], and use jsonb_object (roughly like the sketch below), but I don't know how I can extract all the elements of an array.
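For reference, the scalar-only extraction looks roughly like this (a sketch of the idea, not my exact code):
SELECT jsonb_object(ARRAY[
         'Id',    data_col ->> 'Id',
         'total', data_col #>> '{finalPrice,total}'
       ])
FROM my_jsonb_table;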
Expected output:
{
"Id": "20180601550002",
"Domains": [
{
"UID": "29aa2923",
"DepartmentDomain": {
"departId": "10"
}
},
{
"UID": "bbaa2923",
"DepartmentDomain": {
"departId": "11"
}
}
],
"finalPrice": {
"total": 85.0
}
}
I don't know of any magic. You have to rebuild it yourself.
select jsonb_build_object(
  -- Straightforward
  'Id', data_col -> 'Id',
  'Domains', (
    -- Aggregate all the "rows" back together into an array.
    select jsonb_agg(
      -- Turn each array element into a new object
      jsonb_build_object(
        'UID', domain -> 'UID',
        'DepartmentDomain', jsonb_build_object(
          'departId', domain #> '{DepartmentDomain,departId}'
        )
      )
    )
    -- Turn each element of the Domains array into a row
    from jsonb_array_elements(data_col -> 'Domains') d(domain)
  ),
  -- Also pretty straightforward
  'finalPrice', jsonb_build_object(
    'total', data_col #> '{finalPrice,total}'
  )
) from my_jsonb_table;
This probably is not a good use of a JSON column. Your data is relational and would better fit traditional relational tables.