{
"cars": {
"Nissan": {
"Sentra": {"doors":4, "transmission":"automatic"},
"Maxima": {"doors":4, "transmission":"automatic","colors":["b#lack","pin###k"]}
},
"Ford": {
"Taurus": {"doors":4, "transmission":"automatic"},
"Escort": {"doors":4, "transmission":"auto#matic"}
}
}
}
I have this JSON that I have read, and I want to remove every # symbol in every string that may exist. My problem is making this function generic, so that it works on every schema I may encounter and not only the schema in the JSON above.
You could do something like this: get all the fields from the schema, use foldLeft with the DataFrame itself as the accumulator, and apply the function you want:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, regexp_replace}

def replaceSymbol(df: DataFrame): DataFrame =
  df.schema.fieldNames.foldLeft(df)((acc, field) =>
    acc.withColumn(field, regexp_replace(col(field), "#", "")))
You might need to check if the column is String or not.
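For example, a minimal sketch that restricts the fold to top-level StringType columns (the name replaceSymbolInStrings is mine; nested fields such as the colors array in the example would still need recursion into the schema):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, regexp_replace}
import org.apache.spark.sql.types.StringType

// Only rewrite top-level string columns; leave structs, arrays,
// and numeric columns untouched.
def replaceSymbolInStrings(df: DataFrame): DataFrame =
  df.schema.fields
    .filter(_.dataType == StringType)
    .foldLeft(df)((acc, f) =>
      acc.withColumn(f.name, regexp_replace(col(f.name), "#", "")))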
My "Item" model has a JSONB column named "formula".
I want to get it either as an object or as a JSON string, but what it returns is a string with unquoted keys that can't be parsed as JSON.
My code:
async function itemsRead (where) {
const items = await models.item.findAll({
where
})
console.log(items)
return items
}
And what I get is:
[
item {
dataValues: {
id: 123,
formula: '{a:-0.81, x:5.12}',
}
},
.
.
.
]
My mistake was in the insert (create) phase. I had to pass the original formula object (not a JSON-stringified or other string form) to create():
let formula = {a:-0.81, x:5.12}
models.item.create({id:123, formula})
I want to extract keys from nested JSON using Spark.
I have the JSON below:
{
"predicates": {
"API_No": "http://www.oilandgas.com/api_no",
"Facility_ID": "http://www.oilandgas.com/facility_id"
},
"prefixes": {
"API_No": "http://www.oilandgas.com/api_no/ ",
"Facility_ID": "http://www.oilandgas.com/facility_id/ "
},
"relations": {
"API_No": [
"Facility_ID",
"County"
]
},
"grahName": "http://www.oilandgas.com/data"
}
I wrote the code below to read the JSON:
val df = spark.read.option("multiline", "true").json("path/to/above/json")
df.select(explode(array(col("relations")))).columns.foreach(println)
I want to get the key in 'relations' (i.e. 'API_No') from the DataFrame.
Thanks in advance.
To get the key in relations (API_No) from the DataFrame, you just have to project (select) the relations key. Since relations is of struct type, projecting it gives you the desired result, like the following:
df.select("relations.*").columns.foreach(println)
It will give the following result:
API_No
I hope it helps!
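As an alternative sketch with the same df, you can also read the nested field names straight off the schema without projecting anything:

import org.apache.spark.sql.types.StructType

// Inspect the schema directly: relations is a struct, so its
// field names are the keys we want.
df.schema("relations").dataType match {
  case s: StructType => s.fieldNames.foreach(println) // prints API_No
  case other         => println(s"relations is not a struct: $other")
}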
I have some JSON like below. When I loaded this JSON, some fields were themselves strings of JSON.
How can I parse this JSON using Spark and Scala and search it for the keywords I need?
{"main":"{\"payload\": { \"mode\": [\"Node\"], \"currentSatate\": \"Ready\", \"Previousstate\": \"slow\", \"trigger\": [\"11\", \"12\"], \"AllStates\": [\"Ready\", \"slow\", \"fast\", \"new\"],\"UnusedStates\": [\"slow\", \"new\"],\"Percentage\": \"70\",\"trigger\": [\"11\"]}"}
{"main":"{\"payload\": {\"trigger\": [\"11\", \"22\"],\"mode\": [\"None\"],\"cangeState\": \"Open\"}}"}
{"main":"{\"payload\": { \"trigger\": [\"23\", \"45\"], \"mode\": [\"Edge\"], \"node.postions\": [\"12\", \"23\", \"45\", \"67\"], \"node.names\": [\"aa\", \"bb\", \"cc\", \"dd\"]}}" }
This is how it looks after loading it into a DataFrame:
val df = spark.read.json("<pathtojson>")
df.show(false)
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|main |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"payload": { "mode": ["Node"], "currentSatate": "Ready", "Previousstate": "slow", "trigger": ["11", "12"], "AllStates": ["Ready", "slow", "fast", "new"],"UnusedStates": ["slow", "new"],"Percentage": "70","trigger": ["11"]}|
|{"payload": {"trigger": ["11", "22"],"mode": ["None"],"cangeState": "Open"}} |
|{"payload": { "trigger": ["23", "45"], "mode": ["Edge"], "node.postions": ["12", "23", "45", "67"], "node.names": ["aa", "bb", "cc", "dd"]}} |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Since my JSON field is different for all 3 JSON strings, is there a way to define 3 case classes and match against them?
I only know how to match to one class:
val mapper = new ObjectMapper() with ScalaObjectMapper
mapper.registerModule(DefaultScalaModule)
val parsedJson = mapper.readValue[classname](jsonstring)
Is there a way to create multiple case classes and match to any particular class?
You are using Spark SQL, so the first thing to do is turn the data into a Dataset and then use Spark's methods to deal with it; don't pass raw JSON all over the place (e.g., as in Play).
You can deserialize the JSON into a case class:
// StudentRecord is a case class you define to mirror your JSON schema
import sparkSession.implicits._ // required for .as[StudentRecord]

val jsonFilePath: String = "/whatever/data.json"
val myDataSet = sparkSession.read.json(jsonFilePath).as[StudentRecord]
You now have a Dataset of StudentRecord, so you can use Spark's groupBy method to get the data of the column you want from the Dataset:
myDataSet.groupBy("whateverTable.whateverColumn").max() //could be min(), count(), etc...
Extra note: your JSON should be "cleaned up" a little. For example, if it lives inside your program, you can use the multi-line way of declaring your JSON, and then you don't need escape characters all over the place:
val myJson: String =
  """
    |{
    |}
  """.stripMargin
If it is in a file, then the JSON you wrote is not correct, so first make sure you have syntactically correct JSON to work with.
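As a further option, if you only need a few known fields rather than full case-class matching, here is a minimal sketch using from_json (Spark 2.1+) to parse the nested JSON string in the main column; the schema below covers only the fields shared by the sample rows and is an assumption:

import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}
import spark.implicits._

// Schema for the fields common to all three sample rows; extend as needed.
val payloadSchema = new StructType()
  .add("payload", new StructType()
    .add("trigger", ArrayType(StringType))
    .add("mode", ArrayType(StringType)))

val parsed = df.withColumn("parsed", from_json($"main", payloadSchema))
parsed.select($"parsed.payload.trigger", $"parsed.payload.mode").show(false)

Note that from_json returns null for the first sample row as written, since its inner JSON is missing a closing brace, which is another reason to fix the JSON first.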
I am using Spark 1.6.3 to parse a JSON structure.
I have the JSON structure below:
{
"events":[
{
"_update_date":1500301647576,
"eventKey":"depth2Name",
"depth2Name":"XYZ"
},
{
"_update_date":1500301647577,
"eventKey":"journey_start",
"journey_start":"2017-07-17T14:27:27.144Z"
}]
}
I want to parse the above JSON into 3 columns in a DataFrame. eventKey's value (e.g. depth2Name) is also the name of a node in the JSON, and I want to read the value from the corresponding node into an "eventValue" column so that I can accommodate any new events dynamically.
Here is the expected output:
_update_date,eventKey,eventValue
1500301647576,depth2Name,XYZ
1500301647577,journey_start,2017-07-17T14:27:27.144Z
Sample code:
val x = sc.wholeTextFiles("/user/jx665240/events.json").map(x => x._2)
val namesJson = sqlContext.read.json(x)
namesJson.printSchema()
namesJson.registerTempTable("namesJson")
val eventJson=namesJson.select("events")
val mentions1 =eventJson.select(explode($"events")).toDF("events").select($"events._update_date",$"events.eventKey",$"events.$"events.eventKey"")
$"events.$"events.eventKey"" is not working.
Can you please suggest how to fix this issue?
Thanks,
Sree
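A sketch of one possible approach: since every eventKey value is also the name of a string field under events, you can build eventValue dynamically by collecting the distinct keys and chaining when with coalesce (this assumes Spark 1.6's DataFrame API and that all event value fields are strings, as in the sample):

import org.apache.spark.sql.functions.{coalesce, col, explode, when}
import sqlContext.implicits._

// Explode the events array so each event becomes its own row.
val exploded = namesJson.select(explode($"events").as("events"))

// Collect the distinct event names; each one is also a field under events.
val eventKeys = exploded.select($"events.eventKey")
  .distinct().collect().map(_.getString(0))

// eventValue reads the field whose name matches this row's eventKey.
val eventValue = eventKeys
  .map(k => when($"events.eventKey" === k, col(s"events.$k")))
  .reduce((a, b) => coalesce(a, b))

exploded.select($"events._update_date", $"events.eventKey",
  eventValue.as("eventValue")).show(false)

For the sample data this yields the three expected columns, with the depth2Name and journey_start values landing in eventValue.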