TypeError: StructType can not accept object - pyspark

I'm trying to convert this JSON string data to a DataFrame in Databricks:
a = """{ "id": "a",
"message_type": "b",
"data": [ {"c":"abcd","timestamp":"2022-03-01T13:10:00+00:00","e":0.18,"f":0.52} ]}"""
The schema I defined for the data is this:
schema = StructType(
    [
        StructField("id", StringType(), False),
        StructField("message_type", StringType(), False),
        StructField("data", ArrayType(StructType([
            StructField("c", StringType(), False),
            StructField("timestamp", StringType(), False),
            StructField("e", DoubleType(), False),
            StructField("f", DoubleType(), False),
        ]))),
    ]
)
and when I run this command
df = sqlContext.createDataFrame(sc.parallelize([a]), schema)
I get this error
PythonException: 'TypeError: StructType can not accept object '{ "id": "a",\n"message_type": "JobMetric",\n"data": [ {"c":"abcd","timestamp":"2022-03-01T13:10:00+00:00","e":0.18,"f":0.52} ]' in type <class 'str'>'. Full traceback below:
If anyone could help me with this, I would much appreciate it!

Your a variable is wrong.
"data": [ "{"JobId":"ATLUPS10m2101V1","Timestamp":"2022-03-01T13:10:00+00:00","number1":0.9098145961761475,"number2":0.5294908881187439}" ]
should be
"data": [ {"JobId":"ATLUPS10m2101V1","Timestamp":"2022-03-01T13:10:00+00:00","number1":0.9098145961761475,"number2":0.5294908881187439} ]
Also check that the field names in the data actually match the schema, e.g. JobId vs. job_id and Timestamp vs. timestamp.
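A quick way to verify the corrected string parses as intended is a minimal check with the standard json module:
import json

parsed = json.loads(a)            # raises json.JSONDecodeError if the string is malformed
print(type(parsed["data"][0]))    # should print <class 'dict'>, not <class 'str'>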

The issue is that when you pass a plain string to a struct schema, Spark expects something like RDD([StringType, StringType, ...]), but in your current scenario it is getting just a string object. To fix it, first convert the string to a JSON object, and from there create an RDD. See the below logic for details -
Input Data -
a = """{"run_id": "1640c68e-5f02-4f49-943d-37a102f90146",
"message_type": "JobMetric",
"data": [ {"JobId":"ATLUPS10m2101V1","timestamp":"2022-03-01T13:10:00+00:00",
           "score":0.9098145961761475,
           "severity":0.5294908881187439} ]
}"""
Converting to an RDD using the parsed JSON object -
from pyspark.sql.types import *
import json

schema = StructType(
    [
        StructField("run_id", StringType(), False),
        StructField("message_type", StringType(), False),
        StructField("data", ArrayType(StructType([
            StructField("JobId", StringType(), False),
            StructField("timestamp", StringType(), False),
            StructField("score", DoubleType(), False),
            StructField("severity", DoubleType(), False),
        ]))),
    ]
)
df = spark.createDataFrame(data=sc.parallelize([json.loads(a)]), schema=schema)
df.show(truncate=False)
Output -
+------------------------------------+------------+--------------------------------------------------------------------------------------+
|run_id |message_type|data |
+------------------------------------+------------+--------------------------------------------------------------------------------------+
|1640c68e-5f02-4f49-943d-37a102f90146|JobMetric |[{ATLUPS10m2101V1, 2022-03-01T13:10:00+00:00, 0.9098145961761475, 0.5294908881187439}]|
+------------------------------------+------------+--------------------------------------------------------------------------------------+
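Alternatively, if you don't need to enforce the schema yourself, spark.read.json can infer it directly from the string (a minimal sketch; the inferred types may differ from a hand-written schema):
df = spark.read.json(sc.parallelize([a]))
df.printSchema()
df.show(truncate=False)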

How to get Quantile/median values in pydruid

My goal is to query the median value of column height in my druid datasource. I was able to use other aggregations like count and count distinct values. Here's my query so far:
group = query.groupby(
    datasource=datasource,
    granularity='all',
    intervals='2020-01-01T00:00:00+00:00/2101-01-01T00:00:00+00:00',
    dimensions=[
        "category_a"
    ],
    filter=(Dimension("country") == country_id),
    aggregations={
        'count': longsum('count'),
        'count_distinct_city': aggregators.thetasketch('city'),
    }
)
There's a Quantile class under postaggregator.py, so I tried using that.
class Quantile(Postaggregator):
    def __init__(self, name, probability):
        Postaggregator.__init__(self, None, None, name)
        self.post_aggregator = {
            "type": "quantile",
            "fieldName": name,
            "probability": probability,
        }
Here's my attempt at getting the median:
post_aggregations={
    'median_value': postaggregator.Quantile(
        'height', 50
    )
}
The error I'm getting here is "Could not resolve type id 'quantile' as a subtype of [simple type, class io.druid.query.aggregation.PostAggregator]":
Druid Error: {'error': 'Unknown exception', 'errorMessage': 'Could not resolve type id \'quantile\' as a subtype of [simple type, class io.druid.query.aggregation.PostAggregator]: known type ids = [arithmetic, constant, doubleGreatest, doubleLeast, expression, fieldAccess, finalizingFieldAccess, hyperUniqueCardinality, javascript, longGreatest, longLeast, quantilesDoublesSketchToHistogram, quantilesDoublesSketchToQuantile, quantilesDoublesSketchToQuantiles, quantilesDoublesSketchToString, sketchEstimate, sketchSetOper, thetaSketchEstimate, thetaSketchSetOp] (for POJO property \'postAggregations\')\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 856] (through reference chain: io.druid.query.groupby.GroupByQuery["postAggregations"]->java.util.ArrayList[0])', 'errorClass': 'com.fasterxml.jackson.databind.exc.InvalidTypeIdException', 'host': None}
I modified the pydruid code to get this working on our end, creating a new aggregator and postaggregator under /pydruid/utils.
aggregator.py
def quantilesDoublesSketch(raw_column, k=128):
    return {"type": "quantilesDoublesSketch", "fieldName": raw_column, "k": k}
postaggregator.py
class QuantilesDoublesSketchToQuantile(Postaggregator):
    def __init__(self, name: str, field_name: str, fraction: float):
        self.post_aggregator = {
            "type": "quantilesDoublesSketchToQuantile",
            "name": name,
            "fraction": fraction,
            "field": {
                "fieldName": field_name,
                "name": field_name,
                "type": "fieldAccess",
            },
        }
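For the median, the two pieces might be wired together like this (a sketch assuming this patched pydruid and the DataSketches extension loaded on the Druid cluster; fraction=0.5 selects the median):
aggregations={
    # build a quantiles sketch over the raw column
    'height_sketch': aggregators.quantilesDoublesSketch('height'),
},
post_aggregations={
    # read the 0.5 quantile (the median) out of the sketch
    'median_value': postaggregator.QuantilesDoublesSketchToQuantile(
        'median_value', 'height_sketch', 0.5
    ),
}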
This was my first time creating a PR! Hopefully they accept it and publish it officially.
https://github.com/druid-io/pydruid/pull/287

Save DF with JSON string as JSON without escape characters with Apache Spark

I have a dataframe which contains an id column and a JSON string column:
val df = Seq(
  (0, """{"device_id": 0, "device_type": "sensor-ipad", "ip": "68.161.225.1", "cca3": "USA", "cn": "United States", "temp": 25, "signal": 23, "battery_level": 8, "c02_level": 917, "timestamp" :1475600496 }"""),
  (1, """{"device_id": 1, "device_type": "sensor-igauge", "ip": "213.161.254.1", "cca3": "NOR", "cn": "Norway", "temp": 30, "signal": 18, "battery_level": 6, "c02_level": 1413, "timestamp" :1475600498 }""")
).toDF("id", "json")
I want to save it as JSON with the nested JSON as a 'raw' object rather than an escaped string.
When I run
df.write.json("path")
it saves my json column as a string:
{"id":0,"json":"{​\"device_id\": 0, \"device_type\": \"sensor-ipad\", \"ip\": \"68.161.225.1\", \"cca3\": \"USA\", \"cn\": \"United States\", \"temp\": 25, \"signal\": 23, \"battery_level\": 8, \"c02_level\": 917, \"timestamp\" :1475600496 }​"}
And what I need is:
{"id": 0,"json": {"device_id": 0,"device_type": "sensor-ipad","ip": "68.161.225.1","cca3": "USA","cn": "United States","temp": 25,"signal": 23,"battery_level": 8,"c02_level": 917,"timestamp": 1475600496}}
How can I achieve this? Please note that the structure of the JSON can be different for each row; it may contain additional fields.
You can use the from_json function to parse the JSON string data into a new column:
// get the schema of the json data
// You can also define your own schema
import org.apache.spark.sql.functions._

val json_schema = spark.read.json(df.select("json").as[String]).schema
val resultDf = df.withColumn("json", from_json($"json", json_schema))
Output:
{"id":0,"json":{"battery_level":8,"c02_level":917,"cca3":"USA","cn":"United States","device_id":0,"device_type":"sensor-ipad","ip":"68.161.225.1","signal":23,"temp":25,"timestamp":1475600496}}
{"id":1,"json":{"battery_level":6,"c02_level":1413,"cca3":"NOR","cn":"Norway","device_id":1,"device_type":"sensor-igauge","ip":"213.161.254.1","signal":18,"temp":30,"timestamp":1475600498}}

plotting graph in sapui5 with pandas dataframe

pandas supports DataFrame-to-JSON conversion, so a dataframe can be converted to JSON data as shown below. (1) and (2) are just for reference and have nothing to do with sapui5.
1) For example:
import pandas as pd
df = pd.DataFrame([['madrid', 10], ['venice', 20],['milan',40],['las vegas',35]],columns=['city', 'temp'])
df.to_json(orient="records")
gives:
[{"city":"madrid","temp":10},{"city":"venice","temp":20},{"city":"milan","temp":40},{"city":"las vegas","temp":35}]
and
df.to_json(orient="split")
gives:
{"columns":["city","temp"],"index":[0,1,2,3],"data":[["madrid",10],["venice",20],["milan",40],["las vegas",35]]}
As we have JSON data, it could be used as input for the plot properties.
2) For the same JSON data I have created an API (running on localhost):
http://127.0.0.1:****/graph
The API in Flask (just for reference):
from flask import Flask
import pandas as pd

app = Flask(__name__)

@app.route('/graph')
def plot():
    df = pd.DataFrame([['madrid', 10], ['venice', 20], ['milan', 40], ['las vegas', 35]],
                      columns=['city', 'temp'])
    jsondata = df.to_json(orient='records')
    return jsondata

if __name__ == '__main__':
    app.run()
Postman result:
[
    {"city": "madrid", "temp": 10},
    {"city": "venice", "temp": 20},
    {"city": "milan", "temp": 40},
    {"city": "las vegas", "temp": 35}
]
3) How can I make use of this sample API to fetch the data and then plot a sample graph of city vs temp using sapui5?
I'm looking for an example of how to do this, or any help on how to consume APIs in sapui5.

AvroTypeException: Not an enum: MOBILE on DataFileWriter

I am getting the following error when trying to write Avro records using the built-in AvroKeyValueSinkWriter in Flink 1.3.2 with Avro 1.8.2.
My schema looks like this:
{"namespace": "com.base.avro",
"type": "record",
"name": "Customer",
"doc": "v6",
"fields": [
{"name": "CustomerID", "type": "string"},
{"name": "platformAgent", "type": {
"type": "enum",
"name": "PlatformAgent",
"symbols": ["WEB", "MOBILE", "UNKNOWN"]
}, "default":"UNKNOWN"}
]
}
And I am calling the following Flink code to write data:
var properties = new util.HashMap[String, String]()
val stringSchema = Schema.create(Type.STRING)
val myTypeSchema = Customer.getClassSchema
val keySchema = stringSchema.toString
val valueSchema = myTypeSchema.toString
val compress = true
properties.put(AvroKeyValueSinkWriter.CONF_OUTPUT_KEY_SCHEMA, keySchema)
properties.put(AvroKeyValueSinkWriter.CONF_OUTPUT_VALUE_SCHEMA, valueSchema)
properties.put(AvroKeyValueSinkWriter.CONF_COMPRESS, compress.toString)
properties.put(AvroKeyValueSinkWriter.CONF_COMPRESS_CODEC, DataFileConstants.SNAPPY_CODEC)
val sink = new BucketingSink[org.apache.flink.api.java.tuple.Tuple2[String, Customer]]("s3://test/flink")
sink.setBucketer(new DateTimeBucketer("yyyy-MM-dd/HH/mm/"))
sink.setInactiveBucketThreshold(120000) // this is 2 minutes
sink.setBatchSize(1024 * 1024 * 64) // this is 64 MB,
sink.setPendingSuffix(".avro")
val writer = new AvroKeyValueSinkWriter[String, Customer](properties)
sink.setWriter(writer.duplicate())
However, it throws the following errors:
Caused by: org.apache.avro.AvroTypeException: Not an enum: MOBILE
at org.apache.avro.generic.GenericDatumWriter.writeEnum(GenericDatumWriter.java:177)
at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:119)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:302)
... 10 more
Please suggest!
UPDATE 1:
I found that this is a kind of bug in Avro 1.8+, based on this ticket: https://issues-test.apache.org/jira/browse/AVRO-1810
It turns out this is an issue with Avro 1.8+; I had to override the version Flink uses:
dependencyOverrides += "org.apache.avro" % "avro" % "1.7.3"
The bug can be found here: https://issues-test.apache.org/jira/browse/AVRO-1810

Is there a way to return a JSON object within a xe:restViewColumn?

I'm trying to generate a REST service on an XPage with the viewJsonService service type.
Within a column I need to return a JSON object, and I tried to solve that with this code:
<xe:restViewColumn name="surveyResponse">
    <xe:this.value>
        <![CDATA[#{javascript:
            var arrParticipants = new Array();
            arrParticipants.push({"participant": "A", "selection": ["a1"]});
            arrParticipants.push({"participant": "B", "selection": ["b1", "b2"]});
            return (arrParticipants);
        }]]>
    </xe:this.value>
</xe:restViewColumn>
I was expecting to get this for that specific column:
...
"surveyResponse": [
    { "participant": "A",
      "selection": [ "a1" ]
    },
    { "participant": "B",
      "selection": [ "b1", "b2" ]
    }
]
...
What I am getting is this:
...
"surveyResponse": [
    "???",
    "???"
]
...
When trying to use toJson on the array arrParticipants, the result is not valid JSON:
...
"surveyResponse": "[{\"selection\": [\"a1\"],\"participant\":\"A\"},{\"selection\": [\"b1\",\"b2\"],\"participant\":\"B\"}]"
...
When trying to use fromJson on the array arrParticipants, the result is:
{
"code": 500,
"text": "Internal Error",
"message": "Error while executing JavaScript computed expression",
"type": "text",
"data": "com.ibm.xsp.exception.EvaluationExceptionEx: Error while executing JavaScript computed expression at com.ibm.xsp.binding.javascript.JavaScriptValueBinding.getValue(JavaScriptValueBinding.java:132) at com.ibm.xsp.extlib.component.rest.DominoViewColumn.getValue(DominoViewColumn.java:93) at com.ibm.xsp.extlib.component.rest.DominoViewColumn.evaluate(DominoViewColumn.java:133) at com.ibm.domino.services.content.JsonViewEntryCollectionContent.writeColumns(JsonViewEntryCollectionContent.java:213) at com.ibm.domino.services.content.JsonViewEntryCollectionContent.writeEntryAsJson(JsonViewEntryCollectionContent.java:191) at com.ibm.domino.services.content.JsonViewEntryCollectionContent.writeViewEntryCollection(JsonViewEntryCollectionContent.java:170) at com.ibm.domino.services.rest.das.view.RestViewJsonService.renderServiceJSONGet(RestViewJsonService.java:394) at com.ibm.domino.services.rest.das.view.RestViewJsonService.renderService(RestViewJsonService.java:112) at com.ibm.domino.services.HttpServiceEngine.processRequest(HttpServiceEngine.java:167) at com.ibm.xsp.extlib.component.rest.UIBaseRestService._processAjaxRequest(UIBaseRestService.java:242) at com.ibm.xsp.extlib.component.rest.UIBaseRestService.processAjaxRequest(UIBaseRestService.java:219) at com.ibm.xsp.util.AjaxUtilEx.renderAjaxPartialLifecycle(AjaxUtilEx.java:206) at com.ibm.xsp.webapp.FacesServletEx.renderAjaxPartial(FacesServletEx.java:225) at com.ibm.xsp.webapp.FacesServletEx.serviceView(FacesServletEx.java:170) at com.ibm.xsp.webapp.FacesServlet.service(FacesServlet.java:160) at com.ibm.xsp.webapp.FacesServletEx.service(FacesServletEx.java:138) at com.ibm.xsp.webapp.DesignerFacesServlet.service(DesignerFacesServlet.java:103) at com.ibm.designer.runtime.domino.adapter.ComponentModule.invokeServlet(ComponentModule.java:576) at com.ibm.domino.xsp.module.nsf.NSFComponentModule.invokeServlet(NSFComponentModule.java:1281) at com.ibm.designer.runtime.domino.adapter.ComponentModule$AdapterInvoker.invokeServlet(ComponentModule.java:847) at com.ibm.designer.runtime.domino.adapter.ComponentModule$ServletInvoker.doService(ComponentModule.java:796) at com.ibm.designer.runtime.domino.adapter.ComponentModule.doService(ComponentModule.java:565) at com.ibm.domino.xsp.module.nsf.NSFComponentModule.doService(NSFComponentModule.java:1265) at com.ibm.domino.xsp.module.nsf.NSFService.doServiceInternal(NSFService.java:653) at com.ibm.domino.xsp.module.nsf.NSFService.doService(NSFService.java:476) at com.ibm.designer.runtime.domino.adapter.LCDEnvironment.doService(LCDEnvironment.java:341) at com.ibm.designer.runtime.domino.adapter.LCDEnvironment.service(LCDEnvironment.java:297) at com.ibm.domino.xsp.bridge.http.engine.XspCmdManager.service(XspCmdManager.java:272) Caused by: com.ibm.jscript.InterpretException: Script interpreter error, line=7, col=8: Error while converting from a JSON string at com.ibm.jscript.types.FBSGlobalObject$GlobalMethod.call(FBSGlobalObject.java:785) at com.ibm.jscript.types.FBSObject.call(FBSObject.java:161) at com.ibm.jscript.types.FBSGlobalObject$GlobalMethod.call(FBSGlobalObject.java:219) at com.ibm.jscript.ASTTree.ASTCall.interpret(ASTCall.java:175) at com.ibm.jscript.ASTTree.ASTReturn.interpret(ASTReturn.java:49) at com.ibm.jscript.ASTTree.ASTProgram.interpret(ASTProgram.java:119) at com.ibm.jscript.ASTTree.ASTProgram.interpretEx(ASTProgram.java:139) at com.ibm.jscript.JSExpression._interpretExpression(JSExpression.java:435) at com.ibm.jscript.JSExpression.access$1(JSExpression.java:424) at 
com.ibm.jscript.JSExpression$2.run(JSExpression.java:414) at java.security.AccessController.doPrivileged(AccessController.java:284) at com.ibm.jscript.JSExpression.interpretExpression(JSExpression.java:410) at com.ibm.jscript.JSExpression.evaluateValue(JSExpression.java:251) at com.ibm.jscript.JSExpression.evaluateValue(JSExpression.java:234) at com.ibm.xsp.javascript.JavaScriptInterpreter.interpret(JavaScriptInterpreter.java:221) at com.ibm.xsp.javascript.JavaScriptInterpreter.interpret(JavaScriptInterpreter.java:193) at com.ibm.xsp.binding.javascript.JavaScriptValueBinding.getValue(JavaScriptValueBinding.java:78) ... 27 more Caused by: com.ibm.commons.util.io.json.JsonException: Error when parsing JSON string at com.ibm.commons.util.io.json.JsonParser.fromJson(JsonParser.java:61) at com.ibm.jscript.types.FBSGlobalObject$GlobalMethod.call(FBSGlobalObject.java:781) ... 43 more Caused by: com.ibm.commons.util.io.json.parser.ParseException: Encountered " "object "" at line 1, column 2. Was expecting one of: "false" ... "null" ... "true" ... ... ... ... "{" ... "[" ... "]" ... "," ... at com.ibm.commons.util.io.json.parser.Json.generateParseException(Json.java:568) at com.ibm.commons.util.io.json.parser.Json.jj_consume_token(Json.java:503) at com.ibm.commons.util.io.json.parser.Json.arrayLiteral(Json.java:316) at com.ibm.commons.util.io.json.parser.Json.parseJson(Json.java:387) at com.ibm.commons.util.io.json.JsonParser.fromJson(JsonParser.java:59) ... 44 more "
}
Is there any way to get the desired answer?
Well, the best way to achieve the desired result is to use xe:customRestService if you need to return a nested JSON object.
All the other xe:***RestService elements assume that you return a flat JSON construct of parameter/value pairs, where the value is a simple data type (like a boolean, number or string and, oddly enough, arrays) but not a complex data type (like an object).
This means that this result here
...
"surveyResponse": [
{ "participant": "A",
"selection": [ "a1" ]
},
{ "participant": "B",
"selection": [ "b1", "b2" ]
}
]
...
will only be available using xe:customRestService, where you can define the JSON result yourself.
Using the other services, the results are limited to constructions like these:
...
"surveyResponse": true;
...
or
...
"surveyResponse": [
"A",
"B"
]
...
Can't you use the built-in server-side JavaScript function toJson?
You could try intercepting the AJAX call when reading the JSON and then manually de-sanitising the JSON string data.
There are more details here:
http://www.browniesblog.com/A55CBC/blog.nsf/dx/15112012082949PMMBRD68.htm
Personally, I'd recommend against this unless you are absolutely sure the end user can't inject code into the JSON data.