Druid: Cached lookup fails with "Null or Empty Dimension found" at ingestion time

I am setting up a new Kafka ingestion stream into Druid. This works fine, but now I need to do a lookup at ingestion time. I am running apache-druid-0.13.0-incubating.
I have created a registered lookup function which finds an optional referral_id by a promo_id. When I do a lookup introspection for this function, it works fine.
Now I want to store the result of this lookup (referral_id) in my datasource; the lookup cannot be done at query time.
In my dimensionsSpec I have defined the dimension as listed below.
{
  "type": "extraction",
  "dimension": "promo_id",
  "outputName": "referral_id",
  "outputType": "long",
  "replaceMissingValueWith": "0",
  "extractionFn": {
    "type": "registeredLookup",
    "lookup": "referral_promoter"
  }
}
It does not work. I see this error in the logs:
WARN [KafkaSupervisor-TEST6] org.apache.druid.data.input.impl.DimensionSchema - Null or Empty Dimension found
"2019-01-22T10:08:39,784 ERROR [KafkaSupervisor-TEST6] org.apache.druid.indexing.kafka.supervisor.KafkaSupervisor - KafkaSupervisor[TEST6] failed to handle notice: {class=org.apache.druid.indexing.kafka.supervisor.KafkaSupervisor, exceptionType=class java.lang.IllegalArgumentException, exceptionMessage=Instantiation of [simple type, class org.apache.druid.data.input.impl.DimensionsSpec] value failed: null (through reference chain: org.apache.druid.data.input.impl.StringInputRowParser["parseSpec"]->org.apache.druid.data.input.impl.JSONParseSpec["dimensionsSpec"]), noticeClass=RunNotice}
" java.lang.IllegalArgumentException: Instantiation of [simple type, class org.apache.druid.data.input.impl.DimensionsSpec] value failed: null (through reference chain: org.apache.druid.data.input.impl.StringInputRowParser["parseSpec"]->org.apache.druid.data.input.impl.JSONParseSpec["dimensionsSpec"])
...
Caused by: java.lang.NullPointerException
I have no clue how to solve this. What am I doing wrong? I have tried various dimension configurations, but none worked.
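For what it's worth, one way to get the referral_id into the datasource at ingestion time would be to declare it as a plain long dimension and fill it with a transformSpec instead of an extraction dimension (as far as I know, dimensionsSpec entries only accept the plain string/long/float/double schema types, which is why the spec fails to instantiate). This is an untested sketch; it assumes transformSpec and the lookup() expression function are available in your release, and the surrounding supervisor spec is abbreviated:
"transformSpec": {
  "transforms": [
    {
      "type": "expression",
      "name": "referral_id",
      "expression": "nvl(lookup(promo_id, 'referral_promoter'), '0')"
    }
  ]
},
"dimensionsSpec": {
  "dimensions": [
    { "type": "long", "name": "referral_id" }
  ]
}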

Related

Actions Builder webhookResponse: unexpected internal error with List Response

I tried to add a List Response from my webhook and always receive an error such as:
Unexpected internal error id=c57c97b2-0b6f-492b-88a3-3867cf2e7203
(The id changes each time.)
After comparing the expected JSON webhookResponse from the docs with the generated response from the Actions SDK, I found a difference in the typeOverrides object:
JSON from Docs
"typeOverrides": [
{
"name": "prompt_option",
"synonym": {
"entries": []
},
"typeOverrideMode": "TYPE_REPLACE"
}
]
Generated JSON Response from Actions SDK
"typeOverrides": [
{
"name": "prompt_option",
"synonym": {
"entries": []
},
"mode": "TYPE_REPLACE"
}
]
There seems to be an error in the example documentation, but the reference docs say that it should be mode. I've tested it both ways, and that isn't causing the error.
The likely problem is that if you're replying with a List, you must do two things:
You need a Slot in the Scene that will accept the Type that you specify in the typeOverride.name. (And remember - you're updating the Type, not the name of the Slot.)
In the prompt for this slot, you must call the webhook that generates the list. (It has to be that slot's prompt. You can't request it in On Enter, for example.)

Avro Generic Record not taking aliases into account

I have some JSON data (fasterxml.jackson objects) and I want to convert it into a generic Avro record. Since I don't know beforehand what data I will be getting, only that there is an Avro schema available in a schema repository, I can't have predefined classes, hence the generic record.
When I pretty-print my schema, I can see my keys/values and their aliases. However, the GenericRecord "put" method does not seem to know these aliases.
I get the following exception:
Exception in thread "main" org.apache.avro.AvroRuntimeException: Not a valid schema field: device/id
Is this by design? How can I make this schema look at the aliases as well?
schema extract:
"fields" : [ {
"name" : "device_id",
"type" : "long",
"doc" : " The id of the device.",
"aliases" : [ "deviceid", "device/id" ]
}, {
............
}]
code:
import com.fasterxml.jackson.databind.JsonNode
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}

def jsonToAvro(jSONObject: JsonNode, schema: Schema): GenericRecord = {
  val converter = new JsonAvroConverter // from the json2avro converter library; not used below
  println(jSONObject.toString) // correct
  println(schema.toString(true)) // correct
  println(schema.getField("device_id")) // correct
  println(schema.getField("device_id").aliases().toString) // correct
  val avroRecord = new GenericData.Record(schema)
  val iter = jSONObject.fields() // java.util.Iterator[java.util.Map.Entry[String, JsonNode]]
  while (iter.hasNext) {
    val entry = iter.next()
    println(s"adding ${entry.getKey} and ${entry.getValue} with ${entry.getValue.getClass.getName}") // adding device/id and 8711 with com.fasterxml.jackson.databind.node.IntNode
    avroRecord.put(entry.getKey, entry.getValue) // throws: Not a valid schema field: device/id
  }
  avroRecord
}
I tried Avro 1.8.2, and it still throws this exception when I read a JSON string into a GenericRecord:
org.apache.avro.AvroTypeException: Expected field name not found:
But I saw a sample that used aliases correctly two years ago:
https://www.waitingforcode.com/apache-avro/serialization-and-deserialization-with-schemas-in-apache-avro/read
So I guess Avro changed that behaviour recently.
It seems like the schema is only this flexible when reading; writing Avro only looks at the current field name.
On top of that, I'm using "/" in my (JSON) field names, which is not supported as an Avro field name.
Schema validation does not complain when it appears in an alias, so that might work (I haven't tested this).
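If you need writes to accept the aliased names, one workaround is to resolve each incoming key to the canonical field name yourself before calling put(). This is a minimal sketch; resolveFieldName is a hypothetical helper of mine, not part of the Avro API:
import org.apache.avro.Schema

// Map an incoming JSON key (e.g. "device/id") to the canonical field name ("device_id")
// by checking each field's name and its declared aliases.
def resolveFieldName(schema: Schema, key: String): Option[String] = {
  val it = schema.getFields.iterator()
  while (it.hasNext) {
    val field = it.next()
    if (field.name == key || field.aliases().contains(key)) return Some(field.name)
  }
  None
}

Used in the loop above, the put call would then become something like avroRecord.put(resolveFieldName(schema, entry.getKey).getOrElse(entry.getKey), entry.getValue).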

Nested REST resource throws constraint violation in JHipster

I have a nested resource like this:
@GetMapping("/tour-requests/{tourRequestId}/tour-request-messages")
@Secured({AuthoritiesConstants.ADMIN})
public ResponseEntity<List<TourRequestMessageDTO>> getTourRequestMessagesForTourRequest(
        @PathVariable("tourRequestId") long tourRequestId,
        TourRequestMessageCriteria criteria) {
    ...
}
When I call this resource, for example with GET api/tour-requests/1301/tour-request-messages, I get an unexpected error:
{
  "type": "https://zalando.github.io/problem/constraint-violation",
  "title": "Constraint Violation",
  "status": 400,
  "path": "/api/tour-requests/1301/tour-request-messages",
  "violations": [
    {
      "field": "tourRequestId",
      "message": "Failed to convert property value of type 'java.lang.String' to required type 'io.github.jhipster.service.filter.LongFilter' for property 'tourRequestId'; nested exception is java.lang.IllegalStateException: Cannot convert value of type 'java.lang.String' to required type 'io.github.jhipster.service.filter.LongFilter' for property 'tourRequestId': no matching editors or conversion strategy found"
    }
  ],
  "message": "error.validation"
}
I tried to debug this; it seems that the exception happens before the method is even called.
The problem is that the search criteria has hijacked the path parameter tourRequestId, since it happens to also be a possible search parameter of the generated QueryService.
That is why it tried to convert the tourRequestId parameter to a LongFilter.
Renaming the path variable to `id` did not help either, but after renaming it to something different, the problem disappeared.
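A minimal sketch of that workaround (requestId is just an illustrative name; anything that is not also a field of the generated criteria class avoids the collision):
@GetMapping("/tour-requests/{requestId}/tour-request-messages")
@Secured({AuthoritiesConstants.ADMIN})
public ResponseEntity<List<TourRequestMessageDTO>> getTourRequestMessagesForTourRequest(
        @PathVariable("requestId") long tourRequestId,
        TourRequestMessageCriteria criteria) {
    // "requestId" is not a field of TourRequestMessageCriteria, so it is no longer
    // bound into a LongFilter and the path variable converts to long as expected
    ...
}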
I've also hit this problem, but my choice was to remove the field from the child criteria class (pt.up.hs.project.service.dto.ChildCriteria in my case). When the resource is always nested, it just does not make sense to allow querying by a field that is already specified in the path.

Spark BigQuery Connector: Writing ARRAY type causes exception: "Invalid value for: ARRAY is not a valid value"

I am running a Spark job in Google Cloud Dataproc, using the BigQuery Connector to load the JSON data output from the job into a BigQuery table.
BigQuery Standard-SQL data types documentation states the ARRAY type is supported.
My Scala code is:
val outputDatasetId = "mydataset"
val tableSchema = "[" +
  "{'name': '_id', 'type': 'STRING'}," +
  "{'name': 'array1', 'type': 'ARRAY'}," +
  "{'name': 'array2', 'type': 'ARRAY'}," +
  "{'name': 'number1', 'type': 'FLOAT'}" +
  "]"

// Output configuration
BigQueryConfiguration.configureBigQueryOutput(
  conf, projectId, outputDatasetId, "outputTable", tableSchema)

// Write visits to BigQuery
jsonData.saveAsNewAPIHadoopDataset(conf)
But the job throws this exception:
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Invalid value for: ARRAY is not a valid value",
    "reason" : "invalid"
  } ],
  "message" : "Invalid value for: ARRAY is not a valid value"
}
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:432)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:287)
at com.google.cloud.hadoop.io.bigquery.BigQueryRecordWriter.close(BigQueryRecordWriter.java:358)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$5.apply$mcV$sp(PairRDDFunctions.scala:1124)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1366)
... 8 more
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException:
400 Bad Request
Is this a Legacy vs. Standard SQL issue? Or is ARRAY type not supported by the BigQuery Connector for Spark?
Instead of using type=ARRAY, try setting the type as you normally would but also set the key mode=REPEATED.
An array of strings for instance would be defined as:
{'name': 'field1', 'type': 'STRING', 'mode': 'REPEATED'}
Are these arrays of strings? Integers? I believe that using this API, you need to set the type to the element type, e.g. STRING or INT64, but use a mode of REPEATED. The BigQuery APIs have not yet been fully updated to use standard SQL types everywhere, so you need to use the legacy convention of type + mode instead.
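Applied to the schema above, a sketch of the corrected definition might look like this (assuming array1 and array2 hold strings; substitute the element type your data actually contains):
val tableSchema = "[" +
  "{'name': '_id', 'type': 'STRING'}," +
  "{'name': 'array1', 'type': 'STRING', 'mode': 'REPEATED'}," +
  "{'name': 'array2', 'type': 'STRING', 'mode': 'REPEATED'}," +
  "{'name': 'number1', 'type': 'FLOAT'}" +
  "]"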

Issue with mongodb-morphia in a Grails application to store a Map correctly in the database

I'm using the mongodb-morphia plugin (0.7.8) in a Grails (2.0.3) application and I am running into an issue with the Map type.
I want to store a map of type Map<String,?> in my database, but when I put this in my Groovy file:
class ServiceInfo {
    String name
    Map<String,?> args
    Date dateCreated // autoset by plugin
    Date lastUpdated // autoset by plugin

    static constraints = {
        name nullable: false
    }
}
I obtain the following error:
2012-04-29 14:39:43,876 [pool-2-thread-3] ERROR MongodbMorphiaGrailsPlugin - Error processing mongodb domain Artefact > fr.unice.i3s.modalis.yourcast.provider.groovy.ServiceInfo: Unknown type... pretty bad... call for help, wave your hands... yeah!
I tried just to specify Map in my file:
Map args
In that case I obtain the following simple warning:
INFO: MapKeyDifferentFromString complained about fr.unice.i3s.modalis.yourcast.provider.groovy.ServiceInfo.args : Maps cannot be keyed by Object (Map); Use a parametrized type that is supported (Map)
and when I try to save an object, the attribute args is simply omitted in the database.
For information, my objects are created like this:
def icalReader= new ServiceInfo(name:"IcalReader", args:['uri':DEFAULT_URL, 'path':"fr.unice.i3s.modalis.yourcast.sources.calendar.icalreader/"])
icalReader.save()
Finally, if I just say that args is a List:
List args
and change my objects to take a List containing only one Map, I just get a warning:
ATTENTION: The multi-valued field 'fr.unice.i3s.modalis.yourcast.provider.groovy.ServiceInfo.args' is a possible heterogenous collection. It cannot be verified. Please declare a valid type to get rid of this warning. null
but everything works fine and my map is correctly stored in the database:
{ "_id" : ObjectId("4f9be39f0364bf4002cd48ad"), "name" : "IcalReader", "args" : [ { "path" : "fr.unice.i3s.modalis.yourcast.sources.calendar.icalreader/", "uri" : "http://localhost:8080/" } ], "dateCreated" : ISODate("2012-04-28T12:33:35.838Z"), "lastUpdated" : ISODate("2012-04-28T12:33:35.838Z") }
So is there something I forgot when defining my map?
My service does work, but I don't like hacks like "encapsulate a map in a list to serialize it" ;)
I don't know about Map, but could you use an embedded data model instead?
class ServiceInfo {
    ...
    @Embedded
    MyArgs args
    ...
}

class MyArgs {
    String key
    String value
}
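Extending that idea, the whole map could be carried as a list of embedded key/value pairs. This is only a sketch of mine, not the plugin's API: List<MyArgs> and the toArgs helper are my additions, and I haven't verified them against mongodb-morphia 0.7.8:
class ServiceInfo {
    String name
    @Embedded
    List<MyArgs> args = []
}

// Illustrative helper: flatten a Groovy map into the embedded key/value list
List<MyArgs> toArgs(Map<String, ?> m) {
    m.collect { k, v -> new MyArgs(key: k, value: v?.toString()) }
}

def icalReader = new ServiceInfo(name: "IcalReader",
        args: toArgs(['uri': DEFAULT_URL, 'path': "fr.unice.i3s.modalis.yourcast.sources.calendar.icalreader/"]))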