Intermittent problems with querying data from Druid using SQL

I query data from Druid via SQL. Sometimes it succeeds, but sometimes it fails. My query uses curl; it is:
curl --negotiate -u:srvadmin -X POST -H'Content-Type: application/json' http://du-s12-idc:8082/druid/v2/sql -d @query.json
When it fails, I get this response:
{"error":"Unknown exception","errorMessage":"Failure getting results for query[6639c357-441f-456c-9a01-0f7ffd0758b7] url[http://du-s28-idc:8083/druid/v2/] because of [Invalid type marker byte 0x3c for expected value token\n at [Source: (SequenceInputStream); line: -1, column:
1]]","errorClass":"io.druid.java.util.common.RE","host":null}
The file query.json is simple:
{"query":"select * from bds_dsp_media_run_info_h_1016 limit 3"}
The data was loaded from Hadoop into Druid successfully. My Druid version is 0.11, running in a cluster secured with Kerberos.
Has anyone else hit this problem?

I think the Invalid type marker byte 0x3c... exception is just an uninformative response telling you that the server had an internal error, without giving you a clue about what is actually happening. It would help a lot if you could check the broker logs while the request is happening.
But, to play a guessing game, I would expect it to be a Kerberos issue. Do you have the KRB5_CLIENT_KTNAME env variable populated with the path to your keytab file?
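For what it's worth, byte 0x3c is the character '<', which suggests the broker received an HTML/XML page (for example an authentication error page) from the data node at du-s28-idc:8083 instead of the expected result stream, so a failed Kerberos handshake would fit. A minimal sketch of checking the client-side Kerberos setup before retrying; the keytab path and principal are assumptions, not taken from the question:
# point MIT Kerberos at the keytab (hypothetical path)
export KRB5_CLIENT_KTNAME=/etc/security/keytabs/srvadmin.keytab
# or obtain and verify a ticket explicitly
kinit -kt /etc/security/keytabs/srvadmin.keytab srvadmin@EXAMPLE.COM
klist
# then retry the query from the question
curl --negotiate -u:srvadmin -X POST -H'Content-Type: application/json' http://du-s12-idc:8082/druid/v2/sql -d @query.json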

Related

How to resolve "Invalid Sequence Token" when using cloudwatch agent?

I'm seeing the following warning in the /var/log/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log:
2021-10-06T06:39:23Z W! [outputs.cloudwatchlogs] Invalid SequenceToken used, will use new token and retry: The given sequenceToken is invalid. The next expected sequenceToken is: 49619410836690261519535138406911035003981074860446093650
But there is no mention of which file is actually the one failing, not even when I add "debug": true to /opt/aws/amazon-cloudwatch-agent/bin/config.json.
cat /opt/aws/amazon-cloudwatch-agent/bin/config.json|jq .agent
{
"metrics_collection_interval": 60,
"debug": true,
"run_as_user": "root"
}
I have many (28) files in my .logs.logs_collected.files.collect_list section of the config.json file, so how can I find exactly which file is causing trouble?
As of 2021-11-29, a PR to improve the log messages has been merged into the cloudwatch-agent, but a new version of the agent has not been released yet; the next version after v1.247349.0 will likely include this fix.
The fix will change the log statements to say:
INFO: First time sending logs to %v/%v since startup so sequenceToken is nil, learned new token: xxxx: yyyy (this is an INFO message, as this behaviour is expected, for example at startup).
WARN: Invalid SequenceToken used (%v) while sending logs to %v/%v, will use new token and retry: xxxxxv (this, on the other hand, is not expected and may mean that someone else is writing to the log group/log stream concurrently).
If those warnings come right after a restart of the CloudWatch agent (cwagent), you can safely ignore them; it's expected behaviour. The agent does not save the next sequence token in its persistent state, so on restart it "learns" the correct sequence token by issuing a PutLogEvents call with no sequence token at all; that call returns an InvalidSequenceTokenException carrying the next sequence token to use. So it is expected to see those messages at startup (and this is why I proposed the PR to amazon-cloudwatch-agent to improve those log messages).
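To make that startup handshake concrete, here is roughly the same exchange expressed with the AWS CLI; the log group and stream names are hypothetical, and this assumes the CloudWatch Logs API still enforces sequence tokens as it did at the time of the question:
# first attempt without a token on a stream that already has events
aws logs put-log-events \
  --log-group-name /myapp/app --log-stream-name node1 \
  --log-events timestamp=$(date +%s000),message="hello"
# -> InvalidSequenceTokenException: The next expected sequenceToken is: <TOKEN>
# retry with the token learned from the error
aws logs put-log-events \
  --log-group-name /myapp/app --log-stream-name node1 \
  --log-events timestamp=$(date +%s000),message="hello" \
  --sequence-token <TOKEN>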
If the "Invalid SequenceToken used" is seen long after the restart then you may have other issues.
The "Invalid SequenceToken used" error usually means that two entities/sources are trying to write to the same log group/log stream as mentioned in 2 (which is really for the old awslogs agent but still useful):
Caught exception: An error occurred (InvalidSequenceTokenException)
when calling the PutLogEvents operation: The given sequenceToken is
invalid[…] -or- Multiple agents might be sending log events to log
stream[…] – You can't push logs from multiple log files to a single
log stream. Update your configuration to push each log to a log
stream-log group combination.
It could be that the Amazon CloudWatch agent itself is trying to upload the same file twice because you have duplicates in your config.json.
So first print all your log group / log stream pairs in your config.json with:
cat /opt/aws/amazon-cloudwatch-agent/bin/config.json|jq -r '.logs.logs_collected.files.collect_list[]|"\(.log_group_name) \(.log_stream_name)"'|sort
which should give an output similar to:
/tableauserver/apigateway apigateway_node5-0.log
/tableauserver/apigateway control_apigateway_node5-0.log
/tableauserver/appzookeeper appzookeeper-discovery_node5-1.log
...
/tableauserver/vizqlserver vizqlserver_node5-3.log
Then you can use uniq -d to find the duplicates in that list with:
cat /opt/aws/amazon-cloudwatch-agent/bin/config.json|jq -r '.logs.logs_collected.files.collect_list[]|"\(.log_group_name) \(.log_stream_name)"'|sort|uniq -d
# The list should be empty otherwise you have duplicates
If that command produces any output it means that you have duplicates in your config.json collect_list.
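For illustration, a duplicate would look roughly like this: two collect_list entries (hypothetical paths and names) that point at different files but resolve to the same log group/stream pair, so both uploaders end up racing for that stream's sequence token.
"collect_list": [
  {
    "file_path": "/var/log/myapp/app.log",
    "log_group_name": "/myapp/app",
    "log_stream_name": "node1"
  },
  {
    "file_path": "/var/log/myapp/app.rotated.log",
    "log_group_name": "/myapp/app",
    "log_stream_name": "node1"
  }
]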
I personally think that cwagent itself should print the "offending" log group/log stream in its logs, so I opened an issue on the amazon-cloudwatch-agent GitHub page.

Kafka connector config error: filter.condition: Invalid json path defined

I'm trying to use Confluent's Filter SMT with the Debezium unwrap-smt example.
I added the following configs to the source connector (Debezium MySQL) config:
"transforms": "route,csFilter",
...
...
"transforms.csFilter.type": "io.confluent.connect.transforms.Filter$Value",
"transforms.csFilter.filter.condition": "$.payload.after.source == 2",
"transforms.csFilter.filter.type": "exclude",
"transforms.csFilter.missing.or.null.behavior": "fail"
Since this Filter SMT is provided by Confluent, I downloaded the jar file and copied the (connect-transforms, connect-utils, json-path) jar files to the path-to-kafka/connect/debezium-connector-mysql directory.
When I tried to register the Debezium MySQL source connector:
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d @source_connector_config.json
I got this error:
{"error_code":400,
"message":"Connector configuration is invalid and contains the following 1 error(s):\n
Invalid value $.payload.after.source == 2 for configuration filter.condition: Invalid json path defined.
Please refer to https://github.com/json-path/JsonPath README for correct use of json path.\n
You can also find the above list of errors at the endpoint `/{connectorType}/config/validate`"}
I checked the JSON path expression against the examples provided in this guide. It seemed okay.
Can you please point me in the right direction? What am I missing?
Thanks.
Please try to use this condition instead: $.payload.after[?(@.source == 2)]
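For clarity, here is how the transform block from the question would look with the corrected expression; only filter.condition changes, everything else stays as posted:
"transforms": "route,csFilter",
"transforms.csFilter.type": "io.confluent.connect.transforms.Filter$Value",
"transforms.csFilter.filter.condition": "$.payload.after[?(@.source == 2)]",
"transforms.csFilter.filter.type": "exclude",
"transforms.csFilter.missing.or.null.behavior": "fail"
With filter.type set to exclude, records whose payload.after.source equals 2 are dropped, which seems to be the intent behind the original $.payload.after.source == 2.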

Spark Session returned an error : Apache NiFi

We are trying to run a Spark program using NiFi. This is the basic sample we tried to follow.
We have configured the Apache Livy server at 127.0.0.1:8998.
The ExecuteSparkInteractive processor is used to run the sample Spark code.
val gdpDF = spark.read.json("gdp.json")
val gdpRDD = gdpDF.rdd
gdpRDD.count()
The LivyController is configured for 127.0.0.1, port 8998, and Session Type: spark.
When we run the processor we get the following error:
Spark Session returned an error, sending the output JSON object as the flow file content to failure (after penalizing)
We just want to output the line count from the JSON file. How can we redirect it to the flow file?
NiFi user log:
2020-04-13 21:50:49,955 INFO [NiFi Web Server-85] org.apache.nifi.web.filter.RequestLogger Attempting request for (anonymous) GET http://localhost:9090/nifi-api/flow/controller/bulletins (source ip: 127.0.0.1)
NiFi app.log:
ERROR [Timer-Driven Process Thread-3] o.a.n.p.livy.ExecuteSparkInteractive ExecuteSparkInteractive[id=9a338053-0173-1000-fbe9-e613558ad33b] Spark Session returned an error, sending the output JSON object as the flow file content to failure (after penalizing)
I have seen several people struggling with this example. I recommend following this example from the Cloudera Community (especially note part 2).
https://community.cloudera.com/t5/Community-Articles/HDF-3-1-Executing-Apache-Spark-via-ExecuteSparkInteractive/ta-p/247772
The key points I would be concerned with (a quick way to test Livy on its own is sketched below):
Does your Spark work in general?
Does your Livy work in general?
Is the Spark sample code good?
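As a rough sketch of testing Livy independently of NiFi, assuming Livy is reachable at 127.0.0.1:8998 and gdp.json is readable by the Spark session (the session and statement ids below are illustrative):
# 1. create an interactive Spark session
curl -s -X POST -H 'Content-Type: application/json' -d '{"kind":"spark"}' http://127.0.0.1:8998/sessions
# 2. once the session (say id 0) reports state "idle", submit a statement
curl -s -X POST -H 'Content-Type: application/json' \
  -d '{"code":"spark.read.json(\"gdp.json\").count()"}' \
  http://127.0.0.1:8998/sessions/0/statements
# 3. poll the statement for its output
curl -s http://127.0.0.1:8998/sessions/0/statements/0
If this works end to end, the remaining problem is most likely in the NiFi/Livy controller configuration rather than in Spark or Livy themselves.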

Custom Spring Cloud application - Kafka embedded header issue

I am trying to build a custom transformer application using the guidelines provided here
https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#streams-dev-guide
I have started Kafka on my Windows machine.
I have an HTTP source running on the Windows machine; it writes to the destination transformData.
Command: java -Dserver.port=8123 -Dhttp.path-pattern=/data -Dspring.cloud.stream.bindings.output.destination=transformData -jar http-source-kafka-10-1.3.1.RELEASE.jar
I have a transform application running that reads input from transformData and outputs to the destination transformedData.
Command:
java -Dserver.port=8090 -Dspring.cloud.stream.bindings.input.destination=transformData -Dspring.cloud.stream.bindings.output.destination=transformedData -jar transformer-0.0.1-SNAPSHOT.jar
I have a log sink running that reads from the destination transformedData.
Command:
java -Dserver.port=8888 -Dspring.cloud.stream.bindings.input.destination=transformedData -jar log-sink-kafka-10-1.3.1.RELEASE.jar
Problem:
When I try to send this curl request:
curl -H "Content-Type: application/json" -X POST -d '{"id":"1", "temp":"400"}' http://172.20.24.47:8123/data
On the custom Transformer console I see errors:
Caused by: com.fasterxml.jackson.core.JsonParseException: Unrecognized token '▒': was expecting ('true', 'false' or 'null') at [Source: (byte[])"? contentType "text/plain"originalContentType "application/json;charset=UTF-8"{"id":"1", "temp":"400"}"; line: 1, column: 4]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
~[jackson-core-2.9.6.jar!/:2.9.6]
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:679)
~[jackson-core-2.9.6.jar!/:2.9.6]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3526)
~[jackson-core-2.9.6.jar!/:2.9.6]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2621)
~[jackson-core-2.9.6.jar!/:2.9.6]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:826)
~[jackson-core-2.9.6.jar!/:2.9.6]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:723)
~[jackson-core-2.9.6.jar!/:2.9.6]
at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4141)
~[jackson-databind-2.9.6.jar!/:2.9.6]
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4000)
~[jackson-databind-2.9.6.jar!/:2.9.6]
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3121)
~[jackson-databind-2.9.6.jar!/:2.9.6]
at org.springframework.cloud.stream.converter.ApplicationJsonMessageMarshallingConverter.convertParameterizedType(ApplicationJsonMessageMarshallingConverter.java:114)
~[spring-cloud-stream-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
... 37 common frames omitted
Can anyone help?
I finally got this to work. When building the custom application with the Spring Initializr, instead of selecting the 2.0.4 release as the starter I reverted to the 1.5.15 release. Now I no longer need to set headerMode to embeddedHeaders on the subscriber side, that is, in the custom app and the logger sink app.
It appears that you are using Spring Cloud Stream 2.0.0.RELEASE, but your other apps are 1.3.x. Can you set spring.cloud.stream.bindings.input.consumer.headerMode to embeddedHeaders in the processor app where it is failing? In 2.0, the default is not to embed headers, as Kafka supports headers out of the box. However, in versions prior to 2.0 (which is what the 1.3.x apps use), the default is to embed the headers. You need to set this explicitly when using that combination.
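For example, a sketch of the transformer command from the question with that property added; the property name follows Spring Cloud Stream 2.0, everything else is unchanged from the question:
java -Dserver.port=8090 \
  -Dspring.cloud.stream.bindings.input.consumer.headerMode=embeddedHeaders \
  -Dspring.cloud.stream.bindings.input.destination=transformData \
  -Dspring.cloud.stream.bindings.output.destination=transformedData \
  -jar transformer-0.0.1-SNAPSHOT.jar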

Global Context Broker Federation

Until now, I've been working with a Context Broker instance in stand-alone mode, created in FI-LAB/Cloud using the psb-orion-image. Now I would like to federate that CB instance with http://orion.lab.fi-ware.org. I use XML forms to create, update, etc., and the name of the instance to federate is "UPCT:TEMPERATURE:SENSOR", whose sensor type is "UPCT:SENSOR".
So, connecting by SSH, I send the following request:
(curl localhost:1026/NGSI10/subscribeContext -s -S --header 'Content-Type: application/xml' -d @- | xmllint --format -) <<EOF
<?xml version="1.0"?>
<subscribeContextRequest>
  <entityIdList>
    <entityId type="UPCT:SENSOR" isPattern="false">
      <id>UPCT:TEMPERATURE:SENSOR</id>
    </entityId>
  </entityIdList>
  <reference>http://orion.lab.fi-ware.eu:1026/ngsi10/notifyContext</reference>
  <duration>P1M</duration>
  <notifyConditions>
    <notifyCondition>
      <type>ONCHANGE</type>
      <condValueList>
        <condValue>temperature</condValue>
      </condValueList>
    </notifyCondition>
  </notifyConditions>
  <throttling>PT5S</throttling>
</subscribeContextRequest>
EOF
And I get a correct reply, with a subscription ID. However, if I update the contextValue of my instance and try to send a query to http://orion.lab.fi-ware.eu:1026, I receive an error:
-:1: parser error : Start tag expected, '<' not found
Auth-token not found in request header
^
I think I should get the same value that I updated in my instance, as indicated in
https://forge.fi-ware.org/plugins/mediawiki/wiki/fiware/index.php/Publish/Subscribe_Broker_-_Orion_Context_Broker_-_User_and_Programmers_Guide#Context_Broker_Federation
I need to know what's wrong and how I could federate with the global CB.
Thank you
That error message is due to the fact that any request sent to the Orion instance at orion.lab.fi-ware.org (including the notifications sent by other Orion instances) has to use authentication. At the present moment (i.e. version 0.14.1), Orion doesn't include the X-Auth-Token header needed for authentication (see the quick start for programmers) in the notifications it sends.
However, the usual use case is federating Orion at orion.lab.fi-ware.org with private user Orion instances (i.e. orion.lab.fi-ware.org -> your Orion), not the opposite way (i.e. your Orion -> orion.lab.fi-ware.org) as you are trying. This is because typically what you want to do is merge public information (e.g. Santander city sensors) with your private information (e.g. the information produced by your own sensors). Federating in the orion.lab.fi-ware.org -> your Orion direction should work perfectly.
EDIT: the limitation in Orion 0.14.1 has since been overcome, and the current Orion version (2.0.0) propagates the X-Auth-Token header in notifications (I don't remember in which exact version between 0.14.1 and 2.0.0 this changed, sorry...).
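In the working direction, the subscription is created on the public instance and the reference points back at your own broker. A rough sketch follows; the X-Auth-Token value, your broker's public address, and the reuse of the same entity id are all assumptions for illustration, and it presumes your private broker does not itself require authentication:
# TOKEN is the OAuth token obtained for your FI-LAB account (see the quick start for programmers)
(curl orion.lab.fi-ware.org:1026/NGSI10/subscribeContext -s -S \
  --header 'Content-Type: application/xml' \
  --header "X-Auth-Token: $TOKEN" -d @- | xmllint --format -) <<EOF
<?xml version="1.0"?>
<subscribeContextRequest>
  <entityIdList>
    <entityId type="UPCT:SENSOR" isPattern="false">
      <id>UPCT:TEMPERATURE:SENSOR</id>
    </entityId>
  </entityIdList>
  <reference>http://YOUR_PUBLIC_IP:1026/ngsi10/notifyContext</reference>
  <duration>P1M</duration>
  <notifyConditions>
    <notifyCondition>
      <type>ONCHANGE</type>
      <condValueList>
        <condValue>temperature</condValue>
      </condValueList>
    </notifyCondition>
  </notifyConditions>
  <throttling>PT5S</throttling>
</subscribeContextRequest>
EOF
With this subscription in place, updates received by the public broker for that entity are pushed to your Orion's notifyContext endpoint.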