Publishing an AVRO messages to Kafka via Kafka REST - apache-kafka

We have deployed Confluent Platform 6.0 in our Kubernetes cluster. I have created a Kafka topic "test-topic-1" via Kafka REST api. Now I'm trying to publish a simple AVRO message to this topic.
curl --location --request POST 'https://kafka-rest-master.k8s.hip.com.au/topics/test-topic-1' \
--header 'Content-Type: application/vnd.kafka.avro.v2+json' \
--header 'Accept: application/vnd.kafka.v2+json' \
--data-raw '{"value_schema":{"type":"record","name":"User","fields":[{"name":"name","type":"string"}]},"records":[{"value":{"name":"testUser"}}]}'
I get a 500 error response for this request,
{"error_code":500,"message":"Internal Server Error"}
When I check the logs of the kafka rest pod, I can see the following error,
ERROR Request Failed with exception
(io.confluent.rest.exceptions.DebuggableExceptionMapper)
com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot
deserialize instance of java.lang.String out of START_OBJECT token
at [Source:
(org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream);
line: 1, column: 17] (through ref erence chain:
io.confluent.kafkarest.entities.v2.SchemaTopicProduceRequest["value_schema"])
at com.fasterxml.jackson.databind.exc.MismatchedInputException.from(MismatchedInputException.java:59)
at com.fasterxml.jackson.databind.DeserializationContext.reportInputMismatch(DeserializationContext.java:1445)
Am I following the correct steps to publish an AVRO message to a newly created Kafka topic? If so what could be the problem here?

your AVRO schema definition is wrong. It should be defined as the following
{
"type": "record",
"name": "recordName",
"namespace": "namespace",
"doc": "description",
"fields": [
{
"name": "key",
"type": {
"type": "string",
"avro.java.string": "String"
}
}
]
}
OR
"fields": [
{
"name": "fiedName",
"type": "string"
},
]

Related

PUT k8s deployment returns 404

According to the Replacement section of Kubernetes API reference v1.24 I should be able to create a deployment with a PUT /apis/apps/v1/namespaces/{namespace}/deployments/{name} HTTP request. The success response here is 201 Created. However, when I try the following, I get a 404 Not Found which is of course correct but unwanted: PUT requests should be treated as Create statements if the resource does not yet exist as documented. Updating a deployment does work (and returns the expected 200 OK HTTP response). Is there any documentation regarding this? Or is the request somehow incorrect? Ty.
➜ ~ curl --request PUT \
--url http://localhost:8080/apis/apps/v1/namespaces/ns/deployments/nginx-deployment \
--header 'content-type: application/json' \
--data '{
"apiVersion":"apps/v1",
"kind":"Deployment",
"metadata":{
"name":"nginx-deployment",
"labels":{
"app":"nginx"
}
},
"spec": {
"replicas" : 3,
"selector": {
"matchLabels" : {
"app":"nginx"
}
},
"template" : {
"metadata" : {
"labels" : {
"app":"nginx"
}
},
"spec":{
"containers":[
{
"name":"ngnix",
"image":"nginx:1.7.9",
"ports":[
{
"containerPort": 80
}
]
}
]
}
}
}
}'
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "deployments.apps \"nginx-deployment\" not found",
"reason": "NotFound",
"details": {
"name": "nginx-deployment",
"group": "apps",
"kind": "deployments"
},
"code": 404
}%
According to the documentation you provided,
PUT /apis/apps/v1/namespaces/{namespace}/deployments/{name}
is meant to "replace the specified Deployment", while a Deployment is created with a POST:
create a Deployment
HTTP Request
POST /apis/apps/v1/namespaces/{namespace}/deployments
You are correct that the documentation also states:
For PUT requests, Kubernetes internally classifies these as either create or update based on the state of the existing object
so there seems to be a contradiction, but the Deployment API spec states that POST should be used to create a deployment and PUT to update it.

MSK S3 sink not working with Field Partitioner

I am using AWS MSK and msk connect. S3 sink connector is not working properly when I added io.confluent.connect.storage.partitioner.FieldPartitioner Note:without fieldPartitioner s3sink had worked. Other than this stack overflow Question Link I was not able to find any resource
Error
ERROR [FieldPart-sink|task-0] Value is not Struct type. (io.confluent.connect.storage.partitioner.FieldPartitioner:81)
Caused by: io.confluent.connect.storage.errors.PartitionException: Error encoding partition.
ERROR [Sink-FieldPartition|task-0] WorkerSinkTask{id=Sink-FieldPartition-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. Error: Error encoding partition. (org.apache.kafka.connect.runtime.WorkerSinkTask:612)
MSK Connect Config
connector.class=io.confluent.connect.s3.S3SinkConnector
format.class=io.confluent.connect.s3.format.avro.AvroFormat
flush.size=1
schema.compatibility=BACKWARD
tasks.max=2
topics=MSKTutorialTopic
storage.class=io.confluent.connect.s3.storage.S3Storage
topics.dir=mskTrials
s3.bucket.name=clickstream
s3.region=us-east-1
partitioner.class=io.confluent.connect.storage.partitioner.FieldPartitioner
partition.field.name=name
value.converter.schemaAutoRegistrationEnabled=true
value.converter.registry.name=datalake-schema-registry
value.convertor.schemaName=MSKTutorialTopic-value
value.converter.avroRecordType=GENERIC_RECORD
value.converter.region=us-east-1
value.converter.schemas.enable=true
value.converter=org.apache.kafka.connect.storage.StringConverter
key.converter=org.apache.kafka.connect.storage.StringConverter
Data Schema which is stored in glue schema registry
{
"namespace": "example.avro",
"type": "record",
"name": "UserData",
"fields": [
{
"name": "name",
"type": "string"
},
{
"name": "favorite_number",
"type": [
"int",
"null"
]
},
{
"name": "favourite_color",
"type": [
"string",
"null"
]
}
]
}
In order to partition by fields, your data needs actual fields.
StringConverter cannot parse data it consumes to add said fields. Use AvroConverter if you have Avro data in the topic. Also, Avro always has a schema, so remove the schemas.enable configuration

Getting the latest execution for a job via the Rundeck API

I'm using the latest version of Rundeck (3.3.10) and I'm having trouble getting the latest execution for a job via the Rest API.
If I call api/38/job//executions?max=1 it doesn't seem to bring back the latest execution if it is still running. Ideally, I'd also like to be able get the latest execution Start Time, End Time, User and result for each job in a single API call, but I'd resigned myself to calling the API once per job. There doesn't seem to be any way to sort the executions you get back from the API - they seem to be sorted by status first, so the running jobs appear at the end of the list.
Does anyone know a way around this? Thanks.
Yo can get that information using: executions?status=running&max=1 call.
Script example:
#!/bin/sh
# protocol
protocol="http"
# basic rundeck info
rdeck_host="localhost"
rdeck_port="4440"
rdeck_api="38"
rdeck_token="YRVaZikt64Am85RyLo1nyq8U1Oe4Q8J7 "
# specific api call info
rdeck_job="03f28add-84f2-4013-b8f5-e48feaf5977c"
# api call
curl --location --request GET "$protocol://$rdeck_host:$rdeck_port/api/$rdeck_api/job/$rdeck_job/executions?status=running&max=1" \
--header "Accept: application/json" \
--header "X-Rundeck-Auth-Token: $rdeck_token" \
--header "Content-Type: application/json"
Output:
{
"paging": {
"count": 1,
"total": 1,
"offset": 0,
"max": 1
},
"executions": [
{
"id": 7,
"href": "http://localhost:4440/api/38/execution/7",
"permalink": "http://localhost:4440/project/ProjectEXAMPLE/execution/show/7",
"status": "running",
"project": "ProjectEXAMPLE",
"executionType": "user",
"user": "admin",
"date-started": {
"unixtime": 1617304896289,
"date": "2021-04-01T19:21:36Z"
},
"job": {
"id": "03f28add-84f2-4013-b8f5-e48feaf5977c",
"averageDuration": 13796,
"name": "HelloWorld",
"group": "",
"project": "ProjectEXAMPLE",
"description": "",
"href": "http://localhost:4440/api/38/job/03f28add-84f2-4013-b8f5-e48feaf5977c",
"permalink": "http://localhost:4440/project/ProjectEXAMPLE/job/show/03f28add-84f2-4013-b8f5-e48feaf5977c"
},
"description": "sleep 20; echo \"hi\"",
"argstring": null,
"serverUUID": "630be43c-e71f-4102-be96-d017dd22233e"
}
]
}

How can I use the BigQuery REST API from the command line?

Attempting to make a plain GET request to one of the BigQuery REST APIs gives an error that looks like this:
curl https://www.googleapis.com/bigquery/v2/projects/$PROJECT_ID/jobs/$JOBID
Output:
{
"error": {
"errors": [
{
"domain": "global",
"reason": "required",
"message": "Login Required",
"locationType": "header",
"location": "Authorization",
...
What is the correct way to invoke one of the REST APIs from the command-line, such as the query or insert APIs? The API reference has a "Try this API", but the examples don't translate directly to something you can run from the command-line.
As a disclaimer, when working from the command-line, using the bq tool will usually be sufficient, or for more complex use cases, the BigQuery client libraries enable programming with BigQuery from multiple languages. It can still be useful sometimes to make plain requests to the REST APIs to see how certain APIs work at a low level, however.
First, make sure that you have installed the Google Cloud SDK. This should include the gcloud and bq command-line tools. If you haven't already, authorize your account by running this command from your terminal:
gcloud auth login
This should prompt you to log in and then give you an access code that you can paste into your terminal. (The exact process may change over time).
Now let's try a query using the BigQuery REST API, calling the jobs.query method. Modify this script with your own project name, which you can find from the Google Cloud Console, then paste the script into your terminal:
PROJECT="YOUR_PROJECT_NAME"
QUERY="\"SELECT 1 AS x, 'foo' AS y;\""
REQUEST="{\"kind\":\"bigquery#queryRequest\",\"useLegacySql\":false,\"query\":$QUERY}"
echo $REQUEST | \
curl -X POST -d #- -H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://www.googleapis.com/bigquery/v2/projects/$PROJECT/queries
If it worked, you should see output that looks like this:
{
"kind": "bigquery#queryResponse",
"schema": {
"fields": [
{
"name": "x",
"type": "INTEGER",
"mode": "NULLABLE"
},
{
"name": "y",
"type": "STRING",
"mode": "NULLABLE"
}
]
},
"jobReference": {
"projectId": "<your project ID>",
"jobId": "<your job ID>"
},
"totalRows": "1",
"rows": [
{
"f": [
{
"v": "1"
},
{
"v": "foo"
}
]
}
],
"totalBytesProcessed": "0",
"jobComplete": true,
"cacheHit": false
}
If you haven't set up the bq command-line tool, you can use bq init from your terminal to do so. Once you have, you can try running the same query using it:
bq query --use_legacy_sql=False "SELECT 1 AS x, 'foo' AS y;"
You can also see the REST API requests that the bq tool makes by passing the --apilog= option:
bq --apilog= query --use_legacy_sql=False "SELECT [1, 2, 3] AS x;"
Now let's try an example using the jobs.insert method instead of the query API. Run this script, replacing YOUR_PROJECT_NAME with your project name:
PROJECT="YOUR_PROJECT_NAME"
QUERY="\"SELECT 1 AS x, 'foo' AS y;\""
REQUEST="{\"configuration\":{\"query\":{\"useLegacySql\":false,\"query\":${QUERY}}}}"
echo $REQUEST | \
curl -X POST -d #- -H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://www.googleapis.com/bigquery/v2/projects/$PROJECT/jobs
Unlike the query API, which returned a response immediately, you will see a result that looks similar to this:
{
"kind": "bigquery#job",
"etag": "\"<etag string>\"",
"id": "<project name>:<job ID>",
"selfLink": "https://www.googleapis.com/bigquery/v2/projects/<project name>/jobs/<job ID>",
"jobReference": {
"projectId": "<project name>",
"jobId": "<job ID>"
},
"configuration": {
"query": {
"query": "SELECT 1 AS x, 'foo' AS y;",
"destinationTable": {
"projectId": "<project name>",
"datasetId": "<anonymous dataset>",
"tableId": "<anonymous table>"
},
"createDisposition": "CREATE_IF_NEEDED",
"writeDisposition": "WRITE_TRUNCATE",
"useLegacySql": false
}
},
"status": {
"state": "RUNNING"
},
"statistics": {
"creationTime": "<timestamp millis>",
"startTime": "<timestamp millis>"
},
"user_email": "<your email address>"
}
Notice the status:
"status": {
"state": "RUNNING"
},
If you want to check on the job now, you can use the jobs.get method. Similar to before, run this from your terminal, using the job ID from the output in the previous step:
PROJECT="YOUR_PROJECT_NAME"
JOB_ID="YOUR_JOB_ID"
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://www.googleapis.com/bigquery/v2/projects/$PROJECT/jobs/$JOB_ID
If the query is done, you'll get a response that indicates as much:
...
"status": {
"state": "DONE"
},
...
Finally, we can make a request to fetch the query results, also using the REST API.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://www.googleapis.com/bigquery/v2/projects/$PROJECT/queries/$JOB_ID
The output will look similar to when we used the jobs.query method above:
{
"kind": "bigquery#getQueryResultsResponse",
"etag": "\"<etag string>\"",
"schema": {
"fields": [
{
"name": "x",
"type": "INTEGER",
"mode": "NULLABLE"
},
{
"name": "y",
"type": "STRING",
"mode": "NULLABLE"
}
]
},
"jobReference": {
"projectId": "<project ID>",
"jobId": "<job ID>"
},
"totalRows": "1",
"rows": [
{
"f": [
{
"v": "1"
},
{
"v": "foo"
}
]
}
],
"totalBytesProcessed": "0",
"jobComplete": true,
"cacheHit": true
}

How can I set a node to unschedulable status via the Kubernetes api?

I am attempting to emulate the behavior of kubectl patch. I'm sending an HTTP PATCH with a json payload of the following:
{
"apiVersion": "v1",
"kind": "Node",
"metadata": {
"name": "my-node-hostname"
},
"spec": {
"unschedulable": true
}
}
However, no matter how I seem to tweak this JSON, I keep getting a 415 and the following JSON status back:
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "the server responded with the status code 415 but did not return more information",
"details": {},
"code": 415
}
Even with debug on kube-apiserver set to 1000, I get no feedback about why the payload is wrong!
Is there a particular format that one should use in the JSON payload sent via PATCH to enable this to work?
After a helpful member of the Kubernetes Slack channel mentioned I could get the payload from kubectl patch via the --verbose flag, it turns out that Kubernetes expects to get "Content-Type: application/strategic-merge-patch+json" when you are sending the PATCH payload.