Storing Avro schema in schema registry - apache-kafka

I am using Confluent's JDBC connector to send data into Kafka in the Avro format. I need to store this schema in the schema registry, but I'm not sure what format it accepts. I've read the documentation here, but it doesn't mention much.
I have tried this (taking the Avro output and pasting it in - for one int and one string field):
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"type":"struct","fields":[{"type":"int64","optional":true,"field":"id"},{"type":"string","optional":true,"field":"serial"}],"optional":false,"name":"test"}' http://localhost:8081/subjects/view/versions
but I get the error: {"error_code":422,"message":"Unrecognized field: type"}

The JSON that you POST must have a top-level 'schema' key. The actual schema that you provide is passed, as an escaped string, as the value of that key.
So your request should look like this:
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema" : "{\"type\":\"string\",\"fields\":[{\"type\":\"int64\",\"optional\":true,\"field\":\"id\"},{\"type\":\"string\",\"optional\":true,\"field\":\"serial\"}],\"optional\":false,\"name\":\"test\"}"}' http://localhost:8081/subjects/view/versions
I've made two other changes to the command:
I've escaped each double quote within the value of the schema key.
I've changed the struct data structure to string; I'm not sure why it isn't accepting complex structures, though.
Check out how the schema is modeled in the first POST request described in the documentation.
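For what it's worth, the payload in the question is a Kafka Connect schema rather than an Avro one, which is why the registry rejects it. The Avro equivalent of those two optional fields is a record; registering it would look something like this (a sketch reusing the subject and field names from the question):
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"type\":\"record\",\"name\":\"test\",\"fields\":[{\"name\":\"id\",\"type\":[\"null\",\"long\"],\"default\":null},{\"name\":\"serial\",\"type\":[\"null\",\"string\"],\"default\":null}]}"}' http://localhost:8081/subjects/view/versions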

First, do you need to store the schema in advance? If you use the JDBC connector with the Avro converter (which is part of the schema registry package), the JDBC connector will figure out the schema of the table from the database and register it for you. You will need to specify the converter in your Kafka Connect config file. You can use this as an example: https://github.com/confluentinc/schema-registry/blob/master/config/connect-avro-standalone.properties
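The relevant part of that file is the converter setup, which looks roughly like this (adjust the schema registry URL to your deployment):
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081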
If you really want to register the schema yourself, there's some chance the issue is with the shell command - escaping JSON in a shell is tricky. I installed Advanced Rest Client in Chrome and use that to work with the REST APIs of both the schema registry and Kafka Connect.

Related

Kafka MongoDB source connector not working

Hi, in my POC I am using both the sink and the source MongoDB connectors.
The sink connector works fine, but the source connector does not push data into the resulting topic. The objective is to push the full documents of all changes (insert and update) in a collection called 'request'.
Below is the code.
curl -X PUT http://localhost:8083/connectors/source-mongodb-request/config -H "Content-Type: application/json" -d '{
"tasks.max":1,
"connector.class":"com.mongodb.kafka.connect.MongoSourceConnector",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"value.converter":"org.apache.kafka.connect.storage.StringConverter",
"connection.uri":"mongodb://localhost:27017",
"pipeline":"[]",
"database":"proj",
"publish.full.document.only":"true",
"collection":"request",
"topic.prefix": ""
}'
No messages are getting pushed to the proj.request topic. The topic does get created once I insert a record into the 'request' collection.
It would be great to get help on this, as it's a make-or-break task for the POC.
Things work fine with the connectors on Confluent Cloud, but it's the on-premises setup that I need to get working.
Make sure you have a valid pipeline, with the stages included in your config, such as this one (the quotes inside the value have to be escaped):
"pipeline": "[{\"$match\": {\"operationType\": {\"$in\": [\"insert\",\"update\",\"replace\"]}}}]",
Refer to: https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/
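Putting it together, the config from the question with the pipeline filter added would look something like this (a sketch; everything except the pipeline entry is taken from the question's config):
curl -X PUT http://localhost:8083/connectors/source-mongodb-request/config -H "Content-Type: application/json" -d '{
"tasks.max":1,
"connector.class":"com.mongodb.kafka.connect.MongoSourceConnector",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"value.converter":"org.apache.kafka.connect.storage.StringConverter",
"connection.uri":"mongodb://localhost:27017",
"pipeline":"[{\"$match\": {\"operationType\": {\"$in\": [\"insert\",\"update\",\"replace\"]}}}]",
"database":"proj",
"publish.full.document.only":"true",
"collection":"request",
"topic.prefix": ""
}'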

Apache NiFi - Move table content from Oracle to Mongo DB

I am very new to Apache NiFi. I am trying to migrate data from Oracle to MongoDB in Apache NiFi as per the screenshot, and I am failing with the reported error. Please help.
Up to PutFile I think it's working fine, as I can see the JSON-format file below in my local directory.
Simple setup, direct from the Oracle database to MongoDB without SSL or a username and password (not recommended for production).
Just keep tinkering with the PutMongoRecord processor until you resolve all outstanding issues and the exclamation mark is cleared.
I am first using an ExecuteSQL processor, which returns the dataset in Avro; I need the final data in JSON. For the DBCPConnectionPool service, you need to create a controller service with the credentials of your Oracle database. After that I am using SplitAvro and then TransformXml to convert it into JSON; in TransformXml you need to supply an XSLT file. Finally, I use the PutMongo processor for ingestion of the JSON, which gets automatically converted to BSON.

Run Spark job using Databricks REST API

I am using the Databricks REST API to run Spark jobs.
I am using the following commands:
curl -X POST -H "Authorization: XXXX" 'url/api/2.0/jobs/create' -d ' {"name":"jobname","existing_cluster_id":"0725-095337-jello70","libraries": [{"jar": "dbfs:/mnt/pathjar/name-9edeec0f.jar"}],"email_notifications":{},"timeout_seconds":0,"spark_jar_task": {"main_class_name": "com.company.DngApp"}}'
curl -X POST -H "Authorization: XXXX" 'url/api/2.0/jobs/run-now' -d '{"job_id":25854,"jar_params":["--param","value"]}'
Here param is an input argument, but I want to find a way to override Spark driver properties. Usually I do:
--driver-java-options='-Dparam=value'
but I am looking for the equivalent on the Databricks REST API side.
You cannot use "--driver-java-options" in Jar params.
Reason:
Note: Jar_params is a list of parameters for jobs with JAR tasks, e.g. "jar_params": ["john doe", "35"].
The parameters will be used to invoke the main function of the main class specified in the Spark JAR task. If not specified upon run-now, it will default to an empty list. jar_params cannot be specified in conjunction with notebook_params. The JSON representation of this field (i.e. {"jar_params":["john doe","35"]}) cannot exceed 10,000 bytes.
For more details, see Azure Databricks - Jobs API - Run Now.
You can use spark_conf to pass in a string of user-specified spark configuration key-value pairs.
An object containing a set of optional, user-specified Spark configuration key-value pairs. You can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively.
Example Spark confs: {"spark.speculation": true, "spark.streaming.ui.retainedBatches": 5} or {"spark.driver.extraJavaOptions": "-verbose:gc -XX:+PrintGCDetails"}
For more details, refer to the "NewCluster configuration".
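For example, the jobs/create call from the question could carry the driver option via spark_conf. Note that spark_conf is part of the new_cluster spec, so this sketch uses a new job cluster instead of the existing_cluster_id from the question; the spark_version, node_type_id, and num_workers values are placeholders you'd replace with your own:
curl -X POST -H "Authorization: XXXX" 'url/api/2.0/jobs/create' -d '{
"name": "jobname",
"new_cluster": {
"spark_version": "7.3.x-scala2.12",
"node_type_id": "Standard_DS3_v2",
"num_workers": 2,
"spark_conf": {"spark.driver.extraJavaOptions": "-Dparam=value"}
},
"libraries": [{"jar": "dbfs:/mnt/pathjar/name-9edeec0f.jar"}],
"email_notifications": {},
"timeout_seconds": 0,
"spark_jar_task": {"main_class_name": "com.company.DngApp"}
}'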
Hope this helps.

Kafka schema registry not compatible in the same topic

I'm using the Kafka Schema Registry for producing/consuming Kafka messages. For example, I have two fields, both of string type; the pseudo-schema is as follows:
{"name": "test1", "type": "string"}
{"name": "test2", "type": "string"}
but after sending and consuming for a while, I needed to modify the schema to change the second field to long type, and then it threw the following exception:
Schema being registered is incompatible with an earlier schema; error code: 409
I'm confused: if the schema registry cannot handle schema evolution/changes, then why should I use the Schema Registry, or for that matter why use Avro at all?
Changing a field's type is not a backward-compatible change, so it is rejected in the default BACKWARD compatibility mode. As a workaround you can change the compatibility rules for the schema registry.
According to the docs:
The schema registry server can enforce certain compatibility rules when new schemas are registered in a subject. Currently, we support the following compatibility rules.
Backward compatibility (default): A new schema is backward compatible if it can be used to read the data written in all previous schemas. Backward compatibility is useful for loading data into systems like Hadoop since one can always query data of all versions using the latest schema.
Forward compatibility: A new schema is forward compatible if all previous schemas can read data written in this schema. Forward compatibility is useful for consumer applications that can only deal with data in a particular version that may not always be the latest version.
Full compatibility: A new schema is fully compatible if it's both backward and forward compatible.
No compatibility: A new schema can be any schema as long as it's a valid Avro schema.
Setting compatibility to NONE should do the trick.
# Update compatibility requirements globally
$ curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
--data '{"compatibility": "NONE"}' \
http://localhost:8081/config
And the response should be
{"compatibility":"NONE"}
I generally discourage setting compatibility to NONE on a subject unless absolutely necessary.
If you need just the new schema and you don't need the previous schemas in the schema registry, you can delete the older schemas as shown below.
I've tested this with confluent-kafka and it worked for me:
Deletes all schema versions registered under the subject "Kafka-value"
curl -X DELETE http://localhost:8081/subjects/Kafka-value
Deletes version 1 of the schema registered under subject "Kafka-value"
curl -X DELETE http://localhost:8081/subjects/Kafka-value/versions/1
Deletes the most recently registered schema under subject "Kafka-value"
curl -X DELETE http://localhost:8081/subjects/Kafka-value/versions/latest
Ref: https://docs.confluent.io/platform/current/schema-registry/schema-deletion-guidelines.html
https://docs.confluent.io/current/avro.html
You might need to add a "default" value for the field.
You can also delete the existing schema and register the updated one.
You can simply make the field optional with a default, like this:
{"name": "test3", "type": ["null", "string"], "default": null}

Cannot set more than one metadata value on an OpenStack Swift object

I am trying to set metadata on an object stored in a Swift container. I am using the following command (note that my container is 'container1' and the object is 'employee.json'):
curl -X POST -H "X-Auth-Token:$TOKEN" -H 'X-Object-Meta-metadata1: value' $STORAGE_URL/container1/employee.json
It works fine with one metadata value. But whenever I try to set more than one by issuing several curl commands, only the last metadata value is actually set.
I don't think there should be a limit of only one metadata value per Swift object. Am I doing anything wrong?
FYI: I am using the Havana release of OpenStack Swift.
Thank you.
I think I have figured it out... It's my bad that I did not read the documentation carefully.
It [1] says, "A POST request will delete all existing metadata added with a previous PUT/POST."
So, I tried this and it worked...
curl -X POST -H "X-Auth-Token:$TOKEN" -H 'X-Object-Meta-p1:[P1]' -H 'X-Object-Meta-p2:[P1]' $STORAGE_URL/container1/employee.json
Here, instead of two POST requests, I have now set multiple metadata values in a single POST request.
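To verify, a HEAD request on the object should return both X-Object-Meta headers (a sketch, assuming the same token and storage URL):
curl -I -H "X-Auth-Token:$TOKEN" $STORAGE_URL/container1/employee.json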
Again, thanks.
Ref:
http://docs.openstack.org/api/openstack-object-storage/1.0/content/update-object-metadata.html