Confluent Schema Registry: POST simple JSON schema with object having single property - confluent-platform

OS: Ubuntu 18.x
docker image (from dockerhub.com, as of 2020-09-25): confluentinc/cp-schema-registry:latest
I am exploring the HTTP API for the Confluent Schema Registry. First off, is there a definitive statement somewhere about which version of the JSON Schema specification the registry assumes? For now, I am assuming Draft v7.0. More broadly, I believe the API that returns the supported schema types should also list versions. E.g., instead of:
$ curl -X GET http://localhost:8081/schemas/types
["JSON","PROTOBUF","AVRO"]
you would have:
$ curl -X GET http://localhost:8081/schemas/types
[{"flavor": "JSON", "version": "7.0"}, {"flavor": "PROTOBUF", "version": "1.2"}, {"flavor": "AVRO", "version": "3.5"}]
so at least programmers would know for sure what the Schema Registry assumes.
This issue aside, I cannot seem to POST a rather trivial JSON schema to the registry:
$ curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{ "schema": "{ \"type\": \"object\", \"properties\": { \"f1\": { \"type\": \"string\" } } }" }' http://localhost:8081/subjects/mytest-value/versions
{"error_code":42201,"message":"Either the input schema or one its references is invalid"}
Here I am POSTing the schema to the mytest-value subject. The schema itself I took from the Confluent documentation and escaped accordingly.
Can you tell me why this schema is not POSTing to the registry? And more generally, can I assume full support for Draft v7.0 of the JSON Schema specification?

You need to pass the schemaType flag. "If no schemaType is supplied, schemaType is assumed to be AVRO." https://docs.confluent.io/current/schema-registry/develop/api.html#post--subjects-(string-%20subject)-versions:
'{"schemaType":"JSON","schema":"{\"type\":\"object\",\"fields\":[{\"name\":\"f1\",\"type\":\"string\"}]}"}'
I agree that output of the supported versions would be helpful.
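Putting the two together, a request that should work for the schema in the question looks something like this (same local registry and subject as above):
$ curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schemaType": "JSON", "schema": "{\"type\": \"object\", \"properties\": {\"f1\": {\"type\": \"string\"}}}"}' http://localhost:8081/subjects/mytest-value/versions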

Related

cannot purge deleted entity in apache atlas

I tried to purge deleted entities in Apache Atlas and I keep getting the following error:
"error":"Cannot deserialize instance of java.util.HashSet<java.lang.Object> out of START_OBJECT token\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 1]"
I am using the following python code. How should I format my json request?
import json
import requests
from requests.auth import HTTPBasicAuth

def purgeEntity(guid):
    endpoint = 'http://localhost:21000/api/atlas/admin/purge'
    response = requests.post(endpoint,
                             data=guid,
                             auth=HTTPBasicAuth('admin', 'password'),
                             headers={"Content-Type": "application/json"})

data = json.dumps({"guid": ["0f8aad54-7275-483e-90ca-8b1c09b061bc"]})
purgeEntity(data)
curl -iv -u admin:admin -X DELETE http://localhost:21000/api/atlas/v2/entity/guid/3f62e45b-5e0b-4431-be1f-b5c77808f29b
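The deserialization error above suggests the purge endpoint expects a HashSet, i.e. a bare JSON array of GUIDs rather than an object wrapping them. A sketch of that variant (the array-shaped body is an assumption drawn from the error message, not confirmed against the Atlas docs):
curl -u admin:password -H "Content-Type: application/json" -X POST http://localhost:21000/api/atlas/admin/purge -d '["0f8aad54-7275-483e-90ca-8b1c09b061bc"]'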

Kafka connect spooldir Dynamic schema generator

This is regarding the kafka-connect-spooldir connector for CSV. I would like to know if there is a way to avoid hardcoding the schema and let the connector create the schema dynamically. I have a lot of CSV files to process, a few hundred GB per day and sometimes a couple of terabytes. Some CSV files occasionally have new columns added and some columns dropped.
I am able to successfully read the CSV and write to Elasticsearch, following this post: https://www.confluent.io/blog/ksql-in-action-enriching-csv-events-with-data-from-rdbms-into-AWS/
So now I do not want to hardcode the value schema and key schema.
From https://docs.confluent.io/current/connect/kafka-connect-spooldir/connectors/csv_source_connector.html I figured that schema.generation.enabled can be set to true.
Here's my REST API call, including my connector config:
$curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://xxx:000/connectors/ -d '{
"name":"csv1",
"config":{
"tasks.max":"1",
"connector.class":"com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector",
"input.file.pattern":"^.*csv$",
"halt.on.error":"false",
"topic":"order",
"schema.generation.enabled":"true",
"schema.generation.key.name":"orderschema",
"schema.generation.value.name":"orderdata",
"csv.first.row.as.header":"true",
"csv.null.field.indicator":"EMPTY_SEPARATORS",
"batch.size" : "5000",
}
}
'
When I submit this, I get the following error.
{
"name": "order",
"connector": {
"state": "FAILED",
"worker_id": "localhost:000",
"trace": "org.apache.kafka.connect.errors.DataException: More than one schema was found for the input pattern.\nSchema: {\"name\":\"com.github.jcustenborder.kafka.connect.model.Value\",\"type\":\"STRUCT\",\"isOptional\":false,\"fieldSchemas\":
What's the solution for this?
I was able to parse all the data now. The trick was to process one file first (any file), and then add another just to check; it looks like that way it updates the schema automagically (as Robin Moffatt calls it).
After that, add all the files to the folder and it processes just fine. YAY!
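As a rough sketch of that workflow (the /data/spooldir/input path is hypothetical; the connector's input.path is not shown in the question):
mv staging/one-file.csv /data/spooldir/input/   # let schema.generation derive the schema from a single file first
mv staging/*.csv /data/spooldir/input/          # once that file has been processed, move in the rest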

Hashicorp Vault reading creds - failed to find entry for connection with name: db_name

I don't know if I did something wrong or not.
But here is my configuration.
// payload.json
{
"plugin_name": "postgresql-database-plugin",
"allowed_roles": "*",
"connection_url": "postgresql://{{username}}:{{password}}#for-testing-vault.rds.amazonaws.com:5432/test-app",
"username": "test",
"password": "testtest"
}
then run this command:
curl --header "X-Vault-Token: ..." --request POST --data @payload.json http://ip_add.us-west-1.compute.amazonaws.com:8200/v1/database/config/postgresql
roles configuration:
// readonlypayload.json
{
"db_name": "test-app",
"creation_statements": ["CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";"],
"default_ttl": "1h",
"max_ttl": "24h"
}
then run this command:
curl --header "X-Vault-Token: ..." --request POST --data @readonlypayload.json http://ip_add.us-west-1.compute.amazonaws.com:8200/v1/database/roles/readonly
Then created a policy:
path "database/creds/readonly" {
capabilities = [ "read" ]
}
path "/sys/leases/renew" {
capabilities = [ "update" ]
}
and run this to get the token:
curl --header "X-Vault-Token: ..." --request POST --data '{"policies": ["db_creds"]}' http://ip_add.us-west-1.compute.amazonaws.com:8200/v1/auth/token/create | jq
executed this command to get the values:
VAULT_TOKEN=... consul-template.exe -template="config.yml.tpl:config.yml" -vault-addr "http://ip_add.us-west-1.compute.amazonaws.com:8200" -log-level debug
Then I receive these errors:
URL: GET http://ip_add.us-west-1.compute.amazonaws.com:8200/v1/database/creds/readonly
Code: 500. Errors:
* 1 error occurred:
* failed to find entry for connection with name: "test-app"
Any suggestions will be appreciated, thanks!
EDIT: I also tried this command on the server:
vault read database/creds/readonly
Still returning
* 1 error occurred:
* failed to find entry for connection with name: "test-app"
For those coming to this page via Googling for this error message, this might help:
Unfortunately the Vault database/role's parameter db_name is a bit misleading. The value needs to match a database/config/ entry, not an actual database name per se. The GRANT statement itself is where the database name is relevant, the db_name is just a reference to the config name, which may or may not match the database name. (In my case, the configs have other data such as environment prefixing the DB name.)
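Applied to the calls above: the connection was registered at database/config/postgresql, so the role's db_name needs to be "postgresql" rather than "test-app". A sketch of the adjusted role payload (based on the paths used above), re-posted to the same endpoint as before:
// readonlypayload.json
{
"db_name": "postgresql",
"creation_statements": ["CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";"],
"default_ttl": "1h",
"max_ttl": "24h"
}
curl --header "X-Vault-Token: ..." --request POST --data @readonlypayload.json http://ip_add.us-west-1.compute.amazonaws.com:8200/v1/database/roles/readonly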
In case this issue is not yet resolved:
Vault is not able to find the database 'test-app' in Postgres, or authentication to 'test-app' with the given credentials fails, so the connection failure happens.
Log in to Postgres and check whether the database 'test-app' exists by running \l.
For creating the role in Postgres you should use the default database 'postgres'. Try changing the name from 'test-app' to 'postgres' and check.
Change connection_url in payload.json:
"connection_url": "postgresql://{{username}}:{{password}}#for-testing-vault.rds.amazonaws.com:5432/postgres",

Hood "config.json" values

I have some problems with the documentation for hood; there is no explanation of what is supposed to be in config.json.
I've tried:
{
"development": {
"driver": "postgres",
"source": "my_development"
}
}
but I get this error:
hood db:migrate
2014/06/23 12:53:14 applying migrations...
panic: missing "=" after "my_development" in connection info string"
From the hood documentation:
The driver and source fields are the strings you would pass to the sql.Open(2) function.
So the driver value should be postgres (as in your example), and the source value should be either a list of key=value pairs or a full connection URI (as described in the PostgreSQL driver documentation).
Some examples (from here):
postgres://pqgotest:password@localhost/pqgotest?sslmode=verify-full
user=pqgotest dbname=pqgotest sslmode=verify-full
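Applied to the config above, a config.json along these lines should work (the user and sslmode values are placeholders, not taken from the question):
{
"development": {
"driver": "postgres",
"source": "user=myuser dbname=my_development sslmode=disable"
}
}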

mongodb river for elasticsearch

Is there any official MongoDB river available for Elasticsearch? I am using MongoDB in Node.js through the mongoose module.
I have seen one at http://www.matt-reid.co.uk/blog_post.php?id=68
Is this the correct one? It says unofficial, though...
Edit:
It looks like https://github.com/aparo/elasticsearch has a built-in MongoDB plugin. Is there any doc available on how to configure this with MongoDB, and on how MongoDB pushes data to Elasticsearch for indexing?
There is a new MongoDB river on github:
https://github.com/richardwilly98/elasticsearch-river-mongodb
According to the code you can specify several things, but there is no separate doc (except one mailing-list discussion):
https://github.com/aparo/elasticsearch/blob/master/plugins/river/mongodb/src/main/java/org/elasticsearch/river/mongodb/MongoDBRiver.java
https://github.com/aparo/elasticsearch/blob/master/plugins/river/mongodb/src/test/java/org/elasticsearch/river/mongodb/MongoDBRiverTest.java
This isn't really the answer you're looking for. I looked at building this mongo river but I found some discussion on it having some memory leaks and I didn't want to fiddle with Java code. I wrote my own mongo->ES importer using the bulk API.
It's a work in progress, so feel free to contribute! :)
https://github.com/orenmazor/elastic-search-loves-mongo
Yes, there is a new MongoDB river on GitHub:
https://github.com/richardwilly98/elasticsearch-river-mongodb
For further explanation you can follow the steps below:
Step 1: Install the plugins:
ES_HOME/bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/1.4.0
ES_HOME/bin/plugin -install richardwilly98/elasticsearch-river-mongodb/1.4.0
Step 2: Restart Elasticsearch:
ES_HOME/bin/service/elasticsearch restart
Step 3: Enable replica sets in MongoDB:
Go to mongod.conf and add the line
replSet=rs0
Save and exit, then restart mongod.
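After the restart, a single-node replica set still has to be initiated once before the oplog that the river tails exists. That step is not spelled out above; a minimal sketch is:
mongo --eval "rs.initiate()"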
Step 4: Tell Elasticsearch to index the "person" collection in the testmongo database by issuing the following command in your terminal:
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
"type": "mongodb",
"mongodb": {
"db": "testmongo",
"collection": "person"
},
"index": {
"name": "mongoindex",
"type": "person"
}
}'
Step 5: Add some data to MongoDB through the mongo terminal:
use testmongo
var p = {firstName: "John", lastName: "Doe"}
db.person.save(p)
Step 6: Use this command to search the data:
curl -XGET 'http://localhost:9200/mongoindex/_search?q=firstName:John'
NOTE: to start over, delete the river and the index:
DELETE /_river
DELETE /mongoindex
Then run this command again:
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
"type": "mongodb",
"mongodb": {
"db": "testmongo",
"collection": "person"
},
"index": {
"name": "mongoindex",
"type": "person"
}
}'
Step 7: Check the HQ plugin.
In mongoindex, you will see your data.