Right now I have a MongoDB sink connector, and it saves the value of incoming AVRO messages correctly.
I need it to also save the Kafka message key in the document.
I have tried org.apache.kafka.connect.transforms.HoistField$Key to add the key to the value being saved, but this did nothing, presumably because that transform rewrites only the record key while the sink stores the value. It did work when using ProvidedInKeyStrategy, but I don't want my _id to be the Kafka message key.
My configuration:
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"connection.uri": "mongodb://mongo1",
"database": "mongodb",
"collection": "sink",
"topics": "topics.foo",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://schema-registry:8081",
"transforms": "hoistKey",
"transforms.hoistKey.type":"org.apache.kafka.connect.transforms.HoistField$Key",
"transforms.hoistKey.field":"kafkaKey"
}
Kafka message schema:
{
  "type": "record",
  "name": "Smoketest",
  "namespace": "some_namespace",
  "fields": [
    {
      "name": "timestamp",
      "type": "int",
      "logicalType": "timestamp-millis"
    }
  ]
}
Kafka key schema:
[
  {
    "type": "enum",
    "name": "EnvironmentType",
    "namespace": "some_namespace",
    "doc": "DEV",
    "symbols": ["Dev", "Test", "Accept", "Sandbox", "Prod"]
  },
  {
    "type": "record",
    "name": "Key",
    "namespace": "some_namespace",
    "doc": "The standard Key type that is used as key",
    "fields": [
      {
        "name": "conversation_id",
        "doc": "The first system producing an event sets this field",
        "type": "string"
      },
      {
        "name": "broker_key",
        "doc": "The key of the broker",
        "type": "string"
      },
      {
        "name": "user_id",
        "doc": "User identification",
        "type": ["null", "string"]
      },
      {
        "name": "application",
        "doc": "The name of the application",
        "type": ["null", "string"]
      },
      {
        "name": "environment",
        "doc": "The type of environment",
        "type": "type.EnvironmentType"
      }
    ]
  }
]
Using https://github.com/f0xdx/kafka-connect-wrap-smt I can now wrap all the data from the Kafka message into a single document to save in my MongoDB sink.
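For reference, the relevant addition to my connector config looks roughly like this (the Wrap$Value class name is taken from that project's README; verify it against the version you actually build and install):
"transforms": "wrap",
"transforms.wrap.type": "com.github.f0xdx.Wrap$Value"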
I have an AVRO schema registered for a Kafka topic and am trying to send data to it. The schema has nested records, and I'm not sure how to correctly send data to it using confluent_kafka in Python.
Example schema: *ignore any typos in the schema (the real one is very large; this is just an example)
{
  "namespace": "company__name",
  "name": "our_data",
  "type": "record",
  "fields": [
    {
      "name": "datatype1",
      "type": ["null", {
        "type": "record",
        "name": "datatype1_1",
        "fields": [
          {"name": "site", "type": "string"},
          {"name": "units", "type": "string"}
        ]
      }],
      "default": null
    },
    {
      "name": "datatype2",
      "type": ["null", {
        "type": "record",
        "name": "datatype2_1",
        "fields": [
          {"name": "site", "type": "string"},
          {"name": "units", "type": "string"}
        ]
      }],
      "default": null
    }
  ]
}
I am trying to send data matching this schema using the confluent_kafka Python package. When I have done this before, the records were not nested, and I would use a typical dictionary of key: value pairs and serialize it. How can I send nested data that works with this schema?
What I tried so far...
message = {
    'datatype1': {
        'site': 'sitename',
        'units': 'm'
    }
}
This version does not cause any Kafka errors, but all of the columns show up as null,
and...
message = {
    'datatype1': {
        'datatype1_1': {
            'site': 'sitename',
            'units': 'm'
        }
    }
}
This version produced a Kafka error against the schema.
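For context, here is a trimmed-down sketch of the producer setup I am using (broker address, schema-registry URL, topic name, and the schema file name are placeholders, not my real values):

from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer

# Load the schema shown above (placeholder file name).
with open('our_data.avsc') as f:
    schema_str = f.read()

schema_registry_client = SchemaRegistryClient({'url': 'http://schema-registry:8081'})
avro_serializer = AvroSerializer(schema_registry_client, schema_str)

producer = SerializingProducer({
    'bootstrap.servers': 'localhost:9092',
    'value.serializer': avro_serializer,
})

# First attempt from above: plain nested dict, no extra record-name layer.
message = {'datatype1': {'site': 'sitename', 'units': 'm'}}
producer.produce(topic='our_topic', value=message)
producer.flush()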
If you use namespaces, you don't have to worry about naming collisions and you can properly structure your optional records:
For example, both
{
  "meta": {
    "instanceID": "something"
  }
}
and
{}
are valid instances of:
{
  "doc": "Survey",
  "name": "Survey",
  "type": "record",
  "fields": [
    {
      "name": "meta",
      "type": [
        "null",
        {
          "name": "meta",
          "type": "record",
          "fields": [
            {
              "name": "instanceID",
              "type": ["null", "string"],
              "namespace": "Survey.meta"
            }
          ],
          "namespace": "Survey"
        }
      ],
      "namespace": "Survey"
    }
  ]
}
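To sanity-check this, here is a minimal sketch (assuming fastavro is installed and the Survey schema above is saved in a hypothetical survey.avsc file) showing that both shapes are accepted; the empty instance is expressed as a null "meta":

import json
from fastavro import parse_schema
from fastavro.validation import validate

# Load the Survey schema shown above from a hypothetical survey.avsc file.
with open('survey.avsc') as f:
    survey_schema = parse_schema(json.load(f))

# Both a populated 'meta' and a null 'meta' validate against the union.
validate({'meta': {'instanceID': 'something'}}, survey_schema)  # returns True
validate({'meta': None}, survey_schema)                         # returns True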
I am attempting to load data from Cloud Storage into a table, and the command below fails with an error.
bq load --skip_leading_rows=1 --field_delimiter='\t' --source_format=CSV projectID:dataset.table gs://bucket/source.txt sku:STRING,variant_id:STRING,title:STRING,category:STRING,description:STRING,buy_url:STRING,mobile_url:STRING,itemset_url:STRING,image_url:STRING,swatch_url:STRING,availability:STRING,issellableonline:STRING,iswebexclusive:STRING,price:STRING,saleprice:STRING,quantity:STRING,coresku_inet:STRING,condition:STRING,productreviewsavg:STRING,productreviewscount:STRING,mediaset:STRING,webindexpty:INTEGER,NormalSalesIndex1:FLOAT,NormalSalesIndex2:FLOAT,NormalSalesIndex3:FLOAT,SalesScore:FLOAT,NormalInventoryIndex1:FLOAT,NormalInventoryIndex2:FLOAT,NormalInventoryIndex3:FLOAT,InventoryScore:FLOAT,finalscore:FLOAT,EDVP:STRING,dropship:STRING,brand:STRING,model_number:STRING,gtin:STRING,color:STRING,size:STRING,gender:STRING,age:STRING,oversized:STRING,ishazardous:STRING,proddept:STRING,prodsubdept:STRING,prodclass:STRING,prodsubclass:STRING,sku_attr_names:STRING,sku_attr_values:STRING,store_id:STRING,store_quantity:STRING,promo_name:STRING,product_badge:STRING,cbl_type_names1:STRING,cbl_type_value1:STRING,cbl_type_names2:STRING,cbl_type_value2:STRING,cbl_type_names3:STRING,cbl_type_value3:STRING,cbl_type_names4:STRING,cbl_type_value4:STRING,cbl_type_names5:STRING,cbl_type_value5:STRING,choice1_name_value:STRING,choice2_name_value:STRING,choice3_name_value:STRING,cbl_is_free_shipping:STRING,isnewflag:STRING,shipping_weight:STRING,masterpath:STRING,accessoriesFlag:STRING,short_copy:STRING,bullet_copy:STRING,map:STRING,display_msrp:STRING,display_price:STRING,suppress_sales_display:STRING,margin:FLOAT
I have also tried putting the schema into a JSON file, and I get the same error message.
As this was too big for a comment, I will post it here.
I wonder what happens if you set the schema file to have this content:
[{"name": "sku", "type": "STRING"},
{"name": "variant_id", "type": "STRING"},
{"name": "title", "type": "STRING"},
{"name": "category", "type": "STRING"},
{"name": "description", "type": "STRING"},
{"name": "buy_url", "type": "STRING"},
{"name": "mobile_url", "type": "STRING"},
{"name": "itemset_url", "type": "STRING"},
{"name": "image_url", "type": "STRING"},
{"name": "swatch_url", "type": "STRING"},
{"name": "availability", "type": "STRING"},
{"name": "issellableonline", "type": "STRING"},
{"name": "iswebexclusive", "type": "STRING"},
{"name": "price", "type": "STRING"},
{"name": "saleprice", "type": "STRING"},
{"name": "quantity", "type": "STRING"},
{"name": "coresku_inet", "type": "STRING"},
{"name": "condition", "type": "STRING"},
{"name": "productreviewsavg", "type": "STRING"},
{"name": "productreviewscount", "type": "STRING"},
{"name": "mediaset", "type": "STRING"},
{"name": "webindexpty", "type": "INTEGER"},
{"name": "NormalSalesIndex1", "type": "FLOAT"},
{"name": "NormalSalesIndex2", "type": "FLOAT"},
{"name": "NormalSalesIndex3", "type": "FLOAT"},
{"name": "SalesScore", "type": "FLOAT"},
{"name": "NormalInventoryIndex1", "type": "FLOAT"},
{"name": "NormalInventoryIndex2", "type": "FLOAT"},
{"name": "NormalInventoryIndex3", "type": "FLOAT"},
{"name": "InventoryScore", "type": "FLOAT"},
{"name": "finalscore", "type": "FLOAT"},
{"name": "EDVP", "type": "STRING"},
{"name": "dropship", "type": "STRING"},
{"name": "brand", "type": "STRING"},
{"name": "model_number", "type": "STRING"},
{"name": "gtin", "type": "STRING"},
{"name": "color", "type": "STRING"},
{"name": "size", "type": "STRING"},
{"name": "gender", "type": "STRING"},
{"name": "age", "type": "STRING"},
{"name": "oversized", "type": "STRING"},
{"name": "ishazardous", "type": "STRING"},
{"name": "proddept", "type": "STRING"},
{"name": "prodsubdept", "type": "STRING"},
{"name": "prodclass", "type": "STRING"},
{"name": "prodsubclass", "type": "STRING"},
{"name": "sku_attr_names", "type": "STRING"},
{"name": "sku_attr_values", "type": "STRING"},
{"name": "store_id", "type": "STRING"},
{"name": "store_quantity", "type": "STRING"},
{"name": "promo_name", "type": "STRING"},
{"name": "product_badge", "type": "STRING"},
{"name": "cbl_type_names1", "type": "STRING"},
{"name": "cbl_type_value1", "type": "STRING"},
{"name": "cbl_type_names2", "type": "STRING"},
{"name": "cbl_type_value2", "type": "STRING"},
{"name": "cbl_type_names3", "type": "STRING"},
{"name": "cbl_type_value3", "type": "STRING"},
{"name": "cbl_type_names4", "type": "STRING"},
{"name": "cbl_type_value4", "type": "STRING"},
{"name": "cbl_type_names5", "type": "STRING"},
{"name": "cbl_type_value5", "type": "STRING"},
{"name": "choice1_name_value", "type": "STRING"},
{"name": "choice2_name_value", "type": "STRING"},
{"name": "choice3_name_value", "type": "STRING"},
{"name": "cbl_is_free_shipping", "type": "STRING"},
{"name": "isnewflag", "type": "STRING"},
{"name": "shipping_weight", "type": "STRING"},
{"name": "masterpath", "type": "STRING"},
{"name": "accessoriesFlag", "type": "STRING"},
{"name": "short_copy", "type": "STRING"},
{"name": "bullet_copy", "type": "STRING"},
{"name": "map", "type": "STRING"},
{"name": "display_msrp", "type": "STRING"},
{"name": "display_price", "type": "STRING"},
{"name": "suppress_sales_display", "type": "STRING"},
{"name": "margin", "type": "FLOAT"}]
If you save it, say, in a file "schema.json" and run the command:
bq load --skip_leading_rows=1 --field_delimiter='\t' --source_format=CSV projectID:dataset.table gs://bucket/source.txt schema.json
Do you still get the same error?
Cherba nailed it. The typo was in my batch file, which included an extra load parameter. Thanks for all your time and consideration.