I am attempting to load data from Cloud Storage into a table and am getting the error message below.
bq load --skip_leading_rows=1 --field_delimiter='\t' --source_format=CSV projectID:dataset.table gs://bucket/source.txt sku:STRING,variant_id:STRING,title:STRING,category:STRING,description:STRING,buy_url:STRING,mobile_url:STRING,itemset_url:STRING,image_url:STRING,swatch_url:STRING,availability:STRING,issellableonline:STRING,iswebexclusive:STRING,price:STRING,saleprice:STRING,quantity:STRING,coresku_inet:STRING,condition:STRING,productreviewsavg:STRING,productreviewscount:STRING,mediaset:STRING,webindexpty:INTEGER,NormalSalesIndex1:FLOAT,NormalSalesIndex2:FLOAT,NormalSalesIndex3:FLOAT,SalesScore:FLOAT,NormalInventoryIndex1:FLOAT,NormalInventoryIndex2:FLOAT,NormalInventoryIndex3:FLOAT,InventoryScore:FLOAT,finalscore:FLOAT,EDVP:STRING,dropship:STRING,brand:STRING,model_number:STRING,gtin:STRING,color:STRING,size:STRING,gender:STRING,age:STRING,oversized:STRING,ishazardous:STRING,proddept:STRING,prodsubdept:STRING,prodclass:STRING,prodsubclass:STRING,sku_attr_names:STRING,sku_attr_values:STRING,store_id:STRING,store_quantity:STRING,promo_name:STRING,product_badge:STRING,cbl_type_names1:STRING,cbl_type_value1:STRING,cbl_type_names2:STRING,cbl_type_value2:STRING,cbl_type_names3:STRING,cbl_type_value3:STRING,cbl_type_names4:STRING,cbl_type_value4:STRING,cbl_type_names5:STRING,cbl_type_value5:STRING,choice1_name_value:STRING,choice2_name_value:STRING,choice3_name_value:STRING,cbl_is_free_shipping:STRING,isnewflag:STRING,shipping_weight:STRING,masterpath:STRING,accessoriesFlag:STRING,short_copy:STRING,bullet_copy:STRING,map:STRING,display_msrp:STRING,display_price:STRING,suppress_sales_display:STRING,margin:FLOAT
I have also tried putting the schema into a JSON file and loading it that way, but I get the same error message.
As this was too big for a comment, I'll post it here.
I wonder what happens if you set the schema file to have this content:
[{"name": "sku", "type": "STRING"},
{"name": "variant_id", "type": "STRING"},
{"name": "title", "type": "STRING"},
{"name": "category", "type": "STRING"},
{"name": "description", "type": "STRING"},
{"name": "buy_url", "type": "STRING"},
{"name": "mobile_url", "type": "STRING"},
{"name": "itemset_url", "type": "STRING"},
{"name": "image_url", "type": "STRING"},
{"name": "swatch_url", "type": "STRING"},
{"name": "availability", "type": "STRING"},
{"name": "issellableonline", "type": "STRING"},
{"name": "iswebexclusive", "type": "STRING"},
{"name": "price", "type": "STRING"},
{"name": "saleprice", "type": "STRING"},
{"name": "quantity", "type": "STRING"},
{"name": "coresku_inet", "type": "STRING"},
{"name": "condition", "type": "STRING"},
{"name": "productreviewsavg", "type": "STRING"},
{"name": "productreviewscount", "type": "STRING"},
{"name": "mediaset", "type": "STRING"},
{"name": "webindexpty", "type": "INTEGER"},
{"name": "NormalSalesIndex1", "type": "FLOAT"},
{"name": "NormalSalesIndex2", "type": "FLOAT"},
{"name": "NormalSalesIndex3", "type": "FLOAT"},
{"name": "SalesScore", "type": "FLOAT"},
{"name": "NormalInventoryIndex1", "type": "FLOAT"},
{"name": "NormalInventoryIndex2", "type": "FLOAT"},
{"name": "NormalInventoryIndex3", "type": "FLOAT"},
{"name": "InventoryScore", "type": "FLOAT"},
{"name": "finalscore", "type": "FLOAT"},
{"name": "EDVP", "type": "STRING"},
{"name": "dropship", "type": "STRING"},
{"name": "brand", "type": "STRING"},
{"name": "model_number", "type": "STRING"},
{"name": "gtin", "type": "STRING"},
{"name": "color", "type": "STRING"},
{"name": "size", "type": "STRING"},
{"name": "gender", "type": "STRING"},
{"name": "age", "type": "STRING"},
{"name": "oversized", "type": "STRING"},
{"name": "ishazardous", "type": "STRING"},
{"name": "proddept", "type": "STRING"},
{"name": "prodsubdept", "type": "STRING"},
{"name": "prodclass", "type": "STRING"},
{"name": "prodsubclass", "type": "STRING"},
{"name": "sku_attr_names", "type": "STRING"},
{"name": "sku_attr_values", "type": "STRING"},
{"name": "store_id", "type": "STRING"},
{"name": "store_quantity", "type": "STRING"},
{"name": "promo_name", "type": "STRING"},
{"name": "product_badge", "type": "STRING"},
{"name": "cbl_type_names1", "type": "STRING"},
{"name": "cbl_type_value1", "type": "STRING"},
{"name": "cbl_type_names2", "type": "STRING"},
{"name": "cbl_type_value2", "type": "STRING"},
{"name": "cbl_type_names3", "type": "STRING"},
{"name": "cbl_type_value3", "type": "STRING"},
{"name": "cbl_type_names4", "type": "STRING"},
{"name": "cbl_type_value4", "type": "STRING"},
{"name": "cbl_type_names5", "type": "STRING"},
{"name": "cbl_type_value5", "type": "STRING"},
{"name": "choice1_name_value", "type": "STRING"},
{"name": "choice2_name_value", "type": "STRING"},
{"name": "choice3_name_value", "type": "STRING"},
{"name": "cbl_is_free_shipping", "type": "STRING"},
{"name": "isnewflag", "type": "STRING"},
{"name": "shipping_weight", "type": "STRING"},
{"name": "masterpath", "type": "STRING"},
{"name": "accessoriesFlag", "type": "STRING"},
{"name": "short_copy", "type": "STRING"},
{"name": "bullet_copy", "type": "STRING"},
{"name": "map", "type": "STRING"},
{"name": "display_msrp", "type": "STRING"},
{"name": "display_price", "type": "STRING"},
{"name": "suppress_sales_display", "type": "STRING"},
{"name": "margin", "type": "FLOAT"}]
If you save it in, say, a file named "schema.json" and run the command:
bq load --skip_leading_rows=1 --field_delimiter='\t' --source_format=CSV projectID:dataset.table gs://bucket/source.txt schema.json
Do you still get the same error?
Cherba nailed it. The typo was in my batch file, which included an extra load parameter. Thanks for all your time and consideration.
I have an Avro schema for which I am generating the Java bean using the avro-maven-plugin. I then instantiate it and send it to Kafka, also using Confluent's Schema Registry. I can consume and deserialise the Avro into a Spark DataFrame just fine. The problem I am facing is that it forces me to set the schemaVersionId at the producer level; if I don't set it, the KafkaAvroSerializer throws the error in the title. Any ideas, please?
val contractEvent: ContractEvent = new ContractEvent()
contractEvent.setSchemaVersionId("1") // This should be set automatically, since the schema is auto-registered.
contractEvent.setIngestedAt("123")
contractEvent.setChangeType(changeTypeEnum.U)
contractEvent.setServiceName("Contract")
contractEvent.setPayload(avroContract)
{
"type": "record",
"namespace": "xxxxxxx",
"name": "ContractEvent",
"fields": [
{"name": "ingestedAt", "type": "string"},
{"name": "eventType",
"type": {
"name": "eventTypeEnum",
"type": "enum", "symbols" : ["U", "D", "B"]
}
},
{"name": "serviceName", "type": "string"},
{"name": "payload",
"type": {
"type": "record",
"name": "Contract",
"fields": [
{"name": "identifier", "type": "string"},
{"name": "createdBy", "type": "string"},
{"name": "createdDate", "type": "string"},
]
}
}
]
}
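For context, here is a minimal sketch of how the producer side is wired up with Confluent's serializer. This is a sketch rather than my exact code: the broker address, registry URL, and topic name are placeholder assumptions, and it reuses the contractEvent built above.
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Sketch of the producer setup; broker address, registry URL and topic name are placeholders.
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer")
props.put("schema.registry.url", "http://localhost:8081")
props.put("auto.register.schemas", "true") // schema gets registered on first send

val producer = new KafkaProducer[String, ContractEvent](props)
producer.send(new ProducerRecord[String, ContractEvent]("contract-events", contractEvent))
producer.flush()
producer.close()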
I was following a tutorial on Kafka Connect, and I am wondering whether it is possible to define a custom schema registry for a topic whose data comes from a MySQL table.
I can't find where to define it in my JSON connector config, and I don't want to create a new version of that schema after it has been created.
My MySQL table called stations has this schema:
Field | Type
---------------+-------------
code | varchar(4)
date_measuring | timestamp
attributes | varchar(256)
where attributes contains JSON data rather than a plain string (I have to use that column type because the JSON fields inside attributes are variable).
My connector is
{
"value.converter.schema.registry.url": "http://localhost:8081",
"_comment": "The Kafka topic will be made up of this prefix, plus the table name ",
"key.converter.schema.registry.url": "http://localhost:8081",
"name": "jdbc_source_mysql_stations",
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"transforms": [
"ValueToKey"
],
"transforms.ValueToKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.ValueToKey.fields": [
"code",
"date_measuring"
],
"connection.url": "jdbc:mysql://localhost:3306/db_name?useJDBCCompliantTimezoneShift=true&useLegacyDatetimeCode=false&serverTimezone=UTC",
"connection.user": "confluent",
"connection.password": "**************",
"table.whitelist": [
"stations"
],
"mode": "timestamp",
"timestamp.column.name": [
"date_measuring"
],
"validate.non.null": "false",
"topic.prefix": "mysql-"
}
and it creates this schema:
{
"subject": "mysql-stations-value",
"version": 1,
"id": 23,
"schema": "{\"type\":\"record\",\"name\":\"stations\",\"fields\":[{\"name\":\"code\",\"type\":\"string\"},{\"name\":\"date_measuring\",\"type\":{\"type\":\"long\",\"connect.version\":1,\"connect.name\":\"org.apache.kafka.connect.data.Timestamp\",\"logicalType\":\"timestamp-millis\"}},{\"name\":\"attributes\",\"type\":\"string\"}],\"connect.name\":\"stations\"}"
}
Where "attributes" field is of course a String.
Unlike I would apply it this other schema.
{
"fields": [
{
"name": "code",
"type": "string"
},
{
"name": "date_measuring",
"type": {
"connect.name": "org.apache.kafka.connect.data.Timestamp",
"connect.version": 1,
"logicalType": "timestamp-millis",
"type": "long"
}
},
{
"name": "attributes",
"type": {
"type": "record",
"name": "AttributesRecord",
"fields": [
{
"name": "H1",
"type": "long",
"default": 0
},
{
"name": "H2",
"type": "long",
"default": 0
},
{
"name": "H3",
"type": "long",
"default": 0
},
{
"name": "H",
"type": "long",
"default": 0
},
{
"name": "Q",
"type": "long",
"default": 0
},
{
"name": "P1",
"type": "long",
"default": 0
},
{
"name": "P2",
"type": "long",
"default": 0
},
{
"name": "P3",
"type": "long",
"default": 0
},
{
"name": "P",
"type": "long",
"default": 0
},
{
"name": "T",
"type": "long",
"default": 0
},
{
"name": "Hr",
"type": "long",
"default": 0
},
{
"name": "pH",
"type": "long",
"default": 0
},
{
"name": "RX",
"type": "long",
"default": 0
},
{
"name": "Ta",
"type": "long",
"default": 0
},
{
"name": "C",
"type": "long",
"default": 0
},
{
"name": "OD",
"type": "long",
"default": 0
},
{
"name": "TU",
"type": "long",
"default": 0
},
{
"name": "MO",
"type": "long",
"default": 0
},
{
"name": "AM",
"type": "long",
"default": 0
},
{
"name": "N03",
"type": "long",
"default": 0
},
{
"name": "P04",
"type": "long",
"default": 0
},
{
"name": "SS",
"type": "long",
"default": 0
},
{
"name": "PT",
"type": "long",
"default": 0
}
]
}
}
],
"name": "stations",
"namespace": "com.mycorp.mynamespace",
"type": "record"
}
Any suggestions, please?
If it's not possible, I suppose I'll have to write a Kafka Streams application that produces to another topic, even though I would rather avoid that.
Thanks in advance!
I don't think you're asking about using a "custom" registry (which you'd configure with the two lines that say which registry you're using), but rather about how you can parse the data / apply a schema after the record is pulled from the database.
You can write your own Transform, or you can use Kafka Streams; those are really the main options here. There is a SetSchemaMetadata transform, but I'm not sure it will do what you want (parse a string into an Avro record).
Or, if you must shove JSON data into a single database attribute, maybe you shouldn't use MySQL but rather a document database, which has more flexible data constraints.
Otherwise, you can use a BLOB rather than varchar and put binary Avro data into that column, but then you'd still need a custom deserializer to read the data.
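To make the Transform route a bit more concrete, here is a rough skeleton of what a custom Single Message Transform could look like. This is only a sketch under assumptions: the class name ParseAttributes is made up, only two of your attribute fields are shown, and the actual JSON parsing is left out.
import java.util
import org.apache.kafka.common.config.ConfigDef
import org.apache.kafka.connect.connector.ConnectRecord
import org.apache.kafka.connect.data.{Schema, SchemaBuilder, Struct}
import org.apache.kafka.connect.transforms.Transformation

// Hypothetical skeleton of a custom SMT that turns the JSON string held in
// "attributes" into a nested Struct. The JSON parsing itself is omitted.
class ParseAttributes[R <: ConnectRecord[R]] extends Transformation[R] {

  // Target schema for the parsed attributes; only two of the fields are shown here.
  private val attributesSchema: Schema = SchemaBuilder.struct().name("AttributesRecord")
    .field("H1", SchemaBuilder.int64().defaultValue(0L).build())
    .field("H2", SchemaBuilder.int64().defaultValue(0L).build())
    .build()

  override def apply(record: R): R = {
    val value = record.value().asInstanceOf[Struct]

    // New value schema: same code/date_measuring, but attributes becomes a Struct.
    val newSchema = SchemaBuilder.struct().name("stations")
      .field("code", Schema.STRING_SCHEMA)
      .field("date_measuring", value.schema().field("date_measuring").schema())
      .field("attributes", attributesSchema)
      .build()

    // Parse value.getString("attributes") with a JSON library and fill this Struct.
    val parsedAttributes = new Struct(attributesSchema)

    val newValue = new Struct(newSchema)
      .put("code", value.getString("code"))
      .put("date_measuring", value.get("date_measuring"))
      .put("attributes", parsedAttributes)

    record.newRecord(record.topic(), record.kafkaPartition(), record.keySchema(), record.key(),
      newSchema, newValue, record.timestamp())
  }

  override def config(): ConfigDef = new ConfigDef()
  override def configure(configs: util.Map[String, _]): Unit = ()
  override def close(): Unit = ()
}
You would then package this into a jar on the Connect worker's plugin path and reference it from the connector config via the transforms list and a transforms.<name>.type entry, just like the ValueToKey transform you already use.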
I created a custom task following the documentation. It works on Azure DevOps Services, but on Azure DevOps Server it gives the error
An error occurred while loading the YAML build pipeline. Value cannot be null. Parameter name: key
My first thought was "which parameter is missing?", so I filled in every available and possible parameter, and the error persisted.
After that I went to the Event Viewer on the machine running Azure DevOps Server and found this error:
Detailed Message: The subscriber Pipelines Check Run: build completed event listener raised an exception while being notified of event Microsoft.TeamFoundation.Build2.Server.BuildCompletedEvent.
Exception Message: Value cannot be null.
Parameter name: definition and repository (type ArgumentNullException)
Exception Stack Trace: at Microsoft.TeamFoundation.Pipelines.Server.Providers.TfsGitProvider.TfsGitConnectionCreator.IsProviderDefinition(IVssRequestContext requestContext, BuildDefinition definition)
at Microsoft.TeamFoundation.Pipelines.Server.Extensions.BuildCompletedEventListener2.HandleCompletedEvent(IVssRequestContext requestContext, IReadOnlyBuildData build, BuildDefinition definition)
at Microsoft.TeamFoundation.Pipelines.Server.Extensions.BuildCompletedEventListener.ProcessEvent(IVssRequestContext requestContext, NotificationType notificationType, Object notificationEvent, Int32& statusCode, String& statusMessage, ExceptionPropertyCollection& properties)
at Microsoft.TeamFoundation.Framework.Server.TeamFoundationEventService.SubscriptionList.Notify(IVssRequestContext requestContext, NotificationType notificationType, Object notificationEventArgs, String& statusMessage, ExceptionPropertyCollection& properties, Exception& exception)
task.json:
{
"id": "25156245-9317-48e2-bcf4-7dab4c130a3e",
"name": "ping-pong-build-trigger",
"friendlyName": "Ping Pong Build Trigger",
"description": "Randomly trigger builds to find a sequenced build order",
"helpMarkDown": "https://github.com/brunomartinspro/Ping-Pong-Build-Trigger-AzureDevOps",
"category": "Build",
"author": "Bruno Martins (brunomartins.pro)",
"version": {
"Major": 1,
"Minor": 0,
"Patch": 0
},
"instanceNameFormat": "Ping Pong Build Trigger",
"properties": {
"mode": {
"type": "string",
"description": "Mode to be used",
"label": "Mode",
"required": "true"
},
"apiKey": {
"type": "string",
"label": "PAT",
"defaultValue": "",
"description": "Personal Access Token.",
"required": "true"
},
"source": {
"type": "string",
"label": "AzureDevOps Project URI",
"defaultValue": "http://kamina.azuredevops.local/DefaultCollection/Kamina",
"description": "AzureDevOps Project URI.",
"required": "true"
},
"projectName": {
"type": "string",
"label": "AzureDevOps Project Name",
"defaultValue": "Kamina",
"description": "AzureDevOps Project Name.",
"required": "true"
},
"sourceBranch": {
"type": "string",
"label": "Git Source Branch",
"defaultValue": "develop",
"description": "The branch the builds will trigger",
"required": "true"
},
"lastKnownFile": {
"type": "string",
"label": "Sequence Location",
"defaultValue": "",
"description": "The location of the Build Order.",
"required": "true"
},
"maxErrorCycles": {
"type": "int",
"label": "Maximum Error Cycles",
"defaultValue": 10,
"description": "The number of fails allowed.",
"required": "true"
},
"infiniteCycles": {
"type": "string",
"label": "Infinite Cycles",
"defaultValue": "false",
"description": "Infinite Cycles - only ends until everything succeeds.",
"required": "true"
}
},
"inputs": [{
"name": "mode",
"type": "string",
"label": "Mode",
"defaultValue": "AzureDevOps",
"helpMarkDown": "Mode to be used.",
"required": "true"
},
{
"name": "apiKey",
"type": "string",
"label": "PAT",
"defaultValue": "",
"helpMarkDown": "Personal Access Token.",
"required": "true"
},
{
"name": "source",
"type": "string",
"label": "AzureDevOps Project URI",
"defaultValue": "http://kamina.azuredevops.local/DefaultCollection/Kamina",
"helpMarkDown": "AzureDevOps Project URI.",
"required": "true"
},
{
"name": "projectName",
"type": "string",
"label": "AzureDevOps Project Name",
"defaultValue": "Kamina",
"helpMarkDown": "AzureDevOps Project Name.",
"required": "true"
},
{
"name": "sourceBranch",
"type": "string",
"label": "Git Source Branch",
"defaultValue": "develop",
"helpMarkDown": "The branch the builds will trigger",
"required": "true"
},
{
"name": "lastKnownFile",
"type": "string",
"label": "Sequence Location",
"defaultValue": "",
"helpMarkDown": "The location of the Build Order.",
"required": "true"
},
{
"name": "maxErrorCycles",
"type": "int",
"label": "Maximum Error Cycles",
"defaultValue": 10,
"helpMarkDown": "The number of fails allowed.",
"required": "true"
},
{
"name": "infiniteCycles",
"type": "string",
"label": "Infinite Cycles",
"defaultValue": "false",
"helpMarkDown": "Infinite Cycles - only ends until everything succeeds.",
"required": "true"
}
],
"execution": {
"PowerShell": {
"target": "ping-pong-build-trigger.ps1",
"argumentFormat": ""
}
}
}
vss-extension.json
{
"manifestVersion": 1,
"id": "ping-pong-build-trigger-task",
"name": "Ping Pong Build Trigger",
"version": "1.0.0",
"publisher": "BrunoMartinsPro",
"targets": [{
"id": "Microsoft.VisualStudio.Services"
}],
"description": "Randomly trigger builds to find a sequenced build order",
"categories": [
"Azure Pipelines"
],
"icons": {
"default": "extensionIcon.png"
},
"files": [{
"path": "task"
}],
"contributions": [{
"id": "ping-pong-build-trigger",
"type": "ms.vss-distributed-task.task",
"targets": [
"ms.vss-distributed-task.tasks"
],
"properties": {
"name": "task"
}
}]
}
How can I use a custom task in both Services and Server?
The .vsix can be downloaded from the releases page of the GitHub repository: https://github.com/brunomartinspro/Ping-Pong-Build-Trigger-AzureDevOps
Developer Community: https://developercommunity.visualstudio.com/content/problem/715570/server-and-services-have-different-behavior.html
So it appears that there is some sort of caching mechanism for extensions; I needed three Azure DevOps Server installations to debug this.
The first was used for development, the second also for development but with the extension uninstalled and reinstalled, and the third for testing public releases.
I couldn't find the physical directory where the cache is stored, if there is a cache at all.
I am processing two different Avro files:
avroConsumer:
{"namespace": "autoGenerated.avro",
"type": "record",
"name": "UserConsumer",
"fields": [
{"name": "Name", "type": "string"},
{"name": "Surname", "type":["null","string"],"default": null},
{"name": "favorite_number", "type": ["long", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
avroProducer:
{"namespace": "autoGenerated.avro",
"type": "record",
"name": "UserProducer",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
During this procedure a deserialization error occurs, but I thought that defining the "default" attribute in the consumer schema would make it work correctly.
Reference: http://avro.apache.org/docs/current/spec.html#Schema+Resolution
if the reader's record schema has a field that contains a default
value, and writer's schema does not have a field with the same name,
then the reader should use the default value from its field.
Do you have any ideas? Can I define a consumer Avro schema that is different from the producer Avro schema?
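For what it's worth, here is how the two schemas could be checked against each other using Avro's built-in compatibility check; this is only a sketch, and the .avsc file names are placeholders.
import java.io.File
import org.apache.avro.{Schema, SchemaCompatibility}

// Load the reader (consumer) and writer (producer) schemas and let Avro report
// whether the reader can resolve data written with the writer's schema.
val readerSchema = new Schema.Parser().parse(new File("avroConsumer.avsc"))
val writerSchema = new Schema.Parser().parse(new File("avroProducer.avsc"))

val compatibility = SchemaCompatibility.checkReaderWriterCompatibility(readerSchema, writerSchema)
println(compatibility.getType)        // COMPATIBLE or INCOMPATIBLE
println(compatibility.getDescription) // explains which resolution rule fails, e.g. a reader field with no default and no writer counterpart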
We have contextBroker version 1.0.0, and yesterday we got the unexpected error below.
log directory: '/var/log/contextBroker'
terminate called after throwing an instance of 'mongo::AssertionException'
what(): assertion src/mongo/bson/bsonelement.cpp:392
log directory:'/var/log/contextBroker'
Could someone please tell us why this could have happened?
The request that we send is the following:
HttpUri=http://172.21.0.33:1026/v1/updateContext
HttpMethod=POST
Accept=application/json
Content-Type=application/json
Fiware-Service=tmp_015_adapter
Fiware-ServicePath=/Prueba/Planta_3
{
"contextElements": [{
"type": "device_reading",
"isPattern": "false",
"id": "xxxxx",
"attributes": [{
"name": "timestamp",
"type": "string",
"value": "2016-06-14T12:02:03.000Z"
}, {
"name": "location",
"type": "coords",
"value": "23.295132549930024, 2.1797946491494258"
}, {
"name": "mac",
"type": "string",
"value": "xxxxx"
}, {
"name": "densityPlans",
"type": "string",
"value": "R-B2"
}, {
"name": "floor",
"type": "string",
"value": "Prueba_planta3"
}, {
"name": "manufacturer",
"type": "string",
"value": "Xiaomi+Communications"
}, {
"name": "rssi",
"type": "string",
"value": "9"
}]
}],
"updateAction": "APPEND" }
The error that shows up in our application as a result of contextBroker's error is:
org.apache.http.NoHttpResponseException: 172.21.0.33:1026 failed to respond