Getting null.txt file when using Cygnus HDFS sink - fiware-orion

I'm using Cygnus 0.5 with default configuration for HDFS sink. In order make it to run, I have deactivated the "ds" interceptor (otherwise I get an error at start time that precludes Cygnus to start, related with not finding the matching table file).
Cygnus seems to work, but the file in which entity information is stored in HDFS gets a weird name: "null.txt". How can I fix this?

First of all do no deactivate the DestinationExtractor interceptor. This is the piece of code infering the destination the context data notified by Orion is going to be persisted. Please observe the destination may refer to a HDFS file name, a MySQL table name or a CKAN resource name, it depends on the sinks you have configured. Once infered, the destination is added to the internal Flume event as a header called destination in order the sinks know where to persist. Thus, if deactivated, such a header is not found by the sinks and a null name is used as the destination name.
Regarding the "matching table file not found" problem you experienced (and which leaded you yo deactivate the Interceptor), it was due to the Cygnus configuration template had a bad default value for the cygnusagent.sources.http-source.interceptors.de.matching_table parameter. This has been solved in Cygnus 0.5.1.

A workaround while Cygnus 0.5.1 gets released is:
Do no deactivate the DestinatonExtractor (as #frb says in his answer)
Create an empty matching table file and use it for the matching_table configuration, i.e.: touch /tmp/dummy_table.conf then set in the cygnus configuration file: cygnusagent.sources.http-source.interceptors.de.matching_table = /tmp/dummy_table.conf

Related

Subject does not have subject-level compatibility configured

We use Kafka, Kafka connect and Schema-registry in our stack. Version is 2.8.1(Confluent 6.2.1).
We use Kafka connect's configs(key.converter and value.converter) with value: io.confluent.connect.avro.AvroConverter.
It registers a new schema for topics automatically. But there's an issue, AvroConverter doesn't specify subject-level compatibility for a new schema
and the error appears when we are trying to get config for the schema via REST API /config: Subject 'schema-value' does not have subject-level compatibility configured
If we specify the request parameter defaultToGlobal then global compatibility is returned. But it doesn't work for us because we cannot specify it in the request. We are using 3rd party UI: AKHQ.
How can I specify subject-level compatibility when registering a new schema via AvroConverter?
Last I checked, the only properties that can be provided to any of the Avro serializer configs that affect the Registry HTTP client are the url, whether to auto-register, and whether to use the latest schema version.
There's no property (or even method call) that sets either the subject level or global config during schema registration
You're welcome to check out the source code to verify this
But it doesn't work for us because we cannot specify it in the request. We are using 3rd party UI: AKHQ
Doesn't sound like a Connect problem. Create a PR for AKHQ project to fix the request
As of 2021-10-26, I used akhq 0.18.0 jar and confluent-6.2.0, the schema registry in akhq is working fine.
Note: I also used confluent-6.2.1, seeing exactly the same error. So, you may want to switch back to 6.2.0 to give a try.
P.S: using all only for my local dev env, VirtualBox, Ubuntu.
#OneCricketeer is correct.
There is no possibility to specify subject-level compatibility in AvroConverter unfortunately.
I see only two solutions:
Override AvroConverter to add property and functionality to send an additional request to API /config/{subject} after registering the schema.
Contribute to AKHQ to support defaultToGlobal parameter. But in this case, we also need to backport schema-registry RestClient. Github issue
The second solution is more preferable till the user would specify the compatibility level in the settings of the converter. Without this setting in the native AvroConverter, we have to use the custom converter for every client who writes a schema. And it makes a lot of effort.
For me, it looks strange why the client cannot set up the compatibility at the moment of registering the schema and has to use a different request for it.

Debezium Server and using variables in the application.properties file

I'm trying to get Debezium Server running so that I can use GCP (Google) PubSub, and not have to use Kafka and the Kafka connectors. I have it mostly running, however, I'm having trouble with the using variables in the tranforms section to define a Topic name.
According to the documentation, when using the Outbox transformation, I can choose the Topic name by using the variable ${routedByValue} for the setting route.topic.replacement and this will use the value that is determined by the setting route.by.field. If the replacement setting is omitted, it will use a default topic name of outbox.event.<route.by.field value>.
When I try to use this variable in the 'application.properties' file ...
debezium.transforms.outbox.route.by.field=aggregate_type
debezium.transforms.outbox.route.topic.replace=${routedByValue}
... the Debezium Server stops with a NoSuchElementException, saying it cannot expand routedByValue. If I omit that setting, it works fine and defines the topic name as outbox.event.<route.by.field value>.
How can I use this variable correctly in the 'applications.properties' file so I can customise the topic name (e.g. route.topic.replace=myservice.${routedByValue})?
The way I got this to work was to do the following ...
debezium.transforms.outbox.route.by.file=aggregate_type
debezium.transforms.outbox.route.topic.replacement=$1
I believe this works because omitted from the config is another setting - debezium.transforms.outbox.route.topic.regex - and this has a default value of - (?<routedByValue>.*).
If I understand the documentation correctly, the $1 refers to the first group in the regex expression. In my case, this will return whatever the value of aggregrate_type equates to.
I'm using Debezium Server 2.1 with Pulsar as sink type and the #Dazfl answer solve my issue !
debezium.transforms=outbox
debezium.transforms.outbox.type=io.debezium.transforms.outbox.EventRouter
debezium.transforms.outbox.route.topic.replacement=outbox.event.transactions.$1
Although the Debezium Server docs says to use $routedByValue, this do not works as expeceted...

Creating and using a custom kafka connect configuration provider

I have installed and tested kafka connect in distributed mode, it works now and it connects to the configured sink and reads from the configured source.
That being the case, I moved to enhance my installation. The one area I think needs immediate attention is the fact that to create a connector, the only available mean is through REST calls, this means I need to send my information through the wire, unprotected.
In order to secure this, kafka introduced the new ConfigProvider seen here.
This is helpful as it allows to set properties in the server and then reference them in the rest call, like so:
{
.
.
"property":"${file:/path/to/file:nameOfThePropertyInFile}"
.
.
}
This works really well, just by adding the property file on the server and adding the following config on the distributed.properties file:
config.providers=file # multiple comma-separated provider types can be specified here
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
While this solution works, it really does not help to easy my concerns regarding security, as the information now passed from being sent over the wire, to now be seating on a repository, with text on plain sight for everyone to see.
The kafka team foresaw this issue and allowed clients to produce their own configuration providers implementing the interface ConfigProvider.
I have created my own implementation and packaged in a jar, givin it the sugested final name:
META-INF/services/org.apache.kafka.common.config.ConfigProvider
and added the following entry in the distributed file:
config.providers=cust
config.providers.cust.class=com.somename.configproviders.CustConfigProvider
However I am getting an error from connect, stating that a class implementing ConfigProvider, with the name:
com.somename.configproviders.CustConfigProvider
could not be found.
I am at a loss now, because the documentation on their site is not explicit about how to configure custom config providers very well.
Has someone worked on a similar issue and could provide some insight into this? Any help would be appreciated.
I just went through these to setup a custom ConfigProvider recently. The official doc is ambiguous and confusing.
I have created my own implementation and packaged in a jar, givin it the sugested final name:
META-INF/services/org.apache.kafka.common.config.ConfigProvider
You could name the final name of jar whatever you like, but needs to pack to jar format which has .jar suffix.
Here is the complete step by step. Suppose your custom ConfigProvider fully-qualified name is com.my.CustomConfigProvider.MyClass.
1. create a file under directory: META-INF/services/org.apache.kafka.common.config.ConfigProvider. File content is full qualified class name:
com.my.CustomConfigProvider.MyClass
Include your source code, and above META-INF folder to generate a Jar package. If you are using Maven, file structure looks like this
put your final Jar file, say custom-config-provider-1.0.jar, under the Kafka worker plugin folder. Default is /usr/share/java. PLUGIN_PATH in Kafka worker config file.
Upload all the dependency jars to PLUGIN_PATH as well. Use the META-INFO/MANIFEST.MF file inside your Jar file to configure the 'ClassPath' of dependent jars that your code will use.
In kafka worker config file, create two additional properties:
CONNECT_CONFIG_PROVIDERS: 'mycustom', // Alias name of your ConfigProvider
CONNECT_CONFIG_PROVIDERS_MYCUSTOM_CLASS:'com.my.CustomConfigProvider.MyClass',
Restart workers
Update your connector config file by curling POST to Kafka Restful API. In Connector config file, you could reference the value inside ConfigData returned from ConfigProvider:get(path, keys) by using the syntax like:
database.password=${mycustom:/path/pass/to/get/method:password}
ConfigData is a HashMap which contains {password: 123}
If you still seeing ClassNotFound exception, probably your ClassPath is not setup correctly.
Note:
• If you are using AWS ECS/EC2, you need to set the worker config file by setting the environment variable.
• worker config and connector config file are different.

Kafka Connect - Missing Text

Kafka Version : 2.12-2.1.1
I created a very simple example to create a source and sink connector by using following commands :
bin\windows\connect-standalone.bat config\connect-standalone.properties config\connect-file-source.properties config\connect-file-sink.properties
Source file name : text_2.txt
Sink file name : test.sink_2.txt
A topic named "connect-test-2" is used and I created a consumer in PowerShell to show the result.
It works perfect at the first time. However, after i reboot my machine and start everything again. I found that some text are missing.
For example, when I type the characters below into test_2.txt file and save as following:
HAHAHA..
missing again
some text are missing
I am able to enter text
first letter is missing
testing testing.
The result windows (Consumer) and the sink file shows the following:
As you can see, some text are missing and i cannot find out why this happen. Any advice?
[Added information below]
connect-file-source.properties
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test_2.txt
topic=connect-test-2
connect-file-sink.properties
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=test.sink_2.txt
topics=connect-test-2
I think the strange behaviour is caused be the way you are modifying the sink file (text_2.txt).
How you applied changes after stoping the connector:
Using some editor <-- I think you use that method
Append only new characters to the end of file
FileStreamSource track changes based on the position in the file. You are using Kafka Connect in standalone mode so current position is written in /tmp/connect.offsets file.
If you modify source file using the editor, the whole content of the file has been changed. However FileStreamSource checks only if the size has change and poll characters, which offsets in the file is bigger then last processed by the Connector.
You should modify source file only by appending new characters to the end of the file

IBM Integration Bus: The PIF data could not be found for the specified application

I'm using IBM Integration Bus v10 (previously called IBM Message Broker) to expose COBOL routines as SOAP Web Services.
COBOL routines are integrated into IIB through MQ queues.
We have imported some COBOL copybooks as DFDL schemas in IIB, and the mapping between SOAP messages and DFDL messages is working fine.
However, when the message reaches a node where a serialization of the message tree has to take place (for example, a FileOutput or a MQ request), it fails with the following error:
"The PIF data could not be found for the specified application"
This is the last part of the stack trace of the exception:
RecoverableException
File:CHARACTER:F:\build\slot1\S000_P\src\DataFlowEngine\TemplateNodes\ImbOutputTemplateNode.cpp
Line:INTEGER:303
Function:CHARACTER:ImbOutputTemplateNode::processMessageAssemblyToFailure
Type:CHARACTER:ComIbmFileOutputNode
Name:CHARACTER:MyCustomFlow#FCMComposite_1_5
Label:CHARACTER:MyCustomFlow.File Output
Catalog:CHARACTER:BIPmsgs
Severity:INTEGER:3
Number:INTEGER:2230
Text:CHARACTER:Caught exception and rethrowing
Insert
Type:INTEGER:14
Text:CHARACTER:Kcilmw20Flow.File Output
ParserException
File:CHARACTER:F:\build\slot1\S000_P\src\MTI\MTIforBroker\DfdlParser\ImbDFDLWriter.cpp
Line:INTEGER:315
Function:CHARACTER:ImbDFDLWriter::getDFDLSerializer
Type:CHARACTER:ComIbmSOAPInputNode
Name:CHARACTER:MyCustomFlow#FCMComposite_1_7
Label:CHARACTER:MyCustomFlow.SOAP Input
Catalog:CHARACTER:BIPmsgs
Severity:INTEGER:3
Number:INTEGER:5828
Text:CHARACTER:The PIF data could not be found for the specified application
Insert
Type:INTEGER:5
Text:CHARACTER:MyCustomProject
It seems like something is missing in my deployable BAR file. It's important to say that my application has the message flow and it depends on a shared library that has all the .xsd files (DFDLs).
I suppose that the schemas are OK, as I've generated them using the Toolkit wizard, and the message parsing works well. The problem is only with serialization.
Does anybody know what may be missing here?
OutputRoot.Properties.MessageType must contain the name of the message in the DFDL schema. Additionally when the DFDL schema is in a shared library, OutputRoot.Properties.MessageSet must contain the name of the library.
Sounds as if OutputRoot.Properties is not pointing at the shared library. I cannot remember which subfield does that job - it is either OutputRoot.Properties.MessageType or OutputRoot.Properties.MessageSet.
You can easily check - just check the contents of InputRoot.Properties after an input node that has used the same shared libary.
Faced a similar problem. In my case, a message flow with an HttpRequest node using a DFDL domain parser / format to parse an HTTP response from the remote system threw this error (PIF data could not be found for the specified application). "Re-selecting" the same parser domain & message type on the node followed by build / redeploy solved the problem. Seemed to be a project reference related issue within the IIB toolkit.
you need to create static libraries and refer to application.
in compute node ur coding is based on dfdl body