How to enable a custom stage library for StreamSets Data Collector?

I have a custom stage library which I want to use in a StreamSets Data Collector pipeline. I have followed all the steps given in the link below to install the custom stage library, but I am still not able to find the stage library in Data Collector.
Could you please help?
StreamSets documentation link:
docs.streamsets.com/datacollector/latest/help/datacollector/UserGuide/Configuration/CustomStageLibraries.html#concept_pmc_jk1_1x

There are only a few things to check:
First, ensure you've exported the USER_LIBRARIES_DIR environment variable so SDC can find the custom stage libraries:
export USER_LIBRARIES_DIR="/opt/sdc-user-libs/"
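The expected layout under that directory is one subdirectory per custom library, with the jars in a lib folder. A sketch, with placeholder library and jar names:
/opt/sdc-user-libs/
  my-custom-stage/          # one directory per stage library (placeholder name)
    lib/
      my-custom-stage-1.0.jar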
Then give the custom stage library the appropriate permissions in the JVM's security manager, by adding a section like this example to the $SDC_CONF/sdc-security.policy file:
// custom stage library directory
grant codebase "file:///opt/sdc-user-libs/-" {
permission java.security.AllPermission;
};
Verify that you've restarted SDC to pick up the changes above.
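How to restart depends on how SDC was installed; for example (assuming either a service install named sdc or a tarball install with $SDC_DIST set):
# service / package install
sudo systemctl restart sdc
# tarball install: stop the running process, then start it again in the foreground
$SDC_DIST/bin/streamsets dc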
After restarting SDC, ensure the custom stage's jar files have been opened by the JVM:
lsof -p <sdc pid> | grep <jar file name>
These steps are covered in more detail in the documentation linked in the question.

Related

De-serialize JSON metadata to .qvf using Qlik Sense API

I am aware of the Qlik Sense serialize app, where we generate a JSON object containing the metadata information of a .qvf file using the Qlik Sense API.
I want to do the reverse operation, i.e. generate the .qvf file back from the JSON metadata.
After much research I found only this GitHub link, and it does not have complete information.
Any solution would be helpful.
Technically you can't create a .qvf directly from JSON. You'll have to create an empty .qvf and then use the various APIs to import the JSON.
Qlik has a very nice tool for unbuilding/building apps (and more). qlik-cli has dedicated commands for unbuild/build.
If you are looking for something more "programmable", then I've created an enigma.js mixin for the same purpose - enigma-mixin. I still need to perform more detailed testing there, but it was working OK in simpler tests.
Update 08/10/2021
Using qlik-cli:
First, set up a context so qlik-cli points at your Qlik instance.
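A minimal sketch, assuming a recent qlik-cli (qlik context init walks you through the server URL and credentials interactively):
# one-time interactive setup of a context
qlik context init
# verify which contexts are configured
qlik context ls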
Then unbuild an app:
qlik app unbuild --app 11111111-2222-3333-4444-555555555555
This will create a new folder in the current directory named <app_name>-unbuild. The folder will contain all the info about the app in JSON and/or YAML files.
Once these files are available, you can use them to build another app. Note that the target app should exist before the build is run:
qlik.exe app build --config ./config.yml --app 55555555-4444-3333-2222-111111111111
The above command will use all the available files (specified in config.yml) and update the target app.
If you don't want all the files to be used and only want to update the data connections, for example, then the build command can be run with different arguments:
qlik.exe app build --connections ./connections.yml --app 55555555-4444-3333-2222-111111111111
This command will only update the data connections in the target app and will not update anything else.

Creating and using a custom Kafka Connect configuration provider

I have installed and tested Kafka Connect in distributed mode; it works now, connecting to the configured sink and reading from the configured source.
That being the case, I moved on to enhancing my installation. One area I think needs immediate attention is the fact that the only available means to create a connector is through REST calls; this means I need to send my information over the wire, unprotected.
In order to secure this, Kafka introduced the new ConfigProvider, seen here.
This is helpful, as it allows you to set properties on the server and then reference them in the REST call, like so:
{
  ...
  "property": "${file:/path/to/file:nameOfThePropertyInFile}"
  ...
}
This works really well: just add the property file on the server and the following config to the distributed.properties file:
config.providers=file # multiple comma-separated provider types can be specified here
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
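For illustration, the placeholder above resolves against a plain Java properties file; the path and key come from the example, the value is made up:
# contents of /path/to/file (a Java properties file)
nameOfThePropertyInFile=my-secret-value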
While this solution works, it really does not ease my concerns regarding security: the information has gone from being sent over the wire to sitting in a repository, in plain text for everyone to see.
The Kafka team foresaw this issue and allows clients to provide their own configuration providers by implementing the ConfigProvider interface.
I have created my own implementation and packaged it in a jar, giving it the suggested final name:
META-INF/services/org.apache.kafka.common.config.ConfigProvider
and added the following entries in the distributed properties file:
config.providers=cust
config.providers.cust.class=com.somename.configproviders.CustConfigProvider
However I am getting an error from connect, stating that a class implementing ConfigProvider, with the name:
com.somename.configproviders.CustConfigProvider
could not be found.
I am at a loss now, because the documentation on their site is not very explicit about how to configure custom config providers.
Has someone worked on a similar issue and could provide some insight into this? Any help would be appreciated.
I just went through this recently to set up a custom ConfigProvider. The official doc is ambiguous and confusing.
I have created my own implementation and packaged it in a jar, giving it the suggested final name:
META-INF/services/org.apache.kafka.common.config.ConfigProvider
You can name the jar whatever you like, but it needs to be packed in jar format, with a .jar suffix.
Here is the complete step-by-step. Suppose your custom ConfigProvider's fully-qualified name is com.my.CustomConfigProvider.MyClass.
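For reference, a minimal sketch of what such a class can look like; the interface and its methods are the real Kafka API, while the lookup logic is illustrative only:
// Package/class names follow the example above; the resolution logic is a placeholder.
package com.my.CustomConfigProvider;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import org.apache.kafka.common.config.ConfigData;
import org.apache.kafka.common.config.provider.ConfigProvider;

public class MyClass implements ConfigProvider {

    @Override
    public void configure(Map<String, ?> configs) {
        // Receives any config.providers.<alias>.param.* settings from the worker config.
    }

    @Override
    public ConfigData get(String path) {
        return get(path, Collections.emptySet());
    }

    @Override
    public ConfigData get(String path, Set<String> keys) {
        // Resolve the requested keys for the given path, e.g. by decrypting
        // a file or calling a secrets service (placeholder logic here).
        Map<String, String> data = new HashMap<>();
        data.put("password", "123"); // matches the ConfigData example further down
        return new ConfigData(data);
    }

    @Override
    public void close() {
        // Release any resources acquired in configure().
    }
}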
1. Create a file under the directory META-INF/services/, named after the interface's fully-qualified name: org.apache.kafka.common.config.provider.ConfigProvider (note the provider package; that is where the interface actually lives). The file content is the fully-qualified class name of your implementation:
com.my.CustomConfigProvider.MyClass
2. Package your source code and the above META-INF folder into a jar. If you are using Maven, the file structure looks like the sketch below.
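A sketch of the Maven layout (the services file under src/main/resources ends up under META-INF/services/ inside the jar):
src/main/java/com/my/CustomConfigProvider/MyClass.java
src/main/resources/META-INF/services/org.apache.kafka.common.config.provider.ConfigProvider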
3. Put your final jar file, say custom-config-provider-1.0.jar, under the Kafka worker plugin folder (the default is /usr/share/java; it is set by the plugin path in the Kafka worker config file).
4. Upload all the dependency jars to the plugin path as well. Use the META-INF/MANIFEST.MF file inside your jar to configure the Class-Path of the dependent jars that your code will use.
5. In the Kafka worker config, create two additional properties, shown here in their environment-variable form (e.g. for a Docker/ECS deployment):
CONNECT_CONFIG_PROVIDERS: 'mycustom'  # alias name of your ConfigProvider
CONNECT_CONFIG_PROVIDERS_MYCUSTOM_CLASS: 'com.my.CustomConfigProvider.MyClass'
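In a plain worker .properties file (rather than environment variables), the equivalent entries would be:
config.providers=mycustom
config.providers.mycustom.class=com.my.CustomConfigProvider.MyClass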
6. Restart the workers.
7. Update your connector config by POSTing it to the Kafka Connect REST API. In the connector config you can reference values inside the ConfigData returned from ConfigProvider.get(path, keys), using syntax like:
database.password=${mycustom:/path/pass/to/get/method:password}
Here the ConfigData wraps a map containing {password: 123}.
If you are still seeing a ClassNotFoundException, your classpath is probably not set up correctly.
Note:
• If you are using AWS ECS/EC2, you need to set the worker config via environment variables.
• The worker config and the connector config file are different things.

Azure Batch application package not getting copied to working directory of task

I have created an Azure Batch pool with a Linux machine and specified an application package for the pool.
My command line is:
command='python $AZ_BATCH_APP_PACKAGE_scriptv1_1/tasks/XXX/get_XXXXX_data.py'
It fails with:
python3: can't open file '$AZ_BATCH_APP_PACKAGE_scriptv1_1/tasks/XXX/get_XXXXX_data.py':
[Errno 2] No such file or directory
When I connect to the node and look at the working directory, none of the application package files are present there.
How do I make sure that the files from the application package are available in the working directory, or how can I invoke/execute files under the application package from the command line?
Make sure that your async operations have proper awaits in place before you start using the package in your code.
Also, please share your design / pseudo-code and how you are approaching it.
Further to add:
This seems to be a pool-level package.
The error suggests that the application environment variable is either used incorrectly or there is some other user-level issue. Please check out the link below, especially the section where the use of environment variables is mentioned.
This seems like a user-level issue, because if there were an error downloading the package resource, it would be visible to you via your exception handling, or at the tool level if you are using Batch Explorer / BatchLabs.
https://learn.microsoft.com/en-us/azure/batch/batch-application-packages
Rationale:
If the pool-level or task-level application has an error, an error list comes back; an error in the application package is returned as a UserError or an AppPackageError, which will be visible in the code's exception handling.
Also, you can always RDP into your node and check the package availability; information here: https://learn.microsoft.com/en-us/azure/batch/batch-api-basics#connecting-to-compute-nodes
I once created a small sample to help people out, so that resource might help you check out the usage here.
Hope this helps.
On Linux, the application package environment variable (including the version string) is formatted as:
AZ_BATCH_APP_PACKAGE_{0}_{1}
where {0} is the application name and {1} is the version. On Windows it is formatted as:
AZ_BATCH_APP_PACKAGE_APPLICATIONID#version
$AZ_BATCH_APP_PACKAGE_scriptv1_1 will take you to the root folder where the application was unzipped.
Does this "exact" path exist in that location?
tasks/XXX/get_XXXXX_data.py
You can see more information here:
https://learn.microsoft.com/en-us/azure/batch/batch-application-packages
Edit: Just saw this question: "or can I invoke/execute files under Application Package from command line"
Yes you can invoke and execute files from the application package directory with the environment variable above.
If you type env on the node you will see the environment variables that have been set.
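One caveat worth checking, since the error shows the variable literally unexpanded: the Batch task command line does not run under a shell by default, so environment variables are not expanded unless you run the command through one. A sketch, using the application/version names from the question:
# wrap the command in a shell so $AZ_BATCH_APP_PACKAGE_* is expanded
/bin/bash -c 'python3 "$AZ_BATCH_APP_PACKAGE_scriptv1_1/tasks/XXX/get_XXXXX_data.py"'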

How do I query Spark JobServer and find where it stores my Jars?

I am trying to follow this documentation:
https://github.com/spark-jobserver/spark-jobserver#dependency-jars
Option 2 listed in the docs says:

The dependent-jar-uris can also be used in job configuration param when submitting a job. On an ad-hoc context this has the same effect as dependent-jar-uris context configuration param. On a persistent context the jars will be loaded for the current job and then for every job that will be executed on the persistent context.

curl -d "" 'localhost:8090/contexts/test-context?num-cpu-cores=4&memory-per-node=512m'
OK
curl 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&context=test-context&sync=true' -d '{ dependent-jar-uris = ["file:///myjars/deps01.jar", "file:///myjars/deps02.jar"], input.string = "a b c a b see" }'

The jars /myjars/deps01.jar & /myjars/deps02.jar (present only on the SJS node) will be loaded and made available for the Spark driver & executors.
Is "file:///myjars/" directory the SJS node's JAR directory or some custom directory?
I have a client on a Windows box and a Spark JobServer on a Linux box. Next, I upload a JAR to SJS node. SJS node puts that Jar somewhere. Then, when I call to start a Job and set the 'dependent-jar-uris', the SJS node will find my previously uploaded JAR and run the job:
"dependent-jar-uris" set to "file:///tmp/spark-jobserver/filedao/data/simpleJobxxxxxx.jar"
This works fine, but I had to manually go searching around the SJS node to find this location (e.g. file:///tmp/spark-jobserver/filedao/data/simpleJobxxxxxx.jar) and then add it into my future requests to start the job.
Instead, how to I make a REST call from the client to just get the path where Spark JobServer puts my jars when I uploaded them, so that I can set the file:/// path correctly in my 'dependent-jar-uris' property dynamically?
I don't think jars uploaded using "POST /jars" can be used in dependent-jar-uris. Since you are uploading the jars yourself, you already know the local path: just use that.
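A sketch of that approach (the host name and directory are placeholders): place the dependency jars at a known path on the SJS node yourself, then reference that path, exactly as in the docs quote above:
# copy the jars to a directory you control on the SJS host
scp deps01.jar deps02.jar sjs-host:/myjars/
# reference that known path when submitting the job
curl 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&context=test-context&sync=true' -d '{ dependent-jar-uris = ["file:///myjars/deps01.jar", "file:///myjars/deps02.jar"], input.string = "a b c a b see" }'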

Database script encountered "AWKDBE018E Cannot access required JDBC Driver folder" in Workload Scheduler

I created a database script step which accesses the SQL Database service in the Workload Scheduler service. When I run the process, the step encounters the error below.
Error message:
AWKDBE018E Cannot access required JDBC Driver folder
Message information:
http://www-01.ibm.com/support/knowledgecenter/SSGSPN_9.2.0/com.ibm.tivoli.itws.doc_9.2/common/src_ms/awsmsawkdbe.htm?lang=en
AWKDBE018E Cannot access required JDBC Driver folder
Explanation
The job was not able to access a JDBC Driver folder, you might not
have enough permissions.
System action
The operation is not performed.
Operator response
Verify that you have enough permissions.
This message seems to ask me to grant the proper authority to the job user, but there is no property to specify the job user of the Workload Automation Agent. I use a Workload Automation Agent provisioned automatically by Bluemix.
Could you tell me which parameters are needed?
(Screenshots: database script step information, JDBC driver classpath info.)
I checked the path in the log of an "ls -lR" command step.
There seems to be a problem with the agent. I tried to replicate the same job type, but it fails with the same error message (even using different approaches to the JDBC driver path).
If you are using the Workload Automation Agent that was created for you, then you could open a support ticket to have the Workload team look at that agent.
Edit, after getting support from the service team:
In the jar classpath field of a predefined Workload Scheduler process, you have to put only the path to the directory containing the jar files, without the jar file name.
So, according to the current Workload Scheduler documentation, you have to use the following value:
/home/wauser/utils
This way the database script works fine.
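For reference, in the job definition this corresponds to the driverPath element (the same element used in the tests below), pointed at the directory only:
<jsdldatabase:driverPath>/home/wauser/utils</jsdldatabase:driverPath>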
It looks like it is having issues referencing the JDBC classpath location for DB2. Can you please double-check the classpath location for the DB2 driver?
Even though this is old, I wanted to make some quick checks.
This was tested on a 9.5 FP1 dynamic agent, part of the container delivery. The path values are the standard values for the container.
Try 1 - full path - SUCCESS
<jsdldatabase:driverPath>/opt/wa/TWS/jdbcdrivers/db2/</jsdldatabase:driverPath>
= Status Message: Success
= Exit Status : 0
Try 2 - relative path - FAIL
<jsdldatabase:driverPath>./jdbcdrivers/db2/</jsdldatabase:driverPath>
Job status : FAIL
===============================================================
AWKDBE018E Cannot access required JDBC Driver folder
===============================================================
Try 3 - variable in path - FAIL
<jsdldatabase:driverPath>${UNISONHOME}/jdbcdrivers/db2/</jsdldatabase:driverPath>
===============================================================
AWKDBE018E Cannot access required JDBC Driver folder
===============================================================
Try 4 - variable in path - FAIL
<jsdldatabase:driverPath>$UNISONHOME/jdbcdrivers/db2/</jsdldatabase:driverPath>
===============================================================
AWKDBE018E Cannot access required JDBC Driver folder
===============================================================
In short, you need an absolute path in that parameter.
But you can set the path in a config file that is global to the agent:
Try 5 - path in agent config - SUCCESS
Inside the IWSDATA home, in wadata/JavaExt/cfg/DatabaseJobExecutor.properties, write the following line:
jdbcDriversPath=/opt/wa/TWS/jdbcdrivers
Then remove the driver XML element from the job, so that there is no line:
<jsdldatabase:driverPath>/opt/wa/TWS/jdbcdrivers/db2/</jsdldatabase:driverPath>
===============================================================
= Exit Status : 0
Note that in this case the db2 subfolder is not needed; the agent searches subdirectories.