How do I pass key/value config settings when submitting a Job to Snappy Job Server?

I have a job that loads a data file from a different location each time. I'd like to submit the same job JAR and just pass a different location to it using the Config.java parameter of the runJavaJob() API.
I do not see a way to pass key/value configuration to the snappy-job.sh Usage.
How would I do this?

You can set the key/value pairs in the APP_PROPS environment variable before invoking snappy-job.sh. This is demonstrated in our Getting Started section.
$ export APP_PROPS="consumerKey=,consumerSecret=,accessToken=,accessTokenSecret="
$ ./bin/snappy-job.sh submit --lead localhost:8090 --app-name TwitterPopularTagsJob --class io.snappydata.examples.TwitterPopularTagsJob --app-jar ./lib/quickstart-0.2.1-PREVIEW.jar --stream
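For example, to pass a different data-file location on each submission, as in the original question, you could export your own key and read it inside the job; the property name dataLocation and the job/class/jar names below are hypothetical placeholders, not part of the quickstart:
$ export APP_PROPS="dataLocation=/data/input/part-0001.csv"
$ ./bin/snappy-job.sh submit --lead localhost:8090 --app-name MyLoadJob --class com.example.MyLoadJob --app-jar ./lib/myloadjob.jar
The submitted job can then read dataLocation from the Config object it receives.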

Related

Overriding Jmeter property in Run Taurus task Azure pipeline is not working

I am running JMeter from Taurus and I need an output kpi.jtl file with the URL listing.
I have tried passing the parameters -o modules.jmeter.properties.save.saveservice.url='true' and
-o modules.jmeter.properties="{'jmeter.save.saveservice.url':'true'}". The pipeline runs successfully, but the kpi.jtl doesn't contain the URL. Please help.
I have tried a few more options, such as editing jmeter.properties via the pipeline, which broke the pipeline and prompted for user input, and
user.properties, which was ineffective.
I am expecting a kpi.jtl file with all the possible logs, especially the URL.
I believe you're using the wrong property; you should pass the following one:
modules.jmeter.csv-jtl-flags.url=true
More information: CSV file content configuration
However, be aware that having a .jtl file "with all possible logs" is a form of performance anti-pattern, as it creates massive disk I/O and may ruin your test. More information: 9 Easy Solutions for a JMeter Load Test “Out of Memory” Failure
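As a minimal sketch, assuming Taurus is invoked from a command-line step and your Taurus config file is named test.yaml (an assumption), the override would be passed like this:
$ bzt test.yaml -o modules.jmeter.csv-jtl-flags.url=true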

Access agent hostname for a build variable

I've got release pipelines defined that have worked. I've got a config transform that will write an API URL to a config file (currently with a hardcoded API URL).
What I'd like to do is have the config be rewritten based on the agent it's being deployed on.
e.g. if the machine being deployed to is TEST-1, I'd like to write https://TEST-1.somedomain.com/api into a config using that transform step.
The .somedomain.com/api part can be static.
I've tried modifying the pipeline variable's value to be https://${{Environment.Name}}.somedomain.com/api, but it just replaces the API_URL in the config with that literal string (it does not populate the machine name in that variable).
Since variables are the source of the values written to configs during the transform, I'm struggling to see another way to do this.
Some gotchas:
I'm using non-YAML pipeline definitions (I know I've seen people put logic in variable definitions within YAML pipelines).
I can't just use localhost, as the configuration is read into a JavaScript-rich app that would have JS trying to connect to localhost rather than to the server.
I'm interested in any ways I could solve this problem.
${{Environment.Name}} is not valid syntax for either YAML or classic pipelines.
In classic pipelines it would be $(Environment.Name).
In YAML, $(Environment.Name) or ${{ variables['Environment.Name'] }} would work.
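For example, in a classic pipeline script step the macro syntax is expanded by Azure DevOps before the shell runs, so a sketch like the following (assuming a variable named Environment.Name is actually defined for your job) would print the resolved URL:
# Azure DevOps substitutes $(Environment.Name) in the script body before execution; the shell never sees the macro itself.
echo "https://$(Environment.Name).somedomain.com/api"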

Error in Google Cloud Shell Commands while working on the lab (Securing Google Cloud with CFT Scorecard)

I am working in a GCP lab (Securing Google Cloud with CFT Scorecard). All instructions for the lab are given.
First I have to run the following two commands to set environment variables
export GOOGLE_PROJECT=$DEVSHELL_PROJECT_ID
export CAI_BUCKET_NAME=cai-$GOOGLE_PROJECT
In the second command given above, I don't know what to replace with my own credentials. Maybe that is why I am getting an error.
Now I have to enable the "cloudasset.googleapis.com" gcloud service. For this they gave the following command:
gcloud services enable cloudasset.googleapis.com \
--project $GOOGLE_PROJECT
The error for this is shown in the screenshot attached herewith:
Error in the service enabling command
The next step is to clone the policy library. The given command for that is:
git clone https://github.com/forseti-security/policy-library.git
After that they said: "You realize Policy Library enforces policies that are located in the policy-library/policies/constraints folder, in which case you can copy a sample policy from the samples directory into the constraints directory".
and gave this command:
cp policy-library/samples/storage_blacklist_public.yaml policy-library/policies/constraints/
On running this command I received this:
Error on running the directory command
Finally they said "Create the bucket that will hold the data that Cloud Asset Inventory (CAI) will export" and gave the following command:
gsutil mb -l us-central1 -p $GOOGLE_PROJECT gs://$CAI_BUCKET_NAME
I am confused about where to put my own credentials; for example, in place of the project ID I wrote my own project ID.
Also, I don't know why these errors are occurring. Kindly help me.
I'm unable to access the tutorial.
What happens if you run the following:
echo ${DEVSHELL_PROJECT_ID}
I suspect you'll get an empty result because I think this environment variable isn't actually set.
I think it should be:
echo ${DEVSHELL_GCLOUD_CONFIG}
Does that return a result?
If so, perhaps try using that variable instead:
export GOOGLE_PROJECT=${DEVSHELL_GCLOUD_CONFIG}
export CAI_BUCKET_NAME=cai-${GOOGLE_PROJECT}
It's not entirely clear to me why this tutorial is using this approach but, if the above works, it may get you further along.
Were you asked to create a Google Cloud Platform project?
As per the shared error, this seems to be because your env variable GOOGLE_PROJECT is not set. You can verify it by using echo $GOOGLE_PROJECT and seeing whether it returns the project ID or not. You could also use echo $DEVSHELL_PROJECT_ID. If that returns the project ID and the former doesn't, it means that you didn't export the variable as stated at the beginning.
If the problem is that GOOGLE_PROJECT doesn't have any value, there are different approaches on how to solve it.
Set the env variable as you explained at the beginning. Obviously this will only work if the variable DEVSHELL_PROJECT_ID is also set.
export GOOGLE_PROJECT=$DEVSHELL_PROJECT_ID
Manually set the project ID into that variable. This is far from ideal because in Qwiklabs they create a new temporal project on every lab, so this would've only worked if you were still on that project. The project ID can be seen on both of your shared screenshots.
export GOOGLE_PROJECT=qwiklabs-gcp-03-c6e1787dc09e
Avoid using the argument --project. According to the documentation, the aforementioned argument is optional and if none is used the command will take the one by default, which will be on the configuration settings. You can get the current project by using this:
gcloud config get-value project
If the previous command matches the project ID you want to use, you can simply issue the following command:
gcloud services enable cloudasset.googleapis.com
Notice that the project ID is not being explicitly mentioned using --project.
Regarding your issue with the GitHub file, I have checked the repository and the file storage_blacklist_public.yaml doesn't seem to be in the policy-library/samples directory. There seems to be a trace that it was once there, but it isn't anymore; they should probably update the lab.
About your credentials confusion: you don't have to use your own project ID, just the one given in your lab. If I recall correctly, all the needed data should be on the left side of the lab. Still, you shouldn't need to authenticate in a normal situation, as you are already logged into your temporary project if you are accessing it from Cloud Shell, which is where you should be doing all this.
Adding this for later versions:
In Cloud Shell you can set a temporary variable for the current project ID with
PROJECT_ID="$(gcloud config get-value project)"
and then use it like
--project ${PROJECT_ID}
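Putting the suggestions above together, a minimal end-to-end sketch for the lab (region and bucket naming taken from the question) would be:
PROJECT_ID="$(gcloud config get-value project)"   # project currently active in Cloud Shell
export GOOGLE_PROJECT="${PROJECT_ID}"
export CAI_BUCKET_NAME="cai-${GOOGLE_PROJECT}"
gcloud services enable cloudasset.googleapis.com --project "${GOOGLE_PROJECT}"
gsutil mb -l us-central1 -p "${GOOGLE_PROJECT}" "gs://${CAI_BUCKET_NAME}"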

Use log4j to log message in liberty console

Our log server consumes our log messages from Kubernetes pod stdout, formatted in JSON, and indexes the JSON fields.
We need to specify some predefined fields in the messages so that we can track transactions across pods.
For one of our pods we use the Liberty profile and are having trouble configuring logging for these needs.
One idea was to use log4j to send customized JSON messages to the console. But all messages are altered by the Liberty logging system, which handles and modifies everything logged to the console. I failed to configure the Liberty logging parameters (copySystemStreams = false, console log level = NO) for my needs; Liberty always modifies my output and interleaves non-JSON messages.
To work around all that I used the Liberty consoleFormat="json" logging parameter, but this introduced unnecessary fields and still does not allow me to specify my custom fields.
Is it possible to control Liberty logging and the console?
What is the best way to handle my use case with Liberty (and, if possible, Log4j)?
As you mentioned, Liberty has the ability to log to the console in JSON format [1]. The two problems you mentioned with that, for your use case, are 1) unnecessary fields, and 2) it does not allow you to specify your custom fields.
Regarding unnecessary fields, Liberty has a fixed set of fields in its JSON schema, which you cannot customize. If you find you don't want some of the fields, I can think of a few options:
use Logstash.
Some log handling tools, like Logstash, allow you to remove [2] or mutate [3] fields. If you are sending your logs to Logstash you could adjust the JSON to your needs that way.
change the JSON format Liberty sends to stdout using jq.
The default CMD (from the websphere-liberty:kernel Dockerfile) is:
CMD ["/opt/ibm/wlp/bin/server", "run", "defaultServer"]
You can add your own CMD to your Dockerfile to override that as follows (adjust jq command as needed):
CMD /opt/ibm/wlp/bin/server run defaultServer | grep --line-buffered "}" | jq -c '{ibm_datetime, message}'
If your use case also requires sending Log4j output to stdout, I would suggest changing the Dockerfile CMD to run a script you add to the image. In that script you would need to tail your Log4j log file as follows (this could be combined with the above advice on how to change the CMD to use jq as well):
tail -F myLog.json &
/opt/ibm/wlp/bin/server run defaultServer
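For example, a combined script wired into the CMD might look like the sketch below; the script name and the myLog.json path are assumptions, while the server and jq commands are the ones shown above:
#!/bin/sh
# Hypothetical start.sh: stream the Log4j JSON file to stdout and trim the Liberty JSON with jq
tail -F myLog.json &
/opt/ibm/wlp/bin/server run defaultServer | grep --line-buffered "}" | jq -c '{ibm_datetime, message}'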
[1] https://www.ibm.com/support/knowledgecenter/en/SSEQTP_liberty/com.ibm.websphere.wlp.doc/ae/rwlp_logging.html
[2] https://www.elastic.co/guide/en/logstash/current/plugins-filters-prune.html
[3] https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html
Just in case it helps, I ran into the same issue and the best solution I found was:
Convert app to use java.util.Logging (JUL)
In server.xml add <logging consoleSource="message,trace" consoleFormat="json" traceSpecification="{package}={level}"/> (swap package and level as required).
Add a bootstrap.properties that contains com.ibm.ws.logging.console.format=json.
This will give you consistent server and application logging in JSON. A couple of lines at server boot are not JSON, but those were just one empty line and a "Launching defaultServer..." line.
I too wanted the JSON structure to be consistent with other containers using Log4j2, so I followed the advice from dbourne above and added jq to the CMD in my Dockerfile to reformat the JSON:
CMD /opt/ol/wlp/bin/server run defaultServer | stdbuf -o0 -i0 -e0 jq -crR '. as $line | try (fromjson | {level: .loglevel, message: .message, loggerName: .module, thread: .ext_thread}) catch $line'
The stdbuf -o0 -i0 -e0 stops the pipe ("|") from buffering its output.
This strips out the Liberty-specific JSON attributes, which is either good or bad depending on your perspective. I don't need those values, so I don't have a good recommendation for that.
Although the JUL API is not quite as nice as Log4j2 or SLF4J, it takes very little code to wrap the JUL API in something closer to Log4j2, e.g. to have varargs rather than an Object[].
OpenLiberty will also dynamically pick up logging changes if you edit server.xml, so it pretty much has all the necessary bits, IMHO.

How do I query Spark JobServer and find where it stores my Jars?

I am trying to follow this documentation:
https://github.com/spark-jobserver/spark-jobserver#dependency-jars
Option 2 listed in the docs says:

The dependent-jar-uris can also be used in job configuration param when submitting a job. On an ad-hoc context this has the same effect as dependent-jar-uris context configuration param. On a persistent context the jars will be loaded for the current job and then for every job that will be executed on the persistent context.

curl -d "" 'localhost:8090/contexts/test-context?num-cpu-cores=4&memory-per-node=512m'
OK⏎
curl 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&context=test-context&sync=true' -d '{ dependent-jar-uris = ["file:///myjars/deps01.jar", "file:///myjars/deps02.jar"], input.string = "a b c a b see" }'

The jars /myjars/deps01.jar & /myjars/deps02.jar (present only on the SJS node) will be loaded and made available for the Spark driver & executors.
Is "file:///myjars/" directory the SJS node's JAR directory or some custom directory?
I have a client on a Windows box and a Spark JobServer on a Linux box. I upload a JAR to the SJS node, and the SJS node puts that JAR somewhere. Then, when I start a job and set 'dependent-jar-uris', the SJS node finds my previously uploaded JAR and runs the job:
"dependent-jar-uris" set to "file:///tmp/spark-jobserver/filedao/data/simpleJobxxxxxx.jar"
This works fine, but I had to manually go searching around the SJS node to find this location (e.g. file:///tmp/spark-jobserver/filedao/data/simpleJobxxxxxx.jar) and then add it into my future requests to start the job.
Instead, how do I make a REST call from the client to get the path where Spark JobServer put my JARs when I uploaded them, so that I can set the file:/// path correctly in my 'dependent-jar-uris' property dynamically?
I don't think jars uploaded using "POST /jars" can be used in dependent-jar-uris. Since you are uploading the jars yourself, you already know their local path. Just use that.
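In other words, a sketch of that workflow (the host name and paths below are placeholders, not SJS defaults) would be:
$ # Place the dependency jar at a path you control on the SJS node
$ scp deps01.jar sjs-host:/myjars/deps01.jar
$ # Reference that known path directly when submitting the job
$ curl 'sjs-host:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&context=test-context&sync=true' -d '{ dependent-jar-uris = ["file:///myjars/deps01.jar"] }'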