Weka: doing bagging from the command line

I can train a model using Bagging from the command line like this:
java -Xmx512m -cp $CLASSPATH weka.classifiers.meta.Bagging -P 100 -S 1 -num-slots 1 -I 10 \
-split-percentage 66 \
-t $traindata \
-d $model \
-W weka.classifiers.trees.REPTree -- -M 2 -V 0.001 -N 3 -S 1 -L -1 -I 0.0 \
> $out
But I can't reuse the saved model for prediction from the command line. I guessed the command should be something like:
java -Xmx512m -cp $CLASSPATH weka.classifiers.meta.Bagging \
-l $model \
-T $testdata \
-W weka.classifiers.trees.REPTree \
-p 0 \
> $wkresult
But it does not work. Any ideas?
EDIT: However, it works when I use a single classifier (i.e. no bagging). The commands were:
java -Xmx512m -cp $CLASSPATH weka.classifiers.bayes.NaiveBayesMultinomial \
-split-percentage 66 \
-t $traindata \
-d $model \
> $out
java -Xmx512m -cp $CLASSPATH weka.classifiers.bayes.NaiveBayesMultinomial \
-T $testdata \
-l $model \
-p 0 \
> $wkresult

You need to call a different class to evaluate the model. The command line should be something like
java -cp $CLASSPATH weka.classifiers.Evaluation weka.classifiers.meta.Bagging \
-T $testdata -l $model
You may need to repeat some of the additional options you gave when training the classifier. Also have a look at the command-line options for the Evaluation class. More information here.
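As a rough sketch of the whole train-then-predict round trip with the Evaluation class suggested above: the file names and the weka.jar default are placeholders I made up, and the run wrapper only prints the commands unless DRY_RUN=0, so it can be tried without Weka installed. The -p 0 flag is carried over from the question; whether Evaluation needs further options for your data is something to check against its help output.

```shell
#!/bin/sh
# Sketch only: paths below are hypothetical placeholders.
traindata=${traindata:-train.arff}
testdata=${testdata:-test.arff}
model=${model:-bagging.model}
CLASSPATH=${CLASSPATH:-weka.jar}
DRY_RUN=${DRY_RUN:-1}   # set DRY_RUN=0 to really invoke java

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Train a bagged REPTree and save the model with -d.
train() {
  run java -Xmx512m -cp "$CLASSPATH" weka.classifiers.meta.Bagging \
    -t "$traindata" -d "$model" \
    -W weka.classifiers.trees.REPTree
}

# Load the saved model with -l through the Evaluation class.
predict() {
  run java -Xmx512m -cp "$CLASSPATH" weka.classifiers.Evaluation \
    weka.classifiers.meta.Bagging -T "$testdata" -l "$model" -p 0
}
```

Calling train and then predict prints (or runs) the two commands in order; redirect the output of predict to a file to keep the predictions.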

Related

How to specify whether a connector is a source or a sink?

I am currently configuring kafka connect (with debezium/connect docker image), I successfully connected it to Kafka using environment variables:
docker run -it --rm --name AAAAAA-kafka-connect -p 8083:8083 \
-v aaaaa.jks:aaaaa.jks \
-v bbbbbb.jks:bbbbbb.jks \
-e LOG_LEVEL=INFO \
-e HOST_NAME="AAAAAA-kafka-connect" \
-e HEAP_OPTS="-Xms256m -Xmx2g" \
-e BOOTSTRAP_SERVERS="BBBBB:9092" \
-e CONNECT_CLIENT_ID="xxx-kafka-connect" \
-e CONNECT_SASL_JAAS_CONFIG="org.apache.kafka.common.security.scram.ScramLoginModule required username=\"...\" password=\"...\";" \
-e CONNECT_SECURITY_PROTOCOL="SASL_SSL" \
-e CONNECT_SASL_MECHANISM="PLAIN" \
-e CONNECT_SSL_TRUSTSTORE_LOCATION="bbbbbb.jks" \
-e CONNECT_SSL_TRUSTSTORE_PASSWORD="..." \
-e CONNECT_SSL_KEYSTORE_LOCATION="aaaaa.jks" \
-e CONNECT_SSL_KEYSTORE_PASSWORD="..." \
-e GROUP_ID="XXX.grp.kafka.connect" \
-e CONFIG_STORAGE_TOPIC="XXX.connect.configs.v1" \
-e OFFSET_STORAGE_TOPIC="XXX.connect.offsets.v1" \
-e STATUS_STORAGE_TOPIC="XXX.connect.statuses.v1" \
quay.io/debezium/connect:1.9
Now I have to create a source connector (PostgreSQL database), and I want the data that Kafka Connect reads from the source to be written to a Kafka topic.
Where do I set the Kafka configuration for the destination topic, since there is no such setting in the JSON config of the database connector?
Do I have to create a sink connector for the Kafka topic? If so, where do we specify whether a connector is a sink or a source?
PS: I have already created the Kafka topic where I want to put the data.
Feel free to ask questions.
Environment variables only modify the client parameters.
Whether a connector is a source or a sink is determined when you actually create the connector. You need a JSON config, and it will have a connector.class.
In the Kafka Connect API there are SinkTask and SourceTask.
Debezium is always a Source. Sources write to Kafka; that doesn't make Kafka a sink. You need to install a new connector plugin to get a sink for your database, such as the JDBC Connector from Confluent which has classes for both sources and sinks.
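To make the connector.class point concrete, here is a minimal sketch of registering a Debezium PostgreSQL source through the Connect REST API on port 8083. The connector name, host, database, and credentials are placeholders I invented; the property names are the ones I know from Debezium 1.9 (newer versions use topic.prefix instead of database.server.name), so treat this as a starting point rather than a verified config.

```shell
#!/bin/sh
# Sketch only: every value below is a hypothetical placeholder.
CONNECT_URL=${CONNECT_URL:-http://localhost:8083}

# connector.class is what makes this a source connector.
# There is no Kafka "sink" setting: the worker's producer writes the
# change events to topics named after database.server.name.
connector_json() {
  cat <<'EOF'
{
  "name": "inventory-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "...",
    "database.dbname": "inventory",
    "database.server.name": "dbserver1"
  }
}
EOF
}

# POST the definition to the running Connect worker.
register_connector() {
  connector_json | curl -s -X POST -H 'Content-Type: application/json' \
    --data @- "$CONNECT_URL/connectors"
}
```

Running register_connector against the container started above should create the connector; a sink would be registered the same way, only with a sink class in connector.class.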
OK, you have to add the CONNECT_PRODUCER_* or CONNECT_CONSUMER_* environment variables to configure the Kafka producer (used by source connectors) or consumer (used by sink connectors).
Like this:
docker run -it --rm --name AAAAAA-kafka-connect -p 8083:8083 \
-v aaaaa.jks:aaaaa.jks \
-v bbbbbb.jks:bbbbbb.jks \
-e LOG_LEVEL=INFO \
-e HOST_NAME="AAAAAA-kafka-connect" \
-e HEAP_OPTS="-Xms256m -Xmx2g" \
-e BOOTSTRAP_SERVERS="BBBBB:9092" \
-e CONNECT_CLIENT_ID="xxx-kafka-connect" \
-e CONNECT_SASL_JAAS_CONFIG="org.apache.kafka.common.security.scram.ScramLoginModule required username=\"...\" password=\"...\";" \
-e CONNECT_SECURITY_PROTOCOL="SASL_SSL" \
-e CONNECT_SASL_MECHANISM="PLAIN" \
-e CONNECT_SSL_TRUSTSTORE_LOCATION="bbbbbb.jks" \
-e CONNECT_SSL_TRUSTSTORE_PASSWORD="..." \
-e CONNECT_SSL_KEYSTORE_LOCATION="aaaaa.jks" \
-e CONNECT_SSL_KEYSTORE_PASSWORD="..." \
-e GROUP_ID="XXX.grp.kafka.connect" \
-e CONFIG_STORAGE_TOPIC="XXX.connect.configs.v1" \
-e OFFSET_STORAGE_TOPIC="XXX.connect.offsets.v1" \
-e STATUS_STORAGE_TOPIC="XXX.connect.statuses.v1" \
-e CONNECT_PRODUCER_TOPIC_CREATION_ENABLE=false \
-e CONNECT_PRODUCER_SASL_JAAS_CONFIG="org.apache.kafka.common.security.scram.ScramLoginModule required username=\"...\" password=\"...\";" \
-e CONNECT_PRODUCER_SECURITY_PROTOCOL="SASL_SSL" \
-e CONNECT_PRODUCER_SASL_MECHANISM="PLAIN" \
-e CONNECT_PRODUCER_SSL_TRUSTSTORE_LOCATION="bbbbbb.jks" \
-e CONNECT_PRODUCER_SSL_TRUSTSTORE_PASSWORD="..." \
-e CONNECT_PRODUCER_SSL_KEYSTORE_LOCATION="aaaaa.jks" \
-e CONNECT_PRODUCER_SSL_KEYSTORE_PASSWORD="..." \
-e CONNECT_PRODUCER_CLIENT_ID="xxx-kafka-connect" \
-e CONNECT_PRODUCER_TOPIC_CREATION_ENABLE=false \
quay.io/debezium/connect:1.9
Whether a connector is a sink or a source comes from the connector.class used in the JSON definition of the connector. However, Debezium's CDC connectors can only be used as source connectors that capture real-time change event records from external database systems (https://hevodata.com/learn/debezium-vs-kafka-connect/#:~:text=Debezium%20platform%20has%20a%20vast,records%20from%20external%20database%20systems.)

Create protocol mapper in Keycloak using kcadm.sh

From Add protocol-mapper to keycloak using kcadm.sh
Has anyone figured this out yet? I tried it the way Oscar suggested and it still does not work.
The lines that are not commented work perfectly.
The lines that are commented do not work. I get an error that says "./clientmapper.sh: 59 (or whatever line number that I have uncommented): -s: not found"
sudo docker exec $keycontainer /opt/jboss/keycloak/bin/kcadm.sh create \
clients/$cid/protocol-mappers/models \
-r myrealm \
-s name=roles \
-s protocol=openid-connect \
-s protocolMapper=oidc-usermodel-attribute-mapper
#-s 'config."id.token.claim"=true' \
#-s claim.name=roles \
#-s jsonType.label=String \
#-s multivalued=true \
#-s userinfo.token.claim=true \
#-s access.token.claim=true
I made this work by formatting as Oscar suggested and using -i after the docker exec command. It works perfectly now.
sudo docker exec -i $keycontainer /opt/jboss/keycloak/bin/kcadm.sh create \
clients/$cid/protocol-mappers/models \
-r testrealm \
-s name=testmap \
-s protocol=openid-connect \
-s protocolMapper=oidc-usermodel-realm-role-mapper \
-s 'config."id.token.claim"=true' \
-s 'config."claim.name"=testmap' \
-s 'config."jsonType.label"=String' \
-s 'config."multivalued"=true' \
-s 'config."userinfo.token.claim"=true' \
-s 'config."access.token.claim"=true'
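One more thing worth checking if you hit the same "-s: not found" message: that is what a shell prints when a line starting with -s is parsed as a command of its own, which happens when the line before it does not end in a backslash. Whether that was the cause in the failing script above is my assumption, not something confirmed in this thread. A small sketch, with true standing in for kcadm.sh so it runs anywhere:

```shell
#!/bin/sh
# `true` is a stand-in for kcadm.sh; it accepts and ignores any arguments.

# Broken: no backslash after the first line, so "-s ..." runs as a command.
broken_status=0
broken=$(sh -c 'true create -s name=roles
-s protocol=openid-connect' 2>&1) || broken_status=$?

# Fixed: the trailing backslash joins both lines into one command.
fixed_status=0
fixed=$(sh -c 'true create -s name=roles \
-s protocol=openid-connect' 2>&1) || fixed_status=$?

echo "broken: status=$broken_status message=$broken"
echo "fixed:  status=$fixed_status message=$fixed"
```

The exact wording of the message differs between dash and bash, but both report that -s was not found.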

Add protocol-mapper to keycloak using kcadm.sh

I have been trying to set up my full test system in Keycloak using the kcadm CLI, but I have some problems creating protocol mappers:
HTTP error - 400 Bad Request
I have been trying to implement a request using:
http://www.keycloak.org/docs-api/3.3/rest-api/index.html
http://blog.keycloak.org/2017/01/administer-keycloak-server-from-shell.html
Am I missing something in the request:
/opt/jboss/keycloak/bin/kcadm.sh create \
clients/7e8ef93b-0d0f-487d-84a5-5cfaee7ddf13/protocol-mappers/models \
-r $test_realm \
-s config.user.attribute=tenants \
-s config.claim.name=tenants \
-s config.jsonType.label=String \
-s config.id.token.claim=true \
-s config.access.token.claim=true \
-s config.userinfo.token.claim=true \
-s config.multivalued=true \
-s name=tenants \
-s protocolMapper=oidc-usermodel-attribute-mapper
This works:
/opt/jboss/keycloak/bin/kcadm.sh create \
clients/7e8ef93b-0d0f-487d-84a5-5cfaee7ddf13/protocol-mappers/models \
-r $test_realm \
-s name=tenants1 \
-s protocol=openid-connect \
-s protocolMapper=oidc-usermodel-attribute-mapper
You need to specify nested config values like this in Linux:
-s 'config."id.token.claim"=true'
-s 'config."included.client.audience"=theclient'
In the failing example the following value is missing:
-s protocol=openid-connect
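To see why the single quotes matter, here is a small sketch that prints the arguments a program actually receives. show_args is a hypothetical stand-in for kcadm.sh; my understanding is that kcadm.sh treats unquoted dots as path separators into nested objects, which is why the inner double quotes need to reach it intact.

```shell
#!/bin/sh
# Hypothetical stand-in for kcadm.sh: print each argument in brackets.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Without single quotes the shell strips the double quotes itself,
# so the program sees config.id.token.claim=true.
show_args -s config."id.token.claim"=true

# With single quotes the double quotes are passed through untouched,
# so the program sees config."id.token.claim"=true.
show_args -s 'config."id.token.claim"=true'
```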

How to suppress INFO messages when running psql scripts

I'm seeing INFO messages when I run my tests and I thought that I had gotten rid of them by setting the client_min_messages PGOPTION. Here's my command:
PGOPTIONS='--client-min-messages=warning' \
psql -h localhost \
-p 5432 \
-d my_db \
-U my_user \
--no-align \
--field-separator '|' \
--pset footer \
--quiet \
-v AUTOCOMMIT=off \
-X \
-v VERBOSITY=terse \
-v ON_ERROR_STOP=1 \
--pset pager=off \
-f tests/test.sql \
-o "$test_results"
Can someone advise me on how to turn off the INFO messages?
This works for me: Postgres 9.1.4 on Debian GNU Linux with bash:
env PGOPTIONS='-c client_min_messages=WARNING' psql ...
(Still works for Postgres 12 on Ubuntu 18.04 LTS with bash.)
It's also what the manual suggests. In most shells, setting environment variables also works without an explicit leading env. See maxschlepzig's comment.
Note, however, that there is no message level INFO for client_min_messages.
That's only applicable to log_min_messages and log_min_error_statement.
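A quick way to convince yourself that the form with env and the form without it hand the same PGOPTIONS to the child process, without needing a database: the sketch below uses sh -c as a stand-in for psql and simply prints what the child sees.

```shell
#!/bin/sh
# Stand-in for psql: a child shell that prints the PGOPTIONS it inherited.
child='printf "%s\n" "$PGOPTIONS"'

# Form 1: explicit env.
with_env=$(env PGOPTIONS='-c client_min_messages=WARNING' sh -c "$child")

# Form 2: plain VAR=value prefix, scoped to that one command.
without_env=$(PGOPTIONS='-c client_min_messages=WARNING' sh -c "$child")

echo "with env:    $with_env"
echo "without env: $without_env"
```

Both lines print the same option string, and the variable is not left behind in the calling shell.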

Varnish DAEMON_OPTS Options Errors

When using inline C with Varnish I've not been able to get /etc/default/varnish to be happy at startup.
I've tested inline C with Varnish for two things: GeoIP detection and anti-site-scraping functions.
The DAEMON_OPTS always complains, even though I'm following what others seem to indicate works fine.
My problem is that this command line start up works:
varnishd -f /etc/varnish/varnish-default.conf -s file,/var/lib/varnish/varnish_storage.bin,512M -T 127.0.0.1:2000 -a 0.0.0.0:8080 -p 'cc_command=exec cc -fpic -shared -Wl,-x -L/usr/include/libmemcached/memcached.h -lmemcached -o %o %s'
But it errors out when I try to start it from the default start scripts.
/etc/default/varnish has this in it:
DAEMON_OPTS="-a :8080 \
-T localhost:2000 \
-f /etc/varnish/varnish-default.conf \
-s file,/var/lib/varnish/varnish_storage.bin,512M \
-p 'cc_command=exec cc -fpic -shared -Wl,-x -L/usr/include/libmemcached/memcached.h -lmemcached -o %o %s'"
The error is:
# /etc/init.d/varnish start
Starting HTTP accelerator: varnishd failed!
storage_file: filename: /var/lib/varnish/vbox.local/varnish_storage.bin size 512 MB.
Error:
Unknown parameter "'cc_command".
If I try changing the last line to:
-p cc_command='exec cc -fpic -shared -Wl,-x -L/usr/include/libmemcached/memcached.h -lmemcached -o %o %s'"
The error is now:
# /etc/init.d/varnish start
Starting HTTP accelerator: varnishd failed!
storage_file: filename: /var/lib/varnish/vbox.local/varnish_storage.bin size 512 MB.
Error: Unknown storage method "hared"
It's trying to interpret the '-shared' as -s hared and 'hared' is not a storage type.
For both GeoIP and the Anti-Site-Scrape I've used the exact recommended daemon options
plus have tried all sorts of variations like adding ' and '' but no joy.
Here is a link to the instructions I've followed, which work fine except for the DAEMON_OPTS part.
http://drcarter.info/2010/04/how-fighting-against-scraping-using-varnish-vcl-inline-c-memcached/
I'm using Debian and the exact DAEMON_OPTS as stated in the instructions.
Can anyone help with a pointer on what's going wrong here?
Even if Jacob will probably never read this, visitors from the future might appreciate what I'm going to write.
I believe I know what's wrong, and it looks like a Debian-specific problem, at least verified on Ubuntu 11.04 and Debian Squeeze.
I traced the execution from my /etc/default/varnish that contains the $DAEMON_OPTS to the init script.
In the init script /etc/init.d/varnish, the start_varnishd() function is:
start_varnishd() {
log_daemon_msg "Starting $DESC" "$NAME"
output=$(/bin/tempfile -s.varnish)
if start-stop-daemon \
--start --quiet --pidfile ${PIDFILE} --exec ${DAEMON} -- \
-P ${PIDFILE} ${DAEMON_OPTS} > ${output} 2>&1; then
log_end_msg 0
else
log_end_msg 1
cat $output
exit 1
fi
rm $output
}
So I modified it to print the full start-stop-daemon command line, like:
start_varnishd() {
log_daemon_msg "Starting $DESC" "$NAME"
output=$(/bin/tempfile -s.varnish)
+ echo "start-stop-daemon --start --quiet --pidfile ${PIDFILE} --exec ${DAEMON} -- -P ${PIDFILE} ${DAEMON_OPTS} > ${output} 2>&1"
if start-stop-daemon \
--start --quiet --pidfile ${PIDFILE} --exec ${DAEMON} -- \
-P ${PIDFILE} ${DAEMON_OPTS} > ${output} 2>&1; then
log_end_msg 0
So I got a command line echoed on STDOUT, and copied-pasted it into my shell. And, surprise! It worked. WTF?
Repeated again to be sure. Yes, it works. Mmh. Could it be another of those bash/dash corner cases?
Let's try feeding the start-stop-daemon command line to bash, and see how it reacts:
start_varnishd() {
log_daemon_msg "Starting $DESC" "$NAME"
output=$(/bin/tempfile -s.varnish)
if bash -c "start-stop-daemon \
--start --quiet --pidfile ${PIDFILE} --exec ${DAEMON} -- \
-P ${PIDFILE} ${DAEMON_OPTS} > ${output} 2>&1"; then
log_end_msg 0
else
log_end_msg 1
cat $output
exit 1
fi
rm $output
}
Yes, it works just fine, at least for my case.
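My reading of why the bash -c variant behaves differently (an interpretation, not something I verified in the Debian scripts beyond what is shown above): an unquoted ${DAEMON_OPTS} is only split on whitespace, and quotes stored inside the variable stay literal characters, whereas a string handed to bash -c or eval is parsed again, so the inner single quotes group the cc_command value. A sketch with a shortened option string and a hypothetical count_args helper:

```shell
#!/bin/sh
# Hypothetical helper: report how many arguments were received.
count_args() { echo "$#"; }

# Shortened version of the DAEMON_OPTS value from the question.
OPTS="-a :8080 -T localhost:2000 -p 'cc_command=exec cc -fpic -shared -o %o %s'"

# Unquoted expansion: split on whitespace only, quotes are kept as text,
# so -p receives "'cc_command=exec" and -shared becomes its own option.
split_count=$(count_args $OPTS)

# Re-parsed by the shell (what bash -c "..." does): quotes are honoured.
reparsed_count=$(eval "count_args $OPTS")

echo "word-split: $split_count arguments"
echo "re-parsed:  $reparsed_count arguments"
```

This prints 12 for the word-split case and 6 for the re-parsed one, which matches both error messages in the question: the stray quote in 'cc_command and -shared being read as -s hared.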
Here's the relevant part of my /etc/default/varnish:
...
## Alternative 2, Configuration with VCL
#
# Listen on port 6081, administration on localhost:6082, and forward to
# one content server selected by the vcl file, based on the request. Use a 1GB
# fixed-size cache file.
#
DAEMON_OPTS="-a :6081 \
-T localhost:6082 \
-f /etc/varnish/geoip-example.vcl \
-S /etc/varnish/secret \
-s malloc,100M \
-p 'cc_command=exec cc -fpic -shared -Wl,-x -L/usr/include/GeoIP.h -lGeoIP -o %o %s'"
...
I've seen posts where someone tried to work around this problem by moving the compile command into a separate shell script. Unfortunately that doesn't change the fact that start-stop-daemon is going to pass the $DAEMON_OPTS var through dash, and that will result in mangled options.
It would be something along the lines of:
-p 'cc_command=exec /etc/varnish/compile.sh %o %s'"
And then the compile.sh script as:
#!/bin/sh
cc -fpic -shared -Wl,-x -L/usr/include/GeoIP.h -lGeoIP -o "$@"
but it doesn't work, so just patch your init scripts, and you're good to go!
Hope you can find this information useful.
You can try using:
DAEMON_OPTS="-a :8080 \
-T localhost:2000 \
-f /etc/varnish/varnish-default.conf \
-s file,/var/lib/varnish/varnish_storage.bin,512M \
-p cc_command='exec cc -fpic -shared -Wl,-x -L/usr/include/libmemcached/memcached.h -lmemcached -o %o %s'"
Evidently, the startup script that interprets DAEMON_OPTS is not prepared for whitespace (even within single quotes). On my Fedora 15 installation, the suggested solution works fine; the arguments get interpreted correctly because the "$*" bash parameter is passed in /etc/init.d/varnish and in daemon() in /etc/init.d/functions.
Did you get your startup scripts from a package or did you make custom scripts?
This isn't directly related to the question, but you may find yourself here if you are working through the Varnish Tutorial - Put Varnish on port 80.
For recent installs of Varnish on Debian systems the configuration for varnishd startup options can be found in /etc/systemd/system/multi-user.target.wants/varnish.service. The documented way of changing the port via /etc/default/varnish still exists, but is no longer functional unless you change your system to use init scripts rather than systemd.
After you've changed your options in /etc/systemd/system/multi-user.target.wants/varnish.service, don't forget to run systemctl daemon-reload, which will catalog the changes for executing the program.