Setup and running examples in Oryx2 - apache-kafka

I have a CDH5.5 installation and I want to run some oryx2 examples within my virtual machine.
I've already downloaded and compiled oryx2 from GitHub successfully. I've copied the example app to my ORYX_HOME/deploy/bin folder, where oryx-run.sh is located. I've also added wordcount-example.conf and an oryx.conf file based on the ALS one (in which I pointed to my Kafka brokers and ZooKeeper servers).
I tried to setup Kafka and/or run some examples but I always get the same error:
> ./oryx-run.sh kafka-setup --layer-jar ../oryx-batch/target/oryx-batch-2.2.0-SNAPSHOT.jar
Can't find kafka scripts like kafka-topics
> ./oryx-run.sh batch --conf wordcount-example.conf --app-jar myapp.jar --layer-jar ../oryx-batch/target/oryx-batch-2.2.0-SNAPSHOT.jar
Can't find kafka scripts like kafka-topics
I've tried copying the Kafka scripts into the same folder as the oryx-run.sh script, but unfortunately I got the same errors.
Any idea?
Regards.

export KAFKA_HOME=/opt/17173/kafka
export PATH=$HADOOP_HOME/bin:$SPARK_HOME/sbin:$KAFKA_HOME/bin:$PATH

The reason is that kafka-topics can't be found on your PATH, so you have to add Kafka's bin directory to it. For me, Kafka is in /Users/long/software/kafka_2.10-0.8.2.2, so I just
ran vim ~/.bashrc and added the following two lines at the end:
export KAFKA_HOME=/Users/long/software/kafka_2.10-0.8.2.2
export PATH=$KAFKA_HOME/bin:$PATH
After adding them, you must restart your shell (or run source ~/.bashrc) for the change to take effect.
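Before rerunning oryx-run.sh, it can help to check that the shell really resolves the Kafka script. The following is only a sketch: the function name find_kafka_topics is made up for illustration, and the assumption that the script is called either kafka-topics or kafka-topics.sh is inferred from the error message and common Kafka layouts, not taken from the Oryx source.

```shell
# Hypothetical helper: print the first Kafka topics script found on PATH.
# Assumption: it is named kafka-topics (packaged installs) or
# kafka-topics.sh (plain Apache Kafka tarball).
find_kafka_topics() {
  for candidate in kafka-topics kafka-topics.sh; do
    if command -v "$candidate" >/dev/null 2>&1; then
      # command -v prints the resolved path of an external command.
      command -v "$candidate"
      return 0
    fi
  done
  echo "kafka-topics not found on PATH; add \$KAFKA_HOME/bin to PATH" >&2
  return 1
}
```

If find_kafka_topics prints a path inside $KAFKA_HOME/bin, the PATH change is active in the current shell; if it returns non-zero, the shell has probably not reloaded ~/.bashrc yet.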

Error in Google Cloud Shell Commands while working on the lab (Securing Google Cloud with CFT Scorecard)

I am working in a GCP lab (Securing Google Cloud with CFT Scorecard). All instructions for the lab are given.
First I have to run the following two commands to set environment variables
export GOOGLE_PROJECT=$DEVSHELL_PROJECT_ID
export CAI_BUCKET_NAME=cai-$GOOGLE_PROJECT
In the second command above, I don't know what I should replace with my own credentials. Maybe that is the reason I am getting the error.
Now I have to enable the "cloudasset.googleapis.com" gcloud service. For this they gave the following command.
gcloud services enable cloudasset.googleapis.com \
--project $GOOGLE_PROJECT
The error for this is shown in the attached screenshot:
[Screenshot: error in the service-enabling command]
Next step is to clone the policy: The given command for that is:
git clone https://github.com/forseti-security/policy-library.git
After that they said: "You realize Policy Library enforces policies that are located in the policy-library/policies/constraints folder, in which case you can copy a sample policy from the samples directory into the constraints directory".
and gave this command:
cp policy-library/samples/storage_blacklist_public.yaml policy-library/policies/constraints/
On running this command I received this:
[Screenshot: error on running the directory command]
Finally they said "Create the bucket that will hold the data that Cloud Asset Inventory (CAI) will export" and gave the following command:
gsutil mb -l us-central1 -p $GOOGLE_PROJECT gs://$CAI_BUCKET_NAME
I am confused about where to substitute my own credentials; for example, in place of project_Id I wrote my own project ID.
Also, I don't know why these errors are occurring. Kindly help me.
I'm unable to access the tutorial.
What happens if you run the following:
echo ${DEVSHELL_PROJECT_ID}
I suspect you'll get an empty result because I think this environment variable isn't actually set.
I think it should be:
echo ${DEVSHELL_GCLOUD_CONFIG}
Does that return a result?
If so, perhaps try using that variable instead:
export GOOGLE_PROJECT=${DEVSHELL_GCLOUD_CONFIG}
export CAI_BUCKET_NAME=cai-${GOOGLE_PROJECT}
It's not entirely clear to me why this tutorial is using this approach but, if the above works, it may get you further along.
Were you asked to create a Google Cloud Platform project?
As per the shared error, this seems to be because your env variable GOOGLE_PROJECT is not set. You can verify it by using echo $GOOGLE_PROJECT and seeing whether it returns the project ID or not. You could also use echo $DEVSHELL_PROJECT_ID. If that returns the project ID and the former doesn't, it means that you didn't export the variable as stated at the beginning.
If the problem is that GOOGLE_PROJECT doesn't have any value, there are different approaches on how to solve it.
Set the env variable as you explained at the beginning. Obviously this will only work if the variable DEVSHELL_PROJECT_ID is also set.
export GOOGLE_PROJECT=$DEVSHELL_PROJECT_ID
Manually set the project ID in that variable. This is far from ideal because Qwiklabs creates a new temporary project for every lab, so this will only work while you are still on that project. The project ID can be seen in both of your shared screenshots.
export GOOGLE_PROJECT=qwiklabs-gcp-03-c6e1787dc09e
Avoid using the --project argument. According to the documentation, this argument is optional; if it is omitted, the command uses the default project from your gcloud configuration. You can get the current project with:
gcloud config get-value project
If the previous command matches the project ID you want to use, you can simply issue the following command:
gcloud services enable cloudasset.googleapis.com
Notice that the project ID is not being explicitly mentioned using --project.
Regarding your issue with the GitHub file, I have checked the repository and the file storage_blacklist_public.yaml doesn't seem to be in the directory policy-library/samples. There is a trace that it was once there, but it isn't anymore, so the lab should probably be updated.
About your credentials confusion: you don't have to use your own project ID, just the one given in your lab. If I recall correctly, all the needed data should be on the left side of the lab. Still, you shouldn't normally need to authenticate, as you are already logged in to your temporary project if you are accessing it from Cloud Shell, which is where you should be doing all this.
Adding this for later versions:
In Cloud Shell you can store the current project ID in a temporary variable with
PROJECT_ID="$(gcloud config get-value project)"
and then use it like this:
--project ${PROJECT_ID}
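The suggestions in this thread can be combined into one fallback: use DEVSHELL_PROJECT_ID when it is set, otherwise ask gcloud for the configured project. This is a minimal sketch, not part of the lab; the function names resolve_project and cai_bucket_name are hypothetical, and the only gcloud call used is the gcloud config get-value project command quoted above.

```shell
# Hypothetical helper: print a project ID, preferring DEVSHELL_PROJECT_ID
# and falling back to the active gcloud configuration.
resolve_project() {
  if [ -n "${DEVSHELL_PROJECT_ID:-}" ]; then
    printf '%s\n' "$DEVSHELL_PROJECT_ID"
  elif command -v gcloud >/dev/null 2>&1; then
    gcloud config get-value project 2>/dev/null
  else
    echo "no project ID available" >&2
    return 1
  fi
}

# Build the bucket name the lab uses: "cai-" followed by the project ID.
cai_bucket_name() {
  printf 'cai-%s\n' "$1"
}

# Example use (commented out so that sourcing this file has no side effects):
#   GOOGLE_PROJECT="$(resolve_project)" && export GOOGLE_PROJECT
#   CAI_BUCKET_NAME="$(cai_bucket_name "$GOOGLE_PROJECT")" && export CAI_BUCKET_NAME
```

Checking echo "$GOOGLE_PROJECT" afterwards, as recommended earlier in the thread, confirms that the variable is not empty before it is passed to --project or gsutil.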

Zookeeper ignores JVMFLAGS?

Hi, I set up my ZooKeeper cluster and it seems to be running fine. But I'm trying to set the heap size and it doesn't seem to be respected. I created a java.env file inside conf/... containing export JVMFLAGS="-Xms3000m -Xmx3000m".
When I run ps -aux | grep java I can see -Xmx1000m -Xms3000m -Xmx3000m. But when I check with free -m I only see 200M used and 3.3G free.
I noticed that the default value is set regardless. Does this affect it?
Shouldn't Xms fill up the used RAM?
The file zkEnv.sh contains the following line:
export SERVER_JVMFLAGS="-Xmx${ZK_SERVER_HEAP}m $SERVER_JVMFLAGS"
and the "-Xmx${ZK_SERVER_HEAP}m" caused the trouble.
This is what finally worked for me.
In the conf/java.env
export SERVER_JVMFLAGS="-Xms6144m -Xmx6144m -XX:+AlwaysPreTouch"
If you are using /usr/bin/zookeeper-server-start from the confluent-kafka-* package, you might need to set KAFKA_HEAP_OPTS inside your systemd unit file.
Environment="KAFKA_HEAP_OPTS=-Xmx1024M -Xms1024M"
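When the same heap flag shows up more than once, as in the ps output from the question, it helps to pull out which values come last on the command line. The sketch below does only that; the function name last_heap_flags is hypothetical, and the idea that HotSpot honors the last occurrence of a repeated flag is stated here as an assumption worth verifying on your JVM. Note also that -Xms alone does not necessarily make free -m show the memory as used, because pages may not be touched until they are needed, which is presumably why the working answer adds -XX:+AlwaysPreTouch.

```shell
# Hypothetical helper: given a JVM command line as one string, print the
# last -Xms flag and the last -Xmx flag it contains, separated by a space.
last_heap_flags() {
  xms=""
  xmx=""
  # Unquoted $1 is intentional: split the command line into words.
  for arg in $1; do
    case "$arg" in
      -Xms*) xms="$arg" ;;
      -Xmx*) xmx="$arg" ;;
    esac
  done
  printf '%s %s\n' "$xms" "$xmx"
}
```

For the flags reported in the question, last_heap_flags "-Xmx1000m -Xms3000m -Xmx3000m" picks out the 3000m values, which suggests the default -Xmx1000m is overridden there rather than the other way round.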

Azure batch Application package not getting copied to Working Directory of Task

I have created Azure Batch pool with Linux Machine and specified Application Package for the Pool.
My command line is
command='python $AZ_BATCH_APP_PACKAGE_scriptv1_1/tasks/XXX/get_XXXXX_data.py',
python3: can't open file '$AZ_BATCH_APP_PACKAGE_scriptv1_1/tasks/XXX/get_XXXXX_data.py':
[Errno 2] No such file or directory
When I connect to the node and look at the working directory, none of the application package files are present there.
How do I make sure that the files from the application package are available in the working directory, or can I invoke/execute files under the application package from the command line?
Make sure that your async operations are properly awaited before you start using the package in your code.
Also, could you please share your design or pseudo-code and describe how you are approaching this?
Further to add:
This seems to be a pool-level package.
The error suggests that the application environment variable is being used incorrectly, or that there is some other user-level issue. Please check out the link below, especially the section that covers use of the environment variable.
This looks like a user-level issue because, if downloading the package resource had failed, the error would be visible to you through your exception handler, at the tool level if you are using Batch Explorer / BatchLabs, or through code-level exception handling.
https://learn.microsoft.com/en-us/azure/batch/batch-application-packages
Reason \ Rationale:
If the pool-level or task-level application package has an error, an error list is returned: a problem with the application package is reported as a UserError or an AppPackageError, which will be visible in your code's exception handler.
Key point: you can always connect to your node (RDP, or SSH on Linux) and check whether the package is available. Information here: https://learn.microsoft.com/en-us/azure/batch/batch-api-basics#connecting-to-compute-nodes
I once created a small sample to help people with this, so that resource might help you see how it is used.
Hope this helps.
On Linux, the application package with version string is formatted as:
AZ_BATCH_APP_PACKAGE_{0}_{1}
On Windows it is formatted as:
AZ_BATCH_APP_PACKAGE_APPLICATIONID#version
Where 0 is the application name and 1 is the version.
$AZ_BATCH_APP_PACKAGE_scriptv1_1 will take you to the root folder where the application was unzipped.
Does this "exact" path exist in that location?
tasks/XXX/get_XXXXX_data.py
You can see more information here:
https://learn.microsoft.com/en-us/azure/batch/batch-application-packages
Edit: Just saw this question: "or can I invoke/execute files under Application Package from command line"
Yes you can invoke and execute files from the application package directory with the environment variable above.
If you type env on the node you will see the environment variables that have been set.
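To double-check which variable name to look for in the env output, the name can be derived from the application ID and version. This is a sketch under stated assumptions: the function name app_package_var is hypothetical, the blender / 2.7 values are placeholders, and the rule that periods, hyphens and number signs become underscores on Linux nodes is taken from the Azure Batch application packages documentation linked above.

```shell
# Hypothetical helper: build the Linux environment variable name for an
# application package from its ID ($1) and version ($2).
# Assumption (from the Azure Batch docs): '.', '-' and '#' are flattened
# to '_' on Linux nodes, and the case of the application ID is preserved.
app_package_var() {
  printf 'AZ_BATCH_APP_PACKAGE_%s_%s\n' "$1" "$2" | tr '.#-' '___'
}

# Example with placeholder values:
#   app_package_var blender 2.7
# Batch does not run task command lines under a shell, so a variable such
# as $AZ_BATCH_APP_PACKAGE_blender_2_7 is only expanded if the command is
# wrapped explicitly, for example:
#   /bin/bash -c 'python3 $AZ_BATCH_APP_PACKAGE_blender_2_7/script.py'
```

The literal $AZ_BATCH_APP_PACKAGE_scriptv1_1 in the python3 error message is consistent with the variable never having been expanded, so wrapping the task command in /bin/bash -c is worth trying alongside checking the path.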

Installing openTSDB on Ubuntu15.04

I have installed OpenTSDB (the .deb package) on Ubuntu 15.04 by following the guidelines in the documentation. When I run "service opentsdb start", it does not start. The documentation mentions that some configuration files have to be changed. Can anyone please tell me which changes are needed, and in which files?
Thanks in advance
Regards
VHC
Check your logs in /var/log/opentsdb/opentsdb.log. You should have HBase up and running correctly (i.e. you are able to create a table and store some values).
Remember that you have to create the OpenTSDB tables in the running HBase instance:
env COMPRESSION=NONE HBASE_HOME=path/to/hbase-X.XX.X /usr/share/opentsdb/tools/create_table.sh
http://opentsdb.net/docs/build/html/installation.html#create-tables

Write a simple mod_perl handler

I want to write a simple mod_perl handler that returns the local time, as described on this page (http://perl.apache.org/docs/2.0/user/handlers/intro.html), but where do I have to put this file so that it can be accessed?
I'm using Ubuntu but don't have a directory called MyApache2. So where should I put this file to try out the functionality?
This is just an example. You need to create the files yourself. (You'll see your example refers to "file:MyApache2/CurrentTime.pm").
mkdir -p example-lib/MyApache2
touch example-lib/MyApache2/CurrentTime.pm
Then paste the contents from the example into the file you just created.
In order for this to run under mod_perl, you'll also have to let the server know where your MyApache2 is located. You should be able to add something like this to your Apache config:
PerlSwitches -I/path/to/example-lib
Don't forget to restart Apache before you test this out.
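The mkdir, touch and paste steps above can be folded into one small script. This is a sketch only: the function name scaffold_handler is hypothetical, and the Perl body is a minimal handler modeled on the linked mod_perl 2.0 introduction rather than a verbatim copy, so compare it with the page before relying on it.

```shell
# Hypothetical helper: create <libdir>/MyApache2/CurrentTime.pm containing
# a minimal mod_perl 2.0 response handler that prints the local time.
scaffold_handler() {
  libdir="$1"
  mkdir -p "$libdir/MyApache2" || return 1
  # Quoted 'EOF' keeps the shell from expanding $r and friends.
  cat > "$libdir/MyApache2/CurrentTime.pm" <<'EOF'
package MyApache2::CurrentTime;

use strict;
use warnings;

use Apache2::RequestRec ();
use Apache2::RequestIO ();
use Apache2::Const -compile => qw(OK);

# Response handler: send the server's local time as plain text.
sub handler {
    my $r = shift;
    $r->content_type('text/plain');
    $r->print("Now is: " . scalar(localtime) . "\n");
    return Apache2::Const::OK;
}

1;
EOF
}
```

After running scaffold_handler /path/to/example-lib, the PerlSwitches -I line shown above makes the module findable. The handler still has to be mapped to a URL; in the linked introduction this is done with PerlModule MyApache2::CurrentTime plus a Location block that sets SetHandler modperl and PerlResponseHandler MyApache2::CurrentTime (the URL path you choose is up to you).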