WebHDFS giving error in Hortonworks - rest

While setting up the history server and Hive server, WebHDFS is giving the following error in the REST API:
curl -sS -L -w '%{http_code}' -X PUT -T /usr/hdp/2.3.4.0-3485/hadoop/mapreduce.tar.gz 'http://ambari1.devcloud.247-inc.net:50070/webhdfs/v1/hdp/apps/2.3.4.0-3485/mapreduce/mapreduce.tar.gz?op=CREATE&user.name=hdfs&overwrite=True&permission=444'
{"RemoteException":
{"exception":"IOException",
"javaClassName":"java.io.IOException",
"message":"Failed to find datanode, suggest to check cluster health."
}}403

As the hdfs user, check the cluster status:
hdfs dfsadmin -report
I had the same problem; the report showed that the datanodes were not online. In the end I had to add the nodes to the /etc/hosts file. In your case it might be another reason; the essential thing is that the datanodes have to be up and reachable.

I had the same problem with HDP 2.5. MapReduce reported the same error, and Ambari showed 0/3 DataNodes alive even though it also showed all 3 DataNodes as started.
For me, the reason was DNS.
My machines have a central DNS server, so at first I did not add the ip/host pairs to the /etc/hosts file, and after deployment the MapReduce server failed to start.
Then I added all the machines' ip/host pairs to the /etc/hosts file on each machine and restarted HDFS. The problem was resolved.
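For reference, a minimal sketch of what those entries can look like; the hostnames and addresses below are examples only, so replace them with your own nodes:
# run on every machine in the cluster (as root), then restart HDFS
cat >> /etc/hosts <<'EOF'
192.168.0.11  namenode1.example.com  namenode1
192.168.0.21  datanode1.example.com  datanode1
192.168.0.22  datanode2.example.com  datanode2
192.168.0.23  datanode3.example.com  datanode3
EOF
Afterwards, hdfs dfsadmin -report should list the datanodes as live.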


How to send logs from Google Stackdriver to Kafka

I see many docs and posts about how to send logs to Stackdriver, but almost no information about how to do the opposite: send logs from Stackdriver to Kafka.
In my case, our Ops want to collect the logs from our web servers using Google's Stackdriver agents and push them to Stackdriver. However, for my stream-processing needs I want to get the logs into Kafka to use its unparalleled ability to retain and reprocess data with any number of consumers, something that I cannot do with PubSub.
So, what are the options for doing this? I only see a couple of possible avenues, and neither sounds too good:
Based on this post (https://powerspace.tech/how-to-stream-data-from-google-pubsub-to-kafka-with-kafka-connect-dbef1c340a76): push the data into PubSub first, and then read from it using either the Kafka connector or my own Kafka consumer. I hate the thought of adding yet another hop (serialize/deserialize/ack/etc.) between the source of the data and Kafka.
I also noticed a brief mention in passing of adding a plugin to Google's version of Fluentd (which is what the Stackdriver log collection agent is based on) here: https://powerspace.tech/how-to-stream-data-from-google-pubsub-to-kafka-with-kafka-connect-dbef1c340a76 . Not many details, so it's hard to tell how involved this approach is.
Any other options?
Thank you!
Enter the Kafka console and add some messages there. Once you have added the messages in the Kafka console, you need to check whether they are reflected successfully in Cloud Shell. For this, run the command: gcloud pubsub subscriptions pull from-kafka --auto-ack --limit=10. It will take some time to sync with the Kafka console, so you will get the results after running this command a couple of times.
You will run the commands in the Cloud Shell and see the output in the Kafka VM SSH.
Now verify the exact opposite direction, where you run the command in the Kafka VM and see the output in Cloud Shell. It will take some time for the output to be reflected, and you may have to run gcloud pubsub subscriptions pull from-kafka --auto-ack --limit=10 a couple of times to see the output.
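For what it's worth, here is a minimal sketch of that round trip, assuming the Pub/Sub connector from the linked post is wired to forward a Kafka topic (called to-pubsub here, a made-up name) into the Pub/Sub subscription from-kafka; the broker address is also an assumption:
# on the Kafka VM: publish a test message to the topic the connector watches
echo "hello from kafka" | kafka-console-producer.sh --broker-list localhost:9092 --topic to-pubsub
# in Cloud Shell: pull it from the Pub/Sub subscription (may take a couple of tries)
gcloud pubsub subscriptions pull from-kafka --auto-ack --limit=10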
The Kafka plugin is deprecated. For more information, refer to https://cloud.google.com/stackdriver/docs/deprecations
Note: This functionality is only available for agents running on Linux. It is not available on Windows.
Kafka is monitored via JMX. Monitoring supports Kafka version 0.8.2 and higher.
On your VM instance, download kafka-082.conf from the GitHub configuration repository and place it in the directory /etc/stackdriver/collectd.d/:
(cd /etc/stackdriver/collectd.d/ && sudo curl -O https://raw.githubusercontent.com/Stackdriver/stackdriver-agent-service-configs/master/etc/collectd.d/kafka-082.conf)
The downloaded plugin configuration file assumes that your Kafka server is configured to accept JMX connections on port 9999. If you have configured Kafka with a different JMX port, as root, edit the file and follow the instructions to change the JMX port settings.
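For example, if your broker exposed JMX on port 9998 instead (a hypothetical port), the edit could be as simple as:
sudo sed -i 's/9999/9998/' /etc/stackdriver/collectd.d/kafka-082.conf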
After adding the configuration file, restart the Monitoring agent by running the following command:
sudo service stackdriver-agent restart
What is monitored:
https://cloud.google.com/monitoring/api/metrics_agent#agent-kafka

How to sync user directory on bitbucket server to jira with both running on aks?

When trying to sync the user directories of Jira to other Atlassian products (Confluence and Bitbucket Server running on AKS), a 403 error is returned.
While looking into this error, the following steps have been attempted:
https://confluence.atlassian.com/stashkb/unable-to-connect-to-jira-for-authentication-forbidden-403-323391874.html
The IP addresses have been added to the whitelist in Jira. The next step in the solutions found online is to restart the Jira service.
This, however, causes issues: after running the stop/start-jira.sh scripts inside the pod, the service comes back with none of the previous settings, and all configuration, including backups, is gone, taking us back to square one.
Cluster size (current set-up):
3 x Standard D8 v3 (8 vCPUs, 32 GiB memory) cluster on AKS
Used the following images, installed through the UI:
atlassian/jira-software
cptactionhank/docker-atlassian-jira
Steps to reproduce: exec into the pod, go to /opt/atlassian/jira/bin,
and run ./(start/stop)-jira.sh.
What happens is that when going back to the URL, the Jira instance is reset and all configuration files in the pod for the service are lost.
The pod logs commonly show exit code 137 when restarting.
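Exit code 137 generally means the container received SIGKILL, often from the OOM killer or an eviction; one way to confirm is a sketch like the following, where <jira-pod> and <namespace> are placeholders:
kubectl describe pod <jira-pod> -n <namespace>    # look under Last State: Terminated for Exit Code 137 and the Reason
kubectl get events -n <namespace> --sort-by=.lastTimestamp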
Update:
The following helm chart has also been used and achieved the same result:
https://github.com/int128/devops-kompose/tree/master/atlassian-jira-software

How to resolve the Marathon leader with mesos-dns

I've installed mesos-dns in our cluster and it is running OK. We can resolve the domains of the apps installed in Marathon, but I would like to know on which host Marathon itself is installed. If I dig marathon.domain, nothing is resolved.
According to the doc of mesos-dns: "A records ({framework}.domain) and SRV records (_framework._tcp.{framework}.domain) - for every known Mesos master"
Thanks.
It's marathon.mesos unless you've used a different TLD. The Marathon scheduler runs on the Master.
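So, assuming the default mesos TLD, a query like the following against your mesos-dns server should return the host running Marathon (the DNS server address below is just an example):
dig @10.10.0.5 marathon.mesos +short
dig @10.10.0.5 _marathon._tcp.marathon.mesos SRV +short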
You can use my mesosdns-resolver bash script to get the endpoint from Mesos DNS.
You can use it like:
mesosdns-resolver.sh -sn <service-name>.marathon.mesos -s <IP_ADDRESS_OF_MESOS_DNS_SERVER>

openshift pod fails and restarts frequently

I am creating an app in Origin 3.1 using my Docker image.
Whenever I create the app from the image, a new pod gets created, but it restarts again and again and finally ends up with the status "CrashLoopBackOff".
I analysed the logs for the pod, but they show no error; all the log data is as expected for a successfully running app. Hence, I am not able to determine the cause.
I came across the link below today, which says "running an application inside of a container as root still has risks, OpenShift doesn't allow you to do that by default and will instead run as an arbitrary assigned user ID."
What is CrashLoopBackOff status for openshift pods?
My image uses the root user only; what should I do to make this work? The logs show no error, but the pod keeps restarting.
Could anyone please help me with this?
You are seeing this because whatever process your image starts isn't a long-running process: it finds no TTY, the container just exits, and it gets restarted repeatedly, which is a "crash loop" as far as OpenShift is concerned.
Your Dockerfile mentions the following:
ENTRYPOINT ["container-entrypoint"]
What is this "container-entrypoint" actually doing? You need to check.
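One way to check, with the image name as a placeholder, is to inspect the image's entrypoint and print the script it points to:
docker inspect --format '{{json .Config.Entrypoint}}' <your-image>
docker run --rm --entrypoint /bin/sh <your-image> -c 'cat "$(command -v container-entrypoint)"'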
Did you use the -p or --previous flag to oc logs to see if the logs from the previous attempt to start the pod show anything?
The recommendation of Red Hat is to make files group owned by GID 0 - the user in the container is always in the root group. You won't be able to chown, but you can selectively expose which files to write to.
A second option:
In order to allow images that use either named users or the root (0) user to build in OpenShift, you can add the project’s builder service account (system:serviceaccount::builder) to the privileged security context constraint (SCC). Alternatively, you can allow all images to run as any user.
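As a sketch of that second option (the project name is a placeholder, and granting these SCCs has security implications, so treat this as an assumption about your setup rather than a recommendation):
# allow images in the project to run as any user, including root
oc adm policy add-scc-to-user anyuid -z default -n <your-project>
# or add the project's builder service account to the privileged SCC for builds
oc adm policy add-scc-to-user privileged -z builder -n <your-project>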
Can you see the logs using
kubectl logs <podname> -p
This should give you the errors explaining why the pod failed.
I was able to resolve this by creating a script "run.sh" with the following content:
#!/bin/bash
# loop forever so the container has a long-running foreground process
while :; do
  sleep 300
done
and in Dockerfile:
ADD run.sh /run.sh
RUN chmod +x /*.sh
CMD ["/run.sh"]
This way it works. Thanks, everybody, for pointing out the reason, which helped me find the resolution. But one doubt I still have: why does the process exit in OpenShift only in this case? I have tried running a Tomcat server in the same way, and it works fine without having the sleep in the script.

Cluster(Multi server) setup for zookeeper exhibitor

I'm a newbie to ZooKeeper. I'm trying to set up a clustered ZooKeeper ensemble managed by Exhibitor so I can modify the data. I have tried the server setup with 3 nodes, but data modifications are not reflected on all the ZooKeeper instances.
I referred to the following URL and also set up the servers in the same way, but to no avail; something is missing in that config to run it correctly.
http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkMulitServerSetup
Exhibitor startup command is:
java -jar exhibitor-war-1.0-jar-with-dependencies.jar -c file --nodemodification true --port 9090
Do I need to add any other config to this to get my data modifications reflected on all the ZooKeeper instances?
Thanks in advance for your kind time!
I have a similar scenario, and I'm on Windows, so I get some problems due to the fact that Exhibitor is Unix-oriented: it tries to restart zkServer.sh (instead of zkServer.bat). So I:
1. manually started the ZK ensemble (all instances get data modifications from each other);
2. set up Exhibitor on top of every ZK instance, with a single shared network config file.
Hope it helps. If not, give more details.
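For reference, a minimal sketch of the multi-server configuration described in the zookeeperAdmin guide linked above, assuming three hypothetical hosts zk1/zk2/zk3; every node gets the same server.N list plus its own myid file, and the paths are examples:
# on every node (adjust paths to your installation)
cat > /etc/zookeeper/conf/zoo.cfg <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
EOF
# on node 1 (use 2 and 3 on the other nodes)
echo 1 > /var/lib/zookeeper/myid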