WebHDFS two-step upload of a file - REST

I built a Hadoop cluster with 4 machines:
{hostname}: {ip-address}
master: 192.168.1.60
slave1: 192.168.1.61
slave2: 192.168.1.62
slave3: 192.168.1.63
I use HttpFS to upload a file to HDFS in a RESTful way; the task takes two steps.
Step 1: Submit an HTTP POST request without automatically following redirects and without sending the file data.
curl -i -X POST "http://192.168.1.60:50070/webhdfs/v1/user/haduser/myfile.txt?op=APPEND"
The server returns a result like:
Location: http://slave1:50075/webhdfs/v1/user/haduser/myfile.txt?op=CREATE&user.name=haduser&namenoderpcaddress=master:8020&overwrite=false
Step 2: Use the address from the response to upload the file.
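For reference, step 2 re-issues the request to the returned Location with the file contents attached; for the CREATE redirect shown above that is a PUT per the WebHDFS REST API (for APPEND it would be a POST), roughly:
curl -i -X PUT -T myfile.txt "http://slave1:50075/webhdfs/v1/user/haduser/myfile.txt?op=CREATE&user.name=haduser&namenoderpcaddress=master:8020&overwrite=false"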
In step 1, how can I get the datanode's IP address (192.168.1.61) rather than the hostname (slave1)?

If your Hadoop version is >= 2.5, edit the ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml file on every datanode and add the property dfs.datanode.hostname, with that datanode's IP address as the value.
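For example, on slave1 the hdfs-site.xml entry would look roughly like this (repeat on each datanode with that node's own address, and restart the datanode afterwards):
<property>
  <name>dfs.datanode.hostname</name>
  <value>192.168.1.61</value>
</property>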

Related

Burrow integration with MSK Kafka

I am trying to connect Burrow to AWS MSK Kafka. I keep receiving the message below. I am able to connect to MSK from the same EC2 instance by following the documented steps; however, Burrow is not able to connect. We need to specify the truststore, which I am not able to set in Burrow. Any help would be appreciated.
client has run out of available brokers
An AWS support ticket helped me solve the issue. My client-to-broker connection was TLS, while the steps mentioned by AWS refer to PLAINTEXT. Here is what you need to do to make it work.
Run the following command to copy the cacerts file to the current directory:
cp /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.amzn2.0.1.x86_64/jre/lib/security/cacerts .
Note: the JVM path might be different for your instance.
Please note the path of this newly created cacerts file by running the pwd command. This path (say P1) will be used in the next steps.
Add additional configuration for TLS in the file /home/ec2-user/go/src/github.com/linkedin/Burrow/config/burrow.toml by adding the following details:
===========
[client-profile.test]
client-id="burrow-test"
kafka-version="0.10.0"
tls="mytlsprofile"
[tls.mytlsprofile]
cafile="P1/cacerts"
noverify=true
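The TLS settings only take effect where the client profile is referenced; as a rough sketch (the section names and broker address below are placeholders, and your burrow.toml will already contain its own cluster/consumer sections), the profile is wired in like this:
[cluster.msk]
class-name="kafka"
servers=[ "b-1.mycluster.abc123.kafka.us-east-1.amazonaws.com:9094" ]
client-profile="test"

[consumer.msk]
class-name="kafka"
cluster="msk"
servers=[ "b-1.mycluster.abc123.kafka.us-east-1.amazonaws.com:9094" ]
client-profile="test"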

How to Use Command Line Parameters in JMeter

I'm using JMeter to test APIs and I want to parameterize the project's path from the terminal and then use this parameter in JMeter.
The parameter that I've sent via the command line:
./jmeter -n -t your_script.jmx -Jurl=abcdef.com
The parameter that I've used in User Defined Variables:
${__P(url)}
But when I run my automation in JMeter, my test scripts do not go to the URL that's been defined. When I check the request body, I see POST https://1 as the URL.
Please see the attached photos.
https://mylifebox.com/shr/3df5bb35-cf43-4488-b20b-5c2d59656212&language=en
Let's start clean:
In the User Defined Variables, configure a variable with the name url and the value ${__P(url,)}
In the HTTP Request sampler (or, even better, HTTP Request Defaults) put ${url} into the "Server Name or IP" field.
Run your test in command-line non-GUI mode like:
jmeter -n -t your_script.jmx -Jurl=abcdef.com -f -l result.jtl
Mind the -f argument, which tells JMeter to overwrite the existing results file (it might be that you're looking at "old" results where the url property value started with 1).
That's it, you should see the HTTP Request sampler making a call to abcdef.com in the .jtl results file. And if you change the url parameter, you will see the impact in the .jtl results file.
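As a side note, the second argument of the __P function is a default value used when the property is not passed on the command line, so for example ${__P(url,localhost)} would fall back to localhost if -Jurl is omitted.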
Put ${__P(url)} inside the Server Name field in the HTTP Request.
Domain name or IP address of the web server, e.g. www.example.com. [Do not include the http:// prefix.] Note: If the "Host" header is defined in a Header Manager, then this will be used as the virtual host name.
Don't use User Defined Variables

Read file created in HDFS with Livy

I am using Livy to run the wordcount example by creating a jar file, which works perfectly fine and writes its output to HDFS. Now I want to get the result back to my HTML page. I am using Spark Scala, sbt, HDFS and Livy.
The GET /batches REST API only shows the log and state.
How do I get output results?
Or how can I read a file in HDFS using REST API in Livy? Please help me out with this.
Thanks in advance.
If you check the status of the batches using curl, you will get the status of the Livy batch job, which will come back as Finished (if the Spark driver has launched successfully).
To read the output:
1. You can SSH (e.g. using paramiko) to the machine where HDFS is running and run hdfs dfs -ls / to check the output and perform your desired tasks.
2. Using the Livy REST API, you need to write a script which does step 1; that script can be called through a curl command to fetch the output from HDFS, but in this case Livy will launch a separate Spark driver and the output will come in the STDOUT of the driver logs.
curl -vvv -u : :/batches -X POST --data '{"file": "http://"}' -H "Content-Type: application/json"
The first one is the sure way of getting the output, though I am not 100% sure how the second approach will behave.
You can use WebHDFS in your REST call. Get WebHDFS enabled first by your admin.
Use the WebHDFS URL
Create an HttpURLConnection object
Set the request method to GET
Then use a BufferedReader over getInputStream() to read the contents.
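A minimal sketch of that approach (the NameNode host, port, file path and user below are placeholders), using the WebHDFS OPEN operation:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsRead {
    public static void main(String[] args) throws Exception {
        // op=OPEN reads a file; the NameNode replies with a redirect to a
        // DataNode, which HttpURLConnection follows automatically.
        URL url = new URL("http://namenode-host:50070/webhdfs/v1/user/hduser/output/part-00000"
                + "?op=OPEN&user.name=hduser");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // file contents, line by line
            }
        } finally {
            conn.disconnect();
        }
    }
}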

Why does OpenShift interfere with my redirects?

I configured both europe.example.org and example.eu as domain aliases in OpenShift.
When example.eu is called (e.g. via curl -i http://example.eu), my OpenShift app's logic sends this HTTP Location header in order to perform a redirect:
Location: http://europe.example.org/?from=example.eu
However, OpenShift interferes with what I send, actually sending the following instead:
Location: http://example.eu/?from=example.eu
This creates an infinite redirect-loop.
How can I stop OpenShift from doing that and instead have it pass on what my app actually sends?
Try:
Location: http://#europe.example.org/?from=example.eu

How to check HTTP response code in zabbix?

I have a Zabbix server 2.2 and a few Linux hosts with websites. How can I get a notification from Zabbix if the HTTP(S) response code is not 200?
I've tried these triggers without any success:
{owncloud:web.test.rspcode[Availability of owncloud,owncloud availability].last(,10)}#200
{owncloud:web.test.error[Availability of owncloud].count(10,200)}<1
{owncloud:web.test.error[Availability of owncloud].last(#1,10)}=200
But nothing works. I never got a notification that the code is no longer 200, even when it was 404 because I had renamed ownCloud's index.php to index2.php.
I configured the Application and the Web Scenario as follows:
If you have already configured the host, go to step 1.
1) Select the host via Configuration -> Host groups -> select host (for example, server 1)
2) Go to Config > Hosts > [Host Created Above] > Applications and click on Create Application
3) Now you have to create the Web Scenario with the status code check; in my case I checked for status code 200. So go to Configuration > Hosts > [Host Created Above] > Web Scenarios and click on Create Web Scenario.
Remark: you have to select the application created in step 2.
4) After that, without clicking the Add button, go to the Steps window and configure the host and parameters for the check. After that, click Add. In my case I check for a status code 200 response to the HTTP request.
I found the issue. You need to specify the URL to check including the file. For example, like this in your web scenario:
https://owncloud.example.com/index.php
"Note that Zabbix frontend uses JavaScript redirect when logging in, thus first we must log in, and only in further steps we may check for logged-in features. Additionally, the login step must use full URL to index.php file." - https://www.zabbix.com/documentation/2.4/manual/web_monitoring/example
I also used following expression as trigger:
{owncloud:web.test.fail[Availability of owncloud].last()}>0
You have to set a trigger by expression:
{host name:web.test.rspcode[Scenario name,Steps name].last()}=200
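For the original goal (being notified when the response code is not 200), the comparison has to be inverted; with the Zabbix 2.x syntax used in the question that would be something like:
{owncloud:web.test.rspcode[Availability of owncloud,owncloud availability].last()}#200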
The question has been answered adequately, but I will provide a much more advanced solution that you can use for all HTTP status codes.
I've created an item that monitors all HTTP status codes of a proxy and graphs them, and then set up several types of triggers to watch the last value and counts in the last N minutes.
The regex I used to extract all the values from an Nginx or Apache access log is:
^(\S+) (\S+) (\S+) \[([\w:\/]+\s[+\-]\d{4})\] \"(\S+)\s?(\S+)?\s?(\S+)?\" (\d{3}|-) (\d+|-)\s?\"?([^\"]*)\"?\s?\"?([^\"]*)\"?\s
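One way to feed that regex into Zabbix is a log item on the host (note: this assumes a "Zabbix agent (active)" item and an Nginx log path, neither of which is spelled out above); the \8 output template returns the eighth capture group, i.e. the status code:
log[/var/log/nginx/access.log,"<regex above>",,,skip,\8]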
I then set many triggers relevant to my particular situation:
101 Switching Protocols
301 Moved Permanently
302 Redirect
304 Not Modified
400 Bad Request
401 Unauthorised
403 Forbidden
404 Not Found
500 Server Error
It's also important that your Zabbix agent has permission to read the log file on the host. You can add the zabbix agent user to the www-data group using this command:
$ sudo usermod -a -G www-data zabbix
See the tutorial for all the steps in greater detail.