Using Scala, I want to read a file from HDFS where Kerberos security is enabled. Any suggestions on how to start?
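A common starting point is to log in from a keytab via Hadoop's UserGroupInformation and then use the regular FileSystem API. Below is a minimal sketch assuming a hypothetical principal, keytab, and file path, and assuming core-site.xml and hdfs-site.xml are on the classpath:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

object KerberosHdfsRead {
  def main(args: Array[String]): Unit = {
    // Loads core-site.xml / hdfs-site.xml from the classpath if present.
    val conf = new Configuration()
    conf.set("hadoop.security.authentication", "kerberos")

    // Principal and keytab path are placeholders - replace with your own.
    UserGroupInformation.setConfiguration(conf)
    UserGroupInformation.loginUserFromKeytab(
      "user@EXAMPLE.COM", "/etc/security/keytabs/user.keytab")

    val fs = FileSystem.get(conf)
    val in = fs.open(new Path("/tmp/sample.txt")) // placeholder HDFS path
    try {
      scala.io.Source.fromInputStream(in).getLines().foreach(println)
    } finally {
      in.close()
    }
  }
}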
I am discovering the tool and I have some questions:
- What exactly do you mean by the type File in (Source, Sink)?
- Is it also possible to send the result of the pipeline directly to an FTP server?
I checked the documentation, but I did not find this information.
Thank you.
Short answer: File refers to the filesystem where the pipelines run. In the Data Fusion context, if you are using the File sink, the contents will be written to HDFS on the Dataproc cluster.
Data Fusion has an SFTP Put action that can be used to write to SFTP. Here is a simple pipeline showing how to write to SFTP from GCS.
Step 1: GCS Source to File Sink - this writes the contents of GCS to HDFS on Dataproc when the pipeline is run.
Step 2: SFTP Put action, which takes the output of the File sink and uploads it to SFTP.
You need to configure the output path of the File sink to be the same as the source path in the SFTP Put action.
I followed the tutorial from here on how to install and use Zeppelin. I also created users with passwords by specifying them in the conf/shiro.ini file.
The problem, however, is that a user can write this simple script to see the contents of the shiro file.
%python
import os
os.system("cat <path_to_zeppelin_folder>conf/shiro.ini")
My question is, how can I prevent the user from seeing this file, as this file is accessed by the Zeppelin program and therefore I can't just make it unreadable by removing read permissions.
What I am doing now is to remove read/write permissions of the shiro.ini file after starting Zeppelin, but there should be a more elegant way of preventing such a thing.
We have implemented contract testing using message pact, directly accessing Kafka topics to retrieve the messages from the queues. The Kafka topics can be accessed using PLAINTEXT authentication, so we have a separate LoginModule defined in a config file with a username and password. When I run the test from the consumer end, it picks up the correct config file and the scripts run. But when I run pact:verify using the same setting in the script, the LoginModule is not recognized and I get the error "unable to find LoginModule class". On the pact side I get the error "Failed to invoke provider method". Has anyone faced such issues using pact with Kafka before?
Are you talking about this one: github.com/reevoo/pact-messages? If so, we are not currently supporting pact-messages, as we have yet to finalize the base-level tech with HTTP/JSON.
This has been brought up in the past and is known within the Foundation, but we'd rather lock down the core technology before trying to tackle other message protocols/formats.
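For context, the consumer setup the question describes (a LoginModule with username and password for SASL/PLAIN access) typically looks roughly like the sketch below. All values are placeholders, and supplying the JAAS entry inline via sasl.jaas.config is just one way to make the LoginModule available without a separate config file on the verification JVM's classpath.

import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer

// Sketch only: broker, credentials, group id and topic are placeholders.
val props = new Properties()
props.put("bootstrap.servers", "broker.example.com:9092")
props.put("group.id", "pact-provider-verify")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("security.protocol", "SASL_PLAINTEXT")
props.put("sasl.mechanism", "PLAIN")
// Inline JAAS entry, so no -Djava.security.auth.login.config file is needed.
props.put("sasl.jaas.config",
  "org.apache.kafka.common.security.plain.PlainLoginModule required " +
  "username=\"myuser\" password=\"mypassword\";")

val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Collections.singletonList("my-topic"))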
I need some advice for the following problem:
I have a Spark cluster with Cassandra.
I need to write a Spark job (using Scala) to extract some information from Cassandra. I need to generate a file with the result and put it on another server (where there is no Spark).
My question is: what is the best solution for that?
1. Generate the file on the same server as Spark and then scp it to my destination server?
2. Is there another way to generate the file directly on my destination server?
Thanks.
A better way to do this would be to compute the results, store them in a directory in HDFS (on the server with Spark), and NFS-mount that directory to a path on your destination server (the server without Spark).
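For example, a rough sketch of that approach using the spark-cassandra-connector, runnable from spark-shell (keyspace, table, connection host, and output path are all placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("CassandraExtract")
  .config("spark.cassandra.connection.host", "cassandra-host") // placeholder
  .getOrCreate()

// Read the Cassandra table through the connector's data source.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
  .load()

// Write the result as CSV into an HDFS directory; that directory can then
// be NFS-mounted on (or scp'd to) the destination server.
df.write
  .option("header", "true")
  .csv("hdfs:///user/spark/output/cassandra_extract")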
Let me know if this helped. Cheers.
I'm trying to import data into my Analytics for Apache Hadoop instance using Hadoop shell commands.
The Analytics for Apache Hadoop Bluemix documentation provides a link to the BigInsights documentation in the related links section. The link is: http://www.ibm.com/support/knowledgecenter/SSPT3X_4.0.0/com.ibm.swg.im.infosphere.biginsights.welcome.doc/doc/welcome.html
I navigated to the following page: http://www-01.ibm.com/support/knowledgecenter/SSPT3X_4.0.0/com.ibm.swg.im.infosphere.biginsights.import.doc/doc/data_rest_shell.html, where instructions are given for using the hadoop fs command line.
I tried the following command (replacing the hostname with my instance name):
hadoop fs -mkdir hdfs://test.ibm.com:9000/<TargetDirPath>
However, the command timed out.
--
Question: Can I use the hadoop fs command as described in the BigInsights documentation with Analytics for Apache Hadoop?
It isn't possible to use the Hadoop shell commands.
The next best thing is to use the REST API, webHDFS. The REST API is documented here: https://www.ng.bluemix.net/docs/services/AnalyticsforHadoop/index.html#analyticsforhadoop_data
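For illustration, a webHDFS MKDIRS call is an HTTP PUT against /webhdfs/v1/<path>?op=MKDIRS. A rough Scala sketch using plain HttpURLConnection is below; the gateway host, port, target path, and basic-auth credentials are placeholders, so check the documentation above for the exact endpoint and authentication your instance requires.

import java.net.{HttpURLConnection, URL}
import java.util.Base64

// Host, port, target path and credentials are placeholders.
val endpoint = "https://your-gateway-host:8443/webhdfs/v1/user/demo/newdir?op=MKDIRS"
val conn = new URL(endpoint).openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestMethod("PUT")

val credentials = Base64.getEncoder.encodeToString("username:password".getBytes("UTF-8"))
conn.setRequestProperty("Authorization", s"Basic $credentials")

println(s"MKDIRS response: ${conn.getResponseCode} ${conn.getResponseMessage}")
conn.disconnect()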