Alright, this is annoying! I am new to Hadoop, and I am trying to find a decent alternative to the basic HDFS web interface. I tried the Hadoop Eclipse plugin, but it seems outdated already and it's a pain to set up correctly. I have Cloudera's distribution installed and I heard about Cloudera Desktop, but it's no longer available. Can anybody suggest a decent alternative to the HDFS web interface where I can easily download and upload files to HDFS via a GUI? P.S. I am running everything locally, no cluster involved. I've tried hard to find something, but nothing seems to point in the right direction.
You can use WebHDFS, whose REST API supports the complete FileSystem interface for HDFS: http://hadoop.apache.org/docs/r1.0.4/webhdfs.html
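For example, reading a file over WebHDFS is a single GET request in a local pseudo-distributed setup. This is a minimal Scala sketch, not a definitive recipe: the host, port, file path, and user name are placeholders for your own setup, and it assumes dfs.webhdfs.enabled is true.

```scala
// Minimal sketch: read an HDFS file through the WebHDFS REST API.
// Assumes a local pseudo-distributed setup with dfs.webhdfs.enabled=true;
// host, port, path, and user.name below are placeholders.
import scala.io.Source

object WebHdfsRead {
  def main(args: Array[String]): Unit = {
    val url = "http://localhost:50070/webhdfs/v1/user/hduser/sample.txt" +
      "?op=OPEN&user.name=hduser"
    // The NameNode answers OPEN with a redirect to a DataNode;
    // the HttpURLConnection used under the hood follows it for GET requests.
    val contents = Source.fromURL(url).mkString
    println(contents)
  }
}
```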
Or you can integrate Hadoop with Hoop (HDFS over HTTP), which is used to access HDFS via the HTTP protocol. Hoop provides access to all Hadoop Distributed File System (HDFS) operations (read and write) over HTTP/S. For more details, please refer to:
http://bigobject.blogspot.in/2013/03/hoop-https-over-hdfs.html
Alternatively, you can use HttpFS (which grew out of Hoop and exposes the same REST API) as an option instead of Hoop:
http://bigobject.blogspot.in/2013/03/apache-hadoop-httpfs-service-that.html
Related
Our application reads data from several HDFS data folders. The folders get updated weekly/daily/monthly, so based on the update period we need to find the latest path and then read the data.
We would like to do this programmatically in Scala, so are there libraries available?
We could only find the Hadoop FileSystem API, but we're wondering whether better libraries are available:
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/package-summary.html
The linked package is the recommended way to use the HDFS API programmatically without going through the hadoop fs CLI scripts. Any other library you may find would be built on top of that same package. A rough sketch of the "find the latest folder" step is below.
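This minimal sketch picks the most recently modified subfolder under a base path; the base path /data/events is a made-up example, and in practice you would point it at your weekly/daily/monthly parent folder.

```scala
// Sketch: find the newest subfolder under a base path using the
// org.apache.hadoop.fs API. The base path is a hypothetical example.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object LatestPath {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()          // reads core-site.xml/hdfs-site.xml
    val fs   = FileSystem.get(conf)

    val latest = fs.listStatus(new Path("/data/events"))
      .filter(_.isDirectory)
      .maxBy(_.getModificationTime)         // most recently updated folder

    println(s"Latest path: ${latest.getPath}")
    // ...open files under latest.getPath and read as usual
  }
}
```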
I want to activate a Kafka Spark pipeline for the ThingsBoard platform (Community Edition).
As per the Stack Overflow question "Couldn't able to find plugins in ThingsBoard 2.0.3 Home screen", it's been said that we can do this via rule chains, since the plugins section has been removed, but I am not able to understand how to configure it using rule chains. I can't find complete documentation on configuring Kafka via rule chains, so I need help with that.
I figured it out. It can be done easily by following this link: https://thingsboard.io/docs/samples/analytics/kafka-streams/
The thing is that with ThingsBoard CE we can get data into a Kafka topic. However, to fetch data from Kafka back into ThingsBoard you will need the integration feature of ThingsBoard Professional Edition.
The alternative to ThingsBoard PE is to write your own REST API script to push the insights back to ThingsBoard, along the lines of the sketch below.
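Pushing an insight back amounts to one HTTP POST against the device telemetry endpoint described in the ThingsBoard docs. In this Scala sketch the host, port, access token, and payload are all placeholders.

```scala
// Sketch: push a computed insight back into ThingsBoard via its device
// HTTP API (POST /api/v1/{accessToken}/telemetry). Host, token, and the
// payload below are placeholders for your own device and data.
import java.net.{HttpURLConnection, URL}

object PushInsight {
  def main(args: Array[String]): Unit = {
    val token = "YOUR_DEVICE_ACCESS_TOKEN"  // placeholder
    val url   = new URL(s"http://localhost:8080/api/v1/$token/telemetry")
    val conn  = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    conn.getOutputStream.write("""{"forecast": 0.82}""".getBytes("UTF-8"))
    println(s"ThingsBoard responded: ${conn.getResponseCode}")
    conn.disconnect()
  }
}
```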
Does it make sense to use Knox (+LDAP) as an authentication proxy for an application that is not using Hadoop at all?
I'm new to this domain and have heard about this possibility, but I don't quite get it. Maybe there are some viable alternatives?
Absolutely! You can use Apache Knox without Hadoop. I use it to secure my Raspberry Pi Motion setup; this is a link to the blog post (it is still a work in progress, but you'll get the idea): Securing Raspberry Pi Security Camera (or UIs) using Apache Knox. From the client's point of view it looks like the sketch below.
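To make the "authentication proxy" idea concrete, here is a hedged client-side sketch: the client only ever talks to the Knox gateway, which authenticates the Basic credentials against LDAP and forwards the request to the backing app. The gateway host, topology name ("sandbox"), and service path ("myapp") are placeholders for your own deployment.

```scala
// Client-side sketch of calling an app behind Knox. Knox checks the Basic
// credentials against LDAP and proxies the request to the backing service.
// Gateway host, topology ("sandbox"), and service path ("myapp") are
// placeholders; a real gateway typically uses TLS with a trusted certificate.
import java.net.{HttpURLConnection, URL}
import java.util.Base64

object ThroughKnox {
  def main(args: Array[String]): Unit = {
    val url   = new URL("https://knox-host:8443/gateway/sandbox/myapp/")
    val conn  = url.openConnection().asInstanceOf[HttpURLConnection]
    val creds = Base64.getEncoder.encodeToString("user:password".getBytes("UTF-8"))
    conn.setRequestProperty("Authorization", s"Basic $creds")
    println(s"Status: ${conn.getResponseCode}")  // 401 if LDAP rejects the user
    conn.disconnect()
  }
}
```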
So I've read the documentation for enabling plugins, but there doesn't seem to be a way to add new plugins other than the ones already packaged.
I've tried grabbing the Google Cloud Storage connector jar and putting it in the 3rd-party folder before starting Drill, but Drill doesn't seem to pick it up. I get an error that it doesn't recognize the protocol (gs://, as opposed to s3://, which obviously works).
Has anyone managed to do this? There seems to be zero information on getting this working, although Drill does claim to be able to work with Google Cloud services.
Is it possible to build a big data application in the cloud with Red Hat's PaaS OpenShift? I'm looking at how to build a Scala application in the cloud with Hadoop (HDFS), Spark, and Apache Mahout, but I can't find anything about it. I've seen something with Hortonworks, but nothing clear about how to install it in an OpenShift environment, or how to add an HDFS node in the cloud either.
It's possible on Amazon, but my question is: is it possible on OpenShift?
It really depends on what you're ultimately trying to achieve. I know you mention building a big data application on OpenShift with Scala, but what will the application ultimately be doing?
I've gotten Hadoop running in a gear before, but if you want a better example, check out this quickstart to get an idea of how it's done: https://github.com/ryanj/flask-hbase-todos. I know it's not Scala, but here's a good article that shows how to put together a Scala app: https://www.openshift.com/blogs/building-distributed-and-event-driven-applications-in-java-or-scala-with-akka-on-openshift
What will the application ultimately be doing?
Forecasting football match results for several football leagues: a web application (Ruby), plus statistical computation and data mining calculations in Scala with Apache frameworks (Spark & Mahout). We get the data via CSV files, process it, and save it in a NoSQL DB (Cassandra). And all of this in the cloud (OpenShift); that's the idea.
I've seen the info at https://github.com/ryanj/flask-hbase-todos. I'll try it this way, but with Scala. The CSV-to-Cassandra step would look roughly like the sketch below.
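Under those assumptions, here is a rough Spark sketch of the ingestion step. It assumes the spark-cassandra-connector is on the classpath; the file path, keyspace, and table names are made up for illustration.

```scala
// Hedged sketch of the CSV -> Cassandra step with Spark, using the
// spark-cassandra-connector. File path, keyspace, and table name are
// placeholders; the connector jar must be on the classpath.
import org.apache.spark.sql.SparkSession

object CsvToCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-to-cassandra")
      .config("spark.cassandra.connection.host", "127.0.0.1") // Cassandra node
      .getOrCreate()

    // Read the raw match results; header/inferSchema depend on your files
    val matches = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/matches/*.csv")                             // placeholder path

    // Persist into Cassandra for the forecasting jobs to consume
    matches.write
      .format("org.apache.spark.sql.cassandra")
      .option("keyspace", "football")                         // placeholder keyspace
      .option("table", "match_results")                       // placeholder table
      .mode("append")
      .save()

    spark.stop()
  }
}
```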