Starting Druid to consume data from kafka - druid

Took the latest version of druid 0.16.0-incubating.
Had 2 question .
1) As mentioned in quick start , micro-quick-start doesnt work as it complains about no file jvm.config and main.config under /conf/druid/single-server/micro-quickstart/coordinator-overlord .
2) As micro qucik start failed i started to try with single-server-small.
Was trying to import data from kafka in single-server-small but unable to do so as it says extension is not loaded , by the way which i can see gets loaded in logs.
But i think my main problem is when ever i land up on 'Load data' Section on druid web page on localhost:8888 , it keeps me giving below error
"Failed to get overlord modules : Unable to determine destination for [/proxy/overlord/status]; is your coordinator/overlord running ?"
I can see my coordinator-overlord process up .
Any suggestions ?
Thanks

Delete your kafka logs and druid logs(the location set up in common.runtime.properties) and then try again.

Related

Unable to open ejabbered web dashboard localhost:5280

I am new to ejabberd and trying to play with it. I installed ejabbered following the instructions present at official doc page with username and password. I am able to start the ejabbered server and connect to it through Java using "Smack" API's.
However, when I try to open Web Dashboard at http://localhost:5280/admin/ and then login with admin user, It doesn't show up anything. When I checked logs, it shows following erros -
2020-03-13 21:03:17.618 [error] <0.1965.0>#ejabberd_http:apply_custom_headers:860 CRASH REPORT Process <0.1965.0> with 0 neighbours crashed with reason: bad argument in call to maps:from_list([html]) in ejabberd_http:apply_custom_headers/2 line 860
2020-03-13 21:03:17.619 [error] <0.535.0>#ejabberd_http:apply_custom_headers:860 Supervisor ejabberd_http_sup had child undefined started with {ejabberd_http,start_link,undefined} at <0.1965.0> exit with reason bad argument in call to maps:from_list([html]) in ejabberd_http:apply_custom_headers/2 line 860 in context child_terminated
Pleasse help me out with this. Thanks!
Yes, that problem was introduced in ejabberd 20.02, and it's fixed in the following versions.
You have two options:
Download ejabberd 20.01, that doesn't have this problem
Or download ejabberd 20.03 or above.

Unable to start zookeeper getting exception saying that unable to load

I am unable to start zookeeper in my local
getting exception
Error: Could not find or load main class software\kafka_2.12-2.3.0\libs\activation-1.1.1.jar;
I have got the same error.
Finally I found the reason.
I put kafka in a folder which it's name contains space like this
C:\Open source\kafka\kafka_2.12-2.3.0
Then I copied all kafka_2.12-2.3.0 to a new folder without space in name, such asc C:\kafka_2.12-2.3.0
and kafka can run
NOte that the command line is: zookeeper-server-start.bat ../../config/zookeeper.properties
NOT: zookeeper-server-start.bat config/zookeeper.properties
Here's my result

Mappers fail for pig to insert data into MongoDB

I am trying to import a file from HDFS to MongoDB using MongoInsertStorage with PIG. The files are large, around 5GB. The script runs fine when I run it in local mode with
pig -x local example.pig
However if I run it in the mapreduce mode, Most of the mappers fail with the following error:
Error: com.mongodb.ConnectionString.getReadConcern()Lcom/mongodb/ReadConcern;
Container killed by the ApplicationMaster.
Container killed on request.
Exit code is 143 Container exited with a non-zero exit code 143
Can someone help me solve this issue?? I also increased the memory allocated to YARN containers but that hasnt helped.
Some mappers are also timing out after 300 seconds.
Pig Script is as follows
REGISTER mongo-java-driver-3.2.2.jar
REGISTER mongo-hadoop-core-1.4.0.jar
REGISTER mongo-hadoop-pig-1.4.0.jar
REGISTER mongodb-driver-3.2.2.jar
DEFINE MongoInsertStorage com.mongodb.hadoop.pig.MongoInsertStorage();
SET mapreduce.reduce.speculative true
BIG_DATA = LOAD 'hdfs://example.com:8020/user/someuser/sample.csv' using PigStorage(',') As (a:chararray,b:chararray,c:chararray);
STORE BIG_DATA INTO 'mongodb://insert.some.ip.here:27017/test.samplecollection' USING MongoInsertStorage('', '')
Found a solution.
For the error
Error: com.mongodb.ConnectionString.getReadConcern()Lcom/mongodb/ReadConcern;
Container killed by the ApplicationMaster.
Container killed on request.
Exit code is 143 Container exited with a non-zero exit code 143
I changed the JAR versions - hadoopcore and hadooppig from 1.4.0 to 2.0.2 and for Mongo Java driver from 3.2.2 to 3.4.2. This eliminated the ReadConcern Error on the mappers!
For the timeout, I added this after registering the jars:
SET mapreduce.task.timeout 1800000
I had been using SET mapred.task.timeout which didnt work
Hope this helps anyone who has a similar issue!

Failed to load kafka module

I am trying out the nxlog kafka out module from below
Link
I get the below error message
ERROR Failed to load module from /usr/local/libexec/nxlog/modules/output/om_kafka.so, /usr/local/libexec/nxlog/modules/output/om_kafka.so: undefined symbol: rd_kafka_topic_new;DSO load failed
ERROR module 'outKafka' is not declared at /usr/local/etc/nxlog/nxlog.conf:65
ERROR route tcproute is not functional without output modules, ignored at /usr/local/etc/nxlog/nxlog.conf:65
I am using :
Nxlog Version - nxlog-ce-2.8.1248
Kafka Version - kafka_2.9.2-0.8.1.1
Latest librdkafka
Also the example programe of librdkafka (rdkafka) for both producer and consumer runs fine so I guess the environment is set correct for librdkafka ,
but am unable to determine what's causing this problem.
The problem is that om_kafka.so is not linked with librdkafka.
You will need this in Makefile.am:
om_kafka_la_LIBADD = $(LIBRDKAFKA) $(LIBNX)
The value of $(LIBRDKAFKA) should be properly set ,normally this is done in configure.in. Otherwise you could just use the full path to the library (.so or .la or .a )

Error running hadoop application in Eclipse on Windows

I'm trying to set up an Eclipse environment for developing and debugging hadoop. I'm following Tom White's Definitive Hadoop 3rd ed. What I would like to do is get the MaxTemperature app working locally on my Windows within Eclipse before moving it to my Hortonworks sandbox VM. The comment on page 158 about using the local job runner seems to be what I want. I don't want to set up a full hadoop implementation on Windows. I'm hoping with the right config params I can convince it to run as a java application inside Eclipse.
Windows: 7
Eclipse: Luna
Hadoop: 2.4.0
JDK: 7
When I set the Run configuration for MaxTemperatureDriver (Source code on page 157) to
inputfile outputdir foo (deliberate bogus 3rd parameter)
I get the usage message so I know I'm running my program with those params.
If I remove the bogus third param I get
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at mark.MaxTemperatureDriver.run(MaxTemperatureDriver.java:52)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at mark.MaxTemperatureDriver.main(MaxTemperatureDriver.java:56)
I've tried inserting -conf but it seems to be ignored. There is no error message if I specify a nonexistent path.
I've tried inserting -fs file:/// -jt local, but it makes no difference
I've tried inserting -D mapreduce.framework.name=local
I've tried specifying the input and output with the file: format
Note. I'm not asking about how to configure eclipse to connect to a remote Hadoop installation. I want the application to run within eclipse.
Is this possible? Any ideas?
Additional info:
I turned on debugging. I saw:
582 [main] DEBUG org.apache.hadoop.mapreduce.Cluster - Trying ClientProtocolProvider : org.apache.hadoop.mapred.YarnClientProtocolProvider
583 [main] DEBUG org.apache.hadoop.mapreduce.Cluster - Cannot pick org.apache.hadoop.mapred.YarnClientProtocolProvider as the ClientProtocolProvider - returned null protocol
I'm wondering not why YarnClientProtocolProvider failed, but why it didn't try LocalClientProtocolProvider.
New info:
It seems that this is an issue with Hadoop 2.4.0. I recreated my environment with Hadoop 1.2.1, followed the instructions in
http://gerrymcnicol.com/index.php/2014/01/02/hadoop-and-cassandra-part-4-writing-your-first-mapreduce-job/
added the Windows hack from
http://bigdatanerd.wordpress.com/2013/11/14/mapreduce-running-mapreduce-in-windows-file-system-debug-mapreduce-in-eclipse
and it all started working.
Following blog will be useful.
Running mapreduce in Windows filesystem