Error running hadoop application in Eclipse on Windows - eclipse

I'm trying to set up an Eclipse environment for developing and debugging hadoop. I'm following Tom White's Definitive Hadoop 3rd ed. What I would like to do is get the MaxTemperature app working locally on my Windows within Eclipse before moving it to my Hortonworks sandbox VM. The comment on page 158 about using the local job runner seems to be what I want. I don't want to set up a full hadoop implementation on Windows. I'm hoping with the right config params I can convince it to run as a java application inside Eclipse.
Windows: 7
Eclipse: Luna
Hadoop: 2.4.0
JDK: 7
When I set the Run configuration for MaxTemperatureDriver (Source code on page 157) to
inputfile outputdir foo (deliberate bogus 3rd parameter)
I get the usage message so I know I'm running my program with those params.
If I remove the bogus third param I get
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at mark.MaxTemperatureDriver.run(MaxTemperatureDriver.java:52)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at mark.MaxTemperatureDriver.main(MaxTemperatureDriver.java:56)
I've tried inserting -conf but it seems to be ignored. There is no error message if I specify a nonexistent path.
I've tried inserting -fs file:/// -jt local, but it makes no difference
I've tried inserting -D mapreduce.framework.name=local
I've tried specifying the input and output with the file: format
Note. I'm not asking about how to configure eclipse to connect to a remote Hadoop installation. I want the application to run within eclipse.
Is this possible? Any ideas?
Additional info:
I turned on debugging. I saw:
582 [main] DEBUG org.apache.hadoop.mapreduce.Cluster - Trying ClientProtocolProvider : org.apache.hadoop.mapred.YarnClientProtocolProvider
583 [main] DEBUG org.apache.hadoop.mapreduce.Cluster - Cannot pick org.apache.hadoop.mapred.YarnClientProtocolProvider as the ClientProtocolProvider - returned null protocol
I'm wondering not why YarnClientProtocolProvider failed, but why it didn't try LocalClientProtocolProvider.
New info:
It seems that this is an issue with Hadoop 2.4.0. I recreated my environment with Hadoop 1.2.1, followed the instructions in
http://gerrymcnicol.com/index.php/2014/01/02/hadoop-and-cassandra-part-4-writing-your-first-mapreduce-job/
added the Windows hack from
http://bigdatanerd.wordpress.com/2013/11/14/mapreduce-running-mapreduce-in-windows-file-system-debug-mapreduce-in-eclipse
and it all started working.

Following blog will be useful.
Running mapreduce in Windows filesystem

Related

Use Kafka Connect with jcustenborder / kafka-connect-twitter

I'm trying to use Kafka Connect with kafka-connect-twitter from jcustenborder in Github to introduce Twitter tweets into Kafka. The instructions say:
mvn clean package
export CLASSPATH="$(find target/ -type f -name '*.jar'| grep '\-package' | tr '\n' ':')"
$CONFLUENT_HOME/bin/connect-standalone connect/connect-avro-docker.properties config/TwitterSourceConnector.properties
The export CLASSPATH line in fact does not work and returns nothing when run. The connect avro docker properties file seems to want to use the jars available in target/kafka-connect-target/usr/share/kafka-connect after running mvn clean package in the kafka-connect-twitter repository.
When I run
connect-standalone.sh connect-avro-docker.properties TwitterSourceConnector.properties in the directory where these two .properties are present, since connect-standalone.sh is in the path, I get the error:
2021-11-12 18:22:05,267] ERROR Stopping due to error (org.apache.kafka.connect.cli.ConnectStandalone:126)
org.apache.kafka.common.config.ConfigException: Invalid value io.confluent.connect.avro.AvroConverter for configuration key.converter: Class io.confluent.connect.avro.AvroConverter could not be found.
at org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:744)
at org.apache.kafka.common.config.ConfigDef.parseValue(ConfigDef.java:490)
at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:483)
at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:108)
at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:129)
at org.apache.kafka.connect.runtime.WorkerConfig.<init>(WorkerConfig.java:452)
at org.apache.kafka.connect.runtime.standalone.StandaloneConfig.<init>(StandaloneConfig.java:42)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:80)
It's not finding the jar where the AvroConverter is.
I'm using Kafka 2.13-2.8.0 and the 0.3.34 jcustenborder kafka-connect-twitter.
I see nowhere possible jars where the AvroConverter might be, in the Kafka distribution. Does it include Kafka Connect?
Note that I'm using an install of Kafka in an iMac, I'm not using Docker for running Kafka.
EDIT:
instead of using the avro properties file, I'm using the connect-standalone.properties. Although the log says it has loaded the guava jar:
INFO Loading plugin from: /Users/paupaches/dev/books/kafkabeginnerscourse/kafka-connect/connectors/kafka-connectors-twitter/guava-30.1.1-jre.jar (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:246)
[2021-11-13 10:50:22,992] INFO Registered loader: PluginClassLoader{pluginLocation=file:/Users/paupaches/dev/books/kafkabeginnerscourse/kafka-connect/connectors/kafka-connectors-twitter/guava-30.1.1-jre.jar} (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:269)
I get the error
ERROR Stopping due to error (org.apache.kafka.connect.cli.ConnectStandalone:126)
java.lang.NoClassDefFoundError: com/google/common/collect/Multimap
I am using openjdk 17.
It is not necessary to export the CLASSPATH, using the "connectors" path in the plugin.path property in the file connect-standalone.properties is enough.
The "connectors" directory contains the kafka-connect-twitter directory which contains all the jars generated when running "mvn clean package" in the connector working copy.
In the end I used openjdk 8, although 17 is also installed in my Mac.

RuntimeException trying to run Drools examples

I downloaded Drools 7.46.0.Final and extracted the contents to my local drive. When I try to run the examples from the Linux command line using the provided runExamples.sh script I'm getting the following exception. I've tried with Java 8 and Java 11 (the only versions I have installed). Does this really require Java 6 like the message recommends or is there some other problem here?
I'm new to Drools, so I'm afraid I'm not sure how to troubleshoot this.
UPDATE: interestingly I tried version 7.44.0.Final and that runs fine. So downloaded 7.45.0.Final and that one is broken too. So something changed between 7.44 and 7.45 that's causing this.
10:06:44.154 [main] INFO o.k.a.i.utils.ServiceDiscoveryImpl.processKieService:129 - Cannot load service: org.kie.internal.process.CorrelationKeyFactory
10:06:44.157 [main] ERROR o.k.a.i.utils.ServiceDiscoveryImpl.processKieService:131 - Loading failed because There already exists an implementation for service org.drools.core.reteoo.KieComponentFactoryFactory with same priority 0
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.drools.dynamic.DynamicServiceRegistrySupplier.get(DynamicServiceRegistrySupplier.java:32)
at org.drools.dynamic.DynamicServiceRegistrySupplier.get(DynamicServiceRegistrySupplier.java:23)
at org.kie.api.internal.utils.ServiceRegistry$Impl.getServiceRegistry(ServiceRegistry.java:88)
at org.kie.api.internal.utils.ServiceRegistry$ServiceRegistryHolder.<clinit>(ServiceRegistry.java:47)
at org.kie.api.internal.utils.ServiceRegistry.getInstance(ServiceRegistry.java:39)
at org.kie.api.internal.utils.ServiceRegistry.getService(ServiceRegistry.java:35)
at org.kie.api.KieServices$Factory$LazyHolder.<clinit>(KieServices.java:358)
at org.kie.api.KieServices$Factory.get(KieServices.java:365)
at org.kie.api.KieServices.get(KieServices.java:349)
at org.drools.examples.DroolsExamplesApp.<init>(DroolsExamplesApp.java:59)
at org.drools.examples.DroolsExamplesApp.main(DroolsExamplesApp.java:52)
Caused by: java.lang.RuntimeException: Unable to build kie service url = jar:file:/home/davek/apps/drools-distribution-7.46.0.Final/examples/binaries/drools-examples-7.46.0.Final.jar!/META-INF/kie.conf
at org.kie.api.internal.utils.ServiceDiscoveryImpl.registerConfs(ServiceDiscoveryImpl.java:105)
at org.kie.api.internal.utils.ServiceDiscoveryImpl.lambda$getServices$1(ServiceDiscoveryImpl.java:83)
at java.util.Optional.ifPresent(Optional.java:159)
at org.kie.api.internal.utils.ServiceDiscoveryImpl.getServices(ServiceDiscoveryImpl.java:81)
at org.kie.api.internal.utils.ServiceRegistry$Impl.<init>(ServiceRegistry.java:60)
at org.drools.dynamic.DynamicServiceRegistrySupplier$LazyHolder.<clinit>(DynamicServiceRegistrySupplier.java:27)
... 11 more
Caused by: java.lang.RuntimeException: There already exists an implementation for service org.drools.core.reteoo.KieComponentFactoryFactory with same priority 0
at org.kie.api.internal.utils.ServiceDiscoveryImpl$PriorityMap.put(ServiceDiscoveryImpl.java:222)
at org.kie.api.internal.utils.ServiceDiscoveryImpl.processKieService(ServiceDiscoveryImpl.java:124)
at org.kie.api.internal.utils.ServiceDiscoveryImpl.registerConfs(ServiceDiscoveryImpl.java:101)
... 16 more
Unfortunately this is a known issue that I fixed with this commit.
Upcoming Drools 7.47.0.Final (to be released next week) won't suffer of this.
Switching to version 8.16.0.Beta or newer resolved this for me

Failed to start Neo4j service

I am using neo4j enterprise 3.0.3 version for windows. Following the operations manual 3.0, I have installed the neo4j service with bin\neo4j install-service. But I can't start it with bin\neo4j start. It said
Invoke-Neo4j : Failed to start service 'Neo4j Graph Database - neo4j (neo4j)'.
And I can't start the neo4j service in windows serice either. Maybe anyone have encountered this case before?
I had the same problem: I am using neo4j community 3.1.2 for windows and installed the service with the neo4j.bat file without any problems.Then i wanted to start the service with neo4j.bat and got the same error as you
I found a solution that worked for me. My neo4j files were in a folder, where the path to the folder contained spaces (C:\Program Files\Neo4j) Then i moved the folder one level up (C:\Neo4j).
After that i could start the service without problems.
Maybe this solution helps.
I am running neo4j on windows and in my case the crux of the issue was that there was an incompatibility between the installed versions of Java (32-bit) v/s OS version. The biggest clue that led me to this is the following set of lines in neo4j-service.2018-08-03 log file
[2018-08-03 14:55:42] [info] [ 1432] Starting service...
[2018-08-03 14:55:42] [error] [ 1432] %1 is not a valid Win32 application.
[2018-08-03 14:55:42] [error] [ 1432] Failed creating java C:\JavaNew\bin\server\jvm.dll
[2018-08-03 14:55:42] [error] [ 1432] %1 is not a valid Win32 application.
[2018-08-03 14:55:42] [error] [ 1432] ServiceStart returned 1
There are a fair number of potential issues, and I have made an attempt to compile all the issues with this,
Windows services cannot deal with service names in folders that have spaces; especially if there is another folder with the same name as the one with spaces.
For example - C:\Program Files... will have issues if C:\Program\Something...
To work around this, I put Neo4j in root folder c:\Neo4j
Get-Java.ps1 (under ..\bin\Neo4j-Management folder)looks in the path variable for 'JAVA_HOME' (usually found in *nix environments). If it does not find it here, it keeps looking in registry, and finally throws up its hand!
To deal with this, I simply put in a path variable. For a good measure, I uninstalled Java and re-installed Java in the root folder under C:\JavaNew
In retrospect, this step is probably not on part of the problem, and hence can be ignored. But I am leaving it here for completeness sake.
Invoke-Neo4j.ps1 (also under ..\bin\Neo4j-Management folder) has code that determines if the OS is 32-bit (or 64-bit). Based on this it determines if it should run prunsrv-i386.exe (32-bit) or prunsrv-amd64.exe (64-bit).
This has to match the Java version installed.
Upon running java -XshowSettings:all, and inspecting the sun.arch.data.model value (32, in my case), I realized that my OS is 64 bit and the Java version is 32-bit.
To deal with this, I put in code (very klugey!). I am sure there are much better ways to get to the same outcome, but this is what I used.
switch ( (Get-WMIObject -Class Win32_Processor | Select-Object -First 1).Addresswidth ) {
32 { $PrunSrvName = 'prunsrv-i386.exe' } # 4 Bytes = 32bit
#64 { $PrunSrvName = 'prunsrv-amd64.exe' } # 8 Bytes = 64bit COMMENTED as a workaround!!!
64 { $PrunSrvName = 'prunsrv-i386.exe' } # 8 Bytes = 64bit
Now, uninstall the neo4j service, install it, and start the service.
Hope this works for you.
neo4j console
Posting for latest versions > 4.x
I had the same issue using neo4j start, Neo4j console is the right command I was looking for. It is a web-based graph that acts as an interactive tutorial.
i had the same problem , after the neo4j worked for few weeks it stoop working (without any change that i made)
i have set java_home uninstall and install and now it works
neo4j-enterprise-3.3.4
I was also having weired issue as there was no error but neo4J service did not start.
[xx#ss1 bin]$ ./neo4j console
[xx#ss1 bin]$ .
The problem was with the permission on Java directory and I tried
chmod -R 777 jdk_directory
and problem got solved.
#neo4j #neo4jnotstarting

Jetty Web Server unable to start "java.io.IOException: cannot read file:.."

015-04-08 12:56:30 Commons Daemon procrun stderr initialized
java.io.IOException: Cannot read file: C:\Streem\web\modules\annotations.mod
at org.eclipse.jetty.start.Modules.registerModule(Modules.java:549)
at org.eclipse.jetty.start.Modules.registerAll(Modules.java:486)
at org.eclipse.jetty.start.Main.processCommandLine(Main.java:608)
at org.eclipse.jetty.start.Main.main(Main.java:111)
I checked that installed Java version 1.7.0_25 and npn-1.7.0_25.mod do exist under web\modules\protonego-impl\
I am using jetty-9.2.5.v20141112 on windows 2008 R2 server
Does annotations.mod need something special regarding this case?
We found the same problem on Windows Server 2008. It happens when Jetty is trying to read the module configuration files and is due to a fault in the check for readability.
In the jetty source file FS.java line 39 a check is made using java.nio, to see if the file is readable:
public static boolean canReadFile(Path path)
{
return Files.exists(path) && Files.isRegularFile(path) && Files.isReadable(path);
}
The call to isReadable is slow and fails, see also:
http://mail.openjdk.java.net/pipermail/nio-discuss/2012-July/000672.html
The file itself is in fact readable and can be successfully read from Java, but the isReadable incorrectly returns false.
There are two possible workarounds:
Upgrade to Java 8
Remove the check for isReadable from the Jetty source (in any case if the file wasn't readable the reading will fail with an exception).
(See also similar question unable to start jetty service through command in window 7)
This is a fundamental I/O error, something prevented Jetty from reading that file.
Try some basic troubleshooting ...
File permissions?
Windows File Locking issue? (a different process has that file open?)

Selenium looping through jenkins and permission denied in cli

After struggling to get proper testsuites, I'm now pretty disappointed by the fact that , while following as close as possible this tutorial (pretty straightforward, right ?) Setting up Selenium server on a headless Jenkins CI build machine, Jenkins keeps looping on the current build, outputting :
So I decided to run a selenium build by hand on the ci machine, and got this :
user#machine:/var/log$ export DISPLAY=":99" && java -jar /var/lib/selenium/selenium- server.jar -browserSessionReuse -htmlSuite *firefox http://staging.site.com /var/lib/jenkins/jobs/project/workspace/tests/selenium/testsuite.html /var/lib/jenkins/jobs/project/workspace/logs/selenium.html
24 janv. 2012 19:27:56 org.openqa.grid.selenium.GridLauncher main
INFO: Launching a standalone server
19:27:59.927 INFO - Java: Sun Microsystems Inc. 20.0-b11
19:27:59.929 INFO - OS: Linux 3.0.0-14-generic amd64
19:27:59.951 INFO - v2.17.0, with Core v2.17.0. Built from revision 15540
19:27:59.958 INFO - Will recycle browser sessions when possible.
19:28:00.143 INFO - RemoteWebDriver instances should connect to: http://127.0.0.1:4444/wd/hub
19:28:00.144 INFO - Version Jetty/5.1.x
19:28:00.145 INFO - Started HttpContext[/selenium-server/driver,/selenium-server/driver]
19:28:00.147 INFO - Started HttpContext[/selenium-server,/selenium-server]
19:28:00.147 INFO - Started HttpContext[/,/]
19:28:00.183 INFO - Started org.openqa.jetty.jetty.servlet.ServletHandler#16ba8602
19:28:00.184 INFO - Started HttpContext[/wd,/wd]
19:28:00.199 INFO - Started SocketListener on 0.0.0.0:4444
19:28:00.199 INFO - Started org.openqa.jetty.jetty.Server#6f7a29a1
HTML suite exception seen:
java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:900)
at org.openqa.selenium.server.SeleniumServer.runHtmlSuite(SeleniumServer.java:603)
at org.openqa.selenium.server.SeleniumServer.boot(SeleniumServer.java:287)
at org.openqa.selenium.server.SeleniumServer.main(SeleniumServer.java:245)
at org.openqa.grid.selenium.GridLauncher.main(GridLauncher.java:54)
19:28:00.218 INFO - Shutting down...
19:28:00.220 INFO - Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=4444]
While understanding the output is'nt that hard, finding what to do to remove this issue is.
Any chance you guys already have been facing that kind of stuff ? Thanks
I only just got past these problems myself, but I was able to run your command when I pointed it at my .jar, testSuite and report file. I'm thinking that perhaps the location of your files under,
/var/lib/selenium
could be part of the problem. Try putting them where your user has permission perhaps under
/home/USERNAME/selenium
Other than that the only thing I can say is make sure your .jar, testSuite and report file are valid.
Also (I assume this is an error of copy and paste into stack overflow) but, this part of your command is incorrect
/var/lib/selenium/selenium- server.jar
You are not getting the error I would expect from an incorrect jar location so I assume something was lost when you pasted to stackoverflow.