Hadoop - Input path does not exist - Eclipse

I set up Hadoop on Ubuntu and followed all the necessary steps:
1. Created the HDFS file system
2. Moved the text files to the input directory
3. Made sure I have privileges to access all the directories
But when I run the simple word count example, I get:
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/root/input
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
The input path is valid, and I can even view the files at that path from Eclipse itself, so please help me find where I am going wrong. I have attached a screenshot for reference.

Add the following lines to your code, where /HADOOP_HOME stands for your Hadoop installation directory:
Configuration config = new Configuration();
config.addResource(new Path("/HADOOP_HOME/conf/core-site.xml"));
config.addResource(new Path("/HADOOP_HOME/conf/hdfs-site.xml"));
Without these resources loaded, your client is looking at the local file system instead of HDFS.
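If you would rather not depend on the XML file locations, a minimal sketch of the same fix is to set the default file system on the Configuration directly. This is only an illustration: the URI below is an assumption (substitute your NameNode's address), and the property name is fs.defaultFS on Hadoop 2.x, or the older fs.default.name on 1.x:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

Configuration config = new Configuration();
// Assumption: NameNode at localhost:9000 -- substitute your cluster's value.
config.set("fs.defaultFS", "hdfs://localhost:9000");
FileSystem fs = FileSystem.get(config); // paths now resolve against HDFS, not file:/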

For hadoop-2.2.0 on Windows 7, I added the lines below and it solved the issue (NOTE: my Hadoop home is C:\MyWork\MyProjects\Hadoop\hadoop-2.2.0, and the backslashes must be escaped in Java string literals):
Configuration conf = new Configuration();
conf.addResource(new Path("C:\\MyWork\\MyProjects\\Hadoop\\hadoop-2.2.0\\etc\\hadoop\\core-site.xml"));
conf.addResource(new Path("C:\\MyWork\\MyProjects\\Hadoop\\hadoop-2.2.0\\etc\\hadoop\\hdfs-site.xml"));

Related

SolrCloud Configset API upload returns 500 "KeeperErrorCode = NoNode"

Situation
First of all, I should mention that I'm using Solr 8.1.1 and running the default "solr -e cloud" to do some testing, on a Windows Azure VM. I'm trying to create a PowerShell script that does some setup on the SolrCloud, and the first step is uploading a custom Configset. I was using https://lucene.apache.org/solr/guide/8_1/configsets-api.html as a guide, and the PowerShell command, stripped of extra parameters, boils down to the following:
Invoke-WebRequest -Uri "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=MyConfig" -Method Post -ContentType "application/octet-stream" -InFile "config.zip"
EDIT: For clarity, the contents of the ZIP are as follows: https://imgur.com/a/OHR1bf1
Problem
When I run the above command however I'm met with the following error:
Invoke-WebRequest : { "responseHeader":{ "status":500, "QTime":11}, "error":{ "msg":"KeeperErrorCode = NoNode for /configs/MyConfig/lang/contractions_ca.txt", "trace":"org.apache.zookeeper.KeeperException$NoNodeException:
KeeperErrorCode = NoNode for /configs/MyConfig/lang/contractions_ca.txt\r\n\tat org.apache.zookeeper.KeeperException.create(KeeperException.java:114)\r\n\tat
org.apache.zookeeper.KeeperException.create(KeeperException.java:54)\r\n\tat org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:792)\r\n\tat
org.apache.solr.common.cloud.SolrZkClient.lambda$create$7(SolrZkClient.java:415)\r\n\tat org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:71)\r\n\tat
org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:415)\r\n\tat org.apache.solr.handler.admin.ConfigSetsHandler.createZkNodeIfNotExistsAndSetData(ConfigSetsHandler.java:201)\r\n\tat
org.apache.solr.handler.admin.ConfigSetsHandler.handleConfigUploadRequest(ConfigSetsHandler.java:181)\r\n\tat org.apache.solr.handler.admin.ConfigSetsHandler.handleRequestBody(ConfigSetsHandler.java:111)\r\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\r\n\tat org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:796)\r\n\tat
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:762)\r\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:522)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:397)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\r\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\r\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\r\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\r\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\r\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\r\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\r\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\r\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:502)\r\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)\r\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\r\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)\r\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\r\n\tat
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\r\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\r\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\r\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\r\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)\r\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)\r\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)\r\n\tat java.lang.Thread.run(Thread.java:748)\r\n",
"code":500}}
At line:1 char:1
Observations
When I first "failed", I had created a zip file from my config that contained an additional top-level folder (i.e. instead of MyConfig/solrconfig.xml etc., my ZIP contained MyConfig/MyConfig/solrconfig.xml). With that ZIP the upload command ran successfully, but the second command (creating a collection) failed because it could not find solrconfig.xml. This tells me that the ZIP is correctly present in the request and that Solr is capable of processing it, yet once I correct it to an actual configset it fails outright.
EDIT: I was asked whether using "conf" as the top-level folder in the zip would work. As I mentioned, this results in a successful upload (https://imgur.com/a/JHLZ8td); however, as you can see, it does not match the other configsets, and when you try to create a collection with this set you get: Error CREATEing SolrCore 'Test_shard1_replica_n1': Unable to create core [Test_shard1_replica_n1] Caused by: Can't find resource 'solrconfig.xml' in classpath or '/configs/Sitecore', cwd=C:\solr-8.1.1\server
Question(s)
What am I doing wrong? Is this a bug?
Going back through some work I did on SolrCloud a while ago, I am reminded of one annoying issue I hit:
I got odd issues uploading the schema config zip files if I had created that zip using "Send to Compressed Folder" in the Windows UI, or via Compress-Archive in PowerShell. I found that compressing the data with 7Zip did work, however.
I suspect there's something incompatible between the Windows zip code (which I think is quite old, and something they licensed ages ago?) and how Solr/ZooKeeper deals with extracting the files again?
I just ran into the same issue without using the Windows zip code. I was trying to upload a configset to Solr 7.7.3 from a conf directory containing a "lang" subdirectory with a bunch of files, and got the NoNode error for /configs/_myconfigsetname_/lang/stopwords_eu.txt. The configset was being zipped on the fly through a recursive directory walk in Java, sending each filename to the Zip file using Java's ZipOutputStream; the resulting zipped bytes were then sent to Solr/ZooKeeper.
This code worked fine for conf directories without subdirectories. It turned out that when there is a subdirectory, it is necessary to create a ZipEntry for the directory itself (e.g. lang/) before adding files such as lang/stopwords_eu.txt to the Zip stream.
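A minimal sketch of that fix (the class and method names here are mine; the essential part is the explicit directory entry, whose name must end with a slash, written before the files inside it):
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ConfigSetZipper {
    /** Zips confDir recursively, emitting an explicit entry for each subdirectory. */
    public static byte[] zipConfigSet(Path confDir) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes);
             Stream<Path> paths = Files.walk(confDir)) { // parents come before children
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (p.equals(confDir)) continue; // skip the root itself
                String name = confDir.relativize(p).toString().replace('\\', '/');
                if (Files.isDirectory(p)) {
                    // The crucial part: a directory entry such as "lang/",
                    // added before any of the files inside that directory.
                    zip.putNextEntry(new ZipEntry(name + "/"));
                    zip.closeEntry();
                } else {
                    zip.putNextEntry(new ZipEntry(name));
                    Files.copy(p, zip);
                    zip.closeEntry();
                }
            }
        }
        return bytes.toByteArray();
    }
}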

citrus waitFor().file fails to read a file

I'm trying to use waitFor() in my Citrus test to wait for an output file on disk to be written by the process I'm testing. I've used this code:
File outputFile = new File("/esbfiles/blesbt/bl03orders.99160221.14289.xml");
waitFor().file(outputFile).seconds(65L).interval(1000L);
After a few seconds, the file appears in the folder as expected, and the user I'm running the test code as has permission to read the file. The waitFor(), however, ends in a timeout.
09:46:44 09:46:44,818 DEBUG dition.FileCondition| Checking file path '/esbfiles/blesbt/bl03orders.99160221.14289.xml'
09:46:44 09:46:44,818 WARN dition.FileCondition| Failed to access file resource 'class path resource [esbfiles/blesbt/bl03orders.99160221.14289.xml] cannot be resolved to URL because it does not exist'
What could be the problem? Can’t I check for files outside the classpath?
This is actually a bug in Citrus. Citrus works with the file path instead of the file object, and in combination with Spring's PathMatchingResourcePatternResolver this causes Citrus to search for a classpath resource instead of treating the absolute file path as an external file system resource.
You can fix this by providing the absolute file path, with an explicit file: scheme, instead of the file object, like this:
waitFor().file("file:/esbfiles/blesbt/bl03orders.99160221.14289.xml")
    .seconds(65L)
    .interval(1000L);
An issue for the broken file object conversion has been opened: https://github.com/christophd/citrus/issues/303
Thanks for pointing it out!

Cudafy chapter 3 example has a path issue - how to fix it?

Using Cudafy version 1.29, which can be downloaded from here
I am executing the examples that are found in the install folder CudafyV1.29\CudafyByExample\
Specifically, "chapter 3" example that begins line 42 of program.cs calls the following:
simple_kernel.Execute();
which is this:
public static void Execute()
{
CudafyModule km = CudafyTranslator.Cudafy(); // <--exception thrown!
GPGPU gpu = CudafyHost.GetDevice(CudafyModes.Target, CudafyModes.DeviceId);
gpu.LoadModule(km);
gpu.Launch().thekernel(); // or gpu.Launch(1, 1, "kernel");
Console.WriteLine("Hello, World!");
}
The indicated line throws this exception:
Compilation error: CUDAFYSOURCETEMP.cu
'C:\Program' is not recognized as an internal or external command,
operable program or batch file.
It is immediately obvious that the path contains spaces and that the programmer did not double-quote it (or use the ~ short form, e.g. C:\Progra~1) to make it work.
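To illustrate the failure mode (Cudafy itself is .NET code that shells out to the CUDA compiler; the Java sketch below, with hypothetical paths, only shows the general bug and the safe pattern): when a command line is built by concatenating an unquoted path, the OS splits it at the first space, so only C:\Program is taken as the executable.
import java.io.IOException;
import java.util.List;

public class LaunchCompiler {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical compiler path containing a space.
        String compiler = "C:\\Program Files\\SomeToolkit\\bin\\nvcc.exe";

        // BROKEN: Runtime.getRuntime().exec(compiler + " CUDAFYSOURCETEMP.cu")
        // tokenizes the string on whitespace, so Windows tries to run
        // "C:\Program" and fails exactly as in the error above.

        // SAFE: pass the executable and each argument as separate elements,
        // so no manual quoting is needed.
        Process p = new ProcessBuilder(List.of(compiler, "CUDAFYSOURCETEMP.cu"))
                .inheritIO()   // share this process's stdin/stdout/stderr
                .start();
        System.exit(p.waitFor());
    }
}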
Now, I did not write this code, and I cannot step through the sealed code behind CudafyModule km = CudafyTranslator.Cudafy(). In fact, I don't even know the full path that is causing the exception; it is cut off in the exception message.
Does anyone have a suggestion for how to fix this issue?
Update #1: I discovered where CUDAFYSOURCETEMP.cu lives on my computer; here it is:
C:\Users\humphrt\Desktop\Active Projects\Visual Studio Projects\CudafyV1.29\CudafyByExample\bin\Debug
...I'm still trying to determine what the program is looking for along the path to 'C:\Program~'.
I was able to apply a workaround to bypass this issue: reinstall all components of Cudafy into folders whose paths contain no spaces. My setup looks like the screenshot below; notice that I also installed the CUDA Toolkit from NVIDIA in the same folder, likewise with no spaces in the folder names.
I created a folder named "C:\CUDA" and installed all components within it; the folder structure is shown in the screenshot.

Spark Shell unable to read file at valid path

I am trying to read a file in the Spark shell that comes with the Cloudera CentOS distribution, on my local machine. The following are the commands I entered in the Spark shell:
spark-shell
val fileData = sc.textFile("hdfs://user/home/cloudera/cm_api.py");
fileData.count
I also tried this statement to read the file:
val fileData = sc.textFile("user/home/cloudera/cm_api.py");
However, I am getting:
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera/cm_api.py
I haven't changed any settings or configurations. What am I doing wrong?
You are missing the leading slash in your URL, so the path is relative. To make it absolute, use
val fileData = sc.textFile("hdfs:///user/home/cloudera/cm_api.py")
or
val fileData = sc.textFile("/user/home/cloudera/cm_api.py")
I think you need to put the file into HDFS first with hadoop fs -put, then check it with hadoop fs -ls, then start spark-shell and run val fileData = sc.textFile("cm_api.py").
In "hdfs://user/home/cloudera/cm_api.py", you are missing the hostname of the URI. You should have pass something like "hdfs://<host>:<port>/user/home/cloudera/cm_api.py", where <host> is Hadoop NameNode host and the <port> is, well, port number of Hadoop NameNode, which is 50070 by default.
The error message says hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera/cm_api.py does not exist. That path looks suspicious! The file you mean is probably at hdfs://quickstart.cloudera:8020/user/cloudera/cm_api.py.
If it is, you can access it by using that full path. Or, if the default file system is hdfs://quickstart.cloudera:8020 and your HDFS home directory is /user/cloudera, you can simply use cm_api.py.
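A small Java sketch of that resolution logic (assuming, as the error message indicates, a default file system of hdfs://quickstart.cloudera:8020 and an HDFS home directory of /user/cloudera):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ResolvePath {
    public static void main(String[] args) throws Exception {
        // Assumption: core-site.xml on the classpath sets
        // fs.defaultFS = hdfs://quickstart.cloudera:8020
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // A relative path is qualified against the user's HDFS home directory:
        System.out.println(fs.makeQualified(new Path("cm_api.py")));
        // -> hdfs://quickstart.cloudera:8020/user/cloudera/cm_api.py

        // An absolute path is taken as-is:
        System.out.println(fs.exists(new Path("/user/cloudera/cm_api.py")));
    }
}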
You may be confused between HDFS file paths and local file paths. By specifying
hdfs://quickstart.cloudera:8020/user/home/cloudera/cm_api.py
you are saying two things:
1) there is a computer named "quickstart.cloudera" reachable via the network (try ping to ensure that is the case), and it is running HDFS;
2) the HDFS file system contains a file at /user/home/cloudera/cm_api.py (try hdfs dfs -ls /user/home/cloudera/ to verify this).
If you are trying to access a file on the local file system you have to use a different URI:
file:///user/home/cloudera/cm_api.py

opkg install error - wfopen no such file or directory

I have followed instructions to create an .ipk file and a Packages.gz, and to host them on a web server as a repo. I have set opkg.conf on my other VM to point to this repo, and that VM is able to update and list the contents of the repositories successfully.
But when I try to install, I get the message below. Can you please explain why I am getting this and what needs to be changed?
Collected errors:
* wfopen: /etc/repo/d1/something.py: No such file or directory
* wfopen: /etc/repo/d1/something-else.py: No such file or directory
While creating the .ipk, I had created a folder named data with the file structure /etc/repo/d1/, with the file something.py stored in d1. I archived that folder as data.tar.gz and then, together with control.tar.gz and debian-binary, created the .ipk.
I followed instructions from here:
http://bitsum.com/creating_ipk_packages.htm
http://www.jumpnowtek.com/yocto/Managing-a-private-opkg-repository.html
http://www.jumpnowtek.com/yocto/Using-your-build-workstation-as-a-remote-package-repository.html
It is very likely that the directory /etc/repo/d1/ does not exist on the target system. If you create the folder manually and try installing again, it probably will not fail. I'm not sure how to force opkg to create the empty directory by itself, though :/
Update:
You can solve this problem using a preinst script. Just create the missing directories on it, like this:
#!/bin/sh
mkdir -p /etc/repo/d1/
# always return 0 on success
exit 0