Unable to view any folders in DFS Locations when connecting to Hadoop from Eclipse

I have set up Hadoop 1.2.1 on Windows with Cygwin installed.
I have started the sshd service.
I have also started the NameNode, DataNode, and MapReduce daemons (JobTracker, TaskTracker), and I can see their running status through the following URLs.
I am able to connect to Hadoop from Eclipse; however, on opening DFS Locations I do not see any folders. It displays (0) (refer Pic #1), which I guess means no directories/files are available. I checked the same against the NameNode storage (refer Pic #2).
Even when I create a directory through the Cygwin terminal (refer Pic #4), I am not able to see it under DFS Locations in Eclipse.
That being said, I tried the WordCount example, setting the input and output paths as follows:
// specify input and output dirs
FileInputFormat.addInputPath(conf, new Path("Input"));
FileOutputFormat.setOutputPath(conf, new Path("Output"));
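For context, a minimal sketch (not part of the original post) of how such a relative Path is resolved: with no scheme, Hadoop qualifies it against fs.default.name and the user's home directory, so "Input" becomes hdfs://<namenode>/user/<user>/Input. The class name PathCheck is an illustrative assumption:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PathCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        // For user Administrator this prints something like
        // hdfs://localhost:47110/user/Administrator/Input
        System.out.println(fs.makeQualified(new Path("Input")));
        System.out.println("Exists? " + fs.exists(new Path("Input"))); // false until the dir is created
    }
}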
When I run this against the HDFS location from Eclipse, I get the following exception:
13/10/30 06:52:44 ERROR security.UserGroupInformation: PriviledgedActionException as:Administrator cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:47110/user/Administrator/Input
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:47110/user/Administrator/Input
Questions:
Why am I not able to see the directory that I created through the Cygwin terminal, or any folders for that matter?
What does "hdfs://localhost:47110" point to?
Am I getting the above exception because it doesn't see the directory on the DataNode?
What input path should I set?
Please advise me on this.
Thanks in advance.

First, you should check all the settings of your Hadoop cluster from scratch, because this problem suggests that Eclipse is not configured properly against the Hadoop cluster.
The following link may help:
https://www.youtube.com/watch?v=TavehEdfNDk
Also check whether your DFS is connected to your cluster, i.e. whether you are able to store files in your DFS.
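To verify the connection end to end, a hedged sketch like the following can be used: it connects with the same fs.default.name that Eclipse should be using (host and port are taken from the exception in the question, so treat them as assumptions) and pre-creates the input directory the job expects.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DfsSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:47110"); // must match core-site.xml
        FileSystem fs = FileSystem.get(conf);
        fs.mkdirs(new Path("/user/Administrator/Input")); // the path the job expects
        for (FileStatus s : fs.listStatus(new Path("/user/Administrator"))) {
            System.out.println(s.getPath()); // Input should now be listed here and in DFS Locations
        }
    }
}

If this program lists the directory but Eclipse still shows (0), the DFS master host/port configured in the Eclipse plugin most likely differs from fs.default.name.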

Related

Kafka Connect on Windows failed to find the connector class

I have installed Kafka on my Windows machine, and everything is supposed to work properly until I add a connector configuration file which contains the Debezium MySQL connector. All I found on Google for this bug is about plugin.path, so I tried all the possible workarounds to make it work, in vain: changing the jar folders to literally c:/mypluginfolder, using "\" and changing it to "/" in my paths, using absolute paths, adding the directory to the classpath, etc. The logs even say that some of the Debezium plugins are being added before it crashes, so technically the server sees that path. Help a fellow out, I've been at a loss for more than 2 weeks. Thank you.
The cmd output: https://controlc.com/880d73f2
My standalone.props and connector: https://controlc.com/4ee0164f
PS: Sorry for the controlc links; I don't know how to format questions, I'm new here.
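For reference, a hedged sketch of the worker configuration (connect-standalone.properties); the folder name C:/mypluginfolder is taken from the question, and the remaining values are common defaults, i.e. assumptions. Note that plugin.path should point at the parent directory containing one subdirectory per plugin (e.g. C:/mypluginfolder/debezium-connector-mysql/*.jar), not at the jars themselves:

# worker settings (values are assumptions / common defaults)
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.file.filename=C:/tmp/connect.offsets
# parent directory holding one folder per plugin
plugin.path=C:/mypluginfolder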

Watching for files on remote shared folder using tWaitForFile

I am trying the tWaitForFile component in Talend to watch for newly created files. It seems to work for a local directory (I am using Windows 7).
However, when I point it to a shared folder like //ps1.remotemachine.com/Continents/Africa, it doesn't work: it doesn't give me file-creation signals like it does for a local directory.
Am I missing something?
Update:
In my testing so far, these are my observations for monitoring files on a network path:
Talend tWaitForFile - inconsistent results. It only gives a notification sometimes; the majority of the time, it doesn't.
Java NIO WatchService - tried this outside of the Talend solution (see the sketch below). It does give notifications for files created on the network path; however, when the number of folders to be monitored on the network path is too large, it starts missing events for some of the folders. In my case, it was around 100 folders to be monitored.
Hence, I abandoned both of the above approaches and am sticking with scheduler-based running of Talend jobs.
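A minimal sketch of the Java NIO WatchService approach mentioned above, pointed at the UNC path from the question (whether WatchService delivers reliable events for remote shares is exactly what was being tested, so this is illustrative, not a guarantee):

import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class ShareWatcher {
    public static void main(String[] args) throws Exception {
        Path dir = Paths.get("\\\\ps1.remotemachine.com\\Continents\\Africa"); // UNC share
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
        while (true) {
            WatchKey key = watcher.take(); // blocks until an event arrives
            for (WatchEvent<?> event : key.pollEvents()) {
                System.out.println("Created: " + dir.resolve((Path) event.context()));
            }
            if (!key.reset()) {
                break; // the watched directory is no longer accessible
            }
        }
    }
}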
Use
"\\\\ps1.remotemachine.com/Continents/Africa"
as the path in the tWaitForFile component; in a Java string literal each backslash has to be doubled. If you take the value from a context variable, then you don't need to double the "\".

"Classpath is empty" error when running a ZooKeeper instance

I am trying to follow the instructions on https://kafka.apache.org/quickstart to start a Kafka install and then send some messages from a Scala client.
I am using a Windows system.
I am getting this error (see screencap) when I run the ZooKeeper instance.
The reason is most probably that your directory path has a space in it: "Development Tools". Try running this from a path which has no spaces, e.g. C:\kafka rather than C:\Development Tools\kafka; the space is likely causing path issues in the shell script.
Also, I assume that you downloaded the binary and not the source files?
Hope it works; let us know.

Accessing the StreamSets web UI on a different node in a cluster than where it is installed: which file system does it 'look in'?

I have a cluster of machines hosting Hadoop (MapR) and have installed StreamSets on one of the nodes (say node002) following the RPM documentation. However, I am accessing the web UI for the Data Collector from another node, node001.
My question is: when I specify file paths (e.g. an origin directory), which file system will the web UI refer to? E.g., if I set an origin directory to /home/myuser/mydata, will the pipeline created in the web UI look for that directory on node001 or node002? I am new to StreamSets, so a more detailed answer would be appreciated. Thanks.
** Ultimately I am asking this because I am currently getting "FileNotFound" and "permission denied" errors while trying to follow the documentation's tutorial, and I am trying to debug the situation.
From the StreamSets community forums: it will be the path to the local file on the machine running that particular SDC instance.
The FileNotFound and permission-denied errors have to do with the fact that the default user for the sdc service is a user called sdc. I am still working out how to fix this properly, but a workable prototype can be produced by setting the read and write access on the directories in question to allow public access (this part still needs work, but it answers the posted question).

JConsole can't find process

I tried to run JConsole to analyze the memory used by a running process, but JConsole doesn't show me any processes, even though I am absolutely sure that one is running (besides, it should show JConsole itself in the process list as well, but it doesn't).
Does anyone have an idea why it doesn't show any processes?
Cheers
At a Windows prompt, run echo %TMP%; it will give you the default temp directory. Go to that directory and find the directory named hsperfdata_user, where user is your login. This is the directory where process IDs are stored: any new process you create, such as a Java application, gets a new file named after its process ID, and JConsole picks up the process IDs from this directory. If you cannot create a file in this directory, you need to change the permissions to allow writing. Once that is done, start a new Java application and check that a new process-ID file appears in the directory; once confirmed, start JConsole.
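As a quick check of the mechanism described above, a small sketch (an illustrative assumption, not part of the original answer) that prints where a given JVM writes its hsperfdata files; run it from the same shell as the monitored application and compare with the environment JConsole sees to spot a %TMP% mismatch:

public class TmpDirCheck {
    public static void main(String[] args) {
        // JConsole discovers local JVMs via <tmpdir>\hsperfdata_<user>
        System.out.println("java.io.tmpdir = " + System.getProperty("java.io.tmpdir"));
        System.out.println("user.name      = " + System.getProperty("user.name"));
    }
}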
I have the same problem. But if I explicitly specify the PID, as in jconsole 1234, jconsole is able to analyze the process.
If you are running JConsole on Windows, simply:
Find jconsole.exe
Right-click it
Select "Run as administrator".
In my case, removing the hsperfdata_USERNAME directory (in the %TMP% directory) and closing all the JVMs helped.
This happens when the %TMP% value differs between the monitored JVM and the monitoring tool (JConsole/JMC/Java Mission Control, maybe even VisualVM).
This may be the standard scenario with Cygwin (at least in my case: Cygwin + Babun).
The easiest solution is to set the TMP environment variable to the default value used by Windows, at least in the scope of the shell launching the JVM.
You have to start JConsole as the same user that started the process you want to analyze.
Just ran into this issue.
If you are using multiple JDKs by any chance (e.g. via SDKMAN), then make sure that JConsole is run using the same JDK as the application.
8 years later... I had the same problem. I could only see certain processes, but couldn't see or monitor any Java processes running in a Docker container on Linux.
Inspired by the Windows solution by RoyalBigMack:
Solution 1: Run the terminal as superuser (su command) and run jconsole.
Solution 2: Run solution 1 as one command, sudo jconsole.
Only the first solution worked for me; once the JConsole UI popped up, all the hidden processes were visible.