Cannot write a string to an HDFS file using Scala

I wrote some code to create a file in hdfs and write bytes to it. This is the code:
def write(uri: String, filePath: String, data: String): Unit = {
System.setProperty("HADOOP_USER_NAME", "hibou")
val path = new Path(filePath + "/hello.txt")
val conf = new Configuration()
conf.set("fs.defaultFS", uri)
val fs = FileSystem.get(conf)
val os = fs.create(path)
os.writeBytes(data)
os.flush()
fs.close()
}
The code succeeds without error, but I only see that the file has been created. When I examine the content of the file with hdfs dfs -cat /.../hello.txt, I don't see any content.
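A common cause of an empty file with this pattern is that the stream returned by fs.create is never closed; a plain flush() may not make the data visible to readers. The following is a minimal sketch under that assumption, not a confirmed diagnosis for this cluster. It keeps the signature from the question, omits the HADOOP_USER_NAME property, and leaves the FileSystem open because FileSystem.get can return a cached, shared instance. The object name HdfsWriteSketch is hypothetical.

```scala
import java.nio.charset.StandardCharsets
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsWriteSketch {
  // uri is the filesystem URI from the question (assumption: "file:///" also works
  // and targets the local filesystem, which is handy for trying this out).
  def write(uri: String, filePath: String, data: String): Unit = {
    val conf = new Configuration()
    conf.set("fs.defaultFS", uri)
    val fs = FileSystem.get(conf)
    val os = fs.create(new Path(filePath, "hello.txt"))
    try {
      // Write explicit UTF-8 bytes instead of writeBytes, which keeps only the low byte of each char.
      os.write(data.getBytes(StandardCharsets.UTF_8))
    } finally {
      // Closing the stream is what completes the file.
      os.close()
    }
  }
}
```

If the file is still empty after closing the stream, the cause is probably elsewhere (for example, the client not reaching the DataNodes), and the client logs would be the next place to look.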

Related

Smile - Model Persistence - How to write models to HDFS?

I am trying to use Smile in my Scala project which uses Spark and HDFS. For reusability of my models, I need to write them to HDFS.
Right now I am using the write object, checking if the path exists beforehand and creating it if it does not (otherwise it would throw a FileNotFoundException) :
import java.nio.file.Paths
val path: String = "hdfs:/my/hdfs/path"
val outputPath: Path = Paths.get(path)
val outputFile: File = outputPath.toFile
if(!outputFile.exists()) {
outputFile.getParentFile().mkdirs(); // This is a no-op if it exists
outputFile.createNewFile();
}
write(mySmileModel, path)
but this creates the path "hdfs:/my/hdfs/path" on the local filesystem and writes the model there, instead of actually writing to HDFS.
Note that using a spark model and its save method works:
mySparkModel.save("hdfs:/my/hdfs/path")
Therefore my question: How to write a Smile model to HDFS?
Similarly, if I manage to write a model to HDFS, I will probably also wonder how to read a model from HDFS.
Thanks!
In the end, I solved my problem by writing my own save method for my wrapper class, which roughly amounts to:
import org.apache.hadoop.fs.{FSDataInputStream, FSDataOutputStream, FileSystem, Path}
import org.apache.hadoop.conf.Configuration
import java.io.{ObjectInputStream, ObjectOutputStream}
import java.net.URI
val path: String = "/my/hdfs/path"
val file: Path = new Path(path)
val conf: Configuration = new Configuration()
// A URI without a scheme resolves to the default filesystem from the configuration
val hdfs: FileSystem = FileSystem.get(new URI(path), conf)
val outputStream: FSDataOutputStream = hdfs.create(file)
val objectOutputStream: ObjectOutputStream = new ObjectOutputStream(outputStream)
// model is the Smile model to persist; it must be Serializable
objectOutputStream.writeObject(model)
// Closing the object stream also closes the underlying HDFS stream
objectOutputStream.close()
Similarly, for loading the saved model I wrote a method doing roughly the following:
val conf: Configuration = new Configuration()
val path: String = "/my/hdfs/path"
val hdfs: FileSystem = FileSystem.get(new URI(path), conf)
val inputStream: FSDataInputStream = hdfs.open(new Path(path))
val objectInputStream: ObjectInputStream = new ObjectInputStream(inputStream)
// Deserialize and cast back to the concrete model type that was saved
val model: RandomForest = objectInputStream.readObject().asInstanceOf[RandomForest]
objectInputStream.close()

java.lang.IllegalArgumentException when HDFS file creating

I have HDFS and some text, and I want to create a file containing that text. I tried to use the HDFS API and FSDataOutputStream, but got an exception. Could you please help me resolve it?
The exception is:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs:/user/user1, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:647)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:513)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:499)
at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:594)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:448)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:776)
at com.example.FileBuilder$.buildFile(FileBuilder.scala:23)
The code is
val fs = FileSystem.get(new Configuration())
val path = new Path(s"hdfs:////user/" + "fileName.sql")
val fsDataOutputStream = fs.create(path)
val outputStreamWriter = new OutputStreamWriter(fsDataOutputStream, "UTF-8")
val bufferedWriter = new BufferedWriter(outputStreamWriter)
bufferedWriter.write(data)
bufferedWriter.close()
outputStreamWriter.close()
fsDataOutputStream.close()
I think there is a problem with the file path. Can you test by replacing the corresponding portion of your code with the following?
val configuration = new Configuration()
// <url:port> is a placeholder for the URI of your HDFS NameNode
val fs = FileSystem.get(new URI(<url:port>), configuration)
val filePath = new Path("/user/fileName.sql")
val fsDataOutputStream = fs.create(filePath)
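The "Wrong FS ... expected: file:///" message indicates that FileSystem.get(new Configuration()) returned the local filesystem, which then rejected a path with an hdfs scheme. As an alternative to passing the URI to FileSystem.get, the filesystem can be resolved from the path itself with Path.getFileSystem. The sketch below assumes the caller supplies a fully qualified location; the names PathFsSketch and writeText are hypothetical.

```scala
import java.io.{BufferedWriter, OutputStreamWriter}
import java.nio.charset.StandardCharsets
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object PathFsSketch {
  // location is a fully qualified path such as hdfs://<host>:<port>/user/fileName.sql
  // (the host and port are placeholders, not values from the question).
  def writeText(location: String, data: String): Unit = {
    val path = new Path(location)
    // The filesystem is chosen from the path's scheme and authority, so the two cannot disagree.
    val fs = path.getFileSystem(new Configuration())
    val writer = new BufferedWriter(new OutputStreamWriter(fs.create(path), StandardCharsets.UTF_8))
    // Closing the outermost writer flushes it and closes the wrapped streams.
    try writer.write(data) finally writer.close()
  }
}
```

With this shape a single close on the outermost writer is enough; the separate close calls on the inner streams in the question are redundant.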

Read data from HDFS

I'm using FSDataInputStream to read data from HDFS.
This is the snippet I'm using:
val fs = FileSystem.get(new java.net.URI(#HDFS_URI),new Configuration())
val stream = fs.open(new Path(#PATH))
val reader = new BufferedReader(new InputStreamReader(stream))
val offset: String = reader.readLine() // Reads the string "5432" stored in the file
Expected output is "5432".
But the actual output is "^#5^#4^#3^#2"
I am not able to trim the "^#" sequences, since they are not treated as regular characters. Please help with an appropriate solution.
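One extra byte before every digit is the pattern a two-byte encoding produces, for example when the file was written with DataOutputStream.writeChars or saved as UTF-16. That is only a guess about how this file was produced. Under that assumption, decoding with the matching charset instead of the platform default recovers the text. The sketch below simulates such a file in memory, so it needs no HDFS; Utf16ReadSketch and its members are hypothetical names.

```scala
import java.io.{BufferedReader, ByteArrayInputStream, ByteArrayOutputStream, DataOutputStream, InputStream, InputStreamReader}
import java.nio.charset.{Charset, StandardCharsets}

object Utf16ReadSketch {
  // Bytes as DataOutputStream.writeChars would store them: two bytes per character, high byte first.
  def twoByteEncoded(text: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    out.writeChars(text)
    out.close()
    buffer.toByteArray
  }

  // Reads the first line of a stream using an explicit charset.
  def readFirstLine(stream: InputStream, charset: Charset): String = {
    val reader = new BufferedReader(new InputStreamReader(stream, charset))
    try reader.readLine() finally reader.close()
  }

  def main(args: Array[String]): Unit = {
    val bytes = twoByteEncoded("5432")
    // Decoding as UTF-8 keeps a NUL character in front of every digit.
    println(readFirstLine(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8).length)
    // Decoding as UTF-16BE yields the intended text.
    println(readFirstLine(new ByteArrayInputStream(bytes), StandardCharsets.UTF_16BE))
  }
}
```

In the question's snippet the same idea would mean passing a charset as the second argument to InputStreamReader; which charset is right depends on how the file was actually written.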

Hdfs file list in scala

I am trying to list the files in an HDFS directory, but when I run the code below it seems to expect a file as the input.
val TestPath2="hdfs://localhost:8020/user/hdfs/QERESULTS1.csv"
val hdfs: org.apache.hadoop.fs.FileSystem = org.apache.hadoop.fs.FileSystem.get(sc.hadoopConfiguration)
val hadoopPath = new org.apache.hadoop.fs.Path(TestPath1)
val recursive = true
// val ri = hdfs.listFiles(hadoopPath, recursive)()
//println(hdfs.getChildFileSystems)
//hdfs.get(sc
val ri=hdfs.listFiles(hadoopPath, true)
println(ri)
You should set your default filesystem to hdfs:// first; it seems like your default filesystem is file://
val conf = sc.hadoopConfiguration
conf.set("fs.defaultFS", "hdfs://some-path")
val hdfs: org.apache.hadoop.fs.FileSystem = org.apache.hadoop.fs.FileSystem.get(conf)
...
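Separately from the default filesystem, listFiles returns a RemoteIterator, so println(ri) prints the iterator object rather than the file names; the iterator has to be walked explicitly. The following is a minimal sketch of that loop, assuming the caller passes a directory and a Hadoop Configuration (for example sc.hadoopConfiguration). ListFilesSketch and listFilePaths are hypothetical names.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import scala.collection.mutable.ListBuffer

object ListFilesSketch {
  // Collects the paths of all files under dir, descending into subdirectories when recursive is true.
  def listFilePaths(dir: String, recursive: Boolean, conf: Configuration): List[String] = {
    val path = new Path(dir)
    // Resolve the filesystem from the path so that an hdfs:// path is never handed to the local filesystem.
    val fs = path.getFileSystem(conf)
    val files = fs.listFiles(path, recursive) // RemoteIterator[LocatedFileStatus]
    val found = ListBuffer.empty[String]
    // RemoteIterator is not a Scala Iterator, so it is consumed with hasNext/next.
    while (files.hasNext) {
      found += files.next().getPath.toString
    }
    found.toList
  }
}
```

A call such as ListFilesSketch.listFilePaths("hdfs://localhost:8020/user/hdfs", recursive = true, sc.hadoopConfiguration).foreach(println) would then print one path per line.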

How to append text files in HDFS using Hadoop client using Scala?

I want to write text files into HDFS.
The path to which the files have to be written in HDFS is generated dynamically.
If a file path (including the file name) is new, the file should be created and the text written to it.
If the file path (including the file name) already exists, the string must be appended to the existing file.
I used the following code. File creation works fine, but I cannot append text to existing files.
def writeJson(uri: String, Json: JValue, time: Time): Unit = {
val path = new Path(generateFilePath(Json, time))
val conf = new Configuration()
conf.set("fs.defaultFS", uri)
conf.set("dfs.replication", "1")
conf.set("dfs.support.append", "true")
conf.set("dfs.client.block.write.replace-datanode-on-failure.enable","false")
val Message = compact(render(Json))+"\n"
try{
val fileSystem = FileSystem.get(conf)
if(fileSystem.exists(path).equals(true)){
println("File exists.")
val outputStream = fileSystem.append(path)
val bufferedWriter = new BufferedWriter(new OutputStreamWriter(outputStream))
bufferedWriter.write(Message.toString)
bufferedWriter.close()
println("Appended to file in path : " + path)
}
else {
println("File does not exist.")
val outputStream = fileSystem.create(path, true)
val bufferedWriter = new BufferedWriter(new OutputStreamWriter(outputStream))
bufferedWriter.write(Message.toString)
bufferedWriter.close()
println("Created file in path : " + path)
}
}catch{
case e:Exception=>
e.printStackTrace()
}
}
Hadoop version : 2.7.0
Whenever append has to be done, the following error is generated:
org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException)
I can see three possibilities:
1. Probably the easiest is to use the external commands provided by hdfs, which is already available on your Hadoop cluster; see https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html . You could also use the WebHDFS REST functionality: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
2. If you don't want to use hdfs commands, you can use the HDFS API provided by the hadoop-hdfs library: http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs/2.7.1
3. Use Spark if you want a clean Scala solution, e.g. http://spark.apache.org/docs/latest/programming-guide.html or https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter3/save_the_rdd_to_files.html
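As an illustration of the first option, the Hadoop file system shell has an -appendToFile command that can be driven from Scala through scala.sys.process. This is a minimal sketch, assuming the hdfs executable is on the PATH of the machine running the code and that the text to append has first been written to a local file; HdfsCliAppendSketch and its methods are hypothetical names.

```scala
import scala.sys.process._

object HdfsCliAppendSketch {
  // Builds the shell command: hdfs dfs -appendToFile <localFile> <hdfsPath>
  def appendCommand(localFile: String, hdfsPath: String): Seq[String] =
    Seq("hdfs", "dfs", "-appendToFile", localFile, hdfsPath)

  // Runs the command and returns its exit code (0 on success).
  def append(localFile: String, hdfsPath: String): Int =
    appendCommand(localFile, hdfsPath).!
}
```

Passing the command as a Seq avoids shell quoting problems with dynamically generated paths. Starting a process per append is slow, so this fits occasional appends better than a high-rate stream.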