I have a Spark UDF written in Scala, and I'd like to use my function together with some additional files.
import scala.io.Source
import org.json4s.jackson.JsonMethods.parse
import org.json4s.DefaultFormats

object consts {
  implicit val formats = DefaultFormats
  val my_map = parse(Source.fromFile("src/main/resources/map.json").mkString)
    .extract[Map[String, Map[String, List[String]]]]
}
Now I want to use the my_map object inside a UDF, so I basically do this:
import package.consts.my_map

object myUDFs {
  // ... use my_map here ...
}
I've already tested my function locally, so it works well.
Now I want to understand how to package the jar file so that the .json file stays inside it.
Thank you.
If you manage your project with Maven, you can place your .json file(s) under src/main/resources as it's the default place where Maven looks for your project's resources.
You also can define a custom path for your resources as described here: https://maven.apache.org/plugins/maven-resources-plugin/examples/resource-directory.html
UPD: I managed to do so by creating a fat jar and reading my resource file this way:
parse(
  Source
    .fromInputStream(
      getClass.getClassLoader.getResourceAsStream("map.json")
    )
    .mkString
).extract[Map[String, Map[String, List[String]]]]
Related
I am trying to create a program in Spark Scala that reads data from different sources dynamically, based on a configuration setting.
The data comes in different formats such as CSV, Parquet, and Sequence files, and the format to read should be chosen at runtime from the configuration.
I have tried a few approaches without success; I am new to Scala and Spark, so please help.
Please use a config file to specify your input file format and location. For example:
import java.io.File
import com.typesafe.config.{ Config, ConfigFactory }
val configFile = System.getProperty("config.file")
val config = ConfigFactory.parseFile(new File(configFile))
val format = config.getString("inputDataFormat")
Based on the above format, write your conditional statements for reading files.
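For illustration, a minimal sketch of such a dispatch (assuming a SparkSession named spark and an additional config key inputDataPath; both names are just placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

val path = config.getString("inputDataPath")   // hypothetical key for the input location
val df = format.toLowerCase match {
  case "csv"      => spark.read.option("header", "true").csv(path)
  case "parquet"  => spark.read.parquet(path)
  case "sequence" =>
    // Sequence files are (key, value) pairs; keep only the value as a single column
    spark.sparkContext.sequenceFile[String, String](path).values.toDF("value")
  case other      => throw new IllegalArgumentException(s"Unsupported format: $other")
}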
I am trying to read a set of XML files nested in many folders into sequence files in Spark. I can list the file names using the recursiveListFiles function from "How do I list all files in a subdirectory in Scala?":
import java.io.File

def recursiveListFiles(f: File): Array[File] = {
  val these = f.listFiles
  these ++ these.filter(_.isDirectory).flatMap(recursiveListFiles)
}
But how do I read the file content as a separate column here?
What about using Spark's wholeTextFiles method and parsing the XML yourself afterwards?
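A rough sketch of that idea (assuming a SparkSession named spark; the path pattern is just an example, and each file must fit in memory since wholeTextFiles reads files whole):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

val xmlDf = spark.sparkContext
  .wholeTextFiles("hdfs:///data/xml/*/*.xml")   // yields (fileName, fileContent) pairs
  .toDF("fileName", "content")                  // the whole XML ends up in its own column
// parse the "content" column yourself afterwards, e.g. with scala.xml.XML.loadString inside a UDF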
I am using the following code:
val fs = FileSystem.get(new Configuration())
val status = fs.listStatus(new Path("wasb:///example/"))
status.foreach(x => println(x.getPath))
from this question: How to enumerate files in HDFS directory
My problem is that I do not understand how to make an alias for a class, and without one the code fails. I found all the classes mentioned in the code, and the following fully qualified version works:
val fs = org.apache.hadoop.fs.FileSystem.get(new org.apache.hadoop.conf.Configuration())
val status = fs.listStatus(new org.apache.hadoop.fs.Path("wasb:///example/"))
status
So the question is: how do I make an alias for a class in Scala? How do I point Path() to org.apache.hadoop.fs.Path()?
I looked at this Stack Overflow question: Class alias in scala, but did not see how it connects to my case.
Not sure about your term "alias". I think you want an import, e.g.
import org.apache.hadoop.fs.Path
or more generally
import org.apache.hadoop.fs._
Note that you can alias via an import, thus:
import org.apache.hadoop.fs.{Path => MyPath}
and then refer to Path as MyPath. This is particularly useful when writing code that imports two classes of the same name from different packages, e.g. java.util.Date and java.sql.Date. Aliasing lets you resolve that confusion.
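A quick illustration of that Date example (variable names are made up):

import java.util.{Date => UtilDate}
import java.sql.{Date => SqlDate}

val now: UtilDate = new UtilDate()
val sqlNow: SqlDate = new SqlDate(now.getTime)   // both Date classes coexist without clashing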
I am new to Scala programming and I wanted to read a properties file in Scala.
I can't find any API to read a properties file in Scala.
Please let me know if there is an API for this, or another way to read properties files in Scala.
Besides the Java API, there is a library by Typesafe called config with a good API for working with configuration files of different types.
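A minimal sketch with Typesafe config (assuming an application.properties file on the classpath with a key propertyX; both names are illustrative):

import com.typesafe.config.ConfigFactory

// load() picks up application.conf / application.json / application.properties from the classpath
val config = ConfigFactory.load()
val propertyX = config.getString("propertyX")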
You will have to do it in a similar way to converting a Scala Map to a java.util.Map. java.util.Properties extends java.util.Hashtable, which extends java.util.Dictionary.
scala.collection.JavaConverters has functions to convert back and forth between a Dictionary and a Scala mutable.Map:
import java.util.Properties
import scala.collection.JavaConverters._

val x = new Properties
// load from a .properties file here, e.g. x.load(...)

scala> x.asScala
res4: scala.collection.mutable.Map[String,String] = Map()
You can then use the Map above to get and retrieve values. But if you wish to convert it back to a Properties instance (to store it back, etc.), you may have to do that manually.
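A rough sketch of that reverse direction (just one way to do it; the map contents are made up):

import java.util.Properties

val scalaMap = scala.collection.mutable.Map("a" -> "1", "b" -> "2")

val props = new Properties()
scalaMap.foreach { case (k, v) => props.setProperty(k, v) }   // copy entries back one by one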
You can just use the Java API.
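For example, a minimal sketch with plain java.util.Properties (the file name and key are placeholders):

import java.io.FileInputStream
import java.util.Properties

val props = new Properties()
val in = new FileInputStream("app.properties")
try props.load(in) finally in.close()

val value = props.getProperty("propertyX")   // returns null if the key is absent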
Consider something along these lines:

import scala.io.Source

// fileName is assumed to point at the properties file
def getPropertyX: Option[String] = Source.fromFile(fileName)
  .getLines()
  .find(_.startsWith("propertyX="))
  .map(_.replace("propertyX=", ""))
How do I get the list of files (or all *.txt files, for example) in a directory in Scala?
The Source class does not seem to help.
new java.io.File(dirName).listFiles.filter(_.getName.endsWith(".txt"))
The JDK 7 version, using the new DirectoryStream class, is:

import java.nio.file.{Files, Path}

// path is a java.nio.file.Path pointing at the directory to list
Files.newDirectoryStream(path)
  .filter(_.getFileName.toString.endsWith(".txt"))
  .map(_.toAbsolutePath)
Instead of a string, this returns a Path, which has loads of handy methods on it, like 'relativize' and 'subpath'.
Note that you will also need to import scala.collection.JavaConversions._ to enable interop with Java collections.
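For instance, a quick look at relativize (the paths are made up):

import java.nio.file.Paths

val base = Paths.get("/data/reports")
val file = Paths.get("/data/reports/2020/summary.txt")
println(base.relativize(file))   // prints 2020/summary.txt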
The Java File class is really all you need, although it's easy enough to add some Scala goodness to make iteration over directories easier.
import scala.collection.JavaConversions._

// myDirectory is assumed to be a java.io.File
for (file <- myDirectory.listFiles if file.getName endsWith ".txt") {
  // process the file
}
For now, you should use Java libraries to do so.