Why can't Files.write use Array[Byte]()? - scala

I have this code:
val socket=new ServerSocket(25)
val client=socket.accept()
val inputStream=client.getInputStream
var dataBuffer=new Array[Byte](4096)
inputStream.read(dataBuffer)
Files.write("",dataBuffer)
Files.write needs a byte array, and I have given it a byte array, so why do I get this error on the last line:
Type mismatch, expected: Iterable[_ <: CharSequence], actual: Array[Byte]
inputStream.read also takes a byte array parameter and accepts dataBuffer, so why does the next line produce an error? How do I fix it? Thanks!

If you use java.nio.file.Files, you need to pass a Path as the first parameter.
val b: Array[Byte] = Array()
Files.write(Paths.get(""), b)

import java.net.ServerSocket
import java.nio.file.{Files, Paths}

object Test1 {
  def main(args: Array[String]): Unit = {
    val socket = new ServerSocket(9999)
    val client = socket.accept()
    val inputStream = client.getInputStream
    var dataBuffer = new Array[Byte](4096)
    inputStream.read(dataBuffer)
    Files.write(Paths.get("/home/eiffel/a.txt"), dataBuffer)
  }
}
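One refinement worth noting (my own suggestion, not part of the original answer): InputStream.read returns the number of bytes actually read, which can be less than the buffer size, so you may want to write only that slice:

import java.util.Arrays

// read() reports how many bytes actually arrived; -1 means end of stream
val bytesRead = inputStream.read(dataBuffer)
if (bytesRead > 0)
  Files.write(Paths.get("/home/eiffel/a.txt"), Arrays.copyOf(dataBuffer, bytesRead))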

Related

udf No TypeTag available for type string

I don't understand a behavior of Spark.
I create a udf which returns an Integer like below:
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
object Show {
  def main(args: Array[String]): Unit = {
    val (sc, sqlContext) = iniSparkConf("test")
    val testInt_udf = sqlContext.udf.register("testInt_udf", testInt _)
  }

  def iniSparkConf(appName: String): (SparkContext, SQLContext) = {
    val conf = new SparkConf().setAppName(appName) //.setExecutorEnv("spark.ui.port", "4046")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")
    val sqlContext = new SQLContext(sc)
    (sc, sqlContext)
  }

  def testInt(): Int = {
    return 2
  }
}
It works perfectly, but if I change the return type of the method from Int to String
val testString_udf = sqlContext.udf.register("testString_udf", testString _)
def testString(): String = {
  return "myString"
}
I get the following error
Error:(34, 43) No TypeTag available for String
val testString_udf = sqlContext.udf.register("testString_udf", testString _)
Error:(34, 43) not enough arguments for method register: (implicit evidence$1: reflect.runtime.universe.TypeTag[String])org.apache.spark.sql.UserDefinedFunction.
Unspecified value parameter evidence$1.
val testString_udf = sqlContext.udf.register("testString_udf", testString _)
Here are my embedded jars:
datanucleus-api-jdo-3.2.6
datanucleus-core-3.2.10
datanucleus-rdbms-3.2.9
spark-1.6.1-yarn-shuffle
spark-assembly-1.6.1-hadoop2.6.0
spark-examples-1.6.1-hadoop2.6.0
I am a little bit lost... Do you have any idea?
Since I can't reproduce the issue by copy-pasting just your example code into a new file, I bet that in your real code String is actually shadowed by something else. To verify this theory, you can try to change your signature to
def testString(): scala.Predef.String = {
  return "myString"
}
or
def testString(): java.lang.String = {
  return "myString"
}
If this compiles, search for "String" to see how you shadowed the standard type. If you use IntelliJ IDEA, you can try Ctrl+B (Go to Declaration) to find it. The most likely candidate is that you used String as the name of a generic type parameter, but there are other possibilities.
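For example, here is a minimal sketch (my own illustration, not from the original post) of how a type parameter named String shadows scala.Predef.String:

object ShadowExample {
  // Inside Holder, "String" refers to the type parameter, not java.lang.String,
  // so a method declared to return String returns the type parameter instead.
  class Holder[String](value: String) {
    def get: String = value
  }
}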

Scala- Some bytes are missing because of 'hasNext' function?

Below is my code:
import scala.io.Source

object grouped_list {
  def encrypt(file: String) = {
    val text = Source.fromFile(file)
    val list = text.toList
    val blocks = list.grouped(501)
    while (blocks.hasNext) {
      val block0 = blocks.next()
      val stringBlock = block0.mkString
      println(stringBlock)
    }
  }

  def main(args: Array[String]): Unit = {
    val blockcipher = encrypt("E:\\test.txt")
  }
}
The file ("E:\test.txt", https://pan.baidu.com/s/1jHKuWb8) is about 300KB, however, the 'stringBlock' (https://pan.baidu.com/s/1pLygsyR) which is supposed to be the same with the file is about only 80KB, where are the rest? When I am not doing the 'while(blocks.hasNext)', the first 501 bytes of 'stringBlock' is the same with the file's first 501 bytes. After I did the 'while(blocks.hasNext)' , the first 501 bytes of 'stringBlock' is no longer the same with the file's. Is my problem has something to do with the 'hasNext' function?

Serializing to disk and deserializing Scala objects using Pickling

Given a stream of homogeneously typed objects, how would I go about serializing them to binary, writing them to disk, reading them back from disk and then deserializing them using Scala Pickling?
For example:
object PicklingIteratorExample extends App {
  import scala.pickling.Defaults._
  import scala.pickling.binary._
  import scala.pickling.static._

  case class Person(name: String, age: Int)

  val personsIt = Iterator.from(0).take(10).map(i => Person(i.toString, i))
  val pklsIt = personsIt.map(_.pickle)
  ??? // Write to disk
  val readIt: Iterator[Person] = ??? // Read from disk and unpickle
}
I found a way to do so for standard files:
// Person is defined as in the previous snippet
import java.io.{File, FileInputStream, FileOutputStream}

object PickleIOExample extends App {
  import scala.pickling.Defaults._
  import scala.pickling.binary._
  import scala.pickling.static._

  val tempPath = File.createTempFile("pickling", ".gz").getAbsolutePath
  val outputStream = new FileOutputStream(tempPath)
  val inputStream = new FileInputStream(tempPath)

  val persons = for {
    i <- 1 to 100
  } yield Person(i.toString, i)

  val output = new StreamOutput(outputStream)
  persons.foreach(_.pickleTo(output))
  outputStream.close()

  val personsIt = new Iterator[Person] {
    val streamPickle = BinaryPickle(inputStream)
    override def hasNext: Boolean = inputStream.available > 0
    override def next(): Person = streamPickle.unpickle[Person]
  }

  println(personsIt.mkString(", "))
  inputStream.close()
}
But I am still unable to find a solution that works with gzipped files, since I do not know how to detect the EOF. The following throws an EOFException because GZIPInputStream's available method does not indicate EOF:
// Person is defined as in the first snippet
import java.io.{File, FileInputStream, FileOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object PickleIOExample extends App {
  import scala.pickling.Defaults._
  import scala.pickling.binary._
  import scala.pickling.static._

  val tempPath = File.createTempFile("pickling", ".gz").getAbsolutePath
  val outputStream = new GZIPOutputStream(new FileOutputStream(tempPath))
  val inputStream = new GZIPInputStream(new FileInputStream(tempPath))

  val persons = for {
    i <- 1 to 100
  } yield Person(i.toString, i)

  val output = new StreamOutput(outputStream)
  persons.foreach(_.pickleTo(output))
  outputStream.close()

  val personsIt = new Iterator[Person] {
    val streamPickle = BinaryPickle(inputStream)
    override def hasNext: Boolean = inputStream.available > 0
    override def next(): Person = streamPickle.unpickle[Person]
  }

  println(personsIt.mkString(", "))
  inputStream.close()
}
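One possible workaround (my own sketch, not from the original post; it assumes BinaryPickle can read from any InputStream, as in the snippets above): wrap the GZIPInputStream in a java.io.PushbackInputStream and peek one byte in hasNext to detect EOF, instead of relying on available:

import java.io.PushbackInputStream

val peekable = new PushbackInputStream(new GZIPInputStream(new FileInputStream(tempPath)))

val personsIt = new Iterator[Person] {
  val streamPickle = BinaryPickle(peekable)
  // Peek one byte: -1 means the decompressed stream is exhausted; otherwise push it back.
  override def hasNext: Boolean = {
    val b = peekable.read()
    if (b == -1) false else { peekable.unread(b); true }
  }
  override def next(): Person = streamPickle.unpickle[Person]
}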

How to convert Source[ByteString, Any] to InputStream

akka-http represents a file uploaded using multipart/form-data encoding as Source[ByteString, Any]. I need to unmarshal it using Java library that expects an InputStream.
How can a Source[ByteString, Any] be turned into an InputStream?
As of version 2.x you can achieve this with the following code:
import akka.stream.scaladsl.StreamConverters
...
val inputStream: InputStream = entity.dataBytes
.runWith(
StreamConverters.asInputStream(FiniteDuration(3, TimeUnit.SECONDS))
)
See: http://doc.akka.io/docs/akka-stream-and-http-experimental/2.0.1/scala/migration-guide-1.0-2.x-scala.html
Note: this was broken in version 2.0.2 and fixed in 2.4.2.
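For completeness, a minimal self-contained wiring might look like this (my own sketch; the helper name and the 3-second read timeout are assumptions, and an implicit Materializer must be in scope):

import java.io.InputStream
import scala.concurrent.duration._
import akka.stream.Materializer
import akka.stream.scaladsl.{Source, StreamConverters}
import akka.util.ByteString

// Materialize the source into a blocking InputStream that the Java library can consume.
def toInputStream(bytes: Source[ByteString, Any])(implicit mat: Materializer): InputStream =
  bytes.runWith(StreamConverters.asInputStream(3.seconds))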
You could try using an OutputStreamSink that writes to a PipedOutputStream and feeds that into a PipedInputStream, which your other code then uses as its input stream. It's a bit of a rough idea, but it could work. The code would look like this:
import akka.util.ByteString
import akka.stream.scaladsl.Source
import java.io.PipedInputStream
import java.io.PipedOutputStream
import akka.stream.io.OutputStreamSink
import java.io.BufferedReader
import java.io.InputStreamReader
import akka.actor.ActorSystem
import akka.stream.ActorFlowMaterializer

object PipedStream extends App {
  implicit val system = ActorSystem("flowtest")
  implicit val mater = ActorFlowMaterializer()

  val lines = for (i <- 1 to 100) yield ByteString(s"This is line $i\n")
  val source = Source(lines)

  val pipedIn = new PipedInputStream()
  val pipedOut = new PipedOutputStream(pipedIn)
  val flow = source.to(OutputStreamSink(() => pipedOut))
  flow.run()

  val reader = new BufferedReader(new InputStreamReader(pipedIn))
  var line: String = reader.readLine
  while (line != null) {
    println(s"Reader received line: $line")
    line = reader.readLine
  }
}
You could extract an iterator from the ByteString and then get the InputStream. Something like this (pseudocode):
source.map { data: ByteString =>
data.iterator.asInputStream
}
Update
A more elaborate sample, starting from a Multipart.FormData:
def isSourceFromFormData(formData: Multipart.FormData): Source[InputStream, Any] =
  formData.parts.map { part =>
    part.entity.dataBytes
      .map(_.iterator.asInputStream)
  }.flatten(FlattenStrategy.concat)

Scala: Adding elements to Set inside 'foreach' doesn't persist

I create a mutable set and iterate over a list using a 'foreach' to populate the set. When I print the set inside the foreach, it prints the contents of the set correctly. However, the set is empty after the end of 'foreach'. I am not able to figure out what I am missing.
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD

object SparkTest {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Test")
    val sc = new SparkContext(conf)
    val graph = GraphLoader.edgeListFile(sc, "followers.txt")
    val edgeList = graph.edges
    var mapperResults = iterateMapper(edgeList)
    sc.stop()
  }

  def iterateMapper(edges: EdgeRDD[Int, Int]): scala.collection.mutable.Set[(VertexId, VertexId)] = {
    var mapperResults = scala.collection.mutable.Set[(VertexId, VertexId)]()
    val mappedValues = edges.mapValues(edge => (edge.srcId, edge.dstId)) ++ edges.mapValues(edge => (edge.dstId, edge.srcId))
    mappedValues.foreach {
      edge => {
        var src = edge.attr._1
        var dst = edge.attr._2
        mapperResults += ((src, dst))
      }
    }
    println(mapperResults)
    return mapperResults
  }
}
This is the code I'm working with. It is a modified example from Spark.
The
println(mapperResults)
prints out an empty set.
Actually it works, but on the workers.
foreach is a function that exists for its side effects, but it runs on the workers, so you won't see the updated Set on the driver.
Another issue is that Spark is designed around immutability, so do not use a mutable collection there; there is also no need for it. The following code should do what you meant to do:
var mapperResults = mappedValues.map(_.attr).distinct.collect
It's shorter, cleaner, and does the map work on the workers.
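Put back into the question's method, that suggestion would look roughly like this (my own sketch; the EdgeRDD[Int, Int] signature is copied from the question and depends on the Spark version):

def iterateMapper(edges: EdgeRDD[Int, Int]): Array[(VertexId, VertexId)] = {
  val mappedValues = edges.mapValues(edge => (edge.srcId, edge.dstId)) ++
    edges.mapValues(edge => (edge.dstId, edge.srcId))
  // map and distinct run on the workers; collect brings the result back to the driver
  mappedValues.map(_.attr).distinct.collect()
}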