Scala- Some bytes are missing because of 'hasNext' function? - scala

Listing below is my code:
import scala.io.Source
object grouped_list {
def encrypt(file: String) = {
val text = Source.fromFile(file)
val list = text.toList
val blocks = list.grouped(501)
while (blocks.hasNext) {
val block0 = blocks.next()
val stringBlock = block0.mkString
println(stringBlock)
}
}
def main (args: Array[String]): Unit = {
val blockcipher = encrypt("E:\\test.txt")
}
The file ("E:\test.txt", https://pan.baidu.com/s/1jHKuWb8) is about 300KB, however, the 'stringBlock' (https://pan.baidu.com/s/1pLygsyR) which is supposed to be the same with the file is about only 80KB, where are the rest? When I am not doing the 'while(blocks.hasNext)', the first 501 bytes of 'stringBlock' is the same with the file's first 501 bytes. After I did the 'while(blocks.hasNext)' , the first 501 bytes of 'stringBlock' is no longer the same with the file's. Is my problem has something to do with the 'hasNext' function?

Related

Nested for loop in Scala not iterating through out the file?

I have first file with data as
A,B,C
B,E,F
C,N,P
And second file with data as below
A,B,C,YES
B,C,D,NO
C,D,E,TRUE
D,E,F,FALSE
E,F,G,NO
I need every record in the first file to iterate with all records in the second file. But it's happening only for the first record.
Below is the code:
import scala.io.Source.fromFile
object TestComparision {
def main(args: Array[String]): Unit = {
val lines = fromFile("C:\\Users\\nreddy26\\Desktop\\Spark\\PRI.txt").getLines
val lines2 = fromFile("C:\\Users\\nreddy26\\Desktop\\Spark\\LKP.txt").getLines
var l = 0
var cnt = 0
for (line <- lines) {
for (line2 <- lines2) {
val cols = line.split(",").map(_.trim)
println(s"${cols(0)}|${cols(1)}|${cols(2)}")
val cols2 = line2.split(",").map(_.trim)
println(s"${cols2(0)}|${cols2(1)}|${cols2(2)}|${cols2(3)}")
}
}
}
}
As rightly suggested by #Luis, get the lines in List form by using toList:
val lines = fromFile("C:\\Users\\nreddy26\\Desktop\\Spark\\PRI.txt").getLines.toList
val lines2 = fromFile("C:\\Users\\nreddy26\\Desktop\\Spark\\LKP.txt").getLines.toList

Why Files.write can't use Array[Byte]()?

I have this code:
val socket=new ServerSocket(25)
val client=socket.accept()
val inputStream=client.getInputStream
var dataBuffer=new Array[Byte](4096)
inputStream.read(dataBuffer)
Files.write("",dataBuffer)
Files.write need a Byte array,and I have give it a Byte array,so why I got error at last line:
Type mismatch,expected:Iterable[_<:CharSequence],actual:Array[Byte]
inputStream.read also need a Byte array param,it can use dataBuffer,so why the next line got a error?How to fix it?Thanks!
If you use java.nio.file.Files you should use Path as first parameter.
val b: Array[Byte] = Array()
Files.write(Paths.get(""), b)
import java.net.ServerSocket
import java.nio.file.{Files, Paths}
object Test1 {
def main(args: Array[String]): Unit = {
val socket = new ServerSocket(9999)
val client = socket.accept()
val inputStream = client.getInputStream
var dataBuffer = new Array[Byte](4096)
inputStream.read(dataBuffer)
Files.write(Paths.get("/home/eiffel/a.txt"), dataBuffer)
}
}

Apache Flink: add timestamp to filename

I want to create a BucketingSink in Flink which writes all the files in the same folder, but sets the file name with the current timestamp instead of incrementing counter. For example:
part-0-1514107452000.avro
part-0-1514021052000.avro
...
Here is my code:
val sink = new BucketingSink[Tuple2[String, MyType]]("/tmp/flink/")
sink.setBucketer(new MyTypeBucketer(new SimpleDateFormat("yyyy-MM-dd--HH")))
sink.setInactiveBucketThreshold(120000) // this is 2 minutes
sink.setBatchSize(1024 * 1024 * 64) // this is 64 MB,
sink.setPendingSuffix(".avro")
val writer: AvroKeyValueSinkWriter[String, MyType] = new AvroKeyValueSinkWriter[String, MyType](parseAvroSinkProperties())
sink.setWriter(writer.duplicate())
def parseAvroSinkProperties(): util.Map[String, String] = {
var properties = new util.HashMap[String, String]()
val stringSchema = Schema.create(Type.STRING)
val myTypeSchema = myType.getClassSchema
val keySchema = stringSchema.toString
val valueSchema = myTypeSchema.toString
val compress = true
properties.put(AvroKeyValueSinkWriter.CONF_OUTPUT_KEY_SCHEMA, keySchema)
properties.put(AvroKeyValueSinkWriter.CONF_OUTPUT_VALUE_SCHEMA, valueSchema)
properties.put(AvroKeyValueSinkWriter.CONF_COMPRESS, compress.toString)
properties.put(AvroKeyValueSinkWriter.CONF_COMPRESS_CODEC, DataFileConstants.SNAPPY_CODEC)
properties
}
class MyTypeBucketer(dateFormatter: SimpleDateFormat) extends DateTimeBucketer[Tuple2[String, MyType]] {
override def getBucketPath(clock: Clock, basePath: Path, element: Tuple2[String, MyType]) = {
new Path(s"$basePath/${element.f1.getMyStringProp}")
}
Does anybody have any idea?
thanks

Serializing to disk and deserializing Scala objects using Pickling

Given a stream of homogeneous typed object, how would I go about serializing them to binary, writing them to disk, reading them from disk and then deserializing them using Scala Pickling?
For example:
object PicklingIteratorExample extends App {
import scala.pickling.Defaults._
import scala.pickling.binary._
import scala.pickling.static._
case class Person(name: String, age: Int)
val personsIt = Iterator.from(0).take(10).map(i => Person(i.toString, i))
val pklsIt = personsIt.map(_.pickle)
??? // Write to disk
val readIt: Iterator[Person] = ??? // Read from disk and unpickle
}
I find a way to so for standard files:
object PickleIOExample extends App {
import scala.pickling.Defaults._
import scala.pickling.binary._
import scala.pickling.static._
val tempPath = File.createTempFile("pickling", ".gz").getAbsolutePath
val outputStream = new FileOutputStream(tempPath)
val inputStream = new FileInputStream(tempPath)
val persons = for{
i <- 1 to 100
} yield Person(i.toString, i)
val output = new StreamOutput(outputStream)
persons.foreach(_.pickleTo(output))
outputStream.close()
val personsIt = new Iterator[Person]{
val streamPickle = BinaryPickle(inputStream)
override def hasNext: Boolean = inputStream.available > 0
override def next(): Person = streamPickle.unpickle[Person]
}
println(personsIt.mkString(", "))
inputStream.close()
}
But I am still unable to find a solution that will work with gzipped files. Since I do not know how to detect the EOF? The following throws an EOFexception since GZIPInputStream available method does not indicate the EOF:
object PickleIOExample extends App {
import scala.pickling.Defaults._
import scala.pickling.binary._
import scala.pickling.static._
val tempPath = File.createTempFile("pickling", ".gz").getAbsolutePath
val outputStream = new GZIPOutputStream(new FileOutputStream(tempPath))
val inputStream = new GZIPInputStream(new FileInputStream(tempPath))
val persons = for{
i <- 1 to 100
} yield Person(i.toString, i)
val output = new StreamOutput(outputStream)
persons.foreach(_.pickleTo(output))
outputStream.close()
val personsIt = new Iterator[Person]{
val streamPickle = BinaryPickle(inputStream)
override def hasNext: Boolean = inputStream.available > 0
override def next(): Person = streamPickle.unpickle[Person]
}
println(personsIt.mkString(", "))
inputStream.close()
}

Creating serializable objects from Scala source code at runtime

To embed Scala as a "scripting language", I need to be able to compile text fragments to simple objects, such as Function0[Unit] that can be serialised to and deserialised from disk and which can be loaded into the current runtime and executed.
How would I go about this?
Say for example, my text fragment is (purely hypothetical):
Document.current.elements.headOption.foreach(_.open())
This might be wrapped into the following complete text:
package myapp.userscripts
import myapp.DSL._
object UserFunction1234 extends Function0[Unit] {
def apply(): Unit = {
Document.current.elements.headOption.foreach(_.open())
}
}
What comes next? Should I use IMain to compile this code? I don't want to use the normal interpreter mode, because the compilation should be "context-free" and not accumulate requests.
What I need to get hold off from the compilation is I guess the binary class file? In that case, serialisation is straight forward (byte array). How would I then load that class into the runtime and invoke the apply method?
What happens if the code compiles to multiple auxiliary classes? The example above contains a closure _.open(). How do I make sure I "package" all those auxiliary things into one object to serialize and class-load?
Note: Given that Scala 2.11 is imminent and the compiler API probably changed, I am happy to receive hints as how to approach this problem on Scala 2.11
Here is one idea: use a regular Scala compiler instance. Unfortunately it seems to require the use of hard disk files both for input and output. So we use temporary files for that. The output will be zipped up in a JAR which will be stored as a byte array (that would go into the hypothetical serialization process). We need a special class loader to retrieve the class again from the extracted JAR.
The following assumes Scala 2.10.3 with the scala-compiler library on the class path:
import scala.tools.nsc
import java.io._
import scala.annotation.tailrec
Wrapping user provided code in a function class with a synthetic name that will be incremented for each new fragment:
val packageName = "myapp"
var userCount = 0
def mkFunName(): String = {
val c = userCount
userCount += 1
s"Fun$c"
}
def wrapSource(source: String): (String, String) = {
val fun = mkFunName()
val code = s"""package $packageName
|
|class $fun extends Function0[Unit] {
| def apply(): Unit = {
| $source
| }
|}
|""".stripMargin
(fun, code)
}
A function to compile a source fragment and return the byte array of the resulting jar:
/** Compiles a source code consisting of a body which is wrapped in a `Function0`
* apply method, and returns the function's class name (without package) and the
* raw jar file produced in the compilation.
*/
def compile(source: String): (String, Array[Byte]) = {
val set = new nsc.Settings
val d = File.createTempFile("temp", ".out")
d.delete(); d.mkdir()
set.d.value = d.getPath
set.usejavacp.value = true
val compiler = new nsc.Global(set)
val f = File.createTempFile("temp", ".scala")
val out = new BufferedOutputStream(new FileOutputStream(f))
val (fun, code) = wrapSource(source)
out.write(code.getBytes("UTF-8"))
out.flush(); out.close()
val run = new compiler.Run()
run.compile(List(f.getPath))
f.delete()
val bytes = packJar(d)
deleteDir(d)
(fun, bytes)
}
def deleteDir(base: File): Unit = {
base.listFiles().foreach { f =>
if (f.isFile) f.delete()
else deleteDir(f)
}
base.delete()
}
Note: Doesn't handle compiler errors yet!
The packJar method uses the compiler output directory and produces an in-memory jar file from it:
// cf. http://stackoverflow.com/questions/1281229
def packJar(base: File): Array[Byte] = {
import java.util.jar._
val mf = new Manifest
mf.getMainAttributes.put(Attributes.Name.MANIFEST_VERSION, "1.0")
val bs = new java.io.ByteArrayOutputStream
val out = new JarOutputStream(bs, mf)
def add(prefix: String, f: File): Unit = {
val name0 = prefix + f.getName
val name = if (f.isDirectory) name0 + "/" else name0
val entry = new JarEntry(name)
entry.setTime(f.lastModified())
out.putNextEntry(entry)
if (f.isFile) {
val in = new BufferedInputStream(new FileInputStream(f))
try {
val buf = new Array[Byte](1024)
#tailrec def loop(): Unit = {
val count = in.read(buf)
if (count >= 0) {
out.write(buf, 0, count)
loop()
}
}
loop()
} finally {
in.close()
}
}
out.closeEntry()
if (f.isDirectory) f.listFiles.foreach(add(name, _))
}
base.listFiles().foreach(add("", _))
out.close()
bs.toByteArray
}
A utility function that takes the byte array found in deserialization and creates a map from class names to class byte code:
def unpackJar(bytes: Array[Byte]): Map[String, Array[Byte]] = {
import java.util.jar._
import scala.annotation.tailrec
val in = new JarInputStream(new ByteArrayInputStream(bytes))
val b = Map.newBuilder[String, Array[Byte]]
#tailrec def loop(): Unit = {
val entry = in.getNextJarEntry
if (entry != null) {
if (!entry.isDirectory) {
val name = entry.getName
// cf. http://stackoverflow.com/questions/8909743
val bs = new ByteArrayOutputStream
var i = 0
while (i >= 0) {
i = in.read()
if (i >= 0) bs.write(i)
}
val bytes = bs.toByteArray
b += mkClassName(name) -> bytes
}
loop()
}
}
loop()
in.close()
b.result()
}
def mkClassName(path: String): String = {
require(path.endsWith(".class"))
path.substring(0, path.length - 6).replace("/", ".")
}
A suitable class loader:
class MemoryClassLoader(map: Map[String, Array[Byte]]) extends ClassLoader {
override protected def findClass(name: String): Class[_] =
map.get(name).map { bytes =>
println(s"defineClass($name, ...)")
defineClass(name, bytes, 0, bytes.length)
} .getOrElse(super.findClass(name)) // throws exception
}
And a test case which contains additional classes (closures):
val exampleSource =
"""val xs = List("hello", "world")
|println(xs.map(_.capitalize).mkString(" "))
|""".stripMargin
def test(fun: String, cl: ClassLoader): Unit = {
val clName = s"$packageName.$fun"
println(s"Resolving class '$clName'...")
val clazz = Class.forName(clName, true, cl)
println("Instantiating...")
val x = clazz.newInstance().asInstanceOf[() => Unit]
println("Invoking 'apply':")
x()
}
locally {
println("Compiling...")
val (fun, bytes) = compile(exampleSource)
val map = unpackJar(bytes)
println("Classes found:")
map.keys.foreach(k => println(s" '$k'"))
val cl = new MemoryClassLoader(map)
test(fun, cl) // should call `defineClass`
test(fun, cl) // should find cached class
}