Converting a sequence of JSON objects to an RDD - Scala

I currently have a JSON file, say student.json. The structure looks something like this:
{"serialNo":"1","name":"Rahul"}
{"serialNo":"2","name":"Rakshith"}
case class Student(serialNo:Int,name:String)
student.json is a huge file which I am planning to parse through a Spark job. Here is the snippet:
import play.api.libs.json.{ Json, JsObject, JsString }
.....
.....
for (jsonLine <- sc.textFile("student.json");
     student <- Json.parse(jsonLine).asOpt[Student])
  yield (student.serialNo -> student.name)
Is there a better way to do this?

If student.json is a huge file, and each line is a valid JSON object, you should do:
val myRdd = sc.textFile("student.json").map(l => Json.parse(l).asOpt[Student])
If you want to bring the RDD's contents back to the driver, you can:
val students = myRdd.collect() // then you can operate on it in the old-fashioned way
I saw you are importing play.api.libs.json, which is from the Play Framework. I don't think running a Spark program inside a web application is a good idea...
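For reference, a self-contained sketch of that approach as a small Spark application; the app name and local master are illustrative, and note that asOpt[Student] needs an implicit Reads[Student] in scope, which Json.reads can derive:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import play.api.libs.json.{Json, Reads}

// serialNo is modelled as a String here because the sample lines quote it ("1");
// with serialNo: Int the default Reads would reject those lines
case class Student(serialNo: String, name: String)

object StudentJsonToRdd {
  // Json.reads derives a Reads[Student] from the case class fields
  implicit val studentReads: Reads[Student] = Json.reads[Student]

  def main(args: Array[String]): Unit = {
    // local master only for illustration; drop it when submitting to a cluster
    val sc = new SparkContext(new SparkConf().setAppName("student-json").setMaster("local[*]"))

    // flatMap drops lines that fail to parse, instead of carrying Option values around
    val studentPairs = sc.textFile("student.json")
      .flatMap(line => Json.parse(line).asOpt[Student])
      .map(s => s.serialNo -> s.name)

    studentPairs.take(10).foreach(println)
    sc.stop()
  }
}
```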

Related

Create and stream a zip file as it is being created with the Play Framework and Scala

My Scala Play API provides endpoints to return a file as a stream via the Ok.chunked function.
I now want to allow the download of multiple files as a zip archive.
I want to create the zip file as a stream which Play should directly return as a file stream,
meaning it is served while it is being created, without temporarily saving the zip file to disk.
What would be a good way to implement a function that creates this stream?
I solved the issue by using Akka's Alpakka.
import akka.stream.alpakka.file.ArchiveMetadata
import akka.stream.alpakka.file.scaladsl.Archive
import akka.stream.scaladsl.{FileIO, Source}
import akka.util.ByteString

// path: java.nio.file.Path of the file to include in the archive
val fileSource: Source[ByteString, _] = FileIO.fromPath(path)
val tupleWithMetadata = (ArchiveMetadata("file.txt"), fileSource)

// Archive.zip() turns a stream of (metadata, source) pairs into a zipped byte stream
val stream: Source[ByteString, _] = Source(List(tupleWithMetadata)).via(Archive.zip())
First I create an Akka ByteString source. The source is used inside a Tuple2 together with some ArchiveMetadata. This tuple can then be used to create a new source, which can be connected to Alpakka's Archive.zip().
The resulting stream can then be used with Play's Ok.chunked.
I hope this solution might help you if you have the same question.
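For context, a minimal sketch of how such a stream could be returned from a Play controller; the controller name, file names, and headers are illustrative assumptions, not part of the original answer:

```scala
import java.nio.file.Paths
import javax.inject.Inject

import akka.stream.alpakka.file.ArchiveMetadata
import akka.stream.alpakka.file.scaladsl.Archive
import akka.stream.scaladsl.{FileIO, Source}
import akka.util.ByteString
import play.api.mvc.{AbstractController, ControllerComponents}

class ZipController @Inject()(cc: ControllerComponents) extends AbstractController(cc) {

  // streams several files into one zip archive without writing it to disk first
  def downloadAll = Action {
    val entries: List[(ArchiveMetadata, Source[ByteString, Any])] =
      List("a.txt", "b.txt").map { name =>
        (ArchiveMetadata(name), FileIO.fromPath(Paths.get(name)))
      }

    val zipStream: Source[ByteString, Any] = Source(entries).via(Archive.zip())

    Ok.chunked(zipStream)
      .as("application/zip")
      .withHeaders("Content-Disposition" -> "attachment; filename=files.zip")
  }
}
```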

Passing In Config In Gatling Tests

Noob to Gatling/Scala here.
This might be a bit of a silly question but I haven't been able to find an example of what I am trying to do.
I want to pass in things such as the base URL, username, and passwords for some of my calls. These would change from environment to environment, so I want to be able to change these values between environments while keeping the same tests in each.
I know we can feed in values, but it appears that is more for iterating over datasets and not so much for passing in config values like these.
Ideally I would like to house this information in a JSON file and not pass it in on the command line, but maybe that's not doable?
Any guidance on this would be awesome.
I have a similar setup, and you can use pure Scala here. In this scenario you can create an object called Configuration, for example:
object Configuration { var INPUT_PROFILE_FILE_NAME = ""; }
This object can also read a file; I have the code below in the above object:
import java.io.FileInputStream
import java.util.Properties

val file = getClass.getResource("data/config.properties").getFile()
val prop = new Properties()
prop.load(new FileInputStream(file))
INPUT_PROFILE_FILE_NAME = prop.getProperty("inputProfileFileName")
Now you can import this object in the Gatling simulation file:
val profileName = Configuration.INPUT_PROFILE_FILE_NAME
https://docs.scala-lang.org/tutorials/tour/singleton-objects.html
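As a rough sketch of how the pieces could fit together in a simulation (written against Gatling 3's DSL, where baseUrl was baseURL in Gatling 2; the property names, file path, and endpoints are illustrative assumptions, not taken from the question):

```scala
import java.io.FileInputStream
import java.util.Properties

import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Loaded once, the first time the object is referenced
object Configuration {
  private val prop = new Properties()
  prop.load(new FileInputStream(getClass.getResource("/data/config.properties").getFile))

  val BASE_URL: String = prop.getProperty("baseUrl")
  val USERNAME: String = prop.getProperty("username")
  val PASSWORD: String = prop.getProperty("password")
}

class LoginSimulation extends Simulation {
  // the config object is plain Scala, so it can be used anywhere in the simulation
  val httpProtocol = http.baseUrl(Configuration.BASE_URL)

  val scn = scenario("login")
    .exec(
      http("login request")
        .post("/login")
        .formParam("username", Configuration.USERNAME)
        .formParam("password", Configuration.PASSWORD)
    )

  setUp(scn.inject(atOnceUsers(1))).protocols(httpProtocol)
}
```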

Ingesting data into Solr using Spark and Scala

I am trying to ingest data into Solr using Scala and Spark; however, my code is missing something. For instance, I got the code below from a Hortonworks tutorial.
I am using Spark 1.6.2, Solr 5.2.1, and Scala 2.10.5.
Can anybody provide me with a workable snippet to successfully insert data into Solr?
val input_file = "hdfs:///tmp/your_text_file"
case class Person(id: Int, name: String)
val people_df1 = sc.textFile(input_file).map(_.split(",")).map(p => Person(p(0).trim.toInt, p(1))).toDF()
val docs = people_df1.map { doc =>
  val docx = SolrSupport.autoMapToSolrInputDoc(doc.getAs[Int]("id").toString, doc, null)
  docx.setField("scala_s", "supercool")
  docx.setField("name_s", doc.getAs[String]("name"))
  docx // return the document from the map
}
// the code below has a compilation issue somehow, although the jar file does contain these functions
SolrSupport.indexDocs("sandbox.hortonworks.com:2181", "testsparksolr", 10, docs)
val solrServer = com.lucidworks.spark.SolrSupport.getSolrServer("http://ambari.asiacell.com:2181")
solrServer.setDefaultCollection("testsparksolr")
solrServer.commit(false, false)
thanks in advance
Have you tried spark-solr?
The library's main focus is to provide a clean API for indexing documents to a Solr server, as in your case.
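For illustration, a minimal sketch of indexing a DataFrame with spark-solr's data source API, in the spark-shell style of the question (sc is the SparkContext); the ZooKeeper host, collection name, and fields are placeholders, and the exact options should be checked against the spark-solr documentation for your Spark/Solr versions:

```scala
import org.apache.spark.sql.{SQLContext, SaveMode}

case class Person(id: Int, name: String)

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val peopleDF = sc.textFile("hdfs:///tmp/your_text_file")
  .map(_.split(","))
  .map(p => Person(p(0).trim.toInt, p(1).trim))
  .toDF()

// spark-solr registers a "solr" data source; zkhost points at the ZooKeeper ensemble
peopleDF.write
  .format("solr")
  .option("zkhost", "sandbox.hortonworks.com:2181")
  .option("collection", "testsparksolr")
  .mode(SaveMode.Overwrite)
  .save()

// depending on the collection's autocommit settings, a commit may still be
// needed before the documents become searchable
```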

Getting JSON into a model object

I need to get JSON into my model but I have problems (I am a beginner).
My model:
https://gist.github.com/anonymous/1c2e88cb83cbeace6f34
and I need to get the JSON into my model
and I need JSON-to-model conversion
and I use a web service to get the JSON
and how can I get the object's list of jobs into my model?
My controller:
https://gist.github.com/anonymous/c526483b29be0b198bca
I need the objects so I can edit some details, and then I want to convert them back to JSON.
That is my thinking so far.
I am open to new ideas.
Thanks...
It looks like you have nearly everything you need there. To convert the builds in the JSON to a List[Build], you'd do this:
val js: JsValue = response.json
(js \ "builds").as[List[Build]]
To modify a field, you can do, for example:
val build = builds.head // get the first build
val modifiedBuild = build.copy(name = "new name")
Then to convert that back to being JSON:
Json.toJson(modifiedBuild)
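For completeness, here is a small self-contained sketch of the pieces this assumes; the Build fields are made up (the real model lives in the linked gist), and Json.format supplies both the Reads used by .as and the Writes used by Json.toJson:

```scala
import play.api.libs.json.{JsValue, Json, OFormat}

case class Build(name: String, number: Int)

object Build {
  implicit val buildFormat: OFormat[Build] = Json.format[Build]
}

// parse: pull the list of builds out of a payload like {"builds": [...]}
val js: JsValue = Json.parse("""{"builds": [{"name": "nightly", "number": 42}]}""")
val builds: List[Build] = (js \ "builds").as[List[Build]]

// modify: case classes give you copy for cheap, immutable updates
val modifiedBuild = builds.head.copy(name = "new name")

// serialize: turn the updated model back into JSON
val backToJson: JsValue = Json.toJson(modifiedBuild)
```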

Sending email with attachment using Scala and Liftweb

This is the first time I am integrating an email service with Liftweb.
I want to send email with attachments (like documents, images, PDFs).
My code looks like below:
case class CSVFile(bytes: Array[Byte],
                   filename: String = "file.csv",
                   mime: String = "text/csv; charset=utf8; header=present")

val attach = CSVFile(fileupload.mkString.getBytes("utf8"))
val body = <p>Please research the enclosed.</p>
val msg = XHTMLPlusImages(body,
  PlusImageHolder(attach.filename, attach.mime, attach.bytes))

Mailer.sendMail(
  From("vyz@gmail.com"),
  Subject(subject(0)),
  To(to(0)),
  msg)
This code is taken from the Lift Cookbook; it's not working for my requirement.
It's working, but only the attached file name comes through (file.csv) with no data in it (I uploaded the file gsy.docx).
Best Regards
GSY
You don't specify what type fileupload is, but assuming it is of type net.liftweb.http.FileParamHolder, the issue is that you can't just call mkString and expect it to have any data, since there is no data in the object, just a fileStream method for retrieving it (either from disk or memory).
The easiest way to accomplish what you want would be to use a ByteArrayOutputStream and copy the data to it. I haven't tested it, but the code below should solve your issue. For brevity, it uses Apache Commons IO to copy the streams, but you could just as easily do it natively.
import java.io.ByteArrayOutputStream
import org.apache.commons.io.IOUtils

val data = {
  val os = new ByteArrayOutputStream()
  IOUtils.copy(fileupload.fileStream, os)
  os.toByteArray
}
val attach = CSVFile(data)
BTW, you say you are uploading a Word (DOCX) file and expecting it to automatically be CSV when the extension is changed? You will just get a DOCX file with a csv extension unless you actually do some conversion.
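If the goal is for the attachment to keep the uploaded file's own name and content type instead of the CSVFile defaults, here is a small sketch (assuming fileupload really is a net.liftweb.http.FileParamHolder, which exposes fileName and mimeType):

```scala
// reuse the upload's own metadata instead of the CSVFile defaults,
// so a DOCX upload is attached as a DOCX rather than relabelled as CSV
val attach = CSVFile(
  bytes = data,                    // the bytes copied from fileupload.fileStream above
  filename = fileupload.fileName,  // original name of the uploaded file
  mime = fileupload.mimeType       // content type reported by the browser
)
```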