foreach loop error reading from file - scala

I am trying to read lines from a text file into a foreach loop but I keep getting this error: value getLines is not a member of org.xml.sax.InputSource. Can someone explain what this error means, and how I can resolve it?
import scala.xml._
import collection.mutable.HashMap
val noDupFile="nodup_steam_out.txt"
Source.fromFile(noDupFile).getLines().par.foreach((res: String) => {
//....
})

import scala.xml._
import collection.mutable.HashMap
val noDupFile="nodup_steam_out.txt"
io.Source.fromFile(noDupFile).getLines().foreach(res => {
//....
})
Without the right import you are referring to org.xml.sax.InputSource: with import scala.xml._ in scope, Source is scala.xml.Source, whose fromFile returns an org.xml.sax.InputSource, and that type has no getLines method.
io.Source.fromFile returns an io.BufferedSource, and its getLines() iterator does not have the par method (par is defined on Parallelizable collections).
You can write
import scala.xml._
import collection.mutable.HashMap
val noDupFile="nodup_steam_out.txt"
io.Source.fromFile(noDupFile).getLines().toStream.par.foreach(res => {
//....
})
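If the whole file comfortably fits in memory, another option (just a sketch, reusing the same file name as above) is to read the lines into a strict collection before calling par, since strict collections such as Vector do provide it:
import scala.io.Source
val noDupFile = "nodup_steam_out.txt"
// toVector forces all lines into memory; .par then gives a parallel collection
Source.fromFile(noDupFile).getLines().toVector.par.foreach(res => {
//....
})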

You seem to be trying to use the scala.xml library to iterate over your txt file. If you import scala.io._ instead, you should be able to do something like:
import scala.io._
val noDupFile="nodup_steam_out.txt"
Source.fromFile(noDupFile).getLines().foreach{ res =>
//....
}

Related

Understanding log.info()

Can you please explain what this log.info is doing here? Where can I see this log output for debugging purposes? In my existing project it is used in many places inside for comprehensions. Thanks!
import java.io.{BufferedInputStream, FileInputStream}
import cats.instances.list._
import cats.syntax.contravariantSemigroupal._
import cats.syntax.flatMap._
import cats.syntax.parallel._
import com.google.api.client.util.{DateTime => GDateTime}
import com.google.api.services.storage.model.StorageObject
import com.google.cloud.storage.Storage
import com.leighperry.log4zio.Log.SafeLog
import zio.interop.catz._
import zio.{IO, Managed, ZIO}
def fileContentsToGcs(
cfg: AppConfig,
log: SafeLog[String],
filepath: String
): List[String] = {
val result: IO[FilesError, List[String]] =
for {
_ <- log.info(s"GCS file ${cfg.gcsBucketProject} ${cfg.gcsInputBucket} $filepath")
// list of csv files
contents <- expandedFileContents(log, cfg, filepath)
_ <- log.info(s"GCS temp files $contents")
} yield contents
unsafeRun(result)
}
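The pattern worth spelling out here is that log.info does not print anything when the line is evaluated; it returns an effect value, and the for comprehension sequences that effect between the other steps, with nothing actually running until unsafeRun(result). A hypothetical, heavily simplified sketch of that idea (this is not the log4zio implementation, just an illustration):
// Minimal effect type: a description of a computation, run only on demand
final case class MiniIO[A](run: () => A) {
  def flatMap[B](f: A => MiniIO[B]): MiniIO[B] = MiniIO(() => f(run()).run())
  def map[B](f: A => B): MiniIO[B] = MiniIO(() => f(run()))
}
// A logger whose info returns an effect instead of printing immediately
def info(msg: String): MiniIO[Unit] = MiniIO(() => println(s"INFO: $msg"))
val program: MiniIO[List[String]] =
  for {
    _        <- info("about to compute contents")     // log step, sequenced
    contents <- MiniIO(() => List("a.csv", "b.csv"))  // the real work
    _        <- info(s"contents: $contents")          // log the result
  } yield contents
// Nothing is printed until the effect is run, analogous to unsafeRun(result)
program.run()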

Implement case class inside a class

I am running the below code in a Qubole Notebook and it runs successfully.
case class cls_Sch(Id:String, Name:String)
class myClass {
implicit val sparkSession = org.apache.spark.sql.SparkSession.builder().enableHiveSupport().getOrCreate()
sparkSession.sql("set spark.sql.crossJoin.enabled = true")
sparkSession.sql("set spark.sql.caseSensitive=false")
import sparkSession.sqlContext.implicits._
import org.apache.hadoop.fs.{FileSystem, Path, LocatedFileStatus, RemoteIterator, FileUtil}
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.DataFrame
def my_Methd() {
var my_df = Seq(("1","Sarath"),("2","Amal")).toDF("Id","Name")
my_df.as[cls_Sch].take(my_df.count.toInt).foreach(t => {
println(s"${t.Name}")
})
}
}
val obj_myClass = new myClass()
obj_myClass.my_Methd()
However, when I run the same code in Qubole's Analyze, I get the below error.
When I take out the below code, it runs fine in Qubole's Analyze.
my_df.as[cls_Sch].take(my_df.count.toInt).foreach(t => {
println(s"${t.Name}")
})
I believe I have to change how the case class is used somewhere.
I am using Spark 2.3.
Can someone please let me know how to solve this issue?
Please let me know if you need any other details.
All you have to do is have the import sparkSession.implicits._ inside the my_Methd() function (it has to come from the SparkSession instance that is in scope there):
def my_Methd() {
import sparkSession.implicits._
var my_df = Seq(("1","Sarath"),("2","Amal")).toDF("Id","Name")
my_df.as[cls_Sch].take(my_df.count.toInt).foreach(t => {
println(s"${t.Name}")
})
}
For some reason the kernel has problems when working with Datasets. I made two tests that worked with Apache Toree.
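Separately from those tests, a small sketch (not part of the original answer): if the goal is simply to print every row, Dataset.collect() avoids the count-then-take round trip, assuming the DataFrame is small enough to fit on the driver:
def my_Methd(): Unit = {
  import sparkSession.implicits._
  val my_df = Seq(("1","Sarath"),("2","Amal")).toDF("Id","Name")
  // collect() brings every row to the driver, so this assumes a small DataFrame
  my_df.as[cls_Sch].collect().foreach(t => println(s"${t.Name}"))
}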

How to produce a Traversable with cats-effect's IO given an async that will call multiple times

What I'm really trying to do is monitor multiple files, and when any of them is modified I'd like to update some state and produce a side effect using this state. I imagine what I want is a scan over a Traversable that produces a Traversable[IO[_]], but I don't see the path there.
As a minimal attempt to produce this I wrote:
package example
import better.files.{File, FileMonitor}
import cats.implicits._
import com.monovore.decline._
import cats.effect.IO
import java.nio.file.{Files, Path}
import scala.concurrent.ExecutionContext.Implicits.global
object Hello extends CommandApp(
name = "cats-effects-playground",
header = "welcome",
main = {
val filesOpts = Opts.options[Path]("input", help = "input files")
filesOpts.map { files =>
IO.async[File] { cb =>
val watchers = files.map { path =>
new FileMonitor(path, recursive = false) {
override def onModify(file: File, count: Int) = cb(Right(file))
}
}
watchers.toList.foreach(_.start)
}
.flatMap(f => IO { println(f) })
.unsafeRunSync
}
}
)
but this has two major flaws. One, it creates a thread for each file I'm watching, which is a little heavy. More importantly, the program finishes as soon as a single file is modified, even though onModify would be called more times if the program stayed running.
I'm not married to using better-files; it just seemed like the path of least resistance. But I do require using Cats IO.
This solution doesn't solve the issue of creating a bunch of threads, and it doesn't strictly produce a Traversable, but it solves the underlying use case. I'm very open to this being critiqued and a better solution provided.
package example
import better.files.{File, FileMonitor}
import cats.implicits._
import com.monovore.decline._
import cats.effect.IO
import java.nio.file.{Files, Path}
import java.util.concurrent.LinkedBlockingQueue
import scala.concurrent.ExecutionContext.Implicits.global
object Hello extends CommandApp(
name = "cats-effects-playground",
header = "welcome",
main = {
val filesOpts = Opts.options[Path]("input", help = "input files")
filesOpts.map { files =>
val bq: LinkedBlockingQueue[IO[File]] = new LinkedBlockingQueue()
val watchers = files.map { path =>
new FileMonitor(path, recursive = false) {
override def onModify(file: File, count: Int) = bq.put(IO(file))
}
}
def ioLoop(): IO[Unit] = bq.take()
.flatMap(f => IO(println(f)))
.flatMap(_ => ioLoop())
watchers.toList.foreach(_.start)
ioLoop.unsafeRunSync
}
}
)
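A non-blocking variation, offered only as a sketch (untested, and it assumes adding fs2 1.x to the project): push modified files into an fs2 Queue and consume them as a stream, instead of parking a thread on LinkedBlockingQueue.take. The FileMonitor usage simply mirrors the question's code.
import better.files.{File, FileMonitor}
import cats.effect.{ContextShift, IO}
import fs2.concurrent.Queue
import java.nio.file.Path
import scala.concurrent.ExecutionContext
import scala.concurrent.ExecutionContext.Implicits.global

implicit val cs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)

def watch(files: List[Path]): IO[Unit] =
  for {
    queue <- Queue.unbounded[IO, File]             // concurrent queue living in IO
    _ <- IO {
      val watchers = files.map { path =>
        new FileMonitor(path, recursive = false) {
          override def onModify(file: File, count: Int) =
            queue.enqueue1(file).unsafeRunSync()   // bridge the callback into the queue
        }
      }
      watchers.foreach(_.start())
    }
    _ <- queue.dequeue                             // Stream[IO, File] of modification events
          .evalMap(f => IO(println(f)))
          .compile
          .drain                                   // runs forever, one event at a time
  } yield ()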

error getting files from gridfs

I'm trying to display images and files from gridfs. So I started with the save function and it's working well:
import javax.inject.Inject
import org.joda.time.DateTime
import scala.concurrent.Future
import play.api.Logger
import play.api.Play.current
import play.api.i18n.{ I18nSupport, MessagesApi }
import play.api.mvc.{ Action, Controller, Request }
import play.api.libs.concurrent.Execution.Implicits.defaultContext
import play.api.libs.json.{ Json, JsObject, JsString }
import reactivemongo.api.gridfs.{ GridFS, ReadFile }
import play.modules.reactivemongo.{
MongoController, ReactiveMongoApi, ReactiveMongoComponents
}
import play.modules.reactivemongo.json._, ImplicitBSONHandlers._
import play.modules.reactivemongo.json.collection._
class griidfs #Inject() (
val messagesApi: MessagesApi,
val reactiveMongoApi: ReactiveMongoApi)
extends Controller with MongoController with ReactiveMongoComponents {
import java.util.UUID
import MongoController.readFileReads
type JSONReadFile = ReadFile[JSONSerializationPack.type, JsString]
// get the collection 'articles'
// a GridFS store named 'attachments'
//val gridFS = GridFS(db, "attachments")
private val gridFS = reactiveMongoApi.gridFS
// let's build an index on our gridfs chunks collection if none
gridFS.ensureIndex().onComplete {
case index =>
Logger.info(s"Checked index, result is $index")
}
def saveAttachment =
Action.async(gridFSBodyParser(gridFS)) { request =>
// here is the future file!
val futureFile = request.body.files.head.ref
futureFile.onFailure {
case err => err.printStackTrace()
}
// when the upload is complete, we add the article id to the file entry (in order to find the attachments of the article)
val futureUpdate = for {
file <- { println("_0"); futureFile }
// here, the file is completely uploaded, so it is time to update the article
updateResult <- {
println("_1"); futureFile
}
} yield updateResult
futureUpdate.map { _ =>
Redirect(routes.Application.index())
}.recover {
case e => InternalServerError(e.getMessage())
}
}
but when I try to get files from gridfs to display them in my browser with this code:
import reactivemongo.api.gridfs.Implicits.DefaultReadFileReader
def getAttachment(id: String) = Action.async { request =>
// find the matching attachment, if any, and streams it to the client
val file = gridFS.find[JsObject, JSONReadFile](Json.obj("_id" -> id))
request.getQueryString("inline") match {
case Some("true") =>
serve[JsString, JSONReadFile](gridFS)(file, CONTENT_DISPOSITION_INLINE)
case _ => serve[JsString, JSONReadFile](gridFS)(file)
}
}
I get this error:
type arguments [play.api.libs.json.JsObject,griidfs.this.JSONReadFile]
do not conform to method find's type parameter bounds
[S,T <:
reactivemongo.api.gridfs.ReadFile[reactivemongo.play.json.JSONSerializationPack.type, _]]
in this line:
val file = gridFS.find[JsObject, JSONReadFile](Json.obj("_id" -> id))
Any help please?
I have been battling the same issue most of the day and finally have it up and running. I think it has something to do with the imports; I will check in my work and then remove them one by one to find which one causes the issue. In the meantime, here is a list of my imports. Hope this helps.
import javax.inject.Inject
import forms._
import models._
import org.joda.time.DateTime
import play.modules.reactivemongo.json.JSONSerializationPack
import services._
import play.api.data._
import play.api.data.Forms._
import scala.async.Async._
import play.api.i18n.{ I18nSupport, MessagesApi }
import play.api.mvc.{ Action, Controller, Request }
import play.api.libs.json.{ Json, JsObject, JsString }
import reactivemongo.api.gridfs.{ GridFS, ReadFile }
import play.modules.reactivemongo.{
MongoController, ReactiveMongoApi, ReactiveMongoComponents
}
import play.api.libs.concurrent.Execution.Implicits.defaultContext
import scala.concurrent.Future
import play.modules.reactivemongo.json._
import com.mohiva.play.silhouette.api.{ Environment, LogoutEvent, Silhouette }
import com.mohiva.play.silhouette.impl.authenticators.CookieAuthenticator
import com.mohiva.play.silhouette.impl.providers.SocialProviderRegistry
import MongoController._
Edit:
I have narrowed it down to this one: import MongoController._
Add it and you should be OK :)
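For reference, here is a sketch of the question's getAttachment with that import applied (untested; it assumes the controller, the gridFS value and the JSONReadFile alias from the question, with id arriving as an action parameter):
import MongoController._  // per the answer above, this wildcard import is what makes the call compile
import reactivemongo.api.gridfs.Implicits.DefaultReadFileReader

def getAttachment(id: String) = Action.async { request =>
  // find the matching attachment, if any, and stream it to the client
  val file = gridFS.find[JsObject, JSONReadFile](Json.obj("_id" -> id))
  request.getQueryString("inline") match {
    case Some("true") =>
      serve[JsString, JSONReadFile](gridFS)(file, CONTENT_DISPOSITION_INLINE)
    case _ => serve[JsString, JSONReadFile](gridFS)(file)
  }
}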

Scala Marshalling/Unmarshalling

I'm trying to write a custom Marshaller for a very simple object, but it seems the compiler is not able to find it.
Following the spray template, I've defined the route as follows:
package com.example
import akka.actor.Actor
import spray.routing._
import spray.http._
import MediaTypes._
import com.example.dto.RecipeEntry
import com.example.dto.RecipeEntryJson._
trait RecipeManager extends HttpService {
val myRoute =
path("recipe") {
post {
decompressRequest() {
entity(as[RecipeEntry]) { recipe =>
complete(s"picture is $recipe.image")
}
}
}
}
}
and I've tried to define the Marshaller[RecipeEntry] as such:
package com.example.dto
import spray.json.DefaultJsonProtocol
import spray.httpx.SprayJsonSupport._
import spray.httpx.unmarshalling._
import spray.httpx.marshalling._
import spray.http._
case class RecipeEntry(originSite: String, image: String)
object RecipeEntryJson extends DefaultJsonProtocol {
implicit val jsonMarshaller: Marshaller[RecipeEntry] = jsonFormat2(RecipeEntry.apply)
}
but I keep getting the following error:
RecipeManager.scala:18: could not find implicit value for parameter um: spray.httpx.unmarshalling.FromRequestUnmarshaller[com.example.dto.RecipeEntry]
[error] entity(as[RecipeEntry]) { recipe =>
In fact, I'm running into the same problem as in this link; however, adding import com.example.dto.RecipeEntryJson._ did not help.
I must be missing some small detail (probably quite a few, as I'm very new to Scala and spray), but everything I've tried has been to no avail. Any help is very much appreciated.
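One commonly suggested shape, offered here only as a sketch and not a confirmed fix: annotate the jsonFormat2 result as a spray-json RootJsonFormat rather than a Marshaller, and let SprayJsonSupport derive the (un)marshallers that entity(as[RecipeEntry]) needs (the val name recipeEntryFormat below is arbitrary):
package com.example.dto

import spray.json.{DefaultJsonProtocol, RootJsonFormat}

case class RecipeEntry(originSite: String, image: String)

object RecipeEntryJson extends DefaultJsonProtocol {
  // a plain JSON format; SprayJsonSupport can turn it into both the
  // marshaller and the unmarshaller used by entity(as[RecipeEntry])
  implicit val recipeEntryFormat: RootJsonFormat[RecipeEntry] = jsonFormat2(RecipeEntry)
}
In the route trait, keep import spray.httpx.SprayJsonSupport._ together with import com.example.dto.RecipeEntryJson._ so the derived FromRequestUnmarshaller can be found.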