error: value saveAsTextFile is not a member of Unit - scala

I am relatively new to Spark and Scala programming.
I was trying to run a simple PageRank algorithm in Scala, but I hit this error when compiling:
error: value saveAsTextFile is not a member of Unit
Here is the code I am using:
val output = ranks.collect()
output.foreach(tup => println(tup._1 + " has page rank: " + tup._2)).saveAsTextFile("/user/ssimhadr/ScalaWordCount_Output")

foreach is, as Ryan pointed out, solely for side effects. It returns Unit, not the collection itself, so nothing can be chained onto it.
What you are actually doing is the following:
val output = ranks.collect()
val realoutput: Unit = output.foreach(tup => println(tup._1 + " has page rank: " + tup._2))
realoutput.saveAsTextFile(...)
saveAsTextFile is not a member of Unit, which is why you get the error message.
You should be doing:
ranks.foreach(tup => println(tup._1 + " has page rank: " + tup._2))
ranks.saveAsTextFile(...)
or
ranks.saveAsTextFile(...)
ranks.collect().foreach(tup => println(tup._1 + " has page rank: " + tup._2))
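To see the type problem in isolation, here is a minimal sketch in plain Scala with no Spark involved. The sample data and the names ForeachDemo and formatRank are made up for illustration; the point is only that foreach yields Unit, while map yields a new collection you can keep working with.

```scala
object ForeachDemo {
  // Hypothetical stand-in for the (page, rank) pairs returned by ranks.collect()
  val output: Array[(String, Double)] = Array(("a", 0.5), ("b", 1.5))

  // Builds the line that the question prints for each pair
  def formatRank(tup: (String, Double)): String =
    tup._1 + " has page rank: " + tup._2

  def main(args: Array[String]): Unit = {
    // foreach runs the side effect and returns Unit, so there is nothing to chain on
    val printed: Unit = output.foreach(tup => println(formatRank(tup)))

    // map returns a new Array[String], which can be processed further
    val lines: Array[String] = output.map(formatRank)
    println(lines.length)
  }
}
```

In the Spark case the same reasoning applies: call saveAsTextFile on the RDD itself, not on the result of foreach.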

Related

how to save a value in a file using Scala

I am trying to save a value in a file, but keep getting an error
I have tried
.saveAsTextFile("/home/amel/timer")
REDUCER Function
val startReduce = System.currentTimeMillis()
val y = sc.textFile("/home/amel/10MB").filter(!_.contains("NULL")).filter(!_.contains("Null"))
val er = y.map(row => {
val cols = row.split(",")
(cols(1).split("-")(0) + "," + cols(2) + "," + cols(3), 1)
}).reduceByKey(_ + _).map(x => x._1 + "," + x._2)
er.collect.foreach(println)
val endReduce = System.currentTimeMillis()
val durationReduce = ((endReduce-startReduce)/1000).saveAsTextFile("home/amel/timer/")
The error I'm receiving is on this line:
val durationReduce = ((endReduce-startReduce)/1000).saveAsTextFile("home/amel/timer/")
It says: saveAsTextFile is not a member of Long
The output I want is a number.
Long does not have a method named saveAsTextFile. If you want to write a Long value to a file, there are many ways; a simple one is to use Java's PrintWriter:
import java.io.PrintWriter
val duration = (endReduce - startReduce) / 1000
new PrintWriter("/home/amel/timer/time") { write(duration.toString); close() }
If you still want to use Spark's RDD saveAsTextFile, you can use:
sc.parallelize(Seq(duration)).saveAsTextFile("path")
But it does not make sense to do this just to write a single value.
saveAsTextFile is a method on the class org.apache.spark.rdd.RDD.
The expression ((endReduce-startReduce)/1000) is of type Long, so it does not have this method. Hence the error you are seeing: "saveAsTextFile is not a member of Long".
This answer is applicable here: https://stackoverflow.com/a/32105659/8261
Basically the situation is that you have an Int and you want to write it to a file. Your first thought is to create a distributed collection across a cluster of machines, that only contains this Int and let those machines write the Int to a set of files in a distributed way.
I'd argue this is not the right approach. Do not use Spark for saving an Int into a file. Instead you can use a PrintWriter:
val out = new java.io.PrintWriter("filename.txt")
out.println(finalvalue)
out.close()
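A variant of the same idea, as a sketch only: java.nio.file.Files.write opens, writes, and closes the file in one call, so there is no writer left open if something fails. The names WriteDuration and writeSeconds are hypothetical, and the path is whatever you pass in.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

object WriteDuration {
  // Converts a millisecond interval to whole seconds and writes it to `path`.
  // Returns the value written so the caller can also log it.
  def writeSeconds(path: Path, startMillis: Long, endMillis: Long): Long = {
    val seconds = (endMillis - startMillis) / 1000
    Files.write(path, seconds.toString.getBytes(StandardCharsets.UTF_8))
    seconds
  }

  def main(args: Array[String]): Unit = {
    val start = System.currentTimeMillis()
    // ... the work being timed would go here ...
    val end = System.currentTimeMillis()
    // "timer.txt" is a placeholder file name in the working directory
    writeSeconds(Paths.get("timer.txt"), start, end)
  }
}
```

This runs on the driver only, which is all that is needed for a single number.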

support and lift for fp-growth rules in mllib spark/scala

I would like to extract support and lift for the association rules generated with FP-growth. Having found the rules with the code below, I manually go through the transactions and calculate support and lift. I wonder if there is a more elegant way to extract this information. Thanks!
val fpg = new FPGrowth()
  .setMinSupport(0.2)
  .setNumPartitions(10)
val model = fpg.run(transactions)
model.freqItemsets.collect().foreach { itemset =>
  println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)
}
val minConfidence = 0.8
model.generateAssociationRules(minConfidence).collect().foreach { rule =>
  println(
    rule.antecedent.mkString("[", ",", "]")
      + " => " + rule.consequent.mkString("[", ",", "]")
      + ", " + rule.confidence)
}
Hmm, not elegant, but this is what I do:
val freqs = fpgrowth_model(transactions, min_supp=supp)
val supps = freqs.withColumn("support", $"freq" / total_transactions)
val rules = get_rules(transactions, min_supp=supp, min_confidence=conf)
val cross_df = supps.join(rules, $"items" === $"consequent")
.withColumn("lift",$"confidence" / $"support")
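The arithmetic behind that join can be sketched in plain Scala. This assumes the usual definitions, support(X) = freq(X) / N and lift(A => B) = confidence(A => B) / support(B), which is also why the join above matches items against the consequent. RuleMetrics and the numbers in main are made up for illustration and are not part of MLlib.

```scala
object RuleMetrics {
  // support(X) = number of transactions containing X / total number of transactions
  def support(freq: Long, totalTransactions: Long): Double =
    freq.toDouble / totalTransactions

  // lift(A => B) = confidence(A => B) / support(B)
  def lift(confidence: Double, consequentSupport: Double): Double =
    confidence / consequentSupport

  def main(args: Array[String]): Unit = {
    val total = 10L          // made-up transaction count
    val consequentFreq = 5L  // made-up frequency of the consequent itemset
    val confidence = 0.8     // made-up rule confidence
    val s = support(consequentFreq, total)
    println(s"support = $s, lift = ${lift(confidence, s)}")
  }
}
```

With model.freqItemsets you already have freq for every frequent itemset, so the only extra input is the total transaction count.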

SignatureDoesNotMatch Aws CloudSearch scala

I keep getting:
"#SignatureDoesNotMatch","error":{"message":"[Deprecated: Use the
outer message field] The request signature we calculated does not
match the signature you provided. Check your AWS Secret Access Key and
signing method. Consult the service documentation for details.
when trying to make a GET request to CloudSearch. I have verified that my canonical string and string-to-sign now match the ones sent back in the error message every time, but I keep getting the error. I'm assuming the signature itself isn't being computed correctly, but it is hard to nail down.
def getHash(key:Array[Byte]): String = {
try
{
val md = MessageDigest.getInstance("SHA-256").digest(key)
md.map("%02x".format(_)).mkString.toLowerCase()
}
catch
{
case e: Exception => ""
}
}
def HmacSHA256(data:String, key:Array[Byte]): Array[Byte] = {
val algorithm="HmacSHA256";
val mac = Mac.getInstance(algorithm);
mac.init(new SecretKeySpec(key, algorithm));
mac.doFinal(data.getBytes("UTF8"));
}
...
val algorithm = "AWS4-HMAC-SHA256"
val credential_scope = date + "/us-west-1/cloudsearch/aws4_request"
val string_to_sign = algorithm + "\n" + dateTime + "\n" + credential_scope + "\n" + getHash(canonical_request)
val kSecret = ("AWS4" + config.getString("cloud.secret")).getBytes("utf-8")
val kDate = HmacSHA256(date.toString, kSecret)
val kRegion = HmacSHA256("us-west-1",kDate)
val kService = HmacSHA256("cloudsearch",kRegion)
val kSigning = HmacSHA256("aws4_request",kService)
val signing_key = kSigning
val signature = getHash(HmacSHA256(string_to_sign, kSigning))
val authorization_header = algorithm + " " + "Credential=" + config.getString("cloud.key") + "/" + credential_scope + ", " + "SignedHeaders=" + signed_headers + ", " + "Signature=" + signature
val complexHolder = holder.withHeaders(("x-amz-date",dateTime.toString))
.withHeaders(("Authorization",authorization_header))
.withRequestTimeout(5000)
.get()
val response = Await.result(complexHolder, 10 second)
I just released a helper library to sign your HTTP requests to AWS: https://github.com/ticofab/aws-request-signer . Hope it helps!
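Beyond the library, one detail in the question's code may be worth checking. As far as I understand Signature Version 4, the final signature is the hex encoding of HMAC-SHA256(signing key, string to sign) with no extra SHA-256 pass, whereas the snippet above runs the HMAC output through getHash, which hashes it again. A minimal sketch under that assumption follows; SigV4Sketch, toHex, signingKey, and signature are names made up here, and all inputs are placeholders.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

object SigV4Sketch {
  // HMAC-SHA256 of `data` under `key`, same shape as the question's HmacSHA256 helper
  def hmacSha256(data: String, key: Array[Byte]): Array[Byte] = {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(new SecretKeySpec(key, "HmacSHA256"))
    mac.doFinal(data.getBytes(UTF_8))
  }

  // Lowercase hex encoding only; no hashing happens here
  def toHex(bytes: Array[Byte]): String = bytes.map("%02x".format(_)).mkString

  // Key derivation chain: secret -> date -> region -> service -> "aws4_request"
  def signingKey(secret: String, date: String, region: String, service: String): Array[Byte] = {
    val kDate = hmacSha256(date, ("AWS4" + secret).getBytes(UTF_8))
    val kRegion = hmacSha256(region, kDate)
    val kService = hmacSha256(service, kRegion)
    hmacSha256("aws4_request", kService)
  }

  // Final signature: hex of the HMAC, without applying SHA-256 to the result
  def signature(stringToSign: String, key: Array[Byte]): String =
    toHex(hmacSha256(stringToSign, key))
}
```

In the question's terms this would mean replacing getHash(HmacSHA256(string_to_sign, kSigning)) with a plain hex encoding of HmacSHA256(string_to_sign, kSigning). It is also worth confirming that date is in yyyyMMdd form and matches the date part of dateTime.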

Can't print items of the observables after grouped

I can't understand why the following RxScala code is not working as expected:
import rx.lang.scala.Observable
object MyTest extends App {
  case class ProjectEvent(projectName: String, description: String)
  val projectEvents: Observable[ProjectEvent] = Observable.just(
    ProjectEvent("aaa", "d1"),
    ProjectEvent("bbb", "d2"),
    ProjectEvent("aaa", "d3")
  )
  lazy val grouped = projectEvents.groupBy(_.projectName).map { case (projectName, eventsOfThisProject) =>
    println("projectName: " + projectName)
    eventsOfThisProject.foreach(x => "######### event in project " + projectName + ": " + x)
    (projectName, eventsOfThisProject)
  }
  grouped.foreach(println)
}
I grouped the projectEvents by projectName and want to print the items of each project. But when I run this code, it only prints:
projectName: aaa
(aaa,rx.lang.scala.JavaConversions$$anon$2@49de17f4)
projectName: bbb
(bbb,rx.lang.scala.JavaConversions$$anon$2@52f6438d)
No "######### event in project" line is printed.
I can't understand why. Is there anything I missed?
You forgot to use println in this line:
eventsOfThisProject.foreach(x => "######### event in project " + projectName + ": " + x)
The function in foreach just converts x to a String but doesn't print it.

"foreach is not a member of object" when I'm trying to iterate over enumeration [duplicate]

I'm trying to learn some Scala by reading Programming Scala, by Dean Wampler.
I'm trying to replicate a code snippet about Enumeration:
object Breed extends Enumeration {
val doberman = Value("Doberman Pinscher")
val yorkie = Value("Yorkshire Terrier")
val scottie = Value("Scottish Terrier")
val dane = Value("Great Dane")
val portie = Value("Portuguese Water Dog")
}
for (breed <- Breed) println(breed.id + "\t" + breed)
But, in the last line of code, I got this error:
value foreach is not a member of object Breed
Am I missing something? How can I solve this?
You need to use .values:
for (breed <- Breed.values) println(breed.id + "\t" + breed)
And why not make it a bit more Scala-y:
Breed.values.foreach(breed => println(breed.id + "\t" + breed))
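For completeness, a small self-contained sketch of the same fix. The Enumeration object itself has no foreach, but its values member is a set ordered by id that supports the usual collection operations. Breed is shortened to two entries here, and BreedDemo and describe are hypothetical names.

```scala
object Breed extends Enumeration {
  val doberman = Value("Doberman Pinscher")
  val yorkie = Value("Yorkshire Terrier")
}

object BreedDemo {
  // Breed.values is a collection, so it can be mapped as well as iterated
  def describe: List[String] =
    Breed.values.toList.map(breed => breed.id + "\t" + breed)

  def main(args: Array[String]): Unit =
    describe.foreach(println)
}
```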