error: object Stemmer is not a member of package org.apache.spark.mllib.feature - scala

Importing org.apache.spark.mllib.feature.Stemmer in the Spark shell using Scala returns the following error:
:47: error: object Stemmer is not a member of package org.apache.spark.mllib.feature
import org.apache.spark.mllib.feature.Stemmer
I am trying to apply stemming to my words using:
val stemmer_product_title = new Stemmer()
  .setInputCol("ngrams")
  .setOutputCol("stemmed")
  .setLanguage("English")
Here ngrams is a column of 1-gram transformed text. Could anyone help me with this, please? I would be grateful.

Stemmer is not part of Spark's MLlib; it comes from the third-party spark-stemming package. Add the following dependency to your pom.xml:
<dependency>
    <groupId>com.github.master</groupId>
    <artifactId>spark-stemming_2.10</artifactId>
    <version>0.2.0</version>
</dependency>
or the following to your build.sbt:
libraryDependencies += "com.github.master" %% "spark-stemming" % "0.2.1"
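Since the question uses the Spark shell rather than a built project, the package can also be supplied at launch time, assuming your Spark version supports the --packages flag (a hedged sketch; adjust the Scala suffix and version to match your installation):

```shell
# Hypothetical invocation: resolves the third-party spark-stemming package
# from Maven Central and puts it on the shell's classpath.
spark-shell --packages com.github.master:spark-stemming_2.10:0.2.1
```

After that, the import in the question should succeed inside the shell.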

Related

Set file-level option to scalapb project

I'm using ScalaPB (version 0.11.1) and plugin sbt-protoc (version 1.0.3) to try to compile an old project with ProtocolBuffers in Scala 2.12. Reading the documentation, I want to set the file property preserve_unknown_fields to false. But my question is, where? Where do I need to set this flag? On the .proto file?
I've also tried to include the flag as a package-scoped option by creating a package.proto file next to my other .proto file, with the following content (as it is specified here):
import "scalapb/scalapb.proto";
package eur.astrata.eu.bigdata.tpms.protobuf;
option (scalapb.options) = {
  preserve_unknown_fields: false
};
But when trying to compile, I get the following error:
[libprotobuf WARNING T:\src\github\protobuf\src\google\protobuf\compiler\parser.cc:648] No syntax specified for the proto file: package.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)
scalapb/scalapb.proto: File not found.
package.proto:1:1: Import "scalapb/scalapb.proto" was not found or had errors.
I've also tried with syntax = "proto3"; at the beginning but it doesn't work.
Any help would be greatly appreciated.
From the docs:
If you are using sbt-protoc and importing protos like
scalapb/scalapb.proto, or common protocol buffers like
google/protobuf/wrappers.proto:
Add the following to your build.sbt:
libraryDependencies += "com.thesamet.scalapb" %% "scalapb-runtime" % scalapb.compiler.Version.scalapbVersion % "protobuf"
This tells sbt-protoc to extract protos from this jar (and all its dependencies,
which includes Google's common protos), and make them available in the
include path that is passed to protoc.
It is important to add that setting preserve_unknown_fields to false turns off a protobuf feature that prevents data loss when different parts of a distributed system are not all running the same version of the schema.
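Putting the pieces together, here is a hedged sketch of a package.proto that should compile once the scalapb-runtime "protobuf" dependency above is in place (the package name is taken from the question; the syntax declaration silences the parser warning):

```protobuf
// package.proto — package-scoped ScalaPB options.
// Requires scalapb/scalapb.proto on protoc's include path, which the
// scalapb-runtime "protobuf" dependency makes available.
syntax = "proto2";

import "scalapb/scalapb.proto";

package eur.astrata.eu.bigdata.tpms.protobuf;

option (scalapb.options) = {
  preserve_unknown_fields: false
};
```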

How to access library dependencies at build-time in SBT/Scala build?

For example, suppose I have a file project/CodeGeneration.scala that generates "managed" source-code, and suppose that object (CodeGeneration) needs to leverage a third-party library -- say jsoup...
import org.jsoup._

object CodeGeneration {
  def generateCode: String = ??? // generate code using jsoup...
}
Simply adding a line for jsoup to your libraryDependencies in build.sbt doesn't do the trick; it leads to a compilation error complaining about the missing jsoup object/namespace.
So, (how) can one access this dependency from "meta" code -- code that generates other code?
It seems the solution is to leverage sbt's "recursive" nature, and put an additional build.sbt file in the project directory. So, for example, project/build.sbt might look like this:
libraryDependencies += "org.jsoup" % "jsoup" % "1.11.2"
There's more detail in sbt's official documentation.
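Once project/build.sbt is in place, the generator can be wired into the main build. This is a hedged sketch using sbt's standard sourceGenerators key (the output file name Generated.scala is made up; the slash syntax requires sbt 1.1+):

```scala
// build.sbt (main build) — call the meta-build's CodeGeneration object to
// produce a managed source file before compilation. CodeGeneration is
// visible here because classes under project/ are on build.sbt's classpath.
Compile / sourceGenerators += Def.task {
  val out = (Compile / sourceManaged).value / "Generated.scala"
  IO.write(out, CodeGeneration.generateCode)
  Seq(out)
}.taskValue
```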

Scala SBT and JNI library

I am writing a simple app in Scala that uses a leveldb database through the leveldbjni library. My build.sbt file looks like this:
name := "Whatever"

version := "1.0"

scalaVersion := "2.10.2"

libraryDependencies ++= Seq(
  "org.iq80.leveldb" % "leveldb-api" % "0.6",
  "org.fusesource.leveldbjni" % "leveldbjni-all" % "1.7"
)
An object is then responsible for creating a database. Unfortunately, when I run the program I get a java.lang.UnsatisfiedLinkError, raised by the hawtjni library that leveldbjni uses under the hood.
The error can also be triggered easily from the Scala console:
scala> import java.io.File
scala> import org.iq80.leveldb._
scala> import org.fusesource.leveldbjni.JniDBFactory._
scala> factory.open(new File("test"), new Options().createIfMissing(true))
java.lang.UnsatisfiedLinkError: org.fusesource.leveldbjni.internal.NativeOptions.init()V
at org.fusesource.leveldbjni.internal.NativeOptions.init(Native Method)
at org.fusesource.leveldbjni.internal.NativeOptions.<clinit>(NativeOptions.java:54)
at org.fusesource.leveldbjni.JniDBFactory$OptionsResourceHolder.init(JniDBFactory.java:98)
at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:167)
at .<init>(<console>:15)
...
scala> System getProperty "java.io.tmpdir"
res2: String = /var/folders/1l/wj6yg_wd15sg_gcql001wchm0000gn/T/
I can't understand what is going on, since the library is correctly extracted from the jar file but for some reason it is not loaded.
$ file /var/folders/1l/wj6yg_wd15sg_gcql001wchm0000gn/T/lib*
/var/folders/1l/wj6yg_wd15sg_gcql001wchm0000gn/T/libleveldbjni-1.7.jnilib: Mach-O universal binary with 2 architectures
/var/folders/1l/wj6yg_wd15sg_gcql001wchm0000gn/T/libleveldbjni-1.7.jnilib (for architecture x86_64): Mach-O 64-bit dynamically linked shared library x86_64
/var/folders/1l/wj6yg_wd15sg_gcql001wchm0000gn/T/libleveldbjni-1.7.jnilib (for architecture i386): Mach-O dynamically linked shared library i386
I think the problem is probably related to the classloader that sbt employs, but I am not sure since I am relatively new to Scala.
UPDATE
I still haven't found the culprit. The library is actually found and correctly loaded, since I can execute the following commands:
scala> import org.fusesource.leveldbjni.internal.NativeDB
scala> NativeDB.LIBRARY.load()
The error is somehow due to the init() function, which according to the hawtjni documentation is responsible for setting all the static fields annotated as constant fields to their constant values. The exception can still be triggered by typing:
scala> import org.fusesource.leveldbjni.internal.NativeOptions
scala> new NativeOptions()
java.lang.UnsatisfiedLinkError: org.fusesource.leveldbjni.internal.NativeOptions.init()V
at org.fusesource.leveldbjni.internal.NativeOptions.init(Native Method)
at org.fusesource.leveldbjni.internal.NativeOptions.<clinit>(NativeOptions.java:54)
at .<init>(<console>:9)
Apparently this is a known problem, as documented in this sbt issue page. Following the eventsourced documentation, I implemented a custom run-nobootcp command that runs the code without adding the Scala library to the boot classpath.
This solved the problem.
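As a related workaround (my own assumption, not part of the update above): forking the run task gives the program a fresh JVM without sbt's boot classpath setup, which is often enough for JNI-backed libraries:

```scala
// build.sbt — hedged sketch: fork `run` into a separate JVM so the native
// library is linked in a plain classloader/classpath configuration
fork in run := true
```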

Unresolved symbol s2 in Specs2 class

When I compile my specification, the compiler tells me
"error: value s2 is not a member of StringContext"
The salient portion of my specification class is:
import org.specs2._
import specification._
import mock._
class EnterpriseDirectoryActionSpec extends Specification { def is = s2"""
  An enterprise directory action should provide enabled fields
    after a call to doDefault        ${c().e1}
    after a call to doSearchPrevious ${c().e2}
    after a call to doSearchNext     ${c().e3}
    after a call to doExecuteSearch  ${c().e4}
  """
...
What is causing the error, and how can I correct it?
I'm using Specs2 (artifact specs2_2.10) version 1.14.
You need to use a later version of specs2: specs2 2.0-RC1 or specs2 2.0-RC2-SNAPSHOT. The s2 string interpolator was only introduced in the 2.0 line.
For the benefit of others reading this, I put the following into my pom.xml:
<dependency>
    <groupId>org.specs2</groupId>
    <artifactId>specs2_2.10</artifactId>
    <version>2.0-RC2-SNAPSHOT</version>
    <scope>test</scope>
</dependency>
...along with the repository entry for snapshots:
<!--
We need this repository in order to have access to a snapshot version of
the Specs2 testing library for Scala. In particular, the snapshot version
includes support for using string interpolation in test specifications.
-->
<repository>
    <id>oss.sonatype.org</id>
    <name>snapshots</name>
    <url>http://oss.sonatype.org/content/repositories/snapshots</url>
</repository>
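For sbt users, a hedged equivalent of the Maven snippet above (same artifact and version; the resolver label "sonatype-snapshots" is my own name for it):

```scala
// build.sbt — sketch of the equivalent dependency and snapshot resolver
resolvers += "sonatype-snapshots" at
  "https://oss.sonatype.org/content/repositories/snapshots"

libraryDependencies += "org.specs2" %% "specs2" % "2.0-RC2-SNAPSHOT" % "test"
```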

Why does Play fail with "Driver not found: [org.postgresql.Driver]"?

This is my application.conf:
db.default.driver=org.postgresql.Driver
db.default.url="postgres://postgres:postgres#localhost:5432/postgres"
db.default.user="postgres"
db.default.password= "postgres"
I downloaded postgresql-9.1-902.jdbc4.jar and included it in my build by adding it as an external jar. Still, Play tells me the driver was not found. Help?
I'd say the PostgreSQL driver isn't really on your classpath after all, but since you haven't shown the exact text of the error message it's hard to be sure. It would help if you could (a) show the exact copied and pasted text of the full error message and traceback; and (b) show exactly where you put the PgJDBC jar.
Consider adding some debug code that prints out the contents of System.getProperty("java.class.path") during your app's startup. Also add a block that does something like:
try {
  Class.forName("org.postgresql.Driver")
} catch {
  case ex: ClassNotFoundException =>
    // Log or abort here
}
This should tell you something about the class's visibility. Because of the complexity of class loading on modern JVMs and frameworks it won't be conclusive - there are just too many class loaders.
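A minimal, self-contained sketch of that check (the driver class name comes from the question; whether it is found depends entirely on your runtime classpath):

```scala
// DriverCheck.scala — prints the JVM classpath and reports whether a given
// class can be loaded, returning false instead of throwing when it is missing.
object DriverCheck {
  def isLoadable(className: String): Boolean =
    try {
      Class.forName(className)
      true
    } catch {
      case _: ClassNotFoundException => false
    }

  def main(args: Array[String]): Unit = {
    println(System.getProperty("java.class.path"))
    println(s"org.postgresql.Driver loadable: ${isLoadable("org.postgresql.Driver")}")
  }
}
```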
I am using postgresql 9.1-901.jdbc4 in my project and I configured it like this:
Build.scala:
import sbt._
import Keys._
import PlayProject._

object ApplicationBuild extends Build {

  val appName = "project_name"
  val appVersion = "1.0-SNAPSHOT"

  val appDependencies = Seq(
    // Add your project dependencies here,
    "postgresql" % "postgresql" % "9.1-901.jdbc4"
  )

  val main = PlayProject(appName, appVersion, appDependencies, mainLang = JAVA).settings(
    // Add your own project settings here
  )
}
application.conf
db.default.driver=org.postgresql.Driver
db.default.url="jdbc:postgresql://localhost:5432/project_name"
db.default.user=postgres
db.default.password=mypw
db.default.partitionCount=1
db.default.maxConnectionsPerPartition=5
db.default.minConnectionsPerPartition=5
Then I ran the following commands to put it to use:
play clean
play compile
play eclipsify
play ~run
Alternatively, you could run play dependencies afterwards to check that the driver is properly loaded.