I am attempting to connect to an ES cluster through Elastic4s. I am using the example given in the github repo:
import com.sksamuel.elastic4s.ElasticClient
import com.sksamuel.elastic4s.ElasticDsl._
object Test extends App {
val client = ElasticClient.transport(ElasticsearchClientUri(host, port))
// await is a helper method to make this operation synchronous instead of async
// You would normally avoid doing this in a real program as it will block your thread
client.execute { index into "bands" / "artists" fields "name"->"coldplay" }.await
// we need to wait until the index operation has been flushed by the server.
// this is an important point - when the index future completes, that doesn't mean that the doc
// is necessarily searchable. It simply means the server has processed your request and the doc is
// queued to be flushed to the indexes. Elasticsearch is eventually consistent.
// For this demo, we'll simply wait for 2 seconds (default refresh interval is 1 second).
Thread.sleep(2000)
// now we can search for the document we indexed earlier
val resp = client.execute { search in "bands" / "artists" query "coldplay" }.await
println(resp)
}
The client accepts connections on 9434 as described in here - https://www.elastic.co/guide/en/cloud/current/security.html#security-transport
Furthermore it looks for a or appends - depending on the construction way chosen - elasticsearch:\\ to the host and port.
Upon running even the line that initializes the Client I get Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
Clearly I am misunderstanding something. Please let me know what I am doing wrong.
EDIT:
As validation I have a .Net client to ES that uses the regular http connection.
var node = new Uri(url);
var connectionSettings = new ConnectionSettings(node);
connectionSettings.BasicAuthentication(settings.username,settings.password);
client = new ElasticClient(connectionSettings);
I am aiming to achieve the same.
That would appear you are missing the scala-library in your dependencies. So depending on what version of Scala you are using you have to have your deps match that. What build tool are you using?
SBT (you shouldn't need to do this, SBT Should do it automatically based on your scalaVersion)
"org.scala-lang" % "scala-library" % "YOUR SCALA VERSION"
Maven
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.12.1</version>
</dependency>
SBT also has some information here, http://www.scala-sbt.org/1.0/docs/Configuring-Scala.html
Related
The setup
I have a simple Spark application that uses mapPartitions to transform an RDD. As part of this transformation, I retrieve some necessary data from a Mongo database. The connection from the Spark worker to the Mongo database is managed using the MongoDB Connector for Spark (https://docs.mongodb.com/spark-connector/current/).
I'm using mapPartitions instead of the simpler map because there is some relatively expensive setup that is only required once for all elements in a partition. If I were to use map instead, this setup would have to be repeated for every element individually.
The problem
When one of the partitions in the source RDD becomes large enough, the transformation fails with the message
IllegalStateException: state should be: open
or, occasionally, with
IllegalStateException: The pool is closed
The code
Below is the code of a simple Scala application with which I can reproduce the issue:
package my.package
import com.mongodb.spark.MongoConnector
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.bson.Document
object MySparkApplication {
def main(args: Array[String]): Unit = {
val sparkSession: SparkSession = SparkSession.builder()
.appName("MySparkApplication")
.master(???) // The Spark master URL
.config("spark.jars", ???) // The path at which the application's fat JAR is located.
.config("spark.scheduler.mode", "FAIR")
.config("spark.mongodb.keep_alive_ms", "86400000")
.getOrCreate()
val mongoConnector: MongoConnector = MongoConnector(Map(
"uri" -> ??? // The MongoDB URI.
, "spark.mongodb.keep_alive_ms" -> "86400000"
, "keep_alive_ms" -> "86400000"
))
val localDocumentIds: Seq[Long] = Seq.range(1L, 100L)
val documentIdsRdd: RDD[Long] = sparkSession.sparkContext.parallelize(localDocumentIds)
val result: RDD[Document] = documentIdsRdd.mapPartitions { documentIdsIterator =>
mongoConnector.withMongoClientDo { mongoClient =>
val collection = mongoClient.getDatabase("databaseName").getCollection("collectionName")
// Some expensive query that should only be performed once for every partition.
collection.find(new Document("_id", 99999L)).first()
documentIdsIterator.map { documentId =>
// An expensive operation that does not interact with the Mongo database.
Thread.sleep(1000)
collection.find(new Document("_id", documentId)).first()
}
}
}
val resultLocal = result.collect()
}
}
The stack trace
Below is the stack trace returned by Spark when I run the application above:
Driver stacktrace:
[...]
at my.package.MySparkApplication.main(MySparkApplication.scala:41)
at my.package.MySparkApplication.main(MySparkApplication.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalStateException: state should be: open
at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70)
at com.mongodb.connection.BaseCluster.getDescription(BaseCluster.java:152)
at com.mongodb.Mongo.getConnectedClusterDescription(Mongo.java:885)
at com.mongodb.Mongo.createClientSession(Mongo.java:877)
at com.mongodb.Mongo$3.getClientSession(Mongo.java:866)
at com.mongodb.Mongo$3.execute(Mongo.java:823)
at com.mongodb.FindIterableImpl.first(FindIterableImpl.java:193)
at my.package.MySparkApplication$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2.apply(MySparkApplication.scala:36)
at my.package.MySparkApplication$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2.apply(MySparkApplication.scala:33)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The research I have done
I have found several people asking about this issue, and it seems that in all of their cases, the problem turned out to be them using the Mongo client after it had been closed. As far as I can tell, this is not happening in my application - opening and closing the connection should be handled by the Mongo-Spark connector, and I would expect the client to only be closed after the function passed to mongoConnector.withMongoClientDo returns.
I did manage to discover that the issue does not arise for the very first element in the RDD. It seems instead that a number of elements are being processed successfully, and that the failure only occurs once the process has taken a certain amount of time. This amount of time seems to be on the order of 5 to 15 seconds.
The above leads me to believe that something is automatically closing the client once it has been active for a certain amount of time, even though it is still being used.
As you can tell by my code, I have discovered the fact that the Mongo-Spark connector exposes a configuration spark.mongodb.keep_alive_ms that, according to the connector documentation, controls "The length of time to keep a MongoClient available for sharing". Its default value is 5 seconds, so this seemed like a useful thing to try. In the application above, I attempt to set it to an entire day in three different ways, with zero effect. The documentation does state that this specific property "can only be configured via a System Property". I think that this is what I'm doing (by setting the property when initialising the Spark session and/or Mongo connector), but I'm not entirely sure. It seems to be impossible to verify the setting once the Mongo connector has been initialised.
One other StackOverflow question mentions that I should try setting the maxConnectionIdleTime option in the MongoClientOptions, but as far as I can tell it is not possible to set these options through the connector.
As a sanity check, I tried replacing the use of mapPartitions with a functionally equivalent use of map. The issue disappeared, which is probably because the connection to the Mongo database is re-initialised for each individual element of the RDD. However, as mentioned above, this approach would have significantly worse performance because I would end up repeating expensive setup work for every element in the RDD.
Out of curiosity I also tried replacing the call to mapPartitions with a call to foreachPartition, also replacing the call to documentIdsIterator.map with documentIdsIterator.foreach. The issue also disappeared in this case. I have no idea why this would be, but because I need to transform my RDD, this is also not an acceptable approach.
The kind of answer I am looking for
"You actually are closing the client prematurely, and here's where: [...]"
"This is a known issue in the Mongo-Spark connector, and here's a link to their issue tracker: [...]"
"You are setting the spark.mongodb.keep_alive_ms property incorrectly, this is how you should do it: [...]"
"It is possible to verify the value of spark.mongodb.keep_alive_ms on your Mongo connector, and here's how: [...]"
"It is possible to set MongoClientOptions such as maxConnectionIdleTime through the Mongo connector, and here's how: [...]"
Edit
Further investigation has yielded the following insight:
The phrase 'System property' used in the connector's documentation refers to a Java system property, set using System.setProperty("spark.mongodb.keep_alive_ms", desiredValue) or the command line option -Dspark.mongodb.keep_alive_ms=desiredValue. This value is then read by the MongoConnector singleton object, and passed to the MongoClientCache. However, neither of the approaches for setting this property actually works:
Calling System.setProperty() from the driver program sets the value only in the JVM for the Spark driver program, while the value is needed in the Spark worker's JVM.
Calling System.setProperty() from the worker program sets the value only after it is read by MongoConnector.
Passing the command line option -Dspark.mongodb.keep_alive_ms to the Spark option spark.driver.extraJavaOptions again only sets the value in the driver's JVM.
Passing the command line option to the Spark option spark.executor.extraJavaOptions results in an error message from Spark:
Exception in thread "main" java.lang.Exception: spark.executor.extraJavaOptions is not allowed to set Spark options (was '-Dspark.mongodb.keep_alive_ms=desiredValue'). Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit.
The Spark code that throws this error is located in org.apache.spark.SparkConf#validateSettings, where it checks for any worker option value that contains the string -Dspark.
This seems like an oversight in the design of the Mongo connector; either the property should be set through the Spark session (as I originally expected it to be), or it should be renamed to something that doesn't start with spark. I added this information to the JIRA ticket mentioned in the comments.
The core issue is that the MongoConnector uses a cache for MongoClients and follows the loan pattern for managing that cache. Once all loaned MongoClients are returned and the keep_alive_ms time has passed the MongoClient will be closed and removed from the cache.
Due to the nature of how RDD's are implemented (they follow Scala's lazy collection semantics) the following code: documentIdsIterator.map { documentId => ... } is only processed once the RDD is actioned. By that time the loaned MongoClient has already been returned back to the cache and after the keep_alive_ms the MongoClient will be closed. This results in a state should be open exception on the client.
How to solve?
You could once SPARK-246 is fixed set the keep_alive_ms to be high enough so that the MongoClient is not closed during the processing of the RDD. However, that still breaks the contract of the loan pattern that the MongoConnector uses - so should be avoided.
Reuse the MongoConnector to get the client as needed. This way the cache can still be used, should a client be available, but should a client timeout for any reason then a new one will be automatically created:
documentIdsRdd.mapPartitions { documentIdsIterator =>
mongoConnector.withMongoClientDo { mongoClient =>
// Do some expensive operation
...
// Return the lazy collection
documentIdsIterator.map { documentId =>
// Loan the mongoClient
mongoConnector.withMongoClientDo { mongoClient => ... }
}
}
}
Connection objects are in general tightly bound to the context, in which they where initialized. You cannot simply serialize such objects and pass them around. Instead you these should be initialized in-place, in the mapPartitions:
val result: RDD[Document] = documentIdsRdd.mapPartitions { documentIdsIterator =>
val mongoConnector: MongoConnector = MongoConnector(Map(
"uri" -> ??? // The MongoDB URI.
, "spark.mongodb.keep_alive_ms" -> "86400000"
, "keep_alive_ms" -> "86400000"
))
mongoConnector.withMongoClientDo { mongoClient =>
...
}
}
I am using Flink v1.4.0 and I have set up two distinct jobs. The first is a pipeline that consumes data from a Kafka Topic and stores them into a Queryable State (QS). Data are keyed by date. The second submits a query to the QS job and processes the returned data.
Both jobs were working fine with Flink v.1.3.2. But with the new update, everything has broken. Here is part of the code for the first job:
private void runPipeline() throws Exception {
StreamExecutionEnvironment env = configurationEnvironment();
QueryableStateStream<String, DataBucket> dataByDate = env.addSource(sourceDataFromKafka())
.map(NewDataClass::new)
.keyBy(data.date)
.asQueryableState("QSName", reduceIntoSingleDataBucket());
}
and here is the code on client side:
QueryableStateClient client = new QueryableStateClient("localhost", 6123);
// the state descriptor of the state to be fetched.
ValueStateDescriptor<DataBucket> descriptor = new ValueStateDescriptor<>(
"QSName",
TypeInformation.of(new TypeHint<DataBucket>() {}));
jobId = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
String key = "2017-01-06";
CompletableFuture<ValueState<DataBucket> resultFuture = client.getKvState(
jobId,
"QSName",
key,
BasicTypeInfo.STRING_TYPE_INFO,
descriptor);
try {
ValueState<DataBucket> valueState = resultFuture.get();
DataBucket bucket = valueState.value();
System.out.println(bucket.getLabel());
} catch (IOException | InterruptionException | ExecutionException e) {
throw new RunTimeException("Unable to query bucket key: " + key , e);
}
I have followed the instructions as per the following link:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/state/queryable_state.html
making sure to enable the queryable state on my Flink cluster by including the flink-queryable-state-runtime_2.11-1.4.0.jar from the opt/ folder of your Flink distribution to the lib/ folder and checked it runs in the task manager.
I keep getting the following error:
Exception in thread "main" java.lang.NullPointerException
at org.apache.flink.api.java.typeutils.GenericTypeInfo.createSerializer(GenericTypeInfo.java:84)
at org.apache.flink.api.common.state.StateDescriptor.initializeSerializerUnlessSet(StateDescriptor.java:253)
at org.apache.flink.queryablestate.client.QueryableStateClient.getKvState(QueryableStateClient.java:210)
at org.apache.flink.queryablestate.client.QueryableStateClient.getKvState(QueryableStateClient.java:174)
at com.company.dept.query.QuerySubmitter.main(QuerySubmitter.java:37)
Any idea of what is happening? I think that my requests don't reach the QS at all ... Really don't know if and how I should change anything. Thanks.
So, as it turned out, it was 2 things that were causing this error. The first was the use of the wrong constructor for creating a descriptor on the client side. Rather than using the one that only takes as input a name for the QS and a TypeHint, I had to use another one where a keySerialiser along with a default value are provided as per below:
ValueStateDescriptor<DataBucket> descriptor = new ValueStateDescriptor<>(
"QSName",
TypeInformation.of(new TypeHint<DataBucket>() {}).createSerializer(new ExecutionConfig()),
DataBucket.emptyBucket()); // or anything that can be used as a default value
The second was relevant to the host and port values. The port was different from v1.3.2 now set to 9069 and the localhost was also different in my case. You can verify both by checking the logs of any task manager for the line:
Started the Queryable State Proxy Server # ....
Finally, in case you are here because you are looking to allow port-range for queryable state client proxy, I suggest you follow the respective issue (FLINK-7788) here: https://issues.apache.org/jira/browse/FLINK-7788.
I am trying to speed up "SELECT * FROM WHERE name=?" kind of queries in Play! + Scala app. I am using Play 2.4 + Scala 2.11 + play-slick-1.1.1 package. This package uses Slick-3.1 version.
My hypothesis was that slick generates Prepared statements from DBIO actions and they get executed. So I tried to cache them buy turning on flag cachePrepStmts=true
However I still see "Preparing statement..." messages in the log which means that PS are not getting cached! How should one instructs slick to cache them?
If I run following code shouldn't the PS be cached at some point?
for (i <- 1 until 100) {
Await.result(db.run(doctorsTable.filter(_.userName === name).result), 10 seconds)
}
Slick config is as follows:
slick.dbs.default {
driver="slick.driver.MySQLDriver$"
db {
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/staging_db?useSSL=false&cachePrepStmts=true"
user = "user"
password = "passwd"
numThreads = 1 // For not just one thread in HikariCP
properties = {
cachePrepStmts = true
prepStmtCacheSize = 250
prepStmtCacheSqlLimit = 2048
}
}
}
Update 1
I tried following as per #pawel's suggestion of using compiled queries:
val compiledQuery = Compiled { name: Rep[String] =>
doctorsTable.filter(_.userName === name)
}
val stTime = TimeUtil.getUtcTime
for (i <- 1 until 100) {
FutureUtils.blockFuture(db.compiledQuery(name).result), 10)
}
val endTime = TimeUtil.getUtcTime - stTime
Logger.info(s"Time Taken HERE $endTime")
In my logs I still see statement like:
2017-01-16 21:34:00,510 DEBUG [db-1] s.j.J.statement [?:?] Preparing statement: select ...
Also timing of this is also remains the same. What is the desired output? Should I not see these statements anymore? How can I verify if Prepared statements are indeed reused.
You need to use Compiled queries - which are exactly doing what you want.
Just change above code to:
val compiledQuery = Compiled { name: Rep[String] =>
doctorsTable.filter(_.userName === name)
}
for (i <- 1 until 100) {
Await.result(db.run(compiledQuery(name).result), 10 seconds)
}
I extracted above name as a parameter (because you usually want to change some parameters in your PreparedStatements) but that's definitely an optional part.
For further information you can refer to: http://slick.lightbend.com/doc/3.1.0/queries.html#compiled-queries
For MySQL you need to set an additional jdbc flag, useServerPrepStmts=true
HikariCP's MySQL configuration page links to a quite useful document that provides some simple performance tuning configuration options for MySQL jdbc.
Here are a few that I've found useful (you'll need to & append them to jdbc url for options not exposed by Hikari's API). Be sure to read through linked document and/or MySQL documentation for each option; should be mostly safe to use.
zeroDateTimeBehavior=convertToNull&characterEncoding=UTF-8
rewriteBatchedStatements=true
maintainTimeStats=false
cacheServerConfiguration=true
avoidCheckOnDuplicateKeyUpdateInSQL=true
dontTrackOpenResources=true
useLocalSessionState=true
cachePrepStmts=true
useServerPrepStmts=true
prepStmtCacheSize=500
prepStmtCacheSqlLimit=2048
Also, note that statements are cached per thread; depending on what you set for Hikari connection maxLifetime and what server load is, memory usage will increase accordingly on both server and client (e.g. if you set connection max lifetime to just under MySQL default of 8 hours, both server and client will keep N prepared statements alive in memory for the life of each connection).
p.s. curious if bottleneck is indeed statement caching or something specific to Slick.
EDIT
to log statements enable the query log. On MySQL 5.7 you would add to your my.cnf:
general-log=1
general-log-file=/var/log/mysqlgeneral.log
and then sudo touch /var/log/mysqlgeneral.log followed by a restart of mysqld. Comment out above config lines and restart to turn off query logging.
I am looking to open up multiple connections using a netty client bootstrap in order to parse messages coming from multiple sources. The messages all have the same format, however, due to the amount of data that needs to be processed, I must run each connection on separate threads (This is assuming netty creates a thread per client channel, which I couldn't find a reference for - if that's not the case, how would this be achieved?).
This is the code that I use to connect to the data server:
var b = new Bootstrap()
.group(group)
.channel(classOf[NioSocketChannel])
.handler(RawFeedChannelInitializer)
var ch1 = b.clone().connect(host, port).sync().channel();
var ch2 = b.clone().connect(host, port).sync().channel();
The initializer calls RawPacketDecoder, which extends ReplayingDecoder, and is defined here.
The code works well without #Sharable when opening a single connection, but for the purpose of my application I must connect to the same server multiple times.
This results in the runtime error #Sharable annotation is not allowed pointing to my RawPacketDecoder class.
I am not entirely sure on how to get past this issue, short of reimplementing in scala an instantiable class of ReplayingDecoder as my decoder based directly on ByteToMessageDecoder.
Any help would be greatly appreciated.
Note: I am using netty 4.0.32 Final
I found the solution in this StockExchange answer.
My issue was that I was using an object based ChannelInitializer (singleton), and ReplayingDecoder as well as ByteToMessageDecoder are not sharable.
My initializer was created as a scala object, and therefore a single instance allowed. Changing the initializer to a scala class and instantiating for each bootstrap clone solved the problem. I modified the bootstrap code above as follows:
var b = new Bootstrap()
.group(group)
.channel(classOf[NioSocketChannel])
//.handler(RawFeedChannelInitializer)
var ch1 = b.clone().handler(new RawFeedChannelInitializer()).connect(host, port).sync().channel();
var ch2 = b.clone().handler(new RawFeedChannelInitializer()).connect(host, port).sync().channel();
I am not sure whether this ensures multithreading as wanted but it does allow to split the data access into multiple connections to the feed server.
Edit Update: After performing additional research on the subject, I have determined that netty does in fact create a thread per channel; this was verified by printing to console after the creation of each channel:
println("No. of active threads: " + Thread.activeCount());
The output shows an incremental number as channels are created and associated with their respective threads.
By default NioEventLoopGroup uses 2*Num_CPU_cores threads as defined here:
DEFAULT_EVENT_LOOP_THREADS = Math.max(1, SystemPropertyUtil.getInt(
"io.netty.eventLoopThreads",
Runtime.getRuntime().availableProcessors() * 2));
This value can be overriden to something else by setting
val group = new NioEventLoopGroup(16)
and then using the group to create/setup the bootstrap.
I want to know how can I do following things in scala?
Connect to a postgreSQL database.
Write SQL queries like SELECT , UPDATE etc. to modify a table in that database.
I know that in python I can do it using PygreSQL but how to do these things in scala?
You need to add dependency "org.postgresql" % "postgresql" % "9.3-1102-jdbc41" in build.sbt and you can modify following code to connect and query database. Replace DB_USER with your db user and DB_NAME as your db name.
import java.sql.{Connection, DriverManager, ResultSet}
object pgconn extends App {
println("Postgres connector")
classOf[org.postgresql.Driver]
val con_st = "jdbc:postgresql://localhost:5432/DB_NAME?user=DB_USER"
val conn = DriverManager.getConnection(con_str)
try {
val stm = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
val rs = stm.executeQuery("SELECT * from Users")
while(rs.next) {
println(rs.getString("quote"))
}
} finally {
conn.close()
}
}
I would recommend having a look at Doobie.
This chapter in the "Book of Doobie" gives a good sense of what your code will look like if you make use of this library.
This is the library of choice right now to solve this problem if you are interested in the pure FP side of Scala, i.e. scalaz, scalaz-stream (probably fs2 and cats soon) and referential transparency in general.
It's worth nothing that Doobie is NOT an ORM. At its core, it's simply a nicer, higher-level API over JDBC.
Take look at the tutorial "Using Scala with JDBC to connect to MySQL", replace the db url and add the right jdbc library. The link got broken so here's the content of the blog:
Using Scala with JDBC to connect to MySQL
A howto on connecting Scala to a MySQL database using JDBC. There are a number of database libraries for Scala, but I ran into a problem getting most of them to work. I attempted to use scala.dbc, scala.dbc2, Scala Query and Querulous but either they aren’t supported, have a very limited featured set or abstracts SQL to a weird pseudo language.
The Play Framework has a new database library called ANorm which tries to keep the interface to basic SQL but with a slight improved scala interface. The jury is still out for me, only used on one project minimally so far. Also, I’ve only seen it work within a Play app, does not look like it can be extracted out too easily.
So I ended up going with basic Java JDBC connection and it turns out to be a fairly easy solution.
Here is the code for accessing a database using Scala and JDBC. You need to change the connection string parameters and modify the query for your database. This example was geared towards MySQL, but any Java JDBC driver should work the same with Scala.
Basic Query
import java.sql.{Connection, DriverManager, ResultSet};
// Change to Your Database Config
val conn_str = "jdbc:mysql://localhost:3306/DBNAME?user=DBUSER&password=DBPWD"
// Load the driver
classOf[com.mysql.jdbc.Driver]
// Setup the connection
val conn = DriverManager.getConnection(conn_str)
try {
// Configure to be Read Only
val statement = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
// Execute Query
val rs = statement.executeQuery("SELECT quote FROM quotes LIMIT 5")
// Iterate Over ResultSet
while (rs.next) {
println(rs.getString("quote"))
}
}
finally {
conn.close
}
You will need to download the mysql-connector jar.
Or if you are using maven, the pom snippets to load the mysql connector, you’ll need to check what the latest version is.
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.12</version>
</dependency>
To run the example, save the following to a file (query_test.scala) and run using, the following specifying the classpath to the connector jar:
scala -cp mysql-connector-java-5.1.12.jar:. query_test.scala
Insert, Update and Delete
To perform an insert, update or delete you need to create an updatable statement object. The execute command is slightly different and you will most likely want to use some sort of parameters. Here’s an example doing an insert using jdbc and scala with parameters.
// create database connection
val dbc = "jdbc:mysql://localhost:3306/DBNAME?user=DBUSER&password=DBPWD"
classOf[com.mysql.jdbc.Driver]
val conn = DriverManager.getConnection(dbc)
val statement = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE)
// do database insert
try {
val prep = conn.prepareStatement("INSERT INTO quotes (quote, author) VALUES (?, ?) ")
prep.setString(1, "Nothing great was ever achieved without enthusiasm.")
prep.setString(2, "Ralph Waldo Emerson")
prep.executeUpdate
}
finally {
conn.close
}
We are using Squeryl, which is working well so far for us. Depending on your needs it may do the trick.
Here is a list of supported DB's and the adapters
If you want/need to write your own SQL, but hate the JDBC interface, take a look at O/R Broker
I would recommend the Quill query library. Here is an introduction post by Li Haoyi to get started.
TL;DR
{
import io.getquill._
import com.zaxxer.hikari.{HikariConfig, HikariDataSource}
val pgDataSource = new org.postgresql.ds.PGSimpleDataSource()
pgDataSource.setUser("postgres")
pgDataSource.setPassword("example")
val config = new HikariConfig()
config.setDataSource(pgDataSource)
val ctx = new PostgresJdbcContext(LowerCase, new HikariDataSource(config))
import ctx._
}
Define a class ORM:
// mapping `city` table
case class City(
id: Int,
name: String,
countryCode: String,
district: String,
population: Int
)
and query all items:
# ctx.run(query[City])
cmd11.sc:1: SELECT x.id, x.name, x.countrycode, x.district, x.population FROM city x
val res11 = ctx.run(query[City])
^
res11: List[City] = List(
City(1, "Kabul", "AFG", "Kabol", 1780000),
City(2, "Qandahar", "AFG", "Qandahar", 237500),
City(3, "Herat", "AFG", "Herat", 186800),
...
ScalikeJDBC is quite easy to use. It allows you to write raw SQL using interpolated strings.
http://scalikejdbc.org/