How can I connect to a PostgreSQL database in Scala? - postgresql

I want to know how I can do the following things in Scala:
Connect to a PostgreSQL database.
Write SQL queries like SELECT, UPDATE, etc. to modify a table in that database.
I know that in Python I can do this using PygreSQL, but how do I do these things in Scala?

You need to add the dependency "org.postgresql" % "postgresql" % "9.3-1102-jdbc41" to build.sbt, and you can modify the following code to connect to and query the database. Replace DB_USER with your db user and DB_NAME with your db name.
import java.sql.{Connection, DriverManager, ResultSet}

object pgconn extends App {
  println("Postgres connector")
  // load the PostgreSQL JDBC driver
  classOf[org.postgresql.Driver]
  val con_str = "jdbc:postgresql://localhost:5432/DB_NAME?user=DB_USER"
  val conn = DriverManager.getConnection(con_str)
  try {
    val stm = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
    val rs = stm.executeQuery("SELECT * FROM Users")
    while (rs.next) {
      println(rs.getString("quote"))
    }
  } finally {
    conn.close()
  }
}
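The question also asks about UPDATE; with plain JDBC that is a PreparedStatement plus executeUpdate. A minimal sketch that could sit inside the try block above (the Users table and quote column mirror the example; the actual values are hypothetical):

val upd = conn.prepareStatement("UPDATE Users SET quote = ? WHERE quote = ?")
upd.setString(1, "A new quote")
upd.setString(2, "An old quote")
val rowsChanged = upd.executeUpdate() // number of rows affected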

I would recommend having a look at Doobie.
This chapter in the "Book of Doobie" gives a good sense of what your code will look like if you make use of this library.
This is the library of choice right now to solve this problem if you are interested in the pure FP side of Scala, i.e. scalaz, scalaz-stream (probably fs2 and cats soon) and referential transparency in general.
It's worth noting that Doobie is NOT an ORM. At its core, it's simply a nicer, higher-level API over JDBC.
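For a sense of what that looks like, here is a minimal sketch with a cats-based Doobie (the Transactor constructor and effect plumbing vary slightly between Doobie/cats-effect versions; DB_NAME, DB_USER, DB_PASSWORD and the users table are placeholders):

import cats.effect.IO
import cats.effect.unsafe.implicits.global // runtime for unsafeRunSync (cats-effect 3)
import doobie._
import doobie.implicits._

object DoobieExample extends App {
  // a Transactor wraps the JDBC connection details
  val xa = Transactor.fromDriverManager[IO](
    "org.postgresql.Driver",
    "jdbc:postgresql://localhost:5432/DB_NAME",
    "DB_USER",
    "DB_PASSWORD"
  )

  // queries are plain values; nothing runs until transact is called
  val userNames: ConnectionIO[List[String]] =
    sql"SELECT name FROM users".query[String].to[List]

  println(userNames.transact(xa).unsafeRunSync())
}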

Take a look at the tutorial "Using Scala with JDBC to connect to MySQL", replace the db url, and add the right JDBC library. The link got broken, so here's the content of the blog:
Using Scala with JDBC to connect to MySQL
A howto on connecting Scala to a MySQL database using JDBC. There are a number of database libraries for Scala, but I ran into a problem getting most of them to work. I attempted to use scala.dbc, scala.dbc2, Scala Query and Querulous, but either they aren't supported, have a very limited feature set, or abstract SQL into a weird pseudo-language.
The Play Framework has a new database library called ANorm which tries to keep the interface close to basic SQL but with a slightly improved Scala interface. The jury is still out for me; I've only used it minimally on one project so far. Also, I've only seen it work within a Play app; it does not look like it can be extracted out too easily.
So I ended up going with a basic Java JDBC connection, and it turns out to be a fairly easy solution.
Here is the code for accessing a database using Scala and JDBC. You need to change the connection string parameters and modify the query for your database. This example was geared towards MySQL, but any Java JDBC driver should work the same with Scala.
Basic Query
import java.sql.{Connection, DriverManager, ResultSet}

// Change to your database config
val conn_str = "jdbc:mysql://localhost:3306/DBNAME?user=DBUSER&password=DBPWD"

// Load the driver
classOf[com.mysql.jdbc.Driver]

// Set up the connection
val conn = DriverManager.getConnection(conn_str)
try {
  // Configure to be read only
  val statement = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)

  // Execute query
  val rs = statement.executeQuery("SELECT quote FROM quotes LIMIT 5")

  // Iterate over the ResultSet
  while (rs.next) {
    println(rs.getString("quote"))
  }
} finally {
  conn.close()
}
You will need to download the mysql-connector jar.
Or, if you are using Maven, here are the pom snippets to load the MySQL connector; you'll need to check what the latest version is.
<dependency>
  <groupId>mysql</groupId>
  <artifactId>mysql-connector-java</artifactId>
  <version>5.1.12</version>
</dependency>
To run the example, save the code to a file (query_test.scala) and run it using the following, specifying the classpath to the connector jar:
scala -cp mysql-connector-java-5.1.12.jar:. query_test.scala
Insert, Update and Delete
To perform an insert, update, or delete you need to create an updatable statement object. The execute command is slightly different, and you will most likely want to use some sort of parameters. Here's an example doing an insert using JDBC and Scala with parameters.
// create database connection
val dbc = "jdbc:mysql://localhost:3306/DBNAME?user=DBUSER&password=DBPWD"
classOf[com.mysql.jdbc.Driver]
val conn = DriverManager.getConnection(dbc)
val statement = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE)

// do database insert
try {
  val prep = conn.prepareStatement("INSERT INTO quotes (quote, author) VALUES (?, ?)")
  prep.setString(1, "Nothing great was ever achieved without enthusiasm.")
  prep.setString(2, "Ralph Waldo Emerson")
  prep.executeUpdate
} finally {
  conn.close()
}

We are using Squeryl, which has been working well for us so far. Depending on your needs it may do the trick.
Here is a list of supported DBs and their adapters.
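For reference, a minimal Squeryl sketch against PostgreSQL might look like the following (the users table, the User row class, and the credentials are hypothetical):

import java.sql.DriverManager
import org.squeryl.{Schema, Session, SessionFactory}
import org.squeryl.adapters.PostgreSqlAdapter
import org.squeryl.PrimitiveTypeMode._

// hypothetical row class and schema
class User(val id: Long, val name: String)

object AppSchema extends Schema {
  val users = table[User]("users")
}

object SquerylExample extends App {
  Class.forName("org.postgresql.Driver")
  SessionFactory.concreteFactory = Some(() =>
    Session.create(
      DriverManager.getConnection("jdbc:postgresql://localhost:5432/DB_NAME", "DB_USER", "DB_PASSWORD"),
      new PostgreSqlAdapter))

  transaction {
    val names = from(AppSchema.users)(u => select(u.name)).toList
    names.foreach(println)
  }
}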

If you want/need to write your own SQL, but hate the JDBC interface, take a look at O/R Broker

I would recommend the Quill query library. Here is an introduction post by Li Haoyi to get started.
TL;DR
{
  import io.getquill._
  import com.zaxxer.hikari.{HikariConfig, HikariDataSource}

  val pgDataSource = new org.postgresql.ds.PGSimpleDataSource()
  pgDataSource.setUser("postgres")
  pgDataSource.setPassword("example")
  val config = new HikariConfig()
  config.setDataSource(pgDataSource)

  val ctx = new PostgresJdbcContext(LowerCase, new HikariDataSource(config))
  import ctx._
}
Define a case class for the ORM mapping:
// mapping the `city` table
case class City(
  id: Int,
  name: String,
  countryCode: String,
  district: String,
  population: Int
)
and query all items:
# ctx.run(query[City])
cmd11.sc:1: SELECT x.id, x.name, x.countrycode, x.district, x.population FROM city x
val res11 = ctx.run(query[City])
^
res11: List[City] = List(
City(1, "Kabul", "AFG", "Kabol", 1780000),
City(2, "Qandahar", "AFG", "Qandahar", 237500),
City(3, "Herat", "AFG", "Herat", 186800),
...
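Filtering and writes follow the same pattern; note that the exact write method name varies across Quill versions (insert in older releases, insertValue in newer ones), so treat this as a sketch:

// query with a runtime value lifted into the generated SQL
val afghanCities = ctx.run(query[City].filter(_.countryCode == lift("AFG")))

// insert a new row (older Quill versions; newer ones use insertValue)
ctx.run(query[City].insert(lift(City(5000, "Example City", "AFG", "Kabol", 12345))))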

ScalikeJDBC is quite easy to use. It allows you to write raw SQL using interpolated strings.
http://scalikejdbc.org/
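A minimal sketch against PostgreSQL (the quotes table, its columns, and the credentials are hypothetical):

import scalikejdbc._

// register the driver and set up a connection pool
Class.forName("org.postgresql.Driver")
ConnectionPool.singleton("jdbc:postgresql://localhost:5432/DB_NAME", "DB_USER", "DB_PASSWORD")

// SELECT with an interpolated (and safely escaped) parameter
val author = "Ralph Waldo Emerson"
val quotes: List[String] = DB.readOnly { implicit session =>
  sql"SELECT quote FROM quotes WHERE author = $author".map(_.string("quote")).list.apply()
}

// UPDATE inside an auto-commit session
DB.autoCommit { implicit session =>
  sql"UPDATE quotes SET author = ${"R. W. Emerson"} WHERE author = $author".update.apply()
}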

Related

Slick 3.2 CodeGenerator Tool hardcodes the database name in the autogenerated code

I am using the Slick 3.2 code generation tool and I auto-generated the code against my production database. In the generated code I can see
class BarActivity(_tableTag: Tag) extends profile.api.Table[BarActivityRow](_tableTag, Some("foo_prod"), "bar_activity") {
Here foo_prod is the database against which the code generator ran.
The problem is that for multiple environments my databases are named differently. So the dev database is foo_dev and the QA database is foo_qa.
I don't want to regenerate the code every time I switch environments. I want to use the same generated code across environments.
I think Slick should have allowed us to specify the database name from the connection properties.
So with auto-generated code, how do I write a program which has two connections, one to read data from prod and a second to write data to dev? Should I generate the code twice?
You can customize the output as you wish:
The default implementation of what you're looking at is in AbstractSourceCodeGenerator.scala, in the TableClassDef trait's code method.
So, assuming you have a class like
class MyGenerator extends SourceCodeGenerator(model) {
you can add in it
override def Table = new Table(_) {
  override def TableClass = new TableClassDef {
    override def code = {
      val prns = parents.map(" with " + _).mkString("")
      s"""class $name(_tableTag: Tag) extends profile.api.Table[$elementType](_tableTag, ${hereGoesTheDatabaseNameAndTheTableNameComaSeparated})$prns {
${indent(body.map(_.mkString("\n")).mkString("\n\n"))}
}"""
    }
  }
}
In your case, you may want the variable hereGoesTheDatabaseNameAndTheTableNameComaSeparated to fetch the database name from a config file or an environment variable, for example (see the sketch below).
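A hedged sketch of that idea, reusing the override shown above and assuming an environment variable named DB_SCHEMA (the fallback value foo_dev is also just an assumption):

class MyGenerator(dbModel: slick.model.Model) extends slick.codegen.SourceCodeGenerator(dbModel) {
  // the schema name comes from the environment at generation time instead of being hardcoded
  val schemaName: String = sys.env.getOrElse("DB_SCHEMA", "foo_dev")

  override def Table = new Table(_) {
    override def TableClass = new TableClassDef {
      override def code = {
        val prns = parents.map(" with " + _).mkString("")
        s"""class $name(_tableTag: Tag) extends profile.api.Table[$elementType](_tableTag, Some("$schemaName"), "${model.name.table}")$prns {
${indent(body.map(_.mkString("\n")).mkString("\n\n"))}
}"""
      }
    }
  }
}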
Full disclaimer: I never customized this part myself, but I can't see a big reason why this wouldn't work.

ingesting data in solr using spark scala

I am trying to ingest data into Solr using Scala and Spark; however, my code is missing something. For instance, I got the code below from a Hortonworks tutorial.
I am using Spark 1.6.2, Solr 5.2.1, and Scala 2.10.5.
Can anybody provide me a workable snippet to successfully insert data into Solr?
val input_file = "hdfs:///tmp/your_text_file"
case class Person(id: Int, name: String)
val people_df1 = sc.textFile(input_file).map(_.split(",")).map(p => Person(p(0).trim.toInt, p(1))).toDF()
val docs = people_df1.map { doc =>
  val docx = SolrSupport.autoMapToSolrInputDoc(doc.getAs[Int]("id").toString, doc, null)
  docx.setField("scala_s", "supercool")
  docx.setField("name_s", doc.getAs[String]("name"))
}
// the code below has a compilation issue somehow, although the jar file does contain these functions.
SolrSupport.indexDocs("sandbox.hortonworks.com:2181", "testsparksolr", 10, docs)
val solrServer = com.lucidworks.spark.SolrSupport.getSolrServer("http://ambari.asiacell.com:2181")
solrServer.setDefaultCollection("testsparksolr")
solrServer.commit(false, false)
thanks in advance
Have you tried spark-solr?
The library's main focus is to provide a clean API to index documents to a Solr server, as in your case.
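For reference, a minimal sketch of the DataFrame-based route with spark-solr (zkhost and collection are taken from your snippet; option names may differ slightly between spark-solr versions):

import org.apache.spark.sql.SaveMode

// write the DataFrame to Solr through the spark-solr data source
people_df1.write
  .format("solr")
  .option("zkhost", "sandbox.hortonworks.com:2181")
  .option("collection", "testsparksolr")
  .mode(SaveMode.Overwrite)
  .save()

// read it back as a DataFrame
val fromSolr = sqlContext.read
  .format("solr")
  .option("zkhost", "sandbox.hortonworks.com:2181")
  .option("collection", "testsparksolr")
  .load()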

Cache Slick DBIO Actions

I am trying to speed up "SELECT * FROM WHERE name=?" kind of queries in a Play! + Scala app. I am using Play 2.4 + Scala 2.11 + the play-slick-1.1.1 package. This package uses Slick 3.1.
My hypothesis was that Slick generates prepared statements from DBIO actions and they get executed. So I tried to cache them by turning on the flag cachePrepStmts=true.
However, I still see "Preparing statement..." messages in the log, which means that the prepared statements are not getting cached! How should one instruct Slick to cache them?
If I run the following code, shouldn't the prepared statement be cached at some point?
for (i <- 1 until 100) {
  Await.result(db.run(doctorsTable.filter(_.userName === name).result), 10 seconds)
}
Slick config is as follows:
slick.dbs.default {
  driver = "slick.driver.MySQLDriver$"
  db {
    driver = "com.mysql.jdbc.Driver"
    url = "jdbc:mysql://localhost:3306/staging_db?useSSL=false&cachePrepStmts=true"
    user = "user"
    password = "passwd"
    numThreads = 1 // for now just one thread in HikariCP
    properties = {
      cachePrepStmts = true
      prepStmtCacheSize = 250
      prepStmtCacheSqlLimit = 2048
    }
  }
}
Update 1
I tried the following as per @pawel's suggestion of using compiled queries:
val compiledQuery = Compiled { name: Rep[String] =>
  doctorsTable.filter(_.userName === name)
}

val stTime = TimeUtil.getUtcTime
for (i <- 1 until 100) {
  FutureUtils.blockFuture(db.run(compiledQuery(name).result), 10)
}
val endTime = TimeUtil.getUtcTime - stTime
Logger.info(s"Time Taken HERE $endTime")
In my logs I still see statement like:
2017-01-16 21:34:00,510 DEBUG [db-1] s.j.J.statement [?:?] Preparing statement: select ...
Also, the timing remains the same. What is the desired output? Should I not see these statements anymore? How can I verify that prepared statements are indeed reused?
You need to use Compiled queries, which do exactly what you want.
Just change the above code to:
val compiledQuery = Compiled { name: Rep[String] =>
  doctorsTable.filter(_.userName === name)
}

for (i <- 1 until 100) {
  Await.result(db.run(compiledQuery(name).result), 10 seconds)
}
I extracted name above as a parameter (because you usually want to change some parameters in your prepared statements), but that's definitely optional.
For further information you can refer to: http://slick.lightbend.com/doc/3.1.0/queries.html#compiled-queries
For MySQL you need to set an additional JDBC flag, useServerPrepStmts=true.
HikariCP's MySQL configuration page links to a quite useful document that provides some simple performance-tuning configuration options for the MySQL JDBC driver.
Here are a few that I've found useful (you'll need to append them with & to the JDBC URL for options not exposed by Hikari's API; an example URL follows the list). Be sure to read through the linked document and/or the MySQL documentation for each option; they should be mostly safe to use.
zeroDateTimeBehavior=convertToNull&characterEncoding=UTF-8
rewriteBatchedStatements=true
maintainTimeStats=false
cacheServerConfiguration=true
avoidCheckOnDuplicateKeyUpdateInSQL=true
dontTrackOpenResources=true
useLocalSessionState=true
cachePrepStmts=true
useServerPrepStmts=true
prepStmtCacheSize=500
prepStmtCacheSqlLimit=2048
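For example, appending the statement-caching options to the URL from the Slick config earlier would look roughly like this (a sketch; keep only the options that apply to your setup):

slick.dbs.default.db {
  url = "jdbc:mysql://localhost:3306/staging_db?useSSL=false&cachePrepStmts=true&useServerPrepStmts=true&prepStmtCacheSize=500&prepStmtCacheSqlLimit=2048"
}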
Also, note that statements are cached per thread; depending on what you set for Hikari connection maxLifetime and what server load is, memory usage will increase accordingly on both server and client (e.g. if you set connection max lifetime to just under MySQL default of 8 hours, both server and client will keep N prepared statements alive in memory for the life of each connection).
p.s. curious if bottleneck is indeed statement caching or something specific to Slick.
EDIT
to log statements enable the query log. On MySQL 5.7 you would add to your my.cnf:
general-log=1
general-log-file=/var/log/mysqlgeneral.log
and then sudo touch /var/log/mysqlgeneral.log followed by a restart of mysqld. Comment out the above config lines and restart to turn off query logging.

How to implement master/slave structure with squeryl and play framework

I am running a Play Framework website that uses Squeryl and a MySQL database.
I need Squeryl to send all read queries to the slave and all write queries to the master.
How can I achieve this, either via Squeryl or via the JDBC connector itself?
Many thanks,
I don't tend to use MySQL myself, but here's an idea:
Based on the documentation here, the MySQL JDBC driver will round-robin amongst the slaves if the readOnly attribute is properly set on the Connection. In order to retrieve and change the current Connection you'll want to use code like
transaction {
  val conn = Session.currentSession.connection
  conn.setReadOnly(true)
  // Your code here
}
Even better, you can create your own readOnlyTransaction method:
def readOnlyTransaction(f: => Unit) = {
  transaction {
    val conn = Session.currentSession.connection
    val orig = conn.getReadOnly()
    conn.setReadOnly(true)
    f
    conn.setReadOnly(orig)
  }
}
Then use it like:
readOnlyTransaction {
  // Your code here
}
You'll probably want to clean that up a bit so the default readOnly state is reset if an exception occurs, but you get the general idea.
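A minimal sketch of that cleanup, wrapping the body in try/finally so the flag is always restored:

def readOnlyTransaction[A](f: => A): A =
  transaction {
    val conn = Session.currentSession.connection
    val orig = conn.getReadOnly()
    conn.setReadOnly(true)
    try f
    finally conn.setReadOnly(orig) // restored even if f throws
  }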

Self-consistent and updated example of using Spark over ElasticSearch

This guy had a very small example showing how to integrate ElasticSearch and Spark, back when the whole ES ecosystem was around version 0.9. Nowadays it doesn't work anymore (and googling for an updated one doesn't seem an easy feat). Can someone give a small, self-contained Scala example of:
Opening a file in spark (in the example above, it was /var/log/syslog);
Doing something with it;
Sending the result into ES;
Opening that result back in Spark.
... that works with ElasticSearch 1.3.4 and Spark 1.1.0.
I gave a talk a while back with Spark and Elastic Search (around the 0.9 days), and I recently updated some of the examples for present day (read 1.1). I've posted the slides and the example code. Hope that helps!
I've also copied the relevant sections (from my own github repo) here:
import org.elasticsearch.spark.sql._
...
val tweetsAsCS =
createSchemaRDD(tweetRDD.map(SharedIndex.prepareTweetsCaseClass))
tweetsAsCS.saveToEs(esResource)
Note that we didn't specify any ES nodes. This will default to trying to save to a cluster on local host. If we want to use a different cluster we can add:
// if we want to have a different es cluster we can add
import org.elasticsearch.hadoop.cfg.ConfigurationOptions
val config = new SparkConf()
config.set(ConfigurationOptions.ES_NODES, node) // set the node for discovery
// other config settings
val sc = new SparkContext(config)
So that will do the first part (indexing some data).
Querying ES from Spark has also gotten a lot simpler, although only if your data types are supported by the mappings of the connector (the primary one I ran into that wasn't was geolocation, but it's easy enough to extend the mapper if you run into this).
val query = "{\"query\": {\"filtered\" : {\"query\" : {\"match_all\" : {}},\"filter\" : { \"geo_distance\" : { \"distance\" : \""+ dist + "km\", \"location\" : { \"lat\" : "+ lat +", \"lon\" : "+ lon +" }}}}}}"
val tweets = sqlCtx.esRDD(esResource, query)
The esRDD function isn't normally on the SQLContext, but the implicit conversions we imported up above make it available to us. tweets is now a SchemaRDD and we can update it as desired and save the results back as we did in the first part of this example.
Hope this helps!