Bulk loading to Phoenix using Spark - Scala

I was trying to code some utilities to bulk load data through HFiles from Spark RDDs.
I was following the pattern of CSVBulkLoadTool from Phoenix. I managed to generate some HFiles and load them into HBase, but I can't see the rows using sqlline (they are visible through the hbase shell, though). I would be more than grateful for any suggestions.
BulkPhoenixLoader.scala:
class BulkPhoenixLoader[A <: ImmutableBytesWritable : ClassTag, T <: KeyValue : ClassTag](rdd: RDD[(A, T)]) {
def createConf(tableName: String, inConf: Option[Configuration] = None): Configuration = {
val conf = inConf.map(HBaseConfiguration.create).getOrElse(HBaseConfiguration.create())
val job: Job = Job.getInstance(conf, "Phoenix bulk load")
job.setMapOutputKeyClass(classOf[ImmutableBytesWritable])
job.setMapOutputValueClass(classOf[KeyValue])
// initialize credentials to possibly run in a secure env
TableMapReduceUtil.initCredentials(job)
val htable: HTable = new HTable(conf, tableName)
// Auto configure partitioner and reducer according to the Main Data table
HFileOutputFormat2.configureIncrementalLoad(job, htable)
conf
}
def bulkSave(tableName: String, outputPath: String, conf: Option[Configuration]) = {
val configuration: Configuration = createConf(tableName, conf)
rdd.saveAsNewAPIHadoopFile(
outputPath,
classOf[ImmutableBytesWritable],
classOf[Put],
classOf[HFileOutputFormat2],
configuration)
}
}
ExtendedProductRDDFunctions.scala:
class ExtendedProductRDDFunctions[A <: scala.Product](data: org.apache.spark.rdd.RDD[A]) extends
ProductRDDFunctions[A](data) with Serializable {
def toHFile(tableName: String,
columns: Seq[String],
conf: Configuration = new Configuration,
zkUrl: Option[String] = None): RDD[(ImmutableBytesWritable, KeyValue)] = {
val config = ConfigurationUtil.getOutputConfiguration(tableName, columns, zkUrl, Some(conf))
val tableBytes = Bytes.toBytes(tableName)
val encodedColumns = ConfigurationUtil.encodeColumns(config)
val jdbcUrl = zkUrl.map(getJdbcUrl).getOrElse(getJdbcUrl(config))
val conn = DriverManager.getConnection(jdbcUrl)
val query = QueryUtil.constructUpsertStatement(tableName,
columns.toList.asJava,
null)
data.flatMap(x => mapRow(x, jdbcUrl, encodedColumns, tableBytes, query))
}
def mapRow(product: Product,
jdbcUrl: String,
encodedColumns: String,
tableBytes: Array[Byte],
query: String): List[(ImmutableBytesWritable, KeyValue)] = {
val conn = DriverManager.getConnection(jdbcUrl)
val preparedStatement = conn.prepareStatement(query)
val columnsInfo = ConfigurationUtil.decodeColumns(encodedColumns)
columnsInfo.zip(product.productIterator.toList).zipWithIndex.foreach(setInStatement(preparedStatement))
preparedStatement.execute()
val uncommittedDataIterator = PhoenixRuntime.getUncommittedDataIterator(conn, true)
val hRows = uncommittedDataIterator.asScala.filter(kvPair =>
Bytes.compareTo(tableBytes, kvPair.getFirst) == 0
).flatMap(kvPair => kvPair.getSecond.asScala.map(
kv => {
val byteArray = kv.getRowArray.slice(kv.getRowOffset, kv.getRowOffset + kv.getRowLength - 1) :+ 1.toByte
(new ImmutableBytesWritable(byteArray, 0, kv.getRowLength), kv)
}))
conn.rollback()
conn.close()
hRows.toList
}
def setInStatement(statement: PreparedStatement): (((ColumnInfo, Any), Int)) => Unit = {
case ((c, v), i) =>
if (v != null) {
// Both Java and Joda dates used to work in 4.2.3, but now they must be java.sql.Date
val (finalObj, finalType) = v match {
case dt: DateTime => (new Date(dt.getMillis), PDate.INSTANCE.getSqlType)
case d: util.Date => (new Date(d.getTime), PDate.INSTANCE.getSqlType)
case _ => (v, c.getSqlType)
}
statement.setObject(i + 1, finalObj, finalType)
} else {
statement.setNull(i + 1, c.getSqlType)
}
}
private def getIndexTables(conn: Connection, qualifiedTableName: String): List[(String, String)] = {
val table: PTable = PhoenixRuntime.getTable(conn, qualifiedTableName)
val tables = table.getIndexes.asScala.map(x => x.getIndexType match {
case IndexType.LOCAL => (x.getTableName.getString, MetaDataUtil.getLocalIndexTableName(qualifiedTableName))
case _ => (x.getTableName.getString, x.getTableName.getString)
}).toList
tables
}
}
I load the generated HFiles with the HBase bulk load utility as follows:
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles path/to/hfile tableName

You could just convert your CSV file to an RDD of Product and use the .saveToPhoenix method. This is generally how I load CSV data into Phoenix.
Please see: https://phoenix.apache.org/phoenix_spark.html
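For illustration, a minimal sketch of that approach, modeled on the phoenix-spark documentation linked above. The table name, column list, CSV path, and ZooKeeper quorum are placeholders; adjust them to your schema.

import org.apache.spark.SparkContext
import org.apache.phoenix.spark._

// Hypothetical target table: OUTPUT_TABLE(ID BIGINT, COL1 VARCHAR, COL2 INTEGER)
val sc = new SparkContext("local", "phoenix-csv-load")
sc.textFile("data/input.csv")                      // one CSV record per line
  .map(_.split(","))
  .map(r => (r(0).toLong, r(1), r(2).toInt))       // RDD of Product (tuples)
  .saveToPhoenix(
    "OUTPUT_TABLE",
    Seq("ID", "COL1", "COL2"),
    zkUrl = Some("phoenix-server:2181"))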

Related

How to use Akka Streams with Akka HTTP to stream the response

I'm new to Akka Streams. I used the following code for CSV parsing.
class CsvParser(config: Config)(implicit system: ActorSystem) extends LazyLogging with NumberValidation {
import system.dispatcher
private val importDirectory = Paths.get(config.getString("importer.import-directory")).toFile
private val linesToSkip = config.getInt("importer.lines-to-skip")
private val concurrentFiles = config.getInt("importer.concurrent-files")
private val concurrentWrites = config.getInt("importer.concurrent-writes")
private val nonIOParallelism = config.getInt("importer.non-io-parallelism")
def save(r: ValidReading): Future[Unit] = {
Future()
}
def parseLine(filePath: String)(line: String): Future[Reading] = Future {
val fields = line.split(";")
val id = fields(0).toInt
try {
val value = fields(1).toDouble
ValidReading(id, value)
} catch {
case t: Throwable =>
logger.error(s"Unable to parse line in $filePath:\n$line: ${t.getMessage}")
InvalidReading(id)
}
}
val lineDelimiter: Flow[ByteString, ByteString, NotUsed] =
Framing.delimiter(ByteString("\n"), 128, allowTruncation = true)
val parseFile: Flow[File, Reading, NotUsed] =
Flow[File].flatMapConcat { file =>
val src = FileSource.fromFile(file).getLines()
val source : Source[String, NotUsed] = Source.fromIterator(() => src)
// val gzipInputStream = new GZIPInputStream(new FileInputStream(file))
source
.mapAsync(parallelism = nonIOParallelism)(parseLine(file.getPath))
}
val computeAverage: Flow[Reading, ValidReading, NotUsed] =
Flow[Reading].grouped(2).mapAsyncUnordered(parallelism = nonIOParallelism) { readings =>
Future {
val validReadings = readings.collect { case r: ValidReading => r }
val average = if (validReadings.nonEmpty) validReadings.map(_.value).sum / validReadings.size else -1
ValidReading(readings.head.id, average)
}
}
val storeReadings: Sink[ValidReading, Future[Done]] =
Flow[ValidReading]
.mapAsyncUnordered(concurrentWrites)(save)
.toMat(Sink.ignore)(Keep.right)
val processSingleFile: Flow[File, ValidReading, NotUsed] =
Flow[File]
.via(parseFile)
.via(computeAverage)
def importFromFiles = {
implicit val materializer = ActorMaterializer()
val files = importDirectory.listFiles.toList
logger.info(s"Starting import of ${files.size} files from ${importDirectory.getPath}")
val startTime = System.currentTimeMillis()
val balancer = GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
val balance = builder.add(Balance[File](concurrentFiles))
val merge = builder.add(Merge[ValidReading](concurrentFiles))
(1 to concurrentFiles).foreach { _ =>
balance ~> processSingleFile ~> merge
}
FlowShape(balance.in, merge.out)
}
Source(files)
.via(balancer)
.withAttributes(ActorAttributes.supervisionStrategy { e =>
logger.error("Exception thrown during stream processing", e)
Supervision.Resume
})
.runWith(storeReadings)
.andThen {
case Success(_) =>
val elapsedTime = (System.currentTimeMillis() - startTime) / 1000.0
logger.info(s"Import finished in ${elapsedTime}s")
case Failure(e) => logger.error("Import failed", e)
}
}
}
I wanted to use Akka HTTP to return all the ValidReading entities parsed from the CSV, but I couldn't work out how to do that.
The code above fetches a file from the server and parses each line to generate a ValidReading.
How can I upload a CSV via akka-http, parse the file, and stream the result back in the response?
The "essence" of the solution is something like this:
import akka.http.scaladsl.server.Directives._
val route = fileUpload("csv") {
case (metadata, byteSource) =>
val source = byteSource.map(x => x)
complete(HttpResponse(entity = HttpEntity(ContentTypes.`text/csv(UTF-8)`, source)))
}
You detect that the upload is multipart/form-data with a part named "csv". You get the byteSource from that. Do the calculation (insert your logic into the .map(x => x) part). Convert your data back to ByteString. Complete the request with the new source. This makes your endpoint act like a proxy.
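A slightly fuller sketch of that idea, assuming a hypothetical processLine helper that stands in for your ValidReading logic; the part name "csv", the frame length, and the newline delimiter are assumptions:

import akka.http.scaladsl.model._
import akka.http.scaladsl.server.Directives._
import akka.stream.scaladsl.Framing
import akka.util.ByteString

// Stand-in for the real per-line calculation (parsing, validation, ...).
def processLine(line: String): String = line

val route = fileUpload("csv") {
  case (metadata, byteSource) =>
    val processed = byteSource
      .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 1024, allowTruncation = true))
      .map(_.utf8String)
      .map(processLine)                             // do the calculation per line
      .map(result => ByteString(result + "\n"))     // convert back to ByteString
    complete(HttpResponse(entity = HttpEntity(ContentTypes.`text/csv(UTF-8)`, processed)))
}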

Empty iterator: asynchronous Cassandra write

I am trying to implement asynchronous Cassandra writes on objects (not an RDD) using TableWriter. Code snippet below:
class CassandraOperations[T] extends Serializable with Logging {
/**
* Saves the data from an object or an Iterator of objects to a Cassandra table asynchronously, using the specified column names.
* You can check whether this action has completed via a callback on the Future.
*/
def saveToCassandraAsync(
cc: CassandraConnector,
keyspaceName: String,
tableName: String,
columns: ColumnSelector = AllColumns,
data: Iterator[T],
writeConf: WriteConf = WriteConf(ttl = TTLOption.constant(80000)))(implicit rwf: RowWriterFactory[T]): Future[Unit] = {
implicit val ec = ExecutionContext.global
val writer = TableWriter(cc, keyspaceName, tableName, columns, writeConf)
val futureAction = Future(writer.write(TaskContext.get(), data: Iterator[T]))
futureAction
}
}
And then wait using:
Await.result(resultFuture, TIMEOUT seconds)
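For completeness, a small self-contained sketch of that waiting call with the duration import it relies on; resultFuture and TIMEOUT here are placeholders standing in for the Future returned by saveToCassandraAsync and your timeout value:

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

val TIMEOUT = 30                                        // seconds; placeholder value
val resultFuture: Future[Unit] = Future.successful(())  // stands in for the saveToCassandraAsync result
Await.result(resultFuture, TIMEOUT.seconds)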
The data is available when execution reaches the write call on this line:
val futureAction = Future(writer.write(TaskContext.get(), data: Iterator[T]))
But data is empty by the time execution reaches the definition of write(taskContext: TaskContext, data: Iterator[T]):
def write(taskContext: TaskContext, data: Iterator[T]) {
val updater = OutputMetricsUpdater(taskContext, writeConf)
connector.withSessionDo { session =>
val protocolVersion = session.getCluster.getConfiguration.getProtocolOptions.getProtocolVersion
val rowIterator = new CountingIterator(data)
val stmt = prepareStatement(session).setConsistencyLevel(writeConf.consistencyLevel)
val queryExecutor = new QueryExecutor(
session,
writeConf.parallelismLevel,
Some(updater.batchFinished(success = true, _, _, _)),
Some(updater.batchFinished(success = false, _, _, _)))
val routingKeyGenerator = new RoutingKeyGenerator(tableDef, columnNames)
val batchType = if (isCounterUpdate) Type.COUNTER else Type.UNLOGGED
val boundStmtBuilder = new BoundStatementBuilder(
rowWriter,
stmt,
protocolVersion = protocolVersion,
ignoreNulls = writeConf.ignoreNulls)
val batchStmtBuilder = new BatchStatementBuilder(
batchType,
routingKeyGenerator,
writeConf.consistencyLevel)
val batchKeyGenerator = batchRoutingKey(session, routingKeyGenerator) _
val batchBuilder = new GroupingBatchBuilder(
boundStmtBuilder,
batchStmtBuilder,
batchKeyGenerator,
writeConf.batchSize,
writeConf.batchGroupingBufferSize,
rowIterator)
val rateLimiter = new RateLimiter((writeConf.throughputMiBPS * 1024 * 1024).toLong, 1024 * 1024)
logDebug(s"Writing data partition to $keyspaceName.$tableName in batches of ${writeConf.batchSize}.")
for (stmtToWrite <- batchBuilder) {
queryExecutor.executeAsync(stmtToWrite)
assert(stmtToWrite.bytesCount > 0)
rateLimiter.maybeSleep(stmtToWrite.bytesCount)
}
queryExecutor.waitForCurrentlyExecutingTasks()
if (!queryExecutor.successful)
throw new IOException(s"Failed to write statements to $keyspaceName.$tableName.")
val duration = updater.finish() / 1000000000d
logInfo(f"Wrote ${rowIterator.count} rows to $keyspaceName.$tableName in $duration%.3f s.")
if (boundStmtBuilder.logUnsetToNullWarning) {
logWarning(boundStmtBuilder.UnsetToNullWarning)
}
}
}
}
So I see an empty iterator.
Please advise on what the issue might be.
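One property worth keeping in mind when chasing this: a Scala Iterator can be traversed only once, so anything that consumes data before write runs (logging it, counting it, an earlier pass over it) leaves it empty. A minimal illustration, unrelated to Cassandra itself:

val data = Iterator(1, 2, 3)
println(data.toList)      // List(1, 2, 3) -- this traversal consumes the iterator
println(data.isEmpty)     // true: a later consumer, such as write, would see nothing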

Why does my Akka data stream stop processing a huge file (~250,000 lines of strings) but work for a small file?

My stream works for a smaller file of 1,000 lines but stops when I test it on a large file (~12 MB, ~250,000 lines). I tried applying backpressure with a buffer and throttling it, and still the same thing...
Here is my data streamer:
class UserDataStreaming(usersFile: File) {
implicit val system = ActorSystemContainer.getInstance().getSystem
implicit val materializer = ActorSystemContainer.getInstance().getMaterializer
def startStreaming() = {
val graph = RunnableGraph.fromGraph(GraphDSL.create() {
implicit builder =>
val usersSource = builder.add(Source.fromIterator(() => usersDataLines)).out
val stringToUserFlowShape: FlowShape[String, User] = builder.add(csvToUser)
val averageAgeFlowShape: FlowShape[User, (String, Int, Int)] = builder.add(averageUserAgeFlow)
val averageAgeSink = builder.add(Sink.foreach(averageUserAgeSink)).in
usersSource ~> stringToUserFlowShape ~> averageAgeFlowShape ~> averageAgeSink
ClosedShape
})
graph.run()
}
val usersDataLines = scala.io.Source.fromFile(usersFile, "ISO-8859-1").getLines().drop(1)
val csvToUser = Flow[String].map(_.split(";").map(_.trim)).map(csvLinesArrayToUser)
def csvLinesArrayToUser(line: Array[String]) = User(line(0), line(1), line(2))
def averageUserAgeSink[usersSource](source: usersSource) {
source match {
case (age: String, count: Int, totalAge: Int) => println(s"age = $age; Average reader age is: ${Try(totalAge/count).getOrElse(0)} count = $count and total age = $totalAge")
case bad => println(s"Bad case: $bad")
}
}
def averageUserAgeFlow = Flow[User].fold(("", 0, 0)) {
(nums: (String, Int, Int), user: User) =>
var counter: Option[Int] = None
var totalAge: Option[Int] = None
val ageInt = Try(user.age.substring(1, user.age.length-1).toInt)
if (ageInt.isSuccess) {
counter = Some(nums._2 + 1)
totalAge = Some(nums._3 + ageInt.get)
}
else {
counter = Some(nums._2 + 0)
totalAge = Some(nums._3 + 0)
}
//println(counter.get)
(user.age, counter.get, totalAge.get)
}
}
Here is my Main:
object Main {
def main(args: Array[String]): Unit = {
implicit val system = ActorSystemContainer.getInstance().getSystem
implicit val materializer = ActorSystemContainer.getInstance().getMaterializer
val usersFile = new File("data/BX-Users.csv")
println(usersFile.length())
val userDataStreamer = new UserDataStreaming(usersFile)
userDataStreamer.startStreaming()
}
}
It's possible that there is an error related to one row of your CSV file. In that case, the stream fails and stops. Try defining your flow like this:
Flow[String].map { line =>
  csvToUser(line)
}.withAttributes(ActorAttributes.supervisionStrategy {
  case ex: Throwable =>
    log.error("Error parsing row event: {}", ex)
    Supervision.Resume
})
In this case the possible exception is captured and the stream ignores the error and continues.
If you use Supervision.Stop, the stream stops.
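If you would rather apply a resuming decider to the whole stream instead of a single flow, here is a minimal sketch using the classic materializer API that the question already uses (the system name is a placeholder):

import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ActorMaterializerSettings, Supervision}

implicit val system = ActorSystem("csv-streaming")
val decider: Supervision.Decider = _ => Supervision.Resume
implicit val materializer = ActorMaterializer(
  ActorMaterializerSettings(system).withSupervisionStrategy(decider))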

Spark Streaming - Broadcast variable - Case Class

My requirement is to enrich the streaming data with profile information from an HBase table. I was looking to use a broadcast variable. The whole code is enclosed here.
The output of HBase data is as follows
In the Driver node HBaseReaderBuilder
(org.apache.spark.SparkContext#3c58b102,hbase_customer_profile,Some(data),WrappedArray(gender, age),None,None,List()))
In the Worker node
HBaseReaderBuilder(null,hbase_customer_profile,Some(data),WrappedArray(gender, age),None,None,List()))
As you can see, it has lost the SparkContext. When I issue the statement val myRdd = bcdocRdd.map(r => Profile(r._1, r._2, r._3)), I get a NullPointerException:
java.lang.NullPointerException
at it.nerdammer.spark.hbase.HBaseReaderBuilderConversions$class.toSimpleHBaseRDD(HBaseReaderBuilder.scala:83)
at it.nerdammer.spark.hbase.package$.toSimpleHBaseRDD(package.scala:5)
at it.nerdammer.spark.hbase.HBaseReaderBuilderConversions$class.toHBaseRDD(HBaseReaderBuilder.scala:67)
at it.nerdammer.spark.hbase.package$.toHBaseRDD(package.scala:5)
at testPartition$$anonfun$main$1$$anonfun$apply$1$$anonfun$apply$2.apply(testPartition.scala:34)
at testPartition$$anonfun$main$1$$anonfun$apply$1$$anonfun$apply$2.apply(testPartition.scala:33)
object testPartition {
def main(args: Array[String]) : Unit = {
val sparkMaster = "spark://x.x.x.x:7077"
val ipaddress = "x.x.x.x:2181" // Zookeeper
val hadoopHome = "/home/hadoop/software/hadoop-2.6.0"
val topicname = "new_events_test_topic"
val mainConf = new SparkConf().setMaster(sparkMaster).setAppName("testingPartition")
val mainSparkContext = new SparkContext(mainConf)
val ssc = new StreamingContext(mainSparkContext, Seconds(30))
val eventsStream = KafkaUtils.createStream(ssc,"x.x.x.x:2181","receive_rest_events",Map(topicname.toString -> 2))
val docRdd = mainSparkContext.hbaseTable[(String, Option[String], Option[String])]("hbase_customer_profile").select("gender","age").inColumnFamily("data")
println ("docRDD from Driver ",docRdd)
val broadcastedprof = mainSparkContext.broadcast(docRdd)
eventsStream.foreachRDD(dstream => {
dstream.foreachPartition(records => {
println("Broadcasted docRDD - in Worker ", broadcastedprof.value)
val bcdocRdd = broadcastedprof.value
records.foreach(record => {
//val myRdd = bcdocRdd.map(r => Profile(r._1, r._2, r._3))
//myRdd.foreach(println)
val Rows = record._2.split("\r\n")
})
})
})
ssc.start()
ssc.awaitTermination()
}
}
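For context, RDDs (and reader builders that carry a SparkContext) only work on the driver, which is consistent with the NullPointerException above. A hypothetical sketch, not from the original post, of the usual alternative: collect the profile data on the driver and broadcast a plain collection, reusing the docRdd, mainSparkContext and eventsStream names from the code above:

// Hypothetical: materialize the profile rows on the driver, then broadcast a Map.
case class Profile(gender: Option[String], age: Option[String])

val profileMap: Map[String, Profile] = docRdd
  .collect()
  .map { case (id, gender, age) => id -> Profile(gender, age) }
  .toMap
val broadcastedProfiles = mainSparkContext.broadcast(profileMap)

eventsStream.foreachRDD { dstream =>
  dstream.foreachPartition { records =>
    val profiles = broadcastedProfiles.value   // plain Map, safe to use on workers
    records.foreach(record => profiles.get(record._1))  // look up by whatever key matches your row key
  }
}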

Slick code generation for only a single schema

Is there a way to have Slick's code generator generate code for only a single schema, say public? I have extensions that create a whole ton of tables (e.g. postgis, pg_jobman) that make the code Slick generates gigantic.
Use this code with the appropriate values and schema names:
object CodeGenerator {
def outputDir :String =""
def pkg:String =""
def schemaList:String = "schema1,schema2"
def url:String = "dburl"
def fileName:String =""
val user = "dbUsername"
val password = "dbPassword"
val slickDriver="scala.slick.driver.PostgresDriver"
val JdbcDriver = "org.postgresql.Driver"
val container = "Tables"
def generate() = {
val driver: JdbcProfile = buildJdbcProfile
val schemas = createSchemaList
val model = createModel(driver, schemas)
val codegen = new SourceCodeGenerator(model){
// customize Scala table name (table class, table values, ...)
override def tableName = dbTableName => dbTableName match {
case _ => dbTableName+"Table"
}
override def code = {
//imports is copied right out of
//scala.slick.model.codegen.AbstractSourceCodeGenerator
val imports = {
"import scala.slick.model.ForeignKeyAction\n" +
(if (tables.exists(_.hlistEnabled)) {
"import scala.slick.collection.heterogenous._\n" +
"import scala.slick.collection.heterogenous.syntax._\n"
} else ""
) +
(if (tables.exists(_.PlainSqlMapper.enabled)) {
"import scala.slick.jdbc.{GetResult => GR}\n" +
"// NOTE: GetResult mappers for plain SQL are only generated for tables where Slick knows how to map the types of all columns.\n"
} else ""
) + "\n\n" //+ tables.map(t => s"implicit val ${t.model.name.table}Format = Json.format[${t.model.name.table}]").mkString("\n")+"\n\n"
}
val bySchema = tables.groupBy(t => {
t.model.name.schema
})
val schemaFor = (schema: Option[String]) => {
bySchema(schema).sortBy(_.model.name.table).map(
_.code.mkString("\n")
).mkString("\n\n")
}
}
val joins = tables.flatMap( _.foreignKeys.map{ foreignKey =>
import foreignKey._
val fkt = referencingTable.TableClass.name
val pkt = referencedTable.TableClass.name
val columns = referencingColumns.map(_.name) zip
referencedColumns.map(_.name)
s"implicit def autojoin${fkt + name.toString} = (left:${fkt} ,right:${pkt}) => " +
columns.map{
case (lcol,rcol) =>
"left."+lcol + " === " + "right."+rcol
}.mkString(" && ")
})
override def entityName = dbTableName => dbTableName match {
case _ => dbTableName
}
override def Table = new Table(_) {
table =>
// customize table value (TableQuery) name (uses tableName as a basis)
override def TableValue = new TableValue {
override def rawName = super.rawName.uncapitalize
}
// override generator responsible for columns
override def Column = new Column(_){
// customize Scala column names
override def rawName = (table.model.name.table,this.model.name) match {
case _ => super.rawName
}
}
}
}
println(outputDir+"\\"+fileName)
(new File(outputDir)).mkdirs()
val fw = new FileWriter(outputDir+File.separator+fileName)
fw.write(codegen.packageCode(slickDriver, pkg, container))
fw.close()
}
def createModel(driver: JdbcProfile, schemas:Set[Option[String]]): Model = {
driver.simple.Database
.forURL(url, user = user, password = password, driver = JdbcDriver)
.withSession { implicit session =>
val filteredTables = driver.defaultTables.filter(
(t: MTable) => schemas.contains(t.name.schema)
)
PostgresDriver.createModel(Some(filteredTables))
}
}
def createSchemaList: Set[Option[String]] = {
schemaList.split(",").map({
case "" => None
case (name: String) => Some(name)
}).toSet
}
def buildJdbcProfile: JdbcProfile = {
val module = currentMirror.staticModule(slickDriver)
val reflectedModule = currentMirror.reflectModule(module)
val driver = reflectedModule.instance.asInstanceOf[JdbcProfile]
driver
}
}
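A hypothetical invocation, once the connection fields above have been filled in:

object RunCodeGenerator extends App {
  CodeGenerator.generate()
}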
I encountered the same problem and found this question. The answer by S.Karthik sent me in the right direction. However, the code in that answer is slightly outdated and, I think, a bit over-complicated, so I crafted my own solution:
import slick.codegen.SourceCodeGenerator
import slick.driver.JdbcProfile
import slick.model.Model
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext}
val slickDriver = "slick.driver.PostgresDriver"
val jdbcDriver = "org.postgresql.Driver"
val url = "jdbc:postgresql://localhost:5432/mydb"
val outputFolder = "/path/to/src/test/scala"
val pkg = "com.mycompany"
val user = "user"
val password = "password"
object MySourceCodeGenerator {
def run(slickDriver: String, jdbcDriver: String, url: String, outputDir: String,
pkg: String, user: Option[String], password: Option[String]): Unit = {
val driver: JdbcProfile =
Class.forName(slickDriver + "$").getField("MODULE$").get(null).asInstanceOf[JdbcProfile]
val dbFactory = driver.api.Database
val db = dbFactory.forURL(url, driver = jdbcDriver, user = user.orNull,
password = password.orNull, keepAliveConnection = true)
try {
// **1**
val allSchemas = Await.result(db.run(
driver.createModel(None, ignoreInvalidDefaults = false)(ExecutionContext.global).withPinnedSession), Duration.Inf)
// **2**
val publicSchema = new Model(allSchemas.tables.filter(_.name.schema.isEmpty), allSchemas.options)
// **3**
new SourceCodeGenerator(publicSchema).writeToFile(slickDriver, outputDir, pkg)
} finally db.close
}
}
MySourceCodeGenerator.run(slickDriver, jdbcDriver, url, outputFolder, pkg, Some(user), Some(password))
I'll explain what's going on here:
I copied the run function from the SourceCodeGenerator class that's in the slick-codegen library. (I used version slick-codegen_2.10-3.1.1.)
// **1**: In the original code, the generated Model was referenced in a val called m. I renamed that to allSchemas.
// **2**: I created a new Model (publicSchema), using the options from the original model, and using a filtered version of the tables set from the original model. It turns out tables from the public schema don't get a schema name in the model. Hence the isEmpty. Should you need tables from one or more other schemas, you can easily create a different filter expression.
// **3**: I create a SourceCodeGenerator with the created publicSchema model.
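For example, to keep only tables from a hypothetical schema named "myschema" instead of public, step 2 could filter on the schema name like this (same Model API as above):

// Hypothetical: keep only tables whose schema is "myschema".
val mySchema = new Model(
  allSchemas.tables.filter(_.name.schema.exists(_ == "myschema")),
  allSchemas.options)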
Of course, it would be even better if the Slick code generator incorporated an option to select one or more schemas.