create a big Table using HBASE client API - scala

I am working with Google Cloud Bigtable using the HBase client API in Scala. I am trying to create a table with a single column family, but I am getting errors.
Below is the code I wrote:
import com.google.cloud.bigtable.hbase.BigtableConfiguration
import org.apache.hadoop.hbase.{HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.Connection
import org.apache.hadoop.hbase.util.Bytes

object TestBigtable {
  val columnFamilyName = Bytes.toBytes("cf1")

  def createConnection(projectId: String, instanceId: String): Connection = {
    BigtableConfiguration.connect(projectId, instanceId)
  }

  def createTableIfNotExists(connection: Connection, name: String) = {
    val tableName = TableName.valueOf(name)
    val admin = connection.getAdmin()
    if (!admin.tableExists(tableName)) {
      val tableDescriptor = new HTableDescriptor(tableName)
      tableDescriptor.addFamily(new HColumnDescriptor(columnFamilyName))
      admin.createTable(tableDescriptor)
    }
  }

  def runner(projectId: String, instanceId: String, tableName: String) = {
    val createTableConnection = createConnection(projectId, instanceId)
    try {
      createTableIfNotExists(createTableConnection, tableName)
    } finally {
      createTableConnection.close()
    }
  }
}
Once I execute my jar I get the following set of errors:
18/07/25 10:36:20 INFO com.google.cloud.bigtable.grpc.BigtableSession: Bigtable options: BigtableOptions{dataHost=bigtable.googleapis.com, adminHost=bigtableadmin.googleapis.com, port=443, projectId=renault-ftt, instanceId=testfordeletion, appProfileId=, userAgent=hbase-1.4.3, credentialType=DefaultCredentials, dataChannelCount=4, retryOptions=RetryOptions{retriesEnabled=true, allowRetriesWithoutTimestamp=false, statusToRetryOn=[UNAUTHENTICATED, ABORTED, DEADLINE_EXCEEDED, UNAVAILABLE], initialBackoffMillis=5, maxElapsedBackoffMillis=60000, backoffMultiplier=2.0, streamingBufferSize=60, readPartialRowTimeoutMillis=60000, maxScanTimeoutRetries=3}, bulkOptions=BulkOptions{asyncMutatorCount=2, useBulkApi=true, bulkMaxKeyCount=125, bulkMaxRequestSize=1048576, autoflushMs=0, maxInflightRpcs=40, maxMemory=97307852, enableBulkMutationThrottling=false, bulkMutationRpcTargetMs=100}, callOptionsConfig=CallOptionsConfig{useTimeout=false, shortRpcTimeoutMs=60000, longRpcTimeoutMs=600000}, usePlaintextNegotiation=false, useCachedDataPool=false}.
18/07/25 10:36:20 INFO com.google.cloud.bigtable.grpc.io.OAuthCredentialsCache: Refreshing the OAuth token
Exception in thread "grpc-default-executor-0" java.lang.IllegalAccessError: tried to access field com.google.protobuf.AbstractMessage.memoizedSize from class com.google.bigtable.admin.v2.ListTablesRequest
at com.google.bigtable.admin.v2.ListTablesRequest.getSerializedSize(ListTablesRequest.java:236)
at io.grpc.protobuf.lite.ProtoInputStream.available(ProtoInputStream.java:108)
at io.grpc.internal.MessageFramer.getKnownLength(MessageFramer.java:204)
at io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:136)
at io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:52)
at io.grpc.internal.DelayedStream$5.run(DelayedStream.java:218)
at io.grpc.internal.DelayedStream.drainPendingCalls(DelayedStream.java:132)
at io.grpc.internal.DelayedStream.setStream(DelayedStream.java:101)
at io.grpc.internal.DelayedClientTransport$PendingStream.createRealStream(DelayedClientTransport.java:361)
at io.grpc.internal.DelayedClientTransport$PendingStream.access$300(DelayedClientTransport.java:344)
at io.grpc.internal.DelayedClientTransport$5.run(DelayedClientTransport.java:302)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Could anyone help me with this, please?

Reposting the comment from Solomon as an answer:
io.grpc.protobuf.lite is in the stack. The Cloud Bigtable client was
never tested with protobuf lite. A dependency graph would help. As a
quick fix, you can also try the bigtable-hbase-1.x-shaded artifact
instead of the bigtable-hbase-1.x artifact.
It's possible that your use of io.grpc.protobuf.lite is causing issues. As I understand it, io.grpc.protobuf.lite is mainly for use on Android clients.
Using the shaded artifact should prevent dependency conflicts at the cost of a larger JAR size and a potentially larger memory footprint. You may also want to review these similar issue reports and how they were resolved:
https://groups.google.com/forum/#!topic/protobuf/_Yq0Dar_jhk
https://github.com/grpc/grpc-java/issues/2300
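If you are building with sbt, swapping in the shaded artifact looks something like this (the group and artifact IDs are the published Cloud Bigtable ones; the version shown is only illustrative, keep whatever release you already use):
libraryDependencies += "com.google.cloud.bigtable" % "bigtable-hbase-1.x-shaded" % "1.4.0" // illustrative version
// and drop the unshaded "bigtable-hbase-1.x" dependency it replaces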

Related

java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to com.streamsets.pipeline.api.Record

I am trying to run a sample application locally using:
Scala (2.11) and Spark (2.3.0) with StreamSets API version 3.8.0.
(I am trying to run a spark transformation as described in this tutorial: https://github.com/streamsets/tutorials/blob/master/tutorial-spark-transformer-scala/readme.md )
First I create a JavaRDD[Record], something like:
val testrecord = spark.read.json("...path to json file").toJavaRDD.asInstanceOf[JavaRDD[Record]]
Then I pass this JavaRDD[Record] to the transform method in DTStream class:
new DTStream().transform(testrecord)
The Transform method in the DTStream class itself is very simple:
override def transform(javaRDD: JavaRDD[Record]): TransformResult = {
  val recordRDD = javaRDD.rdd
  // Just trying to pass each incoming record through as an outgoing record - no transformation at all.
  val resultMessage = recordRDD.map((record) => record)
  new TransformResult(resultMessage.toJavaRDD, error) // where error is already defined as a JavaPairRDD
}
When I try this simple code out, I am getting the following exception exactly at this line:
val resultMessage = recordRDD.map((record) => record)
java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to com.streamsets.pipeline.api.Record.
Any pointers as to why I may be getting this and how to resolve?
Thanks in advance.
Note: Record is datacollector-api/Record : https://github.com/streamsets/datacollector-api/blob/master/src/main/java/com/streamsets/pipeline/api/Record.java
I don't think you can run the sample application in an IDE - you have to do so within StreamSets Data Collector itself as detailed in the tutorial.
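As background: inside Data Collector the transform method is handed real Record objects, whereas spark.read.json produces Spark Rows, and asInstanceOf only changes the static type of the RDD - it does not convert the elements. A small check (the SparkSession settings and the path are made up) makes this visible:
import org.apache.spark.sql.SparkSession

// Illustration only: inspect what spark.read.json actually hands back.
val spark = SparkSession.builder().master("local[*]").appName("row-check").getOrCreate()
val df = spark.read.json("/tmp/sample.json")
df.rdd.take(1).foreach(row => println(row.getClass.getName))
// prints org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema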

Spray, Slick, Spark - OutOfMemoryError: PermGen space

I have successfully implemented a simple web service using Spray and Slick that passes an incoming request through a Spark ML prediction pipeline. Everything was working fine until I tried to add a data layer. I chose Slick since it seems to be popular.
However, I can't quite get it to work right. I have been basing most of my code on the Hello-Slick Activator Template. I use a DAO object like so:
object DataDAO {
  val datum = TableQuery[Datum]

  def dbInit = {
    val db = Database.forConfig("h2mem1")
    try {
      Await.result(db.run(DBIO.seq(
        datum.schema.create
      )), Duration.Inf)
    } finally db.close
  }

  def insertData(data: Data) = {
    val db = Database.forConfig("h2mem1")
    try {
      Await.result(db.run(DBIO.seq(
        datum += data,
        datum.result.map(println)
      )), Duration.Inf)
    } finally db.close
  }
}

case class Data(data1: String, data2: String)

class Datum(tag: Tag) extends Table[Data](tag, "DATUM") {
  def data1 = column[String]("DATA_ONE", O.PrimaryKey)
  def data2 = column[String]("DATA_TWO")
  def * = (data1, data2) <> (Data.tupled, Data.unapply)
}
I initialize my database in my Boot object
object Boot extends App {
  implicit val system = ActorSystem("raatl-demo")
  Classifier.initializeData
  DataDAO.dbInit
  // More service initialization code ...
}
I try to add a record to my database before completing the service request
val predictionRoute = {
  path("data") {
    get {
      parameter('q) { query =>
        // do Spark stuff to get prediction
        DataDAO.insertData(data)
        respondWithMediaType(`application/json`) {
          complete {
            DataJson(data1, data2)
          }
        }
      }
    }
  }
}
When I send a request to my service my application crashes
java.lang.OutOfMemoryError: PermGen space
I suspect I'm implementing the Slick API incorrectly. It's hard to tell from the documentation, because it stuffs all the operations into a main method.
Finally, my configuration is the same as in the Activator template:
h2mem1 = {
  url = "jdbc:h2:mem:raatl"
  driver = org.h2.Driver
  connectionPool = disabled
  keepAliveConnection = true
}
Has anyone encountered this before? I'm using Slick 3.1
java.lang.OutOfMemoryError: PermGen space is normally not a problem caused by your usage; here is what Oracle says about it:
The detail message PermGen space indicates that the permanent generation is full. The permanent generation is the area of the heap where class and method objects are stored. If an application loads a very large number of classes, then the size of the permanent generation might need to be increased using the -XX:MaxPermSize option.
I do not think this is because of an incorrect implementation of the Slick API. This probably happens because you are using multiple frameworks that load many classes.
Your options are:
Increase the permanent generation size with -XX:MaxPermSize
Upgrade to Java 8, where the permanent generation is replaced by Metaspace, which is sized automatically
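If you launch through sbt, one way to apply the first option is to fork the JVM and pass the flag in build.sbt (a sketch; the value is illustrative and only relevant on Java 7 or earlier):
// build.sbt: fork the application JVM and raise the permanent generation cap
fork := true
javaOptions += "-XX:MaxPermSize=256m"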

Play reports that it can't get ClosableLazy value after it has been closed

I am trying to run a specification test in a Play/Scala/ReactiveMongo project. The setup is like this:
class FeaturesSpec extends Specification {
  "Features controller" should {
    "create feature from JSON request" in withMongoDb { app =>
      // do test
    }
  }
}
With MongoDbFixture as follows:
object MongoDBTestUtils {
  def withMongoDb[T](block: Application => T): T = {
    implicit val app = FakeApplication(
      additionalConfiguration = Map("mongodb.uri" -> "mongodb://localhost/unittests")
    )
    running(app) {
      def db = ReactiveMongoPlugin.db
      try {
        block(app)
      } finally {
        dropAll(db)
      }
    }
  }

  def dropAll(db: DefaultDB) = {
    Await.ready(Future.sequence(Seq(
      db.collection[JSONCollection]("features").drop()
    )), 2 seconds)
  }
}
When the test runs, the logs are pretty noisy and complain about a resource already being closed. Although the tests work correctly, this is weird and I would like to know why it occurs and how to fix it.
Error:
[info] application - ReactiveMongoPlugin stops, closing connections...
[warn] play - Error stopping plugin
java.lang.IllegalStateException: Can't get ClosableLazy value after it has been closed
at play.core.ClosableLazy.get(ClosableLazy.scala:49) ~[play_2.11-2.3.7.jar:2.3.7]
at play.api.libs.concurrent.AkkaPlugin.applicationSystem(Akka.scala:71) ~[play_2.11-2.3.7.jar:2.3.7]
at play.api.libs.concurrent.Akka$$anonfun$system$1.apply(Akka.scala:29) ~[play_2.11-2.3.7.jar:2.3.7]
at play.api.libs.concurrent.Akka$$anonfun$system$1.apply(Akka.scala:29) ~[play_2.11-2.3.7.jar:2.3.7]
at scala.Option.map(Option.scala:145) [scala-library-2.11.4.jar:na]
The exception means that you are using the ReactiveMongo plugin after the application has stopped.
You might want to try using Around:
class withMongoDb extends Around with Scope {
  val db = ReactiveMongoPlugin.db
  override def around[T: AsResult](t: => T): Result = try {
    val res = t
    AsResult.effectively(res)
  } finally {
    ...
  }
}
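You would then presumably use it from the spec like this (a sketch, mirroring the same pattern as Play's WithApplication):
"create feature from JSON request" in new withMongoDb {
  // test body goes here, with db available from the scope
}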
You should also take a look at Flapdoodle Embedded Mongo; with that you don't have to delete databases after testing, IIRC.
This problem likely occurs because your test exercises code that references a closed MongoDB instance. After each Play Specs2 test runs, the MongoDB connection is reset; thus your first test may pass, but a subsequent test may hold a stale reference to the closed instance and, as a result, fail.
One way to solve this issue is to ensure the following criteria are met in your application:
Avoid using val or lazy val for MongoDb database resources
(Re)Initialize all database references on application start.
I wrote up a blog post that describes a solution to the problem within the context of a Play Controller.
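A minimal sketch of the first point, using the same plugin API as in the question (the object name MongoAccess is made up):
import play.api.Application
import play.modules.reactivemongo.ReactiveMongoPlugin
import reactivemongo.api.DefaultDB

// Resolve the database on each use (a def) instead of caching it in a val or
// lazy val, so a stopped-and-restarted application never hands back a closed
// connection.
object MongoAccess {
  def db(implicit app: Application): DefaultDB = ReactiveMongoPlugin.db
}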

Run Scales XML's pullXml in weblogic server

I used pullXml because the file is quite large (> 100 MB). I wrote a sample program as follows:
import java.io.{FileInputStream, InputStreamReader}
import org.xml.sax.InputSource
import scales.utils._
import ScalesUtils._
import scales.xml._
import ScalesXml._

object TestXml {
  val mc = new java.math.MathContext(1024)
  val zero = BigDecimal(0, mc)

  def calculate(infile: String, encoding: String): BigDecimal = {
    val inStream = new FileInputStream(infile)
    val pull = pullXml(new InputSource(new InputStreamReader(inStream, encoding)))
    val ns = Namespace("urn:abcaus.onair.sintecmedia.com")
    val qnames = List(ns("ePGResp"), "EPGResponse"l, "Event"l)
    def eventStream = iterate(qnames, pull).toStream
    var count = zero
    eventStream foreach { event => count += eventId(event) }
    inStream.close
    count
  }

  def eventId(event: XmlPath): Long =
    text(event.\*("EventID")).toLong

  def main(args: Array[String]): Unit = args.toList match {
    case infile :: encoding :: Nil => println(calculate(infile, encoding))
    case _ => println("usage: scala -cp classpath au.net.abc.epg.TestLoadXml infile encoding")
  }
}
I run the program on a command line as follows:
$JAVA_HOME/bin/java -cp JarContainsSampleProgram.jar:scala-library-2.10.2.jar:scala-reflect-2.10.2.jar:scalalogging-slf4j_2.10-1.0.1.jar:scalaz-core_2.10-7.0.0.jar:scalaz-effect_2.10-7.0.0.jar:scalaz-iterv_2.10-7.0.0.jar:scales-xml_2.10-0.6.0-M1.jar:slf4j-api-1.6.4.jar TestLoadXml /home/wonga4d/EPG/Huge.xml utf-16
It runs successfully returning a value, say, 879452677392.
However, when I deploy it as an Oracle Service Bus Java callout (which is fine because Scala is a JVM language) to be used by an OSB proxy, still using the same input file and encoding, I get the following error:
Callout to java method "public static scala.math.BigDecimal au.net.abc.epg.TestLoadXml.calculate(java.lang.String,java.lang.String)" resulted in exception: Got an event (Text()) that should not be in the prolog java.lang.RuntimeException: Got an event (Text()) that should not be in the prolog
at scala.sys.package$.error(package.scala:27)
at scala.Predef$.error(Predef.scala:142)
at scales.utils.package$.error(package.scala:19)
at scales.xml.parser.pull.PullUtils$$anonfun$getMisc$1.apply(PullIterator.scala:144)
at scales.xml.parser.pull.PullUtils$$anonfun$getMisc$1.apply(PullIterator.scala:141)
at scala.util.Either.fold(Either.scala:97)
at scales.xml.parser.pull.PullUtils$.getMisc(PullIterator.scala:141)
at scales.xml.parser.pull.XmlPull$class.start(PullIterator.scala:89)
at scales.xml.parser.pull.XmlPulls$$anon$1.start(XmlPull.scala:134)
at scales.xml.parser.pull.XmlPulls$$anon$1.<init>(XmlPull.scala:156)
at scales.xml.parser.pull.XmlPulls$class.pullXml(XmlPull.scala:134)
at scales.xml.package$.pullXml(package.scala:7)
at TestXml$.calculate(TestLoadXml.scala:23)
at TestXml.calculate(TestLoadXml.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at stages.transform.runtime.JavaCalloutRuntimeStep$1.run(JavaCalloutRuntimeStep.java:173)
at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:363)
at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:146)
at weblogic.security.Security.runAs(Security.java:61)
at stages.transform.runtime.JavaCalloutRuntimeStep.processMessage(JavaCalloutRuntimeStep.java:195)
at com.bea.wli.sb.pipeline.debug.DebuggerRuntimeStep.processMessage(DebuggerRuntimeStep.java:74)
at com.bea.wli.sb.stages.StageMetadataImpl$WrapperRuntimeStep.processMessage(StageMetadataImpl.java:346)
at com.bea.wli.sb.pipeline.PipelineStage.processMessage(PipelineStage.java:84)
It fails at pullXml, but it always succeeds when run on the command line as shown above. If I use loadXml instead of pullXml, it always succeeds, even when running in a WebLogic server, but loadXml has a problem loading a huge XML file. Both the pullXml and loadXml methods are located in the same jar, scales-xml_2.10-0.6.0-M1.jar.
I just wonder if anyone has ever used Scales XML in a WebLogic server. It sounds like I have to give up on Scales XML if WebLogic Server is the execution environment.
Thanks
The fact that your code runs outside of WebLogic shows it's a javax.xml implementation issue. It's possible that this only needs you to supply an alternate implementation (e.g. aalto-xml) with a "child first" or application-first classloader setting.
If you could let me know (answering here is cool) if that sorts things out that would be great, I'll add it to the docs.
Outside of this issue I hope Scales is working out for you :)
Cheers,
Chris
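One way to verify the classloading theory (a diagnostic sketch; the object name StaxCheck is made up) is to log which StAX implementation the callout's classloader actually resolves and compare it with the command-line run:
import javax.xml.stream.XMLInputFactory

object StaxCheck {
  // Report the concrete javax.xml.stream implementation and the classloader
  // that provided it, so a WebLogic-supplied parser shadowing the expected
  // one becomes visible.
  def report(): String = {
    val factory = XMLInputFactory.newInstance()
    s"XMLInputFactory = ${factory.getClass.getName}, loaded by ${factory.getClass.getClassLoader}"
  }

  def main(args: Array[String]): Unit = println(report())
}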

Jetty works for HTTP but not HTTPS

I am trying to create a Jetty consumer. I am able to get it running successfully using the endpoint URI:
jetty:http://0.0.0.0:8080
However, when I modify the endpoint URI for HTTPS:
jetty:https://0.0.0.0:8443
the page times out trying to load. This seems odd because the Camel documentation states it should work right out of the box.
I have since loaded a signed SSL certificate into Java's default keystore; my attempted implementation, based on http://camel.apache.org/jetty.html, is below.
I have a basic Jetty instance using the akka-camel library with Akka and Scala, e.g.:
class RestActor extends Actor with Consumer {
  val ksp: KeyStoreParameters = new KeyStoreParameters()
  ksp.setPassword("...")
  val kmp: KeyManagersParameters = new KeyManagersParameters()
  kmp.setKeyStore(ksp)
  val scp: SSLContextParameters = new SSLContextParameters()
  scp.setKeyManagers(kmp)
  val jettyComponent: JettyHttpComponent = CamelExtension(context.system).context.getComponent("jetty", classOf[JettyHttpComponent])
  jettyComponent.setSslContextParameters(scp)

  def endpointUri = "jetty:https://0.0.0.0:8443/"

  def receive = {
    case msg: CamelMessage => {
      ...
    }
    ...
  }
  ...
}
This resulted in some progress: the page does not time out anymore, but instead gives a "The connection was interrupted" error. I am not sure where to go from here, because Camel is not throwing any exceptions, but rather failing silently somewhere (apparently).
Does anybody know what would cause this behavior?
When using Java's keytool I did not specify an output file. It didn't throw back an error, so the certificate probably went somewhere. I created a new keystore and explicitly imported my .crt into it. I then explicitly set the file path to that keystore in the code, and everything works now!
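For reference, the import step looked roughly like this (the alias, certificate file and keystore path are placeholders):
keytool -importcert -alias myservice -file myservice.crt -keystore /path/to/keystore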
If I had to speculate, it is possible things failed silently because I was adding the certificates to Jetty's general bank of certificates to use if eligible, instead of explicitly binding one as the SSL keystore for the endpoint.
class RestActor extends Actor with Consumer {
  val ksp: KeyStoreParameters = new KeyStoreParameters()
  ksp.setResource("/path/to/keystore")
  ksp.setPassword("...")
  val kmp: KeyManagersParameters = new KeyManagersParameters()
  kmp.setKeyStore(ksp)
  val scp: SSLContextParameters = new SSLContextParameters()
  scp.setKeyManagers(kmp)
  val jettyComponent: JettyHttpComponent = CamelExtension(context.system).context.getComponent("jetty", classOf[JettyHttpComponent])
  jettyComponent.setSslContextParameters(scp)

  def endpointUri = "jetty:https://0.0.0.0:8443/"

  def receive = {
    case msg: CamelMessage => {
      ...
    }
    ...
  }
  ...
}
Hopefully somebody in the future can find this code useful as a template for implementing Jetty over SSL with akka-camel (surprisingly, no examples seem to exist).