How to query Flink's queryable state - Scala

I am using Flink 1.8.0 and I am trying to query my job state.
val descriptor = new ValueStateDescriptor("myState", Types.CASE_CLASS[Foo])
descriptor.setQueryable("my-queryable-State")
I used port 9067, which is the default port according to this. My client:
val client = new QueryableStateClient("127.0.0.1", 9067)
val jobId = JobID.fromHexString("d48a6c980d1a147e0622565700158d9e")
val execConfig = new ExecutionConfig
val descriptor = new ValueStateDescriptor("my-queryable-State", Types.CASE_CLASS[Foo])
val res: Future[ValueState[Foo]] = client.getKvState(jobId, "my-queryable-State","a", BasicTypeInfo.STRING_TYPE_INFO, descriptor)
res.map(_.toString).pipeTo(sender)
but I am getting:
[ERROR] [06/25/2019 20:37:05.499] [bvAkkaHttpServer-akka.actor.default-dispatcher-5] [akka.actor.ActorSystemImpl(bvAkkaHttpServer)] Error during processing of request: 'org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:9067'. Completing with 500 Internal Server Error response. To change default exception handling behavior, provide a custom ExceptionHandler.
java.util.concurrent.CompletionException: org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:9067
What am I doing wrong?
How and where should I define QueryableStateOptions?

If you want to use queryable state, you need to add the proper JAR to your Flink setup. The JAR is flink-queryable-state-runtime; it can be found in the opt folder of your Flink distribution, and you should move it to the lib folder.
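For example (the exact JAR name depends on your Flink and Scala versions; the name below assumes Flink 1.8.0 built for Scala 2.11):
cp opt/flink-queryable-state-runtime_2.11-1.8.0.jar lib/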
As for the second question, QueryableStateOptions is just a class that is used to create static ConfigOption definitions. The definitions are then used to read the configuration from the flink-conf.yaml file. So currently the only way to configure queryable state is through the flink-conf.yaml file in the Flink distribution.
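For illustration, the relevant flink-conf.yaml entries might look like this (key names as documented for Flink 1.8; the two ports shown are the defaults, and queryable-state.enable is off unless set):
queryable-state.enable: true
queryable-state.proxy.ports: 9069
queryable-state.server.ports: 9067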
EDIT: Also, try reading this; it provides more info on how queryable state works. You shouldn't really connect directly to the server port; rather, you should use the proxy port, which by default is 9069.
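Putting it together, here is a minimal sketch of a corrected client (the stand-in case class Foo and the blocking join() call are assumptions for illustration; the job id, state name, and key are taken from the question):
import org.apache.flink.api.common.JobID
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.api.common.typeinfo.BasicTypeInfo
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.typeutils.Types
import org.apache.flink.queryablestate.client.QueryableStateClient

case class Foo(value: String) // stand-in for the question's case class

// Connect to the queryable state proxy (default port 9069), not the server port 9067.
val client = new QueryableStateClient("127.0.0.1", 9069)
val jobId = JobID.fromHexString("d48a6c980d1a147e0622565700158d9e")
val descriptor = new ValueStateDescriptor("myState", Types.CASE_CLASS[Foo])

// getKvState returns a Java CompletableFuture holding the state for key "a".
val resultFuture = client.getKvState(jobId, "my-queryable-State", "a", BasicTypeInfo.STRING_TYPE_INFO, descriptor)
val state: ValueState[Foo] = resultFuture.join() // blocks until the result arrives
println(state.value())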

Authenticate with ECE ElasticSearch Sink from Apache Flink (Scala code)

Compiler error when using the example provided in the Flink documentation. The Flink documentation provides sample Scala code to set the REST client factory parameters when talking to Elasticsearch: https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/elasticsearch.html.
When trying out this code I get a compiler error in IntelliJ which says "Cannot resolve symbol restClientBuilder".
I found the following SO question, which is exactly my problem, except that it is in Java and I am doing this in Scala.
Apache Flink (v1.6.0) authenticate Elasticsearch Sink (v6.4)
I tried copy-pasting the solution code provided in the above SO answer into IntelliJ; the auto-converted code also has compiler errors.
// provide a RestClientFactory for custom configuration on the internally created REST client
// I only show setMaxRetryTimeoutMillis for illustration purposes; the actual code will use a custom HTTP callback
esSinkBuilder.setRestClientFactory(
  restClientBuilder -> {
    restClientBuilder.setMaxRetryTimeoutMillis(10)
  }
)
Then I tried (auto-generated Java-to-Scala code by IntelliJ):
// provide a RestClientFactory for custom configuration on the internally created REST client
import org.apache.http.auth.AuthScope
import org.apache.http.auth.UsernamePasswordCredentials
import org.apache.http.client.CredentialsProvider
import org.apache.http.impl.client.BasicCredentialsProvider
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder
import org.elasticsearch.client.RestClientBuilder
esSinkBuilder.setRestClientFactory((restClientBuilder) => {
  def foo(restClientBuilder) = restClientBuilder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
    override def customizeHttpClient(httpClientBuilder: HttpAsyncClientBuilder): HttpAsyncClientBuilder = { // elasticsearch username and password
      val credentialsProvider = new BasicCredentialsProvider
      credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(es_user, es_password))
      httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider)
    }
  })
  foo(restClientBuilder)
})
The original code snippet produces the error "cannot resolve RestClientFactory", and the Java-to-Scala conversion shows several other errors.
So basically I need to find a Scala version of the solution described in Apache Flink (v1.6.0) authenticate Elasticsearch Sink (v6.4).
Update 1: I was able to make some progress with some help from IntelliJ. The following code compiles and runs, but there is another problem.
import org.apache.flink.streaming.connectors.elasticsearch6.RestClientFactory // in addition to the imports listed above

esSinkBuilder.setRestClientFactory(
  new RestClientFactory {
    override def configureRestClientBuilder(restClientBuilder: RestClientBuilder): Unit = {
      restClientBuilder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
        override def customizeHttpClient(httpClientBuilder: HttpAsyncClientBuilder): HttpAsyncClientBuilder = {
          // elasticsearch username and password
          val credentialsProvider = new BasicCredentialsProvider
          credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(es_user, es_password))
          httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider)
          httpClientBuilder.setSSLContext(trustfulSslContext)
        }
      })
    }
  }
)
The problem is that I am not sure if I should be doing a new of the RestClientFactory object. What happens is that the application connects to the Elasticsearch cluster but then discovers that the SSL cert is not valid, so I had to put in the trustfulSslContext (as described here: https://gist.github.com/iRevive/4a3c7cb96374da5da80d4538f3da17cb). This got me past the SSL issue, but now the ES REST client does a ping test, the ping fails, it throws an exception, and the app shuts down. I suspect the ping fails because of the SSL error and maybe it is not using the trustfulSslContext I set up as part of the new RestClientFactory. This makes me suspect that I should not have done the new, and that there should be a simple way to update the existing RestClientFactory object; basically this is all happening because of my lack of Scala knowledge.
Happy to report that this is resolved. The code I posted in Update 1 is correct. The ping to ECE was not working for two reasons:
1. The certificate needs to include the complete chain: the root CA, the intermediate CA, and the cert for the ECE. This helped get rid of the whole trustfulSslContext stuff.
2. The ECE was sitting behind an ha-proxy, and the proxy mapped the hostname in the HTTP request to the actual deployment cluster name in ECE. This mapping logic did not take into account that the Java REST high-level client uses the org.apache.http.HttpHost class, which renders the hostname as hostname:port_number even when the port number is 443. Since the mapping was not found because of the :443 suffix, the ECE returned a 404 error instead of 200 OK (the only way to find this was to look at unencrypted packets at the ha-proxy). Once the mapping logic in the ha-proxy was fixed, the mapping was found and the pings are now successful.
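For illustration, a minimal sketch of the HttpHost behaviour mentioned above (the hostname is hypothetical):
import org.apache.http.HttpHost

val host = new HttpHost("ece.example.com", 443, "https")
// toHostString keeps the explicit port even for 443, e.g. "ece.example.com:443",
// which is the form the ha-proxy mapping needed to account for
println(host.toHostString)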

Spark Scala on a Windows machine

I am learning from a class. I have run the code as shown in the class and I get the errors below. Any idea what I should do?
I have spark 1.6.1 and Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_74)
val datadir = "C:/Personal/V2Maestros/Courses/Big Data Analytics with Spark/Scala"
//............................................................................
//// Building and saving the model
//............................................................................
val tweetData = sc.textFile(datadir + "/movietweets.csv")
tweetData.collect()
def convertToRDD(inStr : String) : (Double, String) = {
  val attList = inStr.split(",")
  val sentiment = attList(0).contains("positive") match {
    case true => 0.0
    case false => 1.0
  }
  return (sentiment, attList(1))
}
val tweetText=tweetData.map(convertToRDD)
tweetText.collect()
//val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
var ttDF = sqlContext.createDataFrame(tweetText).toDF("label","text")
ttDF.show()
The error is:
scala> ttDF.show()
[Stage 2:> (0 + 2) / 2]16/03/30 11:40:25 ERROR ExecutorClassLoader: Failed to check existence of class org.apache.spark.sql.catalyst.expressio
REPL class server at http://192.168.56.1:54595
java.net.ConnectException: Connection timed out: connect
at java.net.TwoStacksPlainSocketImpl.socketConnect(Native Method)
I'm no expert, but the connection IP in the error message looks like a private node or even your router/modem local address.
As stated in the comment, it could be that you're running the context with a wrong configuration that tries to spread the work to a cluster that isn't there, instead of running in your local JVM process.
For further information you can read here and experiment with something like:
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(master = "local[4]", appName = "tweetsClass", conf = new SparkConf)
Update
Since you're using the interactive shell and the provided SparkContext available there, I guess you should pass the equivalent parameters to the shell command as in
<your-spark-path>/bin/spark-shell --master local[4]
This instructs the driver to use a local Spark master running in-process, with 4 threads.
I think the problem comes from connectivity and not from within the code.
Check if you can actually connect to this address and port (54595).
Probably your Spark master is not accessible at the specified port. Use local[*] to validate using a smaller dataset and a local master. Then, check if the port is accessible or change it based on the Spark port configuration (http://spark.apache.org/docs/latest/configuration.html).
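For example, in Spark 1.x you could pin the REPL class server port and make sure your firewall allows it (spark.replClassServer.port is a Spark 1.x property; this is a sketch, not a definitive fix):
<your-spark-path>/bin/spark-shell --master local[*] --conf spark.replClassServer.port=54595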

How to handle a 'Configuration error[Cannot connect to database [...]]'

I am implementing a web service with the Play Framework that uses multiple databases. All databases are configured in conf/application.conf by specifying the db.database1..., db.database2... properties.
At startup, Play will try to establish connections to all databases configured there, and if one connection fails, the service will not start.
In my case, not all databases are necessary to start the web service, but the web service can still run with limited functionality, if some databases are not available. Since not all databases are under my control, it is crucial for my web service to handle a connection error.
Therefore my question:
Is there a way to either
handle the connection error by overriding some 'onError' method or inserting a try-catch at the right place, or
manually create the DataSources at runtime to handle the error when they are created?
I would prefer solution 2.
I am using play version 2.4.2 with scala version 2.11.7.
Since the whole exception fills multiple pages, I only insert the first lines here:
CreationException: Unable to create injector, see the following errors:
1) Error in custom provider, Configuration error: Configuration error[Cannot connect to database [foo]]
while locating play.api.db.DBApiProvider
while locating play.api.db.DBApi
for field at play.api.db.NamedDatabaseProvider.dbApi(DBModule.scala:80)
while locating play.api.db.NamedDatabaseProvider
at com.google.inject.util.Providers$GuicifiedProviderWithDependencies.initialize(Providers.java:149)
at play.api.db.DBModule$$anonfun$namedDatabaseBindings$1.apply(DBModule.scala:34):
Binding(interface play.api.db.Database qualified with QualifierInstance(#play.db.NamedDatabase(value=appstate)) to ProviderTarget(play.api.db.NamedDatabaseProvider#1a7884c6)) (via modules: com.google.inject.util.Modules$OverrideModule -> play.api.inject.guice.GuiceableModuleConversions$$anon$1)
Caused by: Configuration error: Configuration error[Cannot connect to database [foo]]
at play.api.Configuration$.configError(Configuration.scala:178)
at play.api.Configuration.reportError(Configuration.scala:829)
at play.api.db.DefaultDBApi$$anonfun$connect$1.apply(DefaultDBApi.scala:48)
at play.api.db.DefaultDBApi$$anonfun$connect$1.apply(DefaultDBApi.scala:42)
at scala.collection.immutable.List.foreach(List.scala:381)
at play.api.db.DefaultDBApi.connect(DefaultDBApi.scala:42)
at play.api.db.DBApiProvider.get$lzycompute(DBModule.scala:72)
I remember that there is a GlobalSettings configuration file to catch errors when the application starts.
Take a look here: https://www.playframework.com/documentation/2.0/ScalaGlobal I know you are using a more recent Play version, but it will give you a general idea of how it works.
In Play 2.4.x this file was removed in favour of dependency injection (https://www.playframework.com/documentation/2.4.x/GlobalSettings).
To solve my problem, I ended up writing my own wrapper for BoneCP. This way the initialization of the connection pools was in my hands and I could handle connection errors.
Instead of using the db prefix, I use the prefix database (so that Play will not automatically parse its content) in a new config file database.conf.
import java.io.File
import java.sql.Connection

import com.jolbox.bonecp.{BoneCP, BoneCPConfig}
import com.typesafe.config.{ConfigFactory, ConfigObject}

object ConnectionPool {

  private var connectionPools = Map.empty[String, BoneCP]

  val config = ConfigFactory.parseFile(new File("conf/database.conf"))

  private def dbConfig(dbId: String): ConfigObject = {
    config.getObject("database." + dbId).asInstanceOf[ConfigObject]
  }

  def createConnectionPool(dbId: String): BoneCP = {
    val dbConf = dbConfig(dbId)
    val cpConfig: BoneCPConfig = new BoneCPConfig()
    cpConfig.setJdbcUrl(dbConf.get("url").unwrapped().toString)
    cpConfig.setUsername(dbConf.get("user").unwrapped().toString)
    cpConfig.setPassword(dbConf.get("password").unwrapped().toString)
    new BoneCP(cpConfig)
  }

  def getConnectionPool(dbId: String): BoneCP = {
    if (!connectionPools.contains(dbId)) {
      val cp = createConnectionPool(dbId)
      connectionPools = connectionPools + (dbId -> cp)
    }
    connectionPools(dbId)
  }

  def getConnection(dbId: String): Connection = {
    getConnectionPool(dbId).getConnection()
  }

  def withConnection[T](dbId: String)(fun: Connection => T): T = {
    val conn = getConnection(dbId)
    try fun(conn)
    finally conn.close() // close the connection even if fun throws
  }
}
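For illustration, usage might look like this (the database id and query are hypothetical):
val userCount: Int = ConnectionPool.withConnection("database1") { conn =>
  val rs = conn.createStatement().executeQuery("SELECT COUNT(*) FROM users")
  rs.next()
  rs.getInt(1)
}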

Handling connection failures in apache-camel

I am writing an apache-camel RabbitMQ consumer. I would like to react somehow to connection problems (i.e. try to reconnect). Is it possible to configure apache-camel to automatically reconnect?
If not, how can I find out that a connection to the queue was interrupted? I've done the following test:
start the queue (and some producer)
start my consumer (it was getting messages as expected)
stop the queue (the messages stopped arriving, as expected, but no exception was thrown)
start the queue (no new messages were received)
I am using Camel in Scala (via akka-camel), but a Java solution would probably also be OK.
You can pass the flag automaticRecoveryEnabled=true in the URI; Camel will then reconnect if the connection is lost.
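For example, a minimal sketch of a consumer endpoint URI (the host, exchange, and queue names are hypothetical):
// camel-rabbitmq endpoint; automaticRecoveryEnabled=true tells the underlying
// RabbitMQ Java client to re-establish the connection after it is lost
val uri = "rabbitmq://localhost:5672/my-exchange?queue=my-queue&automaticRecoveryEnabled=true"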
For automatic RabbitMQ resource recovery (Connections/Channels/Consumers/Queues/Exchanges/Bindings) when failures occur, check out Lyra (which I authored). Example usage:
Config config = new Config()
.withRecoveryPolicy(new RecoveryPolicy()
.withMaxAttempts(20)
.withInterval(Duration.seconds(1))
.withMaxDuration(Duration.minutes(5)));
ConnectionOptions options = new ConnectionOptions().withHost("localhost");
Connection connection = Connections.create(options, config);
The rest of the API is just the amqp-client API, except your resources are automatically recovered when failures occur.
I'm not sure about camel-rabbitmq specifically, but hopefully there's a way you can swap in your own resource creation via Lyra.
The current camel-rabbitmq just creates a connection and the channel when the consumer or producer is started, so it doesn't have a chance to catch the connection exception :(.

Programmatically flush JBoss 4.2.2 connection pool

I'm running JBoss 4.2.2. I'm trying to determine the correct code to both:
Look up the org.jboss.resource.connectionmanager.JBossManagedConnectionPool
Perform a flush() operation on said pool.
I've found a couple of other questions out there with no answers. I'm hoping this doesn't become yet another one of them.
The closest question I've found so far: https://community.jboss.org/message/637784
Here are the basics, using a quick Groovy example.
First, you want jboss-4.2.2/client/jbossall-client.jar in your classpath.
Next, you need the JMX ObjectName of the data source. It may be helpful to find this in the JMX Console at http://localhost:8080/jmx-console/ or however you have it deployed. The string value of the ObjectName is the domain + ":" + the properties.
For example:
The ObjectName is: jboss.jca:name=DefaultDS,service=ManagedConnectionPool.
Next, look up the RMIAdaptor in JNDI. This is the MBeanServer interface that will allow you to invoke the flush operation on the target MBean. Then call the invocation. That's it.
import javax.management.*;
import javax.naming.*;
p = new Properties();
p.put(Context.INITIAL_CONTEXT_FACTORY, "org.jnp.interfaces.NamingContextFactory");
p.put(Context.PROVIDER_URL, "localhost:1099");
ctx = new InitialContext(p);
rmiAdaptor = ctx.lookup("jmx/rmi/RMIAdaptor");
rmiAdaptor.invoke(new ObjectName("jboss.jca:name=DefaultDS,service=ManagedConnectionPool"), "flush", [] as Object[], [] as String[]);
Make sense?
===== Update =====
If you are executing this from inside the JBoss JVM, you don't need the JNDI setup:
import javax.management.*;
import org.jboss.mx.util.MBeanServerLocator;
MBeanServer server = MBeanServerLocator.locateJBoss();
server.invoke(new ObjectName("jboss.jca:name=DefaultDS,service=ManagedConnectionPool"), "flush", [] as Object[], [] as String[]);