Can't connect to the active YARN ResourceManager - Scala

I have a problem getting data from a YARN ResourceManager that is configured for High Availability (RM1 and RM2).
I'm using Scala like this:
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration
val conf: YarnConfiguration = new YarnConfiguration()
val yarnClient: YarnClient = YarnClient.createYarnClient()
yarnClient.init(conf)
yarnClient.start()
val apps = yarnClient.getApplications()
I'm getting an error on the last line:
java.net.ConnectException: Connection refused from "my host" to "RM1 host"
Before the error occurs, the process spends a really long time trying to connect to the RM.
Currently RM1 is not active and RM2 is active.
The YarnConfiguration properties such as yarn.resourcemanager.address.RM1 and yarn.resourcemanager.address.RM2 correspond to the real addresses.
I don't understand why my process keeps trying to connect to the inactive RM.
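One common cause is that the configuration the client picks up never has HA failover enabled, so it only ever tries the first RM. Here is a minimal sketch of setting the standard YARN HA properties explicitly, assuming they are not already coming from yarn-site.xml (the hostnames below are placeholders):
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration
val conf = new YarnConfiguration()
// Enable client-side HA so the client fails over from RM1 to RM2.
conf.setBoolean(YarnConfiguration.RM_HA_ENABLED, true) // yarn.resourcemanager.ha.enabled
conf.set(YarnConfiguration.RM_HA_IDS, "RM1,RM2")       // yarn.resourcemanager.ha.rm-ids
conf.set("yarn.resourcemanager.hostname.RM1", "rm1.example.com") // placeholder
conf.set("yarn.resourcemanager.hostname.RM2", "rm2.example.com") // placeholder
val yarnClient = YarnClient.createYarnClient()
yarnClient.init(conf)
yarnClient.start()
val apps = yarnClient.getApplications() // should now retry against RM2 when RM1 is down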

How to query Flink's queryable state

I am using Flink 1.8.0 and I am trying to query my job's state.
val descriptor = new ValueStateDescriptor("myState", Types.CASE_CLASS[Foo])
descriptor.setQueryable("my-queryable-State")
I used port 9067, which is the default port according to the docs. My client:
val client = new QueryableStateClient("127.0.0.1", 9067)
val jobId = JobID.fromHexString("d48a6c980d1a147e0622565700158d9e")
val execConfig = new ExecutionConfig
val descriptor = new ValueStateDescriptor("my-queryable-State", Types.CASE_CLASS[Foo])
val res: Future[ValueState[Foo]] = client.getKvState(jobId, "my-queryable-State","a", BasicTypeInfo.STRING_TYPE_INFO, descriptor)
res.map(_.toString).pipeTo(sender)
but I am getting:
[ERROR] [06/25/2019 20:37:05.499] [bvAkkaHttpServer-akka.actor.default-dispatcher-5] [akka.actor.ActorSystemImpl(bvAkkaHttpServer)] Error during processing of request: 'org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:9067'. Completing with 500 Internal Server Error response. To change default exception handling behavior, provide a custom ExceptionHandler.
java.util.concurrent.CompletionException: org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:9067
What am I doing wrong?
How and where should I define QueryableStateOptions?
If you want to use queryable state, you need to add the proper JAR to your Flink installation. The JAR is flink-queryable-state-runtime; it can be found in the opt folder of your Flink distribution, and you should move it to the lib folder.
As for the second question, QueryableStateOptions is just a class that is used to create static ConfigOption definitions. The definitions are then used to read the configuration from the flink-conf.yaml file. So currently the only way to configure queryable state is through the flink-conf.yaml file in the Flink distribution.
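For reference, a hedged example of the relevant flink-conf.yaml entries (these key names match Flink 1.8's QueryableStateOptions, but verify them against your version):
queryable-state.enable: true
queryable-state.proxy.ports: 9069
queryable-state.server.ports: 9067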
EDIT: Also, try reading the Queryable State section of the Flink documentation; it provides more info on how queryable state works. You shouldn't connect directly to the server port; rather, you should use the proxy port, which by default is 9069.
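As a minimal sketch of that change against the question's code (only the port differs; jobId and descriptor are as defined above):
// Use the proxy port (9069 by default) instead of the server port (9067).
val client = new QueryableStateClient("127.0.0.1", 9069)
val res: Future[ValueState[Foo]] = client.getKvState(jobId, "my-queryable-State", "a", BasicTypeInfo.STRING_TYPE_INFO, descriptor)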

MongoEngine connects to incorrect database during testing

Context
I'm creating a Flask app connected to MongoDB using MongoEngine via the flask-mongoengine extension. I create my app using the application factory pattern, as specified in the configuration instructions.
Problem
While running tests, I specified a testing database named datazzilla_test, which is passed to the mongo instance via mongo.init_app(app). Even though app.config['MONGODB_DB'] and mongo.app.config['MONGODB_DB'] have the correct value (datazzilla_test), this value is not reflected in the mongo instance. Thus, when I run the assertion assert mongo.get_db().name == mongo.app.config['MONGODB_DB'], this error is triggered: AssertionError: assert 'datazzilla' == 'datazzilla_test'
Question
What am I doing wrong? Why does the database connection persist with the default database datazzilla rather than datazzilla_test? How can I fix it?
Source Code
# __init__.py
from flask import Flask
from flask_mongoengine import MongoEngine

mongo = MongoEngine()

def create_app(config=None):
    app = Flask(__name__)
    app.config['MONGODB_HOST'] = 'localhost'
    app.config['MONGODB_PORT'] = '27017'
    app.config['MONGODB_DB'] = 'datazzilla'
    # override default config
    if config is not None:
        app.config.from_mapping(config)
    mongo.init_app(app)
    return app
# conftest.py
import pytest
from app import mongo
from app import create_app

@pytest.fixture
def app():
    app = create_app({
        'MONGODB_DB': 'datazzilla_test',
    })
    assert mongo.get_db().name == mongo.app.config['MONGODB_DB']
    # AssertionError: assert 'datazzilla' == 'datazzilla_test'
    return app
MongoEngine is already connected when your fixture is called. When you call app = create_app(...) from your fixture, it tries to re-establish the connection but fails silently (as it sees that there is an existing default connection established).
This got reworked in the development version of MongoEngine (see https://github.com/MongoEngine/mongoengine/pull/2038) but hasn't been released yet (as of 04-JUN-2019). When that version gets out, you'll be able to disconnect any existing MongoEngine connection by calling disconnect_all.
In the meantime, you can either:
- check where the existing connection is created and prevent it, or
- try to disconnect the existing connection by using the following:
from mongoengine.connection import disconnect, _connection_settings

@pytest.fixture
def app():
    disconnect()
    del _connection_settings['default']
    app = create_app(...)
    ...
But it may have other side effects.
Context
Coincidentally, I figured out a fix for this problem. @bagerard's answer is correct! It works when MongoClient's connect option is set to True, which is/should be the default value.
MongoClient(host=['mongo:27017'], document_class=dict, tz_aware=False, connect=False, read_preference=Primary())
If that is the case, then you have to disconnect the database and delete the connection settings, as @bagerard explains.
Solution
However, if you change the MongoClient connect option to False, then you don't have to disconnect the database and delete the connection settings. In the end, the solution that worked for me was this:
def create_app(config=None):
    ...
    app.config['MONGODB_CONNECT'] = False
    ...
Notes
As I wrote earlier, I found this solution coincidentally while trying to solve another problem: "MongoClient opened before fork. Create MongoClient only after forking." It turned out that it fixes both problems :)
P.S. If there are any side effects, I'm not aware of them at this point! If you find some, please share them in the comments section.

How to validate connection strings for Azure IoT Hub in Spark Streaming?

I'm new to Azure IoT Hub. I'm trying to pull messages from Azure IoT Hub using Spark Streaming. I'm getting an error when I execute the code, and I gather there is some problem with the connection string. Is there any specific way to validate the connection string in Spark? Also, please tell me whether the format I specified is correct or not.
My sample code:
import org.apache.spark.eventhubs._
val eventHubName = "xyztest.azure-devices.net"
val eventHubNSConnStr = "Endpoint=sb://testname.servicebus.windows.net/;SharedAccessKeyName=primary;SharedAccessKey=abcedfgrdxyeurjrsdfyasdf="
val connStr = ConnectionStringBuilder(eventHubNSConnStr).setEventHubName(eventHubName).build
val customEventhubParameters = EventHubsConf(connStr).setMaxEventsPerTrigger(5)
val incomingStream = spark.readStream.format("eventhubs").options(customEventhubParameters.toMap).load()
incomingStream.writeStream.outputMode("append").format("console").option("truncate", false).start().awaitTermination()
Error:
java.util.concurrent.ExecutionException: com.microsoft.azure.eventhubs.IllegalEntityException: The messaging entity 'sb://testname.servicebus.windows.net/xyztest' could not be found.
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at org.apache.spark.eventhubs.client.EventHubsClient.partitionCount(EventHubsClient.scala:169)
val eventHubName = "xyztest.azure-devices.net"
It seems you set the wrong event hub name: "xyztest.azure-devices.net" is your Azure IoT Hub hostname, not the event hub name.
To find the event hub name, go to your IoT hub -> Endpoints -> Events and copy the value of "Event Hub-compatible name".
In the end, the event hub connection string will have the following format:
Endpoint=sb://SAMPLE;SharedAccessKeyName=KEY_NAME;SharedAccessKey=KEY;EntityPath=EVENTHUB_NAME
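For illustration, here is a sketch of the question's setup with the corrected name plugged in (all values below are placeholders; copy the real Event Hub-compatible endpoint and name from the portal):
import org.apache.spark.eventhubs._
// Placeholders: take the "Event Hub-compatible name" from IoT Hub -> Endpoints -> Events,
// not the *.azure-devices.net hostname.
val eventHubName = "iothub-ehub-xyztest-1234567-abcdef0123" // hypothetical
val eventHubNSConnStr = "Endpoint=sb://SAMPLE.servicebus.windows.net/;SharedAccessKeyName=KEY_NAME;SharedAccessKey=KEY"
val connStr = ConnectionStringBuilder(eventHubNSConnStr).setEventHubName(eventHubName).build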

PySpark running error

The Spark instance I connect to is not built on my local computer but on a remote one. Every time I connect to it at http://xx.xxx.xxx.xxx:10000/, the error says:
[IPKernelApp] WARNING | Unknown error in handling PYTHONSTARTUP file /usr/local/spark/python/pyspark/shell.py:
18/03/07 08:52:53 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Anyway, I still keep trying to run it in a Jupyter notebook:
from pyspark.conf import SparkConf
SparkSession.builder.config(conf=SparkConf())
dir(spark)
When I ran it yesterday, it showed the directory. When I ran it today, it says:
NameError: name 'spark' is not defined
Any suggestion is appreciated!
You're missing the spark variable:
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

# Build (or reuse) the session; without getOrCreate() you only have a builder.
spark = SparkSession.builder.config(conf=SparkConf()).getOrCreate()
dir(spark)

Spark with Scala on a Windows machine

I am learning from a class. I have run the code as shown in the class and I get the errors below. Any idea what I should do?
I have Spark 1.6.1 and Scala 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_74).
val datadir = "C:/Personal/V2Maestros/Courses/Big Data Analytics with Spark/Scala"
//............................................................................
//// Building and saving the model
//............................................................................
val tweetData = sc.textFile(datadir + "/movietweets.csv")
tweetData.collect()
def convertToRDD(inStr: String): (Double, String) = {
  val attList = inStr.split(",")
  val sentiment = attList(0).contains("positive") match {
    case true => 0.0
    case false => 1.0
  }
  (sentiment, attList(1))
}
val tweetText=tweetData.map(convertToRDD)
tweetText.collect()
//val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
var ttDF = sqlContext.createDataFrame(tweetText).toDF("label","text")
ttDF.show()
The error is:
scala> ttDF.show()
[Stage 2:> (0 + 2) / 2]16/03/30 11:40:25 ERROR ExecutorClassLoader: Failed to check existence of class org.apache.spark.sql.catalyst.expressio on REPL class server at http://192.168.56.1:54595
java.net.ConnectException: Connection timed out: connect
at java.net.TwoStacksPlainSocketImpl.socketConnect(Native Method)
I'm no expert, but the IP in the error message looks like a private node or even your router/modem's local address.
As stated in the comments, it could be that you're running the context with a wrong configuration that tries to spread the work across a cluster that isn't there, instead of running in your local JVM process.
For further information you can read the Spark documentation and experiment with something like:
import org.apache.spark.{SparkConf, SparkContext}
val sc = new SparkContext(master = "local[4]", appName = "tweetsClass", conf = new SparkConf)
Update
Since you're using the interactive shell and the SparkContext provided there, I guess you should pass the equivalent parameters to the shell command, as in
<your-spark-path>/bin/spark-shell --master local[4]
which instructs the driver to use a master for the Spark cluster on the local machine, with 4 threads.
I think the problem comes from connectivity and not from within the code.
Check if you can actually connect to this address and port (54595).
Probably your Spark master is not accessible at the specified port. Use local[*] to validate with a smaller dataset and a local master. Then check whether the port is accessible, or change it based on the Spark port configuration (http://spark.apache.org/docs/latest/configuration.html).
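As a quick sanity check under a local master, you can run something like this in spark-shell --master local[*] (the two sample rows below are made up, in the same "sentiment,text" shape as movietweets.csv):
// Two fake rows; counting them should succeed with no remote connection attempts.
val sample = sc.parallelize(Seq("positive,great movie", "negative,weak plot"))
println(sample.count()) // prints 2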