Spark Application Testing: Initialize variable in object with different file in Production and Test Env - scala

I am working on a Spark application. While writing test cases for the application, I came across a requirement where I need to initialize a value in an object from a different file than the production file. For example:
import java.io.FileInputStream
import java.util.Properties
import scala.collection.JavaConverters

object ObjA {
  val properties: Properties = new Properties()
  properties.load(new FileInputStream("constants.properties"))
  val map = JavaConverters.propertiesAsScalaMapConverter(properties).asScala
}
In the test environment I want to use a different file than constants.properties, i.e. I want to initialize ObjA with the constants.properties file present in the test directory. I don't want to pass the file name as an argument to a method. Is there a way for the application to figure out the environment (test or production) and initialize the value accordingly?
Thanks

You could use the IP of the driver (the node you submit the application from). Before initializing the SparkContext, find your test env's address and do something like the below:
import java.io.FileInputStream
import java.net._

val ip = InetAddress.getLocalHost.getHostName
val props = if (ip == "ip-xx-xxx-xx-xx.ec2.internal") "TestConstant.properties" else "productionConstants.properties"
properties.load(new FileInputStream(props))
So in the test env it is going to load the test properties, and in any other env it is going to load the prod properties.
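An alternative worth considering (not part of the answer above, just a common pattern, assuming an sbt- or Maven-style project layout): load the file as a classpath resource instead of from a filesystem path. A copy of constants.properties placed under src/test/resources is picked up on the test classpath ahead of the production copy in src/main/resources, so the object needs no environment detection at all. A minimal sketch:

import java.util.Properties
import scala.collection.JavaConverters

object ObjA {
  val properties: Properties = new Properties()
  // Resolved from the classpath: tests see src/test/resources/constants.properties,
  // the packaged application sees src/main/resources/constants.properties.
  properties.load(getClass.getClassLoader.getResourceAsStream("constants.properties"))
  val map = JavaConverters.propertiesAsScalaMapConverter(properties).asScala
}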


How can I use FastAPI Routers with FastAPI-Users and MongoDB?

I can use MongoDB with FastAPI either
- with a global client: motor.motor_asyncio.AsyncIOMotorClient object, or else
- by creating one during the startup event per this SO answer, which refers to this "Real World Example".
However, I also want to use fastapi-users since it works nicely with MongoDB out of the box. The downside is it seems to only work with the first method of handling my DB client connection (ie global). The reason is that in order to configure fastapi-users, I have to have an active MongoDB client connection just so I can make the db object as shown below, and I need that db to then make the MongoDBUserDatabase object required by fastapi-users:
# main.py
app = FastAPI()

# Create global MongoDB connection
DATABASE_URL = "mongodb://user:password@localhost/auth_db"
client = motor.motor_asyncio.AsyncIOMotorClient(DATABASE_URL, uuidRepresentation="standard")
db = client["my_db"]

# Set up fastapi_users
user_db = MongoDBUserDatabase(UserDB, db["users"])
cookie_authentication = CookieAuthentication(secret='lame secret', lifetime_seconds=3600, name='cookiemonster')
fastapi_users = FastAPIUsers(
    user_db,
    [cookie_authentication],
    User,
    UserCreate,
    UserUpdate,
    UserDB,
)
After that point in the code, I can import the fastapi_users Routers. However, if I want to break up my project into FastAPI Routers of my own, I'm hosed because:
- If I move the client creation to another module to be imported into both my app and my routers, then I have different clients in different event loops and get errors like RuntimeError: Task <Task pending name='Task-4' coro=<RequestResponseCycle.run_asgi() running at /usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py:389> cb=[set.discard()]> got Future <Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/local/lib/python3.8/asyncio/futures.py:360]> attached to a different loop (touched on in this SO question)
- If I use the solutions of the "Real World Example", then I get stuck on where to build my fastapi_users object in my code example: I can't do it in main.py because there's no db object yet.
I considered making the MongoDBUserDatabase object part of the startup event code (i.e. within async def connect_to_mongo() from the Real World Example), but I wasn't able to get that to work either.
How can I either
- make a global MongoDB client and FastAPI-Users object in a way that can be shared among my main app and several routers without "attached to a different loop" errors, or
- create fancy wrapper classes and functions to set up FastAPI-Users with the startup trigger?
I don't think my solution is complete or correct, but I figured I'd post it in case it inspires any ideas; I'm stumped. I have run into the exact same dilemma; it almost seems like a design flaw.
I followed this MongoDB full example and named it main.py
At this point my app does not work. The server starts up, but requests result in the aforementioned "attached to a different loop" error whenever trying to query the DB.
Looking for guidance, I stumbled upon the same "real world" example.
In main.py I added the startup and shutdown event handlers:
# Event handlers
app.add_event_handler("startup", create_start_app_handler(app=app))
app.add_event_handler("shutdown", create_stop_app_handler(app=app))
In dlw_api/db/events.py, this:
import logging
from dlw_api.user import UserDB
from fastapi import FastAPI
from fastapi_users.db.mongodb import MongoDBUserDatabase
from motor.motor_asyncio import AsyncIOMotorClient
LOG = logging.getLogger(__name__)
DB_NAME = "dlwLocal"
USERS_COLLECTION = "users"
DATABASE_URI = "mongodb://dlw-mongodb:27017" # protocol://container_name:port
_client: AsyncIOMotorClient = None
_users_db: MongoDBUserDatabase = None

def get_users_db() -> MongoDBUserDatabase:
    return _users_db

async def connect_to_db(app: FastAPI) -> None:
    global _client, _users_db
    _client = AsyncIOMotorClient(DATABASE_URI)
    db = _client[DB_NAME]
    collection = db[USERS_COLLECTION]
    _users_db = MongoDBUserDatabase(UserDB, collection)
    LOG.info(f"Connected to {DATABASE_URI}")

async def close_db_connection(app: FastAPI) -> None:
    _client.close()
    LOG.info("Connection closed")
And in dlw_api/events.py:
from typing import Callable
from fastapi import FastAPI
from dlw_api.db.events import close_db_connection, connect_to_db
from dlw_api.user import configure_user_auth_routes
from fastapi_users.authentication import CookieAuthentication
from dlw_api.db.events import get_users_db
COOKIE_SECRET = "THIS_NEEDS_TO_BE_SET_CORRECTLY" # TODO: <--|
COOKIE_LIFETIME_SECONDS: int = 3_600
COOKIE_NAME = "c-is-for-cookie"
# Auth stuff:
_cookie_authentication = CookieAuthentication(
    secret=COOKIE_SECRET,
    lifetime_seconds=COOKIE_LIFETIME_SECONDS,
    name=COOKIE_NAME,
)
auth_backends = [
    _cookie_authentication,
]

def create_start_app_handler(app: FastAPI) -> Callable:
    async def start_app() -> None:
        await connect_to_db(app)
        configure_user_auth_routes(
            app=app,
            auth_backends=auth_backends,
            user_db=get_users_db(),
            secret=COOKIE_SECRET,
        )
    return start_app

def create_stop_app_handler(app: FastAPI) -> Callable:
    async def stop_app() -> None:
        await close_db_connection(app)
    return stop_app
This doesn't feel correct to me. Does this mean all routes that use Depends for user auth have to be included in the server startup event handler?
The author (frankie567) of fastapi-users created a repl.it showing a solution of sorts. My discussion about this solution may provide more context, but the key parts of the solution are:
1. Don't bother using the FastAPI startup trigger along with Depends for your MongoDB connectivity management. Instead, create a separate file (i.e. db.py) to create your DB connection and client object. Import this db object whenever needed, such as in your Routers, and then use it as a global.
2. Also create a separate users.py to do 2 things:
   - Create the globally used fastapi_users = FastAPIUsers(...) object for use with other Routers to handle authorization.
   - Create a fastapi.APIRouter() object and attach all the fastapi-users routers to it (router.include_router(...)).
3. In all your other Routers, import both db and fastapi_users from the above as needed.
4. Key: split your main code up into
   - a main.py which only imports uvicorn and serves app:app;
   - an app.py which has your main FastAPI object (i.e. app) and which then attaches all your Routers, including the one from users.py with all the fastapi-users routers attached to it.
By splitting up the code per 4 above, you avoid the "attached to a different loop" error.
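A condensed sketch of that layout, shown as four files in one listing (module and model names are illustrative, and this assumes the same pre-1.0 fastapi-users MongoDB API used in the question):

# db.py -- one module-level Motor client, imported everywhere as a global
import motor.motor_asyncio

DATABASE_URL = "mongodb://localhost:27017"
client = motor.motor_asyncio.AsyncIOMotorClient(DATABASE_URL, uuidRepresentation="standard")
db = client["my_db"]

# users.py -- the shared fastapi_users object plus a router bundling its routes
from fastapi import APIRouter
from fastapi_users import FastAPIUsers
from fastapi_users.authentication import CookieAuthentication
from fastapi_users.db.mongodb import MongoDBUserDatabase

from db import db
from models import User, UserCreate, UserUpdate, UserDB  # your user models

cookie_authentication = CookieAuthentication(secret="CHANGE_ME", lifetime_seconds=3600)
user_db = MongoDBUserDatabase(UserDB, db["users"])
fastapi_users = FastAPIUsers(user_db, [cookie_authentication], User, UserCreate, UserUpdate, UserDB)

router = APIRouter()
router.include_router(fastapi_users.get_auth_router(cookie_authentication), prefix="/auth")

# app.py -- the main FastAPI object, which attaches all routers
from fastapi import FastAPI
from users import router as users_router

app = FastAPI()
app.include_router(users_router)

# main.py -- only imports uvicorn and serves app:app
import uvicorn

if __name__ == "__main__":
    uvicorn.run("app:app", host="0.0.0.0", port=8000)

Because db.py is imported rather than re-created by every router module, each process has exactly one client object shared by all routers, which is the point of the layout.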
I faced a similar issue, and all I had to do to get motor and FastAPI running in the same loop was this:
import asyncio
from motor.motor_asyncio import AsyncIOMotorClient

client = AsyncIOMotorClient()
client.get_io_loop = asyncio.get_event_loop
I did not set on_startup or anything of the sort.
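In the context of the question's main.py, that would mean something like this (a sketch):

import asyncio
import motor.motor_asyncio

DATABASE_URL = "mongodb://user:password@localhost/auth_db"
client = motor.motor_asyncio.AsyncIOMotorClient(DATABASE_URL, uuidRepresentation="standard")
# Make motor look up the running loop at call time instead of capturing one at
# construction, so a client created at import time works on the loop uvicorn starts.
client.get_io_loop = asyncio.get_event_loop
db = client["my_db"]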

Content not saved in file while spark job running on cluster (yarn)

I am facing a weird issue and I've been stuck on it for a while.
Basically, in my Spark application I have a method where I create a file in MapR FS and save some content in it.
The method is like below:
import java.util.Calendar

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

import scala.util.Try

def collector(value: String): Unit = {
  val conf = new Configuration()
  val fs = FileSystem.get(conf)
  val path = getConfig("path")
  Try {
    fs.create(new Path(path + s"${scala.util.Random.alphanumeric take 10 mkString}" +
      Calendar.getInstance.getTimeInMillis))
      .write(value.getBytes)
    fs.close()
  }.getOrElse(logger.warn("File not saved"))
}
This method is called from a different object. When I run it with --master local[*], the file is created in the specified location in the FS, with the string that is passed in value. However, when I run it on the cluster (--master yarn), only an empty file is saved. When I print value, it prints the string, but for some reason the string is not saved in the file.
I wonder if anybody has any idea why?
Thanks

Passing In Config In Gatling Tests

Noob to Gatling/Scala here.
This might be a bit of a silly question but I haven't been able to find an example of what I am trying to do.
I want to pass in things such as the baseURL, username, and passwords for some of my calls. These would change from env to env, so I want to be able to change the values between the envs but still have the same tests in each.
I know we can feed in values, but it appears that's more for iterating over datasets and not so much for passing in config values like I have.
Ideally I would like to house this information in a JSON file and not pass it in on the command line, but maybe that's not doable?
Any guidance on this would be awesome.
I have a similar setup, and you can use pure Scala here. In this scenario you can create an object called Configuration, e.g.:
object Configuration { var INPUT_PROFILE_FILE_NAME = "" }
This object can also read a file; I have the below code in the same object:
import java.io.FileInputStream
import java.util.Properties

val file = getClass.getResource("data/config.properties").getFile()
val prop = new Properties()
prop.load(new FileInputStream(file))
INPUT_PROFILE_FILE_NAME = prop.getProperty("inputProfileFileName")
Now you can import this object in your Gatling simulation file:
val profileName = Configuration.INPUT_PROFILE_FILE_NAME
https://docs.scala-lang.org/tutorials/tour/singleton-objects.html
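Putting the pieces together, a self-contained version might look like the sketch below (assuming the Gatling 3.x DSL; the property names, resource path, and endpoint are illustrative):

import java.io.FileInputStream
import java.util.Properties

import io.gatling.core.Predef._
import io.gatling.http.Predef._

object Configuration {
  private val prop = new Properties()
  prop.load(new FileInputStream(getClass.getResource("data/config.properties").getFile))

  val baseUrl: String = prop.getProperty("baseUrl")
  val username: String = prop.getProperty("username")
  val password: String = prop.getProperty("password")
}

class LoginSimulation extends Simulation {
  val httpProtocol = http.baseUrl(Configuration.baseUrl)

  val scn = scenario("login")
    .exec(
      http("login")
        .post("/login")
        .formParam("username", Configuration.username)
        .formParam("password", Configuration.password)
    )

  setUp(scn.inject(atOnceUsers(1))).protocols(httpProtocol)
}

Swapping environments is then just a matter of which config.properties ends up on the classpath.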

Gatling Configuration

Is it possible to programmatically set the gatling.core.directory.data path from the gatling.conf?
I am attempting to read in a CSV that is not in the default directory.
I have attempted to do:
System.getProperties.setProperty("gatling.core.directory.data",FilePathHelper.getGatlingDataFilePath.getAbsolutePath)
But I still get a null pointer for my file:
val users = csv("user.csv")
Thanks
In the end, changing the data path is very easy, and running it from code is just as simple!
import io.gatling.app.Gatling
import io.gatling.core.config.GatlingPropertiesBuilder

val props = new GatlingPropertiesBuilder
props.simulationClass(<your runner>)
props.dataDirectory(<your data dir>)
props.resultsDirectory(<your report dir>)
Gatling.fromMap(props.build)
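Wrapped in a runnable object, that might look like the sketch below (the simulation class and directory paths are placeholders, and this assumes the Gatling 2.x-era builder, where these methods take plain path strings):

import io.gatling.app.Gatling
import io.gatling.core.config.GatlingPropertiesBuilder

object GatlingRunner {
  def main(args: Array[String]): Unit = {
    val props = new GatlingPropertiesBuilder
    // Fully qualified name of the simulation to run (placeholder).
    props.simulationClass("com.example.MySimulation")
    // Point Gatling at a non-default data directory (placeholder path).
    props.dataDirectory("src/test/custom-data")
    props.resultsDirectory("target/gatling-results")
    Gatling.fromMap(props.build)
  }
}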

Is there a way to find out which pytest-xdist gateway is running?

I would like to create a separate log file for each subprocess/gateway that is spawned by pytest-xdist. Is there an elegant way of finding out in which subprocess/gateway pytest is currently in? I'm configuring my root logger with a session scoped fixture located in conftest.py, something like this:
import logging

import pytest

@pytest.fixture(scope='session', autouse=True)
def setup_logging():
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.INFO)
    fh = logging.FileHandler('xdist.log')
    fh.setLevel(logging.INFO)
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    fh.setFormatter(formatter)
    logger.addHandler(fh)
It would be great if I could add a prefix to the log file name based on the gateway number, e.g.:
fh = logging.FileHandler('xdist_gateway_%s.log' % gateway_number)
Without this, each gateway will use the same log and the logs will get messy. I know that I can add a timestamp to the filename, but this doesn't let me distinguish quickly which file is from which gateway.
Similar to @Kanguros's answer, but plugging into the pytest fixture paradigm:
You can get the worker id by [accessing] the slaveinput dictionary. Here's a fixture which makes that information available to tests and other fixtures:
@pytest.fixture
def worker_id(request):
    if hasattr(request.config, 'workerinput'):
        return request.config.workerinput['workerid']
    else:
        return 'master'
This is quoted from a comment on the pytest-xdist Issues tracker/discussion (2016).
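Combining that fixture with the question's session-scoped logging setup could look like the sketch below. Note that worker_id must itself be session-scoped here, or pytest will reject the scope mismatch; the file-name pattern is just illustrative.

import logging

import pytest

@pytest.fixture(scope='session')
def worker_id(request):
    # 'gw0', 'gw1', ... under xdist; 'master' when xdist is not in use.
    if hasattr(request.config, 'workerinput'):
        return request.config.workerinput['workerid']
    return 'master'

@pytest.fixture(scope='session', autouse=True)
def setup_logging(worker_id):
    # One log file per xdist worker, so parallel runs don't interleave.
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.INFO)
    fh = logging.FileHandler('xdist_gateway_%s.log' % worker_id)
    fh.setLevel(logging.INFO)
    fh.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
    logger.addHandler(fh)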
I found out that you can access the gateway id in the following way:
slaveinput = getattr(session.config, "slaveinput", None)
if slaveinput:
    gatewayid = slaveinput['slaveid']
Of course you need to be in a place where you can access the session.config object.
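For example (a sketch), the same lookup works inside a conftest.py hook, where the config object is handed to you directly:

# conftest.py
def pytest_configure(config):
    # Present only in xdist worker processes (older xdist versions used 'slaveinput').
    slaveinput = getattr(config, "slaveinput", None)
    if slaveinput:
        gatewayid = slaveinput["slaveid"]  # e.g. 'gw0'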