I'm new to deploying ML models and I want to deploy a model that contains several modules, each of which consists of "folders" containing some data files, .py scripts and a Python notebook.
I created a project in GitLab and I'm trying to follow tutorials on FastAPI, since this is what I'm going to be using. But I've been told that before I start integrating the code, I need to set up a health endpoint.
I know about the request curl "https://gitlab.example.com/-/health", but do I need to set up anything? Is there anything else I need to do for the project setup before writing the requirements.txt, building the skeleton of the application, etc.?
It depends entirely on your needs; there is no health endpoint implemented natively in FastAPI.
But I’ve been told that before I start integrating the code, I need to set up a health endpoint.
It is not necessarily a bad practice; you could start by listing all your future health checks and build your route from there.
Update from a comment:
But I don’t know how to implement this. I need a config file? I’m very new to this.
From what I understand you are very new to Python APIs, so you should start by following the official FastAPI user guide. You can also follow the FastAPI first steps from this.
A very basic one-file project that runs as-is:
# main.py
from fastapi import FastAPI

app = FastAPI()


@app.get("/health")
async def root():
    return {"message": "Alive!"}
Remember that the above is not suitable for production, only for testing/learning purposes. To build a production API you should follow the official advanced user guide and implement something like the following.
A more advanced router:
There is a health lib for FastAPI, fastapi_health, that is nice.
You can make basic checks like this:
# app/routers/health.py
from fastapi import APIRouter, status, Depends
from fastapi_health import health

from app.internal.health import healthy_condition, sick_condition

router = APIRouter(
    tags=["healthcheck"],
    responses={404: {"description": "not found"}},
)


@router.get('/health', status_code=status.HTTP_200_OK)
def perform_api_healthcheck(health_endpoint=Depends(health([healthy_condition, sick_condition]))):
    return health_endpoint

# app/internal/health.py
def healthy_condition():  # just for testing purposes
    return {"database": "online"}


def sick_condition():  # just for testing purposes
    return True
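To actually serve these routes you still have to include the router in the main application. A minimal sketch, assuming the app/routers and app/internal module layout used in the snippets above:
# app/main.py
from fastapi import FastAPI

from app.routers import health

app = FastAPI()

# Mount the healthcheck router so GET /health is served by the main app.
app.include_router(health.router)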
Related
I am trying to implement TSOA with an existing HapiJS server and would like some insight on the best approach.
You can run tsoa spec-and-routes to generate routes.ts and swagger.json. However, running this manually before running the node process is less than ideal.
The solution would then be to run them programmatically using the APIs provided by the TSOA library. However, when registering the routes with my Hapi server, I need to import the generated routes.ts file, e.g. import RegisterRoutes from '../build/routes.ts'.
So when I run the node process and generate the routes during it (programmatically), it tries to grab '../build/routes.ts' before it has been built, producing an error, and the node process exits.
What is the way around this?
tsoa spec-and-routes && node bin/node ?
Any clarification would be greatly appreciated. Thanks.
I am wondering what approach to take in designing serverless functions, taking the design of a regular server as a point of reference.
With a traditional server, one would focus on defining collections and then CRUD operations one can run on each of them (HTTP verbs such as GET or POST).
For example, you would have a collection of users, and you can get all records via app.get('/users', ...), get specific one via app.get('/users/{id}', ...) or create one via app.post('/users', ...).
How differently would you approach designing a serverless function? Specifically:
Is there a sense in differentiating between HTTP operations or would you just go with POST? I find it useful to have them defined on the client side, to decide if I want to retry in case of an error (if the operation is idempotent, it will be safe to retry etc.), but it does not seem to matter in the back-end.
Naming. I assume you would use something like getAllUsers(), whereas with a regular server you would define a collection of users and then just use GET to specify what you want to do with it.
Size of functions: if you need to do a number of things in the back end in one step, would you define a number of small functions, such as lookupUser() and endTrialForUser() (fired if the user we got from lookupUser() has been on trial longer than 7 days), and run them one after another from the client (deciding on the client whether the trial should end seems quite unsafe), or would you just create a getUser() and then handle all the logic there?
Routing. In serverless functions, we can't really do anything like .../users/${id}/accountData. How would you go about fetching nested data? Would you just return a complete JSON every time?
I have been looking for some comprehensive articles on the matter but no luck. Any suggestions?
This is a very broad question that you've asked. Let me try answering it point by point.
Firstly, the approach that you're talking about here is the Serverless API project approach. You can clone their sample project to get a better understanding of how you can build REST APIs for performing CRUD operations. Start by installing the SAM CLI and then run the following commands.
$ sam init
Which template source would you like to use?
1 - AWS Quick Start Templates
2 - Custom Template Location
Choice: 1
Cloning from https://github.com/aws/aws-sam-cli-app-templates
Choose an AWS Quick Start application template
1 - Hello World Example
2 - Multi-step workflow
3 - Serverless API
4 - Scheduled task
5 - Standalone function
6 - Data processing
7 - Infrastructure event management
8 - Machine Learning
Template: 3
Which runtime would you like to use?
1 - dotnetcore3.1
2 - nodejs14.x
3 - nodejs12.x
4 - python3.9
5 - python3.8
Runtime: 2
Based on your selections, the only Package type available is Zip.
We will proceed to selecting the Package type as Zip.
Based on your selections, the only dependency manager available is npm.
We will proceed copying the template using npm.
Project name [sam-app]: sample-app
-----------------------
Generating application:
-----------------------
Name: sample-app
Runtime: nodejs14.x
Architectures: x86_64
Dependency Manager: npm
Application Template: quick-start-web
Output Directory: .
Next steps can be found in the README file at ./sample-app/README.md
Commands you can use next
=========================
[*] Create pipeline: cd sample-app && sam pipeline init --bootstrap
[*] Test Function in the Cloud: sam sync --stack-name {stack-name} --watch
Coming to your questions point-wise:
Yes, you should differentiate your HTTP operations with their suitable HTTP verbs. This can be configured at the API Gateway and can be checked for in the Lambda code. Check the source code of handlers & the template.yml file from the project you've just cloned with SAM.
// src/handlers/get-by-id.js
if (event.httpMethod !== 'GET') {
    throw new Error(`getMethod only accepts GET method, you tried: ${event.httpMethod}`);
}

# template.yml
Events:
  Api:
    Type: Api
    Properties:
      Path: /{id}
      Method: GET
The naming is totally up to the developer. You can follow the same approach that you're following with your regular server project.
You can define the handler with name getAllUsers or users and then set the path of that resource to GET /users in the AWS API Gateway. You can choose the HTTP verbs of your desire. Check this tutorial out for better understanding.
Again, this is up to you. You can create a single Lambda that handles all that logic, or create individual Lambdas that are triggered one after another by the client based on the response from the previous API. I would say, create a single Lambda and just return the cumulative response to reduce the number of requests. But again, this totally depends on the UI integration. If your screens demand separate API calls, then please, by all means, create individual Lambdas.
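As a rough illustration of the single-Lambda option: the helper names lookupUser()/endTrialForUser() come from the question, and everything else below (file name, data shape, datastore calls) is a placeholder, not part of the SAM template:
# src/handlers/get_user.py -- hypothetical single Lambda folding both steps together
import json
from datetime import datetime, timedelta, timezone

TRIAL_LENGTH = timedelta(days=7)

def lookup_user(user_id):
    # Placeholder for the real datastore read (DynamoDB, RDS, MongoDB, ...).
    return {"id": user_id, "onTrial": True, "trialStartedAt": "2024-01-01T00:00:00+00:00"}

def end_trial_for_user(user):
    # Placeholder for the write that actually ends the trial.
    user["onTrial"] = False
    return user

def handler(event, context):
    user = lookup_user(event["pathParameters"]["id"])
    started = datetime.fromisoformat(user["trialStartedAt"])
    # The trial decision stays on the back end instead of being made by the client.
    if user["onTrial"] and datetime.now(timezone.utc) - started > TRIAL_LENGTH:
        user = end_trial_for_user(user)
    return {"statusCode": 200, "body": json.dumps(user)}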
This is not true. We can have dynamic routes specified in the API Gateway.
You can easily set wildcards in your routes by using {variableName} while setting the routes in API Gateway.
GET /users/{userId}
The userId will then be available at your disposal in the lambda function via event.pathParameters.
GET /users/{userId}?a=x
Similarly, you could even pass query strings and access them via event.queryStringParameters in code. Have a look at working with routes.
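If you picked one of the Python runtimes offered above, the same information arrives on the event dict; a minimal, hypothetical handler for GET /users/{userId}?a=x:
# Hypothetical handler wired to GET /users/{userId} in API Gateway.
import json

def handler(event, context):
    user_id = event["pathParameters"]["userId"]        # the {userId} part of the route
    query = event.get("queryStringParameters") or {}   # None when no query string was sent
    return {
        "statusCode": 200,
        "body": json.dumps({"userId": user_id, "a": query.get("a")}),
    }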
Tutorial I would recommend for you:
Tutorial: Build a CRUD API with Lambda and DynamoDB
Currently, I'm developing a native application using React Native. I've decided to go with AWS Amplify because of its real-time updates as well as its authentication.
I also have a web application that runs on a Node.js with Express server. This web application connects to a Mongo database.
My big problem is that I would like to have all of my AWS Amplify queries run against my existing MongoDB instead of a new DynamoDB database which is provided with AWS AppSync, but unfortunately I don't know where to start. This would also be especially helpful for adding authentication easily to my existing web application.
My first idea was to just create all my API endpoints on a new Node.js server and have AppSync call these API endpoints, but I'm not sure how to implement calling endpoints on an existing server (and this seems kind of counterintuitive to the 'serverless' idea).
My other idea came from this: Can AWS App-Sync be used without dynamoDB
This states to use AWS Lambda to 'pipeline' my data to the existing mongodb, but I'm not really sure what that entails.
TL;DR - I would like to be able to query an existing MongoDB instead of using DynamoDB when using AWS Amplify with AppSync.
I hope this is clear enough and doesn't sound like I'm rambling. Thanks in advance!
I would suggest using either an HTTP datasource to connect to your MongoDB backend or a Lambda function. Here are a couple of getting-started tutorials for both:
https://docs.aws.amazon.com/appsync/latest/devguide/tutorial-http-resolvers.html
https://docs.aws.amazon.com/appsync/latest/devguide/tutorial-lambda-resolvers.html
If you go the Lambda route, then you can leverage the new #function feature of the GraphQL Transformer in the Amplify CLI: https://aws-amplify.github.io/docs/cli/graphql#function
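If you go the Lambda route, the function itself is just ordinary code that talks to MongoDB and returns whatever shape your GraphQL field expects. A minimal sketch, assuming pymongo and a single getUser-style field; the connection string, database and collection names are placeholders, and the exact event shape depends on your resolver mapping:
# Hypothetical AppSync Lambda resolver backed by MongoDB.
import os
from pymongo import MongoClient

# Create the client outside the handler so the connection is reused
# across invocations of the same Lambda container.
client = MongoClient(os.environ["MONGODB_URI"])
db = client["myapp"]

def handler(event, context):
    # Assumes the GraphQL arguments are passed through as "arguments"
    # by the resolver mapping template.
    args = event.get("arguments", {})
    # Return the document without Mongo's internal _id so it matches the GraphQL type.
    return db.users.find_one({"id": args["id"]}, {"_id": 0})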
Is it possible to integrate a headless browser with locust? I need my load tests to process client side script that triggers additional requests on each page load.
That's an old question, but I came across it now, and found a solution in "realbrowserlocusts" (https://github.com/nickboucart/realbrowserlocusts) - it adds "Real Browser support for Locust.io load testing" using selenium.
If you use one of its classes (FirefoxLocust, ChromeLocust, PhantomJSLocust) instead of HttpLocust for your locust user class
class WebsiteUser(HeadlessChromeLocust):
then in your TaskSet self.client becomes an instance of selenium WebDriver.
One drawback for me was that the webdriver (unlike the built-in client in HttpLocust) doesn't know about "host", which forces you to use absolute URLs in the TaskSet instead of relative ones, and it's really convenient to use relative URLs when working with different environments (local, dev, staging, prod, etc.).
But there is an easy fix for this: inherit from one of realbrowserlocusts' locusts and pass "host" to the WebDriver instance:
from locust import TaskSet, task, between
from realbrowserlocusts import HeadlessChromeLocust


class UserBehaviour(TaskSet):
    @task(1)
    def some_action(self):
        # self.client is a selenium WebDriver instance
        self.client.get(self.client.base_host + "/relative/url")
        # and then, for instance, using selenium methods:
        self.client.find_element_by_name("form-name").send_keys("your text")
        # etc.


class ChromeLocustWithHost(HeadlessChromeLocust):
    def __init__(self):
        super(ChromeLocustWithHost, self).__init__()
        self.client.base_host = self.host


class WebsiteUser(ChromeLocustWithHost):
    screen_width = 1200
    screen_height = 1200
    task_set = UserBehaviour
    wait_time = between(5, 9)
============
UPDATE from September 5, 2020:
I posted this solution in March 2020, when locust was on major version 0. Since then, in May 2020, they released version 1.0.0, in which some backward-incompatible changes were made (one of which was renaming StopLocust to StopUser). realbrowserlocusts was not updated for a while, and does not yet work with locust >= 1.
There is a workaround though. When locust v1.0.0 was released, previous versions were released under a new name, locustio, with the last version being 0.14.6, so if you install "locustio==0.14.6" (or "locustio<1"), then the solution with realbrowserlocusts still works (I checked just now). (see https://github.com/nickboucart/realbrowserlocusts/issues/13)
You have to pin the version of locustio, as it refuses to install without it:
pip install locustio
...
ERROR: Command errored out with exit status 1:
...
**** Locust package has moved from 'locustio' to 'locust'.
Please update your reference
(or pin your version to 0.14.6 if you dont want to update to 1.0)
In theory you could make a headless browser a Locust slave/worker. But the problem is that the browser is much more expensive in terms of CPU and memory which would make it difficult to scale.
That is why Locust uses small greenlets to simulate users, since they are much cheaper to construct and run.
I would recommend you to break down your page's requests and encode them as requests inside of Locust. The Network tab in Chrome's Dev Tools is probably a good start. I've also heard of people capturing these by going through a proxy that logs all requests for you.
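A minimal sketch of that approach using the current locust API; the paths are placeholders for whatever the Network tab shows your page actually fetching:
from locust import HttpUser, task, between

class PageUser(HttpUser):
    wait_time = between(1, 3)

    @task
    def load_page(self):
        # The HTML document itself...
        self.client.get("/orders")
        # ...plus the XHR calls its client-side script would fire afterwards.
        self.client.get("/api/orders/summary")
        self.client.get("/api/notifications")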
You could use something like browserless to take care of the hosting of Chrome (https://browserless.io). Depending on how brutal your load tests are, there are varying degrees of concurrency. Full disclaimer: I'm the maker of the browserless service.
I think Locust is not designed for that purpose; it is for creating concurrent users to make HTTP requests, so I haven't seen any integration between Locust and a browser. However, you can simulate a browser by sending extra information in the headers; that way client-side scripts will also work.
r = self.client.get("/orders", headers = {"Cookie": self.get_user_cookie(user[0]), 'User-Agent': self.user_agent})
The locust way of solving this is to add more requests to your test that mimic the requests that the javascript code will make.
I structure my locust tests to parse the JSON response from an early request in the app's workflow. I then randomly pick some interesting piece of data from that JSON, and then issue more requests that mimic what would happen in the browser if the user had clicked on that piece of data.
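A sketch of that structure with the current locust API; the endpoints and the JSON shape are made up for illustration:
import random
from locust import HttpUser, task, between

class ApiUser(HttpUser):
    wait_time = between(1, 5)

    @task
    def browse_orders(self):
        # Early request in the workflow; parse its JSON response.
        orders = self.client.get("/api/orders").json()
        if not orders:
            return
        # Pick an interesting piece of data at random and mimic the
        # request the browser would make if the user clicked on it.
        order = random.choice(orders)
        self.client.get(f"/api/orders/{order['id']}")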
I have lately been trying out Apache Spark. My question is more specific to triggering Spark jobs. Here I had posted a question on understanding Spark jobs. After getting my hands dirty with jobs, I moved on to my requirement.
I have a REST endpoint where I expose an API to trigger jobs; I have used Spring 4.0 for the REST implementation. Going ahead, I thought of implementing Jobs as a Service in Spring, where I would submit a job programmatically, meaning when the endpoint is triggered, I would trigger the job with the given parameters.
I now have a few design options.
Similar to the job written below, I need to maintain several jobs called by an abstract class, maybe a JobScheduler.
/* Can this code be abstracted from the application and written
   as a separate job? My understanding is that the application
   code itself has to have the addJars embedded, which the
   SparkContext takes care of internally. */
SparkConf sparkConf = new SparkConf().setAppName("MyApp")
        .setJars(new String[] { "/path/to/jar/submit/cluster" })
        .setMaster("/url/of/master/node");
sparkConf.setSparkHome("/path/to/spark/");
sparkConf.set("spark.scheduler.mode", "FAIR");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
sc.setLocalProperty("spark.scheduler.pool", "test");
// Application with Algorithm , transformations
Extending the above point: have multiple versions of jobs handled by the service.
Or else use a Spark Job Server to do this.
Firstly, I would like to know what the best solution is in this case, execution-wise and also scaling-wise.
Note : I am using a standalone cluster from spark.
kindly help.
It turns out Spark has a hidden REST API to submit a job, check status and kill.
Check out full example here: http://arturmkrtchyan.com/apache-spark-hidden-rest-api
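A rough sketch of what a submission to that hidden API can look like from Python, assuming a standalone master with the REST submission server on its default port 6066; every value below is a placeholder taken from the question or made up:
import requests

payload = {
    "action": "CreateSubmissionRequest",
    "appResource": "file:/path/to/jar/submit/cluster/my-app.jar",  # placeholder jar
    "mainClass": "com.example.MyApp",                              # placeholder class
    "appArgs": ["arg1"],
    "clientSparkVersion": "1.5.0",
    "environmentVariables": {"SPARK_ENV_LOADED": "1"},
    "sparkProperties": {
        "spark.app.name": "MyApp",
        "spark.master": "spark://master-host:7077",
        "spark.submit.deployMode": "cluster",
        "spark.jars": "file:/path/to/jar/submit/cluster/my-app.jar",
    },
}

# The response contains a submissionId that can then be used to
# check status or kill the job via the same API.
resp = requests.post("http://master-host:6066/v1/submissions/create", json=payload)
print(resp.json())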
Just use the Spark JobServer
https://github.com/spark-jobserver/spark-jobserver
There are a lot of things to consider with making a service, and the Spark JobServer has most of them covered already. If you find things that aren't good enough, it should be easy to make a request and add code to their system rather than reinventing it from scratch
Livy is an open source REST interface for interacting with Apache Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in Apache Hadoop YARN.
Here is a good client that you might find helpful: https://github.com/ywilkof/spark-jobs-rest-client
Edit: this answer was given in 2015. There are options like Livy available now.
I had this requirement too, and I could do it using the Livy server, as one of the contributors, Josemy, mentioned. The following are the steps I took; I hope it helps somebody:
Download livy zip from https://livy.apache.org/download/
Follow instructions: https://livy.apache.org/get-started/
Upload the zip to a client.
Unzip the file
Check for the following two parameters; if they don't exist, create them with the right paths:
export SPARK_HOME=/opt/spark
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
Enable port 8998 on the client
Update $LIVY_HOME/conf/livy.conf with the master details and any other settings needed
Note: templates are there in $LIVY_HOME/conf
Eg. livy.file.local-dir-whitelist = /home/folder-where-the-jar-will-be-kept/
Run the server
$LIVY_HOME/bin/livy-server start
Stop the server
$LIVY_HOME/bin/livy-server stop
UI: <client-ip>:8998/ui/
Submitting a job: POST http://<your client ip goes here>:8998/batches
{
    "className": "<your class name goes here, with the package name>",
    "file": "<your jar location>",
    "args": ["arg1", "arg2", "arg3"]
}
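For example, posting that body from Python (the host, class name and jar path are placeholders to replace with your own):
import requests

livy_url = "http://localhost:8998/batches"  # replace localhost with your client IP
payload = {
    "className": "com.example.YourMainClass",                        # your class with package name
    "file": "/home/folder-where-the-jar-will-be-kept/your-app.jar",  # jar whitelisted in livy.conf
    "args": ["arg1", "arg2", "arg3"],
}

resp = requests.post(livy_url, json=payload, headers={"Content-Type": "application/json"})
# Livy answers with the batch id and state; progress is also visible in the UI.
print(resp.json())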