I have an existing Kinesis stream and my aim is to connect to it from a Lambda function and process the records.
I created the Lambda with the VS Code AWS Toolkit extension via "Create new SAM Application". I put some test records onto the stream using boto3 in Python. Every time I invoke the Lambda locally in debug mode, the event is an empty object, so there are no records to parse.
I can connect to the Kinesis stream and retrieve the records in Python using boto3, which confirms that the records exist.
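For reference, here is a rough sketch of the kind of boto3 calls I'm using to put and read the test records (the stream name, region and payload are placeholders, not my real values):

import json

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# put a test record onto the stream
kinesis.put_record(
    StreamName="my-test-stream",
    Data=json.dumps({"hello": "world"}),
    PartitionKey="test",
)

# read it back to confirm it is really there
shard_id = kinesis.describe_stream(StreamName="my-test-stream")["StreamDescription"]["Shards"][0]["ShardId"]
shard_iterator = kinesis.get_shard_iterator(
    StreamName="my-test-stream",
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]
print(kinesis.get_records(ShardIterator=shard_iterator)["Records"])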
Here is my template.yaml
Globals:
  Function:
    Timeout: 60

Resources:
  KinesisRecors:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: kinesis_records/
      Handler: app.lambda_handler
      Runtime: python3.8
      Events:
        KinesisEvent:
          Type: Kinesis
          Properties:
            Stream: arn:aws:....
            StartingPosition: TRIM_HORIZON
            BatchSize: 10
            Enabled: false
I have also tested with Enabled: true, with no success.
The Lambda function:
import base64

def lambda_handler(event, context):
    for record in event['Records']:
        payload = base64.b64decode(record["kinesis"]["data"])
Is it possible to invoke the function locally and get records?
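One approach I'm experimenting with is handing sam local invoke a sample event file instead of relying on the event source mapping (sam local doesn't poll the stream for you). The shape below is trimmed to just what the handler reads, so treat it as an assumption rather than the full Kinesis event format:

import base64
import json

# payload encoded the way Kinesis delivers it
event = {
    "Records": [
        {
            "kinesis": {
                "data": base64.b64encode(b'{"hello": "world"}').decode("utf-8")
            }
        }
    ]
}

with open("event.json", "w") as f:
    json.dump(event, f)

# then: sam local invoke KinesisRecors -e event.json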
Great Expectations creates temporary tables. I tried profiling data in my Snowflake lab. It worked because the role I was using could create tables in the schema that contained the tables I was profiling.
I tried to profile a table in a Snowflake share, where we can't create objects, and it failed:
(snowflake.connector.errors.ProgrammingError) 002003 (02000): SQL compilation error:
Schema 'OUR_DATABASE.SNOWFLAKE_SHARE_SCHEMA' does not exist or not authorized.
[SQL: CREATE OR REPLACE TEMPORARY TABLE ge_temp_3eb6c50b AS SELECT *
FROM "SNOWFLAKE_SHARE_SCHEMA"."INTERESTING_TABLE"
WHERE true]
(Background on this error at: https://sqlalche.me/e/14/f405)
Here's the output from the CLI:
% great_expectations suite new
Using v3 (Batch Request) API
How would you like to create your Expectation Suite?
1. Manually, without interacting with a sample batch of data (default)
2. Interactively, with a sample batch of data
3. Automatically, using a profiler
: 3
A batch of data is required to edit the suite - let's help you to specify it.
Select data_connector
1. default_runtime_data_connector_name
2. default_inferred_data_connector_name
3. default_configured_data_connector_name
: 3
Which data asset (accessible by data connector "default_configured_data_connector_name") would you like to use?
1. INTERESTING_TABLE
Type [n] to see the next page or [p] for the previous. When you're ready to select an asset, enter the index.
: 1
Name the new Expectation Suite [INTERESTING_TABLE.warning]:
Great Expectations will create a notebook, containing code cells that select from available columns in your dataset and
generate expectations about them to demonstrate some examples of assertions you can make about your data.
When you run this notebook, Great Expectations will store these expectations in a new Expectation Suite "INTERESTING_TABLE.warning" here:
file:///path/to-my-repo/great_expectations/expectations/INTERESTING_TABLE/warning.json
Would you like to proceed? [Y/n]: Y
Here's the datasources section from great_expectations.yml:
datasources:
  our_snowflake:
    class_name: Datasource
    module_name: great_expectations.datasource
    execution_engine:
      module_name: great_expectations.execution_engine
      credentials:
        host: xyz92716.us-east-1
        username: MYUSER
        query:
          schema: MYSCHEMA
          warehouse: MY_WAREHOUSE
          role: RW_ROLE
        password: password1234
        drivername: snowflake
      class_name: SqlAlchemyExecutionEngine
    data_connectors:
      default_runtime_data_connector_name:
        class_name: RuntimeDataConnector
        batch_identifiers:
          - default_identifier_name
        module_name: great_expectations.datasource.data_connector
      default_inferred_data_connector_name:
        include_schema_name: true
        class_name: InferredAssetSqlDataConnector
        introspection_directives:
          schema_name: SNOWFLAKE_SHARE_SCHEMA
        module_name: great_expectations.datasource.data_connector
      default_configured_data_connector_name:
        assets:
          INTERESTING_TABLE:
            schema_name: SNOWFLAKE_SHARE_SCHEMA
            class_name: Asset
            module_name: great_expectations.datasource.data_connector.asset
        class_name: ConfiguredAssetSqlDataConnector
        module_name: great_expectations.datasource.data_connector
How can I tweak great_expectations.yml so that temporary objects are created in a separate database and schema from the datasource?
As a workaround, we created a view in the schema with read/write that points to the data in the read-only share. That adds an extra step. I'm hoping there's a simple config to create temporary objects outside the schema being profiled.
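One thing I'm experimenting with, in case it helps: rather than relocating the temporary table, some Great Expectations releases accept a create_temp_table option that skips it entirely. I'm not certain my version honors it, so treat the sketch below as an assumption to verify rather than a documented fix:

from great_expectations.core.batch import BatchRequest
from great_expectations.data_context import DataContext

context = DataContext()

# same datasource / connector / asset names as in great_expectations.yml above
batch_request = BatchRequest(
    datasource_name="our_snowflake",
    data_connector_name="default_configured_data_connector_name",
    data_asset_name="INTERESTING_TABLE",
    # ask the SqlAlchemy execution engine not to create a temporary table
    batch_spec_passthrough={"create_temp_table": False},
)

validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name="INTERESTING_TABLE.warning",
)
print(validator.head())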
I'm getting started with Azure Functions and Cosmos DB.
I created a function app in the Azure Portal, then followed the guide to get started in VS Code:
npm install -g azure-functions-core-tools@4 --unsafe-perm true
Then New Project, then New Function, selecting the HTTP trigger template.
When running with F5, and once deployed, it works.
Then I created, in the portal, an "Azure Cosmos DB API for MongoDB" database. I followed this guide to publish a document when my method is called:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-add-output-binding-cosmos-db-vs-code?tabs=in-process&pivots=programming-language-csharp
So my current result is:
a function:
namespace TakeANumber
{
    public static class TestFunc
    {
        [FunctionName("TestFunc")]
        public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)] HttpRequest req,
            [CosmosDB(databaseName: "cosmodb-take-a-number", collectionName: "take-a-number", ConnectionStringSetting = "cosmoDbConnectionString")] IAsyncCollector<dynamic> documentsOut,
            ILogger log)
        {
            log.LogInformation("C# HTTP trigger function processed a request.");

            string name = req.Query["name"];

            string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
            dynamic data = JsonConvert.DeserializeObject(requestBody);
            name = name ?? data?.name;

            string responseMessage = string.IsNullOrEmpty(name)
                ? "This HTTP triggered function executed successfully. Pass a name in the query string or in the request body for a personalized response."
                : $"Hello, {name}. This HTTP triggered function executed successfully.";

            if (!string.IsNullOrEmpty(name))
            {
                // Add a JSON document to the output container.
                await documentsOut.AddAsync(new
                {
                    // create a random ID
                    id = System.Guid.NewGuid().ToString(),
                    name = name
                });
            }

            return new OkObjectResult(responseMessage);
        }
    }
}
A local.settings.json file with a cosmoDbConnectionString setting that contains a MongoDB connection string.
When I run the function, I get this:
[2022-04-21T17:40:34.078Z] Executed 'TestFunc' (Failed, Id=b69a625c-9055-48bd-a5fb-d3c3b3a6fb9b, Duration=4ms)
[2022-04-21T17:40:34.079Z] System.Private.CoreLib: Exception while executing function: TestFunc. Microsoft.Azure.WebJobs.Host: Exception binding parameter 'documentsOut'. Microsoft.Azure.DocumentDB.Core: Value cannot be null. (Parameter 'authKeyOrResourceToken | secureAuthKey').
My guess is that it's expecting a Core SQL database, with another kind of access token.
My question:
Is it possible to connect to an Azure Cosmos DB for MongoDB from an Azure Function?
If you're using out-of-the-box bindings, you can only use Cosmos DB's SQL API.
You can totally use the MongoDB API, but you'd have to install a MongoDB client SDK and work with your data programmatically (just like you'd do with any other code-oriented approach).
Since your sample code is taking data in, and writing out to Cosmos DB, you'd do your writes via MongoDB's node/c#/python/etc driver (I believe they still call them drivers), which effectively gives you a db.collection.insert( {} ) or something more complex.
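For illustration, here's a minimal pymongo sketch of that kind of write (the connection string, database and collection names are placeholders; they'd come from your Cosmos DB for MongoDB account rather than the binding):

import uuid

from pymongo import MongoClient

# connection string from the Azure portal's "Connection strings" blade
client = MongoClient("<your Cosmos DB for MongoDB connection string>")
collection = client["cosmodb-take-a-number"]["take-a-number"]

# roughly what the CosmosDB output binding was doing in the SQL-API case
collection.insert_one({"id": str(uuid.uuid4()), "name": "Chris"})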
More info about Cosmos DB bindings here.
As CloudFormation does not natively support creating a DB User for an RDS Database, I am looking for ways to do this via CustomResource. However, even if I write a CustomResource backed by a Lambda function, I do not see an RDS API endpoint that would allow me to add a user to a database instance.
Could anyone suggest potential ways to create a DB User for an Aurora Cluster backed by Postgres 10 database engine?
I do not see an RDS API endpoint that would allow me to add a user to a database instance.
Usually you would set your custom resource to trigger after the RDS instance is created. That way you can pass the RDS endpoint URL to the Lambda using, for example, function environment variables.
In practice, a DependsOn attribute on your custom resource can be used to ensure that it triggers after the RDS instance is successfully created. It's not really needed if you pass the RDS URL through environment variables, since that reference already creates an implicit dependency.
Update: example code for a Lambda which uses pymysql:
MyLambdaFunction:
  Type: AWS::Lambda::Function
  Properties:
    Handler: index.lambda_handler
    Role: !Ref ExecRoleArn
    Runtime: python3.7
    Environment:
      Variables:
        DB_HOSTNAME: !Ref DbHostname
        DB_USER: !Ref DbMasterUsername
        DB_PASSWORD: !Ref DbMasterPassword
        DB_NAME: !Ref DbName
    VpcConfig:
      SecurityGroupIds: [!Ref SecurityGroupId]
      SubnetIds: !Ref SubnetIds
    Code:
      ZipFile: |
        import base64
        import json
        import os
        import logging
        import random
        import sys
        import pymysql
        import boto3

        rds_host = os.environ['DB_HOSTNAME']
        rds_user = os.environ['DB_USER']
        rds_password = os.environ['DB_PASSWORD']
        rds_dbname = os.environ['DB_NAME']

        logger = logging.getLogger()
        logger.setLevel(logging.INFO)

        try:
            conn = pymysql.connect(host=rds_host,
                                   user=rds_user,
                                   passwd=rds_password,
                                   db=rds_dbname,
                                   connect_timeout=5)
        except Exception:
            logger.error("ERROR: Unexpected error: Could not connect to MySQL instance.")
            sys.exit()

        def lambda_handler(event, context):
            print(json.dumps(event))
            with conn.cursor() as cur:
                cur.execute("create table if not exists Employee (EmpID int NOT NULL auto_increment, Name varchar(255) NOT NULL, PRIMARY KEY (EmpID))")
            conn.commit()
            return {
                'statusCode': 200,
                'body': ""
            }
    Timeout: 60
    MemorySize: 128
    Layers:
      - arn:aws:lambda:us-east-1:113088814899:layer:Klayers-python37-PyMySQL:1
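One detail the inline example above leaves out: a Lambda that backs a CloudFormation custom resource also has to report success or failure back to CloudFormation, otherwise the stack waits until it times out. A rough sketch of how the handler could be extended, using the cfnresponse helper that is available to inline (ZipFile) Python code; the CREATE USER statement is only illustrative, following the pymysql example above:

import cfnresponse

def lambda_handler(event, context):
    try:
        # "conn" is the module-level pymysql connection from the ZipFile code above
        if event["RequestType"] in ("Create", "Update"):
            with conn.cursor() as cur:
                # placeholder statement; replace with the user/grants you actually need
                cur.execute("CREATE USER IF NOT EXISTS 'app_user'@'%' IDENTIFIED BY 'change-me'")
            conn.commit()
        cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
    except Exception as err:
        cfnresponse.send(event, context, cfnresponse.FAILED, {"Error": str(err)})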
I am trying to upload a MongoDB backup to Google Drive.
I am installing the following bundles: dizda/cloud-backup-bundle and Happyr/GoogleSiteAuthenticatorBundle; for adapters I am using cache/adapter-bundle.
Configuration:
dizda_cloud_backup:
    output_file_prefix: '%dizda_hostname%'
    timeout: 300
    processor:
        type: zip # Required: tar|zip|7z
        options:
            compression_ratio: 6
            password: '%dizda_compressed_password%'
    cloud_storages:
        google_drive:
            token_name: 'AIzaSyA4AE21Y-YqneV5f9POG7MPx4TF1LGmuO8' # Required
            remote_path: ~ # Not required, default "/", but you can use a path like "/Accounts/backups/"
    databases:
        mongodb:
            all_databases: false # Only required when no database is set
            database: '%database_name%'
            db_host: '%mongodb_backup_host%'
            db_port: '%mongodb_port%'
            db_user: '%mongodb_user%'
            db_password: '%mongodb_password%'

cache_adapter:
    providers:
        my_redis:
            factory: 'cache.factory.redis'

happyr_google_site_authenticator:
    cache_service: 'cache.provider.my_redis'
    tokens:
        google_drive:
            client_id: '85418079755-28ncgsoo91p69bum6ulpt0mipfdocb07.apps.googleusercontent.com'
            client_secret: 'qj0ipdwryCNpfbJQbd-mU2Mu'
            redirect_url: 'http://localhost:8000/googledrive/'
            scopes: ['https://www.googleapis.com/auth/drive']
When I use factory: 'cache.factory.mongodb', running the server gives
You have requested a non-existent service "cache.factory.mongodb"
and running the backup command gives
Something went terribly wrong. We could not create a backup. Read your log files to see what caused this error
I checked the logs and found Command "--env=prod dizda:backup:start" exited with code "1" {"command":"--env=prod dizda:backup:start","code":1} []
I am not sure which adapter needs to be used or what's going on here.
Can someone help me? Thanks in advance.
I wrote a Lambda function in Python 3.6 to access a PostgreSQL database that is running on an EC2 instance.
psycopg2.connect(user="<USER NAME>",
                 password="<PASSWORD>",
                 host="<EC2 IP Address>",
                 port="<PORT NUMBER>",
                 database="<DATABASE NAME>")
I created a deployment package with the required dependencies as a zip file and uploaded it to AWS Lambda. To build the dependencies I followed THIS reference guide.
I also configured the VPC (the default one) and included the EC2 instance details, but I couldn't get a connection to the database: trying to connect to the database from the Lambda results in a timeout.
Lambda function:
from __future__ import print_function
import json
import ast, datetime
import psycopg2

def lambda_handler(event, context):
    received_event = json.dumps(event, indent=2)
    load = ast.literal_eval(received_event)
    connection = None
    try:
        connection = psycopg2.connect(user="<USER NAME>",
                                      password="<PASSWORD>",
                                      host="<EC2 IP Address>",
                                      # host="localhost",
                                      port="<PORT NUMBER>",
                                      database="<DATABASE NAME>")
        cursor = connection.cursor()
        postgreSQL_select_Query = "select * from test_table limit 10"
        cursor.execute(postgreSQL_select_Query)
        print("Selecting rows from mobile table using cursor.fetchall")
        mobile_records = cursor.fetchall()
        print("Print each row and its column values")
        for row in mobile_records:
            print("Id = ", row[0])
    except Exception as error:
        print("Error while fetching data from PostgreSQL", error)
    finally:
        # closing database connection.
        if connection:
            cursor.close()
            connection.close()
            print("PostgreSQL connection is closed")
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!'),
        'dt': str(datetime.datetime.now())
    }
I googled quite a lot, but I couldn't find a workaround for this. Is there any way to accomplish this?
Your configuration would need to be:
A database in a VPC
The Lambda function configured to use the same VPC as the database
A security group on the Lambda function (Lambda-SG)
A security group on the database (DB-SG) that permits inbound connections from Lambda-SG on the relevant database port
That is, DB-SG refers to Lambda-SG.
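For example, a minimal boto3 sketch of adding that DB-SG rule (the security group IDs are placeholders, and 5432 assumes PostgreSQL's default port):

import boto3

ec2 = boto3.client("ec2")

# allow inbound traffic on the DB port from the Lambda function's security group
ec2.authorize_security_group_ingress(
    GroupId="sg-0aaaaaaaaaaaaaaaa",  # DB-SG
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 5432,
            "ToPort": 5432,
            "UserIdGroupPairs": [{"GroupId": "sg-0bbbbbbbbbbbbbbbb"}],  # Lambda-SG
        }
    ],
)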
For Lambda to connect to any resources inside a VPC, it needs to set up ENIs in the related private subnets of the VPC. Have you set up the VPC association and the security groups of the EC2 instance correctly?
You can refer to https://docs.aws.amazon.com/lambda/latest/dg/vpc.html