Using Redshift Spectrum with CloudFormation

I want to configure a Redshift Spectrum resource with a CloudFormation template. What are the CloudFormation template parameters to do so?
For example, a normal Redshift cluster can be templated like this:
myCluster:
  Type: "AWS::Redshift::Cluster"
  Properties:
    DBName: "mydb"
    MasterUsername: "master"
    MasterUserPassword:
      Ref: "MasterUserPassword"
    NodeType: "dw.hs1.xlarge"
    ClusterType: "single-node"
    Tags:
      - Key: foo
        Value: bar
What is the Spectrum equivalent?

Your template looks OK, but there is one more thing to consider: the IAM role (the IamRoles array). The CloudFormation documentation lists this as an additional property.
myCluster:
  Type: "AWS::Redshift::Cluster"
  Properties:
    DBName: "mydb"
    MasterUsername: "master"
    MasterUserPassword:
      Ref: "MasterUserPassword"
    NodeType: "dw.hs1.xlarge"
    ClusterType: "single-node"
    IamRoles:
      - "arn:aws:iam::123456789012:role/S3Access"
    Tags:
      - Key: foo
        Value: bar
The IAM role is needed to talk to the Glue / Athena catalog and authenticate your requests against your data in S3.
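For completeness, here is a minimal sketch of such a role defined alongside the cluster (the logical name SpectrumRole, the inline policy name, and the wildcard resources are placeholders; scope the S3 and Glue permissions to your own buckets and catalog objects):
SpectrumRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Principal:
            Service: "redshift.amazonaws.com"
          Action: "sts:AssumeRole"
    Policies:
      - PolicyName: "spectrum-s3-glue-access"   # hypothetical name
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Action:
                - "s3:GetObject"
                - "s3:ListBucket"
              Resource: "*"                     # narrow this to your data buckets
            - Effect: Allow
              Action:
                - "glue:GetDatabase"
                - "glue:GetTable"
                - "glue:GetPartitions"
              Resource: "*"
The cluster can then reference it with IamRoles: [ !GetAtt SpectrumRole.Arn ] instead of a hard-coded ARN.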

Amazon Redshift Spectrum is a feature of Amazon Redshift.
Simply launch a normal Amazon Redshift cluster and the features of Amazon Redshift Spectrum are available to you.
From Getting Started with Amazon Redshift Spectrum:
To use Redshift Spectrum, you need an Amazon Redshift cluster and a SQL client that's connected to your cluster so that you can execute SQL commands.

Related

Redshift query scheduler shows State: Enabled but the queries are not running. IAM permission problem?

I am attempting to use the Redshift scheduler to run an UNLOAD to S3. The database is connected properly, and when I connect and run the query from the console it works. I have enabled a scheduled query to run every hour, but it is not executing.
Am I missing something? I have set up the IAM role with the AmazonRedshiftFullAccess policy and Secrets Manager access. The trust policy has scheduler.redshift.amazonaws.com and redshift.amazonaws.com as trusted entities, and my user is allowed to assume the role.
I'm not getting any errors; it just is not running. The state is Enabled, but the S3 bucket is not updating and there are no query records in the history.
I needed to add the EventBridge permissions: events.amazonaws.com as a trusted provider and the AmazonEventBridgeFullAccess policy (a sketch of that role follows the rule below).
Using CloudFormation, I also had to set the StatementName in RedshiftDataParameters to the same value as the rule's Name:
MyRule:
  Type: AWS::Events::Rule
  Properties:
    ScheduleExpression: "cron(0/1 * ? * MON,TUE,WED,THU,FRI,SAT,SUN *)"
    Name: "my-rule"
    Targets:
      - Arn: !Sub "arn:aws:redshift:${AWS::Region}:${AWS::AccountId}:cluster:${RedshiftClusterName}"
        Id: "my-target-id"
        RoleArn: !GetAtt RoleFromUpTop.Arn
        RedshiftDataParameters:
          Database: "my_db"
          DbUser: "my_user"
          Sql: "SELECT * FROM some_table;"
          StatementName: "my-rule"
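For reference, RoleFromUpTop above is the scheduler role. A rough sketch of it with both EventBridge and the Redshift scheduler as trusted entities (the logical name and the broad managed policies simply mirror the description above; tighten them for real use):
RoleFromUpTop:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Principal:
            Service:
              - "events.amazonaws.com"               # EventBridge
              - "scheduler.redshift.amazonaws.com"   # Redshift scheduler
              - "redshift.amazonaws.com"
          Action: "sts:AssumeRole"
    ManagedPolicyArns:
      - "arn:aws:iam::aws:policy/AmazonRedshiftFullAccess"
      - "arn:aws:iam::aws:policy/AmazonEventBridgeFullAccess"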

How to create an initial database on an Amazon Aurora PostgreSQL serverless cluster

I have created a serverless DB cluster using
aws rds create-db-cluster --db-cluster-identifier psql-test --engine aurora-postgresql --engine-version 10.12 --engine-mode serverless --scaling-configuration MinCapacity=2,MaxCapacity=4,SecondsUntilAutoPause=1000,AutoPause=true --enable-http-endpoint --master-username postgres --master-user-password password
Which parameter should be used to include an initial database, like it is possible here?
You can find this information in the official API documentation or in the command's help output: you have to use --database-name nameOfYourDB.
--database-name (string)
The name for your database of up to 64 alphanumeric characters. If
you do not provide a name, Amazon RDS doesn't create a database in
the DB cluster you are creating.
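If you provision the same cluster with CloudFormation instead of the CLI, the equivalent property on AWS::RDS::DBCluster is DatabaseName. A rough sketch, reusing the values from the command above (the logical name PsqlTestCluster is a placeholder, and the password should come from a secret rather than plain text):
PsqlTestCluster:
  Type: AWS::RDS::DBCluster
  Properties:
    DBClusterIdentifier: psql-test
    Engine: aurora-postgresql
    EngineVersion: "10.12"
    EngineMode: serverless
    EnableHttpEndpoint: true
    DatabaseName: nameOfYourDB        # the initial database
    MasterUsername: postgres
    MasterUserPassword: password
    ScalingConfiguration:
      MinCapacity: 2
      MaxCapacity: 4
      SecondsUntilAutoPause: 1000
      AutoPause: true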

Postgres subchart not recommended for production environment for Airflow in Kubernetes

I am new to working with Airflow and Kubernetes. I am trying to use Apache Airflow in Kubernetes.
To deploy it I used this chart: https://github.com/apache/airflow/tree/master/chart.
When I deploy it as in the link above, a PostgreSQL database is created. When I explored the values.yaml file of the chart, I found this:
# Configuration for postgresql subchart
# Not recommended for production
postgresql:
  enabled: true
  postgresqlPassword: postgres
  postgresqlUsername: postgres
I cannot find why it is not recommended for production.
I also found this:
data:
  # If secret names are provided, use those secrets
  metadataSecretName: ~
  resultBackendSecretName: ~
  # Otherwise pass connection values in
  metadataConnection:
    user: postgres
    pass: postgres
    host: ~
    port: 5432
    db: postgres
    sslmode: disable
  resultBackendConnection:
    user: postgres
    pass: postgres
    host: ~
    port: 5432
    db: postgres
    sslmode: disable
What is recommended for production? Should I use my own PostgreSQL database outside Kubernetes? If so, how can I use it instead of this one? How do I have to modify the chart to use my own PostgreSQL?
The reason it is not recommended for production is that the chart provides a very basic Postgres setup.
In the container world, containers are transient, unlike processes in the VM world, so the likelihood of the database getting restarted or killed is high. If we run stateful components in Kubernetes, someone needs to make sure that the Pod is always running with its configured storage backend.
The following tools help to run Postgres with high availability on Kubernetes/containers and provide various other benefits:
Patroni
Stolon
We have used Stolon to run 80+ Postgres instances on Kubernetes in a microservices environment. These are public-facing products, so the services are heavily loaded as well.
It's very easy to set up a Stolon cluster once you understand its architecture. Apart from HA, it also provides replication, standby clusters, and a CLI for cluster administration.
Please also consider this blog when making your decision; it brings in the perspective of how much Ops effort is involved in the different solutions.
Managing databases in Kubernetes is a pain and not recommended, because scaling, replication, backups, and other common tasks are not as easy to do. What you should do is set up your own Postgres on a VM or use a managed cloud service such as RDS (AWS) or Cloud SQL (GCP). More information:
https://cloud.google.com/blog/products/databases/to-run-or-not-to-run-a-database-on-kubernetes-what-to-consider
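If you do go with an external database, a minimal values override for the chart might look like this (host, credentials, database name, and sslmode are placeholders; the keys mirror the excerpt above):
postgresql:
  enabled: false                    # do not deploy the bundled subchart
data:
  metadataConnection:
    user: airflow_user              # hypothetical credentials
    pass: airflow_password
    host: my-external-postgres.example.com
    port: 5432
    db: airflow
    sslmode: require
  resultBackendConnection:
    user: airflow_user
    pass: airflow_password
    host: my-external-postgres.example.com
    port: 5432
    db: airflow
    sslmode: require
Alternatively, per the comments in the chart's values, you can set metadataSecretName / resultBackendSecretName to pull the connection details from existing Kubernetes secrets instead of putting credentials in the values file.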

Amazon Aurora PostgreSQL SELECT INTO OUTFILE S3

We are trying to export data from an Amazon Aurora PostgreSQL database to an S3 bucket. The code being used looks like this:
SELECT * FROM analytics.my_test INTO OUTFILE S3
's3-us-east-2://myurl/sampledata'
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
MANIFEST ON
OVERWRITE ON;
All permissions have been set up, but we get the error:
SQL Error [42601]: ERROR: syntax error at or near "INTO" Position: 55
Does this only work with a MySQL database?
It is a fairly new feature on Aurora Postgres, but it is possible to export a query result into a file on S3: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/postgresql-s3-export.html#postgresql-s3-export-file
The syntax is not the same as for MySQL though. For Postgres it is:
SELECT * from aws_s3.query_export_to_s3('select * from sample_table',
aws_commons.create_s3_uri('sample-bucket', 'sample-filepath', 'us-west-2')
);
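Note that the aws_s3 extension has to be installed in the database first, and the cluster needs an IAM role associated for the export feature. In CloudFormation that association can be expressed with AssociatedRoles on AWS::RDS::DBCluster; a hedged sketch (the logical names and the role itself are placeholders):
MyAuroraCluster:
  Type: AWS::RDS::DBCluster
  Properties:
    Engine: aurora-postgresql
    # ...other cluster properties...
    AssociatedRoles:
      - FeatureName: s3Export
        RoleArn: !GetAtt AuroraS3ExportRole.Arn   # a role allowing s3:PutObject on the target bucket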
I believe saving SQL SELECT output data to S3 ONLY works for Amazon Aurora MySQL. I don't see any reference in the official documentation that mentions the same for Amazon Aurora PostgreSQL.
Here are the snippets from the official documentation that I referred to:
Integrating Amazon Aurora MySQL with Other AWS Services
Amazon Aurora MySQL integrates with other AWS services so that you can
extend your Aurora MySQL DB cluster to use additional capabilities in
the AWS Cloud. Your Aurora MySQL DB cluster can use AWS services to do
the following:
Synchronously or asynchronously invoke an AWS Lambda function using
the native functions lambda_sync or lambda_async. For more
information, see Invoking a Lambda Function with an Aurora MySQL
Native Function.
Load data from text or XML files stored in an Amazon Simple Storage
Service (Amazon S3) bucket into your DB cluster using the LOAD DATA
FROM S3 or LOAD XML FROM S3 command. For more information, see Loading
Data into an Amazon Aurora MySQL DB Cluster from Text Files in an
Amazon S3 Bucket.
Save data to text files stored in an Amazon S3 bucket from your DB
cluster using the SELECT INTO OUTFILE S3 command. For more
information, see Saving Data from an Amazon Aurora MySQL DB Cluster
into Text Files in an Amazon S3 Bucket.
Automatically add or remove Aurora Replicas with Application Auto
Scaling. For more information, see Using Amazon Aurora Auto Scaling
with Aurora Replicas.
Integrating Amazon Aurora PostgreSQL with Other AWS Services
Amazon Aurora integrates with other AWS services so that you can
extend your Aurora PostgreSQL DB cluster to use additional
capabilities in the AWS Cloud. Your Aurora PostgreSQL DB cluster can
use AWS services to do the following:
Quickly collect, view, and assess performance for your Aurora
PostgreSQL DB instances with Amazon RDS Performance Insights.
Performance Insights expands on existing Amazon RDS monitoring
features to illustrate your database's performance and help you
analyze any issues that affect it. With the Performance Insights
dashboard, you can visualize the database load and filter the load by
waits, SQL statements, hosts, or users.
For more information about Performance Insights, see Using Amazon RDS
Performance Insights.
Automatically add or remove Aurora Replicas with Aurora Auto Scaling.
For more information, see Using Amazon Aurora Auto Scaling with Aurora
Replicas.
Configure your Aurora PostgreSQL DB cluster to publish log data to
Amazon CloudWatch Logs. CloudWatch Logs provide highly durable storage
for your log records. With CloudWatch Logs, you can perform real-time
analysis of the log data, and use CloudWatch to create alarms and view
metrics. For more information, see Publishing Aurora PostgreSQL Logs
to Amazon CloudWatch Logs.
There is no mention of saving data to S3 for PostgreSQL.

CloudFormation RDS Aurora: Invalid Storage Type

Following is my CloudFormation script to create an RDS instance.
I am trying to create Amazon Aurora with PostgreSQL compatibility, but I am facing an Invalid Storage Type: gp2 error.
SnapshotRDSDBInstance:
  Type: AWS::RDS::DBInstance
  Properties:
    AllocatedStorage: 20
    DBInstanceClass: 'db.t3.medium'
    DBName: mydatabase
    StorageType: gp2
    Engine: aurora-postgresql
    PubliclyAccessible: true
    MultiAZ: false
    DBSubnetGroupName: !Ref SnapshotRDSDBSubnetGroup
    VPCSecurityGroups:
      - !Ref SnapshotRDSDBSG
    MasterUsername: 'test'
    MasterUserPassword: 'Demo#123'
    BackupRetentionPeriod: 15
    DBInstanceIdentifier: 'myrds'
I also tried removing the StorageType parameter from the above script, but then I get an Invalid storage type: standard error.
I am not able to understand the root cause.
I am using the ap-south-1 (Mumbai) region to launch this script.
Aurora instances need to be associated with an AWS::RDS::DBCluster via DBClusterIdentifier; without the cluster you get these generic storage errors.
In order to understand your use case, please clarify the questions below:
Are you trying to configure Aurora in cluster mode, with writer and reader instances?
Are you trying to create Aurora Serverless?
If you are planning to go with option 1, you first need to create a cluster using AWS::RDS::DBCluster and then add writer and reader instances using AWS::RDS::DBInstance.
If you are planning to go with option 2, just use AWS::RDS::DBCluster.
Going with either option, using the correct CloudFormation resources, will eliminate your storage error.
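As a minimal sketch of option 1 for the Aurora PostgreSQL case in the question (networking and credential handling kept as in the original template; logical names are placeholders):
SnapshotRDSDBCluster:
  Type: AWS::RDS::DBCluster
  Properties:
    Engine: aurora-postgresql
    DatabaseName: mydatabase
    MasterUsername: 'test'
    MasterUserPassword: 'Demo#123'
    DBSubnetGroupName: !Ref SnapshotRDSDBSubnetGroup
    VpcSecurityGroupIds:
      - !Ref SnapshotRDSDBSG
    BackupRetentionPeriod: 15

SnapshotRDSDBInstance:
  Type: AWS::RDS::DBInstance
  Properties:
    Engine: aurora-postgresql
    DBInstanceClass: 'db.t3.medium'
    DBClusterIdentifier: !Ref SnapshotRDSDBCluster
    PubliclyAccessible: true
    # No AllocatedStorage or StorageType here: Aurora manages storage at the cluster level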