Is there a way to connect Apache airflow with Superset? - visualization

Can we connect Superset with Airflow? So that when a task gets completed, Airflow can trigger the Superset to fetch the data from a data source?

Superset is a visualization tool which can display dashboards for multiple database types. While Airflow is a orchestration tool to manage job.
What Airflow can do is to ask the task to fetch data into database where Superset can consume.

Related

What are the best tools to schedule Snowflake tasks or python scripts in Ec2 to load data into snowflake?

Please share your experiences wrt orchestrating jobs run through various tools and programmatic interfaces to load data to Snowflake-
python scripts in Ec2 instances. currently scheduled using crontab.
tasks in snowflake
Alteryx workflows
Are there any tools with sophisticated UI to create job workflows with dependencies?
The workflow can have -
python script followed by a task
Alteryx workflow followed by a python script and then a task
If any job fails then it should send emails to the team.
Thanks
We have used both CONTROL-M and Apache Airflow to schedule and orchestrate data load to snowflake

Airflow database performance impact due to multiple connection to database

I have setup Airflow 1.10 to schedule python DAGs. I am also working on other project, which would need data from backend postgresql external database of Airflow (querying at every regular 5 min interval).
Now I am trying to understand the impact on Airflow database performance due to multiple connections on Airflow. Accordingly, I will plan my approach to get the data from Airflow database for other purpose

How can Spring Cloud Dataflow Server use new tables( with custom prefix ) created for Spring batch and Spring cloud task?

I have created spring cloud task tables i.e. TASK_EXECUTION, TASK_TASK_BATCH with prefix as MYTASK_ and spring batch Tables with prefix MYBATCH_ in oracle database.
There are default tables also there in the same schema which got created automatically or by other team mate.
I have bound my Oracle database service to SCDF server deployed on PCF.
How can i tell my Spring Cloud Dataflow server to use tables created with my prefix to render data on dataflow server dashboard?
Currently, SCDF dashboard uses tables with default prefix to render data. It works fine. I want to use my tables to render SCDF dashboard screens.
I am using Dataflowserver version - 1.7.3 and Deployed it on PCF using manifest.yml
There's an open story to add this enhancement via spring-cloud/spring-cloud-dataflow#2048.
Feel free to consider contributing or share use-case details in the issue.
Currently in a spring-cloud-dataflow and spring-cloud-skipper we use flyway to manage database schemas and it's not possible to prefix table names. Trying to support for this would add too much complexity and I'm not even sure if it'd be possible.

AWS DMS - Scheduled DB Migration

I have Postgresql db in RDS. I need to fetch data from a bunch of tables in postgresql db and push data into a S3 bucket every hour. I only want the delta changes (any new inserts / updates) to be sent in the hourly. Is it possible to do this using DMS or is EMR a better tool for performing this activity?
You can create an automated environment of migration data from RDS to S3 using AWS DMS (Data Migration Service) tasks.
Create a source endpoint (reading your RDS database - Postgres, MySQL, Oracle, etc...);
Create a target endpoint using S3 as an engine endpoint (read it: Using Amazon S3 as a Target for AWS Database Migration Service);
Create a replication instance, responsible to make a bridge between source data and target endpoint (you will only pay while processing);
Create a database migration task using the option 'Replication data change only' on migration type field;
Create a cron lambda, which starts a DMS task, with stack Python following these instructions of this articles Lambda with scheduled events e Start DMS tasks with boto3 in Python.
Connecting these resources above you may can have what you want.
Regards,
Renan S.

Loading data from S3 to PostgreSQL RDS

We are planning to go for PostgreSQL RDS in AWS environment. There are some files in S3 which we will need to load every week. I don't see any option in AWS documentation where we can load data from S3 to PostgreSQL RDS. I see it is possible for Aurora but cannot find anything for PostgreSQL.
Any help will be appreciated.
One option is to use AWS Data Pipeline. It's essentially a JSON script that allows you to orchestrate the flow of data between sources on AWS.
There's a template offered by AWS that's setup to move data between S3 and MySQL. You can find it here. You can easily follow this and swap out the MySQL parameters with those associated with your Postgres instance. Data Pipeline simply looks for RDS as the type and does not distinguish between MySQL and Postgres instances.
Scheduling is also supported by Data Pipeline, so you can automate your weekly file transfers.
To start this:
Go to the Data Pipeline service in your AWS console
Select "Build from template" under source
Select the "Load S3 to MySQL table" template
Fill in the rest of the fields and create the pipeline
From there, you can monitor the progress of the pipeline in the console!