Read Glue Metadata using Lambda Function and Push it to RDS PostgreSQL

I am new to Lambda functions and need some help writing one that reads the metadata of a Glue Data Catalog database and dumps it into an RDS PostgreSQL table.
Please let me know if there is any Python code for Lambda that can support this use case.
Thanks in advance.
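For reference, a minimal sketch of one way to do this with boto3 and psycopg2 (the Glue database name, the environment variables, and the target table glue_metadata are all assumptions, and psycopg2 has to be bundled as a Lambda layer or in the deployment package):

import os
import boto3
import psycopg2  # bundle as a Lambda layer or in the deployment package

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Read table metadata from the Glue Data Catalog; "my_glue_db" is a placeholder
    rows = []
    paginator = glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=os.environ.get("GLUE_DATABASE", "my_glue_db")):
        for table in page["TableList"]:
            for col in table.get("StorageDescriptor", {}).get("Columns", []):
                rows.append((table["Name"], col["Name"], col.get("Type")))

    # Push the metadata into a hypothetical RDS table glue_metadata(table_name, column_name, column_type)
    conn = psycopg2.connect(
        host=os.environ["PG_HOST"],
        dbname=os.environ["PG_DB"],
        user=os.environ["PG_USER"],
        password=os.environ["PG_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO glue_metadata (table_name, column_name, column_type) VALUES (%s, %s, %s)",
            rows,
        )
    conn.close()
    return {"rows_inserted": len(rows)}

The function also needs IAM permission for glue:GetTables and network access to the RDS instance (typically by attaching the Lambda to the same VPC), so treat this only as a starting point.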

Related

Redshift SUPER Type in AWS Glue

I’m trying to create a continuous migration job from AWS S3 to Redshift using AWS Glue.
I wish to load object data types to Redshift as the SUPER type directly in AWS Glue.
However, when calling glueContext.write_dynamic_frame.from_jdbc_conf, if the data contains an object (struct) data type, I get the error "CSV data source does not support struct data type", and I am aware of the cause of the error.
An option would be to apply pyspark.sql.functions.to_json to the object columns and later use json_extract_path_text() when querying the objects in Redshift.
But I am hoping there is an approach in AWS Glue that supports a direct transformation and loads object-type data as SUPER (the type Amazon Redshift uses to support JSON columns).
Also, I do not want to flatten the objects; I want to keep them as is, so dynamic_frame.relationalize() is not a suitable solution either.
Any help would be greatly appreciated.
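For what it's worth, the to_json workaround mentioned in the question would look roughly like this (a sketch only; the catalog database, table name, and the struct column "details" are assumptions):

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext
from pyspark.sql import functions as F

glueContext = GlueContext(SparkContext.getOrCreate())

# Hypothetical source: a catalog table containing a struct column named "details"
dyf = glueContext.create_dynamic_frame.from_catalog(database="my_db", table_name="my_table")

# Serialize the struct column to a JSON string so the JDBC/CSV write path accepts it
df = dyf.toDF().withColumn("details", F.to_json(F.col("details")))

# Convert back to a DynamicFrame before calling write_dynamic_frame.from_jdbc_conf
json_dyf = DynamicFrame.fromDF(df, glueContext, "json_dyf")

The column then lands in Redshift as a VARCHAR rather than SUPER, so it would still be queried with json_extract_path_text(), which is exactly the extra step the question hopes to avoid.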

Create a database in PySpark using Python APIs only

I'm unable to locate any API to create a database in PySpark. All I can find is the SQL-based approach.
The Catalog documentation doesn't mention a Python method to create a database.
Is this even possible? I am trying to avoid using SQL.
I guess it might be a suboptimal solution, but you can run a CREATE DATABASE statement using SparkSession's sql method, like this:
spark.sql("CREATE DATABASE IF NOT EXISTS test_db")
It's not a pure PySpark API, but this way you don't have to switch context to SQL completely just to create a database :)
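As a small follow-up sketch (assuming a standalone SparkSession with Hive support is appropriate for your setup), you can at least verify the result through the Python Catalog API, even though it can't create the database itself:

from pyspark.sql import SparkSession

# Assumed session setup; adjust to your environment
spark = SparkSession.builder.appName("create-db-example").enableHiveSupport().getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS test_db")

# The Catalog API can list and inspect databases, just not create them
print([db.name for db in spark.catalog.listDatabases()])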

How do I efficiently migrate the BigQuery Tables to On-Prem Postgres?

I need to migrate the tables from the BigQuery to the on-prem Postgres database.
How can I efficiently achieve that?
Some thoughts that come to mind:
- Use Google APIs to export the data from the tables
- Store it locally
- Finally, import it into Postgres
But I am not sure whether that can be done for a huge amount of data (in the TBs). Also, how can I automate this process? Can I use Jenkins for that?
Exporting the data from BigQuery, storing it, and importing it into PostgreSQL is a good approach. Here are two other alternatives that you can consider:
1) There's a PostgreSQL wrapper for BigQuery that allows querying BigQuery directly. Depending on your scenario this might be the easiest way to transfer the data; although, for TBs it might not be the best approach. This suggestion was made by @David in this SO question.
2) Using Dataflow. You can create an ETL process using Apache Beam to make the transfer. Take a look at this how-to for transferring data from BigQuery to Cloud SQL. You would need to adapt it for local PostgreSQL, but the idea remains the same.
Here's another SO answer that gives more context on this approach.
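As a rough illustration of the export-and-load route (a sketch only; the project, dataset, table, bucket, and connection details are assumptions), the flow could look like this with the google-cloud-bigquery client and psycopg2:

from google.cloud import bigquery
import psycopg2

# Hypothetical identifiers; replace with your own project/dataset/table/bucket
PROJECT = "my-project"
TABLE = f"{PROJECT}.my_dataset.my_table"
GCS_URI = "gs://my-bucket/exports/my_table-*.csv"

# 1) Export the BigQuery table to GCS as CSV (the wildcard shards large tables into multiple files)
bq = bigquery.Client(project=PROJECT)
bq.extract_table(TABLE, GCS_URI).result()

# 2) After downloading the shards locally (e.g. with gsutil), bulk-load each file into Postgres
conn = psycopg2.connect("dbname=target user=postgres host=localhost")
with conn, conn.cursor() as cur, open("my_table-000000000000.csv") as f:
    cur.copy_expert("COPY my_table FROM STDIN WITH (FORMAT csv, HEADER true)", f)

For automation, a scheduler such as Jenkins or cron can run a script like this per table; for TB-scale data the Dataflow/Apache Beam option above is likely the more robust choice.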

How to call DB2 functions in Informatica ETL?

How do I call DB2 functions in Informatica ETL?
I have only used views in ETL before and have no idea how to do this for functions. Does anyone have any ideas?
@MarekGrzenkowicz is correct; it is the same as invoking stored procedures.

Transact SQL - Information Schema

Is there a way to query an information schema from DB2 and dump the contents (tables, structure only) into another database? I'm looking for a way to create a shell model of a DB2 database schema in a SQL Server database.
Thanks
You can use db2look to get the table structure (DDL) out of DB2.
Once you've got it, however, I'm afraid you'll have to manually replace any DB2-specific syntax (data types, storage parameters, etc.) with its corresponding SQL Server syntax.
It doesn't look like Microsoft's Migration Tool works for DB2 yet. :(
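If you'd rather script the extraction than use db2look, a rough sketch along these lines could read column definitions from DB2's SYSCAT.COLUMNS catalog view and print skeleton CREATE TABLE statements (the ODBC connection string and schema name are assumptions, and the emitted DB2 type names would still need to be mapped to SQL Server types by hand):

import pyodbc  # assumes an ODBC driver/DSN for DB2 is configured

# Hypothetical connection string and schema; adjust for your environment
conn = pyodbc.connect("DSN=MYDB2;UID=db2user;PWD=secret")
cur = conn.cursor()
cur.execute(
    """
    SELECT TABNAME, COLNAME, TYPENAME, LENGTH, SCALE, NULLS
    FROM SYSCAT.COLUMNS
    WHERE TABSCHEMA = 'MYSCHEMA'
    ORDER BY TABNAME, COLNO
    """
)

tables = {}
for tabname, colname, typename, length, scale, nulls in cur.fetchall():
    col_def = f"{colname} {typename}"
    if typename in ("CHARACTER", "VARCHAR"):
        col_def += f"({length})"
    elif typename == "DECIMAL":
        col_def += f"({length},{scale})"
    if nulls == "N":
        col_def += " NOT NULL"
    tables.setdefault(tabname, []).append(col_def)

# Emit one skeleton CREATE TABLE per table; DB2 types are left as-is for manual conversion
for tabname, cols in tables.items():
    print(f"CREATE TABLE {tabname} (\n  " + ",\n  ".join(cols) + "\n);")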