I am using a Custom SQL in Amazon QuickSight for joining several tables from RedShift. I wonder where the join happens, does QuickSight sends the query to the RedShift cluster and gets the results back, or does the join happen in QuickSight? I thought to create a view in RedShift and select data from the view to make sure the join happens in RedShift, however, read in few articles that using views in RedShift is not a good idea.
Quicksight pushes SQL down to the underlying database e.g. Redshift.
Using custom SQL is the same as using a view inside Redshift from a performance point of view.
In my opinion it is easier to manage as a Redshift view as you can:
Use Quicksight wizards more effectively
Drop and recreate the view as needed to add new columns
Have visibility into your SQL source code by storing it on a code
repo e.g. git.
Related
I have a glue connection which connects to redshift using jdbc.
The redshift has spectrum which automatically fetches latest data from S3 location.
I need to create a view from that spectrum while utilizing the glue connection!
I have tried using below code where I have added sql query in post_actions to refresh redshift view!
datasink1 = glueContext.write_dynamic_frame.from_jdbc_conf(frame=inputGDF_final,
catalog_connection=f"{srvc_user}",connection_options={"preactions":pre_query,"dbtable": f"
{table_dbname}.{table}","database": f"{database}","postactions": post_query},
redshift_tmp_dir=args["TempDir"],transformation_ctx="datasink1")
But issue with the above piece of code is that it creates a table in redshift and then uses that table to create view. I need to utilize the redshift spectrum and not table!
Can anyone please help!
I have an Aurora Serverless instance which has data loaded across 3 tables (mixture of standard and jsonb data types). We currently use traditional views where some of the deeply nested elements are surfaced along with other columns for aggregations and such.
We have two materialized views that we'd like to send to Redshift. Both the Aurora Postgres and Redshift are in Glue Catalog and while I can see Postgres views as a selectable table, the crawler does not pick up the materialized views.
Currently exploring two options to get the data to redshift.
Output to parquet and use copy to load
Point the Materialized view to jdbc sink specifying redshift.
Wanted recommendations on what might be most efficient approach if anyone has done a similar use case.
Questions:
In option 1, would I be able to handle incremental loads?
Is bookmarking supported for JDBC (Aurora Postgres) to JDBC (Redshift) transactions even if through Glue?
Is there a better way (other than the options I am considering) to move the data from Aurora Postgres Serverless (10.14) to Redshift.
Thanks in advance for any guidance provided.
Went with option 2. The Redshift Copy/Load process writes csv with manifest to S3 in any case so duplicating that is pointless.
Regarding the Questions:
N/A
Job Bookmarking does work. There is some gotchas though - ensure Connections both to RDS and Redshift are present in Glue Pyspark job, IAM self ref rules are in place and to identify a row that is unique [I chose the primary key of underlying table as an additional column in my materialized view] to use as the bookmark.
Using the primary key of core table may buy efficiencies in pruning materialized views during maintenance cycles. Just retrieve latest bookmark from cli using aws glue get-job-bookmark --job-name yourjobname and then just that in the where clause of the mv as where id >= idinbookmark
conn = glueContext.extract_jdbc_conf("yourGlueCatalogdBConnection")
connection_options_source = { "url": conn['url'] + "/yourdB", "dbtable": "table in dB", "user": conn['user'], "password": conn['password'], "jobBookmarkKeys":["unique identifier from source table"], "jobBookmarkKeysSortOrder":"asc"}
datasource0 = glueContext.create_dynamic_frame.from_options(connection_type="postgresql", connection_options=connection_options_source, transformation_ctx="datasource0")
That's all, folks
I am looking to find the data source of couple of Tables in Redshift. I have gone through all the stored procedures in Redshift instance. I couldn't find any stored procedure which populates these tables in Redshift. I have also checked the Data Migration Service and didn't see these tables are being migrated from RDS instance. However, the tables are updated regularly each day.
What would be the way to find how data is populated in those 2 tables? Is there any logs or system tables I can look in to?
One place I'd look is svl_statementtext. That will pull any queries and utility queries that may be inserting or running copy jobs against that table. Just use a WHERE text LIKE %yourtablenamehere% and see what comes back.
https://docs.aws.amazon.com/redshift/latest/dg/r_SVL_STATEMENTTEXT.html
Also check scheduled queries in the Redshift UI console.
I am rebuilding a system so instead of using multiple Access and Excel files the business I am working for, uses SSRS for reporting requirements. Most things are coming along great but I have one sticking point.
One of the Access databases has a table within itself rather than replying on the server data, which is there to keep the grade level of staff up to date (as it is a very complex method on how staff go up a grade).
Now I could easily build a new table in the SQL Server, but I do not want management relying on me for updating this particular table. I could also rebuild the Access database to upload the data to the server which is probably what I will do, but what I wanted to ask first is there a way to join to the table in the Access Database from the T-SQL query, as if it was another table in the main database?
Yes. Attach the database file under Server Objects as a Linked Server.
To ease referencing the table in this, create a view in your database that "hides" the needed weird triple-dot syntax, like:
SELECT FIELD1, FIELD2, FIELDN
FROM THELINKEDSERVERNAME...YourTable AS LinkedYourTable
Then use this view to read the table.
I've gotten SQL Developer to be able to connect to my RedShift cluster but I can only see the tables. I can't see any of the views? Does anyone know if that's possible or if I should just be considering a different tool to work with Redshift?
If I use other tools (i.e. Aginity) I can see the views so I know it's not a DB permission thing, it must be a JDBC+SQL-Develop thing.
What tool do you use for RedShift?
Our redshift support is presented for a singular use case: copying/moving your data to the Oracle Autonomous Data Warehouse (ADW)
Can you connect and browse to a redshift instance? Yes. But that's not the desired use case.