How do I read a Lake database in Azure Synapse in a PySpark notebook?

Hi, I created a database in Azure Synapse Studio and I can see the database and table there. Now I have created a notebook where I have added the required libraries, but I am unable to read the table with the code below. Can anyone tell me what I am doing wrong?
My database name is Utilities_66_Demo. It gives me this error:
AnalysisException: Path does not exist:
abfss://users@stcdmsynapsedev01.dfs.core.windows.net/Utilities_66_Demo.parquet
Where should I take the path from? I tried to follow the MS article. Where do I read the path? If I click on Edit Database, I get this:
%%pyspark
df = spark.read.load('abfss://users@stcdmsynapsedev01.dfs.core.windows.net/Utilities_66_Demo.parquet', format='parquet')
display(df.limit(10))
Trying to access the created Lake database table:
I selected Azure Synapse Analytics:
I select my workspace, and in the dropdown there is no table shown.
I select Edit and enter my database name and table name, and it says invalid details.
Now I select Azure Dedicated Synapse Pool from the linked service. I get no option to select a SQL pool or table, and without a SQL pool I am unable to create a linked service just by entering the table name.

You can go directly to your ADLS account, right-click the parquet file and select Properties. There you will find the ABFSS path, which is in the format:
abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path>
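For example, once you have copied the full path from the Properties pane, the read looks roughly like this (the container, storage account, and folder names here are only placeholders to substitute with your own):
%%pyspark
# Hypothetical path copied from the parquet file's Properties pane in ADLS
path = 'abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<folder>/Utilities_66_Demo.parquet'
df = spark.read.load(path, format='parquet')
display(df.limit(10))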

Related

Export Amazon RDS into S3 or locally

I am using Amazon RDS Aurora PostgreSQL 10.18. I need to export specific tables with more than 50,000 rows into a CSV file (either locally or into an S3 bucket). I have tried several approaches, but all of them failed:
I tried the export to CSV button in the query editor after selecting all rows, but the API responded that the data was too large to return.
I tried to use aws_s3.query_export_to_s3, but got ERROR: (credentials stored with the database cluster can't be accessed. Hint: Has the IAM role Amazon Resource Name (ARN) been associated with the feature-name "s3Export"?)
I tried to take a snapshot of our instance and then export it into an S3 bucket, but that ended with the error (The specified db snapshot engine mode isn't supported and can't be exported).

How to read tables from a Synapse database using PySpark

I am a newbie to Azure Synapse and I have to work in an Azure Spark notebook. One of my colleagues connected the on-premises database using an Azure linked service. Now I have written a test framework for comparing the on-premises data and the data lake (curated) data, but I don't understand how to read those tables using PySpark.
Here is my linked service data structure.
Here are my linked service names and database name.
You can read any file stored in a Synapse linked location as a table by using the Azure Synapse Dedicated SQL Pool Connector for Apache Spark.
First you need to read the file that you want to use as a table in Synapse. Use the code below to read the file.
%%pyspark
df = spark.read.load('abfss://sampleadls2@sampleadls1.dfs.core.windows.net/business.csv', format='csv', header=True)
Then convert this DataFrame into a table using the code below:
%%pyspark
spark.sql("CREATE DATABASE IF NOT EXISTS business")
df.write.mode("overwrite").saveAsTable("business.data")
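If you want to confirm that the table was registered, a quick check (just a sketch, reusing the database name created above) is:
%%pyspark
# List the tables registered under the "business" database created above
display(spark.sql("SHOW TABLES IN business"))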
Now you can run any Spark SQL command on this table as shown below:
%%pyspark
data = spark.sql("SELECT * FROM business.data")
display(data)

o110.pyWriteDynamicFrame. null

I have created a visual job in AWS Glue where I extract data from Snowflake; my target is a PostgreSQL database in AWS.
I have been able to connect to both Snowflake and PostgreSQL, and I can preview data from both.
I have also been able to get data from Snowflake, write it to S3 as CSV, and then take that CSV and upload it to PostgreSQL.
However, when I try to get data from Snowflake and push it to PostgreSQL directly, I get the error below:
o110.pyWriteDynamicFrame. null
This means that you can get the data from Snowflake into a DynamicFrame, but the job fails while writing that frame to PostgreSQL.
You need to check the AWS Glue logs to get a better understanding of why the write to PostgreSQL is failing.
Also check that you have the right version of the JDBC driver jars (needed by PostgreSQL), compatible with the Scala/Spark version on the AWS Glue side.
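For reference, a minimal sketch of the write step in a Glue PySpark job looks roughly like this (the connection name, database, and table below are hypothetical placeholders, and df is assumed to be the Spark DataFrame already read from Snowflake):
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame

glue_context = GlueContext(SparkContext.getOrCreate())

# Convert the Spark DataFrame read from Snowflake into a Glue DynamicFrame
dyf = DynamicFrame.fromDF(df, glue_context, "snowflake_data")

# Write the DynamicFrame to PostgreSQL through a Glue catalog connection;
# this is the step that surfaces o110.pyWriteDynamicFrame errors when the
# JDBC driver or connection details are wrong
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=dyf,
    catalog_connection="my-postgres-connection",  # hypothetical Glue connection name
    connection_options={"dbtable": "public.target_table", "database": "mydb"},
)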

Insert data into Redshift from Windows txt files

I have 50 txt files on Windows and I would like to insert their data into a single table on Redshift.
I created the basic table structure and now I'm having issues with inserting the data. I tried using the COPY command from SQL Workbench/J but it didn't work out.
Here's the command:
copy feed
from 'F:\Data\feed\feed1.txt'
credentials 'aws_access_key_id=<access>;aws_secret_access_key=<key>'
Here's the error:
-----------------------------------------------
error: CREDENTIALS argument is not supported when loading from file system
code: 8001
context:
query: 0
location: xen_load_unload.cpp:333
process: padbmaster [pid=1970]
-----------------------------------------------;
Upon removing the Credentials argument, here's the error I get:
[Amazon](500310) Invalid operation: LOAD source is not supported. (Hint: only S3 or DynamoDB or EMR based load is allowed);
I'm not a UNIX user so I don't really know how this should be done. Any help in this regard would be appreciated.
@patthebug is correct in that Redshift cannot see your local Windows drive. You must push the data into an S3 bucket. There are some additional sources you can use per http://docs.aws.amazon.com/redshift/latest/dg/t_Loading_tables_with_the_COPY_command.html, but they seem outside the context you're working with. I suggest you get a copy of Cloudberry Explorer (http://www.cloudberrylab.com/free-amazon-s3-explorer-cloudfront-IAM.aspx), which you can use to copy those files up to S3.
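If you prefer to script the upload instead of using a GUI tool, a rough sketch with boto3 looks like this (the bucket name and prefix are placeholders, and it assumes your AWS credentials are already configured locally):
import glob
import boto3

s3 = boto3.client("s3")

# Upload every local feed file to a staging prefix in S3
for path in glob.glob(r"F:\Data\feed\feed*.txt"):
    key = "feed/" + path.replace("\\", "/").split("/")[-1]
    s3.upload_file(path, "my-redshift-staging-bucket", key)

# Then run the COPY against the S3 prefix from your SQL client, e.g.:
#   copy feed
#   from 's3://my-redshift-staging-bucket/feed/'
#   credentials 'aws_access_key_id=<access>;aws_secret_access_key=<key>';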

Rename SQL Azure database?

How can I rename a database in SQL Azure?
I have tried ALTER DATABASE old_name {MODIFY NAME = new_name} but it did not work.
Is this feature available in SQL Azure or not?
Just so people don't have to search through the comments to find this... Use:
ALTER DATABASE [dbname] MODIFY NAME = [newdbname]
(Make sure you include the square brackets around both database names.)
Please check that you're connected to the master database and that you're not trying to rename a system database.
Please find more info here: https://msdn.microsoft.com/en-US/library/ms345378.aspx
You can also connect with SQL Server Management Studio and rename the database in Object Explorer. I just did so and the Azure Portal reflected the change immediately.
Do this by clicking on the database name (as the rename option in the dropdown will be greyed out).
Connect with SQL Server Management Studio to your Azure database server, right-click on the master database and select 'New Query'. In the New Query window that will open type ALTER DATABASE [dbname] MODIFY NAME = [newdbname].
It's very simple now: connect to the database via SQL Server Management Studio and rename it just as you generally would (press F2 on the database name). It will allow you to do this, and the change is reflected immediately.
I can confirm the
ALTER DATABASE [oldname] MODIFY NAME = [newname];
works without connecting to master first, BUT if you are renaming a restored Azure database, don't miss the space before the final hyphen:
ALTER DATABASE [oldname_2017-04-23T09 -17Z] MODIFY NAME = [newname];
And be prepared for a confusing error message in the Visual Studio 2017 Message window when executing the ALTER command
Msg 0, Level 20, State 0, Line 0
A severe error occurred on the current command. The results, if any, should be discarded.
You can easily do it from SQL Server Management Studio, even from the community edition.