Importing a BCP file into an Azure database - tsql

I have an Azure Function that retrieves a zip file containing multiple BCP files, unzips it, and adds the files as blobs.
I now want to import the BCP files into my SQL database, but I'm not sure how to go about it. I know I can use the following script and run it with an SqlCommand:
BULK INSERT RegPlusExtract.dbo.extract_class
FROM 'D:\local\data\extract_class.bsp'
WITH ( FIELDTERMINATOR = '#**#',ROWTERMINATOR = '*##*')
But this obviously does not work, as the SQL server doesn't have access to the function's local D: drive.
How should I go about loading the data? Is it possible to load the BCP file into memory and then pass it with the SqlCommand? Or can I pass the file directly to SQL Server?
I've found out that for backup/restore I can use FROM URL = ''. If I could use this for BULK INSERT I could just reference the blob URL, but it doesn't look like I can?

You will need to use blob storage. Below are the steps; they are documented in the Microsoft/sql-server-samples repository.
--create an external data source
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH ( TYPE = BLOB_STORAGE,
LOCATION = 'https://sqlchoice.blob.core.windows.net/sqlchoice/samples/load-from-azure-blob-storage',
-- CREDENTIAL= MyAzureBlobStorageCredential --> CREDENTIAL is not required if a blob storage is public!
);
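If the container is not public, a database scoped credential has to be created first and referenced from the data source. A minimal sketch, assuming a SAS token generated for the container (the credential name matches the commented-out line above; the password and token values are placeholders, and the token is supplied without the leading '?'):
-- a database master key is required before a database scoped credential can be created
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
CREATE DATABASE SCOPED CREDENTIAL MyAzureBlobStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<SAS token without the leading ?>';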
You can also upload files to a container and reference it as below. Here week3 is a container:
CREATE EXTERNAL DATA SOURCE MyAzureInvoicesContainer
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://newinvoices.blob.core.windows.net/week3',
CREDENTIAL = UploadInvoices
);
Now you can use OPENROWSET and BULK INSERT as shown below.
-- 2.1. INSERT CSV file into Product table
BULK INSERT Product
FROM 'product.csv'
WITH ( DATA_SOURCE = 'MyAzureBlobStorage',
FORMAT='CSV', CODEPAGE = 65001, --UTF-8 encoding
FIRSTROW=2,
TABLOCK);
-- 2.2. INSERT file exported using bcp.exe into Product table
BULK INSERT Product
FROM 'product.bcp'
WITH ( DATA_SOURCE = 'MyAzureBlobStorage',
FORMATFILE='product.fmt',
FORMATFILE_DATA_SOURCE = 'MyAzureBlobStorage',
TABLOCK);
-- 2.3. Read rows from product.dat file using format file and insert it into Product table
INSERT INTO Product WITH (TABLOCK) (Name, Color, Price, Size, Quantity, Data, Tags)
SELECT Name, Color, Price, Size, Quantity, Data, Tags
FROM OPENROWSET(BULK 'product.bcp',
DATA_SOURCE = 'MyAzureBlobStorage',
FORMATFILE='product.fmt',
FORMATFILE_DATA_SOURCE = 'MyAzureBlobStorage') as products;
-- 2.4. Query remote file
SELECT Color, count(*)
FROM OPENROWSET(BULK 'product.bcp',
DATA_SOURCE = 'MyAzureBlobStorage',
FORMATFILE='data/product.fmt',
FORMATFILE_DATA_SOURCE = 'MyAzureBlobStorage') as data
GROUP BY Color;
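Applied to the original question, the import could then look roughly like this. This is only a sketch: it assumes an external data source named MyAzureBlobStorage pointing at the container that holds the unzipped files, and keeps the field/row terminators from the question:
BULK INSERT RegPlusExtract.dbo.extract_class
FROM 'extract_class.bsp'
WITH ( DATA_SOURCE = 'MyAzureBlobStorage',
    FIELDTERMINATOR = '#**#',
    ROWTERMINATOR = '*##*',
    TABLOCK);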

Related

Copy Data Sink Validation

How do I use the Copy data activity to check against sink values?
My Data Sources:
SourceDataset : Source_SQL_DB
DestinationDataset : Destination_SQL_DB
SourceTable : SourceTableName
Column : Name,Age,Gender,Location
DestinationTable : DestinationTableName
Column : Name,Age,Gender,Location
Below is my scenario:
I have to validate the source before moving data to the sink table, by checking that the destination does not already contain the values.
With the Copy data activity I can load the data directly.
How do I pass the Location into the source query, given that my copy activity's source connects to the source dataset only?
select * from SourceTableName where Location in (select distinct Location from DestinationTableName)
How do I check whether the name is present in the destination dataset table? If the name is present, I should not insert the data.
select * from SourceTableName where name not in (select distinct name from DestinationTableName )
Assuming both your source and sink are SQL, you can use a Lookup activity to get the list of names and locations as a comma-separated string and either save it in a variable or use it directly in the source query.
Another way would be to load the source data as-is into a staging table and then leverage a Stored Procedure activity (sketched below).
The final way would be to use data flows.
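A minimal sketch of the staging-table approach. The table and column names come from the question; the staging table and procedure names are made up for illustration:
-- the Copy data activity loads the source rows into dbo.StagingTable first;
-- this procedure then moves only the rows whose Name is not already in the destination
CREATE PROCEDURE dbo.MoveNewRows
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO DestinationTableName (Name, Age, Gender, Location)
    SELECT s.Name, s.Age, s.Gender, s.Location
    FROM dbo.StagingTable AS s
    WHERE NOT EXISTS (SELECT 1 FROM DestinationTableName AS d WHERE d.Name = s.Name);
    TRUNCATE TABLE dbo.StagingTable;
END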

Using PostgreSQL comments as descriptions in dbt docs

We've been adding comments to the columns in Postgres as column descriptions. Similarly, there are descriptions in dbt that can be written.
How would I go about writing SQL to automatically set the same descriptions from Postgres in the dbt docs?
Here's how I often do it.
Take a look at this answer on how to pull descriptions from pg_catalog.
From there, you want to write a BQ query that generates JSON, which you can then convert to a YAML file you can use directly in dbt.
BQ link - save the results as a JSON file.
Use a json2yaml tool.
Save the YAML file to an appropriate place in your project tree.
Code sample:
-- intended to be saved as JSON and converted to YAML
-- ex. cat script_job_id_1.json | python3 json2yaml.py | tee schema.yml
-- version will be created as version:'2' . Remove quotes after conversion
DECLARE database STRING;
DECLARE dataset STRING;
DECLARE dataset_desc STRING;
DECLARE source_qry STRING;
SET database = "bigquery-public-data";
SET dataset = "census_bureau_acs";
SET dataset_desc = "";
SET source_qry = CONCAT('''CREATE OR REPLACE TEMP TABLE tt_master_table AS ''',
    '''(''',
    '''SELECT cfp.table_name, ''',
    '''cfp.column_name, ''',
    '''cfp.description ''',
    '''FROM `''', database, '''`.''', dataset, '''.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS cfp ''',
    ''')''');
EXECUTE IMMEDIATE source_qry;
WITH column_info AS (
    SELECT table_name AS name,
        ARRAY_AGG(STRUCT(column_name AS name, COALESCE(description, "") AS description)) AS columns
    FROM tt_master_table
    GROUP BY table_name
),
table_level AS (
    SELECT CONCAT(database, ".", dataset) AS name,
        database,
        dataset,
        dataset_desc AS `description`,
        ARRAY_AGG(STRUCT(name, columns)) AS tables
    FROM column_info
    GROUP BY database, dataset, dataset_desc
    LIMIT 1
)
SELECT CAST(2 AS INT) AS version,
    ARRAY_AGG(STRUCT(name, database, dataset, description, tables)) AS sources
FROM table_level
GROUP BY version
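Since the question is about Postgres rather than BigQuery, the equivalent starting point on the Postgres side is a query against pg_catalog. A rough sketch, assuming ordinary tables in the public schema:
-- pull the column comments for every table in a schema so they can be
-- turned into dbt column descriptions
SELECT c.relname AS table_name,
       a.attname AS column_name,
       COALESCE(col_description(c.oid, a.attnum), '') AS description
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
JOIN pg_catalog.pg_attribute a ON a.attrelid = c.oid AND a.attnum > 0 AND NOT a.attisdropped
WHERE n.nspname = 'public'
  AND c.relkind = 'r'
ORDER BY c.relname, a.attnum;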

Bulk import into Azure

For a BULK INSERT I have a data file and an XML format file:
File.dat
File.xml
This works on-premises with a BULK INSERT statement, but in Azure it seems to have a problem with the format file. Below are the steps I have taken.
Set Storage Access
Created a Shared Access Signature
Set the container access policy to 'Blob (anonymous read access for blobs only)'
Created a database scoped credential for the storage
CREATE DATABASE SCOPED CREDENTIAL StorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'This is my secret' -- the Shared Access Signature key
Created an external data source
CREATE EXTERNAL DATA SOURCE Storage
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://<storagename>.blob.core.windows.net/<containername>',
CREDENTIAL = StorageCredential
);
File Query (Bulk insert or Openrowset)
BULK INSERT <Schema>.<Table>
FROM 'File.dat'
WITH (
DATA_SOURCE = 'Storage',
FORMATFILE = 'File.xml'
)
or
SELECT * FROM OPENROWSET(
BULK 'File.dat',
DATA_SOURCE = 'Storage',
FORMATFILE = 'File.xml'
) AS DataFile;
Neither of them works; both fail with the error:
'Cannot bulk load because the file is incomplete or could not be read'
However, I can successfully run the following query:
SELECT * FROM OPENROWSET(
BULK 'File.xml',
DATA_SOURCE = 'Storage',
SINGLE_NClob) AS DataFile
I have found the answer and will post it myself, in case other people also run into this problem.
The data source of the format file has to be specified separately. I tried the way specified in the Microsoft documentation for BULK INSERT.
However, there is an error in the parameter name there. It states that the correct parameter is 'FORMATFILE_DATASOURCE', but it should be 'FORMATFILE_DATA_SOURCE'. (This is noted in the comments at the bottom of that documentation page.)
BULK INSERT <Schema>.<Table>
FROM 'File.dat'
WITH (
DATA_SOURCE = 'Storage',
FORMATFILE = 'File.xml',
FORMATFILE_DATA_SOURCE = 'Storage'
)
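The same fix applies to the OPENROWSET variant from the question; a sketch, assuming the same data source and files as above:
SELECT * FROM OPENROWSET(
    BULK 'File.dat',
    DATA_SOURCE = 'Storage',
    FORMATFILE = 'File.xml',
    FORMATFILE_DATA_SOURCE = 'Storage'
) AS DataFile;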

shredding xml file with xmltable db2

Is it possible to give XMLTABLE an XML file stored on my desktop PC as input? How?
INSERT INTO abc(name)
SELECT x.name
FROM XMLTABLE('$i/product' PASSING CAST(? AS XML) AS "i"
    COLUMNS
        name VARCHAR(10) PATH 'name'
) AS x;
How do I pass in my file stored on the desktop of my PC? Thank you.
SQL statements can't access files outside of the database manager. To load data from an XML file and write it to a table, you'd either need to use a database utility (like LOAD or IMPORT), or write your own program that reads the data from files on the client machine (i.e., your PC) and performs the inserts.
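For the utility route, a very rough sketch of what an IMPORT run from the DB2 command line processor on the client PC could look like. This is only an assumption-laden illustration: it supposes the target table has an extra XML column (doc), and that the delimited file's XML column values are XDS references (e.g. <XDS FIL='product1.xml' />) pointing at the documents in a folder on the desktop:
-- IMPORT runs on the client, so it can read files from the local PC;
-- all paths, file names and columns here are placeholders
IMPORT FROM "C:\Users\me\Desktop\products.del" OF DEL
  XML FROM "C:\Users\me\Desktop\xml"
  INSERT INTO abc (name, doc)
Once the documents are in an XML column, the XMLTABLE query from the question can shred them into relational columns.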

SSIS Import Files with changing layouts

I'm using SSIS 2008 and trying to work on a package for importing a specified file into a table created for its layout. It will take in the destination table & source file as package variables.
The main problem I'm running into is that the file layouts are subject to change; they're not consistent. The table I'd be importing into will match the file, though. I had initial success, but soon after changing the source file/destination it throws the VS_NEEDSNEWMETADATA error.
Are there any workarounds that could be used here for files that don't fit the layout the package was designed with?
Edit: These are .txt files, tab-delimited.
Edit 2: I tried fiddling with OPENROWSET as well, but hit a security error on our server.
I am assuming here that said file is a CSV file.
I was faced with the exact same problem a couple of weeks ago. You need to use dynamic SQL to achieve this.
Create a stored procedure in your database with the code below (change the two "C:\Folder\" locations to the location of your file):
CREATE PROCEDURE [dbo].[CreateAndImportCSVs] (@FILENAME NVARCHAR(200))
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @PATH NVARCHAR(4000) = N'C:\Folder\' + @FILENAME + ''
    -- table name = file name without its extension
    DECLARE @TABLE NVARCHAR(50) = SUBSTRING(@FILENAME, 0, CHARINDEX('.', @FILENAME))
    -- drop any previous copy of the table, then recreate it from the file via the text driver
    DECLARE @SQL NVARCHAR(4000) = N'IF OBJECT_ID(''dbo.' + @TABLE + ''' , ''U'') IS NOT NULL DROP TABLE dbo.[' + @TABLE + ']
    SELECT * INTO [' + @TABLE + ']
    FROM OPENROWSET(''MSDASQL''
        ,''Driver={Microsoft Access Text Driver (*.txt, *.csv)};DefaultDir=C:\Folder;''
        ,''SELECT * FROM ' + @FILENAME + ''')'

    EXEC(@SQL)
END
You might need to download the Microsoft Access Database Engine from:
https://www.microsoft.com/en-gb/download/details.aspx?id=13255
and install it on your machine/server for the Microsoft Access Text Driver to work.
Then create an Execute SQL Task in SSIS with the relevant connection details to your SQL server database. Then pass the file name to the stored procedure you created:
EXEC dbo.CreateAndImportCSVs 'filename.csv'
It will then create the table based on the structure and data contained within the CSV, and it names the table the same as the CSV file name.
*This stored procedure can also be used to run through a list of files.
Hope this helps!
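If the file name is held in an SSIS package variable, the Execute SQL Task can pass it as a parameter instead of hard-coding it. A sketch, assuming an OLE DB connection and a variable such as User::FileName mapped to parameter 0:
-- SQLStatement of the Execute SQL Task; the ? is the mapped parameter
EXEC dbo.CreateAndImportCSVs ?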