Copy multiple CSV files to PostgreSQL

I have multiple CSV files and need to copy them into one Postgres database. In addition, I want to add one more column to my table that displays the source file name for each row. Can anybody help with this? Thanks.

You can create a staging table matching the structure of the CSV data, plus a target table with the same structure and an extra column for the CSV file name, then load the files with a PL/pgSQL block. For example:
Files:
file1.csv
hello,world21,2021
hello,world20,2020
hello,world12,2019
file2.csv
hello,world18,2018
hello,world17,2017
hello,world16,2016
Code to load the CSV files:
create table csvtable (file text, hi text, world text, yr integer);

do $$
declare
    files text[] := array['file1.csv','file2.csv'];
    copy_command text;
    x text;
begin
    -- staging table with the raw CSV structure (no file-name column)
    create table csvtable_tmp (hi text, world text, yr integer);
    foreach x in array files
    loop
        -- server-side COPY of one file into the staging table
        copy_command := 'copy csvtable_tmp from ''' || x || ''' with (format csv, delimiter '','')';
        execute copy_command;
        -- prepend the file name while moving the rows into the final table
        insert into csvtable select x, * from csvtable_tmp;
        truncate csvtable_tmp;
    end loop;
    drop table csvtable_tmp;
end;
$$;
db=# select * from csvtable;
file | hi | world | yr
-----------+-------+---------+------
file1.csv | hello | world21 | 2021
file1.csv | hello | world20 | 2020
file1.csv | hello | world12 | 2019
file2.csv | hello | world18 | 2018
file2.csv | hello | world17 | 2017
file2.csv | hello | world16 | 2016
(6 rows)
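Note that a server-side COPY like the one in the loop reads files from the database server's filesystem and requires superuser rights or membership in the pg_read_server_files role. If the CSV files live on the client machine instead, the same per-file pattern can be driven from psql with \copy; a minimal sketch for one file, using the same names as above:
db=# create temp table csvtable_tmp (hi text, world text, yr integer);
db=# \copy csvtable_tmp from 'file1.csv' with (format csv)
db=# insert into csvtable select 'file1.csv', * from csvtable_tmp;
db=# truncate csvtable_tmp;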

Related

Bad or inaccessible location specified in external data source

I'm trying to save a file from Azure File Storage into Azure SQL Database table varbinary(max) column (store whole content as advised in this answer). I've tried a few times to adjust my SQL query but without success. Here's the code which results in error 'Bad or inaccessible location specified in external data source "my_Azure_Files".' when it invokes OPENROWSET:
OPEN MASTER KEY DECRYPTION BY PASSWORD = 'mypassword123'
GO
CREATE DATABASE SCOPED CREDENTIAL [https://mystorageaccount.file.core.windows.net/]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sas_token_generated_on_azure_portal';
CREATE EXTERNAL DATA SOURCE my_Azure_Files
WITH (
LOCATION = 'https://mystorageaccount.file.core.windows.net/test',
CREDENTIAL = [https://mystorageaccount.file.core.windows.net/],
TYPE = BLOB_STORAGE
);
Insert into dbo.myTable(targetColumn)
Select BulkColumn FROM OPENROWSET(
BULK 'test.csv',
DATA_SOURCE = 'my_Azure_Files',
SINGLE_BLOB) AS testFile;
CLOSE MASTER KEY;
GO
I'm able to download the test.csv file by a web-browser using the same SAS token and url path. I'm also able to verify that the credential and the external source are successfully created in the database:
+----------------+----------------+-----------------------------------------------------+--------------+------------------+---------------------------+---------------+---------------+----------------+--------------------+----------+
| data_source_id | name           | location                                            | type_desc    | type             | resource_manager_location | credential_id | database_name | shard_map_name | connection_options | pushdown |
+----------------+----------------+-----------------------------------------------------+--------------+------------------+---------------------------+---------------+---------------+----------------+--------------------+----------+
| 65540          | my_Azure_Files | https://mystorageaccount.file.core.windows.net/test | BLOB_STORAGE | 05/01/1900 00:00 | NULL                      | 65539         | NULL          | NULL           | NULL               | ON       |
+----------------+----------------+-----------------------------------------------------+--------------+------------------+---------------------------+---------------+---------------+----------------+--------------------+----------+

+--------------------------------------------------+--------------+---------------+-------------------------+------------------+------------------+-------------+-----------+
| name                                             | principal_id | credential_id | credential_identity     | create_date      | modify_date      | target_type | target_id |
+--------------------------------------------------+--------------+---------------+-------------------------+------------------+------------------+-------------+-----------+
| https://mystorageaccount.file.core.windows.net/  | 1            | 65539         | SHARED ACCESS SIGNATURE | 15/07/2020 13:14 | 15/07/2020 13:14 | NULL        | NULL      |
+--------------------------------------------------+--------------+---------------+-------------------------+------------------+------------------+-------------+-----------+
When creating SAS on Azure portal I checked all allowed resource types and all allowed permissions, except 'Delete'. I also removed the leading '?' from SAS to use in the SECRET field.
I've tried variations of TYPE = BLOB_STORAGE and TYPE = HADOOP as well as SINGLE_BLOB, SINGLE_CLOB and SINGLE_NCLOB parameters.
Please help me solve my problem.
By following the steps below, I was able to successfully insert into the target table:
While generating the SAS, select 'Container' and 'Object' as the allowed resource types.
Copy the SAS, then create a master key with the command below:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'password#123'
Use the SAS token (without the leading '?') to create the scoped credential:
CREATE DATABASE SCOPED CREDENTIAL MyAzureBlobStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2019-10-10XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';
Create External Data Source referencing your blob path:
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://mystorageaccount.file.core.windows.net',
    CREDENTIAL = MyAzureBlobStorageCredential
);
Run the insert using OPENROWSET:
Insert into dbo.test(name1)
Select BulkColumn FROM OPENROWSET(
BULK 'test/test.csv',
DATA_SOURCE = 'MyAzureBlobStorage',
SINGLE_BLOB) AS testFile;
You can also use BULK INSERT:
BULK INSERT dbo.test
FROM 'test/test.csv'
WITH (DATA_SOURCE = 'MyAzureBlobStorage',
FORMAT = 'CSV');
This assumes table dbo.test is already created.
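For completeness, a minimal sketch of the assumed target table; the column name name1 is taken from the INSERT above, and varbinary(max) matches SINGLE_BLOB, which returns the whole file as a single binary value:
CREATE TABLE dbo.test (
    name1 varbinary(max)  -- entire file content returned by OPENROWSET ... SINGLE_BLOB
);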

Dynamic Column name changes every year in where clause

I'm trying to automate a T-SQL select statement on a website. Each year the column names change, with the numeric suffix at the end of the name increasing by 1, so instead of manually updating the site I'm trying to figure out how to include the dynamic column name in the WHERE clause.
The data looks something like this.
+------+------+------+------+------+
| FY18 | FY19 | FY20 | FY21 | FY22 |
+------+------+------+------+------+
| 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 1 | 0 |
| 1 | 0 | 0 | 1 | 0 |
+------+------+------+------+------+
Here is what I've come up with so far. The select statement looks something like this:
Select distinct
POS
from TBL_Staff
where [' 'FY'+right(year(dateadd(month,3,getdate()))-1,2) '] = 1
What I'm trying to figure out is whether there is a way to dynamically generate the date and get SQL to recognize name+date as a column.
Note: This is fake data so please let me know if something isn't clear
Any help on this is most appreciated.
You could use dynamic SQL:
DECLARE @sql NVARCHAR(MAX) = 'Select distinct
POS
from TBL_Staff
where FY' + CAST(right(year(dateadd(month,3,getdate()))-1,2) AS VARCHAR) + ' = 1'

exec sp_executesql @sql
@sql is a variable used to construct a SQL statement dynamically; the sp_executesql procedure then executes it. Beware of using dynamic SQL when other alternatives exist: it is harder to read, can be difficult to maintain, and can be a security issue if you take input from users and are not careful to sanitize input parameters.
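Since column names cannot be passed as parameters to sp_executesql, one way to harden this (a sketch, assuming the table is dbo.TBL_Staff) is to verify the generated name against sys.columns and wrap it with QUOTENAME before concatenating:
DECLARE @col sysname = 'FY' + CAST(RIGHT(YEAR(DATEADD(month, 3, GETDATE())) - 1, 2) AS varchar(2));

-- only build and run the query if the computed name is a real column
IF EXISTS (SELECT 1 FROM sys.columns
           WHERE object_id = OBJECT_ID('dbo.TBL_Staff') AND name = @col)
BEGIN
    DECLARE @sql nvarchar(max) =
        N'SELECT DISTINCT POS FROM TBL_Staff WHERE ' + QUOTENAME(@col) + N' = 1';
    EXEC sp_executesql @sql;
END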

How to preserve new line character while performing psql copy command

I have the following content in my CSV file (with 3 columns):
141413,"\"'/x=/></script></title><x><x/","Mountain View, CA\"'/x=/></script></title><x><x/"
148443,"CLICK LINK BELOW TO ENTER^^^^^^^^^^^^^^","model\
\
xxx lipsum as it is\
\
100 sometimes unknown\
\
travel evening market\
"
When I import the above CSV into MySQL using the following command, it treats the backslash at the end of a line as an escaped newline, which is the expected behavior.
LOAD DATA INFILE '1.csv' INTO TABLE users FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';
(screenshot: MySQL output)
But when I try to import into Postgres using the COPY command, it treats \ as a normal character.
copy users from '1.csv' WITH (FORMAT csv, DELIMITER ',', ENCODING 'utf8', NULL '\N', QUOTE E'\"', ESCAPE '\');
(screenshot: Postgres output)
Try stripping these trailing backslashes before importing the CSV file, e.g. using perl -pe (or sed) and feeding the result to psql via STDIN:
$ cat 1.csv | perl -pe 's/\\\n/\n/g' | psql testdb -c "COPY users FROM STDIN WITH (FORMAT csv, DELIMITER ',', ENCODING 'utf8', NULL '\N', QUOTE E'\"', ESCAPE '\');"
This is what it looks like after the import:
testdb=# select * from users;
id | company | location
--------+-----------------------------------------+-------------------------------------------------
141413 | "'/x=/></script></title><x><x/ | Mountain View, CA"'/x=/></script></title><x><x/
148443 | CLICK LINK BELOW TO ENTER^^^^^^^^^^^^^^ | model +
| | +
| | xxx lipsum as it is +
| | +
| | 100 sometimes unknown +
| | +
| | travel evening market +
| |
(2 rows)

How to split the denominator value from one column and store it in another column using perl?

My example code output:
+------------+------+-----------+--------+-------+
| time       | name | status    | s_used | s_max |
+------------+------+-----------+--------+-------+
| 1482222363 | asf  | Closed    | 0/16   | 0     |
| 1482222363 | as0  | Available | 4/16   | 4     |
+------------+------+-----------+--------+-------+
I have attached the part of my output which is generated using a Perl CGI script and a MySQL database.
My question is how to take the denominator value from the s_used column and store only the denominator in the s_max column using Perl.
I have attached the part of the code that I tried:
if ($i == 4) {
    if (/s_used/) {
        print;
    }
    else {
        chomp();
        my ($num, $s_max) = split /\//, $table_data{2}{'ENTRY'};
        print $s_max;
    }
}
Code explanation:
$i == 4 is the column where I store the value.
I got the time column from the SQL database ($time); name comes from $table_data{0}{'ENTRY'}, status from $table_data{1}{'ENTRY'}, and s_used from $table_data{2}{'ENTRY'}.
Expected output:
+------------+------+-----------+--------+-------+
| time       | name | status    | s_used | s_max |
+------------+------+-----------+--------+-------+
| 1482222363 | asf  | Closed    | 0/16   | 16    |
| 1482222363 | as0  | Available | 4/16   | 16    |
+------------+------+-----------+--------+-------+
Your line my($num,$s_max)=split /\//,$table_data{2}{'ENTRY'}; looks right.
Somehow the value of $s_max is incorrect at the time it is written to the DB. Since you did not post the portion of code that writes $s_max back to the DB, check what value $s_max holds (e.g. by printing it) right before it is written back. From there, trace back why an incorrect value is assigned to $s_max, and the problem should be solved.
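Alternatively, if the goal is simply to fill s_max from s_used, the split can be done in SQL rather than Perl. A sketch using MySQL's SUBSTRING_INDEX (my_table is a placeholder for your actual table name):
-- take everything after the last '/' in s_used, e.g. '4/16' -> '16'
UPDATE my_table
SET s_max = SUBSTRING_INDEX(s_used, '/', -1);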

Powershell Read the table name and use it in a line

Hi, I am new to PowerShell and I have a scenario where I have a script that reads all the CREATE TABLE statements from a SQL file. Before each CREATE TABLE statement I have to print an IF EXISTS statement that includes the table name from that CREATE TABLE statement.
Just use a regex replace, something like this:
$x = Get-Content my_file.sql -Raw
$r = @'
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = N'$1')
DROP TABLE [dbo].[$1]
--GO
CREATE TABLE [dbo].[$1]
'@
$x -replace 'CREATE TABLE \[dbo\]\.\[([^\]]+)\]', $r
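For example, if my_file.sql contains the hypothetical statement CREATE TABLE [dbo].[Employee] (Id int), the replace produces:
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = N'Employee')
DROP TABLE [dbo].[Employee]
--GO
CREATE TABLE [dbo].[Employee] (Id int)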
If $SqlStatements is a variable holding the content, then this works:
$ifExists = @'
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = N'{0}')
DROP TABLE {1}
--GO
'@
[Regex]::Matches($SqlStatements, '\s*CREATE TABLE (\S+\.\[(\S+)\])') | Sort-Object Index -Descending | ForEach-Object {
$SqlStatements = $SqlStatements.Insert(
$_.Index,
"`r`n" + ($ifExists -f $_.Groups[2].Value, $_.Groups[1].Value) + "`r`n"
)
}
$SqlStatements
Replacements are done from the end, working backwards; attempting to go forwards would invalidate each match's Index value after the first insert.