Copy multiple CSV files to PostgreSQL

I have multiple CSV files and need to copy them into one Postgres database. In addition, I want to add one more column to my table that displays the source file name for each row. Can anybody help with this? Thanks.

You can create a staging table matching the structure of the CSV data, plus a target table with the same structure and an extra column for the CSV file name, then load the files with a PL/pgSQL block. For example:
Files:
file1.csv
hello,world21,2021
hello,world20,2020
hello,world12,2019
file2.csv
hello,world18,2018
hello,world17,2017
hello,world16,2016
Code to load the CSV files:
create table csvtable (file text, hi text, world text, yr integer);

do $$
declare
    files text[] := array['file1.csv','file2.csv'];
    copy_command text;
    x text;
begin
    -- staging table with the raw CSV structure (no file-name column)
    create table csvtable_tmp (hi text, world text, yr integer);
    foreach x in array files
    loop
        -- server-side COPY of one file into the staging table
        copy_command := 'copy csvtable_tmp from ''' || x || ''' with (format csv, delimiter '','')';
        execute copy_command;
        -- prepend the file name while moving the rows into the final table
        insert into csvtable select x, * from csvtable_tmp;
        truncate csvtable_tmp;
    end loop;
    drop table csvtable_tmp;
end;
$$;
db=# select * from csvtable;
file | hi | world | yr
-----------+-------+---------+------
file1.csv | hello | world21 | 2021
file1.csv | hello | world20 | 2020
file1.csv | hello | world12 | 2019
file2.csv | hello | world18 | 2018
file2.csv | hello | world17 | 2017
file2.csv | hello | world16 | 2016
(6 rows)
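Note that a server-side COPY like the one in the loop reads files from the database server's filesystem and requires superuser rights or membership in the pg_read_server_files role. If the CSV files live on the client machine instead, the same per-file pattern can be driven from psql with \copy; a minimal sketch for one file, using the same names as above:
db=# create temp table csvtable_tmp (hi text, world text, yr integer);
db=# \copy csvtable_tmp from 'file1.csv' with (format csv)
db=# insert into csvtable select 'file1.csv', * from csvtable_tmp;
db=# truncate csvtable_tmp;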

Related

Bad or inaccessible location specified in external data source

I'm trying to save a file from Azure File Storage into Azure SQL Database table varbinary(max) column (store whole content as advised in this answer). I've tried a few times to adjust my SQL query but without success. Here's the code which results in error 'Bad or inaccessible location specified in external data source "my_Azure_Files".' when it invokes OPENROWSET:
OPEN MASTER KEY DECRYPTION BY PASSWORD = 'mypassword123'
GO
CREATE DATABASE SCOPED CREDENTIAL [https://mystorageaccount.file.core.windows.net/]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sas_token_generated_on_azure_portal';
CREATE EXTERNAL DATA SOURCE my_Azure_Files
WITH (
LOCATION = 'https://mystorageaccount.file.core.windows.net/test',
CREDENTIAL = [https://mystorageaccount.file.core.windows.net/],
TYPE = BLOB_STORAGE
);
Insert into dbo.myTable(targetColumn)
Select BulkColumn FROM OPENROWSET(
BULK 'test.csv',
DATA_SOURCE = 'my_Azure_Files',
SINGLE_BLOB) AS testFile;
CLOSE MASTER KEY;
GO
I'm able to download the test.csv file by a web-browser using the same SAS token and url path. I'm also able to verify that the credential and the external source are successfully created in the database:
+----------------+----------------+-----------------------------------------------------+--------------+------------------+---------------------------+---------------+---------------+----------------+--------------------+----------+
| data_source_id | name           | location                                            | type_desc    | type             | resource_manager_location | credential_id | database_name | shard_map_name | connection_options | pushdown |
+----------------+----------------+-----------------------------------------------------+--------------+------------------+---------------------------+---------------+---------------+----------------+--------------------+----------+
| 65540          | my_Azure_Files | https://mystorageaccount.file.core.windows.net/test | BLOB_STORAGE | 05/01/1900 00:00 | NULL                      | 65539         | NULL          | NULL           | NULL               | ON       |
+----------------+----------------+-----------------------------------------------------+--------------+------------------+---------------------------+---------------+---------------+----------------+--------------------+----------+

+--------------------------------------------------+--------------+---------------+-------------------------+------------------+------------------+-------------+-----------+
| name                                             | principal_id | credential_id | credential_identity     | create_date      | modify_date      | target_type | target_id |
+--------------------------------------------------+--------------+---------------+-------------------------+------------------+------------------+-------------+-----------+
| https://mystorageaccount.file.core.windows.net/  | 1            | 65539         | SHARED ACCESS SIGNATURE | 15/07/2020 13:14 | 15/07/2020 13:14 | NULL        | NULL      |
+--------------------------------------------------+--------------+---------------+-------------------------+------------------+------------------+-------------+-----------+
When creating SAS on Azure portal I checked all allowed resource types and all allowed permissions, except 'Delete'. I also removed the leading '?' from SAS to use in the SECRET field.
I've tried variations of TYPE = BLOB_STORAGE and TYPE = HADOOP as well as SINGLE_BLOB, SINGLE_CLOB and SINGLE_NCLOB parameters.
Please help me solve my problem.
By following the steps below, I was able to successfully insert into the target table:
While generating the SAS, select 'Container' and 'Object' as the allowed resource types.
Copy the SAS, then create a master key with the command below:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'password#123'
Use the SAS token (without the leading '?') to create the scoped credential:
CREATE DATABASE SCOPED CREDENTIAL MyAzureBlobStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2019-10-10XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';
Create External Data Source referencing your blob path:
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://mystorageaccount.file.core.windows.net',
    CREDENTIAL = MyAzureBlobStorageCredential
);
Run the insert using OPENROWSET:
Insert into dbo.test(name1)
Select BulkColumn FROM OPENROWSET(
BULK 'test/test.csv',
DATA_SOURCE = 'MyAzureBlobStorage',
SINGLE_BLOB) AS testFile;
You can also use BULK INSERT:
BULK INSERT dbo.test
FROM 'test/test.csv'
WITH (DATA_SOURCE = 'MyAzureBlobStorage',
FORMAT = 'CSV');
This assumes table dbo.test is already created.
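For completeness, a minimal sketch of the assumed target table; the column name name1 is taken from the INSERT above, and varbinary(max) matches SINGLE_BLOB, which returns the whole file as a single binary value:
CREATE TABLE dbo.test (
    name1 varbinary(max)  -- entire file content returned by OPENROWSET ... SINGLE_BLOB
);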

Dynamic Column name changes every year in where clause

I'm trying to automate a T-SQL select statement on a website. Each year the column names change, with the numeric suffix at the end of the name increasing by 1, so instead of manually updating the site I'm trying to figure out how to include the dynamic column name in the WHERE clause.
The data looks something like this.
+------+------+------+------+------+
| FY18 | FY19 | FY20 | FY21 | FY22 |
+------+------+------+------+------+
| 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 1 | 0 |
| 1 | 0 | 0 | 1 | 0 |
+------+------+------+------+------+
Here is what I've come up with so far. The select statement looks something like this:
Select distinct
POS
from TBL_Staff
where [' 'FY'+right(year(dateadd(month,3,getdate()))-1,2) '] = 1
What I'm trying to figure out is whether there is a way to dynamically generate the date and get SQL to recognize name+date as a column.
Note: This is fake data so please let me know if something isn't clear
Any help on this is most appreciated.
You could use dynamic SQL:
DECLARE @sql NVARCHAR(MAX) = 'Select distinct
POS
from TBL_Staff
where FY' + CAST(right(year(dateadd(month,3,getdate()))-1,2) AS VARCHAR) + ' = 1'

exec sp_executesql @sql
@sql is a variable used to construct a SQL statement dynamically; the sp_executesql procedure then executes it. Beware of using dynamic SQL when other alternatives exist: it is harder to read, can be difficult to maintain, and can be a security issue if you take input from users and are not careful to sanitize input parameters.
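Since column names cannot be passed as parameters to sp_executesql, one way to harden this (a sketch, assuming the table is dbo.TBL_Staff) is to verify the generated name against sys.columns and wrap it with QUOTENAME before concatenating:
DECLARE @col sysname = 'FY' + CAST(RIGHT(YEAR(DATEADD(month, 3, GETDATE())) - 1, 2) AS varchar(2));

-- only build and run the query if the computed name is a real column
IF EXISTS (SELECT 1 FROM sys.columns
           WHERE object_id = OBJECT_ID('dbo.TBL_Staff') AND name = @col)
BEGIN
    DECLARE @sql nvarchar(max) =
        N'SELECT DISTINCT POS FROM TBL_Staff WHERE ' + QUOTENAME(@col) + N' = 1';
    EXEC sp_executesql @sql;
END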

How to preserve new line character while performing psql copy command

I have the following content in my CSV file (with 3 columns):
141413,"\"'/x=/></script></title><x><x/","Mountain View, CA\"'/x=/></script></title><x><x/"
148443,"CLICK LINK BELOW TO ENTER^^^^^^^^^^^^^^","model\
\
xxx lipsum as it is\
\
100 sometimes unknown\
\
travel evening market\
"
When I import the above CSV into MySQL using the following command, it treats the backslash at the end of a line as an escaped newline, which is the expected behavior.
LOAD DATA INFILE '1.csv' INTO TABLE users FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';
(screenshot: MySQL output)
But when I try to import into Postgres using the COPY command, it treats \ as a normal character.
copy users from '1.csv' WITH (FORMAT csv, DELIMITER ',', ENCODING 'utf8', NULL '\N', QUOTE E'\"', ESCAPE '\');
(screenshot: Postgres output)
Try stripping these trailing backslashes before importing the CSV file, e.g. using perl -pe (or sed) and feeding the result to psql via STDIN:
$ cat 1.csv | perl -pe 's/\\\n/\n/g' | psql testdb -c "COPY users FROM STDIN WITH (FORMAT csv, DELIMITER ',', ENCODING 'utf8', NULL '\N', QUOTE E'\"', ESCAPE '\');"
This is what it looks like after the import:
testdb=# select * from users;
id | company | location
--------+-----------------------------------------+-------------------------------------------------
141413 | "'/x=/></script></title><x><x/ | Mountain View, CA"'/x=/></script></title><x><x/
148443 | CLICK LINK BELOW TO ENTER^^^^^^^^^^^^^^ | model +
| | +
| | xxx lipsum as it is +
| | +
| | 100 sometimes unknown +
| | +
| | travel evening market +
| |
(2 rows)

How to split the denominator value from one column and store it in another column using perl?

My example code output:
+------------+------+-----------+--------+-------+
| time       | name | status    | s_used | s_max |
+------------+------+-----------+--------+-------+
| 1482222363 | asf  | Closed    | 0/16   | 0     |
| 1482222363 | as0  | Available | 4/16   | 4     |
+------------+------+-----------+--------+-------+
I have attached the part of my output which is generated using a Perl CGI script and a MySQL database.
My question is how to take the denominator value from the s_used column and store only the denominator in the s_max column using Perl.
I have attached the part of the code that I tried:
if ($i == 4) {
    if (/s_used/) {
        print;
    }
    else {
        chomp();
        my ($num, $s_max) = split /\//, $table_data{2}{'ENTRY'};
        print $s_max;
    }
}
Code explanation:
$i == 4 is the column where I store the value.
I got the time column from the SQL database ($time); name comes from $table_data{0}{'ENTRY'}, status from $table_data{1}{'ENTRY'}, and s_used from $table_data{2}{'ENTRY'}.
Expected output:
+------------+------+-----------+--------+-------+
| time       | name | status    | s_used | s_max |
+------------+------+-----------+--------+-------+
| 1482222363 | asf  | Closed    | 0/16   | 16    |
| 1482222363 | as0  | Available | 4/16   | 16    |
+------------+------+-----------+--------+-------+
Your line my($num,$s_max)=split /\//,$table_data{2}{'ENTRY'}; looks right.
Somehow the value of $s_max is incorrect at the time it is written to the DB. Since you did not post the portion of code that writes $s_max back to the DB, check what value $s_max holds (e.g. by printing it) right before it is written back. From there, trace back why an incorrect value is assigned to $s_max, and the problem should be solved.
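Alternatively, if the goal is simply to fill s_max from s_used, the split can be done in SQL rather than Perl. A sketch using MySQL's SUBSTRING_INDEX (my_table is a placeholder for your actual table name):
-- take everything after the last '/' in s_used, e.g. '4/16' -> '16'
UPDATE my_table
SET s_max = SUBSTRING_INDEX(s_used, '/', -1);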

Powershell Read the table name and use it in a line

Hi, I am new to PowerShell and I have a scenario where I have a script that reads all the CREATE TABLE statements from a SQL file. Before each CREATE TABLE statement I have to print an IF EXISTS statement that includes the table name from that CREATE TABLE statement.
Just use a regex replace, something like this:
$x = Get-Content my_file.sql -Raw
$r = @'
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = N'$1')
DROP TABLE [dbo].[$1]
--GO
CREATE TABLE [dbo].[$1]
'@
$x -replace 'CREATE TABLE \[dbo\]\.\[([^\]]+)\]', $r
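For example, if my_file.sql contains the hypothetical statement CREATE TABLE [dbo].[Employee] (Id int), the replace produces:
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = N'Employee')
DROP TABLE [dbo].[Employee]
--GO
CREATE TABLE [dbo].[Employee] (Id int)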
If $SqlStatements is a variable holding the content, then this works:
$ifExists = @'
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = N'{0}')
DROP TABLE {1}
--GO
'@
[Regex]::Matches($SqlStatements, '\s*CREATE TABLE (\S+\.\[(\S+)\])') | Sort-Object Index -Descending | ForEach-Object {
$SqlStatements = $SqlStatements.Insert(
$_.Index,
"`r`n" + ($ifExists -f $_.Groups[2].Value, $_.Groups[1].Value) + "`r`n"
)
}
$SqlStatements
Replacements are done from the end, working backwards; attempting to go forwards would invalidate each match's Index value after the first insert.