How to put XML from a zip into a Postgres table?

How can I load XML files from a zip archive into a PostgreSQL table? Something like:
COPY FROM 'zip/file.zip/*.xml';
There are many XML files in the zip, and I want each one added to the table as a row containing that file's XML.
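COPY reads plain files (or a program's output) on the server; it cannot look inside a zip archive or expand wildcards, so the loading has to happen client-side. A minimal sketch in Python, assuming a hypothetical table xml_docs(filename text, content xml) and the psycopg2 driver:

import zipfile
import psycopg2

# Hypothetical connection string and table:
#   CREATE TABLE xml_docs (filename text, content xml);
conn = psycopg2.connect("dbname=mydb user=me")
cur = conn.cursor()

with zipfile.ZipFile("zip/file.zip") as zf:
    for name in zf.namelist():
        if not name.endswith(".xml"):
            continue
        xml_text = zf.read(name).decode("utf-8")
        # The ::xml cast also validates that each document is well-formed.
        cur.execute(
            "INSERT INTO xml_docs (filename, content) VALUES (%s, %s::xml)",
            (name, xml_text),
        )

conn.commit()
conn.close()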

Related

aws_s3.query_export_to_s3 PostgreSQL RDS extension exporting all multi-part CSV files to S3 with a header

I'm using the aws_s3.query_export_to_s3 function to export data from an Amazon Aurora PostgreSQL database to S3 in CSV format with a header row.
This works.
However, when the export is large and outputs to multiple part files, the first part file has the CSV header row, and subsequent part files do not.
SELECT * FROM aws_s3.query_export_to_s3(
    'SELECT ...',
    aws_commons.create_s3_uri(...),
    options := 'format csv, HEADER true'
);
I'm using Apache Spark to load this CSV data, and it expects a header row in each individual part file. How can I make the export add the header row to every part file?
It's not possible, unfortunately.
The aws_s3.query_export_to_s3 function uses the PostgreSQL COPY command under the hood and then chunks the output into multiple files depending on size.
Unless the extension picked up on the HEADER true option, cached the header, and then provided an option to apply it to every CSV file generated, you're out of luck.
The expectation is that the files are combined at the destination after download, or that the file processor can read files in parts, or that it only needs the header once.
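If the parts can't be combined downstream, one workaround on the Spark side is to read every part without a header and reapply the column names taken from the one part that has them. A sketch under that assumption, with hypothetical S3 paths:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical paths; part0 is the only file the export wrote a header to.
first_part = "s3://my-bucket/export/out_part0"
all_parts = "s3://my-bucket/export/out_part*"

# Take the real column names from the part that has the header.
columns = spark.read.csv(first_part, header=True).columns

# Read every part headerless so no data row is swallowed, rename the
# generic _c0.._cN columns, and drop the stray header row from part0.
df = spark.read.csv(all_parts, header=False).toDF(*columns)
df = df.filter(df[columns[0]] != columns[0])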

How to create unique list of codes from multiple files in multiple subfolders in ADF?

We have a folder structure in data lake like this:
folder1/subfolderA/parquet files
folder1/subfolderB/parquet files
folder1/subfolderC/parquet files
folder1/subfolderD/parquet files
etc.
All the parquet files have the same schema, and all the parquet files have, amongst other fields, a code field, let's call it code_XX.
Now I want the distinct values of code_XX from all parquet files in all folders.
So if the value 'A345' for code_XX appears multiple times in the parquet files in subfolderA and subfolderC, I want it only once.
Output must be a Parquet file with all unique codes.
Is this doable in Azure Data Factory, and how?
If not, can it be done in Databricks?
You can try the following.
Set the source folder path to look for all parquet files recursively, and choose a column to store the file names.
As it seems you only need the file names in the output parquet file, use a select transformation to forward only that column.
Use an expression in a derived column to extract the file names from the path string:
distinct(array(right(fileNames,locate('/',reverse(fileNames))-1)))
If you have access to SQL, it can be done with two Copy activities; there's no need for data flows.
Copy Activity 1 (Parquet to SQL): Ingest all files into a staging table.
Copy Activity 2 (SQL to Parquet): Select DISTINCT code_XX from the staging table.
NOTE:
Use mapping to extract only the column you need.
Use a wildcard file path with the recursive option enabled to copy all files from the subfolders: https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-blob-storage?tabs=data-factory#blob-storage-as-a-source-type
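On the Databricks side of the question, the same job is a read, distinct, write. A minimal PySpark sketch, assuming the folder layout above and that the column is literally named code_XX:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The wildcard picks up the parquet files in every subfolder of folder1.
codes = spark.read.parquet("folder1/*/").select("code_XX").distinct()

# One parquet output with each code exactly once.
codes.write.mode("overwrite").parquet("folder1_unique_codes")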

ADF / Dataflow - Convert Multiple CSV to Parquet

In ADLS Gen2, the TextFiles folder has 3 CSV files. The column names are different in each file.
We need to convert all 3 CSV files to 3 parquet files and put them in the ParquetFiles folder.
I tried to use a Copy activity, and it fails because the column names contain spaces, which parquet files don't allow.
To remove the spaces, I used a data flow: source -> select (replace spaces with underscores in the column names) -> sink. This worked for a single file, but when I tried to do it for all 3 files, it merged the 3 files and generated a single file with incorrect data.
How can I solve this, mainly removing the spaces from the column names in all files? What other options are there?
Pipeline: a ForEach activity (loop over the CSV files in the folder and pass the current iteration item to the data flow as a parameter) -> a Data Flow activity with a source that points to that folder (parameterize the file name in the source path).
I created 2 datasets, one CSV with a wildcard path and the other parquet. I used a Copy activity with the parquet dataset as sink and the CSV dataset as source, and set the copy behavior to Merge files.
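As one more option, if Databricks (or another Spark runtime) is available, the per-file rename-and-convert loop is short. A sketch with hypothetical ADLS paths and file names:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical container/account and file names.
src = "abfss://container@account.dfs.core.windows.net/TextFiles"
dst = "abfss://container@account.dfs.core.windows.net/ParquetFiles"

for name in ["a.csv", "b.csv", "c.csv"]:
    df = spark.read.csv(f"{src}/{name}", header=True)
    # Parquet rejects spaces in column names; swap them for underscores.
    df = df.toDF(*[c.replace(" ", "_") for c in df.columns])
    # Strip the ".csv" suffix for the output folder name.
    df.write.mode("overwrite").parquet(f"{dst}/{name[:-4]}")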

How to make a csv file from multiple tables from sqlite in swift

I am trying to make a CSV file from an SQLite database. How do I make a single CSV file, with the table names as header values, in Swift?
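One way to structure it: list the tables from sqlite_master, then for each table write its name as a header line followed by its column names and rows. The logic, sketched in Python for brevity (the same SQLite queries apply from Swift), with a hypothetical database file:

import csv
import sqlite3

conn = sqlite3.connect("app.db")  # hypothetical database file

# Every user table, from SQLite's catalog.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

with open("export.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for table in tables:
        cur = conn.execute(f"SELECT * FROM {table}")
        writer.writerow([table])                               # table name as header
        writer.writerow([col[0] for col in cur.description])   # column names
        writer.writerows(cur.fetchall())                       # data rows

conn.close()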

Insert Excel file in SQLite database on iPhone

We have a requirement to insert an Excel/PDF file into an SQLite database field. How can we do this? Is it possible to convert the Excel/PDF file to some binary format and then insert it into a BLOB field in the DB? We don't want to save the files in the documents directory and store the file names in SQLite.
Why not save the files in the documents directory and store the file names in SQLite?
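If the BLOB route is required anyway, the mechanics are just reading the file's raw bytes and binding them as a BLOB parameter. A sketch of the idea in Python (the equivalent bind exists in the SQLite C API used from iOS), with hypothetical names:

import sqlite3

conn = sqlite3.connect("app.db")  # hypothetical database file
conn.execute("CREATE TABLE IF NOT EXISTS docs (name TEXT, data BLOB)")

# Read the file's raw bytes and bind them as a BLOB parameter.
with open("report.xlsx", "rb") as f:
    conn.execute("INSERT INTO docs (name, data) VALUES (?, ?)",
                 ("report.xlsx", sqlite3.Binary(f.read())))

conn.commit()
conn.close()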