How to import a BLOB formatted as CSV to Postgres

I have a csv file that is an output from a BLOB store. The csv contains 6 related tables. Not all records utilise all 6 tables, but all records do use table 1. I would like to import table 1 into Postgres. The data is described as follows:
Files are ASCII text files comprising variable length fields delimited by an asterisk. The files have an extension of “csv”. Records are separated by a Carriage Return/Line Feed. No data items should contain an asterisk.
Further information is given in the technical arrangement.
Technical Arrangement
The technical arrangement of our summary data is as follows:
Fields are in the exact order in which they are listed in this file specification.
Records are broken down into Types. Each Type represents a different part of the record.
Each record will begin with Type ‘01’ data
For each record of type '01', there are one or more type '02' records containing Survey Line Item data. There may be zero or more of record types '03' and '06'.
There may be zero or one of record types '04' and '05'.
If a record type '06' exists, there will be one record type '07'
The end of a record is only indicated by the next row of Type ‘01’ data or the end of the file.
You should use this information to read the file into formal data structures.
I'm new to databases and want to know how to tackle this. I understand that Postgres has Python and Java connectors, which in turn have ways to read BLOB data. Is that the best approach?
EDIT
Sample data: one entry comprising 2 record types, then one containing several of the record types:
01*15707127000*8227599000*0335*The Occupier*3****MARKET STREET**BRACKNELL*BERKS*RG12 1JG*290405*Shop And Premises*60.71*14872*14872*14750*2017*Bracknell Forest*00249200003001*20994339144*01-APR-2017**249*NIA*330.00
02*1*Ground*Retail Zone A*29.42*330.00*9709
02*2*Ground*Retail Zone B*31.29*165.00*5163
01*15707136000*492865165*0335**7-8****CHARLES SQUARE**BRACKNELL*BERKS*RG12 1DF*290405*Shop And Premises*325.10*34451*32921*32750*2017*Bracknell Forest*00215600007806*21012750144*01-APR-2017**249*NIA*260.00
02*1*Ground*Retail Zone A*68.00*260.00*17680
02*2*Ground*Remaining Retail Zone*83.50*32.50*2714
02*3*Ground*Office*7.30*26.00*190
02*4*First*Locker Room (Female)*3.20*13.00*42
02*5*First*Locker Room (Male)*5.80*13.00*75
02*6*First*Mess/Staff Room*11.50*13.00*150
02*7*Ground*Internal Storage*7.80*26.00*203
02*8*Ground*Retail Zone B*68.10*130.00*8853
02*9*Ground*Retail Zone C*69.90*65.00*4544
03*Air Conditioning System*289.5*7.00*+2027
06*Divided or split unit*-5.00%
06*Double unit*-5.00%
07*36478*-3557

Copy the text file to an auxiliary table with a single text column:
drop table if exists text_buffer;
create table text_buffer(text_row text);
copy text_buffer from '/data/my_file.csv';
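If the file is on the client machine rather than on the database server, psql's \copy does the same load without needing server-side file access. And because the default text format of COPY treats tabs and backslashes specially, you can also name an explicit delimiter byte that never occurs in the data if you suspect such characters are present. A sketch of both variants, assuming the same file path:
\copy text_buffer from '/data/my_file.csv'
copy text_buffer from '/data/my_file.csv' with (format text, delimiter e'\x01');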
Transform the text column into a text array, skipping rows you do not need. You'll then be able to select any element as a new column with a given name and type, e.g.:
select
cols[2]::bigint as bigint1,
cols[3]::bigint as bigint2,
cols[4]::text as text1,
cols[5]::text as text2
-- specify name and type of any column you need
from text_buffer,
lateral string_to_array(text_row, '*') cols -- transform text column to text array
where left(text_row, 2) = '01'; -- get only rows for table1
   bigint1   |  bigint2   | text1 |    text2
-------------+------------+-------+--------------
 15707127000 | 8227599000 | 0335  | The Occupier
 15707136000 |  492865165 | 0335  |
(2 rows)
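From here the parsed rows can be persisted rather than just selected. A minimal sketch, assuming a hypothetical target table named table1 and covering only the first few type '01' fields; extend the column list to match the full field specification:
drop table if exists table1;
create table table1 (
    ref1 bigint,
    ref2 bigint,
    code text,
    name text
    -- add the remaining type '01' fields here, in specification order
);
insert into table1 (ref1, ref2, code, name)
select
    cols[2]::bigint,
    cols[3]::bigint,
    cols[4]::text,
    cols[5]::text
from text_buffer,
     lateral string_to_array(text_row, '*') cols
where left(text_row, 2) = '01';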

Related

Is it possible to generate the space separated header row using data factory copy activity?

I am using Azure SQL as the source dataset and a delimited file as the sink dataset in the copy activity.
I tried the copy activity, but First row as header gives comma-separated headers.
Is there a way to change the header output style?
Please note the spacing is unequal (h3...h4).
In this repro, I tried to give
1 space between 1st and 2nd column,
2 spaces between 2nd and 3rd column,
3 spaces between 3rd and 4th column.
Also, I tried to give the same column name for column2 and column3. The approach is as follows.
Data is copied from the Azure SQL database to the data lake in comma-delimited format as a staging file.
This staging file is taken as the source in a Data flow activity.
In the source dataset, First row as header is not checked.
Data preview of the Source transformation is shown in the screenshot.
A Derived column transformation is added to change the column names of column2 and column3.
In this case, the 'date_col' value in column1 marks the header row. Thus, when column1 is 'date_col', replace the column2 and column3 data with the same column name:
column_2 = iif(Column_1=='date_col','ECIX',Column_2);
column_3 = iif(Column_1=='date_col','ECIX',Column_3);
Another Derived column transformation is added to concatenate all the columns with spaces. The column name is given as concat. The value for this column is:
concat(Column_1,' ',Column_2,' ',Column_3,' ',Column_4)
A Select transformation is added and only the concat column is selected here.
In the sink, a new delimited file is added as the sink dataset. In the sink dataset too, First row as header is not checked.
After the pipeline is run, the target file looks as shown in the output file screenshot.
Keeping the source as Azure SQL itself in the data flow, I created a single derived column 'OUTDC' and added all the columns from the source like this:
(h1)+' '+(h2)+' '+(h3)
Then fed the OUTDC to a delimited sink and kept the Headers option as single string like this:
['h1 h2 h3']
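If the source really is Azure SQL, a variant of the same idea is to build the single string in the source query itself, so the copy activity only ever sees one column. A minimal sketch in T-SQL, assuming hypothetical columns h1 to h4 in a table dbo.src, with the unequal spacing hard-coded to match the desired header:
SELECT line
FROM (
    SELECT 'h1 h2  h3   h4' AS line, 0 AS ord
    UNION ALL
    SELECT CONCAT(h1, ' ', h2, '  ', h3, '   ', h4), 1
    FROM dbo.src
) t
ORDER BY ord;
The ord column only forces the header row to come out first; the sink then needs no header handling at all.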

ADF map source columns startswith to sink columns in SQL table

I have an ADF data flow with many csv files as a source and a SQL database as a sink. The data in the csv files is similar, 170-plus columns wide, however not all of the files have the same columns. Additionally, some column names are different in each file, but each column name starts with the same corresponding 3 digits. Example: 203-student name, 644-student GPA.
Is it possible to map source columns using the first 3 characters?
Go back to the data flow designer and edit the data flow.
Click on the parameters tab
Create a new parameter and choose string array data type
For the default value, as per your requirement, enter ['203-student name','203-student grade','203-student-marks']
Add a Select transformation. The Select transformation will be used to map incoming columns to new column names for output.
We're going to change the first 3 column names to the new names defined in the parameter
To do this, add 3 rule-based mapping entries in the bottom pane
For the first column, the matching rule will be position==1 and the name will be $parameter1[1]
Follow the same pattern for columns 2 and 3
Click on the Inspect and Data Preview tabs of the Select transformation to view the new column names.
Reference - https://learn.microsoft.com/en-us/azure/data-factory/tutorial-data-flow-dynamic-columns#parameterized-column-mapping

Breakout concatenated field into rows not columns within Tableau

I have two fields that contain concatenated strings. The first field contains medical codes and the second field contains the descriptions of those codes. I don't want to break these into multiple fields because some of them would contain hundreds of splits. Is there any way to break them into a row each like below? The code and description values are separated by a semicolon (;)
code description
----- ------------
80400 description1
80402 description2
A sample of the data was attached as a screenshot.
One way is to custom split the two columns at ';', which will create a separate column for every entry; then you can pivot the code columns and the description columns separately.
One issue is that you can't guarantee every code is mapped to the correct description.
Another way is to export the data to an Excel sheet, split and pivot the columns there, match the code and description, and then use the Excel file as the data source in Tableau.
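If the data can instead be reshaped in a database such as PostgreSQL before Tableau reads it, the mapping concern goes away, because a multi-argument unnest expands the two semicolon-separated lists in parallel, pairing entries by position. A minimal sketch, assuming a hypothetical table records with columns id, codes and descriptions:
select r.id,
       t.code,
       t.description
from records r
cross join lateral unnest(
        string_to_array(r.codes, ';'),
        string_to_array(r.descriptions, ';')
    ) as t(code, description);
The result of that query (or a view over it) can then be used as the Tableau data source, already one row per code.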

How do you insert part of an existing column record into a new table in PostgreSQL?

I'm in the process of formatting a database and have found a column I'd like to format. It has 3 types of information for every record in the column. For example, a record in my history column shows as (American, born Estonia. 19011974). What I want to do is put this data into new individual columns to make them atomic: extract 'American' into a country column, 'Estonia' into a born-place column, '1901' into a born-year column and '1974' into a death column. UPDATE: However, some of the records hold nulls; for example, another record in the same column might be (German, 19242004), so a normal regular expression wouldn't work for all the data, would it? Any help is appreciated!
What PostgreSQL statements would I use to obtain specific parts of this data from the individual records? I understand it would be an INSERT and have already come up with:
INSERT INTO historian (id,url)
SELECT object_id, url FROM maintable;
That statement allowed me to get those values into new columns, but those were already atomic so I could transition them easily. Thanks for any help! :)
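For the extraction itself, regular expressions with optional matches handle the records that lack a birthplace, since regexp_match simply returns NULL when the pattern does not match. A minimal sketch, assuming PostgreSQL 10+ and that the text lives in a hypothetical maintable.history column; the target column names are only illustrative:
select
    object_id,
    (regexp_match(history, '^\(?(\w+)'))[1]                 as country,
    (regexp_match(history, 'born (\w+)'))[1]                as born_place,  -- null when no birthplace
    (regexp_match(history, '(\d{4})\d{4}\)?\s*$'))[1]::int  as born_year,
    (regexp_match(history, '\d{4}(\d{4})\)?\s*$'))[1]::int  as death_year
from maintable;
The same expressions can be added to the SELECT list of the INSERT shown above to populate extra columns in historian.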

When I import a txt file to VFP with the Wizard, decimals don't import correctly

I have this txt file:
"1","My Product 1","Vegetables","15.20"
"2","My Product 2","soda","9.52"
but when I import that with the wizard in Visual FoxPro 6, my result in the table is:
1 | My Product 1 | Vegetables | 15
2 | My Product 2 | Vegetables | 9
I've used SET DECIMALS TO 2 but it doesn't work. If I export again, the table in txt shows this:
"1","My Product 1","Vegetables","15"
"2","My Product 2","soda","9"
without decimals. So, how can I import decimals correctly into VFP, either with the wizard or with a command?
I don't know the format of your table, but here is something that will work for you. I am creating a temporary cursor as opposed to a permanent table, but a permanent table could do the same thing. You need to pre-define your columns in the same order and with the expected data types. In this case, I set the price as numeric with a maximum length of 10 and 2 decimal places.
CREATE CURSOR C_Import;
( someID c(5),;
someProduct c(30),;
someOtherFld c(20),;
somePrice n(10,2))
Now, if you append the text file as CSV (comma separated values), VFP will recognize the decimal positions during the numeric import.
APPEND FROM YourTextFile.txt TYPE csv
If the default decimal point is ',', you have to issue SET POINT TO '.' before the APPEND command. Without that you'll get only an integer value as the price.
Remember to change it back to the original value after the append.