Adding field delimiter ";" in last column on header file - datastage

I'm new to DataStage and trying to create a sequential file with ";" as the delimiter.
I would like to add the delimiter just after the last column in the header.
Please see the example below for a clearer picture.
Currently my sequential file looks like this:
SERVICE_ID;OFFER_ID;MINIMUM;MAXIMUM
19441;162887;;;
19442;162889;;;
Expected result, with the delimiter after the last header column:
SERVICE_ID;OFFER_ID;MINIMUM;MAXIMUM;
19441;162887;;;
19442;162889;;;
How can I do that, please?

Use the Final Delimiter property in the Sequential File stage's format properties; setting it to ";" appends the delimiter after the last column of each record, including the header line.
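If the stage properties cannot be changed for some reason, the same result can be patched in afterwards. A minimal Python sketch (file content invented for illustration) that appends the delimiter to the header line only:

```python
# Post-processing sketch outside DataStage: append the field delimiter
# to the header line; data rows are left untouched.
text = "SERVICE_ID;OFFER_ID;MINIMUM;MAXIMUM\n19441;162887;;;\n19442;162889;;;"
lines = text.split("\n")
lines[0] += ";"  # header now ends with the delimiter too
fixed = "\n".join(lines)
print(fixed.split("\n")[0])  # SERVICE_ID;OFFER_ID;MINIMUM;MAXIMUM;
```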

Getting Python to accept a csv into postgreSQL table with ":" in the headers

I receive a .csv export every 10 minutes that I'd like to import into a PostgreSQL server. Working with a test csv, I got everything to work, but I didn't notice that my actual csv file has a forced ":" at the end of each column header (except, for some reason, the first one). This is built into the back end of the exporter, so I can't get it removed; I already asked the company. So I added the ":"s to my test csv as shown in the link.
My INSERT INTO statements no longer work and give me syntax errors. First I'm trying to add the rows using the following code:
import csv

# Assumes an open database connection `con` with cursor `cursor`.
print("Reading file contents and copying into table...")
with open('C:\\Users\\admin\\Desktop\\test2.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    columns = next(readCSV)  # read the header row
    query = 'insert into test({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    for data in readCSV:
        cursor.execute(query, data)
con.commit()
This results in a '42601' syntax error near the ":" in the second column header.
The results are the same when I explicitly list the column headers and the ? placeholders in the INSERT INTO statement.
What is the syntax to get the script to accept ":" in column headers? If there is none, is there a way to scan through the headers and remove the ":" at the end of each?
Because : is a special character, if your column is named year: in the DB you must double-quote its name: select "year:" from test;
You are getting a PG error because you are referencing the unquoted column names (insert into test({0})), so add double quotes there:
query = 'insert into test("year:","day:","etc:") values (...)'
That being said, it might be simpler to remove every occurrence of : from the first line of your csv.
Much appreciated, JGH and Adrian. I went with the suggestion to remove every occurrence of : by adding the following line after the first columns = ... statement:
columns = [column.strip(':') for column in columns]
It worked well.
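Putting the pieces together, a self-contained sketch of the fix (the sample header is invented; %s is psycopg2's placeholder style, whereas pyodbc uses ?):

```python
import csv
import io

# Hypothetical export: every header except the first ends with ":".
sample = "year,day:,etc:\n2023,Mon,x\n"
readCSV = csv.reader(io.StringIO(sample))
columns = next(readCSV)
columns = [column.strip(':') for column in columns]  # drop trailing ":"
# Double-quote each identifier so PostgreSQL accepts it even if a
# special character survives.
query = 'insert into test({0}) values ({1})'.format(
    ','.join('"{}"'.format(c) for c in columns),
    ','.join(['%s'] * len(columns)))
print(query)  # insert into test("year","day","etc") values (%s,%s,%s)
```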

Skip first and last line from a pipe delimited file with 26 columns and make it to dataframe using scala

HD|20211210
DT|D-|12/22/2017|12/22/2017 09:41:45.828000|11/01/2017|01/29/2018 14:46:10.666000|1.2|1.2|ABC|ABC|123|123|4554|023|11/01/2017|ACDF|First|0012345||f|ABCD|ABCDEFGH|ABCDEFGH||||
DT|D-|12/25/2017|12/25/2017 09:24:20.202000|12/13/2017|01/29/2018 07:52:23.607000|6.4|6.4|ABC|ABC|123|123|4540|002|12/13/2017|ACDF|First|0012345||f|ABC|ABCDEF|ABCDEFGH||||
TR|0000000002
The file name is Datafile.Dat; the Scala version is 2.11.
I need to create a header DataFrame from the first line (excluding "HD|"), a trailer DataFrame from the last line (excluding "TR|"), and finally the actual data DataFrame by skipping both the first and last lines and excluding "DT|" from each remaining line.
Please help me with this.
I see you have a defined schema for your DataFrame (covering everything except the first and last rows).
What you can do is read the file with '|' as the separator and enable "DROPMALFORMED" mode:
schema = 'define your schema here'
df = spark.read.option("mode","DROPMALFORMED").option("delimiter","|").option("header","true").schema(schema).csv("Datafile.Dat")
Another way is to use zipWithIndex.
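The zipWithIndex route boils down to tagging each line with its position and splitting on that. The slicing itself can be prototyped in plain Python (rows shortened for readability) before translating it to Spark:

```python
# Plain-Python sketch of the split: first line -> header, last line ->
# trailer, middle lines -> data; drop the record-type prefix from each.
lines = """HD|20211210
DT|D-|12/22/2017|1.2|ABC
DT|D-|12/25/2017|6.4|ABC
TR|0000000002""".splitlines()

header  = lines[0].split("|")[1:]            # ['20211210']
trailer = lines[-1].split("|")[1:]           # ['0000000002']
data    = [ln.split("|")[1:] for ln in lines[1:-1]]
print(header, trailer, len(data))
```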

Converting csv to parquet in spark gives error if csv column headers contain spaces

I have csv file which I am converting to parquet files using databricks library in scala. I am using below code:
val spark = SparkSession.builder().master("local[*]").config("spark.sql.warehouse.dir", "local").getOrCreate()
var csvdf = spark.read.format("org.apache.spark.csv").option("header", true).csv(csvfile)
csvdf.write.parquet(csvfile + "parquet")
Now the above code works fine as long as my column headers contain no spaces. But if a csv file has spaces in its column headers, it errors out with an invalid-column-headers message. My csv files are delimited by ,.
Also, I cannot change the column names of the csv. The column names have to stay as they are, even if they contain spaces, because they are given by the end user.
Any idea how to fix this?
Per #CodeHunter's request:
Sadly, the parquet file format does not allow spaces in column names; the error it spits out when you try is: contains invalid character(s) among " ,;{}()\n\t=".
ORC also does not allow spaces in column names.
Most SQL engines don't support column names with spaces either, so you'll probably be best off converting your columns to something like foo_bar or fooBar.
I would rename the offending columns in the DataFrame, changing spaces to underscores, before saving. This could be done with select "foo bar" as "foo_bar" or with .withColumnRenamed("foo bar", "foo_bar").
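The renaming step can be sketched on a hypothetical column list; the mapping is plain string work, which Spark would then consume via df.toDF(*renamed) or chained withColumnRenamed calls:

```python
# Replace spaces with underscores in every column name before writing
# to parquet (column names are invented for illustration).
cols = ["Customer Name", "Order Date", "Total"]
renamed = [c.replace(" ", "_") for c in cols]
print(renamed)  # ['Customer_Name', 'Order_Date', 'Total']
```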

USQL Escape Quotes

I am new to Azure Data Lake Analytics. I am trying to load a csv whose string columns are double-quoted, and on some random rows there are quotes inside a column.
For example
ID, BookName
1, "Life of Pi"
2, "Story about "Mr X""
When I try loading, it fails on the second record, throwing the error message below.
1. I wonder if there is a way to fix this in the csv file? Unfortunately we cannot extract a new file from the source, as these are log files.
2. Is it possible to let ADLA ignore the bad rows and proceed with the rest of the records?
Execution failed with error '1_SV1_Extract Error :
'{"diagnosticCode":195887146,"severity":"Error","component":"RUNTIME","source":"User","errorId":"E_RUNTIME_USER_EXTRACT_ROW_ERROR","message":"Error
occurred while extracting row after processing 9045 record(s) in the
vertex' input split. Column index: 9, column name:
'instancename'.","description":"","resolution":"","helpLink":"","details":"","internalDiagnostics":"","innerError":{"diagnosticCode":195887144,"severity":"Error","component":"RUNTIME","source":"User","errorId":"E_RUNTIME_USER_EXTRACT_EXTRACT_INVALID_CHARACTER_AFTER_QUOTED_FIELD","message":"Invalid
character following the ending quote character in a quoted
field.","description":"Invalid character is detected following the
ending quote character in a quoted field. A column delimiter, row
delimiter or EOF is expected.\nThis error can occur if double-quotes
within the field are not correctly escaped as two
double-quotes.","resolution":"Column should be fully surrounded with
double-quotes and double-quotes within the field escaped as two
double-quotes."
As per the error message, if you are importing a quoted csv which has quotes within some of the columns, then these need to be escaped as two double-quotes. In your particular example, your second row needs to be:
2, "Story about ""Mr X"""
So one option is to fix up the original file on output. If you are not able to do that, you can import all the columns as one column, use a RegEx to fix up the quotes, and output the file again, e.g.
// Import records as one row then use RegEx to clean columns
#input =
EXTRACT oneCol string
FROM "/input/input132.csv"
USING Extractors.Text( '|', quoting: false );
// Fix up the quotes using RegEx
#output =
SELECT Regex.Replace(oneCol, "([^,])\"([^,])", "$1\"\"$2") AS cleanCol
FROM #input;
OUTPUT #output
TO "/output/output.csv"
USING Outputters.Csv(quoting : false);
The file will now import successfully.
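The same quote repair can be prototyped outside U-SQL. A Python sketch of the Regex.Replace above (sample line invented; it assumes field-delimiting quotes sit directly next to a comma):

```python
import re

# A quote with a non-comma character on both sides is an inner quote,
# so double it; quotes adjacent to a comma (field boundaries) are kept.
raw = '2,"Story about "Mr X"",log entry'
fixed = re.sub(r'([^,])"([^,])', r'\1""\2', raw)
print(fixed)  # 2,"Story about ""Mr X""",log entry
```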

How to append CSV file into dbf?

Do you know how to append a .csv file into a DBF using a FoxPro command, instead of the import wizard?
I have tried the code below, but it is not working.
Code:
Append from getfile() Delimited With ,
You can use append from:
local lcFileName
lcFileName = getfile()
if !empty(m.lcFileName)
   append from (m.lcFileName) type delimited
endif
This appends everything, including the header line. However, if you have a field that is not of character type, you can use it to skip the header line. For example, say there is a column named theDate of type date that should never be empty; then you could say:
append from (m.lcFileName) type delimited for !empty(theDate)
Or, if you are appending into an empty cursor, you could say ... for recno()>1.