How to include curly braces for array of double field when exporting db data to csv file using CsvMapper - export-to-csv

I have records in my table with fields of type String, double[], String[], etc. I need the output in the CSV file to look like this:
Expected output:
Record, "{0.6, 1.5, 1.8}", "{ST, PST, CB}"
But I am getting: Record, 0.6, 1.5, 1.8, ST, PST, CB.
In short, when generating the CSV file, each array of double (double[]) and array of String (String[]) field needs to be surrounded by "{}", and only those fields.
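The question has no answer here, so the following is only a rough, language-neutral sketch of the idea, written in Python rather than with CsvMapper: turn each array field into one "{...}" string before the row reaches the CSV writer, so the writer quotes it as a single cell. The helper names brace and to_csv_line are made up for this illustration and are not part of any library.

```python
import csv
import io


def brace(values):
    """Render a sequence as a single '{a, b, c}' string."""
    return "{" + ", ".join(str(v) for v in values) + "}"


def to_csv_line(record):
    """Write one record; list-valued fields are wrapped in braces first."""
    cells = [brace(f) if isinstance(f, (list, tuple)) else f for f in record]
    buf = io.StringIO()
    # The writer quotes any cell that contains the delimiter (a comma here).
    csv.writer(buf, lineterminator="\n").writerow(cells)
    return buf.getvalue()


line = to_csv_line(["Record", [0.6, 1.5, 1.8], ["ST", "PST", "CB"]])
print(line, end="")
```

In a Java/Jackson setup the same pre-formatting step would presumably live in a custom serializer or a getter that returns the braced string, but that is an assumption and not something confirmed by the question.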

Related

PostgreSQL COPY cannot read JSON from CSV file

I'm copying data from a CSV file into a PostgreSQL table using COPY
My CSV file is simply:
0\"a string"
And my table "Test" was created by the following:
create table test (
id integer,
data jsonb
);
My copy statement and the error I received were the following:
williazz=# \copy test from 'test/test.csv' delimiters '\' CSV
ERROR: invalid input syntax for type json
DETAIL: Token "a" is invalid.
CONTEXT: JSON data, line 1: a...
COPY test, line 1, column data: "a string"
Interestingly, when I changed my CSV file to a number, it had no problem.
CSV:
0\1505
williazz=# \copy test from 'test/test.csv' delimiters '\' CSV
COPY 1
williazz=# select * from test;
id | data
----+------
0 | 1505
(1 row)
Furthermore, numbers in arrays also work:
CSV:
1\[0,1,2,3,4,5]
williazz=# select * from test;
id | data
----+---------------
0 | 1505
1 | [0,1,2,3,4,5]
(2 rows)
But as soon as I introduce a non-digit string into the JSON, the COPY stops working:
0\[1,2,"three",4,5]
ERROR: invalid input syntax for type json
DETAIL: Token "three" is invalid.
CONTEXT: JSON data, line 1: [1, 2, three...
COPY test, line 1, column data: "[1, 2, three, 4, 5]"
I cannot get Postgres to read a non-digit string in JSON format. I've also tried changing the data type of column "data" from jsonb to json, and using basically every combination of single and double quotes.
Could someone please help me identify the problem? Thank you
Because your file is CSV encoded, it does not mean what you think.
0\"a string"
With a delimiter of \ this is two values: the number 0 and the string a string, without quotes. The quotes in the file are CSV string formatting and are consumed by the CSV parser. The bare text a string is not valid JSON; a JSON string requires its own quotes.
Instead you need to include the JSON string quotes inside the CSV string quotes. Quotes in CSV are escaped by doubling them.
0\"""a string"""
Now that is the number 0 and the string "a string" including quotes.
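The quoting behaviour described above can be reproduced outside Postgres. The sketch below uses Python's csv module as a stand-in for COPY's CSV parser (an assumption: both strip the outer quotes and treat a doubled quote as a literal one). The helper names parse_line and write_line are invented for this example.

```python
import csv
import io
import json


def parse_line(line):
    """Split one line using a backslash delimiter and double-quote quoting."""
    return next(csv.reader(io.StringIO(line), delimiter="\\", quotechar='"'))


def write_line(row_id, value):
    """Serialize value as JSON, then let the CSV writer add the outer quotes."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\\", quotechar='"', lineterminator="\n")
    writer.writerow([row_id, json.dumps(value)])
    return buf.getvalue().rstrip("\n")


bare = parse_line('0\\"a string"')         # second field: a string
doubled = parse_line('0\\"""a string"""')  # second field: "a string"

try:
    json.loads(bare[1])
    bare_is_json = True
except json.JSONDecodeError:
    bare_is_json = False

print(bare, bare_is_json)
print(doubled, json.loads(doubled[1]))
print(write_line(0, "a string"))
print(write_line(1, [1, 2, "three", 4, 5]))
```

Generating the file with a CSV writer, rather than typing the quotes by hand, avoids having to count doubled quotes at all.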
And as an observation, it would be simpler to remove the complication of embedding JSON into a CSV and use a pure JSON file.
[
[0, "a string"],
[1, "other string"]
]

Converting a table that has all variables of the class char to string

I have a table t that contains a column year. The following command returns
class t.Year
ans =
char
This is not just for the column year but for all columns in the table.
I need to convert the table to a cellstr so that I can apply the str2num function to it, but I am unable to convert all the columns and rows to string type. I also need to remove '' from the column names when I do table2cell. After table2cell I need to convert to cellstr, and I am unable to do so because all the values in the table (columns) are char.

kdb+: Save table into a csv file

I have the table "dates" below. It has a sym column with symbols and a d column with lists of strings, and I would like to save it into a regular CSV file. I couldn't find a good way to do it. Any suggestions?
q)dates
sym d
----------------------------------------------------------------------------
6AH0 "1970.03.16" "1980.03.17" "1990.03.19" "2010.03.15"
6AH6 "1976.03.15" "1986.03.17" "1996.03.18" "2016.03.14"
6AH7 "1977.03.14" "1987.03.16" "1997.03.17" "2017.03.13"
6AH8 "1978.03.13" "1988.03.14" "1998.03.16" "2018.03.19"
6AH9 "1979.03.19" "1989.03.13" "1999.03.15" "2019.03.18"
When I try to do the regular save the below error happens:
q)save `:dates.csv
k){$[t&77>t:#y;$y;x;-14!'y;y]}
'type
q))
The internal table-to-CSV conversion function in kdb+ cannot handle deeply nested columns. Each cell of the d column in your table is a list of strings, that is, a list of lists of chars. The conversion function can, however, handle a simply nested column (depth of 1), such as a column of plain strings.
Therefore, you can collapse each cell of the d column into a single string and then save to CSV using the internal function:
/ generate a table of dummy data
q)show dates:flip `sym`d!(`6AH0`6AH6`6AH7;string (3;0N)#12?.z.d)
sym d
--------------------------------------------------------
6AH0 "2008.02.04" "2015.01.02" "2003.07.05" "2005.02.25"
6AH6 "2012.10.25" "2008.08.28" "2017.01.25" "2007.12.27"
6AH7 "2004.02.01" "2005.06.06" "2013.02.11" "2010.12.20"
/ convert 'd' column to simple list - the (" " sv') is the conversion func here
q)@[`dates;`d;" " sv']
`dates
/ review what was done
q)show dates
sym d
--------------------------------------------------
6AH0 "2008.02.04 2015.01.02 2003.07.05 2005.02.25"
6AH6 "2012.10.25 2008.08.28 2017.01.25 2007.12.27"
6AH7 "2004.02.01 2005.06.06 2013.02.11 2010.12.20"
/ save to csv
q)save `:dates.csv
`:dates.csv
/ review saved csv
q)\cat dates.csv
"sym,d"
"6AH0,2008.02.04 2015.01.02 2003.07.05 2005.02.25"
"6AH6,2012.10.25 2008.08.28 2017.01.25 2007.12.27"
"6AH7,2004.02.01 2005.06.06 2013.02.11 2010.12.20"
As per the csv specification, you'll want to flatten the list out and separate each with a comma and double quote the list.
'save' is limited in that the file must be named the same as the global variable you are saving.
If I was tasked with your question I'd do it like so;
`:myFileNamedWhatever.csv 0: csv 0: select sym,csv sv'd from dates
Explanation:
csv 0: table /csv is a variable, literally defined as "," - it's good for readability. csv 0: table converts the table to a list of comma-separated strings
`:file 0: listOfStrings /this takes a LIST of strings and writes them to the file handle. Each element of the list becomes a new line in the file
I'd prefer this approach as it is general and allows the saving of various types. You can use it within a function etc..
At a later date I decided that I wanted it saved as a pipe (or anything) separated file;
`:myNewFile.psv 0: "|" 0: select sym,"|"sv'd from table
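For readers who do not know q, the flatten-then-write idea from both answers can be sketched in Python. This is an illustration only, not a translation of kdb+ internals: the function name table_to_text is hypothetical, the rows are a subset of the table in the question, and the quoting comes from Python's csv module, which wraps a cell in double quotes when it contains the separator.

```python
import csv
import io

# Two rows copied from the "dates" table in the question.
dates = [
    ("6AH0", ["1970.03.16", "1980.03.17", "1990.03.19", "2010.03.15"]),
    ("6AH6", ["1976.03.15", "1986.03.17", "1996.03.18", "2016.03.14"]),
]


def table_to_text(rows, sep=","):
    """Join each nested list into one cell, then write rows with separator sep."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=sep, lineterminator="\n")
    writer.writerow(["sym", "d"])
    for sym, d in rows:
        # Flatten the list of strings into a single string cell.
        writer.writerow([sym, sep.join(d)])
    return buf.getvalue()


print(table_to_text(dates), end="")
print(table_to_text(dates, sep="|"), end="")
```

Passing the separator as a parameter mirrors the pipe-separated variant above: the same flattening step works for any single-character delimiter.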

Spark: Read a csv file into a map like structure using scala

I have a csv file of the format:
key, age, marks, feature_n
abc, 23, 84, 85.3
xyz, 25, 67, 70.2
Here the number of features can vary. In this example there are 3 features (age, marks and feature_n). I have to convert it into a Map[String,String] as below:
[key,value]
["abc","age:23,marks:84,feature_n:85.3"]
["xyz","age:25,marks:67,feature_n:70.2"]
I have to join the above data with another dataset A on column 'key' and append the 'value' to another column in dataset A. The csv file can be loaded into a dataframe with schema (schema defined by first row of the csv file).
val newRecords = sparkSession.read.option("header", "true").option("mode", "DROPMALFORMED").csv("/records.csv");
Post this I will join the dataframe newRecords with dataset A and append the 'value' to one of the columns of dataset A.
How can I iterate over each column for each row, excluding the column "key" and generate the string of format "age:23,marks:84,feature_n:85.3" from newRecords?
I can alter the format of csv file and have the data in JSON format if it helps.
I am fairly new to Scala and Spark.
I would suggest the following solution:
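// Pair every non-key column name with its value, so any number of features works; the key is kept for the join.
val features = newRecords.columns.filter(_ != "key")
val updated = newRecords.rdd.map(r => (r.getAs[String]("key"), features.map(n => s"$n:${r.getAs[Any](n)}").mkString(",")))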
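The per-row logic (skip the key column, pair every other header with its value, join with commas) can be checked without a Spark cluster. Below is a minimal plain-Python sketch of that logic on the sample file from the question. It is not Spark code, and rows_to_map is a name invented for this example. It assumes the spaces after the commas in the sample should be ignored.

```python
import csv
import io

# Sample data copied from the question.
raw = """key, age, marks, feature_n
abc, 23, 84, 85.3
xyz, 25, 67, 70.2
"""


def rows_to_map(text, key_col="key"):
    """Map each key to 'name:value' pairs built from all other columns."""
    # skipinitialspace drops the blank that follows each comma in the sample.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    result = {}
    for row in reader:
        result[row[key_col]] = ",".join(
            f"{name}:{value}" for name, value in row.items() if name != key_col
        )
    return result


for key, value in rows_to_map(raw).items():
    print(key, value)
```

Because the column names come from the header row, adding or removing feature columns in the file needs no change to the code.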

concatenating text to a column in pig

I have a day column and a month column and would like to concatenate the year to it and store it in CHARARRAY format with the hyphens.
so I have: month:CHARARRAY, day:CHARARRAY
Meaning, for example, if the day column contains '03' and the month column contains '04', I would like to create a date column that contains: '2014-04-03'
This is my code:
CONCAT('2014-',month,'-',day) as date;
It doesn't work and I'm not quite sure how to concatenate additional text onto the column.
I would like to note that I'm not sure converting to date format is an option for me. I would prefer to keep it in CHARARRAY format since I would like to join with another file that has date stored in CHARARRAY format.
Assuming this is the data file called dateExample.csv:
Surender,02,03,1988
Raja,12,09,1998
Raj,05,10,1986
This is the script for pig:
A = LOAD 'dateExample.csv' USING PigStorage(',') AS(name:chararray,day:chararray,month:long,year:chararray);
X = FOREACH A GENERATE CONCAT((chararray)day,CONCAT('-',CONCAT((chararray)month,CONCAT('-',(chararray)year))));
dump X;
You will get the desired output:
(02-3-1988)
(12-9-1998)
(05-10-1986)
Explanation:
When we try to concat like this:
X = FOREACH A GENERATE CONCAT(day,CONCAT('-',CONCAT(month,CONCAT('-',year))));
We get following exception :
ERROR 1045:
<line 2, column 45> Could not infer the matching function for org.apache.pig.builtin.CONCAT as multiple or none of them fit. Please use an explicit cast.
So we need to explicitly cast the day, month and year values to chararray, and then it works.
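The same cast-before-concatenating rule shows up in other languages. As a loose analogy only (Python, not Pig), joining strings with a non-string value also fails until every part is cast to text. The function build_date is a hypothetical helper; it produces the year-month-day layout the question asks for and zero-pads the parts, which the Pig output above does not do for the long-typed month.

```python
def build_date(year, month, day):
    """Cast every part to text before joining, zero-padding month and day."""
    return "-".join([str(year), str(month).zfill(2), str(day).zfill(2)])


# Mixing a number into a string join fails, much like CONCAT on mixed types.
try:
    "-".join(["02", 3, "1988"])
    mixed_join_failed = False
except TypeError:
    mixed_join_failed = True

print(mixed_join_failed)
print(build_date(2014, "04", "03"))
print(build_date("1988", 3, "02"))
```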