Retain double quotes while importing a CSV file in pgAdmin 4 - postgresql

When I try to create an Import job in pgAdmin 4 to copy a CSV file into a table, all double quotes get stripped. I am sure it has something to do with the Quote and Escape settings, but I don't understand them. Please help.
My CSV file looks like the sample shown below. After the Import job, the data inserted into the 'description' column has no double quotes. Here are the Import job settings from pgAdmin:
(screenshot of the pgAdmin Import/Export dialog)
The data in the Excel file looks like:
{
    "push": {
        "title": "Recent Detected",
        "body": " were above target",
        "expandedBody" : " more details about this pattern"
    }
}
This is single-column data, and the column is a string column. I want to import it as it is. But when I import it, the data that I get in the table is:
{
    push: {
        title: Recent Detected,
        body: were above target,
        expandedBody : more details about this pattern
    }
}
What I expect in the table:
| Id | description |
| -- | ----------- |
| 1  | { "push": { "title": "Recent Detected", "body": " were above target", "expandedBody" : " more details about this pattern" } } |
Table structure:
Create table abcd ( id integer, description character varying );
What I am trying to do:
Use the Import function in pgAdmin 4 to populate the table with the data from the above CSV file.
What is going on:
Data gets inserted without the double quotes.
What is expected:
Data should get inserted with the quotes, as if the quotes are part of the string.
What is going wrong:
The import settings; something I am missing or not understanding.
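For what it's worth, pgAdmin's Import dialog exposes PostgreSQL's COPY ... CSV options, and both Quote and Escape default to the double quote, so a literal " inside a field has to be escaped in the file itself for it to survive the import. A rough sketch of the equivalent COPY (the file path is illustrative, not from the question):
-- With the default CSV settings, an embedded quote must appear doubled inside
-- a quoted field, e.g. the CSV line:  1,"{ ""push"": { ""title"": ""Recent Detected"" } }"
-- COPY then stores the single " characters in the description column.
COPY abcd (id, description)
FROM '/path/to/data.csv'
WITH (FORMAT csv, QUOTE '"', ESCAPE '"');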

Related

DataFrame headers get modified when saving as a parquet file using Glue

I'm writing a DataFrame with headers, partitioned, to S3 using the code below:
df_dynamic = DynamicFrame.fromDF(
    df_columned,
    glue_context,
    "temp_ctx"
)
print("\nUploading parquet to " + destination_path)
glue_context.write_dynamic_frame.from_options(
    frame=df_dynamic,
    connection_type="s3",
    connection_options={
        "path": destination_path,
        "partitionKeys": ["partition_id"]
    },
    format_options={
        "header": "true"
    },
    format="glueparquet"
)
Once my files are created, I see #1, #2 added after my column headers.
For example, if my column name is "Doc Data", it gets converted to Doc_Date#1.
I thought it was the parquet way of saving data.
Then, when I try to read from the same files using the code below, my headers are no longer the same; they now come back as Doc_Date#1. How do I fix this?
str_folder_path = str.format(
    _S3_PATH_FORMAT,
    args['BUCKET_NAME'],
    str_relative_path
)
df_grouped = glue_context.create_dynamic_frame.from_options(
    "s3",
    {
        'paths': [str_folder_path],
        'recurse': True,
        'groupFiles': 'inPartition',
        'groupSize': '1048576'
    },
    format_options={
        "header": "true"
    },
    format="parquet"
)
return df_grouped.toDF()
Issue resolved!
The issue was that I had spaces in my column names. Once I replaced them with underscores (_), the issue was resolved.
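For anyone else hitting this, one way to apply that fix is to strip the spaces from the DataFrame's column names just before converting it to a DynamicFrame. A minimal sketch, assuming df_columned is the DataFrame from the snippet above:
# Replace spaces in column names with underscores before writing.
df_columned = df_columned.toDF(
    *[c.replace(" ", "_") for c in df_columned.columns]
)
df_dynamic = DynamicFrame.fromDF(df_columned, glue_context, "temp_ctx")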

Firebase error: (child:) Must be a non-empty string and not contain '.' '#' '$' '[' or ']'

I am uploading key-value pairs to Firebase and I am receiving the error
(child:) Must be a non-empty string and not contain '.' '#' '$' '[' or ']...
Does this error mean the child "key" contains $ # . etc.,
or that the child "value" contains $ # . etc.?
I am trying to locate where the error is.
My dictionary is below, though I can't see any of those characters in the key part of the key-value pairs.
This is the dictionary:
["Subtitle": Sad DVD, "Email": go#mail.com, "Gender": 27, "Username": Sad DVD, "Display Name": Sad DVD, "Account Type": Business, "Password": Sdfsdfsdf, "profileImageUrl": https://firebasestorage.googleapis.com/v0/b/grapevine-2019-1d4a5.appspot.com/o/profile%20images%2F438670B1-E723-484C-8510-555B1CA0B9C5png?alt=media&token=b65796ea-6380-497e-a98b-d1c75c4c454a]
this is profile image url https://firebasestorage.googleapis.com/v0/b/grapevine-2019-1d4a5.appspot.com/o/profile%20images%2F438670B1-E723-484C-8510-555B1CA0B9C5png?alt=media&token=b65796ea-6380-497e-a98b-d1c75c4c454a
You can store any ASCII text as a value, so the issue is with one of the keys, or perhaps a value is empty.
I copied and pasted your data into a project and enclosed all of the values in quotes:
let test = ["Subtitle": "Sad DVD",
"Email": "go#mail.com",
"Gender": "27",
"Username": "Sad DVD",
"Display Name": "Sad DVD",
"Account Type": "Business",
"Password": "Sdfsdfsdf",
"profileImageUrl": "https://firebasestorage.googleapis.com/v0/b/grapevine-2019-1d4a5.appspot.com/o/profile%20images%2F438670B1-E723-484C-8510-555B1CA0B9C5png?alt=media&token=b65796ea-6380-497e-a98b-d1c75c4c454a"
]
Then, to test the data I wrote it to Firebase like this:
let testRef = self.ref.child("test").childByAutoId()
testRef.setValue(test)
Which correctly wrote the data with no errors.
So that would mean the string included in your question is not the actual string being written to Firebase.
I would suggest adding a
print(test)
right after it's assigned to the test var, and inspect that output for keys that are incorrect or for empty values.
Oh - and the issue could also be the parent node you are attempting to write to. In my code I create the parent key with childByAutoId; your code isn't included in your question, but you may want to check that as well.
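If it helps to narrow it down, only the keys (and the path) are restricted, so you could also scan the dictionary for offending keys before writing. A small sketch, not from the original code:
import Foundation

// Returns the keys Firebase would reject: empty, or containing . # $ [ ]
func invalidFirebaseKeys(in dict: [String: Any]) -> [String] {
    let forbidden = CharacterSet(charactersIn: ".#$[]")
    return dict.keys.filter { $0.isEmpty || $0.rangeOfCharacter(from: forbidden) != nil }
}

print(invalidFirebaseKeys(in: test))   // 'test' is the dictionary from the snippet above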

How to select a child tag from a JSON file using Scala

Good Day!!
I am writing Scala code to select multiple child tags from a JSON file; however, I am not getting the exact solution. The code looks like below.
Code:
val spark = SparkSession.builder
  .master("local")
  .appName("")
  .config("spark.sql.warehouse.dir", "C:/temp")
  .getOrCreate()
val df = spark.read
  .option("header", "true")
  .json("C:/Users/Desktop/data.json")
  .select("type", "city", "id", "name")
df.show()
Data.json
{"claims":[
{ "type":"Part B",
"city":"Chennai",
"subscriber":[
{ "id":11 },
{ "name":"Harvey" }
] },
{ "type":"Part D",
"city":"Bangalore",
"subscriber":[
{ "id":12 },
{ "name":"andrew" }
] } ]}
Expected Result:
| type   | city      | subscriber/0/id | subscriber/1/name |
| ------ | --------- | --------------- | ----------------- |
| Part B | Chennai   | 11              | Harvey            |
| Part D | Bangalore | 12              | Andrew            |
Please help me with the above code.
If I'm not mistaken, Apache Spark expects each line to be a separate JSON object, so it will fail if you try to load a pretty-printed JSON file.
https://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets
http://jsonlines.org/examples/
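If you are on Spark 2.2 or later, the multiLine option also lets you read a pretty-printed file directly. A rough sketch of how the claims array could then be flattened into the expected columns (the multiLine read and the explode step are my suggestion, not part of the original answer):
import org.apache.spark.sql.functions.{col, explode}

// Read the whole file as one JSON document, then flatten the claims array.
val claims = spark.read
  .option("multiLine", "true")
  .json("C:/Users/Desktop/data.json")
  .select(explode(col("claims")).as("claim"))

claims.select(
  col("claim.type"),
  col("claim.city"),
  col("claim.subscriber")(0)("id").as("subscriber/0/id"),
  col("claim.subscriber")(1)("name").as("subscriber/1/name")
).show()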

Amazon Redshift COPY using JSON - having trouble

I have created a simple table called test3:
create table if not exists test3(
Studies varchar(300) not null,
Series varchar(500) not null
);
I got some JSON data:
{
    "Studies": [{
        "studyinstanceuid": "2.16.840.1.114151",
        "studydescription": "Some study",
        "studydatetime": "2014-10-03 08:36:00"
    }],
    "Series": [{
        "SeriesKey": "abc",
        "SeriesInstanceUid": "xyz",
        "studyinstanceuid": "2.16.840.1.114151",
        "SeriesDateTime": "2014-10-03 09:05:09"
    }, {
        "SeriesKey": "efg",
        "SeriesInstanceUid": "stw",
        "studyinstanceuid": "2.16.840.1.114151",
        "SeriesDateTime": "0001-01-01 00:00:00"
    }],
    "ExamKey": "exam-key",
}
and here is my json_path file:
{
    "jsonpaths": [
        "$['Studies']",
        "$['Series']"
    ]
}
Both the JSON data and the json_path file are uploaded to S3.
I try to execute the following COPY command in the Redshift console:
copy test3
from 's3://mybucket/redshift_demo/input.json'
credentials 'aws_access_key_id=my_key;aws_secret_access_key=my_access'
json 's3://mybucket/redsift_demo/json_path.json'
I get the following error. Can anyone please help? I have been stuck on this for some time now.
[Amazon](500310) Invalid operation: Number of jsonpaths and the number of columns should match. JSONPath size: 1, Number of columns in table or column list: 2
Details:
-----------------------------------------------
error: Number of jsonpaths and the number of columns should match. JSONPath size: 1, Number of columns in table or column list: 2
code: 8001
context:
query: 1125432
location: s3_utility.cpp:670
process: padbmaster [pid=83747]
-----------------------------------------------;
1 statement failed.
Execution time: 1.58s
Redshift's error is misleading. The issue is that your input file is wrongly formatted: you have an extra comma after the last JSON entry.
Copy succeeds if you change "ExamKey": "exam-key", to "ExamKey": "exam-key"
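As a quick sanity check (my suggestion, not part of the original answer), running the file through a strict JSON parser before the COPY catches this kind of formatting problem:
import json

# A strict parser rejects the trailing comma in the input above.
with open("input.json") as f:   # local copy of the file that was uploaded to S3
    json.load(f)                # raises json.JSONDecodeError if the file is malformed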

How to run U-SQL for all the files in a folder using parameters from ADF?

Can't pass the "in" parameter to U-SQL to use all the files in the folder.
in my ADF pipeline, I have the following parameters settings:
"parameters": {
"in": "$$Text.Format('stag/input/{0:yyyy}/{0:MM}/{0:dd}/*.csv', SliceStart)",
"out": "$$Text.Format('stag/output/{0:yyyy}/{0:MM}/{0:dd}/summary.csv"
}
And the U-SQL script tries to extract from:
@couponlog =
    EXTRACT
        Id int,
        [Other columns here]
    FROM @in
    USING Extractors.Csv(skipFirstNRows:1);
But I get 'file not found' during execution.
The files exist in the data lake, but I don't know the correct syntax to pass the path as a parameter.
I am sure there are many ways to solve the issue, but what I found is that instead of passing a parameter from the ADF pipeline, it is easier to use virtual columns, in my case v_date:
@couponlog =
    EXTRACT
        Id int,
        [Other columns here],
        v_date DateTime
    FROM "stag/input/{v_date:yyyy}/{v_date:MM}/{v_date:dd}/{*}.csv"
    USING Extractors.Csv(skipFirstNRows:1);
With this, the U-SQL script found all the files.
I'm using dates passed in by ADF with no trouble. I pass in just the date portion and then format it within U-SQL:
"parameters": {
"in": "$$Text.Format('{0:yyyy}/{0:MM}/{0:dd}/', SliceStart)"
}
Then in U-SQL:
DECLARE @inputPath string = "path/to/file/" + @in + "{*}.csv";
DECLARE @outputPath string = "path/to/file/" + @in + "output.csv";
Those variables then get used in the script as needed.
I use this input parameter in ADF to read all the files from a folder, with a virtual column (file) to retrieve the name of each file:
"parameters": {
"in": "$$Text.Format('storage/folder/{0:yyyy}-{0:MM}/{1}.csv', SliceStart, '{file:*}')",
"out": "$$Text.Format('otherFolder/{0:yyyy}-{0:MM}/result.txt', SliceStart)"
}
The related U-SQL:
@sales =
    EXTRACT
        column1 string,
        column2 decimal,
        file string
    FROM @in
    USING Extractors.Csv(silent : true);