I am trying to load JSON data into a Redshift table. Below are the sample code, the table structure, and the JSON data.
I have gone through many posts on this site and on AWS; however, my issue is not yet resolved.
The JSON data is below. I copied it into test.json and uploaded it to S3:
{backslash: "a",newline: "ab",tab: "dd"}
The table structure is as below:
create table escapes (backslash varchar(25), newline varchar(35), tab
varchar(35));
The COPY command is as below:
copy escapes from 's3://dev/test.json'
credentials 'aws_access_key_id=******;aws_secret_access_key=$$$$$'
format as JSON 'auto';
However, it throws the below error:
Amazon Invalid operation: Load into table 'escapes' failed. Check 'stl_load_errors' system table for details.;
1 statement failed.
In the 'stl_load_errors' table, the error reason is "Invalid value."
It seems like the issue is with your JSON data. Ideally it should be:
{
"backslash": "a",
"newline": "ab",
"tab": "dd"
}
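With the keys quoted like that, the same COPY from your question should work. If a load still fails, stl_load_errors pinpoints the offending line. A hedged sketch (the S3 path is taken from your question and the credentials are placeholders):

copy escapes from 's3://dev/test.json'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
format as JSON 'auto';

-- Shows the file, line, column, and reason for the most recent load failures.
select query, filename, line_number, colname, err_reason, raw_line
from stl_load_errors
order by starttime desc
limit 10;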
I hope this resolves your issue; if not, update your question and I will take another look at the answer.
I am doing a copy activity to load data from Azure Data Factory to an on-premise SQL table.
In the copy activity column mapping, I can see a warning message that the source column is a string with a date and time value (2022-09-13 12:53:28), so I created the target SQL table column with a date data type.
While importing the mapping in the copy activity, for whatever date column I map in SQL, there is a warning message thrown in ADF. Kindly advise how we can resolve it.
screenshot:
The warning just indicates that the copy activity will truncate the source column data when additional information is found in a column value. There would not be any error in this case, but there might be data loss.
In your case, since the column value is 2022-09-13 12:53:28, it will be inserted without any issue into the datetime column without truncation.
Here is a demonstration where I try to insert the following source data:
id,first_name,date
1,Wenona,2022-09-13 12:53:28
2,Erhard,2022-09-13 13:53:28
3,Imelda,2022-09-13 14:53:28
The copy activity runs successfully and inserts the data. The following is my target table data after inserting:
When I insert the following data, it is truncated to keep only two fractional-second digits, as shown below.
id,first_name,date
1,Wenona,2022-09-13 12:53:28.11111
2,Erhard,2022-09-13 13:53:28.11111
3,Imelda,2022-09-13 14:53:28.11111
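For illustration, a minimal sketch of that rounding behaviour (the table and column names here are hypothetical, and I am assuming the target column is datetime2(2)):

-- Hypothetical target table; datetime2(2) keeps two fractional-second digits.
CREATE TABLE dbo.demo_target (id int, first_name varchar(50), [date] datetime2(2));

INSERT INTO dbo.demo_target (id, first_name, [date])
VALUES (1, 'Wenona', '2022-09-13 12:53:28.11111');

-- Returns 2022-09-13 12:53:28.11; the extra fractional digits are rounded away,
-- which is the "truncation" the ADF warning refers to. No error is raised.
SELECT [date] FROM dbo.demo_target;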
I am trying to delete the records from a Teradata table and then write into the table, to avoid duplicates.
I have tried many ways, but none of them work.
1) I tried deleting while reading the data, which gives a syntax error like '(' expected between table and delete:
spark.read.format('jdbc').options('driver','com.TeradataDriver').options('user','user').options('pwd','pwd').options('dbtable','delete from table').load()
I also tried it like below, which also gives a syntax error like something expected between '(' and delete:
options('dbtable','(delete from table) as td')
2) I have tried deleting while writing the data, which is not working either:
df.write.format('jdbc').options('driver','com.TeradataDriver').options('user','user').options('pwd','pwd').options('dbtable','table').('preactions','delete from table').save()
A possible solution is to call a procedure which deletes the data before the write.
import teradata

host, username, password = '', '', ''
udaExec = teradata.UdaExec(appName="test", version="1.0", logConsole=False)
with udaExec.connect(method="odbc",
                     system=host,
                     username=username,
                     password=password,
                     driver="Teradata Database ODBC Driver 16.20",
                     charset='UTF16',
                     transactionMode='Teradata') as connect:
    # Call a stored procedure that deletes the existing rows, then do the Spark write.
    connect.execute("CALL db.PRC_DELETE()", queryTimeout=0)
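For reference, a hedged sketch of what such a procedure could look like on the Teradata side (db.PRC_DELETE is the name used above; the table it clears is a placeholder):

-- Placeholder table name; the procedure simply empties the target table.
REPLACE PROCEDURE db.PRC_DELETE()
BEGIN
  DELETE FROM db.target_table ALL;
END;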
I created an external schema for my database in AWS Glue. I can see the list of tables, but I cannot look into the JSON data. Redshift throws me this error:
[Amazon](500310) Invalid operation: S3 Query Exception (Fetch)
Details:
-----------------------------------------------
error: S3 Query Exception (Fetch)
code: 15001
context: Task failed due to an internal error. Error occured during Ion/JSON extractor match: IERR_INVALID_SYNTAX
query: 250284
location: dory_util.cpp:717
process: query2_124_250284 [pid=12336]
-----------------------------------------------;
1 statement failed.
I don't want to create external tables because I will create a view combining the external tables in the Data Catalog in AWS Glue.
Just an update:
I used an AWS Glue crawler to create the tables in the Data Catalog. They are in JSON format. If I use a job that uploads this data to Redshift, they are loaded flattened (except arrays) into the Redshift table.
Example of the JSON data:
{
"array": [
1,
2,
3
],
"boolean": true,
"null": null,
"number": 123,
"object": {
"a": "b",
"c": "d",
"e": "f"
},
"string": "Hello World"
}
If I upload them using an AWS Glue job, the output will be like this (as a table):
see image
Now, I have a tremendous number of tables crawled in the Data Catalog. I am struggling to create an individual script for each of these tables, which is why an Amazon Redshift Spectrum external schema would be helpful.
However, when I query the external table in the external schema, I get the error posted above. I do not encounter problems with external tables from the Data Catalog when they are loaded as CSV, but the files I need to read in Redshift Spectrum have to be in JSON.
Is it possible to view the external table in Redshift Spectrum in the same format as when it is loaded using a job?
beni,
The errors thrown by Redshift Spectrum are not always accurate. I can only confirm that querying JSON should work similarly to other data formats. By the way, the external table needs to be corrected through a SQL client within the Spectrum database.
So I suggest referring to this and this to review your steps.
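For comparison, a minimal sketch of the Spectrum setup I would expect to work (the schema name, Glue database name, table name, and IAM role ARN below are all placeholders, not values from your environment):

-- Map the Glue Data Catalog database into Redshift as an external schema.
create external schema spectrum_json
from data catalog
database 'my_glue_database'
iam_role 'arn:aws:iam::123456789012:role/MySpectrumRole';

-- Scalar fields of the JSON can be queried directly; nested fields (such as the
-- "object" column) use dot notation when the table is defined with struct types.
select t."string", t."number", t."boolean"
from spectrum_json.my_json_table t;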
I have the following situation:
A PostgreSQL database with a table that contains a date type column called date.
A string from a delimited .txt file outputting: 20170101.
I want to insert the string into the date type column.
So far I have tried the following, with mixed results/errors:
row1.YYYYMMDD
Detail Message: Type mismatch: cannot convert from String to Date
Explanation: This one is fairly obvious.
TalendDate.parseDate("yyyyMMdd",row1.YYYYMMDD)
Batch entry 0 INSERT INTO "data" ("location_id","date","avg_winddirection","avg_windspeed","avg_temperature","min_temperature","max_temperature","total_hours_sun","avg_precipitation") VALUES (209,2017-01-01 00:00:00.000000 +01:00:00,207,7.7,NULL,NULL,NULL,NULL,NULL) was aborted. Call getNextException to see the cause.
I can see the string parsed into "2017-01-01 00:00:00.000000 +01:00:00".
When I try to execute the query directly I get a "SQL Error: 42601: ERROR: Syntax error at "00" position 194".
Other observations/attempts:
The funny thing is that if I use '20170101' as a string in the query, it works; see below.
INSERT INTO "data" ("location_id","date","avg_winddirection","avg_windspeed","avg_temperature","min_temperature","max_temperature","total_hours_sun","avg_precipitation") VALUES (209,'20170101',207,7.7,NULL,NULL,NULL,NULL,NULL)
I've also tried to change the schema of the database date column to string. It produces the following:
Batch entry 0 INSERT INTO "data" ("location_id","date","avg_winddirection","avg_windspeed","avg_temperature","min_temperature","max_temperature","total_hours_sun","avg_precipitation") VALUES (209,20170101,207,7.7,NULL,NULL,NULL,NULL,NULL) was aborted. Call getNextException to see the cause.
This query also doesn't work directly because the date isn't between single quotes.
What am I missing or not doing?
(I've started learning to use Talend 2-3 days ago)
EDIT//
Screenshots of my Job and tMap
http://imgur.com/a/kSFd0
EDIT// It doesn't appear to be a date formatting problem but a Talend-to-PostgreSQL connection problem.
EDIT//
FIXED: It was a stupidly easy problem/solution, of course. The database name and schema name fields were empty... so it basically didn't know where to connect.
You don't have to do anything to insert a string like 20170101 into a date column. PostgreSQL will handle it for you; it's just the ISO 8601 date format.
CREATE TABLE foo ( x date );
INSERT INTO foo (x) VALUES ( '20170101' );
This is just a Talend problem, if anything.
[..] (209,2017-01-01 00:00:00.000000 +01:00:00,207,7.7,NULL,NULL,NULL,NULL,NULL)[..]
If Talend doesn't know by itself that passing a timestamp into the query requires it to be single-quoted, then, if possible, you need to do it yourself, as in the sketch below.
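For example, the statement from your log should be accepted once the value is quoted (same columns and values as in your log; since the column is of type date, only the date part is kept):

INSERT INTO "data" ("location_id","date","avg_winddirection","avg_windspeed","avg_temperature","min_temperature","max_temperature","total_hours_sun","avg_precipitation")
VALUES (209, '2017-01-01 00:00:00+01', 207, 7.7, NULL, NULL, NULL, NULL, NULL);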
FIXED: It was a stupidly easy problem/solution, of course. The database name and schema name fields were empty, so it basically didn't know where to connect. That's why I got the Batch entry 0 error, and when I went deeper while debugging I found it couldn't find the table, stating that the relation didn't exist.
Try it like this.
The data in the input file is 20170101 (in String format).
Then set the tMap like this:
The output is as follows:
I am trying to insert data into a temp table by joining two other tables, but for some reason I keep getting the error "String or binary data would be truncated".
While debugging, I realized that no rows are being inserted into the table, yet it still throws the error.
To get rid of this, I finally used SET ANSI_WARNINGS OFF inside the stored procedure and it worked fine. Now the issue is that I cannot recompile the stored procedure with this setting in the production database, and I want the issue fixed properly. The other thing that is more irritating is that, by default, ANSI_WARNINGS is actually OFF for the database.
Please let me know what the possible solution could be. It would be of great help.
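For what it's worth, a hedged sketch of the kind of check that usually narrows this down (all table and column names here are hypothetical): compare the longest value the join actually produces with the length the temp table column was declared with.

-- Longest value the source join would actually insert (hypothetical names).
SELECT MAX(LEN(a.some_text_column)) AS longest_source_value
FROM dbo.source_table_a AS a
JOIN dbo.source_table_b AS b ON b.id = a.id;

-- Declared lengths of the temp table's columns (temp tables live in tempdb).
SELECT c.name, c.max_length
FROM tempdb.sys.columns AS c
WHERE c.object_id = OBJECT_ID('tempdb..#my_temp_table');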