Amazon Redshift COPY using JSON having trouble

I have created a simple table called test3:
create table if not exists test3(
Studies varchar(300) not null,
Series varchar(500) not null
);
I have some JSON data:
{
    "Studies": [{
        "studyinstanceuid": "2.16.840.1.114151",
        "studydescription": "Some study",
        "studydatetime": "2014-10-03 08:36:00"
    }],
    "Series": [{
        "SeriesKey": "abc",
        "SeriesInstanceUid": "xyz",
        "studyinstanceuid": "2.16.840.1.114151",
        "SeriesDateTime": "2014-10-03 09:05:09"
    }, {
        "SeriesKey": "efg",
        "SeriesInstanceUid": "stw",
        "studyinstanceuid": "2.16.840.1.114151",
        "SeriesDateTime": "0001-01-01 00:00:00"
    }],
    "ExamKey": "exam-key",
}
and here is my jsonpaths file:
{
    "jsonpaths": [
        "$['Studies']",
        "$['Series']"
    ]
}
Both the JSON data and the jsonpaths file are uploaded to S3.
I try to execute the following COPY command in the Redshift console:
copy test3
from 's3://mybucket/redshift_demo/input.json'
credentials 'aws_access_key_id=my_key;aws_secret_access_key=my_access'
json 's3://mybucket/redshift_demo/json_path.json'
I get the following error. Can anyone please help? I have been stuck on this for some time now.
[Amazon](500310) Invalid operation: Number of jsonpaths and the number of columns should match. JSONPath size: 1, Number of columns in table or column list: 2
Details:
-----------------------------------------------
error: Number of jsonpaths and the number of columns should match. JSONPath size: 1, Number of columns in table or column list: 2
code: 8001
context:
query: 1125432
location: s3_utility.cpp:670
process: padbmaster [pid=83747]
-----------------------------------------------;
1 statement failed.
Execution time: 1.58s

Redshift's error is misleading. The real issue is that your input file is incorrectly formatted: you have an extra comma after the last JSON entry, which makes the file invalid JSON.
The COPY succeeds if you change "ExamKey": "exam-key", to "ExamKey": "exam-key"
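One way to catch this kind of problem before running COPY is to validate the file locally. A minimal sketch in Python (assuming the data file is saved locally as input.json):

import json

# A trailing comma makes the file invalid JSON, so json.load()
# raises a JSONDecodeError pointing at the offending position.
with open("input.json") as f:
    try:
        json.load(f)
        print("input.json is valid JSON")
    except json.JSONDecodeError as e:
        print("Invalid JSON:", e)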

Related

InvalidInputException: AWS Personalize error importing boolean fields in user or item metadata

I'm building a recommender system using AWS Personalize. The user-personalization recipe has three dataset inputs: interactions, user_metadata, and item_metadata. I am having trouble importing the user metadata, which contains a boolean field.
I created the following schema:
user_schema = {
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "type",
            "type": [
                "null",
                "string"
            ],
            "categorical": True
        },
        {
            "name": "lang",
            "type": [
                "null",
                "string"
            ],
            "categorical": True
        },
        {
            "name": "is_active",
            "type": "boolean"
        }
    ],
    "version": "1.0"
}
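For reference, a schema dict like this would typically be registered with Personalize via boto3 roughly as follows (a sketch; the schema name and region are placeholders, and user_schema is the dict defined above):

import json
import boto3

personalize = boto3.client("personalize", region_name="us-east-1")

# Register the Avro schema with Personalize; json.dumps turns the
# Python dict (including True -> true) into the JSON string the API expects.
response = personalize.create_schema(
    name="users-schema",
    schema=json.dumps(user_schema)
)
print(response["schemaArn"])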
The dataset CSV file content looks like:
USER_ID,type,lang,is_active
1234#gmail.com ,,geo,True
01027061015#mail.ru ,facebook,eng,True
03dadahda#gmail.com ,facebook,geo,True
040168fadw#gmail.com ,facebook,geo,False
I uploaded the given CSV file to an S3 bucket.
When I try to create a dataset import job, it gives me the following exception:
InvalidInputException: An error occurred (InvalidInputException) when calling the CreateDatasetImportJob operation: Input csv has rows that do not conform to the dataset schema. Please ensure all required data fields are present and that they are of the type specified in the schema.
I tested it, and it works without the boolean field is_active. There are no NaN values in the given column!
It would be nice to have the ability to directly test whether your pandas dataframe or CSV file conforms to a given schema, and possibly to get a more detailed error message.
Does anybody know how to format the boolean field to fix this issue?
I found a solution after many trials. I checked the AWS Personalize documentation (https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html#dataset-requirements), which says: boolean (values true and false must be lower case in your data).
I then tried several things, and one of them worked, but it was still a hard way to find a solution and cost hours.
Solution:
Convert the column in the pandas DataFrame into string (object) format.
Lowercase the True and False string values to get true and false.
Store the pandas DataFrame as a CSV file.
This results in lowercase values of true and false (a short pandas sketch follows the sample below):
USER_ID,type,lang,is_active
1234#gmail.com ,,geo,true
01027061015#mail.ru ,facebook,eng,true
03dadahda#gmail.com ,facebook,geo,true
040168fadw#gmail.com ,facebook,geo,false
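A minimal sketch of that conversion (the file names here are placeholders):

import pandas as pd

df = pd.read_csv("users.csv")

# Cast the boolean column to string and lowercase it so that
# True/False become the 'true'/'false' values Personalize expects.
df["is_active"] = df["is_active"].astype(str).str.lower()

df.to_csv("users_fixed.csv", index=False)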
That's all! There is no need to change the "boolean" type in the schema to "string"!
Hopefully they'll solve this issue soon, since I contacted AWS technical support about it.

Retain double quotes while importing a CSV file in pgAdmin 4

When I tried to create an import job in pgAdmin 4 to copy the CSV file into the table, all the double quotes got stripped. I am sure it has something to do with the Quote and Escape settings, but I don't understand how; please help.
My CSV file looks like the data shown further below.
After the import job, the data inserted into the 'description' column has no double quotes.
Here are the import job settings from pgAdmin:
(screenshot of the pgAdmin import dialog)
The data in the Excel file looks like:
{
"push": {
"title": "Recent Detected",
"body": " were above target",
"expandedBody" : " more details about this pattern"
}
}
This is single-column data, and it is a string column. I want to import it as it is. But if I do the import, the data that I get in the table is:
{
push: {
title: Recent Detected,
body: were above target,
expandedBody : more details about this pattern
}
}
| Id | description |
| -- | ----------- |
| 1 | { "push": { "title": "Recent Detected", "body": " were above target", "expandedBody" : " more details about this pattern" } } |
Table Structure :
Create table abcd ( id integer, description character varying );
What I am trying to do:
Use the Import function in pgAdmin 4 to populate the table with the data from the above CSV file.
What is going on:
The data gets inserted without the double quotes.
What is expected:
The data should get inserted with the quotes, as if the quotes are part of the string.
What is going wrong:
The import settings; something I am missing or not understanding.
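For what it's worth, double quotes only survive a CSV import if they are escaped inside the file itself: the field must be wrapped in quotes and every embedded quote doubled, which is what the Quote and Escape settings in the import dialog refer to. A small sketch with Python's csv module (file name and sample data are made up) shows what such a row should look like on disk:

import csv

description = '{ "push": { "title": "Recent Detected" } }'

# csv.writer wraps the field in quotes and doubles the embedded
# double quotes, so a CSV-aware loader keeps them in the value.
with open("data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Id", "description"])
    writer.writerow([1, description])

print(open("data.csv").read())
# Id,description
# 1,"{ ""push"": { ""title"": ""Recent Detected"" } }"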

SSAS Tabular Add Column via TMSL

Good Morning,
Objective: I am trying to add new columns to an SSAS Tabular Model table, with the long-term aim of programmatically making large batch changes when needed.
Resources I've found:
https://learn.microsoft.com/en-us/sql/analysis-services/tabular-models-scripting-language-commands/create-command-tmsl
This gives the template I've been following, but it does not seem to work.
What I have tried so far:
{
    "create": {
        "parentObject": {
            "database": "TabularModel_1_dev",
            "table": "TableABC"
        },
        "columns": [
            {
                "name": "New Column",
                "dataType": "string",
                "sourceColumn": "Column from SQL Source"
            }
        ]
    }
}
This first one is the most faithful to the example, but it returns the following error:
"The JSON DDL request failed with the following error: Unrecognized JSON property: columns. Check path 'create.columns', line 7, position 15.."
Attempt Two:
{
    "create": {
        "parentObject": {
            "database": "TabularModel_1_dev",
            "table": "TableABC"
        },
        "table": {
            "name": "Item Details by Branch",
            "columns": [
                {
                    "name": "New Column",
                    "dataType": "string",
                    "sourceColumn": "New Column"
                }
            ]
        }
    }
}
Adding the table within the child list returns an error too:
"...Cannot execute the Create command: the specified parent object cannot have a child object of type Table.."
Omitting the table within the parentObject is unsuccessful as well.
I know it's been three years since the post, but I too was attempting the same thing and stumbled across this post in my quest. I ended up reaching out to Microsoft and was told that the Add Column example they gave in their documentation was a "doc bug". In fact, you can't add just a column; you have to feed it an entire table definition via createOrReplace.
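For anyone landing here later, that means the TMSL request has to carry the full table definition (all existing columns plus the new one, and the partitions), not just the new column. A rough sketch of the shape, with placeholder column and partition details rather than the asker's real model:

{
    "createOrReplace": {
        "object": {
            "database": "TabularModel_1_dev",
            "table": "TableABC"
        },
        "table": {
            "name": "TableABC",
            "columns": [
                {
                    "name": "Existing Column",
                    "dataType": "string",
                    "sourceColumn": "Existing Column"
                },
                {
                    "name": "New Column",
                    "dataType": "string",
                    "sourceColumn": "Column from SQL Source"
                }
            ],
            "partitions": [
                {
                    "name": "Partition 1",
                    "source": {
                        "query": "SELECT * FROM [dbo].[TableABC]",
                        "dataSource": "MyDataSource"
                    }
                }
            ]
        }
    }
}

Note that createOrReplace drops anything not included in the definition, so measures and other table properties need to be restated as well.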

ADF cannot parse DateTimeOffset

We have JSONs that contain timestamps in the format:
2016-11-03T03:05:21.673Z
2016-11-03T03:05:21.63Z
So the appropriate format to parse the data is yyyy-MM-ddTHH:mm:ss.FFF\Z
I tried all of these variants to explain to ADF how to parse it:
"structure": [
{
"name": "data_event_time",
"type": "DateTime",
"format": "yyyy-MM-ddTHH:mm:ss.FFF\\Z"
},
...
]
"structure": [
{
"name": "data_event_time",
"type": "DateTimeOffset",
"format": "yyyy-MM-ddTHH:mm:ss.FFFZ"
},
...
]
"structure": [
{
"name": "data_event_time",
"type": "DateTimeOffset"
},
...
]
"structure": [
{
"name": "data_event_time",
"type": "DateTime"
},
...
]
In all of the cases above, ADF fails with the error:
Copy activity encountered a user error at Sink side: ErrorCode=UserErrorInvalidDataValue,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column 'data_event_time' contains an invalid value '2016-11-13T00:44:50.573Z'. Cannot convert '2016-11-13T00:44:50.573Z' to type 'DateTimeOffset' with format 'yyyy-MM-dd HH:mm:ss.fffffff zzz'.,Source=Microsoft.DataTransfer.Common,''Type=System.FormatException,Message=String was not recognized as a valid DateTime.,Source=mscorlib,'.
What am I doing wrong? How can I fix it?
The previous issue has been fixed. Thanks, wBob.
But now I have a new issue, at the sink level.
I'm trying to load data from Azure Blob Storage to Azure DWH via ADF + PolyBase:
"sink": {
"type": "SqlDWSink",
"sqlWriterCleanupScript": "$$Text.Format('DELETE FROM [stage].[events] WHERE data_event_time >= \\'{0:yyyy-MM-dd HH:mm}\\' AND data_event_time < \\'{1:yyyy-MM-dd HH:mm}\\'', WindowStart, WindowEnd)",
"writeBatchSize": 6000000,
"writeBatchTimeout": "00:15:00",
"allowPolyBase": true,
"polyBaseSettings": {
"rejectType": "percentage",
"rejectValue": 10.0,
"rejectSampleValue": 100,
"useTypeDefault": true
}
},
"enableStaging": true,
"stagingSettings": {
"linkedServiceName": "AppInsight-Stage-BlobStorage-LinkedService"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "..."
}
But the process fails with error:
Database operation failed. Error message from database execution : ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error happened when loading data into SQL Data Warehouse.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=107091;Query aborted-- the maximum reject threshold (10 %) was reached while reading from an external source: 6602 rows rejected out of total 6602 rows processed. Rows were rejected while reading from external source(s). 52168 rows rejected from external table [ADFCopyGeneratedExternalTable_0530887f-f870-4624-af46-249a39472bf3] in plan step 2 of query execution: Location: '/13/2cd1d10f-4f62-4983-a38d-685fc25c40a2_20161102_135850.blob' Column ordinal: 0, Expected data type: DATETIMEOFFSET(7) NOT NULL, Offending value: 2016-11-02T13:56:19.317Z (Column Conversion Error), Error: Conversion failed when converting the NVARCHAR value '2016-11-02T13:56:19.317Z' to data type DATETIMEOFFSET. Location: '/13/2cd1d10f-4f62-4983-a38d-685fc25c40a2_20161102_135850.blob' Column ordinal: 0, Expected ...
I read the Azure SQL Data Warehouse loading patterns and strategies article, which says:
If the DATE_FORMAT argument isn’t designated, the following default formats are used:
DateTime: ‘yyyy-MM-dd HH:mm:ss’
SmallDateTime: ‘yyyy-MM-dd HH:mm’
Date: ‘yyyy-MM-dd’
DateTime2: ‘yyyy-MM-dd HH:mm:ss’
DateTimeOffset: ‘yyyy-MM-dd HH:mm:ss’
Time: ‘HH:mm:ss’
It looks like I have no way at the ADF level to specify the datetime format for PolyBase.
Does anyone know of a workaround?
We looked at a similar issue recently here:
What's reformatting my input data before I get to it?
JSON does not have a DateTime format as such, so leave the type and format elements out. Then your challenge is with the sink. Inserting these values into an Azure SQL Database, for example, should work.
"structure": [
{
"name": "data_event_time"
},
...
Looking at your error message, I would expect that to work when inserting into a DATETIME column in SQL Data Warehouse (or SQL Database, or SQL Server on a VM), because it is ordinary DATETIME data, not DATETIMEOFFSET.
If you have issues inserting into the target sink, you may have to work around it by not using the PolyBase checkbox and coding that side of the process yourself, e.g. (sketched below):
Copy raw files to blob storage or Azure Data Lake (PolyBase now supports ADLS)
Create external tables over the files, with the datetime data set as a varchar data type
CTAS the data into an internal table, also converting the string datetime format to a proper DATETIME using T-SQL
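A rough T-SQL sketch of steps 2 and 3 (the external data source, file format, schema and column names here are placeholders, not taken from the question):

-- 2. External table over the raw files; keep the ISO 8601 timestamp as varchar
CREATE EXTERNAL TABLE stage.events_ext
(
    data_event_time varchar(30),
    payload         nvarchar(4000)
)
WITH
(
    LOCATION    = '/events/',
    DATA_SOURCE = MyBlobDataSource,   -- assumed to already exist
    FILE_FORMAT = MyTextFileFormat    -- assumed to already exist
);

-- 3. CTAS into an internal table, converting the string to a datetime type
--    (style 127 = ISO 8601 with the trailing Z)
CREATE TABLE stage.events
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT
    CONVERT(datetime2, data_event_time, 127) AS data_event_time,
    payload
FROM stage.events_ext;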

How to insert a LinkList into OrientDB using a batch command?

I am trying to insert some values into a table using the batch URL in OrientDB.
This is the JSON string:
{"transaction":true,"operations":[{"type":"script","language":"javascript","script":"var result2= db.command(\"insert into ChatConversation set ChatID = 19,Messages = 28:119\")"}]}
But this gives the following error:
{
    "errors": [{
        "code": 500,
        "reason": 500,
        "content": "com.orientechnologies.orient.core.exception.OValidationException: The field 'ChatConversation.Messages' has been declared as LINKLIST but an incompatible type is used. Value: [#28:119]"
    }]
}
Please advise me on how to solve this.
Thanks in advance.
To insert data into a LINKLIST, use this syntax:
insert into ChatConversation(Messages) values ([#28:119,#ANOTHER_RID,#THIRD_RID])
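If it helps, the same statement can then go back into the batch request as a plain SQL command operation instead of a JavaScript script (a sketch reusing the RID from the question):

{"transaction": true, "operations": [{"type": "cmd", "language": "sql", "command": "insert into ChatConversation (ChatID, Messages) values (19, [#28:119])"}]}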