Redshift COPY using JSONPath for missing array/fields - amazon-redshift

I am using the COPY command to load a JSON dataset from S3 into a Redshift table. The data loads partially, but records with missing data (a key/value pair or an array) are ignored, i.e. from the example below only the first record will be loaded.
Query:
COPY address from 's3://mybucket/address.json'
credentials 'aws_access_key_id=XXXXXXX;aws_secret_access_key=XXXXXXX'
maxerror as 250
json 's3://mybucket/address_jsonpath.json';
My question is: how can I load all the records from address.json even when some records are missing keys/data, as in the sample data set below?
Sample JSON:
{
  "name": "Sam P",
  "addresses": [
    {
      "zip": "12345",
      "city": "Silver Spring",
      "street_address": "2960 Silver Ave",
      "state": "MD"
    },
    {
      "zip": "99999",
      "city": "Curry",
      "street_address": "2960 Silver Ave",
      "state": "PA"
    }
  ]
}
{
  "name": "Sam Q",
  "addresses": []
}
{
  "name": "Sam R"
}
Is there an alternative to FILLRECORD for JSON datasets?
I am looking for an implementation or a workaround that can load all three of the above records into the Redshift table.

There is no FILLRECORD equivalent for COPY from JSON. It is explicitly not supported in the documentation.
But you have a more fundamental issue - the first record contains an array of multiple addresses. Redshift's COPY from JSON does not allow you to create multiple rows from nested arrays.
The simplest way to resolve this is to define the files to be loaded as an external table and use Redshift Spectrum's nested data syntax to expand the embedded array into full rows. Then use an INSERT INTO ... SELECT to load the data into a final table.
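Note: this assumes an external schema named spectrum already exists. If it doesn't, a minimal sketch for creating one (the catalog database name and IAM role ARN below are placeholders):
-- Hypothetical external schema backed by the AWS Glue Data Catalog
CREATE EXTERNAL SCHEMA spectrum
FROM DATA CATALOG
DATABASE 'spectrumdb'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;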
DROP TABLE IF EXISTS spectrum.partial_json;

CREATE EXTERNAL TABLE spectrum.partial_json (
    name      VARCHAR(100),
    addresses ARRAY<STRUCT<zip:INTEGER,
                           city:VARCHAR(100),
                           street_address:VARCHAR(255),
                           state:VARCHAR(2)>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-test-files/partial_json/'
;
INSERT INTO final_table
SELECT ext.name
     , address.zip
     , address.city
     , address.street_address
     , address.state
FROM spectrum.partial_json ext
LEFT JOIN ext.addresses address ON true
;
-- name | zip | city | street_address | state
-- -------+-------+---------------+-----------------+-------
-- Sam P | 12345 | Silver Spring | 2960 Silver Ave | MD
-- Sam P | 99999 | Curry | 2960 Silver Ave | PA
-- Sam Q | | | |
-- Sam R | | | |
NB: I tweaked your example JSON a little to make this simpler. For instance, you had un-keyed objects as the values for name, which I changed into plain string values.
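For completeness: final_table is not defined above. A minimal definition matching the selected columns might look like this (the column sizes are assumptions):
-- Hypothetical target table matching the SELECT column list above
CREATE TABLE final_table (
    name           VARCHAR(100),
    zip            INTEGER,
    city           VARCHAR(100),
    street_address VARCHAR(255),
    state          VARCHAR(2)
);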

how about...
{
  "name": "Sam R",
  "address": ""
}

Related

Handle "excluded" updates in ksqlDB

I've created a stream and a table in this way:
CREATE STREAM user_stream (id VARCHAR, name VARCHAR, age INT)
  WITH (kafka_topic='user_topic', value_format='json', partitions=1);

CREATE TABLE user_table AS
  SELECT id,
         LATEST_BY_OFFSET(name) AS name,
         LATEST_BY_OFFSET(age) AS age
  FROM user_stream
  GROUP BY id
  EMIT CHANGES;
And submit some event to the user_topic as:
{ "id": "user_1", "name": "Sherie Shine", "age": 31 }
{ "id": "user_2", "name": "Liv Denman", "age": 52 }
{ "id": "user_3", "name": "Frona Ness", "age": 44 }
Then query the table as:
SELECT * FROM user_table WHERE age > 40 EMIT CHANGES;
We'll get two rows:
+------------+----------------+-------+
|ID          |NAME            |AGE    |
+------------+----------------+-------+
|user_2      |Liv Denman      |52     |
|user_3      |Frona Ness      |44     |
Post another message to the user_topic:
{ "id": "user_3", "age": 35 }
I expected user_3 to be removed from the running query's output, but I received nothing.
If I interrupt the current query with Ctrl+C and issue the same query again, I'll see only user_2 as it's the only one with age > 40 now.
How can we handle the update to remove a row from the filter?
The issue is gone after upgrading from Confluent 6.1.1 to Confluent 6.2.0.

Viewing a postgresql json table in adminer as a (sortable) table

I have a jsonb field in a database table, and I'd like to be able to save it as a view in Adminer so that I can quickly search and sort the fields as with a standard database table.
I wonder if the json-column adminer plugin could help, but I can't work out how to use it.
I'm using the adminer docker image which I believe has plugins already built in.
If I have a database like this (somebody kindly put together this fiddle for a previous question)
CREATE TABLE prices( sub_table JSONB );
INSERT INTO prices VALUES
('{ "0": {"Name": "CompX", "Price": 10, "index": 1, "Date": "2020-01-09T00:00:00.000Z"},
"1": {"Name": "CompY", "Price": 20, "index": 1, "Date": "2020-01-09T00:00:00.000Z"},
"2": {"Name": "CompX", "Price": 19, "index": 2, "Date": "2020-01-10T00:00:00.000Z"}
}');
and to view a subset of the table
SELECT j.value
FROM prices p
CROSS JOIN jsonb_each(sub_table) AS j(key, value)
WHERE (j.value -> 'Name')::text = '"CompX"'
I'd like to see the following table in Adminer
| Date                      | Name  | Price | index |
| ------------------------- | ----- | ----- | ----- |
| 2020-01-09T00:00:00.000Z  | CompX | 10    | 1     |
| 2020-01-10T00:00:00.000Z  | CompX | 19    | 2     |
as opposed to:
| value |
| ------------------------------------------------------------ |
| {"Date": "2020-01-09T00:00:00.000Z", "Name": "CompX", "Price": 10, "index": 1} |
| {"Date": "2020-01-09T00:00:00.000Z", "Name": "CompX", "Price": 10, "index": 1} |
EDIT - building on a_horse_with_no_name's answer: the following saves a view with the appropriate columns, which can then be searched/sorted in Adminer in the same way as a standard table.
CREATE VIEW PriceSummary AS
select r.*
from prices p
cross join lateral jsonb_each(p.sub_table) as x(key, value)
cross join lateral jsonb_to_record(x.value) as r("Name" text, "Price" int, index int, "Date" date)
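Once saved, the view can be queried and sorted like an ordinary table in Adminer, e.g.:
SELECT * FROM PriceSummary WHERE "Name" = 'CompX' ORDER BY "Date";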
There is no automatic conversion available, but you could convert the JSON value to a record, to make the display easier:
select r.*
from prices p
cross join lateral jsonb_each(p.sub_table) as x(key, value)
cross join lateral jsonb_to_record(x.value) as r("Name" text, "Price" int, index int, "Date" date)
where r."Name" = 'CompX'
Online example
The json-column adminer plugin will do the job to some extent, in the sense that it displays the JSON values better.
Here is a minimal docker-compose.yml that creates a Postgres container and links an Adminer instance to it. To have plugins installed in the adminer container, use the environment variable ADMINER_PLUGINS as shown below:
version: '3'
services:
  db:
    image: postgres
    restart: always
    environment:
      POSTGRES_PASSWORD: example
  adminer:
    image: adminer
    restart: always
    environment:
      - ADMINER_PLUGINS=json-column tables-filter tinymce
    ports:
      - 9999:8080
Access the adminer web UI at localhost:9999. Use username: postgres, password: example to connect to the postgres database.
When you edit a table that contains JSON columns, the JSON values are displayed in a formatted, easier-to-read view.

Store and update jsonb value in Postgres

I have a table such as:
ID | Details
1 | {"name": "my_name", "phone": "1234", "address": "my address"}
2 | {"name": "his_name", "phone": "4321", "address": "his address"}
Here, Details is a jsonb object. I want to add another field named 'tags' to the jsonb value, which should contain some particular keys, in this case "name" and "phone". The final state after execution of the query should be:
ID | Details
1 | {"tags": {"name": "my_name", "phone": "1234"},"name": "my_name", "phone": "1234", "address":"my address"}
2 | {"tags": {"name": "his_name", "phone": "4321"},"name": "his_name", "phone": "4321", "address":"his address"}
I can think of the following steps to get this done:
Loop over each row and extract the details["name"] and details["phone"] in variables.
Add these variables to the jsonb.
I can't think of how the corresponding Postgres query should be written. Please guide.
Use jsonb_build_object:
update t
set details = jsonb_build_object(
        'tags',
        jsonb_build_object('name', details->>'name', 'phone', details->>'phone')
    ) || details;
DEMO
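One thing to watch: with jsonb, the || operator keeps the right-hand value when both sides share a top-level key, so this form preserves any existing keys in details. A quick check:
select '{"a": 1}'::jsonb || '{"a": 2}'::jsonb;  -- returns {"a": 2}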
Use the concatenation operator, of course!
https://www.postgresql.org/docs/current/functions-json.html
update t1 set details = details || '{"tags": {"name": "my_name"}}' where id = 1
You can extract the keys you are interested in, build a new json value and append that to the column:
update the_table
set details = details || jsonb_build_object('tags',
jsonb_build_object('name', details -> 'name',
'phone', details -> 'phone'));
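To sanity-check the result (a quick sketch, reusing the the_table name from above):
-- jsonb_pretty makes the nested tags object easy to eyeball
select id, jsonb_pretty(details) from the_table order by id;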

Change Json text to json array

Currently the data in my table looks like this:
Field name: author
Field data: in JSON form
When we run the select query
SELECT bs.author FROM books bs;
it returns data like this:
"[{\"author_id\": 1, \"author_name\": \"It happend once again\", \"author_slug\": \"fiction-books\"}]"
But I need the selected data to look like this:
[
  {
    "author_id": 1,
    "author_name": "It happend once again",
    "author_slug": "fiction-books"
  }
]
Database: PostgreSQL
Note: please avoid PHP code or iteration in PHP.
The answer depends on the version of PostgreSQL you are using, and also on which client you are using, but PostgreSQL has lots of built-in JSON processing functions:
https://www.postgresql.org/docs/10/functions-json.html
Your goal is also not clearly defined. If all you want to do is pretty-print the JSON, that is included:
# select jsonb_pretty('[{"author_id": 1,"author_name":"It happend once again","author_slug":"fiction-books"}]') as json;
json
-------------------------------------------------
[ +
{ +
"author_id": 1, +
"author_name": "It happend once again",+
"author_slug": "fiction-books" +
} +
]
If instead you're looking for how to populate a Postgres record set from JSON, that is also included:
# select * from json_to_recordset('[{"author_id": 1,"author_name":"It happend once again","author_slug":"fiction-books"}]')
as x(author_id text, author_name text, author_slug text);
author_id | author_name | author_slug
-----------+-----------------------+---------------
1 | It happend once again | fiction-books
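If the author column actually stores the array as an escaped JSON string (which the backslashes in your output suggest), you can unwrap it to real jsonb first. A sketch, assuming author is a json/jsonb column whose top-level value is a string:
-- #>> '{}' extracts the top-level value as unescaped text,
-- which the cast then parses as real jsonb
select jsonb_pretty((bs.author #>> '{}')::jsonb) from books bs;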

Updating PostgreSQL JSONB key value by adding 1 to existing key value

I'm getting an error while updating JSON data.
CREATE TABLE testTable
AS
SELECT $${
"id": 1,
"value": 100
}$$::jsonb AS jsondata;
I want to update value to 101 by adding 1. After visiting many websites, I found this statement:
UPDATE testTable
SET jsondata = JSONB_SET(jsondata, '{value}', (jsondata->>'value')::int + 1);
but the above gives the error "cannot convert jsonb to int",
and my expected output is
{
"id": 1,
"value": 101
}
Look at the signature of jsonb_set (using \df jsonb_set)
Schema | Name | Result data type | Argument data types | Type
------------+-----------+------------------+----------------------------------------------------------------------------------------+--------
pg_catalog | jsonb_set | jsonb | jsonb_in jsonb, path text[], replacement jsonb, create_if_missing boolean DEFAULT true | normal
What you want is this:
UPDATE testTable
SET jsondata = jsonb_set(
jsondata,
ARRAY['value'],
to_jsonb((jsondata->>'value')::int + 1)
)
;
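To verify the result:
SELECT jsonb_pretty(jsondata) FROM testTable;
-- {
--     "id": 1,
--     "value": 101
-- }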