Find and remove non-printing characters from data

Find and remove non-printing characters from data - unicode

I was expecting both the records to return in case of following query. But I got back only second record.
WITH students_results(student_id, result) AS (VALUES
(1,  '️सिनेमा'  ),
(2, 'सिनेमा'))
SELECT
    student_id,
    result
FROM students_results
where result = 'सिनेमा'
The first record has a non printing unicode at the beginning \ufe0f
How do I find and remove such characters from data?

This is how you find such data.
WITH students_results(student_id, result) AS (VALUES
(1, '️सिनेमा'),
(2, 'सिनेमा')
)
SELECT student_id
FROM students_results
WHERE char2hexint(result) like '%FE0F%'
As for removing it, this is not quite what Athena is for. As you will probably be aware, it's an analytics, not an ETL service. I would advise to cleanse your data using AWS Glue, or even locally if it's applicable to your use case.

Related

With PostgREST, convert a column to and from an external encoding in the API

We are using PostgREST to automatically generate a REST API for a Postgres database. Our primary keys have an external representation that's different from how we store them internally. For simplicity's sake lets pretend the ids are stored as integers but we represent them as hexadecimal strings outwardly.
It's simple enough to get PostgREST to convert to the external representation for read operations:
CREATE DOMAIN hexid AS bigint;
CREATE TABLE fruits (
fruit_id hexid PRIMARY KEY,
name text
);
CREATE OR REPLACE VIEW api_fruits AS
SELECT to_hex(fruit_id) as fruit_id, name FROM fruits;
INSERT INTO fruits(fruit_id, name) VALUES('51955', 'avocado');
PostgREST generates the expected representation when we GET api_fruits:
[
{
"fruit_id": "caf3",
"name": "avocado"
}
]
But that's about as far as we get with this solution. It's a one way transformation so we won't be able to POST/PATCH records this way. The way PostgREST works is to transform such requests into equivalent INSERT and UPDATE statements. But this view with its custom formatting is not updatable. This is what would happen if we tried:
ERROR: cannot insert into column "fruit_id" of view "api_fruits"
DETAIL: View columns that are not columns of their base relation are not updatable.
STATEMENT: WITH pgrst_source AS (WITH pgrst_payload AS (SELECT $1::json AS json_data), pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload) INSERT INTO "api_x"."api_fruits"("fruit_id", "name") SELECT "fruit_id", "name" FROM json_populate_recordset (null::"api_x"."api_fruits", (SELECT val FROM pgrst_body)) _ RETURNING "api_x"."api_fruits".*) SELECT '' AS total_result_set, pg_catalog.count(_postgrest_t) AS page_total, CASE WHEN pg_catalog.count(_postgrest_t) = 1 THEN coalesce((
WITH data AS (SELECT row_to_json(_) AS row FROM pgrst_source AS _ LIMIT 1)
SELECT array_agg(json_data.key || '=eq.' || json_data.value)
FROM data CROSS JOIN json_each_text(data.row) AS json_data
WHERE json_data.key IN ('')
), array[]::text[]) ELSE array[]::text[] END AS header, '' AS body, nullif(current_setting('response.headers', true), '') AS response_headers, nullif(current_setting('response.status', true), '') AS response_status FROM (SELECT * FROM pgrst_source) _postgrest_t
We can't INSERT into "View columns that are not columns of their base relation".
The obvious workaround is to serve fruit_id as a straight column, just an integer. With some post and preprocessing at the nginx level we can hex encode it there (and hex decode incoming ids). I'm wondering if we can do better than that though. For large API operations, re-encoding the JSON will use a lot of memory and CPU time and it seems so unnecessary.
It would have been great to be able to use a custom CREATE CAST to take the incoming hexadecimal strings and turn them back into integers, something like this:
CREATE CAST (json AS hexid) WITH FUNCTION json_to_hexid AS ASSIGNMENT;
But alas custom casts are ignored on CREATE DOMAIN types. And we can't make a true custom column type because our cloud Postgres host (Google Cloud SQL) doesn't allow custom extensions.
It feels like some combination of INSTEAD OF triggers or rules could work. But when using query parameters to filter results using query parameters (e.g. select a fruit by id), I don't think there's an appropriate trigger to use. INSTEAD OF doesn't work for straight SELECT does it?
For example I've tested doing something like this to take care of INSERT and allow POST with PostgREST. It works:
CREATE OR REPLACE FUNCTION api_fruits_insert()
RETURNS trigger AS
$$
BEGIN
INSERT INTO fruits(fruit_id, name) VALUES (('x' || lpad(NEW.fruit_id, 16, '0'))::bit(64)::bigint::hexid, NEW.name);
RETURN NEW;
END
$$ LANGUAGE 'plpgsql';
CREATE TRIGGER api_fruits_insert
INSTEAD OF INSERT
ON api_fruits
FOR EACH ROW
EXECUTE PROCEDURE api_fruits_insert();
The trouble is in the WHERE clause. Let's PATCH api_fruits?fruit_id=in.(7b,caf3) with {"name": "pear"}. This works out of the box since the name column is updatable but look at the query:
WITH pgrst_source AS (WITH pgrst_payload AS (SELECT $1::json AS json_data), pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload) UPDATE "api_x"."api_fruits" SET "name" = _."name" FROM (SELECT * FROM json_populate_recordset (null::"api_x"."api_fruits" , (SELECT val FROM pgrst_body) )) _ WHERE "api_x"."api_fruits"."fruit_id" = ANY ($2) RETURNING 1) SELECT '' AS total_result_set, pg_catalog.count(_postgrest_t) AS page_total, array[]::text[] AS header, '' AS body, nullif(current_setting('response.headers', true), '') AS response_headers, nullif(current_setting('response.status', true), '') AS response_status FROM (SELECT * FROM pgrst_source) _postgrest_t
DETAIL: parameters: $1 = '{
"name": "pear"
}', $2 = '{7b,caf3}'
So we have essentially UPDATE api_fruits SET name='berry' WHERE fruit_id IN ('7b', 'caf3');. Surprisingly this works but it's a full table scan so Postgres can evaluate to_hex(fruit_id) for each row looking for matches. The same happens if we try to GET a record by fruit_id. How would we rewrite the WHERE clauses?
It really feels like some combination of just the right Postgres and PostgREST features should be able to get us to a point where it's all happening in Postgres without nginx's help and without excessive complexity. Any ideas?

Can postgreSQL OnConflict combine with JSON obejcts?

I wanted to perform a conditional insert in PostgreSQL. Something like:
INSERT INTO {TABLE_NAME} (user_id, data) values ('{user_id}', '{data}')
WHERE not exists(select 1 from files where user_id='{user_id}' and data->'userType'='Type1')
Unfortunately, insert and where does not cooperate in PostGreSQL. What could be a suitable syntax for my query? I was considering ON CONFLICT, but couldn't find the syntax for using it with JSON object. (Data in the example)
Is it possible?

Rewrite the VALUES part to a SELECT, then you can use a WHERE condition:
INSERT INTO { TABLE_NAME } ( user_id, data )
SELECT
user_id,
data
FROM
( VALUES ( '{user_id}', '{data}' ) ) sub ( user_id, data )
WHERE
NOT EXISTS (
SELECT 1
FROM files
WHERE user_id = '{user_id}'
AND data -> 'userType' = 'Type1'
);
But, there is NO guarantee that the WHERE condition works! Another transaction that has not been committed yet, is invisible to this query. This could lead to data quality issues.

You can use INSERT ... SELECT ... WHERE ....
INSERT INTO elbat
(user_id,
data)
SELECT 'abc',
'xyz'
WHERE NOT EXISTS (SELECT *
FROM files
WHERE user_id = 'abc'
AND data->>'userType' = 'Type1')
And it looks like you're creating the query in a host language. Don't use string concatenation or interpolation for getting the values in it. That's error prone and makes your application vulnerable to SQL injection attacks. Look up how to use parameterized queries in your host language. Very likely for the table name parameters cannot be used. You need some other method of either whitelisting the names or properly quoting them.

How to split array in json using json_query?

I've got a column in a table that's a json. It contains only values without keys like
Now I'm trying to split the data from the json and create new table using every index of each array as new entry like
I've already tried
SELECT JSON_QUERY(abc) as 'Type', Id as 'ValueId' from Table FOR JSON AUTO
Is there any way to handle splitting given that some arrays might be empty and look like
[]
?

A fairly simply approach would be to use outer apply with openjson.
First, create and populate sample table (Please save us this step in your future questions):
DECLARE #T AS TABLE
(
Id int,
Value nvarchar(20)
)
INSERT INTO #T VALUES
(1, '[10]'),
(2, '[20, 200]'),
(3, '[]'),
(4, '')
The query:
SELECT Id, JsonValues.Value
FROM #T As t
OUTER APPLY
OPENJSON( Value ) As JsonValues
WHERE ISJSON(t.Value) = 1
Results:
Id Value
1 10
2 20
2 200
3 NULL
Note the ISJSON condition in the where clause will prevent exceptions in case the Value column contains anything other than a valid json (an empty array [] is still considered valid for this purpose).
If you don't want to return a row where the json array is empty, use cross apply instead of outer apply.

Your own code calling for FOR JSON AUTO tries to create JSON out of tabular data. But what you really needs seems to be the opposite direction: You want to transform JSON to a result set, a derived table. This is done by OPENJSON.
Your JSON seems to be a very minimalistic array.
You can try something along this.
DECLARE #json NVARCHAR(MAX) =N'[1,2,3]';
SELECT * FROM OPENJSON(#json);
The result returns the zero-based ordinal position in key, the actual value in value and a (very limited) type-enum.
Hint: If you want to use this against a table's column you must use APPLY, something along
SELECT *
FROM YourTable t
OUTER APPLY OPENJSON(t.TheJsonColumn);

How to simply transpose two columns into a single row in postgres?

Following is the output of my query:
key ;value
"2BxtRdkRvwc-2hPjF8LBmHD-finapril" ;4
"3QXORSfsIY0-2sDizCyvY6m-finapril" ;12
"4QXORSfsIY0-2sDizCyvY6m-curr" ;12
"5QXORSfsIY0-29Xcom4SHVh-finapril" ;12
What i want is simply to bring the rows into columns so that only one row remains with the key as the column name.
I have seen examples with crosstab catering to much complex use cases but i want to know if there is a simpler way in which this can be achieved in my particular case?
Any help is appreciated
Thanks
Postgres Version : 9.5.10

It is impossible to execute a query resulting in an unknown number and names of columns. The simplest way to get a similar effect is to generate a json object which can be easily interpreted by a client app as a pivot table, example:
with the_data(key, value) as (
values
('2BxtRdkRvwc-2hPjF8LBmHD-finapril', 4),
('3QXORSfsIY0-2sDizCyvY6m-finapril', 12),
('4QXORSfsIY0-2sDizCyvY6m-curr', 12),
('5QXORSfsIY0-29Xcom4SHVh-finapril', 12)
)
select jsonb_object_agg(key, value)
from the_data;
The query returns this json object:
{
"4QXORSfsIY0-2sDizCyvY6m-curr": 12,
"2BxtRdkRvwc-2hPjF8LBmHD-finapril": 4,
"3QXORSfsIY0-2sDizCyvY6m-finapril": 12,
"5QXORSfsIY0-29Xcom4SHVh-finapril": 12
}

to find even odd and plain series in postgresql

this might be naive but I want to know is there any way possible to find rows which have even entries and which have odd.
I have data in this format
"41,43,45,49,35,39,47,37"
"12,14,18,16,20,24,22,10"
"1,2,3,4,5,6"
"1,7,521,65,32"
what I have tried is to split these values with an Id column and then reading them with Even and Odd functions but it is too time taking.
is there a query or a function through I can find out that which of the rows are even, odd sequence and arbitrary?
thanks in advance.

Assuming your table has some way of (uniquely) identifying each of those lists, this should work:
create table data (id integer, list text);
insert into data
values
(1, '41,43,45,49,35,39,47,37'),
(2, '12,14,18,16,20,24,22,10'),
(3, '1,2,3,4,5,6'),
(4, '1,7,521,65,32');
with normalized as (
select id, unnest(string_to_array(list,',')::int[]) as val
from data
)
select id,
bool_and((val % 2) = 0) as all_even,
bool_and((val % 2) <> 0) as all_odd
from normalized
group by id;
Not sure if that is fast enough for your needs

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Find and remove non-printing characters from data - unicode

Related

With PostgREST, convert a column to and from an external encoding in the API

Can postgreSQL OnConflict combine with JSON obejcts?

How to split array in json using json_query?

How to simply transpose two columns into a single row in postgres?

to find even odd and plain series in postgresql

Categories

Resources