Formatting hstore column Postgres - postgresql

I'm trying to find the best way to format an hstore column (see screenshot). My goal is to have the same format as the "updated_column" shown in the screenshot. I was thinking about a CASE statement like:
CASE WHEN json_column -> 'id' THEN 'id:'
Any suggestion would be appreciated.

Migration approach:
1. Add a new column of type text, formatted the way you want it.
2. Make sure new data directly enters the new column as the string you want (pre-formatted in the backend).
3. Create a migration function that converts the JSON column data batchwise into your new string column. You can use Postgres replace/.. operations to reformat it. You can also use an external Python script/...
4. Remove the JSON column after the migration is done.
Let me see what/how you have tried, and then we can see how to improve/solve your issues.
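The answer above mentions an external Python script as one way to do the batchwise reformatting. A minimal sketch of that step (the key names are hypothetical, based on the columns mentioned later in the thread):

```python
import json

def format_row(json_text: str) -> str:
    """Turn a JSON object like '{"id": 1, "auth_id": 2}' into the
    target 'id:1 auth_id:2' string (keys follow the JSON order)."""
    data = json.loads(json_text)
    return ' '.join(f'{key}:{value}' for key, value in data.items())

print(format_row('{"id": 1, "auth_id": 2, "type": "sale"}'))
# id:1 auth_id:2 type:sale
```

A script like this would read batches of rows, apply `format_row`, and write the result into the new text column.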

So I think I found a temporary solution that will work, but as @Bergi mentioned, a view might be more appropriate.
For now I will just use something like:
concat(concat(concat(concat('id', ':', column -> 'id'),
    ' ', 'auth_id', ':', column -> 'auth_id'),
    ' ', 'type', ':', column -> 'type'),
    ' ', 'transaction', ':', column -> 'transaction')

You can use a function to make it generic. Let's take an example:
select '{"a":1,"b":2}'::json;
┌───────────────┐
│     json      │
├───────────────┤
│ {"a":1,"b":2} │
└───────────────┘
(1 row)
Back to text:
select '{"a":1,"b":2}'::json::text;
┌───────────────┐
│     text      │
├───────────────┤
│ {"a":1,"b":2} │
└───────────────┘
(1 row)
Now, remove the undesired tokens {}" with a regex:
select regexp_replace('{"a":1,"b":2}'::json::varchar, '["{}]+', '', 'g');
┌────────────────┐
│ regexp_replace │
├────────────────┤
│ a:1,b:2        │
└────────────────┘
(1 row)
and you can wrap it into a function:
create function text_from_json(json) returns text as $$select regexp_replace($1::text, '["{}]+', '', 'g')$$ language sql;
CREATE FUNCTION
Testing the function now:
tsdb=> select text_from_json('{"a":1,"b":2}'::json);
┌────────────────┐
│ text_from_json │
├────────────────┤
│ a:1,b:2        │
└────────────────┘
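The regex itself can be sanity-checked outside the database; a quick Python equivalent of the regexp_replace call above:

```python
import re

def text_from_json(json_text: str) -> str:
    """Strip the {, }, and " tokens, mirroring the SQL function
    regexp_replace($1::text, '["{}]+', '', 'g')."""
    return re.sub(r'["{}]+', '', json_text)

print(text_from_json('{"a":1,"b":2}'))
# a:1,b:2
```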


Polars equivalent to SQL `COUNT(DISTINCT expr,[expr...])`, or other method of checking uniqueness

When processing data, I often add a check after each step to validate that the data still has the unique key I think it does. For example, I might check that my data is still unique on (a, b). To accomplish this, I would typically check that the number of distinct combinations of columns a and b equals the total number of rows.
In polars, to get a COUNT(DISTINCT ...) I can do
(
    df
    .select(['a', 'b'])
    .unique()
    .height
)
But height does not work on LazyFrames, so I need to actually materialize the entire data with this method, I think (?). Is there a better way?
For reference, in R's data.table library I would do
mtc_dt <- data.table::as.data.table(mtcars)
stopifnot(data.table::uniqueN(mtc_dt[, .(mpg, disp)]) == nrow(mtc_dt))
To any contributors reading:
Thanks for the great package! Has sped up many of my workflows to a fraction of the time.
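For reference, the check itself is just "number of distinct (a, b) tuples equals the number of rows"; in plain Python over hypothetical column lists:

```python
rows = {'a': [1, 1, 2], 'b': [1, 2, 2]}  # hypothetical data

n_rows = len(rows['a'])
n_distinct = len(set(zip(rows['a'], rows['b'])))

assert n_distinct == n_rows, "data is not unique on (a, b)"
print(n_distinct)
# 3
```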
You can use a map function that asserts on the unique count.
This allows you to get an eager DataFrame in the middle of a query plan.
Note that we turn off projection_pushdown optimization, as the optimizer is not able to know which subset of columns we select.
import polars as pl

df = pl.DataFrame({
    "foo": [1, 2, 3],
    "bar": [None, "hello", None]
})

def unique_check(df: pl.DataFrame, subset: list[str]) -> pl.DataFrame:
    assert df.select(pl.struct(subset).unique().count()).item() == df.height
    return df

out = (df.lazy()
    .map(lambda df: unique_check(df, ["foo", "bar"]), projection_pushdown=False)
    .select("bar")
    .collect()
)
print(out)
shape: (3, 1)
┌───────┐
│ bar   │
│ ---   │
│ str   │
╞═══════╡
│ null  │
│ hello │
│ null  │
└───────┘
Not turning off predicate_pushdown is better, but then we must ensure the subset is selected before the map.
The answer here provides a technique that can answer this question: gather the columns together in a struct column, and then apply .n_unique() to that struct. That question uses groupby, but it will work without groupby as well.
(
    df
    .with_column(pl.struct(['a', 'b']).alias('ident'))
    ['ident'].n_unique()
)
I was able to run code more or less identical to this on a dataset I am working with, and got a sensible answer.
Note that I am not sure if this materializes the entire table before aggregating, nor if this works specifically on lazy data frames. If not, please let me know, and I will retract this answer.
If you have
df=pl.DataFrame({'a':[1,2,3],'b':[2,3,4],'c':[3,4,5]}).lazy()
and you want to see if [a,b] are unique without returning all the data, you can lazily group by and count those groups. With that, you can add a filter so that only rows with a count greater than 1 are returned. Only after those expressions are strung onto the LazyFrame do you collect; if your pair of columns is unique as you intend, the result will have 0 rows.
(df
    .groupby(['a', 'b'])
    .agg(pl.count())
    .filter(pl.col('count') > 1)
    .select('count')
    .collect()
    .height)
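The group-by-and-count idea maps to plain Python as well; with collections.Counter over hypothetical (a, b) pairs, rows whose pair occurs more than once are the non-unique ones:

```python
from collections import Counter

rows = {'a': [1, 2, 3], 'b': [2, 3, 4]}  # hypothetical data

counts = Counter(zip(rows['a'], rows['b']))
dupes = [pair for pair, n in counts.items() if n > 1]

print(len(dupes))
# 0 -> (a, b) is unique
```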

Postgres Citus, immutable date conversion

Trying to update some dates programmatically on Citus I always get
[0A000] ERROR: STABLE functions used in UPDATE queries cannot be called with column references
From a query like
UPDATE date_container SET json_value = json_value::jsonb - 'created_time' || CONCAT('{"created_time":"',
rtrim(replace(to_timestamp(((json_value->>'created_time')::numeric/1000000))::text,' ','T'), '-05'),'"}')::jsonb
In theory all methods are immutable, but for some reason it says that some part of it is not.
I also tried all the methods below:
PostgreSQL: how to convert from Unix epoch to date?
The CONCAT function is stable rather than immutable; this is often the case for functions that take any/anyelement as an argument.
select proname, pronamespace::regnamespace, provolatile
from pg_proc
where proname = 'concat';
 proname │ pronamespace │ provolatile
─────────┼──────────────┼─────────────
 concat  │ pg_catalog   │ s
Instead, you should be able to use the string concatenation operator ||, but be sure to cast all items to text; otherwise you might hit the same problem with it using an anyelement version of the || operator.
So I think this query should work:
UPDATE date_container SET json_value = json_value::jsonb - 'created_time' ||
(
'{"created_time":"'::text
|| rtrim(replace(to_timestamp(((json_value->>'created_time')::numeric/1000000))::text,' ','T'), '-05')::text
|| '"}'::text
)::jsonb
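The conversion itself (epoch microseconds to a 'T'-separated timestamp string) can be sanity-checked in Python; a sketch of what to_timestamp(x/1000000) plus the replace(' ', 'T') step produces, ignoring the timezone-suffix trimming:

```python
from datetime import datetime, timezone

def epoch_micros_to_iso(micros: int) -> str:
    """Convert epoch microseconds to an ISO-8601 'T'-separated string,
    mirroring to_timestamp(micros / 1000000) and replace(' ', 'T')."""
    dt = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
    return dt.replace(tzinfo=None).isoformat()

print(epoch_micros_to_iso(1_609_459_200_000_000))
# 2021-01-01T00:00:00
```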

What's the meaning of select attributeName(tableName) from tableName in PostgreSQL

Using PostgreSQL, I see apparently strange behavior that I don't understand.
Assume to have a simple table
create table employee (
    number int primary key,
    surname varchar(20) not null,
    name varchar(20) not null);
It is well clear for me the meaning of
select name from employee
However, I obtain all the names also with
select name(employee) from employee
and I do not understand this last statement.
I'm using PostgreSQL 13 and pgAdmin 4.
I'd like to expand @Abelisto's answer with this quotation from the PostgreSQL docs:
Another special syntactical behavior associated with composite values is that we can use functional notation for extracting a field of a composite value. The simple way to explain this is that the notations field(table) and table.field are interchangeable. For example, these queries are equivalent:
SELECT c.name FROM inventory_item c WHERE c.price > 1000;
SELECT name(c) FROM inventory_item c WHERE price(c) > 1000;
...
This equivalence between functional notation and field notation makes it possible to use functions on composite types to implement “computed fields”. An application using the last query above wouldn't need to be directly aware that somefunc isn't a real column of the table.
Just an assumption.
There are two syntactic ways in PostgreSQL to call a function that receives a row as its argument. For example:
create table t(x int, y int); insert into t values(1, 2);
create function f(a t) returns int language sql as 'select a.x+a.y';
select f(t), t.f from t;
┌───┬───┐
│ f │ f │
├───┼───┤
│ 3 │ 3 │
└───┴───┘
Probably it is implemented to make the syntax the same for columns as well:
select f(t), t.f, x(t), t.x from t;
┌───┬───┬───┬───┐
│ f │ f │ x │ x │
├───┼───┼───┼───┤
│ 3 │ 3 │ 1 │ 1 │
└───┴───┴───┴───┘

Can a Postgres daterange include infinity as an upper bound?

I can't see how to create a daterange with infinity as an inclusive upper bound. Postgres converts both inputs to an exclusive upper bound:
create table dt_overlap (
    id serial primary key,
    validity daterange not null
);
insert into dt_overlap (validity) values
    ('["2019-01-01", infinity]'),
    ('["2019-02-02", infinity)');
table dt_overlap;
 id │       validity
────┼───────────────────────
  1 │ [2019-01-01,infinity)
  2 │ [2019-02-02,infinity)
select id,
       upper(validity),
       upper_inf(validity),
       not isfinite(upper(validity)) as is_inf
from dt_overlap;
 id │  upper   │ upper_inf │ is_inf
────┼──────────┼───────────┼────────
  1 │ infinity │ f         │ t
  2 │ infinity │ f         │ t
That both values give the same results is kind of expected, since the inclusive upper bound infinity] was coerced to an exclusive upper bound infinity).
The same problem does not exist for the lower end of the range since the daterange keeps an inclusive lower bound and thus lower_inf() returns true.
Tested and reproduced with PostgreSQL 9.6.5 and PostgreSQL 10.3.
Any ideas?
Another way of creating an unbounded range is to leave out the upper bound completely, e.g. '["2019-01-01",)'
with dt_overlap (validity) as (
    values
        ('["2019-01-01", infinity]'::daterange),
        ('["2019-02-01",]'::daterange)
)
select validity,
       upper_inf(validity)
from dt_overlap;
results in
       validity        | upper_inf
-----------------------+-----------
 [2019-01-01,infinity) | false
 [2019-02-01,)         | true

How to specify PostGIS geography value in a composite type literal?

I have a custom composite type:
CREATE TYPE place AS (
    name text,
    location geography(point, 4326)
);
I want to create a value of that type using a literal:
SELECT $$("name", "ST_GeogFromText('POINT(121.560800 29.901200)')")$$::place;
This fails with:
HINT: "ST" <-- parse error at position 2 within geometry
ERROR: parse error - invalid geometry
But this executes just fine:
SELECT ST_GeogFromText('POINT(121.560800 29.901200)');
I wonder what's the correct way to specify PostGIS geography value in a composite type literal?
You are trying to push a function call, ST_GeogFromText, into a text string. This is not allowed, as it would create a possibility for SQL injection.
In the second call you need ST_GeogFromText to mark the type of the input. For a composite type, you did that already in the type definition, so you can skip that part:
[local] gis#gis=# SELECT $$("name", "POINT(121.560800 29.901200)")$$::place;
┌───────────────────────────────────────────────────────────┐
│ place │
├───────────────────────────────────────────────────────────┤
│ (name,0101000020E610000032E6AE25E4635E40BB270F0BB5E63D40) │
└───────────────────────────────────────────────────────────┘
(1 row)
Time: 0,208 ms
Another option would be to use the non-literal form, which allows function calls:
[local] gis#gis=# SELECT ('name', ST_GeogFromText('POINT(121.560800 29.901200)'))::place;
┌───────────────────────────────────────────────────────────┐
│ row │
├───────────────────────────────────────────────────────────┤
│ (name,0101000020E610000032E6AE25E4635E40BB270F0BB5E63D40) │
└───────────────────────────────────────────────────────────┘
(1 row)
Time: 5,004 ms