Create an index for json_array_elements in PostgreSQL

I need to create an index from a query that uses json_array_elements():
SELECT *, json_array_elements(nested_json_as_text::json) as elements FROM my_table
Since the json contains multiple elements, the result is that the original index is now duplicated across rows and no longer unique.
I am not very familiar with creating indices and want to avoid doing anything destructive. What is the best way to create a column of unique integers for this case?

Found an answer:
SELECT *, json_array_elements(nested_json_as_text::json) as elements, row_number() over () as my_index FROM my_table
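One caveat with the query above: when a set-returning function sits in the SELECT list, PostgreSQL may evaluate the window function before the rows are expanded, in which case all elements coming from the same source row share one my_index value. A minimal sketch that does not depend on that evaluation order, assuming the table and column names from the question and PostgreSQL 9.3 or later (for LATERAL), moves the function into the FROM clause so that row_number() runs over the already expanded rows:

```sql
-- Expand the JSON array in FROM, then number the expanded rows.
-- The alias e(elements) is illustrative; my_table and
-- nested_json_as_text come from the question.
SELECT t.*,
       e.elements,
       row_number() OVER () AS my_index
FROM my_table AS t
CROSS JOIN LATERAL json_array_elements(t.nested_json_as_text::json) AS e(elements);
```

The numbering is unique per output row but arbitrary; put an ORDER BY inside OVER () if a reproducible order matters.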

Related

Unnest a two-dimensional array to a table

Is it possible to convert the following into a two-columned table?
SELECT * FROM UNNEST([[1,'a'],[2,'b'],[3,'c']]) AS tbl (num,str);
Or more generally into a table of n columns where n is the size of the inner array (i.e., the length of a row).
The array structure is arr[rows][cols]
Arrays in PostgreSQL can contain elements of only one type (though that type can be a composite type). So an array like array[1,2,'e'] is invalid in PostgreSQL.
What can be done is to unnest two arrays in one statement:
SELECT UNNEST(array[1,2,3]),unnest(array['a','b','c'])
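To get named columns as in the question, the same idea can be sketched with the multi-argument form of unnest in the FROM clause (available since PostgreSQL 9.4). The alias tbl(num, str) is taken from the question; each array still holds a single element type:

```sql
-- Unnest two arrays in parallel into a two-column table.
-- Each array becomes one column; rows are paired by position.
SELECT *
FROM unnest(ARRAY[1, 2, 3], ARRAY['a', 'b', 'c']) AS tbl(num, str);
```

If the arrays differ in length, the shorter one is padded with NULLs.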

Is it possible to index the position of an array column in PostgreSQL?

Let's say I want to find rows in the table my_table that have the value 5 at the first position of the array column my_array_column. To prepare the table, I executed the following statements:
CREATE TABLE my_table (
id serial primary key,
my_array_column integer[]
);
CREATE INDEX my_table_my_array_column_index on "my_table" USING GIN ("my_array_column");
SET enable_seqscan TO off;
INSERT INTO my_table (my_array_column) VALUES ('{5,7,10}');
Now, the query can look like this:
select * from my_table where my_array_column[1] = 5;
This works, but it doesn't use the created GIN index. Is it possible to search for the value 5 at a specific position with an index?
I want to find rows in the table my_table that have the value 5 at the first position of the array column
A partial index would be most efficient for that definition:
CREATE INDEX my_table_my_array_special_idx ON my_table ((true))
WHERE my_array_column[1] = 5;
If only a small fraction of rows qualifies, a partial index is accordingly smaller. Plus, the actual index column only occupies minimum space (typically 8 bytes). And, on top of that, Postgres 13 or later can apply index deduplication to make the index smaller still.
Once the index is fully cached, its small size does not make it much faster, but still.
And most writes do not have to manipulate the index, which may be the most important benefit, depending on the workload.
Oh, and Postgres collects statistics for a partial index. So you can expect the query planner to make a fully educated choice when that index is involved.
Related:
PostgreSQL partial index unused when created on a table with existing data
Index that is not used, yet influences query
It's applicable when the query repeats the same condition.
Typically, you have something useful as index field on top of your declared purpose. But if you don't, just use any small constant - true in my example, but anything < 8 bytes is equally good.
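As a rough sketch of the "something useful as index field" case, assuming you also want id back from the query (the index name here is hypothetical):

```sql
-- Partial index that stores id instead of a constant.
-- Only rows whose first array element is 5 are indexed.
CREATE INDEX my_table_first_elem_5_idx ON my_table (id)
WHERE my_array_column[1] = 5;

-- The query has to repeat the index predicate for the planner
-- to consider the partial index.
SELECT id FROM my_table WHERE my_array_column[1] = 5;

-- Check which plan is actually chosen.
EXPLAIN SELECT id FROM my_table WHERE my_array_column[1] = 5;
```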
Minor disclaimer: The "first position" in a Postgres array does not necessarily have index 1. If non-standard array indexes are possible, consider:
...
WHERE (my_array_column[:])[1] = 5;
Use the same expression in both the index and the queries.
See:
Normalize array subscripts for 1-dimensional array so they start with 1
You can index just the first position. You need an extra set of parentheses in the create statement to do that:
create index on my_table ((my_array_column[1]));
Or you could augment your query to work with your gin index, on the theory that an array can't have the first element be 5 unless at least one element is 5.
select * from my_table where my_array_column[1] = 5 and my_array_column @> ARRAY[5];
Of course this won't be very efficient if a lot of your arrays contain 5, but in some other spot in the array. It would have to recheck all of those "false matches" to eliminate them. So if you only care about the first element, the first index I showed is better. (Of course, if you only care about the first element, why use an array to start with?)
If you always look at the first position a regular B-Tree index will do:
create index on my_table ( (my_array_column[1]) );
If you don't know the position, then a GIN index is indeed needed, but you need to use an operator that is supported by a GIN index, e.g. the @> ("contains") operator. For that you need a different query:
select *
from my_table
where my_array_column @> array[5];
That would find all rows where the array column contains the value 5.
But you should heed the advice given in the manual regarding the use of arrays:
Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.
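A minimal sketch of what the manual's suggestion could look like for this question. The table my_table_item and its columns pos and item_value are hypothetical names, not part of the original schema:

```sql
-- One row per array element, with its position stored explicitly.
CREATE TABLE my_table_item (
    my_table_id integer NOT NULL REFERENCES my_table (id),
    pos         integer NOT NULL,   -- 1-based position within the former array
    item_value  integer NOT NULL,
    PRIMARY KEY (my_table_id, pos)
);

-- A plain B-tree index covers "value 5 at position 1".
CREATE INDEX my_table_item_pos_value_idx ON my_table_item (pos, item_value);

SELECT t.*
FROM my_table AS t
JOIN my_table_item AS i ON i.my_table_id = t.id
WHERE i.pos = 1
  AND i.item_value = 5;
```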

Casting rows to arrays in PostgreSQL

I need to query a table as in
SELECT *
FROM table_schema.table_name
only each row needs to be a TEXT[] whose values are the column values cast to TEXT, in the same order as in SELECT *. So, assuming the table has columns a, b and c, I need the result to look like
SELECT ARRAY[a::TEXT, b::TEXT, c::TEXT]
FROM table_schema.table_name
only it shouldn't explicitly list columns by name. Ideally it should look like
SELECT as_text_array(a)
FROM table_schema.table_name AS a
The best I came up with looks ugly and relies on "hstore" extension
WITH columnz AS ( -- get ordered column name array
SELECT array_agg(attname::TEXT ORDER BY attnum) AS column_name_array
FROM pg_attribute
WHERE attrelid = 'table_schema.table_name'::regclass AND attnum > 0 AND NOT attisdropped
)
SELECT hstore(a)->(SELECT column_name_array FROM columnz)
FROM table_schema.table_name AS a
I have a feeling there must be a simpler way to achieve that.
UPDATE 1
Another query that achieves the same result, though arguably as ugly and inefficient as the first one, is inspired by the answer by @bspates. It may be even less efficient, but it doesn't rely on extensions:
SELECT r.text_array
FROM table_schema.table_name AS a
INNER JOIN LATERAL ( -- parse ROW::TEXT presentation of a row
SELECT array_agg(COALESCE(replace(val[1], '""', '"'), NULLIF(val[2], ''))) AS text_array
FROM regexp_matches(a::text, -- parse double-quoted and simple values separated by commas
'(?<=\A\(|,) (?: "( (?:[^"]|"")* )" | ([^,"]*) ) (?=,|\)\Z)', 'xg') AS t(val)
) AS r ON TRUE
It is still far from ideal
UPDATE 2
I tested all 3 options existing at the moment
Using JSON. It doesn't rely on any extensions, it is short to write, easy to understand and the speed is ok.
Using hstore. This alternative is the fastest (>10 times faster than the JSON approach on a 100K dataset) but requires an extension. hstore in general is a very handy extension to have, though.
Using regex to parse TEXT presentation of a ROW. This option is really slow.
A somewhat ugly hack is to convert the row to a JSON value, then unnest the values and aggregate it back to an array:
select array(select (json_each_text(to_json(t))).value) as row_value
from some_table t
Which is to some extent the same as your hstore hack.
If the order of the columns is important, then using json and with ordinality can be used to keep that:
select array(select val
from json_each_text(to_json(t)) with ordinality as t(k,val,idx)
order by idx)
from the_table t
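To get closer to the as_text_array(a) call shape asked for in the question, the ordered JSON variant could be wrapped in a polymorphic SQL function. This is only a sketch: the function name comes from the question, but the definition and the alias j(k, val, idx) are assumptions, and it needs PostgreSQL 9.4 or later for WITH ORDINALITY:

```sql
-- Hypothetical helper: turn any row into a text[] in column order.
CREATE FUNCTION as_text_array(r anyelement) RETURNS text[] AS $$
    SELECT array(
        SELECT val
        FROM json_each_text(to_json(r)) WITH ORDINALITY AS j(k, val, idx)
        ORDER BY idx
    );
$$ LANGUAGE sql STABLE;

-- Usage with the placeholder table name from the question.
SELECT as_text_array(a)
FROM table_schema.table_name AS a;
```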
The easiest (read hacky-est) way I can think of is convert to a string first then parse that string into an array. Like so:
SELECT string_to_array(table_name::text, ',') FROM table_name
BUT depending on the size and type of the data in the table, this could perform very badly.

Selecting a row by searching a specific value in an Array column

We have a table where one of the columns is an array. I need to select a row or many rows as long as my search value matches their values using ILIKE. My problem is that I need to search the values of an array column as well. I tried using ANY but the value needs to be exact to select a row. I need something similar to ILIKE but for that array column.
Thank you in advance.
Use unnest function:
SELECT x.value
FROM my_table t, unnest(t.my_array_column) as x(value)
WHERE x.value ILIKE 'foo'
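The query above returns the matching array elements. If whole rows are wanted instead, one possible sketch wraps the unnest in EXISTS, so each row appears at most once. The '%foo%' pattern is an assumption about the intended partial match:

```sql
-- Return each row that has at least one array element matching the pattern.
SELECT t.*
FROM my_table AS t
WHERE EXISTS (
    SELECT 1
    FROM unnest(t.my_array_column) AS x(value)
    WHERE x.value ILIKE '%foo%'
);
```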
Since your question is also tagged elixir: to convert this to Ecto, use Ecto.Query.API.fragment/1 for the select condition and Ecto.Query.API.ilike/2 for the match.

ROWID equivalent in postgres 9.2

Is there any way to get the rowid of a record in Postgres?
In Oracle I can use something like
SELECT MAX(BILLS.ROWID) FROM BILLS
Yes, there is the ctid column, which is the equivalent of rowid. But it is useless for you: rowid and ctid are physical row/tuple identifiers, so they can change after a rebuild or vacuum.
See: Chapter 5. Data Definition > 5.4. System Columns
The PostgreSQL row_number() window function can be used for most purposes where you would use rowid. Whereas in Oracle the rowid is an intrinsic numbering of the result data rows, in Postgres row_number() computes a numbering within a logical ordering of the returned data. Normally if you want to number the rows, it means you expect them in a particular order, so you would specify which column(s) to order the rows when numbering them:
select client_name, row_number() over (order by date) from bills;
If you just want the rows numbered arbitrarily you can leave the over clause empty:
select client_name, row_number() over () from bills;
If you want to calculate an aggregate over the row number you'll have to use a subquery:
select max(rownum) from (
select row_number() over () as rownum from bills
) r;
If all you need is the last item from a table, and you have a column to sort sequentially, there's a simpler approach than using row_number(). Just reverse the sort order and select the first item:
select * from bills
order by date desc limit 1;
Use a Sequence. You can choose 4 or 8 byte values.
http://www.neilconway.org/docs/sequences/
Add any unique column to your table (named rowid, maybe).
Then prevent it from changing by creating a BEFORE UPDATE trigger that raises an exception if someone tries to update it.
You may populate this column with a sequence, as @JohnMudd mentioned.
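A minimal sketch of the two suggestions combined, assuming the bills table from the question. The column name rowid and the function and trigger names are hypothetical:

```sql
-- Add a sequence-backed 8-byte column; existing rows are filled automatically.
ALTER TABLE bills ADD COLUMN rowid bigserial;
ALTER TABLE bills ADD CONSTRAINT bills_rowid_key UNIQUE (rowid);

-- Reject any UPDATE that would change the stored rowid.
CREATE FUNCTION bills_rowid_immutable() RETURNS trigger AS $$
BEGIN
    IF NEW.rowid IS DISTINCT FROM OLD.rowid THEN
        RAISE EXCEPTION 'rowid of bills is immutable';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER bills_rowid_immutable
BEFORE UPDATE ON bills
FOR EACH ROW EXECUTE PROCEDURE bills_rowid_immutable();
```

With that in place, SELECT max(rowid) FROM bills returns the most recently assigned value, as long as rows are only inserted through the sequence default.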