Pixel values of raster records to be inserted in the table as columns - postgresql

I have a table with following columns:
(ID, row_num, col_num, pix_centroid, pix_val1).
I have more than 1000 records. I am inserting my data using:
insert into pixelbased (row_num, col_num, pix_centroid, pix_val)
select
(ST_PixelAsPolygons(rast, 1)).x as X,
(ST_PixelAsPolygons(rast, 1)).y as Y,
(ST_Centroid((ST_PixelAsPolygons(rast, 1)).geom)) as geom,
(ST_PixelAsPolygons(rast, 1)).val as pix_val1
from mytable
where rid=1`
Now I am trying to insert all the other records as a column and _pix_val1_ column is important for me. All the other columns will remain the same. In the other word, I want the final table to have these columns:
(ID, row_num, col_num, pix_centroid, pix_val1, pix_val2, pix_val3, ....)
Is there a way to do it?

I would want to store this data as a bitmap in a bytea if possible. Here's how to take a series of byte values and turn it into a bytea:
WITH bytes(b) AS (SELECT x % 256 FROM generate_series(1,53000) x)
SELECT ('\x'||string_agg(lpad(to_hex(b),2,'0'),''))::bytea FROM bytes;
You can access fields or ranges of the byte array using the substr function. This bytea is organized as a linear pixel array, but you may find it more useful to organize it into a more traditional bitmap format. Also, if your pixels are more than one byte you may need to cope with big-endian vs little-endian. You could do that in SQL, but it's likely to be much easier in a procedural language like PL/Perl.
Failing that, a multidimensional array would be a somewhat reasonable choice.
Using a generate_series statement as a substitute for your pix_val field for convenient testing, this query produces a two-dimensional array of integers using two aggregation passes:
SELECT ('{'||string_agg(subarray, ',')||'}')::integer[] AS arr
FROM (
SELECT array_agg(x order by x)::text
FROM generate_series(1,53000) x
GROUP BY width_bucket(x, 1, 53001, 100)
) a(subarray);
The unfortunate use of the string literal form of the two dimensional array is made necessary by the fact that array_agg cannot aggregate arrays. In my view this is a real wart in PostgreSQL; in general its multidimensional arrays are odd to work with and inconsistent with how most applications and languages implement arrays.
You can get fields out of the array by indexing it. Example:
regress=> SELECT ('{'||string_agg(subarray, ',')||'}')::integer[] AS arr INTO test FROM (SELECT array_agg(x order by x)::text from generate_series(1,53000) x GROUP BY width_bucket(x, 1, 53001, 100)) a(subarray);
regress=> \d test
Table "public.test"
Column | Type | Modifiers
--------+-----------+-----------
arr | integer[] |
test contains a single array with two dimensions:
regress=> \x
regress=> select array_dims(test.arr), array_ndims(test.arr), array_length(test.arr,1), array_length(test.arr,2) FROM test;
-[ RECORD 1 ]+---------------
array_dims | [1:100][1:530]
array_ndims | 2
array_length | 100
array_length | 530
I can get elements with two-level indexing:
regress=> SELECT test.arr[4][4] FROM test;
arr
------
1594
(1 row)
or a "column" with slicing:
regress=> SELECT test.arr[4:4][1:530] FROM test;
Oddly, this is still a two-dimensional array, the top dimension is just one element deep. You can flatten it (inefficiently) with unnest and array_agg if you need to.
Two-dimensional arrays in PostgreSQL are somewhat weird, as you can see, but so is what you're trying to do.

Related

Redshift how to split a stringified array into separate parts

Say I have a varchar column let's say religions that looks like this: ["Christianity", "Buddhism", "Judaism"] (yes it has a bracket in the string) and I want the string (not array) split into multiple rows like "Christianity", "Buddhism", "Judaism" so it can be used in a WHERE clause.
Eventually I want to use the results of the query in a where clause like this:
SELECT ...
FROM religions
WHERE name in
(
<this subquery>
)
How can one do this?
You can use the function JSON_PARSE to convert the varchar string into an array. Then you can use the strategy described in Convert varchar array to rows in redshift - Stack Overflow to convert the array to separate rows.
You can do the following.
Create a temporary table with sequence of numbers
Using the sequence and split_part function available in redshift, you can split the values based on the numbers generated in the temporary table by doing a cross join.
To replace the double quote and square brackets, you can use the regexp_replace function in Redshift.
create temp table seq as
with recursive numbers(NUMBER) as
(
select 1 UNION ALL
select NUMBER + 1 from numbers where NUMBER < 28
)
select * from numbers;
select regexp_replace(split_part(val,',',seq.number),'[]["]','') as value
from
(select '["christianity","Buddhism","Judaism"]' as val) -- You can select the actual column from the table here.
cross join
seq
where seq.number <= regexp_count(val,'[,]')+1;

How to re-map array column values in select in Postgresql?

Is it possible to re-map integer values from a Postgres array column in the select? This is what I have:
select unnest(tag_ids) from mention m where id = 288201;
unnest
---------
-143503
-143564
125192
143604
137694
tag_ids is integer[] column
I would like to translate those numbers. Functions like abs(unnest(..)) work but found I cannot use a CASE statement. Tx.
If you want to do anything non-trivial with the elements from an array after unnesting, use the set-returning function like table:
select u.tag_id
from mention m
cross join unnest(m.tag_ids) as u(tag_id)
where m.id = 288201;
Now, u.tag_id is an integer column that you can use like any other column, e.g. in a CASE expression.

generate_subscripts(array, 2) returns two records when there is only one multidimensional element

Why does this return 2 records when there is only one multidimensional element in multidimensional array images?
SELECT images
FROM (
SELECT images, generate_subscripts(images, 2) AS s
FROM listings
WHERE listings.id = 2
) as foo;
Note: I shortened the base64 string for easier viewing.
id | 2
created_at | 2017-04-19 23:44:50.150913+00
posted_by | 10209280753550922
images | {{/9j/4AAJRgAB2dgKd/9k=,3/2/image-3-2-1492645490308.jpeg}}
dev_dolphin_db=# SELECT images FROM(SELECT images, generate_subscripts(images, 2) AS s FROM listings where listings.id = 2) as foo;
-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------
images | {{/9j/4AAQSkZJRdgKd/9k=,3/2/image-3-2-1492645490308.jpeg}}
-[ RECORD 2 ]----------------------------------------------------------------------------------------------------------------------
images | {{/9j/4AAQSkZN2dgKd/9k=,3/2/image-3-2-1492645490308.jpeg}}
There are two elements in your array, separated by the comma:
{{/9j/4AAJRgAB2dgKd/9k=,3/2/image-3-2-1492645490308.jpeg}}
See:
SELECT *
FROM unnest('{{/9j/4AAJRgAB2dgKd/9k=,3/2/image-3-2-1492645490308.jpeg}}'::text[])
unnest
--------------------------------
/9j/4AAJRgAB2dgKd/9k=
3/2/image-3-2-1492645490308.jpeg
generate_subscripts() returns one row per element in the specified dimension (not one row per dimension). The manual:
generate_subscripts is a convenience function that generates the set
of valid subscripts for the specified dimension of the given array.
Zero rows are returned for arrays that do not have the requested
dimension, or for NULL arrays (but valid subscripts are returned for
NULL array elements).
If that's supposed to be a single element, it would have to be double-quoted to escape the special meaning of the comma:
{{"/9j/4AAJRgAB2dgKd/9k=,3/2/image-3-2-1492645490308.jpeg"}}
Aside: in modern Postgres you can use this simpler equivalent query:
SELECT images
FROM listings, generate_subscripts(images, 2) s
WHERE id = 2;
That's an implicit CROSS JOIN LATERAL. See:
What is the difference between LATERAL and a subquery in PostgreSQL?

postgres `order by` argument type

What is the argument type for the order by clause in Postgresql?
I came across a very strange behaviour (using Postgresql 9.5). Namely, the query
select * from unnest(array[1,4,3,2]) as x order by 1;
produces 1,2,3,4 as expected. However the query
select * from unnest(array[1,4,3,2]) as x order by 1::int;
produces 1,4,3,2, which seems strange. Similarly, whenever I replace 1::int with whatever function (e.g. greatest(0,1)) or even case operator, the results are unordered (on the contrary to what I would expect).
So which type should an argument of order by have, and how do I get the expected behaviour?
This is expected (and documented) behaviour:
A sort_expression can also be the column label or number of an output column
So the expression:
order by 1
sorts by the first column of the result set (as defined by the SQL standard)
However the expression:
order by 1::int
sorts by the constant value 1, it's essentially the same as:
order by 'foo'
By using a constant value for the order by all rows have the same sort value and thus aren't really sorted.
To sort by an expression, just use that:
order by
case
when some_column = 'foo' then 1
when some_column = 'bar' then 2
else 3
end
The above sorts the result based on the result of the case expression.
Actually I have a function with an integer argument which indicates the column to be used in the order by clause.
In a case when all columns are of the same type, this can work: :
SELECT ....
ORDER BY
CASE function_to_get_a_column_number()
WHEN 1 THEN column1
WHEN 2 THEN column2
.....
WHEN 1235 THEN column1235
END
If columns are of different types, you can try:
SELECT ....
ORDER BY
CASE function_to_get_a_column_number()
WHEN 1 THEN column1::varchar
WHEN 2 THEN column2::varchar
.....
WHEN 1235 THEN column1235::varchar
END
But these "workarounds" are horrible. You need some other approach than the function returning a column number.
Maybe a dynamic SQL ?
I would say that dynamic SQL (thanks #kordirko and the others for the hints) is the best solution to the problem I originally had in mind:
create temp table my_data (
id serial,
val text
);
insert into my_data(id, val)
values (default, 'a'), (default, 'c'), (default, 'd'), (default, 'b');
create function fetch_my_data(col text)
returns setof my_data as
$f$
begin
return query execute $$
select * from my_data
order by $$|| quote_ident(col);
end
$f$ language plpgsql;
select * from fetch_my_data('val'); -- order by val
select * from fetch_my_data('id'); -- order by id
In the beginning I thought this could be achieved using case expression in the argument of the order by clause - the sort_expression. And here comes the tricky part which confused me: when sort_expression is a kind of identifier (name of a column or a number of a column), the corresponding column is used when ordering the results. But when sort_expression is some value, we actually order the results using that value itself (computed for each row). This is #a_horse_with_no_name's answer rephrased.
So when I queried ... order by 1::int, in a way I have assigned value 1 to each row and then tried to sort an array of ones, which clearly is useless.
There are some workarounds without dynamic queries, but they require writing more code and do not seem to have any significant advantages.

Postgresql Select all columns and column names with a specific value for a row

I have a table with many(+1000) columns and rows(~1M). The columns have either the value 1 , or are NULL.
I want to be able to select, for a specific row (user) retrieve the column names that have a value of 1.
Since there are many columns on the table, specifying the columns would yield a extremely long query.
You're doing something SQL is quite bad at - dynamic access to columns, or treating a row as a set. It'd be nice if this were easier, but it doesn't work well with SQL's typed nature and the concept of a relation. Working with your data set in its current form is going to be frustrating; consider storing an array, json, or hstore of values instead.
Actually, for this particular data model, you could probably use a bitfield. See bit(n) and bit varying(n).
It's still possible to make a working query with your current model PostgreSQL extensions though.
Given sample:
CREATE TABLE blah (id serial primary key, a integer, b integer, c integer);
INSERT INTO blah(a,b,c) VALUES (NULL, NULL, 1), (1, NULL, 1), (NULL, NULL, NULL), (1, 1, 1);
I would unpivot each row into a key/value set using hstore (or in newer PostgreSQL versions, the json functions). SQL its self provides no way to dynamically access columns, so we have to use an extension. So:
SELECT id, hs FROM blah, LATERAL hstore(blah) hs;
then extract the hstores to sets:
SELECT id, k, v FROM blah, LATERAL each(hstore(blah)) kv(k,v);
... at which point your can filter for values matching the criteria. Note that all columns have been converted to text, so you may want to cast it back:
SELECT id, k FROM blah, LATERAL each(hstore(blah)) kv(k,v) WHERE v::integer = 1;
You also need to exclude id from matching, so:
regress=> SELECT id, k FROM blah, LATERAL each(hstore(blah)) kv(k,v) WHERE v::integer = 1 AND
k <> 'id';
id | k
----+---
1 | c
2 | a
2 | c
4 | a
4 | b
4 | c
(6 rows)