PostgreSQL: return the first positive value from multiple columns queried

I have a table in a PostgreSQL database with multiple columns, of which only one will have a value entered.
SELECT "Garden_GUID", "Municipality_Amajuba", "Municipality_Ilembe", "Municipality_Sisonke" from forms_garden
WHERE "Garden_GUID" = 'testguid';
 Garden_GUID | Municipality_Amajuba | Municipality_Ilembe | Municipality_Sisonke
-------------+----------------------+---------------------+----------------------
 testguid    | Dannhauser           |                     |
(1 row)
I wish to create a view in which the entries from those columns are collated into a single column.
I have tried:
CREATE VIEW municipality (GUID, funder, municipality)
AS SELECT "Garden_GUID" GUID, "Funder" funder, "Municipality_Amajuba", "Municipality_Ilembe", "Municipality_Sisonke" municipality
FROM forms_garden;
but it returns an error:
ERROR: column "municipality" specified more than once
Is there any way to query the various municipality_* columns row by row and only return the first positive entry?
Many thanks in advance.

I think coalesce() is what you're looking for:
with forms_garden as (
    select 'guid1' guid, 'Dannhauser' amajuba, null ilembe, null sisonke
    union all select 'guid2', null, 'muni2', null
    union all select 'guid3', null, null, 'muni3'
)
select guid, coalesce(amajuba, ilembe, sisonke) municipality from forms_garden;
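Applied to the table from the question, the view definition would look something like this (a sketch, assuming the quoted column names shown above):
CREATE VIEW municipality (guid, funder, municipality) AS
SELECT "Garden_GUID",
       "Funder",
       coalesce("Municipality_Amajuba", "Municipality_Ilembe", "Municipality_Sisonke")
FROM forms_garden;
coalesce() returns its first non-null argument, so whichever municipality column is populated ends up in the single municipality column.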

Related

Subquery returning field from second table with between operator using value from first table

I have a problem doing a subquery in PostgreSQL to get a calculated column value; it reports:
[21000] ERROR: more than one row returned by a subquery used as an expression
Situation
I have two tables:
accounts
postal_code_to_state
Accounts table (subset of columns):
 name     | postal_code | state
----------+-------------+-------
 Cust One | 00020       | NULL
 Cust Two | 63076       | CD
Postal Code to State:
 pc_from | pc_to | state
---------+-------+-------
      10 |    30 | AB
   63000 | 63100 | CD
The accounts table has rows where the state value may be unknown, but there is a postal code value. The postal code field is char (but that is incidental).
The postal_code_to_state table has integer pc_from and pc_to columns, i.e. the low/high ends of the postal code range for a state.
There is no common field to do joins. To get the state from the postal_code_to_state table the char field is cast to INT and the between operator used, e.g.
SELECT state
FROM postal_code_to_state
WHERE CAST('00020' AS INT) BETWEEN pc_from AND pc_to
This works OK; there is also a unique index on pc_from and pc_to.
But I need to run a query that selects from the accounts table and populates the state column from the state column in postal_code_to_state, using the postal_code from accounts to select the appropriate row.
I can't figure out why PostgreSQL complains about the subquery returning multiple rows. This is the query I am currently using:
SELECT id,
       name,
       postal_code,
       state,
       (SELECT state
        FROM postal_code_to_state
        WHERE CAST(accounts.postal_code AS INT) BETWEEN pc_from AND pc_to) AS new_state
FROM accounts
WHERE postal_code IS NOT NULL;
If I use LIMIT 1 in the subquery it is OK, and it returns the correct state value from postal_code_to_state, but I would like to have it working without needing to do that.
UPDATE 2022-10-22
@Adrian - thanks for the query to find duplicates. I had to change your query a little: the != 'empty' to != FALSE.
When I run it on my data I get this; each group of two rows (1 & 2, 3 & 4, etc.) shows an overlapping range.
 state | pc_from | pc_to
-------+---------+-------
 CA    |    9010 |  9134
 OR    |    9070 |  9170
 UD    |   33010 | 33100
 PN    |   33070 | 33170
 TS    |   34010 | 34149
 GO    |   34070 | 34170
 CB    |   86010 | 86100
 IS    |   86070 | 86170
So if I run...
SELECT pc_from,
       pc_to,
       state
FROM postal_code_to_state
WHERE int4range(pc_from, pc_to) @> 9070;
I get...
 pc_from | pc_to | state
---------+-------+-------
    9010 |  9134 | CA
    9070 |  9170 | OR
So, from the PostgreSQL side, the problem is clear - obviously it is the data. On the point of the data, what is shown on a site that has Italian ZIP code information is interesting:
https://zip-codes.nonsolocap.it/cap?k=12071&b=&c=
This was one of the dupes I had already removed.
The exact same ZIP code is used in two completely different provinces (states) - go figure! Given that the ZIP code is meant to resolve down to the street level, I can't see how one code can be valid for two localities.
Try this query to find duplicates:
select
a_tbl.state, a_tbl.pc_from, a_tbl.pc_to
from
postal_code_to_state as a_tbl,
(select * from postal_code_to_state) as b_tbl
where
a_tbl.state != b_tbl.state
and
int4range(a_tbl.pc_from, a_tbl.pc_to, '[]') && int4range(b_tbl.pc_from, b_tbl.pc_to, '[]') != 'empty';
If there are duplicates, then after clearing them you can do:
alter table postal_code_to_state
    add constraint exclude_test EXCLUDE USING GIST (int4range(pc_from, pc_to, '[]') WITH &&);
This will set up an exclusion constraint to prevent overlapping ranges.
So:
insert into postal_code_to_state values (10, 30, 'AB'), (63000, 63100, 'CD');
insert into postal_code_to_state values (25, 40, 'SK');
ERROR: conflicting key value violates exclusion constraint "exclude_test"
insert into postal_code_to_state values (31, 40, 'SK');
INSERT 0 1
select * from postal_code_to_state ;
pc_from | pc_to | state
---------+-------+-------
10 | 30 | AB
63000 | 63100 | CD
31 | 40 | SK
The combined unique index would not protect you against overlapping postal code ranges, only exact duplicates. First, I'd write the query like this:
SELECT id, name, postal_code, coalesce(accounts.state, postal_code_to_state.state) AS state
FROM accounts
LEFT JOIN postal_code_to_state ON accounts.postal_code::integer BETWEEN pc_from AND pc_to
WHERE accounts.state IS NOT NULL OR postal_code_to_state.state IS NOT NULL;
You could modify it to tell you which accounts match more than one (i.e. overlapping) range:
SELECT id, coalesce(accounts.state, postal_code_to_state.state) state
FROM accounts
LEFT JOIN postal_code_to_state ON accounts.postal_code::Integer BETWEEN pc_from AND pc_to
WHERE accounts.state IS NOT NULL OR postal_code_to_state.state IS NOT NULL
GROUP BY id,state
HAVING count(id) > 1;
I haven't tested any of this.

How to break apart a column that includes keys and values into separate columns in Postgres

I am new to Postgres and basically have no experience. I have a table with a column that includes keys and values. I need to write a query which returns a table with all the columns of the table, plus additional columns using each key as a column name with its value underneath.
My input is like:
id    | name | message
12478 | A    | {img_type:=png,key_id:=f235, client_status:=active, request_status:=open}
12598 | B    | {img_type:=none,address_id:=c156, client_status:=active, request_status:=closed}
output will be:
id    | name | img_type | key_id | address_id | client_status | request_status
12478 | A    | png      | f235   | NULL       | active        | open
12598 | B    | none     | NULL   | c156       | active        | closed
Any help would be greatly appreciated.
The only thing I can think of is a regular expression to extract the key/value pairs.
select id, name,
(regexp_match(message, '(img_type:=)([^,}]+),{0,1}'))[2] as img_type,
(regexp_match(message, '(key_id:=)([^,}]+),{0,1}'))[2] as key_id,
(regexp_match(message, '(client_status:=)([^,}]+),{0,1}'))[2] as client_status,
(regexp_match(message, '(request_status:=)([^,}]+),{0,1}'))[2] as request_status
from the_table;
regexp_match returns an array of matches. As the regex contains two groups (one for the "key" and one for the "value"), the [2] takes the second element of the array.
This is quite expensive and error prone (e.g. if any of the values contains a , you also need to deal with quoted values). If you have any chance to change the application that stores the value, you should seriously consider changing your code to store a proper JSON value, e.g.
{"img_type": "png", "key_id": "f235", "client_status": "active", "request_status": "open"}
Then you can use e.g. message ->> 'img_type' to retrieve the value for the key img_type.
You might also want to consider a properly normalized table, where each of those keys is a real column.
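If you cannot change the stored format, another option (a sketch, untested against edge cases such as commas inside values) is to exploit that the format is close to hstore syntax: strip the braces and replace := with =>, then let the hstore extension do the parsing. This assumes the hstore extension is available (CREATE EXTENSION hstore):
select id, name,
       h -> 'img_type'       as img_type,
       h -> 'key_id'         as key_id,
       h -> 'address_id'     as address_id,
       h -> 'client_status'  as client_status,
       h -> 'request_status' as request_status
from the_table,
     -- strip the braces, turn ":=" into hstore's "=>", and let the cast parse the pairs
     lateral (select replace(btrim(message, '{}'), ':=', '=>')::hstore as h) as x;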
I can do it with a function.
I am not sure about the performance, but here is my suggestion:
CREATE TYPE log_type AS (img_type TEXT, key_id TEXT, address_id TEXT, client_status TEXT, request_status TEXT);
CREATE OR REPLACE FUNCTION populate_log(data TEXT)
RETURNS log_type AS
$func$
DECLARE
r log_type;
BEGIN
select x.* into r
from
(
select
json_object(array_agg(array_data)) as json_data
from (
select unnest(string_to_array(trim(unnest(string_to_array(substring(populate_log.data, '[^{}]+'), ','))), ':=')) as array_data
) d
) d2,
lateral json_to_record(json_data) as x(img_type text, key_id text, address_id text, client_status text, request_status text);
RETURN r;
END
$func$ LANGUAGE plpgsql;
with log_data (id, name, message) as (
values
(12478, 'A', '{img_type:=png,key_id:=f235, client_status:=active, request_status:=open}'),
(12598, 'B', '{img_type:=none,address_id:=c156, client_status:=active, request_status:=closed}')
)
select id, name, l.*
from log_data, lateral populate_log(message) as l;
What you finally write in your query will be something like this (imagine the data is in a table named log_data):
select id, name, l.*
from log_data, lateral populate_log(message) as l;
I suppose the message column is text; in Postgres it might be an array, in which case you have to remove some of the conversions: string_to_array(substring(populate_log.data)) becomes populate_log.data.

Looking for a value in a jsonb list of keys/values

I have a PostgreSQL table of cities (1 row = 1 city) with a jsonb column containing the name of the city in different languages (as a list, not an array). For example, for Paris (France) I have:
id_city (integer) = 7444
name_city (text) = Paris
names_i18n (jsonb) = {"name:fr":"Paris","name:zh":"巴黎","name:it":"Parigi",...}
In reality my table has around 20 different languages. So I am trying to find a city by looking for any name:xx value that matches a parameter given by the user, but I can't find how to query the jsonb column that way. I've tried something like the request below, but it doesn't seem to be the right syntax:
select * from jsonb_each_text(select names_i18n from CityTable)
where value ilike 'Parigi'
I have also tried the following
select * from CityTable where names_i18n ? 'Parigi';
But it seems to work only on the keys of the jsonb. Is there any similar operator for the values? I also need a way to know which name:XX key matched, not only the city name.
Does anyone have a clue?
with CityTable (id_city, name_city, names_i18n) as (values(
7444, 'Paris',
'{"name:fr":"Paris","name:zh":"巴黎","name:it":"Parigi"}'::jsonb
))
select *
from CityTable, jsonb_each_text(names_i18n) jbet (key, value)
where value ilike 'Parigi'
;
id_city | name_city | names_i18n | key | value
---------+-----------+--------------------------------------------------------------+---------+--------
7444 | Paris | {"name:fr": "Paris", "name:it": "Parigi", "name:zh": "巴黎"} | name:it | Parigi
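On PostgreSQL 12 or later, a jsonpath predicate can do the filtering without expanding every key/value pair. A sketch, assuming the same table; like_regex with flag "i" stands in for the case-insensitive match that ilike provided:
select id_city, name_city
from CityTable
-- "$.*" visits every value in the object; the filter keeps rows where any value matches
where jsonb_path_exists(names_i18n, '$.* ? (@ like_regex "^Parigi$" flag "i")');
Note this only filters the rows; to know which name:XX key matched, you still need the jsonb_each_text() expansion shown above.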

Calculate table column using another table on server side

Suppose there are two tables in the db:
Table registries:
    Column    |            Type             | Modifiers
--------------+-----------------------------+-----------
 registry_id  | integer                     | not null
 name         | character varying           | not null
 ...
 uploaded_at  | timestamp without time zone | not null
Table rows:
    Column    |            Type             | Modifiers
--------------+-----------------------------+-----------
 row_id       | character varying           | not null
 registry_id  | integer                     | not null
 row          | character varying           | not null
In the real world, registries is just a CSV file and rows holds the lines of the files. In my scala-slick application, I want to know how many lines are in each file.
registries:
1,foo,...
2,bar,...
3,baz,...
rows:
aaa,1,...
bbb,1,...
ccc,2,...
desired result:
1,foo,... - 2
2,bar,... - 1
3,baz,... - 0
My code now is (slick-3.0):
def getRegistryWithLength(rId: Int) = {
  val q1 = registries.filter(_.registryId === rId).take(1).result.headOption
  val q2 = rows.filter(_.registryId === rId).length.result
  val registry = Await.result(db.run(q1), 5.seconds)
  val length = Await.result(db.run(q2), 5.seconds)
  (registry, length)
}
(Await is a bad idea, I know.)
How can I do getRegistryWithLength using a single SQL query?
I could add a column row_n to the registries table, but then I'd be forced to update row_n after every delete/insert on the rows table.
How can I make the db server calculate the row_n column in registries automatically?
The basic query could be:
SELECT r.*, COALESCE(n.ct, 0) AS ct
FROM registry r
LEFT JOIN (
SELECT registry_id, count(*) AS ct
FROM rows
GROUP BY registry_id
) n USING (registry_id);
The LEFT [OUTER] JOIN is essential so that you do not filter out rows from registry that have no related rows in rows.
COALESCE returns 0 instead of NULL where no related rows are found.
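As a sketch, the same result can be had with a correlated scalar subquery (a count(*) subquery always returns exactly one row, so no COALESCE is needed); for large tables the aggregated LEFT JOIN above usually performs better:
SELECT r.*,
       (SELECT count(*) FROM rows WHERE rows.registry_id = r.registry_id) AS ct
FROM registry r;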
There are many related answers on SO. One here:
SQL: How to save order in sql query?
You could wrap this in a VIEW for convenience:
CREATE VIEW reg_rn AS
SELECT ...
... which you query like a table.
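For example, assuming the view wraps the count query above:
SELECT * FROM reg_rn WHERE registry_id = 1;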
Aside: It's unwise to use reserved SQL keywords as identifiers. row is a no-go for a column name (even if allowed in Postgres).
Thanks Erwin Brandstetter for the awesome answer; using it, I wrote the code for my scala-slick application.
The Scala code looks much more complicated than plain SQL:
val registryQuery = registries.filter(_.userId === userId)
val rowQuery = rows groupBy(_.registryId) map { case (regId, rowItems) => (regId, rowItems.length)}
val q = registryQuery joinLeft rowQuery on (_.registryId === _._1) map {
case (registry, rowsCnt) => (registry, rowsCnt.map(_._2))
}
but it works!

Why does a SELECT with a WHERE clause return 0 rows on a Cassandra table? (it should return 2 rows)

I created a minimal example users table on a Cassandra 2.0.9 database. I can use SELECT to select all its rows, but I do not understand why adding my WHERE clause (on an indexed column) returns 0 rows.
(I also do not get why the 'CONTAINS' statement causes an error here, as presented below, but let's assume this is not my primary concern.)
DROP TABLE IF EXISTS users;
CREATE TABLE users
(
KEY varchar PRIMARY KEY,
password varchar,
gender varchar,
session_token varchar,
state varchar,
birth_year bigint
);
INSERT INTO users (KEY, gender, password) VALUES ('jessie', 'f', 'avlrenfls');
INSERT INTO users (KEY, gender, password) VALUES ('kate', 'f', '897q7rggg');
INSERT INTO users (KEY, gender, password) VALUES ('mike', 'm', 'mike123');
CREATE INDEX ON users (gender);
DESCRIBE TABLE users;
Output:
CREATE TABLE users (
key text,
birth_year bigint,
gender text,
password text,
session_token text,
state text,
PRIMARY KEY ((key))
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.100000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.000000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
CREATE INDEX users_gender_idx ON users (gender);
This SELECT works OK:
SELECT * FROM users;
key | birth_year | gender | password | session_token | state
--------+------------+--------+-----------+---------------+-------
kate | null | f | 897q7rggg | null | null
jessie | null | f | avlrenfls | null | null
mike | null | m | mike123 | null | null
And this does not:
SELECT * FROM users WHERE gender = 'f';
(0 rows)
This also fails:
SELECT * FROM users WHERE gender CONTAINS 'f';
Bad Request: line 1:33 no viable alternative at input 'CONTAINS'
It sounds like your index may have become corrupt. Try rebuilding it. Run this from a command prompt:
nodetool rebuild_index yourKeyspaceName users users_gender_idx
However, the larger issue here is that secondary indexes are known to perform poorly. Some have even identified their use as an anti-pattern. DataStax has a document designed to guide you in appropriate use of secondary indexes. And this is definitely not one of them.
creating an index on an extremely low-cardinality column, such as a boolean column, does not make sense. Each value in the index becomes a single row in the index, resulting in a huge row for all the false values, for example. Indexing a multitude of indexed columns having foo = true and foo = false is not useful.
While gender may not be a boolean column, it has the same cardinality. A secondary index on this column is a terrible idea.
If querying by gender is something you really need to do, then you may need to find a different way to model or partition your data. For instance, PRIMARY KEY (state, gender, key) will allow you to query gender by state.
SELECT * FROM users WHERE state='WI' and gender='f';
That would return all female users from the state of Wisconsin. Of course, that would mean you would also have to query each state individually. But the bottom line is that Cassandra does not handle queries on low-cardinality keys/indexes well, so you have to be creative in how you solve these types of problems.
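A sketch of that alternative model (the table name users_by_state is hypothetical, and the columns are reduced to those from the question):
CREATE TABLE users_by_state (
    state text,
    gender text,
    key text,
    password text,
    -- state is the partition key; gender and key are clustering columns
    PRIMARY KEY ((state), gender, key)
);
With state as the partition key, the SELECT above becomes a single-partition query instead of a cluster-wide scan of a low-cardinality index.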