Postgres - is it possible to group by a substring of one of my fields?

This is my table:
   Column    |         Type         | Modifiers
-------------+----------------------+--------------------------------------------------------------------
 id          | integer              | not null default nextval('frontend_prescription_id_seq'::regclass)
 actual_cost | double precision     | not null
 chemical_id | character varying(9) | not null
 practice_id | character varying(6) | not null
I'd like to query results for a particular practice_id, and then sum the actual_cost by date and by the first two characters of the chemical_id. Is this possible in Postgres?
In other words, I'd like the output to look something like this:
processing_date | cost | chemical_id_substr
----------------+------+-------------------
01-01-2010      | 1234 | 01
01-02-2010      | 4366 | 01
01-01-2010      | 3827 | 02
01-02-2010      | 8768 | 02
This is my current query, but it groups by the whole of chemical_id, not the substring:
query = "SELECT SUM(actual_cost) as cost, processing_date, "
query += "chemical_id as id FROM frontend_items"
query += " WHERE practice_id=%s "
query += "GROUP BY processing_date, chemical_id"
cursor.execute(query, (practice_id,))
I'm not sure how to change this to group by substring, or whether I should add a functional index, or whether I should just denormalise my table and add a new column. Thanks for any help.

You can do this, but you also need to make sure the substring is used in the select list, not the complete column:
SELECT SUM(actual_cost) as cost,
       processing_date,
       left(chemical_id, 2) as id --<< use the same expression here as in the GROUP BY
FROM frontend_items
WHERE practice_id = %s
GROUP BY processing_date, left(chemical_id, 2);
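Since the question also asks about a functional index: if this aggregation runs frequently, an expression index matching the filter and grouping expressions may help the planner. A minimal sketch, assuming the table from the question (the index name is made up):
-- hypothetical expression index for the query above
CREATE INDEX frontend_items_practice_date_chem_idx
    ON frontend_items (practice_id, processing_date, left(chemical_id, 2));
With that in place, denormalising the prefix into a new column is probably unnecessary.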

Related

Aggregate function to extract all fields based on maximum date

In one table I have duplicate values that I would like to group, exporting only the fields from the row whose "published_at" value is the most recent (the latest date possible). Do I understand correctly that if I use the MAX aggregate function, the other fields I extract will refer to the row where the max was found, or will it take the first row found in the table?
Let me demonstrate this with a simple example (in the real-world case I am also joining two different tables). Given this sample data in t1:
| id | field     | published_at |
---------------------------------
| 1  | document1 | 2022-01-10   |
| 1  | document2 | 2022-01-11   |
| 1  | document3 | 2022-01-12   |
I would like to group by id and extract all fields, but only those from the row with the latest published_at. My query would be:
SELECT "t1"."id", "t1"."field", MAX("t1"."published_at") as "published_at"
FROM "t1"
GROUP BY "t1"."id"
The result I want is:
1 - document3 - 2022-01-12
One more question: why am I getting the error "ERROR: column "t1"."field" must appear in the GROUP BY clause or be used in an aggregate function"? Can I use the MAX function on a string-type column?
If you want the latest row for each id, you can use DISTINCT ON. For example:
select distinct on (id) *
from t
order by id, published_at desc
If you just want the latest row in the whole result set you can use LIMIT. For example:
select *
from t
order by published_at desc
limit 1
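As for the error: MAX() does work on string columns (text values are compared using the column's collation), but every non-aggregated column in the select list must appear in GROUP BY, which is why "t1"."field" is rejected; and even when grouped, MAX("published_at") would not pull field from the same row. To see DISTINCT ON against the sample data above, a minimal self-contained sketch (table name t1 as in the question):
with t1 (id, field, published_at) as (
    values (1, 'document1', date '2022-01-10'),
           (1, 'document2', date '2022-01-11'),
           (1, 'document3', date '2022-01-12')
)
select distinct on (id) *
from t1
order by id, published_at desc;
-- returns: 1 | document3 | 2022-01-12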

Looking for a value in a jsonb list of keys/values

I have a PostgreSQL table of cities (1 row = 1 city) with a jsonb column containing the name of the city in different languages (as key/value pairs, not an array). For example, for Paris (France) I have:
id_city (integer) = 7444
name_city (text) = Paris
names_i18n (jsonb) = {"name:fr":"Paris","name:zh":"巴黎","name:it":"Parigi",...}
In reality my table has around 20 different languages. So I am trying to find a city by looking for any name:xx value that matches a parameter given by the user, but I can't find how to query the jsonb column that way. I've tried something like the query below, but it doesn't seem to be the right syntax:
select * from jsonb_each_text(select names_i18n from CityTable)
where value ilike 'Parigi'
I have also tried the following
select * from CityTable where names_i18n ? 'Parigi';
But it seems to work only on the key part of the jsonb. Is there a similar operator for the value part? I also need a way to know which name:XX was found, not only the city name. Does anyone have a clue?
with CityTable (id_city, name_city, names_i18n) as (values(
7444, 'Paris',
'{"name:fr":"Paris","name:zh":"巴黎","name:it":"Parigi"}'::jsonb
))
select *
from CityTable, jsonb_each_text(names_i18n) jbet (key, value)
where value ilike 'Parigi'
;
id_city | name_city | names_i18n | key | value
---------+-----------+--------------------------------------------------------------+---------+--------
7444 | Paris | {"name:fr": "Paris", "name:it": "Parigi", "name:zh": "巴黎"} | name:it | Parigi
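If you only need the matching cities (one row per city even if several translations matched), an EXISTS form is a possible variation on the same idea, reusing the question's table and column names:
select c.*
from CityTable c
where exists (
    select 1
    from jsonb_each_text(c.names_i18n) jbet (key, value)
    where jbet.value ilike 'Parigi'
);
The jsonb_each_text() call in the answer above is also what tells you which key matched (name:it here), via its key output column.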

PostgreSQL convert varchar to numeric and get average

I have a column that I want to get an average of; the column is varchar(200). I keep getting the error below. How do I convert the column to numeric and get an average of it?
Values in the column look like
16,000.00
15,000.00
16,000.00 etc
When I execute
select CAST((COALESCE( bonus,'0')) AS numeric)
from tableone
... I get
ERROR: invalid input syntax for type numeric:
The standard way to represent (as text) a numeric in SQL is something like:
16000.00
15000.00
16000.00
So, your commas in the text are hurting you.
The most sensible way to solve this problem would be to store the data just as a numeric instead of using a string (text, varchar, character) type, as already suggested by a_horse_with_no_name.
However, assuming this is done for a good reason (such as having inherited a design you cannot change), one possibility is to get rid of all the characters that are not a minus sign, digit, or period before casting to numeric:
Let's assume this is your input data
CREATE TABLE tableone
(
bonus text
) ;
INSERT INTO tableone(bonus)
VALUES
('16,000.00'),
('15,000.00'),
('16,000.00'),
('something strange 25'),
('why do you actually use a "text" column if you could just define it as numeric(15,0)?'),
(NULL) ;
You can remove all the extraneous characters with regexp_replace and the proper regular expression ([^-0-9.]), applied globally (the 'g' flag):
SELECT
CAST(
COALESCE(
NULLIF(
regexp_replace(bonus, '[^-0-9.]+', '', 'g'),
''),
'0')
AS numeric)
FROM
tableone ;
| coalesce |
| -------: |
| 16000.00 |
| 15000.00 |
| 16000.00 |
| 25 |
| 150 |
| 0 |
See what happens to the long "numeric(15,0)" row: its 15,0 becomes 150, which may NOT be what you want.
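To get the average the question actually asks for, the same cleanup expression can simply be wrapped in AVG(); a sketch against the tableone sample above:
SELECT AVG(
         CAST(
           COALESCE(
             NULLIF(regexp_replace(bonus, '[^-0-9.]+', '', 'g'), ''),
             '0')
           AS numeric)
       ) AS avg_bonus
FROM tableone ;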
I'm going to go out on a limb and say that it might be because you have empty strings rather than NULLs in your column; this would result in the error you are seeing. Try wrapping the column name in a NULLIF:
SELECT CAST(coalesce(NULLIF(bonus, ''), '0') AS numeric) as new_field
But I would really question a schema that stores numeric values in a varchar column...

How to convert tsvector?

A typical application of tsvector is to query and summarize information about the set of words that occur and their frequencies... and JSONB is a natural choice (!) for representing the tsvector datatype in such "querying applications". So:
Is there a simple workaround to cast tsvector to JSONB?
Example: counting the global frequency of words across cached tsvector values would be something like this query:
SELECT r.key as word, SUM(r.value) as occurrences
FROM (
SELECT jsonb_each(kx_tsvectot::jsonb) as r FROM terms
) t
GROUP BY 1;
You can use the ts_stat() function, which will give you exactly what you need:
word text — the value of a lexeme
ndoc integer — number of documents (tsvectors) the word occurred in
nentry integer — total number of occurrences of the word
An example might be the following:
CREATE TABLE t (
tsv TSVECTOR
);
INSERT INTO t VALUES
('word'::TSVECTOR),
('second word'::TSVECTOR),
('third word'::TSVECTOR);
SELECT * FROM
ts_stat('SELECT tsv FROM t');
Result:
word | ndoc | nentry
--------+------+--------
word | 3 | 3
third | 1 | 1
second | 1 | 1
(3 rows)
If you still want JSONB, you can cast the word column from text to jsonb, or aggregate the whole ts_stat() result into a single JSONB object.
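A minimal sketch of that aggregation, assuming the table t above (jsonb_object_agg builds one JSONB object of word-to-count pairs):
SELECT jsonb_object_agg(word, nentry) AS word_counts
FROM ts_stat('SELECT tsv FROM t');
-- {"second": 1, "third": 1, "word": 3}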

Is it possible in PL/pgSQL to evaluate a string as an expression, not a statement?

I have two database tables:
# \d table_1
Table "public.table_1"
Column | Type | Modifiers
------------+---------+-----------
id | integer |
value | integer |
date_one | date |
date_two | date |
date_three | date |
# \d table_2
Table "public.table_2"
Column | Type | Modifiers
------------+---------+-----------
id | integer |
table_1_id | integer |
selector | text |
The values in table_2.selector can be one of one, two, or three, and are used to select one of the date columns in table_1.
My first implementation used a CASE:
SELECT value
FROM table_1
INNER JOIN table_2 ON table_2.table_1_id = table_1.id
WHERE CASE table_2.selector
WHEN 'one' THEN
table_1.date_one
WHEN 'two' THEN
table_1.date_two
WHEN 'three' THEN
table_1.date_three
ELSE
table_1.date_one
END BETWEEN ? AND ?
The values for selector are such that I could identify the column of interest as eval(date_#{table_2.selector}), if PL/pgSQL allows evaluation of strings as expressions.
The closest I've been able to find is EXECUTE string, which evaluates entire statements. Is there a way to evaluate expressions?
In a PL/pgSQL function you can dynamically build any expression. That does not apply, however, to the case you describe: the query text must be fully defined before it is executed, whereas here the choice of column is made while the query executes, row by row.
Your CASE query is the best approach. You could try to wrap it in a function, but that would bring no benefit, as the essence of the issue would remain unchanged.
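For completeness, here is a sketch of what dynamic construction looks like when the selector is known before the query runs (the function name values_in_range is made up; this only helps when the column choice is fixed per call, not per row):
CREATE FUNCTION values_in_range(p_selector text, p_from date, p_to date)
RETURNS SETOF integer
LANGUAGE plpgsql AS $$
BEGIN
    -- format('%I', ...) safely quotes the dynamically chosen column name
    RETURN QUERY EXECUTE format(
        'SELECT t1.value
           FROM table_1 t1
           JOIN table_2 t2 ON t2.table_1_id = t1.id
          WHERE t2.selector = $1
            AND t1.%I BETWEEN $2 AND $3',
        'date_' || p_selector)
    USING p_selector, p_from, p_to;
END;
$$;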