Search inside full search column using certain letters - Crate

I want to search inside a fulltext-indexed column using only some of the letters of a word, i.e.:
select "Name","Country","_score" from datatable where match("Country", 'China');
returns many rows and works fine. My question is, how can I search for a prefix instead, for example:
select "Name","Country","_score" from datatable where match("Country", 'Ch');
so that I see China, Chile, etc.?
I think the match_type phrase_prefix could be the answer, but I don't know the correct syntax to use it.

The match predicate supports different match types via using match_type [with (match_parameter = [value])].
So in your example using the phrase_prefix match type:
select "Name","Country","_score" from datatable where match("Country", 'Ch') using phrase_prefix;
gives you your desired results.
See the match predicate documentation: https://crate.io/docs/en/latest/sql/fulltext.html?#match-predicate
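Match parameters can be tuned through the optional with clause from the syntax above. A minimal sketch, assuming the max_expansions parameter is available for the phrase_prefix match type (it limits how many terms the prefix is expanded to):
select "Name","Country","_score" from datatable
where match("Country", 'Ch') using phrase_prefix with (max_expansions = 10);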

If you just need to match the beginning of a string column, you don't need a fulltext analyzed column. You can use the LIKE operator instead, e.g.:
cr> create table names_table (name string, country string);
CREATE OK (0.840 sec)
cr> insert into names_table (name, country) values ('foo', 'China'), ('bar','Chile'), ('foobar', 'Austria');
INSERT OK, 3 rows affected (0.049 sec)
cr> select * from names_table where country like 'Ch%';
+---------+------+
| country | name |
+---------+------+
| Chile   | bar  |
| China   | foo  |
+---------+------+
SELECT 2 rows in set (0.037 sec)

Related

Aggregate function to extract all fields based on maximum date

In one table I have duplicate values that I would like to group, exporting only those fields from the row where the value in the "published_at" field is the most up-to-date (the latest date possible). Do I understand it correctly that when I use the MAX aggregate function, the corresponding fields I extract will refer to the row holding the maximum, or will it take the first row found in the table?
Let me demonstrate this with a simple example (in the real-world case I am also joining two different tables). I would like to group by id and extract all fields, but only those relating to the row with the maximum published_at value. My query would be:
SELECT "t1"."id", "t1"."field", MAX("t1"."published_at") as "published_at"
FROM "t1"
GROUP BY "t1"."id";
| id | field     | published_at |
----------------------------------
|  1 | document1 | 2022-01-10   |
|  1 | document2 | 2022-01-11   |
|  1 | document3 | 2022-01-12   |
The result I want is:
1 - document3 - 2022-01-12
One more question: why am I getting the error "ERROR: column "t1"."field" must appear in the GROUP BY clause or be used in an aggregate function"? Can I use the MAX function on a string-type column?
If you want the latest row for each id, you can use DISTINCT ON. For example:
select distinct on (id) *
from t
order by id, published_at desc
If you just want the latest row in the whole result set you can use LIMIT. For example:
select *
from t
order by published_at desc
limit 1
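As to the error: in a GROUP BY query, every selected column must either appear in the GROUP BY clause or be wrapped in an aggregate, and "t1"."field" does neither. And yes, MAX does work on string columns, but it picks the maximum by sort order, which is not necessarily the value from the row with the latest date. A minimal runnable sketch of the DISTINCT ON approach against the sample data from the question:
create table t1 (id int, field text, published_at date);
insert into t1 values
    (1, 'document1', '2022-01-10'),
    (1, 'document2', '2022-01-11'),
    (1, 'document3', '2022-01-12');

select distinct on (id) *
from t1
order by id, published_at desc;
-- id |   field   | published_at
-- ----+-----------+--------------
--   1 | document3 | 2022-01-12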

How to use ts_query with ANY(anyarray)

I currently have a query in PostgreSQL like:
SELECT
    name
FROM
    ingredients
WHERE
    name = ANY('{"string value",tomato,other}');
My ingredients table is simply a list of names:
name
----------
jalapeno
tomatoes
avocados
lime
My issue is that plural values will not match their singular forms in the query (e.g. 'tomato' will not match 'tomatoes'). To solve this, I created a tsvector column on the table:
name     | tokens
---------+--------------
jalapeno | 'jalapeno':1
tomatoes | 'tomato':1
avocados | 'avocado':1
lime     | 'lime':1
I'm able to correctly query single values from the table like this:
SELECT
    name,
    ts_rank_cd(tokens, plainto_tsquery('tomato'), 16) AS rank
FROM
    ingredients
WHERE
    tokens @@ plainto_tsquery('tomato')
ORDER BY
    rank DESC;
However, I need to query values from the entire array. The array is generated by another function, so I have control over the type of each of the items in the array.
How can I use the @@ operator with ANY(anyarray)?
That should be straightforward:
WHERE tokens @@ ANY
    (ARRAY[
        plainto_tsquery('tomato'),
        plainto_tsquery('celery'),
        plainto_tsquery('vodka')
    ])
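For illustration, here is that condition slotted into the earlier ranked query (a sketch, assuming the same ingredients table and tokens column; the rank is still computed against a single sample tsquery):
SELECT
    name,
    ts_rank_cd(tokens, plainto_tsquery('tomato'), 16) AS rank
FROM
    ingredients
WHERE
    tokens @@ ANY (ARRAY[
        plainto_tsquery('tomato'),
        plainto_tsquery('celery'),
        plainto_tsquery('vodka')
    ])
ORDER BY
    rank DESC;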

Looking for a value in a jsonb list of keys/values

I have a PostgreSQL table of cities (1 row = 1 city) with a jsonb column containing the name of the city in different languages (as key/value pairs, not an array). For example for Paris (France) I have:
id_city (integer) = 7444
name_city (text) = Paris
names_i18n (jsonb) = {"name:fr":"Paris","name:zh":"巴黎","name:it":"Parigi",...}
In reality my table has around 20 different languages. So I try to find a city by looking for any name:xx value that matches a parameter given by the user, but I can't find how to query the jsonb column that way. I've tried something like the request below, but it doesn't seem to be the right syntax:
select * from jsonb_each_text(select names_i18n from CityTable)
where value ilike 'Parigi'
I have also tried the following
select * from CityTable where names_i18n ? 'Parigi';
But it seems to work only on the key part of the jsonb; is there any similar operator for the value part? I also need a way to know which name:XX was matched, not only the city name.
Does anyone have a clue?
with CityTable (id_city, name_city, names_i18n) as (
    values (
        7444, 'Paris',
        '{"name:fr":"Paris","name:zh":"巴黎","name:it":"Parigi"}'::jsonb
    )
)
select *
from CityTable, jsonb_each_text(names_i18n) jbet (key, value)
where value ilike 'Parigi';
 id_city | name_city | names_i18n                                                   | key     | value
---------+-----------+--------------------------------------------------------------+---------+--------
    7444 | Paris     | {"name:fr": "Paris", "name:it": "Parigi", "name:zh": "巴黎"}  | name:it | Parigi
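The same lateral call can be spelled out explicitly against the real table, which also keeps the matched language key in the output (a sketch, assuming your CityTable):
select c.id_city, c.name_city, l.key as lang, l.value as matched_name
from CityTable c
cross join lateral jsonb_each_text(c.names_i18n) as l(key, value)
where l.value ilike 'Parigi';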

How to convert tsvector?

A typical and relevant application of tsvector is to query and summarize information about the set of words that occur and their frequency... and JSONB is a natural choice to represent the tsvector datatype for these "querying applications". So:
Is there a simple workaround to cast tsvector into JSONB?
Example: counting the global frequency of words over cached tsvectors would be something like this query:
SELECT r.key AS word, SUM(r.value) AS occurrences
FROM (
    SELECT jsonb_each(kx_tsvectot::jsonb) AS r FROM terms
) t
GROUP BY 1;
You can use the ts_stat() function, which will give you exactly what you need:
- word text — the value of a lexeme
- ndoc integer — number of documents (tsvectors) the word occurred in
- nentry integer — total number of occurrences of the word
An example might be the following:
CREATE TABLE t (
    tsv TSVECTOR
);
INSERT INTO t VALUES
    ('word'::TSVECTOR),
    ('second word'::TSVECTOR),
    ('third word'::TSVECTOR);
SELECT * FROM ts_stat('SELECT tsv FROM t');
Result:
  word  | ndoc | nentry
--------+------+--------
 word   |    3 |      3
 third  |    1 |      1
 second |    1 |      1
(3 rows)
If you still want to convert the result to jsonb, you can cast word from text to jsonb.
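For instance, a minimal sketch (table and column names as in the example above) that folds the ts_stat() output into one jsonb object of word/occurrence pairs using jsonb_object_agg:
SELECT jsonb_object_agg(word, nentry) AS occurrences
FROM ts_stat('SELECT tsv FROM t');
-- {"word": 3, "third": 1, "second": 1}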

PostgreSQL query results depending on a few rows of the same table

I'm working on an application, and we're using Postgres as our DB. I don't have a lot of experience with SQL, and now I've encountered a problem that I can't find an answer to.
So here's the problem:
We have privacy settings stored in a separate table, and the accessibility of each row of data depends on a few rows of this privacy table.
Basically the structure of the privacy table is:
 entityId | entityType | privacyId | privacyType | allow | deletedAt
----------+------------+-----------+-------------+-------+-----------
        5 | user       |         6 | user        | f     |           // example entry
        5 | user       |         1 | user_all    | t     |
In short, these settings mean that user id 5 allows everybody except user id 6 to access his data.
So I get the available data with a query like:
SELECT <some_relevant_fields> FROM <table>
JOIN <join>
WHERE
    (privacy."privacyId" = 6 AND privacy."privacyType" = 'user' AND privacy.allow = true)
    OR (
        (privacy."privacyType" = 'user_all' AND privacy."deletedAt" IS NOT NULL)
        AND
        (privacy."privacyType" = 'user' AND privacy."privacyId" = 6 AND privacy.allow != false)
    );
I know that this query is incorrect in this form, but I want you to get the idea of what I'm trying to achieve.
It must check for a row with the matching type/id and allow=true, OR check that user_all is not deleted (the deletedAt field is null) and that there is no row restricting access with allow=false for this user.
But it seems like Postgres is chaining all the expressions, so it overrides privacy."privacyType"='user_all' with 'user' at the end of the expression, and either returns no results, or returns data even if the user is "blocked", because 'user_all' exists.
Is there a way to write a WHERE clause that returns a result if AND expressions are true for two different rows? For example, in the code above, (privacy."privacyType"='user_all' AND privacy."deletedAt" IS NOT NULL) would be true for one row AND (privacy."privacyType"='user' AND privacy."privacyId"=6 AND privacy.allow!=false) true for another. Or maybe check for the absence of a row with these values?
Is this what you want?
select <some_fields> from <table> where
privacyType='user_all' AND deletedAt IS NOT NULL
union
select <some_fields> from <table> where
privacyType='user' AND privacyId=6 AND allow<>'f';
You can LEFT JOIN the table with itself and then, in the WHERE clause, keep only the rows that don't have a match.
SELECT p1.*
FROM privacy p1
LEFT JOIN privacy p2
    ON  p1."entityId" = p2."entityId"
    AND p2."privacyType" = 'user'
    AND p2."privacyId" = 6
    AND p2.allow = false
WHERE
    p1."privacyType" = 'user_all'
    AND p1."deletedAt" IS NULL
    -- no blocking row was joined, i.e. the user is not explicitly denied
    AND p2."privacyId" IS NULL
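The same anti-join can also be written with NOT EXISTS, which states the "no blocking row" condition directly (a sketch against the same privacy table):
SELECT p1.*
FROM privacy p1
WHERE p1."privacyType" = 'user_all'
  AND p1."deletedAt" IS NULL
  AND NOT EXISTS (
      SELECT 1
      FROM privacy p2
      WHERE p2."entityId" = p1."entityId"
        AND p2."privacyType" = 'user'
        AND p2."privacyId" = 6
        AND p2.allow = false
  );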