How to unnest a single-quoted JSON array in PostgreSQL

I have a PostgreSQL table containing a column (movies) with a JSON array. The column type is text. Below is an example of the table:
name | movies
-----+----------------------
bob  | ['movie1', 'movie2']
mary | ['movie1', 'movie3']
How can I unnest the above table to look like below:
name | movie
-----+-------
bob  | movie1
bob  | movie2
mary | movie1
mary | movie3
Also note that the elements in the JSON array are single-quoted.
I'm using a PostgreSQL database on AWS RDS, engine version 10.17.
Thanks in advance.

That is not JSON, that is "something vaguely inspired by JSON". We don't know how it will deal with things like apostrophes in the titles, or non-ASCII characters, or any of the other things that an actual standard should specify but something vaguely inspired by a standard doesn't.
If you want to ignore such niceties and make something that works on this one example, we could suppress the characters ', [ and ] (done by regexp_replace) and then split/unnest on a comma followed by an optional space (done by regexp_split_to_table).
with t as (
  select 'bob' name, $$['movie1', 'movie2']$$ movies
  union
  select 'mary', $$['movie1', 'movie3']$$
)
select name, movie
from t, regexp_split_to_table(regexp_replace(movies, $$['\[\]]$$, '', 'g'), ', ?') g(movie);
Another slightly more resilient option would be to swap ' for " then use an actual JSON parser:
with t as (
  select 'bob' name, $$['lions, and tigers, and bears', 'movie2']$$ movies
  union
  select 'mary', $$['movie1', 'movie3']$$
)
select name, movie
from t, jsonb_array_elements_text(regexp_replace(movies, $$'$$, $$"$$, 'g')::jsonb) g(movie);
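Applied to the asker's actual table rather than a CTE, a minimal sketch of the jsonb variant (the table name people_movies is an assumption; substitute your own):
-- people_movies(name text, movies text) is assumed to match the sample data
select name, movie
from people_movies,
     jsonb_array_elements_text(
       regexp_replace(movies, $$'$$, $$"$$, 'g')::jsonb  -- swap ' for " so the value parses as JSON
     ) g(movie);
As warned above, this still breaks if a title itself contains an apostrophe.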

Related

Split column to many columns in PostgreSQL

I have a table like:
id | Name | f_data
---+------+-------------------------------------------------------------------------------------------------------------------------------
1  | Raj  | {"review":"perfect", "tech":{"scalability":"complete", "backbonetech":["satellite"], "lastmiletech":["Fiber","Wireless","DSL"] }}
I want to split the f_data column into multiple columns. Expected result:
id | Name | review  | scalability | backbonetech | lastmiletech
---+------+---------+-------------+--------------+-------------------
1  | Raj  | perfect | complete    | satellite    | Fiber,Wireless,DSL
When I try to split the JSON column I can't remove the brackets. My output is:
id | Name | review  | scalability | backbonetech  | lastmiletech
---+------+---------+-------------+---------------+----------------------------
1  | Raj  | perfect | complete    | ["satellite"] | ["Fiber","Wireless","DSL"]
I used this code:
SELECT id, Name,
       f_data -> 'review' ->> 0 as review,
       f_data -> 'tech' ->> 'scalability' as scalability,
       f_data -> 'tech' ->> 'backbonetech' as backbonetech,
       f_data -> 'tech' ->> 'lastmiletech' as lastmiletech
from my_table;
One possibility is to use the json_array_elements_text function to transform the arrays in your JSON into a set of text values, and then use the string_agg function to concatenate the individual elements into a single string.
For lastmiletech, the query might look like:
select
  string_agg(t.lastmiletech, ',') as lastmiletech
from
(
  select
    json_array_elements_text('{"scalability":"complete", "backbonetech":["satellite"], "lastmiletech":["Fiber","Wireless","DSL"]}'::json -> 'lastmiletech') as lastmiletech
) t
You can modify the subquery to include the additional fields you need.
As the name implies, the first parameter of json_array_elements_text has to be a JSON array, so be careful not to convert the array to text before passing it to the function. In other words, f_data->'tech'->>'lastmiletech' will not be accepted.
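Putting it together against the table from the question (my_table and f_data come from the question; the correlated subqueries are just one way to inline the aggregation, and f_data is assumed to be of type json, so add a cast if it is stored as text):
SELECT id, Name,
       f_data ->> 'review' AS review,
       f_data -> 'tech' ->> 'scalability' AS scalability,
       -- aggregate each JSON array into a comma-separated string
       (SELECT string_agg(x, ',')
        FROM json_array_elements_text(f_data -> 'tech' -> 'backbonetech') AS t(x)) AS backbonetech,
       (SELECT string_agg(x, ',')
        FROM json_array_elements_text(f_data -> 'tech' -> 'lastmiletech') AS t(x)) AS lastmiletech
FROM my_table;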

How to perform a search query on a column value containing a string with comma separated values?

I have a table which looks like below
date       | tags                    | name
-----------+-------------------------+------
2018-10-08 | 100.21.100.1, cpu, del  | ZONE1
2018-10-08 | 100.21.100.1, mem, blr  | ZONE2
2018-10-08 | 110.22.100.3, cpu, blr  | ZONE3
2018-10-09 | 110.22.100.3, down, hyd | ZONE2
2018-10-09 | 110.22.100.3, down, del | ZONE1
I want to select the name for those rows which have certain strings in the tags column.
Here the tags column holds strings containing comma-separated values.
For example, I have a list of strings ["down", "110.22.100.3"]. Now if I look up the rows whose tags contain the strings in the list, I should get the last two rows, which have the names ZONE2 and ZONE1 respectively.
Now I know there is something called the IN operator, but I am not quite sure how to use it here.
I tried something like below
select name from zone_table where 'down, 110.22.100.3' in tags;
But I get a syntax error. How do I do it?
You can do something like this.
select name from zone_table where
  string_to_array(replace(tags, ' ', ''), ',') @>
  string_to_array(replace('down, 110.22.100.3', ' ', ''), ',');
1) replace deletes the spaces in the existing string, so that string_to_array produces a clean separation with no spaces at the front of the elements.
2) string_to_array converts your string to an array, splitting on commas.
3) @> is the "contains" operator for arrays.
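As a quick sanity check of the containment logic against the sample data (a standalone sketch):
select string_to_array(replace('110.22.100.3, down, del', ' ', ''), ',')
       @> string_to_array(replace('down, 110.22.100.3', ' ', ''), ',') as matches;
-- returns true: both 'down' and '110.22.100.3' appear among the tags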
(OR)
If you want to match the string as a whole:
select name from zone_table where POSITION('down, 110.22.100.3' in tags) != 0;
For separate matches you can do:
select name from zone_table where POSITION('down' in tags) != 0 and
  POSITION('110.22.100.3' in tags) != 0;
More about POSITION in the PostgreSQL documentation.
We can try using the LIKE operator here, and check for the presence of each tag in the CSV tag list:
SELECT name
FROM zone_table
WHERE ', ' || tags LIKE '%, down,%' AND ', ' || tags LIKE '%, 110.22.100.3,%';
Important Note: It is generally bad practice to store CSV data in your SQL tables, for the very reason that it is unnormalized and makes it hard to work with. It would be much better design to have each individual tag persisted in its own record.
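A sketch of what that normalized design could look like (the table and column names here are made up):
CREATE TABLE zone_tags (
    zone_name text NOT NULL,   -- e.g. ZONE1
    tag       text NOT NULL    -- one tag per row, e.g. 'down' or '110.22.100.3'
);

-- the lookup then becomes a plain filter and aggregate:
SELECT zone_name
FROM zone_tags
WHERE tag IN ('down', '110.22.100.3')
GROUP BY zone_name
HAVING count(DISTINCT tag) = 2;  -- both tags must be present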
I would do a check with array overlapping (&& operator):
SELECT name
FROM zone_table
WHERE string_to_array('down, 110.22.100.3', ',') && string_to_array(tags,',')
Split your string lists (the column values and the compare text 'down, 110.22.100.3') into arrays with string_to_array(). (Of course, if your compare text is already an array, you don't have to split it.)
Now the && operator checks whether the two arrays overlap, i.e. whether at least one array element is part of both arrays (see the PostgreSQL documentation).
Notice:
- "date" is a reserved word in Postgres. I recommend renaming this column.
- In your examples the delimiter of your string lists is ", " and not ",". You should take care of the whitespace: either make your string split delimiter ", " too, or concatenate the strings with a plain ",", which makes some things easier (aside from the fully agreed thoughts about storing the values as string lists made by @TimBiegeleisen). A sketch of the ", " variant follows.
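A minimal sketch of that whitespace fix, assuming the single-space delimiters shown in the sample data:
SELECT name
FROM zone_table
WHERE string_to_array('down, 110.22.100.3', ', ') && string_to_array(tags, ', ');
-- splitting on ', ' keeps the array elements free of leading spaces,
-- so the overlap test compares clean values on both sides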

Why is this PostgreSQL full text search query returning a ts_rank of 0?

Before I invest in using Solr, Lucene, or Sphinx, I wanted to try to implement a search capability on my system using PostgreSQL full text search.
I have a national list of businesses in a table that I want to search. I created a tsvector that combines the business name and city so that I can do a search like "outback atlanta".
I am also implementing an auto-completion function by using the wildcard capability of the search, appending ":*" to the search pattern and inserting " & " between keywords, so the search pattern "outback atl" turns into "outback & atl:*" before getting converted into a query using to_tsquery().
Here's the problem that I am running into currently.
if the search pattern is entered as "ou", many "Outback Steakhouse" records are returned.
if the search pattern is entered as "out", no results are returned.
if the search pattern is entered as "outb", many "Outback Steakhouse" records are returned.
doing a little debugging, I came up with this:
select ts_rank(to_tsvector('Outback Steakhouse'), to_tsquery('ou:*'))   as "ou",
       ts_rank(to_tsvector('Outback Steakhouse'), to_tsquery('out:*'))  as "out",
       ts_rank(to_tsvector('Outback Steakhouse'), to_tsquery('outb:*')) as "outb"
which results in this:
    ou     | out |   outb
-----------+-----+-----------
 0.0607927 |   0 | 0.0607927
What am I doing wrong?
Is this a limitation of pg full text search?
Is there something that I can do with my dictionary or configuration to get around this anomaly?
UPDATE:
I think that "out" may be a stop word.
when I run this debug query, I don't get any lexemes for "out"
SELECT * FROM ts_debug('english','out back outback');
alias     | description     | token   | dictionaries   | dictionary   | lexemes
----------+-----------------+---------+----------------+--------------+-----------
asciiword | Word, all ASCII | out     | {english_stem} | english_stem | {}
blank     | Space symbols   |         | {}             |              |
asciiword | Word, all ASCII | back    | {english_stem} | english_stem | {back}
blank     | Space symbols   |         | {}             |              |
asciiword | Word, all ASCII | outback | {english_stem} | english_stem | {outback}
So now I ask: how do I modify the stop word list to remove a word?
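One way to do that, a sketch rather than a drop-in fix (the configuration and dictionary names here are made up), is to create a copy of the english configuration whose snowball stemmer is declared without a stopword file, so that no words are discarded:
CREATE TEXT SEARCH DICTIONARY english_stem_nostop (
    TEMPLATE = snowball,
    LANGUAGE = english
    -- no STOPWORDS parameter, so nothing is filtered out
);

CREATE TEXT SEARCH CONFIGURATION english_nostop (COPY = english);

ALTER TEXT SEARCH CONFIGURATION english_nostop
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
    WITH english_stem_nostop;

-- "out" now survives as a lexeme:
SELECT to_tsvector('english_nostop', 'out back outback');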
UPDATE:
here is the query that I am currently using:
select id, name, address, city, state, likes
from view_business_favorite_count
where textsearchable_index_col @@ to_tsquery('simple', $1)
ORDER BY ts_rank(textsearchable_index_col, to_tsquery('simple', $1)) DESC
When I execute the query (I'm using Strongloop Loopback + Express + Node), I pass the pattern in to replace the $1 param. The pattern (as stated above) will look something like "keyword:*" or "keyword1 & keyword2 & ... & keywordN:*".
thanks
The problem here is that you are searching against business names, and as @Daniel correctly pointed out, the 'english' dictionary will not help you find a "fuzzy" match for non-dictionary words like "Outback Steakhouse" etc.
'simple' dictionary
The 'simple' dictionary on its own will not help you either; in your case business names will work only for an exact match, as all words are unstemmed.
'simple' dictionary + pg_trgm
But if you use the 'simple' dictionary together with the pg_trgm module, it will be exactly what you need. In particular:
- for to_tsvector('simple','<business name>') you don't need to worry about the stop words "hack"; you will get all the lexemes unstemmed;
- using similarity() from pg_trgm you will get the highest "rank" for the best match;
look at this:
WITH pg_trgm_test(business_name,search_pattern) AS ( VALUES
('Outback Steakhouse','ou'),
('Outback Steakhouse','out'),
('Outback Steakhouse','outb')
)
SELECT business_name,search_pattern,similarity(business_name,search_pattern)
FROM pg_trgm_test;
result:
business_name | search_pattern | similarity
--------------------+----------------+------------
Outback Steakhouse | ou | 0.1
Outback Steakhouse | out | 0.15
Outback Steakhouse | outb | 0.2
(3 rows)
Ordering by similarity DESC you will be able to get what you need.
UPDATE
For your situation there are two possible options.
Option #1.
Just create a trgm index for the name column in the view_business_favorite_count table; the index definition may be the following:
CREATE INDEX name_trgm_idx ON view_business_favorite_count USING gin (name gin_trgm_ops);
The query will look something like this:
SELECT
id,
name,
address,
city,
state,
likes,
similarity(name,$1) AS trgm_rank -- similarity score
FROM
view_business_favorite_count
WHERE
name % $1 -- trgm search
ORDER BY trgm_rank DESC;
Option #2.
With full text search, you need to:
- create a separate table, for example unnested_business_names, with 2 columns: the first column keeps all the lexemes from the to_tsvector('simple', name) function, the second keeps vbfc_id (a FK to id from the view_business_favorite_count table);
- add a trgm index for the column which contains the lexemes;
- add a trigger for unnested_business_names which inserts, updates, or deletes values whenever view_business_favorite_count changes, to keep all the words up to date (a sketch of this setup follows).
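A minimal sketch of that setup, assuming PostgreSQL 9.6+ for tsvector_to_array(); every object name except view_business_favorite_count is made up here, and vbfc_id stays a plain column because you cannot declare a foreign key against a view:
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE TABLE unnested_business_names (
    lexeme  text    NOT NULL,
    vbfc_id integer NOT NULL  -- logical FK to view_business_favorite_count.id
);

CREATE INDEX unnested_business_names_lexeme_trgm_idx
    ON unnested_business_names USING gin (lexeme gin_trgm_ops);

-- initial population; the trigger described above keeps it in sync afterwards
INSERT INTO unnested_business_names (lexeme, vbfc_id)
SELECT unnest(tsvector_to_array(to_tsvector('simple', name))), id
FROM view_business_favorite_count;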

Postgres Find and Return Keywords From List Within Select

I have a simple postgres table that contains a comments (text) column.
Within a view, I need to search that comments field for a list of words and then return a comma separated list of the words found as a column (as well as a bunch of normal columns).
The list of defined keywords contains about 20 words, e.g. apples, bananas, pear, peach, plum.
Ideal result would be something like:
id | comments | keywords
-----------------------------------------------------
1 | I like bananas! | bananas
2 | I like apples. | apples
3 | I don't like fruit |
4 | I like apples and bananas! | apples,bananas
I'm thinking I need to do a subquery and array_agg? Or possibly WHERE ... IN. But I can't figure out how to bolt it together.
Many thanks,
Steve
You can use the full-text search facilities to achieve this:
1. Set up a new ispell dictionary with your list of words.
2. Create a full-text search configuration based on your dictionary. Don't forget to remove all other dictionaries from the configuration, because in your case all other words actually are stopwords.
After that when you execute
select plainto_tsquery('<your config name>', 'I like apples and bananas!')
you will get only your keywords: 'apples' & 'bananas' or even 'apple' & 'banana' if you setup dictionary properly.
By default, english configuration uses snowball dictionaries which reduce word endings, so if you run
select plainto_tsquery('english', 'I like apples and bananas!')
you will get
'like' & 'appl' & 'banana'
which is not exactly suitable for your case.
Another, easier way (but slower):
create the dictionary table:
create table keywords (nm text);
insert into keywords (nm)
values ('apples'), ('bananas');
Execute the following script against your text to extract keywords
select string_agg(regexp_replace(foo, '[^a-zA-Z\-]*', '', 'ig'), ',') s
from regexp_split_to_table('I like apples and bananas!', E'\\s+') foo
where regexp_replace(foo, '[^a-zA-Z\-]*', '', 'ig') in (select nm from keywords)
This solution is worse in terms of semantics, as banana and bananas will be treated as different keywords.
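Wired into a view over the original table, the second approach might look like the following sketch (comments_table, its columns, and the view name are assumptions):
CREATE VIEW comments_with_keywords AS
SELECT c.id,
       c.comments,
       k.keywords
FROM comments_table c
LEFT JOIN LATERAL (
    -- split the comment into words, strip punctuation, keep only known keywords
    SELECT string_agg(regexp_replace(t.tok, '[^a-zA-Z\-]*', '', 'ig'), ',') AS keywords
    FROM regexp_split_to_table(c.comments, E'\\s+') AS t(tok)
    WHERE regexp_replace(t.tok, '[^a-zA-Z\-]*', '', 'ig') IN (SELECT nm FROM keywords)
) k ON true;
Rows with no matching words get a NULL keywords value, matching the blank cell in the expected output.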

How do I match variables from FreeRADIUS in PostgreSQL with the LIKE operator?

In a PostgreSQL query, executed by FreeRADIUS, I want to do something similar to (the table names and values are just examples):
SELECT name
FROM users
WHERE city LIKE '%blahblah%';
but there is a catch: the blahblah value is contained in a FreeRADIUS variable, represented with '%{variable-name}'. It expands to 'blahblah'.
Now my question is: How do I match the %{variable-name} variable to the value stored in the table using the LIKE operator?
I tried using
SELECT name
FROM users
WHERE city LIKE '%%{variable-name}%';
but it doesn't expand correctly like that and is obviously incorrect.
The final query I want to achieve is
...
WHERE city LIKE '%blahblah%';
so it matches the longer string containing 'blahblah' stored in the table, but I want the variable to expand dynamically into the correct query. Is there a way to do it?
Thanks!
Wild guess:
Assuming that FreeRADIUS is doing dumb substitution across the entire SQL string, with no attempt to parse literals etc. before sending the SQL to PostgreSQL, then you could use:
SELECT name
FROM users
WHERE city LIKE '%'||'%{variable-name}'||'%';
Edit: To avoid the warnings caused by FreeRADIUS not parsing cleverly enough, hide the %s as hex chars:
WHERE city LIKE E'\x25%{variable-name}\x25';
Note the leading E, which marks the string as subject to escape processing.
SELECT name
FROM users
WHERE city LIKE '%%'||'%{variable-name}'||'%%';
This is slightly cleaner: %% is FreeRADIUS' escape sequence for percents.