Fuzzy string search in array with PostgreSQL

This is how I do fuzzy string search in postgresql:
select * from table where levenshtein(name, 'value') < 2;
But what can I do if the 'name' column contains an array?
P.S.: It is necessary to use an index, and that is the difference.

You can use unnest() over the array:
select *
from (
    select unnest(name) as name_in_array, id
    from (
        select 1 as id, ARRAY['value1','valu','lav'] as name
        union all
        select 2 as id, ARRAY['value2','orange','yellow'] as name
    ) t1
) t2
where levenshtein(name_in_array, 'value') < 2;
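Applied to a real table, the same idea works with a LATERAL join. A minimal sketch, assuming a table my_table with an integer id and a text[] column name (both names are placeholders, not from the question):
select t.id, n.name_in_array
from my_table t
cross join lateral unnest(t.name) as n(name_in_array)
where levenshtein(n.name_in_array, 'value') < 2;
Note that a predicate like levenshtein(col, 'value') < 2 is not one a plain B-tree index can satisfy, so this still scans the unnested values.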

Related

How to use Redshift regex to get the numbers out of an array

In Redshift, I have a column that contains an array-like string such as [1,2,3], and I want to return 1,2,3 using Redshift's regex functionality. How can one do this? I don't want to do this:
SELECT LISTAGG(option_name , ',') WITHIN GROUP (ORDER BY option_name) as pets_names
FROM reference.vital_options
WHERE option_id in
(
-- this nested CTE splits the json string array into comma separated pet ids
with NS AS (
SELECT vo.option_id + 1 as n
FROM <column with number id> as vo
WHERE upper(vo.country) = 'US'
...
)
select TRIM(JSON_EXTRACT_ARRAY_ELEMENT_TEXT(u.pets_vital, NS.n - 1)) AS val
FROM NS
INNER JOIN go_prod.users AS u ON NS.n <= JSON_ARRAY_LENGTH(u.pets_vital)
WHERE u.id = %(user_id)s
)
AND ...
Are you just trying to remove the square brackets? If so, the translate() function is likely what you want to use. For example:
create table test as (select '[1,2,3]'::text as A);
select a, translate(a, '][', '') as b from test;
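If you then need the individual values rather than the bracket-free string, split_part() can pull them out. A minimal sketch, assuming the array always holds exactly three comma-separated values (that assumption is not from the question):
select split_part(b, ',', 1) as v1,
       split_part(b, ',', 2) as v2,
       split_part(b, ',', 3) as v3
from (select translate(a, '][', '') as b from test) s;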

Combine IN and LIKE function in DB2

Is there a way to combine IN and LIKE in DB2? For example, I would like to exclude users that have userid A, B, or C, and also userids that start with X or Y. I tried the query below, however it did not work:
select * from table where userid not in ('A','B','C') or (not like 'X%' or not like 'Y%')
Use 'AND' instead of 'OR'
select * from table
where userid not in ('A','B','C')
and userid not like 'X%'
and userid not like 'Y%'
You can also express all the constants used in IN as LIKE patterns:
with
mytable (userid) as
(
    values 'A', 'AA', 'XX', 'YY', 'ZZ'
),
vals (userid) as
(
    values 'A', 'B', 'C', 'X%', 'Y%'
)
select *
from mytable t
where not exists
(
    select 1
    from vals v
    where t.userid like v.userid
);
The result is:
|USERID|
|------|
|AA    |
|ZZ    |

How to count the frequency of integers in a set of querystrings in postgres

I have a column in a postgres database which logs search querystrings for a page on our website.
The column contains data like
"a=2&b=4"
"a=2,3"
"b=4&a=3"
"a=4&a=3"
I'd like to work out the frequency of each value for a certain parameter (a).
value | freq
------|------
3 | 3
2 | 2
4 | 1
Any way to do this in a single SQL statement?
Something like this:
with all_values as (
    select string_to_array(split_part(parameter, '=', 2), ',') as query_params
    from the_table d,
         unnest(string_to_array(d.querystring, '&')) as x(parameter)
    where x.parameter like 'a%'
)
select t.value, count(*)
from all_values av, unnest(av.query_params) as t(value)
group by t.value
order by t.value;
Online example: http://rextester.com/OXM67442
Try something like this:
select data_value, count(*) from (
    select data_name, unnest(string_to_array(data_values, ',')) data_value from (
        select split_part(data_array, '=', 1) data_name, split_part(data_array, '=', 2) data_values from (
            select unnest(string_to_array(mydata, '&')) data_array from mytable
        ) a
    ) b
) c
where data_name = 'a'
group by 1
order by 1
Assuming the table that keeps the counts is called paramcount:
WITH vals(v) AS (
    SELECT regexp_replace(p, '^.*=', '')
    FROM regexp_split_to_table(
        'b=4&a=3,2',
        '&|,'
    ) p(p)
)
INSERT INTO paramcount (value, freq)
SELECT v, 1 FROM vals
ON CONFLICT (value)
DO UPDATE SET freq = paramcount.freq + 1
WHERE paramcount.value = EXCLUDED.value;
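Note that ON CONFLICT (value) requires a unique index or constraint on value. A minimal sketch of a matching paramcount definition (the column types are assumptions, not from the answer):
CREATE TABLE paramcount (
    value text PRIMARY KEY,
    freq  integer NOT NULL
);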
Get the csv of integers after 'a=', split it into numbers, then count the values:
select v, count(*) from (
    SELECT c, unnest(string_to_array(unnest(regexp_matches(c, 'a=([0-9,]+)', 'g')), ',')) as v
    FROM qrs
) x group by v;
Parametrize:
WITH argname(aname) as (values ('a'::TEXT))
select v, count(*) from (
    SELECT c, unnest(string_to_array(unnest(regexp_matches(c, aname || '=([0-9,]+)', 'g')), ',')) as v
    FROM qrs, argname
) x group by v;

Postgresql - get closest datetime row relative to given datetime value

I have a postgres table with a unique datetime field.
I would like to use/create a function that takes a datetime value as an argument and returns the id of the row whose datetime is closest (but not equal) to the passed value. A second argument could specify whether to look before or after the passed value.
Ideally, some combination of native datetime functions could handle this requirement. Otherwise it'll have to be a custom function.
Question: What are methods for querying relative datetime over a collection of rows?
select id, passed_ts - ts_column as difference
from t
where (passed_ts > ts_column and positive_interval)
   or (passed_ts < ts_column and not positive_interval)
order by abs(extract(epoch from passed_ts - ts_column))
limit 1
passed_ts is the timestamp parameter and positive_interval is a boolean parameter. If it is true, only rows where the timestamp column is lower than the passed timestamp are considered; if false, the inverse.
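A minimal sketch of wrapping that query in a function, assuming a table t with an integer id and a timestamp column ts_column (names carried over from the query above, not from the question):
create function closest_row(passed_ts timestamp, positive_interval boolean)
returns integer
language sql stable
as $$
    select id
    from t
    where (passed_ts > ts_column and positive_interval)
       or (passed_ts < ts_column and not positive_interval)
    order by abs(extract(epoch from passed_ts - ts_column))
    limit 1;
$$;
-- usage: select closest_row(timestamp '2013-04-30 12:00', true);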
Simply use -.
Assuming you have a table with attributes Key, Attr and T (timestamp with or without time zone), you can search with
select min(T - TimeValue) from Table where (T - TimeValue) > interval '0';
This gives you the minimum difference. You can combine this value with a join back to the same table to get the tuple you are interested in:
select *
from (select *, T - TimeValue as diff from Table) as T1
NATURAL JOIN
(select min(T - TimeValue) as diff from Table where (T - TimeValue) > interval '0') as T2;
That should do it.
You want the first row of a select statement producing all the rows below (or above) the given datetime in descending (or ascending) order.
Pseudo code for the function body:
SELECT id
FROM table
WHERE IF(#above, datecol < #param, datecol > #param)
ORDER BY IF(#above, datecol ASC, datecol DESC)
LIMIT 1
However, this does not work: one cannot condition the ordering direction.
The second idea is to do both queries, and select afterwards:
SELECT *
FROM (
(
SELECT 'below' AS dir, id
FROM table
WHERE datecol < #param
ORDER BY datecol DESC
LIMIT 1
) UNION (
SELECT 'above' AS dir, id
FROM table
WHERE datecol > #param
ORDER BY datecol ASC
LIMIT 1)
) AS t
WHERE dir = #dir
That should be pretty fast with an index on the datetime column.
-- test rig
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE lutser
( dt timestamp NOT NULL PRIMARY KEY
);
-- populate it
INSERT INTO lutser(dt)
SELECT gs
FROM generate_series('2013-04-30', '2013-05-01', '1 min'::interval) gs
;
DELETE FROM lutser WHERE random() < 0.9;
--
-- The query:
WITH xyz AS (
SELECT dt AS hh
, LAG (dt) OVER (ORDER by dt ) AS ll
FROM lutser
)
SELECT *
FROM xyz bb
WHERE '2013-04-30 12:00' BETWEEN bb.ll AND bb.hh
;
Result:
NOTICE: drop cascades to table tmp.lutser
DROP SCHEMA
CREATE SCHEMA
SET
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "lutser_pkey" for table "lutser"
CREATE TABLE
INSERT 0 1441
DELETE 1288
hh | ll
---------------------+---------------------
2013-04-30 12:02:00 | 2013-04-30 11:50:00
(1 row)
Wrapping it into a function is left as an exercise for the reader.
UPDATE: Here is a second one with the sandwiched-not-exists trick (TM):
SELECT lo.dt AS ll
FROM lutser lo
JOIN lutser hi ON hi.dt > lo.dt
AND NOT EXISTS (
SELECT * FROM lutser nx
WHERE nx.dt < hi.dt
AND nx.dt > lo.dt
)
WHERE '2013-04-30 12:00' BETWEEN lo.dt AND hi.dt
;
You have to join the table to itself with the where condition looking for the smallest nonzero (negative or positive) interval between the base table row's datetime and the joined table row's datetime. It would be good to have an index on that datetime column.
P.S. You could also look for the max() of the previous or the min() of the subsequent.
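A minimal sketch of that P.S., assuming a table t with columns id and dt and a parameter passed_ts (all placeholder names, not from the question):
-- closest row before the passed value
select id from t where dt = (select max(dt) from t where dt < passed_ts);
-- closest row after the passed value
select id from t where dt = (select min(dt) from t where dt > passed_ts);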
Try something like:
SELECT *
FROM your_table
WHERE (dt_time > argument_time and search_above = 'true')
OR (dt_time < argument_time and search_above = 'false')
ORDER BY CASE WHEN search_above = 'true'
THEN dt_time - argument_time
ELSE argument_time - dt_time
END
LIMIT 1;

Filter union result

I'm making a select with a union.
SELECT * FROM table_1
UNION
SELECT * FROM table_2
Is it possible to filter query results by column values?
Yes, you can enclose your entire union inside another select:
select * from (
    select * from table_1
    union
    select * from table_2
) as t
where t.column = 'y'
You have to introduce the alias for the subquery ("as t"). Also, if the data from the two tables is disjoint, you might want to consider switching to UNION ALL - UNION by itself eliminates duplicates in the result set, which is frequently not necessary.
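For example, the same filtered query with UNION ALL, keeping any duplicate rows (t.column is still the placeholder column from above):
select * from (
    select * from table_1
    union all
    select * from table_2
) as t
where t.column = 'y'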
A simple-to-read solution is to use a CTE (common table expression). This takes the form:
WITH foobar AS (
SELECT foo, bar FROM table_1
UNION
SELECT foo, bar FROM table_2
)
Then you can refer to the CTE in subsequent queries by name, as if it were a normal table:
SELECT foo,bar FROM foobar WHERE foo = 'value'
CTEs are quite powerful; I recommend reading further on them.
One tip that you will not find in that MS article: if you require more than one CTE, put a comma between the expressions. For example:
WITH foo AS (
SELECT thing FROM place WHERE field = 'Value'
),
bar AS (
SELECT otherthing FROM otherplace WHERE otherfield = 'Other Value'
)
If you want to filter the query based on some criteria, then you could do this:
Select * from table_1 where table_1.col1 = <some value>
UNION
Select * from table_2 where table_2.col1 = <some value>
But if you want to filter the result to find the common values, then you can use a join instead:
Select * from table_1 inner join table_2 on table_1.col1 = table_2.col1