using LOWER with IN condition [duplicate] - postgresql

Assume I have a table named comodity_group whose contents look like this:
+----------+-------+
| group_id | name  |
+----------+-------+
|        1 | Data1 |
|        2 | Data2 |
|        3 | data3 |
+----------+-------+
and I have the following query:
SELECT * FROM comodity_group WHERE name IN ('data1','data2','data3')
The query returns 0 rows because the values in the condition are all lowercase (note that the condition is also dynamic, meaning it can be Data1 or daTa1, etc.).
So I want to compare both the condition values and the name field in lowercase; in other words, case-insensitively.

You can use ILIKE and an array:
select *
from comodity_group
where name ilike any (array['Data1', 'data2', 'dATA3']);
Note that this won't be really fast as the ILIKE operator can't make use of a regular index on the name column.

You can convert your name data to lowercase
SELECT * FROM comodity_group WHERE lower(name) IN('data1','data2','data3')
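If the table is large, this form can also be made index-friendly with an expression index on lower(name); a minimal sketch (the index name is illustrative):
-- Expression index so comparisons against lower(name) can use an index scan.
CREATE INDEX comodity_group_lower_name_idx
    ON comodity_group (lower(name));

SELECT *
FROM comodity_group
WHERE lower(name) IN ('data1', 'data2', 'data3');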

Assuming you have control over the terms which appear in the IN clause of your query, then you should only need to lowercase the name column before making the comparison:
SELECT *
FROM commodity_group
WHERE LOWER(name) IN ('data1', 'data2', 'data3')
Off the top of my head, you could also join to an inline table containing the search terms:
WITH cte AS (
    SELECT 'daTa1' AS name
    UNION ALL
    SELECT 'Data2'
    UNION ALL
    SELECT 'datA3'
)
SELECT *
FROM commodity_group t1
INNER JOIN cte t2
    ON LOWER(t1.name) = LOWER(t2.name)
With the possible matches in an actual table, we now have the ability to lowercase both sides of the comparison.
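The same inline table of search terms can also be written with a VALUES list; a sketch of an equivalent form:
SELECT t1.*
FROM commodity_group t1
INNER JOIN (VALUES ('daTa1'), ('Data2'), ('datA3')) AS t2(name)
    ON LOWER(t1.name) = LOWER(t2.name);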

SQL WHERE condition that one field's string can be found in another field

Here's some sample data:
ID | sys1lname | sys2lname
---+-----------+---------------
 1 | JOHNSON   | JOHNSON
 2 | FULTON    | ANDERS-FULTON
 3 | SMITH     | SMITH-DAVIDS
 4 | HARRISON  | JONES
The goal is to find records where the last names do NOT match, BUT to allow cases where sys1lname can be found somewhere within sys2lname, which may or may not be a hyphenated name. So from the above data, only record 4 should be returned.
When I put this expression (SUBSTRING(sys2lname, CHARINDEX(sys2lname, ccm.NAME_LAST), LEN(sys1lname))) in the SELECT statement, it properly returns the part of sys2lname that matches sys1lname.
But when I use that in the WHERE clause
WHERE 1=1
  AND sys1lname <> sys2lname
  OR sys1lname NOT IN ('%' + (SUBSTRING(sys2lname, CHARINDEX(sys1lname, sys2lname), LEN(sys1lname))))
the records with hyphenated names are in the result set.
And I can't figure out why.
Two things are going on there: IN compares values literally, so the '%' is not treated as a wildcard (only LIKE interprets it), and the OR means any row where the two names differ at all gets returned. Just use a NOT LIKE:
SELECT ID
FROM dbo.YourTable
WHERE sys2lname NOT LIKE '%' + sys1lname + '%';
If you could have a name like 'Smith' in sys1lname and 'BlackSmith' (or even 'Green-Blacksmith') in sys2lname and don't want them to match, I would use STRING_SPLIT and a NOT EXISTS:
SELECT ID
FROM dbo.YourTable YT
WHERE NOT EXISTS (SELECT 1
                  FROM STRING_SPLIT(YT.sys2lname, '-') SS
                  WHERE SS.[value] = YT.sys1lname);
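For reference, a quick sketch against the sample rows from the question (table and column names as given above); the plain NOT LIKE version returns only ID 4:
-- Illustrative setup matching the sample data in the question.
CREATE TABLE dbo.YourTable (ID int, sys1lname varchar(50), sys2lname varchar(50));
INSERT INTO dbo.YourTable VALUES
    (1, 'JOHNSON',  'JOHNSON'),
    (2, 'FULTON',   'ANDERS-FULTON'),
    (3, 'SMITH',    'SMITH-DAVIDS'),
    (4, 'HARRISON', 'JONES');

-- Rows where sys1lname does not appear anywhere inside sys2lname.
SELECT ID
FROM dbo.YourTable
WHERE sys2lname NOT LIKE '%' + sys1lname + '%';   -- returns only ID 4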

postgres offset by value not number

I have a table with at least a "name" column and an "ordinal_position" column. I wish to loop over the rows starting from a certain row that the user inputs. Let's say the user inputs "John", and his ordinal_position is 6 (out of 10 total). How do I loop over only the last 4 rows without using a subquery? I've tried the "OVER()" window function but it doesn't seem to work in the offset part of the query, and OFFSET only takes numbers (as far as I know), not strings.
EDIT (in response to klin):
INSERT INTO foo(id,name,ordinal_position) VALUES
(DEFAULT,'Peter',1),
(DEFAULT,'James',2),
(DEFAULT,'Freddy',3),
(DEFAULT,'Mark',4),
(DEFAULT,'Jack',5),
(DEFAULT,'John',6),
(DEFAULT,'Will',7),
(DEFAULT,'Robert',8),
(DEFAULT,'Dave',9),
(DEFAULT,'Michael',10);
So in my FOR loop, since the user entered "John", I want to loop through Will-Michael. Something like the following, but without a subquery:
SELECT * FROM foo ORDER BY ordinal_position OFFSET
(SELECT ordinal_position FROM foo WHERE name='John');
Unfortunately, you have to query the table to find an ordinal_position for a given name.
However, do not use OFFSET. You can do it in a WHERE clause; for large tables it will be much faster:
select *
from foo
where ordinal_position > (select ordinal_position from foo where name = 'John')
order by ordinal_position;
 id |  name   | ordinal_position
----+---------+------------------
  7 | Will    |                7
  8 | Robert  |                8
  9 | Dave    |                9
 10 | Michael |               10
(4 rows)
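Since the question mentions a FOR loop in plpgsql, here is a minimal sketch of how that query could drive the loop (the RAISE NOTICE body is just a placeholder for the real per-row work):
DO $$
DECLARE
    rec foo%ROWTYPE;
BEGIN
    FOR rec IN
        SELECT *
        FROM foo
        WHERE ordinal_position > (SELECT ordinal_position FROM foo WHERE name = 'John')
        ORDER BY ordinal_position
    LOOP
        RAISE NOTICE 'processing %', rec.name;  -- placeholder
    END LOOP;
END
$$;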

How to split a string in a smart way?

The string_to_array function splits strings without keeping quoted substrings together:
# select unnest(string_to_array('one, "two,three"', ','));
unnest
--------
one
"two
three"
(3 rows)
I would like to have a smarter function, like this:
# select unnest(smarter_string_to_array('one, "two,three"', ','));
unnest
--------
one
two,three
(2 rows)
Purpose: I know that the COPY command does this properly, but I need this feature internally.
I want to parse the text representation of rows of an existing table. Example:
# select * from dataset limit 2;
 id |      name       | state
----+-----------------+--------
  1 | Smith, Reginald | Canada
  2 | Jones, Susan    |
(2 rows)
# select dataset::text from dataset limit 2;
dataset
------------------------------
(1,"Smith, Reginald",Canada)
(2,"Jones, Susan","")
(2 rows)
I want to do it dynamically in a plpgsql function for different tables. I cannot assume a constant number of columns nor a fixed format of the column values.
There is a nice method to transpose a whole table into a one-column table:
select (json_each_text(row_to_json(t))).value from dataset t;
If the column id is unique then
select id, array_agg(value) arr from (
    select row_number() over() rn, id, value from (
        select id, (json_each_text(row_to_json(t))).value from dataset t
    ) alias
    order by id, rn
) alias
group by id;
gives you exactly what you want. The additional query with row_number() is necessary to keep the original order of the columns.
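If the strings you need to split really are row literals of a known table, one option is to cast the text back to that row type and reuse the same json trick; a minimal sketch against the dataset table above:
-- Cast the row literal back to its composite type, then split it column by column.
SELECT (json_each_text(row_to_json(
           '(1,"Smith, Reginald",Canada)'::dataset
       ))).value;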

Adding the results of two select queries into one table row with PostgreSQL

I am attempting to return the result of two distinct select statements into one row in PostgreSQL. For example, I have two queries each that return the same number of rows:
Select tableid1, tableid2, tableid3 from table1
+----------+----------+----------+
| tableid1 | tableid2 | tableid3 |
+----------+----------+----------+
|        1 |        2 |        3 |
|        4 |        5 |        6 |
+----------+----------+----------+
Select table2id1, table2id2, table2id3, table2id4 from table2
+-----------+-----------+-----------+-----------+
| table2id1 | table2id2 | table2id3 | table2id4 |
+-----------+-----------+-----------+-----------+
|         7 |         8 |         9 |        15 |
|        10 |        11 |        12 |        19 |
+-----------+-----------+-----------+-----------+
Now I want to concatenate these tables, keeping the same number of rows. I do not want to join on any values. The desired result would look like the following:
+----------+----------+----------+-----------+-----------+-----------+-----------+
| tableid1 | tableid2 | tableid3 | table2id1 | table2id2 | table2id3 | table2id4 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
|        1 |        2 |        3 |         7 |         8 |         9 |        15 |
|        4 |        5 |        6 |        10 |        11 |        12 |        19 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
What can I do to the two above queries (select * from table1 and select * from table2) to return the desired result above?
Thanks!
You can use row_number() for the join, but there is no guarantee that the order of the rows will stay the same as in the tables, so it's better to add some ordering to the over() clause.
with cte1 as (
    select
        tableid1, tableid2, tableid3, row_number() over() as rn
    from table1
), cte2 as (
    select
        table2id1, table2id2, table2id3, table2id4, row_number() over() as rn
    from table2
)
select *
from cte1 as c1
inner join cte2 as c2 on c2.rn = c1.rn
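As a sketch of what adding an order to over() could look like, assuming the first column of each table reflects the intended row order (adjust it to whatever actually defines the pairing in your data):
with cte1 as (
    select tableid1, tableid2, tableid3,
           row_number() over (order by tableid1) as rn  -- deterministic ordering
    from table1
), cte2 as (
    select table2id1, table2id2, table2id3, table2id4,
           row_number() over (order by table2id1) as rn
    from table2
)
select c1.tableid1, c1.tableid2, c1.tableid3,
       c2.table2id1, c2.table2id2, c2.table2id3, c2.table2id4
from cte1 as c1
inner join cte2 as c2 on c2.rn = c1.rn;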
You can't have what you want, as you wrote the question. Your two SELECTs don't have any ORDER BY clause, so the database can return the rows in whatever order it feels like. If it currently matches up, it does so only by accident, and will stop matching up as soon as you UPDATE a row.
You need a key column. Then you need to join on the key column. Anything else is attempting to invent unreliable and unsafe joins without actually using a join.
Frankly, this seems like a pretty dodgy schema. Lots of numbered integer columns like this, and the desire to concatenate them, may be a sign you should be looking at using integer arrays, or using a side-table with a foreign key relationship, instead.
Sample data in case anyone else wants to play:
CREATE TABLE table1(tableid1 integer, tableid2 integer, tableid3 integer);
INSERT INTO table1 VALUES (1,2,3), (4,5,6);
CREATE TABLE table2(table2id1 integer, table2id2 integer, table2id3 integer, table2id4 integer);
INSERT INTO table2 VALUES (7,8,9,15), (10,11,12,19);
Depending on what you're actually doing you might really have wanted arrays.
I think you might need to read these two posts:
Join 2 sets based on default order
How keep data don't sort?
which explain that SQL tables just don't have an order. So you cannot fetch them in a particular order.
DO NOT USE THE FOLLOWING CODE, IT IS DANGEROUS AND ONLY INCLUDED AS A PROOF OF CONCEPT:
As it happens you can use a set-returning function hack to very inefficiently do what you want. It's incredibly ugly and completely unsafe without an ORDER BY in the SELECTs, but I'll include it for completeness. I guess.
CREATE OR REPLACE FUNCTION t1() RETURNS SETOF table1 AS $$ SELECT * FROM table1 $$ LANGUAGE sql;
CREATE OR REPLACE FUNCTION t2() RETURNS SETOF table2 AS $$ SELECT * FROM table2 $$ LANGUAGE sql;
SELECT (t1()).*, (t2()).*;
If you use this in any real code then kittens will cry. It'll produce insane and bizarre results if the number of rows in the tables differ and it'll produce the rows in orderings that might seem right at first, but will randomly start coming out wrong later on.
THE SANE WAY is to add a primary key properly, then do a join.

Word frequencies from strings in Postgres?

Is it possible to identify distinct words and a count for each, from fields containing text strings in Postgres?
Something like this?
SELECT some_pk,
regexp_split_to_table(some_column, '\s') as word
FROM some_table
Getting the distinct words is easy then:
SELECT DISTINCT word
FROM (
    SELECT regexp_split_to_table(some_column, '\s') as word
    FROM some_table
) t
or getting the count for each word:
SELECT word, count(*)
FROM (
    SELECT regexp_split_to_table(some_column, '\s') as word
    FROM some_table
) t
GROUP BY word
You could also use the PostgreSQL text-searching functionality for this, for example:
SELECT * FROM ts_stat('SELECT to_tsvector(''hello dere hello hello ridiculous'')');
will yield:
  word   | ndoc | nentry
---------+------+--------
 ridicul |    1 |      1
 hello   |    1 |      3
 dere    |    1 |      1
(3 rows)
(PostgreSQL applies language-dependent stemming and stop-word removal, which could be what you want, or maybe not. Stop-word removal and stemming can be disabled by using the simple instead of the english dictionary, see below.)
The nested SELECT statement can be any select statement that yields a tsvector column, so you could substitute a function that applies the to_tsvector function to any number of text fields, and concatenates them into a single tsvector, over any subset of your documents, for example:
SELECT * FROM ts_stat('SELECT to_tsvector(''english'',title) || to_tsvector(''english'',body) FROM my_documents WHERE id < 500') ORDER BY nentry DESC;
This would yield a matrix of total word counts taken from the title and body fields of the first 500 documents, sorted by descending number of occurrences. For each word, you'll also get the number of documents it occurs in (the ndoc column).
See the documentation for more details: http://www.postgresql.org/docs/current/static/textsearch.html
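For instance, with the simple configuration the words are kept as-is, with no stemming or stop-word removal:
-- Using the 'simple' configuration instead of 'english'.
SELECT * FROM ts_stat('SELECT to_tsvector(''simple'', ''hello dere hello hello ridiculous'')');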
Note that the pattern argument of regexp_split_to_table is a regular expression, so '\s' above matches any whitespace character, not the letter 's'. If your words are only ever separated by single spaces, splitting on a literal space ' ' works just as well:
SELECT word, count(*)
FROM (
    SELECT regexp_split_to_table(some_column, ' ') as word
    FROM some_table
) t
GROUP BY word
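A related refinement (a sketch, not taken from the answers above): splitting on runs of whitespace with '\s+' avoids producing empty "words" when columns contain consecutive spaces or newlines:
-- Split on runs of whitespace so repeated spaces/newlines don't yield empty strings.
SELECT word, count(*) AS freq
FROM (
    SELECT regexp_split_to_table(some_column, '\s+') AS word
    FROM some_table
) t
WHERE word <> ''
GROUP BY word
ORDER BY freq DESC;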