Exclude a word in sql query? - postgresql

I'm trying to write a query in sql to exclude a keyword:
It's a list of cities written out (e.g. AnnArbor-MI). In the list there are duplicates because some have the word 'badsetup' after the city and these need to be discarded. How would I write something to exclude any city with 'badsetup' after it?

Your question title and content appear to be asking for two different things ...
Query cities while excluding the trailing 'badsetup':
SELECT regexp_matches(citycolumn, '(.*)badsetup')
FROM mytable;
Query cities that don't have the trailing 'badsetup':
SELECT citycolumn
FROM mytable
WHERE citycolumn NOT LIKE '%badsetup';

In psql, to select rows excluding those with the word 'badsetup' you can use the following:
SELECT * FROM table_name WHERE column NOT LIKE '%badsetup%';
In this case the '%' indicates that there can be any characters of any length in this space. So this query will find any instance of the phrase 'badsetup' in your column, regardless of the characters before or after it.
You can find more information in section 9.7.1 here: https://www.postgresql.org/docs/8.3/static/functions-matching.html

Related

Counting the Number of Occurrences of a Multi-Word Phrase in Text with PostgreSQL

I have a problem, I need to count the frequency of a word phrase appearing within a text field in a PostgreSQL database.
I'm aware of functions such as to_tsquery() and I'm using it to check if a phrase exists within the text using to_tsquery('simple', 'sample text'), however, I'm unsure of how to count these occurrences accurately.
If the words are contained just once in the string (I am supposing here that your table contains two columns, one with an id and another with a text column called my_text):
SELECT
count(id)
FROM
my_table
WHERE
my_text ~* 'the_words_i_am_looking_for'
If the occurrences are more than one per field, this nested query can be used:
SELECT
id,
count(matches) as matches
FROM (
SELECT
id,
regexp_matches(my_text, 'the_words_i_am_looking_for', 'g') as matches
FROM
my_table
) t
GROUP BY 1
The syntax of this function and much more about string pattern matching can be found here.

PostgreSQL query on a text column ignoring special characters

I have a table which contains a text column, say vehicle number.
Now I want to query the table for fields which contain a particular vehicle number.
While matching I do not want to consider non-alphanumeric characters.
example: query condition - DEL123
should match - DEL-123, DEL/123, DEL#123, etc...
If you know which characters to skip, put them as the second parameter of this translate() call (which is faster than regexp functions):
select *
from a_table
where translate(code, '-/#', '') = 'DEL123';
Else, you can compare only alphanumeric characters using regexp_replace():
select *
from a_table
where regexp_replace(code, '[^[:alnum:]]', '', 'g') = 'DEL123';
#klin's answer is great, but is not sargable, so in cases where you're searching through millions of records (maybe not your case, but perhaps someone else with a similar question looking for answers), using regular expressions will likely render much better results.
The following will use indexes on code significantly reducing the number of rows tested:
select *
from a_table
where code ~ '^DEL[^[:alnum:]]*123$';

Searching individual words in a string

I know about full-text search, but that only matches your query against individual words. I want to select strings that contain a word that starts with words in my query. For example, if I search:
appl
the following should match:
a really nice application
apples are cool
appliances
since all those strings contains words that start with appl. In addition, it would be nice if I could select the number of words that match, and sort based on that.
How can I implement this in PostgreSQL?
Prefix matching with Full Text Search
FTS supports prefix matching. Your query works like this:
SELECT * FROM tbl
WHERE to_tsvector('simple', string) ## to_tsquery('simple', 'appl:*');
Note the appended :* in the tsquery. This can use an index.
See:
Get partial match from GIN indexed TSVECTOR column
Alternative with regular expressions
SELECT * FROM tbl
WHERE string ~ '\mappl';
Quoting the manual here:
\m .. matches only at the beginning of a word
To order by the count of matches, you could use regexp_matches()
SELECT tbl_id, count(*) AS matches
FROM (
SELECT tbl_id, regexp_matches(string, '\mappl', 'g')
FROM tbl
WHERE string ~ '\mappl'
) sub
GROUP BY tbl_id
ORDER BY matches DESC;
Or regexp_split_to_table():
SELECT tbl_id, string, count(*) - 1 AS matches
FROM (
SELECT tbl_id, string, regexp_split_to_table(string, '\mappl')
FROM tbl
WHERE string ~ '\mappl'
) sub
GROUP BY 1, 2
ORDER BY 3 DESC, 2, 1;
db<>fiddle here
Old sqlfiddle
Postgres 9.3 or later has index support for simple regular expressions with a trigram GIN or GiST index. The release notes for Postgres 9.3:
Add support for indexing of regular-expression searches in pg_trgm
(Alexander Korotkov)
See:
PostgreSQL LIKE query performance variations
Depesz wrote a blog about index support for regular expressions.
SELECT * FROM some_table WHERE some_field LIKE 'appl%' OR some_field LIKE '% appl%';
As for counting the number of words that match, I believe that would be too expensive to do dynamically in postgres (though maybe someone else knows better). One way you could do it is by writing a function that counts occurrences in a string, and then add ORDER BY myFunction('appl', some_field). Again though, this method is VERY expensive (i.e. slow) and not recommended.
For things like that, you should probably use a separate/complimentary full-text search engine like Sphinx Search (google it), which is specialized for that sort of thing.
An alternative to that, is to have another table that contains keywords and the number of occurrences of those keywords in each string. This means you need to store each phrase you have (e.g. really really nice application) and also store the keywords in another table (i.e. really, 2, nice, 1, application, 1) and link that keyword table to your full-phrase table. This means that you would have to break up strings into keywords as they are entered into your database and store them in two places. This is a typical space vs speed trade-off.

Copy selected query fields name in Mysql Workbench

I am using mysql workbench (SQL Editor). I need copy the list of columns in each query as was existed in Mysql Query Browser.
For example
Select * From tb
I want have the list of fields like as:
id,title,keyno,......
You mean you want to be able to get one or more columns for a specified table?
1st way
Do SHOW COLUMNS FROM your_table_name and from there on depending on what you want have some basic filtering added by specifying you want only columns that data type is int, default value is null etc e.g. SHOW COLUMNS FROM your_table_name WHERE type='mediumint(8)' ANDnull='yes'
2nd way
This way is a bit more flexible and powerful as you can combine many tables and other properties kept in MySQL's INFORMATION_SCHEMA internal db that has records of all db columns, tables etc. Using the query below as it is and setting TABLE_NAME to the table you want to find the columns for
SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME='your_table_name';
To limit the number of matched columns down to a specific database add AND TABLE_SCHEMA='your_db_name' at the end of the query
Also, to have the column names appear not in multiple rows but in a single row as a comma separated list you can use GROUP_CONCAT(COLUMN_NAME,',') instead of only COLUMN_NAME
To select all columns in select statement, please go to SCHEMAS menu and right click ok table which you want to select column names, then select "Copy to Clipboard > Select All statement".
The solution accepted is fine, but it is limited to field names in tables. To handle arbitrary queries would be to standardize your select clause to be able to use regex to strip out only the column aliases. I format my select clause as "1 row per element" so
Select 1 + 1 as Col1, 1 + 2 Col2 From Table
becomes
Select 1 + 1 as Col1
, 1 + 2 Col2
From Table
Then I use simple regex on the "1 row per select element" version to replace "^.* " (excluding quotes) with nothing. The regex finds everything before the final space in the line, so it assumes your column aliases doesn't contain spaces (so replace spaces with underscore). Or if you don't like "1 row per element" then always use "as" keyword to give you a handle that regex can grasp.

Eliminating Duplicate rows in Postgres

I want to remove duplicate rows return from a SELECT Query in Postgres
I have the following query
SELECT DISTINCT name FROM names ORDER BY name
But this somehow does not eliminate duplicate rows?
PostgreSQL is case sensitive, this might be a problem here
DISTINCT ON can be used for case-insensitive search (tested on 7.4)
SELECT DISTINCT ON (upper(name)) name FROM names ORDER BY upper(name);
Maybe something with same-looking-but-different characters (like LATIN 'a'/CYRILLIC 'а')
Don't forget to add a trim() on that too. Or else 'Record' and 'Record ' will be treated as separate entities. That ended up hurting me at first, I had to update my query to:
SELECT DISTINCT ON (upper(trim(name))) name FROM names ORDER BY upper(trim(name));
In Postgres 9.2 and greater you can now cast the column to a CITEXT type or even make the column that so you don't have to cast on select.
SELECT DISTINCT name::citext FROM names ORDER BY name::citext