Can to_tsvector be used on a column having single word rather than a sentence? - postgresql-9.4

I am trying to find the English words in a Column "Name" in Postgresql.
I found the function "to_tsvector" does this:
If the name has a column with value "The Best Dogs", the vector returns the following tokens:
best:2 dogs:3
I wonder if it will work when the input is a single word (rather than a sentence)
For e.g. value "Dogs" in the column
Can I expect Dogs:1
I tried and its not working, I am wondering if its a configuration issue or the input format issue.

Related

Postgres regexp_replace: inability to replace source text with first captured group

Using PostgreSQL, I am unable to design the correct regex pattern to achieve the desired output of an SQL statement that uses regexp_replace.
My source text consists of several scattered blocks of text of the form 'PU*' followed by a date string in the form of 'YYYY-MM'--for example, 'PU*2020-11'. These blocks are surrounded by strings of unpredictable, arbitrary text (including other instances of 'PU*' followed by the above date string format, such as 'PU*2017-07), white space, and line feeds.
My desire is to replace the entire source text with the FIRST instance of the 'YYYY-MM' text pattern. In the above example, the desired output would be '2020-11'.
Currently, my search pattern results in the correct replacement text in place of the first capturing group, but unfortunately, all of the text AFTER the first capturing group also inadvertently appears in the output, which is not the desired output.
Specifically:
Version: postgres (PostgreSQL) 13.0
A more complex example of source text:
First line
Exec committee
PU*2020-08
PU*2019-09--cancelled
PU*2017-10
added by Terranze
My pattern so far:
(\s|\S)*?PU\*(\d{4}-\d{2})(\s|\S*)*
Current SQL statement:
select regexp_replace('First line\nExec committee; PU*2020-08\nPU*2019-09\nPU*2017-10\n\nadded by Terranze\n', '(\s|\S)*?PU\*(\d{4}-\d{2})(\s|\S*)*', '\2') as _regex;
Current output on https://regex101.com/
2020-08
Current output on psql
_regex
───────────────────────────────────────────────────────────────────
2020-08\nPU*2019-09--cancelled\nPU*2017-10\n\nadded by Terranze\n
(1 row)
Desired output:
2020-08
Any help appreciated. Thanks--
How about this expression:
'^.*?PU\*(\d{4}-\d{2}).*$'

Why are formulas not propagating in null fields?

I'm trying to change the value of nulls to something else that can be used to filter. This data comes from a QVD file. The field that contains nulls, contains nulls due to no action taken on those items ( they will eventually change to something else once an action has been taken). I found this link which was very informative but i tried multiple solutions from the document to no avail.
What i don't quite understand is that whenever i make a new field (in the script or as an expression) the formula does not propagate in the records that are null, it shows " - ". For instance, the expression isNull(ActionTaken) will return false in a field that that not null, but only " - " in fields that are null. If i export the table to Excel, the " - " is exported, i copy this cell to a text analyzer i the UTF-8 encoded is \x2D\x0A\x0A, i'm not sure if that's an artifact of the export process.
I also tried using the NullAsValue statement but no luck. Using a combination of Len & Trim = 0 will return the same result as above. This is only one table, no other tables are involved.
Thanks in advance.
I had a similar case few years ago where the field looked empty but actually it was filled with a character which just looked empty. Trimming the field also didnt worked as expected in this case, because the character code was different
What I can suggest you is to check if the character number, returned for the empty value, is actually an empty string. You can use the ord to check the character number for the empty values. Once you have the number then you can use this number to replace it with whatever you want (for example empty string)

SIMILAR TO function in postgresql is not working as expected

I am using below code to do calculation
select column1 from tablename where code SIMILAR TO '%(-|_|–)EST[1-2][0-9](-|_)%'
for this column value -CSEST190-KCY18-04-01-L the condition was passed, but in actual I want to ignore this type of data.
The correct value which should pass through the above condition is
-CS-EST19-0-KCY18-04-01-L
-CS_EST19-0-KCY18-04-01-L
Any suggestions, how to avoid this type of confusion?
Easiest way is to go full-regex, instead of using SQL standard SIMILAR TO.
select column1 from tablename where code ~ '[_–-]EST[12][0-9][_-]'
Notice this is does not have to match the full string, and you don't have to add .* on both ends (equivalent of % in LIKE and SIMILAR TO). The reason you got a match on that is, because of the underscore _, which is a single wildcard character.
Also, I switched the order, in the square brackets, so that the dash is the last character. That way it's treated as a character literal, not as a range specifier.

Selecting words out of table which sound similar

I read an interesting article about English and phonetics - and would like to see if my newfound knowledge can be applied in TSQL to generate a fuzzy result set. In one of my applications, there is a table containing words, which I extracted from a word list. It is literally a one-column table -
Word |
------
A
An
Apple
...
their
there
Is there an built-in function in SQL Server to Select a word which Sounds The same, even though it is spelled different? (The globalization settings are on en-ZA - as last time I checked)
SELECT Word FROM WordTable WHERE Word = <word that sounds similar>
SoundEx()
SOUNDEX converts an alphanumeric string to a four-character code that is based on how the string sounds when spoken.
Difference()
Returns an integer value that indicates the difference between the SOUNDEX values of two character expressions.
SELECT word
, SoundEx(word) As word
, SoundEx(word_that_sounds_similar) As word_that_sounds_similar
, Difference(SoundEx(word), SoundEx(word_that_sounds_similar)) As how_similar
FROM wordtable
WHERE Difference(SoundEx(word), SoundEx(word_that_sounds_similar)) <= 1 /* quite close! */
The value returned by Difference() indicates how similar the two words are.
A value of 0 indicates a strong match and a value of 4 means slim-to-no match.

zip code + 4 mail merge treated like an arithmetic expression

I'm trying to do a simple mail merge in Word 2010 but when I insert an excel field that's supposed to represent a zip code from Connecticut (ie. 06880) I am having 2 problems:
the leading zero gets suppressed such as 06880 becoming 6880 instead. I know that I can at least toggle field code to make it so it works as {MERGEFIELD ZipCode # 00000} and that at least works.
but here's the real problem I can't seem to figure out:
A zip+4 field such as 06470-5530 gets treated like an arithmetic expression. 6470 - 5530 = 940 so by using above formula instead it becomes 00940 which is wrong.
Perhaps is there something in my excel spreadsheet or an option in Word that I need to set to make this properly work? Please advise, thanks.
See macropod's post in this conversation
As long as the ZIP codes are reaching Word (with or without "-" signs in the 5+4 format ZIPs, his field code should sort things out. However, if you are mixing text and numeric formats in your Excel column, there is a danger that the OLE DB provider or ODBC driver - if that is what you are using to get the data - will treat the column as numeric and return all the text values as 0.
Yes, Word sometimes treats text strings as numeric expressions as you have noticed. It will do that when you try to apply a numeric format, or when you try to do a calculation in an { = } field, when you sum table cell contents in an { = } field, or when Word decides to do a numeric comparison in (say) an { IF } field - in the latter case you can get Word to treat the expression as a string by surrounding the comparands by double-quotes.
in Excel, to force the string data type when entering data that looks like a number, a date, a fraction etc. but is not numeric (zip, phone number, etc.) simply type an apostrophe before the data.
=06470 will be interpreted as a the number 6470 but ='06470 will be the string "06470"
The simplest fix I've found is to save the Excel file as CSV. Word takes it all at face value then.