Add new language to PostgreSQL full text search - postgresql

Is there any way to add new languages to PostgreSQL full text search?
Where can I read about it, or where should I start?

You can look at the text search dictionaries chapter of the PostgreSQL documentation, where the CREATE TEXT SEARCH DICTIONARY commands are shown. Several types of dictionaries can be added, and the commands for adding them differ.
For example, if you wish to add an Ispell dictionary, you would do it like this:
CREATE TEXT SEARCH DICTIONARY my_lang_ispell (
    TEMPLATE = ispell,
    DictFile = my_lang,    -- reads my_lang.dict
    AffFile = my_lang,     -- reads my_lang.affix
    StopWords = my_lang    -- reads my_lang.stop
);
DictFile and AffFile name the dictionary and affix files, which you need to find for the language you want to add (the ispell template accepts Ispell, MySpell, and Hunspell formats). The StopWords file lists words that should be ignored; you can usually find one on the Internet as well. All three parameters take base names rather than paths: PostgreSQL looks for the files in $SHAREDIR/tsearch_data with the extensions .dict, .affix, and .stop, and expects them to be UTF-8 encoded.

Related

Full Text search with multiple synonyms in PostgreSQL

I am implementing full text search with PostgreSQL. I am using the following type of query to search the document column.
FROM schema.table t0
WHERE t0.document @@ websearch_to_tsquery('error')
I am trying to use FTS dictionaries to search for similar words. I came across the C:\Program Files\PostgreSQL\14\share\tsearch_data folder, where I have defined a word and its synonyms in the xsyn_sample.rules file. The file content is shown below.
# Sample rules file for eXtended Synonym (xsyn) dictionary
# format is as follows:
#
# word synonym1 synonym2 ...
#
error fault issue mistake malfunctioning
I want to use this dictionary but don't know how. When I search for 'error', I want to get results for 'error', 'fault', 'issue', 'mistake', etc., which have similar meanings. Please share if you have come across such an implementation. A few things I am asking:
Is the xsyn_sample.rules file sufficient for this? If not, what other techniques can be used for this type of search?
How do I configure PostgreSQL 14 on my local system to use this dictionary instead of 'simple' or 'english'? I know how to use both of those with select plainto_tsquery('english','errors'); and select plainto_tsquery('simple','errors');. Similarly, I want to use my custom dictionary.
Is there any better source on dictionary use in Postgres than https://www.postgresql.org/docs/current/textsearch-dictionaries.html ?
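Don't edit the example rules file; create your own file mysyn.rules in the same tsearch_data folder and add the synonyms there. Then create a dictionary that uses the file (xsyn_template is provided by the dict_xsyn extension, so run CREATE EXTENSION dict_xsyn; first if it is not installed yet):
CREATE TEXT SEARCH DICTIONARY mysyn (TEMPLATE = xsyn_template, RULES = mysyn);
Then copy the English text search configuration and add your dictionary:
CREATE TEXT SEARCH CONFIGURATION myconf (COPY = english);
ALTER TEXT SEARCH CONFIGURATION myconf
    ALTER MAPPING FOR word, asciiword WITH mysyn, english_stem;
To use it, pass the configuration name as the first argument, the same way you already do with 'english', for example plainto_tsquery('myconf', 'errors').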
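Before copying mysyn.rules into tsearch_data, it can help to check that every line follows the `word synonym1 synonym2 ...` layout from the sample file. The following is only a rough sketch for previewing the file: parse_xsyn_rules is a made-up helper name, not part of PostgreSQL, and it assumes that comment lines start with # as in the sample.

```python
def parse_xsyn_rules(text):
    """Map each head word to its list of synonyms.

    Assumption: blank lines and lines starting with '#' are ignored,
    mirroring the layout of xsyn_sample.rules.
    """
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # First token is the word, the remaining tokens are its synonyms.
        word, *synonyms = line.split()
        rules[word] = synonyms
    return rules


sample = """\
# word synonym1 synonym2 ...
error fault issue mistake malfunctioning
"""
print(parse_xsyn_rules(sample))
```

Note that each line is keyed by its first word only, so a line starting with error says nothing about what happens when someone searches for fault.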

How to search for a part of a word and replace the whole word afterwards?

I have a relatively big Word document containing Tweets. I now want to replace all links in that file. Every link starts with http. Is there a way to find all words that start with http and then delete the whole word/link?
I have tried using the search and replace option but I couldn't find a solution.
Any help is much appreciated!
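You could use Python to process the entire file.
I would advise creating a second file where you store the result.
Here is a little code that should do what you want (it assumes the text has been saved as a plain-text file):
# Read the whole file into memory
with open("filename", "r", encoding="utf-8") as f:
    content = f.read()
# Split on single spaces and drop every word that starts with "http"
words = content.split(" ")
filtered_words = [word for word in words if not word.startswith("http")]
content = " ".join(filtered_words)
# Write the result to a second file so the original stays untouched
with open("filename_result", "w", encoding="utf-8") as f:
    f.write(content)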
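Splitting on single spaces misses a link that directly follows a line break, because that "word" then begins with the end of the previous line. A regular expression handles that case. This is a minimal sketch, assuming the tweets are available as a plain-text string; strip_links and the sample text are illustrative names, not from the answer above.

```python
import re


def strip_links(text: str) -> str:
    """Delete every whitespace-delimited token that starts with 'http'."""
    # (?<!\S) only matches at the start of the text or right after whitespace,
    # and \S* consumes the rest of the link up to the next whitespace.
    without_links = re.sub(r"(?<!\S)http\S*", "", text)
    # Collapse the runs of spaces or tabs left behind by removed links.
    return re.sub(r"[ \t]{2,}", " ", without_links)


sample = "Great read http://example.com/a today\nhttps://t.co/xyz\nsee you"
print(strip_links(sample))
```

Line breaks are kept as they are, so a line that contained only a link becomes an empty line.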
I found an answer: I first used the iOS version of Word, but there I couldn't use wildcards. I then switched to a Windows version of Word and, in the search and replace dialog, used "http*" to search for all links.

Find or replace a string in word documents in a given folder

How can I find or replace a word in the documents in a given folder?
Is there a tool or a script available to do that?
Thanks.
Finding text is straightforward. There are several ways to do it, including using the Windows search utility. Here's an article with several methods: Search through the content of multiple Word documents
To find and replace, you can use a free text editor like Notepad++, which has a very good Find in Files utility. Note that this works on plain-text files; .docx files are zipped XML, so for those you need a tool that understands the Word format. There are many other utilities that can do this, some paid and some free.
Finally, you can write a VBA macro that finds and replaces text in all documents in a folder. Here's a page with a macro listing that does that: How to Find and Replace Contents in Multiple Word Documents
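For the finding half, a short script is another option. The sketch below is an illustration under stated assumptions, not a tool from the articles above. It uses only the Python standard library, treats each .docx as a zip archive, and reads word/document.xml using the usual WordprocessingML namespace. The names docx_paragraphs and find_in_folder are made up for this example. It only searches (no replace), and it does not scan headers, footers, or footnotes, which are stored in other parts of the archive.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

# Namespace used by w:p and w:t elements in standard .docx files (assumption:
# documents saved in the "Strict Open XML" variant use a different namespace).
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Yield the text of each paragraph in the main body of a .docx file."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join them before searching.
        yield "".join(node.text or "" for node in para.iter(W_NS + "t"))


def find_in_folder(folder, needle):
    """Return (file name, paragraph number, paragraph text) for every hit."""
    hits = []
    for path in sorted(Path(folder).glob("*.docx")):
        if path.name.startswith("~$"):
            continue  # skip Word's temporary lock files, which are not zips
        for number, text in enumerate(docx_paragraphs(path), start=1):
            if needle in text:
                hits.append((path.name, number, text))
    return hits

# Example call (the folder name is hypothetical):
#   for name, number, text in find_in_folder("reports", "invoice"):
#       print(name, number, text)
```

Joining the runs of a paragraph before searching matters because Word often splits a single visible word across several runs.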

Trying to drop the contents of one Word document into another

This is in an IDTExtensibility2 (not VSTO) Word AddIn. I'm trying to do a drag and drop where I programmatically give the contents of one document to DoDragDrop(). The problem is that instead of dropping the contents into the other document, it inserts them as an embedded Word document.
My code is basically:
srcDoc.Activate();
activeWindow = srcDoc.ActiveWindow;
selection = activeWindow.Selection;
selection.WholeStory();
selection.Copy();
data = Clipboard.GetDataObject();
DragDropEffects effect = DoDragDrop(data, DragDropEffects.Copy | DragDropEffects.Scroll);
How can I have it paste the contents instead of pasting the document?
Please note, I want to paste all of the content in the source document at the insertion point of the drop in the destination document. So it is not a matter of copying over the full contents of the destination (therefore a file-based approach won't work).
And it's drag/drop so I don't have the control that Clipboard.Paste() provides to specify the pasting format. In addition, the format I need is the native DOCX format to bring all properties & formatting across.
thanks - dave

Find my Postgres text search dictionaries

I created a thesaurus for full text search a few months back. I just recently added some entries, and (I think) I updated it like this:
ALTER TEXT SEARCH CONFIGURATION english
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
WITH [my_thesaurus], english_stem;
However, I don't actually remember what my thesaurus was called. How can I figure this out?
You may find it in the output of:
SELECT dictname FROM pg_catalog.pg_ts_dict;
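If you use the psql client, you can use the following command.
\dFd[+] [PATTERN]
list text search dictionaries
Basically, you can use \dFd+ to list all dictionaries along with their initialization options. To see which dictionaries are mapped to each token type of the english configuration, you can also run \dF+ english.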