PostgreSQL full-text search in Japanese, Chinese, and Arabic

I'm designing a full-text search function in PostgreSQL for my current project.
It works OK with ispell/MySpell dictionaries so far.
Now I need to add support for Chinese, Japanese, and Arabic search.
Where do I start?
There are no templates or dictionaries available for those languages,
as far as I can see.
Will it work with the pg_catalog.simple configuration?

Just a hint from the manual: A large list of dictionaries is available on the OpenOffice Wiki.

Dictionaries won't help you much with Chinese - you'll need to look into n-gram tokenising...
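To see why, try the built-in parser on an unsegmented Chinese sentence (a rough illustration; exact output can vary with server encoding and locale). Because there are no spaces between words, the default parser typically emits the whole run of characters as a single lexeme, so word-level matching needs a segmenter or n-grams:
SELECT to_tsvector('simple', '我爱北京天安门');
-- typically returns the whole sentence as one lexeme, e.g. '我爱北京天安门':1,
-- rather than splitting it into searchable words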

A similar question on stackoverflow.com is "How do I implement full text search in Chinese on PostgreSQL?".
That said, I'll provide a detailed solution below, based on my own experience and on resources found online. I use the two tools SCWS and zhparser for Chinese full-text search in Postgres.
Update (2016-01-31):
Check whether you have installed postgresql-server-devel-{version number}, because we will use its PGXS infrastructure to build the extension for PostgreSQL.
Step 1: Install SCWS.
Note that --prefix=/usr/local/scws follows ./configure in the fourth line below; don't run a bare ./configure.
wget http://www.xunsearch.com/scws/down/scws-1.2.2.tar.bz2
tar xvjf scws-1.2.2.tar.bz2
cd scws-1.2.2
./configure --prefix=/usr/local/scws
make
make install
To check whether it installed successfully, run the command below:
ls -al /usr/local/scws/lib/libscws.la
Step 2: Install zhparser
git clone https://github.com/amutu/zhparser.git
cd zhparser
SCWS_HOME=/usr/local/scws/include make && make install
Update (2016-01-31):
If you use Mac OS X Yosemite, the SCWS_HOME value above is the same. But if you use Ubuntu 14.04 LTS, change SCWS_HOME to /usr/local/scws.
Step 3: Configure a new extension using zhparser in Postgres
Step 3.1: Log in to your Postgres database from the terminal/command line
psql yourdatabasename
Step 3.2: Create the extension in Postgres. You can specify whatever dictionary name you want.
CREATE EXTENSION zhparser;
CREATE TEXT SEARCH CONFIGURATION dictionarynameyouwant (PARSER = zhparser);
ALTER TEXT SEARCH CONFIGURATION dictionarynameyouwant ADD MAPPING FOR n,v,a,i,e,l WITH simple;
If you follow the steps above, you can now use Postgres full-text search on Chinese/Mandarin text.
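For a quick sanity check of the new configuration (the posts table and body column below are just placeholders for illustration), you can run something like:
-- see how a Chinese sentence is tokenised by the zhparser-based configuration
SELECT to_tsvector('dictionarynameyouwant', '我爱北京天安门');
-- and use it against your own table/column
SELECT * FROM posts
WHERE to_tsvector('dictionarynameyouwant', body) @@ to_tsquery('dictionarynameyouwant', '北京');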
Extra (optional) step in Rails when using the pg_search gem. Step 4: Configure the dictionary name in the :dictionary attribute of :tsearch in app/models/yourmodel.rb
class YourOwnClass < ActiveRecord::Base
  ...
  include PgSearch
  pg_search_scope :functionnameyoulike,
                  :against => [columnsyoulike1, columnsyoulike2, ..., etc],
                  :using => { :tsearch => { :dictionary => "dictionary name you just specified when creating the extension in postgres", ..., etc } }
end
References:
1. SCWS install tutorial
2. Zhparser#github.com
3. Francs' Post - Postgres full-text search in Chinese with zhparser and SCWS
4. Rails365.net's Post - Postgres full-text search in Chinese with pg_search gem with zhparser
5. My Post at xuite.net - Make Postgres support full text search in Mandarin/Chinese

Related

Where to find Ukrainian 'ispell', 'aspell', 'snowball' dictionary for adding it to full-text search in Postgres?

After parsing many documents, I have a lot of rows/columns with Ukrainian text that should be indexed for full-text search in Postgres.
I've found that Postgres 14 supports 29 languages by default, but unfortunately not Ukrainian.
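For reference, the built-in configurations can be listed with:
SELECT cfgname FROM pg_ts_config;
-- or, inside psql: \dF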
After some further digging, I found that it allows adding an external dictionary:
CREATE TEXT SEARCH DICTIONARY my_lang_ispell (
TEMPLATE = ispell,
DictFile = path_to_my_lang_dict_file,
AffFile = path_to_my_lang_affixes_file,
StopWords = path_to_my_lang_astop_words_file
);
But how to find the most relevant DictFile, AffFile, and StopWords files? For example, snowball source doesn't contain this language.
So, could anyone help me find the best way to obtain ispell, aspell, snowball, or another dictionary for the Ukrainian language?
Thanks!
After deeper exploration, I found a solution in this resource: dict_uk.
Compile the files manually following this guide:
$ sudo snap install gradle
$ cd dict_uk
$ ./gradlew expand
$ cd distr/hunspell/
$ ../../gradlew hunspell
$ sudo cp build/hunspell/uk_UA.aff /usr/share/postgresql/12/tsearch_data/uk_ua.affix
$ sudo cp build/hunspell/uk_UA.dic /usr/share/postgresql/12/tsearch_data/uk_ua.dict
$ sudo cp ../postgresql/ukrainian.stop /usr/share/postgresql/12/tsearch_data/ukrainian.stop
Or just download all the files from here.
Then follow this guide to set up the Ukrainian language in Postgres:
$ sudo cp uk_UA.affix /usr/share/postgresql/12/tsearch_data/uk_ua.affix
$ sudo cp uk_UA.dic /usr/share/postgresql/12/tsearch_data/uk_ua.dict
$ sudo cp ukrainian.stop /usr/share/postgresql/12/tsearch_data/ukrainian.stop
$ sudo su postgres
$ psql
CREATE TEXT SEARCH DICTIONARY ukrainian_huns (TEMPLATE = ispell, DictFile = uk_ua, AffFile = uk_ua, StopWords = ukrainian);
CREATE TEXT SEARCH DICTIONARY ukrainian_stem (template = simple, stopwords = ukrainian);
CREATE TEXT SEARCH CONFIGURATION ukrainian (PARSER=default);
ALTER TEXT SEARCH CONFIGURATION ukrainian ALTER MAPPING FOR hword, hword_part, word WITH ukrainian_huns, ukrainian_stem;
ALTER TEXT SEARCH CONFIGURATION ukrainian ALTER MAPPING FOR int, uint, numhword, numword, hword_numpart, email, float, file, url, url_path, version, host, sfloat WITH simple;
ALTER TEXT SEARCH CONFIGURATION ukrainian ALTER MAPPING FOR asciihword, asciiword, hword_asciipart WITH english_stem;
# \dFd
...
pg_catalog | english_stem | snowball stemmer for english language
...
public | ukrainian_huns |
public | ukrainian_stem |
Now it can be used to create a searchable column with the help of to_tsvector:
ALTER TABLE extracted_pages
ADD COLUMN tsvector_uk tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('ukrainian', coalesce(column_with_text, '')), 'A')
) STORED;
This example shows the correct stemming for the Ukrainian language:
SELECT to_tsvector('ukrainian', 'солодко дзюрчить джерело і хочеться жити, любити, творити... ');
=> [{"to_tsvector"=>"'джерело':3 'дзюрчати':2 'жити':6 'любити':7 'солодко':1 'творити':8 'хочеться':5"}]
Results
Postgres full-text search works as well as a comparable text search engine, SphinxSearch, in terms of result quality, but it is a bit slower.
On the same query over a large set of records (278,000), both return the same results:
Postgres - ActiveRecord: 67.6ms
SphinxSearch - ActiveRecord: 10.9ms
OS: Ubuntu 20.04
Thank you very much, dict_uk support team!

PostGIS: function ST_AsRaster does not exist. Even using examples from the docs

I'm trying to convert geometries to images, and the functions to do so don't seem to exist.
The following example is from the ST_AsRaster docs, which specify the requirements as "Availability: 2.0.0 - requires GDAL >= 1.6.0".
SELECT ST_AsPNG(ST_AsRaster(ST_Buffer(ST_Point(1,5),10),150, 150));
This results in:
ERROR: function st_asraster(geometry, integer, integer) does not exist
LINE 1: SELECT ST_AsPNG(ST_AsRaster(ST_Buffer(ST_Point(1,5),10),150,...
I found some info that points towards needing GDAL drivers, however, when I try:
SELECT short_name, long_name FROM ST_GdalDrivers();
I get:
ERROR: function st_gdaldrivers() does not exist
LINE 1: SELECT short_name, long_name FROM ST_GdalDrivers();
I have no idea where to even start solving this. Why don't the functions exist? Was there some config I needed to add, or some doc I didn't read?
Even https://postgis.net/docs/RT_reference.html seems to suggest that it should "just work".
This is installed from the package manager on Ubuntu 20.04.
Version info (SELECT PostGIS_Full_Version();):
POSTGIS="3.0.0 r17983" [EXTENSION]
PGSQL="120"
GEOS="3.8.0-CAPI-1.13.1 "
PROJ="6.3.1"
LIBXML="2.9.4"
LIBJSON="0.13.1"
LIBPROTOBUF="1.3.3"
WAGYU="0.4.3 (Internal)"
You must have forgotten to install the postgis_raster extension:
CREATE EXTENSION postgis_raster;
This extension is new in PostGIS 3.0; before that, its objects were part of the postgis extension.
The documentation mentions that:
Once postgis is installed, it needs to be enabled in each individual database you want to use it in.
psql -d yourdatabase -c "CREATE EXTENSION postgis;"
-- if you built with raster support and want to install it --
psql -d yourdatabase -c "CREATE EXTENSION postgis_raster;"
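After creating the extension, the queries from the question should work. A quick check in the same database:
SELECT short_name, long_name FROM ST_GdalDrivers();
SELECT ST_AsPNG(ST_AsRaster(ST_Buffer(ST_Point(1,5),10),150, 150));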

pg_search trigram extension not working

Rails 5. I have the extension installed in the database:
pg_trgm | 1.1 | public | text similarity measurement and index searching based on trigrams
and in the initializer:
PgSearch.multisearch_options = {
:using => [:tsearch, :trigram],
}
I've tried it with only :trigram (not :tsearch); it doesn't work, even after db:reset and rake pg_search:multisearch:rebuild[AllModels].
Am I missing a step?
I'm the author of pg_search.
Trigram search requires the pg_trgm extension.
See the pg_search README and the wiki page for Installing PostgreSQL Extensions for more details.
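One extra thing worth checking (just a smoke test, not part of pg_search itself) is that pg_trgm is active in the same database your Rails app connects to:
-- should return a similarity score / boolean rather than a "function does not exist" error
SELECT similarity('search', 'searches');
SELECT 'search' % 'searches';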

Chef cookbook for installing mongodb-shell only

I am trying to install a mongo client via chef. Essentially this is what I have been doing in manual installs:
sudo vi /etc/yum.repos.d/mongodb.repo
[mongodb]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
gpgcheck=0
enabled=1
sudo yum install mongodb-org-shell-2.6.7
I don't want to reinvent the wheel here, nor do I want to install anything other than the shell. This cookbook looks like a good resource, but I cannot get it to install just the shell:
https://github.com/edelight/chef-mongodb
But it doesn't seem to allow installing only selected components. Will I need to write an LWRP?
Well, I picked apart the mongodb cookbook, to this tune:
yum_repository 'mongodb-org-3.0' do
description 'mongodb RPM Repository'
baseurl "http://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.0/#{node['kernel']['machine'] =~ /x86_64/ ? 'x86_64' : 'i686'}"
action :create
gpgcheck false
enabled true
end
case node['platform_family']
when 'debian'
# this option lets us bypass the complaint about a pre-existing init file
# necessary until upstream fixes the ENABLE_MONGOD/DB flag
packager_opts = '-o Dpkg::Options::="--force-confold" --force-yes'
when 'rhel'
# Add --nogpgcheck option when package is signed
# see: https://jira.mongodb.org/browse/SERVER-8770
packager_opts = '--nogpgcheck'
else
packager_opts = ''
end
package node[:frt_mongodb][:package_name] do
options packager_opts
action :install
version node[:frt_mongodb][:package_version]
end
That said, it looks like I should be able to use that cookbook, configured with the right attributes, to accomplish this. The biggest problem is that the recipe manipulates files that aren't necessary for the shell.

Is there a way to tell django compressor to create source maps

I want to be able to debug minified, compressed JavaScript code on my production site. Our site uses django-compressor to create minified and compressed JS files. I recently read about Chrome being able to use source maps to help debug such JavaScript. However, I don't know how (or whether it's possible) to tell django-compressor to create source maps when compressing the JS files.
I don't have a good answer regarding outputting separate source map files; however, I was able to get inline source maps working.
Prior to adding source maps my settings.py file used the following precompilers
COMPRESS_PRECOMPILERS = (
('text/coffeescript', 'coffee --compile --stdio'),
('text/less', 'lessc {infile} {outfile}'),
('text/x-sass', 'sass {infile} {outfile}'),
('text/x-scss', 'sass --scss {infile} {outfile}'),
('text/stylus', 'stylus < {infile} > {outfile}'),
)
After a quick
$ lessc --help
you find out you can put the Less source and the source map inline into the output CSS file. So my new text/less precompiler entry looks like:
('text/less', 'lessc --source-map-less-inline --source-map-map-inline {infile} {outfile}'),
Hope this helps.
Edit: Forgot to add that lessc >= 1.5.0 is required for this; to upgrade, use
$ [sudo] npm update -g less
While I couldn't get this to work with django-compressor (though it should be possible, I think I just had issues getting the app set up correctly), I was able to get it working with django-assets.
You'll need to add the appropriate command-line argument to the less filter source code as follows:
diff --git a/src/webassets/filter/less.py b/src/webassets/filter/less.py
index eb40658..a75f191 100644
--- a/src/webassets/filter/less.py
+++ b/src/webassets/filter/less.py
@@ -80,4 +80,4 @@ class Less(ExternalTool):
def input(self, in_, out, source_path, **kw):
# Set working directory to the source file so that includes are found
with working_directory(filename=source_path):
- self.subprocess([self.less or 'lessc', '-'], out, in_)
+ self.subprocess([self.less or 'lessc', '--line-numbers=mediaquery', '-'], out, in_)
Aside from that tiny addition:
make sure you've got the node -- not the ruby gem -- less compiler (>=1.3.2 IIRC) available in your path.
turn on the sass source-maps option buried away in Chrome's web inspector config pages. (Yes, 'sass', not less: less tweaked their debug-info format to match sass's, since sass had already implemented a Chrome-compatible mapping and their formats weren't that different to begin with anyway...)
Not out of the box, but you can write a custom filter by extending CompilerFilter:
from compressor.filters import CompilerFilter

class UglifyJSFilter(CompilerFilter):
    # the string pieces are joined by implicit concatenation with line
    # continuations, not by the "/" operator; note the trailing spaces
    command = "uglifyjs -c -m " \
              "--source-map-root={relroot}/ " \
              "--source-map-url={name}.map.js " \
              "--source-map={relpath}/{name}.map.js -o {output}"