How can I change the to_tsvector configuration to use a simple tokenization rule like:
lowercase
split by spaces only
Executing the following query:
SELECT to_tsvector('english', 'birthday=19770531 Name=John-Oliver Age=44 Code=AAA-345')
I get these lexemes:
'-345':9 '19770531':2 '44':6 'aaa':8 'age':5 'birthday':1 'code':7 'john':4 'name':3
The kind of searching I'm looking for is like:
(!birthday | birthday=19770531) & (code=AAA-345)
It means, get me all records that has a text "birthday=19770531" or doesn't have "birthday" at all, and a text equals to "code=AAA-345"). The way lexemes are being created it is not possible. I was expecting to have something like this:
'birthday=19770531':1 'age=44':2 'code=aaa-345':4 'name=john-oliver':3
You would have to code a custom parser. This can only be done in C.
But you might be able to use the existing testing parser test_parser, it seems to do what you want. If not, it would at least be a good starting point.
The problem may be that this is in src/test/modules/, and I don't think it ships with most installation packaging. So it might take some effort to get it to install. It would depend on your OS, version, and package manager.
Related
Basically i'm trying to do a simple join. I'm a beginner in progress and even if i'm reading always the same things... my problem still unresolved ! :'(
I'm using unixodbc to communicate with my base and this is working like a charm when i'm using simple command like : SELECT * from PUB."Art"
I understood I have to do something who looks like that to join 2 tables :
FOR EACH PUB."Art" WHERE (PUB."Art".IdArt = 16969) ,
EACH PUB."ArtDet" WHERE (PUB."ArtDet".IdArt = PUB."Art".IdArt)
END
But this only return me [ISQL]ERROR: Could not SQLPrepare
I then try to simplify the thing with :
for each PUB."Art": display PUB."Art".IdArt end.
I try to put colon (or not) after the for each loop, using point / comma etc... but I never use the right syntax apparently... or I'm missing a thing to execute this command !
Is anyone can advice me ?
Thx a lot !
You appear to mixing SQL and 4GL syntax.
"FOR EACH" is 4GL. The SQL equivalent is "SELECT".
(If you are using 4GL you do not need then "PUB" prefix and quoting table and field names will not work.)
To do a join with SQL (or the 4GL) use a "," between the table names. For SQL your syntax would look something like:
SELECT * from PUB."Art", PUB."ArtDet"
Gory details regarding WHERE clauses, SQL INNER & OUTER joins etc. can be found in the online documentation:
https://community.progress.com/community_groups/openedge_general/w/openedgegeneral/1329.openedge-product-documentation-overview
You will want to navigate to your specific release and then find the "SQL" guide.
If you search for an airport (aeroway=aerodrome) around brescia, italy, you will also receive a hit for a military airfield, which happens to be tagged as an aerodrome also (it's taggged: aeroway=aerodrome, landuse=military, military=airfield). To avoid this I want to search for aeroway=aerodrome but exclude [military]. I've tried [! military] and [military~"^$"]. Any suggestions?
This particular case may be rare, I realize, but the concept of negating multi-classed elements is useful. And multi-classed elements is not a rare occurance. In general, they seem to be complimentary, not conflicting, so it's not an issue. I also realize that I can weed out conflicting hits with some back-end processing. I wasn't expecting a military airfield to appear with a commercial aerodrome.
In any case, here is a shortened version of my query. I include node, way and relation in full query:
http://overpass-api.de/api/interpreter?
data=[out:json][timeout:25][bbox:45.400861,9.868469,45.641408,10.542755];
(node[aeroway~%22aero|term|heli%22][! military]; ... ) out etc
or:
http://overpass-api.de/api/interpreter?
data=[out:json][timeout:25][bbox:45.400861,9.868469,45.641408,10.542755];
(node[aeroway~%22aero|term|heli%22][military~%22^$%22]; ... ) out etc
If you try to run it, you'll need to include way and relation.
Also, as you can see I don't exactly ask for aeroway=aerodrome. I include terminal and variations on heliport. My experience has been that some aerodromes are tagged only as "terminal", so if you're looking for an airport, asking for "aerodrome" isn't enough.
The correct syntax for negation is as follows:
[military !~ ".*"]
Please see the documentation on the OSM wiki for details.
How to get rid of this warning in statements like this:
pg_prepare('select * from table where id=$1', array($id));
This is no error, but a little annoying thing.
You are using v9.5 -- you really should state such info in your question -- settings screen is a bit different from current stable v9.0.2.
Edit your 2nd Parameters Pattern: make sure it has \ in front of ? -- somehow it interferes with other patterns.
It should be \?\d+ at least.
i am using whoosh to index over 200,000 books. but i have encountered some problems with it.
the whoosh query parser returns NullQuery for words like "C#", "C++" with meta-characters in them and also for some other short words. this words are used in the title and body of some documents so i am not using keyword type for them. i guess the problem is in the analysis or query-parsing phase of searching or indexing but i can't touch my data blindly. can anyone help me to correct this issue. Tnx.
i fixed the problem by creating a StandardAnalyzer with a regex pattern that meets my requirements,here is the regex pattern:
'\w+[#+.\w]*'
this will make tokenizing of fields to be done successfully, and also the searching goes well.
but when i use queries like "some query++*" or "some##*" the parsed query will be a single Every query, just the '*'. also i found that this is not related to my analyzer and this is the Whoosh's default behavior. so here is my new question: is this behavior correct or it is a bug??
note: removing the WildcardPlugin from the query-parser solves this problem but i also need the WildcardPlugin.
now i am using the following code:
from whoosh.util import rcompile
#for matching words like: '.NET', 'C++' and 'C#'
word_pattern = rcompile('(\.|[\w]+)(\.?\w+|#|\+\+)*')
#i don't need words shorter that two characters so i don't change the minsize default
analyzer = analysis.StandardAnalyzer(expression=word_pattern)
... now in my schema:
...
title = fields.TEXT(analyzer=analyzer),
...
this will solve my first problem, yes. but the main problem is in searching. i don't want to let users to search using the Every query or *. but when i parse queries like C++* i end up an Every(*) query. i know that there is some problem but i can't figure out what it is.
I had the same issue and found out that StandardAnalyzer() uses minsize=2 by default. So in your schema, you have to tell it otherwise.
schema = whoosh.fields.Schema(
name = whoosh.fields.TEXT(stored=True, analyzer=whoosh.analysis.StandardAnalyzer(minsize=1)),
# ...
)
I have an MS SQL Server 2008 Database, from which I am fetching data using perl DBD::Sybase module. But there are some special characters in the DB, like the Copyright symbol, Trademark symbol etc., which are not getting imported properly. Perl seems to change all of these special characters to a Question mark character. Is there a way to fix this?
I have tried specifying charset=utf8 in the connection string. The doc mentions a syb_enable_utf8 (bool) setting, but whenever I try that, I get an error:
Can't locate object method "syb_enable_utf8" via package "DBI::db"
One solution I found was this:
use Encode qw(encode_utf8);
Then, wherever you are writing data to a file or anywhere else, use Encode::encode_utf8($data);
where $data is the column/value which you have fetched from MSSQL.
I don't use DBD::Sybase but a) I use a lot of other DBDs and b) I am currently collecting information about unicode support in DBDs. According to the pod you need at least OpenClient 15.x when using syb_enable_utf8. Are you using 15.x or later? Perhaps syb_enable_utf8 is not defined if your client is less than 15.x or perhaps you have too old a version of DBD::Sybase. Unfortunately I cannot see from the Changes file when syb_enable_utf8 was added.
However, when you say "can't locate method" I think that is a clue as syb_enable_utf8 is not a method, it is an attribute (it is under Sybase Specific Attributes) in the pod. So you need to add it to your connect call or set it via a connection handle like this:
my $h = DBI->connect("dbi:Sybase:something","user","password", {syb_enable_utf8 => 1});
or
$h->{syb_enable_utf8} = 1;
You should also read the bits in the pod describing what happens when syb_enable_utf8 is set as it appears from the documents it only applies to UNIVARCHAR, UNICHAR, and UNITEXT columns.
Lastly, you need to ensure you insert the data correctly in the first place. I'd guess if it is not inserted from Perl with syb_enable_utf8 and charset=utf8 and your data is not proper unicode characters in Perl before you insert you'll get garbage back.
The comment Raze2dust made had nothing to do with your issue but is worth heeding if you are going to write the data retrieved from your database elsewhere. Just remember to decode any data input to your script and encode any data output.