I am trying to use PostgreSQL to implement a full-text search system.
I have run into a behaviour that is strange, or maybe intended.
When indexing or searching a column that contains file names with extensions (e.g. myimage.jpg), the system treats each name as a URL and does not tokenize it properly.
I checked the documentation, and ts_debug shows that the file name is classified as the host of a URL.
Could someone tell me how to make PostgreSQL's FTS treat all input as normal words?
As a second question, how can one do contains, startswith, and endswith searches with it?
Update
I have now tried a create text search configuration... statement: I copied pg_catalog.english, removed the host, url, and url_path mappings, and passed the new configuration to ts_debug. Still no luck: myimage.jpg is still identified as a host.
Version
I use version 9.4
tl;dr Consider pre-parsing your input and removing punctuation if you really only want words (and not emails, URLs, hosts, etc.).
After trying to figure this out myself, the issue seems to be that you can't easily customise the parser. From my understanding, the parser runs first and generates tokens. Those tokens are then matched to dictionaries.
By removing host, url, and url_path from the configuration, all you are doing is stopping these tokens from being looked up in a dictionary, so no lexemes are produced from them. That essentially means they don't exist as far as search is concerned, which is not what you want.
Ideally you would customise the parser so that it does not generate those tokens in the first place, or so that it also generates overlapping tokens (similar to how hyphenated words generate a token for the entire word as well as for the individual components). This doesn't seem to be possible at the moment without writing a custom parser.
The only solution I can see is to pre-parse the text to remove the full stop. Note that if you rely on other token types such as version (e.g. 8.3.0) or email (e.g. name@domain.com), this will break them, so you may need to be a bit clever about which characters you remove.
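One way to be "a bit clever" is to do the replacement in application code before the text reaches PostgreSQL. The following is only a sketch under assumptions: the names FILE_LIKE and preparse_filenames are made up for illustration, and the rule used (split on a dot only when the part after the dot starts with a letter) is a heuristic, not anything PostgreSQL provides. It also splits host names such as example.com, which is what is wanted if everything should be treated as plain words.

```python
import re

# Hypothetical pattern: word characters/hyphens, then one or more ".ext"
# parts where each part starts with a letter. "8.3.0" does not match because
# the parts after the dots start with digits; e-mail addresses do not match
# because "@" is not allowed by the pattern.
FILE_LIKE = re.compile(r"^[\w-]+(\.[A-Za-z][\w-]*)+$")


def preparse_filenames(text: str) -> str:
    """Replace '.' with a space in tokens that look like file names."""
    tokens = []
    for token in text.split():
        if FILE_LIKE.match(token):
            token = token.replace(".", " ")
        tokens.append(token)
    # Note: split()/join() also collapses runs of whitespace.
    return " ".join(tokens)


print(preparse_filenames("see this-is-a-file.jpg from name@domain.com on 8.3.0"))
```

The cleaned string would then be passed to to_tsvector in place of the raw text. For comparison, the blanket replacement done directly in SQL tokenizes as follows: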
select ts_debug('english', replace('this-is-a-file.jpg', '.', ' '));
"(asciihword,"Hyphenated word, all ASCII",this-is-a-file,{english_stem},english_stem,{this-is-a-fil})"
"(hword_asciipart,"Hyphenated word part, all ASCII",this,{english_stem},english_stem,{})"
"(blank,"Space symbols",-,{},,)"
"(hword_asciipart,"Hyphenated word part, all ASCII",is,{english_stem},english_stem,{})"
"(blank,"Space symbols",-,{},,)"
"(hword_asciipart,"Hyphenated word part, all ASCII",a,{english_stem},english_stem,{})"
"(blank,"Space symbols",-,{},,)"
"(hword_asciipart,"Hyphenated word part, all ASCII",file,{english_stem},english_stem,{file})"
"(blank,"Space symbols"," ",{},,)"
"(asciiword,"Word, all ASCII",jpg,{english_stem},english_stem,{jpg})"
As for your second question: are you talking about partial word matches? You get this to some extent from stemming when using a config like english, so running becomes run, which will match whether you search for run or running. If you're talking about fuzzy matching, it gets a little more complicated. I suggest reading this article: http://rachbelaid.com/postgres-full-text-search-is-good-enough/
I'm having a hard time locating the documentation explaining how to add a line break in Doxygen markdown.
I've tried using two spaces at the end of the line, and I've also tried a single or double newline, but none of these are working for me.
I'm using Doxygen version 1.8.9.1.
Put an HTML break tag <br> where you want the line break; that does the job.
At least for man page output and HTML output.
Add \n followed by a space at the end of your line [1].
Especially recommended when editing with Emacs, which reacts in weird ways to the other suggested solution <br>.
[1] As per @albert's comment. Works for me under Doxygen 1.8.13.
I am wondering if I have some code with line numbers embedded,
1 int a;
2 MyC b;
3 YourC c;
etc., and then I copy it and try to paste it into Eclipse, how can I get rid of these line numbers to make the source code valid? Is there a convenient way, or a shortcut key?
Thank you.
Simply use the Alt+Shift+A shortcut (Eclipse 3.5 M5 and above) to toggle block selection mode. Then select the column with the line numbers and delete it!
To make it easier you could set up a macro, but for that you need an additional plug-in. I'm not aware of an easier way.
Try this link. This is a dynamic online tool, where it is very easy to just copy paste code and get code without line numbers:
http://remove-line-numbers.ruurtjan.com/
You could use some script to do the work. For instance, using sed
I removed the line numbers using find and replace with the regular expression option.
I replaced the regular expression \d+\s\s with an empty string, where \d+ matches one or more digits and each \s matches a whitespace character. Requiring two whitespace characters after the number avoids matching numbers that appear in the code itself.
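The same idea can be scripted. Here is a minimal sketch in Python, assuming the numbers sit at the very start of each line (optionally preceded by spaces or tabs) and are followed by at most one separator character; the names LINE_NUMBER and strip_line_numbers are illustrative only. Anchoring the pattern to the start of the line is what keeps numbers inside the code from being touched, though a code line that itself begins with a number would still be stripped.

```python
import re

# ^ with re.MULTILINE anchors at the start of every line.
# [ \t] is used instead of \s so the match can never run across a newline.
LINE_NUMBER = re.compile(r"^[ \t]*\d+[ \t]?", re.MULTILINE)


def strip_line_numbers(code: str) -> str:
    """Remove a leading line number (and one separator) from each line."""
    return LINE_NUMBER.sub("", code)


pasted = "1 int a;\n2 MyC b;\n3 YourC c;"
print(strip_line_numbers(pasted))
```

Only one separator character is removed after the number, so any indentation that follows it is preserved.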
Another way is a sed-style substitution, which handles line numbers with any number of digits.
In the example below, the copied code is opened in Vim:
:%s/^\s*[0-9]\+\s\?//
Here [0-9]\+ matches one or more digits at the start of the line, so the same command works however many lines there are.
I want to automate some Writer tasks. I need to create an .odt Writer
document with oo:doc, using methods such as create paragraph and append
paragraph. The problem is that append paragraph and create paragraph do not
allow text to start in the middle of the page or at a certain column, i.e.
Name Surname Address
When I unzip the "master" document I want to create and inspect the content.xml file, I see that the XML equivalent is
" <text:p text:style-name="Text_20_body"><text:s text:c="115"/><text:span text:style-name="T1"><text:s/>Hallo how are you today</text:span></text:p><text:p text:style-name="P1"><text:s text:c="116"/>I hope you are well also</text:p><text:p text:style-name="P1""
How do I set the text:c and text:s element(s) from within oo::doc?
Question 2:
How do I set the formatting of a paragraph
so that it only extends from, e.g., column 20 to column 80?
thanks
The text:s element represents a run of space characters; its text:c attribute says how many spaces there are.
That doesn't strike me as a solution to what you want, which is to change the margins and position of a paragraph, yes?
Do you have a document that you want to use as a template, where the text will be inserted? Or are you trying to create the entire page from scratch?
I think you want to use OpenOffice.org to create a Writer document that has the structure you want, then look at the XML to see what the markup is that accomplishes that. Look at paragraph-level styles or even frames if that is what is used. You might be able to create insertion points for your generated content by then adding magic-text phrases that you can scan for.
Then figure out how to get that done with the perl module.
In a branch spec, I have the following view:
//depot/dev/t/a/g/... //depot/dev/t/r/g/...
-//depot/dev/t/a/g/p/o*/... //depot/dev/t/r/g/p/...
Perforce reports an "Incompatible wildcards" error for the second rule there.
What I'd like to do is exclude all the directories beginning with "o".
What am I doing wrong, and how do I fix this?
I think you need to have matching wildcards on both sides of each mapping. Try:
//depot/dev/t/a/g/... //depot/dev/t/r/g/...
-//depot/dev/t/a/g/p/o*/... //depot/dev/t/r/g/p/o*/...
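To spot such mismatches before submitting a spec, a rough check can compare the wildcards on the two sides of each mapping line. This is only an approximation of whatever Perforce does internally: the function name wildcards_match is made up, and the sketch assumes paths contain no spaces (so no quoting) and considers only the "...", "*", and %%n wildcards.

```python
import re

# Matches the Perforce wildcards "...", "*" and positional %%1 .. %%9.
WILDCARD = re.compile(r"\.\.\.|\*|%%\d")


def wildcards_match(mapping: str) -> bool:
    """Return True if both sides of a view line use the same wildcards."""
    # Drop a leading exclusion/overlay marker, then split into the two paths.
    left, right = mapping.strip().lstrip("-+").split()
    return sorted(WILDCARD.findall(left)) == sorted(WILDCARD.findall(right))


for line in [
    "//depot/dev/t/a/g/... //depot/dev/t/r/g/...",
    "-//depot/dev/t/a/g/p/o*/... //depot/dev/t/r/g/p/...",
    "-//depot/dev/t/a/g/p/o*/... //depot/dev/t/r/g/p/o*/...",
]:
    print(wildcards_match(line), line)
```

The second line is the rule from the question: its left side has a "*" that the right side lacks.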
While not a direct answer to the question (answered above), I was stumped by the same message and found this post while searching for a solution.
In my case, the cause was that, when I copy-pasted the workspace mapping from another file, an ellipsis character (…) had replaced the Perforce "..." wildcard. To fix this, I deleted the ellipsis and typed three periods in its place.
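If the mapping text passes through a script before it reaches Perforce, the same repair can be automated. A minimal sketch, assuming the only problem is the single Unicode ellipsis character U+2026 (the helper name fix_ellipsis is illustrative):

```python
def fix_ellipsis(view: str) -> str:
    """Replace the single-character ellipsis (U+2026) with three periods."""
    return view.replace("\u2026", "...")


pasted = "//depot/dev/t/a/g/\u2026 //depot/dev/t/r/g/\u2026"
print(fix_ellipsis(pasted))
```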