Sphinx: how to use wordforms with soundex?

I am using Sphinx with soundex morphology and I want to use wordforms.
Which form of the word do I need to use as the destination?
call keywords ('azori', 'test', 1);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1 | azori | a260 | 1550 | 1551 |
+------+-----------+------------+------+------+
1 row in set (0.00 sec)
In the wordforms file, do I need to use
azouri > azori
or
azouri > a260

This is probably the critical bit from the documentation:
wordforms .. It can also be used to implement stemming exceptions, because stemming is not applied to words found in the forms list.
http://sphinxsearch.com/docs/current.html#conf-wordforms
In other words, the stemming step (actually morphology in general, and hence soundex) is not run on tokens that wordforms has transformed. So yes, you have to 'soundex' the destination keyword manually, i.e. use azouri > a260.
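To work out the destination keyword by hand, a small soundex helper can be used. The following is a minimal sketch of the classic American Soundex algorithm, lowercased to match the `normalized` column above. It assumes Sphinx's soundex morphology follows the classic rules, so treat the output as a starting point and confirm it with `call keywords`. The function name `soundex` is made up for this example.

```python
def soundex(word: str) -> str:
    """Classic American Soundex, lowercased (assumed to match Sphinx's output)."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit

    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""

    result = word[0]                 # the first letter is kept as-is
    prev = codes.get(word[0], "")    # code of the previous significant letter
    for ch in word[1:]:
        if ch in "hw":               # h and w do not separate equal codes
            continue
        digit = codes.get(ch, "")    # vowels map to "" and reset prev
        if digit and digit != prev:
            result += digit
        prev = digit
    return (result + "000")[:4]      # pad or truncate to four characters


for source in ("azori", "azouri"):
    # Print candidate wordforms lines in "source > destination" form.
    print(f"{source} > {soundex(source)}")
```

Under these rules both spellings map to the same code, so the wordforms line to write is the `azouri > a260` variant.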

Problem with Postgresql, LIKE gives double results

I am currently working with Postgresql and I am facing a problem.
I have two tables, "question" and "question_detail", which both contain codes. The codes in "question_detail" can include a subcode, e.g. TB01Q07, TB01Q07a, TB01Q08_SQ002. I wanted to use LIKE to check whether the table "question" also contains these records, but "question.code" only holds the codes without the underscore suffix. The tables were given to me like this, which I find rather awkward.
The problem is that when I search with LIKE, the value TB01Q07a is listed twice. I understand why: searching for TB01Q07% also returns TB01Q07a.
Does anyone know a way to match TB01Q07a only against itself, so that TB01Q07% does not also pick it up?
Command
SELECT qd.code, qd.label, q.type
FROM public.question q,
public.question_detail qd
where CASE
WHEN qd.code = q.code THEN qd.code = q.code
ELSE qd.code like CONCAT(q.code,'%')
END;
question
| code | type |
| ---------|-------- |
| TB01Q07 | comment |
| TB01Q07a | comment |
| TB01Q08 | option |
question_detail
| code | label |
| -------------- | ------|
| TB01Q07 | AB01 |
| TB01Q07a | AB02 |
| TB01Q08_SQL002 | AB03 |
I ran the SQL and wanted the TB01Q07a value to appear only once and not be listed twice.
I think I have found a solution with distinct on.
SELECT distinct on (qd.code) q.id_question,qd.code, q.question, q.question_type
FROM public.question q, public.question_detail qd
where qd.code like CONCAT(q.code,'%');
like('TB01Q07%') matches both TB01Q07 and TB01Q07a, so question TB01Q07 joins to two detail rows and question TB01Q07a joins to one. That is why TB01Q07a shows up twice.
You need to be more precise and include the underscore in the pattern. Also make sure it is escaped, because _ matches any single character in a like pattern.
There is no need for a case expression; use or. Avoid listing multiple tables in the from clause; use an explicit join with an explicit on condition. This is clearer and gives you more control over the join.
select qd.*, q.*
from public.question q
join public.question_detail qd
on qd.code = q.code OR qd.code like q.code || '\_%'
Note: this problem doesn't exist if you use foreign keys. Assign unique IDs to question and reference them in question_detail. This is faster, shields you from changes to the question code, and ensures the referred to question exists.
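The join condition can be tried out without a PostgreSQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, with the sample rows from the question. One difference is an assumption to keep in mind: SQLite has no default escape character for LIKE, so the sketch adds an explicit ESCAPE clause, whereas PostgreSQL treats the backslash as the escape character by default.

```python
import sqlite3

# In-memory stand-in for the two tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (code TEXT, type TEXT);
CREATE TABLE question_detail (code TEXT, label TEXT);
INSERT INTO question VALUES
    ('TB01Q07', 'comment'), ('TB01Q07a', 'comment'), ('TB01Q08', 'option');
INSERT INTO question_detail VALUES
    ('TB01Q07', 'AB01'), ('TB01Q07a', 'AB02'), ('TB01Q08_SQL002', 'AB03');
""")

# Exact match, or the question code followed by a literal underscore.
# ESCAPE '\' is required in SQLite; PostgreSQL uses backslash by default.
rows = conn.execute(r"""
SELECT qd.code, qd.label, q.type
FROM question q
JOIN question_detail qd
  ON qd.code = q.code OR qd.code LIKE q.code || '\_%' ESCAPE '\'
ORDER BY qd.code
""").fetchall()

for row in rows:
    print(row)
```

Each detail row is matched by exactly one question, so TB01Q07a is listed once.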

How can I add a table with multi-row cells to a readme in VSTS?

How can you add tables with multi-row cells to markdown in Microsoft VSTS?
I have previously used asciidoc for readme files on GitHub as it is both richer and less ambiguous. The company now has projects on VSTS, which does not support asciidoc, so I need to use markdown instead.
However, it is unclear what flavour of markdown is actually supported.
This page says that GitHub-flavoured markdown can be used:
https://learn.microsoft.com/en-us/vsts/collaborate/markdown-guidance
I found another page saying they use CommonMark via the markdown-it library:
Q: Does VS Code support GitHub Flavored Markdown?
A: No, VS Code targets the CommonMark Markdown specification using the
markdown-it library. GitHub is moving toward the CommonMark
specification which you can read about in this update.
I've been using a combination of asciidoctor and pandoc to convert files but nothing gets it quite right.
(Asciidoctor converts to docbook which pandoc can then parse)
asciidoctor -b docbook -v -o "$OUTPUT".xml "$INPUT" &&
pandoc -f docbook -t markdown_github -i "$OUTPUT".xml -o "$OUTPUT"
I have to re-add the title manually.
My current stumbling block is multi-row cells.
Github supports grid tables,
see Newline in markdown table?:
+---------------+---------------+--------------------+
| Fruit | Price | Advantages |
+===============+===============+====================+
| Bananas | first line\ | first line\ |
| | next line | next line |
+---------------+---------------+--------------------+
| Bananas | first line\ | first line\ |
| | next line | next line |
+---------------+---------------+--------------------+
But neither this nor embedded html seem to work in VSTS.
I would be happy to use html readmes instead if that was permitted.
Update 17-Nov-2017:
I found the link to markdown-it and added it above. I've raised an issue there for clarification (or enhancement). It's unclear which version VSTS actually uses under the hood.
I would like to ask the question of Microsoft themselves but their ask a question link goes straight to stack overflow.
The markdown-it library does support the usage suggested by @Waylan:
| Fruit | Price | Advantages |
| ------------- | ----------------------- | ----------------------- |
| Bananas | first line<br>next line | first line<br>next line |
| Bananas | first line<br>next line | first line<br>next line |
See https://github.com/markdown-it/markdown-it/issues/406.
The issue is most likely Microsoft disabling html.
A solution thus waits on a reply to @starian's suggestion: https://visualstudio.uservoice.com/forums/330519-visual-studio-team-services/suggestions/32312290-multi-line-in-the-cell-of-a-table-in-markdown-in-v
In short, each row must be on one line and should use <br> to indicate a line break. Like this:
| Fruit | Price | Advantages |
| ------------- | ----------------------- | ----------------------- |
| Bananas | first line<br>next line | first line<br>next line |
| Bananas | first line<br>next line | first line<br>next line |
Below is an explanation of each tool with an analysis of that tool's documentation:
GitHub
CommonMark is a Markdown variant with a strict spec. GitHub-Flavored Markdown (GFM) is an extension of CommonMark (which adds features to CommonMark such as tables), with its own spec. Therefore, to say that an implementation supports GFM is to say that it supports CommonMark with extensions. Note that GitHub adopted the current spec on March 14, 2017, so any information older than that may not be relevant for the current implementation.
Whether VSTS actually uses a CommonMark/GFM implementation or a close approximation is unclear from the documentation. However, as the documentation clearly states that "GitHub-flavored extensions" are supported, that would indicate to me that the GFM Spec would be a good reference. Regardless, the GFM Spec is the controlling spec for any Markdown rendered on github.com.
The Tables section of the GFM Spec plainly states:
Block-level elements cannot be inserted in a table.
And gives this simple example:
| foo | bar |
| --- | --- |
| baz | bim |
While the spec does not specifically mention multiple line cells, there are no examples with any cells that contain multiple lines. It is my understanding that that is not supported by GFM. Therefore, the only way to include line breaks in GFM Table cells is with the <br> tag, which is not a block-level element.
Pandoc
Pandoc supports multiple different styles of table syntax. If you are passing your Markdown to both Pandoc and GFM, then you need to use Pandoc's table style which most closely matches GFM's style. For example, GFM Tables do not include support for + at the corners. That syntax is specific to Pandoc's Grid Tables. Fortunately, Pandoc's documentation tells us which style most closely matches GFM.
Pandoc has support for various "Markdown Variants", one of which is gfm. The docs have this to say about that variant:
We also support gfm (GitHub-Flavored Markdown) as a set of
extensions on commonmark:
: pipe_tables, raw_html, fenced_code_blocks, auto_identifiers,
ascii_identifiers, backtick_code_blocks, autolink_bare_uris,
intraword_underscores, strikeout, hard_line_breaks, emoji,
shortcut_reference_links, angle_brackets_escapable.
Note that the gfm variant of Pandoc uses Pipe Tables. It is also noteworthy that the markdown_github variant of Pandoc is deprecated since GitHub adopted Commonmark. But even the markdown_github variant uses Pipe Tables.
Pandoc's documentation for Pipe Tables states (emphasis added):
The cells of pipe tables cannot contain block elements like paragraphs and lists, and cannot span multiple lines.
And gives this example:
| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
| 12 | 12 | 12 | 12 |
| 123 | 123 | 123 | 123 |
| 1 | 1 | 1 | 1 |
That is clearly the same as GFM tables and does not include any support for block level elements or multi-line cells.
VSTS
The VSTS Documentation for Tables closely matches GFM and Pandoc Pipe Tables with this example:
| Heading 1 | Heading 2 | Heading 3 |
|-----------|:---------:|----------:|
| Cell A1 | Cell A2 | Cell A3 |
| Cell B1 | Cell B2 | Cell B3 |
While the VSTS documentation makes no specific mention of block-level elements or multiple lines, it seems safe to assume that it is in fact the same style.
We can make that assumption because in all three instances (VSTS, GFM and Pandoc Pipe Tables), the syntax does not provide a divider between individual rows of the table (compare with Pandoc Grid Tables, which supports row dividers). While there is a divider between the header and data rows, with no divider between individual data rows, there is no way to indicate how many lines of text belong to each row. Therefore, each row can only be represented by one line of text.
Conclusion
Given the above, to be parsed properly by VSTS, GFM and Pandoc (gfm variant), your table should be formatted like this:
| Fruit | Price | Advantages |
| ------------- | ----------------------- | ----------------------- |
| Bananas | first line<br>next line | first line<br>next line |
| Bananas | first line<br>next line | first line<br>next line |
And when using Pandoc, be sure to use the gfm format (pandoc -f gfm ...).
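If the source data has cells with real line breaks, the one-line-per-row rule can be applied mechanically. The helper below is a minimal sketch, and the name `to_pipe_table` is made up for this example. It joins each cell's lines with `<br>` and emits a pipe table. It assumes cells contain no literal `|` characters, and it only helps where the renderer allows inline html.

```python
def to_pipe_table(header, rows):
    """Render a pipe table, turning newlines inside cells into <br>."""
    def fmt(cells):
        # Each table row must stay on one physical line.
        return "| " + " | ".join(str(c).replace("\n", "<br>") for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


table = to_pipe_table(
    ["Fruit", "Price", "Advantages"],
    [["Bananas", "first line\nnext line", "first line\nnext line"]],
)
print(table)
```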
Multi-line table cells are not supported in VSTS markdown. I have submitted a User Voice suggestion, "multi-line in the cell of a table in markdown in VSTS", which you can vote on and follow.

Doxygen Table drawing

I would like to insert an ASCII art table (as below) in the documentation.
The Markdown feature of doxygen gets in the way and messes it all up.
I've tried HTML tables and they render fine, but the source document then becomes unreadable.
Can I somehow get doxygen not to process a section but still include it in the output file?
Similar to here, where an indent of 4 blanks lets you insert already formatted text in a fixed-width font.
|-------------|-------------------------|---------------|
|AUTO_NEW_OFF | Entry action | LED_FLASH |
| | | SEQ_OFF |
|-------------|-------------------------|---------------|
| | eXit action | |
|-------------|-------------------------|---------------|
| | | |
|-------------|-------------------------|---------------|
| OFF | SEQ complete | |
|-------------|-------------------------|---------------|
I think I can answer this myself already.
The Fenced Code Blocks ( 3 x ~) feature seems to work ok
~~~
|-------------|-------------------------|---------------|
| MAN_NEW_OFF | Entry action | LED_FLASH |
| | | SEQ_OFF |
|-------------|-------------------------|---------------|
~~~
An improvement on fenced code would be to surround the table with the doxygen commands @verbatim and @endverbatim.
If you use a "code" style, be that markdown's ~~~ or doxygen's @code, there's a chance that current or future versions of Doxygen will start trying to colour it in syntactically.

Multiple SQLite Queries on iOS

I have more than 3 views. My database looks like this:
Category:
CatID | CatTitle
----------------
1 | XYZ
2 | Sample
Content:
ItemID | ItemCatID | ItemText | ItemText2 | ItemText3 | ItemText4
-----------------------------------------------------------------
1 | 1 | Test | Bla | Sample | MoreContent
2 | 1 | Test2 | BlaBla | Sample2 | Other Content
3 | 2 | Test3 | BlaBla2 | Sample3 | Other Content2
I want a view where first page category, second page list (ItemText), third page detail.
I'm not sure how to go about accomplishing that. If I use a JOIN, should I define "sqlite3_stmt *compiledStatement" three times?
I think it could be done with a 'for' loop and something like "get parent, child" (like a cursor in Java)?
Any advice welcome.
I'm not sure what you want.
Can you be more specific?
I can give you two tips though. First, SQLite does not support stored procedures or a procedural language such as PL/SQL: http://www.sqlite.org/whentouse.html
If you REALLY MUST have them, I suggest looking at this. I never tried it, but it may work:
http://chriswolf.heroku.com/articles/2011/01/26/adding-stored-procedures-to-sqlite
Second, you usually want to use a wrapper around the SQLite C functions, so that you worry more about the SQL itself and less about the C functions. Examples:
Best Cocoa/Objective-C Wrapper Library for SQLite on iPhone
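For the three-screen layout in the question, one reading is that no JOIN is needed: each screen runs its own small query, parameterised by the row chosen on the previous screen. The sketch below shows those three queries with Python's built-in sqlite3 module and the sample data from the question. The function names are made up for illustration. With the SQLite C API on iOS, each query would correspond to its own sqlite3_prepare_v2 / sqlite3_bind_int / sqlite3_step cycle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (CatID INTEGER PRIMARY KEY, CatTitle TEXT);
CREATE TABLE Content (
    ItemID INTEGER PRIMARY KEY, ItemCatID INTEGER,
    ItemText TEXT, ItemText2 TEXT, ItemText3 TEXT, ItemText4 TEXT
);
INSERT INTO Category VALUES (1, 'XYZ'), (2, 'Sample');
INSERT INTO Content VALUES
    (1, 1, 'Test',  'Bla',     'Sample',  'MoreContent'),
    (2, 1, 'Test2', 'BlaBla',  'Sample2', 'Other Content'),
    (3, 2, 'Test3', 'BlaBla2', 'Sample3', 'Other Content2');
""")


def list_categories(db):
    """Screen 1: all categories."""
    return db.execute(
        "SELECT CatID, CatTitle FROM Category ORDER BY CatID").fetchall()


def list_items(db, cat_id):
    """Screen 2: ItemText entries for the chosen category."""
    return db.execute(
        "SELECT ItemID, ItemText FROM Content WHERE ItemCatID = ? ORDER BY ItemID",
        (cat_id,)).fetchall()


def item_detail(db, item_id):
    """Screen 3: all text columns of the chosen item."""
    return db.execute(
        "SELECT ItemText, ItemText2, ItemText3, ItemText4 "
        "FROM Content WHERE ItemID = ?", (item_id,)).fetchone()


print(list_categories(conn))
print(list_items(conn, 1))
print(item_detail(conn, 3))
```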
Hope this helps

Sphinx, set exact matches for each record?

So I've been using Sphinx with a Rails project lately, and I want to provide a list of 'would be' exact matches that would match 100% with a given term. For example something like:
+==================+==========================================================+
| ingredient | exact matches |
+==================+==========================================================+
| cheese, cream | 'cream cheese','philadephia cream cheese','cream chese',|
| | 'creamed cheese' |
+------------------+----------------------------------------------------------+
| Cheese, gruyere | 'gruyere','gruyer cheese','gruyeres cheese' |
| | 'gruyere chese' |
+------------------+----------------------------------------------------------+
| Cheese, blue | 'blue cheese','blu cheese' |
+------------------+----------------------------------------------------------+
So basically the functionality I'm looking for would be that Sphinx would try to do its typical matching on all the records, but if the search term matches exactly with one of the strings in an array in that record that result would have a much higher weight. (like 100x, so it would then be the best match)
Is this possible? It seems like other people would have had this problem before... no?
Update
I suppose the best answer might be to just index the exact matches column and provide a really high weight to the terms.
I'm not sure how I can break up the "array" and see if the search term matches exactly though...
You should try playing with the Sphinx search modes. Look at match phrase and match extended2.
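The "matches one of the strings exactly" check from the update can also be done on the application side, after Sphinx has returned its results. The following is only a sketch of that idea and does not use any Sphinx API. The names `normalize` and `boosted_weight`, and the idea of multiplying a returned weight, are assumptions made for illustration.

```python
def normalize(text):
    """Lowercase and collapse whitespace so comparisons are forgiving."""
    return " ".join(text.lower().split())


def boosted_weight(base_weight, query, exact_matches, boost=100):
    """Multiply the weight when the query equals one of the exact-match strings."""
    wanted = normalize(query)
    if any(wanted == normalize(candidate) for candidate in exact_matches):
        return base_weight * boost
    return base_weight


# Hypothetical record taken from the table in the question.
blue_cheese_matches = ["blue cheese", "blu cheese"]

print(boosted_weight(3, "Blu  Cheese", blue_cheese_matches))  # exact hit, boosted
print(boosted_weight(3, "cheese", blue_cheese_matches))       # partial, unchanged
```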