Is there a way to add zone_weights() in Sphinx? - sphinx

I successfully managed to use field_weights in Sphinx to get the relevance I needed. However my fields are actually inclusive of several html tags (e.g. <Author> and <Description>). I need those in one field because of some other config work I am doing. So the field_weights won't in fact work for me. What I need is a way to weight the ZONES I set up in sphinx. However zone_weights is not working for me.

Well no, there are no explicit zone weights.
Can perhaps use the individual boost modifier to boost words, that are also within the zone...
ZONE:(h3) banada^1.234

Related

Set start-time for histogram sample

The usecase
We got multiple changelogs stored in the database, and want to create a histogram monitoring the duration between changes.
The problem
There doesn't seem to be a way to set the start time of a Historgram.Timer, e.g we want to set it to lastUpdated given the current changelog.
Avenues of approach
1 Subclassing Histogram
Should work. However the java-lib use protected/package-private extensively, thus making it hard without copying large portions of the library.
2 Using reflection
After a Histogram.Timer is created it should be possible to use reflection to set the start field. The field is marked as private final, and thus a SecurityManager could stop us in some environments.
Ideas?
Neither of the solutions seems like the correct way to go, and I suspect that I'm overlooking a simpler solution (but could find anything at SO or google). We're using grafana to visualize our metrics, if thats at all helpful in this scenario.
You don't need to subclass Histogram, as you don't need to use Histogram.Timer only because your histogram is measuring times.
Simply call myHistogram.observe(System.now() - lastUpdated) every time you record a new change in the database.

How to inject (dynamic?) Parameters in Tableau CustomSQL

I currently try to solve the following issue in Tableau:
In the end, I would like to have a Tableau dashboard where the user can select a Customer, and then can see the Customer's KPIs. Nothing spectacular so far.
To obtain a Customer's KPIs, there is a CustomSQL query with a parameter "CustomerName" (that returns the KPIs for that Customer).
Now the thing:
I don't want to have a hardcoded list of CustomerNames, as it would be possible with Tableau Parameters. Instead, the CustomerNames should be fetched from another datasource. I did not find a way to "link" a Parameter to a DataSource, and/or inject something other than static Parameters into CustomSQL.
My Question: Is there really no solution for this, or am I just doing something wrong (I hope so).
I found this workaround here https://www.interworks.com/de/blog/daustin/2015/12/17/dynamic-parameters-tableau that seems to work, but that looks like... a workaround.
Few background info:
I have to stick to using a CustomSQL because
It is not viable for me to calculate all KPIs for all CustomerNames
and then filter by Tableau, since the data amount is too big.
It is not viable to replace the CustomSQL with Tableau Calculations
and Filters (already tried that, ended up in having Tableau pulling
too much data instead of pushing the work to the database).
I cannot believe that Tableau does not offer a solution here, since the use case is pretty common I believe.
Do you have some input for me?
Thank you for your help in advance!
Kind Regards
have you tried using rawsql() functions together with stored functions on the database side? I found it pretty useful when needed to load single value from the dataset completely not related to currently used datasource.
For example, running foo stored function which accepts 2 dates and calculated sum of something, Syntax should be something like:
rawsql_int(your_db_schema.foo(%1,%2),[startDateFieldTableau],[endDateFieldTableau])
but you can access it directly:
rawsql_int("select sum(bar) from sales")
but this is bit risky.
Drawbacks:
it relies on the current connection (you create a calculated field (duh!)
it will not work with extract (but you are using custom sql anyways so I believe you are more into live connection

Using RegeX in Adobe CQ query builder

Is there any way to use Regular expressions in Query builder.
Is JCR supports this?
Any pointers on this would be helpful for us.
Thanks in advance.
San
If this QueryBuilder API documentation where to believed as being definitive, then no I would not say there is regex support. However there does seem to be some wildcard support that may be useful. What I would do in this case is try to craft a query around all the properties that you know of about your nodes that can identify them. For example using the debug tool at http://x.x.x.x:4502/libs/cq/search/content/querydebug.html a query like may give you some ideas
type=cq:Page
path=/content/myapp
nodename=*s
1_relativedaterange.property=jcr:content/cq:lastModified
1_relativedaterange.lowerBound=-48h
Where I'm looking for pages in my app content, that end is 's', that have been modified in the last 48 hours. You can even filter by resourceType, template, and any other property that can help you find those nodes. You may even consider adding your own just for this query.
Maybe you can have a sling job, where in Java you could iterate the node names (or whatever) and you do have regex, and tag nodes with a meaningful property that you can then use to query using the query builder.

How does one generate a slug from user comments dynamically?

I was reading an article on Mongo site where by they mention adding a slug to every user comment. http://docs.mongodb.org/manual/use-cases/storing-comments/
What I am stuck on is how to generate a slug dynamically?
Any tips?
You do this within an area of your comment creation known as before_save. This is basically an event that occurs after you have the information for the comment but you have not saved yet.
This slug is just a unique identifier, you don't have to use the one they provided and infact the one they provide might not be best for storage, whereby they use the date and time and a bit on the end to make it unique.
I personally make a slug out of the _ids of the current and previous documents and then separate with /, it works and sorts well also it's easy to use pre-fixed regexes on since it is just the string representation of the OjectId so less guess work needed.

How to address semantic issues with tag-based web sites

Tag-based web sites often suffer from the delicacy of language such as synonyms, homonyms, etc. For programmers looking for information, say on Stack Overflow, concrete examples are:
Subversion or SVN (or svn, with case-sensitive tags)
.NET or Mono
[Will add more]
The problem is that we do want to preserve our delicacy of language and make the machine deal with it as good as possible.
A site like del.icio.us sees its tag base grow a lot, thus probably hindering usage or search. Searching for SVN-related entries will probably list a majority of entries with both subversion and svn tags, but I can think of three issues:
A search is incomplete as many entries may not have both tags (which are 'synonyms').
A search is less useful as Q/A often lead to more Qs! Notably for newbies on a given topic.
Tagging a question (note: or an answer separately, sounds useful) becomes philosophical: 'Did I Tag the Right Way?'
One way to address these issues is to create semantic links between tags, so that subversion and SVN are automatically bound by the system, not by poor users.
Is it an approach that sounds good/feasible/attractive/useful? How to implement it efficiently?
Recognizing synonyms and semantic connections is something that humans are good at; a solution to organizing an open-ended taxonomy like what SO is featuring would probably be well served by finding a way to leave the matching to humans.
One general approach: someone (or some team) reviews new tags on a daily basis. New synonyms are added to synonym groups. Searches hit synonym groups (or, more nuanced, hit either literal matches or synonym group matches according to user preference).
This requires support for synonym groups on the back end (work for the dev team). It requires a tag wrangler or ten (work for the principals or for trusted users). It doesn't require constant scaling, though—the rate at which the total tag pool grows will likely (after the initial Here Comes Everybody bump of the open beta) will in all likelihood decrease over time, as any organic lexicon's growth-rate does.
Synonymy strikes me as the go-to issue. Hierarchical mapping is an ambitious and more complicated issue; it may be worth it or it may not be, but given the relative complexity of defining the hierarchy it'd probably be better left as a Phase 2 to any potential synonym project's Phase 1.
The way the software on blogspot.com is set up, is that there is an ajax-autocomplete-thingie on the box where you write the name of the tags. This searches all your previous posts for tags that start with the same letters. At least that way you catch different casings and spellings (but not synonyms).
How would the system know which tags to semantically link? Would it keep an ever-growing map of tags? I can't see that working. What if someone typed sbversion instead? How would that get linked?
I think that asking the user when they submit tags could work. For example, "You've entered the following tags: sbversion, pascal and bindings. Did you mean, "Subversion", "Pascal" and "Bindings"?
Obviously the system would have to have a fairly smart matching system for that to work. Doing it this way would be extra input for the user (which'd probably annoy them) but the human input would, if done correctly, make for less duplicate tags.
In fact, having said all that, the system could use the results of the user's input as a basis for automatic tag matching. From the previous example, someone creates a tag of "sbversion" and when prompted changes it to "Subversion" - the system could learn that and do it automatically next time.
Part of the issue you're looking at is that English is rife with synonyms - are the following different: build-management, subversion, cvs, source-control?
Maybe, maybe not. Having a system, like the one [now] in use on SO that brings up the tag you probably meant is extremely helpful. But it doesn't stop people from bulling-through the tagging process.
Maybe you could refuse to accept "new" tags without a user-interaction? Before you let 'sbversion' go in, force a spelling check?
This is definitely an interesting problem. I asked an open question similar to this on my blog last year. A couple of the responses were quite insightful.
I completely agree. The mass of tags that have currently. I don't participate in other tagged based sites. However having a hierarchy of tags would be very helpful, instead of ruby rails ruby-on-rails rubyonrails etc...
Tags are basically our admission that search algorithms aren't up to snuff. If we can get a computer to be smart enough to identify that things tagged "Subversion" have similar content to things tagged "svn", presumably we can parse the contents, so why not skip tags altogether, and match a search term directly to the content (i.e., autotagging, which is basically mapping keywords to results)?!
The problem is to make the search engine use the fact that 'subversion' and 'svn' are very similar to the point that they mean the same 'thing'.
It might be attractive to compute a simple similarity between tags based on frequency: 'subversion' and 'svn' appear very often together, so requesting 'svn' would return SVN-related questions, but also the rare questions only tagged 'subversion' (and vice versa). However, 'java' and 'c#' also appear often together, but for very different reasons (they are not synonyms). So similarity based on frequency is out.
An answer to this problem might be a mix of mechanisms, as the ones suggested in this Q/A thread:
Filtering out typos by suggesting tags when the user inputs them.
Maintaining a user-generated map of synonyms. This map may not be that big if it just targets synonyms.
Allowing multi-tag search, such that the user can put 'subversion svn' or 'subversion && svn' (well, from programmers to programmers) in the search box and get both. This would be quite practical as many users may actually try such approach when they do not know which term is the most meaningful.
#Nick: Agreed. The question is not meant to argue against tags. Tags have great potential, but users will face a growing issue if one cannot search 'across' tags.
#Steve: Maintaining an ever-growing map of tags is definitely not practical. As SO is accumulating an ever-growing bag of tags, how could we shade some light on this bag to make search of Q/A tags even more useful, in a convenient way?
#Espo: 'Ajax-powered' tag suggestions based on existing tags is apparently available on SO when creating a question. This is by the way very helpful to choose tags and appropriate spelling (avoiding the 'subversion' vs. 'sbversion' issue from Steve).