Algolia Tags vs Facets Use Cases - algolia

New to Algolia, and having a bit of trouble deciphering the difference (suggested use) of tags vs. facets -- they seem to be functionally equivalent.
The Algolia documentation gives one example of a tag with a user ID -- e.g. "user_1234", which could then be used for filtering.
However that seems functionally equivalent to simply having this in your JSON:
"user": "1234"
and then declaring "user" as a faceted field.
What's the difference / purpose? Why have both tags and facets?

You're indeed correct that both can give you the same filtering functionality.
The main difference comes from facet counts that are computed at indexing time, which takes time.
That's why you can now add a filterOnly modifier to your attribute in the attributesForFaceting setting, like so:
{
  attributesForFaceting: [
    'filterOnly(user)'
  ]
}
This tells the engine that the user attribute should be treated as a tag or tag list, that is, usable for filtering but without facet counts (this syntax is currently undocumented, but should soon be).
The same logic can be applied to numeric attributes. By default, the Algolia engine creates data structures for all indexed numbers in order to quickly answer queries like nb_views>10000.
This is also computation-heavy, which is why you can add the equalOnly modifier to an attribute in the numericAttributesToIndex setting.
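As a rough illustration of the two modifiers together, the settings could look like the sketch below. It uses the documented filterOnly modifier name. Applying equalOnly to nb_views is only an illustration, and the filter values are made up; the object would be passed to whatever settings call your API client provides.

```javascript
// Sketch of index settings. `user` and `nb_views` are the attribute names
// used in this thread; the values below are illustrative assumptions.
const settings = {
  // filterOnly(...) keeps the attribute filterable but skips facet counts.
  attributesForFaceting: ['filterOnly(user)'],
  // equalOnly(...) restricts the numeric attribute to equality filters,
  // so range queries such as nb_views>10000 would no longer be served.
  numericAttributesToIndex: ['equalOnly(nb_views)'],
};

// Search parameters that such settings would still support:
// a filter on the user attribute plus a numeric equality filter.
const searchParams = {
  filters: 'user:1234 AND nb_views=10000',
};

console.log(JSON.stringify({ settings, searchParams }, null, 2));
```

The trade-off is that facet counts for user and range filters on nb_views are given up in exchange for less work at indexing time.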

Related

Select * for Github GraphQL Search

One of the advantages of GitHub Search v4 (GraphQL) over v3 is that it can selectively pick the fields we want, instead of always returning them all. However, the problem I'm facing now is how to get certain fields.
I tried the online help, but I found it more confusing than helpful. So far I have been unable to find the fields for size, score and open issues for the returned repositories.
That's why I'm wondering if there is a way to get them all, like Select * in SQL. Thx.
GraphQL requires that when requesting a field that you also request a selection set for that field (one or more fields belonging to that field's type), unless the field resolves to a scalar like a string or number. That means unfortunately there is no syntax for "get all available fields" -- you always have to specify the fields you want the server to return.
Outside of perusing the docs, there are two additional ways you can get a better picture of the fields that are available. One is the GraphQL API Explorer, which lets you try out queries in real time. It's just a GraphiQL interface, which means when you're composing the query, you can trigger the autocomplete feature by pressing Shift+Space or Alt+Space to see a list of available fields.
If you want to look up the fields for a specific type, you can also just ask GraphQL :)
query {
  __type(name: "Repository") {
    fields {
      name
      description
      type {
        kind
        name
        description
      }
      args {
        name
        description
        type {
          kind
          name
          description
        }
        defaultValue
      }
    }
  }
}
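There is still no wildcard, but the introspection result can be turned into an explicit selection set on the client. The sketch below is one possible approach, not something GitHub provides: it assumes the query above is extended with `ofType { kind name }` under each `type` so that NON_NULL wrappers can be unwrapped, and the sample field list is hypothetical.

```javascript
// Unwrap NON_NULL wrappers to reach the underlying type.
// Assumes the introspection query also requested `ofType { kind name }`.
function namedType(type) {
  let t = type;
  while (t && t.kind === 'NON_NULL' && t.ofType) {
    t = t.ofType;
  }
  return t;
}

// Collect every scalar or enum field that needs no required argument.
// Object, interface and list fields are skipped because they would need
// their own selection sets or further unwrapping.
function leafSelection(fields) {
  return fields
    .filter((f) => {
      const t = namedType(f.type);
      const isLeaf = Boolean(t) && (t.kind === 'SCALAR' || t.kind === 'ENUM');
      const needsArgs = (f.args || []).some(
        (a) => a.type.kind === 'NON_NULL' && a.defaultValue == null
      );
      return isLeaf && !needsArgs;
    })
    .map((f) => f.name)
    .join('\n');
}

// Hypothetical sample of what `__type.fields` might contain.
const sampleFields = [
  { name: 'name', args: [], type: { kind: 'NON_NULL', ofType: { kind: 'SCALAR', name: 'String' } } },
  { name: 'diskUsage', args: [], type: { kind: 'SCALAR', name: 'Int' } },
  { name: 'owner', args: [], type: { kind: 'NON_NULL', ofType: { kind: 'INTERFACE', name: 'RepositoryOwner' } } },
];

const selection = leafSelection(sampleFields);
console.log(`{\n${selection}\n}`);
```

The generated text can be pasted into a query as the selection set for the type, which gets close to "all simple fields" while still being an explicit list.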
Short Answer: No, by design.
GraphQL was designed to have the client explicitly define the data it requires, which leads to one of the primary benefits of GraphQL: preventing over-fetching.
Technically you can use GraphQL fragments somewhere in your application for every field type, but if you don't know which fields you are trying to get it wouldn't help you.

Sulu CMS: how to search/filter for content of a specific type with specific values for specific attributes?

Short description of the situation:
We're running a forked version of Sulu 1.5.2, PHP 7.1, Windows server environment, db connection with PostgreSQL
We have a website structure/tree where we have house templates at the top level; each house has one house_rooms and one house_occupants template; each house_rooms template has N house_rooms_room templates, and each house_occupants template has N house_occupants_occupant templates. This represents an actual House that has N Rooms and N Occupants.
Now I'd like to know if there is a way to specifically get, for instance, all the house_occupants_occupant content that follows a certain pattern of attributes (for instance: their gender attribute having value 'female' and their date_of_birth parameter being >= 1990/01/01), without having to load each house, then find its house_occupants page among the children, and then loop over that template's house_occupants_occupant children and filter the thus begotten content according to their gender and date of birth attributes.
I already found that there is a ContentRepository class that can ::findAll() and ::findByUuids(), but there doesn't seem to be a way to filter on specific attributes (like template type, template attributes, ...). So I took a roundabout way of creating my own "repository" that does direct PDO queries on the phpcr_nodes table in the database, to specifically scan the props attribute for the occurrence of a certain template name:
$this->pdo->query("SELECT identifier, props FROM phpcr_nodes WHERE props LIKE '%>house_occupants_occupant<%'");
I can see that the props column contains a string value representing an XML document that somehow translates into the entire template with attribute-value pairs; however, the tag levels and how certain attributes relate to certain values are obscured. So in theory I could use a specific XML parser to turn this into something human-readable, so that for my house_occupants_occupant data I could get something like:
// what I would get after putting the props through a certain XML parser:
$xmlHumanReadableData = [
    '<the_uuid_of_occupant_1>' => [
        ...
        'gender' => 'female',
        'date_of_birth' => '1992-05-18T00:00:00.000+00:00',
        ...
    ],
    ... //etcetera etcetera
];
Once I have that, I could filter the readable data to determine which content I want to keep, add the node UUIDs to some $theUuids variable, and then retrieve the actual content using Sulu's ContentRepository::findByUuids($theUuids) method. That would "only" require 2 queries and some PHP array filtering in between, which is a great deal better than looping over all the children content starting from a certain parent and doing this until you've traversed all the parents and all their children... (Certainly, the overhead would increase if you'd want to search for, for instance, all house nodes where at least one of its house_occupants_occupant nodes represents a child less than 10 years old, since you'd need extra queries to "set up" the filter data used in the final query. But still: a great deal better than looping everything... ;-) )
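The in-between filtering step could look roughly like this. It is sketched in JavaScript purely for illustration (in PHP the same logic maps onto array_filter), and the UUIDs and sample records are hypothetical stand-ins for the parsed props data.

```javascript
// Hypothetical parsed data keyed by node UUID, mirroring the array above.
const parsedNodes = {
  'uuid-1': { gender: 'female', date_of_birth: '1992-05-18T00:00:00.000+00:00' },
  'uuid-2': { gender: 'male', date_of_birth: '1995-01-02T00:00:00.000+00:00' },
  'uuid-3': { gender: 'female', date_of_birth: '1985-11-30T00:00:00.000+00:00' },
};

// Return the UUIDs whose gender matches and whose date of birth is on or
// after the cutoff. Dates are compared as timestamps, not as strings.
function selectUuids(nodes, gender, bornOnOrAfter) {
  const cutoff = Date.parse(bornOnOrAfter);
  return Object.entries(nodes)
    .filter(
      ([, props]) =>
        props.gender === gender && Date.parse(props.date_of_birth) >= cutoff
    )
    .map(([uuid]) => uuid);
}

const theUuids = selectUuids(parsedNodes, 'female', '1990-01-01T00:00:00.000+00:00');
console.log(theUuids); // [ 'uuid-1' ]
```

The resulting list is what would then be handed to the second lookup by UUID.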
So my question is sort-of twofold:
What is the Sulu-specific XML parser I can use to turn the XML string value in this props column into something human-readable, with proper attribute-value pairs?
And/or, hopefully: is there a way I can avoid all this nonsense and just use a less low-level way of retrieving content of a specific template type with specific values for specific attributes ?
The ContentRepository you've found is already an abstraction built around some of our own requirements for pages. Your requirements are quite specific, so you should write your own query using SQL-2 (JCR-SQL2), the query language for PHPCR.
This should enable you to write a query which matches your requirements.

How do we prevent alphabetical ordering of returned facet values?

We're searching our index on Algolia through the API and rendering facets and their values each time the search is updated. Each facet returns a maximum of 5 values to show the user.
When a facet attribute is selected, the search result JSON returns that facet and its attributes re-ordered first by their count and second by alphabetical order. Usually the just-selected facet value is shown first and we're happy with that.
If we then select another facet with a count of say 10, then in the returned search results, if there are other facet values that have not been selected but that also have a count of 10 and are higher up in alphabetical order, they'll pop up ahead of the just-selected facet, removing it from sight for the user. And that's unusual because the user expects to see what they just selected in the returned results.
How can we ensure that the returned search result facet values show up in the order: highest count, selected, and then alphabetical as opposed to highest count, alphabetical?
Thanks
This question was cross-posted to Algolia's forum, you can see the full discussion here.
The short answer is:
"The main problem here is that you are using the raw API Client
instead of the JS Helper which we strongly recommend: it handles a
search state internally, it has advanced features built-in (like
facets sorting) and it's really easy to use. You can go from a JS
Client to a Helper implementation very easily (you won't struggle if
you switch)."
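If the ordering has to be done by hand on the raw response, a client-side sort along these lines is one option. This is only a sketch of the requested order (highest count, then selected, then alphabetical), not what the JS Helper does internally. The facet values are made up, and it assumes the response contains more values than the 5 displayed (for example by requesting a higher maxValuesPerFacet); otherwise the selected value may already be cut off before sorting.

```javascript
// `counts` has the shape of one entry of the `facets` object in a search
// response: facet value -> count. `selected` lists the refined values.
function sortFacetValues(counts, selected) {
  const isSelected = new Set(selected);
  return Object.entries(counts)
    .map(([value, count]) => ({ value, count, selected: isSelected.has(value) }))
    .sort(
      (a, b) =>
        b.count - a.count || // highest count first
        Number(b.selected) - Number(a.selected) || // then selected values
        a.value.localeCompare(b.value) // then alphabetical
    );
}

// Hypothetical facet values for illustration.
const counts = { Apple: 10, Banana: 10, Cherry: 10, Date: 12, Elderberry: 3, Fig: 1 };
const visible = sortFacetValues(counts, ['Cherry'])
  .slice(0, 5)
  .map((v) => v.value);

console.log(visible); // [ 'Date', 'Cherry', 'Apple', 'Banana', 'Elderberry' ]
```

Truncating to 5 only after sorting is what keeps the just-selected value visible among ties.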

elasticsearch array field of keywords - how to index it

I've got input that is analogous to tags, where there are a couple of strings per record, and they should be thought of as keywords, not to be tokenized or broken up or analyzed in any particular way. I want it to show up in faceting "as-is", including spaces, slashes, dashes and ampersands.
I don't think I need multi_field here. There is one input value per record "keyPhrases" but the input value is a simple json array of strings.
I want elasticsearch to insert into the facets each of the values, and tag the record with all of the phrases.
Usually there are only one or two or three phrases per record, but there could be more. The set of keyPhrases is fairly small, like 30 or at most like 50. They could be thought of as "categories".
The faceting keeps breaking up the input strings and using lowercasing, even though I'm trying to specify not_analyzed, keyword tokenizer, keyword analyzer, and trying things like that.
I have other fields that keep their spacing and capitalization as I desire in the facets returned, however those fields are not_analyzed and are also store: true, but are also just exactly 1 string input per record, as opposed to many per record.
I could just take the top 1 keyPhrase per record and flatten it, but ideally all the tags would work and be available as facets.
Any ideas on how to do this?
Well, this is embarrassing.
My strict mapping wasn't actually committed to the server at the time I was trying this.
(I was dropping the index and creating the index again with each new mapping, and hadn't realized it, and this was not the final mapping, so it was getting loaded and then dropped.)
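For reference, a mapping of the kind described, plus a quick check that it really is live on the server, might look like the sketch below. It assumes an Elasticsearch version that still uses string fields with not_analyzed, and the index name `records` and type name `record` are hypothetical. An array of strings needs no special mapping; each element is indexed under the same field.

```javascript
// Mapping sketch: keyPhrases is kept verbatim, so each phrase becomes one
// facet term including spaces, slashes, dashes and ampersands.
const mapping = {
  mappings: {
    record: {
      properties: {
        keyPhrases: { type: 'string', index: 'not_analyzed' },
      },
    },
  },
};

// Given the parsed body of GET /<index>/_mapping, report whether the field
// is really not_analyzed on the server.
function isNotAnalyzed(mappingResponse, indexName, typeName, field) {
  const props = mappingResponse?.[indexName]?.mappings?.[typeName]?.properties;
  return props?.[field]?.index === 'not_analyzed';
}

// Hypothetical responses: the first is what a dynamically created mapping
// could look like, the second is the explicit mapping applied as intended.
const dynamicResponse = {
  records: { mappings: { record: { properties: { keyPhrases: { type: 'string' } } } } },
};
const appliedResponse = { records: mapping };

console.log(isNotAnalyzed(dynamicResponse, 'records', 'record', 'keyPhrases')); // false
console.log(isNotAnalyzed(appliedResponse, 'records', 'record', 'keyPhrases')); // true
```

Checking the live mapping before indexing would have caught the case where the index was recreated without the intended mapping.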

Is it possible to perform a Sphinx search on one string attribute?

sql_query=SELECT id,headline,summary,body,tags,issues,published_at
FROM sphinx_search
I am working on the search feature of my Web site and I am using Sphinx, Perl and Sphinx::Search. As long as I want to search in all the attributes and I don't restrict it to just one, everything goes well. However when the user searches for a specific tag, I can't just give the result of a fuzzy search, I want to use the power of Sphinx to search only on tags or issues, maybe sometimes the user wants to search on headline and issues.
How can I perform such a task?
You need to put it in Extended Match Mode
https://metacpan.org/module/JJSCHUTZ/Sphinx-Search-0.27.2/lib/Sphinx/Search.pm#SetMatchMode
Then you can use Extended Query syntax
http://sphinxsearch.com/docs/current.html#extended-syntax
Which includes the field search operator
@tags keyword1
(Be careful: in Sphinx the word "attribute" has a specific meaning, namely values attached to the document that are useful for sorting/grouping/filtering and are returned with the result set. I think you are actually talking about fields. All the columns from the sql_query that you don't mark as an attribute are fields, and are full-text searchable.)
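To restrict a search to one or several fields, the query string only needs the right field prefix. A minimal sketch of building such strings follows; the helper name fieldQuery is made up, no escaping of special characters is attempted, and the resulting string would be passed to your Sphinx client with extended match mode enabled.

```javascript
// Build an extended-syntax query limited to the given full-text fields.
// One field uses @field, several fields use @(field1,field2).
function fieldQuery(fields, terms) {
  if (fields.length === 0) {
    return terms; // no restriction: search all fields
  }
  const selector =
    fields.length === 1 ? `@${fields[0]}` : `@(${fields.join(',')})`;
  return `${selector} ${terms}`;
}

console.log(fieldQuery(['tags'], 'keyword1')); // @tags keyword1
console.log(fieldQuery(['headline', 'issues'], 'keyword1')); // @(headline,issues) keyword1
```

This covers both cases from the question: searching only tags, or searching headline and issues together.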