How to create an index using the S&P API - AEM

Is there any way to create an index through an API? I could not find any documentation about index creation using the S&P API.
I do not want to let S&P crawl my site to build the index; rather, I would prefer to use APIs to do it.

No, there is no API for S&P that can be leveraged for indexing purposes, because S&P is basically a crawling engine.
For the sake of simplicity, think of it as a private search engine crawler where you can configure several aspects of crawling, such as content type, frequency, and paths, using an interface.
You can, however, use the scripted index features to allow incremental indexing of your site or to get finer control over crawling. This will not eliminate crawling, as S&P needs to crawl the site in order to extract and index the information. More details about this feature can be found here:
https://marketing.adobe.com/resources/help/en_US/snp/c_about_scripted_index.html
Please note that this is more of a configuration and set of commands than an API.

Related

How to implement caching while using the AEM Search API

We are using AEM 6.3 and we need to implement content search functionality in our project. We implemented it using the provided Search API, but the issue is that the Search API takes only the request parameter, and hence we are not able to cache the search result page.
We did try to use a selector or to set request attributes (searchTerm and Tags) and then create a Search Client instance and call the getResult method, but it doesn't return any results.
As we need to do content search across pages and multiple properties, can we use the QueryBuilder API here and achieve the same result provided by the Search API?
The Search API is highly performant, and caching is not the best strategy for searches, as you might get stale results. In practice, you end up reducing the cache lifetime and arrive back at the same problem.
You should look more into optimising your searches with proper indexes over targeted content, etc.
However, if you really want to cache the search results you could look into 3rd-party solutions, but I would highly discourage that in the context of AEM, as there are better options like:
Offloading searches to a dedicated publisher. You can do this via your LB or dispatcher rules.
Optimising searches by optimising indexes. Remember, index-based queries don't hit your repository.
Worst case, if you really struggle with performance, look into the AEM Solr integration, as Solr has good caching. You can also achieve the same with Elasticsearch or another DB. Just be warned that the plumbing and TCO are not free for this.
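Since the question also asks about the QueryBuilder API: below is a minimal sketch, shown in Python purely for illustration, that runs a targeted query through AEM's QueryBuilder JSON servlet (/bin/querybuilder.json). The host, credentials, content path and property names are placeholders, and note that this servlet is often locked down on publish instances, so check your setup first.

# Minimal sketch: full-text search over a targeted content tree via AEM's
# QueryBuilder JSON servlet. Host, credentials, paths and property names
# are placeholders; adjust them to your instance.
import requests

params = {
    "path": "/content/my-site/en",            # restrict the search to targeted content
    "type": "cq:Page",
    "fulltext": "searchTerm",                 # the user's query
    "property": "jcr:content/cq:tags",        # optional tag constraint
    "property.value": "my-site:topics/example",
    "p.limit": "20",
    "orderby": "@jcr:content/cq:lastModified",
    "orderby.sort": "desc",
}

resp = requests.get(
    "http://localhost:4503/bin/querybuilder.json",
    params=params,
    auth=("user", "password"),
)
resp.raise_for_status()
for hit in resp.json().get("hits", []):
    print(hit.get("path"))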

Algolia best practices: new index or tags?

I've successfully set up an Algolia search engine on my web page. My backend syncs public data to Algolia, and the search bar works just fine.
Now I want to set up the same for my admin application. Unlike the public application, this app should be able to retrieve secret data from Algolia.
So far, I can think about two ways of doing this:
For each document, store both a "public" version (with a "public" tag) and an admin version (tagged "admin", and with additional fields). Custom API keys can then ensure that each app has access to the proper data.
OR
Create a new index, perhaps my_admin_collection_index, duplicate the settings, and use it from the admin app just like my_collection_index.
So in the first version I search the same index but with different tags; in the second version I search two different indices.
Are there any insights on how to choose between the two approaches?
I'd say it would be easier for me to duplicate documents and put some tags on them, but I can't really tell what the performance impact of such an approach would be.
Thanks!
The first approach, pushing all objects to a single index and tagging them with the permissions, is the way to go. Combining that approach with Secured API keys allows you to scale easily while keeping the front-end implementation secure (embedding the key in the JavaScript code, for instance).
Even though the Algolia engine supports an unlimited number of indices per application (I've seen users with 700,000+ indices), having too many indices may result in some indexing overhead and slowdowns (especially on the mutualized plans, where you're sharing the indexing CPUs with other customers).
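To make the first approach concrete, here is a rough sketch using the Algolia Python client (exact helper names can differ between client versions, so treat this as an outline rather than the definitive API). The app ID, keys and index name are placeholders. Each application gets a secured key that can only see records carrying its tag:

# Sketch of approach 1: one index, records tagged "public" or "admin",
# and per-application Secured API keys restricting what each app can see.
# App ID, API keys and index name are placeholders.
from algoliasearch.search_client import SearchClient

client = SearchClient.create("YOUR_APP_ID", "YOUR_ADMIN_API_KEY")
index = client.init_index("my_collection_index")

# The public record and its admin counterpart carrying the extra fields.
index.save_objects([
    {"objectID": "42-public", "title": "Document 42", "_tags": ["public"]},
    {"objectID": "42-admin", "title": "Document 42",
     "secret_note": "internal only", "_tags": ["admin"]},
])

# Secured API keys derived from a search-only key; each one is restricted to
# records matching its tag filter, so the public front end can never retrieve
# admin records even though everything lives in a single index.
public_key = SearchClient.generate_secured_api_key(
    "YOUR_SEARCH_ONLY_API_KEY", {"filters": "_tags:public"}
)
admin_key = SearchClient.generate_secured_api_key(
    "YOUR_SEARCH_ONLY_API_KEY", {"filters": "_tags:admin"}
)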

EPiServer - how to search content across different EPiServer websites?

Greetings EPiServer gurus.
Picture this scenario:
A customer has two different EPiServer sites:
one internal and one external website.
The external site is using EPiServer Find's REST API for search.
The internal site is currently using a simple search page based on the Lucene indexer.
The customer wants to be able to search both the external and internal sites' content INSIDE the internal site. They are not keen on the idea of having to buy another EPiServer Find license for the internal site. So basically, they want to be able to search the content of the external site while inside the internal one.
What would be the proper approach in order to do this?
Any suggestions appreciated.
/ChrisRun
This is a tricky one. EPiServer Find supports multi-site setups but requires the sites to be hosted in the same solution. EPiServer constructed the indexing job in such a way that it clears the entire Find index; this means that if two different machines share the same Find index they will erase each other's content, and effectively you'll only have the results from the most recently indexed site.
We've discussed with EPiServer changing this pattern so that an indexer only erases posts with siteIds available to the solution running the index job. However, no luck so far; instead we rely on hackish solutions :)
So, what you are asking is possible with a bit of coding: reflect the built-in indexer and ensure the ReindexTarget is scoped correctly (the code is easy to understand). Once done, this indexing job needs to be used in both the internal and external environments, and the original job needs to be removed.
There's no need for filtering in your internal environment, but in the external environment you'll have to ensure that only external results are returned. If your results include anything other than pages, you cannot filter on siteId, since global items (like files and images) don't have a siteId. We've solved this with a URL filter like the one below.
private static FilterBuilder<ISearchContent> SiteFilterBuilder
{
    get
    {
        var filter = SearchClient.Instance.BuildFilter<ISearchContent>();
        filter = filter.Or(x => x.SearchHitUrl.Prefix(EPiServer.Web.SiteDefinition.Current.SiteUrl.AbsoluteUri));
        return filter;
    }
}
Then use it in the query:
var query = SearchClient.Instance.UnifiedSearch(Language.Swedish)
    .For(searchQuery.Query)
    .AndInField(x => x.SearchCategories)
    .UsingSynonyms()
    .OrFilter(SiteFilterBuilder) // will scope to this site
    .ApplyBestBets()
    .Track()
    .TermsFacetFor(x => x.SearchSection);
Off the top of my head, I can see multiple risks involved in adding the public Find index to the internal site - especially if you don't want it to be two-way (i.e. indexing the internal site in the same Find index).
One approach could be to add a search endpoint to the public website, which the internal website invokes to do searches.
Basically that endpoint (for example a controller action method) would perform a search using Find (this would happen inside the public web application) and then return the result to the internal website.
Technically, only the public website would use Find - but results would be available to the internal website.

How to specify a URL in pagePath filters, Core Reporting API v3

I am building a web app that pulls data from Google through the Core Reporting API v3. I am using the PHP client library offered by Google.
I am currently trying to specify a page and retrieve its pageviews for a time range. Everything else seems to be working okay, except that if I specify a filter with ga:pagePath==http://link/uri then I get 0 all the time, no matter the time range.
I think the problem has to do with how the value for this pagePath is set. I want to have separate data for the desktop version of the site and the smartphone version, denoted by the s. subdomain.
Can anyone give me some tips or tricks to get the required data?
Example URL:
http://domain.com/user/profile/id/1
http://s.domain.com/user/profile/id/1
Thanks in advance!
For the default implementation of Google Analytics, ga:pagePath doesn't include the scheme or hostname, so in your case you'd actually want to filter using ga:hostname and ga:pagePath together.
I suggest you use the Query Explorer to build your queries and get familiar with what will work. You can also use this tool to get a sense of what type of data the ga:pagePath and ga:hostname dimensions return before trying to filter on them. Finally, once you have the query you want, you can easily get the exact Core Reporting API query by clicking the Query URI button.
Also check out the Combining Filters section of the GA API docs.
So if you want to filter on ga:pagePath for domain.com and s.domain.com separately, you could do something like
filters=ga:pagePath==/user/profile/id/1;ga:hostname==domain.com
filters=ga:pagePath==/user/profile/id/1;ga:hostname==s.domain.com
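The question uses the PHP client, but the filter syntax is the same across clients. Purely as an illustration, here is a minimal sketch with Google's Python client; the service-account file, view (profile) ID and dates are placeholders:

# Sketch: Core Reporting API v3 query combining ga:hostname and ga:pagePath
# filters (";" means AND). The service-account file, view ID and dates are
# placeholders for illustration.
from google.oauth2 import service_account
from googleapiclient.discovery import build

credentials = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/analytics.readonly"],
)
analytics = build("analytics", "v3", credentials=credentials)

result = analytics.data().ga().get(
    ids="ga:12345678",                      # your view (profile) ID
    start_date="2014-01-01",
    end_date="2014-01-31",
    metrics="ga:pageviews",
    filters="ga:hostname==s.domain.com;ga:pagePath==/user/profile/id/1",
).execute()

print(result.get("totalsForAllResults", {}).get("ga:pageviews"))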

How can I fetch more than 1000 Google results with the Perl Google API?

Using the regular search engine as a human can get you no more than 1,000 results, which is far more than a regular person needs.
But what if I do want to get 2,000? Is it possible? I read that it is possible using App Engine or something like that (over here...), but is it possible, somehow, to do it through Perl?
I don't know a way around this limit, other than to use a series of refined searches instead of one general search.
For example, instead of just "Tim Medora", I might search for myself with:
Search #1: "Tim Medora Phoenix"
Search #2: "Tim Medora Boston"
Search #3: "Tim Medora Canada"
However, if you are trying to use Google to search a particular site, you may be able to read that site's Google sitemaps.
For example, www.linkedin.com exposes all 80 million+ users/businesses via a series of nested sitemap XML files: http://www.linkedin.com/sitemap.xml.
Using this method, you can crawl a specific site quite easily with your own search algorithm, provided the site has good Google sitemaps.
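The question asks about Perl, but the sitemap approach is language-agnostic; as a rough illustration (shown here in Python, using the LinkedIn sitemap index mentioned above as the example URL), you fetch the sitemap index and then walk the nested sitemaps it lists:

# Rough sketch: read a site's sitemap index and collect the page URLs it
# exposes, instead of paging through Google results. The same idea works in
# Perl with LWP::UserAgent and an XML parser. Real-world sitemaps are often
# gzipped and very large, so a production crawler needs to handle that and
# respect robots.txt; this sketch skips both.
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def fetch_xml(url):
    with urllib.request.urlopen(url) as resp:
        return ET.fromstring(resp.read())

def sitemap_urls(index_url, max_sitemaps=5):
    """Yield page URLs from the first few sitemaps listed in a sitemap index."""
    index = fetch_xml(index_url)
    for loc in index.findall("sm:sitemap/sm:loc", NS)[:max_sitemaps]:
        child = fetch_xml(loc.text.strip())
        for page in child.findall("sm:url/sm:loc", NS):
            yield page.text.strip()

for url in sitemap_urls("http://www.linkedin.com/sitemap.xml"):
    print(url)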
Of course, I am in no way suggesting that you exploit a sitemap for illegal/unfriendly purposes.