How to Query Jackrabbit for same name siblings - aem

Is it possible to find same-name siblings (SNS) using JCR-SQL2, JCR-SQL or QueryBuilder in Adobe CQ5/Adobe Experience Manager? I'm trying to match those nodes with a query that has the following criteria, without having to traverse the whole repository (a slow, long-running operation):
if (node.getIndex() > 1) {
    // this node matches the SNS criteria
}
SNS are defined as follows:
/a/b/c
/a/b/c[2]
/a/b/c[3]
/a/b[2]/c[2]
/a/b/c[3]
/a/d/f
/a/d/f[2]
So the result of the query should include /a/b/c[2], /a/b/c[3], /a/b[2]/c[2], /a/b/c[3], /a/d/f[2].
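To make the matching criterion concrete, here is a small sketch that applies the same getIndex() > 1 test to the example paths, treating them as plain strings. It does not use the JCR API. SnsPaths, lastSegmentIndex and snsOnly are hypothetical names, and the sketch assumes the JCR name[index] notation, where a missing index means 1 (the first sibling).

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: emulates javax.jcr.Node.getIndex() on plain path strings.
// Assumption: segments use "name[index]" and a missing index means 1.
final class SnsPaths {

    // Returns the same-name-sibling index of the last path segment.
    static int lastSegmentIndex(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        int open = name.lastIndexOf('[');
        if (open < 0 || !name.endsWith("]")) {
            return 1; // no explicit index: this is the first sibling
        }
        return Integer.parseInt(name.substring(open + 1, name.length() - 1));
    }

    // Keeps only the paths whose last segment is a 2nd, 3rd, ... sibling.
    static List<String> snsOnly(List<String> paths) {
        return paths.stream()
                .filter(p -> lastSegmentIndex(p) > 1)
                .collect(Collectors.toList());
    }
}
```

Applied to the seven example paths, snsOnly keeps exactly the five paths listed as the expected result. Note that, like getIndex(), it only inspects the last segment.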

Adobe published a helpful article for this at:
https://helpx.adobe.com/experience-manager/kb/find-sns-nodes.html
EDIT: One query for this may be as below:
SELECT [jcr:path] FROM [nt:base] WHERE ISDESCENDANTNODE('/') AND [jcr:path] like '%\]'
The idea is that Oak queries can find indexed nodes that were migrated via the SNS resolution logic. The names of these nodes contain ] (as seen in their paths), so they are selected by the query above.
Use this query with caution: there are a lot of out-of-the-box (OOTB) system nodes that have ] in their names, and this is by design.
You can change [nt:base] to a more specific node type backed by a relevant Oak index for better filtering.
HTH

Related

Is there an improvement to this DB schema and prefix like query?

I've been noodling over this postgres DB schema and query for a while now and I think I need a fresh set of eyes to understand if/how it can be improved. My schema and query are rather simple, which is why a query time of 600-700 ms feels wrong, but maybe that's just what it is.
For background, I have a table of IPs that contains basic information about each IP address, and a second table containing DNS names mapped back to the IP table via a has-many foreign key. An example subset of the data the query in question was run against contained ~5 million IPs with ~39 million associated domains. The table schema is shown below:
This allows queries like the one this question is about:
SELECT ips.id, domain, ip FROM "ips" JOIN domains d ON ips.id = d.ip_id WHERE d.domain like '%.ford.com' ORDER BY ips.id desc LIMIT 100
which asks the question "give me every IP which has a DNS name ending in ford.com". The ORDER BY and LIMIT are there to enable keyset pagination.
An example analysis of the query is below. It averages 600-700 ms, as I explained in the intro, with most of that time (94%) spent in the fuzzy search on the domain field via its gin_trgm_ops index.
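The ORDER BY ips.id desc LIMIT 100 in the query above only returns the first page. With keyset pagination, the next page is usually requested by adding a condition on the last id seen (here assumed to be WHERE ips.id < last_seen_id) instead of using OFFSET. The following in-memory sketch shows that idea; KeysetPager and nextPage are hypothetical names and no database is involved.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// In-memory sketch of keyset pagination, emulating
//   ... WHERE id < :lastSeenId ORDER BY id DESC LIMIT :pageSize
// The WHERE condition on the cursor is an assumption, not from the post.
final class KeysetPager {

    // Returns the next page of ids strictly below the cursor.
    // Pass null as the cursor to fetch the first page.
    static List<Long> nextPage(List<Long> ids, Long lastSeenId, int pageSize) {
        return ids.stream()
                .filter(id -> lastSeenId == null || id < lastSeenId)
                .sorted(Comparator.reverseOrder())
                .limit(pageSize)
                .collect(Collectors.toList());
    }
}
```

The last id of each page becomes the cursor for the next call, so every page starts from an indexed position rather than skipping rows.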
Analysis of query
This was a classic case of rubber duck debugging! While writing this I had a great insight that greatly reduced the query times!
Searches with a trailing wildcard are much faster than searches with a leading wildcard (foo% vs %bar), so I used the reverse function on the domain field in the DB, which turns ford.com into moc.drof, and switched the query to a trailing-wildcard search:
SELECT ips.id, domain, ip FROM "ips" JOIN domains d ON ips.id = d.ip_id WHERE reverse(d.domain) like 'moc.drof.%' ORDER BY ips.id desc LIMIT 100
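The rewrite works because a string ends with a suffix exactly when its reverse starts with the reversed suffix, so '%.ford.com' on domain selects the same rows as 'moc.drof.%' on reverse(domain). This plain-Java sketch (ReversedSuffix and its methods are hypothetical names; no database involved) shows how the pattern is derived and that the two checks agree.

```java
// Sketch of the reverse() trick: a leading-wildcard pattern '%<suffix>'
// on a column is equivalent to a trailing-wildcard pattern
// '<reversed suffix>%' on reverse(column).
final class ReversedSuffix {

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Turns the suffix ".ford.com" into the LIKE pattern "moc.drof.%".
    static String reversedPrefixPattern(String suffix) {
        return reverse(suffix) + "%";
    }

    // What "reverse(domain) LIKE 'moc.drof.%'" checks, written in Java.
    static boolean matches(String domain, String suffix) {
        return reverse(domain).startsWith(reverse(suffix));
    }
}
```

For example, matches("www.ford.com", ".ford.com") is true, while "ford.com" itself does not match, just as it would not match the original '%.ford.com' pattern.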
This results in sub 150ms queries! A good explanation of it is here: https://www.alibabacloud.com/blog/postgresql-fuzzy-search-best-practices-single-word-double-word-and-multi-word-fuzzy-search-methods_595635
I'm still open to suggestions on how to improve it but I'm pretty pleased with it!

How to dynamically create index on JSON Object properties (JSON Object props are also dynamic)

I have a scenario where I want to dynamically create index fields for the keys of a JSON object (the JSON object's attributes vary). I am able to store the JSON object in the index (by implementing a FieldBridge).
eg1: preference:{"sport":"football", "music":"pop"}
eg2: preference:{"sport":"cricket", "music":"jazz", "cuisine":"mexican"}
But I am unable to query the individual fields like:
preference.sport
or preference.cuisine
Is there any way / configuration in hibernate search through which we can achieve that?
If your fields are dynamic, there is no pre-defined schema and Hibernate Search is unable to determine how to query these fields. There are significant differences in how a match query should be executed on a text field or a date field, for example.
For that reason, you cannot use the Hibernate Search Query DSL to build your queries.
However, you can use native APIs.
If you're using the Lucene integration, just creating the relevant queries yourself will work fine (as long as you create the right one):
new TermQuery(new Term("sport", "value"))
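The reason a plain term query works is that a custom FieldBridge can add one index field per JSON key, and each field can then be matched on its exact value. The toy inverted index below illustrates that idea in plain Java. It is not Lucene or Hibernate Search; DynamicFieldIndex is a hypothetical name, and the "preference." field-name prefix is an assumed naming convention: the field name used in the query must match whatever name your bridge actually writes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy inverted index (NOT Lucene): each key of a dynamic JSON-like object
// becomes its own field, which can then be queried by exact term.
// Assumption: fields are named "preference.<key>".
final class DynamicFieldIndex {

    // field name -> term -> ids of the documents containing that term
    private final Map<String, Map<String, Set<Integer>>> fields = new HashMap<>();

    // Indexes every key/value pair of the object as a separate field.
    void add(int docId, Map<String, String> preference) {
        for (Map.Entry<String, String> e : preference.entrySet()) {
            fields.computeIfAbsent("preference." + e.getKey(), k -> new HashMap<>())
                  .computeIfAbsent(e.getValue(), k -> new HashSet<>())
                  .add(docId);
        }
    }

    // Exact-term lookup, analogous to new TermQuery(new Term(field, value)).
    Set<Integer> termQuery(String field, String value) {
        return fields.getOrDefault(field, Map.of())
                     .getOrDefault(value, Set.of());
    }
}
```

Indexing the two example objects and calling termQuery("preference.cuisine", "mexican") returns only the second document, because only that object has a cuisine key.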
If you're using the experimental Elasticsearch integration, you can use org.hibernate.search.elasticsearch.ElasticsearchQueries.fromJson( ... ). You will have to write the whole query as JSON, though, and will not be able to take advantage of the Hibernate Search QueryBuilder at all, even for queries on statically defined fields. See https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#_queries
Better support for native queries, as well as dynamic fields with pre-defined types, which would be targetable in the Query DSL, is planned for Hibernate Search 6, but it's not there yet. See HSEARCH-3273.

Prometheus many-to-many problem for kube cronjobs

Hi there,
I'm trying to configure Kubernetes Cronjobs monitoring & alerts with Prometheus. I found this helpful guide
But I always get a "many-to-many matching not allowed: matching labels must be unique on one side" error.
For example, this is the PromQL query which triggers this error:
max(
kube_job_status_start_time
* ON(job_name) GROUP_RIGHT()
kube_job_labels{label_cronjob!=""}
) BY (job_name, label_cronjob)
Run on their own, the two selectors return series such as these:
kube_job_status_start_time:
kube_job_status_start_time{app="kube-state-metrics",chart="kube-state-metrics-0.12.1",heritage="Tiller",instance="REDACTED",job="kubernetes-service-endpoints",job_name="test-1546295400",kubernetes_name="kube-state-metrics",kubernetes_namespace="monitoring",kubernetes_node="REDACTED",namespace="test-develop",release="kube-state-metrics"}
kube_job_labels{label_cronjob!=""}:
kube_job_labels{app="kube-state-metrics",chart="kube-state-metrics-0.12.1",heritage="Tiller",instance="REDACTED",job="kubernetes-service-endpoints",job_name="test-1546295400",kubernetes_name="kube-state-metrics",kubernetes_namespace="monitoring",kubernetes_node="REDACTED",label_cronjob="test",label_environment="test-develop",namespace="test-develop",release="kube-state-metrics"}
Is there something I'm missing here? The same many-to-many error happens for every query I tried from the guide.
Even constructing it by myself from ground up resulted in the same error.
Hope you can help me out here :)
Regarding not getting the extra exported_job label when Prometheus is installed via helm (stable/prometheus-operator):
You need to configure this in Prometheus. The option is called honor_labels: false
# If honor_labels is set to "false", label conflicts are resolved by renaming
# conflicting labels in the scraped data to "exported_<original-label>" (for
# example "exported_instance", "exported_job") and then attaching server-side
# labels.
So you have to set honor_labels: false in the scrape config of your prometheus.yaml file.
# Setting honor_labels to "true" is useful for use cases such as federation and
# scraping the Pushgateway, where all labels specified in the target should be
# preserved
Even with this setting (I now have exported_job labels), I still can't get the query to work, but I guess that is still because of my left-hand side:
Error executing query: found duplicate series for the match group
{exported_job="kube-state-metrics"} on the left hand-side of the operation:
[{__name__=
I ran into the same issue when I followed that article, but in my case I actually get duplicate job names in different namespaces.
For example, when running kube_job_status_start_time:
kube_job_status_start_time{instance="REDACTED",job="kube-state-metrics",job_name="job-abc-123",namespace="us"}
kube_job_status_start_time{instance="REDACTED",job="kube-state-metrics",job_name="job-abc-123",namespace="ca"}
So I had to either add a filter for the namespace or add namespace into the ON/BY clauses to get it to be unique.
e.g. for one of the subqueries I had to do this:
max(
kube_job_status_start_time
* ON(namespace, job_name) GROUP_RIGHT()
kube_job_labels{label_cronjob!=""}
) BY (namespace, label_cronjob)
I essentially had to apply that principle to all the other queries for it to work for me. Not sure if that applies in your case.
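Why adding namespace helps can be shown with a toy model of the matching rule: with * ON(...) GROUP_RIGHT(), the left side is the "one" side, so each left series must have a unique combination of the ON labels. The sketch below is plain Java, not the Prometheus implementation; MatchGroupCheck and its methods are hypothetical names, and a missing label is treated as the empty string.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy model of PromQL one-to-many matching: the "one" side must not
// contain two series with the same values for the ON(...) labels.
final class MatchGroupCheck {

    // Builds the match key from the ON(...) labels of one series.
    static List<String> matchKey(Map<String, String> series, List<String> onLabels) {
        return onLabels.stream()
                .map(l -> series.getOrDefault(l, ""))
                .collect(Collectors.toList());
    }

    // True if no two series on the "one" side share a match key.
    static boolean oneSideIsUnique(List<Map<String, String>> oneSide, List<String> onLabels) {
        Map<List<String>, Integer> counts = new HashMap<>();
        for (Map<String, String> s : oneSide) {
            counts.merge(matchKey(s, onLabels), 1, Integer::sum);
        }
        return counts.values().stream().allMatch(c -> c == 1);
    }
}
```

With the two job-abc-123 series from the us and ca namespaces, matching ON(job_name) is not unique, while ON(namespace, job_name) is.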
Replacing kube_job_status_start_time with max(kube_job_status_start_time) by (job_name) will aggregate out any duplicates and should resolve the error.
The resulting query looks like this:
max(
max(kube_job_status_start_time) by (job_name)
* ON(job_name) GROUP_RIGHT()
kube_job_labels{label_cronjob!=""}
) BY (job_name, label_cronjob)
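The inner max(...) by (job_name) works because it collapses all left-hand series that share a job_name into a single value, which makes the "one" side unique. The plain-Java sketch below models only that aggregation step; MaxBy, Sample and maxBy are hypothetical names, not Prometheus APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of "max(metric) by (label)": series sharing the label value
// collapse into one entry holding the maximum sample value.
final class MaxBy {

    // A single sample: its labels and its value.
    static final class Sample {
        final Map<String, String> labels;
        final double value;

        Sample(Map<String, String> labels, double value) {
            this.labels = labels;
            this.value = value;
        }
    }

    // Keeps the maximum value per value of the grouping label.
    static Map<String, Double> maxBy(List<Sample> samples, String label) {
        Map<String, Double> result = new TreeMap<>();
        for (Sample s : samples) {
            result.merge(s.labels.getOrDefault(label, ""), s.value, Math::max);
        }
        return result;
    }
}
```

The trade-off is that jobs with the same name in different namespaces are merged into one series; adding namespace to the ON/BY labels, as in the previous answer, keeps them apart instead.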
I dug into this issue a bit more, and I guess the root cause of it is within this one-to-many vector matching expression:
kube_job_status_start_time * ON(job_name) GROUP_RIGHT() kube_job_labels{label_cronjob!=""}
where the group modifier GROUP_RIGHT() indicates that each vector element from the left side (kube_job_status_start_time) can match multiple elements on the right side (kube_job_labels), based on the common label (job_name). The thing is that we are really dealing with many-to-many matching here, as each vector element from the right side can also match multiple elements from the left vector.
I think what we are missing here is a way for Prometheus to uniquely identify the Job objects exported from K8S. The author of this blog post mentions this feature in his setup:
...Prometheus resolves this collision of label names by including the
raw metric’s label as an exported_job label...
In my case I don't get this extra label from Prometheus when installed via helm (stable/prometheus-operator).
Regarding the missing labels - make sure that your kube-state-metrics is configured with a --metric-labels-allowlist. This is "new" since kube-state-metrics v2. See https://kubernetes.io/blog/2021/04/13/kube-state-metrics-v-2-0/#what-is-new-in-v2-0
By default, the metric contains only name and namespace labels.
But... the original guide does not work with newer kube-state-metrics anyway. I can recommend this guide, which is a rework and does not need the labels.

Concise way to filter on two child attributes in ArangoDB (AQL / Spring Data ArangoDB)

In ArangoDB I have documents in a trip collection which is related to documents in a driver collection via edges in a tripToDriver collection, and the trip documents are also related to documents in a departure collection via edges in a departureToTrip collection.
To fetch trips where their driver has a given idNumber and their associated departure has a startTime after a supplied date/time, I've successfully written the following AQL:
FOR doc IN trip
LET drivers = (FOR v IN 1..1 OUTBOUND doc tripToDriver RETURN v)
LET departures = (FOR v IN 1..1 INBOUND doc departureToTrip RETURN v)
FILTER drivers[0].idNumber == '999999-9999' AND departures[0].startTime >= '2018-07-30'
RETURN doc
But I wonder if there is a more concise / elegant way to achieve the same results?
A related question, since I'm using Spring Data ArangoDB: Is it possible to achieve this result with derived queries?
For a single relation I was able to create a query like:
Iterable<Trip> findTripsByDriversIdNumber( String driverId ); but haven't had luck incorporating the departure relation into this signature (maybe because it's inbound?).
First of all, your query only works if every trip has only one connected driver/departure: you fetch all linked drivers but only check the first one found.
If this is your model, that is totally OK, but I would recommend doing the idNumber/startTime check within the sub queries. Then, because we only need to know that at least one driver/departure matches our filter condition, we add a LIMIT 1 to the sub query and return only true. That is all we need for the FILTER in our main query.
FOR doc IN trip
FILTER (FOR v IN 1..1 OUTBOUND doc tripToDriver FILTER v.idNumber == @idNumber LIMIT 1 RETURN true)[0]
FILTER (FOR v IN 1..1 INBOUND doc departureToTrip FILTER v.startTime >= @startTime LIMIT 1 RETURN true)[0]
RETURN doc
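The difference from the original query is "any linked driver/departure matches" versus "the first one matches". The in-memory analogue below (plain Java, not the ArangoDB driver; TripFilter and its nested classes are hypothetical stand-ins) expresses each sub query as anyMatch, which stops at the first hit much like LIMIT 1 does. Dates are ISO-8601 strings compared lexicographically in this sketch.

```java
import java.util.List;
import java.util.stream.Collectors;

// In-memory analogue of the AQL above: a trip passes if AT LEAST ONE
// linked driver and AT LEAST ONE linked departure satisfy the filters.
final class TripFilter {

    static final class Driver {
        final String idNumber;
        Driver(String idNumber) { this.idNumber = idNumber; }
    }

    static final class Departure {
        final String startTime; // ISO-8601 date string, e.g. "2018-07-30"
        Departure(String startTime) { this.startTime = startTime; }
    }

    static final class Trip {
        final String key;
        final List<Driver> drivers;
        final List<Departure> departures;
        Trip(String key, List<Driver> drivers, List<Departure> departures) {
            this.key = key;
            this.drivers = drivers;
            this.departures = departures;
        }
    }

    // anyMatch short-circuits on the first hit, like LIMIT 1 in each sub query.
    static List<String> matchingTripKeys(List<Trip> trips, String idNumber, String startTime) {
        return trips.stream()
                .filter(t -> t.drivers.stream().anyMatch(d -> d.idNumber.equals(idNumber)))
                .filter(t -> t.departures.stream().anyMatch(d -> d.startTime.compareTo(startTime) >= 0))
                .map(t -> t.key)
                .collect(Collectors.toList());
    }
}
```

A trip whose matching driver is the second linked driver is found here, whereas a check on drivers[0] alone would miss it.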
I tested to solve your case with a derived query. It would work, if there wasn't a bug in the current arangodb-spring-data release. The bug is already fixed but not yet released. You can already test it using a snapshot version of arangodb-spring-data (1.3.1-SNAPSHOT or 2.3.1-SNAPSHOT depending on your Spring Data version, see supported versions).
The following derived query method worked for me with the snapshot version.
Iterable<Trip> findByDriversIdNumberAndDeparturesStartTimeGreaterThanEqual(String idNumber, LocalDate startTime);
To make the derived query work, you need the following annotated fields in your class Trip:
@Relations(edges = TripToDriver.class, direction = Direction.OUTBOUND, maxDepth = 1)
private Collection<Driver> drivers;
@Relations(edges = DepartureToTrip.class, direction = Direction.INBOUND, maxDepth = 1)
private Collection<Departure> departures;
I also created a working example project on GitHub.

Spring CRUD repository: is there findOneByMaxXYZColumn()?

My requirement:
Fetch ONE object (e.g. RetainInfo) from table RETAIN_INFO whose VERSION column has the max value.
Does CrudRepository support an interface method like
findOneByMaxRetVersionAndCountry("DEFAULT")
Equivalent db2 sql:
select RET_ID, max(ri.RET_VERSION) from RETAIN_INFO ri where ri.COUNTRY='DEFAULT' group by RET_ID fetch first 1 rows only;
This query selects an ID, but I actually want the RetainInfo object corresponding to the SINGLE row returned by the query.
I would prefer to get that without a custom query, i.e. using findBy or some other method/interface supported by Spring CRUD.
You could use limiting in combination with sorting (see "Limiting Query Results" in the Spring Data reference). Declare a method similar to the following in your CrudRepository interface:
RetainInfo findTopByCountryOrderByRetVersionDesc(String country);
You can also use findFirst to get the first result. Before getting the result, make sure to use OrderBy followed by Asc (ascending) or Desc (descending). For example, if you want to order by version and retrieve based on productName:
RetainInfo findFirstByProductNameOrderByVersionDesc(String productName);
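What such a derived method returns can be sketched in plain Java: filter by the criteria, sort by version descending, keep the first row. This is not Spring Data; TopByVersion and its simplified RetainInfo class are hypothetical stand-ins, and the sketch returns Optional to make the "no matching row" case explicit.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// In-memory equivalent of findTopByCountryOrderByRetVersionDesc(country):
// WHERE country = ? ORDER BY retVersion DESC, then only the first row.
final class TopByVersion {

    static final class RetainInfo {
        final long retId;
        final int retVersion;
        final String country;
        RetainInfo(long retId, int retVersion, String country) {
            this.retId = retId;
            this.retVersion = retVersion;
            this.country = country;
        }
    }

    static Optional<RetainInfo> findTopByCountryOrderByRetVersionDesc(
            List<RetainInfo> rows, String country) {
        return rows.stream()
                .filter(r -> r.country.equals(country))
                .sorted(Comparator.comparingInt((RetainInfo r) -> r.retVersion).reversed())
                .findFirst();
    }
}
```

Because the whole row is returned, this gives the RetainInfo object itself rather than just the ID and max version that the GROUP BY query selects.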
Spring Data doesn't provide an expression to select a max value. All supported query parts can be found in the Spring 1.2.0.RELEASE docs: Appendix A. Namespace reference or line 182 of org.springframework.data.repository.query.parser.Part.
Also feel free to create a feature request at Spring's Jira page.