Hibernate Search: Get all fields for faceting - hibernate-search

I want to place facet requests to (some) indexed fields. In order to place a facet request, I need to know the name of the corresponding field.
Is there a way to programmatically get a list of field names annotated with @Field?

The answer to your question depends on the version of Search you are using. If you are using a pre-Search 5 release, where it was possible to facet on any @Field (with the documented restrictions), then you can use the public metadata API to get all configured fields.
The entry point into the public metadata API is SearchFactory.getIndexedTypeDescriptor(Class<?> entityType), which returns an IndexedTypeDescriptor for the specified entity type. You can then iterate over the configured properties and, for each property, get the list of configured fields.
As of Search 5, however, facet fields need to be marked at configuration time using @Facet(s). Only properties with this annotation can be faceted upon. Obviously the public metadata API should expose this as well. Unfortunately, this is not yet implemented - HSEARCH-1853.
There is a workaround if you are happy to use some internal APIs which might change in the future. You would only need this until HSEARCH-1853 is implemented at which stage you could switch to this public (and supported) API.
Search also maintains an internal metadata API which it uses for all its inner workings. It is basically a richer model than the public API, which is more restrictive in what it exposes. Bottom line, you want to get hold of the org.hibernate.search.engine.metadata.impl.FacetMetadata. To do so you need to get hold of the DocumentBuilderIndexedEntity, which gives you access to the internal org.hibernate.search.engine.metadata.impl.TypeMetadata. Via this type metadata you can get access to PropertyMetadata, then DocumentFieldMetadata and finally FacetMetadata.
To get hold of the DocumentBuilderIndexedEntity, you could do something like this:
ExtendedSearchIntegrator integrator = ContextHelper.getSearchintegratorBySFI( sessionFactory );
IndexManager[] indexManagers = integrator.getIndexBinding( clazz ).getIndexManagers();
DirectoryBasedIndexManager indexManager = (DirectoryBasedIndexManager) indexManagers[0];
EntityIndexBinding indexBinding = indexManager.getIndexBinding(clazz);
DocumentBuilderIndexedEntity documentBuilder = indexBinding.getDocumentBuilder();
Note, the internal API might change at any stage. No guarantees regarding backwards compatibility and evolution of the API are given.

Related

Providing separate OpenApi definitions

We have a service that provides two separate REST APIs. One is a simple API that our customers use and the other one is an internal API used by a web application.
Our customers are not able to access the web API, so I would like to be able to provide two separate OpenApi specifications, one for our customers and one for our web developers.
I found a pretty straightforward way to achieve what I want by creating an endpoint that retrieves the OpenApi document and filters out the tags belonging to the customer API.
@Inject
OpenApiDocument document;
@Operation(hidden = true)
@GET
@Produces("application/yaml")
public Response customer() throws IOException {
OpenAPI model = FilterUtil.applyFilter(new MyTagFilter("mytag"), document.get());
String result = OpenApiSerializer.serialize(model, Format.YAML);
return Response.ok(result).build();
}
One problem is that the injected OpenApiDocument instance is null in development mode. The OpenApiDocumentProducer seems to be missing some classloader magic that is present in the OpenApiHandler class. Another minor problem is that the filter “MyTagFilter” also needs to filter out Schemas not used by any tagged PathItems and the code becomes somewhat dodgy.
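The schema clean-up mentioned above amounts to a reachability check: start from the schemas referenced by the operations that carry the tag, follow references between schemas, and drop everything that was never reached. Below is a minimal sketch of that idea on plain collections. It does not use the OpenAPI model classes; the `Operation` record, the `schemaDeps` map and the schema names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SchemaPruning {

    // Hypothetical stand-in for an operation: its tags and the schemas it references directly.
    record Operation(Set<String> tags, Set<String> schemaRefs) {}

    // Returns the names of all schemas reachable from operations that carry the given tag.
    // schemaDeps maps a schema name to the schemas that it references itself.
    static Set<String> usedSchemas(List<Operation> operations,
                                   Map<String, Set<String>> schemaDeps,
                                   String tag) {
        Set<String> used = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        for (Operation op : operations) {
            if (op.tags().contains(tag)) {
                pending.addAll(op.schemaRefs());
            }
        }
        // Follow schema-to-schema references; the 'used' set also guards against cycles.
        while (!pending.isEmpty()) {
            String name = pending.pop();
            if (used.add(name)) {
                pending.addAll(schemaDeps.getOrDefault(name, Set.of()));
            }
        }
        return used;
    }

    public static void main(String[] args) {
        List<Operation> ops = List.of(
                new Operation(Set.of("mytag"), Set.of("Customer")),
                new Operation(Set.of("internal"), Set.of("AuditLog")));
        Map<String, Set<String>> deps = Map.of("Customer", Set.of("Address"));
        // Contains Customer and Address, but not AuditLog.
        System.out.println(usedSchemas(ops, deps, "mytag"));
    }
}
```

Any schema whose name is not in the returned set could then be removed from the filtered document.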
Is there a better way to solve my problem?
Would it be possible to fix OpenApiDocumentProducer to provide a non null and up to date OpenApiDocument in developer mode?
Similar question: Quarkus: Provide multiple OpenApi/Swagger-UI endpoints

Designing REST end-point(s) for GET request supporting different IDs

I seek suggestions regarding designing an API endpoint.
I have a table (resource) with id (PK) and some other ids, which are not unique but have not-null constraints.
Now for designing this:
For the PK search /<resourceName>/{id}
Non-PK search
2.1 /<resourceName>/someOtherIdName/{someOtherId} - using path param, distinct for different IDs
2.2 or /<resourceName>?<nameOfId>=<value> - using query param
For 2nd one, which way is better? If I use 2.2, then multiple IDs can be supported but it becomes convoluted, as I have to check the nameOfId. And what about 2.1?
Edit: For example, take transactions to be a resource, with txn_id as the primary key and txn_event_id and txn_activity_id as other IDs. The last two IDs can represent a group of related transactions. Does 2.2 suit the last two IDs?
In case of 2.1, the implementation looks like:
@Path("/transactions")
class TransactionResource {
    @GET
    @Path("/eventid/{event_id}")
    public List getTxnWithEventId(@PathParam("event_id") String eventId) {
        // do an "event_id" based search
    }
    @GET
    @Path("/activityid/{activity_id}")
    public List getTxnWithActivityId(@PathParam("activity_id") String txnActivityId) {
        // do an "activity_id" based search
    }
}
In case of 2.2, the implementation becomes something like:
@Path("/transactions")
class TransactionResource {
    @GET
    public List getTxnsWithAnotherId(@QueryParam("searchKey") String id,
                                     @QueryParam("searchValue") String value) {
        if ("event_id".equals(id)) {
            // do an "event_id" based search and return its result
        } else if ("activity_id".equals(id)) {
            // do an "activity_id" based search and return its result
        }
        return null; // unknown search key
    }
}
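The check on the key name in the query-parameter variant does not have to be an if/else chain. One possible way to keep it flat as more IDs are added is a lookup table from key name to search function. The following is only a sketch: the two lookup methods are hypothetical placeholders that return strings instead of querying a table.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class TransactionSearch {

    // Hypothetical lookups; a real service would query the transactions table here.
    static List<String> byEventId(String eventId) {
        return List.of("txn for event " + eventId);
    }

    static List<String> byActivityId(String activityId) {
        return List.of("txn for activity " + activityId);
    }

    // One entry per supported search key.
    static final Map<String, Function<String, List<String>>> SEARCHES = Map.of(
            "event_id", TransactionSearch::byEventId,
            "activity_id", TransactionSearch::byActivityId);

    // Unknown or missing keys produce an empty result rather than null.
    static List<String> search(String searchKey, String searchValue) {
        Function<String, List<String>> handler =
                searchKey == null ? null : SEARCHES.get(searchKey);
        return handler == null ? List.of() : handler.apply(searchValue);
    }

    public static void main(String[] args) {
        System.out.println(search("event_id", "42")); // prints [txn for event 42]
    }
}
```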
In my opinion, the 2nd option feels better for searches, but if that's true, why not the former?
I think it all comes down to the developer's preference. I would not go with either of the options you listed. My approach would be collectionId/resourceId/collectionId/resourceId. So in your case, it would be something like users/1/messages to get all messages of a specific user, or users/1/messages/1 to get the message with an id of 1 for that specific user. That way, you create clearer API endpoints which can be routed more efficiently in your app and can be better documented and managed.
Have a look at how Google's API Design Guide approaches this subject for their Gmail resource model:
A collection of users: users/*. Each user has the following resources.
A collection of messages: users/*/messages/*.
A collection of threads: users/*/threads/*.
A collection of labels: users/*/labels/*.
A collection of change history: users/*/history/*.
A resource representing the user profile: users/*/profile.
A resource representing user settings: users/*/settings.
For 2nd one, which way is better?
Either of these is fine for most use cases:
/<resourceName>?<nameOfId>=<value>
/<resourceName>/<nameOfId>/<value>
Tomato, tomato.
One reason that you might care about the difference is in the use of relative resolution and dot segments. Dot segments are useful for traversing the hierarchical portion of the URI, which is to say the path segments.
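As a small illustration of relative resolution over path segments, here is a sketch using `java.net.URI` from the standard library. The host name and the paths are assumptions made up for the example.

```java
import java.net.URI;

class DotSegments {

    // Resolves a relative reference against a base URI using java.net.URI.
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/transactions/eventid/42";
        // ".." climbs one path segment, so a sibling branch is reachable relatively.
        System.out.println(resolve(base, "../activityid/7"));
        // prints http://example.com/transactions/activityid/7

        // A bare segment replaces only the last path segment.
        System.out.println(resolve(base, "43"));
        // prints http://example.com/transactions/eventid/43
    }
}
```

This kind of traversal only works on the path; values placed in the query string are not part of the hierarchy that dot segments walk.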
Another reason that you might care is that the query part of a URI has not always been understood to be part of the identifier. Old versions of the HTTP spec described exceptions to the caching rules when the query part was present. In the current standard, it shouldn't make a difference.
If you are struggling with readability of the URI with data encoded into the path segments, there are a number of spelling conventions that may help -- many derive ideas from TBL's work on Matrix URIs. If your clients and servers have access to decent URI Template implementations, then a lot of the work has already been done for you.
I am not sure what your resources are specifically but here are some tips that you can keep in mind while designing RESTful APIs
Identify what the primary resource is.
For example: employees
In your first case, you'd then access employees as
GET /employees. To get all employees.
GET /employees/1. Get a specific employee with ID 1.
Search is specific to your needs. If you need to fetch multiple employees based on IDs, you could do
GET /employees?id=1,2,3,4
Alternately if you find that you will need to search based on more than one parameter, I'd recommend a POST
POST /employees/search
{
  "id": [1, 2, 3, 4],
  "department": "computer-science"
}
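For the comma-separated form (GET /employees?id=1,2,3,4), the server has to split the raw parameter value itself. A minimal sketch of that step, assuming numeric IDs; the class and method names are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;

class IdListParam {

    // Parses a raw value such as "1,2,3,4" into a list of ids.
    // Blank input yields an empty list; a non-numeric entry throws NumberFormatException.
    static List<Long> parseIds(String raw) {
        List<Long> ids = new ArrayList<>();
        if (raw == null || raw.isBlank()) {
            return ids;
        }
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                ids.add(Long.parseLong(trimmed));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(parseIds("1,2,3,4")); // prints [1, 2, 3, 4]
    }
}
```

Letting the NumberFormatException surface gives the endpoint a clear place to answer with a 400 response instead of silently ignoring bad input.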

How to better specify kind of ID in RESTful service

I'm looking for an opinion about defining a contract for the standard GET/PUT/POST/DELETE methods.
We have a resource, let's say Client, so the route will be /clients.
However, we have two types of ID for the client. One is the ID generated by our system. On top of that, we want to optionally allow customers to use an external ID that they generate themselves.
So, if a customer is never going to add clients to the system, is not really interested in integration, and only needs GET to read a client, the endpoint will be:
/clients/{id}
However, if they want full integration, with the ability to add clients, we want to give them the ability to use their own IDs.
We considered four possible solutions:
1. /clients/external/{externalId}
2. /clients/ext-{externalId}
3. /clients/{externalId}?use-external-id=true
4. /clients/{externalId} with additional header -"use-external-id": true
We are leaning towards options 3 and 4 (both can be supported simultaneously) but have concerns about the "restfulness" of such an approach. Any opinions on this? What would you choose and why?
REST says nothing about URLs.
How different are internal and external clients? If the only difference is the existence of an externalId property, just use the /clients endpoint and add the property to your client resource. Always assign and use the internal id property in your API, but allow queries to filter by the customer-provided external id also.
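A rough sketch of that split, with the internal id as the only key and the external id as an ordinary filter on the collection. The `Client` record, the in-memory store and the `externalId` query parameter name are assumptions for illustration, not part of the question:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class ClientStore {

    // Hypothetical client resource: the internal id is always set, the external id is optional.
    record Client(long id, String externalId, String name) {}

    private final Map<Long, Client> byId = new LinkedHashMap<>();

    void add(Client client) {
        byId.put(client.id(), client);
    }

    // Would back GET /clients/{id}: lookup by the system-generated id.
    Optional<Client> findById(long id) {
        return Optional.ofNullable(byId.get(id));
    }

    // Would back GET /clients?externalId=...: a filter on the collection, so it returns a list.
    List<Client> findByExternalId(String externalId) {
        return byId.values().stream()
                .filter(c -> externalId.equals(c.externalId()))
                .toList();
    }

    public static void main(String[] args) {
        ClientStore store = new ClientStore();
        store.add(new Client(1L, "d23sa", "Acme"));
        store.add(new Client(2L, null, "Globex"));
        System.out.println(store.findById(2L).isPresent());          // prints true
        System.out.println(store.findByExternalId("d23sa").size());  // prints 1
    }
}
```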
How about this:
/clients/client_id/1 - for automatically generated ids
/clients/external_id/d23sa - for filtering on the external_id field
This could be extended to generically filter on any field of a resource and is the approach my company used in developing SlashDB.

REST - Getting specific property of resources in collection only

I'm developing the search functionality of my REST API and currently my URIs are structured as:
api/items?type=egg,potato
Let's say each item resource has 4 properties:
Id, Name, Type, Rating
What would be the most restful way to design my URI and return a subset of properties of each resource, e.g. only names of these resources?
--
The reason I ask this is I often want a less heavy duty result-set. For instance, I may build an AJAX search with dynamically populated names as dropdowns - but I do not want extra bloat coming back with each request.
REST isn't really a set of rock-solid standards, but there are some nice practices.
In this particular case I would recommend using the query parameter of an existing resource field as you are now, to select items which have the type value of egg or potato. But to select only a subset, you can introduce a field query parameter. So you can call your API like api/items?type=egg&fields=name, to get only the name field of all the resources with the egg type.
P.S.
This is not my invention; I have already seen this in other APIs, where it is sometimes called select. As far as I know, Facebook has this feature in its APIs.
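On the server side, honoring a fields parameter boils down to copying only the requested properties of each item into the response. A minimal sketch, assuming each item is available as a map of property names to values; the item values are made up:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FieldSelection {

    // Keeps only the requested properties of one item; unknown field names are ignored.
    static Map<String, Object> project(Map<String, Object> item, String fieldsParam) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (String field : fieldsParam.split(",")) {
            String key = field.trim();
            if (item.containsKey(key)) {
                result.put(key, item.get(key));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> item = new LinkedHashMap<>();
        item.put("id", 7);
        item.put("name", "Free-range egg");
        item.put("type", "egg");
        item.put("rating", 4);
        // Corresponds to a request such as api/items?type=egg&fields=name
        System.out.println(project(item, "name")); // prints {name=Free-range egg}
    }
}
```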

Representing multiple many-to-one relationships in REST

I am creating a RESTful API that contains the following resources shown in the following UML-ish diagram. As shown by the multiplicities (between parentheses), there are four one-to-many relationships.
I currently have the following GET methods defined:
GET /farmers
GET /farmers/[farmer_id]
GET /farms
GET /farms/[farm_id]
GET /llamas
GET /llamas/[llama_id]
GET /events
I am trying to decide what is the best and most RESTful way to access these relations, as well as accessing events related to Farms and Farmers (via Llamas). All of these relations will be made available using hypermedia links. The two options I have come up with so far are:
Multiple URIs
GET /farmers/[farmer_id]/farms
GET /farmers/[farmer_id]/llamas
GET /farmers/[farmer_id]/events
GET /farms/[farm_id]/farmer
GET /farms/[farm_id]/llamas
GET /farms/[farm_id]/events
GET /llamas/[llama_id]/farm
GET /llamas/[llama_id]/farmer
GET /llamas/[llama_id]/events
Single URIs and Filtering
GET /farms?farmer=[farmer_id]
GET /llamas?farmer=[farmer_id]
GET /events?farmer=[farmer_id]
GET /farmers/[farm_farmer_id]
GET /llamas?farm=[farm_id]
GET /events?farm=[farm_id]
GET /farms/[llama_farm_id]
GET /farmers/[llama_farmer_id]
GET /events?llama=[llama_id]
Is either of these methods preferred for REST, or can I just pick the one I like the most?
It really doesn't matter from a REST perspective, as long as you provide the links to the client.
From an implementation perspective, you may want to favor one of those options over the others if you're using a framework that uses a specific convention for automatic link relation publishing. For instance, Spring Data REST gives you the multiple URI scheme you defined above out of the box. Therefore, it would be a lot less work for you to simply go with that convention.
As Jonathon W says, it doesn't matter from a REST perspective. However, it does matter from an HTTP perspective.
From an HTTP perspective, each URI in the multiple-URI option represents a unique, potentially cacheable resource. Cache control headers will obviously control any caching. However, most proxies will not cache URIs with a query string, so the single-URI-with-filtering option will generally not be cacheable.
You also need to consider the consequences of not including the query string parameters, i.e. what would /events output? If the llama_id parameter is required, this is not evident.
Personally, I would go for the multiple URI option.