MarkLogic DHF5: Can we return multiple Content objects for one document in main.sjs?

We are migrating from MarkLogic DHF4 to DHF5 (Data Hub Framework).
We have a scenario where, for one entity and depending on certain criteria, we need to create more than one harmonized document from a single input document. This was possible in the legacy flow, where we called the write:write method.
In the DHF5 implementation, however, we are supposed to return only one content object for each input document processed in the main module, main.sjs.
Is there a way to create multiple harmonized documents, as required, from one input document in DHF5?
We expect one input document to be able to produce more than one harmonized document in a single step (in main.sjs).

Your custom module is expected to return a Content object or an array of them. Each Content object must have uri, value, and context properties and may also specify provenance.
Based on this, your one input can be turned into multiple outputs.
See the "Required Outputs" section on this page for more details.
Your module must return the following:
For a Custom-Ingestion step, a Content object.
For all other custom steps, a Content object or an array of Content objects.
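
To make the "array of Content objects" option concrete, here is a minimal sketch of a main function that splits one input into several outputs. The `orders` field, the `customerId` field, and the URI scheme are hypothetical and only serve the illustration. The sketch also assumes `content.value` is a plain JavaScript object so it can run outside MarkLogic; inside a real step the value is typically a document node that would likely need converting first.

```javascript
// Sketch of a DHF5 custom-step main function that returns several Content
// objects for one input. The `orders` and `customerId` fields and the URI
// scheme are hypothetical.
function main(content, options) {
  // Assumption: content.value is a plain object. In MarkLogic it is usually
  // a document node, so a real module would likely convert it first.
  const source = content.value;
  const orders = Array.isArray(source.orders) ? source.orders : [];

  // Split criterion not met: return the single Content object unchanged.
  if (orders.length === 0) {
    return content;
  }

  // One harmonized document per order, each with its own uri, value, context.
  return orders.map((order, index) => ({
    uri: content.uri.replace(/\.json$/, "") + "-order-" + (index + 1) + ".json",
    value: { customerId: source.customerId, order: order },
    context: content.context
  }));
}

// In a real main.sjs the function would be exported:
// module.exports = { main: main };

const input = {
  uri: "/customers/42.json",
  value: { customerId: 42, orders: [{ sku: "A" }, { sku: "B" }] },
  context: { collections: ["Customer"] }
};
const outputs = main(input, {});
console.log(outputs.map((o) => o.uri));
```

Each returned object carries its own uri, so the two outputs do not overwrite each other.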

Overwrite vs merge difference in Firebase

From reading the documentation for the set document operation, it seems to me that overwrite and merge mean the same thing, which is to replace existing data with new data. I don't understand the difference between the two, even though they are phrased as having separate meanings here.
Specifically, where it says:
If the document does exist, its contents will be overwritten with the newly provided data, unless you specify that the data should be merged into the existing document, as follows:
.set(), by default will OVERWRITE any existing document, and will create a document if it does not exist. By default, if you do NOT specify a field in your data, the existing data in that field will be DELETED - the incoming data will become the new DOCUMENT, and all previous fields/data will be lost.
.set() can OPTIONALLY be provided with an options object, one field of which is "merge". IF this field exists, and IF it is set to true, ONLY the fields specified in the new data object will be added and/or overwritten. Remember that Firestore documents are effectively Maps, and field names must be unique. New fields will be added; existing fields will be overwritten; fields that are NOT present in the data object will remain - hence "merge".
.update() REQUIRES that the document ALREADY exist, and will fail if it does not. By default, it will ONLY write to the fields specified as arguments to .update()
A set-and-merge will create the document if it doesn't exist, whereas an update call will fail in that scenario.
There may be more differences that I'm not aware of, but this is my main reason for picking one over the other.
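
The three behaviours described above can be sketched with a small in-memory model. This is not the Firestore SDK: the `set` and `update` functions and the `store` Map below are hypothetical stand-ins that only mimic the top-level semantics of `docRef.set(data)`, `docRef.set(data, { merge: true })`, and `docRef.update(data)`. Nested maps and dot-notation field paths are deliberately left out.

```javascript
// Simplified in-memory model of the write semantics (top-level fields only).
// `store` maps a document id to a plain object; all names are hypothetical.
function set(store, id, data, options = {}) {
  const existing = store.get(id);
  if (options.merge && existing) {
    // Merge: keep untouched fields, add new ones, overwrite matching ones.
    store.set(id, { ...existing, ...data });
  } else {
    // Overwrite (or create): the incoming data becomes the whole document.
    store.set(id, { ...data });
  }
}

function update(store, id, data) {
  // Update requires the document to exist already.
  if (!store.has(id)) {
    throw new Error("No document to update: " + id);
  }
  store.set(id, { ...store.get(id), ...data });
}

const store = new Map();
set(store, "user1", { name: "Ada", age: 36 });     // creates the document
set(store, "user1", { age: 37 }, { merge: true }); // name is kept
const merged = store.get("user1");
set(store, "user1", { age: 38 });                  // name is lost
const overwritten = store.get("user1");

let updateFailed = false;
try {
  update(store, "missing", { age: 1 });
} catch (e) {
  updateFailed = true;
}
console.log(merged, overwritten, updateFailed);
```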

Document versus Dynamic Document Property

What is the key difference between Document and Dynamic Document Property?
We were trying to use these properties in Boomi AtomSphere process development but are not able to decide which is best to use.
Document properties are additional information or "metadata" about an individual document as it executes through a process. The values contained in properties are separate from the document's actual data contents. These properties remain with a given document as it progresses through the various process steps, even as the document data itself is manipulated through Map steps or outbound connector calls.
There are two types of document properties:
Standard document properties contain run-time specific information such as connector or trading partner details.
Dynamic document properties are arbitrary values that you can use to temporarily store related values.
Document Properties: Document properties are additional information or "metadata" about an individual document as it executes through a process.
Usage: I have used the Disk, FTP, and SFTP document properties to create file names. Mail document properties can likewise be used to set the body, file name, subject, From address, and To address.
Dynamic Document Properties: Dynamic document properties are properties that the process developer can define and use to temporarily store additional pieces of information about a given document.
Usage: I have used this to create an ALL_INDEX to get all the data from data cache

RESTful url - getting new subentity

There are 2 models: Entity and Subentity. Entity can have many connected Subentities (one:many relation).
There is a method on the server that returns a new Subentity (let's call it GetEmptySubentity). The point is that when you want to create a new Subentity, you press a button and the model comes from the server with some fields pre-filled. Some of those pre-filled Subentity values depend on the corresponding Entity, so I need to pass an Entity id in this request.
So should the correct URL for getting the empty Subentity be something like /Entity/{id}/Subentity/empty? Or am I getting something wrong?
Yes, you are. According to the uniform interface / HATEOAS constraint, you should send hyperlinks to your REST clients, and they should use the API by following those hyperlinks. To do this you need a hypermedia format, for example HTML, ATOM+XML, HAL+JSON, or JSON-LD with Hydra. With HTML, the result should contain an HTML form whose input fields carry the default values. You should add semantics to that form with RDFa, so that by processing the HTML your REST client will know that the link is about creating a new resource. Of course, the other hypermedia formats are easier to parse. With them you can use the same concept with RDF (via JSON-LD or ATOM, for example), link relations with vendor-specific MIME types (via HAL or ATOM, for example), or a custom solution that describes those input fields. So you usually get the necessary information along with the hyperlink, and you don't have to send another request to get the default values.
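
As a rough illustration of sending the default values along with the hyperlink, the sketch below builds a JSON representation of an Entity. The shape is only loosely HAL-like (it borrows `_links`); the `createSubentity` relation and its `method` and `defaults` fields are hypothetical custom extensions, not part of any standard, and the `label` default is invented for the example.

```javascript
// Builds a hypermedia-style representation of an Entity. The `createSubentity`
// relation and its `method` / `defaults` fields are hypothetical extensions.
function representEntity(entity) {
  const base = "/Entity/" + entity.id;
  return {
    id: entity.id,
    name: entity.name,
    _links: {
      self: { href: base },
      subentities: { href: base + "/Subentity" },
      // The link for creating a Subentity carries the pre-filled values,
      // so the client needs no extra request to fetch an "empty" model.
      createSubentity: {
        href: base + "/Subentity",
        method: "POST",
        defaults: { entityId: entity.id, label: entity.name + " (new)" }
      }
    }
  };
}

const doc = representEntity({ id: 7, name: "Invoice" });
console.log(JSON.stringify(doc, null, 2));
```

A client that understands the `createSubentity` relation can render its form from `defaults` and submit it to `href`.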
If you want to make things complicated, you can request the default values from the entity itself, so that it returns the property values rather than a form with input fields. Optionally, you can send a request that returns the entire link; for example, GET /Entity/{id}/SubEntity?offset=0&count=0 can return an empty array of subentities together with the creation form. You can use additional query or path parameters if that form is really big and you don't want to send it with every response related to the SubEntity collection. The URL specification says only that the path should contain the hierarchical part and the query should contain the non-hierarchical part of the URL.
By the way, REST is just a delivery method; you don't have to map it to your database entities. The REST resource and URL structure can be completely different from your database, since you can use any type of data storage mechanism with REST, even the file system.

Elasticsearch completion suggester configuration

I'm trying to set up autocomplete/suggestions on my site's search form, using Elasticsearch's completion suggester feature.
I have a list of products, which are grouped by categories (on multiple levels). The search feature should be able to suggest category names, which are of more interest to users than direct products.
Several of these categories have the same name but a different parent (e.g. 'milk' under parent category 'dairy products' and 'milk' under category 'baby'). When the user selects a category suggestion, she's redirected to another page, with more specific results than a plain search would return.
Additional metadata (url to redirect to, parent category id/name) are added in the payload field.
I use the output field to return normalized suggestions for different inputs. As stated in the documentation:
"The result is de-duplicated if several documents have the same output, i.e. only one is returned as part of the suggest result."
But as explained, my outputs may have the same value, while being different results, as they will link to different pages. It is also possible in the future that different category levels will yield different actions.
I am reluctant to differentiate things by adding the full string (i.e. "milk in dairy products") as the output, because:
1. The parent category is conceptually not the output itself but related metadata.
2. I intend to have some formatting in the results, and this would force me to parse the output string to add HTML tags to it.
So, is it possible to deactivate the de-duplication?
One workaround I'm thinking of, if it's not possible, is to store a stringified JSON object in the output, with all the data I'll need: both what is displayed in the search form and the metadata currently in the payload. But I'd rather look into existing configuration before resorting to that.
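
The stringified-JSON workaround could look roughly like the sketch below. It assumes the older completion suggester format with `input` and `output` fields, and that each returned option exposes the stored output as `text`. The field name `suggest` and the helper functions are hypothetical.

```javascript
// Sketch of the workaround: pack display data and metadata into `output`
// so that two same-named categories no longer share an identical output.
function buildSuggestDoc(category) {
  return {
    name: category.name,
    suggest: {
      input: category.inputs,
      // Different parent or url yields a different string, so the
      // de-duplication on identical outputs no longer collapses them.
      output: JSON.stringify({
        label: category.name,
        parent: category.parent,
        url: category.url
      })
    }
  };
}

// Client side: turn a returned option back into structured data.
// Assumption: the option's `text` holds the stored output string.
function parseSuggestion(option) {
  return JSON.parse(option.text);
}

const dairyMilk = buildSuggestDoc({
  name: "milk", inputs: ["milk"], parent: "dairy products", url: "/dairy/milk"
});
const babyMilk = buildSuggestDoc({
  name: "milk", inputs: ["milk"], parent: "baby", url: "/baby/milk"
});
const parsed = parseSuggestion({ text: dairyMilk.suggest.output });
console.log(parsed.label + " in " + parsed.parent);
```

Formatting stays on the client: the label and parent arrive as separate fields, so no display string needs to be parsed to insert HTML tags.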

Where does Tridion store Metadata values?

When we define custom metadata for components, it is my understanding that this user-given metadata is stored in SQL Server, and that it is not visible in the component XML. Can anyone explain how exactly metadata linked to a component actually gets stored?
A Component definition in Tridion has two types of fields: Content fields and Metadata fields. Both field types are stored in the Content Manager database (either SQL Server or Oracle). And both field types are retrieved whenever you read the Component back from Tridion through any of its APIs (TOM, TOM.NET or Core Service).
Only the Content fields are shown in the Source tab of a Component edit window, but the Metadata fields are visible on the Metadata tab of that same window.
If you want to have a single view of the XML of both Metadata and Content fields (as well as many other properties of your Component in Tridion), consider installing the PowerTools or the Item XML extension.
I think you may be confusing things a bit.
The Metadata is always stored as part of the component - under tcm:Metadata. When you publish this component, then the metadata fields will also be available for querying on the Content Delivery Data Store.
Whether these fields get displayed as part of the component presentation depends on your templates. There's nothing stopping you from including these values in the output of your template (typical use case for SEO, for instance).
In summary:
In the CM, the Metadata is stored together with the Component.
In the CD, the Metadata is stored as part of the "CUSTOM_META" associated with this component.
Just a note: there is another kind of metadata that is not stored in metadata fields, namely the system metadata, such as the Last Modified Date or the user who last modified the component. That is metadata in the CMS. There is also system metadata in the front end (broker or file system metadata) that gets published when you publish a given component, such as the Last Published Date.
You can leverage/use the system metadata in your templates as well.