Embedding version control of the weaviate

Embedding version control of the weaviate - version-control

In Weaviate, the vector engine, I wonder how this can handle version issue of embedding model.
For instance, considering the (trained) word2vec model, embedded vectors from different models must be seperated.
One option might think is that make distinct multiple classes representing model version.
Custom script may useful. If new model available, create new class and import accorded data. After that, change (GET) entrypoints (used for searching nearest vectors) to the new class.
Or maybe weaviate have other fancy way to handle this issue, but I couldn't find.

As at version 1.17.3, you have to manage this yourself because weaviate only supports one embedding per object.
There is a feature request to allow multiple embeddings per object here. But it sounds like your request is closer to this one. In any case, have a look at them and upvote the one that addresses your need so the engineering team can prioritize accordingly. Also, feel free to raise a new feature request if neither of these addresses your needs.

Related

.Net Core Rest API Request/Response best practice

I need some advice on how to best structure the requests and the responses for my Rest API.
I'm mostly trying to limit myself to CRUD operations on one resources and I work with one object: for example if the ressource is "book" I end up with the following actions in the controller
[HttpPost("books")] Book Create(Book book)
[HttpGet("books")] Book Get(int id)
This is relatively strait forward.
Now for a more complex example for the creation of a resource, I need to receive a complexe object different from my ressource and return an object containing the resource and extra data
For example for the Order resource I have a the following action in the controller:
[HttpPost("/order")] CreateOrderResponse CreateOrder(CreateOrderRequest createOrderRequest)
Here my action will use the "CreateOrderRequest" object to create to build an Order.
Then I would like to return a "createOrderResponse" object which contains the Order but also extra information that the client needs.
I'm not sure this is the best way to go, any advice ?
Thanks in advance for your help

I prefer the following:
[HttpPost("/order")] CreateOrderResponse CreateOrder(CreateOrderRequest createOrderRequest)
And here is why:
By this method, you are able to protect your public API from implementation details. If you expose your model to your API then you cannot make the same guarantee.
You can also make your validations specific to the request format. In some cases, you might require one subset of your model when creating a record and another subset when editing data. This approach will allow you to handle that scenario as well.
Security. Were you going to add that Book right to a DbContext and save it? Or attach it and update directly? Those would be potential issues from security and data quality perspectives.
But there are downsides:
This approach is time consuming. It may not be worth the time invested if you are writing something as a learning exercise or a quick implementation. And it adds complexity. But then, you might find complexity when you realize your Book object is insufficent in all cases.
You will feel like there is duplicate code in different places. The code may appear to be the same, but the use cases are actually different and may diverge over time. Having a Book parameter will be a liability at that point.

How to "flow" tagged values in Enterprise Architect from one instance to another

My questions is about bringing a concept to reality through technical availability of EA.
I am looking for a way to connect instances at an object diagram through which I can transfer tagged values. Let me explain the background of the project.
Purpose is to first have Stereotypes for specific roles in the system, such as "Calculation", "Transmission", "Decision", "Qualification", "Abstraction" etc.
Each of these stereotypes have specific tagged values suitable for their purpose.
Then I am creating instances from these stereotypes, eg. "MotorTorque:Calculation" and "LimitedTorque:Abstraction"
Each of these instances have a common tagged value, "criticality", boolean and I want this tagged value to progress from "MotorTorque:Calculation" to "LimitedTorque:Abstraction" through an output port > some sort of flow > input port kind of way.
Questions are:
1- Is this approach technically achievable in EA? If so what would be the correct way to do it?
2- The purpose is to have this "connection" readable in XMI export of the diagram which I will be using as an input for another purpose.
I have created an MDG Technology for my project with stereotypes and tagged values, however, I am having difficulty achieving this "connection", this "flow" of values.
Thank you for your time.

What you are asking for is not directly achievable. However, many ways lead to Rome.
One way would be to <<trace>> connect those objects to a Status class (or what ever you like to name it) and have this carry the "shared TV".
Another way is (by far more complex) to use an add-in. You would anyway need ways to create groups which share the TV. From your current explanation I can't see what that might be. Maybe the instantiating class of those instances? If so, you make a script that propagates a TV setting from ist current to all other linked instances. I'm not sure if the add-in events fire when a TV is changes (I do have some doubts here). If needed I could look that up.

What you propose is partially feasible.
There is a tagged value inheritance chain in EA, in which tagged values are inherited down the generalization chain, and from a classifier to its instances. In the GUI, inherited tagged values are shown separately from the instance's own ones, and in the API they are accessed using the Element.TaggedValuesEx property. Inherited tagged values can also be overridden.
Since the correct way to create a port (or part/property) is to make it an instance of a component, a port will inherit any tagged values from that component. So if your Calculation stereotype applies to component, ports which are instances of Calculation components will inherit the MotorTorque tagged value.
However, there is no way to "flow" tagged values from one port to another. If you want such a function, you'll have to implement it yourself with an Add-In.
Regarding XMI, first you must understand that an XMI export is based on a package, not a diagram. The XMI format itself is extensible, which means that different tool vendors create their own extensions which are typically not publicly documented. Crucially, diagram layouts are part of these non-standardized extensions. In EA's case, the image data is some sort of UU-encoded bitmap which you won't be able to extract any useful information from.
Elements' tagged values are included in an XMI export, but again, the EA extensions are not publicly documented. In other words, you can import EA:s XMI format in another program, but you will need to reverse-engineer the format. Not impossible, but it's probably better to either write your own specialized export function, or export via CSV. Note, however, that CSV export cannot be automated -- there's no call for it in the API.

EventStore basics - what's the difference between Event Meta Data/MetaData and Event Data?

I'm very much at the beginning of using / understanding EventStore or get-event-store as it may be known here.
I've consumed the documentation regarding clients, projections and subscriptions and feel ready to start using on some internal projects.
One thing I can't quite get past - is there a guide / set of recommendations to describe the difference between event metadata and data ? I'm aware of the notional differences; Event data is 'Core' to the domain, Meta data for describing, but it is becoming quite philisophical.
I wonder if there are hard rules regarding implementation (querying etc).
Any guidance at all gratefully received!

Shamelessly copying (and paraphrasing) parts from Szymon Kulec's blog post "Enriching your events with important metadata" (emphases mine):
But what information can be useful to store in the metadata, which info is worth to store despite the fact that it was not captured in
the creation of the model?
1. Audit data
who? – simply store the user id of the action invoker
when? – the timestamp of the action and the event(s)
why? – the serialized intent/action of the actor
2. Event versioning
The event sourcing deals with the effect of the actions. An action
executed on a state results in an action according to the current
implementation. Wait. The current implementation? Yes, the
implementation of your aggregate can change and it will either because
of bug fixing or introducing new features. Wouldn’t it be nice if
the version, like a commit id (SHA1 for gitters) or a semantic version
could be stored with the event as well? Imagine that you published a
broken version and your business sold 100 tickets before fixing a bug.
It’d be nice to be able which events were created on the basis of the
broken implementation. Having this knowledge you can easily compensate
transactions performed by the broken implementation.
3. Document implementation details
It’s quite common to introduce canary releases, feature toggling and
A/B tests for users. With automated deployment and small code
enhancement all of the mentioned approaches are feasible to have on a
project board. If you consider the toggles or different implementation
coexisting in the very same moment, storing the version only may be
not enough. How about adding information which features were applied
for the action? Just create a simple set of features enabled, or map
feature-status and add it to the event as well. Having this and the
command, it’s easy to repeat the process. Additionally, it’s easy to
result in your A/B experiments. Just run the scan for events with A
enabled and another for the B ones.
4. Optimized combination of 2. and 3.
If you think that this is too much, create a lookup for sets of
versions x features. It’s not that big and is repeatable across many
users, hence you can easily optimize storing the set elsewhere, under
a reference key. You can serialize this map and calculate SHA1, put
the values in a map (a table will do as well) and use identifiers to
put them in the event. There’s plenty of options to shift the load
either to the query (lookups) or to the storage (store everything as
named metadata).
Summing up
If you create an event sourced architecture, consider adding the
temporal dimension (version) and a bit of configuration to the
metadata. Once you have it, it’s much easier to reason about the
sources of your events and introduce tooling like compensation.
There’s no such thing like too much data, is there?

I will share my experiences with you which may help. I have been playing with akka-persistence, akka-persistence-eventstore and eventstore. akka-persistence stores it's event wrapper, a PersistentRepr, in binary format. I wanted this data in JSON so that I could:
use projections
make these events easily available to any other technologies
You can implement your own serialization for akka-persistence-eventstore to do this, but it still ended up just storing the wrapper which had my event embedded in a payload attribute. The other attributes were all akka-persistence specific. The author of akka-persistence-eventstore gave me some good advice, get the serializer to store the payload as the Data, and the rest as MetaData. That way my event is now just the business data, and the metadata aids the technology that put it there in the first place. My projections now don't need to parse out the metadata to get at the payload.

Are Operational Transformation Frameworks only meant for text?

Looking at all the examples of Operational Transformation Frameworks out there, they all seem to resolve around the transformation of changes to plain text documents. How would an OT framework be used for more complex objects?
I'm wanting to dev a real-time sticky notes style app, where people can co-create sticky notes, change their positon and text value. Would I be right in assuming that the position values wouldn't be transformed? (I mean, how would they, you can't merge them right?). However, I would want to use an OT framework to resolve conflicts with the posit-its value, correct?

I do not see any problem to use Operational Transformation to work with Complex Objects, what you need is to define what operations your OT system support and how concurrency is solved for them
For instance, if you receive two Sticky notes "coordinates move operation" from two different users from same 'client state', you need to make both states to converge, probably cancelling out second operation.
This is exactly the same behaviour with text when two users generate two updates to delete a text range that overlaps completely, (or maybe partially), the second update processed must be transformed against the previous and the resultant operation will only effectively delete a portion of the original one, (or completely cancelled with a 'no-op')
You can take a look on this nice explanation about how Google Wave Operational Transformation works and guess from this point how it should work your own implementation

See the following paper for an approach to using OT with trees if you want to go down that route:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.74
However, in your particular case, I would use a separate plain text OT document for each stickynote and use an existing library, eg: etherPad, to do the heavy lifting. The positions of the notes could then be broadcast on a last-committer-wins basis.

Operation Transformation is a general technique, it works for any data type. The point is you need to define your transformation functions. Also, there are some atomic attributes that you cannot merge automatically like (position and background color) those will be mostly "last-update wins" or the user solves them manually when there is a conflict.
there are some nice libs and frameworks that provide OT for complex data already out there:
ShareJS : library for Node which provides all operations on JSON objects
DerbyJS: framework for NodeJS, it uses ShareJS for OT stuff.
Open Coweb framework : Dojo foundation project for cooperative web applications using OT

Lazy and Deferred TreeViewer questions

I have actually two questions but they are kind of related so here they go as one...
How to ensure garbage collection of tree nodes that are not currently displayed using TreeViewer(SWT.VIRTUAL) and ILazeTreeContentProvider?
If a node has 5000 children, once they are displayed by the viewer they are never let go,
hence Out of Memory Error if your tree has great number of nodes and leafs and not big enough heap size.
Is there some kind of a best practice how to avoid memory leakages, caused by never closed view holding a treeviewer with great amounts of data (hundreds of thousands objects or even millions)?
Perhaps maybe there is some callback interface which allow greater flexibility with viewer/content provider elements?
Is it possible to combine deffered (DeferredTreeContentManager) AND lazy (ILazyTreeContentProvider) loading for a single TreeViewer(SWT.VIRTUAL)?
As much as I understand by looking at examples and APIs, it is only possible to use either one at a given time but not both in conjunction, e.g. ,
fetch ONLY the visible children for a given node AND fetch them in a separate thread using Job API. What bothers me is that Deferred approach
loads ALL children. Although in a different thread, you It still load all elements
even though only a minimal subset are displayed at once.
I can provide code examples to my questions if required...
I am currently struggling with those myself so If I manage to come up with something in the meantime I will gladly share it here.
Thanks!
Regards,
Svilen

I find the Eclipse framework sometimes schizophrenic. I suspect that the DeferredTreeContentManager as it relates to the ILazyTreeContentProvider is one of these cases.
In another example, at EclipseCon this past year they recommended that you use adapter factories (IAdapterFactory) to adapt your models to the binding context needed at the time. For example, if you want your model to show up in a tree, do it this way.
treeViewer = new TreeViewer(parent, SWT.BORDER);
IAdapterFactory adapterFactory = new AdapterFactory();
Platform.getAdapterManager().registerAdapters(adapterFactory, SomePojo.class);
treeViewer.setLabelProvider(new WorkbenchLabelProvider());
treeViewer.setContentProvider(new BaseWorkbenchContentProvider());
Register your adapter and the BaseWorkbenchContentProvider will find the adaption in the factory. Wonderful. Sounds like a plan.
"Oh by-the-way, when you have large datasets, please do it this way", they say:
TableViewertableViewer = new TableViewer(parent, SWT.VIRTUAL);
// skipping the noise
tableViewer.setItemCount(100000);
tableViewer.setContentProvider(new LazyContentProvider());
tableViewer.setLabelProvider(new TableLabelProvider());
tableViewer.setUseHashlookup(true);
tableViewer.setInput(null);
It turns out that first and second examples are not only incompatible, but they're mutually exclusive. These two approaches where probably implemented by different teams that didn't have a common plan or maybe the API is in the middle of a transition to a common framework. Nevertheless you're on your own.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse