When to use the copy API instead of the rewrite API?

According to the GCS documentation:
"Generally, you should use the rewrite method instead of the copy method: the copy method uses the rewrite method, but calls it exactly once. Larger objects can require multiple rewrite calls, so copy attempts of such objects can lead to Payload too large errors."
Reading that, it seems I should always be using the rewrite method. copy is a little easier to implement, and it inherently limits the size of the object that can be copied, but as long as I've implemented rewrite anyway and I don't want GCS to limit my object size,
I'm wondering whether there is any case in which the copy method has an advantage, and in what scenarios I should use copy instead of rewrite.

TLDR: The copy method has no advantages over the rewrite method.
copy is meant to be used for very quick operations, or small objects, because of how it is implemented.
The copy REST API call has a known issue where it fails with a DEADLINE_EXCEEDED error for larger files.
For longer operations, it is recommended to use rewrite instead of copy.
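To make the difference concrete, here is a minimal sketch using the google-cloud-storage Java client (bucket and object names are placeholders): the client's copy() is implemented on top of the rewrite call, and looping on copyChunk() keeps issuing rewrite requests until the whole object has been transferred, so the object size is not limited by a single call.

    import com.google.cloud.storage.Blob;
    import com.google.cloud.storage.BlobId;
    import com.google.cloud.storage.CopyWriter;
    import com.google.cloud.storage.Storage;
    import com.google.cloud.storage.StorageOptions;

    public class RewriteCopyExample {
        public static void main(String[] args) {
            Storage storage = StorageOptions.getDefaultInstance().getService();

            // Placeholder source and destination objects.
            BlobId source = BlobId.of("source-bucket", "large-object.bin");
            BlobId target = BlobId.of("target-bucket", "large-object.bin");

            // copy() returns a CopyWriter backed by the rewrite API; each
            // copyChunk() call issues one rewrite request, so large objects
            // simply take several iterations instead of failing.
            CopyWriter writer = storage.copy(
                    Storage.CopyRequest.newBuilder().setSource(source).setTarget(target).build());
            while (!writer.isDone()) {
                writer.copyChunk();
            }
            Blob copied = writer.getResult();
            System.out.println("Copied " + copied.getSize() + " bytes to " + copied.getName());
        }
    }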

Related

In service fabric do you need to call CommitAsync when just reading a value?

When using a reliable dictionary in service fabric, you have to access everything within a transaction. However when reading you are not changing anything, so it doesn't seem to make any sense to call CommitAsync.
Are there any downsides of not calling CommitAsync?
No, there are no downsides. Just try to release your transaction as quickly as possible.
By the way, I have noticed that values returned from a reliable collection can sometimes end up modified. In my case it was a list, and I was unable to enumerate it properly; I'm fairly sure none of my code modified that value. So my suggestion is: if the value is a reference type, make a copy of it, release the transaction, and work with the copy, even if you are not going to modify the value.

Drools 6 Fusion Notification

We are working on a very complex solution using Drools 6 (Fusion), and I would like your opinion about the best way to read the objects created as correlation results over time.
My first basic approach was to read working memory at a fixed interval, looking for new objects and reporting them to an external service (REST).
An AgendaEventListener does not seem to be the best approach because I don't care about most of the objects being inserted into working memory, so maybe the best approach would be to inject the particular object into some sort of service from inside the DRL. Is this a good approach?
You have quite a lot of options. In decreasing order of my preference:
An event listener is probably the solution requiring the smallest amount of LOC (see the sketch after this list); note that in the kie-api it is the RuleRuntimeEventListener that reports inserted facts, while the AgendaEventListener you mention reports rule matches and firings. It might be useful for other tasks as well; all you have on the negative side is one additional method call and a class test per inserted fact. Peanuts.
You can wrap the insert macro in a DRL function and collect inserted facts of class X in a global List. The problem you have here is that you'll have to pass the KieContext as a second parameter to the function call.
If the creation of a class X object is inevitably linked with its insertion into WM, you could register new objects in a static List inside class X, in a factory method (or the constructor).
I'm putting your "basic approach" last because it requires many more cycles than the listener (#1) and tons of overhead for maintaining the set of X objects that have already been pushed to REST.
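A minimal Java sketch of the listener idea (assumptions: CorrelationResult stands in for whatever fact class you want to report, and the REST push is left as a stub):

    import org.kie.api.event.rule.DefaultRuleRuntimeEventListener;
    import org.kie.api.event.rule.ObjectInsertedEvent;
    import org.kie.api.runtime.KieSession;

    public class CorrelationResultListener extends DefaultRuleRuntimeEventListener {

        @Override
        public void objectInserted(ObjectInsertedEvent event) {
            Object fact = event.getObject();
            // The "class test" from option 1: ignore every fact except the ones to report.
            if (fact instanceof CorrelationResult) {
                reportToRestService((CorrelationResult) fact);
            }
        }

        private void reportToRestService(CorrelationResult result) {
            // Stub: push the result to the external REST endpoint here.
        }

        public static void register(KieSession session) {
            session.addEventListener(new CorrelationResultListener());
        }
    }

    // Hypothetical fact class produced by the correlation rules.
    class CorrelationResult { }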

MarkLogic "XDMP-FRAGTOOLARGE" error while storing 200MB+ File using REST

When I try to store a 200MB+ XML file into MarkLogic using REST, it gives the following error: "XDMP-FRAGTOOLARGE: Fragment of /testdata/upload/submit.xml too large for in-memory storage".
I have tried the Fragment Roots and Fragment Parents options but still get the same error.
But when I store the file without the '.xml' extension in the URI, it saves the file, yet no XQuery operations can be performed on it.
MarkLogic won't be able to derive the MIME type from a URI without an extension. It then falls back to storing the document as binary.
I think that if you were to use xdmp:document-load from QConsole, you might be able to load it correctly, as that does not try to hold the entire document in memory first. It won't help you much, though; you will likely hit the same error elsewhere. The REST API has to pass the document through memory, so it won't work like this.
You could raise memory settings in the Admin UI, but you are generally better off by splitting your input. MarkLogic Content Pump (MLCP) will allow you to do so using the aggregate_record options. That will split the file into smaller pieces based on a particular element, and store these as separate documents inside MarkLogic.
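For illustration, an mlcp import along these lines splits the input on a chosen element (a rough sketch; host, credentials, paths, and the record element name are placeholders for your environment):

    mlcp.sh import -host localhost -port 8000 \
      -username admin -password secret \
      -input_file_path /data/submit.xml \
      -input_file_type aggregates \
      -aggregate_record_element record \
      -output_uri_prefix /testdata/upload/ \
      -output_uri_suffix .xml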
HTH!

Does CGDataProviderCopyData() actually copy the bytes? Or just the pointer?

I'm calling that function in quick succession as fast as I can, and the faster the better, so obviously if CGDataProviderCopyData() is actually copying the data byte-for-byte, then I think there must be a faster way to access that data directly...it's just bytes in memory. Does anyone know for sure whether CGDataProviderCopyData() actually copies the data, or does it just create a new pointer to the existing data?
The bytes are copied.
Internally a CFData is created (CFDataCreate) with the content of the data provider. The CFDataCreate function always makes a copy.
Anyone know for sure if CGDataProviderCopyData() actually copies the data? Or does it just create a new pointer to the existing data?
Those are the same thing. Pointers are memory addresses; by definition, if you have the same data at two addresses, it is the same data in two places, so you must have copied it (either from one to the other or to both from a common origin).
So, let's restate the question accordingly:
Or does it just copy the existing pointer?
Quartz can't necessarily do this, because data providers do not necessarily provide an existing pointer, as they can be implemented as essentially stream-based (sequential) providers instead.
What about direct-access providers? Even those need not cough up a byte pointer; the provider may simply offer range-on-demand access instead.
But what if it does offer a byte pointer? Well, the documentation for that says:
You must not move or modify the provider data until Quartz calls your CGDataProviderReleaseBytePointerCallback function.
So, conceivably, Quartz could reuse the pointer. But what if you release the data provider (causing your ReleaseBytePointer callback to be called) before you release the data?
This could still be safe if Quartz implements a private custom subclass of CFData or NSData that either implements faulting or takes over the job of calling ReleaseBytePointer, so that if you create a direct-access provider and create a CFData from it and release the provider, you can still use the CFData object.
But that's a lot of ifs. They probably just create a plain old (bytes-copying-at-creation-time) CFData, which makes it a valid performance concern.
Profile it and see how much pain it's causing you. If it's enough to worry about, then you need some solutions:
You could implement ReleaseBytePointer as a no-op (empty function body) and release the bytes separately, making sure to do so after releasing both the provider and the data. In theory, this prevents the bytes from going away out from under the CFData if it is using the original bytes pointer and Quartz doesn't implement a custom CFData subclass. A little hairy. Unfortunately, Apple can't really rely on you doing this, so I doubt it will actually help.
Handle an NS/CFData directly instead. Create the data provider only to pass it to Quartz, then release it and forget about it immediately thereafter (don't keep ownership of it yourself).
Depending on your needs, you may prefer to keep your callbacks structure in an instance variable and call them directly to copy parts of the data. Of course, if this solution works for you, then you don't have the problem described above anyway, since you aren't creating a here-you-can-have-my-bytes-pointer direct-access data provider.
The documentation for CGDataProviderCreateWithCFData doesn't say whether it returns a direct-access data provider or not, so you'll have to err on the side of caution if that's how you're creating your data provider.

Passing and Handling Large Objects in Workflows

I have to pass a fairly large object/file to a workflow when it starts (on the order of hundreds of MBs). I am using secondary storage to dump the object and keep as little of it as possible in RAM at any one time on the workflow side. Is there another, more efficient way to pass and handle the object? Does WF provide any built-in functionality for such situations?
What about passing a URI to that object instead?
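WF itself is .NET, but the idea is language-agnostic. Here is a rough sketch (in Java, with hypothetical names) of the shape it takes: the workflow step receives only the location of the payload and streams it in small chunks when it actually needs it, instead of being handed the whole object up front.

    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URI;

    public class LargePayloadStep {

        // Instead of an activity that takes the whole payload up front,
        // e.g. start(byte[] payload), pass only where the payload lives.
        public void start(URI payloadLocation) throws IOException {
            try (InputStream in = new BufferedInputStream(payloadLocation.toURL().openStream())) {
                byte[] buffer = new byte[64 * 1024];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    // Process one chunk at a time; only 'buffer' is held in memory.
                }
            }
        }
    }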