What are the use cases for storing a ReadableStream in a distributed data store like Cloudflare Workers KV? - distributed-computing

Cloudflare's own globally distributed data store – Workers KV – can accept data of three "types": string, ArrayBuffer and ReadableStream.
While the use cases for the former two are clear enough, I am struggling to figure out how a stored ReadableStream could be useful. I am familiar with the concept: you can use it to "stream" different values over time, but what is the point of putting one in a data store? What are typical scenarios?

The difference between passing a string, ArrayBuffer, or ReadableStream is not what data is stored, but rather how the data gets there. Note that you can store data as a string and then later read it as an ArrayBuffer or vice versa (strings are converted to/from bytes using UTF-8). When you pass a ReadableStream to put(), the system reads data from the stream and stores that data; it does not store the stream itself. Similarly, when using get(), you can specify "stream" as the second parameter to get a ReadableStream back; when you read from this stream, it will produce the value's content.
The main case where you would want to use streams is when you want to directly store the body of an HTTP request into a KV value, or when you want to directly return a KV value as the body of an HTTP response. Using streams in these cases avoids the need to hold the entire value in memory all at once; instead, bytes can stream through as they arrive.
For example, instead of doing:
// BAD
let value = await request.text();
await kv.put(key, value);
You should do this:
await kv.put(key, request.body);
This is especially important when the value is many megabytes in size. The former version would read the entire value into memory to construct one large string (including decoding UTF-8 to UTF-16), only to immediately write that value back out into KV (converting UTF-16 back to UTF-8). The latter version copies bytes straight from the incoming connection into KV without ever storing the whole value in memory at once.
Similarly, for a response, instead of doing:
// BAD
let value = await kv.get(key);
return new Response(value);
You can do:
let value = await kv.get(key, "stream");
return new Response(value);
This way, the response bytes get streamed from KV to the HTTP connection. This not only saves memory and CPU time, but also means that the client starts receiving bytes faster, because your Worker doesn't wait until all bytes are received before it starts forwarding them.
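The memory argument is not specific to Workers. As a rough, language-neutral illustration, here is a minimal Python sketch of copying a body in fixed-size chunks instead of reading it whole. The names `copy_in_chunks`, `body` and `stored` are hypothetical stand-ins (for `request.body` and the KV value), not part of any Cloudflare API.

```python
import io


def copy_in_chunks(source, sink, chunk_size=64 * 1024):
    """Copy a binary stream to a sink chunk by chunk; return bytes copied.

    At most `chunk_size` bytes are held by this function at any moment,
    regardless of how large the whole value is.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        sink.write(chunk)
        total += len(chunk)
    return total


# Hypothetical stand-ins: `body` plays the role of request.body,
# `stored` plays the role of the value held by the data store.
body = io.BytesIO(b"x" * 1_000_000)
stored = io.BytesIO()
copied = copy_in_chunks(body, stored)
```

The buffered variant would be `stored.write(body.read())`, which materializes the full value first, the same pattern as the `// BAD` examples above.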


utf-8 gets stored differently on server side (JAVA)

I'm trying to figure out the answer to one of my other questions, but maybe this will help me anyway.
When I persist an entity to the server, the byte[] property holds different information than what I persisted. I'm persisting in UTF-8 to the server.
An example.
This is the payload I send to the server.
This is what the server has
As you can see, the image byte array is different.
What I'm trying to do is get the image bytes saved on the server and display them on the front end, but I don't know how to get the original bytes.
No, you are wrong. Both methods stored the ASCII string [object ArrayBuffer].
You are confusing the data with its representation. The data is the same, but in the two examples binary data is represented in two different ways:
The first is an array of bytes (decimal representation); the second is a classic representation for binary data, Base64 (you can recognize it by the final = character).
So you just have different representations of the same data; the data itself is stored in the same manner.
You may need to specify how you obtain the binary data in string form (as in your example), and thus which representation is actually in use.
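To make the "same data, different representation" point concrete, here is a small Python sketch. The sample bytes (the 8-byte PNG signature) are illustrative only, and the assumption that the server side prints signed decimals while the JSON layer emits Base64 is a guess about the setup in the question, not something the question confirms.

```python
import base64

# Illustrative data: the 8-byte PNG file signature
image_bytes = bytes([137, 80, 78, 71, 13, 10, 26, 10])

# Representation 1: signed decimal values, the way a Java byte[]
# is usually printed (Java bytes range from -128 to 127)
as_signed_decimals = [b - 256 if b > 127 else b for b in image_bytes]

# Representation 2: Base64 text, a common way to carry a byte[] inside JSON
as_base64 = base64.b64encode(image_bytes).decode("ascii")

# Decoding either representation yields the original bytes
from_decimals = bytes(v & 0xFF for v in as_signed_decimals)
from_base64 = base64.b64decode(as_base64)
```

Both `from_decimals` and `from_base64` compare equal to `image_bytes`, so recovering the original bytes on the front end is a matter of decoding whichever representation the server actually returns.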

What is the maximum size of JWT token?

I need to know the maximum length of a JSON Web Token (JWT).
The specs contain no information about it. Could it be that there is no limit on the length?
I've also been trying to find this.
I'd say: try to ensure it's below 7 KB.
Whilst JWT defines no upper limit in the spec (http://www.rfc-editor.org/rfc/rfc7519.txt), we do have some operational limits.
As a JWT is included in an HTTP header, we have an upper limit (SO: Maximum on http header values) of 8 KB on the majority of current servers.
As that 8 KB covers all request headers combined, 7 KB leaves a reasonable amount of room for other headers. The biggest risk to that limit would be cookies (sent in headers and can get large).
As the token is signed (or encrypted) and Base64url-encoded, it is at least 33% larger than the original JSON string, so do check the length of the final encoded token.
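The 33% figure comes from Base64 mapping every 3 bytes to 4 characters. A minimal Python sketch of the arithmetic follows; the claims in `payload` are made up for illustration, and the signature length assumes HS256 (a 32-byte MAC), which is an assumption rather than something stated in the question.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url without padding, as used by the JWT compact serialization."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


# Hypothetical header and claims, serialized compactly
header = json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":")).encode()
payload = json.dumps(
    {"sub": "user-123", "roles": ["admin"] * 50}, separators=(",", ":")
).encode()

encoded_payload = b64url(payload)
overhead = len(encoded_payload) / len(payload)  # close to 4/3

# header.payload.signature; a 32-byte HS256 signature encodes to 43 characters
SIGNATURE_CHARS = 43
token_length = len(b64url(header)) + 1 + len(encoded_payload) + 1 + SIGNATURE_CHARS
```

Comparing `token_length` against the budget you have chosen (for example 7 KB) before issuing the token is cheaper than debugging dropped requests later.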
One final point: proxies and other network appliances may apply an arbitrary limit along the way...
As you said, there is no maximum length defined in RFC 7519 (https://www.rfc-editor.org/rfc/rfc7519) or in the other RFCs related to JWS or JWE.
If you use the JSON Serialization or the flattened JSON Serialization, there is no limit and no reason to define one.
But if you use the Compact Serialization (the most common format), keep in mind that the token should be as short as possible because it is mainly used in a web context. A 4 KB JWT is something that you should avoid.
Take care to store only useful claims and header information.
When using Heroku, the header is limited to 8 KB. Depending on how much data you put in the JWT, that limit can be reached. An oversized request will never reach your Node instance; the Heroku router will drop it before your API layer.
When processing an incoming request, a router sets up an 8KB receive
buffer and begins reading the HTTP request line and request headers.
Each of these can be at most 8KB in length, but together can be more
than 8KB in total. Requests containing a request line or header line
longer than 8KB will be dropped by the router without being
See: Heroku Limits
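A quick pre-flight check against the per-line limit quoted above could look like the following Python sketch. The function name is hypothetical, the use of an `Authorization: Bearer` header is an assumption about how the token is sent, and whether the router counts the trailing CRLF toward the 8 KB is not stated in the quote, so this version counts it to stay on the safe side.

```python
HEADER_LINE_LIMIT = 8 * 1024  # bytes, taken from the quoted router limit


def authorization_line_fits(token: str, limit: int = HEADER_LINE_LIMIT) -> bool:
    """Return True if the Authorization header line carrying `token` fits.

    Assumes the token travels as "Authorization: Bearer <token>".
    The trailing CRLF is counted conservatively.
    """
    line = f"Authorization: Bearer {token}\r\n".encode("ascii")
    return len(line) <= limit


small_ok = authorization_line_fits("a" * 100)
large_ok = authorization_line_fits("a" * 9000)
```

Here `small_ok` is true and `large_ok` is false; a token that fails this check would be dropped before it reaches the application.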

Can ZooKeeper get znode data and znode data version (stat) in one single operation?

I am developing an application that uses ZooKeeper as the datastore. For one of the methods in the application, I need to use optimistic concurrency control. For example, I need to implement a get method which gets the znode data, and I use the znode data version for the optimistic concurrency control check. From what I understand, one can't get the znode data and the znode data version in one single operation. If there is high contention to update the znode data, the get method will not work, since the znode data might have changed after it was read. So I am asking: is there a way I can get the znode data and the znode data version (or znode stat) in one single operation, without any locking attempt in between?
In Java, you can achieve what you want easily:
Stat stat = new Stat();
byte[] data = zk.getData("/path", null, stat);
This does read data and version information (in the stat object) in a single operation. When you write back the data, you pass the version number you got when you read it:
zk.setData("/path", data, stat.getVersion());
If there is a version mismatch, the method will throw KeeperException.BadVersionException, which gives you an optimistic lock.
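In Python using Kazoo it is also trivial to get both the data and the stat and implement some optimistic locking. Here is a sketch:
from kazoo.exceptions import BadVersionError

while True:
    try:
        data, stat = zk.get("/path")
        # do something with the data to produce new_data, and then:
        zk.set("/path", new_data, stat.version)
        break  # the write succeeded
    except BadVersionError:
        continue  # someone else updated the znode; re-read and retry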
Also, do use pre-made recipes when you can, as they are already extensively debugged and should handle all corner cases.
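The read-modify-write loop can be tried out without a ZooKeeper server. The following Python sketch uses an in-memory stand-in for a znode; `VersionedNode`, `BadVersion` and `increment` are hypothetical names invented for this illustration, and the lock inside the stand-in only models the server applying each operation atomically. The clients themselves never hold a lock across read and write.

```python
import threading


class BadVersion(Exception):
    """Raised when the expected version does not match the stored one."""


class VersionedNode:
    """In-memory stand-in for a znode: data plus a version bumped on each write."""

    def __init__(self, data: bytes):
        self._lock = threading.Lock()  # models server-side atomicity only
        self._data = data
        self._version = 0

    def get(self):
        # Data and version are returned together, from one consistent snapshot
        with self._lock:
            return self._data, self._version

    def set(self, data: bytes, expected_version: int):
        with self._lock:
            if expected_version != self._version:
                raise BadVersion(expected_version)
            self._data = data
            self._version += 1


def increment(node: VersionedNode) -> None:
    """Optimistically increment the counter stored in the node."""
    while True:
        data, version = node.get()
        try:
            node.set(str(int(data) + 1).encode(), version)
            return
        except BadVersion:
            continue  # lost the race; re-read and retry


def worker(node: VersionedNode, times: int) -> None:
    for _ in range(times):
        increment(node)


node = VersionedNode(b"0")
threads = [threading.Thread(target=worker, args=(node, 100)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_data, final_version = node.get()
```

With four workers doing 100 increments each, no update is lost: every conflicting write is rejected and retried, so the counter and the version both end at 400.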

Remove read data for authenticated user?

My requirement in DDS is as follows: I have many subscribers but a single publisher. Each subscriber reads the data from DDS and checks whether the message is meant for that particular subscriber. Only if the check succeeds does it take the data and remove it from DDS. The message must be kept in DDS until the authenticated subscriber takes its data. How can I achieve this using DDS (in a Java environment)?
First of all, you should be aware that with DDS, a Subscriber is never able to remove data from the global data space. Every Subscriber has its own cached copy of the distributed data and can only act on that copy. If one Subscriber takes data, then other Subscribers for the same Topic will not be influenced by that in any way. Only Publishers can remove data globally for every Subscriber. From your question, it is not clear whether you know this.
Independent of that, it seems like the use of a ContentFilteredTopic (CFT) is suitable here. According to the description, the Subscriber knows the file name that it is looking for. With a CFT, the Subscriber can indicate that it is only interested in samples that have a particular value for the file_name attribute. The infrastructure will take care of the filtering process and will ensure that the Subscriber will not receive any data with a different value for the attribute file_name. As a consequence, any take() action done on the DataReader will contain relevant information and there is no need to check the data first and then take it.
The API documentation should contain more detailed information about how to use a ContentFilteredTopic.

Which data-store to use, to store meta data corresponding to the keys in memcache?

I have a memcache backend and I want to add Redis to store metadata about the keys in memcache.
Meta data is as follows:
Miss_count: The number of times the data was not present in the memcache.
Hash_value: The hash value of the data corresponding to the key in the memcache.
Data in memcache : key1 ::: Data
Meta data (miss count) : key1_miss ::: 10
Meta data (hash value) : key1_hash ::: hash(Data)
Please advise which data store is preferable. When I store the metadata in memcache itself, it is evicted well before its expiry time, because the metadata is small and the slab allocator assigns only a small memory chunk to it.
As the metadata will grow over time, the Redis hash approach will fail. Therefore, apply client-side logic to ensure that max_zipped is satisfied.
If I understand your use case correctly I suspect Redis might be a good choice. Assuming you'll be periodically updating the meta data miss counts associated with the various hashes over time, you'd probably want to use Redis sorted sets. For example, if you wanted the miss counts stored in a sorted set called "misscounts", the Redis command to add/update those counts would be one and the same:
zadd misscounts misscount key1
... because zadd adds the entry if one doesn't already exist or overwrites an existing entry if it does. If you have a hook into the process that fires each time a miss occurs, you could instead use:
zincrby misscounts 1 key1
Similar to the zadd command behavior, zincrby will create a new entry (using the increment value as the count) if one doesn't exist, or increment the existing count by the increment value you pass if an entry does exist.
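From application code, the miss hook could be wired up roughly as in this Python sketch. `record_miss` and `InMemorySortedSets` are hypothetical names; the in-memory class only mimics the create-or-increment behaviour described above so the example runs without a server. With the redis-py client you would pass a `redis.Redis()` instance instead, whose `zincrby(name, amount, value)` and `zscore(name, value)` methods take the same arguments.

```python
from collections import defaultdict

MISSCOUNTS = "misscounts"  # sorted set name taken from the example above


class InMemorySortedSets:
    """Tiny stand-in that mirrors ZINCRBY / ZSCORE semantics for testing."""

    def __init__(self):
        self._sets = defaultdict(dict)

    def zincrby(self, name, amount, value):
        members = self._sets[name]
        # A missing member starts at 0, so the first increment creates it
        members[value] = members.get(value, 0.0) + amount
        return members[value]

    def zscore(self, name, value):
        return self._sets[name].get(value)


def record_miss(client, key):
    """Call this from the cache-miss hook; returns the updated miss count."""
    return client.zincrby(MISSCOUNTS, 1, key)


client = InMemorySortedSets()
record_miss(client, "key1")
record_miss(client, "key1")
count = client.zscore(MISSCOUNTS, "key1")
```

After two misses `count` is 2.0, and a key that never missed has no score at all, which matches how a sorted set behaves.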
Complete documentation of Redis commands can be found here. The different types of storage options in Redis are described here.
Oh, and a final note. In my experience, Redis is THE SHIT. Sorry to curse (in caps), but there's simply no other way to do Redis justice. We call our Redis server "honey badger", because when load starts increasing and our other servers start auto-scaling, honey badger just don't give a shit.