Apache Geode: multiple regions sharing one disk store

If I have 10 or 15 Geode regions and I want to persist 5 of them, can I use one disk store for all of the persistent regions?
<region name="region1" refid="REPLICATE_PERSISTENT"><region-attributes disk-store-name="regionStore" disk-synchronous="false"></region-attributes></region>
<region name="region2" refid="REPLICATE/>
<region name="region3" refid="REPLICATE/>
<region name="region4" refid="REPLICATE_PERSISTENT"><region-attributes disk-store-name="regionStore" disk-synchronous="false"></region-attributes></region>
<region name="region5" refid="REPLICATE_PERSISTENT"><region-attributes disk-store-name="regionStore" disk-synchronous="false"></region-attributes></region>
Then the disk store config is
<disk-store name="regionStore" compaction-threshold="40"
auto-compact="false" allow-force-compaction="true"
max-oplog-size="512" queue-size="10000"
time-interval="15" write-buffer-size="65536"
disk-usage-warning-percentage="80"
disk-usage-critical-percentage="98">
<disk-dirs>
<disk-dir>C:\DiskStores\regionStore</disk-dir>
</disk-dirs>
</disk-store>

The same disk-store can be shared among several regions, yes. It's a best practice, though, to have one disk-store per region.
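For illustration, here is a minimal sketch of the same setup using Geode's Java cache API (assuming a peer cache and the same disk directory as the cache.xml above): one disk store is created once and then referenced by name from every persistent region that should share it.

import java.io.File;
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.RegionFactory;
import org.apache.geode.cache.RegionShortcut;

public class SharedDiskStoreExample {
    public static void main(String[] args) {
        Cache cache = new CacheFactory().create();

        // One disk store, created once.
        cache.createDiskStoreFactory()
            .setMaxOplogSize(512)
            .setQueueSize(10000)
            .setTimeInterval(15)
            .setWriteBufferSize(65536)
            .setDiskDirs(new File[] { new File("C:/DiskStores/regionStore") })
            .create("regionStore");

        // Every persistent region references the same disk store by name.
        RegionFactory<Object, Object> persistentRegions =
            cache.createRegionFactory(RegionShortcut.REPLICATE_PERSISTENT);
        persistentRegions.setDiskStoreName("regionStore");
        persistentRegions.setDiskSynchronous(false);
        persistentRegions.create("region1");
        persistentRegions.create("region4");
        persistentRegions.create("region5");

        // Non-persistent regions need no disk store at all.
        RegionFactory<Object, Object> replicateRegions =
            cache.createRegionFactory(RegionShortcut.REPLICATE);
        replicateRegions.create("region2");
        replicateRegions.create("region3");
    }
}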

Related

Create Lucene Indexes in Apache Geode Region

I'm trying to create Lucene indexes on an Apache Geode Region.
I have all the Region definitions in cache.xml. This cache.xml is read by the cache server and the Regions are created.
If I define a Region like the one below in cache.xml,
<region name="trackRegion" refid="PARTITION_PERSISTENT">
<lucene:index name="myIndex">
<lucene:field name="tenant" />
</lucene:index>
</region>
the Region is created with the Lucene index, but this form doesn't let me add other Region properties such as a key index or a Region compressor.
Geode says the Lucene index must be created before the Region. How should I define the Lucene index for a Region like the one below?
<region name="trackRegion" refid="PARTITION_PERSISTENT">
<region-attributes>
<compressor>
<class-name>org.apache.geode.compression.SnappyCompressor</class-name>
</compressor>
</region-attributes>
<index name="trackRegionKeyIndex" from-clause="/trackRegion" expression="key" key-index="true"/>
</region>
Also, I tried creating the Region with Java annotations following this document: https://github.com/spring-projects/spring-data-gemfire/blob/main/src/main/asciidoc/reference/lucene.adoc#annotation-configuration-support.
Even with this I get the "Lucene Index must be created before Region" error.
Regarding the Spring configuration model for defining Lucene indexes and using Apache Geode's Lucene support:
Since I am not familiar with how you set up and arranged your application configuration, have a look at a few SDG integration tests to see whether they help you identify the problem.
First, have a look at the LuceneOperationsIntegrationTests class in the SDG test suite, which shows how to configure a Spring application using JavaConfig.
Next, have a look at EnableLuceneIndexingConfigurationIntegrationTests in the SDG test suite, which shows how a Spring application would be configured using SDG annotations.
Keep in mind that 1) Lucene Indexes on Apache Geode Regions can only be created on PARTITION Regions, and 2) PARTITION Regions can only be created on the peer servers in your cluster. That is, Lucene Indexes cannot be applied to client Regions on a ClientCache application.
I suspect your application configuration is missing Spring's @DependsOn annotation, either on the template or on the Region bean containing the Lucene index. For example:
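A minimal sketch using Spring Data for Apache Geode JavaConfig (the bean names "myIndex" and "trackRegion" and the "tenant" field mirror the question; adapt them to your own configuration):

import org.apache.geode.cache.GemFireCache;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.DependsOn;
import org.springframework.data.gemfire.PartitionedRegionFactoryBean;
import org.springframework.data.gemfire.search.lucene.LuceneIndexFactoryBean;

@Configuration
class TrackRegionConfiguration {

    // Defines the Lucene index on /trackRegion for the "tenant" field.
    @Bean("myIndex")
    LuceneIndexFactoryBean trackRegionLuceneIndex(GemFireCache gemfireCache) {
        LuceneIndexFactoryBean luceneIndex = new LuceneIndexFactoryBean();
        luceneIndex.setCache(gemfireCache);
        luceneIndex.setFields("tenant");
        luceneIndex.setRegionPath("/trackRegion");
        return luceneIndex;
    }

    // @DependsOn makes Spring create the Lucene index bean before the Region bean,
    // satisfying Geode's "index before Region" requirement.
    @Bean("trackRegion")
    @DependsOn("myIndex")
    PartitionedRegionFactoryBean<Object, Object> trackRegion(GemFireCache gemfireCache) {
        PartitionedRegionFactoryBean<Object, Object> trackRegion = new PartitionedRegionFactoryBean<>();
        trackRegion.setCache(gemfireCache);
        trackRegion.setPersistent(true);
        return trackRegion;
    }
}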

(Socket.io) How should I optimize the number of Namespaces?

I am planning on using socket.io to create an online app that allows users to chat via video, and saves each video call/session in a database. (Audio for each of these sessions will be separately available to download as well.)
Because there could be multiple video calls/sessions happening at once, I want to separate the data for each session. Socket.io offers both Namespaces and Rooms to do this, but I am unsure which is more optimal for my problem.
Namespaces, according to the documentation, are "a useful feature to minimize the number of resources (TCP connections) and at the same time separate concerns within your application by introducing separation between communication channels."
Is it best to have 1 namespace per room/call/session?
Or would it be best to limit TCP connections by creating a rule that creates a new namespace if/when:
Let's say MAX_CLIENTS_PER_NAMESPACE is the maximum number of sockets/clients we want in a namespace.
Say the last/smallest namespace currently has N clients (i.e. adding up the clients from all the rooms in that namespace gives N), and the new room/call/session to be created has M clients. Create a new namespace if M + N >= MAX_CLIENTS_PER_NAMESPACE.

Can we preserve the storage class when copying an object to a new bucket?

We have two different buckets: short-term, which has lifecycle policies applied, and retain, where we put data that we intend to keep indefinitely. The way we get data into the retain bucket is usually by copying the original object from the short-term bucket using the JSON API.
The short-term bucket moves data to Nearline after 30 days, to Coldline after 60 days, and deletes it after 90 days. The storage class of our retain bucket is Standard. When we're copying data from the short-term bucket to the retain bucket, we'd like to preserve the storage class of the object we're duplicating - is it possible to specify the storage class on the destination object using the JSON API?
If you want to preserve the storage class, it is recommended to perform a rewrite instead. The documentation advises using the copy method only "to copy between objects in the same location and storage class".
In the rewrite request you can set the destination object's storage class explicitly. The copy method would only fit the case where you had already separated objects by storage class, which, as I understand it, is not your case.
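For instance, here is a minimal sketch using the Java client library, whose copy operation is backed by the JSON API rewrite call; the bucket and object names and the NEARLINE class are illustrative:

import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.CopyWriter;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageClass;
import com.google.cloud.storage.StorageOptions;

public class PreserveStorageClassCopy {
    public static void main(String[] args) {
        Storage storage = StorageOptions.getDefaultInstance().getService();

        // Source object in the short-term bucket (names are illustrative).
        BlobId source = BlobId.of("short-term", "path/to/object");

        // Set the desired class on the destination explicitly, e.g. the class the
        // source currently has, instead of inheriting the retain bucket's default.
        BlobInfo target = BlobInfo.newBuilder(BlobId.of("retain", "path/to/object"))
            .setStorageClass(StorageClass.NEARLINE)
            .build();

        CopyWriter writer = storage.copy(Storage.CopyRequest.newBuilder()
            .setSource(source)
            .setTarget(target)
            .build());

        BlobInfo copied = writer.getResult();
        System.out.println("Copied with storage class: " + copied.getStorageClass());
    }
}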

Is it possible to copyObject from one Cloud Object Storage instance to another when the buckets are in different regions?

I would like to use the Node SDK to implement a backup and restore mechanism between 2 instances of Cloud Object Storage. I have added a service ID to the instances and added a permission for the service ID to access the buckets present in the instance I want to write to. The buckets will be in different regions. I have tried a variety of endpoints, both legacy and non-legacy, private and public, but I usually get Access Denied.
Is what I am trying to do possible with the SDK? If so, can someone point me in the right direction?
var config = {
  "apiKeyId": "xxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxx",
  "endpoint": "s3.eu-gb.objectstorage.softlayer.net",
  "iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - crn:v1:bluemix:public:cloud-object-storage:global:a/xxxxxxxxxxx:xxxxxxxxxxx::",
  "iam_apikey_name": "auto-generated-apikey-xxxxxxxxxxxxxxxxxxxxxx",
  "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
  "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/0xxxxxxxxxxxxxxxxxxxx::serviceid:ServiceIdxxxxxxxxxxxxxxxxxxxxxx",
  "serviceInstanceId": "crn:v1:bluemix:public:cloud-object-storage:global:a/xxxxxxxxxxxxxxxxxxx:xxxxxxxxxxxxxxxxxxxxxxxxxx::",
  "ibmAuthEndpoint": "iam.cloud.ibm.com/oidc/token"
};
This should work as long as you properly grant the requesting identity access to read the source of the put-copy, and as long as you are not using Key Protect based keys.
So the breakdown here is a bit confusing due to some unintuitive terminology.
A service instance is a collection of buckets. The primary reason for having multiple instances of COS is to have more granularity in your billing, as you'll get a separate line item for each instance. The term is a bit misleading, however, because COS is a true multi-tenant system - you aren't actually provisioning an instance of COS, you're provisioning a sort of sub-account within the existing system.
A bucket is used to segment your data into different storage locations or storage classes. Other behavior, like CORS, archiving, or retention, acts at the bucket level as well. You don't want to segment something that you expect to scale (like customer data) across separate buckets, as there's a limit of ~1k buckets in an instance. IBM Cloud IAM treats buckets as 'resources', and they are subject to IAM policies.
Instead, data that doesn't need to be segregated by location or class, and that you expect to be subject to the same CORS, lifecycle, retention, or IAM policies can be separated by prefix. This means a bunch of similar objects share a path, like foo/bar and foo/bas have the same prefix foo/. This helps with listing and organization but doesn't provide granular access control or any other sort of policy-esque functionality.
Now, to your question, the answer is both yes and no. If the buckets are in the same instance, there's no problem. Bucket names are unique, so as long as there isn't any secondary managed encryption (e.g. Key Protect) there's no problem copying across buckets, even if they span regions. Keep in mind, however, that large objects will take time to copy, and COS's strong consistency might lead to situations where the operation may not return a response until it's completed. Copying across instances is not currently supported.

What mechanism is there to prevent GCP bucket name squatting?

My company's name is mycomp.
GCP buckets are in a global, public namespace and their names must be globally unique, so all my buckets are prefixed with mycomp.
So mycomp-production, mycomp-test, mycomp-stage, etc.
What is to prevent someone from grabbing mycomp-dev? Like cybersquatting on that bucket name. Something like that could potentially really screw up my organizational structure.
How can I stop or reserve a bucket prefix? Is this even possible? If I want to be an A-hole, what's to stop me from grabbing "Nordstrom" or "walmart" if I get there first?
GCS supports domain-named buckets. This would allow you to create buckets like production.mycomp.com and test.mycomp.com. Since you must verify ownership of the domain to create buckets with that suffix, other people can't create buckets under your naming scheme.
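For example, a minimal sketch using the Java client library, assuming the domain mycomp.com has already been verified (e.g. through Google Search Console) by the account running the code:

import com.google.cloud.storage.Bucket;
import com.google.cloud.storage.BucketInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class CreateDomainNamedBucket {
    public static void main(String[] args) {
        // Bucket creation fails unless the calling account has verified
        // ownership of mycomp.com, which is what prevents squatting.
        Storage storage = StorageOptions.getDefaultInstance().getService();
        Bucket bucket = storage.create(BucketInfo.of("production.mycomp.com"));
        System.out.println("Created bucket: " + bucket.getName());
    }
}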