Curator Framework - read data and stats in one request - apache-zookeeper

In my application I use the Curator Framework to perform operations in ZooKeeper.
I want to read the data of a path together with its creation timestamp (ctime) and modification timestamp (mtime). I've tried, but I don't see any method that returns the data along with the stats.
The only working approach I found is to make two separate requests.
First:
final Stat stat = curator.checkExists().forPath("myPath");
final long ctime = stat.getCtime();
final long mtime = stat.getMtime();
and second:
final byte[] data = curator.getData().forPath("myPath");
Is there any other way to perform such read operation in one request?

Do it this way:
Stat stat = new Stat();
byte[] data = client.getData().storingStatIn(stat).forPath("myPath");
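Putting the two together (a minimal sketch reusing the curator client and "myPath" from the question), the data and the stat, including ctime and mtime, now come back from a single getData call:
final Stat stat = new Stat();
final byte[] data = curator.getData().storingStatIn(stat).forPath("myPath");
final long ctime = stat.getCtime(); // creation timestamp, from the same request
final long mtime = stat.getMtime(); // last-modification timestamp, from the same request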

Related

How to do a bulk update of metadata of GCS files (5k files per minute)

We have around 5k GCS files whose metadata needs to be updated.
With the following piece of code I can do one file at a time. Is there a way to bulk update all 5k files?
Blob b = storage.get(BlobId.fromGsUtilUri(file));
// add metadata to the output file
Map<String, String> md = new HashMap<>();
md.put("mtvalue", System.currentTimeMillis() + "");
b.toBuilder().setMetadata(md).build().update();
I checked the Google documentation, but couldn't find any such bulk update methods.
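If no true bulk call is available, one possible workaround is to run the same per-file update concurrently on the client side. This is only a hedged sketch, not an official bulk API: it assumes the storage client from the snippet above and a hypothetical List<String> files holding the gs:// URIs, and fans the updates out over a thread pool.
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BulkMetadataUpdater {
    // Same per-file update as in the question, just fanned out over a small thread pool.
    static void updateAll(Storage storage, List<String> files) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(16); // pool size is an arbitrary choice
        List<Future<?>> futures = new ArrayList<>();
        for (String file : files) {
            futures.add(pool.submit(() -> {
                Blob b = storage.get(BlobId.fromGsUtilUri(file));
                Map<String, String> md = new HashMap<>();
                md.put("mtvalue", System.currentTimeMillis() + "");
                b.toBuilder().setMetadata(md).build().update();
            }));
        }
        for (Future<?> f : futures) {
            f.get(); // surfaces any per-file failure
        }
        pool.shutdown();
    }
}
The pool size and the error handling here are arbitrary choices to tune for your environment and quota.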

DbContext Remove generates DbUpdateConcurrencyException when DATETIME includes milliseconds

I have a "legacy" table that I'm managing in a webapp. It uses a compound key, rather than an auto-increment integer, which I register during OnModelCreating:
modelBuilder.Entity<EmployeeLog>().HasKey(table => new { table.EmployeeID, table.DateCreated });
If it matters, the EmployeeID is a VARCHAR and DateCreated is a DATETIME.
When I attempt to delete a record:
...
// the UI stores the date/time without milliseconds, use a theta join to bracket the time
var model = await _context.EmployeeLogs.FirstOrDefaultAsync(l => l.EmployeeID == driverID && (l.DateCreated >= date && l.DateCreated < date.AddSeconds(1)));
_context.Remove(model);
var result = await _context.SaveChangesAsync();
I get an error when SaveChangesAsync is called:
DbUpdateConcurrencyException: Database operation expected to affect 1 row(s) but actually affected 0 row(s). Data may have been modified or deleted since entities were loaded.
The model is correctly identified and contains the expected data.
It appears that the issue is related to the granularity of the date. If milliseconds are included, for example 2021-04-27 08:06:33.193, the error is generated. If the date/time does not include milliseconds, for example 2021-04-27 08:06:33.000, the deletion works correctly.
I can truncate milliseconds when creating the record, but I'd like to know if there is a way to handle this if the record already contains milliseconds.
** edit **
I don't have any control over the vendor's database decisions, so I need a solution that addresses that reality.

How to avoid duplication of data pulled by REST API in Splunk

I have a Splunk instance where I configured a Data Input of type "REST API input for polling data from RESTful endpoints".
I have around 20+ endpoints from which I pull data in JSON format and load it into a single index.
However, each fetch indexes the same data again: the very first fetch brings 5 values, the next fetch brings the same 5 again, and so on, so whatever a report or search query returns keeps increasing.
Now my dashboards and reports have a duplicate-data problem. How should I avoid it?
As a rather unusual workaround, I increased the polling interval from 1 minute to 1 month, which avoids the duplication.
However, I cannot live with data that is a month stale; I could survive with a 1-day interval, but not with 1 month.
Is there any way in Splunk to keep my REST API calls tidy (avoid duplicates) so that my dashboards and reports stay current?
Here is a snippet of my inputs.conf file for the REST API input:
[rest://rst_sl_get_version]
auth_password = ccccc
auth_type = basic
auth_user = vvvvvvv
endpoint = https://api.xx.com/rest/v3/xx_version
host = slrestdata
http_method = GET
index = sldata
index_error_response_codes = 0
response_type = json
sequential_mode = 0
sourcetype = _json
streaming_request = 0
polling_interval = 2592000
To remove data that you no longer need or want, you can use the clean command:
splunk clean eventdata -index <index_name>
From the Splunk documentation:
To delete indexed data permanently from your disk, use the CLI clean command. This command completely deletes the data in one or all indexes, depending on whether you provide an argument. Typically, you run clean before re-indexing all your data.
The caveat with this method is that you have to stop Splunk before executing clean. If you wanted to automate the process, you could write a script to stop Splunk, run clean with your parameters, then restart Splunk.
Assuming that every REST API call brings new information, you could code a new response handler, attached in splunkweb/etc/app/rest_ta/bin/responsehandlers.py, that adds a new field to your JSON data (a report ID such as reportTime=dd/mm/yyyy h:m:s). When coding your dashboard you would then have a field with which you can dynamically pick the latest report, while also keeping a history of your data pulls for more business information.
import datetime
import json

# print_xml_stream is provided by the rest_ta app's responsehandlers module

def datetoday():
    today = datetime.date.today()
    return today.strftime('%Y/%m/%d')

class RestGetCustomField:
    def __init__(self, **args):
        pass

    def __call__(self, response_object, raw_response_output, response_type, req_args, endpoint):
        if response_type == "json":
            output = json.loads(raw_response_output)
            for flight in output["Data"]:
                # tag each record with the date it was fetched
                flight.update({"My New Field": datetoday()})
                print_xml_stream(json.dumps(flight))
        else:
            print_xml_stream(raw_response_output)
And then you could configure:
response_handler = RestGetCustomField
And that's it: the indexed data now have a new field that you can use to identify and/or filter reports.

Orientdb Transactions Best Practices

I'm working on a REST API. I'm having all sorts of problems with transactions in OrientDB. In the current setup, we have a singleton that wraps around the ODatabaseDocumentPool. We retrieve all instances through this setup. Each API call starts by acquiring an instance from the pool and creating a new instance of OrientGraph using the ODatabaseDocumentTx instance. The code that follows uses methods from both ODatabaseDocumentTx and OrientGraph. At the end of the code, we call graph.commit() on write operations and graph.shutdown() on all operations.
I have a list of questions.
To verify: can I still use the ODatabaseDocumentTx instance I used to create the OrientGraph, or should I use OrientGraph.getRawGraph()?
What is the best way to do read operations when using OrientGraph? Even during read operations I get OConcurrentModificationExceptions, lock exceptions, or errors retrieving records. Is this because the OrientGraph is transactional and versions are modified even when retrieving records? I should mention that I also use the index manager and iterate through the edges of a vertex in these read operations.
When I get a record through the Index Manager, does this update the version on the database?
Does graph.shutdown() release the ODatabaseDocumentTx instance back to the pool?
Does v1.78 still require us to lock records in transactions?
If I set autoStartTx to false on OrientGraph, do I have to start transactions manually, or do they start automatically when accessing the database?
Sample Code:
ODatabaseDocumentTx db = pool.acquire();
// READ
OrientGraph graph = new OrientGraph(db);
ODocument doc = (ODocument) oidentifiable.getRecord(); // I use the Java API to get a record from the index
if (((String) doc.field("field")).equals("name"))
//code
OrientVertex v = graph.getVertex(doc);
for(OrientVertex vv : v.getVertices()) {
//code
}
// OR WRITE
doc.field("d", val);
doc = doc.save();
OrientVertex v = graph.getVertex(doc);
graph.addEdge(null, v, otherVertex, "edgeLabel");
graph.addEdge(null, v, anotherVertex, "edgeLabel"); // do I have to reload the record in v?
// End Transaction
// if write
graph.commit();
// then
graph.shutdown();

Cassandra Thrift adding dynamic row with DateType Comparator

I have the following column family:
CREATE COLUMN FAMILY messages with comparator=DateType and key_validation_class=UTF8Type and default_validation_class=UTF8Type;
Now I'm using Cassandra Thrift to store new data:
TTransport tr = this.dataSource.getConnection();
TProtocol proto = new TBinaryProtocol(tr);
Cassandra.Client client = new Cassandra.Client(proto);
long timestamp = System.currentTimeMillis();
client.set_keyspace("myTestSPace");
ColumnParent parent = new ColumnParent("messages");
Column messageColumn = new Column();
String time = String.valueOf(timestamp);
messageColumn.setName(time.getBytes());
messageColumn.setValue(toByteBuffer(msg.getText()));
messageColumn.setTimestamp(timestamp);
client.insert(toByteBuffer(msg.getAuthor()), parent, messageColumn, ConsistencyLevel.ONE);
but I'm getting exception:
InvalidRequestException(why:Expected 8 or 0 byte long for date (16))
at org.apache.cassandra.thrift.Cassandra$insert_result.read(Cassandra.java:15198)
at org.apache.cassandra.thrift.Cassandra$Client.recv_insert(Cassandra.java:858)
at org.apache.cassandra.thrift.Cassandra$Client.insert(Cassandra.java:830)
at com.vanilla.cassandra.DaoService.addMessage(DaoService.java:57)
How do I do this correctly?
It appears that you're using the raw Thrift interface. For reasons like the one you've encountered and many, many more, I strongly suggest that you use an existing high-level client like Hector, Astyanax, or a CQL client.
The root cause of your issue is that you have to pack the different datatypes into the binary format the comparator expects; the higher-level clients manage this for you automatically.
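For illustration only, a minimal sketch of what that packing looks like for the question's snippet: a DateType column name must be an 8-byte long (milliseconds since the epoch), not the UTF-8 bytes of a stringified timestamp. This reuses msg, parent, client, and the toByteBuffer helper from the question.
long timestamp = System.currentTimeMillis();
Column messageColumn = new Column();
// pack the column name as an 8-byte long, which is what the DateType comparator expects
messageColumn.setName(ByteBuffer.allocate(8).putLong(timestamp).array());
messageColumn.setValue(toByteBuffer(msg.getText()));
messageColumn.setTimestamp(timestamp);
client.insert(toByteBuffer(msg.getAuthor()), parent, messageColumn, ConsistencyLevel.ONE);
A higher-level client hides exactly this kind of serialization detail, which is why one is recommended above.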