Creating Transactions With OrientDB-Gremlin

I am using this plugin so that I can interface with OrientDB using TinkerPop 3.x.
I am wondering how I can create different transactions.
With TitanDB it's as simple as:
t1 = graph.newTransaction();
t2 = graph.newTransaction();
t3 = graph.newTransaction();
I tried the following with OrientDB-Gremlin:
t1 = graph.tx().createThreadedTx();
t2 = graph.tx().createThreadedTx();
and received the following error:
java.lang.UnsupportedOperationException: Graph does not support threaded transactions
Does this mean the only way to get different transactions is to open them within the scope of a different thread?

It doesn't look as though the OrientDB implementation (I suppose you are using this one) supports threaded transactions (i.e. those created in Titan with newTransaction() or under the TinkerPop model of graph.tx().createThreadedTx()). You only need threaded transactions if you intend to have more than one thread operating on the same transaction.
If you do not need that (in most standard use cases you don't), then transactions are simply automatic and bound to the current thread. In other words, as soon as you call a method that reads or writes to the graph, the transaction on that thread is "opened" and as soon as you call graph.tx().commit() or graph.tx().rollback() the transaction on that thread is closed.
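For example, here is a minimal sketch of that default thread-bound behavior (assuming a TinkerPop Graph instance named graph):
// the first read/write implicitly opens a transaction on this thread
Vertex v = graph.addVertex("person");
v.property("name", "alice");
// committing (or rolling back) closes the thread-bound transaction
graph.tx().commit();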
Does this mean the only way to get different transactions is to open them within the scope of a different thread?
Yes - if you wanted two separate transactions open at the same time, I guess you would have to start them in two separate threads.
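A hedged sketch of that workaround, where each thread gets its own implicit transaction:
Runnable work = () -> {
    // each thread gets its own automatic, thread-bound transaction
    graph.addVertex("person");
    graph.tx().commit();
};
new Thread(work).start(); // transaction 1
new Thread(work).start(); // transaction 2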

Related

What does the file Snapshot.scala do in Databricks?

I am running some streaming query jobs on a Databricks cluster, and when I look at the cluster/job logs, I see a lot of
first at Snapshot.scala:1
and
withNewExecutionId at TransactionalWriteEdge.scala:130
A quick search yielded this Scala source file: https://github.com/delta-io/delta/blob/master/src/main/scala/org/apache/spark/sql/delta/Snapshot.scala
Can anyone explain what this does in layman's terms?
Internally, this class manages the replay of actions stored in checkpoint or delta files
Generally, this "snapshotting" relies on delta encoding, and it indirectly enables snapshot isolation as well.
Practically, delta encoding remembers every side-effectful operation (INSERT, DELETE, UPDATE) that you did since the last checkpoint. In the case of Delta Lake these are SingleAction records (source): AddFile (insert) and RemoveFile (delete). Conceptually this approach is close to event sourcing - without it you'd have to literally store/broadcast the whole state (database or directory) on every update. It is also employed by many classic ACID databases with replication.
Overall it gives you:
ability to continuously replicate file-system/directory/database state (see SnapshotManagement.update). Basically, that's why you see a lot of first at Snapshot.scala:1 - it's called in order to catch up with the log every time you start a transaction; see DeltaLog.startTransaction. I couldn't find the TransactionalWriteEdge sources, but I guess it's called around the same time.
ability to restore state by replaying every action since the last snapshot.
ability to isolate (and store) transactions by keeping their snapshots apart until commit (every SingleAction has a txn field for isolation). Delta Lake uses optimistic locking for that: transaction commits will fail if their logs are not mergeable, while readers don't see uncommitted actions.
P.S. You can see that the log is accessed in the line val deltaData = load(files), and actions are stacked on top of previousSnapshot (val checkpointData = previousSnapshot.getOrElse(emptyActions); val allActions = checkpointData.union(deltaData)).
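To make the event-sourcing analogy concrete, here is a hypothetical Java sketch of delta-style state reconstruction; the Action type and replay logic are illustrative only, not Delta Lake's actual API:
import java.util.*;

// a stand-in for Delta Lake's SingleAction: either AddFile or RemoveFile
record Action(String type, String file) {}

class SnapshotReplay {
    // rebuild the current state by applying logged actions on top of the last checkpoint
    static Set<String> replay(Set<String> checkpoint, List<Action> deltas) {
        Set<String> state = new HashSet<>(checkpoint);
        for (Action a : deltas) {
            switch (a.type()) {
                case "AddFile" -> state.add(a.file());    // insert
                case "RemoveFile" -> state.remove(a.file()); // delete
            }
        }
        return state;
    }
}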

Handling multiple updates to a single db field

To give a bit of background to my issue, I've got a very basic banking system. The process at the moment goes:
A transaction is added to an Azure Service Bus
An Azure WebJob picks up this message and creates the new row in the SQL DB.
The balance (total) of the account needs to be updated with the value in the message (be it + or -).
So, for example, if the field is 10 and I get two updates (+10, -5), the field needs to be 15 (10 + 10 - 5); it isn't a case of just overwriting the value, it needs to do some arithmetic.
Now I'm not too sure how to handle the update of the balance, as there could be many requests coming in, so it needs to be updated accordingly.
I figured one way is to do the update on the SQL side rather than the web job, but that doesn't help with concurrent updates.
Can I do some locking with the field? But what happens to an update when it is blocked because an update is already in progress? Does it wait or fail? If it waits then this should be OK. I'm using EF.
I figured another way round this is to have another WebJob that will run on a schedule and will add up all the amounts and update the value once, and so this will be the only thing touching that field.
Thanks
One way or another, you will need to serialize write access to the account balance field (actually to the whole row).
Having a separate job that picks up "pending" inserts and eventually updates the balance will be OK if writes are more frequent on your system than reads, or if you don't always have to return the most recent balance. Otherwise, to get the current balance you will need to do something like:
SELECT balance +
       ISNULL((SELECT SUM(transaction_amount)
               FROM pending_insert pi
               WHERE pi.user_id = ac.user_id), 0) AS actual_balance
FROM account ac
WHERE ac.user_id = :user_id
That is definitely more expensive from a performance perspective, but for some systems it's perfectly fine. Another pitfall (again, it may or may not be relevant to your case) is enforcing constraints such as a non-negative balance.
Alternatively, you can consistently handle banking transactions in the following way:
begin database transaction
find and lock row in account table
validate total amount if needed
insert record into banking_transaction
update the user account, i.e. balance = balance + transaction_amount
commit / rollback
If multiple user accounts are involved, you have to always lock them in the same order to avoid deadlocks.
That approach is more robust, but potentially worse from concurrency point of view (again, it depends on the nature of updates in your application - here the worst case is many concurrent banking transactions for one user, updates to multiple users will go fine).
Finally, it's worth mentioning that since you are working with SQL Server, beware of deadlocks due to lock escalation. You may need to implement some retry logic in any case.
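A minimal JDBC sketch of that lock-then-update flow, assuming SQL Server and hypothetical account and banking_transaction tables (the WITH (UPDLOCK, ROWLOCK) hint serializes writers on the account row):
import java.math.BigDecimal;
import java.sql.*;

class BalanceUpdater {
    static void applyTransaction(Connection con, long userId, BigDecimal amount) throws SQLException {
        con.setAutoCommit(false); // begin database transaction
        try {
            // find and lock the account row
            BigDecimal balance;
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT balance FROM account WITH (UPDLOCK, ROWLOCK) WHERE user_id = ?")) {
                ps.setLong(1, userId);
                try (ResultSet rs = ps.executeQuery()) {
                    if (!rs.next()) throw new SQLException("unknown account");
                    balance = rs.getBigDecimal(1);
                }
            }
            // validate the total amount if needed
            if (balance.add(amount).signum() < 0) throw new SQLException("insufficient funds");
            // insert the banking_transaction record
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO banking_transaction (user_id, amount) VALUES (?, ?)")) {
                ps.setLong(1, userId);
                ps.setBigDecimal(2, amount);
                ps.executeUpdate();
            }
            // update the running balance
            try (PreparedStatement ps = con.prepareStatement(
                    "UPDATE account SET balance = balance + ? WHERE user_id = ?")) {
                ps.setBigDecimal(1, amount);
                ps.setLong(2, userId);
                ps.executeUpdate();
            }
            con.commit();
        } catch (SQLException e) {
            con.rollback();
            throw e;
        }
    }
}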
You would want to use a parameter substitution method in your SQL. You would need to find out how to do that based on the programming language you are using in your web job.
$updateval = -5;
Update dbtable set myvalue = myvalue + $updateval
code example:
int qn = int.Parse(TextBox3.Text);
SqlCommand cmd1 = new SqlCommand("update product set group1 = group1 + @qn where productname = @productname", con);
cmd1.Parameters.Add(new SqlParameter("@productname", TextBox1.Text));
cmd1.Parameters.Add(new SqlParameter("@qn", qn));
cmd1.ExecuteNonQuery();

Registering triggers for missing expected events using Esper in real-time

My use-case is to identify entities from which expected events have not been received after X amount of time in real-time rather than using batch jobs. For Example:
If we have received a PaymentInitiated event at time T but didn't receive any of PaymentFailed / PaymentAborted / PaymentSucceeded by T+X, then raise a trigger saying PaymentStuck along with the details of the PaymentInitiated event.
1. Can I capture such triggers using Esper?
In my actual use-case X is not constant and varies per record; I would know its value before the first event has occurred.
2. Can Esper support registering such dynamic queries where X is not constant?
Thanks,
Harish
You could use a pattern such as "pattern [every pi=PaymentInitiated -> timer:interval(pi.amountOfTimeInSeconds) and not (PaymentFailed(id=pi.id) or PaymentAborted(id=pi.id) or PaymentSucceeded(id=pi.id))]"
An outer join is also handy for detecting absences. The solution patterns page on the Esper web site has more examples.
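For completeness, a hedged sketch of compiling and deploying that pattern with Esper 8's compile-and-deploy API; the PaymentInitiated, PaymentFailed, PaymentAborted and PaymentSucceeded event classes are assumed to be POJOs with id and amountOfTimeInSeconds JavaBean properties:
import com.espertech.esper.common.client.EPCompiled;
import com.espertech.esper.common.client.configuration.Configuration;
import com.espertech.esper.compiler.client.CompilerArguments;
import com.espertech.esper.compiler.client.EPCompilerProvider;
import com.espertech.esper.runtime.client.EPDeployment;
import com.espertech.esper.runtime.client.EPRuntime;
import com.espertech.esper.runtime.client.EPRuntimeProvider;

class PaymentStuckDetector {
    static void deploy() throws Exception {
        Configuration config = new Configuration();
        // assumed POJO event classes; register them with the engine
        config.getCommon().addEventType(PaymentInitiated.class);
        config.getCommon().addEventType(PaymentFailed.class);
        config.getCommon().addEventType(PaymentAborted.class);
        config.getCommon().addEventType(PaymentSucceeded.class);

        // the per-record timeout comes from the PaymentInitiated event itself
        String epl = "@name('stuck') select pi from pattern ["
                + "every pi=PaymentInitiated"
                + " -> timer:interval(pi.amountOfTimeInSeconds)"
                + " and not (PaymentFailed(id=pi.id)"
                + " or PaymentAborted(id=pi.id)"
                + " or PaymentSucceeded(id=pi.id))]";

        EPCompiled compiled = EPCompilerProvider.getCompiler()
                .compile(epl, new CompilerArguments(config));
        EPRuntime runtime = EPRuntimeProvider.getDefaultRuntime(config);
        EPDeployment deployment = runtime.getDeploymentService().deploy(compiled);
        runtime.getDeploymentService()
                .getStatement(deployment.getDeploymentId(), "stuck")
                .addListener((newData, oldData, stmt, rt) ->
                        System.out.println("PaymentStuck: " + newData[0].get("pi")));
    }
}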

Cannot find a record just created in a different thread with JPA

I am using the Play! framework and am having difficulty with the following scenario.
I have a server process which runs a 'read-only' transaction. This is to prevent any possible database locks during execution, as it is a complicated procedure. There are one or two records to be stored, but I do that as a job, as I found that doing them in the main thread could result in a deadlock under higher load.
However, on one occasion I need to create an object and subsequently use it.
The problem is that when I create the object using a Job, wait for the resulting id (with a Promise return) and then search the database for it, it cannot be found.
Is there an easy way to have JPA search 'afresh' in the DB at this point? I implemented a 5 sec. pause to test, so I am sure it is not because the procedure hadn't finished yet.
Check if there is a transaction wrapped around your INSERT, and if there is one, check that the transaction is committed. Also bear in mind that a long-lived 'read-only' transaction may be reading from a snapshot taken before the insert, so it won't see the new row until it starts a new transaction or clears its persistence context.
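A hedged sketch of forcing a fresh read on the consuming thread, assuming a plain JPA EntityManager with resource-local transactions (Play 1.x exposes one as JPA.em()); the entity type and id stand in for whatever your job returned:
import javax.persistence.EntityManager;

class FreshRead {
    static <T> T refetch(EntityManager em, Class<T> type, Object id) {
        em.getTransaction().commit(); // end the long-lived read-only snapshot
        em.getTransaction().begin();  // a new transaction can see rows committed meanwhile
        em.clear();                   // evict stale entities from the persistence context
        return em.find(type, id);
    }
}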

Preventing update loops for multiple databases using CDC

We have a number of legacy systems that we're unable to make changes to - however, we want to start taking data changes from these systems and applying them automatically to other systems.
We're thinking of some form of service bus (no specific tech picked yet) sitting in the middle, and a set of bus adapters (one per legacy application) to translate between database specific concepts and general update messages.
One area I've been looking at is using Change Data Capture (CDC) to monitor update activity in the legacy databases, and use that information to construct appropriate messages. However, I have a concern - how best could I, as a consumer of CDC information, distinguish changes applied by the application vs changes applied by the bus adapter on receipt of messages - because otherwise, the first update that gets distributed by the bus will get re-distributed by every receiver when they apply that change to their own system.
If I was implementing "poor man's" CDC - i.e. triggers - then those triggers execute within the context/transaction/connection of the original DML statements, so I could either design them to ignore one particular user (the user applying incoming updates from the bus), or set and detect a session property to similarly ignore certain updates.
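For the session-property variant, SQL Server's sp_set_session_context (2016+) can mark the bus adapter's connection; a hedged JDBC sketch (the key name from_bus is made up for illustration):
import java.sql.*;

class BusAdapterConnection {
    // mark this connection so triggers can skip changes applied from the bus;
    // inside a trigger, SESSION_CONTEXT(N'from_bus') would return 1 for it
    static void markAsBus(Connection con) throws SQLException {
        try (CallableStatement cs = con.prepareCall("{call sp_set_session_context(?, ?)}")) {
            cs.setString(1, "from_bus");
            cs.setInt(2, 1);
            cs.execute();
        }
    }
}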
Any ideas?
If I understand your question correctly, you're trying to define a message routing structure that works with a design you've already selected (an enterprise service bus) and a message implementation you can use to flow data off your legacy systems, forward-porting changes to your newer systems only.
The difficulty is that you're trying to apply changes in such a way that they don't themselves generate a CDC message from the clients receiving the data bundle from your legacy systems. In other words, you want your newer systems to consume the data without propagating messages back onto your bus, creating unnecessary crosstalk that might grow exponentially and overload your infrastructure.
The secret is in how MSSQL's CDC features reconcile changes as they propagate through the network. Specifically, note this caveat:
All the changes are logged in terms of LSN, or Log Sequence Number. SQL Server distinctly identifies each DML operation via a Log Sequence Number. Any committed modifications on any tables are recorded in the transaction log of the database with a specific LSN provided by SQL Server. The __$operation column values are: 1 = delete, 2 = insert, 3 = update (values before update), 4 = update (values after update).
cdc.fn_cdc_get_net_changes_dbo_Employee gives us all the records net changed falling between the LSNs we provide to the function. We have three records returned by the net_change function; there was a delete, an insert, and two updates, but on the same record. In the case of the updated record, it simply shows the net changed value after both updates are complete.
For getting all the changes, execute cdc.fn_cdc_get_all_changes_dbo_Employee; there are options to pass either 'ALL' or 'ALL UPDATE OLD'. The 'ALL' option provides all the changes, but for updates, it provides the after-update values. Hence we find two records for updates. We have one record showing the first update, when Jason was updated to Nichole, and one record when Nichole was updated to EMMA.
While this documentation is somewhat terse and difficult to understand, it appears that changes are logged and reconciled in LSN order. Competing changes should be discarded by this system, allowing your consistency model to work effectively.
Note also:
CDC is by default disabled and must be enabled at the database level
followed by enabling on the table.
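For reference, a hedged sketch of turning CDC on from JDBC; sys.sp_cdc_enable_db and sys.sp_cdc_enable_table are SQL Server's documented procedures, and dbo.Employee is just the example table from the quote above:
import java.sql.*;

class EnableCdc {
    static void enable(Connection con) throws SQLException {
        try (Statement st = con.createStatement()) {
            st.execute("EXEC sys.sp_cdc_enable_db"); // once per database
            st.execute("EXEC sys.sp_cdc_enable_table "
                    + "@source_schema = N'dbo', @source_name = N'Employee', @role_name = NULL");
        }
    }
}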
Option B then becomes obvious: institute CDC on your legacy systems, then use your service bus to translate these changes into updates that aren't bound to CDC (using, for example, raw transactional update statements). This should allow for the one-way flow of data that you seek from the design of your system.
For additional methods of reconciling changes, consider the concepts raised by this Wikipedia article on "eventual consistency". Best of luck with your internal database messaging system.