Titan adding vertex using multiple threads

I am using Titan 1 with TinkerPop 3 and Gremlin.
For small jobs I use threads that basically do:
myNode = g.V().has(something)
//some tests
newNode = graph.addVertex(someProperties)
g.V(myNode).addEdge(newNode)
During the creation of the edge I get this exception:
java.lang.IllegalStateException: The vertex or type is not associated with this transaction [v[41025720]]
I understand that my newNode is (kind of) not part of my thread's transaction.
How can I refresh the transaction scope, or add my newNode to the current transaction?
Thanks

First off, I would recommend reading Chapter 9 of the Titan documentation, which deals with transactions in greater detail.
For your specific problem, all you need to do is create a single transaction and have all threads work within it. Taking from the docs directly, what you need is:
TitanGraph g = TitanFactory.open(CONFIG);
TitanTransaction tx = g.newTransaction();
Thread[] threads = new Thread[10];
for (int i = 0; i < threads.length; i++) {
    threads[i] = new Thread(new DoSomething(tx));
    threads[i].start();
}
for (int i = 0; i < threads.length; i++) threads[i].join();
tx.commit();
This will get all the threads to work on the same transaction and have access to the same nodes and edges.
Without this, Titan automatically creates a new transaction for each thread that accesses the graph, which means each thread works with its own new nodes, edges, etc.
Example DoSomething:
class DoSomething implements Runnable {
    private final TitanTransaction tx;
    DoSomething(TitanTransaction tx) { this.tx = tx; }
    public void run() { tx.addVertex(); }
}
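Applied to the question's scenario, a worker like this could be handed the same tx instead of DoSomething: it creates its vertices and the edge through the shared transaction, so every element belongs to one transaction and the IllegalStateException goes away. A minimal sketch (the "name" property key and "linksTo" edge label are made-up placeholders, not from the question):
import com.thinkaurelius.titan.core.TitanTransaction;
import org.apache.tinkerpop.gremlin.structure.Vertex;

class AddEdgeWork implements Runnable {
    private final TitanTransaction tx;
    AddEdgeWork(TitanTransaction tx) { this.tx = tx; }
    public void run() {
        Vertex a = tx.addVertex();   // created in the shared transaction
        a.property("name", "a");     // "name" is only an illustrative property key
        Vertex b = tx.addVertex();
        b.property("name", "b");
        a.addEdge("linksTo", b);     // both endpoints live in the same tx, so the edge can be created
    }
}
Remember that nothing is persisted until tx.commit() is called after all threads have joined, as in the snippet above.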

Related

Spring Boot controller preventing multiple inserts upon quick successive requests in MongoDB

I have a REST API that calculates something for a request and, if the same request is made again, returns the result from a cache, which consists of documents saved in MongoDB. To know whether two requests are the same, I hash some relevant fields of the request. But when the same request is made in quick succession, duplicate documents appear in MongoDB, which later results in an "IncorrectResultSizeDataAccessException" when I try to read them.
To solve it I tried to synchronize on the hash value in the following controller method (non-relevant parts cut out):
@PostMapping(
    path = "/{myPath}",
    consumes = {MediaType.APPLICATION_JSON_UTF8_VALUE},
    produces = {MediaType.APPLICATION_JSON_UTF8_VALUE})
@Async("asyncExecutor")
public CompletableFuture<ResponseEntity<?>> retrieveAndCache( ... a, b, c, d various request parameters) {
    // perform some validations on the request...
    // hash the relevant request parameters
    int hash = Objects.hash(a, b, c, d);
    synchronized (Integer.toString(hash).intern()) {
        Optional<Result> resultOpt = cacheService.findByHash(hash);
        if (resultOpt.isPresent()) {
            return CompletableFuture.completedFuture(ResponseEntity.status(HttpStatus.OK).body(resultOpt.get().getResult()));
        } else {
            Result result = ... // perform requests to external services and do some calculations...
            cacheService.save(result);
            return CompletableFuture.completedFuture(ResponseEntity.status(HttpStatus.OK).body(result));
        }
    }
}

// cacheService methods
@Transactional
public Optional<Result> findByHash(int hash) {
    return repository.findByHash(hash); // this is the part that throws the error
}
I am sure that no hash collision is occurring; it's just that when the same request is performed in quick succession, duplicate records appear. To my understanding, this shouldn't happen as long as I have only one running instance of my Spring Boot application. Do you see any reason other than multiple instances running in production?
You should check the settings of your MongoDB client.
If one thread calls the cacheService.save(result) method and releases the lock after that method returns, and another thread then calls cacheService.findByHash(hash), it is still possible that it will not find the record you just saved.
It's possible that e.g. the save method returns as soon as the saved object is in the transaction log, but not fully processed yet. Or the save is processed on the primary node, but the findByHash is executed on the secondary node, where it's not replicated yet.
You could use WriteConcern.MAJORITY, but I'm not 100% sure if it covers everything.
Even better is to let MongoDB do the locking by using findAndModify with FindAndModifyOptions.upsert(true), and to forget about the lock in your Java code.
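For illustration, here is roughly what that could look like with Spring Data's MongoTemplate (a sketch only: the injected mongoTemplate field, the "hash"/"result" field names and the Result document class are assumptions based on the question, not code from the answer):
import org.springframework.data.mongodb.core.FindAndModifyOptions;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.data.mongodb.core.query.Update;

// Inside cacheService; mongoTemplate is an injected MongoTemplate (assumption).
public Result saveOrGetExisting(int hash, Result computed) {
    Query byHash = Query.query(Criteria.where("hash").is(hash));
    Update onlyIfAbsent = new Update()
            .setOnInsert("hash", hash)
            .setOnInsert("result", computed.getResult()); // field names are assumptions
    // findAndModify with upsert is atomic on the server: either the existing document
    // is returned, or exactly one new document is inserted - never a duplicate.
    return mongoTemplate.findAndModify(
            byHash, onlyIfAbsent,
            FindAndModifyOptions.options().upsert(true).returnNew(true),
            Result.class);
}
With that, the duplicate can no longer be created, so the synchronized block (which also stops working as soon as a second application instance exists) can be dropped.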

Initialize an entity on startup on Akka Sharding

How can I pre-start an entity at cluster startup? I have found a way to do it, but I don't think it is the right one. It consists of sending a StartEntity(entityId) message to the shard region on every node. Suppose I have 1000 entities to initialize: it seems very inefficient (an explosion of messages in the cluster, since every node tries to initialize the remote entity)!
val shardRegion: ActorRef[ShardingEnvelope[Command]] =
  sharding.init(Entity(HelloServiceEntity)(createBehavior = ctx => HelloWorldService()))
Seq("S0", "S1").foreach { id =>
  shardRegion ! StartEntity(id)
}
Is there any efficient way to achieve what I want? I could not find an official post or documentation about it. Am I doing this wrong?
I had an idea! I could use a Cluster Singleton whose job is to initialize the entities. That's the most efficient way I came up with without getting into the internals and having to propose a pull request.
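A rough sketch of that idea with the typed Cluster Singleton API (HelloServiceEntity and HelloWorldService come from the question; everything else, including the warm-up object and the assumption that system is the node's ActorSystem, is illustrative):
import akka.actor.typed.{ActorSystem, Behavior}
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, Entity, StartEntity}
import akka.cluster.typed.{ClusterSingleton, SingletonActor}

object EntityWarmup {
  sealed trait Command // the singleton never receives messages; it only warms up entities

  def apply(): Behavior[Command] = Behaviors.setup[Command] { ctx =>
    val sharding = ClusterSharding(ctx.system)
    val region = sharding.init(Entity(HelloServiceEntity)(createBehavior = _ => HelloWorldService()))
    Seq("S0", "S1").foreach(id => region ! StartEntity(id))
    Behaviors.empty
  }
}

// Called on every node at startup; the singleton behavior itself runs on only one node,
// so each entity receives a single StartEntity message instead of one per node.
ClusterSingleton(system).init(SingletonActor(EntityWarmup(), "entity-warmup"))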

EF database concurrency

I have two apps: one is an ASP.NET app and the other is a Windows service running in the background.
The Windows service performs some tasks (read and update) on the database while users can perform other operations on the database through the ASP.NET app. So I am worried about the following case: in the Windows service I collect some records that satisfy a condition and then iterate over them, something like:
IQueryable<EntityA> collection = context.EntitiesA.Where(<condition>);
foreach (EntityA entity in collection)
{
    // do some stuff
}
So, if a user modifies a record that is used later in the loop iteration, which value for that record does EF take into account? The one originally retrieved when performing:
context.EntitiesA.Where(<condition>)
or the new one modified by the user and now in the database?
As far as I know, during iteration EF fetches each record on demand, I mean one by one, so when reading the next record for the next iteration, does that record correspond to what was collected by:
context.EntitiesA.Where(<condition>)
or to what is in the database (the value the user has just modified)?
Thanks!
There are a couple of processes that come into play here in terms of how this works in EF.
Queries are only executed on enumeration (this is sometimes referred to as query materialisation); at that point the whole query is performed.
Lazy loading only affects navigation properties in your example above; the result set of the Where statement is pulled down in one go.
So what does this mean in your case:
// Nothing happens here; you are just describing what will happen later.
// To make the query execute here, do a .ToList()/.ToArray(); to prevent callers from
// composing more SQL onto the result, expose it as .AsEnumerable().
IQueryable<EntityA> collection = context.EntitiesA.Where(<condition>);
// When execution first hits this foreach, a
// SELECT {cols} FROM [YourTable] WHERE [YourCondition] is performed.
foreach (EntityA entity in collection)
{
    // Data here is from the point in time the foreach started (e.g. if the record was
    // updated in the database during the enumeration, you will have out-of-date data).
    // do some stuff
}
If you're truly concerned that this can happen, then get a list of IDs up front and process them individually with a new DbContext for each (or, say, a new one after each batch of 10). Something like:
IList<int> collection = context.EntitiesA.Where(...).Select(k => k.id).ToList();
foreach (int entityId in collection)
{
    using (Context ctx = new Context())
    {
        EntityA entity = ctx.EntitiesA.Find(entityId);
        // do some stuff
        ctx.SaveChanges();
    }
}
I think the answer to your question is 'it depends'. The problem you are describing is called 'non-repeatable reads' and can be prevented by setting a proper transaction isolation level. But that comes at a cost in performance and potential deadlocks.
For more details you can read this
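For illustration, one way to set such an isolation level with System.Transactions around the loop (a sketch only; RepeatableRead holds locks on every row read until the scope completes, so weigh that against the blocking it causes):
var options = new TransactionOptions
{
    IsolationLevel = System.Transactions.IsolationLevel.RepeatableRead
};
using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
{
    // Rows read inside this scope cannot be modified by other sessions until the scope ends,
    // so every iteration sees the values as they were when the query ran.
    foreach (EntityA entity in context.EntitiesA.Where(<condition>))
    {
        // do some stuff
    }
    scope.Complete();
}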

Grails save not respecting flush option

I'm using Grails as a poor man's ETL tool for migrating some relatively small DB objects from one DB to the next. I have a controller that reads data from one DB (MySQL) and writes it into another (PostgreSQL). They use similar domain objects, but not exactly the same ones, due to limitations in the multi-datasource support in Grails 2.1.x.
Below you'll see my controller and service code:
import grails.converters.JSON

class GeoETLController {
    def zipcodeService

    def migrateZipCode() {
        def zc = zipcodeService.readMysql();
        zipcodeService.writePgSql(zc);
        render(["success": true] as JSON)
    }
}
And the service:
class ZipcodeService {
    def sessionFactory
    def propertyInstanceMap = org.codehaus.groovy.grails.plugins.DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP

    def readMysql() {
        def zipcode_mysql = Zipcode.list();
        println("read, " + zipcode_mysql.size());
        return zipcode_mysql;
    }

    def writePgSql(zipcodes) {
        List<PGZipcode> zips = new ArrayList<PGZipcode>();
        println("attempting to save, " + zipcodes.size());
        def cntr = 0;
        zipcodes.each({ Zipcode zipcode ->
            cntr++;
            def props = zipcode.properties;
            PGZipcode zipcode_pg = new PGZipcode(props);
            if (!zipcode_pg.save(flush: false)) {
                zipcode_pg.errors.each {
                    println it
                }
            }
            zips.add(zipcode_pg)
            if (zips.size() % 100 == 0) {
                println("gorm begin " + new Date());
                // clear session here.
                this.cleanUpGorm();
                println("gorm complete " + new Date());
            }
        });
        // Save remaining
        this.cleanUpGorm();
        println("Final. " + new Date());
    }

    def cleanUpGorm() {
        def session = sessionFactory.currentSession
        session.flush()
        session.clear()
        propertyInstanceMap.get().clear()
    }
}
Much of this is taken from my own code and then tweaked to try and get similar performance as seen in http://naleid.com/blog/2009/10/01/batch-import-performance-with-grails-and-mysql/
So, in reviewing my code, whenever zipcode_pg.save() is invoked, an insert statement is created and sent to the database. Good for DB consistency, bad for bulk operations.
What is the cause of my instant flushes (note: my datasource and config Groovy files have NO relevant changes)? At this rate, it takes about 7 seconds to process each batch of 100 (14 inserts per second), which, when you are dealing with tens of thousands of rows, is just a long time...
Appreciate the suggestions.
NOTE: I considered using a pure ETL tool, but with so much domain and service logic already built, I figured using Grails would be a good reuse of resources. However, I didn't imagine bulk operations performing this poorly.
Without seeing your domain objects, this is just a hunch, but I might try specifying validate: false as well in your save() call. validate() is called by save() unless you tell Grails not to do that. For example, if you have a unique constraint on any field in your PGZipcode domain object, Hibernate has to hit the database on every new record to leverage the DBMS's unique function and perform a proper validation. Other constraints might require DBMS queries as well, but only unique jumps to mind right now.
From Grails Persistence: Transaction Write-Behind:
Hibernate caches database updates where possible, only actually pushing the changes when it knows that a flush is required, or when a flush is triggered programmatically. One common case where Hibernate will flush cached updates is when performing queries since the cached information might be included in the query results. But as long as you're doing non-conflicting saves, updates, and deletes, they'll be batched until the session is flushed.
Alternatively, you might try setting the Hibernate session's flush mode explicitly:
sessionFactory.currentSession.setFlushMode(FlushMode.MANUAL);
I'm under the impression the default flush mode might be AUTO.
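For illustration, combining the two suggestions inside writePgSql might look roughly like this (a sketch only; whether validate: false is acceptable depends on which constraints you actually rely on):
import org.hibernate.FlushMode

def writePgSql(zipcodes) {
    // Keep Hibernate from flushing pending inserts before every query it issues.
    sessionFactory.currentSession.setFlushMode(FlushMode.MANUAL)
    zipcodes.each { Zipcode zipcode ->
        PGZipcode zipcode_pg = new PGZipcode(zipcode.properties)
        // validate: false skips constraint checks (e.g. unique) that would otherwise
        // run a query, and therefore force a flush, for every single row.
        zipcode_pg.save(flush: false, validate: false)
    }
    cleanUpGorm() // flush and clear explicitly when you choose to (e.g. every 100 rows, as in the original code)
}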

What things should I consider when using System.Transactions in my EF project?

Background
I have both an MVC app and a Windows service that access the same data access library, which utilizes Entity Framework. The Windows service monitors certain activity on several tables and performs some calculations.
We are using the DAL project against several hundred databases, generating the connection string for the context at runtime.
We have a number of functions (both stored procedures and .NET methods that call on EF entities) which, because of the scope of the data we are using, are VERY DB-intensive and have the potential to block one another.
The problem
The Windows service is not so important that it can't wait; if something must be blocked, it can be the Windows service. Earlier I found a number of SO questions stating that System.Transactions is the way to go for setting your transaction isolation level to READ UNCOMMITTED to minimize locks.
I tried this, and I may be misunderstanding what is going on, so I need some clarification.
The method in the Windows service is structured like so:
private bool _stopMe = false;

public void Update()
{
    EntContext context = new EntContext();
    do
    {
        var transactionOptions = new System.Transactions.TransactionOptions();
        transactionOptions.IsolationLevel = System.Transactions.IsolationLevel.ReadUncommitted;
        using (var transactionScope = new System.Transactions.TransactionScope(System.Transactions.TransactionScopeOption.Required, transactionOptions))
        {
            List<Ent1> myEnts = (from e ....Complicated query here).ToList();
            SomeCalculations(myEnts);
            List<Ent2> myOtherEnts = (from e ... Complicated query using entities from previous query here).ToList();
            MakeSomeChanges(myOtherEnts);
            context.SaveChanges();
        }
        Thread.Sleep(5000); // wait 5 seconds before allowing the do block to continue
    } while (!_stopMe);
}
When I execute my second query, an exception gets thrown:
The underlying provider failed on Open.
Network access for Distributed Transaction Manager (MSDTC) has been disabled. Please
enable DTC for network access in the security configuration for MSDTC using the
Component Services Administrative tool.
The transaction manager has disabled its support for remote/network
transactions. (Exception from HRESULT: 0x8004D024)
Am I right to assume that I should not be calling more than one query in that using block? The first query returned just fine. This is being performed on one database at a time (other instances are run in different threads, and nothing from this thread touches the others).
My question is, is this how it should be used or is there more to this that I should know?
Of Note: This is a monitoring function, so it must be run repeatedly.
In your code you are using a TransactionScope. It looks like the first query uses a lightweight DB transaction; when the second query comes, the transaction scope promotes the transaction to a distributed transaction.
The distributed transaction uses MSDTC.
That is where the error comes from: by default MSDTC is not enabled, and even if it is enabled and started, it needs to be configured to allow a remote client to create a distributed transaction.
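For context, EF normally opens and closes the connection around each query, and a second connection enlisting in the same TransactionScope is what can promote it to MSDTC. A workaround that is often suggested for this error (not part of the answer above, so verify it against your provider and SQL Server version) is to open the context's connection yourself so both queries share one open connection:
using (var transactionScope = new System.Transactions.TransactionScope(
    System.Transactions.TransactionScopeOption.Required, transactionOptions))
{
    // Assumes EntContext is a DbContext (EF 4.1+); for an ObjectContext use context.Connection.Open().
    context.Database.Connection.Open();

    List<Ent1> myEnts = (from e ....Complicated query here).ToList();
    SomeCalculations(myEnts);
    List<Ent2> myOtherEnts = (from e ... Complicated query using entities from previous query here).ToList();
    MakeSomeChanges(myOtherEnts);
    context.SaveChanges();

    transactionScope.Complete(); // without Complete() the scope rolls everything back on Dispose
}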