I am running a spark(2.3.4) application to populate an orientdb (3.0.34) database.
Configs:
spark : local[*]
OrientDB
this.context = new OrientDB(remoteConnexionCommand, dbUser, dbPwd, dbConfig)
this.pool = new ODatabasePool(this.context, globalConfig.getString("db.name"), dbUser, dbPwd)
Then at some point I try to do an upsert on some edge and the error popup:
The current database instance (com.orientechnologies.orient.core.db.ODatabaseDocumentRemotePooled#737aa62f) is not active on the current thread (Thread[Executor task launch worker for task 319,5,main]). Current active database is: com.orientechnologies.orient.core.db.ODatabaseDocumentRemotePooled#265d0416
I couldn't put all of the code here for this issue, but one starting point for me to try to solve it, is to understand what would cause this error, and also how to reproduce it, it seems to happen in an unpredicted manner.
Also I noticed that the object pool is created twice in the application, not sure why, but would it be the cause ?
EDIT
Code to create a connection to db is executed inside a rdd.foreach method as described here:
rdd.repartition(20).foreach(row => {
try {
DBManager session = DBManager.getSession() // which init a new Session given an initialized pool
session.command(statement, params)
} catch {
case t: Throwable => ...
}
Therefore the code that initializes the dbConnection is done in the worker nodes.
Related
Maybe the question is too simple, at least look like that, but I have the following problem:
A. Execute spark submit in spark streaming process.
ccc.foreachRDD(rdd => {
rdd.repartition(20).foreachPartition(p => {
val repo = getReposX
t foreach (g => {
.................
B. getReposX is a function which one make a query in mongoDB recovering a Map wit a key/value necessary in every executor of the process.
C. Into each g in foreach I manage this map "cached"
The problem or the question is when in the collection of mongo anything change, I don't watch or I don't detect the change, therefore I am managing a Map not updated. My question is: How can I get it? Yes, I know If I reboot the spark-submit and the driver is executed again is OK but otherwise I will never see the update in my Map.
Any ideas or suggestion?
Regards.
Finally I Developed a solution. First I explain the question more in detail because what I really wanted to know is how to implement an object or "cache", which was refreshed every so often, or by some kind of order, without the need to restart the spark streaming process, that is, it would refresh in alive.
In my case this "cache" or refreshed object is an object (Singleton) that connected to a collection of mongoDB to recover a HashMap that was used by each executor and was cached in memory as a good Singleton. The problem with this was that once the spark streaming submit is executed it was cached that object in memory but it was not refreshed unless the process was restarted. Think of a broadcast as a counter mode to refresh when the variable reaches 1000, but these are read only and can not be modified. Think of an counter, but these can only be read by the driver.
Finally my solution is, within the initialization block of the object that loads the mongo collection and the cache, I implement this:
//Initialization Block
{
val ex = new ScheduledThreadPoolExecutor(1)
val task = new Runnable {
def run() = {
logger.info("Refresh - Inicialization")
initCache
}
}
val f = ex.scheduleAtFixedRate(task, 0, TIME_REFRES, TimeUnit.SECONDS)
}
initCache is nothing more than a function that connects mongo and loads a collection:
var cache = mutable.HashMap[String,Description]()
def initCache():mutable.HashMap[String, Description]={
val serverAddresses = Functions.getMongoServers(SERVER, PORT)
val mongoConnectionFactory = new MongoCollectionFactory(serverAddresses,DATABASE,COLLECTION,USERNAME,PASSWORD)
val collection = mongoConnectionFactory.getMongoCollection()
val docs = collection.find.iterator()
cache.clear()
while (docs.hasNext) {
var doc = docs.next
cache.put(...............
}
cache
}
In this way, once the spark streaming submit has been started, each task will open one more task, which will make every X time (1 or 2 hours in my case) refresh the value of the singleton collection, and always recover that value instantiated:
def getCache():mutable.HashMap[String, Description]={
cache
}
I have an issue with JBoss EAP 7.1.0 GA. On one server (my DEV) this works like a charm while on the other (TEST environment) the Callable executed using executor.submit() does not seem to be started (I do not see that "This is call" message in log), but no exception or any other clue is given.
The question is - where should I look like / how should I debug this issue?
The calling code:
#Resource(name = "DefaultManagedExecutorService")
ManagedExecutorService executor;
try {
DownloadPlayers dp = new DownloadPlayers();
Future<Queue<PlayerForDownload>> f = executor.submit(dp);
Queue<PlayerForDownload> q = f.get();
L.info(q.size());
} catch (Exception e) {
L.error("EXCEPTION" + e.getMessage());
}
The class it calls:
public class DownloadPlayers implements Callable<Queue<PlayerForDownload>> {
// the constructor gets called, I'm sure as it writes to log
// the call is as simple as this
#Override
public Queue<PlayerForDownload> call() {
L.info("This is call()");
try {
return this.getPlayersForDownload();
} catch (WorkerException e) {
L.error(e);
return null;
}
}
}
As stated above, the code itself seems to be OK as it works in one server but does not work on the other. Both are
7.1.0GA standalone.
Any advice how to debug the ManagedExecutorService?
Thanks.
In this particular case the problem was that on the TEST environment it was only allowed to run two threads which we already running (by some completely different part of application I didn't realize). So the problem is solved now by setting the "Core threads" parameter in ManagerExecutorService to higher value, the tasks are running.
However the tricky part was that there was no obvious visible difference between the JBoss servers (I compared the standalone.xml configs...) just because the ManagerExecutorService in JBoss has some default (blank) values that actually depend on the system config (v-CPU cores in my case). So despite the config being the same, the "Core threads" seem to default to 2 on TEST and some higher (unknown to me) value on my DEV.
So never depend on default settings in ManagerExecutorService if comparing two environments.
I have also rewritten the logic, instead of using blocking Future.get() or checking for Future.isDone() in a loop a do Future.get() with timeout and in the exception handler I decide whether to keep waiting or fail.
We have an enterprise DB that is replicated through many sites throughout the world. We would like our app to attempt to connect to one of the local sites, and if that site is down we want it to fall back to the enterprise DB. We'd like this behavior on each of our DB operations.
We are using Entity Framework, C#, and SQL Server.
At first I hoped I could just specify a "Failover Partner" in the connection string, but that only works in a mirrored DB environment, which this is not. I also looked into writing a custom IDbExecutionStrategy. But these strategies only allow you to specify the pattern for retrying a failed DB operation. It does not allow you to change the operation in any way like directing it to a new connection.
So, do you know of any good pattern for dealing with this type of operation, other than duplicating retry logic around each of our many DB operations?
Update on 2014-05-14:
I'll elaborate in response to some of the suggestions already made.
I have many places where the code looks like this:
try
{
using(var db = new MyDBContext(ConnectionString))
{
// Database operations here.
// var myList = db.MyTable.Select(...), etc.
}
}
catch(Exception ex)
{
// Log exception here, perhaps rethrow.
}
It was suggested that I have a routine that first checks each of the connections strings and returns the first one that successfully connects. This is reasonable as far as it goes. But some of the errors I'm seeing are timeouts on the operations, where the connection works but the DB has issues that keep it from completing the operation.
What I'm looking for is a pattern I can use to encapsulate the unit of work and say, "Try this on the first database. If it fails for any reason, rollback and try it on the second DB. If that fails, try it on the third, etc. until the operation succeeds or you have no more DBs." I'm pretty sure I can roll my own (and I'll post the result if I do), but I was hoping there might be a known way to approach this.
How about using some Dependency Injection system like autofac and registering there a factory for new context objects - it will execute logic that will try to connect first to local and in case of failure it will connect to enterprise db. Then it will return ready DbContext object. This factory will be provided to all objects that require it with Dependency Injection system - they will use it to create contexts and dispose of them when they are not needed any more.
" We would like our app to attempt to connect to one of the local sites, and if that site is down we want it to fall back to the enterprise DB. We'd like this behavior on each of our DB operations."
If your app is strictly read-only on the DB and data consistency is not absolutely vital to your app/users, then it's just a matter of trying to CONNECT until an operational site has been found. As M.Ali suggested in his remark.
Otherwise, I suggest you stop thinking along these lines immediately because you're just running 90 mph down a dead end street. As Viktor Zychla suggested in his remark.
Here is what I ended up implementing, in broad brush-strokes:
Define delegates called UnitOfWorkMethod that will execute a single Unit of Work on the Database, in a single transaction. It takes a connection string and one also returns a value:
delegate T UnitOfWorkMethod<out T>(string connectionString);
delegate void UnitOfWorkMethod(string connectionString);
Define a method called ExecuteUOW, that will take a unit of work and method try to execute it using the preferred connection string. If it fails, it tries to execute it with the next connection string:
protected T ExecuteUOW<T>(UnitOfWorkMethod<T> method)
{
// GET THE LIST OF CONNECTION STRINGS
IEnumerable<string> connectionStringList = ConnectionStringProvider.GetConnectionStringList();
// WHILE THERE ARE STILL DATABASES TO TRY, AND WE HAVEN'T DEFINITIVELY SUCCEDED OR FAILED
var uowState = UOWStateEnum.InProcess;
IEnumerator<string> stringIterator = connectionStringList.GetEnumerator();
T returnVal = default(T);
Exception lastException = null;
string connectionString = null;
while ((uowState == UOWStateEnum.InProcess) && stringIterator.MoveNext())
{
try
{
// TRY TO EXECUTE THE UNIT OF WORK AGAINST THE DB.
connectionString = stringIterator.Current;
returnVal = method(connectionString);
uowState = UOWStateEnum.Success;
}
catch (Exception ex)
{
lastException = ex;
// IF IT FAILED BECAUSE OF A TRANSIENT EXCEPTION,
if (TransientChecker.IsTransient(ex))
{
// LOG THE EXCEPTION AND TRY AGAINST ANOTHER DB.
Log.TransientDBException(ex, connectionString);
}
// ELSE
else
{
// CONSIDER THE UOW FAILED.
uowState = UOWStateEnum.Failed;
}
}
}
// LOG THE FAILURE IF WE HAVE NOT SUCCEEDED.
if (uowState != UOWStateEnum.Success)
{
Log.ExceptionDuringDataAccess(lastException);
returnVal = default(T);
}
return returnVal;
}
Finally, for each operation we define our unit of work delegate method. Here an example
UnitOfWorkMethod uowMethod =
(providerConnectionString =>
{
using (var db = new MyContext(providerConnectionString ))
{
// Do my DB commands here. They will roll back if exception thrown.
}
});
ExecuteUOW(uowMethod);
When ExecuteUOW is called, it tries the delegate on each database until it either succeeds or fails on all of them.
I'm going to accept this answer since it fully addresses all of concerns raised in the original question. However, if anyone provides and answer that is more elegant, understandable, or corrects flaws in this one I'll happily accept it instead.
Thanks to all who have responded.
I have a windows service, running workflows. The workflows are XAMLs loaded from database (users can define their own workflows using a rehosted designer). It is configured with one instance of the SQLWorkflowInstanceStore, to persist workflows when becoming idle. (It's basically derived from the example code in \ControllingWorkflowApplications from Microsoft's WCF/WF samples).
But sometimes I get an error like below:
System.Runtime.DurableInstancing.InstanceOwnerException: The execution of an InstancePersistenceCommand was interrupted because the instance owner registration for owner ID 'a426269a-be53-44e1-8580-4d0c396842e8' has become invalid. This error indicates that the in-memory copy of all instances locked by this owner have become stale and should be discarded, along with the InstanceHandles. Typically, this error is best handled by restarting the host.
I've been trying to find the cause, but it is hard to reproduce in development, on production servers however, I get it once in a while. One hint I found : when I look at the LockOwnersTable, I find the LockOnwersTable lockexpiration is set to 01/01/2000 0:0:0 and it's not getting updated anymore, while under normal circumstances the should be updated every x seconds according to the Host Lock Renewal period...
So , why whould SQLWorkflowInstanceStore stop renewing this LockExpiration and how can I detect the cause of it?
This happens because there are procedures running in the background and trying to extend the lock of the instance store every 30 seconds, and it seems that once the connection fail connecting to the SQL service it will mark this instance store as invalid.
you can see the same behaviour if you delete the instance store record from [LockOwnersTable] table.
The proposed solution is when this exception fires, you need to free the old instance store and initialize a new one
public class WorkflowInstanceStore : IWorkflowInstanceStore, IDisposable
{
public WorkflowInstanceStore(string connectionString)
{
_instanceStore = new SqlWorkflowInstanceStore(connectionString);
InstanceHandle handle = _instanceStore.CreateInstanceHandle();
InstanceView view = _instanceStore.Execute(handle,
new CreateWorkflowOwnerCommand(), TimeSpan.FromSeconds(30));
handle.Free();
_instanceStore.DefaultInstanceOwner = view.InstanceOwner;
}
public InstanceStore Store
{
get { return _instanceStore; }
}
public void Dispose()
{
if (null != _instanceStore)
{
var deleteOwner = new DeleteWorkflowOwnerCommand();
InstanceHandle handle = _instanceStore.CreateInstanceHandle();
_instanceStore.Execute(handle, deleteOwner, TimeSpan.FromSeconds(10));
handle.Free();
}
}
private InstanceStore _instanceStore;
}
you can find the best practices to create instance store handle in this link
Workflow Instance Store Best practices
This is an old thread but I just stumbled on the same issue.
Damir's Corner suggests to check if the instance handle is still valid before calling the instance store. I hereby quote the whole post:
Certain aspects of Workflow Foundation are still poorly documented; the persistence framework being one of them. The following snippet is typically used for setting up the instance store:
var instanceStore = new SqlWorkflowInstanceStore(connectionString);
instanceStore.HostLockRenewalPeriod = TimeSpan.FromSeconds(30);
var instanceHandle = instanceStore.CreateInstanceHandle();
var view = instanceStore.Execute(instanceHandle,
new CreateWorkflowOwnerCommand(), TimeSpan.FromSeconds(10));
instanceStore.DefaultInstanceOwner = view.InstanceOwner;
It's difficult to find a detailed explanation of what all of this
does; and to be honest, usually it's not necessary. At least not,
until you start encountering problems, such as InstanceOwnerException:
The execution of an InstancePersistenceCommand was interrupted because
the instance owner registration for owner ID
'9938cd6d-a9cb-49ad-a492-7c087dcc93af' has become invalid. This error
indicates that the in-memory copy of all instances locked by this
owner have become stale and should be discarded, along with the
InstanceHandles. Typically, this error is best handled by restarting
the host.
The error is closely related to the HostLockRenewalPeriod property
which defines how long obtained instance handle is valid without being
renewed. If you try monitoring the database while an instance store
with a valid instance handle is instantiated, you will notice
[System.Activities.DurableInstancing].[ExtendLock] being called
periodically. This stored procedure is responsible for renewing the
handle. If for some reason it fails to be called within the specified
HostLockRenewalPeriod, the above mentioned exception will be thrown
when attempting to persist a workflow. A typical reason for this would
be temporarily inaccessible database due to maintenance or networking
problems. It's not something that happens often, but it's bound to
happen if you have a long living instance store, e.g. in a constantly
running workflow host, such as a Windows service.
Fortunately it's not all that difficult to fix the problem, once you
know the cause of it. Before using the instance store you should
always check, if the handle is still valid; and renew it, if it's not:
if (!instanceHandle.IsValid)
{
instanceHandle = instanceStore.CreateInstanceHandle();
var view = instanceStore.Execute(instanceHandle,
new CreateWorkflowOwnerCommand(), TimeSpan.FromSeconds(10));
instanceStore.DefaultInstanceOwner = view.InstanceOwner;
}
It's definitely less invasive than the restart of the host, suggested
by the error message.
you have to be sure about expiration of owner user
here how I am used to handle this issue
public SqlWorkflowInstanceStore SetupSqlpersistenceStore()
{
SqlWorkflowInstanceStore sqlWFInstanceStore = new SqlWorkflowInstanceStore(ConfigurationManager.ConnectionStrings["DB_WWFConnectionString"].ConnectionString);
sqlWFInstanceStore.InstanceCompletionAction = InstanceCompletionAction.DeleteAll;
InstanceHandle handle = sqlWFInstanceStore.CreateInstanceHandle();
InstanceView view = sqlWFInstanceStore.Execute(handle, new CreateWorkflowOwnerCommand(), TimeSpan.FromSeconds(30));
handle.Free();
sqlWFInstanceStore.DefaultInstanceOwner = view.InstanceOwner;
return sqlWFInstanceStore;
}
and here how you can use this method
wfApp.InstanceStore = SetupSqlpersistenceStore();
wish this help
I'm in the process of writing a query manager for a WinForms application that, among other things, needs to be able to deliver real-time search results to the user as they're entering a query (think Google's live results, though obviously in a thick client environment rather than the web). Since the results need to start arriving as the user types, the search will get more and more specific, so I'd like to be able to cancel a query if it's still executing while the user has entered more specific information (since the results would simply be discarded, anyway).
If this were ordinary ADO.NET, I could obviously just use the DbCommand.Cancel function and be done with it, but we're using EF4 for our data access and there doesn't appear to be an obvious way to cancel a query. Additionally, opening System.Data.Entity in Reflector and looking at EntityCommand.Cancel shows a discouragingly empty method body, despite the docs claiming that calling this would pass it on to the provider command's corresponding Cancel function.
I have considered simply letting the existing query run and spinning up a new context to execute the new search (and just disposing of the existing query once it finishes), but I don't like the idea of a single client having a multitude of open database connections running parallel queries when I'm only interested in the results of the most recent one.
All of this is leading me to believe that there's simply no way to cancel an EF query once it's been dispatched to the database, but I'm hoping that someone here might be able to point out something I've overlooked.
TL/DR Version: Is it possible to cancel an EF4 query that's currently executing?
Looks like you have found some bug in EF but when you report it to MS it will be considered as bug in documentation. Anyway I don't like the idea of interacting directly with EntityCommand. Here is my example how to kill current query:
var thread = new Thread((param) =>
{
var currentString = param as string;
if (currentString == null)
{
// TODO OMG exception
throw new Exception();
}
AdventureWorks2008R2Entities entities = null;
try // Don't use using because it can cause race condition
{
entities = new AdventureWorks2008R2Entities();
ObjectQuery<Person> query = entities.People
.Include("Password")
.Include("PersonPhone")
.Include("EmailAddress")
.Include("BusinessEntity")
.Include("BusinessEntityContact");
// Improves performance of readonly query where
// objects do not have to be tracked by context
// Edit: But it doesn't work for this query because of includes
// query.MergeOption = MergeOption.NoTracking;
foreach (var record in query
.Where(p => p.LastName.StartsWith(currentString)))
{
// TODO fill some buffer and invoke UI update
}
}
finally
{
if (entities != null)
{
entities.Dispose();
}
}
});
thread.Start("P");
// Just for test
Thread.Sleep(500);
thread.Abort();
It is result of my playing with if after 30 minutes so it is probably not something which should be considered as final solution. I'm posting it to at least get some feedback with possible problems caused by this solution. Main points are:
Context is handled inside the thread
Result is not tracked by context
If you kill the thread query is terminated and context is disposed (connection released)
If you kill the thread before you start a new thread you should use still one connection.
I checked that query is started and terminated in SQL profiler.
Edit:
Btw. another approach to simply stop current query is inside enumeration:
public IEnumerable<T> ExecuteQuery<T>(IQueryable<T> query)
{
foreach (T record in query)
{
// Handle stop condition somehow
if (ShouldStop())
{
// Once you close enumerator, query is terminated
yield break;
}
yield return record;
}
}