Why should we use batch() instead of Promise.all? - pg-promise

From the pg-promise FAQ entry "Why use method batch instead of promise.all?":
It is quintessential to settle all the promises-queries created within your task or transaction, before the connection is released
I don't see why this should be a problem.
For example when we have an array of queries like this:
[
t.any("SELECT pg_sleep(2) as a"),
t.any('this will fail'),
t.any("SELECT pg_sleep(3) as b")
]
Note: pg_sleep is only used for testing.
In production this would be Insert/Update/Delete statements. And we only want to commit the transaction when all have been successful: i.e. return an error when any of them fails.
When we use batch():
the first promise will resolve after 2 seconds
the 2nd promise will reject
the 3rd query will still be sent to the database and will return after 3 more seconds
finally (after a total of 5 seconds), batch is done and we can return an error to the caller.
When we use Promise.all():
the first promise will resolve after 2 seconds
the 2nd promise will reject, which will roll back the transaction and release the database connection
now we can already return an error to the caller
the 3rd request would fail immediately with Querying against a released or lost connection. This is expected anyway, so we can ignore it.
So I'd say that Promise.all is better, because:
it returns immediately after the first error
it will not even send the useless 3rd query to the database
What am I missing?
Could this cause other issues, e.g. a broken connection being returned to the pool?

Method batch caters for scenarios where a dynamic number of queries may be created.
It makes sure that all queries are settled (resolved or rejected), so you do not end up with queries being executed against a closed connection and getting that Querying against a released or lost connection error. It can be confusing when those errors start occurring outside of the context that created the queries, where you cannot diagnose what is going on.
Method Promise.all does not wait for all promises to settle: it rejects as soon as any promise in the array rejects, while the remaining ones may still be pending.
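To make the difference concrete, here is a minimal sketch using plain promises and no database. fakeQuery is a hypothetical stand-in for t.any(...), and Promise.allSettled is used only as an approximation of the "settle everything first" behavior described above, not as the implementation of batch.

```javascript
// fakeQuery is a hypothetical helper, not part of pg-promise.
// It resolves (or rejects) after `ms` milliseconds.
function fakeQuery(name, ms, fail = false) {
  return new Promise((resolve, reject) => {
    setTimeout(() => (fail ? reject(new Error(`${name} failed`)) : resolve(name)), ms);
  });
}

async function compare() {
  const events = [];

  // Promise.all rejects on the first failure while "c" is still pending.
  const a1 = fakeQuery('a', 20);
  const b1 = fakeQuery('b', 5, true);
  const c1 = fakeQuery('c', 40).then(v => { events.push('c settled'); return v; });
  try {
    await Promise.all([a1, b1, c1]);
  } catch (e) {
    events.push(`all rejected: ${e.message}`);
  }
  // In a task, the connection would be released here, yet c1 is still running.
  await c1;

  // Promise.allSettled waits for every promise before reporting.
  const settled = await Promise.allSettled([
    fakeQuery('a', 20),
    fakeQuery('b', 5, true),
    fakeQuery('c', 40)
  ]);
  events.push(`allSettled: ${settled.map(s => s.status).join(',')}`);
  return events;
}
```

Calling compare() shows the rejection being reported before the third promise settles in the Promise.all case, which is the window in which a query could hit a released connection.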
And while method batch is still quite useful, as it is more flexible in how it can handle the values, and gives better result/error details than Promise.all, its use today is no longer necessary. It was developed during the ES5 era, when async/await did not exist. But today you can easily replace it with async/await:
Old style:
db.task('get-all-records', t => {
    return t.batch([
        t.any('SELECT * FROM apples'),
        t.any('SELECT * FROM oranges')
    ]);
})
    .then(([apples, oranges]) => {
        // process data here
    })
    .catch(error => {});
New style:
const {apples, oranges} = await db.task('get-all-records', async t => {
const apples = await t.any('SELECT * FROM apples');
const oranges = await t.any('SELECT * FROM oranges');
return {apples, oranges};
});
The results of the two examples above will be identical, but the execution logic differs. The first one creates both queries up front and lets them settle independently, while the second uses async/await, which runs them sequentially: each await pauses the function until that query settles, so the next query is never even created if an earlier one fails.
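The difference in execution logic can be sketched without a database. makeRunner below is a hypothetical stand-in for a task context that only records which queries were created; it is an illustration under that assumption, not pg-promise code.

```javascript
// makeRunner is hypothetical: run() records the query name and returns
// an already-settled promise instead of talking to a database.
function makeRunner() {
  const created = [];
  const run = (name, fail = false) => {
    created.push(name);
    return fail ? Promise.reject(new Error(`${name} failed`)) : Promise.resolve(name);
  };
  return { created, run };
}

// Old style: every query is created before any of them settles.
async function upFront() {
  const { created, run } = makeRunner();
  await Promise.allSettled([run('first'), run('second', true), run('third')]);
  return created;
}

// New style: each await finishes before the next query is created.
async function sequential() {
  const { created, run } = makeRunner();
  try {
    await run('first');
    await run('second', true);
    await run('third'); // never reached, because the previous await throws
  } catch (e) {
    // the failure surfaces here, inside the task
  }
  return created;
}
```

Here upFront() records all three queries, while sequential() records only the first two.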
Extras
The best-performing approach when it comes to executing multiple independent queries (that do not depend on each other), is by concatenating all queries, and executing them all as one query.
For that there is method helpers.concat, plus database method multi, to handle multiple results:
const queries = [
{query: 'SELECT * FROM apples WHERE color = $1', values: ['green']},
'SELECT * FROM oranges'
];
const sql = pgp.helpers.concat(queries);
const [apples, oranges] = await db.multi(sql);
You won't even need a transaction for it, unless some of your independent queries change data.

Related

ServerValue.increment doesn't work properly when Internet goes down

The addition of ServerValue.increment() (Add increment() for atomic field value increments #2437) was great news, as it allows field values to be incremented atomically in Firebase RTDB.
I have an application that keeps inventories, and this function has been key because it allows updating the inventory even if the user is offline at times. However, I started to notice that the function is sometimes executed twice, which corrupts the inventory counts.
To isolate the problem I ran the following test, which shows that ServerValue.increment() misbehaves when the connection goes from online to offline:
Run a for loop from 1 to 200:
for (var i = 1; i <= 200; i++) {
testBloc.incrementTest(i);
print('Pos: $i');
}
The function incrementTest(i) must increment two variables: position (counting up by 1 each time, to 200) and sum (adding 1 + 2 + 3 + ... + 200, which should result in 20,100):
Future<bool> incrementTest(int value) async {
try {
db.child('test/position')
.set(ServerValue.increment(1));
db.child('test/sum')
.set(ServerValue.increment(value));
} catch (e) {
print(e);
}
return true;
}
Note that db refers to the Firebase instance (FirebaseDatabase.instance.reference())
With this in place, here are the tests:
Test 1: 100% Online. PASSED
The function works properly, and both variables reach the correct result (in the Firebase console):
position: 200
sum: 20100
Test 2: 100% Offline. PASSED
To do this I used a physical device in airplane mode, then I executed the for loop function, and when the function finished executing I deactivated airplane mode and checked the result in the firebase console, which was satisfactory:
position: 200
sum: 20100
Test 3: Start Online and then go to Offline. FAILED
This is a typical scenario when the Internet connection goes down. It is even worse when the connection is intermittent, for example when you are traveling on a subway or are in an area with poor coverage, which is exactly where offline persistence is a desired feature. To simulate it, I ran the for loop in online mode and, before it finished, put the physical device in airplane mode. Later I went back online to finish the test and checked the results in the Firebase console. The results were incorrect in all cases. Here are some of the results:
As you can see, the increment was erroneously repeated 10, 18, and 9 extra times.
How can I avoid this behavior?
Is there any other way to atomically increment a number in Firebase that works properly both online and offline?
firebaser here
That's an interesting edge-case in the increment behavior. Between the client and the server neither can be certain whether the increment was executed or not, so it ends up being retried from the client upon the reconnect. This problem can only occur with the increment operation as far as I can tell, as all the other write operations are idempotent except for transactions, but those don't work while offline.
It is possible to ensure each increment happens only once, but it'll take some work:
First, add a nonce to the write operation that uniquely identifies it. You can use a push key for this, but any other UUID also works fine. Combine this with your original set() call into a single multi-path update call, writing the nonce to a top-level node with a server-side timestamp as its value.
Now in your security rules for the top-level location, only allow the write if there is no existing data. This ensures the secondary writes you're seeing get rejected, and since security rules are checked across multi-path updates as a whole, the faulty increment will get rejected too.
You'll probably want to periodically clean up the node with nonce keys, based on the timestamp value in there. It won't matter for performance (since you're never searching here outside of during the cleanup), but may help control the storage cost for the nonces.
I haven't used this approach for this specific use-case yet, but have done it for others. If you'd include a client-side retry, the above essentially builds your own multi-path transaction mechanism, which is what I needed it for in the past. But since you don't need that here, it's simpler without that.
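The idea can be sketched with an in-memory stand-in for the database. FakeServer and its methods are hypothetical names for illustration only; the all-or-nothing check in applyUpdate plays the role of the security rule that rejects a multi-path update when the nonce already exists.

```javascript
// FakeServer is a hypothetical in-memory model, not a Firebase API.
class FakeServer {
  constructor() {
    this.sum = 0;
    this.nonces = new Map(); // nonce -> timestamp
  }

  // Apply the increment and record the nonce as one all-or-nothing write.
  applyUpdate({ nonce, increment, timestamp }) {
    if (this.nonces.has(nonce)) {
      return false; // "no existing data" rule fails, whole update is rejected
    }
    this.nonces.set(nonce, timestamp);
    this.sum += increment;
    return true;
  }

  // Periodic cleanup of nonces older than maxAgeMs.
  pruneNonces(now, maxAgeMs) {
    for (const [nonce, ts] of this.nonces) {
      if (now - ts > maxAgeMs) this.nonces.delete(nonce);
    }
  }
}

const server = new FakeServer();
const write = { nonce: 'op-1', increment: 5, timestamp: 1000 };
const first = server.applyUpdate(write); // original attempt
const retry = server.applyUpdate(write); // client resends after reconnecting
```

In this model the retry is rejected and sum stays at 5. One consequence worth keeping in mind: once a nonce has been pruned, a very late retry of the same write would be accepted again, so the cleanup age presumably needs to exceed the longest time a client may stay offline.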
Based on @puf's answer, you can proceed as follows:
Future<bool> incrementTest(int value, int dateOfToday) async {
var id = db.push().key;
Map<String, dynamic> _updates = {
'test/position': ServerValue.increment(1),
'test/sum': ServerValue.increment(value),
'test/nonce/$id': dateOfToday,
};
db.child('previousPath').update(_updates)
.catchError((error) => print('Increment Duplication Rejected ${error.message}'));
return true;
}
Then, in the Firebase Security Rules, you need to add a rule for the test/nonce/id location, something like the following:
{
"previousPath": {
"test": {
".read": "auth != null", //It depends on your root rules
".write": "auth != null", //It depends on your root rules
"nonce": {
"$nonce_id": {
".validate": "!data.exists()" //THE MAGIC IS HERE
}
}
}
}
}
In this way, when the device tries to write to the database again (wrongly), Firebase will reject it since it already had a write with that same ID before.
I hope it serves someone else!!!

Rx Extensions - Proper way to use delay to avoid unnecessary observables from executing?

I'm trying to use delay and amb to execute a sequence of the same task separated by time.
All I want is for a download attempt to execute some time in the future only if the same task failed before in the past. Here's how I have things set up, but unlike what I'd expect, all three downloads seem to execute without delay.
Observable.amb([
Observable.catch(redditPageStream, Observable.empty()).delay(0 * 1000),
Observable.catch(redditPageStream, Observable.empty()).delay(30 * 1000),
Observable.catch(redditPageStream, Observable.empty()).delay(90 * 1000),
# Observable.throw(new Error('Failed to retrieve reddit page content')).delay(10000)
# Observable.create(
# (observer) ->
# throw new Error('Failed to retrieve reddit page content')
# )
]).defaultIfEmpty(Observable.throw(new Error('Failed to retrieve reddit page content')))
full code can be found here. src
I was hoping that the first successful observable would cancel out the ones still in delay.
Thanks for any help.
delay doesn't actually postpone the execution of whatever you are doing; it only delays when the events are propagated. If you want to delay execution, you would need to do something like:
redditPageStream.delaySubscription(1000)
Since your source starts producing immediately, the above delays the actual subscription to the underlying stream, which effectively delays when it begins producing.
I would suggest, though, that you use one of the retry operators to handle your retry logic rather than rolling your own with the amb operator.
redditPageStream.delaySubscription(1000).retry(3);
will give you a constant retry delay. However, if you want to implement a linear backoff, you can use the retryWhen() operator instead, which lets you apply whatever logic you want to the backoff.
redditPageStream.retryWhen(errors => {
return errors
//Only take 3 errors
.take(3)
//Use timer to implement a linear back off and flatten it
.flatMap((e, i) => Rx.Observable.timer(i * 30 * 1000));
});
Essentially, retryWhen creates an Observable of errors, and each event that makes it through is treated as a retry attempt. If you error or complete that stream, it will stop retrying.
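For readers who want to see the linear backoff logic on its own, here is a minimal sketch with plain promises rather than Rx. retryWithLinearBackoff is a hypothetical helper, not an Rx operator, and rethrowing the last error once the retries run out is a choice made for this sketch.

```javascript
// retryWithLinearBackoff is hypothetical. `sleep` is injectable so the
// delays can be observed without actually waiting.
async function retryWithLinearBackoff(task, { retries = 3, stepMs = 30000, sleep } = {}) {
  const wait = sleep || (ms => new Promise(resolve => setTimeout(resolve, ms)));
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        await wait(attempt * stepMs); // 0, stepMs, 2 * stepMs, ...
      }
    }
  }
  throw lastError;
}

// Usage: fail twice, then succeed; record the delays instead of sleeping.
async function demo() {
  const delays = [];
  let calls = 0;
  const result = await retryWithLinearBackoff(
    async () => {
      calls += 1;
      if (calls < 3) throw new Error('download failed');
      return 'page content';
    },
    { sleep: async ms => { delays.push(ms); } }
  );
  return { result, calls, delays };
}
```

In demo() the task is only invoked again after the previous attempt has failed, which is the behavior the question was after: no later download starts unless an earlier one failed.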

General pattern for failing over from one database to another using Entity Framework?

We have an enterprise DB that is replicated through many sites throughout the world. We would like our app to attempt to connect to one of the local sites, and if that site is down we want it to fall back to the enterprise DB. We'd like this behavior on each of our DB operations.
We are using Entity Framework, C#, and SQL Server.
At first I hoped I could just specify a "Failover Partner" in the connection string, but that only works in a mirrored DB environment, which this is not. I also looked into writing a custom IDbExecutionStrategy, but these strategies only allow you to specify the pattern for retrying a failed DB operation. They do not allow you to change the operation in any way, such as directing it to a new connection.
So, do you know of any good pattern for dealing with this type of operation, other than duplicating retry logic around each of our many DB operations?
Update on 2014-05-14:
I'll elaborate in response to some of the suggestions already made.
I have many places where the code looks like this:
try
{
using(var db = new MyDBContext(ConnectionString))
{
// Database operations here.
// var myList = db.MyTable.Select(...), etc.
}
}
catch(Exception ex)
{
// Log exception here, perhaps rethrow.
}
It was suggested that I have a routine that first checks each of the connection strings and returns the first one that successfully connects. This is reasonable as far as it goes. But some of the errors I'm seeing are timeouts on the operations, where the connection works but the DB has issues that keep it from completing the operation.
What I'm looking for is a pattern I can use to encapsulate the unit of work and say, "Try this on the first database. If it fails for any reason, rollback and try it on the second DB. If that fails, try it on the third, etc. until the operation succeeds or you have no more DBs." I'm pretty sure I can roll my own (and I'll post the result if I do), but I was hoping there might be a known way to approach this.
How about using some Dependency Injection system like autofac and registering there a factory for new context objects - it will execute logic that will try to connect first to local and in case of failure it will connect to enterprise db. Then it will return ready DbContext object. This factory will be provided to all objects that require it with Dependency Injection system - they will use it to create contexts and dispose of them when they are not needed any more.
"We would like our app to attempt to connect to one of the local sites, and if that site is down we want it to fall back to the enterprise DB. We'd like this behavior on each of our DB operations."
If your app is strictly read-only on the DB and data consistency is not absolutely vital to your app/users, then it's just a matter of trying to CONNECT until an operational site has been found. As M.Ali suggested in his remark.
Otherwise, I suggest you stop thinking along these lines immediately because you're just running 90 mph down a dead end street. As Viktor Zychla suggested in his remark.
Here is what I ended up implementing, in broad brush-strokes:
Define delegates called UnitOfWorkMethod that execute a single unit of work on the database, in a single transaction. Both take a connection string, and one of them also returns a value:
delegate T UnitOfWorkMethod<out T>(string connectionString);
delegate void UnitOfWorkMethod(string connectionString);
Define a method called ExecuteUOW that takes a unit-of-work method and tries to execute it using the preferred connection string. If that fails, it tries to execute it with the next connection string:
protected T ExecuteUOW<T>(UnitOfWorkMethod<T> method)
{
// GET THE LIST OF CONNECTION STRINGS
IEnumerable<string> connectionStringList = ConnectionStringProvider.GetConnectionStringList();
// WHILE THERE ARE STILL DATABASES TO TRY, AND WE HAVEN'T DEFINITIVELY SUCCEEDED OR FAILED
var uowState = UOWStateEnum.InProcess;
IEnumerator<string> stringIterator = connectionStringList.GetEnumerator();
T returnVal = default(T);
Exception lastException = null;
string connectionString = null;
while ((uowState == UOWStateEnum.InProcess) && stringIterator.MoveNext())
{
try
{
// TRY TO EXECUTE THE UNIT OF WORK AGAINST THE DB.
connectionString = stringIterator.Current;
returnVal = method(connectionString);
uowState = UOWStateEnum.Success;
}
catch (Exception ex)
{
lastException = ex;
// IF IT FAILED BECAUSE OF A TRANSIENT EXCEPTION,
if (TransientChecker.IsTransient(ex))
{
// LOG THE EXCEPTION AND TRY AGAINST ANOTHER DB.
Log.TransientDBException(ex, connectionString);
}
// ELSE
else
{
// CONSIDER THE UOW FAILED.
uowState = UOWStateEnum.Failed;
}
}
}
// LOG THE FAILURE IF WE HAVE NOT SUCCEEDED.
if (uowState != UOWStateEnum.Success)
{
Log.ExceptionDuringDataAccess(lastException);
returnVal = default(T);
}
return returnVal;
}
Finally, for each operation we define our unit-of-work delegate method. Here is an example:
UnitOfWorkMethod uowMethod =
(providerConnectionString =>
{
using (var db = new MyContext(providerConnectionString ))
{
// Do my DB commands here. They will roll back if exception thrown.
}
});
ExecuteUOW(uowMethod);
When ExecuteUOW is called, it tries the delegate on each database until it either succeeds or fails on all of them.
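The control flow of the pattern is independent of C# and Entity Framework, so here is a compact sketch of it in JavaScript. executeWithFailover and isTransient are hypothetical names that mirror ExecuteUOW and TransientChecker.IsTransient; unlike the code above, this sketch returns a result object so the caller can tell a failure apart from a default value, which is a variation and not the author's design.

```javascript
// executeWithFailover is hypothetical. It tries the unit of work against
// each connection string in order and stops on success or on the first
// non-transient error.
function executeWithFailover(connectionStrings, unitOfWork, isTransient) {
  let lastError = null;
  for (const connectionString of connectionStrings) {
    try {
      return { ok: true, value: unitOfWork(connectionString) };
    } catch (err) {
      lastError = err;
      if (!isTransient(err)) break; // non-transient: do not try other databases
    }
  }
  return { ok: false, error: lastError };
}

// Usage: the local site times out, so the enterprise DB is tried next.
const tried = [];
const outcome = executeWithFailover(
  ['local', 'enterprise'],
  cs => {
    tried.push(cs);
    if (cs === 'local') {
      const e = new Error('timeout');
      e.transient = true; // illustrative flag, set only for this demo
      throw e;
    }
    return `rows from ${cs}`;
  },
  err => err.transient === true
);
```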
I'm going to accept this answer since it fully addresses all of the concerns raised in the original question. However, if anyone provides an answer that is more elegant, more understandable, or corrects flaws in this one, I'll happily accept it instead.
Thanks to all who have responded.

Play 1.2.3 framework - Right way to commit transaction

We have an HTTP end-point that takes a long time to run and can also be called concurrently by users. As part of this request, we update the model inside a synchronized block so that other (possibly concurrent) requests pick up that change.
E.g.
MyModel m = null;
synchronized (lockObject) {
m = MyModel.findById(id);
if (m.status == PENDING) {
m.status = ACTIVE;
} else {
//render a response back to user that the operation is not allowed
}
m.save(); //Is not expected to be called unless we set m.status = ACTIVE
}
//Long running operation continues here. It can involve further changes to instance "m"
The reason for the synchronized block is to ensure that even concurrent requests get to pick up the latest status. However, the underlying JPA does not commit my changes (m.save()) until the request is complete. Since this is a long-running request, I do not want to wait until the request is complete, and I still want to ensure that other callers are notified of the change in status. I tried to call "m.em().flush(); JPA.em().getTransaction().commit();" after m.save(), but that makes the transaction unavailable for the subsequent actions in the same request. Can I just call "JPA.em().getTransaction().begin();" and let Play handle the transaction from then on? If not, what is the best way to handle this use-case?
UPDATE:
Based on the response, I modified my code as follows:
MyModel m = null;
synchronized (lockObject) {
m = MyModel.findById(id);
if (m.status == PENDING) {
m.status = ACTIVE;
} else {
//render a response back to user that the operation is not allowed
}
m.save(); //Is not expected to be called unless we set m.status = ACTIVE
}
new MyModelUpdateJob(m.id).now();
And in my job, I have the following line:
doJob() {
MyModel m = MyModel.findById(id);
print m.status; //This still prints the old status as-if m.save() had no effect...
}
What am I missing?
Put your update code in a job and call
new MyModelUpdateJob(id).now().get();
so that the update is done in another transaction, which is committed at the end of the job.
Ouch, as soon as you add more Play servers, you will be in trouble. You may want to play with optimistic locking in your example, or (and I advise against it) pessimistic locking... ick.
HOWEVER, looking at your code, maybe read the article Building on Quicksand. I am not sure you need a synchronized block in that case at all...try to go after being idempotent.
In your case:
1. If user 1 and user 2 both call that method while the status is pending, it simply goes to active either way (idempotent).
2. Whichever of user 1 or user 2 wins, the outcome is the same as if you had the synchronized block anyway.
I am sure however you have a more complex scenario not shown here, BUT READ that article Building on Quicksand as it really changes the traditional way of thinking and is how google and amazon and very large scale systems operate.
Another option for distributed transactions across play servers is zookeeper which the big large nosql guys use BUT only as a last resort ;) ;)
later,
Dean
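The idempotent transition described in this answer can be sketched as a conditional update. The row object below is an in-memory stand-in and activateIfPending is a hypothetical name; in a real multi-server setup the check and the write would have to be a single atomic conditional update in the database (for example guarded by a version column), which this sketch only imitates.

```javascript
// activateIfPending is hypothetical. It changes the status only when the
// row is still PENDING and reports whether this caller made the change.
function activateIfPending(row) {
  if (row.status !== 'PENDING') {
    return false; // someone else already activated it; nothing to do
  }
  row.status = 'ACTIVE';
  row.version += 1; // version bump, as used by optimistic locking
  return true;
}

const row = { status: 'PENDING', version: 0 };
const user1 = activateIfPending(row); // wins and starts the long operation
const user2 = activateIfPending(row); // sees ACTIVE and is turned away
```

Running the transition twice leaves the row in the same state as running it once, which is why no lock is needed for this particular step.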

Cancelling an Entity Framework Query

I'm in the process of writing a query manager for a WinForms application that, among other things, needs to be able to deliver real-time search results to the user as they're entering a query (think Google's live results, though obviously in a thick client environment rather than the web). Since the results need to start arriving as the user types, the search will get more and more specific, so I'd like to be able to cancel a query if it's still executing while the user has entered more specific information (since the results would simply be discarded, anyway).
If this were ordinary ADO.NET, I could obviously just use the DbCommand.Cancel function and be done with it, but we're using EF4 for our data access and there doesn't appear to be an obvious way to cancel a query. Additionally, opening System.Data.Entity in Reflector and looking at EntityCommand.Cancel shows a discouragingly empty method body, despite the docs claiming that calling this would pass it on to the provider command's corresponding Cancel function.
I have considered simply letting the existing query run and spinning up a new context to execute the new search (and just disposing of the existing query once it finishes), but I don't like the idea of a single client having a multitude of open database connections running parallel queries when I'm only interested in the results of the most recent one.
All of this is leading me to believe that there's simply no way to cancel an EF query once it's been dispatched to the database, but I'm hoping that someone here might be able to point out something I've overlooked.
TL/DR Version: Is it possible to cancel an EF4 query that's currently executing?
It looks like you have found a bug in EF, but when you report it to MS it will be considered a bug in the documentation. Anyway, I don't like the idea of interacting directly with EntityCommand. Here is my example of how to kill the current query:
var thread = new Thread((param) =>
{
var currentString = param as string;
if (currentString == null)
{
// TODO OMG exception
throw new Exception();
}
AdventureWorks2008R2Entities entities = null;
try // Don't use using because it can cause race condition
{
entities = new AdventureWorks2008R2Entities();
ObjectQuery<Person> query = entities.People
.Include("Password")
.Include("PersonPhone")
.Include("EmailAddress")
.Include("BusinessEntity")
.Include("BusinessEntityContact");
// Improves performance of readonly query where
// objects do not have to be tracked by context
// Edit: But it doesn't work for this query because of includes
// query.MergeOption = MergeOption.NoTracking;
foreach (var record in query
.Where(p => p.LastName.StartsWith(currentString)))
{
// TODO fill some buffer and invoke UI update
}
}
finally
{
if (entities != null)
{
entities.Dispose();
}
}
});
thread.Start("P");
// Just for test
Thread.Sleep(500);
thread.Abort();
This is the result of about 30 minutes of playing with it, so it probably should not be considered a final solution. I'm posting it to at least get some feedback on possible problems with this approach. The main points are:
Context is handled inside the thread
Result is not tracked by context
If you kill the thread, the query is terminated and the context is disposed (connection released)
If you kill the thread before you start a new one, you should still be using only one connection.
I checked that query is started and terminated in SQL profiler.
Edit:
By the way, another approach is to simply stop the current query inside the enumeration:
public IEnumerable<T> ExecuteQuery<T>(IQueryable<T> query)
{
foreach (T record in query)
{
// Handle stop condition somehow
if (ShouldStop())
{
// Once you close enumerator, query is terminated
yield break;
}
yield return record;
}
}
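The same "stop inside the enumeration" idea can be shown with JavaScript generators. takeUntilStopped mirrors ExecuteQuery<T>, and fakeCursor is a hypothetical stand-in for a query cursor that reports when it has been closed; whether closing the enumerator cancels the query on the database side depends on the provider and is not shown here.

```javascript
// takeUntilStopped is hypothetical. It yields records until shouldStop()
// returns true, then exits the loop early.
function* takeUntilStopped(source, shouldStop) {
  for (const record of source) {
    if (shouldStop()) {
      return; // leaving for...of early closes the underlying iterator
    }
    yield record;
  }
}

// A source that records when it is closed, standing in for a query cursor.
let closed = false;
function* fakeCursor() {
  try {
    for (let i = 1; i <= 100; i++) yield i;
  } finally {
    closed = true; // runs when the consumer stops early or the data runs out
  }
}

// Stop once three records have been let through.
let seen = 0;
const results = [...takeUntilStopped(fakeCursor(), () => seen++ >= 3)];
```

Only the first three records are consumed, and the source's finally block runs as soon as the loop is abandoned.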