Zumero sync: knowing which tables have changed

I have the following situation.
A mobile app designed for technicians has a background task that syncs data often (the plan is about every 10 minutes). For most of the data in the DB there is no problem with syncing that often, as it rarely changes. But I also have a PARAMETERS table, and I need to know when it changes, because a change triggers some time-consuming operations on the mobile device. I'd like to run those operations only when something has changed, or postpone them. Is it possible to achieve this with Zumero?
Thanks.
Rodolfo.

It's undocumented (so, not guaranteed to be future proof), but you can check the following before and after sync:
select max(z_rv) from PARAMETERS;
select count(*) from PARAMETERS;
If updates or inserts have occurred, z_rv will increase. A change in count() will catch deletes.
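The before/after check described above could be wrapped as follows. This is only a sketch: it uses Python's sqlite3 module against an in-memory table, the z_rv column is created by hand to stand in for the one Zumero maintains, and run_sync is a hypothetical callback standing in for the real Zumero sync call.

```python
import sqlite3

def table_fingerprint(conn, table):
    """Return (max z_rv, row count); the table name must come from trusted code."""
    row = conn.execute(f"SELECT max(z_rv), count(*) FROM {table}").fetchone()
    return (row[0], row[1])

def sync_and_detect_change(conn, table, run_sync):
    """Run the sync callback and report whether the table looks changed."""
    before = table_fingerprint(conn, table)
    run_sync(conn)  # hypothetical stand-in for the real sync call
    after = table_fingerprint(conn, table)
    return before != after

# Demo setup: z_rv is created manually here only to imitate Zumero's column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PARAMETERS (name TEXT PRIMARY KEY, value TEXT, z_rv INTEGER)")
conn.execute("INSERT INTO PARAMETERS VALUES ('timeout', '30', 1)")

def fake_sync_no_change(c):
    pass

def fake_sync_update(c):
    c.execute("UPDATE PARAMETERS SET value = '60', z_rv = 2 WHERE name = 'timeout'")

changed_first = sync_and_detect_change(conn, "PARAMETERS", fake_sync_no_change)
changed_second = sync_and_detect_change(conn, "PARAMETERS", fake_sync_update)
```

The expensive work on the device would then run only when the function returns True.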

Do Firebase/Firestore Transactions create internal queues?

I'm wondering if transactions (https://firebase.google.com/docs/firestore/manage-data/transactions) are a viable tool for something like a ticketing system, where users may be attempting to read/write the same collection/document, and whoever made the request first is handled first, the second is handled second, and so on.
If not what would be a good structure for such a need with firestore?
Transactions just guarantee an atomic, consistent update among the documents involved in the transaction. They don't guarantee the order in which those transactions complete, as the transaction handler might get retried in the face of contention.
Since you tagged this question with google-cloud-functions (but didn't mention it in your question), it sounds like you might be considering writing a database trigger to handle incoming writes. Cloud Functions triggers also do not guarantee any ordering when under load.
Ordering of any kind at the scale on which Firestore and other Google Cloud products operate is a really difficult problem to solve (please read that link to get a sense of that). There is not a simple database structure that will impose an order where changes are made. I suggest you think carefully about your need for ordering, and come up with a different solution.
The best indication of order you can get is probably by adding a server timestamp to individual documents, but you will still have to figure out how to process them. The easiest thing might be to have a backend periodically query the collection, ordered by that timestamp, and process things in that order, in batch.
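The batch idea in the last paragraph might look roughly like this. It is a sketch only: plain dictionaries stand in for Firestore documents, created_at is a hypothetical field name for the server timestamp, and no Firestore API call is shown. The document id is used as a tie-breaker so that documents with equal timestamps are still handled in a stable order.

```python
from operator import itemgetter

def process_in_timestamp_order(pending_docs, handle):
    """Process one batch of documents ordered by (server timestamp, document id)."""
    ordered = sorted(pending_docs, key=itemgetter("created_at", "id"))
    processed_ids = []
    for doc in ordered:
        handle(doc)  # hypothetical per-document handler
        processed_ids.append(doc["id"])
    return processed_ids

# Demo: dictionaries standing in for documents fetched by a periodic backend job.
batch = [
    {"id": "b", "created_at": 20},
    {"id": "a", "created_at": 10},
    {"id": "c", "created_at": 20},
]
handled = []
order = process_in_timestamp_order(batch, handled.append)
```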

PostgreSQL: Backend processes are active for a long time

I am now hitting a very big roadblock.
I use PostgreSQL 10 and its new table partitioning.
Sometimes many queries don't return, and at those times many backend processes are shown as active when I check them in pg_stat_activity.
At first I thought these processes were just waiting for a lock, but the transactions contain only SELECT statements, and no other backend runs a query that requires an ACCESS EXCLUSIVE lock. The plans for these SELECT-only queries are fine, and usually they work well. Computer resources (CPU, memory, I/O, network) are not a problem either. Therefore, these transactions should never conflict. I thoroughly checked the locks of these transactions with pg_locks and pg_blocking_pids(), and in the end I couldn't find any lock that would make the queries this much slower. Many of the active backends hold only ACCESS SHARE locks, because they run only SELECT.
Now I think this phenomenon is not caused by locks, but by something related to the new table partitioning.
So, why are many backends active?
Could anyone help me?
Any comments are highly appreciated.
The figure below is part of the output of pg_stat_activity.
If you want any additional information, please tell me.
EDIT
My query doesn't handle large data. The return type is like this:
uuid UUID
,number BIGINT
,title TEXT
,type1 TEXT
,data_json JSONB
,type2 TEXT
,uuid_array UUID[]
,count BIGINT
Because it has a JSONB column, I cannot calculate the exact size, but the JSON is not large.
Normally these queries are moderately fast (around 1.5s), so that is absolutely no problem; however, when other processes are working, the phenomenon happens.
If the statistics were wrong, the queries would always be slow.
EDIT2
This is the stat. There are almost 100 connections, so I couldn't show all stat.
To me it looks like an application problem, not a PostgreSQL one. The active status means that your transaction has still not been committed.
So why might your application not be sending a commit to the database?
Try to review where your application code opens a transaction, reads data, commits the transaction, and rolls it back.
EDIT:
By the way, to be sure, check resource usage both before the problem appears and when your queries start hanging. Run top and iotop to see whether postgres really starts eating your CPU or disk like crazy when the problem appears. If not, I would suggest looking for the problem in your application.
Thank you everyone.
I finally solved this problem.
I noticed that a backend process held too many locks. When I executed the query SELECT COUNT(*) FROM pg_locks WHERE pid = <pid>, the result was about 10000.
The max_locks_per_transaction parameter is 64 and max_connections is about 800.
So, if many queries each hold that many locks, the shared lock table runs out of memory (see the shared memory calculation code inside PostgreSQL if you are interested).
The excessive locks came from queries of the form SELECT * FROM (partitioned table). Imagine a table foo that is partitioned into 1000 partitions. If you execute SELECT * FROM foo WHERE partition_id = <id>, the backend process will hold about 1000 table locks (plus index locks). So I changed the query from SELECT * FROM foo WHERE partition_id = <id> to SELECT * FROM foo_(partition_id), querying the partition directly. As a result, the problem appears to be solved.
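To put rough numbers on this: the PostgreSQL documentation sizes the shared lock table for max_locks_per_transaction * (max_connections + max_prepared_transactions) objects. The sketch below plugs in the values from this post and assumes max_prepared_transactions is at its default of 0; the real limit is somewhat fuzzier because of how shared memory is allocated, so treat the result as an estimate.

```python
def lock_table_slots(max_locks_per_transaction, max_connections,
                     max_prepared_transactions=0):
    """Approximate number of lockable objects the shared lock table is sized for."""
    return max_locks_per_transaction * (max_connections + max_prepared_transactions)

# Values reported in the post; max_prepared_transactions = 0 is an assumption.
slots = lock_table_slots(max_locks_per_transaction=64, max_connections=800)

locks_per_backend = 10000  # observed via pg_locks for one backend
# How many such backends fit before the table is roughly exhausted.
full_backends = slots // locks_per_backend
```

So only a handful of backends each holding about 10000 locks is enough to use up a table sized for 800 connections.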
You say:
Sometimes many queries don't return
...however when other processes work, the phenomenon happens. If statistic information is wrong, the query are always slow.
Do they fail to return (or run slowly) when you connect directly to the Postgres instance and run the query, or only when the queries are run from an application? Are you able to kill the running backend processes successfully with pg_terminate_backend($PID), or does that have issues? To rule out issues with the statement itself, make sure statement_timeout is set to a reasonable value so that long-running queries are killed off. Once that is ruled out, perhaps you are running into a case of the application hanging and never allowing the send calls from PostgreSQL to finish. To avoid a situation like that, if you are able to (depending on OS), you can tune the keep-alive time: https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-TCP-KEEPALIVES-IDLE (the default is 2 hours)
Let us know if playing with any of that gives any more insight into your issue.
Sorry for the late post. As @Konstantin pointed out, this might be caused by your application (which is why I asked for your EDIT2). Adding a few points:
Table partitioning has no effect on these locks; that is a totally different concept and does not hold up locks in your case.
In your application, check that the connection is properly closed with close() after read(), and that the call is in a finally block (from a Java perspective). I am not sure what your application tier is.
Check whether a SELECT..FOR UPDATE or a similar statement was recently written by mistake and is causing this.
Check whether any table has grown in size recently while the column is not indexed. This is a very important and frequent cause of SELECT statements running for several minutes. I'd also suggest using timeouts for SELECT statements in your application. https://www.postgresql.org/docs/9.5/gin-intro.html This can give you a head start.
Another thing that looks fishy to me is the JSONB column. Maybe your JSONB values are pretty long, or the queries are selecting the JSONB value even when it is not required?
Finally, if you don't need any special features of the JSONB data type, you could use the JSON data type, which is faster (magical maximum, sometimes 50x!)
It looks like the pooled connections are not getting closed properly, and a few queries might be taking a very long time to respond. As pointed out in other answers, the problem is in the application and could be a connection leak. Most likely, new transactions are piling up behind transactions that are already pending and unresolved, leading to a number of unclosed transactions.
In addition, PostgreSQL generally has one or more "helper" processes such as the stats collector, background writer, autovacuum daemon, walsender, etc., all of which show up as "postgres" instances.
First, I would suggest checking in which part of the code you initiate the queries. Try to dry run your queries outside the application and benchmark their performance.
Secondly, you can set a timeout for certain queries, if not all.
Thirdly, you can kill idle transactions after a certain timeout by using:
SET SESSION idle_in_transaction_session_timeout = '5min';
I hope it might work. Cheers!

Simulating an Oracle sequence with MongoDB

Our domain model deals with sales invoices, each of which has a unique, automatically generated number. When creating an invoice, our SalesInvoiceService retrieves a number from a SalesInvoiceNumberGenerator, creates a SalesInvoice using this number and a few other objects (seller, buyer, issue date, etc.) and stores it through the SalesInvoiceRepository. Since we are using MongoDB as our database, our MongoDbSalesInvoiceNumberGenerator uses a findAndModify command with $inc 1 on a given InvoicePolicies.nextSalesInvoiceNumber to generate this unique number, similar to what we would do using an Oracle sequence.
This works in normal situations. However, when invoice creation fails because of a broken business rule (e.g. invalid issue date), an exception is thrown and our InvoicePolicies.nextSalesInvoiceNumber has already been incremented. Obviously, since there is no transaction managing this unit of work, this increment is not rolled back, so we end up with lost invoice numbers. We do offer a manual compensation mechanism to the user, but we would like to avoid this sort of situation in the first place.
How would you deal with this situation? And no, switching to another database is not an option :)
Thanks!
TL;DR: What you want is strict serializability, but you probably won't get it, unless you give up concurrency completely (then you even get linearizability, theoretically). Gap-free is easy, but making sure that today's invoice doesn't get a lower number than yesterday's is practically impossible.
This is tricky, or at least, very expensive. That is also true for any other data store, because you'll have to limit the concurrency of the application to guarantee it. Think of an auto-increasing stamp that is passed around in an office, but some office workers lose letters. Tricky... But you can reduce the likelihood.
Generating sequences without gaps is hard when contention is high, and very hard in a distributed system. Keeping a lock for the entire time the invoice is generated is usually not an option, though that would be easy. So let's try that:
Easiest way out: Use a singleton background worker, i.e. a single-threaded process that runs on a single machine. Have it explicitly check whether the current number is really present in the invoice collection. Because it's single-threaded on a single machine, it can't have race conditions. Done, via limiting concurrency.
When allowing concurrency, things get messy:
It might be best to use something like a two-phase commit protocol. Essentially, make the entire invoice creation process a long-running transaction, and store the pending transactions explicitly, i.e. store all numbers that haven't been used yet, but reserved.
Then track the completion status of each and every transaction. If a transaction hasn't finished after some timeout, consider that number available again. It's hard enough to add that to the counter code, but it's possible (check if a timed out transaction is present, otherwise get a new counter value).
There are several possible errors, but they can all be resolved. This is better explained in the link and on the net. Generally, getting the implementation right is hard though.
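The reserve/track/time-out scheme described above can be sketched as follows. This is an in-memory, single-process illustration only: the class name, the token mechanism, and the injectable clock are all assumptions made for the example, and in MongoDB the pending reservations would have to live in a document that is updated atomically. The token guards against one of the possible errors: a slow creator committing a number that has already been handed to someone else after a timeout.

```python
import itertools
import time

class ReservingCounter:
    """Hands out numbers, reusing reservations that timed out without a commit."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock
        self._next_number = 1
        self._tokens = itertools.count(1)
        self._pending = {}  # number -> (token, reserved_at)

    def reserve(self):
        """Return (number, token), preferring the lowest timed-out number."""
        now = self._clock()
        expired = [n for n, (_, at) in self._pending.items()
                   if now - at >= self._timeout]
        if expired:
            number = min(expired)
        else:
            number = self._next_number
            self._next_number += 1
        token = next(self._tokens)
        self._pending[number] = (token, now)
        return number, token

    def commit(self, number, token):
        """Mark the number as used; fails if the reservation was reassigned."""
        current = self._pending.get(number)
        if current is None or current[0] != token:
            return False
        del self._pending[number]
        return True

# Demo with a fake clock so the timeout can be triggered deterministically.
fake_now = [0.0]
counter = ReservingCounter(timeout_seconds=30, clock=lambda: fake_now[0])
n1, t1 = counter.reserve()           # first creator, which then stalls
n2, t2 = counter.reserve()
ok2 = counter.commit(n2, t2)
fake_now[0] = 31.0                   # the first reservation has now timed out
n3, t3 = counter.reserve()           # picks up the abandoned number
late_commit = counter.commit(n1, t1) # stalled creator comes back too late
```

The hard-coded timeout_seconds is exactly the assumption about invoice creation time that the answer goes on to warn about.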
The timeout poses a problem, however, because you need to hard-code an assumption about the time it takes for invoices to be generated. That can be awkward close to day/month/year barriers, since you'll want to avoid creating invoice 12345 in 2015 and 12344 in 2014.
Even this won't guarantee gap-free numbers for limited time intervals: if no further request is made that could use the gap number in the current year, you're facing a problem.
I wonder whether findAndModify combined with the new Transactions API could achieve something like that while also accounting for gaps, if run within a transaction? I haven't personally tried it, and my project isn't far enough along yet to worry about the billing system, but I would love to be able to use the same database for everything to make things a bit easier to operate.
One problem, I would think, is a write bottleneck, but the update should only take a few milliseconds, I'd imagine, and you could probably use a different counter for every jurisdiction or store, as real-life stores do. The cash register number could be part of it too; in the digital world, the cash register could be the transaction processing server that handled the request if, say, you used microservices, so you could probably load balance round robin between them. That assumes it uses a per-document lock, which from my understanding it possibly does.
The main time I'd worry about this bottleneck is if you had a very popular store, or around Black Friday when there's a huge spike, or when running recurring invoices.

Controlling duration of PostgreSQL lock waits

I have a table called deposits
When a deposit is made, the table is locked, so the query looks something like:
SELECT * FROM deposits WHERE id=123 FOR UPDATE
I assume FOR UPDATE is locking the table so that we can manipulate it without another thread stomping on the data.
The problem occurs though, when other deposits are trying to get the lock for the table. What happens is, somewhere in between locking the table and calling psql_commit() something is failing and keeping the lock for a stupidly long amount of time. There are a couple of things I need help addressing:
Subsequent queries trying to get the lock should fail, I have tried achieving this with NOWAIT but would prefer a timeout method (because it may be ok to wait, just not wait for a 'stupid amount of time')
Ideally I would head this off at the pass, and have my initial query only hold the lock for a certain amount of time, is this possible with postgresql?
Is there some other magic function I can tack onto the query (similar to NOWAIT) which will only wait for the lock for 4 seconds before failing?
Due to the painfully monolithic spaghetti code nature of the code base, it's not simply a matter of changing global configs; it kinda needs to be a per-query based solution
Thanks for your help guys, I will keep poking around but I haven't had much luck. Is this a non-existing function of psql, because I found this: http://www.postgresql.org/message-id/40286F1F.8050703#optusnet.com.au
I assume FOR UPDATE is locking the table so that we can manipulate it without another thread stomping on the data.
Nope. FOR UPDATE locks only those rows, so that another transaction that attempts to lock them (with FOR SHARE, FOR UPDATE, UPDATE or DELETE) blocks until your transaction commits or rolls back.
If you want a whole table lock that blocks inserts/updates/deletes you probably want LOCK TABLE ... IN EXCLUSIVE MODE.
Subsequent queries trying to get the lock should fail, I have tried achieving this with NOWAIT but would prefer a timeout method (because it may be ok to wait, just not wait for a 'stupid amount of time')
See the lock_timeout setting. This was added in 9.3 and is not available in older versions.
Crude approximations for older versions can be achieved with statement_timeout, but that can lead to statements being cancelled unnecessarily. If statement_timeout is 1s and a statement waits 950ms on a lock, it might then get the lock and proceed, only to be immediately cancelled by a timeout. Not what you want.
There's no query-level way to set lock_timeout, but you can and should just:
SET LOCAL lock_timeout = '1s';
after you BEGIN a transaction.
Ideally I would head this off at the pass, and have my initial query only hold the lock for a certain amount of time, is this possible with postgresql?
There is a statement timeout, but locks are held at transaction level. There's no transaction timeout feature.
If you're running single-statement transactions you can just set a statement_timeout before running the statement to limit how long it can run for. This isn't quite the same thing as limiting how long it can hold a lock, though, because it might wait 900ms of an allowed 1s for the lock, only actually hold the lock for 100ms, then get cancelled by the timeout.
Is there some other magic function I can tack onto the query (similar to NOWAIT) which will only wait for the lock for 4 seconds before failing?
No. You must:
BEGIN;
SET LOCAL lock_timeout = '4s';
SELECT ....;
COMMIT;
Due to the painfully monolithic spaghetti code nature of the code base, it's not simply a matter of changing global configs; it kinda needs to be a per-query based solution
SET LOCAL is suitable, and preferred, for this.
There's no way to do it in the text of the query, it must be a separate statement.
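For a code base where each call site needs its own timeout, the separate statement can be hidden in a small helper. The sketch below is an assumption-laden illustration: it expects a DB-API connection that uses %s placeholders and opens a transaction implicitly on the first statement (as psycopg2 does by default), and the helper name is made up. It uses set_config('lock_timeout', ..., true), which is the function form of SET LOCAL and accepts a bound parameter. The caller still has to commit or roll back, since the row locks are held until the transaction ends.

```python
def lock_rows_with_timeout(conn, sql, params=(), timeout="4s"):
    """Run a locking query with a transaction-local lock_timeout.

    The transaction is left open on success so the caller can do its work
    and then commit; on any error the transaction is rolled back.
    """
    cur = conn.cursor()
    try:
        # is_local = true limits the setting to the current transaction,
        # like SET LOCAL lock_timeout = '4s'.
        cur.execute("SELECT set_config('lock_timeout', %s, true)", (timeout,))
        cur.execute(sql, params)
        return cur.fetchall()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()

# Hypothetical usage with an open connection `conn`:
#   rows = lock_rows_with_timeout(
#       conn, "SELECT * FROM deposits WHERE id = %s FOR UPDATE", (123,))
#   ... update the deposit ...
#   conn.commit()
```

If the lock cannot be obtained within the timeout, PostgreSQL raises an error for the statement, which the helper turns into a rollback and re-raises.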
The mailing list post you linked to is a proposal for an imaginary syntax that was never implemented (at least in a public PostgreSQL release) and does not exist.
In a situation like this you may want to consider "optimistic concurrency control", often called "optimistic locking". It gives you greater control over locking behaviour at the cost of increased rates of query repetition and the need for more application logic.
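As a rough illustration of optimistic locking, the sketch below keeps a version number on each row and makes the UPDATE conditional on the version that was read. It runs against an in-memory SQLite database purely so it is self-contained; the balance and version columns and the function names are assumptions for the example, not part of the original deposits table.

```python
import sqlite3

def read_deposit(conn, deposit_id):
    """Return (balance, version) without taking any lock."""
    return conn.execute(
        "SELECT balance, version FROM deposits WHERE id = ?", (deposit_id,)
    ).fetchone()

def write_if_unchanged(conn, deposit_id, new_balance, expected_version):
    """Apply the update only if nobody changed the row since it was read."""
    cur = conn.execute(
        "UPDATE deposits SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, deposit_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # False means a conflict: re-read and retry

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deposits (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO deposits VALUES (123, 100, 1)")
conn.commit()

# Two workers read the same state, then both try to write.
balance_a, version_a = read_deposit(conn, 123)
balance_b, version_b = read_deposit(conn, 123)
first_ok = write_if_unchanged(conn, 123, balance_a + 50, version_a)
second_ok = write_if_unchanged(conn, 123, balance_b + 25, version_b)

# The losing worker re-reads and repeats its change on the fresh state.
balance_b, version_b = read_deposit(conn, 123)
retry_ok = write_if_unchanged(conn, 123, balance_b + 25, version_b)
final_state = read_deposit(conn, 123)
```

No transaction ever waits on a lock here; the cost is the retry logic in the application, which is the trade-off mentioned above.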

Is it possible to pause an SQL query?

I've got a really long-running SQL query (data import, etc). It's crap: it uses cursors and it's running slowly. It's doing its job, so I'm not too worried about performance.
Anyway, can I pause it for a while (instead of canceling the query)?
It chews up a bit of CPU, so I was hoping to pause it, do some other stuff ... then resume it.
I'm assuming the answer is 'NO' because of how rows and data get locked, etc.
I'm using SQL Server 2008, btw.
The best approximation I know for what you're looking for is
BEGIN
WAITFOR DELAY 'TIME';
EXECUTE XXXX;
END;
GO
Not only can you not pause it, doing so would be bad. SQL queries hold locks (for transactional integrity), and if you paused the query, it would have to hold any locks while it was paused. This could really slow down other queries running on the server.
Rather than pause it, I would write the query so that it can be terminated, and pick up from where it left off when it is restarted. This requires work on your part as a query author, but it's the only feasible approach if you want to interrupt and resume the query. It's a good idea for other reasons as well: long running queries are often interrupted anyway.
Click the debug button instead of execute. SQL Server 2008 introduced the ability to debug queries on the fly. Put breakpoints at convenient locations.
When working on similar situations, where I was trying to go through an entire list of data, which could be huge, and could tell which ones I have visited already, I would run the processing in chunks.
update or whatever
where (still not done)
limit 1000
And then I would just keep running the query until no more rows are modified. This breaks the locks up into reasonable time chunks and lets you do things like move tens of millions of rows between tables while a system is in production.
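The chunked loop could be driven from a script along these lines. This is a sketch using an in-memory SQLite table so it runs on its own; the items table and done column are made up for the example. On SQL Server the statement itself would differ, since it limits rows with UPDATE TOP (n) rather than LIMIT.

```python
import sqlite3

def process_in_chunks(conn, chunk_size=1000):
    """Mark rows as done in small committed batches until none remain."""
    total_rows = 0
    chunks = 0
    while True:
        cur = conn.execute(
            "UPDATE items SET done = 1 WHERE id IN "
            "(SELECT id FROM items WHERE done = 0 LIMIT ?)",
            (chunk_size,),
        )
        conn.commit()  # committing per chunk releases the locks between batches
        if cur.rowcount == 0:
            break
        total_rows += cur.rowcount
        chunks += 1
    return total_rows, chunks

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, done INTEGER NOT NULL DEFAULT 0)")
conn.executemany("INSERT INTO items (done) VALUES (?)", [(0,)] * 2500)
conn.commit()

total_rows, chunks = process_in_chunks(conn, chunk_size=1000)
remaining = conn.execute("SELECT count(*) FROM items WHERE done = 0").fetchone()[0]
```

Because progress is stored in the data itself, the script can be stopped at any point and simply started again later, which is as close to pause and resume as this approach gets.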
Jacob
Instead of pausing the script, perhaps you could use resource governor. That way you could allow the script to run in the background without severely impacting performance of other tasks.
MSDN-Resource Governor