How to handle multiple queries in PostgreSQL?

I have data that requires a daily delsupsert into 10 PostgreSQL tables, called from Python via psycopg2. My current (single) query has 30 CTEs of inserts, updates, and deletes, which, as far as I know, only get executed at the end.
It's clean, but it's challenging to debug and complicated to understand. What's the best way to refactor this for code readability and easier debugging? (Stored procedures? RAISE statements?)
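One possible shape for the stored-procedure / RAISE route the question mentions: each table's delete and upsert becomes its own statement inside a plpgsql procedure, with a RAISE NOTICE between steps so a failure or a slow step can be pinned down. This is only a sketch; the procedure, staging and target table names are made up, and it assumes a unique index on the target's id column.

CREATE OR REPLACE PROCEDURE daily_delsupsert()
LANGUAGE plpgsql
AS $$
BEGIN
    RAISE NOTICE 'step 1: deleting stale rows from target_table_1';
    DELETE FROM target_table_1 t
    USING staging_table s
    WHERE t.id = s.id AND s.op = 'delete';

    RAISE NOTICE 'step 2: upserting into target_table_1';
    INSERT INTO target_table_1 (id, payload)
    SELECT id, payload FROM staging_table WHERE op <> 'delete'
    ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload;

    -- ... one pair of steps like this per table, for all 10 tables ...
END;
$$;

-- called once a day from psycopg2:
CALL daily_delsupsert();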

Related

Query tuning a PostgreSQL stored procedure which has 1000 queries inside

I want to tune my PostgreSQL stored procedure, which has 1000 queries inside. My SP has suddenly started to lack performance.
How can I debug which query inside the SP is dragging down performance? EXPLAIN ANALYZE doesn't really show much in the way of stats for an SP.
Thanks for your help.
Your best bet is auto_explain with auto_explain.log_nested_statements, auto_explain.log_analyze and auto_explain.log_buffers turned on.
Then the execution plans of all SQL statements get logged along with their durations.
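For reference, a minimal way to switch those parameters on for the current session (LOAD needs superuser; to capture statements from every session you would instead put auto_explain in shared_preload_libraries or session_preload_libraries in postgresql.conf). The procedure name at the end is hypothetical.

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log the plan of every statement
SET auto_explain.log_nested_statements = on;  -- include statements run inside functions/procedures
SET auto_explain.log_analyze = on;            -- log actual run times, not just estimates
SET auto_explain.log_buffers = on;            -- include buffer usage (requires log_analyze)

CALL my_procedure();  -- the per-statement plans now appear in the server log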
I think that if you have a single function with 1000 different SQL statements inside, your design could be improved.

How to DoS the Postgres Database

I am doing some small research work on how to protect a database against a DDoS attack.
I am using a Postgres database in my testing.
I want to perform a DDoS of the database on my local machine, but I don't really know how to do it.
My plan is to create a script that runs a bunch of queries, but I want these queries to take as much time to complete as possible.
I saw this example: select tab1 from (select decode(encode(convert(compress(post) using latin1),concat(post,post,post,post)),sha1(concat(post,post,post,post))) as tab1 from table_1)a;
But I am failing to replicate it in postgres.
I need help translating this query to Postgres, or with other examples of functions or queries that would take a long time to complete.
Edit:
Sleep functions might not work; they don't load the system enough.
In my understanding, a DDoS should be performed with functions that take a long time to run and suck up tons of the system's compute power.
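For example, something along these lines keeps a Postgres backend CPU-bound for a while using nothing but built-ins (generate_series plus md5); the row count is arbitrary and only determines how long it grinds:

SELECT count(md5(g::text))
FROM generate_series(1, 50000000) AS g;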

Azure Database, EF, Time out issues

I have taken over an existing MVC website which uses Entity Framework and Hangfire, is hosted on Azure, and uses an Azure database. Every so often the website times out.
I'm new to the Azure portal, Entity Framework and Hangfire.
If I increase the DTUs, it clears the timeout issues.
I'm looking for ways to diagnose why the website times out. I have added error logging using ELMAH and checked Hangfire, but this doesn't give me any further information.
Is there anything in the Azure portal that can help?
If it "times out" and if "increasing DTU resolves timeouts" and these observations are true (I think it's on you to really convince yourself this is absolutely true, don't make this assumption lightly) then the usual and obvious candidate is "a slow sql query". Entity Framework is often used with linq to create sql queries without writing sql. These queries are often fine for very simple tasks, such as someData.Where(x=>x.Id == 1).First(), however, if linq is used to join tables, or create complex associations, the generated sql can become monstrously bad, from a performance perspective. You can add logging to write out the sql generated by linq, or you can try to trace the database to see what sql is running on it. If tracing is out of the question, there are still meta queries you can use to view things like cached query plans and SQL Server can give you estimated costs and cached execution counts.
You can still hang yourself without using LINQ. You can still use stored procedures with EF. Way too many developers are still naive about SQL performance; you need to comb over your back end, learn the schema and the stored procedures, and inspect the SQL contents of everything. Check for any database triggers (easy to miss). Red flags are subqueries, too many joins, too many results from a query, lots of string manipulation in a query, joining tables on strings, or XML/JSON-based SQL work.
Be aware that slow SQL queries will become slower when load is high, and when slow SQL queries pile up, they only take more time to resolve. This can also cause debilitating table locking, depending on the nature of the query.
But queries can be performant and still cause locking, e.g. one table is being written to often and it's blocking other writes or reads from that table. This is a little harder to diagnose, but you can figure it out by carefully inspecting logs of database calls and how long they take to execute. There are also SQL queries you can run on the database to diagnose long-running queries, or to see what tables are locked at a given point in time; see the example below.
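For instance, a quick look at what is running and what is blocked right now, again via the SQL Server / Azure SQL DMVs:

SELECT r.session_id,
       r.status,
       r.wait_type,
       r.blocking_session_id,   -- non-zero means this request is waiting on another session
       r.total_elapsed_time,
       t.text AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
ORDER BY r.total_elapsed_time DESC;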
Finally, check for any back-end webjobs for your application. If timeouts occur on recurring days or at recurring times, then somebody's batch SQL could be blocking your production database from being read.
But this is all speculation. I think you need to do more research to determine what is actually causing the site to become unresponsive. If you can log response times for common queries, you can rule SQL-based latency in or out as the culprit and work from there. There's nothing inherently "amiss" about any of the technologies you specified.
If queries are performant but still causing issues, a long-term solution is to add something like a message queue and batch your SQL work intelligently, or just make the database work asynchronous so it doesn't block the UI.
You should correlate any logged timeouts with Azure's monitoring. Azure can give you CPU/RAM/page visits and such on the dashboard.
SQL Azure is a bit of a different beast. It doesn't have the on-demand performance of a dedicated DB unless you're prepared to throw serious $$ at it. And even then ...
EF, when used well, can perform quite well. When used poorly it can be a dog, and those problems are compounded on a platform like SQL Azure.
The first thing is to check that your EF contexts are set up to use an execution strategy suited to Azure: https://learn.microsoft.com/en-us/ef/ef6/fundamentals/connection-resiliency/retry-logic
The next thing would be to see what kinds of SQL tracing you can run on Azure. Tracing is essential to see what EF is doing behind the scenes. I'm not familiar with the tools available for Azure; in my case my Azure experience was running SQL Server on VMs, because SQL Azure was too immature, not HIPAA compliant at the time, and expensive for the DTU estimates we were able to get. Worst case, can you restore a database backup into a SQL Server instance and temporarily point a copy of your application environment at that to run through common usage scenarios? Using a SQL trace you can pick up on exactly when and how often EF is executing queries, and what queries it is executing.
Things to look at:
How many queries are running? If you are loading a set of records and expect one query, are there a whole heap of queries getting sent? This would indicate lazy-load calls being triggered.
What queries are being run? Are they selecting a lot more fields than are being displayed? This is potentially a case where entire entities are being loaded where a .Select() could be used to reduce the amount of data, or even a case where entire sets of entities are being loaded that aren't relevant to what is displayed/done, such as someone calling .ToList() prior to just doing a .Count() or .Any(), or doing a .FirstOrDefault() just to do a != null check.
Is the database properly indexed? Copy some of the heavier queries into SQL Manager and execute them with an execution plan. Are there indexing suggestions?
The common sins of developing with EF and other ORMs boil down to "pulling too much, too often." It's surprising how many clients I've worked with have development teams that have not used a profiler to inspect their ORM use efficiency. (and I'm talking 0% so far.)

How does PostgreSQL lock tables when inserting and selecting?

I'm migrating data from one table to another in an environment where any long locks or downtime are not acceptable, in total about 80,000 rows. Essentially the query boils down to this simple case:
INSERT INTO table_2
SELECT * FROM table_1
JOIN table_3 on table_1.id = table_3.id
All 3 tables are being read from and could have an insert at any time. I want to just run the query above, but I'm not sure how the locking works and whether the tables will be totally inaccessible during the operation. My understanding tells me that only the affected rows (newly inserted) will be locked. Table 1 is just being selected, so no harm, and concurrent inserts are safe so table 2 should be freely accessible.
Is this understanding correct, and can I run this query in a production environment without fear? If it's not safe, what is the standard way to accomplish this?
You're fine.
If you're interested in the details, you can read up on multiversion concurrency control, or on the details of the Postgres MVCC implementation, or how its various locking modes interact, but the implications for your case are nicely summarised in the docs:
reading never blocks writing and writing never blocks reading
In short, every record stored in the database has some version number attached to it, and every query knows which versions to consider and which to ignore.
This means that an INSERT can safely write to a table without locking it, as any concurrent queries will simply ignore the new rows until the inserting transaction decides to commit.
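To make that concrete, here is roughly how it plays out with the migration query from the question run in two concurrent psql sessions (default READ COMMITTED isolation):

-- Session A:
BEGIN;
INSERT INTO table_2
SELECT * FROM table_1
JOIN table_3 ON table_1.id = table_3.id;

-- Session B, while A is still open:
SELECT count(*) FROM table_2;  -- runs immediately, is not blocked, and does not see A's new rows yet

-- Session A:
COMMIT;  -- only now do the new rows become visible to queries in other sessions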

Pg-promise vs sequelize, To handle millions of sql queries

The pg-promise module provides the method sequence to execute infinite sequences, suitable for massive transactions, like bulk inserts with way over 1,000 records. It also supports query streams for high-performance, read-only queries.
Does Sequelize offer anything similar to those things?
Sorry for asking such basic things, but I am new and don't have any idea about these two.
Thanks for your response and suggestions.
This answer will probably anger some of the Sequelize fans, but in the name of truth...
What you're referring to within pg-promise is fully documented in Data Imports.
And no, Sequelize doesn't have anything like that.
In addition, Sequelize is known to be plagued by performance issues, like this one: Transactions extremely slow when inserting 1000s of records, an issue that is both very bad and very old.