Multiple threads in db2luw - db2

I am very new to Db2. I have developed a few procedures that perform operations on a Db2 database, and my question is how to run multiple threads on the Db2 server concurrently. I have a database with 70,000 tables, each holding more than 1,000 records, and a procedure that updates all 70,000 tables, so run time is the main concern. I want to divide the update work across 10 threads, each updating 7,000 tables, and run all 10 threads simultaneously.
Can someone kindly tell me how to achieve this?
I am using DB2 Express-C on Windows.

There's nothing in DB2 for creating multiple threads.
The enterprise-level edition of DB2 will automatically process a single statement across multiple cores when and where needed, but that's not what you're asking for.
I don't believe any SQL-based RDBMS allows a stored procedure to create its own threads. The whole point of SQL is that it's a higher level of abstraction; you don't have access to those kinds of details.
You'll need to write an external app in a language that supports threads and have it open 10 connections to the DB simultaneously. Depending on the specifics of the update you're doing and the hardware you have, you might find that 10 connections is too many.
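As a rough illustration of the external-app approach, here is a minimal Python sketch that splits the table list into batches and gives each worker thread its own connection. The `connect` factory and the `update_table` function are hypothetical placeholders (they are not part of DB2 or any driver); in practice they would wrap whatever DB-API driver and update statement you use.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(items, n_batches):
    """Split items into n_batches contiguous slices of near-equal size."""
    size, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        end = start + size + (1 if i < extra else 0)
        batches.append(items[start:end])
        start = end
    return batches


def update_batch(connect, update_table, tables):
    """Worker: open one dedicated connection and update every table in the batch."""
    conn = connect()  # hypothetical factory that opens a new DB connection
    try:
        for table in tables:
            update_table(conn, table)  # hypothetical per-table update
        conn.commit()
    finally:
        conn.close()
    return len(tables)


def run_parallel(connect, update_table, tables, n_workers=10):
    """Run one batch per worker thread and return the tables handled by each."""
    batches = split_into_batches(tables, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(update_batch, connect, update_table, b)
                   for b in batches]
        return [f.result() for f in futures]
```

Keeping `n_workers` as a parameter makes it easy to try fewer than 10 connections if the server or hardware cannot keep up.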

To elaborate on Charles's correct answer, it is up to the client application to parallelize its DML workload by opening multiple connections to the database. You could write such a program on your own, but many ETL utilities provide components that enable parallel workflows similar to what you've described. Aside from reduced programming, another advantage of using an ETL tool to define and manage a multi-threaded database update is built-in exception handling, making it easier to roll back all of the involved connections if any of them encounter an error.

Related

Postgres architecture for one machine with several apps

I have one machine that hosts several applications. The applications work on separate data and don't interact: each application only needs access to its own data. I want to use PostgreSQL as the RDBMS. Which of the following is best, and why?
(1) One global Postgres server, one global database, one schema per application.
(2) One global Postgres server, one database per application.
(3) One Postgres server per application.
Feel free to suggest additional architectures if you think they would be better than the ones above.
The question you need to ask yourself is: does any application ever need to access data from another application (in the same SQL statement)? If you can answer that with a clear NO, then you should at least go for separate databases. Cross-database queries aren't that straightforward in Postgres, so if the different applications do need a lot of data from other applications, then solution 1 might be the deployment layout to consider. If this only concerns very few tables, then using foreign data wrappers with different databases might still be a better solution.
Solutions 2 and 3 are more or less the same from the perspective of each application. One thing to keep in mind when deciding between 2 and 3 is availability. Some configuration changes to Postgres require a restart of the service. Is an outage of all applications acceptable in that case, even though the change was only necessary for one?
But you can always start with option 2 and then move databases to different servers later.
Another question to ask is whether all applications will always use the same (major) Postgres version. With solution 2, every application must be compatible with a new Postgres version before any one of them can upgrade, e.g. because it wants to use new features.
Solution 1 is stupid: a SQL schema is not a database. Use SQL schemas for one application that has multiple "parts" like "Production", "sales", "marketing", "finances"...
As long as the final volume of data is not too large and the number of users is not too high, use a single PG cluster to simplify administration tasks.
If the volume of data or the number of users increases, it will be time to separate your different databases onto new, distinct PG clusters.

What would happen if I run two SQL commands using the same DB connection?

I'm writing a program to run mass calculation and output results into PostgreSQL.
My platform is Windows Server 2008 with PostgreSQL 10. My program is written in C.
The results are produced group by group, and each time a group finishes, an extra thread is created to write its output.
Since the output threads are created one by one, it is possible that two or more SQL commands will be issued simultaneously, or that the previous one is still being processed when a new thread calls the function.
So my questions are:
(1) What would happen if one thread is in the middle of SQL processing and another thread calls PQexec(PGconn *conn, const char *query) on the same connection? Would they affect each other?
(2) What if I use a different PGconn for each thread? Would that speed things up?
If you try to call PQexec on a connection that is in the process of executing an SQL statement, you would cause a protocol violation. That just doesn't work.
Processing could certainly be made faster if you use several database connections in parallel; concurrent transactions are something that PostgreSQL is designed for.
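The rule above (never share one connection between threads that run at the same time, give each thread its own) can be sketched in Python as follows. This is only an illustration of the pattern, not libpq code: `connect` is a hypothetical factory standing in for whatever opens a new connection, such as `PQconnectdb` in the C program.

```python
import threading


class PerThreadConnections:
    """Hand each thread its own connection so no connection is ever shared."""

    def __init__(self, connect):
        self._connect = connect          # hypothetical connection factory
        self._local = threading.local()  # separate storage for every thread

    def get(self):
        """Return the calling thread's connection, opening it on first use."""
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = self._connect()
            self._local.conn = conn
        return conn
```

Each output thread would call `get()` before writing its results, so two threads never issue commands on the same connection.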

Data mining with postgres in production environment - is there a better way?

There is a web application that has been running for years, and during its lifetime it has gathered a lot of user data. The data is stored in a relational DB (Postgres). Not all of this data is needed to run the application (to do the business). However, from time to time business people ask me to provide reports on this data, and this causes some problems:
sometimes these SQL queries are long-running
queries are executed against the production DB (not cool)
it is not easy to deliver reports on a weekly or monthly basis
some parts of the data are stored in a way that is not suitable for such querying (queries are inefficient)
My idea (note that I am a developer, not a data mining specialist) for improving this whole process of delivering reports is:
create a separate DB which is regularly updated with production data
optimize how data is stored
create a dashboard to present reports
Question: Is there a better way? Is there another DB that is better suited to this kind of data analysis? Or should I look into modern data mining tools?
Thanks!
Do you really do data mining (as in: classification, clustering, anomaly detection), or is "data mining" for you any reporting on the data? In the latter case, all the "modern data mining tools" will disappoint you, because they serve a different purpose.
Have you used the indexing functionality of Postgres well? Your scenario sounds as if selection and aggregation are most of the work, and SQL databases are excellent for this - if well designed.
For example, materialized views and triggers can be used to process data into a schema more usable for your reporting.
There are a thousand ways to approach this issue, but I think that the path of least resistance for you would be Postgres replication. Check out this Postgres replication tutorial for a quick proof of concept. (There are many hits when you Google for Postgres replication and that link is just one of them.) Here is a link documenting streaming replication from the PostgreSQL site's wiki.
I am suggesting this because it meets all of your criteria and also stays within the bounds of the technology you're familiar with. The only learning curve would be the replication part.
Replication solves your issue because it would create a second database which would effectively become your "read-only" db which would be updated via the replication process. You would keep the schema the same but your indexing could be altered and reports/dashboards customized. This is the database you would query. Your main database would be your transactional database which serves the users and the replicated database would serve the stakeholders.
This is a wide topic, so please do your due diligence and research it. But it's also something that can work for you and can be quickly turned around.
If you really want to try data mining with PostgreSQL, there are some tools you can use.
The very simple way is KNIME. It is easy to install and has full-featured data mining tools. You can access your data directly from the database, process it, and save it back to the database.
The hardcore way is MADlib. It installs data mining functions written in Python and C directly in Postgres, so you can mine with SQL queries.
Both projects are stable enough to try.
For reporting, we use a non-transactional (read-only) database. We don't care about normalization. If I were you, I would use another database for reporting. I would design the tables following OLAP principles (star schema, snowflake) and use an ETL tool to dump the data periodically (maybe weekly) to the read-only database, then create reports from it.
Reports are used for decision support, so they don't have to be real-time and usually don't have to be current. In other words, it is acceptable for a report to cover data only up to last week or last month.
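To make the periodic dump concrete, here is a minimal sketch of such a refresh step. It uses Python's built-in `sqlite3` module purely as a stand-in for both the production and the reporting database, and the `orders` and `daily_sales` tables with their columns are invented for illustration; a real setup would use Postgres connections and your own schema.

```python
import sqlite3


def refresh_report(prod, report):
    """Rebuild a denormalized daily summary in the reporting database."""
    # Aggregate once on the production side (hypothetical `orders` table).
    rows = prod.execute(
        "SELECT date(created_at), user_id, COUNT(*), SUM(amount) "
        "FROM orders GROUP BY date(created_at), user_id"
    ).fetchall()
    with report:  # one transaction: commit on success, roll back on error
        report.execute(
            "CREATE TABLE IF NOT EXISTS daily_sales ("
            "day TEXT, user_id INTEGER, order_count INTEGER, total REAL, "
            "PRIMARY KEY (day, user_id))"
        )
        report.execute("DELETE FROM daily_sales")  # full refresh, no history
        report.executemany(
            "INSERT INTO daily_sales VALUES (?, ?, ?, ?)", rows
        )
    return len(rows)
```

A scheduler (cron, for example) could call `refresh_report` weekly, so report queries only ever touch the small pre-aggregated table and never the production database.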

How to create multiple instance of sqlite database?

I am making an online app in which, when I sync my data with the web, 25 to 30 local database queries against different tables are executed. This takes around 25 to 30 seconds, because every query works the same way: it first checks whether the data is present in the local database, then updates the row if it is present and inserts it otherwise. Is there any way to execute all of these queries concurrently? If I could do this, I would save 10 to 15 seconds on every sync. Please suggest a better way to execute multiple queries.
Consider using a high-performance database management system such as cubeSQL:
SQLabs has announced the release of cubeSQL a fully featured and high
performance relational database management system built on top of the
sqlite database engine. It is the ideal database server for both
developers who want to convert a single user database solution to a
multiuser project and for companies looking for an affordable, easy to
use and easy to maintain database management system. cubeSQL runs on
Windows, Mac, Linux and it can be embedded into any iOS and Cocoa
application.
cubeSQL is incredibly fast, has a small footprint, is highly reliable
and it offers some unique features. It can be easily accessed with any
JSON client, with PHP, with the native C SDK, with a Windows DLL and
with an highly optimized REAL Studio plugin.
It is not possible to run two or more write queries against the same database at the same time, because a running write locks the database.
If all the queries you want to execute relate to different tables, you can create a separate database file for every table.
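A minimal sketch of the one-file-per-table idea, written in Python with the built-in `sqlite3` module: each worker thread opens its own connection to its own database file, so the writes do not block each other. The `items` table and its columns are invented for illustration, and `INSERT OR REPLACE` is used here as an assumption in place of the separate check, update, and insert steps described in the question.

```python
import os
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def sync_table(db_path, rows):
    """Insert new rows or replace existing ones in a single-table database file."""
    conn = sqlite3.connect(db_path)  # opened inside the worker thread
    try:
        with conn:  # one transaction for the whole batch
            conn.execute(
                "CREATE TABLE IF NOT EXISTS items "
                "(id INTEGER PRIMARY KEY, value TEXT)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO items (id, value) VALUES (?, ?)", rows
            )
        return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    finally:
        conn.close()


def sync_all(directory, data_by_table):
    """Sync every table in parallel, one database file per table."""
    workers = max(1, len(data_by_table))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(
                sync_table, os.path.join(directory, name + ".db"), rows
            )
            for name, rows in data_by_table.items()
        }
        return {name: f.result() for name, f in futures.items()}
```

The trade-off is that queries can no longer join across tables without attaching the other files, so this only fits data that is truly independent per table.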

Does PostgreSQL allow running stored procedures in parallel?

I'm working with an ETL tool, Business Objects Data Services, which has the capability of specifying parallel execution of functions. The documentation says that before you can do this, you have to make sure that your database, which in our case is Postgres, allows "a stored procedure to run in parallel". Can anyone tell me if Postgres does that?
Sure. Just run your queries in different connections, and they will run in parallel transactions. Beware of locking though.
You can also call different stored procedures from the same connection (and effectively still run them in parallel) by using DBLink.
See this SO answer for an example.