Is it possible to access an Apache::DBI database handle from a Perl script that isn't running under mod_perl?
What I am looking for is database connection pooling for my Perl scripts: I have a fair number of database sources (Oracle/MySQL) and a growing number of scripts.
Ideas like SQLRelay, using Oracle 10g XE with database links and pooling, or converting all the scripts to SOAP calls are becoming more and more viable. But if there were a mechanism for reusing Apache::DBI connections, I could hold this off for a while.
I have no non-Perl requirements, so there is no PHP/JDBC implementation or similar to deal with.
Thanks
First off, it helps to remember that DBI/DBD is not a wire protocol but an API over diverse data sources.
Since you want to connect to a pool of database connections from separate processes, DBIx::Connector is not appropriate, and Rose::DB seems an odd choice too (both are wrappers over DBI). You are looking for something like DBD::Proxy or DBD::Gofer, which let you connect multiple processes to a shared database handle, as sketched below.
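For illustration, a minimal client-side sketch using DBD::Proxy. It assumes a DBI::ProxyServer is already running on a shared host (e.g. started with "dbiproxy --localport 3334"); the hostname, port, inner DSN, and credentials are all illustrative.

    use strict;
    use warnings;
    use DBI;

    # Connect through the proxy; the proxy holds the real DBD::Oracle handle.
    my $dbh = DBI->connect(
        'dbi:Proxy:hostname=dbhost.example.com;port=3334;dsn=dbi:Oracle:orcl',
        'scott', 'tiger',
        { RaiseError => 1 },
    );

    # Statements are forwarded over the wire and executed on the server side.
    my ($today) = $dbh->selectrow_array('SELECT sysdate FROM dual');
    print "server time: $today\n";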
Related
I have an unfortunate situation where multiple Perl processes write and read the same SQLite3 database at the same time.
This often causes the Perl processes to crash, as two processes write at the same time, or one process reads from the database while another tries to update the same record.
Does anyone know how I could coordinate the multiple processes to work with the same SQLite database?
I'll be working on moving this system to a different database engine but before I do that, I somehow need to fix it to work as it is.
SQLite is designed to be used from multiple processes. There are some exceptions if you host the SQLite file on a network drive, and there may be a way to compile it such that it expects to be used from one process, but I use it from multiple processes regularly. If you are experiencing problems, try increasing the timeout value. SQLite uses filesystem locks to protect the data from simultaneous access, so if one process is writing to the file, a second process might have to wait. I set my timeouts to 3 seconds and have had very few problems with that.
Here is how to set the timeout value.
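A minimal sketch, assuming DBD::SQLite as the driver (the 3-second value mirrors the answer above):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:SQLite:dbname=app.db', '', '',
                           { RaiseError => 1 });

    # Wait up to 3000 ms for a competing lock instead of failing at once.
    $dbh->sqlite_busy_timeout(3000);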
I have a project that stores several million domain names in a database and performs lookups to check whether a given domain is present in the DB. The only operation I need is to check if a given value exists: no range queries, no additional information, nothing.
The number of queries I make to the database is rather large, for example 100,000 per user session.
I get a new database once a day, and even though it would be possible to work out which records were deleted and which were added, I don't think it's worth it. So I import the database into a new table and point the script at the new name.
I'm looking for a solution that can make the whole thing faster, as I don't use any SQL features. Lookup and import time are what matter to me.
My server can't hold this database in memory, not even half of it, so I think some NoSQL solution that works from the hard drive could help me.
Can you suggest something?
A much smaller and faster solution would be to use Berkeley DB with the key-value pair API. Berkeley DB is a database library that links into your application, so there is no client/server overhead and no separate server to install and manage. It is very straightforward and offers, among several APIs, a simple key-value (NoSQL) API with all of the basic data management routines you would expect to find in a much larger, more complex RDBMS (indexing, secondary indexes, foreign keys), but without the overhead of a SQL engine.
Disclaimer: I am the Product Manager for Berkeley DB, so I am a little biased. That said, it was designed to do exactly what you're asking for -- straightforward, fast, scalable key-value data management without unnecessary overhead.
In fact, there are many "database domain" type application services that use Berkeley DB as their primary data store. Most of the open source and/or commercial LDAP implementations use Berkeley DB (including OpenLDAP, Red Hat's LDAP, Sun Directory Server, etc.). Cisco, Juniper, AT&T, Alcatel, Mitel, Motorola and many others use Berkeley DB in their gateway, authentication, and configuration management systems, because it does exactly what they need and is very fast, scalable, and reliable.
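For illustration, a minimal sketch of the key-value approach using Perl's core DB_File binding to Berkeley DB (the file name and key are made up):

    use strict;
    use warnings;
    use Fcntl;
    use DB_File;

    # Tie a hash to a B-tree file; keys are domains, values are unused.
    tie my %domains, 'DB_File', 'domains.db', O_RDWR | O_CREAT, 0644, $DB_BTREE
        or die "Cannot open domains.db: $!";

    # Import: one key per domain name.
    $domains{'example.com'} = 1;

    # Lookup: exists() only has to find the key, not read a value.
    print "found\n" if exists $domains{'example.com'};

    untie %domains;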
You could get by quite nicely with just a Bloom filter if you can accept a very small false positive rate (assuming you use a large enough filter).
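A sketch using the CPAN Bloom::Filter module; the capacity and error rate are illustrative and would need to be sized to the real data set:

    use strict;
    use warnings;
    use Bloom::Filter;

    my $filter = Bloom::Filter->new(
        capacity   => 10_000_000,   # expected number of domains
        error_rate => 0.001,        # acceptable false-positive rate
    );

    $filter->add('example.com');

    # check() can return a rare false positive, but never a false negative.
    print "probably present\n" if $filter->check('example.com');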
On the other hand, you could certainly use Cassandra. It makes heavy use of Bloom filters, so asking for something that doesn't exist is quick, and you don't have to worry about false positives. It's designed to handle data sets that do not fit into memory, so performance degradation there is quite smooth.
Importing any amount of data should be quick -- on a normal machine, Cassandra can handle about 15k writes per second.
Many options here. Berkeley DB certainly does the job and is probably one of the simplest solutions. Just as simple: store everything in memcached; then you have the option of splitting the values across several machines if query load or data size grows.
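A sketch of the memcached variant using Cache::Memcached (server addresses are illustrative; note that memcached is a cache, not persistent storage, so the data would need to be reloaded after a restart):

    use strict;
    use warnings;
    use Cache::Memcached;

    my $memd = Cache::Memcached->new({
        # Add more servers here to split the key space across machines.
        servers => [ '10.0.0.1:11211', '10.0.0.2:11211' ],
    });

    $memd->set('example.com', 1);

    print "found\n" if $memd->get('example.com');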
I'd like to store a data structure persistently in RAM and have it accessible from pre-forked web server processes in Perl.
Ideally I would like it to behave like memcached but without the need for a separate daemon. Any ideas?
Use Cache::FastMmap and all you need is a file. It uses mmap to provide a shared in-memory cache for IPC, which means it is quite fast. See the documentation for possible issues and caveats.
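A minimal sketch; the share file path and cache size are illustrative. Every pre-forked process that opens the same share_file sees the same mmap'd cache:

    use strict;
    use warnings;
    use Cache::FastMmap;

    my $cache = Cache::FastMmap->new(
        share_file => '/tmp/app-cache.fmm',
        cache_size => '64m',
    );

    # Complex values are serialized for you (Storable by default).
    $cache->set('config', { retries => 3, hosts => [ 'a', 'b' ] });
    my $config = $cache->get('config');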
IPC::SharedMem might fit the bill.
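A bare-bones sketch of that route; the segment size and packed format are illustrative, and for real data structures you would freeze/thaw with Storable and manage your own locking:

    use strict;
    use warnings;
    use IPC::SysV qw(IPC_PRIVATE S_IRWXU);
    use IPC::SharedMem;

    my $shm = IPC::SharedMem->new(IPC_PRIVATE, 1024, S_IRWXU);

    $shm->write(pack('N', 42), 0, 4);          # write 4 bytes at offset 0
    my $val = unpack('N', $shm->read(0, 4));   # read them back

    $shm->remove;                              # free the segment when done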
Mod_perl shares RAM on systems with properly implemented copy-on-write forking. Load your Perl hash in a BEGIN block of your mod_perl program, and all forked instances of the mod_perl program will share the memory, as long as there are no writes to the pages storing your hash. This doesn't work perfectly (some pages will get written to) but on my servers and data it decreases memory usage by 70-80%.
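A minimal sketch of that pattern, assuming a tab-separated lookup file (the path and format are made up). The hash is loaded once in the parent and shared copy-on-write with every child, provided no child writes to it:

    use strict;
    use warnings;

    our %LOOKUP;

    BEGIN {
        # Runs once at compile time, before Apache forks its children.
        open my $fh, '<', '/etc/myapp/lookup.tsv'
            or die "Cannot open lookup file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            my ($key, $value) = split /\t/, $line, 2;
            $LOOKUP{$key} = $value;
        }
        close $fh;
    }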
Mod_perl also speeds up your server by eliminating Perl's compile time on subsequent web requests. The downside of mod_perl is that you have to program carefully and avoid modifying global variables, since those variables, like your hash, are shared by all the mod_perl instances. It is worthwhile to learn enough Perl that you don't need to change globals anyway!
The performance gains from mod_perl are fantastic, but mod_perl is not available on many shared hosts. It is easy to screw up and hard to debug while you are learning it. I only use it when the performance improvements matter enough to my customers to justify my development pain.
I'm working on a .NET program that executes arbitrary scripts against a database.
When a colleague started writing the database access code, he simply exposed one command object to the rest of the application, which is re-used (setting CommandText/Type, calling ExecuteNonQuery(), etc.) for each statement.
I imagine this is a big performance hit for repeated, identical statements, because they are parsed anew each time.
What I'm wondering about, though, is: will this also degrade execution speed if each statement is different from the previous one (not only different parameters, but an entirely different statement)? I couldn't easily find an answer on that in the documentation.
Btw, the RDBMS used is Oracle, but I guess this question is not really database-specific.
P.S. I know exposing the same Command object is not thread safe, but that's not an issue here.
There is some overhead involved in creating new command objects, so in certain circumstances it can make sense to re-use the same command. But enforcing that as the general pattern for an entire application seems more than a little odd.
The performance hit usually comes from establishing a connection to the database, but ADO.NET creates a connection pool to help here.
If you wish to avoid parsing statements each time anew, you can put them into stored procedures.
I imagine your colleague is just using an old-style approach that he inherited from working on other platforms, where reusing a command object did make a difference.
I am looking for a DBI (or similar) proxy that supports both SQL restrictions and transactions. The two I know about are:
DBD::Proxy
DBD::Gofer
DBD::Proxy
The problem I have found with DBD::Proxy is that its server, DBI::ProxyServer, doesn't just restrict queries coming in over the network (which I want), but also restricts queries generated internally by the database driver. So, for example, with DBD::Oracle, ping no longer works, nor do many of the other queries the driver issues itself.
I can't just allow them, because:
That would require quite a bit of internal knowledge of DBD::Oracle and would be quite fragile.
The whitelist is query_name => 'sql', where query_name is the first word of whatever is passed to prepare. DBD::Oracle has a lot of internal queries, and the first word of many of them is select (duh).
So, it doesn't seem I can use DBD::Proxy.
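For context, the whitelist mechanism described above looks roughly like this in a DBI::ProxyServer config file (a sketch; the mask and statements are illustrative):

    {
        'localport' => '3334',
        'clients'   => [
            {
                'mask'   => '^192\.168\.1\.\d+$',
                'accept' => 1,
                # Clients prepare the query *name*; the server substitutes
                # the real SQL. The name is matched against the first word
                # of whatever is prepared, which is why DBD::Oracle's own
                # internal "select ..." statements collide with it.
                'sql'    => {
                    'select_count' => 'SELECT COUNT(*) FROM mytable',
                    'insert_row'   => 'INSERT INTO mytable VALUES (?, ?)',
                },
            },
            { 'mask' => '.*', 'accept' => 0 },
        ],
    }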
DBD::Gofer
I haven't tried DBD::Gofer, because the docs seem to tell me that I can't use transactions through it:
CONSTRAINTS
...
You can’t use transactions
AutoCommit only. Transactions aren’t supported.
So, before I write my own application-specific proxy (using RPC::PlServer?), is there code out there that solves this problem?
This question would be best asked on the DBI Users mailing list, dbi-users@perl.org.
Sign up at http://dbi.perl.org/
I'm not sure what you mean about DBD::Proxy restricting queries. On the only occasion I've used it, it didn't modify the queries at all.