I've read the documentation and found very little about multiple processes (readers and writers) accessing a single kyotocabinet database. It appears you can create multiple readers, but unless you specify ONOLOCK multiple writers will block trying to open the db. Can anyone shed any light on how this works or if it is possible? I understand KyotoTycoon is one option, but am curious specifically about KyotoCabinet.
Found this on the tokyocabinet manpage:
Tokyo Cabinet provides two modes to connect to a database: "reader" and "writer". A reader can perform retrieving but neither storing nor deleting. A writer can perform all access methods. Exclusion control between processes is performed when connecting to a database by file locking. While a writer is connected to a database, neither readers nor writers can be connected. While a reader is connected to a database, other readers can be connected, but writers cannot. According to this mechanism, data consistency is guaranteed with simultaneous connections in a multitasking environment.
Guessing, then, that this applies to Kyoto Cabinet as well.
Kyoto Cabinet is thread safe, but you cannot have separate processes reading and writing at the same time. You can have multiple reader processes as long as there is no writer connected.
From the website:
Sharing One database by Multiple Processes
Multiple processes cannot access one database file at the same time. A
database file is locked by reader-writer lock while a process is
connected to it. Note that the `BasicDB::ONOLOCK' option should not be
used in order to escape the file locking mechanism. This option is for
workaround against some file systems such as NFS, which does not
support file locking mechanisms.
If you want multiple processes to share one database, use Kyoto Tycoon instead. It is a lightweight database server that provides a network interface to Kyoto Cabinet.
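For a concrete picture of that locking behaviour, here is a minimal sketch using the kyotocabinet Python binding (assuming that package is installed; file names and keys are arbitrary). Run the two scripts from separate shells:

```python
# writer.py -- holds the write lock for as long as the database stays open
from kyotocabinet import DB

db = DB()
if not db.open("casket.kch", DB.OWRITER | DB.OCREATE):
    raise RuntimeError("open failed: " + str(db.error()))
db.set("key", "value")
input("writer is holding the lock; press Enter to close")
db.close()
```

```python
# reader.py -- run from another shell while writer.py is still waiting
from kyotocabinet import DB

db = DB()
# This blocks until writer.py closes the file. Adding DB.OTRYLOCK makes it
# fail immediately instead, and DB.ONOLOCK skips the lock entirely (unsafe).
if not db.open("casket.kch", DB.OREADER):
    raise RuntimeError("open failed: " + str(db.error()))
print(db.get("key"))
db.close()
```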
Related
Assume I have two Perl scripts connected to the same SQLite database. If one of the scripts is writing to the database, and the other one also tries to write to the database, will the second script's connection get disconnected? Or will its input get stored in some SQLite cache, so that SQLite eventually commits the writes after the writes from the first script are done?
I don't want my second script's connection to die just because SQLite locks the database for the writes from the first script. Is this what would happen if both scripts write to the database?
Thanks
You can't. SQLite doesn't allow concurrent writes.
From official documentation:
To write to a database, a process must first acquire a SHARED lock as
described above (possibly rolling back incomplete changes if there is
a hot journal). After a SHARED lock is obtained, a RESERVED lock must
be acquired.
A RESERVED lock means that the process is planning on writing to the
database file at some point in the future but that it is currently
just reading from the file. Only a single RESERVED lock may be active
at one time, though multiple SHARED locks can coexist with a single
RESERVED lock. RESERVED differs from PENDING in that new SHARED locks
can be acquired while there is a RESERVED lock.
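To see what the second writer actually experiences, here is a small sketch with Python's standard sqlite3 module (the file name is arbitrary). The second connection is not disconnected, and nothing is silently cached; it simply gets a "database is locked" error unless you give it a busy timeout so it keeps retrying until the first writer commits:

```python
import sqlite3

# Autocommit mode so we can issue BEGIN/COMMIT explicitly.
conn1 = sqlite3.connect("test.db", isolation_level=None)
conn1.execute("CREATE TABLE IF NOT EXISTS t (v TEXT)")
conn1.execute("BEGIN IMMEDIATE")                  # takes the RESERVED lock
conn1.execute("INSERT INTO t VALUES ('from writer 1')")

# timeout=0 means "do not wait": the second writer fails at once.
# A larger timeout makes SQLite retry for that long, which is how the
# second script's writes effectively get queued behind the first.
conn2 = sqlite3.connect("test.db", timeout=0, isolation_level=None)
try:
    conn2.execute("BEGIN IMMEDIATE")
    conn2.execute("INSERT INTO t VALUES ('from writer 2')")
except sqlite3.OperationalError as exc:
    print("second writer:", exc)                  # "database is locked"

conn1.execute("COMMIT")
```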
Is it possible for FirebirdSQL to run two servers sharing one database file (FDB)/repository?
No. The server needs exclusive access to the database files. In the case of the Classic architecture version, multiple fb_inet_server processes access the same files, but locks are managed through the fb_lock_mgr process.
Databases on NFS or SMB/CIFS shares are disallowed unless one explicitly disables this protection. firebird.conf includes strong warnings against doing this unless you really know what you are doing.
If you mean if two servers on different host can share the same database, then no.
Firebird either requires exclusive access to a database (SuperServer), or coordinates access to the database by different processes on the same host through a lock file (SuperClassic and ClassicServer).
In both cases the server requires certain locking and write-visibility guarantees, and most networked filesystems don't provide those (or don't provide the locking semantics Firebird needs).
If you really, really want to, you can by changing a setting in firebird.conf, but that is a road to a corrupt database or other consistency problems, and therefore not something you should want to do.
No SQL server will allow such a configuration. If you want to split the load, you may need to look at a multi-tier architecture. Using that architecture, you can spread your SQL query load across many computers.
I have a task ahead of me that requires the use of local temporary tables. For performance reasons I can't use transactions.
Temporary tables, much like transactions, require that all queries come from one connection, which must not be closed or reset. How can I accomplish this using the Enterprise Library Data Access Application Block?
Enterprise Library will use a single database connection if a transaction is active. However, there is no way to force a single connection for all Database methods in the absence of a transaction.
You can definitely use the Database.CreateConnection method to get a database connection. You could then use that connection along with the DbCommand objects to perform the appropriate logic.
Other approaches would be to modify Enterprise Library source code to do exactly what you want or create a new Database implementation that does not perform connection management.
I can't see a way of doing that with the DAAB. I think you are going to have to drop back to ADO.NET connections and manage them yourself, but even then, manipulating temporary tables on the server from a client-side app doesn't strike me as an optimal solution to the problem.
I am working on a research platform that reads relevant Twitter feeds via the Twitter API and stores them in a PostgreSQL database for future analysis. Middleware is Perl, and the server is an HP ML310 with 8GB RAM running Debian linux.
The problem is that the Twitter feed can be quite large (many entries per second), and I can't afford to wait for the insert to finish before returning to wait for the next tweet. So what I've done is to fork() so that each tweet gets a new process to insert it into the database, letting the listener return quickly to grab the next tweet. However, because each of these processes effectively opens a new connection to the PostgreSQL backend, the system never catches up with its Twitter feed.
I am open to connection pooling suggestions and/or to upgrading hardware if necessary to make this work, but would appreciate any advice. Is this likely RAM-bound, or are there configuration or software approaches I can try to make the system sufficiently speedy?
If you open and close a new connection for each insert, that is going to hurt big time. You should use a connection pooler instead. Creating a new database connection is not a lightweight thing to do.
Doing a fork() for each insert is probably not such a good idea either. Can't you create one process that simply takes care of the inserts and listens on a socket, or scans a directory or something like that, and another process signalling the insert process (a classic producer/consumer pattern)? Or use some kind of message queue (I don't know Perl, so I can't say what kind of tools are available there).
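As a rough illustration of that producer/consumer idea, here is a sketch in Python rather than Perl (the connection string, table, and column names are placeholders): one long-lived writer process owns the single PostgreSQL connection, and the listener only pushes tweets onto a queue.

```python
# Sketch: one writer process, one connection, batched commits.
import multiprocessing as mp
import psycopg2

def writer(queue):
    conn = psycopg2.connect("dbname=tweets")       # one connection, reused forever
    cur = conn.cursor()
    batch = []
    while True:
        tweet = queue.get()
        if tweet is None:                          # sentinel: flush and exit
            break
        batch.append(tweet)
        if len(batch) >= 500:                      # commit in batches, not per row
            cur.executemany("INSERT INTO tweets (id, body) VALUES (%s, %s)", batch)
            conn.commit()
            batch.clear()
    if batch:
        cur.executemany("INSERT INTO tweets (id, body) VALUES (%s, %s)", batch)
        conn.commit()
    conn.close()

if __name__ == "__main__":
    q = mp.Queue()
    w = mp.Process(target=writer, args=(q,))
    w.start()
    # The Twitter listener only does q.put((tweet_id, tweet_text)) and goes
    # straight back to the stream; it never touches the database itself.
    q.put((1, "example tweet"))
    q.put(None)                                    # shut the writer down cleanly
    w.join()
```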
When doing bulk inserts, do them in a single transaction and send the commit at the end. Do not commit each insert. Another option is to write the rows into a text file and then use COPY to insert them into the database (it doesn't get faster than that).
You can also tune the PostgreSQL server a bit. If you can afford to lose some transactions in case of a system crash, you might want to turn synchronous_commit off.
If you can rebuild the table from scratch at any time (e.g. by re-inserting the tweets), you might also want to make that table an "unlogged" table. It is faster than a regular table for writes, but if Postgres is not shut down cleanly, you lose all the data in the table.
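A sketch of those two server-side tweaks applied from the client session via psycopg2 (the connection string and table definition are placeholders; synchronous_commit can also be set in postgresql.conf):

```python
import psycopg2

conn = psycopg2.connect("dbname=tweets")
cur = conn.cursor()
# Faster commits; the last few transactions may be lost after a crash.
cur.execute("SET synchronous_commit TO off")
# Unlogged table: faster writes, but emptied if Postgres crashes.
cur.execute("CREATE UNLOGGED TABLE IF NOT EXISTS tweets_raw (id bigint, body text)")
conn.commit()
```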
Use the COPY command.
One script reads Twitter and appends rows to a CSV file on disk.
Another script watches for the CSV file on disk, renames it, and runs a COPY command against that file.
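A sketch of that hand-off with psycopg2 (paths, table, and column names are placeholders):

```python
import os
import psycopg2

# Rename the file the listener is appending to, so it starts a fresh tweets.csv,
# then bulk-load the renamed file with COPY.
os.rename("tweets.csv", "tweets.loading.csv")
conn = psycopg2.connect("dbname=tweets")
with conn, conn.cursor() as cur, open("tweets.loading.csv") as f:
    cur.copy_expert("COPY tweets_raw (id, body) FROM STDIN WITH (FORMAT csv)", f)
os.remove("tweets.loading.csv")
```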
We have a web service that accepts an XML file for any faults that occur on a vehicle. The web service then uses EF 3.5 to load these files into a hyper-normalized database. Typically an XML file is processed in 10-20 seconds. There are two concurrency scenarios that I need to handle:
Different vehicles sending XML files at the same time: This isn't a problem. EF's default optimistic concurrency ensures that I am able to store all these files in the same tables as their data is mutually exclusive.
Same vehicle sending multiple files at the same time: This creates a problem as my system tries to write same or similar data to the database simultaneously. And this isn't rare.
We needed a solution for point 2.
To solve this I introduced a lock table. Basically, I insert a concatenated vehicle id and fault timestamp (which is the same for the multiple files sent by a vehicle for the same fault) into this table when I start writing to the DB, and I delete the record once I am done. However, there are many times when both files try to insert this row into the database simultaneously. In such cases, one file succeeds, while the other throws a duplicate key exception that propagates to the caller of the web service.
What's the best way to handle such scenarios? I wouldn't like to roll back anything from the DB, as there are many tables involved for a single file.
And what solution do you expect? Your current approach with a lock table is exactly what you need. If the exception is fired because of a duplicate, you can either wait and try again later, or fire a typed fault back to the client and let them upload the file later. Both solutions are ugly, but that is what your application currently offers.
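A language-neutral sketch of the "wait and retry" option (the original service is .NET/EF; sqlite3 merely stands in for the real database here, and the table, column, and function names are illustrative):

```python
import sqlite3
import time

conn = sqlite3.connect("locks.db", isolation_level=None)   # autocommit
conn.execute("CREATE TABLE IF NOT EXISTS fault_lock (lock_key TEXT PRIMARY KEY)")

def process_with_lock(lock_key, process_file, attempts=5, delay=2.0):
    """Treat a duplicate-key error on the lock table as 'someone else is
    already processing this vehicle/fault' and retry after a delay."""
    for _ in range(attempts):
        try:
            conn.execute("INSERT INTO fault_lock (lock_key) VALUES (?)", (lock_key,))
        except sqlite3.IntegrityError:
            time.sleep(delay)                  # lock row exists; wait and retry
            continue
        try:
            process_file()                     # write the XML file's data to the DB
        finally:
            conn.execute("DELETE FROM fault_lock WHERE lock_key = ?", (lock_key,))
        return True
    return False                               # give up; return a typed fault to the caller
```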
A better solution would be to replace the current web service with one where the web service call only adds a job to a queue, and a background process works through those jobs and ensures that two files for the same car are never processed concurrently. This would also give much better throughput control for peak situations. The disadvantage is that you must implement some notification that the file has been processed, because processing will no longer happen online.