ESENT database engine limited to specific page sizes?

I had a problem opening an ESENT database (Windows.edb) due to a problem with the page size. The page size of the Windows.edb on my system is 32K. When I set this via JET_paramDatabasePageSize, JetInit returned error -1213 (the database page size does not match the engine). Laurion Burchall suggested turning off JET_paramRecovery, since I only need read-only access to the database. That solved my problem.
Until now, that is. I now have a database that was not shut down cleanly. I assumed that, with JET_paramRecovery set to On, JetInit would automagically run recovery and let me read the database. But when I try that, I get the old -1213 error again.
Now, I can fix my file with ESENTUTL, but an ordinary user of my app won't be able to. Is there some way to have recovery on and still be able to specify ANY database page size? There are no log files present at the location of the database (and I set the log path to the same directory to make sure they aren't written anywhere else).
Does this mean that the engine on my machine does not support this page size or this database? Or could I solve the problem by setting another magic switch?

Running recovery on another application's database is tricky. ESENT is an embedded engine and each application can have its own settings. Before you run recovery you need to know:
Where the logfiles are located (JET_paramLogFilePath)
The logfile size (JET_paramLogFileSize)
The database page size (JET_paramDatabasePageSize)
The logfile basename (JET_paramBaseName)
If you set all of those parameters correctly, recovery will work properly. If you don't, the other application may have problems recovering its database!
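To make that concrete, here is a minimal C sketch of such a recovery-friendly set-up. The instance name, paths, 32K page size, and log-file size below are placeholders, not the values the owning application (for Windows.edb, Windows Search) necessarily uses; every one of them must be replaced with that application's real settings:

    /* Build on Windows with the SDK: cl recover_sketch.c esent.lib */
    #include <windows.h>
    #include <esent.h>
    #include <stdio.h>

    int main(void)
    {
        JET_INSTANCE instance = 0;
        JET_ERR err;

        /* Placeholders: these MUST match the application that owns the database. */
        const char   *logPath   = "C:\\path\\to\\the\\logs\\";  /* JET_paramLogFilePath      */
        const char   *baseName  = "edb";                        /* JET_paramBaseName         */
        unsigned long pageSize  = 32 * 1024;                    /* JET_paramDatabasePageSize */
        unsigned long logSizeKB = 5120;                         /* JET_paramLogFileSize (KB) */

        err = JetCreateInstance(&instance, "readonly-recovery-sketch");
        if (err != JET_errSuccess) { printf("JetCreateInstance: %d\n", (int)err); return 1; }

        JetSetSystemParameter(&instance, 0, JET_paramDatabasePageSize, pageSize, NULL);
        JetSetSystemParameter(&instance, 0, JET_paramLogFilePath, 0, logPath);
        JetSetSystemParameter(&instance, 0, JET_paramSystemPath, 0, logPath); /* checkpoint (.chk) location */
        JetSetSystemParameter(&instance, 0, JET_paramBaseName, 0, baseName);
        JetSetSystemParameter(&instance, 0, JET_paramLogFileSize, logSizeKB, NULL);

        /* With JET_paramRecovery left at its default ("On"), JetInit will replay the
           logs (if any) and bring the database to a clean state before you attach it
           read-only with JetAttachDatabase/JET_bitDbReadOnly.                        */
        err = JetInit(&instance);
        printf("JetInit returned %d\n", (int)err);

        JetTerm(instance);
        return err == JET_errSuccess ? 0 : 1;
    }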

There is a simple (but tricky) way to "fix" an EDB database which wasn't shut down gracefully. There is a state flag in the database header at offset 52. It's a 4-byte integer which should be set to 3; if the database was not closed gracefully, the value you will find there is probably 2.
You will probably need to repeat this change on the 2nd database page, which contains a copy of the database header. You can find that page simply by seeking ahead by the page size of the database (usually 4096, 8192, etc.).
As this is really a hack, you should use it at your own RISK!
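Purely for illustration, here is what that patch might look like as a tiny C program. Everything about it is an assumption taken from the description above: the 32K page size, the offset 52, and the values 2/3. Note also that the ESE header contains a checksum field which this sketch does not recompute, so the engine may still reject the patched header. Only ever run something like this against a copy of the file:

    /* DANGER: unsupported hack -- only ever run this against a COPY of the .edb file. */
    #include <stdio.h>
    #include <stdint.h>

    int main(int argc, char **argv)
    {
        const long     STATE_OFFSET   = 52;        /* state flag in the database header      */
        const long     PAGE_SIZE      = 32 * 1024; /* assumed page size; the shadow copy of
                                                      the header starts at this offset       */
        const uint32_t CLEAN_SHUTDOWN = 3;         /* 2 = dirty shutdown, 3 = clean shutdown */

        if (argc != 2) {
            fprintf(stderr, "usage: %s <copy-of-database.edb>\n", argv[0]);
            return 1;
        }

        FILE *f = fopen(argv[1], "r+b");
        if (!f) { perror("fopen"); return 1; }

        long offsets[2] = { STATE_OFFSET, PAGE_SIZE + STATE_OFFSET };
        for (int i = 0; i < 2; i++) {
            uint32_t state = 0;
            if (fseek(f, offsets[i], SEEK_SET) != 0 || fread(&state, 4, 1, f) != 1) {
                perror("read"); fclose(f); return 1;
            }
            printf("offset %ld: state = %u\n", offsets[i], state);
            if (state != CLEAN_SHUTDOWN) {
                state = CLEAN_SHUTDOWN;
                fseek(f, offsets[i], SEEK_SET);
                fwrite(&state, 4, 1, f);   /* written in host byte order; x86 is
                                              little-endian, matching the on-disk format */
            }
        }

        fclose(f);
        return 0;
    }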


After using pg_dump behind pg_bouncer, the search_path appears to be altered and other clients are affected

My network looks like this:
App nodes --(many connections)--> pg_bouncer nodes --(few sessions)--> PostgreSQL nodes
So pg_bouncer multiplexes connections giving app nodes the illusion that they are all connected directly.
The issue comes when I launch pg_dump: a few milliseconds after the dump finishes, all app nodes fail with errors saying "relation xxxx does not exist", even though the table or sequence is actually there. I'm pretty sure the cause is pg_bouncer manipulating the search_path variable, so that the app nodes no longer find the tables in my schema. This happens at dump time, even if the dump file is never imported or executed.
Note: I've searched SO and Google, and I've seen there are many threads asking about the search_path in the generated file, but that's not what I'm asking about. I have no problem with the generated file; my issue is with the pg_bouncer sessions that other clients are using, and I haven't found anything about that.
The most obvious workaround would probably be to set the search_path manually in the app, but beware of the trap here: it's useless for the app to set it once at the beginning, since it may be assigned a different pg_bouncer session at the next transaction, and I can't keep setting it all the time.
The next most obvious workaround would be to set it back to the intended value immediately after launching pg_dump, but there's a race condition there, and the other nodes are quick enough that I fear they would still fail.
Is there a way to avoid letting pg_dump manipulate this variable, or to make sure it resets it before exiting?
(Also, I'm taking it for granted that pg_dump and search_path are the cause of this; can you suggest a way to confirm that? All the evidence I have is the errors appearing a few milliseconds later and the set search_path instruction in the generated file, which produces the same errors if executed.)
Thanks
Don't connect pg_dump through pgbouncer with transaction pooling. Just change the port number so it connects directly to the database. pg_dump is incompatible with transaction pooling.
You might be able to get it to work anyway by setting server_reset_query_always = 1 in the pgbouncer configuration.
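To confirm that search_path really is what the clients are losing, you could point a small diagnostic at the pgbouncer port before and right after a pg_dump run and compare the output. This is only a sketch using libpq; the connection string (including the default pgbouncer port 6432) is a placeholder for your actual set-up:

    /* Build with: cc check_search_path.c -lpq */
    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        /* Placeholder connection string -- point it at the pgbouncer port your app uses. */
        PGconn *conn = PQconnectdb("host=127.0.0.1 port=6432 dbname=mydb user=myapp");
        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* If pooled sessions are being handed back with a modified search_path,
           this will show it. */
        PGresult *res = PQexec(conn, "SHOW search_path");
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("search_path seen through pgbouncer: %s\n", PQgetvalue(res, 0, 0));
        else
            fprintf(stderr, "query failed: %s", PQerrorMessage(conn));

        PQclear(res);
        PQfinish(conn);
        return 0;
    }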

Is there a way to show everything that was changed in a PostgreSQL database during a transaction?

I often have to execute complex SQL scripts in a single transaction on a large PostgreSQL database, and I would like to verify everything that was changed during the transaction.
Verifying every single entry in each table "by hand" would take ages.
Dumping the database to plain SQL before and after the script and running diff on the dumps isn't really an option, since each dump would be about 50 GB of data.
Is there a way to show all the data that was added, deleted or modified during a single transaction?
What you are looking for is one of the most searched-for topics on the internet when it comes to capturing database changes; it is a kind of version control, you could say.
As far as I know there is, sadly, no built-in facility for this in PostgreSQL or MySQL. But you can work around it by adding triggers for the operations you care about most.
You can create some audit schemas and tables to capture the rows that are updated, inserted, or deleted.
In this way you can achieve what you want. I know this process is entirely manual, but it is really effective.
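As a concrete illustration of that trigger-based approach, here is a hedged sketch that installs a generic row-audit trigger via libpq. The audit schema, the row_changes table, and the my_table target are made-up names, and it assumes PostgreSQL 9.5 or later (for to_jsonb) with PL/pgSQL available:

    /* Minimal sketch: install a generic row-audit trigger on one table.
       Build with: cc install_audit.c -lpq
       Schema/table names ("audit", "my_table") are made-up examples.  */
    #include <stdio.h>
    #include <libpq-fe.h>

    static const char *AUDIT_DDL =
        "CREATE SCHEMA IF NOT EXISTS audit;\n"
        "CREATE TABLE IF NOT EXISTS audit.row_changes (\n"
        "    id         bigserial PRIMARY KEY,\n"
        "    changed_at timestamptz NOT NULL DEFAULT now(),\n"
        "    table_name text NOT NULL,\n"
        "    operation  text NOT NULL,\n"
        "    old_row    jsonb,\n"
        "    new_row    jsonb);\n"
        "CREATE OR REPLACE FUNCTION audit.log_change() RETURNS trigger AS $$\n"
        "BEGIN\n"
        "  IF TG_OP = 'INSERT' THEN\n"
        "    INSERT INTO audit.row_changes (table_name, operation, new_row)\n"
        "    VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));\n"
        "  ELSIF TG_OP = 'UPDATE' THEN\n"
        "    INSERT INTO audit.row_changes (table_name, operation, old_row, new_row)\n"
        "    VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD), to_jsonb(NEW));\n"
        "  ELSE\n"
        "    INSERT INTO audit.row_changes (table_name, operation, old_row)\n"
        "    VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD));\n"
        "  END IF;\n"
        "  RETURN NULL;  -- AFTER trigger: return value is ignored\n"
        "END;\n"
        "$$ LANGUAGE plpgsql;\n"
        "DROP TRIGGER IF EXISTS my_table_audit ON my_table;\n"
        "CREATE TRIGGER my_table_audit\n"
        "  AFTER INSERT OR UPDATE OR DELETE ON my_table\n"
        "  FOR EACH ROW EXECUTE PROCEDURE audit.log_change();\n";

    int main(void)
    {
        /* Placeholder connection string; my_table must already exist. */
        PGconn *conn = PQconnectdb("dbname=mydb");
        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        PGresult *res = PQexec(conn, AUDIT_DDL);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "installing the audit trigger failed: %s", PQerrorMessage(conn));
        else
            printf("audit trigger installed on my_table\n");

        PQclear(res);
        PQfinish(conn);
        return 0;
    }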
If you only need to analyze the script's behaviour sporadically, the easiest approach would be to change the server configuration parameter log_min_duration_statement to 0 and then set it back to whatever value it had before the analysis. All of the script's activity will then be written to the instance log.
This approach is not suitable if your storage is not prepared to accommodate this amount of data, or for systems in which you don't want sensitive client data to be written to a plain-text log file.
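If you go the logging route, here is a sketch of toggling the setting from a superuser connection and reloading the configuration. It assumes PostgreSQL 9.4 or newer (for ALTER SYSTEM); on older versions you would edit postgresql.conf and reload instead, and the connection string is a placeholder:

    /* Build with: cc toggle_statement_logging.c -lpq */
    #include <stdio.h>
    #include <libpq-fe.h>

    static void run(PGconn *conn, const char *sql)
    {
        PGresult *res = PQexec(conn, sql);
        if (PQresultStatus(res) != PGRES_COMMAND_OK &&
            PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "'%s' failed: %s", sql, PQerrorMessage(conn));
        PQclear(res);
    }

    int main(void)
    {
        /* Placeholder: must be a superuser connection. */
        PGconn *conn = PQconnectdb("dbname=mydb user=postgres");
        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* Log every statement while the script under analysis runs ... */
        run(conn, "ALTER SYSTEM SET log_min_duration_statement = 0");
        run(conn, "SELECT pg_reload_conf()");

        /* ... run the script from another session here, then restore the previous value: */
        run(conn, "ALTER SYSTEM RESET log_min_duration_statement");
        run(conn, "SELECT pg_reload_conf()");

        PQfinish(conn);
        return 0;
    }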

ASA 8 database.db file size doesn't increase

I'm running an app on an ASA 8 database. At the beginning, with a 'clean' database, the file size was 9776 KB. Now, after a while and after being populated with 30 tables holding lots of data, the size is still 9776 KB.
My question is: is there a ceiling on how much data can be added to this DB, or will the file automatically increase in size when it needs to?
Thanks in advance
Alex
I have been running an app on ASA 8 for years. Even a database that is empty of data has a certain size because of its internal data structures; this is also influenced by the page size of the database.
When you begin adding data, the database may be able to fit it into the existing pages for some time without needing to extend the file. But you should soon see the file size of the database increase.
By default, ASA writes new or updated data to a log file in the same directory as the database file. The changes from this log file are fed into the main database when the database server executes a so-called checkpoint. Check the server log messages: a message should inform you when checkpoints happen. After a checkpoint, the timestamp of the database file gets updated; check this too.

Is it possible to run Postgres on a write-protected file system? Or a shared file system?

I'm trying to set up a distributed processing environment, with all of the data sitting on a single shared network drive. I'm not going to write anything to it, just read from it, so we're considering write-protecting the network drive as well.
I remember that when I was working with MSSQL, I could back up databases to a DVD and load them directly as read-only databases. If I can do something like that in Postgres, I should be able to give it an abstraction like a read-only DVD, and all will be good.
Is something like this possible in Postgres, if not, any alternatives? (MySQL? sqlite even?) Or, if that's not possible, is there some way to specify a shared file system? (Make it aware that other processes are reading from it as well?)
For various reasons, using a parallel DBMS is not possible, and I need two DB processes running in parallel...
Any help is greatly appreciated. Thanks!!
Write-protecting the data directory will cause PostgreSQL to fail to start, as it needs to be able to write postmaster.pid. PostgreSQL also needs to be able to write temporary files and tablespaces, set hint bits, manage the visibility map, and more.
In theory it might be possible to modify the PostgreSQL server to support running on a read-only database, but right now AFAIK this is not supported. Don't expect it to work. You'll need to clone the data directory for each instance.
If you want to run multiple PostgreSQL instances for performance reasons, having them fight over shared storage would be counter-productive anyway. If the DB is small enough to fit in RAM it'd be OK ... but in that case it's also easy to just clone it to each machine. If the DB is too big to be cached in RAM, then both DB instances would be I/O-bottlenecked and unlikely to perform any better than (probably slightly worse than) a single DB not subject to storage contention.
There's some chance that you could get it to work by:
Moving the constant data into a new tablespace onto read-only shared storage
Taking a basebackup of the database, minus the newly separated tablespace for shared data
Copying the basebackup of the DB to read/write private storage on each host that'll run a DB
Mounting the shared storage and linking the tablespace in place where Pg expects it
Starting pg
... at least if you force hint-bit setting and VACUUM FREEZE everything in the shared tablespace first. It isn't supported, it isn't tested, it probably won't work, there's no benefit over running private instances, and I sure as hell wouldn't do it, but if you really insist you could try it. Crashes, wrong query results, and other bizarre behaviour are not unlikely.
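If you do want to experiment with that recipe, the database-side part of it (step 1 plus the freeze) might look roughly like the sketch below. The tablespace name and path and the constant_data table are placeholders, the basebackup/symlink steps happen outside the database, and this is exactly as unsupported as everything above:

    /* Unsupported experiment -- see the caveats above. Build with: cc ro_tablespace.c -lpq */
    #include <stdio.h>
    #include <libpq-fe.h>

    static void run(PGconn *conn, const char *sql)
    {
        PGresult *res = PQexec(conn, sql);
        if (PQresultStatus(res) != PGRES_COMMAND_OK &&
            PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "'%s' failed: %s", sql, PQerrorMessage(conn));
        PQclear(res);
    }

    int main(void)
    {
        PGconn *conn = PQconnectdb("dbname=mydb user=postgres");  /* placeholder */
        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* Move the constant data into a tablespace that will later live on the shared
           storage (the path is a placeholder; it must exist and be writable by the
           server while this runs). */
        run(conn, "CREATE TABLESPACE shared_ro LOCATION '/mnt/shared/pg_shared_ro'");
        run(conn, "ALTER TABLE constant_data SET TABLESPACE shared_ro");

        /* Force hint bits / freeze so nothing needs to be written to these pages later. */
        run(conn, "VACUUM (FREEZE, ANALYZE) constant_data");

        PQfinish(conn);
        return 0;
    }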
I've never tried it, but it may be possible to run postgres with a data dir which is mostly on a RO file system if all your use is indeed read-only. You will need to be sure to disable autovacuum. I think even read activity may generate xlog mutation, so you will probably have to symlink the pg_xlog directory onto a writeable file system. Sometimes read queries will spill to disk for large sorts or other temp requirements, so you should also link base/pgsql_tmp to a writeable disk area.
As Richard points out, there are visibility hint bits in the data heap. You may want to try VACUUM FULL FREEZE ANALYZE on the DB before putting it on the read-only file system.
"Is something like this possible in Postgres, if not, any alternatives? (MySQL? sqlite even?)"
I'm trying to figure out if I can do this with postgres as well, to port over a system from sqlite. I can confirm that this works just fine with sqlite3 database files on a read-only NFS share. Sqlite does work nicely for this purpose.
When doing this with sqlite, we cut over to a new directory with new sqlite files whenever there are updates; we never insert into the in-use database. I'm not sure whether inserts would pose any problems (with either database). Caching read-only data at the OS level could be an issue if another database instance mounted the directory read-write. This is something I would ideally like to be able to do.

Database last updated?

I'm working with SQL Server 2000 and I need to determine which of the databases on a server are actually being used.
Is there a SQL script I can use to tell me the last time a database was updated? Read? Etc.?
I Googled it, but came up empty.
Edit: the following addresses the issue of finding, after the fact, the last access date. With regard to figuring out who is using which databases, this can be definitively monitored with the right filters in the SQL Profiler. Beware, however, that Profiler traces can get quite big (and hence slow/hard to analyze) when the filters are not adequate.
Changes to the database schema, i.e. the addition of tables, columns, triggers and other such objects, typically leave "dated" tracks in the system tables/views (I can provide more detail about that if need be).
However, unless the data itself includes timestamps of sorts, there are typically very few sure-fire ways of knowing when data was changed, unless the recovery model involves keeping all such changes in the log. In that case you need some tools to "decompile" the log data...
With regard to detecting "read" activity... a tough one. There may be some computer-forensics-like tricks, but again, no easy solution I'm afraid (beyond the ability to see in the server activity the very last query for every still-active connection; obviously a very transient thing ;-) )
I typically run the Profiler if I suspect the database is actually being used. If there is no activity, then simply set it to read-only or take it offline.
You can use a transaction log reader to check when data in a database was last modified.
With SQL 2000, I do not know of a way to know when the data was read.
What you can do is put a trigger on logins to the database, track when a login is successful, and record the associated variables to find out who / which application is using the DB.
If your database is fully logged, create a new transaction log backup and check its size. The log backup will have a fixed, small size when no changes have been made to the database since the previous transaction log backup was taken, and it will be larger if there were changes.
This is not a very exact method, but it is easy to check, and it might work for you.