I'm trying to import data from an MS Access database into MATLAB and I'm getting the following error:
Error using database/fetch (line 37)
[Microsoft][ODBC Microsoft Access Driver] The query cannot
be completed. Either the size of the query result is larger
than the maximum size of a database (2 GB), or there is not
enough temporary storage space on the disk to store the
query result.
I have 4 GB of RAM and 60 GB of free hard-drive space, so I don't think it's a space problem. The database is 1022 MB.
Are you by any chance asking for a huge amount of data?
A few nice outer joins perhaps, or multiple combinations of tables?
My guess is that if this is causing the problem, you should just split the query up into a few pieces and it will work.
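To make the splitting concrete, here is a minimal sketch of the idea in Python with pyodbc against the same Access ODBC driver (the database path, table name, and ID column are made up); it fetches the rows one key range at a time so no single result set gets huge:

import pyodbc

# Hypothetical path, table, and key column; adjust to the real schema.
conn = pyodbc.connect(
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=C:\data\mydatabase.accdb"
)
cursor = conn.cursor()

cursor.execute("SELECT MAX(ID) FROM MyTable")
max_id = cursor.fetchone()[0] or 0

rows = []
chunk = 50000  # width of each ID range; tune so every piece stays small
for start in range(0, max_id + 1, chunk):
    # Each query returns only one key range instead of the whole table.
    cursor.execute(
        "SELECT * FROM MyTable WHERE ID >= ? AND ID < ?",
        start, start + chunk,
    )
    rows.extend(cursor.fetchall())

conn.close()

The same pattern works from MATLAB's Database Toolbox: issue one fetch per WHERE range instead of a single fetch for the whole table.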
We have a PGSQL server (v13) with a lot of data in it.
The database contains documents.
The total database is around 1.5 TB. Today someone called to tell me the disk was almost full. They added 1 TB of extra storage some time ago, but that extra storage filled up extremely quickly, which is very abnormal. The disk was 2 TB, now 3 TB with the extra storage.
If I look at the table containing the documents, it has only grown by around 10 GB since 20/07/2022, so I really don't understand why the disk is filling up this fast. If I run this query on the database:
SELECT pg_size_pretty( pg_total_relation_size('documents') );
It returns '2.7 TB', which seems impossible, since not that many documents have been added recently.
As a test, I ran a VACUUM on one table (about 20 GB in total). The vacuum failed with this error:
ERROR: wrong tuple length
What does it mean? I see the same errors in the PostgreSQL log files. They recently installed a new antivirus system on the server; I already asked for exclusions, but that didn't seem to solve the problem.
I now only have roughly 130 GB of free disk space, and it keeps shrinking.
Is it possible the vacuum consumes disk space and, because of the error, does not return it to Windows?
Any help is appreciated. I'm not a database expert, but I really need to solve this.
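For what it's worth, pg_total_relation_size counts the table's heap plus its indexes and TOAST data, so a per-relation breakdown usually shows where the space actually went. A minimal psycopg2 sketch (the connection string is a placeholder):

import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")  # placeholder connection
cur = conn.cursor()

# Largest relations first, split into heap+TOAST size vs. index size.
cur.execute("""
    SELECT relname,
           pg_size_pretty(pg_table_size(oid))          AS table_size,
           pg_size_pretty(pg_indexes_size(oid))        AS index_size,
           pg_size_pretty(pg_total_relation_size(oid)) AS total_size
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY pg_total_relation_size(oid) DESC
    LIMIT 10;
""")
for name, table_size, index_size, total_size in cur.fetchall():
    print(f"{name}: table={table_size}, indexes={index_size}, total={total_size}")

conn.close()

If the heap is far larger than the live data, that points at bloat from dead rows that VACUUM could not clean up; 'wrong tuple length' usually indicates corrupted tuples, which would prevent VACUUM from finishing.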
I created a test Postgres database in AWS RDS, created a 100-million-row, 2-column table, and ran SELECT * on that table. Postgres reports "Buffers: shared hit=24722 read=521226", but AWS reports IOPS in the hundreds. Why the huge discrepancy? Broadly, I'm trying to figure out how to estimate the number of AWS I/O operations a query might cost.
PostgreSQL does not have insight into what the kernel/FS get up to. If PostgreSQL issues a system call to read the data, then it reports that buffer as "read". If it was actually served out of the kernel's filesystem cache, rather than truly from disk, PostgreSQL has no way of knowing that (although you can make some reasonable statistical guesses if track_io_timing is on), while AWS's IO monitoring tool would know.
If you set shared_buffers to a large fraction of memory, then there would be little room left for a filesystem cache, so most buffers reported as read should truly have been read from disk. This might not be a good way to run the system, but it might provide some clarity to your EXPLAIN plans. I've also heard rumors that Amazon Aurora reimplemented the storage system so that it uses direct I/O, or something similar, and so doesn't use the filesystem cache at all.
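As a rough way to see how much of that 'read' traffic was actually waited on, track_io_timing can be enabled (it is a superuser-level setting; on RDS it may need to be set via the parameter group) and EXPLAIN (ANALYZE, BUFFERS) will then report I/O timings next to the hit/read counts. A minimal psycopg2 sketch with a placeholder connection and a hypothetical table name:

import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")  # placeholder connection
cur = conn.cursor()

# Session-level setting; requires superuser or an equivalent privilege.
cur.execute("SET track_io_timing = on;")

# BUFFERS shows shared hit/read counts; with track_io_timing on, the plan
# also includes "I/O Timings" for time spent in read/write system calls
# (which still cannot tell the kernel page cache apart from real disk reads).
cur.execute("EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM big_table;")  # hypothetical table
for (line,) in cur.fetchall():
    print(line)

conn.close()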
So, basically, I have a website where people adjust filters and click 'download', and the resulting Excel file contains the data specified by their filters. There are about 125,000+ data points in my PostgreSQL database, and I currently load them in the background using a simple
df = pd.read_sql_query('select * from real_final_sale_data', con = engine)
The only problem is that this quickly overwhelms Heroku's memory allowance on 1 dyno (512 MB), but I do not understand why this is happening or what the solution is.
For instance, when I run this on my computer and call 'df.info()', it shows the DataFrame only uses about 30 MB, so why does reading it suddenly consume so much memory?
Thank you so much for your time!
So, the solution that ended up working was to push some of the filters into the SQL query itself. I.e., I had been doing a SELECT * without any filtering in SQL, so my table of roughly 120,000 rows and 30 columns put a fair amount of strain on Heroku's dyno. It's definitely recommended to either use chunking or do some filtering when querying the DB.
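For reference, both fixes look roughly like this (a sketch: the filter column and the chunk size are hypothetical, and the parameter style depends on the driver; %(name)s is what psycopg2 uses):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@host/dbname")  # placeholder

# Option 1: push the filters into SQL so only the matching rows reach the dyno.
query = """
    SELECT *
    FROM real_final_sale_data
    WHERE sale_date >= %(start)s   -- hypothetical filter column
"""
df = pd.read_sql_query(query, con=engine, params={"start": "2022-01-01"})

# Option 2: stream the result in chunks instead of materializing it all at once.
for chunk in pd.read_sql_query(
    "SELECT * FROM real_final_sale_data",
    con=engine,
    chunksize=10_000,  # rows per chunk
):
    print(len(chunk))  # process each chunk here instead of printing

Note also that df.info() without memory_usage='deep' underestimates the footprint of object/string columns, and the database driver holds its own copy of the rows while they are being fetched, which is why 30 MB locally can still blow past a 512 MB dyno.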
I must say I don't understand this completely.
When I try to convert the binary PBF file for my country, Germany, which is about 3 GB, using osm2pgsql (slim mode), the conversion to PostgreSQL tables runs for about 3 hours and then fails with the message 'not enough disk space'. I have 50 GB of free space on my Linux machine.
I understand that normally the temporary data is kept in RAM, but because I am using slim mode it is stored in the database instead.
Please enlighten me: how can a 3 GB OSM file take more than 50 GB of space while being converted to PostgreSQL (PostGIS) tables, and throw that error?
How do I solve this?
Yes, it can exceed 50 GB. For comparison, the India PBF is around 375 MB, and the resulting PostgreSQL data folder is 11 GB, and that includes the world boundary from OSM as well.
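In slim mode, osm2pgsql keeps its intermediate node/way/relation data in database tables (plus their indexes) alongside the rendering tables, and that is where most of the multiplication comes from. A hedged sketch of an import that drops those intermediate tables once it finishes, written as a Python wrapper to keep the examples in one language (the flags are standard osm2pgsql options; paths and database name are placeholders):

import subprocess

# --slim keeps node/way/relation data in the database during the import;
# --drop removes those intermediate tables afterwards, which is usually the
# bulk of the extra space. Paths and database name are placeholders.
subprocess.run(
    [
        "osm2pgsql",
        "--slim",
        "--drop",
        "--cache", "2000",  # MB of RAM for the node cache
        "--database", "gis",
        "/path/to/germany-latest.osm.pbf",
    ],
    check=True,
)

Note that the intermediate tables still need disk space while the import is running, so the import itself may still require more than 50 GB free.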
I have a question regarding MongoDB's collection size.
I did a small stress test in which my MongoDB server was constantly inserting, deleting and updating data for about 48 hours. The documents were small: just a numeric value, a timestamp, and an ID.
Now, after those 48 hours, the collection used for inserting, deleting and updating data was 98,000 bytes, while the preallocated storage size was 696,320 bytes. The storage size grew so much larger than the actual collection size because of one input spike during an insertion phase. Subsequent deletions of objects shrank the actual collection size again, but the preallocated storage size did not (AFAIK a common database management issue; it's the same with e.g. MySQL).
After the stress test completed, I created a dump of my MongoDB database and dropped the database completely, so that I could import the dump again and see what the stats looked like then. As I suspected, the collection size was still the same (98,000 bytes), but the preallocated storage size went down to 40,960 bytes (from 696,320 bytes before).
Since we want to try out MongoDB for an application that produces hundreds of MB of data, and therefore I/O traffic, every day, we need to keep the database and the space it occupies to a minimum, preferably without having to dump, drop, and re-import the whole database every now and then.
Now my question is: is there a way to invoke MongoDB's space reclamation ("garbage collection") programmatically from code? The software behind it is written in Java, and my idea was to trigger it after a certain amount of time/operations or once the preallocated storage size reaches a certain threshold.
Or maybe there's an even better (more elegant) way to minimize the occupied space?
Any help would be appreciated and I'll try to provide any further information if needed. Thanks in advance.
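One option, shown here as a pymongo sketch with placeholder names (the same 'compact' database command can be issued from the Java driver via runCommand), is to compact the collection once its preallocated storage grows well past the data size. Whether the freed space is actually returned to the operating system depends on the storage engine; under the old MMAPv1 engine it generally is not, and a repair or the dump/restore you describe was the usual workaround there:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
db = client["stress_test"]                         # hypothetical database name

stats = db.command("collStats", "measurements")    # hypothetical collection name
if stats["storageSize"] > 10 * stats["size"]:      # hypothetical threshold
    # 'compact' rewrites and defragments a single collection; it can block
    # operations on it while running, and whether the reclaimed space goes
    # back to the OS depends on the storage engine.
    db.command("compact", "measurements")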