I've got a problem with the gfix sweep command: it doesn't seem to clean up the garbage. The database backup is 900 MB smaller than the database itself. What could be wrong when a manually started gfix sweep doesn't work?
A backup is smaller because it doesn't contain indexes, just the database data itself, and it only contains the latest committed version of each record, no earlier record versions. In addition, the storage format of the backup is more efficient, because it is written and read serially and doesn't need the more complex page layout used for the database itself.
In other words, in almost all cases a backup will be smaller than the database itself, sometimes significantly smaller (if you have a lot of indexes or a lot of transaction churn, or a lot of blobs).
Garbage collection in Firebird removes old record versions; a sweep will also clean up transaction information. Neither releases allocated pages, that is: the database file will not shrink. See Firebird for the Database Expert: Episode 4 - OAT, OIT, & Sweep
If you want to shrink a database, you need to back it up and restore it, but generally there is no need for that: Firebird will re-use the free space on its data pages automatically.
See also Firebird for the Database Expert: Episode 6 - Why can't I shrink my databases.
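For reference, a manual sweep and a backup/restore cycle look roughly like this (the paths, database name and credentials are placeholders; adjust them for your setup):

# manual sweep: removes old record versions, but does not shrink the file
gfix -sweep -user SYSDBA -password masterkey /data/mydb.fdb

# backup and restore: the only way to actually reclaim disk space
gbak -b -user SYSDBA -password masterkey /data/mydb.fdb /data/mydb.fbk
gbak -c -user SYSDBA -password masterkey /data/mydb.fbk /data/mydb_new.fdb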
I am developing a system that involves a lot of OLAP work. According to my research, a column-based data warehouse is the best choice, but I am puzzled about how to choose a good data warehouse product.
All the data warehouse comparison articles I can find are from before 2012, and there seem to be few recent ones. Are data warehouses out of date? Is Hadoop HBase better?
As far as I know, InfiniDB is a high-performance open source data warehouse product, but it has not been maintained for 2 years (https://github.com/infinidb/infinidb), and there is little documentation about it. Has InfiniDB been abandoned by its developers?
Which is the best data warehouse product right now?
How do I incrementally move my business data from a MySQL database into the data warehouse?
Thank you for your answer!
Data warehousing is still a hot topic. HBase is not the fastest option, but it is a very well-known and compatible one (many applications build on it).
I went on the journey for a good column store some years ago and finally settled on InfiniDB because of the easy migration from plain MySQL. It's a nice piece of software, but it still has bugs, so I cannot fully recommend it for production use (not without a second failover instance).
However, MariaDB has picked up the InfiniDB technology and is porting it over to their MariaDB Database Server. This new product is called MariaDB ColumnStore[1], of which a testing build is available. They have already put a lot of effort into it, so I think ColumnStore will become a major MariaDB product within the next two years.
I can't answer that. I'm still with InfiniDB and am also helping others with their projects.
This totally depends on your data structure and usage.
InfiniDB is great at querying; in my tests it had ~8% better performance than Impala. However, while InfiniDB supports INSERT, UPDATE, DELETE and transactions, it is not great at transactional workloads, i.e. just moving a community-driven website, where visitors constantly manipulate data, to InfiniDB will NOT work well. One insert with 10,000 rows will work well; 10,000 inserts with one row each will kill it.
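As a sketch of what that means in practice (hypothetical table; for really large batches InfiniDB also ships the cpimport bulk loader):

-- fine: one statement carrying many rows
INSERT INTO page_views (user_id, url, viewed_at) VALUES
  (1, '/home',  NOW()),
  (2, '/about', NOW()),
  (3, '/home',  NOW());

-- pathological: the same data sent as thousands of separate single-row INSERTs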
We deployed InfiniDB for our customers to 'aid' the query performance of a regular MariaDB installation: we created a tool that imports and updates MariaDB database tables into InfiniDB for faster querying. Manipulations of those tables are still done in MariaDB, and the changes get batch-imported into InfiniDB with a 30-second delay. As the original and InfiniDB tables have the same structure and are both accessible via the MySQL API, we can simply switch the database connection and get super-fast SELECT queries. This works well for our use case.
We also built new statistics/analytics applications from the ground up to work with InfiniDB, replacing an older MySQL-based system. That also works great, above all performance expectations (we now have 15x the data we had in MariaDB, and it is still easier to maintain and much faster to query).
[1] https://mariadb.com/products/mariadb-columnstore
I would give Splice Machine a shot (open source). It stores data on HBase and provides the core data management functions that a warehouse needs (primary keys, constraints, foreign keys, etc.).
Good sirs.
I've just started planning a new project, and it seems that I should stick with a relational database (even though I want to play with Mongo). Tell me if I'm mistaken!
There will be box models, each of which can contain hundreds to thousands of items.
At any time, the user can move an item to another box.
for example, using some Railsy pseudocode...
item = Item.find(5676)
item.box_id             # => 24
item.update(box_id: 25)
item.box_id             # => 25
This sounds like a simple SQL join table to me, but like an expensive array-manipulation operation for MongoDB.
Or is removing an object from one (huge) array and inserting it into another (huge) array not a big problem for Mongo?
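In relational terms, a minimal sketch of what I have in mind (hypothetical schema; a plain box_id foreign key, since an item lives in exactly one box):

CREATE TABLE boxes (id INTEGER PRIMARY KEY);
CREATE TABLE items (
    id     INTEGER PRIMARY KEY,
    box_id INTEGER REFERENCES boxes(id)
);

-- moving item 5676 to box 25 is a single-row update
UPDATE items SET box_id = 25 WHERE id = 5676;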
Thanks for any wisdom. I've only just started with mongo.
If you want to use big arrays, stay away from MongoDB; I speak from personal experience. There are two big problems with arrays. First, if they grow, the document grows and may need to be moved on disk, which is a very, very slow operation. Second, scanning to the 10,000th element of an array is very slow, because the 9,999 elements before it need to be checked first.
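For comparison, the array-based move in the mongo shell would look something like this (hypothetical boxes collection storing item ids in an items array; each statement rewrites a potentially huge array, and the $push may force the document to be moved on disk):

// pull the item out of one box document...
db.boxes.update({ _id: 24 }, { $pull: { items: 5676 } });
// ...and push it into the other
db.boxes.update({ _id: 25 }, { $push: { items: 5676 } });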
I appreciate there are one or two similar questions on SO, but we are a few years on and I know EF's speed and general performance have been enhanced, so those may be out of date.
I am writing a new webservice to replace an old one. Replicating the existing functionality it needs to do just a handful of database operations. These are:
Call existing stored procedures to get data (2)
Send SQL to the database to be executed (should be stored procedures I know) (5)
Update records (2)
Insert records (1)
So 10 operations in total. The database is HUGE but I am only dealing with 3 tables directly (stored procedures do some complex JOINs).
When getting the data I build an array of objects (e.g. Employees) which then get returned by the web service.
From my experience with Entity Framework, and because I'm not doing anything clever with the data, I believe EF is not the right tool for my purpose and SqlDataReader is better (I imagine it is going to be lighter and faster).
Entity Framework focuses mostly on developer productivity - easy to use, easy to get things done.
EF does add some abstraction layers on top of "raw" ADO.NET. It's not designed for large-scale, bulk operations, and it will be slower than "raw" ADO.NET.
Using a SqlDataReader will be faster - but it's also a lot more work for the developer.
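As an illustration, the "raw" approach could look something like this (a minimal sketch; the stored procedure name, column order and the Employee class are assumptions, not taken from the question):

// requires System.Data, System.Data.SqlClient and System.Collections.Generic
var employees = new List<Employee>();
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.GetEmployees", conn) { CommandType = CommandType.StoredProcedure })
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // every column is mapped to the object by hand
            employees.Add(new Employee
            {
                Id = reader.GetInt32(0),
                Name = reader.GetString(1)
            });
        }
    }
}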
Pick whichever is more important to you - getting things done quickly and easily (as a developer), or getting top speed by doing it "the hard way".
There's really no single good answer to this question: pick the right tool and the right approach for the job at hand and use it.
I want to understand more about the system and DB architecture of MongoDB.
I am trying to understand how MongoDB stores and retrieves documents - whether it's all in memory, etc.
A comparative analysis between MongoDB and Oracle will be a bonus but, I am mostly focusing on understanding the MongoDB architecture per se.
Any pointers will be helpful.
MongoDB memory-maps the database files. It lets the OS control this and allocate the maximum amount of RAM to the memory mapping. As MongoDB updates and reads from the DB, it is reading from and writing to RAM. All indexes on the documents in the database are held in RAM as well. The files in RAM are flushed to disk every 60 seconds. To prevent data loss in the event of a power failure, the default is to run with journaling switched on. The journal file is flushed to disk every 100 ms, and after a power loss it is used to bring the database back to a consistent state.
An important design decision with Mongo is the amount of RAM. You need to figure out your working set size - i.e. if you are only going to be reading and writing the most recent 10% of your data, then that 10% is your working set and should be held in memory for maximum performance. So if your working set is 10GB, you are going to need 10GB of RAM for maximum performance - otherwise your queries/updates will run slower as pages are paged from disk into memory.
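You can get a rough feel for your working set from the mongo shell, for example:

db.stats()              // dataSize, indexSize, storageSize of the current database
db.serverStatus().mem   // resident, virtual and mapped memory in MB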
Other important aspects of MongoDB are replication for backups and sharding for scaling.
There are a lot of great online resources for learning. MongoDB is free and opensource.
EDIT:
It's a good idea to check out the tutorial
http://www.mongodb.org/display/DOCS/Tutorial
and manual
http://www.mongodb.org/display/DOCS/Manual
and the Admin Zone is useful too
http://www.mongodb.org/display/DOCS/Admin+Zone
and if you get bored of reading then the presentations are worth checking out.
http://www.10gen.com/presentations
I have 2 computers connected to each other via serial communication.
The main computer holds a DB (about 10K words) and runs at a 20 Hz rate.
I need real-time synchronization of the DB to the other computer: if data is added, deleted, or updated, I want the other computer to see or get the changes in real time.
If I transfer the whole DB periodically, it takes about 5 seconds to update the other side, which is not acceptable.
Does someone have an idea?
As you said, the other computer has to get the changes (i.e. insert, delete, update) via the serial link.
The easiest way to do this (but maybe impossible, if you can't change certain things) is to extend the database-change methods (or, if that's not possible, every call) to send an insert/delete/update datagram with all required data over the serial link, in a way that is robust against packet loss (i.e. with error detection, retransmission, etc.).
On the other end you have to implement a semantically equivalent database where you replay all the received changes.
Of course you still have to synchronize the databases at startup/initialization or maybe periodically (e.g. once per day).
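As an illustration, such a change datagram could be framed like this, a minimal Python sketch (the field sizes are assumptions: a 32-bit sequence number for retransmission, a 16-bit key since the DB is only ~10K words, and a CRC-32 for error detection):

import struct
import zlib

def encode_change(seq, op, key, value=b''):
    # op is b'I', b'U' or b'D' (insert/update/delete); key addresses the record
    payload = struct.pack('>IcH', seq, op, key) + value
    crc = zlib.crc32(payload) & 0xffffffff
    return struct.pack('>H', len(payload)) + payload + struct.pack('>I', crc)

def decode_change(frame):
    (length,) = struct.unpack('>H', frame[:2])
    payload = frame[2:2 + length]
    (crc,) = struct.unpack('>I', frame[2 + length:6 + length])
    if zlib.crc32(payload) & 0xffffffff != crc:
        return None  # corrupt frame: the receiver asks the sender to retransmit
    seq, op, key = struct.unpack('>IcH', payload[:7])
    return seq, op, key, payload[7:]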