About MongoDB repair's working (option, log, progress or so) - mongodb

MongoDB was accidentally broken, and now in 'repair'. (wiredTiger, old version 3.6)
In my case, repair work is more needed really for some instances, so if there is available option, I consider to use it and firstly to skip less necessary but more indexes, probably erroneous ones.
(But the job is likely to proceed all the data through instances, especially for 'wiredTiger' which is named for a sort of 'interleaved' status, then no way to prioritize, though.)
Second, repair is likely to work longer for those with more indexes, and longer for more data (even with less indexes),
whatever, progress log messages are in standard-out, BTW time expectation seems difficult (just keeping going even no log for four hours). Instances are different in size of less than 100GB or more than 1TB.
Logs being shown on screen would be saved as any file?
If there are some problems, assuming a certain instance has ones (e.g, complexity, poor structure, etc and as a result caused crash), the repair could give some of left crashed and others rescued?
And, practically there is no more possible method to recover instances if repair finally fails?

Related

What impact does increasing the number of days historical events are stored have on overall Tableau Server performance

I'm looking at increasing the number of days historical events that are stored in the Tableau Server database from the default 183 to +365 days and I'm trying to understand what the performance impact to Tableau Server itself would be since the database and backup sizes also start increasing. Would it cause the overall Tableau Server running 2019.1.1 over time to slow to a crawl or begin to have a noticeable impact with respect to performance?
I think the answer here depends on some unknowns and makes it pretty subjective:
How much empty space is on your PostGres node.
How many events typically occur on your server in a 6-12 month period.
Maybe more importantly than a yes or a no (which should be taken with a grain of salt) would be things to consider prior to making the change.
Have you found value in the 183 default days? Is it worth the risk adding 365? It sounds like you might be doing some high level auditing, and a longer period is becoming a requirement. If that's the case, the answer is that you have no choice but to go ahead with the change. See steps below.
Make sure that the change is made in a Non-prod environment first. Ideally one with high traffic. Even though you will not get an exact replica - it would certainly be worth the practice in switching it over. Also you want to make sure that Non-prod and prod environments match exactly.
Make sure the change is very well documented. For example, if you were to change departments without anyone having knowledge of a non-standard config setting, it might make for a difficult situation if ever Support is needed or if there is a question as to what might be causing slow behavior.
Things to consider after a change:
Monitor the size of backups.
Monitor the size of the historical table(s) (See Data-Dictionary for table names if not already known.)
Be ready to roll back the config change if the above starts to inflate.
Overall:
I have not personally seen much troubleshooting value from these tables being over a certain number of days (ie: if there is trouble on the server it is usually investigated immediately and not referenced to 365+ days ago.) Perhaps the value for you lies in determining the amount of usage/expansion on Tableau Server.
I have not seen this table get so large that it brings down a server or slows it down. Especially if the server is sized appropriately.
If you're regularly/heavily working with and examining PostGres data, it may be wise to extract at a low traffic time of the day. This will prevent excess usage from outside sources during peak times. Remember that adhoc querying of PostGres is Technically unsupported. This leads to awkward situations if things go awry.

Zero evictions memcached, but items still disappear

Items stored in Memcached seem to disappear without reason (TTL: 86400 but sometimes gone within 60s). However there's enough free space, and stats give zero evictions.
The items that get lost seem to be the larger items. They seem to disappear after adding some other big items. Could it be the case "The slab" for larger items is full and items are being evicted without being reported?
Memcached version 1.4.5.
Keys can get evicted before their expiration in memcached; this is a side effect of how memcached handles memory (see this answer for more details).
If the items you are storing are large enough that this is becoming a problem, memcached may be the wrong tool for the task you are trying to perform. You essentially have 2 practical options in this scenario:
break down the data you're trying to cache in smaller chunks
if this isn't feasible for any reason, you will have to use some sort of permanent storage, the nature of which will be dependent on the nature of data you're trying to store (choices would include redis, mongodb, SQL database, filesystem, etc.)

PostgreSQL temporary table cache in memory?

Context:
I want to store some temporary results in some temporary tables. These tables may be reused in several queries that may occur close in time, but at some point the evolutionary algorithm I'm using may not need some old tables any more and keep generating new tables. There will be several queries, possibly concurrently, using those tables. Only one user doing all those queries. I don't know if that clarifies everything about sessions and so on, I'm still uncertain about how that works.
Objective:
What I would like to do is to create temporary tables (if they don't exist already), store them on memory as far as that is possible and if at some point there is not enough memory, delete those that would be committed to the HDD (I guess those will be the least recently used).
Examples:
The client will be doing queries for EMAs with different parameters and an aggregation of them with different coefficients, each individual may vary in terms of the coefficients used and so the parameters for the EMAs may repeat as they are still in the gene pool, and may not be needed after a while. There will be similar queries with more parameters and the genetic algorithm will find the right values for the parameters.
Questions:
Is that what "on commit drop" means? I've seen descriptions about
sessions and transactions but I don't really understand those
concepts. Sorry if the question is stupid.
If it is not, do you know about any simple way to get Postgres to do
this?
Workaround:
In the worst case I should be able to make a guesstimation about how many tables I can keep on memory and try to implement the LRU by myself, but it's never going to be as good as what Postgres could do.
Thank you very much.
This is a complicated topic and probably one to discuss in some depth. I think it is worth both explaining why PostgreSQL doesn't support this and also what you can do instead with recent versions to approach what you are trying to do.
PostgreSQL has a pretty good approach to caching diverse data sets across multiple users. In general you don't want to allow a programmer to specify that a temporary table must be kept in memory if it becomes very large. Temporary tables however are managed quite differently from normal tables in that they are:
Buffered by the individual back-end, not the shared buffers
Locally visible only, and
Unlogged.
What this means is that typically you aren't generating a lot of disk I/O for temporary tables. The tables do not normally flush WAL segments, and they are managed by the local back-end so they don't affect shared buffer usage. This means that only occasionally is data going to be written to disk and only when necessary to free memory for other (usually more frequent) tasks. You certainly aren't forcing disk writes and only need disk reads when something else has used up memory.
The end result is that you don't really need to worry about this. PostgreSQL already tries, to a certain extent, to do what you are asking it to do, and temporary tables have much lower disk I/O requirements than standard tables do. It does not force the tables to stay in memory though and if they become large enough, the pages may expire into the OS disk cache, and eventually on to disk. This is an important feature because it ensures that performance gracefully degrades when many people create many large temporary tables.

Compact Firebird 2.1 Database

How can I compact Firebird 2.1 database, like we do in MS Access (discarding erased data, remaking index, etc)?
There's a way to do it?
Thanks!
Usually there is no need to compact a Firebird Database: see fb release notes about garbage collection and an automatic (per-database configurable) operation named "sweep".
In few words, fb reuses space in pages when records are deleted or oldest record version are freed asking for disk space chunks only when free space becomes too small (i.e. under a defined percent).
Sweep is performed as default after a predefined number of commited transactions, bur it's an expensive task.
Backup and restore must be intended as last resort to optimize and shrink, as this rebuilds and optimize indexes too, but usually this is not needed as there are commands and tools to rebuild indexes.
The only way to do it is to make a backup and a restore.
From the official faq
Many users wonder why they don't get their disk space back when they
delete a lot of records from database.
The reason is that it is an expensive operation, it would require a
lot of disk writes and memory - just like doing refragmentation of
hard disk partition. The parts of database (pages) that were used by
such data are marked as empty and Firebird will reuse them next time
it needs to write new data.
If disk space is critical for you, you can get the space back by
doing backup and then restore. Since you're doing the backup to
restore right away, it's wise to use the "inhibit garbage collection"
or "don't use garbage collection" switch (-G in gbak), which will make
backup go A LOT FASTER. Garbage collection is used to clean up your
database, and as it is a maintenance task, it's often done together
with backup (as backup has to go throught entire database anyway).
However, you're soon going to ditch that database file, and there's no
need to clean it up.

Reasons for & against a Database

i had a discussion with a coworker about the architecture of a program i'm writing and i'd like some more opinions.
The Situation:
The Program should update at near-realtime (+/- 1 Minute).
It involves the movement of objects on a coordinate system.
There are some events that occur at regular intervals (i.e. creation of the objects).
Movements can change at any time through user input.
My solution was:
Build a server that runs continously and stores the data internally.
The server dumps a state-of-the-program at regular intervals to protect against powerfailures and/or crashes.
He argued that the program requires a Database and i should use cronjobs to update the data. I can store movement information by storing startpoint, endpoint and speed and update the position in the cronjob (and calculate collisions with other objects there) by calculating direction and speed.
His reasons:
Requires more CPU & Memory because it runs constantly.
Powerfailures/Crashes might destroy data.
Databases are faster.
My reasons against this are mostly:
Not very precise as events can only occur at full minutes (wouldn't be that bad though).
Requires (possibly costly) transformation of data on every run from relational data to objects.
RDBMS are a general solution for a specialized problem so a specialized solution should be more efficient.
Powerfailures (or other crashes) can leave the Data in an undefined state with only partially updated data unless (possibly costly) precautions (like transactions) are taken.
What are your opinions about that?
Which arguments can you add for any side?
Databases are not faster. How silly... How can a database be faster than writing a custom data structure and storing it in memory ?? Databases are Generalized tools to persist data to disk for you so you don't have to write all the code to do that yourself. Because they have to address the needs of numerous disparate (and sometimes inconsistent) business functions (Persistency (Durability), Transactional integrity, caching, relational integrity, atomicity, etc. etc. ) and do it in a way that protects the application developer from having to worry about it so much, by definition it is going to be slower. That doesn't necessarilly mean his conclusion is wrong however.
Each of his other objections can be addressed by writing the code to address that issue yourself... But you see where that is going... At some point, the development efforts of writing the custom code to address the issues that are important for your application outweigh the performance hit of just using a database - which already does all that stuff out of the box... How many of these issues are important ? and do you know how to write the code necessary to address them ?
From what you've described here, I'd say your solution does seem to be the better option. You say it runs once a minute, but how long does it take to run? If only a few seconds, then the transformation to relational data would likely be inconsequential, as would any other overhead. most of this would take likely 30 seconds. This is assuming, again, that the program is quite small.
However, if it is larger, and assuming that it will get larger, doing a straight dump is a better method. You might not want to do a full dump every run, but that's up to you, just remember that it could wind up taking a lot of space (same goes if you're using a database).
If you're going to dump the state, you would need to have some sort of a redundancy system in place, along with quasi-transactions. You would want to store several copies, in case something happens to the newest version. Say, the power goes out while you're storing, and you have no backups beyond this half-written one. Transactions, you would need something to tell that the file has been fully written, so if something does go wrong, you can always tell what the most recent successful save was.
Oh, and for his argument of it running constantly: if you have it set to a cronjob, or even a self-enclosed sleep statement or similar, it doesn't use any CPU time when it's not running, the same amount that it would if you're using an RDBMS.
If you're writing straight to disk, then this will be the faster method over a database, and faster retrieval, since, as you pointed out, there is no overhead.
Summary: A database is a good idea if you have a lot of idle processor time or historical records, but if resources are a legitimate concern, then it can become too much overhead and a dump with precautions taken is better.
mySQL can now model spatial data.
http://dev.mysql.com/doc/refman/4.1/en/gis-introduction.html
http://dev.mysql.com/doc/refman/5.1/en/spatial-extensions.html
You could use the database to keep track of world locations, user locations, items locations ect.