tfs database size - version control - version-control

I have TFS installed on a single server and am running out of space on the disk. (We've been using the instance for about 2 years now.)
Looking at the tables in SQL Server what seems to be culprit is the tbl_content table, it is at 70 GB. If I do a get on the entire source tree for all projects it is only about 8 GB of data.
Is this just all the histories of the files? It seems like a 10:1 ratio just the histories...since I would think the deltas would be very small.
Does anyone know if that is a reasonable size given 8 GB of source (and 2 yrs of activity)? And if not what to look at to 'fix' this?
Thanks

I can't help with the ratio question at the moment, sorry. For a short-term fix you might check to see if there is any space within the DB files that can be freed up. You may have already, but if not..
SELECT name ,size/128.0 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS int)/128.0 AS AvailableSpaceInMB
FROM sys.database_files;
If the statement above returns some space you want to recover you can look into a one time DBCC SHRINKDATABASE or DBCC SHRINKFILE along with scheduling routine SQL maintenance plan that may include defragmenting the database.
DBCC SHRINKDATABASE and DBCC SHRINKFILE aren't things you should do on a regular basis, because SQL Server needs some "swap" space to move things around for optimal performance. So neither should be relied upon as your long term fix, and both could cause some noticeable performance degradation of TFS response times.
JB

Are you seeing data growth every day, even when no activity occurs on the system? If the answer is yes, are you storing any binaries outside of the 8GB of source somewhere?
The reason that I ask is that if TFS is unable to calculate a delta or if the file exceeds the size of delta generation, TFS will duplicate the entire binary file. I don't have the link with me, but I have it on my work machine, which describes this scenario and how to fix it, in the event that this is the cause of your problems.

Related

What impact does increasing the number of days historical events are stored have on overall Tableau Server performance

I'm looking at increasing the number of days historical events that are stored in the Tableau Server database from the default 183 to +365 days and I'm trying to understand what the performance impact to Tableau Server itself would be since the database and backup sizes also start increasing. Would it cause the overall Tableau Server running 2019.1.1 over time to slow to a crawl or begin to have a noticeable impact with respect to performance?
I think the answer here depends on some unknowns and makes it pretty subjective:
How much empty space is on your PostGres node.
How many events typically occur on your server in a 6-12 month period.
Maybe more importantly than a yes or a no (which should be taken with a grain of salt) would be things to consider prior to making the change.
Have you found value in the 183 default days? Is it worth the risk adding 365? It sounds like you might be doing some high level auditing, and a longer period is becoming a requirement. If that's the case, the answer is that you have no choice but to go ahead with the change. See steps below.
Make sure that the change is made in a Non-prod environment first. Ideally one with high traffic. Even though you will not get an exact replica - it would certainly be worth the practice in switching it over. Also you want to make sure that Non-prod and prod environments match exactly.
Make sure the change is very well documented. For example, if you were to change departments without anyone having knowledge of a non-standard config setting, it might make for a difficult situation if ever Support is needed or if there is a question as to what might be causing slow behavior.
Things to consider after a change:
Monitor the size of backups.
Monitor the size of the historical table(s) (See Data-Dictionary for table names if not already known.)
Be ready to roll back the config change if the above starts to inflate.
Overall:
I have not personally seen much troubleshooting value from these tables being over a certain number of days (ie: if there is trouble on the server it is usually investigated immediately and not referenced to 365+ days ago.) Perhaps the value for you lies in determining the amount of usage/expansion on Tableau Server.
I have not seen this table get so large that it brings down a server or slows it down. Especially if the server is sized appropriately.
If you're regularly/heavily working with and examining PostGres data, it may be wise to extract at a low traffic time of the day. This will prevent excess usage from outside sources during peak times. Remember that adhoc querying of PostGres is Technically unsupported. This leads to awkward situations if things go awry.

What about expected performance in Pentaho?

I am using Pentaho to create ETL's and I am very focused on performance. I develop an ETL process that copy 163.000.000 rows from Sql server 2088 to PostgreSQL and it takes 17h.
I do not know how good or bad is this performance. Do you know how to measure if the time that takes some process is good? At least as a reference to know if I need to keep working heavily on performance or not.
Furthermore, I would like to know if it is normal that in the first 2 minutes of ETL process it load 2M rows. I calculate how long will take to load all the rows. The expected result is 6 hours, but then the performance decrease and it takes 17h.
I have been investigating in goole and I do not find any time references neither any explanations about performance.
Divide and conquer, and proceed by elimination.
First, add a LIMIT to your query so it takes 10 minutes instead of 17 hours, this will make it a lot easier to try different things.
Are the processes running on different machines? If so, measure network bandwidth utilization to make sure it isn't a bottleneck. Transfer a huge file, make sure the bandwidth is really there.
Are the processes running on the same machine? Maybe one is starving the other for IO. Are source and destination the same hard drive? Different hard drives? SSDs? You need to explain...
Examine IO and CPU usage of both processes. Does one process max out one cpu core?
Does a process max out one of the disks? Check iowait, iops, IO bandwidth, etc.
How many columns? Two INTs, 500 FLOATs, or a huge BLOB with a 12 megabyte PDF in each row? Performance would vary between these cases...
Now, I will assume the problem is on the POSTGRES side.
Create a dummy table, identical to your target table, which has:
Exact same columns (CREATE TABLE dummy LIKE table)
No indexes, No constraints (I think it is the default, double check the created table)
BEFORE INSERT trigger on it which returns NULL and drop the row.
The rows will be processed, just not inserted.
Is it fast now? OK, so the problem was insertion.
Do it again, but this time using an UNLOGGED TABLE (or a TEMPORARY TABLE). These do not have any crash-resistance because they don't use the journal, but for importing data it's OK.... if it crashes during the insert you're gonna wipe it out and restart anyway.
Still No indexes, No constraints. Is it fast?
If slow => IO write bandwidth issue, possibly caused by something else hitting the disks
If fast => IO is OK, problem not found yet!
With the table loaded with data, add indexes and constraints one by one, find out if you got, say, a CHECK that uses a slow SQL function, or a FK into a table which has no index, that kind of stuff. Just check how long it takes to create the constraint.
Note: on an import like this you would normally add indices and constraints after the import.
My gut feeling is that PG is checkpointing like crazy due to the large volume of data, due to too-low checkpointing settings in the config. Or some issue like that, probably random IO writes related. You put the WAL on a fast SSD, right?
17H is too much. Far too much. For 200 Million rows, 6 hours is even a lot.
Hints for optimization:
Check the memory size: edit the spoon.bat, find the line containing -Xmx and change it to half your machine memory size. Details varies with java version. Example for PDI V7.1.
Check if the query from the source database is not too long (because too complex, or server memory size, or ?).
Check the target commit size (try 25000 for PostgresSQL), the Use batch update for inserts in on, and also that the index and constraints are disabled.
Play with the Enable lazy conversion in the Table input. Warning, you may produce difficult to identify and debug errors due to data casting.
In the transformation property you can tune the Nr of rows in rowset (click anywhere, select Property, then the tab Miscelaneous). On the same tab check the transformation is NOT transactional.

PostgreSQL - Recovery of Functions' code following accidental deletion of data files

So, I am (well... I was) running PostgreSQL within a container (Ubuntu 14.04LTS with all the recent updates, back-end storage is "dir" because of convince).
To cut the long story short, the container folder got deleted. Following the use of extundelete and ext4magic, I have managed to extract some of the database physical files (it appears as if most of the files are there... but not 100% sure if and what is missing).
I have two copies of the database files. One from 9.5.3 (which appears to be more complete) and one from 9.6 (I upgraded the container very recently to 9.6, however it appears to be missing datafiles).
All I am after is to attempt and extract the SQL code the relates to the user defined functions. Is anyone aware of an approach that I could try?
P.S.: Last backup is a bit dated (due to bad practices really) so it would be last resort if the task of extracting the needed information is "reasonable" and "successful".
Regards,
G
Update - 20/4/2017
I was hoping for a "quick fix" by somehow extracting the function body text off the recovered data files... however, nothing's free in this life :)
Starting from the old-ish backup along with the recovered logs, we managed to cover a lot of ground into bringing the DB back to life.
Lessons learned:
1. Do implement a good backup/restore strategy
2. Do not store backups on the same physical machine
3. Hardware failure can be disruptive... Human error can be disastrous!
If you can reconstruct enough of a data directory to start postgres in single user mode you might be able to dump pg_proc. But this seems unlikely.
Otherwise, if you're really lucky you'll be able to find the relation for pg_proc and its corresponding pg_toast relation. The latter will often contain compressed text, so searches for parts of variables you know appear in function bodies may not help you out.
Anything stored inline in pg_proc will be short functions, significantly less than 8k long. Everything else will be in the toast relation.
To decode that you have to unpack the pages to get the toast hunks, then reassemble them and uncompress them (if compressed).
If I had to do this, I would probably create a table with the exact same schema as pg_proc in a new postgres instance of the same version. I would then find the relfilenode(s) for pg_catalog.pg_proc and its toast table using the relfilenode map file (if it survived) or by pattern matching and guesswork. I would replace the empty relation files for the new table I created with the recovered ones, restart postgres, and if I was right, I'd be able to select from the tables.
Not easy.
I suggest reading up on postgres's storage format as you'll need to understand it.
You may consider https://www.postgresql.org/support/professional_support/ . (Disclaimer, I work for one of the listed companies).
P.S.: Last backup is a bit dated (due to bad practices really) so it would be last resort if the task of extracting the needed information is "reasonable" and "successful".
Backups are your first resort here.
If the 9.5 files are complete and undamaged (or enough so to dump the schema) then simply copying them in place, checking permissions and starting the server will get you going. Don't trust the data though, you'll need to check it all.
Although it is possible to partially recover given damaged files, it's a long complicated process and the fact that you are asking on Stack Overflow probably means it's not for you.

SQL Server 2008 R2 log files filled up the drive

So some of us dev's are starting to take over the management of some of our SQL Server boxes as we upgrade to SQL Server 2008 R2. In the past, we've manually reduced the log file sizes by using
USE [databaseName]
GO
DBCC SHRINKFILE('databaseName_log', 1)
BACKUP LOG databaseName WITH TRUNCATE_ONLY
DBCC SHRINKFILE('databaseName_log', 1)
and I'm sure you all know how the truncate only has been deprecated.
So the solutions that I've found so far are setting the recovery = simple, then shrink, then set it back... however, this one got away from us before we could get there.
Now we've got a full disk, and the mirroring that is going on is stuck in a half-completed, constantly erroring state where we can't alter any databases. We can't even open half of them in object explorer.
So from reading about it, the way around this happening in the future is to have a maintenance plan set up. (whoops. :/ ) but while we can create one, we can't start it with no disk space and SQL Server stuck in its erroring state (event viewer is showing it recording errors about 5 per second... this has been going on since last night.)
Anyone have any experience with this?
So you've kind of got a perfect storm of bad circumstances here in that you've already reached the point where SQL Server is unable to start. Normally at this point it's necessary to detach a database and move it to free space, but if you're unable to do that you're going to have to start breaking things and rebuilding.
If you have a mirror and a backup that is up to date, you're going to need to blast one unlucky database on the disk to get the instance back online. Once you have enough space, then take emergency measures to break any mirrors necessary to get the log files back to a manageable size and get them shrunk.
The above is very much emergency recovery and you've got to triple check that you have backups, transaction log backups, and logs anywhere you can so you don't lose any data.
Long term to manage the mirror you need to make sure that your mirrors are remaining synchronized, that full and transaction log backups are being taken, and potentially reconfiguring each database on the instance with a maximum file size where the sum of all log files does not exceed the available volume space.
Also, I would double check that your system databases are not on the same volume as your database data and log files. That should help with being able to start the instance when you have a full volume somewhere.
Bear in mind, if you are having to shrink your log files on a regular basis then there's already a problem that needs to be addressed.
Update: If everything is on the C: drive then consider reducing the size of the page file to get enough space to online the instance. Not sure what your setup is here.

Firebird database keeps getting larger

This one is interesting to me - despite the almost inane title. I have used Firebird for a long time, but not until recently noticed an interesting behavior.
I am using embedded Firebird 1.5, and noticed that if I stuff the database full of blobs (lets say 10mb worth), the size of the database increases. I can then delete all the fields in the database, and the file size of the DB remains at its expanded size. Currently it is at 20mb and is completely empty.
I know that Firebird has this built into its architecture (for quick indexing, speed issues etc), but I always thought it would decrease back down to its original ~2mb default.
Does anyone have any suggestions to 'deflate' the file size? The reason being is that this is a space conscious issue. If I had tons of space to work with, I wouldn't care. However that is not the case, and I need things to be as optimal as possible
The only way to free unused space in a firebird database is to do a backup then an immediate restore of that backup (Reference: Firebird FAQ).
Here is a good technical explanation of why this is so.
Note that Firebird will reuse the currently unused space - ie. if you put another 10MB of blobs in now, the database should not grow to 30MB.