When is it correct to split the database file and the log file in SQL Server? - sql-server-2008-r2

In the coming days I will have a DB with more than 400 GB of data. I would like to know what a good option would be for splitting the database and log files. Also: is it necessary to create distinct filegroups?
Thanks.

This is one of those "it depends" questions: what is the exact issue you are trying to solve here? A 400 GB file is not really a problem until it becomes a problem.
If you're experiencing issues with I/O throughput, then you might get performance improvements by splitting the data into multiple files and putting them on separate drives. Putting the log file on a separate set of drives is also commonly recommended for performance, but if you're not having I/O performance problems, why bother?
There is a lot of talk about best practices for setting up SQL Server, and a few of them are good to follow as a general rule, but if you already have something set up and working and users aren't shouting, why make work for yourself by changing things?
Whenever you make a change like this in SQL Server, make sure you know what problem you are trying to solve, make sure what you're doing is likely to improve or solve that problem, and then take measurements to verify that what you have done has actually improved things.

Related

PostgreSQL Replication Tools

On the PostgreSQL wiki, the "Replication, Clustering, and Connection Pooling" page (http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling) gives the following example of replication requirements:
"Your users take a local copy of the database with them on laptops when they leave the office, make changes while they are away, and need to merge those with the main database when they return. Here you'd want an asynchronous, lazy replication approach, and will be forced to consider how to handle conflicts in cases where the same record has been modified both on the master server and on a local copy"
And that's pretty much my case. But, unfortunately, the same page says: "(...) A great source for this background is in the Postgres-R Terms and Definitions for Database Replication. The main theoretical topic it doesn't mention is how to resolve conflict resolution in lazy replication cases like the laptop situation, which involves voting and similar schemes."
What I want to know is where I can find material on how to resolve this kind of situation, and which would be the best way to do this in PostgreSQL.
I will have to check into RubyRep but it seems like Bucardo might be a more widely supported option.
Gabriel Weinberg has an EXCELLENT tutorial on his site for how he uses Bucardo. The guy runs his own search engine called DuckDuckGo and there are quite a few tips and tricks that are optimized for his use cases.
http://www.gabrielweinberg.com/blog/2011/05/replicating-postgresql-with-bucardo.html
Just answering my own question, if anyone ever finds it: I'm using Rubyrep http://www.rubyrep.org/ and it's working.

Two-way syncing with iPhone and Web service

Yeah, I know there are a couple of questions related to syncing an iPhone with a web DB, but none of them helped me.
I also did a lot of googling, but I rarely found information about two-way syncing. Maybe I just used the wrong keywords.
I'm building an app right now, and I came up with the idea of adding two-way sync between my app and my web service.
My first thought was that it would be ridiculously easy, but it turns out not to be easy at all.
I found a couple of problems and some solutions to them, but I would like to hear from you whether these solutions would create other problems, or whether they are good or bad.
The idea of my app is to help me sync the notes that I take on the go with my iPhone and at work or at home with a web app.
The two ends should always be in sync, because I never know which device (iPhone or computer) I will use to take, edit, or just read my notes.
What I have on both sides:
For my web service (and web app) I will use Rails, and probably MySQL on the DB side.
On the iPhone I will use a SQLite DB with an Objective-C wrapper (FMDB).
Both will exchange data via JSON (using a JSON framework on the iPhone side).
My ideas so far:
The primary key has to be unique on both sides
As a primary key I will use a UUID. It should be unique on both sides and won't create any duplicates (at least I hope).
Revisions for data changes
Each change will be saved as a revision with a SHA1 key, which I will create from the date + note data.
The revision object also includes information like:
date
which note object belongs to this revision
which device the change was made on
what changed? (actually I'm not sure about including this information)
My "solution" so far is to track every modification (create, update, delete) in a history table with revisions on both sides.
On the iPhone side I will first update my history table from the web DB and then commit my changes to the web DB, roughly as sketched below.
This should work, right?
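To make the idea concrete, here is a rough sketch of how I imagine building a revision entry (Python just to illustrate the shape; the field names and the make_revision helper are made up for this question):

```python
import hashlib
import uuid
from datetime import datetime, timezone

def make_revision(note_id, note_body, device, action):
    """Build one row for the history table.

    The revision key is a SHA1 over the timestamp plus the note data,
    as described above. All names here are illustrative only.
    """
    timestamp = datetime.now(timezone.utc).isoformat()
    digest = hashlib.sha1((timestamp + note_body).encode("utf-8")).hexdigest()
    return {
        "revision_id": digest,   # SHA1 of date + note data
        "note_id": note_id,      # UUID primary key shared by both sides
        "timestamp": timestamp,  # when the change happened
        "device": device,        # e.g. "iphone" or "web"
        "action": action,        # "create", "update" or "delete"
    }

# A new note gets a UUID that is safe to generate on either side:
note_id = str(uuid.uuid4())
rev = make_revision(note_id, "Buy milk", device="iphone", action="create")
```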
That doesn't sound too bad to me, but my question here is: how can I handle conflicts? I don't want to bother the user with messages about how to resolve them.
Roundup of my questions:
Is my "solution" good or bad? What should I change to make it better?
How can I handle change conflicts so the user doesn't notice them?
Do you have any resources I could read about two-way syncing?
EDIT:
Thank you all for your answers. I now know that I'm not alone with this "problem" and that there is no simple, one-size-fits-all solution. It seems I'm on the right track with my ideas and solutions so far, and I will try to come up with syncing rules.
My plan so far: I will develop it as simply as possible and use it for my own needs, solving the problems I discover while using and syncing. After that I will invite my friends to test it and solve the problems they run into.
I think this way I can come up with real-world rules for syncing my data with the web, because I will see what people are actually doing and where the problems are.
What do you think?
"It depends."
Everyone loves that line in their answers.
Two-way sync fundamentally boils down to conflict resolution. And only you, as the application designer, can come up with the rules for conflict resolution.
Without conflict, syncing is easy.
One-way syncing is "easy" because it's just like two-way sync, except that the conflict rules always favor one party: "make this look like that." Simple rule.
Fine-grained two-way syncing isn't that hard either: you record the specific changes that are made and when they are made; then, when you sync, you take the log of changes from each party, combine them into a single log, and apply that log to each party, starting from the last time they were in sync.
By specific changes I don't mean "record changed"; that's too coarse. Rather, you want to know that the "lastName" field of the record changed, and that it changed at 01/01/2011 12:23:45.
When party A says lastName changed to "Johnson" at 01/01/2011 12:22:45 and party B says lastName changed to "Smith" at 01/01/2011 12:22:46, then "Smith" is the right answer, since it's the latest.
But wait, did you see what happened there? I just pulled a rule out of thin air: "latest wins". Maybe that doesn't work for you; maybe you have different rules. "It depends."
So, really, it all comes down to the rules. You can make it as fine-grained as you want. There will ALWAYS be conflicts; that's what the rules are for.
So you need to decide what those rules are for your application.
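For illustration, here is a minimal sketch of that "latest wins", field-level merge (Python; the shape of a change entry is an assumption made for this sketch, not a prescribed format):

```python
from datetime import datetime

# Each entry is one field-level change: (record_id, field, value, when).
changes_a = [("note-1", "lastName", "Johnson", datetime(2011, 1, 1, 12, 22, 45))]
changes_b = [("note-1", "lastName", "Smith",   datetime(2011, 1, 1, 12, 22, 46))]

def merge_last_wins(*logs):
    """Combine the change logs from all parties and, for each
    (record, field) pair, keep only the most recent change."""
    combined = sorted((c for log in logs for c in log), key=lambda c: c[3])
    winners = {}
    for record_id, field, value, when in combined:
        winners[(record_id, field)] = (value, when)  # later entries overwrite
    return winners

merged = merge_last_wins(changes_a, changes_b)
print(merged[("note-1", "lastName")][0])  # "Smith" - the 12:22:46 change wins
```

Swap the body of merge_last_wins for whatever rule actually fits your application; the merge skeleton stays the same.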
Actually, I consider that the only real problem in any kind of two-way syncing is conflicts. Really. Take, for example, any version control system (svn, cvs, git, etc.). They resolve conflicts at a finer granularity because they split the file itself and check for line-level conflicts, so changes in two different parts of the file are not treated as conflicts.
However, I suppose this solution would not really be feasible here, because it's a pain to implement :) ...
If you decide to handle conflicts at the level of notes, and not their lines, then at the end of the day you will probably need some business rule that defines what happens when changes conflict.
Possibilities:
Use the last change and override the older one. This is easy.
Do what Dropbox does: I've seen it a couple of times when we changed the same document on multiple machines; it creates multiple files, appending a suffix to let users know about the changes from the different machines. You could easily do something like this with notes as well (see the sketch below).
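A tiny sketch of that conflict-copy approach (Python; the naming convention is just illustrative, not what Dropbox actually produces):

```python
from datetime import date

def conflict_copy_title(title, device, day=None):
    """Rename the losing side of a conflict instead of discarding it,
    so both versions survive and the user can merge them later."""
    day = day or date.today()
    return f"{title} (conflicted copy from {device}, {day.isoformat()})"

# If "Buy milk" was edited on both sides, keep one copy as-is
# and store the other under a new title:
print(conflict_copy_title("Buy milk", "iPhone", date(2011, 5, 1)))
# -> Buy milk (conflicted copy from iPhone, 2011-05-01)
```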
I'm not sure I've helped, though ...
Moszi

How to implement long-term statistics and a short-term log?

We are developing a large database web application with Perl Catalyst and PostgreSQL under Linux. Users can log in and upload and download data files (scientific measurements).
I wonder how to implement a logging/statistics system.
We need to view general access trends, and we want to analyze the traffic caused by certain users/IPs and get access numbers for certain files or topics. I was thinking about something like RRDtool to implement this, or writing the totals to another database table. It would be nice to get some visual graphs from the access data :-)
Additionally, we need to analyze the activity over the last few days in detail. If problems or attacks occur, they must be understood and undone. IMO this needs an action log in a database table.
Can you give me some inspiration on how to implement these things? I would love to use the same system for both logging and long-term statistics. Maybe we could aggregate the log data after a period of, e.g., 7 days (roughly as sketched below). It's not that I have no idea how to do it, but I'd like to hear opinions from somebody else.
Hints about useful CPAN modules are appreciated. We know and already use Log4perl, but it is a bit too detailed to store for ~7 days...
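To show what I mean by aggregating, here is a rough sketch of the roll-up step (Python/SQLite purely to keep the sketch self-contained; our real stack is Perl/PostgreSQL, and the table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Detailed action log, one row per request (kept for ~7 days).
    CREATE TABLE action_log (
        ts TEXT, user_id TEXT, ip TEXT, file_id TEXT, action TEXT
    );
    -- Long-term statistics, one row per day and file.
    CREATE TABLE daily_stats (
        day TEXT, file_id TEXT, downloads INTEGER,
        PRIMARY KEY (day, file_id)
    );
""")

def roll_up(conn):
    """Fold detail rows older than 7 days into daily totals,
    then delete them from the detail table."""
    conn.execute("""
        INSERT OR REPLACE INTO daily_stats (day, file_id, downloads)
        SELECT date(ts), file_id, COUNT(*)
        FROM action_log
        WHERE date(ts) < date('now', '-7 days') AND action = 'download'
        GROUP BY date(ts), file_id
    """)
    conn.execute("DELETE FROM action_log WHERE date(ts) < date('now', '-7 days')")
    conn.commit()
```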
Actually, I think you answered this yourself: RRDtool is pretty good for the long term. I use it for half-hourly automatic meter readings for a communal boiler system, with a 3-year window. Nice graphs, too.
However, I'm assuming that all this runs under a web server and that the uploads and downloads generate (for example) Apache logfile entries; in that case you have a great many options: http://httpd.apache.org/docs/current/mod/mod_log_config.html.
That would mean you could use Webalizer for 'routine' reports and roll your own for the detail, maybe starting from: http://search.cpan.org/~ulpfr/Logfile-0.302/Logfile.pod
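If you do roll your own, the detail analysis is mostly just grouping parsed log lines. A minimal sketch (Python rather than Perl, purely to illustrate; the regex assumes Apache's common log format):

```python
import re
from collections import Counter

# Apache common log format: host ident user [time] "request" status bytes
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d+) \S+'
)

def count_downloads(lines):
    """Count successful GETs per (ip, path) from access-log lines."""
    hits = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and m.group("method") == "GET" and m.group("status") == "200":
            hits[(m.group("ip"), m.group("path"))] += 1
    return hits

sample = ['1.2.3.4 - alice [01/May/2011:10:00:00 +0000] '
          '"GET /data/run42.csv HTTP/1.1" 200 1234']
print(count_downloads(sample))  # Counter({('1.2.3.4', '/data/run42.csv'): 1})
```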
Hope that's a little bit helpful, it's a broad, broad question though.

Need to implement versioning in an online backup tool

I am working on the development of an application that performs online backup of the files and folders on a PC, automatically or manually. Until now I kept only the latest version of each file on the server. Now I have to implement versioning, so that only the changes are transferred to the online server, and the user must be able to download any of the available versions of a file from the backup server.
I need to perform deduplication for this. I am able to do it using a fixed block size, but I face the overhead of transferring a file of CRC information with each version's backup.
I have never worked with this technology before, so I lack experience. I am eager to know whether there is a feasible method to embed this functionality in the application without much pain. Would any third-party tool help to do the same thing? Please let me know.
Note: I am using the FTP protocol to transfer the data.
There's a program called dump that does something similar, but it operates on filesystem blocks rather than files. rsync may also be of interest.
You will need to keep track of a large number of blocks, each with multiple versions, and how they fit into the various versions of the original files, so you will need some kind of database to track this information, plus an efficient way to query it to determine which blocks of a given file need to be transferred. Also note that adding something to the beginning of a file will cause all your blocks to be "new" if you use a naive fixed-size blocking and diff scheme.
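The usual way around that last problem is content-defined chunking: cut chunks wherever a rolling hash over the last few bytes matches a fixed pattern, so boundaries move with the content and an insertion only disturbs nearby chunks. A rough sketch (Python; the window size, mask, and simple Rabin-Karp hash are illustrative choices, not what dump or rsync actually use):

```python
import hashlib

WINDOW = 48            # bytes in the rolling window
MASK = (1 << 13) - 1   # cut pattern; gives ~8 KiB average chunks
BASE, MOD = 257, (1 << 61) - 1

def chunk(data):
    """Split bytes at content-defined boundaries using a polynomial
    rolling hash, so a prefix insertion does not shift every chunk."""
    chunks, start, h = [], 0, 0
    drop = pow(BASE, WINDOW - 1, MOD)  # factor of the oldest byte
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD  # slide window forward
        h = (h * BASE + byte) % MOD                  # take in the new byte
        if i - start + 1 >= WINDOW and (h & MASK) == MASK:
            chunks.append(data[start:i + 1])         # boundary found: cut here
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])                  # final partial chunk
    return chunks

# Deduplicate: only chunks whose SHA1 the server hasn't seen get uploaded.
store = {}
for c in chunk(b"example file contents " * 2000):
    store.setdefault(hashlib.sha1(c).hexdigest(), c)
```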
Doing this well will be very complex. I highly recommend that you thoroughly research the already-available solutions, and if you decide you need to write your own, study the benefits of their designs carefully.

Log messages for revision control by yourself

I use version control extensively. When I'm working by myself, I still use it, and find many good things about it. I know I'm 'supposed' to put in good messages etc, but find that usually the date of a commit and all the tools for checking diffs etc are enough. I often end up putting in junk messages like 'changes'.
I guess this is a weird question, but what do others use as their log messages when they're making commits in repositories that only they are using? Is there any problem with not leaving messages?
I happen to use git, but this question is more general.
For me it depends on the nature of the fix. Sometimes it's just one word: "backup", or "copy changes". However, if something caused me a lot of grief, I'll document my changes a lot more extensively. If it is open source and I won't be there all that long, I document my changes very extensively: svn diff, and then document all my changes that way... :)
Bug fixes that are identified by a number in another system need to be in the change log.
I'll grant you that "Fixed bug" isn't very good in the change log, but if it's a simple bug then maybe that will do.
I don't think there is a hard and fast rule, but your entry should be proportional to the amount of time you spent on the code. A copy change? A spelling mistake? Not much of a message needed.
Did you spend 2 hours fixing a bug? Yep! Long commit message.
I'm a solo developer using version control as well. I recently started using an issue tracking system that monitors the messages, so mine have gotten better and at least reference an issue number when there is one. The rest of the time, I try to at least generally state what areas changed in a short sentence or two.
But every once in a while, I still get lazy (or am half asleep) and type in things like "fixed a bug".
You should put in messages as meaningful as those you would put into a codebase with multiple developers. There's usually little difference between someone else looking at your changes in a couple of days and you looking at them 12 months down the track. There's a good chance, in both situations, that the person looking will have no idea why the change was made :-)
I even go so far as to use proper change control, even for the stuff I do solo. That means every change to the code base has to have either a change request or a bug report (with full documentation).
That makes my life a lot easier when I need to understand why something was done. I've got better uses for my "wetware" than trying to remember every little change and why it was done. Far better to let the machine remember it - its memory is so much better.
And, in my opinion, if you can't be bothered doing it right, don't do it at all. Just revert to the cowboy-coder mentality and save yourself some effort.
Doing it right doesn't take that much extra effort and the rewards are substantial. It all comes down to a cost/benefit analysis.