GIT vs. SVN for single user repository

I recently started a discussion on Buzz about this topic, and it drew some interesting comments, so I am writing it up as a post. The requirements are the following:

  • Personal Repository
  • Single User
  • Housing all personal source
  • Multiple language projects

The debate came from deciding where to host the code: GitHub (Git) or Google Code (SVN / Mercurial). I am familiar with both Git and SVN, less so with Mercurial. I like SVN for a single-user repository, and I really want to stay on Google products if possible. The usual arguments against Git are the learning time and the interfaces / ease of use. With TortoiseGit the ease of use is very similar to SVN, and I already know Git, so neither is a concern for me.

This is how the discussion progressed. I made only very minor edits and left out the parts that were irrelevant. I should also note that I respect each one of these individuals.

Andrew van der Westhuizen: I like GIT, but I want to have a large repo for everything as a backup… not sure Git is suited for that. With SVN you can check out from within the repo structure. I haven’t been able to get that right with Git, and I’m not sure it can do it.

Andrew Symington: I’d go for git. In my experience its integration with diff makes it really good for generating patches. Git also doesn’t seem to freak out so much on commits.

If you’re used to SVN, this is your friend: http://git.or.cz/course/svn.html
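For anyone who has not used Git’s patch support, here is a minimal sketch of the kind of thing Andrew means. The file name, contents, and identity below are made up for illustration, and the whole thing runs in a throwaway directory.

```shell
#!/bin/sh
# Sketch: save an uncommitted change as a patch, discard it, then re-apply it.
# notes.txt, change.patch and the user identity are hypothetical placeholders.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'line one\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Make an edit and capture it as a unified diff.
printf 'line two\n' >> notes.txt
git diff > change.patch

# Throw the edit away, then bring it back from the patch file.
git checkout -- notes.txt
git apply change.patch

cat notes.txt
```

`git format-patch` does the same job for whole commits, including the commit message and author.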

Fritz Meissner: I guess it depends what sort of single user coding you’re doing. My MSc project is “single user” but I’m doing releases to a test server that people are actually using, while writing new stuff on my local machine. At the other end of the spectrum is my project euler stuff that just sits on my local machine without any backing up at all.

I guess in your case I might do a dropbox (or MozyHome, 2GB free) purely for backing up.

Dave Jacka: I would say that VC has massive value even if it is only you using it and even if it is only stored on your local drive (ie not really a backup). The advantage is that you have a complete history for what you have built. You can feel free to change things and try things because you never lose what you had before.

I also find looking through a VC history less confusing than browsing a bunch of folders on a file system, especially when the folder structure has grown organically over time. VC changesets almost always group functionality together.

Fritz Meissner: Meh, if you’re gna do VC it’s just not worth not being distributed, and not really for the distributed reasons. Mercurial or GIT ftw. Being on any branch any time you want is too huge a win to ignore.

Dave Jacka: To be honest I really haven’t used git or mercurial much. I don’t really understand why I should. What do you mean by being on any branch any time you want?

Andrew van der Westhuizen: In SVN terms,

You can have six copies of the same folder on your computer (“checkouts”), and each one can push (commit) changes to and pull (update) changes from the others.

Essentially, in Fritz’s terms… you can work on any branch when you want… each folder is a branch.

Dave Jacka: Sorry yeah I see what you mean about the branches. Well I understand the concept. Just not that clear on why you would want that in your case. I mean how many different repos would you actually have? Do they spend large amounts of time disconnected from one another? Surely if it is just you then syncing the repos becomes a headache?

Ok, there is a clear advantage in that you have a full repository on more than one machine – more robust to hard drive failures.

Fritz Meissner: Even if you have a repository on your local machine and no other machines at all, branching is so quick and easy you would never not create a branch for a new feature. The workflow is like so:

1. create branch A
2. Do some work on feature A
3. Get bored of feature A. Commit changes so far to branch A.
4. Create branch B from master
5. Build feature B and commit it.
6. Checkout master and merge with B.
7-89. Build some other stuff
90. Come back to A many years later, commit it.
91. Merge master with A.

So you don’t have to work in a strictly linear fashion. You could just forget about A and delete that branch, without having to reverse anything (yes, you can go back to a previous state with SVN, but here choosing which changes to keep over many commits is painless). It gives you the freedom to muck around while still having revision control. Also, at step 91 you could merge A with any of your other features directly.
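The numbered workflow above can be sketched as a script. This is only an illustration in a throwaway repository: the file names, commit messages, and identity are invented, and the branch names simply follow Fritz’s A and B.

```shell
#!/bin/sh
# Sketch of the feature-branch workflow in a temporary repository.
# File names, commit messages and the user identity are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Steps 1-3: start feature A, get bored, park the unfinished work on its branch.
git checkout -q -b branch_a
echo "half of feature A" > feature_a.txt
git add feature_a.txt
git commit -q -m "Feature A, work in progress"

# Steps 4-5: create branch B from master and finish feature B.
git checkout -q -b branch_b master
echo "feature B" > feature_b.txt
git add feature_b.txt
git commit -q -m "Feature B"

# Step 6: check out master and merge B into it.
git checkout -q master
git merge -q branch_b

# Steps 90-91: come back to A, finish it, and merge it into master.
git checkout -q branch_a
echo "rest of feature A" >> feature_a.txt
git commit -q -a -m "Finish feature A"
git checkout -q master
git merge -q -m "Merge branch_a" branch_a

# Both features are now on master, so the feature branches can be deleted.
git branch -q -d branch_a branch_b
```

Abandoning A instead would just be `git branch -D branch_a` before the last merge; master would never have seen any of it.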

I will be updating this post when progress is made…

EDIT: final words on the topic

Fritz Meissner – Opening disclaimer: the only DVCS I know is Git. It’s not the distributedness of Git that makes it better at this stuff than anything else. If Mercurial is as good (I’ve been given to understand that it is), it’s unrelated to the distributed thing. So the name is misleading.

I guess I assumed that the SVN workflow wouldn’t be the same, because I used branches very little when working with it. If you are using branches, the workflow is similar, as long as you discount the cost of creating a branch. But how often do you choose to create a branch in SVN? Basically never, or failing that, only when your team’s methodology dictates it. You’d rather just work in a linear fashion, and hope you never need to hop randomly back and forth. Here’s why it’s easier to work with branches in Git:

visualise the folder structure

my_code
__my_project
____folder_1
____folder_2
______sub_folder
________sub_sub_folder
__________file_a
__________file_b

Let’s say change A is a change to file_a. From inside sub_sub_folder I type:

git branch branch_a; git checkout branch_a

I edit file_a as I want to, then go:

git add file_a; git commit file_a; git checkout master

The git commands thus far have taken more time to type than to run (even for big projects). I’ve never left sub_sub_folder from my command prompt, and my machine has not needed to connect to the network. I’ve just finished step 3, here’s step 4:

git branch branch_b; git checkout branch_b

Predictable. Note that I type nothing else… no cd back to the project root, no mkdir, no copying untracked config files from sub_folder in branch_a’s hierarchy to branch_b. Branches are switched seamlessly in the same place, untracked files are left where they are while the versioned files move around them. My folder hierarchy at the beginning of step 5 looks exactly the same as it did when I started. Here’s what it would look like if I were working with SVN:

my_code
__my_project
____trunk
______folder_1
______folder_2
________sub_folder
__________sub_sub_folder
____________file_a
____________file_b
____branch_a
______folder_1
______folder_2
________sub_folder
__________sub_sub_folder
____________file_a
____________file_b
____branch_b
etc. too many underscores!

Duplicate copies of every file and folder everywhere! Useless. The commands to actually make the branch I can’t remember for SVN, but the checkout process in SVN actually creates a new copy of every single file on the disk. Big difference for Git is that it stores changes, rather than the state of the entire project. When you checkout a branch Git only has to apply the changes that are different from your current branch. So the actual commands are faster too.

(I’m giving SVN slightly less credit than is due, it does some clever stuff to avoid duplicating files in the actual repository when a new branch is created… but as soon as you create a working copy, it duplicates everything)
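The Git side of this comparison, switching branches in place while untracked files stay put, can be checked with a small sketch. The directory names follow the example tree; `local.cfg`, its contents, and the identity are made-up stand-ins for an untracked config file.

```shell
#!/bin/sh
# Sketch: switch branches without leaving sub_sub_folder and watch an
# untracked file survive. local.cfg and the user identity are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example User"
git config user.email "user@example.com"

mkdir -p folder_2/sub_folder/sub_sub_folder
cd folder_2/sub_folder/sub_sub_folder
echo "original" > file_a
git add file_a
git commit -q -m "Add file_a"

# An untracked file that is never added to the repository.
echo "debug=true" > local.cfg

# Make change A on its own branch, then go back to master.
git checkout -q -b branch_a
echo "change A" >> file_a
git commit -q -m "Change A" file_a
git checkout -q master

cat file_a      # master's version again: original
cat local.cfg   # untouched by the branch switches: debug=true
```

Every command runs from inside `sub_sub_folder`, and nothing touches the network.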

Other things to like about Git:

– Code is compressed as it travels over the network
– No @!@#$# SVN folders everywhere in my directory structure, just a config file and a .git directory in the project root.

I’m sure there are others, I’m still a git noob.

Fritz Meissner – What I should’ve just said at the beginning: http://whygitisbetterthanx.com/#svn

Andrew van der Westhuizen – Fritz, over multiple projects (multiple repositories), each with multiple branches, even at file level, how would you keep track of all the branches you have made and which one does what?

Fritz Meissner – I only keep branches around as long as I haven’t finished the dev on the feature. Once I’m done I merge with my main branch and delete the feature branch. Even if some features hang around unfinished for a long time I can’t see myself needing more than a few branches in existence at the same time on the same project. I just choose branch names that mean stuff to me in the context of the project, although at one stage I was using the ID field from my issue tracking software in the branch name.
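That housekeeping might look roughly like the sketch below. The branch names, including the issue numbers in them, are invented for illustration; the idea is only that Git can report which branches are already merged, so the finished ones are easy to spot and delete.

```shell
#!/bin/sh
# Sketch: name branches after issues, then find and delete the finished ones.
# Branch names, issue IDs, file names and the identity are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# A finished feature: built on its own branch, then merged into master.
git checkout -q -b issue-42-login-form
echo "login" > login.txt
git add login.txt
git commit -q -m "Add login form"
git checkout -q master
git merge -q issue-42-login-form

# An unfinished feature that is still hanging around.
git checkout -q -b issue-57-export
echo "export" > export.txt
git add export.txt
git commit -q -m "Start export feature"
git checkout -q master

# Which branches are done, and which still hold unmerged work?
git branch --merged master
git branch --no-merged master

# Delete the finished one; -d refuses to delete a branch with unmerged work.
git branch -q -d issue-42-login-form
```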