Harness the Power of CVS for Your Site
Now Where did I Put that File?
In your travels on the Internet, you may have come across the acronym CVS, which is used with a kind of fanaticism by software developers who work on Open Source projects.
If you've been too shy to ask, CVS stands for Concurrent Versions System (not to be confused with CSV - the Comma Separated Value format for data files), and is a tool for controlling the version/revision of any type of file. The basic idea behind CVS is that there's a central place (or repository) where all revisions of a file are stored, along with comments on what they are and what changed with each new version. The repository can then be accessed from more or less anywhere you like -- whether it's your PC at home, the one at the office, or by another developer working on the other side of the world. So CVS ensures that everyone involved with the project always works on the latest version of a file.
Why should this interest you? If you've recently started making your own Web pages or your first PHP scripts, just for fun, the idea of version/revision control may seem like overkill. But if you do anything more than dabble in Website building, eventually one or more of the following things will happen to you:
You work on two computers. You do a whole load of work on a Web page on the first and add it to your site. Then later, you do a whole load more work on the other computer, and add it to the site, only to discover that you started with an old version of the page on the second machine, so all the work you did earlier on the first computer is lost.
You make some major changes to a Web page and add it to your site, only to discover that there's a major bug -- your site grinds to a halt. Desperately you scramble for the previous working version of the page to no avail; you've over-written it. As if on cue, the phone starts ringing with the first of many angry customers...
You overhaul your site, making changes to more or less anything and everything, and it takes you three hours (during which time your site is down) to update your site with all the new files.
You start to collaborate with some other site developers and instantly work grinds to a halt as you spend 95% of your time just keeping up with all the changes other people are making.
An experiment with the rm -r * command in Linux, or a slip of the mouse in Windows Explorer, goes horribly wrong, and you erase your entire site from your Web server. The copy on your own PC is out of date and your Web host says, "Sure, we'll restore your site from the backup. Tomorrow."
CVS can help prevent any of these scenarios from happening. And even better news: it's free (Open Source)! So whether you're a graphic designer or writing the latest DB abstraction class in PHP, hopefully this article will help de-mystify CVS and show you how to get the most from it. CVS is a big subject, so we'll have to stay away from the fine print to avoid getting bogged down, but you'll find relevant links to further reading at all times.
So here's tonight's billing:
- Absolute beginners: the basic concepts behind CVS
- Hugging the tree: Getting down to business by checking out our first CVS repository
- SitePoint members hit Sourceforge: HarryF and Glenplake run amok with CVS in the Open Source community
- Wrap up: some final words and pointers to useful documentation, such as setting up your own CVS server on Windows (it's not just *Nix types who can do it)
Absolute Beginners
If you've worked in any distributed computing environment (and, as you're reading this Web page, you have), you've probably come across version control in some form or other. It may be that you've tried to update a Word document from the central NT server at work, only to be told it's available in read-only format because someone's working on it. Or perhaps you've had to deal with "record locking" on a database and had some bad experiences where two of your site administrators updated the same article at the same time. One way or another, you've probably come across version/revision control in some form and wished you could find a better way to handle it. Well now you can: with CVS.
Essentially, CVS provides two functions, record keeping and collaboration, in a manner designed to solve all your headaches. The first thing that makes CVS special is that it doesn't lock records. Instead, it keeps track of who's doing what, allowing everyone to work in their own style, but watches for potential conflicts, where two people try to update the same record with their own individual revisions.
But before we go any further, it's time to introduce some terminology we'll need to use if this explanation is to make sense. You'll see these terms used all over the Internet, wherever CVS is concerned.
Record: I'll be using this here to refer to any "object" you're working with, be it a .gif image, a Shockwave file, a PHP script, a Word document ...or whatever you want.
Revision: a change to a record (or group of records) while "work is in progress". For instance, you create a PHP script. Later, you come back and make some changes to it, thereby creating the next revision of the script.
Repository: this is the "mother ship" of the project you're working on. All your work is stored here and fully tracked with revision history, allowing you to check out the latest records at a moment's notice. Say you get a new client, "Buildmysite Inc.", who wants you to build their Website. You'd create a repository called "buildmysite" where all the HTML pages, images, PHP scripts, documentation, and queries for creating their MySQL database would be stored.
Working copy: refers to any record that you (or another developer) are working on. A working copy has been checked out of the repository and sits on your own hard disk while you edit it with Photoshop, Ultradev, etc.
Check out: what you have to do to get a working copy of a record from the repository. Once your repository is created and all the work you've done has been stored in it, you begin every working day by checking out the latest revision so you can work on it.
Commit (aka check in): once you've been working on a record and have done enough for the time being (yes! Time for a coffee break!), you commit the record back into the repository so that the latest revision is stored there. Another developer who later checks out the record gets the latest revision to work with.
Log message: the comment you supply every time you commit a record; usually a sentence or two that describes what you've done. This log message is then available for general viewing, so that everyone can see what's changed.
Update: this updates your working copy with all the latest changes from the repository. For example, you start your day by checking out a newly created record to your computer, and work solidly on the HTML template till twelve noon. Your good buddy Bob then calls to say he's made some changes to the CSS file and committed them to the repository, and he thinks Janet might also have altered some of the images. So you quickly perform an update to bring your working copy in line with everyone else's work. Once you've seen how your HTML template looks with all the new work, you check it in to the repository.
Conflict: let's say that Bob and Janet check out the same PHP script, each unaware that the other is working on the same file at the same time. After making some changes, Janet commits her work to the repository; no problem. Bob then tries to commit his changes, but is about to overwrite the work Janet has done. CVS spots the problem and warns Bob of conflicts. It's then down to Bob and Janet to work out the best way to combine their work. That may seem like an odd way to handle this situation, but it's one of the things that makes CVS powerful. On big projects with many developers, or projects where the developers must collaborate over the Internet, this approach allows for far more flexibility.
Also, if the record you're working with is in text form (such as a Web page or a PHP script), it's possible for two people to work on the same record in different sections (without conflicting with one another). They can then commit their respective changes to the repository, where CVS will merge the second part with the first, to produce the latest whole revision.
Tree: Refers to everything stored under CVS. There could be multiple repositories stored on a given CVS server. The entire structure is referred to as a tree (like any directory structure on a hard disk).
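If it helps to see the check out, commit, update and conflict ideas in motion, here's a minimal sketch in Python. It's a toy model and not how CVS itself is implemented: the names Repository, WorkingCopy and Conflict are made up for illustration, a record is just a list of lines, and the merge assumes that people only edit existing lines in place (no lines added or removed).

```python
# Toy model of the CVS workflow. All names are hypothetical, not CVS's own API.

class Conflict(Exception):
    """Raised when work cannot be combined automatically."""


class Repository:
    def __init__(self, lines):
        self.revisions = [list(lines)]  # revision history, oldest first
        self.log = ["initial import"]   # one log message per revision

    def checkout(self):
        """Hand out a working copy based on the latest revision."""
        return WorkingCopy(self, len(self.revisions) - 1)

    def commit(self, copy, message):
        """Store a new revision, unless the working copy is out of date."""
        if copy.base != len(self.revisions) - 1:
            raise Conflict("working copy is out of date; update first")
        self.revisions.append(list(copy.lines))
        self.log.append(message)
        copy.base = len(self.revisions) - 1


class WorkingCopy:
    def __init__(self, repo, base):
        self.repo = repo
        self.base = base                         # revision this copy started from
        self.lines = list(repo.revisions[base])  # private, editable copy

    def update(self):
        """Merge the latest revision into this copy, line by line.

        Assumption: every revision has the same number of lines.
        """
        old = self.repo.revisions[self.base]
        new = self.repo.revisions[-1]
        merged = []
        for ancestor, theirs, mine in zip(old, new, self.lines):
            if mine == ancestor:
                merged.append(theirs)   # only the repository changed this line
            elif theirs == ancestor or theirs == mine:
                merged.append(mine)     # only we changed it (or both agree)
            else:
                raise Conflict(f"both changed the line {ancestor!r}")
        self.lines = merged
        self.base = len(self.repo.revisions) - 1


repo = Repository(["<h1>Home</h1>", "<p>Welcome</p>", "<p>Contact</p>"])
janet = repo.checkout()
bob = repo.checkout()

janet.lines[0] = "<h1>Buildmysite Inc.</h1>"
repo.commit(janet, "New page heading")

bob.lines[2] = "<p>Contact us</p>"
try:
    repo.commit(bob, "Reword contact line")
except Conflict as warning:
    print("Commit refused:", warning)

bob.update()  # Janet's heading is merged in; Bob's own edit survives
repo.commit(bob, "Reword contact line")
print(repo.revisions[-1])
print(repo.log)
```

Because Janet and Bob touched different lines, the update combines both sets of changes without any fuss. Had they both rewritten the same line in different ways, update would raise Conflict instead, leaving the two of them to sort it out between themselves.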
You may have already realised that CVS operates on a "client-server" basis. For instance, you have a server where your CVS repository is stored (this is usually a machine with plenty of disk space and a fast network connection, which could be running some flavour of Unix like Linux, or a version of Windows NT). You access the server over the network using a client, which you run on your own workstation (there's client support for more or less every operating system: Unix, Windows and Macintosh). This makes CVS extremely powerful -- if your CVS server is connected to the Internet, you can check out the latest version of your project to anywhere in the world!
OK - those are the basic terms and concepts you'll need to get into CVS. Don't panic! You need to know what they are, but we'll be showing you CVS in action in a moment, which should make things a bit clearer.
Harry has been working in corporate IT since 1994, with everything from start-ups to Fortune 100 companies. Outside of office hours he runs