Kevin Hatfield's Blog

Kevin's blurry train of thought……

Archive for March, 2009

Fast File Copy – Linux!

Tuesday, March 31st, 2009

If you’ve ever had to move a huge directory containing many files from one server to another, you may have encountered a situation where the copy rate was significantly less than what you’d expect your network could support. Rsync does a fantastic job of quickly syncing two relatively similar directory structures, but the initial clone can take quite a while, especially as the file count increases.

The problem is that there is a certain amount of per-file overhead when using scp or rsync to copy files from one machine to the other. This is not a problem under most circumstances, but if you are attempting to duplicate tens of thousands of files (think, server or database backup), this per-file overhead can really add up. The solution is to copy the files over in a single stream, which normally means tarring them up on one server, copying the tarball, then untarring on the destination. Unless you are under 50% disk utilization on the source server, this could cause you to run out of space.
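As a rough way to see whether the tarball approach is even an option, you could compare the size of the directory against the free space on its filesystem. This is only a sketch: the function name has_room_for_tarball is made up for illustration, and it assumes the tarball would be written to the same filesystem as the source directory and that compression saves nothing (the worst case).

```shell
# Return 0 if the filesystem holding "$1" has at least as many free
# kilobytes as the directory currently occupies. Worst case assumed:
# the tarball ends up roughly as large as the files it contains.
has_room_for_tarball() {
    dir=$1
    # Total size of the directory tree, in 1K blocks.
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    # Available space on the filesystem that holds it (POSIX df format,
    # fourth column of the second line).
    free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$free_kb" -ge "$used_kb" ]
}
```

Something like `has_room_for_tarball /source/dir || echo "stream it instead"` would then tell you when to skip the intermediate tarball.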

Brett Jones has an alternative solution, which uses the handy netcat utility:

After clearing up 10 GBs of log files, we were left with hundreds of thousands of small files that were going to slow us down. We couldn’t tarball the file because of a lack of space on the source server. I started searching around and found this nifty tip that takes out encryption and streams all the files as one large file:

This requires netcat on both servers.

Destination box: nc -l -p 2342 | tar -C /target/dir -xzf -
Source box: tar -czf - /source/dir | nc Target_Box 2342

This causes the source machine to tar the files up and send them over the netcat pipe, where they are extracted on the destination machine, all with no per-file negotiation or unnecessary disk space used. It’s also faster than the usual scp or rsync over ssh because there is no encryption overhead. If you are on a local protected network, this will perform much better, even for large single-file copies.
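If you want to convince yourself of what the pipe is doing before pointing it at a real server, the same tar-to-tar stream can be tried on one machine with the netcat hop left out. The sketch below uses two throwaway temp directories as stand-ins for the source and target directories; nothing in it touches the network.

```shell
# Throwaway source and destination trees (hypothetical stand-ins for
# /source/dir and /target/dir).
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/logs"
echo "first"  > "$src/a.txt"
echo "second" > "$src/logs/b.txt"

# Same pipeline shape as the netcat version, minus the network hop:
# the first tar writes one compressed stream to stdout, the second tar
# unpacks it from stdin. No intermediate tarball is written to disk.
tar -czf - -C "$src" . | tar -xzf - -C "$dst"

# Compare the two trees; diff -r exits 0 only when they match.
diff -r "$src" "$dst" && echo "trees match"
```

Putting nc between the two tar commands, as in the tip above, only changes where the stream travels, not how it is packed or unpacked.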

If you are on an unprotected network, however, you may still want your data encrypted in transit. You can perform about the same task over ssh:

Run this on the destination machine:
cd /path/to/extract/to/
ssh user@source.server 'tar -czf - -C /source/path/ .' | tar -zxvf -

This command will issue the tar command across the network on the source machine, causing tar’s stdout to be sent back over the network. This is then piped to stdin on the destination machine and the files magically appear in the directory you are currently in.
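If you do this often, the ssh pipeline could be wrapped in a small helper. This is a sketch, not a polished tool: the function name pull_tree and its argument order are invented here, and it assumes the remote path contains no single quotes.

```shell
# pull_tree HOST REMOTE_DIR LOCAL_DIR
# Streams REMOTE_DIR from HOST over ssh and unpacks it into LOCAL_DIR.
# Archiving "." after -C avoids having the remote shell expand a glob
# in the login directory, and it also picks up dotfiles.
pull_tree() {
    host=$1 remote_dir=$2 local_dir=$3
    # Create the destination first, then run the remote tar and unpack
    # its stdout locally.
    mkdir -p "$local_dir" &&
    ssh "$host" "tar -czf - -C '$remote_dir' ." | tar -xzf - -C "$local_dir"
}
```

A call would look like `pull_tree user@source.server /source/path /path/to/extract/to`, run on the destination machine.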

The ssh route is a little slower than using netcat, due to the encryption overhead, but it’s still way faster than scping the files individually. It also has the added advantage of potentially being compatible with Windows servers, provided you have a few of the Unix tools like ssh and tar installed on your Windows server (using the Cygwin-linked binaries that are available).

GiftStick – give the gift of Free Software – distributing flash drives with useful open-source desktop applications

Tuesday, March 31st, 2009


Anti Virus

The first thing that needs to be done on many machines is to run a virus and adware scan. ClamAV is a solid open source virus scanner that’s used by server administrators to scan and filter email coming in and out of corporate networks. There’s a derivative of this program, ClamWin, that adds a desktop interface to the package, and you can use this to effectively scan and clean a Windows box before installing anything else.

There’s also a Mac version called ClamXav which I’ve included a link to for completeness. At this point in time I haven’t really experienced any adware or virus issues in OS X, but the download is there if you need it.

On a side note, I’ve noticed so many problems on relatives’ machines that are directly related to commercial virus scanning utilities. I’ve never installed these, so I don’t know if it’s a misconfiguration issue or that the hardware/OS combination isn’t speedy enough to handle on-the-fly scanning of everything that’s going on, but my first move is usually to uninstall these programs. More often than not, performance problems seem to be the direct result of a virus scanner. This is my opinion only, but I’d recommend removing any existing anti-virus software, or at least disabling the real-time scanning features before installing ClamWin. Better to use a safe browser and scan downloads manually.

Safe Browsing

One of the best things you can do for Windows users is to install Firefox. It provides a more secure browsing experience and lessens the exposure to adware, popups, and viruses, without damaging the user experience with so many warnings you can’t differentiate problems from normal behavior. It’s a solid install for Mac users as well, since there are a number of Firefox add-ons that provide additional security features for your browsing experience.


Secure Browsing and Communication

Encryption and anonymity features can be very important, especially for laptop owners and users who want to email information securely. If you know a very mobile laptop user, there’s a reasonable chance that their laptop will get lifted or lost at some point. I’ve included GPG on my gift sticks so that sensitive files like bank and tax records can be encrypted in case of loss.

Also recommended is the Tor anonymity routing tool. If you’re browsing the web in a public space, it will help to prevent local snoops from monitoring your communications. You’ll still need to use SSL enabled (https) sites to ensure end-to-end encrypted connections, but Tor will help to keep those connections anonymous.

There are also Firefox plugins for GPG and Tor. The GPG plugin allows you to easily encrypt and decrypt data within web applications. If you know a GMail or webmail user, this will allow them to secure communication on these mail systems. Torbutton for Firefox provides a simple way to enable and disable the Tor network. Browsing with Tor can be heinously slow, so it’s nice to only use it when necessary.


Word Processing, Spreadsheets, and Office Tools

Most people use MS Office at work, but don’t have the latest version (or any version at all) at home. The Google Docs online apps do a reasonable job, but OpenOffice.org is a much more robust suite, can do most everything MS Office can, and it will function offline. It’s the largest install on my gift drives—the installer weighs in at about 150MB for both the Mac and Windows versions. If you know someone who has a pirated or ancient copy of MS Office, though, installing OpenOffice on their machine might make for a great gift.

Photography and Image Manipulation

If someone gets a camera this year, forget about the crappy bundled software that comes with it. Even in the best case scenario, it’s a crippled version of a popular app that will leave anyone wanting. In the worst case, well, some of the bundled photo tools rank right up there with bundled printer applications, which is to say that I can’t write about how I really feel about them.

GIMP is awesome. Install it for your family and show them how to open, crop, scale and save JPGs. If they can get past that, teach them how to color correct and desaturate images. That’s 99% of what most people need to know to get the most out of their photography.

As of this writing, the native Darwin version of GIMP for OS X is still too buggy to use. The X11 version runs reliably, but you’ll need to get the X11 package that comes with XQuartz or install it from the OS X install disks.

The Software

Without further ado, here are the downloads for all the above packages, separated by platform.

For Windows Users:

Downloads:
Firefox 3

Tor (Install guide)

GPG4Win (Based on GNU Privacy Guard)

ClamWin Antivirus

OpenOffice.org

GIMP – GNU Image Manipulation Program

Firefox Extensions:

Torbutton for Firefox

FireGPG GPG Firefox extension – use GMail/Webmail securely

For Mac Users:

Downloads:
Firefox 3

Tor (Install guide)

Mac GPG (Based on GNU Privacy Guard)

ClamXav Antivirus

OpenOffice.org

GIMP for OS X – GNU Image Manipulation Program
This requires X11, which you can install with the XQuartz-Project if you have Leopard. With 10.4 (Tiger), you’ll need to install the X11 package from the optional installs section of the OS X disks.

Firefox Extensions:

Torbutton for Firefox

FireGPG GPG Firefox extension - use GMail/Webmail securely

Easy Download Option

I collected all of the installers for both Windows and Mac platforms into a single zip file that you can download from Sourceforge. At the time of this post, these were the latest stable binary installer releases for all of the above files, excluding the Firefox extensions, which you’ll need to open in Firefox manually after installing it on the destination machine. You should be able to download either the giftstick-mac.zip or giftstick-windows.zip (or both), unzip it to a 500MB flash disk, and go on a free software installation binge.

Please note that the source files and latest releases for all of these programs are available at the sites listed above. The GiftStick zips are provided as a convenience for grabbing all of these applications in a single download. If this turns out to be helpful for a lot of people, I’ll try and keep the GiftStick downloads up to date.

Considerations

The Mac files come out to about 360MB (bigger if you need to keep both 10.4 and 10.5 installers), and the Windows files are at about 220MB. You can fit either on a 512MB or larger flash drive, or toss files for both platforms on a 1 Gig drive. I’m going to do the latter and help people with the installs and training, but you may want to consider putting this on a couple $30 drives and leaving them behind. Maybe it will get passed along to other potential free software converts.

Of course, there are many other open source packages that I’ve missed here. If you think I’ve left out anything essential, please share a link in the comments.

Hack on, and happy holidays!

Original content by: MakeZine

Zero downtime with server restarts using HAProxy

Tuesday, March 24th, 2009

HAProxy is a high availability, software-based HTTP load balancing tool that I’ve seen gaining a lot of traction in large server cluster and cloud computing environments. I’m currently using it as part of a pre-built cluster image that a third party vendor is maintaining, and its performance impressed me enough that I’ve started to look into its capabilities further. Because it’s a software solution, it gives you a lot of flexibility to customize its configuration.

One of the neat features I came across is a configuration that will allow you to reboot servers in a cluster without a single user experiencing a 404 error, down-time, or lost sessions. The trick is to use an iptables rule to have Apache respond to two ports, say 80 and 81. Apache really runs on port 80, and then port 81 is configured to forward to port 80. HAProxy is then configured to use the application server’s port 81, and the same server at port 80 is defined as the hot backup.

The igvita.com blog has a good howto on doing just this:

Instead of specifying a physically different app server, we’re going to define our backup instance to be the exact same application server in each case, but with one minor difference: the status port for the main app server will be different from the one we use on the backup.

Now, if we want to put the server into maintenance mode, we remove the IPTables rule for the forwarded port, and wait a few seconds so that our upstream HAProxy instance recognizes that the server is no longer available for new connections – this is key, it means that no client is dropped in the process. Now, once the server is out of rotation in HAProxy, we can do a graceful restart, add the IPTables rule back in, and we’re live!

What’s cool is that without any reconfiguration on the proxy, you can pull a machine offline gracefully. You simply disable the iptables port forward; HAProxy will notice that port 81 went offline and start sending existing users to port 80 with their current cookies. In reality, it’s the exact same Apache instance, so all session information remains intact. New sessions will all be sent to your other servers, and you can wait until nobody is left using the maintenance-mode machine before taking it offline.
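The maintenance toggle described above could be scripted roughly like this. Treat it as a sketch under a few assumptions: the function names are invented for illustration, the rule lives in the nat table's PREROUTING chain (which fits when HAProxy runs on a different machine than Apache), and ports 81 and 80 are the ones used in this post. It is not copied from the igvita article.

```shell
# Command used to manipulate the rules. Set IPTABLES=echo for a dry run
# that only prints the arguments instead of changing anything.
IPTABLES=${IPTABLES:-iptables}

# Redirect incoming TCP traffic on port 81 to port 80, where Apache
# really listens. Left unquoted below on purpose so that sh/bash split
# it into separate arguments.
RULE="-p tcp --dport 81 -j REDIRECT --to-ports 80"

# Back in rotation: port 81 answers again, so HAProxy sees the main
# server entry as up.
enable_forward() {
    $IPTABLES -t nat -A PREROUTING $RULE
}

# Maintenance mode: port 81 stops answering, while port 80 keeps
# serving users who already have a session on this machine.
disable_forward() {
    $IPTABLES -t nat -D PREROUTING $RULE
}
```

A restart would then be something like disable_forward, a pause long enough for HAProxy's health checks to notice and for existing users to drain, apachectl graceful, and finally enable_forward. How long to pause depends on your check interval, so pick that number from your own HAProxy configuration.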

HAProxy
Zero-Downtime Restarts with HAProxy
Official HAProxy Documentation (see section 4.2, soft-stop using backup servers)