Replacing File.Copy

By Ashish Khandelwal, September 10, 2009

The easiest way to copy a file in a .NET program is to call the File.Copy method, supplying it the source and destination files. It could hardly be simpler than this:

File.Copy(srcFilename, destFilename);

That method will throw IOException if there is an existing file of the same name as destFilename. If you want to overwrite existing files, call the overload and specify true for the overwrite parameter:

File.Copy(srcFilename, destFilename, true);
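To make the difference between the two overloads concrete, here is a small sketch that runs against temporary files. The class and method names (CopyDemo, ThrowsWhenDestinationExists) are made up for illustration; only File.Copy and the other System.IO calls are real.

```csharp
using System;
using System.IO;

static class CopyDemo
{
    // Returns true if the two-argument File.Copy throws IOException,
    // which is what happens when dest already exists.
    public static bool ThrowsWhenDestinationExists(string src, string dest)
    {
        try
        {
            File.Copy(src, dest);   // no overwrite allowed
            return false;
        }
        catch (IOException)
        {
            return true;
        }
    }

    static void Main()
    {
        string src = Path.GetTempFileName();   // creates an empty temp file
        string dest = Path.GetTempFileName();  // so the destination already exists
        File.WriteAllText(src, "hello");

        Console.WriteLine(ThrowsWhenDestinationExists(src, dest));

        File.Copy(src, dest, true);            // overwrite allowed: succeeds
        Console.WriteLine(File.ReadAllText(dest));

        File.Delete(src);
        File.Delete(dest);
    }
}
```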

File.Copy certainly is very easy to use, but as I mentioned in the previous section, it suffers from the large file copy problem. That is, using File.Copy to copy a very large file from one machine to another can cause the source machine to run out of memory. For example, I encountered that problem when executing a statement of this form:

File.Copy(@"\\server\data\file.dat", @"d:\data\file.dat", true);

If you want to copy a file and not encounter those problems, you have to write your own copy method.

A note about timing

Timing disk input and output operations on a Windows system is somewhat difficult due to caching. If you read a local file, close it, and then immediately read it again, your second time through will usually be much faster because Windows caches the file in memory (assuming, of course, that the file fits in memory). On a machine with 16 gigabytes of RAM and nothing but the basic services running, reading a 100 megabyte file might take two seconds the first time through. The second time through it’ll be almost instantaneous.
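The caching effect is easy to observe for yourself. The sketch below reads the same file twice and prints both timings. The ReadTiming class, the TimeRead helper, and the 64K read buffer are assumptions chosen for illustration. Pass the path of a large local file that you haven't touched recently.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class ReadTiming
{
    // Reads the whole file through a 64K buffer and returns the elapsed time.
    public static TimeSpan TimeRead(string path)
    {
        var buffer = new byte[64 * 1024];
        var sw = Stopwatch.StartNew();
        using (var input = File.OpenRead(path))
        {
            while (input.Read(buffer, 0, buffer.Length) != 0)
            {
                // Discard the data; only the I/O time matters here.
            }
        }
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.WriteLine("usage: ReadTiming <path to a large file>");
            return;
        }
        Console.WriteLine("First read:  {0}", TimeRead(args[0]));
        Console.WriteLine("Second read: {0}", TimeRead(args[0]));
    }
}
```

If the file was already in the cache before the first read, both timings will look fast, which is exactly why these measurements are hard to trust.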

Writing is even less intuitive. I’ve seen Windows “complete” a write of 100 megabytes in milliseconds. The reason? Windows’ I/O system buffers the data and lets the WriteFile call return, then writes the data to the disk asynchronously when it can.

You can get more realistic timings by specifying no buffering when you open the file, but doing so can reduce performance. How much it reduces real performance is hard to say. And if you’re using a method like File.Copy, there’s nothing you can do about it: you’re at the mercy of how the operating system’s API implements the copy operation.
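The text doesn't say which flag it has in mind. One documented option in .NET is FileOptions.WriteThrough, which asks the system to write through any intermediate cache and go directly to disk. The sketch below is only an approximation of unbuffered timing, assuming that flag is acceptable for your purposes. The class name, the TimeWrite helper, and the 100 megabyte size are illustrative.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class WriteThroughTiming
{
    // Writes totalBytes of zeros to path in 64K chunks and returns the elapsed time.
    public static TimeSpan TimeWrite(string path, int totalBytes, FileOptions options)
    {
        var buffer = new byte[64 * 1024];
        var sw = Stopwatch.StartNew();
        using (var output = new FileStream(path, FileMode.Create, FileAccess.Write,
                                           FileShare.None, buffer.Length, options))
        {
            int remaining = totalBytes;
            while (remaining > 0)
            {
                int count = Math.Min(remaining, buffer.Length);
                output.Write(buffer, 0, count);
                remaining -= count;
            }
        }
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        const int size = 100 * 1024 * 1024;

        Console.WriteLine("Cached:        {0}", TimeWrite(path, size, FileOptions.None));
        Console.WriteLine("Write-through: {0}", TimeWrite(path, size, FileOptions.WriteThrough));

        File.Delete(path);
    }
}
```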

In the tests that follow, I’ve attempted to be very specific about the conditions under which timings are taken. But be forewarned that your mileage may vary wildly, depending on how much free memory you have, the speed of your disk subsystem, and how fragmented your hard disk is.

Copying a file in blocks

If the file you’re copying is smaller than two gigabytes, then you can potentially read the entire input file into a memory buffer and then write it to the output file. If the file is larger than two gigabytes, you can’t easily read it all into memory because of the two-gigabyte limit on the size of a single .NET object, such as a byte array. Even if the file would fit within the two-gigabyte limit, it’s probably not a good idea to try copying it this way. You’ll see why below.
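For reference, the all-in-memory approach looks something like the sketch below. It is shown only to make the trade-off concrete, not as a recommendation. WholeFileCopy and CopyWholeFile are made-up names; File.ReadAllBytes and File.WriteAllBytes are the real framework methods.

```csharp
using System;
using System.IO;

static class WholeFileCopy
{
    // Reads the entire source into a single array, then writes it out.
    // Memory use is proportional to the file size.
    public static void CopyWholeFile(string src, string dest)
    {
        byte[] contents = File.ReadAllBytes(src);  // one array as large as the file
        File.WriteAllBytes(dest, contents);        // creates or overwrites dest
    }

    static void Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("usage: WholeFileCopy <source> <destination>");
            return;
        }
        CopyWholeFile(args[0], args[1]);
    }
}
```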

To copy a large file, you need to copy it block by block: read a chunk from the input, write it to the output, and continue until you’ve copied the entire file. The simplest implementation would have you reading the file byte by byte, but that’s incredibly slow. Increasing the buffer size will make for a much more efficient operation, and doesn’t require much more work than the byte-by-byte copy.
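For comparison, a byte-by-byte copy might look like the following sketch (ByteCopy and CopyByteByByte are hypothetical names). FileStream keeps a small internal buffer of its own, so this does not hit the disk once per byte, but it still pays the overhead of two method calls for every byte copied.

```csharp
using System;
using System.IO;

static class ByteCopy
{
    public static void CopyByteByByte(string src, string dest)
    {
        using (var input = File.OpenRead(src))
        using (var output = File.Create(dest))
        {
            int b;
            // ReadByte returns -1 at end of file, otherwise the byte value 0..255.
            while ((b = input.ReadByte()) != -1)
            {
                output.WriteByte((byte)b);
            }
        }
    }

    static void Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("usage: ByteCopy <source> <destination>");
            return;
        }
        CopyByteByByte(args[0], args[1]);
    }
}
```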

The code below copies a file in blocks, using the buffer size that you specify. Note that the buffer size parameter is the size of the memory buffer that you use to shuttle data between the input and output files. It has nothing to do with the operating system’s internal buffer used for read ahead or write buffering.

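static void CopyFile1(string src, string dest, int bufferSize)
{
    // Open the input first so a missing source doesn't leave an empty destination.
    using (var inputFile = File.OpenRead(src))
    // File.Create truncates an existing destination; File.OpenWrite would
    // leave stale bytes at the end of a longer existing file.
    using (var outputFile = File.Create(dest))
    {
        var buffer = new byte[bufferSize];
        int bytesRead;
        // Read returns 0 only at end of file.
        while ((bytesRead = inputFile.Read(buffer, 0, bufferSize)) != 0)
        {
            outputFile.Write(buffer, 0, bytesRead);
        }
    }
}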

File.Copy takes about nine seconds on average to copy a 650 megabyte file from my D: drive to my C: drive if I delete the destination file first.

My CopyFile1 method also takes about nine seconds if I use a bufferSize of 32 kilobytes. If I set bufferSize to one kilobyte, it takes about 11 seconds. Time to copy decreases as the buffer size increases from 2K to 32K, then levels off up to 256K. A buffer size of 256K results in a copy operation that’s slightly faster than File.Copy most of the time. Beyond that, times increase dramatically. A buffer size of 512K causes the copy operation to take more than 20 seconds.

Why a larger buffer should increase run times is something of a mystery. I would expect a point of diminishing returns, where increasing the buffer size would not increase performance, but doubling the runtime is an unexpected result.
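A harness along the following lines could be used to repeat these measurements. It is a sketch, not the code behind the numbers quoted here: BufferSizeBenchmark, CopyInBlocks, and TimeCopy are names invented for illustration, and the destination is deleted before each run to match the test conditions described above.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class BufferSizeBenchmark
{
    // Block copy with a caller-supplied buffer size.
    public static void CopyInBlocks(string src, string dest, int bufferSize)
    {
        using (var input = File.OpenRead(src))
        using (var output = File.Create(dest))
        {
            var buffer = new byte[bufferSize];
            int bytesRead;
            while ((bytesRead = input.Read(buffer, 0, bufferSize)) != 0)
            {
                output.Write(buffer, 0, bytesRead);
            }
        }
    }

    // Deletes the destination, then times one copy.
    public static TimeSpan TimeCopy(string src, string dest, int bufferSize)
    {
        File.Delete(dest);  // does nothing if the file doesn't exist
        var sw = Stopwatch.StartNew();
        CopyInBlocks(src, dest, bufferSize);
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("usage: BufferSizeBenchmark <source> <destination>");
            return;
        }
        // Try buffer sizes of 1K, 2K, 4K, ... 512K.
        for (int kb = 1; kb <= 512; kb *= 2)
        {
            Console.WriteLine("{0,4}K: {1}", kb, TimeCopy(args[0], args[1], kb * 1024));
        }
    }
}
```

Keep the caching caveats in mind: after the first pass the source file may be served from memory, so the later (larger) buffer sizes can look better than they really are.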

Curious, I thought I’d see if breaking up that big write into smaller chunks would change the run time. The modified method, shown below, still reads the buffer in one big chunk, but when it writes the output file, it does it in 32K chunks.

const int K32 = 32 * 1024;
static void CopyFile2(string src, string dest, int bufferSize)
{
    using (var inputFile = File.OpenRead(src))
    using (var outputFile = File.Create(dest))  // truncates an existing destination
    {
        var buffer = new byte[bufferSize];
        int bytesRead;
        while ((bytesRead = inputFile.Read(buffer, 0, bufferSize)) != 0)
        {
            // Write what was read in chunks of at most 32K.
            int bptr = 0;
            while (bptr < bytesRead)
            {
                int bytesToWrite = Math.Min(bytesRead - bptr, K32);
                outputFile.Write(buffer, bptr, bytesToWrite);
                bptr += bytesToWrite;
            }
        }
    }
}

The result is quite surprising. The modified method takes a little less than 10 seconds to execute. It appears that either the FileStream.Write method or the underlying Windows file system API function it calls does not handle large output buffers well. Further testing shows that FileStream.Read does not suffer from this problem when the buffer is too big. Only FileStream.Write does.

After a lot of experimenting and performance testing, I decided to remove the bufferSize parameter, since allowing the user to specify it can only lead to problems. I settled on a fixed buffer size of 64 kilobytes because that gave me consistently good results. 256K gives slightly better results, and I might increase my buffer size to that in the future once I’ve had a chance to test my code on more machines. Right now I’ve only tested it on my x64 machine running Windows Server 2008. The final (for now) version of the copy method, which performs as well as File.Copy, is called CopyFile3:

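const int CopyBufferSize = 64 * 1024;
static void CopyFile3(string src, string dest)
{
    // Open the input first so a missing source doesn't leave an empty destination.
    using (var inputFile = File.OpenRead(src))
    // File.Create truncates an existing destination; File.OpenWrite would
    // leave stale bytes at the end of a longer existing file.
    using (var outputFile = File.Create(dest))
    {
        var buffer = new byte[CopyBufferSize];
        int bytesRead;
        // Read returns 0 only at end of file.
        while ((bytesRead = inputFile.Read(buffer, 0, CopyBufferSize)) != 0)
        {
            outputFile.Write(buffer, 0, bytesRead);
        }
    }
}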
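One difference from File.Copy is worth noting: File.Copy refuses to replace an existing destination unless you pass true for overwrite, while the block copy has no such switch. If you want that behavior, a variant could look like the sketch below. The class name, the method name, and the overwrite parameter are hypothetical additions, not part of the code above; the FileMode values are the standard ones.

```csharp
using System;
using System.IO;

static class CopyWithOverwrite
{
    const int CopyBufferSize = 64 * 1024;

    public static void CopyFile(string src, string dest, bool overwrite)
    {
        // FileMode.CreateNew throws IOException if dest already exists;
        // FileMode.Create replaces (truncates) an existing file.
        FileMode mode = overwrite ? FileMode.Create : FileMode.CreateNew;

        using (var inputFile = File.OpenRead(src))
        using (var outputFile = new FileStream(dest, mode, FileAccess.Write, FileShare.None))
        {
            var buffer = new byte[CopyBufferSize];
            int bytesRead;
            while ((bytesRead = inputFile.Read(buffer, 0, CopyBufferSize)) != 0)
            {
                outputFile.Write(buffer, 0, bytesRead);
            }
        }
    }

    static void Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("usage: CopyWithOverwrite <source> <destination>");
            return;
        }
        CopyFile(args[0], args[1], true);
    }
}
```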

Where are we?

It’s always good to stop and see where we are in relation to our goal. The primary goal is to write a replacement for File.Copy that doesn’t cause the source machine to run out of memory when the client is copying a file from a server across the network. Although a faster copy would be nice, that’s a secondary goal. At this point we have a method that duplicates what File.Copy does and operates at about the same speed.

Except it doesn’t quite duplicate the operation of File.Copy. When copying a file across the network (i.e. the src parameter is of the form \\server\dir\file.dat), it runs twice as fast as File.Copy. More importantly, it doesn’t cause the server to use a lot of memory. I call that a win.

There are undoubtedly ways to increase the performance of CopyFile3. Both RoboCopy and TeraCopy are somewhat faster, although RoboCopy also causes the server to run out of memory. But for my purposes and most that I can envision, CopyFile3 is plenty fast. And since it eliminates the out-of-memory problem, I see no need to experiment further.
