Technical aspects of OS X file server access performance


This article is about mounting network volumes (such as NAS devices and other networked computers) and the often poor performance that comes with it, along with tips on how to mitigate the problem, both for programmers and for end users.

Background for this article

As a long-time Mac user (since System 6) and as someone with considerable experience in disk data recovery and file system programming, I've always been interested in preventing accidental data loss and in optimizing performance.

I've always had more than one computer, usually both a desktop and a laptop. Both were networked, and thus I have accessed the files on both computers' disks through network file sharing. Back in the 90s, Apple's Mac OS made this much easier than Windows did, at least in regard to setup and accessibility. Performance-wise, however, it often appeared to me that Windows did better.

Lately I've gotten the impression that accessing remote volumes over the network has actually become slower in OS X, especially in the Finder. I suspect that part of this is due to the Finder having been rewritten in Cocoa, probably using the newer File System APIs which, in my opinion, are part of the problem.

And not only that: even opening just my Applications folder, which contains 250 items, takes quite a long time to update all items nowadays. If I do that over the network, it can take a minute before anything appears in the Finder window, even over a fast Gigabit Ethernet connection to the Mac right next to me. That's just awful.

One of my own programs, Find Any File, which, like the Finder, also deals with showing lists of files, showed similarly bad performance until recently. That's when I took a day or two to investigate what made it so slow, found the issues and fixed them. Now Find Any File is much faster, especially compared to the Finder.

The culprits that affect performance

The network is the bottleneck

The root of the problem lies in (unnecessary) repeated calls to fetch the same file attributes. By using the "fs_usage" tool, one can see that programs keep asking for the same data on the same file several times within milliseconds. Obviously, there's no caching done to prevent this.

While these repeated calls do not cause significant performance hits on a local file system, it's really bad when the queried volume is a network share. In that case, every time a program asks for information about a file, the data goes back and forth over the network. And the time consumed by these network operations is orders of magnitude longer than the processing time for looking up the data locally on the computer.
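You can observe this yourself with a few lines of C. The toy program below is just a sketch (the default path is a placeholder): it queries the same file's attributes twice in a row, and watching it with "sudo fs_usage -w -f filesys" while it runs against a network volume shows each call producing its own round trip to the server:

    /* stat_twice.c - query the same file's attributes twice in a row.
       Watch it with "sudo fs_usage -w -f filesys stat_twice": each
       stat() reaches the file system separately, and on a network
       volume each one becomes a full round trip to the server. */
    #include <stdio.h>
    #include <sys/stat.h>

    int main(int argc, char *argv[])
    {
        /* placeholder path - pass a file on a mounted share instead */
        const char *path = (argc > 1) ? argv[1] : "/Volumes/Server/somefile";
        struct stat st;

        for (int i = 0; i < 2; i++) {
            if (stat(path, &st) != 0) {
                perror("stat");
                return 1;
            }
            printf("call %d: size = %lld bytes\n", i + 1, (long long)st.st_size);
        }
        return 0;
    }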

The Finder

The Finder is extremely wasteful when listing the files in a folder. And not just over the network! I am pretty sure that even showing large local folders such as the Applications folder would get faster if it were optimized just a little (it's not too late for that, Apple!).

I've seen the Finder ask ten(!) times in succession for the attributes of the same item. If it cached the data and asked only once, this would speed up network folder listings nearly tenfold, because the network transfer is the bottleneck here, and hardly anything else. I saw the same degree of improvement in my own app (Find Any File). Apple could make the Finder significantly faster on network volumes if it only put a little effort into this. Simple as that.

The high level file API

The problem lies not in the Finder alone, though, but also in the design of some of the functions that are used to collect information about a file.

For instance, while there are calls that allow a program (including the Finder) to fetch all attributes of a file once, and then cache and use that information to display the file's name, size, dates, etc., there are also OS functions that do their own fetching of those same attributes. That means that to get some information, such as the Finder's "Kind" of a file, one cannot simply pass the already-fetched data to the function in order to get the Kind returned. Instead, one can only pass it a reference to the file, and the function then goes on to query the same attributes once again on its own.
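Here's one concrete example in C, using the Launch Services call LSCopyKindStringForURL (one of several such calls). Even though we've just fetched the file's attributes ourselves, the Kind query accepts only a file reference, so it fetches them again behind our back:

    /* kind_query.c - fetch a file's attributes, then ask Launch
       Services for the file's Finder "Kind".
       Compile with: cc kind_query.c -framework CoreServices */
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <CoreServices/CoreServices.h>

    int main(int argc, char *argv[])
    {
        const char *path = (argc > 1) ? argv[1] : "/mach_kernel";
        struct stat st;

        /* First fetch: we now hold the size, dates, mode, etc. */
        if (stat(path, &st) != 0) { perror("stat"); return 1; }

        CFURLRef url = CFURLCreateFromFileSystemRepresentation(
            kCFAllocatorDefault, (const UInt8 *)path, strlen(path), false);

        /* Second fetch: there is no way to pass our already-fetched
           attributes in - the call re-queries the file on its own. */
        CFStringRef kind = NULL;
        if (LSCopyKindStringForURL(url, &kind) == noErr && kind != NULL) {
            char buf[256];
            if (CFStringGetCString(kind, buf, sizeof buf, kCFStringEncodingUTF8))
                printf("Kind: %s\n", buf);
            CFRelease(kind);
        }
        CFRelease(url);
        return 0;
    }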

So there's already a big design flaw in the APIs, and there's nothing we can do about it. Still, it is my impression (and experience) that the number of calls per file in the Finder could be reduced from 10 to 2, which would still give a five-fold speed-up on network volumes.

The low level File System

The file system does not perform any caching. Every time a program asks for the attributes of a file, the file system makes a call to the disk or, in the case of a network volume, a call over the network, to get the current attributes. Even if a program asks for the same information twice within milliseconds, each query leads to the same complete conversation over the network.

Now, in some cases this is indeed necessary:

For instance, if a program gets a file's attributes, changes them and then gets them again, any cached information from the first attribute fetch needs to be cleared.

On the other hand, if there's no modification in between, then there's little reason not to use locally cached information.

One could argue that the file in question may have been modified by another program or even from another computer, in which case the locally cached information might be out of sync.

But consider when such frequent queries for the same file actually occur: usually when a program such as the Finder fetches all information about a file, using various OS functions to collect all its attributes. In this case, the Finder _expects_ that the file won't change anyway: it has seen the file and now wants all information about what it has seen. If the file keeps changing during this process, the Finder will collect conflicting information, which won't lead to a consistent display for the user anyway.

Hence, in this case it's actually _preferable_ if the data gets fetched just once and then cached until the Finder has gathered all the data it wants about the file, which should usually happen within a few milliseconds. Sure, the information the Finder then shows may be out of date, but so it would be if it kept picking up ever-changing information while collecting the data in that short time, and the result would be even harder to interpret because it wouldn't be consistent at all (e.g. if the file's extension got renamed in the middle of the query, the Finder could end up showing the old file name but the Kind belonging to the new name, which would make no sense to the viewer). Once the data has been collected and the Finder has shown one consistent state of information about a file, it is free to check the shown files for changes and then update the information.

However, there are also times when a program might want to poll a networked file frequently to see if a process on another computer has modified it (file locking for single-user databases, for instance). I don't know of many such use cases, though, and I wonder whether a cache timeout of a few milliseconds per file would really be a problem there. At worst, the polling process would have to wait a few ms longer until it detects the file change. Will that really hurt? If so, can those (rather rare, I suspect) cases be detected and handled specially? E.g., I believe that the lock files usually used for this follow a specific naming convention; if that's the case, the caching could be skipped for those by default.
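To make the idea concrete, here is a minimal user-space sketch of such a cache in C - purely illustrative: single-threaded, caching only one path, with an invented TTL of 50 ms. A real implementation would handle many files and flush an entry whenever the process itself modifies the file:

    /* cached_stat.c - sketch of attribute caching with a short timeout. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <sys/time.h>

    #define CACHE_TTL_MS 50   /* "a few milliseconds", as discussed above */

    static char        cached_path[1024];
    static struct stat cached_st;
    static long long   cached_at_ms = -1;

    static long long now_ms(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return (long long)tv.tv_sec * 1000 + tv.tv_usec / 1000;
    }

    int cached_stat(const char *path, struct stat *out)
    {
        long long now = now_ms();

        /* Serve from the cache if the same path was queried within the TTL. */
        if (cached_at_ms >= 0 && now - cached_at_ms < CACHE_TTL_MS &&
            strcmp(path, cached_path) == 0) {
            *out = cached_st;
            return 0;
        }

        /* Cache miss: one real query, then remember the answer briefly. */
        if (stat(path, &cached_st) != 0)
            return -1;
        strlcpy(cached_path, path, sizeof cached_path);
        cached_at_ms = now;
        *out = cached_st;
        return 0;
    }

    int main(void)
    {
        struct stat st;
        /* Ten Finder-style queries in a burst - only the first one
           actually goes over the network. */
        for (int i = 0; i < 10; i++)
            if (cached_stat("/Volumes/Server/somefile", &st) == 0)
                printf("size: %lld\n", (long long)st.st_size);
        return 0;
    }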

Alternatively, I wonder if the File System APIs could be extended with a cache-request operation. To be convenient enough, it would affect entire processes: each process could request that the attributes of queried files be cached until a timeout occurs, the caching is disabled again, or the affected file is modified through the file system (i.e. by a process on that same computer). Anyone, including the Finder, could enable this feature easily, without the need to analyze and optimize all the wasteful FS calls it makes. This call could also be used by the file chooser dialogs, which are affected by the same performance issue as the Finder: enable attribute caching when the dialog is shown, disable it once the dialog closes. Easy as pie.
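As a sketch of what such an extension might look like - to be clear, none of these functions exist in OS X; the names and signatures are entirely invented for illustration:

    /* fs_attr_cache.h - HYPOTHETICAL interface, invented for this
       article; no such calls exist in OS X. */
    #include <stdint.h>

    /* Ask the file system layer to cache attribute lookups made by
       this process until the timeout elapses, the cache is disabled
       again, or the file is modified through the file system on this
       same computer. */
    int fs_attr_cache_enable(uint32_t timeout_ms);
    int fs_attr_cache_disable(void);

    /* Example use in a file chooser dialog:
         fs_attr_cache_enable(100);   // while the dialog is visible
         ... list the folder; the OS coalesces duplicate queries ...
         fs_attr_cache_disable();     // when the dialog closes      */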

Of course, this requires work on the file system driver. As a start, maybe it could be applied just to the network (afp) driver to see how that goes. Until OS X 10.4 (Tiger), it was even possible for a third party (e.g. me) to write a pass-through file system driver as a system extension that could have intercepted all file system requests, optionally cached them, and then passed them on. Sadly, Apple has removed such abilities from recent OS X versions. :^(

What you can do as a user if you're using a file server (e.g. a NAS)

Whether you've understood the technical reasons for slow network operations or not, here's a simple but limited trick to improve access to network volumes from your Mac:

If possible, create a disk image on your network volume and mount that image on your Mac. Such a mounted disk image behaves like a local disk, with the only difference being that the modified data of the "disk" gets transferred over the network to the server as plain file operations. This is fundamentally different from accessing the server volume as individual folders and files, because your Mac performs all the directory management and lookups locally, and even caches the data, before updating it on the server. This leads to much better performance when accessing files and folders on the mounted disk image's volume.

To create such a volume, launch the program "Disk Utility". It offers a New Image command. Give the image a name and choose to save it on your network volume. In the options below, choose the maximum size you'll ever want to use for it - I suggest entering the capacity of your network volume unless you know you won't need that much space. Lastly, change the "Image Format" to "sparse bundle disk image" - that ensures the image only occupies as much space on the network volume as it currently needs, rather than the full size you've reserved for it. Note, however, that this space will only ever grow and never shrink, even if you later delete files from the disk image's volume. If it has grown too large, wasting a lot of dead space, you can use the Convert command in Disk Utility to copy the used data of the disk image to a new image that's shrunk to the smallest possible size again.
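If you prefer the Terminal, the same can be done with the "hdiutil" tool that powers Disk Utility. The name, size and server path below are placeholders to adjust to your own setup; "hdiutil compact" is the command-line alternative to the Convert step just described:

    # Create a sparse bundle that may grow up to 500 GB but starts small:
    hdiutil create -type SPARSEBUNDLE -fs HFS+J -size 500g \
            -volname "NetStore" /Volumes/Server/NetStore.sparsebundle

    # Mount it like a local disk:
    hdiutil attach /Volumes/Server/NetStore.sparsebundle

    # Reclaim dead space after deleting files from the image's volume
    # (unmount the image first):
    hdiutil compact /Volumes/Server/NetStore.sparsebundle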

Using a disk image on a network share also has disadvantages, though:

  • If the connection to the server suddenly breaks and isn't restored within minutes, some of the cached data on the Mac may not have been sent to the disk image file on the server, leaving that volume corrupted. Hence, you should preferably use this technique with data that you store once and read often, such as an iTunes library, for instance. For archiving purposes it might be suitable as well, as long as you unmount the volume after archiving, thereby making sure everything has actually been transferred to the server.
  • Only one user at a time can mount such an image. That means that if you have files you want to share between multiple computers simultaneously, this won't work. So this is only suitable for a single user's data (such as the iTunes music) or for temporary use (as in the archiving case, where the image is mounted only briefly).

Note that networked Time Machine volumes work this way, too: the files backed up to a Time Capsule are not stored on the Time Capsule's disk as individual files, meaning that if you plugged its disk into a Mac, you wouldn't see the files there. Instead, there's just a disk image on it, and that image gets mounted on the Mac. If it weren't like this, Time Machine with a Time Capsule would be impossibly slow due to the issues I explained above.

