Saturday, July 3, 2010

Windows and file links

It has long been the accepted wisdom that "Windows doesn't support symbolic links". But this has always been a gray area, and lately it's been getting more gray. As the MSDN documentation describes,

There are three types of file links supported in the NTFS file system: hard links, junctions, and symbolic links.

So let's take a look at this in some more detail.

My copy of Helen Custer's Inside the Windows NT File System, from 1994, says:

the POSIX standard requires the file system to support case-sensitive file and directory names, a "file-change-time" time stamp (which is different from the MS-DOS "time-last-modified" stamp), and hard links. NTFS implements each of these features. NTFS does not implement POSIX symbolic links in its first release, but it can be extended to do so.

Custer briefly describes how hard links work:

When a hard link to a POSIX file is created, NTFS adds another file name attribute to the file's MFT file record. When a user deletes a POSIX file that has multiple names (hard links), the file record and the file remain in place. The file and its record are deleted only when the last file name (hard link) is deleted.

So hard links have been part of Windows/NTFS for over 15 years. There is a CreateHardLink function in the Windows API, and you can read more about hard links at the MSDN web site.
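To make Custer's description concrete, here is a minimal sketch in Python rather than raw Win32. It assumes that Python's os.link is available on the platform (on Windows I believe it sits on top of CreateHardLink, but treat that as an assumption), and the file names are purely illustrative. It shows the link count going to 2 and the data surviving the deletion of the first name.

```python
import os
import tempfile


def demo_hard_link(directory):
    """Create a file, give it a second name, then delete the first name."""
    original = os.path.join(directory, "original.txt")  # hypothetical names
    alias = os.path.join(directory, "alias.txt")

    with open(original, "w") as f:
        f.write("hello")

    # os.link adds a second directory entry that refers to the same file.
    os.link(original, alias)
    links_with_two_names = os.stat(original).st_nlink

    # Removing one name leaves the file reachable through the other.
    os.remove(original)
    with open(alias) as f:
        surviving_content = f.read()
    links_after_delete = os.stat(alias).st_nlink

    return links_with_two_names, links_after_delete, surviving_content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_hard_link(tmp))
```

The file's contents only go away when the last remaining name is removed, which is the behavior the quoted passage describes.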

The second stage of support for file links was added in Windows 2000, and was called junctions. As the MSDN documentation describes:

A junction (also called a soft link) differs from a hard link in that the storage objects it references are separate directories, and a junction can link directories located on different local volumes on the same computer. Otherwise, junctions operate identically to hard links. Junctions are implemented through reparse points.

Note that hard links are alternate names for files, whereas junction points occur at the directory level of a path and include the name of another directory. So a junction point is always a soft link from one directory to another directory.

Junction points appear to have been originally directed at the problem of stitching together the multi-volume Windows file system into a single logical file system, with links that crossed from one volume to another. As Knowledge Base article 205524 describes:

You can surpass the 26 drive letter limitation by using NTFS junction points. By using junction points, you can graft a target folder onto another NTFS folder or "mount" a volume onto an NTFS junction point. Junction points are transparent to programs.

The KB article describes the commands "linkd", "mountvol", and "delrp", and notes that the "rp" in "delrp" refers to reparse points, which are the underlying feature that supports junction points. The Sysinternals section of TechNet additionally provides a utility named Junction, for working with junction points. (Sadly, unlike many of the Sysinternals tools, the Junction tool does not appear to come with source code; it's an executable only.)

It's not obvious how you work with a junction point programmatically. Is there a call in the Windows API to create or delete a junction point, or is it only possible using these special command-line tools? It must be possible, since various people have implemented their own tools and extensions for working with junction points. Here are two: (a) Hermann Schinagl's LinkShellExtension, and (b) the CreateJunction utility, which does include a snippet of source describing its use of the FSCTL_SET_REPARSE_POINT I/O control operation. Yikers!
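For a rough idea of what that FSCTL_SET_REPARSE_POINT call has to be fed, the sketch below packs (and unpacks) the mount-point form of the REPARSE_DATA_BUFFER structure in Python. The layout is my reading of the structure as published in the Windows headers: an 8-byte header (tag, data length, reserved), four 16-bit name offsets and lengths, then the UTF-16 path buffer. The helper names are hypothetical, and the sketch deliberately stops short of the actual DeviceIoControl call, which also needs a handle to an empty directory opened with the right flags.

```python
import struct

IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003  # reparse tag used by junctions


def build_junction_buffer(target):
    """Pack a mount-point reparse buffer for an absolute target such as C:\\target."""
    substitute = ("\\??\\" + target).encode("utf-16-le")  # NT-style path
    printable = target.encode("utf-16-le")                # display name
    nul = b"\x00\x00"                                     # UTF-16 terminator

    path_buffer = substitute + nul + printable + nul
    mount_point = struct.pack(
        "<HHHH",
        0,                    # SubstituteNameOffset
        len(substitute),      # SubstituteNameLength, in bytes, no terminator
        len(substitute) + 2,  # PrintNameOffset, just past substitute + NUL
        len(printable),       # PrintNameLength, in bytes, no terminator
    ) + path_buffer

    # ReparseDataLength counts everything after the 8-byte header.
    header = struct.pack("<IHH", IO_REPARSE_TAG_MOUNT_POINT, len(mount_point), 0)
    return header + mount_point


def parse_junction_buffer(buffer):
    """Inverse of build_junction_buffer: recover tag, length, and both names."""
    tag, data_length, _reserved = struct.unpack_from("<IHH", buffer, 0)
    sub_off, sub_len, print_off, print_len = struct.unpack_from("<HHHH", buffer, 8)
    paths = buffer[16:]
    substitute = paths[sub_off:sub_off + sub_len].decode("utf-16-le")
    printable = paths[print_off:print_off + print_len].decode("utf-16-le")
    return tag, data_length, substitute, printable


if __name__ == "__main__":
    print(parse_junction_buffer(build_junction_buffer("C:\\target")))
```

The same parsing logic should apply to a buffer read back with FSCTL_GET_REPARSE_POINT, though I have only exercised it as a round trip here.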

Junction points are implemented using reparse points, which are also described in the MSDN documentation:

A file or directory can contain a reparse point, which is a collection of user-defined data. The format of this data is understood by the application which stores the data, and a file system filter, which you install to interpret the data and process the file. When an application sets a reparse point, it stores this data, plus a reparse tag, which uniquely identifies the data it is storing. When the file system opens a file with a reparse point, it attempts to find the file system filter associated with the data format identified by the reparse tag. If a file system filter is found, the filter processes the file as directed by the reparse data. If a file system filter is not found, the file open operation fails.

The third stage of support for file links in Windows came rather quietly, as part of Windows Vista. I don't recall a lot of fanfare about this functionality when Vista was announced; I guess I wasn't paying attention! As described in the MSDN documentation:

A symbolic link is a file-system object that points to another file system object. The object being pointed to is called the target.

Symbolic links are transparent to users; the links appear as normal files or directories, and can be acted upon by the user or application in exactly the same manner.

Symbolic links are designed to aid in migration and application compatibility with UNIX operating systems. Microsoft has implemented its symbolic links to function just like UNIX links.

At least at this level of detail, this seems pretty complete, and promising. There is the small complexity, of course, that "Symbolic links are available in NTFS starting with Windows Vista."

The MSDN documentation has a fairly detailed discussion of symbolic links:

Symbolic links can either be absolute or relative links. Absolute links are links that specify each portion of the path name; relative links are determined relative to where relative–link specifiers are in a specified path.

There is a CreateSymbolicLink function that allows a program to, naturally, create a symbolic link.
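As a hedged illustration of the absolute/relative distinction, here is a small Python sketch using os.symlink (which, on Windows, I understand to be a wrapper over CreateSymbolicLink; take that as an assumption). The file names are made up for the example. Note that on Windows the call may fail for accounts that lack the privilege to create symbolic links.

```python
import os
import tempfile


def make_links(directory):
    """Create one absolute and one relative symbolic link to the same file."""
    target = os.path.join(directory, "target.txt")  # hypothetical names
    with open(target, "w") as f:
        f.write("data")

    absolute_link = os.path.join(directory, "abs_link.txt")
    relative_link = os.path.join(directory, "rel_link.txt")

    # Absolute link: the stored target spells out the full path.
    os.symlink(os.path.abspath(target), absolute_link)

    # Relative link: the stored target is resolved relative to the
    # directory that contains the link itself.
    os.symlink("target.txt", relative_link)

    return absolute_link, relative_link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for link in make_links(tmp):
            with open(link) as f:
                print(link, "->", os.readlink(link), "contains", f.read())
```

Both links open the same file; the difference only shows up if the directory is moved, at which point the relative link keeps working and the absolute one does not.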

And there is a small set of notes entitled Programming Considerations, which really seems like it belongs in the "Remarks" section under the CreateSymbolicLink function.

And there is a short article about how to open a filesystem object to read the reparse point information:

To determine if a specified directory is a mounted folder, first call the GetFileAttributes function and inspect the FILE_ATTRIBUTE_REPARSE_POINT flag in the return value to see if the directory has an associated reparse point. If it does, use the FindFirstFile and FindNextFile functions to obtain the reparse tag in the dwReserved0 member of the WIN32_FIND_DATA structure. To determine if the reparse point is a mounted folder (and not some other form of reparse point), test whether the tag value equals the value IO_REPARSE_TAG_MOUNT_POINT. For more information, see Reparse Points.

To obtain the target volume of a mounted folder, use the GetVolumeNameForVolumeMountPoint function.

In a similar manner, you can determine if a reparse point is a symbolic link by testing whether the tag value is IO_REPARSE_TAG_SYMLINK.
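The decision procedure in that passage can be sketched as follows. This is Python rather than direct calls to GetFileAttributes and FindFirstFile: it assumes the st_file_attributes and st_reparse_tag fields that Python's os.lstat exposes on Windows, and falls back to zero elsewhere. The two tag values are the ones defined in the Windows headers; the function names are hypothetical.

```python
import os
import stat

# Reparse tag values as defined in the Windows headers.
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003
IO_REPARSE_TAG_SYMLINK = 0xA000000C


def classify(attributes, reparse_tag):
    """Apply the documented test: attribute flag first, then the tag."""
    if not attributes & stat.FILE_ATTRIBUTE_REPARSE_POINT:
        return "ordinary"
    if reparse_tag == IO_REPARSE_TAG_MOUNT_POINT:
        return "mounted folder or junction"
    if reparse_tag == IO_REPARSE_TAG_SYMLINK:
        return "symbolic link"
    return "other reparse point"


def classify_path(path):
    """Classify a path without following it (lstat, not stat)."""
    info = os.lstat(path)
    # These fields exist only on Windows; default to 0 on other platforms.
    attributes = getattr(info, "st_file_attributes", 0)
    reparse_tag = getattr(info, "st_reparse_tag", 0)
    return classify(attributes, reparse_tag)


if __name__ == "__main__":
    print(classify_path("."))
```

Mounted folders and junctions share the same tag, so telling them apart needs a further look at the target, for example with GetVolumeNameForVolumeMountPoint as the quoted article suggests.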

So there's fairly clear documentation about how to create a symbolic link, and it's also clear that you can delete one by simply calling DeleteFile. It's less clear how to do the equivalent of the Unix readlink function; that is, how do you read the file system to find out whether a particular object is a symbolic link or not, and, if it is, what it points to? It's clear I'm not the only person confused about this. There's apparently an fsutil tool in the Windows 7 command line that does this, but what APIs does it call to get its job done? Even the usually authoritative Raymond Chen doesn't seem to cover this.
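One data point, offered tentatively: Python's os.readlink is documented as working on Windows as well as Unix, and my understanding (an assumption, not something verified here) is that on Windows it gets the target by reading the reparse data with the FSCTL_GET_REPARSE_POINT control code. The sketch below shows only the caller's side of a readlink equivalent; describe_link is a hypothetical helper name.

```python
import os
import tempfile


def describe_link(path):
    """Return the stored target if path is a symbolic link, else None."""
    if not os.path.islink(path):
        return None
    # os.readlink returns the target exactly as stored in the link,
    # without resolving it any further.
    return os.readlink(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link.txt")
        open(target, "w").close()
        os.symlink(target, link)
        print(describe_link(link))
        print(describe_link(target))
```

This answers the "is it a link, and where does it point" question at the scripting level, but it does not settle which native API is the intended one.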

Overall, it's quite clear that the "accepted wisdom" has become stale: Windows does support file links, and the support is relatively complete. However, as is often the case with Windows, there is a set of strange and unique APIs for working with the functionality, and the documentation about how to use those APIs is scattered and terse. There is also the complexity of dealing with the enormous Windows installed base: support for file links was introduced over time, and hence varies from Windows platform to Windows platform. But if you are running Windows 7 (and if you aren't, why aren't you?), it seems like you should have enough operating system support to build an application with fairly complete support for file links.

Enough of this overview-level discussion; it's time to write some code! I'll see you later, when I have some actual code to discuss...

1 comment:

  1. hi,

    About the symlink, there is finally a POSIX equivalent on Windows for symbolic links. Junctions and hard links have been around for a while too.

    The bad thing is that they are not accessible by default to normal users. Only an admin can create them, or a user has to have the 'allow symlink creation' flag in his security profile (I have to track down where exactly it is :).

    If you need an extensive example of how to resolve symlinks (and mapped drives, junctions, etc.), see PHP's TSRM. I implemented that in 5.3+; it was rather painful and really not developer friendly, but it works. It is in:


    Feel free to ping me back if you need more support on that.