The World Wide Web as we know it is changing. The browser paradigm is sessionless: it has no concept of continuity. Web browsers limit us to simple one-way file transfers, either by browsing or by downloading. There is great demand for a truly interactive web, one that lets us write documents across the web as well as read them. A real Internet filesystem, one designed to span the huge distances of the Internet, must deliver this ability, and must deliver it with high speed and high security. Once in place, a real filesystem on the Internet gives us an effectively infinite hard disk to read from and write to. It also means that the Network Computer, that much talked about machine that is simply a terminal accessing data somewhere on the net, can finally become a reality. An Internet filesystem will allow high-speed read and write access to documents anywhere on the net. One other feature of an Internet filesystem that is unavailable to web browsers is random access to files. A web browser (like FTP) generally reads the entire contents of a file from beginning to end. With a filesystem, applications need only load those parts of a file that are required. The potential savings in time and bandwidth are huge.
Traditionally, network filesystems have been used to achieve these features on local area networks. These network filesystems are inadequate in terms of speed and security when dealing with the Internet at large. To address this, protocols have been developed to allow high speed secure filesystem operations across the Internet. In particular, WebNFS and CIFS allow access to remote filesystems as if they were local.
There is a huge amount of literature available about NFS and SMB. In this article I do not describe their design or implementation details at any great length. Rather, I scratch the surface of their relative merits and touch on some of their more fundamental concepts.
Before local area networks came into common use, offices and computer centres were limited in how they could share files and resources. Some sites had a single large computer with many terminals connected directly to it, allowing users to share that single computer's resources and files. Other sites had several unconnected PCs and were limited to transferring files by floppy disk. Usually, sites had a mixture of both. This was very inconvenient for users and administrators alike. In the "single large computer" scenario, users were limited by the large computer: if it went down or had to be upgraded, no work could be done. The PC scenario was a nightmare to administer - system administrators had their hands full ensuring data and application consistency and integrity. Users, too, were inconvenienced by having to, say, copy a document to a floppy and carry it to the only PC connected to a printer to get a hard copy. Clearly, some method of sharing not only data but entire filesystems, and the associated resources, had to be implemented.
Many resource sharing solutions have been implemented; however, two main solutions have taken root and stuck: SMB (Microsoft's Server Message Block protocol) and NFS (Sun Microsystems' Network File System).
The Network File System (NFS) protocol is a mechanism allowing clients to transparently access files and filesystems on remote servers. It is independent of architecture, operating system, network and transport protocols. NFS was released to the world at large in 1985. This release was NFS version 2. NFS is shipped with almost every UNIX system and implementations exist for nearly every computing platform in use from desktop computers to supercomputers. Since its release, NFS has become a de facto standard and it is estimated to be used by over ten million systems. The latest version of NFS is version 3, released in 1994.
The NFS protocol follows the client server architecture. Simply, clients request files from an NFS server. The server responds by returning a file handle to the client.
NFS is primarily designed so that the server is stateless. This means that each request to the server contains sufficient information to be processed completely, without regard to any other request. The only information retained by the server is a map of file handles to files on the local disk. One of the main design goals of NFS was to allow simple recovery in the event of a server crash. Crash recovery is indeed simple: the client side of the protocol keeps trying to reconnect to the server until a new connection is established. In theory the end user may not even know that the server was unavailable. This design has very positive implications for the Internet, where network traffic often causes connections to time out and intermediate gateways may become unavailable for short periods of time.
Version 3 of the NFS protocol requires that modified data on the server be flushed to disk before the server replies. In particular, the client blocks on a file close call until all data is flushed, so that any errors, such as running out of disk space, can be returned to the application. The exception is when the client requests an asynchronous write. When a server receives an asynchronous write request, it is permitted to reply immediately. Later the client sends a commit request to verify that the data has, in fact, been written to disk. The server must not reply to the commit until the data is safely written. This is most useful for large files - a client can send many write requests and then send a single commit request when the file is closed.
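The asynchronous write and commit sequence can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: ToyServer, its methods and the handle "fh1" are hypothetical names invented for illustration, not part of any NFS implementation, and a Python dictionary stands in for the disk. Note that every request carries its own file handle and offset, so no request depends on an earlier one.

```python
class ToyServer:
    """Hypothetical in-memory stand-in for a file server (not real NFS)."""

    def __init__(self):
        self.disk = {}    # handle -> bytearray; models stable storage
        self.cache = {}   # handle -> list of (offset, data) not yet flushed

    def write(self, handle, offset, data, stable=True):
        # Each request is self-contained: handle, offset and data.
        self.cache.setdefault(handle, []).append((offset, bytes(data)))
        if stable:
            # Synchronous write: flush before replying to the client.
            self.commit(handle)
        return len(data)

    def commit(self, handle):
        # Reply only after every pending write for this handle is on "disk".
        buf = self.disk.setdefault(handle, bytearray())
        for offset, data in self.cache.pop(handle, []):
            end = offset + len(data)
            if len(buf) < end:
                buf.extend(b"\0" * (end - len(buf)))
            buf[offset:end] = data
        return True

    def crash(self):
        # A crash loses whatever was still waiting in the cache.
        self.cache.clear()


server = ToyServer()
offset = 0
for chunk in (b"abc", b"def", b"ghi"):
    # Asynchronous writes: the server may reply before flushing.
    server.write("fh1", offset, chunk, stable=False)
    offset += len(chunk)
# One commit when the file is closed makes all three writes durable.
server.commit("fh1")
```

In this model, data written asynchronously is lost if crash() is called before commit(), which is why the client needs the commit reply before it can treat the file as safely stored.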
WebNFS is a mechanism that allows NFS clients to obtain services from WebNFS-enabled servers with a minimum of protocol overhead. WebNFS uses NFS (it can use either version 2 or version 3) as an underlying layer.
WebNFS introduces the concept of a public filehandle. A public filehandle simply defines a starting point on the server to which clients can connect. Using the public filehandle avoids the traditional NFS setup steps involving the MOUNT protocol and the portmapper service. A file is simply looked up relative to the public filehandle.
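The idea of looking a file up relative to a fixed starting point can be sketched with a toy directory tree. Everything here is an assumption made for illustration: EXPORT, PUBLIC_DIR and lookup() are hypothetical names, nested dictionaries stand in for directories, and no real NFS calls are involved.

```python
# Toy export tree: nested dicts are directories, bytes are file contents.
EXPORT = {
    "pub": {
        "docs": {"readme.txt": b"hello"},
    },
    "private": {"secret.txt": b"x"},
}

# The directory that the (hypothetical) public filehandle refers to.
PUBLIC_DIR = EXPORT["pub"]


def lookup(public_dir, path):
    """Walk path one component at a time, starting at the public directory."""
    node = public_dir
    for component in path.strip("/").split("/"):
        if component in ("", ".", ".."):
            # Refuse anything that could step outside the starting point.
            raise ValueError("unsupported path component: %r" % component)
        if not isinstance(node, dict) or component not in node:
            raise FileNotFoundError(path)
        node = node[component]
    return node


contents = lookup(PUBLIC_DIR, "docs/readme.txt")
```

Because every lookup starts at PUBLIC_DIR, the client in this sketch needs no separate step to discover where the exported tree begins, and anything outside that directory is simply not reachable.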
Currently, WebNFS ties directly into the web browser, requiring the browser to support a new protocol (NFS) and a new URL with the format:
nfs://server:port/path
This tells the client to connect to the server named by "server" at the given port. The file returned is found by evaluating "path" relative to the public filehandle. If ":port" is omitted, the default NFS port number (2049) is used.
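A client might split such a URL into its parts as follows. This is a minimal sketch using Python's standard urllib.parse module; parse_nfs_url and DEFAULT_NFS_PORT are hypothetical names chosen for illustration, and the example host and path are made up.

```python
from urllib.parse import urlsplit

DEFAULT_NFS_PORT = 2049  # used when the URL has no ":port" part


def parse_nfs_url(url):
    """Return (server, port, path) for a URL of the form nfs://server:port/path."""
    parts = urlsplit(url)
    if parts.scheme != "nfs" or not parts.hostname:
        raise ValueError("not an nfs:// URL: %r" % url)
    port = parts.port if parts.port is not None else DEFAULT_NFS_PORT
    # Drop the single "/" separating the server from the path; the
    # remainder is what gets looked up relative to the public filehandle.
    path = parts.path[1:] if parts.path.startswith("/") else parts.path
    return parts.hostname, port, path


server, port, path = parse_nfs_url("nfs://server/pub/readme.txt")
```

With no port in the URL, port falls back to 2049 and path is the part to be evaluated on the server.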
A major advantage of WebNFS over current file access protocols is that WebNFS need only open a single TCP connection from a client to the server. This is a considerable performance advantage over HTTP, which typically requires a separate connection for every file component. The connection between the client and server is kept in place until the client breaks it.
Traditionally, NFS has used UDP as its transport because UDP was faster than TCP. With improvements in hardware and TCP implementations, TCP is now a viable transport. NFS version 3 can use TCP or UDP. When using TCP, NFS can pass through corporate packet filtering gateways (a particular firewall strategy) that usually do not allow UDP packets. This is advantageous in two ways: it allows WebNFS to be used in networks with such gateways, and it allows regular firewall security methods to be applied to WebNFS.
WebNFS uses the authentication methods of the underlying RPC (remote procedure call) layer. In addition, NFS is a prime candidate for the Secure Sockets Layer (SSL) at the TCP level.
NFS generally also supports whatever access control method is used by the underlying operating system.
More information about WebNFS can be found at: http://www.sun.com/solaris/networking/webnfs/webnfs.html
The Common Internet File System (CIFS) is a specification for a file access protocol developed by Microsoft. CIFS is based on the existing SMB (Server Message Block) protocol that is currently used by Windows and OS/2 for file and printer sharing on a local area network. CIFS is not designed to replace HTTP or FTP; rather, it is supposed to complement them. No new URL scheme is defined for CIFS; the file: URL is used instead.
Some functionality provided by CIFS includes:
To resolve the CIFS server name into a network address, CIFS can use DNS, IPX or NETBIOS. The method used is configurable, but there are some constraints. In the case of NETBIOS, for example, the server name is limited to 15 characters and must be upper case. In the case of DNS (the default address resolution method), the address can be either a name or an IPv4 address (for example 132.18.154.1).
Name resolution becomes complicated in CIFS when a client and server are using different resolution methods. As an example, consider the case where a client running DNS name resolution (as nearly all Internet aware applications currently do) wishes to connect to a server running NETBIOS name resolution. In this case the client must convert the server name by changing it to upper case and then padding it out with blanks to a length of 16.
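The conversion just described can be sketched in a few lines. The function name to_netbios_name and the example server name are hypothetical; the rules applied (at most 15 characters, upper case, blank padding to a length of 16) are the ones stated above.

```python
def to_netbios_name(server_name):
    """Convert a server name to the upper-case, blank-padded NETBIOS form."""
    if len(server_name) > 15:
        raise ValueError("NETBIOS server names are limited to 15 characters")
    # Upper-case the name, then pad on the right with blanks to length 16.
    return server_name.upper().ljust(16)


padded = to_netbios_name("fileserver")
```

Here padded is "FILESERVER" followed by six blanks, sixteen characters in all.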
More information about CIFS can be found at: http://www.microsoft.com/intdev/cifs/cifs.htm
Among the newest product types available are Multi Protocol Network File Systems. These are file servers that sit on a network and "speak" more than one filesystem language. These file server machines usually have proprietary operating systems that are dedicated to file I/O, fault tolerance, speed and multi protocol serving. They are quick, reliable and can hook up to most networked computers with little effort.
Two companies in particular, Network Appliance and Auspex, supply such systems. Both companies have written several technical papers on their particular solutions to network storage. These papers can be found on the websites of the respective companies (see below for the URLs).
Network Appliance is a U.S. company that has released a range of multi protocol file servers it calls Web filers, or just filers. These filers run Network Appliance's proprietary ONTAP system and can support NFS, CIFS and HTTP simultaneously. HTTP support is limited to the GET request only (simply downloading a page), as this represents the majority of HTTP access to web servers.
More information can be obtained via the Network Appliance homepage: http://www.netapp.com
Auspex is another U.S. company that specialises in network file servers, which it calls NetServers. Auspex NetServers run a proprietary operating system that currently supports FTP and will support WebNFS in its next software release. It does not support CIFS at the moment.
More information can be obtained via the Auspex homepage: http://www.auspex.com
Internet filesystems take the web into its next stage of life. The filesystem issue is one of the last hurdles on the path to making the Internet one large local network. WebNFS and CIFS offer slightly different methods of providing a filesystem. NFS is a mature protocol, having first been used in the mid-1980s, while SMB already runs on many desktop computers. Neither protocol is the "right" one; the choice probably depends on what one is using it for. There will be additions, extensions and enhancements to both of them over the years, and there will probably be new protocols as well. Whatever happens, we end up with a faster, more secure web that can be accessed as if it were local.
Copyright © 1997, 1998 Robi Karp. Robi is a consultant specialising in the areas of Unix application software, security software, software development environments and The Internet. He is technical director of Fluffy Spider Technologies Pty. Ltd. He can be contacted via email: robi@fluffyspider.com.au