Nemo I/O Primer
draft ("Better Than Nothing")
2001-08-23
Darin Adler <darin@bentspoon.com>

The Nemo shell and the file manager inside it do a lot of
I/O. Because of this, there are some special disciplines required when
writing Nemo code.

No I/O on the main thread

To be able to respond to the user quickly, Nemo needs to be
designed so that the main user input thread does not block. The basic
approach is to never do any disk I/O on the main thread.

In practice, Nemo code does assume that some disk I/O is fast, in
some cases intentionally and in other cases due to programmer
sloppiness. The typical assumption is that reading files from the
user's home directory and the installed files in the Nemo datadir
are very fast, effectively instantaneous.

So the general approach is to allow I/O for files that have file
system paths, assuming that the access to these files is fast, and to
prohibit I/O for files that have arbitrary URIs, assuming that access
to these could be arbitrarily slow. Although this works pretty well,
it is based on an incorrect assumption, because with NFS and other
kinds of abstract file systems, there can be arbitrarily slow parts of
the file system that have file system paths.

For historical reasons, threading in Nemo is done through the
gnome-vfs asynchronous I/O abstraction rather than using threads
directly. This means that all the threads are created by gnome-vfs,
and Nemo code runs on the main thread only. Thus, the rule of
thumb is that synchronous gnome-vfs operations like the ones in
<libgnomevfs/gnome-vfs-ops.h> are illegal in most Nemo
code. Similarly, it's illegal to ask for a piece of information, say a
file size, and then wait until it arrives. The program's main thread
must be allowed to get back to the main loop and start asking for user
input again.
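
To make the "ask, then get back to the main loop" rule concrete,
here is a minimal sketch in plain C. Everything in it is hypothetical
(toy_request_size, toy_main_loop_iteration and friends are not
gnome-vfs or Nemo calls), and the "I/O" is faked by passing in the
answer up front. The point is only the shape: the request returns
immediately, and the answer arrives later through a callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names here are hypothetical; this is not the gnome-vfs API. */
typedef void (*ToyDoneFunc)(long size, void *user_data);

typedef struct {
    ToyDoneFunc done;   /* runs on the main thread once the answer is in */
    void *user_data;
    long result;        /* stands in for data a worker thread fetched */
} ToyRequest;

#define TOY_MAX_PENDING 16
static ToyRequest toy_pending[TOY_MAX_PENDING];
static size_t toy_pending_count = 0;

/* Queue the request and return at once; never wait for the answer here. */
void toy_request_size(long pretend_size, ToyDoneFunc done, void *user_data)
{
    assert(done != NULL && toy_pending_count < TOY_MAX_PENDING);
    toy_pending[toy_pending_count].done = done;
    toy_pending[toy_pending_count].user_data = user_data;
    toy_pending[toy_pending_count].result = pretend_size;
    toy_pending_count++;
}

/* One main loop iteration: deliver the answers that were waiting, then
 * return so user input can be handled. Returns the callbacks run. */
size_t toy_main_loop_iteration(void)
{
    size_t dispatched = toy_pending_count;
    for (size_t i = 0; i < dispatched; i++)
        toy_pending[i].done(toy_pending[i].result, toy_pending[i].user_data);
    /* Keep any requests that the callbacks queued during dispatch. */
    memmove(toy_pending, toy_pending + dispatched,
            (toy_pending_count - dispatched) * sizeof toy_pending[0]);
    toy_pending_count -= dispatched;
    return dispatched;
}

/* Example callback: store the size where the caller can see it. */
void toy_store_size(long size, void *user_data)
{
    *(long *)user_data = size;
}
```

A caller would invoke toy_request_size and return; the stored size
only becomes valid after a later toy_main_loop_iteration has run.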

How NemoFile is used to do this

The NemoFile class presents an API for scheduling this
asynchronous I/O and dealing with the uncertainty of when the
information will be available. (It also does a few other things, but
that's the main service it provides.) When you want information about
a particular file or directory, you get the NemoFile object for
that item using nemo_file_get. This operation, like most
NemoFile operations, is not allowed to do any disk I/O. Once you
have a NemoFile object, you can ask it questions like "What is
your file type?" by calling functions like
nemo_file_get_file_type. However, for a newly created NemoFile
object the answer is almost certainly "I don't know." Each function
defines a default, which is the answer given for "I don't know." For
example, nemo_file_get_file_type will return
GNOME_VFS_FILE_TYPE_UNKNOWN if it doesn't yet know the type.
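
The "default answer" idea can be sketched as follows. The ToyFile
struct, the ToyFileType enum and toy_file_get_file_type are made-up
stand-ins, not the real NemoFile definitions; the sketch assumes the
object simply remembers whether the type has been fetched yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real types live in Nemo and gnome-vfs. */
typedef enum {
    TOY_TYPE_UNKNOWN,
    TOY_TYPE_REGULAR,
    TOY_TYPE_DIRECTORY
} ToyFileType;

typedef struct {
    bool type_known;     /* false until some I/O has delivered the type */
    ToyFileType type;
} ToyFile;

/* A getter never does I/O: it answers from what is already in memory
 * and falls back to the default when nothing is known yet. */
ToyFileType toy_file_get_file_type(const ToyFile *file)
{
    assert(file != NULL);
    return file->type_known ? file->type : TOY_TYPE_UNKNOWN;
}
```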

It's worth taking a side trip to discuss the nature of the
NemoFile API. Since these classes are a private part of the
Nemo implementation, we make no effort to have the API be
"complete" in an abstract sense. Instead we add operations as
necessary and give them the semantics that are most handy for our
purposes. For example, nemo_file_get_size could return a special
distinguishable value to mean "I don't know", or report that through
a separate boolean, but instead it returns 0 for files where the size
is unknown. This choice is entirely motivated by pragmatic concerns.
The intent is that we tweak these calls as needed if the semantics
aren't good enough.

Back to the newly created NemoFile object. If you actually need to
get the type, you need to arrange for that information to be fetched
from the file system. There are two ways to make this request. If you
are planning to display the type on an ongoing basis then you want to
tell the NemoFile that you'll be monitoring the file's type and want to
know about changes to it. If you just need one-time information about
the type then you'll want to be informed when the type is
discovered. The calls used for this are nemo_file_monitor_add and
nemo_file_call_when_ready respectively. Both of these calls take a
list of information needed about a file. If all you need is the file
type, for example, you would pass a list containing just
NEMO_FILE_ATTRIBUTE_FILE_TYPE (the attributes are defined in
nemo-file-attributes.h). Not every call has a corresponding file
attribute type. We add new ones as needed.
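
Here is a rough sketch of the one-time "call me when ready" request.
It is not the NemoFile implementation: the attribute flags, AttrFile
and the attr_file_* functions are invented for illustration, the
attribute list is modeled as a bit mask, only one pending callback is
supported, and the sketch assumes the callback fires immediately when
everything requested is already known.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute flags, modeled as a bit mask, not a list. */
enum { ATTR_FILE_TYPE = 1 << 0, ATTR_SIZE = 1 << 1 };

typedef void (*ReadyFunc)(void *user_data);

typedef struct {
    unsigned known;        /* attributes already fetched */
    unsigned wanted;       /* attributes the pending callback waits for */
    ReadyFunc ready;       /* NULL when no callback is pending */
    void *user_data;
} AttrFile;

/* Fire the pending callback once, when all it asked for is known. */
static void attr_file_check_ready(AttrFile *file)
{
    if (file->ready != NULL && (file->known & file->wanted) == file->wanted) {
        ReadyFunc ready = file->ready;
        file->ready = NULL;            /* one-shot: clear before calling */
        ready(file->user_data);
    }
}

/* One-time request: "call me when these attributes are available". */
void attr_file_call_when_ready(AttrFile *file, unsigned wanted,
                               ReadyFunc ready, void *user_data)
{
    assert(file != NULL && ready != NULL);
    file->wanted = wanted;
    file->ready = ready;
    file->user_data = user_data;
    attr_file_check_ready(file);       /* already known? answer now */
}

/* Called when asynchronous I/O delivers some attributes. */
void attr_file_io_finished(AttrFile *file, unsigned attributes)
{
    assert(file != NULL);
    file->known |= attributes;
    attr_file_check_ready(file);
}

/* Example callback: count how many times we were told "ready". */
void count_call(void *user_data)
{
    ++*(int *)user_data;
}
```

A monitor would differ mainly in that it stays registered after the
first notification instead of being cleared.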

If you do a nemo_file_monitor_add, you also typically connect to
the NemoFile object's changed signal. Each time any monitored
attribute changes, a changed signal is emitted. The caller typically
caches the value of the attribute that was last seen (for example,
what's displayed on screen) and does a quick check to see if the
attribute it cares about has changed. If you do a
nemo_file_call_when_ready, you don't typically need to connect to
the changed signal, because your callback function will be called when
and if the requested information is ready.
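
The "cache what was last seen and do a quick check" pattern for a
changed handler might look like the sketch below. TypeLabel and
type_label_on_changed are hypothetical names, and the file type is
reduced to a plain int to keep the example standalone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int shown_type;     /* value currently drawn on screen */
    int redraws;        /* how many times we actually repainted */
} TypeLabel;

/* Hypothetical handler for a "changed" notification. The signal only
 * says that something monitored changed, so compare against the
 * cached value before doing any work. Returns true if it repainted. */
bool type_label_on_changed(TypeLabel *label, int current_type)
{
    assert(label != NULL);
    if (current_type == label->shown_type)
        return false;               /* some other attribute changed */
    label->shown_type = current_type;
    label->redraws++;
    return true;
}
```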

Both a monitor and a callback can be cancelled. For ease of
use, neither requires that you store an ID for
canceling. Instead, the monitor function uses an arbitrary client
pointer, which can be any kind of pointer that's known to not conflict
with other clients. Usually, this is a pointer to the monitoring
object, but it can also be, for example, a pointer to a global
variable. The call_when_ready function uses the callback function and callback
data to identify the particular callback to cancel. One advantage of the monitor
API is that it also lets the NemoFile framework know that the file
should be monitored for changes made outside Nemo. This is how we
know when to ask FAM to monitor a file or directory for us.
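
The two ways of identifying what to cancel can be sketched like this.
The Registry type and registry_* functions are invented for the
example and say nothing about how NemoFile stores its monitors; the
only point is that a monitor is keyed by the client pointer alone,
while a pending callback is keyed by the (function, data) pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*RegCallback)(void *data);

#define REG_MAX 8

typedef struct {
    const void *clients[REG_MAX];       /* monitor keys */
    size_t n_clients;
    struct { RegCallback func; void *data; } waits[REG_MAX];
    size_t n_waits;                     /* pending callbacks */
} Registry;

void registry_monitor_add(Registry *r, const void *client)
{
    assert(r->n_clients < REG_MAX);
    r->clients[r->n_clients++] = client;
}

/* Remove the monitor registered under this client pointer, if any. */
bool registry_monitor_remove(Registry *r, const void *client)
{
    for (size_t i = 0; i < r->n_clients; i++) {
        if (r->clients[i] == client) {
            r->clients[i] = r->clients[--r->n_clients];
            return true;
        }
    }
    return false;
}

void registry_call_when_ready(Registry *r, RegCallback func, void *data)
{
    assert(func != NULL && r->n_waits < REG_MAX);
    r->waits[r->n_waits].func = func;
    r->waits[r->n_waits].data = data;
    r->n_waits++;
}

/* Cancel the pending callback that matches both function and data. */
bool registry_cancel_call_when_ready(Registry *r, RegCallback func, void *data)
{
    for (size_t i = 0; i < r->n_waits; i++) {
        if (r->waits[i].func == func && r->waits[i].data == data) {
            r->waits[i] = r->waits[--r->n_waits];
            return true;
        }
    }
    return false;
}

/* Placeholder callback used only as an identity in the example. */
void reg_noop(void *data)
{
    (void)data;
}
```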

Let's review a few of the concepts:

1) Nearly all NemoFile operations, like nemo_file_get_file_type,
   are not allowed to do any disk I/O.
2) To cause the actual I/O to be done, callers need to set up
   either a monitor or a callback.
3) The actual I/O is done by asynchronous gnome-vfs calls, so the work
   is done on another thread.

To work with an entire directory of files at once, you use
a NemoDirectory object. With the NemoDirectory object you can
monitor a whole set of NemoFile objects at once, and you can
connect to a single "files_changed" signal that gets emitted whenever
files within the directory are modified. That way you don't have to
connect separately to each file you want to monitor. These calls are
also the mechanism for finding out which files are in a directory. In
most other respects, they are like the NemoFile calls.

Caching, the good and the bad

Another feature of the NemoFile class is the caching. If you keep
around a NemoFile object, it keeps around information about the
last known state of that file. Thus, if you call
nemo_file_get_file_type, you might well get the file type found at
this location the last time you looked, rather than the file's
current type or "unknown". There are some problems with this,
though.

The first problem is that if wrong information is cached, you need
some way to "goose" the NemoFile object and get it to grab new
information. This is trickier than it might sound, because we don't
want to constantly distrust information we received just moments
before. To handle this, we have the
nemo_file_invalidate_attributes and
nemo_file_invalidate_all_attributes calls, as well as the
nemo_directory_force_reload call. If some code in Nemo makes a
change to a file that's known to affect the cached information, it can
call one of these to inform the NemoFile framework. Changes that
are made through the framework itself are automatically understood, so
usually these calls aren't necessary.

The second problem is that it's hard to predict when information will
and won't be cached. The current rule that's implemented is that no
information is cached if no one retains a reference to the
NemoFile object. This means that someone else holding a
NemoFile object can subtly affect the semantics of whether you
have new data or not. Calling nemo_file_call_when_ready or
nemo_file_monitor_add will not invalidate the cache, but rather
will return you the already cached information.
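
The "cached only while someone holds a reference" rule is easier to
see in a toy model. CachedFile, cached_file_get and cached_file_unref
are hypothetical, and the fixed-size table is just a stand-in for
whatever lookup structure the real code uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char uri[64];
    int ref_count;      /* 0 means the slot is free */
    bool size_known;
    long size;
} CachedFile;

#define CACHE_SLOTS 8
static CachedFile cache_slots[CACHE_SLOTS];

/* Return the shared object for a URI, creating an empty one if nobody
 * currently holds a reference. Does no I/O. */
CachedFile *cached_file_get(const char *uri)
{
    CachedFile *free_slot = NULL;
    assert(uri != NULL && strlen(uri) < sizeof cache_slots[0].uri);
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        CachedFile *f = &cache_slots[i];
        if (f->ref_count > 0 && strcmp(f->uri, uri) == 0) {
            f->ref_count++;
            return f;           /* cached state comes along with it */
        }
        if (f->ref_count == 0 && free_slot == NULL)
            free_slot = f;
    }
    if (free_slot == NULL)
        return NULL;            /* toy table is full */
    memset(free_slot, 0, sizeof *free_slot);
    strcpy(free_slot->uri, uri);
    free_slot->ref_count = 1;
    return free_slot;
}

/* Dropping the last reference discards everything that was cached. */
void cached_file_unref(CachedFile *file)
{
    assert(file != NULL && file->ref_count > 0);
    file->ref_count--;
}
```

Note how a second caller's view depends on whether the first caller
still holds its reference: that is the subtle effect described above.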

These problems are less pronounced when FAM is in use. With FAM, any
monitored file is highly likely to have accurate information, because
changes to the file will be noticed by FAM, and that in turn will
trigger new I/O to determine what the new status of the file is.

Operations that change the file

You'll note that up until this point, I've only discussed getting
information about the file, not making changes to it. NemoFile
also contains some APIs for making changes. There are two kinds of
these.

The calls that change metadata are examples of the first kind. These
calls make changes to the internal state right away and schedule I/O
to write the changes out to the file system. There's no way to detect
if the I/O succeeds or fails, and as far as the client code is
concerned the change takes place right away.

The calls that make other kinds of file system change are examples of
the second kind. These calls take a
NemoFileOperationCallback. They are all cancellable, and they give
a callback when the operation completes, whether it succeeds or fails.
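
A sketch of the difference between the two kinds of call follows. The
ChangeFile struct and change_file_* functions are hypothetical, a
short "note" string stands in for metadata, and a rename stands in for
the second kind. The sketch also assumes that cancelling means the
callback is never invoked, which this document does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef void (*OpDoneFunc)(bool succeeded, void *user_data);

typedef struct {
    char note[32];          /* stands in for a piece of metadata */
    bool write_pending;     /* write-out scheduled, result not reported */
    OpDoneFunc rename_done; /* NULL when no rename is in flight */
    void *rename_data;
} ChangeFile;

/* First kind: in-memory state changes now; the write happens later. */
void change_file_set_note(ChangeFile *file, const char *note)
{
    assert(file != NULL && strlen(note) < sizeof file->note);
    strcpy(file->note, note);
    file->write_pending = true;
}

/* Second kind: nothing is reported until the operation finishes. */
void change_file_rename(ChangeFile *file, OpDoneFunc done, void *user_data)
{
    assert(file != NULL && done != NULL);
    assert(file->rename_done == NULL);  /* one operation at a time here */
    file->rename_done = done;
    file->rename_data = user_data;
}

/* Assumed semantics: cancelling drops the callback for good. */
void change_file_cancel_rename(ChangeFile *file)
{
    file->rename_done = NULL;
}

/* Called when the asynchronous I/O completes, successfully or not. */
void change_file_rename_finished(ChangeFile *file, bool succeeded)
{
    OpDoneFunc done = file->rename_done;
    file->rename_done = NULL;
    if (done != NULL)
        done(succeeded, file->rename_data);
}

/* Example callback: record the outcome as 1 or 0. */
void record_result(bool succeeded, void *user_data)
{
    *(int *)user_data = succeeded ? 1 : 0;
}
```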

Files that move

When a file is moved, and the NemoFile framework knows it, then
the NemoFile and NemoDirectory objects follow the file rather
than staying stuck to the path. This has a direct influence on the
user interface of Nemo -- if you move a directory, already-open
windows and property windows will follow the directory around.

This means that keeping around a NemoFile object and keeping
around a URI for a file have different semantics, and there are
cases where one is the better choice and cases where the other is.
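
The difference in semantics can be shown with a tiny sketch.
MovableFile, movable_file_note_move and save_uri are hypothetical
names; the sketch assumes only that the framework updates the
existing object in place when it learns about a move.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char uri[128];
} MovableFile;

/* What the framework would do when it learns about a move: update the
 * existing object so that every holder sees the new location. */
void movable_file_note_move(MovableFile *file, const char *new_uri)
{
    assert(file != NULL && strlen(new_uri) < sizeof file->uri);
    strcpy(file->uri, new_uri);
}

/* A client that saves a URI string keeps pointing at the old path. */
void save_uri(const MovableFile *file, char *out, size_t out_size)
{
    snprintf(out, out_size, "%s", file->uri);
}
```

After a move, the object answers with the new location while the
saved string still names the old one.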

Icons

The current implementation of the Nemo icon factory uses
synchronous I/O to get the icons and ignores these guidelines. The
only reason this doesn't ruin the Nemo user experience is that it
also refuses to even try to fetch icons from URIs that don't
correspond to file system paths, which for most cases means it limits
itself to reading from the high-speed local disk. Don't ask me what
the repercussions of this are for NFS; do the research and tell me
instead!

Slowness caused by asynchronous operations

One danger in all this asynchronous I/O is that you might end up doing
repeated drawing and updating. If you go to display a file right
after asking for information about it, you might immediately show an
"unknown file type" icon. Then, milliseconds later, you may complete
the I/O and discover more information about the file, including the
appropriate icon. So you end up drawing the icon twice. There are a
number of strategies for preventing this problem. One of them is to
allow a bit of hysteresis and wait some fixed amount of time after
requesting the I/O before displaying the "unknown" state. One
strategy that's used in Nemo is to wait until some basic
information is available before displaying anything. This might make
the program faster overall, but it might make it seem slower,
because you don't see things right away. [What other strategies
are used in Nemo now for this?]
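
The hysteresis strategy boils down to a small decision, sketched
below. The names are hypothetical and the 200 ms grace period is an
arbitrary value picked for the example, not one taken from Nemo.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    DRAW_NOTHING,
    DRAW_UNKNOWN_ICON,
    DRAW_REAL_ICON
} DrawChoice;

/* Hypothetical grace period; this document does not give a value. */
#define UNKNOWN_GRACE_MS 200

/* Decide what to paint for a file, given whether its information has
 * arrived and how long ago the I/O was requested. */
DrawChoice choose_drawing(bool info_ready, long ms_since_request)
{
    assert(ms_since_request >= 0);
    if (info_ready)
        return DRAW_REAL_ICON;
    if (ms_since_request < UNKNOWN_GRACE_MS)
        return DRAW_NOTHING;        /* too early to admit "unknown" */
    return DRAW_UNKNOWN_ICON;
}
```

If the I/O finishes within the grace period, the icon is drawn once
instead of twice.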

How to make Nemo slow

If you add I/O to the functions in NemoFile that are used simply
to fetch cached file information, you can make Nemo incredibly I/O
intensive. On the other hand, the NemoFile API does not provide a
way to do arbitrary file reads, for example. So it can be tricky to
add features to Nemo, since you first have to educate NemoFile
about how to do the I/O asynchronously and cache it, then request the
information and have some way to deal with the time when it's not yet
known.

Adding new kinds of I/O usually involves working on the Nemo I/O
state machine in nemo-directory-async.c. If we changed Nemo to
use threading instead of using gnome-vfs asynchronous operations, I'm
pretty sure that most of the changes would be here in this
file. That's because the external API used for NemoFile wouldn't
really have a reason to change. In either case, you'd want to schedule
work to be done, and get called back when the work is complete.

[We probably need more about nemo-directory-async.c here.]

Future direction

Some have suggested that by using threading directly in Nemo
rather than using it indirectly through the gnome-vfs asynchronous
calls, we could simplify the I/O code in Nemo. It's possible this would
make a big improvement, but it's also possible that this would primarily
affect the internals and implementation details of NemoFile and
still leave the rest of the Nemo code the same.

That's all for now

This is a very rough early draft of this document. Let me know about
other topics that would be useful to be covered in here.

    -- Darin
