Saturday, September 10, 2011

Debugging file system hang using kconsole

I thought it may be useful to interested people if I demonstrated how the HelenOS kconsole ipc command can be used to root cause a deadlock in the file system code. Yesterday I had the opportunity to use this technique on ticket #373. In short, the mount command was hanging in an attempt to mount a MFS file system on a file-backed block device with the underlying image located on the root FAT file system.

Let's first have a look at the commands which lead to this deadlock:
 

With the deadlock in place, we can start debugging it by switching to kconsole and inspecting the IPC state of essential tasks. The first one to look at is VFS, which is usually task number 6:


The screenshot above reveals that VFS is processing two client calls and that it itself has made two calls to the FAT server. The first call with method 1032 is VFS_IN_MOUNT sent by bdsh as it was executing the mount built-in command. From the source code, we see that VFS needs to process VFS_IN_MOUNT by sending VFS_OUT_MOUNT to the file system with the mount-point node. This corresponds to the first of the two calls sent over phone 2. Let's now look at what the situation is at the FAT server:


We see three calls dispatched to the FAT server, the first one being the VFS_OUT_MOUNT. In reaction to VFS_OUT_MOUNT, FAT will send VFS_OUT_MOUNTED to the mountee file system. The mountee, in this case, is task 41 - the MFS server.


Since MFS is being mounted on a file_bd block device, MFS will want to do some reads from it before it answers the VFS_OUT_MOUNTED call.


This file_bd instance has its blocks in the image file located on the root FAT file system, so it makes VFS_IN_READ and IPC_M_READ_DATA calls to VFS in order to read the block. We already saw the former on the screenshot with VFS as the second dispatched call. We also saw the IPC_M_READ_DATA on the FAT situation screenshot (call with method 8) because it got forwarded to FAT.

In reaction to VFS_IN_READ, VFS will send VFS_OUT_READ to the file system with the respective image file, i.e. the FAT server. It is the call with method 1025 on the FAT situation screenshot above. It is this call which contributes to the calls column of phone 2 on the VFS screenshot by its second half.

Now the deadlock cycle is completed. As seen on the FAT screenshot, the connection fibril associated with the VFS phone 2 is processing the VFS_OUT_MOUNT call and cannot finish it because for that it also needs to start and complete processing of the VFS_OUT_READ call. On the other hand, it cannot even start processing VFS_OUT_READ, before it is done with VFS_OUT_MOUNT.

I have described a similar problem several years ago in this blog. The solution was the introduction of async sessions. So the million-dollar question is why this didn't work this time and why we see that the two requests which should be handled in parallel are sent over VFS phone 2 serially.

The possible answer could be hidden in the way how the first call is handled. And indeed, in vfs_mount_internal(), we find the following piece of code:
        vfs_exchange_release(exch);
        async_wait_for(msg, &rc);
The code is wrong because it ends the exchange before the answer to the VFS_OUT_MOUNT call arrives. That makes it possible for the underlying exchange phone to be reused for handling other requests, including those that need to be handled in parallel to the first one. With this realization, the fix is a piece of cake.

No comments: