Fault-injecting filesystem cookbook
Block devices sometimes do bad things (or just fill up), so sometimes bad things happen to good software. Good error handling is a sign of well-thought-out software: your program will make a much better impression on users if it shows a clear “insufficient space” message than if it crashes for no apparent reason. CharybdeFS is a filesystem that lets you inject arbitrary file errors, which makes it easy to write integration tests that cover hard-to-test filesystem errors. This article covers some common examples for getting started.
Running the cookbook
The examples in this article are included in the cookbook subdirectory of the CharybdeFS project on GitHub. You can clone the project with:
git clone https://github.com/scylladb/charybdefs.git
Because CharybdeFS is a filesystem, the tests need to run as root. You can run as root in a container, or use sudo. To build and run the cookbook tests on CentOS 7, run:
yum install epel-release
yum install gcc-c++ cmake thrift fuse-devel python-thrift thrift-devel
thrift -r --gen cpp server.thrift &> /dev/null
cmake CMakeLists.txt
make
modprobe fuse
cd cookbook
python demo.py
The thrift command needs to be re-run if you upgrade CharybdeFS. (You don’t need to run it every time you change the program you are testing.)
The modprobe command only needs to be run once after each reboot. Alternatively, add fuse to /etc/modules to load the module every time you boot the system.
Integrating the recipes into your projects
You can look at the layout of the cookbook sample project to see how to integrate CharybdeFS into your project’s existing unit tests. The setUp and tearDown functions in the test setup will start and stop CharybdeFS for you. These examples are in Python, but you can make the same approach work with projects in other languages, and with other build and test tools.
Bad things can happen to good filesystems. Can your program handle rare errors correctly?
Anatomy of the cookbook
The following snippet of code takes care of instantiating thrift and connecting to CharybdeFS, leaving you at the point where you can instruct CharybdeFS to do something useful.
## The mandatory boilerplate
import sys
import errno

# Make the Thrift-generated Python bindings importable.
sys.path.append('gen-py')

from server import server
from server.ttypes import *
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

def connect():
    # Open a buffered, binary-protocol connection to CharybdeFS.
    transport = TSocket.TSocket('127.0.0.1', 9090)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = server.Client(protocol)
    transport.open()
    return client

def main():
    client = connect()
See server.thrift for a full reference on the methods of the client object returned from the server.Client call, including the arguments for set_all_fault. The examples in this cookbook assume that you have already created a client as above. You can re-use the same client for the entire test suite.
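The seven positional arguments of set_all_fault are easy to mix up. As a minimal sketch, a keyword-only wrapper can give them names. The parameter names used here (random, err_no, probability, regexp, kill_caller, delay_us, auto_delay) are assumptions inferred from the recipes in this cookbook, so check them against server.thrift before relying on them. RecordingClient is a hypothetical stand-in that only records calls, so the sketch runs without a CharybdeFS server.

```python
import errno


def inject_all(client, *, random=False, err_no=0, probability=0,
               regexp="", kill_caller=False, delay_us=0, auto_delay=False):
    """Call client.set_all_fault with named arguments.

    The names and their order are assumptions; verify them in server.thrift.
    """
    client.set_all_fault(random, err_no, probability, regexp,
                         kill_caller, delay_us, auto_delay)


class RecordingClient:
    """Hypothetical stand-in for the Thrift client; it only records calls."""

    def __init__(self):
        self.calls = []

    def set_all_fault(self, *args):
        self.calls.append(args)


fake = RecordingClient()
inject_all(fake, err_no=errno.ENOSPC)
print(fake.calls)
```

With a real client from connect(), the same call would read inject_all(client, err_no=errno.ENOSPC).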
Some CharybdeFS recipes
The following code will return ENOSPC on all filesystem operations:
import errno
client.set_all_fault(False, errno.ENOSPC, 0, "", False, 0, False)
This will return EIO on all filesystem operations:
import errno
client.set_all_fault(False, errno.EIO, 0, "", False, 0, False)
Same idea for a quota exceeded error.
import errno
client.set_all_fault(False, errno.EDQUOT, 0, "", False, 0, False)
All the available errno codes on your operating system can be used. Read the errno documentation and get imaginative.
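To see what a given code means on your system, Python's standard errno and os modules are enough. This short sketch uses only the standard library; the numeric values and messages it prints depend on your operating system.

```python
import errno
import os

# errno.errorcode maps numeric codes to their symbolic names on this platform.
for code in (errno.ENOSPC, errno.EIO, errno.EDQUOT):
    print(code, errno.errorcode[code], os.strerror(code))

# The full list of symbolic names known to this Python build.
all_names = sorted(errno.errorcode.values())
print(len(all_names), "error codes available")
```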
Very slow writes
Now, let’s delay each filesystem operation by 50 ms.
client.set_all_fault(False, 0, 0, "", False, 50000, False)
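The value 50000 corresponds to 50 ms only if the delay argument is expressed in microseconds, which is what this recipe implies. A small helper (hypothetical, not part of CharybdeFS) keeps that unit conversion explicit:

```python
def ms_to_us(milliseconds):
    """Convert milliseconds to whole microseconds.

    Assumes the delay argument is in microseconds, as the 50 ms recipe implies.
    """
    return int(round(milliseconds * 1000))


delay_us = ms_to_us(50)
print(delay_us)  # 50000
```

The result can then be passed as the delay argument, for example client.set_all_fault(False, 0, 0, "", False, delay_us, False).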
Returning random errors
To return a random error just use the random error flag.
client.set_all_fault(True, 0, 0, "", False, 0, False)
Restricting errors to specific syscalls
Let’s say we want to return random errors only on reads and writes. For this, use the alternate set_fault method, which takes a list of the operations to affect as its first argument.
client.set_fault(['read', 'read_buf', 'write', 'write_buf'], True, 0, 0, "", False, 0, False)
Fiddling with the error probability
Now we want to trigger the same behavior in 1% of the cases. (The probability argument is expressed as a count out of 100,000, so 1,000 means 1%.)
client.set_fault(['read', 'read_buf', 'write', 'write_buf'], True, 0, 1000, "", False, 0, False)
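Because the scale is 100,000 rather than 100, it is easy to be off by a factor of a thousand. As a sketch, a hypothetical helper (not part of CharybdeFS) can convert a percentage into the value the probability argument expects:

```python
PROBABILITY_SCALE = 100_000  # the probability argument is a count out of 100,000


def percent_to_probability(percent):
    """Convert a percentage (0 < percent <= 100) to a count out of 100,000."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    return int(round(percent * PROBABILITY_SCALE / 100))


print(percent_to_probability(1))   # 1000
print(percent_to_probability(10))  # 10000
```

The helper rejects 0 on purpose: the earlier recipes pass 0 as the probability and still affect every operation, so 0 appears to mean "always" rather than "never".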
Matching a file pattern
Let’s say we want to restrict this behavior to a file named sendmail.cf. The fifth argument is a regular expression that is matched against the file path:
client.set_fault(['read', 'read_buf', 'write', 'write_buf'], True, 0, 0, ".*sendmail.cf", False, 0, False)
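Before handing a pattern to CharybdeFS, it can help to preview which paths it would select. The sketch below uses Python's re module for that preview. It is an assumption that CharybdeFS matches patterns the same way (its regex engine and its search or full-match semantics may differ), and the sample paths are made up for illustration.

```python
import re

pattern = ".*sendmail.cf"      # as written in the recipe
strict = r".*sendmail\.cf$"    # dot escaped, anchored at the end of the path

# Hypothetical sample paths, for illustration only.
paths = [
    "/etc/mail/sendmail.cf",
    "/etc/mail/sendmail_cf",
    "/etc/mail/sendmail.cf.bak",
]

for path in paths:
    loose_hit = bool(re.search(pattern, path))
    strict_hit = bool(re.search(strict, path))
    print(path, loose_hit, strict_hit)
```

In Python, the unescaped dot in the recipe's pattern matches any character, so all three sample paths match it, while the stricter variant matches only the first.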
Mix and match: the agonising drive simulator
Let’s simulate a drive that slows to a crawl (a 100 ms delay on each operation) and returns I/O errors on 10% of system calls.
client.set_all_fault(False, errno.EIO, 10000, "", False, 100000, False)
Clearing faults: how to stop injecting faults in future system calls
You can clear a fault for a single system call, for example only for the fsync syscall. See server.thrift for the method to call.
Or, if you are having a good day, you can clear all errors at once.
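In a test suite it is convenient to guarantee that faults are cleared even when a test fails. The sketch below wraps injection in a context manager. The method names clear_fault and clear_all_faults are assumptions here, so confirm them in server.thrift. RecordingClient is a hypothetical stand-in that only records calls, so the sketch runs without a CharybdeFS server.

```python
import errno
from contextlib import contextmanager


@contextmanager
def injected_fault(client, *set_all_fault_args):
    """Inject a fault for the duration of a with-block, then clear all faults.

    Assumes the client offers set_all_fault and clear_all_faults.
    """
    client.set_all_fault(*set_all_fault_args)
    try:
        yield client
    finally:
        # Runs even if the code under test raises.
        client.clear_all_faults()


class RecordingClient:
    """Hypothetical stand-in for the Thrift client; it only records calls."""

    def __init__(self):
        self.calls = []

    def set_all_fault(self, *args):
        self.calls.append(("set_all_fault", args))

    def clear_fault(self, method):
        self.calls.append(("clear_fault", (method,)))

    def clear_all_faults(self):
        self.calls.append(("clear_all_faults", ()))


fake = RecordingClient()
with injected_fault(fake, False, errno.EIO, 0, "", False, 0, False):
    pass  # run the code under test here

fake.clear_fault("fsync")  # clear one syscall only (assumed method name)
print([name for name, _ in fake.calls])
```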
All the code in this cookbook has been collected in a simple Python script that can be found in the cookbook subdirectory of the project.
Tests for Scylla
CharybdeFS is used for testing the Scylla NoSQL database. Scripts for the Scylla tests are in the /tests directory of the CharybdeFS project. If you need a fast, resilient database that’s compatible with Apache Cassandra, you can be sure that Scylla will do the right thing if an SSD fills up or returns an error.
Not just Python
Using CharybdeFS for a project in another language, or a different test framework? Please let us know and we’ll link to your example. Pull requests are welcome.
CharybdeFS is open source and available on GitHub. For an introduction, see the article “CharybdeFS: a new fault-injecting filesystem for software testing.”
Photo: John for Wikimedia Commons. Available under the Creative Commons CC BY 2.0 license.