Block devices sometimes do bad things (or just fill up), so sometimes bad things happen to good software. Good error handling is a sign of well-thought-out software: your program will make a much better impression on users if it shows a clear “insufficient space” message than if it crashes for no apparent reason. CharybdeFS is a filesystem that lets you inject arbitrary file errors, which makes it easy to write integration tests that cover hard-to-test filesystem errors. This article covers some common examples for getting started.
Running the cookbook
The examples in this article are included in the cookbook subdirectory of the CharybdeFS project on GitHub. You can clone the project with
git clone https://github.com/scylladb/charybdefs.git
Because CharybdeFS is a filesystem, the tests need to run as root. You can run as root in a container, or use sudo. To build and run the cookbook tests on CentOS 7, run:
yum install epel-release
yum install gcc-c++ cmake thrift fuse-devel python-thrift thrift-devel
thrift -r --gen cpp server.thrift &> /dev/null
cmake CMakeLists.txt
make
modprobe fuse
cd cookbook
python demo.py
The thrift command needs to be re-run if you upgrade CharybdeFS. (You don’t need to run it every time you change the program you are testing.)
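The modprobe command only needs to be run once after each reboot. To load the module automatically at boot instead, add fuse to /etc/modules on Debian-style systems, or to a file such as /etc/modules-load.d/fuse.conf on systemd-based systems like CentOS 7.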
Integrating the recipes into your projects
You can look at the layout of the cookbook sample project to see how to integrate CharybdeFS into your project’s existing unit tests. The setUp and tearDown functions in the test setup will start and stop CharybdeFS for you. These examples are in Python, but you can make it work with projects in other languages, and with other build and test tools.
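As a rough illustration of that layout, the sketch below shows a unittest test case whose setUp and tearDown manage a fault-injection client. It is not the cookbook's actual code: RecordingClient is a hypothetical stand-in that only records calls, so the example runs without a mounted CharybdeFS. In a real suite, setUp would start the filesystem and connect to it, and the assertions would exercise your own program.

```python
import errno
import unittest


class RecordingClient(object):
    """Hypothetical stand-in for the Thrift client; it only records calls."""

    def __init__(self):
        self.calls = []

    def set_all_fault(self, *args):
        self.calls.append(("set_all_fault", args))

    def clear_all_fault(self):
        self.calls.append(("clear_all_fault", ()))


class DiskFullTest(unittest.TestCase):
    def setUp(self):
        # A real suite would start CharybdeFS here and connect to it.
        self.client = RecordingClient()

    def tearDown(self):
        # Always leave the filesystem fault-free for the next test.
        self.client.clear_all_fault()

    def test_disk_full_is_injected(self):
        self.client.set_all_fault(False, errno.ENOSPC, 0, "", False, 0, False)
        name, args = self.client.calls[-1]
        self.assertEqual(name, "set_all_fault")
        self.assertEqual(args[1], errno.ENOSPC)


def run_suite():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DiskFullTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Clearing faults in tearDown rather than at the end of each test means a failing assertion cannot leave an injected fault active for the next test.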
Bad things can happen to good filesystems. Can your program handle rare errors correctly?
Anatomy of the cookbook
The following snippet of code takes care of instantiating thrift and connecting to CharybdeFS, leaving you at the point where you can instruct CharybdeFS to do something useful.
# The mandatory boilerplate
import errno
import sys

# Make the Thrift-generated Python bindings importable.
sys.path.append('gen-py')

from server import server
from server.ttypes import *
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

def connect():
    # Open a buffered Thrift connection to the CharybdeFS control server.
    transport = TSocket.TSocket('127.0.0.1', 9090)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = server.Client(protocol)
    transport.open()
    return client

def main():
    client = connect()
See server.thrift for a full reference on the methods of the client object returned from the server.Client call, including the arguments for set_fault and set_all_fault. The examples in this cookbook assume that you have already created a client as above. You can re-use the same client for the entire test suite.
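The recipes in this article pass seven positional arguments to set_all_fault, which can be hard to read at a glance. The sketch below wraps the call in a function with named, defaulted arguments. The keyword names are this sketch's own labels, inferred from how the recipes in this article use each position; they are not taken from server.thrift, so check that file for the authoritative names. The fifth and seventh arguments are not explained in this article, so they appear here as the placeholders flag5 and flag7. RecordingClient is a hypothetical stand-in used only so the example runs without a live server.

```python
import errno


def inject_everywhere(client, err_no=0, random=False, probability=0,
                      pattern="", delay_us=0, flag5=False, flag7=False):
    """Call client.set_all_fault with named, defaulted arguments.

    The keyword names are assumptions inferred from the recipes in this
    article; flag5 and flag7 stand for the two boolean arguments that the
    article always leaves as False.
    """
    client.set_all_fault(random, err_no, probability, pattern,
                         flag5, delay_us, flag7)


class RecordingClient(object):
    """Hypothetical stand-in for the Thrift client, for demonstration."""

    def __init__(self):
        self.calls = []

    def set_all_fault(self, *args):
        self.calls.append(args)


demo = RecordingClient()
inject_everywhere(demo, err_no=errno.ENOSPC)   # disk full
inject_everywhere(demo, delay_us=50000)        # 50 ms delay
```

With a real client, you would pass the object returned by connect() instead of demo.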
Some CharybdeFS recipes
Disk full
The following code will return ENOSPC on all filesystem operations:
import errno
client.set_all_fault(False, errno.ENOSPC, 0, "", False, 0, False)
IO error
This will return EIO on all filesystem operations:
import errno
client.set_all_fault(False, errno.EIO, 0, "", False, 0, False)
Quota exceeded
The same approach works for a quota-exceeded error:
import errno
client.set_all_fault(False, errno.EDQUOT, 0, "", False, 0, False)
All the available errno codes on your operating system can be used. Read the errno documentation and get imaginative.
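What you assert after injecting one of these errors depends on your program. As a minimal sketch, the hypothetical save_text function below turns ENOSPC and EDQUOT into a readable message instead of a traceback, and lets unexpected errors propagate. The function name, the messages, and the opener parameter are illustrative assumptions, not part of CharybdeFS. The opener parameter exists only so the example can be exercised without a mounted fault-injecting filesystem.

```python
import errno
import io


def save_text(path, text, opener=io.open):
    """Write text to path and return a user-facing status message."""
    try:
        with opener(path, "w") as handle:
            handle.write(text)
    except (IOError, OSError) as exc:
        if exc.errno == errno.ENOSPC:
            return "Insufficient space to save %s" % path
        if exc.errno == errno.EDQUOT:
            return "Disk quota exceeded while saving %s" % path
        raise  # unexpected errors should still surface
    return "Saved %s" % path


def full_disk_opener(path, mode):
    """Simulate what a program sees when the filesystem returns ENOSPC."""
    raise OSError(errno.ENOSPC, "No space left on device")


message = save_text("report.txt", u"hello", opener=full_disk_opener)
```

In a real integration test, you would drop the opener argument, point path at a file on the CharybdeFS mount, inject the fault with the client, and assert on the returned message.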
Very slow writes
Now, let’s delay each filesystem operation by 50 ms. The delay argument is given in microseconds, hence the value 50000.
client.set_all_fault(False, 0, 0, "", False, 50000, False)
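To check how your program copes with such a delay, a test can time the operation it cares about. The sketch below uses a hypothetical timed helper. The slow_write function is a stand-in that sleeps to imitate the injected latency; with CharybdeFS mounted, you would time a real write to the mount point instead.

```python
import time


def timed(operation, *args):
    """Run operation(*args) and return (result, elapsed_seconds)."""
    start = time.time()
    result = operation(*args)
    return result, time.time() - start


def slow_write(data):
    """Stand-in for a write on a filesystem with a 50 ms injected delay."""
    time.sleep(0.05)
    return len(data)


written, elapsed = timed(slow_write, b"payload")
```

A test could then assert that elapsed stays within whatever timeout budget your program promises, or that a timeout path is taken when it does not.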
Returning random errors
To return a random error, set the random error flag (the first argument) to True.
client.set_all_fault(True, 0, 0, "", False, 0, False)
Restricting errors to specific syscalls
Let’s say we want to return random errors on reads and writes only. For this, use the alternate set_fault method, which takes a list of method names as its first argument.
client.set_fault(['read', 'read_buf', 'write', 'write_buf'], True, 0, 0, "", False, 0, False)
Fiddling with the error probability
Now we want to trigger the same behavior in 1% of the cases. (The probability argument is expressed as a count out of 100,000, so 1000 means 1%.)
client.set_fault(['read', 'read_buf', 'write', 'write_buf'], True, 0, 1000, "", False, 0, False)
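Because the scale is easy to get wrong by a factor of ten, a small conversion helper can make tests more readable. This is a sketch; percent_to_probability and PROBABILITY_SCALE are hypothetical names, and the only fact taken from this article is that the probability is expressed over 100,000.

```python
PROBABILITY_SCALE = 100000  # the article states probability is "over 100,000"


def percent_to_probability(percent):
    """Convert a percentage in (0, 100] to a count out of 100,000."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    return int(round(percent * PROBABILITY_SCALE / 100.0))


one_percent = percent_to_probability(1)     # 1000, as in the recipe above
ten_percent = percent_to_probability(10)    # 10000
```

The helper rejects 0 on purpose: the earlier recipes pass 0 for this argument and are described as affecting all operations, so 0 does not appear to mean "never".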
Matching a file pattern
Let’s say we want to restrict this behavior to a file named sendmail.cf.
client.set_fault(['read', 'read_buf', 'write', 'write_buf'], True, 0, 0, ".*sendmail.cf", False, 0, False)
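Before relying on a pattern, it can help to try it against a few candidate paths. The sketch below does this with Python's re module, assuming the pattern must match the whole path, which the leading .* in the recipe suggests. CharybdeFS's own regular expression engine and matching rules may differ, so treat this only as a rough check. It also shows that the unescaped dot in the recipe matches any character, while an escaped dot is stricter.

```python
import re


def pattern_matches(pattern, path):
    """Return True if pattern matches the whole path (assumed semantics)."""
    # Wrapping the pattern in (?:...)\Z forces a whole-string match and
    # works on both Python 2 and Python 3.
    return re.match("(?:%s)\\Z" % pattern, path) is not None


candidates = ["/etc/mail/sendmail.cf", "/etc/mail/sendmail_cf", "/etc/passwd"]

# The recipe's pattern: the dot before "cf" matches any character.
loose = [p for p in candidates if pattern_matches(".*sendmail.cf", p)]

# Escaping the dot restricts the match to a literal "sendmail.cf".
strict = [p for p in candidates if pattern_matches(r".*sendmail\.cf", p)]
```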
Mix and match: the agonising drive simulator
Let’s simulate a failing drive: a filesystem that slows to a crawl and returns I/O errors on 10% of system calls.
client.set_all_fault(False, errno.EIO, 10000, "", False, 100000, False)
Clearing faults: how to stop injecting faults in future system calls
You can clear a fault for a single system call with clear_fault. For example, this clears the fault only for the fsync syscall:
client.clear_fault("fsync")
Or if you are having a good day and want to clear all errors:
client.clear_all_fault()
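It is easy to forget the clearing step when a test fails halfway through. One way to guard against that is a context manager that always clears faults on exit. The sketch below is an illustration, not part of the cookbook: all_faults is a hypothetical helper, and RecordingClient is a stand-in that only records calls so the example runs without a live server. The method names follow the calls shown in this article.

```python
import errno
from contextlib import contextmanager


@contextmanager
def all_faults(client, *fault_args):
    """Inject a fault on all operations, then clear it on exit."""
    client.set_all_fault(*fault_args)
    try:
        yield client
    finally:
        # Runs even if the body raises, so later tests start clean.
        client.clear_all_fault()


class RecordingClient(object):
    """Hypothetical stand-in for the Thrift client, for demonstration."""

    def __init__(self):
        self.calls = []

    def set_all_fault(self, *args):
        self.calls.append("set_all_fault")

    def clear_all_fault(self):
        self.calls.append("clear_all_fault")


demo = RecordingClient()
with all_faults(demo, False, errno.EIO, 0, "", False, 0, False):
    pass  # exercise the program under test here
```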
All the code in this cookbook has been collected into a simple Python script, which can be found in the cookbook subdirectory of the project.
Tests for ScyllaDB
CharybdeFS is used for testing the ScyllaDB NoSQL database. Scripts for the ScyllaDB tests are in the /tests directory of the CharybdeFS project. If you need a fast, resilient database that’s compatible with Apache Cassandra, these tests are how we check that ScyllaDB does the right thing when an SSD fills up or returns an error.
Not just Python
Using CharybdeFS for a project in another language, or a different test framework? Please let us know and we’ll link to your example. Pull requests are welcome.
Open Source
CharybdeFS is open source and available on GitHub. For an introduction, see the article “CharybdeFS: a new fault-injecting filesystem for software testing.”
Photo: John for Wikimedia Commons. Available under the Creative Commons CC BY 2.0 license.