Fault-injecting filesystem cookbook

By Benoît Canet

May 2, 2016

Block devices sometimes do bad things (or just fill up), so sometimes bad things happen to good software. Good error handling is a sign of well-thought-out software: your program will make a much better impression on users if it shows a clear “insufficient space” message than if it crashes for no apparent reason. CharybdeFS is a filesystem that lets you inject arbitrary file errors, which makes it easy to write integration tests that cover hard-to-test filesystem errors. This article covers some common examples for getting started.

Running the cookbook

The examples in this article are included in the cookbook subdirectory of the CharybdeFS project on GitHub. You can clone the project with git clone https://github.com/scylladb/charybdefs.git.

Because CharybdeFS is a filesystem, the tests need to run as root. You can run as root in a container, or use sudo. To build and run the cookbook tests on CentOS 7, run the following commands from the top of the cloned project:

yum install epel-release
yum install gcc-c++ cmake thrift fuse-devel python-thrift thrift-devel
thrift -r --gen cpp server.thrift &> /dev/null
cmake CMakeLists.txt
make
modprobe fuse
cd cookbook
python demo.py

The thrift command needs to be re-run if you upgrade CharybdeFS. (You don’t need to run it every time you change the program you are testing.)

The modprobe command only needs to be run once after each reboot. Alternatively, list fuse in a file under /etc/modules-load.d/ (or in /etc/modules on Debian-based systems) to load the module every time the system boots.

Integrating the recipes into your projects

You can look at the layout of the cookbook sample project to see how to integrate CharybdeFS into your project’s existing unit tests. The setUp and tearDown functions in the test setup will start and stop CharybdeFS for you. These examples are in Python, but you can make it work with projects in other languages, and with other build and test tools.

Bad things can happen to good filesystems. Can your program handle rare errors correctly?

Anatomy of the cookbook

The following snippet of code takes care of instantiating thrift and connecting to CharybdeFS, leaving you at the point where you can instruct CharybdeFS to do something useful.

   # The mandatory boilerplate
   import errno
   import sys

   sys.path.append('gen-py')
   from server import server
   from server.ttypes import *

   from thrift import Thrift
   from thrift.transport import TSocket
   from thrift.transport import TTransport
   from thrift.protocol import TBinaryProtocol

   def connect():
       transport = TSocket.TSocket('127.0.0.1', 9090)
       transport = TTransport.TBufferedTransport(transport)
       protocol = TBinaryProtocol.TBinaryProtocol(transport)
       client = server.Client(protocol)
       transport.open()
       return client

   def main():
       client = connect()

See server.thrift for a full reference on the methods of the client object returned from the server.Client call, including the arguments for set_fault and set_all_fault. The examples in this cookbook assume that you have already created a client as above. You can re-use the same client for the entire test suite.
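Because every recipe passes seven positional arguments, a small keyword wrapper can make test code easier to read. The following is a minimal sketch, not part of CharybdeFS: `inject_everywhere` and `RecordingClient` are hypothetical names, and the argument meanings are inferred from the recipes in this article (random flag, errno, probability, file pattern, delay). The two flags that every recipe leaves as False are passed through as False; check server.thrift for what they do.

```python
import errno


def inject_everywhere(client, err_no=0, random_errors=False, probability=0,
                      pattern="", delay_us=0):
    """Hypothetical helper: call client.set_all_fault with named arguments.

    The 5th and 7th positional arguments are always False in the cookbook
    recipes, so this sketch leaves them as False too.
    """
    client.set_all_fault(random_errors, err_no, probability, pattern,
                         False, delay_us, False)


class RecordingClient(object):
    """Stand-in for the Thrift client (an assumption for illustration).

    It only records the arguments it receives, so the example runs
    without a CharybdeFS server.
    """

    def __init__(self):
        self.calls = []

    def set_all_fault(self, *args):
        self.calls.append(args)


fake = RecordingClient()
inject_everywhere(fake, err_no=errno.ENOSPC)

# The recorded call matches the positional form used in the recipes.
print(fake.calls[0] == (False, errno.ENOSPC, 0, "", False, 0, False))
```

With a real client from connect(), the same call would read inject_everywhere(client, err_no=errno.ENOSPC).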

Some CharybdeFS recipes

Disk full

The following code will return ENOSPC on all filesystem operations:

import errno
client.set_all_fault(False, errno.ENOSPC, 0, "", False, 0, False)
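On the application side, this is the kind of error handling the disk-full recipe is meant to exercise. The sketch below is an illustration only: `save_report` and `full_disk_opener` are hypothetical names, and the fake opener simply raises the same error a program would see on a mount where ENOSPC is being injected.

```python
import errno


def save_report(path, data, opener=open):
    """Write data to path and return a user-facing status message."""
    try:
        with opener(path, "w") as f:
            f.write(data)
    except (IOError, OSError) as e:
        if e.errno == errno.ENOSPC:
            # Show a clear message instead of crashing with a traceback.
            return "Insufficient space to save %s" % path
        raise  # Any other error is not handled here.
    return "Saved %s" % path


def full_disk_opener(path, mode):
    """Hypothetical stand-in that fails the way a full disk would."""
    raise OSError(errno.ENOSPC, "No space left on device")


print(save_report("/tmp/report.txt", "hello", opener=full_disk_opener))
```

In a real integration test, the opener argument would be dropped and the path would point inside the CharybdeFS mount point while the fault is active.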

IO error

This will return EIO on all filesystem operations:

import errno
client.set_all_fault(False, errno.EIO, 0, "", False, 0, False)

Quota exceeded

Same idea for a quota exceeded error.

import errno
client.set_all_fault(False, errno.EDQUOT, 0, "", False, 0, False)

All the available errno codes on your operating system can be used. Read the errno documentation and get imaginative.
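To see which codes are available, Python's standard errno module can be inspected directly. This short sketch prints the three codes used above along with the system's description of each, then counts the names that errno.errorcode knows about on the current platform.

```python
import errno
import os

# Look up the numeric value and the system message for each recipe's code.
for name in ("ENOSPC", "EIO", "EDQUOT"):
    code = getattr(errno, name)
    print("%s = %d: %s" % (name, code, os.strerror(code)))

# errno.errorcode maps numeric codes back to their symbolic names.
all_names = sorted(errno.errorcode.values())
print("%d errno names available on this platform" % len(all_names))
```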

Very slow writes

Now, let’s delay each filesystem operation by 50 ms. The delay argument is given in microseconds, so 50 ms is 50000.

client.set_all_fault(False, 0, 0, "", False, 50000, False)

Returning random errors

To return a random error, set the random error flag (the first argument) to True.

client.set_all_fault(True, 0, 0, "", False, 0, False)

Restricting errors to specific syscalls

Let’s say we want to return random errors on reads and writes only. For this, let’s use the alternate set_fault method, which takes a list of operation names as its first argument.

client.set_fault(['read', 'read_buf', 'write', 'write_buf'], True, 0, 0, "", False, 0, False)

Fiddling with the error probability

Now we want to trigger the same behavior in 1% of the cases. (The probability argument is the probability over 100,000.)

client.set_fault(['read', 'read_buf', 'write', 'write_buf'], True, 0, 1000, "", False, 0, False)
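Since the probability is expressed over 100,000, a conversion helper avoids off-by-a-factor mistakes. This is a small sketch; `percent_to_probability` and `PROBABILITY_SCALE` are hypothetical names, and the scale comes from the description above.

```python
PROBABILITY_SCALE = 100000  # probabilities are expressed over 100,000


def percent_to_probability(percent):
    """Convert a percentage (0 to 100) to a value over 100,000."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return int(round(percent * PROBABILITY_SCALE / 100.0))


print(percent_to_probability(1))   # 1% is the value used in the recipe above
print(percent_to_probability(10))
```

The result can be passed directly as the probability argument of set_fault or set_all_fault.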

Matching a file pattern

Let’s say we want to restrict this behavior to a file named sendmail.cf.

client.set_fault(['read', 'read_buf', 'write', 'write_buf'], True, 0, 0, ".*sendmail.cf", False, 0, False)
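The pattern is a regular expression, so an unescaped dot matches any character. The sketch below uses Python's re module purely to illustrate how loose the pattern is; the regex engine CharybdeFS uses and its exact matching rules are not described in this article, so treat the stricter pattern as an assumption to verify against your setup.

```python
import re

loose = re.compile(r".*sendmail.cf")        # pattern from the recipe
strict = re.compile(r".*/sendmail\.cf$")    # escaped dot, anchored at the end

paths = [
    "/etc/mail/sendmail.cf",
    "/etc/mail/sendmail_cf.bak",
    "/etc/mail/sendmail.cf.orig",
]

for path in paths:
    # re.match anchors at the start of the string only.
    print("%-28s loose=%-5s strict=%s" % (
        path, bool(loose.match(path)), bool(strict.match(path))))
```

In this illustration the loose pattern accepts all three paths, while the stricter one accepts only the first.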

Mix and match: the agonising drive simulator

Let’s make a filesystem that slows to a crawl and returns I/O errors on 10% of system calls.

client.set_all_fault(False, errno.EIO, 10000, "", False, 100000, False)
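The magic numbers in this recipe are easier to review when they are derived from human-friendly units. The following sketch builds the argument tuple from a percentage and a delay in milliseconds; `degraded_drive_args` is a hypothetical helper, and the units (probability over 100,000, delay in microseconds) are taken from the earlier recipes.

```python
import errno


def degraded_drive_args(error_percent, delay_ms, err_no=errno.EIO):
    """Build the positional arguments for set_all_fault."""
    probability = int(round(error_percent * 1000))  # percent -> per 100,000
    delay_us = int(round(delay_ms * 1000))          # milliseconds -> microseconds
    # Flags in positions 1, 5 and 7 stay False, as in the recipe above.
    return (False, err_no, probability, "", False, delay_us, False)


args = degraded_drive_args(error_percent=10, delay_ms=100)

# The derived arguments equal the literal values used in the recipe.
print(args == (False, errno.EIO, 10000, "", False, 100000, False))
```

With a connected client, the call would be client.set_all_fault(*args).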

Clearing faults: stop injecting faults in future system calls

You can clear a fault for a single system call with clear_fault. For example, the following clears the fault only for the fsync syscall:

client.clear_fault("fsync")

Or if you are having a good day and want to clear all errors:

client.clear_all_fault()
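In a test suite it is easy to forget to clear faults when a test fails midway. One way to guard against that is a context manager that always clears on exit. This is a sketch under assumptions: `injected_fault` and `RecordingClient` are hypothetical names, and the stand-in client only records calls so the example runs without a CharybdeFS server.

```python
import errno
from contextlib import contextmanager


@contextmanager
def injected_fault(client, *fault_args):
    """Set a fault on all operations and always clear it afterwards."""
    client.set_all_fault(*fault_args)
    try:
        yield client
    finally:
        # Runs even if the body of the with block raises.
        client.clear_all_fault()


class RecordingClient(object):
    """Stand-in for the Thrift client; records method names only."""

    def __init__(self):
        self.log = []

    def set_all_fault(self, *args):
        self.log.append("set_all_fault")

    def clear_all_fault(self):
        self.log.append("clear_all_fault")


fake = RecordingClient()
try:
    with injected_fault(fake, False, errno.EIO, 0, "", False, 0, False):
        raise RuntimeError("simulated test failure")
except RuntimeError:
    pass

print(fake.log)
```

Even though the body raised, the log shows the fault being cleared, so the next test starts from a clean filesystem.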

All the code in this cookbook has been collected into a simple Python script that can be found in the cookbook subdirectory of the project.

Tests for ScyllaDB

CharybdeFS is used for testing the ScyllaDB NoSQL database. Scripts for the ScyllaDB tests are in the /tests directory of the CharybdeFS project. If you need a fast, resilient database that’s compatible with Apache Cassandra, you can be sure that it will do the right thing if an SSD fills up or returns an error.

Not just Python

Using CharybdeFS for a project in another language, or a different test framework? Please let us know and we’ll link to your example. Pull requests are welcome.

Open Source

CharybdeFS is open source and available on GitHub. For an introduction, see the post “CharybdeFS: a new fault-injecting filesystem for software testing”.


Photo: John for Wikimedia Commons. Available under the Creative Commons CC BY 2.0 license.
