TL;DR A software developer should cover each fault case with an appropriate error handler, even if the case seems impossible. Because even impossible cases happen sometimes. Because shit happens.

Let’s see an example.

I am developing an authentication system based on JSON Web Tokens. It works in the following steps.

  1. The client application sends device information to the backend.
  2. The backend saves the information into the database and issues a device token.
  3. The client sends the token along with user credentials on login.
  4. The backend decodes the token and binds the device to the user.
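
The article doesn’t show the backend code, so here is only a rough sketch of steps 2 and 4, assuming PyJWT; the secret handling and the function names are illustrative, not the real implementation:

import jwt

SECRET = 'server-side-secret'  # assumption: a key the client never sees


def issue_device_token(device_id):
    # Step 2: called only after the device row is written to the database,
    # so the token carries a device ID generated by the database
    return jwt.encode({'device_id': device_id}, SECRET, algorithm='HS256')


def decode_device_token(token):
    # Step 4: raises jwt.InvalidTokenError if the signature doesn't match
    return jwt.decode(token, SECRET, algorithms=['HS256'])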

There is an impossible scenario: the backend successfully decodes a device token, but cannot find the device information in the database. This scenario is impossible, because the token is issued only after a successful database write; moreover, it contains a device ID generated by the database. And the client cannot forge the token (theoretically), because it doesn’t have the cryptographic key. So the case is impossible, and I don’t have to cover it with a special error handler. Correct? No.

Let’s see what could happen here.

  1. The backend saves a device and issues a token.
  2. The database gets corrupted.
  3. The administrator rolls back the database to a previous snapshot that doesn’t contain the device information.
  4. The client owns a valid device token, but the device information doesn’t exist in the database. The impossible case has happened!

If I don’t cover the case with an error handler, the backend will return a vague “500 Internal Server Error.” But it isn’t a server error, it’s a client error, because the client sends an invalid token. And the backend must say so with an appropriate error code, so that the client can throw away the invalid token and re-register the device instead of showing a useless error message.
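
In code, the fix is small. Here is a hedged sketch of how the lookup could be handled; the InvalidDeviceToken exception and the find_device helper are my own names, not the actual backend API. The point is to turn the missing row into a specific client-facing error instead of letting it bubble up as a 500:

class InvalidDeviceToken(Exception):
    """Should be mapped to a 4xx response with a machine-readable code."""


def bind_device(db, payload, user):
    # The token decoded fine, but the "impossible" case may still happen
    device = db.find_device(payload['device_id'])  # hypothetical DB helper
    if device is None:
        # Valid signature, but no device row: tell the client to re-register
        raise InvalidDeviceToken('device_not_found')
    device.bind(user)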

Therefore, adding error handlers for impossible fault cases makes the system more resilient.

You could say: “Well, that’s all correct, but it happens so rarely. Why should we care? These efforts will never pay off.” And you would be wrong. It happens much more often than you expect. People are optimists. We suck at estimating risks. Every time we think about something bad, we think it won’t happen, at least not to us. A lot of people die every day of lung cancer and atherosclerosis, yet a lot of people keep smoking and eating fast food. They are optimists. Every day we stumble over poorly developed software, and we keep building fragile systems, because we’re optimists too. We all think that shit won’t happen to us, it could only happen to someone else. But it isn’t true. The truth is that shit will definitely happen to us.

Here is my top three.

  1. Firefox updated itself and switched off incompatible extensions. LastPass was one of them. It happened right when I had to pay a bill, so I wasn’t able to log in to my online bank because of a browser update! Needless to say, I don’t use Firefox anymore.
  2. Trello lost connection to its server and silently dropped the changes I had made on a board. Communication within my team was broken.
  3. Twitter’s CDN went down, and my browser wasn’t able to load JavaScript. So I wasn’t able to write a tweet for two days. Nobody got hurt, but it wasn’t good anyway.

The devil is in the details. Your program could work well under ideal conditions, but remember that there are no ideal conditions. So next time you develop software, please switch your brain into paranoid mode. It will make your system more robust and the world a bit better.

P.S. Hey, look mom, I’ve invented a cool buzzword!

I have recently been asked to interview Python programmers for our team. I gave them a task: implement a dictionary-like structure Tree with the following features:

>>> t = Tree()
>>> t['a.x'] = 1
>>> t['a.y'] = 2
>>> t['b']['x'] = 3
>>> t['b']['y'] = 4
>>> t == {'a.x': 1, 'a.y': 2, 'b.x': 3, 'b.y': 4}
True
>>> t['a'] == {'x': 1, 'y': 2}
True
>>> list(t.keys())
['a.x', 'a.y', 'b.x', 'b.y']
>>> list(t['a'].keys())
['x', 'y']

“It’s quite a simple task,” you may think at first glance. But it isn’t; in fact, it’s tricky as hell. Any implementation has its own trade-offs, and you can never claim that one implementation is better than another: it depends on the context. There are also a lot of corner cases that have to be covered with tests. So I expected to discuss these tricks and trade-offs during the interview. I think it is the best way to learn about an interviewee’s problem-solving skills.

However, there is one line of code that gives away a bad solution.

class Tree(dict):

Inheriting from the built-in dict type. Let’s see why you shouldn’t do that and what you should do instead.

The Python dictionary interface has a number of methods that seem to use one another. For example, the reading methods:

>>> d = {'x': 1}
>>> d['x']
1
>>> d.get('x')
1
>>> d['y']          # ``__getitem__`` raises KeyError for undefined keys
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'y'
>>> d.get('y')      # whereas ``get`` returns None
>>> d.get('y', 2)   # or default value passed as second argument
2

So you could expect that the dict.get() method is implemented like this:

def get(self, key, default=None):
    try:
        return self[key]
    except KeyError:
        return default

And you could also expect that by overriding dict.__getitem__() you override the behavior of dict.get() too. But it doesn’t work this way:

>>> class GhostDict(dict):
...     def __getitem__(self, key):
...         if key == 'ghost':
...             return 'Boo!'
...         return super().__getitem__(key)
...
>>> d = GhostDict()
>>> d['ghost']
'Boo!'
>>> d.get('ghost')  # returns None
>>>

This happens because the built-in dict is implemented in C and its methods are independent of one another. It is done for performance, I guess.

So what you really need is the Mapping (read-only) or MutableMapping abstract base class from the collections.abc module. These classes provide the full dictionary interface based on a handful of abstract methods you have to override, and they work as expected.

>>> from collections.abc import Mapping
>>> class GhostDict(Mapping):
...     def __init__(self, *args, **kw):
...         self._storage = dict(*args, **kw)
...     def __getitem__(self, key):
...         if key == 'ghost':
...             return 'Boo!'
...         return self._storage[key]
...     def __iter__(self):
...         return iter(self._storage)    # ``ghost`` is invisible
...     def __len__(self):
...         return len(self._storage)
...
>>> d = GhostDict(x=1, y=2)
>>> d['ghost']
'Boo!'
>>> d.get('ghost')
'Boo!'
>>> d['x']
1
>>> list(d.keys())
['x', 'y']
>>> list(d.values())
[1, 2]
>>> len(d)
2

Type checking also works as expected:

>>> isinstance(GhostDict(), Mapping)
True
>>> isinstance(dict(), Mapping)
True
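
To tie this back to the interview task, here is a minimal sketch of one possible Tree built on MutableMapping, using flat storage with dot-separated keys. It is my own illustration, not the ConfigTree implementation, and it deliberately trades some corner cases for brevity (for example, reading a missing leaf silently yields an empty branch instead of raising KeyError):

from collections.abc import MutableMapping


class Tree(MutableMapping):
    """Flat storage of dot-separated keys; subtrees are lazy views."""

    def __init__(self, storage=None, prefix=''):
        self._storage = {} if storage is None else storage
        self._prefix = prefix

    def __setitem__(self, key, value):
        self._storage[self._prefix + key] = value

    def __getitem__(self, key):
        full_key = self._prefix + key
        if full_key in self._storage:
            return self._storage[full_key]
        # No exact match: return a view over the subtree, so that
        # ``t['b']['x'] = 3`` writes ``b.x`` into the shared storage
        return Tree(self._storage, full_key + '.')

    def __delitem__(self, key):
        del self._storage[self._prefix + key]

    def __iter__(self):
        prefix_len = len(self._prefix)
        for key in self._storage:
            if key.startswith(self._prefix):
                yield key[prefix_len:]

    def __len__(self):
        return sum(1 for _ in self)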

P.S. You can see my own implementation of the task in the sources of the ConfigTree package. As I said above, it isn’t perfect, it’s just good enough for the context it is used in. And its tests... well, I have no idea what happens in there now. I just don’t touch them.

The most miserable being in the world is a lost dog. Number two is a programmer who inherited legacy. Not the legacy of a rich, childless dead uncle you never knew. No, I mean the legacy code of the guy who worked before you. You are smart, you use agile, test-driven development, continuous integration, and other cool things... But it doesn’t matter anymore, because that guy preferred to apply hot fixes in production using Vim and SSH. The repository holds outdated, broken code. The live code crashes randomly, and nothing useful can be found in the logs. And you have to deal with it. My condolences, you got legacy.

And the most frustrating thing is that you cannot start from scratch, because the product is alive, too much time and money have been invested; you know, you are smart, fix it, please. And then you get paralyzed. You have to do something, but you cannot force yourself to start coding. You grab a cup of coffee, check your inbox, check for updates on Reddit, Facebook, Twitter, then check your inbox again, then another cup of coffee, then lunchtime, then updates again... and it never ends. But what the heck? You can write code all day long, and all night long. You love it. What is happening here? Why can’t you just start?

The answer is chaos. You don’t know what to do. And you must have a plan.

  1. Explain to your family that your bad mood is not because of them. It’s really important.
  2. Explain to your customer that you are going to fix things, but some working things might accidentally get broken. It’s really important too. The paralysis includes the fear of breaking something, caused by a lack of understanding of how things work together.
  3. Create a new repository and put the code from production into it. You really don’t need to find out which hot fixes were applied to the live code and why the code in the current repository is outdated and broken. Just throw it away.
  4. Set up a working development environment. It will give you some insight into how the code works and will help eliminate the fear.
  5. Make a test plan. Someday you will write automated tests, but for now a simple checklist is enough. It will help you control the process. More control, less fear.
  6. Set up a staging environment and continuous integration. No comments, you just must have them.

By the time you have made these six steps, you will understand what you have to do and how to do it. The paralysis will go away, and the confidence will come back. Go ahead, and may the force be with you.

This story should have been published on September 13th, the Day of the Programmer. But I always forget about this holiday. So I forgot again this year.

It happened about 15 years ago. I was a student and worked on a game with my friends. The game was a multiplayer sci-fi turn-based strategy. There were no graphics, just text mode. And its gameplay was endless. The game ran in Turbo Pascal on a PC with an Intel 80386 processor. Golden times!

Once we had to implement a function that generates names for new weapons. Because of the endless gameplay, the upgrade process was endless too, so the function had to generate a new name on each call. The idea was to join a couple of random prefixes, a random string, and a random suffix. The prefixes and the suffix were selected from predefined lists. The prefix list looked like this: hyper, mega, plasma, etc. The suffix list was like this: gun, cannon, blaster, rifle, etc. The middle string was a random combination of consonant syllables.
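
For fun, here is roughly what such a function looks like in Python; the original was written in Turbo Pascal, and the word lists below are just examples:

import random

PREFIXES = ['hyper', 'mega', 'plasma', 'super', 'ultra']
SUFFIXES = ['gun', 'cannon', 'blaster', 'rifle', 'launcher']
SYLLABLES = ['ka', 'zor', 'dak', 'mo', 'tron', 'vex']


def weapon_name():
    # A couple of random prefixes, a random middle string, and a suffix
    first = random.choice(PREFIXES).capitalize()
    second = random.choice(PREFIXES).capitalize()
    middle = ''.join(random.choice(SYLLABLES)
                     for _ in range(random.randint(1, 3)))
    suffix = random.choice(SUFFIXES).capitalize()
    return '{} {}{} {}'.format(first, second, middle, suffix)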

So the function was done, and we ran a test. And within the first dozen names, it printed out the name of probably the most powerful weapon in the whole known universe: Super Megadick Launcher.

The test was passed. The game, unfortunately, was never finished.

If you develop applications that accept untrusted input, you deal with validation, no matter which framework or library you use. It is a common task. So I am going to share a recipe that neatly integrates the validation layer with the business logic layer. It is not about what data to validate and how to validate it; it is mostly about how to make the code look better using Python decorators and magic methods.

Let’s say we have a class User with a method login.

class User(object):

    def login(self, username, password):
        ...

And we have a validation schema Credentials. I use Colander, but it does not matter; you can simply replace it with your favorite library:

import colander


class Credentials(colander.MappingSchema):

    username = colander.SchemaNode(
        colander.String(),
        validator=colander.Regex(r'^[a-z0-9\_\-\.]{1,20}$')
    )
    password = colander.SchemaNode(
        colander.String(),
        validator=colander.Length(min=1, max=100),
    )

Each time you call login with untrusted data, you have to validate the data using the Credentials schema:

user = User()
schema = Credentials()

trusted_data = schema.deserialize(untrusted_data)
user.login(**trusted_data)

The extra code is a trade-off for flexibility. Such methods can also be called with already trusted data, so we can’t just put validation into the method itself. However, we can bind the schema to the method without losing flexibility.

First, create a validation package with the following structure:

myproject/
    __init__.py
    ...
    validation/
        __init__.py
        schema.py

Then add the following code to myproject/validation/__init__.py (again, the use of cached_property is an inessential detail; you can use a similar decorator provided by your favorite framework):

from cached_property import cached_property

from . import schema


def set_schema(schema_class):
    def decorator(method):
        method.__schema_class__ = schema_class
        return method
    return decorator


class Mixin(object):

    class Proxy(object):

        def __init__(self, context):
            self.context = context

        def __getattr__(self, name):
            method = getattr(self.context, name)
            schema = method.__schema_class__()

            def validated_method(params):
                params = schema.deserialize(params)
                return method(**params)

            validated_method.__name__ = 'validated_' + name
            setattr(self, name, validated_method)
            return validated_method

    @cached_property
    def validated(self):
        return self.Proxy(self)

There are three public objects: the schema module, the set_schema decorator, and the Mixin class. The schema module is a container for all validation schemata; place the Credentials class into this module. The set_schema decorator simply attaches the passed validation schema to the decorated method as its __schema_class__ attribute. The Mixin class adds the proxy object validated, which provides access to the methods that have a __schema_class__ attribute and lazily creates copies of them wrapped in the validation routine. This is how it works:

from myproject import validation


class User(validation.Mixin):

    @validation.set_schema(validation.schema.Credentials)
    def login(self, username, password):
        ...

Now we can call the validated login method with a single line of code:

user = User()

user.validated.login(untrusted_data)

So here is what we get: the code is more compact; it is still flexible, i.e. we can call the method without validation; and it is more readable and self-documenting.

I hate writing documentation, but I have to. Good, up-to-date documentation significantly reduces the effort of onboarding new teammates. And of course, nobody would use even perfect open source code without documentation. So I have written plenty of documents. Most of them are miserable. But I kept trying to find a way to make them better, and it seems I have found the general mistake I was making.

Typical documentation consists of three parts:

  • A getting started guide describes the main features and principles.
  • An advanced usage guide describes each feature in detail.
  • Internals or API documentation describes low-level things, i.e. particular modules, classes, and functions. It is usually generated from the docstrings in the sources.

I used to write documentation in that order: getting started tutorial, advanced section, and finally internals. Don’t do that. If you want to write good documentation, you have to write it in the opposite order.

This is how it works. The most important thing in any documentation is cross-linking. When you describe a feature that consists of a number of smaller ones, you have to link each mention of a smaller feature to its full description. That is why the internal documentation generated from docstrings is your foundation. It is quite easy to document a particular function or class (lazy developers assume that is enough). Then, when you describe how the things work together, you can link each mention of a particular thing to its own documentation instead of overburdening the whole description with details. The same goes for the getting started tutorial: it must be concise, but it must link to the full description of each feature it mentions.

There is no magic. This technique just makes the documentation writing process more productive and fun. Use it to make your documents better and your users happier.

PasteDeploy is a great tool for managing WSGI applications. Unfortunately, it supports no configuration formats other than INI files. Montague is going to solve the problem, but its documentation is unfinished and says nothing useful. I hope that will change soon. But if you don’t want to wait, as I don’t, the following recipe is for you.

Using ConfigTree on my current project, I stumbled upon a problem: how do I serve Pyramid applications (I have three of them) from the custom configuration? Here is how it looks in YAML:

app:
    use: "egg:MyApp#main"
    # Application local settings go here
filters:
    -
        use: "egg:MyFilter#filter1"
        # Filter local settings go here
    -
        use: "egg:MyFilter#filter2"
server:
    use: "egg:MyServer#main"
    # Server local settings go here

The easy way is to build an INI file and use it. The hard way is to write my own loader. I chose the hard one.

PasteDeploy provides the public functions loadapp, loadfilter, and loadserver. However, these functions don’t work here, because they don’t accept local settings. Only the global configuration can be passed in.

app = loadapp('egg:MyApp#main', global_conf=config)

But most PasteDeploy-based applications simply ignore global_conf. For example, here is the paste factory of Waitress:

def serve_paste(app, global_conf, **kw):
    serve(app, **kw)        # global_conf? Who needs this shit?
    return 0

I dug around the sources of PasteDeploy and found the loadcontext function. It is a kind of low-level private function. But who cares? So here is the source of a loader that uses it.

from paste.deploy.loadwsgi import loadcontext, APP, FILTER, SERVER


def run(config):

    def load_object(object_type, conf):
        conf = conf.copy()
        spec = conf.pop('use')
        context = loadcontext(object_type, spec)    # Loading object
        context.local_conf = conf                   # Passing local settings
        return context.create()

    app = load_object(APP, config['app'])
    if 'filters' in config:
        for filter_conf in config['filters']:
            filter_app = load_object(FILTER, filter_conf)
            app = filter_app(app)
    server = load_object(SERVER, config['server'])
    server(app)

But that is not the end. Pyramid comes with its own command, pserve, which uses PasteDeploy to load and start up an application from an INI file. And there is an option of that command that makes development fun, I mean the --reload one. It starts a separate process with a file monitor that restarts your application when its sources change. The following code provides this feature. It depends on Pyramid, because I don’t want to reinvent the wheel. But if you use another framework, it won’t be hard to write your own file monitor.

import sys
import os
import signal
from subprocess import Popen

from paste.deploy.loadwsgi import loadcontext, APP, FILTER, SERVER
from pyramid.scripts.pserve import install_reloader, kill


def run(config, with_reloader=False):

    def load_object(object_type, conf):
        conf = conf.copy()
        spec = conf.pop('use')
        context = loadcontext(object_type, spec)
        context.local_conf = conf
        return context.create()

    def run_server():
        app = load_object(APP, config['app'])
        if 'filters' in config:
            for filter_conf in config['filters']:
                filter_app = load_object(FILTER, filter_conf)
                app = filter_app(app)
        server = load_object(SERVER, config['server'])
        server(app)

    if not with_reloader:
        run_server()
    elif os.environ.get('master_process_is_running'):
        # Pass your configuration files here using ``extra_files`` argument
        install_reloader(extra_files=None)
        run_server()
    else:
        print("Starting subprocess with file monitor")
        environ = os.environ.copy()
        environ['master_process_is_running'] = 'true'
        childproc = None
        try:
            while True:
                try:
                    childproc = Popen(sys.argv, env=environ)
                    exitcode = childproc.wait()
                    childproc = None
                    if exitcode != 3:
                        return exitcode
                finally:
                    if childproc is not None:
                        try:
                            kill(childproc.pid, signal.SIGTERM)
                        except (OSError, IOError):
                            pass
        except KeyboardInterrupt:
            pass

That’s it. Wrap the code in a console script and don’t forget to initialize the logging.
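
For completeness, a possible wrapper could look like the sketch below. The file layout, the CLI options, and the YAML format are my assumptions, and run() refers to the function defined above:

import argparse
import logging

import yaml  # assumption: the configuration is kept in YAML


def main():
    parser = argparse.ArgumentParser(description='Serve the application')
    parser.add_argument('config', help='path to the configuration file')
    parser.add_argument('--reload', action='store_true',
                        help='restart the server when sources change')
    args = parser.parse_args()

    logging.basicConfig(level=logging.INFO)   # don't forget the logging
    with open(args.config) as f:
        config = yaml.safe_load(f)
    run(config, with_reloader=args.reload)    # ``run`` is defined above


if __name__ == '__main__':
    main()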

I have just released ConfigTree. It is my longest-running project: it took more than two and a half years from the first commit to the release. But the history of the project is much longer.

The idea came from the “My Health Experience” project. It was a great project I worked on; unfortunately, it is closed now. My team started with a small forum and ended up with a full-featured social network. We had a single server at the start and a couple of clusters at the end. A handful of configuration files grew into a directory with dozens of them, which described all subsystems in all possible environments. Each module of the project had dozens of calls to the configuration registry. So we developed a special tool to manage the settings.

This is how it worked. An environment name was a dot-separated string in the format group.subgroup.environment. For instance, prod.cluster-1.server-1 was the environment name of the first server of the first cluster of the production environment, and dev.kr41 was the name of my development environment. The configuration directory contained a tree of subdirectories, where each subdirectory was named after a part of some environment name. For example:

config/
    prod/
        cluster-1/
            server-1/
    dev/
        kr41/

The most common configuration options were defined at the root of the tree, and the most specific ones at the leaves. For example, the config/prod directory contained files with common production settings; config/prod/cluster-1 contained common settings for all servers of the first cluster; and config/prod/cluster-1/server-1 contained concrete settings for the first server. The files were merged by a loader on startup into a single mapping object, using the passed environment name, so that some of the common settings were overridden by the more concrete ones during loading. That way we did not copy and paste anything in our configuration files: if an option was shared by a number of environments, it was defined in the group settings. There was also post-loading validation, which helped us use safe defaults. For instance, when each server had to use its own cryptographic key, the key was defined at the group level with an empty default value that was required to be overridden, so the validator raised an exception on startup if it found this empty value in the resulting configuration. Because of this we never deployed our application to production with unsafe settings.
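
The merging idea itself is simple. Here is a toy illustration, not the actual tool and not the ConfigTree API; the settings.yaml file name is my assumption. It walks the environment path from the root and lets more specific files override the common ones:

import os

import yaml  # assumption: PyYAML for the file format


def load_config(root, env_name):
    """Merge settings found along ``root/group/subgroup/...``."""
    result = {}
    path = root
    for part in [None] + env_name.split('.'):
        if part is not None:
            path = os.path.join(path, part)
        settings = os.path.join(path, 'settings.yaml')
        if os.path.isfile(settings):
            with open(settings) as f:
                # More specific levels override the common ones
                result.update(yaml.safe_load(f) or {})
    return result


# config/prod/cluster-1/server-1 overrides config/prod, which overrides config
config = load_config('config', 'prod.cluster-1.server-1')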

The tool was so useful that when I started to use Python, I tried to find something similar. Yep, “My Health Experience” was written in PHP, and it was the last PHP project I worked on. My search was unsuccessful, and I kept reinventing such a tool on each of my projects. So I eventually decided to rewrite it and release it as an open-source project. And here it is.

I added some flexibility and extensibility to the original ideas. Each step of the configuration loading process can be customized or replaced by your own implementation. It also comes with a command-line utility, which can be used to build the configuration as a single JSON file, so you can even use it in a non-Python project: a JSON parser is all you need. I hope the tool is able to solve a lot of problems and can be useful for different kinds of projects. Try it out and send me your feedback. As for me, I am going to integrate it into my current project right now.

The Requests library is the de facto standard for handling HTTP in Python. Each time I have to write a crawler or a REST API client, I know what to use. I have written a dozen of them during the last couple of years. And each time I stumbled upon one frustrating thing: requests.exceptions.ConnectionError unexpectedly raised with the message error(111, 'Connection refused') after 3–5 hours of client uptime, while the remote service works well and stays available.

I don’t know for sure why it happens. I have a couple of theories, but essentially they are all about the imperfect world we live in. A connection may die or hang. A highly loaded web server may refuse a request. Packets may be lost. Long story short, shit happens. And when it happens, the default Requests settings will not be enough.

So if you are going to build a long-lived process which consumes some services via Requests, you should change the default settings this way:

from requests import Session
from requests.adapters import HTTPAdapter


session = Session()
session.mount('http://', HTTPAdapter(max_retries=5))
session.mount('https://', HTTPAdapter(max_retries=5))

HTTPAdapter performs only one try by default and raises ConnectionError on failure. I started with two tries and found empirically that five gives 100% resistance against short-term downtimes.
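
A side note, not part of the original recipe: if you also want a pause between attempts, max_retries accepts a urllib3 Retry object instead of a plain number, so you can add a backoff:

from requests import Session
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


retry = Retry(total=5, backoff_factor=0.5)   # sleep between attempts
session = Session()
session.mount('http://', HTTPAdapter(max_retries=retry))
session.mount('https://', HTTPAdapter(max_retries=retry))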

I am not sure whether it is a bug or a feature of Requests. But I have never seen these default settings changed in any Requests-based library, such as a Twitter or Facebook API client, and I got such errors using these libraries too. So if you are using such a library, examine its code. Now you know how to fix it. Thanks to Python’s design, there are no truly private members.

Unfortunately, I cannot reproduce this bug (if it is a real bug) under laboratory conditions for now. So I will be grateful if somebody suggests a way to do it.

I released the GreenRocket library in October 2012. It is a dead simple implementation of the Observer design pattern, which I use in almost all of my projects. I thought there was nothing to improve. But my recent project uses the library heavily, and I got tired of writing tests that check signals. This is how they used to look:

from nose import tools
# I use Nose for testing my code

from myproject import MySignal, some_func
# ``MySignal`` inherits ``greenrocket.Signal``
# ``some_func`` must fire ``MySignal`` as its side-effect


def test_some_func():
    log = []                    # Create log for fired signals

    @MySignal.subscribe         # Subscribe a dummy handler
    def handler(signal):
        log.append(signal)      # Put fired signal to the log

    some_func()                 # Call the function to test

    # Inspect the fired signal from the log
    tools.eq_(len(log), 1)
    tools.eq_(log[0].x, 1)
    tools.eq_(log[0].y, 2)
    # ...and so on

There are four lines of utility code, and it is boring. So I added a helper class Watchman to the library to make it test-friendly. This is how it works:

from greenrocket import Watchman

from myproject import MySignal, some_func


def test_some_code_that_fires_signal():
    watchman = Watchman(MySignal)           # Create a watchman for MySignal
    some_func()
    watchman.assert_fired_with(x=1, y=2)    # Test fired signal

Just one line of utility code and one line for the actual test! I have already rewritten all of my tests. So if you are using the library, it’s time to upgrade. If you aren’t, then try it out.