It is usually a bad idea to measure a programmer’s efficiency in numbers. There are a lot of useless metrics, and each of them brings negative (even disruptive) side effects. Once I got a piece of code written by someone who had literally been paid per line of code. It was a module of about 9000 lines of JavaScript spaghetti; the author used neither loops nor arrays. I squeezed the module down to about 150 LOC. I have also heard about a company where programmers get bonuses for introducing the fewest bugs by the end of the month. Their “top” engineers simply do not write code. There are dozens of such “success” stories all over the Internet. If the manager is a dumbass, nothing will help.

Nevertheless, sometimes it makes sense to get some numbers. They can serve as a signal that something went wrong. Maybe your teammate has family problems or burnout. Anyway, if a programmer usually produces X units of work but has done only X/2 in the last measured period, it is time to figure out what happened.

I am not a manager, and I use metrics to estimate my own productivity. I often work entirely without supervision, and my collaboration with customers is based purely on trust: “Do what you want, but get it done.” So I need some metrics for self-control. And I have discovered an optimal one (at least it works for me).

Meet the number of unit tests (NOUT). A unit test usually has constant complexity; that is its key feature. And it is obvious that NOUT grows proportionally with program complexity. So you can measure productivity in NOUT per week, or total progress simply in NOUT. You can even try to estimate features in NOUT; it seems it would be more accurate than story points. And if you start paying your programmers per NOUT, they will just improve test coverage (that is bad advice, don’t do it, they will figure out how to fuck you up).

In my opinion, it is still far from perfect. But used with wisdom, it can give you some advantage. If it isn’t, well... see the first paragraph.

This is not my regular kind of article; it is an appeal to the IT community. If you think the thoughts stated below are worthwhile, please share them. To make this real we need the efforts of the whole community, so we need to make some buzz.

The Problem

A modern web application usually supports a number of authentication methods, but almost all web applications still support authentication with a login-password pair. Developers usually call it “simple” authentication, and that is what this article is about.

For security reasons, a user should not reuse one password across different web services. However, it is almost impossible for a human to remember a separate password for each service. Fortunately, there are applications known as password keepers that help to use different strong passwords for different web applications.

This is how a typical password keeper works. It keeps all of the user’s credentials in a storage encrypted with a master password. The user has to remember only the master password to access the storage, while all the other passwords can be long, randomly generated strings, because the user never enters them manually; the password keeper does it. And here is the problem: authentication and registration web forms are designed for humans, so a password keeper has to use tricky heuristics to recognize such forms. Even though this works well in most scenarios, there is one where password keepers are useless: changing a password.

I keep my passwords in LastPass. After the Heartbleed incident, I decided to change them all. I spent the whole Friday night sipping Scotch and mindlessly clicking:

  1. Go to a web site
  2. Login
  3. Change the password
  4. Logout
  5. Go to the next one...

This should definitely be done in one click by the password keeper!

Possible Solution

I am not going to propose a detailed solution here, that is a subject for a big discussion, but here is a rough one. The main idea is that web applications should provide a unified REST API, so that password keepers can interact with any web application in the same way. Such an API should include endpoints for user registration, authentication (login), deauthentication (logout), changing the password, and a utility endpoint for testing the availability of user names (used during registration to indicate whether a user name is already in use). The description of the API can be provided by a file hosted at the root of the web application (similar to robots.txt, favicon.ico, and sitemap.xml). Let it be simpleauth.json. Here is an example:

{
    "endpoints": {
        "login": "https://example.com/simple-auth/login",
        "logout": "https://example.com/simple-auth/logout",
        "register": "https://example.com/simple-auth/register",
        "testusername": "https://example.com/simple-auth/testusername",
        "changepassword": "https://example.com/simple-auth/changepassword"
    },
    "password": {
        "min": 6,
        "max": 255,
        "chars": ["a-z", "A-Z", "0-9", "!@#$%^&*()-_+=|\/?.,<>"]
    },
    "username": {
        "min": 3,
        "max": 30,
        "chars": ["a-z", "A-Z", "0-9", " _-."]
    }
}

It describes the endpoints and the restrictions on the password and the user name, so a password keeper can automatically interact with the web application using this information.

The goal of the protocol is to improve security. For instance, you could tune your password keeper to change the password after each login, so you would use a new password for every session. Of course, there is a potential problem: the protocol could be used for brute-forcing. But I think that is solvable by limiting the number of unsuccessful requests.
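
To illustrate, here is a rough sketch of how a password keeper could rotate a password in one click using such a file. The payload fields and the use of the Requests library are my assumptions; the real protocol would have to specify them.

import secrets
import string

import requests


def rotate_password(site, username, old_password):
    # Discover the API description at the site root
    spec = requests.get(site + "/simpleauth.json").json()
    endpoints = spec["endpoints"]
    rules = spec["password"]

    # Generate a new random password within the declared length limits
    # (for simplicity only letters and digits are used here)
    length = min(rules["max"], 32)
    alphabet = string.ascii_letters + string.digits
    new_password = "".join(secrets.choice(alphabet) for _ in range(length))

    # Log in, change the password, log out: no web forms involved
    session = requests.Session()
    session.post(endpoints["login"],
                 json={"username": username, "password": old_password})
    session.post(endpoints["changepassword"],
                 json={"old": old_password, "new": new_password})
    session.post(endpoints["logout"])
    return new_password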

In conclusion: if you think this is worthwhile, share it, because without buzz it will remain just an idea. Again, we need the efforts of the whole community to make it real. Proposals and criticism are welcome too.

Traversal is an awesome thing; I believe it is the real killer feature of the Pyramid web framework. However, people usually don’t get it and think it is too complicated. So I am going to convince you of the opposite.

I assume you know that Pyramid supports two URL handling methods, URL Dispatch and Traversal, and that you are familiar with the technical details of how they work (follow the links above if you are not). So here I consider the benefits of Traversal rather than how it actually works.

Pyramid is a super-flexible framework where you can do things the way you want to, and Traversal is no exception. To start working with Traversal, you just need to provide a root_factory callable, which accepts a single argument request and returns the root resource of your web application. The root can be an arbitrary object. However, to feel the full power of Traversal, root_factory should return a resource tree: a hierarchy of objects where each one provides the following features (a minimal sketch of such a resource class follows the list):

  • it knows its name, i.e. has __name__ attribute;

  • it knows its parent, i.e. has __parent__ attribute;

  • it knows its children, i.e. implements __getitem__ method in the following way:

    >>> root = root_factory(request)
    >>> child = root['child_resource']
    >>> child.__name__
    'child_resource'
    >>> child.__parent__ is root
    True
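
Here is a minimal sketch of what such a resource class could look like; Pyramid does not dictate this exact shape, it only looks for __name__, __parent__, and __getitem__ (the Resource base class is reused in the examples below):

class Resource(object):
    """Minimal location-aware resource."""

    def __init__(self, name='', parent=None):
        self.__name__ = name
        self.__parent__ = parent

    def __getitem__(self, name):
        # A leaf resource has no children
        raise KeyError(name)


class Root(Resource):

    def __getitem__(self, name):
        if name == 'child_resource':
            return Resource(name, parent=self)
        raise KeyError(name)


def root_factory(request):
    return Root()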
    

So, to build the URL structure of your web site, you have to build a resource tree, which is a bunch of classes, in fact. And that is exactly what usually confuses people. Is it overengineering? Why so complicated? Indeed, writing a dozen routes takes exactly a dozen lines of code, whereas writing a couple of classes takes many more.

However, the answer is no, it is not overengineering. Traversal uses the resource tree for handling URLs, but the resource tree itself does more than represent the URL structure. It is a perfect additional abstraction level that can encapsulate business logic. In this way, the old holy war about fat models and skinny controllers (views, in Pyramid terms) can be settled.

Resources also provide a unified interface between models and views. On the one hand, you can build your models on top of different data sources: RDBMS, NoSQL, RPC, REST, and other terrifying abbreviations, and resources will make them work together. On the other hand, you can use these resources from different interfaces: web (which Pyramid actually provides), RPC, CLI, even tests, because a test is just another interface to your application. And yes, using Traversal makes testing much easier.

But what about the URL structure? Traversal is hard to start with, because you have to build a resource tree, but these efforts will be rewarded later: maintaining a traversal-based application is a walk in the park. For example, say you have code that implements a blog:

class Blog(Resource):

    def __getitem__(self, name):
        return BlogPost(name, parent=self)


class BlogPost(Resource):
    ...


@view_config(context=Blog)
def index(context, request):
    ...

@view_config(context=Blog, view_name='archive')
def archive(context, request):
    ...

@view_config(context=BlogPost)
def show(context, request):
    ...

Now you can attach the Blog resource to other ones to add blogs to different places of your site. And it can be done with a couple of lines of code:

class User(Resource):

    def __getitem__(self, name):
        if name == 'blog':
            # Now each user has her own blog
            return Blog(name, parent=self)
        elif ...

From this point of view, a resource with its associated views can be considered a reusable component, just like an application in Django. You can also use mixin classes to create plugins:

class Commentable(object):
    """ Implements comment list """

class Likeable(object):
    """ Implements like/unlike buttons behavior """

class BlogPost(Resource, Commentable, Likeable):
    """ Blog post that can be commented and liked/unliked """

You can even pull the trick I described in Obscene Python, i.e. construct your resource classes on the fly using a different set of mixins for each one.
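
A sketch of that trick: the class is assembled at runtime with type(), so the set of mixins can come from configuration or even from the database.

def build_resource_class(name, mixins):
    # Create a resource class on the fly from the given mixins
    return type(name, (Resource,) + tuple(mixins), {})


BlogPost = build_resource_class('BlogPost', [Commentable, Likeable])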

And last but not least, Traversal is the right way to handle URLs, because it works with a hierarchical structure that reflects the URL, whereas URL Dispatch uses a flat list of regular expressions. That is why a task like rendering breadcrumb navigation is trivial for a traversal-based application but hard as hell with URL Dispatch (in fact, it cannot be done without dirty hacks).
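
For instance, breadcrumbs are just a walk up the __parent__ chain; here is a sketch using Pyramid’s lineage() helper:

from pyramid.location import lineage


def breadcrumbs(context, request):
    # Walk from the current resource up to the root, then reverse the order
    crumbs = [
        {'title': resource.__name__ or 'Home',
         'url': request.resource_url(resource)}
        for resource in lineage(context)
    ]
    return list(reversed(crumbs))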

So if you are going to use Traversal, also try TraversalKit. This library distills my own experience with Traversal. I hope it will be useful for you too.

P.S. This article was written in the transfer zone of Moscow Domodedovo airport on my way from PyCon Finland 2014 in Helsinki to Omsk.

I noticed increasing interest in my project Bash Booster last week: a lot of links, tweets, and discussions around the Internet. So I think I have to write this post to clarify what Bash Booster is and what it is not.

First of all, I am definitely not a devop. However, I played this role at the beginning of last summer. This is how it happened. I had finished my project and had to wait while Thumbtack’s sales team was signing the contract for a new one. At that time, another team was going into production and needed help. Since I had nothing to do, I joined the team as a devops engineer. I had to write Chef recipes for AWS OpsWorks to deploy and manage an application cluster. It was the first and last time I used Chef.

Chef is an awesome thing, but IMO it is too complicated. I spent a week just looking into it, mainly, I think, because of the Chef documentation. I guess it was written by marketing-oriented assholes. I understand their goals: you can buy support, and they will make you happy. But if you decide to make it work on your own... well, there is nobody to blame. Another reason is that Chef is written in Ruby (at least some of its components are). The fact itself is not bad. However, in my experience, Rubyists don’t give a shit about backward compatibility. If Ruby is not your primary tool, get ready for a shamanic dance. Most of the stuff does not work on the Ruby shipped with your favorite distribution, because it is too old. It is also quite possible that it will not work on the most recent version either, because that version breaks backward compatibility.

Despite all of this, I got a working cluster after three weeks. And Chef did its job perfectly. It is a good tool for devops, sysadmins, and everyone who works with dozens of servers. However, using it to provision a single Vagrant virtual machine is overkill, like hunting a fly with a sledgehammer.

When that project was done and my new one had been postponed, I decided to go on vacation. But before that happened, I joined yet another team for a couple of days. They asked me to set up a development environment using Vagrant. I wrote the provisioning script in Bash and left for a booze trip to Kazakhstan.

In the middle of the vacation, I reviewed that script. It had been written in Chef recipe style. I had not thought about it while writing, but Chef’s philosophy had clearly influenced me. So I decided to test a crazy idea: to write a Bash version of a subset of Chef.

As I wrote above, I am not a devop, I am a developer. I do not work with dozens of servers; having to would rather be an exception. However, it is my normal routine to set up a single server for development purposes, and it is also usual for me to share my virtual machines with teammates. So, obviously, I use Vagrant intensively, because it is the best tool for such tasks.

Additionally, I do not like overcomplicated tools that force the end user to set up various dependencies, write config files, and so on. Learning time is also critical to me. Ideally, it should be zero hours.

Bash is exactly such a tool. Everybody knows it, and it is installed everywhere. By the way, this fact has recently played Old Harry with Bash; I mean the ShellShock vulnerability. But anyway, I decided to make a provisioning tool using Bash, and this is how Bash Booster was born.

It does not include any remote execution features, because I do not need them. It does not try to cover all existing package managers, because I mainly use Debian-based and RedHat-based distributions, so support for Apt and Yum is enough for me. I also did not try to unify these two, because similar packages have different names in Apt and Yum, so you will have to check which manager you are using anyway. Instead I applied the JavaScript approach: test for a feature instead of a version. That is why there are functions bb-apt? and bb-yum?.

I also like readability. Because of readability there are a lot of short functions of 1–2 LOC like this one:

bb-apt-package?() {
    local PACKAGE=$1
    dpkg -s "$PACKAGE" 2> /dev/null | grep -q '^Status:.\+installed'
}

I could use the $1 variable directly, but the assignment to $PACKAGE shows that this function accepts a single argument: the package name. It is there for self-documentation. As for the black magic spell on the second line... well, I do not want to see it in my final script. A month later I would have to figure out what it means, and I would hardly reproduce it in the next script. So let it live in the library.

The main feature of Bash Booster is delayed events. It helps to avoid unnecessary operations. That is why the sync module does not use rsync (I got a lot of criticism about that): it is not about copying files and directories, it is about triggering events when the source and destination files differ.

In short, I built Bash Booster to solve my own practical tasks. It is definitely not a Chef substitute: you cannot manage a server park with it, and it is not useful for writing universal platform-agnostic scripts. But if you need a little bit more than a plain Bash script, especially an event-driven one, it will be helpful. Moreover, if you are missing something, I will be happy to merge your pull request.

When you develop a library that should work with a number of Python versions, Tox is the obvious choice. However, I have started using it even in application development, where a single Python version is used, because it helps me significantly reduce the effort of writing documentation. How? It transparently manages virtual environments.

For instance, you work on the backend and your colleague works on the frontend. This guy is a CSS ninja but knows nothing about Python. So you have to explain to him or her how to start the application in development mode. You can do this in two ways.

The first one is to write an instruction. It should explain how to set up a virtual environment, activate it, install the application in development mode, and run it. In 9 cases out of 10 it blows a non-Pythonista’s mind. Moreover, writing docs is tedious. Who likes writing docs when you can write a script?!

And that is the second way: you can write a script that creates a virtual environment, activates it, installs the application, and runs it. But this is exactly what Tox does.

Here is how I do it. The following tox.ini file is from the project I am working on now. It is a Pyramid application, which I test against Python 3.3 and 3.4 but develop using Python 3.4.

[tox]
envlist=py33,py34

[testenv]
deps=
    -r{toxinidir}/tests/requires.txt
    flake8
commands=
    nosetests
    flake8 application
    flake8 tests

[testenv:dev]
envdir=devenv
basepython=python3.4
usedevelop=True
deps=
    -r{toxinidir}/tests/requires.txt
    waitress
commands={toxinidir}/scripts/devenv.sh {posargs}

Take notice of the [testenv:dev] section. It launches the devenv.sh script, passing it the command line arguments ({posargs}) that are not processed by Tox itself. Here is the script:

#!/bin/bash

test() {
    nosetests "$@"
}

serve() {
    pserve --reload "configs/development.ini"
}

cmd="$1"
shift

if [[ -n "$cmd" ]]
then
    $cmd "$@"
fi

And here is an example of the manual:

  1. Install Tox.

  2. To run the application, use:

    $ tox -e dev serve
    
  3. To run all tests using the development environment:

    $ tox -e dev test
    
  4. To run a single test using the development environment:

    $ tox -e dev test path/to/test.py
    
  5. To run the complete test suite and code linting:

    $ tox
    

That’s it. Pretty simple. I copy it from project to project, and my teammates are happy. You could even eliminate the first item from the list above by using Vagrant and installing Tox at the provisioning stage. But there is a bug in Distutils which breaks Tox inside Vagrant; use this hack to make it work.

It is always easy and fun to do something if you have the right tools. Writing tests is no exception. Here is my toolbox, everything in one place. I hope the following text will save somebody’s time and Google’s bandwidth.

Here we go.

Flake8

It is a meta tool that checks code using PyFlakes and pep8. The first is a static analyzer and the second is a code style checker. They can be used separately, but I prefer them working as a team. Flake8 helps to find stupid errors such as unused variables or imports, typos in names, undefined variables, and so on. It also helps to keep code consistent with PEP 8, the Style Guide for Python Code, which is critical for code-style nazis like me. The usage is quite simple:

$ flake8 coolproject
coolproject/module.py:97:1: F401 'shutil' imported but unused
coolproject/module.py:625:17: E225 missing whitespace around operator
coolproject/module.py:729:1: F811 redefinition of function 'readlines' from line 723
coolproject/module.py:1028:1: F841 local variable 'errors' is assigned to but never used

Additionally, Flake8 includes a complexity checker, but I never use it. However, I guess it can help to decrease the number of WTFs per minute during code review.

Nose

It is a unit-test framework, an extension of the traditional unittest. I have never used the latter on its own, so I cannot adequately compare it with Nose. However, at a glance, Nose-based tests are more readable and compact. But that is only my subjective opinion.

Another benefit of Nose is its plugins. Some of them I use from time to time, but there are two that I use unconditionally in every project: doctest and cover.

The doctest plugin collects test scenarios from doc-strings in the source code and runs them using the Doctest library. It helps to keep doc-strings consistent with the code they describe. It is also a good place for unit tests of simple functions and classes: if the test cases are not too complex, it is enough to cover the code directly in the doc-string.
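
For example, a doc-string like the following is collected and executed by the plugin as an ordinary test (a trivial made-up function):

def add(x, y):
    """Return the sum of two values.

    >>> add(2, 3)
    5
    >>> add('a', 'b')
    'ab'
    """
    return x + y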

The cover plugin calculates test coverage and generates reports like this one:

Name                      Stmts   Miss  Cover   Missing
-------------------------------------------------------
coolproject                  20      4    80%   33-35, 39
coolproject.module           56      6    89%   17-23
-------------------------------------------------------
TOTAL                        76     10    87%

Such reports help to check the test cases themselves and significantly improve their quality. The cover plugin uses the Coverage tool behind the scenes, so you have to install it manually.

Nose integrates perfectly with Setuptools; by the way, that is another reason to use the latter. I prefer to store Nose settings in the setup.cfg file, which usually looks like this:

[nosetests]
verbosity=2
with-doctest=1
with-coverage=1
cover-package=coolproject
cover-erase=1

It makes Nose usage very simple:

$ nosetests
tests.test1 ... ok
tests.test2 ... ok
tests.test3 ... ok
Doctest: coolproject.function1 ... ok
Doctest: coolproject.module.function2 ... ok

Name                      Stmts   Miss  Cover   Missing
-------------------------------------------------------
coolproject                  20      4    80%   33-35, 39
coolproject.module           56      6    89%   17-23
-------------------------------------------------------
TOTAL                        76     10    87%
Ran 5 tests in 0.021s

OK

Mocks

There is no way to write tests without mocks. In most cases, the Mock library is all you need (a short example follows the list below). However, there are other useful libraries that can be helpful in particular cases.

  • Venusian can help to mock decorated functions by deferring the decorator’s action to a separate step.
  • FreezeGun is a neat mock for date and time. There is nothing in it you could not do with the Mock library, but here it has already been done for you. So just use it.
  • Responses is a mock for the Requests library. If you develop a client for a third-party REST service using Requests, that is what you need.
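
Here is the promised example of the Mock library in action. The coolproject.module.fetch_status function is made up for illustration; assume it returns requests.get(url).status_code.

from unittest import mock  # on Python 2: ``import mock``

from coolproject.module import fetch_status  # hypothetical function


@mock.patch('coolproject.module.requests.get')
def test_fetch_status(fake_get):
    # The test never touches the network: ``requests.get`` is replaced by a mock
    fake_get.return_value.status_code = 200
    assert fetch_status('http://example.com') == 200
    fake_get.assert_called_once_with('http://example.com')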

Additionally, I strongly recommend looking over the excellent article Python Mock Gotchas by Alex Marandon.

Tox

Tox gets things together and runs them against different Python versions. It is like a command center for the whole testing infrastructure: it automatically creates virtual environments for the specified Python versions, installs test dependencies, and runs the tests. And all of this is done with the single command tox.

For example, the tox.ini below sets up testing for Python 3.3, 3.4, and PyPy, using Nose for unit tests and Flake8 for static analysis of the project’s source code as well as the source code of the unit tests.

[tox]
envlist=py33,py34,pypy

[testenv]
deps=
    nose
    coverage
    flake8
commands=
    nosetests
    flake8 coolproject
    flake8 tests

The use of this tool is not limited to testing. But that deserves a separate article, so I will write one soon.

Conclusion

I am pretty sure the list above is not complete, and there are a lot of other awesome testing libraries that make life easier. So post your links in the comments; I will try to keep the article updated.

So, a product owner created a ticket with the text from the headline of this article. You got it into the sprint and started working on it. How will you implement it? We usually think of e-mail notifications as something asynchronous: we can send them at any time, commonly at night when the server is under low load, and the user will read them some time later, not right after they are sent. That was true before we got smartphones. Nowadays most e-mails arrive instantly, with a loud notification sound, and the user will not be pleased to hear it while asleep. Therefore I think it is a bad idea to send e-mails at night. So the next time you implement such a feature, think about it.
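
One way to do it is to hold non-urgent notifications until the morning in the recipient’s time zone. Here is a sketch; the hour limits and the way you store the user’s time zone are up to you (zoneinfo requires Python 3.9+).

from datetime import timedelta
from zoneinfo import ZoneInfo  # Python 3.9+


def next_send_time(now_utc, user_timezone, earliest_hour=9, latest_hour=22):
    """Return when a non-urgent e-mail may be delivered to this user.

    ``now_utc`` must be a timezone-aware UTC datetime.
    """
    local = now_utc.astimezone(ZoneInfo(user_timezone))
    if earliest_hour <= local.hour < latest_hour:
        return now_utc  # daytime: deliver immediately
    # Night time: postpone until the next local morning
    morning = local.replace(hour=earliest_hour, minute=0, second=0, microsecond=0)
    if local.hour >= latest_hour:
        morning += timedelta(days=1)
    return morning.astimezone(ZoneInfo("UTC"))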

P.S. Thanks to LinkedIn for this article. I am very happy to get notifications at 3 AM when someone from my connections is celebrating a work anniversary. That is exactly what I need for a good night’s sleep. And guys... I did enter my location in my profile. It would help to detect my time zone.

Since I started developing in Python, I have used Setuptools in each of my projects. So I was sure this approach was obvious and nobody needed an explanation of the benefits it brings. I think that is because the first thing I learned was the Pylons web framework, where there was no way to develop a project other than using Setuptools. However, I was surprised to learn how many people develop applications without packaging and run into troubles that Setuptools has already solved.

Let’s consider a typical application. It usually consists of a single package that includes a number of modules and subpackages:

MyProject/                  # a project folder
    myproject/              # a root package of the project
        subpackage/
            __init__.py
            module3.py
            module4.py
        __init__.py
        module1.py
        module2.py

If you are going to use Setuptools, you have to add at least one file to the project folder: setup.py, with the following contents:

from setuptools import setup

setup(
    name='MyProject',
    version='0.1',
    description='42',
    long_description='Forty two',
    author='John Doe',
    author_email='jdoe@example.com',
    url='http://example.com',
    license='WTFPL',
    packages=['myproject'],
)

This script adds metadata to the project and tells Python how to install it. For example, the following command installs the project into the Python site-packages:

$ python setup.py install

...and this one installs it in development mode, i.e. creates a link to the code instead of copying it:

$ python setup.py develop

If you have ever developed a library and published it on PyPI, the code above should be familiar to you. So I am not going to discuss why you need Setuptools in the library development process. What I am going to consider is why you need Setuptools in application development. Say you are developing a web site. It should work in your local development environment and in production. You are not going to distribute it via PyPI. So why add the extra deployment steps of packaging and installation? What issues does Setuptools solve?

Mess with import path

Each application has at least one main module. This module is usually either an executable script or contains a special object that will be used by third-party applications. For example, the uWSGI application server requires a module with a callable object named application, which uWSGI will serve. Obviously, this module has to import other ones from the project. Because of this, it usually contains dirty hacks around sys.path. For example, if module1.py from the example above is executable, it might contain the following patch:

#!/usr/bin/python

import os
import sys

# Makes ``myproject`` package discoverable by Python
# adding its parent directory to import path
root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path = [root] + sys.path

from myproject import module2

And a relative import simply doesn’t work:

from . import module2
# You will get
# ValueError: Attempted relative import in non-package

When your application is installed into the Python site-packages using Setuptools, you will never have problems importing modules. Any import, relative or absolute, just works. And no hacks at all.
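
After installation the same module1.py needs no patching; since myproject is importable from site-packages, the absolute form works everywhere, and the relative form works whenever the module is loaded as part of the package (for example via the console script entry point described below):

# myproject/module1.py, installed version: no sys.path hacks
from myproject import module2          # absolute import
from .subpackage import module3        # relative import works too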

Executable scripts

There is one more problem with executable scripts: you have to specify the path when you call them:

$ /path/to/myproject/dosomething.py

...or create a symlink in /usr/bin:

$ sudo ln -s /path/to/myproject/dosomething.py /usr/bin/dosomething
$ dosomething

Setuptools automates this routine. You just need to add a special entry point to the setup() call:

setup(
    # ...
    entry_points="""
    [console_scripts]
    dosomething = myproject.module1:dosomething
    """,
)

This creates a console script dosomething that calls the dosomething() function from the module myproject.module1 each time the script is executed. The feature even works in a virtual environment: as soon as you activate the virtual environment, each console script becomes available in the shell.
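
The target is an ordinary function; something like this would do (the body is made up for illustration):

# myproject/module1.py
import argparse


def dosomething():
    # Called by the ``dosomething`` console script generated by Setuptools
    parser = argparse.ArgumentParser(description='Do something useful')
    parser.add_argument('name')
    args = parser.parse_args()
    print('Doing something with {0}'.format(args.name))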

Entry points

Entry points are not limited to creating console scripts. They are a powerful feature with a lot of use cases. In a nutshell, they help packages communicate with each other. For example, an application can scan installed packages for a special entry point and use them as plugins.

Entry points are usually described using ini-file syntax, where the section name is the entry point group name, the key is the entry point name, and the value is the Python path to the target object, i.e.:

[group_name]
entry_point_name = package.module:object

For instance, an application can discover entry points from the group myproject.plugins to load plugins defined in separate packages:

import pkg_resources

plugins = {}
for entry_point in pkg_resources.iter_entry_points('myproject.plugins'):
    plugins[entry_point.name] = entry_point.load()
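
The plugin package, in turn, only has to declare its entry points in its own setup.py (the package and class names here are made up):

from setuptools import setup

setup(
    name='MyProjectCoolPlugin',
    version='0.1',
    packages=['myproject_coolplugin'],
    entry_points="""
    [myproject.plugins]
    cool = myproject_coolplugin.plugin:CoolPlugin
    """,
)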

Another use case is making your application itself pluggable. For example, the common way to deploy Pyramid applications is via a PasteDeploy-compatible entry point, which points to a WSGI application factory:

[paste.app_factory]
main = myproject.wsgi:get_application
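
On the application side such a factory is just a few lines; here is a sketch of what myproject/wsgi.py could contain:

# myproject/wsgi.py
from pyramid.config import Configurator


def get_application(global_config, **settings):
    # PasteDeploy application factory: builds and returns the WSGI application
    config = Configurator(settings=settings)
    config.scan('myproject')    # pick up @view_config declarations
    return config.make_wsgi_app()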

Requirements and wheels

You can also specify application requirements in the setup() function:

setup(
    # ...
    install_requires=['Pyramid', 'lxml', 'requests'],
)

The third-party packages will be downloaded from PyPI each time you install the application. Additionally, you can use wheels, which speed up the installation dramatically and also let you freeze the versions of third-party packages. Make sure you are using the latest versions of Setuptools, Pip, and Wheel:

$ pip install -U pip setuptools wheel

Then pack your application with its dependencies into a wheelhouse directory using the following script:

#!/bin/bash

APPLICATION="MyProject"
WHEELHOUSE="wheelhouse"
REQUIREMENTS="${APPLICATION}.egg-info/requires.txt"

python setup.py bdist_wheel

mkdir -p "${WHEELHOUSE}"
pip wheel \
    --use-wheel \
    --wheel-dir "${WHEELHOUSE}" \
    --find-links "${WHEELHOUSE}" \
    --requirement "${REQUIREMENTS}"

cp dist/*.whl "${WHEELHOUSE}/"

Now you can copy the wheelhouse directory to any machine and install your application even without an Internet connection:

$ pip install --use-wheel --no-index --find-links=wheelhouse MyProject

Want more?

The features described above are not the only ones available. You can find a lot of other cool things in the official documentation. I hope I have piqued your interest.

My enthusiasm for learning D is contagious. Some of my colleagues ask me from time to time about useful resources. Here is a list of them.

  • APT repository for D: if you are an Ubuntu fan (like me), you don’t need an explanation of what this is. Here you can get the latest stable DMD compiler and some useful libraries and tools.
  • DUB: a build tool with dependency management. Its features are similar to Maven for Java and Pip for Python.
  • Derelict: an awesome collection of bindings to popular C libraries. Useful in game development.
  • Vibe.d: the web framework, nuff said. Frankly, I haven’t spent a lot of time fiddling with it, but it looks promising.
  • Phobos: the standard library. It is not as diverse as Python’s, but it is powerful enough. By the way, if you dream of reinventing the wheel, this is your chance! There is still a lot of work to do.
  • The D Programming Language by Andrei Alexandrescu: the book you must read.

That is all for now. Wish you happy hacking!

What is a coroutine? You can find a complete explanation in David Beazley’s presentation “A Curious Course on Coroutines and Concurrency.” Here is my rough one: it is a generator that consumes values instead of emitting them.

>>> def gen():  # Regular generator
...     yield 1
...     yield 2
...     yield 3
...
>>> g = gen()
>>> g.next()
1
>>> g.next()
2
>>> g.next()
3
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> def cor():  # Coroutine
...     while True:
...         i = yield
...         print '%s consumed' % i
...
>>> c = cor()
>>> c.next()
>>> c.send(1)
1 consumed
>>> c.send(2)
2 consumed
>>> c.send(3)
3 consumed

As you can see, the yield statement can be used in an assignment to consume values from the outer code. The obviously named send method is used to send a value into the coroutine. Additionally, a coroutine has to be “activated” by calling its next method (or __next__ in Python 3.x). Since activating coroutines by hand may be annoying, the following decorator is usually used for this purpose.

>>> def coroutine(f):
...     def wrapper(*args, **kw):
...         c = f(*args, **kw)
...         c.send(None)    # This is the same as calling ``next()``,
...                         # but works in Python 2.x and 3.x
...         return c
...     return wrapper

If you need to shut a coroutine down, use its close method. Calling it raises a GeneratorExit exception inside the coroutine. The same exception is raised when the coroutine is destroyed by the garbage collector.

>>> @coroutine
... def worker():
...     try:
...         while True:
...             i = yield
...             print "Working on %s" % i
...     except GeneratorExit:
...         print "Shutdown"
...
>>> w = worker()
>>> w.send(1)
Working on 1
>>> w.send(2)
Working on 2
>>> w.close()
Shutdown
>>> w.send(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> w = worker()
>>> del w  # BTW, this will not happen on PyPy unless you explicitly call ``gc.collect()``
Shutdown

This exception cannot be “swallowed”: if the coroutine ignores it, you get a RuntimeError. Catching it should be used for freeing resources only.

>>> @coroutine
... def bad_worker():
...     while True:
...         try:
...             i = yield
...             print "Working on %s" % i
...         except GeneratorExit:
...             print "Do not disturb me!"
...
>>> w = bad_worker()
>>> w.send(1)
Working on 1
>>> w.close()
Do not disturb me!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: generator ignored GeneratorExit

That is all you need to know about coroutines to start using them. Let’s see what benefits they give. In my opinion, a single coroutine is useless; the true power of coroutines shows up when they are combined into pipelines. A simple abstract example: take the even numbers from the input source, then multiply each one by 2, then add 1.

>>> @coroutine
... def apply(op, next=None):
...     while True:
...         i = yield
...         i = op(i)
...         if next:
...             next.send(i)
...
>>> @coroutine
... def filter(cond, next=None):
...     while True:
...         i = yield
...         if cond(i) and next:
...             next.send(i)
...
>>> result = []
>>> pipeline = filter(lambda x: not x % 2, \
...            apply(lambda x: x * 2, \
...            apply(lambda x: x + 1, \
...            apply(result.append))))
>>> for i in range(10):
...     pipeline.send(i)
...
>>> result
[1, 5, 9, 13, 17]

[Figure: schema of the pipeline]

But the same pipeline can be implemented using generators:

>>> def apply(op, source):
...     for i in source:
...         yield op(i)
...
>>> def filter(cond, source):
...     for i in source:
...         if cond(i):
...             yield i
...
>>> result = [i for i in \
...     apply(lambda x: x + 1, \
...     apply(lambda x: x * 2, \
...     filter(lambda x: not x % 2, range(10))))]
>>> result
[1, 5, 9, 13, 17]

So what is the difference between coroutines and generators? Generators can only be connected into a straight pipeline, i.e. single input, single output, whereas coroutines may have multiple outputs, so they can be connected into really complicated, forked pipelines. For example, the filter coroutine could be implemented this way:

>>> @coroutine
... def filter(cond, ontrue=None, onfalse=None):
...     while True:
...         i = yield
...         next = ontrue if cond(i) else onfalse
...         if next:
...             next.send(i)
...

But let’s look at another example. Here is a mock of a distributed computing system with a cache, a load balancer, and three workers.

def coroutine(f):
    def wrapper(*arg, **kw):
        c = f(*arg, **kw)
        c.send(None)
        return c
    return wrapper


@coroutine
def logger(prefix="", next=None):
    while True:
        message = yield
        print("{0}: {1}".format(prefix, message))
        if next:
            next.send(message)


@coroutine
def cache_checker(cache, onsuccess=None, onfail=None):
    while True:
        request = yield
        if request in cache and onsuccess:
            onsuccess.send(cache[request])
        elif onfail:
            onfail.send(request)


@coroutine
def load_balancer(*workers):
    while True:
        for worker in workers:
            request = yield
            worker.send(request)


@coroutine
def worker(cache, response, next=None):
    while True:
        request = yield
        cache[request] = response
        if next:
            next.send(response)


cache = {}
response_logger = logger("Response")
cluster = load_balancer(
    logger("Worker 1", worker(cache, 1, response_logger)),
    logger("Worker 2", worker(cache, 2, response_logger)),
    logger("Worker 3", worker(cache, 3, response_logger)),
)
cluster = cache_checker(cache, response_logger, cluster)
cluster = logger("Request", cluster)


if __name__ == "__main__":
    from random import randint


    for i in range(20):
        cluster.send(randint(1, 5))

[Figure: schema of the distributed computing system mock]

To start loving coroutines, try to implement the same system without them. Of course, you can implement classes that store state in attributes and do the work in a send method:

class worker(object):

    def __init__(self, cache, response, next=None):
        self.cache = cache
        self.response = response
        self.next = next

    def send(self, request):
        self.cache[request] = self.response
        if self.next:
            self.next.send(self.response)

But I dare you to find a beautiful implementation of the load balancer that way!

I hope I have persuaded you that coroutines are cool. So if you are going to try them, take a look at my library CoPipes. It is helpful for building really big and complicated data processing pipelines. Your feedback is welcome.