28 December 2013

Documenting code with Sphinx

When writing code that will be used by others is critical to document it properly . This will help the users to get the most of our code and understand the design choices taken.

However, documenting code can be a tedious task. It is difficult to get programmers comment properly their source code so it's almost impossible to get them keep updated their code and a separate document with explanations about using functions and objects. Therefore, there are many tools to generate code from the comments captured in the very source code . Thus programmers only have to update information in one place ( the source) which facilitates keeping it updated .

In Python world , one of the most used tools for these purposes is Sphinx , which generates a documentation similar to the Python documentation format (in fact this one has also been generated with Sphinx) . To generate the documentation , Sphinx inspect the source code directory, detected all the packages , modules, classes and functions placed there, and associates their respective docstrings to each of them.

A docstring is a comment that is placed in the first line of the module, function or class it comments . Is enclosed between two groups of three double quotes ( "" "... " "") and unlike "programmer" comments  (initiated with a hash #), which are simply ignored by the Python interpreter , the docstrings are stored in the attribute __ doc__ of commented element (remember that even Python functions are objects which may have their own attributes). While the developer comments are intended to be read by those who will directly modify the code, the docstring are more geared to those who will make use of the code. So while the focus of developer comments is to explain why the code programmed in a certain way , the docstring is often used to explain how to use each module, function or class. The PEP- 8 provides good practice writing Python code and redirects to PEP- 257 for the specific case of docstrings .

Actually , there is no single way to write docstrings and even following good practice the topic is wide open . Google recommends a format for docstrings in which clarity prevails for direct reading, however if you want to generate documentation automatically with Sphinx you must follow a format that is perhaps not so clear at first glance, but that uses tags identified by Sphinx to build documentation based on docstrings .

Let's begin explaining how to install Sphinx . The source code of the program can be obtained from the repository at Bitbucket. From there you can download the program in a single compressed package. It is too in the official Ubuntu repositories with the name " python- sphinx", so to install it is enough to do a simple:
dante@Camelot:~$ sudo aptitude install python-sphinx

In Windows, it is best to use pip or easy_install (it is included in setuptools) since Sphinx is in the official pypi repository.

Once installed, Sphinx is used with a few commands. To begin, we must generate the work directory for Sphinx. Let's follow an example where our source files are in the folder /pyalgorithm, there I've stores a library called PyAlgorithm that is being developed by a friend, so that the contents of the folder is as follows:
dante@Camelot:~/pyalgorithm$ ls __init__.py  __init__.pyc  __pycache__  checkers  exceptions.py  exceptions.pyc  multiprocessing  stats  time

All items except those with extension are directories with source code.

From there we will launch the wizard for initial configuration for our project Sphinx, sphinx-quickstart:
dante@Camelot:~/pyalgorithm$ sphinx-quickstart Welcome to the Sphinx 1.1.3 quickstart utility. Please enter values for the following settings (just press Enter to accept a default value, if one is given in brackets). Enter the root path for documentation. > Root path for the documentation [.]: docs                                                                                                                                                                                                                   You have two options for placing the build directory for Sphinx output.                                                                                                                                           Either, you use a directory "_build" within the root path, or you separate                                                                                                                                       "source" and "build" directories within the root path.                                                                                                                                                           > Separate source and build directories (y/N) [n]:                                                                                                                                                                                                                                                                                                                                                                                 Inside the root directory, two more directories will be created; "_templates"                                                                                                                                     for custom HTML templates and "_static" for custom stylesheets and other static                                                                                                                                   files. You can enter another prefix (such as ".") to replace the underscore.                                                                                                                                     > Name prefix for templates and static dir [_]:                                                                                                                                                                                                                                                                                                                                                                                     The project name will occur in several places in the built documentation.                                                                                                                                         > Project name: PyAlgorithm                                                                                                                                                                                   > Author name(s): Nobody                                                                                                                                                                                     Sphinx has the notion of a "version" and a "release" for the software. Each version can have multiple releases. For example, for Python the version is something like 2.5 or 3.0, while the release is something like 2.5.1 or 3.0a1.  If you don't need this dual structure, just set both to the same value. > Project version: 1 > Project release [1]: The file name suffix for source files. Commonly, this is either ".txt" or ".rst".  Only files with this suffix are considered documents. > Source file suffix [.rst]: One document is special in that it is considered the top node of the "contents tree", that is, it is the root of the hierarchical structure of the documents. Normally, this is "index", but if your "index" document is a custom template, you can also set this to another filename. > Name of your master document (without suffix) [index]: Sphinx can also add configuration for epub output: > Do you want to use the epub builder (y/N) [n]: Please indicate if you want to use one of the following Sphinx extensions: > autodoc: automatically insert docstrings from modules (y/N) [n]: y > doctest: automatically test code snippets in doctest blocks (y/N) [n]: > intersphinx: link between Sphinx documentation of different projects (y/N) [n]: > todo: write "todo" entries that can be shown or hidden on build (y/N) [n]: > coverage: checks for documentation coverage (y/N) [n]: > pngmath: include math, rendered as PNG images (y/N) [n]: > mathjax: include math, rendered in the browser by MathJax (y/N) [n]: > ifconfig: conditional inclusion of content based on config values (y/N) [n]: > viewcode: include links to the source code of documented Python objects (y/N) [n]: y A Makefile and a Windows command file can be generated for you so that you only have to run e.g. `make html' instead of invoking sphinx-build directly. > Create Makefile? (Y/n) [y]: > Create Windows command file? (Y/n) [y]: Creating file docs/conf.py. Creating file docs/index.rst. Creating file docs/Makefile. Creating file docs/make.bat. Finished: An initial directory structure has been created. You should now populate your master file docs/index.rst and create other documentation source files. Use the Makefile to build the docs, like so:    make builder where "builder" is one of the supported builders, e.g. html, latex or linkcheck. dante@Camelot:~/pyalgorithm$

As you can see the wizard is very simple and most of the time we leave the default options just with a couple of notable exceptions:
  1. If you do not want our documentation files get mixed with source code you must answer the question "Root path for the documentation [.]" With the directory you want to be created, inside /pyalgorithm, to store Sphinx work files and documentation generated.
  2. On the other hand, we must be sure to answer "Yes" to the question "doc: automatically insert docstrings from modules (y / N) [n]:" since autodoc is actualle the extension that Sphinx  uses to assess docstrings and generate documentation.
We see that the wizard created in the docs directory the next files:
dante@Camelot:~/source$ ls __init__.py  __init__.pyc  __pycache__  checkers  docs  exceptions.py  exceptions.pyc  multiprocessing  stats  time

And 4 files inside it :

  • conf.py : It is a python file where the configuration of Sphinx is stored. If you ever wanted to change some of the choices you made with sphinx-quickstart you'd do it here. For example, if we want to change the version numbers we would set variables release and version of that file . Actually, the only thing we need to change in that file is uncomment a line at the top that says " sys.path.insert (0, os.path.abspath () '. ')" . That line serves to Sphinx to find the elements of our source code. We have two options: putting the path to your source code directory on Sphinx (in our example docs/ ) or putting the absolute path. In the first case, in our example the file is located in /pyalgorithm/docs/conf.py so we would put : sys.path.insert (0, os.path.abspath ( ' .. / .. /)) . If we want to use the absolute path we would have to remove the part of abspath () and put the path directly: sys.path.insert (0, " path to conf.py ").
  • Makefile : When generating documentation Sphinx uses make. This Makefile has what makes needs to render documents in multiple formats : html, pdf, latex, epub , etc. . Thus, to generate documentation in html simply do: make html. The documentation generated is stored in the build directory, inside the Sphinx .
  • make.bat : make executable for Windows users.
  • index.rst : Main template of the generated documentation. It is based in reStructuredText format. This format allows you to create templates that serve Sphinx to structure information extracted from the source files .

If at this point you run "make html" you will see that some html files are generated, but if we open the index file we would see it is empty. This is because Sphinx expects to find a file . rst for each of the packages you want to document. We can create them by hand, but it is more convenient to use the sphinx-apidoc command, which examines the source directory and creates a file . rst for each packet it detects. Its use is very simple, in our case it would suffice to do :
dante@Camelot:~/pyalgorithm$ sphinx-apidoc -o docs .
The -o option sets the directory in which the .rst file is to be created, and next parameter is the directory we want sphinx-apidoc start looking. In our example the following files are generated:
dante@Camelot:~/pyalgorithm$ ls *.rst index.rst pyalgorithm.checkers.rst pyalgorithm.rst pyalgorithm.time.rst modules.rst  pyalgorithm.multiprocessing.rst  pyalgorithm.stats.rst
With these files we could do "make html" and see that many html files are generated (actually one per file .rst). However, the file index.html continue showing an empty page. That's because we have to configure it to include links to other pages. In our case, the documentation refers to a library whose root is pyalgorithm, so in index.rst we do include reference to it:

Welcome to PyAlgorithm's documentation!
=======================================
Contents:
.. toctree::
   :maxdepth: 3

   pyalgorithm
When Sphinx finds that reference, it will search pyalgorithm.rst file and will render its content in index.html, doing the same recursively with all the references found pyalgorithm.rst and dependant .rst files. Sphinx will stop rendering when maximum depth (defined by parameter "maxdepth" is reached). Following our example, we have seen the content of index.rst, if you define a maxdepth 3 in index.rst, and the content of pyalgorithm.rst is:

pyalgorithm package
===================
Subpackages
-----------
.. toctree::
    pyalgorithm.checkers
    pyalgorithm.multiprocessing
    pyalgorithm.stats
    pyalgorithm.time
pyalgorithm.exceptions module
-----------------------------
.. automodule:: pyalgorithm.exceptions
    :members:
    :undoc-members:
    :show-inheritance:
 Rendered pages would be:



Whereas if we set it to 4, Sphinx would assess al .rst files listed in pyalgorithm.rst so rendered page would be:


You can set the level you want. Once the maximum level is reached you have to click on each link to see additional sublevels that have included.

The .rst files that refer to packages with modules feature a series of tags Sphinx to which you should pay special attention, let's see an example:

pyalgorithm.checkers package
============================
pyalgorithm.checkers.input_decorators module
--------------------------------------------
.. automodule:: pyalgorithm.checkers.input_decorators
    :members:
    :undoc-members:
    :show-inheritance:

This .rst file generated by sphinx-apidoc refers to a package ("pyalgorithm.checkers ") in which modules have been detected (in this case "input_decorators") . The .rst file explains how to generate the documentation associated with that module. The tag automodule belongs to labels used by autodoc Sphinx extension (along with labels autofunction and autoclass ) and tells Sphinx to document the module taking its docstring usually placed at the beginning of the element they are commenting. The autoclass and autofunction labels do the same but at the class and function respectively. The members tag makes autodoc to include in the documentation the "public" members of the element , ie those attributes whose name is preceded by an underscore ( "_") , while "undoc-members" asks to include also in documentation the elements laking a docstring. If we wanted to show private items (those that start with " _" or " __" ) we should use the label "private- members" . If you only want to show the public members you will find that class constructors are not shown, so to make them appear in the documentation you must include the label "special- members: __ init__ " next to the "members" . As for the "show- inheritance" label what it does is to show the list of base classes from which it inherits,  placing that list under the class declaration . These are labels that includes sphinx-apidoc, but there are many more . We are free to include more labels and formatting options to make our documentation show the information we want in the format that we like. We can also change the text files rst since these are mere templates, what we put in them will be added to automatically generate Sphinx after doing the " make".

Labels can also include Sphinx docstrings for help to distinguish which refers each text element. Let's see how a function would be documented:

 def apply_async(self, func, args):
        """Insert a function and its arguments in process pool.
   
        Input is inserted in queues using a round-robin fashion. Every job is
        identified by and index that is returned by function. Not all parameters
        of original multiprocessing.Pool.apply_aync are implemented so far.
   
        :param func: Function to process.
        :type func: Callable.
        :param args: Arguments for the function to process.
        :type args: Tuple.
        :returns: Assigned job id.
        :rtype: Int.
        """ 
Sphinxs tags used in last example are:
  • param : Identifies an argument in the function declaration.
  • type : Expected argument type.
  • returns: What function returns.
  • rtype: Expected returned value type.
Last docstring would render in this way:


 I do not want to finish without commenting a problem that I've found when using Sphinx on a directory with folders with unittest scripts. In such cases I have had to move the folder out of the scope that examines Sphinx to allow this work smoothly. If not Sphinxs crash at build time with an exception. This is only a rather crude workaround, if someone finds a more elegant solution will be happy to hear it.