17 April 2015

Packaging Python programs - PyPI packages

Once you have finished your program you will probably want to share it. There are a few ways to share a Python program:
  • Zip your program and send it through email: I'd only use this method for very short scripts that only use the built-in libraries of the standard Python distribution.
  • Place it in a web repository like GitHub or Bitbucket: It's a good option if you want to share your program with other developers so that they can contribute code. We reviewed that option in my tutorial about Mercurial.
  • Place it in the Python Package Index (PyPI): It's the obvious choice if you want to share your program with other Python developers, not so that they modify your code but so that they use it as a library in their own Python developments. The good point of this option is that dependencies can be installed automatically.
  • Bundle your program in a native package for Linux distributions: In another article I'm going to explain ways to bundle our Python program into an RPM package (for Fedora/Red Hat/SUSE distributions) or a DEB package (for Ubuntu/Debian distributions).
In this article I'm going to explain the third method: using the Python Package Index (PyPI).

PyPI is the official Python package repository and it's maintained by the Python Software Foundation, so it's the central repository of reference for nearly all publicly available Python packages. PyPI has a web page from which you can download any package manually, but usually you'd better use pip or easy_install to install packages from your console prompt. I usually prefer pip, because it's included by default since Python 3.4 and, besides, it's the tool recommended by the Python Packaging Authority (PyPA), the main source of best practices for Python packaging. Using pip you can download and install any Python package from PyPI, along with any needed dependencies.

You can upload two main types of Python packages to PyPI:
  • Wheel packages: A binary package format intended to replace the older Egg format. It's the recommended format for Python packages uploaded to PyPI.
  • Source distribution packages: A package that has to be built at the installer's end. The wheel format is faster and easier to install because it's prebuilt at the packager's end. A good reason to choose source packages instead of wheels could be to include C extensions, because those should be compiled at the user's end.
Wheel packages are the recommended format for most users, but source distributions are useful to build native packages for popular Linux distros (deb packages for Debian/Ubuntu, rpm packages for Red Hat/Fedora/SUSE, etc.). We are going to see both of them in this article.

The first thing you should do to make sure your application is ready to be packaged is to arrange it in a standard folder tree. You can order the files inside your project any way you want, but if you review some of the more popular Python projects on Bitbucket or GitHub you'll see that they place their files across their folder structure in a similar way. That way is a best practice that Python people have been adopting as time passed. To see an example of that structure you can check the sample project developed by PyPA. Following that best practice, you are supposed to put your installation script and all files describing your project in the project root folder. Your application files (the ones you actually develop) should be inside a folder named after the project. Other files related to your development, but not part of the developed application itself, should be in their own folders at the same level as your application folder, but not inside it.
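
The layout described above can be sketched with a few lines of Python. This is only an illustration: the project name your_project and the tests folder are assumptions made for the example, not names taken from the PyPA sample project.

```python
import tempfile
from pathlib import Path


def create_skeleton(root, project_name):
    """Create an empty project tree following the layout described above."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    # Installation script and descriptive files live at the project root.
    for file_name in ("setup.py", "README.rst", "REQUIREMENTS.txt"):
        (root / file_name).touch()
    # Application code goes inside a folder named after the project;
    # tests (a hypothetical example of "other files") sit beside it.
    for folder in (project_name, "tests"):
        (root / folder).mkdir()
        (root / folder / "__init__.py").touch()
    # Return the created paths, relative to the root, in sorted order.
    return sorted(path.relative_to(root).as_posix() for path in root.rglob("*"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for entry in create_skeleton(tmp, "your_project"):
            print(entry)
```

Running the script prints the files and folders it created inside a temporary directory, which is then removed.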

Let's see another example: check the Geolocate GitHub repository. There you can see that I put compiling, packaging and installation scripts at root level, along with files like README.rst or REQUIREMENTS.txt that describe the application. Development files are inside the application folder (the geolocate folder) instead. Some people prefer to place their unittest files in their own folder outside the application folder and others prefer to put them inside it. If tests are not intended to be distributed to final users, I think it's better to keep them out of the application folder. In the case of Geolocate they are inside to solve some problems with imports, but now I know the causes, so I guess in my next project I will keep my tests apart in their own folder.

Once you have structured your project in a standard folder tree, it is a good idea to create a virtualenv to run your application. If you don't know what a virtualenv is, take a look at this tutorial. That virtualenv lets you define the list of Python packages your application needs as dependencies and export that list to a REQUIREMENTS.txt file, as is explained here. That file will be really useful when you write your setup.py script, as I'm going to explain.
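
As a rough sketch of how that file could later feed setup.py, the helper below turns the text of a requirements file into a plain list of dependencies. The function names are hypothetical, and the parsing is deliberately simple: it assumes one requirement per line and just skips comments and pip options.

```python
def parse_requirements(text):
    """Return the requirement specifiers found in a requirements file's text."""
    requirements = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and pip options such as "-r other.txt".
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        requirements.append(line)
    return requirements


def read_requirements(path="REQUIREMENTS.txt"):
    """Read a requirements file from disk and parse it."""
    with open(path) as requirements_file:
        return parse_requirements(requirements_file.read())
```

The list returned by read_requirements() is in the shape that the install_requires argument of setup() expects.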

The setup.py script is the most important file for creating PyPI-compatible packages. It serves two primary functions:
  • It's the file where various aspects of your project are configured. This script contains a call to a global function: setup(). The arguments you pass to that function define details of your application: namely application author, version, description, license, etc.
  • It's the command line interface for running various commands that relate to packaging tasks.
You can get a listing of the commands available through setup.py by running:
dante@Camelot:~/project-directory$ python setup.py --help-commands

Setup.py depends on the setuptools Python package, so make sure you have it installed.

In this article I'm going to explain how to write a functional setup.py script, using the setup.py file of geolocate as a guideline. As you can see in that example, the file is essentially simple: just import the setup function from the setuptools package and call it. The real customization comes with the parameters we pass to setup(). Let's see those parameters:
  • "name": It is the package name as it is going to be identified in the PyPI repository. You'd better check whether your desired name is already used in PyPI before deciding your final application name. I developed geolocate just to find that the name was already used by another package in PyPI, so I had to name the package glocate although the executable inside it was still named geolocate. It was a dirty solution but next time I'll do better.
  • "version": Package version. Try to keep it updated so that your users can upgrade their installed copy by downloading the new version from PyPI.
  • "description": It's the short description that will be shown in your package page at PyPI. Try to keep it short and descriptive.
  • "long_description": This is the long version of your description. Here you can put more detail. Two or three paragraphs is right.
  • "author": Your name (real or nickname).
  • "author_email": Email where you want to be contacted for things related with this application.
  • "license": Name of the license you have chosen.
  • "url": Website's URL for this application.
  • "download_url": I put here the website's URL where you can find Linux distro dependent versions of this package.
  • "classifiers": Categories to classify this application. It's important to set them because they help users find the application they need when they search the PyPI database. You can find a full listing of available classifiers here.
  • "keywords": List of keywords that describe your project.
  • "install_requires": Here you place the list of dependencies you exported to REQUIREMENTS.txt.
  • "zip_safe": As an optimization, PyPI packages can be installed in a compressed format so they consume less hard disk space. The problem is that some applications don't work well that way, so I prefer to set this to False.
  • "packages": It's required to list the packages to be included in your project. Although they can be listed manually, I prefer setuptools.find_packages() to find them automatically. Its "exclude" keyword is supposed to let you omit packages that are not intended to be released and installed. The problem is that this keyword doesn't actually work because of a bug. We'll see the workaround in this article.
  • "entry_points": Here you define which function, inside your scripts, will be called by the user. I use it to define console_scripts. With console_scripts, setuptools creates a launcher for the called function at install time, so it behaves like a Linux executable. For instance, if you define the main function inside your_script.py you can get a your_script executable with no py extension that can be executed directly.
  • "package_data": By default, setup only includes Python files in your package. If your application contains other file types that are used by your Python packages, you should use this keyword to make sure they are included too. You set package_data to a Python dictionary. Each key is one of your application's packages and its value is a list of relative path names that should be copied into the package. The paths are interpreted as relative to the directory containing the package. Setup.py is not able to create empty folders in which to place files after installation, so the workaround is to create a dummy empty file in that folder and include that file in the installable with this keyword.
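
To see how the keywords above fit together, here is a minimal sketch. Every value in it (package name, author, URL, classifiers, data path) is a placeholder invented for the example, not taken from geolocate. To keep the snippet runnable on its own, the arguments are collected in a dictionary and the actual call to setup() is only shown as a comment.

```python
# Hypothetical values throughout: replace them with your project's details.
SETUP_KWARGS = {
    "name": "your_package",
    "version": "0.1.0",
    "description": "One-line summary shown on the PyPI page.",
    "long_description": "Two or three paragraphs describing the project.",
    "author": "Your Name",
    "author_email": "you@example.com",
    "license": "BSD",
    "url": "https://example.com/your_package",
    "classifiers": [
        "Programming Language :: Python :: 3",
        "Environment :: Console",
    ],
    "keywords": ["example", "packaging"],
    "install_requires": ["requests"],
    "zip_safe": False,
    # In a real script this would usually be setuptools.find_packages().
    "packages": ["your_package"],
    # Format: "<command name> = <module path>:<function>"
    "entry_points": {
        "console_scripts": ["your_script = your_package.your_script:main"],
    },
    # Paths are relative to the directory containing the package.
    "package_data": {"your_package": ["data/*.txt"]},
}

# In setup.py the dictionary would be passed to setuptools:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```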
Some people use the "data_files" keyword to include files that are not placed inside any of their application's Python packages. The problem I found with this approach is that those files end up installed in platform-dependent paths, so it's really hard for your scripts to find them when they are run after installation. That's why I prefer to put my files inside my packages and use the "package_data" keyword instead.
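
One way a script could then locate such a file at run time is to build the path from its own module location. This is a sketch under the assumption that the package is installed uncompressed (zip_safe set to False); the function name is hypothetical.

```python
import os


def package_data_path(package_file, relative_name):
    """Build the path of a data file shipped inside a package.

    package_file is the __file__ of a module in that package and
    relative_name is the path listed under "package_data".
    """
    package_dir = os.path.dirname(os.path.abspath(package_file))
    return os.path.join(package_dir, relative_name)
```

Inside a module of your package you would call it as package_data_path(__file__, "your_file.txt"), which gives the same result whatever folder the package ended up installed in.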

Once you have written your setup.py file you can build your packages, but you might want to check the sanity of your setup.py first. Pyroma analyzes whether your setup.py complies with recognized good packaging practices, alerting you if it doesn't.

If you are happy with your setup.py configuration you can create a source package by running:
dante@Camelot:~/project-directory$ python setup.py sdist

To create a wheel package, you are only supposed to run:
dante@Camelot:~/project-directory$ python setup.py bdist_wheel

When you use setup.py to create your packages, it will create a dist folder (in the same folder as setup.py) and place packages there.

A problem arises when you use the find_packages function for your packages keyword along with its exclude argument. I've found that in that particular case the exclude argument doesn't work and your undesired files get included in the package. This behavior turns out to be a bug, and until it is fixed the workaround involves first creating the source package and afterwards creating the wheel package from the source one with this command:

dante@Camelot:~/project-directory$ pip wheel --no-index --no-deps --wheel-dir dist dist/*.tar.gz
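
To check whether undesired files really stayed out of a built package, you can list what is inside the archive. The sketch below relies only on the standard library; the function names and the default tests folder are assumptions made for the example.

```python
import tarfile
import zipfile


def archive_members(path):
    """List the file names stored in an sdist (.tar.gz) or a wheel (.whl)."""
    if path.endswith(".whl"):
        # A wheel is a zip archive with a different extension.
        with zipfile.ZipFile(path) as wheel:
            return wheel.namelist()
    with tarfile.open(path, "r:gz") as sdist:
        return sdist.getnames()


def unwanted_members(path, folder="tests"):
    """Return the members that have the given folder name in their path."""
    return [
        name for name in archive_members(path)
        if folder in name.split("/")
    ]
```

An empty list returned by unwanted_members() for a file in your dist folder means that nothing from that folder ended up in the package.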

Your packages can be locally installed with pip:
dante@Camelot:~/project-directory$ pip install dist/your_package.whl

Installing your package locally in a freshly created virtualenv is a good way to check that installation really works as expected.
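
That check can also be scripted. The sketch below uses the venv module from the standard library instead of the virtualenv tool mentioned above, which is a substitution made for this example; the function names are hypothetical too.

```python
import os
import subprocess
import venv


def venv_python(env_dir):
    """Return the path of the interpreter inside a virtual environment."""
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def check_install(package_path, env_dir):
    """Install a built package into a brand new environment."""
    # with_pip=True bootstraps pip inside the new environment.
    venv.create(env_dir, with_pip=True)
    # check_call raises CalledProcessError if pip reports a failure.
    subprocess.check_call(
        [venv_python(env_dir), "-m", "pip", "install", package_path]
    )
```

Calling check_install("dist/your_package.whl", "/tmp/check-env") would create the environment and try the installation there. Keep in mind that pip still needs network access to fetch your dependencies.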

To share your package you can send it through email or make it available on a web server, but the preferred Pythonic way is to make it publicly available through PyPI.

PyPI has two main web sites:

  • PyPI test site: This site is cleaned on a semiregular basis. Before releasing your package on the main PyPI site you might prefer to practice on the test site. You can try to download your package from the PyPI test site, but your installation may fail because dependencies cannot be downloaded from the same site. That is because the PyPI test site doesn't have the entire package database; it only has packages that people have uploaded to test them.
  • PyPI main site: It has the entire package database. If your dependencies cannot be downloaded from here then you are doing something wrong.
To use either of the two sites you have to register. Be aware that the user database is not shared between the two sites, so you'll have to register twice. On the register page you only have to fill in the submission form with your project details. Don't stress: you can modify any of the fields afterwards.

After registering you can submit your package files through the PyPI web interface, but you might want a higher level of automation. To be able to submit files from the console (or from any script), you'll need to create a .pypirc file in your home folder (notice the dot before the file name), with this content:

[pypi]
repository = https://pypi.python.org/pypi
username = <username>
password = <password>
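
If you prefer to generate that file from a script, the standard configparser module can write it. This is a sketch that assumes the settings live under a single [pypi] section; the function name is hypothetical.

```python
import configparser


def write_pypirc(path, username, password,
                 repository="https://pypi.python.org/pypi"):
    """Write a .pypirc-style file with a single [pypi] section."""
    # RawConfigParser stores values verbatim, so a "%" in the
    # password is not treated as interpolation syntax.
    config = configparser.RawConfigParser()
    config.add_section("pypi")
    config.set("pypi", "repository", repository)
    config.set("pypi", "username", username)
    config.set("pypi", "password", password)
    with open(path, "w") as config_file:
        config.write(config_file)
```

Remember that the password is stored in plain text, so the resulting file should only be readable by you.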

Afterwards you can run the twine command to upload your packages to PyPI:
dante@Camelot:~/project-directory$ twine upload dist/*

You may need to install twine with pip before using it if your system doesn't have it installed yet. Twine will use the credentials stored in ~/.pypirc to connect to PyPI and upload your packages. Twine uses SSL to encrypt its connections, so it's a safer alternative to other options like "python setup.py upload".

After that your packages will be available through PyPI and anyone will be able to install them by running:

dante@Camelot:~/project-directory$ pip install your_package
