Cython: The best of both worlds

Stefan Behnel, Senacor Technologies AG Germany


Robert Bradshaw, Google USA
Craig Citro, Google USA
Lisandro Dalcin, National University of the Littoral Argentina
Dag Sverre Seljebotn, University of Oslo Norway
Kurt Smith, University of Wisconsin-Madison USA

This article is published in IEEE Computing in Science and Engineering. Please refer to
the published version if accessible, as it contains editors' improvements. (c) 2011 IEEE.
Permalink: http://dx.doi.org/10.1109/MCSE.2010.118

Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

Abstract

Cython is an extension to the Python language that allows explicit type declarations and is compiled directly to C. This addresses Python's large overhead for numerical loops and the difficulty of efficiently making use of existing C and Fortran code, which Cython code can interact with natively. The Cython language combines the speed of C with the power and simplicity of the Python language.

Introduction

Python's success as a platform for scientific computing to date is primarily due to two factors. First, Python tends to be readable and concise, leading to a rapid development cycle. Second, Python provides access to its internals from C via the Python/C API. This makes it possible to interface with existing C, C++, and Fortran code, as well as write critical sections in C when speed is essential.

Though Python is plenty fast for many tasks, low-level computational code written in Python tends to be slow, largely due to the extremely dynamic nature of the Python language itself. In particular, low-level computational loops are simply infeasible. Although NumPy [NumPy] eliminates the need for many such loops, there are always going to be computations that can only be expressed well through looping constructs. Cython aims to be a good companion to NumPy for such cases.

Given the magnitude of existing, well-tested code in Fortran and C, rewriting any of this code in Python would be a waste of our valuable resources. A big part of the role of Python in science is its ability to couple together existing components instead of reinventing the wheel. For instance, the Python-specific SciPy library contains over 200 000 lines of C++, 60 000 lines of C, and 75 000 lines of Fortran, compared to about 70 000 lines of Python code. Wrapping of existing code for use from Python has traditionally been the domain of the Python experts, as the Python/C API has a high learning curve. While one can use such wrappers without ever knowing about their internals, they draw a sharp line between users (using Python) and developers (using C with the Python/C API).

Cython solves both of these problems, by compiling Python code (with some extensions) directly to C, which is then compiled and linked against Python, ready to use from the interpreter. Through its use of C types, Cython makes it possible to embed numerical loops, running at C speed, directly in Python code. Cython also significantly lowers the learning curve for calling C, C++ and Fortran code from Python. Using Cython, any programmer with knowledge of both Python and C/C++/Fortran can easily use them together.
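
As a rough illustration of the distinction drawn above, the following pure Python/NumPy sketch contrasts a computation that NumPy can express without an explicit loop (a sum of squares) with one that is inherently sequential (a recurrence in which each value depends on the previous one). The function names and the choice of the logistic map are assumptions made for illustration; they do not come from the article.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Explicit Python loop: every iteration pays interpreter overhead."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(values):
    """Same result as a single whole-array operation, no Python-level loop."""
    return float(np.dot(values, values))


def logistic_map(x0, r, steps):
    """Sequential recurrence x <- r*x*(1-x); each step needs the previous
    value, so it cannot be written as one whole-array NumPy expression."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x


data = np.linspace(0.0, 1.0, 1001)
loop_result = sum_of_squares_loop(data)
numpy_result = sum_of_squares_numpy(data)
final_x = logistic_map(0.5, 2.5, 100)
```

The first computation is the kind NumPy already handles well; the second is the kind of loop for which a compiled, typed loop is the natural remedy.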

In this paper, we present an overview of the Cython language and the Cython compiler in several examples. We give guidelines on where Cython can be expected to provide significantly higher performance than pure Python and NumPy code, and where NumPy is a good choice in its own right. We further show how the Cython compiler speeds up Python code, and how it can be used to interact directly with C code. We also cover Fwrap, a close relative of Cython. Fwrap is used for automatically creating fast wrappers around Fortran code to make it callable from C, Cython, and Python.

Cython is based on Pyrex [Pyrex] by Greg Ewing. It's been one of the more friendly forks in open source, and we are thankful for Greg's cooperation. The two projects have somewhat different goals. Pyrex aims to be a smooth blend of Python and C, while Cython focuses more on preserving Python semantics where it can. Cython also contains some features for numerical computation that are not found in Pyrex (in particular fast NumPy array access). While there is a subset of syntax that will work both in Pyrex and Cython, the languages are diverging and one will in general have to choose one or the other. For instance, the syntax for calling C++ code is different in Pyrex and Cython, since this feature was added long after the fork.

There are other projects that make possible the inclusion of compiled code in Python (e.g. Weave and Instant). A comparison of several such tools can be found in [comparison]. Another often used approach is to implement the core algorithm in C, C++ or Fortran and then create wrappers for Python. Such wrappers can be created with Cython or with more specialized tools such as SWIG, ctypes, Boost.Python or F2PY. Each tool has its own flavor. SWIG is able to automatically wrap C or C++ code while Cython and ctypes require redeclaration of the functions to wrap. SWIG and Cython require a compilation stage which ctypes does not. On the other hand, if one gets a declaration wrong using ctypes it can result in unpredictable program crashes without prior warning. With Boost.Python one implements a Python module in C++ which, depending on who you ask, is either a great feature or a great disadvantage. Finally, numexpr (http://code.google.com/p/numexpr/) and Theano (http://deeplearning.net/software/theano/) are specialized tools for quickly evaluating numerical expressions (see below).

To summarize, Cython could be described as a swiss army knife: It lacks the targeted functionality of more specialized tools, but its generality and versatility allow its application in almost any situation that requires going beyond pure Python code.

Cython at a glance

Cython is a programming language based on Python, that is translated into C/C++ code, and finally compiled into binary extension modules that can be loaded into a regular Python session. Cython extends the Python language with explicit type declarations of native C types. One can annotate attributes and function calls to be resolved at compile-time (as opposed to runtime). With the extra information from the annotations, Cython is able to generate code that sidesteps most of the usual runtime costs.

The generated code can take advantage of all the optimizations the C/C++ compiler is aware of without having to re-implement them as part of Cython. Cython integrates the C language and the Python runtime through automatic conversions between Python types and C types, allowing the programmer to switch between the two without having to do anything by hand. The same applies when calling into external libraries written in C, C++ or Fortran. Accessing them is a native operation in Cython code, so calling back and forth between Python code, Cython code and native library code is trivial.

Of course, if we're manually annotating every variable, attribute, and return type with type information, we might as well be writing C/C++ directly. Here is where Cython's approach of extending the Python language really shines. Anything that Cython can't determine statically is compiled with the usual Python semantics, meaning that you can selectively speed up only those parts of your program that expose significant execution times. The key thing to keep in mind in this context is the Pareto Principle, also known
as the 80/20 rule: 80% of the runtime is spent in 20% of the source code. This means that a little bit of annotation in the right spot can go a long way.

This leads to an extremely productive workflow in Cython: users can simply develop with Python, and if they find that a significant amount of time is being spent paying Python overheads, they can compile parts or all of their project with Cython, possibly providing some annotations to speed up the critical parts of the code. For code that spends almost all of its execution time in libraries doing things like FFTs, matrix multiplication, or linear system solving, Cython fills the same rapid development role as Python. However, as you extend the code with new functionality and algorithms, you can do this directly in Cython and just by providing a little extra type information, you can get all the speed of C without all the headaches.

A simple example

As an introductory example, consider naive numerical integration of the Gamma function. A fast C implementation of the Gamma function is available e.g. in the GNU Scientific Library, and can easily be made available to Cython through some C declarations in Cython code (double refers to the double precision floating point type in C):

    cdef extern from "gsl/gsl_sf.h":
        double gsl_sf_gamma(double x)
        double GSL_SF_GAMMA_XMAX

One can then write a Cython function, callable from Python, to approximate the definite integral:

    def integrate_gamma(double a, double b,
                        int n=10000):
        if (min(a, b) <= 0 or
            max(a, b) >= GSL_SF_GAMMA_XMAX):
            raise ValueError('Limits out of range (0, %f)' %
                             GSL_SF_GAMMA_XMAX)
        cdef int i
        cdef double dx = (b - a) / n, result = 0
        for i in range(n):
            result += gsl_sf_gamma(a + i * dx) * dx
        return result

This is pure Python code except that C types (int, double) are statically declared for some variables, using Cython-specific syntax. The cdef keyword is a Cython extension to the language, as is prepending the type in the argument list. In effect, Cython provides a mixture of C and Python programming. The above code is 30 times faster than the corresponding Python loop, and much more memory efficient (although not any faster) than the corresponding NumPy expression:

    import numpy as np
    import scipy.special
    y = scipy.special.gamma(
        np.linspace(a, b, n, endpoint=False))
    y *= ((b - a) / n)
    result = np.sum(y)

Cython especially shines in more complicated examples where for loops are the most natural or only viable solution. Examples are given below.

Writing fast high-level code

Python is a very high-level programming language, and constrains itself to a comparatively small set of language constructs that are both simple and powerful. To map them to efficient C code, the Cython compiler applies tightly tailored and optimized implementations for different use patterns. It therefore becomes possible to write simple code that executes very efficiently.

Given how much time most programs spend in loops, an important target for optimizations is the for loop in Python, which is really a for-each loop that can run over any iterable object. For example, the following code iterates over the lines of a file:

    f = open('a_file.txt')
    for line in f:
        handle(line)
    f.close()

The Python language avoids special cases where possible, so there is no special syntax for a plain integer for loop. However, there is a common idiom for it, e.g. for an integer loop from 0 to 999:

    for i in range(1000):
        do_something(i)

The Cython compiler recognizes this pattern and transforms it into an efficient for loop in C, if the
value range and the type of the loop variable allow it. Similarly, when iterating over a sequence, it is sometimes required to know the current index inside of the loop body. Python has a special function for this, called enumerate(), which wraps the iterable in a counter:

    f = open('a_file.txt')
    for line_no, line in enumerate(f):
        # prepend line number to line
        print("%d: %s" % (line_no, line))

Cython knows this pattern, too, and reduces the wrapping of the iterable to a simple counter variable, so that the loop can run over the iterable itself, with no additional overhead. Cython's for loop has optimizations for the most important built-in Python container and string types and it can even iterate directly over low-level types, such as C arrays of a known size or sliced pointers:

    cdef char* c_string = \
        get_pointer_to_chars(10)
    cdef char char_val

    # check if chars at offsets 3..9 are
    # any of 'abcABC'
    for char_val in c_string[3:10]:
        print( char_val in b'abcABC' )

Another example where high-level language idioms lead to specialized low-level code is cascaded if statements. Many languages provide a special switch statement for testing integer(-like) values against a set of different cases. A common Python idiom uses the normal if statement:

    if int_value == 1:
        func_A()
    elif int_value in (2,3,7):
        func_B()
    else:
        func_C()

This reads well, without needing a special syntax. However, C compilers often fold switch statements into more efficient code than sequential or nested if-else statements. If Cython knows that the type of the int_value variable is compatible with a C integer (e.g. an enum value), it can extract an equivalent switch statement directly from the above code.

Several of these patterns have been implemented in the Cython compiler, and new optimizations are easy to add. It therefore becomes reasonable for code writers to stick to the simple and readable idioms of the Python language, to rely on the compiler to transform them into well specialized and fast C language constructs, and to only take a closer look at the code sections, if any, that still prove to be performance critical in benchmarks.

Apart from its powerful control flow constructs, a high-level language feature that makes Python so productive is its support for object oriented programming. True to the rest of the language, Python classes are very dynamic: methods and attributes can be added, inspected, and modified at runtime, and new types can be dynamically created on the fly. Of course this flexibility comes with a performance cost. Cython allows one to statically compile classes into C-level struct layouts (with virtual function tables) in such a way that they integrate seamlessly into the Python class hierarchy without any of the Python overhead. Though much scientific data fits nicely into arrays, sometimes it does not, and Cython's support for compiled classes allows one to efficiently create and manipulate more complicated data structures like trees, graphs, maps, and other heterogeneous, hierarchical objects.

Some typical usecases

Cython has been successfully used in a wide variety of situations, from the half a million lines of Cython code in Sage (http://www.sagemath.org), to providing Python-friendly wrappers to C libraries, to small personal projects. Here are some example usecases demonstrating where Cython has proved valuable.

Sparse matrices

SciPy and other libraries provide the basic high-level operations for working with sparse matrices. However, constructing sparse matrices often follows complicated rules for which elements are nonzero. Such code can seldom be expressed in terms of NumPy expressions; the most naive method would need temporary arrays of the same size as the corresponding dense matrix, thus defeating the purpose!

Cython is ideal for this, as we can easily and
quickly populate sparse matrices element by element:

    import numpy as np
    cimport numpy as np
    ...
    cdef np.ndarray[np.intc_t] rows, cols
    cdef np.ndarray[double] values
    rows = np.zeros(nnz, dtype=np.intc)
    cols = np.zeros(nnz, dtype=np.intc)
    values = np.zeros(nnz, dtype=np.double)
    cdef int idx = 0
    for idx in range(0, nnz):
        # Compute next non-zero matrix element
        ...
        rows[idx] = row; cols[idx] = col
        values[idx] = value
    # Finally, we construct a regular
    # SciPy sparse matrix:
    return scipy.sparse.coo_matrix(
        (values, (rows, cols)), shape=(N,N))

Data transformation and reduction

Consider computing a simple expression for a large number of different input values, e.g.:

    v = np.sqrt(x**2 + y**2 + z**2)

where the variables are arrays for three vectors x, y and z. This is a case where, in most cases, one does not need to use Cython; it is easily expressed by pure NumPy operations that are already optimized and usually fast enough.

The exceptions are for either extremely small or large amounts of data. For small data sets that are evaluated many, many times, the Python overhead of the NumPy expression will dominate, and making a loop in Cython removes this overhead. For large amounts of data, NumPy has two problems: it requires large amounts of temporary memory, and it repeatedly moves temporary results over the memory bus. In most scientific settings the memory bus can easily become the main bottleneck, not the CPU (for a detailed explanation see [Alted]). In the example above, NumPy will first square x in a temporary buffer, then square y in another temporary buffer, then add them together using a third temporary buffer, and so on.

In Cython, it is possible to manually write a loop running at native speed:

    cimport libc
    ...
    cdef np.ndarray[double] x, y, z, v
    x = ...; y = ...; z = ...
    v = np.zeros_like(x)
    ...
    for i in range(x.shape[0]):
        v[i] = libc.sqrt(
            x[i]**2 + y[i]**2 + z[i]**2)

which avoids these problems, as no temporary buffers are required. The speedup is on the order of a factor of ten for large arrays.

If one is doing a lot of such transformations, one should also evaluate numexpr and Theano which are dedicated to the task. Theano is able to reformulate the expression for optimal numerical stability, and is able to compute the expression on a highly parallel GPU.

Optimization and equation solving

In the case of numerical optimization or equation solving, the algorithm in question must be handed a function (a callback) which evaluates the function. The algorithm then relies on making new steps depending on previously computed function values, and the process is thus inherently sequential. Depending on the nature and size of the problem, different levels of optimization can be employed.

For medium-sized to large problems, the standard scientific Python routines integrate well with Cython. One simply declares types within the callback function, and hands the callback to the solver just like one would with a pure Python function. Given the frequency with which this function may be called, the act of typing the variables in the callback function, combined with the reduced call overhead of Cython implemented Python functions, can have a noticeable impact on performance. How much depends heavily on the problem in question; as a rough indicator, we have noted a 40 times speedup when using this method on a particular ordinary differential equation in 12 variables.

For computationally simple problems in only a few variables, evaluating the function can be such a quick operation that the overhead of the Python function call for each step becomes relevant. In
these cases, one might want to explore calling existing C or Fortran code directly from Cython. Some libraries have ready-made Cython wrappers; for instance, Sage has Cython wrappers around the ordinary differential equation solvers in the GNU Scientific Library. In some cases, one might opt for implementing the algorithm directly in Cython, to avoid any callback whatsoever; using Newton's method on equations of a single variable comes to mind.

Non-rectangular arrays and data repacking

Sometimes data does not fit naturally in rectangular arrays, and Cython is especially well-suited to this situation. One such example arises in cosmology. Satellite experiments such as the Wilkinson Microwave Anisotropy Probe have produced high-resolution images of the Cosmic Microwave Background, a primary source of information about the early universe. The resulting images are spherical, as they contain values for all directions on the sky.

The spherical harmonic transform of these maps, a Fourier transform on the sphere, is especially important. It has complex coefficients $a_{\ell m}$ where the indices run over $0 \le \ell \le \ell_{\max}$, $-\ell \le m \le \ell$. An average of the entire map is stored in $a_{0,0}$, followed by three elements to describe the dipole component, $a_{1,-1}$, $a_{1,0}$, $a_{1,1}$, and so on. Data like this can be stored in a one-dimensional array and elements looked up at position $\ell^2 + \ell + m$.

It is possible, but not trivial, to operate on such data using NumPy whole-array operations. The problem is that NumPy functions, such as finding the variance, are primarily geared towards rectangular arrays. If the data was rectangular, one could estimate the variance per $\ell$, averaging over $m$, by calling np.var(data, axis=1). This doesn't work for non-rectangular data. While there are workarounds, such as the reduceat method and masked arrays, we have found it much more straightforward to write the obvious loops over $\ell$ and $m$ using Cython. For comparison, with Python and NumPy one could loop over $\ell$ and repeatedly call np.var for subslices of the data, which was 27 times slower in our case ($\ell_{\max} = 1500$). Using a naive double loop over both $\ell$ and $m$ was more than 1000 times slower in Python than in Cython. (Incidentally, the variance per $\ell$, or power spectrum, is the primary quantity of interest to observational cosmologists.)

The spherical harmonic transform mentioned above is computed using the Fortran library HEALPix (Hierarchical Equal Area isoLatitude Pixelization, Gorski et al, http://healpix.jpl.nasa.gov/), which can readily be called from Cython with the help of Fwrap. However, HEALPix spits out the result as a 2D array, with roughly half of the elements unoccupied. The waste of storage aside, 2D arrays are often inconvenient; with 1D arrays one can treat each set of coefficients as a vector, and perform linear algebra, estimate covariance matrices and so on, the usual way. Again, it is possible to quickly reorder the data the way we want it with a Cython loop. With all the existing code out there wanting data in slightly different order and formats, for loops are not about to disappear.

Fwrap

Whereas C and C++ integrate closely with Cython, Fortran wrappers in Cython are generated with Fwrap, a separate utility that is distributed separately from Cython. Fwrap [fwrap] is a tool that automates wrapping Fortran source in C, Cython and Python, allowing Fortran code to benefit from the dynamism and flexibility of Python. Fwrapped code can be seamlessly integrated into a C, Cython or Python project. The utility transparently supports most of the features introduced in Fortran 90/95/2003, and will handle nearly all Fortran 77 source as well. Fwrap does not currently support derived types or function callbacks, but support for these features is scheduled in an upcoming release.

Thanks to the C interoperability features supplied in the Fortran 2003 standard, and supported in recent versions of all widely-used Fortran 90/95 compilers, Fwrap generates wrappers that are portable across platforms and compilers. Fwrap is intended to be as friendly as possible, and handles the Fortran parsing and generation automatically. It also generates a build script for the project that will portably build a Python extension module from the wrapper files.

Fwrap is similar in intent to other Fortran-Python
tools such as F2PY, PyFort and Forthon. F2PY is distributed with NumPy and is a capable tool for wrapping Fortran 77 codes. Fwrap's approach differs in that it leverages Cython to create Python bindings. Manual tuning of the wrapper can be easily accomplished by simply modifying the generated Cython code, rather than using a restricted domain-specific language. Another benefit is reduced overhead when calling Fortran code from Cython.

Consider a real world example: wrapping a subroutine from netlib's LAPACK Fortran 90 source. We will use the Fortran 90 subroutine interface for dgesdd, used to compute the singular value decomposition arrays U, S, and VT of a real array A, such that A = U * DIAG(S) * VT. This routine is typical of Fortran 90 source code; it has scalar and array arguments with different intents and different datatypes. We have augmented the argument declarations with INTENT attributes and removed extraneous work array arguments for illustration purposes:

    SUBROUTINE DGESDD(JOBZ, M, N, A, LDA, S, &
      & U, LDU, VT, LDVT, INFO)
      ! .. Scalar Arguments ..
      CHARACTER, INTENT(IN) :: JOBZ
      INTEGER, INTENT(OUT) :: INFO
      INTEGER, INTENT(IN) :: LDA, LDU, LDVT, &
        & M, N
      ! .. Array Arguments ..
      DOUBLE PRECISION, INTENT(INOUT) :: &
        & A(LDA, *)
      DOUBLE PRECISION, INTENT(OUT) :: &
        & S(*), U(LDU, *), VT(LDVT, *)
      ! DGESDD subroutine body
    END SUBROUTINE DGESDD

When invoked on the above Fortran code, Fwrap parses the code and makes it available to C, Cython and Python. If desired, we can generate a deployable package for use on computers that don't have Fwrap or Cython installed.

To use the wrapped code from Python, we must first set up the subroutine arguments, in particular the a array argument. To do this, we set the array dimensions and then create the array, filling it with random values. To simplify matters, we set all array dimensions equal to m:

    >>> import numpy as np
    >>> from numpy.random import rand
    >>> m = 10
    >>> rand_array = rand(m, m)
    >>> a = np.asfortranarray(rand_array,
    ...                       dtype=np.double)

The asfortranarray() function is important: it ensures that the array a is laid out in column-major ordering, also known as "fortran ordering". This ensures that no copying is required when passing arrays to Fortran subroutines.

Any subroutine argument that is an INTENT(OUT) array needs to be passed to the subroutine. The subroutine will modify the array in place; no copies are made for arrays of numeric types. This is not required for scalar INTENT(OUT) arguments, such as the INFO argument. This is how one would create three empty arrays of appropriate dimensions:

    >>> s = np.empty(m, dtype=np.double,
    ...              order='F')
    >>> u = np.empty((m, m), dtype=np.double,
    ...              order='F')
    >>> vt = np.empty((m, m), dtype=np.double,
    ...               order='F')

The order='F' keyword argument serves the same purpose as the asfortranarray() function.

The extension module is named fw_dgesdd.so (the file extension is platform-dependent). We import dgesdd from it and call it from Python:

    >>> from fw_dgesdd import dgesdd
    >>> # specify that we want all the output vectors
    >>> jobz = 'A'
    >>> (a, s, u, vt, info) = dgesdd(
    ...     jobz, m, m, a, m, s, u, m, vt, m)

The return value is a tuple that contains all arguments that were declared intent out, inout or with no intent spec. The a argument (intent inout) is in both the argument list and the return tuple, but no copy has been made.

We can verify that the result is correct:

    >>> s_diag = np.diag(s)
    >>> a_computed = np.dot(u,
    ...     np.dot(s_diag, vt))
    >>> np.allclose(a, a_computed)
    True
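
For readers without Fwrap installed, the same kind of check can be sketched with NumPy alone. The snippet below uses np.linalg.svd as a stand-in for the Fwrap-generated dgesdd wrapper; this is an assumption made for illustration and not the wrapper's own interface. The fixed random seed is likewise an arbitrary choice so that the run is reproducible.

```python
import numpy as np

m = 10
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Column-major ("Fortran") layout, as a Fortran routine would expect.
a = np.asfortranarray(rng.random((m, m)), dtype=np.double)

# order='F' requests the same layout when allocating an output buffer.
u_buffer = np.empty((m, m), dtype=np.double, order='F')

# Stand-in for the wrapped routine: returns U, the singular values S, and VT.
u, s, vt = np.linalg.svd(a, full_matrices=True)

# Rebuild A = U * DIAG(S) * VT and compare with the input.
a_computed = np.dot(u, np.dot(np.diag(s), vt))
reconstruction_ok = bool(np.allclose(a, a_computed))

a_is_fortran_ordered = bool(a.flags['F_CONTIGUOUS'])
buffer_is_fortran_ordered = bool(u_buffer.flags['F_CONTIGUOUS'])
```

The flags check shows that both ways of requesting column-major storage produce the same layout, and the final comparison mirrors the np.allclose test above.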

Here we create a_computed which is equivalent to the matrix product u * s_diag * vt, and we verify that a and a_computed are equal to within machine precision.

When calling the routine from within Cython code, the invocation is identical, and the arguments can be typed to reduce function call overhead. Again, please see the documentation for details and examples.

Fwrap handles any kind of Fortran array declaration, whether assumed-size (like the above example), assumed-shape or explicit shape. Options exist for hiding redundant arguments (like the array dimensions LDA, LDU and LDVT above) and are covered in Fwrap's documentation.

This example covers just the basics of what Fwrap can do. For more information, downloads and help using Fwrap, see http://fwrap.sourceforge.net/. You can reach other users and the Fwrap developers on the fwrap-users mailing list, http://groups.google.com/group/fwrap-users.

Limitations

When compared to writing code in pure Python, Cython's primary disadvantages are compilation time and the need to have a separate build phase. Most projects using Cython are therefore written in a mix of Python and Cython, as Cython sources don't need to be recompiled when Python sources change. Cython can still be used to compile some of the Python modules for performance reasons. There is also an experimental "pure" mode where decorators are used to indicate static type declarations, which are valid Python and ignored by the interpreter at runtime, but are used by Cython when compiled. This combines the advantage of a fast edit-run cycle with a high runtime performance of the final product. There is also the question of code distribution. Many projects, rather than requiring Cython as a dependency, ship the generated .c files which compile against Python 2.3 to 3.2 without any modifications as part of the distutils setup phase.

Compared to compiled languages such as Fortran and C, Cython's primary limitation is the limited support for shared memory parallelism. Python is inherently limited in its multithreading capabilities, due to the use of a Global Interpreter Lock (GIL). Cython code can declare sections as only containing C code (using a nogil directive), which are then able to run in parallel. However, this can quickly become tedious. Currently there's also no support for OpenMP programming in Cython. On the other hand, message passing parallelism using multiple processes, for instance through MPI, is very well supported.

Compared to C++, a major weakness is the lack of built-in template support, which aids in writing code that works efficiently with many different data types. In Cython, one must either repeat code for each data type, or use an external templating system, in the same way that is often done for Fortran codes. Many template engines exist for Python, and most of them should work well for generating Cython code.

Using a language which can be either dynamic or static takes some experience. Cython is clearly useful when talking to external libraries, but when is it worth it to replace normal Python code with Cython code? The obvious factor to consider is the purpose of the code: is it a single experiment, for which the Cython compilation time might overshadow the pure Python run time? Or is it a core library function, where every ounce of speed matters?

It is possible to paint some broad strokes when it comes to the type of computation considered. Is the bulk of time spent doing low-level number crunching in your code, or is the heavy lifting done through calls to external libraries? How easy is it to express the computation in terms of NumPy operations? For sequential algorithms such as equation solving and statistical simulations it is indeed impossible to do without a loop of some kind. Pure Python loops can be very slow; but the impact of this still varies depending on the use case.

Further reading

If you think Cython might help you, then the next stop is the Cython Tutorial [tutorial]. [numerics] presents optimization strategies and benchmarks for computations.

As always, the online documentation at
http://docs.cython.org provides the most up-to-date information. If you are ever stuck, or just wondering if Cython will be able to solve your particular problem, Cython has an active and friendly mailing list at http://groups.google.com/group/cython-users.

References

[Alted] F. Alted. Why modern CPUs are starving and what can be done about it. CiSE 12, 68, 2010.

[comparison] I. M. Wilbers, H. P. Langtangen, A. Oedegaard, Using Cython to Speed up Numerical Python Programs, Proceedings of MekIT'09, 2009. URL: http://simula.no/research/sc/publications/Simula.SC.578

[fwrap] K. W. Smith, D. S. Seljebotn, Fwrap: Fortran wrappers in C, Cython & Python. URL: http://conference.scipy.org/abstract?id=19 Project homepage: http://fwrap.sourceforge.net/

[numerics] D. S. Seljebotn, Fast numerical computations with Cython, Proceedings of the 8th Python in Science Conference, 2009. URL: http://conference.scipy.org/proceedings/SciPy2009/paper_2

[NumPy] S. van der Walt, S. C. Colbert, G. Varoquaux, The NumPy array: a structure for efficient numerical computation, CiSE, present issue.

[Pyrex] G. Ewing, Pyrex: A language for writing Python extension modules. URL: http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/

[Sage] William A. Stein et al. Sage Mathematics Software, The Sage Development Team, 2010, http://www.sagemath.org.

[Theano] J. Bergstra. Optimized Symbolic Expressions and GPU Metaprogramming with Theano, Proceedings of the 9th Python in Science Conference (SciPy2010), Austin, Texas, June 2010.

[numexpr] D. Cooke, F. Alted, T. Hochberg, G. Thalhammer, numexpr. http://code.google.com/p/numexpr/

[tutorial] S. Behnel, R. W. Bradshaw, D. S. Seljebotn, Cython Tutorial, Proceedings of the 8th Python in Science Conference, 2009. URL: http://conference.scipy.org/proceedings/SciPy2009/paper_1
