Wednesday, 24 September 2014

Immutability in Python?

Where I delve into the immutability concepts available in Python and make a shameless plug for Pyrsistent.


DISCLAIMER:
Some parts of this discussion are based on how CPython works. PyPy, Jython, IronPython and other Python implementations may behave differently in these areas.

What is immutability?

Let's keep it simple: If a thing is immutable it cannot be changed.

Why would you want immutability in programming?

These are the reasons I find most important:
  • Cognitive offload for the developer. No need to ever worry about the content of an object once it has been created. It will remain the same until it is destroyed. You can pass it around confidently knowing that no harm will ever be done to it.
  • Safe and simple invariant checking. It is only necessary to check the validity of the object state at construction time since there is no way to change the object thereafter.
  • Safe and fast sharing, within and between threads. No need to make use of locks or similar constructs to synchronize access to an object that can never change.
  • Safe and fast reuse. If more objects with the same properties are required just use the same object over and over again. No need to ever restore anything to its "original" state.
  • Efficiency. Can make certain classes of problems more efficient, e.g. some types of comparisons. Makes object caching less error prone.
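To make the invariant checking point a bit more concrete, here is a minimal sketch in Python 3 syntax. Interval is a made-up name for illustration, not part of any library. It builds on the standard library's namedtuple and checks its state exactly once, at construction time.

```python
from collections import namedtuple


class Interval(namedtuple("Interval", ["low", "high"])):
    """Hypothetical value type: low <= high is verified once, in __new__."""

    __slots__ = ()  # no per-instance __dict__, so no new attributes either

    def __new__(cls, low, high):
        if low > high:
            raise ValueError("low must not exceed high")
        return super().__new__(cls, low, high)


r = Interval(1, 10)

# An invalid object can never be constructed...
try:
    Interval(10, 1)
except ValueError:
    invalid_rejected = True
else:
    invalid_rejected = False

# ...and a valid one cannot be made invalid afterwards.
try:
    r.low = 20
except AttributeError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Since the fields cannot be reassigned, any code that receives an Interval can rely on the check having been done without repeating it.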

Why would you not want immutability in programming?

Of course there are also reasons that you might want mutable objects in your programming. These are some reasons I can find for that, and I think that they are all interrelated:
  • Efficiency. It is often less efficient, with regard to both memory consumption and CPU cycles, to create new objects than to change the value of existing ones.
  • Habits. It requires changes to programming style that may take some getting used to.
  • Language support. Some languages (such as Clojure) embrace the use of immutability and make it the path of least resistance for the developer while others (such as Python) do not.

Immutability classes (degrees of immutability?)

Are there really different classes of immutability? No, either it is possible to change something or not. End of discussion? Not really, for the continued discussion around immutability in Python we need to define a couple of different classes.
  • By convention. The programmer who wrote the code has expressed the intention of immutability in some way, for example by comments or variable naming. There is nothing technically stopping anyone from mutating the object. Everything is fully flexible, nothing is known. This is what we love in Python, right?
  • Apparent. The object appears to be immutable from the Python perspective. There are still things moving under the hood, mainly related to reference counting / garbage collection in the CPython interpreter. There are also ways of mutating these objects from Python but you would have to use libraries such as ctypes to achieve this.
  • True immutability. The real thing, nothing moves once the object has been created. This is impossible in CPython.
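A small sketch of the difference between the first two classes. ConventionPoint is a hypothetical class invented for this example; the tuple is the standard built-in.

```python
class ConventionPoint(object):
    """Immutable by convention only: please do not assign to x or y."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = ConventionPoint(1, 2)
p.x = 99  # the docstring asked nicely, but nothing stops this

t = (1, 2)  # apparently immutable: the interpreter refuses the assignment
try:
    t[0] = 99
except TypeError:
    tuple_rejected = True
else:
    tuple_rejected = False
```

The first object relies entirely on the goodwill of its users, the second one is protected by the interpreter as long as you stay within ordinary Python code.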


The good thing is that CPython has a thing called the Global Interpreter Lock (GIL) which promotes apparently immutable objects to (almost*) truly immutable objects. It does this by allowing only one thread at a time to execute in the interpreter. This is also why the GIL is a bad thing: it makes certain types of tasks harder to parallelize.


* It is still possible to poke at the objects if you resort to writing C code that interacts with them.

The state of immutability in Python

There is no such thing as a true constant reference in Python. There are certain tricks that you can apply to mimic constants but these are crude workarounds.
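One such trick, sketched below, is a namespace object that refuses attribute assignment. This is just one possible workaround and all names in it (_Constants, CONST, MAX_RETRIES, TIMEOUT_SECONDS) are made up for the example. It also shows why I call these workarounds crude.

```python
class _Constants(object):
    """Hypothetical constant namespace; instances reject rebinding."""

    __slots__ = ()

    MAX_RETRIES = 3
    TIMEOUT_SECONDS = 30

    def __setattr__(self, name, value):
        raise AttributeError("cannot rebind constant %r" % name)


CONST = _Constants()

try:
    CONST.MAX_RETRIES = 5
except AttributeError:
    rebinding_blocked = True
else:
    rebinding_blocked = False

value_before = CONST.MAX_RETRIES

# The protection is shallow: going through the class bypasses it,
# and the name CONST itself can still be rebound by anyone.
_Constants.MAX_RETRIES = 5
value_after = CONST.MAX_RETRIES
```

The instance level guard catches honest mistakes, but it does not give you anything close to a true constant.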


There are a bunch of object types though that fall in the class of apparent immutability and hence are promoted to truly immutable objects in Python.

Basic types

Strings and numbers are immutable. There’s no way to change a letter in the middle of a string without creating a new string. There’s no way to change the value of a number without creating a new number. CPython utilizes this fact to cache small numbers so that their underlying object can be reused.


>>> x = 17
>>> y = 17
>>> x is y
True
>>> y = 1700
>>> x = 1700
>>> x is y
False
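The string part of the claim can be checked in a similar way. This is a minimal sketch: item assignment on a string raises a TypeError, and "changing" a string really means building a new one and rebinding the name.

```python
s = "immutable"

try:
    s[0] = "I"  # strings do not support item assignment
except TypeError:
    assignment_rejected = True
else:
    assignment_rejected = False

# The only way to get a changed string is to create a new object.
capitalized = "I" + s[1:]

# Rebinding a name does not touch the original object.
alias = s
s += "!"
```

After the last line s refers to a new string while alias still refers to the untouched original.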

Collections

Python comes with a couple of immutable collections, the well-known tuple and the slightly less known frozenset, which are basically lists and sets but with all mutating operations removed.
There’s also the namedtuple which is a factory function to create classes inheriting from a tuple with a predefined, and fixed, set of members that cannot be changed once an object of the type has been instantiated. This is a very nice feature that has been around since Python 2.6.
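A short sketch of what the namedtuple gives you. Point is just an example name; namedtuple and its _replace method come from the standard library.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])

p = Point(x=1, y=2)

try:
    p.x = 10  # fields are read-only
except AttributeError:
    field_assignment_rejected = True
else:
    field_assignment_rejected = False

# _replace builds a new Point and leaves the original untouched.
p2 = p._replace(x=10)
```

Note that _replace returns a new object, so p keeps its old values.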


These structures give you immutability goodies such as hashability and to some extent also reuse. For example:


>>> tuple() is tuple()
True
>>> list() is list()
False
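The hashability part can be sketched like this: tuples and frozensets work as dictionary keys, while their mutable cousins do not. The dictionary name lookup is made up for the example.

```python
lookup = {
    (1, 2): "tuple key",
    frozenset([1, 2]): "frozenset key",
}

# A frozenset compares equal regardless of element order.
found = lookup[frozenset([2, 1])]

try:
    lookup[[1, 2]] = "list key"  # lists are mutable and therefore unhashable
except TypeError:
    list_key_rejected = True
else:
    list_key_rejected = False
```

A tuple is only hashable if everything inside it is hashable too, so a tuple containing a list would be rejected as a key as well.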


The problem with the above structures is that they are awkward to work with if you wish to actually manipulate data in your program. For example, creating a new tuple based on an existing one but with an updated element at index 2 would require something like the following:


>>> t1 = (1, 2, 3, 4, 5)
>>> t2 = t1[:2] + (17,) + t1[3:]
# or
>>> l1 = list(t1)
>>> l1[2] = 17
>>> t2 = tuple(l1)


Not only is this a pain to write (and read), it’s also inefficient since we have to create several objects along the way that are immediately thrown away. Furthermore, all references to items in t1 have to be copied to t2 even though only one reference was updated.
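You could of course hide the slicing in a helper. The sketch below uses a hypothetical function, tuple_set, that is not part of the standard library. It fixes the readability problem but not the efficiency problem: every call still builds temporary tuples and copies all references.

```python
def tuple_set(t, index, value):
    """Return a new tuple equal to t but with t[index] replaced by value.

    Hypothetical helper for illustration. Still O(len(t)) per call.
    """
    if not -len(t) <= index < len(t):
        raise IndexError("tuple index out of range")
    index %= len(t)  # normalize negative indices
    return t[:index] + (value,) + t[index + 1:]


t1 = (1, 2, 3, 4, 5)
t2 = tuple_set(t1, 2, 17)
t3 = tuple_set(t1, -1, 0)
```

The original tuple t1 is left as it was, which is the behaviour we want, but the cost of each update grows with the size of the tuple.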

Enter Pyrsistent… Today v0.5.0 was released.