ZODB/ZEO Programming Guide

Author: A.M. Kuchling <amk@amk.ca>
Version: Release 3.7.0a0
Date: August 11, 2006


This is an unfinished attempt to convert the "ZODB/ZEO Programming Guide" to reStructuredText format. The original is available at http://svn.zope.org/ZODB/trunk/doc/guide . I was learning ZODB, and this conversion is a way to pass the time.


Copyright 2002 A.M. Kuchling.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the appendix entitled "GNU Free Documentation License".

1   Introduction

This guide explains how to write Python programs that use the Z Object Database (ZODB) and Zope Enterprise Objects (ZEO). The latest version of the guide is always available at http://www.zope.org/Wikis/ZODB/guide/index.html.

1.1   What is the ZODB?

The ZODB is a persistence system for Python objects. Persistent programming languages provide facilities that automatically write objects to disk and read them in again when they're required by a running program. By installing the ZODB, you add such facilities to Python.

It's certainly possible to build your own system for making Python objects persistent. The usual starting points are the pickle module, for converting objects into a string representation, and various database modules, such as the gdbm or bsddb modules, that provide ways to write strings to disk and read them back. It's straightforward to combine the pickle module and a database module to store and retrieve objects, and in fact the shelve module, included in Python's standard library, does this.

The downside is that the programmer has to explicitly manage objects, reading an object when it's needed and writing it out to disk when the object is no longer required. The ZODB manages objects for you, keeping them in a cache, writing them out to disk when they are modified, and dropping them from the cache if they haven't been used in a while.
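The explicit bookkeeping mentioned above can be seen in a minimal sketch that uses only the standard shelve module. The file location and the sample user record are made up for illustration; the point is that a change to a fetched object is lost unless the program writes it back itself.

```python
import os
import shelve
import tempfile

# Hypothetical location for the shelf file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "users")

# Writing: the program decides when the object goes to disk.
db = shelve.open(path)
db["amk"] = {"first_name": "Andrew", "friends": []}
db.close()

# Updating: fetch a copy, modify it, and explicitly store it again.
db = shelve.open(path)
user = db["amk"]                # unpickles a copy of the stored object
user["friends"].append("bob")   # changes only the in-memory copy
db["amk"] = user                # explicit write-back; omit it and the change is lost
db.close()

# Reading the record back in a fresh session.
db = shelve.open(path)
friends = db["amk"]["friends"]
db.close()
```

With the ZODB, the write-back step disappears for attribute assignments, because modified objects are tracked and written out on commit.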

1.2   OODBs vs. Relational DBs

Another way to look at it is that the ZODB is a Python-specific object-oriented database (OODB). Commercial object databases for C++ or Java often require that you jump through some hoops, such as using a special preprocessor or avoiding certain data types. As we'll see, the ZODB has some hoops of its own to jump through, but in comparison the naturalness of the ZODB is astonishing.

Relational databases (RDBs) are far more common than OODBs. Relational databases store information in tables; a table consists of any number of rows, each row containing several columns of information. (Tables are more formally called relations, which is where the term "relational database" originates.)

Let's look at a concrete example. The example comes from my day job working for the MEMS Exchange, in a greatly simplified version. The job is to track process runs, which are lists of manufacturing steps to be performed in a semiconductor fab. A run is owned by a particular user, and has a name and assigned ID number. Runs consist of a number of operations; an operation is a single step to be performed, such as depositing something on a wafer or etching something off it.

Operations may have parameters, which are additional information required to perform an operation. For example, if you're depositing something on a wafer, you need to know two things: 1) what you're depositing, and 2) how much should be deposited. You might deposit 100 microns of silicon oxide, or 1 micron of copper.

Mapping these structures to a relational database is straightforward:

CREATE TABLE runs (
   run_id    INT,
   owner     VARCHAR,
   title     VARCHAR,
   acct_num  INT,
   PRIMARY KEY(run_id)
);

CREATE TABLE operations (
   run_id      INT,
   step_num    INT,
   process_id  VARCHAR,
   PRIMARY KEY(run_id, step_num),
   FOREIGN KEY(run_id) REFERENCES runs(run_id)
);

CREATE TABLE parameters (
   run_id       INT,
   step_num     INT,
   param_name   VARCHAR,
   param_value  VARCHAR,
   PRIMARY KEY(run_id, step_num, param_name),
   FOREIGN KEY(run_id, step_num)
       REFERENCES operations(run_id, step_num)
);
In Python, you would write three classes named Run, Operation, and Parameter. I won't present code for defining these classes, since that code is uninteresting at this point. Each class would contain a single method to begin with, an __init__ method that assigns default values, such as 0 or None, to each attribute of the class.

It's not difficult to write Python code that will create a Run instance and populate it with the data from the relational tables; with a little more effort, you can build a straightforward tool, usually called an object-relational mapper, to do this automatically. (See http://www.amk.ca/python/unmaintained/ordb.html for a quick hack at a Python object-relational mapper, and http://www.python.org/workshops/1997-10/proceedings/shprentz.html for Joel Shprentz's more successful implementation of the same idea; unlike mine, Shprentz's system has been used for actual work.)

However, it is difficult to make an object-relational mapper reasonably quick; a simple-minded implementation like mine is quite slow because it has to do several queries to access all of an object's data. Higher performance object-relational mappers cache objects to improve performance, only performing SQL queries when they actually need to.

That helps if you want to access run number 123 all of a sudden. But what if you want to find all runs where a step has a parameter named 'thickness' with a value of 2.0? In the relational version, you have two unappealing choices:

  1. Write a specialized SQL query for this case: SELECT run_id FROM parameters WHERE param_name = 'thickness' AND param_value = '2.0'. If such queries are common, you can end up with lots of specialized queries. When the database tables get rearranged, all these queries will need to be modified.
  2. Use an object-relational mapper, which doesn't help much. Scanning through the runs means that the mapper will perform the required SQL queries to read run #1, and then a simple Python loop can check whether any of its steps have the parameter you're looking for. Repeat for run #2, 3, and so forth. This does a vast number of SQL queries, and therefore is incredibly slow.
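The first choice can be tried out with the sqlite3 module from the standard library. This is only a sketch: the sample rows are invented, the tables are cut down to what the query needs, and the query is run against the parameters table because that is where the schema above puts param_name and param_value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE runs (run_id INT PRIMARY KEY, owner VARCHAR, title VARCHAR);
CREATE TABLE parameters (
    run_id INT, step_num INT, param_name VARCHAR, param_value VARCHAR,
    PRIMARY KEY (run_id, step_num, param_name)
);
""")

# Made-up sample data: two runs, each with one deposition step.
conn.executemany("INSERT INTO runs VALUES (?, ?, ?)",
                 [(1, "amk", "oxide run"), (2, "amk", "copper run")])
conn.executemany("INSERT INTO parameters VALUES (?, ?, ?, ?)",
                 [(1, 1, "material", "SiO2"), (1, 1, "thickness", "2.0"),
                  (2, 1, "material", "Cu"), (2, 1, "thickness", "1.0")])

# The specialized query: it hard-codes table and column names, so it has
# to be revisited whenever the tables are rearranged.
rows = conn.execute(
    "SELECT DISTINCT run_id FROM parameters "
    "WHERE param_name = ? AND param_value = ?",
    ("thickness", "2.0")).fetchall()
matching_run_ids = [row[0] for row in rows]
```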

An object database such as ZODB simply stores internal pointers from object to object, so reading in a single object is much faster than doing a bunch of SQL queries and assembling the results. Scanning all runs, therefore, is still inefficient, but not grossly inefficient.
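To make the comparison concrete, here is a rough sketch of the same search done as a plain scan over objects. The Run and Operation classes are hypothetical stand-ins for the classes described earlier, and parameters are kept in a dictionary rather than in separate Parameter objects to keep the example short. With the ZODB the classes would be persistent, but the loop itself would look the same.

```python
class Operation:
    def __init__(self, process_id, parameters=None):
        self.process_id = process_id
        self.parameters = parameters or {}   # parameter name -> value


class Run:
    def __init__(self, run_id, owner, operations=None):
        self.run_id = run_id
        self.owner = owner
        self.operations = operations or []


def runs_with_parameter(runs, name, value):
    """Return the runs in which some operation has parameter `name` equal to `value`."""
    return [run for run in runs
            if any(op.parameters.get(name) == value for op in run.operations)]


# Invented sample data.
runs = [
    Run(1, "amk", [Operation("deposit", {"material": "SiO2", "thickness": 2.0})]),
    Run(2, "amk", [Operation("deposit", {"material": "Cu", "thickness": 1.0})]),
]
matches = runs_with_parameter(runs, "thickness", 2.0)
```

The scan still touches every run, which is why it remains inefficient, but each step is an ordinary attribute access rather than a round trip to a SQL server.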

1.3   What is ZEO?

The ZODB comes with a few different classes that implement the Storage interface. Such classes handle the job of writing out Python objects to a physical storage medium, which can be a disk file (the FileStorage class), a BerkeleyDB file (BDBFullStorage), a relational database (DCOracleStorage), or some other medium. ZEO adds ClientStorage, a new Storage that doesn't write to physical media but just forwards all requests across a network to a server. The server, which is running an instance of the StorageServer class, simply acts as a front-end for some physical Storage class. It's a fairly simple idea, but as we'll see later on in this document, it opens up many possibilities.

1.4   About this guide

The primary author of this guide works on a project which uses the ZODB and ZEO as its primary storage technology. We use the ZODB to store process runs and operations, a catalog of available processes, user information, accounting information, and other data. Part of the goal of writing this document is to make our experience more widely available. A few times we've spent hours or even days trying to figure out a problem, and this guide is an attempt to gather up the knowledge we've gained so that others don't have to make the same mistakes we did while learning.

The author's ZODB project is described in a paper available at http://www.amk.ca/python/writing/mx-architecture/ .

This document will always be a work in progress. If you wish to suggest clarifications or additional topics, please send your comments to zodb-dev@zope.org.

1.5   Acknowledgements

Andrew Kuchling wrote the original version of this guide, which provided some of the first ZODB documentation for Python programmers. His initial version has been updated over time by Jeremy Hylton and Tim Peters. I'd like to thank the people who've pointed out inaccuracies and bugs, offered suggestions on the text, or proposed new topics that should be covered: Jeff Bauer, Willem Broekema, Thomas Guettler, Chris McDonough, George Runyan.

2   ZODB Programming

2.1   Installing ZODB

ZODB is packaged using the standard distutils tools.

2.1.1   Requirements

You will need Python 2.3 or higher. Since the code is packaged using distutils, it is simply a matter of untarring or unzipping the release package, and then running python setup.py install. You'll need a C compiler to build the packages, because there are various C extension modules. Binary installers are provided for Windows users.

2.1.2   Installing the Packages

Download the ZODB tarball containing all the packages for both ZODB and ZEO from http://www.zope.org/Products/ZODB3.3 . See the README.txt file in the top level of the release directory for details on building, testing, and installing. You can find information about ZODB and the most current releases in the ZODB Wiki at http://www.zope.org/Wikis/ZODB .

2.2   How ZODB Works

The ZODB is conceptually simple. Python classes subclass a persistent.Persistent class to become ZODB-aware. Instances of persistent objects are brought in from a permanent storage medium, such as a disk file, when the program needs them, and remain cached in RAM. The ZODB traps modifications to objects, so that when a statement such as obj.size = 1 is executed, the modified object is marked as "dirty". On request, any dirty objects are written out to permanent storage; this is called committing a transaction. Transactions can also be aborted or rolled back, which results in any changes being discarded, dirty objects reverting to their initial state before the transaction began. The term transaction has a specific technical meaning in computer science. It's extremely important that the contents of a database don't get corrupted by software or hardware crashes, and most database software offers protection against such corruption by supporting four useful properties, Atomicity, Consistency, Isolation, and Durability. In computer science jargon these four terms are collectively dubbed the ACID properties, forming an acronym from their names. The ZODB provides all of the ACID properties. Definitions of the ACID properties are:

  • Atomicity means that any changes to data made during a transaction are all-or-nothing. Either all the changes are applied, or none of them are. If a program makes a bunch of modifications and then crashes, the database won't be partially modified, potentially leaving the data in an inconsistent state; instead all the changes will be forgotten. That's bad, but it's better than having a partially-applied modification put the database into an inconsistent state.
  • Consistency means that each transaction executes a valid transformation of the database state. Some databases, but not ZODB, provide a variety of consistency checks in the database or language; for example, a relational database can constrain columns to be of particular types and can enforce relations across tables. Viewed more generally, atomicity and isolation make it possible for applications to provide consistency.
  • Isolation means that two programs or threads running in two different transactions cannot see each other's changes until they commit their transactions.
  • Durability means that once a transaction has been committed, a subsequent crash will not cause any data to be lost or corrupted.
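The dirty-marking, commit, and abort behaviour described in this section can be pictured with a small toy model. This is not how the ZODB is implemented, and the ToyObject class and its methods are invented for illustration; in the ZODB, commit and abort belong to a transaction rather than to a single object. The sketch only shows the idea of trapping assignments and falling back to the last committed state.

```python
import copy


class ToyObject:
    """Toy stand-in for a persistent object; not ZODB's implementation."""

    def __init__(self, **state):
        # Write through __dict__ so that setup does not trigger __setattr__.
        self.__dict__["_committed"] = copy.deepcopy(state)  # last committed state
        self.__dict__["_dirty"] = False
        self.__dict__.update(state)

    def __setattr__(self, name, value):
        # Attribute assignment is trapped here and marks the object dirty.
        self.__dict__[name] = value
        self.__dict__["_dirty"] = True

    def commit(self):
        # Record the current attribute values as the new committed state.
        state = {k: v for k, v in self.__dict__.items()
                 if k not in ("_committed", "_dirty")}
        self.__dict__["_committed"] = copy.deepcopy(state)
        self.__dict__["_dirty"] = False

    def abort(self):
        # Discard all changes and revert to the last committed state.
        committed = self.__dict__["_committed"]
        self.__dict__.clear()
        self.__dict__.update(copy.deepcopy(committed))
        self.__dict__["_committed"] = committed
        self.__dict__["_dirty"] = False


obj = ToyObject(size=0)
obj.size = 1
dirty_after_write = obj._dirty      # the assignment was noticed
obj.abort()
size_after_abort = obj.size         # back to the committed value

obj.size = 5
obj.commit()
obj.abort()                         # nothing to undo after a commit
size_after_commit = obj.size
```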

2.3   Opening a ZODB

There are 3 main interfaces supplied by the ZODB: Storage, DB, and Connection classes. The DB and Connection interfaces both have single implementations, but there are several different classes that implement the Storage interface.

  • Storage classes are the lowest layer, and handle storing and retrieving objects from some form of long-term storage. A few different types of Storage have been written, such as FileStorage, which uses regular disk files, and BDBFullStorage, which uses Sleepycat Software's BerkeleyDB database. You could write a new Storage that stored objects in a relational database, for example, if that would better suit your application. Two example storages, DemoStorage and MappingStorage, are available to use as models if you want to write a new Storage.
  • The DB class sits on top of a storage, and mediates the interaction between several connections. One DB instance is created per process.
  • Finally, the Connection class caches objects, and moves them into and out of object storage. A multi-threaded program should open a separate Connection instance for each thread. Different threads can then modify objects and commit their modifications independently.

Preparing to use a ZODB requires 3 steps: you have to open the Storage, then create a DB instance that uses the Storage, and then get a Connection from the DB instance. All this is only a few lines of code:

from ZODB import FileStorage, DB

storage = FileStorage.FileStorage('/tmp/test-filestorage.fs')
db = DB(storage)
conn = db.open()

Note that you can use a completely different data storage mechanism by changing the first line that opens a Storage; the above example uses a FileStorage. In section 3, "How ZEO Works", you'll see how ZEO uses this flexibility to good effect.

2.4   Using a ZODB Configuration File

ZODB also supports configuration files written in the ZConfig format. A configuration file can be used to separate the configuration logic from the application logic. The storage classes and the DB class support a variety of keyword arguments; all these options can be specified in a config file.

The configuration file is simple. The example in the previous section could use the following configuration:

<zodb>
  <filestorage>
  path /tmp/test-filestorage.fs
  </filestorage>
</zodb>

The ZODB.config module includes several functions for opening databases and storages from configuration files.

import ZODB.config

db = ZODB.config.databaseFromURL('/tmp/test.conf')
conn = db.open()

The ZConfig documentation, included in the ZODB3 release, explains the format in detail. Each configuration file is described by a schema, by convention stored in a component.xml file. ZODB, ZEO, zLOG, and zdaemon all have schemas.

2.5   Writing a Persistent Class

Making a Python class persistent is quite simple; the class only needs to subclass the Persistent class, as shown in this example:

from persistent import Persistent

class User(Persistent):
    pass

The Persistent base class is a new-style class implemented in C. For simplicity, in the examples the User class will simply be used as a holder for a bunch of attributes. Normally the class would define various methods that add functionality, but that has no impact on the ZODB's treatment of the class.

The ZODB uses persistence by reachability; starting from a set of root objects, all the attributes of those objects are made persistent, whether they're simple Python data types or class instances. There's no method to explicitly store objects in a ZODB database; simply assign them as an attribute of an object, or store them in a mapping, that's already in the database. This chain of containment must eventually reach back to the root object of the database.

As an example, we'll create a simple database of users that allows retrieving a User object given the user's ID. First, we retrieve the primary root object of the ZODB using the root() method of the Connection instance. The root object behaves like a Python dictionary, so you can just add a new key/value pair for your application's root object. We'll insert an OOBTree object that will contain all the User objects. (The BTrees package is also included as part of Zope.)

dbroot = conn.root()

# Ensure that a 'userdb' key is present
# in the root
if 'userdb' not in dbroot:
    from BTrees.OOBTree import OOBTree
    dbroot['userdb'] = OOBTree()

userdb = dbroot['userdb']

Inserting a new user is simple: create the User object, fill it with data, insert it into the BTree instance, and commit this transaction.

# Create new User instance
import transaction

newuser = User()

# Add whatever attributes you want to track
newuser.id = 'amk'
newuser.first_name = 'Andrew'
newuser.last_name = 'Kuchling'

# Add object to the BTree, keyed on the ID
userdb[newuser.id] = newuser

# Commit the change
transaction.commit()

The transaction module defines a few top-level functions for working with transactions. commit() writes any modified objects to disk, making the changes permanent. abort() rolls back any changes that have been made, restoring the original state of the objects. If you're familiar with database transactional semantics, this is all what you'd expect. get() returns a Transaction object that has additional methods like note(), to add a note to the transaction metadata.

More precisely, the transaction module exposes an instance of the ThreadTransactionManager transaction manager class as transaction.manager, and the transaction functions get() and begin() redirect to the same-named methods of transaction.manager. The commit() and abort() functions apply the methods of the same names to the Transaction object returned by transaction.manager.get(). This is for convenience. It's also possible to create your own transaction manager instances, and to tell DB.open() to use your transaction manager instead.

Because the integration with Python is so complete, it's a lot like having transactional semantics for your program's variables, and you can experiment with transactions at the Python interpreter's prompt:

>>> newuser
<User instance at 81b1f40>
>>> newuser.first_name                       # Print initial value
'Andrew'
>>> newuser.first_name = 'Bob'               # Change first name
>>> newuser.first_name                       # Verify the change
'Bob'
>>> transaction.abort()                      # Abort transaction
>>> newuser.first_name                       # The value has changed back
'Andrew'
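Persistence by reachability, described earlier in this section, can be pictured with a toy traversal that needs no ZODB at all. The reachable() function and the plain User class below are invented for illustration and say nothing about how the ZODB actually walks objects; the sketch only shows that an object counts as "in the database" exactly when some chain of containment leads to it from the root.

```python
def reachable(root):
    """Collect every object reachable from `root` (toy traversal)."""
    seen = {}
    stack = [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen[id(obj)] = obj
        if isinstance(obj, dict):
            stack.extend(obj.values())
        elif isinstance(obj, (list, tuple)):
            stack.extend(obj)
        elif hasattr(obj, "__dict__"):
            stack.extend(vars(obj).values())   # follow instance attributes
    return list(seen.values())


class User:
    def __init__(self, user_id):
        self.id = user_id


root = {}                                 # plain dict standing in for conn.root()
stored = User("amk")
orphan = User("nobody")                   # never attached to the root
root["userdb"] = {stored.id: stored}

objects = reachable(root)
stored_is_reachable = any(o is stored for o in objects)
orphan_is_reachable = any(o is orphan for o in objects)
```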

2.6   Rules for Writing Persistent Classes

Practically all persistent languages impose some restrictions on programming style, warning against constructs they can't handle or adding subtle semantic changes, and the ZODB is no exception. Happily, the ZODB's restrictions are fairly simple to understand, and in practice it isn't too painful to work around them.

The summary of rules is as follows:

  • If you modify a mutable object that's the value of an object's attribute, the ZODB can't catch that, and won't mark the object as dirty. The solution is to either set the dirty bit yourself when you modify mutable objects, or use a wrapper for Python's lists and dictionaries (PersistentList, PersistentMapping) that will set the dirty bit properly.
  • Recent versions of the ZODB allow writing a class with __setattr__, __getattr__, or __delattr__ methods. (Older versions didn't support this at all.) If you write such a __setattr__ or __delattr__ method, its code has to set the dirty bit manually.
  • A persistent class should not have a __del__ method. The database moves objects freely between memory and storage. If an object has not been used in a while, it may be released and its contents loaded from storage the next time it is used. Since the Python interpreter is unaware of persistence, it would call __del__ each time the object was freed.

Let's look at each of these rules in detail.

2.6.1   Modifying Mutable Objects

The ZODB uses various Python hooks to catch attribute accesses, and can trap most of the ways of modifying an object, but not all of them. If you modify a User object by assigning to one of its attributes, as in userobj.first_name = 'Andrew', the ZODB will mark the object as having been changed, and it'll be written out on the following commit().

The most common idiom that isn't caught by the ZODB is mutating a list or dictionary. If User objects have an attribute named friends containing a list, calling userobj.friends.append(otherUser) doesn't mark userobj as modified; from the ZODB's point of view, userobj.friends was only read, and its value, which happened to be an ordinary Python list, was returned. The ZODB isn't aware that the object returned was subsequently modified.

This is one of the few quirks you'll have to remember when using the ZODB; if you modify a mutable attribute of an object in place, you have to manually mark the object as having been modified by setting its dirty bit to true. This is done by setting the _p_changed attribute of the object to true:

userobj._p_changed = True
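The blind spot can be reproduced without the ZODB. The Tracked class below is a made-up stand-in that records attribute assignments through __setattr__, which is only a rough analogy for the hooks the ZODB uses; the point is that appending to a list never goes through the owner's assignment hook.

```python
class Tracked:
    """Toy class that notices attribute assignment; not ZODB code."""

    def __init__(self):
        # Write through __dict__ so that setup does not count as a change.
        self.__dict__["changed"] = False
        self.__dict__["friends"] = []

    def __setattr__(self, name, value):
        self.__dict__[name] = value
        self.__dict__["changed"] = True


userobj = Tracked()

# In-place mutation: userobj.friends is only read, then the list is changed.
userobj.friends.append("bob")
changed_after_append = userobj.changed

# Assignment: this goes through __setattr__ and is noticed.
userobj.friends = userobj.friends + ["carol"]
changed_after_assign = userobj.changed
```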

You can hide the implementation detail of having to mark objects as dirty by designing your class's API to not use direct attribute access; instead, you can use the Java-style approach of accessor methods for everything, and then set the dirty bit within the accessor method. For example, you might forbid accessing the friends attribute directly, and add a get_friend_list() accessor and an add_friend() modifier method to the class. add_friend() would then look like this:

def add_friend(self, friend):
    # Mutate the list in place, then mark this object as dirty
    self.friends.append(friend)
    self._p_changed = True

Alternatively, you could use a ZODB-aware list or mapping type that handles the dirty bit for you. The ZODB comes with a PersistentMapping class, and I've contributed a PersistentList class that's included in my ZODB distribution, and may make it into a future upstream release of Zope.
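The idea behind such a wrapper type can be sketched with the standard library alone. ToyPersistentList is a hypothetical name, it overrides only two mutating methods, and it uses a plain changed flag instead of the real dirty bit, so treat it as an illustration of the approach rather than a substitute for the classes that ship with the ZODB.

```python
from collections import UserList


class ToyPersistentList(UserList):
    """Toy list that flags itself as changed when mutated."""

    changed = False   # stand-in for the dirty bit

    def append(self, item):
        super().append(item)
        self.changed = True

    def __setitem__(self, index, item):
        super().__setitem__(index, item)
        self.changed = True


friends = ToyPersistentList()
changed_before = friends.changed

friends.append("bob")        # the wrapper notices the mutation itself
changed_after = friends.changed
```

Because the list tracks its own modifications, the object that holds it no longer has to be marked as changed by hand. A complete wrapper would have to cover every mutating method (extend, insert, pop, sort, and so on).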

2.6.2   __getattr__, __delattr__, and __setattr__

ZODB allows persistent classes to have hook methods like __getattr__ and __setattr__. There are four special methods that control attribute access; the rules for each are a little different. The __getattr__ method works pretty much the same for persistent classes as it does for other classes. No special handling is needed. If an object is a ghost, then it will be activated before __getattr__ is called.

The other methods are more delicate. They will override the hooks provided by Persistent, so user code must call special methods to invoke those hooks anyway.

The __getattribute__ method will be called for all attribute access; it overrides the attribute access support inherited from Persistent. A user-defined __getattribute__ must always give the Persistent base class a chance to handle special attributes, as well as __dict__ or __class__. The user code should call _p_getattr, passing the name of the attribute as the only argument. If it returns True, the user code should call Persistent's __getattribute__ to get the value. If not, the custom user code can run.

A __setattr__ hook will also override the Persistent __setattr__ hook. User code must treat it much like __getattribute__. The user-defined code must call _p_setattr first to allow Persistent to handle special attributes; _p_setattr takes the attribute name and value. If it returns True, Persistent handled the attribute. If not, the user code can run. If the user code modifies the object's state, it must assign to _p_changed.

A __delattr__ hook must be implemented the same way as the last two hooks. The user code must call _p_delattr, passing the name of the attribute as an argument. If the call returns True, Persistent handled the attribute; if not, the user code can run.

2.6.3   __del__ methods

A __del__ method is invoked just before the memory occupied by an unreferenced Python object is freed. Because ZODB may materialize, and dematerialize, a given persistent object in memory any number of times, there isn't a meaningful relationship between when a persistent object's __del__ method gets invoked and any natural aspect of a persistent object's life cycle. For example, it is emphatically not the case that a persistent object's __del__ method gets invoked only when the object is no longer referenced by other objects in the database. __del__ is only concerned with reachability from objects in memory.

Worse, a __del__ method can interfere with the persistence machinery's goals. For example, some number of persistent objects reside in a Connection's memory cache. At various times, to reduce memory burden, objects that haven't been referenced recently are removed from the cache. If a persistent object with a __del__ method is so removed, and the cache was holding the last memory reference to the object, the object's __del__ method will be invoked. If the __del__ method then references any attribute of the object, ZODB needs to load the object from the database again, in order to satisfy the attribute reference. This puts the object back into the cache again: such an object is effectively immortal, occupying space in the memory cache forever, as every attempt to remove it from cache puts it back into the cache. In ZODB versions prior to 3.2.2, this could even cause the cache reduction code to fall into an infinite loop. The infinite loop no longer occurs, but such objects continue to live in the memory cache forever.

Because __del__ methods don't make good sense for persistent objects, and can create problems, persistent classes should not define __del__ methods.

2.7   Writing Persistent Classes

Now that we've looked at the basics of programming using the ZODB, we'll turn to some more subtle tasks that are likely to come up for anyone using the ZODB in a production system.

2.7.1   Changing Instance Attributes

Ideally, before making a class persistent you would get its interface right the first time, so that no attributes would ever need to be added, removed, or have their interpretation change over time. It's a worthy goal, but also an impractical one unless you're gifted with perfect knowledge of the future. Such unnatural foresight can't be required of any person, so you therefore have to be prepared to handle such structural changes gracefully. In object-oriented database terminology, this is a schema update. The ZODB doesn't have an actual schema specification, but you're changing the software's expectations of the data contained by an object, so you're implicitly changing the schema.

One way to handle such a change is to write a one-time conversion program that will loop over every single object in the database and update them to match the new schema. This can be easy if your network of object references is quite structured, making it easy to find all the instances of the class being modified. For example, if all User objects can be found inside a single dictionary or BTree, then it would be a simple matter to loop over every User instance with a for statement.

This is more difficult if your object graph is less structured; if User objects can be found as attributes of any number of different class instances, then there's no longer any easy way to find them all, short of writing a generalized object traversal function that would walk over every single object in a ZODB, checking each one to see if it's an instance of User. Some OODBs support a feature called extents, which allow quickly finding all the instances of a given class, no matter where they are in the object graph; unfortunately the ZODB doesn't offer extents as a feature.
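The easy case, where every User lives in one mapping, might look roughly like the following. Everything here is a hypothetical illustration: the schema change (splitting a single name attribute into first_name and last_name) is invented, the User class is a plain Python class, and an ordinary dictionary stands in for the BTree. In a real conversion program the class would be persistent and the loop would be followed by transaction.commit().

```python
class User:
    """Plain stand-in for the guide's persistent User class."""

    def __init__(self, user_id, name):
        self.id = user_id
        self.name = name          # old schema: a single 'name' attribute


def upgrade_user(user):
    """Convert one object to the new schema; return True if it was changed."""
    if hasattr(user, "first_name"):
        return False              # already converted, so the program can be rerun
    first, _, last = user.name.partition(" ")
    user.first_name = first
    user.last_name = last
    del user.name
    return True


# A dictionary standing in for the BTree that holds every User.
userdb = {"amk": User("amk", "Andrew Kuchling")}

converted = sum(upgrade_user(user) for user in userdb.values())
converted_on_rerun = sum(upgrade_user(user) for user in userdb.values())
```

Making the conversion safe to rerun is a design choice worth keeping: if the program is interrupted, it can simply be started again.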

3   ZEO

3.1   How ZEO Works

The ZODB, as I've described it so far, can only be used within a single Python process (though perhaps with multiple threads). ZEO, Zope Enterprise Objects, extends the ZODB machinery to provide access to objects over a network. The name "Zope Enterprise Objects" is a bit misleading; ZEO can be used to store Python objects and access them in a distributed fashion without Zope ever entering the picture. The combination of ZEO and ZODB is essentially a Python-specific object database.

ZEO consists of about 12,000 lines of Python code, excluding tests. The code is relatively small because it contains only code for a TCP/IP server, and for a new type of Storage, ClientStorage. ClientStorage simply makes remote procedure calls to the server, which then passes them on to a regular Storage class such as FileStorage. The following diagram lays out the system:

XXX insert diagram here later

Any number of processes can create a ClientStorage instance, and any number of threads in each process can be using that instance. ClientStorage aggressively caches objects locally, so in order to avoid using stale data the ZEO server sends an invalidation message to all the connected ClientStorage instances on every write operation. The invalidation message contains the object ID for each object that's been modified, letting the ClientStorage instances delete the old data for the given object from their caches.

This design decision has some consequences you should be aware of. First, while ZEO isn't tied to Zope, it was first written for use with Zope, which stores HTML, images, and program code in the database. As a result, reads from the database are far more frequent than writes, and ZEO is therefore better suited for read-intensive applications. If every ClientStorage is writing to the database all the time, this will result in a storm of invalidate messages being sent, and this might take up more processing time than the actual database operations themselves. These messages are small and sent in batches, so there would need to be a lot of writes before it became a problem.

On the other hand, for applications that have few writes in comparison to the number of read accesses, this aggressive caching can be a major win. Consider a Slashdot-like discussion forum that divides the load among several Web servers. If news items and postings are represented by objects and accessed through ZEO, then the most heavily accessed objects (the most recent or most popular postings) will very quickly wind up in the caches of the ClientStorage instances on the front-end servers. The back-end ZEO server will do relatively little work, only being called upon to return the occasional older posting that's requested, and to send the occasional invalidate message when a new posting is added. The ZEO server isn't going to be contacted for every single request, so its workload will remain manageable.
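The caching and invalidation scheme described above can be modelled in a few lines of plain Python. ToyServer and ToyClient are invented names, there is no networking, and the real ZEO protocol is considerably more involved; the sketch only shows why reads are cheap and why each write costs a message to every connected client.

```python
class ToyServer:
    """Toy stand-in for a ZEO server; not the real protocol."""

    def __init__(self):
        self.data = {}        # object ID -> stored value
        self.clients = []

    def connect(self, client):
        self.clients.append(client)

    def load(self, oid):
        return self.data[oid]

    def store(self, oid, value):
        self.data[oid] = value
        # On every write, tell all connected clients to drop their old copy.
        for client in self.clients:
            client.invalidate([oid])


class ToyClient:
    """Toy stand-in for a ClientStorage with a local cache."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.server_loads = 0     # how often the server had to be asked
        server.connect(self)

    def load(self, oid):
        if oid not in self.cache:
            self.cache[oid] = self.server.load(oid)
            self.server_loads += 1
        return self.cache[oid]

    def store(self, oid, value):
        self.server.store(oid, value)

    def invalidate(self, oids):
        for oid in oids:
            self.cache.pop(oid, None)


server = ToyServer()
front_a = ToyClient(server)
front_b = ToyClient(server)

front_a.store("post-1", "first draft")
front_b.load("post-1")
front_b.load("post-1")                      # served from front_b's cache
loads_before_edit = front_b.server_loads

front_a.store("post-1", "edited")           # invalidates front_b's cached copy
latest = front_b.load("post-1")
loads_after_edit = front_b.server_loads
```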

3.2   Installing ZEO

This section covers how to install the ZEO package, and how to configure and run a ZEO Storage Server on a machine.

3.2.1   Requirements

The ZEO server software is included in ZODB3. As with the rest of ZODB3, you'll need Python 2.3 or higher.

3.2.2   Running a server

The runzeo.py script in the ZEO directory can be used to start a server. Run it with the -h option to see the available options. If you're just experimenting, a good choice is to use python ZEO/runzeo.py -a /tmp/zeosocket -f /tmp/test.fs to run ZEO with a Unix domain socket and a FileStorage.

3.3   Testing the ZEO Installation

Once a ZEO server is up and running, using it is just like using ZODB with a more conventional disk-based storage; no new programming details are introduced by using a remote server. The only difference is that programs must create a ClientStorage instance instead of a FileStorage instance. From that point onward, ZODB-based code is happily unaware that objects are being retrieved from a ZEO server, and not from the local disk.

As an example, and to test whether ZEO is working correctly, try running the following lines of code, which will connect to the server, add some bits of data to the root of the ZODB, and commit the transaction:

from ZEO import ClientStorage
from ZODB import DB
import transaction

# Change next line to connect to your ZEO server
addr = 'kronos.example.com', 1975
storage = ClientStorage.ClientStorage(addr)
db = DB(storage)
conn = db.open()
root = conn.root()

# Store some things in the root
root['list'] = ['a', 'b', 1.0, 3]
root['dict'] = {'a':1, 'b':4}

# Commit the transaction
transaction.commit()

If this code runs properly, then your ZEO server is working correctly. You can also use a configuration file.

    <zodb>
        <zeoclient>
        server localhost:9100
        </zeoclient>
    </zodb>

One nice feature of the configuration file is that you don't need to specify imports for a specific storage. That makes the code a little shorter and allows you to change storages without changing the code.

import ZODB.config

db = ZODB.config.databaseFromURL('/tmp/zeo.conf')

3.4   ZEO Programming Notes

ZEO is written using asyncore, from the Python standard library. It assumes that some part of the user application is running an asyncore mainloop. For example, Zope runs the loop in a separate thread and ZEO uses that. If your application does not have a mainloop, ZEO will not process incoming invalidation messages until you make some call into ZEO. The Connection.sync method can be used to process pending invalidation messages. You can call it when you want to make sure the Connection has the most recent version of every object, but you don't have any other work for ZEO to do.

3.5   Sample Application: chatter.py

For an example application, we'll build a little chat application. What's interesting is that none of the application's code deals with network programming at all; instead, an object will hold chat messages, and be magically shared between all the clients through ZEO. I won't present the complete script here; it's included in my ZODB distribution, and you can download it from http://www.amk.ca/zodb/demos/. Only the interesting portions of the code will be covered here.

The basic data structure is the ChatSession object, which provides an add_message() method that adds a message, and a new_messages() method that returns a list of new messages that have accumulated since the last call to new_messages(). Internally, ChatSession maintains a B-tree that uses the time as the key, and stores the message as the corresponding value. The constructor for ChatSession is pretty simple; it simply creates an attribute containing a B-tree:

class ChatSession(Persistent):
    def __init__(self, name):
        self.name = name
        # Internal attribute: _messages holds all the chat messages.
        self._messages = BTrees.OOBTree.OOBTree()

add_message() has to add a message to the _messages B-tree. A complication is that it's possible that some other client is trying to add a message at the same time; when this happens, the client that commits first wins, and the second client will get a ConflictError exception when it tries to commit. For this application, ConflictError isn't serious but simply means that the operation has to be retried; other applications might treat it as a fatal error. The code uses try...except...else inside a while loop, breaking out of the loop when the commit works without raising an exception.

def add_message(self, message):
    """Add a message to the channel.
    message -- text of the message to be added
    """

    while 1:
        try:
            now = time.time()
            self._messages[now] = message
            transaction.commit()
        except ConflictError:
            # Conflict occurred; this process should pause and
            # wait for a little bit, then try again.
            # Discard the failed transaction first (see section 4.1).
            transaction.abort()
            time.sleep(.2)
        else:
            # No ConflictError exception raised, so break
            # out of the enclosing while loop.
            break
    # end while
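
The retry loop in add_message() can be factored into a small helper. What follows is only a sketch that runs without ZODB installed: the ConflictError class is a local stand-in for ZODB.POSException.ConflictError, and retry_on_conflict and FlakyCommit are hypothetical names introduced for illustration. In a real program the operation would modify persistent objects and commit, and would abort the failed transaction before the next attempt.

```python
import time


class ConflictError(Exception):
    """Local stand-in for ZODB.POSException.ConflictError (assumption)."""


def retry_on_conflict(operation, max_attempts=5, delay=0.0):
    """Call operation() until it succeeds or max_attempts is reached.

    Returns a (result, attempts_used) tuple. The last ConflictError is
    re-raised if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = operation()
        except ConflictError:
            if attempt == max_attempts:
                raise              # give up and let the caller decide
            time.sleep(delay)      # pause briefly before trying again
        else:
            # No conflict was raised, so stop retrying.
            return result, attempt


class FlakyCommit(object):
    """Hypothetical operation that conflicts a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures

    def __call__(self):
        if self.failures > 0:
            self.failures -= 1
            raise ConflictError("simulated write conflict")
        return "committed"


# Two simulated conflicts, then success on the third attempt.
outcome = retry_on_conflict(FlakyCommit(failures=2))
```

Capping the number of attempts is a design choice of this sketch; the loop in add_message() retries indefinitely, which is reasonable for a chat client but may not suit every application.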

new_messages() introduces the use of volatile attributes. Attributes of a persistent object that begin with _v_ are considered volatile and are never stored in the database. new_messages() needs to store the last time the method was called, but if the time was stored as a regular attribute, its value would be committed to the database and shared with all the other clients. new_messages() would then return the new messages accumulated since any other client called new_messages(), which isn't what we want.

def new_messages(self):
    "Return new messages."

    # self._v_last_time is the time of the most recent message
    # returned to the user of this class.
    if not hasattr(self, '_v_last_time'):
        self._v_last_time = 0

    new = []
    T = self._v_last_time

    for T2, message in self._messages.items():
        if T2 > T:
            # Collect the message and remember how far we have read.
            new.append(message)
            self._v_last_time = T2

    return new
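
The effect of the _v_ prefix can be imitated without ZODB. The sketch below is a stand-in written for illustration, not ZODB's own code: VolatileAware and Session are hypothetical classes, and the save/load round trip is simulated by calling __getstate__() and __setstate__() by hand. It only shows that a volatile attribute stays with the object in the current process and is absent from the state that would be stored.

```python
class VolatileAware(object):
    """Imitates how a persistent object leaves out _v_ attributes."""

    def __getstate__(self):
        # Keep only attributes whose names do not start with _v_.
        return dict((name, value)
                    for name, value in self.__dict__.items()
                    if not name.startswith('_v_'))

    def __setstate__(self, state):
        self.__dict__.update(state)


class Session(VolatileAware):
    def __init__(self, name):
        self.name = name           # ordinary attribute: stored
        self._v_last_time = 0      # volatile attribute: never stored


session = Session('python')
session._v_last_time = 981126744.98

# Simulate storing the object and loading it in another client.
stored_state = session.__getstate__()
restored = Session.__new__(Session)    # created without calling __init__
restored.__setstate__(stored_state)
```

Because the restored object has no _v_last_time at all, code that reads a volatile attribute has to cope with its absence, which is why new_messages() starts with a hasattr() check.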

This application is interesting because it uses ZEO to easily share a data structure; ZEO and ZODB are being used for their networking ability, not primarily for their data storage ability. I can foresee many interesting applications using ZEO in this way:

  • With a Tkinter front-end, and a cleverer, more scalable data structure, you could build a shared whiteboard using the same technique.
  • A shared chessboard object would make writing a networked chess game easy.
  • You could create a Python class containing a CD's title and track information. To make a CD database, a read-only ZEO server could be opened to the world, or an HTTP or XML-RPC interface could be written on top of the ZODB.
  • A program like Quicken could use a ZODB on the local disk to store its data. This avoids the need to write and maintain specialized I/O code that reads in your objects and writes them out; instead you can concentrate on the problem domain, writing objects that represent cheques, stock portfolios, or whatever.

4   Transactions and Versioning

4.1   Committing and Aborting

Changes made during a transaction don't appear in the database until the transaction commits. This is done by calling the commit() method of the current Transaction object, where the latter is obtained from the get() method of the current transaction manager. If the default thread transaction manager is being used, then transaction.commit() suffices. Similarly, a transaction can be explicitly aborted (all changes within the transaction thrown away) by invoking the abort() method of the current Transaction object, or simply transaction.abort() if using the default thread transaction manager.

Prior to ZODB 3.3, if a commit failed (meaning the commit() call raised an exception), the transaction was implicitly aborted and a new transaction was implicitly started. This could be very surprising if the exception was suppressed, and especially if the failing commit was one in a sequence of subtransaction commits. So, starting with ZODB 3.3, if a commit fails, all further attempts to commit, join, or register with the transaction raise ZODB.POSException.TransactionFailedError. You must then explicitly start a new transaction, either by calling the abort() method of the current transaction, or by calling the begin() method of the current transaction's transaction manager.

4.2   Subtransactions

Subtransactions can be created within a transaction. Each subtransaction can be individually committed and aborted, but the changes within a subtransaction are not truly committed until the containing transaction is committed. The primary purpose of subtransactions is to decrease the memory usage of transactions that touch a very large number of objects. Consider a transaction during which 200,000 objects are modified. All the objects that are modified in a single transaction have to remain in memory until the transaction is committed, because the ZODB can't discard them from the object cache. This can potentially make the memory usage quite large. With subtransactions, a commit can be performed at intervals, say, every 10,000 objects. Those 10,000 objects are then written to permanent storage and can be purged from the cache to free more space. To commit a subtransaction instead of a full transaction, pass a true value to the commit() or abort() method of the Transaction object.

# Commit a subtransaction
transaction.commit(True)

# Abort a subtransaction
transaction.abort(True)

A new subtransaction is automatically started after the previous subtransaction has been successfully committed or aborted.
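
The "commit every 10,000 objects" idea can be sketched as a batching loop. This is a minimal illustration, not part of ZODB: modify_in_batches is a hypothetical helper, and the commit step is passed in as a callable so the example runs on its own. With the API described above, that callable would pass a true value to commit(), for instance lambda: transaction.commit(True).

```python
def modify_in_batches(objects, modify, commit_subtransaction,
                      batch_size=10000):
    """Apply modify() to each object, committing every batch_size objects.

    Returns the number of subtransaction commits performed.
    """
    commits = 0
    pending = 0
    for obj in objects:
        modify(obj)
        pending += 1
        if pending == batch_size:
            commit_subtransaction()
            commits += 1
            pending = 0
    if pending:
        # Flush the final, partially filled batch.
        commit_subtransaction()
        commits += 1
    return commits


# Stand-in demonstration: dictionaries instead of persistent objects,
# and a list that records each simulated subtransaction commit.
records = [{'value': n} for n in range(25)]
commit_log = []


def bump(record):
    record['value'] += 1


count = modify_in_batches(records, bump,
                          lambda: commit_log.append('sub'),
                          batch_size=10)
```

Because subtransaction changes are not truly committed until the containing transaction commits, real code would still finish with a full transaction.commit() after the loop.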

4.3   Undoing Changes

Some types of Storage support undoing a transaction even after it's been committed. You can tell if this is the case by calling the supportsUndo() method of the DB instance, which returns true if the underlying storage supports undo. Alternatively you can call the supportsUndo() method on the underlying storage instance.

If a database supports undo, then the undoLog(start, end[, func]) method on the DB instance returns the log of past transactions, returning transactions between the times start and end, measured in seconds from the epoch. If present, func is a function that acts as a filter on the transactions to be returned; it's passed a dictionary representing each transaction, and only transactions for which func returns true will be included in the list of transactions returned to the caller of undoLog(). The dictionary contains keys for various properties of the transaction. The most important keys are id, for the transaction ID, and time, for the time at which the transaction was committed.

>>> print storage.undoLog(0, sys.maxint)
[{'description': '',
  'time': 981126744.98,
  'user_name': ''},
 {'description': '',
  'time': 981126478.202,
  'user_name': ''}
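
A filter function for undoLog() only has to accept one of these dictionaries and return a true or false value. The sketch below assumes the dictionary keys described above; committed_between and described_as are hypothetical helper names, and sample_log is stand-in data shaped like the output shown (the description text is made up for the example), so the filters are applied to it directly rather than through a real storage.

```python
def committed_between(start, end):
    """Build a filter accepting transactions committed in [start, end]."""
    def accept(info):
        # info is the dictionary describing one transaction.
        return start <= info['time'] <= end
    return accept


def described_as(text):
    """Build a filter accepting transactions whose description has text."""
    def accept(info):
        return text in info['description']
    return accept


# Stand-in log entries; a real log comes from undoLog().
sample_log = [
    {'description': 'Change ownership',
     'time': 981126744.98,
     'user_name': ''},
    {'description': '',
     'time': 981126478.202,
     'user_name': ''},
]

in_window = committed_between(981126500, 981127000)
recent = [info for info in sample_log if in_window(info)]

about_ownership = described_as('ownership')
noted = [info for info in sample_log if about_ownership(info)]
```

Building the filter with a closure keeps the function passed as func to a single argument, which is what the description of undoLog() calls for.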

To store a description and a user name on a commit, get the current transaction and call the note(text) method to store a description, and the setUser(user_name) method to store the user name. While setUser() overwrites the current user name and replaces it with the new value, the note() method always adds the text to the transaction's description, so it can be called several times to log several different changes made in the course of a single transaction.

transaction.get().note('Change ownership')

To undo a transaction, call the DB.undo(id) method, passing it the ID of the transaction to undo. If the transaction can't be undone, a ZODB.POSException.UndoError exception will be raised, with the message "non-undoable transaction". Usually this will happen because later transactions modified the objects affected by the transaction you're trying to undo. After you call undo() you must commit the transaction for the undo to actually be applied. [1]

There is one glitch in the undo process. The thread that calls undo may not see the changes to the object until it calls Connection.sync() or commits another transaction.

[1] There are actually two different ways a storage can implement the undo feature. Most of the storages that ship with ZODB use the transactional form of undo described in the main text. Some storages may use a non-transactional undo that makes changes visible immediately.

4.4   Versions


Versions should be avoided. They're going to be deprecated, replaced by better approaches to long-running transactions.

While many subtransactions can be contained within a single regular transaction, it's also possible to contain many regular transactions within a long-running transaction, called a version in ZODB terminology. Inside a version, any number of transactions can be created and committed or rolled back, but the changes within a version are not made visible to other connections to the same ZODB.

Not all storages support versions, but you can test for versioning ability by calling the supportsVersions() method of the DB instance, which returns true if the underlying storage supports versioning.

A version can be selected when creating the Connection instance using the DB.open([version]) method. The version argument must be a string that will be used as the name of the version:

vers_conn = db.open(version='Working version')

Transactions can then be committed and aborted using this versioned connection. Other connections that don't specify a version, or provide a different version name, will not see changes committed within the version named Working version. To commit or abort a version, which will either make the changes visible to all clients or roll them back, call the DB.commitVersion() or DB.abortVersion() methods. XXX what are the source and dest arguments for?

The ZODB makes no attempt to reconcile changes between different versions. Instead, the first version which modifies an object will gain a lock on that object. Attempting to modify the object from a different version or from an unversioned connection will cause a ZODB.POSException.VersionLockError to be raised:

from ZODB.POSException import VersionLockError

try:
    transaction.commit()
except VersionLockError, (obj_id, version):
    print ('Cannot commit; object %s '
           'locked by version %s' % (obj_id, version))

The exception provides the ID of the locked object, and the name of the version having a lock on it.

4.5   Multithreaded ZODB Programs

ZODB databases can be accessed from multithreaded Python programs. The Storage and DB instances can be shared among several threads, as long as individual Connection instances are created for each thread.
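
One way to follow this rule is to keep each thread's connection in thread-local storage. The sketch below shows only the pattern and runs without ZODB: FakeDB is a stand-in that imitates nothing but an open() method, and PerThreadConnections is a hypothetical helper name. With a real database, db would be the shared DB instance and open() would return a Connection.

```python
import threading


class FakeDB(object):
    """Stand-in for a shared DB object; only open() is imitated."""

    def __init__(self):
        self._lock = threading.Lock()
        self.opened = 0

    def open(self):
        with self._lock:
            self.opened += 1
            return 'connection-%d' % self.opened


class PerThreadConnections(object):
    """Hand each thread its own connection from one shared DB."""

    def __init__(self, db):
        self._db = db
        self._local = threading.local()

    def get(self):
        # Open a connection the first time a thread asks, then reuse it.
        if not hasattr(self._local, 'conn'):
            self._local.conn = self._db.open()
        return self._local.conn


db = FakeDB()
connections = PerThreadConnections(db)
seen = []
seen_lock = threading.Lock()


def worker():
    first = connections.get()
    second = connections.get()   # same thread, so the same connection
    with seen_lock:
        seen.append((first, second))


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each of the four threads ends up with a distinct connection, while the DB object itself is shared by all of them.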

6   A Resources

7   B GNU Free Documentation License

Version 1.1, March 2000