
from hell import interesting_revelations

Somewhat inspired by the philosophical thickets in the depths of one of the more fundamental discussions on python-list aka comp.lang.python, I wrote a little function in C that grossly violates the Python object model's integrity and swaps two object structures in place. What I wasn't expecting was that this could be used to illustrate some interesting facets of the CPython internals.

The original version looked like this:

static PyObject *
swap(PyObject *self, PyObject *args)
{
    PyObject *obj1, *obj2;
    Py_ssize_t len;
    PyObject *temp;

    if (!PyArg_ParseTuple(args, "OO", &obj1, &obj2)) {
        return NULL;
    }

    len = obj1->ob_type->tp_basicsize;
    if (obj2->ob_type->tp_basicsize != len) {
        PyErr_SetString(PyExc_TypeError, "types have different sizes (incompatible)");
        return NULL;
    }

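    temp = PyMem_Malloc(len);
    if (temp == NULL) {
        return PyErr_NoMemory();
    }
    memcpy(temp, obj1, len);
    memcpy(obj1, obj2, len);
    memcpy(obj2, temp, len);
    /* The memcpy calls swapped the reference counts too; put them back. */
    obj2->ob_refcnt = obj1->ob_refcnt;
    obj1->ob_refcnt = temp->ob_refcnt;
    PyMem_Free(temp);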

    Py_INCREF(Py_None);
    return Py_None;
}

Simple: get the object size in memory, and swap using a temporary variable. This sort of works — but not quite.

Python 3.1.2 (release31-maint, Jul  8 2010, 09:18:08) 
[GCC 4.4.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from hell import swap
>>> a = "this is the first string"
>>> b = "this is the second string!"
>>> swap(a,b)
>>> a
'this is the second string!'
>>> b
'this is the first string'
>>> t1 = (1,2,3)
>>> t2 = (a,)
>>> swap(t1, t2)
>>> t1
(1,)
>>> t2
zsh: segmentation fault  python3

As you can see, it swapped the strings without any problems (I'll show you some problems further below), but it behaved strangely with the tuples: the new t1 does have only one element, like the old t2, but that one element is the first element of the old t1! Also, what the flip happens when you try to access t2?

Turns out tuple is a variable-size type. That means a tuple can be created with any number of items, and its size in memory depends on how many items it holds. My original code only respected the “basic size” of the type (tp_basicsize), meaning that, in the case of tuples, it copied the information on how many items there are, but not the actual items. When trying to print t2, Python reads beyond the end of the tuple structure, probably dereferences an invalid pointer, and dies a painful death.
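
The two sizes can be inspected from within Python itself. Here is a minimal sketch, assuming a reasonably recent CPython: __basicsize__ and __itemsize__ are how the interpreter exposes tp_basicsize and tp_itemsize, and the helper name expected_size is made up for this example.

```python
def expected_size(tup):
    """Fixed part of the tuple structure plus one pointer slot per item."""
    return tuple.__basicsize__ + len(tup) * tuple.__itemsize__


small = (None,)
large = (1, 2, 3)

# __sizeof__() reports the size of the object structure itself,
# without the garbage collector's bookkeeping header.
print(small.__sizeof__(), expected_size(small))
print(large.__sizeof__(), expected_size(large))
```

Each pair of numbers should match, and the two tuples should differ by exactly two item slots, which is the part my original swap never copied.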

On a side note, Python's list type is, contrary to what you might expect, not a variable-size type — it cannot be, since, in the case of variable-size types, the length must be known when the object is created (and allocated), and can never change. (The reason is that realloc(3)-ing an object might move it, which would invalidate pointers, which is when all hell would break loose). Lists don't keep their items in the actual object structure, they simply keep a pointer.
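
A small sketch of that difference as seen from the Python side (assuming CPython; the variable names are made up for the illustration). A list reports an item size of zero, and the list object stays at the same address however much it grows:

```python
items = []
address_before = id(items)   # in CPython, id() is the object's memory address

for number in range(10000):
    items.append(number)     # only the separately allocated item array grows

address_after = id(items)

print(list.__itemsize__)                 # 0: no items inside the list structure
print(tuple.__itemsize__ > 0)            # tuples keep their items inline
print(address_before == address_after)   # the list object itself never moved
```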

Armed with the knowledge of variable-size types, we can fix hell.swap to work for tuples:

    len1 = obj1->ob_type->tp_basicsize
           + ((PyVarObject*)obj1)->ob_size * obj1->ob_type->tp_itemsize;

    len2 = obj2->ob_type->tp_basicsize
           + ((PyVarObject*)obj2)->ob_size * obj2->ob_type->tp_itemsize;

    if (len1 != len2) {
        PyErr_SetString(PyExc_TypeError, "objects have different sizes (incompatible)");
        return NULL;
    }

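    temp = PyMem_Malloc(len1);
    if (temp == NULL) {
        return PyErr_NoMemory();
    }
    memcpy(temp, obj1, len1);
    memcpy(obj1, obj2, len1);
    memcpy(obj2, temp, len1);
    /* The memcpy calls swapped the reference counts too; put them back. */
    obj2->ob_refcnt = obj1->ob_refcnt;
    obj1->ob_refcnt = temp->ob_refcnt;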
    PyMem_Free(temp);
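
The size check can be mirrored in pure Python to predict which pairs the fixed swap will accept. This is only a sketch: object_size and swappable are hypothetical helpers, not part of the hell module, and len() stands in for ob_size, which is only right for types like tuple where the two are the same number.

```python
def object_size(obj):
    """tp_basicsize + ob_size * tp_itemsize, as seen from Python.

    len() stands in for ob_size here, which is only valid for
    types such as tuple where the two are the same number.
    """
    kind = type(obj)
    if kind.__itemsize__ == 0:
        # Not a variable-size type: the basic size is the whole story.
        return kind.__basicsize__
    return kind.__basicsize__ + len(obj) * kind.__itemsize__


def swappable(obj1, obj2):
    """True if both objects occupy the same number of bytes."""
    return object_size(obj1) == object_size(obj2)


print(swappable((1, 2, 3), ("a", "b", "erm...")))  # same length: sizes match
print(swappable((1, 2, 3), (None,)))               # different lengths: rejected
```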

Recompile, and we're ready for more apocalyptic idiocy. This time, after checking that tuples actually work as expected, we will be swapping strings in the wrong place to the great detriment of our sanity.

Python 3.1.2 (release31-maint, Jul  8 2010, 09:18:08) 
[GCC 4.4.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from hell import swap
>>> t1, t2, t3 = (1,2,3), (None,), ("a", "b", "erm...")
>>> swap(t1, t2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: objects have different sizes (incompatible)
>>> swap(t1, t3)
>>> t1
('a', 'b', 'erm...')
>>> t3
(1, 2, 3)
>>> s = set(t1)
>>> swap(t1[0], t1[1])
>>> s
{'b', 'erm...', 'a'}
>>> 'b' in s
False
>>> 'a' in s
False
>>> 'erm...' in s
True
>>> 'a' in list(s)
True
>>> 'b' in list(s)
True
>>> 

Okay, erm, what? It looks like it's there, but it's not, but then it is? There is, of course, a simple explanation for this:

Sets (like dicts) are, for speed, implemented as hash tables. When you look something up in a set or dict, Python first calculates its hash, and then searches for that. However, since it's possible for two different objects to have the same hash, it also checks for equality. You will only get a result when the table holds an object that both has the same hash and compares equal.
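
To see both steps at work without any C, here is a minimal sketch with a made-up class, Key, whose instances all share one hash on purpose. The set can still tell them apart, because equality has the final say:

```python
class Key:
    """Hypothetical key type: every instance collides on the same hash."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42  # deliberately identical for all instances

    def __eq__(self, other):
        return isinstance(other, Key) and self.name == other.name


s = {Key("a"), Key("b")}

print(len(s))         # both keys are stored despite the shared hash
print(Key("a") in s)  # same hash and equal: found
print(Key("c") in s)  # same hash, but equal to nothing in the set: not found
```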

So, what happens here is: when you execute 'a' in s, the hash of 'a' is calculated, and all the items of the set that are filed under that hash are checked to see whether they actually are 'a'. Since the swap, however, the hash of 'a' is associated with 'b' and vice versa. The set is corrupted because it assumes, quite reasonably, that the hash of an object either never changes or does not, in fact, exist (lists, for example, aren't hashable at all, since they're mutable, and the hash would have to change when the object changes, which would defeat the whole point).
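
You don't need hell.swap to get a similarly confused set. The sketch below uses a made-up class, Label, whose hash depends on a mutable attribute (exactly what a well-behaved type must not do), and changes that attribute after the object has been filed in the set:

```python
class Label:
    """Hypothetical badly behaved type: its hash follows a mutable attribute."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return ord(self.name)  # a deterministic hash, just for the demo

    def __eq__(self, other):
        return isinstance(other, Label) and self.name == other.name


a = Label("a")
s = {a}          # filed under the hash of "a"
a.name = "b"     # the object now hashes differently; the set is not told

print(a in s)            # looked up under the new hash: not found
print(Label("a") in s)   # old hash matches, but the object is no longer equal
print(a in list(s))      # a plain list scan ignores hashes and finds it
```

As in the session above, the object is visibly in the set, yet neither its old nor its new value can be looked up.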

There you have it: that's what you get when you muck around in Python's memory.

I've uploaded the source code to JollyBOX code. Use it wisely.

>>> from hell import swap
>>> swap(str, int)
zsh: segmentation fault  python3
%

On the evolution of snakes.

It's been a number of years since I first learned programming in Python with Mark Pilgrim's excellent, but now somewhat outdated, book, Dive Into Python. It has managed to become outdated because the Python language is being developed and improved all the time and new features are being added. One of the best features of Python is, besides the standard library, arguably the documentation, which is good enough to include What's New documents for every release.

I've decided to have a look at the backlog of new features, and consider how I use Python today in ways that simply didn't exist when I originally came across the language.

Read about my findings after the break. (Technical language is used. Knowledge of Python and its features is presumed.)

Read more...

Auto-poweroff that server in your cellar

Our cellar houses an old grey box that acts as a home server for the family. It's quite useful in a number of ways, as a file server, web server, database server, and so on. It also, traditionally, had a habit of wasting power—it's so much nicer to just have the machine running when you use it. But, with the wonders of Wake on LAN, even the “I'm too lazy to run into the cellar” argument has lost any validity it might have had.

So much for turning the box on, how about turning it off? Figuring out when nobody is using the machine and then remembering to turn it off as well is hardly a task for a mere mortal. So I wrote a script that does it for me. is_anyone_here.py checks whether anyone is logged in, and looks for any evidence of recent usage. It was written on/for a Debian GNU/Linux (lenny) system with vsftpd and samba, and may require some modifications to work properly in your environment.

Have a look at the whole script after the break.

Read more...

RELEASE: Roxoptr 0.2

The next version of ROXOPTR2 has landed! Version 0.2 brings with it a new look, with completely new levels, and much, much more.

Roxoptr2 is a simple platformer-style helicopter game in which you pilot a small helicopter around obstacles of different shapes and sizes. Don't hit anything!

Roxoptr2

Download:

More info on the game's home page and the WiiBrew page.

roxoptr 0.2

The stunning new artwork is all thanks to Mr_Nick666.

Roxoptr2 is free software. You may play, copy, and modify it under the terms of the MIT license.

RELEASE: roxoptr2-0.2~a1

Today, I release the first wrapped-up public version of Roxoptr2, v0.2~a1, second in the video game franchise nobody has ever heard of, "rockopter". Roxoptr2 is a simple 2D side-scrolling game in which you pilot a helicopter around things. It is written in POSIX C using the SDL library (also SDL_image, SDL_ttf and zlib) and runs on UNIX systems such as GNU/Linux and FreeBSD (and probably Mac OS X) as well as the Nintendo Wii and Microsoft Windows.

Wiki page: Software/Roxoptr

Source tarball: roxoptr2-0.2~a1.tar.gz
Wii homebrew binaries: roxoptr2-0.2~a1_wii.zip
Win32 binaries: roxoptr2-0.2~a1_win32.zip

Mercurial repository: http://code.jollybox.de/hg/roxoptr2

Creating the perfect keyboard layout

After having read this post's title, you might have thought “ah, he'll be presenting [insert favourite subculture keyboard layout, e.g. neo]! nice!”. If so, you'd be wrong. If, on the other hand, you're thinking “perfect keyboard layout? There's no such thing!”, then I couldn't agree more. Anyway, …

I have been using the standard US keyboard layout for years, almost always without actually using an American keyboard. The main reason I chose it over the German layout is that characters like []{}\|/`, used in many programming languages, are placed in a civilized manner, meaning I can type them quickly and without breaking my fingers.

The standard US layout has a certain problem, though: when it comes to typing in languages that don't happen to be English, it fails spectacularly. Since I have to write quite a lot of German and, nowadays, French, on my computer(s), this is quite a drawback.

Umn, I fixed it

It's not that hard to create your own keyboard layout, so that is what I have done. I chose to use a standard US layout as a base, leaving every single key binding intact, and using level-3, i.e. AltGr, bindings to provide the missing characters.

The German umlauts and ligature ÄäÖöÜüßẞ (the last character is the capital ß) I decided to map to the most obvious places imaginable: on the A, O, U and S keys, so AltGr+Shift+U produces Ü.

I created support for most Romance languages by adding Çç to the C key (as above), Ññ to the N key, and a number of hidden dead keys: AltGr+' e renders é, AltGr+" e renders ë; the keys for `, ~ and ^ act equivalently. The characters Ææ, Œœ, Øø and Åå are on the W, I, Q and Z keys, respectively, ensuring full support for French, Danish, Norwegian, Swedish, and probably other languages. The Esperanto alphabet is completed by the dead circumflex ˆ and AltGr+y, rendering ŭ. Some other possibly useful characters, besides the quotes „ « » ‹ › “ ”, are ¿ ¡ € £ ‰ ? ? ? § ¦ - —. If you're really interested in the details of the layout, please, try it out!

Yes, you can have it

For X11 (Linux and other Unices): us_tj2.tgz.

For Microsoft Windows: us_tj2c.zip. (older version, missing a number of characters. German, French, and Spanish are supported equally.)