How to fix munged file ownerships

Or, Fun with Xen and LVM

It is wise to use a filesystem snapshot when moving a virtual machine disk image to a new machine. When using rsync from the system hosting the VM, it is usually necessary to use the switch "--numeric-ids" to prevent rsync from trying to remap the UID and GID of each file by name, using the host and destination machines' /etc/passwd and /etc/group files.

When you forget the "--numeric-ids" switch, and the host machine has an /etc/passwd file that differs from the VM's, your VM will have some of its UIDs and GIDs changed. The VM may well boot and appear to work correctly, but there will be problems. Often services will not start correctly.
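To make the failure concrete, here is a small Python 3 simulation of name-based ID mapping as rsync's documentation describes it: the sender turns each numeric ID into a name, the receiver turns the name back into a number, ID 0 is never mapped, and an ID with no name or no match is kept as-is. The function name remap_id and the example tables are made up for illustration; this is a sketch of the idea, not rsync's code.

```python
def remap_id(numeric_id, sender_names, receiver_ids):
    """Return the ID a file ends up with after a name-based transfer.

    sender_names: {id: name} as seen by the sending side (the host)
    receiver_ids: {name: id} as seen by the receiving side
    """
    if numeric_id == 0:
        return 0  # ID 0 is never mapped by name
    name = sender_names.get(numeric_id)
    if name is None:
        return numeric_id  # no name on the sender: numeric ID is kept
    return receiver_ids.get(name, numeric_id)  # no match: numeric ID is kept


# Hypothetical tables. Inside the VM, UID 105 might be the mail service,
# but the host running rsync knows UID 105 as "ntp".
host_names = {0: "root", 105: "ntp", 1000: "sim"}
dest_ids = {"root": 0, "ntp": 112, "sim": 1000}

print(remap_id(105, host_names, dest_ids))  # 112: the VM's files changed owner
```

The VM's own /etc/passwd never enters into it, which is why the result looks arbitrary from inside the VM.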

If you catch the problem quickly, you can fix it by redoing the rsync with the added --numeric-ids switch. On the other hand, if you migrate a few VMs and then, say, go on vacation for a week while your users mutate the filesystem, you may have real trouble sorting things out when you return.

Here is one way to fix this problem:

Recover the ownership information

The first thing to do is attempt to create a list of qualified filenames each associated with the correct UID and GID.

Hopefully a copy of the original filesystem exists somewhere. Find the most recent un-munged copy and mount it somewhere, /mnt/tmp for instance.

You can use the GNU find utility to generate a list of files (the -fprintf action is a GNU extension). I'm using a null byte to delimit record fields to protect against filenames that contain spaces and newlines. The format of the output file will be UID[null]GID[null]filename[null]. Because the record has a fixed number of fields, I'm not using a record delimiter. Here is the find command:

sudo find /mnt/tmp/ -fprintf file_manifest "%U\000%G\000/%P\000"

This will take a little while, depending on the number of files.
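Before going further it may be worth confirming that the manifest reads back cleanly. The following is a minimal Python 3 sketch of a reader for the UID[null]GID[null]filename[null] format; the name read_manifest and its bufsize default are chosen for illustration only.

```python
import os
import tempfile


def read_manifest(path, bufsize=65536):
    """Yield (uid, gid, filepath) tuples from a null-delimited manifest."""
    with open(path, "rb") as fp:
        pending = b""   # bytes of a field that has not been terminated yet
        fields = []     # complete fields not yet grouped into a record
        while True:
            chunk = fp.read(bufsize)
            if not chunk:
                break
            parts = (pending + chunk).split(b"\0")
            pending = parts.pop()  # incomplete trailing field, if any
            fields.extend(parts)
            while len(fields) >= 3:
                uid, gid, name = fields[:3]
                del fields[:3]
                # fsdecode keeps odd filename bytes round-trippable
                yield int(uid), int(gid), os.fsdecode(name)
        if pending or fields:
            raise ValueError("manifest ends with an incomplete record")


# Demo with a made-up manifest, including a filename containing a newline.
sample = b"\0".join(
    [b"0", b"0", b"/etc/passwd", b"33", b"33", b"/var/www/odd\nname"]
) + b"\0"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
for record in read_manifest(tmp.name):
    print(record)
os.unlink(tmp.name)
```

Counting the records and comparing against the number of files on the mounted copy is a cheap sanity check.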

Interlude

Now is an excellent time to consider creating a snapshot of the live VM.

sudo lvcreate -L 500mb -s -n beta-snapshot-25-aug-2009 /dev/mirror/beta.puddle.ca-disk

Restore the file ownerships on the live VM

The elegant way to unmunge this mess would be to figure out what got mapped to what, then use find with chown in just the right order to change only the files that need it. That sounds a little too tricky to me, and in fact, the multiple invocations of find this would require may well be less efficient than one pass over the list of files. Suffice it to say that I will be using a brute force approach and iterating over the entire list just created.
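Even with the brute force approach, it can be reassuring to see what got mapped to what before changing anything. Here is a rough Python 3 sketch that tallies the differences without touching the filesystem. The function name observed_remaps and the root parameter are assumptions for illustration, and records is assumed to be an iterable of (uid, gid, path) tuples already parsed from the manifest.

```python
import os
import tempfile
from collections import Counter


def observed_remaps(records, root="/"):
    """Count (recorded_id, current_id) pairs that differ, for UIDs and GIDs.

    records: iterable of (uid, gid, path) taken from the manifest
    root:    prefix under which the live filesystem is found
    """
    uid_map, gid_map = Counter(), Counter()
    for uid, gid, path in records:
        target = os.path.join(root, path.lstrip("/"))
        try:
            st = os.lstat(target)  # lstat: inspect symlinks themselves
        except OSError:
            continue  # file has gone away since the copy was taken
        if st.st_uid != uid:
            uid_map[(uid, st.st_uid)] += 1
        if st.st_gid != gid:
            gid_map[(gid, st.st_gid)] += 1
    return uid_map, gid_map


# Demo: pretend the manifest recorded a different owner for a scratch file.
with tempfile.TemporaryDirectory() as scratch:
    open(os.path.join(scratch, "example"), "w").close()
    st = os.lstat(os.path.join(scratch, "example"))
    fake_manifest = [(st.st_uid + 1, st.st_gid, "/example")]
    uid_map, gid_map = observed_remaps(fake_manifest, root=scratch)
    print(uid_map)  # one entry: (recorded uid, current uid) seen once
    print(gid_map)  # empty: the group matches
```

A handful of distinct pairs with large counts suggests a clean name-based remap; a long tail of odd pairs is more likely ordinary changes made by users since the copy.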

The approach is simple. Retrieve a filename from the manifest file. Stat the file. If it exists, check the ownership; if it does not match the manifest, change it. Repeat until the manifest is at EOF.

This is a job for a script, of course. I'm going to use Python, because I have it, and I like it better than scripting bash. Feel free to send in your one-liner, if you feel the need.

On the live VM, copy the manifest file over from the host system. Download the script and invoke it, as root, with the name of the manifest file as its argument.

Here is the script:
(But don't copy and paste; download your own copy here.)


#!/usr/bin/python

# fix_ownership.py  (http://puddle.ca/~/sim/scripts/fix_broken_owners.html)
# 
# Repair file ownership munged by improper use of rsync
# Simeon Veldstra    25 August 2009
# Released to the Public Domain with *NO WARRANTY*
# Use at your own risk.
#
# Usage: 
# fix_ownership.py manifest_file
# Where manifest_file contains a null delimited list of filepaths 
# preceded with the file's UID and GID as produced by the following find command:
# find /path/to/mount/ -fprintf file_manifest "%U\000%G\000/%P\000"


import os
import sys


def fix_owners(filename, verbose=False, bufsize=2048):

    # Open in binary mode: the manifest is null delimited, not line oriented.
    fp = open(filename, 'rb')
    buf = [fp.read(bufsize)]

    def get_field():
        while 1:
            i = buf[0].find('\x00')
            if i != -1:
                break
            chunk = fp.read(bufsize)  # avoid shadowing the builtin next()
            if chunk:
                buf[0] += chunk
            else:
                return None
        val = buf[0][:i]
        buf[0] = buf[0][i+1:]
        return val

    while 1:
        UID = get_field()
        GID = get_field()
        filepath = get_field()
        if UID is None:
            print "Done"
            break
        try:
            UID = int(UID)
            GID = int(GID)
        # Catch both: a truncated record leaves a field as None (TypeError).
        except (ValueError, TypeError):
            print "Corruption in data file near:"
            print UID, GID, filepath
            break
        # Use lexists/lstat so symlinks are examined themselves, matching lchown.
        if os.path.lexists(filepath):
            try:
                finfo = os.lstat(filepath)
                if UID != finfo.st_uid or GID != finfo.st_gid:
                    os.lchown(filepath, UID, GID)
                    if verbose:
                        print "Fixing ", filepath
            except OSError:
                print "OS Error on file", filepath, "        Are you root?"

    fp.close()


if __name__ == '__main__':
    if len(sys.argv) == 2:
        fix_owners(sys.argv[1], True)
    else:
        print "Usage:", sys.argv[0], "file_manifest"



That's it.
Relax, have a beer, the day is saved.



sim
spamtrap@puddle.ca