JUMP2NR.PL - Automate moving to a new release of your distribution


SYNOPSIS

    jump2nr.pl -cfg nextrel.cfg  testsys


DESCRIPTION

I manage a cluster of machines. Some of the machines are 'gateways', large and complex systems with many users. Most of the machines are 'client' nodes, which are user-less (they only have root) and data-less (they contain no user data).

jump2nr.pl was written to give me a simple means to replace the operating system on these clients. Conventional approaches usually require some sort of TFTP server which provides special software to do the updating. This in turn requires a boot from the network (PXE boot). Some of my systems cannot do a PXE boot at all, and those that can require console access to press a key (PF12) to initiate it.

What I really want is some command I can just issue from a central site which goes through a set of machinations and the result is a new system on some client node.

Updating the operating system is not just a case of installing the latest version of some packages. It might well mean a change of distributions or moving the root partition from sda2 to sda3.

As I was describing this to someone, I realized that my systems all had just enough resources to make this work:

  * they all have plenty of memory so they will not swap
  
  * /boot was a separate partition
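
Before trying any of this on a host, it may be worth confirming that it really has a swap partition and a separate /boot. The following is only a sketch, not part of jump2nr.pl: the function names are made up here, and the optional file arguments (defaulting to /proc/swaps and /proc/mounts) are an assumption added so the checks can be tried against saved copies of those files.

```shell
#!/bin/sh
# Sketch: check two prerequisites on a would-be target.
# All function names here are hypothetical helpers, not jump2nr.pl features.

# Print the first swap *partition* listed in a /proc/swaps-style file.
first_swap_partition() {
    # Line 1 is the header; column 2 is the type ("partition" or "file").
    awk 'NR > 1 && $2 == "partition" { print $1; exit }' "${1:-/proc/swaps}"
}

# Succeed if /boot is a mount point of its own in a /proc/mounts-style file.
boot_is_separate() {
    awk '$2 == "/boot" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

# Report whether both prerequisites hold; returns non-zero if either fails.
check_target() {
    swaps="${1:-/proc/swaps}"
    mounts="${2:-/proc/mounts}"
    part=$(first_swap_partition "$swaps")
    if [ -z "$part" ]; then
        echo "no swap partition found"
        return 1
    fi
    if ! boot_is_separate "$mounts"; then
        echo "/boot is not a separate partition"
        return 1
    fi
    echo "ok: swap=$part"
}
```

Run on the target itself, `check_target` with no arguments reads the live /proc files. It does not check memory; whether a host can run without swap for the duration of the jump is still your call.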


THE TRICK

The keys to doing this update came when I realized:

 * I could steal the swap partition
 
 * The swap partition was big enough to run a small Linux system
 
 * A distribution can be installed as a tar file plus a small set of changes
 
I've named this process 'jumping' because it reminds me of a leap off a cliff
- think of the ending of Butch Cassidy and the Sundance Kid.
Hopefully your result will not be as traumatic.
If everything is done correctly, you hit the ground, roll and come up.
If not, well, you were going to have to put a CD in and install manually
anyway, right? :-)

Here is a poor man's diagram of what we'll do:

     target            jumpsystem        newtarget           completed
  +-----------+      +-----------+      +-----------+      +-----------+ 
  | /boot     |      | /boot jump|      | /boot     |      | /boot     | 
  +-----------+      +-----------+      +-----------+      +-----------+ 
  | /         |      | unused    |      | /   new   |      | /   new   | 
  |           | boot |           | boot |           | -->  |           | 
  |           |      |           |      |           |      |           | 
  +-----------+      +-----------+      +-----------+      +-----------+ 
  | swap      |      | /     jump|      | unused    |      | swap      | 
  |           |      |           |      |           |      |           | 
  +-----------+      +-----------+      +-----------+      +-----------+

The jump process is to install a separate, carefully tailored operating system on the target system's swap partition and configure the /boot partition to boot the new root (old swap) partition.

After the first reboot the host is running our jump system - a carefully tailored system which uses only the original modified /boot partition and the original swap partition. This makes all the other partitions available to be changed.

The second step is to 'install' the new distribution using the other target partitions. The /boot partition can be re-used now for the new distribution since our jump system will not use /boot again. We take advantage of the fact that Linux systems can mostly be installed by formatting a new root partition and extracting a tar file at the new root partition (e.g. on the correct unused partition from the original system). After installing most of the files, there is a small set of file changes that need to be made - mostly dealing with a hostname change. The boot configuration file in /boot/grub/menu.lst is now corrected to point to the new root partition.

A second reboot boots the new target system with the new distribution. One last detail must be handled by the new system - to get the swap partition set up again.


VARIATIONS

In the section above I've presented a very simple scenario, but it does not need to be quite so simple. I've shown the root system of the target being reused, but of course, the new target partitions might be quite different from the original target's.

Once the jump system is running, your scripts are free to do anything like re-partition the drives, reformat, copy data from one partition to another, fetch data from another system, etc.

As long as it all works, you are fine. If your scripts fail, you probably have a mostly unusable system on your hands. For that reason this process is not reasonable for a single upgrade; it is aimed at doing system upgrades to many hosts.


DETAILS

The actual steps to do a jump are pretty simple:

  * Copy some files to the target system
  * On the target system do:
    * Save some details from the target
    * Take the swap partition offline
    * Format the swap partition for the jump system root
    * Set up jump system to complete the upgrade
    * Set up grub to boot the jump partition
    * Reboot
  * On the jump system, rc.local begins the restore process:
    * Re-partition if necessary
    * Reformat a new root partition
    * Mount it and restore a new distribution
    * Copy saved files from the original system to the correct places
    * Change important files on the new distribution
    * Set up grub to boot the newly installed target system
    * Set up target system to complete the upgrade
    * Reboot
    
  * On the newly installed target system do:
    * Format the original swap partition (the former jump system root partition) as swap
    * Enable the swap partition for the newly installed system
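
The last step in the list above, giving the swap partition back, might be sketched like this. It is an illustration under assumptions, not the contents of the rc.local files mentioned later: `finish_swap` and the `RUN` variable are hypothetical names. By default the function only prints the commands; set `RUN=` (empty) to execute them as root.

```shell
#!/bin/sh
# Sketch: final step on the new target system.
# finish_swap and RUN are hypothetical names used for illustration.
# With RUN unset, commands are echoed instead of executed (a dry run).
finish_swap() {
    swapdev="$1"                                # e.g. /dev/sda3, the former jump root
    ${RUN-echo} mkswap "$swapdev" || return 1   # write a fresh swap signature
    ${RUN-echo} swapon "$swapdev" || return 1   # start using it right away
}
```

For example, `finish_swap /dev/sda3` prints the two commands it would run. Note that mkswap writes a new UUID, so if the restored /etc/fstab refers to swap by UUID rather than by device name, that entry would need updating as well.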

Conceptually this is pretty simple, but of course, the devil is in the details. Don't underestimate what's going on. It's not hard - as long as you understand the details. This is not for the faint of heart. jump2nr.pl is no magic bullet; nothing beats understanding what's going on.


FILES

jump2nr.pl is designed to use several files:

  * a configuration file
  
  * a shell script
  
  * a tar file of a jump system
  
  * a tar file of a new target system

Configuration File


The configuration file is read by jump2nr.pl and is, essentially,
a simple shell script which copies files and issues commands to be executed
on the jump system.

Comment lines begin with '#' in column one and blank lines are ignored. Lines must use one of these formats:

  * key=value                      Defines variable 'key'
  * key=`cmd`                      Sets variable 'key' to STDOUT value of cmd
  * print string                   Print something to the STDOUT
  * local cmd                      Command to be run on the local host
  * remote cmd                     Command to run on target host
  * jump(function, parms)          Invoke a builtin to do something complex
  * jump(usepartition, swap)       Set jump root partition on swap partition
  * jump(usepartition, /dev/sda2)  Set jump root partition to a particular partition

'cmd' may be a conventional command line with shell special characters, including parentheses e.g. (cd /etc; ls). Note that quote characters ARE REMOVED from cmd.

Any occurrence of a variable surrounded with % (e.g. %xx%) will be replaced by the value of that variable, which must have been defined previously. This means variables need to be defined at the top of the file.
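
To make the %xx% substitution concrete, here is a small shell sketch of the idea. It is not the code jump2nr.pl uses (that is Perl); `expand_vars` is a hypothetical helper that looks names up among ordinary shell variables, and leaving an undefined %NAME% untouched is an assumption of this sketch, not documented behavior.

```shell
#!/bin/sh
# Sketch: replace each %NAME% in a line with the value of shell variable NAME.
# expand_vars is a hypothetical helper; undefined names are left as %NAME%.
expand_vars() (
    rest="$1"
    out=""
    while :; do
        case "$rest" in
            *%*%*) ;;           # at least one %...% pair remains
            *) break ;;
        esac
        head=${rest%%"%"*}      # text before the first %
        tail=${rest#*"%"}       # text after the first %
        name=${tail%%"%"*}      # candidate variable name
        tail=${tail#*"%"}       # text after the closing %
        case "$name" in
            ""|[0-9]*|*[!A-Za-z0-9_]*)
                # Not a usable name: keep the % literally and rescan after it.
                out="$out$head%"
                rest="$name%$tail"
                ;;
            *)
                # Look the name up; fall back to the literal %NAME% if unset.
                eval "value=\${$name-%$name%}"
                out="$out$head$value"
                rest="$tail"
                ;;
        esac
    done
    printf '%s\n' "$out$rest"
)
```

With `TARGET_DIR=/jumpfiles` set, `expand_vars 'remote mkdir -p %TARGET_DIR%'` yields `remote mkdir -p /jumpfiles`. Because substitution happens as each line is read, a variable defined further down the file is not yet available, which is why definitions belong at the top.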

The following variables are used internally by jump2nr.pl: _JUMPPARTITION, _JUMPDEV, _JUMPDEVICEID, _HOST, _LOGFILE, _MOUNTDIR.

An example configuration file is provided below.

Shell Script


There are quite a few commands to be run on the target machine, both when
setting up the jump system and, after the reboot, when the new installation
is being put in place.

The examples provided use the script name jump2.sh for both of these parts - thereby consolidating all the information in one place. You are, of course, free to do this any way you can imagine.

An example shell script is provided below.

Tar of Jump System

This might seem complex, but it is really quite easy. Install any version of Linux you want on some system and then simply create a tar file of it, e.g.

  tar -czf jump.system.tgz --one-file-system --exclude proc /

This creates the file 'jump.system.tgz' which can be extracted on the target system to create a bootable jump system. Keep this as small as you can manage (mine is 290MB). Once you have a working system image, you may well never need to re-create this.
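
Since a bad image is only discovered after the target has rebooted, a quick check of the tar file before copying it anywhere may be worthwhile. The sketch below is an optional extra, not something jump2nr.pl does: `image_has` is a hypothetical helper, and which members to require is your choice (the sample configuration later edits etc/fstab and boot/grub/menu.lst inside the jump image, so those are reasonable candidates).

```shell
#!/bin/sh
# Sketch: succeed only if every named member is present in a .tgz image.
# image_has is a hypothetical helper, not part of jump2nr.pl.
# Member names are given without a leading slash (e.g. etc/fstab),
# which is how tar stores them when archiving "/".
image_has() (
    image="$1"; shift
    list=$(tar -tzf "$image") || exit 1     # also fails if the file is not a gzip'd tar
    for member in "$@"; do
        printf '%s\n' "$list" |
            grep -xF -e "$member" -e "./$member" > /dev/null || {
                echo "missing from $image: $member"
                exit 1
            }
    done
)
```

A typical call would be `image_has jump.system.tgz etc/fstab boot/grub/menu.lst`.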

Sorry, no example is provided.

Tar of New Target System

This tar image is created just like the one above - except this should be an image of the final upgraded system you want. This might be very different from the Jump System, or it might be almost exactly like the Jump System.

Once restored (see the Shell Script), there will be various files you need to update - including /etc/fstab, /etc/passwd, /etc/rc.local and others. While all the changes might be confusing, once you get a runnable system and can SSH to it, you can easily work out what else needs fixing and add that to the restore step in your shell script.
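
As one example of such an update, the root entry in the restored /etc/fstab may need to name a different partition. The following is a sketch only: `set_root_device` is a hypothetical helper, and it assumes a conventional whitespace-separated fstab in which the second field is the mount point.

```shell
#!/bin/sh
# Sketch: point the "/" entry of an fstab file at a different device.
# set_root_device is a hypothetical helper, not part of the sample script.
set_root_device() (
    fstab="$1"
    dev="$2"
    tmp="$fstab.new.$$"
    awk -v dev="$dev" '
        /^[ \t]*#/ { print; next }   # keep comment lines as they are
        $2 == "/"  { $1 = dev }      # replace the device of the root entry
                   { print }         # print every other line unchanged
    ' "$fstab" > "$tmp" && mv "$tmp" "$fstab"
)
```

In a restore step this might be called as `set_root_device /mnt/etc/fstab /dev/sda1`. The rewritten line is re-joined with single spaces, which fstab accepts.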

Sorry, no example is provided.


OPTIONS

-cfgfile file

Specifies the control file to use. This is not optional.

-logdir DIR

Specifies the directory where the log file is created. This defaults to /tmp.

-noprompt

Causes the script to avoid prompting for permission to proceed.

-noexec

Prevents the script from actually executing commands; instead it will just show you what it would have executed.

-ssh string

Specifies the ssh command to be used. The default is sudo ssh.


PARAMETERS

hostname

Specifies the name or IP address of the target host. This is the machine to be upgraded.


SAMPLE FILES

Sample Configuration File


  #   Directory on target system where files get copied
  TARGET_DIR=/jumpfiles
  #   Define a script to be run BEFORE we get going. Remote location will be TARGET_DIR
  TARGET_SETUP = jump2.sh
  #   Define what partition is used for the target system
  TARGET_PARTITION = /dev/sda1
  #   Tar file of new target system image to be copied and extracted
  TARGET_TAR=target.pe1950.system.tgz
  NEW_TARGET_TAR=`basename %TARGET_TAR%`
  #   New target system rc.local. Used to boot strap and complete new target
  NEW_TARGET_RC1=rc.local.client
  NEW_TARGET_RC2=rc.local.reconfigure.client
  #   Define what partition to use on the jump system
  JUMP_PARTITION = swap
  #   Where partition on jump system is mounted
  JUMP_MNT=/media/jump
  #   Tar file of jump system image. Copy this to target and extract
  JUMP_TAR=jump.system.tgz
  #   Target on jump system where files will be found
  JUMP_DIR=%TARGET_DIR%
  #   GRUB menu.  LILO is similar, but has not been tried
  GRUBMENU=/boot/grub/menu.lst
  #   Show values of all variables
  #Print Dumping config variables in %_LOGFILE%
  #Showvars()
  #remote echo At %_HOST% it should be %D% too
  #   Copy local files to remote system so commands run there have what they need
  print Copying files to %_HOST%
  remote mkdir -p %TARGET_DIR%
  local sudo scp -q %JUMP_TAR% %TARGET_TAR% %TARGET_SETUP% %NEW_TARGET_RC1% %NEW_TARGET_RC2% root@%_HOST%:%TARGET_DIR%
  print Run target setup on %_HOST%
  remote %TARGET_DIR%/%TARGET_SETUP% prep
  #   Mount the swap partition on the target system
  print Prepare to use %JUMP_PARTITION% on %_HOST%
  jump(usepartition, %JUMP_PARTITION%, %JUMP_MNT%)
  #
  #   Build the jump system on the target. We will soon be booting this
  #
  #   Extract tar file in %JUMP_MNT%
  print Extracting jump image tar file in %JUMP_MNT% at %_HOST%
  remote (cd %JUMP_MNT%; tar xzf %TARGET_DIR%/%JUMP_TAR%)
  #   Tailor some details for the jump system
  print Run target setup processing on %_HOST%
  remote %TARGET_DIR%/%TARGET_SETUP% save %TARGET_PARTITION% %JUMP_MNT% %TARGET_DIR% %JUMP_DIR% %NEW_TARGET_TAR%
  #   Fix JUMP boot menu to specify %_JUMPPARTITION% and %_JUMPDEVICEID%
  remote cp -p %JUMP_MNT%%GRUBMENU% %JUMP_MNT%%GRUBMENU%.saved
  remote (cat %JUMP_MNT%%GRUBMENU%.saved | sed -e s:%TARGET_PARTITION%:%_JUMPPARTITION%: -e s:hd0,0:%_JUMPDEVICEID%: > %JUMP_MNT%%GRUBMENU%)
  #   Fix fstab to use %_JUMPPARTITION% for / and not as swap
  remote cp -p %JUMP_MNT%/etc/fstab %JUMP_MNT%/etc/fstab.saved
  remote (cat %JUMP_MNT%/etc/fstab.saved | grep -v %_JUMPPARTITION% | sed -e s:%TARGET_PARTITION%:%_JUMPPARTITION%: > %JUMP_MNT%/etc/fstab)
  remote grub-install --root-directory=%JUMP_MNT% %_JUMPDEV%
  print Rebooting target system %_HOST% ...    Courage grasshopper
  remote reboot

Sample Shell Script

  #!/bin/bash
  #
  #   jump2nr.pl script to help setup jump and new target systems
  #
  #   Syntax:   jump2.sh  prep | save | restore
  #
  #   If something is wrong, the script should exit with a non-zero return code
  #   All exits should be 'exit 0' or 'exit 1' or jump2nr.pl might incorrectly
  #   conclude the remote command failed
  #
  me=`basename $0`
  FILES2SAVE="/etc/fstab /etc/hosts /etc/hostname /etc/passwd /etc/shadow /etc/ssh /etc/network/interfaces /root /etc/postfix/main.cf";
  NEW_TARGET_RC1=rc.local.client
  NEW_TARGET_RC2=rc.local.reconfigure.client
  LOGFILE="/var/log/restore.log"
  ############################################################
  #   prep
  #   Runs on the target system before we try to set up a jump system.
  #   E.g. Stop applications, get users off, etc
  ############################################################
  if [ "$1" = "prep" ]; then
      exit 0
  fi
  ############################################################
  #   save  TARGET_PARTITION  JUMP_MNT  TARGET_DIR  JUMP_DIR  TARGET_TAR
  #   Runs on the target system after the jump root dir is set up
  #   This should copy important files from the target system (/) to the jump system.
  #   These files will be used in a 'restore' step
  ############################################################
  if [ "$1" = "save" ]; then
      shift
      if [ "$#" != 5 ]; then
          echo "Invalid '$me save' syntax:  $0 $*"
          exit 1
      fi
      TARGET_PARTITION="$1"           # /dev/sda1     partition for new target
      JUMP_MNT="$2"                   # /media/jump   where jump system image is mounted
      TARGET_DIR="$3"                 # /jumpfiles    where jump files are on target system
      JUMP_DIR="$4"                   # /jumpfiles    where saved files are on jump system
      TARGET_TAR="$5"                 # target.system.tgz  tar image of new target system to restore
      if [ ! -r $TARGET_DIR/$TARGET_TAR ]; then
          echo "$me - Target system tar image ($TARGET_DIR/$TARGET_TAR) does not exist"
          exit 1
      fi
      echo "Copying files to $JUMP_MNT/$JUMP_DIR"
      mkdir -p $JUMP_MNT/$JUMP_DIR
      cd $JUMP_MNT/$JUMP_DIR || exit 1
      for f in $0 $FILES2SAVE $TARGET_DIR/$TARGET_TAR $TARGET_DIR/$NEW_TARGET_RC1 $TARGET_DIR/$NEW_TARGET_RC2; do
          echo "  $f"
          cp -rp $f . || exit 1
      done
      #   Create rc.local on jump system to invoke me to restore files
      echo "Set up rc.local on jump system to complete process"
      f="$JUMP_MNT/etc/rc.local"
      if [ ! -r $f ]; then
          echo "#!/bin/sh" > $f
      fi
      echo "$JUMP_DIR/$me restore $TARGET_PARTITION $JUMP_DIR $TARGET_TAR 2>&1 | tee $LOGFILE" >> $f
      echo "reboot" >> $f
      chmod 755 $f
      echo "Jump system ready for reboot"
      exit 0
  fi
  ############################################################
  #   restore  TARGET_PARTITION  JUMP_DIR  TARGET_TAR
  #   Runs on the jump system
  #   This should restore a new target system image
  #   This should copy saved important files from the old target
  #       system (/) to the new target system
  ############################################################
  if [ "$1" = "restore" ]; then
      shift
      echo "#================================================="
      echo "#  jump2.sh restore"
      echo "#================================================="
      if [ "$#" != 3 ]; then
          echo "Invalid '$me restore' syntax:  $0 $*"
          exit 1
      fi
      TARGET_PARTITION="$1"           # /dev/sda1     partition for new target
      JUMP_DIR="$2"                   # /jumpfiles    where saved files are on jump system
      TARGET_TAR="$3"                 # target.system.tgz  tar image of new target system to restore
      #   Clean out the target partition, restore the new TAR image
      #   This is where we seriously make a commitment
      mkfs -t ext3 -F $TARGET_PARTITION || exit 1
      mount -t ext3 $TARGET_PARTITION /mnt || exit 1
      cd /mnt || exit 1
      tar xzf $JUMP_DIR/$TARGET_TAR || exit 1
      echo "If you are lucky, you have a runnable system now"
      df
      #   Now restore those files saved by 'save'
      echo "Restoring files to the new target"
      for f in $FILES2SAVE; do
          echo "  $f"
          if [ "$f" = "/etc/passwd" ]; then
              echo "$f requires special handling, cannot just blast a new one in. Replace this code."
          elif  [ "$f" = "/etc/shadow" ]; then
              echo "$f requires special handling, cannot just blast a new one in. Replace this code."
          else 
              fb=`basename $f`
              fd=`dirname $f` 
              cp -rp $JUMP_DIR/$fb ./$fd
          fi
      done
      #   Files restored.  Prep new target so it can be booted
      #   The new target system better have /boot/grub/menu.lst set up properly
      echo "Prepare new target system for reboot"
      #   Derive device from partition
      d=`perl -e 'chop($ARGV[0]); print $ARGV[0];' $TARGET_PARTITION`
      grub-install --root-directory=/mnt $d
      #   Last hook - replace existing rc.local support file with this one
      echo "Enable my local hooks so the new target system finishes the last bit"
      cp -p $JUMP_DIR/$NEW_TARGET_RC1 $JUMP_DIR/$NEW_TARGET_RC2 etc
      touch /mnt/etc/jump.initialize.myself
      exit 0
  fi
  echo "Syntax:  ${me} prep | save | restore"
  exit 1
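
One detail in the restore step above deserves a note: the disk device is derived by chopping the last character off the partition name, which works for /dev/sda1 but not for a two-digit partition such as /dev/sda10. A possible alternative, offered as a sketch and not as the author's method, strips all trailing digits instead. `disk_of` is a hypothetical name, and the sketch assumes device names of the /dev/sdXN or /dev/hdXN style used throughout this document.

```shell
#!/bin/sh
# Sketch: derive the whole-disk device from a partition name by removing
# every trailing digit. disk_of is a hypothetical helper.
# Assumes names like /dev/sda1 or /dev/hdb12, where the disk name itself
# does not end in a digit.
disk_of() {
    printf '%s\n' "$1" | sed 's/[0-9][0-9]*$//'
}
```

In the script it could stand in for the perl one-liner, e.g. d=`disk_of $TARGET_PARTITION`.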


EXIT

If no fatal errors are detected, the program exits with a return code of 0. Any error will set a non-zero return code.


AUTHOR

Written by Terry Gliedt <tpg@hps.com> in 2009. Copyright (C) Terry Gliedt and the University of Michigan. This is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. See http://www.gnu.org/copyleft/gpl.html