
Sshbak: Simple unix-based remote backup system over SSH



2006-06-01

This is the plain content of the README.txt file.

Overview
--------

Sshbak is a very simple but powerful remote backup utility for system
administrators. It performs fast and comprehensible (usually daily) backups
of multiple machines over SSH connections.

Merits of this system:

  1. The backup result is a simple set of tar+gzip files.
  2. The output of programs can be backed up through a pipe (e.g. remote
     backup of mysqldump or pg_dumpall results).
  3. Only SSH and Perl 5.6+ are needed on client machines.
  4. Works on Linux, FreeBSD and even Windows.
  5. Easily customizable sparse volume rotation.
  6. Backs up an unlimited number of sources, possibly in parallel.

The sshbak service consists of 2 kinds of components:

  1. One sshbak server (sshbak-pull). It runs on a machine with large hard
     disks ("backup storage"). This server periodically connects to the
     specified remote machines ("backup sources") and PULLS (fetches)
     backup archives from them. How often this happens is controlled by
     cron. The sshbak-pull server always runs with low privileges (as a
     non-root user). It consumes very little CPU but a lot of network
     traffic, because it does not compress anything; it just stores the
     data.

  2. Many sshbak clients (sshbak-push). Each client runs on a machine that
     must be backed up periodically ("backup source"). The client runs
     with root privileges. It creates tar+gzip archives of a subset of the
     backup source's directories and files, or of the piped output of a
     custom program (e.g. mysqldump). This subset is usually split into
     ARCHIVES, e.g. bin.tgz, home.tgz, var.tgz etc. The splitting rules are
     specified in a configuration file located on the server machine
     (sshbak-push does not have its own configuration; it reads its
     parameters from the command line and stdin). sshbak-push consumes a
     lot of CPU, because it does the compression work.

                           machine3
                         sshbak-push
                              ^
                              |
                              |
     machine1              SERVER               machine2
    sshbak-push <------- sshbak-pull -------> sshbak-push
               (request)            (request)

Note: you do not have to use the sshbak-push utility at all; you may
configure the sshbak-pull server to run ANY command you like (e.g. a local
backup via shell).
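As described above, sshbak-push does the compression and writes each archive to its standard output, while sshbak-pull only stores the bytes it receives. The Python sketch below illustrates the client half of that split under stated assumptions. It uses the standard tarfile module in streaming mode ("w|gz"), not necessarily what sshbak-push.pl does internally, and write_volume and its excludes argument are hypothetical names.

```python
import fnmatch
import tarfile
from typing import BinaryIO, Optional, Sequence


def write_volume(path: str, out: BinaryIO, excludes: Sequence[str] = ()) -> None:
    """Write `path` to `out` as a tar+gzip stream, skipping excluded entries.

    Hypothetical helper. Mode "w|gz" produces a streaming (non-seekable)
    gzip archive, so `out` can be a pipe such as stdout on the SSH channel.
    """

    def skip_excluded(info: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
        # For an absolute `path`, tarfile stores member names without the
        # leading "/"; put it back so patterns like "/proc" can match.
        # Returning None drops the entry and, for a directory, everything below it.
        if any(fnmatch.fnmatch("/" + info.name, pat) for pat in excludes):
            return None
        return info

    with tarfile.open(fileobj=out, mode="w|gz") as tar:
        tar.add(path, filter=skip_excluded)
```

On a client, a call like write_volume("/etc", sys.stdout.buffer) would send an etc.tgz stream down the SSH connection. Streaming mode matters here because a pipe cannot be seeked.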
Server: bin/sshbak-pull
-----------------------

Connects sequentially to the specified backup sources and fetches backup
archives from them using custom rules.

USAGE

  # sshbak-pull [--logdir=log_dir] [--nofork] [--dryrun] backup-dir

ARGUMENTS

  * backup-dir
    Archives are stored in directories named
    backup-dir/group/machine/machine-<date>/, where:
      - group is the directory of a backup group;
      - machine is the directory holding the backups of one machine;
      - <date> is the current date and time (the format is customizable).

  * --logdir=log_dir
    If specified, group log files are stored in the directory log_dir.
    Each log file is named group.log and holds the log lines produced
    while processing that group of sources. You may also specify
    "--logdir=-"; in this case logs are written to stdout.

  * --nofork
    If present, groups are processed sequentially (not in parallel). This
    option is very helpful for debugging together with "--logdir=-".

  * --dryrun
    If present, nothing is written to the backup directory. Instead,
    fetched archives are created in /tmp and deleted immediately. Very
    handy for debugging: "--logdir=- --dryrun --nofork".

DESCRIPTION

Target folders are named backup-dir/group/machine/machine-<date>/. The
machine-<date> directories are rotated using custom (possibly sparse)
rotation rules (see below).
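The machine-<date> directories mentioned above are pruned according to the $rotate rules explained under ROTATION. Below is one possible reading of those rules, written as a Python sketch. It makes three assumptions:

  - The directory date format is %Y-%m-%d___%H-%M-%S, inferred from names like machine1-2006-06-03___03-00-00.
  - Only conditions of the form %x==N are supported.
  - The unconditional ":7" rule keeps the 7 newest backups not already kept by another rule.

All function names are hypothetical, and sshbak-pull's own implementation may differ.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Set, Tuple

# Assumed $datefmt, inferred from names such as machine1-2006-06-03___03-00-00.
DATEFMT = "%Y-%m-%d___%H-%M-%S"

Rule = Tuple[Optional[str], int]  # (condition or None, how many to keep)


def parse_rotate(spec: str) -> List[Rule]:
    """Split ":7; %w==6: 4; %d==1: 2" into [(None, 7), ("%w==6", 4), ("%d==1", 2)]."""
    rules: List[Rule] = []
    for part in spec.split(";"):
        part = part.strip()
        if part:
            cond, _, count = part.rpartition(":")
            rules.append((cond.strip() or None, int(count)))
    return rules


def matches(cond: str, when: datetime) -> bool:
    """Evaluate a condition such as "%w==6" (strftime weekday 6 = Saturday)."""
    fmt, _, expected = cond.partition("==")
    return int(when.strftime(fmt.strip())) == int(expected)


def dir_date(dirname: str, machine: str) -> datetime:
    """Parse the <date> part of "<machine>-<date>"."""
    return datetime.strptime(dirname[len(machine) + 1:], DATEFMT)


def backups_to_keep(dates: Iterable[datetime], spec: str) -> Set[datetime]:
    newest_first = sorted(set(dates), reverse=True)
    rules = parse_rotate(spec)
    keep: Set[datetime] = set()
    # Conditional rules: keep the N newest backups matching each condition.
    for cond, count in rules:
        if cond is not None:
            keep.update([d for d in newest_first if matches(cond, d)][:count])
    # Unconditional rule: keep the N newest backups no other rule retained.
    for cond, count in rules:
        if cond is None:
            keep.update([d for d in newest_first if d not in keep][:count])
    return keep


def dirs_to_delete(dirnames: Iterable[str], machine: str, spec: str) -> List[str]:
    """Return the machine-<date> directory names that the rules do not protect."""
    dated = {dir_date(name, machine): name for name in dirnames}
    keep = backups_to_keep(dated, spec)
    return sorted(name for when, name in dated.items() if when not in keep)
```

Take daily backups from 2006-05-01 to 2006-06-08 and rotate = ":7; %w==6: 4; %d==1: 2". Under this reading, 13 directories survive:

  - the four newest Saturdays (06-03, 05-27, 05-20, 05-13);
  - the two 1st-of-month backups (06-01, 05-01);
  - the seven newest remaining days (06-08 to 06-04, 06-02, 05-31).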
Let's consider a typical structure of backup-dir:

  ./group1/
      machine1/
          machine1-2006-06-03___03-00-00/
          machine1-2006-06-07___03-00-00/
          machine1-2006-06-08___03-00-00/
          sshbak-pull.ini
      machine2/
          machine2-2006-06-03___03-00-00/
          machine2-2006-06-07___03-00-00/
          machine2-2006-06-08___03-00-00/
          sshbak-pull.ini
      group1.log
      sshbak-pull.ini
  ./group2/
      machine3/
          machine3-2006-06-03___03-00-00/
          machine3-2006-06-07___03-00-00/
          machine3-2006-06-08___03-00-00/
          sshbak-pull.ini
      machine4/
          machine4-2006-06-03___03-00-00/
          machine4-2006-06-07___03-00-00/
          machine4-2006-06-08___03-00-00/
          sshbak-pull.ini
      group2.log

The backup algorithm is simple. For each subdirectory found (group1,
group2), sshbak-pull performs a sequential backup, treating each of its
subdirectories as a separate machine. For example, it backs up machine1
and machine2 in group1, and machine3 and machine4 in group2. The
configuration of the backup source for each machine is located in
sshbak-pull.ini files.

All groups are processed in parallel using child processes (this is
exactly the purpose of grouping). Parallel backup is very handy if you
have a number of hardware nodes with virtual machines installed (e.g.
OpenVZ). In that case group1 and group2 are the hardware nodes, and
machine1, machine2 and machine3, machine4 are the virtual nodes within
these physical machines. Different hardware nodes may be backed up in
parallel, but the virtual machines within one hardware node must be backed
up sequentially to avoid overloading the CPU.

A separate log file is also stored for each group (see group1.log,
group2.log).

You may call sshbak-pull from cron scripts. See
tools/sshbak-cron-daily.sh for more details.

CONFIGURATION

Each backup source (machine) may have its own set of configuration
parameters. All these parameters are stored in sshbak-pull.ini files. You
can have multiple sshbak-pull.ini files at different levels of the
directory hierarchy. Assume we are backing up the machine
backup-dir/group1/machine1/:

  - First, /usr/local/sshbak/sshbak-pull.ini is read.
    In most cases it holds the default configuration parameters
    (assuming that sshbak is installed in /usr/local/sshbak/).
  - Then backup-dir/sshbak-pull.ini is read, possibly overriding (or
    adding to) the directives of the previous one. This file may contain
    options specific to the set of backup groups in backup-dir.
  - Then backup-dir/group1/sshbak-pull.ini is read.
  - Finally, backup-dir/group1/machine1/sshbak-pull.ini is read.

The configuration file is an ordinary INI file. It consists of key=value
pairs and optional groups enclosed in []:

  ssh_identity = ./identity
  ssh_host = example.com
  [stdin]
  some unparsed content

A group is never parsed: it is just a way to define multiline parameters.
In the example above we define two single-line keys and one multiline key
named "stdin".

A configuration parameter may refer to another one, e.g.:

  shell = $ssh_shell
  ssh_shell = ssh -o StrictHostKeyChecking=no { -i $ssh_identity } -l $ssh_user $ssh_host

It does not matter whether you refer backward or forward (as in the
example): all references are expanded on demand, not while the
configuration is being read.

Finally, you may use {} to mark "optional" parts of string values. Suppose
you write "{ -i $ssh_identity }" (the spaces after { and before } are
significant!). If the ssh_identity parameter is not set or equals the
empty string "", the whole {}-group is skipped. Otherwise { and } are
simply removed from the string.

All configuration options are described in the default sshbak-pull.ini
file located in the root sshbak directory.

ALGORITHM

The backup process for each source machine consists of 2 generic steps:

  1. Connect to the source machine using the shell $shell (here and below,
     $shell means the value of the configuration parameter "shell") and
     run the command $cmd-list in it. The stdin of the called command is
     filled with the data from the $stdin key (usually this key contains
     the list of wildcards to back up).
  2. Treat each line of the result as a separate backup archive name. For
     each name, connect to the source machine again using the shell
     $shell and run the command $cmd-data, passing $stdin as the
     command's stdin. Store the output in a separate file. The backup
     directory name is obtained from the $datefmt parameter.

Please note that sshbak-pull "knows nothing" about the details of the
backup process. E.g., it neither expands wildcards nor interprets archive
names and contents. All this work must be done by the $cmd-list and
$cmd-data programs, in most cases sshbak-push.

ROTATION

Old backups need to be deleted to save disk space. But you may want to
keep more than just the previous 10 (for example) backups: say, also the
last 4 Saturday backups, 2 backups made on the 1st of the month etc. To do
that, specify rotation rules in $rotate. For example:

  rotate = ":7; %w==6: 4; %d==1: 2"

Here %w is the "weekday number" (0 is Sunday, 6 is Saturday) and %d is
the "day of the month". A rule defines which backups should NOT be
deleted. In our example that means "do not delete: the 4 latest backups
with weekday number 6; the 2 latest backups with day of month 1". The
":7" part means "keep no more than 7 other latest backups that are not
matched by other rules". See the strftime() manual for descriptions of
other %-macros.

Client: sshbak-push
-------------------

The client program sshbak-push performs the backup operation on the backup
source machine. It can be called in 2 ways.

  1. # sshbak-push -

     When run with the "-" argument, sshbak-push prints the list of backup
     volumes, separated by newlines. This list is generated from the
     wildcard information read from stdin. The format is very simple:

     ----------------------------------------------------------------------
     # Include these files and directories (one volume per match).
     # Each matched entry will become a separate backup volume
     # with the "tgz" extension.
     /*
     /home/*
     # Include the gzipped output of a custom program (e.g. pg_dumpall).
     # This program is always called with the privileges of user "nobody".
     /DUMP/postgres: pg_dumpall -i
     # Do not include these objects in backup.
     - /proc
     - /tmp
     - *access_log*
     ----------------------------------------------------------------------

     With this input you get the following archives:

       - bin.tgz, etc.tgz etc. for the directories in the root folder
         (matches of "/*");
       - home_httpd.tgz, home_username.tgz etc. for each match of the
         "/home/*" wildcard;
       - one DUMP_postgres.gz archive with the gzipped output of the
         pg_dumpall utility.

     Because of the defined exclusions, the archives will not contain
     system directories (like /proc) or files with "access_log" in their
     paths.

     So the stdin of sshbak-push determines what to back up on the current
     machine and how to split the data into volumes. This data is usually
     passed from the sshbak-pull server while performing the backup (see
     the $stdin configuration parameter above).

  2. # sshbak-push archive-name

     When run with one argument, this argument is treated as an archive
     name from the list above. (Of course, you must pass the same stdin in
     this case to get the same wildcard matching.) sshbak-push compresses
     the needed files, produces the resulting tar+gzip archive (or, when a
     custom program is run, a gzip archive) and prints it to standard
     output. This output is usually gathered by sshbak-pull and stored in
     an archive file.

sshbak-push usually must run with root privileges; otherwise it cannot
access the system-owned files to be backed up.

SECURE CONFIGURATION

The sshbak-push utility is usually called by the sshbak-pull server over
an SSH connection. The command line looks something like this:

  # ssh -l sshbak-push machineN -
  (to obtain the list of backup volumes)

or

  # ssh -l sshbak-push machineN archive.tgz > archive.tgz
  (to obtain the content of the volume)

There are some prerequisites for sshbak-push to work:

  1. A user named sshbak-push HAVING ROOT PRIVILEGES must be created on the
     backup source machine.
  2. The sshbak-push script must be assigned as the SHELL of the
     sshbak-push user in /etc/passwd.
  3. The backup server must have passwordless access to the backup source
     machines when it connects as the sshbak-push user.
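The ssh command lines above correspond to the $shell value from the CONFIGURATION example. The rules described there can be sketched in Python:

  - later sshbak-pull.ini files override earlier ones;
  - [group] sections hold raw multiline values;
  - $name references are expanded lazily;
  - "{ ... }" parts are optional.

This is an illustration, not sshbak's parser. parse_ini, load_cascade and expand are hypothetical names. Three behaviors are assumptions: '#' lines outside groups are treated as comments, the spaces inside "{ " and " }" are required, and a group is dropped when any variable in it is empty.

```python
import re
from typing import Dict, Iterable, Optional, Set

# $name references; key names such as cmd-list contain hyphens.
REF = re.compile(r"\$([A-Za-z_][A-Za-z0-9_-]*)")
# Assumed syntax for optional parts: "{ " ... " }", spaces included.
OPTIONAL = re.compile(r"\{ (.*?) \}")
SECTION = re.compile(r"\[(.+)\]")


def parse_ini(text: str) -> Dict[str, str]:
    """Read key = value pairs; a [name] header starts a raw multiline value."""
    conf: Dict[str, str] = {}
    section: Optional[str] = None
    for line in text.splitlines():
        header = SECTION.fullmatch(line.strip())
        if header:
            section = header.group(1)
            conf[section] = ""
        elif section is not None:
            conf[section] += line + "\n"  # group content is never parsed
        elif "=" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition("=")
            conf[key.strip()] = value.strip()
    return conf


def load_cascade(texts: Iterable[str]) -> Dict[str, str]:
    """Merge files in read order; later files override earlier ones."""
    merged: Dict[str, str] = {}
    for text in texts:
        merged.update(parse_ini(text))
    return merged


def expand(conf: Dict[str, str], key: str, seen: Optional[Set[str]] = None) -> str:
    """Expand a value on demand; unset keys expand to ""."""
    seen = set() if seen is None else seen
    if key in seen:
        raise ValueError(f"circular reference through ${key}")
    seen = seen | {key}

    def value_of(name: str) -> str:
        return expand(conf, name, seen)

    def optional(m: re.Match) -> str:
        inner = m.group(1)
        # Drop the whole group if a variable inside it is unset or empty;
        # otherwise just remove the braces (keeping the inner spaces).
        if any(value_of(name) == "" for name in REF.findall(inner)):
            return ""
        return " " + inner + " "

    raw = OPTIONAL.sub(optional, conf.get(key, ""))
    return REF.sub(lambda m: value_of(m.group(1)), raw)
```

Because expansion happens only when expand() is called, "shell = $ssh_shell" may appear before ssh_shell is defined, matching the forward-reference behaviour described under CONFIGURATION.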
On Linux you can create a user with the proper rights and shell by typing:

  # useradd -u 0 -g nobody -d /usr/local/sshbak \
      -s /usr/local/sshbak/bin/sshbak-push.pl \
      -c "Backup user with root privileges" -M \
      -o sshbak-push

This is the most secure configuration. Even if an attacker manages to log
in as sshbak-push, they do not get shell access to the server: thanks to
the shell substitution, they can only fetch backup volumes from the
machine.

To set up passwordless access, generate a pair of private and public keys
(see below). Put the public key into ~sshbak-push/.ssh/authorized_keys and
the private key into the ~sshbak-pull/.ssh/identity file. This is the
standard procedure for OpenSSH; see its man pages.

Setting up backup storage machine
---------------------------------

Let's assume that we use /backup as the backup storage and the
/usr/local/sshbak directory to hold the unpacked sshbak distribution. To
set up the sshbak-pull server, do the following.

  1. Create the directory /backup.

  2. Passwordless access between the sshbak-pull and sshbak-push users is
     required. To allow it, generate a pair of private and public keys:

       # cd /backup
       # sh /usr/local/sshbak/tools/make_keys.sh

     Now you should have the /backup/.ssh directory with the private key
     (file "identity") and the public key (file
     "authorized_keys.generated"). Keep the private key secret! The public
     key must later be copied to the backup source machines, see below.

  3. Create a directory for a backup group:

       # mkdir -p /backup/group1

     Create one subdirectory for each backup source machine:

       # cd /backup/group1/
       # mkdir machine1
       # mkdir machine2

     Create an sshbak-pull.ini file in each machine directory. Specify at
     least the hostname of the source machine ($ssh_host) and the stdin
     passed to the sshbak-push client ([stdin]). Note: see the
     docs/sample directory for more information.

  4. You can easily set up the backup server permissions and create the
     needed accounts using the tools/install-pull.sh script. Note: it
     assumes that you use /backup as the backup storage.
       # cd /usr/local/sshbak/tools/
       # sh ./install-pull.sh

     You may run install-pull.sh as many times as you need in the future
     to fix broken permissions or re-create the backup account.

  5. Set up the daily cron script. You may simply copy
     tools/sshbak-cron-daily.sh to /etc/rc.d/cron.daily (on Linux) and set
     the execute permission. The command for cron is:

       # su - sshbak-pull -c "perl /usr/local/sshbak/bin/sshbak-pull.pl ~/backup &"

Adding new backup source
------------------------

To install sshbak-push on a new backup source machine and connect it to the
centralized backup system, perform the following actions.

  1. Unpack the sshbak distribution to /usr/local/sshbak. It will serve as
     the home directory of the sshbak-push user.

  2. Create the .ssh directory and copy the authorized_keys file generated
     earlier on the backup storage machine into .ssh. Note that you should
     rename authorized_keys.generated to authorized_keys.

  3. Run the tools/install-push.sh script:

       # cd /usr/local/sshbak/tools
       # sh ./install-push.sh

     It will create the sshbak-push user, set the sshbak-push script as the
     default shell for that user and fix permissions to allow passwordless
     connections.

  4. Register the new backup source on the backup storage machine. Create
     an empty directory named after the source machine, e.g.
     /backup/group1/machine-name on the sshbak-pull machine. Then create
     an sshbak-pull.ini configuration file in it:

       ssh_host = example.com
       [stdin]
       /*
       # and other wildcards as described before

To test that sshbak-push is configured correctly, open a shell on the
source machine and try to run:

  # su - sshbak-push -c -

Then type:

  /*
  ^D

where ^D is the Ctrl+D key combination. You should see the list of
archives for the root directory.

Test installation
-----------------

To test the overall installation (sshbak-pull and sshbak-push), go to the
backup storage machine and run:

  # su - sshbak-pull -c "perl /usr/local/sshbak/bin/sshbak-pull.pl --logdir=- --dryrun ~/backup"

Of course, you should first install sshbak-push on at least one backup
source machine.
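The dry run above walks through the two-step backup process described earlier ($cmd-list, then $cmd-data for each listed archive). A minimal Python sketch of that loop follows. It assumes that $shell has already been expanded into an argument list, that $cmd-list is "-" and that $cmd-data is the archive name, as in the sshbak-push setup. fetch_volumes and dry_run are hypothetical names; dry_run only mimics what --dryrun is documented to do (fetch into a temporary location, then delete).

```python
import os
import shutil
import subprocess
import tempfile
from typing import List, Sequence


def fetch_volumes(shell: Sequence[str], stdin_text: str, target_dir: str) -> List[str]:
    """Step 1: "$shell -" lists archives; step 2: "$shell NAME" streams each one.

    `shell` is the expanded $shell as an argv list, e.g. the hypothetical
    ["ssh", "-l", "sshbak-push", "machine1"].
    """
    data = stdin_text.encode()
    listing = subprocess.run([*shell, "-"], input=data,
                             stdout=subprocess.PIPE, check=True)
    names = [line.strip() for line in listing.stdout.decode().splitlines() if line.strip()]
    os.makedirs(target_dir, exist_ok=True)
    for name in names:
        # basename() strips any directory part the client might send back.
        path = os.path.join(target_dir, os.path.basename(name))
        with open(path, "wb") as out:
            # Same stdin again, so the client matches the same wildcards;
            # the bytes are stored as received (the server does not compress).
            subprocess.run([*shell, name], input=data, stdout=out, check=True)
    return names


def dry_run(shell: Sequence[str], stdin_text: str) -> List[str]:
    """Like --dryrun: fetch into a temporary directory, then delete it."""
    tmp = tempfile.mkdtemp(prefix="sshbak-dryrun-")
    try:
        return fetch_volumes(shell, stdin_text, tmp)
    finally:
        shutil.rmtree(tmp)
```

Passing the shell as an argument list makes it easy to substitute a local stand-in for ssh when testing. The real sshbak-pull takes $shell as a string from sshbak-pull.ini.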
If you get a password prompt, check the installation of the private and
public keys. Watch /var/log/messages and /var/log/secure (on Linux) for
more details. In most cases it is enough to re-run tools/install-pull.sh
(on the backup storage) and tools/install-push.sh (on the backup source),
because these scripts fix the permissions of the private and public keys.

User roles
----------

The sshbak system uses 2 different users to perform backups; only one of
them (sshbak-pull) runs without root privileges.

  1. User sshbak-push

     This user exists on each machine that needs to be backed up. The
     account must have root access rights (zero UID) and a custom shell
     equal to /usr/local/sshbak/bin/sshbak-push.pl. This script acts like
     a shell:

       - If you connect as this user with the parameter "-", you get the
         list of backup archives on the target machine (after the script
         reads the wildcard list from stdin).
       - If you connect with some other parameter, you get the content of
         the specified archive. (sshd passes the remote command to the
         login shell through the shell's -c switch, as with su -c.)

  2. User sshbak-pull

     The account under which the daily backup is performed. This user owns
     all the backup archives in the system and SHOULD NOT have root
     privileges.

Of course, you may add one more separate user (e.g. "backup") with group
sshbak-pull and home directory /backup. This user will have read-only
access to all backup volumes.
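In "-" mode, described above, sshbak-push turns the wildcard list read from stdin into volume names. Below is a Python sketch of that mapping. It rests on two assumptions:

  - Names are formed by stripping the leading '/', replacing the remaining '/' with '_' and appending .tgz (or .gz for 'path: command' dump lines). This is what the bin.tgz, home_httpd.tgz and DUMP_postgres.gz examples suggest.
  - Excluded paths are dropped from the list.

list_volumes and its root parameter are hypothetical; root lets the sketch run outside /. How sshbak-push treats overlapping matches such as /home and /home/* is not modeled.

```python
import fnmatch
import glob
import os
from typing import List


def volume_name(path: str) -> str:
    """'/home/httpd' -> 'home_httpd' (assumed from the README's examples)."""
    return path.strip("/").replace("/", "_")


def list_volumes(spec: str, root: str = "/") -> List[str]:
    """Return volume names for a wildcard list like the sample stdin."""
    includes: List[str] = []
    dumps: List[str] = []
    excludes: List[str] = []
    for raw in spec.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("- "):
            excludes.append(line[2:].strip())  # "- /proc"
        elif ":" in line:
            dumps.append(line.partition(":")[0].strip())  # "/DUMP/postgres: cmd"
        else:
            includes.append(line)  # "/*", "/home/*"

    def excluded(path: str) -> bool:
        return any(fnmatch.fnmatch(path, pattern) for pattern in excludes)

    names: List[str] = []
    for pattern in includes:
        # One volume per match, in sorted order within each pattern.
        for match in sorted(glob.glob(os.path.join(root, pattern.lstrip("/")))):
            path = "/" + os.path.relpath(match, root).replace(os.sep, "/")
            if not excluded(path):
                names.append(volume_name(path) + ".tgz")
    names += [volume_name(p) + ".gz" for p in dumps if not excluded(p)]
    return names
```

Consider a tree containing bin, etc, proc, home/httpd and home/username, fed the sample stdin from the Client section. The sketch lists bin.tgz, etc.tgz, home.tgz, home_httpd.tgz, home_username.tgz and DUMP_postgres.gz; /proc is dropped by the "- /proc" exclusion.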




Dmitry Koterov, Dk lab. ©1999-2014