Linux Backup Suite

1. Requirements

1.1. Backup list specifications

Full, incremental, and archive backups (aren't the latter also just incrementals?)

Distinguish between backup objects and backup lists. It should be possible to do backups of different sets in an intelligent manner. For example, make a separate full backup of every host, but make incremental backups of all hosts. The incremental should then only backup files which have been changed since the last full backup of that particular host.
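The "incremental relative to that particular host's last full backup" rule could be sketched like this (the catalog layout and mtime-based change detection are assumptions, not a committed design):

```python
from datetime import datetime

# Hypothetical catalog: time of the last full backup per host.
last_full = {
    "web1": datetime(2024, 1, 1),
    "db1": datetime(2024, 2, 1),
}

def incremental_set(host, files):
    """Select files changed since this host's last full backup.

    `files` is a list of (path, mtime) tuples; mtime-based change
    detection is an assumption -- ctime or checksums may be safer.
    """
    cutoff = last_full[host]
    return [path for path, mtime in files if mtime > cutoff]

files = [
    ("/etc/passwd", datetime(2024, 3, 1)),
    ("/etc/hosts", datetime(2023, 12, 1)),
]
print(incremental_set("web1", files))  # only /etc/passwd is newer
```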

Keep backups for different times. E.g., a full backup could be kept forever, a level-1 incremental backup for 2 months, and a level-2 incremental backup for 2 weeks until it is overwritten.
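The retention rules could be expressed as a per-level policy table; a minimal sketch (levels and periods taken from the example above, API names invented):

```python
from datetime import datetime, timedelta

# Hypothetical retention policy per backup level (None = keep forever).
RETENTION = {0: None, 1: timedelta(days=60), 2: timedelta(days=14)}

def is_expired(level, made_at, now):
    """A backup expires once its level's retention period has passed."""
    keep = RETENTION[level]
    if keep is None:
        return False          # full backups are kept forever
    return now - made_at > keep

now = datetime(2024, 3, 1)
print(is_expired(0, datetime(2020, 1, 1), now))  # full: never expires
print(is_expired(2, datetime(2024, 2, 1), now))  # level 2, 29 days old
```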

Redundant backups. Make a backup of a file if there are fewer than n backups of the file.

1.2. Supported hardware

Tape-like devices (support serial read, write. Block/File structure. Can seek forward/backward to block/file boundary. Writing truncates tape).

Disk-like devices (random access. Block structure). Can be used like tape-like devices or through a file system.

Filesystems. (can be mounted)

Other (need special software to access).

Media change: SCSI Libraries, manual change.
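To keep the rest of the suite independent of the device type, the tape-like and disk-like cases could hide behind one common interface. A minimal Python sketch (all names invented, not a committed design):

```python
import abc
import io

class Medium(abc.ABC):
    """Minimal common interface for backup media (names are hypothetical)."""

    @abc.abstractmethod
    def write_block(self, data: bytes) -> None: ...

    @abc.abstractmethod
    def read_block(self, size: int) -> bytes: ...

    @abc.abstractmethod
    def seek_block(self, n: int) -> None:
        """Tape-like media can only seek to block/file boundaries."""

class FileMedium(Medium):
    """Disk file used like a tape: serial blocks, seek by block number."""

    BLOCK = 512

    def __init__(self):
        self.buf = io.BytesIO()

    def write_block(self, data):
        # Pad to block size; on a real tape, a write would truncate
        # everything after the current position.
        self.buf.write(data.ljust(self.BLOCK, b"\0"))

    def read_block(self, size=BLOCK):
        return self.buf.read(size)

    def seek_block(self, n):
        self.buf.seek(n * self.BLOCK)

m = FileMedium()
m.write_block(b"hello")
m.seek_block(0)
print(m.read_block()[:5])  # b'hello'
```

A real tape driver and a filesystem-backed medium would both implement the same three operations, so the media modules (section 2.6) stay interchangeable.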

1.3. Supported file systems

Linux supports several different file systems, many of which have either non-Unix semantics or are extensions of the Unix file system (ext2 attributes, FAT attributes, POSIX ACLs, ..., large files, sparse files). If we add support for other OSs, this will get worse (WinNT ACLs, ...).

We need an extensible, backwards-compatible storage format.

Some systems have snapshot capabilities (i.e., a consistent snapshot of the filesystem is made and can be mounted read-only; updates to the rw-mounted copy of the filesystem are written to a log as long as the snapshot exists). This can be used to create a consistent backup of a file system.

DB backups, etc: It should be possible to integrate online DB backup schemes. At the very least it must be possible to shut down the database before backup and restart it afterwards.
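The shutdown/restart fallback could be a pair of hooks around the backup proper; a sketch (the commands shown are placeholders, not real paths):

```python
import subprocess

def backup_with_hooks(pre_cmd, backup_fn, post_cmd):
    """Shut the database down before backup and restart it afterwards.

    `pre_cmd`/`post_cmd` are argv lists (e.g. an init script with
    "stop"/"start" arguments). The post hook runs even if the backup
    itself fails, so the database is never left down.
    """
    subprocess.run(pre_cmd, check=True)
    try:
        return backup_fn()
    finally:
        subprocess.run(post_cmd, check=True)

# Usage sketch (hypothetical commands and backup function):
# backup_with_hooks(["/etc/init.d/mysql", "stop"],
#                   lambda: run_backup("db1"),
#                   ["/etc/init.d/mysql", "start"])
```

An online backup scheme would plug in at the same point, replacing the stop/start commands with whatever the database vendor provides.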

1.4. Scalability

The system should scale from a single system with a few generations of backups to hundreds of systems with many generations (e.g., monthly full backups kept for years).

Counters and tables must not be restricted to 32 bits. (This may actually be solved by using different databases: if you can get by with at most 2 billion file versions, use MySQL; if you need more, use Oracle.)

It may be necessary to export and import parts of the database to secondary storage. I envision simple text files as an export format.

Tapes may have to be imported into a database. This should work faster than a sequential read of the whole tape.
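One way to make tape import fast is to write a catalog at the end of each tape, so the importer reads only a fixed-size trailer plus the catalog instead of scanning all the data. A sketch with an in-memory "tape" (the format is invented for illustration):

```python
import json
import struct

def finish_tape(data_blocks):
    """Write data blocks followed by a JSON catalog and a fixed-size
    trailer giving the catalog's offset (illustrative format only)."""
    tape = bytearray()
    index = []
    for name, payload in data_blocks:
        index.append({"name": name, "offset": len(tape), "size": len(payload)})
        tape += payload
    catalog = json.dumps(index).encode()
    tape += catalog
    tape += struct.pack(">Q", len(tape) - len(catalog))  # catalog offset
    return bytes(tape)

def import_tape(tape):
    """Read only the trailer and catalog -- no sequential scan."""
    (offset,) = struct.unpack(">Q", tape[-8:])
    return json.loads(tape[offset:-8])

tape = finish_tape([("a.txt", b"AAAA"), ("b.txt", b"BB")])
print(import_tape(tape))
```

On a real tape the catalog would sit in the last file before EOT, so the drive can seek straight to it.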

1.5. Restore scenarios

1.5.1. Single File Restore

User needs a file restored from backup. Name, date, and location of backup are only known vaguely, but it is needed now.

Need efficient search methods in the metadata database. Need fast access to single files on media (no sequential search!). Need the possibility to restore files to a different location.
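To get a feel for the kind of query this implies, here is a toy metadata table in SQLite, searched by vague name pattern and date range (the schema is invented; the real data model is TBD, see section 2.2):

```python
import sqlite3

# Toy schema: one row per stored file version, pointing at its medium.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE versions (
    path TEXT, host TEXT, backup_date TEXT, medium TEXT, offset INTEGER)""")
db.execute("CREATE INDEX idx_path ON versions(path)")
db.executemany("INSERT INTO versions VALUES (?,?,?,?,?)", [
    ("/home/alice/thesis.tex", "ws1", "2024-01-10", "tape-007", 1234),
    ("/home/alice/thesis.tex", "ws1", "2024-02-10", "tape-012", 99),
])

# Vague query: name pattern plus rough date range, newest first.
# The stored offset allows direct positioning on the medium.
rows = db.execute("""SELECT backup_date, medium, offset FROM versions
    WHERE path LIKE ? AND backup_date >= ?
    ORDER BY backup_date DESC""", ("%thesis%", "2024-01-01")).fetchall()
print(rows[0])  # most recent matching version, with medium and offset
```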

1.5.2. Group restore

Like Single File Restore, but a group of related files must be restored.

Needs repeated searches to include/exclude files, recursive inclusion, and manual editing of the list. Files may be on different media, so a preview of the needed media is required.
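The include/exclude search plus media preview could look like this (catalog contents and glob semantics are illustrative only):

```python
from fnmatch import fnmatch

# Hypothetical catalog rows: (path, medium the file version lives on).
catalog = [
    ("/home/alice/paper.tex", "tape-01"),
    ("/home/alice/figs/plot.eps", "tape-02"),
    ("/home/alice/tmp/cache.dat", "tape-01"),
]

def restore_set(include, exclude):
    """Build a restore list from include/exclude glob patterns and
    report which media will be needed (preview before mounting)."""
    picked = [(p, m) for p, m in catalog
              if any(fnmatch(p, pat) for pat in include)
              and not any(fnmatch(p, pat) for pat in exclude)]
    media = sorted({m for _, m in picked})
    return [p for p, _ in picked], media

files, media = restore_set(["/home/alice/*"], ["*/tmp/*"])
print(files, media)
```

In the real UI the user would iterate: adjust patterns, re-run the search, hand-edit the list, and only then mount the previewed media.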

1.5.3. Disaster recovery

We have somewhat conflicting goals here. If we assume a network of even moderate size, we can assume that the backup server will always be up (we could have two of them), which makes it relatively easy to make a recovery floppy. If we need to restore a standalone system, that may need a larger boot medium (but then, standalone systems usually have a CD-ROM from which they were installed).

1.5.3.1. OS still exists and we have access to a working backup server

The OS and the backup software still exist, but large portions of the filesystem (e.g, all home directories) have been wiped/corrupted. We still have access to a working backup server.

This is really just the same as Group restore above.

1.5.3.2. Restore from scratch over the network.

The system has been completely wiped (e.g., disk crash on a workstation). The system can boot from a rescue medium, but it may be small (e.g., only a floppy, no CD-ROM). A backup server is still available.

Need to boot from a recovery system (on floppy or CD), get a network connection (DHCP?), repartition (the partitioning may need to change because the new disk has a different size than the old one), format, and restore over the network (DHCP client?).

1.5.3.3. OS still exists, but backup server is down.

E.g., the affected system is the backup server and we trashed the database.

Restore software if needed. Restore directly from media.

1.5.3.4. Restore from scratch locally

Like Restore from scratch over the network above, except that we have to restore from local media, which means that the rescue disk must include the means to access tape/CD/... devices.

1.6. Networking/Security considerations

Systems to be backed up may be on the "wrong" side of a firewall (e.g. the web server is outside of the firewall, the backup server inside, but we still want to make a backup of the web server).

This means that either all connections must be initiated from "inside" or - preferably - that we use defined ports (or at least port ranges) for everything.

If connections can be initiated by "untrusted" hosts, they may need to authenticate themselves.

Users may also need to authenticate themselves if they need to restore their own files - and there must be a way to identify "their" files.

Encryption of connections to prevent sniffing.

Encryption of content to prevent unauthorized restore.

Is it possible to have a design which doesn't need a central server?

1.7. Usability considerations

(not sure whether this fits better anywhere else)

Tapes should be labelled to prevent accidental overwriting and so that they can be found again - a) in a heap of tapes by a human operator, b) by the software in a tape library.

User interfaces: GUI (web?), commandline.

Status/Error reporting: Syslog or through database? (I vote for the latter)

2. Implementation ideas

2.1. How to store backups on media

tar, cpio: Simple and widely supported formats. Can be restored without lbs. Have restrictions on file names, file sizes, and non-Unix metadata (e.g., ext2 attributes or ACLs).

YABF (Yet Another Backup Format): We could invent our own format to overcome the restrictions of tar and cpio (I think we need to). Care must be taken to keep the format extensible. Maybe some other backup software has a format we can use?
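One common way to keep a format extensible and backwards compatible is tagged (type, length, value) fields, where readers simply skip tags they don't know. A sketch (tag numbers and framing invented, not a YABF proposal):

```python
import struct

# Tagged (type, length, value) fields; tag numbers are made up.
T_PATH, T_SIZE, T_ACL = 1, 2, 3

def pack_record(fields):
    out = b""
    for tag, value in fields:
        out += struct.pack(">HI", tag, len(value)) + value
    return out

def unpack_record(data, known_tags):
    fields, pos = [], 0
    while pos < len(data):
        tag, length = struct.unpack_from(">HI", data, pos)
        pos += 6
        value = data[pos:pos + length]
        pos += length
        if tag in known_tags:          # silently skip unknown extensions
            fields.append((tag, value))
    return fields

rec = pack_record([(T_PATH, b"/etc/hosts"), (T_ACL, b"acl-blob")])
# An old reader that only knows T_PATH and T_SIZE still parses the record:
print(unpack_record(rec, {T_PATH, T_SIZE}))
```

New metadata (ext2 attributes, WinNT ACLs, ...) then becomes a new tag rather than a format revision.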

2.2. How to store backup meta data

In a database. Data model TBD.

2.3. How to store configuration

Plain text files: Easy to edit. Could be either complex (e.g., perl code) for maximal flexibility, or simple (for UI and automated frontends), but not both at the same time!
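For illustration, a simple plain-text config of the second kind might look like this (syntax and keys entirely invented):

```text
# Hypothetical lbs.conf sketch -- key/value lines in named sections,
# simple enough for both humans and automated frontends to edit.
[list home-dirs]
hosts    = ws1 ws2 db1
include  = /home
exclude  = */tmp/*
schedule = full:monthly incr:daily
keep     = full:forever level1:60d level2:14d

[device drive0]
type    = tape
node    = /dev/nst0
library = scsi-lib-1
```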

In database: Needs special tools to edit. The "grammar" is inherent in the DB structure, so frontends always know how to edit the config. Chicken-and-egg problem: how do you store the DB-related config?

2.4. Implementation platform

OS: Linux first. Porting to Unixoid systems should be trivial; Windows and other systems maybe later.

Programming language: Perl seems to be rather popular. It is relatively large, though - can we fit a rescue system on a floppy? C is nice and small, but hard to program correctly. PHP was also suggested. (theoretically we can implement different parts in different languages)

Database: should be portable. MySQL is only intended as a first target. Interest in PostgreSQL, Adabas, DB2, and Oracle has been aired.

2.5. Network protocols

(I am using Omniback terminology here, although it isn't clear yet that we will be using the same architecture: a disk agent is a program which accesses the files on a filesystem to be backed up or restored; a media agent is a program accessing the backup medium (e.g., the tape device or CD writer) for either backup or restore; the cell server accesses the metadata database and tells the disk and media agents what to do; the client only talks to the cell server.)

All these agents, servers and clients have to communicate somehow over the network. We need to specify protocols for that.

NDMP is a standardized (well, sort of: there have been 5 revisions in about as many years) protocol supported by many devices (NAS boxes, tape libraries, etc.). If we could use it for communication between the disk and media agents (and probably the cell server), we would automatically have support for these devices. (It may be a little complex to implement, though.)

2.6. Modularization

Kernel functions (read configuration, put everything together)

Strategy modules (act in a certain way on the data)

Data modules (plain data, text data, DBs, ...)

Media modules (store the data in whatever form suits the medium best)

Interface modules (present the user interface in different ways)

Any better ideas on this? Can the strategy be divided from the data?

2.7. Scheduling

Cron or our own daemon? How do we implement "start backup B after backup A has finished"?
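Whichever component runs the schedule, "B after A" amounts to dependency resolution; a minimal sketch (API invented):

```python
# Tiny dependency resolver for "start backup B after backup A finished".
def run_order(jobs, deps):
    """jobs: list of job names; deps: {job: [jobs it must wait for]}."""
    done, order = set(), []
    while len(order) < len(jobs):
        ready = [j for j in jobs if j not in done
                 and all(d in done for d in deps.get(j, []))]
        if not ready:
            raise RuntimeError("dependency cycle in backup schedule")
        for j in ready:
            order.append(j)        # a real daemon would start job j here
            done.add(j)
    return order

print(run_order(["A", "B", "C"], {"B": ["A"], "C": ["A"]}))  # A runs first
```

Cron alone cannot express this; it would need either generous time gaps between jobs or a small daemon of our own doing exactly this bookkeeping.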
