环境描述
1.OS
CentOS Linux release 7.2.1511 (Core) X64
2.PostgreSQL
PostgreSQL 9.6.1
3.pg_rman
pg_rman-1.3.3-pg96.tar.gz v1.3.3
注意:请下载版本对应的源码包。
https://github.com/ossc-db/pg_rman/releases/download/v1.3.3/pg_rman-1.3.3-pg96.tar.gz
pg_rman-1.3.3.tar.gz(此源码编译过程中报错)
系统包
zlib-devel
二、pg_rman安装
1.安装pg_rman
root用户登录
export PATH=/opt/pgsql/9.6.1/bin:$PATH
export LD_LIBRARY_PATH=/opt/pgsql/9.6.1/lib
export MANPATH=/opt/pgsql/9.6.1/share/man:$MANPATH
# tar zxvf pg_rman-9_6_STABLE.tar.gz
# cd pg_rman-9_6_STABLE/
# make
......
......
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 backup.o catalog.o data.o delete.o dir.o init.o parray.o pg_rman.o restore.o show.o util.o validate.o xlog.o pgsql_src/pg_ctl.o pgut/pgut.o pgut/pgut-port.o -L/opt/pgsql/9.6.1/lib -lpgcommon -lpgport -L/opt/pgsql/9.6.1/lib -lpq -L/opt/pgsql/9.6.1/lib -Wl,--as-needed -Wl,-rpath,'/opt/pgsql/9.6.1/lib',--enable-new-dtags -lpgcommon -lpgport -lz -lreadline -lrt -lcrypt -ldl -lm -o pg_rman
# make install
/usr/bin/mkdir -p '/opt/pgsql/9.6.1/bin'
/usr/bin/install -c pg_rman '/opt/pgsql/9.6.1/bin'
#
2.安装验证
su - postgres
$ pg_rman --version
pg_rman 1.3.3
3.配置数据库参数
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /pg_arclog/%f && cp %p /pg_arclog/%f'
--- root user
mkdir /backup_pg_rman /pg_arclog
chown -R postgres:postgres /backup_pg_rman
chown -R postgres:postgres /pg_arclog
--- postgresql
# pg_rman init -B $backup_dir
三、备份恢复测试
1.备份数据(full<0> + incremental<1>)
# full
export PGDATA=/pgdata96
export BACKUP_PATH=/backup_pg_rman
$ echo $PGDATA
/pgdata96
$ echo $BACKUP_PATH
/backup_pg_rman
$
--- init backup dir: pg_rman init -B $backup_dir -D $PGDATA(当不配置环境变量时,手工指定,注意路径末尾不添加'/'结束符)
$ pg_rman init
INFO: ARCLOG_PATH is set to '/pg_arclog'
INFO: SRVLOG_PATH is set to '/pgdata96/pg_log'
$
$ cat $BACKUP_PATH/pg_rman.ini
ARCLOG_PATH='/pg_arclog'
SRVLOG_PATH='/pgdata96/pg_log'
--- full backup
$ pg_rman backup --backup-mode=full --with-serverlog --progress
INFO: copying database files
Processed 1172 of 1172 files, skipped 0
INFO: copying archived WAL files
Processed 3 of 3 files, skipped 0
INFO: copying server log files
Processed 4 of 4 files, skipped 0
INFO: backup complete
INFO: Please execute 'pg_rman validate' to verify the files are correctly copied.
--- validate backup
$ pg_rman validate, status: done
INFO: validate: "2017-03-06 16:43:39" backup, archive log files and server log files by CRC
INFO: backup "2017-03-06 16:43:39" is valid
--- show backup, status: ok
$ pg_rman show
==========================================================
StartTime Mode Duration Size TLI Status
==========================================================
2017-03-06 16:43:39 FULL 0m 58MB 1 OK
$
--- incremental
$ pg_rman backup --backup-mode=incremental --with-serverlog --progress
INFO: copying database files
Processed 1172 of 1172 files, skipped 1115
INFO: copying archived WAL files
Processed 48 of 48 files, skipped 3
INFO: copying server log files
Processed 4 of 4 files, skipped 3
INFO: backup complete
INFO: Please execute 'pg_rman validate' to verify the files are correctly copied.
$
--- validate backup
$ pg_rman validate
INFO: validate: "2017-03-06 17:04:45" backup, archive log files and server log files by CRC
INFO: backup "2017-03-06 17:04:45" is valid
$
--- show, status: ok
$ pg_rman show detail
============================================================================================================
StartTime Mode Duration Data ArcLog SrvLog Total Compressed CurTLI ParentTLI Status
============================================================================================================
2017-03-06 17:04:45 INCR 0m 401MB 738MB 27kB 1136MB false 1 0 OK
2017-03-06 16:43:39 FULL 0m 30MB 33MB 206kB 58MB false 1 0 OK
$
2.模拟灾难恢复
1).删除PGDATA 目录下所有文件
安全停止数据库,删除文件
$ pg_ctl stop -m immediate -D /pgdata96/
$ cd /pgdata96
$ rm -rf *.*
2).恢复备份
--- postgres user
$ export PGDATA=/pgdata96
$ export BACKUP_PATH=/backup_pg_rman
$ pg_rman restore
WARNING: pg_controldata file "/pgdata96/global/pg_control" does not exist
WARNING: pg_controldata file "/pgdata96/global/pg_control" does not exist
INFO: the recovery target timeline ID is not given
INFO: use timeline ID of latest full backup as recovery target: 1
INFO: calculating timeline branches to be used to recovery target point
INFO: searching latest full backup which can be used as restore start point
INFO: found the full backup can be used as base in recovery: "2017-03-06 16:43:39"
INFO: copying online WAL files and server log files
INFO: clearing restore destination
INFO: validate: "2017-03-06 16:43:39" backup, archive log files and server log files by SIZE
INFO: backup "2017-03-06 16:43:39" is valid
INFO: restoring database files from the full mode backup "2017-03-06 16:43:39"
INFO: searching incremental backup to be restored
INFO: validate: "2017-03-06 17:04:45" backup, archive log files and server log files by SIZE
INFO: backup "2017-03-06 17:04:45" is valid
INFO: restoring database files from the incremental mode backup "2017-03-06 17:04:45"
INFO: searching backup which contained archived WAL files to be restored
INFO: backup "2017-03-06 17:04:45" is valid
INFO: restoring WAL files from backup "2017-03-06 17:04:45"
INFO: restoring online WAL files and server log files
INFO: generating recovery.conf
INFO: restore complete
HINT: Recovery will start automatically when the PostgreSQL server is started.
$
3).启动数据库验证数据
# /etc/init.d/postgresql start
Starting PostgreSQL: ok
#
切换至postgres用户,然后验证数据
基于时间点恢复
建立测试数据
testdb=# create table tbl(id int primary key, first varchar(20),second varchar(20));
CREATE TABLE
testdb=# INSERT INTO tbl VALUES(generate_series(1,1000000), 'first'||(random()*(10^3))::integer, 'second'||(random()*(10^3))::integer);
INSERT 0 1000000
testdb=#
建立全备份
--- postgres user
$ pg_rman backup --backup-mode=full --with-serverlog --progress
INFO: copying database files
Processed 1172 of 1172 files, skipped 0
INFO: copying archived WAL files
Processed 27 of 27 files, skipped 0
INFO: copying server log files
Processed 1 of 1 files, skipped 0
INFO: backup complete
INFO: Please execute 'pg_rman validate' to verify the files are correctly copied.
$ pg_rman show
==========================================================
StartTime Mode Duration Size TLI Status
==========================================================
2017-03-07 16:57:33 FULL 0m 433MB 4 DONE
$ pg_rman validate
INFO: validate: "2017-03-07 16:57:33" backup, archive log files and server log files by CRC
INFO: backup "2017-03-07 16:57:33" is valid
[postgres@localhost ~]$ pg_rman show
==========================================================
StartTime Mode Duration Size TLI Status
==========================================================
2017-03-07 16:57:33 FULL 0m 433MB 4 OK
$
drop 表
testdb=# drop table tbl;
DROP TABLE
testdb=# \q
停止数据库
--- root user
# /etc/init.d/postgresql stop
恢复数据库到指定时间
$ pg_rman restore --recovery-target-time '2017-03-07 16:58:33'
INFO: the recovery target timeline ID is not given
INFO: use timeline ID of current database cluster as recovery target: 4
INFO: calculating timeline branches to be used to recovery target point
INFO: searching latest full backup which can be used as restore start point
INFO: found the full backup can be used as base in recovery: "2017-03-07 16:57:33"
INFO: copying online WAL files and server log files
INFO: clearing restore destination
INFO: validate: "2017-03-07 16:57:33" backup, archive log files and server log files by SIZE
INFO: backup "2017-03-07 16:57:33" is valid
INFO: restoring database files from the full mode backup "2017-03-07 16:57:33"
INFO: searching incremental backup to be restored
INFO: searching backup which contained archived WAL files to be restored
INFO: backup "2017-03-07 16:57:33" is valid
INFO: restoring WAL files from backup "2017-03-07 16:57:33"
INFO: restoring online WAL files and server log files
INFO: generating recovery.conf
INFO: restore complete
HINT: Recovery will start automatically when the PostgreSQL server is started.
$
启动数据库
--- root user
# /etc/init.d/postgresql start
验证数据
--- postgres user
$ psql testdb
psql (9.6.1)
Type "help" for help.
testdb=# \dt
List of relations
Schema | Name | Type | Owner
--------+------+-------+----------
public | tbl | table | postgres
(1 row)
testdb=# select count(*) from tbl;
count
---------
1000000
(1 row)
testdb=# \q
异常停止数据恢复
描述:当数据库没有成功执行检查点完成,恢复时可能会丢失数据,错误排查
现象:启动数据库失败时
$ more postgresql-Mon.log
2017-03-06 17:20:47 CST [3240]: [1-1] user=,db= LOG: database system was interrupted; last known up at 2017-03-06 17:04:51 CST
2017-03-06 17:20:47 CST [3240]: [2-1] user=,db= LOG: starting archive recovery
2017-03-06 17:20:47 CST [3240]: [3-1] user=,db= LOG: invalid primary checkpoint record
2017-03-06 17:20:47 CST [3240]: [4-1] user=,db= LOG: invalid secondary checkpoint record
2017-03-06 17:20:47 CST [3240]: [5-1] user=,db= PANIC: could not locate a valid checkpoint record
2017-03-06 17:20:47 CST [3238]: [3-1] user=,db= LOG: startup process (PID 3240) was terminated by signal 6: Aborted
2017-03-06 17:20:47 CST [3238]: [4-1] user=,db= LOG: aborting startup due to startup process failure
2017-03-06 17:20:47 CST [3238]: [5-1] user=,db= LOG: database system is shut down
2017-03-06 17:21:23 CST [3269]: [1-1] user=,db= LOG: database system was interrupted; last known up at 2017-03-06 17:04:51 CST
2017-03-06 17:21:23 CST [3269]: [2-1] user=,db= LOG: starting archive recovery
2017-03-06 17:21:23 CST [3269]: [3-1] user=,db= LOG: invalid primary checkpoint record
2017-03-06 17:21:23 CST [3269]: [4-1] user=,db= LOG: invalid secondary checkpoint record
2017-03-06 17:21:23 CST [3269]: [5-1] user=,db= PANIC: could not locate a valid checkpoint record
2017-03-06 17:21:23 CST [3267]: [3-1] user=,db= LOG: startup process (PID 3269) was terminated by signal 6: Aborted
2017-03-06 17:21:23 CST [3267]: [4-1] user=,db= LOG: aborting startup due to startup process failure
2017-03-06 17:21:23 CST [3267]: [5-1] user=,db= LOG: database system is shut down
$
处理步骤说明:
重置事务日志
仅保留备份时数据
$ pg_resetxlog -f /pgdata96
Transaction log reset
$
然后启动数据库,验证部分数据