Appendix III. How to Backup
Basic
See "Backup your work - Basic" in Getting Started.
Advanced: git, rsync & crontab
Tips
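Storage on the lab machines uses RAID or similar disk-array technology, so no data are lost even if two disks fail at the same time (which is very unlikely). Based on years of experience, ordinary data generally do not need a separate backup.
For code and other important program files, we recommend syncing to GitHub with git frequently (for example, daily). You can write a script and use crontab to commit a backup to GitHub automatically every day; commit manually at each revision; or use git hooks or third-party tools such as git-auto-sync or git-sync-on-inotify to sync changed files automatically (see 1.1) Automatically sync with git). Any of these works; choose according to personal preference.
For very important and relatively large data files, use rsync to back up to a different storage (for example another storage server, a portable hard drive, or the lab's Synology NAS; ideally the backup device sits in a different building). In practice, we no longer recommend scheduled daily or weekly automatic backups of large data files, because they proved to be of little use. Instead, back up the most important data manually, at a frequency that matches the progress of your project.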
1) git - backup code
1.1) Automatically sync with git:
Purpose: Automatically sync local changes to a remote GitHub repository.
Methods:
Git Hooks: you can set up Git hooks (e.g., post-commit, post-merge) in your local repository to automatically push changes to the remote GitHub repository after commits or pulls. This requires scripting and careful configuration to avoid unintended pushes.
External Tools: tools like git-auto-sync (as found on GitHub) can run as background daemons, monitoring local repositories for changes and automatically syncing them with the remote.
Scheduled Tasks using crontab: set up a scheduled task to periodically execute git pull and git push commands, pulling remote changes and pushing local commits.
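As an illustration of the Git hooks method, here is a minimal sketch of a helper that installs a post-commit hook. The function name install_push_hook is hypothetical, and the sketch assumes the remote is called origin and that a failed push should only print a warning rather than block anything.

```shell
#!/bin/bash
# Sketch: install a post-commit hook that pushes after every commit.
# Assumptions: the remote is named "origin"; a failed push only warns.

install_push_hook() {
    local repo="$1"                       # path to an existing git repository
    local hook="$repo/.git/hooks/post-commit"
    mkdir -p "$repo/.git/hooks"
    # The quoted 'EOF' keeps the hook body literal (no expansion here).
    cat > "$hook" <<'EOF'
#!/bin/sh
# Runs after each successful commit: push the current branch to origin.
branch=$(git rev-parse --abbrev-ref HEAD)
git push origin "$branch" || echo "post-commit: push failed, retry manually" >&2
EOF
    chmod +x "$hook"                      # git ignores hooks that are not executable
}
```

Calling install_push_hook with the path of a repository writes the hook into that repository only; deleting .git/hooks/post-commit turns the automatic push off again.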
1.2) Setup git in Linux/Unix/Mac
Setup
set up ssh-key (optional)
add a setting file: ~/.gitconfig
[user]
email =[[email protected]]
name = Shared
Clone/download an existing repository on GitHub
git clone git@github.com:lulab/RNAfinder_Server.git
git clone https://github.com/xug15/test.git
Create a new repository
echo "# test" >> README.md
git init
git add README.md
git commit -m "first commit"
git remote add origin https://github.com/xug15/test.git
git push -u origin master
Sync local files with github repo
Pull (update):
git pull origin master
git log -n 2 # look at the last two log entries.
Add:
git add examples/
git commit -m '20190705v1'
git push origin
Change:
git commit -a -m '20190705v1' # -a stages all modified tracked files before committing
git push origin
Remove:
git rm *.file
git commit -m '20190705v1'
git push origin
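Since code goes to git while large data files are better handled with rsync, it can help to look for oversized files before running git add. The sketch below is one possible check; the function name list_large_files and the default 50 MB threshold are assumptions chosen for illustration (GitHub rejects individual files larger than 100 MB).

```shell
#!/bin/bash
# Sketch: list files above a size threshold so they can be kept out of git.
# The threshold is given in KB; 51200 KB (50 MB) is an arbitrary default.

list_large_files() {
    local dir="${1:-.}" limit_kb="${2:-51200}"
    # Skip the .git directory itself; print regular files above the threshold.
    find "$dir" -path "$dir/.git" -prune -o -type f -size +"${limit_kb}"k -print
}
```

Running list_large_files . inside a repository prints nothing when every file is below the threshold; any path it does print is a candidate for .gitignore and for the rsync backup described in section 2.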
1.3) git-sync.sh:
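#!/bin/bash
# A bash script to sync a GitHub repo; run it from inside the repository
time=$(date)
echo "$time"
git add -u .            # stage modifications and deletions of tracked files
git add *               # stage new files (the shell glob skips dotfiles)
git commit -m "$time"   # double quotes, so that $time is expanded
git push origin master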
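When the script is started by cron it does not run inside the repository, and committing with nothing staged returns an error. The following is a minimal sketch of a variant that handles both cases; the function name sync_repo, its arguments and the commit message format are assumptions, not part of the original script.

```shell
#!/bin/bash
# Sketch of a cron-friendly sync function (hypothetical name and arguments).
# Usage: sync_repo /path/to/repo [branch]

sync_repo() (
    # The body runs in a subshell, so the caller's working directory is untouched.
    repo="$1"
    branch="${2:-master}"
    cd "$repo" || exit 1                   # cron does not start inside the repository
    git add -A                             # stage new, modified and deleted files
    if git diff --cached --quiet; then     # exit status 0 means nothing is staged
        echo "$(date): nothing to commit"
        exit 0
    fi
    git commit -m "auto-sync $(date '+%Y-%m-%d %H:%M:%S')" || exit 1
    git push origin "$branch"
)
```

For example, sync_repo /home/john/project master commits and pushes only when something changed, so the log shows either a push or a "nothing to commit" line for each run.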
2) rsync - backup large data files
2.1) Set up an SSH key (optional; only needed for remote backups)
Purpose: log in to the remote server over ssh without typing a password.
You do not need an SSH key if you only back up files between local directories. In that case, go directly to step 2.2.
(a) Generate SSH key
ssh-keygen -t rsa -b 2048
(b) Copy your keys to the target server
ssh-copy-id user@server_ip # if the server uses a non-default ssh port, add for example: -p 2200
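Before relying on the key in an unattended backup, it is worth confirming that login works without a password prompt. A minimal sketch, assuming a hypothetical helper named check_key_login; BatchMode=yes makes ssh fail instead of asking for a password, and ConnectTimeout limits the wait.

```shell
#!/bin/bash
# Sketch: verify that key-based login works (hypothetical helper).
# Usage: check_key_login user@server_ip [port]

check_key_login() {
    local target="$1" port="${2:-22}"
    # "true" is run on the remote side; only the exit status matters.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 -p "$port" "$target" true; then
        echo "key login OK: $target"
    else
        echo "key login FAILED: $target" >&2
        return 1
    fi
}
```

If the check fails, repeat steps (a) and (b) and make sure the same port is passed to ssh-copy-id and to the check.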
2.2) Prepare a backup script with rsync
(a) First you need to prepare some backup dirs
mkdir /home/john/backup_local # prepare a backup dir for some local files
mkdir /home/john/backup_remote # prepare a backup dir for some remote files
(b) Then, write a backup script, for example ~/backup.sh
#!/bin/bash
#0. Define the parameters of rsync
RSYNC="rsync --stats --compress --recursive --times --perms --links --delete --max-size=100M --exclude-from=/home/john/excluded_file_list.txt"
#A. Local backup
echo "1. Backup of /home/john/data start at:"
date
$RSYNC /home/john/data/ /home/john/backup_local/
echo "Backup end at:"
date
#B. Remote backup
echo "2. Backup 166.178.56.20:/home/lulab/john/data/ start at:"
date
$RSYNC [email protected]:/home/lulab/john/data/ /home/john/backup_remote/
echo "Backup end at:"
date
(c) Finally, make your backup.sh executable
chmod +x ~/backup.sh
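Because the backup script uses --delete and --exclude-from, a preview run is a cautious first step. The sketch below creates an exclude list and runs rsync with --dry-run; the helper names and the three patterns are hypothetical examples, not the lab's actual exclude list.

```shell
#!/bin/bash
# Sketch: create an exclude list and preview a backup before running it for real.
# The patterns below are illustrative assumptions only.

make_exclude_list() {
    # One pattern per line; rsync matches them against paths inside the source.
    cat > "$1" <<'EOF'
*.tmp
.cache/
.git/
EOF
}

preview_backup() {
    local src="$1" dest="$2" exclude="$3"
    # --dry-run lists what would change without copying or deleting anything.
    rsync --dry-run --itemize-changes --recursive --times --perms --links \
          --delete --exclude-from="$exclude" "$src" "$dest"
}
```

For instance, make_exclude_list /home/john/excluded_file_list.txt followed by preview_backup /home/john/data/ /home/john/backup_local/ /home/john/excluded_file_list.txt shows which files would be copied or deleted; once the list looks right, run ~/backup.sh itself.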
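Parameters of rsync (use man rsync to see more details):
| Parameter | Meaning |
| --- | --- |
| -a | Archive mode: recurse into directories and preserve symlinks, permissions, modification times, owner and group (equivalent to -rlptgoD) |
| --delete | Delete files on the receiving side that no longer exist on the sending side |
| -q | Quiet mode: suppress non-error output |
| -z | Compress file data during the transfer |
| -H | Preserve hard links |
| -t | Preserve modification times. By default rsync compares file size and modification time on both sides and skips files that match |
| -I | Ignore times: do not skip files that match in size and modification time, so every file is transferred again |
| --port=PORT | Port number (used when connecting to an rsync daemon; for transfers over ssh, pass the port with -e "ssh -p PORT") |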
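The --port option applies to rsync daemon connections; when rsync runs over ssh on a non-default port, the port is usually passed through the -e option instead. A minimal sketch, with a hypothetical helper name and placeholder arguments:

```shell
#!/bin/bash
# Sketch: pull a remote directory over ssh on a non-default port.
# Usage: pull_over_ssh_port PORT user@host:/remote/dir/ /local/dir/

pull_over_ssh_port() {
    local port="$1" src="$2" dest="$3"
    # -a: archive mode, -z: compress; -e sets the remote shell command.
    rsync -az -e "ssh -p $port" "$src" "$dest"
}
```

With port 2200, as in the ssh-copy-id example, the call would be pull_over_ssh_port 2200 user@server_ip:/remote/dir/ /home/john/backup_remote/, where the remote path is a placeholder.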
3) crontab - schedule a sync/backup task
Purpose: run scheduled jobs automatically.
You can use Crontab Generator or edit a crontab job by yourself:
crontab -e
then add these:
# minute hour day_in_month month day_in_week command
# sync code with git daily
15 3 * * * /home/john/git-sync.sh > /home/john/git-sync.log 2>&1
See 1.3) git-sync.sh above for an example of the script.
This table explains the value in each column:
| Column | Meaning |
| --- | --- |
| Column 1 | Minute, 0 to 59 |
| Column 2 | Hour, 0 to 23 (0 means midnight) |
| Column 3 | Day of month, 1 to 31 |
| Column 4 | Month, 1 to 12 |
| Column 5 | Day of week, 0 to 7 (0 and 7 both mean Sunday) |
| Column 6 | Command to run |
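A crontab entry can also be added without opening an editor, which is convenient when setting up several machines. The sketch below is one way to do that; the function name add_cron_job is hypothetical, and it assumes a crontab implementation that reads a new table from standard input when called as crontab - (common on Linux and macOS).

```shell
#!/bin/bash
# Sketch: append a job to the current user's crontab unless it is already there.
# Usage: add_cron_job '15 3 * * * /home/john/git-sync.sh >> /home/john/git-sync.log 2>&1'

add_cron_job() {
    local entry="$1"
    local current
    current=$(crontab -l 2>/dev/null || true)   # empty if no crontab exists yet
    if printf '%s\n' "$current" | grep -qxF -- "$entry"; then
        echo "already scheduled"
        return 0
    fi
    # Re-install the existing lines followed by the new entry.
    { [ -n "$current" ] && printf '%s\n' "$current"; printf '%s\n' "$entry"; } | crontab -
}
```

In the usage line, >> appends to the log instead of overwriting it on every run, and 2>&1 also captures error messages; crontab -l shows the result.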
4) More Reading for advanced users
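《鸟哥的Linux私房菜-基础学习篇》 (VBird's Linux guide, basics volume), recommended sections of Chapter 25:
Chapter 25, Linux Backup Strategies: 25.2.2 Differential backups on top of a full backup; 25.3 VBird's backup strategy; 25.4 Disaster recovery considerations; 25.5 Key points review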