# Appendix III. How to Backup

## Basic

* See [Backup your work - Basic](https://book.ncrnalab.org/teaching/getting-started#id-2-backup-your-work) in Getting Started.

## Advanced: git, rsync & crontab

{% hint style="success" %}
**Tips**

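1. Storage on the lab machines uses RAID or similar disk-array technology, so data is not lost even if two disks fail at the same time (which is very unlikely). Based on years of experience, ordinary data generally does not need a separate backup.
2. For code and other important program files, we recommend syncing to GitHub with git frequently (for example, daily). You can write a script and use crontab to commit a backup to GitHub automatically every day; or push manually whenever you finish a revision; or use `git hooks` and third-party tools such as `git-auto-sync` or `git-sync-on-inotify` to sync changed files automatically (see below, or look up a method with an AI assistant). Any of these is fine; choose according to personal preference.
3. For data files that are both very important and large, use rsync to back them up to a different storage device (for example, another storage machine, a portable hard drive, or the lab's Synology NAS; ideally the backup device sits in a different building). From practical experience, we no longer recommend automatic daily or weekly backups of large data files, since they turn out to be of little use. Instead, back up the most important data manually, at a frequency that matches the progress of your project.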
   {% endhint %}

### 1) <mark style="color:red;">git</mark> - backup code

**You can find detailed instructions for using git in the official GitHub documentation:**

{% embed url="<https://docs.github.com/en/repositories>" %}

#### 1.1) Setup git in Linux/Unix/Mac

* **Setup:** create a configuration file, `~/.gitconfig`, with the following content:

```ini
[user]
    email = your_email_of_your_github_account
    name = your_user_name
```
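
The same settings can also be written with `git config` instead of editing the file by hand. A minimal sketch, assuming git is installed; the helper name `set_git_identity` and the sample values are placeholders for illustration:

```bash
# Hypothetical helper: write user.email and user.name into ~/.gitconfig.
set_git_identity() {
    local email="$1"
    local name="$2"
    git config --global user.email "$email"
    git config --global user.name "$name"
}

# Example (placeholder values):
#   set_git_identity "your_email@example.com" "your_user_name"
# Check the result:
#   git config --global --list
```

Calling the helper twice simply overwrites the earlier values, so it is safe to re-run.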

* **Clone/download an existing repository on GitHub**

```bash
# Replace "user" with your GitHub account name and "repo" with your repository name.
# Use ONE of the two forms below.

# HTTPS URL
git clone https://github.com/user/repo.git

# SSH URL (requires an SSH key registered on GitHub)
git clone git@github.com:user/repo.git
```
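
The two URL forms name the same repository, so a clone made over HTTPS can later be switched to SSH, for example to stop password prompts once an SSH key is in place. A small sketch in plain bash; the function name `https_to_ssh` is made up for illustration and only handles URLs of the form `https://host/user/repo.git`:

```bash
# Hypothetical helper: turn https://github.com/user/repo.git
# into git@github.com:user/repo.git
https_to_ssh() {
    local url="$1"
    local rest="${url#https://}"     # github.com/user/repo.git
    local host="${rest%%/*}"         # github.com
    local path="${rest#*/}"          # user/repo.git
    echo "git@${host}:${path}"
}

# Inside an existing clone, point "origin" at the SSH URL:
#   git remote set-url origin "$(https_to_ssh "$(git remote get-url origin)")"
#   git remote -v     # confirm the new URL
```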

{% hint style="success" %}
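Tips:

* [Using Git: pushing without typing your username and password](https://zhuanlan.zhihu.com/p/358721423) (article in Chinese; this approach is fairly tedious, so we recommend the VS Code Git integration linked below)

* **Recommended:** [Git in VS Code](https://code.visualstudio.com/docs/sourcecontrol/intro-to-git) (after you sign in to your GitHub account in VS Code, it saves your credentials and login state automatically; once you are signed in locally, you no longer need to enter your username and password, even when working with git on a remote machine through VS Code. This is the simplest method.)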
  {% endhint %}

* **Create a new repository**

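```bash
echo "# test" >> README.md      # create a README
git init                        # initialise a local repository
git add README.md
git commit -m "first commit"
# Replace the URL with that of your own (empty) GitHub repository.
git remote add origin https://github.com/xug15/test.git
# Push and set the upstream; use "main" instead of "master" if that is your branch name.
git push -u origin master
```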
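
New repositories created on GitHub use `main` as the default branch name, while a local `git init` may still create `master`, depending on your git version and the `init.defaultBranch` setting. One way to avoid guessing is to ask git for the current branch name and push that. A sketch; `current_branch` is a hypothetical helper name:

```bash
# Hypothetical helper: print the name of the branch that HEAD points to.
# Works in a fresh repository too, before the first commit.
current_branch() {
    git symbolic-ref --short HEAD
}

# Push whatever the local branch is called and set it as upstream:
#   git push -u origin "$(current_branch)"
```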

* **Sync local files with the GitHub repo**

**Pull (update)**:

```bash
git pull origin master
git log -n 2 # look at the last two log entries.
```

**Add:**

```bash
git add examples/              # stage a new directory
git commit -m '20190705v1'
git push origin
```

**Change:**

```bash
git commit -a -m '20190705v1'  # -a stages all modified tracked files before committing
git push origin
```

**Remove:**

```bash
git rm *.file                  # remove the files and stage the deletion
git commit -m '20190705v1'
git push origin
```

#### <mark style="color:green;">1.2) Tips for using git</mark>

#### <mark style="color:green;">Tip 1:  git-sync.sh</mark>

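```bash
#!/bin/bash
# A bash script to sync a GitHub repo.
# Run it from inside the repository, or add a `cd /path/to/repo` line first
# (cron starts jobs in your home directory).

time=$(date)
echo "$time"
git add -A                 # stage new, modified, and deleted files
git commit -m "$time"      # double quotes, so that $time is expanded
git push origin master
```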
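
Two things can trip up a script like this when it runs unattended: it acts on whatever directory it was started in, and `git commit` returns an error when nothing has changed. A more defensive sketch follows; the function name `git_sync`, its arguments, and the commit-message format are assumptions made for illustration:

```bash
# Hypothetical helper: commit and push all changes in REPO_DIR to BRANCH on origin.
git_sync() {
    local repo_dir="$1"
    local branch="${2:-master}"

    # Stage new, modified, and deleted files.
    git -C "$repo_dir" add -A || return 1

    # `git diff --cached --quiet` exits with 0 when nothing is staged.
    if git -C "$repo_dir" diff --cached --quiet; then
        echo "Nothing to commit in $repo_dir"
        return 0
    fi

    git -C "$repo_dir" commit -q -m "auto-sync: $(date '+%Y-%m-%d %H:%M:%S')" || return 1
    git -C "$repo_dir" push -q origin "$branch"
}

# Example (the repository path is a placeholder):
#   git_sync /home/john/my_repo master
```

Using `git -C` keeps the caller's working directory unchanged, which matters when the function is called for several repositories in a row.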

#### <mark style="color:green;">Tip 2: clone a private repo</mark>

{% hint style="warning" %}
Methods in this tip were generated by AI. They have not been tested yet.
{% endhint %}

> Cloning a private Git repository requires authentication to confirm you have access. This can be done with the GitHub Desktop app, VS Code, HTTPS with a **Personal Access Token (PAT)**, or **SSH keys**.

<mark style="color:$primary;">**Method 1. Git integrated in VS Code 【推荐】**</mark>

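* [Git in VS Code](https://code.visualstudio.com/docs/sourcecontrol/intro-to-git) (after you sign in to your GitHub account in VS Code, it saves your credentials and login state automatically; once you are signed in locally, you no longer need to enter your username and password, even when working with git on a remote machine through VS Code. This is the simplest method.)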

<mark style="color:$primary;">**Method 2. Using HTTPS with a Personal Access Token (PAT)**</mark>

* **Generate a Personal Access Token (PAT):**
  * Navigate to your Git hosting service (e.g., GitHub, GitLab) settings.
  * Find the "Developer settings" or "Access tokens" section.
  * Generate a new PAT, ensuring it has the necessary permissions (e.g., `repo` scope for GitHub) to clone repositories. Copy the generated token immediately, as it usually won't be shown again.
* **Clone the repository:**
  * Open your terminal or command prompt.
  * Navigate to the directory where you want to clone the repository.
  * Use the `git clone` command with the HTTPS URL of the repository. When prompted for a password, paste your PAT instead of your account password.

```bash
git clone https://github.com/YOUR-USERNAME/YOUR-PRIVATE-REPOSITORY.git
```
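
Typing the PAT on every `git pull` or `git push` quickly becomes tedious. Git ships a `cache` credential helper that keeps credentials in memory for a limited time. A sketch, assuming a Linux or macOS machine (the `cache` helper relies on Unix sockets); the helper name `cache_git_credentials` and the one-hour default are illustrative choices:

```bash
# Hypothetical helper: keep HTTPS credentials in memory for N seconds (default 3600).
cache_git_credentials() {
    local seconds="${1:-3600}"
    git config --global credential.helper "cache --timeout=${seconds}"
}

# Example:
#   cache_git_credentials 7200
#   git clone https://github.com/YOUR-USERNAME/YOUR-PRIVATE-REPOSITORY.git
#   # enter your username and PAT once; later commands within two hours reuse them
```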

<mark style="color:$primary;">**Method 3. Using SSH Keys**</mark>

* **Generate an SSH Key Pair:**
  * Open your terminal or command prompt.
  * Generate an SSH key pair using `ssh-keygen`:

```bash
# Mac or Linux
ssh-keygen -t rsa -b 4096
# or, with a comment that labels the key
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"

# Windows
ssh-keygen -t ed25519
```

* Follow the prompts, optionally setting a passphrase for added security.
* **Add the Public Key to your Git Hosting Service:**
  * Copy the content of your public key (usually `~/.ssh/id_rsa.pub`, or `~/.ssh/id_ed25519.pub` for an Ed25519 key).
  * Navigate to your Git hosting service settings (e.g., GitHub, GitLab).
  * Find the "SSH and GPG keys" or "SSH Keys" section.
  * Add a new SSH key, giving it a descriptive title and pasting the content of your public key.
* **Clone the repository:**
  * Open your terminal or command prompt.
  * Navigate to the directory where you want to clone the repository.
  * Use the `git clone` command with the SSH URL of the repository, like `git@github.com:user/repo.git`
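
The key-generation step above can be scripted so that an existing key is never overwritten. A minimal sketch, assuming OpenSSH's `ssh-keygen` is installed; the helper name `make_ssh_key`, the Ed25519 key type, and the empty passphrase are choices made for illustration:

```bash
# Hypothetical helper: create an Ed25519 key pair at KEY_PATH unless one already exists.
make_ssh_key() {
    local key_path="$1"
    local comment="${2:-your_email@example.com}"
    if [ -f "$key_path" ]; then
        echo "Key already exists: $key_path"
        return 0
    fi
    mkdir -p "$(dirname "$key_path")"
    # -N "" sets an empty passphrase; leave out -N to be prompted for one.
    ssh-keygen -q -t ed25519 -C "$comment" -f "$key_path" -N ""
}

# Typical use (run by hand; the last command needs network access):
#   make_ssh_key ~/.ssh/id_ed25519
#   cat ~/.ssh/id_ed25519.pub     # paste this into GitHub > Settings > SSH and GPG keys
#   ssh -T git@github.com         # GitHub answers with a greeting if the key is accepted
```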

#### <mark style="color:green;">Tip 3:  Automatically sync with git</mark>

* **Purpose:** Automatically sync local changes to a remote GitHub repository.
* **Methods:**
  * **Git Hooks:** You can set up Git hooks (e.g., `post-commit`, `post-merge`) in your local repository to automatically push changes to the remote GitHub repository after commits or pulls. This requires scripting and careful configuration to avoid unintended pushes.
  * **External Tools:** Tools like `git-auto-sync` (as found on GitHub) can run as background daemons, monitoring local repositories for changes and automatically syncing them with the remote.
  * **Scheduled Tasks using** [**crontab**](#id-3-crontab-schedule-a-sync-backup-task)**:** Set up a scheduled task to periodically execute `git pull` and `git push` commands, pulling remote changes and pushing local commits.
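
As an illustration of the Git hooks method, the sketch below installs a `post-commit` hook that pushes the current branch after each commit. It assumes the remote is called `origin`; the helper name `install_push_hook` is hypothetical, and the hook has not been tested against every git setup:

```bash
# Hypothetical helper: install a post-commit hook that pushes the current branch to origin.
install_push_hook() {
    local repo_dir="$1"
    local hook="$repo_dir/.git/hooks/post-commit"
    mkdir -p "$repo_dir/.git/hooks"
    cat > "$hook" <<'EOF'
#!/bin/sh
# Runs after every successful `git commit` in this repository.
# Do nothing on a detached HEAD.
branch="$(git symbolic-ref --quiet --short HEAD)" || exit 0
git push --quiet origin "$branch" || echo "post-commit: push failed; run 'git push' by hand" >&2
EOF
    chmod +x "$hook"
}

# Example (the repository path is a placeholder):
#   install_push_hook /home/john/my_repo
```

Because the hook pushes every commit immediately, it is best kept to personal repositories where unreviewed pushes are acceptable.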

### 2) <mark style="color:red;">rsync</mark> - backup large data files

#### 2.1) Setup ssh key (only needed for remote backups) <a href="#ssh-key" id="ssh-key"></a>

**Purpose:** log in to the remote server over ssh without typing a password.

> You do not need to set up an ssh key if you only back up files between local directories. In that case, go directly to [step 2.2](#id-2.2-prepare-a-backup-script-with-rsync).

* (a) Generate SSH key

```bash
# Mac or Linux
ssh-keygen -t rsa -b 2048
# Windows
ssh-keygen -t ed25519
```

* (b) Copy your public key to the target server

```bash
ssh-copy-id user@server_ip    # if the server uses a non-default ssh port, add e.g.: -p 2200
```
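
Before relying on the key in an unattended backup, it helps to confirm that login really works without any prompt. A sketch, assuming the OpenSSH client; the helper name `can_login_without_password` is hypothetical, and `user@server_ip` and port `2200` are the same placeholders used above:

```bash
# Hypothetical helper: succeed only if key-based login works without any prompt.
can_login_without_password() {
    local target="$1"          # e.g. user@server_ip
    local port="${2:-22}"
    # BatchMode=yes makes ssh fail instead of asking for a password or passphrase.
    # The host must already be listed in ~/.ssh/known_hosts (it is after ssh-copy-id).
    ssh -o BatchMode=yes -o ConnectTimeout=5 -p "$port" "$target" true
}

# Example:
#   if can_login_without_password user@server_ip 2200; then
#       echo "ready for unattended backups"
#   else
#       echo "key login is not working yet"
#   fi
```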

#### 2.2) Prepare a backup script with rsync

* (a) First you need to prepare some backup dirs

```bash
mkdir /home/john/backup_local    # prepare a backup dir for some local files
mkdir /home/john/backup_remote   # prepare a backup dir for some remote files
```

* (b) Then, write a backup script, for example: \~/backup.sh

```bash
#!/bin/bash

# 0. Define the rsync options (an array keeps each option as a separate word).
#    --max-size=100M skips files larger than 100 MB; raise or remove it for large data files.
#    The file passed to --exclude-from must exist (it may be empty).
RSYNC=(rsync --stats --compress --recursive --times --perms --links --delete
       --max-size=100M --exclude-from=/home/john/excluded_file_list.txt)

# A. Local backup
echo "1. Backup of /home/john/data start at:"
date
"${RSYNC[@]}" /home/john/data/ /home/john/backup_local/
echo "Backup end at:"
date

# B. Remote backup
echo "2. Backup 166.178.56.20:/home/lulab/john/data/ start at:"
date
"${RSYNC[@]}" john@166.178.56.20:/home/lulab/john/data/ /home/john/backup_remote/
echo "Backup end at:"
date
```

* (c) Last, make your backup.sh executable

```bash
chmod +x ~/backup.sh
```
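
Because `--delete` removes files from the backup that no longer exist in the source, a mistyped path can wipe a backup directory. Previewing with `--dry-run` first is a cheap safeguard. A sketch, assuming rsync is installed; the wrapper name `mirror_dir` is hypothetical and uses `--archive` rather than the individual options in the script above:

```bash
# Hypothetical wrapper: mirror SRC into DEST; extra rsync options may follow.
mirror_dir() {
    local src="$1" dest="$2"
    shift 2
    # A trailing slash on the source copies its contents, not the directory itself.
    rsync --archive --compress --delete "$@" "${src%/}/" "${dest%/}/"
}

# Preview first: --dry-run lists what would change without touching DEST.
#   mirror_dir /home/john/data /home/john/backup_local --dry-run --itemize-changes
# Then run it for real:
#   mirror_dir /home/john/data /home/john/backup_local
```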

> **Parameters of rsync** (use `man rsync` to see more details):

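| Parameter     | Meaning                                                                                                      |
| ------------- | ------------------------------------------------------------------------------------------------------------ |
| `-a`          | Archive mode: copy directories recursively and preserve permissions, times, symlinks, owner and group        |
| `--delete`    | Delete files on the receiving side that no longer exist on the sending side                                  |
| `-q`          | Quiet mode: reduce the output                                                                                |
| `-z`          | Compress file data during transfer                                                                           |
| `-H`          | Preserve hard links                                                                                          |
| `-t`          | Preserve modification times. By default rsync skips a file whose size and modification time match on both sides |
| `-I`          | Ignore the size/time quick check and transfer every file                                                     |
| `--port=PORT` | Port number (applies to the rsync daemon; for ssh on a custom port use `-e "ssh -p PORT"`)                   |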

### 3) [<mark style="color:red;">crontab</mark>](#id-3-crontab-schedule-a-sync-backup-task) - schedule a sync/backup task

**Purpose:** run scheduled jobs automatically.

You can use [Crontab Generator](https://crontab-generator.org/) or edit your crontab yourself:

```
crontab -e
```

then add these:

```bash
# minute hour day_of_month month day_of_week command
# sync code with git daily at 03:15; 2>&1 sends error messages to the log as well
15 3 * * * /home/john/git-sync.sh > /home/john/git-sync.log 2>&1
```
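
`crontab -e` opens an editor, which is awkward when the same job has to be set up on several machines. An entry can also be added from the command line by piping the existing table through a filter and back into `crontab -`. A sketch; `add_cron_entry` is a hypothetical helper, and it only guards against exact duplicate lines:

```bash
# Hypothetical helper: read a crontab on stdin, print it with ENTRY appended
# unless an identical line is already present.
add_cron_entry() {
    local entry="$1"
    local current
    current="$(cat)"                      # existing crontab (may be empty)
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    if ! printf '%s\n' "$current" | grep -Fxq -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# Example:
#   crontab -l 2>/dev/null \
#     | add_cron_entry '15 3 * * * /home/john/git-sync.sh > /home/john/git-sync.log 2>&1' \
#     | crontab -
#   crontab -l     # review the installed table
```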

See [git-sync.sh](#id-1.3-git-sync.sh) (Tip 1 in section 1.2) for an example of the script.

This table explains the value in each column:

| Column   | Meaning                                        |
| -------- | ---------------------------------------------- |
| Column 1 | Minute, 0 to 59                                |
| Column 2 | Hour, 0 to 23 (0 means midnight)               |
| Column 3 | Day of month, 1 to 31                          |
| Column 4 | Month, 1 to 12                                 |
| Column 5 | Day of week, 0 to 7 (0 and 7 both mean Sunday) |
| Column 6 | Command to run                                 |

### 4) More Reading for advanced users

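* [鸟哥的Linux私房菜-基础学习篇](https://www.ctolib.com/docs/sfile/vbird-linux-basic-4e) (VBird's Linux Private Kitchen: Basics, in Chinese); recommended sections from Chapter 25

  > **Recommended Linux sections:**
  >
  > * Chapter 25, Linux Backup Strategies: 25.2.2 Full backup with differential backup; 25.3 VBird's backup strategy; 25.4 Disaster recovery considerations; 25.5 Key points review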
