# Appendix III. How to Backup

## Basic

* see [Backup your work - Basic](/teaching/getting-started.md#id-2-backup-your-work) in Getting Started.

## Advanced: git, rsync & crontab

{% hint style="success" %}
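**Tips**

1. Storage on lab machines uses RAID or a similar disk-array technology, so no data is lost even if two disks fail at the same time (which is very unlikely). Based on many years of experience, ordinary data therefore rarely needs a separate backup.
2. For important program files such as code, we recommend syncing to GitHub with git at high frequency (for example, daily). You can write a script and use crontab to push a backup to GitHub automatically every day; or commit manually whenever you finish a revision; or use `git hooks` or third-party tools such as `git-auto-sync` or `git-sync-on-inotify` to sync changed files automatically (see below, or ask an AI assistant for methods). Any of these is fine; choose whichever you prefer.
3. For very important and large data files, use rsync to back up to a different storage device (for example another storage server, a portable hard drive, or the lab's Synology NAS; ideally the backup device is in a different building). In practice, we no longer recommend automatic daily or weekly backups of large data files, because they have proved to be of little use. Instead, back up the most important data manually, at a frequency that matches the progress of your project.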
   {% endhint %}

### 1) <mark style="color:red;">git</mark> - backup code

**For detailed instructions on using git, see the official GitHub documentation:**

{% embed url="<https://docs.github.com/en/repositories>" %}

#### 1.1) Set up git on Linux/Unix/Mac

* **Setup:** create a configuration file, \~/.gitconfig, with the following content:

```bash
[user]
    email = your_email_of_your_github_account
    name = your_user_name
```
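
The same `[user]` settings can also be written with `git config` instead of editing the file by hand. The following is a minimal sketch, assuming git is installed. It writes to a scratch file (the hypothetical variable `demo_config`) so that trying it out does not touch your real configuration; replace `--file "$demo_config"` with `--global` to write to `~/.gitconfig`. The email and name are placeholders.

```bash
# Scratch config file so this demo does not modify the real ~/.gitconfig.
demo_config="$(mktemp -d)/gitconfig"

# Write the [user] section; replace the placeholder values with your own.
git config --file "$demo_config" user.email "your_email@example.com"
git config --file "$demo_config" user.name "your_user_name"

# List the settings that were written.
git config --file "$demo_config" --list
```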

* **Clone/Download an existing repository on GitHub**

```bash
# Replace "user" with your account name and "repo" with your repository name.
# Use ONE of the two commands below.

# With an HTTPS URL:
git clone https://github.com/user/repo.git

# With an SSH URL:
git clone git@github.com:user/repo.git
```

{% hint style="success" %}
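Tips:

* [Git 的使用——提交避免输入用户名和密码](https://zhuanlan.zhihu.com/p/358721423) (in Chinese: how to avoid typing your username and password on every push. This method is rather tedious; the VS Code Git integration linked below is recommended instead.)

* [Recommended] [Git in VS Code](https://code.visualstudio.com/docs/sourcecontrol/intro-to-git) (After you sign in to your GitHub account in VS Code, it saves your credentials and login state. Once you are signed in locally, you no longer need to enter a username or password, even for remote git operations in VS Code. This is the simpler method.)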
  {% endhint %}

* **Create a new repository**

```bash
echo "# test" >> README.md 
git init 
git add README.md 
git commit -m "first commit" 
git remote add origin https://github.com/xug15/test.git 
git push -u origin master   # use "main" instead if that is your branch name
```

* **Sync local files with a GitHub repo**

**Pull (update)**:

```bash
git pull origin master
git log -n 2 # look at the last two log entries.
```

**Add:**

```bash
git add examples/
git commit -m '20190705v1'
git push origin
```

**Change:**

```bash
git add -u                     # stage changes to files that are already tracked
git commit -m '20190705v1'
git push origin
```

**Remove:**

```bash
git rm *.file
git commit -m '20190705v1'
git push origin
```
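
The add, change and remove cycle above can be rehearsed entirely on your own machine before touching a real GitHub repository. The sketch below is only an illustration under stated assumptions: a local bare repository stands in for GitHub, and the directory variable `work`, the file `examples/a.txt` and the identity values are hypothetical. It pushes `HEAD` so that it works whether your default branch is called `master` or `main`.

```bash
work=$(mktemp -d)

# A local bare repository plays the role of the GitHub remote in this demo.
git init --quiet --bare "$work/remote.git"

# Create a local repository and connect it to the stand-in remote.
git init --quiet "$work/local"
cd "$work/local" || exit 1
git config user.email "your_email@example.com"
git config user.name "your_user_name"
git remote add origin "$work/remote.git"

# Add: stage and commit a new directory, then push.
mkdir examples
echo "hello" > examples/a.txt
git add examples/
git commit --quiet -m '20190705v1'
git push --quiet -u origin HEAD

# Change: stage the modification of a tracked file, commit and push.
echo "world" >> examples/a.txt
git add -u
git commit --quiet -m '20190705v2'
git push --quiet origin HEAD

# Remove: delete a tracked file, commit and push.
git rm --quiet examples/a.txt
git commit --quiet -m '20190705v3'
git push --quiet origin HEAD

# Show the three commits that were pushed.
git log --oneline
```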

#### <mark style="color:green;">1.2) Tips for using git</mark>

#### <mark style="color:green;">Tip 1:  git-sync.sh</mark>

```bash
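#!/bin/bash
# A bash script to sync a GitHub repo.
# Run it from inside the repository. Under cron, first change into the
# repository directory, e.g. add: cd /path/to/your/repo || exit 1

time=$(date)
echo "$time"
git add -A               # stage new, modified and deleted files
git commit -m "$time"    # double quotes, so that $time is expanded
git push origin master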
```
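
A slightly more defensive variant skips the commit when nothing has changed and takes the repository path as an argument, which is convenient when the script is started by cron from another directory. This is a sketch, not a tested replacement: the function name `git_sync`, the variable `sync_demo` and the demo file are hypothetical, and a local bare repository stands in for GitHub.

```bash
# Hypothetical helper: commit and push only when something changed.
git_sync() {
    repo_dir=$1
    git -C "$repo_dir" add -A
    # "git diff --cached --quiet" exits with 0 when nothing is staged.
    if git -C "$repo_dir" diff --cached --quiet; then
        echo "Nothing to commit in $repo_dir"
        return 0
    fi
    git -C "$repo_dir" commit --quiet -m "auto-sync: $(date)"
    git -C "$repo_dir" push --quiet origin HEAD
}

# Demo with a throwaway repository and a local stand-in remote.
sync_demo=$(mktemp -d)
git init --quiet --bare "$sync_demo/remote.git"
git init --quiet "$sync_demo/repo"
git -C "$sync_demo/repo" config user.email "your_email@example.com"
git -C "$sync_demo/repo" config user.name "your_user_name"
git -C "$sync_demo/repo" remote add origin "$sync_demo/remote.git"

echo "draft" > "$sync_demo/repo/notes.txt"
git_sync "$sync_demo/repo"    # commits and pushes notes.txt
git_sync "$sync_demo/repo"    # reports that there is nothing to commit
```

When the remote is a real GitHub repository, an unattended push only works if authentication needs no prompt, for example with an SSH key.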

#### <mark style="color:green;">Tip 2: clone a private repo</mark>

{% hint style="warning" %}
Methods in this tip were generated by AI. They have not been tested yet.
{% endhint %}

> Cloning a private Git repository requires authentication to confirm that you have access. This can be done with the GitHub Desktop app, VS Code, HTTPS with a **Personal Access Token (PAT)**, or **SSH keys**.

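<mark style="color:$primary;">**Method 1. Git integrated in VS Code (recommended)**</mark>

* [Git in VS Code](https://code.visualstudio.com/docs/sourcecontrol/intro-to-git) (After you sign in to your GitHub account in VS Code, it saves your credentials and login state. Once you are signed in locally, you no longer need to enter a username or password, even for remote git operations in VS Code. This is the simpler method.)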

<mark style="color:$primary;">**Method 2. Using HTTPS with a Personal Access Token (PAT)**</mark>

* **Generate a Personal Access Token (PAT):**
  * Navigate to your Git hosting service (e.g., GitHub, GitLab) settings.
  * Find the "Developer settings" or "Access tokens" section.
  * Generate a new PAT, ensuring it has the necessary permissions (e.g., `repo` scope for GitHub) to clone repositories. Copy the generated token immediately, as it usually won't be shown again.
* **Clone the repository:**
  * Open your terminal or command prompt.
  * Navigate to the directory where you want to clone the repository.&#x20;
  * Use the `git clone` command with the HTTPS URL of the repository. When prompted for a password, paste your PAT instead of your account password.

```
    git clone https://github.com/YOUR-USERNAME/YOUR-PRIVATE-REPOSITORY.git
```

<mark style="color:$primary;">**Method 3. Using SSH Keys**</mark>

* **Generate an SSH Key Pair:**
  * Open your terminal or command prompt.
  * Generate an SSH key pair using `ssh-keygen`:

```bash
    # Mac or Linux
    ssh-keygen -t rsa -b 4096 
    # or
    ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
    
    # Windows
    ssh-keygen -t ed25519
```

* Follow the prompts, optionally setting a passphrase for added security.
* **Add the Public Key to your Git Hosting Service:**
  * Copy the content of your public key (usually `~/.ssh/id_rsa.pub`, or `~/.ssh/id_ed25519.pub` for an Ed25519 key).
  * Navigate to your Git hosting service settings (e.g., GitHub, GitLab).
  * Find the "SSH and GPG keys" or "SSH Keys" section.
  * Add a new SSH key, giving it a descriptive title and pasting the content of your public key.
* **Clone the repository:**
  * Open your terminal or command prompt.
  * Navigate to the directory where you want to clone the repository.&#x20;
  * Use the `git clone` command with the SSH URL of the repository, like `git@github.com:user/repo.git`

#### <mark style="color:green;">Tip 3:  Automatically sync with git</mark>

* **Purpose:** Automatically sync local changes to a remote GitHub repository.
* **Methods:**
  * **Git Hooks:** you can set up Git hooks (e.g., `post-commit`, `post-merge`) in your local repository to automatically push changes to the remote GitHub repository after commits or pulls. This requires scripting and careful configuration to avoid unintended pushes.
  * **External Tools:** Tools like `git-auto-sync` (as found on GitHub) can run as background daemons, monitoring local repositories for changes and automatically syncing them with the remote.
  * **Scheduled Tasks using** [**crontab**](#id-3-crontab-schedule-a-sync-backup-task)**:** Set up a scheduled task to periodically execute `git pull` and `git push` commands, pulling remote changes and pushing local commits.
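
As an illustration of the Git hooks method, the sketch below installs a `post-commit` hook that pushes after every commit. It is a minimal example under stated assumptions, not a tested recipe for your setup: the variable `hook_demo` and the file `notes.txt` are hypothetical, and a local bare repository stands in for GitHub so that nothing is pushed to a real remote.

```bash
hook_demo=$(mktemp -d)

# Stand-in remote and a local working repository.
git init --quiet --bare "$hook_demo/remote.git"
git init --quiet "$hook_demo/repo"
cd "$hook_demo/repo" || exit 1
git config user.email "your_email@example.com"
git config user.name "your_user_name"
git remote add origin "$hook_demo/remote.git"

# Install the hook. Git runs .git/hooks/post-commit after each commit.
mkdir -p .git/hooks
cat > .git/hooks/post-commit <<'EOF'
#!/bin/sh
# Push the current branch; only warn if the push fails.
git push --quiet origin HEAD || echo "post-commit: push failed" >&2
EOF
chmod +x .git/hooks/post-commit

# Any commit is now followed by an automatic push.
echo "v1" > notes.txt
git add notes.txt
git commit --quiet -m "first commit"

# The commit is already present in the stand-in remote.
git --git-dir="$hook_demo/remote.git" log --oneline
```

Because the hook lives in `.git/hooks`, it applies only to this clone and is not shared with other clones of the repository.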

### 2) <mark style="color:red;">rsync</mark> - backup large data files&#x20;

#### 2.1) Set up an SSH key (only needed for remote backups) <a href="#ssh-key" id="ssh-key"></a>

**Purpose:** log in to the remote server over SSH without typing a password.

> You do not need to set up an SSH key if you only back up files between local directories. In that case, go directly to [step 2.2](#id-2.2-prepare-a-backup-script-with-rsync).

* (a) Generate SSH key

```bash
# Mac or Linux
ssh-keygen -t rsa -b 2048
# Windows
ssh-keygen -t ed25519
```

* (b) Copy your keys to the target server

```bash
ssh-copy-id user@server_ip    # for a non-default SSH port, add for example: -p 2200
```

#### 2.2) Prepare a backup script with rsync

* (a) First you need to prepare some backup dirs

```bash
mkdir /home/john/backup_local    # prepare a backup dir for some local files
mkdir /home/john/backup_remote   # prepare a backup dir for some remote files
```

* (b) Then, write a backup script, for example: \~/backup.sh

```bash
#!/bin/bash

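#0. Define the parameters of rsync
#   --delete removes files from the backup that no longer exist in the source.
#   --max-size=100M skips files larger than 100 MB; adjust or remove it for large data files.
RSYNC="rsync --stats  --compress --recursive --times --perms --links --delete --max-size=100M --exclude-from=/home/john/excluded_file_list.txt"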

#A. Local backup  
echo "1. Backup of /home/john/data start at:"
date
$RSYNC /home/john/data/  /home/john/backup_local/
echo "Backup end at:"
date

#B. Remote backup 
echo "2. Backup 166.178.56.20:/home/lulab/john/data/ start at:"
date
$RSYNC john@166.178.56.20:/home/lulab/john/data/ /home/john/backup_remote/
echo "Backup end at:"
date
```

* (c) Last, make your backup.sh executable

```bash
chmod +x ~/backup.sh
```

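> **Parameters of rsync** (use `man rsync` to see more details):

| Parameter     | Meaning                                                                                                              |
| ------------- | -------------------------------------------------------------------------------------------------------------------- |
| `-a`          | Archive mode: transfer recursively and preserve symlinks, permissions, modification times, owner and group           |
| `--delete`    | Delete files on the receiving side that no longer exist on the sending side                                          |
| `-q`          | Quiet mode: reduce the output                                                                                        |
| `-z`          | Compress file data during the transfer                                                                               |
| `-H`          | Preserve hard links                                                                                                  |
| `-t`          | Preserve modification times. By default rsync compares the size and modification time of a file on both sides and skips the file when they match |
| `-I`          | Do not skip files whose size and modification time match: check and sync every file                                  |
| `--port=PORT` | Port number (used when connecting to an rsync daemon)                                                                |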

### 3) [<mark style="color:red;">crontab</mark>](#id-3-crontab-schedule-a-sync-backup-task) - schedule a sync/backup task&#x20;

**Purpose:**  run scheduled jobs automatically.&#x20;

You can use [Crontab Generator](https://crontab-generator.org/) or edit your crontab yourself:

```
crontab -e
```

Then add lines like these:

```bash
# minute hour day_in_month month day_in_week command
# sync code with git daily
15 3 * * * /home/john/git-sync.sh > /home/john/git-sync.log 2>&1
```

See [git-sync.sh](#id-1.3-git-sync.sh) (Tip 1 in section 1.2) for an example script.

This table explains the value in each column:

| Column   | Meaning                                        |
| -------- | ---------------------------------------------- |
| Column 1 | Minute: 0 to 59                                |
| Column 2 | Hour: 0 to 23 (0 means midnight)               |
| Column 3 | Day of month: 1 to 31                          |
| Column 4 | Month: 1 to 12                                 |
| Column 5 | Day of week: 0 to 7 (0 and 7 both mean Sunday) |
| Column 6 | Command to run                                 |
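
A job can also be added without opening an editor. The sketch below is one possible approach, with assumptions stated: the variable names `job`, `current` and `new_crontab` are hypothetical. It only builds and prints the new crontab, and the line that would install it is left commented out, so running it as written changes nothing.

```bash
# Job to add: run git-sync.sh daily at 03:15 and append all output to a log.
job='15 3 * * * /home/john/git-sync.sh >> /home/john/git-sync.log 2>&1'

# Read the current crontab (empty if there is none yet).
current=$(crontab -l 2>/dev/null || true)

# Build the new crontab, adding the job only if it is not already present.
if printf '%s\n' "$current" | grep -qxF "$job"; then
    new_crontab=$current
elif [ -z "$current" ]; then
    new_crontab=$job
else
    new_crontab=$(printf '%s\n%s' "$current" "$job")
fi

# Preview the result.
printf '%s\n' "$new_crontab"

# To install it, uncomment the next line:
# printf '%s\n' "$new_crontab" | crontab -
```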

### 4) More Reading for advanced users

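* 《[鸟哥的Linux私房菜-基础学习篇](https://www.ctolib.com/docs/sfile/vbird-linux-basic-4e)》 (in Chinese; recommended sections from Chapter 25)

  > **Recommended sections:**
  >
  > * Chapter 25, Linux Backup Strategies: 25.2.2 Full backups with differential backups; 25.3 The author's backup strategy; 25.4 Disaster recovery considerations; 25.5 Review of key points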

