Importing Python libraries
import logging
import os
import sys
import matplotlib
import copy
import scanpy as sc
import numpy as np
import scipy as sp
import pandas as pd
import scrublet as scr
import matplotlib.pyplot as plt
from matplotlib.pyplot import rc_context
from anndata import AnnData
from matplotlib import rcParams
from matplotlib import colors
import seaborn as sb
import warnings
# from rpy2.robjects.packages import importr
%matplotlib inline
# %matplotlib notebook
####################
# configure settings
####################
sc.settings.autoshow = False   # do not display plots automatically; show or save them explicitly
sc.settings.verbosity = 3      # 3 = 'hint' level logging
sc.settings.set_figure_params(dpi=150, dpi_save=300, format='png', frameon=False,
                              transparent=True, fontsize=10)
plt.rcParams["image.aspect"] = "equal"
plt.rcParams["figure.figsize"] = (3, 3)
warnings.simplefilter(action='ignore', category=FutureWarning)  # silence FutureWarnings from dependencies
Converting 10X V(D)J data into the AIRR Community standardized format
AssignGenes.py igblast -s filtered_contig.fasta -b igblast_1.19 \
--organism human --loci ig --format blast
# The -b argument specifies the directory containing the database, internal_data and optional_file directories required by IgBLAST.
# The output is "filtered_contig_igblast.fmt7".
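AssignGenes.py fails if the -b directory is incomplete, so a quick pre-flight check can save a failed run. The minimal sketch below only tests for the three subdirectory names listed in the comment above. The helper name `missing_igblast_dirs` is hypothetical and not part of Change-O, and the check does not validate the contents of those directories.

```python
from pathlib import Path

# Subdirectories that IgBLAST needs under the path passed to -b (per the note above).
REQUIRED_SUBDIRS = ("database", "internal_data", "optional_file")


def missing_igblast_dirs(igblast_root):
    """Return the required subdirectory names that do not exist under igblast_root."""
    root = Path(igblast_root)
    return [name for name in REQUIRED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_igblast_dirs("igblast_1.19")
    if missing:
        print("Missing under igblast_1.19:", ", ".join(missing))
    else:
        print("igblast_1.19 contains all required subdirectories")
```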
MakeDb.py igblast -i filtered_contig_igblast.fmt7 -s filtered_contig.fasta -r \
imgt_human_*.fasta \
--10x filtered_contig_annotations.csv --extended
# The output is "filtered_contig_igblast_db-pass.tsv". Its V, D and J gene assignments come from IgBLAST and replace those generated by Cell Ranger.
# Standalone IgBLAST blast-style tabular output is parsed by the igblast subcommand of MakeDb.py to generate the standardized tab-delimited database file on which all subsequent Change-O modules operate.
# The optional --extended argument adds extra columns to the output database containing IMGT-gapped CDR/FWR regions and alignment metrics.
# Alternatively, list the reference files explicitly: -r IMGT_Human_IGHV.fasta IMGT_Human_IGHD.fasta IMGT_Human_IGHJ.fasta
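Before splitting chains, it can help to inspect the MakeDb output in Python. This sketch assumes the file uses the AIRR column names `sequence_id`, `cell_id`, `locus`, `productive`, `v_call`, `j_call` and `c_call`, with `productive` stored as `T`/`F`. The `T`/`F` encoding matches the `-u T` filter used in the next step. The three demo rows are invented for illustration. Reading every column as text keeps values such as `T` and `F` from being converted.

```python
import io

import pandas as pd


def load_airr_tsv(path_or_buffer):
    """Read a Change-O/AIRR tab-delimited file with every column kept as text."""
    return pd.read_csv(path_or_buffer, sep="\t", dtype=str, keep_default_na=False)


# Hypothetical excerpt; a real db-pass file has many more columns and rows.
demo = io.StringIO(
    "sequence_id\tcell_id\tlocus\tproductive\tv_call\tj_call\tc_call\n"
    "c1_contig_1\tc1\tIGH\tT\tIGHV3-23*01\tIGHJ4*02\tIGHM\n"
    "c1_contig_2\tc1\tIGK\tT\tIGKV1-39*01\tIGKJ1*01\tIGKC\n"
    "c2_contig_1\tc2\tIGH\tF\tIGHV1-69*01\tIGHJ6*02\tIGHG1\n"
)
db = load_airr_tsv(demo)
# Count contigs per locus and productivity status.
print(db.groupby(["locus", "productive"]).size())
```

On real data, call `load_airr_tsv("filtered_contig_igblast_db-pass.tsv")` instead of passing the demo buffer.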
Identifying clones from B cells in AIRR formatted 10X V(D)J data
Splitting into separate light and heavy chain files
To group B cells into clones from AIRR Rearrangement data, the MakeDb output must first be split into a heavy chain file and a light chain file:
ParseDb.py select -d filtered_contig_igblast_db-pass.tsv -f locus -u "IGH" \
--logic all --regex --outname temp
ParseDb.py select -d temp_parse-select.tsv -f productive -u T --outname temp2
ParseDb.py select -d temp2_parse-select.tsv -f v_call j_call c_call -u "IGH" \
--logic all --regex --outname heavy
ParseDb.py select -d filtered_contig_igblast_db-pass.tsv -f locus -u "IG[LK]" \
--logic all --regex --outname temp
ParseDb.py select -d temp_parse-select.tsv -f productive -u T --outname temp2
ParseDb.py select -d temp2_parse-select.tsv -f v_call j_call c_call -u "IG[LK]" \
--logic all --regex --outname light
# The outputs are "heavy_parse-select.tsv" and "light_parse-select.tsv". Non-productive sequences were removed.
# Records were also removed if any of the V, J or C gene calls did not match the chain's locus
# (IGH for the heavy chain file; IGK or IGL for the light chain file), including records with no C gene call.
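The three chained ParseDb.py select calls apply three filters: the locus regex, productive equal to `T`, and the same regex on all of `v_call`, `j_call` and `c_call` (`--logic all`). The pandas sketch below reproduces that logic to make it explicit. It assumes the regex is matched anywhere in the field (search semantics, mirrored here by `str.contains`). This is an illustration, not ParseDb's code, and the demo records are hypothetical.

```python
import pandas as pd


def select_chain(db, locus_regex):
    """Mimic the three ParseDb.py select calls for one chain type."""
    # Step 1: locus matches the regex (e.g. "IGH" or "IG[LK]").
    keep = db["locus"].str.contains(locus_regex, regex=True, na=False)
    # Step 2: keep productive rearrangements only.
    keep &= db["productive"] == "T"
    # Step 3 (--logic all): V, J and C calls must all match the regex;
    # an empty c_call does not match, so such records are dropped.
    for field in ("v_call", "j_call", "c_call"):
        keep &= db[field].str.contains(locus_regex, regex=True, na=False)
    return db[keep]


# Hypothetical records for illustration only.
db = pd.DataFrame({
    "sequence_id": ["h1", "h2", "h3", "k1", "l1"],
    "locus": ["IGH", "IGH", "IGH", "IGK", "IGL"],
    "productive": ["T", "T", "F", "T", "T"],
    "v_call": ["IGHV3-23*01", "IGHV1-69*01", "IGHV4-34*01", "IGKV1-39*01", "IGLV2-14*01"],
    "j_call": ["IGHJ4*02", "IGHJ6*02", "IGHJ4*02", "IGKJ1*01", "IGLJ2*01"],
    "c_call": ["IGHM", "", "IGHG1", "IGKC", "IGLC2"],
})
heavy = select_chain(db, "IGH")
light = select_chain(db, "IG[LK]")
print(heavy["sequence_id"].tolist())  # ['h1']
print(light["sequence_id"].tolist())  # ['k1', 'l1']
```

Here `h2` is dropped because it has no C gene call, and `h3` because it is non-productive.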
Defining clones based on heavy chain junction distances
DefineClones.py -d heavy_parse-select.tsv --act set --model ham \
--norm len --dist 0.16
# Alternatively, use a different distance model (the threshold must then be determined for that model):
#DefineClones.py -d heavy_parse-select.tsv --act set --model hh_s5f --norm none --dist **
# The output is "heavy_parse-select_clone-pass.tsv".
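The following is a simplified illustration of what `--model ham --norm len --dist 0.16` means. It is not Change-O's implementation. Sequences are first grouped by V gene, J gene and junction length. Within each group, two junctions are linked when their Hamming distance divided by junction length is at most 0.16, and clones are the connected components (single linkage).

In the Immcantation workflow, such a cutoff is usually read off the distribution of distance-to-nearest-neighbour values, for example with SHazaM's `distToNearest`/`findThreshold`. The sketch makes two further simplifications:
- It keeps only the first gene of an ambiguous call, whereas `--act set` handles ambiguous calls differently.
- The record layout and the helper names (`norm_hamming`, `gene`, `assign_clones`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def norm_hamming(a, b):
    """Hamming distance divided by length (junctions in a group have equal length)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def gene(call):
    """'IGHV3-23*01,IGHV3-23D*01' -> 'IGHV3-23' (keeps the first call only)."""
    return call.split(",")[0].split("*")[0]


def _find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def assign_clones(records, threshold=0.16):
    """records: dicts with sequence_id, v_call, j_call, junction -> {sequence_id: clone_id}."""
    groups = defaultdict(list)
    for r in records:
        groups[(gene(r["v_call"]), gene(r["j_call"]), len(r["junction"]))].append(r)

    clone_of, next_clone = {}, 1
    for members in groups.values():
        parent = list(range(len(members)))
        # Single linkage: join any pair whose normalized distance is within the threshold.
        for i, j in combinations(range(len(members)), 2):
            if norm_hamming(members[i]["junction"], members[j]["junction"]) <= threshold:
                parent[_find(parent, i)] = _find(parent, j)
        root_to_clone = {}
        for i, r in enumerate(members):
            root = _find(parent, i)
            if root not in root_to_clone:
                root_to_clone[root] = next_clone
                next_clone += 1
            clone_of[r["sequence_id"]] = root_to_clone[root]
    return clone_of


demo = [
    {"sequence_id": "s1", "v_call": "IGHV3-23*01", "j_call": "IGHJ4*02", "junction": "TGTGCGAGAG"},
    {"sequence_id": "s2", "v_call": "IGHV3-23*01", "j_call": "IGHJ4*02", "junction": "TGTGCGAGAC"},
    {"sequence_id": "s3", "v_call": "IGHV3-23*01", "j_call": "IGHJ4*02", "junction": "TGTGCGACTC"},
    {"sequence_id": "s4", "v_call": "IGHV1-69*01", "j_call": "IGHJ6*02", "junction": "TGTGCGAGAG"},
]
print(assign_clones(demo))  # {'s1': 1, 's2': 1, 's3': 2, 's4': 3}
```

Here `s1` and `s2` differ at 1 of 10 positions (0.1 ≤ 0.16) and share a clone. `s3` is at least 0.2 from both, so it forms its own clone. `s4` has an identical junction to `s1` but a different V gene, so it is never compared with `s1`.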
# Correct clonal groups using light chain data:
light_cluster.py -d heavy_parse-select_clone-pass.tsv -e light_parse-select.tsv \
-o 10X_clone-pass.tsv
# The script (1) removes cells associated with more than one heavy chain and
# (2) corrects heavy chain clone definitions by analysing the light chains
# paired with the cells in each heavy chain clone.
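The two steps described above can be pictured with the toy sketch below. It is not the code of light_cluster.py, and the script's exact splitting criteria are not given here. The sketch drops cells with more than one heavy chain. It then assumes that cells in the same heavy chain clone stay together only if their light chains share the same V gene, J gene and junction length, and splits the clone otherwise. All function names, fields and records are hypothetical.

```python
from collections import Counter, defaultdict


def gene(call):
    """'IGKV1-39*01,IGKV1D-39*01' -> 'IGKV1-39' (keeps the first call only)."""
    return call.split(",")[0].split("*")[0]


def split_by_light_chain(heavy, light):
    """Return {cell_id: corrected_clone} for cells with exactly one heavy chain.

    heavy: dicts with cell_id and clone_id (from DefineClones).
    light: dicts with cell_id, v_call, j_call, junction.
    """
    # (1) Remove cells associated with more than one heavy chain.
    n_heavy = Counter(h["cell_id"] for h in heavy)
    kept = [h for h in heavy if n_heavy[h["cell_id"]] == 1]

    # Assumed light chain signature per cell: (V gene, J gene, junction length).
    signature = defaultdict(set)
    for rec in light:
        signature[rec["cell_id"]].add(
            (gene(rec["v_call"]), gene(rec["j_call"]), len(rec["junction"])))

    # (2) Split each heavy chain clone into sub-clones by light chain signature.
    sub_ids = defaultdict(dict)  # clone_id -> {signature: sub-clone number}
    corrected = {}
    for h in kept:
        sig = frozenset(signature[h["cell_id"]])
        subs = sub_ids[h["clone_id"]]
        subs.setdefault(sig, len(subs) + 1)
        corrected[h["cell_id"]] = f"{h['clone_id']}_{subs[sig]}"
    return corrected


heavy = [
    {"cell_id": "A", "clone_id": 1}, {"cell_id": "B", "clone_id": 1},
    {"cell_id": "C", "clone_id": 1}, {"cell_id": "D", "clone_id": 2},
    {"cell_id": "D", "clone_id": 2},  # cell D has two heavy chains
]
light = [
    {"cell_id": "A", "v_call": "IGKV1-39*01", "j_call": "IGKJ1*01", "junction": "TGTCAACAG"},
    {"cell_id": "B", "v_call": "IGKV1-39*01", "j_call": "IGKJ1*01", "junction": "TGTCAACAG"},
    {"cell_id": "C", "v_call": "IGLV2-14*01", "j_call": "IGLJ2*01", "junction": "TGCAGCTCA"},
]
print(split_by_light_chain(heavy, light))  # {'A': '1_1', 'B': '1_1', 'C': '1_2'}
```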
Reconstructing germline sequences
CreateGermlines.py -d 10X_clone-pass.tsv -g dmask --cloned \
-r $SCRATCH/projects/B/immcantation/germlines/imgt/human/vdj/imgt_human_*.fasta
# The output is "10X_clone-pass_germ-pass.tsv".
# With --cloned, a single germline of consensus length is generated for each clone;
# -g dmask writes the full germline with the D segment masked by Ns.
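D-masking is used because D segment assignment is unreliable. Replacing the D region with Ns keeps ambiguous D positions from being scored as mutations in later analyses. The sketch below shows only the masking operation on a string, given 0-based, end-exclusive D coordinates. The helper name and the toy sequence are hypothetical. In AIRR-formatted output, the D-masked germline is expected in the `germline_alignment_d_mask` column, which is worth confirming against your own file.

```python
def mask_region(germline, start, end, mask_char="N"):
    """Replace germline[start:end] (0-based, end-exclusive) with mask_char."""
    if not 0 <= start <= end <= len(germline):
        raise ValueError("region lies outside the sequence")
    return germline[:start] + mask_char * (end - start) + germline[end:]


# Toy germline with a hypothetical D region at positions 4-7.
print(mask_region("AAAACCCCGGGG", 4, 8))  # AAAANNNNGGGG
```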