Tran H T N, Ang K S, Chevrier M, et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data[J]. Genome biology, 2020, 21: 1-32.
Importing the required Python libraries
import logging, matplotlib, os, sys
import copy
import scanpy as sc
import numpy as np
import scipy as sp
import pandas as pd
import scrublet as scr
import matplotlib.pyplot as plt
from matplotlib.pyplot import rc_context
from anndata import AnnData
from matplotlib import rcParams
from matplotlib import colors
import seaborn as sb
import warnings
# from rpy2.robjects.packages import importr
%matplotlib inline
# %matplotlib notebook
################# configure file ################
sc.settings.autoshow = False
sc.settings.verbosity = 3
sc.settings.set_figure_params(dpi=150, dpi_save=300, format='png', frameon=False, transparent=True, fontsize=10)
plt.rcParams["image.aspect"] = "equal"
plt.rcParams["figure.figsize"] = ([3,3])
warnings.simplefilter(action='ignore', category=FutureWarning)
Converting 10X V(D)J data into the AIRR Community standardized format
AssignGenes.py igblast -s filtered_contig.fasta -b igblast_1.19 \
    --organism human --loci ig --format blast
# The -b argument specifies the path containing the database, internal_data, and optional_file directories required by IgBLAST.
# The output is "filtered_contig_igblast.fmt7".
MakeDb.py igblast -i filtered_contig_igblast.fmt7 -s filtered_contig.fasta -r \
    imgt_human_*.fasta \
    --10x filtered_contig_annotations.csv --extended
# The output is "filtered_contig_igblast_db-pass.tsv", which overwrites the V, D and J gene assignments generated by Cell Ranger and uses those generated by IgBLAST instead.
# Standalone IgBLAST blast-style tabular output is parsed by the igblast subcommand of MakeDb.py to generate the standardized tab-delimited database file on which all subsequent Change-O modules operate.
# The optional --extended argument adds extra columns to the output database containing IMGT-gapped CDR/FWR regions and alignment metrics.
# -r IMGT_Human_IGHV.fasta IMGT_Human_IGHD.fasta IMGT_Human_IGHJ.fasta
Identifying clones from B cells in AIRR formatted 10X V(D)J data
Splitting into separate light and heavy chain files. To group B cells into clones from AIRR Rearrangement data, the output from MakeDb must be parsed into a light chain file and a heavy chain file:
ParseDb.py select -d filtered_contig_igblast_db-pass.tsv -f locus -u "IGH" \
    --logic all --regex --outname temp
ParseDb.py select -d temp_parse-select.tsv -f productive -u T --outname temp2
ParseDb.py select -d temp2_parse-select.tsv -f v_call j_call c_call -u "IGH" \
    --logic all --regex --outname heavy
ParseDb.py select -d filtered_contig_igblast_db-pass.tsv -f locus -u "IG[LK]" \
    --logic all --regex --outname temp
ParseDb.py select -d temp_parse-select.tsv -f productive -u T --outname temp2
ParseDb.py select -d temp2_parse-select.tsv -f v_call j_call c_call -u "IG[LK]" \
    --logic all --regex --outname light
# The outputs are "heavy_parse-select.tsv" and "light_parse-select.tsv". Non-productive sequences were removed.
# Records with disagreements between the C-region primers and the reference alignment were removed too
# (v_call, j_call and c_call must all be IGH, or all be IGL/IGK).
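The three chained ParseDb.py selections amount to one row filter: keep a record only if its locus matches the chain pattern, it is productive, and its V, J and C calls all match the same pattern. The following is a minimal Python sketch of that logic on invented toy records, not the ParseDb.py implementation. The helper name `select_chain` and the toy data are hypothetical, and treating `--regex` as a substring search is an assumption.

```python
import csv
import io
import re


def select_chain(rows, locus_pattern):
    """Keep productive records whose locus, v_call, j_call and c_call all match.

    Hypothetical helper that mirrors the three chained selections in one pass.
    """
    pattern = re.compile(locus_pattern)
    kept = []
    for row in rows:
        # Step 1: the locus must match the chain pattern (e.g. IGH or IG[LK]).
        if not pattern.search(row["locus"]):
            continue
        # Step 2: drop non-productive sequences (productive is written as T/F).
        if row["productive"] != "T":
            continue
        # Step 3: V, J and C calls must all agree with the chain pattern.
        if all(pattern.search(row[f]) for f in ("v_call", "j_call", "c_call")):
            kept.append(row)
    return kept


# Toy records invented for illustration only.
toy_tsv = (
    "sequence_id\tlocus\tproductive\tv_call\tj_call\tc_call\n"
    "c1\tIGH\tT\tIGHV3-23*01\tIGHJ4*02\tIGHM\n"
    "c2\tIGH\tF\tIGHV1-2*02\tIGHJ6*02\tIGHG1\n"
    "c3\tIGK\tT\tIGKV1-5*03\tIGKJ1*01\tIGKC\n"
    "c4\tIGH\tT\tIGHV4-34*01\tIGHJ5*02\tIGKC\n"
)
rows = list(csv.DictReader(io.StringIO(toy_tsv), delimiter="\t"))

heavy_ids = [r["sequence_id"] for r in select_chain(rows, r"IGH")]
light_ids = [r["sequence_id"] for r in select_chain(rows, r"IG[LK]")]
print("heavy:", heavy_ids)
print("light:", light_ids)
```

In this toy table, c2 is dropped as non-productive and c4 is dropped because its C call disagrees with its heavy-chain V and J calls.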
Calculating nearest neighbor distances based on heavy chains
DefineClones.py -d heavy_parse-select.tsv --act set --model ham \
    --norm len --dist 0.16
# or use other models:
# DefineClones.py -d heavy_parse-select.tsv --act set --model hh_s5f --norm none --dist **
# The output is "heavy_parse-select_clone-pass.tsv".
# Correct clonal groups based on light chain data:
light_cluster.py -d heavy_parse-select_clone-pass.tsv -e light_parse-select.tsv \
    -o 10X_clone-pass.tsv
# The algorithm will (1) remove cells associated with more than one heavy chain and
# (2) correct heavy chain clone definitions based on an analysis of the light chain partners
# associated with the heavy chain clone.
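To make the `--model ham --norm len --dist 0.16` settings concrete, the sketch below computes a length-normalized Hamming distance between junction sequences and joins any pair at or below the threshold into one clone by single linkage. It is an illustration under stated assumptions, not the DefineClones.py code: the function names and toy junctions are hypothetical, the inclusive cutoff is an assumption, and the sketch assumes the sequences have already been partitioned by V gene, J gene and junction length, which DefineClones.py handles before computing distances.

```python
def normalized_hamming(a, b):
    """Fraction of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage_clones(seqs, threshold):
    """Assign clone labels by linking every pair within the distance threshold.

    Hypothetical helper; uses a small union-find to merge linked sequences.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            # Assumption: the cutoff is inclusive.
            if normalized_hamming(seqs[i], seqs[j]) <= threshold:
                parent[find(j)] = find(i)

    # Relabel cluster roots as consecutive clone ids starting from 1.
    labels = {}
    return [labels.setdefault(find(i), len(labels) + 1) for i in range(len(seqs))]


# Toy 30-nt junctions invented for illustration only.
junctions = [
    "TGTGCGAGAGATCGGTACTTTGACTACTGG",
    "TGTGCGAGAGAACGTTACTTTGACTACTGG",  # 2 mismatches vs. the first
    "TGTGCGAAACCCTTTGGGAAAGACTACTGG",  # 13 mismatches vs. the first
]
clone_ids = single_linkage_clones(junctions, threshold=0.16)
print(clone_ids)
```

With a threshold of 0.16, a 30-nt junction tolerates at most 4 mismatches, so the first two toy junctions fall into the same clone and the third stays separate.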
Reconstructing germline sequences
CreateGermlines.py -d 10X_clone-pass.tsv -g dmask --cloned \
    -r $SCRATCH/projects/B/immcantation/germlines/imgt/human/vdj/imgt_human_*.fasta
# The output is "10X_clone-pass_germ-pass.tsv".
# This will generate a single germline of consensus length for each clone.