    # extract X chromosome sequence
    tar -xz -f chromFa.tar.gz chrX.fa
    mv chrX.fa Mus_musculus_chrX.fa

    # use only X chromosome
    zcat Mus_musculus.GRCm38.93.gtf.gz | grep -P '(#!)|(X\t)' > Mus_musculus_chrX.gtf

    # make hisat index
    hisat2-2.1.0/hisat2_extract_splice_sites.py Mus_musculus_chrX.gtf > Mus_musculus_chrX.ss
    hisat2-2.1.0/hisat2_extract_exons.py Mus_musculus_chrX.gtf > Mus_musculus_chrX.exon

    mkdir hisat2_indexes
    hisat2-2.1.0/hisat2-build -p 4 \
        --ss Mus_musculus_chrX.ss --exon Mus_musculus_chrX.exon \
        Mus_musculus_chrX.fa hisat2_indexes/Mus_musculus_chrX
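
Before building the index, it can be worth checking that the filtered GTF really contains only X-chromosome records and that its sequence name agrees with the FASTA header, since naming conventions can differ between sources (for example `chrX` versus `X`). The following is a minimal sketch using only standard tools. The helper names `gtf_seqnames` and `fasta_seqnames` are hypothetical and not part of the tutorial; the file names are the ones created above.

```shell
# Hypothetical helper: list the distinct sequence names in a GTF,
# skipping the '#' header lines.
gtf_seqnames() {
    grep -v '^#' "$1" | cut -f1 | sort -u
}

# Hypothetical helper: list the sequence names in a FASTA file
# (the text after '>' up to the first whitespace).
fasta_seqnames() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort -u
}

# Compare the two files if they exist in the current directory.
if [ -f Mus_musculus_chrX.gtf ] && [ -f Mus_musculus_chrX.fa ]; then
    echo "GTF:   $(gtf_seqnames Mus_musculus_chrX.gtf | tr '\n' ' ')"
    echo "FASTA: $(fasta_seqnames Mus_musculus_chrX.fa | tr '\n' ' ')"
fi
```

If the two lists differ, the annotation-derived splice sites and exons may not match the sequence names in the index, so it is safer to resolve the difference before running `hisat2-build`.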
(6) mapping
    # mapping
    hisat2-2.1.0/hisat2 -p 4 --dta \
        -S SRR065544_chrX.sam -x hisat2_indexes/Mus_musculus_chrX \
        -1 SRR065544_1.fastq.gz -2 SRR065544_2.fastq.gz
    hisat2-2.1.0/hisat2 -p 4 --dta \
        -S SRR065545_chrX.sam -x hisat2_indexes/Mus_musculus_chrX \
        -1 SRR065545_1.fastq.gz -2 SRR065545_2.fastq.gz

    # convert to .bam
    samtools sort -@ 4 -o SRR065544_chrX_raw.bam SRR065544_chrX.sam
    samtools sort -@ 4 -o SRR065545_chrX_raw.bam SRR065545_chrX.sam

    # filter only mapped reads
    bamtools index -in SRR065544_chrX_raw.bam
    bamtools index -in SRR065545_chrX_raw.bam
    bamtools filter -isMapped true -in SRR065544_chrX_raw.bam \
        -out SRR065544_chrX.bam
    bamtools filter -isMapped true -in SRR065545_chrX_raw.bam \
        -out SRR065545_chrX.bam
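
Because the same four steps are repeated for each sample, they could also be written as a loop. The sketch below mirrors the commands above. The helpers `run` and `map_sample` and the `DRY_RUN` variable are hypothetical names introduced only for illustration. By default the sketch just prints the commands; it assumes you set `DRY_RUN=0` once the tools and input files are in place.

```shell
# Hypothetical switch: 1 (default here) prints commands, 0 executes them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print a command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Map, sort, index and filter one sample, using the same options as above.
map_sample() {
    local sample=$1
    run hisat2-2.1.0/hisat2 -p 4 --dta \
        -S "${sample}_chrX.sam" -x hisat2_indexes/Mus_musculus_chrX \
        -1 "${sample}_1.fastq.gz" -2 "${sample}_2.fastq.gz"
    run samtools sort -@ 4 -o "${sample}_chrX_raw.bam" "${sample}_chrX.sam"
    run bamtools index -in "${sample}_chrX_raw.bam"
    run bamtools filter -isMapped true -in "${sample}_chrX_raw.bam" \
        -out "${sample}_chrX.bam"
}

for sample in SRR065544 SRR065545; do
    map_sample "$sample"
done
```

Printing the commands first makes it easy to confirm the file names for each sample before any long-running step starts.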
5) Homework
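To identify how CUGBP1 regulates mRNA isoforms, researchers expressed either an empty vector (SRR065546) or a vector carrying an shRNA that knocks down CUGBP1 (SRR065547) in C2C12 mouse myoblasts. Download the .bam input files (which contain only reads mapped to the X chromosome) from the corresponding folder under the Tsinghua Cloud path Bioinformatics Tutorial / Files, listed in Files needed by this Tutorial at this link, and find the genes on the X chromosome that show differential alternative splicing. (Submit your code and all output files whose names end in .MATS.JCEC.txt.)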
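
Once the `.MATS.JCEC.txt` files exist, one possible way to shortlist candidate events is to filter them on the `FDR` and `IncLevelDifference` columns. The sketch below is only an illustration. The helper name `filter_mats` is hypothetical, and the default cut-offs (FDR below 0.05, absolute inclusion-level difference of at least 0.1) are assumptions rather than values given in this tutorial. Columns are looked up by header name because their positions differ between event types.

```shell
# Hypothetical helper: print geneSymbol, FDR and IncLevelDifference for
# events that pass the cut-offs.
# Usage: filter_mats FILE [FDR_MAX] [MIN_ABS_INC_LEVEL_DIFF]
filter_mats() {
    awk -F'\t' -v fdr_max="${2:-0.05}" -v dpsi_min="${3:-0.1}" '
        NR == 1 {
            # Map each header name to its column index.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("FDR" in col) || !("IncLevelDifference" in col) || !("geneSymbol" in col)) {
                print "missing expected column in header" > "/dev/stderr"
                exit 1
            }
            next
        }
        {
            fdr = $col["FDR"]
            d = $col["IncLevelDifference"]
            if (fdr == "NA" || d == "NA") next   # skip events without estimates
            d += 0
            if (d < 0) d = -d                    # absolute difference
            if (fdr + 0 < fdr_max + 0 && d >= dpsi_min + 0)
                print $col["geneSymbol"] "\t" $col["FDR"] "\t" $col["IncLevelDifference"]
        }
    ' "$1"
}

# Apply the filter to every JCEC table in the current directory, if any.
for f in *.MATS.JCEC.txt; do
    [ -e "$f" ] || continue
    echo "== $f"
    filter_mats "$f"
done
```

Stricter or looser thresholds can be passed as the second and third arguments, for example `filter_mats SE.MATS.JCEC.txt 0.01 0.2`.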