Introduction

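In 2003, not long after the draft human genome had been released, researchers noticed a peculiar class of sequences in the human genome. The 5’ part is a complete snRNA (such as U6), immediately followed by the 3’ portion of a LINE-1 element (that is, a 5’-truncated LINE-1). The 5’-truncated L1 ends in a poly(A) tract, and the whole sequence is flanked by TSDs (target site duplications: short stretches of identical sequence at the 5’ and 3’ ends).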

This page looks at how such sequences arise.

LINE-1 retrotransposition helps form processed pseudogenes

Set the snRNA-L1 chimeras aside for a moment; a few background concepts come first.

Pseudogenes

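By definition, a pseudogene has two properties relative to its so-called “real” gene:

related: the sequences are similar, with high homology
defective: function is impaired, possibly to the point that it cannot be transcribed or translated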

The term pseudogene was first proposed in 1977, when a defective repeat was found downstream of the 5S DNA in Xenopus (C. Jacq et al., Cell, 1977).

Processed pseudogenes

Pseudogenes fall into two classes:

Those that retain the intervening sequences of their functional counterparts.
Those lacking the intervening sequences of their functional counterparts: processed pseudogenes.

Why use the presence or absence of intervening sequences as the criterion? Simply because many of the pseudogenes discovered turned out to lack intervening sequences, so all pseudogenes lacking them were grouped into one class and the rest into the other.

Pseudogenes that lack intervening sequences are called “processed pseudogenes”. “Processed” here means that they derive from processed RNA, that is, mature mRNA.

The following features of processed pseudogenes strongly support this interpretation:

The lack of intervening sequences (introns are spliced out of the mRNA).
The presence of a poly(A) tract at the 3’ end (mature mRNAs carry poly(A) tails).
The sequence homology ceases at the border of the functional transcript (homology ends abruptly at the transcript boundary).
They are flanked by direct repeats of 7-17 bp (possibly repeats introduced when the reverse-transcribed mRNA integrates).
Pseudogenes and their functional counterparts are not necessarily on the same chromosome (they derive from cytoplasmic mRNA, so there is no reason for them to land on the same chromosome).
For human beta-tubulin, pseudogenes of all the different polyadenylation variants are found (every beta-tubulin mRNA variant has a corresponding pseudogene).
- “Abundant ubiquitously expressed transcripts account for a large fraction of the processed pseudogenes.” (Zhang, 2003)
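
Two of these hallmarks, the 3’ poly(A) tract and the short flanking direct repeats, are simple enough to check on a sequence string. The following is a minimal sketch, not the method of any published pseudogene survey: the function names, the minimum tail length of 8 and the toy sequences are all assumptions made for illustration, and only exact repeats lying directly against the insert are considered.

```python
import re


def polya_tail_length(seq: str, min_len: int = 8) -> int:
    """Length of the run of A's at the 3' end of seq, or 0 if shorter than min_len.

    min_len is an arbitrary illustrative threshold, not a published cutoff.
    """
    match = re.search(r"A+$", seq.upper())
    length = len(match.group(0)) if match else 0
    return length if length >= min_len else 0


def flanking_direct_repeat(upstream: str, downstream: str,
                           min_len: int = 7, max_len: int = 17) -> str:
    """Longest exact repeat that ends `upstream` and starts `downstream`.

    `upstream` is the genomic sequence just 5' of the candidate insert and
    `downstream` the sequence just 3' of it. Returns "" if no repeat of
    min_len..max_len bases is found.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    for k in range(max_len, min_len - 1, -1):
        if k > len(upstream) or k > len(downstream):
            continue
        if upstream[-k:] == downstream[:k]:
            return upstream[-k:]
    return ""


# Toy example: made-up flanks sharing a 10 bp repeat, and a made-up insert.
repeat = "AAGAATGCTT"
print(flanking_direct_repeat("GGCCT" + repeat, repeat + "CCGGA"))
print(polya_tail_length("ATGGCC" + "A" * 12))
```

Real searches have to tolerate mismatches and small offsets between the repeat and the insert boundary, which this sketch deliberately ignores.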

结论：Processed pseudogenes are derived from the mRNA intermediate.

Retrotransposition of LINE-1

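Even if processed pseudogenes most likely derive from an mRNA intermediate, that does not tell us how the mRNA was converted into DNA and integrated into the genome. One very plausible route is LINE-1 retrotransposition.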

The mechanism of LINE-1 retrotransposition:

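A functional full-length LINE-1 is transcribed from its internal promoter, producing full-length LINE-1 RNA.
The LINE-1 RNA is exported to the cytoplasm, where ORF1 and ORF2 are translated.
The ORF1 and ORF2 proteins show cis preference: they bind the RNA that encoded them to form the L1 RNP.
The L1 RNP returns to the nucleus, where target-primed reverse transcription (TPRT) takes place: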
1. First strand nick
2. priming and reverse transcription
3. second strand cleavage
4. DNA synthesis
5. TSD formation
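
The outcome of these TPRT steps can be sketched on plain strings. This is a toy model under stated assumptions, not a simulation of the enzymology: `tprt_insert` and its parameters are hypothetical names, the two staggered cuts are reduced to a start index plus a stagger length, strands are not modelled, and the `copied` argument stands in for reverse transcription that stops before reaching the 5’ end of the RNA.

```python
from typing import Optional


def tprt_insert(target: str, nick: int, stagger: int, rna: str,
                copied: Optional[int] = None) -> str:
    """Toy TPRT outcome on a single-strand representation of the target.

    target  : genomic DNA around the insertion site
    nick    : index where the stretch between the two staggered cuts begins
    stagger : number of bases between the two cuts; this stretch ends up
              on both sides of the insert as the TSD
    rna     : template RNA, written 5'->3'
    copied  : how many nucleotides, counted from the RNA 3' end, were reverse
              transcribed; fewer than len(rna) gives a 5'-truncated copy
    """
    if copied is None:
        copied = len(rna)
    if nick < 0 or stagger < 0 or nick + stagger > len(target):
        raise ValueError("cuts must lie inside the target")
    if not 0 <= copied <= len(rna):
        raise ValueError("copied must be between 0 and len(rna)")

    # Reverse transcription starts at the RNA 3' end, so a partial copy
    # keeps the 3' part of the template and loses the 5' part.
    insert = rna[len(rna) - copied:].replace("U", "T")
    tsd = target[nick:nick + stagger]
    return target[:nick] + tsd + insert + tsd + target[nick + stagger:]


# Made-up target and RNA; only the last 11 nt of the RNA are copied.
product = tprt_insert("CCGGTTAAAAGGCC", nick=4, stagger=6,
                      rna="GGGCUCAUGAAAAAAAA", copied=11)
print(product)
```

In the printed product the six bases between the cuts appear once on each side of the truncated, poly(A)-ended insert, which is the TSD pattern described in step 5.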

LINE-1 retrotransposition forms retrogenes

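The LINE-1 retrotransposition machinery exists to copy LINE-1 itself within the genome. However, other RNAs can hijack or stray into this system, get reverse transcribed into DNA and integrated into the genome. A gene formed this way is called a retrogene.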

Retrogene: An expressed and functional gene that is generated by retrotransposition and has an intact ORF that is consistent with that of the parental gene. (Richard Cordaux et al, Nat. Genetics, 2009) The definition quoted here requires the gene to be “expressed and functional”; the sense intended on this page does not require it to be either expressed or functional.

That covers the background concepts.

Processed pseudogenes derive from an mRNA intermediate.
LINE-1 retrotransposition can help other RNAs integrate.

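So a processed pseudogene is most likely a mature mRNA that happened to hijack LINE-1 retrotransposition and was integrated into the genome as if it were LINE-1. This hypothesis fits the features of processed pseudogenes: no introns, a poly(A) tract, and short flanking repeats that are exactly the TSDs introduced during retrotransposition.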

Now back to the topic of snRNA-L1 chimeras.

Discovery

In 2003, a number of chimeric genes were found in the human genome.

Characteristics:

3’-poly(A)-tailed.
flanked by 11-21 bp direct repeats
contained T2A4 hexanucleotide motifs near 5’ end.

Of 740 pseudogenes, 81 were chimeras; of those 81 chimeras, 66 contained U6; and of those 66 U6-containing chimeras, 56 were U6-L1 chimeras.
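
To put these nested counts in proportion, the short sketch below turns them into fractions. The four numbers come from the paragraph above; the labels and the helper name `nested_fractions` are made up for illustration.

```python
# Counts as quoted above, ordered from the outermost set to the innermost.
counts = [
    ("pseudogenes examined", 740),
    ("chimeras", 81),
    ("U6-containing chimeras", 66),
    ("U6-L1 chimeras", 56),
]


def nested_fractions(levels):
    """Fraction of each level relative to the level directly above it."""
    return [
        (child, child_n / parent_n)
        for (_, parent_n), (child, child_n) in zip(levels, levels[1:])
    ]


for label, fraction in nested_fractions(counts):
    print(f"{label}: {fraction:.1%} of the level above")

# Share of U6-L1 chimeras among all pseudogenes examined.
print(f"U6-L1 chimeras overall: {counts[-1][1] / counts[0][1]:.1%}")
```

About 10.9% of the pseudogenes are chimeras, about 81.5% of the chimeras contain U6, about 84.8% of those are U6-L1 chimeras, and U6-L1 chimeras make up about 7.6% of all pseudogenes examined.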

In short, one class stands out clearly: full-sized U6 snRNA fused at its 3’ terminus to a 5’-truncated LINE-1.

Mechanism

I've run out of motivation to write this part up properly for now; I'll fill it in when I'm in a better mood.

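In short, for more than a decade the accepted view was that a template switch occurs during LINE-1 retrotransposition: reverse transcription of the L1 RNA starts from its 3’ end, and partway through the template changes to another RNA such as U6, producing a full-sized U6 fused to a 5’-truncated LINE-1. Then in 2019 a PNAS paper proposed that RTCB may ligate U6, which carries a 2’,3’-cyclic phosphate end as a product of its own processing, to L1 RNA that has broken inside the cell and carries a 5’-OH end, with the ligated RNA then being integrated through L1 retrotransposition. There are two explanations, and which one is right has not been settled.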
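
The two proposals can be written side by side as string operations, which makes it easy to see that they predict the same kind of DNA product. This is only a sketch under simplifying assumptions: the function names are hypothetical, the sequences are placeholders rather than real U6 or L1 sequence, and nothing here models the enzymes or argues for one proposal over the other.

```python
def to_dna(rna: str) -> str:
    """Write an RNA sequence in DNA letters (sense-strand view of its cDNA copy)."""
    return rna.replace("U", "T")


def template_switch_model(u6_rna: str, l1_rna: str, switch_after: int) -> str:
    """Reverse transcription copies the last `switch_after` nt of the L1 RNA
    (starting from its 3' end), then jumps to U6 and copies it in full."""
    if not 0 <= switch_after <= len(l1_rna):
        raise ValueError("switch_after must be between 0 and len(l1_rna)")
    l1_part = l1_rna[len(l1_rna) - switch_after:]
    return to_dna(u6_rna) + to_dna(l1_part)


def ligation_model(u6_rna: str, l1_rna: str, break_at: int) -> str:
    """The L1 RNA breaks at index `break_at`; its 3' fragment is joined to the
    3' end of U6, and the chimeric RNA is then copied as a single template."""
    if not 0 <= break_at <= len(l1_rna):
        raise ValueError("break_at must be between 0 and len(l1_rna)")
    chimeric_rna = u6_rna + l1_rna[break_at:]
    return to_dna(chimeric_rna)


# Placeholder sequences, not real U6 or L1.
u6 = "GUGCUCGCUU"
l1 = "GGAGCCAAGAUGGCCGAAUAAAAAAAAAA"

by_switch = template_switch_model(u6, l1, switch_after=14)
by_ligation = ligation_model(u6, l1, break_at=len(l1) - 14)
print(by_switch)
print(by_switch == by_ligation)
```

In this toy setting, a switch after n nucleotides and a break n nucleotides from the L1 3’ end yield the same full-sized U6 fused to a 5’-truncated L1, so the final sequence alone does not distinguish the two routes.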

This paper was presented at the group meeting on 20 October 2023; to review it, go back to the slides, file name “20231020FYM”.