Human endogenous retroviruses (HERVs) and other long terminal repeat (LTR)-type retrotransposons (HERV/LTRs) carry regulatory elements that may influence the transcription of host genes. We systematically identified these regulatory elements using publicly available ChIP-Seq datasets for transcription factors (TFs) and DNase-Seq datasets. We identified TF binding sites (TFBSs) on HERV/LTRs (HERV-TFBSs) and DNase I hypersensitive sites (DHSs) on HERV/LTRs (HERV-DHSs). Subsequently, we identified "HERV/LTR-shared regulatory elements (HSREs)", each defined as a TF-binding motif in HERV-TFBSs that is shared within a substantial fraction of the copies of a HERV/LTR type. dbHERV-REs is a database of HERV/LTR regulatory elements. The database provides (i) general information on HERV/LTRs, such as family classification, copy number, and insertion date as inferred from the distribution of orthologous copies among mammalian genomes; (ii) positions of HERV-TFBSs, HSREs, and HERV-DHSs in the consensus sequences of HERV/LTRs and in the human reference genome; and (iii) results of Gene Ontology (GO) enrichment analyses performed with GREAT using the sets of respective HSREs. The database can also compare the phylogenetic relationships of HERV/LTR copies with the presence of orthologous copies across mammalian genomes, TFBSs, and TF-binding motifs. Further details are described in our paper.
The figures and results on this website can be reused in your publications. When you reuse the data in a publication, please cite our paper.
This web page has been confirmed to work properly on Safari, Chrome, and Firefox.
Download information of HSREs of .
Number of HERV-TFBSs mapped to each consensus position of the HERV/LTR type. The x-axis indicates nucleotide positions in the consensus sequence of the HERV/LTR type. The y-axis indicates the number of HERV/LTR copies harboring HERV-TFBSs at each position.
Download the genomic positions of HERV-TFBSs for in human reference genome (GRCh37/hg19).
Number of TF-binding motifs in HERV-TFBSs mapped to each consensus position of the HERV/LTR type. The x-axis indicates the consensus position of the HERV/LTR type. The y-axis indicates the number of HERV/LTR copies harboring the TF-binding motifs in TFBSs at each position. Peaks of the motifs corresponding to HSREs are shown with dots and motif names.
Download genomic positions of HSREs for in human reference genome (GRCh37/hg19).
Download Gene Ontology terms in which insertions of HSREs for were enriched.
Number of HERV-DHSs mapped to each consensus position of the HERV/LTR type. The x-axis indicates the consensus position of the HERV/LTR type. The y-axis indicates the number of HERV/LTR copies harboring HERV-DHSs at each position.
Download genomic positions of HERV-DHSs for in human reference genome (GRCh37/hg19).
Proportion of HERV/LTR copies overlapping each chromatin state predicted by chromatin segmentation. TSS: promoter region including the TSS; PF: predicted promoter flanking region; E: enhancer; WE: weak enhancer or open chromatin cis-regulatory element; CTCF: CTCF-enriched element; T: transcribed region; R: repressed or low-activity region.
An unrooted phylogenetic tree of HERV/LTR copies. Representative support values calculated by the Shimodaira-Hasegawa (SH)-like test are shown on the corresponding branches.
Presence of orthologous HERV/LTR copies in the reference genomes of mammals. The order of HERV/LTR copies is the same as in the tree on the left.
Presence of TFBSs on each HERV/LTR copy. The order of HERV/LTR copies is the same as in the tree on the left.
Presence of TF-binding motifs at positions corresponding to HSREs on each HERV/LTR copy. The order of HERV/LTR copies is the same as in the tree on the left. Black and gray indicate the presence of motifs with p values < 0.0001 and > 0.001, respectively.
Download the complete dataset. (Size: 979 MiB; MD5: 90ae21fa75febff25862b64d5821dc5f; last update: Jun. 27, 2017)
We recommend using the wget command to download our data.
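Because the archive is large, it may be worth checking the download against the MD5 value listed above. The following is a minimal sketch of such a check; the function names and the example file name are hypothetical and are not part of the database.

```python
import hashlib

# MD5 value quoted on this page for the complete dataset.
EXPECTED_MD5 = "90ae21fa75febff25862b64d5821dc5f"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


# Hypothetical usage (the file name is an assumption):
# print(is_intact("dbHERV-REs_all.tar.gz"))
```

Reading in chunks keeps memory use small even for a file of several hundred MiB.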
When focusing on repetitive elements such as HERV/LTRs, it is important to check whether multiple mapped reads (reads that can be mapped to multiple genomic regions) are excluded from the analysis of next-generation sequencing data. If multiple mapped reads are not excluded, false positive peaks may be detected at regions whose sequences are similar to those authentically bound by the TF. If they are excluded, it becomes infeasible to identify ChIP-Seq peaks on recently integrated HERV/LTRs, which show low sequence divergence among copies. Therefore, we generated two types of ChIP-Seq peak (TFBS) datasets: all-read and unique-read TFBSs (Fig. 1). All-read TFBSs are ChIP-Seq peaks that were determined with all reads mapped to the human reference genome; in our analytical pipeline, each multiple mapped read was randomly assigned to one genomic position chosen from its candidate positions. Unique-read TFBSs are ChIP-Seq peaks that were determined with only the reads uniquely mapped to the reference genome; in other words, multiple mapped reads were excluded before ChIP-Seq peak calling.
Fig. 1. An analytical pipeline for peak calling of ChIP-Seq in this study
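The difference between the two read sets can be illustrated with a small sketch. This is not the pipeline used for the database; the data structure (a mapping from read ID to candidate positions) and the function name are assumptions made only for illustration.

```python
import random


def assign_reads(alignments, mode, seed=0):
    """Select one genomic position per read before peak calling.

    alignments: dict mapping a read ID to a list of candidate
                (chromosome, position) tuples (hypothetical structure).
    mode: "all"    -> a multiple mapped read is randomly assigned to one
                      of its candidate positions;
          "unique" -> multiple mapped reads are discarded.
    """
    if mode not in ("all", "unique"):
        raise ValueError("mode must be 'all' or 'unique'")
    rng = random.Random(seed)
    assigned = {}
    for read_id, candidates in alignments.items():
        if len(candidates) == 1:
            assigned[read_id] = candidates[0]
        elif mode == "all" and candidates:
            assigned[read_id] = rng.choice(candidates)
        # In "unique" mode, reads with several candidates are dropped.
    return assigned


example = {
    "read1": [("chr1", 1000)],
    "read2": [("chr2", 500), ("chr7", 9000)],  # multiple mapped read
}
all_read = assign_reads(example, "all")
unique_read = assign_reads(example, "unique")
```

Here `unique_read` keeps only `read1`, whereas `all_read` also keeps `read2` at one of its two candidate positions.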
When a regulatory element is observed on a HERV/LTR sequence, there are two possible evolutionary scenarios (Fig. 2A, 2B) for how the regulatory element arose.
Fig. 2A. Scenario 1: The regulatory element originally existed in the HERV/LTR before the insertion
Fig. 2B. Scenario 2: The regulatory element arose newly through mutations after the insertion
We can likely distinguish the two scenarios by examining many of the HERV/LTR copies interspersed in the genome. Namely, the former scenario is more likely if the regulatory elements are shared/conserved among HERV/LTR copies. For this purpose, we defined HSREs as regulatory elements (TF-binding motifs in TFBSs) that are shared/conserved within a substantial fraction of HERV/LTR copies. We identified HSREs by the following procedure (Fig. 3).
Fig. 3. Scheme for identification of HSREs
We investigated HSREs for two purposes: (i) to infer the properties of regulatory elements that were anciently present in HERV/LTRs or in their exogenous retroviral ancestors, and (ii) to identify sets of regulatory elements that work coordinately in the genome. Because HSREs are shared among many HERV/LTR copies interspersed in the genome, a set of HSREs can modulate several genes in a coordinated manner (Fig. 4).
Fig. 4. Coordinated gene regulation by a set of HSREs
For this reason, many researchers have proposed that HERV/LTRs sharing regulatory elements contributed to the evolution of the host's gene regulatory networks.
It is important to check whether a TF binds to a type of HERV/LTR significantly more often than expected. Because HERV/LTRs occupy a large fraction of the genome, some TF binding will be observed on HERV/LTRs even in the absence of any special association between the HERV/LTRs and the TF. Therefore, we evaluated the statistical enrichment of TF binding in each HERV/LTR type relative to random expectation. For this purpose, we performed two kinds of permutation tests for filtering HERV-TFBSs: count-based and depth-based permutation tests. In both tests, the merged TFBSs were used.
Count-based permutation test
We generated 100 randomized datasets of TFBSs by permuting the genomic positions of the TFBSs. In the observed and randomized datasets, HERV-TFBS overlaps were counted. A standardized score (z score) was calculated from the counts in the observed and randomized datasets. The count-based test is available for all HERV-TFBS combinations.
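The z score in the count-based test can be sketched as follows. This is an illustration under assumptions, not the exact implementation: the function name is hypothetical, and the use of the sample standard deviation of the randomized counts is our assumption, since the estimator is not stated here.

```python
from statistics import mean, stdev


def count_based_z(observed_count, randomized_counts):
    """Standardize an observed overlap count against randomized counts.

    observed_count: number of HERV-TFBS overlaps in the observed dataset.
    randomized_counts: overlap counts from the randomized datasets
                       (100 datasets in the count-based test).
    """
    mu = mean(randomized_counts)
    sigma = stdev(randomized_counts)  # sample SD: an assumption
    if sigma == 0:
        return float("nan")  # z score undefined without variation
    return (observed_count - mu) / sigma


# Toy example with four randomized datasets instead of 100.
z = count_based_z(30, [10, 12, 8, 10])
```

A large positive z score indicates that the TF overlaps the HERV/LTR type more often than expected from randomly placed TFBSs.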
Depth-based permutation test
We generated 500 randomized datasets of TFBSs by permuting the genomic positions of the TFBSs. In the observed and randomized datasets, the depth of TFBSs in the MSA (Fig. 3-4) was measured at each consensus position. A z score was calculated at each consensus position from the depths in the observed and randomized datasets (referred to as the base-wise z score). After smoothing the base-wise z scores with a sliding window algorithm (window size: 50 bp), the maximum smoothed base-wise z score among all positions was defined as the depth-based z score. For each HERV/LTR type, the depth-based test is performed only for TFBSs whose maximum depth in the observed dataset is greater than or equal to 10.
For filtering of HERV-DHSs, the count-based permutation test is available. The depth-based z score can distinguish the two situations shown in Fig. 5, whereas the count-based z score cannot. Please note that the depth-based z score tends to be more sensitive than the count-based z score, particularly when a HERV/LTR type has a long consensus sequence (such as the internal sequence of HERV/LTRs).
Fig. 5. Two possible situations when HERV-TFBS overlaps were observed 5 times
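The depth-based score described above can be sketched in a few steps: a z score per consensus position, smoothing, and the maximum. The sketch below rests on several assumptions that are not specified here: the sliding window is implemented as a moving average truncated at the sequence ends, the sample standard deviation is used, and positions with no variation among randomized datasets receive a z score of 0. All function names are hypothetical.

```python
from statistics import mean, stdev


def base_wise_z(observed_depth, randomized_depths):
    """z score at each consensus position.

    observed_depth: list of depths, one per consensus position.
    randomized_depths: list of such lists, one per randomized dataset.
    """
    scores = []
    for pos, obs in enumerate(observed_depth):
        null = [depths[pos] for depths in randomized_depths]
        sigma = stdev(null)  # sample SD: an assumption
        # Assumption: z is set to 0 where the null shows no variation.
        scores.append((obs - mean(null)) / sigma if sigma > 0 else 0.0)
    return scores


def smooth(values, window=50):
    """Moving average over `window` positions, truncated at the ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i - half + window)
        out.append(mean(values[lo:hi]))
    return out


def depth_based_z(observed_depth, randomized_depths, window=50, min_depth=10):
    """Maximum smoothed base-wise z score, or None if the test is skipped."""
    if max(observed_depth) < min_depth:
        return None  # test is only run when the maximum depth is >= 10
    return max(smooth(base_wise_z(observed_depth, randomized_depths), window))
```

Because the score is taken at the position with the strongest signal, TFBSs concentrated at one consensus position yield a higher value than the same number of TFBSs scattered along the sequence.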
These files are written in a BED-like format. Namely, each row indicates a particular genomic feature (HERV-TFBSs) and its genomic position: chromosome name, chromStart position, and chromEnd position. The first base in a chromosome is numbered 0, and the chromEnd base is not included in the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
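The coordinate convention can be made concrete with a short parsing sketch. It assumes tab-separated rows whose first three columns are the chromosome name, chromStart, and chromEnd; any further columns in the actual files are ignored here, and the class and function names are hypothetical.

```python
from typing import NamedTuple


class BedInterval(NamedTuple):
    chrom: str
    chrom_start: int  # 0-based, included in the feature
    chrom_end: int    # 0-based, not included in the feature

    @property
    def length(self):
        """Number of bases covered by the feature."""
        return self.chrom_end - self.chrom_start

    def one_based_closed(self):
        """Same interval with 1-based, inclusive coordinates."""
        return (self.chrom_start + 1, self.chrom_end)


def parse_bed_line(line):
    """Parse the first three tab-separated columns of a BED-like row."""
    fields = line.rstrip("\n").split("\t")
    return BedInterval(fields[0], int(fields[1]), int(fields[2]))


# The first 100 bases of a chromosome: bases numbered 0-99.
interval = parse_bed_line("chr1\t0\t100\n")
```

For this row, `interval.length` is 100, and the 1-based inclusive coordinates are (1, 100).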
Division of Human Genetics, National Institute of Genetics, Mishima, Shizuoka JAPAN
If you have any questions or comments about this database, please send an e-mail to jampei0513@yahoo.co.jp