Manual

miREM is a web-analysis tool designed for the prediction of underlying differentially expressed miRNAs from a list of differentially expressed genes (DEG). It does this by referencing established miRNA-interaction databases to identify the miRNA-repressors of each gene in the list, and then calculates the overall repressor enrichment probability of each of the miRNAs.

miREM utilizes the strengths of hypergeometric and expectation-maximization (EM) probabilistic approaches for the predictions. This approach has been shown to be a more reliable predictor than pure hypergeometric approaches currently being used by other prediction software (described in our manuscript).

  1. How miREM works?
  2. How to use miREM?
  3. How to interpret the results?
  4. Saving the results

1. How miREM works?

Here is a graphical summary of miREM’s workflow.

 

The miREM workflow is composed of the following five steps:

  1. Each gene in the differentially expressed genes (DEG) list is ‘mapped’ to the miRNA(s) predicted to target it, using the prediction databases selected in miREM. This creates a list of potentially repressive miRNAs.
  2. The hypergeometric p-value (and corrected p-value according to Benjamini-Hochberg) is determined for each unique miRNA so as to identify its enrichment significance.
  3. If only one miRNA has a significant p-value, the program stops and reports this miRNA as the only one influencing the DEG. This is very rare, however: the hypergeometric method alone usually identifies more than one miRNA.
  4. The program then selects the miRNAs with corrected p-values below the specified threshold and subjects them to the EM algorithm to estimate a probability for each miRNA. miRNAs with the highest EM probabilities are the most likely to influence the DEG. In addition to tab-delimited files, miREM presents the results as a clustered heatmap of miRNA-gene interactions, which makes it easier to see which genes are targeted by the predicted miRNAs. This visualization also lets users intuitively infer the kind of co-repression these predicted miRNAs exert in the system.
  5. Finally, predicted miRNAs are clustered according to their mature region sequences in order to identify duplicated predictions (miRNAs sharing similar sequences).
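
Step 2 of the workflow above can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric test (probability of seeing at least k targets of a miRNA among n listed genes, drawn from a background of N genes of which K are targets) followed by the standard Benjamini-Hochberg adjustment. The function names and the toy numbers are hypothetical; miREM's own background size and implementation are not described here.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of at least k targets among n genes drawn
    from a background of N genes, K of which are targets of the miRNA."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping the
    # adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy numbers (hypothetical): 20 background genes, 5 targets of one miRNA,
# a gene-list of 4 genes, 3 of which are targets.
print(hypergeom_pvalue(3, 4, 5, 20))          # about 0.032
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Only miRNAs whose adjusted p-value falls below the hypergeometric threshold would then be passed on to the EM step.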

2. How to use miREM?

miREM comes with a simple-to-use interface divided into 3 main parts:
1. Input
2. Mapping & Database Selection
3. Threshold Selection

 

1. Input

miREM currently accepts 4 types of gene annotations for the DEG list (down- or up-regulated genes):

  • RefSeq ID (e.g. NM_000401, NM_001004298, NM_001001182…)
  • Ensembl ID (e.g. ENSG00000001167, ENSG00000001630, ENSMUSG00000000125…)
  • UCSC ID (e.g. uc009vis.3, uc010nxu.2, uc011whw.1…)
  • Official Gene Names (e.g. TP53, ERBB2, Sox2…)

The gene-list can either be pasted into the textbox, or uploaded as a text file (1 gene per line). Species selection (human or mouse) is then made so as to allow referencing of the appropriate databases.
When a downregulated gene-list is provided, miREM predicts upregulated miRNAs. Conversely, when an upregulated gene-list is provided, miREM predicts downregulated miRNAs.
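
As an illustration of the input rules above, the following sketch reads a one-gene-per-line list and guesses the annotation type of each identifier. The regular expressions are inferred only from the example identifiers shown in this manual and are assumptions, not miREM's actual validation rules. All function names are hypothetical.

```python
import re

# Patterns guessed from the example identifiers in this manual (assumption).
ID_PATTERNS = [
    ("refseq", re.compile(r"^N[MR]_\d+(\.\d+)?$")),
    ("ensembl", re.compile(r"^ENS(MUS)?G\d{11}$")),
    ("ucsc", re.compile(r"^uc\d{3}[a-z]{3}\.\d+$")),
]

def classify_gene_id(identifier):
    """Return the guessed annotation type; anything unmatched is a gene name."""
    for label, pattern in ID_PATTERNS:
        if pattern.match(identifier):
            return label
    return "gene_name"

def parse_gene_list(text):
    """Parse one gene per line, dropping blank lines and duplicates."""
    genes = {}
    for line in text.splitlines():
        gene = line.strip()
        if gene:
            genes.setdefault(gene, classify_gene_id(gene))
    return genes

def predicted_mirna_direction(gene_list_direction):
    """Down-regulated genes point to up-regulated miRNAs, and vice versa."""
    return {"down": "up", "up": "down"}[gene_list_direction]

print(parse_gene_list("NM_000401\nENSMUSG00000000125\nuc009vis.3\n\nTP53\nTP53\n"))
print(predicted_mirna_direction("down"))
```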

 

2. Mapping & Database Selection

miREM offers 7 established miRNA-target prediction databases which can either be queried alone or can be intersected to map the gene-list to the corresponding repressor miRNAs. The sources for the prediction databases are given here.

miREM offers 2 options: users can either 1) intersect specific databases (right panel) or 2) intersect databases dynamically (left panel).

Database selection is the most important consideration when running miREM, as it determines the breadth of miRNAs that are available for prediction. This selection is a matter of user preference. For the uninitiated, selecting appropriate databases based on the different algorithms can be daunting. We offer some advice on how to go about getting the correct results:

Firstly, understand the sizes of the database and the similarities of prediction between each algorithm (cf. Release notes).

Secondly, understand the prediction principles of each database/algorithm. It helps to determine how each program differs and what you can expect by either intersecting them or using them separately.

Thirdly, test the gene-list that you have on one or more databases (changing the database(s) each time) to see if the prediction results hold.

NB. It is not always true that a stringent database selection (i.e. choosing predictions from all 7 databases together) will give you the correct results.

It is worth highlighting that, when users choose to query specific databases, miREM keeps only the miRNA-target interactions shared by all of the chosen databases. Since the conserved and non-conserved TargetScan pools are mutually exclusive, selecting both at once yields no results in the miREM analysis. The same happens if both Miranda (conserved) and Miranda (non-conserved) are queried at the same time.
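
The effect of intersecting databases can be sketched with plain sets. The example below assumes each database is represented as a set of (miRNA, gene) pairs; the database labels, miRNA names and gene names are placeholders invented for illustration, not actual database content.

```python
def intersect_databases(databases, selected):
    """Keep only the (miRNA, gene) pairs present in every selected database."""
    pair_sets = [databases[name] for name in selected]
    return set.intersection(*pair_sets) if pair_sets else set()

# Placeholder content (hypothetical): a conserved pool and a non-conserved
# pool that share no interaction, plus one unrelated database.
databases = {
    "conserved_pool": {("miR-x", "GENE1"), ("miR-x", "GENE2")},
    "nonconserved_pool": {("miR-x", "GENE3"), ("miR-y", "GENE1")},
    "other_db": {("miR-x", "GENE1"), ("miR-y", "GENE1")},
}

# Two overlapping databases leave the shared interactions only.
print(intersect_databases(databases, ["conserved_pool", "other_db"]))

# Two mutually exclusive pools leave nothing to analyse.
print(intersect_databases(databases, ["conserved_pool", "nonconserved_pool"]))
```

Each added database can only shrink the intersection, which is why a more stringent selection reduces the pool of miRNAs available for prediction.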

3. Threshold Selection

Finally, there are two thresholds for consideration in miREM.

The hypergeometric threshold is the p-value cut-off used to select the miRNAs which would be subjected to the EM-algorithm for the final analysis. The default value of 0.001 or 0.0001 has been shown to perform well with our test datasets.

The EM threshold is the convergence threshold used as a stopping criterion for the EM iteration. The default value of 0.001 has also been shown to work well.
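
To show how a convergence threshold acts as a stopping criterion, here is a minimal EM sketch. It is not miREM's model, which is described in the manuscript. It assumes a simple mixture in which each gene in the list is repressed by exactly one of its candidate miRNAs, and it estimates one weight per miRNA. Iteration stops once no weight changes by more than the threshold. The function name and toy data are hypothetical.

```python
def em_mirna_weights(interactions, threshold=0.001, max_iter=1000):
    """interactions maps each gene to the set of candidate miRNAs targeting it.
    Returns (weights per miRNA, number of iterations performed)."""
    mirnas = sorted({m for targets in interactions.values() for m in targets})
    genes = [g for g, targets in interactions.items() if targets]
    if not mirnas:
        return {}, 0
    weights = {m: 1.0 / len(mirnas) for m in mirnas}
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # E-step: share each gene among its candidate miRNAs,
        # in proportion to the current weights.
        totals = {m: 0.0 for m in mirnas}
        for g in genes:
            norm = sum(weights[m] for m in interactions[g])
            for m in interactions[g]:
                totals[m] += weights[m] / norm
        # M-step: the new weight is the average share over all genes.
        new_weights = {m: totals[m] / len(genes) for m in mirnas}
        change = max(abs(new_weights[m] - weights[m]) for m in mirnas)
        weights = new_weights
        if change < threshold:  # convergence threshold reached
            break
    return weights, iteration

# Toy data (hypothetical): miRNA "a" explains more genes than miRNA "b".
toy = {"g1": {"a"}, "g2": {"a"}, "g3": {"a", "b"}, "g4": {"b"}}
weights, n_iter = em_mirna_weights(toy, threshold=0.001)
print(weights, n_iter)
```

A smaller threshold makes the loop run longer before it stops, and a larger one stops earlier with a coarser estimate.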

3. How to interpret the results?

The main output from miREM consists of a table of predicted miRNAs along with the following metrics:

  • raw hypergeometric p-value
  • corrected hypergeometric p-value (Benjamini-Hochberg)
  • EM-probability
  • number of genes in the gene-list associated with the miRNA.

The following example shows the results from the analysis of up-regulated genes after a double miRNA knockout (KO) (miR-144/451 KO | Yu et al. Genes & Dev., 2010 | see more examples here). From this gene-list, miREM is able to correctly predict the KO miRNAs.

To facilitate the understanding of the results, miREM provides additional plots:

  • A responsive scatter-plot where predicted miRNAs (blue dots) are shown according to their hypergeometric p-value (x-axis) vs their EM score (y-axis) (Figure 1).
    Additional information about each predicted miRNA is shown when the user hovers over its dot.

Figure 1: P-values/EM scores scatter-plot of miREM predicted miRNAs.

  • A heatmap where predicted miRNAs are clustered according to their respective gene targets (Figure 2).
    This representation allows the visualization of miRNA activities such as specific or co-operative repression (i.e. genes can be targeted by one or several miRNAs).
    Here, blue denotes an interaction and yellow no-interaction.

Figure 2: Heatmap of predicted miRNAs clustered according to their respective gene targets.

  • A phylogenetic tree where the mature sequences of predicted miRNAs are classified according to their homologies (Figure 3).
    miRNAs which share similar mature regions are likely to target similar genes. Therefore, such miRNAs are likely to be co-predicted. It is useful to group predicted miRNAs according to their mature sequences in order to prioritize candidates efficiently by identifying “duplicate predictions”. For instance, Figure 3A shows a miREM analysis of a DEG list derived from the knock-in of miR-1. miREM provided an EM score for three miRNAs: the actual knock-in miRNA (miR-1) and two other miRNAs (miR-206 and miR-613), which were falsely identified because of their sequence similarity, and thus biological similarity, to the knock-in miRNA, as shown by the clustering tree (Figure 3A).

    Figure 3A: miREM analysis of down-regulated genes derived from the knock-in of miR-1.

    In contrast, a user might consider all predicted miRNAs with EM scores when these have very distinct mature sequences. For instance, Figure 3B shows a miREM analysis of a gene-list from a double miRNA knock-out (miR-144 and miR-451), where both miRNAs were correctly predicted. Here, the mature sequences of the two miRNAs are distinct, suggesting that these miRNAs target distinct genes in the given gene-list.

 

Figure 3B: miREM analysis of up-regulated genes derived from the miR-144 and miR-451 double knock-out.
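
The idea of spotting “duplicate predictions” from mature sequences can be illustrated with a much simpler stand-in for the phylogenetic tree. The sketch below groups miRNAs that share the same seed (nucleotides 2-8 of the mature sequence). This is a substituted technique chosen for brevity, not the clustering miREM performs, and the names and sequences are invented for illustration.

```python
from collections import defaultdict

def seed(sequence):
    """Nucleotides 2-8 of the mature sequence (0-based slice 1:8)."""
    return sequence[1:8]

def group_by_seed(mature_sequences):
    """Group miRNA names whose mature sequences share the same seed."""
    groups = defaultdict(list)
    for name, sequence in mature_sequences.items():
        rna = sequence.upper().replace("T", "U")  # accept DNA-style input
        groups[seed(rna)].append(name)
    return sorted(sorted(names) for names in groups.values())

# Invented sequences (hypothetical): the first two share a seed.
toy_sequences = {
    "toy-miR-a": "UGGAAUGCAAAGAAGUAUGUAU",
    "toy-miR-b": "AGGAAUGCUCCUUCUUUGCC",
    "toy-miR-c": "UACCCUGUAGAUCCGAAUUUGU",
}
print(group_by_seed(toy_sequences))
```

miRNAs that fall in the same group would be candidates for a single underlying prediction, while miRNAs in separate groups are more likely to be independent.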

4. Saving the results

The results page summarizes the analysis options and features a URL for users to download all text and graphical elements found on the results page (see image above). This also includes the matrix used to generate the heatmap so users are able to see in detail the genes that are affected by the considered miRNAs. The URL will only be active for 7 days after the analysis.
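
The heatmap matrix can be pictured as a binary miRNA-by-gene table, with 1 for an interaction and 0 for no interaction. The sketch below builds such a table from (miRNA, gene) pairs and lists the genes targeted by more than one miRNA. The layout of the file miREM actually provides is not specified here, so the structure, function names and toy pairs are all assumptions.

```python
def interaction_matrix(pairs):
    """Build a binary matrix (rows: miRNAs, columns: genes) from pairs."""
    pair_set = set(pairs)
    mirnas = sorted({m for m, _ in pair_set})
    genes = sorted({g for _, g in pair_set})
    rows = [[1 if (m, g) in pair_set else 0 for g in genes] for m in mirnas]
    return mirnas, genes, rows

def co_repressed_genes(mirnas, genes, rows):
    """Genes whose column contains more than one interaction."""
    return [g for j, g in enumerate(genes) if sum(row[j] for row in rows) > 1]

# Toy pairs (hypothetical names).
pairs = [("miR-x", "GENE1"), ("miR-x", "GENE2"), ("miR-y", "GENE2")]
mirnas, genes, rows = interaction_matrix(pairs)
print(mirnas, genes, rows)
print(co_repressed_genes(mirnas, genes, rows))
```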