MAnorm R_tutorial

1. Pre-requisites:

b. Bioconductor packages: affy, R.basic and MASS
  type following lines to install these 3 packages in R
  #affy:
    source("http://bioconductor.org/biocLite.R")
    biocLite(c("affy"))
  #R.basic and MASS:
    source("http://www.braju.com/R/hbLite.R")
    hbLite(c("R.basic","MASS"))


2. Usage:
    I. Create a folder and place in the folder MAnorm.sh, MAnorm.r, and all 4 bed files (peak file and read file for the 2 samples under comparison) to be analyzed.
    II. run command:
    ./MAnorm.sh   sample1_peakfile[BED]     sample2_peakfile[BED]     sample1_readfile[BED]    sample2_readfile[BED]     sample1_readshift_lentgh[INT]       sample2_readshift_length[INT]   output_folder[string]                       

3. Example:
   ./MAnorm.sh  sample1_peaks.bed  sample2_peaks.bed  sample1_read.bed  sample2_read.bed  150  150  output_folder 
      The first 4 input files should be in bed format with no header lines
           The first 2 peak files have 3 columns: chromosome, start, end.
           The next 2 read files should have 6 columns (information from column 1, 2,3,6 are used): chromosome, start, end, description, description, strand (+/-)
      The 5th and 6th parameters are the number of bp to be shifted for each read, which is approximately half of the fragment size. MACS estimates and give the shiftsize in the headerlines of MACS peak file *_peaks.xls after "# d =".
      The last parameter is the output folder
                                                                                                                                                                                                                    
      MAnorm.r is called from MAnorm.sh, and there is no need to run it separately.  Please check file Rcommand.out for the output file from running the R script for error tracking.

4. Output:       
4.1 Output files created:
  MAnorm_result.xls: Includes all peak coordinates, # raw reads from sample1, #raw reads from sample2, M-value_rescaled, A-value_rescaled, -log10(p-value). 
  This file lists the peak coordinates, raw reads, M and A values, and MA-norm p-value for the set of common peaks in each sample and for the peaks unique to each sample.  
  The file below (MAnorm_result_commonPeak_merged.xls) provides the list of merged common peaks.
  MAnorm_result_commonPeak_merged.xls: This output file is similar to MAnorm_result.xls, except that the common peaks are merged. This file corresponds to a list of merged peaks for the two samples being compared.
                                                                                                    
  sample1_peaks.wig: wig file for sample1 peak list, with the width of the bar representing the M value
  sample2_peaks.wig: wig file for sample2 peak list, with the width of the bar representing the M value
                                                                                                                                                                                                                 
  MAplot_before_rescaling (common peaks).png: MAplot for common peaks before MAnorm
                green line: robust regression
                red line: LOWESS regression
                blue line: line M = 0    
  MAplot_after_rescaling (all peaks).png: MAplot for all peaks after MAnorm
                red line: LOWESS regression
                blue line: line M = 0
                
4.2 Temporary Output files removed after MAnorm is done (If these files are needed, please modify MAnorm.sh):
                peak1.bed: sample 1 peak list
                peak2.bed: sample 2 peak list
                read1.bed: sample 1 read used for mapping to peaks
                read2.bed: sample 2 read used for mapping to peaks
                peak1_dump.bed: sample 1 peaks not used
                peak2_dump.bed: sample 2 peaks not used
                read1_dump.bed: sample 1 read NOT used for mapping to peaks
                read2_dump.bed: sample 2 read NOT used for mapping to peaks
                                        
5. other useful scripts:
5.1  classify_by_M_value.sh:
     description:a useful script  for peak classification based on M value, which could be uploaded directly to a genome browser.  Required to be in the same folder of MAnorm_result.xls generated by MAnorm.sh.
     usage:
     ./classify_by_M_value.sh    absolute_M-value_cutoff_unbiased     absolute_M-value_minimum_for_unique_peaks    -log10p_cutoff_unique
     example_command:
     ./classify_by_M_value.sh 0.5  1  5
     In the above example, sample 1 unique peaks (non-concordant peaks) are peaks with M-values greater than 1 and that have a log base 10(p-value) greater than 5.  Similarly, sample 2 unique peaks (non-concordant peaks) are peaks with M-values less than (-1) that have a log base 10(p-value) greater than 5.  Unbiased peaks (concordant peaks) are peaks with M-values between (-0.5) and (+0.5).   
     output:
     3 bed files:  2 unique peak lists (classI_sample1_uniq_peaks.bed  and  classII_sample2_uniq_peaks.bed)
                       1 unbiased peak list (classIII_unbiased_peaks.bed)
back to the MAnorm mainpage
MAnorm R Version Tutorial

1. Pre-requisites:

2. Usage:

3. Example:

4. Output:

4.1 Output files created:

4.2 Temporary Output files removed after MAnorm is done (If these files are needed, please modify MAnorm.sh):

5. other useful scripts: