Below you will find pages that utilize the taxonomy term “r programming”
Blog
Convert Gene Symbols to Entrez IDs in R
Bioinformatics studies usually use gene symbols as identifiers (IDs) because they are more recognizable than other IDs such as Entrez IDs. However, certain analyses (tools) may not accept gene symbols: a gene often has more than one symbol (aliases), which makes it harder to implement a method that works with symbols. In such cases, you need to convert the symbols to another ID type, which is a very common task in bioinformatics.
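At its core, the conversion is a table lookup. Below is a minimal base-R sketch that assumes a two-column mapping table. The three symbol-to-Entrez pairs are only a toy table; in practice the mapping would come from an annotation package such as Bioconductor's org.Hs.eg.db (for example via AnnotationDbi::mapIds). The helper name symbols_to_entrez is hypothetical.

```r
# Toy mapping table (illustrative only; a real one would come from an
# annotation package such as org.Hs.eg.db)
symbol_map <- data.frame(
  symbol = c("TP53", "BRCA1", "EGFR"),
  entrez = c("7157", "672", "1956"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert gene symbols to Entrez IDs.
# Matching is exact and case-sensitive; symbols missing from the table
# become NA, so unmapped genes remain visible instead of being dropped.
symbols_to_entrez <- function(symbols, map) {
  idx <- match(symbols, map$symbol)  # row of each symbol in the table (NA if absent)
  map$entrez[idx]
}

ids <- symbols_to_entrez(c("EGFR", "TP53", "NOT_A_GENE"), symbol_map)
# ids is c("1956", "7157", NA)
```

match() returns only the first hit, so if your mapping table lists several Entrez IDs for one symbol (the alias problem described above), you will need to decide explicitly how to resolve those duplicates.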
Blog
Computing Significance of Overlap between Two Sets using Hypergeometric Test
There are many cases where we have two sets (e.g. under two different conditions) of things such as transcripts, genes or proteins, and we want to compute the significance of the overlap between them. The hypergeometric test is a very simple and widely used option for such cases.
I’ll use the phyper function in R, but the same idea works in SciPy (Python).
Let’s say you have from 200 genes (A);
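Here is a minimal sketch of the test using phyper. Only the 200 genes in A come from the post; the other numbers are hypothetical and chosen purely for illustration, and overlap_pvalue is a hypothetical helper name. The underlying idea is this: if B is a random draw of b genes from a background of N genes, the number of B genes that also fall in A follows a hypergeometric distribution. The p-value is therefore the probability of seeing at least the observed overlap.

```r
# Hypothetical helper: one-sided p-value for observing `overlap` shared genes
# between set A (a genes) and set B (b genes), both taken from a background
# (universe) of N genes.
overlap_pvalue <- function(overlap, a, b, N) {
  # phyper(q, m, n, k): m = genes in A, n = genes not in A,
  # k = genes drawn (the size of B).
  # lower.tail = FALSE gives P(X > q), so q = overlap - 1 yields P(X >= overlap).
  phyper(overlap - 1, a, N - a, b, lower.tail = FALSE)
}

# Illustrative numbers: 200 genes in A, 300 in B (hypothetical),
# 20000 background genes (hypothetical), 30 genes in both sets (hypothetical).
expected <- 200 * 300 / 20000   # overlap expected by chance alone: 3 genes
p <- overlap_pvalue(overlap = 30, a = 200, b = 300, N = 20000)
```

The choice of the background N (for example, all genes measured on the platform) has a strong effect on the p-value. The shift to overlap - 1 is needed because the upper tail of phyper is strict. A one-sided fisher.test(..., alternative = "greater") on the corresponding 2x2 table gives the same p-value.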
Blog
Mann-Whitney U Test (Wilcoxon Rank-Sum Test) JavaScript Implementation
JavaScript currently offers far fewer statistical methods than Python (SciPy) and R. There are several efforts to fill this gap, most notably jStat. However, many functions, distributions and tests are still missing from this library. In one of my projects, I had to implement a JavaScript version of the Mann-Whitney U test (also called the Wilcoxon rank-sum test). Here, I’m giving a link to its source code and describing how it works.
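The JavaScript source itself is linked from the post. To keep this page in one language, here is a sketch of the same test in R. It uses the normal approximation with tie and continuity corrections, the same quantities that R's wilcox.test(x, y, exact = FALSE) reports, which makes it a handy reference when checking a port to another language. mann_whitney_u is a hypothetical name, and the sketch omits the exact small-sample distribution.

```r
# Hypothetical helper: two-sided Mann-Whitney U test (normal approximation
# with tie correction and continuity correction).
mann_whitney_u <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  r <- rank(c(x, y))                      # pooled ranks; ties get midranks
  u1 <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2   # U statistic for x

  # Standard deviation of U under H0, corrected for tied groups
  ties <- as.vector(table(r))
  sigma <- sqrt((n1 * n2 / 12) *
                ((n1 + n2 + 1) - sum(ties^3 - ties) / ((n1 + n2) * (n1 + n2 - 1))))

  z <- u1 - n1 * n2 / 2                   # distance from the mean of U
  z <- (z - sign(z) * 0.5) / sigma        # continuity correction, then standardize
  p <- 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))
  list(U = u1, p.value = p)
}

x <- c(1.1, 2.3, 2.3, 4.0, 5.2)
y <- c(0.5, 1.9, 2.3, 3.1)
res <- mann_whitney_u(x, y)
```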
Blog
MiClip 1.3 Installation
MiClip is a CLIP-seq peak-calling algorithm implemented in R. It is currently not listed on CRAN, but you can obtain it from the CRAN archive and install it from the source tar.gz file.
Download the tar.gz file:
wget https://cran.r-project.org/src/contrib/Archive/MiClip/MiClip_1.3.tar.gz
Start R:
R
Install dependencies:
install.packages("moments")
install.packages("VGAM")
Finally, install MiClip 1.3:
install.packages("MiClip_1.3.tar.gz", repos = NULL, type = "source")
Then you can test it by loading the package and viewing its help file.
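To check the installation from a script rather than by eye, you could use a small sketch like the one below, which reports the installed version. installed_version is a hypothetical helper; it returns NA when the package is not installed.

```r
# Hypothetical helper: installed version of a package as a string, or NA
installed_version <- function(pkg) {
  # requireNamespace() returns FALSE (without an error) if pkg is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) return(NA_character_)
  as.character(utils::packageVersion(pkg))
}

installed_version("MiClip")  # expected "1.3" after the steps above, NA otherwise
```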
Blog
How to Get Path to or Directory of Current Script in R
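Use the following code to get the path to, or the directory of, the current (running) script in R. It only works when the script is run with source() called from the top level, because ofile is a variable inside source()'s own frame:
scr_dir <- dirname(sys.frame(1)$ofile)      # directory of the sourced script
scr_path <- file.path(scr_dir, "script.R")  # full path, assuming the file is named script.R
Taken from Stack Overflow (SO).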
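The sys.frame(1)$ofile trick does not help under Rscript, because the script is not run through source() in that case. The sketch below covers both situations. It assumes the usual behavior of Rscript, which passes the script to R as a --file= argument that is visible through commandArgs(trailingOnly = FALSE). get_script_path is a hypothetical name, and the source() branch still depends on the undocumented ofile variable.

```r
# Hypothetical helper: path of the running script, or NA if it cannot be found.
get_script_path <- function() {
  # Case 1: `Rscript script.R` starts R with --file=script.R
  args <- commandArgs(trailingOnly = FALSE)
  file_arg <- grep("^--file=", args, value = TRUE)
  if (length(file_arg) > 0) {
    return(normalizePath(sub("^--file=", "", file_arg[1])))
  }
  # Case 2: source("script.R") keeps the file name in a local variable
  # `ofile` inside source()'s frame; search the call stack for it
  for (i in rev(seq_len(sys.nframe()))) {
    f <- sys.frame(i)$ofile
    if (is.character(f)) return(normalizePath(f))
  }
  NA_character_  # e.g. an interactive session with no script file
}

script_path <- get_script_path()
```

To get the directory, call dirname(script_path) when the result is not NA. When an Rscript-launched script sources another file, the first branch still returns the outer script, so the source() branch would have to be checked first in that setting.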
Blog
How to Get (or Load) NCBI GEO Microarray Data into R using GEOquery Package from Bioconductor
R, especially with its many Bioconductor packages, provides good tools to load, manage and analyze microarray data. If you want to load NCBI GEO data into R, use the GEOquery package. Here, I’ll describe how to get started with it, and I will probably cover more in future posts.
Installation
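if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("GEOquery")
Older Bioconductor releases used source("http://bioconductor.org/biocLite.R") followed by biocLite("GEOquery"); biocLite has since been replaced by BiocManager.
Usage
library(GEOquery)
gds <- getGEO("GDS5072")  # downloads GDS5072 from GEO
or, to read a previously downloaded SOFT file:
library(GEOquery)
gds <- getGEO(filename = "path/to/GDS5072.soft.gz")
The getGEO function returns a GDS object, a complex class that contains the complete dataset.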
Blog
Plotting Expression Curves for Experimental Data
Now that I can plot expression curves for in silico data, I have moved on to experimental data, which is larger and more complex. This data comes from RPPA experiments on different breast cancer cell lines and includes protein abundance measurements for about 45 phosphoproteins. The cells are treated with different inhibitors and stimuli, and by comparing the phosphoproteins’ expression across these conditions, I will try to infer relations between them.
Before moving on to the inference part, I want a script that plots the curves so that I can inspect the results for specific cases.
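Here is a minimal sketch of such a plotting script in base R. It assumes a long-format data frame for one phosphoprotein in one cell line, with hypothetical columns time, treatment and value. The toy data is simulated and does not come from the RPPA measurements, and plot_expression_curves is a hypothetical name.

```r
# Hypothetical helper: plot mean abundance over time, one curve per treatment.
plot_expression_curves <- function(d, title = "") {
  # rows = time points, columns = treatments, cells = mean over replicates
  m <- tapply(d$value, list(d$time, d$treatment), mean)
  times <- as.numeric(rownames(m))
  cols <- seq_len(ncol(m))
  matplot(times, m, type = "b", lty = 1, pch = cols, col = cols,
          xlab = "Time", ylab = "Abundance", main = title)
  legend("topright", legend = colnames(m), lty = 1, pch = cols, col = cols)
  invisible(m)  # return the summarized curves for further checks
}

# Simulated toy data: 5 time points x 2 treatments x 3 replicates
set.seed(1)
toy <- expand.grid(time = c(0, 5, 15, 30, 60),
                   treatment = c("no_inhibitor", "inhibitor_X"),
                   replicate = 1:3, stringsAsFactors = FALSE)
toy$value <- rnorm(nrow(toy), mean = 1, sd = 0.1)
m <- plot_expression_curves(toy, title = "Toy phosphoprotein")
```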
Blog
Experimental Data Optimization for Network Inference
As I mentioned in my previous post, the experimental data from the challenge has missing values that cause problems during analysis. To solve this, the first thing I did was to optimize the data: detecting missing conditions, putting NAs in place of their values, and sorting the data where necessary.
I wrote two functions in the script. The first one ranks the data according to a fixed order and sorts it based on these ranks.
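Here is a sketch of the "detect missing conditions, fill NAs, sort" step. It assumes a data frame with hypothetical columns inhibitor, stimulus, time and value, and treats a condition as one inhibitor x stimulus x time combination. complete_conditions is a hypothetical name, and the toy data only illustrates the behavior.

```r
# Hypothetical helper: ensure every inhibitor x stimulus x time condition is
# present (missing ones get value = NA), then sort rows in a fixed order.
complete_conditions <- function(d, inhibitors, stimuli, times) {
  grid <- expand.grid(inhibitor = inhibitors, stimulus = stimuli, time = times,
                      stringsAsFactors = FALSE)
  # Left join: keep every grid row; conditions absent from d get NA values
  full <- merge(grid, d, by = c("inhibitor", "stimulus", "time"), all.x = TRUE)
  # Rank rows by the given order of inhibitors and stimuli, then by time
  ord <- order(match(full$inhibitor, inhibitors),
               match(full$stimulus, stimuli),
               full$time)
  full <- full[ord, ]
  rownames(full) <- NULL
  full
}

# Toy example: 2 inhibitors x 2 stimuli x 3 time points, two conditions missing
inh <- c("none", "inhibitor_A")
sti <- c("EGF", "Insulin")
tms <- c(0, 30, 60)
all_rows <- expand.grid(inhibitor = inh, stimulus = sti, time = tms,
                        stringsAsFactors = FALSE)
all_rows$value <- seq_len(nrow(all_rows))
observed <- all_rows[-c(2, 7), ]
res <- complete_conditions(observed, inh, sti, tms)
```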
Blog
Some String Functions in R, String Manipulation in R
I have programmed in Perl, Python, and PHP before, and string manipulation was more direct and easier in them than in R. Still, R has useful functions for string manipulation. I’m not an R expert, but I’ve been using it for a while and have learned some good functions for this purpose.
Concatenate strings
Concatenation is done with the paste function. It takes the strings to concatenate as comma-separated arguments, plus a separator string (sep), which defaults to a single space.
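A few illustrative calls of the base R functions paste and paste0 show the sep and collapse arguments:

```r
# paste() joins its arguments with `sep` (a single space by default)
s1 <- paste("gene", "TP53")                  # "gene TP53"
s2 <- paste("chr1", 12345, sep = ":")        # "chr1:12345" (numbers become text)
# paste0() is paste() with sep = ""; it is vectorized over its arguments
s3 <- paste0("sample_", 1:3)                 # "sample_1" "sample_2" "sample_3"
# `collapse` joins the elements of a vector into a single string
s4 <- paste(c("A", "B", "C"), collapse = ", ")  # "A, B, C"
```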
Blog
Network Inference DREAM Breast Cancer Challenge
A causal edge is inferred from the change observed in one node after an intervention on another node. If the curves obtained over time (with and without the intervention) overlap, there is no relation. Otherwise, we can draw an edge between those nodes, and depending on whether the level goes up or down, the edge is activating or inhibiting. These causal edges are context-specific, so data from different cell lines may show different relations.
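The rule above can be sketched as a small decision function. The specifics here are assumptions, not the challenge's scoring method: the intervention is taken to be an inhibition of the source node, "overlap" means the mean difference between the curves is below a hypothetical threshold, and a target that goes down under inhibition is read as an activating edge. infer_edge is a hypothetical name.

```r
# Hypothetical helper: edge from node A to node B, comparing B's time course
# without (control) and with (intervened) an intervention on A.
# Assumptions: the intervention inhibits A; curves "overlap" when the mean
# difference is below `threshold`; B decreasing under inhibition of A means
# A activates B.
infer_edge <- function(control, intervened, threshold = 0.1) {
  stopifnot(length(control) == length(intervened))
  shift <- mean(intervened - control)     # average change across time points
  if (abs(shift) < threshold) return(0L)  # curves overlap: no edge
  if (shift < 0) 1L else -1L              # 1 = activating, -1 = inhibiting
}

e0 <- infer_edge(c(1.0, 1.2, 1.5), c(1.0, 1.21, 1.49))  # curves overlap
e1 <- infer_edge(c(1.0, 1.2, 1.5), c(0.6, 0.7, 0.8))    # B drops
e2 <- infer_edge(c(1.0, 1.2, 1.5), c(1.6, 1.9, 2.2))    # B rises
```

A signed mean can cancel out when the two curves cross, so a real analysis would likely compare time points individually or use a statistical test instead of a single threshold.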