Below you will find pages that utilize the taxonomy term “2015”
Blog
MiClip 1.3 Installation
MiClip is a peak-calling algorithm for CLIP-seq data implemented in R. It is currently not available on CRAN, but you can obtain it from the CRAN archive and install it from the source (tar.gz) file.
Download the tar.gz file:
wget https://cran.r-project.org/src/contrib/Archive/MiClip/MiClip_1.3.tar.gz

Start R:

R

Install dependencies:

install.packages("moments")
install.packages("VGAM")

Finally, install MiClip 1.3:

install.packages("MiClip_1.3.tar.gz", repos = NULL, type = "source")

Then you can test it by loading the package and viewing its help file.
Generating 2D SVG Images of MOL Files using RDKit Transparent Background
The latest release of RDKit (2015-03) can generate SVG images with a few lines of code, but by default the generated SVG image has a white background. Digging into the sources didn't solve my problem, as I couldn't find any option for setting the background to transparent.
An example of SVG image generation can be found in the RDKit blog post called New Drawing Code.
Cell In [3] there shows the SVG image generation; it returns the SVG file content as XML.
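Since no drawing option was found, one workaround is to post-process the returned SVG text itself. A minimal sketch, assuming the white background is written as a fill:#FFFFFF style attribute (the exact style string may differ between RDKit versions):

```python
def make_background_transparent(svg_text):
    # Replace the white background fill with "none"; assumes the drawing
    # code writes the background as fill:#FFFFFF (may vary by version).
    return svg_text.replace("fill:#FFFFFF", "fill:none")

# Hypothetical background rect as RDKit might emit it:
svg = '<rect style="opacity:1.0;fill:#FFFFFF;stroke:none" width="300" height="300"/>'
print(make_background_transparent(svg))
```

Check the actual SVG output for the exact background style string before relying on this.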
Install Cairo Graphics and PyCairo on Ubuntu 14.04 / Linux Mint 17
Cairo is a 2D graphics library written in the C programming language. If you'd like to use it from Python, you should also install the Python bindings for Cairo.
This guide goes through the installation of the Cairo graphics library version 1.14.2 (the most recent) and the py2cairo Python bindings version 1.10.1 (also the most recent).
Install Cairo
It’s very easy with the following repository. Just add it, update your packages and install.
Install RDKit 2015-03 Build on Ubuntu 14.04 / Linux Mint 17
RDKit is an open source toolkit for cheminformatics. It has many functionalities to work with chemical files.
Follow the guide below to install the RDKit 2015-03 build on an Ubuntu 14.04 / Linux Mint 17 computer. Since the Ubuntu packages for trusty don't include the latest RDKit, you have to build RDKit from source.
Install Dependencies
sudo apt-get install flex bison build-essential python-numpy cmake python-dev sqlite3 libsqlite3-dev libboost1.54-all-dev

Download the Build
Generating 2D Images of Molecules from MOL Files using Open Babel
Open Babel is a tool for working with molecular data in many ways: converting one format to another, analysis, molecular modeling, etc. It also has a method to convert MOL files into SVG or PNG images to represent them in 2D.
Install Open Babel on Linux as follows, or go to their page for other operating systems:

sudo apt-get install openbabel

Open Babel uses the same command to generate SVG or PNG, recognizing the output format from the extension of the filename given to the output option -O.
Simple Way of Python's subprocess.Popen with a Timeout Option
The subprocess module in Python provides a variety of methods to start a process from a Python script. We may use these methods to run external commands/programs, collect their output and manage them. An example use might be the following:
from subprocess import Popen, PIPE

p = Popen(['ls', '-l'], stdout=PIPE, stderr=PIPE)
stdout, stderr = p.communicate()
print stdout, stderr

These lines run the ls -l command and collect the output (standard output and standard error) in the stdout and stderr variables using the communicate method of the process.
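As for the timeout option promised by the title: in modern Python (3.3+), communicate itself accepts a timeout argument, which is a simple way to get the same effect. A sketch, using sleep as a stand-in for a long-running external command:

```python
from subprocess import Popen, PIPE, TimeoutExpired

# 'sleep 5' stands in for a long-running external command
p = Popen(['sleep', '5'], stdout=PIPE, stderr=PIPE)
try:
    stdout, stderr = p.communicate(timeout=1)
except TimeoutExpired:
    p.kill()                          # stop the process after the timeout
    stdout, stderr = p.communicate()  # reap it and collect any output
    print("timed out")
```

communicate raises TimeoutExpired if the process doesn't finish in time; killing and then calling communicate again avoids leaving a zombie process behind.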
Running StarCluster Load Balancer in Background in Linux
StarCluster's loadbalance command regularly monitors the jobs in the queue and adds nodes to, or removes nodes from, a cluster created beforehand so that the queue completes efficiently.
To run it in the background without it being killed when the terminal is closed:
nohup starcluster loadbalance cluster_name > loadbalance.log 2>&1 &

or, to keep standard output and standard error logs separate:

nohup starcluster loadbalance cluster_name > loadbalance.access.log 2> loadbalance.error.log &

This will start the process and output the process ID (PID), which can be used to check or kill it.
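A quick sketch of working with that PID, using sleep as a stand-in for the loadbalance command:

```shell
# 'sleep 60' stands in for the nohup'd loadbalance command
nohup sleep 60 > loadbalance.log 2>&1 &
pid=$!                                   # $! holds the PID of the last background job
ps -p "$pid" > /dev/null && echo "running"
kill "$pid"                              # stop it when no longer needed
```

Saving $! right after launching is the reliable way to capture the PID for later checks.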
Change Apache’s Default User www-data or Home Directory /var/www/
I was getting errors when running StarCluster because it could not find the .starcluster directory in /var/www/.
This directory holds StarCluster's config file and log directories, so StarCluster can't run without it.
To solve the issue, I set my own user in Apache's envvars instead of www-data, which also changes the default home directory to mine.
Edit following file with super user permissions:
sudo nano /etc/apache2/envvars

Enter your username in the following lines and save:
Transfer Files to Your AWS S3 Storage in Linux
Uploading files to AWS S3 storage can be difficult through the GUI when many files are involved, or when your files are on a server where you don't have a GUI option. Use the following tool to transfer files to an S3 bucket.
Download following tool and install:
cd ~/Downloads
git clone https://github.com/s3tools/s3cmd.git
cd s3cmd/
sudo python setup.py install

Next, execute the following to create a configuration file to connect to your AWS S3 account:
ImportError: Reportlab Version 2.1+ is needed
There is a little bug in xhtml2pdf version 0.0.5. To fix it:

$ sudo nano /usr/local/lib/python2.7/dist-packages/xhtml2pdf/util.py

Change the following lines:
if not (reportlab.Version[0] == "2" and reportlab.Version[2] >= "1"):
    raise ImportError("Reportlab Version 2.1+ is needed!")

REPORTLAB22 = (reportlab.Version[0] == "2" and reportlab.Version[2] >= "2")

With these lines:

if not (reportlab.Version[:3] >= "2.1"):
    raise ImportError("Reportlab Version 2.1+ is needed!")

REPORTLAB22 = (reportlab.Version[:3] >= "2.1")
Django Migrations Table Already Exists Fix
Fix this issue by faking the migrations:
python manage.py migrate --fake <appname>

Taken from this SO answer.
Mezzanine BS Banners Translation with django-modeltranslation
Mezzanine BS Banners is a nice app for adding Bootstrap 3 banners/sliders to your Mezzanine projects. The Banners model in the BS Banners app has a title, and its stacked-inline Slides model has a title and content for translation.
After [installing and setting up Django/Mezzanine translations]({% post_url 2015-07-01-djangomezzanine-content-translation-for-mezzanine %}):
Create a translation.py inside your Mezzanine project or your custom theme/skin application and copy/paste the following lines:

from modeltranslation.translator import translator
from mezzanine.core.translation import TranslatedSlugged, TranslatedRichText
from mezzanine_bsbanners.
Django/Mezzanine Content Translation for Mezzanine Built-in Applications
Mezzanine comes with additional Django applications such as pages and galleries; to translate their content, Mezzanine supports django-modeltranslation integration.
Install django-modeltranslation:
pip install django-modeltranslation

Add the following to INSTALLED_APPS in settings.py:

"modeltranslation",

And the following in settings.py:

USE_MODELTRANSLATION = True

Also, move mezzanine.pages above the other Mezzanine apps in INSTALLED_APPS in settings.py, like so:

"mezzanine.pages",
"mezzanine.boot",
"mezzanine.conf",
"mezzanine.core",
"mezzanine.generic",
"mezzanine.blog",
"mezzanine.forms",
"mezzanine.galleries",
"mezzanine.twitter",
"mezzanine.accounts",
"mezzanine.mobile",

Run the following to create fields in the database tables for translations:
Convert XLS/XLSX to CSV in Bash
Libre Office is available in most modern Linux distributions, and it can be used to convert XLS or XLSX files to CSV files in bash.
For XLS file(s):
for i in *.xls; do libreoffice --headless --convert-to csv "$i"; done

For XLSX file(s):

for i in *.xlsx; do libreoffice --headless --convert-to csv "$i"; done

You may get the following warning, but it still works fine:

javaldx: Could not find a Java Runtime Environment!
Setting Up Templates and Python Scripts for Translation
Templates need following template tag:
{% load i18n %}

Then, wrapping any text with

{% trans "TEXT" %}

will make it translatable via the Rosetta Django application.
In Python scripts, you need to import following library:
from django.utils.translation import ugettext_lazy as _

Then wrapping any text with

_('TEXT')

will make it translatable.
Django Rosetta Translations for Django Applications
Make a directory called locale/ under the application directory:
cd app_name
mkdir locale

Add the folder to the LOCALE_PATHS tuple in settings.py:
LOCALE_PATHS = (
    os.path.join(PROJECT_ROOT, 'app_name', 'locale/'),
)

Run the following commands to create the PO translation file for the application:

python ../manage.py makemessages -l tr -e html,py,txt
python ../manage.py compilemessages

Option -l is for language; it should match your definition in settings.py:

LANGUAGES = (
    ('en', _('English')),
    ('tr', _('Turkish')),
    ('it', _('Italian')),
)

Repeat the last step for all languages and then go to the Rosetta URL to translate.
Django Rosetta Installation
Install SciPy:
sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose

Install pymongo and nltk:

sudo pip install pymongo
sudo pip install nltk

Install Python MySQLdb:

sudo apt-get install python-mysqldb

Install Rosetta:
sudo pip install django-rosetta

Add the following to INSTALLED_APPS in settings.py:

"rosetta",

Add the following to urls.py:

url(r'^translations/', include('rosetta.urls')),

To also allow language prefixes, change patterns to i18n_patterns in urls.py:
urlpatterns += i18n_patterns(
    ...
)
Obtaining Molecule Description using Open Babel / PyBel
Open Babel is a great tool to analyze and investigate molecular data (.MOL, .SDF files). Its Python API is particularly nice if you are already familiar with Python. In this post, I'll demonstrate how to obtain molecule descriptors such as molecular weight, HBA, HBD, logP, formula and number of chiral centers using PyBel.
Installation
$ sudo apt-get install openbabel python-openbabel

Usage for MW, HBA, HBD, logP

After reading the .MOL file, we use the calcdesc method with the descnames argument to get the descriptors.
Running Script on Cluster (StarCluster)
Start a new cluster with the configuration file you modified:
starcluster start cluster_name

Send the script to the running cluster:

starcluster put cluster_name myscr.csh /home/myscr.csh

Run it using source:

starcluster sshmaster cluster_name "source /home/myscr.csh >& /home/myscr.log"
Uploading Files to AWS using SSH/SCP
Here is a small command for uploading files to AWS through SSH's scp (secure copy) command.
scp -i path/to/your/key-pairs/file path/to/file/you/want/to/upload ubuntu@PUBLIC_DNS:path/to/the/destination
Errno 13 Permission denied Django File Uploads
Run following command to give www-data permissions to static folder and all its content:
cd path/to/your/django/project
sudo chown -R www-data:www-data static/

Do this on your production server.
Configuring Mezzanine for Apache server & mod_wsgi in AWS
Install [Mezzanine]({% post_url 2015-05-01-how-to-install-mezzanine-on-ubuntulinux-mint %}), [Apache server]({% post_url 2015-05-08-getting-started-with-your-aws-instance-and %}) and mod_wsgi:
sudo apt-get install libapache2-mod-wsgi
sudo a2enmod wsgi

Set up a MySQL database for your Mezzanine project.
Read [my post on how to set up a MySQL database for a Mezzanine project]({% post_url 2015-05-09-how-to-set-up-a-mysql-database-for-a-mezzanine %})
Collect static files:
python manage.py collectstatic

Configure your Apache server configuration for the project like the following:

WSGIPythonPath /home/ubuntu/www/mezzanine-project

<VirtualHost *:80>
    #ServerName example.com
    ServerAdmin admin@example.com
    DocumentRoot /home/ubuntu/www/mezzanine-project
    WSGIScriptAlias / /home/ubuntu/www/mezzanine-project/wsgi.
How to Set Up a MySQL Database for a Mezzanine Project
Install MySQL server and python-mysqldb package:
sudo apt-get install mysql-server
sudo apt-get install python-mysqldb

Run MySQL:

mysql -u root -p

Create a database:

mysql> create database mezzanine_project;

Confirm it:

mysql> show databases;

Exit:

mysql> exit

Configure local_settings.py:

cd path/to/your/mezzanine/project
nano local_settings.py

Like the following:

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mezzanine_project",
        "USER": "root",
        "PASSWORD": "123456",
        "HOST": "",
        "PORT": "",
    }
}

Note: replace the password with your own.
Setting Up Mezzanine Projects in AWS
Go to EC2 management console, Security groups and add a Custom TCP inbound rule with port 8000. Select “Anywhere” from the list.
Then follow [this to install Mezzanine]({% post_url 2015-05-01-how-to-install-mezzanine-on-ubuntulinux-mint %})
The tutorial above also explains setting up a site record. Mezzanine's default site record is 127.0.0.1:8000, which should be 0.0.0.0:8000 in our case. So, enter 0.0.0.0:8000 when you're asked for a site record when you run

python manage.py createdb

Also, you might still need to provide this site record while running the development server:
Getting Started with Your AWS Instance and Installing and Setting Up an Apache Server
Update and upgrade packages:
sudo apt-get update
sudo apt-get upgrade

Install Apache server:

sudo apt-get install apache2

Set up a root folder in your home folder and create an index file for testing:

mkdir ~/www
echo 'Hello, World!' > ~/www/index.html

Set up your virtual host:

sudo cp /etc/apache2/sites-available/000-default.conf /etc/apache2/sites-available/000-www.conf
sudo nano /etc/apache2/sites-available/000-www.conf

Modify DocumentRoot to point to the "www" folder in your home folder (e.g. /home/ubuntu/www).
Then add the following lines after the DocumentRoot line:
AWS Start an Instance and Connect to it
Go to EC2 management console
Create a new key-pair if necessary and download it
Launch an instance
Add HTTP security group for web applications over HTTP
Get public DNS
Change permissions on key-pair file:
chmod 400 path/to/your/file.pem

Connect:

ssh -i path/to/your/file.pem ubuntu@PUBLIC_DNS

Note: ubuntu is the user for connecting to an Ubuntu 64-bit instance; it's different for other images.
How to Get Path to or Directory of Current Script in R
Use the following code to get the path to, or the directory of, the current (running) script in R:

scr_dir <- dirname(sys.frame(1)$ofile)
scr_path <- paste(scr_dir, "script.R", sep="/")

Taken from SO.
How to Get (or Load) NCBI GEO Microarray Data into R using GEOquery Package from Bioconductor
R, especially with its many Bioconductor packages, provides nice tools to load, manage and analyze microarray data. If you are trying to load NCBI GEO data into R, use the GEOquery package. Here, I'll describe how to start with it, and in future posts I'll probably cover more.
Installation
source("http://bioconductor.org/biocLite.R")
biocLite("GEOquery")

Usage

library(GEOquery)
gds <- getGEO("GDS5072")

or

library(GEOquery)
gds <- getGEO(filename="path/to/GDS5072.soft.gz")

The getGEO function returns a complex GDS class object which contains the complete dataset.
How to Install Mezzanine on Ubuntu/Linux Mint [Complete Guide]
Mezzanine is a CMS application built on the Django web framework. The installation steps are easy, but your environment may not be suitable enough for it to work without problems. So, here I'm going to describe a complete installation from scratch in a virtual environment.
First of all, install virtualenv:
$ sudo apt-get install python-virtualenv

Then, create a virtual environment:

$ virtualenv testenv

And, activate it:

$ cd testenv
$ source bin/activate
How to Clear (or Drop) DB Table of A Django App
Let's say you created a Django app, ran python manage.py syncdb and created its table. Every time you change the table, you'll need to drop that table and run python manage.py syncdb again to update it. Here is how you drop the table of a Django app:
$ python manage.py sqlclear app_name | python manage.py dbshell

Drop the tables of an app with migrations (Django >= 1.8):

$ python manage.py migrate appname zero

Recreate all the tables:
Salmonella - Host Interaction Network - A Detailed, Better Visualization
We're almost done with the analyses, and we're making the final visualization of the network. As I previously posted, the network was clustered and visualized by time points. After that, we did several more analyses, and here I report how we visualized them. I'm going to post separately about how we did the analyses.
First, the nodes are grouped into experimental and non-experimental (PCSF nodes). This can easily be done by parsing the experimental network output and the network outputs of PCSF.
GO Enrichment of Network Clusters
In my previous post, I mentioned how I clustered the network we obtained at the end. For functional annotation, gene ontology (GO) enrichment was performed on these clusters.
There were 20 clusters; HGNC names were obtained separately for each cluster, and using the DAVID functional annotation tool API, GO and pathway annotations were collected per cluster and saved separately.
http://david.abcc.ncifcrf.gov/api.jsp?type=OFFICIAL_GENE_SYMBOL&tool=chartReport&annot=GOTERM_BP_FAT,GOTERM_CC_FAT,GOTERM_MF_FAT,BBID,BIOCARTA,KEGG_PATHWAY&ids=HGNC_NAME1,HGNC_NAME2,HGNC_NAME3,...

The URL above was used to obtain a chart report for some GO and pathway chart records.
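Assembling that URL for each cluster's gene list is a one-liner; a small sketch (the gene symbols here are hypothetical examples, not from the actual clusters):

```python
# Build the DAVID chart-report URL from a list of HGNC gene symbols.
base = ("http://david.abcc.ncifcrf.gov/api.jsp"
        "?type=OFFICIAL_GENE_SYMBOL&tool=chartReport"
        "&annot=GOTERM_BP_FAT,GOTERM_CC_FAT,GOTERM_MF_FAT,"
        "BBID,BIOCARTA,KEGG_PATHWAY&ids=")
genes = ["TP53", "AKT1", "MTOR"]        # hypothetical cluster members
url = base + ",".join(genes)            # ids are comma-separated
print(url)
```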
Network Clustering with NeAT - RNSC Algorithm
We obtained proteins at different time points from the experimental data, then found intermediate nodes (from the human interactome) using the PCSF algorithm, and finally, with a special matrix built from the network PCSF created, we validated the edges and determined edge directions using a divide-and-conquer (ILP) approach for the construction of large-scale signaling networks from PPI data. The resulting network is directed and will be used and visualized for further analyses.
Finding k-cores and Clustering Coefficient Computation with NetworkX
Assume you have a large network and you want to find the k-core of each node and also compute the clustering coefficient of each one. The Python package NetworkX comes with very nice methods to do both easily.
A k-core is a maximal subgraph whose nodes all have degree at least k [1]. To find k-cores:
Add all the edges in your network to a NetworkX graph, then use the core_number method, which takes the graph as its single input and returns node/core-number pairs.
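For intuition, here is a small stdlib-only sketch of the peeling computation behind core numbers (my illustration of the idea, not NetworkX's implementation): repeatedly remove nodes whose current degree is at most k, assigning them core number k.

```python
def core_numbers(edges):
    # Build an undirected adjacency map from an edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    core = {}
    k = 0
    while adj:
        # Peel every node whose current degree is <= k; its core number is k.
        peel = [n for n, nbrs in adj.items() if len(nbrs) <= k]
        if not peel:
            k += 1          # nothing to peel at this k, move to k + 1
            continue
        for n in peel:
            core[n] = k
            for m in adj[n]:
                adj[m].discard(n)   # removing n lowers its neighbors' degrees
            del adj[n]
    return core

# Triangle a-b-c with a pendant node d: the triangle is the 2-core.
print(core_numbers([("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]))
```

In practice you would just call networkx.core_number(G); this sketch only shows what that call computes.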
Searching Open Reading Frames (ORF) in DNA sequences - ORF Finder
Open reading frames (ORFs) are regions of DNA that are translated into protein. They lie between start and stop codons, and they are usually long.
The Python script below searches for ORFs in six frames and returns the longest one. It doesn't treat the start codon as a delimiter and only splits the sequence by stop codons, so an ORF can start with any codon but must end with a stop codon (TAG, TGA, TAA).
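The script itself is not included in this excerpt; a minimal sketch of the approach described above (scan all six frames, split on stop codons, keep the longest segment that ends at a stop):

```python
STOP = {"TAG", "TGA", "TAA"}

def revcomp(seq):
    # Reverse complement (uppercase DNA assumed).
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq))

def longest_orf(seq):
    best = ""
    for strand in (seq, revcomp(seq)):   # forward and reverse strands
        for frame in range(3):           # three frames per strand = six total
            current = ""
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if codon in STOP:
                    # Segment ends at a stop codon, as described above;
                    # keep it if it is the longest seen so far.
                    if len(current) > len(best):
                        best = current
                    current = ""
                else:
                    current += codon
    return best

print(longest_orf("ATGAAATGA"))
```

Segments that run off the end of a frame without hitting a stop codon are discarded, matching the "must end with a stop codon" rule.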
Reconstructed Salmonella Signaling Network Visualized and Colored
After fold changes were obtained and HGNC names were found for each phosphopeptide, these were used to construct the Salmonella signaling network using PCSF. Then, including the nodes PCSF found, we generated a matrix with nodes in the rows and time points in the columns, where each cell shows the presence of the corresponding protein at the corresponding time point(s).
The matrix has 658 nodes (proteins) and 4 time points, as indicated before: 2 min, 5 min, 10 min and 20 min.
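Such a presence matrix is straightforward to build from per-time-point protein sets; a small sketch (the protein names and hits here are hypothetical, not the actual data):

```python
# Build a presence matrix: rows = proteins, columns = time points,
# each cell 1 if the protein was seen at that time point, else 0.
timepoints = ["2min", "5min", "10min", "20min"]
hits = {
    "2min": {"AKT1", "MTOR"},
    "5min": {"AKT1"},
    "10min": {"GSK3B"},
    "20min": {"MTOR", "GSK3B"},
}
proteins = sorted(set().union(*hits.values()))
matrix = {p: [1 if p in hits[t] else 0 for t in timepoints] for p in proteins}
print(matrix["AKT1"])
```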
Python: Get Longest String in a List
Here is a quick Python trick you might use in your code.
Assume you have a list of strings and you want to get the longest one in the most efficient way.
>>> l = ["aaa", "bb", "c"]
>>> longest_string = max(l, key=len)
>>> longest_string
'aaa'
Python: defaultdict(list) Dictionary of Lists
Most of the time, when you need to work on large data, you'll have to use dictionaries in Python. Dictionaries of lists are very useful for storing large data in a very organized way. You can always initialize them by creating empty lists inside an empty dictionary, but when you don't know how many you'll end up with, or if you want an easier option, use defaultdict(list). You just need to import it first:
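The excerpt cuts off at the import; here is the import plus a minimal usage sketch (the keys and values are hypothetical):

```python
from collections import defaultdict

d = defaultdict(list)        # missing keys automatically start as empty lists
d["kinases"].append("AKT1")  # no need to create d["kinases"] first
d["kinases"].append("MTOR")
print(d["kinases"])
```

Accessing any missing key creates an empty list on the spot, so you never have to check for the key before appending.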
Python: extend() Append Elements of a List to a List
When you append a list to a list using the append() method, you'll see that your list gets appended as a nested list:
>>> l = ["a"]
>>> l2 = ["a", "b"]
>>> l.append(l2)
>>> l
['a', ['a', 'b']]

If you want to append the elements of the list directly, without creating nested lists, use the extend() method:

>>> l = ["a"]
>>> l2 = ["a", "b"]
>>> l.extend(l2)
>>> l
['a', 'a', 'b']
Salmonella Data Preprocessing for PCSF Algorithm
This post describes data preprocessing in Salmonella project for Prize-Collecting Steiner Forest Problem (PCSF) algorithm.
Salmonella data taken from Table S6 in Phosphoproteomic Analysis of Salmonella-Infected Cells Identifies Key Kinase Regulators and SopB-Dependent Host Phosphorylation Events by Rogers, LD et al. has been converted from its original XLS file to a tab-delimited TXT file for easy reading in Python.
The data should be separated into time-point files (2, 5, 10 and 20 minutes), each of which will contain the corresponding phosphoproteins and their fold changes.
UPGMA Algorithm Described - Unweighted Pair-Group Method with Arithmetic Mean
UPGMA, introduced by Sokal and Michener in 1958, is an agglomerative clustering algorithm that is ultrametric (it assumes a molecular clock, i.e. all lineages evolve at a constant rate).
The idea is to iterate until only one cluster remains: at each iteration, join the two nearest clusters (which become a higher-level cluster). The distance between any two clusters is calculated by averaging the distances between the elements of each cluster.
To understand better, see UPGMA worked example by Dr Richard Edwards.
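The joining rule can be sketched in a few lines; this is an illustrative stdlib implementation that records only the merge order, not the full tree or branch lengths:

```python
# UPGMA joining rule: repeatedly merge the two nearest clusters, where
# cluster distance = average pairwise distance between their members.
def upgma_merge_order(dist):
    # dist maps frozenset({label1, label2}) -> distance between single elements
    labels = sorted({x for pair in dist for x in pair})
    clusters = [(l,) for l in labels]

    def d(c1, c2):
        total = sum(dist[frozenset({a, b})] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: d(clusters[p[0]], clusters[p[1]]),
        )
        # sorted for stable, order-independent output
        merged = tuple(sorted(clusters[i] + clusters[j]))
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

For three leaves with d(a,b)=2 and d(a,c)=d(b,c)=6, a and b merge first, then the pair joins c at the averaged distance of 6.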