Güngör Budak's Blog

Bioinformatics, web programming, coding in general

If clean URLs don't work in Laravel 4 on Ubuntu 12.04 LTS

Your .htaccess directives are correct and mod_rewrite is enabled, but you are still getting 404 Not Found errors…

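You need to change AllowOverride None to AllowOverride All in /etc/apache2/sites-available/default. With AllowOverride None, Apache ignores .htaccess files entirely. Restart Apache afterwards (sudo service apache2 restart) for the change to take effect.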

Modified section in the file:

<Directory /home/user/www/>
    Options Indexes FollowSymLinks MultiViews
    AllowOverride All
    Order allow,deny
    allow from all
</Directory>

A Nice File Browser for Geany 1.23 on Ubuntu 12.04 LTS

If you’re looking for a file browser for Geany, check out the TreeBrowser plugin on its page (see the page for screenshots).

To install and enable it, run the following in Terminal:

sudo apt-get install geany-plugin-treebrowser

Then go to “Tools” -> “Plugin Manager” and check “TreeBrowser”.


Base URL for Your Laravel 4 Website

To get the base URL of your website for generating links to your content or assets, do the following:

Set the 'url' option in app/config/app.php to your base URL:

'url' => 'http://localhost/example',

Use it everywhere with URL::to(), for example:

echo URL::to('assets/css/general.css');
/* outputs http://localhost/example/assets/css/general.css */

Remove public from URL in Laravel 4

Move all the contents of the public/ folder one level up (to the project root).

Fix paths in index.php:

require __DIR__.'/bootstrap/autoload.php';
$app = require_once __DIR__.'/bootstrap/start.php';

Fix path in bootstrap/paths.php:

'public' => __DIR__.'/..',



Last Submissions to the Challenge

Today, I submitted network inference results for both the in silico and the experimental data on Synapse, in time for the next leaderboard this Wednesday.

For the experimental part, I had to exclude edges involving FGFR1 and FGFR3, because the data lack phosphorylated forms of these proteins and the networks must be built only from the phosphoproteins in the data.
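A minimal Python sketch of this filtering step, assuming each edge is a `(parent, relation, child, score)` tuple (a hypothetical representation, not necessarily how the script stores edges). An edge is kept only if both endpoints are among the measured phosphoproteins, which drops anything touching FGFR1 or FGFR3:

```python
def keep_measured_edges(edges, measured):
    """Keep only edges whose parent and child are both measured phosphoproteins."""
    measured = set(measured)
    return [edge for edge in edges if edge[0] in measured and edge[2] in measured]


if __name__ == "__main__":
    measured = ["AKT_pS473", "AKT_pT308"]
    # Toy edges (parent, relation, child, score); the scores are made up.
    edges = [
        ("AKT_pS473", -1, "AKT_pT308", 0.8),
        ("FGFR1", -1, "AKT_pS473", 0.6),
        ("AKT_pT308", 1, "FGFR3", 0.4),
    ]
    print(keep_measured_edges(edges, measured))
```

Only the first toy edge survives, since the other two touch FGFR1 or FGFR3.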

Since the in silico part was updated, I had to modify the script and resubmit those results. I will see the score for this part this Wednesday as well.

Also, I added the scripts to a public GitHub repo called “netinf-bigcat”.

Network Visualization Using Cytoscape

Cytoscape is a nice tool for visualizing networks, which helps both in understanding and in presenting them. I used it to visualize the in silico data networks and the results were really pretty. Now I have networks constructed from the experimental data of the HPN-DREAM Challenge.

In this post, I want to demonstrate how to visualize a network with scores. I’m using Cytoscape 2.8 on Ubuntu 12.

First, the network will be read from a SIF file, which is Cytoscape's default network format. To include edge scores, however, this SIF file differs from a standard one: instead of the usual source, interaction type and target, it has four columns: parent node, relation, child node and score. The columns are named on the first line so that we don't confuse them during visualization. Below, you can find an example.

Network visualization with Cytoscape 1
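Because of the header line and the score column, the file is really a tab-delimited table, which is why it is imported with “Network from Table” rather than as a plain SIF network. As a sketch, assuming hypothetical column names `Parent`, `Relation`, `Child` and `Score` and made-up toy edges, such a file could be written from Python like this:

```python
import csv
import io

# Hypothetical column names; any names work as long as they are on the first line.
HEADER = ["Parent", "Relation", "Child", "Score"]


def write_scored_network(edges, handle):
    """Write (parent, relation, child, score) rows as a tab-delimited table with a header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for parent, relation, child, score in edges:
        writer.writerow([parent, relation, child, score])


if __name__ == "__main__":
    # Toy edges for illustration only: relation -1 = inhibition, 1 = activation.
    edges = [("NodeA", -1, "NodeB", 0.91), ("NodeB", 1, "NodeC", 0.47)]
    buf = io.StringIO()
    write_scored_network(edges, buf)
    print(buf.getvalue(), end="")
```

To write a real file, pass `open("network.sif", "w", newline="")` instead of the `StringIO` buffer (the file name is only an example).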

Next, clicking on “File” -> “Import” -> “Network from Table (Text/MS Excel)…” opens a window for reading the data. In this window, we show the additional options by clicking on “Show Text File Import Options” and then check “Transfer first line as attribute names…”. This option lets us use the column names in the network. Next, we define the source (parent), interaction (relation) and target (child) columns, and also click on the “Score” column to import it as an edge attribute. After all this, the selections should look like this:

Network visualization with Cytoscape 2

Then, we click on “Import”.

Network visualization with Cytoscape 3

To get a better look at the imported network, we can apply the “Force-directed layout”. To change the visualization, we go to the VizMapper tab of the Control Panel on the left and select the mappings we want to change from the list. Every change is shown immediately in the network view next to it.

In VizMapper, for each mapping we define the attribute and the mapping type. The attributes are the columns we provided in the SIF file. There are three mapping types: continuous, passthrough and discrete mappers. The continuous mapper is used for numbers and gives a gradient-like look to the mapping; in the example below, edge width is determined by the edge score using this option. The passthrough mapper uses each row's attribute value directly, for example edge scores as edge labels and node names as node labels. With the discrete mapper, we choose a different option for each distinct value. For example, there are only two relation types, and I want the “inhibition” (-1) relation to have a red edge with a T-shaped end and the “activating” (1) relation to have a green edge with an arrow-shaped end.

Network visualization with Cytoscape 4
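The three mapping types can be illustrated with plain Python functions. This is only a conceptual sketch of what each mapper does, not Cytoscape's API; the width range (1 to 8) and the style dictionaries are assumptions:

```python
def continuous_width(score, lo, hi, min_width=1.0, max_width=8.0):
    """Continuous mapping: linearly interpolate edge width over the score range [lo, hi]."""
    if hi == lo:
        return min_width
    t = (score - lo) / (hi - lo)
    return min_width + t * (max_width - min_width)


def passthrough_label(value):
    """Passthrough mapping: the attribute value itself becomes the label."""
    return str(value)


# Discrete mapping: each relation value gets its own color and edge-end shape.
RELATION_STYLE = {
    -1: {"color": "red", "target_arrow": "T"},
    1: {"color": "green", "target_arrow": "arrow"},
}


def discrete_style(relation):
    return RELATION_STYLE[relation]


if __name__ == "__main__":
    scores = [0.2, 0.5, 0.8]
    lo, hi = min(scores), max(scores)
    for s in scores:
        print(s, continuous_width(s, lo, hi), passthrough_label(s))
    print(discrete_style(-1), discrete_style(1))
```

The lowest score gets the thinnest edge and the highest the thickest, which is the gradient-like look of the continuous mapper.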

We also change the “Defaults” in the Control Panel for node color, font size and background color.

Network visualization with Cytoscape 5

Below, you can see the result for this network.

Network visualization with Cytoscape 6

“Cytoscape is an open source software platform for visualizing complex networks and integrating these with any type of attribute data.” Find out more on their website.

Plotting Expression Curves for Experimental Data

Since I can now plot expression curves for the in silico data, I moved on to the experimental data, which is larger and more complex. This data comes from RPPA experiments on different breast cancer cell lines and includes protein abundance measurements for about 45 phosphoproteins. These phosphoproteins are measured under treatment with different inhibitors and stimuli, and by comparing their expression, I will try to infer the relations between them.

Before moving on to the inference part, I wanted a script that plots the graphs so that I can look at the results for specific cases. For that, I used the same script as in the in silico part, but I had to modify it a lot to handle the experimental data. I mentioned some of my progress on this in the previous post. Now I have completed it: the script orders the data according to a predefined fashion, handles duplicates (by taking their average), puts NAs for missing data points, and estimates missing points that fall in the middle of a series (by averaging the neighboring points). I do this estimation because such gaps are few and most series are missing only one point. This way, I'm able to keep more edges in the networks.
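The duplicate handling and gap filling described above could look roughly like this in Python. The post's script is in R; the function names and the `(condition, time, value)` row layout here are hypothetical, and `None` stands in for NA:

```python
from collections import defaultdict


def average_duplicates(rows):
    """Collapse duplicate (condition, time) rows by averaging their values."""
    groups = defaultdict(list)
    for condition, time, value in rows:
        groups[(condition, time)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def fill_interior_gaps(series):
    """Replace a None that has observed neighbours on both sides with their mean.

    Leading/trailing gaps and runs of two or more consecutive gaps stay None,
    because neighbours are read from the original series, not the filled one.
    """
    filled = list(series)
    for i in range(1, len(series) - 1):
        left, right = series[i - 1], series[i + 1]
        if series[i] is None and left is not None and right is not None:
            filled[i] = (left + right) / 2
    return filled


if __name__ == "__main__":
    print(average_duplicates([("DMSO", 5, 1.0), ("DMSO", 5, 3.0)]))
    print(fill_interior_gaps([1.0, None, 3.0, None, None, 6.0]))
```

Only single-point gaps are filled, matching the observation that most series miss just one point.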

Then the network is inferred and the edges are scored. Since the second and third inhibitors each act on two proteins, before reporting the results the script splits every edge from those inhibitors into one edge per target, keeping the same score and relation. If an edge produced by the split already exists, the one with the higher score is written to the result.
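A sketch of this split-and-merge step, assuming edges are `(parent, relation, child, score)` tuples and that two-target inhibitors appear under hypothetical combined labels such as `AKT_MEK`:

```python
# Hypothetical labels for the two-target inhibitors and the proteins they act on.
MULTI_TARGETS = {
    "AKT_MEK": ["AKT", "MEK"],
    "FGFR1_FGFR3": ["FGFR1", "FGFR3"],
}


def split_and_merge(edges):
    """Split two-target edges into one edge per target; on a collision keep the higher score.

    Returns a dict mapping (parent, child) -> (relation, score).
    """
    best = {}
    for parent, relation, child, score in edges:
        for target in MULTI_TARGETS.get(parent, [parent]):
            key = (target, child)
            if key not in best or score > best[key][1]:
                best[key] = (relation, score)
    return best


if __name__ == "__main__":
    edges = [("AKT", -1, "X", 0.4), ("AKT_MEK", 1, "X", 0.7)]
    print(split_and_merge(edges))
```

Here the split creates a second AKT to X edge with score 0.7, which replaces the existing one with score 0.4.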

In this inference, I consider each inhibitor with its target(s) and construct relations between its target(s) and the phosphoproteins in the files. These targets do appear in the data, but since they are given in unphosphorylated form, they do not exactly match the names in the data. For example, “GSK690693” is given as an “AKT inhibitor”, so when I use it for inferring networks, I find relations between “AKT” and other phosphoproteins. “AKT” is present in the data in several phosphorylated forms, such as “AKT_pS473” and “AKT_pT308”. So I end up with, for example, a relation between “AKT” and “AKT_pS473”. This is not what I want, nor what the challenge asks for, because the network should contain only phosphoproteins from the files. One possible fix is to duplicate each edge involving “AKT” once for every form of AKT present in the file, which means assuming that “GSK690693” inhibits all phosphorylated forms of AKT. The same applies to “MEK”. I haven’t seen the other inhibited proteins, “FGFR1” and “FGFR3”, in the data, which is strange. I will look into it later.
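The edge duplication idea could be sketched as follows. Matching a target to its phosphorylated forms by the name prefix `TARGET_` is an assumption: it works for “AKT_pS473” and “AKT_pT308”, but a name whose prefix differs from the target (for example a numbered isoform) would need an explicit mapping. Targets with no matching form, such as FGFR1 and FGFR3 here, simply produce no edges:

```python
def expand_to_phosphoforms(edges, antibodies):
    """Replace an unphosphorylated parent with every measured form of it.

    Naive rule (assumption): antibody A is a form of target T if A starts with T + "_".
    Edges whose parent has no matching antibody are dropped.
    """
    expanded = []
    for parent, relation, child, score in edges:
        forms = [a for a in antibodies if a.startswith(parent + "_")]
        for form in forms:
            expanded.append((form, relation, child, score))
    return expanded


if __name__ == "__main__":
    antibodies = ["AKT_pS473", "AKT_pT308", "NodeB"]  # "NodeB" is a toy name
    print(expand_to_phosphoforms([("AKT", -1, "NodeB", 0.6)], antibodies))
```

Each AKT edge becomes one edge per AKT form, all with the same relation and score, which encodes the assumption that the inhibitor acts on every form.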

Experimental AKT plot

Above is a plot from the experimental data. The cell line is “MCF7” and the stimulus is “Serum”. The antibody being investigated is “AKT_pT308”. The intervention condition is “Serum__GSK690693”, which means I’m investigating the effect of “AKT” on its own phosphorylated forms. This plot shows a confident inhibitory relation: under inhibition, the expression of AKT_pT308 increases, so we can conclude that the “AKT” forms inhibit themselves.

The script was run with these parameters:

bigcat@bigcat-shut-01:~/gungor/netinf$ ./netInfPlotter_experimental.R MCF7 AKT_pT308 Serum Serum__GSK690693
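The reasoning behind this conclusion (blocking AKT raises AKT_pT308, so AKT is read as inhibiting it) can be written as a toy sign rule. This is only an illustration of that logic, with no statistics, and not the scoring used by the script; the function name and inputs are hypothetical:

```python
def relation_from_inhibition(control, inhibited):
    """Toy sign rule comparing mean expression without and with an inhibitor.

    If blocking X raises Y, X is read as inhibiting Y (-1);
    if blocking X lowers Y, X is read as activating Y (1); equal means give 0.
    A real edge score would need a statistical test, not just a sign.
    """
    diff = sum(inhibited) / len(inhibited) - sum(control) / len(control)
    if diff > 0:
        return -1
    if diff < 0:
        return 1
    return 0


if __name__ == "__main__":
    # Made-up series: expression rises when the inhibitor is present.
    print(relation_from_inhibition([1.0, 1.1, 0.9], [1.5, 1.8, 2.0]))
```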

There is not much more I can do, because I’m leaving in two weeks and the challenge will end soon. In the following days, I will try to learn more about statistics and improve that part of the network inference. I will also try to visualize the networks in Cytoscape and write another post about it.

Experimental Data Optimization for Network Inference

As I mentioned in my previous post, the experimental data from the challenge has missing values that create problems during the analyses. To solve this, the first thing I did was to optimize the data: detecting missing conditions, putting NAs as their data values, and sorting the rows if necessary.

I wrote two functions in the script. The first one ranks the data according to the fashion and sorts it by these ranks. The second one takes the sorted list, looks for missing data values, and puts NAs in their place.
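A minimal Python sketch of these two functions, assuming rows are `(condition, value)` pairs for one time point and `None` plays the role of NA (the function names are hypothetical):

```python
def rank_and_sort(rows, fashion):
    """Sort (condition, value) rows by the position of their condition in `fashion`.

    Rows whose condition is not in the fashion are dropped (an assumption).
    """
    rank = {cond: i for i, cond in enumerate(fashion)}
    kept = [row for row in rows if row[0] in rank]
    return sorted(kept, key=lambda row: rank[row[0]])


def fill_missing(sorted_rows, fashion):
    """Return one (condition, value) per fashion entry, with None (NA) where absent.

    Duplicated conditions must be averaged beforehand: dict() keeps only the last value.
    """
    present = dict(sorted_rows)
    return [(cond, present.get(cond)) for cond in fashion]


if __name__ == "__main__":
    fashion = ["IGF1__DMSO", "IGF1__GSK690693", "PBS__DMSO"]
    rows = [("PBS__DMSO", 0.7), ("IGF1__DMSO", 1.2)]
    print(fill_missing(rank_and_sort(rows, fashion), fashion))
```

The docstring note on duplicates is the same limitation that comes up with the MCF7 data below.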

BT20 missing data

Above is the list of conditions that represents the “fashion” of the data. It is also the output of the functions mentioned above, and for this time point there are 3 missing data values. Actually, “NRG1__GSK690693_GSK1120212” and “PBS__GSK690693_GSK1120212” are missing for all time points, so I won’t be able to make any inference using these conditions. “IGF1__GSK690693_GSK1120212” is the only other missing value here. There are some other missing values at different time points, and all of these can be estimated.

The first approach I will use for the estimation is to calculate the regression line for the series and read off the value at the missing time point. There are several other approaches, which I will mention later.
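The regression approach can be sketched with ordinary least squares on the observed points of one series. The function name is hypothetical and `None` marks the missing value:

```python
def estimate_by_regression(times, values, missing_time):
    """Fit y = a + b*t by ordinary least squares on the observed points,
    then predict y at `missing_time`."""
    pairs = [(t, v) for t, v in zip(times, values) if v is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two observed points")
    mean_t = sum(t for t, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pairs)
    if sxx == 0:
        raise ValueError("observed time points must not all be equal")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in pairs)
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return intercept + slope * missing_time


if __name__ == "__main__":
    times = [5, 15, 30, 60]
    values = [10.0, 30.0, None, 120.0]  # made-up values lying on y = 2t
    print(estimate_by_regression(times, values, 30))
```

A single straight line over the whole series is a strong simplification when the curve rises and falls, which is one reason to compare it with other approaches later.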

But before moving on to the estimation part, I want to be sure that this optimization approach works for all cell lines. I applied it to the MCF7 cell line and it didn’t work as expected, because in the MCF7 data the DMSO treatments (no inhibitors) have duplicates, which this version of the script cannot handle. The same is true for the UACC812 and BT549 cell lines.

I will modify the script to work with these and then move to estimation part.

Working with Experimental Data from Network Inference Challenge

As I have almost finished with the in silico data, I moved on to analyzing the experimental data with the same script. But since the experimental data differ in some respects, I need to modify the script to read the experimental data files before inferring networks.

These differences include missing data values for some conditions. This makes the analyses difficult, because I have to estimate values for them, which will lower the confidence scores of the affected edges.

I’m starting with the BT20 (breast cancer) cell line data. It has 234 rows of conditions, and some of them (the zero time points) are duplicates. The zero time points have no stimulus treatment, so for now I will exclude them and start with 5 minutes, the second time point. This leaves 6 time points (5, 15, 30, 60, 120 and 240 minutes).

The stimuli are FGF1, Insulin, EGF, IGF1, HGF, Serum, NRG1 and PBS, and the inhibitors target AKT, AKT & MEK, and FGFR1 & FGFR3. There are 48 phosphoproteins in the BT20 cell line data. With 8 stimuli, 4 inhibitor conditions (no inhibitor plus the three inhibitors) and 6 time points, this makes 8 × 4 × 6 = 192 condition and time point combinations, and 192 × 48 = 9216 data values.

In R, I defined a fashion (the expected order of conditions) for the data, and I check whether the data follow this expected pattern. If not, I create an empty data value for that condition and time point, and later I’ll try to estimate values for them. I also realized that the rows for the 240-minute time point do not follow the expected pattern, so now I have to fix this and then start the estimation.
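The expected fashion for BT20 can be generated by crossing time points, stimuli and inhibitor conditions, which also reproduces the 192 and 9216 counts. The inhibitor labels follow condition names seen in the data (for example “NRG1__GSK690693_GSK1120212”), except `FGFRi`, a placeholder because the FGFR1 & FGFR3 inhibitor is not named here; the ordering (time point first) is also an assumption:

```python
from itertools import product

STIMULI = ["FGF1", "Insulin", "EGF", "IGF1", "HGF", "Serum", "NRG1", "PBS"]
# No inhibitor (DMSO), AKT, AKT & MEK, and a placeholder label for FGFR1 & FGFR3.
INHIBITORS = ["DMSO", "GSK690693", "GSK690693_GSK1120212", "FGFRi"]
TIME_POINTS = [5, 15, 30, 60, 120, 240]  # minutes; the zero time point is excluded
N_PROTEINS = 48  # phosphoproteins in the BT20 data


def expected_fashion():
    """All (condition, time) pairs in a fixed order; condition is 'STIMULUS__INHIBITOR'."""
    return [(f"{s}__{i}", t) for t, s, i in product(TIME_POINTS, STIMULI, INHIBITORS)]


if __name__ == "__main__":
    fashion = expected_fashion()
    print(len(fashion), len(fashion) * N_PROTEINS)  # prints: 192 9216
```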

After these estimations, I will have a complete dataset, as in the in silico part, and will use the same scripts for plotting the expression profiles and for inferring and scoring networks.