Tag: customization
Blog
How to Make a Customizable Google Custom Search Engine Box
Google Custom Search Engine (CSE) is a great service when you need to add search functionality to a website without spending too much time on it. However, the default way of implementing it restricts us to a certain search box design that might not blend well with our website. In this post, I’ll show the workaround I found and actually implemented on this very blog.
To see the solution in action, feel free to search anything on the search box at the top of the right sidebar.
Tag: google custom search
Blog
How to Make a Customizable Google Custom Search Engine Box
Google Custom Search Engine (CSE) is a great service when you need to add search functionality to a website without spending too much time on it. However, the default way of implementing it restricts us to a certain search box design that might not blend well with our website. In this post, I’ll show the workaround I found and actually implemented on this very blog.
To see the solution in action, feel free to search anything on the search box at the top of the right sidebar.
Tag: google custom search engine
Blog
How to Make a Customizable Google Custom Search Engine Box
Google Custom Search Engine (CSE) is a great service when you need to add search functionality to a website without spending too much time on it. However, the default way of implementing it restricts us to a certain search box design that might not blend well with our website. In this post, I’ll show the workaround I found and actually implemented on this very blog.
To see the solution in action, feel free to search anything on the search box at the top of the right sidebar.
Tag: search box
Blog
How to Make a Customizable Google Custom Search Engine Box
Google Custom Search Engine (CSE) is a great service when you need to add search functionality to a website without spending too much time on it. However, the default way of implementing it restricts us to a certain search box design that might not blend well with our website. In this post, I’ll show the workaround I found and actually implemented on this very blog.
To see the solution in action, feel free to search anything on the search box at the top of the right sidebar.
Blog
Bootstrap 4 Search Box with Search Icon
Bootstrap 4 is a very handy library for quickly building user interfaces for web pages and web applications. A search box is a fundamental UI element on any page that serves content, and in this post I’ll describe some styles that make a nice text input for a search box.
To accomplish this, I’ll make use of Bootstrap 3’s default form-validation markup, which was removed in Bootstrap 4 because it no longer supports font icons.
Tag: bootstrap 4
Blog
Bootstrap 4 Search Box with Search Icon
Bootstrap 4 is a very handy library for quickly building user interfaces for web pages and web applications. A search box is a fundamental UI element on any page that serves content, and in this post I’ll describe some styles that make a nice text input for a search box.
To accomplish this, I’ll make use of Bootstrap 3’s default form-validation markup, which was removed in Bootstrap 4 because it no longer supports font icons.
Tag: css
Blog
Bootstrap 4 Search Box with Search Icon
Bootstrap 4 is a very handy library for quickly building user interfaces for web pages and web applications. A search box is a fundamental UI element on any page that serves content, and in this post I’ll describe some styles that make a nice text input for a search box.
To accomplish this, I’ll make use of Bootstrap 3’s default form-validation markup, which was removed in Bootstrap 4 because it no longer supports font icons.
Tag: search icon
Blog
Bootstrap 4 Search Box with Search Icon
Bootstrap 4 is a very handy library for quickly building user interfaces for web pages and web applications. A search box is a fundamental UI element on any page that serves content, and in this post I’ll describe some styles that make a nice text input for a search box.
To accomplish this, I’ll make use of Bootstrap 3’s default form-validation markup, which was removed in Bootstrap 4 because it no longer supports font icons.
Tag: install
Blog
How to Install Sambamba on Linux
Sambamba is a great utility for working with alignment file formats used in bioinformatics, such as BAM and CRAM. Follow the steps below on any 64-bit Linux machine to install it (this guide installs version 0.6.8; see the Sambamba releases page for the most up-to-date version):
Create a softwares directory (optional but recommended):
cd ~/
mkdir softwares
cd softwares/
Download the static executable:
wget https://github.com/biod/sambamba/releases/download/v0.6.8/sambamba-0.6.8-linux-static.gz
Unzip the package and rename the unpacked executable
Blog
Correct Installation and Configuration of pip2 and pip3
You may have to keep both Python versions, the old 2 and the newer 3, on the same system because of your projects, and each requires its corresponding pip installation so you can install and maintain packages for the two versions separately.
There are multiple ways of installing pip on a system, but configuring the versions and setting the default version for the pip executable can be tricky.
Below is the easiest solution I’ve found.
Blog
How to Install Valgrind on macOS High Sierra
Valgrind is a programming tool for memory debugging, memory leak detection and profiling. Its installation on macOS High Sierra seems problematic, and I wanted to write this post to share the solution that worked for me. I use Homebrew to install it, which is the recommended way, and the solution below relies on it as well.
So, when you try installing right away, you may get the following error:
brew install valgrind
valgrind: This formula either does not compile or function as expected on macOS
versions newer than Sierra due to an upstream incompatibility.
Blog
How to Install Numpy Python Package on Windows
Numpy (Numerical Python) is a great Python package that you should definitely make use of if you’re doing scientific computing.
Installing it on Windows can be difficult if you don’t know how to do it via the command line. There are unofficial Windows binaries of Numpy for 32-bit and 64-bit Windows which make it super easy to install.
Go to the link below and download the one for your system and Python version: http://www.
Blog
Install Ensembl API and BioPerl 1.2.3 on Your System
I’m going to work on a project that requires lots of queries on Ensembl databases, so I wanted to start by installing the Ensembl API. Since the API is written in Perl, I will be using Perl in this project.
There is a nice tutorial on the Ensembl website for the API installation. Here I will describe the steps.
1. Download the API and BioPerl
Go to the Ensembl FTP site ftp://ftp.ensembl.org/pub/ and download “ensembl-api.tar.gz”, or click here
Tag: installation
Blog
How to Install Sambamba on Linux
Sambamba is a great utility for working with alignment file formats used in bioinformatics, such as BAM and CRAM. Follow the steps below on any 64-bit Linux machine to install it (this guide installs version 0.6.8; see the Sambamba releases page for the most up-to-date version):
Create a softwares directory (optional but recommended):
cd ~/
mkdir softwares
cd softwares/
Download the static executable:
wget https://github.com/biod/sambamba/releases/download/v0.6.8/sambamba-0.6.8-linux-static.gz
Unzip the package and rename the unpacked executable
Tag: linux
Blog
How to Install Sambamba on Linux
Sambamba is a great utility for working with alignment file formats used in bioinformatics, such as BAM and CRAM. Follow the steps below on any 64-bit Linux machine to install it (this guide installs version 0.6.8; see the Sambamba releases page for the most up-to-date version):
Create a softwares directory (optional but recommended):
cd ~/
mkdir softwares
cd softwares/
Download the static executable:
wget https://github.com/biod/sambamba/releases/download/v0.6.8/sambamba-0.6.8-linux-static.gz
Unzip the package and rename the unpacked executable
Blog
Passwordless SSH for Mac/Linux
You don’t have to enter the ssh password every time you make a connection. Use the method below to generate a key, copy it to the host you want to connect to, and then connect anytime without entering your password.
Generate a key pair:
ssh-keygen
Copy the key to the remote host:
ssh-copy-id root@linuxconfig.org
Tag: sambamba
Blog
How to Install Sambamba on Linux
Sambamba is a great utility for working with alignment file formats used in bioinformatics, such as BAM and CRAM. Follow the steps below on any 64-bit Linux machine to install it (this guide installs version 0.6.8; see the Sambamba releases page for the most up-to-date version):
Create a softwares directory (optional but recommended):
cd ~/
mkdir softwares
cd softwares/
Download the static executable:
wget https://github.com/biod/sambamba/releases/download/v0.6.8/sambamba-0.6.8-linux-static.gz
Unzip the package and rename the unpacked executable
Tag: image compression
Blog
Easy and Free Method to Compress Images on macOS with GUI and Terminal
Image compression is mostly needed when you are short of storage on your devices, or when you serve images online and want to optimize them so they load fast, which greatly affects how search engines evaluate your content and how much users enjoy your website.
This is especially important if you are also aiming to support mobile devices and relatively slow internet connections.
Tag: image optimization
Blog
Easy and Free Method to Compress Images on macOS with GUI and Terminal
Image compression is mostly needed when you are short of storage on your devices, or when you serve images online and want to optimize them so they load fast, which greatly affects how search engines evaluate your content and how much users enjoy your website.
This is especially important if you are also aiming to support mobile devices and relatively slow internet connections.
Tag: macos
Blog
Easy and Free Method to Compress Images on macOS with GUI and Terminal
Image compression is mostly needed when you are short of storage on your devices, or when you serve images online and want to optimize them so they load fast, which greatly affects how search engines evaluate your content and how much users enjoy your website.
This is especially important if you are also aiming to support mobile devices and relatively slow internet connections.
Blog
Memory Leak Testing with Valgrind on macOS using Docker Containers
I had some issues installing Valgrind on macOS High Sierra and [posted some tips to successfully install it to the system]({% post_url 2018-04-28-how-to-install-valgrind-on-macos-high-sierra %}). Although I could install the software this way, it didn’t work correctly when tested with several real and dummy C++ programs: it reported a memory leak even for an empty program. So I decided to use an Ubuntu 16.04 based Docker container and test the code inside it with the Ubuntu version of Valgrind.
Blog
How to Install Valgrind on macOS High Sierra
Valgrind is a programming tool for memory debugging, memory leak detection and profiling. Its installation on macOS High Sierra seems problematic, and I wanted to write this post to share the solution that worked for me. I use Homebrew to install it, which is the recommended way, and the solution below relies on it as well.
So, when you try installing right away, you may get the following error:
brew install valgrind
valgrind: This formula either does not compile or function as expected on macOS
versions newer than Sierra due to an upstream incompatibility.
Tag: macos mojave
Blog
Easy and Free Method to Compress Images on macOS with GUI and Terminal
Image compression is mostly needed when you are short of storage on your devices, or when you serve images online and want to optimize them so they load fast, which greatly affects how search engines evaluate your content and how much users enjoy your website.
This is especially important if you are also aiming to support mobile devices and relatively slow internet connections.
Tag: database
Blog
MongoDB Listing Database Collections/Tables with Number of Records/Rows
Use the following script and command to quickly get the number of records/rows in each collection/table of a database.
mongo-ls.js script:
var collections = db.getCollectionNames();
for (var i = 0; i < collections.length; ++i) {
    print(collections[i] + ' - ' + db[collections[i]].count() + ' records');
}
Copy-paste this script into a text file and save it as mongo-ls.js.
Finally, use the following command to query the database. Make sure you replace HOSTNAME, DBNAME, USERNAME and PASSWORD with your own.
Blog
How to Generate Database EER Diagrams from SQL Scripts using MySQL Workbench
MySQL Workbench makes it really easy to generate EER diagrams from SQL scripts. Follow the steps below to make one yourself.
Download and install MySQL Workbench for your system.
See the simple SQL commands below; later I’ll use them to generate a sample diagram.
create table country (
  id integer primary key,
  name CHAR(55));

create table city (
  id integer primary key,
  country_id integer,
  name CHAR(55),
  foreign key (country_id) references country(id));
Open MySQL Workbench and create a new model (File -> New Model).
Blog
Get Size of MySQL Databases
Use the query below in the MySQL command prompt to get a table of databases and their sizes in MB.
SELECT table_schema "DB Name", Round(Sum(data_length + index_length) / 1024 / 1024, 1) "DB Size in MB" FROM information_schema.tables GROUP BY table_schema;
Blog
How to Set Up a MySQL Database for a Mezzanine Project
Install MySQL server and the python-mysqldb package:
sudo apt-get install mysql-server
sudo apt-get install python-mysqldb
Run MySQL:
mysql -u root -p
Create a database:
mysql> create database mezzanine_project;
Confirm it:
mysql> show databases;
Exit:
mysql> exit
Configure local_settings.py:
cd path/to/your/mezzanine/project
nano local_settings.py
Like the following:
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mezzanine_project",
        "USER": "root",
        "PASSWORD": "123456",
        "HOST": "",
        "PORT": "",
    }
}
Note: replace the password with your own.
Blog
Retrieving Data with AJAX using jQuery, PHP and MySQL
Last semester, I took a course from the Informatics Institute at METU called “Biological Databases and Data Analysis Tools”, where we first learned what a database is and how to run queries on it, along with the technology behind databases. Then we covered many of the available biological databases and data analysis tools, including gene, protein and pathway databases, and tools for creating databases.
As a final project, we were asked to create an online tool that can search a database, retrieve the data, and display it in any web browser.
Tag: mongo
Blog
MongoDB Listing Database Collections/Tables with Number of Records/Rows
Use the following script and command to quickly get the number of records/rows in each collection/table of a database.
mongo-ls.js script:
var collections = db.getCollectionNames();
for (var i = 0; i < collections.length; ++i) {
    print(collections[i] + ' - ' + db[collections[i]].count() + ' records');
}
Copy-paste this script into a text file and save it as mongo-ls.js.
Finally, use the following command to query the database. Make sure you replace HOSTNAME, DBNAME, USERNAME and PASSWORD with your own.
Tag: mongodb
Blog
MongoDB Listing Database Collections/Tables with Number of Records/Rows
Use the following script and command to quickly get the number of records/rows in each collection/table of a database.
mongo-ls.js script:
var collections = db.getCollectionNames();
for (var i = 0; i < collections.length; ++i) {
    print(collections[i] + ' - ' + db[collections[i]].count() + ' records');
}
Copy-paste this script into a text file and save it as mongo-ls.js.
Finally, use the following command to query the database. Make sure you replace HOSTNAME, DBNAME, USERNAME and PASSWORD with your own.
Tag: nosql
Blog
MongoDB Listing Database Collections/Tables with Number of Records/Rows
Use the following script and command to quickly get the number of records/rows in each collection/table of a database.
mongo-ls.js script:
var collections = db.getCollectionNames();
for (var i = 0; i < collections.length; ++i) {
    print(collections[i] + ' - ' + db[collections[i]].count() + ' records');
}
Copy-paste this script into a text file and save it as mongo-ls.js.
Finally, use the following command to query the database. Make sure you replace HOSTNAME, DBNAME, USERNAME and PASSWORD with your own.
Tag: entrez id
Blog
Convert Gene Symbols to Entrez IDs in R
Bioinformatics studies usually use gene symbols as identifiers (IDs), as they are more recognizable than other IDs such as Entrez IDs. However, certain analyses (tools) may not accept gene symbols, because a gene usually has more than one symbol, which makes it harder to implement a method that works with symbols. In such cases, you may need to do a conversion, which is a very common task in bioinformatics.
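At its core, such a conversion is a (possibly many-to-one) lookup from symbols and their aliases to stable numeric IDs. Below is a minimal Python sketch with a tiny hand-made table; the dictionaries and the `to_entrez` function are illustrative only, and a real analysis would load the mapping from an annotation resource (e.g. NCBI gene_info, or org.Hs.eg.db in R):

```python
# Hypothetical, hand-made symbol -> Entrez mapping for illustration only.
# Real pipelines load this table from an annotation database.
SYMBOL_TO_ENTREZ = {
    "TP53": 7157,
    "EGFR": 1956,
    "MYC": 4609,
}

# Genes often have several symbols; map known aliases to the official one.
ALIASES = {"P53": "TP53", "ERBB1": "EGFR"}

def to_entrez(symbol):
    """Resolve an alias to its official symbol, then look up the Entrez ID.
    Returns None for unmapped symbols, so callers can handle them explicitly."""
    canonical = ALIASES.get(symbol, symbol)
    return SYMBOL_TO_ENTREZ.get(canonical)
```

Handling unmapped symbols explicitly (rather than dropping them silently) matters in practice, since no annotation resource covers every symbol in use.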
Tag: gene symbol
Blog
Convert Gene Symbols to Entrez IDs in R
Bioinformatics studies usually use gene symbols as identifiers (IDs), as they are more recognizable than other IDs such as Entrez IDs. However, certain analyses (tools) may not accept gene symbols, because a gene usually has more than one symbol, which makes it harder to implement a method that works with symbols. In such cases, you may need to do a conversion, which is a very common task in bioinformatics.
Tag: id conversion
Blog
Convert Gene Symbols to Entrez IDs in R
Bioinformatics studies usually use gene symbols as identifiers (IDs), as they are more recognizable than other IDs such as Entrez IDs. However, certain analyses (tools) may not accept gene symbols, because a gene usually has more than one symbol, which makes it harder to implement a method that works with symbols. In such cases, you may need to do a conversion, which is a very common task in bioinformatics.
Tag: r
Blog
Convert Gene Symbols to Entrez IDs in R
Bioinformatics studies usually use gene symbols as identifiers (IDs), as they are more recognizable than other IDs such as Entrez IDs. However, certain analyses (tools) may not accept gene symbols, because a gene usually has more than one symbol, which makes it harder to implement a method that works with symbols. In such cases, you may need to do a conversion, which is a very common task in bioinformatics.
Blog
R Programming with Jupyter Notebook - Installing the R Kernel
In an earlier post I covered [installing Jupyter and using Jupyter Notebook]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}). Installing Jupyter sets up the Python kernel for Jupyter Notebook directly, making it possible to program in Python, but for R, another language frequently used in bioinformatics, the corresponding kernel has to be installed separately. In this post I’ll describe how to install that kernel.
First, [Jupyter]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}) and R must already be installed.
Then start an R session from the Terminal with the following command:
Blog
Computing Significance of Overlap between Two Sets using Hypergeometric Test
There are many cases where we have two sets of things (e.g. under two different conditions) such as transcripts, genes or proteins, and we want to compute the significance of the overlap between them. The hypergeometric test is a very simple and widely used option for such cases.
I’ll use the phyper function in R, but you can use the same idea in SciPy (Python).
Let’s say you have 200 genes (A);
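To make the idea concrete, here is a pure-Python sketch of the same upper-tail computation; the function name and argument names are mine, and in practice you would call R’s phyper (or scipy.stats.hypergeom.sf) directly:

```python
from math import comb

def overlap_pvalue(universe, n_a, n_b, overlap):
    """P(X >= overlap): probability of seeing at least `overlap` shared items
    when n_b items are drawn from a universe containing n_a items of set A.
    Mirrors R's phyper(overlap - 1, n_a, universe - n_a, n_b, lower.tail = FALSE)."""
    total = comb(universe, n_b)  # all ways to draw set B
    # Sum hypergeometric probabilities over the upper tail.
    return sum(
        comb(n_a, k) * comb(universe - n_a, n_b - k)
        for k in range(overlap, min(n_a, n_b) + 1)
    ) / total
```

For example, with a universe of 20,000 genes, lists of 200 and 300 genes, and 15 genes in common, `overlap_pvalue(20000, 200, 300, 15)` gives the enrichment p-value of the overlap.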
Blog
Mann Whitney U Test (Wilcoxon Rank-Sum Test) Javascript Implementation
Currently JavaScript is really poor in statistical methods compared to Python (SciPy) and R. There are several efforts to fill this gap, most notably jStat. However, many functions, distributions and tests are still missing from this library. In one of my projects, I had to implement a JavaScript version of the Mann-Whitney U test (also called the Wilcoxon rank-sum test). Here, I’m giving a link to its source code and describing how it works.
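As a sketch of how such an implementation works (written here in Python for brevity, and not the author’s actual JavaScript code), the test boils down to ranking the pooled samples, deriving the U statistic from the rank sum, and approximating the p-value with a normal distribution; ties get average ranks, though the tie correction to the variance is omitted here:

```python
from math import erf, sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation
    (no continuity or tie correction -- a sketch, not a full implementation)."""
    n1, n2 = len(x), len(y)
    r = average_ranks(list(x) + list(y))
    rank_sum_x = sum(r[:n1])
    u1 = rank_sum_x - n1 * (n1 + 1) / 2   # U statistic for sample x
    u = min(u1, n1 * n2 - u1)             # smaller of the two U values
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma                  # z <= 0 by construction
    p = 2 * 0.5 * (1 + erf(z / sqrt(2)))  # two-sided p from the normal CDF
    return u, min(p, 1.0)
```

For small samples a production implementation should use the exact U distribution instead of the normal approximation, which is only reliable for larger group sizes.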
Blog
MiClip 1.3 Installation
MiClip is a CLIP-seq peak calling algorithm implemented in R. It currently doesn’t show up on CRAN, but you can obtain it from the archive and install it from the source (tar.gz) file.
Download the tar.gz file:
wget https://cran.r-project.org/src/contrib/Archive/MiClip/MiClip_1.3.tar.gz
Start R:
R
Install the dependencies:
install.packages("moments")
install.packages("VGAM")
Finally, install MiClip 1.3:
install.packages("MiClip_1.3.tar.gz", repos = NULL, type="source")
Then you can test it by loading the package and viewing its help file.
Blog
How to Get Path to or Directory of Current Script in R
Use the following code to get the path to, or the directory of, the current (running) script in R:
scr_dir <- dirname(sys.frame(1)$ofile)
scr_path <- paste(scr_dir, "script.R", sep="/")
Taken from SO
Blog
How to Get (or Load) NCBI GEO Microarray Data into R using GEOquery Package from Bioconductor
R, especially with its many Bioconductor packages, provides nice tools to load, manage and analyze microarray data. If you are trying to load NCBI GEO data into R, use the GEOquery package. Here, I’ll describe how to get started with it, and I’ll probably cover more in future posts.
Installation
source("http://bioconductor.org/biocLite.R")
biocLite("GEOquery")
Usage
library(GEOquery)
gds <- getGEO("GDS5072")
or
library(GEOquery)
gds <- getGEO(filename="path/to/GDS5072.soft.gz")
The getGEO function returns a complex GDS class object which contains the complete dataset.
Blog
Plotting Expression Curves for Experimental Data
Since I can now plot expression curves for in silico data, I moved on to experimental data, which is more complex and larger. This data comes from RPPA experiments on different breast cancer cell lines, and it includes protein abundance measurements for about 45 phosphoproteins. These phosphoproteins are treated with different inhibitors and stimuli, and by comparing their expression, I will try to infer relations between them.
Before moving on to the inference part, I want a script that can plot the graphs so that I can inspect particular results for specific cases.
Blog
Experimental Data Optimization for Network Inference
As I mentioned in my previous post, the experimental data from the challenge has missing values that create problems during analyses. To solve this, the first thing I did was optimize the data, which includes detecting missing conditions, filling in NAs for the missing values, and sorting when necessary.
I wrote two functions in the script. The first ranks the data and sorts it based on those ranks.
Blog
Some String Functions in R, String Manipulation in R
I have programmed with Perl, Python, and PHP before, and string manipulation was more direct and easier in them than in R. Still, there are useful functions for string manipulation in R. I’m not an expert in R, but I’ve been dealing with it for a while and have learned some good functions for this purpose.
Concatenate strings
Concatenation is done with the paste function. It takes the strings to concatenate as arguments, separated by commas, plus the separator character(s).
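For comparison with the languages mentioned above, the closest Python equivalents of paste are str.join and string concatenation; this snippet is illustrative and not part of the original post:

```python
# R: paste("a", "b", "c", sep = "-")  ->  "a-b-c"
result = "-".join(["a", "b", "c"])

# R: paste0("x", 1:3)  ->  "x1" "x2" "x3"  (paste is vectorized over its arguments)
labels = ["x" + str(i) for i in range(1, 4)]
```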
Blog
Network Inference DREAM Breast Cancer Challenge
A causal edge is inferred from the change seen in one node after intervention on another node. If the curves obtained over time overlap (with or without intervention), there is no relation between the nodes. Otherwise, we can draw an edge between them, and depending on whether the level goes up or down, the edge will be activating or inhibiting. These causal edges are context-specific, so in data from different cell lines we may find different relations.
Tag: r programming
Blog
Convert Gene Symbols to Entrez IDs in R
Bioinformatics studies usually use gene symbols as identifiers (IDs), as they are more recognizable than other IDs such as Entrez IDs. However, certain analyses (tools) may not accept gene symbols, because a gene usually has more than one symbol, which makes it harder to implement a method that works with symbols. In such cases, you may need to do a conversion, which is a very common task in bioinformatics.
Blog
Computing Significance of Overlap between Two Sets using Hypergeometric Test
There are many cases where we have two sets of things (e.g. under two different conditions) such as transcripts, genes or proteins, and we want to compute the significance of the overlap between them. The hypergeometric test is a very simple and widely used option for such cases.
I’ll use the phyper function in R, but you can use the same idea in SciPy (Python).
Let’s say you have 200 genes (A);
Blog
Mann Whitney U Test (Wilcoxon Rank-Sum Test) Javascript Implementation
Currently JavaScript is really poor in statistical methods compared to Python (SciPy) and R. There are several efforts to fill this gap, most notably jStat. However, many functions, distributions and tests are still missing from this library. In one of my projects, I had to implement a JavaScript version of the Mann-Whitney U test (also called the Wilcoxon rank-sum test). Here, I’m giving a link to its source code and describing how it works.
Blog
MiClip 1.3 Installation
MiClip is a CLIP-seq peak calling algorithm implemented in R. It currently doesn’t show up on CRAN, but you can obtain it from the archive and install it from the source (tar.gz) file.
Download the tar.gz file:
wget https://cran.r-project.org/src/contrib/Archive/MiClip/MiClip_1.3.tar.gz
Start R:
R
Install the dependencies:
install.packages("moments")
install.packages("VGAM")
Finally, install MiClip 1.3:
install.packages("MiClip_1.3.tar.gz", repos = NULL, type="source")
Then you can test it by loading the package and viewing its help file.
Blog
How to Get Path to or Directory of Current Script in R
Use the following code to get the path to, or the directory of, the current (running) script in R:
scr_dir <- dirname(sys.frame(1)$ofile)
scr_path <- paste(scr_dir, "script.R", sep="/")
Taken from SO
Blog
How to Get (or Load) NCBI GEO Microarray Data into R using GEOquery Package from Bioconductor
R, especially with its many Bioconductor packages, provides nice tools to load, manage and analyze microarray data. If you are trying to load NCBI GEO data into R, use the GEOquery package. Here, I’ll describe how to get started with it, and I’ll probably cover more in future posts.
Installation
source("http://bioconductor.org/biocLite.R")
biocLite("GEOquery")
Usage
library(GEOquery)
gds <- getGEO("GDS5072")
or
library(GEOquery)
gds <- getGEO(filename="path/to/GDS5072.soft.gz")
The getGEO function returns a complex GDS class object which contains the complete dataset.
Blog
Plotting Expression Curves for Experimental Data
Since I can now plot expression curves for in silico data, I moved on to experimental data, which is more complex and larger. This data comes from RPPA experiments on different breast cancer cell lines, and it includes protein abundance measurements for about 45 phosphoproteins. These phosphoproteins are treated with different inhibitors and stimuli, and by comparing their expression, I will try to infer relations between them.
Before moving on to the inference part, I want a script that can plot the graphs so that I can inspect particular results for specific cases.
Blog
Experimental Data Optimization for Network Inference
As I mentioned in my previous post, the experimental data from the challenge has missing values that create problems during analyses. To solve this, the first thing I did was optimize the data, which includes detecting missing conditions, filling in NAs for the missing values, and sorting when necessary.
I wrote two functions in the script. The first ranks the data and sorts it based on those ranks.
Blog
Some String Functions in R, String Manipulation in R
I have programmed with Perl, Python, and PHP before, and string manipulation was more direct and easier in them than in R. Still, there are useful functions for string manipulation in R. I’m not an expert in R, but I’ve been dealing with it for a while and have learned some good functions for this purpose.
Concatenate strings
Concatenation is done with the paste function. It takes the strings to concatenate as arguments, separated by commas, plus the separator character(s).
Blog
Network Inference DREAM Breast Cancer Challenge
A causal edge is inferred from the change seen in one node after intervention on another node. If the curves obtained over time overlap (with or without intervention), there is no relation between the nodes. Otherwise, we can draw an edge between them, and depending on whether the level goes up or down, the edge will be activating or inhibiting. These causal edges are context-specific, so in data from different cell lines we may find different relations.
Tag: pip
Blog
Correct Installation and Configuration of pip2 and pip3
You may have to keep both Python versions, the old 2 and the newer 3, on the same system because of your projects, and each requires its corresponding pip installation so you can install and maintain packages for the two versions separately.
There are multiple ways of installing pip on a system, but configuring the versions and setting the default version for the pip executable can be tricky.
Below is the easiest solution I’ve found.
Tag: python 2
Blog
Correct Installation and Configuration of pip2 and pip3
You may have to keep both Python versions, the old 2 and the newer 3, on the same system because of your projects, and each requires its corresponding pip installation so you can install and maintain packages for the two versions separately.
There are multiple ways of installing pip on a system, but configuring the versions and setting the default version for the pip executable can be tricky.
Below is the easiest solution I’ve found.
Tag: python 3
Blog
Correct Installation and Configuration of pip2 and pip3
You may have to keep both Python versions, the old 2 and the newer 3, on the same system because of your projects, and each requires its corresponding pip installation so you can install and maintain packages for the two versions separately.
There are multiple ways of installing pip on a system, but configuring the versions and setting the default version for the pip executable can be tricky.
Below is the easiest solution I’ve found.
Tag: python package manager
Blog
Correct Installation and Configuration of pip2 and pip3
You may have to keep both Python versions, the old 2 and the newer 3, on the same system because of your projects, and each requires its corresponding pip installation so you can install and maintain packages for the two versions separately.
There are multiple ways of installing pip on a system, but configuring the versions and setting the default version for the pip executable can be tricky.
Below is the easiest solution I’ve found.
Tag: chrome
Blog
Capture Full Size Screenshot on Chrome without Extension
Chrome’s Developer Tools can capture a high quality, full size screenshot of a page, so you no longer need an extension for it!
Update for recent Chrome versions: Chrome DevTools has changed slightly, so here are the new steps (tested in Version 71.0.3578.98 (Official Build) (64-bit) on macOS).
Open the website that you want to capture, then use the Ctrl + Shift + J shortcut on Windows/Linux or Cmd + Opt + J on Mac to open Developer Tools.
Tag: full page screenshot
Blog
Capture Full Size Screenshot on Chrome without Extension
Chrome’s Developer Tools can capture a high quality, full size screenshot of a page, so you no longer need an extension for it!
Update for recent Chrome versions: Chrome DevTools has changed slightly, so here are the new steps (tested in Version 71.0.3578.98 (Official Build) (64-bit) on macOS).
Open the website that you want to capture, then use the Ctrl + Shift + J shortcut on Windows/Linux or Cmd + Opt + J on Mac to open Developer Tools.
Tag: full size
Blog
Capture Full Size Screenshot on Chrome without Extension
Chrome’s new Developer Tools has a way to capture a high-quality, full-size screenshot of the page, so you don’t have to have an extension for it anymore!
Update for the latest Chrome versions: Chrome DevTools has changed slightly, so here are the new steps (tested in Version 71.0.3578.98 (Official Build) (64-bit) on macOS).
Open the website that you want to capture. Use the Ctrl + Shift + J shortcut on Windows/Linux or Cmd + Opt + J on Mac to open Developer Tools.
Tag: screenshot
Blog
Capture Full Size Screenshot on Chrome without Extension
Chrome’s new Developer Tools has a way to capture a high-quality, full-size screenshot of the page, so you don’t have to have an extension for it anymore!
Update for the latest Chrome versions: Chrome DevTools has changed slightly, so here are the new steps (tested in Version 71.0.3578.98 (Official Build) (64-bit) on macOS).
Open the website that you want to capture. Use the Ctrl + Shift + J shortcut on Windows/Linux or Cmd + Opt + J on Mac to open Developer Tools.
Tag: docker
Blog
Memory Leak Testing with Valgrind on macOS using Docker Containers
I had some issues installing Valgrind on macOS High Sierra and [posted some tips to successfully install it to the system]({% post_url 2018-04-28-how-to-install-valgrind-on-macos-high-sierra %}). Although I could install the software this way, it didn’t work correctly when tested with several real and dummy C++ programs. It was giving me a memory leak error even with an empty program. So I decided to use an Ubuntu 16.04 based Docker container and test the code inside the container using the Ubuntu version of Valgrind.
Tag: memory leak
Blog
Memory Leak Testing with Valgrind on macOS using Docker Containers
I had some issues installing Valgrind on macOS High Sierra and [posted some tips to successfully install it to the system]({% post_url 2018-04-28-how-to-install-valgrind-on-macos-high-sierra %}). Although I could install the software this way, it didn’t work correctly when tested with several real and dummy C++ programs. It was giving me a memory leak error even with an empty program. So I decided to use an Ubuntu 16.04 based Docker container and test the code inside the container using the Ubuntu version of Valgrind.
Tag: memory management
Blog
Memory Leak Testing with Valgrind on macOS using Docker Containers
I had some issues installing Valgrind on macOS High Sierra and [posted some tips to successfully install it to the system]({% post_url 2018-04-28-how-to-install-valgrind-on-macos-high-sierra %}). Although I could install the software this way, it didn’t work correctly when tested with several real and dummy C++ programs. It was giving me a memory leak error even with an empty program. So I decided to use an Ubuntu 16.04 based Docker container and test the code inside the container using the Ubuntu version of Valgrind.
Tag: valgrind
Blog
Memory Leak Testing with Valgrind on macOS using Docker Containers
I had some issues installing Valgrind on macOS High Sierra and [posted some tips to successfully install it to the system]({% post_url 2018-04-28-how-to-install-valgrind-on-macos-high-sierra %}). Although I could install the software this way, it didn’t work correctly when tested with several real and dummy C++ programs. It was giving me a memory leak error even with an empty program. So I decided to use an Ubuntu 16.04 based Docker container and test the code inside the container using the Ubuntu version of Valgrind.
Blog
How to Install Valgrind on macOS High Sierra
Valgrind is a programming tool for memory debugging, memory leak detection and profiling. Its installation on macOS High Sierra seems problematic, and I wanted to write this post to share the solution that worked for me. I use Homebrew to install it, which is the recommended way, and the solution relies on it as well.
So, when you try installing right away, you may get the following error:
brew install valgrind
valgrind: This formula either does not compile or function as expected on macOS
versions newer than Sierra due to an upstream incompatibility.
Tag: fasta
Blog
How to Download hg38/GRCh38 FASTA Human Reference Genome
hg38/GRCh38 is the latest human reference genome as of this writing; it was released in December 2013. There are multiple sources for downloading it, and it also comes in different versions.
The most well-known databases for downloading the human reference genome are the UCSC Genome Browser, Ensembl and NCBI. The naming convention hg38 is used by the UCSC Genome Browser, while Ensembl and NCBI use GRCh38 to refer to the latest human reference genome.
Blog
How to Convert PED to FASTA
You may need to convert PED files to FASTA format in your studies for further analyses. Use the script below for this purpose.
PED to FASTA converter on GitHub
It takes the first 6 columns of each line as the header line and the rest as the sequence, replacing 0s with Ns, and organizes the result into a FASTA file.
Note that 0s denote missing nucleotides, as defined by default in PLINK.
How to run:
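As a rough illustration of the transformation described above (this is not the GitHub script itself; the exact header formatting and column handling there may differ), a minimal Python sketch could look like this:

```python
def ped_to_fasta(ped_lines):
    """Convert PED records to FASTA: the first 6 columns become the header,
    the remaining columns the sequence, with 0 (PLINK's missing code) -> N."""
    records = []
    for line in ped_lines:
        fields = line.split()
        header = "_".join(fields[:6])                    # family/sample metadata
        seq = "".join(fields[6:]).replace("0", "N")      # genotypes, 0 -> N
        records.append(">%s\n%s" % (header, seq))
    return "\n".join(records)

# One hypothetical PED line: 6 metadata columns, then genotype alleles.
print(ped_to_fasta(["FAM1 IND1 0 0 1 2 A C 0 G"]))
```

Running it prints a single FASTA record whose sequence is ACNG, with the missing allele replaced by N.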
Blog
Download Human Reference Genome (HG19 - GRCh37)
Many variation calling tools and many other methods in bioinformatics require a reference genome as an input, so you may need to download the human reference genome or sequences. There are several sources that freely and publicly provide the entire human genome, and I’ll describe how to download the complete human genome from the University of California, Santa Cruz (UCSC) webpage.
Index to the gzip-compressed FASTA files of human chromosomes can be found here at the UCSC webpage.
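As a sketch of how the per-chromosome files could be fetched programmatically (the directory layout below is an assumption based on UCSC's usual goldenPath structure; check the index page mentioned above for the real paths):

```python
# Assumed UCSC layout: goldenPath/<assembly>/chromosomes/chr<N>.fa.gz
BASE_URL = "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes"

def chromosome_url(chrom):
    """Build the URL of one gzip-compressed chromosome FASTA file."""
    return "%s/chr%s.fa.gz" % (BASE_URL, chrom)

# Downloading could then be done with the standard library, e.g.:
# import urllib.request
# urllib.request.urlretrieve(chromosome_url(21), "chr21.fa.gz")
print(chromosome_url(21))
```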
Blog
How to Get Transcripts (also Exons & Introns) of a Gene using Ensembl API
As a part of my project, I need to obtain the exons and introns of certain genes. These are actually human genes selected for a specific reason that I will describe later when I explain my project. But for now, I want to share the way to obtain this information using the (Perl) Ensembl API. Note that Ensembl has started a beautiful way (the Ensembl REST API) of getting data, but it is in beta and doesn’t provide exon/intron information.
Blog
Running MegaBLAST from Multiple Sequence Files
I wrote the script below to fit a technique we came up with to make the pipeline’s MegaBLAST search faster. What it does is search the given databases using the sequence files created and formatted for each read, with the specified starting point and number of reads.
#!/usr/bin/perl

$database = $ARGV[0];
$dir = $ARGV[1]; # directory for sequences
$sp = $ARGV[2];  # starting point
$n = $ARGV[3] + $sp;

while (1) {
    system("blastplus -programname=megablast $dir/read_$sp.seq $database -OUTFILE=read_$sp.megablast -nobatch -d");
    $sp++;
    last if ($sp == $n);
}
Here everything works with really simple programming.
Blog
Running MegaBLAST from a Single FASTA File - Regular Expressions
Below is the Perl script I wrote to run MegaBLAST by reading a FASTA file and collect the results in a directory, together with its explanation. This script is an important part of the pipeline I’m designing. It is the first script I wrote, and it reaches all the reads through a single FASTA file.
#!/usr/bin/perl
$database = $ARGV[0];
$fasta = $ARGV[1]; # input file
$sp = $ARGV[2];    # starting point
$n = $ARGV[3] + $sp;

if (!defined($n)) { $n = 12; } # set default number

open FASTA, $fasta or die $!
Blog
Contents of a MegaBLAST Output - RefSeq Database
Below we see the details of one hit from the file I obtained by searching the test FASTA file against the refseq_genomic database.
>>>>refseq_genomic_complete3: AC_000033_0310 Continuation (311 of 1357) of AC_000033 from base 31000001 (AC_000033 Mus musculus strain mixed chromosome 11, alternate assembly Mm_Celera, whole genome shotgun sequence. 2/2012)
Length = 110000
Score = 115 bits (58), Expect = 4e-22
Identities = 74/79 (93%), Gaps = 2/79 (2%)
Strand = Plus / Minus

Query: 1      ctctctctgtct-tctctctctctctgtctctctctctttctctctcttctctctctctc 59
              |||||||||||| ||| ||||||||| ||||||||||| |||||||||||||||||||||
Sbjct: 89773  ctctctctgtctgtctttctctctctctctctctctctctctctctcttctctctctctc 89714

Query: 60     tttctctctgccctctctc 78
              ||||||||| |||||||||
Sbjct: 89713  tttctctct-ccctctctc 89696
In the details, it first gives the header information about the hit with the >>>> characters.
Blog
Speeding Up the MegaBLAST Search
Lately I’ve been looking for the quickest and most effective way to run MegaBLAST, just against different databases, and at the FASTA file creation stage a really useful method came from my advisor.
Previously I searched from a single FASTA file containing all the sequences, and this caused a loss of time. Even though the file is opened only once, going to the right lines in the file and reading them every time is a time-consuming operation. We solved this by turning each read in the file into a separate FASTA file.
Blog
A FASTQ to FASTA Conversion Perl Script
FASTQ and FASTA are file formats that actually contain the same information, except that one of them holds just two fewer lines of information per sequence. Another difference, important for my project, is that the FASTA format can be searched directly with MegaBLAST. That’s why I need to convert the FASTQ format produced by sequencing machines to FASTA. And this script is the first step of the pipeline.
Actually, since the test sequencing data I received had not been aligned by the party who delivered it to me, I had performed this alignment as a preliminary step.
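The conversion itself can be sketched as follows (a simplified Python illustration, not the Perl script from the post, assuming well-formed four-line FASTQ records):

```python
def fastq_to_fasta(fastq_lines):
    """FASTQ stores four lines per read (@id, sequence, '+', qualities);
    FASTA keeps only the first two, with '@' replaced by '>'."""
    fasta = []
    for i in range(0, len(fastq_lines), 4):
        fasta.append(">" + fastq_lines[i][1:])  # @id -> >id
        fasta.append(fastq_lines[i + 1])        # the sequence line
    return fasta

# One hypothetical read in FASTQ form:
print("\n".join(fastq_to_fasta(["@read1", "ACGT", "+", "IIII"])))
```

The two quality-related lines are simply dropped, which is why FASTA files are roughly half the size.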
Blog
SAM File - BAM File - samtools
The pipeline I actually have to program will run its analyses directly on unmapped reads. But since I couldn’t find such data, and the only data I have contains both mapped and unmapped reads, I first had to get rid of the mapped ones.
As I mentioned before, I do this with the bwa aligner (mapper). bwa produces a SAM file after a series of steps, but I need a FASTQ file. For that, I’ll convert the SAM file to the similar BAM format with samtools and then obtain my FASTQ file with the bam2fastq tool.
Blog
MegaBLAST - A Tool for Finding Similarities in Sequences
MegaBLAST, found in the HUSAR package, is part of the BLAST (Basic Local Alignment Search Tool) suite and is a variant of BLASTN. MegaBLAST processes long sequences more efficiently than BLASTN and runs much faster, but it is less sensitive. This makes it a very suitable tool for searching for similar sequences in large databases.
The program I’ll write will take a FASTA file containing multiple sequences and run the megablast command. Then, for each read, a .
Blog
Contaminant Analysis Project
To start, I’ll describe in detail this small project that was given to me so I can get used to the tools and the programming language, in short, to bioinformatics.
We know that however hard we try to prevent it in our laboratory work, the risk of contaminants is always there. The more we reduce it the better, and we can later determine its amount and use it for another evaluation of our result. One method for finding this is DNA analysis: the DNA of the sample you are working with is sequenced, and by analyzing this DNA with various programs, we can reveal the contaminating organisms from their DNA.
Blog
FASTQ Format - FASTQ File
Today I received the “test” sequencing data I’ll use while building the program. It consists of two FASTQ files, each compressed and yet still around 6 GB in size. Since I certainly don’t want to lose too much time, I’ll use only a portion of one of these files.
My goal is to find the mappable reads in these FASTQ files with the BWA tool and then remove them, and to save the remaining unmapped reads in a language MegaBLAST can understand (the FASTA format).
By the way, since I’m preparing the whole project on a Unix computer, I’m learning many commands; I’ll try to write about them separately later.
Tag: grch38
Blog
How to Download hg38/GRCh38 FASTA Human Reference Genome
hg38/GRCh38 is the latest human reference genome as of this writing; it was released in December 2013. There are multiple sources for downloading it, and it also comes in different versions.
The most well-known databases for downloading the human reference genome are the UCSC Genome Browser, Ensembl and NCBI. The naming convention hg38 is used by the UCSC Genome Browser, while Ensembl and NCBI use GRCh38 to refer to the latest human reference genome.
Tag: hg38
Blog
How to Download hg38/GRCh38 FASTA Human Reference Genome
hg38/GRCh38 is the latest human reference genome as of this writing; it was released in December 2013. There are multiple sources for downloading it, and it also comes in different versions.
The most well-known databases for downloading the human reference genome are the UCSC Genome Browser, Ensembl and NCBI. The naming convention hg38 is used by the UCSC Genome Browser, while Ensembl and NCBI use GRCh38 to refer to the latest human reference genome.
Tag: human
Blog
How to Download hg38/GRCh38 FASTA Human Reference Genome
hg38/GRCh38 is the latest human reference genome as of this writing; it was released in December 2013. There are multiple sources for downloading it, and it also comes in different versions.
The most well-known databases for downloading the human reference genome are the UCSC Genome Browser, Ensembl and NCBI. The naming convention hg38 is used by the UCSC Genome Browser, while Ensembl and NCBI use GRCh38 to refer to the latest human reference genome.
Blog
Data Preprocessing II for Salmon Project
So, in our Multi-dimensional Modeling and Reconstruction of Signaling Networks in Salmonella-infected Human Cells project, we have several methods for constructing the networks, so the data still needs to be preprocessed to be ready for analysis with these methods.
One method needed a matrix whose first row holds the protein name and the time series (2 min, 5 min, 10 min, 20 min), with the value of each protein at each time point set to 1 or 0 according to variance, significance and the size of the fold change.
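The binarization step could look like the following sketch. The single fold-change cutoff here is a hypothetical simplification; the project's real criteria also involve variance and significance:

```python
def binarize(fold_changes, cutoff=1.0):
    """Map a protein's time-series fold changes (e.g. at 2, 5, 10, 20 min)
    to 1/0 by an absolute cutoff."""
    return [1 if abs(fc) >= cutoff else 0 for fc in fold_changes]

# Hypothetical example data: fold changes per time point for two proteins.
data = {"ProteinA": [0.2, 1.5, 2.1, 0.4],
        "ProteinB": [1.2, 0.1, 0.0, 3.3]}
matrix = {protein: binarize(values) for protein, values in data.items()}
print(matrix["ProteinA"])  # [0, 1, 1, 0]
```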
Blog
Multi-dimensional Modeling and Reconstruction of Signaling Networks in Salmonella-infected Human Cells
In this study, we’re going to use phosphorylation data from a research paper on the phosphoproteomic analysis of related cells.
The idea is to use and compare existing methods and develop these methods to be able to better understand the nature of signaling events in these cells and to find key proteins that might be targets for disease diagnosis, prevention and treatment.
This study will be submitted as a research paper, so I’m not going to publish any results here for now, but I’ll mention the struggles I have and the solutions I try.
Blog
Download Human Reference Genome (HG19 - GRCh37)
Many variation calling tools and many other methods in bioinformatics require a reference genome as an input, so you may need to download the human reference genome or sequences. There are several sources that freely and publicly provide the entire human genome, and I’ll describe how to download the complete human genome from the University of California, Santa Cruz (UCSC) webpage.
Index to the gzip-compressed FASTA files of human chromosomes can be found here at the UCSC webpage.
Blog
Super Long Introns of Euarchontoglires
There was another weird result I got in my exon/intron boundary analysis research. Intron lengths are shown to increase in the genes of less diverse species. However, according to my findings, at the point of Euarchontoglires, or Supraprimates, this increase is very sharp and seems unexpected. So, I looked at the exon/intron lengths of each gene in each taxonomic rank and tried to see what gives Euarchontoglires genes such long introns.
As you can see in the graph above, Euarchontoglires introns are very long compared to the rest.
Tag: reference genome
Blog
How to Download hg38/GRCh38 FASTA Human Reference Genome
hg38/GRCh38 is the latest human reference genome as of this writing; it was released in December 2013. There are multiple sources for downloading it, and it also comes in different versions.
The most well-known databases for downloading the human reference genome are the UCSC Genome Browser, Ensembl and NCBI. The naming convention hg38 is used by the UCSC Genome Browser, while Ensembl and NCBI use GRCh38 to refer to the latest human reference genome.
Tag: brew
Blog
How to Install Valgrind on macOS High Sierra
Valgrind is a programming tool for memory debugging, memory leak detection and profiling. Its installation on macOS High Sierra seems problematic, and I wanted to write this post to share the solution that worked for me. I use Homebrew to install it, which is the recommended way, and the solution relies on it as well.
So, when you try installing right away, you may get the following error:
brew install valgrind
valgrind: This formula either does not compile or function as expected on macOS
versions newer than Sierra due to an upstream incompatibility.
Tag: high sierra
Blog
How to Install Valgrind on macOS High Sierra
Valgrind is a programming tool for memory debugging, memory leak detection and profiling. Its installation on macOS High Sierra seems problematic, and I wanted to write this post to share the solution that worked for me. I use Homebrew to install it, which is the recommended way, and the solution relies on it as well.
So, when you try installing right away, you may get the following error:
brew install valgrind
valgrind: This formula either does not compile or function as expected on macOS
versions newer than Sierra due to an upstream incompatibility.
Tag: corrplot
Blog
R Programming with Jupyter Notebook - Installing the R Kernel
In an earlier post I covered [installing Jupyter and Jupyter Notebook]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}). Installing Jupyter sets up the Python kernel for Jupyter Notebook directly, making programming in Python possible, but for R, another language frequently used in bioinformatics, the corresponding kernel has to be installed separately. In this post I’ll walk through installing that kernel.
First, the [Jupyter installation]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}) and the R installation need to be completed.
Then start an R session from the Terminal using the following command:
Tag: irkernel
Blog
R Programming with Jupyter Notebook - Installing the R Kernel
In an earlier post I covered [installing Jupyter and Jupyter Notebook]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}). Installing Jupyter sets up the Python kernel for Jupyter Notebook directly, making programming in Python possible, but for R, another language frequently used in bioinformatics, the corresponding kernel has to be installed separately. In this post I’ll walk through installing that kernel.
First, the [Jupyter installation]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}) and the R installation need to be completed.
Then start an R session from the Terminal using the following command:
Tag: jupyter
Blog
R Programming with Jupyter Notebook - Installing the R Kernel
In an earlier post I covered [installing Jupyter and Jupyter Notebook]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}). Installing Jupyter sets up the Python kernel for Jupyter Notebook directly, making programming in Python possible, but for R, another language frequently used in bioinformatics, the corresponding kernel has to be installed separately. In this post I’ll walk through installing that kernel.
First, the [Jupyter installation]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}) and the R installation need to be completed.
Then start an R session from the Terminal using the following command:
Blog
What Are Jupyter / Python and How to Install Them?
Jupyter is software that provides an interactive environment for various programming languages. It was originally developed for the Python programming language under the name IPython (interactive Python), but later its creators started the Jupyter project and moved many parts of IPython over to Jupyter. IPython now lives on only as Jupyter’s kernel.
Jupyter’s features:
An interactive shell: launched from the Command Prompt/Terminal with the jupyter console command, it offers user-friendly features, such as autocompletion, over the original Python shell. A browser-based notebook: launched from the Command Prompt/Terminal with the jupyter notebook command; in the browser window that opens, you can create a new notebook, write code in various programming languages, run it, and view the output (text, graphics, etc.) interactively right in the browser.
Tag: correlation
Blog
R Programming with Jupyter Notebook - Installing the R Kernel
In an earlier post I covered [installing Jupyter and Jupyter Notebook]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}). Installing Jupyter sets up the Python kernel for Jupyter Notebook directly, making programming in Python possible, but for R, another language frequently used in bioinformatics, the corresponding kernel has to be installed separately. In this post I’ll walk through installing that kernel.
First, the [Jupyter installation]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}) and the R installation need to be completed.
Then start an R session from the Terminal using the following command:
Tag: correlation analysis
Blog
R Programming with Jupyter Notebook - Installing the R Kernel
In an earlier post I covered [installing Jupyter and Jupyter Notebook]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}). Installing Jupyter sets up the Python kernel for Jupyter Notebook directly, making programming in Python possible, but for R, another language frequently used in bioinformatics, the corresponding kernel has to be installed separately. In this post I’ll walk through installing that kernel.
First, the [Jupyter installation]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}) and the R installation need to be completed.
Then start an R session from the Terminal using the following command:
Tag: correlation plot
Blog
R Programming with Jupyter Notebook - Installing the R Kernel
In an earlier post I covered [installing Jupyter and Jupyter Notebook]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}). Installing Jupyter sets up the Python kernel for Jupyter Notebook directly, making programming in Python possible, but for R, another language frequently used in bioinformatics, the corresponding kernel has to be installed separately. In this post I’ll walk through installing that kernel.
First, the [Jupyter installation]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}) and the R installation need to be completed.
Then start an R session from the Terminal using the following command:
Tag: installation
Blog
R Programming with Jupyter Notebook - Installing the R Kernel
In an earlier post I covered [installing Jupyter and Jupyter Notebook]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}). Installing Jupyter sets up the Python kernel for Jupyter Notebook directly, making programming in Python possible, but for R, another language frequently used in bioinformatics, the corresponding kernel has to be installed separately. In this post I’ll walk through installing that kernel.
First, the [Jupyter installation]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}) and the R installation need to be completed.
Then start an R session from the Terminal using the following command:
Blog
What Are Jupyter / Python and How to Install Them?
Jupyter is software that provides an interactive environment for various programming languages. It was originally developed for the Python programming language under the name IPython (interactive Python), but later its creators started the Jupyter project and moved many parts of IPython over to Jupyter. IPython now lives on only as Jupyter’s kernel.
Jupyter’s features:
An interactive shell: launched from the Command Prompt/Terminal with the jupyter console command, it offers user-friendly features, such as autocompletion, over the original Python shell. A browser-based notebook: launched from the Command Prompt/Terminal with the jupyter notebook command; in the browser window that opens, you can create a new notebook, write code in various programming languages, run it, and view the output (text, graphics, etc.) interactively right in the browser.
Tag: notebook
Blog
R Programming with Jupyter Notebook - Installing the R Kernel
In an earlier post I covered [installing Jupyter and Jupyter Notebook]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}). Installing Jupyter sets up the Python kernel for Jupyter Notebook directly, making programming in Python possible, but for R, another language frequently used in bioinformatics, the corresponding kernel has to be installed separately. In this post I’ll walk through installing that kernel.
First, the [Jupyter installation]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}) and the R installation need to be completed.
Then start an R session from the Terminal using the following command:
Blog
What Are Jupyter / Python and How to Install Them?
Jupyter is software that provides an interactive environment for various programming languages. It was originally developed for the Python programming language under the name IPython (interactive Python), but later its creators started the Jupyter project and moved many parts of IPython over to Jupyter. IPython now lives on only as Jupyter’s kernel.
Jupyter’s features:
An interactive shell: launched from the Command Prompt/Terminal with the jupyter console command, it offers user-friendly features, such as autocompletion, over the original Python shell. A browser-based notebook: launched from the Command Prompt/Terminal with the jupyter notebook command; in the browser window that opens, you can create a new notebook, write code in various programming languages, run it, and view the output (text, graphics, etc.) interactively right in the browser.
Tag: r kernel
Blog
R Programming with Jupyter Notebook - Installing the R Kernel
In an earlier post I covered [installing Jupyter and Jupyter Notebook]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}). Installing Jupyter sets up the Python kernel for Jupyter Notebook directly, making programming in Python possible, but for R, another language frequently used in bioinformatics, the corresponding kernel has to be installed separately. In this post I’ll walk through installing that kernel.
First, the [Jupyter installation]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}) and the R installation need to be completed.
Then start an R session from the Terminal using the following command:
Tag: r programming
Blog
R Programming with Jupyter Notebook - Installing the R Kernel
In an earlier post I covered [installing Jupyter and Jupyter Notebook]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}). Installing Jupyter sets up the Python kernel for Jupyter Notebook directly, making programming in Python possible, but for R, another language frequently used in bioinformatics, the corresponding kernel has to be installed separately. In this post I’ll walk through installing that kernel.
First, the [Jupyter installation]({% post_url 2018-03-31-jupyter-python-nedir-nasil-kurulur %}) and the R installation need to be completed.
Then start an R session from the Terminal using the following command:
Tag: anaconda
Blog
What Are Jupyter / Python and How to Install Them?
Jupyter is software that provides an interactive environment for various programming languages. It was originally developed for the Python programming language under the name IPython (interactive Python), but later its creators started the Jupyter project and moved many parts of IPython over to Jupyter. IPython now lives on only as Jupyter’s kernel.
Jupyter’s features:
An interactive shell: launched from the Command Prompt/Terminal with the jupyter console command, it offers user-friendly features, such as autocompletion, over the original Python shell. A browser-based notebook: launched from the Command Prompt/Terminal with the jupyter notebook command; in the browser window that opens, you can create a new notebook, write code in various programming languages, run it, and view the output (text, graphics, etc.) interactively right in the browser.
Tag: ipython
Blog
What Are Jupyter / Python and How to Install Them?
Jupyter is software that provides an interactive environment for various programming languages. It was originally developed for the Python programming language under the name IPython (interactive Python), but later its creators started the Jupyter project and moved many parts of IPython over to Jupyter. IPython now lives on only as Jupyter’s kernel.
Jupyter’s features:
An interactive shell: launched from the Command Prompt/Terminal with the jupyter console command, it offers user-friendly features, such as autocompletion, over the original Python shell. A browser-based notebook: launched from the Command Prompt/Terminal with the jupyter notebook command; in the browser window that opens, you can create a new notebook, write code in various programming languages, run it, and view the output (text, graphics, etc.) interactively right in the browser.
Tag: python
Blog
What Are Jupyter / Python and How to Install Them?
Jupyter is software that provides an interactive environment for various programming languages. It was originally developed for the Python programming language under the name IPython (interactive Python), but later its creators started the Jupyter project and moved many parts of IPython over to Jupyter. IPython now lives on only as Jupyter’s kernel.
Jupyter’s features:
An interactive shell: launched from the Command Prompt/Terminal with the jupyter console command, it offers user-friendly features, such as autocompletion, over the original Python shell. A browser-based notebook: launched from the Command Prompt/Terminal with the jupyter notebook command; in the browser window that opens, you can create a new notebook, write code in various programming languages, run it, and view the output (text, graphics, etc.) interactively right in the browser.
Blog
Install Cairo Graphics and PyCairo on Ubuntu 14.04 / Linux Mint 17
Cairo is a 2D graphics library written in the C programming language, but if you’d like to use it from Python, you should also install the Python bindings for Cairo.
This guide will go through the installation of the Cairo graphics library version 1.14.2 (the most recent) and the py2cairo Python bindings version 1.10.1 (also the most recent).
Install Cairo
It’s very easy with the following repository. Just add it, update your packages and install.
Blog
Install RDKit 2015-03 Build on Ubuntu 14.04 / Linux Mint 17
RDKit is an open source toolkit for cheminformatics. It has many functionalities to work with chemical files.
Follow the guide below to install the RDKit 2015-03 build on an Ubuntu 14.04 / Linux Mint 17 computer. Since the Ubuntu packages don’t have the latest RDKit for trusty, you have to build RDKit from its source.
Install Dependencies
sudo apt-get install flex bison build-essential python-numpy cmake python-dev sqlite3 libsqlite3-dev libboost1.54-all-dev
Download the Build
Blog
Simple Way of Python's subprocess.Popen with a Timeout Option
subprocess module in Python provides us a variety of methods to start a process from a Python script. We may use these methods to run an external commands / programs, collect their output and manage them. An example use of it might be as following:
from subprocess import Popen, PIPE


p = Popen(['ls', '-l'], stdout=PIPE, stderr=PIPE)
stdout, stderr = p.communicate()
print stdout, stderr
These lines run the ls -l command and collect the output (standard output and standard error) in the stdout and stderr variables using the communicate method defined on the process.
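The snippet above is Python 2. For the timeout behavior the post's title promises, Python 3's communicate() accepts a timeout parameter directly; here is a minimal sketch of that idea (the ls -l command and the 5-second limit are just examples):

```python
from subprocess import Popen, PIPE, TimeoutExpired

# Start the process as before, but give up after a deadline.
p = Popen(['ls', '-l'], stdout=PIPE, stderr=PIPE)
try:
    stdout, stderr = p.communicate(timeout=5)
except TimeoutExpired:
    p.kill()  # stop the runaway process before collecting its output
    stdout, stderr = p.communicate()
print(stdout.decode())
```

On Python 2, where communicate() has no timeout, the same effect is usually achieved with a watchdog thread that kills the process after the deadline.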
Blog
ImportError: Reportlab Version 2.1+ is needed
Little bug in xhtml2pdf version 0.0.5. To fix:
$ sudo nano /usr/local/lib/python2.7/dist-packages/xhtml2pdf/util.py
Change the following lines:
if not (reportlab.Version[0] == "2" and reportlab.Version[2] >= "1"):
    raise ImportError("Reportlab Version 2.1+ is needed!")

REPORTLAB22 = (reportlab.Version[0] == "2" and reportlab.Version[2] >= "2")
With these lines:
if not (reportlab.Version[:3] >= "2.1"):
    raise ImportError("Reportlab Version 2.1+ is needed!")

REPORTLAB22 = (reportlab.Version[:3] >= "2.1")
Blog
Django Migrations Table Already Exists Fix
Fix this issue by faking the migrations:
python manage.py migrate --fake <appname>
Taken from this SO answer
Blog
Mezzanine BS Banners Translation with django-modeltranslation
Mezzanine BS Banners is a nice app for implementing Bootstrap 3 banners/sliders in your Mezzanine projects. The Banners model in the BS Banners app has a title, and its stacked-inline Slides model has a title and content for translation.
After [installing and setting up Django/Mezzanine translations]({% post_url 2015-07-01-djangomezzanine-content-translation-for-mezzanine %}):
Create a translation.py inside your Mezzanine project or your custom theme/skin application and copy/paste following lines:
from modeltranslation.translator import translator
from mezzanine.core.translation import TranslatedSlugged, TranslatedRichText
from mezzanine_bsbanners.
Blog
Django/Mezzanine Content Translation for Mezzanine Built-in Applications
Mezzanine comes with additional Django applications such as pages and galleries, and to translate their content, Mezzanine supports django-modeltranslation integration.
Install django-modeltranslation:
pip install django-modeltranslation
Add the following to INSTALLED_APPS in settings.py:
"modeltranslation",
And the following in settings.py:
USE_MODELTRANSLATION = True
Also, move mezzanine.pages above the other Mezzanine apps in INSTALLED_APPS in settings.py like so:
"mezzanine.pages",
"mezzanine.boot",
"mezzanine.conf",
"mezzanine.core",
"mezzanine.generic",
"mezzanine.blog",
"mezzanine.forms",
"mezzanine.galleries",
"mezzanine.twitter",
"mezzanine.accounts",
"mezzanine.mobile",
Run the following to create fields in database tables for translations:
Blog
Setting Up Templates and Python Scripts for Translation
Templates need the following template tag:
{% raw %}{% load i18n %}{% endraw %}
Then, wrapping any text with
{% raw %}{% trans "TEXT" %}{% endraw %}
will make it translatable via the Rosetta Django application.
In Python scripts, you need the following import:
from django.utils.translation import ugettext_lazy as _
Then wrapping any text with
_('TEXT')
will make it translatable.
Blog
Django Rosetta Translations for Django Applications
Make a directory called locale/ under the application directory:
cd app_name
mkdir locale
Add the folder to the LOCALE_PATHS tuple in settings.py:
LOCALE_PATHS = (
    os.path.join(PROJECT_ROOT, 'app_name', 'locale/'),
)
Run the following commands to create the PO translation file for the application and compile it:
python ../manage.py makemessages -l tr -e html,py,txt
python ../manage.py compilemessages
The -l option is for language; it should match your definition in settings.py:
LANGUAGES = (
    ('en', _('English')),
    ('tr', _('Turkish')),
    ('it', _('Italian')),
)
Repeat the last step for all languages and then go to the Rosetta URL to translate.
Blog
Django Rosetta Installation
Install SciPy:
sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose
Install pymongo and nltk:
sudo pip install pymongo
sudo pip install nltk
Install Python MySQLdb:
sudo apt-get install python-mysqldb
Install Rosetta:
sudo pip install django-rosetta
Add the following to INSTALLED_APPS in settings.py:
"rosetta",
Add the following to urls.py:
url(r'^translations/', include('rosetta.urls')),
To also allow language prefixes, change patterns to i18n_patterns in urls.py:
urlpatterns += i18n_patterns(
    ...
)
Blog
Errno 13 Permission denied Django File Uploads
Run the following command to give www-data permissions to the static folder and all its content:
cd path/to/your/django/project
sudo chown -R www-data:www-data static/
Do this on your production server.
Blog
Configuring Mezzanine for Apache server & mod_wsgi in AWS
Install [Mezzanine]({% post_url 2015-05-01-how-to-install-mezzanine-on-ubuntulinux-mint %}), [Apache server]({% post_url 2015-05-08-getting-started-with-your-aws-instance-and %}) and mod_wsgi:
sudo apt-get install libapache2-mod-wsgi
sudo a2enmod wsgi
Set up a MySQL database for your Mezzanine project
Read [my post on how to set up a MySQL database for a Mezzanine project]({% post_url 2015-05-09-how-to-set-up-a-mysql-database-for-a-mezzanine %})
Collect static files:
python manage.py collectstatic
Configure your Apache server configuration for the project like the following:
WSGIPythonPath /home/ubuntu/www/mezzanine-project
<VirtualHost *:80>
    #ServerName example.com
    ServerAdmin admin@example.com
    DocumentRoot /home/ubuntu/www/mezzanine-project
    WSGIScriptAlias / /home/ubuntu/www/mezzanine-project/wsgi.
Blog
How to Set Up a MySQL Database for a Mezzanine Project
Install MySQL server and python-mysqldb package:
sudo apt-get install mysql-server
sudo apt-get install python-mysqldb
Run MySQL:
mysql -u root -p
Create a database:
mysql> create database mezzanine_project;
Confirm it:
mysql> show databases;
Exit:
mysql> exit
Configure local_settings.py:
cd path/to/your/mezzanine/project
nano local_settings.py
Like the following:
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mezzanine_project",
        "USER": "root",
        "PASSWORD": "123456",
        "HOST": "",
        "PORT": "",
    }
}
Note: replace the password with your own.
Blog
How to Install Mezzanine on Ubuntu/Linux Mint [Complete Guide]
Mezzanine is a CMS application built on the Django web framework. The installation steps are easy, but your environment may not be suitable enough for it to work without a problem. So here I'm going to describe a complete installation from scratch in a virtual environment.
First of all, install virtualenv:
$ sudo apt-get install python-virtualenv
Then, create a virtual environment:
$ virtualenv testenv
And activate it:
$ cd testenv
$ source bin/activate
Blog
Finding k-cores and Clustering Coefficient Computation with NetworkX
Assume you have a large network, and you want to find the k-core of each node and also compute the clustering coefficient for each one. The Python package NetworkX comes with very nice methods to do these easily.
A k-core is a maximal subgraph whose nodes all have degree at least k [1]. To find k-cores:
Add all the edges of your network to a NetworkX graph, and use the core_number method, which takes the graph as its single input and returns node – core-number pairs.
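NetworkX's core_number does this for you; to make the idea concrete, here is a plain-Python sketch of the peeling algorithm the k-core decomposition is based on, run on a made-up toy graph (a triangle with one pendant node):

```python
def core_numbers(edges):
    """Return {node: core number} by repeatedly peeling off
    nodes whose remaining degree is below k."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    core, remaining, k = {}, set(adj), 0
    while remaining:
        k += 1
        changed = True
        while changed:  # peel until the k-core is stable
            changed = False
            for n in list(remaining):
                if len(adj[n] & remaining) < k:
                    core[n] = k - 1  # n survived up to the (k-1)-core
                    remaining.remove(n)
                    changed = True
    return core

# triangle 1-2-3 plus pendant node 4
print(sorted(core_numbers([(1, 2), (1, 3), (2, 3), (3, 4)]).items()))
# → [(1, 2), (2, 2), (3, 2), (4, 1)]
```

With NetworkX itself, nx.core_number(G) and nx.clustering(G) return the same kind of node-to-value dictionaries.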
Blog
Searching Open Reading Frames (ORF) in DNA sequences - ORF Finder
Open reading frames (ORFs) are regions on DNA that are translated into protein. They lie between start and stop codons and are usually long.
The Python script below searches for ORFs in six frames and returns the longest one. It doesn't treat the start codon as a delimiter and only splits the sequence by stop codons, so an ORF can start with any codon but ends with a stop codon (TAG, TGA, TAA).
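The script itself is cut off in this excerpt, but the behavior described (split each of the six reading frames by stop codons, keep the longest fragment) can be sketched like this; the function name and the example sequence are mine, not from the original post:

```python
STOPS = {"TAG", "TGA", "TAA"}
COMP = {"A": "T", "T": "A", "G": "C", "C": "G"}

def longest_orf(seq):
    """Longest stop-codon-free stretch over all six reading frames."""
    best = ""
    revcomp = "".join(COMP[b] for b in reversed(seq))
    for strand in (seq, revcomp):        # forward and reverse strands
        for frame in range(3):           # three frames per strand
            current = ""
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if codon in STOPS:       # stop codons delimit candidates
                    best = max(best, current, key=len)
                    current = ""
                else:
                    current += codon
            best = max(best, current, key=len)
    return best

print(longest_orf("AAATAAGGGTGA"))  # → TCACCCTTATTT (from the reverse strand)
```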
Blog
Python: Get Longest String in a List
Here is a quick Python trick you might use in your code.
Assume you have a list of strings and you want to get the longest one in the most efficient way.
>>> l = ["aaa", "bb", "c"]
>>> longest_string = max(l, key=len)
>>> longest_string
'aaa'
Blog
Python: defaultdict(list) Dictionary of Lists
Most of the time, when you work on large data, you'll need dictionaries in Python. Dictionaries of lists are very useful for storing large data in a very organized way. You can always build them by initiating empty lists inside an empty dictionary, but when you don't know how many of them you'll end up with, or if you just want an easier option, use defaultdict(list). You just need to import it first:
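For instance (the keys and values here are made up for illustration):

```python
from collections import defaultdict

hits = defaultdict(list)  # missing keys start out as empty lists
for gene, sample in [("BRCA1", "s1"), ("TP53", "s1"), ("BRCA1", "s2")]:
    hits[gene].append(sample)  # no need to check whether the key exists

print(dict(hits))  # → {'BRCA1': ['s1', 's2'], 'TP53': ['s1']}
```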
Blog
Python: extend() Append Elements of a List to a List
When you append a list to a list using the append() method, you'll see your list is appended as a single nested element:
>>> l = ["a"]
>>> l2 = ["a", "b"]
>>> l.append(l2)
>>> l
['a', ['a', 'b']]
If you want to append the elements of the list directly without creating nested lists, use the extend() method:
>>> l = ["a"]
>>> l2 = ["a", "b"]
>>> l.extend(l2)
>>> l
['a', 'a', 'b']
Blog
Salmonella Data Preprocessing for PCSF Algorithm
This post describes data preprocessing in Salmonella project for Prize-Collecting Steiner Forest Problem (PCSF) algorithm.
Salmonella data taken from Table S6 in Phosphoproteomic Analysis of Salmonella-Infected Cells Identifies Key Kinase Regulators and SopB-Dependent Host Phosphorylation Events by Rogers, LD et al. has been converted to tab delimited TXT file from its original XLS file for easy reading in Python.
The data should be separated into time-point files (2, 5, 10 and 20 minutes), each of which will contain the corresponding phosphoproteins and their fold changes.
Blog
How to Install openpyxl on Windows
openpyxl is a Python library to read/write Excel 2007 xlsx/xlsm files. To download and install on Windows:
Download it from Python Packages
Then, to install, extract the tarball you downloaded, open up CMD, navigate to the folder you extracted and run the following:
C:\Users\Gungor>cd Downloads\openpyxl-2.1.2.tar\dist\openpyxl-2.1.2\openpyxl-2.1.2
C:\Users\Gungor\Downloads\openpyxl-2.1.2.tar\dist\openpyxl-2.1.2\openpyxl-2.1.2>python setup.py install
It's going to install everything and report any errors. If nothing looks like an error, you're good to go.
Blog
How to Install Numpy Python Package on Windows
Numpy (Numerical Python) is a great Python package that you should definitely make use of if you're doing scientific computing.
Installing it on Windows can be difficult if you don't know how to do it via the command line. There are unofficial Windows binaries for Numpy for 32- and 64-bit Windows that make it super easy to install.
Go to the link below and download the one for your system and Python version:http://www.
Blog
JointSNVMix Installation on Linux Mint 16 Cython, Pysam Included
JointSNVMix is a software package that consists of a number of tools for calling somatic mutations in tumour/normal paired NGS data.
It requires Python (>= 2.7), Cython (>= 0.13) and Pysam (== 0.5.0).
Python should be installed by default on a Linux machine, so I will describe the installation of the others and JointSNVMix.
Note that this guide may become outdated after some time, so please double-check before following it.
Install Cython
Blog
Set Up Google Cloud SDK on Windows using Cygwin
Windows isn't the best environment for software development, I believe, but if you have to use it, there is nice software to make it easier for you. Cygwin here will help us use the Google Cloud tools, but the installation requires certain things you should be aware of beforehand.
You’ll need
Python latest 2.7.x
Google Cloud SDK
Cygwin 32-bit (i.e. setup-x86.exe - note only this one works)
openssh, curl and latest 2.
Blog
Some String Functions in R, String Manipulation in R
I have programmed in Perl, Python, and PHP before, and string manipulation was more direct and easier in them than in R. Still, there are useful functions for string manipulation in R. I'm not an expert in R, but I've been dealing with it for a while and have learned some good functions for this purpose.
Concatenate strings
Concatenation is done with the paste function. It takes the strings to concatenate as arguments separated by commas, as well as the separator character(s).
Blog
First Impressions and Thoughts on Rosalind Project
Although I signed up for Rosalind.info 8 months ago, I didn't really play around with it. But last week, after I learned about it in a BiGCaT science cafe, I was more interested than before and just started solving problems.
Each problem has a description of the context and of the problem itself, along with a sample input and output. Sometimes there are hints about the solution. What I did was write a solution that works for the sample and, hopefully, for the problem.
Tag: blog
Blog
Tags Cloud Sorted by Post Count for Jekyll Blogs without Plugins
Recently, I have been transferring my old posts from a Blogger blog to my new Jekyll blog, since I really like this way of blogging. But some features that I liked in Blogger weren't supported in Jekyll by default. I did some research and found a very nice way of generating a tag cloud for my blog.
Although I build my blog locally and then push to GitHub pages, I still try not to use a custom plugin.
Blog
Here I Am! Welcome!
Hello,
Through this blog, I will be learning (together with any visitors) about Bioinformatics, my special area of interest within biology, which I need to explore further and learn much more about. I just finished my first post, covering definitions of Bioinformatics given by various authorities. Later, I also want to cover the definitions of many principles used in Bioinformatics. Programming languages and statistical methods for Bioinformatics will also be topics of my posts. At the same time, I plan to include news about Bioinformatics and to follow (and share) the latest developments through it.
Tag: cloud
Blog
Tags Cloud Sorted by Post Count for Jekyll Blogs without Plugins
Recently, I have been transferring my old posts from a Blogger blog to my new Jekyll blog, since I really like this way of blogging. But some features that I liked in Blogger weren't supported in Jekyll by default. I did some research and found a very nice way of generating a tag cloud for my blog.
Although I build my blog locally and then push to GitHub pages, I still try not to use a custom plugin.
Tag: jekyll
Blog
Tags Cloud Sorted by Post Count for Jekyll Blogs without Plugins
Recently, I have been transferring my old posts from a Blogger blog to my new Jekyll blog, since I really like this way of blogging. But some features that I liked in Blogger weren't supported in Jekyll by default. I did some research and found a very nice way of generating a tag cloud for my blog.
Although I build my blog locally and then push to GitHub pages, I still try not to use a custom plugin.
Tag: tags
Blog
Tags Cloud Sorted by Post Count for Jekyll Blogs without Plugins
Recently, I have been transferring my old posts from a Blogger blog to my new Jekyll blog, since I really like this way of blogging. But some features that I liked in Blogger weren't supported in Jekyll by default. I did some research and found a very nice way of generating a tag cloud for my blog.
Although I build my blog locally and then push to GitHub pages, I still try not to use a custom plugin.
Tag: tags cloud
Blog
Tags Cloud Sorted by Post Count for Jekyll Blogs without Plugins
Recently, I have been transferring my old posts from a Blogger blog to my new Jekyll blog, since I really like this way of blogging. But some features that I liked in Blogger weren't supported in Jekyll by default. I did some research and found a very nice way of generating a tag cloud for my blog.
Although I build my blog locally and then push to GitHub pages, I still try not to use a custom plugin.
Tag: database diagram
Blog
How to Generate Database EER Diagrams from SQL Scripts using MySQL Workbench
MySQL Workbench makes it really easy to generate EER diagrams from SQL scripts. Follow the steps below to make one yourself.
Download and install MySQL Workbench for your system.
See the simple SQL commands below; later I'll use them to generate a sample diagram.
create table country (
    id integer primary key,
    name CHAR(55));

create table city (
    id integer primary key,
    country_id integer,
    name CHAR(55),
    foreign key (country_id) references country(id));
Open MySQL Workbench and create a new model (File -> New Model).
Tag: eer diagram
Blog
How to Generate Database EER Diagrams from SQL Scripts using MySQL Workbench
MySQL Workbench makes it really easy to generate EER diagrams from SQL scripts. Follow the steps below to make one yourself.
Download and install MySQL Workbench for your system.
See the simple SQL commands below; later I'll use them to generate a sample diagram.
create table country (
    id integer primary key,
    name CHAR(55));

create table city (
    id integer primary key,
    country_id integer,
    name CHAR(55),
    foreign key (country_id) references country(id));
Open MySQL Workbench and create a new model (File -> New Model).
Tag: mysql
Blog
How to Generate Database EER Diagrams from SQL Scripts using MySQL Workbench
MySQL Workbench makes it really easy to generate EER diagrams from SQL scripts. Follow the steps below to make one yourself.
Download and install MySQL Workbench for your system.
See the simple SQL commands below; later I'll use them to generate a sample diagram.
create table country (
    id integer primary key,
    name CHAR(55));

create table city (
    id integer primary key,
    country_id integer,
    name CHAR(55),
    foreign key (country_id) references country(id));
Open MySQL Workbench and create a new model (File -> New Model).
Blog
Get Size of MySQL Databases
Use the query below in the MySQL command prompt to get a table of databases and their sizes in MB.
SELECT table_schema "DB Name",
       Round(Sum(data_length + index_length) / 1024 / 1024, 1) "DB Size in MB"
FROM information_schema.tables
GROUP BY table_schema;
Blog
Configuring Mezzanine for Apache server & mod_wsgi in AWS
Install [Mezzanine]({% post_url 2015-05-01-how-to-install-mezzanine-on-ubuntulinux-mint %}), [Apache server]({% post_url 2015-05-08-getting-started-with-your-aws-instance-and %}) and mod_wsgi:
sudo apt-get install libapache2-mod-wsgi
sudo a2enmod wsgi
Set up a MySQL database for your Mezzanine project
Read [my post on how to set up a MySQL database for a Mezzanine project]({% post_url 2015-05-09-how-to-set-up-a-mysql-database-for-a-mezzanine %})
Collect static files:
python manage.py collectstatic
Configure your Apache server configuration for the project like the following:
WSGIPythonPath /home/ubuntu/www/mezzanine-project
<VirtualHost *:80>
    #ServerName example.com
    ServerAdmin admin@example.com
    DocumentRoot /home/ubuntu/www/mezzanine-project
    WSGIScriptAlias / /home/ubuntu/www/mezzanine-project/wsgi.
Blog
How to Set Up a MySQL Database for a Mezzanine Project
Install MySQL server and python-mysqldb package:
sudo apt-get install mysql-server
sudo apt-get install python-mysqldb
Run MySQL:
mysql -u root -p
Create a database:
mysql> create database mezzanine_project;
Confirm it:
mysql> show databases;
Exit:
mysql> exit
Configure local_settings.py:
cd path/to/your/mezzanine/project
nano local_settings.py
Like the following:
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mezzanine_project",
        "USER": "root",
        "PASSWORD": "123456",
        "HOST": "",
        "PORT": "",
    }
}
Note: replace the password with your own.
Blog
How to Clear (or Drop) DB Table of A Django App
Let's say you created a Django app, ran python manage.py syncdb and created its table. Every time you make a change to the table, you'll need to drop that table and run python manage.py syncdb again to update it. Here is how you drop a table of a Django app:
$ python manage.py sqlclear app_name | python manage.py dbshell
Drop the tables of an app with migrations (Django >= 1.8):
$ python manage.py migrate appname zero
Recreate all the tables:
Blog
Install Apache2, PHP5, MySQL & phpMyAdmin on Ubuntu 12.04
First, install apache2:
sudo apt-get install apache2
Then, for it to work:
sudo service apache2 restart
For a custom www folder:
sudo cp /etc/apache2/sites-available/default /etc/apache2/sites-available/www
gksudo gedit /etc/apache2/sites-available/www
Change the DocumentRoot and Directory directives to point to the new location, for example /home/user/www/
Save and see (link here clean URLs not working Laravel 4)
Make www the default and disable the old default:
sudo a2dissite default && sudo a2ensite www
sudo service apache2 restart
Create a new file in www
Blog
Retrieving Data with AJAX using jQuery, PHP and MySQL
Last semester, I took a course from the Informatics Institute at METU called "Biological Databases and Data Analysis Tools", where we first learned what a database is and how to query it, along with the technology behind databases. Then we learned about the many biological databases and data analysis tools available, including gene, protein and pathway databases, and tools for creating databases.
As a final project, we were asked to create an online tool that can search a database and get the data and display it on any web browsers.
Tag: mysql workbench
Blog
How to Generate Database EER Diagrams from SQL Scripts using MySQL Workbench
MySQL Workbench makes it really easy to generate EER diagrams from SQL scripts. Follow below steps to make one for yourself.
Download and install MySQL Workbench for your system.
See below simple SQL commands, later I’ll use them to generate a sample diagram.
1create table country ( 2 id integer primary key, 3 name CHAR(55)); 4 5create table city ( 6 id integer primary key, 7 country_id integer, 8 name CHAR(55), 9 foreign key (country_id) references country(id)); Open MySQL Workbench and create a new model (File -> New Model).
Tag: sql
Blog
How to Generate Database EER Diagrams from SQL Scripts using MySQL Workbench
MySQL Workbench makes it really easy to generate EER diagrams from SQL scripts. Follow below steps to make one for yourself.
Download and install MySQL Workbench for your system.
See below simple SQL commands, later I’ll use them to generate a sample diagram.
1create table country ( 2 id integer primary key, 3 name CHAR(55)); 4 5create table city ( 6 id integer primary key, 7 country_id integer, 8 name CHAR(55), 9 foreign key (country_id) references country(id)); Open MySQL Workbench and create a new model (File -> New Model).
Tag: awk
Blog
Replace Entire Column with a Number in Bash
Use the awk one-liner below to replace all values in a column (the 5th column in this example) with a value (1 in this example). Note the single quotes: with double quotes, the shell would expand $5 before awk sees it.
awk '{$5=1} {print}' filename > filename.replaced
Tag: one-liner
Blog
Replace Entire Column with a Number in Bash
Use the awk one-liner below to replace all values in a column (the 5th column in this example) with a value (1 in this example). Note the single quotes: with double quotes, the shell would expand $5 before awk sees it.
awk '{$5=1} {print}' filename > filename.replaced
Tag: replace column
Blog
Replace Entire Column with a Number in Bash
Use the awk one-liner below to replace all values in a column (the 5th column in this example) with a value (1 in this example). Note the single quotes: with double quotes, the shell would expand $5 before awk sees it.
awk '{$5=1} {print}' filename > filename.replaced
Tag: config
Blog
Make a Shortcut for SSH Connections
It can be really annoying to reenter the host name again and again if you are working over ssh and the host name is really long (e.g. mistral.ii.metu.edu.tr). Using this method, you can set a shortcut for the host name (e.g. mistral) and use it whenever you connect.
Open ~/.ssh/config for editing:
subl ~/.ssh/config
Add your host definition as follows:
Host mistral
    HostName mistral.ii.metu.edu.tr
    User gbudak
Tag: connections
Blog
Make a Shortcut for SSH Connections
It can be really annoying to reenter the host name again and again if you are working over ssh and the host name is really long (e.g. mistral.ii.metu.edu.tr). Using this method, you can set a shortcut for the host name (e.g. mistral) and use it whenever you connect.
Open ~/.ssh/config for editing:
subl ~/.ssh/config
Add your host definition as follows:
Host mistral
    HostName mistral.ii.metu.edu.tr
    User gbudak
Tag: make
Blog
Make a Shortcut for SSH Connections
It can be really annoying to reenter the host name again and again if you are working over ssh and the host name is really long (e.g. mistral.ii.metu.edu.tr). Using this method, you can set a shortcut for the host name (e.g. mistral) and use it whenever you connect.
Open ~/.ssh/config for editing:
subl ~/.ssh/config
Add your host definition as follows:
Host mistral
    HostName mistral.ii.metu.edu.tr
    User gbudak
Tag: shortcuts
Blog
Make a Shortcut for SSH Connections
It can be really annoying to reenter the host name again and again if you are working over ssh and the host name is really long (e.g. mistral.ii.metu.edu.tr). Using this method, you can set a shortcut for the host name (e.g. mistral) and use it whenever you connect.
Open ~/.ssh/config for editing:
subl ~/.ssh/config
Add your host definition as follows:
Host mistral
    HostName mistral.ii.metu.edu.tr
    User gbudak
Tag: ssh
Blog
Make a Shortcut for SSH Connections
It can be really annoying to reenter the host name again and again if you are working over ssh and the host name is really long (e.g. mistral.ii.metu.edu.tr). Using this method, you can set a shortcut for the host name (e.g. mistral) and use it whenever you connect.
Open ~/.ssh/config for editing:
subl ~/.ssh/config
Add your host definition as follows:
Host mistral
    HostName mistral.ii.metu.edu.tr
    User gbudak
Blog
Passwordless SSH for Mac/Linux
You don't have to enter the ssh password every time you make a connection. Use the method below to generate a key, copy it to the host you want to connect to, and connect anytime without entering your password.
Generate a key:
ssh-keygen
Copy the key to the remote host:
ssh-copy-id root@linuxconfig.org
Blog
Uploading Files to AWS using SSH/SCP
Here is a small command for uploading files to AWS using SSH's scp (secure copy) command.
scp -i path/to/your/key-pairs/file path/to/file/you/want/to/upload ubuntu@PUBLIC_DNS:path/to/the/destination
Blog
AWS Start an Instance and Connect to it
Go to EC2 management console
Create a new key-pair if necessary and download it
Launch an instance
Add HTTP security group for web applications over HTTP
Get public DNS
Change permissions on key-pair file:
chmod 400 path/to/your/file.pem
Connect:
ssh -i path/to/your/file.pem ubuntu@PUBLIC_DNS
Note: ubuntu is the user for connecting to an Ubuntu 64-bit instance; it's different for other images.
Tag: mac
Blog
Passwordless SSH for Mac/Linux
You don't have to enter the ssh password every time you make a connection. Use the method below to generate a key, copy it to the host you want to connect to, and connect anytime without entering your password.
Generate a key:
ssh-keygen
Copy the key to the remote host:
ssh-copy-id root@linuxconfig.org
Tag: passwordless
Blog
Passwordless SSH for Mac/Linux
You don't have to enter the ssh password every time you make a connection. Use the method below to generate a key, copy it to the host you want to connect to, and connect anytime without entering your password.
Generate a key:
ssh-keygen
Copy the key to the remote host:
ssh-copy-id root@linuxconfig.org
Tag: compute significance
Blog
Computing Significance of Overlap between Two Sets using Hypergeometric Test
There are many cases where we have two sets of things (e.g. under two different conditions) such as transcripts, genes or proteins, and we want to compute the significance of the overlap between them. The hypergeometric test is a very simple and widely used option for such cases.
I'll use the phyper function in R, but you can use the same idea in SciPy (Python).
Let's say you have 200 genes (A);
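The excerpt cuts off before the actual numbers, but the upper-tail probability that phyper (or scipy.stats.hypergeom) computes can be sketched with just Python's standard library; the set sizes below are made up for illustration:

```python
from math import comb

def overlap_pvalue(N, n_a, n_b, k):
    """P(overlap >= k) for sets of sizes n_a and n_b drawn from a
    universe of N items: the hypergeometric upper tail, matching
    R's phyper(k - 1, n_a, N - n_a, n_b, lower.tail = FALSE)."""
    return sum(
        comb(n_a, i) * comb(N - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    ) / comb(N, n_b)

# made-up example: a universe of 20000 genes, sets of 200 and 300,
# and an observed overlap of 20 genes
p = overlap_pvalue(20000, 200, 300, 20)
print(p)  # far below 0.05: the overlap is much larger than expected by chance
```

The expected overlap by chance here is about 200 * 300 / 20000 = 3 genes, so an overlap of 20 is highly significant.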
Tag: gene sets
Blog
Computing Significance of Overlap between Two Sets using Hypergeometric Test
There are many cases where we have two sets of things (e.g. under two different conditions) such as transcripts, genes or proteins, and we want to compute the significance of the overlap between them. The hypergeometric test is a very simple and widely used option for such cases.
I'll use the phyper function in R, but you can use the same idea in SciPy (Python).
Let's say you have 200 genes (A);
Tag: genes
Blog
Computing Significance of Overlap between Two Sets using Hypergeometric Test
There are many cases where we have two sets of things (e.g. under two different conditions) such as transcripts, genes or proteins, and we want to compute the significance of the overlap between them. The hypergeometric test is a very simple and widely used option for such cases.
I'll use the phyper function in R, but you can use the same idea in SciPy (Python).
Let's say you have 200 genes (A);
Blog
An Exon of Length 2 Appeared in Ensembl
I want to share an interesting finding from our research on exon/intron analysis of human evolutionary history.
I had the genes that emerged at each pass point of human history, and I was using the Ensembl API to get the exons and introns of these genes to perform further analyses.
One gene (ENSG00000197568 - HERV-H LTR-associating 3 - HHLA3) held a surprise: one of its transcripts (ENST00000432224) had an exon (ENSE00001707577) of length 2.
Tag: hypergeometric distribution
Blog
Computing Significance of Overlap between Two Sets using Hypergeometric Test
There are many cases where we have two sets of things (e.g. under two different conditions) such as transcripts, genes or proteins, and we want to compute the significance of the overlap between them. The hypergeometric test is a very simple and widely used option for such cases.
I'll use the phyper function in R, but you can use the same idea in SciPy (Python).
Let's say you have 200 genes (A);
Tag: hypergeometric test
Blog
Computing Significance of Overlap between Two Sets using Hypergeometric Test
There are many cases where we have two sets of things (e.g. under two different conditions) such as transcripts, genes or proteins, and we want to compute the significance of the overlap between them. The hypergeometric test is a very simple and widely used option for such cases.
I'll use the phyper function in R, but you can use the same idea in SciPy (Python).
Let's say you have 200 genes (A);
Tag: p-value
Blog
Computing Significance of Overlap between Two Sets using Hypergeometric Test
There are many cases where we have two sets of things (e.g. under two different conditions) such as transcripts, genes or proteins, and we want to compute the significance of the overlap between them. The hypergeometric test is a very simple and widely used option for such cases.
I'll use the phyper function in R, but you can use the same idea in SciPy (Python).
Let's say you have 200 genes (A);
Tag: phyper
Blog
Computing Significance of Overlap between Two Sets using Hypergeometric Test
There are many cases where we have two sets of things (e.g. under two different conditions) such as transcripts, genes or proteins, and we want to compute the significance of the overlap between them. The hypergeometric test is a very simple and widely used option for such cases.
I'll use the phyper function in R, but you can use the same idea in SciPy (Python).
Let's say you have 200 genes (A);
Tag: proteins
Blog
Computing Significance of Overlap between Two Sets using Hypergeometric Test
There are many cases where we have two sets of things (e.g. under two different conditions) such as transcripts, genes or proteins, and we want to compute the significance of the overlap between them. The hypergeometric test is a very simple and widely used option for such cases.
I'll use the phyper function in R, but you can use the same idea in SciPy (Python).
Let's say you have 200 genes (A);
Tag: sets
Blog
Computing Significance of Overlap between Two Sets using Hypergeometric Test
There are many cases where we have two sets of things (e.g. under two different conditions) such as transcripts, genes or proteins, and we want to compute the significance of the overlap between them. The hypergeometric test is a very simple and widely used option for such cases.
I'll use the phyper function in R, but you can use the same idea in SciPy (Python).
Let's say you have 200 genes (A);
Tag: transcripts
Blog
Computing Significance of Overlap between Two Sets using Hypergeometric Test
There are many cases where we have two sets of things (e.g. under two different conditions) such as transcripts, genes or proteins, and we want to compute the significance of the overlap between them. The hypergeometric test is a very simple and widely used option for such cases.
I’ll use the phyper function in R but you can use the same idea in SciPy (Python).
Let’s say you have 200 genes (A);
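The upper-tail overlap p-value described above can be sketched in plain Python. This is a minimal example, not the post's actual code: the universe size of 20,000 genes and the second set of 100 genes are assumed for illustration. The function mirrors what phyper(overlap - 1, m, n, k, lower.tail = FALSE) computes in R.

```python
from math import comb

def overlap_pvalue(overlap, set_a, set_b, universe):
    """P(X >= overlap) when drawing set_b items from a universe that
    contains set_a 'successes' (hypergeometric upper tail)."""
    total = comb(universe, set_b)
    upper = min(set_a, set_b)
    return sum(comb(set_a, k) * comb(universe - set_a, set_b - k)
               for k in range(overlap, upper + 1)) / total

# e.g. 40 shared genes between a set of 200 (A) and a set of 100 (B)
# drawn from an assumed universe of 20,000 genes
p = overlap_pvalue(40, 200, 100, 20000)
```

With an expected overlap of only 200 * 100 / 20000 = 1 gene by chance, observing 40 shared genes yields a vanishingly small p-value.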
Blog
An Exon of Length 2 Appeared in Ensembl
I want to share an interesting finding from our research on the exon/intron analysis of human evolutionary history.
I had the genes that emerged at each point of human evolutionary history, and I was using the Ensembl API to get the exons and introns of these genes to perform further analyses.
There was one gene (ENSG00000197568 - HERV-H LTR-associating 3 - HHLA3) with a surprise: one of its transcripts (ENST00000432224) had an exon (ENSE00001707577) of length 2.
Tag: 20. yıl
Blog
The METU Informatics Institute's 20th Anniversary Event
The METU (ODTÜ) Informatics Institute is celebrating the 20th anniversary of its founding with a science festival. Everyone is invited to the festival, which will take place on May 16, 2016 at the METU Culture and Convention Center!
The festival, where science will be accompanied by art and music, will feature the following keynote speakers:
Prof. Dr. Jennifer Hayes: managing director and co-founder, Microsoft Research New England and Microsoft Research New York; Assoc. Prof. Claudio Ferretti: Computer Science, Systems and Communication, University of Milano-Bicocca; Dr. Christian Borgs: researcher, deputy managing director and co-founder, Microsoft Research New England. Click to go to the event's Facebook page.
Tag: bilim festivali
Blog
The METU Informatics Institute's 20th Anniversary Event
The METU (ODTÜ) Informatics Institute is celebrating the 20th anniversary of its founding with a science festival. Everyone is invited to the festival, which will take place on May 16, 2016 at the METU Culture and Convention Center!
The festival, where science will be accompanied by art and music, will feature the following keynote speakers:
Prof. Dr. Jennifer Hayes: managing director and co-founder, Microsoft Research New England and Microsoft Research New York; Assoc. Prof. Claudio Ferretti: Computer Science, Systems and Communication, University of Milano-Bicocca; Dr. Christian Borgs: researcher, deputy managing director and co-founder, Microsoft Research New England. Click to go to the event's Facebook page.
Tag: enformatik
Blog
The METU Informatics Institute's 20th Anniversary Event
The METU (ODTÜ) Informatics Institute is celebrating the 20th anniversary of its founding with a science festival. Everyone is invited to the festival, which will take place on May 16, 2016 at the METU Culture and Convention Center!
The festival, where science will be accompanied by art and music, will feature the following keynote speakers:
Prof. Dr. Jennifer Hayes: managing director and co-founder, Microsoft Research New England and Microsoft Research New York; Assoc. Prof. Claudio Ferretti: Computer Science, Systems and Communication, University of Milano-Bicocca; Dr. Christian Borgs: researcher, deputy managing director and co-founder, Microsoft Research New England. Click to go to the event's Facebook page.
Blog
"Biyoinformatik" or "Biyoenformatik"?
While looking for topics for my posts, I browse the internet along with books. There are of course plenty of foreign-language resources, and they are sufficient, but when I looked at Turkish resources, the first thing that caught my eye was the different forms of this field's name.
As you know, in English this field is called bioinformatics. That is quite natural, because in English informatics comes from the word information with the -ics suffix, and that word has a Latin origin1. The word came into Turkish from the French informatique as enformatik, and bilişim has also been proposed as a Turkish equivalent2. Of course, this French word shares the same root as its English counterpart.
Tag: enformatik enstitüsü
Blog
The METU Informatics Institute's 20th Anniversary Event
The METU (ODTÜ) Informatics Institute is celebrating the 20th anniversary of its founding with a science festival. Everyone is invited to the festival, which will take place on May 16, 2016 at the METU Culture and Convention Center!
The festival, where science will be accompanied by art and music, will feature the following keynote speakers:
Prof. Dr. Jennifer Hayes: managing director and co-founder, Microsoft Research New England and Microsoft Research New York; Assoc. Prof. Claudio Ferretti: Computer Science, Systems and Communication, University of Milano-Bicocca; Dr. Christian Borgs: researcher, deputy managing director and co-founder, Microsoft Research New England. Click to go to the event's Facebook page.
Blog
7th International Symposium on Health Informatics and Bioinformatics
The 7th International Symposium on Health Informatics and Bioinformatics (HIBIT 2012) was first organized in 2005 by the METU Informatics Institute. It aims to bring together academics and researchers in the fields of Health Informatics, Medical Informatics, Computational Biology and Bioinformatics, to provide a venue for presenting work in these fields, and to allow interactive discussion of that work.
This year, HIBIT 2012 will be held on April 19-22, 2012 at the Perissia Hotel in Ürgüp, Nevşehir, organized in partnership by METU, the METU Informatics Institute, the METU Department of Biological Sciences and the METU Department of Computer Engineering.
Tag: microsoft
Blog
The METU Informatics Institute's 20th Anniversary Event
The METU (ODTÜ) Informatics Institute is celebrating the 20th anniversary of its founding with a science festival. Everyone is invited to the festival, which will take place on May 16, 2016 at the METU Culture and Convention Center!
The festival, where science will be accompanied by art and music, will feature the following keynote speakers:
Prof. Dr. Jennifer Hayes: managing director and co-founder, Microsoft Research New England and Microsoft Research New York; Assoc. Prof. Claudio Ferretti: Computer Science, Systems and Communication, University of Milano-Bicocca; Dr. Christian Borgs: researcher, deputy managing director and co-founder, Microsoft Research New England. Click to go to the event's Facebook page.
Tag: odtü
Blog
The METU Informatics Institute's 20th Anniversary Event
The METU (ODTÜ) Informatics Institute is celebrating the 20th anniversary of its founding with a science festival. Everyone is invited to the festival, which will take place on May 16, 2016 at the METU Culture and Convention Center!
The festival, where science will be accompanied by art and music, will feature the following keynote speakers:
Prof. Dr. Jennifer Hayes: managing director and co-founder, Microsoft Research New England and Microsoft Research New York; Assoc. Prof. Claudio Ferretti: Computer Science, Systems and Communication, University of Milano-Bicocca; Dr. Christian Borgs: researcher, deputy managing director and co-founder, Microsoft Research New England. Click to go to the event's Facebook page.
Blog
7th International Symposium on Health Informatics and Bioinformatics
The 7th International Symposium on Health Informatics and Bioinformatics (HIBIT 2012) was first organized in 2005 by the METU Informatics Institute. It aims to bring together academics and researchers in the fields of Health Informatics, Medical Informatics, Computational Biology and Bioinformatics, to provide a venue for presenting work in these fields, and to allow interactive discussion of that work.
This year, HIBIT 2012 will be held on April 19-22, 2012 at the Perissia Hotel in Ürgüp, Nevşehir, organized in partnership by METU, the METU Informatics Institute, the METU Department of Biological Sciences and the METU Department of Computer Engineering.
Tag: code
Blog
Mann Whitney U Test (Wilcoxon Rank-Sum Test) Javascript Implementation
JavaScript is currently really poor in statistical methods compared to Python (SciPy) and R. There are several efforts to fill this gap, most notably jStat. However, many functions, distributions and tests are still missing from this library. In one of my projects, I had to implement a JavaScript version of the Mann-Whitney U test (also called the Wilcoxon rank-sum test). Here, I'm giving a link to its source code and describing how it works.
Tag: html
Blog
Mann Whitney U Test (Wilcoxon Rank-Sum Test) Javascript Implementation
JavaScript is currently really poor in statistical methods compared to Python (SciPy) and R. There are several efforts to fill this gap, most notably jStat. However, many functions, distributions and tests are still missing from this library. In one of my projects, I had to implement a JavaScript version of the Mann-Whitney U test (also called the Wilcoxon rank-sum test). Here, I'm giving a link to its source code and describing how it works.
Tag: implementation
Blog
Mann Whitney U Test (Wilcoxon Rank-Sum Test) Javascript Implementation
JavaScript is currently really poor in statistical methods compared to Python (SciPy) and R. There are several efforts to fill this gap, most notably jStat. However, many functions, distributions and tests are still missing from this library. In one of my projects, I had to implement a JavaScript version of the Mann-Whitney U test (also called the Wilcoxon rank-sum test). Here, I'm giving a link to its source code and describing how it works.
Tag: javascript
Blog
Mann Whitney U Test (Wilcoxon Rank-Sum Test) Javascript Implementation
JavaScript is currently really poor in statistical methods compared to Python (SciPy) and R. There are several efforts to fill this gap, most notably jStat. However, many functions, distributions and tests are still missing from this library. In one of my projects, I had to implement a JavaScript version of the Mann-Whitney U test (also called the Wilcoxon rank-sum test). Here, I'm giving a link to its source code and describing how it works.
Tag: mann whitney u
Blog
Mann Whitney U Test (Wilcoxon Rank-Sum Test) Javascript Implementation
JavaScript is currently really poor in statistical methods compared to Python (SciPy) and R. There are several efforts to fill this gap, most notably jStat. However, many functions, distributions and tests are still missing from this library. In one of my projects, I had to implement a JavaScript version of the Mann-Whitney U test (also called the Wilcoxon rank-sum test). Here, I'm giving a link to its source code and describing how it works.
Tag: scipy
Blog
Mann Whitney U Test (Wilcoxon Rank-Sum Test) Javascript Implementation
JavaScript is currently really poor in statistical methods compared to Python (SciPy) and R. There are several efforts to fill this gap, most notably jStat. However, many functions, distributions and tests are still missing from this library. In one of my projects, I had to implement a JavaScript version of the Mann-Whitney U test (also called the Wilcoxon rank-sum test). Here, I'm giving a link to its source code and describing how it works.
Tag: statistical tests
Blog
Mann Whitney U Test (Wilcoxon Rank-Sum Test) Javascript Implementation
JavaScript is currently really poor in statistical methods compared to Python (SciPy) and R. There are several efforts to fill this gap, most notably jStat. However, many functions, distributions and tests are still missing from this library. In one of my projects, I had to implement a JavaScript version of the Mann-Whitney U test (also called the Wilcoxon rank-sum test). Here, I'm giving a link to its source code and describing how it works.
Tag: statistics
Blog
Mann Whitney U Test (Wilcoxon Rank-Sum Test) Javascript Implementation
JavaScript is currently really poor in statistical methods compared to Python (SciPy) and R. There are several efforts to fill this gap, most notably jStat. However, many functions, distributions and tests are still missing from this library. In one of my projects, I had to implement a JavaScript version of the Mann-Whitney U test (also called the Wilcoxon rank-sum test). Here, I'm giving a link to its source code and describing how it works.
Blog
First Impressions and Thoughts on Rosalind Project
Actually, I signed up for Rosalind.info 8 months ago, but I didn't really play around with it. Last week, however, after learning more about it at a BiGCaT science cafe, I was more interested than before and just started solving problems.
In each problem, you have a description of the context and of the problem itself. There is also a sample input and output, and sometimes there are hints about the solution. What I did was write a solution that works for the sample and, hopefully, for the problem.
Tag: wilcoxon rank-sum
Blog
Mann Whitney U Test (Wilcoxon Rank-Sum Test) Javascript Implementation
JavaScript is currently really poor in statistical methods compared to Python (SciPy) and R. There are several efforts to fill this gap, most notably jStat. However, many functions, distributions and tests are still missing from this library. In one of my projects, I had to implement a JavaScript version of the Mann-Whitney U test (also called the Wilcoxon rank-sum test). Here, I'm giving a link to its source code and describing how it works.
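As a sketch of what such an implementation involves, here is a minimal Python version of the test. This is not the linked JavaScript source; it uses the standard normal approximation without the tie correction, which is the same recipe a small JavaScript port would typically follow.

```python
from math import sqrt, erf

def rank_data(values):
    """1-based ranks, with tied values receiving the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(xs, ys):
    """Two-sided Mann-Whitney U test, normal approximation, no tie correction."""
    n1, n2 = len(xs), len(ys)
    ranks = rank_data(list(xs) + list(ys))
    r1 = sum(ranks[:n1])
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)          # smaller of U1 and U2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma               # z <= 0 since u <= mu
    p = 2 * 0.5 * (1 + erf(z / sqrt(2)))  # two-sided p from the normal CDF
    return u, min(p, 1.0)
```

For small samples an exact permutation distribution is preferable; the normal approximation above is only reasonable for larger groups.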
Tag: install r package from source
Blog
MiClip 1.3 Installation
MiClip is a CLIP-seq peak calling algorithm implemented in R. It currently doesn't show up on CRAN, but you can obtain it from the archive and install it from the source tar.gz file.
Download the tar.gz file:
wget https://cran.r-project.org/src/contrib/Archive/MiClip/MiClip_1.3.tar.gz
Start R:
R
Install dependencies:
install.packages("moments")
install.packages("VGAM")
Finally, install MiClip 1.3:
install.packages("MiClip_1.3.tar.gz", repos = NULL, type = "source")
Then you can test it by loading the package and viewing its help file.
Tag: miclip
Blog
MiClip 1.3 Installation
MiClip is a CLIP-seq peak calling algorithm implemented in R. It currently doesn't show up on CRAN, but you can obtain it from the archive and install it from the source tar.gz file.
Download the tar.gz file:
wget https://cran.r-project.org/src/contrib/Archive/MiClip/MiClip_1.3.tar.gz
Start R:
R
Install dependencies:
install.packages("moments")
install.packages("VGAM")
Finally, install MiClip 1.3:
install.packages("MiClip_1.3.tar.gz", repos = NULL, type = "source")
Then you can test it by loading the package and viewing its help file.
Tag: mol to svg
Blog
Generating 2D SVG Images of MOL Files using RDKit Transparent Background
The latest release of RDKit (2015-03) can generate SVG images with a few lines of code, but by default the generated SVG image has a white background. Digging into the sources didn't solve my problem, as I couldn't find any option for setting the background to transparent.
An example of SVG image generation can be found in the RDKit blog post called New Drawing Code.
Input cell [3] shows the SVG image generation; it returns the SVG file content as XML.
Tag: rdkit
Blog
Generating 2D SVG Images of MOL Files using RDKit Transparent Background
The latest release of RDKit (2015-03) can generate SVG images with a few lines of code, but by default the generated SVG image has a white background. Digging into the sources didn't solve my problem, as I couldn't find any option for setting the background to transparent.
An example of SVG image generation can be found in the RDKit blog post called New Drawing Code.
Input cell [3] shows the SVG image generation; it returns the SVG file content as XML.
Blog
Install RDKit 2015-03 Build on Ubuntu 14.04 / Linux Mint 17
RDKit is an open source toolkit for cheminformatics. It has many functionalities for working with chemical files.
Follow the guide below to install the RDKit 2015-03 build on an Ubuntu 14.04 / Linux Mint 17 computer. Since the Ubuntu packages don't have the latest RDKit for trusty, you have to build RDKit from its source.
Install Dependencies
sudo apt-get install flex bison build-essential python-numpy cmake python-dev sqlite3 libsqlite3-dev libboost1.54-all-dev
Download the Build
Tag: svg
Blog
Generating 2D SVG Images of MOL Files using RDKit Transparent Background
The latest release of RDKit (2015-03) can generate SVG images with a few lines of code, but by default the generated SVG image has a white background. Digging into the sources didn't solve my problem, as I couldn't find any option for setting the background to transparent.
An example of SVG image generation can be found in the RDKit blog post called New Drawing Code.
Input cell [3] shows the SVG image generation; it returns the SVG file content as XML.
Tag: transparent background
Blog
Generating 2D SVG Images of MOL Files using RDKit Transparent Background
The latest release of RDKit (2015-03) can generate SVG images with a few lines of code, but by default the generated SVG image has a white background. Digging into the sources didn't solve my problem, as I couldn't find any option for setting the background to transparent.
An example of SVG image generation can be found in the RDKit blog post called New Drawing Code.
Input cell [3] shows the SVG image generation; it returns the SVG file content as XML.
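Since the drawing code offers no transparency option, one workaround is to post-process the SVG text and drop the white background rectangle before saving it. The sketch below assumes the background is a self-closing <rect> filled with #FFFFFF; the sample markup is an illustration, not RDKit's documented output.

```python
import re

def make_svg_transparent(svg_text):
    # Remove self-closing <rect .../> elements whose markup contains a
    # white (#FFFFFF) fill, given either as an attribute or inline style.
    return re.sub(r"<rect[^>]*#[Ff]{6}[^>]*/>", "", svg_text)

# a simplified stand-in for generated SVG content
svg = ('<svg xmlns="http://www.w3.org/2000/svg">'
       '<rect style="opacity:1.0;fill:#FFFFFF" width="300" height="300"/>'
       '<path d="M 10 10 L 20 20"/></svg>')
print(make_svg_transparent(svg))
```

Note that the pattern would also remove any other rect mentioning #FFFFFF; for production use, a real XML parser would be a safer choice than a regular expression.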
Tag: cairo
Blog
Install Cairo Graphics and PyCairo on Ubuntu 14.04 / Linux Mint 17
Cairo is a 2D graphics library written in the C programming language, but if you'd like to use it from Python, you should also install the Python bindings for Cairo.
This guide goes through the installation of the Cairo graphics library version 1.14.2 (the most recent) and the py2cairo Python bindings version 1.10.1 (also the most recent).
Install Cairo
It's very easy with the following repository: just add it, update your packages and install.
Tag: cairo 1.14.2
Blog
Install Cairo Graphics and PyCairo on Ubuntu 14.04 / Linux Mint 17
Cairo is a 2D graphics library written in the C programming language, but if you'd like to use it from Python, you should also install the Python bindings for Cairo.
This guide goes through the installation of the Cairo graphics library version 1.14.2 (the most recent) and the py2cairo Python bindings version 1.10.1 (also the most recent).
Install Cairo
It's very easy with the following repository: just add it, update your packages and install.
Tag: cairo graphics
Blog
Install Cairo Graphics and PyCairo on Ubuntu 14.04 / Linux Mint 17
Cairo is a 2D graphics library written in the C programming language, but if you'd like to use it from Python, you should also install the Python bindings for Cairo.
This guide goes through the installation of the Cairo graphics library version 1.14.2 (the most recent) and the py2cairo Python bindings version 1.10.1 (also the most recent).
Install Cairo
It's very easy with the following repository: just add it, update your packages and install.
Tag: py2cairo
Blog
Install Cairo Graphics and PyCairo on Ubuntu 14.04 / Linux Mint 17
Cairo is a 2D graphics library written in the C programming language, but if you'd like to use it from Python, you should also install the Python bindings for Cairo.
This guide goes through the installation of the Cairo graphics library version 1.14.2 (the most recent) and the py2cairo Python bindings version 1.10.1 (also the most recent).
Install Cairo
It's very easy with the following repository: just add it, update your packages and install.
Tag: py2cairo 1.10.1
Blog
Install Cairo Graphics and PyCairo on Ubuntu 14.04 / Linux Mint 17
Cairo is a 2D graphics library written in the C programming language, but if you'd like to use it from Python, you should also install the Python bindings for Cairo.
This guide goes through the installation of the Cairo graphics library version 1.14.2 (the most recent) and the py2cairo Python bindings version 1.10.1 (also the most recent).
Install Cairo
It's very easy with the following repository: just add it, update your packages and install.
Tag: boost
Blog
Install RDKit 2015-03 Build on Ubuntu 14.04 / Linux Mint 17
RDKit is an open source toolkit for cheminformatics. It has many functionalities for working with chemical files.
Follow the guide below to install the RDKit 2015-03 build on an Ubuntu 14.04 / Linux Mint 17 computer. Since the Ubuntu packages don't have the latest RDKit for trusty, you have to build RDKit from its source.
Install Dependencies
sudo apt-get install flex bison build-essential python-numpy cmake python-dev sqlite3 libsqlite3-dev libboost1.54-all-dev
Download the Build
Tag: linux mint
Blog
Install RDKit 2015-03 Build on Ubuntu 14.04 / Linux Mint 17
RDKit is an open source toolkit for cheminformatics. It has many functionalities for working with chemical files.
Follow the guide below to install the RDKit 2015-03 build on an Ubuntu 14.04 / Linux Mint 17 computer. Since the Ubuntu packages don't have the latest RDKit for trusty, you have to build RDKit from its source.
Install Dependencies
sudo apt-get install flex bison build-essential python-numpy cmake python-dev sqlite3 libsqlite3-dev libboost1.54-all-dev
Download the Build
Blog
ClipCrop Installation on Linux Mint 16 nvm, Node, npm Included
ClipCrop is a tool for detecting structural variations from SAM files, and it's built with Node.js.
ClipCrop uses two pieces of software internally, so they should be installed first.
Install SHRiMP2
SHRiMP is a software package for aligning genomic reads against a target genome.
$ mkdir ~/software
$ cd ~/software
$ wget http://compbio.cs.toronto.edu/shrimp/releases/SHRiMP_2_2_3.lx26.x86_64.tar.gz
$ tar xzvf SHRiMP_2_2_3.lx26.x86_64.tar.gz
$ cd SHRiMP_2_2_3
$ file bin/gmapper
$ export SHRIMP_FOLDER=$PWD
Install BWA
BWA is a software package for mapping low-divergent sequences against a large reference genome.
Tag: ubuntu
Blog
Install RDKit 2015-03 Build on Ubuntu 14.04 / Linux Mint 17
RDKit is an open source toolkit for cheminformatics. It has many functionalities for working with chemical files.
Follow the guide below to install the RDKit 2015-03 build on an Ubuntu 14.04 / Linux Mint 17 computer. Since the Ubuntu packages don't have the latest RDKit for trusty, you have to build RDKit from its source.
Install Dependencies
sudo apt-get install flex bison build-essential python-numpy cmake python-dev sqlite3 libsqlite3-dev libboost1.54-all-dev
Download the Build
Blog
How to Install Mezzanine on Ubuntu/Linux Mint [Complete Guide]
Mezzanine is a CMS application built on the Django web framework. The installation steps are easy, but your environment may not be suitable enough for it to work without problems. So here I'm going to describe a complete installation from scratch in a virtual environment.
First of all, install virtualenv:
$ sudo apt-get install python-virtualenv
Then, create a virtual environment:
$ virtualenv testenv
And activate it:
$ cd testenv
$ source bin/activate
Blog
Geany Color Schemes Ubuntu
There is a collection of color schemes for Geany as well.
Download it on GitHub and follow the instructions.
You’ll need to extract and copy all the files in the colorschemes directory to ~/.config/geany/colorschemes/
Then, restart Geany and go to View -> Editor -> Color Schemes and choose your style.
I’m using Tango.
Source
Blog
Install Geany 1.23 on Ubuntu
Geany is a really nice text editor for Ubuntu. I would recommend it together with the TreeBrowser plugin and some color schemes.
But you'll need the latest version, which is 1.23 for now.
To install this version you need to add a PPA; this will also keep it updated when you update your system.
Execute the following lines one by one:
sudo add-apt-repository ppa:geany-dev/ppa
sudo apt-get update
sudo apt-get install geany
Then, when you start Geany you'll see "This is Geany 1.
Blog
Install Apache2, PHP5, MySQL & phpMyAdmin on Ubuntu 12.04
First, install apache2:
sudo apt-get install apache2
Then, for it to work:
sudo service apache2 restart
For a custom www folder:
sudo cp /etc/apache2/sites-available/default /etc/apache2/sites-available/www
gksudo gedit /etc/apache2/sites-available/www
Change the DocumentRoot and Directory directives to point to the new location, for example /home/user/www/
Save and check the result (see the related post on clean URLs not working in Laravel 4)
Make www the default and disable default:
sudo a2dissite default && sudo a2ensite www
sudo service apache2 restart
Create a new file in www
Blog
Install Perl DBI Module on Ubuntu 12.04
On Terminal, run:
sudo apt-get install libdbi-perl Source
Blog
Start Ubuntu 12.04 Bluetooth Off
On Terminal:
sudo gedit /etc/rc.local
Add the following before the line "exit 0":
rfkill block bluetooth
Save
Source
Blog
Install Steam on Ubuntu 12.04
Download steam_latest.deb at:
http://repo.steampowered.com/steam/archive/precise/steam_latest.deb
Double-click it to open it in Ubuntu Software Center and click Install
It'll start a terminal and ask for your sudo password because some required packages need to be installed; enter your password and continue
Next, it'll update itself
Done
Source
Blog
Enable Hibernation for Lenovo Z500 on Ubuntu 12.04
Using a terminal, add this file:
sudo gedit /etc/polkit-1/localauthority/50-local.d/com.ubuntu.enable-hibernate.pkla
With this content:
[Re-enable hibernate by default]
Identity=unix-user:*
Action=org.freedesktop.upower.hibernate
ResultActive=yes
Save & reboot
Source
Blog
Install Spotify on Ubuntu 12.04
Start Software Sources from Dash Home
Add the following in the Other Sources tab:
deb http://repository.spotify.com stable non-free
Close Software Sources
Add the Spotify repo key on Terminal:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 94558F59
Install Spotify on Terminal:
sudo apt-get update && sudo apt-get install spotify-client
Find Spotify in Dash Home
Source
Blog
Enable Software Sources in Dash Home Ubuntu 12.04
First copy the software sources desktop file to your local applications folder:
mkdir -p ~/.local/share/applications
cp /usr/share/applications/software-properties-gtk.desktop ~/.local/share/applications/
Edit the file & change the line NoDisplay=true to NoDisplay=false:
gedit ~/.local/share/applications/software-properties-gtk.desktop
Save, log out and log in
Source
Blog
Save Brightness Settings Ubuntu 12.04 LTS
If your laptop starts with minimum or maximum brightness and you want a fixed default value, do the following:
Run a terminal and type the following to get the maximum brightness:
cat /sys/class/backlight/acpi_video0/max_brightness
Now set the brightness as you want and run the following, which gives you the value for the current setting:
cat /sys/class/backlight/acpi_video0/brightness
Edit /etc/rc.local to have that value as the default after each reboot / start:
sudo gedit /etc/rc.local
Add this line before exit 0:
Blog
Hotkeys (special keys) Volume/Brightness Controls Don't Work After Suspend
What seems to solve this problem on Ubuntu 12.04 LTS (Lenovo Z500):
Open this file:
sudo gedit /etc/default/grub
Modify the line like this:
GRUB_CMDLINE_LINUX="noapic"
Close it and run the following:
sudo update-grub
Restart your computer
Source
Blog
How To Make A File or Script Executable in Ubuntu
Start a terminal (CTRL + Alt + T can be used, or just go to Dash Home and type Terminal):
Run the command below:
sudo chmod +x /path/to/your/file
Source
Blog
Suspend Laptop When Lid Closed Ubuntu 12.04 LTS in Lenovo Z500
I guess this is a bug. Although suspend is set in the Power settings, the laptop doesn't suspend when its lid is closed.
To solve it, I found a workaround on the web. Here is how you implement it:
Create the folder if it's not present:
sudo mkdir /etc/acpi/local
Set its permissions:
sudo chmod 755 /etc/acpi/local
Create the script:
sudo gedit /etc/acpi/local/lid.sh.post
Copy-paste the following:
#!/bin/bash
grep -q closed /proc/acpi/button/lid/*/state
if [ $?
Blog
Install Ensembl API and BioPerl 1.2.3 on Your System
I’m going to work on a project that requires lots of queries on Ensembl databases, so I wanted to install the Ensembl API to begin with. Since it's programmed in Perl, I will be using Perl in this project.
There is a nice tutorial on the Ensembl website for the API installation. Here I will describe some of the steps.
1. Download the API and BioPerl
Go to the Ensembl FTP site ftp://ftp.ensembl.org/pub/ and download "ensembl-api.tar.gz", or click here
Tag: 2d image
Blog
Generating 2D Images of Molecules from MOL Files using Open Babel
Open Babel is a tool for working with molecular data in many ways: converting one format to another, analysis, molecular modeling, etc. It also has a method to convert MOL files into SVG or PNG images to represent them as 2D images.
Install Open Babel in Linux as follows, or go to their page for other operating systems:
sudo apt-get install openbabel
Open Babel uses the same command to generate SVG or PNG, and recognizes the file format from the filename given to the output option -O.
Tag: 2d molecule
Blog
Generating 2D Images of Molecules from MOL Files using Open Babel
Open Babel is a tool for working with molecular data in many ways: converting one format to another, analysis, molecular modeling, etc. It also has a method to convert MOL files into SVG or PNG images to represent them as 2D images.
Install Open Babel in Linux as follows, or go to their page for other operating systems:
sudo apt-get install openbabel
Open Babel uses the same command to generate SVG or PNG, and recognizes the file format from the filename given to the output option -O.
Tag: mol file
Blog
Generating 2D Images of Molecules from MOL Files using Open Babel
Open Babel is a tool for working with molecular data in many ways: converting one format to another, analysis, molecular modeling, etc. It also has a method to convert MOL files into SVG or PNG images to represent them as 2D images.
Install Open Babel in Linux as follows, or go to their page for other operating systems:
sudo apt-get install openbabel
Open Babel uses the same command to generate SVG or PNG, and recognizes the file format from the filename given to the output option -O.
Tag: molecule image
Blog
Generating 2D Images of Molecules from MOL Files using Open Babel
Open Babel is a tool for working with molecular data in many ways: converting one format to another, analysis, molecular modeling, etc. It also has a method to convert MOL files into SVG or PNG images to represent them as 2D images.
Install Open Babel in Linux as follows, or go to their page for other operating systems:
sudo apt-get install openbabel
Open Babel uses the same command to generate SVG or PNG, and recognizes the file format from the filename given to the output option -O.
Tag: open babel
Blog
Generating 2D Images of Molecules from MOL Files using Open Babel
Open Babel is a tool for working with molecular data in many ways: converting one format to another, analysis, molecular modeling, etc. It also has a method to convert MOL files into SVG or PNG images to represent them as 2D images.
Install Open Babel in Linux as follows, or go to their page for other operating systems:
sudo apt-get install openbabel
Open Babel uses the same command to generate SVG or PNG, and recognizes the file format from the filename given to the output option -O.
Blog
Obtaining Molecule Description using Open Babel / PyBel
Open Babel is a great tool for analyzing and investigating molecular data (.MOL, .SDF files). Its Python API is particularly nice if you are already familiar with Python. In this post, I'll demonstrate how you can obtain molecule descriptors such as molecular weight, HBA, HBD, logP, formula and the number of chiral centers using PyBel.
Installation
$ sudo apt-get install openbabel python-openbabel
Usage for MW, HBA, HBD, logP
After reading the .MOL file, we need to use the calcdesc method with the descnames argument to get the descriptors.
Tag: pipe
Blog
Simple Way of Python's subprocess.Popen with a Timeout Option
The subprocess module in Python provides a variety of methods to start a process from a Python script. We may use these methods to run external commands / programs, collect their output and manage them. An example use might be the following:
from subprocess import Popen, PIPE

p = Popen(['ls', '-l'], stdout=PIPE, stderr=PIPE)
stdout, stderr = p.communicate()
print stdout, stderr
These lines can be used to run the ls -l command in a terminal and collect the output (standard output and standard error) in the stdout and stderr variables using the communicate method defined on the process.
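On Python 3.3 and later, communicate itself accepts a timeout argument, so the timeout option can be sketched without extra threads. The run_with_timeout helper name below is hypothetical, and a child Python interpreter stands in for ls -l so the example is portable.

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout):
    """Run cmd with a timeout; on expiry, kill the process and
    collect whatever output it produced before dying."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, text=True)
    try:
        out, err = p.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        p.kill()                  # terminate the runaway child
        out, err = p.communicate()  # reap it and drain the pipes
    return p.returncode, out, err

# portable stand-in for running an external command
code, out, err = run_with_timeout(
    [sys.executable, '-c', 'print("hello")'], timeout=5)
```

Calling p.communicate() again after p.kill() is important: it reaps the child and avoids a zombie process.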
Blog
Performing Multiple Searches in SRS
The latest version of the analysis script examines more reads than the previous ones, so searching for a name on SRS for every single read was quite a time-consuming operation. In fact, the last analysis took 4 days.
To reduce this, I completely rewrote the analysis script. As always, it first takes the reads that pass the threshold, but now I list their ID numbers directly in an array. Then I turn this list into a single string by separating each element with the pipe character.
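The joining step described above boils down to filtering by a threshold and joining the surviving IDs with a pipe. The original script's language is not shown, so this Python sketch uses made-up sample data:

```python
# Hypothetical read scores; in the real script these come from the analysis.
scores = {"read1": 0.91, "read2": 0.42, "read3": 0.88}
threshold = 0.8

# Keep the IDs that pass the threshold, then build one pipe-separated
# query string so SRS is searched once instead of once per read.
ids = [name for name, score in scores.items() if score >= threshold]
query = "|".join(ids)
print(query)  # read1|read3
```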
Tag: popen
Blog
Simple Way of Python's subprocess.Popen with a Timeout Option
The subprocess module in Python provides a variety of methods to start a process from a Python script. We can use these methods to run external commands/programs, collect their output, and manage them. An example use might look like the following:
from subprocess import Popen, PIPE p = Popen(['ls', '-l'], stdout=PIPE, stderr=PIPE) stdout, stderr = p.communicate() print stdout, stderr These lines run the ls -l command in a terminal and collect the output (standard output and standard error) in the stdout and stderr variables using the communicate method defined on the process.
Tag: popen with timeout
Blog
Simple Way of Python's subprocess.Popen with a Timeout Option
The subprocess module in Python provides a variety of methods to start a process from a Python script. We can use these methods to run external commands/programs, collect their output, and manage them. An example use might look like the following:
from subprocess import Popen, PIPE p = Popen(['ls', '-l'], stdout=PIPE, stderr=PIPE) stdout, stderr = p.communicate() print stdout, stderr These lines run the ls -l command in a terminal and collect the output (standard output and standard error) in the stdout and stderr variables using the communicate method defined on the process.
Tag: subprocess
Blog
Simple Way of Python's subprocess.Popen with a Timeout Option
The subprocess module in Python provides a variety of methods to start a process from a Python script. We can use these methods to run external commands/programs, collect their output, and manage them. An example use might look like the following:
from subprocess import Popen, PIPE p = Popen(['ls', '-l'], stdout=PIPE, stderr=PIPE) stdout, stderr = p.communicate() print stdout, stderr These lines run the ls -l command in a terminal and collect the output (standard output and standard error) in the stdout and stderr variables using the communicate method defined on the process.
Tag: loadbalancer
Blog
Running StarCluster Load Balancer in Background in Linux
StarCluster's loadbalance command regularly monitors the jobs in the queue and adds or removes nodes on the previously created cluster to complete the queue efficiently.
To run it in the background without it being killed when the terminal is closed:
nohup starcluster loadbalance cluster_name >loadbalance.log 2>&1 & or to keep standard output and standard error logs separate:
nohup starcluster loadbalance cluster_name > loadbalance.access.log 2> loadbalance.error.log & This will start the process and output the process ID (PID) which can be used to check or kill it.
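The same detach-and-redirect pattern can be reproduced from Python if you prefer scripting it; a sketch with a placeholder command standing in for starcluster loadbalance:

```python
import os
import subprocess
import sys
import tempfile

# Append the child's combined output to a log file, as the shell
# redirections above do.
log_path = os.path.join(tempfile.gettempdir(), "loadbalance.log")
with open(log_path, "ab") as log:
    p = subprocess.Popen(
        [sys.executable, "-c", "print('balancing...')"],  # stand-in command
        stdout=log,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # detach from the terminal session, like nohup
    )

print(p.pid)  # the PID, which can be used to check on or kill the process
p.wait()
```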
Tag: nohup
Blog
Running StarCluster Load Balancer in Background in Linux
StarCluster's loadbalance command regularly monitors the jobs in the queue and adds or removes nodes on the previously created cluster to complete the queue efficiently.
To run it in the background without it being killed when the terminal is closed:
nohup starcluster loadbalance cluster_name >loadbalance.log 2>&1 & or to keep standard output and standard error logs separate:
nohup starcluster loadbalance cluster_name > loadbalance.access.log 2> loadbalance.error.log & This will start the process and output the process ID (PID) which can be used to check or kill it.
Tag: run in background
Blog
Running StarCluster Load Balancer in Background in Linux
StarCluster's loadbalance command regularly monitors the jobs in the queue and adds or removes nodes on the previously created cluster to complete the queue efficiently.
To run it in the background without it being killed when the terminal is closed:
nohup starcluster loadbalance cluster_name >loadbalance.log 2>&1 & or to keep standard output and standard error logs separate:
nohup starcluster loadbalance cluster_name > loadbalance.access.log 2> loadbalance.error.log & This will start the process and output the process ID (PID) which can be used to check or kill it.
Tag: starcluster
Blog
Running StarCluster Load Balancer in Background in Linux
StarCluster's loadbalance command regularly monitors the jobs in the queue and adds or removes nodes on the previously created cluster to complete the queue efficiently.
To run it in the background without it being killed when the terminal is closed:
nohup starcluster loadbalance cluster_name >loadbalance.log 2>&1 & or to keep standard output and standard error logs separate:
nohup starcluster loadbalance cluster_name > loadbalance.access.log 2> loadbalance.error.log & This will start the process and output the process ID (PID) which can be used to check or kill it.
Blog
Running Script on Cluster (StarCluster)
Start a new cluster with the configuration file you modified:
starcluster start cluster_name Send the script to the running cluster:
starcluster put cluster_name myscr.csh /home/myscr.csh Run it using source:
starcluster sshmaster cluster_name "source /home/myscr.csh >& /home/myscr.log"
Tag: apache2
Blog
Change Apache’s Default User www-data or Home Directory /var/www/
I was getting errors from a StarCluster run because it could not find the .starcluster directory in /var/www/.
This directory holds StarCluster's config file and log directories, so without it, StarCluster can't run.
To solve the issue, I set my own user in Apache's envvars instead of www-data, which also changes the default home directory to mine.
Edit the following file with superuser permissions:
sudo nano /etc/apache2/envvars Enter your username in the following lines and save:
Blog
Getting Started with Your AWS Instance and Installing and Setting Up an Apache Server
Update and upgrade packages:
sudo apt-get update sudo apt-get upgrade Install Apache server:
sudo apt-get install apache2 Set up a root folder in your home folder and create an index file for testing:
mkdir ~/www echo 'Hello, World!' > ~/www/index.html Set up your virtual host:
sudo cp /etc/apache2/sites-available/000-default.conf /etc/apache2/sites-available/000-www.conf sudo nano /etc/apache2/sites-available/000-www.conf Modify DocumentRoot to point to your "www" folder in your home folder (e.g. /home/ubuntu/www)
Then add the following lines after the DocumentRoot line:
Blog
Install Apache2, PHP5, MySQL & phpMyAdmin on Ubuntu 12.04
First, install apache2:
sudo apt-get install apache2 Then, for it to work: sudo service apache2 restart
For custom www folder:
sudo cp /etc/apache2/sites-available/default /etc/apache2/sites-available/www gksudo gedit /etc/apache2/sites-available/www Change the DocumentRoot and Directory directives to point to the new location. For example, /home/user/www/
Save and see (link here clean URLs not working Laravel 4)
Make www default and disable default:
sudo a2dissite default && sudo a2ensite www sudo service apache2 restart Create new file in www
Tag: default home directory
Blog
Change Apache’s Default User www-data or Home Directory /var/www/
I was getting errors from a StarCluster run because it could not find the .starcluster directory in /var/www/.
This directory holds StarCluster's config file and log directories, so without it, StarCluster can't run.
To solve the issue, I set my own user in Apache's envvars instead of www-data, which also changes the default home directory to mine.
Edit the following file with superuser permissions:
sudo nano /etc/apache2/envvars Enter your username in the following lines and save:
Tag: default user
Blog
Change Apache’s Default User www-data or Home Directory /var/www/
I was getting errors from a StarCluster run because it could not find the .starcluster directory in /var/www/.
This directory holds StarCluster's config file and log directories, so without it, StarCluster can't run.
To solve the issue, I set my own user in Apache's envvars instead of www-data, which also changes the default home directory to mine.
Edit the following file with superuser permissions:
sudo nano /etc/apache2/envvars Enter your username in the following lines and save:
Tag: amazon
Blog
Transfer Files to Your AWS S3 Storage in Linux
Uploading files to AWS S3 storage through the GUI can be difficult when many files are involved, or when your files are on a server where you don't have a GUI option. Use the following tool to transfer files to an S3 bucket.
Download and install the following tool:
cd ~/Downloads git clone https://github.com/s3tools/s3cmd.git cd s3cmd/ sudo python setup.py install Next, execute the following to create a configuration file to connect to your AWS S3 account:
Blog
Running Script on Cluster (StarCluster)
Start a new cluster with the configuration file you modified:
starcluster start cluster_name Send the script to the running cluster:
starcluster put cluster_name myscr.csh /home/myscr.csh Run it using source:
starcluster sshmaster cluster_name "source /home/myscr.csh >& /home/myscr.log"
Blog
Uploading Files to AWS using SSH/SCP
Here is a small command for uploading files to AWS using SSH's scp (secure copy) command.
scp -i path/to/your/key-pairs/file path/to/file/you/want/to/upload ubuntu@PUBLIC_DNS:path/to/the/destination
Blog
Configuring Mezzanine for Apache server & mod_wsgi in AWS
Install [Mezzanine]({% post_url 2015-05-01-how-to-install-mezzanine-on-ubuntulinux-mint %}), [Apache server]({% post_url 2015-05-08-getting-started-with-your-aws-instance-and %}) and mod_wsgi:
sudo apt-get install libapache2-mod-wsgi sudo a2enmod wsgi Set up a MySQL database for your Mezzanine project
Read [my post on how to set up a MySQL database for a Mezzanine project]({% post_url 2015-05-09-how-to-set-up-a-mysql-database-for-a-mezzanine %})
Collect static files:
python manage.py collectstatic Configure your Apache configuration for the project like the following:
WSGIPythonPath /home/ubuntu/www/mezzanine-project <VirtualHost *:80> #ServerName example.com ServerAdmin admin@example.com DocumentRoot /home/ubuntu/www/mezzanine-project WSGIScriptAlias / /home/ubuntu/www/mezzanine-project/wsgi.
Blog
Setting Up Mezzanine Projects in AWS
Go to EC2 management console, Security groups and add a Custom TCP inbound rule with port 8000. Select “Anywhere” from the list.
Then follow [this to install Mezzanine]({% post_url 2015-05-01-how-to-install-mezzanine-on-ubuntulinux-mint %})
The above tutorial also explains setting up a site record. Mezzanine's default site record is 127.0.0.1:8000, which should be 0.0.0.0:8000 in our case. So, enter 0.0.0.0:8000 when you're asked for a site record when you run
python manage.py createdb Also, you might still need to provide this site record while running the development server:
Blog
Getting Started with Your AWS Instance and Installing and Setting Up an Apache Server
Update and upgrade packages:
sudo apt-get update sudo apt-get upgrade Install Apache server:
sudo apt-get install apache2 Set up a root folder in your home folder and create an index file for testing:
mkdir ~/www echo 'Hello, World!' > ~/www/index.html Set up your virtual host:
sudo cp /etc/apache2/sites-available/000-default.conf /etc/apache2/sites-available/000-www.conf sudo nano /etc/apache2/sites-available/000-www.conf Modify DocumentRoot to point to your "www" folder in your home folder (e.g. /home/ubuntu/www)
Then add the following lines after the DocumentRoot line:
Blog
AWS Start an Instance and Connect to it
Go to EC2 management console
Create a new key-pair if necessary and download it
Launch an instance
Add HTTP security group for web applications over HTTP
Get public DNS
Change permissions on key-pair file:
chmod 400 path/to/your/file.pem Connect:
ssh -i path/to/your/file.pem ubuntu@PUBLIC_DNS Note: ubuntu is the username for connecting to a 64-bit Ubuntu instance. It's different for other images.
Tag: aws
Blog
Transfer Files to Your AWS S3 Storage in Linux
Uploading files to AWS S3 storage through the GUI can be difficult when many files are involved, or when your files are on a server where you don't have a GUI option. Use the following tool to transfer files to an S3 bucket.
Download and install the following tool:
cd ~/Downloads git clone https://github.com/s3tools/s3cmd.git cd s3cmd/ sudo python setup.py install Next, execute the following to create a configuration file to connect to your AWS S3 account:
Blog
Running Script on Cluster (StarCluster)
Start a new cluster with the configuration file you modified:
starcluster start cluster_name Send the script to the running cluster:
starcluster put cluster_name myscr.csh /home/myscr.csh Run it using source:
starcluster sshmaster cluster_name "source /home/myscr.csh >& /home/myscr.log"
Blog
Uploading Files to AWS using SSH/SCP
Here is a small command for uploading files to AWS using SSH's scp (secure copy) command.
scp -i path/to/your/key-pairs/file path/to/file/you/want/to/upload ubuntu@PUBLIC_DNS:path/to/the/destination
Blog
Configuring Mezzanine for Apache server & mod_wsgi in AWS
Install [Mezzanine]({% post_url 2015-05-01-how-to-install-mezzanine-on-ubuntulinux-mint %}), [Apache server]({% post_url 2015-05-08-getting-started-with-your-aws-instance-and %}) and mod_wsgi:
sudo apt-get install libapache2-mod-wsgi sudo a2enmod wsgi Set up a MySQL database for your Mezzanine project
Read [my post on how to set up a MySQL database for a Mezzanine project]({% post_url 2015-05-09-how-to-set-up-a-mysql-database-for-a-mezzanine %})
Collect static files:
python manage.py collectstatic Configure your Apache configuration for the project like the following:
WSGIPythonPath /home/ubuntu/www/mezzanine-project <VirtualHost *:80> #ServerName example.com ServerAdmin admin@example.com DocumentRoot /home/ubuntu/www/mezzanine-project WSGIScriptAlias / /home/ubuntu/www/mezzanine-project/wsgi.
Blog
Setting Up Mezzanine Projects in AWS
Go to EC2 management console, Security groups and add a Custom TCP inbound rule with port 8000. Select “Anywhere” from the list.
Then follow [this to install Mezzanine]({% post_url 2015-05-01-how-to-install-mezzanine-on-ubuntulinux-mint %})
The above tutorial also explains setting up a site record. Mezzanine's default site record is 127.0.0.1:8000, which should be 0.0.0.0:8000 in our case. So, enter 0.0.0.0:8000 when you're asked for a site record when you run
python manage.py createdb Also, you might still need to provide this site record while running the development server:
Blog
Getting Started with Your AWS Instance and Installing and Setting Up an Apache Server
Update and upgrade packages:
sudo apt-get update sudo apt-get upgrade Install Apache server:
sudo apt-get install apache2 Set up a root folder in your home folder and create an index file for testing:
mkdir ~/www echo 'Hello, World!' > ~/www/index.html Set up your virtual host:
sudo cp /etc/apache2/sites-available/000-default.conf /etc/apache2/sites-available/000-www.conf sudo nano /etc/apache2/sites-available/000-www.conf Modify DocumentRoot to point to your "www" folder in your home folder (e.g. /home/ubuntu/www)
Then add the following lines after the DocumentRoot line:
Blog
AWS Start an Instance and Connect to it
Go to EC2 management console
Create a new key-pair if necessary and download it
Launch an instance
Add HTTP security group for web applications over HTTP
Get public DNS
Change permissions on key-pair file:
chmod 400 path/to/your/file.pem Connect:
ssh -i path/to/your/file.pem ubuntu@PUBLIC_DNS Note: ubuntu is the username for connecting to a 64-bit Ubuntu instance. It's different for other images.
Tag: aws s3
Blog
Transfer Files to Your AWS S3 Storage in Linux
Uploading files to AWS S3 storage through the GUI can be difficult when many files are involved, or when your files are on a server where you don't have a GUI option. Use the following tool to transfer files to an S3 bucket.
Download and install the following tool:
cd ~/Downloads git clone https://github.com/s3tools/s3cmd.git cd s3cmd/ sudo python setup.py install Next, execute the following to create a configuration file to connect to your AWS S3 account:
Tag: s3cmd
Blog
Transfer Files to Your AWS S3 Storage in Linux
Uploading files to AWS S3 storage through the GUI can be difficult when many files are involved, or when your files are on a server where you don't have a GUI option. Use the following tool to transfer files to an S3 bucket.
Download and install the following tool:
cd ~/Downloads git clone https://github.com/s3tools/s3cmd.git cd s3cmd/ sudo python setup.py install Next, execute the following to create a configuration file to connect to your AWS S3 account:
Tag: django
Blog
ImportError: Reportlab Version 2.1+ is needed
A little bug in xhtml2pdf version 0.0.5. To fix it:
$ sudo nano /usr/local/lib/python2.7/dist-packages/xhtml2pdf/util.py Change the following lines:
if not (reportlab.Version[0] == "2" and reportlab.Version[2] >= "1"): raise ImportError("Reportlab Version 2.1+ is needed!") REPORTLAB22 = (reportlab.Version[0] == "2" and reportlab.Version[2] >= "2") With these lines:
if not (reportlab.Version[:3] >= "2.1"): raise ImportError("Reportlab Version 2.1+ is needed!") REPORTLAB22 = (reportlab.Version[:3] >= "2.1")
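Note that comparing version strings lexicographically, as the patched lines still do, only works while every component is a single digit ("2.10" sorts before "2.2"). A more robust sketch parses the version into an integer tuple (reportlab_version is a stand-in value here):

```python
def version_tuple(version):
    """Parse a dotted version string like '2.7' into a tuple of ints,
    so comparisons are numeric instead of lexicographic."""
    return tuple(int(part) for part in version.split("."))

reportlab_version = "2.7"  # stand-in for reportlab.Version

# Lexicographic comparison misorders two-digit components...
assert "2.10" < "2.2"
# ...while numeric tuples compare correctly.
assert version_tuple("2.10") > version_tuple("2.2")

if version_tuple(reportlab_version) < (2, 1):
    raise ImportError("Reportlab Version 2.1+ is needed!")
```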
Blog
Django Migrations Table Already Exists Fix
Fix this issue by faking the migrations:
python manage.py migrate --fake <appname> Taken from this SO answer
Blog
Mezzanine BS Banners Translation with django-modeltranslation
Mezzanine BS Banners is a nice app for adding Bootstrap 3 banners/sliders to your Mezzanine projects. The Banners model in the BS Banners app has a title, and its stacked inline Slides model has a title and content for translation.
After [installing and setting up Django/Mezzanine translations]({% post_url 2015-07-01-djangomezzanine-content-translation-for-mezzanine %}):
Create a translation.py inside your Mezzanine project or your custom theme/skin application and copy/paste the following lines:
from modeltranslation.translator import translator from mezzanine.core.translation import TranslatedSlugged, TranslatedRichText from mezzanine_bsbanners.
Blog
Django/Mezzanine Content Translation for Mezzanine Built-in Applications
Mezzanine comes with additional Django applications such as pages and galleries, and to translate their content, Mezzanine supports django-modeltranslation integration.
Install django-modeltranslation:
pip install django-modeltranslation Add the following to INSTALLED_APPS in settings.py:
"modeltranslation", And the following in settings.py:
USE_MODELTRANSLATION = True Also, move mezzanine.pages above the other Mezzanine apps in INSTALLED_APPS in settings.py, like so:
"mezzanine.pages", "mezzanine.boot", "mezzanine.conf", "mezzanine.core", "mezzanine.generic", "mezzanine.blog", "mezzanine.forms", "mezzanine.galleries", "mezzanine.twitter", "mezzanine.accounts", "mezzanine.mobile", Run the following to create fields in the database tables for translations:
Blog
Setting Up Templates and Python Scripts for Translation
Templates need the following template tag:
{% raw %}{% load i18n %}{% endraw %} Then, wrapping any text with
{% raw %}{% trans "TEXT" %}{% endraw %} will make it translatable via the Rosetta Django application.
In Python scripts, you need to import the following:
from django.utils.translation import ugettext_lazy as _ Then wrapping any text with
_('TEXT') will make it translatable.
Blog
Django Rosetta Translations for Django Applications
Make a directory called locale/ under the application directory:
cd app_name mkdir locale Add the folder to the LOCALE_PATHS tuple in settings.py:
LOCALE_PATHS = ( os.path.join(PROJECT_ROOT, 'app_name', 'locale/'), ) Run the following command to create the PO translation file for the application:
python ../manage.py makemessages -l tr -e html,py,txt python ../manage.py compilemessages The -l option is for the language; it should match your definition in settings.py:
LANGUAGES = ( ('en', _('English')), ('tr', _('Turkish')), ('it', _('Italian')), ) Repeat the last step for all languages and then go to the Rosetta URL to translate.
Blog
Django Rosetta Installation
Install SciPy:
sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose Install pymongo and nltk:
sudo pip install pymongo sudo pip install nltk Install Python MySQLdb:
sudo apt-get install python-mysqldb Install Rosetta:
sudo pip install django-rosetta Add the following into INSTALLED_APPS in settings.py:
"rosetta", Add the following into urls.py:
url(r'^translations/', include('rosetta.urls')), To also allow language prefixes, change patterns to i18n_patterns in urls.py:
urlpatterns += i18n_patterns( ... )
Blog
Errno 13 Permission denied Django File Uploads
Run the following command to give www-data permissions on the static folder and all its content:
cd path/to/your/django/project sudo chown -R www-data:www-data static/ Do this on your production server.
Blog
Configuring Mezzanine for Apache server & mod_wsgi in AWS
Install [Mezzanine]({% post_url 2015-05-01-how-to-install-mezzanine-on-ubuntulinux-mint %}), [Apache server]({% post_url 2015-05-08-getting-started-with-your-aws-instance-and %}) and mod_wsgi:
sudo apt-get install libapache2-mod-wsgi sudo a2enmod wsgi Set up a MySQL database for your Mezzanine project
Read [my post on how to set up a MySQL database for a Mezzanine project]({% post_url 2015-05-09-how-to-set-up-a-mysql-database-for-a-mezzanine %})
Collect static files:
python manage.py collectstatic Configure your Apache configuration for the project like the following:
WSGIPythonPath /home/ubuntu/www/mezzanine-project <VirtualHost *:80> #ServerName example.com ServerAdmin admin@example.com DocumentRoot /home/ubuntu/www/mezzanine-project WSGIScriptAlias / /home/ubuntu/www/mezzanine-project/wsgi.
Blog
How to Set Up a MySQL Database for a Mezzanine Project
Install MySQL server and python-mysqldb package:
sudo apt-get install mysql-server sudo apt-get install python-mysqldb Run MySQL:
mysql -u root -p Create a database:
mysql> create database mezzanine_project; Confirm it:
mysql> show databases; Exit:
mysql> exit Configure local_settings.py:
cd path/to/your/mezzanine/project nano local_settings.py Like the following:
DATABASES = { "default": { "ENGINE": "django.db.backends.mysql", "NAME": "mezzanine_project", "USER": "root", "PASSWORD": "123456", "HOST": "", "PORT": "", } } Note: replace with your own password
Blog
How to Install Mezzanine on Ubuntu/Linux Mint [Complete Guide]
Mezzanine is a CMS application built on the Django web framework. The installation steps are easy, but your environment may not be suitable enough for it to work without a problem. So, here I'm going to describe a complete installation from scratch in a virtual environment.
First of all, install virtualenv:
$ sudo apt-get install python-virtualenv Then, create a virtual environment:
$ virtualenv testenv And, activate it: $ cd testenv $ source bin/activate
Blog
How to Clear (or Drop) DB Table of A Django App
Let's say you created a Django app, ran python manage.py syncdb, and created its table. Every time you make a change to the table, you'll need to drop that table and run python manage.py syncdb again to update it. Here is how you drop a table of a Django app:
$ python manage.py sqlclear app_name | python manage.py dbshell Drop tables of an app with migrations (Django >= 1.8):
$ python manage.py migrate appname zero Recreate all the tables:
Tag: django easy pdf
Blog
ImportError: Reportlab Version 2.1+ is needed
A little bug in xhtml2pdf version 0.0.5. To fix it:
$ sudo nano /usr/local/lib/python2.7/dist-packages/xhtml2pdf/util.py Change the following lines:
if not (reportlab.Version[0] == "2" and reportlab.Version[2] >= "1"): raise ImportError("Reportlab Version 2.1+ is needed!") REPORTLAB22 = (reportlab.Version[0] == "2" and reportlab.Version[2] >= "2") With these lines:
if not (reportlab.Version[:3] >= "2.1"): raise ImportError("Reportlab Version 2.1+ is needed!") REPORTLAB22 = (reportlab.Version[:3] >= "2.1")
Tag: mezzanine
Blog
ImportError: Reportlab Version 2.1+ is needed
A little bug in xhtml2pdf version 0.0.5. To fix it:
$ sudo nano /usr/local/lib/python2.7/dist-packages/xhtml2pdf/util.py Change the following lines:
if not (reportlab.Version[0] == "2" and reportlab.Version[2] >= "1"): raise ImportError("Reportlab Version 2.1+ is needed!") REPORTLAB22 = (reportlab.Version[0] == "2" and reportlab.Version[2] >= "2") With these lines:
if not (reportlab.Version[:3] >= "2.1"): raise ImportError("Reportlab Version 2.1+ is needed!") REPORTLAB22 = (reportlab.Version[:3] >= "2.1")
Blog
Mezzanine BS Banners Translation with django-modeltranslation
Mezzanine BS Banners is a nice app for adding Bootstrap 3 banners/sliders to your Mezzanine projects. The Banners model in the BS Banners app has a title, and its stacked inline Slides model has a title and content for translation.
After [installing and setting up Django/Mezzanine translations]({% post_url 2015-07-01-djangomezzanine-content-translation-for-mezzanine %}):
Create a translation.py inside your Mezzanine project or your custom theme/skin application and copy/paste the following lines:
from modeltranslation.translator import translator from mezzanine.core.translation import TranslatedSlugged, TranslatedRichText from mezzanine_bsbanners.
Blog
Django/Mezzanine Content Translation for Mezzanine Built-in Applications
Mezzanine comes with additional Django applications such as pages and galleries, and to translate their content, Mezzanine supports django-modeltranslation integration.
Install django-modeltranslation:
pip install django-modeltranslation Add the following to INSTALLED_APPS in settings.py:
"modeltranslation", And the following in settings.py:
USE_MODELTRANSLATION = True Also, move mezzanine.pages above the other Mezzanine apps in INSTALLED_APPS in settings.py, like so:
"mezzanine.pages", "mezzanine.boot", "mezzanine.conf", "mezzanine.core", "mezzanine.generic", "mezzanine.blog", "mezzanine.forms", "mezzanine.galleries", "mezzanine.twitter", "mezzanine.accounts", "mezzanine.mobile", Run the following to create fields in the database tables for translations:
Blog
Configuring Mezzanine for Apache server & mod_wsgi in AWS
Install [Mezzanine]({% post_url 2015-05-01-how-to-install-mezzanine-on-ubuntulinux-mint %}), [Apache server]({% post_url 2015-05-08-getting-started-with-your-aws-instance-and %}) and mod_wsgi:
sudo apt-get install libapache2-mod-wsgi sudo a2enmod wsgi Set up a MySQL database for your Mezzanine project
Read [my post on how to set up a MySQL database for a Mezzanine project]({% post_url 2015-05-09-how-to-set-up-a-mysql-database-for-a-mezzanine %})
Collect static files:
python manage.py collectstatic Configure your Apache configuration for the project like the following:
WSGIPythonPath /home/ubuntu/www/mezzanine-project <VirtualHost *:80> #ServerName example.com ServerAdmin admin@example.com DocumentRoot /home/ubuntu/www/mezzanine-project WSGIScriptAlias / /home/ubuntu/www/mezzanine-project/wsgi.
Blog
How to Set Up a MySQL Database for a Mezzanine Project
Install MySQL server and python-mysqldb package:
sudo apt-get install mysql-server sudo apt-get install python-mysqldb Run MySQL:
mysql -u root -p Create a database:
mysql> create database mezzanine_project; Confirm it:
mysql> show databases; Exit:
mysql> exit Configure local_settings.py:
cd path/to/your/mezzanine/project nano local_settings.py Like the following:
DATABASES = { "default": { "ENGINE": "django.db.backends.mysql", "NAME": "mezzanine_project", "USER": "root", "PASSWORD": "123456", "HOST": "", "PORT": "", } } Note: replace with your own password
Blog
Setting Up Mezzanine Projects in AWS
Go to EC2 management console, Security groups and add a Custom TCP inbound rule with port 8000. Select “Anywhere” from the list.
Then follow [this to install Mezzanine]({% post_url 2015-05-01-how-to-install-mezzanine-on-ubuntulinux-mint %})
The above tutorial also explains setting up a site record. Mezzanine's default site record is 127.0.0.1:8000, which should be 0.0.0.0:8000 in our case. So, enter 0.0.0.0:8000 when you're asked for a site record when you run
python manage.py createdb Also, you might still need to provide this site record while running the development server:
Blog
How to Install Mezzanine on Ubuntu/Linux Mint [Complete Guide]
Mezzanine is a CMS application built on the Django web framework. The installation steps are easy, but your environment may not be suitable enough for it to work without a problem. So, here I'm going to describe a complete installation from scratch in a virtual environment.
First of all, install virtualenv:
$ sudo apt-get install python-virtualenv Then, create a virtual environment:
$ virtualenv testenv And, activate it: $ cd testenv $ source bin/activate
Tag: reportlab
Blog
ImportError: Reportlab Version 2.1+ is needed
A little bug in xhtml2pdf version 0.0.5. To fix it:
$ sudo nano /usr/local/lib/python2.7/dist-packages/xhtml2pdf/util.py Change the following lines:
if not (reportlab.Version[0] == "2" and reportlab.Version[2] >= "1"): raise ImportError("Reportlab Version 2.1+ is needed!") REPORTLAB22 = (reportlab.Version[0] == "2" and reportlab.Version[2] >= "2") With these lines:
if not (reportlab.Version[:3] >= "2.1"): raise ImportError("Reportlab Version 2.1+ is needed!") REPORTLAB22 = (reportlab.Version[:3] >= "2.1")
Tag: xhtml2pdf
Blog
ImportError: Reportlab Version 2.1+ is needed
A little bug in xhtml2pdf version 0.0.5. To fix it:
$ sudo nano /usr/local/lib/python2.7/dist-packages/xhtml2pdf/util.py Change the following lines:
if not (reportlab.Version[0] == "2" and reportlab.Version[2] >= "1"): raise ImportError("Reportlab Version 2.1+ is needed!") REPORTLAB22 = (reportlab.Version[0] == "2" and reportlab.Version[2] >= "2") With these lines:
if not (reportlab.Version[:3] >= "2.1"): raise ImportError("Reportlab Version 2.1+ is needed!") REPORTLAB22 = (reportlab.Version[:3] >= "2.1")
Tag: migrations
Blog
Django Migrations Table Already Exists Fix
Fix this issue by faking the migrations:
python manage.py migrate --fake <appname> Taken from this SO answer
Tag: table already exists
Blog
Django Migrations Table Already Exists Fix
Fix this issue by faking the migrations:
python manage.py migrate --fake <appname> Taken from this SO answer
Tag: banners
Blog
Mezzanine BS Banners Translation with django-modeltranslation
Mezzanine BS Banners is a nice app for adding Bootstrap 3 banners/sliders to your Mezzanine projects. The Banners model in the BS Banners app has a title, and its stacked inline Slides model has a title and content for translation.
After [installing and setting up Django/Mezzanine translations]({% post_url 2015-07-01-djangomezzanine-content-translation-for-mezzanine %}):
Create a translation.py inside your Mezzanine project or your custom theme/skin application and copy/paste the following lines:
from modeltranslation.translator import translator from mezzanine.core.translation import TranslatedSlugged, TranslatedRichText from mezzanine_bsbanners.
Tag: bootstrap
Blog
Mezzanine BS Banners Translation with django-modeltranslation
Mezzanine BS Banners is a nice app for adding Bootstrap 3 banners/sliders to your Mezzanine projects. The Banners model in the BS Banners app has a title, and its stacked inline Slides model has a title and content for translation.
After [installing and setting up Django/Mezzanine translations]({% post_url 2015-07-01-djangomezzanine-content-translation-for-mezzanine %}):
Create a translation.py inside your Mezzanine project or your custom theme/skin application and copy/paste the following lines:
from modeltranslation.translator import translator from mezzanine.core.translation import TranslatedSlugged, TranslatedRichText from mezzanine_bsbanners.
Tag: bsbanners
Blog
Mezzanine BS Banners Translation with django-modeltranslation
Mezzanine BS Banners is a nice app for implementing Bootstrap 3 banners/sliders to your Mezzanine projects. The Banners model in BS Banners app has a title and its stacked inline Slides model has title and content for translation.
After [installing and setting up Django/Mezzanine translations]({% post_url 2015-07-01-djangomezzanine-content-translation-for-mezzanine %}):
Create a translation.py inside your Mezzanine project or your custom theme/skin application and copy/paste the following lines:
from modeltranslation.translator import translator
from mezzanine.core.translation import TranslatedSlugged, TranslatedRichText
from mezzanine_bsbanners.
Tag: sliders
Blog
Mezzanine BS Banners Translation with django-modeltranslation
Mezzanine BS Banners is a nice app for implementing Bootstrap 3 banners/sliders to your Mezzanine projects. The Banners model in BS Banners app has a title and its stacked inline Slides model has title and content for translation.
After [installing and setting up Django/Mezzanine translations]({% post_url 2015-07-01-djangomezzanine-content-translation-for-mezzanine %}):
Create a translation.py inside your Mezzanine project or your custom theme/skin application and copy/paste the following lines:
from modeltranslation.translator import translator
from mezzanine.core.translation import TranslatedSlugged, TranslatedRichText
from mezzanine_bsbanners.
Tag: translation
Blog
Django/Mezzanine Content Translation for Mezzanine Built-in Applications
Mezzanine comes with additional Django applications such as pages and galleries; to translate their content, Mezzanine supports django-modeltranslation integration.
Install django-modeltranslation:
pip install django-modeltranslation Add the following to INSTALLED_APPS in settings.py:
"modeltranslation", And the following in settings.py:
USE_MODELTRANSLATION = True Also, move mezzanine.pages to the top of the other Mezzanine apps in INSTALLED_APPS in settings.py like so:
"mezzanine.pages",
"mezzanine.boot",
"mezzanine.conf",
"mezzanine.core",
"mezzanine.generic",
"mezzanine.blog",
"mezzanine.forms",
"mezzanine.galleries",
"mezzanine.twitter",
"mezzanine.accounts",
"mezzanine.mobile",
Run the following to create fields in database tables for translations:
Blog
Setting Up Templates and Python Scripts for Translation
Templates need the following template tag:
{% raw %}{% load i18n %}{% endraw %} Then, wrapping any text with
{% raw %}{% trans "TEXT" %}{% endraw %} will make it translatable via the Rosetta Django application
In Python scripts, you need to import the following:
from django.utils.translation import ugettext_lazy as _ Then wrapping any text with
_('TEXT') will make it translatable.
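To illustrate the marking pattern above in a self-contained way, here is a minimal sketch using only Python's standard-library gettext module; in a real Django project you would import ugettext_lazy from django.utils.translation instead, and makemessages would extract the marked strings into PO files.

```python
# Stand-alone stand-in for the Django pattern above. With no
# translation catalog installed, _() returns the string unchanged.
import gettext

_ = gettext.gettext  # in Django: from django.utils.translation import ugettext_lazy as _

greeting = _("Hello")  # marked for extraction into a PO file
```

Once a catalog for the active language is installed, the same call returns the translated string instead.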
Blog
Django Rosetta Installation
Install SciPy:
sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose Install pymongo and nltk:
sudo pip install pymongo sudo pip install nltk Install Python MySQLdb:
sudo apt-get install python-mysqldb Install Rosetta:
sudo pip install django-rosetta Add the following into INSTALLED_APPS in settings.py:
"rosetta", Add the following into urls.py:
url(r'^translations/', include('rosetta.urls')), To also allow language prefixes, change patterns to i18n_patterns in urls.py:
urlpatterns += i18n_patterns(
    ...
)
Tag: bash
Blog
Convert XLS/XLSX to CSV in Bash
In most modern Linux distributions, LibreOffice is available and can be used to convert XLS or XLSX file(s) to CSV file(s) in Bash.
For XLS file(s):
for i in *.xls; do libreoffice --headless --convert-to csv "$i"; done For XLSX file(s):
for i in *.xlsx; do libreoffice --headless --convert-to csv "$i"; done You may get the following warning, but it still works fine:
javaldx: Could not find a Java Runtime Environment!
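For completeness, the same conversion can be driven from Python. This is a sketch under assumptions: the helper name convert_command is mine, and actually running it still requires LibreOffice to be installed.

```python
import subprocess
from pathlib import Path

def convert_command(path):
    # Build the same argv the bash loop above invokes per file.
    return ["libreoffice", "--headless", "--convert-to", "csv", str(path)]

def convert_all(pattern="*.xlsx", directory="."):
    # Convert every matching spreadsheet in `directory` to CSV.
    for f in Path(directory).glob(pattern):
        subprocess.run(convert_command(f), check=True)
```

Separating command construction from execution makes the loop easy to test without LibreOffice present.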
Blog
Running Script on Cluster (StarCluster)
Start a new cluster with the configuration file you modified:
starcluster start cluster_name Send the script to the running cluster:
starcluster put cluster_name myscr.csh /home/myscr.csh Run it using source:
starcluster sshmaster cluster_name "source /home/myscr.csh >& /home/myscr.log"
Blog
How To Make A File or Script Executable in Ubuntu
Start a terminal; CTRL + Alt + T can be used (or just go to Dash Home and type Terminal):
Run this command below:
sudo chmod +x /path/to/your/file Source
Tag: csv
Blog
Convert XLS/XLSX to CSV in Bash
In most modern Linux distributions, LibreOffice is available and can be used to convert XLS or XLSX file(s) to CSV file(s) in Bash.
For XLS file(s):
for i in *.xls; do libreoffice --headless --convert-to csv "$i"; done For XLSX file(s):
for i in *.xlsx; do libreoffice --headless --convert-to csv "$i"; done You may get the following warning, but it still works fine:
javaldx: Could not find a Java Runtime Environment!
Blog
Data Preprocessing I for Salmon Project
Since we'll be using R for most of the analyses, we converted the XLS data file to CSV using MS Office Excel 2013, and then we had to fix several lines using Sublime Text 2 because three columns in these lines were left unquoted, which later created a problem when reading in RStudio.
The data contains phosphorylation data for 8553 peptides. There are many missing data points for many peptides, and since the peptides were identified by IPI IDs, which are no longer supported, we had to convert the IPI IDs to HGNC approved symbols; the data already had these symbols as names, but they looked outdated.
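The ID conversion step described above amounts to a table lookup. The sketch below uses obviously hypothetical placeholder entries (the IPI IDs and symbols are not real mappings); in practice the table would come from an HGNC or BioMart cross-reference download.

```python
# Hypothetical IPI -> HGNC lookup; these pairs are illustrative
# placeholders only, not verified mappings.
IPI_TO_HGNC = {
    "IPI00000001": "GENE1",
    "IPI00000002": "GENE2",
}

def to_hgnc(ipi_id):
    """Return the HGNC symbol for an IPI ID, or None if unmapped."""
    return IPI_TO_HGNC.get(ipi_id)
```

Returning None for unmapped IDs makes it easy to count how many peptides are lost in the conversion.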
Blog
Playing around with CellNOptR Tool and MIDAS File
With CellNOptR, we will try to construct network models for the challenge. For this, the tool needs two inputs. The first is a special data object called CNOlist that stores vectors and matrices of data. The second is a .SIF file that contains a prior knowledge network, which can be obtained from pathway databases and analysis tools.
CNOlist contains the following fields: namesSignals, namesCues, namesStimuli and namesInhibitors, which are vectors storing the names of measurements.
Tag: excel
Blog
Convert XLS/XLSX to CSV in Bash
In most modern Linux distributions, LibreOffice is available and can be used to convert XLS or XLSX file(s) to CSV file(s) in Bash.
For XLS file(s):
for i in *.xls; do libreoffice --headless --convert-to csv "$i"; done For XLSX file(s):
for i in *.xlsx; do libreoffice --headless --convert-to csv "$i"; done You may get the following warning, but it still works fine:
javaldx: Could not find a Java Runtime Environment!
Tag: libre office
Blog
Convert XLS/XLSX to CSV in Bash
In most modern Linux distributions, LibreOffice is available and can be used to convert XLS or XLSX file(s) to CSV file(s) in Bash.
For XLS file(s):
for i in *.xls; do libreoffice --headless --convert-to csv "$i"; done For XLSX file(s):
for i in *.xlsx; do libreoffice --headless --convert-to csv "$i"; done You may get the following warning, but it still works fine:
javaldx: Could not find a Java Runtime Environment!
Tag: xls
Blog
Convert XLS/XLSX to CSV in Bash
In most modern Linux distributions, LibreOffice is available and can be used to convert XLS or XLSX file(s) to CSV file(s) in Bash.
For XLS file(s):
for i in *.xls; do libreoffice --headless --convert-to csv "$i"; done For XLSX file(s):
for i in *.xlsx; do libreoffice --headless --convert-to csv "$i"; done You may get the following warning, but it still works fine:
javaldx: Could not find a Java Runtime Environment!
Blog
Salmonella Data Preprocessing for PCSF Algorithm
This post describes data preprocessing in Salmonella project for Prize-Collecting Steiner Forest Problem (PCSF) algorithm.
Salmonella data taken from Table S6 in Phosphoproteomic Analysis of Salmonella-Infected Cells Identifies Key Kinase Regulators and SopB-Dependent Host Phosphorylation Events by Rogers, LD et al. has been converted from its original XLS file to a tab-delimited TXT file for easy reading in Python.
The data should be separated into time-point files (2, 5, 10 and 20 minutes), each of which will contain the corresponding phosphoproteins and their fold changes.
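The separation step above can be sketched as follows. The row layout (protein, time, fold change) and the function names are assumptions, since the post does not show the actual columns of the tab-delimited file.

```python
import csv

TIME_POINTS = ("2", "5", "10", "20")  # minutes

def split_by_time_point(rows):
    """Group (protein, time, fold_change) rows into one list per time point."""
    buckets = {t: [] for t in TIME_POINTS}
    for protein, time, fold_change in rows:
        if time in buckets:
            buckets[time].append((protein, float(fold_change)))
    return buckets

def split_file(path):
    # Read the tab-delimited TXT file and split it by time point.
    with open(path, newline="") as fh:
        return split_by_time_point(csv.reader(fh, delimiter="\t"))
```

Each bucket can then be written out as its own per-time-point input file for the PCSF algorithm.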
Blog
Data Preprocessing I for Salmon Project
Since we'll be using R for most of the analyses, we converted the XLS data file to CSV using MS Office Excel 2013, and then we had to fix several lines using Sublime Text 2 because three columns in these lines were left unquoted, which later created a problem when reading in RStudio.
The data contains phosphorylation data for 8553 peptides. There are many missing data points for many peptides, and since the peptides were identified by IPI IDs, which are no longer supported, we had to convert the IPI IDs to HGNC approved symbols; the data already had these symbols as names, but they looked outdated.
Tag: xls to csv
Blog
Convert XLS/XLSX to CSV in Bash
In most modern Linux distributions, LibreOffice is available and can be used to convert XLS or XLSX file(s) to CSV file(s) in Bash.
For XLS file(s):
for i in *.xls; do libreoffice --headless --convert-to csv "$i"; done For XLSX file(s):
for i in *.xlsx; do libreoffice --headless --convert-to csv "$i"; done You may get the following warning, but it still works fine:
javaldx: Could not find a Java Runtime Environment!
Tag: xlsx
Blog
Convert XLS/XLSX to CSV in Bash
In most modern Linux distributions, LibreOffice is available and can be used to convert XLS or XLSX file(s) to CSV file(s) in Bash.
For XLS file(s):
for i in *.xls; do libreoffice --headless --convert-to csv "$i"; done For XLSX file(s):
for i in *.xlsx; do libreoffice --headless --convert-to csv "$i"; done You may get the following warning, but it still works fine:
javaldx: Could not find a Java Runtime Environment!
Tag: xlsx to csv
Blog
Convert XLS/XLSX to CSV in Bash
In most modern Linux distributions, LibreOffice is available and can be used to convert XLS or XLSX file(s) to CSV file(s) in Bash.
For XLS file(s):
for i in *.xls; do libreoffice --headless --convert-to csv "$i"; done For XLSX file(s):
for i in *.xlsx; do libreoffice --headless --convert-to csv "$i"; done You may get the following warning, but it still works fine:
javaldx: Could not find a Java Runtime Environment!
Tag: python scripts
Blog
Setting Up Templates and Python Scripts for Translation
Templates need the following template tag:
{% raw %}{% load i18n %}{% endraw %} Then, wrapping any text with
{% raw %}{% trans "TEXT" %}{% endraw %} will make it translatable via the Rosetta Django application
In Python scripts, you need to import the following:
from django.utils.translation import ugettext_lazy as _ Then wrapping any text with
_('TEXT') will make it translatable.
Tag: rosetta
Blog
Setting Up Templates and Python Scripts for Translation
Templates need the following template tag:
{% raw %}{% load i18n %}{% endraw %} Then, wrapping any text with
{% raw %}{% trans "TEXT" %}{% endraw %} will make it translatable via the Rosetta Django application
In Python scripts, you need to import the following:
from django.utils.translation import ugettext_lazy as _ Then wrapping any text with
_('TEXT') will make it translatable.
Blog
Django Rosetta Translations for Django Applications
Make a directory called locale/ under the application directory:
cd app_name mkdir locale Add the folder to the LOCALE_PATHS tuple in settings.py:
LOCALE_PATHS = (
    os.path.join(PROJECT_ROOT, 'app_name', 'locale/'),
) Run the following commands to create and compile the PO translation file for the application:
python ../manage.py makemessages -l tr -e html,py,txt python ../manage.py compilemessages Option -l is for language; it should match your definition in settings.py:
LANGUAGES = (
    ('en', _('English')),
    ('tr', _('Turkish')),
    ('it', _('Italian')),
) Repeat the last step for all languages and then go to the Rosetta URL to translate.
Blog
Django Rosetta Installation
Install SciPy:
sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose Install pymongo and nltk:
sudo pip install pymongo sudo pip install nltk Install Python MySQLdb:
sudo apt-get install python-mysqldb Install Rosetta:
sudo pip install django-rosetta Add the following into INSTALLED_APPS in settings.py:
"rosetta", Add the following into urls.py:
url(r'^translations/', include('rosetta.urls')), To also allow language prefixes, change patterns to i18n_patterns in urls.py:
urlpatterns += i18n_patterns(
    ...
)
Tag: templates
Blog
Setting Up Templates and Python Scripts for Translation
Templates need following template tag:
1{% raw %}{% load i18n %}{% endraw %} Then, wrapping any text with
1{% raw %}{% trans "TEXT" %}{% endraw %} will make it translatable via Rosetta Django application
In Python scripts, you need to import following library:
from django.utils.translation import ugettext_lazy as _ Then wrapping any text with
1_('TEXT') will make it translatable.
Tag: django translation
Blog
Django Rosetta Translations for Django Applications
Make a directory called locale/ under the application directory:
cd app_name mkdir locale Add the folder to the LOCALE_PATHS tuple in settings.py:
LOCALE_PATHS = (
    os.path.join(PROJECT_ROOT, 'app_name', 'locale/'),
) Run the following commands to create and compile the PO translation file for the application:
python ../manage.py makemessages -l tr -e html,py,txt python ../manage.py compilemessages Option -l is for language; it should match your definition in settings.py:
LANGUAGES = (
    ('en', _('English')),
    ('tr', _('Turkish')),
    ('it', _('Italian')),
) Repeat the last step for all languages and then go to the Rosetta URL to translate.
Tag: django-rosetta
Blog
Django Rosetta Translations for Django Applications
Make a directory called locale/ under the application directory:
cd app_name mkdir locale Add the folder to the LOCALE_PATHS tuple in settings.py:
LOCALE_PATHS = (
    os.path.join(PROJECT_ROOT, 'app_name', 'locale/'),
) Run the following commands to create and compile the PO translation file for the application:
python ../manage.py makemessages -l tr -e html,py,txt python ../manage.py compilemessages Option -l is for language; it should match your definition in settings.py:
LANGUAGES = (
    ('en', _('English')),
    ('tr', _('Turkish')),
    ('it', _('Italian')),
) Repeat the last step for all languages and then go to the Rosetta URL to translate.
Tag: localization
Blog
Django Rosetta Installation
Install SciPy:
sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose Install pymongo and nltk:
sudo pip install pymongo sudo pip install nltk Install Python MySQLdb:
sudo apt-get install python-mysqldb Install Rosetta:
sudo pip install django-rosetta Add the following into INSTALLED_APPS in settings.py:
"rosetta", Add the following into urls.py:
url(r'^translations/', include('rosetta.urls')), To also allow language prefixes, change patterns to i18n_patterns in urls.py:
urlpatterns += i18n_patterns(
    ...
)
Tag: cheminformatics
Blog
Obtaining Molecule Description using Open Babel / PyBel
Open Babel is a great tool to analyze and investigate molecular data (.MOL, .SDF files). Its Python API is particularly nice if you are already familiar with Python. In this post, I'll demonstrate how you can obtain molecule descriptors such as molecular weight, HBA, HBD, logP, formula and number of chiral centers using PyBel.
Installation
$ sudo apt-get install openbabel python-openbabel Usage for MW, HBA, HBD, logP
After reading the .MOL file, we need to use the calcdesc method with the descnames argument to get the descriptors.
Tag: chiral centers
Blog
Obtaining Molecule Description using Open Babel / PyBel
Open Babel is a great tool to analyze and investigate molecular data (.MOL, .SDF files). Its Python API is particularly nice if you are already familiar with Python. In this post, I'll demonstrate how you can obtain molecule descriptors such as molecular weight, HBA, HBD, logP, formula and number of chiral centers using PyBel.
Installation
$ sudo apt-get install openbabel python-openbabel Usage for MW, HBA, HBD, logP
After reading the .MOL file, we need to use the calcdesc method with the descnames argument to get the descriptors.
Tag: clogp
Blog
Obtaining Molecule Description using Open Babel / PyBel
Open Babel is a great tool to analyze and investigate molecular data (.MOL, .SDF files). Its Python API is particularly nice if you are already familiar with Python. In this post, I'll demonstrate how you can obtain molecule descriptors such as molecular weight, HBA, HBD, logP, formula and number of chiral centers using PyBel.
Installation
$ sudo apt-get install openbabel python-openbabel Usage for MW, HBA, HBD, logP
After reading the .MOL file, we need to use the calcdesc method with the descnames argument to get the descriptors.
Tag: hba
Blog
Obtaining Molecule Description using Open Babel / PyBel
Open Babel is a great tool to analyze and investigate molecular data (.MOL, .SDF files). Its Python API is particularly nice if you are already familiar with Python. In this post, I'll demonstrate how you can obtain molecule descriptors such as molecular weight, HBA, HBD, logP, formula and number of chiral centers using PyBel.
Installation
$ sudo apt-get install openbabel python-openbabel Usage for MW, HBA, HBD, logP
After reading the .MOL file, we need to use the calcdesc method with the descnames argument to get the descriptors.
Tag: hbd
Blog
Obtaining Molecule Description using Open Babel / PyBel
Open Babel is a great tool to analyze and investigate molecular data (.MOL, .SDF files). Its Python API is particularly nice if you are already familiar with Python. In this post, I'll demonstrate how you can obtain molecule descriptors such as molecular weight, HBA, HBD, logP, formula and number of chiral centers using PyBel.
Installation
$ sudo apt-get install openbabel python-openbabel Usage for MW, HBA, HBD, logP
After reading the .MOL file, we need to use the calcdesc method with the descnames argument to get the descriptors.
Tag: logp
Blog
Obtaining Molecule Description using Open Babel / PyBel
Open Babel is a great tool to analyze and investigate molecular data (.MOL, .SDF files). Its Python API is particularly nice if you are already familiar with Python. In this post, I'll demonstrate how you can obtain molecule descriptors such as molecular weight, HBA, HBD, logP, formula and number of chiral centers using PyBel.
Installation
$ sudo apt-get install openbabel python-openbabel Usage for MW, HBA, HBD, logP
After reading the .MOL file, we need to use the calcdesc method with the descnames argument to get the descriptors.
Tag: molecular weight
Blog
Obtaining Molecule Description using Open Babel / PyBel
Open Babel is a great tool to analyze and investigate molecular data (.MOL, .SDF files). Its Python API is particularly nice if you are already familiar with Python. In this post, I'll demonstrate how you can obtain molecule descriptors such as molecular weight, HBA, HBD, logP, formula and number of chiral centers using PyBel.
Installation
$ sudo apt-get install openbabel python-openbabel Usage for MW, HBA, HBD, logP
After reading the .MOL file, we need to use the calcdesc method with the descnames argument to get the descriptors.
Tag: molecule
Blog
Obtaining Molecule Description using Open Babel / PyBel
Open Babel is a great tool to analyze and investigate molecular data (.MOL, .SDF files). Its Python API is particularly nice if you are already familiar with Python. In this post, I'll demonstrate how you can obtain molecule descriptors such as molecular weight, HBA, HBD, logP, formula and number of chiral centers using PyBel.
Installation
$ sudo apt-get install openbabel python-openbabel Usage for MW, HBA, HBD, logP
After reading the .MOL file, we need to use the calcdesc method with the descnames argument to get the descriptors.
Tag: pybel
Blog
Obtaining Molecule Description using Open Babel / PyBel
Open Babel is a great tool to analyze and investigate molecular data (.MOL, .SDF files). Its Python API is particularly nice if you are already familiar with Python. In this post, I'll demonstrate how you can obtain molecule descriptors such as molecular weight, HBA, HBD, logP, formula and number of chiral centers using PyBel.
Installation
$ sudo apt-get install openbabel python-openbabel Usage for MW, HBA, HBD, logP
After reading the .MOL file, we need to use the calcdesc method with the descnames argument to get the descriptors.
Tag: csh
Blog
Running Script on Cluster (StarCluster)
Start a new cluster with the configuration file you modified:
starcluster start cluster_name Send the script to the running cluster:
starcluster put cluster_name myscr.csh /home/myscr.csh Run it using source:
starcluster sshmaster cluster_name "source /home/myscr.csh >& /home/myscr.log"
Tag: put
Blog
Running Script on Cluster (StarCluster)
Start a new cluster with the configuration file you modified:
starcluster start cluster_name Send the script to the running cluster:
starcluster put cluster_name myscr.csh /home/myscr.csh Run it using source:
starcluster sshmaster cluster_name "source /home/myscr.csh >& /home/myscr.log"
Tag: script
Blog
Running Script on Cluster (StarCluster)
Start a new cluster with the configuration file you modified:
starcluster start cluster_name Send the script to the running cluster:
starcluster put cluster_name myscr.csh /home/myscr.csh Run it using source:
starcluster sshmaster cluster_name "source /home/myscr.csh >& /home/myscr.log"
Blog
Examining MegaBLAST Results - Parsing
The last step in the pipeline is to examine the outputs produced for the searched sequences with another script. This step reads each megablast file, stores the values of the sequences' parameters such as name, identity and overlapping length, and prints them to the screen in a way suited to the task at hand.
In my project I use a parser called Inslink from the HUSAR package, which returns the fields mentioned above to me as an array. The only thing this parser does is read the file and store the values of the requested fields.
Afterwards, I display these stored values by extending the code, and with a few additional lines of code I show the meaningful results I need.
Blog
Examining the New Dataset
Because the previous data I used for testing while designing the pipeline was very poor, I obtained a new dataset. Of course, during the testing phase it is useful to use multiple datasets with different characteristics. However, I can say that the previous dataset was too poor to give even a few meaningful results. You can see the details [here]({% post_url 2012-07-06-eslestirme-ve-eslesmeyen-okumalari %}).
The new dataset is again human genome data; the BAM file was 1.8 GB in size and contained both mapped and unmapped reads. Using the bam2fastq tool, I both converted this BAM file to a FASTQ file and, by filtering out the mapped reads, 0.
Blog
Running MegaBLAST from Multiple Sequence Files
I wrote the script below to fit a technique we came up with to make the pipeline's MegaBLAST search faster. What it does is use the sequence files that were created and formatted for each read to search the databases, given a specified starting point and number of reads.
#!/usr/local/bin/perl

$database = $ARGV[0];
$dir = $ARGV[1];  # directory for sequences
$sp = $ARGV[2];   # starting point
$n = $ARGV[3] + $sp;

while (1) {
    system("blastplus -programname=megablast $dir/read_$sp.seq $database -OUTFILE=read_$sp.megablast -nobatch -d");
    $sp++;
    last if ($sp == $n);
}
Everything here works through really very simple programming.
Blog
Perl Script for FASTQ to FASTA Conversion
FASTQ and FASTA are file formats that actually contain the same information, except that one simply has two fewer lines of information per sequence. Another difference that matters for my project is that the FASTA format can be used directly for MegaBLAST searches. That is why I need to convert the FASTQ format produced by sequencing machines to FASTA. And this script is the first step of the pipeline.
Actually, as a preliminary step, I had already aligned the test genetic sequence data myself, since it had not been aligned by the party who delivered it to me.
Tag: shell
Blog
Running Script on Cluster (StarCluster)
Start a new cluster with the configuration file you modified:
starcluster start cluster_name Send the script to the running cluster:
starcluster put cluster_name myscr.csh /home/myscr.csh Run it using source:
starcluster sshmaster cluster_name "source /home/myscr.csh >& /home/myscr.log"
Tag: sshmaster
Blog
Running Script on Cluster (StarCluster)
Start a new cluster with the configuration file you modified:
starcluster start cluster_name Send the script to the running cluster:
starcluster put cluster_name myscr.csh /home/myscr.csh Run it using source:
starcluster sshmaster cluster_name "source /home/myscr.csh >& /home/myscr.log"
Tag: scp
Blog
Uploading Files to AWS using SSH/SCP
Here is a small command for uploading files to AWS through SSH's scp command (secure copy).
scp -i path/to/your/key-pairs/file path/to/file/you/want/to/upload ubuntu@PUBLIC_DNS:path/to/the/destination
Tag: upload
Blog
Uploading Files to AWS using SSH/SCP
Here is a small command for uploading files to AWS through SSH's scp command (secure copy).
scp -i path/to/your/key-pairs/file path/to/file/you/want/to/upload ubuntu@PUBLIC_DNS:path/to/the/destination
Tag: permission denied
Blog
Errno 13 Permission denied Django File Uploads
Run the following command to give www-data permissions to the static folder and all its content:
cd path/to/your/django/project sudo chown -R www-data:www-data static/ Do this on your production server
Blog
session_start() Permission denied (13) Laravel 4
Solve it by running the following lines:
chmod -R 755 /path/to/your/laravel/directory chmod -R o+w /path/to/your/laravel/directory And/or maybe:
sudo chown -R www-data:user /path/to/your/laravel/directory
Blog
Permission Issues develop Laravel 4 on Ubuntu 12.04 LTS
If your CSS or JS files don't seem to load, or you get 403 Forbidden or Permission denied, all you need to do is run the following in a terminal:
sudo chmod -R 755 /path/to/your/laravel/directory
Tag: apache
Blog
Configuring Mezzanine for Apache server & mod_wsgi in AWS
Install [Mezzanine]({% post_url 2015-05-01-how-to-install-mezzanine-on-ubuntulinux-mint %}), [Apache server]({% post_url 2015-05-08-getting-started-with-your-aws-instance-and %}) and mod_wsgi:
sudo apt-get install libapache2-mod-wsgi sudo a2enmod wsgi Set up a MySQL database for your Mezzanine project
Read [my post on how to set up a MySQL database for a Mezzanine project]({% post_url 2015-05-09-how-to-set-up-a-mysql-database-for-a-mezzanine %})
Collect static files:
python manage.py collectstatic Configure your Apache server for the project like the following:
WSGIPythonPath /home/ubuntu/www/mezzanine-project <VirtualHost *:80> #ServerName example.com ServerAdmin admin@example.com DocumentRoot /home/ubuntu/www/mezzanine-project WSGIScriptAlias / /home/ubuntu/www/mezzanine-project/wsgi.
Tag: wsgi
Blog
Configuring Mezzanine for Apache server & mod_wsgi in AWS
Install [Mezzanine]({% post_url 2015-05-01-how-to-install-mezzanine-on-ubuntulinux-mint %}), [Apache server]({% post_url 2015-05-08-getting-started-with-your-aws-instance-and %}) and mod_wsgi:
sudo apt-get install libapache2-mod-wsgi sudo a2enmod wsgi Set up a MySQL database for your Mezzanine project
Read [my post on how to set up a MySQL database for a Mezzanine project]({% post_url 2015-05-09-how-to-set-up-a-mysql-database-for-a-mezzanine %})
Collect static files:
python manage.py collectstatic Configure your Apache server for the project like the following:
WSGIPythonPath /home/ubuntu/www/mezzanine-project <VirtualHost *:80> #ServerName example.com ServerAdmin admin@example.com DocumentRoot /home/ubuntu/www/mezzanine-project WSGIScriptAlias / /home/ubuntu/www/mezzanine-project/wsgi.
Tag: custom tcp
Blog
Setting Up Mezzanine Projects in AWS
Go to EC2 management console, Security groups and add a Custom TCP inbound rule with port 8000. Select “Anywhere” from the list.
Then follow [this to install Mezzanine]({% post_url 2015-05-01-how-to-install-mezzanine-on-ubuntulinux-mint %})
The tutorial above also explains setting up a site record. Mezzanine's default site record is 127.0.0.1:8000, which should be 0.0.0.0:8000 in our case. So, enter 0.0.0.0:8000 when you're asked for a site record when you run
python manage.py createdb Also, you might still need to provide this site record while running the development server:
Tag: ec2
Blog
Setting Up Mezzanine Projects in AWS
Go to EC2 management console, Security groups and add a Custom TCP inbound rule with port 8000. Select “Anywhere” from the list.
Then follow [this to install Mezzanine]({% post_url 2015-05-01-how-to-install-mezzanine-on-ubuntulinux-mint %})
The tutorial above also explains setting up a site record. Mezzanine's default site record is 127.0.0.1:8000, which should be 0.0.0.0:8000 in our case. So, enter 0.0.0.0:8000 when you're asked for a site record when you run
python manage.py createdb Also, you might still need to provide this site record while running the development server:
Tag: chmod
Blog
AWS Start an Instance and Connect to it
Go to EC2 management console
Create a new key-pair if necessary and download it
Launch an instance
Add HTTP security group for web applications over HTTP
Get public DNS
Change permissions on the key-pair file:
chmod 400 path/to/your/file.pem Connect:
ssh -i path/to/your/file.pem ubuntu@PUBLIC_DNS Note: ubuntu is for connecting to an Ubuntu 64-bit instance. It's different for others
Tag: launch instance
Blog
AWS Start an Instance and Connect to it
Go to EC2 management console
Create a new key-pair if necessary and download it
Launch an instance
Add HTTP security group for web applications over HTTP
Get public DNS
Change permissions on the key-pair file:
chmod 400 path/to/your/file.pem Connect:
ssh -i path/to/your/file.pem ubuntu@PUBLIC_DNS Note: ubuntu is for connecting to an Ubuntu 64-bit instance. It's different for others
Tag: directory
Blog
How to Get Path to or Directory of Current Script in R
Use the following code to get the path to, or the directory of, the current (running) script in R:
scr_dir <- dirname(sys.frame(1)$ofile) scr_path <- paste(scr_dir, "script.R", sep="/") Taken from SO
Tag: path
Blog
How to Get Path to or Directory of Current Script in R
Use the following code to get the path to, or the directory of, the current (running) script in R:
scr_dir <- dirname(sys.frame(1)$ofile) scr_path <- paste(scr_dir, "script.R", sep="/") Taken from SO
Tag: rscript
Blog
How to Get Path to or Directory of Current Script in R
Use the following code to get the path to, or the directory of, the current (running) script in R:
scr_dir <- dirname(sys.frame(1)$ofile) scr_path <- paste(scr_dir, "script.R", sep="/") Taken from SO
Tag: bioconductor
Blog
How to Get (or Load) NCBI GEO Microarray Data into R using GEOquery Package from Bioconductor
R, especially with lots of Bioconductor packages, provides nice tools to load, manage and analyze microarray data. If you are trying to load NCBI GEO data into R, use the GEOquery package. Here, I'll describe how to get started with it; I'll probably mention more in future posts.
Installation
source("http://bioconductor.org/biocLite.R") biocLite("GEOquery") Usage
library(GEOquery) gds <- getGEO("GDS5072") or
library(GEOquery) gds <- getGEO(filename="path/to/GDS5072.soft.gz") The getGEO function returns a complex GDS class object which contains the complete dataset.
Blog
Progress on Network Inference Sub-Challenge
This sub-challenge has several requirements:
Directed and causal edges on the models (32 models: 4 cell lines × 8 stimuli). Edges should be scored (normalized to a range between 0 and 1) to show confidence. Nodes will be phosphoproteins from the data. A prior knowledge network (which can be constructed using pathway databases) is allowed (actually a must for some network inference tools). The first thing was to look for existing tools.
Tag: gds
Blog
How to Get (or Load) NCBI GEO Microarray Data into R using GEOquery Package from Bioconductor
R, especially with its many Bioconductor packages, provides nice tools to load, manage and analyze microarray data. If you are trying to load NCBI GEO data into R, use the GEOquery package. Here, I’ll describe how to get started with it, and I’ll probably mention more in future posts.
Installation
source("http://bioconductor.org/biocLite.R")
biocLite("GEOquery")
Usage
library(GEOquery)
gds <- getGEO("GDS5072")
or
library(GEOquery)
gds <- getGEO(filename="path/to/GDS5072.soft.gz")
The getGEO function returns a GDS object, a complex class that contains the complete dataset.
Tag: geo
Blog
How to Get (or Load) NCBI GEO Microarray Data into R using GEOquery Package from Bioconductor
R, especially with its many Bioconductor packages, provides nice tools to load, manage and analyze microarray data. If you are trying to load NCBI GEO data into R, use the GEOquery package. Here, I’ll describe how to get started with it, and I’ll probably mention more in future posts.
Installation
source("http://bioconductor.org/biocLite.R")
biocLite("GEOquery")
Usage
library(GEOquery)
gds <- getGEO("GDS5072")
or
library(GEOquery)
gds <- getGEO(filename="path/to/GDS5072.soft.gz")
The getGEO function returns a GDS object, a complex class that contains the complete dataset.
Tag: geoquery
Blog
How to Get (or Load) NCBI GEO Microarray Data into R using GEOquery Package from Bioconductor
R, especially with its many Bioconductor packages, provides nice tools to load, manage and analyze microarray data. If you are trying to load NCBI GEO data into R, use the GEOquery package. Here, I’ll describe how to get started with it, and I’ll probably mention more in future posts.
Installation
source("http://bioconductor.org/biocLite.R")
biocLite("GEOquery")
Usage
library(GEOquery)
gds <- getGEO("GDS5072")
or
library(GEOquery)
gds <- getGEO(filename="path/to/GDS5072.soft.gz")
The getGEO function returns a GDS object, a complex class that contains the complete dataset.
Tag: load microarray data
Blog
How to Get (or Load) NCBI GEO Microarray Data into R using GEOquery Package from Bioconductor
R, especially with its many Bioconductor packages, provides nice tools to load, manage and analyze microarray data. If you are trying to load NCBI GEO data into R, use the GEOquery package. Here, I’ll describe how to get started with it, and I’ll probably mention more in future posts.
Installation
source("http://bioconductor.org/biocLite.R")
biocLite("GEOquery")
Usage
library(GEOquery)
gds <- getGEO("GDS5072")
or
library(GEOquery)
gds <- getGEO(filename="path/to/GDS5072.soft.gz")
The getGEO function returns a GDS object, a complex class that contains the complete dataset.
Tag: microarray data
Blog
How to Get (or Load) NCBI GEO Microarray Data into R using GEOquery Package from Bioconductor
R, especially with its many Bioconductor packages, provides nice tools to load, manage and analyze microarray data. If you are trying to load NCBI GEO data into R, use the GEOquery package. Here, I’ll describe how to get started with it, and I’ll probably mention more in future posts.
Installation
source("http://bioconductor.org/biocLite.R")
biocLite("GEOquery")
Usage
library(GEOquery)
gds <- getGEO("GDS5072")
or
library(GEOquery)
gds <- getGEO(filename="path/to/GDS5072.soft.gz")
The getGEO function returns a GDS object, a complex class that contains the complete dataset.
Tag: ncbi
Blog
How to Get (or Load) NCBI GEO Microarray Data into R using GEOquery Package from Bioconductor
R, especially with its many Bioconductor packages, provides nice tools to load, manage and analyze microarray data. If you are trying to load NCBI GEO data into R, use the GEOquery package. Here, I’ll describe how to get started with it, and I’ll probably mention more in future posts.
Installation
source("http://bioconductor.org/biocLite.R")
biocLite("GEOquery")
Usage
library(GEOquery)
gds <- getGEO("GDS5072")
or
library(GEOquery)
gds <- getGEO(filename="path/to/GDS5072.soft.gz")
The getGEO function returns a GDS object, a complex class that contains the complete dataset.
Blog
An Exon of Length 2 Appeared in Ensembl
I want to share an interesting finding from our research on the exon/intron analysis of human evolutionary history.
I had the genes that emerged at each pass point of human history, and I was using the Ensembl API to get the exons and introns of these genes for further analyses.
One gene (ENSG00000197568, HERV-H LTR-associating 3, HHLA3) held a surprise: one of its transcripts (ENST00000432224) had an exon (ENSE00001707577) of length 2.
Blog
Obtaining Species Names with Regular Expressions
Since at the end of my project I will show the user the names of possible contaminating organisms (Latin species names), I need to obtain the organism name for each sequence using the accession numbers in the MegaBLAST results. I can do this with another system, available on the HUSAR servers, called the Sequence Retrieval System (SRS).
To get the organism name from SRS, it is enough to type the "getz" command on the Unix command line together with the database name, the accession number and the field I want to retrieve. Below, you can find sample code that does this job.
Blog
Database Selection
My goal in this project is to find possible contaminating organisms (contaminants), so I need a large database. However, while keeping the database large provides that advantage, searching it for every sequence requires considerable computing power and time. For this reason, while developing my project I am also examining various databases, and investigating how I can restrict them to make them best suited for my purpose.
I started with NCBI's Reference Sequence (RefSeq) database.
Blog
What Is Bioinformatics? The Definition of Bioinformatics
With the sequencing of many organisms and, finally, of the human genome in 2001, when the sequence of all 3 billion base pairs was obtained, fields emerged that would use this information in different ways. Alongside the fields trying to understand these genes and to determine the proteins they produce, the need to analyze this information gave birth to the field of bioinformatics.
Bioinformatics is the analysis of biological information using computers and statistical techniques; in other words, bioinformatics is the science of developing and benefiting from computer databases and algorithms in order to improve and accelerate biological research [1].
Tag: virtualenv
Blog
How to Install Mezzanine on Ubuntu/Linux Mint [Complete Guide]
Mezzanine is a CMS application built on the Django web framework. The installation steps are easy, but your environment may not be suitable for it to work without problems. So here I’m going to describe a complete installation from scratch in a virtual environment.
First of all, install virtualenv:
$ sudo apt-get install python-virtualenv
Then, create a virtual environment:
$ virtualenv testenv
And activate it:
$ cd testenv
$ source bin/activate
Tag: clear
Blog
How to Clear (or Drop) DB Table of A Django App
Let’s say you created a Django app and ran python manage.py syncdb to create its table. Every time you make a change to the table, you’ll need to drop that table and run python manage.py syncdb again to update it. Here is how you drop the tables of a Django app:
$ python manage.py sqlclear app_name | python manage.py dbshell
To drop the tables of an app with migrations (Django >= 1.8):
$ python manage.py migrate appname zero
Recreate all the tables:
Tag: django app
Blog
How to Clear (or Drop) DB Table of A Django App
Let’s say you created a Django app and ran python manage.py syncdb to create its table. Every time you make a change to the table, you’ll need to drop that table and run python manage.py syncdb again to update it. Here is how you drop the tables of a Django app:
$ python manage.py sqlclear app_name | python manage.py dbshell
To drop the tables of an app with migrations (Django >= 1.8):
$ python manage.py migrate appname zero
Recreate all the tables:
Tag: sqlite3
Blog
How to Clear (or Drop) DB Table of A Django App
Let’s say you created a Django app and ran python manage.py syncdb to create its table. Every time you make a change to the table, you’ll need to drop that table and run python manage.py syncdb again to update it. Here is how you drop the tables of a Django app:
$ python manage.py sqlclear app_name | python manage.py dbshell
To drop the tables of an app with migrations (Django >= 1.8):
$ python manage.py migrate appname zero
Recreate all the tables:
Tag: tables
Blog
How to Clear (or Drop) DB Table of A Django App
Let’s say you created a Django app and ran python manage.py syncdb to create its table. Every time you make a change to the table, you’ll need to drop that table and run python manage.py syncdb again to update it. Here is how you drop the tables of a Django app:
$ python manage.py sqlclear app_name | python manage.py dbshell
To drop the tables of an app with migrations (Django >= 1.8):
$ python manage.py migrate appname zero
Recreate all the tables:
Tag: betweenness centrality
Blog
Salmonella - Host Interaction Network - A Detailed, Better Visualization
We’re almost done with the analyses and we’re making the final visualization of the network. As I previously posted, the network was clustered and visualized by time points. After that, we have done several more analyses and here I report how we visualized them. I’m going to post more about how we did the analyses separately.
First, the nodes are grouped into experimental and non-experimental (PCSF) nodes. This can easily be done by parsing the experimental network output and the network outputs of PCSF.
Tag: cytoscape
Blog
Salmonella - Host Interaction Network - A Detailed, Better Visualization
We’re almost done with the analyses and we’re making the final visualization of the network. As I previously posted, the network was clustered and visualized by time points. After that, we have done several more analyses and here I report how we visualized them. I’m going to post more about how we did the analyses separately.
First, the nodes are grouped into experimental and non-experimental (PCSF) nodes. This can easily be done by parsing the experimental network output and the network outputs of PCSF.
Blog
Network Clustering with NeAT - RNSC Algorithm
We obtained proteins at different time points from the experimental data, then found intermediate nodes (from the human interactome) using the PCSF algorithm, and finally, with a special matrix derived from the network that PCSF created, we validated the edges and determined edge directions using a divide-and-conquer ILP (integer linear programming) approach for the construction of large-scale signaling networks from PPI data. The resulting network is directed and will be used and visualized for further analyses.
Blog
Network Visualization Using Cytoscape
Cytoscape is a nice tool for visualizing networks for better understanding and presentation. I used it for the in silico data network visualization and the result was really pretty. Now, I have networks constructed using experimental data from the HPN-DREAM Challenge.
In this post, I want to demonstrate how to visualize a network with scores. I’m using Cytoscape 2.8 on Ubuntu 12.
First, the network will be read from a SIF file, which is Cytoscape’s default format for networks.
Blog
In silico Network Inference Last Improvements and Visualization of Result in Cytoscape
I’m almost done with the analysis of the in silico data, although I need to decide whether I need further analysis of the inhibiting parent nodes in the network. Previously, I couldn’t filter out duplicate edges that were scored differently. Now, with some improvements in the script, low-scoring duplicates are filtered out and there is a better final list of edges, ready to be visualized.
I also tried visualizing it on Cytoscape.
Tag: in degree
Blog
Salmonella - Host Interaction Network - A Detailed, Better Visualization
We’re almost done with the analyses and we’re making the final visualization of the network. As I previously posted, the network was clustered and visualized by time points. After that, we have done several more analyses and here I report how we visualized them. I’m going to post more about how we did the analyses separately.
First, the nodes are grouped into experimental and non-experimental (PCSF) nodes. This can easily be done by parsing the experimental network output and the network outputs of PCSF.
Tag: network
Blog
Salmonella - Host Interaction Network - A Detailed, Better Visualization
We’re almost done with the analyses and we’re making the final visualization of the network. As I previously posted, the network was clustered and visualized by time points. After that, we have done several more analyses and here I report how we visualized them. I’m going to post more about how we did the analyses separately.
First, the nodes are grouped into experimental and non-experimental (PCSF) nodes. This can easily be done by parsing the experimental network output and the network outputs of PCSF.
Blog
GO Enrichment of Network Clusters
In my previous post, I mentioned how I clustered the network we obtained at the end. For functional annotation, gene ontology (GO) enrichment has been done on these clusters.
There were 20 clusters; the HGNC names were obtained separately for each cluster, and using the DAVID functional annotation tool API, GO and pathway annotations were collected per cluster and saved separately.
http://david.abcc.ncifcrf.gov/api.jsp?type=OFFICIAL_GENE_SYMBOL&tool=chartReport&annot=GOTERM_BP_FAT,GOTERM_CC_FAT,GOTERM_MF_FAT,BBID,BIOCARTA,KEGG_PATHWAY&ids=HGNC_NAME1,HGNC_NAME2,HGNC_NAME3,...
The above URL was used to obtain the chart report for the GO and pathway chart records.
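That request URL can be assembled programmatically before fetching the chart report. A minimal sketch; the gene symbols below are hypothetical placeholders, not from the actual clusters:

```python
from urllib.parse import urlencode

# Hypothetical example symbols -- substitute the HGNC names of one cluster.
genes = ["TP53", "EGFR", "AKT1"]

base = "http://david.abcc.ncifcrf.gov/api.jsp"
params = {
    "type": "OFFICIAL_GENE_SYMBOL",
    "tool": "chartReport",
    "annot": "GOTERM_BP_FAT,GOTERM_CC_FAT,GOTERM_MF_FAT,"
             "BBID,BIOCARTA,KEGG_PATHWAY",
    "ids": ",".join(genes),
}
# safe="," keeps the comma-separated lists unescaped, matching the URL above.
url = base + "?" + urlencode(params, safe=",")
```

The resulting URL can then be fetched once per cluster and the returned chart report saved to a separate file.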
Blog
Network Clustering with NeAT - RNSC Algorithm
We obtained proteins at different time points from the experimental data, then found intermediate nodes (from the human interactome) using the PCSF algorithm, and finally, with a special matrix derived from the network that PCSF created, we validated the edges and determined edge directions using a divide-and-conquer ILP (integer linear programming) approach for the construction of large-scale signaling networks from PPI data. The resulting network is directed and will be used and visualized for further analyses.
Blog
Reconstructed Salmonella Signaling Network Visualized and Colored
After fold changes were obtained and HGNC names were found for each phosphopeptide, these were used to construct the Salmonella signaling network using PCSF. Then, together with the nodes that PCSF found, we generated a matrix with nodes in the rows and time points in the columns, where each cell shows the presence of the corresponding protein at the corresponding time point(s).
The matrix has 658 nodes (proteins) and 4 time points as indicated before: 2 min, 5 min, 10 min and 20 min.
Blog
Multi-dimensional Modeling and Reconstruction of Signaling Networks in Salmonella-infected Human Cells
In this study, we’re going to use phosphorylation data from a research paper on the phosphoproteomic analysis of related cells.
The idea is to use and compare existing methods and develop these methods to be able to better understand the nature of signaling events in these cells and to find key proteins that might be targets for disease diagnosis, prevention and treatment.
This study will be submitted as a research paper so I’m not going to publish any results here for now but I’ll mention the struggles I have and solutions I try to solve them.
Blog
Last Submissions to the Challenge
Today, I submitted in silico and experimental data network inference results on Synapse for the next leaderboard on this Wednesday.
For experimental part, I had to exclude edges with FGFR1 and FGFR3 because the data lacks phosphorylated forms of these proteins and networks must be constructed using only phosphoproteins in the data.
Since there was an update for in silico part, I had to modify the script and resubmit the results.
Blog
Network Visualization Using Cytoscape
Cytoscape is a nice tool for visualizing networks for better understanding and presentation. I used it for the in silico data network visualization and the result was really pretty. Now, I have networks constructed using experimental data from the HPN-DREAM Challenge.
In this post, I want to demonstrate how to visualize a network with scores. I’m using Cytoscape 2.8 on Ubuntu 12.
First, the network will be read from a SIF file, which is Cytoscape’s default format for networks.
Blog
Plotting Expression Curves for Experimental Data
Now that I can plot expression curves for in silico data, I moved on to the experimental data, which is larger and more complex. This data is the result of RPPA experiments on different breast cancer cell lines, and it includes protein abundance measurements for about 45 phosphoproteins. These phosphoproteins are treated with different inhibitors and stimuli, and by comparing their expression, I will try to infer relations between them.
Before moving on to the inference part, I want to have a script that can plot the graphs so that I can see particular results for specific cases.
Blog
Experimental Data Optimization for Network Inference
As I mentioned in my previous post, the experimental data from the challenge has missing values that create problems during analyses. To solve this, the first thing I did was to optimize the data, which included detecting missing conditions, putting NAs in for the missing values and sorting them where necessary.
I wrote two functions in the script. First one ranks the data according to the fashion and sorts it based on these ranks.
Blog
Working with Experimental Data from Network Inference Challenge
Having almost finished with the in silico data, I moved on to analyses of the experimental data using the same script. But since the characteristics of this data are somewhat different, I need to modify the script to be able to read the experimental data files before inferring networks.
These differences include missing data values for some conditions. This makes the analyses difficult because I have to estimate values for them, and this will decrease the confidence scores of the edges.
Blog
In silico Network Inference Last Improvements and Visualization of Result in Cytoscape
I’m almost done with the analysis of the in silico data, although I need to decide whether I need further analysis of the inhibiting parent nodes in the network. Previously, I couldn’t filter out duplicate edges that were scored differently. Now, with some improvements in the script, low-scoring duplicates are filtered out and there is a better final list of edges, ready to be visualized.
I also tried visualizing it on Cytoscape.
Tag: network visualization
Blog
Salmonella - Host Interaction Network - A Detailed, Better Visualization
We’re almost done with the analyses and we’re making the final visualization of the network. As I previously posted, the network was clustered and visualized by time points. After that, we have done several more analyses and here I report how we visualized them. I’m going to post more about how we did the analyses separately.
First, the nodes are grouped into experimental and non-experimental (PCSF) nodes. This can easily be done by parsing the experimental network output and the network outputs of PCSF.
Tag: networkx
Blog
Salmonella - Host Interaction Network - A Detailed, Better Visualization
We’re almost done with the analyses and we’re making the final visualization of the network. As I previously posted, the network was clustered and visualized by time points. After that, we have done several more analyses and here I report how we visualized them. I’m going to post more about how we did the analyses separately.
First, the nodes are grouped into experimental and non-experimental (PCSF) nodes. This can easily be done by parsing the experimental network output and the network outputs of PCSF.
Blog
Finding k-cores and Clustering Coefficient Computation with NetworkX
Assume you have a large network, and you want to find the k-core of each node and also compute the clustering coefficient of each one. The Python package NetworkX comes with very nice methods to do both easily.
A k-core is a maximal subgraph whose nodes all have degree at least k [1]. To find k-cores:
Add all the edges in your network to a NetworkX graph, and use the core_number method, which takes the graph as its single input and returns node – k-core pairs.
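As a minimal sketch, both computations are one call each in NetworkX (the toy graph here is my own example: a triangle with a pendant node):

```python
import networkx as nx

# Toy network: a triangle (1-2-3) with a pendant node 4 attached to node 3.
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (1, 3), (3, 4)])

# core_number takes the graph as its single input and returns {node: k-core}.
cores = nx.core_number(G)   # {1: 2, 2: 2, 3: 2, 4: 1}

# clustering returns each node's clustering coefficient: node 3 has three
# neighbours but only one edge among them, so its coefficient is 1/3.
coeffs = nx.clustering(G)
```

On a real network you would read the edge list from a file and pass it to add_edges_from in the same way.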
Tag: out degree
Blog
Salmonella - Host Interaction Network - A Detailed, Better Visualization
We’re almost done with the analyses and we’re making the final visualization of the network. As I previously posted, the network was clustered and visualized by time points. After that, we have done several more analyses and here I report how we visualized them. I’m going to post more about how we did the analyses separately.
First, the nodes are grouped into experimental and non-experimental (PCSF) nodes. This can easily be done by parsing the experimental network output and the network outputs of PCSF.
Tag: path damaging
Blog
Salmonella - Host Interaction Network - A Detailed, Better Visualization
We’re almost done with the analyses and we’re making the final visualization of the network. As I previously posted, the network was clustered and visualized by time points. After that, we have done several more analyses and here I report how we visualized them. I’m going to post more about how we did the analyses separately.
First, the nodes are grouped into experimental and non-experimental (PCSF) nodes. This can easily be done by parsing the experimental network output and the network outputs of PCSF.
Tag: pcsf
Blog
Salmonella - Host Interaction Network - A Detailed, Better Visualization
We’re almost done with the analyses and we’re making the final visualization of the network. As I previously posted, the network was clustered and visualized by time points. After that, we have done several more analyses and here I report how we visualized them. I’m going to post more about how we did the analyses separately.
First, the nodes are grouped into experimental and non-experimental (PCSF) nodes. This can easily be done by parsing the experimental network output and the network outputs of PCSF.
Blog
Network Clustering with NeAT - RNSC Algorithm
We obtained proteins at different time points from the experimental data, then found intermediate nodes (from the human interactome) using the PCSF algorithm, and finally, with a special matrix derived from the network that PCSF created, we validated the edges and determined edge directions using a divide-and-conquer ILP (integer linear programming) approach for the construction of large-scale signaling networks from PPI data. The resulting network is directed and will be used and visualized for further analyses.
Blog
Reconstructed Salmonella Signaling Network Visualized and Colored
After fold changes were obtained and HGNC names were found for each phosphopeptide, these were used to construct the Salmonella signaling network using PCSF. Then, together with the nodes that PCSF found, we generated a matrix with nodes in the rows and time points in the columns, where each cell shows the presence of the corresponding protein at the corresponding time point(s).
The matrix has 658 nodes (proteins) and 4 time points as indicated before: 2 min, 5 min, 10 min and 20 min.
Tag: visualization
Blog
Salmonella - Host Interaction Network - A Detailed, Better Visualization
We’re almost done with the analyses and we’re making the final visualization of the network. As I previously posted, the network was clustered and visualized by time points. After that, we have done several more analyses and here I report how we visualized them. I’m going to post more about how we did the analyses separately.
First, the nodes are grouped into experimental and non-experimental (PCSF) nodes. This can easily be done by parsing the experimental network output and the network outputs of PCSF.
Blog
Network Clustering with NeAT - RNSC Algorithm
We obtained proteins at different time points from the experimental data, then found intermediate nodes (from the human interactome) using the PCSF algorithm, and finally, with a special matrix derived from the network that PCSF created, we validated the edges and determined edge directions using a divide-and-conquer ILP (integer linear programming) approach for the construction of large-scale signaling networks from PPI data. The resulting network is directed and will be used and visualized for further analyses.
Blog
In silico Network Inference Last Improvements and Visualization of Result in Cytoscape
I’m almost done with the analysis of the in silico data, although I need to decide whether I need further analysis of the inhibiting parent nodes in the network. Previously, I couldn’t filter out duplicate edges that were scored differently. Now, with some improvements in the script, low-scoring duplicates are filtered out and there is a better final list of edges, ready to be visualized.
I also tried visualizing it on Cytoscape.
Blog
DREAM Breast Cancer Sub-challenges
I have been going over the sub-challenges before attempting to solve them. As I mentioned, there are three sub-challenges, and they are connected to each other.
First, using the given data and other possible data sources such as pathway databases, the causal signaling network of the phosphoproteins will be inferred. There are 4 cell lines and 8 stimuli, so they make 32 networks in total. Nodes are phosphoproteins, and edges should be directed and causal (activator or inhibitor).
Tag: chart reports
Blog
GO Enrichment of Network Clusters
In my previous post, I mentioned how I clustered the network we obtained at the end. For functional annotation, gene ontology (GO) enrichment has been done on these clusters.
There were 20 clusters; the HGNC names were obtained separately for each cluster, and using the DAVID functional annotation tool API, GO and pathway annotations were collected per cluster and saved separately.
http://david.abcc.ncifcrf.gov/api.jsp?type=OFFICIAL_GENE_SYMBOL&tool=chartReport&annot=GOTERM_BP_FAT,GOTERM_CC_FAT,GOTERM_MF_FAT,BBID,BIOCARTA,KEGG_PATHWAY&ids=HGNC_NAME1,HGNC_NAME2,HGNC_NAME3,...
The above URL was used to obtain the chart report for the GO and pathway chart records.
Tag: clustering
Blog
GO Enrichment of Network Clusters
In my previous post, I mentioned how I clustered the network we obtained at the end. For functional annotation, gene ontology (GO) enrichment has been done on these clusters.
There were 20 clusters; the HGNC names were obtained separately for each cluster, and using the DAVID functional annotation tool API, GO and pathway annotations were collected per cluster and saved separately.
http://david.abcc.ncifcrf.gov/api.jsp?type=OFFICIAL_GENE_SYMBOL&tool=chartReport&annot=GOTERM_BP_FAT,GOTERM_CC_FAT,GOTERM_MF_FAT,BBID,BIOCARTA,KEGG_PATHWAY&ids=HGNC_NAME1,HGNC_NAME2,HGNC_NAME3,...
The above URL was used to obtain the chart report for the GO and pathway chart records.
Blog
Network Clustering with NeAT - RNSC Algorithm
We obtained proteins at different time points from the experimental data, then found intermediate nodes (from the human interactome) using the PCSF algorithm, and finally, with a special matrix derived from the network that PCSF created, we validated the edges and determined edge directions using a divide-and-conquer ILP (integer linear programming) approach for the construction of large-scale signaling networks from PPI data. The resulting network is directed and will be used and visualized for further analyses.
Blog
Finding k-cores and Clustering Coefficient Computation with NetworkX
Assume you have a large network, and you want to find the k-core of each node and also compute the clustering coefficient of each one. The Python package NetworkX comes with very nice methods to do both easily.
A k-core is a maximal subgraph whose nodes all have degree at least k [1]. To find k-cores:
Add all the edges in your network to a NetworkX graph, and use the core_number method, which takes the graph as its single input and returns node – k-core pairs.
Blog
UPGMA Algorithm Described - Unweighted Pair-Group Method with Arithmetic Mean
UPGMA is an agglomerative clustering algorithm, developed by Sokal and Michener in 1958, that is ultrametric (it assumes a molecular clock, i.e. that all lineages evolve at a constant rate).
The idea is to iterate until only one cluster remains: at each iteration, the two nearest clusters are joined (becoming a higher-level cluster). The distance between any two clusters is calculated by averaging the distances between the elements of each cluster.
To understand better, see UPGMA worked example by Dr Richard Edwards.
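The iteration above can be sketched in a few lines of Python. This is a minimal illustration, not an efficient implementation; the input format and cluster labels are of my own choosing:

```python
def upgma(dist, sizes):
    """dist: {frozenset({a, b}): distance}; sizes: {cluster: element count}.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples."""
    d, sizes = dict(dist), dict(sizes)
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)            # join the two nearest clusters
        a, b = sorted(pair)
        na, nb = sizes.pop(a), sizes.pop(b)
        merges.append((a, b, d.pop(pair)))
        new = f"({a},{b})"
        for c in sizes:
            # Distance to the merged cluster is the size-weighted mean of the
            # old distances, i.e. the average over all element pairs.
            dac = d.pop(frozenset({a, c}))
            dbc = d.pop(frozenset({b, c}))
            d[frozenset({new, c})] = (na * dac + nb * dbc) / (na + nb)
        sizes[new] = na + nb
    return merges

merges = upgma(
    {frozenset({"A", "B"}): 2, frozenset({"A", "C"}): 4, frozenset({"B", "C"}): 4},
    {"A": 1, "B": 1, "C": 1},
)
# merges == [("A", "B", 2), ("(A,B)", "C", 4.0)]
```

A and B (distance 2) merge first; the distance from (A,B) to C is then the plain average (4 + 4) / 2 = 4, as UPGMA prescribes.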
Tag: david
Blog
GO Enrichment of Network Clusters
In my previous post, I mentioned how I clustered the network we obtained at the end. For functional annotation, gene ontology (GO) enrichment has been done on these clusters.
There were 20 clusters; the HGNC names were obtained separately for each cluster, and using the DAVID functional annotation tool API, GO and pathway annotations were collected per cluster and saved separately.
http://david.abcc.ncifcrf.gov/api.jsp?type=OFFICIAL_GENE_SYMBOL&tool=chartReport&annot=GOTERM_BP_FAT,GOTERM_CC_FAT,GOTERM_MF_FAT,BBID,BIOCARTA,KEGG_PATHWAY&ids=HGNC_NAME1,HGNC_NAME2,HGNC_NAME3,...
The above URL was used to obtain the chart report for the GO and pathway chart records.
Blog
Salmonella Data Preprocessing for PCSF Algorithm
This post describes data preprocessing in Salmonella project for Prize-Collecting Steiner Forest Problem (PCSF) algorithm.
The Salmonella data, taken from Table S6 in Phosphoproteomic Analysis of Salmonella-Infected Cells Identifies Key Kinase Regulators and SopB-Dependent Host Phosphorylation Events by Rogers, LD et al., has been converted from its original XLS file to a tab-delimited TXT file for easy reading in Python.
The data should be separated into time-point files (2, 5, 10 and 20 minutes), each of which will contain the corresponding phosphoproteins and their fold changes.
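A minimal sketch of that split. The column layout is an assumption (the actual table format isn't shown here): one protein column followed by one fold-change column per time point, with blanks or "NA" for missing measurements:

```python
TIME_POINTS = ["2", "5", "10", "20"]  # minutes

def split_by_time_point(rows):
    """rows: iterable of [protein, fc_2min, fc_5min, fc_10min, fc_20min].
    Returns {time_point: [(protein, fold_change), ...]}."""
    out = {tp: [] for tp in TIME_POINTS}
    for protein, *fold_changes in rows:
        for tp, fc in zip(TIME_POINTS, fold_changes):
            if fc not in ("", "NA"):          # skip missing measurements
                out[tp].append((protein, float(fc)))
    return out

# Hypothetical example row: present at 2 and 10 minutes only.
by_tp = split_by_time_point([["AKT1", "1.5", "", "2.0", "NA"]])

# Reading the converted TXT file would look like (filename hypothetical):
# with open("table_s6.txt") as fh:
#     by_tp = split_by_time_point(line.rstrip("\n").split("\t") for line in fh)
```

Each list in the result can then be written out as one file per time point.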
Blog
Data Preprocessing I for Salmon Project
Since we’ll be using R for most of the analyses, we converted the XLS data file to CSV using MS Office Excel 2013, and then we had to fix several lines using Sublime Text 2 because three columns in these lines were left unquoted, which later created a problem when reading the file in RStudio.
The data contains phosphorylation measurements for 8553 peptides. There are many missing data points for many peptides, and since IPI IDs, which are no longer supported, were used for the peptides, we had to convert the IPI IDs to HGNC-approved symbols; the data did have these symbols as names, but they looked outdated.
Tag: functional annotation
Blog
GO Enrichment of Network Clusters
In my previous post, I mentioned how I clustered the network we obtained at the end. For functional annotation, gene ontology (GO) enrichment has been done on these clusters.
There were 20 clusters; the HGNC names were obtained separately for each cluster, and using the DAVID functional annotation tool API, GO and pathway annotations were collected per cluster and saved separately.
http://david.abcc.ncifcrf.gov/api.jsp?type=OFFICIAL_GENE_SYMBOL&tool=chartReport&annot=GOTERM_BP_FAT,GOTERM_CC_FAT,GOTERM_MF_FAT,BBID,BIOCARTA,KEGG_PATHWAY&ids=HGNC_NAME1,HGNC_NAME2,HGNC_NAME3,...
The above URL was used to obtain the chart report for the GO and pathway chart records.
Tag: gene ontology
Blog
GO Enrichment of Network Clusters
In my previous post, I mentioned how I clustered the network we obtained at the end. For functional annotation, gene ontology (GO) enrichment has been done on these clusters.
There were 20 clusters; the HGNC names were obtained separately for each cluster, and using the DAVID functional annotation tool API, GO and pathway annotations were collected per cluster and saved separately.
http://david.abcc.ncifcrf.gov/api.jsp?type=OFFICIAL_GENE_SYMBOL&tool=chartReport&annot=GOTERM_BP_FAT,GOTERM_CC_FAT,GOTERM_MF_FAT,BBID,BIOCARTA,KEGG_PATHWAY&ids=HGNC_NAME1,HGNC_NAME2,HGNC_NAME3,...
The above URL was used to obtain the chart report for the GO and pathway chart records.
Tag: go
Blog
GO Enrichment of Network Clusters
In my previous post, I mentioned how I clustered the network we obtained at the end. For functional annotation, gene ontology (GO) enrichment was performed on these clusters.
There were 20 clusters. The HGNC names were obtained separately for each cluster, and using the DAVID functional annotation tool API, GO and pathway annotations were collected per cluster and saved separately.
The following URL was used to obtain a chart report of GO and pathway chart records: http://david.abcc.ncifcrf.gov/api.jsp?type=OFFICIAL_GENE_SYMBOL&tool=chartReport&annot=GOTERM_BP_FAT,GOTERM_CC_FAT,GOTERM_MF_FAT,BBID,BIOCARTA,KEGG_PATHWAY&ids=HGNC_NAME1,HGNC_NAME2,HGNC_NAME3,...
Tag: hgnc
Blog
GO Enrichment of Network Clusters
In my previous post, I mentioned how I clustered the network we obtained at the end. For functional annotation, gene ontology (GO) enrichment was performed on these clusters.
There were 20 clusters. The HGNC names were obtained separately for each cluster, and using the DAVID functional annotation tool API, GO and pathway annotations were collected per cluster and saved separately.
The following URL was used to obtain a chart report of GO and pathway chart records: http://david.abcc.ncifcrf.gov/api.jsp?type=OFFICIAL_GENE_SYMBOL&tool=chartReport&annot=GOTERM_BP_FAT,GOTERM_CC_FAT,GOTERM_MF_FAT,BBID,BIOCARTA,KEGG_PATHWAY&ids=HGNC_NAME1,HGNC_NAME2,HGNC_NAME3,...
Blog
Reconstructed Salmonella Signaling Network Visualized and Colored
After the fold changes were obtained and an HGNC name was found for each phosphopeptide, they were used to reconstruct the Salmonella signaling network with PCSF. Then, including the nodes that PCSF added, we generated a matrix with nodes in the rows and time points in the columns, where each cell shows the presence of the corresponding protein at the corresponding time point(s).
The matrix has 658 nodes (proteins) and 4 time points, as indicated before: 2 min, 5 min, 10 min and 20 min.
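The presence matrix can be sketched like this (the protein names and observations below are illustrative, not values from the data):

```python
# Rows are nodes, columns are the four time points; a cell is 1 when the
# protein was observed at that time point, 0 otherwise.
time_points = ["2min", "5min", "10min", "20min"]
observed = {"AKT1": {"2min", "10min"}, "MAPK1": {"5min"}}  # example data

matrix = {node: [1 if tp in tps else 0 for tp in time_points]
          for node, tps in observed.items()}
```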
Blog
Salmonella Data Preprocessing for PCSF Algorithm
This post describes the data preprocessing in the Salmonella project for the Prize-Collecting Steiner Forest (PCSF) algorithm.
The Salmonella data, taken from Table S6 of "Phosphoproteomic Analysis of Salmonella-Infected Cells Identifies Key Kinase Regulators and SopB-Dependent Host Phosphorylation Events" by Rogers, LD et al., was converted from its original XLS file into a tab-delimited TXT file for easy reading in Python.
The data should be separated into time-point files (2, 5, 10 and 20 minutes), each of which will contain the corresponding phosphoproteins and their fold changes.
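The separation step can be sketched as follows (the column layout assumed here is illustrative; the real table's columns may differ):

```python
# Split rows into one list of (protein, fold change) per time point,
# skipping proteins that were not measured at that time point.
TIME_POINTS = ["2", "5", "10", "20"]  # minutes

def split_by_time_point(rows):
    """rows: dicts with a 'protein' key and one fold-change column
    per time point; returns {time_point: [(protein, fold_change), ...]}."""
    per_tp = {tp: [] for tp in TIME_POINTS}
    for row in rows:
        for tp in TIME_POINTS:
            value = row.get(tp)
            if value not in (None, ""):
                per_tp[tp].append((row["protein"], float(value)))
    return per_tp
```

Each `per_tp[tp]` list can then be written to its own tab-delimited file.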
Blog
Data Preprocessing I for Salmon Project
Since we’ll be using R for most of the analyses, we converted the XLS data file to CSV using MS Office Excel 2013 and then had to fix several lines with Sublime Text 2, because three columns in those lines were left unquoted, which later caused a problem when reading the file in RStudio.
The data contains phosphorylation measurements for 8,553 peptides, with many missing data points. Since the peptides were identified by IPI IDs, which are no longer supported, we had to convert the IPI IDs to HGNC-approved symbols; the data did include these symbols as names, but they looked outdated.
Tag: network clustering
Blog
GO Enrichment of Network Clusters
In my previous post, I mentioned how I clustered the network we obtained at the end. For functional annotation, gene ontology (GO) enrichment was performed on these clusters.
There were 20 clusters. The HGNC names were obtained separately for each cluster, and using the DAVID functional annotation tool API, GO and pathway annotations were collected per cluster and saved separately.
The following URL was used to obtain a chart report of GO and pathway chart records: http://david.abcc.ncifcrf.gov/api.jsp?type=OFFICIAL_GENE_SYMBOL&tool=chartReport&annot=GOTERM_BP_FAT,GOTERM_CC_FAT,GOTERM_MF_FAT,BBID,BIOCARTA,KEGG_PATHWAY&ids=HGNC_NAME1,HGNC_NAME2,HGNC_NAME3,...
Blog
Network Clustering with NeAT - RNSC Algorithm
We obtained proteins at different time points from the experimental data, then found intermediate nodes (from the human interactome) using the PCSF algorithm, and finally, with a special matrix derived from the network PCSF created, we validated the edges and determined edge directions using a divide-and-conquer (ILP) approach for constructing large-scale signaling networks from PPI data. The resulting network is directed and will be used and visualized for further analyses.
Tag: ilp
Blog
Network Clustering with NeAT - RNSC Algorithm
We obtained proteins at different time points from the experimental data, then found intermediate nodes (from the human interactome) using the PCSF algorithm, and finally, with a special matrix derived from the network PCSF created, we validated the edges and determined edge directions using a divide-and-conquer (ILP) approach for constructing large-scale signaling networks from PPI data. The resulting network is directed and will be used and visualized for further analyses.
Tag: neat
Blog
Network Clustering with NeAT - RNSC Algorithm
We obtained proteins at different time points from the experimental data, then found intermediate nodes (from the human interactome) using the PCSF algorithm, and finally, with a special matrix derived from the network PCSF created, we validated the edges and determined edge directions using a divide-and-conquer (ILP) approach for constructing large-scale signaling networks from PPI data. The resulting network is directed and will be used and visualized for further analyses.
Tag: rnsc
Blog
Network Clustering with NeAT - RNSC Algorithm
We obtained proteins at different time points from the experimental data, then found intermediate nodes (from the human interactome) using the PCSF algorithm, and finally, with a special matrix derived from the network PCSF created, we validated the edges and determined edge directions using a divide-and-conquer (ILP) approach for constructing large-scale signaling networks from PPI data. The resulting network is directed and will be used and visualized for further analyses.
Tag: clustering coefficient
Blog
Finding k-cores and Clustering Coefficient Computation with NetworkX
Assume you have a large network and you want to find the k-core of each node and also compute the clustering coefficient of each one. The Python package NetworkX comes with very convenient methods that let you do both easily.
A k-core is a maximal subgraph whose nodes all have degree at least k [1]. To find k-cores:
Add all the edges in your network to a NetworkX graph, and use the core_number method, which takes the graph as its single input and returns node – k-core pairs.
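The two calls can be sketched on a toy graph like this (the node names are illustrative):

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])

cores = nx.core_number(G)   # node -> k-core number
coeffs = nx.clustering(G)   # node -> clustering coefficient
# "a", "b", "c" form a triangle, so they sit in the 2-core;
# "d" hangs off "c" with degree 1, so it only reaches the 1-core.
```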
Tag: core_number
Blog
Finding k-cores and Clustering Coefficient Computation with NetworkX
Assume you have a large network and you want to find the k-core of each node and also compute the clustering coefficient of each one. The Python package NetworkX comes with very convenient methods that let you do both easily.
A k-core is a maximal subgraph whose nodes all have degree at least k [1]. To find k-cores:
Add all the edges in your network to a NetworkX graph, and use the core_number method, which takes the graph as its single input and returns node – k-core pairs.
Tag: cores
Blog
Finding k-cores and Clustering Coefficient Computation with NetworkX
Assume you have a large network and you want to find the k-core of each node and also compute the clustering coefficient of each one. The Python package NetworkX comes with very convenient methods that let you do both easily.
A k-core is a maximal subgraph whose nodes all have degree at least k [1]. To find k-cores:
Add all the edges in your network to a NetworkX graph, and use the core_number method, which takes the graph as its single input and returns node – k-core pairs.
Tag: k-core
Blog
Finding k-cores and Clustering Coefficient Computation with NetworkX
Assume you have a large network and you want to find the k-core of each node and also compute the clustering coefficient of each one. The Python package NetworkX comes with very convenient methods that let you do both easily.
A k-core is a maximal subgraph whose nodes all have degree at least k [1]. To find k-cores:
Add all the edges in your network to a NetworkX graph, and use the core_number method, which takes the graph as its single input and returns node – k-core pairs.
Tag: codon
Blog
Searching Open Reading Frames (ORF) in DNA sequences - ORF Finder
Open reading frames (ORFs) are regions of DNA that are translated into protein. They lie between start and stop codons and are usually long.
The Python script below searches for ORFs in all six frames and returns the longest one. It doesn’t treat the start codon as a delimiter and only splits the sequence by stop codons, so an ORF can start with any codon but must end with a stop codon (TAG, TGA, TAA).
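A minimal sketch of this stop-codon-splitting search (not the exact script from the post) could look like:

```python
# Scan all six reading frames, split each frame by stop codons only,
# and keep the longest stretch that ends at a stop codon.
STOP_CODONS = {"TAG", "TGA", "TAA"}

def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def longest_orf(seq):
    seq = seq.upper()
    best = ""
    for strand in (seq, revcomp(seq)):        # forward and reverse strands
        for frame in range(3):                # three frames per strand
            current = []
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if codon in STOP_CODONS:      # ORF must end at a stop codon
                    candidate = "".join(current)
                    if len(candidate) > len(best):
                        best = candidate
                    current = []
                else:
                    current.append(codon)
    return best
```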
Tag: dna
Blog
Searching Open Reading Frames (ORF) in DNA sequences - ORF Finder
Open reading frames (ORFs) are regions of DNA that are translated into protein. They lie between start and stop codons and are usually long.
The Python script below searches for ORFs in all six frames and returns the longest one. It doesn’t treat the start codon as a delimiter and only splits the sequence by stop codons, so an ORF can start with any codon but must end with a stop codon (TAG, TGA, TAA).
Blog
Detecting Organisms that Contaminate Sequencing Studies
The first project of my summer internship is slowly taking shape. In it, I’ll build a pipeline and try to use it to find the organisms that contaminate sequencing samples in laboratories.
In laboratories, samples can be contaminated by other organisms or foreign DNA for many reasons; the contaminant may be a bacterium, a yeast, or even viral DNA. After you sequence a DNA sample, the fraction of reads mapping to its reference can turn out very low, which suggests that foreign DNA may be present. Another possible reason is that the reference DNA is simply different.
Tag: open reading frames
Blog
Searching Open Reading Frames (ORF) in DNA sequences - ORF Finder
Open reading frames (ORFs) are regions of DNA that are translated into protein. They lie between start and stop codons and are usually long.
The Python script below searches for ORFs in all six frames and returns the longest one. It doesn’t treat the start codon as a delimiter and only splits the sequence by stop codons, so an ORF can start with any codon but must end with a stop codon (TAG, TGA, TAA).
Tag: orf
Blog
Searching Open Reading Frames (ORF) in DNA sequences - ORF Finder
Open reading frames (ORFs) are regions of DNA that are translated into protein. They lie between start and stop codons and are usually long.
The Python script below searches for ORFs in all six frames and returns the longest one. It doesn’t treat the start codon as a delimiter and only splits the sequence by stop codons, so an ORF can start with any codon but must end with a stop codon (TAG, TGA, TAA).
Tag: sequence
Blog
Searching Open Reading Frames (ORF) in DNA sequences - ORF Finder
Open reading frames (ORFs) are regions of DNA that are translated into protein. They lie between start and stop codons and are usually long.
The Python script below searches for ORFs in all six frames and returns the longest one. It doesn’t treat the start codon as a delimiter and only splits the sequence by stop codons, so an ORF can start with any codon but must end with a stop codon (TAG, TGA, TAA).
Tag: taa
Blog
Searching Open Reading Frames (ORF) in DNA sequences - ORF Finder
Open reading frames (ORFs) are regions of DNA that are translated into protein. They lie between start and stop codons and are usually long.
The Python script below searches for ORFs in all six frames and returns the longest one. It doesn’t treat the start codon as a delimiter and only splits the sequence by stop codons, so an ORF can start with any codon but must end with a stop codon (TAG, TGA, TAA).
Tag: tag
Blog
Searching Open Reading Frames (ORF) in DNA sequences - ORF Finder
Open reading frames (ORFs) are regions of DNA that are translated into protein. They lie between start and stop codons and are usually long.
The Python script below searches for ORFs in all six frames and returns the longest one. It doesn’t treat the start codon as a delimiter and only splits the sequence by stop codons, so an ORF can start with any codon but must end with a stop codon (TAG, TGA, TAA).
Tag: tga
Blog
Searching Open Reading Frames (ORF) in DNA sequences - ORF Finder
Open reading frames (ORFs) are regions of DNA that are translated into protein. They lie between start and stop codons and are usually long.
The Python script below searches for ORFs in all six frames and returns the longest one. It doesn’t treat the start codon as a delimiter and only splits the sequence by stop codons, so an ORF can start with any codon but must end with a stop codon (TAG, TGA, TAA).
Tag: frontiers
Blog
Reconstructed Salmonella Signaling Network Visualized and Colored
After the fold changes were obtained and an HGNC name was found for each phosphopeptide, they were used to reconstruct the Salmonella signaling network with PCSF. Then, including the nodes that PCSF added, we generated a matrix with nodes in the rows and time points in the columns, where each cell shows the presence of the corresponding protein at the corresponding time point(s).
The matrix has 658 nodes (proteins) and 4 time points, as indicated before: 2 min, 5 min, 10 min and 20 min.
Blog
Salmonella Data Preprocessing for PCSF Algorithm
This post describes data preprocessing in Salmonella project for Prize-Collecting Steiner Forest Problem (PCSF) algorithm.
Salmonella data taken from Table S6 in Phosphoproteomic Analysis of Salmonella-Infected Cells Identifies Key Kinase Regulators and SopB-Dependent Host Phosphorylation Events by Rogers, LD et al. has been converted to tab delimited TXT file from its original XLS file for easy reading in Python.
The data should be separated into time points files (2, 5, 10 and 20 minutes) each of which will contain corresponding phophoproteins and their fold changes.
Tag: metu
Blog
Reconstructed Salmonella Signaling Network Visualized and Colored
After the fold changes were obtained and an HGNC name was found for each phosphopeptide, they were used to reconstruct the Salmonella signaling network with PCSF. Then, including the nodes that PCSF added, we generated a matrix with nodes in the rows and time points in the columns, where each cell shows the presence of the corresponding protein at the corresponding time point(s).
The matrix has 658 nodes (proteins) and 4 time points, as indicated before: 2 min, 5 min, 10 min and 20 min.
Tag: network construction
Blog
Reconstructed Salmonella Signaling Network Visualized and Colored
After the fold changes were obtained and an HGNC name was found for each phosphopeptide, they were used to reconstruct the Salmonella signaling network with PCSF. Then, including the nodes that PCSF added, we generated a matrix with nodes in the rows and time points in the columns, where each cell shows the presence of the corresponding protein at the corresponding time point(s).
The matrix has 658 nodes (proteins) and 4 time points, as indicated before: 2 min, 5 min, 10 min and 20 min.
Tag: salmonella
Blog
Reconstructed Salmonella Signaling Network Visualized and Colored
After the fold changes were obtained and an HGNC name was found for each phosphopeptide, they were used to reconstruct the Salmonella signaling network with PCSF. Then, including the nodes that PCSF added, we generated a matrix with nodes in the rows and time points in the columns, where each cell shows the presence of the corresponding protein at the corresponding time point(s).
The matrix has 658 nodes (proteins) and 4 time points, as indicated before: 2 min, 5 min, 10 min and 20 min.
Blog
Salmonella Data Preprocessing for PCSF Algorithm
This post describes the data preprocessing in the Salmonella project for the Prize-Collecting Steiner Forest (PCSF) algorithm.
The Salmonella data, taken from Table S6 of "Phosphoproteomic Analysis of Salmonella-Infected Cells Identifies Key Kinase Regulators and SopB-Dependent Host Phosphorylation Events" by Rogers, LD et al., was converted from its original XLS file into a tab-delimited TXT file for easy reading in Python.
The data should be separated into time-point files (2, 5, 10 and 20 minutes), each of which will contain the corresponding phosphoproteins and their fold changes.
Blog
Data Preprocessing II for Salmon Project
In our Multi-dimensional Modeling and Reconstruction of Signaling Networks in Salmonella-infected Human Cells project, we have several methods for constructing the networks, so the data still needs to be preprocessed to be ready for analysis with each of them.
One method needed a matrix whose first row holds the protein name and the time series (2 min, 5 min, 10 min, 20 min); the value of each protein at each time point was set to 1 or 0 according to the variance, significance and size of its fold change.
Blog
Data Preprocessing I for Salmon Project
Since we’ll be using R for most of the analyses, we converted the XLS data file to CSV using MS Office Excel 2013 and then had to fix several lines with Sublime Text 2, because three columns in those lines were left unquoted, which later caused a problem when reading the file in RStudio.
The data contains phosphorylation measurements for 8,553 peptides, with many missing data points. Since the peptides were identified by IPI IDs, which are no longer supported, we had to convert the IPI IDs to HGNC-approved symbols; the data did include these symbols as names, but they looked outdated.
Blog
Multi-dimensional Modeling and Reconstruction of Signaling Networks in Salmonella-infected Human Cells
In this study, we’re going to use phosphorylation data from a research paper on the phosphoproteomic analysis of the related cells.
The idea is to use and compare existing methods, and to develop them further, in order to better understand the nature of signaling events in these cells and to find key proteins that might be targets for disease diagnosis, prevention and treatment.
This study will be submitted as a research paper, so I’m not going to publish any results here for now, but I’ll mention the struggles I run into and the solutions I try.
Tag: list
Blog
Python: Get Longest String in a List
Here is a quick Python trick you might use in your code.
Assume you have a list of strings and you want to get the longest one in the most efficient way.
>>> l = ["aaa", "bb", "c"]
>>> longest_string = max(l, key=len)
>>> longest_string
'aaa'
Tag: longest string
Blog
Python: Get Longest String in a List
Here is a quick Python trick you might use in your code.
Assume you have a list of strings and you want to get the longest one in the most efficient way.
>>> l = ["aaa", "bb", "c"]
>>> longest_string = max(l, key=len)
>>> longest_string
'aaa'
Tag: string
Blog
Python: Get Longest String in a List
Here is a quick Python trick you might use in your code.
Assume you have a list of strings and you want to get the longest one in the most efficient way.
>>> l = ["aaa", "bb", "c"]
>>> longest_string = max(l, key=len)
>>> longest_string
'aaa'
Blog
Performing Multiple Searches in SRS
Since the latest version of the analysis script examines more reads than the previous ones, searching SRS for a name for every single read was a very time-consuming operation; the last run took 4 days.
To reduce this, I completely rewrote the analysis script. As always, it first takes the reads that pass the threshold, but now I collect their ID numbers directly in an array. Then I join the elements of this list with the pipe character into a single string.
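The batching step can be sketched in Python (the original script's language isn't shown; the reads, scores, and threshold below are made up for illustration):

```python
# Collect the IDs of reads that pass the threshold, then join them with
# the pipe character so SRS can be queried once instead of once per read.
reads = [("read1", 0.92), ("read2", 0.40), ("read3", 0.85)]  # (id, score)
THRESHOLD = 0.8  # assumed cutoff

ids = [rid for rid, score in reads if score >= THRESHOLD]
query = "|".join(ids)
```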
Tag: defaultdict
Blog
Python: defaultdict(list) Dictionary of Lists
Most of the time, when you need to work with large data, you’ll have to use dictionaries in Python. Dictionaries of lists are very useful for storing large data in a very organized way. You can always initialize them by creating empty lists inside an empty dictionary, but when you don’t know how many you’ll end up with, or if you just want an easier option, use defaultdict(list). You just need to import it first:
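A quick sketch of the pattern (the key and value names are illustrative):

```python
from collections import defaultdict

genes_by_cluster = defaultdict(list)
genes_by_cluster["cluster1"].append("TP53")   # list created automatically
genes_by_cluster["cluster1"].append("EGFR")
genes_by_cluster["cluster2"].append("AKT1")
```

The first access to a missing key creates the empty list for you, so there is no need to check for the key or initialize anything.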
Tag: dictionary of lists
Blog
Python: defaultdict(list) Dictionary of Lists
Most of the time, when you need to work with large data, you’ll have to use dictionaries in Python. Dictionaries of lists are very useful for storing large data in a very organized way. You can always initialize them by creating empty lists inside an empty dictionary, but when you don’t know how many you’ll end up with, or if you just want an easier option, use defaultdict(list). You just need to import it first:
Tag: append
Blog
Python: extend() Append Elements of a List to a List
When you append a list to a list using the append() method, you’ll see that your list is appended as a single nested list:
>>> l = ["a"]
>>> l2 = ["a", "b"]
>>> l.append(l2)
>>> l
['a', ['a', 'b']]
If you want to append the elements of the list directly, without creating nested lists, use the extend() method:
>>> l = ["a"]
>>> l2 = ["a", "b"]
>>> l.extend(l2)
>>> l
['a', 'a', 'b']
Tag: append elements to a list
Blog
Python: extend() Append Elements of a List to a List
When you append a list to a list using the append() method, you’ll see that your list is appended as a single nested list:
>>> l = ["a"]
>>> l2 = ["a", "b"]
>>> l.append(l2)
>>> l
['a', ['a', 'b']]
If you want to append the elements of the list directly, without creating nested lists, use the extend() method:
>>> l = ["a"]
>>> l2 = ["a", "b"]
>>> l.extend(l2)
>>> l
['a', 'a', 'b']
Tag: extend
Blog
Python: extend() Append Elements of a List to a List
When you append a list to a list using the append() method, you’ll see that your list is appended as a single nested list:
>>> l = ["a"]
>>> l2 = ["a", "b"]
>>> l.append(l2)
>>> l
['a', ['a', 'b']]
If you want to append the elements of the list directly, without creating nested lists, use the extend() method:
>>> l = ["a"]
>>> l2 = ["a", "b"]
>>> l.extend(l2)
>>> l
['a', 'a', 'b']
Tag: list of elements
Blog
Python: extend() Append Elements of a List to a List
When you append a list to a list using the append() method, you’ll see that your list is appended as a single nested list:
>>> l = ["a"]
>>> l2 = ["a", "b"]
>>> l.append(l2)
>>> l
['a', ['a', 'b']]
If you want to append the elements of the list directly, without creating nested lists, use the extend() method:
>>> l = ["a"]
>>> l2 = ["a", "b"]
>>> l.extend(l2)
>>> l
['a', 'a', 'b']
Tag: python methods
Blog
Python: extend() Append Elements of a List to a List
When you append a list to a list using the append() method, you’ll see that your list is appended as a single nested list:
>>> l = ["a"]
>>> l2 = ["a", "b"]
>>> l.append(l2)
>>> l
['a', ['a', 'b']]
If you want to append the elements of the list directly, without creating nested lists, use the extend() method:
>>> l = ["a"]
>>> l2 = ["a", "b"]
>>> l.extend(l2)
>>> l
['a', 'a', 'b']
Tag: data preprocessing
Blog
Salmonella Data Preprocessing for PCSF Algorithm
This post describes the data preprocessing in the Salmonella project for the Prize-Collecting Steiner Forest (PCSF) algorithm.
The Salmonella data, taken from Table S6 of "Phosphoproteomic Analysis of Salmonella-Infected Cells Identifies Key Kinase Regulators and SopB-Dependent Host Phosphorylation Events" by Rogers, LD et al., was converted from its original XLS file into a tab-delimited TXT file for easy reading in Python.
The data should be separated into time-point files (2, 5, 10 and 20 minutes), each of which will contain the corresponding phosphoproteins and their fold changes.
Tag: txt
Blog
Salmonella Data Preprocessing for PCSF Algorithm
This post describes the data preprocessing in the Salmonella project for the Prize-Collecting Steiner Forest (PCSF) algorithm.
The Salmonella data, taken from Table S6 of "Phosphoproteomic Analysis of Salmonella-Infected Cells Identifies Key Kinase Regulators and SopB-Dependent Host Phosphorylation Events" by Rogers, LD et al., was converted from its original XLS file into a tab-delimited TXT file for easy reading in Python.
The data should be separated into time-point files (2, 5, 10 and 20 minutes), each of which will contain the corresponding phosphoproteins and their fold changes.
Tag: clustering algorithms
Blog
UPGMA Algorithm Described - Unweighted Pair-Group Method with Arithmetic Mean
UPGMA is an agglomerative clustering algorithm, introduced by Sokal and Michener in 1958, that is ultrametric (it assumes a molecular clock: all lineages evolve at a constant rate).
The idea is to iterate until only one cluster remains: at each iteration, the two nearest clusters are joined (becoming a higher-level cluster). The distance between any two clusters is calculated by averaging the distances between the elements of each cluster.
To understand better, see UPGMA worked example by Dr Richard Edwards.
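The iteration can be sketched compactly, using size-weighted averages so that merged distances stay arithmetic means over leaf pairs:

```python
def upgma(labels, dist):
    """labels: leaf names; dist: symmetric distance matrix (list of lists).
    Returns the merge order as nested tuples."""
    clusters = {i: (labels[i], 1) for i in range(len(labels))}  # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)              # join the two nearest clusters
        (ti, ni), (tj, nj) = clusters[i], clusters[j]
        for k in clusters:
            if k in (i, j):
                continue
            # distance to the merged cluster = size-weighted mean of the parts
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            d[(min(next_id, k), max(next_id, k))] = (ni * dik + nj * djk) / (ni + nj)
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        del clusters[i], clusters[j]
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    (tree, _), = clusters.values()
    return tree
```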
Tag: molecular clock
Blog
UPGMA Algorithm Described - Unweighted Pair-Group Method with Arithmetic Mean
UPGMA is an agglomerative clustering algorithm, introduced by Sokal and Michener in 1958, that is ultrametric (it assumes a molecular clock: all lineages evolve at a constant rate).
The idea is to iterate until only one cluster remains: at each iteration, the two nearest clusters are joined (becoming a higher-level cluster). The distance between any two clusters is calculated by averaging the distances between the elements of each cluster.
To understand better, see UPGMA worked example by Dr Richard Edwards.
Tag: ultrametric
Blog
UPGMA Algorithm Described - Unweighted Pair-Group Method with Arithmetic Mean
UPGMA is an agglomerative clustering algorithm, introduced by Sokal and Michener in 1958, that is ultrametric (it assumes a molecular clock: all lineages evolve at a constant rate).
The idea is to iterate until only one cluster remains: at each iteration, the two nearest clusters are joined (becoming a higher-level cluster). The distance between any two clusters is calculated by averaging the distances between the elements of each cluster.
To understand better, see UPGMA worked example by Dr Richard Edwards.
Tag: unweighted pair-group Method with arithmetic mean
Blog
UPGMA Algorithm Described - Unweighted Pair-Group Method with Arithmetic Mean
UPGMA is an agglomerative clustering algorithm, introduced by Sokal and Michener in 1958, that is ultrametric (it assumes a molecular clock: all lineages evolve at a constant rate).
The idea is to iterate until only one cluster remains: at each iteration, the two nearest clusters are joined (becoming a higher-level cluster). The distance between any two clusters is calculated by averaging the distances between the elements of each cluster.
To understand better, see UPGMA worked example by Dr Richard Edwards.
Tag: upgma
Blog
UPGMA Algorithm Described - Unweighted Pair-Group Method with Arithmetic Mean
UPGMA is an agglomerative clustering algorithm, introduced by Sokal and Michener in 1958, that is ultrametric (it assumes a molecular clock: all lineages evolve at a constant rate).
The idea is to iterate until only one cluster remains: at each iteration, the two nearest clusters are joined (becoming a higher-level cluster). The distance between any two clusters is calculated by averaging the distances between the elements of each cluster.
To understand better, see UPGMA worked example by Dr Richard Edwards.
Tag: upgma algorithm
Blog
UPGMA Algorithm Described - Unweighted Pair-Group Method with Arithmetic Mean
UPGMA is an agglomerative clustering algorithm, introduced by Sokal and Michener in 1958, that is ultrametric (it assumes a molecular clock: all lineages evolve at a constant rate).
The idea is to iterate until only one cluster remains: at each iteration, the two nearest clusters are joined (becoming a higher-level cluster). The distance between any two clusters is calculated by averaging the distances between the elements of each cluster.
To understand better, see UPGMA worked example by Dr Richard Edwards.
Tag: biopdb
Blog
Structural Superimposition of Local Sequence Alignment using BioPython
This task was given to me as homework in one of my courses at the university, and I wanted to share my solution since I saw there was no such entry on the Internet.
The objectives here are:
Download (two) PDB files automatically from the server
Do the pairwise alignment after getting their amino acid sequences
Superimpose them and report the RMSD
The Bio.PDB module from BioPython works very well in this case.
Tag: biopython
Blog
Structural Superimposition of Local Sequence Alignment using BioPython
This task was given to me as homework in one of my courses at the university, and I wanted to share my solution since I saw there was no such entry on the Internet.
The objectives here are:
Download (two) PDB files automatically from the server
Do the pairwise alignment after getting their amino acid sequences
Superimpose them and report the RMSD
The Bio.PDB module from BioPython works very well in this case.
Tag: local alignment
Blog
Structural Superimposition of Local Sequence Alignment using BioPython
This task was given to me as homework in one of my courses at the university, and I wanted to share my solution since I saw there was no such entry on the Internet.
The objectives here are:
Download (two) PDB files automatically from the server
Do the pairwise alignment after getting their amino acid sequences
Superimpose them and report the RMSD
The Bio.PDB module from BioPython works very well in this case.
Tag: pairwise sequence alignment
Blog
Structural Superimposition of Local Sequence Alignment using BioPython
This task was given to me as homework in one of my courses at the university, and I wanted to share my solution since I saw there was no such entry on the Internet.
The objectives here are:
Download (two) PDB files automatically from the server
Do the pairwise alignment after getting their amino acid sequences
Superimpose them and report the RMSD
The Bio.PDB module from BioPython works very well in this case.
Tag: pdb
Blog
Structural Superimposition of Local Sequence Alignment using BioPython
This task was given to me as homework in one of my courses at the university, and I wanted to share my solution since I saw there was no such entry on the Internet.
The objectives here are:
Download (two) PDB files automatically from the server
Do the pairwise alignment after getting their amino acid sequences
Superimpose them and report the RMSD
The Bio.PDB module from BioPython works very well in this case.
Tag: rmsd
Blog
Structural Superimposition of Local Sequence Alignment using BioPython
This task was given to me as homework in one of my courses at the university, and I wanted to share my solution since I saw there was no such entry on the Internet.
The objectives here are:
Download (two) PDB files automatically from the server
Do the pairwise alignment after getting their amino acid sequences
Superimpose them and report the RMSD
The Bio.PDB module from BioPython works very well in this case.
Tag: structural superimposition
Blog
Structural Superimposition of Local Sequence Alignment using BioPython
This task was given to me as homework in one of my courses at the university, and I wanted to share my solution since I saw there was no such entry on the Internet.
The objectives here are:
Download (two) PDB files automatically from the server
Do the pairwise alignment after getting their amino acid sequences
Superimpose them and report the RMSD
The Bio.PDB module from BioPython works very well in this case.
Tag: how to install openpyxl
Blog
How to Install openpyxl on Windows
openpyxl is a Python library for reading and writing Excel 2007 xlsx/xlsm files. To download and install it on Windows:
Download it from Python Packages.
Then, to install it, extract the tarball you downloaded, open up CMD, navigate to the folder you extracted, and run the following:
C:\Users\Gungor>cd Downloads\openpyxl-2.1.2.tar\dist\openpyxl-2.1.2\openpyxl-2.1.2
C:\Users\Gungor\Downloads\openpyxl-2.1.2.tar\dist\openpyxl-2.1.2\openpyxl-2.1.2>python setup.py install
It’s going to install everything and report any errors. If nothing looks like an error, you’re good to go.
Blog
How to Install Numpy Python Package on Windows
Numpy (Numerical Python) is a great Python package that you should definitely make use of if you're doing scientific computing.
Installing it on Windows can be difficult if you don't know how to do it via the command line. There are unofficial Windows binaries for Numpy for both 32-bit and 64-bit Windows, which make it super easy to install.
Go to the link below and download the one for your system and Python version: http://www.
Blog
Set Up Google Cloud SDK on Windows using Cygwin
I believe Windows isn't the best environment for software development, but if you have to use it, there is nice software to make things easier. Cygwin will help us use the Google Cloud tools, but the installation requires certain things you should be aware of beforehand.
You'll need:
Python latest 2.7.x
Google Cloud SDK
Cygwin 32-bit (i.e. setup-x86.exe - note only this one works)
openssh, curl and latest 2.
Blog
How to Convert PLINK Binary Formats into Non-binary Formats
PLINK is a whole-genome association analysis toolset. To save time and space, you convert your data files to its binary formats (BED, FAM, BIM), but of course when you need to view the files, you have to convert them back to the non-binary formats (PED, MAP) to be able to open them in a text editor such as Notepad on Windows.
This operation is really easy. It requires PLINK, of course, and the following line of code entered in a DOS window (Run -> type cmd; hit ENTER) in the PLINK directory:
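The excerpt is truncated before the command itself, but in PLINK 1.x the binary-to-text conversion is typically done with `--bfile` plus `--recode`. A small sketch that builds that command line (the file prefix `mydata` is a hypothetical example, not from the post):

```python
def plink_recode_args(prefix, out_prefix=None):
    """Build the PLINK argument list that converts a binary
    fileset (prefix.bed/.bim/.fam) back to text (out.ped/.map)."""
    out_prefix = out_prefix or prefix
    return ["plink", "--bfile", prefix, "--recode", "--out", out_prefix]

args = plink_recode_args("mydata")
print(" ".join(args))  # → plink --bfile mydata --recode --out mydata
```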
Tag: matrix
Blog
Data Preprocessing II for Salmon Project
So, in our Multi-dimensional Modeling and Reconstruction of Signaling Networks in Salmonella-infected Human Cells project, we have several methods for constructing the networks, and the data still needs to be preprocessed so that it is ready to be analyzed with these methods.
One method needed a matrix with the protein names and the time series (2 min, 5 min, 10 min, 20 min) in the first row, with the value of each protein at each time point set to 1 or 0 according to variance, significance, and the size of the fold change.
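As a sketch of that binarization step (the threshold and protein names below are invented for illustration, not the project's actual criteria):

```python
def binarize(fold_changes, threshold=2.0):
    """Turn per-time-point fold changes into a 1/0 matrix:
    1 if the absolute fold change passes the threshold, else 0."""
    return {protein: [1 if abs(fc) >= threshold else 0 for fc in series]
            for protein, series in fold_changes.items()}

# Hypothetical fold changes at 2, 5, 10 and 20 minutes:
data = {"AKT1": [0.5, 2.3, 3.1, 1.2], "MAPK1": [2.5, 0.1, 0.4, 2.0]}
print(binarize(data))  # {'AKT1': [0, 1, 1, 0], 'MAPK1': [1, 0, 0, 1]}
```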
Blog
Data Preprocessing I for Salmon Project
Since we'll be using R for most of the analyses, we converted the XLS data file to CSV using MS Office Excel 2013, and then we had to fix several lines using Sublime Text 2 because three columns in those lines were left unquoted, which later caused a problem when reading the file in RStudio.
The data contains phosphorylation data for 8553 peptides. There are many missing data points for many peptides, and since IPI IDs were used for the peptides and these are no longer supported, we had to convert the IPI IDs to HGNC-approved symbols; the data did include these symbols as names, but they looked outdated.
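The IPI-to-HGNC conversion amounts to a lookup-table pass over the peptide IDs; a minimal sketch (the mapping entries below are placeholders, not real IPI records):

```python
def remap_ids(ids, ipi_to_hgnc):
    """Map each IPI ID to its HGNC symbol, keeping None for IDs
    with no known mapping so they can be reviewed by hand."""
    return [ipi_to_hgnc.get(ipi) for ipi in ids]

# Placeholder mapping table:
table = {"IPI00000001": "GENE1", "IPI00000002": "GENE2"}
print(remap_ids(["IPI00000001", "IPI00000003"], table))  # ['GENE1', None]
```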
Blog
Multi-dimensional Modeling and Reconstruction of Signaling Networks in Salmonella-infected Human Cells
In this study, we're going to use phosphorylation data from a research paper on the phosphoproteomic analysis of related cells.
The idea is to use and compare existing methods, and to develop them further, to better understand the nature of signaling events in these cells and to find key proteins that might be targets for disease diagnosis, prevention, and treatment.
This study will be submitted as a research paper, so I'm not going to publish any results here for now, but I'll mention the struggles I run into and the solutions I try.
Tag: ped
Blog
How to Convert PED to FASTA
You may need to convert PED files to FASTA format for further analyses in your studies. Use the script below for this purpose.
PED to FASTA converter on GitHub
It takes the first 6 columns of each line as the header line and the rest as the sequence, replacing 0s with Ns, and organizes the result into a FASTA file.
Note: 0s stand for missing nucleotides, as defined by default in PLINK.
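The core of that conversion rule can be sketched in a few lines of Python (this illustrates the rule described above; the header format here is an assumption, not necessarily what the GitHub script produces):

```python
def ped_line_to_fasta(line):
    """Convert one PED line: the first 6 columns become the FASTA
    header, the remaining alleles become the sequence, with the
    PLINK missing-genotype code 0 replaced by N."""
    fields = line.split()
    header = "_".join(fields[:6])
    sequence = "".join("N" if allele == "0" else allele
                       for allele in fields[6:])
    return ">{}\n{}".format(header, sequence)

ped = "FAM1 IND1 0 0 1 2 A C 0 G"
print(ped_line_to_fasta(ped))  # → >FAM1_IND1_0_0_1_2 then ACNG
```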
How to run:
Tag: download human genome
Blog
Download Human Reference Genome (HG19 - GRCh37)
Many variation-calling tools and many other methods in bioinformatics require a reference genome as input, so you may need to download the human reference genome or its sequences. Several sources provide the entire human genome freely and publicly; here I'll describe how to download the complete human genome from the University of California, Santa Cruz (UCSC) webpage.
An index of the gzip-compressed FASTA files of the human chromosomes can be found here on the UCSC webpage.
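The per-chromosome files follow a predictable naming scheme, so the download URLs can be generated rather than typed out; a sketch (the path below matches UCSC's hg19 chromosomes directory, but verify it against the index page before bulk-downloading):

```python
BASE = "http://hgdownload.soe.ucsc.edu/goldenPath/hg19/chromosomes"

def chromosome_urls():
    """URLs for the gzip-compressed FASTA of chr1-22, X, Y and M."""
    names = [str(n) for n in range(1, 23)] + ["X", "Y", "M"]
    return ["{}/chr{}.fa.gz".format(BASE, name) for name in names]

urls = chromosome_urls()
print(len(urls))   # → 25
print(urls[0])     # → http://hgdownload.soe.ucsc.edu/goldenPath/hg19/chromosomes/chr1.fa.gz
```

Each URL can then be fetched with wget or Python's urllib, and uncompressed with gunzip.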
Tag: bwa
Blog
ClipCrop Installation on Linux Mint 16 nvm, Node, npm Included
ClipCrop is a tool for detecting structural variations from SAM files, and it's built with Node.js.
ClipCrop uses two pieces of software internally, so they should be installed first.
Install SHRiMP2
SHRiMP is a software package for aligning genomic reads against a target genome.
$ mkdir ~/software
$ cd ~/software
$ wget http://compbio.cs.toronto.edu/shrimp/releases/SHRiMP_2_2_3.lx26.x86_64.tar.gz
$ tar xzvf SHRiMP_2_2_3.lx26.x86_64.tar.gz
$ cd SHRiMP_2_2_3
$ file bin/gmapper
$ export SHRIMP_FOLDER=$PWD
Install BWA
BWA is a software package for mapping low-divergent sequences against a large reference genome.
Blog
Mapping and Unmapped-Read Extraction Results
Previously I was working with only a portion of the data, but from now on I'll be working with all of it. So I extracted the compressed data I was given directly into my working directory and ran the operations on that.
My initial (FASTQ) file is 2153988289 bytes (2 GB) in size. After mapping with bwa, there were 6004193 sequences, or reads, in total. Then, after I removed the unmapped reads, the total read count dropped by 551065 to 5493128. That is, 9% of the data
Blog
Mapping (Alignment) with BWA
I forgot to write about this earlier. I had actually mentioned it, but I hadn't written anything about how it's done, and I hadn't added example commands either.
BWA takes our DNA sequence (in FASTQ format) and the reference genome (in my project, the human genome) and produces a .sai file. This file carries information about how the sequence aligns to the reference genome, and using this information I can separate out the unmapped reads.
First, we create our .sai file with the following command.
bwa aln $NGSDATAROOT/bwa/human_genome37 ChIP_NoIndex_L001_R1_complete_filtered.fastq > complete_alignment.sai
The .sai file we created is not very useful on its own, so we convert it into a SAM file and continue from there.
Blog
SAM Files - BAM Files - samtools
Actually, the pipeline I need to program will run its analyses directly on the unmapped reads. But since I couldn't find such a dataset, and the only data I have contains both mapped and unmapped reads, I first had to get rid of the mapped ones.
As I mentioned before, I do this with the bwa aligner (mapper). After a series of steps, bwa produces a SAM file, but I need a FASTQ file. For this, I'll convert the SAM file into the similar BAM format with samtools, and then obtain my FASTQ file with the bam2fastq tool.
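In SAM records, whether a read is unmapped is encoded in bit 0x4 of the FLAG field (column 2), which is what makes the mapped/unmapped split possible in the first place; a minimal sketch of that check (the two records below are fabricated examples):

```python
def is_unmapped(sam_line):
    """True if the SAM record's FLAG has the 0x4 (unmapped) bit set."""
    flag = int(sam_line.split("\t")[1])
    return bool(flag & 0x4)

mapped = "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII"
unmapped = "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTT\tIIIII"
print(is_unmapped(mapped), is_unmapped(unmapped))  # → False True
```

In practice samtools does this filtering for you (its `-f`/`-F` flag options select on exactly these bits).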
Blog
The FASTQ Format - FASTQ Files
Today I got the "test" sequence data I'll use while building the program. It consists of two FASTQ files, each compressed and yet still around 6 GB in size. Since I don't want to lose too much time, I'll of course use only a portion of one of these files.
My goal is to find the mappable reads in these FASTQ files with the BWA tool and then remove them, and to save the remaining unmapped reads in a format the MegaBLAST tool can understand (FASTA format).
By the way, since I'm doing the whole project on a Unix machine, I'm learning many commands; I'll try to write about them separately later.
Blog
BWA (Burrows-Wheeler Aligner) - Aligner/Mapper
As I mentioned in my previous post, I'm going to use an aligner (mapper) to find out to what extent my data maps to the reference genome. Then I'll run some analyses on the unmapped portion.
BWA (Burrows-Wheeler Aligner) is a program that aligns relatively short sequences to long reference genomes such as the human genome. The bwa-short algorithm is used for reads up to 200bp (bp: base pairs), and the BWA-SW algorithm for reads between 200bp and 100kbp.
Many factors play a role in choosing an aligner (mapper). There are many tools of this kind, and they have different features.
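That length-based split can be stated as a tiny rule (the cut-offs come from the paragraph above; treat this as a mnemonic, not BWA's internal logic):

```python
def bwa_algorithm(read_length_bp):
    """Suggest the BWA algorithm for a given read length, per the
    rule of thumb above: bwa-short up to 200bp, BWA-SW up to 100kbp."""
    if read_length_bp <= 200:
        return "bwa-short"
    if read_length_bp <= 100000:
        return "BWA-SW"
    raise ValueError("longer than the ranges discussed here")

print(bwa_algorithm(100), bwa_algorithm(5000))  # → bwa-short BWA-SW
```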
Tag: clipcrop
Blog
ClipCrop Installation on Linux Mint 16 nvm, Node, npm Included
ClipCrop is a tool for detecting structural variations from SAM files. And it’s built with Node.js.
ClipCrop uses two softwares internally so they should be installed first.
Install SHRiMP2
SHRiMP is a software package for aligning genomic reads against a target genome.
1$ mkdir ~/software 2$ cd ~/software 3$ wget http://compbio.cs.toronto.edu/shrimp/releases/SHRiMP_2_2_3.lx26.x86_64.tar.gz 4$ tar xzvf SHRiMP_2_2_3.lx26.x86_64.tar.gz 5$ cd SHRiMP_2_2_3 6$ file bin/gmapper 7$ export SHRIMP_FOLDER=$PWD Install BWA
BWA is a software package for mapping low-divergent sequences against a large reference genome.
Tag: install clipcrop
Blog
ClipCrop Installation on Linux Mint 16 nvm, Node, npm Included
ClipCrop is a tool for detecting structural variations from SAM files. And it’s built with Node.js.
ClipCrop uses two softwares internally so they should be installed first.
Install SHRiMP2
SHRiMP is a software package for aligning genomic reads against a target genome.
1$ mkdir ~/software 2$ cd ~/software 3$ wget http://compbio.cs.toronto.edu/shrimp/releases/SHRiMP_2_2_3.lx26.x86_64.tar.gz 4$ tar xzvf SHRiMP_2_2_3.lx26.x86_64.tar.gz 5$ cd SHRiMP_2_2_3 6$ file bin/gmapper 7$ export SHRIMP_FOLDER=$PWD Install BWA
BWA is a software package for mapping low-divergent sequences against a large reference genome.
Tag: install nodejs
Blog
ClipCrop Installation on Linux Mint 16 nvm, Node, npm Included
ClipCrop is a tool for detecting structural variations from SAM files. And it’s built with Node.js.
ClipCrop uses two softwares internally so they should be installed first.
Install SHRiMP2
SHRiMP is a software package for aligning genomic reads against a target genome.
1$ mkdir ~/software 2$ cd ~/software 3$ wget http://compbio.cs.toronto.edu/shrimp/releases/SHRiMP_2_2_3.lx26.x86_64.tar.gz 4$ tar xzvf SHRiMP_2_2_3.lx26.x86_64.tar.gz 5$ cd SHRiMP_2_2_3 6$ file bin/gmapper 7$ export SHRIMP_FOLDER=$PWD Install BWA
BWA is a software package for mapping low-divergent sequences against a large reference genome.
Tag: node
Blog
ClipCrop Installation on Linux Mint 16 nvm, Node, npm Included
ClipCrop is a tool for detecting structural variations from SAM files. And it’s built with Node.js.
ClipCrop uses two softwares internally so they should be installed first.
Install SHRiMP2
SHRiMP is a software package for aligning genomic reads against a target genome.
1$ mkdir ~/software 2$ cd ~/software 3$ wget http://compbio.cs.toronto.edu/shrimp/releases/SHRiMP_2_2_3.lx26.x86_64.tar.gz 4$ tar xzvf SHRiMP_2_2_3.lx26.x86_64.tar.gz 5$ cd SHRiMP_2_2_3 6$ file bin/gmapper 7$ export SHRIMP_FOLDER=$PWD Install BWA
BWA is a software package for mapping low-divergent sequences against a large reference genome.
Tag: nodejs
Blog
ClipCrop Installation on Linux Mint 16 nvm, Node, npm Included
ClipCrop is a tool for detecting structural variations from SAM files. And it’s built with Node.js.
ClipCrop uses two softwares internally so they should be installed first.
Install SHRiMP2
SHRiMP is a software package for aligning genomic reads against a target genome.
1$ mkdir ~/software 2$ cd ~/software 3$ wget http://compbio.cs.toronto.edu/shrimp/releases/SHRiMP_2_2_3.lx26.x86_64.tar.gz 4$ tar xzvf SHRiMP_2_2_3.lx26.x86_64.tar.gz 5$ cd SHRiMP_2_2_3 6$ file bin/gmapper 7$ export SHRIMP_FOLDER=$PWD Install BWA
BWA is a software package for mapping low-divergent sequences against a large reference genome.
Tag: npm
Blog
ClipCrop Installation on Linux Mint 16 nvm, Node, npm Included
ClipCrop is a tool for detecting structural variations from SAM files. And it’s built with Node.js.
ClipCrop uses two softwares internally so they should be installed first.
Install SHRiMP2
SHRiMP is a software package for aligning genomic reads against a target genome.
1$ mkdir ~/software 2$ cd ~/software 3$ wget http://compbio.cs.toronto.edu/shrimp/releases/SHRiMP_2_2_3.lx26.x86_64.tar.gz 4$ tar xzvf SHRiMP_2_2_3.lx26.x86_64.tar.gz 5$ cd SHRiMP_2_2_3 6$ file bin/gmapper 7$ export SHRIMP_FOLDER=$PWD Install BWA
BWA is a software package for mapping low-divergent sequences against a large reference genome.
Tag: nvm
Blog
ClipCrop Installation on Linux Mint 16 nvm, Node, npm Included
ClipCrop is a tool for detecting structural variations from SAM files. And it’s built with Node.js.
ClipCrop uses two softwares internally so they should be installed first.
Install SHRiMP2
SHRiMP is a software package for aligning genomic reads against a target genome.
1$ mkdir ~/software 2$ cd ~/software 3$ wget http://compbio.cs.toronto.edu/shrimp/releases/SHRiMP_2_2_3.lx26.x86_64.tar.gz 4$ tar xzvf SHRiMP_2_2_3.lx26.x86_64.tar.gz 5$ cd SHRiMP_2_2_3 6$ file bin/gmapper 7$ export SHRIMP_FOLDER=$PWD Install BWA
BWA is a software package for mapping low-divergent sequences against a large reference genome.
Tag: sam
Blog
ClipCrop Installation on Linux Mint 16 nvm, Node, npm Included
ClipCrop is a tool for detecting structural variations from SAM files. And it’s built with Node.js.
ClipCrop uses two softwares internally so they should be installed first.
Install SHRiMP2
SHRiMP is a software package for aligning genomic reads against a target genome.
1$ mkdir ~/software 2$ cd ~/software 3$ wget http://compbio.cs.toronto.edu/shrimp/releases/SHRiMP_2_2_3.lx26.x86_64.tar.gz 4$ tar xzvf SHRiMP_2_2_3.lx26.x86_64.tar.gz 5$ cd SHRiMP_2_2_3 6$ file bin/gmapper 7$ export SHRIMP_FOLDER=$PWD Install BWA
BWA is a software package for mapping low-divergent sequences against a large reference genome.
Blog
SAM File - BAM File - samtools
Actually, the pipeline I need to program will run its analyses directly on unmapped reads. However, since I could not find such a dataset, and the only data I have contains both mapped and unmapped reads, I first had to get rid of the mapped ones.
As I mentioned before, I am doing this with the bwa aligner (mapper). After a series of steps, bwa produces a SAM file, but I need a FASTQ file. For this, I will convert the SAM file into the similar BAM format with samtools, and then obtain my FASTQ file with the bam2fastq tool.
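The two conversions described above can be sketched as a pair of commands. This is a sketch under assumptions: samtools 0.1.x-era syntax, and the usual bam2fastq invocation where '#' in the output name expands per read pair; check each tool's documentation before relying on it.

```shell
# SAM -> BAM (-b: output BAM, -S: input is SAM; old samtools syntax)
samtools view -bS aln.sam > aln.bam

# BAM -> FASTQ; for paired-end data, '#' becomes 1 and 2 in the file names
bam2fastq -o reads#.fastq aln.bam
```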
Tag: shrimp2
Blog
ClipCrop Installation on Linux Mint 16 nvm, Node, npm Included
ClipCrop is a tool for detecting structural variations from SAM files, and it's built with Node.js.
ClipCrop uses two other tools internally, so they should be installed first.
Install SHRiMP2
SHRiMP is a software package for aligning genomic reads against a target genome.
$ mkdir ~/software
$ cd ~/software
$ wget http://compbio.cs.toronto.edu/shrimp/releases/SHRiMP_2_2_3.lx26.x86_64.tar.gz
$ tar xzvf SHRiMP_2_2_3.lx26.x86_64.tar.gz
$ cd SHRiMP_2_2_3
$ file bin/gmapper
$ export SHRIMP_FOLDER=$PWD
Install BWA
BWA is a software package for mapping low-divergent sequences against a large reference genome.
Tag: cython
Blog
JointSNVMix Installation on Linux Mint 16 Cython, Pysam Included
JointSNVMix is a software package that consists of a number of tools for calling somatic mutations in tumour/normal paired NGS data.
It requires Python (>= 2.7), Cython (>= 0.13) and Pysam (== 0.5.0).
Python should already be installed by default on a Linux machine, so I will describe the installation of the others and of JointSNVMix.
Note that this guide may become outdated over time, so please verify the steps before following them.
Install Cython
Tag: distribute
Blog
JointSNVMix Installation on Linux Mint 16 Cython, Pysam Included
JointSNVMix is a software package that consists of a number of tools for calling somatic mutations in tumour/normal paired NGS data.
It requires Python (>= 2.7), Cython (>= 0.13) and Pysam (== 0.5.0).
Python should already be installed by default on a Linux machine, so I will describe the installation of the others and of JointSNVMix.
Note that this guide may become outdated over time, so please verify the steps before following them.
Install Cython
Tag: ez_setup
Blog
JointSNVMix Installation on Linux Mint 16 Cython, Pysam Included
JointSNVMix is a software package that consists of a number of tools for calling somatic mutations in tumour/normal paired NGS data.
It requires Python (>= 2.7), Cython (>= 0.13) and Pysam (== 0.5.0).
Python should already be installed by default on a Linux machine, so I will describe the installation of the others and of JointSNVMix.
Note that this guide may become outdated over time, so please verify the steps before following them.
Install Cython
Tag: jointsnvmix
Blog
JointSNVMix Installation on Linux Mint 16 Cython, Pysam Included
JointSNVMix is a software package that consists of a number of tools for calling somatic mutations in tumour/normal paired NGS data.
It requires Python (>= 2.7), Cython (>= 0.13) and Pysam (== 0.5.0).
Python should already be installed by default on a Linux machine, so I will describe the installation of the others and of JointSNVMix.
Note that this guide may become outdated over time, so please verify the steps before following them.
Install Cython
Tag: pysam
Blog
JointSNVMix Installation on Linux Mint 16 Cython, Pysam Included
JointSNVMix is a software package that consists of a number of tools for calling somatic mutations in tumour/normal paired NGS data.
It requires Python (>= 2.7), Cython (>= 0.13) and Pysam (== 0.5.0).
Python should already be installed by default on a Linux machine, so I will describe the installation of the others and of JointSNVMix.
Note that this guide may become outdated over time, so please verify the steps before following them.
Install Cython
Tag: curl
Blog
Set Up Google Cloud SDK on Windows using Cygwin
I believe Windows isn't the best environment for software development, but if you have to use it, there are nice tools to make things easier. Cygwin will help us use the Google Cloud tools here, but the installation requires certain things you should be aware of beforehand.
You’ll need
Python latest 2.7.x
Google Cloud SDK
Cygwin 32-bit (i.e. setup-x86.exe - note only this one works)
openssh, curl and latest 2.
Tag: cygwin
Blog
Set Up Google Cloud SDK on Windows using Cygwin
I believe Windows isn't the best environment for software development, but if you have to use it, there are nice tools to make things easier. Cygwin will help us use the Google Cloud tools here, but the installation requires certain things you should be aware of beforehand.
You’ll need
Python latest 2.7.x
Google Cloud SDK
Cygwin 32-bit (i.e. setup-x86.exe - note only this one works)
openssh, curl and latest 2.
Tag: cygwin 32-bit
Blog
Set Up Google Cloud SDK on Windows using Cygwin
I believe Windows isn't the best environment for software development, but if you have to use it, there are nice tools to make things easier. Cygwin will help us use the Google Cloud tools here, but the installation requires certain things you should be aware of beforehand.
You’ll need
Python latest 2.7.x
Google Cloud SDK
Cygwin 32-bit (i.e. setup-x86.exe - note only this one works)
openssh, curl and latest 2.
Tag: google cloud
Blog
Set Up Google Cloud SDK on Windows using Cygwin
I believe Windows isn't the best environment for software development, but if you have to use it, there are nice tools to make things easier. Cygwin will help us use the Google Cloud tools here, but the installation requires certain things you should be aware of beforehand.
You’ll need
Python latest 2.7.x
Google Cloud SDK
Cygwin 32-bit (i.e. setup-x86.exe - note only this one works)
openssh, curl and latest 2.
Tag: google cloud sdk
Blog
Set Up Google Cloud SDK on Windows using Cygwin
I believe Windows isn't the best environment for software development, but if you have to use it, there are nice tools to make things easier. Cygwin will help us use the Google Cloud tools here, but the installation requires certain things you should be aware of beforehand.
You’ll need
Python latest 2.7.x
Google Cloud SDK
Cygwin 32-bit (i.e. setup-x86.exe - note only this one works)
openssh, curl and latest 2.
Tag: openssh
Blog
Set Up Google Cloud SDK on Windows using Cygwin
I believe Windows isn't the best environment for software development, but if you have to use it, there are nice tools to make things easier. Cygwin will help us use the Google Cloud tools here, but the installation requires certain things you should be aware of beforehand.
You’ll need
Python latest 2.7.x
Google Cloud SDK
Cygwin 32-bit (i.e. setup-x86.exe - note only this one works)
openssh, curl and latest 2.
Tag: biotype
Blog
Super Long Introns of Euarchontoglires
There was another weird result from my exon/intron boundary analysis research. Moving toward less diverse species, gene intron lengths are shown to increase. However, according to my findings, at the point of Euarchontoglires (also called Supraprimates) this increase is very sharp and seems unexpected. So, I looked at the exon/intron lengths of each gene in each taxonomic rank to try to see what gives Euarchontoglires genes such long introns.
As you see in the graph above, Euarchontoglires introns are very long compared to the rest.
Tag: ensembl
Blog
Super Long Introns of Euarchontoglires
There was another weird result from my exon/intron boundary analysis research. Moving toward less diverse species, gene intron lengths are shown to increase. However, according to my findings, at the point of Euarchontoglires (also called Supraprimates) this increase is very sharp and seems unexpected. So, I looked at the exon/intron lengths of each gene in each taxonomic rank to try to see what gives Euarchontoglires genes such long introns.
As you see in the graph above, Euarchontoglires introns are very long compared to the rest.
Blog
An Exon of Length 2 Appeared in Ensembl
I want to share an interesting finding about our research on exon/intron analysis of human evolutionary history.
So I had the genes that emerged at each pass point of human history and I was using Ensembl API to get exons and introns of these genes to perform further analyses.
There was one gene (ENSG00000197568 - HERV-H LTR-associating 3 - HHLA3) with a surprise: one of its transcripts (ENST00000432224) had an exon (ENSE00001707577) of length 2.
Blog
How to Get Transcripts (also Exons & Introns) of a Gene using Ensembl API
As a part of my project, I need to obtain the exons and introns of certain genes. These are actually human genes chosen for a specific reason that I will describe later when I explain my project. But for now, I want to share the way to obtain this information using the (Perl) Ensembl API. Note that Ensembl has introduced a beautiful new way (the Ensembl REST API) of getting data, but it is in beta and doesn't provide exon/intron information.
Blog
Install Ensembl API and BioPerl 1.2.3 on Your System
I’m going to work on a project that requires lots of queries on Ensembl databases so I wanted to install Ensembl API to begin with. Since it’s programmed in Perl, I will be using Perl in this project.
There is a nice tutorial on Ensembl website for API installation. Here I will describe some steps.
1. Download the API and BioPerl
Go to Ensembl FTP ftp://ftp.ensembl.org/pub/ and download “ensembl-api.tar.gz”
Tag: ensembl api
Blog
Super Long Introns of Euarchontoglires
There was another weird result from my exon/intron boundary analysis research. Moving toward less diverse species, gene intron lengths are shown to increase. However, according to my findings, at the point of Euarchontoglires (also called Supraprimates) this increase is very sharp and seems unexpected. So, I looked at the exon/intron lengths of each gene in each taxonomic rank to try to see what gives Euarchontoglires genes such long introns.
As you see in the graph above, Euarchontoglires introns are very long compared to the rest.
Blog
How to Get Transcripts (also Exons & Introns) of a Gene using Ensembl API
As a part of my project, I need to obtain the exons and introns of certain genes. These are actually human genes chosen for a specific reason that I will describe later when I explain my project. But for now, I want to share the way to obtain this information using the (Perl) Ensembl API. Note that Ensembl has introduced a beautiful new way (the Ensembl REST API) of getting data, but it is in beta and doesn't provide exon/intron information.
Blog
Install Ensembl API and BioPerl 1.2.3 on Your System
I’m going to work on a project that requires lots of queries on Ensembl databases so I wanted to install Ensembl API to begin with. Since it’s programmed in Perl, I will be using Perl in this project.
There is a nice tutorial on Ensembl website for API installation. Here I will describe some steps.
1. Download the API and BioPerl
Go to Ensembl FTP ftp://ftp.ensembl.org/pub/ and download “ensembl-api.tar.gz”
Tag: euarchontoglires
Blog
Super Long Introns of Euarchontoglires
There was another weird result from my exon/intron boundary analysis research. Moving toward less diverse species, gene intron lengths are shown to increase. However, according to my findings, at the point of Euarchontoglires (also called Supraprimates) this increase is very sharp and seems unexpected. So, I looked at the exon/intron lengths of each gene in each taxonomic rank to try to see what gives Euarchontoglires genes such long introns.
As you see in the graph above, Euarchontoglires introns are very long compared to the rest.
Tag: exon
Blog
Super Long Introns of Euarchontoglires
There was another weird result from my exon/intron boundary analysis research. Moving toward less diverse species, gene intron lengths are shown to increase. However, according to my findings, at the point of Euarchontoglires (also called Supraprimates) this increase is very sharp and seems unexpected. So, I looked at the exon/intron lengths of each gene in each taxonomic rank to try to see what gives Euarchontoglires genes such long introns.
As you see in the graph above, Euarchontoglires introns are very long compared to the rest.
Blog
How to Get Transcripts (also Exons & Introns) of a Gene using Ensembl API
As a part of my project, I need to obtain the exons and introns of certain genes. These are actually human genes chosen for a specific reason that I will describe later when I explain my project. But for now, I want to share the way to obtain this information using the (Perl) Ensembl API. Note that Ensembl has introduced a beautiful new way (the Ensembl REST API) of getting data, but it is in beta and doesn't provide exon/intron information.
Tag: exon and intron boundaries
Blog
Super Long Introns of Euarchontoglires
There was another weird result from my exon/intron boundary analysis research. Moving toward less diverse species, gene intron lengths are shown to increase. However, according to my findings, at the point of Euarchontoglires (also called Supraprimates) this increase is very sharp and seems unexpected. So, I looked at the exon/intron lengths of each gene in each taxonomic rank to try to see what gives Euarchontoglires genes such long introns.
As you see in the graph above, Euarchontoglires introns are very long compared to the rest.
Blog
How to Get Transcripts (also Exons & Introns) of a Gene using Ensembl API
As a part of my project, I need to obtain the exons and introns of certain genes. These are actually human genes chosen for a specific reason that I will describe later when I explain my project. But for now, I want to share the way to obtain this information using the (Perl) Ensembl API. Note that Ensembl has introduced a beautiful new way (the Ensembl REST API) of getting data, but it is in beta and doesn't provide exon/intron information.
Tag: homo sapiens
Blog
Super Long Introns of Euarchontoglires
There was another weird result from my exon/intron boundary analysis research. Moving toward less diverse species, gene intron lengths are shown to increase. However, according to my findings, at the point of Euarchontoglires (also called Supraprimates) this increase is very sharp and seems unexpected. So, I looked at the exon/intron lengths of each gene in each taxonomic rank to try to see what gives Euarchontoglires genes such long introns.
As you see in the graph above, Euarchontoglires introns are very long compared to the rest.
Blog
Second Dataset Analysis Results
I have completed the analysis of the second dataset, which has fewer unmapped reads. Since it is a better sequencing sample than the previous one, the results I got were also quite consistent. After analyzing a sequence belonging to the human genome, I obtained the results below.
LIST OF ORGANISMS AND THEIR NUMBER OF OCCURRENCES
Ambiguous hit 1323
Homo sapiens 312
Pan troglodytes 25
Pongo abelii 18
Nomascus leucogenys 17
Halomonas sp. GFAJ-1 7
Callithrix jacchus 4
Macaca mulatta 3
Oryctolagus cuniculus 2
Loxodonta africana 1
Cavia porcellus 1
I will explain the term “Ambiguous hit” in another post.
Tag: intron
Blog
Super Long Introns of Euarchontoglires
There was another weird result from my exon/intron boundary analysis research. Moving toward less diverse species, gene intron lengths are shown to increase. However, according to my findings, at the point of Euarchontoglires (also called Supraprimates) this increase is very sharp and seems unexpected. So, I looked at the exon/intron lengths of each gene in each taxonomic rank to try to see what gives Euarchontoglires genes such long introns.
As you see in the graph above, Euarchontoglires introns are very long compared to the rest.
Blog
How to Get Transcripts (also Exons & Introns) of a Gene using Ensembl API
As a part of my project, I need to obtain the exons and introns of certain genes. These are actually human genes chosen for a specific reason that I will describe later when I explain my project. But for now, I want to share the way to obtain this information using the (Perl) Ensembl API. Note that Ensembl has introduced a beautiful new way (the Ensembl REST API) of getting data, but it is in beta and doesn't provide exon/intron information.
Tag: intron length
Blog
Super Long Introns of Euarchontoglires
There was another weird result from my exon/intron boundary analysis research. Moving toward less diverse species, gene intron lengths are shown to increase. However, according to my findings, at the point of Euarchontoglires (also called Supraprimates) this increase is very sharp and seems unexpected. So, I looked at the exon/intron lengths of each gene in each taxonomic rank to try to see what gives Euarchontoglires genes such long introns.
As you see in the graph above, Euarchontoglires introns are very long compared to the rest.
Tag: protein coding
Blog
Super Long Introns of Euarchontoglires
There was another weird result from my exon/intron boundary analysis research. Moving toward less diverse species, gene intron lengths are shown to increase. However, according to my findings, at the point of Euarchontoglires (also called Supraprimates) this increase is very sharp and seems unexpected. So, I looked at the exon/intron lengths of each gene in each taxonomic rank to try to see what gives Euarchontoglires genes such long introns.
As you see in the graph above, Euarchontoglires introns are very long compared to the rest.
Tag: databases
Blog
An Exon of Length 2 Appeared in Ensembl
I want to share an interesting finding about our research on exon/intron analysis of human evolutionary history.
So I had the genes that emerged at each pass point of human history and I was using Ensembl API to get exons and introns of these genes to perform further analyses.
There was one gene (ENSG00000197568 - HERV-H LTR-associating 3 - HHLA3) with a surprise: one of its transcripts (ENST00000432224) had an exon (ENSE00001707577) of length 2.
Tag: eiban
Blog
An Exon of Length 2 Appeared in Ensembl
I want to share an interesting finding about our research on exon/intron analysis of human evolutionary history.
So I had the genes that emerged at each pass point of human history and I was using Ensembl API to get exons and introns of these genes to perform further analyses.
There was one gene (ENSG00000197568 - HERV-H LTR-associating 3 - HHLA3) with a surprise: one of its transcripts (ENST00000432224) had an exon (ENSE00001707577) of length 2.
Tag: exons
Blog
An Exon of Length 2 Appeared in Ensembl
I want to share an interesting finding about our research on exon/intron analysis of human evolutionary history.
So I had the genes that emerged at each pass point of human history and I was using Ensembl API to get exons and introns of these genes to perform further analyses.
There was one gene (ENSG00000197568 - HERV-H LTR-associating 3 - HHLA3) with a surprise: one of its transcripts (ENST00000432224) had an exon (ENSE00001707577) of length 2.
Tag: introns
Blog
An Exon of Length 2 Appeared in Ensembl
I want to share an interesting finding about our research on exon/intron analysis of human evolutionary history.
So I had the genes that emerged at each pass point of human history and I was using Ensembl API to get exons and introns of these genes to perform further analyses.
There was one gene (ENSG00000197568 - HERV-H LTR-associating 3 - HHLA3) with a surprise: one of its transcripts (ENST00000432224) had an exon (ENSE00001707577) of length 2.
Tag: bed
Blog
How to Convert PLINK Binary Formats into Non-binary Formats
PLINK is a whole genome association analysis toolset. To save time and space, you need to convert your data files to its binary formats (BED, FAM, BIM), but of course when you need to view the files, you have to convert them back to the non-binary formats (PED, MAP) to be able to open them in a text editor such as Notepad on Windows.
This operation is really easy. It requires PLINK of course, and the following line of code typed into a DOS window (Run -> type cmd; hit ENTER) in the PLINK directory:
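The excerpt ends before showing the command itself. With PLINK 1.x, the binary-to-text conversion is usually a single --recode call; the file name "mydata" below is a placeholder, not taken from the original post.

```shell
# Reads mydata.bed / mydata.bim / mydata.fam and
# writes mydata.ped / mydata.map next to them
plink --bfile mydata --recode --out mydata
```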
Tag: bim
Blog
How to Convert PLINK Binary Formats into Non-binary Formats
PLINK is a whole genome association analysis toolset. To save time and space, you need to convert your data files to its binary formats (BED, FAM, BIM), but of course when you need to view the files, you have to convert them back to the non-binary formats (PED, MAP) to be able to open them in a text editor such as Notepad on Windows.
This operation is really easy. It requires PLINK of course, and the following line of code typed into a DOS window (Run -> type cmd; hit ENTER) in the PLINK directory:
Tag: binary
Blog
How to Convert PLINK Binary Formats into Non-binary Formats
PLINK is a whole genome association analysis toolset. To save time and space, you need to convert your data files to its binary formats (BED, FAM, BIM), but of course when you need to view the files, you have to convert them back to the non-binary formats (PED, MAP) to be able to open them in a text editor such as Notepad on Windows.
This operation is really easy. It requires PLINK of course, and the following line of code typed into a DOS window (Run -> type cmd; hit ENTER) in the PLINK directory:
Tag: fam
Blog
How to Convert PLINK Binary Formats into Non-binary Formats
PLINK is a whole genome association analysis toolset. To save time and space, you need to convert your data files to its binary formats (BED, FAM, BIM), but of course when you need to view the files, you have to convert them back to the non-binary formats (PED, MAP) to be able to open them in a text editor such as Notepad on Windows.
This operation is really easy. It requires PLINK of course, and the following line of code typed into a DOS window (Run -> type cmd; hit ENTER) in the PLINK directory:
Tag: format conversion
Blog
How to Convert PLINK Binary Formats into Non-binary Formats
PLINK is a whole genome association analysis toolset. To save time and space, you need to convert your data files to its binary formats (BED, FAM, BIM), but of course when you need to view the files, you have to convert them back to the non-binary formats (PED, MAP) to be able to open them in a text editor such as Notepad on Windows.
This operation is really easy. It requires PLINK of course, and the following line of code typed into a DOS window (Run -> type cmd; hit ENTER) in the PLINK directory:
Tag: map
Blog
How to Convert PLINK Binary Formats into Non-binary Formats
PLINK is a whole genome association analysis toolset. To save time and space, you need to convert your data files to its binary formats (BED, FAM, BIM), but of course when you need to view the files, you have to convert them back to the non-binary formats (PED, MAP) to be able to open them in a text editor such as Notepad on Windows.
This operation is really easy. It requires PLINK of course, and the following line of code typed into a DOS window (Run -> type cmd; hit ENTER) in the PLINK directory:
Tag: ms-dos
Blog
How to Convert PLINK Binary Formats into Non-binary Formats
PLINK is a whole genome association analysis toolset. To save time and space, you need to convert your data files to its binary formats (BED, FAM, BIM), but of course when you need to view the files, you have to convert them back to the non-binary formats (PED, MAP) to be able to open them in a text editor such as Notepad on Windows.
This operation is really easy. It requires PLINK of course, and the following line of code typed into a DOS window (Run -> type cmd; hit ENTER) in the PLINK directory:
Tag: non-binary
Blog
How to Convert PLINK Binary Formats into Non-binary Formats
PLINK is a whole genome association analysis toolset. To save time and space, you need to convert your data files to its binary formats (BED, FAM, BIM), but of course when you need to view the files, you have to convert them back to the non-binary formats (PED, MAP) to be able to open them in a text editor such as Notepad on Windows.
This operation is really easy. It requires PLINK of course, and the following line of code typed into a DOS window (Run -> type cmd; hit ENTER) in the PLINK directory:
Tag: gene
Blog
How to Get Transcripts (also Exons & Introns) of a Gene using Ensembl API
As a part of my project, I need to obtain the exons and introns of certain genes. These are actually human genes chosen for a specific reason that I will describe later when I explain my project. But for now, I want to share the way to obtain this information using the (Perl) Ensembl API. Note that Ensembl has introduced a beautiful new way (the Ensembl REST API) of getting data, but it is in beta and doesn't provide exon/intron information.
Blog
Retrieving Data with AJAX using jQuery, PHP and MySQL
Last semester, I took a course from the Informatics Institute at METU called “Biological Databases and Data Analysis Tools”, where we first learned what a database is, how to run queries on it, and the technology behind databases. Then, we learned about the many biological databases and data analysis tools available, including gene, protein, and pathway databases, and tools for creating databases.
As a final project, we were asked to create an online tool that can search a database and get the data and display it on any web browsers.
Tag: github
Blog
How to Get Transcripts (also Exons & Introns) of a Gene using Ensembl API
As a part of my project, I need to obtain the exons and introns of certain genes. These are actually human genes chosen for a specific reason that I will describe later when I explain my project. But for now, I want to share the way to obtain this information using the (Perl) Ensembl API. Note that Ensembl has introduced a beautiful new way (the Ensembl REST API) of getting data, but it is in beta and doesn't provide exon/intron information.
Tag: transcript
Blog
How to Get Transcripts (also Exons & Introns) of a Gene using Ensembl API
As a part of my project, I need to obtain the exons and introns of certain genes. These are actually human genes chosen for a specific reason that I will describe later when I explain my project. But for now, I want to share the way to obtain this information using the (Perl) Ensembl API. Note that Ensembl has introduced a beautiful new way (the Ensembl REST API) of getting data, but it is in beta and doesn't provide exon/intron information.
Tag: geany
Blog
Geany Color Schemes Ubuntu
There is a collection of color schemes for Geany as well.
Download it on GitHub and follow the instructions.
You’ll need to extract and copy all the files in colorschemes directory to ~/.config/geany/colorschemes/
Then, restart Geany and go to View -> Editor -> Color Schemes and choose your style.
I’m using Tango.
Source
Blog
Install Geany 1.23 on Ubuntu
Geany is a really nice text editor for Ubuntu. I would recommend it together with the TreeBrowser plugin and one of the available color schemes.
But you’ll need the latest version which is 1.23 for now.
To install this version you need to add a PPA; this will also keep it updated when you update your system.
Execute following lines one by one:
sudo add-apt-repository ppa:geany-dev/ppa
sudo apt-get update
sudo apt-get install geany
Then, when you start Geany you’ll see “This is Geany 1.
Blog
A Nice File Browser for Geany 1.23 on Ubuntu 12.04 LTS
If you’re looking for a file browser for Geany, check out TreeBrowser plugin on its page (see the page for screenshots).
To install and enable it, just run the following in a Terminal:
sudo apt-get install geany-plugin-treebrowser
Then go to “Tools” -> “Plugin Manager” and check “TreeBrowser”
Source
Tag: text editor
Blog
Geany Color Schemes Ubuntu
There is a collection of color schemes for Geany as well.
Download it on GitHub and follow the instructions.
You’ll need to extract and copy all the files in colorschemes directory to ~/.config/geany/colorschemes/
Then, restart Geany and go to View -> Editor -> Color Schemes and choose your style.
I’m using Tango.
Source
Blog
Install Geany 1.23 on Ubuntu
Geany is a really nice text editor for Ubuntu. I would recommend it together with the TreeBrowser plugin and one of the available color schemes.
But you’ll need the latest version which is 1.23 for now.
To install this version you need to add a PPA; this will also keep it updated when you update your system.
Execute following lines one by one:
sudo add-apt-repository ppa:geany-dev/ppa
sudo apt-get update
sudo apt-get install geany
Then, when you start Geany you’ll see “This is Geany 1.
Blog
A Nice File Browser for Geany 1.23 on Ubuntu 12.04 LTS
If you’re looking for a file browser for Geany, check out TreeBrowser plugin on its page (see the page for screenshots).
To install and enable it, just run the following in a Terminal:
sudo apt-get install geany-plugin-treebrowser
Then go to “Tools” -> “Plugin Manager” and check “TreeBrowser”
Source
Tag: php
Blog
Install Apache2, PHP5, MySQL & phpMyAdmin on Ubuntu 12.04
First, install apache2:
sudo apt-get install apache2
Then, for it to work: sudo service apache2 restart
For custom www folder:
sudo cp /etc/apache2/sites-available/default /etc/apache2/sites-available/www
gksudo gedit /etc/apache2/sites-available/www
Change the DocumentRoot and Directory directives to point to the new location, for example /home/user/www/
Save and check the result (see also: clean URLs not working in Laravel 4)
Make www default and disable default:
sudo a2dissite default && sudo a2ensite www
sudo service apache2 restart
Create a new file in www
Blog
session_start() Permission denied (13) Laravel 4
Solve it by running following lines:
chmod -R 755 /path/to/your/laravel/directory
chmod -R o+w /path/to/your/laravel/directory
And/or maybe:
sudo chown -R www-data:user /path/to/your/laravel/directory
Blog
If clean URLs don't work in Laravel 4 on Ubuntu 12.04 LTS
The .htaccess directives are correct, mod_rewrite is enabled, but you are still getting 404 Not Found errors…
You need to change AllowOverride None to AllowOverride All in /etc/apache2/sites-available/default.
Modified section in the file:
<Directory /home/user/www/> Options Indexes FollowSymLinks MultiViews AllowOverride All Order allow,deny allow from all </Directory>
Blog
Permission Issues develop Laravel 4 on Ubuntu 12.04 LTS
If your CSS or JS files don’t seem to load, or you get 403 Forbidden or Permission denied errors, all you need to do is run the following in a terminal:
sudo chmod -R 755 /path/to/your/laravel/directory
Blog
Base URL for Your Laravel 4 Website
To get the base URL of your website for generating links to your content or assets, do the following:
Set $url in app/config/app.php to your base URL:
'url' => 'http://localhost/example',
Use it everywhere with URL::to(), for example:
echo URL::to('assets/css/general.css');
/* outputs http://localhost/example/assets/css/general.css */
Blog
Remove public from URL Laravel 4
Move all the contents (files) of the public/ folder one level up (to the base directory)
Fix paths in index.php:
require __DIR__.'/bootstrap/autoload.php';
$app = require_once __DIR__.'/bootstrap/start.php';
Fix the path in bootstrap/paths.php:
'public' => __DIR__.'/..',
Done
Source
Blog
Some String Functions in R, String Manipulation in R
I have programmed with Perl, Python, and PHP before, and string manipulation was more direct and easier in them than in R. But still there are useful functions for string manipulation in R. I’m not an expert in R but I’ve been dealing with it for a while and I’ve learned some good functions for this purpose.
Concatenate strings
Concatenation is done with the paste function. It takes the strings to concatenate as comma-separated arguments, as well as the separator character(s).
Blog
Retrieving Data with AJAX using jQuery, PHP and MySQL
Last semester, I took a course from the Informatics Institute at METU called “Biological Databases and Data Analysis Tools”, where we first learned what a database is, how to run queries on it, and the technology behind databases. Then, we learned about the many biological databases and data analysis tools available, including gene, protein, and pathway databases, and tools for creating databases.
As a final project, we were asked to create an online tool that can search a database, retrieve data, and display it in any web browser.
Tag: phpmyadmin
Blog
Install Apache2, PHP5, MySQL & phpMyAdmin on Ubuntu 12.04
First, install apache2:
sudo apt-get install apache2
Then, for it to work:
sudo service apache2 restart
For custom www folder:
sudo cp /etc/apache2/sites-available/default /etc/apache2/sites-available/www
gksudo gedit /etc/apache2/sites-available/www
Change the DocumentRoot and Directory directives to point to the new location. For example, /home/user/www/
Save and test (see also: clean URLs not working in Laravel 4)
Make www default and disable default:
sudo a2dissite default && sudo a2ensite www
sudo service apache2 restart
Create a new file in www
Tag: perl
Blog
Install Perl DBI Module on Ubuntu 12.04
On Terminal, run:
sudo apt-get install libdbi-perl
Source
Blog
Install Ensembl API and BioPerl 1.2.3 on Your System
I’m going to work on a project that requires lots of queries on Ensembl databases, so I wanted to start by installing the Ensembl API. Since it’s written in Perl, I will be using Perl in this project.
There is a nice tutorial on the Ensembl website for installing the API. Here I will describe some of the steps.
1. Download the API and BioPerl
Go to Ensembl FTP ftp://ftp.ensembl.org/pub/ and download “ensembl-api.tar.gz” or click here
Blog
Some String Functions in R, String Manipulation in R
Blog
Performing Multiple Searches on SRS
Since the latest version of the parsing script examines many more reads than its predecessors, searching a name on SRS for every single read was a very time-consuming operation. So much so that the last analysis took 4 days.
To reduce this, I completely rewrote the parsing script. As always, it first takes the hits that pass the threshold, but now I collect their ID numbers directly in an array. I then turn this list into a single string by joining its elements with the pipe character.
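The join step can be sketched in shell (the IDs below are made up; `paste -s` does the pipe-joining):

```shell
# Hypothetical read IDs standing in for hits that passed the threshold
printf 'AB123\nCD456\nEF789\n' > ids.txt

# Join all IDs with the pipe character so one SRS query can match them all
query=$(paste -sd'|' ids.txt)
echo "$query"
```

Used as an alternation pattern, `AB123|CD456|EF789` matches any of the three IDs, so a single query replaces three separate ones.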
Blog
Parsing MegaBLAST Results
The last stage of the pipeline is to parse, with another script, the output produced by the searched sequences. In this step, each megablast file is read, and values of parameters such as each sequence’s name, identity and overlapping length are stored and printed to the screen as needed.
In my project I use a parser named Inslink, found in the HUSAR package, which returns the fields mentioned above to me as arrays. The only thing this parser does is read the file and store the values of the requested fields.
I then display these stored values by extending the code, and with a few extra lines I show the meaningful results I need.
Blog
Evaluating the Quality Line - Quality Filter
To improve the contaminating-organism analysis pipeline further and obtain more meaningful results, we decided to add a quality filter to the first steps (while the FASTQ file is still being processed). By filtering out reads below a certain threshold at that early stage, we will obtain more reliable results.
We will do this quality control by interpreting the 4th line of each read in the FASTQ file. This 4th line (actually the read’s sequencing quality score) is written (encoded) in different ways by different sequencing machines, and the filter has to be applied after recovering the quality score from this encoding.
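As a sketch of that decoding (assuming the common Phred+33 encoding; the quality string below is made up):

```shell
# A made-up FASTQ quality line (4th line of a read), Phred+33 encoded
qual='II?+5'

# Each character's ASCII code minus 33 is the quality score at that base
scores=$(printf '%s' "$qual" | od -An -tu1 | tr -s ' ' '\n' | grep -v '^$' \
    | awk '{print $1 - 33}' | paste -sd' ' -)
echo "$scores"   # 40 40 30 10 20
```

Reads whose scores fall below a chosen threshold can then be dropped before the rest of the pipeline runs.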
Blog
Labeling Parsed Results as “Ambiguous”
While parsing the results of my searches against various databases, besides evaluating them with various thresholds, I try to make them even more meaningful by labeling the hits above or below the chosen thresholds as either “Ambiguous” (uncertain, multi-valued) or “Unique”.
I label as “Ambiguous” the hits in a megablast file that satisfy the thresholds but involve more than one distinct organism. If every threshold-passing hit within a single file always belongs to the same organism, then I label it “Unique”.
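The labeling rule can be sketched in shell (file names and organisms below are made up): a read is “Unique” when all of its threshold-passing hits name one organism, otherwise “Ambiguous”.

```shell
# Toy hit lists: the organism column extracted from one megablast file per read
printf 'Homo sapiens\nHomo sapiens\n' > read_1.hits
printf 'Homo sapiens\nPan troglodytes\n' > read_2.hits

for f in read_1.hits read_2.hits; do
    # Count distinct organisms among the hits
    if [ "$(sort -u "$f" | wc -l | tr -d ' ')" -eq 1 ]; then
        echo "$f: Unique"
    else
        echo "$f: Ambiguous"
    fi
done
```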
Blog
Analysis Results for the Second Dataset
I have completed the analysis of the second dataset, which has fewer unmapped reads. Since it was a better sequencing sample than the previous one, the results I got were quite consistent. Analyzing this human genome sequence, I obtained the results below.
LIST OF ORGANISMS AND THEIR NUMBER OF OCCURRENCES
Ambiguous hit 1323
Homo sapiens 312
Pan troglodytes 25
Pongo abelii 18
Nomascus leucogenys 17
Halomonas sp. GFAJ-1 7
Callithrix jacchus 4
Macaca mulatta 3
Oryctolagus cuniculus 2
Loxodonta africana 1
Cavia porcellus 1
I will explain the term “Ambiguous hit” in another post.
Blog
Analyzing the New Dataset
Because the previous data I used for testing while designing the pipeline was so poor, I obtained a new dataset. Of course, using several datasets with different characteristics during testing is useful; but I can say the previous dataset was too poor to yield even a few meaningful results. You can see the details [here]({% post_url 2012-07-06-eslestirme-ve-eslesmeyen-okumalari %}).
The new dataset is again human genome data; its BAM file is 1.8 GB and contained both mapped and unmapped reads. Using the bam2fastq tool, I converted this BAM file to a FASTQ file while also filtering out the mapped reads, 0.
Blog
Running MegaBLAST from Multiple Sequence Files
I wrote the script below so the pipeline’s MegaBLAST search would fit a technique we devised to make it faster. What it does is search the given databases using the per-read, preformatted sequence files, with a specified starting point and number of reads.
#!/usr/local/bin/perl

$database = $ARGV[0];
$dir = $ARGV[1]; # directory for sequences
$sp = $ARGV[2];  # starting point
$n = $ARGV[3] + $sp;

while (1) {
    system("blastplus -programname=megablast $dir/read_$sp.seq $database -OUTFILE=read_$sp.megablast -nobatch -d");
    $sp++;
    last if ($sp == $n);
}
Everything here works through really simple programming.
Blog
Running MegaBLAST from a Single FASTA File - Regular Expressions
Below is the Perl script I wrote to run MegaBLAST by reading a FASTA file and to collect the results in a directory, together with its explanation. This script is an important part of the pipeline I am designing. It is the first one I wrote, and it reaches all the reads through a single FASTA file.
#!/usr/local/bin/perl
$database = $ARGV[0];
$fasta = $ARGV[1]; # input file
$sp = $ARGV[2];    # starting point
$n = $ARGV[3] + $sp;

if (!defined($n)) { $n = 12; } # set default number

open FASTA, $fasta or die $!
Blog
Reading a Command's Output with Perl on Unix, and Regular Expressions
I previously described how I extract organism names with regular expressions. Here I will cover something similar, but done in Perl with a slightly more specialized technique: a very useful method I needed because the database returns its information to me as output spanning several lines. You can certainly adapt it to other purposes as well.
The need arose because the honest database, built by the HUSAR group, does not present organism names directly but spreads them over several lines. You can see an example of this below.
Blog
Speeding Up the MegaBLAST Search
Lately I have been looking for the quickest and most effective way to run MegaBLAST against different databases, and at the FASTA-file-creation stage a really useful method came from my supervisor.
Previously I searched from a single FASTA file that contained all the sequences, and this wasted time. Even though the file is opened only once, seeking to and reading the right lines on every iteration is a slow operation. We solved this by turning each read in the file into a separate FASTA file.
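The split can be sketched with awk (toy reads; real input would hold thousands):

```shell
# A toy multi-FASTA file
printf '>read_1\nACGTACGT\n>read_2\nGGCCGGCC\n' > all_reads.fa

# Start a new output file at every '>' header: read_1.seq, read_2.seq, ...
awk '/^>/ { n++; out = "read_" n ".seq" } { print > out }' all_reads.fa

cat read_2.seq
```

Each `read_N.seq` then holds exactly one record, so a search over read N never touches the other reads.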
Blog
A Perl Script for Converting FASTQ to FASTA
FASTQ and FASTA are file formats that carry essentially the same information, except that one of them has two fewer lines of information per sequence. Another difference, important for my project, is that the FASTA format can be searched with MegaBLAST directly. That is why I need to convert the FASTQ format produced by sequencing machines into FASTA. And this script is the first step of the pipeline.
Actually, because the test sequencing data I received had not been mapped by those who delivered it to me, I had performed that mapping as a preliminary step.
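The conversion itself can be sketched in shell (the post's actual script is in Perl; this awk one-liner and the toy record are illustrative):

```shell
# One toy FASTQ record: @header, sequence, '+', quality line
printf '@read_1\nACGTACGT\n+\nIIIIIIII\n' > sample.fastq

# Keep lines 1 and 2 of each 4-line record; rewrite the @ header as a > header
awk 'NR % 4 == 1 { sub(/^@/, ">"); print } NR % 4 == 2 { print }' sample.fastq > sample.fasta

cat sample.fasta
```

The `+` separator and quality line are dropped, which is exactly the "two fewer lines per sequence" difference between the formats.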
Blog
First Step: Obtaining the Unmapped Reads
As I mentioned before, the first part of my project is extracting the unmapped reads (unmapped reads) from the FASTQ file. This way I remove the sequences I won’t need in later analyses and reduce the workload of those analyses.
Since my goal from the start has been to design a pipeline that carries out the whole project step by step, I will do this with a Perl script. It is the pipeline’s first script, where the raw FASTQ data coming from the laboratory is used as input. Strictly speaking I wouldn’t need this script; I added this step only because my data also contains mapped reads.
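In SAM/BAM data the unmapped state lives in the FLAG field (bit 0x4); with samtools one would use `samtools view -f 4`. A toy sketch of the idea (hand-made records whose flag is exactly 4):

```shell
# Two toy SAM-like records: field 2 is the FLAG; 4 means "read unmapped"
printf 'r1\t0\tchr1\t100\n' > toy.sam
printf 'r2\t4\t*\t0\n' >> toy.sam

# Keep only the names of unmapped reads
awk -F'\t' '$2 == 4 { print $1 }' toy.sam
```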
Blog
The FASTQ Format - FASTQ Files
Today I received the “test” sequencing data I will use while building the program. It consists of two FASTQ files, each compressed and yet still around 6 GB. Since I don’t want to lose too much time, I will of course use only a part of one of these files.
My goal is to find the mappable reads in these FASTQ files with the BWA tool and then remove them, and to save the remaining unmapped reads in a language MegaBLAST can understand (the FASTA format).
By the way, since I am doing the whole project on a Unix machine, I am learning many commands; I will try to write about them separately later.
Blog
Detecting Organisms That Contaminate Sequencing Studies
The first study I will undertake in my summer internship is slowly taking shape. In it, I will build a pipeline and use it to try to detect the organisms that contaminate sequencing (sequencing) samples in laboratories.
In laboratories, samples can be contaminated by other organisms or foreign DNA for many reasons. These can be bacteria or yeast, or even viral DNA. After you sequence a DNA sample, its alignment rate against the reference can come out very low; this indicates that foreign DNA may be present. Another cause may be that the reference DNA is simply different.
Tag: perl dbi
Blog
Install Perl DBI Module on Ubuntu 12.04
Tag: bluetooth
Blog
Start Ubuntu 12.04 Bluetooth Off
On Terminal:
sudo gedit /etc/rc.local
Add the following before the line “exit 0”
rfkill block bluetooth
Save
Source
Tag: steam
Blog
Install Steam on Ubuntu 12.04
Download steam_latest.deb at:
http://repo.steampowered.com/steam/archive/precise/steam_latest.deb
Double click to open it in Ubuntu Software Center and click Install
It’ll start a Terminal and ask for your sudo password because some packages are required; enter your password and continue
Next it’ll update itself
Done
Source
Tag: hibernation
Blog
Enable Hibernation for Lenovo Z500 on Ubuntu 12.04
Using Terminal add this file:
sudo gedit /etc/polkit-1/localauthority/50-local.d/com.ubuntu.enable-hibernate.pkla
This:
[Re-enable hibernate by default]
Identity=unix-user:*
Action=org.freedesktop.upower.hibernate
ResultActive=yes
Save & reboot
Source
Tag: lenovo
Blog
Enable Hibernation for Lenovo Z500 on Ubuntu 12.04
Blog
Hotkeys (special keys) Volume/Brightness Controls Don't Work After Suspend
What seems to solve this problem on Ubuntu 12.04 LTS (Lenovo Z500):
Open this file:
sudo gedit /etc/default/grub
Modify the line as this:
GRUB_CMDLINE_LINUX="noapic"
Close it and run the following:
sudo update-grub
Restart your computer
Source
Blog
Suspend Laptop When Lid Closed Ubuntu 12.04 LTS in Lenovo Z500
I guess this is a bug: although suspend is set in Power settings, the laptop doesn’t suspend when its lid is closed.
To solve it, I’ve found a workaround on the web. Here is how you implement it:
Create the folder if it’s not present:
sudo mkdir /etc/acpi/local
Set its permissions:
sudo chmod 755 /etc/acpi/local
Create the script:
sudo gedit /etc/acpi/local/lid.sh.post
Copy-paste the following:
#!/bin/bash
grep -q closed /proc/acpi/button/lid/*/state
if [ $?
Tag: spotify
Blog
Install Spotify on Ubuntu 12.04
Start Software Sources from Dash Home
Add the following in the Other Sources tab:
deb http://repository.spotify.com stable non-free
Close Software Sources
Add the Spotify repo key on Terminal:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 94558F59
Install Spotify on Terminal:
sudo apt-get update && sudo apt-get install spotify-client
Find Spotify in Dash Home
Source
Tag: brightness
Blog
Save Brightness Settings Ubuntu 12.04 LTS
If your laptop starts with minimized or maximized brightness and you want a fixed default value instead, do the following:
Run a terminal and type this to get the maximum brightness:
cat /sys/class/backlight/acpi_video0/max_brightness
Now set the brightness as you want and run the following, which gives you the value of the current setting:
cat /sys/class/backlight/acpi_video0/brightness
Edit /etc/rc.local to apply that value as the default after each reboot / start:
sudo gedit /etc/rc.local
Add this line before exit 0:
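The excerpt cuts off before the line itself; assuming the value printed by the second cat above was, say, 7 (purely illustrative), the added rc.local line would typically look like this:

```
echo 7 > /sys/class/backlight/acpi_video0/brightness
```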
Tag: brightness controls
Blog
Hotkeys (special keys) Volume/Brightness Controls Don't Work After Suspend
Tag: volume controls
Blog
Hotkeys (special keys) Volume/Brightness Controls Don't Work After Suspend
Tag: laravel
Blog
session_start() Permission denied (13) Laravel 4
Solve it by running the following:
chmod -R 755 /path/to/your/laravel/directory
chmod -R o+w /path/to/your/laravel/directory
And/or maybe:
sudo chown -R www-data:user /path/to/your/laravel/directory
Blog
If clean URLs don't work in Laravel 4 on Ubuntu 12.04 LTS
Your .htaccess directives are correct and mod_rewrite is enabled, but you are still getting 404 Not Found errors…
You need to change AllowOverride None to AllowOverride All in /etc/apache2/sites-available/default.
Modified section in the file:
<Directory /home/user/www/>
    Options Indexes FollowSymLinks MultiViews
    AllowOverride All
    Order allow,deny
    allow from all
</Directory>
Blog
Permission Issues develop Laravel 4 on Ubuntu 12.04 LTS
Blog
Base URL for Your Laravel 4 Website
Blog
Remove public from URL Laravel 4
Tag: session_start
Blog
session_start() Permission denied (13) Laravel 4
Tag: executable
Blog
How To Make A File or Script Executable in Ubuntu
Start a terminal: CTRL + Alt + T can be used (or just go to Dash Home and type Terminal).
Run this command below:
sudo chmod +x /path/to/your/file
Source
Tag: suspend
Blog
Suspend Laptop When Lid Closed Ubuntu 12.04 LTS in Lenovo Z500
Tag: api
Blog
Install Ensembl API and BioPerl 1.2.3 on Your System
Tag: bashrc
Blog
Install Ensembl API and BioPerl 1.2.3 on Your System
Tag: bioperl
Blog
Install Ensembl API and BioPerl 1.2.3 on Your System
Tag: gedit
Blog
Install Ensembl API and BioPerl 1.2.3 on Your System
Tag: install bioperl
Blog
Install Ensembl API and BioPerl 1.2.3 on Your System
Tag: lib
Blog
Install Ensembl API and BioPerl 1.2.3 on Your System
Tag: clean urls
Blog
If clean URLs don't work in Laravel 4 on Ubuntu 12.04 LTS
Tag: file browser
Blog
A Nice File Browser for Geany 1.23 on Ubuntu 12.04 LTS
If you’re looking for a file browser for Geany, check out the TreeBrowser plugin on its page (see the page for screenshots).
To install and enable it, just run the following on Terminal:
sudo apt-get install geany-plugin-treebrowser
Then go to “Tools” -> “Plugin Manager” and check “TreeBrowser”
Source
Tag: 403 forbidden
Blog
Permission Issues develop Laravel 4 on Ubuntu 12.04 LTS
Tag: base url
Blog
Base URL for Your Laravel 4 Website
Tag: remove public
Blog
Remove public from URL Laravel 4
Tag: data-derived network
Blog
Last Submissions to the Challenge
Today, I submitted the in silico and experimental data network inference results on Synapse for the next leaderboard, this Wednesday.
For experimental part, I had to exclude edges with FGFR1 and FGFR3 because the data lacks phosphorylated forms of these proteins and networks must be constructed using only phosphoproteins in the data.
Since there was an update for in silico part, I had to modify the script and resubmit the results.
Blog
Network Visualization Using Cytoscape
Cytoscape is a nice tool for visualizing networks for better understanding and presentation. I used it for the in silico data network visualization and the result was really pretty. Now, I have networks constructed using experimental data from the HPN-DREAM Challenge.
In this post, I want to demonstrate how to visualize a network with scores. I’m using Cytoscape 2.8 on Ubuntu 12.
First, the network will be read from a SIF file, which is Cytoscape’s default format for networks.
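For reference, SIF is a plain-text format with one interaction per line, in the order source, relationship, target (the node and relationship names below are made up):

```
ProteinA activates ProteinB
ProteinA inhibits ProteinC
```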
Blog
Plotting Expression Curves for Experimental Data
Now that I can plot expression curves for in silico data, I have moved on to the experimental data, which is larger and more complex. This data is the result of RPPA experiments on different breast cancer cell lines, and it includes protein abundance measurements for about 45 phosphoproteins. These phosphoproteins are treated with different inhibitors and stimuli, and by comparing their expressions, I will try to infer relations between them.
Before moving on to the inference part, I want a script that can plot the graphs so that I can inspect particular results for specific cases.
Blog
In silico Network Inference Last Improvements and Visualization of Result in Cytoscape
I’m almost done with the analysis of the in silico data, although I still need to decide whether the inhibiting parent nodes in the network require further analysis. Previously, I couldn’t filter out duplicate edges that were scored differently. Now, with some improvements in the script, low-scoring duplicates are filtered out and there is a better final list of edges, ready to be visualized.
I also tried visualizing it on Cytoscape.
Blog
Plotting Expression Profiles Data Analysis for Network Inference
For the in silico data network inference, I decided to develop my own script because the existing tools have bugs and are not compatible with the data. At the same time, I will try to report the bugs and compatibility issues to the developers.
The in silico data has 660 experiment results covering 20 antibodies, 4 kinds of stimuli and 3 kinds of inhibitors. Antibodies are treated with a stimulus, say at t_0; in the case of inhibitors, say at t_i, antibodies are pre-incubated for some time (t_pre) and then treated with a stimulus.
Blog
Network Inference Challenge in silico Data
I had a meeting with BiGCaT this week and we discussed DREAM Breast Cancer Challenge. I presented the challenge and also some ways that I have found to solve the first sub-challenge network inference. Tina, from BiGCaT, suggested starting with in silico data which is much simpler than breast cancer data. Later, I can use the methods I develop for in silico data in experimental data.
in silico data contains 20 antibodies, 3 inhibitors and 2 ligand stimuli, with 2 different concentrations of each.
Tag: dream challenge
Blog
Last Submissions to the Challenge
Blog
Network Visualization Using Cytoscape
Blog
Plotting Expression Curves for Experimental Data
Blog
Experimental Data Optimization for Network Inference
As I mentioned in my previous post, the experimental data from the challenge has missing values that create problems during analyses. To solve this, the first thing I did was to optimize the data: detecting missing conditions, inserting NAs for the missing values, and sorting where necessary.
I wrote two functions in the script. The first one ranks the data and sorts it based on these ranks.
Blog
Working with Experimental Data from Network Inference Challenge
Having almost finished with the in silico data, I moved on to analyzing the experimental data with the same script. But since the characteristics of the data are somewhat different, before inferring the network I need to modify the script so it can read the experimental data files.
These differences include missing data values for some conditions. This makes the analyses difficult, because I have to estimate a value for them, and this will decrease the confidence scores of the affected edges.
Blog
In silico Network Inference Last Improvements and Visualization of Result in Cytoscape
Blog
Latest Progress on Network Inference and Edge Scoring
I have slightly improved the network inference part of the script by changing the way I compare the intervention (inhibitor plus stimulus) and no-intervention (stimulus only) data from the in silico part.
Now, I’m using a function (simp) from an R package called StreamMetabolism, which takes time points and data values and calculates the area under the curve by integration (Sefick, 2009). I do this integration for both conditions and then compare them.
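simp integrates with Simpson's rule; the same compare-the-areas idea can be sketched with the simpler trapezoidal rule (the time points and values below are made up):

```shell
# (time, value) pairs for one curve; sum trapezoid areas between consecutive points
printf '0 1\n5 3\n10 2\n' \
    | awk 'NR > 1 { area += ($1 - t) * ($2 + v) / 2 } { t = $1; v = $2 } END { print area }'
# 10 + 12.5 = 22.5
```

Computing this area once for the intervention curve and once for the no-intervention curve gives the two numbers to compare.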
Blog
Scoring Edges Network Inference HPN-DREAM Challenge
Yesterday, I managed to infer a network for part of the in silico data from the challenge. Since the challenge also asks for scores on the edges in the networks, I developed the script further and added a function for that.
The edgeScorer function takes a data object of averaged time points for each curve in the intervention/no-intervention sets and scores each edge for each set of conditions. First it finds the largest difference among the sets and stores it as maxDifference; then it stores each difference divided by maxDifference in another data object.
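The normalization step can be sketched in shell (edge names and raw differences are made up; the largest difference is found first, then every score is divided by it):

```shell
# Edge list: source, target, raw difference score
printf 'A\tB\t2\nA\tC\t8\nB\tC\t4\n' > edges.txt

# Pass 1: find the largest difference (maxDifference)
max=$(awk -F'\t' '$3 > m { m = $3 } END { print m }' edges.txt)

# Pass 2: divide every score by maxDifference so scores fall in (0, 1]
awk -F'\t' -v max="$max" '{ printf "%s\t%s\t%.2f\n", $1, $2, $3 / max }' edges.txt
```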
Blog
Determining Edges More Progress on Network Inference
Lately, I have been writing an R script to infer a network using the in silico data. The last version of the script read the MIDAS file and plotted expression profiles. I have modified it, and now it reads the MIDAS file, runs some analyses and prints causal relations to a file. This file is a SIF file, as required.
This dataset was generated with 20 antibodies, but only 3 of them are perturbed. Also, for one of them, the stimulus is missing.
Blog
Plotting Expression Profiles Data Analysis for Network Inference
Blog
Webinar on HPN-DREAM Breast Cancer Network Inference Challenge
The DREAM8 organizers are planning a webinar about the HPN-DREAM Breast Cancer Network Inference Challenge on July 19, 10:30 - 11:30 (PDT / UTC -7). The general setup of the challenge and demo submissions to the leaderboard will be discussed, and questions about the challenge will be taken during the webinar. The number of participants in the challenge has also been announced: 138.
Registration for the webinar is done using this form. There are a limited number of “seats”, but recordings will be published later.
Blog
Network Inference Challenge in silico Data
Blog
Playing around with CellNOptR Tool and MIDAS File
With CellNOptR, we will try to construct network models for the challenge. The tool needs two inputs. The first is a special data object called a CNOlist that stores the data as vectors and matrices. The second is a .SIF file containing a prior knowledge network, which can be obtained from pathway databases and analysis tools.
A CNOlist contains the following fields: namesSignals, namesCues, namesStimuli and namesInhibitors, which are vectors storing the names of the measurements.
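To make the shape of that object concrete, here is a rough Python stand-in for a CNOlist (a sketch only; the real object lives in R's CellNOptR, and the antibody and cue names below are made-up placeholders):

```python
# A minimal stand-in for CellNOptR's CNOlist: each field is a vector of
# measurement names. All names below are hypothetical placeholders.
cnolist = {
    "namesSignals":    ["AKT_pS473", "ERK_pT202"],  # measured readouts
    "namesStimuli":    ["EGF", "Insulin"],          # applied ligands
    "namesInhibitors": ["MEKi"],                    # applied inhibitors
}
# In CellNOptR, the cues are (to my understanding) the union of the
# stimuli and the inhibitors.
cnolist["namesCues"] = cnolist["namesStimuli"] + cnolist["namesInhibitors"]

print(cnolist["namesCues"])  # ['EGF', 'Insulin', 'MEKi']
```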
Blog
Progress on Network Inference Sub-Challenge
This sub-challenge has several requirements:
- Directed, causal edges in the models (32 models: 4 cell lines × 8 stimuli)
- Edges scored with confidence values normalized to the range 0-1
- Nodes are the phosphoproteins from the data
- A prior knowledge network (which can be constructed from pathway databases) may be used (and is actually a must for some network inference tools)

The first step was to look for existing tools.
Blog
Network Inference DREAM Breast Cancer Challenge
A causal edge is inferred from the change seen in a node after an intervention on another node. If the time courses overlap (with and without the intervention), there is no relation. Otherwise, we can draw an edge between those nodes, and depending on whether the level goes up or down, the edge is activating or inhibiting. These causal edges are context-specific, so data from different cell lines may yield different relations.
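The overlap test can be sketched as a toy function (a hypothetical Python illustration, not the challenge's actual scoring code): if the two time courses stay close, no edge is drawn; otherwise the sign of the average difference decides activating vs. inhibiting.

```python
def infer_edge(no_intervention, intervention, tol=0.1):
    """Compare a node's time course with and without an intervention on a parent.

    Returns None (no relation), 'activating', or 'inhibiting'.
    If inhibiting the parent lowers the child, the parent activates it,
    and vice versa. `tol` is an arbitrary overlap tolerance.
    """
    diffs = [a - b for a, b in zip(no_intervention, intervention)]
    mean_diff = sum(diffs) / len(diffs)
    if abs(mean_diff) < tol:        # curves overlap -> no causal edge
        return None
    return "activating" if mean_diff > 0 else "inhibiting"

print(infer_edge([1.0, 1.1, 1.0], [1.0, 1.05, 1.02]))  # overlapping curves -> None
print(infer_edge([1.0, 1.5, 2.0], [1.0, 1.1, 1.2]))    # child drops under intervention -> activating
```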
Blog
DREAM Breast Cancer Sub-challenges
I have been going over the sub-challenges before attempting to solve them. As I mentioned, there are three sub-challenges, and they are connected to some extent.
First, using the given data and other possible sources such as pathway databases, we infer the causal signaling network of the phosphoproteins. There are 4 cell lines and 8 stimuli, which makes 32 networks in total. Nodes are phosphoproteins, and edges should be directed and causal (activating or inhibiting).
Blog
HPN-DREAM Breast Cancer Network Inference Challenge
Understanding signaling networks might bring more insight into cancer treatment, because cells respond to their environment by activating these networks, and phosphorylation reactions play important roles in them.
The goal of this challenge is to advance our ability to infer signaling networks and to predict protein phosphorylation dynamics. We are also asked to develop a visualization method for the data.
The dataset provided is extensive and is the result of RPPA (reverse-phase protein array) experiments.
Blog
Dream Challenge
This year, the 8th DREAM Challenge takes place, and I will be working on it as my internship project at BiGCaT, Bioinformatics, UM. The challenge brings scientists together to “catalyze the interaction between experiment and theory in the area of cellular network inference and quantitative model building in systems biology” (as stated on their webpage).
In this competition, I will work on a specific challenge involving network modeling, dynamic response prediction and data visualization.
Tag: hpn-dream
Blog
Last Submissions to the Challenge
Today, I submitted the in silico and experimental network inference results on Synapse for the next leaderboard, this Wednesday.
For the experimental part, I had to exclude edges involving FGFR1 and FGFR3, because the data lacks the phosphorylated forms of these proteins and the networks must be constructed using only the phosphoproteins in the data.
Since there was an update for the in silico part, I had to modify the script and resubmit the results.
Blog
Network Visualization Using Cytoscape
Cytoscape is a nice tool for visualizing networks for better understanding and presentation. I used it to visualize the in silico network and the result was really pretty. Now, I have networks constructed from the experimental data of the HPN-DREAM Challenge.
In this post, I want to demonstrate how to visualize a network with edge scores. I’m using Cytoscape 2.8 on Ubuntu 12.
First, the network is read from a SIF file, which is Cytoscape’s default network format.
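SIF is a simple line-oriented format: each line is a source node, an interaction type, and one or more target nodes, separated by whitespace. A minimal reader can be sketched in Python (the node names are invented for illustration):

```python
def read_sif(lines):
    """Parse SIF lines into (source, interaction, target) triples.

    Each SIF line is: <source> <interaction> <target1> [<target2> ...];
    a line holding only a node name declares an isolated node and is
    skipped here for simplicity.
    """
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        source, interaction, targets = parts[0], parts[1], parts[2:]
        for target in targets:
            edges.append((source, interaction, target))
    return edges

sif = ["AKT activates mTOR", "MEK activates ERK RSK"]
print(read_sif(sif))
# [('AKT', 'activates', 'mTOR'), ('MEK', 'activates', 'ERK'), ('MEK', 'activates', 'RSK')]
```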
Blog
Plotting Expression Curves for Experimental Data
Now that I can plot expression curves for the in silico data, I have moved on to the experimental data, which is larger and more complex. This data is the result of RPPA experiments on different breast cancer cell lines and includes protein abundance measurements for about 45 phosphoproteins. These phosphoproteins are treated with different inhibitors and stimuli, and by comparing their expression profiles I will try to infer relations between them.
Before moving on to the inference part, I want a script that can plot these curves, so that I can inspect particular results for specific cases.
Blog
Experimental Data Optimization for Network Inference
As I mentioned in my previous post, the experimental data from the challenge has missing values that cause problems during analysis. The first thing I did to address this was to tidy the data: detecting missing conditions, inserting NAs for their data values, and sorting the rows where necessary.
I wrote two functions in the script. The first ranks the rows according to the desired ordering and sorts the data based on these ranks.
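The missing-condition step can be sketched like this (a Python stand-in for the original R code, with NaN playing the role of R's NA and the condition names invented):

```python
import math

def fill_missing(expected_conditions, measurements):
    """Ensure every expected condition has an entry, inserting NaN
    (the stand-in for R's NA) for missing ones, and return the rows
    sorted by condition name."""
    filled = {cond: measurements.get(cond, math.nan)
              for cond in expected_conditions}
    return dict(sorted(filled.items()))

data = {"EGF": 0.8, "Insulin": 1.2}            # the "Serum" condition was never measured
result = fill_missing(["Serum", "EGF", "Insulin"], data)
print(result)  # {'EGF': 0.8, 'Insulin': 1.2, 'Serum': nan}
```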
Blog
Working with Experimental Data from Network Inference Challenge
As I am almost finished with the in silico data, I have moved on to analyzing the experimental data with the same script. But since the characteristics of the data are somewhat different, before inferring networks I need to modify the script so it can read the experimental data files.
These differences include missing values for some conditions. This makes the analysis difficult, because I have to estimate values for them, which will lower the confidence scores of the affected edges.
Blog
In silico Network Inference Last Improvements and Visualization of Result in Cytoscape
I’m almost done with the analysis of the in silico data, although I still need to decide whether the inhibiting parent nodes in the network require further analysis. Previously, I couldn’t filter out duplicate edges that had been scored differently. Now, with some improvements to the script, the lower-scoring duplicates are filtered out, leaving a better final edge list that is ready to be visualized.
I also tried visualizing it in Cytoscape.
Blog
Latest Progress on Network Inference and Edge Scoring
I have slightly improved the network inference part of the script by changing how the intervention (inhibitor plus stimulus) and no-intervention (stimulus only) data from the in silico part are compared.
Now I’m using the simp function from an R package called StreamMetabolism, which takes time points and data values and calculates the area under the curve by numerical integration (Sefick, 2009). I do this integration for both conditions and then compare the results.
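The same comparison can be sketched in Python, with the trapezoidal rule standing in for simp's Simpson integration (the time points and values below are illustrative):

```python
def auc(times, values):
    """Area under a time course via the trapezoidal rule
    (handles unevenly spaced time points)."""
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        total += dt * (values[i] + values[i - 1]) / 2.0
    return total

# Compare the same readout with and without an inhibitor present.
t = [0, 5, 15, 30]                      # minutes (made-up time points)
no_intervention = [1.0, 2.0, 2.5, 2.0]
intervention    = [1.0, 1.2, 1.1, 1.0]

# A positive gap means the signal drops when the inhibitor is present.
print(auc(t, no_intervention) - auc(t, intervention))
```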
Blog
Scoring Edges Network Inference HPN-DREAM Challenge
Yesterday, I managed to infer a network for part of the in silico data from the challenge. Since the challenge also asks for the edges in the networks to be scored, I developed the script further and added a function for that.
The edgeScorer function takes a data object of time-point averages for each curve in the intervention/no-intervention sets and scores each edge for each set of conditions. First, it finds the largest difference among the sets and stores it as maxDifference; then it stores each difference divided by maxDifference in another data object.
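The normalization described above reduces to dividing every difference by the largest one, so the best-supported edge always scores 1. A Python sketch of the idea (edgeScorer itself is an R function in my script, and the edge names here are made up):

```python
def score_edges(differences):
    """Normalize absolute curve differences into confidence scores in [0, 1].

    `differences` maps (parent, child) pairs to the intervention vs.
    no-intervention difference; the largest magnitude becomes maxDifference.
    """
    max_difference = max(abs(d) for d in differences.values())
    return {edge: abs(d) / max_difference for edge, d in differences.items()}

diffs = {("MEK", "ERK"): 4.0, ("MEK", "AKT"): 1.0}
print(score_edges(diffs))  # {('MEK', 'ERK'): 1.0, ('MEK', 'AKT'): 0.25}
```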
Tag: in silico
Blog
Last Submissions to the Challenge
Today, I submitted the in silico and experimental network inference results on Synapse for the next leaderboard, this Wednesday.
For the experimental part, I had to exclude edges involving FGFR1 and FGFR3, because the data lacks the phosphorylated forms of these proteins and the networks must be constructed using only the phosphoproteins in the data.
Since there was an update for the in silico part, I had to modify the script and resubmit the results.
Blog
Working with Experimental Data from Network Inference Challenge
As I am almost finished with the in silico data, I have moved on to analyzing the experimental data with the same script. But since the characteristics of the data are somewhat different, before inferring networks I need to modify the script so it can read the experimental data files.
These differences include missing values for some conditions. This makes the analysis difficult, because I have to estimate values for them, which will lower the confidence scores of the affected edges.
Blog
In silico Network Inference Last Improvements and Visualization of Result in Cytoscape
I’m almost done with the analysis of the in silico data, although I still need to decide whether the inhibiting parent nodes in the network require further analysis. Previously, I couldn’t filter out duplicate edges that had been scored differently. Now, with some improvements to the script, the lower-scoring duplicates are filtered out, leaving a better final edge list that is ready to be visualized.
I also tried visualizing it in Cytoscape.
Blog
Scoring Edges Network Inference HPN-DREAM Challenge
Yesterday, I managed to infer a network for part of the in silico data from the challenge. Since the challenge also asks for the edges in the networks to be scored, I developed the script further and added a function for that.
The edgeScorer function takes a data object of time-point averages for each curve in the intervention/no-intervention sets and scores each edge for each set of conditions. First, it finds the largest difference among the sets and stores it as maxDifference; then it stores each difference divided by maxDifference in another data object.
Blog
Determining Edges More Progress on Network Inference
Lately, I have been writing an R script to infer a network from the in silico data. The last version of the script read the MIDAS file and plotted expression profiles. I have modified it, and now it reads the MIDAS file, performs some analyses and prints the causal relations to a file. This file is a SIF file, as required.
This dataset is generated with 20 antibodies, but only 3 of them are perturbed. Also, for one of them, the stimulus is missing.
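Printing the causal relations in SIF format is then one line per edge (a Python sketch; the actual script is in R, and the antibody names are placeholders):

```python
def write_sif(edges, path):
    """Write (source, relation, target) triples as SIF lines:
    one tab-separated 'source relation target' per line."""
    with open(path, "w") as f:
        for source, relation, target in edges:
            f.write(f"{source}\t{relation}\t{target}\n")

edges = [("AB1", "activates", "AB7"), ("AB3", "inhibits", "AB12")]
write_sif(edges, "network.sif")
print(open("network.sif").read())
```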
Blog
Plotting Expression Profiles Data Analysis for Network Inference
For the in silico network inference I decided to develop my own script, because the existing tools have bugs and are not compatible with the data. In the meantime, I will report the bugs and compatibility issues to the developers.
The in silico data comprises 660 experimental results covering 20 antibodies, 4 kinds of stimuli and 3 kinds of inhibitors. Antibodies are treated with a stimulus at, say, t_0; in the inhibitor experiments, the antibodies are first pre-incubated with the inhibitor for some time (t_pre) and then treated with a stimulus at t_i.
Blog
Network Inference Challenge in silico Data
I had a meeting with BiGCaT this week and we discussed the DREAM Breast Cancer Challenge. I presented the challenge, along with some approaches I have found for the first sub-challenge, network inference. Tina, from BiGCaT, suggested starting with the in silico data, which is much simpler than the breast cancer data. Later, I can apply the methods I develop on the in silico data to the experimental data.
The in silico data contains 20 antibodies, 3 inhibitors and 2 ligand stimuli, each at 2 different concentrations.
Blog
Progress on Network Inference Sub-Challenge
This sub-challenge has several requirements:
- Directed, causal edges in the models (32 models: 4 cell lines × 8 stimuli)
- Edges scored with confidence values normalized to the range 0-1
- Nodes are the phosphoproteins from the data
- A prior knowledge network (which can be constructed from pathway databases) may be used (and is actually a must for some network inference tools)

The first step was to look for existing tools.
Blog
HPN-DREAM Breast Cancer Network Inference Challenge
Understanding signaling networks might bring more insight into cancer treatment, because cells respond to their environment by activating these networks, and phosphorylation reactions play important roles in them.
The goal of this challenge is to advance our ability to infer signaling networks and to predict protein phosphorylation dynamics. We are also asked to develop a visualization method for the data.
The dataset provided is extensive and is the result of RPPA (reverse-phase protein array) experiments.
Tag: network inference
Blog
Last Submissions to the Challenge
Today, I submitted the in silico and experimental network inference results on Synapse for the next leaderboard, this Wednesday.
For the experimental part, I had to exclude edges involving FGFR1 and FGFR3, because the data lacks the phosphorylated forms of these proteins and the networks must be constructed using only the phosphoproteins in the data.
Since there was an update for the in silico part, I had to modify the script and resubmit the results.
Blog
Network Visualization Using Cytoscape
Cytoscape is a nice tool for visualizing networks for better understanding and presentation. I used it to visualize the in silico network and the result was really pretty. Now, I have networks constructed from the experimental data of the HPN-DREAM Challenge.
In this post, I want to demonstrate how to visualize a network with edge scores. I’m using Cytoscape 2.8 on Ubuntu 12.
First, the network is read from a SIF file, which is Cytoscape’s default network format.
Blog
Plotting Expression Curves for Experimental Data
Now that I can plot expression curves for the in silico data, I have moved on to the experimental data, which is larger and more complex. This data is the result of RPPA experiments on different breast cancer cell lines and includes protein abundance measurements for about 45 phosphoproteins. These phosphoproteins are treated with different inhibitors and stimuli, and by comparing their expression profiles I will try to infer relations between them.
Before moving on to the inference part, I want a script that can plot these curves, so that I can inspect particular results for specific cases.
Blog
Experimental Data Optimization for Network Inference
As I mentioned in my previous post, the experimental data from the challenge has missing values that cause problems during analysis. The first thing I did to address this was to tidy the data: detecting missing conditions, inserting NAs for their data values, and sorting the rows where necessary.
I wrote two functions in the script. The first ranks the rows according to the desired ordering and sorts the data based on these ranks.
Blog
Working with Experimental Data from Network Inference Challenge
As I am almost finished with the in silico data, I have moved on to analyzing the experimental data with the same script. But since the characteristics of the data are somewhat different, before inferring networks I need to modify the script so it can read the experimental data files.
These differences include missing values for some conditions. This makes the analysis difficult, because I have to estimate values for them, which will lower the confidence scores of the affected edges.
Blog
In silico Network Inference Last Improvements and Visualization of Result in Cytoscape
I’m almost done with the analysis of the in silico data, although I still need to decide whether the inhibiting parent nodes in the network require further analysis. Previously, I couldn’t filter out duplicate edges that had been scored differently. Now, with some improvements to the script, the lower-scoring duplicates are filtered out, leaving a better final edge list that is ready to be visualized.
I also tried visualizing it in Cytoscape.
Blog
Latest Progress on Network Inference and Edge Scoring
I have slightly improved the network inference part of the script by changing how the intervention (inhibitor plus stimulus) and no-intervention (stimulus only) data from the in silico part are compared.
Now I’m using the simp function from an R package called StreamMetabolism, which takes time points and data values and calculates the area under the curve by numerical integration (Sefick, 2009). I do this integration for both conditions and then compare the results.
Blog
Scoring Edges Network Inference HPN-DREAM Challenge
Yesterday, I managed to infer a network for part of the in silico data from the challenge. Since the challenge also asks for the edges in the networks to be scored, I developed the script further and added a function for that.
The edgeScorer function takes a data object of time-point averages for each curve in the intervention/no-intervention sets and scores each edge for each set of conditions. First, it finds the largest difference among the sets and stores it as maxDifference; then it stores each difference divided by maxDifference in another data object.
Blog
Determining Edges More Progress on Network Inference
Lately, I have been writing an R script to infer a network from the in silico data. The last version of the script read the MIDAS file and plotted expression profiles. I have modified it, and now it reads the MIDAS file, performs some analyses and prints the causal relations to a file. This file is a SIF file, as required.
This dataset is generated with 20 antibodies, but only 3 of them are perturbed. Also, for one of them, the stimulus is missing.
Blog
Plotting Expression Profiles Data Analysis for Network Inference
For the in silico network inference I decided to develop my own script, because the existing tools have bugs and are not compatible with the data. In the meantime, I will report the bugs and compatibility issues to the developers.
The in silico data comprises 660 experimental results covering 20 antibodies, 4 kinds of stimuli and 3 kinds of inhibitors. Antibodies are treated with a stimulus at, say, t_0; in the inhibitor experiments, the antibodies are first pre-incubated with the inhibitor for some time (t_pre) and then treated with a stimulus at t_i.
Blog
Webinar on HPN-DREAM Breast Cancer Network Inference Challenge
The DREAM8 organizers are planning a webinar on the HPN-DREAM Breast Cancer Network Inference Challenge for July 19, 10:30-11:30 (PDT / UTC-7). The general setup of the challenge and demo submissions to the leaderboard will be discussed, and questions about the challenge will be taken during the webinar. The number of participants in the challenge has also been announced: 138.
Registration for the webinar is done using this form. There are a limited number of “seats”, but recordings will be published later.
Blog
Network Inference Challenge in silico Data
I had a meeting with BiGCaT this week and we discussed the DREAM Breast Cancer Challenge. I presented the challenge, along with some approaches I have found for the first sub-challenge, network inference. Tina, from BiGCaT, suggested starting with the in silico data, which is much simpler than the breast cancer data. Later, I can apply the methods I develop on the in silico data to the experimental data.
The in silico data contains 20 antibodies, 3 inhibitors and 2 ligand stimuli, each at 2 different concentrations.
Blog
Playing around with CellNOptR Tool and MIDAS File
With CellNOptR, we will try to construct network models for the challenge. The tool needs two inputs. The first is a special data object called a CNOlist that stores the data as vectors and matrices. The second is a .SIF file containing a prior knowledge network, which can be obtained from pathway databases and analysis tools.
A CNOlist contains the following fields: namesSignals, namesCues, namesStimuli and namesInhibitors, which are vectors storing the names of the measurements.
Blog
Progress on Network Inference Sub-Challenge
This sub-challenge has several requirements:
- Directed, causal edges in the models (32 models: 4 cell lines × 8 stimuli)
- Edges scored with confidence values normalized to the range 0-1
- Nodes are the phosphoproteins from the data
- A prior knowledge network (which can be constructed from pathway databases) may be used (and is actually a must for some network inference tools)

The first step was to look for existing tools.
Blog
Network Inference DREAM Breast Cancer Challenge
A causal edge is inferred from the change seen in a node after an intervention on another node. If the time courses overlap (with and without the intervention), there is no relation. Otherwise, we can draw an edge between those nodes, and depending on whether the level goes up or down, the edge is activating or inhibiting. These causal edges are context-specific, so data from different cell lines may yield different relations.
Blog
DREAM Breast Cancer Sub-challenges
I have been going over the sub-challenges before attempting to solve them. As I mentioned, there are three sub-challenges, and they are connected to some extent.
First, using the given data and other possible sources such as pathway databases, we infer the causal signaling network of the phosphoproteins. There are 4 cell lines and 8 stimuli, which makes 32 networks in total. Nodes are phosphoproteins, and edges should be directed and causal (activating or inhibiting).
Blog
HPN-DREAM Breast Cancer Network Inference Challenge
Understanding signaling networks might bring more insight into cancer treatment, because cells respond to their environment by activating these networks, and phosphorylation reactions play important roles in them.
The goal of this challenge is to advance our ability to infer signaling networks and to predict protein phosphorylation dynamics. We are also asked to develop a visualization method for the data.
The dataset provided is extensive and is the result of RPPA (reverse-phase protein array) experiments.
Blog
Dream Challenge
This year, the 8th DREAM Challenge takes place, and I will be working on it as my internship project at BiGCaT, Bioinformatics, UM. The challenge brings scientists together to “catalyze the interaction between experiment and theory in the area of cellular network inference and quantitative model building in systems biology” (as stated on their webpage).
In this competition, I will work on a specific challenge involving network modeling, dynamic response prediction and data visualization.
Tag: import
Blog
Network Visualization Using Cytoscape
Cytoscape is a nice tool for visualizing networks for better understanding and presentation. I used it to visualize the in silico network and the result was really pretty. Now, I have networks constructed from the experimental data of the HPN-DREAM Challenge.
In this post, I want to demonstrate how to visualize a network with edge scores. I’m using Cytoscape 2.8 on Ubuntu 12.
First, the network is read from a SIF file, which is Cytoscape’s default network format.
Tag: mapper
Blog
Network Visualization Using Cytoscape
Cytoscape is a nice tool for visualizing networks for better understanding and presentation. I used it to visualize the in silico network and the result was really pretty. Now, I have networks constructed from the experimental data of the HPN-DREAM Challenge.
In this post, I want to demonstrate how to visualize a network with edge scores. I’m using Cytoscape 2.8 on Ubuntu 12.
First, the network is read from a SIF file, which is Cytoscape’s default network format.
Blog
BWA (Burrows-Wheeler Aligner): Aligner and Mapper
As I mentioned in my previous post, I will use an aligner (or mapper) to find out to what extent my data aligns with the reference genome. Afterwards, I will run some analyses on the part that does not align.
BWA (Burrows-Wheeler Aligner) is a program that aligns relatively short sequences against long reference genomes such as the human genome. The bwa-short algorithm is used for reads up to 200 bp (bp: base pairs), and the BWA-SW algorithm for reads between 200 bp and 100 kbp.
Many factors play a role in choosing an aligner/mapper; there are many such tools, each with different features.
Tag: sif
Blog
Network Visualization Using Cytoscape
Cytoscape is a nice tool for visualizing networks for better understanding and presentation. I used it to visualize the in silico network and the result was really pretty. Now, I have networks constructed from the experimental data of the HPN-DREAM Challenge.
In this post, I want to demonstrate how to visualize a network with edge scores. I’m using Cytoscape 2.8 on Ubuntu 12.
First, the network is read from a SIF file, which is Cytoscape’s default network format.
Blog
Determining Edges More Progress on Network Inference
Lately, I have been writing an R script to infer a network from the in silico data. The last version of the script read the MIDAS file and plotted expression profiles. I have modified it, and now it reads the MIDAS file, performs some analyses and prints the causal relations to a file. This file is a SIF file, as required.
This dataset is generated with 20 antibodies, but only 3 of them are perturbed. Also, for one of them, the stimulus is missing.
Blog
Playing around with CellNOptR Tool and MIDAS File
With CellNOptR, we will try to construct network models for the challenge. The tool needs two inputs. The first is a special data object called a CNOlist that stores the data as vectors and matrices. The second is a .SIF file containing a prior knowledge network, which can be obtained from pathway databases and analysis tools.
A CNOlist contains the following fields: namesSignals, namesCues, namesStimuli and namesInhibitors, which are vectors storing the names of the measurements.
Blog
Progress on Network Inference Sub-Challenge
This sub-challenge has several requirements:
- Directed, causal edges in the models (32 models: 4 cell lines × 8 stimuli)
- Edges scored with confidence values normalized to the range 0-1
- Nodes are the phosphoproteins from the data
- A prior knowledge network (which can be constructed from pathway databases) may be used (and is actually a must for some network inference tools)

The first step was to look for existing tools.
Blog
Network Inference DREAM Breast Cancer Challenge
A causal edge is inferred from the change seen in a node after an intervention on another node. If the time courses overlap (with and without the intervention), there is no relation. Otherwise, we can draw an edge between those nodes, and depending on whether the level goes up or down, the edge is activating or inhibiting. These causal edges are context-specific, so data from different cell lines may yield different relations.
Tag: vizmapper
Blog
Network Visualization Using Cytoscape
Cytoscape is a nice tool for visualizing networks for better understanding and presentation. I used it to visualize the in silico network and the result was really pretty. Now, I have networks constructed from the experimental data of the HPN-DREAM Challenge.
In this post, I want to demonstrate how to visualize a network with edge scores. I’m using Cytoscape 2.8 on Ubuntu 12.
First, the network is read from a SIF file, which is Cytoscape’s default network format.
Tag: experimental data
Blog
Plotting Expression Curves for Experimental Data
Now that I can plot expression curves for the in silico data, I have moved on to the experimental data, which is larger and more complex. This data is the result of RPPA experiments on different breast cancer cell lines and includes protein abundance measurements for about 45 phosphoproteins. These phosphoproteins are treated with different inhibitors and stimuli, and by comparing their expression profiles I will try to infer relations between them.
Before moving on to the inference part, I want a script that can plot these curves, so that I can inspect particular results for specific cases.
Tag: expression
Blog
Plotting Expression Curves for Experimental Data
Now that I can plot expression curves for the in silico data, I have moved on to the experimental data, which is larger and more complex. This data is the result of RPPA experiments on different breast cancer cell lines and includes protein abundance measurements for about 45 phosphoproteins. These phosphoproteins are treated with different inhibitors and stimuli, and by comparing their expression profiles I will try to infer relations between them.
Before moving on to the inference part, I want a script that can plot these curves, so that I can inspect particular results for specific cases.
Tag: plot
Blog
Plotting Expression Curves for Experimental Data
Now that I can plot expression curves for the in silico data, I have moved on to the experimental data, which is larger and more complex. This data is the result of RPPA experiments on different breast cancer cell lines and includes protein abundance measurements for about 45 phosphoproteins. These phosphoproteins are treated with different inhibitors and stimuli, and by comparing their expression profiles I will try to infer relations between them.
Before moving on to the inference part, I want a script that can plot these curves, so that I can inspect particular results for specific cases.
Tag: rppa
Blog
Plotting Expression Curves for Experimental Data
Now that I can plot expression curves for the in silico data, I have moved on to the experimental data, which is larger and more complex. This data is the result of RPPA experiments on different breast cancer cell lines and includes protein abundance measurements for about 45 phosphoproteins. These phosphoproteins are treated with different inhibitors and stimuli, and by comparing their expression profiles I will try to infer relations between them.
Before moving on to the inference part, I want a script that can plot these curves, so that I can inspect particular results for specific cases.
Blog
Network Inference DREAM Breast Cancer Challenge
A causal edge is inferred from the change seen in a node after an intervention on another node. If the time courses overlap (with and without the intervention), there is no relation. Otherwise, we can draw an edge between those nodes, and depending on whether the level goes up or down, the edge is activating or inhibiting. These causal edges are context-specific, so data from different cell lines may yield different relations.
Blog
HPN-DREAM Breast Cancer Network Inference Challenge
Understanding signaling networks might bring more insight into cancer treatment, because cells respond to their environment by activating these networks, and phosphorylation reactions play important roles in them.
The goal of this challenge is to advance our ability to infer signaling networks and to predict protein phosphorylation dynamics. We are also asked to develop a visualization method for the data.
The dataset provided is extensive and is the result of RPPA (reverse-phase protein array) experiments.
Tag: breast cancer
Blog
Experimental Data Optimization for Network Inference
As I mentioned in my previous post, the experimental data from the challenge has missing values that cause problems during analysis. The first thing I did to address this was to tidy the data: detecting missing conditions, inserting NAs for their data values, and sorting the rows where necessary.
I wrote two functions in the script. The first ranks the rows according to the desired ordering and sorts the data based on these ranks.
Blog
Working with Experimental Data from Network Inference Challenge
As I am almost finished with the in silico data, I have moved on to analyzing the experimental data with the same script. But since the characteristics of the data are somewhat different, before inferring networks I need to modify the script so it can read the experimental data files.
These differences include missing values for some conditions. This makes the analysis difficult, because I have to estimate values for them, which will lower the confidence scores of the affected edges.
Blog
Webinar on HPN-DREAM Breast Cancer Network Inference Challenge
The DREAM8 organizers are planning a webinar on the HPN-DREAM Breast Cancer Network Inference Challenge for July 19, 10:30-11:30 (PDT / UTC-7). The general setup of the challenge and demo submissions to the leaderboard will be discussed, and questions about the challenge will be taken during the webinar. The number of participants in the challenge has also been announced: 138.
Registration for the webinar is done using this form. There are a limited number of “seats”, but recordings will be published later.
Blog
Network Inference Challenge in silico Data
I had a meeting with BiGCaT this week and we discussed the DREAM Breast Cancer Challenge. I presented the challenge, along with some approaches I have found for the first sub-challenge, network inference. Tina, from BiGCaT, suggested starting with the in silico data, which is much simpler than the breast cancer data. Later, I can apply the methods I develop on the in silico data to the experimental data.
The in silico data contains 20 antibodies, 3 inhibitors and 2 ligand stimuli, each at 2 different concentrations.
Blog
DREAM Breast Cancer Sub-challenges
I have been going over the sub-challenges before attempting to solve them. As I mentioned, there are three sub-challenges, and they are interconnected.
First, using the given data and other possible data sources such as pathway databases, we infer the causal signaling networks of the phosphoproteins. There are 4 cell lines and 8 stimuli, making 32 networks in total. Nodes are phosphoproteins, and edges should be directed and causal (activating or inhibiting).
Blog
HPN-DREAM Breast Cancer Network Inference Challenge
Understanding signaling networks may bring more insight into cancer treatment, because cells respond to their environment by activating these networks, and phosphorylation reactions play important roles in them.
The goal of this challenge is to advance our ability to infer signaling networks and to predict protein phosphorylation dynamics. We are also asked to develop a visualization method for the data.
The dataset provided is extensive and is the result of RPPA (reverse-phase protein array) experiments.
Tag: bt20
Blog
Experimental Data Optimization for Network Inference
As I mentioned in my previous post, the experimental data from the challenge has missing values that create problems during analysis. To solve this, the first thing I did was to tidy the data: detecting missing conditions, inserting NAs for their data values, and sorting the data where necessary.
I wrote two functions in the script. The first ranks the data in the desired fashion and sorts it based on these ranks.
Tag: bt549
Blog
Experimental Data Optimization for Network Inference
As I mentioned in my previous post, the experimental data from the challenge has missing values that create problems during analysis. To solve this, the first thing I did was to tidy the data: detecting missing conditions, inserting NAs for their data values, and sorting the data where necessary.
I wrote two functions in the script. The first ranks the data in the desired fashion and sorts it based on these ranks.
Tag: cell line
Blog
Experimental Data Optimization for Network Inference
As I mentioned in my previous post, the experimental data from the challenge has missing values that create problems during analysis. To solve this, the first thing I did was to tidy the data: detecting missing conditions, inserting NAs for their data values, and sorting the data where necessary.
I wrote two functions in the script. The first ranks the data in the desired fashion and sorts it based on these ranks.
Tag: mcf7
Blog
Experimental Data Optimization for Network Inference
As I mentioned in my previous post, the experimental data from the challenge has missing values that create problems during analysis. To solve this, the first thing I did was to tidy the data: detecting missing conditions, inserting NAs for their data values, and sorting the data where necessary.
I wrote two functions in the script. The first ranks the data in the desired fashion and sorts it based on these ranks.
Tag: regression line
Blog
Experimental Data Optimization for Network Inference
As I mentioned in my previous post, the experimental data from the challenge has missing values that create problems during analysis. To solve this, the first thing I did was to tidy the data: detecting missing conditions, inserting NAs for their data values, and sorting the data where necessary.
I wrote two functions in the script. The first ranks the data in the desired fashion and sorts it based on these ranks.
Tag: uacc812
Blog
Experimental Data Optimization for Network Inference
As I mentioned in my previous post, the experimental data from the challenge has missing values that create problems during analysis. To solve this, the first thing I did was to tidy the data: detecting missing conditions, inserting NAs for their data values, and sorting the data where necessary.
I wrote two functions in the script. The first ranks the data in the desired fashion and sorts it based on these ranks.
Tag: check
Blog
Some String Functions in R, String Manipulation in R
I have programmed in Perl, Python, and PHP before, and string manipulation in them was more direct and easier than in R. Still, R has useful functions for string manipulation. I’m not an expert in R, but I’ve been working with it for a while and have learned some good functions for this purpose.
Concatenate strings
Concatenation is done with the paste function. It takes the strings to concatenate as arguments, separated by commas, along with the separator character(s).
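For example (expected results shown in comments):

```r
# sep is placed between the pieces of each element-wise join
paste("p", "Akt", sep = "-")              # "p-Akt"

# paste0 is paste with sep = ""; vectors are joined element-wise
paste0("t", c(0, 30, 60))                 # "t0" "t30" "t60"

# collapse joins a whole vector into one string
paste(c("a", "b", "c"), collapse = ", ")  # "a, b, c"
```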
Tag: concatenate
Blog
Some String Functions in R, String Manipulation in R
I have programmed in Perl, Python, and PHP before, and string manipulation in them was more direct and easier than in R. Still, R has useful functions for string manipulation. I’m not an expert in R, but I’ve been working with it for a while and have learned some good functions for this purpose.
Concatenate strings
Concatenation is done with the paste function. It takes the strings to concatenate as arguments, separated by commas, along with the separator character(s).
Tag: extract
Blog
Some String Functions in R, String Manipulation in R
I have programmed in Perl, Python, and PHP before, and string manipulation in them was more direct and easier than in R. Still, R has useful functions for string manipulation. I’m not an expert in R, but I’ve been working with it for a while and have learned some good functions for this purpose.
Concatenate strings
Concatenation is done with the paste function. It takes the strings to concatenate as arguments, separated by commas, along with the separator character(s).
Tag: r functions
Blog
Some String Functions in R, String Manipulation in R
I have programmed in Perl, Python, and PHP before, and string manipulation in them was more direct and easier than in R. Still, R has useful functions for string manipulation. I’m not an expert in R, but I’ve been working with it for a while and have learned some good functions for this purpose.
Concatenate strings
Concatenation is done with the paste function. It takes the strings to concatenate as arguments, separated by commas, along with the separator character(s).
Tag: regex
Blog
Some String Functions in R, String Manipulation in R
I have programmed in Perl, Python, and PHP before, and string manipulation in them was more direct and easier than in R. Still, R has useful functions for string manipulation. I’m not an expert in R, but I’ve been working with it for a while and have learned some good functions for this purpose.
Concatenate strings
Concatenation is done with the paste function. It takes the strings to concatenate as arguments, separated by commas, along with the separator character(s).
Tag: replace
Blog
Some String Functions in R, String Manipulation in R
I have programmed in Perl, Python, and PHP before, and string manipulation in them was more direct and easier than in R. Still, R has useful functions for string manipulation. I’m not an expert in R, but I’ve been working with it for a while and have learned some good functions for this purpose.
Concatenate strings
Concatenation is done with the paste function. It takes the strings to concatenate as arguments, separated by commas, along with the separator character(s).
Tag: split
Blog
Some String Functions in R, String Manipulation in R
I have programmed in Perl, Python, and PHP before, and string manipulation in them was more direct and easier than in R. Still, R has useful functions for string manipulation. I’m not an expert in R, but I’ve been working with it for a while and have learned some good functions for this purpose.
Concatenate strings
Concatenation is done with the paste function. It takes the strings to concatenate as arguments, separated by commas, along with the separator character(s).
Tag: string functions
Blog
Some String Functions in R, String Manipulation in R
I have programmed in Perl, Python, and PHP before, and string manipulation in them was more direct and easier than in R. Still, R has useful functions for string manipulation. I’m not an expert in R, but I’ve been working with it for a while and have learned some good functions for this purpose.
Concatenate strings
Concatenation is done with the paste function. It takes the strings to concatenate as arguments, separated by commas, along with the separator character(s).
Tag: area under curve
Blog
Latest Progress on Network Inference and Edge Scoring
I have slightly improved the network inference part of the script by changing the way the intervention (inhibitor plus stimulus) and no-intervention (stimulus only) data from the in silico part are compared.
Now I’m using a function (simp) from an R package called StreamMetabolism, which takes time points and data values and calculates the area under the curve by integration (Sefick, 2009). I do this integration for both conditions and then compare them.
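The comparison can be sketched with a base-R trapezoidal rule standing in for StreamMetabolism’s simp (which uses Simpson integration); the time points and data values below are made up:

```r
# Trapezoidal area under the curve for values y measured at times t
auc <- function(t, y) sum(diff(t) * (head(y, -1) + tail(y, -1)) / 2)

t      <- c(0, 5, 15, 30, 60)         # time points (made up)
no_int <- c(1.0, 2.0, 2.6, 2.2, 1.5)  # stimulus only
int    <- c(1.0, 1.2, 1.3, 1.1, 0.9)  # stimulus + inhibitor

# A large gap between the two areas suggests the inhibited node
# influences the measured one
auc(t, no_int) - auc(t, int)
```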
Tag: causal edge
Blog
Latest Progress on Network Inference and Edge Scoring
I have slightly improved the network inference part of the script by changing the way the intervention (inhibitor plus stimulus) and no-intervention (stimulus only) data from the in silico part are compared.
Now I’m using a function (simp) from an R package called StreamMetabolism, which takes time points and data values and calculates the area under the curve by integration (Sefick, 2009). I do this integration for both conditions and then compare them.
Blog
Scoring Edges Network Inference HPN-DREAM Challenge
Yesterday, I managed to infer a network for part of the in silico data from the challenge. Since the challenge also asks for scores on the edges of the networks, I developed the script further and added a function for that.
The edgeScorer function takes a data object of averaged time points for each curve in the intervention/no-intervention sets and scores each edge for each set of conditions. First, it finds the largest difference among the sets and sets it as maxDifference; then it stores the differences divided by maxDifference in another data object.
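The normalization step might look like this (a minimal sketch with made-up per-edge differences, not the actual edgeScorer code):

```r
# Differences between intervention and no-intervention measurements, per edge
diffs <- c(AKT = 4.2, ERK = -1.1, MEK = 2.0)

maxDifference <- max(abs(diffs))      # largest difference among the sets
scores <- abs(diffs) / maxDifference  # confidence scores, all in [0, 1]
```

Dividing by the largest absolute difference guarantees the strongest edge gets score 1 and every other edge falls between 0 and 1, as the challenge requires.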
Blog
Determining Edges More Progress on Network Inference
Lately, I have been writing an R script to infer a network from the in silico data. The last version of the script read the MIDAS file and plotted expression profiles. I have modified it, and now it reads the MIDAS file, performs some analyses and prints causal relations to a file. This file is a SIF file, as required.
This dataset was generated with 20 antibodies, but only 3 of them are perturbed. Also, for one of them, the stimulus is missing.
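SIF is a simple line-based format, one “source relation target” triple per line, so printing the relations only takes a paste and writeLines. The node names and the 1/-1 sign convention below are just for illustration:

```r
edges <- data.frame(
  from     = c("AKT", "MEK"),
  relation = c("1", "-1"),   # e.g. 1 = activating, -1 = inhibiting
  to       = c("GSK3", "ERK")
)

sif <- tempfile(fileext = ".sif")
writeLines(paste(edges$from, edges$relation, edges$to), sif)
# file contents: "AKT 1 GSK3" and "MEK -1 ERK"
```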
Blog
Plotting Expression Profiles Data Analysis for Network Inference
For network inference on the in silico data, I decided to develop my own script, because the existing tools have bugs and are not compatible with the data. At the same time, I will try to report the bugs and compatibility issues to the developers.
The in silico data has 660 experiment results covering 20 antibodies, 4 kinds of stimuli and 3 kinds of inhibitors. Antibodies are treated with a stimulus, say at t_0; in the case of inhibitors, say at t_i, antibodies are pre-incubated for some time (t_pre) and then treated with a stimulus.
Tag: integration
Blog
Latest Progress on Network Inference and Edge Scoring
I have slightly improved the network inference part of the script by changing the way the intervention (inhibitor plus stimulus) and no-intervention (stimulus only) data from the in silico part are compared.
Now I’m using a function (simp) from an R package called StreamMetabolism, which takes time points and data values and calculates the area under the curve by integration (Sefick, 2009). I do this integration for both conditions and then compare them.
Tag: scoring edges
Blog
Latest Progress on Network Inference and Edge Scoring
I have slightly improved the network inference part of the script by changing the way the intervention (inhibitor plus stimulus) and no-intervention (stimulus only) data from the in silico part are compared.
Now I’m using a function (simp) from an R package called StreamMetabolism, which takes time points and data values and calculates the area under the curve by integration (Sefick, 2009). I do this integration for both conditions and then compare them.
Blog
Scoring Edges Network Inference HPN-DREAM Challenge
Yesterday, I managed to infer a network for part of the in silico data from the challenge. Since the challenge also asks for scores on the edges of the networks, I developed the script further and added a function for that.
The edgeScorer function takes a data object of averaged time points for each curve in the intervention/no-intervention sets and scores each edge for each set of conditions. First, it finds the largest difference among the sets and sets it as maxDifference; then it stores the differences divided by maxDifference in another data object.
Tag: antibody
Blog
Determining Edges More Progress on Network Inference
Lately, I have been writing an R script to infer a network from the in silico data. The last version of the script read the MIDAS file and plotted expression profiles. I have modified it, and now it reads the MIDAS file, performs some analyses and prints causal relations to a file. This file is a SIF file, as required.
This dataset was generated with 20 antibodies, but only 3 of them are perturbed. Also, for one of them, the stimulus is missing.
Tag: excitatory
Blog
Determining Edges More Progress on Network Inference
Lately, I have been writing an R script to infer a network from the in silico data. The last version of the script read the MIDAS file and plotted expression profiles. I have modified it, and now it reads the MIDAS file, performs some analyses and prints causal relations to a file. This file is a SIF file, as required.
This dataset was generated with 20 antibodies, but only 3 of them are perturbed. Also, for one of them, the stimulus is missing.
Tag: inhibitory
Blog
Determining Edges More Progress on Network Inference
Lately, I have been writing an R script to infer a network from the in silico data. The last version of the script read the MIDAS file and plotted expression profiles. I have modified it, and now it reads the MIDAS file, performs some analyses and prints causal relations to a file. This file is a SIF file, as required.
This dataset was generated with 20 antibodies, but only 3 of them are perturbed. Also, for one of them, the stimulus is missing.
Tag: midas
Blog
Determining Edges More Progress on Network Inference
Lately, I have been writing an R script to infer a network from the in silico data. The last version of the script read the MIDAS file and plotted expression profiles. I have modified it, and now it reads the MIDAS file, performs some analyses and prints causal relations to a file. This file is a SIF file, as required.
This dataset was generated with 20 antibodies, but only 3 of them are perturbed. Also, for one of them, the stimulus is missing.
Blog
Plotting Expression Profiles Data Analysis for Network Inference
For network inference on the in silico data, I decided to develop my own script, because the existing tools have bugs and are not compatible with the data. At the same time, I will try to report the bugs and compatibility issues to the developers.
The in silico data has 660 experiment results covering 20 antibodies, 4 kinds of stimuli and 3 kinds of inhibitors. Antibodies are treated with a stimulus, say at t_0; in the case of inhibitors, say at t_i, antibodies are pre-incubated for some time (t_pre) and then treated with a stimulus.
Blog
Network Inference Challenge in silico Data
I had a meeting with BiGCaT this week and we discussed the DREAM Breast Cancer Challenge. I presented the challenge along with some approaches I have found for solving the first sub-challenge, network inference. Tina, from BiGCaT, suggested starting with the in silico data, which is much simpler than the breast cancer data. Later, I can apply the methods I develop for the in silico data to the experimental data.
The in silico data contains 20 antibodies, 3 inhibitors and 2 ligand stimuli, with 2 different concentrations of each.
Blog
Playing around with CellNOptR Tool and MIDAS File
With CellNOptR, we will try to construct network models for the challenge. The tool needs two inputs. The first is a special data object called a CNOlist, which stores vectors and matrices of data. The second is a .SIF file containing a prior knowledge network, which can be obtained from pathway databases and analysis tools.
A CNOlist contains the following fields: namesSignals, namesCues, namesStimuli and namesInhibitors, which are vectors storing the names of the measurements.
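Conceptually, those fields can be pictured as a named list (a hand-rolled sketch with made-up names for illustration; CellNOptR actually builds the object from a MIDAS file):

```r
cnolist <- list(
  namesSignals    = c("AKT", "ERK"),   # measured phosphoproteins
  namesCues       = c("EGF", "MEKi"),  # all perturbations applied
  namesStimuli    = c("EGF"),          # cues that are stimuli
  namesInhibitors = c("MEKi")          # cues that are inhibitors
)

cnolist$namesCues  # access a field by name
```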
Tag: webinar
Blog
Webinar on HPN-DREAM Breast Cancer Network Inference Challenge
The DREAM8 organizers are planning a webinar about the HPN-DREAM Breast Cancer Network Inference Challenge on July 19, at 10:30 - 11:30 (PDT / UTC -7). The general setup of the challenge and demo submissions to the leaderboard will be discussed, and questions about the challenge will also be taken during the webinar. The number of participants in the challenge has also been announced: 138.
Registration for the webinar is done using this form. There are a limited number of “seats”, but recordings will be published later.
Tag: bigcat
Blog
Network Inference Challenge in silico Data
I had a meeting with BiGCaT this week and we discussed the DREAM Breast Cancer Challenge. I presented the challenge along with some approaches I have found for solving the first sub-challenge, network inference. Tina, from BiGCaT, suggested starting with the in silico data, which is much simpler than the breast cancer data. Later, I can apply the methods I develop for the in silico data to the experimental data.
The in silico data contains 20 antibodies, 3 inhibitors and 2 ligand stimuli, with 2 different concentrations of each.
Blog
First Impressions and Thoughts on Rosalind Project
Actually, I signed up for Rosalind.info 8 months ago, but I didn’t really play around with it. Last week, though, after learning about it in a BiGCaT science cafe, I was more interested than before, and I just started solving problems.
Each problem gives you a description of the context and of the problem itself, along with a sample input and output. Sometimes there are also hints about the solution. What I did was write a solution that works for the sample and, hopefully, for the problem.
Blog
Dream Challenge
This year, the 8th DREAM Challenge takes place, and I will be working on it as my internship project at BiGCaT, Bioinformatics, UM. The challenge brings scientists together to catalyze the interaction between experiment and theory in the area of cellular network inference and quantitative model building in systems biology (as stated on their webpage).
In this competition, I will work on a specific challenge about network modeling, dynamic response prediction and data visualization.
Tag: cellnoptr
Blog
Network Inference Challenge in silico Data
I had a meeting with BiGCaT this week and we discussed the DREAM Breast Cancer Challenge. I presented the challenge along with some approaches I have found for solving the first sub-challenge, network inference. Tina, from BiGCaT, suggested starting with the in silico data, which is much simpler than the breast cancer data. Later, I can apply the methods I develop for the in silico data to the experimental data.
The in silico data contains 20 antibodies, 3 inhibitors and 2 ligand stimuli, with 2 different concentrations of each.
Blog
Playing around with CellNOptR Tool and MIDAS File
With CellNOptR, we will try to construct network models for the challenge. The tool needs two inputs. The first is a special data object called a CNOlist, which stores vectors and matrices of data. The second is a .SIF file containing a prior knowledge network, which can be obtained from pathway databases and analysis tools.
A CNOlist contains the following fields: namesSignals, namesCues, namesStimuli and namesInhibitors, which are vectors storing the names of the measurements.
Blog
Progress on Network Inference Sub-Challenge
This sub-challenge has several requirements:
- Directed and causal edges on the models (32 models: 4 cell lines × 8 stimuli)
- Edges should be scored (normalized to the range 0 to 1) to show confidence
- Nodes will be the phosphoproteins from the data
- A prior knowledge network (which can be constructed using pathway databases) may be used (and is actually a must for some network inference tools)
The first thing was to look for existing tools.
Blog
Network Inference DREAM Breast Cancer Challenge
The inference of causal edges is described as the change in a node seen after an intervention on another node. If the curves obtained over time overlap (under intervention and no intervention), then there is no relation. Otherwise, we can draw an edge between those nodes, and depending on whether the level goes up or down, the edge will be activating or inhibiting. These causal edges are context-specific, so in data from different cell lines we may find different relations.
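A minimal sketch of that rule in R (the tolerance, sign convention and data values are all made up for illustration):

```r
# Compare a node's curve with and without intervention on another node.
# Overlapping curves -> no edge; otherwise the direction of the shift
# gives the sign (here: level drops under intervention -> activating).
classify_edge <- function(no_int, int, tol = 0.2) {
  d <- mean(no_int - int)           # average gap between the curves
  if (abs(d) < tol) return("none")  # curves overlap: no relation
  if (d > 0) "activating" else "inhibiting"
}

classify_edge(c(1.0, 2.0, 2.5), c(1.0, 1.1, 1.2))  # "activating"
classify_edge(c(1.0, 1.05, 1.1), c(1.0, 1.0, 1.1)) # "none"
```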
Tag: cnolist
Blog
Network Inference Challenge in silico Data
I had a meeting with BiGCaT this week and we discussed the DREAM Breast Cancer Challenge. I presented the challenge along with some approaches I have found for solving the first sub-challenge, network inference. Tina, from BiGCaT, suggested starting with the in silico data, which is much simpler than the breast cancer data. Later, I can apply the methods I develop for the in silico data to the experimental data.
The in silico data contains 20 antibodies, 3 inhibitors and 2 ligand stimuli, with 2 different concentrations of each.
Blog
Playing around with CellNOptR Tool and MIDAS File
With CellNOptR, we will try to construct network models for the challenge. The tool needs two inputs. The first is a special data object called a CNOlist, which stores vectors and matrices of data. The second is a .SIF file containing a prior knowledge network, which can be obtained from pathway databases and analysis tools.
A CNOlist contains the following fields: namesSignals, namesCues, namesStimuli and namesInhibitors, which are vectors storing the names of the measurements.
Tag: cnorfeeder
Blog
Network Inference Challenge in silico Data
I had a meeting with BiGCaT this week and we discussed the DREAM Breast Cancer Challenge. I presented the challenge along with some approaches I have found for solving the first sub-challenge, network inference. Tina, from BiGCaT, suggested starting with the in silico data, which is much simpler than the breast cancer data. Later, I can apply the methods I develop for the in silico data to the experimental data.
The in silico data contains 20 antibodies, 3 inhibitors and 2 ligand stimuli, with 2 different concentrations of each.
Blog
Progress on Network Inference Sub-Challenge
This sub-challenge has several requirements:
- Directed and causal edges on the models (32 models: 4 cell lines × 8 stimuli)
- Edges should be scored (normalized to the range 0 to 1) to show confidence
- Nodes will be the phosphoproteins from the data
- A prior knowledge network (which can be constructed using pathway databases) may be used (and is actually a must for some network inference tools)
The first thing was to look for existing tools.
Tag: ddn
Blog
Network Inference Challenge in silico Data
I had a meeting with BiGCaT this week and we discussed the DREAM Breast Cancer Challenge. I presented the challenge along with some approaches I have found for solving the first sub-challenge, network inference. Tina, from BiGCaT, suggested starting with the in silico data, which is much simpler than the breast cancer data. Later, I can apply the methods I develop for the in silico data to the experimental data.
The in silico data contains 20 antibodies, 3 inhibitors and 2 ligand stimuli, with 2 different concentrations of each.
Tag: ebi
Blog
Network Inference Challenge in silico Data
I had a meeting with BiGCaT this week and we discussed the DREAM Breast Cancer Challenge. I presented the challenge along with some approaches I have found for solving the first sub-challenge, network inference. Tina, from BiGCaT, suggested starting with the in silico data, which is much simpler than the breast cancer data. Later, I can apply the methods I develop for the in silico data to the experimental data.
The in silico data contains 20 antibodies, 3 inhibitors and 2 ligand stimuli, with 2 different concentrations of each.
Tag: european bioinformatics institute
Blog
Network Inference Challenge in silico Data
I had a meeting with BiGCaT this week and we discussed the DREAM Breast Cancer Challenge. I presented the challenge along with some approaches I have found for solving the first sub-challenge, network inference. Tina, from BiGCaT, suggested starting with the in silico data, which is much simpler than the breast cancer data. Later, I can apply the methods I develop for the in silico data to the experimental data.
The in silico data contains 20 antibodies, 3 inhibitors and 2 ligand stimuli, with 2 different concentrations of each.
Tag: makebtables
Blog
Network Inference Challenge in silico Data
I had a meeting with BiGCaT this week and we discussed the DREAM Breast Cancer Challenge. I presented the challenge along with some approaches I have found for solving the first sub-challenge, network inference. Tina, from BiGCaT, suggested starting with the in silico data, which is much simpler than the breast cancer data. Later, I can apply the methods I develop for the in silico data to the experimental data.
The in silico data contains 20 antibodies, 3 inhibitors and 2 ligand stimuli, with 2 different concentrations of each.
Tag: makecnolist
Blog
Network Inference Challenge in silico Data
I had a meeting with BiGCaT this week and we discussed the DREAM Breast Cancer Challenge. I presented the challenge along with some approaches I have found for solving the first sub-challenge, network inference. Tina, from BiGCaT, suggested starting with the in silico data, which is much simpler than the breast cancer data. Later, I can apply the methods I develop for the in silico data to the experimental data.
The in silico data contains 20 antibodies, 3 inhibitors and 2 ligand stimuli, with 2 different concentrations of each.
Blog
Playing around with CellNOptR Tool and MIDAS File
With CellNOptR, we will try to construct network models for the challenge. The tool needs two inputs. The first is a special data object called a CNOlist, which stores vectors and matrices of data. The second is a .SIF file containing a prior knowledge network, which can be obtained from pathway databases and analysis tools.
A CNOlist contains the following fields: namesSignals, namesCues, namesStimuli and namesInhibitors, which are vectors storing the names of the measurements.
Tag: model
Blog
Network Inference Challenge in silico Data
I had a meeting with BiGCaT this week and we discussed the DREAM Breast Cancer Challenge. I presented the challenge along with some approaches I have found for solving the first sub-challenge, network inference. Tina, from BiGCaT, suggested starting with the in silico data, which is much simpler than the breast cancer data. Later, I can apply the methods I develop for the in silico data to the experimental data.
The in silico data contains 20 antibodies, 3 inhibitors and 2 ligand stimuli, with 2 different concentrations of each.
Tag: bioinformatics
Blog
First Impressions and Thoughts on Rosalind Project
Actually, I signed up for Rosalind.info 8 months ago, but I didn’t really play around with it. Last week, though, after learning about it in a BiGCaT science cafe, I was more interested than before, and I just started solving problems.
Each problem gives you a description of the context and of the problem itself, along with a sample input and output. Sometimes there are also hints about the solution. What I did was write a solution that works for the sample and, hopefully, for the problem.
Blog
Using Online Tools for Teaching Bioinformatics
I attended one of the science cafe meetings of the BiGCaT group today, where we discussed the use of online tools for teaching bioinformatics.
Andra Waagmeester (a PhD student from BiGCaT) introduced the Rosalind Project as a teaching tool. The project mainly focuses on bioinformatics problems: the website poses various questions of the kind encountered in any bioinformatics research, and solving them helps you learn bioinformatics.
Blog
HPN-DREAM Breast Cancer Network Inference Challenge
Understanding signaling networks may bring more insight into cancer treatment, because cells respond to their environment by activating these networks, and phosphorylation reactions play important roles in them.
The goal of this challenge is to advance our ability to infer signaling networks and to predict protein phosphorylation dynamics. We are also asked to develop a visualization method for the data.
The dataset provided is extensive and is the result of RPPA (reverse-phase protein array) experiments.
Blog
Dream Challenge
This year, the 8th DREAM Challenge takes place, and I will be working on it as my internship project at BiGCaT, Bioinformatics, UM. The challenge brings scientists together to catalyze the interaction between experiment and theory in the area of cellular network inference and quantitative model building in systems biology (as stated on their webpage).
In this competition, I will work on a specific challenge about network modeling, dynamic response prediction and data visualization.
Blog
Biyoinformatik mi? Yoksa Biyoenformatik mi?
While looking for topics for my posts, I browse the internet as well as books. There are of course plenty of foreign-language resources, and they are sufficient, but when I looked at Turkish resources, the first thing that caught my eye was the different forms of this field’s name.
As you know, in English this field is called bioinformatics. That is quite natural, because in English informatics comes from the word information with the -ics suffix, and that word has a Latin origin. The word came into Turkish as enformatik, from the French informatique, and bilişim has also been proposed as a Turkish equivalent. This French word, of course, shares the same origin as its English counterpart.
Tag: bioinformatics stronghold
Blog
First Impressions and Thoughts on Rosalind Project
Actually, I signed up for Rosalind.info 8 months ago, but I didn’t really play around with it. Last week, though, after learning about it in a BiGCaT science cafe, I was more interested than before, and I just started solving problems.
Each problem gives you a description of the context and of the problem itself, along with a sample input and output. Sometimes there are also hints about the solution. What I did was write a solution that works for the sample and, hopefully, for the problem.
Tag: biology
Blog
First Impressions and Thoughts on Rosalind Project
Actually, I signed up for Rosalind.info 8 months ago, but I didn’t really play around with it. Last week, though, after learning about it in a BiGCaT science cafe, I was more interested than before, and I just started solving problems.
Each problem gives you a description of the context and of the problem itself, along with a sample input and output. Sometimes there are also hints about the solution. What I did was write a solution that works for the sample and, hopefully, for the problem.
Blog
Retrieving Data with AJAX using jQuery, PHP and MySQL
Last semester, I took a course from the Informatics Institute at METU called “Biological Databases and Data Analysis Tools”, where we first learned what a database is and how to run queries on one, along with the technology behind databases. Then we covered the many biological databases and data analysis tools available, including gene, protein and pathway databases, and tools for creating databases.
As a final project, we were asked to create an online tool that can search a database, retrieve the data, and display it in any web browser.
Tag: computer science
Blog
First Impressions and Thoughts on Rosalind Project
Actually, I signed up for Rosalind.info 8 months ago, but I didn’t really play around with it. Last week, though, after learning about it in a BiGCaT science cafe, I was more interested than before, and I just started solving problems.
Each problem gives you a description of the context and of the problem itself, along with a sample input and output. Sometimes there are also hints about the solution. What I did was write a solution that works for the sample and, hopefully, for the problem.
Tag: genetics
Blog
First Impressions and Thoughts on Rosalind Project
Tag: python village
Blog
First Impressions and Thoughts on Rosalind Project
Tag: rosalind
Blog
First Impressions and Thoughts on Rosalind Project
Blog
Using Online Tools for Teaching Bioinformatics
I attended one of the science cafe meetings of the BiGCaT group today, where we discussed the use of online tools for teaching bioinformatics.
Andra Waagmeester (a PhD student from BiGCaT) introduced the Rosalind Project as a teaching tool. The project focuses on bioinformatics problems: the website poses various questions of the kind encountered in real bioinformatics research, and solving them helps you learn bioinformatics.
Tag: eda
Blog
Progress on Network Inference Sub-Challenge
This sub-challenge has several requirements:
- Directed and causal edges in the models (32 models: 4 cell lines × 8 stimuli)
- Edges should be scored (normalized to the range 0-1) to express confidence
- Nodes will be the phosphoproteins from the data
- A prior knowledge network (which can be constructed from pathway databases) may be used (in fact, it is a must for some network inference tools)
The first thing was to look for existing tools.
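The scoring requirement above (confidence values normalized to the 0-1 range) can be met with a simple min-max rescaling. A minimal Python sketch; the edge names and raw scores are hypothetical, not from the challenge data:

```python
def normalize_scores(scores):
    """Min-max normalize raw edge scores into the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:  # all edges equally confident
        return {edge: 1.0 for edge in scores}
    return {edge: (s - lo) / span for edge, s in scores.items()}

# Hypothetical raw confidences for three directed edges
raw = {("EGFR", "AKT"): 1.1, ("AKT", "mTOR"): 3.2, ("mTOR", "S6K"): 2.0}
print(normalize_scores(raw))
```

The highest-confidence edge maps to 1, the lowest to 0, which matches the submission format's requirement of scores between 0 and 1.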
Blog
Network Inference DREAM Breast Cancer Challenge
The inference of a causal edge is described as the change seen in one node after an intervention on another node. If the curves obtained over time overlap (with and without the intervention), there is no relation. Otherwise, we can draw an edge between those nodes, and depending on whether the level goes up or down, the edge is activating or inhibiting. These causal edges are context-specific, so in different cell line data we may find different relations.
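The decision rule just described can be sketched as a toy function. In this Python illustration, the overlap test (a tolerance on the mean shift between the two curves) is my own simplification, not the challenge's actual scoring method:

```python
def causal_edge(control, intervention, tol=0.1):
    """Compare a node's time course without and with an intervention on
    another node. Overlapping curves -> no edge (0); otherwise the sign
    of the shift decides activating (+1) vs inhibiting (-1)."""
    diffs = [i - c for c, i in zip(control, intervention)]
    mean_shift = sum(diffs) / len(diffs)
    if abs(mean_shift) <= tol:            # curves overlap: no relation
        return 0
    return 1 if mean_shift > 0 else -1    # up -> activating, down -> inhibiting

print(causal_edge([1.0, 1.1, 1.0], [1.8, 2.0, 1.9]))  # prints 1
```

Because the comparison is done per cell line, the same node pair can yield different edges in different contexts, as the excerpt notes.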
Tag: kegg
Blog
Progress on Network Inference Sub-Challenge
Tag: microarray
Blog
Progress on Network Inference Sub-Challenge
Tag: pkn
Blog
Progress on Network Inference Sub-Challenge
Tag: wikipathways
Blog
Progress on Network Inference Sub-Challenge
Tag: ajax
Blog
Retrieving Data with AJAX using jQuery, PHP and MySQL
Tag: bind
Blog
Retrieving Data with AJAX using jQuery, PHP and MySQL
Tag: biological databases
Blog
Retrieving Data with AJAX using jQuery, PHP and MySQL
Tag: change
Blog
Retrieving Data with AJAX using jQuery, PHP and MySQL
Tag: jquery
Blog
Retrieving Data with AJAX using jQuery, PHP and MySQL
Tag: keypress
Blog
Retrieving Data with AJAX using jQuery, PHP and MySQL
Tag: pathway
Blog
Retrieving Data with AJAX using jQuery, PHP and MySQL
Tag: protein
Blog
Retrieving Data with AJAX using jQuery, PHP and MySQL
Tag: code academy
Blog
Using Online Tools for Teaching Bioinformatics
Tag: online tools
Blog
Using Online Tools for Teaching Bioinformatics
Tag: teaching
Blog
Using Online Tools for Teaching Bioinformatics
Tag: teaching bioinformatics
Blog
Using Online Tools for Teaching Bioinformatics
Tag: cran
Blog
Network Inference DREAM Breast Cancer Challenge
Tag: ddepn
Blog
Network Inference DREAM Breast Cancer Challenge
Tag: r package
Blog
Network Inference DREAM Breast Cancer Challenge
Tag: rppanalyzer
Blog
Network Inference DREAM Breast Cancer Challenge
Tag: dmso
Blog
DREAM Breast Cancer Sub-challenges
I have been going over the sub-challenges before attempting to solve them. As I mentioned, there are three sub-challenges, and they are connected.
First, using the given data and other possible data sources such as pathway databases, we infer the causal signaling network of the phosphoproteins. There are 4 cell lines and 8 stimuli, making 32 networks in total. Nodes are phosphoproteins, and edges should be directed and causal (activating or inhibiting).
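The bookkeeping for the 32 networks is just the cross product of cell lines and stimuli. A minimal Python sketch; the names below are placeholders, not the challenge's actual cell line or stimulus identifiers:

```python
from itertools import product

cell_lines = ["CL1", "CL2", "CL3", "CL4"]       # 4 placeholder cell lines
stimuli = [f"S{i}" for i in range(1, 9)]        # 8 placeholder stimuli

# One (directed, causal) edge set per (cell line, stimulus) context
networks = {(cl, st): set() for cl, st in product(cell_lines, stimuli)}
print(len(networks))  # prints 32
```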
Blog
HPN-DREAM Breast Cancer Network Inference Challenge
Understanding signaling networks might bring more insight into cancer treatment, because cells respond to their environment by activating these networks, and phosphorylation reactions play important roles in them.
The goal of this challenge is to advance our ability to infer signaling networks and to predict protein phosphorylation dynamics. We are also asked to develop a visualization method for the data.
The dataset provided is extensive and comes from RPPA (reverse-phase protein array) experiments.
Tag: time-course prediction
Blog
DREAM Breast Cancer Sub-challenges
Tag: lysates
Blog
HPN-DREAM Breast Cancer Network Inference Challenge
Tag: phosphorylation
Blog
HPN-DREAM Breast Cancer Network Inference Challenge
Tag: proteomics
Blog
HPN-DREAM Breast Cancer Network Inference Challenge
Tag: reverse-phase protein array
Blog
HPN-DREAM Breast Cancer Network Inference Challenge
Tag: sage
Blog
Dream Challenge
This year, the 8th DREAM Challenge takes place, and I will be working on it as my internship project at BiGCaT, Bioinformatics, UM. The challenge brings scientists together to "catalyze the interaction between experiment and theory in the area of cellular network inference and quantitative model building in systems biology" (as stated on their webpage).
In this competition, I will work on a specific challenge about network modeling, dynamic response prediction, and data visualization.
Tag: synapse
Blog
Dream Challenge
Tag: array
Blog
Performing Batch Searches in SRS
Since the latest version of the analysis script examines far more reads than the previous ones, looking up a name in SRS for every single read was a very time-consuming operation; the last run took 4 days.
To reduce this, I completely rewrote the analysis script. As always, it first takes the hits that pass the threshold, but now I collect their ID numbers directly in an array. Then I join each element of this list with the pipe character into a single string.
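The batching step just described (collect the IDs that pass the threshold, then join them with the pipe character into one query string) looks roughly like this. A Python sketch of the logic with made-up field names; the actual script is written in Perl:

```python
def build_batch_query(hits, threshold):
    """Collect the IDs of hits passing the threshold and join them with
    '|' so a single SRS lookup can replace one lookup per read."""
    ids = [h["id"] for h in hits if h["score"] >= threshold]
    return "|".join(ids)

# Hypothetical hits; only two pass the threshold
hits = [{"id": "AC_000033", "score": 115},
        {"id": "XY_12345", "score": 40},
        {"id": "NC_000913", "score": 200}]
print(build_batch_query(hits, threshold=100))  # prints AC_000033|NC_000913
```

One batched query over the joined string replaces thousands of individual lookups, which is where the multi-day runtime went.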
Tag: getz
Blog
Performing Batch Searches in SRS
Blog
Reading a Command's Output with Perl on Unix, and Regular Expressions
I previously described how I extracted organism names with regular expressions. Here I will describe something similar, but done in Perl with a slightly more specialized technique: a very useful method I needed because I receive the information from the database as output spanning multiple lines. You can surely adapt it for other purposes as well.
This need arose because the honest database, built by the HUSAR group, does not present organism names directly but shows them across several lines. You can see an example of this below.
Blog
Extracting Species Names with Regular Expressions
Since at the end of my project I will show the user the names of possible contaminating organisms (Latin species names), I need to obtain the organism name for every sequence using the accession numbers in the MegaBLAST results. I can do this with another system available on the HUSAR servers, called the Sequence Retrieval System (SRS).
To get the organism name from SRS, it is enough to type the "getz" command on the Unix command line together with the database name, the accession number, and the field I want to retrieve. Below you can find example code that does this job.
Tag: hash
Blog
Performing Batch Searches in SRS
Blog
Parsing MegaBLAST Results
The last step in the pipeline is to examine, with another script, the output produced for the searched sequences. In this step each megablast file is read, and values of parameters such as the name, identity, and overlapping length of the sequences are stored and printed to the screen as needed.
In my project I use a parser called Inslink, included in the HUSAR package, which returns the fields mentioned above to me as an array. All this parser does is read the file and store the values of the requested fields.
I then display these stored values by extending the code, and with a few additional lines I print the meaningful results I need.
Blog
Separating Parsing Results as "Ambiguous"
While examining the results of my searches against various databases, I evaluate them with several threshold values, and I try to make them even more meaningful by separating the hits above or below the chosen thresholds into "ambiguous" or "unique".
I label as "ambiguous" the hits in a megablast file that pass the thresholds but involve more than one distinct organism. If every threshold-passing hit within a single file always belongs to the same organism, I label it "unique".
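The labeling rule can be sketched as a small function. A Python illustration with hypothetical field names; the real scripts parse megablast files in Perl:

```python
def label_file(hits):
    """Label a file's threshold-passing hits 'unique' if they all point
    to one organism, otherwise 'ambiguous'."""
    organisms = {h["organism"] for h in hits}
    return "unique" if len(organisms) <= 1 else "ambiguous"

print(label_file([{"organism": "Mus musculus"},
                  {"organism": "Mus musculus"}]))        # prints unique
print(label_file([{"organism": "Mus musculus"},
                  {"organism": "Escherichia coli"}]))    # prints ambiguous
```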
Tag: megablast
Blog
Performing Batch Searches in SRS
Blog
Parsing MegaBLAST Results
Blog
Evaluating the Quality Line - Quality Filter
To further improve the pipeline that will perform the contaminant analysis and obtain more meaningful results, we decided to add a quality filter to the first steps (while the fastq file is still being processed). By filtering out reads below a certain threshold at that early stage, we will obtain more reliable results.
We will do this quality control by interpreting the 4th line of each read in the fastq file. This 4th line (actually the read's sequencing quality score) is written (encoded) in different ways by different sequencing machines, and the filtering has to be applied after decoding the quality score back from this encoding.
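Decoding the quality line back into numbers means subtracting the encoding offset from each character's ASCII value. A Python sketch assuming the common Phred+33 (Sanger/Illumina 1.8+) encoding; the threshold is an illustrative choice, not the pipeline's actual cutoff:

```python
def mean_quality(quality_line, offset=33):
    """Decode an ASCII-encoded quality string (Phred+offset) and return
    the mean Phred score of the read."""
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores)

def passes_filter(quality_line, threshold=20, offset=33):
    """Keep the read only if its mean quality reaches the threshold."""
    return mean_quality(quality_line, offset) >= threshold

print(passes_filter("IIIIIIII"))  # 'I' is Phred 40 at offset 33 -> prints True
```

Older Illumina machines used a +64 offset, which is why the excerpt notes that the encoding must be identified before filtering; the `offset` parameter covers that case.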
Blog
Separating Parsing Results as "Ambiguous"
Blog
Running MegaBLAST from Multiple Sequence Files
I wrote the script below to fit a technique we devised to make the pipeline's MegaBLAST search faster. What it does is search the databases using the per-read, formatted sequence files, given a starting point and a number of reads.
#!/usr/local/bin/perl

$database = $ARGV[0];
$dir = $ARGV[1];  # directory for sequences
$sp = $ARGV[2];   # starting point
$n = $ARGV[3] + $sp;

while (1) {
    system("blastplus -programname=megablast $dir/read_$sp.seq $database -OUTFILE=read_$sp.megablast -nobatch -d");
    $sp++;
    last if ($sp == $n);
}
Everything here works through really simple programming.
Blog
Running MegaBLAST from a Single FASTA File - Regular Expressions
Below is the Perl script I wrote to run MegaBLAST by reading a FASTA file and to collect the results in a directory, along with its explanation. This script is an important part of the pipeline I am designing. It is the first script I wrote, and it reaches all the reads through a single FASTA file.
#!/usr/local/bin/perl

$database = $ARGV[0];
$fasta = $ARGV[1];  # input file
$sp = $ARGV[2];     # starting point
$n = $ARGV[3] + $sp;

if (!defined($n)) { $n = 12; }  # set default number

open FASTA, $fasta or die $!
Blog
Reading a Command's Output with Perl on Unix, and Regular Expressions
Blog
Extracting Species Names with Regular Expressions
Blog
A MegaBLAST Output's Contents - The RefSeq Database
Below, we see the details of one hit from the file I obtained by searching the test FASTA file against the refseq_genomic database.
>>>>refseq_genomic_complete3: AC_000033_0310 Continuation (311 of 1357) of AC_000033 from base 31000001 (AC_000033 Mus musculus strain mixed chromosome 11, alternate assembly Mm_Celera, whole genome shotgun sequence. 2/2012)

 Length = 110000
 Score = 115 bits (58), Expect = 4e-22
 Identities = 74/79 (93%), Gaps = 2/79 (2%)
 Strand = Plus / Minus

Query: 1     ctctctctgtct-tctctctctctctgtctctctctctttctctctcttctctctctctc 59
             |||||||||||| ||| ||||||||| ||||||||||| |||||||||||||||||||||
Sbjct: 89773 ctctctctgtctgtctttctctctctctctctctctctctctctctcttctctctctctc 89714

Query: 60    tttctctctgccctctctc 78
             ||||||||| |||||||||
Sbjct: 89713 tttctctct-ccctctctc 89696

In the details, it first gives header information about the hit, marked with the >>>> characters.
Blog
Speeding Up the MegaBLAST Search
Lately I have been looking for the quickest and most effective way to run MegaBLAST against different databases, and at the FASTA file creation stage, a really useful method came from my supervisor.
Previously I searched from a single FASTA file containing all the sequences, and this wasted time. Even though the file is opened only once, seeking to the right lines and reading them on every search is a time-consuming operation. We solved this by turning every read in the file into a separate FASTA file.
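The split just described (one FASTA file per read) can be sketched as follows. A Python illustration; the `read_<n>.seq` naming is borrowed from this pipeline's scripts, but the function itself is my own:

```python
import os

def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-sequence FASTA file to its own
    read_<n>.seq file, so later searches open exactly one read."""
    os.makedirs(out_dir, exist_ok=True)
    n = 0
    out = None
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):   # a header starts a new record
                if out:
                    out.close()
                out = open(os.path.join(out_dir, f"read_{n}.seq"), "w")
                n += 1
            if out:
                out.write(line)
    if out:
        out.close()
    return n  # number of per-read files written
```

Each `read_<n>.seq` file can then be handed to the search command directly, instead of re-scanning the combined file for every read.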
Blog
A Command's Runtime per Database - CPU Runtime
When the files and databases you work with are large and you do not have enough computing power, the first thing to measure is how to obtain the result in the most effective way and in the shortest time.
In my project in particular, I am investigating this with different databases and different parameters.
For now I am trying four databases: nrnuc, ensembl_cdna, honest, and refseq_genomic. I will also do this for two different word sizes. The word size is the number of base pairs that MegaBLAST will match exactly while searching. So if I have a sequence of 151 base pairs and the word size is set to 50, the search will look for stretches starting anywhere within those 151 base pairs in which at least 50 consecutive bases match.
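The word-size idea can be made concrete: a read of length L contains L - w + 1 candidate words of length w, and only an exact match of one of them can seed an alignment. A toy Python illustration of that count:

```python
def words(seq, w):
    """All length-w substrings (candidate exact-match seeds) of a read."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

read_length, word_size = 151, 50
print(read_length - word_size + 1)  # prints 102 candidate seed positions
```

A larger word size means fewer, more specific seeds, which is why it trades sensitivity for speed, the trade-off being measured in this post.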
Blog
A FASTQ-to-FASTA Conversion Perl Script
FASTQ and FASTA are file formats containing the same information, except that one has two fewer lines per sequence. The other difference that matters for my project is that MegaBLAST searches can be run directly on the FASTA format. That is why I need to convert the FASTQ format produced by the sequencing machines into FASTA. And this script is the first step of the pipeline.
Actually, since the test sequencing data I received had not been aligned by the person who provided it, I had performed that alignment as a preliminary step.
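The conversion itself only needs to keep the first two lines of each 4-line FASTQ record and rewrite the `@` header as `>`. A Python sketch of the idea; the pipeline's actual converter is a Perl script:

```python
def fastq_to_fasta(fastq_lines):
    """Convert 4-line FASTQ records to 2-line FASTA records, dropping
    the '+' separator line and the quality line."""
    fasta = []
    for i in range(0, len(fastq_lines), 4):
        header, sequence = fastq_lines[i], fastq_lines[i + 1]
        fasta.append(">" + header.lstrip("@"))
        fasta.append(sequence)
    return fasta

record = ["@read1", "ACGTACGT", "+", "IIIIIIII"]
print(fastq_to_fasta(record))  # prints ['>read1', 'ACGTACGT']
```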
Blog
SAM Files - BAM Files - samtools
Actually, the pipeline I need to program will run its analyses directly on the unmapped reads. However, since I could not find such data, and the only data I have contains both mapped and unmapped reads, I first had to get rid of the mapped ones.
As I mentioned before, I do this with the bwa aligner (mapper). After a series of steps, bwa produces a SAM file, but I need a FASTQ file. For this, I will convert the SAM file with samtools into the similar BAM format, and then obtain my FASTQ file with the bam2fastq tool.
Blog
MegaBLAST - A Tool for Finding Similarities Between Sequences
MegaBLAST is part of the BLAST (Basic Local Alignment Search Tool) package included in the HUSAR suite. It is also a variant of BLASTN. MegaBLAST handles long sequences more efficiently than BLASTN and runs much faster, though it is less sensitive. That makes it a very suitable tool for searching large databases for similar sequences.
The program I will write will take a FASTA file containing multiple sequences and run the megablast command. Then, for each read, a .
Blog
Contaminant Analysis Project
As a starting point, I will describe in detail this small project I was given so that I can get used to the tools, the programming language, in short, to bioinformatics.
We know that however hard we try to prevent it, the risk of contaminants is always present in our laboratory work. The more we reduce it the better; we can then also determine its amount and use that for a further evaluation of our results. One method for finding it is DNA analysis: the DNA of the sample you work with is sequenced, and by analyzing this DNA with various programs we can identify the contaminating organisms from their DNA.
Blog
FASTQ Format - The FASTQ File
Today I received the "test" sequence data I will use while building the program. It consists of two FASTQ files, each compressed and yet still around 6 GB in size. Since I certainly do not want to lose too much time, I will use only a portion of one of these files.
My goal is to find the mappable reads in these FASTQ files with the BWA tool and then remove them, and to save the remaining unmapped reads in a language the MegaBLAST tool can understand (the FASTA format).
Since I am building the whole project on a Unix machine, I am learning many commands along the way; I will try to write about them separately later.
Blog
Dizileme Çalışmalarını Kirleten Organizmaları Tespit Etme
Bu yaz stajımda ilk olarak başlayacağım çalışma yavaş yavaş şekilleniyor. Bu çalışmada bir pipeline oluşturup, bunu laboratuvarlarda dizileme (sequencing) örneklerini kirleten organizmaları bulmaya çalışacağım.
Laboratuvarlarda birçok nedenden dolayı örnekler başka organizmalar ya da yabancı DNA tarafından kirlenebiliyor. Bunlar bakteri, maya olabilir ya da bir virüs DNA’sı da olabilir. Siz bir DNA’yı diziledikten sonra onun referansıyla eşleştirme çok az oranda çıkabiliyor. Bu da yabancı DNA’nın olabileceğini gösteriyor. Bir başka neden referans DNA’nın farklı olması da olabilir.
Tag: organism
Blog
Performing Multiple Searches in SRS
Because the latest version of the parsing script examines more reads than its predecessors, looking up a name in SRS for every single read was a very time-consuming operation. So much so that the last run took 4 days.
To cut this down, I rewrote the parsing script completely. As always, it takes the reads that pass the threshold, but now I collect their ID numbers directly in an array. I then join every element of this list with the pipe character to build a single string.
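The joining step can be sketched in a couple of lines (Python here; the original script is Perl, and the accession IDs below are made up):

```python
def build_srs_query(ids):
    """Join accession IDs with the pipe character so a single SRS
    query can cover many reads at once instead of one query per read."""
    return "|".join(ids)

# build_srs_query(["X12345", "Y67890"]) -> "X12345|Y67890"
```

Batching the lookups this way replaces thousands of round trips with one, which is where the multi-day runtime was going.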
Tag: srs
Blog
Performing Multiple Searches in SRS
Because the latest version of the parsing script examines more reads than its predecessors, looking up a name in SRS for every single read was a very time-consuming operation. So much so that the last run took 4 days.
To cut this down, I rewrote the parsing script completely. As always, it takes the reads that pass the threshold, but now I collect their ID numbers directly in an array. I then join every element of this list with the pipe character to build a single string.
Blog
Reading a Command's Output with Perl on Unix, and Regular Expressions
I have already described how I extract organism names with regular expressions. Here I will talk about something similar, but done in Perl with a slightly more specialized technique: a very useful method I needed because I receive the information from the database as output spread over several lines. You can surely adapt it for other purposes as well.
This need arose because the honest database, built by the HUSAR group, does not present organism names directly but displays them over several lines. You can see an example of this below.
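The technique itself, capturing a command's whole output and matching across line boundaries rather than line by line, can be sketched like this. Python stands in for the Perl original, and the "Organism:" field layout is an invented example, not the real honest output:

```python
import re
import subprocess

def organism_from_output(cmd):
    """Capture a command's full multi-line output, then search it in
    one pass so a field name on one line can be paired with its value
    on the following line."""
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    m = re.search(r"Organism:\s*\n\s*(.+)", out)
    return m.group(1).strip() if m else None
```

The point is the single `re.search` over the whole buffer: a line-by-line loop would see "Organism:" and its value in separate iterations and never match them together.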
Blog
Obtaining the Species Name with Regular Expressions
Since at the end of my project I will show the user the names of the possible contaminating organisms (their Latin species names), I need to obtain the organism name for every sequence using the accession numbers in the MegaBLAST results. I can do this with another system available on the HUSAR servers, called the Sequence Retrieval System (SRS).
To get an organism name out of SRS, it is enough to type the "getz" command on the Unix command line together with the database name, the accession number, and the field I want to retrieve. Below you can find example code that does this job.
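As a sketch of the extraction idea only (Python rather than the blog's Perl; the "ORGANISM" field layout is an assumption for illustration, not the real getz/SRS output format):

```python
import re

def species_name(record):
    """Pull a Latin binomial ('Genus species') out of a text record,
    e.g. from a hypothetical line such as 'ORGANISM   Mus musculus'."""
    m = re.search(r"ORGANISM\s+([A-Z][a-z]+\s[a-z]+)", record)
    return m.group(1) if m else None
```

The pattern relies on binomial naming conventions: a capitalized genus followed by a lowercase species epithet.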
Blog
Internship at the DKFZ Heidelberg Bioinformatics Unit
The summer internship I am doing through the Erasmus program has started. I first received a few hours of introductory lectures from some of the scientists who run the unit. In them I learned the unit's brief history, the projects it has carried out to date, and their details.
The Bioinformatics Unit is a group within the Genomics and Proteomics Core Facility, one of the core facilities of the DKFZ (Deutsches Krebsforschungszentrum, in English the German Cancer Research Center). They also go by the name HUSAR (Heidelberg Unix Sequence Analysis Resources), which is likewise the name of the sequence analysis package the group develops.
Tag: emacs
Blog
Parsing MegaBLAST Results
The last stage of the pipeline is to examine the output produced for the searched sequences with another script. In this step every megablast file is read, and the values of parameters such as each hit's name, identity, and overlapping length are stored and printed to the screen according to the purpose at hand.
In my project I use a parser called Inslink, part of the HUSAR package, which returns the fields mentioned above to me as an array. All this parser does is read the file and store the values of the requested fields.
I then display these stored values by extending the code, and with a few extra lines I present the meaningful results I need.
Blog
FASTQ Formatı - FASTQ Dosyası
Bugün programı oluştururken kullanacağım “test” dizilimini aldım. İki adet FASTQ dosyasından oluşuyor, her biri sıkıştırılmış ama buna rağmen boyutları 6 GB civarı. Ben elbette çok zaman kaybetmek istemediğim için bu dosyalardan birinin sadece bir kısmını kullanacağım.
Amacım, bu FASTQ dosyalarındaki eşleşebilen okumaları BWA aracı ile bularak, daha sonra onları çıkarmak. Ve kalan eşleşemeyen okumaları MegaBLAST aracının anlayabileceği bir dilde (FASTA formatında) kaydetmek.
Bu arada tüm projeyi bir Unix bilgisayarda hazırladığım için birçok komut öğreniyorum, daha sonra bunları ayrıca yazmaya çalışacağım.
Tag: parsing
Blog
Parsing MegaBLAST Results
The last stage of the pipeline is to examine the output produced for the searched sequences with another script. In this step every megablast file is read, and the values of parameters such as each hit's name, identity, and overlapping length are stored and printed to the screen according to the purpose at hand.
In my project I use a parser called Inslink, part of the HUSAR package, which returns the fields mentioned above to me as an array. All this parser does is read the file and store the values of the requested fields.
I then display these stored values by extending the code, and with a few extra lines I present the meaningful results I need.
Tag: unix
Blog
Parsing MegaBLAST Results
The last stage of the pipeline is to examine the output produced for the searched sequences with another script. In this step every megablast file is read, and the values of parameters such as each hit's name, identity, and overlapping length are stored and printed to the screen according to the purpose at hand.
In my project I use a parser called Inslink, part of the HUSAR package, which returns the fields mentioned above to me as an array. All this parser does is read the file and store the values of the requested fields.
I then display these stored values by extending the code, and with a few extra lines I present the meaningful results I need.
Blog
Reading a Command's Output with Perl on Unix, and Regular Expressions
I have already described how I extract organism names with regular expressions. Here I will talk about something similar, but done in Perl with a slightly more specialized technique: a very useful method I needed because I receive the information from the database as output spread over several lines. You can surely adapt it for other purposes as well.
This need arose because the honest database, built by the HUSAR group, does not present organism names directly but displays them over several lines. You can see an example of this below.
Blog
Obtaining the Species Name with Regular Expressions
Since at the end of my project I will show the user the names of the possible contaminating organisms (their Latin species names), I need to obtain the organism name for every sequence using the accession numbers in the MegaBLAST results. I can do this with another system available on the HUSAR servers, called the Sequence Retrieval System (SRS).
To get an organism name out of SRS, it is enough to type the "getz" command on the Unix command line together with the database name, the accession number, and the field I want to retrieve. Below you can find example code that does this job.
Blog
FASTQ Format - The FASTQ File
Today I received the "test" sequence data I will use while building the program. It consists of two FASTQ files, each compressed and yet still around 6 GB in size. Since I certainly do not want to lose too much time, I will use only a portion of one of these files.
My goal is to find the mappable reads in these FASTQ files with the BWA tool and then remove them, and to save the remaining unmapped reads in a language the MegaBLAST tool can understand (the FASTA format).
Since I am building the whole project on a Unix machine, I am learning many commands along the way; I will try to write about them separately later.
Tag: fastq
Blog
Evaluating the Quality Line - Quality Filter
To develop the contaminant-analysis pipeline further and obtain more meaningful results, we decided to add a quality filter to the first steps (while the FASTQ file is still being processed). By filtering out reads that fall below a certain threshold at that early stage, we will obtain more reliable results.
We will perform this quality control by interpreting the fourth line of each read in the FASTQ file. This fourth line (actually the read's sequencing quality score) is written (encoded) in different ways by different sequencing machines, and the filtering has to be applied after recovering the quality score from that encoding.
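The decoding step can be sketched as follows (Python stand-in for the pipeline's Perl; the offset of 33 is the common Sanger/Illumina 1.8+ "Phred+33" encoding, while some older machines used Phred+64, so check your instrument):

```python
def mean_phred(quality_line, offset=33):
    """Decode a FASTQ quality line back into numeric Phred scores
    (ASCII code minus the encoding offset) and return their mean."""
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores)

def passes_filter(quality_line, threshold=20, offset=33):
    """Keep a read only if its mean quality reaches the threshold."""
    return mean_phred(quality_line, offset) >= threshold
```

For example, a quality line of all 'I' characters decodes to Phred 40 under Phred+33, while all '!' characters decode to Phred 0 and would be filtered out.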
Blog
Fourth Test Dataset: The Mus Musculus Genome
All three test datasets so far came from the human genome. I tried the pipeline on these genomes and made improvements here and there. Now I will also try it on other organisms, gather more results, examine them, and again make whatever improvements are needed.
This first different dataset comes from a mouse. This organism, with the species name Mus Musculus and the common name house mouse, is another organism that is sequenced frequently because it is used in studies as a model organism.
I received several sequence files in BAM format from the laboratory I work with, which performed this sequencing.
Blog
Examining the New Dataset
Because the previous data I used for testing while designing the pipeline was so poor, I obtained a new dataset. Of course, it is useful to work with several datasets of different characteristics during the testing phase. But I can say the previous dataset was too poor to yield even a few meaningful results. You can look at the details [here]({% post_url 2012-07-06-eslestirme-ve-eslesmeyen-okumalari %}).
The new dataset is again human genome data; its BAM file is 1.8 GB and contained both mappable and unmappable reads. Using the bam2fastq tool, I converted this BAM file into a FASTQ file while also weeding out the mappable reads, leaving 0.
Blog
Running MegaBLAST from a Single FASTA File - Regular Expressions
Below is the Perl script I wrote to run MegaBLAST by reading a FASTA file and to collect the results in a directory, together with its explanation. This script is an important part of the pipeline I am designing. It is also the first script I wrote, the one that reaches all the reads through a single FASTA file.
#!/usr/bin/perl
$database = $ARGV[0];
$fasta = $ARGV[1];  # input file
$sp = $ARGV[2];     # starting point
$n = $ARGV[3] + $sp;

if (!defined($ARGV[3])) { $n = 12; }  # set default number

open FASTA, $fasta or die $!
Blog
FASTQ to FASTA Conversion Perl Script
FASTQ and FASTA are file formats that carry essentially the same information, except that one of them simply has two fewer lines per sequence. Another difference that matters for my project is that the FASTA format can be searched by MegaBLAST directly. That is why I need to convert the FASTQ format produced by sequencing machines into FASTA. And this script is the first step of the pipeline.
Actually, because the test sequence data I received had not been aligned by the party that delivered it to me, I had performed this alignment as a preliminary step.
Blog
Mapping and Unmapped-Read Extraction Results
Previously I was working with only a part of the data, but from now on I will work with all of it. So I extracted the compressed data I was sent directly into my working directory and ran the operations there.
My starting (FASTQ) file is 2153988289 bytes (2 GB) in size. Mapping with bwa produced a total of 6004193 sequences, or reads. After I then extracted the unmapped reads, the total read count dropped by 551065, to 5493128. In other words, 9.
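The percentage cut off above can be recomputed from the counts given (a quick sanity check, not a figure from the original post):

```python
# counts reported above
total_reads = 6004193
removed_reads = 551065

pct = 100 * removed_reads / total_reads
print(f"{pct:.1f}%")  # roughly 9.2% of the reads were removed
```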
Blog
Mapping (Alignment) with BWA
I forgot to write this down earlier. I had actually mentioned it, but I had not written anything about how it is done, nor added any example commands.
BWA takes our DNA sequence data (in FASTQ format) and the reference genome (in my project, the human genome) and produces a .sai file. This file carries information about the alignment between the sequence and the reference genome, and using this information I can separate out the unmapped reads.
First, we create our .sai file with the command below.
bwa aln $NGSDATAROOT/bwa/human_genome37 ChIP_NoIndex_L001_R1_complete_filtered.fastq > complete_alignment.sai

The .sai file we have created is not very useful on its own, so we convert it into a SAM file and carry on with the work from there.
Blog
SAM File - BAM File - samtools
The pipeline I am supposed to program will actually run its analyses directly on unmapped reads. But since I could not find such data, and the only data I have contains both mapped and unmapped reads, I first had to get rid of the mapped ones.
As I mentioned before, I do this with the bwa aligner (mapper). After a series of steps bwa produces a SAM file, but what I need is a FASTQ file. For that, I convert the SAM file with samtools into the similar BAM format, and then obtain my FASTQ file with the bam2fastq tool.
Blog
First Step: Obtaining the Unmapped Reads
As I mentioned before, the first part of my project is to extract the unmapped reads from the FASTQ file. This way I remove the sequences I will not need for the later analyses and reduce the workload of those analyses.
Since my goal from the start has been to design a pipeline that carries out the whole project step by step, I will do this with a Perl script. This script is the pipeline's first script, the place where the raw FASTQ data coming from the laboratory is used as input. Strictly speaking I will not need this script; I only added the step because my current data also contains mappable reads.
Blog
Contaminant Analysis Project
As a starting point, I will describe in detail this small project I was given so that I can get used to the tools, the programming language, in short, to bioinformatics.
We know that however hard we try to prevent it, the risk of contaminants is always present in our laboratory work. The more we reduce it the better; we can then also determine its amount and use that for a further evaluation of our results. One method for finding it is DNA analysis: the DNA of the sample you work with is sequenced, and by analyzing this DNA with various programs we can identify the contaminating organisms from their DNA.
Blog
FASTQ Format - The FASTQ File
Today I received the "test" sequence data I will use while building the program. It consists of two FASTQ files, each compressed and yet still around 6 GB in size. Since I certainly do not want to lose too much time, I will use only a portion of one of these files.
My goal is to find the mappable reads in these FASTQ files with the BWA tool and then remove them, and to save the remaining unmapped reads in a language the MegaBLAST tool can understand (the FASTA format).
Since I am building the whole project on a Unix machine, I am learning many commands along the way; I will try to write about them separately later.
Tag: fastq quality filter
Blog
Evaluating the Quality Line - Quality Filter
To develop the contaminant-analysis pipeline further and obtain more meaningful results, we decided to add a quality filter to the first steps (while the FASTQ file is still being processed). By filtering out reads that fall below a certain threshold at that early stage, we will obtain more reliable results.
We will perform this quality control by interpreting the fourth line of each read in the FASTQ file. This fourth line (actually the read's sequencing quality score) is written (encoded) in different ways by different sequencing machines, and the filtering has to be applied after recovering the quality score from that encoding.
Tag: fastqc
Blog
Evaluating the Quality Line - Quality Filter
To develop the contaminant-analysis pipeline further and obtain more meaningful results, we decided to add a quality filter to the first steps (while the FASTQ file is still being processed). By filtering out reads that fall below a certain threshold at that early stage, we will obtain more reliable results.
We will perform this quality control by interpreting the fourth line of each read in the FASTQ file. This fourth line (actually the read's sequencing quality score) is written (encoded) in different ways by different sequencing machines, and the filtering has to be applied after recovering the quality score from that encoding.
Tag: fastx toolkit
Blog
Evaluating the Quality Line - Quality Filter
To develop the contaminant-analysis pipeline further and obtain more meaningful results, we decided to add a quality filter to the first steps (while the FASTQ file is still being processed). By filtering out reads that fall below a certain threshold at that early stage, we will obtain more reliable results.
We will perform this quality control by interpreting the fourth line of each read in the FASTQ file. This fourth line (actually the read's sequencing quality score) is written (encoded) in different ways by different sequencing machines, and the filtering has to be applied after recovering the quality score from that encoding.
Tag: filter
Blog
Evaluating the Quality Line - Quality Filter
To develop the contaminant-analysis pipeline further and obtain more meaningful results, we decided to add a quality filter to the first steps (while the FASTQ file is still being processed). By filtering out reads that fall below a certain threshold at that early stage, we will obtain more reliable results.
We will perform this quality control by interpreting the fourth line of each read in the FASTQ file. This fourth line (actually the read's sequencing quality score) is written (encoded) in different ways by different sequencing machines, and the filtering has to be applied after recovering the quality score from that encoding.
Tag: quality
Blog
Evaluating the Quality Line - Quality Filter
To develop the contaminant-analysis pipeline further and obtain more meaningful results, we decided to add a quality filter to the first steps (while the FASTQ file is still being processed). By filtering out reads that fall below a certain threshold at that early stage, we will obtain more reliable results.
We will perform this quality control by interpreting the fourth line of each read in the FASTQ file. This fourth line (actually the read's sequencing quality score) is written (encoded) in different ways by different sequencing machines, and the filtering has to be applied after recovering the quality score from that encoding.
Tag: pipeline
Blog
Evaluating the Quality Line - Quality Filter
To develop the contaminant-analysis pipeline further and obtain more meaningful results, we decided to add a quality filter to the first steps (while the FASTQ file is still being processed). By filtering out reads that fall below a certain threshold at that early stage, we will obtain more reliable results.
We will perform this quality control by interpreting the fourth line of each read in the FASTQ file. This fourth line (actually the read's sequencing quality score) is written (encoded) in different ways by different sequencing machines, and the filtering has to be applied after recovering the quality score from that encoding.
Blog
Fourth Test Dataset: The Mus Musculus Genome
All three test datasets so far came from the human genome. I tried the pipeline on these genomes and made improvements here and there. Now I will also try it on other organisms, gather more results, examine them, and again make whatever improvements are needed.
This first different dataset comes from a mouse. This organism, with the species name Mus Musculus and the common name house mouse, is another organism that is sequenced frequently because it is used in studies as a model organism.
I received several sequence files in BAM format from the laboratory I work with, which performed this sequencing.
Blog
Examining the New Dataset
Because the previous data I used for testing while designing the pipeline was so poor, I obtained a new dataset. Of course, it is useful to work with several datasets of different characteristics during the testing phase. But I can say the previous dataset was too poor to yield even a few meaningful results. You can look at the details [here]({% post_url 2012-07-06-eslestirme-ve-eslesmeyen-okumalari %}).
The new dataset is again human genome data; its BAM file is 1.8 GB and contained both mappable and unmappable reads. Using the bam2fastq tool, I converted this BAM file into a FASTQ file while also weeding out the mappable reads, leaving 0.
Blog
Detecting the Organisms That Contaminate Sequencing Studies
The first study I will take on during my summer internship is slowly taking shape. In this study I will build a pipeline and use it to try to find the organisms that contaminate sequencing samples in laboratories.
In laboratories, samples can become contaminated with other organisms or foreign DNA for many reasons. These can be bacteria or yeast, or even viral DNA. After you sequence a DNA sample, its alignment rate against the reference may turn out very low, which indicates that foreign DNA may be present. Another possible cause is that the reference DNA itself is different.
Blog
Pipeline and Pipeline Development
Continuing the introductory lectures I received today, I was given detailed information about pipelines and pipeline development. A pipeline is literally just that, a pipe line, such as the system of pipes used to carry oil from one place to another. In computing terminology it means a chain of processing elements arranged so that the output of one element becomes the input of the next. Much more complicated jobs can thus be carried out easily and in an orderly way by building a pipeline. I believe pipeline is rendered in Turkish as "ardisik duzen", but I will keep using "pipeline".
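The definition above, each element's output feeding the next element's input, can be sketched in a few lines (Python; the two stages are invented toy examples, not steps from the actual project):

```python
def stage_uppercase(text):
    """First processing element: normalize a DNA string."""
    return text.upper()

def stage_count_gc(text):
    """Second element: consume the first element's output
    and count G and C bases."""
    return text.count("G") + text.count("C")

def run_pipeline(data, stages):
    """Feed each stage's output into the next stage, in order."""
    for stage in stages:
        data = stage(data)
    return data
```

For instance, `run_pipeline("acgtgg", [stage_uppercase, stage_count_gc])` first normalizes the string and then counts its G/C bases; swapping, adding, or removing stages changes the pipeline without touching the stages themselves.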
Tag: prinseq
Blog
Evaluating the Quality Line - Quality Filter
To develop the contaminant-analysis pipeline further and obtain more meaningful results, we decided to add a quality filter to the first steps (while the FASTQ file is still being processed). By filtering out reads that fall below a certain threshold at that early stage, we will obtain more reliable results.
We will perform this quality control by interpreting the fourth line of each read in the FASTQ file. This fourth line (actually the read's sequencing quality score) is written (encoded) in different ways by different sequencing machines, and the filtering has to be applied after recovering the quality score from that encoding.
Tag: quality
Blog
Evaluating the Quality Line - Quality Filter
To develop the contaminant-analysis pipeline further and obtain more meaningful results, we decided to add a quality filter to the first steps (while the FASTQ file is still being processed). By filtering out reads that fall below a certain threshold at that early stage, we will obtain more reliable results.
We will perform this quality control by interpreting the fourth line of each read in the FASTQ file. This fourth line (actually the read's sequencing quality score) is written (encoded) in different ways by different sequencing machines, and the filtering has to be applied after recovering the quality score from that encoding.
Tag: quality filter
Blog
Evaluating the Quality Line - Quality Filter
To develop the contaminant-analysis pipeline further and obtain more meaningful results, we decided to add a quality filter to the first steps (while the FASTQ file is still being processed). By filtering out reads that fall below a certain threshold at that early stage, we will obtain more reliable results.
We will perform this quality control by interpreting the fourth line of each read in the FASTQ file. This fourth line (actually the read's sequencing quality score) is written (encoded) in different ways by different sequencing machines, and the filtering has to be applied after recovering the quality score from that encoding.
Tag: bam
Blog
Fourth Test Dataset: The Mus Musculus Genome
All three test datasets so far came from the human genome. I tried the pipeline on these genomes and made improvements here and there. Now I will also try it on other organisms, gather more results, examine them, and again make whatever improvements are needed.
This first different dataset comes from a mouse. This organism, with the species name Mus Musculus and the common name house mouse, is another organism that is sequenced frequently because it is used in studies as a model organism.
I received several sequence files in BAM format from the laboratory I work with, which performed this sequencing.
Blog
Examining the New Dataset
Because the previous data I used for testing while designing the pipeline was so poor, I obtained a new dataset. Of course, it is useful to work with several datasets of different characteristics during the testing phase. But I can say the previous dataset was too poor to yield even a few meaningful results. You can look at the details [here]({% post_url 2012-07-06-eslestirme-ve-eslesmeyen-okumalari %}).
The new dataset is again human genome data; its BAM file is 1.8 GB and contained both mappable and unmappable reads. Using the bam2fastq tool, I converted this BAM file into a FASTQ file while also weeding out the mappable reads, leaving 0.
Blog
SAM File - BAM File - samtools
The pipeline I am supposed to program will actually run its analyses directly on unmapped reads. But since I could not find such data, and the only data I have contains both mapped and unmapped reads, I first had to get rid of the mapped ones.
As I mentioned before, I do this with the bwa aligner (mapper). After a series of steps bwa produces a SAM file, but what I need is a FASTQ file. For that, I convert the SAM file with samtools into the similar BAM format, and then obtain my FASTQ file with the bam2fastq tool.
Tag: bam2fastq
Blog
Fourth Test Dataset: The Mus Musculus Genome
All three test datasets so far came from the human genome. I tried the pipeline on these genomes and made improvements here and there. Now I will also try it on other organisms, gather more results, examine them, and again make whatever improvements are needed.
This first different dataset comes from a mouse. This organism, with the species name Mus Musculus and the common name house mouse, is another organism that is sequenced frequently because it is used in studies as a model organism.
I received several sequence files in BAM format from the laboratory I work with, which performed this sequencing.
Blog
SAM File - BAM File - samtools
The pipeline I am supposed to program will actually run its analyses directly on unmapped reads. But since I could not find such data, and the only data I have contains both mapped and unmapped reads, I first had to get rid of the mapped ones.
As I mentioned before, I do this with the bwa aligner (mapper). After a series of steps bwa produces a SAM file, but what I need is a FASTQ file. For that, I convert the SAM file with samtools into the similar BAM format, and then obtain my FASTQ file with the bam2fastq tool.
Tag: unmapped read
Blog
Fourth Test Dataset: The Mus Musculus Genome
All three test datasets so far came from the human genome. I tried the pipeline on these genomes and made improvements here and there. Now I will also try it on other organisms, gather more results, examine them, and again make whatever improvements are needed.
This first different dataset comes from a mouse. This organism, with the species name Mus Musculus and the common name house mouse, is another organism that is sequenced frequently because it is used in studies as a model organism.
I received several sequence files in BAM format from the laboratory I work with, which performed this sequencing.
Blog
Mapping and Unmapped-Read Extraction Results
Previously I was working with only a part of the data, but from now on I will work with all of it. So I extracted the compressed data I was sent directly into my working directory and ran the operations there.
My starting (FASTQ) file is 2153988289 bytes (2 GB) in size. Mapping with bwa produced a total of 6004193 sequences, or reads. After I then extracted the unmapped reads, the total read count dropped by 551065, to 5493128. In other words, 9.
Blog
SAM File - BAM File - samtools
The pipeline I am supposed to program will actually run its analyses directly on unmapped reads. But since I could not find such data, and the only data I have contains both mapped and unmapped reads, I first had to get rid of the mapped ones.
As I mentioned before, I do this with the bwa aligner (mapper). After a series of steps bwa produces a SAM file, but what I need is a FASTQ file. For that, I convert the SAM file with samtools into the similar BAM format, and then obtain my FASTQ file with the bam2fastq tool.
Blog
First Step: Obtaining the Unmapped Reads
As I mentioned before, the first part of my project is to extract the unmapped reads from the FASTQ file. This way I remove the sequences I will not need for the later analyses and reduce the workload of those analyses.
Since my goal from the start has been to design a pipeline that carries out the whole project step by step, I will do this with a Perl script. This script is the pipeline's first script, the place where the raw FASTQ data coming from the laboratory is used as input. Strictly speaking I will not need this script; I only added the step because my current data also contains mappable reads.
Blog
Detecting the Organisms That Contaminate Sequencing Studies
The study I will start with in my summer internship is slowly taking shape. In this study I will build a pipeline and use it to try to find the organisms that contaminate sequencing samples in laboratories.
In laboratories, samples can be contaminated by other organisms or foreign DNA for many reasons. These can be bacteria or yeast, or even viral DNA. After you sequence a DNA sample, its alignment rate against its reference can come out very low. This indicates that foreign DNA may be present. Another cause can also be that the reference DNA is different.
Tag: ev faresi
Blog
Fourth Test Dataset: The Mus Musculus Genome
So far, all three test datasets have belonged to the human genome. I tried the pipeline on these genomes and made improvements here and there. Now I will try it with other organisms as well, collect more results, examine them, and again make the necessary improvements.
This first different dataset comes from the mouse. This organism, carrying the species name Mus musculus and the common name house mouse, is another organism whose genome is sequenced frequently, since it is also used as a model organism in research.
I received various sequence files in BAM format from the laboratory I work with, which performed the sequencing.
Tag: insan
Blog
Fourth Test Dataset: The Mus Musculus Genome
So far, all three test datasets have belonged to the human genome. I tried the pipeline on these genomes and made improvements here and there. Now I will try it with other organisms as well, collect more results, examine them, and again make the necessary improvements.
This first different dataset comes from the mouse. This organism, carrying the species name Mus musculus and the common name house mouse, is another organism whose genome is sequenced frequently, since it is also used as a model organism in research.
I received various sequence files in BAM format from the laboratory I work with, which performed the sequencing.
Tag: insan genomu
Blog
Fourth Test Dataset: The Mus Musculus Genome
So far, all three test datasets have belonged to the human genome. I tried the pipeline on these genomes and made improvements here and there. Now I will try it with other organisms as well, collect more results, examine them, and again make the necessary improvements.
This first different dataset comes from the mouse. This organism, carrying the species name Mus musculus and the common name house mouse, is another organism whose genome is sequenced frequently, since it is also used as a model organism in research.
I received various sequence files in BAM format from the laboratory I work with, which performed the sequencing.
Blog
Mapping (Alignment) with BWA
I had forgotten to write about this before. I did mention it, but I never wrote anything about how it is actually done, nor did I add any example commands.
BWA takes the DNA sequences we have (in FASTQ format) and the reference genome (in my project, the human genome) and creates a .sai file. This file carries information about the alignment between the sequences and the reference genome, and using this information I can separate out the reads that do not map.
First, we create our .sai file with the command below.

bwa aln $NGSDATAROOT/bwa/human_genome37 ChIP_NoIndex_L001_R1_complete_filtered.fastq > complete_alignment.sai

The .sai file we created is not very useful on its own, so we convert it into a SAM file and continue working with that.
Blog
BWA (Burrows-Wheeler Aligner) - Aligner / Mapper
As I noted in my previous post, I will try to find out, using an aligner (or mapper), to what extent the data I have maps to the reference genome. Afterwards I will run a number of analyses on the part that does not map.
BWA (Burrows-Wheeler Aligner) is a program that aligns relatively short sequences to long reference genomes such as the human genome. For sequences up to 200 bp (bp: base pairs) the bwa-short algorithm is used; between 200 bp and 100 kbp, the BWA-SW algorithm.
Many factors play a role in choosing an aligner/mapper. There are many tools of this kind, and they each have different features.
Blog
What Is Bioinformatics? A Definition of Bioinformatics
With the sequencing of many organisms — and finally, in 2001, of the human genome, with the sequence of all 3 billion base pairs obtained — fields emerged that would use this information in different ways. Alongside the fields that try to understand these genes and to determine the proteins they will produce, the need to analyze this information gave birth to the field of Bioinformatics.
Bioinformatics is the analysis of biological information using computers and statistical techniques; in other words, bioinformatics is the science of developing and making use of computer databases and algorithms to improve and speed up biological research [1].
Tag: mus musculus
Blog
Fourth Test Dataset: The Mus Musculus Genome
So far, all three test datasets have belonged to the human genome. I tried the pipeline on these genomes and made improvements here and there. Now I will try it with other organisms as well, collect more results, examine them, and again make the necessary improvements.
This first different dataset comes from the mouse. This organism, carrying the species name Mus musculus and the common name house mouse, is another organism whose genome is sequenced frequently, since it is also used as a model organism in research.
I received various sequence files in BAM format from the laboratory I work with, which performed the sequencing.
Tag: ambiguous hit
Blog
Separating Analysis Results as "Ambiguous"
While examining the results of my searches against various databases, I evaluate them against several threshold values, and I also try to make them even more meaningful by separating the hits above or below those thresholds into “Ambiguous” and “Unique”.
As “Ambiguous”, I label the hits in a given megablast file that satisfy the thresholds but involve more than one distinct organism. If every hit satisfying the thresholds within a single file always belongs to the same organism, then I label it “unique”.
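The labeling rule above fits in a few lines of code. The pipeline itself is written in Perl; as an illustration, here is a Python sketch of the same rule, where the input — a list of organism names for the hits in one megablast file that already passed the thresholds — is an assumed data layout, not the post's actual structure.

```python
def label_hits(organisms_passing_threshold):
    """Label one megablast result file: 'unique' if every hit that passes
    the thresholds names the same organism, 'ambiguous' if more than one
    distinct organism appears among them."""
    distinct = set(organisms_passing_threshold)
    return "unique" if len(distinct) == 1 else "ambiguous"

print(label_hits(["Homo sapiens", "Homo sapiens"]))     # all hits agree
print(label_hits(["Homo sapiens", "Pan troglodytes"]))  # two organisms
```

Run over every result file, this yields exactly the "Ambiguous hit" counts that appear in the organism tables elsewhere on the blog.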
Tag: anahtar
Blog
Separating Analysis Results as "Ambiguous"
While examining the results of my searches against various databases, I evaluate them against several threshold values, and I also try to make them even more meaningful by separating the hits above or below those thresholds into “Ambiguous” and “Unique”.
As “Ambiguous”, I label the hits in a given megablast file that satisfy the thresholds but involve more than one distinct organism. If every hit satisfying the thresholds within a single file always belongs to the same organism, then I label it “unique”.
Tag: belirsiz
Blog
Separating Analysis Results as "Ambiguous"
While examining the results of my searches against various databases, I evaluate them against several threshold values, and I also try to make them even more meaningful by separating the hits above or below those thresholds into “Ambiguous” and “Unique”.
As “Ambiguous”, I label the hits in a given megablast file that satisfy the thresholds but involve more than one distinct organism. If every hit satisfying the thresholds within a single file always belongs to the same organism, then I label it “unique”.
Tag: çok anlamlı
Blog
Separating Analysis Results as "Ambiguous"
While examining the results of my searches against various databases, I evaluate them against several threshold values, and I also try to make them even more meaningful by separating the hits above or below those thresholds into “Ambiguous” and “Unique”.
As “Ambiguous”, I label the hits in a given megablast file that satisfy the thresholds but involve more than one distinct organism. If every hit satisfying the thresholds within a single file always belongs to the same organism, then I label it “unique”.
Tag: değer
Blog
Separating Analysis Results as "Ambiguous"
While examining the results of my searches against various databases, I evaluate them against several threshold values, and I also try to make them even more meaningful by separating the hits above or below those thresholds into “Ambiguous” and “Unique”.
As “Ambiguous”, I label the hits in a given megablast file that satisfy the thresholds but involve more than one distinct organism. If every hit satisfying the thresholds within a single file always belongs to the same organism, then I label it “unique”.
Tag: eşsiz
Blog
Separating Analysis Results as "Ambiguous"
While examining the results of my searches against various databases, I evaluate them against several threshold values, and I also try to make them even more meaningful by separating the hits above or below those thresholds into “Ambiguous” and “Unique”.
As “Ambiguous”, I label the hits in a given megablast file that satisfy the thresholds but involve more than one distinct organism. If every hit satisfying the thresholds within a single file always belongs to the same organism, then I label it “unique”.
Tag: key
Blog
Separating Analysis Results as "Ambiguous"
While examining the results of my searches against various databases, I evaluate them against several threshold values, and I also try to make them even more meaningful by separating the hits above or below those thresholds into “Ambiguous” and “Unique”.
As “Ambiguous”, I label the hits in a given megablast file that satisfy the thresholds but involve more than one distinct organism. If every hit satisfying the thresholds within a single file always belongs to the same organism, then I label it “unique”.
Tag: tek
Blog
Separating Analysis Results as "Ambiguous"
While examining the results of my searches against various databases, I evaluate them against several threshold values, and I also try to make them even more meaningful by separating the hits above or below those thresholds into “Ambiguous” and “Unique”.
As “Ambiguous”, I label the hits in a given megablast file that satisfy the thresholds but involve more than one distinct organism. If every hit satisfying the thresholds within a single file always belongs to the same organism, then I label it “unique”.
Tag: unique hit
Blog
Separating Analysis Results as "Ambiguous"
While examining the results of my searches against various databases, I evaluate them against several threshold values, and I also try to make them even more meaningful by separating the hits above or below those thresholds into “Ambiguous” and “Unique”.
As “Ambiguous”, I label the hits in a given megablast file that satisfy the thresholds but involve more than one distinct organism. If every hit satisfying the thresholds within a single file always belongs to the same organism, then I label it “unique”.
Tag: value
Blog
Separating Analysis Results as "Ambiguous"
While examining the results of my searches against various databases, I evaluate them against several threshold values, and I also try to make them even more meaningful by separating the hits above or below those thresholds into “Ambiguous” and “Unique”.
As “Ambiguous”, I label the hits in a given megablast file that satisfy the thresholds but involve more than one distinct organism. If every hit satisfying the thresholds within a single file always belongs to the same organism, then I label it “unique”.
Tag: dataset
Blog
Analysis Results for the Second Dataset
I have completed the analysis of the second dataset, the one with fewer unmapped reads. Since it was a better sequencing sample than the previous one, the results I obtained were also quite consistent. After analyzing a sequence belonging to the human genome, I got the results below.

LIST OF ORGANISMS AND THEIR NUMBER OF OCCURENCES
Ambiguous hit            1323
Homo sapiens              312
Pan troglodytes            25
Pongo abelii               18
Nomascus leucogenys        17
Halomonas sp. GFAJ-1        7
Callithrix jacchus          4
Macaca mulatta              3
Oryctolagus cuniculus       2
Loxodonta africana          1
Cavia porcellus             1

I will explain the term “Ambiguous hit” in a separate post.
Tag: refseq
Blog
Analysis Results for the Second Dataset
I have completed the analysis of the second dataset, the one with fewer unmapped reads. Since it was a better sequencing sample than the previous one, the results I obtained were also quite consistent. After analyzing a sequence belonging to the human genome, I got the results below.

LIST OF ORGANISMS AND THEIR NUMBER OF OCCURENCES
Ambiguous hit            1323
Homo sapiens              312
Pan troglodytes            25
Pongo abelii               18
Nomascus leucogenys        17
Halomonas sp. GFAJ-1        7
Callithrix jacchus          4
Macaca mulatta              3
Oryctolagus cuniculus       2
Loxodonta africana          1
Cavia porcellus             1

I will explain the term “Ambiguous hit” in a separate post.
Blog
Obtaining Species Names with Regular Expressions
Since at the end of the project I will show the user the names of the possible contaminating organisms (their Latin species names), I need to obtain the organism name for each sequence using the accession numbers in the MegaBLAST results. I can do this with another system called the Sequence Retrieval System (SRS), available on the HUSAR servers.
To get the organism name from SRS, it is enough to type the “getz” command on the Unix command line together with the database name, the accession number, and the field I want to retrieve. Below you can find a sample piece of code that does this job.
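The post's own sample code is cut off in this excerpt. As an illustration of the extraction step — pulling the organism name out of a captured line of command output with a regular expression — here is a Python sketch; the `ORG` field label and line layout are assumptions for illustration, not the real SRS/getz output format, and the post's own scripts do this in Perl.

```python
import re

def species_name(getz_output):
    """Pull a Latin species name (genus + species, e.g. 'Mus musculus')
    out of an assumed 'ORG   <name> ...' line in the captured output."""
    match = re.search(r"^ORG\s+([A-Z][a-z]+ [a-z]+)", getz_output, re.M)
    return match.group(1) if match else None

# Assumed shape of an organism line; the real output may differ.
sample_output = "ORG   Mus musculus (house mouse)\n"
print(species_name(sample_output))
```

In a real pipeline, `getz_output` would be the captured stdout of the command run for one accession number.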
Blog
The Contents of a MegaBLAST Output - The RefSeq Database
Below we see the details of one hit, taken from the file I obtained by searching the test FASTA file against the refseq_genomic database.

>>>>refseq_genomic_complete3: AC_000033_0310 Continuation (311 of 1357) of AC_000033 from base 31000001 (AC_000033 Mus musculus strain mixed chromosome 11, alternate assembly Mm_Celera, whole genome shotgun sequence. 2/2012)
        Length = 110000

 Score =  115 bits (58), Expect = 4e-22
 Identities = 74/79 (93%), Gaps = 2/79 (2%)
 Strand = Plus / Minus

Query: 1     ctctctctgtct-tctctctctctctgtctctctctctttctctctcttctctctctctc 59
             |||||||||||| ||| ||||||||| ||||||||||| |||||||||||||||||||||
Sbjct: 89773 ctctctctgtctgtctttctctctctctctctctctctctctctctcttctctctctctc 89714

Query: 60    tttctctctgccctctctc 78
             ||||||||| |||||||||
Sbjct: 89713 tttctctct-ccctctctc 89696

In the details, the header information about the hit comes first, marked with the >>>> characters.
Blog
Choosing a Database
My goal in this project is to find possible contaminating organisms (contaminants), so I need a broad database. But while keeping the database broad provides that advantage, searching it for every single sequence takes a great deal of computing power and time. That is why, while developing my project, I am also examining various databases, and investigating how I can restrict them to make them best suited to my purpose.
I started with NCBI's Reference Sequence — RefSeq — database.
Tag: refseq_dna
Blog
Analysis Results for the Second Dataset
I have completed the analysis of the second dataset, the one with fewer unmapped reads. Since it was a better sequencing sample than the previous one, the results I obtained were also quite consistent. After analyzing a sequence belonging to the human genome, I got the results below.

LIST OF ORGANISMS AND THEIR NUMBER OF OCCURENCES
Ambiguous hit            1323
Homo sapiens              312
Pan troglodytes            25
Pongo abelii               18
Nomascus leucogenys        17
Halomonas sp. GFAJ-1        7
Callithrix jacchus          4
Macaca mulatta              3
Oryctolagus cuniculus       2
Loxodonta africana          1
Cavia porcellus             1

I will explain the term “Ambiguous hit” in a separate post.
Blog
Obtaining Species Names with Regular Expressions
Since at the end of the project I will show the user the names of the possible contaminating organisms (their Latin species names), I need to obtain the organism name for each sequence using the accession numbers in the MegaBLAST results. I can do this with another system called the Sequence Retrieval System (SRS), available on the HUSAR servers.
To get the organism name from SRS, it is enough to type the “getz” command on the Unix command line together with the database name, the accession number, and the field I want to retrieve. Below you can find a sample piece of code that does this job.
Tag: refseq_genomic
Blog
Analysis Results for the Second Dataset
I have completed the analysis of the second dataset, the one with fewer unmapped reads. Since it was a better sequencing sample than the previous one, the results I obtained were also quite consistent. After analyzing a sequence belonging to the human genome, I got the results below.

LIST OF ORGANISMS AND THEIR NUMBER OF OCCURENCES
Ambiguous hit            1323
Homo sapiens              312
Pan troglodytes            25
Pongo abelii               18
Nomascus leucogenys        17
Halomonas sp. GFAJ-1        7
Callithrix jacchus          4
Macaca mulatta              3
Oryctolagus cuniculus       2
Loxodonta africana          1
Cavia porcellus             1

I will explain the term “Ambiguous hit” in a separate post.
Blog
A Command's Runtime Depending on the Database - CPU Runtime
When the files and databases you work with are large and you do not have enough computing power, the first thing to measure is how to obtain the result in the most effective way and in the shortest time.
In my project in particular, I am investigating this with different databases and different parameters.
For now I am trying four databases: nrnuc, ensembl_cdna, honest and refseq_genomic. I will also do this for two different word sizes. The word size is the number of base pairs that MegaBLAST matches exactly while searching. So if I have a sequence of 151 base pairs and the word size is set to 50, the search will look for stretches that start anywhere within those 151 base pairs but contain at least 50 consecutive matching bases.
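The word-size behaviour described above — requiring a stretch of at least N consecutive, exactly matching bases somewhere in the read — can be illustrated with a toy check. This is only the seeding idea, sketched in Python; it is not how MegaBLAST is actually implemented, and the sequences are made up for illustration.

```python
def has_seed(query, subject, word_size):
    """Return True if any stretch of `word_size` consecutive bases from
    the query occurs exactly in the subject - the kind of seed match a
    word-size parameter demands before an alignment is considered."""
    for i in range(len(query) - word_size + 1):
        if query[i:i + word_size] in subject:
            return True
    return False

read = "acgtacgtacgt"                 # a toy 12-bp read
reference = "ttttacgtacgttttt"        # a toy reference fragment
print(has_seed(read, reference, 8))   # an 8-base exact stretch exists
print(has_seed(read, reference, 13))  # impossible: longer than the read
```

A larger word size means fewer candidate seeds, which is exactly why it trades sensitivity for speed in the runtime comparisons above.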
Tag: veriseti
Blog
Analysis Results for the Second Dataset
I have completed the analysis of the second dataset, the one with fewer unmapped reads. Since it was a better sequencing sample than the previous one, the results I obtained were also quite consistent. After analyzing a sequence belonging to the human genome, I got the results below.

LIST OF ORGANISMS AND THEIR NUMBER OF OCCURENCES
Ambiguous hit            1323
Homo sapiens              312
Pan troglodytes            25
Pongo abelii               18
Nomascus leucogenys        17
Halomonas sp. GFAJ-1        7
Callithrix jacchus          4
Macaca mulatta              3
Oryctolagus cuniculus       2
Loxodonta africana          1
Cavia porcellus             1

I will explain the term “Ambiguous hit” in a separate post.
Tag: blast
Blog
Running MegaBLAST from Multiple Sequence Files
I wrote the script below to fit a technique we devised to speed up the pipeline's MegaBLAST search. What it does is use the sequence files that were created and formatted for each read to search the databases, starting from a given starting point and for a given number of reads.

#!/usr/local/bin/perl

$database = $ARGV[0];
$dir = $ARGV[1];    # directory for sequences
$sp = $ARGV[2];     # starting point
$n = $ARGV[3] + $sp;

while (1) {
    system("blastplus -programname=megablast $dir/read_$sp.seq $database -OUTFILE=read_$sp.megablast -nobatch -d");
    $sp++;
    last if ($sp == $n);
}

Everything here works through really simple programming.
Blog
The Contents of a MegaBLAST Output - The RefSeq Database
Below we see the details of one hit, taken from the file I obtained by searching the test FASTA file against the refseq_genomic database.

>>>>refseq_genomic_complete3: AC_000033_0310 Continuation (311 of 1357) of AC_000033 from base 31000001 (AC_000033 Mus musculus strain mixed chromosome 11, alternate assembly Mm_Celera, whole genome shotgun sequence. 2/2012)
        Length = 110000

 Score =  115 bits (58), Expect = 4e-22
 Identities = 74/79 (93%), Gaps = 2/79 (2%)
 Strand = Plus / Minus

Query: 1     ctctctctgtct-tctctctctctctgtctctctctctttctctctcttctctctctctc 59
             |||||||||||| ||| ||||||||| ||||||||||| |||||||||||||||||||||
Sbjct: 89773 ctctctctgtctgtctttctctctctctctctctctctctctctctcttctctctctctc 89714

Query: 60    tttctctctgccctctctc 78
             ||||||||| |||||||||
Sbjct: 89713 tttctctct-ccctctctc 89696

In the details, the header information about the hit comes first, marked with the >>>> characters.
Blog
MegaBLAST - A Tool for Finding Similarities Between Sequences
MegaBLAST is part of the BLAST (Basic Local Alignment Search Tool) package available in the HUSAR suite. It is also a variant of BLASTN. MegaBLAST handles long sequences more efficiently than BLASTN and works much faster, but it is less sensitive. This makes it a very suitable tool for searching large databases for similar sequences.
The program I will write will take a FASTA file containing multiple sequences and run the megablast command. Afterwards, for each read, a .
Tag: blastplus
Blog
Running MegaBLAST from Multiple Sequence Files
I wrote the script below to fit a technique we devised to speed up the pipeline's MegaBLAST search. What it does is use the sequence files that were created and formatted for each read to search the databases, starting from a given starting point and for a given number of reads.

#!/usr/local/bin/perl

$database = $ARGV[0];
$dir = $ARGV[1];    # directory for sequences
$sp = $ARGV[2];     # starting point
$n = $ARGV[3] + $sp;

while (1) {
    system("blastplus -programname=megablast $dir/read_$sp.seq $database -OUTFILE=read_$sp.megablast -nobatch -d");
    $sp++;
    last if ($sp == $n);
}

Everything here works through really simple programming.
Tag: dizi
Blog
Running MegaBLAST from Multiple Sequence Files
I wrote the script below to fit a technique we devised to speed up the pipeline's MegaBLAST search. What it does is use the sequence files that were created and formatted for each read to search the databases, starting from a given starting point and for a given number of reads.

#!/usr/local/bin/perl

$database = $ARGV[0];
$dir = $ARGV[1];    # directory for sequences
$sp = $ARGV[2];     # starting point
$n = $ARGV[3] + $sp;

while (1) {
    system("blastplus -programname=megablast $dir/read_$sp.seq $database -OUTFILE=read_$sp.megablast -nobatch -d");
    $sp++;
    last if ($sp == $n);
}

Everything here works through really simple programming.
Tag: nofilter
Blog
Running MegaBLAST from Multiple Sequence Files
I wrote the script below to fit a technique we devised to speed up the pipeline's MegaBLAST search. What it does is use the sequence files that were created and formatted for each read to search the databases, starting from a given starting point and for a given number of reads.

#!/usr/local/bin/perl

$database = $ARGV[0];
$dir = $ARGV[1];    # directory for sequences
$sp = $ARGV[2];     # starting point
$n = $ARGV[3] + $sp;

while (1) {
    system("blastplus -programname=megablast $dir/read_$sp.seq $database -OUTFILE=read_$sp.megablast -nobatch -d");
    $sp++;
    last if ($sp == $n);
}

Everything here works through really simple programming.
Tag: repeating sequences
Blog
Running MegaBLAST from Multiple Sequence Files
I wrote the script below to fit a technique we devised to speed up the pipeline's MegaBLAST search. What it does is use the sequence files that were created and formatted for each read to search the databases, starting from a given starting point and for a given number of reads.

#!/usr/local/bin/perl

$database = $ARGV[0];
$dir = $ARGV[1];    # directory for sequences
$sp = $ARGV[2];     # starting point
$n = $ARGV[3] + $sp;

while (1) {
    system("blastplus -programname=megablast $dir/read_$sp.seq $database -OUTFILE=read_$sp.megablast -nobatch -d");
    $sp++;
    last if ($sp == $n);
}

Everything here works through really simple programming.
Tag: tekrar eden diziler
Blog
Running MegaBLAST from Multiple Sequence Files
I wrote the script below to fit a technique we devised to speed up the pipeline's MegaBLAST search. What it does is use the sequence files that were created and formatted for each read to search the databases, starting from a given starting point and for a given number of reads.

#!/usr/local/bin/perl

$database = $ARGV[0];
$dir = $ARGV[1];    # directory for sequences
$sp = $ARGV[2];     # starting point
$n = $ARGV[3] + $sp;

while (1) {
    system("blastplus -programname=megablast $dir/read_$sp.seq $database -OUTFILE=read_$sp.megablast -nobatch -d");
    $sp++;
    last if ($sp == $n);
}

Everything here works through really simple programming.
Tag: while
Blog
Running MegaBLAST from Multiple Sequence Files
I wrote the script below to fit a technique we devised to speed up the pipeline's MegaBLAST search. What it does is use the sequence files that were created and formatted for each read to search the databases, starting from a given starting point and for a given number of reads.

#!/usr/local/bin/perl

$database = $ARGV[0];
$dir = $ARGV[1];    # directory for sequences
$sp = $ARGV[2];     # starting point
$n = $ARGV[3] + $sp;

while (1) {
    system("blastplus -programname=megablast $dir/read_$sp.seq $database -OUTFILE=read_$sp.megablast -nobatch -d");
    $sp++;
    last if ($sp == $n);
}

Everything here works through really simple programming.
Blog
A Perl Script for Converting FASTQ to FASTA
FASTQ and FASTA are file formats that actually carry the same information, except that one of them simply has two fewer lines of information per sequence. Another difference, important for my project, is that the FASTA format can be searched directly with MegaBLAST. That is why I need to convert the FASTQ format produced by the sequencing machines into FASTA. And this script is the first step of the pipeline.
Actually, since the test sequence data I received had not been aligned by the party who delivered it to me, I had carried out that alignment as a preliminary step.
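The format difference described above — four lines per FASTQ record versus two per FASTA record — makes the conversion mechanical. The post's converter is a Perl script; as an illustration, here is the same idea sketched in Python on an in-memory list of lines (real pipelines would stream the file instead).

```python
def fastq_to_fasta(fastq_lines):
    """Convert FASTQ records (4 lines each: @id, sequence, '+', qualities)
    into FASTA records (2 lines each: >id, sequence), dropping the two
    quality-related lines."""
    fasta = []
    for i in range(0, len(fastq_lines), 4):
        header, sequence = fastq_lines[i], fastq_lines[i + 1]
        fasta.append(">" + header[1:])  # '@read1' becomes '>read1'
        fasta.append(sequence)
    return fasta

record = ["@read1", "ACGTACGT", "+", "IIIIIIII"]
print("\n".join(fastq_to_fasta(record)))
```

Dropping the `+` separator and the quality string is exactly the "two fewer lines per sequence" the post describes.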
Tag: düzenli ifadeler
Blog
Running MegaBLAST from a Single FASTA File - Regular Expressions
Below are the Perl script I wrote to run MegaBLAST by reading a FASTA file and to collect the results in a directory, together with its explanation. This script is an important part of the pipeline I am designing. It is the first script I wrote, and the one that reaches all the reads through a single FASTA file.

#!/usr/local/bin/perl

$database = $ARGV[0];
$fasta = $ARGV[1];    # input file
$sp = $ARGV[2];       # starting point
$n = $ARGV[3] + $sp;

if (!defined($n)) { $n = 12; }   # set default number

open FASTA, $fasta or die $!
Blog
Reading a Command's Output with Perl on Unix, and Regular Expressions
I described earlier how I extract organism names with regular expressions. Here I will talk about something similar again, but done in Perl with a somewhat more specialized technique — a very useful method I needed because I receive the information from the database as output spanning several lines. You can certainly use something like it for other purposes too, and benefit from it.
The need arose because the honest database, built by the HUSAR group, does not present organism names directly but spreads them over several lines. You can see an example of this below.
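The technique the post describes is slurping a command's multi-line output and matching a pattern across lines; in Perl this is typically done with backticks and the `/m` or `/s` regex modifiers. As an illustration, here is the same idea sketched in Python; the record layout shown (EMBL-style `ID`/`DE`/`OS` lines) is an assumption for illustration, not the actual honest database output.

```python
import re

def organism_from_record(record):
    """Match the OS (organism) line of a multi-line record, even though
    it sits among other lines; re.MULTILINE makes ^ match at each line."""
    match = re.search(r"^OS\s+(.+)$", record, re.MULTILINE)
    return match.group(1) if match else None

# Assumed multi-line record shape; the real database output differs.
multi_line_output = (
    "ID   AB123456\n"
    "DE   Some sequence description\n"
    "OS   Halomonas sp. GFAJ-1\n"
)
print(organism_from_record(multi_line_output))
```

In a real pipeline, `record` would be the captured stdout of the database query command rather than a hard-coded string.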
Blog
Obtaining Species Names with Regular Expressions
Since at the end of the project I will show the user the names of the possible contaminating organisms (their Latin species names), I need to obtain the organism name for each sequence using the accession numbers in the MegaBLAST results. I can do this with another system called the Sequence Retrieval System (SRS), available on the HUSAR servers.
To get the organism name from SRS, it is enough to type the “getz” command on the Unix command line together with the database name, the accession number, and the field I want to retrieve. Below you can find a sample piece of code that does this job.
Tag: regular expressions
Blog
Running MegaBLAST from a Single FASTA File - Regular Expressions
Below are the Perl script I wrote to run MegaBLAST by reading a FASTA file and to collect the results in a directory, together with its explanation. This script is an important part of the pipeline I am designing. It is the first script I wrote, and the one that reaches all the reads through a single FASTA file.

#!/usr/local/bin/perl

$database = $ARGV[0];
$fasta = $ARGV[1];    # input file
$sp = $ARGV[2];       # starting point
$n = $ARGV[3] + $sp;

if (!defined($n)) { $n = 12; }   # set default number

open FASTA, $fasta or die $!
Blog
Obtaining Species Names with Regular Expressions
Since at the end of the project I will show the user the names of the possible contaminating organisms (their Latin species names), I need to obtain the organism name for each sequence using the accession numbers in the MegaBLAST results. I can do this with another system called the Sequence Retrieval System (SRS), available on the HUSAR servers.
To get the organism name from SRS, it is enough to type the “getz” command on the Unix command line together with the database name, the accession number, and the field I want to retrieve. Below you can find a sample piece of code that does this job.
Tag: husar
Blog
Obtaining Species Names with Regular Expressions
Since at the end of the project I will show the user the names of the possible contaminating organisms (their Latin species names), I need to obtain the organism name for each sequence using the accession numbers in the MegaBLAST results. I can do this with another system called the Sequence Retrieval System (SRS), available on the HUSAR servers.
To get the organism name from SRS, it is enough to type the “getz” command on the Unix command line together with the database name, the accession number, and the field I want to retrieve. Below you can find a sample piece of code that does this job.
Blog
SAM Files - BAM Files - samtools
The pipeline I need to program will actually run its analyses directly on unmapped reads. But since I could not find such a dataset, and the only data I have contains both mapped and unmapped reads, I first had to get rid of the mapped ones.
As I mentioned before, I do this with the bwa aligner (mapper). After a series of steps, bwa produces a SAM file, but what I need is a FASTQ file. For this, I will convert the SAM file with samtools into the similar BAM format, and then obtain my FASTQ file with the bam2fastq tool.
Blog
MegaBLAST - A Tool for Finding Similarities Between Sequences
MegaBLAST is part of the BLAST (Basic Local Alignment Search Tool) package available in the HUSAR suite. It is also a variant of BLASTN. MegaBLAST handles long sequences more efficiently than BLASTN and works much faster, but it is less sensitive. This makes it a very suitable tool for searching large databases for similar sequences.
The program I will write will take a FASTA file containing multiple sequences and run the megablast command. Afterwards, for each read, a .
Blog
The Contaminant Analysis Project
As a start, I will describe in detail this small project I was given so that I can get used to the tools and the programming language — in short, to bioinformatics.
We know that however hard we try to prevent it, the risk of contaminants is always present in laboratory work. The lower we bring it, the better — and we can later determine its amount and use that for yet another assessment of our results. One method for finding it is DNA analysis: the DNA of the sample you are working with is sequenced, that DNA is analyzed with various programs, and the contaminating organisms can be identified from their DNA.
Blog
The FASTQ Format - FASTQ Files
Today I received the “test” sequence data I will use while building the program. It consists of two FASTQ files, each compressed and yet still around 6 GB in size. Since I naturally do not want to lose too much time, I will use only a portion of one of these files.
My goal is to find the mappable reads in these FASTQ files with the BWA tool and then remove them, and to save the remaining unmapped reads in a language the MegaBLAST tool can understand (the FASTA format).
By the way, since I am preparing the whole project on a Unix machine, I am learning a lot of commands; I will try to write about them separately later.
Blog
Detecting the Organisms That Contaminate Sequencing Studies
The first project I will take on during my summer internship is slowly taking shape. In this project I will build a pipeline and use it to try to find the organisms that contaminate sequencing samples in laboratories.
In laboratories, samples can be contaminated by other organisms or by foreign DNA for many reasons. These can be bacteria or yeast, or even viral DNA. After you sequence a DNA sample, its alignment rate against its reference can turn out to be very low. This indicates that foreign DNA may be present. Another cause can be that the reference DNA itself is different.
Blog
Pipelines and Pipeline Development
Following up on today's introductory lectures, I received detailed information about pipelines and pipeline development. A pipeline is literally just that: a line of pipes, for example the system used to carry oil from one place to another. In computing terminology, it means a chain of processing elements arranged so that the output of one element becomes the input of the next. This lets much more complicated tasks be carried out easily and in an organized way by building a pipeline. I believe "pipeline" is translated into Turkish as "ardışık düzen", but I will keep using "pipeline".
Blog
WWW2HUSAR - HUSAR's Web Interface
On the second day of my internship we talked about HUSAR's web interface. HUSAR is software that can be used and managed with commands from the command prompt, but there is a web interface built to make this easier. With this interface, which they call WWW2HUSAR, you can easily select from the listed tools, add your genetic sequence, and carry out many other operations with just a few clicks.
Along with that, we looked a bit more at HUSAR's functions. In the software you can create lists of gene sequences in a local folder, compare the similarities of the genes with the multiple sequence alignment tool, and, for example, reveal their evolutionary relationships.
Blog
An Internship at the DKFZ - Heidelberg Bioinformatics Unit
The summer internship I am doing through the Erasmus program has begun. To start, I received a few hours of introductory lectures from the scientists who run the unit. In these lectures I learned about the unit's brief history, the projects it has carried out to date, and their details.
The Bioinformatics Unit is a group within the Genomics and Proteomics Core Facility, one of the core facilities of the DKFZ (Deutsches Krebsforschungszentrum, in English: German Cancer Research Center). They are also known as HUSAR (Heidelberg Unix Sequence Analysis Resources), and this name is also used for the sequence analysis package the group develops.
Tag: reference sequence
Blog
Obtaining the Species Name with Regular Expressions
Since at the end of my project I will show the user the names of the likely contaminating organisms (their Latin species names), I need to obtain the organism name for each sequence using the accession numbers in the MegaBLAST results. I can do this with another system available on the HUSAR servers, called the Sequence Retrieval System (SRS).
To get the organism name from SRS, it is enough to type the "getz" command on the Unix command line together with the database name, the accession number, and the field I want to retrieve. Below you can find a sample piece of code that does this job.
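The regular-expression step the title refers to can be sketched in Python. The output format assumed here (an EMBL-style "OS" line carrying the organism name) is my own illustration, not the exact getz output from the post:

```python
import re

def organism_name(getz_output):
    """Extract the Latin species name from SRS/getz-style output.
    Assumes an EMBL-style 'OS   Genus species ...' line."""
    match = re.search(r"^OS\s+([A-Z][a-z]+ [a-z]+)", getz_output, re.MULTILINE)
    return match.group(1) if match else None

# Hypothetical record fragment for illustration
sample = "ID   AC_000033\nOS   Mus musculus (house mouse)\n"
print(organism_name(sample))  # -> Mus musculus
```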
Blog
Choosing a Database
My aim in this project is to find the likely contaminating organisms (the contaminants), so I need a broad database. But while keeping the database broad provides that advantage, searching it for every sequence takes a great deal of computing power and time. For this reason, while developing my project I am also examining various databases, and investigating how I can restrict them to make them best suit my purpose.
I started with NCBI's Reference Sequence (RefSeq) database.
Tag: sequence retrieval system
Blog
Obtaining the Species Name with Regular Expressions
Since at the end of my project I will show the user the names of the likely contaminating organisms (their Latin species names), I need to obtain the organism name for each sequence using the accession numbers in the MegaBLAST results. I can do this with another system available on the HUSAR servers, called the Sequence Retrieval System (SRS).
To get the organism name from SRS, it is enough to type the "getz" command on the Unix command line together with the database name, the accession number, and the field I want to retrieve. Below you can find a sample piece of code that does this job.
Blog
An Internship at the DKFZ - Heidelberg Bioinformatics Unit
The summer internship I am doing through the Erasmus program has begun. To start, I received a few hours of introductory lectures from the scientists who run the unit. In these lectures I learned about the unit's brief history, the projects it has carried out to date, and their details.
The Bioinformatics Unit is a group within the Genomics and Proteomics Core Facility, one of the core facilities of the DKFZ (Deutsches Krebsforschungszentrum, in English: German Cancer Research Center). They are also known as HUSAR (Heidelberg Unix Sequence Analysis Resources), and this name is also used for the sequence analysis package the group develops.
Tag: e value
Blog
The Contents of a MegaBLAST Output - The RefSeq Database
Below we see the details of one hit, taken from the file I obtained by searching my test FASTA file against the refseq_genomic database.
>>>>refseq_genomic_complete3: AC_000033_0310 Continuation (311 of 1357) of AC_000033 from base 31000001 (AC_000033 Mus musculus strain mixed chromosome 11, alternate assembly Mm_Celera, whole genome shotgun sequence. 2/2012)

        Length = 110000

 Score = 115 bits (58), Expect = 4e-22
 Identities = 74/79 (93%), Gaps = 2/79 (2%)
 Strand = Plus / Minus

Query: 1     ctctctctgtct-tctctctctctctgtctctctctctttctctctcttctctctctctc 59
             |||||||||||| ||| ||||||||| ||||||||||| |||||||||||||||||||||
Sbjct: 89773 ctctctctgtctgtctttctctctctctctctctctctctctctctcttctctctctctc 89714

Query: 60    tttctctctgccctctctc 78
             ||||||||| |||||||||
Sbjct: 89713 tttctctct-ccctctctc 89696

In these details, the header information about the hit is given first, marked with the >>>> characters.
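As an illustration, the statistics in a hit header like the one above can be pulled out with a small Python parser; the field layout it assumes is exactly the one in this excerpt:

```python
import re

def parse_hit_stats(header):
    """Parse Score, Expect, Identities and Gaps from a MegaBLAST hit header."""
    stats = {}
    m = re.search(r"Score\s*=\s*([\d.]+) bits", header)
    stats["score_bits"] = float(m.group(1))
    m = re.search(r"Expect\s*=\s*([\d.e+-]+)", header)
    stats["expect"] = float(m.group(1))
    m = re.search(r"Identities\s*=\s*(\d+)/(\d+)", header)
    stats["identities"] = (int(m.group(1)), int(m.group(2)))
    m = re.search(r"Gaps\s*=\s*(\d+)/(\d+)", header)
    stats["gaps"] = (int(m.group(1)), int(m.group(2)))
    return stats

header = "Score = 115 bits (58), Expect = 4e-22, Identities = 74/79 (93%), Gaps = 2/79 (2%)"
print(parse_hit_stats(header))
```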
Tag: gaps
Blog
The Contents of a MegaBLAST Output - The RefSeq Database
Below we see the details of one hit, taken from the file I obtained by searching my test FASTA file against the refseq_genomic database.
>>>>refseq_genomic_complete3: AC_000033_0310 Continuation (311 of 1357) of AC_000033 from base 31000001 (AC_000033 Mus musculus strain mixed chromosome 11, alternate assembly Mm_Celera, whole genome shotgun sequence. 2/2012)

        Length = 110000

 Score = 115 bits (58), Expect = 4e-22
 Identities = 74/79 (93%), Gaps = 2/79 (2%)
 Strand = Plus / Minus

Query: 1     ctctctctgtct-tctctctctctctgtctctctctctttctctctcttctctctctctc 59
             |||||||||||| ||| ||||||||| ||||||||||| |||||||||||||||||||||
Sbjct: 89773 ctctctctgtctgtctttctctctctctctctctctctctctctctcttctctctctctc 89714

Query: 60    tttctctctgccctctctc 78
             ||||||||| |||||||||
Sbjct: 89713 tttctctct-ccctctctc 89696

In these details, the header information about the hit is given first, marked with the >>>> characters.
Tag: identity
Blog
The Contents of a MegaBLAST Output - The RefSeq Database
Below we see the details of one hit, taken from the file I obtained by searching my test FASTA file against the refseq_genomic database.
>>>>refseq_genomic_complete3: AC_000033_0310 Continuation (311 of 1357) of AC_000033 from base 31000001 (AC_000033 Mus musculus strain mixed chromosome 11, alternate assembly Mm_Celera, whole genome shotgun sequence. 2/2012)

        Length = 110000

 Score = 115 bits (58), Expect = 4e-22
 Identities = 74/79 (93%), Gaps = 2/79 (2%)
 Strand = Plus / Minus

Query: 1     ctctctctgtct-tctctctctctctgtctctctctctttctctctcttctctctctctc 59
             |||||||||||| ||| ||||||||| ||||||||||| |||||||||||||||||||||
Sbjct: 89773 ctctctctgtctgtctttctctctctctctctctctctctctctctcttctctctctctc 89714

Query: 60    tttctctctgccctctctc 78
             ||||||||| |||||||||
Sbjct: 89713 tttctctct-ccctctctc 89696

In these details, the header information about the hit is given first, marked with the >>>> characters.
Tag: megablast output
Blog
The Contents of a MegaBLAST Output - The RefSeq Database
Below we see the details of one hit, taken from the file I obtained by searching my test FASTA file against the refseq_genomic database.
>>>>refseq_genomic_complete3: AC_000033_0310 Continuation (311 of 1357) of AC_000033 from base 31000001 (AC_000033 Mus musculus strain mixed chromosome 11, alternate assembly Mm_Celera, whole genome shotgun sequence. 2/2012)

        Length = 110000

 Score = 115 bits (58), Expect = 4e-22
 Identities = 74/79 (93%), Gaps = 2/79 (2%)
 Strand = Plus / Minus

Query: 1     ctctctctgtct-tctctctctctctgtctctctctctttctctctcttctctctctctc 59
             |||||||||||| ||| ||||||||| ||||||||||| |||||||||||||||||||||
Sbjct: 89773 ctctctctgtctgtctttctctctctctctctctctctctctctctcttctctctctctc 89714

Query: 60    tttctctctgccctctctc 78
             ||||||||| |||||||||
Sbjct: 89713 tttctctct-ccctctctc 89696

In these details, the header information about the hit is given first, marked with the >>>> characters.
Tag: score
Blog
The Contents of a MegaBLAST Output - The RefSeq Database
Below we see the details of one hit, taken from the file I obtained by searching my test FASTA file against the refseq_genomic database.
>>>>refseq_genomic_complete3: AC_000033_0310 Continuation (311 of 1357) of AC_000033 from base 31000001 (AC_000033 Mus musculus strain mixed chromosome 11, alternate assembly Mm_Celera, whole genome shotgun sequence. 2/2012)

        Length = 110000

 Score = 115 bits (58), Expect = 4e-22
 Identities = 74/79 (93%), Gaps = 2/79 (2%)
 Strand = Plus / Minus

Query: 1     ctctctctgtct-tctctctctctctgtctctctctctttctctctcttctctctctctc 59
             |||||||||||| ||| ||||||||| ||||||||||| |||||||||||||||||||||
Sbjct: 89773 ctctctctgtctgtctttctctctctctctctctctctctctctctcttctctctctctc 89714

Query: 60    tttctctctgccctctctc 78
             ||||||||| |||||||||
Sbjct: 89713 tttctctct-ccctctctc 89696

In these details, the header information about the hit is given first, marked with the >>>> characters.
Tag: strand
Blog
The Contents of a MegaBLAST Output - The RefSeq Database
Below we see the details of one hit, taken from the file I obtained by searching my test FASTA file against the refseq_genomic database.
>>>>refseq_genomic_complete3: AC_000033_0310 Continuation (311 of 1357) of AC_000033 from base 31000001 (AC_000033 Mus musculus strain mixed chromosome 11, alternate assembly Mm_Celera, whole genome shotgun sequence. 2/2012)

        Length = 110000

 Score = 115 bits (58), Expect = 4e-22
 Identities = 74/79 (93%), Gaps = 2/79 (2%)
 Strand = Plus / Minus

Query: 1     ctctctctgtct-tctctctctctctgtctctctctctttctctctcttctctctctctc 59
             |||||||||||| ||| ||||||||| ||||||||||| |||||||||||||||||||||
Sbjct: 89773 ctctctctgtctgtctttctctctctctctctctctctctctctctcttctctctctctc 89714

Query: 60    tttctctctgccctctctc 78
             ||||||||| |||||||||
Sbjct: 89713 tttctctct-ccctctctc 89696

In these details, the header information about the hit is given first, marked with the >>>> characters.
Tag: cpu runtime
Blog
A Command's Runtime by Database - CPU Runtime
When the files and databases being worked on are large and you do not have enough computing power, the first thing we need to measure is how to obtain the result in the most effective way and in the shortest time.
In my project in particular, I am investigating this using different databases and different parameters.
For now I am trying four databases: nrnuc, ensembl_cdna, honest and refseq_genomic. I will also do this for two different word sizes. The word size is the number of base pairs MegaBLAST will match exactly while searching. So if I have a read of 151 base pairs and the word size is set to 50, the search will look, anywhere within those 151 base pairs, for stretches in which at least 50 consecutive bases align exactly.
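To make the word-size idea concrete, here is a small Python sketch (my own illustration, not MegaBLAST's actual seeding code) that checks whether a read shares an exact stretch of at least word_size bases with a subject sequence:

```python
def has_seed(read, subject, word_size):
    """Return True if `read` and `subject` share an exact common
    substring of at least `word_size` bases, the way the word-size
    parameter gates a MegaBLAST hit."""
    # All exact word_size-mers of the read
    words = {read[i:i + word_size] for i in range(len(read) - word_size + 1)}
    # A hit requires at least one of them to occur exactly in the subject
    return any(subject[i:i + word_size] in words
               for i in range(len(subject) - word_size + 1))

# With word size 5, a shared exact 5-mer is enough for a seed
print(has_seed("acgtacgtacgt", "ttttacgtattt", 5))  # -> True
print(has_seed("acgtacgtacgt", "tttttttttttt", 5))  # -> False
```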
Tag: ensembl_cdna
Blog
A Command's Runtime by Database - CPU Runtime
When the files and databases being worked on are large and you do not have enough computing power, the first thing we need to measure is how to obtain the result in the most effective way and in the shortest time.
In my project in particular, I am investigating this using different databases and different parameters.
For now I am trying four databases: nrnuc, ensembl_cdna, honest and refseq_genomic. I will also do this for two different word sizes. The word size is the number of base pairs MegaBLAST will match exactly while searching. So if I have a read of 151 base pairs and the word size is set to 50, the search will look, anywhere within those 151 base pairs, for stretches in which at least 50 consecutive bases align exactly.
Blog
Choosing a Database
My aim in this project is to find the likely contaminating organisms (the contaminants), so I need a broad database. But while keeping the database broad provides that advantage, searching it for every sequence takes a great deal of computing power and time. For this reason, while developing my project I am also examining various databases, and investigating how I can restrict them to make them best suit my purpose.
I started with NCBI's Reference Sequence (RefSeq) database.
Tag: honest
Blog
A Command's Runtime by Database - CPU Runtime
When the files and databases being worked on are large and you do not have enough computing power, the first thing we need to measure is how to obtain the result in the most effective way and in the shortest time.
In my project in particular, I am investigating this using different databases and different parameters.
For now I am trying four databases: nrnuc, ensembl_cdna, honest and refseq_genomic. I will also do this for two different word sizes. The word size is the number of base pairs MegaBLAST will match exactly while searching. So if I have a read of 151 base pairs and the word size is set to 50, the search will look, anywhere within those 151 base pairs, for stretches in which at least 50 consecutive bases align exactly.
Blog
Choosing a Database
My aim in this project is to find the likely contaminating organisms (the contaminants), so I need a broad database. But while keeping the database broad provides that advantage, searching it for every sequence takes a great deal of computing power and time. For this reason, while developing my project I am also examining various databases, and investigating how I can restrict them to make them best suit my purpose.
I started with NCBI's Reference Sequence (RefSeq) database.
Tag: nrnuc
Blog
A Command's Runtime by Database - CPU Runtime
When the files and databases being worked on are large and you do not have enough computing power, the first thing we need to measure is how to obtain the result in the most effective way and in the shortest time.
In my project in particular, I am investigating this using different databases and different parameters.
For now I am trying four databases: nrnuc, ensembl_cdna, honest and refseq_genomic. I will also do this for two different word sizes. The word size is the number of base pairs MegaBLAST will match exactly while searching. So if I have a read of 151 base pairs and the word size is set to 50, the search will look, anywhere within those 151 base pairs, for stretches in which at least 50 consecutive bases align exactly.
Tag: fastq fasta conversion
Blog
A Perl Script to Convert FASTQ to FASTA
The FASTQ and FASTA formats actually contain the same information; one of them simply carries two fewer lines of information per sequence. Another difference that matters for my project is that the FASTA format can be searched directly with MegaBLAST. That is why I need to convert the FASTQ format produced by sequencing machines into FASTA. And this script is the first step of the pipeline.
Actually, because the party who delivered my test sequencing data had not aligned it, I had already performed that alignment as a preliminary step.
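The conversion itself is simple; the post's script is in Perl, but the same logic looks like this in Python (the "@" header line becomes a ">" line, the sequence line is kept, and the plus and quality lines are dropped):

```python
def fastq_to_fasta(fastq_lines):
    """Convert FASTQ records (4 lines each) to FASTA records (2 lines each)."""
    fasta = []
    for i in range(0, len(fastq_lines), 4):
        fasta.append(">" + fastq_lines[i][1:])  # @id -> >id
        fasta.append(fastq_lines[i + 1])        # sequence line kept as-is
        # lines i+2 ('+') and i+3 (quality scores) are dropped
    return fasta

record = ["@read1", "ACGT", "+", "IIII"]
print(fastq_to_fasta(record))  # -> ['>read1', 'ACGT']
```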
Tag: converting fastq to fasta
Blog
A Perl Script to Convert FASTQ to FASTA
The FASTQ and FASTA formats actually contain the same information; one of them simply carries two fewer lines of information per sequence. Another difference that matters for my project is that the FASTA format can be searched directly with MegaBLAST. That is why I need to convert the FASTQ format produced by sequencing machines into FASTA. And this script is the first step of the pipeline.
Actually, because the party who delivered my test sequencing data had not aligned it, I had already performed that alignment as a preliminary step.
Tag: mapped reads
Blog
Results of Mapping and Extracting the Unmapped Reads
Previously I was working with only a part of the data, but from now on I will work with all of it. So I extracted the compressed data I was given directly into my working folder and ran the operations on it.
My initial (FASTQ) file is 2153988289 bytes (2 GB) in size. After mapping with bwa, there were 6004193 sequences, or reads, in total. Then, after I extracted the unmapped reads, the total read count dropped by 551065 to 5493128. In other words, about 9% of the data.
Blog
SAM Files - BAM Files - samtools
The pipeline I actually need to program will run its analyses directly on the unmapped reads. However, since I could not find such a dataset, and the only data I have contains both mapped and unmapped reads, I first had to get rid of the mapped ones.
As I mentioned before, I do this with the bwa aligner (mapper). After a series of steps bwa produces a SAM file, but what I need is a FASTQ file. For that, I will convert the SAM file into the similar BAM format with samtools, and then obtain my FASTQ file with the bam2fastq tool.
Blog
The First Step: Obtaining the Unmapped Reads
As I mentioned before, the first part of my project is extracting the unmapped reads from the FASTQ file. This way, I remove the sequences I do not need from the later analyses and reduce the workload of those analyses.
Since my goal from the start has been to design a pipeline that carries out the whole project step by step, I will do this with a Perl script. This script is the first one in the pipeline, and it is where the raw FASTQ-format data that will come from the laboratory is used as input. Actually I would not normally need this script; I only added this step because my data also contains mappable reads.
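In a SAM file, unmapped reads can be recognized by bit 0x4 of the FLAG field; the post's actual script is in Perl, but a minimal Python sketch of that filtering step looks like this:

```python
def unmapped_reads(sam_lines):
    """Yield the read names of unmapped records from SAM lines.
    Bit 0x4 of the FLAG column (second field) marks an unmapped read."""
    for line in sam_lines:
        if line.startswith("@"):        # skip SAM header lines
            continue
        fields = line.split("\t")
        if int(fields[1]) & 4:          # FLAG has the 'unmapped' bit set
            yield fields[0]

sam = [
    "@HD\tVN:1.0",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",   # mapped
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",          # unmapped
]
print(list(unmapped_reads(sam)))  # -> ['read2']
```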
Blog
Detecting the Organisms That Contaminate Sequencing Studies
The first project I will take on during my summer internship is slowly taking shape. In this project I will build a pipeline and use it to try to find the organisms that contaminate sequencing samples in laboratories.
In laboratories, samples can be contaminated by other organisms or by foreign DNA for many reasons. These can be bacteria or yeast, or even viral DNA. After you sequence a DNA sample, its alignment rate against its reference can turn out to be very low. This indicates that foreign DNA may be present. Another cause can be that the reference DNA itself is different.
Tag: extracting unmapped reads
Blog
Results of Mapping and Extracting the Unmapped Reads
Previously I was working with only a part of the data, but from now on I will work with all of it. So I extracted the compressed data I was given directly into my working folder and ran the operations on it.
My initial (FASTQ) file is 2153988289 bytes (2 GB) in size. After mapping with bwa, there were 6004193 sequences, or reads, in total. Then, after I extracted the unmapped reads, the total read count dropped by 551065 to 5493128. In other words, about 9% of the data.
Blog
Mapping (Alignment) with BWA
I had forgotten to write about this earlier. I did mention it, but I never wrote anything about how it is done, nor did I add example commands.
BWA takes our DNA reads (in FASTQ format) and the reference genome (in my project, the human genome) and produces a .sai file. This file carries information about the alignment between the reads and the reference genome, and using this information I can separate out the unmapped ones.
First, we create our .sai file with the command below.
bwa aln $NGSDATAROOT/bwa/human_genome37 ChIP_NoIndex_L001_R1_complete_filtered.fastq > complete_alignment.sai
The .sai file we created is not very useful on its own, so we convert it into a SAM file and continue from there.
Tag: alignment
Blog
Mapping (Alignment) with BWA
I had forgotten to write about this earlier. I did mention it, but I never wrote anything about how it is done, nor did I add example commands.
BWA takes our DNA reads (in FASTQ format) and the reference genome (in my project, the human genome) and produces a .sai file. This file carries information about the alignment between the reads and the reference genome, and using this information I can separate out the unmapped ones.
First, we create our .sai file with the command below.
bwa aln $NGSDATAROOT/bwa/human_genome37 ChIP_NoIndex_L001_R1_complete_filtered.fastq > complete_alignment.sai
The .sai file we created is not very useful on its own, so we convert it into a SAM file and continue from there.
Tag: aln
Blog
Mapping (Alignment) with BWA
I had forgotten to write about this earlier. I did mention it, but I never wrote anything about how it is done, nor did I add example commands.
BWA takes our DNA reads (in FASTQ format) and the reference genome (in my project, the human genome) and produces a .sai file. This file carries information about the alignment between the reads and the reference genome, and using this information I can separate out the unmapped ones.
First, we create our .sai file with the command below.
bwa aln $NGSDATAROOT/bwa/human_genome37 ChIP_NoIndex_L001_R1_complete_filtered.fastq > complete_alignment.sai
The .sai file we created is not very useful on its own, so we convert it into a SAM file and continue from there.
Tag: mapping (eşleştirme)
Blog
Mapping (Alignment) with BWA
I had forgotten to write about this earlier. I did mention it, but I never wrote anything about how it is done, nor did I add example commands.
BWA takes our DNA reads (in FASTQ format) and the reference genome (in my project, the human genome) and produces a .sai file. This file carries information about the alignment between the reads and the reference genome, and using this information I can separate out the unmapped ones.
First, we create our .sai file with the command below.
bwa aln $NGSDATAROOT/bwa/human_genome37 ChIP_NoIndex_L001_R1_complete_filtered.fastq > complete_alignment.sai
The .sai file we created is not very useful on its own, so we convert it into a SAM file and continue from there.
Tag: human genome
Blog
Mapping (Alignment) with BWA
I had forgotten to write about this earlier. I did mention it, but I never wrote anything about how it is done, nor did I add example commands.
BWA takes our DNA reads (in FASTQ format) and the reference genome (in my project, the human genome) and produces a .sai file. This file carries information about the alignment between the reads and the reference genome, and using this information I can separate out the unmapped ones.
First, we create our .sai file with the command below.
bwa aln $NGSDATAROOT/bwa/human_genome37 ChIP_NoIndex_L001_R1_complete_filtered.fastq > complete_alignment.sai
The .sai file we created is not very useful on its own, so we convert it into a SAM file and continue from there.
Tag: mapping
Blog
Mapping (Alignment) with BWA
I had forgotten to write about this earlier. I did mention it, but I never wrote anything about how it is done, nor did I add example commands.
BWA takes our DNA reads (in FASTQ format) and the reference genome (in my project, the human genome) and produces a .sai file. This file carries information about the alignment between the reads and the reference genome, and using this information I can separate out the unmapped ones.
First, we create our .sai file with the command below.
bwa aln $NGSDATAROOT/bwa/human_genome37 ChIP_NoIndex_L001_R1_complete_filtered.fastq > complete_alignment.sai
The .sai file we created is not very useful on its own, so we convert it into a SAM file and continue from there.
Tag: sai
Blog
Mapping (Alignment) with BWA
I had forgotten to write about this earlier. I did mention it, but I never wrote anything about how it is done, nor did I add example commands.
BWA takes our DNA reads (in FASTQ format) and the reference genome (in my project, the human genome) and produces a .sai file. This file carries information about the alignment between the reads and the reference genome, and using this information I can separate out the unmapped ones.
First, we create our .sai file with the command below.
bwa aln $NGSDATAROOT/bwa/human_genome37 ChIP_NoIndex_L001_R1_complete_filtered.fastq > complete_alignment.sai
The .sai file we created is not very useful on its own, so we convert it into a SAM file and continue from there.
Tag: samse
Blog
Mapping (Alignment) with BWA
I had forgotten to write about this earlier. I did mention it, but I never wrote anything about how it is done, nor did I add example commands.
BWA takes our DNA reads (in FASTQ format) and the reference genome (in my project, the human genome) and produces a .sai file. This file carries information about the alignment between the reads and the reference genome, and using this information I can separate out the unmapped ones.
First, we create our .sai file with the command below.
bwa aln $NGSDATAROOT/bwa/human_genome37 ChIP_NoIndex_L001_R1_complete_filtered.fastq > complete_alignment.sai
The .sai file we created is not very useful on its own, so we convert it into a SAM file and continue from there.
Tag: samtools
Blog
SAM Files - BAM Files - samtools
The pipeline I actually need to program will run its analyses directly on the unmapped reads. However, since I could not find such a dataset, and the only data I have contains both mapped and unmapped reads, I first had to get rid of the mapped ones.
As I mentioned before, I do this with the bwa aligner (mapper). After a series of steps bwa produces a SAM file, but what I need is a FASTQ file. For that, I will convert the SAM file into the similar BAM format with samtools, and then obtain my FASTQ file with the bam2fastq tool.
Tag: samtools view
Blog
SAM Files - BAM Files - samtools
The pipeline I actually need to program will run its analyses directly on the unmapped reads. However, since I could not find such a dataset, and the only data I have contains both mapped and unmapped reads, I first had to get rid of the mapped ones.
As I mentioned before, I do this with the bwa aligner (mapper). After a series of steps bwa produces a SAM file, but what I need is a FASTQ file. For that, I will convert the SAM file into the similar BAM format with samtools, and then obtain my FASTQ file with the bam2fastq tool.
Tag: paired-end reads (çift-sonlu okuma)
Blog
The First Step: Obtaining the Unmapped Reads
As I mentioned before, the first part of my project is extracting the unmapped reads from the FASTQ file. This way, I remove the sequences I do not need from the later analyses and reduce the workload of those analyses.
Since my goal from the start has been to design a pipeline that carries out the whole project step by step, I will do this with a Perl script. This script is the first one in the pipeline, and it is where the raw FASTQ-format data that will come from the laboratory is used as input. Actually I would not normally need this script; I only added this step because my data also contains mappable reads.
Tag: extract unmapped reads
Blog
The First Step: Obtaining the Unmapped Reads
As I mentioned before, the first part of my project is extracting the unmapped reads from the FASTQ file. This way, I remove the sequences I do not need from the later analyses and reduce the workload of those analyses.
Since my goal from the start has been to design a pipeline that carries out the whole project step by step, I will do this with a Perl script. This script is the first one in the pipeline, and it is where the raw FASTQ-format data that will come from the laboratory is used as input. Actually I would not normally need this script; I only added this step because my data also contains mappable reads.
Tag: mapped reads
Blog
The First Step: Obtaining the Unmapped Reads
As I mentioned before, the first part of my project is extracting the unmapped reads from the FASTQ file. This way, I remove the sequences I do not need from the later analyses and reduce the workload of those analyses.
Since my goal from the start has been to design a pipeline that carries out the whole project step by step, I will do this with a Perl script. This script is the first one in the pipeline, and it is where the raw FASTQ-format data that will come from the laboratory is used as input. Actually I would not normally need this script; I only added this step because my data also contains mappable reads.
Tag: paired-end reads
Blog
The First Step: Obtaining the Unmapped Reads
As I mentioned before, the first part of my project is extracting the unmapped reads from the FASTQ file. This way, I remove the sequences I do not need from the later analyses and reduce the workload of those analyses.
Since my goal from the start has been to design a pipeline that carries out the whole project step by step, I will do this with a Perl script. This script is the first one in the pipeline, and it is where the raw FASTQ-format data that will come from the laboratory is used as input. Actually I would not normally need this script; I only added this step because my data also contains mappable reads.
Tag: single-end reads
Blog
The First Step: Obtaining the Unmapped Reads
As I mentioned before, the first part of my project is extracting the unmapped reads from the FASTQ file. This way, I remove the sequences I do not need from the later analyses and reduce the workload of those analyses.
Since my goal from the start has been to design a pipeline that carries out the whole project step by step, I will do this with a Perl script. This script is the first one in the pipeline, and it is where the raw FASTQ-format data that will come from the laboratory is used as input. Actually I would not normally need this script; I only added this step because my data also contains mappable reads.
Tag: single-end reads (tek-sonlu okuma)
Blog
The First Step: Obtaining the Unmapped Reads
As I mentioned before, the first part of my project is extracting the unmapped reads from the FASTQ file. This way, I remove the sequences I do not need from the later analyses and reduce the workload of those analyses.
Since my goal from the start has been to design a pipeline that carries out the whole project step by step, I will do this with a Perl script. This script is the first one in the pipeline, and it is where the raw FASTQ-format data that will come from the laboratory is used as input. Actually I would not normally need this script; I only added this step because my data also contains mappable reads.
Tag: unmapped reads
Blog
İlk Adım: Eşleşmeyen Okumaları Elde Etmek
Projemin ilk kısmı daha önce bahsettiğim gibi eşleşmeyen okumaları (unmapped reads) FASTQ dosyasından çıkarmak. Böylece, daha sonraki analizler için elimdeki ihtiyacım olmayan dizileri çıkarmış ve bu analizlerdeki iş yükünü azaltmış oluyorum.
Başından beri hedefim, tüm projeyi adım adım gerçekleştiren bir pipeline tasarlamak olduğu için bu işlemi bir Perl scripti ile yapacağım. Bu script pipeline’in ilk scripti ve laboratuvardan gelecek ham (raw) FASTQ formatındaki verinin girdi (input) olarak kullanılacağı yer. Aslında bu scripte ihtiyacım olmayacak, sadece elimdeki verinin eşlenebilen verileri de içermesi sebebiyle bu adımı ekledim.
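The post implements this step as a Perl script that is not shown; as a rough illustration of the filtering it describes, here is a minimal Python sketch. It assumes the mapper's output is in SAM format, where flag bit 0x4 marks an unmapped read, and writes the unmapped reads out as FASTA records:

```python
def unmapped_to_fasta(sam_lines):
    """Yield FASTA records for reads whose SAM flag has bit 0x4 set
    (0x4 = segment unmapped)."""
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & 0x4:                    # read did not map to the reference
            yield f">{name}\n{seq}"

# Tiny hand-made SAM stream: r1 is mapped, r2 is unmapped.
sam = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII",
]
print(list(unmapped_to_fasta(sam)))       # only r2 survives the filter
```

This is only a sketch of the idea, not the author's pipeline code; a real run would stream the output of the aligner rather than a hard-coded list.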
Tag: forwarding blog posts to facebook
Tag: auto-publishing blog posts on facebook
Tag: from blog to facebook
Tag: from blog to linkedin
Tag: from blog to twitter
Tag: blogspot
Tag: facebook
Tag: feed
Tag: linkedin
Tag: twitter
Tag: twitterfeed
Blog
Forwarding Blog Posts to Facebook, Twitter, and LinkedIn
Starting a blog on a topic I am interested in and writing informative posts had been on my mind for a long time. I have finally started writing, little by little. I hope it has gone well so far.
In this post I will cover an "off-topic" subject that is not closely related to the blog's theme.
I wanted to use social media to reach a wide audience easily, but copy-pasting each post's link every single time is no simple chore.
After some searching, I found a tool that lets us connect our blog to our Facebook, Twitter, and LinkedIn accounts and forward new posts to all of them at once.
Tag: basic local alignment search tool
Tag: blastn
Blog
MegaBLAST - A Tool for Finding Similarities Between Sequences
MegaBLAST is part of the BLAST (Basic Local Alignment Search Tool) package included in the HUSAR suite, and a variant of BLASTN. MegaBLAST handles long sequences more efficiently than BLASTN and runs much faster, but is less sensitive. This makes it well suited to searching large databases for similar sequences.
The program I will write will take a FASTA file containing multiple sequences and run the megablast command. After that, for each read, a …
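Before anything can be handed to megablast, the multi-sequence FASTA file has to be split into individual records. As a small illustration of that first step (a generic FASTA reader, not the author's Perl code), a Python sketch:

```python
def read_fasta(lines):
    """Parse a multi-sequence FASTA stream into (header, sequence) pairs.
    A record starts at a '>' line; sequence lines until the next '>' are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)   # emit the finished record
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)           # emit the last record

fasta = [">read1", "ACGT", "TTGG", ">read2", "CCAA"]
print(list(read_fasta(fasta)))
```

Each `(header, sequence)` pair could then be written to its own file or passed to the `megablast` command line; that invocation is omitted here since its exact options depend on the installation.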
Tag: dkfz
Blog
MegaBLAST - A Tool for Finding Similarities Between Sequences
Blog
The Contaminant (Pollutant) Analysis Project
To start with, I will describe in detail the small project I was given so that I can get used to the tools, the programming language — in short, to bioinformatics.
We know that, however hard we try to prevent it in our laboratory work, the risk of contaminants is always present. The less of it the better; later on we can even quantify it and use that figure as another check on our result. One method for finding it is DNA analysis: the DNA of the sample you are working on is sequenced, that DNA is analyzed with various programs, and the contaminating organisms can be identified from their DNA.
Blog
The FASTQ Format - FASTQ Files
Today I received the "test" sequence data I will use while building the program. It consists of two FASTQ files, each compressed and yet still around 6 GB in size. Since I do not want to lose too much time, I will of course use only a portion of one of these files.
My goal is to find the mappable reads in these FASTQ files with the BWA tool and then remove them, and to save the remaining unmapped reads in a language MegaBLAST can understand (the FASTA format).
Since I am doing the whole project on a Unix machine, I am learning many commands along the way; I will try to write about those separately later.
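For reference, a FASTQ file stores each read as four lines: an `@identifier` line, the sequence, a `+` separator, and a per-base quality string. A minimal Python sketch of walking such a file (illustrative only; real files would be read with `gzip.open` since they arrive compressed):

```python
def fastq_records(lines):
    """Group a FASTQ stream into its 4-line records:
    @identifier / sequence / + / quality string."""
    it = iter(lines)
    for header in it:
        seq, plus, qual = next(it), next(it), next(it)
        # sanity-check the record markers
        assert header.startswith("@") and plus.startswith("+")
        yield header[1:], seq, qual

fq = ["@r1", "ACGT", "+", "IIII", "@r2", "GGCC", "+", "FFFF"]
print(list(fastq_records(fq)))
```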
Blog
BWA (Burrows-Wheeler Aligner) - an Aligner / Mapper
As I mentioned in my previous post, I will use an aligner (or mapper) to find out to what extent my data maps to the reference genome. I will then run some analyses on the unmapped part.
BWA (Burrows-Wheeler Aligner) is a program that aligns relatively short sequences against long reference genomes such as the human genome. The bwa-short algorithm is used for reads up to 200 bp (bp: base pairs) in length, and the BWA-SW algorithm for 200 bp to 100 kbp.
Many factors play a role in choosing an aligner / mapper. There are many tools of this kind, each with different features.
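The read-length rule of thumb described above can be stated as a tiny helper (purely illustrative; in practice the algorithm is selected via BWA's own command line, not code like this):

```python
def bwa_algorithm(read_length_bp):
    """Pick the BWA algorithm suggested by the post, based on read length."""
    if read_length_bp <= 200:
        return "bwa-short"       # short reads, up to 200 bp
    elif read_length_bp <= 100_000:
        return "BWA-SW"          # longer reads, 200 bp to 100 kbp
    raise ValueError("outside the ranges described in the post")

print(bwa_algorithm(100), bwa_algorithm(5000))
```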
Blog
Pipelines and Pipeline Development
Continuing the introductory lessons I received today, I got detailed information on pipelines and pipeline development. A pipeline is literally a pipe line, such as the system used to carry oil from one place to another through pipes. In computing terminology, it means a chain of processing elements arranged so that the output of one element is the input of the next. This way, far more complicated operations can be carried out easily and in an organized fashion by building a pipeline. I believe pipeline is translated into Turkish as "ardışık düzen", but I will stick with "pipeline".
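The definition above — a chain in which one element's output is the next element's input — can be sketched in a few lines of Python (a generic illustration with made-up toy stages, not the author's actual pipeline):

```python
from functools import reduce

def pipeline(*stages):
    """Compose processing elements so that each stage's output
    feeds the next stage's input."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)

# Toy stages standing in for real steps (filter, convert):
strip_short = lambda reads: [r for r in reads if len(r) >= 4]   # drop short reads
upper = lambda reads: [r.upper() for r in reads]                # normalize case

run = pipeline(strip_short, upper)
print(run(["acgt", "gg", "ttaacc"]))   # ['ACGT', 'TTAACC']
```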
Blog
WWW2HUSAR - HUSAR's Web Interface
On the second day of my internship we talked about HUSAR's web interface. HUSAR is software that can be used and managed with commands from the command prompt, but a web interface has been built to make this easier. With this interface, which they named WWW2HUSAR, you can easily pick from the listed tools, add your genetic sequence, and carry out many other operations with just a few clicks.
We also looked a bit further into HUSAR's functions. In the software, by creating gene sequence lists in a local folder, you can compare the similarities of genes with the multiple sequence alignment tool and, for instance, uncover their evolutionary relationships.
Blog
Internship at the DKFZ - Heidelberg Bioinformatics Unit
The summer internship I am doing through the Erasmus program has begun. First I received a few hours of introductory lessons from one of the scientists who run the unit. In these lessons I learned about the unit's short history, the projects it has carried out to date, and their details.
The Bioinformatics Unit is a group attached to the Genomics and Proteomics Core Facility, one of the core facilities of the DKFZ (Deutsches Krebsforschungszentrum, Eng. German Cancer Research Center). They also go by the name HUSAR (Heidelberg Unix Sequence Analysis Resources), which is also the name of the sequence analysis package the group develops.
Tag: bioinformatics
Blog
The Contaminant (Pollutant) Analysis Project
Blog
Pipelines and Pipeline Development
Blog
WWW2HUSAR - HUSAR's Web Interface
Blog
Biyoinformatik or Biyoenformatik?
While looking for topics for my posts, I browse the internet as well as books. There are of course plenty of foreign sources and they are sufficient, but when I looked at Turkish sources, the first thing that caught my eye was the different spellings of this field's name.
As you know, in English the field is called bioinformatics. That is quite natural, since the English informatics comes from the word information plus the -ics suffix, and that word has a Latin origin1. The word entered Turkish from the French informatique as enformatik, and bilişim has also been proposed as a Turkish equivalent2. The French word, of course, shares the same origin as the English one.
Blog
7th International Symposium on Health Informatics and Bioinformatics
The 7th International Symposium on Health Informatics and Bioinformatics (HIBIT 2012), first organized in 2005 by the METU Informatics Institute, aims to bring together academics and researchers in Health Informatics, Medical Informatics, Computational Biology, and Bioinformatics, to provide a venue for presenting work in these fields, and to enable interactive discussion of that work.
This year, HIBIT 2012 will be held on 19-22 April 2012 at the Perissia Hotel in Ürgüp, Nevşehir, organized in partnership with METU, the METU Informatics Institute, the METU Department of Biological Sciences, and the METU Department of Computer Engineering.
Blog
What Is Bioinformatics? A Definition of Bioinformatics
With the sequencing of many organisms' genomes and, finally, of the human genome in 2001 — giving us the sequence of all 3 billion base pairs — fields emerged to use this information in different ways. Alongside the fields that try to understand these genes and to determine the proteins they encode, the need to analyze this information gave birth to the field of Bioinformatics.
Bioinformatics is the analysis of biological information using computers and statistical techniques; in other words, bioinformatics is the science of developing and utilizing computer databases and algorithms to improve and accelerate biological research [1].
Blog
Welcome!
Hello,
Through this blog I will be learning (together with my prospective visitors) about Bioinformatics, my particular interest within biology and a field I need to explore further and learn much more about. I just finished my first post with definitions of Bioinformatics given by various authorities. Later on I also want to cover the definitions of many of the principles that come up in Bioinformatics. Programming languages and statistical methods relevant to Bioinformatics will also be among my topics. At the same time, I plan to cover Bioinformatics news and, through it, to follow (and help you follow) the latest developments.
Tag: dna sequence
Tag: pollutant analysis
Tag: contaminant analysis
Tag: tel aviv university
Blog
The Contaminant (Pollutant) Analysis Project
Tag: gzip
Tag: header
Tag: identifier
Blog
The FASTQ Format - FASTQ Files
Tag: accuracy
Tag: burrows-wheeler aligner
Tag: bwa mapper
Tag: mapper
Tag: genome mapper
Tag: aligner
Tag: structural variations
Blog
BWA (Burrows-Wheeler Aligner) - an Aligner / Mapper
Tag: bacteria
Blog
Detecting the Organisms that Contaminate Sequencing Studies
The first study I will begin in my summer internship is slowly taking shape. In it, I will build a pipeline and use it to find the organisms that contaminate sequencing samples in laboratories.
In laboratories, samples can become contaminated by other organisms or by foreign DNA for many reasons. These can be bacteria or yeast, or even viral DNA. After you sequence a DNA sample, its alignment rate against the reference can come out very low, which suggests that foreign DNA may be present. Another possible cause is that the reference DNA itself is different.
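The signal described here — a low fraction of reads aligning to the reference — is simple to compute. A minimal sketch with hypothetical counts (the 0.9 cutoff is an illustrative threshold, not one taken from the post):

```python
def mapping_rate(mapped, total):
    """Fraction of reads that aligned to the reference; an unusually low
    value can hint at foreign DNA in the sample, as described above."""
    return mapped / total

# Hypothetical counts for illustration only.
rate = mapping_rate(mapped=700_000, total=1_000_000)
print(rate, rate < 0.9)   # a rate well below the cutoff warrants a closer look
```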
Tag: dizileme
Blog
Dizileme Çalışmalarını Kirleten Organizmaları Tespit Etme
Bu yaz stajımda ilk olarak başlayacağım çalışma yavaş yavaş şekilleniyor. Bu çalışmada bir pipeline oluşturup, bunu laboratuvarlarda dizileme (sequencing) örneklerini kirleten organizmaları bulmaya çalışacağım.
Laboratuvarlarda birçok nedenden dolayı örnekler başka organizmalar ya da yabancı DNA tarafından kirlenebiliyor. Bunlar bakteri, maya olabilir ya da bir virüs DNA’sı da olabilir. Siz bir DNA’yı diziledikten sonra onun referansıyla eşleştirme çok az oranda çıkabiliyor. Bu da yabancı DNA’nın olabileceğini gösteriyor. Bir başka neden referans DNA’nın farklı olması da olabilir.
Tag: genom dizileme
Blog
Dizileme Çalışmalarını Kirleten Organizmaları Tespit Etme
Bu yaz stajımda ilk olarak başlayacağım çalışma yavaş yavaş şekilleniyor. Bu çalışmada bir pipeline oluşturup, bunu laboratuvarlarda dizileme (sequencing) örneklerini kirleten organizmaları bulmaya çalışacağım.
Laboratuvarlarda birçok nedenden dolayı örnekler başka organizmalar ya da yabancı DNA tarafından kirlenebiliyor. Bunlar bakteri, maya olabilir ya da bir virüs DNA’sı da olabilir. Siz bir DNA’yı diziledikten sonra onun referansıyla eşleştirme çok az oranda çıkabiliyor. Bu da yabancı DNA’nın olabileceğini gösteriyor. Bir başka neden referans DNA’nın farklı olması da olabilir.
Tag: kirletici
Blog
Dizileme Çalışmalarını Kirleten Organizmaları Tespit Etme
Bu yaz stajımda ilk olarak başlayacağım çalışma yavaş yavaş şekilleniyor. Bu çalışmada bir pipeline oluşturup, bunu laboratuvarlarda dizileme (sequencing) örneklerini kirleten organizmaları bulmaya çalışacağım.
Laboratuvarlarda birçok nedenden dolayı örnekler başka organizmalar ya da yabancı DNA tarafından kirlenebiliyor. Bunlar bakteri, maya olabilir ya da bir virüs DNA’sı da olabilir. Siz bir DNA’yı diziledikten sonra onun referansıyla eşleştirme çok az oranda çıkabiliyor. Bu da yabancı DNA’nın olabileceğini gösteriyor. Bir başka neden referans DNA’nın farklı olması da olabilir.
Tag: maya
Blog
Dizileme Çalışmalarını Kirleten Organizmaları Tespit Etme
Bu yaz stajımda ilk olarak başlayacağım çalışma yavaş yavaş şekilleniyor. Bu çalışmada bir pipeline oluşturup, bunu laboratuvarlarda dizileme (sequencing) örneklerini kirleten organizmaları bulmaya çalışacağım.
Laboratuvarlarda birçok nedenden dolayı örnekler başka organizmalar ya da yabancı DNA tarafından kirlenebiliyor. Bunlar bakteri, maya olabilir ya da bir virüs DNA’sı da olabilir. Siz bir DNA’yı diziledikten sonra onun referansıyla eşleştirme çok az oranda çıkabiliyor. Bu da yabancı DNA’nın olabileceğini gösteriyor. Bir başka neden referans DNA’nın farklı olması da olabilir.
Tag: pipeline development
Tag: pipeline geliştirme
Tag: viral dna
Tag: virüs
Tag: erasmus
Blog
WWW2HUSAR - HUSAR's Web Interface
On the second day of my internship we talked about HUSAR's web interface. HUSAR is software that can be used and managed with commands from the command prompt, but a web interface has been built to make this easier. With this interface, which they call WWW2HUSAR, you can easily select from the listed tools, add your genetic sequence, and perform many other operations with just a few clicks.
We also looked a bit more at HUSAR's functions. In the software, you can build lists of gene sequences in a local folder and then compare the genes' similarities with the multiple sequence alignment tool, for example to uncover their evolutionary relationships.
Blog
DKFZ - Internship at the Heidelberg Bioinformatics Unit
The summer internship I am doing through the Erasmus programme has begun. First, I received a few hours of introductory lectures from the scientists who run the unit. In these lectures I learned about the unit's short history, the projects it has carried out to date, and their details.
The Bioinformatics Unit is a group attached to the Genomics and Proteomics Core Facility, one of the core facilities of the DKFZ (Deutsches Krebsforschungszentrum, the German Cancer Research Center). The group also goes by the name HUSAR (Heidelberg Unix Sequence Analysis Resources), which is also the name of the sequence analysis package it develops.
Tag: w2h
Tag: heidelberg
Tag: www to husar
Tag: www2husar
Tag: centos
Tag: german cancer research center
Tag: biyoinformatik
Blog
Biyoinformatik or Biyoenformatik?
While hunting for topics for my posts, I browse the internet as well as books. There are of course plenty of adequate foreign sources, but when I looked at Turkish sources, the first thing that caught my eye was the different forms used for the field's name.
As you know, in English the field is called bioinformatics. That makes sense, since in English "informatics" comes from "information" plus the suffix "-ics", and that word has a Latin origin1. The word "enformatik" entered Turkish from the French "informatique", and "bilişim" has also been proposed as a Turkish equivalent2. The French word, of course, shares the same root as the English one.
Tag: enformasyon
Tag: etimoloji
Tag: fransızca
Tag: ingilizce
Tag: latince
Tag: tdk
Tag: türkçe
Tag: wiktionary
Tag: hesaplamalı biyoloji
Blog
7th International Symposium on Health Informatics and Bioinformatics
The 7th International Symposium on Health Informatics and Bioinformatics (HIBIT 2012) was first organized in 2005 by the METU Informatics Institute. It aims to bring together academics and researchers in Health Informatics, Medical Informatics, Computational Biology, and Bioinformatics, to provide a venue for presenting work in these fields, and to enable interactive discussion of that work.
This year, HIBIT 2012 will be held on 19-22 April 2012 at the Perissia Hotel in Ürgüp, Nevşehir, organized in partnership with METU, the METU Informatics Institute, the METU Department of Biological Sciences, and the METU Department of Computer Engineering.
Tag: hibit
Tag: odtü bilgisayar
Tag: odtü biyoenformatik
Tag: odtü biyoloji
Tag: odtü moleküler biyoloji ve genetik
Tag: tıbbi enformatik
Tag: bilgisayar
Blog
What Is Bioinformatics? A Definition of Bioinformatics
With the sequencing of the genomes of many organisms, and most recently of the human genome in 2001, with all 3 billion base pairs obtained, fields emerged that would use this information in different ways. Alongside the fields that try to understand these genes and to determine the proteins they encode, the need to analyze this information gave rise to the field of Bioinformatics.
Bioinformatics is the analysis of biological information using computers and statistical techniques; in other words, bioinformatics is the science of developing and exploiting computer databases and algorithms to improve and accelerate biological research [1].
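As a minimal illustration of "analyzing biological information with a computer", here is a classic first exercise: computing the GC content of a DNA sequence. The example sequence is invented.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in a DNA sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)

# 4 of the 8 bases are G or C -> 0.5
print(round(gc_content("ATGCGCAT"), 2))  # 0.5
```

Real analyses chain many such small computations over databases of sequences, which is where the algorithms and statistics mentioned in the definition come in.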
Tag: bilgisayar bilimi
Tag: bilim
Tag: bilişim
Tag: bilişim teknolojisi
Tag: biyoenformatik nedir
Tag: biyoenformatik tanımı
Tag: biyoloji
Tag: biyoteknoloji
Tag: nih
Tag: teknoloji
Tag: tıp
Tag: giriş yazısı
Blog
Here I Am! Welcome!
Hello,
Through this blog I will be learning (together with any visitors I may have) about Bioinformatics, a field within biology that particularly interests me and that I still need to explore and learn a great deal about. A little while ago I finished my first post, on the definitions of Bioinformatics given by various authorities. Later, I also want to cover the definitions of many of the principles that come up in Bioinformatics. Programming languages and statistical methods relevant to Bioinformatics will be further topics of my posts. At the same time, I plan to include Bioinformatics news and, through it, to follow (and help others follow) the latest developments.
Tag: weblog