Güngör Budak's Blog

Bioinformatics, web programming, coding in general

Tags Cloud Sorted by Post Count for Jekyll Blogs without Plugins

Recently, I have been moving my old posts from my Blogger blog to my new Jekyll blog, since I really like this way of blogging. But there were some features I liked in Blogger that weren’t supported in Jekyll by default. After some research, I found a very nice way of generating a tags cloud for my blog.

Although I build my blog locally and then push it to GitHub Pages, I still try to avoid custom plugins. This solution doesn’t use any, so it should work on GitHub Pages too.

The idea comes from this Stack Overflow answer by Christian Specht. Briefly, he puts each tag’s post count in front of the tag name and then sorts these strings, which ends up sorting the tags by count. He needed to add a large value to the counts because string sorting compares characters, not numeric values (as strings, "10" sorts before "9"), so the added value gives all counts the same number of digits. But his solution didn’t work for tags with spaces (issue 1), and although the ordering by count was correct, the tags were not in alphabetical order (they were in reverse order, issue 2).

So in the solution below, I fix these two issues with his approach.

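First, I build the list of tags, prefixing each tag name with its count plus -10000 and appending the actual count. Adding -10000 turns counts such as 12 and 5 into -9988 and -9995, and since "-9988" sorts before "-9995" as a string, tags with more posts come first while tags with the same count stay in ascending alphabetical order (solves issue 2). I also replace spaces in tag names with ## so that a tag with spaces stays a single item (solves issue 1). These three pieces are joined with ###, which is unusual enough to serve as a separator.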

Then I convert this captured string into an array with split: ' ' and sort the array. Finally, I split each item on ### to get the tag name and post count, turn ## back into spaces, and output the tags in the correct order with their post counts.

There is one more thing in the code below: size, which is used to style the tags. I add a tag-size-N class to each tag, where N is its post count capped at 5, so that tags are shown in different font sizes according to their number of posts. The CSS is also included below.

{% capture tags %}
  {% for tag in site.tags %}
    {{ tag[1].size | plus: -10000 }}###{{ tag[0] | replace: ' ', '##' }}###{{ tag[1].size }}
  {% endfor %}
{% endcapture %}
{% assign sorted_tags = tags | split: ' ' | sort %}
{% for sorted_tag in sorted_tags %}
    {% assign items = sorted_tag | split: '###' %}
    {% assign tag = items[1] | replace: '##', ' ' %}
    {% assign count = items[2] | plus: 0 %}
    {% if count > 5 %}
        {% assign size = 5 %}
    {% else %}
        {% assign size = count %}
    {% endif %}
    <span class="tag-size-{{ size }}">
        <a class="tag-link" href="/blog/tag/{{ tag | slugify }}" rel="tag">{{ tag }}</a> ({{ count }})
    </span>
{% endfor %}
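Since the Liquid template relies entirely on plain string sorting, it can help to see the same steps outside Jekyll. Below is a small Python sketch that mimics the template: it builds the "count-10000###name###count" items, splits them on whitespace, sorts them as strings and unpacks them again. The tag data and the tag_cloud function are hypothetical, and I’m assuming Liquid’s split: ' ' behaves like Ruby’s whitespace split (runs of spaces and newlines are dropped), which is what Python’s str.split() does.

```python
# Hypothetical tag -> post count data standing in for Jekyll's site.tags.
TAGS = {
    "python": 12,
    "r": 5,
    "web programming": 5,
    "jekyll": 2,
    "bioinformatics": 12,
}


def tag_cloud(tags):
    """Mimic the Liquid template: build keyed strings, sort them, unpack."""
    # {% capture %}: one "count-10000###name###count" item per line,
    # with spaces in tag names replaced by "##".
    captured = "\n".join(
        f"    {count - 10000}###{name.replace(' ', '##')}###{count}"
        for name, count in tags.items()
    )
    # split: ' ' is assumed to split on runs of whitespace (like str.split()),
    # which is why spaces inside tag names had to be replaced first.
    items = sorted(captured.split())
    cloud = []
    for item in items:
        _, name, count = item.split("###")
        count = int(count)  # like | plus: 0
        size = min(count, 5)  # tag-size-N class, capped at 5
        cloud.append((name.replace("##", " "), count, size))
    return cloud


for name, count, size in tag_cloud(TAGS):
    print(f"tag-size-{size}: {name} ({count})")
```

The two tags with 12 posts come first (bioinformatics before python), then the two with 5 posts, then jekyll. Because the comparison is character by character, the trick relies on every count - 10000 prefix having the same length (five characters), which holds for any count up to 9000.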

Styles for the tags:

.tag-size-5 {
    font-size: 1.25rem;
}

.tag-size-4 {
    font-size: 1.10rem;
}

.tag-size-3 {
    font-size: 0.95rem;
}

.tag-size-2 {
    font-size: 0.80rem;
}

.tag-size-1 {
    font-size: 0.65rem;
}

Have a look at the tags cloud on the right for the live example!

How to Generate Database EER Diagrams from SQL Scripts using MySQL Workbench

MySQL Workbench makes it really easy to generate EER diagrams from SQL scripts. Follow the steps below to make one yourself.

Download and install MySQL Workbench for your system.

Open MySQL Workbench and create a new model (File -> New Model).

MySQL Workbench New Model

Then import your SQL script (File -> Import -> Reverse Engineer MySQL Create Script). Note that you should select the MySQL Model tab before doing this to be able to import the SQL script.

MySQL Workbench Reverse Engineer MySQL Create Script

This opens a window where you select your SQL file. Make sure that Place imported objects on a diagram is checked; this generates the diagram for you automatically.

Click Execute and, once it completes, click Continue and then Close to finish.

This applies to MySQL Workbench 6.3.9 on macOS Sierra 10.12.6.

Get Size of MySQL Databases

Use the query below in the MySQL command-line client to get a table of databases and their sizes in MB.

SELECT table_schema "DB Name", Round(Sum(data_length + index_length) / 1024 / 1024, 1) "DB Size in MB" FROM information_schema.tables GROUP BY table_schema;
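To make the arithmetic in the query concrete, here is a small Python sketch of what it computes: for each schema, the data and index bytes of all its tables are summed and then divided by 1024 twice. The rows are made-up numbers standing in for information_schema.tables, and db_sizes_mb is a hypothetical helper. I use Decimal with ROUND_HALF_UP because MySQL’s ROUND() rounds exact (DECIMAL) values half away from zero, unlike Python’s built-in round(), so this only roughly mirrors the server-side result.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# Made-up rows standing in for information_schema.tables:
# (table_schema, data_length, index_length), all sizes in bytes.
ROWS = [
    ("blog", 3_145_728, 1_048_576),
    ("blog", 524_288, 0),
    ("shop", 10_485_760, 2_621_440),
    ("tiny", 1_100_000, 0),
]


def db_sizes_mb(rows):
    """Sum data + index bytes per schema and convert to MB (1 decimal)."""
    totals = defaultdict(int)
    for schema, data_length, index_length in rows:
        totals[schema] += data_length + index_length  # GROUP BY table_schema
    return {
        schema: (Decimal(total) / 1024 / 1024).quantize(
            Decimal("0.1"), rounding=ROUND_HALF_UP
        )
        for schema, total in totals.items()
    }


for schema, size in db_sizes_mb(ROWS).items():
    print(schema, size)
```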

Computing Significance of Overlap between Two Sets using Hypergeometric Test

There are many cases where we have two sets of things such as transcripts, genes or proteins (e.g. from two different conditions) and want to compute the significance of the overlap between them. The hypergeometric test is a very simple and widely used option for such cases.

I’ll use the phyper function in R but you can use the same idea in SciPy (Python).

Let’s say that, out of 200 genes in total (A), you have:

  • 10 genes common or overlapping (set B ∩ set C)
  • 25 genes in set B
  • 50 genes in set C
  • 135 genes not in set B or set C

Hypergeometric test

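To compute the significance of the overlap, i.e. the probability of seeing 10 or more common genes by chance, take the upper tail of the hypergeometric distribution. Note that with lower.tail = FALSE, phyper(q, ...) returns P(X > q), so q must be the observed overlap minus one:

phyper(10 - 1, 50, 200 - 50, 25, lower.tail = FALSE)

This gives a p-value of about 0.058. Passing 10 itself, as in phyper(10, 50, 200 - 50, 25, lower.tail = FALSE), returns 0.0214406, which is the probability of 11 or more common genes.

So, if your threshold for the p-value is 0.05 (or 5%), an overlap of 10 genes is not quite significant here, whereas an overlap of 11 or more would be.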
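The upper-tail probability can also be checked without R. Below is a small Python sketch (standard library only) that sums the hypergeometric probabilities P(X = i) = C(50, i) C(150, 25 - i) / C(200, 25) exactly with math.comb; hypergeom_upper_tail is a hypothetical helper name. With SciPy installed, scipy.stats.hypergeom.sf(9, 200, 50, 25) should give the same value, since its sf(k) is also P(X > k).

```python
from math import comb


def hypergeom_upper_tail(k, total, in_c, in_b):
    """P(X >= k): chance that a random set of in_b items, drawn from `total`
    items of which in_c belong to set C, contains at least k items of set C."""
    denominator = comb(total, in_b)
    numerator = sum(
        comb(in_c, i) * comb(total - in_c, in_b - i)
        for i in range(k, min(in_b, in_c) + 1)
    )
    return numerator / denominator


p_at_least_10 = hypergeom_upper_tail(10, 200, 50, 25)  # about 0.058
p_more_than_10 = hypergeom_upper_tail(11, 200, 50, 25)  # about 0.0214
print(p_at_least_10, p_more_than_10)
```

Starting the sum at k rather than k + 1 is what makes this "k or more", which is the usual question for an observed overlap.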

ODTÜ Enformatik Enstitüsü 20th Anniversary Event

ODTÜ Enformatik Enstitüsü (METU Informatics Institute) is celebrating the 20th anniversary of its founding with a science festival. Everyone is invited to the festival, which will be held on 16 May 2016 at the ODTÜ Kültür ve Kongre Merkezi (METU Culture and Convention Center)!

The festival, where science will be accompanied by art and music, will feature the following keynote speakers:

  • Prof. Dr. Jennifer Chayes: Managing Director and co-founder, Microsoft Research New England and Microsoft Research New York
  • Assoc. Prof. Claudio Ferretti: University of Milano-Bicocca, Computer Science, Systems and Communication
  • Dr. Christian Borgs: Researcher; Deputy Managing Director and co-founder, Microsoft Research New England

Visit the event’s Facebook page.

Event program

ODTÜ Enformatik 20th Year Program 1

ODTÜ Enformatik 20th Year Program 2

Mann Whitney U Test (Wilcoxon Rank-Sum Test) Javascript Implementation

Currently, JavaScript is really poor in statistical methods compared to Python (SciPy) and R. There are several efforts to fill this gap, most notably jStat. However, many functions, distributions and tests are still missing from this library. In one of my projects, I had to implement a JavaScript version of the Mann Whitney U test (also called the Wilcoxon rank-sum test). Here, I’m giving a link to its source code and describing how it works.

Description

The Mann Whitney U test is a nonparametric test whose null hypothesis is that two samples come from the same population. Suppose you have two groups of numbers that don’t follow any known distribution and you want to test whether they are different. In such cases, you’d use the Mann Whitney U test.

Mann-Whitney U test

The plots above show that minimal CD4 levels differ significantly between two groups (A/A vs. G/G, G/A) (Kobayashi et al., Jpn. J. Infect. Dis., 55, 131-133, 2002), with the p-values shown at the top.

This implementation is adapted from the SciPy and R source code and was tested against both with several datasets.

GitHub Gist for Mann Whitney U test Javascript implementation (mannwhitneyu.js).

How to use

Just download the mannwhitneyu.js file, add a script tag to your HTML that points to it, and call the test with your datasets. For example, create a file called index.html, paste the following into it and save it. Also make sure you place mannwhitneyu.js next to it.

<html>
<head>
    <title>Mann Whitney U test</title>
    <script type="text/javascript" src="mannwhitneyu.js"></script>
</head>
<body>
    <script type="text/javascript">
        var x = [2, 4, 6, 2, 3, 7, 5, 1],
            y = [8, 10, 11, 14, 20, 18, 19, 9];
        var t = mannwhitneyu.test(x, y, 'less'); // alternative: 'less', 'greater' or 'two-sided'
        console.log(t);
    </script>
</body>
</html>

Open index.html in your browser and look at the console (Ctrl + Shift + J in Chrome); you’ll see the result:

Object {U: 0, p: 0.0004654861357875073}

This result shows that the numbers in x are significantly smaller than the ones in y. The alternative argument can also be 'greater', which is again a one-sided test, checking whether the numbers in the first group are significantly greater. You can also pass 'two-sided' as the alternative to compute a two-sided test.
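For readers curious about what such a test computes, here is a minimal Python sketch of the standard normal approximation for the Mann Whitney U test: average ranks for ties, a tie-corrected variance and a continuity correction. This is my own illustration, not a port of mannwhitneyu.js, and the function names are hypothetical. For the x and y used above it gives U = 0 and a one-sided p-value of about 0.000465, in line with the console output.

```python
from math import erfc, sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def normal_cdf(z):
    return 0.5 * erfc(-z / sqrt(2))


def mann_whitney_u(x, y, alternative="two-sided"):
    """Normal approximation with tie and continuity corrections."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic of the first sample
    n = n1 + n2
    # Tie correction term: sum of (t^3 - t) over each group of t tied values.
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in tie_counts.values())
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    if alternative == "less":  # first sample tends to be smaller
        p = normal_cdf((u - mean + 0.5) / sd)
    elif alternative == "greater":  # first sample tends to be larger
        p = normal_cdf((mean - u + 0.5) / sd)
    else:  # two-sided
        p = min(1.0, 2 * normal_cdf(-(abs(u - mean) - 0.5) / sd))
    return {"U": u, "p": p}


x = [2, 4, 6, 2, 3, 7, 5, 1]
y = [8, 10, 11, 14, 20, 18, 19, 9]
print(mann_whitney_u(x, y, "less"))  # U = 0.0, p about 0.000465
```

The normal approximation keeps the sketch short; for small samples without ties, exact p-values are another common choice.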

Soon, I’ll send this code to jStat, and hopefully it’ll be available there. I’m also considering turning it into a Node module so that developers can easily include it in their Node projects.

MiClip 1.3 Installation

MiClip is a peak-calling algorithm for CLIP-seq data implemented in R. It is currently not listed on CRAN, but you can obtain it from the CRAN archive and install it from the source tar.gz file.

Download the tar.gz file:

wget https://cran.r-project.org/src/contrib/Archive/MiClip/MiClip_1.3.tar.gz

Start R:

R

Install dependencies:

install.packages("moments")
install.packages("VGAM")

Finally install MiClip 1.3:

install.packages("MiClip_1.3.tar.gz", repos = NULL, type="source")

Then you can test it by loading the package and viewing its help file.