Tips about Regular Expressions

  1. In Vim, matching is greedy by default. For non-greedy matching, use \{-} or \{-n,m}, for example: :/Q.\{-0,200}[IL].\{-0,2000}FF.
  2. In Perl, matching is also greedy by default. For non-greedy matching, change *, +, ?, and {} into *?, +?, ??, and {}?, respectively.
  3. To count the matches of a pattern in Vim, use ":%s/<pattern>//gn". The n flag reports the number of matches without changing the buffer. Without it, ":%s/<pattern>//g" also reports the count but deletes the matches, so run ":u" afterwards to undo the change.
  4. In Perl, to count the matches, you can use:
    my $number =()= $string =~ /\./gi;
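The same two ideas (greedy versus non-greedy matching, and counting matches) can be sketched in C++ with std::regex, whose default ECMAScript grammar uses the Perl-style *? and +? quantifiers. The helper names count_matches and first_match are my own, made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Count the non-overlapping matches of `pattern` in `text`
// (the role of "=()=" in the Perl one-liner).
std::size_t count_matches(const std::string& text, const std::string& pattern) {
    assert(!pattern.empty());  // an empty pattern would match at every position
    const std::regex re(pattern);
    const auto first = std::sregex_iterator(text.begin(), text.end(), re);
    const auto last = std::sregex_iterator();
    return static_cast<std::size_t>(std::distance(first, last));
}

// Return the text of the first match, or an empty string if nothing matches.
std::string first_match(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```

With the text "Q12I34I", the greedy pattern "Q.*I" matches the whole string, while the non-greedy "Q.*?I" stops at the first I and matches only "Q12I".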




Calculate the CDF of Poisson Distribution with Boost C++ Library

The cumulative distribution function (CDF) of the Poisson distribution can be easily calculated with the R function ppois() or the Octave/Matlab function poisscdf(). However, implementing statistics from scratch in C++ is not easy.

Today I found a very powerful C++ library, Boost, which covers mathematics and much more. Unfortunately, it is hard to figure out the API of Boost from its code if you are not familiar with generic programming (like me). Besides, I could not find a useful "hello world" example on the Internet showing how to do statistics with Boost (the examples on the webpage C++ Statistical Distributions in Boost are a bit complicated, and the header files are not included).

Therefore, I wrote the following example to show how to get the CDF of the Poisson distribution with the Boost C++ library:

#include <boost/math/distributions/poisson.hpp>
#include <iostream>

using namespace std;
using namespace boost::math;

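int main()
{
    // Poisson distribution with mean (lambda) 2.9
    poisson_distribution<> p(2.9);
    // P(X <= 1), the same quantity as ppois(1, 2.9) in R
    cout << "cdf: ppois(1,2.9)=" << cdf(p, 1) << endl;

    return 0;
}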
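As a cross-check that needs no library, the Poisson CDF can also be summed directly from its definition, P(X <= k) = sum over i = 0..k of e^(-lambda) * lambda^i / i!. This is only a minimal sketch for small k (the function name poisson_cdf is my own), not a replacement for Boost, which takes much more care with accuracy for large arguments:

```cpp
#include <cassert>
#include <cmath>

// P(X <= k) for X ~ Poisson(lambda), summed term by term.
double poisson_cdf(unsigned k, double lambda) {
    assert(lambda > 0.0);
    double term = std::exp(-lambda);  // probability of X == 0
    double sum = term;
    for (unsigned i = 1; i <= k; ++i) {
        term *= lambda / i;           // turns P(X == i-1) into P(X == i)
        sum += term;
    }
    return sum;
}
```

For k = 1 and lambda = 2.9 the sum is 3.9 * e^(-2.9), about 0.2146, so poisson_cdf(1, 2.9) should agree with the value printed by the Boost example.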


  2. C++ Statistical Distributions in Boost

An Implementation of Merge Sort in C

The following C code implements merge sort, which has a time complexity of O(n log n). I used it in my current project to sort 148 million integers. At first I used bubble sort, which took hours to sort the 148M integers because its time complexity is O(n^2). After I replaced it with merge sort, the sorting time dropped to less than 10 minutes. Amazing improvement! Although I had heard about the importance of sorting and searching algorithms for years, this was the first time I realized the magic of algorithms.

I found the merge sort below on the Internet. Sorry that I forgot to record the link to the webpage. I made only a few changes.


#include <stdio.h>
#include <stdlib.h>

/* Merge the sorted halves input[p..mid] and input[mid+1..r] into one sorted run. */
void Merge(int* input, long p, long r)
{
    long mid = p + (r - p) / 2;  /* same split point as in Merge_sort */
    long i1 = 0;
    long i2 = p;
    long i3 = mid + 1;
    long i;

    /* Temp array */
    int* temp = (int*)malloc((size_t)(r - p + 1) * sizeof(int));
    if (temp == NULL) {
        fprintf(stderr, "Merge: out of memory\n");
        exit(EXIT_FAILURE);
    }

    /* Merge in sorted form the 2 arrays */
    while (i2 <= mid && i3 <= r) {
        if (input[i2] <= input[i3])
            temp[i1++] = input[i2++];
        else
            temp[i1++] = input[i3++];
    }

    /* Merge the remaining elements in left array */
    while (i2 <= mid)
        temp[i1++] = input[i2++];

    /* Merge the remaining elements in right array */
    while (i3 <= r)
        temp[i1++] = input[i3++];

    /* Move from temp array to master array */
    for (i = p; i <= r; i++)
        input[i] = temp[i - p];

    free(temp);
}

/*
 * inputs:
 * p - the start index of array input
 * r - the end index of array input
 */
void Merge_sort(int* input, long p, long r)
{
    if (p < r) {
        long mid = p + (r - p) / 2;
        Merge_sort(input, p, mid);
        Merge_sort(input, mid + 1, r);
        Merge(input, p, r);
    }
}
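To get a feeling for why the difference is so large, here is a back-of-the-envelope estimate in C++. It simply compares n^2 with n*log2(n) and ignores all constant factors, memory effects and the extra copying that merge sort does, so it is an idealized step count, not a prediction of running time. The function names are my own:

```cpp
#include <cassert>
#include <cmath>

// Idealized step counts; constant factors are deliberately ignored.
double bubble_sort_steps(double n) { return n * n; }

double merge_sort_steps(double n) {
    assert(n > 1.0);  // log2(n) must be positive for the ratio below
    return n * std::log2(n);
}

// How many times more steps the quadratic algorithm needs for n elements.
double speedup_estimate(double n) {
    return bubble_sort_steps(n) / merge_sort_steps(n);
}
```

For n = 148 million, speedup_estimate(148e6) is in the millions, because log2(n) is only about 27 while n itself is 1.48e8. The speedup I actually measured was far smaller than that, which is expected since the estimate leaves out everything except the growth rate.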

Connect Mac OS X with Linux Server

There are many ways to connect from Mac OS X to Linux on the command line. However, connecting with a GUI is not so easy. I tried two ways; neither works perfectly, and each has its advantages and disadvantages.

The first way I tried was VNC. Chicken of the VNC is said to be the best VNC client for Mac OS X. Unfortunately, I could not make it work, even after following the tips on the webpage Using VNC on Mac OS X, so I used RealVNC instead. The first step is to configure ssh, for which I followed the tips on the webpage Connecting to Remote Linux Desktop via SSH with X11 Forwarding. On the Linux server, after installing openssh, edit the file /etc/ssh/ssh_config and make sure it contains

ForwardAgent yes
ForwardX11 yes
ForwardX11Trusted yes

Then edit /etc/ssh/sshd_config and make sure it contains

X11Forwarding yes


In the Mac OS X terminal, enter the command "ssh -L 5901:localhost:5900 <login>@<LinuxServer>". 5901 is the local port that RealVNC will connect to, and 5900 is the port of the VNC server on Linux. For details about this command, refer to SSH Port Forwarding on Mac OS X. Then start RealVNC and enter "localhost:5901" and your VNC password.

RealVNC works reliably, but the display quality is not good and it sometimes responds slowly.

The other method is also based on ssh, so ssh must be configured on the Linux side in the same way. The only difference is that X11 must be installed on Mac OS X. The ssh command above can also be used, or just use "ssh -X <login>@<LinuxServer>". After logging in to Linux in the terminal, run the command "gnome-session" (this requires Gnome to be installed on the Linux server). This way, remote Linux programs run like local Mac applications. For more information, refer to Connecting to Remote Linux Desktop via SSH with X11 Forwarding. The disadvantage of this method is that X11 is not reliable on Mac OS X and may crash at any time.

How to make NIC BCM57780 work in Scientific Linux?

I decided to install Scientific Linux (SL) on the computer in my lab, partly because the distribution is called "Scientific", and partly because it is rebuilt from the sources of Red Hat Enterprise Linux, the most prestigious Linux distribution in the world.

At first, I downloaded the Everything DVDs and wrote them to my USB stick with UNetbootin. Unfortunately, it wouldn't boot. Then I tried the Live CD. It worked, and I installed SL in a few minutes. However, the network could not be connected and the NIC could not be activated. I checked the NIC type with "lspci | grep net" and found that it was a Broadcom Corporation NetLink BCM57780.

Many people complain about this network card. Some downloaded the driver source code from Broadcom and compiled it themselves. I didn't want to do that, so I searched for "tg3 rpm" and finally found the rpm package for SL, kmod-tg3-3.122-1.el6_2.x86_64.rpm. I installed it and restarted the network and even the computer, but the NIC still didn't work.

Then I resorted to compiling the tg3 source code. Unfortunately, I realized that gcc was not installed, because the LiveCD didn't contain the GCC package. When I tried to install GCC manually from rpm packages, a series of shared library dependencies blocked me. So I had to download the LiveDVD of SL, burn it, and install SL again. This time, GCC was included.

Before building the tg3 driver, I installed the downloaded tg3 rpm package and searched again for information about the driver. Finally I found the webpage How to updating driver for gigabit network card [Broadcom TG3:netXtream] on fedora core 4 and the responses to eth0 no device found, NIC Broadcom tg3 drivers, kernel, DKMS. Combining the two webpages, I learned how to make the BCM57780 work in SL:

  1. Download and install kmod-tg3-3.122-1.el6_2.x86_64.rpm.
  2. Go to /lib/modules/<kernel#>/kernel/net and check whether tg3.ko is there. If not, try executing "locate tg3".
  3. Run command "insmod /path/to/tg3.ko".
  4. Run command "service network restart".
  5. Edit /etc/rc.local and append the commands from steps 3 and 4 to the end. Otherwise you must run them manually after every reboot.

Besides, if the above still does not work, you may consider appending "biosdevname=0" to the kernel line of the grub config file /etc/grub.conf.

[Update 08/09/2012]:

I am sorry that I hadn't actually tested whether the /etc/rc.local change above works after a reboot. Today I did, and the answer was no. I struggled for another hour and finally found the trick: the tg3 module must be removed and re-inserted before restarting the network, or the Broadcom NIC still won't work!

So you should append the following commands to your /etc/rc.local:

rmmod tg3
insmod /path/to/tg3.ko
service network restart

SOCK_RAW Issue with setuid and chroot-ed Login on Linux Servers (Still Unresolved)


When calling socket(AF_INET,SOCK_RAW,IPPROTO_TCP...) from a setuid and chroot-ed "fake root" login on Linux servers, the call always fails, while the real root works well. Usually the fake root can do most things that require a root login.

After some investigation, I found the following hints:

  • According to the raw(7) man page, "Only processes with an effective user ID of 0 or the CAP_NET_RAW capability are allowed to open raw sockets".
  • According to capabilities(7) - Linux man page, "For the purpose of performing permission checks, traditional UNIX implementations distinguish two categories of processes: privileged processes (whose effective user ID is 0, referred to as superuser or root), and unprivileged processes (whose effective UID is nonzero). Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on the process's credentials (usually: effective UID, effective GID, and supplementary group list)".

Starting with kernel 2.2, Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled. Capabilities are a per-thread attribute.


CAP_NET_RAW: use RAW and PACKET sockets.

  • The thread raw socket access as normal user on linux 2.4 suggests setuid, but it didn’t work for us.

Since we can't provide root login to all users, we must either find a way to let raw sockets work with setuid&chroot-ed login, or substitute raw sockets with other options.
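To reproduce the failure in isolation, a small probe like the following can be run from both the real root login and the fake root login. It is only a diagnostic sketch for Linux (the function name try_raw_socket is my own) and does not solve the problem; it just reports what socket() says:

```cpp
#include <cassert>
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Try to open a raw TCP socket.
// Returns 0 on success, otherwise the errno value set by socket().
int try_raw_socket() {
    const int fd = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);
    if (fd < 0) {
        const int err = errno;
        assert(err != 0);  // a failing socket() call always sets errno
        return err;
    }
    close(fd);
    return 0;
}
```

A return value of EPERM means the process has neither an effective user ID of 0 nor CAP_NET_RAW, which is the permission check quoted from raw(7) above. Printing geteuid() next to the result shows whether the fake root really runs with effective UID 0.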

Error "ldd: execution failed due to signal 9"

Today, one of my colleagues ran into a very weird problem and finally asked me for help. When he ran the command "ldd CDNap" on a Solaris 9 server (AP11), the following error showed up:

AP11: ldd CDNap
ldd: CDNap: execution failed due to signal 9

However, the same command worked well on another Solaris 9 server (ap15).

According to Weird exec failure, "My bet is that the program has a rather large BSS segment, and there isn't enough swap to reserve the space. (ldd(1) works by setting up some environment variables, then running the program itself, which is why it would fail)". To verify this, you can run

/usr/ccs/bin/size /path/to/program
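For readers who have not met the term: on typical toolchains the BSS segment holds global and static data that starts out as zero. It takes almost no space in the executable file, but the system has to be able to back it with memory or swap when the program starts. The following C++ sketch (names and the 64 MB size are my own choices for illustration) produces such a program; compiling it and running size on the result should show the buffer in the bss column:

```cpp
#include <cassert>
#include <cstddef>

// 64 MB of zero-initialized global data. It has external linkage so that
// the compiler keeps it in the object file; it normally lands in .bss.
char g_big_buffer[64UL * 1024 * 1024];

std::size_t bss_buffer_bytes() { return sizeof g_big_buffer; }

char read_byte(std::size_t i) {
    assert(i < sizeof g_big_buffer);
    return g_big_buffer[i];
}

void write_byte(std::size_t i, char value) {
    assert(i < sizeof g_big_buffer);
    g_big_buffer[i] = value;
}
```

If the quoted explanation is right, a program like this with a buffer larger than the available swap would be killed at startup on the affected server, and ldd would fail in the same way because it runs the program.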

To check the free swap space, run

swap -s

After comparing the swap space and memory of the two Solaris servers, I noticed that the server with this issue (AP11) has less memory (1GB) and much less swap available (only 75008k).

AP11: swap -l
swapfile             dev  swaplo blocks   free
/dev/dsk/c0t2d0s1   136,1      16 4198304 4198304
AP11: swap -s
total: 513384k bytes allocated + 2296288k reserved = 2809672k used, 75008k available
AP11: uname -a
SunOS coolap22n 5.9 Generic_122300-61 sun4u sparc SUNW,UltraAX-i2
AP11: prtconf|grep Memory
Memory size: 1024 Megabytes

The server without the ldd issue (ap15) has 2GB of memory and much more swap available (4966160k).

ap15:root > swap -s
total: 372136k bytes allocated + 522864k reserved = 895000k used, 4966160k available
ap15:root > df -k swap
Filesystem            kbytes    used   avail capacity  Mounted on
swap                 1793064960 1589658648 203406312    89%    /tmp
ap15:root > swap -l
swapfile             dev  swaplo blocks   free
/dev/dsk/c0t2d0s1   136,1      16 8392544 8392544
ap15:root > uname -a
SunOS coolap10n 5.9 Generic_122300-61 sun4u sparc SUNW,UltraAX-i2

It seems it's time for such old Solaris servers to retire.

String Evolver, My First Genetic Algorithm

When reading Evolutionary Computation for Modeling and Optimization[1], I found the following problem in section 1.2.3:

A string evolver is an evolutionary algorithm that tries to match a reference string from a population of random strings. The underlying character set of the string evolver is the alphabet from which the strings are drawn.

The solution given in the text is:

  1. Start with a reference string and a population of random strings.
  2. The fitness of a string is the number of positions in which it has the same character as the reference string.
  3. To evolve the population, split it into small random groups called tournaments.
  4. Copy the most fit string (break ties by picking at random among the most fit strings) over the least fit string in each tournament.
  5. Then change one randomly chosen character in each copy(mutation).
  6. Repeat until an exact match with the reference string is obtained.
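The six steps above can be sketched in C++ as follows. This is my reading of the steps, not the book's code and not the modified program described below. The population size, tournament size, alphabet, seed and the generation cap (a safeguard so the loop always ends) are all parameters I introduced for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <string>
#include <vector>

// Step 2: fitness = number of positions that agree with the reference.
std::size_t fitness(const std::string& s, const std::string& ref) {
    assert(s.size() == ref.size());
    std::size_t hits = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == ref[i]) ++hits;
    }
    return hits;
}

struct EvolveResult {
    std::string best;         // fittest string when the loop stopped
    std::size_t generations;  // number of generations that were run
};

EvolveResult evolve(const std::string& ref, const std::string& alphabet,
                    std::size_t pop_size, std::size_t tour_size,
                    std::size_t max_generations, unsigned seed) {
    assert(!ref.empty() && !alphabet.empty());
    assert(tour_size >= 2 && pop_size >= tour_size);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick_char(0, alphabet.size() - 1);
    std::uniform_int_distribution<std::size_t> pick_pos(0, ref.size() - 1);

    // Step 1: a population of random strings over the alphabet.
    std::vector<std::string> pop(pop_size, std::string(ref.size(), alphabet[0]));
    for (std::string& s : pop) {
        for (char& c : s) c = alphabet[pick_char(rng)];
    }
    std::vector<std::size_t> order(pop_size);
    std::iota(order.begin(), order.end(), std::size_t{0});

    auto fittest = [&]() -> std::string {
        return *std::max_element(pop.begin(), pop.end(),
            [&](const std::string& a, const std::string& b) {
                return fitness(a, ref) < fitness(b, ref);
            });
    };

    for (std::size_t gen = 0; gen < max_generations; ++gen) {
        // Step 6: stop as soon as some string matches the reference exactly.
        if (fittest() == ref) return {ref, gen};

        // Step 3: shuffle, then treat consecutive blocks as tournaments.
        // Strings left over at the end sit out this generation.
        std::shuffle(order.begin(), order.end(), rng);
        for (std::size_t start = 0; start + tour_size <= pop_size; start += tour_size) {
            std::size_t best = order[start];
            std::size_t worst = order[start];
            for (std::size_t k = 1; k < tour_size; ++k) {
                const std::size_t idx = order[start + k];
                const std::size_t f = fitness(pop[idx], ref);
                // The shuffle makes "first best" and "last worst" a random
                // tie-break, and guarantees best != worst even if all tie.
                if (f > fitness(pop[best], ref)) best = idx;
                if (f <= fitness(pop[worst], ref)) worst = idx;
            }
            pop[worst] = pop[best];                                // Step 4
            pop[worst][pick_pos(rng)] = alphabet[pick_char(rng)];  // Step 5
        }
    }
    return {fittest(), max_generations};
}
```

A call such as evolve("hello", "abcdefghijklmnopqrstuvwxyz", 40, 4, 200000, 42u) returns the best string found and the number of generations used. How fast it gets there, or whether it reaches the reference before the cap, depends heavily on the string length, alphabet and tournament size.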

When I tried this solution, I noticed that it did not converge even after a long time. Therefore, I modified the crossover and mutation strategy. In each tournament, a subset of the population (strings) is selected based on fitness, and the bottom-ranked 50% of the tournament is deleted. The new population is formed by crossing the remaining top 20 individuals in a tournament, while the high-ranked parents are kept unchanged[2]. Finally, a randomly selected string in each tournament is mutated.

My code is:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


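Some Problems I Ran into When Upgrading to Ubuntu 12.04

After I upgraded Ubuntu 11.10 to 12.04, the machine reported "no partition found" at boot. Entering the "ls" command in Grub gave the same "no partition found" error, so Grub was clearly broken.

So I downloaded UNetbootin and used it under Win 7 to make a bootable Ubuntu USB stick. Once it was made, I found that the machine would not boot from it automatically: I had to press F12 when the Thinkpad welcome screen appeared and then choose to boot from the USB stick. I booted Ubuntu from the stick and, once in Ubuntu, followed the method in How to Fix Grub 2 to install Grub 2 to the hard disk.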

To fix GRUB 2, you need an Ubuntu live CD from which you need to boot. Once you boot to the LIVE CD, open a terminal and type these commands:

a) Firstly, you need to find out on which partition your Linux system is installed:

sudo fdisk -l

(in my case, it's "sda1")

b) Now, we must mount this partition:

sudo mount /dev/sda1 /mnt

Where "sda1" is the partition where you installed Ubuntu (or any other Linux distro). It could be "sda5", "sda6", etc. for you.

c) Install grub to the partition you've mounted:

sudo grub-install --root-directory=/mnt/ /dev/sda

Important: Please notice that it's "/dev/sda", not "/dev/sda1". "sda" is the hard disk on which your Linux distribution is installed!

d) Restart your computer. As previous Grub 2 entries are removed, run the following command to restore them:

sudo update-grub

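I had originally installed Ubuntu from a USB stick. During installation there is an option for where to install the boot loader, and the default was /dev/sdb, that is, the USB stick. I didn't notice at the time, so Grub 2 was installed on the USB stick. Although I later reinstalled it to the hard disk, it apparently still didn't behave quite right after the upgrade.

After rebooting into Grub 2, I entered the following commands (Grub 2 no longer supports the kernel command; use linux to specify the kernel):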

linux (hd0,X)/vmlinuz-3.2......
initrd (hd0,X)/initrd.img-3.2......

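Sadly, the system still got only halfway through booting and could not reach the GUI or a shell. After repeated checks and tests, I confirmed that the partition holding / was corrupted and Ubuntu could not recognize it at all.

Reluctant as I was, I had no choice at this point but to reinstall Ubuntu 12.04. What a pity about the files and movies on my computer that I hadn't backed up...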

Briss -- Crop the Margins of Your PDF Files


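At first I pinned my hopes on pdfcrop, which ships with Linux. After trying all kinds of options, I found that the PDF page size had not changed: it was still the original 8.5x8.9 inch.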


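Later I found another version of PDFcrop on the web, also written in Perl but a bit friendlier. It did not work either. I then hard-coded the values I wanted into it, and that failed too. The most useful thing I found in that tool was a unit conversion: 1 cm = 28.3464567 bp (big points), which is the same as 1 inch = 72 bp.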
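Since crop boxes in PDF tools are given in bp, it helps to convert page sizes before hard-coding anything. A minimal C++ sketch of the conversion (the function names are my own):

```cpp
#include <cassert>

// PDF user-space units ("big points"): 72 bp per inch, 2.54 cm per inch.
constexpr double kBpPerInch = 72.0;
constexpr double kCmPerInch = 2.54;

double inches_to_bp(double inches) {
    assert(inches >= 0.0);
    return inches * kBpPerInch;
}

double cm_to_bp(double cm) {
    assert(cm >= 0.0);
    return cm * kBpPerInch / kCmPerInch;
}
```

For example, a page width of 8.5 inch is 612 bp, and cm_to_bp(1.0) gives the 28.3464567 constant mentioned above.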


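Finally I searched the web for PDF Margin Crop and stumbled upon Briss. I downloaded it just to give it a try and found that it can actually detect the text area of a PDF made from images. You can also select the area to keep by hand by dragging the blue squares at the top-left or bottom-right corner. Terrific!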


java -jar briss-0.0.13.jar
java -jar briss-0.0.13.jar cropthis.pdf