Basic Linux Multi-Process and Multi-Thread Programming

Today I have to make my algorithm running in parallel in order to make it faster. At first I used following way to implement multi-process:

	unsigned int proc_num = 5;
	pid_t* pids=new pid_t[proc_num];
	double incr=(double)N/(double)proc_num;
	/* Start children. */
	for (unsigned int i = 0; i < proc_num; ++i) {
		if ((pids[i] = fork()) < 0) {
	 	} else if (pids[i] == 0) {
	    		// DoWorkInChild();

	/* Wait for children to exit. */
	int status;
	pid_t pid;
	while (proc_num > 0) {
	  	pid = wait(&status);
		if(status != 0)
	  		printf("Child Process with PID %ld aborted with status 0x%x.\n", (long)pid, status);
	  	--proc_num;  // TODO(pts): Remove pid from the pids array.

Above way worked well, however, there's no way to change the "shared" variables in child processes. Because each child process has an independent copy of all variables.

In order to change the same array in parallel, I implemented multi threads.

struct thread_info {    /* Used as argument to thread_start() */
	pthread_t thread_id;        /* ID returned by pthread_create() */
	int	start;       /* Application-defined thread # */
	int     end;      /* From command-line argument */
	int*	gpdarr;	/* Array to store GPD number */
	long int G;	/* genome length */
	unsigned int L;	/* read length */
	double l1;	// GPD parameter lambda1
	double l2;	// GPD parameter lambda2
	double M;	// maximum of GPD density

void printPt(pthread_t pt) {
  unsigned char *ptc = (unsigned char*)(void*)(&pt);
  for (size_t i=0; i<sizeof(pt); i++) {
    printf("%02x", (unsigned)(ptc[i]));

void* gen_gpd_num(void* arg)
	struct thread_info *tinfo = (struct thread_info *) arg;
	// do something here


int main()
        unsigned int tnum = 20;
	double incr=(double)N/(double)tnum;
	thread_info* tinfo=new thread_info[tnum];

	int err;
	void* res;
	/* Start children. */
	for(unsigned int idx=0;idx<tnum;++idx) 

		err = pthread_create(&tinfo[idx].thread_id, NULL, &gen_gpd_num, &tinfo[idx]);

			printf("can't create thread :[%s]", strerror(err));
			pthread_join(tinfo[idx].thread_id, &res);
        delete [] tinfo;

The potential problem of using multi-threading in Linux is -- it is hard to figure out if there really are many threads are running simultaneously. 

When compiling multi-threading program using GCC, pthread library must be specified:

g++ -o foo -Wall -L/usr/lib -lpthread


  1. How to Create Threads in Linux (With a C Example Program)


Simple Matlab/Octave Commands to Process Data

To load data in following format stored in a text file "read_pos' into Matlab/Octave, use commands

f=fopen('read_pos','r'); % open file
a=textscan(f,'%s%d'); % read file
y=a{2}; % a is a cell: a{1} stores read id, a{2} stores numbers.

Then data of the second column are stored in variable y.

r12353.1 2407054.5
r12361.1 5328858.5
r12363.1 2360272
r12368.1 4726440.5
r12372.1 2224001
r12373.1 5165613.5
r12381.1 501776
r12385.1 3475398
r12394.1 3376364
r12401.1 2142875.5
r12411.1 2191090.5
r12419.1 1240590
r12420.1 4903572
r12422.1 767011.5
r12426.1 3575915.5
r12429.1 554956
r12433.1 4786335
r12435.1 3373955.5
r12442.1 5363611.5
r12452.1 1903660.5
r12454.1 2784165
r12466.1 5137479
r12470.1 592191

If there are only numbers stored in file "read_pos", following commands should be used:

f=fopen('read_pos','r'); % open file
a=textscan(f,'%s%d'); % read file
y=cell2mat{a}; % convert cell a to vector
size(y) % check the size

To plot the distribution of all the numbers, you can:


If another data set z is also loaded and you want to plot y and z in the same graph, you can:

hold on

To annotate and scale the graph, use commands;

xlabel('AMD Data k-mer Occurrances');
title('GPD Classification of AMD k-mers')
legend('Bin 1','Bin 2')
axis([-5 5 0 10000]) % scale axes

A matlab function to plot multiple datasets:

function result=plot_distr(varargin)


marker={'bo' 'r+' 'k*' 'gs' 'y^' 'md' 'cp' 'bx' 'r.' 'k>' 'gh' 'y<'};
for k=1:nVarargs
hold on
ylim([0-5*nVarargs 6*nVarargs]); % scale y-axis

Tips about Regular Expression

  1. In Vim, greedy matching is used by default. In order to use non-greedy matching, you can use \{-}, or \{-n,m}, such as: :/Q.\{-0,200}[IL].\{-0,2000}FF.
  2. In Perl, default matching is also greedy. To use non-greedy matching, please change * , + , ? , and{} into *? , +? , ?? , and {}? , respectively.
  3. To count the number of matching in Vim, you can use command ":%s/<pattern>//g". After get the number of matches, then execute ":u" to undo the changes.
  4. In Perl, to count the number of matching, you can use command: 
    my $number =()= $string =~ /\./gi;




Calculate the CDF of Poisson Distribution with Boost C++ Library

The Cumulative Distribution Function(CDF) of Poisson distribution can be easily calculated by R function ppois() or octave/Matlab function poisspdf(). However, it is not a easy thing to deal with statistics with C++ from scratch.

Today I found a very powerful C++ mathematical library(actually not limited to math), boost. Unfortunately, it is hard to figure out the API of boost library from the code if you are not familiar with generalization programming(just like me). Besides, there is no useful "hello world" examples on the Internet to show the boost library handling statistics(examples on webpage C++ Statistical Distributions in Boost is a bit complicated and the header files are not included).

Therefore, I made up following example to show how to get the cdf of Poisson distribution with boost C++ library:

#include <boost/math/distributions/poisson.hpp>
#include <iostream>

using namespace std;
using namespace boost::math;

int main()
poisson_distribution<> p(2.9);
cout<<"cdf: ppois(1,2.9)="<<cdf(p,1)<<endl;

return 0;


  2. C++ Statistical Distributions in Boost

An Implementation of Merge Sort in C

Following C code is the implementation of merge sort, with the time complexity of O(nlogn). It was used in my current project to sort 148 million integers. At first I used bubbled sort, which took me hours to have the 148M integers sorted, because the time complexity of bubble sort is O(n^2). After replacing the sorting algorithm with merge sort, the time of sorting reduced to less than 10 mins. Amazing improvement! Although I have heard of the importance of sorting/searching algorithm for years, it was the first time I realize the magic of algorithms. 

The merge sort below was found in Internet. Sorry that I forgot to record the hyperlink of the webpage. Only a few changes were made by me. 


void Merge(int* input, long p, long r)
long mid = floor((p + r) / 2);
long i1 = 0;
long i2 = p;
long i3 = mid + 1;

// Temp array
int* temp=new int[r-p+1];

// Merge in sorted form the 2 arrays
while ( i2 <= mid && i3 <= r )
if ( input[i2] < input[i3] )
temp[i1++] = input[i2++];
temp[i1++] = input[i3++];

// Merge the remaining elements in left array
while ( i2 <= mid )
temp[i1++] = input[i2++];

// Merge the remaining elements in right array
while ( i3 <= r )
temp[i1++] = input[i3++];

// Move from temp array to master array
for ( int i = p; i <= r; i++ )
input[i] = temp[i-p];

delete [] temp;

// inputs:
// p - the start index of array input
// r - the end index of array input
void Merge_sort(int* input, long p, long r)
if ( p < r )
long mid = floor((p + r) / 2);
Merge_sort(input, p, mid);
Merge_sort(input, mid + 1, r);
Merge(input, p, r);

Connect Mac OS X with Linux Server

There are many ways to connect Mac OS X with Linux by command line. However, connecting them with GUI is not so easy. I tried two ways, but none of them work perfectly and each of them has both advantages and disadvantages.

The first way I tried was VNC. Chicken of the VNC was said to be the best VNC client under Mac OS X, unfortunately, I haven't make it work, even follow the tips of webpage Using VNC on Mac OS X. I used realvnc. The first step is to configure ssh. I followed the tips on webpage Connecting to Remote Linux Desktop via SSH with X11 Forwarding.  In the Linux server, after installing openssh, edit the file /etc/ssh/ssh_config and make sure

ForwardAgent yes
ForwardX11 yes
ForwardX11Trusted yes

Then edit /etc/ssh/sshd_config and make sure

X11Forwarding yes


In the terminal of Mac OS X, enter command "ssh -L 5901:localhost:5900 <login>@<LinuxServer>". 5901 is the port number to be used by RealVNC, and 5900 is the port number of the VNC in Linux. For details about this command, refer to SSH Port Forwarding on Mac OS X. Then start RealVNC, enter "localhost:5901" and your VNC password.

RealVNC works reliably, but the quality of GUI is not good and sometimes responds slowly.

The other method is also based on ssh, therefore the ssh should also be configured in the Linux side. The only difference is X11 should be installed in Mac OS X. The above
ssh command can also be used. Or just use "ssh -X <login>@<LinuxServer>". After login to Linux in terminal, run command "gnome-session"(this requires Gnome be installed in Linux server). In this way, remote Linux programs runs as local Mac applications. For more info, please refer to Connecting to Remote Linux Desktop via SSH with X11 Forwarding. The disadvantage of using this way is, X11 is not reliable in Mac OS X and it may crash at any time.

How to make NIC BCM57780 work in Scientific Linux?

I decided to install Scientific Linux on the computer in my lab. Partly because this distribution is called "Scientific", partly because it is compiled from Red Hat Enterprise Linux, the most prestigious Linux distribution in the world.

At first, I downloaded the Everything DVDs and burned it into my USB stick via UNetbootin. Unfortunately, it couldn't boot. Then I tried the Live CD. It did worked and I installed the SL in a few minutes. However, the network could not be connected and the NIC could not be activated. I checked the NIC type by "lspci | grep net", and I found it was Broadcom Corporation NetLink BCM57780.

Many people complain this network card. Some guys tried to download the source code of the driver from Broadcom and compiled it by themselves. I didn't want to do that, so I searched "tg3 rpm" and finally got the rpm package for SL kmod-tg3-3.122-1.el6_2.x86_64.rpm. I installed it and restarted the network even the computer, the NIC still didn't work.

Then I resorted to compiling the source code of tg3. Unfortunately, I realized that gcc was not installed -- the LiveCD didn't contain the GCC package. When I tried to manually install GCC from rpm, a series of shared library dependencies blocked me. So I have to downloaded the LiveDVD of SL, burned it again and installed the SL again. This time, the GCC was included.

Before building the tg3 driver, I installed the downloaded tg3 rpm package and searched again for the driver. Finally I found webpage How to updating driver for gigabit network card [Broadcom TG3:netXtream] on fedora core 4. and responses to eth0 no device found, NIC Broadcom tg3 drivers, kernel, DKMS. Combing the two webpages, I got to know how to make BCM57780 work in SL:

  1. Download and install kmod-tg3-3.122-1.el6_2.x86_64.rpm.
  2. Go to /lib/modules/<kernel#>/kernel/net and check if tg3.ko there. If not, try executing "locate tg3".
  3. Run command "insmod /path/to/tg3.ko".
  4. Run command "service network restart".
  5. Update /etc/.rc.local and append commands in 3 & 4 to the end. Or you must manually run commands 3&4 after reboot.

Besides, if the above still not work, you may consider appending "biosdevname=0" to the kernel line of the grub config file /etc/grub.conf.

[Update 08/09/2012]:

I am sorry that I didn't really tried if the way above modifying /etc/rc.local really work after reboot. And today I did try, but the answer was no. I struggled for another hour and finally found out the tricky tip: module tg3.ko must be removed and re-installed before restarting your network, or the Broadcom NIC still wouldn't work!

So you should append following code to your /etc/rc.local:

rmmod tg3.ko
insmod /path/to/tg3.ko
service network restart

SOCK_RAW Issue with setuid and chroot-ed login on Linux Servers(Still Unresolved)


when using function socket(AF_INET,SOCK_RAW,IPPROTO_TCP...) with setuid&chroot-ed fake root on Linux servers, it would always fail. However, the real root can work well. Usually the fake root can do most things that root login required.

After investigation, got following hints:

  • According to man page of SOCK_RAW(7), "Only processes with an effective user ID of 0 or the CAP_NET_RAW capability are allowed to open raw sockets".
  • According to capabilities(7) - Linux man page, "For the purpose of performing permission checks, traditional UNIX implementations distinguish two categories of processes: privileged processes (whose effective user ID is 0, referred to as superuser or root), and unprivileged processes (whose effective UID is nonzero). Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on the process's credentials (usually: effective UID, effective GID, and supplementary group list)".

Starting with kernel 2.2, Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled. Capabilities are a per-thread attribute.


Use RAW and PACKET sockets.

  • In raw socket access as normal user on linux 2.4, setuid is suggested, but it didn’t work.

Since we can't provide root login to all users, we must either find a way to let raw sockets work with setuid&chroot-ed login, or substitute raw sockets with other options.

Error "ldd: execution failed due to signal 9"

Today, one of my folks met a very weird problem and finally he asked me for help. The issue was, when running command "ldd CDNap" on a Solaris 9 server(AP11), following error would show up:

AP11: ldd CDNap
ldd: CDNap: execution failed due to signal 9

However, when running this command on another Solaris 9 server(ap15), it worked well.

According to Weird exec failure, "My bet is that the program has a rather large BSS segment, and there isn't enough swap to reserve the space. (ldd(1) works by setting up some environment variables, then running the program itself, which is why it would fail)". To verify this, you can run

/usr/ccs/bin/size /path/to/program

To check the free space of swap, you can

swap -s

After comparing the swap space and memory of the two Solaris servers, I noticed that the server(AP11) bearing this issue has smaller memory(1GB) and swap space(only 75008k available).

AP11: swap -l
swapfile             dev  swaplo blocks   free
/dev/dsk/c0t2d0s1   136,1      16 4198304 4198304
AP11: swap -s
total: 513384k bytes allocated + 2296288k reserved = 2809672k used, 75008k available
AP11: uname -a
SunOS coolap22n 5.9 Generic_122300-61 sun4u sparc SUNW,UltraAX-i2
AP11: prtconf|grep Memory
Memory size: 1024 Megabytes

Server(ap15) without ldd issue has 2GB memory and larger swap(4966160k available).

ap15:root > swap -s
total: 372136k bytes allocated + 522864k reserved = 895000k used, 4966160k available
ap15:root > df -k swap
Filesystem            kbytes    used   avail capacity  Mounted on
swap                 1793064960 1589658648 203406312    89%    /tmp
ap15:root > swap -l
swapfile             dev  swaplo blocks   free
/dev/dsk/c0t2d0s1   136,1      16 8392544 8392544
ap15:root > uname -a
SunOS coolap10n 5.9 Generic_122300-61 sun4u sparc SUNW,UltraAX-i2

It seems it's time for such old Solaris servers to retire.

String Evolver, My First Genetic Algorithm

When reading Evolutionary Computation for Modeling and Optimization[1], I found following problem in section 1.2.3:

A string evolver is an evolutionary algorithm that tries to match a reference string from a population of random strings. The underlying character set of the string evolver is the alphabet from which the strings are drawn.

The solution given in the context is:

  1. Start with a reference string and a population of random strings.
  2. The fitness of a string is the number of positions in which it has the same character as the reference string.
  3. To evolve the population, split it into small random groups called tournaments.
  4. Copy the most fit string(break ties by picking at random among the most fit strings) over the least fit string in each tournament.
  5. Then change one randomly chosen character in each copy(mutation).
  6. Repeat until an exact match with the reference string is obtained.

When trying this solution, I noticed that it won't converge even after a long time. Therefore, I modified the crossover and mutation strategy.A subset of population(strings) will be selected based on fitness in a tournament, and the bottom-ranked
50% of each tournament will be deleted. New population is formed by crossing the remaining high 20 individuals in a tournament, while the high-ranked parents are kept unchanged[2]. Finally, a random selected string in a tournament will be mutated.

My codes are:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>