Notes on Introduction to OS Abstractions Using Plan 9 from Bell Labs (I)

1. Loaded Programs

1.1 Command nm

The command nm displays symbol information for both object files and executable binaries, because it reads the symbol table stored in the binary for debugging purposes. The command strip removes that symbol table.

Option -n asks nm to sort the output by symbol address. The addresses are virtual memory addresses, because the system uses the virtual memory hardware to keep each process in its own virtual address space.

In the output of nm, etext is a symbol defined by the linker to mark where the text ends, and edata marks the address where the initialized data ends.
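To make the roles of etext and edata concrete, here is a small sketch in portable C (not Plan 9 C, so it compiles anywhere). The Sym type, the helpers sortbyaddr and segclass, and all addresses used with them are hypothetical. It sorts symbols by address, as nm -n does, and guesses each symbol's segment by comparing its address with etext and edata, assuming bss directly follows the initialized data. The letters T, D and B follow nm's convention for text, data and bss symbols.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical symbol entry, roughly one line of nm output. */
typedef struct {
    unsigned long addr;   /* virtual address of the symbol */
    const char *name;
} Sym;

/* qsort comparator: ascending address, the order nm -n prints. */
static int byaddr(const void *a, const void *b)
{
    const Sym *x = a, *y = b;
    return (x->addr > y->addr) - (x->addr < y->addr);
}

static void sortbyaddr(Sym *syms, size_t n)
{
    qsort(syms, n, sizeof syms[0], byaddr);
}

/* Guess a symbol's segment from the linker-defined markers: text ends at
   etext, initialized data ends at edata. Assumes addr lies inside the
   loaded image, so anything at or past edata is taken to be bss. */
static char segclass(unsigned long addr, unsigned long etext, unsigned long edata)
{
    if (addr < etext)
        return 'T';
    if (addr < edata)
        return 'D';
    return 'B';
}
```

With made-up markers etext = 0x1100 and edata = 0x2010, a symbol at 0x1020 is classified as text, one at 0x2004 as data, and one at 0x3000 as bss.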

1.2 What does the system (kernel) loader do?

· The header in the binary file reports the memory size required for the program text, and the file contains the memory image of that text. Therefore, the system can just copy it into memory. For a given system and architecture, there is a convention about which addresses the program must use, so the system knows where to load the program.

· The header also reports the memory size required for initialized (global) variables, and the file contains a memory image for them. Thus, the system can copy those bytes to memory. Note that the system has no idea where one variable starts or how big it is. It only knows how many bytes it has to copy, and at which address they should be copied.

· For uninitialized global variables, the header reports only their total size. The system allocates that amount of memory for the program, and that is all it has to do. As a courtesy, Plan 9 guarantees that this memory is initialized with all bytes set to zero. This means that your uninitialized global variables start out as zero (or nil, for pointers) by default.
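The three steps above can be sketched as a toy loader in portable C. The ToyHeader type and the toyload function are hypothetical simplifications. A real Plan 9 header carries more fields, such as the entry point, and a real loader places each segment at its conventional virtual address instead of packing them into one buffer. What the sketch does keep is the key point: the loader copies byte counts and zero-fills bss without knowing anything about individual variables.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header: only the three sizes the loader needs. */
typedef struct {
    size_t text;   /* bytes of program text stored in the file */
    size_t data;   /* bytes of initialized data stored in the file */
    size_t bss;    /* bytes of uninitialized data: a size, no image */
} ToyHeader;

/* "Load" an image whose file body holds the text image followed by the
   data image. Returns a new buffer laid out as text | data | bss, or NULL
   if allocation fails. The caller frees the result. */
static unsigned char *toyload(const ToyHeader *h, const unsigned char *body)
{
    size_t total = h->text + h->data + h->bss;
    unsigned char *mem = malloc(total ? total : 1);

    if (mem == NULL)
        return NULL;
    memcpy(mem, body, h->text);                       /* copy the text image */
    memcpy(mem + h->text, body + h->text, h->data);   /* copy the data image */
    memset(mem + h->text + h->data, 0, h->bss);       /* zero-fill the bss */
    return mem;
}
```

For a header {3, 2, 4} and a five-byte body, toyload returns nine bytes: the three text bytes, the two data bytes, and four zeros.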

1.3 Memory image of a loaded program

The virtual memory of a process in Plan 9 has several segments. A memory segment is a contiguous portion of memory with some properties, such as its protection. The segments used by a Plan 9 process are:

· The text segment. It contains instructions that can be executed but not modified. The system uses the hardware to enforce these permissions. The system initializes this memory with the program text (code) kept in the binary file for the program.

· The data segment. It contains the initialized data for the program. Its protection allows both reading and writing, but you cannot execute instructions stored in it. The system initializes this memory using the initialized data kept in the binary file for the program.

· The uninitialized data segment, called the bss segment, which is almost like the data segment. However, this one is initialized by zeroing its memory. The name of the segment comes from an arcane instruction used to implement it on a machine that no longer exists. How much memory it gets depends on the size recorded in the binary file. Moreover, this segment can grow, by means of a system call that allocates more memory for it. Library functions like malloc cause this segment to grow when they have used up all the memory available in it. This is the reason for the gap between this segment and the stack segment in the address space: it leaves room for the bss to grow.

· The stack segment, which is also readable and writable. Unlike the other segments, it seems to grow automatically as more space is used. It holds the stack for the process.
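A few of these properties can be observed from a portable C program. In this sketch, the names initdata, bssdata, bss_is_zero and stack_depth are hypothetical. ISO C guarantees that objects with static storage and no initializer start out as zero, which is exactly what a zero-filled bss provides. The recursion depth of 1000 with 512-byte frames (about half a megabyte) is an arbitrary choice meant to fit comfortably in a default stack, while still using far more stack than a single frame.

```c
#include <stddef.h>

int initdata = 42;   /* data segment: the value comes from the binary */
int bssdata[16];     /* bss segment: only its size is in the binary */

/* Returns 1 if every element of the uninitialized array is still zero. */
static int bss_is_zero(void)
{
    for (size_t i = 0; i < sizeof bssdata / sizeof bssdata[0]; i++)
        if (bssdata[i] != 0)
            return 0;
    return 1;
}

/* Each call places a 512-byte frame on the stack. Deep recursion works
   because the stack segment grows as it is used. Returns n + 1. */
static unsigned long stack_depth(unsigned n)
{
    volatile char frame[512];

    frame[0] = 1;
    if (n == 0)
        return frame[0];
    return frame[0] + stack_depth(n - 1);
}
```

Calling stack_depth(1000) returns 1001 after nesting 1001 frames, with no explicit request for stack memory.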

1.4 Process Arguments

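The macros ARGBEGIN and ARGEND loop through the argument list, removing and processing options. After ARGEND, both argc and argv reflect the argument list without any options.

Between both macros, we must write the body of a switch statement (supplied by ARGBEGIN), with a case per option. Within a case, ARGF() returns the option's argument (the rest of the current argument, or else the next argument), or 0 if there is none. EARGF(x) does the same, but evaluates x (typically usage()) when the argument is missing.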

Macros defined in plan9.h

extern char *argv0;
#define ARGBEGIN for((void)(argv0||(argv0=*argv)),argv++,argc--;\
       argv[0] && argv[0][0]=='-' && argv[0][1];\
       argc--, argv++) {\
    char *_args, *_argt;\
    Rune _argc;\
    _args = &argv[0][1];\
    if(_args[0]=='-' && _args[1]==0){\
     argc--; argv++; break;\
    }\
    _argc = 0;\
    while(*_args && (_args += chartorune(&_argc, _args)))\
    switch(_argc)
#define ARGEND  SET(_argt);USED(_argt);USED(_argc);USED(_args);}\
     USED(argv);USED(argc);
#define ARGF()  (_argt=_args, _args="",\
    (*_argt? _argt: argv[1]? (argc--, *++argv): 0))
#define EARGF(x)  (_argt=_args, _args="",\
    (*_argt? _argt: argv[1]? (argc--, *++argv): ((x), abort(), (char*)0)))

#define ARGC()  _argc

#define SET(x) (x) = 0
#define USED(x) (void)(x)

Most Plan 9 programs that accept multiple options use these macros to process their argument list in search of options. This means that the invocation syntax is similar for most programs: you may combine options in a single argument or use multiple arguments, supply an option's argument immediately after the option letter or as the following argument, terminate the option list with a -- argument, and so on.
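The loop that ARGBEGIN and ARGEND expand into can be approximated in portable C. This sketch is hypothetical: parseopts, the Opts type and the option letters -a, -b, -v and -f are made up for illustration, and it handles ASCII bytes rather than Runes. It aims to follow the same rules as the macros. Parsing stops at the first argument that does not start with "-" followed by a letter, or right after a "--" argument. Letters may be combined, and the argument of -f may be attached ("-ffile") or given separately ("-f file"), which is what ARGF() does.

```c
#include <stddef.h>
#include <string.h>

/* Options of a hypothetical command: -a, -b, -v and -f file. */
typedef struct {
    int a, b, v;
    const char *file;   /* argument of -f, or NULL */
    int bad;            /* set on an unknown option or a missing argument */
} Opts;

/* On return, *argcp and *argvp describe the remaining operands.
   argv must be NULL-terminated, as it is for main. */
static void parseopts(int *argcp, char ***argvp, Opts *o)
{
    int argc = *argcp;
    char **argv = *argvp;

    argv++, argc--;                           /* skip the program name */
    for (; argv[0] && argv[0][0] == '-' && argv[0][1]; argc--, argv++) {
        char *s = &argv[0][1];

        if (s[0] == '-' && s[1] == '\0') {    /* "--" ends the options */
            argc--, argv++;
            break;
        }
        while (*s) {                          /* one letter at a time */
            switch (*s++) {
            case 'a': o->a = 1; break;
            case 'b': o->b = 1; break;
            case 'v': o->v = 1; break;
            case 'f':                         /* like ARGF() */
                if (*s) {                     /* rest of this argument */
                    o->file = s;
                    s += strlen(s);
                } else if (argv[1]) {         /* or the next argument */
                    o->file = *++argv;
                    argc--;
                } else
                    o->bad = 1;
                break;
            default:
                o->bad = 1;
            }
        }
    }
    *argcp = argc;
    *argvp = argv;
}
```

For the arguments cmd -av -fout.txt x y, parseopts sets a and v, records out.txt as the -f argument, and leaves argc at 2 with argv[0] pointing at x. A lone "-" is left as an operand, just as with the macros.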

Source code of bind.c

#include <u.h>
#include <libc.h>
 
void usage(void);
 
void
main(int argc, char *argv[])
{
 ulong flag = 0;
 int qflag = 0;
 
 ARGBEGIN{
 case 'a':
  flag |= MAFTER;
  break;
 case 'b':
  flag |= MBEFORE;
  break;
 case 'c':
  flag |= MCREATE;
  break;
 case 'q':
  qflag = 1;
  break;
 default:
  usage();
 }ARGEND
 
 if(argc != 2 || (flag&MAFTER)&&(flag&MBEFORE))
  usage();
 
 if(bind(argv[0], argv[1], flag) < 0){
  if(qflag)
   exits(0);
  /* try to give a less confusing error than the default */
  if(access(argv[0], 0) < 0)
   fprint(2, "bind: %s: %r\n", argv[0]);
  else if(access(argv[1], 0) < 0)
   fprint(2, "bind: %s: %r\n", argv[1]);
  else
   fprint(2, "bind %s %s: %r\n", argv[0], argv[1]);
  exits("bind");
 }
 exits(0);
}
 
void
usage(void)
{
 fprint(2, "usage: bind [-b|-a|-c|-bc|-ac] new old\n");
 exits("usage");
}
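bind.c builds a flag word from its options and rejects -a combined with -b. The portable sketch below isolates that logic in a hypothetical helper, bindflags, which takes the option letters as a string. The numeric values of MREPL, MBEFORE, MAFTER and MCREATE are assumptions that mirror Plan 9's libc.h. Because MREPL is zero, giving no option means the new file replaces the old one. The -q option is left out because it only affects error reporting.

```c
/* Mount flag bits assumed for this sketch (mirroring Plan 9's libc.h). */
enum {
    MREPL   = 0x0000,   /* new replaces old: the default, no bits set */
    MBEFORE = 0x0001,   /* union: new goes before old */
    MAFTER  = 0x0002,   /* union: new goes after old */
    MCREATE = 0x0004,   /* permit file creation in the union */
};

/* Turn bind's option letters into a flag word, applying the same check
   as bind.c: -a and -b are mutually exclusive. Returns 0 on success, or
   -1 where bind would call usage(). */
static int bindflags(const char *letters, unsigned long *flagp)
{
    unsigned long flag = MREPL;

    for (; *letters; letters++) {
        switch (*letters) {
        case 'a': flag |= MAFTER; break;
        case 'b': flag |= MBEFORE; break;
        case 'c': flag |= MCREATE; break;
        default:  return -1;
        }
    }
    if ((flag & MAFTER) && (flag & MBEFORE))
        return -1;
    *flagp = flag;
    return 0;
}
```

For example, the letters "bc" produce MBEFORE|MCREATE, while "ab" is rejected.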

1.5 System call errors

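There are several ways of printing the error string. The most convenient is the %r format in print (and fprint), which prints the current error string.

There is also a function that both prints a message and exits. It is called sysfatal, and it is used like print: sysfatal("cannot open %s: %r", file) writes the message to standard error, prefixed by the program name, and then exits with the message as the exit status.

The function rerrstr reads the error string without clearing it. It stores the string in a buffer you supply, as in rerrstr(buf, sizeof buf).

The function werrstr writes a new value for the error string. It takes a format and arguments, like print. Using it, we can implement a function that pops an element from a stack and reports errors nicely.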
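The following is a portable sketch of that pattern, not the original example. Because werrstr and rerrstr are Plan 9 library functions, the sketch replaces them with hypothetical stand-ins, my_werrstr and my_rerrstr, that keep the error string in a fixed-size static buffer. Stack, STACKMAX and pop are likewise made up for illustration. On Plan 9, pop would simply call werrstr, and the caller could report the failure with %r.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-process error string. */
static char errbuf[128];

/* Like werrstr: format a message into the error string. */
static void my_werrstr(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(errbuf, sizeof errbuf, fmt, ap);
    va_end(ap);
}

/* Like rerrstr: copy the error string out without clearing it. */
static void my_rerrstr(char *buf, size_t n)
{
    snprintf(buf, n, "%s", errbuf);
}

enum { STACKMAX = 8 };

typedef struct {
    int items[STACKMAX];
    int n;                 /* number of items on the stack */
} Stack;

/* Pop the top element into *vp. On failure, set the error string and
   return -1, following the system-call convention. */
static int pop(Stack *s, int *vp)
{
    if (s->n == 0) {
        my_werrstr("pop: empty stack (capacity %d)", STACKMAX);
        return -1;
    }
    *vp = s->items[--s->n];
    return 0;
}
```

After popping the last element, another call to pop returns -1, and my_rerrstr then yields "pop: empty stack (capacity 8)".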

1.6 Environment

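To obtain the value of an environment variable from a C program, we can use the library function getenv, which on Plan 9 reads the file /env/name. If the variable is not defined, getenv returns nil (a null pointer). Otherwise it returns the value in memory allocated with malloc, which the caller should free. A related call is putenv, which accepts a name and a value, and sets the corresponding environment variable accordingly.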

#include <u.h>
#include <libc.h>
 
void
main(void)
{
 char *path;

 path = getenv("path");
 if(path == nil)
  sysfatal("path not defined!");
 print("path is %s\n", path);
 free(path); /* getenv returns memory allocated with malloc */

 exits(nil);
}
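The ownership rule matters when porting code: Plan 9's getenv hands back a malloc'd copy that the caller frees, whereas ISO C getenv returns a pointer into the environment that the caller must not free. The sketch below is a hypothetical portable helper, dupenv, that gives ISO C code the Plan 9 convention by copying the value.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of the variable's value, or NULL if the variable
   is unset (or if allocation fails). The caller frees the result, as with
   Plan 9's getenv. */
static char *dupenv(const char *name)
{
    const char *v = getenv(name);   /* ISO C: do not free this pointer */
    char *copy;
    size_t n;

    if (v == NULL)
        return NULL;
    n = strlen(v) + 1;              /* include the terminating NUL */
    copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, v, n);
    return copy;
}
```

Code written against this helper can call free on every non-NULL result, just like the Plan 9 program above.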

1.7 Process States

1.8 Debugging

The program src knows how to obtain the source file name and line number that correspond to a given program counter.

; src -n -s 0x000016ff 8.hi

/sys/src/libc/fmt/dofmt.c:37

Option -n causes the source file name and line to be printed. Otherwise src would ask your editor to display your file and line. Option -s permits you to give a memory address or a symbol name to locate its source.
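Mapping a program counter to a file and line needs line-number information recorded in the binary. The sketch below uses a simplified, hypothetical table: an array of LineEntry records sorted by address, where each entry gives the first address generated for a source line. The lookup is a binary search for the last entry at or below the program counter. Plan 9's real tables are encoded more compactly, so this only illustrates the idea of the lookup, not the actual format.

```c
#include <stddef.h>

/* Hypothetical line-table entry. */
typedef struct {
    unsigned long pc;   /* first address generated for this line */
    const char *file;
    int line;
} LineEntry;

/* Return the entry covering pc, that is, the last entry whose address is
   <= pc, or NULL if pc comes before the first entry. tab is sorted by pc. */
static const LineEntry *pc2line(const LineEntry *tab, size_t n, unsigned long pc)
{
    size_t lo = 0, hi = n;           /* search the range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].pc <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo now counts the entries whose address is <= pc */
    return lo == 0 ? NULL : &tab[lo - 1];
}
```

With made-up entries for hi.c starting at 0x1020 (line 5), 0x1030 (line 6) and 0x1048 (line 7), the address 0x103f maps to line 6.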

acid is the debugger. It can be used to dump the stack (function stk()), memory (function mem()), and so on.