给 Minix 内核的进程消息计数

the minix book

Minix 是个有别于 Linux 单内核的微内核操作系统,系统各个模块之间采用发消息的方式互相通信。在这本经典操作系统书:Operating Systems Design and Implementation, 3/E 的第219页有个练习题:

44. Add code to the MINIX 3 kernel to keep track of the number of messages sent from process (or task) i to process (or task) j. Print this matrix when the F4 key is hit.

这个练习题要比修改 Minix 内核的进程调度简单多了,只需要:

  1. 为每个进程初始化一个计数器;
  2. 在每次进程发消息的时候计数;
  3. 找到 Minix 在什么地方触发 F4 键,并把每个进程的计数器打印出来就可以了。

代码

在 proc.h 的 proc 进程数据结构里定义一个计数器 p_mess[NR_TASKS + NR_PROCS]:

struct proc {
...
  unsigned long p_mess[NR_TASKS + NR_PROCS];
};

在 main.c 的 main 里初始化计数器:

for (j = 0; j < NR_TASKS + NR_PROCS; j++) rp->p_mess[j] = 0;

在 proc.c 的 sys_call 里计数:

switch(function) {
  case SENDREC:
      /* A flag is set so that notifications cannot interrupt SENDREC. */
      priv(caller_ptr)->s_flags |= SENDREC_BUSY;
      /* fall through */
  case SEND:
      result = mini_send(caller_ptr, src_dst, m_ptr, flags);
      if (function == SEND || result != OK) {
          break;                                /* done, or SEND failed */
      }                                         /* fall through for SENDREC */
      if (result == OK)
          rp->p_mess[NR_TASKS+src_dst]++;
  case RECEIVE:
      if (function == RECEIVE)
          priv(caller_ptr)->s_flags &= ~SENDREC_BUSY;
      result = mini_receive(caller_ptr, src_dst, m_ptr, flags);
      break;
  case NOTIFY:
      result = mini_notify(caller_ptr, src_dst);
      break;
  case ECHO:
      CopyMess(caller_ptr->p_nr, caller_ptr, m_ptr, caller_ptr, m_ptr);
      result = OK;
      break;
  default:
      result = EBADCALL;                        /* illegal system call */
}

最后按 F4 打印出来,修改 dmp_kernel.c 的 privileges_dmp:

  printf("\n---- message track table dump ----\n\n");
  printf("-nr-             ", proc_nr(rp));
  for (rp = oldrp; rp < oldrp+10; rp++) {
        printf("[%2d]  ", proc_nr(rp));
  }
  printf("\n");
  /*for (rp = oldrp; rp < END_PROC_ADDR; rp++) {*/
  for (rp = oldrp; rp < oldrp+10; rp++) {
      if (isemptyp(rp)) continue;
          printf("\n[%2d]  %-7.7s  ", proc_nr(rp), rp->p_name);
          for (i = 0; i < 10; i++) {
              printf(" %4lu ", rp->p_mess[i]);
              rp->p_mess[i] = 0;
        }
  }

编译内核和服务,并重新用新内核启动:

# cd /usr/src/tools
# make clean
# make hdboot; make services
# reboot

注意:如果修改了 proc.h 文件,一定要 make clean 以后再 make hdboot; make services 编译内核和服务以后才能正常启动,如果没有修改 proc.h 只修改了 .c 文件可以只 make hdboot.

修改 Minix 内核的进程调度

the minix book

国外很多大学把这本经典 Minix 书:Operating Systems Design and Implementation, 3/E 作为操作系统课程的教科书或者参考教材。这本书写的非常易懂,除了操作系统的理论基础而且还搭上 Minix 的源代码讲解,是学习操作系统和系统编程的绝好材料。Linus Torvalds 就是看了这本书写出了 Linux,并和这本书的作者、操作系统大牛 Andrew S. Tanenbaum 教授有段关于操作系统设计(微内核还是单内核)的著名争论:Tanenbaum–Torvalds debate.

软件的基础是操作系统,操作系统的核心是内核,内核的核心是进程调度和内存管理,所以 hack 内核的进程调度会是一件有趣的事情。VPSee 最近在温习那本牛书,收获很大,平时看书都看得明白,真正到了动手做的时候又是另外一回事,所以读万卷书、行万里路是有道理的。那本牛书的第107页讲到了 lottery scheduling,VPSee 决定把原来的 Minix 进程调度改成 lottery 算法调度 (lottery scheduling),当作是看书后的练习,有很多大学也把这个练习作为操作系统的课程设计,比如:UCSB 的 Project 1: User Shell and Lottery Scheduling. 关于 lottery 算法这里不做介绍了,网上一大把,这里只谈代码实现,修改代码分成下面几个部分,分别涉及到:proc.h, proc.c, do_fork.c, main.c 这几个文件,这里的代码是在2007年写的,基于Minix 3.1.1,其他版本的 Minix 可能不能运行。

代码

首先在进程数据结构里定义一个 tickets 变量:

struct proc {
  struct stackframe_s p_reg;	/* process' registers saved in stack frame */

  struct proc *p_nextready;	/* pointer to next ready process */
  struct proc *p_caller_q;	/* head of list of procs wishing to send */
  struct proc *p_q_link;		/* link to next proc wishing to send */
  message *p_messbuf; 	/* pointer to passed message buffer */
  proc_nr_t p_getfrom; 		/* from whom does process want to receive? */
  proc_nr_t p_sendto;		/* to whom does process want to send? */

  sigset_t p_pending; 		/* bit map for pending kernel signals */

  char p_name[P_NAME_LEN]; 	/* name of the process, including \0 */

  int p_tickets; 		/* tickets of the process */

#if DEBUG_SCHED_CHECK
  int p_ready, p_found;
#endif
};

在系统进程表里给系统进程初始化 tickets,tickets 大小范围在1到100之间:

PUBLIC void main()
{
...
for (rp = BEG_PROC_ADDR, i = -NR_TASKS; rp < END_PROC_ADDR; ++rp, ++i) {
        rp->p_rts_flags = SLOT_FREE;		/* initialize free slot */
        rp->p_nr = i;				/* proc number from ptr */
        (pproc_addr + NR_TASKS)[i] = rp;	/* proc ptr from number */

        if (i == -4)
                rp->p_tickets = 1;
        else
                rp->p_tickets = 100 + rand()%11;
 }

除了上面的系统进程外,给每个 fork 出来的用户进程初始化 tickets,tickets 大小范围在100到105之间,这样重新调度的机会比系统进程要小一些,系统进程是100到110之间:

PUBLIC int do_fork(m_ptr)
register message *m_ptr;        /* pointer to request message */
{
  /* Parent and child have to share the quantum that the forked process had,
   * so that queued processes do not have to wait longer because of the fork.
   * If the time left is odd, the child gets an extra tick.
   */
  rpc->p_ticks_left = (rpc->p_ticks_left + 1) / 2;
  rpp->p_ticks_left =  rpp->p_ticks_left / 2;

  rpc->p_ticket = 1 + rand()% 5;

  /* If the parent is a privileged process, take away the privileges from the
   * child process and inhibit it from running by setting the NO_PRIV flag.
}

接下来就是 lottery 核心算法和进程调度部分,用 lottery 算法选一个进程出来运行:

PRIVATE void pick_lot_proc()
{
  register struct proc *rp;                     /* process to run */
  register struct proc *prev_rp = NIL_PROC;
  int q;                                        /* iterate over queues */
  int sum = 0, lottery = 0;

  for (q = 0; q < NR_SCHED_QUEUES; q++) {
      rp = rdy_head[q];
      while (rp != NIL_PROC) {
          sum += rp->p_ticket;
          rp = rp->p_nextready;
      }
  }
  lottery = 1 + rand() % sum;
  sum = 0;
  for (q = 0; q < NR_SCHED_QUEUES; q++) {
      rp = rdy_head[q];
      while (rp != NIL_PROC) {
          sum += rp->p_ticket;
          if (sum >= lottery)
              goto next;
          prev_rp = rp;
          rp = rp->p_nextready;
      }
  }

  next:
  next_ptr = rp;		/* pick up rp as the next running process */
  if (priv(rp)->s_flags & BILLABLE)
      bill_ptr = rp; 
  return;
}

编译 Minix 内核:

# cd /usr/src/tools
# make hdboot
# make services

重启 Minix,如果正常运行的话,启动速度应该比以前要慢。如果系统启动不了,或者报错、kernel panic 等,可以在启动时候选择 “2 Start small Minix 3″,使用另外一个没有被破坏的 kernel image 启动系统进行 debug.