Unix Job Control Basics

Starting a job at a Unix command prompt attaches that job to the shell that started it. If that shell is terminated, the job(s) started under it will usually be terminated as well. This is not because a process's children automatically die with it (orphaned processes are simply adopted by another process, as we will see). It is because when the shell exits or its terminal goes away, the shell usually sends the hangup signal (SIGHUP) to its jobs, and a process that receives SIGHUP terminates by default. Let's have a look.
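
The hangup signal (SIGHUP) is what actually ends such jobs: unless a process ignores or handles it, receiving SIGHUP terminates it. A minimal sketch, assuming a POSIX sh and using `sleep 300` as a hypothetical stand-in for a long-running job, delivers SIGHUP by hand and lets `wait` report how the job ended:

```shell
#!/bin/sh
# Start a stand-in long-running job in the background (sleep plays the role of SqlMonkey).
sleep 300 &
pid=$!

# Send the hangup signal by hand, as a shell typically does to its jobs when it exits.
kill -HUP "$pid"

# For a job killed by a signal, wait returns 128 + the signal number; SIGHUP is signal 1.
wait "$pid"
status=$?
echo "job $pid exited with status $status"
```

On a typical Linux or macOS system this should report a status of 129, meaning the job was killed by SIGHUP.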

Understanding Process Trees

I will start up a program called SqlMonkey. I like to use a shell called zsh (pronounced "z-shell"), which uses % as its prompt. I will show the commands I execute and the output from each of them:

At Terminal A
-----------------------------------------------------------------------

laptop% alias monkey
monkey='~/opt/SqlMonkey/SqlMonkey/bin/SqlMonkey.sh'
laptop% monkey

At Terminal B
-------------------------------------------------------------------------

laptop% ps auxwww |grep zsh
usera  10608  0.0  0.0   5380  2456 pts/0    Ss   22:15   0:00 zsh
usera  10784  0.0  0.0   5380  2452 pts/1    Ss   22:17   0:00 zsh
usera  11217  0.0  0.0   5380  2452 pts/2    Ss+  22:24   0:00 zsh
usera  12325  0.0  0.0   3336   804 pts/1    S+   22:44   0:00 grep zsh

I happen to know that PID 10608 is the shell I started SqlMonkey in. If you did not know which shell it was started in, you would have to investigate a bit more.
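
One way to do that investigation is to ask ps for a process's parent PID. The sketch below assumes a POSIX sh and a ps that supports `-o ppid=` (procps on Linux and the BSD/macOS ps both do). It starts `sleep 300` as a hypothetical stand-in job and confirms that its parent is the shell that launched it. For a real job you would look up its PID first, for example with `pgrep -f SqlMonkey.sh`.

```shell
#!/bin/sh
# Start a stand-in job; $! holds its PID and $$ holds the PID of this shell.
sleep 300 &
job_pid=$!

# ppid= prints only the parent PID with no header; tr strips the column padding.
parent_pid=$(ps -o ppid= -p "$job_pid" | tr -d ' ')
echo "job $job_pid was started by PID $parent_pid; this shell is PID $$"

# Clean up the stand-in job.
kill "$job_pid"
```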

laptop% pstree -A -p 10608 
zsh(10608)---SqlMonkey.sh(12296)---java(12297)-+-{java}(12298)
                                               |-{java}(12299)
                                               |-{java}(12300)
                                               |-{java}(12301)
                                               |-{java}(12302)
                                               |-{java}(12303)
                                               |-{java}(12304)
                                               |-{java}(12305)
                                               |-{java}(12306)
                                               |-{java}(12307)
                                               |-{java}(12308)
                                               |-{java}(12309)
                                               |-{java}(12310)
                                               |-{java}(12311)
                                               |-{java}(12312)
                                               `-{java}(12313)

This shows that zsh (PID 10608) started a program called SqlMonkey.sh (PID 12296), which started java (PID 12297), and java started a number of LWPs (lightweight processes), shown in curly braces. These are NOT separate processes; they are threads. They show up with their own IDs on Linux merely as an artifact of how Linux implements POSIX threads (pthreads): the kernel schedules each thread as its own task with its own thread ID.
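
On Linux you can confirm that the {java} entries are threads of one process rather than separate processes: each thread appears as a directory under /proc/<pid>/task. This Linux-only sketch inspects its own shell ($$) so that it runs anywhere; substituting the java PID (12297 above) would list SqlMonkey's threads instead.

```shell
#!/bin/sh
# Linux-only: each thread (LWP) of a process has a directory under /proc/<pid>/task.
pid=$$    # replace with the java PID (12297 in the pstree output above)

# The directory names are the thread IDs; the main thread's ID normally equals the PID.
ls /proc/"$pid"/task

# Count the threads; a single-threaded shell should report 1.
thread_count=$(ls /proc/"$pid"/task | wc -l | tr -d ' ')
echo "PID $pid has $thread_count thread(s)"
```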

Returning to Terminal A
----------------------------------------------------------------------

Pressing CTRL-Z suspends the SqlMonkey.sh program. Once a job is suspended, you can run the 'jobs' command to list all the jobs attached to your current shell:

^Z
zsh: suspended  ~/opt/SqlMonkey/SqlMonkey/bin/SqlMonkey.sh
laptop% jobs
[1]  + suspended  ~/opt/SqlMonkey/SqlMonkey/bin/SqlMonkey.sh
laptop%

Job number 1 is now suspended. You can't see it here, but I tried to issue a SQL command to SqlMonkey, and it would not respond because it is suspended.
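
CTRL-Z works by having the terminal send the SIGTSTP signal to the foreground job, and the bg and fg commands resume a job by sending it SIGCONT. A script has no terminal to press CTRL-Z in, so this sketch, assuming a POSIX sh and a ps that supports `-o stat=`, uses SIGSTOP (which stops a process the same way but cannot be caught) on a hypothetical `sleep 300` stand-in and reads the process state before and after resuming it. In ps output, T means stopped and S means sleeping (alive, waiting for its timer).

```shell
#!/bin/sh
# Start a stand-in job in the background.
sleep 300 &
pid=$!

# Stop it, roughly what CTRL-Z does to a foreground job.
kill -STOP "$pid"
sleep 1    # give the kernel a moment to update the process state
stopped_state=$(ps -o stat= -p "$pid" | tr -d ' ' | cut -c1)
echo "after SIGSTOP: $stopped_state"

# Resume it with SIGCONT, the signal bg and fg send to resume a job.
kill -CONT "$pid"
sleep 1
resumed_state=$(ps -o stat= -p "$pid" | tr -d ' ' | cut -c1)
echo "after SIGCONT: $resumed_state"

# Clean up the stand-in job.
kill "$pid"
```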

You can then send job number 1 into the foreground or the background with the following commands:

laptop% bg %1
[1]  + continued  ~/opt/SqlMonkey/SqlMonkey/bin/SqlMonkey.sh
laptop% fg %1
[1]  + running    ~/opt/SqlMonkey/SqlMonkey/bin/SqlMonkey.sh

bg resumes the job in the background. This means that SqlMonkey is running, but the shell will also accept other commands. fg then brings the running job into the foreground. This means that the job is still running, but the shell cannot accept any other commands until the job finishes or is suspended again.
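
The difference between a background and a foreground job comes down to whether the shell waits for it. This sketch, assuming a POSIX sh and using `sleep 2` as a hypothetical stand-in job, measures how long the shell is blocked in each case, with one-second resolution:

```shell
#!/bin/sh
# Background: the shell starts the job and gets control back immediately.
start=$(date +%s)
sleep 2 &
bg_blocked=$(( $(date +%s) - start ))
wait    # let the background job finish before timing the foreground case

# Foreground: the shell cannot run anything else until the job finishes.
start=$(date +%s)
sleep 2
fg_blocked=$(( $(date +%s) - start ))

echo "blocked ${bg_blocked}s for the background job, ${fg_blocked}s for the foreground job"
```

Expect roughly 0 seconds for the background job and 2 seconds for the foreground job (give or take one second from the coarse clock).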

The job is now in the foreground, but putting it back into the background is simple:

^Z
zsh: suspended  ~/opt/SqlMonkey/SqlMonkey/bin/SqlMonkey.sh
laptop% jobs
[1]  + suspended  ~/opt/SqlMonkey/SqlMonkey/bin/SqlMonkey.sh
laptop% bg %1
[1]  + continued  ~/opt/SqlMonkey/SqlMonkey/bin/SqlMonkey.sh
laptop% jobs
[1]  + running    ~/opt/SqlMonkey/SqlMonkey/bin/SqlMonkey.sh
laptop%

Now SqlMonkey is running in the background, so I can issue other commands at the prompt. If I no longer need the shell that I opened, I can just issue the command 'exit'.

laptop% exit

This caused the shell to exit, and SqlMonkey was terminated along with it.

Now I change the alias that starts up the SqlMonkey program:

laptop% alias monkey
monkey='nohup ~/opt/SqlMonkey/SqlMonkey/bin/SqlMonkey.sh 1>nohup.out 2>&1 &'
laptop% monkey
[1] 14470
laptop% jobs
[1]  + running    nohup ~/opt/SqlMonkey/SqlMonkey/bin/SqlMonkey.sh > nohup.out 2>&1
laptop%

Let's first discuss what I changed:

  • nohup - short for "no hangup". It runs SqlMonkey.sh with the SIGHUP signal ignored, so the hangup signal the shell sends when it terminates has no effect on it. (We will look a bit deeper into this.)
  • 1>nohup.out - sends standard output (stdout) to the file nohup.out, overwriting any existing contents of that file.
  • 2>&1 - sends standard error (stderr) to the same place stdout currently points (in this example, nohup.out).
  • & - runs the command in the background immediately.
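
The sketch below checks both halves of that alias, assuming a POSIX sh. It starts a hypothetical stand-in job with the same `nohup ... 1>file 2>&1 &` shape (writing to a temporary file rather than nohup.out), sends it SIGHUP, and confirms that the job survived and that both its standard output and its standard error landed in the file. The order of the redirections matters: 2>&1 copies wherever stdout points at that moment, so it has to come after 1>file.

```shell
#!/bin/sh
log=$(mktemp)

# Stand-in job: write one line to stdout and one to stderr, then exec a long sleep.
# An ignored signal stays ignored across exec, so the sleep keeps nohup's setting.
nohup sh -c 'echo to-stdout; echo to-stderr >&2; exec sleep 300' 1>"$log" 2>&1 &
pid=$!
sleep 1    # give nohup time to ignore SIGHUP and start the job

# Send the signal an exiting shell would send, then check that the job is still alive.
kill -HUP "$pid"
sleep 1
if kill -0 "$pid" 2>/dev/null; then survived=yes; else survived=no; fi

# Both lines should be in the log because 2>&1 comes after 1>"$log".
captured=$(grep -c '^to-std' "$log")
echo "survived SIGHUP: $survived; lines captured: $captured"

# nohup only ignores SIGHUP, so the default SIGTERM still stops the job.
kill "$pid"
rm -f "$log"
```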

Now running pstree on the shell I started it from (PID 11217 this time) gives the following:

laptop% pstree -A -p 11217
zsh(11217)---SqlMonkey.sh(14470)---java(14471)-+-{java}(14472)
                                               |-{java}(14473)
                                               |-{java}(14474)
                                               |-{java}(14475)
                                               |-{java}(14476)
                                               |-{java}(14477)
                                               |-{java}(14478)
                                               |-{java}(14479)
                                               |-{java}(14480)
                                               |-{java}(14481)
                                               |-{java}(14482)
                                               |-{java}(14483)
                                               |-{java}(14484)
                                               |-{java}(14485)
                                               |-{java}(14486)
                                               `-{java}(14487)

This looks much the same as the earlier situation, where we used neither nohup nor & to put the process into the background. Now let's see what happens to SqlMonkey when we exit the shell that started it.

laptop% jobs
[1]  + running    nohup ~/opt/SqlMonkey/SqlMonkey/bin/SqlMonkey.sh > nohup.out 2>&1
laptop% exit

laptop% pstree -A -p 1    
init(1)-+-NetworkManager(3455)-+-dhclient(3809)
        |                      `-{NetworkManager}(3810)
        |-SqlMonkey.sh(14470)---java(14471)-+-{java}(14472)
        |                                   |-{java}(14473)
        |                                   |-{java}(14474)
        |                                   |-{java}(14475)
        |                                   |-{java}(14476)
        |                                   |-{java}(14477)
        |                                   |-{java}(14478)
        |                                   |-{java}(14479)
        |                                   |-{java}(14480)
        |                                   |-{java}(14481)
        |                                   |-{java}(14482)
        |                                   |-{java}(14483)
        |                                   |-{java}(14484)
        |                                   |-{java}(14485)
        |                                   |-{java}(14486)
        |                                   `-{java}(14487)

You can now see that SqlMonkey has been adopted by a very special Unix process: process 1, the init process. When a process's parent exits, the orphaned process is normally re-parented to init rather than killed. If init itself ever dies, the whole system goes down.
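
The re-parenting is easy to observe. In this sketch, assuming a POSIX sh and a ps that supports `-o ppid=`, an intermediate shell starts a hypothetical `sleep 300` stand-in in the background and exits immediately. A non-interactive shell does not send SIGHUP to its background jobs when it exits, so the job survives and gets a new parent. That parent is usually PID 1, but on some Linux systems a "subreaper" process (for example a `systemd --user` instance, or an init process inside a container) adopts orphans instead, so the sketch only prints what it finds.

```shell
#!/bin/sh
# The inner shell starts the stand-in job, prints its own PID and the job's PID, then exits.
# The job's output goes to /dev/null so the command substitution does not wait for it.
set -- $(sh -c 'sleep 300 >/dev/null 2>&1 & echo "$$ $!"')   # unquoted on purpose: split into two words
middle_pid=$1
job_pid=$2

sleep 1    # the intermediate shell has certainly exited by now
new_parent=$(ps -o ppid= -p "$job_pid" | tr -d ' ')
echo "job $job_pid: started by PID $middle_pid, now a child of PID $new_parent"

# Clean up the orphaned stand-in job.
kill "$job_pid"
```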

HELP! I started a long-running application and forgot to nohup it. Really, it was our production corporate trading application! And I want to go home!

Bash and zsh have just the command for you!

Remember that what you want is to make sure that when your shell exits, it does not send the hangup signal (SIGHUP) to the long-running application. You can accomplish that with a bash/zsh built-in command called disown, which works as follows:

First, put the process into the background so that it is in a running state:

^Z
zsh: suspended  ~/opt/SqlMonkey/SqlMonkey/bin/SqlMonkey.sh
laptop% jobs
[1]  + suspended  ~/opt/SqlMonkey/SqlMonkey/bin/SqlMonkey.sh
laptop% bg %1
[1]  + continued  ~/opt/SqlMonkey/SqlMonkey/bin/SqlMonkey.sh
laptop%

Now disown the job!

laptop% disown %1

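disown removes the job from the shell's job table, so the shell will not send it SIGHUP when it exits. The process remains a child of the shell until then, but once you exit the shell, pstree shows that SqlMonkey survived and has been adopted by init: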

init(1)-+-NetworkManager(3455)-+-dhclient(3809)
        |                      `-{NetworkManager}(3810)
        |-SqlMonkey.sh(14470)---java(14471)-+-{java}(14472)
        |                                   |-{java}(14473)
        |                                   |-{java}(14474)

 

In bash, disown accepts a few options:

  • -a - disown all jobs (you don't need to specify a job number such as %1).
  • -h - mark the jobs so that SIGHUP is not sent to them, much like nohup, but keep them in the shell's job table, so 'jobs' still lists every job this shell started until you exit the shell.
  • -r - disown only running jobs. Jobs in a suspended state are not disowned.
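
The sketch below shows the difference between plain disown and disown -h. It assumes bash (a plain POSIX sh has no disown) and uses two hypothetical `sleep 300` stand-in jobs: it marks job 1 with -h, fully disowns job 2, and then asks `jobs -p` which jobs the shell still tracks.

```shell
#!/usr/bin/env bash
# Two stand-in background jobs: %1 and %2 in this shell's job table.
sleep 300 &
pid1=$!
sleep 300 &
pid2=$!

disown -h %1    # keep job 1 in the table, but mark it so SIGHUP is not sent to it
disown %2       # remove job 2 from the table entirely

# jobs -p prints the PID of every job the shell still tracks; expect only job 1.
tracked=$(jobs -p)
echo "still in the job table: $tracked (job 1 was $pid1, job 2 was $pid2)"

# Clean up both stand-in jobs; disowned processes can still be signalled by PID.
kill "$pid1" "$pid2"
```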

 

HOW DO I FIND MORE INFORMATION ON DISOWN?

Type 'man bash' and search for disown (in bash, 'help disown' prints a shorter summary). Below is what you should find:

       disown [-ar] [-h] [jobspec ...]
              Without  options,  each  jobspec  is  removed  from the table of
              active jobs.  If the -h option is given,  each  jobspec  is  not
              removed from the table, but is marked so that SIGHUP is not sent
              to the job if the shell receives a SIGHUP.   If  no  jobspec  is
              present,  and  neither the -a nor the -r option is supplied, the
              current job is used.  If no jobspec is supplied, the  -a  option
              means  to  remove or mark all jobs; the -r option without a
              jobspec argument restricts operation to running jobs.   The  return
              value is 0 unless a jobspec does not specify a valid job.

 



