
Staff-412@cs Mailing List Questions and Answers

To serve you better: Below are excerpts from many of the recent questions and answers logged on the staff-412@cs list. Although this record is not all-inclusive, it includes any of the posts that we thought might be useful to the class as a whole. Identities have, to the best of our ability, been removed. Some questions and answers have been edited for brevity or topical general interest.

As always, we're here to help -- please give us the opportunity.


Question

Can a limit be imposed on the line length for a command in ysh?

Answer

It is okay to impose a reasonable limit. 10 characters would probably be a bad idea; 256 chars, 512 chars, 1K, or more is fine.
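
As a minimal sketch of how such a limit might be enforced, the function below reads one command with fgets() and reports when a line does not fit. The name read_command and its return codes are made up for this illustration; they are not part of the project handout.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read one command line into buf (capacity cap, so at most cap - 1
 * characters of command text).
 * Returns 1 on success, 0 on end of input, and -1 if the line was too
 * long (the rest of the oversized line is discarded).
 */
int read_command(FILE *in, char *buf, size_t cap)
{
    if (fgets(buf, (int)cap, in) == NULL)
        return 0;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';   /* whole line fit; strip the newline */
        return 1;
    }

    /* No newline seen: either input ended or the line overflowed buf.
       Swallow the remainder so the next read starts on a fresh line. */
    int c, overflow = 0;
    while ((c = fgetc(in)) != EOF && c != '\n')
        overflow = 1;
    return overflow ? -1 : 1;
}
```

A shell loop could declare something like char line[512], call read_command(stdin, line, sizeof line), and print a "line too long" message when it returns -1.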


Question

Can built-in shell commands be mixed with executable commands in ysh? For example, in tcsh:

  ~ % echo $status > temp
  ~ % more temp
  0
  ~ %

Answer

It is not really difficult to allow for the redirection of the internal commands, but we won't be testing this behavior. The project only requires that executable programs be redirectable. The internal commands aren't actually executable programs; they are behaviors of the shell itself. But please do realize that most, if not all, well-known shells will allow internal commands to be redirected.


Question

what is it with the examples that keep changing?

Answer

Everything should have stabilized yesterday evening when I posted to the bboards (until then there were posts discussing difficulty with the example code...)


Question

Are we allowed to assume that there will be at least one space between tokens? e.g., is ls|more a legal statement, or can we assume it will always be ls | more?

Answer

I can't promise whether or not we'll actually test this behavior. But you really shouldn't assume spaces -- they're not necessary.
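
One way to avoid depending on spaces is to treat each metacharacter as a token of its own. The sketch below assumes the four metacharacters mentioned on this page (|, <, >, &); the function name tokenize and the fixed TOK_LEN are choices made only for this example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TOK_LEN 64

/*
 * Split line into tokens, treating | < > & as one-character tokens even
 * when no spaces surround them. Each token is copied into out[i] (at most
 * TOK_LEN - 1 characters are kept). Returns the number of tokens stored.
 */
int tokenize(const char *line, char out[][TOK_LEN], int max_tokens)
{
    int n = 0;
    const char *p = line;

    while (*p != '\0' && n < max_tokens) {
        if (isspace((unsigned char)*p)) {      /* skip blanks between tokens */
            p++;
            continue;
        }
        size_t len;
        if (strchr("|<>&", *p) != NULL)
            len = 1;                           /* metacharacter stands alone */
        else
            len = strcspn(p, " \t\n\r\v\f|<>&"); /* word runs to next delimiter */

        size_t copy = len < TOK_LEN - 1 ? len : TOK_LEN - 1;
        memcpy(out[n], p, copy);
        out[n][copy] = '\0';
        n++;
        p += len;
    }
    return n;
}
```

With this approach "ls|more" and "ls | more" produce the same three tokens.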


Question

Do we have to worry about the case where a backgrounded process asks for input through stdin (assuming stdin is not redirected to a file)? As in, if the user says "vi foo.txt &"

Answer

Don't worry about that one. Under normal circumstances, that process would block. We won't test its behavior within your shell. I hope this answers your question. If not, please ask again -- we're here to help.


Question

Many of you may have had difficulty with the example code. This is because you were using a job-control capable shell like csh or tcsh.

Answer

Since your login shell is the shell that is initially bound to the controlling terminal, it continues to receive certain signals from the controlling terminal, take some action, and propagate these signals to your shells. The two solutions to this problem are: (a) create a new session and allocate a new terminal for it (the latter part is complex), or (b) tell your process to ignore a signal that is used by the job control shell. Jason and I have modified the example code for the two terminal examples to function this way. It also makes both of the #defines unnecessary. These can serve both as examples of terminal groups, &c. and also as examples of manipulating a process's own signal mask (the other two manipulate the mask for the associated signal handler only).


Question

The problems that many people are experiencing are related to job control shells like tcsh and sh. These shells interact very tightly with the terminal driver and this relationship can't easily be undone.

Answer

To solve this problem, we basically need to ignore SIGTTOU, a signal generated by the job control shell. We don't want to get this signal, because we really want to pretend that the underlying shell isn't there. Unfortunately, it is very difficult to sever the relationship between the terminal driver and the login-spawned shell, so we just need to ignore this signal. Please take a look at the revised examples. Even if you are using a non-job-control shell, like sh, the revised examples also provide a good example of how to block signals for a process (as opposed to for a signal handler).


Question

When are this semester's office hours?

Answer

With project #2 in progress, I'd like to announce our office hours. Thanks for giving us some time to get organized this semester -- TA assignments are always a bit slow at the beginning of the spring semester. Please remember that we're here to help. Please give us the opportunity.

--------------------------
Gregory Kesden:

  3:00-4:00 Monday - Friday in WeH 8021

  Although my "official" office hours run from 3:00 to 4:00, I am
  usually available from 1:30 - 4:00, and in the evenings.

  Please don't hesitate to drop by any time I'm here. You can finger me on
  my workstation (gigo.sp) or call ahead (268-1590). 

  For the moment, I am just leaving myself some wiggle room, since TAs for
  612 haven't been settled, yet. Things may be a bit hectic and I may have
  to schedule meetings once the TAs arrive.



15-412 Office Hours:

  Monday:

    3:00 - 5:00   Benicio Sanchez   WeH 3108
    3:00 - 4:00   Gregory Kesden    WeH 8021

  Tuesday:
   
    3:30 - 5:30   Michelle Berger   WeH 3108
    5:30 - 7:30   Benicio Sanchez   WeH 3108
    3:00 - 4:00   Gregory Kesden    WeH 8021

  Wednesday:

    3:00 - 5:00   Chris Palmer      WeH 8207
    5:00 - 7:00   Jason Flinn       WeH 8208
    3:00 - 4:00   Gregory Kesden    WeH 8021

  Thursday:

    3:00 - 5:00   Michelle Berger   WeH 3108
    5:00 - 7:00   Jason Flinn       WeH 8208
    3:00 - 4:00   Gregory Kesden    WeH 8021

  Friday:

    1:30 - 3:30   Chris Palmer      WeH 8207
    3:00 - 4:00   Gregory Kesden    WeH 8021

Happy coding!

Question

Can we assume that there will be a space between a word and a special character ("|", "<", ">", "&") on the command line?? It makes the writing of the parser a bit easier...

Answer

I can't promise whether or not we'll test this case; we haven't decided on our test set yet. But you really shouldn't require spaces between the metacharacters -- they're not necessary. This won't be many points off, but it might be one point-ish. I hope this answers your questions. If not, please ask again -- we're here to help.


Question

There is a race condition in the new terminalgroups1 example: if the child (vi) gets scheduled before the parent calls tcsetpgrp() and then tries to read, it has problems. Since vi is still in the background and SIGTTIN is blocked, the read fails with EIO, causing an infinite loop. To see this race condition happen more frequently, put a sleep(1) right before the call to tcsetpgrp().

Answer

Bradley's post was exactly correct. Zarchary Loafman actually mailed me about this moments ago, also. Thanks to you both. The examples on the web have now been corrected. This situation is a side effect of the way the read is handled. In this environment, if the terminal isn't available, it doesn't block. Instead the terminal driver generates the two signals that we are ignoring. Sorry again for the confusion.


Question

we could just pick an arbitrary big number, but what's likely to be the upper limit of the number of jobs we could have in one session of the

Answer

I think I'd like you to use a dynamic list for this one. Unlike a command line, this can vary widely from system to system and from moment to moment. Otherwise, I guess the best advice I can give is to make it large enough that we won't break it with heavy use during the demo, but not so large that it is a massive and noticeable waste of space.


Question

I'm wondering what it means to run a job in the background. Here's my line of thinking: If the job should be run in the background, exec it but give the shell control of the terminal (tcsetpgrp(2,pid of shell)). If the job should be run in the foreground, exec it and give the process control of the terminal (tcsetpgrp(2,pid of job)). If that is not the correct way to think about it, then what is? If that is the correct way to think about it, then why doesn't "pico &" work? It prints the "pico screen", transfers control to the shell, and then pico dies (instead of just going to the background):

      if (!pid) /* We're the child */
        {
          int cpid = getpid();
          setpgid(0, 0);
          if (cmd->run_in_fg) tcsetpgrp(2, cpid);
          else tcsetpgrp(2, ppid);

          execvp(cmd->argv[0], cmd->argv);
          exit(-1);
        }

      if (pid) /* We're the parent, so pid is the child */
        {
          /* children should have different group id than parent. */
          setpgid(pid, pid);
          /* set child to fg if it should be */
          if (cmd->run_in_fg)
            {
              tcsetpgrp(2, pid);
              waitpid(pid, NULL, 0);
            }
          tcsetpgrp(2, ppid);
        }

Answer

That's correct, but don't forget that you should waitxx() for foreground jobs from your shell.

> If that is the correct way to think about it, then why doesn't "pico &" work?

Let's not worry too much about the case when we have an interactive program in the background. A normal job control shell uses special signals to handle this case and suspends the process. But since we're not the actual login shell, it is very hard for us to take care of this.


Question

if exit is entered while other processes are running in the background, what should the shell do? in tcsh, the processes sort of become orphans, and the shell exits without complaint.

Answer

Interesting. I thought that the behavior of tcsh was to kill its children via a SIGHUP when it exits, unless they are started with "nohup". In order to protect the availability of our computing resources, I'd like for you to do the same thing -- it'll save many cycles. But this wasn't specified and won't be tested as part of our evaluation.
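
If you do want to hang up your background jobs on exit, the idea can be sketched roughly as follows. The array of process group ids stands in for whatever job table your shell keeps, and both function names are hypothetical. The second function exists only so the first can be tried out.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Send SIGHUP to each background job's process group.
 * pgids holds one process group id per job (a hypothetical job table).
 * Returns the number of groups that were signaled successfully.
 */
int hangup_jobs(const pid_t *pgids, int njobs)
{
    int sent = 0;
    for (int i = 0; i < njobs; i++) {
        /* A negative pid makes kill() address the whole process group. */
        if (kill(-pgids[i], SIGHUP) == 0)
            sent++;
    }
    return sent;
}

/* Helper for trying it out: start a child that idles in its own group. */
pid_t spawn_idle_job(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        setpgid(0, 0);
        for (;;)
            pause();
    }
    if (pid > 0)
        setpgid(pid, pid);   /* also set it in the parent to avoid a race */
    return pid;
}
```

The shell's exit built-in could call hangup_jobs() on its job list just before calling exit(). A job that has arranged to ignore SIGHUP would survive, which matches the behavior described above.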


Question

On the web site, the date that Hwk2 is due, says that Hwk3 is due then (before it's handed out) and there is effectively no due date for Hwk2. Just a typo that could stand to be cleared up!

Answer

All fixed.


Announcement

The web site now has two new features:

1) Links to anonymized questions and answers on the staff-412@cs list. We're always here to help. But you may get a quicker answer to common questions by searching this collection of recent questions. We remove identities and identifying information (as best as we can) to ensure your privacy. This collection will be updated at least nightly.

2) Thanks to Adam Pennington, we now have a log of the 412 Zephyr class. This log is updated in real time.

We've also updated the on-line syllabus to include office hours and locations.


Question

I have a pretty well working set of beginning job control stuff using signals, *however*, I'm running into some issues and could use some clarifying.. The following is currently the output (and actions I took offscreen):

C:\> xterm &
C:\> xterm &
C:\> cat
(Offscreen: exit both xterms)
^D
Process 16598 exited, return code 0, tc = 16607.

Process 16589 exited, return code 0, tc = 16607.
Do you want this sort of delayed signal buffer (I blocked the child signals until the fg process was done) or do you want the following:
----
C:\> cat

Process 16598 exited, return code 0, tc = 16607.

Process 16589 exited, return code 0, tc = 16607.

^D
----
I'm having trouble implementing the latter because something is causing the child to stop.. but if you want it that way I can fix it.

Answer

Handle this however you prefer. Prepare to defend your answer to the grader. From our perspective, two things are important: (a) your handling of the event is asynchronous (even if notification is not), and (b) you did what you did for a reason. Polling will get some credit -- but not full credit.


Question

Someone asked me how to determine the pid of the child, from within the parent process's handler for SIGCHLD.

Answer

The folks last semester had some difficulty with this, and the man pages and headers can be a bit difficult. I have posted a new example under the Project #1 page: "Example of a Signal Handler and child status " Since I didn't explain this example in class, I've commented it heavily. But please remember that this is just a pedagogical example -- not a direct piece of your puzzle.

Question

My partner thinks that we are group xx, but neither of us can get into /afs/andrew.cmu.edu/scs/cs/15-412/usr/grpxx. There are two possibilities that we can think of here:

(1) we are morons and are not group xx
(2) we don't have access

Answer

We haven't created the directories yet -- not all of the groups have registered. We'll create these directories on Thursday. I'm sorry for any inconvenience.


Question

We have a problem with a pipe command that would not terminate. Could you please have a look to see if you can find what is wrong?

Answer

It looks like the problem is that the parent is not closing all of the pipe fds. If there are any open fds referring to the write end of the pipe, the process reading from the pipe will never see end-of-file and will continue to block on read. It looks to me like the parent is holding the pipe open, preventing the child process from ending, at the same time that it is waiting for the child to end.
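
To make the point concrete, here is a small self-contained sketch (not the student's code) in which one child writes into a pipe and another reads until end-of-file. The function name pipe_byte_count is invented for the example. The important lines are the two close() calls in the parent before it waits.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Send msg through a pipe from one child to another and return the number
 * of bytes the reading child saw before end-of-file (or -1 on error).
 * The count travels back as an exit status, so it is only valid below 256.
 */
int pipe_byte_count(const char *msg)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t writer = fork();
    if (writer == 0) {
        close(fds[0]);                        /* writer never reads */
        if (write(fds[1], msg, strlen(msg)) < 0)
            _exit(1);
        close(fds[1]);
        _exit(0);
    }

    pid_t reader = fork();
    if (reader == 0) {
        close(fds[1]);                        /* or read() never sees EOF */
        char buf[64];
        int total = 0;
        ssize_t n;
        while ((n = read(fds[0], buf, sizeof buf)) > 0)
            total += (int)n;
        _exit(total & 0xff);
    }

    /* The parent must close BOTH ends before waiting. If it kept fds[1]
       open, the reader would block forever and so would waitpid(). */
    close(fds[0]);
    close(fds[1]);

    int status = 0;
    if (writer > 0)
        waitpid(writer, NULL, 0);
    if (reader <= 0 || waitpid(reader, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Commenting out the parent's close(fds[1]) should reproduce the hang described in the question.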


Question

I'm using the function waitpid in my program. I pass -grpid as a parameter, where grpid is the id of the group. In the man page it says that in this case, the function waits for ANY process in this group to exit. I wonder if this means that it waits until all the processes have exited? Or does it just mean that anyone of these processes has exited?

Answer

I believe your second interpretation is correct (one of the processes has exited). If you need to be 100% certain, you could construct a small test case. (Fork two processes which sleep for different amounts of time then exit, put them in the same process group, and call waitpid in a loop. See how many times waitpid gets called).
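
A small test case along those lines might look like the sketch below. The function name and the sleep times are arbitrary choices for this example. It also shows one way to notice that a whole group is gone: keep calling waitpid() on the group until it fails with ECHILD.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork nchildren processes into one new process group, each sleeping a
 * different amount of time, then count how many times waitpid(-pgid, ...)
 * returns a child before it fails with ECHILD.
 */
int count_group_waits(int nchildren)
{
    pid_t pgid = 0;

    if (nchildren <= 0)
        return 0;

    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            setpgid(0, pgid);                    /* pgid 0 means "use my own pid" */
            usleep(100000 * (unsigned)(i + 1));  /* 0.1 s, 0.2 s, ... */
            _exit(0);
        }
        if (pgid == 0)
            pgid = pid;                          /* first child leads the group */
        setpgid(pid, pgid);                      /* repeat in parent to avoid a race */
    }

    int reaped = 0;
    for (;;) {
        pid_t done = waitpid(-pgid, NULL, 0);    /* returns for ONE child at a time */
        if (done > 0) {
            reaped++;
            continue;
        }
        if (done == -1 && errno == EINTR)
            continue;                            /* interrupted by a signal; retry */
        break;                                   /* ECHILD: no children left in group */
    }
    return reaped;
}
```

With two children the loop body collects a child twice, which is consistent with the second interpretation: each call returns as soon as any one member of the group has exited.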


Question

Can I put more parameters in sigaction(sig, &sa_action, &old_action), if I specify them in the childHandler? For example, I'd like to give the address of the table where my bg processes are stored. Can I do that and, if yes, should the syntax become sigaction(sig,&sa_action,&old_action,&addr)?

Answer

I would have to say no. If you look at the man page, the prototype is defined something like [on my linux machine]:

int sigaction(int signum,  const  struct  sigaction  *act,
       struct sigaction *oldact);
and consequently you should get a compile time error if you try to pass an extra argument. As always, to be 100% sure what the behaviour would be, you should probably construct a small test example.

Question

We're having some trouble with the siginfo_t* that gets passed to the parent function when a SIGCHLD is sent to it. Sometimes it is already NULL by the time we get it. Right now we handle it by looking through all the groups and figuring out if anybody has died, but this is very inconvenient and causes problems with the rest of the code since we can set the suspended bit in our structure to reflect the actual state of the process. We thought it might just be that the process had died and all the signal data associated with it got free()d, but it also happens on suspend. Any ideas why it would return null so often?

Answer

[Self-answered] Nevermind. I overlooked the new line about flags on the new signal example.


Question

Is it acceptable behavior to wait for the last pipe element to exit, or do we need to wait for the entire group to exit? I think correct behavior is to wait for it all, as my experiments in bash show, but it's ambiguous. In a similar vein, should the notification be by group alone, or is it okay to notify the user of termination of individual elements? Like cat dir.txt | sort | sort would only get one notification in bash upon termination, but I could see it notifying you of each termination. Is there any easy way to tell if an entire process group has exited? (besides keeping track of individual elements).

Answer

According to the project description, "a command line with "pipes" is not considered to have completed until all of its component program invocations exit." To me, that means that you need to wait for the entire group to exit. For the notification, I think the user should be notified when the whole group exits, not individual elements. If you want to do it the other way, just be able to justify it to whomever grades your project. I don't see an easy way of being able to tell if an entire process group has exited in the man page for waitpid. I'll continue to look into this, though, and let you know if I find anything else.
Your shell can keep track of the order of the processes

  a | b | c
and notify the user when c exits, if you want your program to work that way.

Question

it doesn't seem like the new examples are showing up on the web when accessed at http://www.andrew/course/15-412. has the page been recently republished? it sure would be more convenient than typing in the afs path...

Answer

Please use the course URL: http://www.cs.cmu.edu/~412 ... the AFS web space is rarely updated.


Question

We are having trouble with signals. Example: fg'ing a suspended process. When our shell sends a SIGCONT to a suspended process, if the job scheduler decides to execute the parent first then the shell waits. While the shell is waiting, the job scheduler executes the child process which receives the SIGCONT and sends the SIGCHLD signal back to the parent. So at this point the shell gives itself control of the terminal, which is incorrect. But if the shell waits for the SIGCONT first, then the job scheduler might send the SIGNAL and execute the child, so the first wait would not stop until the child received perhaps an exit, and then the second wait would never end. Now what is the correct way to work around this. Perhaps we could have a while loop in our shell:

 (while foreground process not sent to background) 
   wait(foreground pid,&status,WNOHANG);

But wait. The status doesn't tell you if the process suspended. IT ONLY SAYS WIFEXITED(status) or WIFSTOPPED(status) neither of which is nonzero for a suspended process. So how can we know if the process is suspended. Besides which, what does WNOHANG mean? If the last argument is 0 I understand what it means to not have a signal, but why WNOHANG? Does that mean that as long as our process is running, if it doesn't receive a signal then this is a NOP? Also, this loop seems like a waste of cycle time, is that important? So I basically have three questions: 1. Is a while loop the way to go? 2. If not, should we be waiting in the signal handler(this seems like a very bad idea to me.) 3. If a while loop is the way to go, how do we discover if a process is suspended? The WIFSTOPPED checks for SIGSTP not SIGTSTP.

Answer

I'm confused by your use of the words "shell", "parent", and "child". The project description says that fg should be an internal command, which to me means that you shouldn't have to fork a process to run it. So what process are you referring to when you say parent and child? Shouldn't the parent be the shell and the child be the suspended, now running, process? As for WNOHANG, the man page says that if the process with that pid exists but has no status available yet, 0 is returned. So it is not a NOP; the call returns immediately instead of blocking, and execution continues.
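
The WNOHANG behavior is easy to confirm with a tiny experiment. The sketch below (the function name is invented for the example) asks waitpid() about a child that is still running, then cleans the child up.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns what waitpid(pid, ..., WNOHANG) reports for a child that is
 * still running. The child is then killed and reaped; if the fork or
 * that cleanup fails, -2 is returned instead.
 */
int wnohang_on_running_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -2;
    if (pid == 0) {
        for (;;)
            pause();                            /* stay alive until killed */
    }

    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);   /* returns at once */

    kill(pid, SIGKILL);
    if (waitpid(pid, &status, 0) != pid)        /* blocking wait reaps it */
        return -2;
    return (int)r;
}
```

The first call returns 0 because the child exists but has nothing to report. A return of the child's pid would mean a status was collected, and -1 would mean an error.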


Question: what should we do in the case of something like this


program < animals.txt trees.txt > outfile.txt dogs.txt

an actual shell will treat this as a program invocation of
program trees.txt dogs.txt
with stdin taken from animals.txt 
with stdout put into outfile.txt

Answer:

Since the purpose of this assignment is not to test your ability to write a parser, feel free to follow the simpler form of the grammar. There is no harm in having the more realistic parser but it isn't strictly necessary.


Question:

In using setpgid(), it is returning -1, with errno set to EACCES, which according to the man page for setpgid, is that the child process which is the leader of the specified process group has exec'ed. Does this mean that you can't put things into a process group after the group leader has already exec'd?

Answer:

Not exactly. You can only move a child into a different process group using setpgid() before it calls an exec_() function. After that point, although it is still a child, the setpgid() is not permitted.

This is why the sample code has a setpgid() before the exec_().
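
A common pattern, sketched here under the assumption that each job gets its own process group, is to call setpgid() in the child before exec and to make the same call in the parent as well. The function name spawn_in_new_group is made up for this example and is not taken from the sample code.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Start argv[0] in a new process group of its own.
 * Returns the child's pid (which is also its pgid), or -1 if fork failed.
 */
pid_t spawn_in_new_group(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        setpgid(0, 0);            /* child: must happen before exec */
        execvp(argv[0], argv);
        _exit(127);               /* only reached if exec failed */
    }

    /* Parent: make the same call in case it runs before the child does.
       If the child has already exec'ed, this fails with EACCES, which is
       harmless because the child made the call itself first. */
    setpgid(pid, pid);
    return pid;
}
```

Doing it in both places means that, whichever process gets scheduled first, the child is in its new group by the time either side relies on it.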


Question:

What signal does the child send to the parent if it gets a ctrl-z?

Answer:

The short answer to your question is that the parent will get a SIGCHLD signal.

The long answer is that typing a control-Z will cause the terminal driver to send a SIGTSTP signal to the foreground process group. Unless the child catches or ignores it, the default action of SIGTSTP stops the child. When the child stops, the operating system will send the parent notification through a SIGCHLD signal.


Question:

[ paraphrased: ] We get horrible behaviour if we execute a program that makes use of tcsetpgrp().

It is safe to assume that we will not call setpgid(), tcsetpgrp(), or similar process group/job functions from any test program. Although these features are accessible to users, they are typically only used by job control shells or other special cases.

Answer:

The example code that we have provided you is just a simple example; it really isn't intended as anything other than an illustration of the functions.


Question:

Ok, so I am detecting that the child has stopped, and calling tcsetpgrp(0, parentpid), but somehow the parent is not getting back control of the terminal. Any idea why this might be happening?

Answer:

Are you doing this in the parent's SIGCHLD handler? If not, could it be that the parent is still waiting for the child? I don't think that a SIGCHLD changes the child's status in the context of wait.


Question:

Is "waitpid" the only way to avoid zombies or do they disappear when you catch a signal?

Answer:

You can use any of the wait_() family of functions. But they don't go away by themselves and they aren't cleaned up by the signal handler.
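
One common way to do the cleanup, sketched below with invented function names, is a non-blocking waitpid() loop that collects every child that has already finished. The second function is only a demo harness: it polls so that the example can run on its own, which is not how a shell should be structured.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Reap every child that has already terminated, without blocking.
 * Returns how many children were collected by this call.
 */
int reap_finished_children(void)
{
    int reaped = 0;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    return reaped;
}

/* Demo only: fork n children that exit at once, then collect them all. */
int spawn_and_reap(int n)
{
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);
    }

    int total = 0;
    for (int tries = 0; total < n && tries < 1000; tries++) {
        total += reap_finished_children();
        if (total < n)
            usleep(1000);        /* give the remaining children time to exit */
    }
    return total;
}
```

In a shell, a loop like reap_finished_children() would more naturally be driven by SIGCHLD than by a timer, since waitpid() is safe to call from a signal handler. A real version would also pass a status pointer so the job table can be updated.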


Question:

If you suspend a child then kill it, it looks like the child doesn't know it's killed until you continue it. After you continue it, it rapidly sends SIGCHLD that says it continued, then another SIGCHLD saying it was killed. This makes some sense, yet bash and tcsh both manage to detect this (with their usual delay of waiting 'til next command to notify you).

Is it a problem that needs solving in this assignment?

I just tried using wait instead of my signal method, and not even wait is returning any status change of the killed process.. how do normal shells manage this? (trying other methods of polling)

Answer:

I don't see the same behaviour in my implementation of the shell. I wonder if you might have a bug in your implementation or if there is some other difference.


Question:

My partner and I are having some issues with the Process Groups. We've tried a variety of things, and here's what we've come up with:

-After the fork(), the parent is the terminal controller, as well as the session leader.
-We call setpgrp() and we can put the child into a new process group. However, the parent's group still remains the session controller.

So, we'd like to change the terminal group to the child's group, however, we can't seem to do this. When we try to use tcsetpgrp in the child, the process seems to hang.

We check to see if the process is a foreground process, and if it is we use tcsetpgrp() to set the child as the terminal controller. After a foreground process is complete we attempt to set the terminal controller back to the parent.

Answer:

The (sketch) code that you sent looked about correct. You're right that you need to set the controlling terminal. As a first guess, have you correctly blocked the SIGTTIN and SIGTTOU signals (see the terminal groups example code on the web)?


Question

We tried using more &. It showed the first screenful and then our shell had control of the terminal again. More & is also interactive, and the login shell suspends ttyoutput. Is the correct behavior from our shell what is described above? We don't really know what sort of stuff we should test for correct behavior. Could you give some examples, and what _should_ happen without implementing everything the login shell is?

Answer

Yes, I would expect more to suspend on ttyoutput when it is run in the background, since it is trying to read from the terminal. If your shell can emulate the behavior of a standard Unix shell like tcsh or bash, that is fine. Don't worry about executing a job control shell like ysh or tcsh from within your shell. That is more than we're asking for in this assignment. I would suggest testing the following combination of test cases:

job doesn't interact with terminal, job reads from terminal, job writes to terminal, job reads/writes from/to terminal.

job run in background, run in foreground, run in foreground then bg'd, run in background, then fg'd.


Question

I'm having some problems writing code that can tell when a child process was suspended (e.g, with ^Z). When the child gets stopped, the parent immediately gets a SIGCHLD. Inside ysh's handler for sigchld, the following code runs:

  rval = waitpid( fg1->fg_pid[0], &stat, WNOHANG);
  if(rval<0)
      perror("sigchld waitpid: ");

  printf("handle_sigchld: rval is %d.\n", rval);

The rval returned by waitpid, surprisingly, is 0. If I've understood man pages correctly, this means that the kernel was unable to deliver any new news to ysh about the child. I've verified that the PID being passed in this waitpid call is in fact the PID of the child I'm interested in. So, I don't understand why wait wasn't returning information indicating that the child process had been suspended.

Answer

Your signal handler is also passed a status in the siginfo_t structure which can tell you the reason why your signal handler is invoked. Alternatively, there is a special flag, WUNTRACED, which you can pass to wait which will cause wait to return for children which are stopped, whose status has not yet been reported. Interestingly, this flag is documented in Linux but not in Solaris :(
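
As a minimal illustration of the WUNTRACED route, the sketch below stops a child and checks that waitpid() reports the stop. The function name is invented for this example, and SIGSTOP sent with kill() stands in for a ^Z typed at the terminal.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child, stop it, and check whether waitpid() with WUNTRACED
 * reports the stop. Returns the stopping signal number on success,
 * or -1 if the stop was not reported.
 */
int stop_signal_seen(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();
    }

    kill(pid, SIGSTOP);                        /* stands in for ^Z here */

    int status;
    int result = -1;
    if (waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status))
        result = WSTOPSIG(status);             /* which signal stopped it */

    kill(pid, SIGKILL);                        /* clean up the demo child */
    waitpid(pid, &status, 0);
    return result;
}
```

Inside a SIGCHLD handler the same check would normally use WUNTRACED | WNOHANG so that the handler never blocks.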


Question

Is there a signal that gets passed when a child is suspended? I.e. through SIGCHLD, and if so, how does one check for it (like a WIFEXITED functionality)?

Answer

The parent gets a SIGCHLD when its child is suspended. It can be checked for as described in the previous answer (look at the macro WIFSTOPPED in /usr/include/sys/wait.h).


Question

With WUNTRACED, am I just allowed to poll the process, or does that violate the no busy-wait rule?

Answer

You're correct in thinking that polling is not a good design choice - it needlessly consumes CPU cycles. (However, polling is better than nothing at all).

Remember, though, that signals are asynchronous. Your SIGCHLD signal handler will only get invoked when a child's status changes.


Question

My partner and I are having a problem with commands that output to the terminal. When we do a "more file" it does not display anything and it zeros out the file. What could be causing such a bug?

Answer

One possible reason may be that more is writing to your file instead of stdout. This could be caused by an incorrect dup2() call somewhere in your shell. Remember that a child inherits the file descriptors of its parent.
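
A sketch of the usual arrangement, assuming the shell performs the redirection in the child after fork() and never in the shell process itself, is shown below. The function name and the use of write() in place of an exec are simplifications made for this example.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run a child whose standard output is redirected to path, and have it
 * write msg there. The redirection happens after fork(), so the parent's
 * own stdout is untouched. Returns the child's exit status, or -1.
 */
int write_via_redirect(const char *path, const char *msg)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            _exit(1);
        if (dup2(fd, STDOUT_FILENO) < 0)
            _exit(1);
        close(fd);                 /* stdout now refers to the file */
        /* A real shell would exec the command at this point. */
        if (write(STDOUT_FILENO, msg, strlen(msg)) < 0)
            _exit(1);
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the dup2() were done in the shell itself, or the redirection were otherwise left in place, every later command would inherit the file as its stdout. Reopening the file with O_TRUNC would also empty it, which is one plausible way to end up with the symptoms described.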


Question

What does WTERMSIG(status) check for exactly?

Answer

WTERMSIG(status) returns the number of the signal that caused the child process to terminate. This macro can only be evaluated if WIFSIGNALED returned non-zero.
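
For example, the following sketch (with an invented function name) kills a child with a given signal and then decodes the status, checking WIFSIGNALED before using WTERMSIG.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child, send it sig, and report how it ended.
 * Returns the terminating signal number if the child was killed by a
 * signal, 0 if it ended some other way, and -1 on error.
 */
int kill_and_get_signal(int sig)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();               /* wait around to be killed */
    }

    kill(pid, sig);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))       /* only then is WTERMSIG meaningful */
        return WTERMSIG(status);
    return 0;
}
```

Calling kill_and_get_signal(SIGTERM) returns SIGTERM, since the child takes the default action for that signal.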


Question

Sorry to send a second email out tonight, but I have another question. Is there a signal that will tell a suspended process to begin executing in the background? (I'm sure the reasons for this question are obvious)

Answer

SIGCONT will cause a suspended process to resume execution. Whether or not it is a foreground or background process depends on the process group that it is in (and whether or not the shell is waiting for it to finish).


Question

We're having somewhat of a problem with our background command. We run a process (xterm), then halt it with ctrl+z, which works fine. Then when we run bg %xxxxx, the background process continues fine, but the shell itself loses the input to the terminal (although the output is retained). (the prompt shows up, but the lines typed in never reach the shell).

The current code to handle the bg command looks like this:

 sigsend(P_PGID, j->groupid, SIGCONT);
 tcsetpgrp(0, jobs->yshid);
 changest(jobs, j->groupid, STATE_BG);
 jobs->exitstatus=0;

We've also tried to use 1 and 2 as handles to the terminal, which hasn't helped, and we've tried to set the foreground first to the process and then back to ysh after the signal.

That has not helped. For some reason, sending the signal severs the connection. (sending no signal, or an empty signal does not)

Do you know what could be causing the problem?

Answer

The approach used in your code looks OK to me. When you say that the input never reaches the shell, do you mean that your fgets() or equivalent is not returning? Or is your shell suspended on tty output?

I would suggest checking the return code from tcsetpgrp, and making sure that you are blocking SIGTTIN and SIGTTOU. Other than that, I would have to take a look at your code to give an answer.


Question

I'm having some problems writing code that can tell when a child process was suspended (e.g, with ^Z). I've been testing with starting cat in the foreground, then trying to suspend it.

When I fork a child process in the foreground, I wait for it to finish with this code:

ret = waitpid(childpid[0], &exit_status, 0);

When I press ^Z, I can tell that cat actually gets stopped, as indicated by below output.

%ps
   PID TT       S  TIME COMMAND
  9282 pts/31   S  0:00 -tcsh
 23283 pts/31   S  0:00 ./ysh
 23284 pts/31   T  0:00 cat

When the child gets stopped, the parent immediately gets a SIGCHLD -- it is my understanding that this is intended to notify the parent (ysh) that the child (cat) has been suspended.

    Inside ysh's handler for sigchld, the following code runs:

  rval = waitpid( fg1->fg_pid[0], &stat, WNOHANG);
  if(rval<0)
      perror("sigchld waitpid: ");

  printf("handle_sigchld: rval is %d.\n", rval);

The rval returned by waitpid, surprisingly, is 0. If I've understood man pages correctly, this means that the kernel was unable to deliver any new news to ysh about the child. I've verified that the PID being passed in this waitpid call is in fact the PID of the child I'm interested in. So, I don't understand why wait wasn't returning information indicating that the child process had been suspended (suspended == stopped, detected by WIFSTOPPED macro, correct?).

After my SIGCHLD handler concludes that it has nothing to do (as it hasn't found any suspended processes to take note of), it returns. Execution continues with the return of the waitpid() call placed after I forked the child. This waitpid returns -1, and sets errno to indicate an "Interrupted system call." Calling waitpid() again to check for information about the process doesn't yield any useful information either. So, I'm at a loss as to how to find out that my child process was actually suspended, so I can act accordingly.

I hope I've given you enough information (but not too much as to make it hard to understand) to make my situation clear. I'd be happy to send you complete copies of the source if that might help. If you have any ideas for what I'm doing wrong, or what the correct approach is in this situation, I'd really appreciate any advice you could offer.

Answer

Your signal handler is also passed a status in the siginfo_t structure which can tell you the reason why your signal handler is invoked. Alternatively, Chris tells me that there is a special flag, WUNTRACED, which you can pass to wait which will cause wait to return for children which are stopped, whose status has not yet been reported. Interestingly, this flag is documented in Linux but not Solaris :(


Question

Is there a signal that gets passed when a child is suspended? I.e. through SIGCHLD, and if so, how does one check for it (like a WIFEXITED functionality)?

Answer

Yep, the parent gets a SIGCHLD when its child is suspended. If the parent is handling that signal, you can check the status code that is passed to the signal handler, or you can do a wait...() system call with the special flag, WUNTRACED (which causes wait to return for children which are stopped and whose status has not yet been reported). (Look in /usr/include/sys/wait.h for relevant macros.)


Question

What kind of status code passed to the handler would indicate that the process was suspended?

And with WUNTRACED, am I just allowed to poll the process (probably with WUNTRACED | WNOHANG), or does that violate the no busy-wait rule?

Answer

You're correct in thinking that polling is not a good design choice: it needlessly consumes CPU cycles. (However, polling is better than nothing at all.)

Remember, though, that signals are asynchronous. Your SIGCHLD signal handler will only get invoked when a child's status changes.

If I remember correctly, the macro WIFSTOPPED(status) will return true when a process is suspended. If this is incorrect, experiment with the other macros in wait.h, print out the actual status code, etc.
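As a rough illustration of the WUNTRACED and WIFSTOPPED combination, the sketch below waits (blocking, no polling) for a child that stops itself. The helper names waitpid_retry and stop_signal_of_child are invented for this example, and the self-stopping child is only a stand-in for a suspended foreground job. The retry loop is one common way to cope with waitpid() being interrupted by a signal handler (the "Interrupted system call" error).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* waitpid() that is simply retried if a signal handler interrupts it. */
static pid_t waitpid_retry(pid_t pid, int *status, int options)
{
    pid_t r;
    do {
        r = waitpid(pid, status, options);
    } while (r < 0 && errno == EINTR);
    return r;
}

/* Hypothetical helper: returns the signal that stopped the child,
 * 0 if the child was not reported as stopped, or -1 on error. */
int stop_signal_of_child(void)
{
    int status = 0;
    int result;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(SIGSTOP);             /* child suspends itself */
        _exit(0);
    }

    /* WUNTRACED: also return for children that have stopped. */
    if (waitpid_retry(pid, &status, WUNTRACED) != pid)
        result = -1;
    else if (WIFSTOPPED(status))
        result = WSTOPSIG(status);  /* which signal stopped it */
    else
        result = 0;

    kill(pid, SIGKILL);             /* clean up the stopped child */
    waitpid_retry(pid, NULL, 0);
    return result;
}
```

Here stop_signal_of_child() should return SIGSTOP. For a job suspended with ^Z from the terminal, the reported signal would normally be SIGTSTP instead.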


Question

My partner and I are having a problem with commands that output to the terminal, such as more and cat. When we do a "more file" it does not display anything and it zeros out the file. If we were to "ls > file" and then "more file" the problem occurs. But if we were to exit out of our shell after the ls command and then do a more, it will output the correct info. What could be causing such a bug?

Answer

There are several things which could cause such a bug. One possible reason may be that more is writing to your file instead of stdout. This could be caused by an incorrect dup2() call somewhere in your shell. Remember that a child inherits the file descriptors of its parent.
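One way to avoid this kind of leak is to do the redirection only in the child, after fork() and before exec, so the shell's own stdout is never touched. The following is a sketch under that assumption; run_with_stdout_to is a made-up name and not part of any sample code.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its stdout redirected to path.
 * Returns the child's exit status, or -1 on error. */
int run_with_stdout_to(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child only: open the file and make it file descriptor 1. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            _exit(127);
        if (dup2(fd, STDOUT_FILENO) < 0)
            _exit(127);
        close(fd);                  /* fd 1 now refers to the file */
        execvp(argv[0], argv);
        _exit(127);                 /* reached only if exec failed */
    }

    /* Parent: its stdout still points at the terminal, so the next
     * command it forks does not inherit the redirection. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the dup2() were done in the parent instead, every later child (including "more file") would inherit the redirected stdout, which matches the symptom described in the question.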


Question

Another quick question (sorry to abuse this email address) ... what does WTERMSIG(status) check for exactly? It was in the sample code. Is this only for processes terminated by a signal? What kind of signal would cause this to be true?

Answer

WTERMSIG(status) returns the number of the signal that caused the child process to terminate. This macro can only be evaluated if WIFSIGNALED returned non-zero.

Many signals could cause a process to terminate; two examples are SIGINT and SIGTERM.

WIFSIGNALED(stat_val) Evaluates to a non-zero value if status was returned for a child process that terminated due to the receipt of a signal that was not caught.

WTERMSIG(stat_val) If the value of WIFSIGNALED(stat_val) is non-zero, this macro evaluates to the number of the signal that caused the termination of the child process.
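A small sketch of how the two macros fit together, assuming a POSIX system. The helper name terminating_signal_of_child is invented for this example, and the child signals itself only so that the example is self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that sends itself signo, and
 * return the signal that terminated it, 0 if it exited normally,
 * or -1 on error. */
int terminating_signal_of_child(int signo)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        sigset_t set;
        /* Make sure the signal is neither caught, ignored, nor blocked. */
        signal(signo, SIG_DFL);
        sigemptyset(&set);
        sigaddset(&set, signo);
        sigprocmask(SIG_UNBLOCK, &set, NULL);
        raise(signo);
        _exit(0);                   /* only reached if the signal did not kill us */
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))        /* check this first ... */
        return WTERMSIG(status);    /* ... then WTERMSIG is meaningful */
    return 0;
}
```

For example, terminating_signal_of_child(SIGTERM) should return SIGTERM, since the default action for SIGTERM is to terminate the process.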


Question

The problem is that when ls | more is run, when ls finishes, it sends a signal to the parent process. However, when more subsequently finishes, it does not seem to send any signal at all, which is very strange.

Answer

If the parent process (ysh) creates a pipe and doesn't remember to close its own copies of the pipe's file descriptors (after the fork, of course), more will never see an end-of-file from the pipe. This happens because there is still an open reference to the write end, so more never finishes. This is the behaviour that you saw.
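The sketch below shows the pattern with two stand-in children instead of real ls and more: one writes six bytes into the pipe, the other reads until end-of-file and reports how many bytes it saw through its exit status. The function name pipe_bytes_seen_by_reader is hypothetical. The key lines are the two close() calls in the parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of bytes the reader child
 * saw before end-of-file, or -1 on error. */
int pipe_bytes_seen_by_reader(void)
{
    int fds[2], status;
    pid_t writer, reader;

    if (pipe(fds) < 0)
        return -1;

    writer = fork();
    if (writer == 0) {              /* stands in for "ls" */
        close(fds[0]);
        if (write(fds[1], "hello\n", 6) != 6)
            _exit(1);
        close(fds[1]);
        _exit(0);
    }

    reader = fork();
    if (reader == 0) {              /* stands in for "more" */
        char buf[64];
        ssize_t n;
        int total = 0;
        close(fds[1]);              /* drop our own copy of the write end */
        while ((n = read(fds[0], buf, sizeof buf)) > 0)
            total += (int)n;
        _exit(total);               /* read() returned 0: end-of-file */
    }

    /* The parent (the shell) closes BOTH ends. If it kept fds[1] open,
     * the reader's read() would block forever waiting for more data. */
    close(fds[0]);
    close(fds[1]);

    if (writer > 0)
        waitpid(writer, NULL, 0);
    if (reader < 0 || waitpid(reader, &status, 0) != reader)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With the parent's close() calls in place this should return 6. Commenting them out is an easy way to reproduce the hang from the question.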


Question

"A DMA based network-transfer is initialized by the currently running process."

So, this is very vague. Is the process expecting to block until this finishes, what are the parameters, etc? Are we talking about a read() or write() on a socket, a socket() call itself, or just some random process trying to use DMA directly (which would be flatly denied)?

It looks to me that by initialized, it should mean that it's not blocking, so it doesn't really need to switch queues. But this really depends on the situation, I think.

Answer

In this question, there may be more than one correct answer, depending upon your assumptions, so state them clearly.

One assumption that is reasonable here is that the process has initiated the DMA-based transfer via a socket write(), and that the act of initiating the DMA transfer does not cause the process to block.


Question

Are we to assume for problem #3 (CPU occupancy) that no task can use the CPU while Disk/Printer is in use? I'm struggling between the notion of "CPU occupancy chart" and non-CPU tasks. If this is the case (that CPU, Disk and Printer tasks are treated as CPU-consuming tasks for the occupancy chart), I'm assuming the point of the notion that the Disk/Printer can't be interrupted is that they will run beyond 10ms under Round Robin... correct?

Answer

In this question, you can assume that different processes can use the CPU, disk, and printer simultaneously, but that the SAME process can only use one of these at any given time.

Round-robin scheduling is only being used for CPU scheduling. As you say, the disk and printer can not be interrupted, so any I/O operation that is started will run to completion before the next I/O operation begins.


Question

I am very confused on this problem. I was told by someone who emailed earlier that we were not allowed to schedule the CPU and I/O at the same time in this problem. It is possible that he may have been misinformed, thus I am asking the question. I observe that all of process X's CPU bursts are shorter than those of process Y. If this restriction is in place, it seems to me that the SJF case would just reduce down to the Run Until Completion case.

Are my thoughts correct, or do I just have some fundamental misunderstanding of the SJF algorithm?

Answer

You can assume that two different processes can use CPU and I/O simultaneously, but that a single process can not use the CPU and I/O at the same time (i.e. the I/O operations are blocking).


Question

In the interim, I have thought of another question: is the SJF algorithm that we have to implement preemptive or nonpreemptive? The book describes both.

Answer

You can use the non-preemptive version of SJF for this question.


Question

I have asked two different students how to interpret question three. My question is whether process x can run cpu and i/o bursts while process y is held up in an i/o burst (assuming it's not the same i/o device).

Unfortunately both people I questioned were also confused as to how to interpret the question and received differing answers from the staff of the course.

Anyhow, which way should I go in answering the question?

Answer

Here is the hardware model:

Different processes can use different components (CPU, disk, or printer) at the same time. A single process can only use one of these components at any given time (i.e. I/O is blocking).

What may be confusing is that scheduling algorithms can impose additional constraints above and beyond what the hardware may allow (for instance, Run Until Completion).


Question

Is SJF in #3 preemptive or non-preemptive?

If preemptive, how are shortest-job-remaining ties solved?

Answer

In #3, you can assume non-preemptive SJF (as described in the book on pg. 130).


Question

In problem 7, the only difference between this code and the correct code we were taught in class (Peterson's Algorithm) is that (1-i) is replaced by (i+1)mod2. Is this going to cause a problem? Could you give me any hint?

Answer

This is what you need to determine.


Question

With regard to the code from problem 7 listed below:


    /*
     * Shared information
     */
    int turn; /* Turn is initially 0 */
    boolean flag[2]; /* Both are initially false */
     
    /*
     * Consider 2 processes, pid=0 and pid=1
     */
    process (int pid)
    {
	  while (1)
	  {
		NonCriticalCode();
		flag[i] = TRUE;
		turn = (i+1) mod 2;
		while ((flag[(i + 1) mod 2]) && (turn == (i+1) mod 2));
		CriticalSection()
		flag[i] = FALSE;
	  }
    }

Where is the variable ``i'' declared and what is its initial value? Should it really be ``pid'', or is it another value altogether?

Answer

Yes, it should really read pid in that problem.


Question

I had a question about shortest job first. When does it decide to schedule the next job? In the problem, obviously process X will schedule 5 ms of CPU. Then process X will schedule its 10 ms Disk access. But my question is: will the scheduler decide to schedule process Y's 15 ms of CPU while process X is reading the disk, or will the OS wait until process X finishes the disk read to schedule, in which case the shortest job will be the 5 ms of CPU time for process X?

Answer

You can assume the non-preemptive version of SJF, so the scheduler will schedule the next process whenever the CPU is idle.

Also, you can assume a hardware architecture in which different processes can use different components at the same time.
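As a rough sketch of that rule, the function below is what a non-preemptive SJF scheduler might call each time the CPU becomes idle: it looks only at processes that are ready (not blocked on I/O) and picks the one with the shortest next CPU burst. The struct layout, the names, and the tie-breaking rule (lowest index wins) are all assumptions made for this illustration, not part of the problem statement.

```c
#include <stddef.h>

/* Hypothetical process record for the illustration. */
struct proc {
    const char *name;
    int next_cpu_burst_ms;  /* length of the next CPU burst */
    int ready;              /* 1 if ready to run, 0 if blocked on I/O */
};

/* Called whenever the CPU is idle. Returns the index of the ready
 * process with the shortest next CPU burst, or -1 if nothing is ready
 * (the CPU then stays idle). Ties go to the lower index, which is an
 * arbitrary assumption. */
int sjf_pick(const struct proc *procs, size_t n)
{
    int best = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!procs[i].ready)
            continue;
        if (best < 0 ||
            procs[i].next_cpu_burst_ms < procs[best].next_cpu_burst_ms)
            best = (int)i;
    }
    return best;
}
```

With X blocked on the disk and Y ready with a 15 ms burst, sjf_pick() returns Y, even though X's next burst is shorter. Because the algorithm is non-preemptive, Y then keeps the CPU until its burst ends.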


Question

maybe i'm just reading this question or answer wrong.. but it SOUNDS like they're saying that an interrupt goes off every time quanta and interrupts the process no matter what.. meaning:

process x: CPU(5), disk(5), CPU(10)
process y: CPU(10), printer(10)
time quanta of 10

i'm interpreting what they say below to mean that execution goes as follows,

x runs for 5 on CPU, then starts DISK
y then runs on CPU for 5 (stopping due to interrupt)
then x takes over again.

i THOUGHT round-robin proceeded as follows: x runs for 5 on CPU, then starts DISK y runs on CPU for 10 (timer starts over) etc...

This is what i gleaned from looking at the example on page 136 in the book because P2 runs for 3 (quanta of 4) then P3 starts running and continues for its needed 3, then P1 takes over etc...

Answer

You can assume non-preemptive SJF in problem 3.

For Round Robin, I agree that the book is unclear. In most systems that I know of, timer interrupts are delivered at regular intervals. In your example, that means that process y gets interrupted after running for only 5 ms. I think that this is the best assumption for you to make in this problem.


Question

For the round robin alg.: Say process x runs on the cpu for the first 5 ms of its timeslice. I assume y runs for the next 5ms (if it wants to). But then which has preference at the end of these 10ms if they both want the cpu. x because y was just executing? or y because x had the choice last time?

For the round robin and sjf: Does the system allow one process to be using one resource while the other process is using another? ie x using the disk while y uses the cpu. In other words does this system have some sort of DMA?

These seem like implementation differences which would cause drastically different solutions.

Answer

For RR scheduling: When the timer goes off at the end of a time quantum, the scheduler puts the process that is currently executing at the end of the ready queue (no matter how long it has been executing).

For your 2nd question: You can assume that the hardware supports DMA, so different processes can use different components (disk, printer, CPU) simultaneously. Of course, the particular scheduling algorithm being used has to allow this, too.
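The round-robin rule described above (the running process goes to the tail of the ready queue when the quantum expires) can be sketched with a small circular queue. The type and function names here are made up for illustration, and the fixed capacity is an arbitrary assumption.

```c
#define RQ_CAP 8   /* arbitrary capacity for the sketch */

/* Hypothetical FIFO ready queue of process ids. */
struct ready_queue {
    int pids[RQ_CAP];
    int head;    /* index of the oldest entry */
    int count;   /* number of entries in use */
};

/* Append a process at the tail (silently dropped if the queue is full). */
void rq_push(struct ready_queue *q, int pid)
{
    if (q->count < RQ_CAP) {
        q->pids[(q->head + q->count) % RQ_CAP] = pid;
        q->count++;
    }
}

/* Remove and return the process at the head, or -1 if the queue is empty. */
int rq_pop(struct ready_queue *q)
{
    int pid;

    if (q->count == 0)
        return -1;
    pid = q->pids[q->head];
    q->head = (q->head + 1) % RQ_CAP;
    q->count--;
    return pid;
}

/* Timer went off: the running process (if any, pass -1 for none) goes
 * to the END of the ready queue, no matter how long it has run, and the
 * process at the head runs next. */
int rr_on_quantum_expiry(struct ready_queue *q, int running)
{
    if (running >= 0)
        rq_push(q, running);
    return rq_pop(q);
}
```

If process 1 is running and process 2 is waiting in the queue, the next timer interrupt switches to 2 and queues 1 behind it. If nothing else is ready, the running process is simply picked again.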


Question

I think what he was asking was also this: If both processes start to do I/O, and the CPU is dilly dallying about waiting for these IOs to finish, and then both processes cease to do IO at the same time, which process does the kernel give it to? My answer would be "the process that did not have it last". This would make sense in the round robin theme, but it could be that the kernel lets whoever had it last finish up the CPU quanta they had already earned, or it could be the kernel flips a coin, or it could be that the kernel flushes them both down the toilet for being mean to its poor brain.

Answer

In a real system, it isn't possible for two processes to become ready at the same time. One process must be marked ready first since the scheduler is serialized.

However, this problem may be a different story since you don't have access to that level of detail. If such a situation arises, you can make any reasonable assumption for the behavior (as long as you state what that assumption is).


Question

In the book, they are apparently assuming that they are getting clock interrupts at a finer granularity than the time quantum. It looks like they are getting clock interrupts at least at a 1 ms. granularity, and their time quantum is 4 ms.

Typically, clock chips are programmed to deliver interrupts at a regular interval (like 10 ms.), rather than programmed to deliver the next interrupt at a precise time.

Answer

In our problem, you can assume that the clock granularity is 10 ms., the same as the time quantum. Therefore, on each clock interrupt, the scheduler must make a decision which process to run next. If a process starts executing in the middle of a time quantum, it is not guaranteed to run for a full time quantum.
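One way to picture this assumption: with clock interrupts at fixed multiples of 10 ms, a process that gets the CPU partway through an interval runs only until the next interrupt (or until its burst ends, whichever comes first). The helper names below are hypothetical and the arithmetic is just a sketch of that rule.

```c
/* Time from now_ms until the next clock interrupt, assuming interrupts
 * arrive at exact multiples of tick_ms (0, tick_ms, 2*tick_ms, ...). */
int ms_until_next_tick(int now_ms, int tick_ms)
{
    return tick_ms - (now_ms % tick_ms);
}

/* How long a process dispatched at now_ms keeps the CPU before it
 * either finishes its burst or is interrupted by the next tick. */
int cpu_slice_ms(int now_ms, int tick_ms, int burst_left_ms)
{
    int until_tick = ms_until_next_tick(now_ms, tick_ms);
    return burst_left_ms < until_tick ? burst_left_ms : until_tick;
}
```

For instance, a process with a 10 ms burst that is dispatched at t = 5 ms gets cpu_slice_ms(5, 10, 10) = 5 ms before the scheduler has to make its next decision.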