.FP lucidasans
.TL
C Programming in Plan 9 from Bell Labs
.AU
Pietro Gagliardi
.AB
This paper is an introduction to programming with Plan 9 from Bell Labs with the C language.
Plan 9 provides not only a significantly improved version of C, but also a number of programming libraries to simplify complicated tasks.
This paper is meant to be a supplement to the manual pages, other documents provided by the system in
.CW /sys/doc ,
and a programmer's literature collection.
.AE
.\" started July 8, 2008
.nr XT 4 \" tab in program is four spaces
.EQ
delim @@
.EN
.de Bx \" BX, but works like B, I, BI, CW
.ie t \\&\\$3\(br\|\\$1\|\(br\l'|0\(rn'\l'|0\(ul'\\$2
.el \\&\\$3\(br\\kA\|\\$1\|\\kB\(br\v'-1v'\h'|\\nBu'\l'|\\nAu'\v'1v'\l'|\\nAu'\\$2
..
.NH
Introduction
.PP
Plan 9 from Bell Labs has always been a system above the rest: simple, portable, and feature-complete.
It isn't
.UX ;
rather, it improves on the basics of
.UX
by providing a number of features absent from most other operating systems.
One of those features is a great programming environment that rivals
.UX 's.
Plan 9 is fully Unicode-conformant through its nearly universal use of the UTF-8 encoding, brought to us by two of the people that brought us Plan 9.
It not only keeps the C language of old, but through the work of Ken Thompson, it provides a C that makes some otherwise complicated constructs straightforward.
Backing this new C are 33 programming libraries that significantly reduce the amount of code a programmer needs to write.
And every single line of this code is fully portable among different Plan 9 installations, even with different architectures — the notion of a
.CW configure
script has been vanquished at last.
.PP
Learning programming with Plan 9 is not something that requires complicated textbooks and four years of college study to master.
In fact, with just the manual pages and pages of some documentation in hand, someone can quickly master the core concepts.
However, a starter's guide or tutorial is sometimes needed to get going or to clear up some uncertainty.
Providing one is what this paper aims to do.
This paper is
.I not
a full reference to Plan 9's programming environment — the manual pages do that.
Keep this in mind while you read.
.PP
You need to know how to use Plan 9 from Bell Labs, rc, an editor such as sam or acme, and the C programming language to start.
The official guide to C is Prof. Brian Kernighan and Dennis Ritchie's
.I "The C Programming Language" ,
now in its second edition.
Read through it: you'll learn quite a lot.
.NH
Core Concepts
.PP
Here is Kernighan's quintessential "hello, world" program in Standard C, with a few differences from the one in Kernighan's book (for exposition purposes).
.P1
#include <stdio.h>
int main()
{
	printf("hello, world\en");
	return 0;
}
.P2
Now here it is as a Plan 9 programmer would write it.
.P1
#include <u.h>
#include <libc.h>
void
main()
{
	print("hello, world\en");
	exits(0);
}
.P2
Immediately, expert C programmers will say things like “Where did
.CW stdio
go?” and shout at the top of their lungs things like “You can't declare
.CW main
as returning
.CW void !”
If you're one of these guys, then you had better get used to it.
.PP
The include file
.CW u.h ,
stored in
.CW /$objtype/include
where
.CW $objtype
is an environment variable storing the current CPU name,
contains CPU-specific definitions.
All header files in Plan 9 use this, so it must be included first.
Next comes
.CW libc.h ,
stored in
.CW /sys/include .
.CW libc.h
contains the definitions for the C library, which is linked into every Plan 9 program.
The C library consists of several parts:
.IP \(bu
All the Plan 9 system calls (save for a few that only the library uses)
.IP \(bu
A set of subroutines to facilitate using the system calls
.IP \(bu
The formatted print routines
.IP \(bu
Mathematical functions
.IP \(bu
Time functions
.IP \(bu
Functions for working with Unicode characters, or
.CW Rune s
.LP
.CW libc
must be second; it is used by most, if not all, other libraries.
.PP
The
.CW print
function is a member of the set of formatted print routines; it works just like
.CW printf
in C, with several minor differences:
.IP \(bu
The
.CW %u
format is gone; it has been replaced with the
.CW u
modifier to other integer formats.
So instead of saying
.CW %-3lu ,
you say
.CW %-3uld .
.IP \(bu
The
.CW %b
format is provided for printing binary numbers.
.IP \(bu
The
.CW %C
and
.CW %S
formats are provided for printing Unicode characters, called
.CW Rune s,
and strings of
.CW Rune s,
respectively.
They are discussed later.
.IP \(bu
The
.CW ll
modifier flag to integral formats prints
.CW vlong s,
which are described later.
.IP \(bu
The
.CW %r
format prints the error string, which is described next.
.IP \(bu
You can create your own formats; that is described later.
.LP
Otherwise,
.CW print
behaves the same as
.CW printf .
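.PP
As a small illustration of the differences listed above, here is a minimal sketch.
The values are arbitrary and chosen only for demonstration.
.P1
#include <u.h>
#include <libc.h>

void
main(void)
{
	ulong n;

	n = 42;
	print("[%-3uld]\en", n);	/* u modifier: unsigned long, left-justified */
	print("%b\en", 5);		/* binary */
	print("%x\en", 255);		/* the rest is as in printf */
	exits(0);
}
.P2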
.PP
.CW exits
and the
.CW void
return from
.CW main
require a bit of explanation.
The traditional way of representing errors and status returns in C is with numbers: a return from
.CW main
or the argument to
.CW exit
represents a status return from a program, and
.CW errno
stores information about error returns from functions.
The traditional behavior is to have zero mean no error and any other value mean error; ANSI C defines
.CW EXIT_SUCCESS
and
.CW EXIT_FAILURE
for status returns from programs.
.PP
This gets restrictive very quickly.
ANSI C only defines three standard values for
.CW errno
(domain error, range error, and illegal multibyte sequence) and two values for status return.
And sometimes an integer won't tell you enough.
For example, let's take the
.UX
.CW lseek
system call, which manipulates the file read/write position:
.P1
long lseek(int fd, long offset, int from);
.P2
If any argument is invalid (for example,
.CW from
not 0, 1, or 2),
.CW lseek
returns with
.CW errno
set to
.CW EINVAL
(specific to
.UX ).
But this doesn't tell you
.I which
argument was invalid, or how many; it only says that something was not right.
We can add the appropriate
.CW errno
values to resolve this problem.
But what about a library that defines over 1,000 values for
.CW errno ?
On machines with small
.CW int
sizes, this chokes your program and defeats the purposes of both sides.
.PP
A better idea is to give the programmer the ability to handle any error that comes in without worry of losing standards compliance or clarity, and to generate any error without falling into a surfeit of possibility.
So the designers of Plan 9 decided to use strings instead of numbers.
Each program has an
.I "error string"
which is set by routines when an error occurs.
And each program returns a string to the host environment with the
.CW exits
routine.
The value given to
.CW exits
can be accessed from rc through the environment variable
.CW $status .
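.PP
To make both halves concrete, here is a minimal sketch.
The file name is hypothetical;
.CW open ,
.CW fprint ,
and file descriptor 2 (standard error) are covered under Manipulating Files.
If the call fails,
.CW %r
prints whatever error string the system set, and the string given to
.CW exits
becomes the program's status.
.P1
#include <u.h>
#include <libc.h>

void
main(void)
{
	int fd;

	fd = open("/tmp/no-such-file", OREAD);	/* hypothetical path */
	if(fd < 0){
		fprint(2, "open failed: %r\en");	/* %r takes no argument */
		exits("open");	/* rc sees this in $status */
	}
	close(fd);
	exits(0);
}
.P2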
.PP
So with a string, how do you represent a lack of error?
Why, with a null pointer or null string!
Because the constant
.CW 0
turns into a null pointer, the statement
.P1
exits(0);
.P2
does everything already.
Of course you can also say
.P1
exits(nil);
.P2
or
.P1
exits("");
.P2
.CW nil ,
in
.CW u.h ,
is Plan 9's
.CW NULL .
.PP
So how does this explain why
.CW main
has to return
.CW void ?
You can't return a string placed in automatic storage from a function:
.P1
char *
f(void)
{
	auto char s[] = "hello";
	return s; /* WRONG */
}
.P2
But a programmer may store the exit status of a program in this way.
.SH
An Aside on Style
.PP
Plan 9 programs are usually written to conform to a predefined set of style guidelines, described in the manual page
.I style (6),
for the sake of uniformity.
Here is a taste:
.P1
static
int
func(int f, char *names[])
{
	int i, j;

	j = 5;
	acquirelock();
	for(i = 0; i < j; i++){
		process(i, &j);
		if((j = g(&i)) == 0 ? h() : k())	/* g() affects h()/k() */
			if(strcmp(names[0], names[1]) == 0)
				something();
	}
	return j - i;
}
.P2
Of course this piece of code doesn't do anything sensible by itself.
It was written to show the basics of this style.
If you want to contribute to Plan 9, be sure to use this style.
Of course, you can still use your favorite style elsewhere.
.NH
Compiling Programs
.PP
.UX
compilers give you the option of compiling a program in one shot:
.P1
$ cc a.c b.c # compile and link; creates a.out
$ a.out # run
.P2
or in pieces:
.P1
$ cc -c a.c # compile; creates a.o
$ cc -c b.c # compile
$ cc a.o b.o -lS # link; creates a.out. you can also use ld and omit -lS
$ a.out # run
.P2
Plan 9 gives you no choice but to do the latter, but with
.CW ld
instead of
.CW cc
for the final stage.
On top of that, there is no single C compiler and no single linker — there is one of each for each supported processor architecture.
.PP
What are the benefits to this requirement?
First, large projects can be built with ease, just as with
.CW make .
(Plan 9 provides an improved variant, called
.CW mk ,
that I describe later.)
Second, it removes one possible error: mixing computer architectures.
Third, it promotes separation of tasks: the C compiler should not be expected to link.
.PP
Using this system is easy.
All you have to know is the single character that denotes your processor.
For the Intel x86 family that is in most PCs, that character is
.CW 8 .
So I do
.P1
% 8c a.c # compile; creates a.8
% 8c b.c # compile
% 8l a.8 b.8 # link; creates 8.out
% 8.out # run
.P2
A complete list is in the manual page for the C compilers,
.I 2c (1).
.PP
Also note that a special feature of the C compilers allows the linker to detect that
.CW libc
or another Plan 9 library is to be linked into the program without any extra flags.
I will get to that later.
.NH
Manipulating Files
.PP
In Plan 9, absolutely everything is a file — even processes
.CW /proc ), (
environment variables
.CW /env ), (
and file descriptors
.CW /fd )! (
What is a file descriptor?
A file descriptor is an integer that represents an open file.
Files are opened with the
.CW open
system call, which returns one.
The syntax of
.CW open
is
.P1
int open(char *filename, int openmode);
.P2
.CW openmode
is one of the constants
.CW OREAD ,
.CW OWRITE ,
or
.CW ORDWR ,
which define what you intend to do with this file (read, write, or both), optionally combined with the constants
.CW OTRUNC ,
.CW OCEXEC ,
and
.CW ORCLOSE
via bitwise OR
.CW | ). (
If
.CW OTRUNC
is given with
.CW OWRITE
or
.CW ORDWR ,
the file is truncated to zero length.
.CW OCEXEC
and
.CW ORCLOSE
are described later.
.CW open
returns a valid file descriptor @n@ such that @n >= 0@ on success, or -1 on failure.
.PP
It is an error to open a file that doesn't exist, so the
.CW create
system call is used to create one.
(Ken got his wish.)
.CW create
takes the form
.P1
int create(char *filename, int createmode, int permissions);
.P2
If the file already exists, it is truncated to zero length.
The
.CW permissions
are just as in
.UX :
a three-digit octal number containing a combination of read, write, or execute bits for the file's owner, the group of the owner, and everyone else.
For example,
.CW 0644
yields
.CW rw-r--r-- ,
and
.CW 0750
yields
.CW rwxr-x--- .
.CW createmode
is an open mode as in
.CW open ,
optionally ORed with
.CW OEXCL ,
which causes
.CW create
to fail if the file already exists.
Three more bits can be ORed into
.CW permissions :
.CW DMDIR ,
which creates a directory,
.CW DMAPPEND ,
which makes a file that can only be appended to (e.g. a log file), and
.CW DMEXCL ,
which makes the file openable by only one program at a time.
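.PP
A minimal sketch of the two common cases, assuming the argument order given in
.I open (2),
where the
.CW DM
bits are ORed into the permissions argument.
The file and directory names are hypothetical.
.P1
#include <u.h>
#include <libc.h>

void
main(void)
{
	int fd, dirfd;

	/* ordinary file, rw-r--r--, opened for writing */
	fd = create("notes", OWRITE, 0644);
	if(fd < 0){
		fprint(2, "can't create notes: %r\en");
		exits("create");
	}
	/* directory, rwxr-xr-x: the DMDIR bit goes in the permissions */
	dirfd = create("newdir", OREAD, DMDIR|0755);
	if(dirfd < 0){
		fprint(2, "can't create newdir: %r\en");
		exits("create");
	}
	close(dirfd);
	close(fd);
	exits(0);
}
.P2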
.PP
The
.CW read
and
.CW write
system calls read and write arbitrary data to the files:
.P1
long read(int fd, void *buf, long n);
long write(int fd, void *buf, long n);
.P2
read
.CW n
bytes from
.CW fd
into
.CW buf
and write
.CW n
bytes from
.CW buf
to
.CW fd ,
respectively.
.CW read
returns the number of bytes read, while
.CW write
returns the number of bytes written.
.PP
Why do
.CW read
and
.CW write
seem to return their argument
.CW n ?
The truth is, they don't always do so.
Let's take
.CW read
as an example.
What if the end of the file is reached before anything was read?
Well, you read nothing, so
.CW read
will appropriately return 0.
A
.CW write
can fail if the disk is full.
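.PP
The usual way to respect these return values is a copy loop.
The following is a minimal sketch, not taken from any system program; the buffer size is an arbitrary choice.
It copies standard input to standard output, stops when
.CW read
returns 0, and treats a short
.CW write
as an error.
.P1
#include <u.h>
#include <libc.h>

void
main(void)
{
	char buf[8192];	/* arbitrary size */
	long n;

	while((n = read(0, buf, sizeof buf)) > 0)
		if(write(1, buf, n) != n){
			fprint(2, "write error: %r\en");
			exits("write");
		}
	if(n < 0){	/* 0 is end of file; negative is an error */
		fprint(2, "read error: %r\en");
		exits("read");
	}
	exits(0);
}
.P2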
.PP
Instead of using the low-level
.CW write ,
you can use
.CW fprint .
.CW fprint ,
like
.CW fprintf ,
allows formatted output to an open file.
It takes, as an extra first argument, the appropriate file descriptor.
Note that there are no reading functions like
.CW scan ;
buffered I/O via
.CW libbio ,
described later, provides the facilities.
.PP
The
.CW seek
system call changes where reads and writes are performed in relation to the file.
.P1
vlong seek(int fd, vlong amount, int from);
.P2
If
.CW from
is 0, seek to
.CW amount
from the start of the file.
If 1, seek from the current position.
If 2, seek from the end.
Note that
.CW amount
goes to the right if positive and left if negative regardless of
.CW from ,
so to seek five characters before the end, you say
.P1
seek(fd, -5, 2);
.P2
.CW seek
returns the position from the start regardless of
.CW from .
On error,
.CW seek
seems to succeed; only by examining the error string can you detect an error.
.CW seek
fails on directories and does nothing on pipes.
.PP
What is
.CW vlong ?
It is a
.CW typedef -ed
alias to
.CW "long long" .
The C compilers, as well as C99, provide the
.CW "long long"
type, which provides access to very long integer values, often 64 bits.
There is also an
.CW unsigned
variant.
.CW u.h
provides the terse alias
.CW uvlong .
On a 32-bit processor like the x86, 64-bit values are simulated.
For instance, you can't do
.P1
vlong v;
switch(v){
case a:
/* ... */
}
.P2
However, the mere fact that 64-bit values are available is promising.
.PP
Finally, the
.CW close
system call says that you are done with a file you opened or created.
It takes the form
.P1
int close(int fd);
.P2
.CW close
should only fail (return -1) if
.CW fd
is not really open, so just ignore its return value.
.PP
Before I move on, I need to talk about three file descriptors that all programs have when they are created.
File descriptor 0 is
.I "standard input" ,
which is the keyboard by default and changed with rc's
.CW < ,
.CW << ,
.CW <{\fR...\fP} ,
and
.CW | .
File descriptor 1 is
.I "standard output" ,
which is either the screen or the current rio window by default and changed with rc's
.CW > ,
.CW >> ,
and
.CW | .
So
.P1
print("hello");
.P2
is the same as
.P1
fprint(1, "hello");
.P2
File descriptor 2 is
.I "standard error" .
This allows you to give the user emergency output in the case of an error, without fear of losing the error to redirected output.
Standard error can be redirected with the
.CW [2]
modifier to the output redirection operators in rc.
.NH
UTF-8 Support
.PP
Plan 9 supports Unicode via UTF-8; however, you need special provisions for handling the extended characters.
The special type
.CW Rune
is large enough to store a Unicode character, which can be embedded into a C program using Standard C's wide character literal format
.CW L'\fIcharacter\fP' .
A string of
.CW Rune s
can be made in the same way as a string of characters, and has the type "array of
.CW Rune s."
Most
.CW Rune s
can be entered directly from the keyboard; see
.I keyboard (6)
for instructions and the file
.CW /lib/keyboard
for a complete list and their key codes.
.PP
A UTF-8 character or string can be output with the
.CW %C
and
.CW %S
formats to the print routines, respectively.
For example,
.P1
#include <u.h>
#include <libc.h>
void
main()
{
	print("3 %C 4\en", L'≤');
	print("%S\en", L"Άρχιμήδης");	/* Archimedes */
	exits(0);
}
.P2
The codes for capital alpha and lowercase eta with tonos (Unicode 0386 and 03AE, respectively) cannot be entered with the keyboard; they were generated with a simple program:
.P1
#include <u.h>
#include <libc.h>
void
main(int argc, char *argv[])
{
	if(argc != 2){
		fprint(2, "usage: %s hex-code\en", argv[0]);
		exits("usage");
	}
	print("%C\en", (Rune)strtol(argv[1], nil, 16));
	exits(0);
}
.P2
.CW argc ,
.CW argv ,
and
.CW strtol
act as in standard C.
If this program is compiled as
.CW code2rune ,
you can say
.P1
% code2rune 0386
Ά
% code2rune 41
A
.P2
.PP
A
.CW Rune
can be constructed from at least one
.CW char .
This allows input of
.CW Rune s
by reading a
.CW char
and seeing if it can be used to begin a
.CW Rune .
This is a simple multi-step process:
.IP 1.
Read a character.
.IP 2.
If that character is less than the constant
.CW Runeself ,
then cast that character to a
.CW Rune
and return it.
Otherwise, store that character in the first position of a buffer.
.IP 3.
Read the next character into the next buffer position.
.IP 4.
If the buffer from beginning to the current position is a full
.CW Rune ,
return that
.CW Rune .
Otherwise, return to step 3.
.PP
The function
.CW fullrune
does the test in step 4.
.P1
int fullrune(char *buf, int n);
.P2
returns a nonzero (true) value if the
.CW n
characters pointed to by
.CW buf
make up a full
.CW Rune .
The function
.CW chartorune
does the actual conversion:
.P1
int chartorune(Rune *dest, char *src);
.P2
turns the data pointed to by
.CW src
into the
.CW Rune
stored at
.CW *dest
and returns the number of bytes of
.CW src
used.
On error, it returns 1 and stores the constant
.CW Runeerror
in
.CW *dest .
The number of bytes shall never exceed
.CW UTFmax ,
a constant that defines how many possible bytes may be in a
.CW Rune .
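.PP
When the bytes are already in memory,
.CW chartorune
alone is enough.
The following minimal sketch walks a UTF-8 string and prints each
.CW Rune
with the number of bytes it occupied; the sample string is arbitrary.
.P1
#include <u.h>
#include <libc.h>

void
main(void)
{
	char *s;
	Rune r;
	int n;

	s = "3≤4";	/* stored as UTF-8 bytes */
	while(*s != '\e0'){
		n = chartorune(&r, s);	/* n is between 1 and UTFmax */
		print("%d byte(s): %C\en", n, r);
		s += n;
	}
	exits(0);
}
.P2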
.PP
With all this in mind, we can write a function that uses
.CW read
to read in a single
.CW Rune
from a given file descriptor and returns the number of characters read.
It behaves similarly to
.CW chartorune
on error: it returns the number of bytes read, but stores
.CW Runeerror .
.P1
long
readrune(int fd, Rune *r)
{
	char buf[UTFmax];
	char c;
	long nread, n;
	int i;

	if((nread = read(fd, &c, 1)) != 1){
		*r = Runeerror;
		return nread;
	}
	if(c < Runeself){
		*r = (Rune)c;
		return nread;
	}
	buf[0] = c;
	for(i = 1;;){
		if((n = read(fd, &c, 1)) != 1){
			*r = Runeerror;
			return nread;
		}
		nread += n;
		buf[i++] = c;
		if(fullrune(buf, i)){
			chartorune(r, buf);
			return nread;
		}
	}
}
.P2
We can test this out in a program that reads
.CW Rune s
and prints them out, buffering the output.
.P1
#include <u.h>
#include <libc.h>
void
main()
{
	Rune rs[100];
	int i;

	i = 0;
	while(readrune(0, &rs[i]) > 0)
		if(rs[i] == L'\en'){
			rs[i] = '\e0';
			print("%S\en", rs);
			i = 0;
		}else
			i++;
	exits(0);
}
.P2
Let's try this out:
.P1
% readrune
a
a
abc
abc
3≤4
3\(pw\(pw\(pw4
≤
\(pw\(pw\(pw
\fIctl-\fPd%
.P2
.PP
Something seems to be amiss.
For every Unicode character I put in, something gets eaten up and a mess of "I don't have that glyph" symbols (Peter Weinberger's famous headshot) comes up.
Our problem is declaring
.CW c
in
.CW readrune
as a
.CW char ;
if we change it to
.CW uchar
(a synonym for
.CW "unsigned char" ),
then we get this interactive session:
.P1
% readrune
3≤4
3≤4
≤+-4556
≤+-4556
\fIctl-\fPd%
.P2
.PP
There's still a problem.
Consider
.P1
% xd -c -b bad
0000000 e0 Q R S \en
0 e0 51 52 53 0a
0000005
% cat bad
.Bx ? QRS
% readrune < bad
.Bx ? S
%
.P2
Obviously incorrect.
The
.BX \f(CW?\fP
means "this is not a valid
.CW Rune ."
It turns out that even though
.CW fullrune
may report that the buffer contains a
.CW Rune ,
it does not say that the
.CW Rune
is valid.
In these situations,
.CW chartorune
may give up, returning a number of characters converted
.I less
than the number of characters read!
This means we ate too much.
Fortunately, as long as we're not reading from a pipe or directory, we can fix this with the use of
.CW seek .
Change the last
.CW if
to
.P1
if(fullrune(buf, i)){
	n = chartorune(r, buf);
	while(i > (int)n){
		seek(fd, -1, 1);
		i--;
		nread--;
	}
	return nread;
}
.P2
and everything works:
.P1
% readrune < bad
.Bx ? QRS
.P2
.PP
The
.CW seek
used says to seek -1 characters forward from the current position, or one character back.
In effect, this is
.CW ungetc
from Standard C, except that it doesn't work on pipes or directories.
There are other common uses of
.CW seek :
.P1
seek(fd, 0, 0);
.P2
seeks to the beginning of a file,
.P1
pos = seek(fd, 0, 1);
.P2
doesn't change the file position but tells you where, from the beginning, you are, and
.P1
seek(fd, 0, 2);
.P2
goes to the end.
This is done by default when opening a file that is append-only for writing.
.NH
Buffered I/O
.PP
Let us write a program
.CW runecount
that counts the number of
.CW Rune s
in a file.
The standard wc doesn't do this by default; its character count is a count of bytes.
I have omitted the definition of
.CW readrune .
.P1
#include <u.h>
#include <libc.h>
long readrune(int, Rune *);
uvlong
runecount(int fd, char *filename)
{
	uvlong n;
	Rune r;

	n = 0;
	while(readrune(fd, &r) > 0)	/* stop at end of file or on error */
		n++;
	print("%10ulld %s\en", n, filename);
	return n;
}

void
main(int argc, char *argv[])
{
	int fd, i;
	uvlong total;

	total = 0;
	if(argc == 1)
		runecount(0, "");
	else{
		for(i = 1; i < argc; i++){
			fd = open(argv[i], OREAD);
			if(fd == -1)
				fprint(2, "can't open %s: %r\en", argv[i]);
			else{
				total += runecount(fd, argv[i]);
				close(fd);
			}
		}
		if(argc > 2)
			print("%10ulld total\en", total);
	}
	exits(0);
}
.P2
.PP
The file
.CW /lib/glass
is a perfect file to test this program on; it contains translations of the phrase “I can eat glass and it doesn't hurt me.” in many languages and using Unicode characters.
For example,
.P1
% grep '^(French|Russian|Greek):' /lib/glass
Greek: Μπορώ να φάω σπασμένα γυαλιά χωρίς να πάθω τίποτα.
French: Je peux manger du verre, ça ne me fait pas de mal.
Russian: Я могу есть стекло, оно мне не вредит.
Greek: Μπορώ να φάω σπασμένα γυαλιά χωρίς να πάθω τίποτα.
.P2
To compare
.CW runecount
with wc, let's try them out:
.P1
% runecount /lib/glass
6715 /lib/glass
% wc -c /lib/glass
8517 /lib/glass
.P2
So
.CW /lib/glass
has 6,715
.CW Rune s
that fill up 8,517 bytes.
.PP
It turns out that when running
.CW runecount ,
I had to wait a while before getting any output, while wc returned immediately.
The time program will tell me how long a program runs, so let's try it on
.CW runecount :
.P1
% time runecount /lib/glass
6715 /lib/glass
0.02u 1.24s 6.74r runecount /lib/glass
.P2
This tells us that the program took 6.74 seconds to run, with 1.24 seconds in the kernel, 0.02 seconds in user space (that is,
.CW main ,
.CW runecount ,
and
.CW readrune ),
and the rest spent off the CPU, waiting (for I/O, for example).
.PP
wc is faster than
.CW runecount
because it buffers its input.
A
.I buffer
is an in-memory array of a number of data objects.
When you ask to acquire a character from a character buffer tied to a file, it first sees if there is a character in the buffer.
If there is a character, the character is removed from the buffer and returned to the user.
If not, then the buffer is filled by reading enough characters to occupy every element of the buffer array, and the first character in the buffer is removed.
.CW readrune ,
however, does no buffering, so a new character has to be read every time.
.PP
Fortunately, Plan 9 provides not one but two ways of buffering input and output.
The first is libstdio, which works just like in Standard C.
But this doesn't support
.CW Rune s,
so we can't use it.
It also has several other restrictions that I won't go into.
.PP
The second is libbio, with manual page
.I bio (2).
libbio is a library for buffering input and output in much the same way as libstdio, but provides a higher level of abstraction and full
.CW Rune
support.
In fact, our
.CW readrune
function is based on libbio's equivalent function!
To put libbio into your program, just do the following:
.P1
#include <bio.h>
.P2
This must follow the
.CW #include
of
.CW libc.h .
.PP
The next step is to make a new
.CW Biobuf ,
which is the libbio equivalent to
.CW FILE .
Note that I did not say that it was equivalent to
.CW "FILE *" .
This is because there are
.I two
ways to connect a file to a
.CW Biobuf ,
with each method working differently.
The first method actually opens a file:
.P1
Biobuf *Bopen(char *filename, int openmode);
.P2
.CW openmode
is either
.CW OREAD ,
to indicate reading, or
.CW OWRITE ,
which creates the file with mode
.CW 0666
.CW rw-rw-rw- ). (
It returns a pointer to a dynamically allocated
.CW Biobuf ,
or
.CW nil
if error.
.PP
You can also connect a
.CW Biobuf
to an already open file:
.P1
int Binit(Biobuf *bp, int fd, int mode);
.P2
.CW bp
is a pointer to an already allocated
.CW Biobuf ,
either declared as an ordinary variable or dynamically allocated with
.CW malloc .
The entire
.CW malloc
family of routines is provided by the C library.
In this case,
.CW mode
is the same as in
.CW Bopen ,
except
.CW OWRITE
does not create the file.
It returns the constant
.CW Beof
on error.
You can use this function to wrap the standard file descriptors to libbio; this is the only way to do formatted reads from standard input, since Plan 9 doesn't provide a
.CW scanf
equivalent.
.PP
Once we have a
.CW Biobuf
open, we can use several functions to read from and write to it.
But first, a brief note on an extension of Plan 9's C: basic inheritance is supported.
The structure
.CW Biobuf
has the properties of another structure named
.CW Biobufhdr ,
to the point that all
.CW Biobuf
needs is to have all the elements of
.CW Biobufhdr
and the buffer itself.
A pointer to a
.CW Biobuf
can be used as a pointer to a
.CW Biobufhdr .
This feature will be described in full when we talk about the
.CW lock
family of routines.
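.PP
As a preview, here is a minimal sketch of the extension.
The types
.CW Point
and
.CW Circle
and the function
.CW show
are hypothetical and have nothing to do with libbio; they stand in for
.CW Biobufhdr
and
.CW Biobuf .
The sketch relies on the unnamed substructure feature of the Plan 9 compilers and will not compile as Standard C.
.P1
#include <u.h>
#include <libc.h>

typedef struct Point Point;
typedef struct Circle Circle;

struct Point
{
	int x;
	int y;
};

struct Circle
{
	Point;	/* unnamed substructure: Circle gains x and y */
	int radius;
};

void
show(Point *p)
{
	print("%d,%d\en", p->x, p->y);
}

void
main(void)
{
	Circle c;

	c.x = 3;
	c.y = 4;
	c.radius = 5;
	show(&c);	/* a Circle* is accepted where a Point* is expected */
	exits(0);
}
.P2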
.PP
The basic input functions provided by libbio are numerous and useful:
.P1
long Bread(Biobufhdr *bp, void *buf, long n);
void *Brdline(Biobufhdr *bp, int delim);
char *Brdstr(Biobufhdr *bp, int delim, int nulldelim);
int Blinelen(Biobufhdr *bp);
int Bgetc(Biobufhdr *bp);
long Bgetrune(Biobufhdr *bp);
int Bungetc(Biobufhdr *bp);
int Bungetrune(Biobufhdr *bp);
int Bgetd(Biobufhdr *bp, double *d);
.P2
.CW Bread
behaves just like
.CW read .
.CW Brdline
returns a pointer to the next line in the buffer, up to and including the given delimiter, or
.CW nil
at end of file or on error; the line is not null-terminated.
A more useful function is
.CW Brdstr ,
which returns a
.CW malloc -ed
string consisting of the next full line ending with the given delimiter, or
.CW nil
on failure.
If
.CW nulldelim
is nonzero, the delimiter is not included in the returned string.
This eliminates the need for idioms like
.P1
s[strlen(s) - 1] = '\e0';
.P2
In both cases, the function
.CW Blinelen
returns the length of the returned line.
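.PP
A minimal sketch of reading lines with
.CW Brdstr :
it numbers each line of standard input.
Because the strings are
.CW malloc -ed,
each one is freed once it has been used.
.P1
#include <u.h>
#include <libc.h>
#include <bio.h>

void
main(void)
{
	Biobuf bin;
	char *line;
	int n;

	if(Binit(&bin, 0, OREAD) == Beof){
		fprint(2, "can't connect stdin to bio: %r\en");
		exits("Binit");
	}
	n = 0;
	/* nulldelim is 1, so the newline is dropped from each line */
	while((line = Brdstr(&bin, '\en', 1)) != nil){
		print("%d: %s\en", ++n, line);
		free(line);
	}
	Bterm(&bin);
	exits(0);
}
.P2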
.PP
.CW Bgetc
and
.CW Bgetrune
read and return the next character and
.CW Rune
on the file, respectively.
Both return
.CW Beof
on end of file, hence the
.CW long
return from
.CW Bgetrune .
They can be returned to the buffer with the equivalent
.CW unget
functions.
Finally,
.CW Bgetd
reads in a
.CW double ,
returning -1 on failure or the number of bytes read on success.
.PP
The output routines are
.P1
long Bwrite(Biobufhdr *bp, void *buf, long n);
int Bputc(Biobufhdr *bp, int c);
int Bputrune(Biobufhdr *bp, long r);
int Bprint(Biobufhdr *bp, char *fmt, ...);
int Bvprint(Biobufhdr *bp, char *fmt, va_list v);
int Bflush(Biobufhdr *bp);
.P2
.CW va_list
and its family of (Standard C) supporting routines are provided, as are the routines
.CW vprint
and
.CW vfprint .
.CW Bflush
immediately flushes the buffer; this is usually done when the buffer gets full.
Everything else works as expected.
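.PP
A minimal sketch of buffered output: standard output is wrapped in a
.CW Biobuf ,
written with
.CW Bprint ,
and flushed explicitly before exiting so that nothing is left sitting in the buffer.
The values printed are arbitrary.
.P1
#include <u.h>
#include <libc.h>
#include <bio.h>

void
main(void)
{
	Biobuf bout;
	int i;

	if(Binit(&bout, 1, OWRITE) == Beof){
		fprint(2, "can't connect stdout to bio: %r\en");
		exits("Binit");
	}
	for(i = 1; i <= 5; i++)
		Bprint(&bout, "%d squared is %d\en", i, i*i);
	Bflush(&bout);	/* push out whatever is still buffered */
	exits(0);
}
.P2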
.PP
The
.CW Bseek
function works like
.CW seek ,
but libbio provides an alternative to
.P1
loc = Bseek(bp, 0, 1);
.P2
in
.CW Boffset ,
which takes the
.CW "Biobufhdr *"
and returns the offset as a
.CW vlong :
.P1
loc = Boffset(bp);
.P2
To close an open file, use
.P1
int Bterm(Biobufhdr *bp);
.P2
.CW Bterm
will not close files opened with
.CW Binit ;
this allows use of the standard file descriptors after a
.CW Bterm
on them.
.PP
Let's rewrite
.CW runecount
to use libbio.
Note that we no longer need
.CW readrune
given
.CW Bgetrune .
.P1
#include <u.h>
#include <libc.h>
#include <bio.h>
uvlong
runecount(Biobuf *f, char *filename)
{
	uvlong n;
	long r;	/* long, so Beof is distinct from every Rune */

	n = 0;
	while((r = Bgetrune(f)) != Beof)
		n++;
	print("%10ulld %s\en", n, filename);
	return n;
}

void
main(int argc, char *argv[])
{
	int i;
	uvlong total;
	Biobuf bstdin, *bfile;

	total = 0;
	if(argc == 1){
		if(Binit(&bstdin, 0, OREAD) == Beof){
			fprint(2, "can't connect stdin to bio: %r\en");
			exits("Binit");
		}
		runecount(&bstdin, "");
		Bterm(&bstdin);
	}else{
		for(i = 1; i < argc; i++){
			bfile = Bopen(argv[i], OREAD);
			if(bfile == nil)
				fprint(2, "can't open %s: %r\en", argv[i]);
			else{
				total += runecount(bfile, argv[i]);
				Bterm(bfile);
			}
		}
		if(argc > 2)
			print("%10ulld total\en", total);
	}
	exits(0);
}
.P2
and test it:
.P1
% 8c runecount.c
% 8l -o runecount runecount.8
% runecount /lib/glass
6715 /lib/glass
% time runecount /lib/glass
6715 /lib/glass
0.00u 0.01s 0.02r runecount /lib/glass
.P2
Now the program is significantly faster, and it still yields the proper answer.
.PP
Given
.CW Bgetrune ,
is there a need for
.CW readrune ?
To be honest, this really depends on taste: one might argue that with libbio, we don't need to use the unbuffered
.CW read
and we will be just fine with
.CW Bgetrune ,
while another might say that someone may want to use
.CW readrune()
and therefore it should be preserved.
I will kill
.CW readrune()
in favor of
.CW Bgetrune .
I am doing this for several reasons:
.IP \(bu
Most programs use libbio and avoid the low-level system calls altogether.
.IP \(bu
If a program uses the system calls, it won't poll a byte or a
.CW Rune
at a time; it will just read an entire line or buffer.
.IP \(bu
Most functions deal with
.CW Rune s
implicitly, since a set of bytes makes up a
.CW Rune ,
and for those that don't, conversion and handling routines are so straightforward that they are used after input is read.
.LP
Feel free to disagree.
.SH
An Aside on Linking
.PP
The compilation process for
.CW runecount
in the previous example was shown on purpose: it showed that you did not need an explicit linker flag to link to libbio.
Of course, you could supply the libraries as arguments to the linker, in the form
.CW -l \fIext\fP,
where
.I ext
is the library name without the lib- prefix
.CW -lbio , (
for example).
.PP
But the C compilers do this for you every time you include the appropriate header file.
The C preprocessor reserves a special directive
.P1
#pragma \fItext\fP
.P2
where the
.I text
is implementation-defined.
For Plan 9's C compiler, if
.I text
is of the form
.P1
lib "\fIlibrary\fP"
.P2
then the library is automatically linked
.I "exactly once"
per program.
For example,
.P1
% grep '^#pragma[ →]lib' /sys/include/libc.h
#pragma lib "libc.a"
.P2
(A tab in the command line is represented by
.CW → .)
The file
.CW libc.a
is part of a collection of library files in
.CW /$objtype/lib .
The
.CW .a
means that the library was made with the ar program; see
.I ar (1).
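.PP
The same directive works in your own headers.
Here is a minimal sketch of a header for a hypothetical library; the names
.CW mylib.h ,
.CW libmylib.a ,
and
.CW mylibfunc
are made up for illustration.
Any program that includes the header is linked with the library without an explicit
.CW -l
flag.
.P1
/* mylib.h: header for the hypothetical library libmylib.a */
#pragma lib "libmylib.a"

int	mylibfunc(int);
.P2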
.NH
Processes and Notes
.PP
Plan 9's process model, to the programmer, is very similar to
.UX 's.
You have
.CW fork ,
.CW exec ,
and
.CW wait ,
but they have changed quite a bit.
The system call is no longer
.CW fork
but
.CW rfork ,
which is much richer and more powerful.
And
.CW wait
is now
.CW await ,
which allows you to get a more precise indication of what happened and how.
.CW fork
and
.CW wait
are still there, but
.CW wait
is quite different.
And Plan 9 has no notion of the signal; instead, it uses
.I notes ,
which are strings.
.PP
The
.CW rfork
system call is simple:
.P1
int rfork(int mode);
.P2
The
.CW mode
is a bitmask of the following:
.IP \f(CWRFPROC\fP \w'\f(CWRFNOWAIT\fP'+5
Make a new process.
If not set, the mode is applied to the parent, allowing it to do things otherwise impossible.
Few programs ever need to do so (for example, ar and rio do, for their own reasons).
.IP \f(CWRFNOWAIT\fP
The parent cannot use the
.CW await
system call or any related routines on the child.
.IP \f(CWRFNAMEG\fP
The child inherits a copy of the parent's name space (see below).
If neither this nor
.CW RFCNAMEG
is set, the child shares the parent's name space.
.IP \f(CWRFCNAMEG\fP
The child has a clean name space to start.
.IP \f(CWRFNOMNT\fP
Disallow the
.CW mount
system call (described later) and access to special device directories
.CW # \fIletter\fP). (
.IP \f(CWRFENVG\fP
Copy environment variables.
Works the same as
.CW RFNAMEG .
.IP \f(CWRFCENVG\fP
Start with no environment variables.
.IP \f(CWRFNOTEG\fP
Child has its own
.I "note group" ,
so notes sent to it and its children don't affect the parent.
.IP \f(CWRFFDG\fP
Child's file descriptors are copied rather than shared.
.IP \f(CWRFCFDG\fP
Child has no file descriptors,
.I "not even standard ones" .
.IP \f(CWRFREND\fP
Don't allow the child to
.CW rendezvous
with the parent or its parents.
The
.CW rendezvous
system call is described below.
.IP \f(CWRFMEM\fP
Child and parent share data and “bss” segments, that is, global and static variables; the stack, and so local variables, is not shared.
.LP
As you can see,
.CW rfork
is a very powerful tool for controlling how a child behaves.
(Parents may want to pray for a real-life
.CW rfork .)
But for most purposes, all you want is a child that has its own file descriptors and cannot rendezvous with the parent, that is,
.CW RFPROC|RFFDG|RFREND ,
which is exactly what the routine
.CW fork
does.
Both return:
.IP \(bu
-1 on error
.IP \(bu
The child's process ID in the parent
.IP \(bu
0 in the child
.LP
and continue execution from where you left off.
So you can say
.P1
switch(pid = rfork(RFPROC | RFFDG | RFNOTEG | RFENVG | RFNOWAIT | RFREND)){
case -1:
	sysfatal("rfork failed: %r");
case 0:
	child();
	exits(0);
}
parent();
exits(0);
.P2
The
.CW sysfatal
routine, which has the syntax
.P1
void sysfatal(char *mesg, ...);
.P2
prints the formatted message on standard error and terminates with that message as the status return.
If the global variable
.CW argv0
is set, it will be displayed before the message.
.CW argv0
should be set to
.CW argv[0]
before programs mess with it; the command-line option macros we will see shortly do this for you.
.PP
Usually a
.CW rfork
is followed by one of the
.CW exec
routines, which allow a process to be replaced by another.
The system call is
.CW exec ,
which is similar to
.UX 's
.CW execv :
.P1
void *exec(char *filename, char *argv[]);
.P2
replaces the current process with the one at
.CW filename ,
passing the given vector of arguments to the
.CW main
routine's
.CW argv .
The first argument
.CW argv[0] ) (
is the program's effective name; usually the name without path.
The final argument must be a null pointer; this is used to find
.CW argc .
The function only returns on failure, after setting the error string; the return value is insignificant.
Therefore, you can say
.P1
exec(prog, args);
sysfatal("exec of %s failed: %r", prog);
.P2
.PP
.CW execl
is a subroutine of the form
.P1
void *execl(char *filename, ...);
.P2
It turns each of its optional arguments into a member of an
.CW argv
array until a null pointer is seen, then calls
.CW exec .
Beware:
.P1
execl(filename, nil);
execl(filename); /* WRONG */
.P2
.PP
What denotes an executable file?
The user must have both execute and read permissions enabled on the file (although the manual page for
.CW exec
only states that execute is required), and the file cannot be a directory.
The file is opened with the mode
.CW OEXEC ,
which opens to read but requires execute permissions, and the first two bytes are scanned.
If the bytes are the characters
.CW #! ,
then the file is assumed to be text that is passed to another program.
If the first line of file
.CW f
is
.P1
#!/bin/rc
.P2
and
.CW f
is called by
.P1
execl("f", "a", nil);
.P2
then the call to
.CW execl
is, in effect,
.P1
execl("/bin/rc", "/bin/rc", "f", "a", nil);
.P2
Otherwise, the two bytes are put back and a
.CW long
is read.
If this does not equal the a.out magic number for the current CPU architecture (see
.I a.out (6)),
an error occurs.
Otherwise, the program is executed.
.PP
The
.CW await
system call, which has the form
.P1
int await(char *s, int n);
.P2
waits for a child that was not
.CW rfork -ed
with the
.CW RFNOWAIT
flag set to terminate.
When this happens, the first
.CW n
characters of a special string are stored in
.CW s
and the function returns the length of the special string that was stored (in case
.CW n
was too big), or -1 if there are no children to wait for.
The special string is of the form
.P1
\fIprocess-ID\fP \fIuser-time\fP \fIsystem-time\fP \fIreal-time\fP '\fIstatus-return\fP'
.P2
with spaces separating each field.
The status return is blank for successful termination; the appearance is
.CW '' .
The times are reported in milliseconds.
There is
.I no
.CW '\e0'
at the end of this string, so be sure to add one in your code:
.P1
char buf[256];
int n;
if((n = await(buf, 255)) >= 0)
	buf[n] = '\e0';
.P2
.PP
The
.CW tokenize
routine can be used to separate the individual fields:
.P1
int tokenize(char *str, char **array, int max);
.P2
.CW str
is the string to tokenize, which is split into at most
.CW max
elements of the array by overwriting certain delimiters with
.CW '\e0' .
The function returns the number of tokens actually split.
The splitting rules are simple: split at whitespace, except treat quoted text as a single token.
The quoting rules are the same as in rc:
.TS
center tab(;);
lfCW l rfCW.
'hello';becomes;hello
'stay here';becomes;stay here
'the bee''s hive';becomes;the bee's hive
'';becomes;\fRa null string\fP
'''';becomes;'
.TE
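.PP
As a rough illustration of these quoting rules, here is a minimal sketch in portable C.
The names
.CW rctokenize
and
.CW isblank9
are hypothetical; this is not the library's source, and it assumes that fields are separated by spaces, tabs, newlines, or carriage returns.
Character codes are written as numbers so that the listing contains no backslashes.
.P1
```c
#include <stddef.h>

enum { Quote = 39 };	/* ASCII code of the single quote character */

static int
isblank9(int c)
{
	/* space, tab, newline, carriage return */
	return c == 32 || c == 9 || c == 10 || c == 13;
}

/*
 * Hypothetical stand-in for tokenize: split s in place into at most
 * max fields using rc quoting and return the number of fields stored.
 */
int
rctokenize(char *s, char **args, int max)
{
	int n, quoted;
	char *w;

	for(n = 0; n < max; n++){
		while(isblank9(*s))
			s++;
		if(*s == 0)
			break;
		args[n] = w = s;
		quoted = 0;
		for(; *s != 0 && (quoted || !isblank9(*s)); s++){
			if(*s != Quote){
				*w++ = *s;	/* ordinary character */
				continue;
			}
			if(quoted && s[1] == Quote){
				*w++ = *s++;	/* '' inside quotes is one quote */
				continue;
			}
			quoted = !quoted;	/* opening or closing quote */
		}
		if(*s != 0)
			s++;		/* step over the delimiter */
		*w = 0;
	}
	return n;
}
```
.P2
Given a writable copy of the text
.CW "'the bee''s hive' ''" ,
the sketch stores two fields: the first is the phrase with a single apostrophe, and the second is empty.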
So the code to split into the individual fields is simple:
.P1
char *fields[5], buf[256];
int n;
if((n = await(buf, 255)) < 0)
	sysfatal("await failed: %r");
buf[n] = '\e0';
if(tokenize(buf, fields, 5) != 5)
	sysfatal("could not parse await's message");
print("pid %s took %s milliseconds and returned %s\en", fields[0], fields[3],
	*fields[4] == '\e0' ? "success" : fields[4]);
.P2
.PP
Calling
.CW await
and tokenizing the result is essentially what the
.CW wait
subroutine does.
It has the form
.P1
Waitmsg *wait(void);
.P2
and waits for a child to exit, returning a
.CW malloc -ed
structure of type
.CW Waitmsg :
.P1
typedef struct Waitmsg Waitmsg;
struct Waitmsg{
int pid;
ulong time[3];
char *msg;
};
.P2
where the fields appear in the same order as in the string from
.CW await ,
so
.CW time[1]
is system time.
.CW msg
is not a separate allocation:
.CW wait
allocates the
.CW Waitmsg
and the message text in a single
.CW malloc -ed
block and points
.CW msg
just past the structure, so never pass
.CW msg
to
.CW free .
You only have to
.CW free
the
.CW Waitmsg ,
and everything else is released with it.
If you want to know the magic, see
.CW /sys/src/libc/9sys/wait.c .
If you want an example of
.CW wait
and
.CW Waitmsg ,
see the source for the
.CW time
command at
.CW /sys/src/cmd/time.c .
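.PP
The reason one
.CW free
is enough can be sketched in portable C.
The structure
.CW Msg
and the routine
.CW mkmsg
below are hypothetical stand-ins, not the library's code; they assume the five fields have already been split out as strings, and they place the status text in the same block as the structure.
.P1
```c
#include <stdlib.h>
#include <string.h>

typedef struct Msg Msg;
struct Msg{
	int pid;
	unsigned long time[3];
	char *msg;
};

/*
 * Hypothetical helper: build a Msg from five already-split fields,
 * keeping the structure and its status text in one allocation.
 */
Msg*
mkmsg(char *fld[5])
{
	int i;
	size_t l;
	Msg *w;

	l = strlen(fld[4]) + 1;
	w = malloc(sizeof(Msg) + l);
	if(w == NULL)
		return NULL;
	w->pid = atoi(fld[0]);
	for(i = 0; i < 3; i++)
		w->time[i] = strtoul(fld[i + 1], NULL, 10);
	w->msg = (char*)&w[1];	/* text lives just past the structure */
	memmove(w->msg, fld[4], l);
	return w;
}
```
.P2
Because
.CW msg
points into the same block as the structure, freeing the structure releases both; freeing
.CW msg
separately would be an error.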
.PP
What happens if the command is interrupted (you hit the interrupt key)?
An interrupt usually kills the process by sending what's called a
.I note
to the process and all its children in the same
.I "note group" .
Forking the child to have the
.CW RFNOTEG
flag set allows the child to handle its own notes independently from the parent.
If
.CW time
did this, however, it would be unable to report that the command had been interrupted.
.PP
There are many different types of notes.
The most common are
.I interrupt ,
.I hangup ,
which is sent when you disconnect from a CPU server,
.I alarm ,
which is associated with the
.CW alarm
system call, and
.I "bad address" ,
which happens when you access invalid memory.
If any of these notes is not handled, the program terminates.
.PP
How can you handle notes?
rc allows you to define functions like
.CW sigint
that get executed when the specific note gets processed.
What really happens is rc registers its
.I "note handler"
to execute the function and return when the specific note is issued.
The system calls
.CW notify
and
.CW noted
do this.
.PP
Unlike with
.UX
signals, there is only one note handler function, which is registered with the
.CW notify
system call:
.P1
int notify(void (*f)(void *, char *));
.P2
The argument is a pointer to a function
.CW f
defined as
.P1
void f(void *ureg, char *note)
.P2
The
.CW ureg
argument is turned into a pointer to a structure of type
.CW Ureg ,
defined in
.CW /$objtype/include/ureg.h .
.CW Ureg
contains the values of machine registers at the time the note was
.I posted ,
and as such, is nonportable.
Few, if any, programs ever need to use this structure and/or this argument to the handler.
The second argument is the note string itself.
If the function passed to
.CW notify
is a null pointer, the default handler is restored.
The return value is insignificant.
.PP
Note handlers follow special rules.
They may not use floating-point operations, nor may they call functions that do.
A note handler cannot
.CW return ;
it must either exit, use the
.CW noted
system call, or call the
.CW notejmp
routine.
.CW noted
is of the form
.P1
int noted(int how);
.P2
.CW how
is
.CW NDFLT
if you want the system to do the default action or
.CW NCONT
if you want the system to go back to where the program left off.
The return value is insignificant, as the note handler doesn't return.
Also,
.CW jmp_buf ,
.CW setjmp ,
and
.CW longjmp
are provided, but you cannot
.CW longjmp
from within a note handler.
Instead, you use the safer
.CW notejmp
routine, which works like
.CW longjmp
but takes the handler's
.CW ureg
argument as an additional first argument.
.PP
If a note interrupts a system call and the note handler calls
.CW noted(NCONT) ,
the system call terminates early with error string
.CW interrupted .
This is very important, as it can be a cause of errors.
Beware.
.PP
To send a process a note, use the
.CW postnote
subroutine:
.P1
int postnote(int who, int pid, char *note);
.P2
If
.CW who
is
.CW PNPROC ,
the note is posted only to the process
.CW pid .
But if it is
.CW PNGROUP ,
the note is posted to every process in that process's note group, with the exception of the current process if it is in that group.
This is a restriction of the operating system, not of
.CW postnote
itself.
On failure,
.CW postnote
returns -1.
A useful but undocumented note to post is
.CW kill ,
which terminates the process without giving it a fighting chance.
This is actually what the
.CW kill
command does:
.P1
term% kill rc
echo kill>/proc/2379/note # rc
echo kill>/proc/4431/note # rc
echo kill>/proc/5453/note # rc
echo kill>/proc/6233/note # rc
echo kill>/proc/6243/note # rc
echo kill>/proc/6445/note # rc
echo kill>/proc/6684/note # rc
echo kill>/proc/7005/note # rc
.P2
Piping that to rc will kill every rc, including the one you created in the pipe.
.PP
The
.CW alarm
note involves an
.I "alarm clock"
that each process has (and only one per process).
The
.CW alarm
system call is of the form
.P1
long alarm(ulong ms);
.P2
.CW ulong
is a synonym for
.CW "unsigned long" .
If its argument is 0, the alarm clock is cleared.
Otherwise, the alarm clock is set to send the note
.CW alarm
after the given number of milliseconds.
The return value is the number of milliseconds left in the previous alarm clock.
.CW alarm
can be used to write a command
.CW timeout
which stops a process from running after a given amount of time.
.P1
#include <u.h>
#include <libc.h>
int pid;
char *prog;
void
notehandler(void *, char *note)
{
	if(strcmp(note, "alarm") == 0)
		if(postnote(PNGROUP, pid, "kill") < 0)
			sysfatal("could not time out %s: %r", prog);
		else{
			fprint(2, "timeout\en");
			exits("timeout");
		}
	else
		noted(NDFLT);
}
int
endswith(char *full, char *what)
{
	int i;
	char *wp = what + strlen(what) - 1;
	for(i = strlen(full) - 1; wp >= what; i--, wp--)
		if(i < 0 || full[i] != *wp)	/* what is longer than full, or mismatch */
			return 0;
	return 1;
}
void
main(int argc, char *argv[])
{
	long ms;
	Waitmsg *w;
	if(argc <= 2){
		fprint(2, "usage: %s seconds command-line\en", argv[0]);
		exits("usage");
	}
	ms = strtoul(argv[1], nil, 10) * 1000; /* sec -> ms */
	prog = smprint("/bin/%s", argv[2]);
	switch(pid = rfork(RFPROC | RFFDG | RFENVG | RFREND | RFMEM | RFNOTEG)){
	case -1:
		sysfatal("fork failed: %r");
	case 0:
		exec(prog, &argv[2]);
		prog = smprint("./%s", argv[2]);
		exec(prog, &argv[2]);
		sysfatal("exec failed: %r");
	}
	notify(notehandler);
	alarm(ms);
	w = wait();
	if(w == nil)	/* wait returns nil on error */
		sysfatal("wait failed: %r");
	if(w->msg[0] != '\e0'){
		fprint(2, "%s failed with %s\en", prog, w->msg);
		free(w);
		free(prog);
		exits("failed run");
	}
	free(w);
	free(prog);
	exits(0);
}
.P2 no
The
.CW endswith
helper is there because the C library doesn't provide the similar
.CW strrstr
(it does provide
.CW strstr
and other functions); note that
.CW main
as shown never actually calls it.
.CW smprint
creates, using
.CW malloc ,
a string which contains the fully formatted text.
Use this instead of a custom buffer and
.CW sprint ,
as it avoids the risk of truncation or overflow due to an improperly sized buffer; the caller is responsible for freeing the result.
The
.CW RFMEM
flag is set so the process can change
.CW prog
at will.
We kill with
.CW PNGROUP
in case the program that you run forks its own processes.
.PP
This example shows another feature of the Plan 9 C compilers: an unnamed argument signals that it is not used.
.ig
.NH
Segments, Interprocess Communication, and Locks
.PP
The easiest form of interprocess communication in Plan 9 is the pipe.
Pipes are implemented just as in
.UX ,
right down to the system call:
.P1
int pipe(int fd[2]);
.P2
creates a pipe of the form
.PS
File0: box "\f(CWfd[0]\fP"
move right
File1: box "\f(CWfd[1]\fP"
arrow -> with .start at 1/2 <File0.ne, File0.e>
arrow <- with .start at 1/2 <File0.e, File0.se>
.PE
with an arrow pointing from the writer to the reader.
However, Plan 9 has more sophisticated ways of interprocess communication.
.PP
A
.I segment
is a block of memory that can be shared.
Segments can be as small as
.CW int s
or as large as the system permits.
.CW fork
retains segments, but
.CW exec
will only do so if the program is so large that it overwrites the segment.
We can use segments to implement shared memory.
You create a segment with the
.CW segattach
system call:
.P1
void *segattach(int attr, char *class, void *va, ulong len);
.P2
The
.CW class
is a string containing the type of segment.
For shared memory, the string is
.CW """shared""" ,
and for a segment for normal use, the string is
.CW """memory""" .
The attribute is zero or a bitmask of
.CW SG_RONLY
for a read only segment and
.CW SG_CEXEC
which releases the segment on an
.CW exec .
.CW va
marks where the segment is, or
.CW nil
if the system should choose.
Most users won't have a need for any other value.
The return value is the starting address of the segment on success, or
.CW "(void *)-1"
on error.
Its counterpart is
.P1
int segdetach(void *addr);
.P2
Simply pass the return value of
.CW segattach
to free the segment.
..