Plan 9 from Bell Labs’s /usr/web/sources/contrib/rsc/talk/pdos2.tr

Copyright © 2021 Plan 9 Foundation.
Distributed under the MIT License.
Download the Plan 9 distribution.


.so tmac.ppt
.de X1
.br
.ne 2.5i
.mk
.P1
.ps 20
.vs 22
..
.de X2
.P2
.rt
.P1
.in +3i
.ps 20
.vs 22
..
.de X3
.P2
..
.TL
.sp -2
\f(CWrscc\fP: an extensible compiler
.AU
.nf
Russ Cox
Frans Kaashoek
Eddie Kohler
.sp 2
.ft R
PDOS Group Meeting
.br
January 24, 2006
.sp 2
.CW http://swtch.com/~rsc/talks/


...
.SL "Problem (big picture)

Programs are buggy.


Because programming languages don't work for you.


You work for them.


How do you resolve mismatch between
programming task and language?


.SL "Outline

Problem \- 5 slides

Specific Extensions \- 10 slides

How to Implement \f(CWrscc\fP \- 10 slides
 \- parser
 \- internal representation

Please interrupt with questions.

.SL "Good Languages?
.nf

Some languages make some bugs easier to avoid / find.

\- good abstractions 

\- type systems


Never good enough.  
People extend languages any way they can.

\- C preprocessor (easy, but rarely good)

\- custom preprocessors (good, but rarely easy)

\- custom checkers

.SL "Useful Preprocessors
.nf

Click (Kohler)
 \- language to describe module properties, router configs


Tame (Krohn)
 \- more convenient syntax for libasync
 \- (Even \f2I\fP can write tame programs!)


Ace (Gosling)
 \- syntax-based preprocessor, used for bitblt


QT Rewriter
 - used for simpler event syntax in QT library


.SL "Useful Checkers
.nf

Magik (Engler)
 \- checking of flow-sensitive conditions, using xgcc
 \- more recent work about guessing things to check
 
 
Uno (Holzmann)
 \- another static checker, actually available.


Sparse (Torvalds)
 \- add annotations to Linux source for lock/unlock pairing,
    kernel/user address spaces


.SL "Implementation Techniques
.nf


Start from scratch
 \- lots and lots of work, requires knowing about compilers



Edit an existing compiler
 \- lots of work, requires knowing about compilers,
    requires \fIknowing\fP that particular compiler


.SL "Problem (recap)
.nf


Programs are buggy.

Because languages aren't good for programming.



Lots of good can come from tailoring the language.

But it's far too much work for most programmers.



How can we lower the bar?


.SL "Solution: Extensible Compiler
.nf

Extensible compiler

 \- let programmers add extensions (\f(CW.so\fP files)
    to the compiler during compilation.

 \- extensions are separate from programs that use them
    (think media-player codecs)

 \- extensions add new language features, new checking

 \- make it easy to write these extensions

 \- hide compiler when possible

 \- make compiler easy to understand 


.SL "Solution: Extensible compiler

.PE
Extensible compiler, without extensions:
.PS
X1: box wid 1.5 ht 0.35 "\f(CWrscc\fP"
SRC: box invis wid 2 "C program" with .e at X1.w-(1,0)
OBJ: box invis wid 2 "executable" with .w at X1.e+(1,0)
arrow from SRC.e to X1.w-(.2,0)
arrow from X1.e+(.2,0) to OBJ.w
SRC2: box invis wid 2 "" "" with .n at SRC-(1,0)
.PE
Extensible compiler, with one extension:
.sp
.PS
X1: box wid 1.5 ht 0.35 "\f(CWrscc\fP"
SRC: box invis wid 2 "program" "using" "extension" with .e at X1.w-(1,0)
OBJ: box invis wid 2 "executable" with .w at X1.e+(1,0)
arrow from SRC.e to X1.w-(.2,0)
arrow from X1.e+(.2,0) to OBJ.w

SRC2: box invis wid 2 "extension" "source" with .n at SRC.s-(1,1)
X2: box wid 1.5 ht 0.35 "\f(CWrscc\fP" with .w at SRC2.e+(1,0)
OBJ2: box fill 0.9 wid 1.5 ht 0.35 "\f(CWext.so\fP" at X1.se-(.25,.1)
arrow from SRC2.e to X2.w-(.2,0)
spline -> from X2.e+(.2,0) to OBJ2.s-(0,1) then to OBJ2.s
.PE



.SL "Expected Uses
.nf

Extensions are usually small (and can start small).


Programmers write compiler extensions
while writing the programs that use them.


Project leaders write compiler extensions 
to make big project easier for others.


Net work should be less than not using extensions.


Make possible the language changes mentioned earlier.


.SL "Example Extension: Alef
.LP
.X1
.in 0
.in -0.5i
void
primetask(chan(int) c)
{
	chan(int) nc;
	int i, p;


	p = <-c;
	print("%d\en", p);
	alloc nc;
	task primetask(nc);
	for(;;){
		i = <-c;
		if(i%p)
			nc <-= i;
	}
}
       Alef original
.X2
.in -4i
void
primethread(void *arg)
{
	Channel *c, *nc;
	int i, p;

	c = arg;
	p = recvul(c);
	print("%d\en", p);
	nc = chancreate(sizeof(ulong), 0);
	threadcreate(primethread, nc, 1024);
	for(;;){
		i = recvul(c);
		if(i%p)
			sendul(nc, i);
	}
}
       C transliteration
.X3

.SL "Example Extension: Alef
.X1
    Alef
   
chan(int) nc;
alloc nc;
task primetask(nc);
nc <-= i;
.X2
    C + libthread
   
Channel *nc;
nc = chancreate(sizeof(ulong), 0);
threadcreate(primethread, nc, 1024);
sendul(nc, i);
.X3

Benefits of Alef:
 \- more concise, more readable.
 \- type-checked channels.
 \- good thread creation syntax.
 \- good communication syntax (\f(CWalt\fP)

Alef is dead.
 \- compiler author moved on.
 \- those left behind switched to C.
    gave up the benefits for ease of maintenance


.SL "Example Extension: Tame

.X1
.ta +2n +2n +2n
TAME(static void 
  try_connect(str h,
    int port))
{
	VARS { int fd; }
	BLOCK{ tcpconnect(h, 
	      port, @(fd)); }
	


	if(fd < 0)
		warn << h << "\en";
	exit(0);
}

  Tame TCP client.
.X2
static void cb1(str, int);

static void try_connect(str h, 
    int port)
{
	tcpconnect(h, port, 
	    wrap(cb1, h));
}
static void cb1(str h, int fd)
{
	if(fd < 0)
		warn << h << "\en";
	exit(0);
}

  Equivalent libasync
.X3

.SL "Example Extension: Tame

.X1
.ta +2n +2n +2n
TAME(static void 
  try_connect(str h,
    int port))
{
	VARS { int fd; }
	BLOCK{ tcpconnect(h, 
	      port, @(fd)); }
	
	

	if(fd < 0)
		warn << h << "\en";
	exit(0);
}

  Tame TCP client.
.X2
static void cb1(str h, int fd)
{
	if(fd < 0)
		warn << h << "\en";
	exit(0);
}
static void try_connect(str h, 
    int port)
{
	tcpconnect(h, port, 
	    wrap(cb1, h));
}



  Libasync in the wild.
.X3

.SL "Example Extension: Tame

.X1
VARS { int fd; }
BLOCK{ tcpconnect(h,
    port, @(fd)); }
exit(fd);
.X2
static void cb1(void) {exit(fd);}
tcpconnect(port, wrap(cb1));
.X3



Benefits of Tame:
 \- more concise, more readable
 \- does not encourage upside-down programming
 \- easy to handle multiple parallel threads-of-control
 \- can write structured programs again (\f(CWif\fP, \f(CWfor\fP, ...)


 

.SL "Example Extension: Plan 9 Kernel Checker
.nf

Plan 9 kernel has many conventions to remember:
(So does any large program.)

 \- lock/unlock pairing
 \- no sleeping in interrupt handlers
 \- interrupt handlers must use ilocks
 \- exception stack for error handling
 \- kernel/user pointers
 \- no dereferencing user pointers in interrupt handlers
 \- ...

Many are checked at run-time.
No fundamental reason not to check at compile time.


.SL "Example Extension: Plan 9 Kernel Checker
.nf

Add attributes to document and enforce conventions.

.P1
.ps 20
.vs 22
void *ialloc(int) attribute(nosleep);
void *malloc(int) attribute(nosleep);
void print(char*, ...);

void*
ialloc(int n)
{
	void *v;
	
	v = malloc(n);


	return v;
}
.P2

.SL "Example Extension: Plan 9 Kernel Checker
.nf

Add attributes to document and enforce conventions.
.P1
.ps 20
.vs 22
void *ialloc(int) attribute(nosleep);
void *malloc(int) attribute(nosleep);
void print(char*, ...);

void*
ialloc(int n)
{
	void *v;
	
	v = malloc(n);
.RGB 0.75 0 0
	if(v == nil)
		print("ialloc returning nil\en");
.CO black
	return v;
}
.P2
.ps
.L1
Compiler can tell programmer he made a mistake.


.SL "\f(CWrscc\fP overview
.nf

Front end only
  \- compile to low-level C and invoke \f(CWgcc\fP

Written using itself

Extensible LR(1) parser

Extensible grammar notation

Destructuring syntax

Restructuring syntax

Other extensions as convenient


.SL "Extensible LR(1) parser
.nf

Shift-reduce parsers run an NFA on input.
 \- NFA says shift or reduce at each step
     Parse \f(CWx * x + x
     x            * x + x   shift
     x *          x + x     shift
     x * x        + x       reduce x => x*x
     x            + x       shift
     x +          x         shift
     x + x                  reduce x => x+x
     x                      
.ft

.SL "Extensible LR(1) parser
.nf

Shift-reduce parsers run an NFA on input.
 \- NFA says shift or reduce at each step
 \- NFA corresponds exactly to grammar rules

\f(CWYacc\fP precomputes, hard-codes equivalent DFA.

New parser for rscc:

 \- run determinization on the fly, caching result.
 \- if grammar changes, throw away DFA built so far.

 \- normal C library interface
.P1
yynewsym(g, "\e"**\e"", 0);
yylexnewsym(g, "\e"**\e"");
yyrulestr(g, yyaction, "expr", "expr", "\e"**\e"", "expr", 0);
.P2

.SL "Grammar Syntax
.nf

Use an extension to present a better parser interface.
.P1
yacc {
	yaccsym right "**" %prec "*";

	expr: expr "**" expr;
}
.P2
.L1
Compiles into appropriate function calls.

Provides default \f(CWyyaction\fP rule
that builds an abstract syntax tree node.

.SL "Destructuring Syntax
.nf

Need to make abstract syntax tree nodes accessible.

Don't want extension writer to need to know internal representation.

Provide matching operator \f(CW~\fP for destructuring syntax.

.P1
Node *a, *b, n;

if(n ~ `expr{ \ea**\eb })
	warn("using **: base=%N exponent=%N", a, b);
.P2


.SL "Restructuring Syntax
.nf

Same syntax as destructuring, but used as expression
instead of next to \f(CW~\fP.

.P1
Node *n, *a, *b;

if(n ~ `expr{ \ea**\eb })
	n = `expr{ pow(\ea, \eb) };
.P2
.L1
The \f(CWa\fP and \f(CWb\fP are substituted as units,
not as token streams.

.P1
int x;
Node *n, *nn;

x = 2;
n = `expr{ 1+\ex };
nn = `expr{ \en*3 };
.P2

.SL "Extensible Functions

Linked list of function pointers implementing a function.

Each function can handle a call itself,
with or without invoking next function in the list.

Syntax codifies the idiom.
.P1
extend
Node*
compile(Node *n)
{
	Node *a, *b;
	
	if(n ~ `expr{ \ea**\eb })
		return compile(`expr{ pow(\ea, \eb) });
	return default;
}
.P2

.SL "Extensible data structures

Same idea as functions; not sure of implementation.

.P1
extend
struct Node
{
	int extra;
	int other;
	int field;
};
.P2


.SL "Other extensions: iterators

.P1
static int start(State *z, Rule *r);
static int next(State *z, Sym **s);
static int end(State*);
iterator(start, next, end);

Rule *r;
Sym *s;

for(s in r)
	something(s);

.P2


.SL "Related Work
.nf

Lots of macro systems
 \- can't write checkers with those.
 \- often run into scoping problems

Other extensible languages/compilers
 \- require programmers to know too much about internals
 \- not easy enough to write extensions


.SL "Status

\f(CWRscc\fP exists.
 \- uses extensible grammar, yacc syntax
 \- provides its own grammar action functions
 \- uses restructuring, not destructuring
 \- uses extensible functions heavily
 \- no extensible data structures yet
 \- uses iterators heavily

 \- know how to switch over to implicit 
    actions and destructuring
 \- need to find ways to hide other aspects
    of compiler internals (types, flow control, ...)

 \- submit to OSDI

 \- always looking for more target uses


Bell Labs OSI certified Powered by Plan 9

(Return to Plan 9 Home Page)

Copyright © 2021 Plan 9 Foundation. All Rights Reserved.
Comments to [email protected].