Plan 9 from Bell Labs’s /usr/web/sources/contrib/steve/root/sys/src/c++/demangle/README.dem

Copyright © 2021 Plan 9 Foundation.
Distributed under the MIT License.


/*ident	"@(#)cls4:tools/demangler/README.dem	1.5" */

###############################################################################
#
# C++ source for the C++ Language System, Release 3.0.  This product
# is a new release of the original cfront developed in the computer
# science research center of AT&T Bell Laboratories.
#
# Copyright (c) 1993  UNIX System Laboratories, Inc.
# Copyright (c) 1991, 1992  AT&T and UNIX System Laboratories, Inc.
# Copyright (c) 1984, 1989, 1990 AT&T.  All Rights Reserved.
#
# THIS IS UNPUBLISHED PROPRIETARY SOURCE CODE of AT&T and UNIX System
# Laboratories, Inc.  The copyright notice above does not evidence
# any actual or intended publication of such source code.
#
###############################################################################
Demangler Release Notes for 3.0.1
January 1992


INTRODUCTION

This is the first release of the new C++ external name demangler.  It
was rewritten for a couple of reasons, notably the desire to separate
the unmangling and printing functions and the need to fully support
template names and nested class names.

In what follows it is assumed that the reader is familiar with page
122 and following of the Annotated C++ Reference Manual, where the
name mangling scheme is given.


INSTALLATION AND CUSTOMIZATION

The demangler has two files, dem.h and dem.c.  dem.h is #included in
user programs.  dem.c can be compiled either for a standalone program:

	$ cc -DDEM_MAIN dem.c -o dem

or to produce an object to be linked with an application:

	$ cc -c dem.c

	...

	$ cc appl.o dem.o

The standard main() driver program in dem.c will read from standard
input or from a set of command-line files, and unmangle names in
place, while passing all whitespace and invalid names through
unchanged.

There are a couple of customization options.  CLIP_UNDER should be set
(#defined) at the top of dem.c if external names on your system have a
leading underscore added to them.  This is common, for example on the
Sun SPARCstation.  An #if has been added for the most common machine
types.
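
As a rough illustration of what CLIP_UNDER is for, the sketch below
strips one leading underscore before a name is examined.  The helper
clip_under() is hypothetical and is not taken from dem.c, which may
handle the prefix differently.

```c
#include <stddef.h>

/* Hypothetical helper, not part of dem.c: skip the single underscore
 * that some systems prepend to every external name. */
const char *clip_under(const char *name)
{
	if (name != NULL && name[0] == '_')
		return name + 1;
	return name;
}
```

On such a system a constructor name arrives as "___ct__1AFv" and is
examined as "__ct__1AFv".  The helper should only be applied where the
prefix is known to be present, since it cannot tell a platform prefix
from an underscore that belongs to the name.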

dem.c uses its own space management system both for speed and to avoid
having to call malloc().  SP_ALIGN is the alignment of blocks that are
allocated;  4 bytes is the default.  If your system requires alignment
on other than a power-of-2 boundary, you will have to adjust the
function gs() to use the mod operator rather than the mask technique
that is used by default.
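
The difference between the two rounding techniques can be sketched as
follows.  The function names are illustrative only and this is not the
code of gs(); it merely shows why the mask form needs a power-of-2
alignment while the mod form does not.

```c
#include <stddef.h>

#define SP_ALIGN 4	/* the default alignment mentioned above */

/* Round n up with a mask; correct only when align is a power of 2. */
size_t round_up_mask(size_t n, size_t align)
{
	return (n + align - 1) & ~(align - 1);
}

/* Round n up with the mod operator; works for any nonzero align. */
size_t round_up_mod(size_t n, size_t align)
{
	size_t rem = n % align;

	return rem == 0 ? n : n + (align - rem);
}
```

With an alignment of 4 both functions round 5 up to 8.  With an
alignment of 6 the mod form rounds 7 up to 12, while the mask form
yields 8, which is not a multiple of 6.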

There is a constant MAXDBUF defined in dem.h.  This is the maximum
size of buffer required for an unmangled name's data structure.  The
4096 size is the worst-case setting for a maximum input mangled name
length of 256.  If you routinely have names longer than this, you
should adjust MAXDBUF upwards accordingly.

Finally, there is a compile-time value EXPLAIN that can be set.  If
turned on with -DEXPLAIN, the demangler will append a brief description
of the type of each input name to the demangled output.


USE OF THE STANDALONE DEMANGLER

The standalone program reads from standard input or from one or more
files and processes each line in turn.  All whitespace and
non-alphanumeric characters are passed through unchanged.  All other
characters are separated into names and given to the demangler.  If
the demangler fails, the name is printed unchanged.  Otherwise, the
unmangled name is printed.
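
The line processing described above can be sketched roughly as follows.
This is not the driver in dem.c.  fake_demangle() is a hypothetical
stand-in that knows a single name, and the sketch assumes that "_"
counts as a name character along with letters and digits.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the demangler: 0 on success, -1 on failure. */
int fake_demangle(const char *in, char *out, size_t outsz)
{
	if (strcmp(in, "__2x") == 0 && outsz >= 2) {
		strcpy(out, "x");
		return 0;
	}
	return -1;
}

/* Copy line to out, replacing each run of name characters with its
 * demangled form when the demangler accepts it.  Separators and
 * rejected names are copied unchanged.  ASSUMPTION: '_' is treated
 * as a name character.  Runs longer than 255 characters are split. */
void filter_line(const char *line, char *out, size_t outsz)
{
	char name[256], res[256];
	size_t used = 0;

	if (outsz == 0)
		return;
	while (*line != '\0') {
		const char *piece = line;
		size_t n = 0, len;

		while (n < sizeof name - 1 &&
		    (isalnum((unsigned char)line[n]) || line[n] == '_'))
			n++;
		if (n == 0) {
			len = 1;	/* separator: pass one character through */
			line++;
		} else {
			memcpy(name, line, n);
			name[n] = '\0';
			line += n;
			len = n;
			if (fake_demangle(name, res, sizeof res) == 0) {
				piece = res;
				len = strlen(res);
			}
		}
		if (used + len + 1 > outsz)
			break;		/* output full: stop rather than overflow */
		memcpy(out + used, piece, len);
		used += len;
	}
	out[used] = '\0';
}
```

Given the line "a = __2x + 1;" the sketch copies everything unchanged
except the one name its stand-in demangler recognizes.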

The standalone demangler has an exit status of 0 unless one or more
input files could not be opened for reading;  in that case the exit
status is the number of unopenable files.


USE OF THE DEMANGLER LIBRARY

The basic idea is to call the dem() function:

	int ret;
	char inbuf[1024];
	DEM d;
	char sbuf[MAXDBUF];

	ret = dem(inbuf, &d, sbuf);

inbuf holds the mangled input name, d is the data structure that dem()
fills in, and sbuf is the scratch buffer from which the demangler
allocates the storage for that data structure.

dem() returns -1 on error, otherwise 0.

dem.h has comments describing each field in the data structures.  The
data structures are somewhat complicated by the need to handle nested
types and function arguments which themselves are function pointers
with their own arguments.

To format this data structure, there are several functions:

	dem_print		- format a complete name

	dem_printcl		- format a class name

	dem_printarg		- format a function argument

	dem_printarglist	- format a function argument list

	dem_printfunc		- format a function name

	dem_explain		- return a string explaining a type

	demangle		- demangle in (char*) to out (char*)


AN EXAMPLE APPLICATION

This particular application reads from standard input and displays the
class name for each mangled name read, or "(none)" on errors and C
functions/data.

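	#include <stdio.h>
	#include <string.h>
	#include "dem.h"
	
	int main(void)
	{
		char sbuf[MAXDBUF];	/* scratch space for dem() */
		DEM d;
		int ret;
		char buf[1024];		/* mangled input name */
		char buf2[1024];	/* formatted class name */
	
		/* fgets() is used because gets() cannot bound its input */
		while (fgets(buf, sizeof buf, stdin) != NULL) {
			buf[strcspn(buf, "\n")] = '\0';	/* drop the newline */
			ret = dem(buf, &d, sbuf);
			if (ret || d.cl == NULL) {
				printf("%s --> (none)\n", buf);
			}
			else {
				dem_printcl(d.cl, buf2);
				printf("%s --> %s\n", buf, buf2);
			}
		}
		return 0;
	}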


TYPENAMES

The demangler handles mangled class typenames, whether they are
simple, nested, or template classes.  For example:

	A__pt__2_i --> A<int>

	__Q2_1A1B --> A::B
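
To make the nested-class encoding concrete, here is a minimal sketch
that formats simple and nested class names only.  It is not the code in
dem.c, the function names are made up for illustration, and template
names ("__pt__") are deliberately left out.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Read one length-prefixed name (up to three length digits, then that
 * many characters) from *sp and append it to out.
 * Returns 0 on success, -1 on malformed input or lack of space. */
int read_cname2(const char **sp, char *out, size_t outsz)
{
	const char *s = *sp;
	size_t len = 0, used = strlen(out);
	int ndigits = 0;

	while (isdigit((unsigned char)*s) && ndigits < 3) {
		len = len * 10 + (size_t)(*s - '0');
		s++;
		ndigits++;
	}
	if (ndigits == 0 || len == 0 || strlen(s) < len)
		return -1;
	if (used + len + 1 > outsz)
		return -1;
	memcpy(out + used, s, len);
	out[used + len] = '\0';
	*sp = s + len;
	return 0;
}

/* Format a class name such as "1A" or "Q2_1A1B", optionally preceded
 * by "__", into out.  Returns 0 on success, -1 on error. */
int format_cname(const char *s, char *out, size_t outsz)
{
	int i, n;

	if (outsz == 0)
		return -1;
	out[0] = '\0';
	if (strncmp(s, "__", 2) == 0)
		s += 2;
	if (s[0] != 'Q')
		return (read_cname2(&s, out, outsz) == 0 && *s == '\0') ? 0 : -1;
	/* cfront form: "Q", one digit giving the nesting depth, then "_" */
	if (!isdigit((unsigned char)s[1]) || s[1] == '0' || s[2] != '_')
		return -1;
	n = s[1] - '0';
	s += 3;
	for (i = 0; i < n; i++) {
		if (i > 0) {
			if (strlen(out) + 3 > outsz)
				return -1;
			strcat(out, "::");
		}
		if (read_cname2(&s, out, outsz) != 0)
			return -1;
	}
	return *s == '\0' ? 0 : -1;
}
```

Called on "__Q2_1A1B" the sketch produces "A::B".  It rejects input
whose nesting count does not match the number of names that follow.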


LOCAL VARIABLES

The demangler also handles local variables of the form:

	__nnnxxx

For example:

	__2x --> x
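
A minimal sketch of recognizing this form is shown below.  The function
local_name() is hypothetical and not part of dem.c.  It checks only the
"__" prefix, the digits, and the first character of the identifier, and
does not validate the rest of the name.

```c
#include <ctype.h>
#include <stddef.h>

/* Return a pointer to the identifier inside a name of the form
 * "__" digits identifier, or NULL if s does not have that form. */
const char *local_name(const char *s)
{
	if (s[0] != '_' || s[1] != '_' || !isdigit((unsigned char)s[2]))
		return NULL;
	s += 2;
	while (isdigit((unsigned char)*s))
		s++;
	if (!isalpha((unsigned char)*s) && *s != '_')
		return NULL;
	return s;
}
```

For "__2x" the sketch returns a pointer to "x".  A name with no digits
after the "__", or with nothing after the digits, is rejected.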

USE OF THE OLD C++FILT PROGRAM

Many of the options that c++filt previously supported are no longer
supported.  If you need to use them, you can build the old c++filt
using the source in the subdirectory "osrc".  Note that we are
planning to eliminate these options in a future release, so please
send mail to your C++ support person explaining what option you needed
and why.


KNOWN BUGS AND AMBIGUITIES

1.  "signed" and "volatile" encodings are not handled.

2.  The encoding for nested classes as mentioned on page 123 of the
ARM is handled slightly differently in cfront;  there is a "_" after
the digit after the "Q".

3.  A nested class starting with "Q" sometimes has the length encoded
before it;  the demangler handles either case.

4.  The "Tnn" and "Nnnn" notations mentioned on page 124 are not fully
supported.  It is assumed that the number of the designated argument
is less than or equal to 9.  So if you have 11 or more arguments, and
you want to repeat argument 10 or greater, the demangler will reject
the encoded name.

5.  All literal arguments to templates are assumed to be const.  For
example, the non-const literal value "37" is encoded as "Ci".

6.  Some compilers will add a gratuitous "_" before external names.

7.  The grammar allows class names up to 999 characters.  This is
considered important for handling templates.
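
The single-digit restriction in item 4 can be illustrated with the
sketch below.  It is not the code in dem.c and decode_repeat() is a
made-up name.  It assumes that in the "N" form the first digit is the
repeat count and the second is the argument number; check that reading
against the ARM before relying on it.

```c
#include <ctype.h>
#include <stddef.h>

/* Decode a repeat code whose argument number is a single digit.
 * "Td"  refers to argument d once.
 * "Ncd" refers to argument d, c times (ASSUMPTION: count comes first).
 * Stores the 1-based argument number in *argno and the count in *count.
 * Returns the number of characters consumed, or -1 if the code is
 * malformed or names an argument outside 1..nargs. */
int decode_repeat(const char *s, int nargs, int *argno, int *count)
{
	int len;

	if (s[0] == 'T' && isdigit((unsigned char)s[1])) {
		*count = 1;
		*argno = s[1] - '0';
		len = 2;
	} else if (s[0] == 'N' && isdigit((unsigned char)s[1]) &&
	    isdigit((unsigned char)s[2])) {
		*count = s[1] - '0';
		*argno = s[2] - '0';
		len = 3;
	} else
		return -1;
	if (*argno < 1 || *argno > nargs || *count < 1)
		return -1;
	return len;
}
```

Because a length-prefixed class name may follow directly, the digit
after "T1" in "T11A" is taken as the start of the next argument rather
than as part of a two-digit argument number.  This is why an argument
number of 10 or more cannot be expressed under the single-digit
assumption.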


THE GRAMMAR FOR EXTERNAL NAMES

start		-->	name

######################### COMPLETE NAMES #########################

name		-->	sti | std | ptbl | func | data | vtbl | cname3 | local

sti		-->	"__sti" "__" id

std		-->	"__std" "__" id

ptbl		-->	"__ptbl_vec" "__" id

func		-->	"__op" arg funcpost | id funcpost

funcpost	-->	"__" funcpost2 | "__" cname funcpost2

funcpost2	-->	csv "F" arglist

csv		-->	"" | "C" | "S" | "V"

data		-->	id | id "__" cname

vtbl		-->	"__vtbl" "__" cname

local		-->	"__" num regid

######################### CLASS NAMES #########################

cname		-->	cname2 | nest

nest		-->	"Q" digit "_" cnamelist

cnamelist	-->	cname2 | cnamelist cname2

cname2		-->	cnlen cnid

cname3		-->	cnid | "__" nest

cnlen		-->	digit | digit digit | digit digit digit

cnid		-->	id | id "__pt__" cnlen "_" arglist

######################### ARGUMENT LISTS #########################

arglist		-->	arg | arglist arg

arg		-->	modlist arg2 | "X" modlist arg2 lit

modlist		-->	mod | modlist mod

mod		-->	"" | "U" | "C" | "V" | "S" | "P" | "R" | arr | mptr

arr		-->	"A" num "_"

mptr		-->	"M" cname

arg2		-->	fund | cname | funcp | repeat1 | repeat2

fund		-->	"v" | "c" | "s" | "i" | "l" | "f" | "d" | "r" | "e"

funcp		-->	"F" arglist "_" arg

repeat1		-->	"T" digit | "T" digit digit

repeat2		-->	"N" digit digit | "N" digit digit digit

lit		-->	litnum | zero | litmptr | cnlen id | sptr

litnum		-->	"L" digit lnum | "L" digit digit "_" lnum

litmptr		-->	"LM" num "_" litnum "_" cnlen id

lnum 		-->	num | "n" num

sptr		-->	cnlen id "__" cname

zero		-->	0

######################### LOW LEVEL STUFF #########################

digit		-->	0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

id		-->	special | regid

special		-->	"__ct" | "__pp" # etc.

regid		-->	letter | letter restid

restid		-->	letter | digit | restid letter | restid digit

letter		-->	"A" - "Z" | "a" - "z" | "_"

num		-->	digit | num digit
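
The single-letter codes in the "fund" and "mod" rules can be read as
lookup tables.  The sketch below spells them out following the encoding
table in the ARM.  The function names are illustrative and are not taken
from dem.c, and the array ("A") and member-pointer ("M") modifiers are
omitted because they carry operands.

```c
#include <stddef.h>

/* Map a code from the "fund" rule to its C++ spelling, or NULL. */
const char *fund_name(char code)
{
	switch (code) {
	case 'v': return "void";
	case 'c': return "char";
	case 's': return "short";
	case 'i': return "int";
	case 'l': return "long";
	case 'f': return "float";
	case 'd': return "double";
	case 'r': return "long double";
	case 'e': return "...";
	default:  return NULL;
	}
}

/* Map a single-letter code from the "mod" rule to a description,
 * or NULL for codes that are not simple modifiers. */
const char *mod_name(char code)
{
	switch (code) {
	case 'U': return "unsigned";
	case 'C': return "const";
	case 'V': return "volatile";
	case 'S': return "signed";
	case 'P': return "pointer to";
	case 'R': return "reference to";
	default:  return NULL;
	}
}
```

Read this way, the argument encoding "PCc" is a pointer to const char.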
