Dana Vrajitoru
C311 Programming Languages
Compiler Compilers
Compiler Compilers
- Programs that simplify the creation of compilers.
- An early version was written by Broker in 1960. It was called a
compiler-compiler but it was more of a configurable compiler.
- The first ones: 1975, Lex by M. Lesk and Yacc (Yet Another
Compiler Compiler) by S. Johnson.
- They must allow a full description of the syntax of the language
(usually expressed in tables) and of the target code generation.
- The result of the program execution is a compiler.
Lex
Source -> Lex -> yylex
Input -> yylex -> Output
- A program generator designed for lexical processing of character
input streams.
- It uses rules to determine the way the input is processed.
- By itself it processes the input to produce the output directly.
- A rule is made of a regular expression which is a pattern to be
matched by the input string, and an action to be taken when the
pattern in found.
Example in Lex
%%
[ \t]+$ ;
[ \t]+ printf(" ");
%% is the beginning of the set of rules,
[ ]+ means one or more of the content (the Kleene +)
$ means the end of the line. \t is any whitespace.
Every input character that is not matched by a rule is copied to
the output as is.
The output replaces any sequence of whitespaces with a single
space, except at the end of a line where they are deleted.
Yacc and Lex
lexical grammar
rules rules
|| ||
\/ \/
Lex Yacc
|| ||
\/ \/
Input -> yylex -> yyparse -> Parsed input
Lex is used for the lexical analysis of the program, while Yacc is
used for the parsing step.
Source File Format
Lex:
definitions
%%
rules
%%
user subroutines
|
Yacc:
declarations
%%
rules
%%
programs
|
Yacc Syntax
- Declarations:
%token name1 name2 ...
%start symbol
- Every symbol not defined as a token is considered a non-terminal.
- Rule:
NT : body ;
- where NT is a non-terminal, and the body represents the right hand
side of the rule, where all the symbols are separated by spaces.
- The | can be also be used to designate alternatives.
Examples of Rules
A : B C D | E F | G ;
- Actions can be associated with some of the rules:
A : '(' B ')' { hello( 1, "abc" ); }
- The function hello is defined in the programs section.
- Returned value:
expr : '(' expr ')' { $$ = $2 ; }
- This rule says that an expression is an expression enclosed in
parenthesis and that is evaluated to the result returned by the second
symbol on the right (expr).
$$ is the returned value.
$1, $2, ... are the symbols in the rule.
Example
%left '+' '-'
%left '*' '/'
%%
expr : expr '+' expr |
expr '-' expr |
expr '*' expr |
expr '/' expr |
'-' expr %prec '*' |
NAME ;