Dana Vrajitoru
C311 Programming Languages
Regular Grammars, Scanners, DFA
DFA versus NFA
- DFA - deterministic finite state automaton.
- A state transition machine where from each state we can move to
one of the others based on the current terminal symbol.
- There at most one outgoing branch for each possible symbol.
- NFA - non-deterministic finite state automaton.
- The difference with a DFA is that there can be more than one
outgoing branch with the same symbol.
- Regular grammars can be translated as NFAs.
NFA to DFA
- To write a scanner more easily, we can transform an NFA into a
DFA.
- For this we construct a DFA where the states are combinations of
states from the NFA that represent alternatives that can be reached
from any given state with the same symbol.
- For example in the first grammar we would add a state called TV:
Example: NFA (top) and equivalent DFA (bottom)
Regular Expressions
- Recognized by regular grammars.
- They can be parsed by a linear algorithm that doesn't need to
store the global structure of the parsing tree.
- Only one non-terminal symbol is active at any moment.
- This type of parsing algorithm is called a scanner.
- The first compilation step, the lexical analysis, is done by a
scanner.
Examples
Finite States Machines
- A string (simple character means one that is not a quote, simple
quote, backslash):
S => " T
T => simple-character T
T => \ V
V => {"|'|\|n|t...} T
T => "
- Real Number
N => {-|+|e} UN
UN => UI . UI
UI => digit UI
UI => digit
- The second rule above is not a regular rule. It needs a parser and
not a scanner to be processed.
- We can transform it the following way:
UN => digit UN
UN => digit
UN => . UF
UF => digit UF
UF => digit
Scanner / Parser
- The lexical analysis of a program is done by a scanner.
- A scanner is based on a regular grammar where the end of any chain
of rules (a non-terminal converted to a single terminal) produces a
token as a result.
- The parsing process calls the scanner during its execution to
retrieve every new token, then processes it before invoking the
scanner again.
- They are not two separate phases: the scanner is called repeatedly
from the parser.
Scanner in Context
A scanner that works in the context of a program in which it
recognizes individual units.
It can be written to recognize the termination of a unit by a
character that is not part of it.
We can replace:
UN => digit UN
UN => digit
with:
UN => digit UN
UN => e
Table Driven Scanner
If the grammar is equivalent to a DFA, then we can organize the
rules in a table:
I => letter J
I => _ J
J => letter J
J => digit J
J => _ J
J => - J
J => e
Table:
| letter | digit | _ | - | else
|
I | J | err | J | err | err
|
J | J | J | J | J | end
|