Dana Vrajitoru
C311 Programming Languages
Introduction to Formal Grammars
Formal Grammars
- Abstract structures that describe a language precisely.
- Composed of a set of rules that can be used to identify correct /
incorrect strings of tokens in a language.
- They are used in particular during the compilation, mostly in the
syntactic analysis phase (parsing) and in part during the semantic
analysis phase (translation to machine language).
- Other major application: natural language processing (NLP), image
synthesis.
Formal Definition
- Let L be a finite set (alphabet, terminal symbols) and N be a set
of non-terminal symbols.
- The goal of a grammar is to generate all possible strings over the
alphabet L that are syntactically correct in the language.
- This is done by a set of production rules. A rule expands a
non-terminal symbol as a sequence which can be any combination of
terminal and non-terminal symbols.
- Usually there is a starting non-terminal symbol.
Production rules
- A production rule is of the form
s1 T s2 => s3
where T is any non-terminal symbol and s1, s2,
and s3 are strings made of any combination of terminal and
non-terminal symbols. We say that the rule expands the symbol T.
- A production rule can be recursive if the symbol on the left
appears in the string to the right.
- A grammar produces strings over the alphabet L by starting from a
string containing only the starting symbol.
- It transforms a string into another by replacing one non-terminal
symbol at a time with the right-hand side of a production rule that
expands it. The transformation stops when the resulting string is made
only of terminal symbols.
Example
Production Tree
- We can build a tree to show the production rules that we apply to
derive any string in our class.
- The leaves are terminal nodes. All the other nodes are
non-terminal symbols. An ordered print of all the leaves is the final
string.
; Lisp implementation of the grammar.
(defun check-S (Lst)
(if (equal (car Lst) 'a)
(check-T (cdr Lst))
nil))
(defun check-T (Lst)
(if (equal (car Lst) 'b)
(or (check-T (cdr Lst))
(check-V (cdr Lst)))
nil))
(defun check-V (Lst)
(if (and (equal (car Lst) 'a))
(not (cdr Lst))))
(check-S '(a b a)) ; t
(check-S '(a b b b a)) ; t
(check-S '(b a b)) ; nil
Types of Grammars
- Noam Chomsky defined a hierarchy in 1956:
- Type 0: all formal grammars (unrestricted). They generate
all languages that can be recognized by a Turing machine.
- Type 1: context-sensitive. Rules are of the form
s1 S s2 => s1 s3
s2a y b, where si are any strings made of
terminals and/or non terminals and S is a non terminal. Equivalent to
linear bounded Turing Machine.
- Type 2: context-free. They are of the form S => str, where
S is a non terminal ans str is any string made of terminals and/or non
terminals. Equivalent to a pushdown automaton (using a stack). They
describe most programming languages.
- Type 3: regular grammars. The rules are of the form S => T
or S => a T or S =>T a or S => a, where a is a terminal, S and T are
non-terminals. They are equivalent to finite state machines.
Examples of Rules
- Type 0: S b c => T b a V
- Type 1: many NP =>many noun-plural
- Type 2: Condition => ( Expression )
Assignment => variable = Expression ;
- Type 3: see example used before.