CIS 400 LECTURE 5
Syntax
· Defined as a set of rules specifying the legitimate sequences of characters for allowable programs
· Grammar ≡ set of rules
Semantics
· Meaning assigned to syntactic constructs
Characteristics of good syntax
· Readability
· Writeability
· Redundancy
· Easy to translate
· Lack of ambiguity
Program structure and sub-structure
· Separate sub-program definitions
· Nested sub-programs
· Separate data from code (e.g. pre-compiled headers in C++)
· Un-separated sub-programs
Language grammar
T is some alphabet for some set of languages {character set for machine}
T* is the set of all possible sequences over T: "strings", "words" or "tokens"
A "language" is some sub-set of T*
A grammar is a 4-tuple G = (T, N, P, S), where T = terminals, N = non-terminals, P = productions, S = start symbol; L(G) is the language it generates
Generative grammars
Let a, b ∈ T
A, B, S ∈ N
P: S → aAB   rule 1 (S is start)
   S → a     rule 2
   A → bS    rule 3
   B → b     rule 4
Derivation example
S → aAB by rule 1
aAB → abSB by rule 3
abSB → abSb by rule 4
abSb → abab by rule 2 (this sequence of steps is considered a "derivation")
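The derivation above can be replayed mechanically. The following is a minimal sketch, assuming rule 3 is A → bS (as the step aAB → abSB requires) and that each rule rewrites the leftmost occurrence of its left-hand side; the names RULES and derive are hypothetical, chosen only for illustration.

```python
# Productions from the notes, keyed by rule number (left-hand side, right-hand side).
# Non-terminals are upper case, terminals are lower case.
RULES = {
    1: ("S", "aAB"),
    2: ("S", "a"),
    3: ("A", "bS"),
    4: ("B", "b"),
}


def derive(start, rule_numbers):
    """Apply the rules in order, rewriting the leftmost matching non-terminal."""
    form = start
    steps = [form]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        if lhs not in form:
            raise ValueError(f"rule {number} does not apply to {form}")
        form = form.replace(lhs, rhs, 1)  # rewrite only the first occurrence
        steps.append(form)
    return steps


steps = derive("S", [1, 3, 4, 2])
print(" -> ".join(steps))  # S -> aAB -> abSB -> abSb -> abab
```

The final string abab contains only terminals, so it is a word of the language generated by this grammar.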
Finite state machine
· Finite number of states (N: non-terminals)
· Start state (S, where S ∈ N)
· Final states (F ⊆ N)
· Set of inputs (T, the alphabet)
· Transition function: δ: T × N → N
T = {a, b}   F = {S2}   N = {S0, S1, S2, S3}
State | a  | b
S0    | S1 | S0
S1    | S3 | S2
S2    | S1 | S3
S3    | S3 | S3
For input aabbaabbab: the machine is "stuck" in S3 (rejected)
For input bbbabab: the machine arrives in final state S2 (accepted)
b^n (ab)(ab)^m, with n, m ≥ 0, is accepted by this machine (also written as the regular expression b*(ab)+)
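The machine above is small enough to simulate directly. This is a minimal sketch, assuming S0 is the start state and that S0 stays in S0 on input b (which is what the bbbabab trace requires); the names DELTA, FINAL, run and accepts are hypothetical.

```python
# Transition function delta: (state, input symbol) -> next state.
DELTA = {
    ("S0", "a"): "S1", ("S0", "b"): "S0",
    ("S1", "a"): "S3", ("S1", "b"): "S2",
    ("S2", "a"): "S1", ("S2", "b"): "S3",
    ("S3", "a"): "S3", ("S3", "b"): "S3",  # S3 is a trap state
}
FINAL = {"S2"}


def run(word, state="S0"):
    """Feed the word to the machine one symbol at a time; return the last state."""
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state


def accepts(word):
    """A word is accepted if the machine ends in a final state."""
    return run(word) in FINAL


print(run("aabbaabbab"))   # S3
print(accepts("bbbabab"))  # True
```

Once the machine enters S3 it can never leave, which is why the second a in aabbaabbab is enough to reject the whole input.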
Four classes of grammars (per Chomsky)
· Regular (in xAy → Z, x and y are null): right-hand side has at most one non-terminal
A → a
S → ε (allowed only if S never appears on a right-hand side)
A → aB
· Context-free grammars (x and y are null), Z ∈ (T ∪ N)*
A → ε
A → abAbB
· Context-sensitive (xAy → Z, where x, y, Z ∈ (N ∪ T)*)
S → ε (if S never on RHS)
AaBS → abbBBa
· Unrestricted (a.k.a. recursively enumerable): no restrictions
Regular expression accepted by finite state machine
CFG accepted by push-down automata
· Current state
· Top of stack symbol
Procedure for this push-down automaton
Push input symbols a, b, c onto the stack. If the input is a or b, remain in S. Else if it is c, move to S2. Once in S2, pop the stack, then:
1) if input matches stack, erase input symbol
2) if no match, go to S3
3) if input ends and stack not empty, go to S3
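The procedure above can be sketched in code. The sketch rests on several assumptions that the notes do not state: the machine is meant to recognize strings of the form w c w-reversed (w over {a, b}), c acts only as a middle marker and is not kept on the stack, S2 is the accepting state, and S3 is a rejecting trap state. The function name accepts is hypothetical.

```python
def accepts(word):
    """Simulate the push-down automaton: push in S, match and pop in S2."""
    stack = []
    state = "S"
    for symbol in word:
        if state == "S":
            if symbol in {"a", "b"}:
                stack.append(symbol)      # remain in S, remember the symbol
            elif symbol == "c":
                state = "S2"              # assumption: the marker c is not pushed
            else:
                state = "S3"              # symbol outside the alphabet
        elif state == "S2":
            if stack and stack[-1] == symbol:
                stack.pop()               # 1) input matches top of stack
            else:
                state = "S3"              # 2) no match
        if state == "S3":
            break                         # trap state: nothing can rescue the input
    if state == "S2" and stack:
        state = "S3"                      # 3) input ended with a non-empty stack
    return state == "S2"


print(accepts("abcba"))  # True
print(accepts("abcab"))  # False
```

The stack is what lets this machine remember an unbounded prefix, which a finite state machine cannot do.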
Context-sensitive grammar accepted by a "linear bounded automaton" (similar to a Turing machine, but with a finite tape whose length is bounded by a linear function of the input length).
Unrestricted grammar accepted by a Turing machine
· infinite input tape
· read/write head
· finite instruction set
Each Turing machine instruction is represented by a 5-tuple, T = (S1, a, S2, b, M)
S1 ≡ current machine state
a ≡ symbol "under" the read/write head
S2 ≡ next machine state
b ≡ symbol written over "a"
M ≡ move instruction: move left or right on the tape
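To make the 5-tuple concrete, here is a minimal simulator sketch. Everything beyond the tuple layout (S1, a, S2, b, M) is an assumption made for illustration: the function name run, the blank symbol "_", the move codes "L" and "R", the state names q0 and halt, and the example machine itself, which swaps a and b across the input and then halts.

```python
def run(instructions, tape, state, halt_state, blank="_", max_steps=1000):
    """Execute 5-tuples (S1, a, S2, b, M) until the machine halts."""
    # Index the instructions by (current state, symbol under the head).
    rules = {(s1, a): (s2, b, m) for (s1, a, s2, b, m) in instructions}
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0
    for _ in range(max_steps):     # guard against machines that never halt
        if state == halt_state:
            break
        symbol = cells.get(head, blank)
        if (state, symbol) not in rules:
            break                  # no applicable instruction: the machine stops
        state, written, move = rules[(state, symbol)]
        cells[head] = written      # b replaces a under the head
        head += 1 if move == "R" else -1
    contents = "".join(cells[i] for i in sorted(cells)).strip(blank)
    return state, contents


# Hypothetical example machine: swap a <-> b, halt at the first blank.
SWAP = [
    ("q0", "a", "q0", "b", "R"),
    ("q0", "b", "q0", "a", "R"),
    ("q0", "_", "halt", "_", "R"),
]

print(run(SWAP, "aab", "q0", "halt"))  # ('halt', 'bba')
```

The dictionary-based tape stands in for the infinite tape: any cell that has never been written reads as the blank symbol.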
TOP-DOWN PARSING
EXAMPLE: the dog bit the boy
<sentence> ::= <noun phrase> <verb phrase>
<noun phrase> ::= <article> <noun>
<verb phrase> ::= <verb> <object>
<object> ::= <noun phrase>
<article> ::= the | a | an
<noun> ::= dog | boy | girl
<verb> ::= bit | saw | wrote
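A top-down parse of the example sentence can be sketched as a recursive-descent parser, with one function per grammar rule, starting from the sentence rule and working down to the words. This is only an illustration under assumptions: the function names, the tuple-based parse tree, and the use of SyntaxError for rejected input are all hypothetical choices, not part of the lecture.

```python
ARTICLES = {"the", "a", "an"}
NOUNS = {"dog", "boy", "girl"}
VERBS = {"bit", "saw", "wrote"}


def expect(words, vocabulary, label):
    """Consume one word if it belongs to the vocabulary, else reject."""
    if not words or words[0] not in vocabulary:
        found = words[0] if words else "end of input"
        raise SyntaxError(f"expected {label}, found {found}")
    return (label, words[0]), words[1:]


def noun_phrase(words):
    # noun phrase ::= article noun
    article, rest = expect(words, ARTICLES, "article")
    noun, rest = expect(rest, NOUNS, "noun")
    return ("noun phrase", article, noun), rest


def verb_phrase(words):
    # verb phrase ::= verb object, where object ::= noun phrase
    verb, rest = expect(words, VERBS, "verb")
    obj, rest = noun_phrase(rest)
    return ("verb phrase", verb, ("object", obj)), rest


def sentence(text):
    # sentence ::= noun phrase verb phrase
    subject, rest = noun_phrase(text.split())
    predicate, rest = verb_phrase(rest)
    if rest:
        raise SyntaxError(f"unexpected trailing words: {' '.join(rest)}")
    return ("sentence", subject, predicate)


print(sentence("the dog bit the boy"))
```

Each function mirrors one production, so the call order (sentence, noun phrase, article, noun, verb phrase, ...) is exactly the order in which a top-down parser expands the grammar.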
BOTTOM-UP PARSING