scanning
maps characters into tokens and finds token syntax errors
a.k.a tokenising
flex
rule based scanner generator that constructs a DFA
domain specific languages
a small language for a narrow purpose
aims to make it easy to write concise programs
flex is an example of a DSL for generating a scanner
parsing
builds parse trees (ASTs) from tokens
identifies phrase syntax errors
hand coded parsers
recursive descent or top-down parsing
some grammars cannot be processed with recursive descent
parser generators
top-down or bottom-up parsers
grammar categories
type 3 - regular grammar
type 2 - context-free grammar
type 1 - context-sensitive grammar
type 0 - free grammar
type 3 grammar
finite state automata, state and transitions
next state = input + current state
useful for token recognition
type 2 grammar
pushdown automata, FSA and stack
next state = input + current state + top of stack
useful for phrase recoginition
interpreter
computes an answer from a parse tree
hand coded interpreter
tree walk
interpreter generator
not used outside of research
type checking (lang processing)
interprets a program to verify its type
type inference (lang processing)
interprets a program to compute its type
tree interpreters
interpretation process is a tree walk
code for node type can change what is next
graph interpreters
used in combinator redution languages
linear interpretation
virtual machines like the JVM
mimics hardware in software, write once and run anywhere
0-address architecture
architecture used by VMs where there are no registers and only a stack
operands are implicitly consumed from the top of the stack and results are pushed to the top of the stack
instruction set design options
program representation options
compiler
translates source program to target program
they can either
* generate target machine code
* generate intermediate machine code (VM code)
intermediate languages
three address code
idealised intermediate language
dst = src1 op src2
generated by doing a tree walk
IR optimisation