Sentence
A string of characters over some alphabet or finite set of symbols.
Language
A set of sentences.
Lexeme
The lowest level of a syntactic unit of a language. Are often not included in the formal description of the syntax of a programming language. Lexemes are divided into groups, each group receiving a token.
Token
A category of lexemes, e.g. an identifier.
Lexical analysis
The process of recognizing lexemes.
What are the lexemes in the line:
index = 2 * count + 17;
index, = , 2, *, count, +, 17, ;
What are the tokens in the line:
index = 2 * count + 17;
identifier, equal_sign, int_literal, multi_op, identifier, plus_op, int_literal, semicolon
Recognizers
Reads input strings over the alphabet of the language and decides whether the input string belong to the language. The syntax analyzer part of the compiler is a recognizer.
Generator
A device that generates sentences of a language. One can determine if the syntax of a sentence is correct if it could be made by a generator.
Formal Grammar
A set of rules for rewriting strings, along with a “start symbol” from which rewriting starts. Therefore, a grammar is usually thought of as a language generator.
What are the four basic formal rules for string patterns?
Concatenation, Alternation, Kleene Closure, and Recursion.
Concatenation
The adding of strings.
Alternation:
Choice among a finite number of alternatives.
Kleene Closure:
Repetition an arbitrary number of times.
Recursion:
Creation of a construct from simpler instances of the same construct.
Regular Languages
A set of strings that can be defined by the first three rules of string patterns, and are generated by regular expressions.
Context Free Language (CFL)
Any set of strings that can be defined if we add recursion and are generated by context-free grammars.
Context-Free Grammars
Developed by Noam Chomsky in the mid 1950s, they are language generators meant to describe the syntax of natural languages.
Backus-Naur Form (BNF)
Invented by John Backus in 1959 to describe the syntax of Algol 58 and later modified by Peter Naur to describe Algol 60. BNF is equivalent to context-free grammars.
Metalanguage
A language used to define another language. BNF is a metalanguage.
Abstractions
Used to represent classes of syntactic structures and act like syntactic variables. They are also called nonterminal symbols. Can have more than one RHS.
Terminals
Lexemes and tokens not defined in terms of other terminals or non-terminals. Non-terminals are enclosed in angle brackets.
Rule/Production
Has a LHS that is nonterminal and a RHS that is terminal and/or nonterminals.
Define Context-Free Grammar G:
A tuple of elements (S, N, T, P) where S is a start symbol, T is a set of terminals, N is a set of non-terminals, and P is a finite non-empty set of rules.