Lexical Analysis Flashcards

Question 1

Q

Language implementation systems must always…

Answer

A

analyze source code.

Question 2

Q

What are most syntax analyzers based on?

Answer

A

A formal description of the syntax of the source language (BNF).

Question 3

Q

What are the two parts of the syntax analysis portion of a language processor?

Answer

A

A low-level part (lexical analyzer, a finite automaton based on a regular grammar) and a high-level part (syntax analyzer, a push-down automaton based on a context-free grammar or BNF).

Question 4

Q

Lexical Analyzer

Answer

A

A pattern matcher for character strings, acting as a “front-end” for the parser.

Question 5

Q

What are the three approaches to building a lexical analyzer?

Answer

A

Write a formal description of the tokens (using regular grammars) and use a software tool that creates a table-driven lexical analyzer from it (e.g. lex on UNIX, or flex).
Design a state diagram that describes the tokens and write a program that implements the state diagram.
Design a state diagram that describes the tokens and hand-construct a table-driven implementation of the state diagram.
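
The third approach can be sketched as follows. This is only an illustration under stated assumptions: the states, the character classes, the table contents, and every name (State, CharClass, kTable, run) are hypothetical and describe a toy language with just identifiers and unsigned integers.

```cpp
#include <cctype>
#include <string>

// Hypothetical states and character classes for a two-token toy language.
enum State { START = 0, IN_ID = 1, IN_INT = 2, ERROR = 3 };
enum CharClass { LETTER = 0, DIGIT = 1, OTHER = 2 };

// Map a raw character to its class so the table needs only three columns.
CharClass classify(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    if (std::isalpha(u)) return LETTER;
    if (std::isdigit(u)) return DIGIT;
    return OTHER;
}

// kTable[state][class] is the next state; this is the hand-built table.
constexpr State kTable[4][3] = {
    /* START  */ { IN_ID, IN_INT, ERROR },
    /* IN_ID  */ { IN_ID, IN_ID,  ERROR },
    /* IN_INT */ { ERROR, IN_INT, ERROR },
    /* ERROR  */ { ERROR, ERROR,  ERROR },
};

// Drive the table over the whole string and return the state reached.
State run(const std::string& input) {
    State s = START;
    for (char c : input) s = kTable[s][classify(c)];
    return s;
}

// IN_ID and IN_INT are the accepting states, one per token type.
bool isIdentifier(const std::string& s) { return run(s) == IN_ID; }
bool isInteger(const std::string& s) { return run(s) == IN_INT; }
```

With this table, isIdentifier("x1") and isInteger("42") hold, while "1x" is rejected by both. The driver loop never changes; only the table does, which is the main appeal of the table-driven style.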

Question 6

Q

Finite State Automata (FSA)

Answer

A

Mathematical models used to describe processes such as lexical analysis. They can describe the recognition of string patterns based on regular expressions. Each has a finite number of states and transitions between states.

Question 7

Q

Accepting

Answer

A

A subset of the states is designated as accepting. Each accepting state corresponds to a single token type.

Question 8

Q

Describe the formal mechanism for a FSA.

Answer

A

Begin at the start state, then process the input character by character, with each character triggering a transition. If the FSA reaches an accepting state, the token type has been determined.

Question 9

Q

Deterministic FSA (DFA)

Answer

A

Contains a set of states, an input alphabet, a unique start state, a unique end symbol ($), and at least one final (accepting) state. It is deterministic only if each state has at most one outgoing arc for each input symbol.

Question 10

Q

What are examples of DFA?

Answer

A

Catenation, repetition, and alternation.

Question 11

Q

What is a configuration on a DFA?

Answer

A

Consists of a state and the remaining input.

Question 12

Q

What is a “move?”

Answer

A

A move consists of traversing the arc exiting the state that corresponds to the leftmost input symbol, thereby consuming it. If no arc exists, either the state is final or there is an error.

Question 13

Q

When is an input accepted?

Answer

A

It is accepted when, starting with the start state, the automaton consumes all the input and halts in a final state.
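
Configurations, moves, and acceptance can be tied together in a small sketch. Everything here is an assumption made for illustration: the Config struct, the function names, and the example automaton, which recognizes an 'a' followed by any number of 'b' characters.

```cpp
#include <string>

// A configuration: the current state plus the input not yet consumed.
struct Config {
    int state;
    std::string remaining;
};

// Hypothetical DFA for the pattern a b*: state 0 is the start state and
// state 1 is the only final state. Returns -1 when no arc exists.
int arc(int state, char symbol) {
    if (state == 0 && symbol == 'a') return 1;
    if (state == 1 && symbol == 'b') return 1;
    return -1;
}

bool isFinal(int state) { return state == 1; }

// One move: follow the arc for the leftmost symbol and consume that symbol.
// Returns false when the input is exhausted or no arc exists.
bool step(Config& c) {
    if (c.remaining.empty()) return false;
    int next = arc(c.state, c.remaining.front());
    if (next < 0) return false;
    c.state = next;
    c.remaining.erase(0, 1);
    return true;
}

// Accept when all input is consumed and the automaton halts in a final state.
bool accepts(const std::string& input) {
    Config c{0, input};
    while (step(c)) {
    }
    return c.remaining.empty() && isFinal(c.state);
}
```

For the input "abab" the automaton stops with "ab" still unread, so the input is rejected even though the state it halted in is final.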

Question 14

Q

How should a state diagram be designed?

Answer

A

Do not include a transition from every state on every character (the diagram would be too large). Combine transitions to simplify the diagram. For example, when recognizing an identifier, treat all uppercase and lowercase letters as equivalent and represent them with a single character class.
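
A minimal sketch of a hand-coded state diagram that uses character classes is shown here. The class names, the classOf helper, and the identifier rule (a letter followed by letters or digits) are assumptions chosen for illustration, not a definition taken from any particular language.

```cpp
#include <cctype>
#include <string>

// Three character classes replace dozens of per-character transitions.
enum class CharClass { Letter, Digit, Other };

CharClass classOf(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    if (std::isalpha(u)) return CharClass::Letter;  // upper and lower case alike
    if (std::isdigit(u)) return CharClass::Digit;
    return CharClass::Other;
}

// State diagram for identifiers: Start --Letter--> InId,
// InId --Letter or Digit--> InId. Any other transition is an error.
bool isIdentifier(const std::string& s) {
    enum { Start, InId } state = Start;
    for (char c : s) {
        CharClass cc = classOf(c);
        switch (state) {
            case Start:
                if (cc != CharClass::Letter) return false;
                state = InId;
                break;
            case InId:
                if (cc == CharClass::Other) return false;
                break;
        }
    }
    return state == InId;  // InId is the accepting state
}
```

The diagram has two states and three labeled arcs, instead of one arc per letter and digit.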

Question 15

Q

How should a state diagram handle reserved words?

Answer

A

Reserved words can be combined with identifiers. A table look-up can be used later on to determine if an identifier is actually a reserved word.
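
The table look-up might look like the sketch below. The token types and the three reserved words are hypothetical; a real language would supply its own list.

```cpp
#include <string>
#include <unordered_map>

enum class TokenType { Identifier, If, While, Return };

// Called after the state diagram has recognized something shaped like an
// identifier. The reserved-word table here is an assumed example.
TokenType classifyWord(const std::string& lexeme) {
    static const std::unordered_map<std::string, TokenType> reserved = {
        {"if", TokenType::If},
        {"while", TokenType::While},
        {"return", TokenType::Return},
    };
    auto it = reserved.find(lexeme);
    return it != reserved.end() ? it->second : TokenType::Identifier;
}
```

Keeping reserved words out of the state diagram means adding a keyword only requires a new table entry.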

Question 16

Q

Lexers

Answer

A

Machines that recognize multiple patterns at once. Instead of “end of input,” a character that is “not part of this pattern” marks the end of the pattern. The pattern is then accepted and the machine returns to the start state. The terminating character is pushed back or saved, since it may be the beginning of another pattern.

Question 17

Q

What are two implementation problems with lexers?

Answer

A

Sometimes it cannot be determined if a token has ended without a look-ahead (nowadays most languages can be tokenized with a one-character look-ahead). Additionally, some tokens may be a proper substring of another token. This conflict is usually resolved by looking for the longest match.
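
Both problems show up with relational operators, where "<" is a proper substring of "<=". The sketch below resolves the conflict with one character of look-ahead and the longest-match rule. The function name and the operator set are assumptions for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a string of operators into tokens. After '<', '>', '=' or '!',
// one character of look-ahead decides whether a following '=' belongs to
// the same token; the longer token wins when it does.
std::vector<std::string> scanOperators(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (c == ' ') {  // blanks separate tokens and are skipped
            ++i;
            continue;
        }
        bool canExtend = (c == '<' || c == '>' || c == '=' || c == '!');
        if (canExtend && i + 1 < src.size() && src[i + 1] == '=') {
            tokens.push_back(src.substr(i, 2));  // longest match: two characters
            i += 2;
        } else {
            tokens.push_back(std::string(1, c));  // single-character token
            i += 1;
        }
    }
    return tokens;
}
```

Given "<==", this yields "<=" followed by "=", because the scanner always takes the longest token available at the current position.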

Question 18

Q

How can a look-ahead be performed during lexing?

Answer

A

Look-aheads typically look forward one character. In C++ streams, peek() returns the next character without consuming it, and get() reads a single character (read() extracts a block of characters). If a character that has been read belongs to the next token, putback() can be used to put it back.
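
Two styles of one-character look-ahead with standard C++ streams are sketched below. The helper names readInteger and readIdentifier are hypothetical; peek(), get(), and putback() are the std::istream members.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Style 1: peek() first, and call get() only when the character belongs
// to the current token. Nothing ever needs to be put back.
std::string readInteger(std::istream& in) {
    std::string lexeme;
    while (std::isdigit(in.peek())) {
        lexeme += static_cast<char>(in.get());
    }
    return lexeme;
}

// Style 2: read with get(), then putback() the character that turned out
// to belong to the next token.
std::string readIdentifier(std::istream& in) {
    std::string lexeme;
    char c;
    while (in.get(c)) {
        if (std::isalnum(static_cast<unsigned char>(c))) {
            lexeme += c;
        } else {
            in.putback(c);  // not ours: leave it for the next token
            break;
        }
    }
    return lexeme;
}
```

For a std::istringstream holding "123+abc", readInteger returns "123" and leaves '+' as the next character in the stream.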

Question 19

Q

How are lexers used by a parser?

Answer

A

Parsing is done by reading one token at a time and matching tokens as they are read. The lexer is usually a function that is called by the parser to obtain the tokens (getToken(), or yylex() if using an automatic lexer generator).
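
The calling relationship can be sketched with a toy grammar. The Token struct, the Kind values, the grammar (a number followed by any number of "+ number" pairs), and parseSum are all assumptions for illustration; only the idea of a parser calling getToken() comes from the card above.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

enum class Kind { Number, Plus, End, Error };

struct Token {
    Kind kind;
    std::string text;
};

// Hypothetical lexer: each call skips blanks, recognizes exactly one token,
// and leaves the stream positioned at the start of the next one.
Token getToken(std::istream& in) {
    while (in.peek() == ' ') in.get();
    int c = in.peek();
    if (c == std::istream::traits_type::eof()) return {Kind::End, ""};
    if (std::isdigit(c)) {
        std::string digits;
        while (std::isdigit(in.peek())) digits += static_cast<char>(in.get());
        return {Kind::Number, digits};
    }
    in.get();  // consume the single-character token
    if (c == '+') return {Kind::Plus, "+"};
    return {Kind::Error, std::string(1, static_cast<char>(c))};
}

// Parser for the toy grammar: sum -> Number { '+' Number }
// It pulls one token at a time from the lexer and matches it.
bool parseSum(std::istream& in, long& result) {
    Token tok = getToken(in);
    if (tok.kind != Kind::Number) return false;
    result = std::stol(tok.text);
    tok = getToken(in);
    while (tok.kind == Kind::Plus) {
        tok = getToken(in);
        if (tok.kind != Kind::Number) return false;
        result += std::stol(tok.text);
        tok = getToken(in);
    }
    return tok.kind == Kind::End;
}
```

Parsing "1 + 22 + 3" succeeds with a result of 26, while "1 + + 2" fails at the second plus sign. The parser never looks at individual characters; it only sees the tokens the lexer hands back.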