Thursday, 13 September 2012

Compiler Construction Viva notes

Viva voice notes

Subject : Compilers Construction

A program that accepts text expressed in one language and generates semantically equivalent text expressed in another language.
source language
The input language of a translator.
target language
The output language of a translator.
A translator from an assembly language to the corresponding machine language.
A translator from a high level language to a low level language.
high-level translator
A translator from one high-level language to another.
A translator from machine language to assembler language.
A translater from a low level language to a high level language.
source program
The input text of an assembler or compiler.
object program
The output text of an assembler or compiler.
implementation language
The language in which a program is expressed.
tombstone diagram
A graphical representation of the overall function of a system.
cross compiler
A compiler which generates code for a machine different from the machine on which it is run.
Portable program
A program which can be (compiled and) run on any machine.

interpretive compiler
A program which combines a compiler that produces object code in an intermediate language with an interpreter for that intermediate language.

What is a compiler

A compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code).

Difference between compilers & interpreters.
A compiler first takes in the entire program, checks for errors, compiles it and then executes it. Whereas, an interpreter does this line by line, so it takes one line, checks it for errors and then executes it.

Eg of Compiler - C
Eg of Interpreter – PHP
Language processor.
a parser which parses a particular language are called language processors
Symbol table.
a symbol table is a data structure used by a language translator such as a compiler or interpreter, where each identifier in a program's source code is associated with information relating to its declaration or appearance in the source, such as its type,scope level and sometimes its location.

Explain Different Phases Of A Compiler With An Example
1. Lexical analysis
This is the initial part of reading and analysing the program text: The text is read and divided into tokens , each of which corresponds toa symbol in the programming language,
e.g. , a variable name, keyword or number.
2. Syntax analysis
This phase takes the list of tokens produced by the lexical analysis and arranges these in a tree-structure (called the syntax tree ) that reflects the structure of the program. This phase is often called parsing

3. Semantic analysis - checks for errors.

4. Intermediate Code generation - generates machine code.

5. Code optimization (Machine independent)- looks for ways to make code smaller and more efficient.

6. Code Generator

7. Target program (machine dependent)- creates the output ( .exe, .com, .dll, etc.).

Basically, RISC CPU's ( eg: ARM processer...) contain an instruction set where every instruction and operand is of the exact same length. This makes the CPU design much simpler since every instruction an operand fits in the pipeline with no wasted cycles
CISC CPU's (x86, 8051, etc...) contain complex, variable length instruction sets and operands. While having complex variable length instructions make low level programming much easier, it comes at the price of greatly increasing the complexity of the CPU and reducing efficiency.
Application of compiler technology.
pattern recognition systems.
diagram recognition and diagram-processing tasks by use of grammars

Lexical Analyzer
lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function which performs lexical analysis is called a lexical analyzer, lexer or scanner.

Tokens, Patterns, Lexemes
A lexical token is a sequence of characters that can be treated as a unit in the grammar of the programming languages.
Example of tokens:
• Type token (id, num, real, . . . )
There is a set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern associated with the token.
Regular expressions are an important notation for specifying patterns.
For example, the pattern for the Pascal identifier token, id, is: id → letter (letter | digit)*.
A lexeme is a sequence of characters in the source program that is matched by the pattern for a token.
For example, ( =, , <, , >=)

Regular Expressions
1. The regular expressions over alphabet specifies a language according to the following rules. is a regular expression that denotes { }, that is, the set containing the empty string.
Regular Definitions
A regular definition gives names to certain regular expressions and uses those names in other regular expressions.
Deterministic Finite Automata (DFA)
A deterministic finite automation is a special case of a non-deterministic finite automation (NFA) in which
1. no state has an -transition
2. for each state s and input symbol a, there is at most one edge labeled a leaving s.
Nondeterministic Finite Automata (NFA)
A nondeterministic finite automation is a mathematical model consists of
1. a set of states S;
2. a set of input symbol, ∑, called the input symbols alphabet.
3. a transition function move that maps state-symbol pairs to sets of states.
4. a state so called the initial or the start state.
5. a set of states F called the accepting or final state.
Synthesized Attributes:
An attribute is synthesized if its value at a parent node can be determined from attributes of its children.

Syntax-Directed Definitions:
• A syntax-directed definition uses a CFG to specify the syntatic structure of the input.
• A syntax-directed definition associates a set of attributes with each grammar symbol.
• A syntax-directed definition associates a set of semantic rules with each production rule.

Parsing is the process of determining if a string of tokens can be generated by a grammar. A parser must be capable of constructing the tree, or else the translation cannot be guaranteed correct. For any language that can be described by CFG, the parsing requires O(n3) time to parse string of n token. However, most programming languages are so simple that a parser requires just O(n) time with a single left-to-right scan over the iput string of n tokens.
There are two types of Parsing
1. Top-down Parsing (start from start symbol and derive string)
A Top-down parser builds a parse tree by starting at the root and working down towards the leaves.
o Easy to generate by hand.
o Examples are : Recursive-descent, Predictive.
2. Bottom-up Parsing (start from string and reduce to start symbol)
A bottom-up parser builds a parser tree by starting at the leaves and working up towards the root.
o Not easy to handle by hands, usually compiler-generating software generate bottom up parser
o But handles larger class of grammar
o Example is LR parser.

Pridictive Parsing:
Recursive-descent parsing is a top-down method of syntax analysis that executes a set of recursive procedure to process the input. A procedure is associated with each nonterminal of a grammar.
A predictive parsing is a special form of recursive-descent parsing, in which the current input token unambiguously determines the production to be applied at each step.

Left Recursion:
The production is left-recursive if the leftmost symbol on the right side is the same as the non terminal on the left side. For example,
expr → expr + term.

Issues in the Design of Code generator
Code generator concern with:
1. Memory management.
2. Instruction Selection.
3. Register Utilization (Allocation).
4. Evaluation order.

What is lex
Lex is a computer program that generates lexical analyzers ("scanners" or "lexers").
Lex is commonly used with the yacc parser generator.

Lex reads an input stream specifying the lexical analyzer and outputs source code implementing the lexer in the C programming language.
Lexical error:
A lexical error is any input that can be rejected by the lexer. This generally results from token recognition falling off the end of the rules you've defined.

Types of parsers
Top-down parsers
_ start at the root of derivation tree and _ll in
_ picks a production and tries to match the input
_ may require backtracking
_ some grammars are backtrack-free (predictive)

Eg: recursive descent, LL

Bottom-up parsers
_ start at the leaves and _ll in
_ start in a state valid for legal _rst tokens
_ as input is consumed, change state to encode possibilities
(recognize valid pre_xes)
_ use a stack to store both state and sentential forms

Eg: LR, CYK (look ahead) parser , slr, lalr

What does LL and LR mean
LL(1) means – first ‘L’ represents scanning input from left to right. Second ‘L’ represents producing leftmost derivation. (1) one input symbol of lookahead at each step.
LR(k) means – ‘L’ Left to right scanning, R-rightmost derivation in reverse. K-0 or 1 no. of i/p symbols.

A loader is a system program, which takes the object code of a program as input and prepares it for execution.
• The loader performs the following functions :

• Allocation - The loader determines and allocates the required memory space for the program to execute properly.

• Linking -- The loader analyses and resolve the symbolic references made in the object modules.

• Relocation - The loader maps and relocates the address references to correspond to the newly allocated memory space during execution.

• Loading - The loader actually loads the machine code corresponding to the object modules into the allocated memory space and makes the program ready to execute.

Compile-and-Go Loaders:

• A compile and go loader is one in which the assembler itself does the processes of compiling then place the assembled instruction inthe designated memory loactions.

Absolute Loader

The assembler generates the object code equivalent of the source program bu the output is punched on to the cads forming the object decks instead of loading in memory.
Relocation loader

Loaders that allow for program relocation are called relocating loaders/relative loaders.
Linkage Editors

A linkage editor produces a linked version for the program called an executable image, which is written to a file for later execution.

Dynamic Linking

Dynamic linking allows an object module to include only the information that is require at load time to execute a program.

There are two types of dynamic linking, they are Load time dynamic linking and Run time dynamic linking.
Code Generation
code generation is the process by which a compiler's code generator converts some internal representation of source code into a form (e.g., machine code) that can be readily executed by a machine

code optimization
Code optimization is the process of modifying the code to make some aspect of software or hardware work more efficiently or use fewer resources or reduce compilation time or use memory efficiently etc

CFG is a grammar which naturally generates a formal language in which clauses can be nested inside clauses arbitrarily deeply, but where grammatical structures are not allowed to overlap