Skip to content

Репозиторий содержит реализацию динамически типизируемого интерпретируемого языка программирования (создание компилятора)

Notifications You must be signed in to change notification settings

Boogyy/Compilers-Construction

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Compiler for Project D

This is our implementation of a compiler for Project D, a dynamically-typed interpreted programming language. The compiler is written in Rust and follows the traditional compiler pipeline: lexing, parsing, semantic analysis, and interpretation.

Overview

The compiler reads source code written in the Project D language and executes it directly by building an abstract syntax tree (AST) and walking through it. We've implemented four main phases:

  1. Lexical Analysis - Breaks source code into tokens
  2. Parsing - Constructs an abstract syntax tree from tokens
  3. Semantic Analysis - Checks for errors and optimizes the AST
  4. Interpretation - Executes the program by traversing the AST

The language supports variables, functions (with closures), arrays, tuples, conditionals, loops, and various operators. It's dynamically typed and uses lexical scoping.

Project Structure

dlang/
├── src/
│   ├── lexer.rs      - Tokenizes the input source code
│   ├── parser.rs     - Builds the AST from tokens
│   ├── analyzer.rs   - Semantic checks and optimizations
│   ├── interpreter.rs - Executes the AST
│   ├── ast.rs        - Defines the AST structure
│   ├── token.rs      - Defines all token types
│   └── main.rs       - Main entry point
├── test_programs/    - Sample programs for testing
└── tests/            - Unit tests

How to Build and Run

You'll need Rust installed. Build the project with:

cd dlang
cargo build

To run a program from a file:

cargo run test_programs/good_program.txt

Or run without arguments to see a bunch of demo programs:

cargo run

The output shows each phase of compilation:

  • The original AST
  • Semantic analysis results
  • Optimized AST (if changes were made)
  • Program execution output

Running Tests

To run all unit tests with detailed output, use:

cargo test -- --nocapture --test-threads=1

This command shows the full output of each test, making it easier to debug and understand test failures. The --nocapture flag ensures that all print statements and logs are displayed, and --test-threads=1 runs tests sequentially for clearer output.

Language Features

Basic Operations

Variables are declared with var and assigned with :=:

var x := 10
var y := 20
print x + y

The language supports integers, reals, booleans, strings, and none (similar to null).

Control Flow

If/else statements:

if age >= 18 then
    print "Adult"
else
    print "Minor"
end

While loops:

var i := 1
while i <= 5 loop
    print i
    i := i + 1
end

For loops can iterate over arrays or ranges:

for i in 1..5 loop
    print i
end

var arr := [10, 20, 30]
for num in arr loop
    print num
end

Functions

Functions are first-class values. They can be defined with an arrow (=>) for single expressions:

var add := func(x, y) => x + y
print add(5, 3)

Or with a block (is ... end) for multiple statements:

var factorial := func(n) is
    if n <= 1 then
        return 1
    else
        return n * factorial(n - 1)
    end
end

Functions capture their lexical environment (closures), so nested functions work as expected.

Arrays and Tuples

Arrays are 1-indexed:

var arr := [1, 2, 3, 4, 5]
print arr[1]  // prints 1
arr[3] := 99  // modifies the third element

Tuples store named or indexed fields:

var point := {x := 10, y := 20}
print point.x
print point.y

var t := {a := 1, 2, c := 3}
print t.a    // by name
print t.2    // by index
print t.c    // by name

Type Checking

The is operator checks types at runtime:

var x := 42
if x is int then
    print "x is an integer"
end

Special Statements

  • return - Returns from a function (only allowed inside functions)
  • exit - Exits from a loop (only allowed inside loops)
  • print - Prints one or more values to stdout

Implementation Details

Lexer

The lexer (lexer.rs) reads through the source code character by character and produces tokens. It handles:

  • Keywords (var, if, while, for, func, etc.)
  • Identifiers and literals (numbers, strings, booleans)
  • Operators and punctuation
  • Comments (both // single-line and /* */ multi-line)
  • Whitespace and newlines

The lexer tracks line and column numbers for error reporting. When it encounters an unexpected character, it produces an error token with location information.

Parser

The parser (parser.rs) uses a recursive descent approach. It builds an AST by consuming tokens according to the language's grammar rules. The parser handles operator precedence and associativity correctly, so expressions like 2 + 3 * 4 are parsed as 2 + (3 * 4).

The parser is pretty forgiving - it skips semicolons, newlines, and comments automatically. This makes the language easier to write without worrying about exact formatting.

Semantic Analyzer

The semantic analyzer (analyzer.rs) has two jobs: checking for errors and optimizing the code.

Semantic Checks

Before optimization, we verify:

  1. Declarations before usage - Variables and functions must be declared before they're used. We track declarations in a symbol table as we walk the AST.

  2. Return statement placement - return statements can only appear inside function bodies, not in the global scope.

  3. Array bounds checking - For array literals with constant indexes, we check at compile time if the index is out of bounds.

If any semantic errors are found, we report them and skip optimization.

Optimizations

We run several optimization passes that modify the AST:

  1. Constant folding - Evaluates constant expressions at compile time. For example, 5 + 3 becomes 8. Works for arithmetic, comparisons, and boolean operations.

  2. Conditional simplification - Removes branches that can never execute. If we see if true then ... else ... end, we replace it with just the then branch.

  3. Unreachable code removal - After a return or exit statement, any code in the same block is removed since it can never execute.

  4. Unused variable removal - Variables that are declared but never used are removed entirely.

The optimizer runs these passes in a loop until no more changes occur. This means constant folding can enable other optimizations - for example, if constant folding makes a condition evaluate to true, then conditional simplification can remove the else branch.

Interpreter

The interpreter (interpreter.rs) walks the AST and executes statements as it goes. It maintains:

  • An environment (symbol table) for variable bindings, with proper scoping
  • A stack to track whether we're inside a function or loop (for validating return/exit)
  • Function closures that capture the environment at definition time

Values are represented as a Value enum that can hold integers, reals, booleans, strings, arrays, tuples, functions, or none. Since the language is dynamically typed, type checking happens at runtime.

The interpreter handles all the language rules:

  • Arrays are 1-indexed (valid range is 1 to length)
  • Division by zero raises an error
  • Array index out of bounds raises an error
  • Undefined variables raise an error
  • Function argument counts must match

Scoping follows lexical rules - inner scopes can shadow outer variables, and functions capture the environment they were defined in.

Error Handling

The compiler provides clear error messages at different stages:

  • Lexical errors - "Unexpected character '@' at line 5, column 3"
  • Parse errors - "Expected identifier, got '42' at line 10, column 5"
  • Semantic errors - "Variable 'x' used before declaration"
  • Runtime errors - "Division by zero" or "Array index 10 out of bounds (array size: 3)"

When an error occurs, we print a message and stop execution at that phase (semantic errors prevent optimization, runtime errors stop execution).

Test Programs

The test_programs/ directory contains several examples:

  • good_program.txt - Basic functionality with constant folding
  • semantic_error.txt - Demonstrates semantic error detection
  • unreachable_code.txt - Shows unreachable code removal
  • unused_var.txt - Tests unused variable removal
  • conditional_simplify.txt - Demonstrates conditional simplification

What We Learned

Building this compiler helped us understand:

  • How lexers break text into tokens
  • How parsers build trees from sequences of tokens
  • How semantic analysis catches errors that parsing can't
  • How optimizations can simplify code before execution
  • How interpreters execute programs step by step
  • The importance of proper scoping and closures

The code is structured to be maintainable - each phase is separated into its own module, and the AST serves as a clean interface between phases. Error handling is consistent throughout, making debugging easier.

About

Репозиторий содержит реализацию динамически типизируемого интерпретируемого языка программирования (создание компилятора)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Rust 96.9%
  • Makefile 3.1%