This is our implementation of a compiler for Project D, a dynamically typed, interpreted programming language. The compiler is written in Rust and follows the traditional compiler pipeline: lexing, parsing, semantic analysis, and interpretation.
The compiler reads source code written in the Project D language and executes it directly by building an abstract syntax tree (AST) and walking through it. We've implemented four main phases:
- Lexical Analysis - Breaks source code into tokens
- Parsing - Constructs an abstract syntax tree from tokens
- Semantic Analysis - Checks for errors and optimizes the AST
- Interpretation - Executes the program by traversing the AST
The language supports variables, functions (with closures), arrays, tuples, conditionals, loops, and various operators. It's dynamically typed and uses lexical scoping.
```
dlang/
├── src/
│   ├── lexer.rs        - Tokenizes the input source code
│   ├── parser.rs       - Builds the AST from tokens
│   ├── analyzer.rs     - Semantic checks and optimizations
│   ├── interpreter.rs  - Executes the AST
│   ├── ast.rs          - Defines the AST structure
│   ├── token.rs        - Defines all token types
│   └── main.rs         - Main entry point
├── test_programs/      - Sample programs for testing
└── tests/              - Unit tests
```
You'll need Rust installed. Build the project with:
```
cd dlang
cargo build
```

To run a program from a file:

```
cargo run test_programs/good_program.txt
```

Or run without arguments to see a bunch of demo programs:

```
cargo run
```

The output shows each phase of compilation:
- The original AST
- Semantic analysis results
- Optimized AST (if changes were made)
- Program execution output
To run all unit tests with detailed output, use:

```
cargo test -- --nocapture --test-threads=1
```

This command shows the full output of each test, making it easier to debug and understand test failures. The `--nocapture` flag ensures that all print statements and logs are displayed, and `--test-threads=1` runs tests sequentially for clearer output.
Variables are declared with `var` and assigned with `:=`:

```
var x := 10
var y := 20
print x + y
```

The language supports integers, reals, booleans, strings, and `none` (similar to null).
If/else statements:

```
if age >= 18 then
    print "Adult"
else
    print "Minor"
end
```

While loops:
```
var i := 1
while i <= 5 loop
    print i
    i := i + 1
end
```

For loops can iterate over arrays or ranges:
```
for i in 1..5 loop
    print i
end

var arr := [10, 20, 30]
for num in arr loop
    print num
end
```

Functions are first-class values. They can be defined with an arrow (`=>`) for single expressions:
```
var add := func(x, y) => x + y
print add(5, 3)
```

Or with a block (`is ... end`) for multiple statements:
```
var factorial := func(n) is
    if n <= 1 then
        return 1
    else
        return n * factorial(n - 1)
    end
end
```

Functions capture their lexical environment (closures), so nested functions work as expected.
Arrays are 1-indexed:

```
var arr := [1, 2, 3, 4, 5]
print arr[1]   // prints 1
arr[3] := 99   // modifies the third element
```

Tuples store named or indexed fields:
```
var point := {x := 10, y := 20}
print point.x
print point.y

var t := {a := 1, 2, c := 3}
print t.a  // by name
print t.2  // by index
print t.c  // by name
```

The `is` operator checks types at runtime:
```
var x := 42
if x is int then
    print "x is an integer"
end
```

Other statements:

- `return` - Returns from a function (only allowed inside functions)
- `exit` - Exits from a loop (only allowed inside loops)
- `print` - Prints one or more values to stdout
The lexer (lexer.rs) reads through the source code character by character and produces tokens. It handles:
- Keywords (`var`, `if`, `while`, `for`, `func`, etc.)
- Identifiers and literals (numbers, strings, booleans)
- Operators and punctuation
- Comments (both `//` single-line and `/* */` multi-line)
- Whitespace and newlines
The lexer tracks line and column numbers for error reporting. When it encounters an unexpected character, it produces an error token with location information.
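The position-tracking part of this can be sketched as follows. This is a minimal, self-contained illustration of line/column bookkeeping during scanning; the `Scanner` and `Tok` names are hypothetical and not the actual `lexer.rs` API:

```rust
// Sketch of position-tracked scanning (hypothetical names): the
// scanner consumes one char at a time and bumps `line`/`column`,
// so an error token can carry the location of the bad character.
#[derive(Debug, PartialEq)]
pub enum Tok {
    Int(i64),
    Ident(String),
    Error { ch: char, line: usize, column: usize },
}

pub struct Scanner {
    chars: Vec<char>,
    pos: usize,
    line: usize,
    column: usize,
}

impl Scanner {
    pub fn new(src: &str) -> Self {
        Scanner { chars: src.chars().collect(), pos: 0, line: 1, column: 1 }
    }

    // Consume one character, updating line/column counters.
    fn bump(&mut self) -> Option<char> {
        let c = self.chars.get(self.pos).copied()?;
        self.pos += 1;
        if c == '\n' {
            self.line += 1;
            self.column = 1;
        } else {
            self.column += 1;
        }
        Some(c)
    }

    pub fn next_token(&mut self) -> Option<Tok> {
        loop {
            // Capture the location *before* consuming, so error
            // tokens point at the offending character itself.
            let (line, column) = (self.line, self.column);
            let c = self.bump()?;
            if c.is_whitespace() {
                continue;
            }
            return Some(if c.is_ascii_digit() {
                let mut n = c.to_digit(10).unwrap() as i64;
                while let Some(d) = self.chars.get(self.pos).and_then(|c| c.to_digit(10)) {
                    self.pos += 1;
                    self.column += 1;
                    n = n * 10 + d as i64;
                }
                Tok::Int(n)
            } else if c.is_alphabetic() {
                let mut s = c.to_string();
                while let Some(&c2) = self.chars.get(self.pos) {
                    if !c2.is_alphanumeric() { break; }
                    self.pos += 1;
                    self.column += 1;
                    s.push(c2);
                }
                Tok::Ident(s)
            } else {
                Tok::Error { ch: c, line, column }
            });
        }
    }
}
```

Scanning `"var x\n@"` with this sketch yields two identifiers followed by an error token located at line 2, column 1.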
The parser (parser.rs) uses a recursive descent approach. It builds an AST by consuming tokens according to the language's grammar rules. The parser handles operator precedence and associativity correctly, so expressions like 2 + 3 * 4 are parsed as 2 + (3 * 4).
The parser is pretty forgiving - it skips semicolons, newlines, and comments automatically. This makes the language easier to write without worrying about exact formatting.
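One common way a recursive descent parser gets precedence right is precedence climbing. The sketch below is an illustration of the technique, not code from `parser.rs`, and it evaluates operands directly instead of building an AST to stay short:

```rust
// Precedence climbing sketch: each operator has a binding power,
// and the parser only consumes operators that bind at least as
// tightly as the current minimum precedence.
#[derive(Clone, Copy)]
pub enum Tok { Num(i64), Plus, Star }

fn prec(t: Tok) -> Option<u8> {
    match t {
        Tok::Plus => Some(1),
        Tok::Star => Some(2),
        _ => None,
    }
}

pub fn parse_expr(toks: &[Tok], pos: &mut usize, min_prec: u8) -> i64 {
    let mut lhs = match toks[*pos] {
        Tok::Num(n) => { *pos += 1; n }
        _ => panic!("expected number"),
    };
    while *pos < toks.len() {
        let op = toks[*pos];
        let Some(p) = prec(op) else { break };
        if p < min_prec { break; }
        *pos += 1;
        // Parse the right operand at a strictly higher minimum
        // precedence, which makes these operators left-associative.
        let rhs = parse_expr(toks, pos, p + 1);
        lhs = match op {
            Tok::Plus => lhs + rhs,
            Tok::Star => lhs * rhs,
            Tok::Num(_) => unreachable!(),
        };
    }
    lhs
}
```

With this scheme, `2 + 3 * 4` evaluates to `14` because `*` (precedence 2) binds tighter than `+` (precedence 1).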
The semantic analyzer (analyzer.rs) has two jobs: checking for errors and optimizing the code.
Before optimization, we verify:

- Declarations before usage - Variables and functions must be declared before they're used. We track declarations in a symbol table as we walk the AST.
- Return statement placement - `return` statements can only appear inside function bodies, not in the global scope.
- Array bounds checking - For array literals with constant indexes, we check at compile time if the index is out of bounds.
If any semantic errors are found, we report them and skip optimization.
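The declaration-before-use check can be sketched like this (hypothetical `Stmt` type, simplified to a single flat scope; the real analyzer walks the full AST):

```rust
// Sketch of declaration-before-use checking: walk statements in
// order, recording declarations and flagging any identifier that
// is used before it was declared.
use std::collections::HashSet;

pub enum Stmt {
    Declare(String), // var x := ...
    Use(String),     // any expression mentioning x
}

pub fn check_decl_before_use(stmts: &[Stmt]) -> Vec<String> {
    let mut declared: HashSet<&str> = HashSet::new();
    let mut errors = Vec::new();
    for stmt in stmts {
        match stmt {
            Stmt::Declare(name) => {
                declared.insert(name.as_str());
            }
            Stmt::Use(name) => {
                if !declared.contains(name.as_str()) {
                    errors.push(format!("Variable '{}' used before declaration", name));
                }
            }
        }
    }
    errors
}
```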
We run several optimization passes that modify the AST:

- Constant folding - Evaluates constant expressions at compile time. For example, `5 + 3` becomes `8`. Works for arithmetic, comparisons, and boolean operations.
- Conditional simplification - Removes branches that can never execute. If we see `if true then ... else ... end`, we replace it with just the `then` branch.
- Unreachable code removal - After a `return` or `exit` statement, any code in the same block is removed since it can never execute.
- Unused variable removal - Variables that are declared but never used are removed entirely.
The optimizer runs these passes in a loop until no more changes occur. This means constant folding can enable other optimizations - for example, if constant folding makes a condition evaluate to true, then conditional simplification can remove the else branch.
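The run-until-fixpoint pattern can be sketched on a toy expression type (hypothetical names, not the real `ast.rs`):

```rust
// Fold-until-fixpoint sketch: each pass returns whether it changed
// anything, and the driver keeps sweeping until no pass reports
// progress.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// One constant-folding pass; returns (new expr, changed?).
pub fn fold(e: Expr) -> (Expr, bool) {
    match e {
        Expr::Add(a, b) => fold_bin(*a, *b, |x, y| x + y, Expr::Add),
        Expr::Mul(a, b) => fold_bin(*a, *b, |x, y| x * y, Expr::Mul),
        leaf => (leaf, false),
    }
}

fn fold_bin(
    a: Expr,
    b: Expr,
    op: fn(i64, i64) -> i64,
    rebuild: fn(Box<Expr>, Box<Expr>) -> Expr,
) -> (Expr, bool) {
    let (a, ca) = fold(a);
    let (b, cb) = fold(b);
    // Both operands constant: replace the node with its value.
    if let (Expr::Int(x), Expr::Int(y)) = (&a, &b) {
        (Expr::Int(op(*x, *y)), true)
    } else {
        (rebuild(Box::new(a), Box::new(b)), ca || cb)
    }
}

/// Run passes until a full sweep reports no changes.
pub fn optimize(mut e: Expr) -> Expr {
    loop {
        let (next, changed) = fold(e);
        e = next;
        if !changed {
            return e;
        }
    }
}
```

In the real optimizer the loop body would run every pass (folding, conditional simplification, dead-code removal, unused-variable removal), which is what lets one pass enable another.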
The interpreter (interpreter.rs) walks the AST and executes statements as it goes. It maintains:
- An environment (symbol table) for variable bindings, with proper scoping
- A stack to track whether we're inside a function or loop (for validating `return`/`exit`)
- Function closures that capture the environment at definition time
Values are represented as a `Value` enum that can hold integers, reals, booleans, strings, arrays, tuples, functions, or `none`. Since the language is dynamically typed, type checking happens at runtime.
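A plausible shape for such a representation, together with runtime-checked operations, looks like this (the actual `Value` enum in `interpreter.rs` may differ; function values are omitted to keep the sketch short):

```rust
// Sketch of a dynamically typed Value: every runtime datum carries
// its type, and operators check types when they execute.
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Real(f64),
    Bool(bool),
    Str(String),
    Array(Vec<Value>),
    Tuple(HashMap<String, Value>),
    None,
}

/// `+` with runtime type checking: matching numeric types add,
/// anything else is a runtime error.
pub fn add(a: &Value, b: &Value) -> Result<Value, String> {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => Ok(Value::Int(x + y)),
        (Value::Real(x), Value::Real(y)) => Ok(Value::Real(x + y)),
        _ => Err(format!("cannot add {:?} and {:?}", a, b)),
    }
}

/// 1-indexed array access, as the language rules require.
pub fn index(arr: &Value, i: i64) -> Result<Value, String> {
    match arr {
        Value::Array(items) => {
            if i < 1 || i as usize > items.len() {
                Err(format!("Array index {} out of bounds (array size: {})", i, items.len()))
            } else {
                Ok(items[(i - 1) as usize].clone())
            }
        }
        _ => Err("not an array".to_string()),
    }
}
```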
The interpreter handles all the language rules:
- Arrays are 1-indexed (valid range is 1 to length)
- Division by zero raises an error
- Array index out of bounds raises an error
- Undefined variables raise an error
- Function argument counts must match
Scoping follows lexical rules - inner scopes can shadow outer variables, and functions capture the environment they were defined in.
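The scope chain can be modeled as environments linked to their parents. The sketch below is simplified to integer values and borrowed parents, which covers block scoping and shadowing; a real interpreter storing closures would typically use `Rc<RefCell<Env>>` so environments can outlive the block that created them:

```rust
// Sketch of lexically scoped environments as a parent chain:
// lookups walk outward, and a child scope can shadow a name
// without disturbing the outer binding.
use std::collections::HashMap;

pub struct Env<'a> {
    vars: HashMap<String, i64>,
    parent: Option<&'a Env<'a>>,
}

impl<'a> Env<'a> {
    pub fn new(parent: Option<&'a Env<'a>>) -> Self {
        Env { vars: HashMap::new(), parent }
    }

    pub fn define(&mut self, name: &str, v: i64) {
        self.vars.insert(name.to_string(), v);
    }

    /// Walk the scope chain from innermost to outermost.
    pub fn lookup(&self, name: &str) -> Option<i64> {
        self.vars
            .get(name)
            .copied()
            .or_else(|| self.parent.and_then(|p| p.lookup(name)))
    }
}
```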
The compiler provides clear error messages at different stages:
- Lexical errors - "Unexpected character '@' at line 5, column 3"
- Parse errors - "Expected identifier, got '42' at line 10, column 5"
- Semantic errors - "Variable 'x' used before declaration"
- Runtime errors - "Division by zero" or "Array index 10 out of bounds (array size: 3)"
When an error occurs, we print a message and stop execution at that phase (semantic errors prevent optimization, runtime errors stop execution).
The `test_programs/` directory contains several examples:

- `good_program.txt` - Basic functionality with constant folding
- `semantic_error.txt` - Demonstrates semantic error detection
- `unreachable_code.txt` - Shows unreachable code removal
- `unused_var.txt` - Tests unused variable removal
- `conditional_simplify.txt` - Demonstrates conditional simplification
Building this compiler helped us understand:
- How lexers break text into tokens
- How parsers build trees from sequences of tokens
- How semantic analysis catches errors that parsing can't
- How optimizations can simplify code before execution
- How interpreters execute programs step by step
- The importance of proper scoping and closures
The code is structured to be maintainable - each phase is separated into its own module, and the AST serves as a clean interface between phases. Error handling is consistent throughout, making debugging easier.