-
Notifications
You must be signed in to change notification settings - Fork 2
Conversation
WalkthroughThe changes in this pull request involve significant restructuring of the lexer functionality across multiple files. The Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Outside diff range and nitpick comments (2)
src/lexer/tokens.h (2)
5-24: Consider explicit values forenum TokenTypeWhile allowing the compiler to assign values automatically to enums is acceptable, explicitly assigning values can improve readability and maintainability, especially if values have specific meanings or are used externally.
Assign explicit integer values to each
TokenTypeif they have significance beyond the code (e.g., for debugging or interfacing with other systems).
30-32: InitializevaluetoNULLinstruct TokenEnsure that the
valuemember ofstruct Tokenis initialized toNULLto prevent undefined behavior when it's not assigned explicitly.Modify the token initialization code to set
valuetoNULLby default.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/lexer/lexer.c(2 hunks)src/lexer/lexer.h(1 hunks)src/lexer/tokens.h(1 hunks)
🔇 Additional comments (1)
src/lexer/lexer.h (1)
12-12: Change to dynamic tokens array requires careful memory management
Changing tokens from a fixed-size array to a pointer necessitates proper allocation and deallocation to prevent memory leaks.
Ensure that all usages of LexerResult properly manage the tokens pointer. Would you like me to search for all instances where LexerResult is used to confirm that memory is managed correctly?
| int buffLen = 32; | ||
| char* buff = malloc(buffLen); | ||
|
|
||
| if(c == ' ' || c == '\t' || c == '\n') { | ||
| continue; | ||
| } else if (isdigit(c)) { | ||
| int numLen = 0; | ||
|
|
||
| while(isdigit(c)) { | ||
| buff[numLen] = c; | ||
| numLen++; | ||
|
|
||
| c = *string++; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential buffer overflow due to fixed buffer size
The buffers buff allocated with a fixed size of 32 bytes may overflow if the tokens (numbers, strings, keywords) are longer than 31 characters (reserving one byte for the null terminator). There is no check to prevent writing beyond the allocated memory, which can lead to undefined behavior or security vulnerabilities.
Apply this diff to implement dynamic resizing of buff:
33 int buffLen = 32;
34 char* buff = malloc(buffLen);
+ int buffIndex = 0;
...
42 buff[numLen] = c;
43 numLen++;
+ buffIndex++;
+ if (buffIndex >= buffLen - 1) { // Resize if needed
+ buffLen *= 2;
+ buff = realloc(buff, buffLen);
+ if (buff == NULL) {
+ // Handle allocation failure
+ }
+ }
45 c = *string++;Repeat similar changes for string parsing (lines 54-59) and keyword parsing (lines 67-72).
Also applies to: 54-59, 67-72
| int buffLen = 32; | ||
| char* buff = malloc(buffLen); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Memory leak: buff is not freed after use
The buffer buff is allocated with malloc but is not freed after it's no longer needed, leading to memory leaks.
Ensure that buff is freed after assigning it to result.tokens[result.size - 1].value or if it's no longer needed. For tokens that don't store buff, free it before moving to the next iteration.
+ // After using buff
+ if (buff_needed) {
+ result.tokens[result.size - 1].value = buff;
+ } else {
+ free(buff);
+ }Also applies to: 52-52, 64-64
| while(isdigit(c)) { | ||
| buff[numLen] = c; | ||
| numLen++; | ||
|
|
||
| c = *string++; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing null terminators in token values
When collecting characters for numbers and keywords, the buffers are not null-terminated, which can lead to undefined behavior when used as strings.
Add a null terminator at the end of the buffers after the loops.
46 }
+ buff[numLen] = '\0';Apply similar changes for keyword parsing.
Also applies to: 67-72
| result.tokens[result.size - 1].value[0] = c; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uninitialized value field in MATH_OP tokens
For math operator tokens, you're assigning c to result.tokens[result.size - 1].value[0], but value is not allocated, which may cause undefined behavior.
Allocate memory for value before assigning to it.
114 pushToken(&result, MATH_OP);
+ result.tokens[result.size - 1].value = malloc(2);
115 result.tokens[result.size - 1].value[0] = c;
+ result.tokens[result.size - 1].value[1] = '\0';📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| result.tokens[result.size - 1].value[0] = c; | |
| } | |
| result.tokens[result.size - 1].value = malloc(2); | |
| result.tokens[result.size - 1].value[0] = c; | |
| result.tokens[result.size - 1].value[1] = '\0'; | |
| } |
| result.tokens = malloc(sizeof(struct Token) * 1024); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Fixed-size token array may cause overflow
Allocating result.tokens with a fixed size of 1024 tokens could lead to overflow if processing inputs with more tokens.
Implement dynamic resizing for result.tokens to handle inputs with more than 1024 tokens.
27 int tokensCapacity = 1024;
27 result.tokens = malloc(sizeof(struct Token) * tokensCapacity);
...
+ // Before pushing a new token
+ if (result.size >= tokensCapacity) {
+ tokensCapacity *= 2;
+ result.tokens = realloc(result.tokens, sizeof(struct Token) * tokensCapacity);
+ if (result.tokens == NULL) {
+ // Handle allocation failure
+ }
+ }Committable suggestion skipped: line range outside the PR's diff.
| while(c != '\"') { | ||
| buff[strLen] = c; | ||
| strLen++; | ||
|
|
||
| c = *string++; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Infinite loop risk when parsing strings
The loop while(c != '\"') doesn't check for the end of the string. If a closing quote is missing, this will cause an infinite loop.
Modify the loop condition to check for the null terminator to prevent infinite loops.
54 while(c != '\"' && c != '\0') {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| while(c != '\"') { | |
| buff[strLen] = c; | |
| strLen++; | |
| c = *string++; | |
| } | |
| while(c != '\"' && c != '\0') { | |
| buff[strLen] = c; | |
| strLen++; | |
| c = *string++; | |
| } |
| * Returns a LexerResult containing the tokens | ||
| */ | ||
| struct LexerResult runLexer(const char* input); | ||
| struct LexerResult runLexer(char* input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Changing input parameter to char* may lead to unintended modifications
The runLexer function now takes a char* instead of a const char*, which allows modification of the input string. This could cause unexpected side effects if the input string is used elsewhere.
If the function does not need to modify the input string, change the parameter back to const char*.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
runLexerfunction to reflect changes in input handling.Refactor
Tokenstructure and restructured theLexerResultfor better token management.TokenTypeenumeration and removed outdated keyword handling functions.