feat: enable function calling support for streaming responses #102
Pavilion4ik wants to merge 2 commits into openedx:main from
Conversation
Thanks for the pull request, @Pavilion4ik! This repository is currently maintained by . Once you've gone through the following steps, feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval
If you haven't already, check this list to see if your contribution needs to go through the product review process.

🔘 Provide context
To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can.

🔘 Get a green build
If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Where can I find more information?
If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?
Our goal is to get community contributions seen and reviewed as efficiently as possible. However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files

@@ Coverage Diff @@
## main #102 +/- ##
==========================================
+ Coverage 91.22% 91.27% +0.05%
==========================================
Files 51 51
Lines 4547 4724 +177
Branches 276 298 +22
==========================================
+ Hits 4148 4312 +164
- Misses 311 320 +9
- Partials 88 92 +4
Flags with carried forward coverage won't be shown.
☔ View full report in Codecov by Sentry.
Hi @Pavilion4ik, I did not notice this was open. I assigned myself to take a look soon.
Hi @Pavilion4ik, I haven't checked the static code, but I tried to run it and got an error:
My profile's config:
- Removed the restriction in LitellmProcessor that disabled streaming when tools are present
- Implemented `_handle_streaming_tool_calls` in LLMProcessor to aggregate chunks, reconstruct tool calls, and handle recursion
- Updated `_completion_with_tools` to delegate to the streaming handler when `stream=True`
- Added unit tests covering streaming tool calls and recursive execution

# Conflicts:
# backend/tests/test_litellm_base_processor.py
# backend/tests/test_llm_processor.py
- Implemented recursive tool execution for streaming via the Responses API
- Refactored streaming logic into `_execute_stream_tool_call` to reduce nesting
- Added token usage tracking and session persistence for streamed threads
- Added unit tests for recursive streaming and tool call synchronization
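For readers skimming the commits, here is a minimal sketch of the delegation the first commit describes. The method names mirror the commit message, but the signatures and bodies below are assumptions, not the PR's actual code:

```python
class LLMProcessorSketch:
    """Illustrative only; shows the shape of the delegation, not the real processor."""

    def _completion_with_tools(self, messages, tools, stream=False):
        # When streaming is requested, hand off to the generator-based handler;
        # otherwise fall back to the original blocking path.
        if stream:
            return self._handle_streaming_tool_calls(messages, tools)
        return self._blocking_completion_with_tools(messages, tools)

    def _handle_streaming_tool_calls(self, messages, tools):
        # Generator: yields text deltas, buffers tool-call fragments, executes
        # the tools once the stream ends, then recurses to stream the final answer.
        yield from ()  # placeholder body

    def _blocking_completion_with_tools(self, messages, tools):
        # Single completion call plus a tool-execution loop, returned whole.
        raise NotImplementedError
```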
6c44596 to b518869


This PR refactors the LiteLLM-based processors to support streaming responses even when OpenAI function calling (tools) is enabled. Specifically, it includes:
- Chunk Aggregation: Added logic to buffer streaming chunks in LLMProcessor, reconstruct fragmented tool call arguments, and execute the tools once the stream for that specific call is complete (see the first sketch after this list).
- Recursive Streaming: Implemented `yield from` recursion in `_handle_streaming_tool_calls` so the LLM can call a function, receive the output, and continue streaming the final text response to the user.
- Educator Processor Update: Enabled streaming in EducatorAssistantProcessor for general chat, while explicitly forcing non-streaming mode for `generate_quiz_questions`, since it requires full-response JSON validation and retry logic (see the second sketch below).
- Unit Tests: Added comprehensive tests to verify that streaming works correctly with single and multiple tool calls.
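To make the flow concrete, here is a minimal, self-contained sketch of the streaming tool-call loop, assuming litellm's OpenAI-compatible chat completions API. The model name and the `execute_tool` dispatch are placeholders, and the real `_handle_streaming_tool_calls` differs in structure, token tracking, and error handling:

```python
import json

import litellm


def execute_tool(name, arguments):
    """Placeholder for the processor's actual tool dispatch."""
    raise NotImplementedError


def stream_with_tools(messages, tools, model="gpt-4o"):
    """Yield text deltas; if the model emits tool calls, run them and recurse."""
    response = litellm.completion(model=model, messages=messages, tools=tools, stream=True)

    pending = {}  # tool-call index -> accumulated {"id", "name", "arguments"}
    for chunk in response:
        delta = chunk.choices[0].delta
        if delta.content:
            yield delta.content  # immediate feedback: text streams as it arrives
        for tc in delta.tool_calls or []:
            entry = pending.setdefault(tc.index, {"id": "", "name": "", "arguments": ""})
            if tc.id:
                entry["id"] = tc.id
            if tc.function and tc.function.name:
                entry["name"] += tc.function.name
            if tc.function and tc.function.arguments:
                entry["arguments"] += tc.function.arguments  # fragments arrive piecewise

    if pending:
        calls = [pending[i] for i in sorted(pending)]
        # Record the assistant turn that requested the tools ...
        messages.append({
            "role": "assistant",
            "tool_calls": [
                {"id": c["id"], "type": "function",
                 "function": {"name": c["name"], "arguments": c["arguments"]}}
                for c in calls
            ],
        })
        # ... execute each tool and feed the results back ...
        for c in calls:
            result = execute_tool(c["name"], json.loads(c["arguments"]))
            messages.append({"role": "tool", "tool_call_id": c["id"], "content": str(result)})
        # ... then recurse so the model can stream its final answer over the tool output.
        yield from stream_with_tools(messages, tools, model=model)
```

Text deltas are yielded as soon as they arrive, tool-call fragments are buffered by index until the stream ends, and `yield from` re-enters the generator so the final response streams as well.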
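And a small sketch of the per-feature streaming decision in EducatorAssistantProcessor; the function name and signature here are hypothetical, but they capture the rule stated above:

```python
def should_stream(requested_stream: bool, action: str) -> bool:
    """Stream general chat; force non-streaming for quiz generation."""
    if action == "generate_quiz_questions":
        # The quiz flow validates (and may retry on) the complete JSON payload,
        # so it needs the whole response at once.
        return False
    return requested_stream
```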
Why?
Previously, LitellmProcessor explicitly disabled streaming if any tools were configured. This made for a poor user experience: users had to wait for the entire generation to finish before seeing any text, simply because a tool might have been used. This change gives the best of both worlds: immediate feedback via streaming for text responses, and correct execution of background functions when the model decides to use them.