
Conversation

@JaySon-Huang (Contributor) commented Dec 30, 2025

What problem does this PR solve?

Issue Number: close #10633

Problem Summary:

What is changed and how it works?


Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

  • Bug Fixes
    • Strengthened S3 file download reliability with improved error handling and clearer diagnostic messages for troubleshooting failures.
    • Enhanced logging visibility for S3 download operations to aid in monitoring and diagnostics.


Signed-off-by: JaySon-Huang <tshent@qq.com>
@ti-chi-bot bot added the following labels on Dec 30, 2025:

  • do-not-merge/needs-linked-issue
  • do-not-merge/work-in-progress: indicates that a PR should not merge because it is a work in progress
  • release-note-none: denotes a PR that doesn't merit a release note
  • size/M: denotes a PR that changes 30-99 lines, ignoring generated files
@ti-chi-bot bot commented Dec 30, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please assign lloyd-pottiger for approval. For more information see the Code Review Process. Please ensure each required approver provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

@JaySon-Huang (Contributor, Author) commented:

@coderabbitai

@JaySon-Huang (Contributor, Author) commented:

@coderabbitai review

Signed-off-by: JaySon-Huang <tshent@qq.com>
@kennedy8312 commented:

@coderabbitai full review

@coderabbitai bot commented Jan 4, 2026

✅ Actions performed

Full review triggered.

@coderabbitai bot commented Jan 4, 2026

📝 Walkthrough

Modified the S3 file cache download implementation to use buffered, rate-limited writes via a new downloadToLocal helper function, replacing the previous burst streaming. The download now writes to a temporary file first, then renames it to the final destination, with enhanced error handling and proper fsync operations.

Changes

Cohort: S3 FileCache Rate-Limited Download
File(s): dbms/src/Storages/S3/FileCache.cpp
Summary: Added a downloadToLocal() helper function that uses PosixWritableFile with a write-limiter and ReadBufferFromIStream for buffered reads from S3. Modified FileCache::downloadImpl() to download to a temporary file and then rename it; removed direct streaming. Replaced the filesystem size lookup with the fsize returned from the download function. Adjusted a logging level and added new includes (PosixWritableFile.h, ReadBufferFromIStream.h, AWSStreamFwd.h, fcntl.h, errno). Enhanced error handling for write/fsync failures.
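
For orientation, here is a minimal stand-alone sketch of the flow described above, using raw POSIX calls in place of TiFlash's PosixWritableFile/ReadBufferFromIStream abstractions. The names, signatures, and the WriteLimiter interface below are assumptions for illustration, not the PR's actual code:

#include <cstdio>
#include <fcntl.h>
#include <istream>
#include <stdexcept>
#include <string>
#include <unistd.h>
#include <vector>

// Hypothetical stand-in for TiFlash's write limiter: blocks until quota allows.
struct WriteLimiter { void request(size_t bytes) { (void)bytes; } };

size_t downloadToLocalSketch(std::istream & istr, const std::string & tmp_fname,
                             const std::string & final_fname, WriteLimiter & limiter)
{
    int fd = ::open(tmp_fname.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        throw std::runtime_error("open failed: " + tmp_fname);
    std::vector<char> buf(16 * 1024); // small chunks -> finer-grained rate limiting
    size_t total = 0;
    while (istr.read(buf.data(), static_cast<std::streamsize>(buf.size())) || istr.gcount() > 0)
    {
        const auto n = static_cast<size_t>(istr.gcount());
        limiter.request(n); // gate throughput before each chunk is written
        for (size_t off = 0; off < n;) // handle short (partial) writes
        {
            ssize_t w = ::write(fd, buf.data() + off, n - off);
            if (w < 0) { ::close(fd); throw std::runtime_error("write failed"); }
            off += static_cast<size_t>(w);
        }
        total += n;
    }
    if (::fsync(fd) != 0) { ::close(fd); throw std::runtime_error("fsync failed"); }
    ::close(fd);
    if (::rename(tmp_fname.c_str(), final_fname.c_str()) != 0) // atomic publish
        throw std::runtime_error("rename failed");
    return total;
}

The sequence diagram below shows the same flow as implemented in the PR.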

Sequence Diagram

sequenceDiagram
    participant S3 as S3 Object Storage
    participant ReadBuf as ReadBufferFromIStream
    participant WriteLim as PosixWritableFile<br/>(with RateLimiter)
    participant TempFS as Temporary File
    participant FinalFS as Final Cache Location
    
    rect rgb(220, 240, 255)
    Note over S3,ReadBuf: Initialize S3 stream
    S3->>ReadBuf: GetObject (istream)
    end
    
    rect rgb(240, 255, 240)
    Note over ReadBuf,WriteLim: Buffered, Rate-Limited Download
    loop Chunked reads with rate limiting
        ReadBuf->>ReadBuf: Read chunk from buffer
        ReadBuf->>WriteLim: Write chunk (rate limiter gates throughput)
        WriteLim->>TempFS: Write to temp file
        WriteLim->>WriteLim: Check rate limit quota
    end
    end
    
    rect rgb(255, 240, 240)
    Note over WriteLim,TempFS: Finalization
    WriteLim->>TempFS: fsync (ensure durability)
    WriteLim->>WriteLim: Report file size
    end
    
    rect rgb(255, 250, 220)
    Note over TempFS,FinalFS: Atomic Finalization
    TempFS->>FinalFS: rename(temp_file, final_file)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A rabbit hops with glee,
S3 streams now flow so free—
With buffers fine and rate-limiter kind,
No bursts shall break the local mind!
From temp to cache, a hop, then done, 🏃✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)

  • Description check (⚠️ Warning): The PR description follows the required template structure but is largely incomplete, with critical sections like 'Problem Summary' and 'What is changed and how it works' left empty. Resolution: fill in those sections and check at least one test checkbox to indicate testing status.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check (❓ Inconclusive): The title '[WIP] disagg: better cache downloading' is related to the PR's changes but uses vague terminology ('disagg', 'better') that doesn't clearly convey the specific improvement. Resolution: clarify the title to describe the main change, such as 'Fine-grained rate limiting for S3 cache downloads'.

✅ Passed checks (2 passed)

  • Linked Issues check (✅ Passed): The code changes align with the linked issue #10633 by replacing burst-based rate limiting with a new downloadToLocal helper function that performs finer-grained rate-limited writes.
  • Out of Scope Changes check (✅ Passed): All changes in FileCache.cpp are focused on implementing fine-grained rate limiting for S3 downloads, directly aligned with issue #10633; no out-of-scope changes detected.

✨ Finishing touches

  • 📝 Generate docstrings


@coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR, between commits 965cc8c and 2bc6ff5.

📒 Files selected for processing (1)
  • dbms/src/Storages/S3/FileCache.cpp
🧰 Additional context used
🪛 Clang (14.0.6)
dbms/src/Storages/S3/FileCache.cpp

[error] 989-989: parameter 'istr' is unused

(misc-unused-parameters,-warnings-as-errors)


[error] 990-990: 3 adjacent parameters of 'downloadToLocal' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 990-990: parameter 'fname' is unused

(misc-unused-parameters,-warnings-as-errors)


[error] 991-991: parameter 'content_length' is unused

(misc-unused-parameters,-warnings-as-errors)


[error] 992-992: parameter 'write_limiter' is unused

(misc-unused-parameters,-warnings-as-errors)


[error] 996-996: variable 'ofile' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 998-998: statement should be inside braces

(readability-braces-around-statements,-warnings-as-errors)


[error] 1002-1002: variable 'total_written' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 1004-1004: variable 'rbuf' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 1007-1007: variable 'count' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 1008-1008: if with identical then and else branches

(bugprone-branch-clone,-warnings-as-errors)


[error] 1008-1008: variable 'write_res' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)
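
Most of these findings are mechanical. As an illustration only (not the PR's actual fix, and with a hypothetical call site), the init-variables and braces findings could be addressed like this:

// cppcoreguidelines-init-variables: initialize at declaration.
size_t total_written = 0;
ssize_t write_res = -1;

// readability-braces-around-statements: brace single-statement branches.
if (fd < 0) // `fd` is a hypothetical descriptor for illustration
{
    throwFromErrno("open failed", 0, errno);
}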

🔇 Additional comments (4)
dbms/src/Storages/S3/FileCache.cpp (4)

24-24: LGTM! New includes support the buffered download implementation.

The added headers provide the necessary functionality for implementing rate-limited, buffered writes to local files during S3 downloads.

Also applies to: 26-26, 35-35, 38-38, 42-42


1003-1004: Verify: Is the 16KB buffer size intentional for rate limiting granularity?

The buffer size of 16KB seems smaller than typical S3 download buffers (which are often 64KB-1MB). However, given the PR's objective to implement fine-grained rate limiting to prevent I/O bursts, this smaller buffer size may be intentional to provide better rate-limiting granularity.

Could you confirm whether:

  1. The 16KB size is a deliberate choice for finer-grained rate limiting control?
  2. Performance testing shows this size provides adequate throughput while preventing bursts?
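
For intuition on the granularity question above, here is a toy token-bucket sketch (hypothetical; TiFlash's actual WriteLimiter implementation differs). Because the limiter blocks once per requested chunk, a 16KB chunk size caps each burst at 16KB, whereas a 1MB chunk would allow a full megabyte between waits:

#include <algorithm>
#include <chrono>
#include <cstddef>
#include <thread>

// Toy token bucket: refills at `rate` bytes/second up to `capacity`;
// request() blocks until the requested bytes fit in the budget, so the
// caller's chunk size bounds how large any single burst can be.
class TokenBucket
{
public:
    TokenBucket(double bytes_per_sec, double burst_capacity)
        : rate(bytes_per_sec)
        , capacity(burst_capacity)
        , tokens(0)
        , last(std::chrono::steady_clock::now())
    {}

    void request(size_t bytes)
    {
        while (true)
        {
            auto now = std::chrono::steady_clock::now();
            tokens = std::min(capacity, tokens + rate * std::chrono::duration<double>(now - last).count());
            last = now;
            if (tokens >= static_cast<double>(bytes))
            {
                tokens -= static_cast<double>(bytes);
                return; // quota granted for this chunk
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

private:
    double rate;     // refill rate in bytes/second
    double capacity; // maximum accumulated budget (burst cap)
    double tokens;   // currently available byte budget
    std::chrono::steady_clock::time_point last;
};

With this shape, calling request(16 * 1024) per chunk means the writer can never get more than one chunk ahead of the configured rate, which is exactly the smoothing a smaller buffer buys.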

1052-1055: LGTM! Proper temp-file-then-rename pattern for atomic downloads.

Downloading to a temporary file first and then renaming to the final destination is a good practice that ensures:

  • Atomic file replacement (on most filesystems)
  • Incomplete files are never visible to readers
  • Proper cleanup on failure (temp files are cleaned up during restore)
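
A condensed POSIX sketch of the publish step (hypothetical names; the real code goes through TiFlash's file abstractions). One caveat worth noting: rename() is only atomic within a single filesystem, so the temp file must live next to the final destination:

#include <cstdio>
#include <stdexcept>
#include <string>
#include <unistd.h>

// Publish `tmp_path` as `final_path` atomically; both must be on the
// same filesystem, or rename() fails with EXDEV.
void publishAtomically(int fd, const std::string & tmp_path, const std::string & final_path)
{
    if (::fsync(fd) != 0)       // make contents durable before publishing
        throw std::runtime_error("fsync failed");
    if (::close(fd) != 0)
        throw std::runtime_error("close failed");
    if (::rename(tmp_path.c_str(), final_path.c_str()) != 0)
    {
        ::unlink(tmp_path.c_str()); // clean up the temp file on failure
        throw std::runtime_error("rename failed");
    }
}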

1065-1071: LGTM! Appropriate log level for download completion.

Changing the log level from DEBUG to INFO for successful download completion is appropriate, as this represents a significant operation that should be visible in standard operational logs for monitoring and troubleshooting.

Comment on lines +1005 to +1017
while (!rbuf.eof())
{
    size_t count = rbuf.buffer().end() - rbuf.position();
    if (ssize_t write_res = ofile.write(rbuf.position(), count); write_res < 0)
    {
        throwFromErrno(fmt::format("write to file failed, fname={}", fname), write_res, errno);
    }
    else
    {
        total_written += write_res;
    }
    rbuf.position() += count;
}
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find and examine PosixWritableFile::write() implementation
find . -type f -name "*.cpp" -o -name "*.h" | head -20

Repository: pingcap/tiflash

Length of output: 1009


🏁 Script executed:

#!/bin/bash
# Search for PosixWritableFile class definition
rg "class PosixWritableFile" -A 20

Repository: pingcap/tiflash

Length of output: 2242


🏁 Script executed:

#!/bin/bash
# Find the file containing PosixWritableFile
find . -type f \( -name "*.cpp" -o -name "*.h" \) -exec rg -l "PosixWritableFile" {} \;

Repository: pingcap/tiflash

Length of output: 694


🏁 Script executed:

#!/bin/bash
# Look for the write method in PosixWritableFile
rg "PosixWritableFile.*write" -B 2 -A 15

Repository: pingcap/tiflash

Length of output: 3460


🏁 Script executed:

#!/bin/bash
# Check the actual code at the specified file and lines
wc -l dbms/src/Storages/S3/FileCache.cpp

Repository: pingcap/tiflash

Length of output: 98


🏁 Script executed:

#!/bin/bash
# Read the actual code at FileCache.cpp around lines 1005-1017
sed -n '988,1025p' dbms/src/Storages/S3/FileCache.cpp

Repository: pingcap/tiflash

Length of output: 1501


🏁 Script executed:

#!/bin/bash
# Check the PosixWritableFile implementation more closely
cat dbms/src/IO/BaseFile/PosixWritableFile.cpp | grep -A 10 "ssize_t PosixWritableFile::write"

Repository: pingcap/tiflash

Length of output: 356


Critical: Partial writes not handled correctly—data corruption risk.

The implementation has a critical bug in the write loop. PosixWritableFile::write() directly returns the result of ::write(fd, buf, size), which may write fewer bytes than requested (a valid POSIX behavior for partial writes). However, line 1016 advances the buffer position by the full count (requested bytes) regardless of write_res (actual bytes written).

Impact: If a partial write occurs—e.g., ::write() returns 50 out of 100 requested bytes—the code advances the buffer position by 100, causing the next buffer refill to skip 50 unwritten bytes. This results in data loss and file corruption.

The fix requires looping until all requested bytes are written, adjusting the offset after each partial write:

Proposed fix
     while (!rbuf.eof())
     {
         size_t count = rbuf.buffer().end() - rbuf.position();
-        if (ssize_t write_res = ofile.write(rbuf.position(), count); write_res < 0)
-        {
-            throwFromErrno(fmt::format("write to file failed, fname={}", fname), write_res, errno);
-        }
-        else
-        {
-            total_written += write_res;
-        }
-        rbuf.position() += count;
+        size_t bytes_to_write = count;
+        size_t bytes_written_total = 0;
+        while (bytes_written_total < bytes_to_write)
+        {
+            ssize_t write_res = ofile.write(rbuf.position() + bytes_written_total, bytes_to_write - bytes_written_total);
+            if (write_res < 0)
+            {
+                throwFromErrno(fmt::format("write to file failed, fname={}", fname), write_res, errno);
+            }
+            bytes_written_total += write_res;
+            total_written += write_res;
+        }
+        rbuf.position() += bytes_written_total;
     }

🤖 Prompt for AI Agents
In dbms/src/Storages/S3/FileCache.cpp around lines 1005 to 1017, the write loop
advances the buffer position by the full requested count even when
PosixWritableFile::write() returns a smaller number (partial write), causing
data loss; change the logic to loop until the entire requested count is written:
on each iteration call write() with the current pointer and remaining bytes,
check write_res for negative (throw on error other than transient EINTR/EAGAIN
where you should retry) and for zero (treat as error), add only write_res to
total_written, advance rbuf.position() by write_res, subtract write_res from
remaining count and continue until remaining is zero before refilling the
buffer.
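
A sketch of what that loop could look like (illustrative only; it assumes, as the review notes, that PosixWritableFile::write() mirrors ::write() and leaves errno set, and it reuses the call shapes from the PR, including a hypothetical Exception constructor for the zero-write case):

// Write all of `count` bytes starting at rbuf.position(), retrying on
// EINTR and treating 0-byte writes as errors.
size_t remaining = count;
char * p = rbuf.position();
while (remaining > 0)
{
    ssize_t write_res = ofile.write(p, remaining);
    if (write_res < 0)
    {
        if (errno == EINTR)
            continue; // interrupted by a signal: retry the same chunk
        throwFromErrno(fmt::format("write to file failed, fname={}", fname), write_res, errno);
    }
    if (write_res == 0)
        throw Exception(fmt::format("write returned 0 bytes, fname={}", fname)); // hypothetical error path
    p += write_res;
    remaining -= static_cast<size_t>(write_res);
    total_written += static_cast<size_t>(write_res);
}
rbuf.position() += count;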


Labels

  • do-not-merge/work-in-progress: indicates that a PR should not merge because it is a work in progress
  • release-note-none: denotes a PR that doesn't merit a release note
  • size/M: denotes a PR that changes 30-99 lines, ignoring generated files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fine grained rate limiting when downloading cache from S3 to local disk

2 participants