Add DataFrame usage guide with HTML rendering customization options #1108
Merged
timsaucer merged 5 commits into apache:main on Apr 27, 2025
timsaucer requested changes on Apr 26, 2025
timsaucer (Member) left a comment
This is excellent! Thank you very much for it. My only comments are to fix the rst parsing.
docs/source/user-guide/dataframe.rst
Outdated
and Arrow.

A DataFrame represents a logical plan that can be composed through operations like filtering, projection, and aggregation.
The actual execution happens when terminal operations like `collect()` or `show()` are called.
Member
These need double back ticks to render properly.
``collect()`` or ``show()``
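The same fix can be applied mechanically across a file. The following is a minimal sketch, not something from this PR: the helper name `fix_inline_literals` is hypothetical, and the regex assumes that only plain inline code spans need converting, so it deliberately skips existing double-backtick literals, roles such as :ref:, and hyperlink references ending in an underscore.

```python
import re

# Matches `text` with exactly one backtick on each side. The lookarounds skip
# ``literals``, roles such as :ref:`target`, and hyperlink references `text`_.
SINGLE_BACKTICK = re.compile(r"(?<![`:\w])`([^`\n]+)`(?![`_\w])")


def fix_inline_literals(line: str) -> str:
    """Rewrite `code` as ``code`` so it renders as an RST inline literal."""
    return SINGLE_BACKTICK.sub(r"``\1``", line)


if __name__ == "__main__":
    before = "terminal operations like `collect()` or `show()` are called."
    print(fix_inline_literals(before))
```

Running it on the flagged line should produce the double-backtick form suggested above; lines that already use double backticks are left unchanged.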
docs/source/user-guide/dataframe.rst
Outdated
When working in Jupyter notebooks or other environments that support HTML rendering, DataFrames will
automatically display as formatted HTML tables, making it easier to visualize your data.

The `_repr_html_` method is called automatically by Jupyter to render a DataFrame. This method
docs/source/user-guide/dataframe.rst
Outdated
The actual execution happens when terminal operations like `collect()` or `show()` are called.

Basic Usage
----------
Member
The ---- underline needs to be at least as long as the title above it. It's one - too short.
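A check like this is easy to automate before pushing docs changes. The sketch below is an illustration only: `short_underlines` is a hypothetical helper, not part of datafusion-python, and it assumes section titles are underlined with a single repeated marker character on the line directly below them.

```python
def short_underlines(rst_text: str, marker: str = "-") -> list[tuple[int, str]]:
    """Return (line_number, title) for titles whose underline is too short."""
    lines = rst_text.splitlines()
    problems = []
    for i in range(1, len(lines)):
        title, underline = lines[i - 1].rstrip(), lines[i].rstrip()
        # An underline is a non-empty line made only of the marker character.
        is_underline = underline != "" and set(underline) == {marker}
        if is_underline and title.strip() and len(underline) < len(title):
            problems.append((i, title))  # i is the 1-based line number of the title
    return problems


if __name__ == "__main__":
    sample = "Basic Usage\n----------\n\nHTML Rendering\n--------------\n"
    for line_number, title in short_underlines(sample):
        print(f"line {line_number}: underline shorter than title {title!r}")
```

In the sample, "Basic Usage" has 11 characters over a 10-character underline, so it is reported; "HTML Rendering" with a 14-character underline passes.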
docs/source/user-guide/dataframe.rst
Outdated
df.show()

HTML Rendering
-------------
docs/source/user-guide/dataframe.rst
Outdated
plain text output.

Customizing HTML Rendering
-------------------------
docs/source/user-guide/dataframe.rst
Outdated
The formatter settings affect all DataFrames displayed after configuration.

Custom Style Providers
---------------------
docs/source/user-guide/dataframe.rst
Outdated
configure_formatter(style_provider=MyStyleProvider())

Creating a Custom Formatter
--------------------------
docs/source/user-guide/dataframe.rst
Outdated
custom_html = formatter.format_html(batches, schema)

Managing Formatters
------------------
docs/source/user-guide/dataframe.rst
Outdated
print(formatter.theme)

Contextual Formatting
--------------------
Contributor
Author
Thank you @timsaucer for the detailed review.
timsaucer approved these changes on Apr 27, 2025
Member
Thank you again!
kosiew added a commit to kosiew/datafusion-python that referenced this pull request on Apr 28, 2025
…pache#1108)
* docs: enhance user guide with detailed DataFrame operations and examples
* move /docs/source/api/dataframe.rst into user-guide
* docs: remove DataFrame API documentation
* docs: fix formatting inconsistencies in DataFrame user guide
* Two minor corrections to documentation rendering
---------
Co-authored-by: Tim Saucer <timsaucer@gmail.com>
timsaucer added a commit that referenced this pull request on May 5, 2025
…Memory and Display Controls (#1119)
* feat: add configurable max table bytes and min table rows for DataFrame display
* Revert "feat: add configurable max table bytes and min table rows for DataFrame display" This reverts commit f9b78fa.
* feat: add FormatterConfig for configurable DataFrame display options
* refactor: simplify attribute extraction in get_formatter_config function
* refactor: remove hardcoded constants and use FormatterConfig for display options
* refactor: simplify record batch collection by using FormatterConfig for display options
* feat: add max_memory_bytes, min_rows_display, and repr_rows parameters to DataFrameHtmlFormatter
* feat: add tests for HTML formatter row display settings and memory limit
* refactor: extract Python formatter retrieval into a separate function
* Revert "feat: add tests for HTML formatter row display settings and memory limit" This reverts commit e089d7b.
* feat: add tests for HTML formatter row and memory limit configurations
* Revert "feat: add tests for HTML formatter row and memory limit configurations" This reverts commit 4090fd2.
* feat: add tests for new parameters and validation in DataFrameHtmlFormatter
* Reorganize tests
* refactor: rename and restructure formatter functions for clarity and maintainability
* feat: implement PythonFormatter struct and refactor formatter retrieval for improved clarity
* refactor: improve comments and restructure FormatterConfig usage in PyDataFrame
* Add DataFrame usage guide with HTML rendering customization options (#1108)
* docs: enhance user guide with detailed DataFrame operations and examples
* move /docs/source/api/dataframe.rst into user-guide
* docs: remove DataFrame API documentation
* docs: fix formatting inconsistencies in DataFrame user guide
* Two minor corrections to documentation rendering
---------
Co-authored-by: Tim Saucer <timsaucer@gmail.com>
* Update documentation
* refactor: streamline HTML rendering documentation
* refactor: extract validation logic into separate functions for clarity
* Implement feature X to enhance user experience and optimize performance
* feat: add validation method for FormatterConfig to ensure positive integer values
* add comment - ensure minimum rows are collected even if memory or row limits are hit
* Update html_formatter documentation
* update tests
* remove unused type hints from imports in html_formatter.py
* remove redundant tests for DataFrameHtmlFormatter and clean up assertions
* refactor get_attr function to support generic default values
* build_formatter_config_from_python return PyResult
* fix ruff errors
* trigger ci
* fix: remove redundant newline in test_custom_style_provider_html_formatter
* add more tests
* trigger ci
* Fix ruff errors
* fix clippy error
* feat: add validation for parameters in configure_formatter
* test: add tests for invalid parameters in configure_formatter
* Fix ruff errors
---------
Co-authored-by: Tim Saucer <timsaucer@gmail.com>
Which issue does this PR close?
Closes #1100
Rationale for this change
This change provides users with a dedicated and detailed guide for working with DataFrames in DataFusion. It introduces essential concepts, usage examples, and advanced features like HTML rendering customization, making it easier for both new and experienced users to take full advantage of the DataFrame API. This documentation enhancement will improve developer experience and usability.
What changes are included in this PR?
docs/source/user-guide/dataframe.rst
docs/source/user-guide/basics.rst with a reference to the new DataFrame guide
Are there any user-facing changes?
✅ Yes, this PR adds new user-facing documentation.
There are no breaking changes to the public API, only enhancements to documentation.