refactor: replace HashMap with TableProperties in TableMetadata (#2028)#2037
refactor: replace HashMap with TableProperties in TableMetadata (#2028)#2037vigneshsiva11 wants to merge 6 commits intoapache:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors TableMetadata to use a typed TableProperties struct instead of a raw HashMap<String, String>, improving type safety and reducing allocations by returning references instead of reconstructing property maps.
Changes:
- Replaced
HashMap<String, String>withTablePropertiesinTableMetadata - Modified
table_properties()to return a reference to the struct instead of aResult - Added
otherfield toTablePropertiesto store non-explicitly mapped properties - Updated serialization/deserialization logic to handle the new structure
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| crates/iceberg/src/transaction/mod.rs | Updated to use reference-based API and clone when needed |
| crates/iceberg/src/spec/table_properties.rs | Added new field, traits, and constructor method |
| crates/iceberg/src/spec/table_metadata_builder.rs | Updated to use TableProperties struct and access other field |
| crates/iceberg/src/spec/table_metadata.rs | Changed internal field type and modified public API |
Comments suppressed due to low confidence (3)
crates/iceberg/src/spec/table_metadata.rs:3893
- The test now calls
.unwrap()on a method that no longer returns aResult, suggesting tests need to be updated. After the refactor,table_properties()returns&TablePropertiesdirectly, so the.unwrap()is no longer valid and would cause compilation failures.
let props = metadata.table_properties().unwrap();
crates/iceberg/src/spec/table_metadata.rs:3940
- The test calls
.unwrap()on a method that now returns&TablePropertiesinstead ofResult<TableProperties>, which would cause a compilation error.
let props = metadata.table_properties().unwrap();
crates/iceberg/src/spec/table_metadata.rs:3973
- This test expects
table_properties()to return an error for invalid property values, but the refactored API returns&TablePropertiesand thenew()method silently uses defaults viaunwrap_or_default(). This test would no longer compile and the error handling behavior it validates is lost.
let err = metadata.table_properties().unwrap_err();
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| TableProperties::PROPERTY_DATAFUSION_WRITE_FANOUT_ENABLED, | ||
| TableProperties::PROPERTY_DATAFUSION_WRITE_FANOUT_ENABLED_DEFAULT, | ||
| )?, | ||
| other: props.clone(), |
There was a problem hiding this comment.
The other field is set to a full clone of the input props HashMap, but this includes all properties (even the ones that were already parsed into struct fields). This creates unnecessary duplication and memory overhead. Consider filtering out the explicitly parsed properties before storing in other.
| pub fn new(props: HashMap<String, String>) -> Self { | ||
| Self::try_from(&props).unwrap_or_default() |
There was a problem hiding this comment.
The new() method silently swallows errors via unwrap_or_default(), which could hide parsing failures for invalid property values. This differs from the original behavior where table_properties() returned a Result. Consider returning Result<Self, anyhow::Error> to preserve error handling capabilities, or at minimum document that invalid values fall back to defaults.
| pub fn new(props: HashMap<String, String>) -> Self { | |
| Self::try_from(&props).unwrap_or_default() | |
| pub fn new(props: HashMap<String, String>) -> Result<Self, anyhow::Error> { | |
| Self::try_from(&props) |
| pub other: HashMap<String, String>, | ||
| } | ||
|
|
||
| impl TableProperties { |
There was a problem hiding this comment.
Making the other field public exposes internal implementation details and allows external code to directly modify properties, potentially causing inconsistencies. Consider making this field private and providing accessor methods if needed.
| pub other: HashMap<String, String>, | |
| } | |
| impl TableProperties { | |
| other: HashMap<String, String>, | |
| } | |
| impl TableProperties { | |
| /// Returns a reference to the map of additional, non-standard table properties. | |
| pub fn other(&self) -> &HashMap<String, String> { | |
| &self.other | |
| } | |
| /// Returns a mutable reference to the map of additional, non-standard table properties. | |
| pub fn other_mut(&mut self) -> &mut HashMap<String, String> { | |
| &mut self.other | |
| } |
| let table_props = self.table.metadata().table_properties(); | ||
|
|
||
| let backoff = Self::build_backoff(table_props)?; | ||
| let backoff = Self::build_backoff(table_props.clone())?; |
There was a problem hiding this comment.
This clones the entire TableProperties struct (including the potentially large other HashMap) just to pass it to build_backoff(). Since build_backoff() only reads a few fields, passing a reference would be more efficient.
| let backoff = Self::build_backoff(table_props.clone())?; | |
| let backoff = Self::build_backoff(&table_props)?; |
| pub write_datafusion_fanout_enabled: bool, | ||
| /// Any other properties that are not explicitly captured in named fields. | ||
| #[serde(flatten)] | ||
| pub other: HashMap<String, String>, |
There was a problem hiding this comment.
This field name is not very descriptive to what it actually holds
liurenjie1024
left a comment
There was a problem hiding this comment.
Thanks @vigneshsiva11 , just finished first round of review.
| @@ -358,16 +357,13 @@ impl TableMetadata { | |||
| /// Returns properties of table. | |||
| #[inline] | |||
| pub fn properties(&self) -> &HashMap<String, String> { | |||
There was a problem hiding this comment.
I think we should remove this method.
| ), // Overwritten immediately by add_default_partition_spec | ||
| default_partition_type: StructType::new(vec![]), | ||
| last_partition_id: UNPARTITIONED_LAST_ASSIGNED_ID, | ||
| properties: HashMap::new(), |
There was a problem hiding this comment.
I think in TableMetadataBuilder, we should still use HashMap<String, String> given it's a mutable api. We could add an extra field in TableMetadataBuilder.
| pub write_datafusion_fanout_enabled: bool, | ||
| /// Any other properties that are not explicitly captured in named fields. | ||
| #[serde(flatten)] | ||
| pub additional_properties: HashMap<String, String>, |
There was a problem hiding this comment.
I don't think we need this property.
|
|
||
| /// TableProperties that contains the properties of a table. | ||
| #[derive(Debug)] | ||
| #[derive(Debug, Clone, PartialEq, Eq, Default, serde::Serialize, serde::Deserialize)] |
There was a problem hiding this comment.
We have dedicated ser/de for TableMetadata, we should not add it here.
| pub const PROPERTY_SNAPSHOT_COUNT: &str = "snapshot-count"; | ||
| /// Creates a new TableProperties from a HashMap of raw strings. | ||
| pub fn new(props: HashMap<String, String>) -> Self { | ||
| Self::try_from(&props).unwrap_or_default() |
There was a problem hiding this comment.
We should not unwrap, just implement TryFrom.
|
Hi @liurenjie1024, Wouldn’t it be more convenient to use |
Which issue does this PR close?
Closes #2028
What changes are included in this PR?
This PR refactors
TableMetadatato use the typedTablePropertiesstruct instead of a rawHashMap<String, String>, addressing issue #2028. This improvement enhances type safety and reduces repeated allocations by returning references to structured data rather than reconstructing property maps on every call.Key Modifications:
TableMetadata
propertiesfield typetable_properties()getter to return a reference to the structTableProperties
Default,Clone,PartialEq, andEqtrait implementationsother: HashMap<String, String>field to maintain compatibility with custom or non-explicitly mapped propertiesnew()constructor to facilitate conversion from raw mapsTableMetadataBuilder
TableMetadatausing the newTablePropertiestypeDocumentation
missing_docs)Are these changes tested?
Yes.
cargo check -p icebergwith no errors or lint warningscargo fmt --allThe logic ensures existing property-based workflows remain functional through the internal
otherproperty map delegation.