Description
Combining two Datasets with add_features_from and then training fails when the Datasets were created from pyarrow Tables and the predictor is needed during training (for example, when init_model is passed to lgb.train).
The reason is that add_features_from sets the data field of the resulting Dataset to None when the underlying data is a pyarrow Table, because that type is not handled here: https://github.com/microsoft/LightGBM/blob/master/python-package/lightgbm/basic.py#L3446
Instead, add_features_from should concatenate the two pyarrow Tables column-wise.
Reproducible example
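import lightgbm as lgb
import polars as pl

# params (training parameters) and init_model (a previously trained model) are not shown in this report
x = pl.DataFrame({"a": [1]})
y = pl.DataFrame({"b": [2]})
a = lgb.Dataset(data=x.to_arrow())
b = lgb.Dataset(data=y.to_arrow())
b = b.add_features_from(a)
lgb.train(params, b, init_model=init_model)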
Error:
lightgbm.basic.LightGBMError: Cannot set predictor after freed raw data, set free_raw_data=False when construct Dataset to avoid this.
Environment info
LightGBM version: main