[python-package] add_features_from fails if the dataset is built from pyarrow Tables #6937

@cBournhonesque

Description

If you combine two datasets with add_features_from, subsequent training fails when the datasets were created from pyarrow Tables and the predictor is needed during training (for example, when init_model is passed to lgb.train).

The reason is that add_features_from sets the data field of the resulting dataset to None when the underlying data is a pyarrow Table, because that type is not handled here: https://github.com/microsoft/LightGBM/blob/master/python-package/lightgbm/basic.py#L3446

Instead, that function should concatenate the two pyarrow Tables column-wise.

Reproducible example

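import lightgbm as lgb
import polars as pl

x = pl.DataFrame({"a": [1]})
y = pl.DataFrame({"b": [2]})
a = lgb.Dataset(data=x.to_arrow())
b = lgb.Dataset(data=y.to_arrow())
b = b.add_features_from(a)

# params and init_model are defined elsewhere (not shown in this report)
lgb.train(params, b, init_model=init_model)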

Error:

lightgbm.basic.LightGBMError: Cannot set predictor after freed raw data, set free_raw_data=False when construct Dataset to avoid this.

Environment info

LightGBM version: main
