
What can be done for unbalanced data? Oversampling behaves strangely #13

@Sandy4321

What can be done for unbalanced data? For example, the number of rows with target "yes" is 200, but the number of rows with target "no" is 500,000.

Oversampling, meaning replicating the records with target "yes", helps a little when it is applied once. That gives 400 rows with target "yes" and 500,000 rows with target "no".
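A minimal sketch of the replication described above, assuming the data is held as parallel Python lists of rows and labels. The helper `oversample_by_replication` is hypothetical and not part of this project's API:

```python
from collections import Counter

def oversample_by_replication(rows, labels, minority_label, copies):
    """Append `copies` extra copies of every minority-class row.

    copies=1 turns 200 "yes" rows into 400; copies=2 turns them into 600.
    """
    out_rows, out_labels = list(rows), list(labels)
    for row, label in zip(rows, labels):
        if label == minority_label:
            for _ in range(copies):
                out_rows.append(row)
                out_labels.append(label)
    return out_rows, out_labels

# Toy data with the class counts from the issue.
labels = ["yes"] * 200 + ["no"] * 500_000
rows = list(range(len(labels)))  # placeholder features: just a row id

_, once = oversample_by_replication(rows, labels, "yes", copies=1)
_, twice = oversample_by_replication(rows, labels, "yes", copies=2)
print(Counter(once))   # Counter({'no': 500000, 'yes': 400})
print(Counter(twice))  # Counter({'no': 500000, 'yes': 600})
```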

Surprisingly, though, replicating a second time gives no further improvement over the first replication: with 600 "yes" rows and 500,000 "no" rows, performance is the same as with 400 "yes" rows and 500,000 "no" rows.
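One plausible explanation, assuming the model is trained by minimizing a per-row loss such as log-loss: replicas add no new information. They only raise the effective weight of the "yes" rows. The sketch below uses toy probabilities and a hypothetical helper `log_loss_term` to check that storing every "yes" row 3 times (the 600-row case) gives exactly the same total loss as storing it once with weight 3. Whether going from weight 2 to weight 3 then changes the evaluation metric depends on the model and the metric, which the issue does not specify.

```python
import math

def log_loss_term(y, p):
    """Binary cross-entropy for one row: true label y in {0, 1}, predicted P(y=1) = p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Toy rows: (true label, model's predicted probability). 1 = "yes", 0 = "no".
rows = [(1, 0.3), (1, 0.6), (0, 0.1), (0, 0.2), (0, 0.05)]

# Dataset where every "yes" row appears 3 times (original + 2 replicas).
duplicated = [r for r in rows for _ in range(3 if r[0] == 1 else 1)]
loss_duplicated = sum(log_loss_term(y, p) for y, p in duplicated)

# Same rows without replicas, but each "yes" row carries weight 3.
loss_weighted = sum((3 if y == 1 else 1) * log_loss_term(y, p) for y, p in rows)

print(math.isclose(loss_duplicated, loss_weighted))  # True
```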

The questions are:
1. Do you remove identical rows?
2. Do you support weighting of particular rows? For example, rows with target "yes" could have more influence than rows with target "no". Could row weighting then be used for unbalanced data?
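If the library accepts per-row weights (the issue does not say whether it does), a common heuristic is to weight each class inversely to its frequency, w = n_samples / (n_classes * count). This is the formula scikit-learn uses for `class_weight="balanced"`. A minimal stdlib sketch with the counts from the issue; the helper name `balanced_row_weights` is hypothetical:

```python
from collections import Counter

def balanced_row_weights(labels):
    """Return per-row weights and the per-class weight table.

    Each class gets n_samples / (n_classes * count_of_that_class), so every
    class contributes the same total weight.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    class_weight = {c: n_samples / (n_classes * k) for c, k in counts.items()}
    return [class_weight[y] for y in labels], class_weight

labels = ["yes"] * 200 + ["no"] * 500_000
weights, class_weight = balanced_row_weights(labels)
print(class_weight)  # {'yes': 1250.5, 'no': 0.5002}
```

The yes/no weight ratio is 2,500 (= 500,000 / 200), so under a weighted loss this matches keeping 2,500 copies of each "yes" row, up to an overall scale factor, without enlarging the dataset.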

Thanks
