What can be done about unbalanced data?
For example:
number of rows with target "yes": 200
number of rows with target "no": 500,000
Oversampling, i.e. replicating the records with target "yes", helps a little when it is applied once, so that:
number of rows with target "yes": 400
number of rows with target "no": 500,000
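A minimal sketch of this replication-based oversampling, using a hypothetical toy data set of (feature, label) rows with the same class counts as above. `replicate_minority` is an illustrative helper, not any particular tool's API. The sketch also counts the distinct "yes" rows, which stay at 200 no matter how many copies are added.

```python
# Hypothetical toy data: each row is (feature, label); label 1 = "yes", 0 = "no".
def replicate_minority(rows, label, extra_copies):
    """Return rows with every row of `label` appended `extra_copies` more times."""
    minority = [r for r in rows if r[1] == label]
    return rows + minority * extra_copies

rows = [(i, 1) for i in range(200)] + [(i, 0) for i in range(500_000)]
once = replicate_minority(rows, label=1, extra_copies=1)   # 400 "yes" rows
twice = replicate_minority(rows, label=1, extra_copies=2)  # 600 "yes" rows

for name, data in [("once", once), ("twice", twice)]:
    yes = sum(1 for _, y in data if y == 1)
    no = len(data) - yes
    distinct_yes = len({r for r in data if r[1] == 1})  # unique "yes" rows
    print(name, yes, no, distinct_yes)
# once 400 500000 200
# twice 600 500000 200
```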
But, surprisingly, replicating a second time does not help any further compared with the first replication. With
number of rows with target "yes": 600
number of rows with target "no": 500,000
performance is the same as with
number of rows with target "yes": 400
number of rows with target "no": 500,000
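One possible reason the second replication adds nothing: duplicated rows carry no new information. For a model trained by minimizing a sum of per-row losses, repeating a row k times gives exactly the same objective as keeping it once with weight k, so going from two to three copies only raises the weight on the same 200 examples. The sketch below checks this numerically for binary cross-entropy; the predictions and labels are made up for illustration.

```python
import math

def log_loss_term(p, y):
    # Per-row binary cross-entropy for predicted probability p and label y (1 = "yes")
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Hypothetical (predicted probability of "yes", label) pairs
rows = [(0.10, 1), (0.30, 1), (0.05, 0), (0.20, 0), (0.01, 0)]
k = 3  # each "yes" row appears k times in total (original + 2 replicas)

# Loss on the replicated data set: "yes" rows repeated k times
replicated = [r for r in rows for _ in range(k if r[1] == 1 else 1)]
loss_replicated = sum(log_loss_term(p, y) for p, y in replicated)

# Loss on the original data, with weight k on every "yes" row
loss_weighted = sum((k if y == 1 else 1) * log_loss_term(p, y) for p, y in rows)

print(math.isclose(loss_replicated, loss_weighted))  # True
```

The equivalence is exact only for objectives that are plain sums over rows; methods that subsample rows (for example bagging) or use row-count thresholds can behave somewhat differently with duplicates.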
The questions are:
1. Do you remove identical rows?
2. Do you support weighting of individual rows? For example, rows with target "yes" could have more influence than rows with target "no". Could such row weighting then be used for unbalanced data?
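Per-row weighting is a common alternative to replication. A minimal sketch of one standard choice, "balanced" weights of n_samples / (n_classes × class_count), which is the formula scikit-learn documents for `class_weight="balanced"`; `balanced_row_weights` is a hypothetical helper. With 200 "yes" and 500,000 "no" rows, both classes end up with the same total weight.

```python
from collections import Counter

def balanced_row_weights(labels):
    """Weight each row by n_samples / (n_classes * count_of_its_class)."""
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    return [n / (n_classes * counts[y]) for y in labels]

labels = ["yes"] * 200 + ["no"] * 500_000
weights = balanced_row_weights(labels)
print(weights[0], weights[-1])  # 1250.5 0.5002
```

Unlike replication, these weights need not be integers. In scikit-learn, many estimators accept per-row weights through `fit(X, y, sample_weight=weights)`.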
Thanks