What can be done for unbalanced data? Oversampling behaves strangely

What can be done for unbalanced data?
For example:
number of rows with target "yes" is 200
but number of rows with target "no" is 500000

Oversampling, meaning replicating the records with target "yes", helps a little bit
when it is applied once.
The counts then become:
number of rows with target "yes" is 400
number of rows with target "no" is 500000

But a second replication, surprisingly, does not help any further compared with the first one.

So when the counts are
number of rows with target "yes" is 600
number of rows with target "no" is 500000

the performance is the same as when they are
number of rows with target "yes" is 400
number of rows with target "no" is 500000
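
To make the numbers above concrete, here is a minimal sketch of oversampling by replication in plain Python. The helper names `oversample` and `positive_share` and the synthetic rows are my own assumptions for illustration, not part of any particular tool. It only shows how the class counts and the share of "yes" rows change with each replication; it does not by itself explain the performance plateau.

```python
def oversample(rows, target_index, minority_label, copies):
    """Return a new list in which every minority row appears `copies` times in total."""
    out = []
    for row in rows:
        repeat = copies if row[target_index] == minority_label else 1
        out.extend([row] * repeat)
    return out


def positive_share(rows, target_index, minority_label):
    """Fraction of rows whose target equals the minority label."""
    positives = sum(1 for r in rows if r[target_index] == minority_label)
    return positives / len(rows)


# Synthetic stand-in for the data in the question: 200 "yes" rows, 500000 "no" rows.
data = [(i, "yes") for i in range(200)] + [(i, "no") for i in range(500000)]

for copies in (1, 2, 3):
    sampled = oversample(data, target_index=1, minority_label="yes", copies=copies)
    n_yes = sum(1 for r in sampled if r[1] == "yes")
    n_no = len(sampled) - n_yes
    share = positive_share(sampled, 1, "yes")
    print(f"copies={copies} yes={n_yes} no={n_no} share={share:.4%}")
```

Even with three copies of each "yes" row, the "yes" class stays well under 1% of the data, so going from 400 to 600 rows changes the class balance only slightly.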

The questions are:
1
Do you remove identical rows?
2
Do you support weighting for particular rows?
For example,
rows with target "yes" could have more influence than rows with target "no".
Could row weighting then be used for unbalanced data?
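
To illustrate what I mean by row weighting, here is a small sketch in plain Python. It assumes a training objective that is a weighted sum of per-row losses; log loss is used here only as an example, and `weighted_log_loss` is a hypothetical helper, not a function of any specific library. Under that assumption, giving a "yes" row a weight of 3 produces the same loss as replicating it 3 times, without making the data set bigger.

```python
import math


def weighted_log_loss(labels, probs, weights):
    """Weighted sum of per-row log loss (not normalised by the total weight)."""
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        row_loss = -math.log(p) if y == 1 else -math.log(1.0 - p)
        total += w * row_loss
    return total


# One "yes" row (label 1) and three "no" rows (label 0) with made-up predictions.
labels = [1, 0, 0, 0]
probs = [0.3, 0.1, 0.2, 0.4]

# Option A: replicate the "yes" row 3 times, every row has weight 1.
rep_labels = [1, 1, 1, 0, 0, 0]
rep_probs = [0.3, 0.3, 0.3, 0.1, 0.2, 0.4]
loss_replicated = weighted_log_loss(rep_labels, rep_probs, [1.0] * 6)

# Option B: keep a single copy of the "yes" row but give it weight 3.
loss_weighted = weighted_log_loss(labels, probs, [3.0, 1.0, 1.0, 1.0])

print(loss_replicated, loss_weighted)
print(math.isclose(loss_replicated, loss_weighted))
```

For the counts in this question, a weight of 500000 / 200 = 2500 on each "yes" row would give both classes the same total weight. Whether this equivalence holds in practice depends on how the tool treats duplicate rows and whether it accepts row weights, which is what the two questions above are asking.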

Thanks

