Thank you for your support.
Suppose we have a training set with an attribute “age” which contains following values.
Age : 10, 11, 16, 18, 20, 35
Now at a node, the algorithm will consider following possible splitting
Age <=10 & Age>10
Age <=11 & Age>11
Age <=16 & Age>16
Age <=18 & Age>18
Age <=20 & Age>20
You can see that if there are N possible values, we would have to consider N-1 possible splits.
And note that we do not choose the mid-point between values as the splitting threshold.
other approach like Binning is also used.lets understand from above example.
- sort the values
- Decide the number of bin. here consider numbers of bin =3
bin1 = [10,11], bin2 = [16,18], bin3 = [20,35]
basically we are categorising a real value data to look like a categorical data. But it is not at all categorisation as it changes after every time you perform splitting and values also changes.