Open
Hi!
First of all, thanks for providing this nice work!
While looking into the code, I found the `squared_distance` function a little confusing. If Y is not provided (so Y = X), the function computes X - X and then takes the sum. So isn't the return value always zero?
Lines 11 to 33 in 43b0fca
```python
def squared_distance(X, Y=None, W=None):
    '''
    Calculates the pairwise distance between points in X and Y
    X: n x d matrix
    Y: m x d matrix
    W: affinity -- if provided, we normalize the distance
    returns: n x m matrix of all pairwise squared Euclidean distances
    '''
    if Y is None:
        Y = X
    # distance = squaredDistance(X, Y)
    sum_dimensions = list(range(2, K.ndim(X) + 1))
    X = K.expand_dims(X, axis=1)
    if W is not None:
        # if W provided, we normalize X and Y by W
        D_diag = K.expand_dims(K.sqrt(K.sum(W, axis=1)), axis=1)
        X /= D_diag
        Y /= D_diag
    squared_difference = K.square(X - Y)
    distance = K.sum(squared_difference, axis=sum_dimensions)
    return distance
```
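One way to check the Y = X case is a small NumPy sketch that mirrors the same steps. This is only an approximation under stated assumptions: NumPy's `expand_dims` and broadcasting stand in for the Keras backend `K`, the `W` normalization is left out, and the name `squared_distance_np` is hypothetical. Under those assumptions, X becomes shape (n, 1, d) before the subtraction while Y keeps shape (n, d), so `X - Y` would broadcast to (n, n, d) rather than cancel elementwise.

```python
import numpy as np

def squared_distance_np(X, Y=None):
    """NumPy mirror of the quoted function (W branch omitted; hypothetical helper)."""
    if Y is None:
        Y = X
    # Sum over every axis after the first two, as in the quoted code.
    sum_dimensions = tuple(range(2, X.ndim + 1))
    # (n, d) -> (n, 1, d); Y is left as (m, d).
    X = np.expand_dims(X, axis=1)
    # (n, 1, d) - (m, d) broadcasts to (n, m, d): every row of X against every row of Y.
    squared_difference = np.square(X - Y)
    return np.sum(squared_difference, axis=sum_dimensions)

X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = squared_distance_np(X)  # Y defaults to X
print(D.shape)
print(D)
```

If this mirrors the backend behavior, only the diagonal of the result is zero (each point against itself), and the off-diagonal entries hold the pairwise squared distances.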
I have another question, about the number of clusters K: can I use a relatively large number, for example more than 1000, when my dataset contains about 1 million samples?
Thanks!
Fan