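K-means is a classic unsupervised learning algorithm that is widely used for clustering. Its core idea is to alternate between two steps until the assignments stop changing: assign every point to its nearest center, then move every center to the mean of the points assigned to it. A two-dimensional implementation follows: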
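    import numpy as np
    import pandas as pd
    from matplotlib import pyplot as plt


    def distance(a, b):
        """Euclidean distance between two 2-D points."""
        return np.sqrt((a[0] - b[0])**2 + (a[1] - b[1])**2)


    def init_centers(k):
        """Create k random 2-D cluster centers with coordinates in [0, 1)."""
        return np.random.random(k * 2).reshape(k, 2)


    def assign_cluster(point, centers):
        """Return the index of the center closest to a single point."""
        kclass = 0
        min_distance = np.inf
        for i, center in enumerate(centers):
            d = distance(point, center)
            if d < min_distance:
                min_distance = d
                kclass = i
        return kclass


    def update_centers(points, kclasses, old_centers=None):
        """Recompute each center as the mean of the points assigned to it."""
        if old_centers is None:
            k = int(max(kclasses)) + 1
            new_centers = np.zeros((k, 2))
        else:
            k = len(old_centers)
            new_centers = np.array(old_centers, dtype=float)
        point_df = pd.DataFrame(points, columns=["x", "y"])
        kclasses_df = pd.DataFrame(kclasses, columns=["kclass"])
        points_with_class = point_df.join(kclasses_df)
        for i in range(k):
            mask = points_with_class["kclass"] == i
            subset = points_with_class[mask]
            if len(subset) > 0:
                # A cluster with no points keeps its previous center
                new_centers[i] = subset[["x", "y"]].mean(axis=0)
        return new_centers


    def kmeans(points, k, max_iter=200):
        """Run K-means and return (centers, cluster index of each point)."""
        centers = init_centers(k)
        kclasses = None
        for _ in range(max_iter):
            # Assignment step: label every point with its nearest center
            new_kclasses = np.array([assign_cluster(p, centers) for p in points])
            # Stop once the labels no longer change
            if kclasses is not None and np.array_equal(kclasses, new_kclasses):
                break
            kclasses = new_kclasses
            # Update step: move each center to the mean of its points
            centers = update_centers(points, kclasses, centers)
        return centers, kclasses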
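The init_centers(k) function randomly generates K initial centers. The assign_cluster(point, centers) function computes the distance from a point to every center and returns the index of the nearest one, which is the cluster the point belongs to. The update_centers(points, kclasses) function computes the mean of each cluster and uses it as the new cluster center.

    # Sample data
    c = np.array([
        [1, 2], [1, 1], [2, 2], [5, 5],
        [-0.10, -2.10], [-0.8, -1.8], [-2.9, -0.9], [-3.1, -2.2],
        [2, 6], [7, 10],
    ])

    # Run K-means with two clusters
    a, b = kmeans(c, 2)

    # Plot the data points as red circles and the centers as blue crosses
    c_df = pd.DataFrame(c)
    plt.plot(c_df[0], c_df[1], 'or')
    plt.plot(a[:, 0], a[:, 1], 'xb')
    plt.show()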
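The assignment and update steps described above can also be written with NumPy broadcasting instead of a per-point loop. The following is only a minimal sketch of one iteration: the names assign_all and update_all are hypothetical helpers that do not appear in the original code, and the starting centers are hand-picked for illustration rather than drawn at random.

```python
import numpy as np


def assign_all(points, centers):
    """Return the index of the nearest center for every point at once."""
    # Broadcasting gives an array of shape (n_points, k, n_dims)
    diffs = points[:, None, :] - centers[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    # Squared distance is enough to find the nearest center
    return sq_dists.argmin(axis=1)


def update_all(points, labels, centers):
    """Move each center to the mean of its points; empty clusters stay put."""
    new_centers = centers.astype(float).copy()
    for i in range(len(centers)):
        members = points[labels == i]
        if len(members) > 0:
            new_centers[i] = members.mean(axis=0)
    return new_centers


points = np.array([[1.0, 2.0], [1.0, 1.0], [2.0, 2.0],
                   [5.0, 5.0], [2.0, 6.0], [7.0, 10.0]])
# Hand-picked starting centers (an assumption made for this example)
centers = np.array([[1.0, 1.0], [6.0, 8.0]])

labels = assign_all(points, centers)
centers = update_all(points, labels, centers)
print(labels)   # cluster index of each point
print(centers)  # centers after one update step
```

Because nothing in this sketch indexes x and y explicitly, it would work unchanged for points with more than two coordinates.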
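The implementation above shows how K-means works in practice. As written, distance and init_centers assume two-dimensional points, so the code applies to clustering tasks on 2-D data.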
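Because the starting centers are random, two runs on the same data can end in different clusterings. One common way to reduce that sensitivity, which the original code does not do, is to run the algorithm several times and keep the run with the smallest total squared distance from each point to its center. The sketch below is self-contained and rests on several assumptions: run_kmeans and best_of are hypothetical names, the starting centers are picked from the data points rather than from random coordinates, and the six sample points are made up for the demonstration.

```python
import numpy as np


def run_kmeans(points, k, rng, max_iter=200):
    """One K-means run; returns (centers, labels, total squared distance)."""
    # Assumption: start from k distinct data points
    start = rng.choice(len(points), size=k, replace=False)
    centers = points[start].astype(float)
    labels = None
    for _ in range(max_iter):
        sq_dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq_dists.argmin(axis=1)
        if labels is not None and np.array_equal(labels, new_labels):
            break  # labels unchanged, so the centers are final
        labels = new_labels
        for i in range(k):
            members = points[labels == i]
            if len(members) > 0:  # empty clusters keep their center
                centers[i] = members.mean(axis=0)
    # Total squared distance from each point to its own center
    sq_dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    cost = sq_dists[np.arange(len(points)), labels].sum()
    return centers, labels, cost


def best_of(points, k, n_runs=10, seed=0):
    """Repeat K-means n_runs times and keep the lowest-cost result."""
    rng = np.random.default_rng(seed)
    runs = [run_kmeans(points, k, rng) for _ in range(n_runs)]
    return min(runs, key=lambda run: run[2])


# Two well-separated groups of three points each (made-up sample data)
demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
best_centers, best_labels, best_cost = best_of(demo, 2)
print(best_labels)
print(best_cost)
```

The seed parameter only makes the sketch reproducible; which of the two groups receives index 0 still depends on the starting centers.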
Reposted from: http://hjpv.baihongyu.com/