
Density Peaks Clustering Algorithm Based on Density Ratio

Computer Engineering and Applications 计算机工程与应用

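2017, 53(16)

1 Introduction

Clustering is an unsupervised data analysis method. Its goal is to partition data, without prior knowledge, into a finite number of clusters according to the similarity between data properties, so that objects within the same cluster are highly similar to one another while objects in different clusters are less similar [1-5]. Cluster analysis can uncover useful information in data and explain hidden relationships and patterns among the data [2, 5]. It is widely applied in engineering systems [6], computer science, life and medical sciences, social sciences, and economics [1-5, 7].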

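According to how the features of the data are described, clustering algorithms can be broadly divided into partition-based, hierarchy-based, density-based, grid-based, model-based, and graph-theory-based algorithms [8]. The main density-based clustering algorithms include DBSCAN [9], OPTICS [10], and GDBSCAN [11]. Algorithms of this kind can discover clusters of arbitrary shape. Their main idea is: as long as a …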

GAO Shiying 1,2,3, ZHOU Xiaofeng 2,3, LI Shuai 2,3

1. School of Computer Science and Engineering, Northeastern University, Shenyang 110000, China

2. Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China

3. Key Laboratory of Control Network System, Chinese Academy of Sciences, Shenyang 110016, China

GAO Shiying, ZHOU Xiaofeng, LI Shuai. Clustering by fast search and find of density peaks based on density-ratio. Computer Engineering and Applications, 2017, 53(16): 10-17.

Abstract: CFSFDP (Clustering by Fast Search and Find of Density Peaks) is a new density-based clustering algorithm. It can cluster non-spherical data, has few parameters to tune, and clusters quickly. However, when the densities of different clusters vary widely, it tends to miss the sparser clusters, which lowers clustering accuracy. To solve this problem, this paper proposes a density-ratio-based CFSFDP, abbreviated R-CFSFDP. The algorithm introduces the density ratio into CFSFDP and computes the density-ratio peaks of the sample data, which makes sparse clusters easier to identify and thus improves overall clustering accuracy. To validate the algorithm, experiments are conducted on 9 commonly used data sets (2 synthetic data sets and 7 UCI data sets). The results show that, when cluster shapes are complex and cluster densities vary widely, R-CFSFDP makes the cluster centers clearer and easier to determine and achieves higher clustering accuracy than CFSFDP.

Key words: clustering; density peaks; density-ratio; varying densities
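The limitation described in the abstract can be illustrated with a minimal sketch of the baseline CFSFDP quantities. It is not the R-CFSFDP method and does not implement the density ratio. It assumes the commonly used cut-off definition of local density (the number of points within a distance `dc`), the distance `delta` to the nearest denser point, and the product `gamma = rho * delta` as the center score. The toy grids, the value `dc = 0.25`, and all function names are illustrative assumptions.

```python
import numpy as np


def grid(center, spacing, half=2):
    """Return a (2*half+1)^2 square grid of 2-D points around `center`."""
    ticks = np.arange(-half, half + 1) * spacing
    xs, ys = np.meshgrid(ticks, ticks)
    pts = np.column_stack([xs.ravel(), ys.ravel()])
    return pts + np.asarray(center, dtype=float)


def pairwise_distances(X):
    """Euclidean distance matrix of shape (n, n)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def local_density(D, dc):
    """Cut-off density: number of other points closer than dc."""
    return (D < dc).sum(axis=1) - 1  # subtract 1 to exclude the point itself


def delta_distance(D, rho):
    """Distance from each point to its nearest point of higher density.

    A point with no denser point gets its largest distance to any point.
    """
    delta = np.empty(len(rho))
    for i in range(len(rho)):
        higher = np.flatnonzero(rho > rho[i])
        delta[i] = D[i, higher].min() if higher.size else D[i].max()
    return delta


# Toy data (assumed for illustration): indices 0-24 form a dense cluster,
# indices 25-49 a sparse cluster with ten times the point spacing.
X = np.vstack([grid((0.0, 0.0), 0.1), grid((20.0, 0.0), 1.0)])
D = pairwise_distances(X)
rho = local_density(D, dc=0.25)  # dc = 0.25 is an arbitrary illustrative choice
delta = delta_distance(D, rho)
gamma = rho * delta              # a large gamma marks a cluster-center candidate

print("max rho, dense cluster :", rho[:25].max())
print("max rho, sparse cluster:", rho[25:].max())
print("index of largest gamma :", int(np.argmax(gamma)))
```

With these toy values, the center of the dense cluster has 20 neighbors within `dc`, while every point of the sparse cluster has none. All sparse points therefore get `gamma = 0`, and no center candidate emerges for the sparse cluster.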

Document code: A    CLC number: TP183    doi: 10.3778/j.issn.1002-8331.1704-0227

Foundation item: Science and Technology Plan Project of Liaoning Province (No. 2015106015).

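About the authors: GAO Shiying (born 1993), female, master's student, research interests: data mining and clustering, E-mail: gaoshiying@https://www.doczj.com/doc/9214608541.html; ZHOU Xiaofeng (born 1978), female, Ph.D., associate researcher, research interest: machine learning; LI Shuai (born 1988), male, assistant researcher, research interest: machine learning.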

Received: 2017-04-18    Revised: 2017-06-19    Article ID: 1002-8331(2017)16-0010-08
