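Cluster Analysis Based on Fuzzy C-Means

1 The fuzzy c-means clustering (FCM) method

Fuzzy c-means clustering (FCM) is a method that, given a known number of clusters, uses membership functions and an iterative algorithm to partition a finite data set into clusters. Its objective function is:

$$J(U, c_1, \dots, c_c) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} \qquad (1)$$

where $n$ is the number of samples; $c$ is the number of clusters; $u_{ij}$ is the membership of the $j$-th sample with respect to the $i$-th cluster center; $c_i$ is the center of the $i$-th cluster; $d_{ij} = \lVert c_i - x_j \rVert$ is the Euclidean distance from sample $x_j$ to cluster center $c_i$; and $m > 1$ is the weighting exponent (the variable expo in the program below).

Clustering seeks the memberships and centers that minimize the objective function subject to the normalization constraint $\sum_{i=1}^{c} u_{ij} = 1$ for every sample. The following new objective function is therefore constructed:

$$\bar{J}(U, c_1, \dots, c_c, \lambda_1, \dots, \lambda_n) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} + \sum_{j=1}^{n} \lambda_j \Bigl( \sum_{i=1}^{c} u_{ij} - 1 \Bigr) \qquad (2)$$

Here $\lambda_j$, $j = 1, \dots, n$, are the Lagrange multipliers of the $n$ constraints $\sum_{i=1}^{c} u_{ij} = 1$. Differentiating with respect to all input arguments gives the necessary conditions for Eq. (1) to reach its minimum:

$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}} \qquad (3)$$

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{kj} \right)^{2/(m-1)}} \qquad (4)$$

With these two necessary conditions, the fuzzy c-means algorithm becomes a simple iterative procedure. When run in batch mode, FCM determines the cluster centers $c_i$ and the membership matrix U by the following steps:

Step 1. Initialize the membership matrix U with random values between 0 and 1 so that it satisfies the constraint in Eq. (2).
Step 2. Compute the c cluster centers $c_i$, i = 1, ..., c, using Eq. (3).
Step 3. Compute the objective function according to Eq. (1). If it is below a given threshold, or if its change relative to the previous value of the objective function is below a given threshold, the algorithm stops.
Step 4. Compute the new matrix U using Eq. (4). Return to Step 2.

When the algorithm converges, it yields the center of each cluster and the membership of every sample in every cluster, which completes the fuzzy partition.

Because of the normalization condition, the algorithm can give poor results when the sample set is not ideal. For example, if an outlier lies far from all cluster centers, its membership in every cluster should strictly be very small. The normalization condition nevertheless forces it to have a fairly large membership in each cluster (0.5 in each class in the two-class case), and such outliers affect the final result of the iteration.

Program

    % Function header inferred from the call FCM(A,4) and the variable names used below.
    function [center, U, obj_fcn] = fcm(data, cluster_n, options)
    if nargin ~= 2 && nargin ~= 3
        error('Too many or too few input arguments!');
    end
    data_n = size(data, 1);   % number of samples
    in_n = size(data, 2);     % number of features per sample
    default_options = [2;     % exponent of the partition matrix U
                       100;   % maximum number of iterations
                       1e-5;  % minimum improvement
                       1];    % display information during iteration
    if nargin == 2
        options = default_options;
    else
        if length(options) < 4
            tmp = default_options;
            tmp(1:length(options)) = options;
            options = tmp;
        end
        nan_index = find(isnan(options) == 1);
        options(nan_index) = default_options(nan_index);
        if options(1) <= 1
            error('The exponent should be greater than 1!');
        end
    end
    expo = options(1);        % exponent of the matrix U
    max_iter = options(2);    % maximum number of iterations
    min_impro = options(3);   % minimum improvement
    display = options(4);     % display flag
    obj_fcn = zeros(max_iter, 1);      % storage for the objective function values
    U = initfcm(cluster_n, data_n);    % initialize the fuzzy partition matrix
    % Main loop:
    for i = 1:max_iter
        [U, center, obj_fcn(i)] = stepfcm(data, U, cluster_n, expo);
        if display
            fprintf('Iteration count = %d, obj. fcn = %f\n', i, obj_fcn(i));
        end
        % Check the termination condition:
        if i > 1
            if abs(obj_fcn(i) - obj_fcn(i-1)) < min_impro, break; end
        end
    end
    iter_n = i;                        % actual number of iterations
    obj_fcn(iter_n+1:max_iter) = [];   % discard the unused entries

Calling the fuzzy clustering function defined by the program above gives the following run:

    A = [1739.94 1675.15 2395.96;
          373.3  3087.05 2429.47;
         1756.77 1652    1514.98;
          864.45 1647.31 2665.9;
          222.85 3059.54 2002.33;
          877.88 2031.66 3071.18;
         1803.58 1583.12 2163.05;
         2352.12 2557.04 1411.53;
          401.3  3259.94 2150.98;
          363.34 3477.95 2462.86;
         1571.17 1731.04 1735.33;
          104.8  3389.83 2421.83;
          499.85 3305.75 2196.22;
         2297.28 3340.14  535.62;
         2092.62 3177.21  584.32;
         1418.79 1775.89 2772.9;
         1845.59 1918.81 2226.49;
         2205.36 3243.74 1202.69;
         2949.16 3244.44  662.42;
         1692.62 1867.5  2108.97;
         1680.67 1575.78 1725.1;
         2802.88 3017.11 1984.98;
          172.78 3084.49 2328.65;
         2063.54 3199.76 1257.21;
         1449.58 1641.58 3405.12;
         1651.52 1713.28 1570.38;
          341.59 3076.62 2438.63;
          291.02 3095.68 2088.95;
          237.63 3077.78 2251.96;
         1702.8  1639.79 2068.74;
         1877.93 1860.96 1975.3;
          867.81 2334.68 2535.1;
         1831.49 1713.11 1604.68;
          460.69 3274.77 2172.99;
         2374.98 3346.98  975.31;
         2271.89 3482.97  946.7;
         1783.64 1597.99 2261.31;
          198.83 3250.45 2445.08;
         1494.63 2072.59 2550.51];
    [CENTER2, U2, OBJ_FCN2] = FCM(A, 4)

This yields the cluster centers and the membership matrix:

    CENTER2 =
         314.72    3194.7    2283.5
         2330.5    3250      958.33
         1748.8    1733.2    1927.7
         1211.8    1879      2821.1

    U2 =
      Columns 1 through 6
        0.033201   0.97007    0.029789   0.056082   0.92626    0.071666
        0.029521   0.0058525  0.047123   0.020886   0.017596   0.020368
        0.64897    0.0088617  0.84987    0.11948    0.023937   0.076162
        0.28831    0.015217   0.073223   0.80355    0.032208   0.8318
      Columns 7 through 12
        0.01486    0.070257   0.97808    0.931      0.015668   0.94372
        0.015919   0.54463    0.0055741  0.017272   0.018067   0.01343
        0.88683    0.2854     0.0068329  0.020349   0.91889    0.016791
        0.082394   0.099713   0.009517   0.031376   0.047375   0.026058
      Columns 13 through 18
        0.95896    0.024654   0.03003    0.014714   0.026868   0.015067
        0.010641   0.91928    0.90032    0.0081124  0.029539   0.94815
        0.012668   0.035832   0.045317   0.061914   0.80249    0.02370
        0.017729   0.020235   0.024333   0.91526    0.1411     0.013072
      Columns 19 through 24
        0.041413   0.013279   0.013603   0.10616    0.97534    0.035002
        0.84263    0.013456   0.017119   0.50326    0.005104   0.8824
        0.074423   0.9069     0.92614    0.24253    0.0074904  0.053315
        0.041534   0.066363   0.043141   0.14805    0.012066   0.029278
      Columns 25 through 30
        0.068372   0.026621   0.96861    0.96367    0.98434    0.006666
        0.036258   0.03739    0.0060731  0.0085095  0.0033354  0.0069366
        0.14864    0.86903    0.0092717  0.011597   0.0048348  0.95254
        0.74673    0.066961   0.016046   0.016226   0.007487   0.033862
      Columns 31 through 36
        0.0077799  0.21467    0.020132   0.96956    0.0019406  0.0098372
        0.010602   0.043571   0.032947   0.007787   0.99341    0.96934
        0.95265    0.15798    0.89416    0.0094531  0.0029726  0.01304
        0.02897    0.58378    0.052762   0.013204   0.001678   0.0077779
      Columns 37 through 39
        0.022453   0.97193    0.048398
        0.022367   0.0061355  0.028522
        0.80825    0.0083348  0.23214
        0.14693    0.013595   0.69094

    OBJ_FCN2 =
        1.9285e+007
        1.352e+007
        8.2142e+006
        5.4477e+006
        4.9316e+006
        4.85e+006
        4.8367e+006
        4.8344e+006
        4.834e+006
        4.8339e+006
        4.8339e+006
        4.8339e+006
        4.8339e+006
        4.8339e+006
        4.8339e+006
        4.8339e+006
        4.8339e+006
        4.8339e+006
        4.8339e+006
        4.8339e+006
        4.8339e+006
        4.8339e+006
        4.8339e+006

At this point the objective function has converged at 4.8339e+006 and the algorithm ends. The samples can now be classified from the resulting cluster center matrix and membership matrix. The cluster center matrix has 4 rows, one per class (four classes in all), and 3 columns, one per color. Take Columns 1 through 6 of the membership matrix as an example:

        0.033201   0.97007    0.029789   0.056082   0.92626    0.071666
        0.029521   0.0058525  0.047123   0.020886   0.017596   0.020368
        0.64897    0.0088617  0.84987    0.11948    0.023937   0.076162
        0.28831    0.015217   0.073223   0.80355    0.032208   0.8318

Look at the first column of this matrix. Its largest value is 0.64897, in the third row, so by the maximum-membership principle sample 1 belongs to the third class. Proceeding in the same way, all 39 samples can be classified. The final classification is listed below. Judging from the rows shown, the Type numbers in the table are not the row indices of U2: Types 1, 2, 3, and 4 correspond to rows 3, 1, 4, and 2 of U2, respectively.

    Blue      Green     Red       Type
    1739.94   1675.15   2395.96   1
     373.3    3087.05   2429.47   2
    1756.77   1652      1514.98   1
     864.45   1647.31   2665.9    3
     222.85   3059.54   2002.33   2
     877.88   2031.66   3071.18   3
    1803.58   1583.12   2163.05   1
    2352.12   2557.04   1411.53   4
     401.3    3259.94   2150.98   2
     363.34   3477.95   2462.86   2
    1571.17   1731.04   1735.33   1
     104.8    3389.83   2421.83   2
     499.85   3305.75   2196.22   2
    2297.28   3340.14    535.62   4
    2092.62   3177.21    584.32   4
    1418.79   1775.89   2772.9    3
    1845.59   1918.81   2226.49   1
    2205.36   3243.74   1202.69   4
    2949.16   3244.44    662.42   4
    1692.62   1867.5    2108.97   1
    1680.67   1575.78   1725.1    1
    2802.88   3017.11   1984.98   4
     172.78   3084.49   2328.65   2
    2063.54   3199.76   1257.21   4
    1449.58   16…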
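The program above calls initfcm and stepfcm without listing them. As a rough illustration of Steps 1 to 4 and of the maximum-membership rule, here is a minimal Python sketch using only the standard library. It is not the MATLAB routines themselves: the function names (init_membership, update_centers, update_membership, objective, fcm, classify), the seed argument, and the 1e-12 floor on distances are all assumptions made for this example.

```python
import math
import random


def euclid(a, b):
    """Euclidean distance between two points given as lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def init_membership(c, n, rng):
    """Step 1: random c-by-n matrix whose columns each sum to 1."""
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        col_sum = sum(U[i][j] for i in range(c))
        for i in range(c):
            U[i][j] /= col_sum
    return U


def update_centers(data, U, m):
    """Step 2: weighted means of the samples, as in Eq. (3)."""
    n, dim = len(data), len(data[0])
    centers = []
    for row in U:
        w = [u ** m for u in row]
        total = sum(w)
        centers.append([sum(w[j] * data[j][k] for j in range(n)) / total
                        for k in range(dim)])
    return centers


def objective(data, centers, U, m):
    """Step 3: value of the objective function, as in Eq. (1)."""
    return sum(U[i][j] ** m * euclid(centers[i], data[j]) ** 2
               for i in range(len(centers)) for j in range(len(data)))


def update_membership(data, centers, m):
    """Step 4: new memberships, as in Eq. (4)."""
    c, n = len(centers), len(data)
    # Assumed safeguard: floor the distance so a sample lying exactly on a
    # center does not cause a division by zero.
    d = [[max(euclid(centers[i], data[j]), 1e-12) for j in range(n)]
         for i in range(c)]
    p = 2.0 / (m - 1.0)
    return [[1.0 / sum((d[i][j] / d[k][j]) ** p for k in range(c))
             for j in range(n)] for i in range(c)]


def fcm(data, c, m=2.0, max_iter=100, min_impro=1e-5, seed=0):
    """Iterate Steps 2 to 4 until the objective changes by less than min_impro."""
    U = init_membership(c, len(data), random.Random(seed))
    history = []
    for _ in range(max_iter):
        centers = update_centers(data, U, m)
        history.append(objective(data, centers, U, m))
        if len(history) > 1 and abs(history[-1] - history[-2]) < min_impro:
            break
        U = update_membership(data, centers, m)
    return centers, U, history


def classify(U):
    """Maximum-membership rule: 1-based row index of the largest entry per column."""
    return [max(range(len(U)), key=lambda i: U[i][j]) + 1
            for j in range(len(U[0]))]


# Columns 1 through 6 of U2 as printed in the text.
U2_FIRST6 = [
    [0.033201, 0.97007, 0.029789, 0.056082, 0.92626, 0.071666],
    [0.029521, 0.0058525, 0.047123, 0.020886, 0.017596, 0.020368],
    [0.64897, 0.0088617, 0.84987, 0.11948, 0.023937, 0.076162],
    [0.28831, 0.015217, 0.073223, 0.80355, 0.032208, 0.8318],
]
print(classify(U2_FIRST6))  # [3, 1, 3, 4, 1, 4]
```

The printed labels are row indices of U2, so sample 1 comes out as class 3, in line with the worked example in the text. The outlier effect described earlier can be seen by calling update_membership([[1.0, 1000.0]], [[0.0, 0.0], [2.0, 0.0]], 2.0): the point is far from both centers, yet normalization gives it a membership of 0.5 in each class.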