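Statistical Research, Vol. 32, No. 11, Nov. 2015

Integrative Analysis for Big Data*

Ma Shuangge, Wang Xiaoyan, Fang Kuangnan

Abstract: The difference of data sources, high dimensionality and sparsity are the main characteristics of big data. How to mine the heterogeneity and association of different datasets and achieve dimension reduction is one of the goals and challenges of big data analysis. Integrative analysis provides an effective way of analyzing big data. It analyzes multiple datasets simultaneously, avoiding the model instability that arises from individual variations caused by regional factors, time factors and so on. The coefficients of each covariate across all datasets are treated as a group, and a penalty function is used to shrink these groups of coefficients to achieve variable selection. In this paper, we review the existing research on penalized integrative analysis from three aspects: homogeneity integrative analysis, heterogeneity integrative analysis and network integrative analysis. Three simulations are conducted to verify the performance of integrative analysis under weak, moderate and strong correlations. They show that L1 Group Bridge, L1 Group MCP and Composite MCP perform well, and that L1 Group Bridge has the lowest false positives and is the most stable. Finally, integrative analysis is applied to new rural cooperative medical expenditure data with source differences, as well as to cancer genetics data with typical characteristics of big data such as super-high dimension and small sample size.

Key words: Big Data; Integrative Analysis; Variable Selection; Medical Expenditure; Cancer Genetics Data

CLC number: C829.2. Document code: A. Article ID: 1002-4565(2015)11-0003-09.

* Funding: grant nos. 2012LD001, 2013LZ53, 13&ZD148, 13CTJ001 and 71471152.

Datasets from different sources show both heterogeneity (or difference) and homogeneity (or similarity)[1]. Integrative analysis is designed for this setting, which is typically high dimensional ("p ≫ n"). This paper reviews penalized integrative analysis.

Penalized variable selection began with Lasso (Tibshirani, 1996). Existing methods include individual variable selection, such as Lasso (Tibshirani, 1996), SCAD (Fan and Li, 2001), MCP (Zhang, 2007) and Bridge (Frank and Friedman, 1993); Elastic Net (Zou and Hastie, 2005) and Mnet (Huang, 2010); group selection, such as Group Lasso (Yuan and Lin, 2006)[2] and CAP (Zhao, 2009); and bi-level selection, such as Sparse Group Lasso (Simon, 2013)[3] and L1 Group Bridge (Huang, 2009)[4].

Model. Suppose there are M datasets and p covariates. Dataset m has n_m observations, with response y^m of dimension n_m × 1 and design matrix X^m of dimension n_m × p. For dataset m,

\[ y^{m} = X^{m}\beta^{m} + \varepsilon^{m} \qquad (1) \]

where \beta^{m} = (\beta^{m}_{1}, \dots, \beta^{m}_{p})^{T}, E(\varepsilon^{m}) = 0 and var(\varepsilon^{m}) = \sigma_{m}^{2}. The coefficients of covariate X_j across the M datasets form the group \beta_{j} = (\beta^{1}_{j}, \dots, \beta^{M}_{j}). The penalized estimate is

\[ \hat{\beta} = \arg\min_{\beta} \{ L(X, y; \beta) + P(\beta; \lambda) \} \qquad (2) \]

where y = (y^{1\prime}, \dots, y^{M\prime})^{\prime} is (\sum_{m=1}^{M} n_m) × 1, X = diag(X^{1}, \dots, X^{M}) is (\sum_{m=1}^{M} n_m) × Mp, and \beta = (\beta^{1\prime}, \dots, \beta^{M\prime})^{\prime} is Mp × 1. The loss is additive over datasets, L(X, y; \beta) = \sum_{m=1}^{M} L(X^{m}, y^{m}; \beta^{m}); for least squares, L(X, y; \beta) = (y - X\beta)^{\prime}(y - X\beta). P(\beta; \lambda) is the penalty function and \lambda is the tuning parameter. Depending on the structure assumed across datasets, the methods fall into three classes: homogeneity, heterogeneity and network integrative analysis.

Homogeneity integrative analysis. Under the homogeneity model, a covariate X_j is either important in all M datasets or in none of them:

\[ I(\beta^{1}_{j} = 0) = \dots = I(\beta^{M}_{j} = 0), \quad j = 1, \dots, p \qquad (3) \]

so each group \beta_{j} is selected on an "all-in-all-out" basis. The penalties can be written in the composite form

\[ P(\beta; \lambda) = P_{outer}\Big( \sum_{k=1}^{p_j} P_{inner}(|\beta_{jk}|); \lambda \Big) \qquad (4) \]

with outer penalty P_{outer} and inner penalty P_{inner}. In the homogeneity model, P_{inner} is the Ridge penalty (Hoerl and Kennard, 1970), which does not shrink individual coefficients to exactly 0. Examples are L2 Group Bridge and L2 Group MCP.

1. L2 Group Bridge. Ma (2011a)[5] proposed L2 Group Bridge for the logistic model. The inner penalty is Ridge and the outer penalty is Bridge:

\[ P(\beta; \lambda, \gamma) = \lambda \sum_{j=1}^{p} \|\beta_{j}\|^{\gamma} = \lambda \sum_{j=1}^{p} \Big[ \Big( \sum_{i=1}^{M} (\beta^{i}_{j})^{2} \Big)^{1/2} \Big]^{\gamma}, \quad 0 < \gamma < 1. \]

Group Lasso and L2 Group Bridge are compared in [6]. Ma (2012)[7] applied L2 Group Bridge to the AFT (Accelerated Failure Time) model.

2. L2 Group MCP[8][9]. See Ma (2011b)[10]. The inner penalty is Ridge and the outer penalty is MCP:

\[ P(\beta; \lambda, a) = \sum_{j=1}^{p} P_{MCP}(\|\beta_{j}\|; \lambda, a) \]

where P_{MCP}(\cdot) is the MCP penalty

\[ P_{MCP}(\theta; \lambda, a) = \begin{cases} \lambda\theta - \dfrac{\theta^{2}}{2a}, & \theta \le a\lambda \\ \dfrac{a\lambda^{2}}{2}, & \theta > a\lambda \end{cases} \qquad P'_{MCP}(\theta; \lambda, a) = \begin{cases} \lambda - \dfrac{\theta}{a}, & \theta \le a\lambda \\ 0, & \theta > a\lambda \end{cases} \]

and a is the regularization parameter of MCP. Liu (2014)[11] also studied L2 Group MCP.

3. Group Lasso. See Zhang (2015)[12]. The penalty is

\[ P(\beta; \lambda) = \lambda \sum_{j=1}^{p} \|\beta_{j}\| . \]

Related homogeneity penalties include L2 Group SCAD, L2 Group MCP, CAP and adaptive Group Lasso (Wang and Leng, 2006).

Heterogeneity integrative analysis. Under the heterogeneity model, a covariate X_j may be important in some datasets and not in others, so the indicators I(\beta^{m}_{j} = 0), m = 1, \dots, M, may differ. Selection is therefore needed both across groups and within groups. Two kinds of penalty are used.

1. Composite penalties. These take form (4) with an inner penalty other than Ridge. L1 Group MCP[11] uses Lasso as the inner penalty and MCP as the outer penalty:

\[ P(\beta; \lambda, a) = \sum_{j=1}^{p} P_{MCP}\Big( \sum_{m=1}^{M} |\beta^{m}_{j}|; \lambda, a \Big). \]

Because Lasso does not have the Oracle property (Fan and Li, 2001), Liu (2014a) proposed Composite MCP, in which both the inner and the outer penalty are MCP:

\[ P(\beta; \lambda, a, b) = \sum_{j=1}^{p} P_{MCP}\Big( \sum_{m=1}^{M} P_{MCP}(|\beta^{m}_{j}|; \lambda, a); \lambda, b \Big). \]

Zhang (2015) compared Composite MCP with L1 Group MCP. L1 Group Bridge (Huang, 2009; Shi, 2014[13]) uses Lasso as the inner penalty and Bridge as the outer penalty:

\[ P(\beta; \lambda) = \lambda \sum_{j=1}^{p} p_{j} \|\beta_{j}\|_{1}^{\gamma} . \]

2. Sparse group penalties. These add a group-level penalty and an individual-level penalty:

\[ P(\beta; \lambda_{1}, \lambda_{2}) = \lambda_{1} \sum_{j=1}^{p} P_{1}(\|\beta_{j}\|) + \lambda_{2} \sum_{j=1}^{p} \sum_{m=1}^{M} P_{2}(|\beta^{m}_{j}|) \]

where P_{1}(\cdot) and P_{2}(\cdot) are penalty functions (Zhang, 2015). In Sparse Group MCP, both P_{1}(\cdot) and P_{2}(\cdot) are MCP. Sparse Group Lasso (SGL; Simon, 2013) and adaptive Sparse Group Lasso (adSGL; Fang, 2014[14]) use

\[ P_{SGL}(\beta; \lambda_{1}, \lambda_{2}) = \lambda_{1} \sum_{j=1}^{p} \|\beta_{j}\| + \lambda_{2} \|\beta\|_{1} \]

\[ P_{adSGL}(\beta; \lambda_{1}, \lambda_{2}) = \lambda_{1} \sum_{j=1}^{p} w_{j} \|\beta_{j}\|_{2} + \lambda_{2} \xi^{T} |\beta| \]

SGL combines Lasso and Group Lasso. Fang (2014) introduced the weights w and \xi in adSGL to address the Oracle property.

Network integrative analysis. Network methods use the within-dataset structure among covariates and the across-dataset structure among datasets. For the within-dataset structure, Liu (2013)[15] proposed

\[ P(\beta; \lambda) = \lambda \sum_{1 \le j < k \le p} a_{jk} \Big( \frac{\|\beta_{j}\|_{2}}{\sqrt{M_{j}}} - \frac{\|\beta_{k}\|_{2}}{\sqrt{M_{k}}} \Big)^{2} \qquad (5) \]

where a_{jk} measures the connection between X_j and X_k, and the L2 norms of \beta_{j} and \beta_{k} are compared after scaling. Liu (2013) combined this penalty with L2 Group MCP. For the across-dataset structure, Shi (2014) proposed the Contrast penalty

\[ P_{C}(\beta) = \lambda \sum_{j=1}^{p} \sum_{k \ne l} a^{kl}_{j} (\beta^{k}_{j} - \beta^{l}_{j})^{2} \qquad (6) \]

where a^{kl}_{j} = I(sgn(\beta^{k}_{j}) = sgn(\beta^{l}_{j})): the difference is penalized when the effects of X_j in datasets k and l have the same sign, and is not penalized when sgn(\beta^{k}_{j}) \ne sgn(\beta^{l}_{j}). The Contrast penalty is combined with L2 Group Bridge and L1 Group Bridge.

Computation. The estimates are computed by Group Coordinate Descent (GCD; Yuan and Lin, 2006), an extension of Coordinate Descent (CD; Fu, 1998). GCD was first used for Group Lasso, and Meier (2008) used it for logistic Group Lasso. The steps given by Zhao (2015)[16] are:

(1) Initialize \beta^{(0)} = (\beta^{(0)T}_{1}, \dots, \beta^{(0)T}_{p})^{T}, set s = 0 and compute the residual r = y - X\beta^{(0)}.
(2) For j \in \{1, \dots, p\}, hold \beta_{k}, k \ne j, fixed and update \beta_{j} = (\beta^{1}_{j}, \dots, \beta^{M}_{j})^{\prime}:
    ① compute z_{j} = X^{\prime}_{j} r / n + \beta^{(s)}_{j}, where X_{j} is the part of the design matrix corresponding to \beta_{j};
    ② update \beta^{(s+1)}_{j} \leftarrow F(z_{j}; \lambda), where F(\cdot; \lambda) is the update operator determined by the penalty;
    ③ update r \leftarrow r - X_{j}(\beta^{(s+1)}_{j} - \beta^{(s)}_{j}).
(3) Set s \leftarrow s + 1.
(4) Repeat steps (2) and (3) until convergence.

Convergence follows from Tseng (2001), whose result covers objectives L(\beta; y, X) + P(\beta; \lambda) in which the penalty is separable across groups, P(\beta; \lambda) = \sum_{j=1}^{p} f_{\lambda}(\beta_{j}), as is the case for Group Lasso.

Tuning parameter. The solution path is computed over a grid of \lambda values. \lambda_{max} is the smallest \lambda for which \beta = 0, and \lambda_{min} = 0.001 \lambda_{max}. Criteria for choosing \lambda include Cross Validation (CV), GCV, GIC, AIC, BIC, RIC and Cp. Ma (2011a, 2011b, 2012) used CV. k-fold CV proceeds as follows: ① split the data into k parts; ② fit the model on k - 1 parts and evaluate it on the remaining part; ③ repeat step ② for each part; ④ choose the \lambda with the smallest cross-validated error.

Simulation. Three examples are considered, with 18, 24 and 30 nonzero coefficients respectively. Following Zhang (2015) and Liu (2014): ① the covariates satisfy cov(X_i, X_j) = \rho^{|i-j|}, where \rho controls the correlation between X_i and X_j; ② the nonzero coefficients are drawn from U(0.5, 1) \cup U(-1, -0.5), with \sigma = 0.5. The methods compared are MCP, L1 Group MCP, L1 Group Bridge and Composite MCP. The measures are P, the number of selected covariates, and TP, the number of true positives. Results over 100 replications are given in Table 1. In line with the abstract, L1 Group Bridge, L1 Group MCP and Composite MCP perform well, and L1 Group Bridge has the fewest false positives and is the most stable.

Table 1. Simulation results. Each cell gives the mean, with the accompanying dispersion value in parentheses.

Example  ρ    MCP                          L1 Group MCP                  L1 Group Bridge              Composite MCP
              P             TP             P              TP             P             TP             P              TP
1        0.2  21.91 (3.49)  18 (0)         30.12 (13.08)  18 (0)         19.14 (0.80)  18 (0)         29.94 (10.80)  18 (0)
1        0.5  25.86 (5.45)  17.93 (0.26)   27.87 (9.57)   18 (0)         18.62 (0.72)  18 (0)         26.25 (8.00)   18 (0)
1        0.8  16.90 (4.92)  10.57 (1.13)   19.00 (2.34)   16.97 (0.91)   18.19 (1.10)  17.05 (0.91)   19.02 (2.40)   16.95 (0.92)
2        0.2  30.34 (5.44)  23.97 (0.30)   44.14 (14.11)  24 (0)         25.30 (0.95)  24 (0)         42.48 (12.87)  24 (0)
2        0.5  28.17 (4.05)  23.90 (0.33)   34.32 (7.64)   24 (0)         24.40 (1.01)  23.76 (0.45)   34.14 (6.27)   24 (0)
2        0.8  20.10 (4.70)  12.47 (1.35)   24.66 (2.36)   22.10 (1.53)   21.79 (1.46)  21.49 (1.27)   24.72 (2.26)   22.09 (1.53)
3        0.2  35.30 (5.93)  29.98 (0.20)   48.23 (11.76)  30 (0)         31.09 (1.06)  29.98 (0.14)   48.10 (11.98)  30 (0)
3        0.5  32.18 (2.85)  29.19 (1.46)   38.02 (4.46)   30 (0)         30.22 (0.80)  29.79 (0.41)   37.86 (4.24)   30 (0)
3        0.8  22.31 (4.71)  14.19 (1.51)   25.72 (2.47)   24.15 (2.54)   27.69 (1.11)  27.37 (1.05)   25.68 (2.47)   24.13 (2.50)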
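The group coordinate descent steps (compute z_j, apply F(z_j; λ), update the residual r) and the rule λ_min = 0.001 λ_max can be illustrated with a small Python sketch. It is not the authors' implementation. It rests on several assumptions that the source does not state: the loss for dataset m is scaled as 1/(2 n_m) times the residual sum of squares, every column of every X^m is centred and scaled so that its squared norm equals n_m, responses are centred, the λ grid is log-spaced, and F is taken to be group soft-thresholding (Group Lasso) or group firm-thresholding (L2 Group MCP, a > 1). All function names are hypothetical.

```python
import numpy as np

def mcp_penalty(theta, lam, a):
    """MCP value: lam*t - t^2/(2a) if t <= a*lam, else a*lam^2/2, with t = |theta|."""
    t = np.abs(np.asarray(theta, dtype=float))
    return np.where(t <= a * lam, lam * t - t**2 / (2 * a), a * lam**2 / 2)

def group_threshold(z, lam, a=None):
    """Assumed form of F(z; lam): group soft-threshold (a=None, Group Lasso)
    or group firm-threshold (L2 Group MCP with a > 1)."""
    norm = np.linalg.norm(z)
    if norm <= lam:
        return np.zeros_like(z)
    soft = (1.0 - lam / norm) * z
    if a is None:
        return soft
    return soft * a / (a - 1.0) if norm <= a * lam else z.copy()

def standardize(X):
    """Centre each column and scale it so that its squared norm equals n."""
    X = X - X.mean(axis=0)
    return X / np.sqrt((X**2).mean(axis=0))

def lambda_grid(Xs, ys, n_lambda=20):
    """Log-spaced grid (an assumption) from lam_max down to 0.001 * lam_max."""
    z0 = np.array([X.T @ y / len(y) for X, y in zip(Xs, ys)])  # shape (M, p)
    lam_max = np.linalg.norm(z0, axis=0).max()  # all groups are zero at this value
    return np.geomspace(lam_max, 0.001 * lam_max, n_lambda)

def gcd(Xs, ys, lam, a=None, max_iter=500, tol=1e-8):
    """Group coordinate descent; beta[m, j] is the coefficient of covariate j in dataset m."""
    M, p = len(Xs), Xs[0].shape[1]
    beta = np.zeros((M, p))
    res = [np.array(y, dtype=float) for y in ys]  # r = y - X beta with beta = 0
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # z_j = X_j' r / n + beta_j, computed dataset by dataset
            z = np.array([Xs[m][:, j] @ res[m] / len(res[m]) + beta[m, j]
                          for m in range(M)])
            new = group_threshold(z, lam, a)
            for m in range(M):
                # r <- r - X_j (beta_j_new - beta_j_old)
                res[m] -= Xs[m][:, j] * (new[m] - beta[m, j])
            max_change = max(max_change, np.abs(new - beta[:, j]).max())
            beta[:, j] = new
        if max_change < tol:
            break
    return beta

# Toy data (hypothetical): 3 datasets, 10 covariates, the first two are active everywhere.
rng = np.random.default_rng(0)
M, n, p = 3, 100, 10
Xs = [standardize(rng.normal(size=(n, p))) for _ in range(M)]
true_beta = np.zeros((M, p))
true_beta[:, :2] = 2.0
ys = []
for m in range(M):
    y = Xs[m] @ true_beta[m] + 0.5 * rng.normal(size=n)
    ys.append(y - y.mean())

lams = lambda_grid(Xs, ys)
beta_lasso = gcd(Xs, ys, lams[5])         # Group Lasso update
beta_mcp = gcd(Xs, ys, lams[5], a=3.0)    # L2 Group MCP update
selected = np.flatnonzero(np.linalg.norm(beta_mcp, axis=0) > 0)
print("selected covariates:", selected)
```

Because both update rules rescale the whole vector z_j, each covariate enters or leaves all datasets together, which is the "all-in-all-out" behaviour of the homogeneity model. A heterogeneity penalty such as L1 Group MCP or Composite MCP would need a different update operator F and is not covered by this sketch.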
Document title: Integrative Analysis for Big Data (大数据的整合分析方法), Ma Shuangge
Link: https://www.777doc.com/doc-5097843 .html