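Appendix D
Matrix calculus

From too much study, and from extreme passion, cometh madnesse.
−Isaac Newton [150, §5]

D.1 Directional derivative, Taylor series

D.1.1 Gradients

Gradient of a differentiable real function $f(x) : \mathbb{R}^K \to \mathbb{R}$ with respect to its vector argument is defined in terms of partial derivatives

$$
\nabla f(x) \triangleq
\begin{bmatrix}
\dfrac{\partial f(x)}{\partial x_1} \\[1ex]
\dfrac{\partial f(x)}{\partial x_2} \\
\vdots \\
\dfrac{\partial f(x)}{\partial x_K}
\end{bmatrix}
\in \mathbb{R}^K
\tag{1719}
$$

while the second-order gradient of the twice differentiable real function with respect to its vector argument is traditionally called the Hessian;

$$
\nabla^2 f(x) \triangleq
\begin{bmatrix}
\dfrac{\partial^2 f(x)}{\partial x_1^2} & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_K} \\[1ex]
\dfrac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_2 \partial x_K} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f(x)}{\partial x_K \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_K \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_K^2}
\end{bmatrix}
\in \mathbb{S}^K
\tag{1720}
$$

© 2001 Jon Dattorro. co&edg version 2010.01.05. All rights reserved. citation: Dattorro, Convex Optimization & Euclidean Distance Geometry, Mεβoo Publishing USA, 2005, v2010.01.05.

The gradient of vector-valued function $v(x) : \mathbb{R} \to \mathbb{R}^N$ on real domain is a row-vector

$$
\nabla v(x) \triangleq
\begin{bmatrix}
\dfrac{\partial v_1(x)}{\partial x} & \dfrac{\partial v_2(x)}{\partial x} & \cdots & \dfrac{\partial v_N(x)}{\partial x}
\end{bmatrix}
\in \mathbb{R}^N
\tag{1721}
$$

while the second-order gradient is

$$
\nabla^2 v(x) \triangleq
\begin{bmatrix}
\dfrac{\partial^2 v_1(x)}{\partial x^2} & \dfrac{\partial^2 v_2(x)}{\partial x^2} & \cdots & \dfrac{\partial^2 v_N(x)}{\partial x^2}
\end{bmatrix}
\in \mathbb{R}^N
\tag{1722}
$$

Gradient of vector-valued function $h(x) : \mathbb{R}^K \to \mathbb{R}^N$ on vector domain is

$$
\nabla h(x) \triangleq
\begin{bmatrix}
\dfrac{\partial h_1(x)}{\partial x_1} & \dfrac{\partial h_2(x)}{\partial x_1} & \cdots & \dfrac{\partial h_N(x)}{\partial x_1} \\[1ex]
\dfrac{\partial h_1(x)}{\partial x_2} & \dfrac{\partial h_2(x)}{\partial x_2} & \cdots & \dfrac{\partial h_N(x)}{\partial x_2} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial h_1(x)}{\partial x_K} & \dfrac{\partial h_2(x)}{\partial x_K} & \cdots & \dfrac{\partial h_N(x)}{\partial x_K}
\end{bmatrix}
=
\begin{bmatrix}
\nabla h_1(x) & \nabla h_2(x) & \cdots & \nabla h_N(x)
\end{bmatrix}
\in \mathbb{R}^{K \times N}
\tag{1723}
$$

while the second-order gradient has a three-dimensional representation dubbed cubix;$^{\text{D.1}}$

$$
\nabla^2 h(x) \triangleq
\begin{bmatrix}
\nabla \dfrac{\partial h_1(x)}{\partial x_1} & \nabla \dfrac{\partial h_2(x)}{\partial x_1} & \cdots & \nabla \dfrac{\partial h_N(x)}{\partial x_1} \\[1ex]
\nabla \dfrac{\partial h_1(x)}{\partial x_2} & \nabla \dfrac{\partial h_2(x)}{\partial x_2} & \cdots & \nabla \dfrac{\partial h_N(x)}{\partial x_2} \\
\vdots & \vdots & & \vdots \\
\nabla \dfrac{\partial h_1(x)}{\partial x_K} & \nabla \dfrac{\partial h_2(x)}{\partial x_K} & \cdots & \nabla \dfrac{\partial h_N(x)}{\partial x_K}
\end{bmatrix}
=
\begin{bmatrix}
\nabla^2 h_1(x) & \nabla^2 h_2(x) & \cdots & \nabla^2 h_N(x)
\end{bmatrix}
\in \mathbb{R}^{K \times N \times K}
\tag{1724}
$$

where the gradient of each real entry is with respect to vector $x$ as in (1719).

Footnote D.1: The word matrix comes from the Latin for womb; related to the prefix matri- derived from mater meaning mother.

The gradient of real function $g(X) : \mathbb{R}^{K \times L} \to \mathbb{R}$ on matrix domain is

$$
\begin{aligned}
\nabla g(X) &\triangleq
\begin{bmatrix}
\dfrac{\partial g(X)}{\partial X_{11}} & \dfrac{\partial g(X)}{\partial X_{12}} & \cdots & \dfrac{\partial g(X)}{\partial X_{1L}} \\[1ex]
\dfrac{\partial g(X)}{\partial X_{21}} & \dfrac{\partial g(X)}{\partial X_{22}} & \cdots & \dfrac{\partial g(X)}{\partial X_{2L}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial g(X)}{\partial X_{K1}} & \dfrac{\partial g(X)}{\partial X_{K2}} & \cdots & \dfrac{\partial g(X)}{\partial X_{KL}}
\end{bmatrix}
\in \mathbb{R}^{K \times L} \\[1ex]
&=
\begin{bmatrix}
\nabla_{X(:,1)}\, g(X) & \nabla_{X(:,2)}\, g(X) & \cdots & \nabla_{X(:,L)}\, g(X)
\end{bmatrix}
\in \mathbb{R}^{K \times 1 \times L}
\end{aligned}
\tag{1725}
$$

where the gradient $\nabla_{X(:,i)}$ is with respect to the $i$th column of $X$. The strange appearance of (1725) in $\mathbb{R}^{K \times 1 \times L}$ is meant to suggest a third dimension perpendicular to the page (not a diagonal matrix). The second-order gradient has representation

$$
\begin{aligned}
\nabla^2 g(X) &\triangleq
\begin{bmatrix}
\nabla \dfrac{\partial g(X)}{\partial X_{11}} & \nabla \dfrac{\partial g(X)}{\partial X_{12}} & \cdots & \nabla \dfrac{\partial g(X)}{\partial X_{1L}} \\[1ex]
\nabla \dfrac{\partial g(X)}{\partial X_{21}} & \nabla \dfrac{\partial g(X)}{\partial X_{22}} & \cdots & \nabla \dfrac{\partial g(X)}{\partial X_{2L}} \\
\vdots & \vdots & & \vdots \\
\nabla \dfrac{\partial g(X)}{\partial X_{K1}} & \nabla \dfrac{\partial g(X)}{\partial X_{K2}} & \cdots & \nabla \dfrac{\partial g(X)}{\partial X_{KL}}
\end{bmatrix}
\in \mathbb{R}^{K \times L \times K \times L} \\[1ex]
&=
\begin{bmatrix}
\nabla \nabla_{X(:,1)}\, g(X) & \nabla \nabla_{X(:,2)}\, g(X) & \cdots & \nabla \nabla_{X(:,L)}\, g(X)
\end{bmatrix}
\in \mathbb{R}^{K \times 1 \times L \times K \times L}
\end{aligned}
\tag{1726}
$$

where the gradient $\nabla$ is with respect to matrix $X$.

Gradient of vector-valued function $g(X) : \mathbb{R}^{K \times L} \to \mathbb{R}^N$ on matrix domain is a cubix

$$
\begin{aligned}
\nabla g(X) &\triangleq
\begin{bmatrix}
\nabla_{X(:,1)}\, g_1(X) & \nabla_{X(:,1)}\, g_2(X) & \cdots & \nabla_{X(:,1)}\, g_N(X) \\
\nabla_{X(:,2)}\, g_1(X) & \nabla_{X(:,2)}\, g_2(X) & \cdots & \nabla_{X(:,2)}\, g_N(X) \\
\vdots & \vdots & & \vdots \\
\nabla_{X(:,L)}\, g_1(X) & \nabla_{X(:,L)}\, g_2(X) & \cdots & \nabla_{X(:,L)}\, g_N(X)
\end{bmatrix} \\[1ex]
&=
\begin{bmatrix}
\nabla g_1(X) & \nabla g_2(X) & \cdots & \nabla g_N(X)
\end{bmatrix}
\in \mathbb{R}^{K \times N \times L}
\end{aligned}
\tag{1727}
$$

while the second-order gradient has a five-dimensional representation;

$$
\begin{aligned}
\nabla^2 g(X) &\triangleq
\begin{bmatrix}
\nabla \nabla_{X(:,1)}\, g_1(X) & \nabla \nabla_{X(:,1)}\, g_2(X) & \cdots & \nabla \nabla_{X(:,1)}\, g_N(X) \\
\nabla \nabla_{X(:,2)}\, g_1(X) & \nabla \nabla_{X(:,2)}\, g_2(X) & \cdots & \nabla \nabla_{X(:,2)}\, g_N(X) \\
\vdots & \vdots & & \vdots \\
\nabla \nabla_{X(:,L)}\, g_1(X) & \nabla \nabla_{X(:,L)}\, g_2(X) & \cdots & \nabla \nabla_{X(:,L)}\, g_N(X)
\end{bmatrix} \\[1ex]
&=
\begin{bmatrix}
\nabla^2 g_1(X) & \nabla^2 g_2(X) & \cdots & \nabla^2 g_N(X)
\end{bmatrix}
\in \mathbb{R}^{K \times N \times L \times K \times L}
\end{aligned}
\tag{1728}
$$

The gradient of matrix-valued function $g(X) : \mathbb{R}^{K \times L} \to \mathbb{R}^{M \times N}$ on matrix domain has a four-dimensional representation called quartix (fourth-order tensor)

$$
\nabla g(X) \triangleq
\begin{bmatrix}
\nabla g_{11}(X) & \nabla g_{12}(X) & \cdots & \nabla g_{1N}(X) \\
\nabla g_{21}(X) & \nabla g_{22}(X) & \cdots & \nabla g_{2N}(X) \\
\vdots & \vdots & & \vdots \\
\nabla g_{M1}(X) & \nabla g_{M2}(X) & \cdots & \nabla g_{MN}(X)
\end{bmatrix}
\in \mathbb{R}^{M \times N \times K \times L}
\tag{1729}
$$

while the second-order gradient has six-dimensional representation

$$
\nabla^2 g(X) \triangleq
\begin{bmatrix}
\nabla^2 g_{11}(X) & \nabla^2 g_{12}(X) & \cdots & \nabla^2 g_{1N}(X) \\
\nabla^2 g_{21}(X) & \nabla^2 g_{22}(X) & \cdots & \nabla^2 g_{2N}(X) \\
\vdots & \vdots & & \vdots \\
\nabla^2 g_{M1}(X) & \nabla^2 g_{M2}(X) & \cdots & \nabla^2 g_{MN}(X)
\end{bmatrix}
\in \mathbb{R}^{M \times N \times K \times L \times K \times L}
\tag{1730}
$$

and so on.

D.1.2 Product rules for matrix-functions

Given dimensionally compatible matrix-valued functions of matrix variable $f(X)$ and $g(X)$

$$
\nabla_X \big( f(X)^{T} g(X) \big) = \nabla_X(f)\, g + \nabla_X(g)\, f
\tag{1731}
$$

while [51, §8.3] [309]

$$
\nabla_X \operatorname{tr}\big( f(X)^{T} g(X) \big)
= \nabla_X \Big( \operatorname{tr}\big( f(X)^{T} g(Z) \big) + \operatorname{tr}\big( g(X)\, f(Z)^{T} \big) \Big) \Big|_{Z \leftarrow X}
\tag{1732}
$$

These expressions implicitly apply as well to scalar-, vector-, or matrix-valued functions of scalar, vector, or matrix arguments.

D.1.2.0.1 Example. Cubix.

Suppose $f(X) : \mathbb{R}^{2 \times 2} \to \mathbb{R}^2 = X^{T}a$ and $g(X) : \mathbb{R}^{2 \times 2} \to \mathbb{R}^2 = Xb$. We wish to find

$$
\nabla_X \big( f(X)^{T} g(X) \big) = \nabla_X\, a^{T} X^2 b
\tag{1733}
$$

using the product rule. Formula (1731) calls for

$$
\nabla_X\, a^{T} X^2 b = \nabla_X(X^{T}a)\, Xb + \nabla_X(Xb)\, X^{T}a
\tag{1734}
$$

Consider the first of the two terms:

$$
\nabla_X(f)\, g = \nabla_X(X^{T}a)\, Xb
= \begin{bmatrix} \nabla (X^{T}a)_1 & \nabla (X^{T}a)_2 \end{bmatrix} Xb
\tag{1735}
$$

The gradient of $X^{T}a$ forms a cubix in $\mathbb{R}^{2 \times 2 \times 2}$; a.k.a. third-order tensor.

$$
\nabla_X(X^{T}a)\, Xb =
\begin{bmatrix}
\left( \dfrac{\partial (X^{T}a)_1}{\partial X_{11}},\ \dfrac{\partial (X^{T}a)_2}{\partial X_{11}} \right) &
\left( \dfrac{\partial (X^{T}a)_1}{\partial X_{12}},\ \dfrac{\partial (X^{T}a)_2}{\partial X_{12}} \right) \\[2ex]
\left( \dfrac{\partial (X^{T}a)_1}{\partial X_{21}},\ \dfrac{\partial (X^{T}a)_2}{\partial X_{21}} \right) &
\left( \dfrac{\partial (X^{T}a)_1}{\partial X_{22}},\ \dfrac{\partial (X^{T}a)_2}{\partial X_{22}} \right)
\end{bmatrix}
\begin{bmatrix} (Xb)_1 \\ (Xb)_2 \end{bmatrix}
\in \mathbb{R}^{2 \times 1 \times 2}
\tag{1736}
$$

Because gradient of the product (1733) requires total change with respect to change in each entry of matrix $X$, the $Xb$ vector must make an inner product with each vector in the second dimension of the cubix (written here as parenthesized pairs; the original figure joins them with dotted line segments);

$$
\begin{aligned}
\nabla_X(X^{T}a)\, Xb &=
\begin{bmatrix}
(a_1,\ 0) & (0,\ a_1) \\
(a_2,\ 0) & (0,\ a_2)
\end{bmatrix}
\begin{bmatrix} b_1 X_{11} + b_2 X_{12} \\ b_1 X_{21} + b_2 X_{22} \end{bmatrix}
\in \mathbb{R}^{2 \times 1 \times 2} \\[1ex]
&=
\begin{bmatrix}
a_1 (b_1 X_{11} + b_2 X_{12}) & a_1 (b_1 X_{21} + b_2 X_{22}) \\
a_2 (b_1 X_{11} + b_2 X_{12}) & a_2 (b_1 X_{21} + b_2 X_{22})
\end{bmatrix}
\in \mathbb{R}^{2 \times 2} \\[1ex]
&= a b^{T} X^{T}
\end{aligned}
\tag{1737}
$$

where the cubix appears as a complete $2 \times 2 \times 2$ matrix. In like manner for the second term $\nabla_X(g)\, f$

$$
\begin{aligned}
\nabla_X(Xb)\, X^{T}a &=
\begin{bmatrix}
(b_1,\ 0) & (b_2,\ 0) \\
(0,\ b_1) & (0,\ b_2)
\end{bmatrix}
\begin{bmatrix} X_{11} a_1 + X_{21} a_2 \\ X_{12} a_1 + X_{22} a_2 \end{bmatrix}
\in \mathbb{R}^{2 \times 1 \times 2} \\[1ex]
&= X^{T} a b^{T} \in \mathbb{R}^{2 \times 2}
\end{aligned}
\tag{1738}
$$

The solution

$$
\nabla_X\, a^{T} X^2 b = a b^{T} X^{T} + X^{T} a b^{T}
\tag{1739}
$$

can be found from Table D.2.1 or verified using (1732). □

D.1.2.1 Kronecker product

A partial remedy for venturing into hyperdimensional matrix representations, such as the cubix or quartix, is to first vectorize matrices as in (37). This device gives rise to the Kronecker product of matrices ⊗; a.k.a. direct product or tensor product. Although it sees reversal in the literature, [321, §2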
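
As a sanity check on the cubix example, the closed form (1739) can be compared against central finite differences of the scalar $a^{T}X^2b$. The sketch below is only illustrative: it is pure Python, the helper names (`matmul`, `quad_form`, `closed_form_gradient`, `finite_difference_gradient`) and the test values of `X`, `a`, `b` are assumptions chosen for this check, not anything taken from the source.

```python
# Numerical check of (1739): grad_X a^T X^2 b = a b^T X^T + X^T a b^T.
# All helper names and sample values are hypothetical, chosen for illustration.

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*A)]

def outer(u, v):
    """Outer product u v^T of two vectors."""
    return [[ui * vj for vj in v] for ui in u]

def quad_form(X, a, b):
    """The scalar a^T X^2 b."""
    X2 = matmul(X, X)
    return sum(a[i] * X2[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))

def closed_form_gradient(X, a, b):
    """Right-hand side of (1739)."""
    abT = outer(a, b)
    Xt = transpose(X)
    term1 = matmul(abT, Xt)  # a b^T X^T, as in (1737)
    term2 = matmul(Xt, abT)  # X^T a b^T, as in (1738)
    n = len(X)
    return [[term1[i][j] + term2[i][j] for j in range(n)] for i in range(n)]

def finite_difference_gradient(X, a, b, h=1e-5):
    """Central differences; entry (i, j) approximates d(a^T X^2 b)/dX_ij."""
    n = len(X)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (quad_form(Xp, a, b) - quad_form(Xm, a, b)) / (2 * h)
    return G

X = [[1.0, 2.0], [3.0, 4.0]]
a = [0.5, -1.0]
b = [2.0, 3.0]

G_closed = closed_form_gradient(X, a, b)
G_fd = finite_difference_gradient(X, a, b)
max_err = max(abs(G_closed[i][j] - G_fd[i][j])
              for i in range(2) for j in range(2))
print("largest entrywise discrepancy:", max_err)
```

Because $a^{T}X^2b$ is quadratic in the entries of $X$, central differences carry no truncation error here, so any discrepancy should come from floating-point rounding alone. Entry $(i,j)$ of both results follows the layout of (1725), that is, the partial derivative with respect to $X_{ij}$.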