您好,欢迎访问三七文档
当前位置:首页 > 商业/管理/HR > 项目/工程管理 > python实现PageRank计算
#coding=utf-8#Filename:pr.pyS=[[0,0,0,0],[0.3333,0,0,1],[0.3333,0.5,0,0],[0.3333,0.5,1,0]]#原始矩阵U=[[1,1,1,1],[1,1,1,1],[1,1,1,1],[1,1,1,1]]#全部都为1的矩阵f=[1,1,1,1]#物征向量alpha=0.85#a值0-1之间的小数n=len(S)#网页数'''aSa权重值由google决定值大小,0-1之间,S为原始矩阵'''defmultiGeneMatrix(gene,Matrix):mullist=[[0]*len(Matrix)forrowinrange(len(Matrix))]#定义新的矩阵大小,初始化为0foriinrange(0,len(Matrix)):forjinrange(0,len(Matrix)):mullist[i][j]+=Matrix[i][j]*genereturnmullist'''两个矩阵相加'''defaddMatrix(Matrix1,Matrix2):iflen(Matrix1[0])!=len(Matrix2):print这两个矩阵无法相加...returnaddlist=[[0]*len(Matrix1)forrowinrange(len(Matrix1))]#定义新的矩阵大小foriinrange(0,len(Matrix1)):forjinrange(0,len(Matrix2)):addlist[i][j]=Matrix1[i][j]+Matrix2[i][j]returnaddlist'''矩阵与向量相乘'''defmultiMatrixVector(m,v):rv=range(len(v))forrowinrange(0,len(m)):temp=0forcolinrange(0,len(m[1])):temp+=m[row][col]*v[col]rv[row]=tempreturnrv#公式f1=multiGeneMatrix(alpha,S)f2=multiGeneMatrix((1-alpha)/len(S[0]),U)G=addMatrix(f1,f2)printG#google矩阵#迭代过程count=0while(True):count=count+1pr_next=multiMatrixVector(G,f)print第%s轮迭代%countprintstr(round(pr_next[0],5))+\t+str(round(pr_next[1],5))+\t+str(round(pr_next[2],5))+\t+str(round(pr_next[3],5))ifround(f[0],5)==round(pr_next[0],5)andround(f[1],5)==round(pr_next[1],5)andround(f[2],5)==round(pr_next[2],5)andround(f[3],5)==round(pr_next[3],5):#当前向量与上次向量值偏差不大后,停止迭breakf=pr_nextprintPageRank值已计算完成运行结果:第1轮迭代0.151.28330.85831.70831第2轮迭代0.151.644550.73791.46746第3轮迭代0.151.439830.891431.51864第4轮迭代0.151.483330.804421.56213第5轮迭代0.151.52030.822911.50666第6轮迭代0.149991.473150.838621.53809第7轮迭代0.149991.499860.818581.5314第8轮迭代0.149991.494180.829931.52572第9轮迭代0.149991.489350.827511.53295第10轮迭代0.149991.49550.825461.52885第11轮迭代0.149991.492010.828071.52971第12轮迭代0.149991.492740.826591.53045第13轮迭代0.149991.493370.82691.5295第14轮迭代0.149991.492560.827171.53003第15轮迭代0.149991.493010.826821.52991第16轮迭代0.149991.492910.827011.52981第17轮迭代0.149991.492820.826971.52993第18轮迭代0.149991.492920.826931.52986第19轮迭代0.149991.492860.826971.52987第20轮迭代0.149991.492870.826951.52987第21轮迭代0.149991.492870.826951.52985第22轮迭代0.149991.492850.826951.52986第23轮迭代0.149991.492860.826941.52985第24轮迭代0.149991.492850.826941.52984第25轮迭代0.149991.492840.826941.52984第26轮迭代0.149991.492840.826941.52983第27轮迭代0.149981.492840.826931.52983第28轮迭代0.149981.492830.826931.52982第29轮迭代0.149981.492830.826931.52982PageRank值已计算完成#coding=utf-8#FileName:MapReducePageRank.py'''模拟map-reduce的思想,实现AB物理节点的分布计算。'''#矩阵与乘因子defmultiGeneMatrix(gene,Matrix):result=[[0]*len(Matrix[0])forrowinrange(len(Matrix[0]))]#定义大小一样新的矩阵,初始化为0foriinrange(0,len(Matrix[0])):forjinrange(0,len(Matrix[0])):result[i][j]=Matrix[i][j]*genereturnresult#两个矩阵相加defaddMatrix(Matrix1,Matrix2):iflen(Matrix1[0])!=len(Matrix2[1]):print这两个矩阵无法相加...returnaddList=[[0]*len(Matrix1[0])forrowinrange(len(Matrix1[0]))]foriinrange(0,len(Matrix1[0])):forjinrange(0,len(Matrix2[0])):addList[i][j]=Matrix1[i][j]+Matrix2[i][j]returnaddList#两个矩阵合并defaddColumnMatrix(Matrix1,Matrix2):result=Matrix1+Matrix2returnresult#矩阵与向量相乘defmultiMatrixVector(m,v):rv=range(len(m))forrowinrange(0,len(m)):temp=0forcolinrange(0,len(m[1])):temp+=m[row][col]*v[col]rv[row]=tempreturnrv'''按照map-reduce的思想,现在假设有物理节点A,B参与计算,其中网页1、2保存于A,网页3、4保存于B,试述完整的pagerank计算过程'''alpha=0.85#定义google权重值aS=[[0,0],[0.3333,0],[0.3333,0.5],[0.3333,0.5]]#假设A节点存放page1page2网页原始矩阵bS=[[0,0],[0,1],[0,0],[1,0]]#假设B节点存放page3page4网页原始矩阵U=[[1,1],[1,1],[1,1],[1,1]]#全部为1的矩阵n=len(aS)#得到网页个数#计算A节点的G矩阵fa1=multiGeneMatrix(alpha,aS)#google权重值与原始矩阵fa2=multiGeneMatrix((1-alpha)/n,U)#(1-a)/n*UaG=addMatrix(fa1,fa2)#aS+(1-a)/n*U#计算B节点的G矩阵fb1=multiGeneMatrix(alpha,bS)fb2=multiGeneMatrix((1-alpha)/n,U)bG=addMatrix(fb1,fb2)#AB节点特征向量qa_cur=[1,1]qb_cur=[1,1]count=1while(True):count=count+1#得到A节点qG#printaG#printqa_curqa_next=multiMatrixVector(aG,qa_cur);#得到B节点qGqb_next=multiMatrixVector(bG,qb_cur);#合并两个矩阵qab_next=addColumnMatrix(qa_next,qb_next);#小数位缩小到5位qa_cur[0]=round(qa_cur[0],5)qa_cur[1]=round(qa_cur[1],5)qb_cur[0]=round(qb_cur[0],5)qb_cur[1]=round(qb_cur[1],5)qab_next[0]=round(qab_next[0],5)qab_next[1]=round(qab_next[1],5)qab_next[2]=round(qab_next[2],5)qab_next[3]=round(qab_next[3],5)#判断是否收敛到十分接近ifqa_cur[0]==qab_next[0]andqa_cur[1]==qab_next[1]andqb_cur[0]==qab_next[2]andqb_cur[1]==qab_next[3]:breakqa_cur[0]=qab_next[0]qa_cur[1]=qab_next[1]qb_cur[0]=qab_next[2]qb_cur[1]=qab_next[3]sum=qa_cur[0]+qa_cur[1]+qb_cur[0]+qb_cur[1]printP1=+str(qa_cur[0]/sum)printP2=+str(qa_cur[1]/sum)printP3=+str(qb_cur[0]/sum)printP4=+str(qb_cur[1]/sum)print第%s轮迭代。%count运行结果:P1=0.0523267982976P2=0.249982557734P3=0.0523267982976P4=0.645363845671第2轮迭代。P1=0.0177595628415P2=0.0409836065574P3=0.0409836065574P4=0.900273224044第3轮迭代。P1=0.00261177626645P2=0.0085593855861P3=0.0417625610923P4=0.947066277055第4轮迭代。P1=0.000469766144541P2=0.00132121728152P3=0.0421027907045P4=0.956106225869第5轮迭代。P1=8.26733246251e-05P2=0.00023148530895P3=0.0421633955588P4=0.957522445808第6轮迭代。P1=1.86008444783e-05P2=3.72016889567e-05P3=0.0421681144324P4=0.957776083034第7轮迭代。P1=0.0P2=0.0P3=0.0421766145735P4=0.957823385426第8轮迭代。P1=0.0
本文标题:python实现PageRank计算
链接地址:https://www.777doc.com/doc-2853890 .html