Python Lab 13: Program Application Development, Breast Cancer Classification Programming
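Lab 13: Program Development and Application

Objectives:
1. Master the key steps of program development
2. Understand the idea behind divide-and-conquer algorithms

Contents:
• The breast cancer classification (breast-cancer-classifier) problem
– Problem description:
• Given a set of tumor samples, decide from the tumor data whether each tumor is malignant or benign.
• Each sample in the dataset has 9 tumor attributes, which can be treated as 9 categories.
• The data looks like this (patient ID, diagnosis 'b' or 'm', then the 9 attribute values):

    ('1000025','b',5,1,1,1,2,1,3,1,1)
    ('1002945','b',5,4,4,5,7,10,3,2,1)
    ('1015425','b',3,1,1,1,2,2,3,1,1)
    ('1016277','b',6,8,8,1,3,4,3,7,1)
    ('1017023','b',4,1,1,3,2,1,3,1,1)
    ('1017122','m',8,10,10,8,7,10,9,7,1)
    ('1018099','b',1,1,1,1,2,10,3,1,1)
    ('1018561','b',2,1,2,1,2,1,3,1,1)

• By studying these attributes, find a prediction pattern that determines a tumor's nature from its attributes.
• The classifier algorithm is designed as follows:
– Create the training set from the training file
– Create the classifier, using the training set to determine a cutoff value for each attribute
– Create the test set from the test file
– Use the classifier to classify the test set
– Compute the accuracy of these decisions
• Reference "big framework":

    def main():
        print("Reading training data...")
        trainfile = "test_data.txt"
        trainingSet = makeTrainingSet(trainfile)
        print("Creating the classifier")
        classifier = trainClassifier(trainingSet)
        print("Using the classifier to classify the test set")
        results = classifyTestSet(trainingSet, classifier)
        print("Computing the accuracy of these decisions")
        reportResults(results)

• Complete program:

    def ReadSet(FileName):
        # Read a data file into a list of tuples: (id, 'm' or 'b', a1, ..., a9).
        RSet = []
        # Open the file that was passed in (the original ignored FileName
        # and always opened the training file).
        with open(FileName, "r") as ReadFile:
            for line in ReadFile:
                line = line.strip()
                # Skip blank lines and records with missing values ('?').
                if not line or "?" in line:
                    continue
                id, a1, a2, a3, a4, a5, a6, a7, a8, a9, diag = line.split(",")
                # The fields are strings, so compare with "4" (malignant), not 4.
                if diag == "4":
                    diagMorB = "m"
                else:
                    diagMorB = "b"
                patientTup = (id, diagMorB, int(a1), int(a2), int(a3), int(a4),
                              int(a5), int(a6), int(a7), int(a8), int(a9))
                RSet.append(patientTup)
        return RSet

    def SumList(List1, List2):
        # Element-wise sum of two 9-element sequences.
        sums = [0.0] * 9
        for index in range(9):
            sums[index] = List1[index] + List2[index]
        return sums

    def MakeAvg(SumList, total):
        # Divide each of the 9 sums by total.
        AverageList = [0.0] * 9
        for index in range(9):
            AverageList[index] = SumList[index] / float(total)
        return AverageList

    def Classifier(TrainSet):
        # The cutoff for each attribute is the midpoint between the benign
        # average and the malignant average of that attribute.
        benignSum = [0] * 9
        benignCount = 0
        malignantSum = [0] * 9
        malignantCount = 0
        for patientTup in TrainSet:
            if patientTup[1] == "b":
                benignSum = SumList(patientTup[2:], benignSum)
                benignCount += 1
            else:
                malignantSum = SumList(patientTup[2:], malignantSum)
                malignantCount += 1
        benignAvg = MakeAvg(benignSum, benignCount)
        malignantAvg = MakeAvg(malignantSum, malignantCount)
        classifier = MakeAvg(SumList(benignAvg, malignantAvg), 2)
        return classifier

    def Test(TestSet, classifier):
        # Each attribute votes: above its cutoff counts as malignant,
        # otherwise as benign.
        Result = []
        for ReadTup in TestSet:
            bCount = 0
            mCount = 0
            for index in range(9):
                if ReadTup[index + 2] > classifier[index]:
                    mCount += 1
                else:
                    bCount += 1
            ResultTup = (ReadTup[0], bCount, mCount, ReadTup[1])
            Result.append(ResultTup)
        return Result

    def ShowResult(Result):
        totalCount = 0
        wrongCount = 0
        for RTup in Result:
            totalCount += 1
            if RTup[1] > RTup[2]:
                # Predicted benign, but the actual diagnosis is malignant.
                if RTup[-1] == "m":
                    wrongCount += 1
            elif RTup[-1] == "b":
                # Predicted malignant, but the actual diagnosis is benign.
                wrongCount += 1
        rightCount = totalCount - wrongCount
        accuracy = 100 * float(rightCount) / totalCount
        print("Total: %d  Wrong: %d  The accuracy: %d%%"
              % (totalCount, wrongCount, accuracy))

    def main():
        print("Reading in Train data...")
        TrainFileName = "e:\\breast-cancer-wisconsin.data.txt"
        RSet = ReadSet(TrainFileName)
        print("Read Set is done.")
        print("Forming a classifier...")
        classifier = Classifier(RSet)
        print("Classifier is done.")
        print("Reading in Test data...")
        TestFileName = "e:\\fullTestData.txt"
        TestSet = ReadSet(TestFileName)
        print("Test Set is done.")
        print("Start Testing...")
        Result = Test(TestSet, classifier)
        print("Test is done.")
        print("Show the result.")
        ShowResult(Result)
        print("Program is finished.")

    main()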
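The classification rule used in this lab (a per-attribute cutoff at the midpoint of the benign and malignant averages, followed by a majority vote over the 9 attributes) can be tried without any data files. The following is a minimal sketch that trains on only the eight sample records listed in the handout. The names `SAMPLES`, `column_means`, `train_thresholds` and `classify` are hypothetical helpers introduced for illustration, not part of the original program, and the sketch assumes Python 3.

```python
# Eight sample records copied from the handout:
# (patient id, diagnosis 'b'/'m', nine attribute scores).
SAMPLES = [
    ('1000025', 'b', 5, 1, 1, 1, 2, 1, 3, 1, 1),
    ('1002945', 'b', 5, 4, 4, 5, 7, 10, 3, 2, 1),
    ('1015425', 'b', 3, 1, 1, 1, 2, 2, 3, 1, 1),
    ('1016277', 'b', 6, 8, 8, 1, 3, 4, 3, 7, 1),
    ('1017023', 'b', 4, 1, 1, 3, 2, 1, 3, 1, 1),
    ('1017122', 'm', 8, 10, 10, 8, 7, 10, 9, 7, 1),
    ('1018099', 'b', 1, 1, 1, 1, 2, 10, 3, 1, 1),
    ('1018561', 'b', 2, 1, 2, 1, 2, 1, 3, 1, 1),
]


def column_means(rows):
    """Average each attribute column over the given rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train_thresholds(samples):
    """Return one cutoff per attribute: midpoint of the two class averages."""
    benign = [s[2:] for s in samples if s[1] == 'b']
    malignant = [s[2:] for s in samples if s[1] == 'm']
    b_avg = column_means(benign)
    m_avg = column_means(malignant)
    return [(b + m) / 2 for b, m in zip(b_avg, m_avg)]


def classify(sample, thresholds):
    """Majority vote: an attribute above its cutoff votes malignant."""
    m_votes = sum(1 for value, cut in zip(sample[2:], thresholds) if value > cut)
    b_votes = len(thresholds) - m_votes
    return 'b' if b_votes > m_votes else 'm'


thresholds = train_thresholds(SAMPLES)
predictions = [classify(s, thresholds) for s in SAMPLES]
correct = sum(1 for s, p in zip(SAMPLES, predictions) if s[1] == p)
print("Cutoffs:", [round(t, 3) for t in thresholds])
print("Correct: %d of %d" % (correct, len(SAMPLES)))
```

On these eight records the sketch reproduces every label, but it is scored on the same data it was trained on and sees only one malignant sample, so this says nothing about real accuracy. The lab's program uses a separate test file for that reason.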