14 Best Python Pandas Features

Pandas is one of the most widely used Python tools for data munging. It contains high-level data structures and manipulation tools designed to make data analysis fast and easy. In this post, I am going to discuss the pandas features I use most frequently. I will be using the olive oil data set for this tutorial; you can download the data set from this page (scroll down to the data section). Apart from serving as a quick reference, I hope this post will help you quickly start extracting value from Pandas. So let's get started!

1) Loading Data

"The Olive Oils data set has eight explanatory variables (levels of fatty acids in the oils) and nine classes (areas of Italy)". For more information you can check my IPython notebook. I am importing the numpy, pandas and matplotlib modules.

    # IPython magic: render plots inside the notebook
    %matplotlib inline
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

I am using pd.read_csv to load the olive oil data set from 'olive.csv'. The head function returns the first n rows of the resulting DataFrame. Here I am returning the first 5 rows.

2) Rename Function

I am going to rename the first column ('Unnamed: 0') to 'area_Idili'. The rename function takes a dictionary as its argument: the keys are the column names to be renamed (olive_oil.columns[0]) and the values are the new titles ('area_Idili'). olive_oil.columns returns the column names. inplace=True is used when you want to modify the existing DataFrame rather than get a modified copy back.

3) Map

One thing that I want to do is clean the area_Idili column and remove the numbers. I am using the map method to perform this operation. map applies a function to every element of a column. I am applying the split function to the area_Idili column. split returns a list, and the index -1 picks the last element of that list. A detailed explanation of lambda is given here. See how the split function works:

4) Apply and ApplyMap

I have a list of acids called list_of_acids. apply is a pretty flexible function: it applies a function along either axis of the DataFrame. I will be using apply to divide each acid value by 100.

    list_of_acids = ['palmitic', 'palmitoleic', 'stearic', 'oleic', 'linoleic', 'linolenic', 'arachidic', 'eicosenoic']

    # Each acid column is passed to the lambda as a Series and divided by 100
    df = olive_oil[list_of_acids].apply(lambda x: x / 100.00)
    df.head(5)

Similar to apply, the applymap function works element-wise on a DataFrame. Summing up: apply works on a row/column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.

5) Shape and Columns

The shape property returns a tuple with the dimensions of the data frame (number of rows, number of columns). olive_oil.columns gives you the column names.

6) Unique Function

olive_oil.region.unique() returns the unique entries in the region column; there are three unique regions (1, 2, 3). I am applying the same unique function to the area column; there are 9 unique areas.

7) Cross Tab

Cross tab computes a simple cross tabulation of two factors. Here I am applying cross tabulation to the area and region columns.

8) Accessing Sub Data Frames

The syntax for indexing multiple columns is given below. To index a single column you can use olive_oil['palmitic'] or olive_oil.palmitic.

9) Plotting

You can plot a histogram using the plt.hist function, for example plt.hist(olive_oil.palmitic). You can also generate subplots from a pandas data frame. Here I am generating 4 different subplots for the palmitic and linolenic columns. You can set the size of the figure using the figsize argument; nrows and ncols are simply the number of rows and columns of subplots.

10) Groupby and Statistics

groupby splits the data into 3 parts (regions 1, 2 and 3). The groupby function returns a dictionary-like object. Here I am grouping by region: olive_oil.groupby('region'). I am applying describe to the groups; describe takes any data frame and computes summary statistics on it. This is a quick way of getting statistics by group for any data frame. You can also calculate the standard deviation for each region using olive_oil.groupby('region').std().

11) Aggregate Function

The aggregate function takes a function as an argument and applies it to the columns of each groupby sub data frame. I am applying np.mean (which computes the mean) to all three regions.

12) Join

I am renaming the olmean and olstd columns.

    list_of_acids = ['palmitic', 'palmitoleic', 'stearic', 'oleic', 'linoleic', 'linolenic', 'arachidic', 'eicosenoic']

Pandas can do general merges. When we do that along an index, it's called a join. Here I make two sub data frames and join them on the common region index.

13) Masking

You can also mask a particular part of the data frame. olive_oil.eicosenoic < 0.05 checks each value in the eicosenoic column: if the value is less than 0.05 it returns True, otherwise it returns False. The result is a boolean Series.

    eico = (olive_oil.eicosenoic < 0.05)

14) Handling Missing Values

Missing data is common in most data analysis applications. I find the dropna and fillna functions very useful for handling missing data. I am creating a new data frame. dropna can be used to drop rows or columns with missing data (None). By default, it drops every row that has any missing entry. fillna can be used to fill in missing data (None). First, I am creating a data frame with a single column. I am then using fillna to replace the missing values with the mean of the DataFrame (data).

Conclusion

These are some of the important functions I use frequently while cleaning data. I highly recommend Wes McKinney's book Python for Data Analysis for learning pandas. Is there any other important pandas function that I missed?

Manu Jeevan is a data science and analytics blogger at BigData Examiner, where he writes about data science, Python and digital analytics.

Photo credit: Smithsonian's National Zoo/Foter
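
As a compact recap of several features discussed in this post (rename, map, apply, unique, crosstab, groupby, join, masking, dropna and fillna), here is a minimal sketch. It does not load olive.csv: the small DataFrame is a hypothetical stand-in with invented values, the "number.name" format of the first column is an assumption, and only two of the eight acid columns are included to keep it short. applymap is left out on purpose, because recent pandas releases deprecate it in favor of DataFrame.map.

```python
import pandas as pd

# Hypothetical stand-in for olive.csv; every value here is invented.
olive_oil = pd.DataFrame({
    "Unnamed: 0": ["1.North-Apulia", "2.Calabria", "3.South-Apulia",
                   "5.Inland-Sardinia", "9.Umbria", "9.Umbria"],
    "region": [1, 1, 1, 2, 3, 3],
    "palmitic": [1075.0, 1315.0, 1400.0, 1110.0, 1095.0, None],
    "eicosenoic": [29.0, 31.0, 28.0, 2.0, 1.0, 3.0],
})

# Rename: map the old name of the first column to the new name.
olive_oil = olive_oil.rename(columns={olive_oil.columns[0]: "area_Idili"})

# Map: split each value on "." and keep the last piece (drops the number).
olive_oil["area_Idili"] = olive_oil["area_Idili"].map(lambda s: s.split(".")[-1])

# Apply: divide every value in the acid columns by 100, column by column.
list_of_acids = ["palmitic", "eicosenoic"]
olive_oil[list_of_acids] = olive_oil[list_of_acids].apply(lambda col: col / 100.0)

# Unique: distinct values in the region column.
regions = olive_oil["region"].unique()

# Cross tab: count rows for each (area, region) combination.
area_by_region = pd.crosstab(olive_oil["area_Idili"], olive_oil["region"])

# Groupby + join: per-region mean and standard deviation, joined on the
# shared region index. Suffixes keep the column names distinct.
grouped = olive_oil.groupby("region")[list_of_acids]
olmean = grouped.mean().add_suffix("_mean")
olstd = grouped.std().add_suffix("_std")
stats = olmean.join(olstd)

# Masking: a boolean Series, then the rows where it is True.
eico = olive_oil["eicosenoic"] < 0.05
low_eico = olive_oil[eico]

# Missing values: drop incomplete rows, or fill with the column mean.
dropped = olive_oil.dropna()
filled = olive_oil.fillna({"palmitic": olive_oil["palmitic"].mean()})

print(area_by_region)
print(stats)
print(low_eico)
```

Regions with a single usable row have an undefined standard deviation, so some of the `_std` entries in `stats` come out as NaN with this toy data; with the full data set each region has many rows.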