10 Minutes to pandas (Chinese edition)
This article is a simple translation of "10 Minutes to pandas" from the official pandas website; the original is here (docs/stable/10min.html). It is a short introduction to pandas. For more detailed material, see the Cookbook (docs/stable/cookbook.html#cookbook). By convention, the required packages are imported as follows:

In [2]: import pandas as pd

In [3]: import numpy as np

In [4]: import matplotlib.pyplot as plt

I. Creating objects

See the Data Structure Intro Section (docs/stable/dsintro.html#dsintro) for details on the material in this section.

1. Create a Series by passing a list; pandas creates a default integer index:

In [5]: s = pd.Series([1,3,5,np.nan,6,8])

In [6]: s
Out[6]:
0     1
1     3
2     5
3   NaN
4     6
5     8
dtype: float64

2. Create a DataFrame by passing a numpy array, with a datetime index and labeled columns:

In [7]: dates = pd.date_range('20130101', periods=6)

In [8]: dates
Out[8]:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [9]: df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))

In [10]: df
Out[10]:
                   A         B         C         D
2013-01-01 -0.397636  1.091843  0.621826  0.643196
2013-01-02  0.047735 -0.693675  1.588932 -0.774878
2013-01-03 -0.309450 -0.951779  0.028030 -2.033464
2013-01-04 -0.905928  0.807286  1.032849  0.160724
2013-01-05 -0.195194 -0.221892 -1.843014 -1.704663
2013-01-06  0.773058 -1.486186  0.876408  2.159035

3. Create a DataFrame by passing a dict of objects that can be converted to series-like structures:

In [11]: df2 = pd.DataFrame({'A': 1.,
                             'B': pd.Timestamp('20130102'),
                             'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                             'D': np.array([3]*4, dtype='int32'),
                             'E': pd.Categorical(['test','train','test','train']),
                             'F': 'foo'})

In [12]: df2
Out[12]:
   A          B  C  D      E    F
0  1 2013-01-02  1  3   test  foo
1  1 2013-01-02  1  3  train  foo
2  1 2013-01-02  1  3   test  foo
3  1 2013-01-02  1  3  train  foo

4. Inspect the data type of each column:

In [13]: df2.dtypes
Out[13]:
A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object

5. Tab completion automatically recognizes all attributes as well as the user-defined column names.

II. Viewing data

For details, see the Basics Section (docs/stable/basics.html#basics).

1. View the top and bottom rows of the frame:

In [14]: df.head()
Out[14]:
                   A         B         C         D
2013-01-01 -0.397636  1.091843  0.621826  0.643196
2013-01-02  0.047735 -0.693675  1.588932 -0.774878
2013-01-03 -0.309450 -0.951779  0.028030 -2.033464
2013-01-04 -0.905928  0.807286  1.032849  0.160724
2013-01-05 -0.195194 -0.221892 -1.843014 -1.704663

In [15]: df.tail(3)
Out[15]:
                   A         B         C         D
2013-01-04 -0.905928  0.807286  1.032849  0.160724
2013-01-05 -0.195194 -0.221892 -1.843014 -1.704663
2013-01-06  0.773058 -1.486186  0.876408  2.159035

2. Display the index, the columns, and the underlying numpy data:

In [16]: df.index
Out[16]:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [17]: df.columns
Out[17]: Index(['A', 'B', 'C', 'D'], dtype='object')

In [18]: df.values
Out[18]:
array([[-0.39763564,  1.09184269,  0.62182584,  0.64319558],
       [ 0.0477345 , -0.69367452,  1.58893209, -0.77487822],
       [-0.30944982, -0.95177856,  0.0280302 , -2.03346358],
       [-0.9059281 ,  0.80728618,  1.03284944,  0.16072416],
       [-0.19519391, -0.22189217, -1.84301383, -1.7046631 ],
       [ 0.77305772, -1.48618624,  0.87640846,  2.15903469]])

3. describe() gives a quick statistical summary of the data:

In [19]: df.describe()
Out[19]:
              A         B         C         D
count  6.000000  6.000000  6.000000  6.000000
mean  -0.164569 -0.242400  0.384172 -0.258342
std    0.556791  1.013542  1.204840  1.571102
min   -0.905928 -1.486186 -1.843014 -2.033464
25%   -0.375589 -0.887253  0.176479 -1.472217
50%   -0.252322 -0.457783  0.749117 -0.307077
75%   -0.012998  0.549992  0.993739  0.522578
max    0.773058  1.091843  1.588932  2.159035

4. Transpose the data:

In [20]: df.T
Out[20]:
   2013-01-01 00:00:00  2013-01-02 00:00:00  2013-01-03 00:00:00  2013-01-04 00:00:00  2013-01-05 00:00:00  2013-01-06 00:00:00
A            -0.397636             0.047735            -0.309450            -0.905928            -0.195194             0.773058
B             1.091843            -0.693675            -0.951779             0.807286            -0.221892            -1.486186
C             0.621826             1.588932             0.028030             1.032849            -1.843014             0.876408
D             0.643196            -0.774878            -2.033464             0.160724            -1.704663             2.159035

5. Sort by an axis:

In [21]: df.sort_index(axis=1, ascending=False)
Out[21]:
                   D         C         B         A
2013-01-01  0.643196  0.621826  1.091843 -0.397636
2013-01-02 -0.774878  1.588932 -0.693675  0.047735
2013-01-03 -2.033464  0.028030 -0.951779 -0.309450
2013-01-04  0.160724  1.032849  0.807286 -0.905928
2013-01-05 -1.704663 -1.843014 -0.221892 -0.195194
2013-01-06  2.159035  0.876408 -1.486186  0.773058

6. Sort by values (older pandas releases spelled this call df.sort(columns='B'), which has since been removed):

In [22]: df.sort_values(by='B')
Out[22]:
                   A         B         C         D
2013-01-06  0.773058 -1.486186  0.876408  2.159035
2013-01-03 -0.309450 -0.951779  0.028030 -2.033464
2013-01-02  0.047735 -0.693675  1.588932 -0.774878
2013-01-05 -0.195194 -0.221892 -1.843014 -1.704663
2013-01-04 -0.905928  0.807286  1.032849  0.160724
2013-01-01 -0.397636  1.091843  0.621826  0.643196

III. Selection

Standard Python/NumPy expressions for selecting and setting work out of the box, but for production code we recommend the optimized pandas data access methods: .at, .iat, .loc, .iloc and .ix (.ix was deprecated in later pandas releases and removed in pandas 1.0, so current code should rely on the other four). For details, see Indexing and Selecting Data (docs/stable/indexing.html#indexing) and MultiIndex / Advanced Indexing (docs/stable/advanced.html#advanced).

Getting

1. Select a single column. This returns a Series and is equivalent to df.A:

In [23]: df
Out[23]:
                   A         B         C         D
2013-01-01 -0.397636  1.091843  0.621826  0.643196
2013-01-02  0.047735 -0.693675  1.588932 -0.774878
2013-01-03 -0.309450 -0.951779  0.028030 -2.033464
2013-01-04 -0.905928  0.807286  1.032849  0.160724
2013-01-05 -0.195194 -0.221892 -1.843014 -1.704663
2013-01-06  0.773058 -1.486186  0.876408  2.159035

In [24]: df['A']
Out[24]:
2013-01-01   -0.397636
2013-01-02    0.047735
2013-01-03   -0.309450
2013-01-04   -0.905928
2013-01-05   -0.195194
2013-01-06    0.773058
Freq: D, Name: A, dtype: float64

2. Select via []. With a slice, this slices the rows:

In [25]: df[0:3]

In [26]: df['20130102':'20130104']

Selection by label

1. Use a label to get a cross section:

In [27]: df

Out[25]:
                   A         B         C         D
2013-01-01 -0.397636  1.0
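The object-creation steps above can be collected into one short script. This is a minimal sketch rather than part of the original tutorial: it swaps np.random.randn for a seeded generator (an assumption, made only so that repeated runs print the same numbers), and the final loop is an illustrative addition.

```python
import numpy as np
import pandas as pd

# A Series built from a list gets a default integer index; np.nan marks a missing value.
s = pd.Series([1, 3, 5, np.nan, 6, 8])

# A DataFrame built from a NumPy array, a datetime index, and column labels.
# Assumption: a seeded generator replaces np.random.randn for reproducibility.
dates = pd.date_range('20130101', periods=6)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list('ABCD'))

# A DataFrame built from a dict; scalar values are broadcast to the length of the others.
df2 = pd.DataFrame({
    'A': 1.,
    'B': pd.Timestamp('20130102'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(['test', 'train', 'test', 'train']),
    'F': 'foo',
})

# Each column keeps its own dtype.
for name, dtype in df2.dtypes.items():
    print(name, dtype)
```

The exact dtype names printed for columns B and F can differ between pandas versions, so the loop is best read as a way to inspect them rather than as a fixed expected output.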
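The viewing and sorting calls described above are easier to check on a frame whose values are known in advance. The sketch below uses a hypothetical three-row frame named small (not the random df from the text) and the current sort_values spelling.

```python
import pandas as pd

# Hypothetical frame with fixed values, so every result can be checked by hand.
small = pd.DataFrame(
    {'A': [3.0, 1.0, 2.0], 'B': [0.5, -1.5, 2.5]},
    index=pd.date_range('20130101', periods=3),
)

top = small.head(2)         # first two rows
bottom = small.tail(1)      # last row
summary = small.describe()  # count, mean, std, min, quartiles, max per column
transposed = small.T        # rows become columns and vice versa

# Sort along axis 1, i.e. reorder the columns by their labels, descending.
by_columns = small.sort_index(axis=1, ascending=False)

# Sort the rows by the values in column B, ascending.
by_b = small.sort_values(by='B')

print(by_columns.columns.tolist())
print(by_b['B'].tolist())
```

Neither sort modifies small in place; each returns a new frame, which is why the results are bound to separate names here.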
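To compare plain [] selection with the recommended accessors, the following sketch runs each of them on the same data. The frame demo and its integer values are made up for illustration (row i holds 4i, 4i+1, 4i+2, 4i+3), and .ix is left out because current pandas no longer provides it.

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame with predictable values instead of random numbers.
dates = pd.date_range('20130101', periods=6)
demo = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

col_a = demo['A']                         # one column, returned as a Series
first_rows = demo[0:3]                    # integer slice: rows by position, end excluded
label_rows = demo['20130102':'20130104']  # label slice: rows by label, both ends included

row = demo.loc[dates[0]]                                # one row by label, as a Series
block = demo.loc['20130102':'20130104', ['A', 'B']]     # rows and columns by label
scalar_by_label = demo.at[dates[0], 'A']                # single value by label
scalar_by_pos = demo.iat[1, 1]                          # single value by position
sub = demo.iloc[3:5, 0:2]                               # rows and columns by position

print(first_rows.shape, label_rows.shape, block.shape, sub.shape)
```

The detail most worth noticing is the slice semantics: positional slices drop the end point, while label slices keep it, so demo[0:3] and demo['20130102':'20130104'] both return three rows for different reasons.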