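Creating a DataFrame
A DataFrame is essentially a tabular data structure with a row index and a column index.
Row index: labels the rows; it is called index and corresponds to axis 0 (axis=0).
Column index: labels the columns; it is called columns and corresponds to axis 1 (axis=1).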
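To make the two axes concrete, here is a small sketch showing how axis=0 and axis=1 change a reduction such as sum(). The frame `demo` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical 2x2 frame, for illustration only
demo = pd.DataFrame({'a': [1, 2], 'b': [10, 20]}, index=['r1', 'r2'])

col_totals = demo.sum(axis=0)  # collapses the rows: one value per column
row_totals = demo.sum(axis=1)  # collapses the columns: one value per row

print(col_totals)  # a -> 3, b -> 30
print(row_totals)  # r1 -> 11, r2 -> 22
```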
pandas.DataFrame(data, index, columns, dtype, copy)
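A minimal sketch of how the index, columns and dtype arguments fit together. The name `scores`, the labels and the values are all invented for this illustration.

```python
import pandas as pd

# Made-up values; row labels and column names are illustrative
scores = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=['x', 'y'],    # row labels
    columns=['m', 'n'],  # column labels
    dtype=float,         # cast every value to float
)
print(scores)
print(scores.dtypes)
```

If index is left out, pandas numbers the rows 0, 1, 2, and so on.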
Examples
import pandas as pd
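data = [['Google', 10], ['Runoob', 12], ['Wiki', 13]]
df = pd.DataFrame(data, columns=['Site', 'Age'])
# dtype=float on the whole frame can raise a ValueError in recent pandas, because 'Site' holds strings; cast only 'Age'
df['Age'] = df['Age'].astype(float)
print(df)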
import pandas as pd
data = {'Site':['Google', 'Runoob', 'Wiki'], 'Age':[10, 12, 13]}
df = pd.DataFrame(data)
print(df)
import pandas as pd
movie_ratings = {
    'Toy Story': {'rating': 4.0, 'genre': 'Animation'},
    'Jumanji': {'rating': 4.0, 'genre': 'Adventure'},
    'Grumpier Old Men': {'rating': 4.0, 'genre': 'Comedy'},
    'Waiting to Exhale': {'rating': 4.0, 'genre': 'Comedy'},
}
ratings_df = pd.DataFrame(movie_ratings)  # outer keys become columns
ratings_df = ratings_df.transpose()  # swap rows and columns so each movie is a row
print(ratings_df)
# Name the index 'title', then move it into a regular column
ratings_df = ratings_df.rename_axis('title').reset_index()
Reading a CSV file
df = pd.read_csv('nba.csv')
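Since nba.csv is not included with these notes, the sketch below reads the same kind of data from an in-memory string instead. The column names and rows are assumptions made up for the example, not the contents of the real file.

```python
import io
import pandas as pd

# Stand-in for a file on disk; columns and rows are invented for this sketch
csv_text = "Name,Team,Number\nA,Red,7\nB,Blue,11\n"
players = pd.read_csv(io.StringIO(csv_text))

print(players.shape)   # (2, 3)
print(players.dtypes)  # column types are inferred from the text
```

The first line of the input is used as the column header by default.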
Typical plotting workflow
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
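# Raw strings keep the backslashes in Windows paths from being read as escape sequences
df = pd.read_csv(r'E:\补课\MESS\Praktikum\P2\Aufgabe_2_2_FFT_sinus.csv')
# Line plot of dBV over Hertz; estimator=None draws every observation without aggregation
g = sns.relplot(x='Hertz', y='dBV', estimator=None, kind='line', data=df)
g.figure.autofmt_xdate()  # rotates and right-aligns the x tick labels
plt.ticklabel_format(style='plain')  # plain numbers instead of scientific notation
# Saving before plt.show() is the safer order, since show() blocks until the window is closed
g.savefig(r'E:\补课\MESS\Praktikum\P2\Bild/sinus.png')
plt.show()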
Basic DataFrame operations
Important methods and attributes
df.head()
Without an argument, it shows the first 5 rows by default.
df.tail()
df.shape
df.info()
df.describe()
df.columns
df.index
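A quick sketch of these calls on a small frame. The frame `nums` is invented here. Note that shape, columns and index are attributes, so they are written without parentheses, while head, tail, info and describe are methods.

```python
import pandas as pd

# Hypothetical 10-row frame
nums = pd.DataFrame({'n': range(10), 'sq': [i * i for i in range(10)]})

print(nums.head())      # first 5 rows
print(nums.tail(3))     # last 3 rows
print(nums.shape)       # (10, 2): attribute, no parentheses
print(nums.columns)     # column labels (attribute)
print(nums.index)       # row labels (attribute)
nums.info()             # prints dtypes and non-null counts
print(nums.describe())  # summary statistics of the numeric columns
```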
Selecting data
Renaming columns
a.rename(columns={'A': 'a', 'B': 'b'}, inplace=True)
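As a small illustration of what inplace does, the sketch below calls rename without it. The frame contents are made up; `renamed` is a name introduced only for this example.

```python
import pandas as pd

a = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Without inplace=True, rename returns a new frame and leaves `a` untouched
renamed = a.rename(columns={'A': 'a', 'B': 'b'})

print(list(a.columns))        # ['A', 'B']
print(list(renamed.columns))  # ['a', 'b']
```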
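Indexing data by label
Direct indexing
The column comes first, then the row (the reverse of the usual row-then-column order), and both are given as label strings.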
data["open"]["2018-02-27"]
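A sketch of the same lookup on invented prices, next to the row-first form with loc for comparison. The values and the second date are assumptions for illustration.

```python
import pandas as pd

# Invented prices; string dates serve as the row index
data = pd.DataFrame(
    {'open': [23.5, 22.8], 'close': [24.1, 22.9]},
    index=['2018-02-27', '2018-02-26'],
)

via_chain = data['open']['2018-02-27']    # column first, then row
via_loc = data.loc['2018-02-27', 'open']  # row first, then column
print(via_chain, via_loc)
```

Both forms read the same cell. For assignments, the single loc call is generally the safer choice, because chained indexing may write to a temporary copy.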
loc
loc selects by label.
df.loc[0:5, :]  # rows labeled 0 through 5 (both ends included), all columns; 0 and 5 are labels that happen to be integers
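A short check of the inclusive end point, using a made-up frame with the default integer labels:

```python
import pandas as pd

df = pd.DataFrame({'v': range(10)})  # default integer labels 0..9

subset = df.loc[0:5, :]
print(len(subset))  # 6: labels 0, 1, 2, 3, 4 and 5
```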
iloc
iloc selects by integer position. Rows and columns are both counted from 0, and a slice 0:X excludes X, so it stops at X-1.
stock.iloc[0:6,[0,3,4,5]]
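The same selection on a stand-in frame. The real `stock` data is not part of these notes, so the column names c0..c5 and the values are assumptions.

```python
import pandas as pd

# Hypothetical frame with 8 rows and 6 columns named c0..c5
stock = pd.DataFrame({f'c{j}': range(j, j + 8) for j in range(6)})

part = stock.iloc[0:6, [0, 3, 4, 5]]
print(part.shape)          # (6, 4): rows 0..5, end position 6 excluded
print(list(part.columns))  # ['c0', 'c3', 'c4', 'c5']
```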
ix
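ix was a hybrid of loc and iloc that accepted both positions and labels. It was deprecated in pandas 0.20 and removed in pandas 1.0, so the line below only runs on older versions; current code should use loc or iloc instead.
stock.ix[2:5,'股票代码':'当前价']  # legacy pandas only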
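On pandas 1.0 and later, a mixed selection like this can be written by chaining iloc and loc. The following is one possible sketch, assuming the rows are meant by position; the rows, the row labels and the columns `name` and `extra` are invented, and only the two column labels from the note are taken from the original.

```python
import pandas as pd

# Made-up rows; only the labels '股票代码' and '当前价' come from the note
stock = pd.DataFrame(
    {'股票代码': ['s0', 's1', 's2', 's3', 's4', 's5'],
     'name': list('abcdef'),
     '当前价': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
     'extra': range(6)},
    index=list('uvwxyz'),
)

# Rows by position 2..4, then columns by label (label slices include the end)
picked = stock.iloc[2:5].loc[:, '股票代码':'当前价']
print(picked)
```

If the row index holds integer labels and the rows are meant by label, stock.loc[2:5, '股票代码':'当前价'] would be the corresponding form.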
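Sorting
Use df.sort_values(by=..., ascending=...)
Parameters:
by: the column name, or list of column names, to sort by
ascending: whether to sort in ascending order; defaults to True
ascending=False: descending order
ascending=True: ascending order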
ratings_df.sort_values("rating")
ratings_df.sort_values("rating", ascending=False)
ratings_df.sort_values(["rating", "time"], ascending=[False, True])
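A sketch of the multi-key sort on invented data, where "time" is just a stand-in numeric column. Ties in rating are broken by ascending time.

```python
import pandas as pd

# Invented ratings; "time" is a stand-in column
ratings_df = pd.DataFrame({
    'rating': [4.0, 5.0, 4.0, 5.0],
    'time': [30, 20, 10, 40],
})

ordered = ratings_df.sort_values(['rating', 'time'], ascending=[False, True])
print(ordered)
# sort_values returns a new frame; ratings_df itself keeps its original order
```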
Filtering
ratings_df[ratings_df["userId"]==1]
# Filtering for rows where rating=5.0
high_ratings_df = ratings_df[ratings_df["rating"]==5.0]
# Print high_ratings_df
print(high_ratings_df)
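The comparison inside the brackets produces a Boolean Series, which is what actually selects the rows. A small sketch on made-up data, including how two conditions can be combined:

```python
import pandas as pd

# Invented ratings for illustration
ratings_df = pd.DataFrame({
    'userId': [1, 1, 2, 3],
    'rating': [5.0, 3.0, 5.0, 4.0],
})

mask = ratings_df['rating'] == 5.0  # Boolean Series, one entry per row
print(mask.tolist())                # [True, False, True, False]

# Combine conditions with & or |, wrapping each comparison in parentheses
user1_high = ratings_df[(ratings_df['userId'] == 1) & mask]
print(user1_high)
```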
The isin() method
# also results in Boolean series
is_movie123 = ratings_df['movieId'].isin([1, 2, 3])
# subset ratings_df with is_movie123
ratings_df[is_movie123]
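A runnable version on invented data, with the negated mask added for comparison:

```python
import pandas as pd

# Invented ratings for illustration
ratings_df = pd.DataFrame({
    'movieId': [1, 2, 5, 3, 9],
    'rating': [4.0, 3.5, 5.0, 2.0, 4.5],
})

is_movie123 = ratings_df['movieId'].isin([1, 2, 3])
print(ratings_df[is_movie123])   # rows with movieId 1, 2 or 3
print(ratings_df[~is_movie123])  # ~ negates the mask: movieId 5 and 9
```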
Arithmetic operations
Custom operations
# Maximum minus minimum for each of the 'open' and 'close' columns
data[["open","close"]].apply(lambda x: x.max() - x.min(),axis=0)
# open 22.74
# close 22.85
# dtype: float64
# Maximum minus minimum of open and close within each row
data[["open","close"]].apply(lambda x: x.max() - x.min(),axis=1)