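The modern Olympic Games have been held every four years since 1896, and since 1904 medals have been awarded in every event of the Summer Games. The success of the Summer Games led to the creation of the Winter Olympic Games. This case study uses a real dataset of Olympic athletes to help you quickly learn the basic operations of data visualization and become familiar with simple data processing in Python, laying the groundwork for later machine-learning modeling.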

Column  Meaning
ID      Athlete ID
Name    Name
Sex     Sex
Age     Age
Height  Height (cm)
Weight  Weight (kg)
Team    Team
NOC     Country/region code
Games   Name of the Games
Year    Year
Season  Season
City    Host city
Sport   Sport
Event   Specific event
Medal   Medal

Dataset overview

In [2]:
import pandas as pd
import numpy as np 
import matplotlib.font_manager as fm 
myfont = fm.FontProperties(fname='./input/simhei.ttf') ## load the SimHei font so that CJK text can be rendered in plot labels
athlete = pd.read_csv('./input/athlete_events.csv') ## load the data
In [3]:
from IPython.core.interactiveshell import InteractiveShell 
InteractiveShell.ast_node_interactivity = 'all' ## show the result of every expression in a cell, not only the last one
In [4]:
athlete.shape
athlete.columns
Out[4]:
(271116, 15)
Out[4]:
Index(['ID', 'Name', 'Sex', 'Age', 'Height', 'Weight', 'Team', 'NOC', 'Games',
       'Year', 'Season', 'City', 'Sport', 'Event', 'Medal'],
      dtype='object')
In [5]:
athlete.isnull().any() ## check whether each column contains any missing values (compare any with all)
Out[5]:
ID        False
Name      False
Sex       False
Age        True
Height     True
Weight     True
Team      False
NOC       False
Games     False
Year      False
Season    False
City      False
Sport     False
Event     False
Medal      True
dtype: bool
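The check above only says whether a column has missing values, not how many. As a minimal sketch on a made-up toy frame (the values below are invented and only stand in for `athlete_events.csv`), `isnull()` can be combined with `any`, `all`, `sum`, and `mean`:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (all values are made up).
toy = pd.DataFrame({
    'Age':    [24.0, np.nan, 31.0, 28.0],
    'Height': [180.0, 170.0, np.nan, np.nan],
    'Medal':  ['Gold', None, None, 'Bronze'],
})

has_missing = toy.isnull().any()     # True if at least one value in the column is missing
all_missing = toy.isnull().all()     # True only if every value in the column is missing
missing_count = toy.isnull().sum()   # number of missing values per column
missing_share = toy.isnull().mean()  # fraction of missing values per column

print(missing_count)
```

Knowing the count or share per column helps decide between imputing a column and dropping the affected rows.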
In [6]:
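### Fill in missing values
athlete['Height'] = athlete['Height'].fillna(athlete['Height'].mean()) ## impute with the column mean
athlete['Weight'] = athlete['Weight'].fillna(athlete['Weight'].mean()) ## impute with the column mean
athlete = athlete[athlete['Age'].notnull()].copy() ## drop rows with unknown age; copy so that later assignments are safe
athlete['Medal'] = athlete['Medal'].fillna('NoMedal') ## a missing medal means that no medal was won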
In [7]:
athlete.isnull().any() ## verify again
Out[7]:
ID        False
Name      False
Sex       False
Age       False
Height    False
Weight    False
Team      False
NOC       False
Games     False
Year      False
Season    False
City      False
Sport     False
Event     False
Medal     False
dtype: bool
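The two strategies used above, imputing and dropping, can be tried on a small example first. The following is a sketch on made-up values, not on the real dataset:

```python
import numpy as np
import pandas as pd

# Toy frame with invented values.
toy = pd.DataFrame({
    'Age':    [24.0, np.nan, 31.0, 28.0],
    'Height': [180.0, 170.0, np.nan, 190.0],
    'Medal':  ['Gold', np.nan, np.nan, 'Bronze'],
})

# mean() skips NaN, so this is (180 + 170 + 190) / 3.
mean_before = toy['Height'].mean()
toy['Height'] = toy['Height'].fillna(mean_before)

# Dropping rows is the alternative when imputing makes little sense.
toy = toy.dropna(subset=['Age']).copy()  # .copy() avoids SettingWithCopyWarning later
toy['Medal'] = toy['Medal'].fillna('NoMedal')

print(toy)
```

Mean imputation keeps the column mean unchanged but shrinks its spread, which is worth keeping in mind when reading the height and weight plots later on.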

Data visualization

In [9]:
## commonly used tools
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

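Introduction to Matplotlib and Seaborn

Elements of a plot:

  • Chart type (bar chart, histogram, scatter plot, pie chart, etc.)
  • Axes (what the x and y axes represent, and their ticks)
  • Title
  • Further options: color, background, marker shape and size, bar width, and whether to group by a factor (a categorical variable)

Plotting steps:

  • Step 1: create the canvas (figure, optionally with subplots)

  • plot1=plt.figure(figsize=(8,6),dpi=144) # set the size and resolution

  • Step 2: set up the canvas contents

  • plt.title('Title') # add a title

  • plt.xlabel('x') # label the x-axis

  • plt.ylabel('y') # label the y-axis

  • plt.xlim((0,1)) # range of the x-axis

  • plt.ylim((0,1)) # range of the y-axis

  • plt.xticks([0,1,2,3,4]) # tick values on the x-axis

  • plt.yticks([0,1,2,3,4]) # tick values on the y-axis

  • Step 3: draw the plot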
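The three steps can be put together in one small script. This is a minimal sketch with made-up data; the `Agg` backend line is an assumption that makes it run outside a notebook and should be left out when `%matplotlib inline` is active:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; omit this line inside a notebook
import matplotlib.pyplot as plt

# Made-up data for illustration.
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# Step 1: create the canvas.
fig = plt.figure(figsize=(8, 6), dpi=144)

# Step 2: set up the canvas contents.
plt.title('y = x squared')
plt.xlabel('x')
plt.ylabel('y')
plt.xlim((0, 4))
plt.ylim((0, 16))
plt.xticks([0, 1, 2, 3, 4])
plt.yticks([0, 4, 8, 12, 16])

# Step 3: draw the plot.
plt.plot(x, y, marker='o')

ax = plt.gca()  # the axes that all plt.* calls above acted on
```

The order of steps 2 and 3 is flexible: the `plt.*` functions all act on the current axes, so labels can also be set after the data is drawn.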

In [74]:
## 1. Matplotlib: scatter plot (typically used for continuous data)
In [75]:
athlete.columns
Out[75]:
Index(['ID', 'Name', 'Sex', 'Age', 'Height', 'Weight', 'Team', 'NOC', 'Games',
       'Year', 'Season', 'City', 'Sport', 'Event', 'Medal'],
      dtype='object')
In [10]:
plt.figure(figsize=(8,6))
plt.scatter(x=athlete['Height'],y=athlete['Weight'])
plt.title('Scatter plot of height vs. weight',fontproperties=myfont) # add a title
plt.xlabel('Height',fontproperties=myfont) # label the x-axis
plt.ylabel('Weight',fontproperties=myfont) # label the y-axis
Out[10]:
<Figure size 576x432 with 0 Axes>
Out[10]:
<matplotlib.collections.PathCollection at 0x7f5df909d390>
Out[10]:
Text(0.5,1,'Scatter plot of height vs. weight')
Out[10]:
Text(0.5,0,'Height')
Out[10]:
Text(0,0.5,'Weight')
In [77]:
## 2. Matplotlib: line plot (shows how a value changes)
In [12]:
plt.figure(figsize=(8,6))
plt.plot(athlete['Age'][0:10]) ## ages in the first 10 rows, plotted against the row index
plt.title('Line plot of age',fontproperties=myfont,size=20) # add a title
plt.ylabel('Age',fontproperties=myfont,size=20) # label the y-axis
In [79]:
## 3. Matplotlib: histogram (counts how many values fall into each interval)
In [13]:
plt.figure(figsize=(8,6))
plt.hist(athlete['Age']) ## vertical bars by default
plt.title('Histogram of age',fontproperties=myfont,size=20) # add a title
plt.xlabel('Age',fontproperties=myfont,size=20) # label the x-axis
plt.ylabel('Count',fontproperties=myfont,size=20) # label the y-axis
Out[13]:
<Figure size 576x432 with 0 Axes>
Out[13]:
(array([2.06600e+04, 1.63881e+05, 6.32200e+04, 9.47500e+03, 3.11100e+03,
        8.55000e+02, 3.39000e+02, 9.00000e+01, 9.00000e+00, 2.00000e+00]),
 array([10. , 18.7, 27.4, 36.1, 44.8, 53.5, 62.2, 70.9, 79.6, 88.3, 97. ]),
 <a list of 10 Patch objects>)
Out[13]:
Text(0.5,1,'Histogram of age')
Out[13]:
Text(0.5,0,'Age')
Out[13]:
Text(0,0.5,'Count')
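The tuple returned by `plt.hist` above holds the count in each bin followed by the bin edges. The same binning can be reproduced without drawing anything by calling `np.histogram`. The sketch below uses ten made-up ages that span the same range (10 to 97) as the real column, so the bin edges match the output above:

```python
import numpy as np

# Ten invented ages; only the minimum (10) and maximum (97) mirror the real data.
ages = np.array([10, 15, 20, 22, 25, 30, 35, 40, 55, 97])

# Like plt.hist, np.histogram defaults to 10 equal-width bins between min and max.
counts, edges = np.histogram(ages, bins=10)
width = edges[1] - edges[0]  # (97 - 10) / 10

print(counts)
print(edges)
```

With 10 bins there are always 11 edges, and every bin is half-open except the last one, which also includes the maximum value.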
In [81]:
## 4. Matplotlib: box plot (used to detect outliers)
In [82]:
plt.figure(figsize=(8,6))
plt.boxplot(athlete['Weight']) ## vertical orientation by default
plt.title('Box plot of weight',fontproperties=myfont,size=20) # add a title
plt.xlabel('Weight',fontproperties=myfont,size=20) # label the x-axis
Out[82]:
<Figure size 576x432 with 0 Axes>
Out[82]:
{'boxes': [<matplotlib.lines.Line2D at 0x7f78a24e3dd8>],
 'caps': [<matplotlib.lines.Line2D at 0x7f78a24ed828>,
  <matplotlib.lines.Line2D at 0x7f78a24edc50>],
 'fliers': [<matplotlib.lines.Line2D at 0x7f78a24f54e0>],
 'means': [],
 'medians': [<matplotlib.lines.Line2D at 0x7f78a24f50b8>],
 'whiskers': [<matplotlib.lines.Line2D at 0x7f78a24e3f28>,
  <matplotlib.lines.Line2D at 0x7f78a24ed400>]}
Out[82]:
Text(0.5,1,'Box plot of weight')
Out[82]:
Text(0.5,0,'Weight')
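By default, a Matplotlib box plot draws as outliers ("fliers") the points that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. The rule can be computed by hand, as in this sketch on made-up weights:

```python
import numpy as np

# Invented weights in kg; 150 is deliberately extreme.
weights = np.array([50, 55, 60, 62, 65, 68, 70, 75, 80, 150], dtype=float)

q1, q3 = np.percentile(weights, [25, 75])  # first and third quartiles
iqr = q3 - q1                              # interquartile range
lower = q1 - 1.5 * iqr                     # lower fence
upper = q3 + 1.5 * iqr                     # upper fence

# Points outside the fences are the ones a box plot marks individually.
outliers = weights[(weights < lower) | (weights > upper)]
print(outliers)
```

The same mask could be applied to `athlete['Weight']` to list the rows behind the points drawn above and below the whiskers.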
In [83]:
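## 5. Seaborn: scatter plot (nicer default colors and more chart types, with simple functions)
In [84]:
plt.figure(figsize=(30,20)) ## has no effect here: jointplot creates its own figure, hence the empty figure in the output
sns.jointplot(x=athlete['Height'],y=athlete['Weight'],kind='reg') ## scatter plot with a regression line and marginal distributions
Out[84]:
<Figure size 2160x1440 with 0 Axes>
Out[84]:
<seaborn.axisgrid.JointGrid at 0x7f7883eaca20>
<Figure size 2160x1440 with 0 Axes>
In [85]:
## 6. Seaborn: box plot
In [14]:
plt.figure(figsize=(30,20))
plot8=sns.boxplot(x='Sex',y='Weight',data=athlete) ## weight distribution for each sex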
Out[14]:
<Figure size 2160x1440 with 0 Axes>

Let's Start

In [17]:
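## Merge gold, silver, and bronze into a single 'Medal' category
pd.options.mode.chained_assignment = None ## silence SettingWithCopyWarning
athlete.loc[athlete['Medal'] != 'NoMedal', 'Medal'] = 'Medal' ## .loc assigns in one step instead of chained indexing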
In [88]:
athlete.head()
Out[88]:
ID Name Sex Age Height Weight Team NOC Games Year Season City Sport Event Medal
0 1 A Dijiang M 24.0 180.00000 80.000000 China CHN 1992 Summer 1992 Summer Barcelona Basketball Basketball Men's Basketball NoMedal
1 2 A Lamusi M 23.0 170.00000 60.000000 China CHN 2012 Summer 2012 Summer London Judo Judo Men's Extra-Lightweight NoMedal
2 3 Gunnar Nielsen Aaby M 24.0 175.33897 70.702393 Denmark DEN 1920 Summer 1920 Summer Antwerpen Football Football Men's Football NoMedal
3 4 Edgar Lindenau Aabye M 34.0 175.33897 70.702393 Denmark/Sweden DEN 1900 Summer 1900 Summer Paris Tug-Of-War Tug-Of-War Men's Tug-Of-War Medal
4 5 Christine Jacoba Aaftink F 21.0 185.00000 82.000000 Netherlands NED 1988 Winter 1988 Winter Calgary Speed Skating Speed Skating Women's 500 metres NoMedal
In [18]:
medals = athlete[athlete['Medal'] == 'Medal'] ## keep only the medal-winning rows
In [90]:
medals.head()
Out[90]:
ID Name Sex Age Height Weight Team NOC Games Year Season City Sport Event Medal
3 4 Edgar Lindenau Aabye M 34.0 175.33897 70.702393 Denmark/Sweden DEN 1900 Summer 1900 Summer Paris Tug-Of-War Tug-Of-War Men's Tug-Of-War Medal
37 15 Arvo Ossian Aaltonen M 30.0 175.33897 70.702393 Finland FIN 1920 Summer 1920 Summer Antwerpen Swimming Swimming Men's 200 metres Breaststroke Medal
38 15 Arvo Ossian Aaltonen M 30.0 175.33897 70.702393 Finland FIN 1920 Summer 1920 Summer Antwerpen Swimming Swimming Men's 400 metres Breaststroke Medal
40 16 Juhamatti Tapio Aaltonen M 28.0 184.00000 85.000000 Finland FIN 2014 Winter 2014 Winter Sochi Ice Hockey Ice Hockey Men's Ice Hockey Medal
41 17 Paavo Johannes Aaltonen M 28.0 175.00000 64.000000 Finland FIN 1948 Summer 1948 Summer London Gymnastics Gymnastics Men's Individual All-Around Medal
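Overwriting the Medal column discards the gold/silver/bronze detail. If that detail is still needed later, one option is to write the merged label to a separate column. The sketch below uses made-up rows, and `HasMedal` is a hypothetical column name that does not exist in the dataset:

```python
import numpy as np
import pandas as pd

# Invented rows; 'HasMedal' is a hypothetical helper column.
toy = pd.DataFrame({
    'Name':  ['a', 'b', 'c', 'd'],
    'Medal': ['Gold', 'NoMedal', 'Bronze', 'NoMedal'],
})

# np.where picks 'Medal' where the condition holds and 'NoMedal' elsewhere.
toy['HasMedal'] = np.where(toy['Medal'] != 'NoMedal', 'Medal', 'NoMedal')

# Boolean filtering then works exactly as for the medals frame above.
winners = toy[toy['HasMedal'] == 'Medal']
print(winners)
```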

1. Age

In [91]:
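plt.figure(figsize=(30, 10)) ## set the figure size
sns.countplot(x=medals['Age']) ## choose the chart type; passing x= by keyword also works in newer seaborn versions
plt.title('Distribution of medals by age', fontproperties=myfont,size=20) ## set the title
Out[91]:
<Figure size 2160x720 with 0 Axes>
Out[91]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f78a2855d68>
Out[91]:
Text(0.5,1,'Distribution of medals by age')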
In [92]:
plt.figure(figsize=(20, 10))
sns.countplot(x=medals.Sport[medals['Age'] > 50])
plt.title('Medals won by athletes over 50',fontproperties=myfont,size=20)
## shooting, equestrianism, sailing, rowing, art competitions, alpinism, archery, curling, croquet, fencing
Out[92]:
<Figure size 1440x720 with 0 Axes>
Out[92]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f78a21f3ba8>
Out[92]:
Text(0.5,1,'Medals won by athletes over 50')
In [93]:
country=(medals.Team[medals['Age'] > 50]).value_counts() ## medals per team among athletes over 50, in descending order
country.values
Out[93]:
array([28, 23, 21, 18, 12,  8,  6,  5,  5,  5,  4,  4,  3,  3,  3,  2,  2,
        2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1])
In [94]:
plt.figure(figsize=(20, 10))
sns.barplot(x=country.index[0:10],y=country.values[0:10]) ## the slice [0:10] selects the first 10 teams
#sns.countplot(medals.Team[medals['Age'] > 50])
plt.title('Medals won by athletes over 50 (top 10 teams)',fontproperties=myfont,size=20)
Out[94]:
<Figure size 1440x720 with 0 Axes>
Out[94]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7883b25550>
Out[94]:
Text(0.5,1,'Medals won by athletes over 50 (top 10 teams)')

2. Sex

In [95]:
plt.figure(figsize=(20, 10))
sns.countplot(x='Year', data=medals, hue='Sex')
plt.title('Medal counts by year and sex',fontproperties=myfont,size=20)
Out[95]:
<Figure size 1440x720 with 0 Axes>
Out[95]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f78832a8be0>
Out[95]:
Text(0.5,1,'Medal counts by year and sex')
In [96]:
## Beyond visual inspection: compute the ratio
In [98]:
#gender = medals.pivot_table(medals, index=['Year'],columns='Sex') 
#gender
In [19]:
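#gender = medals.pivot_table(medals, index=['Year'], columns='Sex', aggfunc=len) ## len counts the rows in each group
## with aggfunc=len every value column holds the same row count, so keeping 'Sport' is an arbitrary choice
gender = medals.pivot_table(medals, index=['Year','Games'], columns='Sex', aggfunc=len).reset_index()[['Year','Games','Sport']]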
In [100]:
gender
Out[100]:
Year Games Sport
Sex F M
0 1896 1896 Summer NaN 101.0
1 1900 1900 Summer 11.0 457.0
2 1904 1904 Summer 9.0 413.0
3 1906 1906 Summer 2.0 305.0
4 1908 1908 Summer 15.0 750.0
5 1912 1912 Summer 30.0 904.0
6 1920 1920 Summer 44.0 1134.0
7 1924 1924 Summer 44.0 737.0
8 1924 1924 Winter 6.0 114.0
9 1928 1928 Summer 92.0 620.0
10 1928 1928 Winter 6.0 82.0
11 1932 1932 Summer 62.0 573.0
12 1932 1932 Winter 6.0 86.0
13 1936 1936 Summer 88.0 828.0
14 1936 1936 Winter 9.0 99.0
15 1948 1948 Summer 100.0 736.0
16 1948 1948 Winter 15.0 120.0
17 1952 1952 Summer 140.0 756.0
18 1952 1952 Winter 18.0 118.0
19 1956 1956 Summer 143.0 737.0
20 1956 1956 Winter 27.0 123.0
21 1960 1960 Summer 156.0 752.0
22 1960 1960 Winter 39.0 108.0
23 1964 1964 Summer 200.0 827.0
24 1964 1964 Winter 46.0 140.0
25 1968 1968 Summer 215.0 840.0
26 1968 1968 Winter 46.0 153.0
27 1972 1972 Summer 250.0 964.0
28 1972 1972 Winter 45.0 154.0
29 1976 1976 Summer 389.0 931.0
30 1976 1976 Winter 51.0 160.0
31 1980 1980 Summer 434.0 950.0
32 1980 1980 Winter 51.0 167.0
33 1984 1984 Summer 494.0 981.0
34 1984 1984 Winter 54.0 168.0
35 1988 1988 Summer 554.0 1028.0
36 1988 1988 Winter 63.0 200.0
37 1992 1992 Summer 592.0 1120.0
38 1992 1992 Winter 99.0 219.0
39 1994 1994 Winter 108.0 223.0
40 1996 1996 Summer 764.0 1078.0
41 1998 1998 Winter 189.0 251.0
42 2000 2000 Summer 880.0 1124.0
43 2002 2002 Winter 208.0 270.0
44 2004 2004 Summer 898.0 1103.0
45 2006 2006 Winter 231.0 295.0
46 2008 2008 Summer 932.0 1116.0
47 2010 2010 Winter 229.0 291.0
48 2012 2012 Summer 918.0 1023.0
49 2014 2014 Winter 265.0 332.0
50 2016 2016 Summer 969.0 1054.0
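The table above is a count of rows per (Year, Games, Sex) group. As an alternative sketch, not the approach used in this notebook, the same kind of table can be built with `groupby(...).size()` followed by `unstack`, which avoids picking an arbitrary value column. The rows below are made up:

```python
import pandas as pd

# Invented medal rows: three in 1900 (two men, one woman), two in 1904 (both men).
toy = pd.DataFrame({
    'Year':  [1900, 1900, 1900, 1904, 1904],
    'Games': ['1900 Summer'] * 3 + ['1904 Summer'] * 2,
    'Sex':   ['M', 'M', 'F', 'M', 'M'],
})

counts = (
    toy.groupby(['Year', 'Games', 'Sex'])
       .size()                         # number of rows in each group
       .unstack('Sex', fill_value=0)   # one column per sex; missing groups become 0
       .reset_index()
)
counts.columns.name = None             # drop the leftover 'Sex' label on the columns

print(counts)
```

Because `fill_value=0` is passed to `unstack`, groups with no rows show up as 0 directly and the counts stay integers.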
In [20]:
gender.columns = ['Year','Games','F','M'] ## flatten the two-level column index
In [102]:
gender.fillna(0,inplace=True) ## there were no female medalists in 1896, so that NaN becomes 0
gender
Out[102]:
Year Games F M
0 1896 1896 Summer 0.0 101.0
1 1900 1900 Summer 11.0 457.0
2 1904 1904 Summer 9.0 413.0
3 1906 1906 Summer 2.0 305.0
4 1908 1908 Summer 15.0 750.0
5 1912 1912 Summer 30.0 904.0
6 1920 1920 Summer 44.0 1134.0
7 1924 1924 Summer 44.0 737.0
8 1924 1924 Winter 6.0 114.0
9 1928 1928 Summer 92.0 620.0
10 1928 1928 Winter 6.0 82.0
11 1932 1932 Summer 62.0 573.0
12 1932 1932 Winter 6.0 86.0
13 1936 1936 Summer 88.0 828.0
14 1936 1936 Winter 9.0 99.0
15 1948 1948 Summer 100.0 736.0
16 1948 1948 Winter 15.0 120.0
17 1952 1952 Summer 140.0 756.0
18 1952 1952 Winter 18.0 118.0
19 1956 1956 Summer 143.0 737.0
20 1956 1956 Winter 27.0 123.0
21 1960 1960 Summer 156.0 752.0
22 1960 1960 Winter 39.0 108.0
23 1964 1964 Summer 200.0 827.0
24 1964 1964 Winter 46.0 140.0
25 1968 1968 Summer 215.0 840.0
26 1968 1968 Winter 46.0 153.0
27 1972 1972 Summer 250.0 964.0
28 1972 1972 Winter 45.0 154.0
29 1976 1976 Summer 389.0 931.0
30 1976 1976 Winter 51.0 160.0
31 1980 1980 Summer 434.0 950.0
32 1980 1980 Winter 51.0 167.0
33 1984 1984 Summer 494.0 981.0
34 1984 1984 Winter 54.0 168.0
35 1988 1988 Summer 554.0 1028.0
36 1988 1988 Winter 63.0 200.0
37 1992 1992 Summer 592.0 1120.0
38 1992 1992 Winter 99.0 219.0
39 1994 1994 Winter 108.0 223.0
40 1996 1996 Summer 764.0 1078.0
41 1998 1998 Winter 189.0 251.0
42 2000 2000 Summer 880.0 1124.0
43 2002 2002 Winter 208.0 270.0
44 2004 2004 Summer 898.0 1103.0
45 2006 2006 Winter 231.0 295.0
46 2008 2008 Summer 932.0 1116.0
47 2010 2010 Winter 229.0 291.0
48 2012 2012 Summer 918.0 1023.0
49 2014 2014 Winter 265.0 332.0
50 2016 2016 Summer 969.0 1054.0
In [21]:
gender['ratio'] = gender['F'] /(gender['F'] + gender['M']) ## share of medals won by women
In [22]:
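def data(a):
    ## map a ratio to the upper bound of its bucket: 0, (0, 0.15], (0.15, 0.3], above 0.3
    if a==0:
        a=0
    elif 0<a<=0.15:
        a=0.15
    elif 0.15<a<=0.3:
        a=0.3
    else:
        a=0.45 ## also covers ratios above 0.45, such as 2016 Summer (969/2023, about 0.48)
    return a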
In [23]:
gender.ratio=gender.ratio.apply(data)
In [106]:
gender
Out[106]:
Year Games F M ratio
0 1896 1896 Summer 0.0 101.0 0.00
1 1900 1900 Summer 11.0 457.0 0.15
2 1904 1904 Summer 9.0 413.0 0.15
3 1906 1906 Summer 2.0 305.0 0.15
4 1908 1908 Summer 15.0 750.0 0.15
5 1912 1912 Summer 30.0 904.0 0.15
6 1920 1920 Summer 44.0 1134.0 0.15
7 1924 1924 Summer 44.0 737.0 0.15
8 1924 1924 Winter 6.0 114.0 0.15
9 1928 1928 Summer 92.0 620.0 0.15
10 1928 1928 Winter 6.0 82.0 0.15
11 1932 1932 Summer 62.0 573.0 0.15
12 1932 1932 Winter 6.0 86.0 0.15
13 1936 1936 Summer 88.0 828.0 0.15
14 1936 1936 Winter 9.0 99.0 0.15
15 1948 1948 Summer 100.0 736.0 0.15
16 1948 1948 Winter 15.0 120.0 0.15
17 1952 1952 Summer 140.0 756.0 0.30
18 1952 1952 Winter 18.0 118.0 0.15
19 1956 1956 Summer 143.0 737.0 0.30
20 1956 1956 Winter 27.0 123.0 0.30
21 1960 1960 Summer 156.0 752.0 0.30
22 1960 1960 Winter 39.0 108.0 0.30
23 1964 1964 Summer 200.0 827.0 0.30
24 1964 1964 Winter 46.0 140.0 0.30
25 1968 1968 Summer 215.0 840.0 0.30
26 1968 1968 Winter 46.0 153.0 0.30
27 1972 1972 Summer 250.0 964.0 0.30
28 1972 1972 Winter 45.0 154.0 0.30
29 1976 1976 Summer 389.0 931.0 0.30
30 1976 1976 Winter 51.0 160.0 0.30
31 1980 1980 Summer 434.0 950.0 0.45
32 1980 1980 Winter 51.0 167.0 0.30
33 1984 1984 Summer 494.0 981.0 0.45
34 1984 1984 Winter 54.0 168.0 0.30
35 1988 1988 Summer 554.0 1028.0 0.45
36 1988 1988 Winter 63.0 200.0 0.30
37 1992 1992 Summer 592.0 1120.0 0.45
38 1992 1992 Winter 99.0 219.0 0.45
39 1994 1994 Winter 108.0 223.0 0.45
40 1996 1996 Summer 764.0 1078.0 0.45
41 1998 1998 Winter 189.0 251.0 0.45
42 2000 2000 Summer 880.0 1124.0 0.45
43 2002 2002 Winter 208.0 270.0 0.45
44 2004 2004 Summer 898.0 1103.0 0.45
45 2006 2006 Winter 231.0 295.0 0.45
46 2008 2008 Summer 932.0 1116.0 0.45
47 2010 2010 Winter 229.0 291.0 0.45
48 2012 2012 Summer 918.0 1023.0 0.45
49 2014 2014 Winter 265.0 332.0 0.45
50 2016 2016 Summer 969.0 1054.0 0.45
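The bucketing done by `data` can also be expressed with `pd.cut`. This is an alternative sketch, not what the notebook runs; the bin edges and labels are copied from the function above, and the third ratio is made up:

```python
import numpy as np
import pandas as pd

# 11/468 is the 1900 Summer ratio and 969/2023 the 2016 Summer ratio from the
# table above; 0.0 is 1896 Summer; 0.2 is an invented value.
ratio = pd.Series([0.0, 11 / 468, 0.2, 969 / 2023])

# Right-closed bins: (-inf, 0], (0, 0.15], (0.15, 0.3], (0.3, inf)
bucket = pd.cut(
    ratio,
    bins=[-np.inf, 0, 0.15, 0.3, np.inf],
    labels=[0.0, 0.15, 0.3, 0.45],
).astype(float)  # pd.cut returns a categorical; convert back to numbers

print(bucket.tolist())
```

Keeping the edges in one list makes it easier to see, and to change, where the bucket boundaries lie.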
In [24]:
plt.figure(figsize=(30,20)) 
plt.scatter(gender['M'],gender['F'],s=300) ## one point per Games
plt.xlabel('M',size=20)
plt.ylabel('F',size=20)
plt.title('Medals won by men vs. women', fontproperties=myfont,size=24)
Out[24]:
<Figure size 2160x1440 with 0 Axes>
Out[24]:
<matplotlib.collections.PathCollection at 0x7f5df163f5f8>
Out[24]:
Text(0.5,0,'M')
Out[24]:
Text(0,0.5,'F')
Out[24]:
Text(0.5,1,'Medals won by men vs. women')

3. Country

In [108]:
country = medals.Team.value_counts().reset_index(name='medal').head(10)
country
Out[108]:
index medal
0 United States 5205
1 Soviet Union 2451
2 Germany 1970
3 Great Britain 1629
4 Italy 1521
5 France 1520
6 Sweden 1434
7 Australia 1302
8 Canada 1234
9 Hungary 1126
In [109]:
plt.figure(figsize=(30, 10)) ## must be set before the plot is drawn
country_plot = sns.barplot(x='index', y='medal', data=country) ## in pandas 2.0 and later this column is named 'Team' instead of 'index'
country_plot.set_xlabel("top 10 countries")
country_plot.set_ylabel("number of medals")
plt.title('Medals by country',fontproperties=myfont,size=20)
Out[109]:
<Figure size 2160x720 with 0 Axes>
Out[109]:
Text(0.5,0,'top 10 countries')
Out[109]:
Text(0,0.5,'number of medals')
Out[109]:
Text(0.5,1,'Medals by country')
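The name of the first column produced by `value_counts().reset_index()` depends on the pandas version (`index` in older releases, the original column name from 2.0 on). One way to get the same column names everywhere is to name the index explicitly before resetting it. This is a sketch on made-up team labels:

```python
import pandas as pd

# Invented medal rows, one team label per medal.
teams = pd.Series(['USA', 'USA', 'USA', 'Italy', 'Italy', 'France'], name='Team')

top = (
    teams.value_counts()            # medals per team, in descending order
         .rename_axis('Team')       # name the index explicitly
         .reset_index(name='medal') # index becomes 'Team', counts become 'medal'
         .head(2)
)

print(top)
```

With this variant the bar plot would use `x='Team'` instead of `x='index'`.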

Medals per country by year

In [110]:
## medals per team per year; after aggfunc=len the 'Sport' column holds the row count
team = medals.pivot_table(medals, index=['Year','Team'], aggfunc=len).reset_index()[['Year','Team','Sport']]
us=team[team['Team']=='United States']
su=team[team['Team']=='Soviet Union']
germany=team[team['Team']=='Germany']
britain=team[team['Team']=='Great Britain']
italy=team[team['Team']=='Italy']
In [111]:
team
Out[111]:
Year Team Sport
0 1896 Australia 2
1 1896 Australia/Great Britain 2
2 1896 Austria 5
3 1896 Denmark 6
4 1896 Ethnikos Gymnastikos Syllogos 1
5 1896 France 11
6 1896 Germany 29
7 1896 Great Britain 6
8 1896 Great Britain/Germany 2
9 1896 Greece 6
10 1896 Greece-1 2
11 1896 Hungary 6
12 1896 Switzerland 3
13 1896 United States 20
14 1900 A North American Team 3
15 1900 Amateur Athletic Association 5
16 1900 Aschenbrodel 2
17 1900 Australia 5
18 1900 Austria 6
19 1900 BLO Polo Club, Rugby 4
20 1900 Bagatelle Polo Club, Paris 4
21 1900 Belgium 9
22 1900 Belgium-1 1
23 1900 Bohemia 2
24 1900 Bohemia/Great Britain 2
25 1900 Bona Fide 2
26 1900 Brussels Swimming and Water Polo Club 2
27 1900 Brynhild-2 1
28 1900 Canada 2
29 1900 Carabinier-15 1
... ... ... ...
1958 2016 Portugal 1
1959 2016 Puerto Rico 1
1960 2016 Qatar 1
1961 2016 Romania 16
1962 2016 Russia 113
1963 2016 Russia-2 2
1964 2016 Serbia 54
1965 2016 Singapore 1
1966 2016 Slovakia 8
1967 2016 Slovenia 4
1968 2016 South Africa 23
1969 2016 South Korea 24
1970 2016 South Korea-1 2
1971 2016 Spain 43
1972 2016 Spain-2 2
1973 2016 Sweden 28
1974 2016 Switzerland 11
1975 2016 Tajikistan 1
1976 2016 Thailand 6
1977 2016 Trinidad and Tobago 1
1978 2016 Tunisia 3
1979 2016 Turkey 8
1980 2016 Ukraine 15
1981 2016 United Arab Emirates 1
1982 2016 United States 256
1983 2016 United States-1 4
1984 2016 United States-2 4
1985 2016 Uzbekistan 13
1986 2016 Venezuela 3
1987 2016 Vietnam 2

1988 rows × 3 columns

In [112]:
plt.figure(figsize=(30, 10))
plt.title('Comparison of the top 5 countries over the years')
plt.plot(us.Year, us.Sport, color='green', label='USA medals')
plt.plot(su.Year, su.Sport, color='red', label='SU medals')
plt.plot(germany.Year, germany.Sport, color='skyblue', label='Germany medals')
plt.plot(britain.Year, britain.Sport, color='blue', label='Britain medals')
plt.plot(italy.Year, italy.Sport, color='orange', label='Italy medals') ## a color distinct from the other four lines
plt.legend() # show the legend
plt.xlabel('Year')
plt.ylabel('Medal_counts')
Out[112]:
<Figure size 2160x720 with 0 Axes>
Out[112]:
Text(0.5,1,'Comparison of the top 5 countries over the years')
Out[112]:
[<matplotlib.lines.Line2D at 0x7f7882cf4eb8>]
Out[112]:
[<matplotlib.lines.Line2D at 0x7f7882c8c9e8>]
Out[112]:
[<matplotlib.lines.Line2D at 0x7f7882c8cef0>]
Out[112]:
[<matplotlib.lines.Line2D at 0x7f7882c98390>]
Out[112]:
[<matplotlib.lines.Line2D at 0x7f7882c98860>]
Out[112]:
<matplotlib.legend.Legend at 0x7f7882c8c4a8>
Out[112]:
Text(0.5,0,'Year')
Out[112]:
Text(0,0.5,'Medal_counts')
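Filtering one frame per country works for five teams but becomes repetitive for more. As an alternative sketch, the long table can be reshaped so that each team becomes a column. The six counts below are copied from the team table for the United States and Italy in 1996, 2000, and 2004:

```python
import pandas as pd

# Long format: one row per (Year, Team); 'Sport' holds the medal count.
team = pd.DataFrame({
    'Year':  [1996, 1996, 2000, 2000, 2004, 2004],
    'Team':  ['Italy', 'United States'] * 3,
    'Sport': [71, 255, 65, 240, 104, 259],
})

# Wide format: one row per year, one column per team.
wide = team.pivot(index='Year', columns='Team', values='Sport')

print(wide)
```

Calling `wide.plot()` on such a frame would then draw one line per team with an automatic legend and distinct colors.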
In [113]:
team2=pd.concat([us,su,germany,britain,italy],axis=0) ## stack the five countries into one frame
In [114]:
team2
Out[114]:
Year Team Sport
13 1896 United States 20
90 1900 United States 45
134 1904 United States 186
172 1906 United States 24
224 1908 United States 65
275 1912 United States 106
322 1920 United States 194
360 1924 United States 186
406 1928 United States 92
449 1932 United States 190
498 1936 United States 108
552 1948 United States 149
610 1952 United States 144
665 1956 United States 143
719 1960 United States 145
783 1964 United States 165
836 1968 United States 173
891 1972 United States 195
941 1976 United States 173
992 1980 United States 30
1049 1984 United States 359
1112 1988 United States 212
1198 1992 United States 236
1229 1994 United States 19
1320 1996 United States 255
1356 1998 United States 30
1448 2000 United States 240
1485 2002 United States 70
1570 2004 United States 259
1610 2006 United States 48
... ... ... ...
247 1912 Italy 24
301 1920 Italy 84
346 1924 Italy 48
389 1928 Italy 72
431 1932 Italy 77
473 1936 Italy 70
523 1948 Italy 68
585 1952 Italy 54
642 1956 Italy 45
692 1960 Italy 86
757 1964 Italy 51
812 1968 Italy 35
864 1972 Italy 28
915 1976 Italy 35
969 1980 Italy 38
1022 1984 Italy 65
1081 1988 Italy 37
1157 1992 Italy 64
1212 1994 Italy 26
1276 1996 Italy 71
1342 1998 Italy 15
1407 2000 Italy 65
1471 2002 Italy 19
1532 2004 Italy 104
1594 2006 Italy 21
1662 2008 Italy 42
1736 2010 Italy 5
1805 2012 Italy 68
1869 2014 Italy 14
1936 2016 Italy 70

139 rows × 3 columns

In [115]:
g=sns.FacetGrid(team2,col='Team') ## one panel per team
g.map(plt.plot,'Year','Sport') ## draw medal count against year in each panel
Out[115]:
<seaborn.axisgrid.FacetGrid at 0x7f7882c10b38>