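This case study accompanies Chapter 3, "Initial Data Exploration", of the Pandas data analysis course. It uses an alcohol consumption dataset that records the consumption of several types of alcoholic drinks in 193 countries for a single year, and walks through an initial exploration of it.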

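Contents

1. Data source
 2. Data preprocessing
     2.1 Importing the data
     2.2 Data types and data selection
     2.3 Basic overview of the data
     2.4 Handling missing values
     2.5 Adjusting the index
     2.6 Computing correlations
     2.7 Filtering on multiple conditions
     2.8 Comparing drinking habits across continents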

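1. Data source

In this case study we use data on the consumption of several types of alcoholic drinks in 193 countries for a single year. The main variables of the dataset are:

Variable name	Description
country	Country
beer_servings	Beer consumption (servings)
spirit_servings	Spirit consumption (servings)
wine_servings	Wine consumption (servings)
total_litres_of_pure_alcohol	Total pure alcohol consumption (litres)
continent	Continent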

2. Data preprocessing

2.1 Importing the data

import numpy as np
import pandas as pd

The data file is in txt format, so we read it with read_table.

drink = pd.read_table("./input/drinks.txt")
drink.head()

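Reading with the default parameters goes wrong: read_table assumes a tab separator, so every row ends up in a single column. The fields are actually separated by "|", so we set the sep parameter to "|".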

drink = pd.read_table("./input/drinks.txt",sep="|")

head shows the first few rows of the data, and the parameter n sets the number of rows. Here we view the first ten rows.

drink.head(n=10)

tail shows the last few rows, and again the parameter n sets the number of rows. Here we view the last five rows.

drink.tail(n=5)

2.2 Data types and data selection

2.2.1 Viewing data types with dtypes

We can use dtypes to view the type of each variable in the data.

drink.dtypes

country                          object
beer_servings                   float64
spirit_servings                 float64
wine_servings                   float64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object

dtypes returns a Series. We can confirm this with isinstance, a built-in function that checks whether an object is an instance of a given type.

isinstance(drink.dtypes,pd.Series)

True

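2.2.2 Changing data types with astype

To change the data type of a variable, use the astype method. For example, we convert the float-typed beer consumption to object type. astype returns a new object and leaves drink itself unchanged.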

drink.beer_servings.astype('object')[:5]

0      0
1     89
2     25
3    245
4    217
Name: beer_servings, dtype: object

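2.2.3 Selecting columns of a particular type with select_dtypes

When the data has many variables and we only want to look at those of a particular type, we can use select_dtypes. The include and exclude parameters specify which data types to keep or to leave out. For example, we view only the floating-point columns.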

drink.select_dtypes(include=['float64']).head()
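select_dtypes also accepts generic type names. The following is a minimal sketch on a small made-up frame (the n_obs column and all values are illustrative and not part of the dataset): include="number" picks up integer and float columns together, and exclude does the opposite.

```python
import pandas as pd

# Toy frame; the values and the n_obs column are made up for illustration.
toy = pd.DataFrame({
    "country": ["A", "B"],
    "beer": [89.0, 25.0],
    "n_obs": [3, 7],
})

# "number" matches every numeric dtype (int64, float64, ...).
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
# exclude keeps everything that is NOT of the given type.
non_numeric_cols = toy.select_dtypes(exclude="number").columns.tolist()
# A concrete dtype name selects only that dtype.
float_cols = toy.select_dtypes(include=["float64"]).columns.tolist()

print(numeric_cols, non_numeric_cols, float_cols)
```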

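2.3 Basic overview of the data

2.3.1 Renaming columns with rename

The column names in the middle are rather long, so we shorten them with rename.
We set the inplace parameter to True, which modifies the original object directly instead of creating a new one. Many DataFrame methods have this parameter, and we will see it several more times below.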

drink.rename(columns={'beer_servings':'beer','spirit_servings':'spirit','wine_servings':'wine','total_litres_of_pure_alcohol':'pure_alcohol'},inplace=True)
drink.columns

Index(['country', 'beer', 'spirit', 'wine', 'pure_alcohol', 'continent'], dtype='object')

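2.3.2 A first look at types and size with info

The info method gives a first look at the variable types, the non-null counts (and therefore the missing values), and the memory usage.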

drink.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 193 entries, 0 to 192
Data columns (total 6 columns):
country         193 non-null object
beer            190 non-null float64
spirit          190 non-null float64
wine            190 non-null float64
pure_alcohol    193 non-null float64
continent       170 non-null object
dtypes: float64(4), object(2)
memory usage: 9.1+ KB

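Here we used the default settings of info. When there are many variables, a few parameters can shorten the output. For example, if we are not interested in the missing values, we can set null_counts to False. (In pandas 1.2 and later this parameter is called show_counts, and null_counts was removed in pandas 2.0.)

drink.info(null_counts=False)  # pandas >= 1.2: drink.info(show_counts=False)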

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 193 entries, 0 to 192
Data columns (total 6 columns):
country         object
beer            float64
spirit          float64
wine            float64
pure_alcohol    float64
continent       object
dtypes: float64(4), object(2)
memory usage: 9.1+ KB

If we do not need the details of each variable, we can set verbose to False.

drink.info(verbose=False)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 193 entries, 0 to 192
Columns: 6 entries, country to continent
dtypes: float64(4), object(2)
memory usage: 9.1+ KB

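The first line of the info output might suggest that info also returns a DataFrame. In fact, info prints its summary directly and has no return value. We can check the returned type with the type function.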

type(drink.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 193 entries, 0 to 192
Data columns (total 6 columns):
country         193 non-null object
beer            190 non-null float64
spirit          190 non-null float64
wine            190 non-null float64
pure_alcohol    193 non-null float64
continent       170 non-null object
dtypes: float64(4), object(2)
memory usage: 9.1+ KB

NoneType

As shown, the type is NoneType, which means that info returns None.
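Because info only prints, its text has to be captured explicitly if we want to keep it. A minimal sketch, assuming a small made-up frame: the buf parameter of info redirects the summary into any writable text buffer.

```python
import io
import pandas as pd

# Toy frame with one missing value; values are made up for illustration.
toy = pd.DataFrame({"country": ["A", "B"], "beer": [89.0, None]})

buffer = io.StringIO()
result = toy.info(buf=buffer)       # the summary is written to buffer, not to stdout
summary_text = buffer.getvalue()    # the same text that info would have printed

print(result is None)
print(summary_text)
```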

2.3.3 Viewing summary statistics with describe

describe is another commonly used method for a basic description of the data.

drink.describe()

We can also request specific percentiles through the percentiles parameter.

drink.describe(percentiles=[0.05,0.95])

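By default, describe summarizes only the numeric columns. To summarize columns of particular types, or to leave some types out, set the include or exclude parameter.
For example, to look at the variables of every type other than numeric: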

drink.describe(exclude=np.number)

Note that describe returns a DataFrame, so we can easily extract the summary statistics we need.

type(drink.describe())

pandas.core.frame.DataFrame

des = drink.describe(include="all")
des

des.loc['mean','beer']

107.83684210526316

This is equivalent to:

drink.beer.mean()

107.83684210526316
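The same idea works for whole rows of the summary. The sketch below, on made-up values, pulls the 75% row out of describe as a Series and checks that it matches quantile(0.75).

```python
import pandas as pd

# Toy data; values are made up for illustration.
toy = pd.DataFrame({
    "beer": [0.0, 25.0, 89.0, 217.0, 245.0],
    "wine": [0.0, 14.0, 45.0, 54.0, 312.0],
})

summary = toy.describe()
upper_quartiles = summary.loc["75%"]     # a Series indexed by column name
same_as_quantile = toy.quantile(0.75)    # computed directly, without describe

print(upper_quartiles)
print(summary.loc[["mean", "50%"], "beer"])   # several statistics of one column
```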

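2.4 Handling missing values

Next we handle the missing values. The summaries above show that beer, spirit and wine each have a few missing values (3 each), while continent has many more (23). We extract the rows that contain missing values.

First we use isnull. isnull checks every element of the DataFrame, giving True for a missing value and False otherwise, and returns a DataFrame of the same shape.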

miss = drink.isnull()
miss.head(6)

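To find the rows that contain missing values, we use any with axis set to 1, which returns True for each row that contains at least one True.
Note: to find the rows in which every column is missing, use the all method instead.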

miss.any(axis=1)

0      False
1      False
2      False
3      False
4      False
5       True
6      False
7      False
8      False
9      False
10     False
11      True
12     False
13     False
14      True
15     False
16     False
17      True
18     False
19     False
20     False
21     False
22     False
23     False
24     False
25     False
26     False
27     False
28     False
29     False
       ...  
163    False
164    False
165    False
166    False
167    False
168    False
169    False
170    False
171    False
172    False
173    False
174     True
175    False
176    False
177    False
178    False
179    False
180    False
181    False
182    False
183    False
184     True
185    False
186    False
187    False
188    False
189    False
190    False
191    False
192    False
Length: 193, dtype: bool
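As an aside, the difference between any and all is easy to see on a tiny made-up frame. The sketch below also counts the missing values per column with sum, which is a common companion check.

```python
import numpy as np
import pandas as pd

# Toy frame; values are made up for illustration.
toy = pd.DataFrame({
    "beer": [89.0, np.nan, np.nan],
    "wine": [54.0, 12.0, np.nan],
})
mask = toy.isnull()

rows_with_any_missing = mask.any(axis=1)   # True if at least one value in the row is missing
rows_all_missing = mask.all(axis=1)        # True only if every value in the row is missing
missing_per_column = mask.sum()            # number of missing values in each column

print(rows_with_any_missing.tolist())
print(rows_all_missing.tolist())
print(missing_per_column)
```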

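miss.any(axis=1) thus gives us a boolean index of the rows that have a missing value in some column. Let us look at these rows on their own: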

drink[miss.any(axis=1)]

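As we can see, the rows with a missing continent are all North American countries. A closer look at the raw data shows why: the abbreviation for North America is NA, which is treated as a missing value by default when the file is read. We fill these values back in. (This is a reminder to always investigate why values are missing; otherwise valid data may be lost.)

We set the value parameter to 'NA' as the fill value and assign the result back to the column. Calling fillna with inplace=True on drink.continent is best avoided, because in recent pandas versions it may modify a temporary copy and leave drink unchanged.

drink['continent'] = drink['continent'].fillna(value='NA')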
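An alternative is to stop 'NA' from being converted at read time. The following is a minimal sketch on a made-up three-row excerpt in the same "|"-separated layout (not the real file): keep_default_na=False switches off the default list of missing-value markers, and na_values=[""] keeps only the empty field as missing.

```python
import io
import pandas as pd

# Made-up excerpt in the same "|"-separated layout; not the real file.
raw = "country|beer_servings|continent\nCanada|240|NA\nIran||AS\nAlbania|89|EU\n"

# Default behaviour: the string "NA" is parsed as a missing value.
default_read = pd.read_table(io.StringIO(raw), sep="|")

# Only empty fields count as missing; "NA" is kept as text.
strict_read = pd.read_table(
    io.StringIO(raw), sep="|", keep_default_na=False, na_values=[""]
)

print(default_read)
print(strict_read)
```

Whether this is preferable to filling afterwards depends on the file: it assumes that an empty field is the only way a missing value is written.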

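The missing beer, spirit and wine values all occur together in the same three rows (Iran, Libya and San Marino), so we simply delete these three records.

By default, dropna deletes every row that has a missing value in any column.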

drink.dropna(inplace=True)

drink.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 190 entries, 0 to 192
Data columns (total 6 columns):
country         190 non-null object
beer            190 non-null float64
spirit          190 non-null float64
wine            190 non-null float64
pure_alcohol    190 non-null float64
continent       190 non-null object
dtypes: float64(4), object(2)
memory usage: 10.4+ KB

We now have complete data for 190 countries.
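dropna can also be made more selective. A short sketch on made-up data, showing the default next to the subset and how parameters:

```python
import numpy as np
import pandas as pd

# Toy frame; values are made up for illustration.
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "beer": [89.0, np.nan, 25.0],
    "continent": ["EU", "AS", None],
})

drop_any = toy.dropna()                        # default: drop rows with any missing value
drop_beer_only = toy.dropna(subset=["beer"])   # only consider missing values in beer
drop_all_missing = toy.dropna(how="all")       # drop rows in which every value is missing

print(drop_any)
print(drop_beer_only)
print(len(drop_all_missing))
```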

2.5 Adjusting the index

2.5.1 Viewing the index with index

drink.index

Int64Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
            ...
            183, 184, 185, 186, 187, 188, 189, 190, 191, 192],
           dtype='int64', length=190)

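2.5.2 Adjusting the index with reindex

reindex conforms the data to a new set of labels. For example, we can request the integer labels 0 to 189. Note that this selects rows by label rather than renumbering them: the labels removed by dropna (79, 97 and 147) come back as rows of NaN, and the labels 190 to 192 are left out. To renumber the remaining rows consecutively, use reset_index(drop=True) instead.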

drink.reindex(range(0,190))
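The behaviour of reindex on an index with gaps can be checked on a toy frame. In this sketch (made-up values) the labels 0, 2, 3 stand in for an index from which a row has been dropped.

```python
import pandas as pd

# Toy frame whose labels 0, 2, 3 mimic an index with a dropped row (label 1).
toy = pd.DataFrame({"beer": [89.0, 25.0, 245.0]}, index=[0, 2, 3])

# reindex looks rows up by label: label 1 does not exist and becomes NaN,
# label 3 is not requested and is left out.
conformed = toy.reindex(range(3))

# reset_index(drop=True) keeps all rows and relabels them 0, 1, 2.
renumbered = toy.reset_index(drop=True)

print(conformed)
print(renumbered)
```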

reindex can also be used to reorder the columns; in that case set the axis parameter to 'columns' or 1.

drink.reindex(['beer','spirit','wine','pure_alcohol','country','continent'],axis="columns").head()

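2.5.3 Setting the index with set_index

set_index is another way to change the index. Unlike reindex, set_index takes one or more existing columns of the DataFrame and uses them as the index. For example, we set the index to 'continent'.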

drink.set_index('continent')[:5]

By default, the column used as the index is removed from the data. To keep it as a column as well, set the drop parameter to False.

drink.set_index('continent',drop=False)[:5]

To add a new variable to the existing index and form a hierarchical index, set the append parameter to True.

drink.set_index('continent',append=True)[:5]

We can also pass a list of variables to build a hierarchical index directly.

drink2 = drink.set_index(['country','continent'])
drink2.head()

2.5.4 Reordering index levels with swaplevel and reorder_levels

To change the order of the levels of a hierarchical index, use swaplevel.

drink2.swaplevel()[:5]

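swaplevel exchanges only two levels at a time (by default the two innermost ones). When the index has more than two levels, we need to specify which levels to swap. Let us build a small dataset to illustrate.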

idx = pd.MultiIndex.from_arrays([['a1','a1','a2'],['b1','b2','b2'],['c1','c1','c2'],['d1','d1','d1']])
df = pd.DataFrame([[1,1,1],[2,2,2],[3,3,3]],index=idx)
df

Call the current level order a, b, c, d. To change it to c, b, a, d, we only need to swap the first and third levels.

df.swaplevel(0,2)

To change it to c, d, a, b, however, two swaps are needed.

df.swaplevel(0,2).swaplevel(1,3)

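For multi-level indexes like this, reorder_levels is more convenient because it specifies the order of all levels at once. Again we change the original order to c, d, a, b.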

df.reorder_levels([2,3,0,1])
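reorder_levels also accepts level names, which can be easier to read than positions. The sketch below rebuilds the same small frame with level names a, b, c, d (the names are an addition for this illustration) and checks that two swaps, positional reordering and name-based reordering all give the same index.

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays(
    [["a1", "a1", "a2"], ["b1", "b2", "b2"], ["c1", "c1", "c2"], ["d1", "d1", "d1"]],
    names=["a", "b", "c", "d"],   # level names added for this sketch
)
df = pd.DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3]], index=idx)

by_swaps = df.swaplevel(0, 2).swaplevel(1, 3)       # two pairwise swaps
by_position = df.reorder_levels([2, 3, 0, 1])       # new order given by position
by_name = df.reorder_levels(["c", "d", "a", "b"])   # new order given by name

print(by_name)
```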

2.6 Computing correlations

We use corr to compute the correlation coefficients between the variables.

drink.loc[:,['beer','spirit','wine']].corr()

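As we can see, the consumption of the three drink types is positively correlated throughout. Beer and wine are the most strongly correlated pair, with a coefficient of 0.523; beer and spirit are also fairly correlated, at 0.450. corr computes the Pearson correlation coefficient by default, and other coefficients can be requested through the method parameter.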

drink.loc[:,['beer','spirit','wine']].corr(method="spearman")

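Compared with the default Pearson coefficients, the Spearman coefficients of the three drink types are larger. The Pearson coefficient is computed from the sample covariance and variances, so it measures linear association and is sensitive to skewed data and extreme values; it is best suited to roughly normally distributed data. The describe output above shows that this data is strongly right-skewed (for wine, the mean is 50.2 while the median is 9), so that assumption does not hold. The Spearman rank correlation is computed from the ranks of the data, measures monotonic association and makes no assumption about the distribution, so it is the better choice here.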
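The relationship between the two coefficients can be illustrated on made-up right-skewed data: the Spearman coefficient is the Pearson coefficient computed on the ranks, so a monotonic but non-linear relationship scores 1 under Spearman and less than 1 under Pearson. This is a sketch on toy values, not a statement about the drinks data.

```python
import numpy as np
import pandas as pd

# Made-up right-skewed data: y grows monotonically but not linearly with x.
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 100.0]})
toy["y"] = np.sqrt(toy["x"])

pearson = toy.corr()                      # method="pearson" is the default
spearman = toy.corr(method="spearman")    # rank-based correlation
pearson_of_ranks = toy.rank().corr()      # Pearson computed on the ranks

print(pearson.loc["x", "y"])
print(spearman.loc["x", "y"])
```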

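2.7 Filtering on multiple conditions

We look for the countries whose beer, spirit and wine consumption are all above the 75th percentile of the respective drink type.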

alcohol=drink.loc[(drink.beer>des.loc['75%','beer'])&(drink.wine>des.loc['75%','wine'])&(drink.spirit>des.loc['75%','spirit'])]
alcohol

alcohol.continent.value_counts()

EU    11
AS     1
NA     1
Name: continent, dtype: int64

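Of the 13 countries whose consumption of all three drink types is above the 75th percentile, 11 are European, and the only Asian country on the list is Russia (which could arguably be counted as European as well).

Note that the parentheses around each of the three comparisons (drink.xx > des.loc[xx, xx]) cannot be omitted. For example, the following code raises an error: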

alcohol=drink.loc[drink.beer>des.loc['75%','beer'] & drink.wine>des.loc['75%','wine']]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/explorer/pyenv/jupyter-py36/lib/python3.6/site-packages/pandas/core/ops.py in na_op(x, y)
   1303         try:
-> 1304             result = op(x, y)
   1305         except TypeError:

/explorer/pyenv/jupyter-py36/lib/python3.6/site-packages/pandas/core/ops.py in rand_(left, right)
    148 def rand_(left, right):
--> 149     return operator.and_(right, left)
    150 

TypeError: ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/explorer/pyenv/jupyter-py36/lib/python3.6/site-packages/pandas/core/ops.py in na_op(x, y)
   1320                 try:
-> 1321                     result = libops.scalar_binop(x, y, op)
   1322                 except:

pandas/_libs/ops.pyx in pandas._libs.ops.scalar_binop()

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'double'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-149-9e99de3a280e> in <module>()
----> 1 alcohol=drink.loc[drink.beer>des.loc['75%','beer'] & drink.wine>des.loc['75%','wine']]

/explorer/pyenv/jupyter-py36/lib/python3.6/site-packages/pandas/core/ops.py in wrapper(self, other)
   1358                       is_integer_dtype(np.asarray(other)) else fill_bool)
   1359 
-> 1360             res_values = na_op(self.values, other)
   1361             unfilled = self._constructor(res_values, index=self.index)
   1362             return filler(unfilled).__finalize__(self)

/explorer/pyenv/jupyter-py36/lib/python3.6/site-packages/pandas/core/ops.py in na_op(x, y)
   1324                                     "with a scalar of type [{typ}]"
   1325                                     .format(dtype=x.dtype,
-> 1326                                             typ=type(y).__name__))
   1327 
   1328         return result

TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool]

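This is because "&" has higher precedence than ">". Without parentheses, Python first evaluates des.loc['75%','beer'] & drink.wine, whereas we need each ">" comparison to be evaluated before "&". The same care is needed when combining "|" with comparisons such as "<" and "==".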
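A minimal sketch of the safe patterns, on made-up values and arbitrary thresholds: either parenthesize every comparison, or use the comparison methods (gt, lt, eq and so on), which need no extra parentheses.

```python
import pandas as pd

# Toy data and arbitrary thresholds, for illustration only.
toy = pd.DataFrame({"beer": [245.0, 25.0, 231.0], "wine": [312.0, 14.0, 1.0]})
beer_cut, wine_cut = 100.0, 50.0

# Parentheses force each comparison to be evaluated before "&".
with_parentheses = toy[(toy.beer > beer_cut) & (toy.wine > wine_cut)]

# Method calls bind tighter than "&", so no parentheses are needed.
with_methods = toy[toy.beer.gt(beer_cut) & toy.wine.gt(wine_cut)]

print(with_parentheses)
```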

2.8 Comparing drinking habits across continents

To make selection easier, we set continent as the new index.

drink.set_index(keys=["continent"],inplace=True)
drink.head()

Next we compute the total consumption of each drink type in each continent.

First we extract the continent names.

continent = drink.index.unique()
continent

Index(['AS', 'EU', 'AF', 'NA', 'SA', 'OC'], dtype='object', name='continent')

We use the values attribute to convert the result of sum into an array.

summarize = drink.loc['AS',['beer','spirit','wine']].sum().values
summarize

array([1630., 2677.,  399.])

We loop over the remaining continents, stack the results with vstack, and convert the result into a DataFrame.

for name in continent[1:]:
    summarize = np.vstack((summarize,drink.loc[name,['beer','spirit','wine']].sum().values))
summarize = pd.DataFrame(data=summarize,columns=['beer','spirit','wine'],index=continent)
summarize

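As expected, Europe has the highest total consumption of all three drink types. Its wine consumption in particular exceeds that of the other five continents combined.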
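The per-continent totals built with the loop and vstack can also be obtained in a single step with groupby. The sketch below uses a made-up frame indexed by continent, mirroring the layout of drink at this point; sort=False keeps the continents in order of first appearance, as the loop does.

```python
import pandas as pd

# Toy frame indexed by continent; values are made up for illustration.
toy = pd.DataFrame(
    {
        "beer": [89.0, 245.0, 25.0, 0.0],
        "spirit": [132.0, 138.0, 0.0, 0.0],
        "wine": [54.0, 312.0, 14.0, 0.0],
    },
    index=pd.Index(["EU", "EU", "AF", "AS"], name="continent"),
)

# Group the rows by the index level and sum each drink column within each group.
totals = toy.groupby(level="continent", sort=False)[["beer", "spirit", "wine"]].sum()

print(totals)
```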

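Next we compute each drink type's share of each continent's total consumption, to see which drinks each continent prefers.
First we sum across the columns within each row (axis=1) to get the total consumption of each continent.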

summarize.sum(axis=1)

continent
AS     4706.0
EU    21085.0
AF     4986.0
NA     7721.0
SA     4227.0
OC     2940.0
dtype: float64

We then divide row by row: Pandas broadcasts the $6\times1$ totals to $6\times3$ and divides element by element.

summarize.div(summarize.sum(axis=1),axis=0)

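As we can see, Asia and North America prefer spirits, while the other four continents prefer beer. Wine has the smallest share in every continent except Europe, where it accounts for about 30% and slightly exceeds spirits.

Note that above we used div with the axis parameter set to 0; using "/" directly does not work: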

summarize/summarize.sum(axis=1)

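This is because "/" between a DataFrame and a Series aligns the index of the Series with the columns of the DataFrame. Here the Series is indexed by continent while the columns are beer, spirit and wine, so no labels match and every value in the result is NaN. To use "/", we first transpose summarize so that the continents become the columns; the result is the transposed share table: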

summarize.T/summarize.sum(axis=1)
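The three variants can be compared on a small made-up table. This sketch shows that div with axis=0 gives row shares, that "/" alone produces only NaN because of the label mismatch, and that transposing before and after the division recovers the same result as div.

```python
import pandas as pd

# Toy totals for two continents; values are made up for illustration.
toy = pd.DataFrame(
    {"beer": [10.0, 30.0], "wine": [30.0, 10.0]},
    index=pd.Index(["AS", "EU"], name="continent"),
)
row_totals = toy.sum(axis=1)

shares = toy.div(row_totals, axis=0)      # align row_totals with the row labels
misaligned = toy / row_totals             # aligns row_totals with the column labels instead
via_transpose = (toy.T / row_totals).T    # transpose, divide, transpose back

print(shares)
print(misaligned)
```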

	country|beer_servings|spirit_servings|wine_servings|total_litres_of_pure_alcohol|continent
0	Afghanistan|0|0|0|0|AS
1	Albania|89|132|54|4.9|EU
2	Algeria|25|0|14|0.7|AF
3	Andorra|245|138|312|12.4|EU
4	Angola|217|57|45|5.9|AF

	country	beer_servings	spirit_servings	wine_servings	total_litres_of_pure_alcohol	continent
0	Afghanistan	0.0	0.0	0.0	0.0	AS
1	Albania	89.0	132.0	54.0	4.9	EU
2	Algeria	25.0	0.0	14.0	0.7	AF
3	Andorra	245.0	138.0	312.0	12.4	EU
4	Angola	217.0	57.0	45.0	5.9	AF
5	Antigua & Barbuda	102.0	128.0	45.0	4.9	NaN
6	Argentina	193.0	25.0	221.0	8.3	SA
7	Armenia	21.0	179.0	11.0	3.8	EU
8	Australia	261.0	72.0	212.0	10.4	OC
9	Austria	279.0	75.0	191.0	9.7	EU

	beer_servings	spirit_servings	wine_servings	total_litres_of_pure_alcohol
0	0.0	0.0	0.0	0.0
1	89.0	132.0	54.0	4.9
2	25.0	0.0	14.0	0.7
3	245.0	138.0	312.0	12.4
4	217.0	57.0	45.0	5.9

	beer	spirit	wine	pure_alcohol
count	190.000000	190.000000	190.000000	193.000000
mean	107.836842	82.273684	50.231579	4.717098
std	101.047475	88.385871	80.081830	3.773298
min	0.000000	0.000000	0.000000	0.000000
25%	21.000000	5.000000	1.000000	1.300000
50%	76.500000	58.500000	9.000000	4.200000
75%	191.000000	130.250000	61.250000	7.200000
max	376.000000	438.000000	370.000000	14.400000

	beer	spirit	wine	pure_alcohol
count	190.000000	190.000000	190.000000	193.000000
mean	107.836842	82.273684	50.231579	4.717098
std	101.047475	88.385871	80.081830	3.773298
min	0.000000	0.000000	0.000000	0.000000
5%	0.000000	0.000000	0.000000	0.000000
50%	76.500000	58.500000	9.000000	4.200000
95%	296.100000	253.100000	227.600000	11.340000
max	376.000000	438.000000	370.000000	14.400000

	country	beer_servings	spirit_servings	wine_servings	total_litres_of_pure_alcohol	continent
188	Venezuela	333.0	100.0	3.0	7.7	SA
189	Vietnam	111.0	2.0	1.0	2.0	AS
190	Yemen	6.0	0.0	0.0	0.1	AS
191	Zambia	32.0	19.0	4.0	2.5	AF
192	Zimbabwe	64.0	18.0	4.0	4.7	AF

	country	beer	spirit	wine	pure_alcohol	continent
0	False	False	False	False	False	False
1	False	False	False	False	False	False
2	False	False	False	False	False	False
3	False	False	False	False	False	False
4	False	False	False	False	False	False
5	False	False	False	False	False	True

	beer	spirit	wine
beer	1.000000	0.450484	0.523014
spirit	0.450484	1.000000	0.187589
wine	0.523014	0.187589	1.000000

	beer	spirit	wine
beer	1.000000	0.604221	0.704716
spirit	0.604221	1.000000	0.536344
wine	0.704716	0.536344	1.000000

	beer	spirit	wine
continent
AS	1630.0	2677.0	399.0
EU	8720.0	5965.0	6400.0
AF	3258.0	866.0	862.0
NA	3345.0	3812.0	564.0
SA	2101.0	1377.0	749.0
OC	1435.0	935.0	570.0

	beer	spirit	wine
continent
AS	0.346366	0.568848	0.084785
EU	0.413564	0.282903	0.303533
AF	0.653430	0.173686	0.172884
NA	0.433234	0.493718	0.073048
SA	0.497043	0.325763	0.177194
OC	0.488095	0.318027	0.193878

	country	beer	spirit	wine	pure_alcohol	continent
5	Antigua & Barbuda	102.0	128.0	45.0	4.9	NaN
11	Bahamas	122.0	176.0	51.0	6.3	NaN
14	Barbados	143.0	173.0	36.0	6.3	NaN
17	Belize	263.0	114.0	8.0	6.8	NaN
32	Canada	240.0	122.0	100.0	8.2	NaN
41	Costa Rica	149.0	87.0	11.0	4.4	NaN
43	Cuba	93.0	137.0	5.0	4.2	NaN
50	Dominica	52.0	286.0	26.0	6.6	NaN
51	Dominican Republic	193.0	147.0	9.0	6.2	NaN
54	El Salvador	52.0	69.0	2.0	2.2	NaN
68	Grenada	199.0	438.0	28.0	11.9	NaN
69	Guatemala	53.0	69.0	2.0	2.2	NaN
73	Haiti	1.0	326.0	1.0	5.9	NaN
74	Honduras	69.0	98.0	2.0	3.0	NaN
79	Iran	NaN	NaN	NaN	0.0	AS
84	Jamaica	82.0	97.0	9.0	3.4	NaN
97	Libya	NaN	NaN	NaN	0.0	AF
109	Mexico	238.0	68.0	5.0	5.5	NaN
122	Nicaragua	78.0	118.0	1.0	3.5	NaN
130	Panama	285.0	104.0	18.0	7.2	NaN
143	St. Kitts & Nevis	194.0	205.0	32.0	7.7	NaN
144	St. Lucia	171.0	315.0	71.0	10.1	NaN
145	St. Vincent & the Grenadines	120.0	221.0	11.0	6.3	NaN
147	San Marino	NaN	NaN	NaN	0.0	EU
174	Trinidad & Tobago	197.0	156.0	7.0	6.4	NaN
184	USA	249.0	158.0	84.0	8.7	NaN

	country	beer	spirit	wine	pure_alcohol	continent
0	Afghanistan	0.0	0.0	0.0	0.0	AS
1	Albania	89.0	132.0	54.0	4.9	EU
2	Algeria	25.0	0.0	14.0	0.7	AF
3	Andorra	245.0	138.0	312.0	12.4	EU
4	Angola	217.0	57.0	45.0	5.9	AF
5	Antigua & Barbuda	102.0	128.0	45.0	4.9	NA
6	Argentina	193.0	25.0	221.0	8.3	SA
7	Armenia	21.0	179.0	11.0	3.8	EU
8	Australia	261.0	72.0	212.0	10.4	OC
9	Austria	279.0	75.0	191.0	9.7	EU
10	Azerbaijan	21.0	46.0	5.0	1.3	EU
11	Bahamas	122.0	176.0	51.0	6.3	NA
12	Bahrain	42.0	63.0	7.0	2.0	AS
13	Bangladesh	0.0	0.0	0.0	0.0	AS
14	Barbados	143.0	173.0	36.0	6.3	NA
15	Belarus	142.0	373.0	42.0	14.4	EU
16	Belgium	295.0	84.0	212.0	10.5	EU
17	Belize	263.0	114.0	8.0	6.8	NA
18	Benin	34.0	4.0	13.0	1.1	AF
19	Bhutan	23.0	0.0	0.0	0.4	AS
20	Bolivia	167.0	41.0	8.0	3.8	SA
21	Bosnia-Herzegovina	76.0	173.0	8.0	4.6	EU
22	Botswana	173.0	35.0	35.0	5.4	AF
23	Brazil	245.0	145.0	16.0	7.2	SA
24	Brunei	31.0	2.0	1.0	0.6	AS
25	Bulgaria	231.0	252.0	94.0	10.3	EU
26	Burkina Faso	25.0	7.0	7.0	4.3	AF
27	Burundi	88.0	0.0	0.0	6.3	AF
28	Cote d'Ivoire	37.0	1.0	7.0	4.0	AF
29	Cabo Verde	144.0	56.0	16.0	4.0	AF
...	...	...	...	...	...	...
160	Spain	284.0	157.0	112.0	10.0	EU
161	Sri Lanka	16.0	104.0	0.0	2.2	AS
162	Sudan	8.0	13.0	0.0	1.7	AF
163	Suriname	128.0	178.0	7.0	5.6	SA
164	Swaziland	90.0	2.0	2.0	4.7	AF
165	Sweden	152.0	60.0	186.0	7.2	EU
166	Switzerland	185.0	100.0	280.0	10.2	EU
167	Syria	5.0	35.0	16.0	1.0	AS
168	Tajikistan	2.0	15.0	0.0	0.3	AS

	country	beer	spirit	wine	pure_alcohol	continent
3	Andorra	245.0	138.0	312.0	12.4	EU
25	Bulgaria	231.0	252.0	94.0	10.3	EU
44	Cyprus	192.0	154.0	113.0	8.2	EU
45	Czech Republic	361.0	170.0	134.0	11.8	EU
60	Finland	263.0	133.0	97.0	10.0	EU
75	Hungary	234.0	215.0	185.0	11.3	EU
93	Latvia	281.0	216.0	62.0	10.5	EU
99	Luxembourg	236.0	133.0	271.0	11.4	EU
141	Russian Federation	247.0	326.0	73.0	11.5	AS
151	Serbia	283.0	131.0	127.0	9.6	EU
155	Slovakia	196.0	293.0	116.0	11.4	EU
160	Spain	284.0	157.0	112.0	10.0	EU
184	USA	249.0	158.0	84.0	8.7	NA