
2011-07-09

Index decomposition with R

A few days ago, I finally finished a small package, ida. It lets you analyse how underlying factors contribute to the change in an aggregate, using methods based on index number theory. These methods became popular for investigating changes in CO2 emissions, but they are not restricted to that application.

Here is a chart that shows how changes in population, welfare, efficiency and fuel substitution contributed to the change in CO2 emissions:

The numbers refer to Gt CO2. The data comes from the World Bank; however, missing values were treated rather carelessly here, so the result may or may not be valid. Still, it puts the efforts in perspective: clearly, the reduction of energy use per GDP has not been able to compensate for the additional emissions from population growth and growth in income per capita. The carbon intensity, i.e. the emissions per energy unit, remained nearly unchanged.

Here is how to produce the chart:

First, you need the package ida and, for this example, its suggested dependencies. The ida package is not yet on CRAN, but it is available from a self-hosted repository. Here is how to install it:

> install.packages("ida", repos = c(getOption("repos"), 
+ "http://userpage.fu-berlin.de/~kweinert/R"),
+ dependencies = c("Depends", "Suggests"))

For Windows users, only an R 2.13 binary is available; users of other versions need to add the type = "source" argument.

Next, we load the example data. We represent the CO2 emissions as a product

C(t) = p(t) * w(t) * i(t) * s(t)

with

  • p(t), the world’s population
  • w(t), the income per person, i.e. the GDP per capita
  • i(t), the energy intensity, i.e. the energy per GDP
  • s(t), the carbon intensity, i.e. the CO2 emissions per energy used

This decomposition of CO2 emissions is attributed to a researcher named Kaya and is commonly called the Kaya identity. The country-level data for these factors ships with the ida package and can be aggregated with the transform_kaya function:

> world <- transform_kaya("world")
> world
          variable     Y1990     Y1991     Y1992     Y1993
1       population 5.2527625 5.3367567 5.4173742 5.4988154
2         gdp_pcap 6.7069964 6.6873499 6.7029494 6.7184605
3 energy_intensity 0.2405942 0.2397186 0.2351846 0.2331813
4 carbon_intensity 1.9561772 2.0998782 2.5575150 2.5694175
      Y1994     Y1995     Y1996     Y1997     Y1998
1 5.5792484 5.6611426 5.7409887 5.8208185 5.8997860
2 6.8261435 6.9418972 7.0965323 7.2881507 7.3668336
3 0.2277819 0.2262243 0.2238883 0.2166251 0.2124994
4 2.5612014 2.5424658 2.5441399 2.5385912 2.4862305
     Y1999     Y2000     Y2001    Y2002     Y2003     Y2004
1 5.978254 6.0559912 6.1329413 6.209233 6.2854260 6.3613650
2 7.541670 7.7996127 7.8757705 7.986565 8.1649870 8.4642982
3 0.208676 0.2033032 0.1995896 0.198509 0.1987728 0.1982568
4 2.442819 2.4734731 2.4804332 2.471007 2.4972474 2.5050409
      Y2005     Y2006     Y2007
1 6.4371665 6.5133449 6.5899233
2 8.7438915 9.0785423 9.4397475
3 0.1943548 0.1897489 0.1850416
4 2.5204078 2.5227782 2.5390367
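
As a quick sanity check of the identity, the four factor rows can be multiplied to recover the emissions. The following is a minimal sketch in base R that does not need the ida package. The values are copied by hand from the Y1995 and Y2007 columns of the output above, and the vector names are only illustrative.

```r
# Factor values copied from the Y1995 and Y2007 columns printed above
y1995 <- c(population = 5.6611426, gdp_pcap = 6.9418972,
           energy_intensity = 0.2262243, carbon_intensity = 2.5424658)
y2007 <- c(population = 6.5899233, gdp_pcap = 9.4397475,
           energy_intensity = 0.1850416, carbon_intensity = 2.5390367)

# Kaya identity: emissions are the product of the four factors
emissions <- c(Y1995 = prod(y1995), Y2007 = prod(y2007))
print(round(emissions, 2))
```

Up to rounding of the copied inputs, the two products should agree with the index values that ida reports for the same years further down.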

Now we can calculate the contribution of the four factors. Currently, two methods are implemented: the additive Laspeyres decomposition, which usually leaves a residual term of unexplained change, and the log-mean Divisia index I (LMDI-I) decomposition, which relies on logarithms and is therefore not well defined for negative factors. More details on these and other methods can be found in Alexia van der Cruisse de Waizier’s thesis. We choose the latter method here:

> lmdi <- ida(world, effect = "variable", from = "Y1995", 
+ to = "Y2007", method = "lmdi1")
> lmdi
Method: lmdi1 (additive)

Index Values:
   Y1995    Y2007
22.60355 29.22665

Decomposition results:
        variable         Y1995_Y2007
      population    3.91539472647059
        gdp_pcap    7.92156787903755
energy_intensity   -5.17907244220765
carbon_intensity -0.0347851393004786
    Total change    6.62310502400001
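
To make the two methods more concrete, here is a minimal base-R sketch of the textbook additive LMDI-I and additive Laspeyres formulas, applied to the 1995 and 2007 values of the table above. It does not use the ida package, and it is not necessarily how ida computes its results internally. The helper log_mean and all variable names are made up for this illustration, and the inputs are copied by hand from the printed (rounded) data.

```r
# Factor values copied from the Y1995 and Y2007 columns of `world`
x0 <- c(population = 5.6611426, gdp_pcap = 6.9418972,
        energy_intensity = 0.2262243, carbon_intensity = 2.5424658)
xT <- c(population = 6.5899233, gdp_pcap = 9.4397475,
        energy_intensity = 0.1850416, carbon_intensity = 2.5390367)

C0 <- prod(x0)  # emissions in the base year
CT <- prod(xT)  # emissions in the final year

# Logarithmic mean of two positive numbers (hypothetical helper)
log_mean <- function(a, b) {
  if (isTRUE(all.equal(a, b))) a else (a - b) / (log(a) - log(b))
}

# Additive LMDI-I: each factor gets the log mean of the emissions
# times the log of its own growth ratio; the effects sum to CT - C0
lmdi_effects <- log_mean(CT, C0) * log(xT / x0)

# Additive Laspeyres: change one factor at a time while holding the
# others at their base-year values; interaction terms remain as residual
laspeyres_effects <- sapply(names(x0), function(k) {
  (xT[[k]] - x0[[k]]) * prod(x0[names(x0) != k])
})
laspeyres_residual <- (CT - C0) - sum(laspeyres_effects)

print(round(cbind(lmdi1 = lmdi_effects, laspeyres = laspeyres_effects), 4))
print(c(lmdi_residual = (CT - C0) - sum(lmdi_effects),
        laspeyres_residual = laspeyres_residual))
```

The LMDI-I effects should match the ida output above up to rounding of the inputs, and they add up to the total change without a residual. The Laspeyres effects leave a non-zero residual, which is the unexplained change mentioned earlier.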

The package has more features; for instance, aggregating over up to two levels is supported. This is a topic for a later post.
