Anton From Perm

Just another weblog

Archive for the ‘Data analysis’ Category

Centers of gravity in Russia

with one comment

GDP per capita 2007

GDP per capita, 2007, in percentage of Russia=100

A >125
D 50-75
E <50
Above is a choropleth regional map of Russia, expressing each region’s GRP relative to the Russian average. The most interesting point is how Moscow seems to soak up all resources and population within a radius of several hundred kilometers.
It reminds me of Volkswagen placing an industrial park in Kaluga, only to realize later that it could not find enough workers at the wage VW was willing to pay. The locals preferred to drive three hours each day to Moscow in order to earn three times as much 🙂

Written by antonfromperm

January 23, 2010 at 9:29 pm

Doing Business in Russia

with 2 comments

The World Bank published a Doing Business in Russia report several weeks ago. It focuses on 4 main indicators: starting a business, getting a construction permit (which is a total nightmare in Russia 🙂 ), registering property and trading across borders. Each indicator is measured by the number of steps/documents, the time to complete the procedures and the costs as a percentage of national income. Strangely, the World Bank didn’t publish a simulator as it did for other countries (China, Mexico, etc.). Get the simulator as an Excel spreadsheet here (I use exactly the same methodology and formulas as the World Bank does in other reports).

A few words about the methodology:

The index is calculated as the simple average of a city’s percentile ranking on each of the 4 topics covered in the study (starting a business, getting a construction permit, registering property & trading abroad). The ranking on each topic is in turn the simple average of the percentile ranking on its component indicators. [The percentile rank is the percentage of values below (<) or less than or equal to (<=) a given value, depending on the definition.]

For example, in Moscow it takes 9 procedures, 30 days and 2.7% of annual income per capita to open a business. The minimum capital requirement amounts to 2.2% of annual income per capita. On these 4 component indicators, Moscow ranks in the 0th (best), 67th, 100th (worst) and 0th percentile, so on average it ranks in the 42nd percentile for starting a business. It ranks in the 96th percentile on dealing with construction permits, the 44th percentile on registering property and the 67th percentile on trading across borders. The average of Moscow’s four percentile rankings is 62%. If you now order all cities by their average percentile rank in ascending order, Moscow gets the 10th and last place.
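The arithmetic behind Moscow’s score can be retraced in a few lines of Python. This is only a sketch of the "simple average of simple averages" described above, using the percentile ranks quoted in the text; the variable names are mine, not the World Bank’s. Note that the four component ranks for starting a business average to about 42, and that is the value which reproduces the overall 62%.

```python
from statistics import mean

# Moscow's percentile ranks as quoted in the text (0 = best, 100 = worst).
# Order: procedures, days, cost, minimum capital requirement.
starting_a_business = [0, 67, 100, 0]

# Topic-level ranks: the first is computed, the other three are quoted.
topic_ranks = [
    mean(starting_a_business),  # starting a business
    96,                         # dealing with construction permits
    44,                         # registering property
    67,                         # trading across borders
]

overall = mean(topic_ranks)

print(round(topic_ranks[0]))  # 42
print(round(overall))         # 62
```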

However, percentiles are totally meaningless in small samples, especially when the observations are nearly identical and you get “tied ranks” while ordering the data. When you have ten values, of which eight are the same, you might have to assign the 0th or the 100th percentile to the highest or the lowest value respectively (some statisticians argue that the 0th and 100th percentiles cannot be determined in a finite sample). The remaining ones could lie anywhere in a range from the 20th to the 80th percentile, depending on how you define the percentile rank.
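To see how much the definition matters, here is a small sketch comparing the two textbook definitions mentioned earlier ("below" versus "less than or equal") on a made-up sample of ten values with eight ties. The sample and the function name `percentile_rank` are purely illustrative; I do not know which formula the World Bank actually applied.

```python
def percentile_rank(values, x, inclusive=False):
    """Percentage of values below x (strict) or at most x (inclusive)."""
    if inclusive:
        count = sum(1 for v in values if v <= x)
    else:
        count = sum(1 for v in values if v < x)
    return 100.0 * count / len(values)

# Hypothetical sample: ten cities, eight of which share the same value.
sample = [5, 9, 9, 9, 9, 9, 9, 9, 9, 12]

for x in (5, 9, 12):
    strict = percentile_rank(sample, x)
    weak = percentile_rank(sample, x, inclusive=True)
    print(x, strict, weak)
# 5 0.0 10.0
# 9 10.0 90.0
# 12 90.0 100.0
```

With these numbers the eight tied cities land in the 10th percentile under one definition and in the 90th under the other, which is exactly the kind of swing that makes an average of percentile ranks fragile in a sample of ten.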

Now comes the problem: my calculations and the World Bank report’s result do not match, despite using exactly the same data and method. Let’s assume (out of goodwill) that the WB used a different method for calculating percentiles. I will post an update as soon as I get an answer from them.

Written by antonfromperm

December 3, 2009 at 3:27 pm

How to make a Russian regional thematic map

with 5 comments

Nathan at Flowing Data explained how to create a US county thematic map using free tools, and it helped me immensely in doing the same for Russian regions.

The Result

Inward FDI Performance Index 2005-2007

Step 1: Get the blank map

Unfortunately, we can’t take the blank Russia SVG map from Wikimedia Commons. First, it doesn’t take into account the latest mergers in 2008 (Irkutsk Oblast + Ust-Orda Buryat Autonomous Okrug = Irkutsk Oblast, Chita Oblast + Agin-Buryat Autonomous Okrug = Zabaykalsky Krai). Second, the regions aren’t identified in the SVG file by their ISO_3166-2:RU code, but by their concatenated transliterated names. Third, the Federal Service of State Statistics doesn’t publish figures for several regions (like Yamalo-Nenets Autonomous Okrug) because they fall under the administrative jurisdiction of other federal subjects of Russia (in this case Tyumen Oblast). It means that we have to group these regions together so that they can “inherit” the data visualization from their respective jurisdiction.

Get the new map I just uploaded to Wikimedia Commons here.

Step 2: Run the Python script

You can simply follow Nathan’s blog post and write it yourself, or use mine instead. I wrote it because I am too lazy to rewrite the script each time I want to colorize a map with different parameters.

Get it here (don’t forget to get BeautifulSoup first):

Sample usage: python -d Inward_FDI_Performance_Index_2005_2007.csv -c RdYlBu6.dat -l IFPI.dat -i Russia2009blank.svg -o IFPI.svg


  • -d: Region-specific data. Remove the headers and save it as a CSV file. Put the ISO3166-2:RU code into the first column, and the value into the second column.
  • -c: Color scheme. It’s a simple text file containing the hexadecimal colors in reverse order, e.g. #D73027 #FC8D59 #FEE090 #E0F3F8 #91BFDB #4575B4
    Use ColorBrewer to help you select the colors to use.
  • -l: Legend file, e.g. 1 0.5 0.25 0.10 0 -0.1. The script will assign the last color in your list if the data point is above the first value in your legend file, the second-to-last color if it is above the second value, and so on.
  • -i: Input SVG file, i.e. the blank map
  • -o: Output SVG file
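The way the legend file and the color file pair up is easier to see in code. The following is a minimal sketch of that lookup, not the script itself: the function name `pick_color` is made up, and returning `None` for values below every threshold is my assumption, since that case is not described above.

```python
def pick_color(value, legend, colors):
    """Return the color for value, or None if it is below every threshold.

    legend: thresholds in descending order, as in the -l file.
    colors: hex colors in reverse order, as in the -c file.
    """
    for i, threshold in enumerate(legend):
        if value > threshold:
            # First threshold -> last color, second -> second to last, ...
            return colors[len(colors) - 1 - i]
    return None  # assumption: behavior below the last threshold is unspecified

# The example legend and color scheme from the option list above.
legend = [1, 0.5, 0.25, 0.10, 0, -0.1]
colors = ["#D73027", "#FC8D59", "#FEE090", "#E0F3F8", "#91BFDB", "#4575B4"]

print(pick_color(1.7, legend, colors))    # #4575B4 (above 1)
print(pick_color(0.3, legend, colors))    # #E0F3F8 (between 0.25 and 0.5)
print(pick_color(-0.05, legend, colors))  # #D73027 (between -0.1 and 0)
```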


Written by antonfromperm

November 25, 2009 at 12:02 am

Easily unpivot data using Python

with one comment

Denormalized data in multiple columns

I was looking for an easy way to unpivot data, i.e. expand values from multiple columns in a single record into multiple records with the same values in a single column.

Normalized data in multiple records

Since the only tool available was SQL Server/SSIS, I wrote a short script in Python (less than 50 lines of code) to easily unpivot CSV data.

Get the code here:


  • -v Verbose
  • -i Input file
  • -o Output file
  • -c Number of columns to be “frozen”, the default is one

Sample usage:
python -v -i GRP.csv -o GRPunpivot.csv -c 1
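
For readers who just want the idea, here is a minimal sketch of the unpivot step using only the standard `csv` module. It is not the script linked above: the function name `unpivot`, its `frozen` parameter (mirroring the -c option) and the tiny input table with placeholder regions and numbers are all made up for illustration.

```python
import csv
import io

def unpivot(rows, frozen=1):
    """Yield one record per value column: frozen cells + column name + value."""
    rows = iter(rows)
    header = next(rows)  # the first row holds the column names
    for row in rows:
        for name, value in zip(header[frozen:], row[frozen:]):
            yield row[:frozen] + [name, value]

# Hypothetical input: one region per row, one column per year.
wide = io.StringIO("Region,2006,2007\nRegion A,10,12\nRegion B,50,55\n")

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(unpivot(csv.reader(wide)))
print(out.getvalue())
# Region A,2006,10
# Region A,2007,12
# Region B,2006,50
# Region B,2007,55
```

Because `unpivot` is a generator, it processes one input row at a time and never holds the whole file in memory.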

Written by antonfromperm

November 23, 2009 at 11:16 pm