German cities list

What: Extract a list of German cities and their federal states from Wikipedia
Why: Get a list of German cities for text processing
How: Using BeautifulSoup, Requests and Python

Introduction

Wikipedia contains a list of German cities and towns. The list is published as HTML and has to be parsed before it can be processed automatically. For each city, the page also names the federal state (Bundesland) it belongs to.

Code

Below is the Python code for extracting the list. The URL and the page-specific BeautifulSoup search are hard-coded. The Wikipedia page uses a two-letter abbreviation for each state, which is mapped to the full state name.
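
A minimal sketch of such an extractor, under stated assumptions: the URL, the entry format "Name (XX)" inside list items, the User-Agent string, and the method names fetch, parse and get are all chosen for illustration. Only the class name CityList and the use of Requests and BeautifulSoup come from the text.

```python
import re

import requests
from bs4 import BeautifulSoup

# Assumed source page; the post does not name the exact URL.
URL = "https://de.wikipedia.org/wiki/Liste_der_St%C3%A4dte_in_Deutschland"

# Two-letter state abbreviations (ISO 3166-2:DE) and their German names.
STATES = {
    "BW": "Baden-Württemberg",
    "BY": "Bayern",
    "BE": "Berlin",
    "BB": "Brandenburg",
    "HB": "Bremen",
    "HH": "Hamburg",
    "HE": "Hessen",
    "MV": "Mecklenburg-Vorpommern",
    "NI": "Niedersachsen",
    "NW": "Nordrhein-Westfalen",
    "RP": "Rheinland-Pfalz",
    "SL": "Saarland",
    "SN": "Sachsen",
    "ST": "Sachsen-Anhalt",
    "SH": "Schleswig-Holstein",
    "TH": "Thüringen",
}

# Assumed entry format: a list item whose text reads "<city> (XX)".
ENTRY = re.compile(r"^(?P<city>.+?)\s*\((?P<code>[A-Z]{2})\)")


class CityList:
    """Download the list page and map each city to its state."""

    def __init__(self, url=URL):
        self.url = url

    def fetch(self):
        """Return the raw HTML of the list page."""
        # Placeholder User-Agent; replace it with one that identifies your script.
        headers = {"User-Agent": "citylist-sketch/0.1"}
        response = requests.get(self.url, headers=headers, timeout=10)
        response.raise_for_status()
        return response.text

    def parse(self, html):
        """Extract a {city: state} dict from the page HTML."""
        soup = BeautifulSoup(html, "html.parser")
        cities = {}
        for item in soup.find_all("li"):
            match = ENTRY.match(item.get_text(" ", strip=True))
            # Skip navigation items and anything without a known state code.
            if match and match.group("code") in STATES:
                cities[match.group("city")] = STATES[match.group("code")]
        return cities

    def get(self):
        """Fetch and parse in one step."""
        return self.parse(self.fetch())
```

Keeping parse separate from fetch means the parsing can be checked against a saved or hand-written HTML fragment without network access.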

The code can be tested with the following snippet, which can be embedded as a self-test in the same script in which the CityList class is defined.
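
A self-test along these lines might look as follows. So that the block runs on its own and without network access, it parses a hand-written HTML fragment with a trimmed-down stand-in for the parsing step. The fragment, the helper name parse_cities and the assumed entry format "Name (XX)" are illustrative; in the real script the check would call the CityList class instead.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the page, using the assumed "<name> (XX)" format.
SAMPLE_HTML = """
<ul>
  <li><a href="#">Tegernsee</a> (BY)</li>
  <li><a href="#">Gernsheim</a> (HE)</li>
  <li>About this list</li>
</ul>
"""

# Only the codes needed by the sample; the real script maps all 16 states.
STATES = {"BY": "Bayern", "HE": "Hessen"}


def parse_cities(html):
    """Trimmed-down stand-in for the parsing step of CityList."""
    cities = {}
    for item in BeautifulSoup(html, "html.parser").find_all("li"):
        # Split "Tegernsee (BY)" into the name and the trailing state code.
        name, _, rest = item.get_text(" ", strip=True).rpartition(" (")
        code = rest.rstrip(")")
        if name and code in STATES:
            cities[name] = STATES[code]
    return cities


def self_test():
    """Check that the sample fragment yields the expected mapping."""
    cities = parse_cities(SAMPLE_HTML)
    assert cities == {"Tegernsee": "Bayern", "Gernsheim": "Hessen"}
    return cities


if __name__ == "__main__":
    print(self_test())
```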

Usage

Use it from within Python:

The output will be something like:

{...,
'Vohenstrauß': 'Bayern',
'Neuötting': 'Bayern',
'Eggenfelden': 'Bayern',
'Gernsheim': 'Hessen',
'Braunsbedra': 'Sachsen-Anhalt',
'Tegernsee': 'Bayern',
...}
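
For the text-processing purpose named at the top, a mapping of this shape can be used to spot city names in running text. The following is only a sketch: the function name find_cities and the example sentence are made up for illustration, and the sample entries are copied from the output above rather than fetched from Wikipedia.

```python
import re

# Sample entries copied from the output above; in practice the full mapping
# produced by the extraction script would be used instead.
CITIES = {
    "Vohenstrauß": "Bayern",
    "Neuötting": "Bayern",
    "Eggenfelden": "Bayern",
    "Gernsheim": "Hessen",
    "Braunsbedra": "Sachsen-Anhalt",
    "Tegernsee": "Bayern",
}


def find_cities(text, cities=CITIES):
    """Return (city, state) pairs for every known city mentioned in text."""
    if not cities:
        return []
    # Longest names first so that a longer name wins over a shorter prefix.
    names = sorted(cities, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    return [(name, cities[name]) for name in pattern.findall(text)]


print(find_cities("Der Zug fährt von Gernsheim nach Tegernsee."))
# [('Gernsheim', 'Hessen'), ('Tegernsee', 'Bayern')]
```

Matching is exact and case-sensitive, so inflected forms or alternative spellings of a city name are not found by this sketch.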