A pile of endless side projects

A tiny blog on stuff I do

The most famous cuisine in the world (on Wikipedia)

2020-10-06

Some time has passed since the last post. In the meantime, some interesting projects were born.[1]

Between drafting a recipe book to organize some notes and computing the geographical centroid of Sardinia (an improvised project after the holidays), I was reading some Wikipedia pages on foreign cuisines and wondered:

What is the most “famous” cuisine in the world (on Wikipedia)? [2][3]

Put like this, it’s a vague question. In detail:

In brief: I compared the lengths of cuisine pages written in multiple languages and built an intuitive, easy-to-read visualization.

Before discussing the technical side, let’s see the results! :)
(Every graph is also available as an interactive version through the dedicated link. Best viewed on a PC.)

Heatmap (correlation matrix) of Wikipedia cuisines

↬ Link to the interactive version ↫

There is a lot to say about this graph! Before explaining how it was made, let’s see how to read it and what it means!

Let’s consider a small part of the matrix, obtained by zooming in on the interactive version:

A couple of interesting things can be observed from the color alone: the Korean page on Italian cuisine is quite long, and so is the Indonesian page on Israeli cuisine!
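Behind the heatmap there is a language × cuisine matrix in which each cell holds the length of one page. Below is a minimal sketch of how such a matrix could be built, assuming the raw data is a list of `(cuisine, language_code, length)` tuples. The tuple layout, the function names and the numbers are illustrative assumptions, not the project’s actual code (which relies on pandas). Missing pages are simply absent from the nested dictionary and only become 0 when the matrix is densified for plotting.

```python
from collections import defaultdict

# Assumed record layout: (cuisine, Wikipedia language code, page length in characters)
Record = tuple[str, str, int]


def build_length_matrix(records: list[Record]) -> dict[str, dict[str, int]]:
    """Nest lengths as {language: {cuisine: length}}; missing pages are simply absent."""
    matrix: dict[str, dict[str, int]] = defaultdict(dict)
    for cuisine, language, length in records:
        matrix[language][cuisine] = length
    return dict(matrix)


def to_dense(matrix: dict[str, dict[str, int]], languages: list[str], cuisines: list[str]) -> list[list[int]]:
    """Rows = languages, columns = cuisines, 0 where a page does not exist (e.g. for a heatmap)."""
    return [[matrix.get(lang, {}).get(cuisine, 0) for cuisine in cuisines] for lang in languages]


# Toy numbers, for illustration only
records = [("Italian", "ko", 50_000), ("Italian", "en", 40_000), ("Israeli", "id", 30_000)]
matrix = build_length_matrix(records)
print(to_dense(matrix, ["en", "ko", "id"], ["Italian", "Israeli"]))
# [[40000, 0], [50000, 0], [0, 30000]]
```

With pandas, a similar table could presumably be obtained from a DataFrame via `pivot(index="language", columns="cuisine", values="length")`, assuming columns with those names.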

Unexpected famous cuisines

Exploring the heatmap, you can find some strange cuisine-language combinations. For example:

Remarks and disclaimers

↬ Link to the interactive version (with labeled axes) ↫ [12]

Statistics and podium 🏆

The following statistics are computed on the full dataset, without excluding any page or national language (only regional and local dialects were ignored).[13]
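For comparison, the reduced dataset drops pages, cuisines and languages using the thresholds listed in footnote 11. Below is a minimal sketch of that filtering, assuming the same hypothetical `(cuisine, language_code, length)` tuples with at most one record per cuisine/language pair. The order in which the filters are applied is an assumption, and it matters: removing a language in the last step can push a cuisine back below its threshold, which this single pass does not re-check.

```python
from collections import Counter


def filter_records(records, dialects=frozenset(), min_length=4000,
                   min_languages=13, min_pages=14):
    """Apply the footnote-11 thresholds to hypothetical (cuisine, language, length) records.

    Filters run once, in this order; the result is not re-checked afterwards.
    """
    # 1. Drop local-dialect Wikipedias and pages shorter than min_length characters
    kept = [r for r in records if r[1] not in dialects and r[2] >= min_length]
    # 2. Drop cuisines present in fewer than min_languages languages
    #    (assumes one record per cuisine/language pair)
    languages_per_cuisine = Counter(cuisine for cuisine, _, _ in kept)
    kept = [r for r in kept if languages_per_cuisine[r[0]] >= min_languages]
    # 3. Drop languages with fewer than min_pages cuisine pages
    pages_per_language = Counter(language for _, language, _ in kept)
    return [r for r in kept if pages_per_language[r[1]] >= min_pages]


# Toy data with small thresholds, for illustration only
records = [("A", "en", 100), ("A", "it", 100), ("B", "en", 100), ("B", "it", 5), ("C", "en", 100)]
print(filter_records(records, min_length=10, min_languages=2, min_pages=1))
# [('A', 'en', 100), ('A', 'it', 100)]
```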

Most “famous” cuisines of the world (cumulative, across all languages)

↬ Link to the interactive version ↫

Adding up the page lengths of each cuisine across all considered languages gives the following ranking:

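| Rank | Cuisine | Length (characters) |
|---|---|---|
| 1 🥇 | 🇮🇹 Italian | 1263679 |
| 2 🥈 | 🇩🇪 German | 1016720 |
| 3 🥉 | 🇯🇵 Japanese | 981386 |
| 4 | 🇰🇷 Korean | 912384 |
| 5 | 🇺🇸 American | 893520 |
| 6 | 🇫🇷 French | 874603 |
| 7 | 🇮🇩 Indonesian | 832875 |
| 8 | 🇷🇺 Russian | 793379 |
| 9 | 🇮🇳 Indian | 778534 |
| 10 | 🇳🇱 Dutch | 681041 |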

In first place is Italian cuisine, with an overall length of 1.26 million characters! 🎉
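The cumulative ranking boils down to a group-by-and-sum followed by a sort. Below is a minimal sketch over hypothetical `(cuisine, language_code, length)` tuples; the function name and the toy numbers are assumptions. Ranks are assigned consecutively with `enumerate`, and ties in total length are broken alphabetically.

```python
from collections import defaultdict


def cumulative_ranking(records, top=10):
    """Sum page lengths per cuisine over all languages and return (rank, cuisine, total)."""
    totals = defaultdict(int)
    for cuisine, _language, length in records:
        totals[cuisine] += length
    # Longest total first; ties broken alphabetically by cuisine name
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, cuisine, total) for rank, (cuisine, total) in enumerate(ranked[:top], start=1)]


# Toy data, for illustration only
records = [("Italian", "en", 10), ("Italian", "ko", 5), ("German", "ru", 12), ("Japanese", "ja", 3)]
for rank, cuisine, total in cumulative_ranking(records):
    print(rank, cuisine, total)
# 1 Italian 15
# 2 German 12
# 3 Japanese 3
```

With pandas, the equivalent would presumably be `df.groupby("cuisine")["length"].sum().nlargest(10)`, assuming a DataFrame with those column names.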

The longest pages (regardless of language)

The ten longest cuisine pages (across all considered Wikipedias) are the following:

| Rank | Cuisine (page language) | Native page title | Length (characters) |
|---|---|---|---|
| 1 🥇 | 🇷🇺 Russian (in Polish) | Kuchnia rosyjska | 363864 |
| 2 🥈 | 🇩🇪 German (in Russian) | Немецкая кухня | 279328 |
| 3 🥉 | 🇦🇷 Argentine (in Italian) | Cucina argentina | 227606 |
| 4 | 🇺🇸 American (in French) | Cuisine des États-Unis | 218192 |
| 5 | 🇺🇸 American (in Japanese) | アメリカ料理 | 190920 |
| 6 | 🇺🇸 American (in English) | American cuisine | 181443 |
| 7 | 🇮🇩 Indonesian (in Russian) | Индонезийская кухня | 175120 |
| 8 | 🇲🇾 Malaysian (in English) | Malaysian cuisine | 162794 |
| 9 | 🇮🇹 Italian (in Kannada) | ಇಟ್ಯಾಲಿಯನ್‌ ಪಾಕಪದ್ಧತಿ | 152911 |
| 10 | 🇦🇷 Argentine (in Spanish) | Gastronomía de Argentina | 140174 |

At the top is the Polish page on Russian cuisine, with 363864 characters! 🎉
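Unlike the cumulative ranking, this list looks at single pages, so no aggregation is needed: it is enough to pick the N largest records by length. Below is a minimal sketch using `heapq.nlargest` over hypothetical `(cuisine, language_code, length)` tuples; the function name and the toy numbers are assumptions.

```python
import heapq


def longest_pages(records, n=10):
    """Return the n longest (cuisine, language, length) records, longest first."""
    return heapq.nlargest(n, records, key=lambda record: record[2])


# Toy data, for illustration only
records = [("Russian", "pl", 300), ("German", "ru", 250), ("Italian", "kn", 120), ("Greek", "en", 80)]
for rank, (cuisine, language, length) in enumerate(longest_pages(records, n=3), start=1):
    print(f"{rank}. {cuisine} ({language}): {length} characters")
# 1. Russian (pl): 300 characters
# 2. German (ru): 250 characters
# 3. Italian (kn): 120 characters
```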

A curiosity about 9th place (just because I’m Italian)
I wasn’t aware that Kannada (a language spoken in southern India) existed, but its speakers seem really interested in Italian cuisine (ಇಟ್ಯಾಲಿಯನ್‌ ಪಾಕಪದ್ಧತಿ). As an Italian, I think it’s wonderful to read something written in a completely different alphabet and at the same time find pictures of caffettiere (moka pots), focacce and tiramisù:

Someone pointed out that the section on the Turin bicerin seems to be untranslatable (“I think there is a word for ‘latte’ (‘milk’) in Kannada”). But partial translations are common on Wikipedia, nothing too strange here! :)

The languages with the most cuisine pages

↬ Link to the interactive version ↫

Unsurprisingly, the Wikipedia with the most cuisine pages is the English one 🇬🇧
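Counting cuisine pages per language is a plain frequency count. Below is a minimal sketch with `collections.Counter` over hypothetical `(cuisine, language_code, length)` tuples; the function name and the toy data are assumptions.

```python
from collections import Counter


def pages_per_language(records):
    """Count how many cuisine pages each Wikipedia language edition has, most first."""
    return Counter(language for _cuisine, language, _length in records).most_common()


# Toy data, for illustration only
records = [
    ("Italian", "en", 1), ("German", "en", 1), ("Korean", "en", 1),
    ("Italian", "it", 1), ("German", "it", 1),
    ("Italian", "kn", 1),
]
print(pages_per_language(records))
# [('en', 3), ('it', 2), ('kn', 1)]
```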

How was this analysis made?

To avoid a long digression on the implementation, I will only briefly mention the packages used and give an overview of data collection and processing. For those interested in digging deeper into this part, the repository of this project is on GitHub.

Used packages
The entire project is written in Python. The main packages used (managed with Poetry) are the following:

| Package | PyPI page | Description |
|---|---|---|
| beautifulsoup4 | 🔗 | Handle/parse low-level HTML [14] |
| pandas | 🔗 | Must-have to store and analyze data |
| emoji | 🔗 | Needed for emojis, used for national flags |
| plotly | 🔗 | A must-have to create visualizations/plots [15] |
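One typical parsing task is finding out in which languages a cuisine page exists, by reading the interlanguage links of the English page. Below is a minimal sketch under two stated assumptions: Wikipedia marks these links as `<a>` tags carrying an `hreflang` attribute (worth verifying against the live HTML), and no other `<a>` tags on the page carry that attribute. To run without dependencies, it uses the standard library’s `html.parser` instead of beautifulsoup4; with beautifulsoup4 the same lookup could be written as `soup.select("a[hreflang]")`. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class InterlanguageLinkParser(HTMLParser):
    """Collect {language_code: url} from <a hreflang="..."> tags (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.links: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)  # attribute values may be None for valueless attributes
        language = attributes.get("hreflang")
        if language:
            self.links[language] = attributes.get("href") or ""


def interlanguage_links(html: str) -> dict[str, str]:
    parser = InterlanguageLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hand-written snippet imitating the assumed markup
html = (
    '<ul><li class="interlanguage-link">'
    '<a href="https://it.wikipedia.org/wiki/Cucina_italiana" hreflang="it" lang="it">Italiano</a>'
    '</li><li><a href="/wiki/Pizza">Pizza</a></li></ul>'
)
print(interlanguage_links(html))
# {'it': 'https://it.wikipedia.org/wiki/Cucina_italiana'}
```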

Download and data processing
There are four data preparation steps. At each step, the same data structure is enriched with more information.
The functions implementing the steps are the following:

Visualization and graphs
A final step (step5_create_plots, in visualization.py) loads the previously created data structures and produces graphs, tables and statistics, saved as images, HTML and Markdown.
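The general pattern of this section (preparation steps that each enrich the same data structure, plus a final step that reloads it) can be sketched as below. This is only an illustration under assumptions: the four toy steps, the JSON snapshot files and all names are hypothetical and do not mirror the project’s actual functions; the lengths are fixed numbers instead of real downloads.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

Data = dict
Step = Callable[[Data], Data]


# Hypothetical toy steps: each receives the shared structure and returns an enriched copy
def add_cuisines(data: Data) -> Data:
    return {**data, "cuisines": ["Italian", "German"]}


def add_languages(data: Data) -> Data:
    return {**data, "languages": {c: ["en", "it"] for c in data["cuisines"]}}


def add_lengths(data: Data) -> Data:
    # Fixed toy lengths instead of downloaded pages
    lengths = {c: {lang: 1000 for lang in langs} for c, langs in data["languages"].items()}
    return {**data, "lengths": lengths}


def add_totals(data: Data) -> Data:
    totals = {c: sum(per_lang.values()) for c, per_lang in data["lengths"].items()}
    return {**data, "totals": totals}


STEPS: list[Step] = [add_cuisines, add_languages, add_lengths, add_totals]


def run_pipeline(steps: list[Step], output_dir: Path) -> Data:
    data: Data = {}
    for index, step in enumerate(steps, start=1):
        data = step(data)
        # Snapshot after every step, so a later step (e.g. plotting) can reload it
        (output_dir / f"step{index}.json").write_text(json.dumps(data), encoding="utf-8")
    return data


def load_snapshot(output_dir: Path, index: int) -> Data:
    return json.loads((output_dir / f"step{index}.json").read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    result = run_pipeline(STEPS, Path(tmp))
    reloaded = load_snapshot(Path(tmp), len(STEPS))

print(result["totals"])
# {'Italian': 2000, 'German': 2000}
```

Persisting a snapshot after each step means a slow step (such as downloading pages) does not have to be repeated when only the plots change.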

Conclusions

This whole analysis was made just for fun, out of curiosity and to try something new. That being said, I’m happy with the results! It’s really interesting to look for strange correlations in the heatmap and discover new cuisines and languages that I didn’t know about!

If you find something curious or want to create similar statistics for a different Wikipedia category, let me know or try it yourself: as always, the code is open source (repository on GitHub) and released under the MIT license.

Things (still) to do

Since I started writing this article, some improvements and new graphs have come to mind. I’m happy with the current state, but in the future I would like to implement (in order of importance/relevance):

Thanks for reading, see you next time! :)


  1. ITAQA is progressing and I have new things to talk about! I hope to write about it soon.

  2. Narrowing it down to Wikipedia simplifies everything a lot!

  3. I already had a guess, but I hadn’t been able to prove it (yet).

  4. A collection of interesting articles is “featured articles”, but it only classifies them as “interesting” vs. “not interesting”.

  5. As explained later, I know that the length of a page is not a reliable measure of its fame. It was also pointed out that different languages have different information densities.

  6. Yes, I know, many countries and languages are missing; I will explain later.

  7. Note that some rows have multiple dots, indicating all countries where that language is the main one. In this regard, some countries have more than one main language (Switzerland, Luxembourg, Belgium), in which case multiple dots should be marked in the same column. This higher level of detail is not implemented (yet).

  8. A dear friend of mine (who loves Greek cuisine) commented: “there is really little to say about their cuisine; it can be summarized as: feta, moussaka, gyros, those vine-leaf things, olives. That’s it.”

  9. Namely, all the national cuisines listed on the English Wikipedia.

  10. Namely, all the languages that have at least one article on one of the national cuisines taken into account.

  11. I ignored all articles shorter than 4000 characters, all cuisines present in fewer than 13 languages, all languages with fewer than 14 cuisine pages and all Wikipedias in local dialects.

  12. In the interactive version, some language prefixes on the language axis are not converted into the language name. It’s still interesting to navigate!

  13. All statistics and graphs reflect the situation on October 4th, 2020.

  14. Initially I planned to do the parsing/data download using the wikipedia (🔗) package, but it is currently incomplete and not optimized. For this reason I switched to the low-level approach with beautifulsoup.

  15. I think matplotlib is a bit outdated and bloated; there are alternatives (like plotly) that are lightweight and natively interactive/modern. I still have to try seaborn (website); I’ve read some nice things about it.

  16. Maybe directly using GitHub Actions (I don’t know how feasible it would be, but I want to try it).

  17. Since I started writing this post, there has already been a change: German cuisine overtook Japanese, moving up to second place after a big expansion of the page “German cuisine” in Russian at the end of September.

  18. But I’m afraid that local-dialect Wikipedias are too small; I don’t think they contain much information on the cuisines of other regions.