The idea of an interactive tornado database for the entire world, not just a localized area, is something that has never been realized before – the datasets exist, but they are not commonly brought together. Needless to say, our effort to do so came with some challenges. The following is an account of everything we had to do in order to acquire the data we have now.
Data Gathering
In early January, we made a push to find as many usable tornado datasets as we could. By far the biggest one, which we had had our sights set on for a while, was the European Severe Weather Database (ESWD), maintained by the ESSL. It was also the most proprietary: the ESSL normally charges for access, so we made an agreement that limits how we can display the data. Still, from Iceland to Morocco to Syria to Yakutsk, it covers more geographic area than any of our other datasets, as well as a longer time span. Its first record is all the way back in 66 CE (though the timeline on the explorer only goes back to 1700 for now).
We also found some detailed data for Japan from the Japan Meteorological Agency, extending back to the 60s (1960s, that is, as opposed to the ESWD data, which starts in the actual 60s). Of all our datasets, this was probably the toughest to actually retrieve. I couldn’t find any master files available. Instead, the agency had a big table with a link to a separate webpage for every single tornado. I wrote a Python script that went down the table, opened each link, and scraped the contents of each page into a file (big thanks to Anthony for contributing a data parsing function). This process was made even more complicated by the fact that everything was in Japanese.
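For anyone curious what that kind of scraper looks like, here is a minimal sketch using only the Python standard library. It is not the actual script: the names `LinkCollector`, `collect_event_links`, `fetch_page`, and `scrape_all` are made up for illustration, and the page encoding is an assumption that would need checking against the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag that sits inside a <table>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.table_depth = 0
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "a" and self.table_depth > 0:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the index page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1


def collect_event_links(index_html, base_url):
    """Return the per-event page URLs listed in the index table."""
    parser = LinkCollector(base_url)
    parser.feed(index_html)
    return parser.links


def fetch_page(url, encoding="utf-8"):
    """Download one page. The encoding is a guess (assumption)."""
    with urlopen(url) as response:
        return response.read().decode(encoding, errors="replace")


def scrape_all(index_html, base_url, fetch=fetch_page):
    """Return {url: page_html} for every event link in the index table."""
    return {url: fetch(url) for url in collect_event_links(index_html, base_url)}
```

Passing the `fetch` function in as a parameter makes it easy to try the link-walking logic on a saved copy of the index page before hitting the live site.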
I also obtained some data for Mexico thanks to Dr. José Francisco León-Cruz – see his publication describing it here.
For New Zealand, I used a catalog from the National Institute of Water and Atmospheric Research. New Zealand doesn’t get many tornadoes, but the ones it does get are documented quite well (there are even injury descriptions, one of which is as detailed as “Mrs. Benton suffered a bruised back”).
Australia’s available data is rather easy to acquire from the Bureau of Meteorology (BOM). Unfortunately, it is not very high quality. Unrated tornadoes are labeled as F0 and thus are impossible to distinguish from those that were actually rated F0, which distorts the climatology (most sources would count these as (E)FU). A cursory inspection also reveals some inaccurate locations and dubious ratings. All events are single points, but some include a path direction and path length, which I used to calculate a path endpoint. Fortunately, Dr. John Allen has been working on an updated tornado climatology for Australia which extends all the way back to 1795 and will improve greatly upon the BOM dataset.
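The endpoint calculation can be sketched with the standard great-circle "destination point" formula. This is an illustration, not necessarily the exact method used for the explorer. It assumes the direction is the bearing the tornado traveled toward, in degrees clockwise from north, and that the length is in kilometers; the function name `path_endpoint` is made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def path_endpoint(lat, lon, bearing_deg, length_km):
    """Return (lat, lon) reached by traveling length_km along bearing_deg.

    Assumes bearing_deg is the direction of travel, measured clockwise
    from north, and treats the Earth as a sphere.
    """
    lat1 = math.radians(lat)
    lon1 = math.radians(lon)
    bearing = math.radians(bearing_deg)
    delta = length_km / EARTH_RADIUS_KM  # angular distance in radians

    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(bearing)
    )
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    # Wrap longitude into the range [-180, 180).
    lon2_deg = (math.degrees(lon2) + 540.0) % 360.0 - 180.0
    return math.degrees(lat2), lon2_deg
```

If a source gives the direction as compass text (for example "NE") or as the direction the tornado came from, it would need converting to a travel bearing first.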
The main official dataset for Canada is one provided by ECCC, which covers 1980-2009. Otherwise we have data from the Northern Tornadoes Project, which started a few years ago and documents Canadian tornadoes with a level of detail rivaling that of US organizations. I’m also aware of some additional datasets in the works, which I hope to integrate as soon as they are available.
China was rather frustrating to acquire data for. There is a public dataset with about 980 events from 2007-2016 with nothing but dates and locations, and a short list for 1966-1990 online. But I found a paper from 2017 called “Tornado climatology of China” which states, right in the first line of the abstract, that they used a “recently completed data set with details on 4763 tornadoes in the period 1948–2012”, including Fujita ratings. According to the paper, this work was initiated in order to evaluate tornado risk to potential nuclear power plant sites. The dataset itself was not published, and I emailed several of the authors but did not hear back. The paper cited “Chen JY. 2015b. A survey of tropical cyclone, tornado and extreme winds and assessment of atmospheric dispersion conditions for potential construction of small modular pressurize water reactors in China. Technical Report, Peking University, Beijing, 175 pp.” as a finalization of the data, but I could not find that report at all. Given the comprehensiveness of the data, and its connection to the nuclear power grid, I would not be surprised if the Chinese government wanted to keep it private. Still, if anyone has connections with Jiayi Chen, Xuhui Cai, any other authors of the paper, or Peking University, please let us know.
Most of our information about tornadoes in Bengal (a region including Bangladesh and neighboring parts of India) was compiled by Jonathan Finch. Bangladesh receives significant and deadly tornadoes, but they are not very well documented, and details about them would probably be quite sparse and hard to find if not for his work. Almost all of our Bengal data comes from his website, bangladeshtornadoes.org. It provided coordinates for each event, but they only included one digit after the decimal place, which is not very precise (see this xkcd for a guide to coordinate precision). That’s why I went through and looked up the given towns for more accurate placements.
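To put a rough number on that, here is a small sketch of how far a point can shift when its coordinates are rounded. It uses the common approximation of about 111.32 km per degree of latitude; the function name `rounding_error_km` is made up for illustration.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude


def rounding_error_km(lat_deg, decimals):
    """Return the worst-case (north-south, east-west) shift in km caused
    by rounding coordinates to the given number of decimal places."""
    half_step = 0.5 * 10 ** -decimals  # rounding moves a value by at most this
    north_south = half_step * KM_PER_DEGREE_LAT
    # Degrees of longitude shrink with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


# Around 24 degrees north with one decimal place, both values come out
# at roughly 5 to 6 km.
ns_km, ew_km = rounding_error_km(24.0, 1)
```

A shift of several kilometers is easily enough to put a marker in the wrong village, which is why looking up the named towns was worth the effort.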
Our data for South Africa was compiled by Zac Muller of TornadoWatchSA and is available here. No coordinates were included, so the locations provided all had to be looked up in Google Earth (thanks to Anthony and Zach for helping here).
Our datasets for Argentina and Brazil are rather similar: both come from published research papers. Brazil’s data was compiled by Dr. Bruno Bertoni, and Argentina’s by Mariano Balbi and Pablo Barbieri.
What filled in a lot of gaps between those two datasets was a detailed Google map called “Pasillo de los Tornados – Pasado y presente”, compiled by someone presumably from Argentina (I sent them an email but did not hear back). It had over 360 markers, each describing a specific event in Brazil, Uruguay, Argentina, or Paraguay. Unfortunately, a format like that isn’t easily usable, since information like casualties, path length, rating, and so on is buried in the text description, so I had to read through each one and copy that information down. Between my knowledge of Spanish and Google Translate, I was able to finish in a few days.
I didn’t actually realize Chile got tornadoes until we came across this paper on the topic. The author, Dr. José Vicencio Veloso, was kind enough to share his data with us.
You may notice we have a small amount of data for Bermuda, Madagascar, and Indonesia, as well as some for China whose source says “User Contributed”. These were added courtesy of Cheo and Zach, and are just the beginning of our data aggregation process for regions with limited or missing central tornado databases. Thanks also to Raven, who provided detailed, quality-controlled US data for 2020 and 2021 (I know this post is for non-US tornadoes, but I think it’s worth mentioning).
Data Processing
For all data sources, the first step of the process was to get the data into a CSV format (think: Excel sheet). This was much more difficult for some sources than for others (Japan, for example). Next, I processed those files to obtain the standard attributes we needed – things like location, date, path length, injuries, and so on. And finally, I wrote a big script that pulls together everything into one file, which is what the explorer on the website reads from.
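As a rough illustration of that last step, the sketch below maps each source's column names onto one standard set and writes everything into a single CSV. The field list, the column mappings, and the function names (`standardize`, `merge_sources`) are all hypothetical; the real script handles many more attributes and source-specific quirks.

```python
import csv

# Hypothetical standard attributes; the real list is longer.
STANDARD_FIELDS = ["source", "date", "lat", "lon", "path_length_km", "injuries"]


def standardize(row, source, column_map):
    """Rename one source row's columns to the standard field names.

    column_map maps standard name -> column name used by that source.
    Fields the source does not provide are left empty.
    """
    record = {field: "" for field in STANDARD_FIELDS}
    record["source"] = source
    for standard_name, source_name in column_map.items():
        record[standard_name] = (row.get(source_name) or "").strip()
    return record


def merge_sources(sources, out_path):
    """Write every row of every source CSV into one combined file.

    sources is a list of (source_label, csv_path, column_map) tuples.
    Returns the number of rows written.
    """
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.DictWriter(out_file, fieldnames=STANDARD_FIELDS)
        writer.writeheader()
        for label, path, column_map in sources:
            with open(path, newline="", encoding="utf-8") as in_file:
                for row in csv.DictReader(in_file):
                    writer.writerow(standardize(row, label, column_map))
                    count += 1
    return count
```

Keeping the per-source differences in a small mapping, instead of in separate code paths, means adding a new country is mostly a matter of writing one more `column_map`.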
Naturally, we wanted to make corrections and additions to these datasets, so I set up an advanced framework for data editing (just kidding. It’s a bunch of Google Sheets). But I did standardize the format of the sheets so that I only needed a single Python script to read them.
I also modified the script so it could read detailed path information from KML files. KMLs are the standard format for paths drawn in Google Earth (just a list of coordinates for each point in the path), and reading those in is easier than having people manually type in coordinates for really complicated paths.
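Here is a minimal sketch of reading path coordinates out of a KML document with Python's built-in XML parser. It assumes the file uses the standard KML 2.2 namespace and that each path is a `LineString` inside a `Placemark`; the function name `read_kml_paths` is made up, and this is not the actual code behind the explorer.

```python
import xml.etree.ElementTree as ET

KML_URI = "http://www.opengis.net/kml/2.2"  # assumed namespace (KML 2.2)
KML_NS = {"kml": KML_URI}


def read_kml_paths(kml_text):
    """Return a list of (placemark name, [(lat, lon), ...]) tuples,
    one per Placemark that contains a LineString."""
    root = ET.fromstring(kml_text)
    paths = []
    for placemark in root.iter("{%s}Placemark" % KML_URI):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext(
            ".//kml:LineString/kml:coordinates", default="", namespaces=KML_NS
        )
        points = []
        # KML stores whitespace-separated "lon,lat[,alt]" tuples.
        for token in coords.split():
            lon, lat = token.split(",")[:2]
            points.append((float(lat), float(lon)))
        if points:
            paths.append((name, points))
    return paths
```

Note that KML lists longitude before latitude, which is an easy thing to get backwards; the sketch swaps them so the output is in the more familiar (lat, lon) order.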
We’re working on a better, dedicated database structure that will allow us to make edits much more easily. It might even allow for some level of contributions from the public, but we are a long, long way from that part. Overall, now that we’ve established a baseline of worldwide data, I’m looking forward to being able to further refine it.
Note: for a more official data documentation page, see Sources.
Considering how tiny the British Isles are, the TORRO data set is huge. It numbers well over 4000 tornado events over the last thousand years.
TORRO is the Tornado and Storm Research Organisation, founded in 1974 by Prof Dr Terence Meaden. Data collection had begun years earlier.
TORRO continues to flourish with research and meetings, and many papers have been published. It has arranged two conferences a year in Britain since 1985. Check the TORRO web site.
Terence is 87, retired and very unwell. Try corresponding with any of a dozen staff that are named at the web site, including David Smart, Robert Doe, Paul Knightley, Jonathan Webb, Matthew Clark, and Paul Brown.
With best regards, Terence.