We provide here the raw dataset, in Stata format, for all the language series used in our Journal of International Economics (JIE) paper.
1. Reproduction Files
You will find our series for Common Official Language (COL), Common Spoken Language (CSL) and Common Native Language (CNL). For Linguistic Proximity, LP1 and LP2, you will find the unadjusted values used to construct both variables, which we label PROX1 and PROX2. A do-file lets you construct our series for LP1 and LP2. It also lets you construct both variables from your own dataset. This matters because LP1 and LP2 are data dependent. The same do-file constructs Common Language (CL) as well, a variable that depends in turn on LP and is therefore also data dependent. (Download the data and do file.)
In addition, we provide our series for CL and the series we would have obtained had we not taken logs. We label this alternative series CLE.
2. Raw Data
The proximity indices in the JIE paper are produced with many adjustments. One is the subtraction of the product of the importer and the exporter native languages when these languages are the same. This is explained carefully in our JIE paper: LP indices cannot be interpreted alone; they have to be interpreted in conjunction with Common Native Language. We have also computed the linguistic proximity indices from the raw language files, discarding the adjustments. This file covers many more countries than were used in the JIE study. It can be downloaded here.
Proxling2 is based on ASJP (the Automated Similarity Judgment Program); Proxling is based on the linguistic tree formula described in Melitz and Toubal (2014, JIE). Larger values of the indices mean higher linguistic proximity. The ISO codes are standard. Please be aware that we do not distinguish between Belgium and Luxembourg, for which we have created a pseudo-ISO code, “BLX”.
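If you merge these files with your own country-pair data, codes for Belgium and Luxembourg need to be recoded to “BLX” first. The following is a minimal Python sketch of that recoding, not part of the distributed files. The function names `to_dataset_iso` and `pair_key` are hypothetical, and it assumes your own data uses the standard three-letter codes "BEL" and "LUX".

```python
# Illustrative sketch only: helper names are hypothetical, and we assume
# the user's own data identifies countries by standard ISO alpha-3 codes.
BLX_MEMBERS = {"BEL", "LUX"}  # countries merged under the pseudo-ISO code


def to_dataset_iso(iso3):
    """Return the code used in the language files for a standard ISO code."""
    code = iso3.strip().upper()
    return "BLX" if code in BLX_MEMBERS else code


def pair_key(exporter, importer):
    """Build an (exporter, importer) key suitable for matching the files."""
    return (to_dataset_iso(exporter), to_dataset_iso(importer))


# Example: recode a few country pairs before merging.
pairs = [("BEL", "FRA"), ("deu", "LUX"), ("USA", "CAN")]
keys = [pair_key(o, d) for o, d in pairs]
print(keys)
```

Note that under this recoding a Belgium-Luxembourg pair collapses to ("BLX", "BLX"). How to treat such a pair is left to the user.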