About CLICS

This website serves as a browsable version of the data and methods described in the paper:

J.-M. List et al. (forthcoming): CLICS 2: An improved database of cross-linguistic colexifications assembling lexical data with the help of cross-linguistic data formats. Linguistic Typology. DOI: 10.1515/lingty-2018-0010

Page 11 of the paper lists summary statistics for the source datasets aggregated in CLICS 2. The table is reproduced dynamically below and will therefore change with new releases of the data.

Table 1: Overview of datasets (page 11).
ID | Dataset | Concept list: Glosses | Concept list: Concepticon | Varieties | Glottocodes | Families
allenbai | Bai Dialect Survey | 498 | 499 | 9 | 3 | 1
bantubvd | Bantu Basic Vocabulary Database | 420 | 415 | 10 | 10 | 1
beidasinitic | Chinese Dialect Vocabularies | 754 | 700 | 18 | 18 | 1
bowernpny | Computational Phylogenetics and the Internal Structure of Pama-Nyungan: Dataset | 338 | 338 | 170 | 168 | 1
hubercolumbian | Dataset of Huber and Reed's "Comparative Vocabulary" | 342 | 343 | 69 | 65 | 16
ids | Intercontinental Dictionary Series | 1310 | 1305 | 321 | 276 | 60
kraftchadic | Chadic Wordlists | 428 | 428 | 67 | 60 | 3
northeuralex | NorthEuraLex | 940 | 940 | 107 | 107 | 21
robinsonap | Internal Classification of the Alor-Pantar Language Family Using Computational Methods Applied to the Lexicon | 393 | 393 | 13 | 13 | 1
satterthwaitetb | Phylogenetic inference of the Tibeto-Burman languages or on the usefulness of lexicostatistics (and "megalo"-comparison) for the subgrouping of Tibeto-Burman | 418 | 418 | 18 | 18 | 1
suntb | Sun ZMYYC [Tibeto-Burman phonology and lexicon] | 915 | 905 | 48 | 48 | 1
tls | Tanzania Language Survey (TLS) | 1037 | 797 | 120 | 97 | 1
tryonsolomon | Solomon Islands Languages: An internal classification | 315 | 311 | 111 | 96 | 5
wold | The World Loanword Database | 1460 | 1457 | 41 | 41 | 24
zgraggenmadang | Z'graggen Madang | 306 | 306 | 98 | 98 | 1

Page 12 of the paper lists the ten most frequently colexified pairs of concepts. This table is also reproduced dynamically below: each concept label links to the details page of that concept, and each count links to the details page of the corresponding colexification.

Table 2: The ten most frequently recurring colexifications encountered in our database (page 12).
ID A | Concept A | ID B | Concept B | Families | Languages | Words
1370 | MONTH | 1313 | MOON | 56 | 289 | 294
1803 | WOOD | 906 | TREE | 55 | 211 | 310
1258 | FINGERNAIL | 72 | CLAW | 50 | 209 | 216
2267 | SON-IN-LAW (OF MAN) | 2266 | SON-IN-LAW (OF WOMAN) | 49 | 262 | 285
2265 | DAUGHTER-IN-LAW (OF MAN) | 2264 | DAUGHTER-IN-LAW (OF WOMAN) | 47 | 235 | 262
1608 | LISTEN | 1408 | HEAR | 47 | 102 | 105
763 | SKIN | 629 | LEATHER | 46 | 233 | 255
2259 | FLESH | 634 | MEAT | 46 | 222 | 232
1599 | WORD | 1307 | LANGUAGE | 45 | 94 | 98
1228 | EARTH (SOIL) | 626 | LAND | 43 | 158 | 181
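
To make the three count columns of Table 2 concrete, the following is a minimal sketch of how such figures could be derived from a word list, assuming that a colexification is counted whenever one word form in a language expresses two different concepts. The word list, the family and language names, and the function `count_colexifications` are all invented for illustration; the actual CLICS pipeline may normalize forms and filter data in ways not shown here.

```python
from collections import defaultdict
from itertools import combinations

# Toy word list: (family, language, concept, form).
# Every entry is invented for illustration and is not CLICS data.
WORDS = [
    ("FamA", "lang1", "MONTH", "kuu"),
    ("FamA", "lang1", "MOON", "kuu"),
    ("FamA", "lang2", "MONTH", "mes"),
    ("FamA", "lang2", "MOON", "luna"),
    ("FamB", "lang3", "MONTH", "ay"),
    ("FamB", "lang3", "MOON", "ay"),
    ("FamB", "lang3", "TREE", "agac"),
    ("FamB", "lang3", "WOOD", "agac"),
]


def count_colexifications(words):
    """Return {(concept_a, concept_b): (families, languages, words)}."""
    # Collect the concepts expressed by each form within one language.
    concepts_by_form = defaultdict(set)
    for family, language, concept, form in words:
        concepts_by_form[(family, language, form)].add(concept)

    stats = defaultdict(
        lambda: {"families": set(), "languages": set(), "words": 0}
    )
    for (family, language, _form), concepts in concepts_by_form.items():
        # Each unordered pair of concepts sharing a form is one colexification.
        for pair in combinations(sorted(concepts), 2):
            stats[pair]["families"].add(family)
            stats[pair]["languages"].add(language)
            stats[pair]["words"] += 1

    return {
        pair: (len(s["families"]), len(s["languages"]), s["words"])
        for pair, s in stats.items()
    }


counts = count_colexifications(WORDS)
for pair, (families, languages, n_words) in sorted(counts.items()):
    print(pair, families, languages, n_words)
```

In this toy list, MONTH and MOON share a form in two languages from two families, while lang2 keeps them apart and therefore contributes nothing to that pair.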

Page 16 of the paper shows a figure of the largest cluster that the Infomap algorithm finds in the network of all colexifications. This cluster can also be inspected on this site at SPEAK.

Page 18 of the paper shows a figure of the subgraph of the colexification network centered on WHEEL. This subgraph can also be inspected on this site at WHEEL.
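
As a rough illustration of what a subgraph centered on one concept can mean, the sketch below collects every concept within a fixed number of colexification links from a center node, together with the links among them. The edge list, its weights, the function name `centered_subgraph`, and the `radius` parameter are assumptions made for this example; the source does not state how the site or the paper delimits the WHEEL subgraph.

```python
from collections import defaultdict, deque

# Toy colexification network: (concept_a, concept_b) -> weight.
# Edges and weights are invented for illustration and are not CLICS data.
EDGES = {
    ("WHEEL", "CIRCLE"): 3,
    ("WHEEL", "TIRE"): 2,
    ("CIRCLE", "ROUND"): 5,
    ("ROUND", "BALL"): 4,
    ("MONTH", "MOON"): 56,
}


def centered_subgraph(edges, center, radius=1):
    """Return (nodes, edges) within `radius` links of `center`."""
    # Build an undirected adjacency map from the edge list.
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Breadth-first search, stopping once nodes at `radius` are reached.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    nodes = set(distance)
    # Keep only the edges whose two endpoints were both reached.
    kept = {
        (a, b): w for (a, b), w in edges.items() if a in nodes and b in nodes
    }
    return nodes, kept


nodes, sub_edges = centered_subgraph(EDGES, "WHEEL", radius=1)
print(sorted(nodes))
print(sub_edges)
```

Raising `radius` widens the neighborhood: with `radius=2` the toy subgraph would also include ROUND and the CIRCLE to ROUND link, while the unrelated MONTH and MOON pair stays outside at any radius.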