LexiRumah (from Lexicon and Indonesian Rumah = house) is an online database containing lexical data for languages of Eastern Indonesia and Timor-Leste. The database was set up and is maintained by the NWO Vici project “Reconstructing the past through languages of the present: the Lesser Sunda Islands” at Leiden University.
Each word in LexiRumah belongs to a particular language and a particular concept. For data drawn from survey word-lists, the concept that a word matches is the prompt that elicited the word (in nearly all cases, this prompt was given in Indonesian/Malay). For data from published sources, the concept that a word matches is the one judged most similar to the gloss or definition given in the source.
Each word is associated with three different transcriptions. Firstly, there is a standardised transcription (Form IPA) that uses the symbols of the International Phonetic Alphabet and follows the conventions of CLTS (Cross-Linguistic Transcription System). Depending on the nature of the source and the level of knowledge of the language, this transcription is either phonemic or phonetic, in both cases within the conventions of CLTS. In addition, in nearly all cases a phonemic sequence of two identical vowels is transcribed as a single long vowel (/VV/ → <Vː>) to ease cross-linguistic comparison.
Secondly, each word has an orthographic transcription (Orthography). For languages which have an established orthography which is used to some extent by speakers and/or linguists, this transcription follows the conventions of the established orthography. For languages which do not yet have any orthography this transcription mostly follows the conventions of Indonesian.
Finally, the representation of the word in the original source from which the word is derived is also given.
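The /VV/ → <Vː> convention for the Form IPA column can be illustrated with a minimal Python sketch. It assumes forms are already split into segments (so that a multi-character segment such as d͡ʒ stays intact) and uses a small, hypothetical vowel inventory. It applies the rule unconditionally, whereas LexiRumah applies it "in nearly all cases", and it is not the project's actual conversion code.

```python
# Hypothetical vowel inventory for illustration only; real inventories vary by language.
VOWELS = {"a", "e", "i", "o", "u", "ɛ", "ɔ", "ə", "ɨ"}
LENGTH_MARK = "ː"  # IPA length mark (U+02D0)


def collapse_long_vowels(segments: list[str]) -> list[str]:
    """Rewrite each pair of identical adjacent vowels /VV/ as one long vowel <Vː>."""
    result: list[str] = []
    i = 0
    while i < len(segments):
        seg = segments[i]
        is_doubled = seg in VOWELS and i + 1 < len(segments) and segments[i + 1] == seg
        if is_doubled:
            result.append(seg + LENGTH_MARK)
            i += 2  # consume both vowels of the pair
        else:
            result.append(seg)
            i += 1
    return result


# Hypothetical segmented form /baat/.
print(collapse_long_vowels(["b", "a", "a", "t"]))  # ['b', 'aː', 't']
```

Sequences of two different vowels, such as /ai/, are left untouched, because only identical neighbours are merged.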
To take one simple example, the Ili'uun (eray1237) word that matches the concept ‘hide’ was given in the original source as <ladjōk>. This has been standardised to IPA /lad͡ʒoːk/ and is transcribed orthographically as <lajook>, in rough accordance with Indonesian orthography.
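The relation between the three transcriptions in this example can be sketched as a record plus a small rule table. The two rules (<dj> → d͡ʒ / <j>, and <ō> → oː / <oo>) are hypothetical and are read off this single example only. The `Form` class and `standardise` function are illustrative names and are not part of LexiRumah.

```python
import unicodedata
from dataclasses import dataclass

# Hypothetical source-spelling rules inferred from the single example above:
# source spelling -> (Form IPA, Orthography). Unlisted characters pass through.
RULES = {
    "dj": ("d͡ʒ", "j"),
    "ō": ("oː", "oo"),  # macron marks vowel length
}


@dataclass
class Form:
    language: str     # language identifier, e.g. a Glottocode
    concept: str
    source_form: str  # as given in the original source
    form_ipa: str     # standardised IPA transcription
    orthography: str


def standardise(source: str) -> tuple[str, str]:
    """Convert a source spelling to (IPA, orthography), trying longer rules first."""
    source = unicodedata.normalize("NFC", source)  # treat o + combining macron as ō
    keys = sorted(RULES, key=len, reverse=True)
    ipa, orth = [], []
    i = 0
    while i < len(source):
        for key in keys:
            if source.startswith(key, i):
                ipa.append(RULES[key][0])
                orth.append(RULES[key][1])
                i += len(key)
                break
        else:  # no rule matched: copy the character unchanged
            ipa.append(source[i])
            orth.append(source[i])
            i += 1
    return "".join(ipa), "".join(orth)


ipa, orth = standardise("ladjōk")
hide = Form("eray1237", "hide", "ladjōk", ipa, orth)
print(hide.form_ipa, hide.orthography)  # lad͡ʒoːk lajook
```

A real conversion needs a full, language-specific rule set. This sketch only shows why all three representations are stored side by side.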
LexiRumah is a separate publication by Gereon A. Kaiping, Owen Edwards, and Marian Klamer. We recommend that you cite it as follows:
Kaiping, Gereon A., Owen Edwards, and Marian Klamer (eds.). 2019. LexiRumah 3.0.0. Leiden: Leiden University Centre for Linguistics. Available online at https://lexirumah.model-ling.eu/. Accessed on 2024-11-07.
It is important to cite the specific reference (printed source or data collector) that the data you cite comes from. Every item in the database is linked to its reference and should be cited accordingly. For instance, if you wish to cite the Hewa (sika1262-hewa) word taʔa ‘betel vine’, you would do so as follows: “The Hewa word for ‘betel vine’ is taʔa (Fricke 2014)”, with the following entry in your list of references:
Fricke, Hanna. 2014. Topics in the grammar of Hewa: A variety of Sika in Eastern Flores, Indonesia. München: Lincom Europa.
If you do not have access to the original source, as is often the case for survey word-list data, citing only LexiRumah is still insufficient. Instead, cite both the original source and LexiRumah, as in: “The Iha word for ‘thorn’ [ˈᵑg͡bɛm] contains a voiced pre-nasalised co-articulated bilabial velar plosive (Donohue 2010 in Kaiping, Edwards and Klamer 2019).”
Your reference list will then contain one entry for LexiRumah and one for Donohue (2010). If you wish (or your publisher requires you) to treat the original source as being “in” LexiRumah, you can treat it like a chapter in an edited volume and cite it as follows:
Donohue, Mark. 2010. Bomberai survey word lists. In Kaiping, Gereon A., Owen Edwards, and Marian Klamer (eds.). 2019. LexiRumah 3.0.0. Leiden: Leiden University Centre for Linguistics. Available online at https://lexirumah.model-ling.eu/. Accessed on 2024-11-07.
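The two in-text patterns and the chapter-style reference entry above follow fixed templates, which a minimal Python sketch can make explicit. The function names are hypothetical. The LexiRumah reference string is copied from the recommended citation, with the access date left as a parameter.

```python
LEXIRUMAH_REF = (
    "Kaiping, Gereon A., Owen Edwards, and Marian Klamer (eds.). 2019. "
    "LexiRumah 3.0.0. Leiden: Leiden University Centre for Linguistics. "
    "Available online at https://lexirumah.model-ling.eu/."
)


def in_text_citation(author: str, year: int, via_lexirumah: bool = False) -> str:
    """Parenthetical citation; adds the 'in LexiRumah' clause if the source was not consulted."""
    if via_lexirumah:
        return f"({author} {year} in Kaiping, Edwards and Klamer 2019)"
    return f"({author} {year})"


def chapter_style_entry(source_entry: str, accessed: str) -> str:
    """Reference-list entry that treats the original source as a chapter 'in' LexiRumah."""
    return f"{source_entry} In {LEXIRUMAH_REF} Accessed on {accessed}."


print(in_text_citation("Fricke", 2014))
print(in_text_citation("Donohue", 2010, via_lexirumah=True))
print(chapter_style_entry("Donohue, Mark. 2010. Bomberai survey word lists.", "2024-11-07"))
```

The `via_lexirumah` flag mirrors the distinction drawn above: it should be set only when the original source itself was not consulted.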
The Cognate sets section contains sets of words that are formally and semantically similar, and thus likely cognate (inherited from a single ancestral etymon). These cognate sets were detected automatically with LexStat, so they are only a rough approximation: they are not the same as the results of the traditional comparative method, in which cognates are established by identifying regular sound correspondences. Within a single cognate set, the alignment column shows which segments of each word correspond to one another, as determined by the same algorithm. The source for all such cognate sets is given as Automatic cognate coding with LexStat, 2019.
Each cognate set is named after a term selected at random from all forms in the set. This term is not necessarily a reconstructed proto-form, nor a particularly frequent or representative form.
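LexStat itself (implemented in the LingPy library) scores sound similarity using language-specific correspondence statistics derived from the data. The sketch below substitutes something much cruder: plain normalised edit distance over segments, with single-linkage grouping at a fixed threshold. It is meant only to illustrate the general shape of automatic cognate detection, which is pairwise distances followed by clustering. The words, language labels, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations


def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two segment lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,             # deletion
                           cur[j - 1] + 1,          # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def normalised_distance(a: list[str], b: list[str]) -> float:
    return edit_distance(a, b) / max(len(a), len(b), 1)


def cluster(forms: dict[str, list[str]], threshold: float = 0.5) -> list[set[str]]:
    """Single-linkage grouping: link any two forms at or below the threshold."""
    parent = {k: k for k in forms}

    def find(k: str) -> str:
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for a, b in combinations(forms, 2):
        if normalised_distance(forms[a], forms[b]) <= threshold:
            parent[find(a)] = find(b)
    groups: dict[str, set[str]] = {}
    for k in forms:
        groups.setdefault(find(k), set()).add(k)
    return list(groups.values())


# Hypothetical words for one concept in three hypothetical languages.
words = {"lang_a": list("mata"), "lang_b": list("mate"), "lang_c": list("kɔl")}
print(cluster(words))  # two groups: {lang_a, lang_b} and {lang_c}
```

Unlike this sketch, LexStat can group words whose sounds differ in regular ways. For example, a systematic k : ʔ correspondence between two languages lowers their distance, which plain edit distance cannot capture.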
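The naming rule amounts to drawing one member form at random. In this sketch the generator is seeded and the forms are sorted before the draw. Both choices are assumptions added here so that a name is reproducible regardless of input order. The function name and the example forms are hypothetical.

```python
import random


def name_cognate_set(forms: list[str], rng: random.Random) -> str:
    """Pick an arbitrary member form as the set's label; it is not a proto-form."""
    return rng.choice(sorted(forms))


rng = random.Random(2019)
label = name_cognate_set(["mata", "mate", "matə"], rng)
print(label)  # one of the three member forms
```

Because the label is just a member form, it carries no claim about which form is older or more typical.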
Statistic | Count
---|---
Lexical items | 117,986
Concepts | 607
Sources | 143
Cognate sets | 4,028
Languages | 357 (from 11 families)

Languages by family:

Family | Languages
---|---
Austronesian | 279
Timor-Alor-Pantar | 51
South Bird's Head | 12
West Bomberai | 4
East Bird's Head | 3
Hatam-Mansim | 2
Konda-Yahadian | 2
Maybrat | 1
Mor | 1
Mpur | 1
Tambora | 1
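The family breakdown can be checked against the stated total of 357 languages from 11 families with a few lines of Python. The counts are copied from the table.

```python
# Language counts per family, copied from the table above.
FAMILY_COUNTS = {
    "Austronesian": 279,
    "Timor-Alor-Pantar": 51,
    "South Bird's Head": 12,
    "West Bomberai": 4,
    "East Bird's Head": 3,
    "Hatam-Mansim": 2,
    "Konda-Yahadian": 2,
    "Maybrat": 1,
    "Mor": 1,
    "Mpur": 1,
    "Tambora": 1,
}

total = sum(FAMILY_COUNTS.values())
print(len(FAMILY_COUNTS), total)  # 11 357
```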