Study Tools

Language Study

Wenlin

For OS 9 and OS X. Full Unicode support. Wenlin 3 is a customizable and expandable Chinese-English dictionary, searchable Chinese text editor, Chinese text converter, and language learning tool. The dictionary is based on the updated edition of the ABC Chinese-English Dictionary edited by John DeFrancis (University of Hawaii Press, 2003), with concise, high-quality definitions for more than 10,000 individual characters and nearly 200,000 words and phrases, written with the learner of Chinese in mind. When no definition is available from the ABC dictionary, Wenlin draws from the Unicode database, which contains more than 19,000 brief but often useful definitions for individual characters.

Supports instant lookup and other useful features like lookup by component, an animated stroke-order box, audible pronunciations, and handwriting input. The flashcards component supplies frequency data for the 3,000 most commonly used characters.

http://www.wenlin.com/

MacKEY

For OS X 10.2.4 and above. The Chinese-English dictionary contains around 250,000 entries and supports instant lookup. Among many other useful language-learning features, MacKEY 5 can display Hanyu Pinyin or various Cantonese transcriptions alongside Chinese character texts. Also includes a text-to-speech module.

http://www.cjkware.com/

Clavis Sinica

For OS X 10.3 and above (Java). Clavis Sinica is a text reader designed for students of Chinese. It features an integrated set of dictionary windows that together supply information about the radical and phonetic elements of the character, compounds that contain the character, lexical information, and so on. The more than 25,000 words and phrases in the dictionary include the vocabulary used in most college-level textbooks for Chinese. The flashcards tool allows you to drill yourself on the pronunciation and meaning of words and characters from any text, or the 800 most commonly used characters. Handles both Simplified and Traditional Chinese, as well as Unicode.

http://www.clavisinica.com/

DimSum

Free. For OS X 10.3 and above (Java). DimSum is a CEDICT-based instant-lookup text reader. Also has the ability to add romanizations to Chinese plain-text files, RTF files, and HTML pages. Includes an array of miscellaneous tools: an abacus, flashcards, Chinese name generator, GIF creator, and various converters for currency, measures, numbers, romanizations, and encodings.

http://www.mandarintools.com/dimsum.html

Zhongwen Development Tool

Free, open-source. For OS X 10.4 and above (Java). ZDT is a CEDICT-based flashcard tool, with optional support for the Adsotrans database. Also includes an instant-lookup web browser.

http://zdt.sourceforge.net/

LiveDictionary

For OS X 10.3 and above. LiveDictionary provides instant lookup in Safari. Uses the CEDICT dictionary for Chinese. Also works in other Cocoa applications like Mail and TextEdit, though this is not officially supported (see the user forums).

http://www.eloquentsw.com/livedictionary.html

Language Aid

For OS X 10.4 and above. Language Aid provides system-wide text lookups. Uses the CEDICT dictionary for Chinese.

http://www.aorensoftware.com/LanguageAid/

Unihan Variant Dictionary

Free. For OS X 10.2 and above. Extends beyond the variants listed in the Unihan database. For anyone doing scholarly work on China, especially those working with old editions of texts, this is a most welcome tool.

http://www.ideographer.com/unihan/

Flashcard Tools

There is a wide variety of flashcard software available for Mac OS X. Reviews of many of the following are available at the Fool's Flashcard Review.

  • Anki: Free, open-source. Unicode-savvy, OS X 10.4 and above.
  • ProVoc: Free. Unicode-savvy, versions available for OS X 10.3 and above.
  • iFlash: Unicode-savvy, OS X 10.4 and above.
  • Mental Case: Unicode-savvy, OS X 10.4 and above.
  • Mindburn: Unicode-savvy, OS X 10.3 and above.
  • Studycard Studio: WorldScript-savvy, OS 9 and above.

Dictionaries Online

If you are interested in any of the following projects, the Chinese Dictionaries discussion group is a good place to start asking questions.

Chinese-English

Rick Harbaugh's Chinese Characters, a Genealogy and Dictionary is online at: http://www.zhongwen.com/

Charles Muller has developed two online, Unicode-based dictionaries:

Soothill's classic Dictionary of Chinese Buddhist Terms is available in HTML format: http://www.hm.tyg.jp/~acmuller/soothill/

A fine, if idiosyncratic, dictionary is Lin Yutang's Chinese-English Dictionary of Modern Usage. First published in 1972, it is now online at: http://humanum.arts.cuhk.edu.hk/Lexis/Lindict/

Thomas Chin, Lau Chun-fat, and Chang Kai-hui's Dictionary of Chinese Characters, besides standard Mandarin, allows lookup by Cantonese and Hakka pronunciations (you can set the Romanizations to any of the most common systems), and includes a radical-stroke index, Cangjie, and more: http://www.chinalanguage.com/CCDICT/index.html

Richard Sears provides an online etymology of Chinese characters: http://chineseetymology.org/

CEDICT is a public-domain Chinese-English word dictionary founded in 1997 and now maintained by MDBG (Netherlands) under a Creative Commons license as CC-CEDICT: http://www.mdbg.net/chindict/chindict.php?page=cedict

MDBG offers a Unicode-based online interface with CC-CEDICT and the Unihan database: http://www.xuezhongwen.net/chindict/chindict.php

SmartHanzi.net offers an interface with both the CC-CEDICT and HanDeDict (Chinese-German) dictionaries, as well as lookups using the Digital Dictionary of Buddhism (see above).

Adsotrans is an open-source natural language processing engine from David Lancashire. It is used for Chinese text annotation and analysis, machine translation, language learning, search processing, and more. There is an online interface: http://adsotrans.com/free-chinese-dictionary.html

Chinese-Chinese

Guoyu Cidian 國語辭典 (Ministry of Education, Taiwan): http://140.111.34.46/dict/

Learning Online

nciku

nciku is a free online Chinese-English and English-Chinese dictionary and language-learning web site, based in Beijing. The excellent dictionary features example sentences, conversations, handwriting-recognition lookup, and more, with an active community of users in the forums.

http://www.nciku.com/

ChinesePod

Shanghai-based ChinesePod is not free, but provides a decent selection of podcasts for learning Chinese on four levels. YMMV.

http://www.chinesepod.com/

Texts Online

Early China

CHANT (CHinese ANcient Texts), Chinese University of Hong Kong. Unicode-encoded. A careful and comprehensive scholarly project, producing definitive editions of early texts. The Pre-Han, Han, and Six Dynasties texts are the basis of the ongoing ICS Ancient Chinese Texts Concordance Series. Access to this site is by annual subscription. Individual fees are US$350 for all five current databases (see below), less for single databases.

See: http://www.chant.org/

Classical Texts

Scripta Sinica, Academia Sinica, Taiwan. Big Five-encoded. You can browse through the texts and search, but the database does not allow Boolean searches. Incorporates the 25 dynastic histories and much more. See: http://www.sinica.edu.tw/ftms-bin/ftmsw3

Hanquan 寒泉 (Cold Spring), Taiwan Normal University Library. Big Five-encoded. The database permits Boolean searches and the origins of the search results are clearly identified. While there is some overlap with Scripta Sinica, there are a number of important historical and literary texts that are only available here, along with the 1798 Siku quanshu zongmu tiyao 四庫全書總目提要 (Annotated Catalog of Books in the Imperial Library). See: http://140.122.127.253/dragon/

The rest of this section is under construction. Here are links to some of the sources that will be discussed:

Buddhist Texts

Chinese Buddhist Electronic Text Association (CBETA) provides the Chinese sections of the Taisho canon [大正藏經] and a selection of Chinese historical texts from the extended canon [卍續藏經]. The primary site is http://www.cbeta.org/. There are several mirrors. The one that works best for downloads outside of East Asia is Dharma Drum Mountain [法鼓山].

The materials on the site exist in a variety of formats, including:

  • Puji 普及 (Normal) ~ Preserves the printed Taisho text, line by line. Each line is preceded by volume, text, page and line information. For example, "T51n2099_p1101a01" means "Taisho canon, volume 51, text number 2099, page number 1101a, line 1." Each scroll [卷] in the text is a separate document. Big Five and UTF-8.
  • App (named for Professor Urs App) ~ This format is the same as Normal, but the ends of the lines have been changed to coincide with the punctuation. In the Taisho information, the number in parentheses after the line number indicates how many characters have been moved to the beginning of the next line. Not only does this make the text easier to read, but it also makes searches for compounds more reliable. Each text is a single document. Big Five and UTF-8.
  • PDA ~ Organized by paragraph. Each scroll in the text is a separate document. Big Five only.
  • PDF ~ Organized by paragraph. Each text is a single PDF document. UTF-8 only.
  • XML ~ Requires an understanding of XML and the TEI standard. Each text is a single XML document. UTF-8 only.
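
The line references in the Normal format can be split into their parts with a short script. The following is only a sketch in Python: the pattern is inferred from the single example above ("T51n2099_p1101a01"), and the names TaishoRef and parse_taisho_ref are made up for illustration. References from other canons, or text numbers that do not fit this shape, are not handled and simply return None.

```python
import re
from typing import NamedTuple, Optional


class TaishoRef(NamedTuple):
    """Parts of a reference such as "T51n2099_p1101a01" (hypothetical type)."""
    volume: int   # Taisho volume, e.g. 51
    text: int     # text number, e.g. 2099
    page: str     # page number with its letter, e.g. "1101a"
    line: int     # line number, e.g. 1


# Assumed shape: T<volume>n<text>_p<page digits><letter><line digits>
_REF = re.compile(r"T(\d+)n(\d+)_p(\d+[a-z])(\d+)")


def parse_taisho_ref(ref: str) -> Optional[TaishoRef]:
    """Return the parsed reference, or None if it does not fit the assumed shape."""
    m = _REF.fullmatch(ref)
    if m is None:
        return None
    volume, text, page, line = m.groups()
    return TaishoRef(int(volume), int(text), page, int(line))


if __name__ == "__main__":
    print(parse_taisho_ref("T51n2099_p1101a01"))
```

Keeping the page as a string ("1101a") follows the wording of the example, which treats the letter as part of the page number.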

Both the UTF-8 and Big Five documents use the same approach to "rare" characters, i.e., characters that are not in their character set. If the character is a standard variant, then a "normalized" form is used without comment. If it is not a standard variant, then a formula in brackets is used, like "[(序-予+林)/女]" or "[牛*宅]". There are fewer rare characters in UTF-8, so there are fewer normalized forms in the UTF-8 documents. Only the XML documents indicate which characters have been normalized.
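
If you want to see which rare characters a downloaded text contains, the bracketed formulas can be located with a regular expression. This is a rough sketch in Python, not a CBETA tool: it assumes a formula is any bracketed span containing at least one of the operators seen in the two examples above (+, -, *, /), and the function name find_rare_formulas is hypothetical. Other operators or nested brackets, if the texts use them, would need a fuller pattern.

```python
import re
from typing import List

# A bracketed span with no inner brackets and at least one of + - * /
# (assumption based only on the two sample formulas).
_FORMULA = re.compile(r"\[[^\[\]]*[-+*/][^\[\]]*\]")


def find_rare_formulas(text: str) -> List[str]:
    """Return every bracketed rare-character formula found in text, in order."""
    return _FORMULA.findall(text)


if __name__ == "__main__":
    sample = "爾時[牛*宅]與[(序-予+林)/女]及[卷]"
    for formula in find_rare_formulas(sample):
        print(formula)
```

Bracketed glosses without an operator, such as "[卷]", are deliberately skipped, so ordinary editorial brackets are not mistaken for formulas.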

The downloads are organized by canon and volume [冊]. Each volume has an individual page with its own table of contents. You can reach them from the main index page:

http://w3.cbeta.org/index_list.htm [Dharma Drum Mountain]

There are seven sections. Click on the tabs to see a list of volumes in that section. Use the pop-up menus to select a format to download. Click on the title of each volume to go to its table of contents page, which lists the Taisho number and title for each text in the volume, along with the date of the most recent CBETA release, the number of scrolls in the text, and the dynasty and/or author/editor of the text. Click on the title of each text to go to an index of HTML pages for each scroll in the text (App format, Big Five encoding).

The site offers two other Windows-only formats for download:

  • CBReader is a Windows application built from the XML documents.
  • HTML Help uses the Microsoft CHM (Compiled HTML) format. "HTML Help" is the name of the Windows application that reads CHM files. The downloads are available in both Big Five and GBK. Both contain traditional-character texts in HTML format, with detailed indexing and links to images of rare characters not in the encoding. The only way to extract the complete HTML is to use the free Microsoft HTML Help Workshop utility to "decompile" the CHM files on a PC. The index files do not work outside of the CHM format, but the links to images of rare characters are retained in the HTML after it is extracted, along with anchors to each individual line of text, and more. The GBK files have fewer rare characters and work well in Mac OS X. The last download is a list of rare characters not in Big Five (rare.chm). This contains information for 13,529 characters, with images and other information.

The texts have also been converted to simplified characters. See: http://www.fodian.net/

Literary Texts

This section is under construction. Here are links to some of the sources that will be discussed: