Study Tools
Applications
Wenlin
Wenlin is a customizable and expandable Chinese-English dictionary, searchable Chinese text editor, Chinese text converter, and language learning tool. The dictionary is based on the updated edition of the ABC Chinese-English Dictionary edited by John DeFrancis (University of Hawaii Press, 2003), with concise, high-quality definitions for more than 10,000 individual characters and nearly 200,000 words and phrases, written with the learner of Chinese in mind. When no definition is available from the ABC dictionary, Wenlin draws from the Unicode database, which contains more than 19,000 brief but often useful definitions for individual characters.
Supports instant lookup and other useful features like lookup by component, an animated stroke-order box, audible pronunciations, and handwriting input. The flashcards component supplies frequency data for the 3,000 most commonly used characters.
MacKEY
The Chinese-English dictionary contains around 250,000 entries and supports instant lookup. Among many other useful language-learning features, MacKEY 5 can display Hanyu Pinyin or various Cantonese transcriptions alongside Chinese character texts. Also includes a text-to-speech module.
Clavis Sinica
Runs in Java. Clavis Sinica is a text reader designed for students of Chinese. It features an integrated set of dictionary windows that together supply information about the radical and phonetic elements of the character, compounds that contain the character, lexical information, and so on. The more than 25,000 words and phrases in the dictionary include the vocabulary used in most college-level textbooks for Chinese. The flashcards tool allows you to drill yourself on the pronunciation and meaning of words and characters from any text, or the 800 most commonly used characters. Handles both Simplified and Traditional Chinese, as well as Unicode.
DimSum
Runs in Java. Free. DimSum is a CEDICT-based instant-lookup text reader. Also has the ability to add romanizations to Chinese plain-text files, RTF files, and HTML pages. Includes an array of miscellaneous tools: an abacus, flashcards, Chinese name generator, GIF creator, and various converters for currency, measures, numbers, romanizations, and encodings.
http://www.mandarintools.com/dimsum.html
Zhongwen Development Tool
Runs in Java. Free, open-source. ZDT is a CEDICT-based flashcard application. Includes an instant-lookup web browser.
Unihan Variant Dictionary
Free. Extends well beyond the variants listed in the Unihan database. This is a useful tool for anyone doing scholarly work on China, especially those working with old editions of texts.
http://www.ideographer.com/unihan/
Extensions
Small Chinese Dictionary [小词典]
Free. Rob Rohan's 小词典 Desktop installs the CEDICT dictionary into the Dictionary application in Mac OS X 10.5 and above, which makes the data available via Control-click (or right-click) in Cocoa applications. Rohan also offers a Chinese Word of the Day [今天的单词] widget and other feeds.
http://xiaocidian.com/xiaocidian-desktop/
LiveDictionary
For Safari. Provides instant lookup using the CEDICT dictionary for Chinese-English. Also works in other Cocoa applications like Mail and TextEdit (see the user forums).
http://www.eloquentsw.com/livedictionary.html
Chinese Popup Dictionary
For Firefox 2 and 3. From Popup Chinese. Instant lookup with a large collaborative dictionary that you can add to and edit.
https://addons.mozilla.org/en-US/firefox/addon/9144
Mandarin Popup
For Firefox 3. Instant lookup. Uses CEDICT.
https://addons.mozilla.org/en-US/firefox/addon/9931
CantoFish
For Firefox 2 and 3. Instant lookup. Uses adsotrans and CEDICT. Provides Cantonese readings and idioms. Supports both Yale and Jyutping romanizations as well as both traditional and simplified characters.
http://cantofish.wordpress.com/
Rikaichan
For Firefox 2 and 3. Instant lookup for Japanese text. You'll need to download both the add-on and a dictionary. Uses JMdict for Japanese-English.
http://www.polarcloud.com/rikaichan
Flashcard Tools
A variety of blank flashcard software is available for Mac OS X. Many programs aren't really designed for learners of Chinese, in that they don't support more than two "sides" of a given card. Two that do are:
- iFlash: Unicode-savvy.
- Studycard Studio: WorldScript-savvy.
See Fool's Flashcard Review for more information.
Note: See above for Chinese language-learning applications with built-in flashcard tools.
Web Dictionaries
If you are interested in any of the following projects, the Chinese Dictionaries discussion group is a good place to start asking questions.
Chinese-English
Rick Harbaugh's Chinese Characters, a Genealogy and Dictionary is here: http://www.zhongwen.com/
Charles Muller has developed two collaborative dictionary projects:
- CJKV-English Dictionary: http://www.buddhism-dict.net/dealt/
- Digital Dictionary of Buddhism: http://www.buddhism-dict.net/ddb/
Soothill's classic Dictionary of Chinese Buddhist Terms (1934) is here: http://www.acmuller.net/soothill/index.html
An excellent, if idiosyncratic, dictionary is Lin Yutang's Chinese-English Dictionary of Modern Usage. First published in 1972, it is now online at: http://humanum.arts.cuhk.edu.hk/Lexis/Lindict/
Richard Sears provides an online etymology of Chinese characters: http://chineseetymology.org/
CEDICT is a freely available Chinese-English word dictionary, currently maintained by MDBG (Netherlands) under a Creative Commons license as CC-CEDICT: http://www.mdbg.net/chindict/chindict.php?page=cedict
- MDBG offers a Unicode-based online interface with CEDICT and the Unihan database: http://www.xuezhongwen.net/chindict/chindict.php
SmartHanzi.net offers an interface with both CEDICT and the HanDeDict (Chinese-German) dictionaries, as well as lookups using the Digital Dictionary of Buddhism.
In addition, various language-learning sites feature Chinese-English dictionaries:
- nciku (n词酷) includes a good dictionary, with handwriting input: http://www.nciku.com/
- Popup Chinese includes a user-editable dictionary and is also the home of the adsotrans natural-language processing engine: http://popupchinese.com/tools/
- YellowBridge (黃橋) has a large dictionary, also with handwriting input: http://www.yellowbridge.com/chinese/chinese-dictionary.php
Chinese-Chinese
Handian 漢典 can search using both simplified and traditional characters. Dictionary entries include the classic Shuowen jiezi 說文解字 and Kangxi zidian 康熙字典 definitions: http://www.zdic.net/
Guoyu Cidian 國語辭典 (Ministry of Education, Taiwan): http://140.111.34.46/dict/
Web Databases
Language
There are various collections of texts and textbooks designed for language learning online:
- Chinese Text Sampler (David Porter, University of Michigan) [GB]: http://www-personal.umich.edu/~dporter/sampler/sampler.html
- Read Chinese! (National Foreign Language Center, University of Maryland) [Unicode]: http://readchinese.nflc.org/
Literature
This is the largest and most fluid group of Chinese texts on the Internet, and we hope to one day do it justice. For now, this section is under construction. Here are links to some of the sources that will be discussed:
- Xin Yu Si (New Threads Electronic Library) [GB]: http://www.xys.org/
- Hong Kong Literature Database [Unicode]: http://hklitpub.lib.cuhk.edu.hk/journals/index.jsp
Early China
CHANT (CHinese ANcient Texts), Chinese University of Hong Kong. Unicode-encoded. A careful and comprehensive scholarly project, producing definitive editions of early texts. The Pre-Han, Han, and Six Dynasties texts are the basis of the ongoing ICS Ancient Chinese Texts Concordance Series. Access to this site is by annual subscription. Individual fees are US$350 for all five current databases (see below), less for single databases.
- Jiaguwen (Oracular Inscriptions on Tortoise Shells and Bones). Texts of 53,834 inscriptions, with both the original characters and their "orthographic translations." See the tour at: http://www.chant.org/info/demo_jiaguwen.asp
- Jinwen (Bronze Inscriptions). Inscriptions from 12,021 bronze vessels and around 18,000 rubbings and tracings, with both the original characters and their "orthographic translations." See the tour at: http://www.chant.org/info/demo_jinwen.asp
- Jianbo (Excavated Wood/Bamboo and Silk Scripts). The entire corpus of published texts, with scanned images of the texts juxtaposed with their interpretation in standardized characters. See the tour at: http://www.chant.org/info/demo_jianbo.asp
- Pre-Han and Han (pre-220) texts, as comprehensive as possible. See the tour at: http://www.chant.org/info/demo_prehan.asp
- Six Dynasties (220–581) texts, as comprehensive as possible. See the tour at: http://www.chant.org/info/demo_sixdyn.asp
- Leishu (Chinese Encyclopedias). Coming soon.
The rest of this section is under construction. Here are links to some of the sources that will be discussed:
- Transcription of the 1935 Harvard-Yenching edition of the Zhouyi (i.e., the "Book of Changes") [Unicode]: http://www.biroco.com/yijing/zhouyi.htm
Classical Texts
Thesaurus Linguae Sericae (TLS), "An Historical and Comparative Encyclopaedia of Chinese Conceptual Schemes." Unicode-encoded. TLS began as an innovative synonym dictionary for classical Chinese, but this "cheerfully over-ambitious, exploratory and experimental" project has begun to expand into other areas, including modern spoken Chinese. See: http://tls.uni-hd.de/Lasso/TLS/
Scripta Sinica, Academia Sinica, Taiwan. Big Five-encoded. You can browse through the texts and search, but the database does not allow Boolean searches. Incorporates the 25 dynastic histories and much more. See: http://www.sinica.edu.tw/ftms-bin/ftmsw3
Hanquan 寒泉 (Cold Spring), Taiwan Normal University Library. Big Five-encoded. The database permits Boolean searches and the origins of the search results are clearly identified. While there is some overlap with Scripta Sinica, there are a number of important historical and literary texts that are only available here, along with the 1798 Siku quanshu zongmu tiyao 四庫全書總目提要 (Annotated Catalog of Books in the Imperial Library). See: http://140.122.127.253/dragon/
The rest of this section is under construction. Here are links to some of the sources that will be discussed:
- Palace Museum, Taiwan [Big Five]: http://210.69.170.100/s25/index.htm
- Zuozhuan Digital Concordance (El Colegio de México) [Big Five]: http://intranet.colmex.mx/zuozhuan/
- Chinese Philosophical E-Text Archive (Wesleyan University) [Big Five]: http://sangle.web.wesleyan.edu/etext/
- Chinese Text Project [Unicode]: http://chinese.dsturgeon.net/
Buddhist Texts
Thesaurus Literaturae Buddhicae (TLB) presents Buddhist literature sentence by sentence in four languages: Sanskrit, Chinese, Tibetan, and English. Unicode-based. As of May 2009, this project contains only nine texts, but they are important ones, including Śāntideva's Bodhicaryāvatāra and the Vimalakīrtinirdeśa sutra: https://www2.hf.uio.no/polyglotta/index.php?page=library&library=TLB
The full text of all 85 volumes of the Taishō canon [Taishō Shinshū Daizōkyō, 大正新脩大藏經] is available for search at the University of Tokyo's SAT Daizōkyō Text Database: http://21dzk.l.u-tokyo.ac.jp/SAT/. Unicode-based. Preserves the printed Taishō text, line by line. Each line is preceded by the text number and then the Taishō volume, page, and line information. In addition, the SAT site is integrated with the Digital Dictionary of Buddhism project (see above) and the INBUDS (Indian and Buddhist Studies) database.
The Chinese Buddhist Electronic Text Association (CBETA) provides the Chinese volumes of the Taishō canon, along with a selection of Chinese historical texts from the extended canon [Shinsan Zokuzōkyō, 卍新纂續藏經]: http://www.cbeta.org/. The CBETA mirror that works best for downloads outside of East Asia is Dharma Drum Mountain [法鼓山]. The materials on the site exist in a variety of formats and encodings, including:
- Puji 普及 (Normal) ~ Preserves the printed Taishō text, line by line. Each line is preceded by Taishō volume, text number, page, and line information. Each scroll [卷] in the text is a separate document. UTF-8 and Big Five.
- App (named for Professor Urs App) ~ This format is the same as Normal, but the ends of the lines have been changed to coincide with the punctuation. In the Taishō information, the number in parentheses after the line number indicates how many characters have been moved to the beginning of the next line. Not only does this make the text easier to read, but it also makes searches for compounds more reliable. Each text is a single document. UTF-8 and Big Five.
- PDF ~ Organized by paragraph. Each text is a single PDF document. UTF-8 only.
- XML ~ Requires an understanding of XML and the TEI standard. Each text is a single XML document. UTF-8 only.
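For readers who want to process the Normal or App plain-text files with a script, the following Python sketch splits a line into its Taishō reference and its text. The exact header layout is an assumption made for illustration: a line is taken to look like "T01n0001_p0001a02(03)║text", with the parenthesized count present only in the App format. The sample lines and the names HEADER and parse_line are hypothetical; check them against an actual download before relying on them.

```python
import re

# Assumed header layout (not confirmed by this page):
#   T01 n0001 _p0001 a 02 (03) ║ text
#   canon+volume, text number, page, column, line, optional moved-character count
HEADER = re.compile(
    r"^(?P<canon>[A-Z])(?P<volume>\d+)"
    r"n(?P<text>\d+[A-Za-z]?)"
    r"_p(?P<page>\d+)(?P<column>[a-z])(?P<line>\d+)"
    r"(?:\((?P<moved>\d+)\))?"
    r"║(?P<body>.*)$"
)

def parse_line(raw):
    """Return the header fields and body of one line, or None if it does not match."""
    match = HEADER.match(raw)
    if match is None:
        return None
    fields = match.groupdict()
    # The moved-character count exists only in App-style lines.
    if fields["moved"] is not None:
        fields["moved"] = int(fields["moved"])
    return fields

# Hypothetical sample lines, one in each assumed style.
app_line = parse_line("T01n0001_p0001a02(03)║長阿含經序")
normal_line = parse_line("T01n0001_p0001a02║長阿含經序")
print(app_line)
print(normal_line)
```

Keeping the reference fields separate from the body makes it possible to search the body text alone and still report the volume, page, and line of each hit.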
Both the UTF-8 and Big Five documents use the same approach to "rare" characters, i.e., characters that are not in their character set. If the character is a standard variant, then a "normalized" form is used without comment. If it is not a standard variant, then a formula in brackets is used, like "[(序-予+林)/女]" or "[牛*宅]". UTF-8 has a larger character set than Big Five, so there are fewer normalized forms in the UTF-8 documents. Only the XML documents indicate which characters have been normalized.
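Because the bracketed formulas stand in for single characters, they can interfere with searches and character counts. The Python sketch below shows one way to locate them. It assumes that a formula is a bracketed run with no nested brackets that contains at least one of the operators seen in the two examples above (+, -, *, /); the name find_rare_char_formulas and the sample sentence are made up for illustration, and real files may use operators not covered here.

```python
import re

# Assumption: a formula is "[...]" with no nested square brackets and at
# least one composition operator (+, -, *, /) inside it.
FORMULA = re.compile(r"\[[^\[\]]*[*/+\-][^\[\]]*\]")

def find_rare_char_formulas(text):
    """Return every bracketed composition formula found in text."""
    return FORMULA.findall(text)

# Made-up sample: two formulas plus an ordinary bracketed gloss.
sample = "見[(序-予+林)/女]與[牛*宅]等字，另見[卷]。"
formulas = find_rare_char_formulas(sample)
print(formulas)  # ['[(序-予+林)/女]', '[牛*宅]']
```

Requiring an operator inside the brackets keeps ordinary bracketed glosses such as "[卷]" from being reported as rare characters.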
The downloads are organized by canon and volume [冊]. Each volume has an individual page with its own table of contents. You can reach them from the main index page:
http://w3.cbeta.org/index_list.htm [Dharma Drum Mountain]
There are seven sections. Click on the tabs to see a list of volumes in that section. Use the pop-up menus to select a format to download. Click on the title of each volume to go to its table of contents page, which lists the Taishō number and title for each text in the volume, along with the date of the most recent CBETA release, the number of scrolls in the text, and the dynasty and/or author/editor of the text. Click on the title of each text for an index of HTML pages for each scroll in the text (App format, Big Five encoding).
Note: Christian Wittern has created a group of Firefox search plug-ins for the SAT and CBETA databases, available here.