Language Tools

Wenlin

Wenlin is a customizable and expandable Chinese-English dictionary, searchable Chinese text editor, Chinese text converter, and language learning tool. The dictionary is based on the comprehensive edition of the ABC Chinese-English Dictionary (2003), with concise, high-quality definitions for more than 10,000 hanzi and nearly 200,000 words and phrases, written with the learner of Chinese in mind. Wenlin 4 also incorporates the ABC English-Chinese Dictionary (2010). Supports more than 75,000 Unicode hanzi, along with more than 11,000 seal-script [篆書] characters. When no definition is available from the ABC dictionary, Wenlin draws from the Unihan database, which contains thousands of additional definitions (brief, but often useful) for individual hanzi.

Supports instant lookup and other useful features like lookup by component, an animated stroke-order box, audible pronunciations, and handwriting input. Wenlin 4's flashcards component supports words and phrases (i.e., not just individual hanzi).


The Chinese-English dictionary contains around 300,000 entries and supports instant lookup. Among many other useful language-learning features, KEY 5.x can display Hanyu Pinyin or various Cantonese transcriptions alongside Chinese character texts. Also includes a text-to-speech module.

Clavis Sinica

Runs in Java. Clavis Sinica is a text reader designed for students of Chinese. It features an integrated set of dictionary windows that together supply information about the radical and phonetic elements of the character, compounds that contain the character, lexical information, and so on. The more than 25,000 words and phrases in the dictionary include the vocabulary used in most college-level textbooks for Chinese. The flashcards tool allows you to drill yourself on the pronunciation and meaning of words and characters from any text, or the 800 most commonly used characters. Handles both Simplified and Traditional Chinese, as well as Unicode.

The Clavis Sinica site also provides a variety of useful, free online Chinese Language Resources.

Le Grand Ricci Numérique

This is the monumental "Le Grand Dictionnaire Ricci de la langue chinoise" published in 2001, the most encyclopedic Chinese dictionary in any Western language. The first digital version (2010) is a FileMaker database composed of 13,300 character entries and 280,000 words and phrases. Supports both simplified and traditional Chinese, as well as both Pinyin and Wade-Giles romanizations.


Chinese Rewriter

Chinese Rewriter handles lexical differences between usage in mainland China and Taiwan. It uses dictionary files to process the lexical replacement, and the dictionaries can be edited by the user. It also has a "Smart Conversion" function that can convert hanzi based on context.
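As a rough illustration of dictionary-driven lexical replacement, here is a minimal Python sketch. It is not Chinese Rewriter's actual algorithm or file format; the table entries and the names `MAINLAND_TO_TAIWAN` and `rewrite` are assumptions made for the example. It scans the text and, at each position, prefers the longest term found in a user-editable table.

```python
# Hypothetical replacement table (mainland usage -> Taiwan usage).
# Illustrative entries only; not taken from Chinese Rewriter's dictionary files.
MAINLAND_TO_TAIWAN = {
    "軟件": "軟體",  # software
    "信息": "資訊",  # information
    "網絡": "網路",  # network
}


def rewrite(text, table):
    """Replace every term found in `table`, preferring the longest match."""
    max_len = max(map(len, table), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so longer terms win over shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in table:
                out.append(table[chunk])
                i += length
                break
        else:
            # No dictionary term starts here; copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(rewrite("這個軟件提供網絡信息", MAINLAND_TO_TAIWAN))
```

A plain table lookup like this cannot resolve context-dependent cases, which is presumably the gap a feature like "Smart Conversion" is meant to fill.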

Supports the application services feature in OS X. See:

Unihan Variant Dictionary

Free. Extends well beyond the variants listed in the Unihan database. This is a useful tool for anyone doing scholarly work on China, especially those working with old editions of texts.

Anki
Free, open-source. Damien Elmes' Anki is the best flashcard tool available on any platform. Don't waste your time with anything else. See the review at Fool's Flashcard Review.

To support this project, buy the iOS app, which is priced as a donation ($25) to the larger effort.

Language Aid

Language Aid is an instant-lookup tool that can be tied via plug-ins to various web resources, including CEDICT [Chinese-English], EDICT [Japanese-English], and Google Translate.


Small Chinese Dictionary [小词典]

Free. Rob Rohan's 小词典 Desktop installs the CEDICT dictionary into the Dictionary application in Mac OS X 10.5 and above, which then makes the data available via control-click (or right-click, if your mouse supports it) in Cocoa applications.


For Safari. Provides instant lookup using the CEDICT dictionary, and more. Also works in other Cocoa applications like Mail and TextEdit (see the user forums).

Mandarin Popup

For Firefox 3. Instant lookup. Uses the CEDICT dictionary.


For Firefox 2 and 3. Instant lookup. Uses adsotrans and CEDICT. Provides Cantonese readings and idioms. Supports both Yale and Jyutping romanizations as well as both traditional and simplified characters.


For Firefox 2 and 3. Instant lookup for Japanese text. You'll need to download both the add-on and a dictionary. Uses JMdict for Japanese-English.

Web Dictionaries

If you are interested in any of the following projects, the Chinese Dictionaries discussion group is a good place to start asking questions.


Rick Harbaugh's Chinese Characters, a Genealogy and Dictionary is here:

Charles Muller has developed two collaborative dictionary projects:

Soothill's classic Dictionary of Chinese Buddhist Terms (1934) is here:

An excellent, if idiosyncratic, dictionary is Lin Yutang's Chinese-English Dictionary of Modern Usage. First published in 1972, it is now online at:

Richard Sears provides an online etymology of Chinese characters:

CEDICT is a public-domain Chinese-English word dictionary, currently maintained by MDBG (Netherlands) under a Creative Commons license as CC-CEDICT: offers an interface with both CEDICT and the HanDeDict (Chinese-German) dictionaries, as well as lookups using the Digital Dictionary of Buddhism.

In addition, various language-learning sites feature Chinese-English dictionaries:


Handian 漢典 can search using both simplified and traditional characters. Dictionary entries include the classic Shuowen jiezi 說文解字 and Kangxi zidian 康熙字典 definitions:

Guoyu Cidian 國語辭典 [Mandarin Dictionary] (Ministry of Education, Taiwan): (Big5-encoded)

Yitizi Zidian 異體字字典 [Dictionary of Character Variants] (Ministry of Education, Taiwan): (Big5-encoded)
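Text saved from a Big5-encoded site usually has to be re-encoded before it can be mixed with Unicode material. The following is a minimal Python sketch, assuming the bytes are plain Big5; the function name `big5_to_utf8` is made up for the example, and pages that use vendor extensions may need a different codec (Python also ships `cp950` and `big5hkscs`).

```python
def big5_to_utf8(data: bytes) -> bytes:
    """Decode Big5 bytes and re-encode them as UTF-8.

    Bytes that are not valid Big5 are replaced with U+FFFD instead of
    raising an error, so damaged input stays visible in the output.
    """
    text = data.decode("big5", errors="replace")
    return text.encode("utf-8")


# Round-trip a short sample string through Big5.
sample = "國語辭典".encode("big5")
print(big5_to_utf8(sample).decode("utf-8"))
```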

Web Databases


There are various collections of texts and textbooks designed for language learning online:


This is the largest and most fluid group of Chinese texts on the Internet, and we hope to one day do it justice. For now, this section is under construction. Here are links to some of the sources that will be discussed:

Early China

CHANT (CHinese ANcient Texts), Chinese University of Hong Kong. Unicode-encoded. A careful and comprehensive scholarly project, producing definitive editions of early texts. The Pre-Han, Han, and Six Dynasties texts are the basis of the ongoing ICS Ancient Chinese Texts Concordance Series. Access to this site is by annual subscription. Individual fees are US$350 for all five current databases (see below), less for single databases.


The rest of this section is under construction. Here are links to some of the sources that will be discussed:

Classical Texts

Thesaurus Linguae Sericae (TLS), "An Historical and Comparative Encyclopaedia of Chinese Conceptual Schemes." Unicode-encoded. TLS began as an innovative synonym dictionary for classical Chinese, but this "cheerfully over-ambitious, exploratory and experimental" project has begun to expand into other areas, including modern spoken Chinese. See:

Scripta Sinica, Academia Sinica, Taiwan. Big Five-encoded. You can browse through the texts and search, but the database does not allow Boolean searches. Incorporates the 25 dynastic histories and much more. See:

Hanquan 寒泉 (Cold Spring), Taiwan Normal University Library. Big Five-encoded. The database permits Boolean searches and the origins of the search results are clearly identified. While there is some overlap with Scripta Sinica, there are a number of important historical and literary texts that are only available here, along with the 1798 Siku quanshu zongmu tiyao 四庫全書總目提要 (Annotated Catalog of Books in the Imperial Library). See:

The rest of this section is under construction. Here are links to some of the sources that will be discussed:

Buddhist Texts

Thesaurus Literaturae Buddhicae (TLB) presents Buddhist literature sentence by sentence in four languages: Sanskrit, Chinese, Tibetan, and English. Unicode-based. As of May 2009, this project contains only nine texts, but they are important ones, including Śāntideva's Bodhicaryāvatāra and the Vimalakīrtinirdeśa sutra:

The full text of all 85 volumes of the Taishō canon [Taishō Shinshū Daizōkyō, 大正新脩大藏經] is available for search at the University of Tokyo's SAT Daizōkyō Text Database: Unicode-based. Preserves the printed Taishō text, line by line. Each line is preceded by the text number and then the Taishō volume, page, and line information. In addition, the SAT site is integrated with the Digital Dictionary of Buddhism project (see above) and the INBUDS (Indian and Buddhist Studies) database.

The Chinese Buddhist Electronic Text Association (CBETA) provides the Chinese volumes of the Taishō canon, along with a selection of Chinese historical texts from the extended canon [Shinsan Zokuzōkyō, 卍新纂續藏經]: The CBETA mirror that works best for downloads outside of East Asia is Dharma Drum Mountain [法鼓山]. The materials on the site exist in a variety of formats and encodings, including:

  • Puji 普及 (Normal) ~ Preserves the printed Taishō text, line by line. Each line is preceded by Taishō volume, text number, page, and line information. Each scroll [卷] in the text is a separate document. UTF-8 and Big Five.
  • App (named for Professor Urs App) ~ This format is the same as Normal, but the ends of the lines have been changed to coincide with the punctuation. In the Taishō information, the number in parentheses after the line number indicates how many characters have been moved to the beginning of the next line. Not only does this make the text easier to read, but it also makes searches for compounds more reliable. Each text is a single document. UTF-8 and Big Five.
  • PDF ~ Organized by paragraph. Each text is a single PDF document. UTF-8 only.
  • XML ~ Requires an understanding of XML and the TEI standard. Each text is a single XML document. UTF-8 only.
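The line prefixes in the Normal and App formats can be split off with a regular expression. The Python sketch below assumes a prefix shaped like `T01n0001_p0001a02║` (canon letter, volume, text number, page, column, line), with an optional parenthesized count such as `(03)` before the `║` in App files. That layout is an assumption, not something specified above, so check it against the files you actually download. The names `HEADER` and `parse_line` are invented for the example.

```python
import re

# ASSUMED prefix layout, e.g. "T01n0001_p0001a02║" (Normal) or
# "T01n0001_p0001a02(03)║" (App). Verify against real CBETA files.
HEADER = re.compile(
    r"^(?P<canon>[A-Z]+)(?P<volume>\d+)n(?P<text>\d+[A-Za-z]?)_"
    r"p(?P<page>\d+)(?P<column>[a-z])(?P<line>\d+)"
    r"(?:\((?P<moved>\d+)\))?║(?P<body>.*)$"
)


def parse_line(raw):
    """Split one line into its reference fields and its text body.

    Returns a dict, or None if the line does not match the assumed layout.
    """
    match = HEADER.match(raw.rstrip("\n"))
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("volume", "page", "line"):
        fields[key] = int(fields[key])
    # "moved" is only present in App-style lines; default to 0 otherwise.
    fields["moved"] = int(fields["moved"]) if fields["moved"] else 0
    return fields


# Made-up sample line; the body text is a placeholder.
print(parse_line("T01n0001_p0001a02(03)║如是我聞"))
```

Returning `None` for non-matching lines lets a caller skip file headers and blank lines without special handling.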

Both the UTF-8 and Big Five documents use the same approach to "rare" characters, i.e., characters that are not in their character set. If the character is a standard variant, then a "normalized" form is used without comment. If it is not a standard variant, then a formula in brackets is used, like "[(序-予+林)/女]" or "[牛*宅]". UTF-8 has a larger character set than Big Five, so there are fewer normalized forms in the UTF-8 documents. Only the XML documents indicate which characters have been normalized.
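Because the bracketed formulas are ordinary text, they can be located with a pattern match before searching or indexing a document. The Python sketch below treats any bracketed run that contains at least one of the operators `*`, `/`, `+`, or `-` as a formula. That operator set is an assumption drawn only from the two examples quoted above, and CBETA may use others. The names `FORMULA` and `find_formulas` are invented for the example.

```python
import re

# A bracketed run with no nested brackets and at least one operator.
# The operator set (* / + -) is ASSUMED from the two quoted examples.
FORMULA = re.compile(r"\[[^\[\]]*[*/+\-][^\[\]]*\]")


def find_formulas(text):
    """Return every bracketed rare-character formula found in `text`."""
    return FORMULA.findall(text)


print(find_formulas("如[牛*宅]與[(序-予+林)/女]等"))
```

Requiring an operator inside the brackets keeps ordinary bracketed glosses, such as a single character in brackets, from being reported as formulas.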

The downloads are organized by canon and volume [冊]. Each volume has an individual page with its own table of contents. You can reach them from the main index page: [Dharma Drum Mountain]

There are seven sections. Click on the tabs to see a list of volumes in that section. Use the pop-up menus to select a format to download. Click on the title of each volume to go to its table of contents page, which lists the Taishō number and title for each text in the volume, along with the date of the most recent CBETA release, the number of scrolls in the text, and the dynasty and/or author/editor of the text. Click on the title of each text for an index of HTML pages for each scroll in the text (App format, Big Five encoding).

Note: Christian Wittern has created a group of Firefox search plug-ins for the SAT and CBETA databases, available here.