How Is Sonnet Stacking Up?
The Sonnet stack is quickly taking shape. We are going to have some cool capabilities in KDE4 that make writing much easier. The stack, as it is planned, is shown below. The percentages in brackets are estimates of the work completed so far.
EDIT: Following a discussion on kde-core-devel, all Sonnet classes will be in the Sonnet namespace. So you can prepend Sonnet:: to the class names below.
Foundations
- QString & other Qt classes - Provide 16-bit strings that store Unicode characters.
- UnicodeData [90%] Provides a means to query information from the UCD files provided by the Unicode Consortium. A tool named parseucd is provided to convert UCD files into a data format optimized for fast lookups and low memory usage. This also allows users to regenerate any relevant data files in order to modify behavior in the rest of the stack.
Parsing (NLP)
- GuessLanguage [50%] This class provides a statistical guess as to which language a given sample might be written in. It is based on a simple N-gram model and currently uses trigrams as well as other heuristics to determine a language. The class will be tuned to provide fast prediction for paragraph-length text. [Currently based on Languid]
- TextBreaks [90%] Provides a list of relevant breaks in a given string. The default implementation will use the suggestions provided by the Unicode Consortium. This should provide adequate partitioning for word and sentence boundaries in most of the world's languages (where such concepts have meaning in orthography).
- AbstractFilter/DefaultFilter [85%] Provides a customizable filter for determining words and sentences. This class determines text break locations and then decides whether each resulting segment is relevant to the target of the query. This can be customized through interpretable rules set by the user.
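To make the trigram idea behind GuessLanguage concrete, here is a minimal sketch of rank-based trigram matching. It is not the Sonnet or Languid code: the function names, the two tiny training strings, and the fixed penalty of 300 are all assumptions made for illustration, and real language profiles would be built from far larger samples.

```python
from collections import Counter

MAX_PENALTY = 300  # assumed cost for a trigram the language profile has never seen


def trigram_profile(text, top=MAX_PENALTY):
    """Return the character trigrams of text, most frequent first."""
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    # Break frequency ties alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(sample_profile, language_profile):
    """Sum the rank differences between a sample and a language profile."""
    ranks = {gram: rank for rank, gram in enumerate(language_profile)}
    return sum(
        abs(rank - ranks[gram]) if gram in ranks else MAX_PENALTY
        for rank, gram in enumerate(sample_profile)
    )


def guess_language(sample, profiles):
    """Return the key of the profile closest to the sample."""
    sample_profile = trigram_profile(sample)
    return min(profiles, key=lambda lang: out_of_place(sample_profile, profiles[lang]))


# Toy training data (hypothetical); far too small for real use.
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog and then the dog "
          "chases the fox through the woods",
    "de": "der schnelle braune fuchs springt über den faulen hund und dann "
          "jagt der hund den fuchs durch den wald",
}
PROFILES = {lang: trigram_profile(text) for lang, text in TRAINING.items()}

print(guess_language("the dog and the fox", PROFILES))
print(guess_language("der hund und der fuchs", PROFILES))
```

Short samples share few trigrams with any profile, which is one reason a guess of this kind works best on paragraph-length text.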
Correctness Testing (Spell/Grammar/Style Checking)
- Currently KSpell2 uses a plugin framework for accessing spellchecker engines. AbiWord uses a framework they developed called Enchant, which performs almost exactly the same task as KSpell and has a very similar interface for plugins. This is no coincidence, since most spellcheckers implement an API designed for compatibility with ispell. In fact, KSpell has an Enchant plugin.
- Sonnet will utilize Enchant as the interface to spellchecking and no longer support old plugins. This allows us to use the same spelling engines and rules as the growing number of applications supporting Enchant. This also makes Sonnet more maintainable and less bug-prone, with more plugins available for more languages.
- Grammar checking and style checking are highly requested features and will be available via Elixir. Rather than write a KDE-specific framework for interfacing with grammar checkers, we are working with the developers of Enchant to provide a general library similar to Enchant but tailored to the needs of these types of tools.
- Enchant [98%]
- Elixir [5%]
- Spell [99%] An interface to Enchant.
- Grammar [50%] An interface to Elixir.
Background Checking
- The parsing and analysis of language is time-intensive. Sonnet will replace the old KSpell2 background checking (based on QThread subclassing) with a ThreadWeaver-based implementation that will support both KSpell and KGrammar. [10%]
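The general shape of queue-based background checking can be sketched as follows. This is only an analogy using Python's standard thread pool, not the ThreadWeaver API or the planned Sonnet code: the stand-in word list, check_paragraph, and submit_checks are all hypothetical names invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in dictionary (assumption); a real checker would ask a spelling engine.
KNOWN_WORDS = {"the", "stack", "is", "taking", "shape", "quickly"}


def check_paragraph(paragraph):
    """Return the words of one paragraph that the stand-in dictionary lacks."""
    words = (word.strip(".,;:!?\"'").lower() for word in paragraph.split())
    return [word for word in words if word and word not in KNOWN_WORDS]


def submit_checks(pool, paragraphs):
    """Queue one job per paragraph and return the futures in document order."""
    return [pool.submit(check_paragraph, paragraph) for paragraph in paragraphs]


if __name__ == "__main__":
    paragraphs = ["The stack is taking shape.", "Quikly taking shape!"]
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = submit_checks(pool, paragraphs)
        # The caller is free to do other work here while the jobs run.
        for paragraph, future in zip(paragraphs, futures):
            print(paragraph, "->", future.result())
```

Queuing small jobs rather than subclassing a thread per task means spelling and grammar checks can share one pool of workers.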
GUI
No work has started on the GUI layer. Usability review requests have been made and I'm awaiting feedback. Until then, as can be seen, there is plenty of lower-level work to keep busy with.
- Configuration - Implement features to embed configuration of Enchant and Elixir in applications.
- Standard Checking Dialogs & Widgets - This includes the dialog that appears when checking text and allows you to iterate through errors.
- Highlighting - Automatic highlighting of misspelled words, etc.
Beyond the usability of the GUI, some consideration is now being given to the proper behavior of the actions associated with checking a document. For example, should "ignore" permanently ignore that word in the application? Systemwide for all applications? Or just for the session in which the application is used?
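One way to picture the three candidate behaviors for "ignore" is as explicit scopes on an ignore list. The sketch below is purely illustrative: IgnoreScope and IgnoreList are hypothetical names, not Sonnet classes, and persistence for the application and system scopes is left out.

```python
from enum import Enum


class IgnoreScope(Enum):
    SESSION = "session"          # forgotten when the application exits
    APPLICATION = "application"  # remembered for this application only
    SYSTEM = "system"            # shared by all applications


class IgnoreList:
    """Tracks ignored words per scope (storage on disk is not modeled)."""

    def __init__(self):
        self._words = {scope: set() for scope in IgnoreScope}

    def ignore(self, word, scope):
        self._words[scope].add(word.lower())

    def is_ignored(self, word):
        return any(word.lower() in words for words in self._words.values())

    def end_session(self):
        # Only session-scoped entries are dropped when the session ends.
        self._words[IgnoreScope.SESSION].clear()


if __name__ == "__main__":
    ignored = IgnoreList()
    ignored.ignore("KDE4", IgnoreScope.SESSION)
    ignored.ignore("ThreadWeaver", IgnoreScope.SYSTEM)
    ignored.end_session()
    print(ignored.is_ignored("KDE4"), ignored.is_ignored("ThreadWeaver"))
```

Whichever default is chosen, keeping the scope explicit would let a dialog offer the other two as options.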
Auxiliary Code
Sonnet will have a number of helpful classes and code snippets that can be incorporated into applications, including, but not limited to:
- Automatic detection of language and setting the spellchecker to use the correct dictionary.
- Advanced statistics - word/sentence/other counts, readability scores (Kincaid, ARI, Fog, etc.)
- Advanced layout hints - Example: should text containing 70% Hebrew be right-aligned?
- Tools to define and configure autocorrection.
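As an example of the statistics mentioned above, the Automated Readability Index (ARI) needs only character, word, and sentence counts. The sketch below uses the standard ARI formula, but the counting is deliberately naive: the regular expressions are an assumption for illustration and stand in for the Unicode-aware text breaking described earlier.

```python
import re


def text_counts(text):
    """Return (characters, words, sentences) using naive regex splitting."""
    words = re.findall(r"\w+", text)
    sentences = [part for part in re.split(r"[.!?]+", text) if part.strip()]
    characters = sum(len(word) for word in words)
    return characters, len(words), len(sentences)


def automated_readability_index(text):
    """ARI = 4.71 * chars/words + 0.5 * words/sentences - 21.43."""
    characters, words, sentences = text_counts(text)
    if words == 0 or sentences == 0:
        return None  # the score is undefined for empty text
    return 4.71 * characters / words + 0.5 * words / sentences - 21.43


if __name__ == "__main__":
    sample = "The cat sat. The dog ran!"
    print(text_counts(sample))
    print(round(automated_readability_index(sample), 2))
```

Very short, simple text can score below zero; the index is meant to approximate a school grade level on longer passages.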

