Freedesktop.org Spec for Language Checking
I've added a new spec to the Freedesktop.org wiki for Desktop Language Checking. It will be used to coordinate efforts between Gnome, KDE and others on spelling/grammar/style/diction checking.
Also check out this Gnome bug report: Bug 383706 – Adding support for spellcheckers into the Gtk+ stack


Comments:
Hi Jacob
I read about Sonnet with interest - this is the first time I've heard of it. How difficult would it be to link up backends to it? Kevin P Scannell has developed a grammar checker for Irish which I am porting to Welsh, and which could be adapted for a number of minority languages (http://borel.slu.edu/gramadoir). He also has a web-crawling project going to collect words from minority languages (something like 180 currently), which could then be used to build spell-checkers (http://borel.slu.edu/crubadan/index.html).
My Welsh version of the spellchecker provides meanings for each of the alternatives (based on a GPL dictionary I'm (slowly!) putting together - http://www.eurfa.org.uk), and I've done a draft web-interface at http://www.klebran.org.uk (alpha only - not finished yet). How difficult would it be to try adapting these to the Sonnet interface? I know nothing of C++ programming, so I'm at a complete loss as to where to begin!
Kevin Donnelly
Glad to see you're rolling this into a spec. I'm so tired of OOo and Mozilla and everything else using different spell checkers and different personal word lists (PWLs). I am also very interested in spell checking and related topics.
At Translate.org.za we've developed spell checkers of varying quality :) for the 11 official languages of South Africa.
I'd like to know what we need to supply to get language guessing working in Sonnet for those languages. Could you write a blog entry on what you need?
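This is not a description of how Sonnet's own guesser works. For context, a common way to guess a language is to compare character n-gram profiles, the idea behind TextCat-style tools. Under that approach, the main thing a language team supplies is a representative sample of text for each language. The C++17 sketch below is a simplified variant that scores by overlapping trigram counts rather than TextCat's rank comparison. The names `trigramProfile` and `guessLanguage` are hypothetical. It also works on bytes, so a real implementation would need to split UTF-8 text into code points first.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Count overlapping character trigrams. The whole text gets one space of
// padding on each side so the first and last words also yield boundary
// trigrams. Works on bytes, not Unicode code points.
std::map<std::string, int> trigramProfile(const std::string& text) {
    std::map<std::string, int> counts;
    const std::string padded = " " + text + " ";
    for (std::size_t i = 0; i + 3 <= padded.size(); ++i) {
        ++counts[padded.substr(i, 3)];
    }
    return counts;
}

// Pick the language whose profile shares the most trigram occurrences with
// the text. Returns an empty string if no profile matches at all.
std::string guessLanguage(
    const std::string& text,
    const std::map<std::string, std::map<std::string, int>>& profiles) {
    const auto sample = trigramProfile(text);
    std::string best;
    int bestScore = 0;
    for (const auto& [lang, profile] : profiles) {
        int score = 0;
        for (const auto& [trigram, n] : sample) {
            if (profile.count(trigram) > 0) score += n;
        }
        if (score > bestScore) {
            bestScore = score;
            best = lang;
        }
    }
    return best;
}
```

In practice each profile would be built once from a large corpus, for example the kind of word collections An Crúbadán gathers, rather than from a single sentence.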
One thing I would love is for users to be able to submit their personal dictionaries for possible inclusion in the official dictionary. Every time someone adds a word to a personal dictionary, there is a chance that it belongs in the official one. We could create a powerful network of dictionary improvers.
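The aggregation step behind such a network could be as simple as counting how many different users added each word. The C++17 sketch below is entirely hypothetical. It assumes each submission is one user's word list, and the function name `promotionCandidates` and the `minUsers` threshold are invented for illustration. Its output is a list of words for a human maintainer to review, not something to merge into the official dictionary automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// One user's personal word list (assumed submission format).
using WordList = std::set<std::string>;

// Return words that at least `minUsers` distinct users added and that are not
// already in the official dictionary. Because each WordList is a set, a word
// is counted at most once per user.
std::vector<std::string> promotionCandidates(const std::vector<WordList>& submissions,
                                             const std::set<std::string>& official,
                                             std::size_t minUsers) {
    std::map<std::string, std::size_t> userCounts;
    for (const auto& list : submissions) {
        for (const auto& word : list) {
            if (official.count(word) == 0) ++userCounts[word];
        }
    }
    std::vector<std::string> candidates;
    for (const auto& [word, users] : userCounts) {
        if (users >= minUsers) candidates.push_back(word);
    }
    return candidates;  // alphabetical, since std::map iterates in key order
}
```

Requiring several independent users guards against one person's typos or private jargon being proposed for the shared dictionary.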
Kevin: Thanks! An Gramadóir looks very interesting.
I'm currently creating an interface for grammar checkers called Elixir. It will allow someone to write a plugin for An Gramadóir. Since the grammar checker appears to be a Perl module, the plugin would have to either call a Perl executable or link to an embedded Perl library.
To begin, email me how your checkers are accessed, in whatever (programming) language you use.
An example:
bool checkWord(string myWord)
so using this would work like:
if ( checkWord("word") )
    print("correct spelling")
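The post does not show Elixir's actual interface. The following is only a hypothetical C++17 rendering of the `checkWord` example above, to illustrate what a backend plugin boils down to. `SpellBackend`, `WordListBackend` and `makeWordListBackend` are invented names, and the toy backend simply looks words up in an in-memory list.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical plugin interface: every checker backend answers checkWord().
class SpellBackend {
public:
    virtual ~SpellBackend() = default;
    virtual bool checkWord(const std::string& word) const = 0;
};

// Toy backend that checks words against an in-memory word list.
class WordListBackend : public SpellBackend {
public:
    explicit WordListBackend(std::set<std::string> words) : words_(std::move(words)) {}
    bool checkWord(const std::string& word) const override {
        return words_.count(word) > 0;
    }

private:
    std::set<std::string> words_;
};

// Callers hold backends through the interface, so backends can be swapped.
std::unique_ptr<SpellBackend> makeWordListBackend(std::set<std::string> words) {
    return std::make_unique<WordListBackend>(std::move(words));
}
```

Usage mirrors the pseudocode: after `auto backend = makeWordListBackend({"word"});`, the test `if (backend->checkWord("word"))` succeeds. A Perl-based checker such as An Gramadóir would be another subclass that forwards the word to the external program.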
Hi Jacob
Yes, Gramadóir is a Perl module, so it is completely cross-platform, and it should be easy for end users to install once someone has done the initial development work of writing rules for a specific language. Kevin Scannell does a lot of work in natural-language processing, so I believe the background is fairly robust.
I'll try emailing the only address I can find for you (?), in case you want to follow up the technical aspects in more detail than below.
Insofar as I understand your access query, Gramadóir does its own tokenising (I think based on rules derived from the words found by the Crubadán crawler, but Kevin Scannell could comment on this in more detail). So you basically feed it a text file:
gram-cy.pl tests1.txt
or you can echo text into it:
echo "y pobl" | gram-cy.pl
The latter is the one I use in Klebran.
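From the C++ side, the "call a Perl executable" route could reproduce this echo-into-the-checker pipeline with POSIX `popen()`. The sketch below assumes `gram-cy.pl` is installed and on the `PATH`, and the helper names `shellQuote` and `runChecker` are hypothetical. It uses `printf` instead of `echo` because shells differ in how `echo` treats backslashes and leading dashes. It is Unix-only as written, and it ignores the checker's exit status.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Wrap text in single quotes for a POSIX shell, escaping embedded quotes.
std::string shellQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";  // close quote, escaped quote, reopen
        else out += c;
    }
    return out + "'";
}

// Pipe `text` into `checker` (a shell command line) and return its stdout.
std::string runChecker(const std::string& checker, const std::string& text) {
    const std::string cmd = "printf '%s\\n' " + shellQuote(text) + " | " + checker;
    FILE* pipe = popen(cmd.c_str(), "r");
    if (pipe == nullptr) throw std::runtime_error("popen failed");
    std::string output;
    std::array<char, 256> buf{};
    std::size_t n;
    while ((n = std::fread(buf.data(), 1, buf.size(), pipe)) > 0) {
        output.append(buf.data(), n);
    }
    pclose(pipe);
    return output;
}
```

With the checker installed, `runChecker("gram-cy.pl --api", "y pobl")` would return the `--api` output described below. Other flags, such as `--interface=xx`, can be appended to the command string in the same way.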
Output is available in a variety of formats. The default will print to the terminal, so using the echo line above:
1: y pobl
Lenition missing
(that is, "pobl" should read "bobl" in this context)
The output messages can be given in any language that Gramadóir has a po-file for, by adding the flag --interface=xx (where xx is the two-letter language code).
The --api flag will give output in a pseudo-XML format:
{E offset="0" fromy="1" fromx="0" toy="1" tox="5" sentence="y pobl " errortext="y pobl" msg="Lenition missing"}
(Because of the posting issues with HTML, the curly brackets in this post stand for angle brackets.)
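A plugin consuming the `--api` output mostly needs the `name="value"` pairs from each error element. The C++17 sketch below pulls them out with `std::regex`. `parseErrorAttributes` is a hypothetical name, and it assumes the real output uses angle brackets with attributes exactly as in the example above. It does no XML entity decoding, so a value containing `&quot;` would be returned as-is. Judging only from this one example, `fromy`/`toy` look like line numbers and `fromx`/`tox` like column positions, with `tox` inclusive ("y pobl" spans columns 0 to 5). That reading is a guess, not documented behaviour.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Collect name="value" pairs from one error element of the --api output.
std::map<std::string, std::string> parseErrorAttributes(const std::string& line) {
    static const std::regex attr(R"re((\w+)="([^"]*)")re");
    std::map<std::string, std::string> attrs;
    for (std::sregex_iterator it(line.begin(), line.end(), attr), end; it != end; ++it) {
        attrs[(*it)[1].str()] = (*it)[2].str();
    }
    return attrs;
}
```

Given the example element, `parseErrorAttributes(...)["msg"]` yields `Lenition missing` and `["errortext"]` yields `y pobl`.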
The --xml flag will give a proper XML format:
{?xml version="1.0" encoding="utf-8" standalone="no"?}
{!DOCTYPE teacs SYSTEM "http://borel.slu.edu/dtds/gram-cy.dtd"}
{teacs}
{line} {E msg="SEIMHIU"}{T}y{/T} {N pl="n" gnd="f" m="0"}pobl{/N}{/E} {/line}
{/teacs}
This is the format used in Klebran, since it exposes a great deal of the part-of-speech information from the back-end tagged wordlist.
The --html flag will give HTML output:
{br}{br}1: {b class=gramadoir}y pobl{/b} {br}
Lenition missing
Kevin Scannell has recently suggested creating XSLT stylesheets to convert the XML output into other formats, which would supersede the HTML output option.
The above refers to the grammar-checking, which of course, in contrast to spellcheckers, rarely works on the basis of single words/tokens. However, as part of this, Gramadóir flags words that it cannot find in its lexicon, and makes some guesses about those too. For instance, it may suggest that a word seems to be foreign, because it contains a sequence of letters that occur only rarely in the language in question. Or it may suggest that an unknown hyphenated word is a compound of two others that it does know. And anything that's left (simplifying!) will be flagged as unknown, and therefore likely a spelling mistake (the "likely" will only apply, of course, if the lexicon is big enough!).
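The three guesses described here (foreign word, compound, plain unknown) can be illustrated with a deliberately simplified C++17 sketch. This is not Gramadóir's actual algorithm. The letter-pair statistics, the `rareThreshold` parameter and the names `bigramCounts`, `classify` and `Verdict` are assumptions for illustration, and the statistics here come from the lexicon itself rather than from a real corpus. Like the other sketches, it works on bytes rather than Unicode letters.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

enum class Verdict { Known, Compound, Foreign, Unknown };

// Count letter pairs across a word list (a stand-in for corpus statistics).
std::map<std::string, int> bigramCounts(const std::set<std::string>& lexicon) {
    std::map<std::string, int> counts;
    for (const auto& word : lexicon) {
        for (std::size_t i = 0; i + 2 <= word.size(); ++i) ++counts[word.substr(i, 2)];
    }
    return counts;
}

Verdict classify(const std::string& word, const std::set<std::string>& lexicon,
                 const std::map<std::string, int>& bigrams, int rareThreshold) {
    if (lexicon.count(word) > 0) return Verdict::Known;

    // Unknown hyphenated word whose two halves are both known: likely a compound.
    const std::size_t dash = word.find('-');
    if (dash != std::string::npos && lexicon.count(word.substr(0, dash)) > 0 &&
        lexicon.count(word.substr(dash + 1)) > 0) {
        return Verdict::Compound;
    }

    // A letter pair seen fewer than rareThreshold times hints at a foreign word.
    for (std::size_t i = 0; i + 2 <= word.size(); ++i) {
        const std::string pair = word.substr(i, 2);
        if (pair.find('-') != std::string::npos) continue;  // ignore the hyphen
        const auto it = bigrams.find(pair);
        const int seen = (it == bigrams.end()) ? 0 : it->second;
        if (seen < rareThreshold) return Verdict::Foreign;
    }

    // Plausible-looking but not in the lexicon: probably a misspelling.
    return Verdict::Unknown;
}
```

As the comment above notes, the final "Unknown" verdict is only a reliable misspelling signal when the lexicon is large.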
In Klebran, I do a search on these words against the full dictionary (Eurfa), since that way I can provide meanings, but you can instead pass Gramadóir the --aspell flag, which will suggest corrections for misspellings from an aspell file for the language.
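To give a rough idea of what "suggest corrections" involves, here is a C++17 sketch that ranks dictionary words by plain Levenshtein edit distance. This is explicitly not aspell's method, which also compares phonetic "soundslike" codes. The names `editDistance` and `suggest` and the `maxDistance` cutoff are hypothetical. The same dictionary could be a map from words to meanings, which would allow Klebran-style glosses alongside the suggestions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance (insertions, deletions, substitutions), O(n*m) time,
// keeping only two rows of the dynamic-programming table.
std::size_t editDistance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Suggest dictionary words within `maxDistance` edits, closest first; ties
// are broken alphabetically because pairs sort by distance, then by word.
std::vector<std::string> suggest(const std::string& word,
                                 const std::set<std::string>& dictionary,
                                 std::size_t maxDistance) {
    std::vector<std::pair<std::size_t, std::string>> scored;
    for (const auto& candidate : dictionary) {
        const std::size_t d = editDistance(word, candidate);
        if (d <= maxDistance) scored.emplace_back(d, candidate);
    }
    std::sort(scored.begin(), scored.end());
    std::vector<std::string> result;
    for (const auto& p : scored) result.push_back(p.second);
    return result;
}
```

A linear scan like this is fine for a short example but would be slow against a full dictionary. Real spell checkers narrow the candidate set first.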
I hope this gives a bit more background, and that I have not misrepresented anything - I will alert Kevin Scannell and the Gramadóir list to this page, so that they can add comments if necessary. It might be more efficient to take things forward by email, however.
It would be very interesting to try to take this forward. Ideally, if the Elixir interface is sufficiently comprehensive, it would allow a lot of the work already done in Gramadóir to be bootstrapped into desktop end-user applications.
Kevin Donnelly