Friday, February 9, 2007

Sonnet Updates

I've been fairly busy the past two weeks and haven't put as much work into Sonnet as I would have liked. However, there are several recent developments to mention.

Language Detection
We now have preliminary support for distinguishing between pt_PT and pt_BR as well as en_US and en_GB. Portuguese seems to be a special case that most NLP programs explicitly acknowledge, which I now understand. I'm not sure what should be done to distinguish en_ZA or en_AU as well. I have a few ideas and will let everyone know my thoughts after more testing is done. I really didn't want to start messing around with dialects, but the response to that position has been massively against me, so into the fray I go.

Elixir
The engine and documentation for Elixir are now ready for public scrutiny and comment. It has been interesting for me to write, since I've decided to use only C++ and the standard libraries so that there wouldn't be any dependencies. Qt has really spoiled me :) The work has so far been done in my personal Subversion repository. It will be made public as soon as Bug #9775 has been fixed and I have a working freedesktop.org CVS account.

Documentation
Aseigo mentioned the need for documentation today:

"sonnet might be cool, for instance, but unless there's a tutorial that lets people start using it in their application code quickly it'll almost certainly end up under-utilized and/or take many more revision releases of kde4 to find its potential realized."
So true. The public interfaces for Sonnet have only just settled down, and some are still on my computer, yet to be committed. So, this weekend I'll commit most of the changes littering my working directory and start outlining some tutorials to be put in the wiki. I've been fairly good at providing apidox so far, but those need some improvements too. Of course, KDE4PORTING.html needs to be updated as well.

Merging
I've been hesitant to merge into trunk while the interfaces were rapidly changing, but that isn't much of a concern now. I have compiled a list of programs and libraries in kdepimlibs, kdebase and several other specific cases, and I'll be able to modify them when merging to ensure they build. For those projects that I won't personally migrate, the tutorial should enable their developers to migrate seamlessly.

13 comments:

Thomas said...

Hi Jacob

I'm looking forward to this new spell checking framework with automatic language detection, which should be most useful. I just have some questions/suggestions about the detection:

1. I've seen that you can now distinguish between pt_PT and pt_BR. I'm a German-speaking Swiss. For German (and maybe other languages) there exist (e.g. in OpenOffice) several slightly varying dictionaries (de_DE, de_CH, de_AT and then some), which may be difficult to recognise automatically because there are not enough differences. Would it be possible to configure a default variant for German other than de_DE? I would of course use _CH as the default, although my text would usually be recognised as de_DE (or so I presume, as de_DE would be needed by ~80% of all German-speaking users and would therefore be the "natural" default). On the rare occasions when I really need de_DE, I could select it manually.

2. For some writing I use my local dialect of German, for which no dictionary exists (there are no official rules for writing it, as it's mainly a spoken dialect). Is it possible to recognise this as a special language (based on some user-provided text samples) and to default it to "no spell checking", so that I don't have to unselect checking all the time?

3. Language detection on a per-paragraph basis should be usable most of the time, though there are cases where I mix languages, e.g. a special English term within a German sentence. Would it be possible to select a second "auxiliary" language against which spelling is checked whenever the primary detected language reports an error? Maybe on a per-document basis?

If you have any questions or need a better explanation of these items, please contact me at /thomas/dot/gantner/at/gmx/dot/net/.

-Thomas
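
The auxiliary-language idea in item 3 above can be sketched roughly as follows. This is only an illustration in Python, not Sonnet's API: the toy word sets, the function name `misspelled_words`, and the `primary`/`auxiliary` parameters are all hypothetical, and a real implementation would query actual dictionary backends instead of in-memory sets.

```python
import re

# Toy dictionaries standing in for real spell-checking backends (assumption).
DICTIONARIES = {
    "de_CH": {"ich", "habe", "das", "heute", "gelesen"},
    "en_GB": {"changelog", "the", "today"},
}


def misspelled_words(text, primary, auxiliary=None, dictionaries=DICTIONARIES):
    """Return the words rejected by the primary dictionary and, when an
    auxiliary language is given, rejected by that dictionary as well."""
    primary_words = dictionaries[primary]
    auxiliary_words = dictionaries[auxiliary] if auxiliary else set()
    errors = []
    # Split the text into runs of letters (Unicode-aware, no digits or "_").
    for word in re.findall(r"[^\W\d_]+", text):
        lowered = word.lower()
        if lowered in primary_words:
            continue
        # Consult the auxiliary language only when the primary reports an error.
        if lowered in auxiliary_words:
            continue
        errors.append(word)
    return errors
```

With German as the primary language and no auxiliary, an English term inside a German sentence is flagged; adding English as the auxiliary language accepts it while still flagging words unknown to both.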

rmgraham said...

And what English dialect list is complete without en_CA, eh?

I would really like to figure out if there are actual differences between all the non-en_US dialects of English. Would it be appropriate/possible to detect en_US vs. non-en_US and determine the dialect based on locale?
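
A minimal sketch of the locale-based suggestion above, assuming a detector that returns either a full code such as "en_US" or a bare language such as "en" when it cannot tell the dialect. The function `resolve_dialect` and its `defaults` parameter are hypothetical names for illustration, not part of Sonnet.

```python
def resolve_dialect(detected, user_locale, defaults=None):
    """Choose a dictionary code for the detector's output.

    detected:    "en_US" when the dialect was recognised, or a bare
                 language such as "en" when it was not (assumed format).
    user_locale: the user's locale, e.g. "en_CA".
    defaults:    optional per-language fallback, e.g. {"de": "de_CH"}.
    """
    defaults = defaults or {}
    # The detector already settled on a dialect: keep it.
    if "_" in detected:
        return detected
    locale_language = user_locale.split("_", 1)[0]
    # The locale is in the detected language: trust its regional variant.
    if locale_language == detected and "_" in user_locale:
        return user_locale
    # Otherwise use a configured default, or the bare language code.
    return defaults.get(detected, detected)
```

For a Canadian user, `resolve_dialect("en", "en_CA")` yields `"en_CA"`, while `resolve_dialect("de", "en_CA", {"de": "de_CH"})` falls back to the configured `"de_CH"`. The sketch does not handle the case where the detector says "not en_US" but the locale is en_US; that would need a separate rule.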

