Java Stanford NLP: Spell checking


I am trying to check the spelling accuracy of text samples using Stanford NLP. It is just a metric of the text, not a filter or anything, so it is fine if it is slightly off, as long as the error is consistent.

My first thought was to check whether each word is known by the parser's lexicon:

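    import java.util.List;
    import edu.stanford.nlp.ling.HasWord;
    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;

    private static LexicalizedParser lp = new LexicalizedParser("englishPCFG.ser.gz");

    @Analysis(weight = 25, name = "spelling")
    public double spelling() {
        int result = 0;
        // `sentences` is a field holding the tokenized text (defined elsewhere).
        for (List<? extends HasWord> list : sentences) {
            for (HasWord w : list) {
                // Count every token the parser's lexicon has not seen.
                if (!lp.getLexicon().isKnown(w.word())) {
                    System.out.format("misspelled: %s\n", w.word());
                    result++;
                }
            }
        }
        // Cast to double so the division is not truncated to an integer.
        return (double) result / sentences.size();
    }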

However, this produces quite a lot of false positives:

    misspelled: sincerity
    misspelled: Sisyphus
    misspelled: Sisyphus
    misspelled: loyalty
    misspelled: Gods
    misspelled: Molecular
    misspelled: Layer
    misspelled: Sisyphus
    misspelled: Camus
    misspelled: foandf
    misspelled: foandf
    misspelled: babby
    misspelled: formd
    misspelled: gurl
    misspelled: pregnent
    misspelled: babby
    misspelled: formd
    misspelled: gurl
    misspelled: pregnent
    misspelled: sincerity
    misspelled: sisyphus
    misspelled: Sisyphus
    misspelled: falsity
    misspelled: gods
    misspelled: atom
    misspelled: back
    misspelled: Sisyphus

Any ideas on how to do this better?

Using the isKnown(String) method of the parser's lexicon as a spell checker is not a viable use of the parser. The method is right: "false" means that the word was not seen (with that capitalization) in the roughly 1 million words of text the parser was trained on. But 1 million words is simply not enough text to train a broad spell checker in a data-driven way. People typically use at least two orders of magnitude more text, and may add some tricks to handle capitalization. The parser includes some cleverness of this kind for handling words that were unseen in the training data, but this is not reflected in what the isKnown(String) method returns.
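To illustrate the point about a larger vocabulary and capitalization handling, here is a minimal sketch that assumes a plain word list is used in place of the parser's lexicon. The names `SpellingMetric`, `isKnown`, and `misspelledPerSentence` are hypothetical and not part of Stanford NLP, and the toy word list stands in for a real dictionary, which would be far larger.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SpellingMetric {

    /** True if the token is in the word list, either as written or lowercased. */
    public static boolean isKnown(String token, Set<String> wordList) {
        return wordList.contains(token)
                || wordList.contains(token.toLowerCase(Locale.ENGLISH));
    }

    /** Average number of unknown alphabetic tokens per sentence. */
    public static double misspelledPerSentence(List<List<String>> sentences,
                                               Set<String> wordList) {
        if (sentences.isEmpty()) {
            return 0.0;
        }
        int unknown = 0;
        for (List<String> sentence : sentences) {
            for (String token : sentence) {
                // Skip punctuation and numbers; only alphabetic tokens are scored.
                boolean alphabetic = !token.isEmpty()
                        && token.chars().allMatch(Character::isLetter);
                if (alphabetic && !isKnown(token, wordList)) {
                    unknown++;
                }
            }
        }
        // Cast before dividing so the result is not truncated to an integer.
        return (double) unknown / sentences.size();
    }

    public static void main(String[] args) {
        // Toy word list (an assumption for illustration only).
        Set<String> wordList = new HashSet<>(Arrays.asList(
                "how", "is", "a", "baby", "formed", "girl", "pregnant",
                "sincerity", "Camus"));
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("Sincerity", "is", "a", "gurl", "."),
                Arrays.asList("how", "is", "babby", "formd", "?"));
        System.out.println(misspelledPerSentence(sentences, wordList)); // prints 1.5
    }
}
```

With the lowercase fallback, a sentence-initial "Sincerity" is no longer flagged, while "gurl", "babby", and "formd" still are. Proper nouns such as "Sisyphus" would still need to be in the word list.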

