Parsing English with 500 lines of Python

Computational Linguistics

A syntactic parser describes a sentence’s grammatical structure, to help another application reason about it. Natural languages introduce many unexpected ambiguities, which our world-knowledge immediately filters out. A favourite example:

They ate the pizza with anchovies

A correct parse links “with” to “pizza”, while an incorrect parse links “with” to “ate”.
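To make the two readings concrete, here is a minimal sketch that stores each parse as one head index per word. This encoding and the helper names (`arcs`, `differing_words`) are assumptions made for illustration, not necessarily how parser.py represents a parse.

```python
# Toy illustration (hypothetical encoding): a parse as one head index per word.
words = ["They", "ate", "the", "pizza", "with", "anchovies"]

# heads[i] is the index of the word that word i attaches to; None marks the root.
correct = [1, None, 3, 1, 3, 4]    # "with" attaches to "pizza"
incorrect = [1, None, 3, 1, 1, 4]  # "with" attaches to "ate"


def arcs(words, heads):
    """Return (head_word, dependent_word) pairs, using 'ROOT' for the root."""
    return [("ROOT" if h is None else words[h], w) for w, h in zip(words, heads)]


def differing_words(words, heads_a, heads_b):
    """Return the words whose head differs between two parses."""
    return [w for w, a, b in zip(words, heads_a, heads_b) if a != b]


print(differing_words(words, correct, incorrect))  # ['with']
```

The two parses agree on every word except “with”, so the whole ambiguity comes down to a single attachment decision.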

The Natural Language Processing (NLP) community has made big progress in syntactic parsing over the last few years. It’s now possible for a tiny Python implementation to perform better than the widely-used Stanford parser:

Parser     Accuracy  Speed (w/s)  Language  LOC
Stanford   89.6%     19           Java      > 50,000[1]
parser.py  89.8%     2,020        Python    ~500
Redshift   93.6%     2,580        Cython    ~4,000

The rest of the post sets up the problem, and then takes you through a concise implementation, prepared for this post. The first 200 lines of parser.py, the part-of-speech tagger and learner, are described here. You should probably at least…
