comment by hogwild

hogwild · 3421 days ago · link · · parent · post: Scientists of hubski, what science do you science?

What do you use for language modeling? Are you learning a formal grammar, building a lexicon, or doing something weird with LSTMs?

LeadGuit · 3421 days ago · link ·

Depends on the job ;-)

For all non-computational linguists I'll go ELI5 on this:

For machine translation I work mostly with the Moses toolkit[1], with IRSTLM as the language model. This toolkit is made for statistical machine translation, so you need a lot of data and a parallel corpus (the same texts in both languages, e.g. the stuff you find in those little "phrases for travelling" booklets). If you're running a Unix system it's pretty easy to get a baseline system started, and you can use the Europarl corpus for some nice experiments (what about your own Portuguese-German translator? ;-)
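To make "language model" a bit more concrete, here is a toy sketch of a bigram model with add-one smoothing in plain Python. It is not how IRSTLM or Moses work internally, and the function names and the three-sentence corpus are made up for illustration. It only shows the basic idea: count word pairs in training text, then score new sentences so that fluent word orders come out more probable than scrambled ones.

```python
from collections import Counter
import math

def train_bigram_lm(sentences):
    """Count unigrams and bigrams, adding sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed log probability of a sentence under the bigram counts."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Smoothing keeps unseen word pairs from getting probability zero.
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total

# Hypothetical toy corpus; a real system trains on millions of sentences.
corpus = ["the house is small", "the house is big", "a small house"]
unigrams, bigrams = train_bigram_lm(corpus)

fluent = log_prob("the house is small", unigrams, bigrams)
scrambled = log_prob("small is house the", unigrams, bigrams)
print(fluent > scrambled)
```

In a statistical MT system the language model plays this role on the target side: among candidate translations, it prefers the ones that look like well-formed text in the target language.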

There is also a pretty nice toolkit named "apertium"[2]. It does rule-based MT, so you don't need a lot of data, but you do need a comprehensive grammar (consisting of a lexicon and grammar rules).

For the other stuff I do there are tons of different methods and approaches, from formal/functional grammars up to machine learning and deep learning techniques (Naïve Bayes classifiers, support vector machines, etc.).
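As a rough illustration of the simplest of these, here is a minimal multinomial Naïve Bayes text classifier in plain Python. The task (guessing whether a sentence is German or Portuguese), the four training sentences, and all function names are assumptions chosen for the example, not something from a real project.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(labeled_docs):
    """Collect per-label document counts, per-label word counts, and the vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def classify(text, label_counts, word_counts, vocab):
    """Return the label with the highest log prior plus summed log likelihoods."""
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing over the shared vocabulary.
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data.
training_data = [
    ("guten morgen wie geht es", "de"),
    ("ich habe ein haus", "de"),
    ("bom dia como vai", "pt"),
    ("eu tenho uma casa", "pt"),
]
model = train_naive_bayes(training_data)
prediction = classify("ich habe ein problem", *model)
print(prediction)
```

The "naïve" part is the assumption that words are independent given the label, which is why the score is just a sum of per-word log probabilities.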

If you (or others) are interested, I could post some interesting links for Natural Language Processing (maybe a new tag for that?)

[1] http://statmt.org/moses

[2] https://www.apertium.org

hogwild · 3421 days ago · link ·

On Twitter, people use #nlproc to avoid the "neuro-linguistic programming" hypnosis cranks. If one of us posts something to that tag, I'll follow it!