comment by hogwild
hogwild  ·  3421 days ago  ·  post: Scientists of hubski, what science do you science?

What do you use for language modeling? Are you learning a formal grammar, building a lexicon, or doing something weird with LSTMs?





LeadGuit  ·  3421 days ago

Depends on the job ;-)

For all non-computational linguists I'll go ELI5 on this:

For machine translation I work mostly with the Moses toolkit[1], with IRSTLM as the language model. Moses is built for statistical machine translation, so you need a lot of data in the form of a parallel corpus (the same texts in both languages, e.g. the stuff you find in those little "phrases for travelling" booklets). If you're running a Unix system, it's pretty easy to get a baseline system going, and you can use the Europarl corpus for some nice experiments (what about your own Portuguese-German translator? ;-)
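To give a feel for what the language model part does, here is a toy sketch in plain Python: a bigram model with add-one smoothing that scores how fluent a sentence looks. This is not how IRSTLM is implemented or called; the function names and the three-sentence corpus are made up for illustration, and real toolkits use higher-order n-grams and better smoothing.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        # Pad with sentence-boundary markers so starts and ends are modeled too.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def sentence_prob(sentence, unigrams, bigrams):
    """Probability of a whole sentence as a product of bigram probabilities."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, word, unigrams, bigrams)
    return prob

# Hypothetical toy corpus; a real model is trained on millions of sentences.
corpus = ["the cat sleeps", "the dog sleeps", "the cat eats"]
unigrams, bigrams = train_bigram_lm(corpus)

fluent = sentence_prob("the cat sleeps", unigrams, bigrams)
scrambled = sentence_prob("sleeps cat the", unigrams, bigrams)
print(fluent > scrambled)
```

In a statistical MT system, a score like this is what lets the decoder prefer a candidate translation in a natural word order over a scrambled one.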

There is also a pretty nice toolkit named "Apertium"[2]. It does rule-based MT, so you don't need a lot of data, but you do need a comprehensive grammar (consisting of a lexicon and grammar rules).
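The "lexicon plus grammar rules" idea can be sketched in a few lines of Python. This is only a toy under my own assumptions, not Apertium's dictionary or rule format: a three-word Portuguese-English lexicon with part-of-speech tags, and a single transfer rule that swaps noun-adjective order.

```python
# Hypothetical mini-lexicon: source word -> (target word, part of speech).
LEXICON = {
    "a": ("the", "DET"),
    "casa": ("house", "NOUN"),
    "branca": ("white", "ADJ"),
}

def translate(sentence):
    """Look words up in the lexicon, then apply one reordering rule."""
    # Unknown words are passed through unchanged with the tag "UNK".
    tagged = [LEXICON.get(word, (word, "UNK")) for word in sentence.lower().split()]
    output = []
    i = 0
    while i < len(tagged):
        # Transfer rule: NOUN ADJ in the source becomes ADJ NOUN in the target.
        if i + 1 < len(tagged) and tagged[i][1] == "NOUN" and tagged[i + 1][1] == "ADJ":
            output.extend([tagged[i + 1][0], tagged[i][0]])
            i += 2
        else:
            output.append(tagged[i][0])
            i += 1
    return " ".join(output)

print(translate("a casa branca"))
```

No training data is involved here; all the knowledge sits in the hand-written lexicon and rule, which is why coverage depends entirely on how comprehensive the grammar is.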

For the other stuff I do there are tons of different methods and approaches, from formal/functional grammars up to machine learning and deep learning techniques (Naïve Bayes classifiers, support vector machines, etc.).
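As one concrete example from the machine learning end, here is a minimal Naïve Bayes text classifier in plain Python, used as a toy language identifier. The four training sentences and the function names are hypothetical and only there for illustration; a real setup would use a library and far more data.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Collect per-label document counts, per-label word counts, and the vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def classify(text, label_counts, word_counts, vocab):
    """Return the label with the highest log-posterior (multinomial NB, add-one smoothing)."""
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: German ("de") vs. Portuguese ("pt").
examples = [
    ("guten morgen wie geht es", "de"),
    ("ich habe einen hund", "de"),
    ("bom dia como vai", "pt"),
    ("eu tenho um cachorro", "pt"),
]
model = train_naive_bayes(examples)
print(classify("ich habe einen hund", *model))
print(classify("bom dia", *model))
```

The "naïve" part is the assumption that words are independent given the label, which is why the score is just a sum of per-word log-probabilities.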

If you (or others) are interested, I could post some interesting links for Natural Language Processing (maybe a new tag for that?)

[1] http://statmt.org/moses

[2] https://www.apertium.org

hogwild  ·  3421 days ago

On Twitter, people use #nlproc to avoid the "neuro-linguistic programming" hypnosis cranks. If one of us posts something to that tag, I'll follow it!