algorithm - How does language detection work?

Question

Welcome To Ask or Share your Answers For Others

algorithm - How does language detection work?

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

algorithm - How does language detection work?

I have been wondering for some time how does Google translate(or maybe a hypothetical translator) detect language from the string entered in the "from" field. I have been thinking about this and only thing I can think of is looking for words that are unique to a language in the input string. The other way could be to check sentence formation or other semantics in addition to keywords. But this seems to be a very difficult task considering different languages and their semantics. I did some research to find that there are ways that use n-gram sequences and use some statistical models to detect language. Would appreciate a high level answer too.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:32:18+0000

Take the Wikipedia in English. Check what is the probability that after the letter 'a' comes a 'b' (for example) and do that for all the combination of letters, you will end up with a matrix of probabilities.

If you do the same for the Wikipedia in different languages you will get different matrices for each language.

To detect the language just use all those matrices and use the probabilities as a score, let say that in English you'd get this probabilities:

t->h = 0.3 h->e = .2

and in the Spanish matrix you'd get that

t->h = 0.01 h->e = .3

The word 'the', using the English matrix, would give you a score of 0.3+0.2 = 0.5 and using the Spanish one: 0.01+0.3 = 0.31

The English matrix wins so that has to be English.

Categories

algorithm - How does language detection work?

algorithm - How does language detection work?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags