Multilingual Machine Learning

Abstract: Exploring BLEU Scores using Patent Data

Does your machine learn in Chinese? I don't speak Mandarin or Cantonese, so Google Translate gets all the credit (good or bad) for translating the preceding sentence into "您的機器學習中文嗎?" But how might a researcher quickly evaluate the quality of machine translations? This question encapsulates the basic challenge that gave rise to the BLEU metric. BLEU, which stands for "bilingual evaluation understudy," is a de facto standard measure of machine translation quality and is also sometimes applied to cross-lingual natural language processing (NLP) tasks. The metric is well established in the machine translation space, but some analysts question the algorithm's applicability to tasks beyond the measure's original objectives. This article takes an initial dive into the lessons, implementations, and limits of BLEU, using examples drawn from multilingual patent documents.

Researchers at IBM developed the BLEU algorithm in 2002 as an efficient method to evaluate the quality of machine translation against benchmark human translations. The original paper by the developers, Papineni and colleagues, is a good place to start if you're interested in the founding context and objectives of the algorithm. BLEU is an adjusted precision measure of the matching word sequences between a "candidate" machine translation and one or more "reference" human translations. BLEU counts the "n-grams" (word sequences of length n) in a machine translation that match n-grams in a human translation, divided by the total count of n-grams in the machine translation. The measure is adjusted in two ways: it clips each match count to the maximum number of times that n-gram occurs in any single reference translation, and it applies a brevity penalty to machine translations that are shorter than the reference.
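To make the clipping concrete, here is a minimal Python sketch of the modified (clipped) unigram precision. The function names and toy sentences are mine for illustration, not from the original paper:

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, references, n):
    """Modified n-gram precision: each match count is clipped to the
    maximum number of times the n-gram appears in any one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

# Toy example: a degenerate candidate that just repeats "the".
candidate = "the the the the".split()
reference = "the cat sat on the mat".split()
print(clipped_precision(candidate, [reference], 1))  # 0.5, not 1.0
```

The candidate repeats "the" four times but gets credit for only the two occurrences in the reference, which is exactly the pathology the clipping step is designed to catch.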

The resulting BLEU score is a number between 0 and 1, where 0 represents zero n-gram matches between candidate and reference texts and 1 indicates a machine translation that exactly matches one of the references. In practice, the measure counts matches across multiple word sequence lengths: 4-grams (four-word sequences), tri-grams (three-word sequences), bi-grams (two-word sequences), and uni-grams (one-word sequences), combined via a geometric mean of the respective n-gram precisions. The algorithm was designed for comparisons at the level of a corpus of sentences, with n-gram matches calculated at the basic unit of a sentence and then combined into a corpus-level score. To clarify terminology, the term "document" in the present article refers to a corpus of sentences. If you're interested in additional resources for understanding the algorithm, you might check out the video tutorial at deeplearning.ai that discusses the details of the algorithm, or the written tutorial at machinelearningmastery.com that explores the NLTK implementation. To explore tangible use cases of the metric, I next apply BLEU using translations of Chinese-language patents.
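For readers who want to compute the score directly, the NLTK implementation mentioned above exposes both sentence-level and corpus-level functions. Below is a minimal sketch, assuming NLTK is installed; the token lists are made-up toy data, not the patent translations discussed later:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Toy corpus: each candidate sentence is paired with a list of one or
# more reference translations (tokens are illustrative only).
references = [
    [["the", "machine", "learns", "in", "chinese"]],
    [["bleu", "measures", "translation", "quality"]],
]
candidates = [
    ["the", "machine", "learns", "chinese"],
    ["bleu", "measures", "the", "translation", "quality"],
]

# Default weights take a geometric mean of the 1- through 4-gram
# precisions; smoothing avoids a zero score when some higher-order
# n-gram has no match at all.
score = corpus_bleu(
    references,
    candidates,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method1,
)
print(f"Corpus BLEU: {score:.3f}")
```

Note that corpus_bleu pools n-gram statistics across all sentences before combining them, which is the corpus-level design described above and generally differs from averaging per-sentence scores.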

......

Read the full article at: Medium

 

If you like this article, please like our Facebook page: Big Data In Finance

 

