Crowdly

Natural Language Processing

Looking for test answers and solutions for Natural Language Processing? Browse our large collection of verified answers for Natural Language Processing at moodle.iitdh.ac.in.

Given a sentence with k tokens, how many n-grams with frequency greater than zero can be obtained from the sentence, where n is an arbitrary natural number?
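
One way to make the counting concrete is to enumerate the n-grams directly. The sketch below assumes whitespace tokenisation, an example sentence chosen for illustration, and a hypothetical helper `ngrams`. For a fixed n ≤ k, a sentence of k tokens has k - n + 1 n-gram positions, and summing over n = 1..k gives k(k + 1)/2. The number of distinct n-grams can be smaller when tokens repeat.

```python
def ngrams(tokens, n):
    """Return every contiguous n-gram (as a tuple) in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()  # example sentence (an assumption)
k = len(tokens)

# Number of n-gram positions for each n from 1 to k: k - n + 1.
positions = {n: len(ngrams(tokens, n)) for n in range(1, k + 1)}

# Summed over all n this is k * (k + 1) / 2.
total_positions = sum(positions.values())

# "the" occurs twice, so there are fewer distinct unigrams than positions.
distinct_unigrams = len(set(ngrams(tokens, 1)))
```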

What would the adjusted count of the bigram {“A”, “B”} be if we had to observe the above maximum likelihood estimate for {“A”, “B”} without applying Laplace smoothing? Do not be concerned with finding a whole number.
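
The data this question refers to is not reproduced on this page, so the counts below are purely hypothetical. As a sketch, under add-one (Laplace) smoothing the bigram estimate is (C(A B) + 1) / (C(A) + V), and the adjusted count c* is the count that would yield the same value through the unsmoothed estimate c* / C(A). The function names are illustrative, not from the course.

```python
def laplace_bigram_prob(c_ab, c_a, vocab_size):
    """Add-one smoothed estimate of P(B | A)."""
    return (c_ab + 1) / (c_a + vocab_size)

def adjusted_count(c_ab, c_a, vocab_size):
    """Count c* such that c* / C(A) equals the smoothed estimate."""
    return (c_ab + 1) * c_a / (c_a + vocab_size)

# Hypothetical counts: C(A B) = 3, C(A) = 10, vocabulary size V = 20.
c_ab, c_a, vocab_size = 3, 10, 20
p_smoothed = laplace_bigram_prob(c_ab, c_a, vocab_size)
c_star = adjusted_count(c_ab, c_a, vocab_size)
```

The adjusted count is generally not a whole number, which is why the question says not to look for one.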

Consider a vocabulary consisting of k tokens. How many n-grams can you construct from the vocabulary, where n is an arbitrary natural number? The frequency of an n-gram need not be greater than zero.
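
For a fixed n, each of the n positions can hold any of the k tokens, which gives k^n possible n-grams. A minimal sketch with a hypothetical three-token vocabulary:

```python
from itertools import product

vocab = ["a", "b", "c"]  # hypothetical vocabulary, k = 3
n = 2

# Every ordered sequence of n tokens, repetition allowed.
all_ngrams = list(product(vocab, repeat=n))
count = len(all_ngrams)  # equals len(vocab) ** n
```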

Consider the definition of edit distance that assigns a weight of 1 to insertions and deletions and a weight of 2 to substitutions. What is the minimum edit distance between "lead" and "deal"?
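
The distance can be checked with the standard dynamic-programming table. The sketch below is one common formulation (rows index the source string, columns the target); the function name and parameter names are illustrative.

```python
def min_edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Weighted minimum edit distance via the classic DP table."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]

    # First column: delete every source character.
    for r in range(1, rows):
        dist[r][0] = dist[r - 1][0] + del_cost
    # First row: insert every target character.
    for c in range(1, cols):
        dist[0][c] = dist[0][c - 1] + ins_cost

    for r in range(1, rows):
        for c in range(1, cols):
            same = source[r - 1] == target[c - 1]
            dist[r][c] = min(
                dist[r - 1][c] + del_cost,                       # deletion
                dist[r][c - 1] + ins_cost,                       # insertion
                dist[r - 1][c - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[-1][-1]

print(min_edit_distance("lead", "deal"))  # 4
```

With these weights a substitution costs the same as one deletion plus one insertion, so the result equals the two lengths minus twice the longest common subsequence ("ea"): 4 + 4 - 2 × 2 = 4.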

Naive Bayes is a generative model. Let P(d | c) be the probability of observing a document d given that it belongs to class c, and let P(c) be the fraction of documents belonging to class c. What are P(d | c) and P(c) respectively called?

A spam filter classifies emails as Spam (S) or Not Spam (¬S) using the Naïve Bayes algorithm. Given a dataset, the following probabilities are known:

P(S) = 0.4 (40% of emails are spam)

P(¬S) = 0.6 (60% of emails are not spam)

70% of spam emails contain "offer" and 10% of non-spam emails contain "offer".

If a new email contains the word "offer", find the probability that it is spam.
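
The arithmetic follows from Bayes' rule, P(S | offer) = P(offer | S) P(S) / P(offer), where P(offer) is obtained by summing over both classes. A short sketch using the figures given above (variable names are illustrative):

```python
p_spam = 0.4
p_not_spam = 0.6
p_offer_given_spam = 0.7
p_offer_given_not_spam = 0.1

# Total probability that an email contains "offer": 0.28 + 0.06 = 0.34.
p_offer = p_offer_given_spam * p_spam + p_offer_given_not_spam * p_not_spam

# Posterior probability of spam given "offer": 0.28 / 0.34.
p_spam_given_offer = p_offer_given_spam * p_spam / p_offer

print(round(p_spam_given_offer, 4))  # 0.8235
```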

Consider three language models, A, B and C. Upon evaluating each on a test set, we observe that A obtains a perplexity of 962, B a perplexity of 170, and C a perplexity of 109. Which of the following is most likely to be the rationale behind the difference in performance between the three?
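
The answer options are not reproduced on this page, so the following only illustrates how a perplexity score arises. As a sketch, perplexity is the exponential of the average negative log-probability the model assigns to each test token, so a lower score means the model found the test set more predictable. The per-token probabilities here are made up for illustration.

```python
import math

def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to each test token."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# A model that spreads probability uniformly over 4 choices at every step.
uniform = perplexity([0.25] * 8)

# A model that is twice as confident at every step scores lower.
sharper = perplexity([0.5] * 8)
```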

Consider the dynamic-programming solution for finding the minimum edit distance between two strings of different lengths. Let the substitution cost be S, the insertion cost be I, and the deletion cost be D. Let the element in the rth row and cth column of the DP table be denoted (r, c).

After filling in r rows of the DP table, we attempt to fill in the cth column of the (r + 1)th row. It is observed that (r, c) = 2, (r + 1, c - 1) = 3, and (r, c - 1) = 4. Furthermore, the characters compared at (r + 1, c) are not equal. Deduce the entry at (r + 1, c) from the above information.
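
The recurrence takes the cheapest of the three neighbouring cells plus the corresponding operation cost. The sketch below assumes the common convention in which rows index the source string, so the cell above contributes a deletion and the cell to the left an insertion; under the opposite convention I and D swap roles. The numeric costs in the example call (I = D = 1, S = 2) are an assumption, since the question leaves them symbolic.

```python
def next_cell(up, left, diag, chars_equal, sub_cost, ins_cost, del_cost):
    """Value of cell (r + 1, c) from its three already-filled neighbours."""
    return min(
        up + del_cost,                              # from (r, c)
        left + ins_cost,                            # from (r + 1, c - 1)
        diag + (0 if chars_equal else sub_cost),    # from (r, c - 1)
    )

# Symbolically the entry is min(2 + D, 3 + I, 4 + S).
entry = next_cell(up=2, left=3, diag=4, chars_equal=False,
                  sub_cost=2, ins_cost=1, del_cost=1)
```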

Assume that your corpus consists of 1000 unique characters. The Byte Pair Encoding algorithm runs on your corpus for 500 iterations, creating a new merge at every iteration. The algorithm outputs a vocabulary at the end of its execution. What is the size of this vocabulary, i.e., how many elements does it contain?
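
Because each iteration adds exactly one merged symbol to the starting character set, the vocabulary grows by one per merge; with the figures above that is 1000 + 500 = 1500. The toy sketch below illustrates this on a tiny hypothetical corpus. It is a simplified BPE learner (no end-of-word marker, ties broken by first occurrence), not a reference implementation.

```python
from collections import Counter

def bpe_vocab(words, num_merges):
    """Learn up to num_merges BPE merges and return the resulting vocabulary."""
    seqs = [list(w) for w in words]
    vocab = sorted({ch for seq in seqs for ch in seq})  # base characters
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break  # nothing left to merge
        (a, b), _count = pairs.most_common(1)[0]
        merged = a + b
        vocab.append(merged)
        # Replace every adjacent occurrence of (a, b) with the merged symbol.
        new_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                    out.append(merged)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
    return vocab

words = ["low", "lower", "newest", "widest"]  # hypothetical corpus
base_size = len(set("".join(words)))
vocab = bpe_vocab(words, num_merges=3)
```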

Which of the following is a valid bigram from the sentence "I love NLP"?
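
A bigram is a pair of adjacent tokens. Assuming whitespace tokenisation and no sentence-boundary markers, the bigrams of this sentence can be listed as follows:

```python
tokens = "I love NLP".split()

# Pair each token with the one that immediately follows it.
bigrams = list(zip(tokens, tokens[1:]))

print(bigrams)  # [('I', 'love'), ('love', 'NLP')]
```

A pair such as ("I", "NLP") is not a bigram of the sentence because the two tokens are not adjacent.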
