CISC 7700X Final Exam

1. c
2. b
3. a
4. d
5. c
6. b
7. a
8. d
9. c
10. b
11. a

12. 0.7778 (7/9)
    Given: P(L)=0.5, P(S|L)=0.7, P(S|-L)=0.2
    P(L|S) = P(S|L)P(L) / P(S)
           = P(S|L)P(L) / ( P(S|L)P(L) + P(S|-L)P(-L) )
           = (0.7*0.5) / (0.7*0.5 + 0.2*0.5)
           = 0.7778

13. Not enough data: we don't know P(S,A|L) or P(S,A).
    Given: P(A|L)=0.6, P(A|-L)=0.15
    P(L|S,A) = P(S,A|L)P(L) / P(S,A)

14. 0.9333 (14/15)
    Given the same values as above, with the naive assumption P(S,A|L) = P(A|L)P(S|L):
    P(L|S,A) = P(S,A|L)P(L) / P(S,A)
             = P(A|L)P(S|L)P(L) / ( P(A|L)P(S|L)P(L) + P(A|-L)P(S|-L)P(-L) )
             = 0.6*0.7*0.5 / (0.6*0.7*0.5 + 0.15*0.2*0.5)
             = 0.9333
    Another way to solve it is to reuse the result of q12:
    P(L|S,A) = P(A|L)P(L|S) / P(A|S)
             = P(A|L)P(L|S) / ( P(A|L)P(L|S) + P(A|-L)P(-L|S) )
             = (0.6*0.7778) / (0.6*0.7778 + 0.15*(1-0.7778))
             = 0.9333

15. a
16. d
17. c

18. Suppose n=100; then storing P(x_1,...,x_n|c) would require a table with at least 2^100 entries. Similarly, if our model has 2^100 parameters, we would need far more than 2^100 training instances to fill in the probability estimates. Also, with a table that large, we would essentially be memorizing the input and recalling it at classification time, which would not generalize well.
    Naive Bayes turns P(x_1,...,x_n|c) into P(x_1|c)P(x_2|c)...P(x_n|c); if n=100, we would have 100 small tables instead.

19. a
20. d
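The arithmetic in q12 and q14 can be checked with a minimal sketch; the variable names below are illustrative, not part of the exam:

```python
# Check the Bayes arithmetic for questions 12 and 14.
# Given values from the exam:
p_l = 0.5          # P(L)
p_s_l = 0.7        # P(S|L)
p_s_not_l = 0.2    # P(S|-L)
p_a_l = 0.6        # P(A|L)
p_a_not_l = 0.15   # P(A|-L)

# Q12: P(L|S) via Bayes' rule, expanding P(S) by total probability.
p_l_given_s = (p_s_l * p_l) / (p_s_l * p_l + p_s_not_l * (1 - p_l))

# Q14: P(L|S,A) under the naive assumption P(S,A|L) = P(S|L)P(A|L).
num = p_a_l * p_s_l * p_l
p_l_given_sa = num / (num + p_a_not_l * p_s_not_l * (1 - p_l))

print(round(p_l_given_s, 4))   # 0.7778
print(round(p_l_given_sa, 4))  # 0.9333
```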