Week 2

Because Monday was Memorial Day, this was a short week, but it certainly wasn’t short on things to do. Beatriz, the last undergrad to join our team, arrived this week. We essentially split into two groups. While Khyathi and Beatriz (who have more NLP experience) sifted through the many, many medical documents we have downloaded from PubMed, Lily, Wenli, and I finished and presented our analysis of our annotation data and then spent the rest of the week exploring the LibSVM classifier and gaining an understanding of how it works. We also started looking into feature selection for the specificity classifier, or, well, that’s what I was particularly interested in researching.

As for the analysis of our annotation data, we didn’t learn very much, largely because our sample size is so small at the moment. After talking with Ani, though, we believe we may have found a way to really get at the aspects of a sentence that make it more ambiguous: discounting coreference problems, otherwise known as unknown antecedents (known from context) in linguistics-speak. I don’t have any previous experience with NLP, so it’s been really interesting to learn the terms the field has for ideas I know from my language studies and how NLPers go about solving the problems those ideas raise.

I learned some basics of using LibSVM after many failed attempts at understanding the provided “beginner’s guide”. It talks about astroparticle physics! I’m afraid I’m a little bit too much of a beginner for even that. Thankfully, there were some other helpful tutorials online, and I managed to train a classifier to distinguish between sentences about metalworking and sentences about woodworking, just as a test. Lily wrote a script that formats our sentence data into the file layout LibSVM expects, using the words themselves as features. It’s not a very accurate classifier, but it’s definitely a beginning! Feature selection for the task seems a lot more difficult than we had thought. I decided to look for helpful information in cognitive grammar studies, and I have a few ideas to look into next week!
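
For the curious, here is a minimal sketch of what "using the words themselves as features" can look like. This is not Lily's actual script: the function names, the lowercase-and-split tokenization, and the two example sentences are all made up for illustration. The one part that isn't an assumption is the file layout LibSVM reads: each line holds a label followed by `index:value` pairs, with indices in ascending order.

```python
from collections import Counter

def build_vocab(sentences):
    """Assign each distinct lowercase token a 1-based feature index (hypothetical helper)."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab

def to_libsvm_line(label, sentence, vocab):
    """Turn one labeled sentence into a sparse 'label index:count ...' line."""
    # Count how often each known word occurs; words missing from the vocab are skipped.
    counts = Counter(vocab[t] for t in sentence.lower().split() if t in vocab)
    # Feature indices are written in ascending order.
    features = " ".join(f"{index}:{count}" for index, count in sorted(counts.items()))
    return f"{label} {features}".rstrip()

# Toy data invented for this sketch: 1 = metalworking, -1 = woodworking.
examples = [
    (1, "weld the steel plate"),
    (-1, "sand the oak board"),
]

vocab = build_vocab(sentence for _, sentence in examples)
lines = [to_libsvm_line(label, sentence, vocab) for label, sentence in examples]

for line in lines:
    print(line)
```

Each printed line could then be written to a training file. Raw word counts like these are about the simplest features possible, which is probably part of why a classifier built this way isn't very accurate yet.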