Friday, December 4th, 2015

DVP Talk: Sam Ventura, 12/7 @ 4 pm in Olin 264

Title: Classification and Clustering for Record Linkage in Large Datasets

Abstract: Record linkage, or the process of linking records corresponding to unique entities within and/or across data sources, is an increasingly important problem in today’s data-rich world. Due to issues like typographical errors, name variation, and repetition of common names, linking records of unique entities within and across large data sources can be a difficult task, in terms of both accuracy and computational feasibility. We frame record linkage as a clustering problem, where the objects to be clustered are the records in the data source(s), and the clusters are the unique […]

