A principled methodology for comparing relatedness measures for clustering publications