Large Linguistic Corpus Reduction with SCP Algorithms