PatCID: an open-access dataset of chemical structures in patent documents