Assessing Similarity Between Datasets Using Vector Representations
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | Russian |
| Published: | Educational institution «Belarusian State University of Informatics and Radioelectronics», 2025-07-01 |
| Series: | Doklady Belorusskogo gosudarstvennogo universiteta informatiki i radioèlektroniki |
| Subjects: | |
| Online Access: | https://doklady.bsuir.by/jour/article/view/4164 |
| Summary: | The article considers an approach to determining the similarity of datasets used for training algorithms, using datasets of human faces as an example. The approach makes it possible to find similar datasets from different sources, which expands the range of detectable features and classes and significantly affects dataset balance. A vector representation (embedding) was obtained for each dataset object with a pretrained ResNet network, and the embeddings of the two datasets were then compared. In the experiments, one dataset of face images was divided into two parts, which served as a pair of similar datasets, and each part was then compared with a different dataset. A new similarity metric is proposed that has several advantages and makes it possible to find the most similar datasets. |
|---|---|
| ISSN: | 1729-7648 |
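
The comparison protocol described in the summary (split one dataset into two halves, then compare each half with a different dataset) can be sketched roughly as follows. This record does not state the proposed metric, so the sketch substitutes a plain stand-in: the symmetric mean of best-match cosine similarities between two sets of embeddings. The function names, the toy 4-dimensional vectors used in place of ResNet embeddings, and the noise level are all assumptions made for illustration, not the authors' method.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_best_match(src, dst):
    """For each embedding in src, take its highest cosine similarity in dst, then average."""
    return sum(max(cosine(u, v) for v in dst) for u in src) / len(src)


def dataset_similarity(a, b):
    """Stand-in similarity score (NOT the article's metric): symmetric mean best match."""
    return 0.5 * (mean_best_match(a, b) + mean_best_match(b, a))


def toy_embeddings(center, n, rng, noise=0.1):
    """Hypothetical substitute for ResNet embeddings: Gaussian noise around a center."""
    return [[c + rng.gauss(0.0, noise) for c in center] for _ in range(n)]


rng = random.Random(0)

# One "dataset" split into two halves, as in the protocol from the summary.
dataset_a = toy_embeddings([1.0, 0.0, 0.0, 0.0], 40, rng)
half_1, half_2 = dataset_a[:20], dataset_a[20:]

# A second "dataset" drawn around a different center.
dataset_b = toy_embeddings([0.0, 1.0, 0.0, 0.0], 20, rng)

same_source = dataset_similarity(half_1, half_2)
different_source = dataset_similarity(half_1, dataset_b)

print("halves of one dataset:", round(same_source, 3))
print("half vs. other dataset:", round(different_source, 3))
```

In a real experiment the toy vectors would be replaced by embeddings from a pretrained network, and the stand-in score by the metric defined in the article itself. The two halves of one dataset should then score higher against each other than against an unrelated dataset.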