Title: Harnessing Dataset Complexity in Classification Tasks
Speaker: Nathalie Japkowicz, Department of Computer Science, American University
Date: October 27, 2020
Abstract: The purpose of this talk is to discuss two particular and related aspects of dataset complexity in classification tasks, Multi-Modality and Subclass Mix, to observe their effects, and to show how they can be harnessed using relatively simple principles to improve classification performance. We will present four studies that show the difficulties these aspects cause for three learning paradigms: Binary, Multi-Class, and One-Class Classification. Particular attention will be given to situations where the data presents additional challenges such as class imbalances, small disjuncts, and general data scarcity. We will then present a number of approaches for harnessing these problems, all of which rely on the same or a similar principle.