Instance reduction approach to machine learning and multi-database mining

Ireneusz Czarnowski, Piotr Jędrzejowicz

Abstract


The paper proposes a heuristic instance reduction algorithm as an approach to machine learning and knowledge discovery in centralized and distributed databases. The algorithm is based on an original method for selecting reference instances and produces a reduced training dataset. This reduced training set, consisting of the selected instances, can serve as input to the machine learning algorithms used for data mining tasks. For each instance in the dataset, the algorithm calculates the value of its similarity coefficient. These values are used to group instances into clusters, and the number of clusters depends on the value of the so-called representation level set by the user. Only a limited number of instances is selected from each cluster to form the reduced training set, and this selection is carried out by a population learning algorithm. The paper describes the proposed approach and reports the results of a validation experiment.
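The abstract describes the reduction pipeline only in outline: compute a similarity coefficient per instance, cluster instances by that coefficient, derive the number of clusters from a user-set representation level, and keep a few instances per cluster. The Python sketch below illustrates that flow under explicit assumptions. The coefficient formula in `similarity_coefficients`, the per-class clustering, the interpretation of `representation_level` as a fraction of class size, and the helper names themselves are all hypothetical choices for illustration. In particular, the paper selects instances with a population learning algorithm; here a simple deterministic rule (keep the instances whose coefficient is closest to the cluster mean) stands in for it.

```python
import math


def similarity_coefficients(data):
    """Hypothetical similarity coefficient (an assumption, not the paper's
    definition): min-max normalise every attribute, then score each instance
    by its dot product with the per-attribute column sums."""
    n_attrs = len(data[0])
    lo = [min(row[j] for row in data) for j in range(n_attrs)]
    hi = [max(row[j] for row in data) for j in range(n_attrs)]
    # Constant attributes are mapped to 0.0 to avoid division by zero.
    norm = [
        [(row[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
         for j in range(n_attrs)]
        for row in data
    ]
    col_sums = [sum(row[j] for row in norm) for j in range(n_attrs)]
    return [sum(v * s for v, s in zip(row, col_sums)) for row in norm]


def reduce_instances(data, labels, representation_level, per_cluster=1):
    """Return the sorted indices of the retained instances.

    representation_level (hypothetical parameterisation): a value in (0, 1]
    giving the number of clusters per class as a fraction of that class's size.
    """
    if not 0 < representation_level <= 1:
        raise ValueError("representation_level must lie in (0, 1]")
    coeffs = similarity_coefficients(data)
    selected = []
    for cls in sorted(set(labels)):
        # Assumption: instances are clustered separately within each class.
        idx = sorted((i for i, y in enumerate(labels) if y == cls),
                     key=lambda i: coeffs[i])
        n_clusters = max(1, math.ceil(representation_level * len(idx)))
        for c in range(n_clusters):
            # Contiguous, near-equal slices of the coefficient-sorted order;
            # n_clusters <= len(idx), so no slice is empty.
            cluster = idx[c * len(idx) // n_clusters:(c + 1) * len(idx) // n_clusters]
            mean = sum(coeffs[i] for i in cluster) / len(cluster)
            # Stand-in for the population learning algorithm: keep the
            # instances whose coefficient is closest to the cluster mean.
            ranked = sorted(cluster, key=lambda i: (abs(coeffs[i] - mean), i))
            selected.extend(ranked[:per_cluster])
    return sorted(selected)
```

For example, calling `reduce_instances(X, y, 0.5)` on a class with six instances forms three clusters for that class and keeps one instance from each; a representation level of 1.0 makes every instance its own cluster, so nothing is removed. Lower levels therefore trade training-set size for coverage of the coefficient range.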



DOI: http://dx.doi.org/10.17951/ai.2006.4.1.60-71
Date of publication: 2006-01-01 00:00:00
Date of submission: 2016-04-27 10:15:02




Copyright (c) 2015 Annales UMCS Sectio AI Informatica

This work is licensed under a Creative Commons Attribution 4.0 International License.