Datasets

This is a collection of datasets used in some of my feature selection experiments. Many of them come from the UCI Machine Learning Repository. In every dataset, the decision attribute is the final column.
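Because the decision attribute is always the final column, separating it from the conditional attributes is a one-line slice. The sketch below is only an illustration: it assumes comma-delimited text, which may not match the actual files, and the names `load_dataset` and `split_decision` are hypothetical helpers rather than part of any tool distributed with these datasets.

```python
import csv
import io


def split_decision(rows):
    """Split rows into (conditional attributes, decision values).

    Relies only on the convention that the decision attribute
    is the final column of every row.
    """
    features = [row[:-1] for row in rows]
    decisions = [row[-1] for row in rows]
    return features, decisions


def load_dataset(handle, delimiter=","):
    """Read delimited rows from an open file handle and split off the decision.

    The comma delimiter is an assumption; change it to match the real file.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    rows = [row for row in reader if row]  # skip blank lines
    return split_decision(rows)


# Made-up in-memory sample standing in for a real dataset file.
sample = io.StringIO("1,0,1,yes\n0,0,1,no\n")
X, y = load_dataset(sample)
```

For a file on disk, pass an open handle instead, for example `with open(path, newline="") as f: X, y = load_dataset(f)`. Values are returned as strings, so real-valued attributes would still need converting with `float`.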


Crisp Datasets
These datasets contain discrete values only:

Breast cancer
Corral
Credit
Derm
Derm2
DNA
Exactly
Exactly2
Heart
LED
Letters
Lung
M-OF-N
Monk3
Mushroom
Parity5+2
Parity5+5
Vote
Website classification
Discretized Water Treatment

Real-valued Datasets
These datasets contain real-valued attributes:

Abalone
Arrhythmia
Caco
Fruit
FUZZIEEE example
Glass
Ionosphere
Iris
Isolet
Vehicle
Water Treatment
Waveform
Website classification
Wine

Fuzzification files

These files are for use in fuzzy-rough attribute reduction (FRAR). Note that
the fuzzifications have not been optimized:

Abalone
Arrhythmia
Caco
Fruit
FUZZIEEE example
Glass
Ionosphere
Iris
Isolet
Vehicle
Water Treatment
Waveform
Website classification
Wine
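For readers new to the idea, fuzzification maps each real value to degrees of membership in one or more fuzzy sets. The sketch below uses triangular membership functions purely as an illustration. The shape of the functions, the set names, and the breakpoints are all assumptions made for this example; they are not taken from the fuzzification files listed above, whose format is not described here.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set.

    a and c are the feet (membership 0) and b is the peak (membership 1).
    Assumes a < b < c.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(x, fuzzy_sets):
    """Return the membership degree of x in each named fuzzy set."""
    return {name: triangular(x, *abc) for name, abc in fuzzy_sets.items()}


# Hypothetical fuzzy sets for one real-valued attribute.
example_sets = {
    "low": (-5.0, 0.0, 5.0),
    "medium": (0.0, 5.0, 10.0),
    "high": (5.0, 10.0, 15.0),
}

memberships = fuzzify(2.5, example_sets)
```

Here the value 2.5 sits halfway between the peaks of "low" and "medium", so it belongs to both sets to degree 0.5 and to "high" to degree 0.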

A readme file containing some more description can be found here.

An example dataset and fuzzification for FRAR can be found below. The decision
attribute is fuzzy.

Example dataset
Dataset fuzzification
