Description:
The Propositional Satisfiability (SAT) problem (Davis et al. 1962) is one of the most studied NP-Complete problems because of its significance in both theoretical research and practical applications. Given a Boolean formula, the SAT problem requires an assignment of variables/features so that the formula evaluates to true, or a determination that no such assignment exists.The problem of finding the smallest feature subsets using rough set theory can be formulated as a SAT problem. Rough sets allows the generation from datasets of clauses of features in conjunctive normal form. If after assigning truth values to all features appearing in the clauses the formula is satisfied, then those features set to true constitute a valid subset for the data. The task is to find the smallest number of such features so that the CNF formula is satisfied. Although this can be solved by a simple algorithm for small datasets, more sophisticated SAT solution techniques (e.g. those that employ a degree of intelligent backtracking) will be required for larger data.
The aim of this project is to investigate how advanced SAT solution techniques may be applied to the problem of feature selection using rough set theory. This should result in an implemented system capable of discovering minimal feature subsets from standard datasets. The project is open as to the solution technique(s) developed and the resulting system. Part of this project will involve reading into the fundamentals of SAT and rough set theory. The problem domain to be tackled will be identified by the student and supervisor later (potential applications include web content classification and gene expression analysis).
References: