Although the expansion of storage technologies, networking systems, and information system methodologies, the capabilities of conventional data processing techniques remain limited. The need to knowledge extraction, compact representation and data analysis are highly motivated by data expansion. Nevertheless, learning from data might be a complex task, particularly when it includes noisy, redundant and information-less attributes. Feature Selection (FS) tries to select the most relevant attributes from raw data, and hence guides the construction of final classification models or decision support systems. Selected features should be representative of the underlying data and provide effective usefulness to the targeted learning paradigm (i.e. classification). In this thesis, we investigate different optimization paradigms as well as its adaptation to the requirements of the feature selection challenges, namely the problem combinatorial nature. Both theoritical and empirical aspects were studied, and confirm the effectiveness of the adopted methodology as well as the proposed metaheuristic based approaches.