Optimized Adversarial Defense: Combating Adversarial Attacks with Denoising Autoencoders and Ensemble Learning
Synopsis
Adversarial attacks pose a significant risk to machine learning models by introducing carefully crafted perturbations that can mislead the models into producing incorrect outputs. This research investigates the effectiveness of denoising autoencoders as a defense mechanism against adversarial attacks on image classification tasks. A strategy combining a denoising autoencoder with a convolutional neural network (CNN) classifier is proposed and evaluated on the Modified National Institute of Standards and Technology (MNIST) dataset. The ability of autoencoders to learn robust representations and reconstruct original images from noisy inputs is leveraged to mitigate the impact of adversarial perturbations generated by the Fast Gradient Sign Method (FGSM). A K-fold cross-validation ensemble technique was employed to ensure robust and generalizable results. The findings demonstrate the potential of autoencoder-based defenses to enhance the robustness of classifiers against FGSM adversarial attacks, achieving significantly higher classification accuracy than on the unprocessed adversarial set. However, because autoencoder reconstruction is lossy, a trade-off between robustness and overall classification performance is observed, with diminishing effectiveness for more severe adversarial perturbations. Despite these limitations, the findings motivate further investigation into autoencoder-based defense mechanisms, including more complex architectures, combinations with other techniques such as ensemble learning, and extensions to real-world applications.
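
The following is a minimal sketch, in PyTorch, of the two steps the synopsis describes: generating FGSM adversarial examples and passing them through a denoising autoencoder before classification. It is illustrative only; the `autoencoder` and `classifier` arguments are assumed to be pre-trained torch.nn.Module instances standing in for the actual architectures used in this work, and the function names are hypothetical.

import torch
import torch.nn.functional as F

def fgsm_attack(classifier, images, labels, epsilon):
    """Generate FGSM adversarial examples: x_adv = x + epsilon * sign(grad_x L(f(x), y))."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(classifier(images), labels)
    loss.backward()
    perturbed = images + epsilon * images.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()  # keep pixel values in the valid [0, 1] range

def defended_predict(autoencoder, classifier, images):
    """Denoise (possibly adversarial) inputs with the autoencoder before classifying."""
    with torch.no_grad():
        reconstructed = autoencoder(images)   # lossy reconstruction removes much of the perturbation
        logits = classifier(reconstructed)
    return logits.argmax(dim=1)

In this sketch, larger values of epsilon correspond to the more severe perturbations for which the synopsis reports diminishing defensive effectiveness.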


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.