Deep neural networks have been shown to be vulnerable to small adversarial perturbations of their inputs. In this paper, we investigate the vulnerability of Neural Machine Translation (NMT) models to such adversarial attacks and propose a new attack algorithm called TransFool. TransFool severely degrades translation quality across different translation tasks and NMT architectures. Moreover, we show that TransFool transfers to unknown target models. Finally, based on automatic and human evaluations, TransFool outperforms existing attacks. TransFool thus allows us to better characterize the vulnerability of NMT models and highlights the need for strong defense mechanisms and more robust NMT systems in real-life applications.
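As a point of reference for the claim that an attack "severely degrades translation quality", the following minimal sketch shows one common way such degradation can be quantified: comparing corpus BLEU on clean versus perturbed source sentences. This is not the TransFool algorithm itself; the Marian model name, the toy character-swap "perturbation", and the use of sacreBLEU are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch: quantify translation-quality degradation as a BLEU drop.
# Assumptions (not from the paper): Helsinki-NLP/opus-mt-en-de as the target
# NMT model, sacreBLEU as the metric, and a toy typo as the "perturbation".
import sacrebleu
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-de"  # assumed example English->German model
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

def translate(sentences):
    """Translate a list of source sentences with the assumed NMT model."""
    batch = tokenizer(sentences, return_tensors="pt", padding=True)
    outputs = model.generate(**batch)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

src = ["The agreement was signed by both governments yesterday."]
ref = ["Das Abkommen wurde gestern von beiden Regierungen unterzeichnet."]

# Toy perturbation (a character swap), standing in for a real adversarial attack.
adv = ["The agreemnet was signed by both governments yesterday."]

clean_bleu = sacrebleu.corpus_bleu(translate(src), [ref]).score
adv_bleu = sacrebleu.corpus_bleu(translate(adv), [ref]).score
print(f"BLEU clean: {clean_bleu:.1f}  BLEU perturbed: {adv_bleu:.1f}")
```

A large gap between the two BLEU scores, while the perturbed source stays close in meaning to the original, is the kind of behavior an adversarial attack on NMT aims to produce.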