CWE-1039

Inadequate Detection or Handling of Adversarial Input Perturbations in Automated Recognition Mechanism

The product uses an automated mechanism such as machine learning to recognize complex data inputs (e.g. image or audio) as a particular concept or category, but it does not properly detect or handle inputs that have been modified or constructed in a way that causes the mechanism to detect a different, incorrect concept.

Status: Incomplete
Abstraction: Class
Applicable Technologies: AI/ML

When techniques such as machine learning are used to automatically classify input streams, and those classifications are used for security-critical decisions, then any mistake in classification can introduce a vulnerability that allows attackers to cause the product to make the wrong security decision or disrupt service of the automated mechanism. If the mechanism is not developed or 'trained' with enough input data or has not adequately undergone test and evaluation, then attackers may be able to craft malicious inputs that intentionally trigger the incorrect classification.

Targeted technologies include, but are not necessarily limited to:

- automated speech recognition

- automated image recognition

- automated cyber defense

- chatbots, LLMs, and generative AI

For example, an attacker might modify road signs or road surface markings to trick autonomous vehicles into misreading the sign/marking and performing a dangerous action. Another example is an attacker who crafts highly specific and complex prompts to 'jailbreak' a chatbot and bypass its safety or privacy mechanisms, an approach commonly known as a prompt injection attack.

Common Consequences

Scope: Integrity; Availability; Confidentiality; Other
Impacts: Bypass Protection Mechanism; DoS: Resource Consumption (Other); DoS: Instability; Read Application Data; Varies by Context

Detection Methods

- Dynamic Analysis with Manual Results Interpretation
- Architecture or Design Review

Potential Mitigations

Phases: Architecture and Design; Implementation; Integration
Descriptions:
• Algorithmic modifications such as model pruning or compression can help mitigate this weakness. Model pruning ensures that only the weights most relevant to the task are used during inference and has shown resilience to adversarially perturbed data (see the pruning sketch after this list).
• Consider implementing model hardening to fortify the internal structure of the algorithm, using techniques such as regularization and optimization to desensitize the algorithm to minor input perturbations and/or changes (see the regularization sketch below).
• Consider implementing adversarial training, a method that introduces adversarial examples into the training data to promote robustness of the algorithm at inference time (see the adversarial-training sketch below).
• Consider implementing multiple models or model-ensembling techniques so that the weaknesses of any individual model against adversarial input perturbations are offset by the others (see the ensembling sketch below).
• Incorporate uncertainty estimates into the algorithm and trigger human intervention or secondary/fallback software when a threshold is reached, for example when inference predictions or confidence scores are abnormally high or low compared to expected model performance (see the fallback sketch below).
• Reactive defenses such as input sanitization, defensive distillation, and input transformations can all be applied before input data reaches the algorithm for inference (see the input-transformation sketch below).
• Consider reducing the output granularity of the inference/prediction so that attackers cannot use leaked information to craft adversarially perturbed data (see the label-only sketch below).
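
The following is a minimal sketch of the pruning mitigation, assuming a PyTorch classifier; the toy architecture and the 30% pruning amount are illustrative assumptions, not prescribed values.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Illustrative stand-in for the trained classifier being hardened.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# Zero out the 30% of weights with the smallest L1 magnitude in each layer,
# keeping only the weights most relevant to the task.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent
```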
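
A minimal sketch of hardening through regularization, again assuming PyTorch; the weight-decay and label-smoothing values are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# L2 regularization (weight decay) penalizes large weights and label smoothing
# softens the training targets; both reduce sensitivity to small input changes.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
```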
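
A minimal sketch of adversarial training using the fast gradient sign method (FGSM) as one common way to generate adversarial examples; the epsilon value and the 50/50 clean/adversarial mix are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.03):
    """Craft an FGSM adversarial example from a clean batch."""
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    # Step in the direction that increases the loss, staying in the valid input range.
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """One training step on a mix of clean and adversarial examples."""
    model.train()
    x_adv = fgsm_example(model, x, y, epsilon)
    optimizer.zero_grad()
    loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```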
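
A minimal sketch of ensembling at inference time, assuming several independently trained PyTorch models (e.g., different seeds or architectures); averaging softmax outputs is one common combination strategy.

```python
import torch
import torch.nn.functional as F

def ensemble_predict(models, x):
    """Average softmax outputs across models so no single model's blind spot decides."""
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(x), dim=-1) for m in models]).mean(dim=0)
    return probs.argmax(dim=-1)
```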
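
A minimal sketch of an uncertainty-based fallback, assuming a PyTorch classifier scoring one input at a time; the confidence bounds are illustrative assumptions and would normally be calibrated against expected model performance.

```python
import torch
import torch.nn.functional as F

def classify_with_fallback(model, x, low=0.60, high=0.9999):
    """Return a label only when confidence is in the expected range; otherwise defer."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(x), dim=-1)
    # x is assumed to be a batch containing a single input.
    confidence, label = probs.max(dim=-1)
    if confidence.item() < low or confidence.item() > high:
        return None  # route to human review or a secondary/fallback system
    return label.item()
```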
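
A minimal sketch of one input transformation, bit-depth reduction (often called feature squeezing), applied before inference; the 4-bit depth is an illustrative assumption, and inputs are assumed to be scaled to [0, 1].

```python
import torch

def squeeze_bit_depth(x, bits=4):
    """Quantize inputs before inference so perturbations smaller than the
    quantization step are rounded away."""
    levels = 2 ** bits - 1
    return torch.round(x.clamp(0, 1) * levels) / levels
```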
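
A minimal sketch of reducing output granularity, assuming a PyTorch classifier: the inference endpoint returns only the top-1 label and withholds the probability vector that would otherwise help an attacker probe the decision boundary.

```python
import torch

def predict_label_only(model, x):
    """Expose only the predicted label; withhold logits and confidence scores."""
    model.eval()
    with torch.no_grad():
        return model(x).argmax(dim=-1).tolist()
```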