CWE-1427

Improper Neutralization of Input Used for LLM Prompting

AI Translation Available

The product uses externally-provided data to build prompts provided to
large language models (LLMs), but the way these prompts are constructed
causes the LLM to fail to distinguish between user-supplied inputs and
developer provided system directives.

Status

incomplete

Abstraction

base

Affected Platforms

AI/ML

Extended Description

AI Translation

When prompts are constructed using externally controllable data, it is often possible to cause an LLM to ignore the original guidance provided by its creators (known as the 'system prompt') by inserting malicious instructions in plain human language or using bypasses such as special characters or tags. Because LLMs are designed to treat all instructions as legitimate, there is often no way for the model to differentiate between what prompt language is malicious when it performs inference and returns data. Many LLM systems incorporate data from other adjacent products or external data sources like Wikipedia using API calls and retrieval augmented generation (RAG). Any external sources in use that may contain untrusted data should also be considered potentially malicious.

Technical Details

AI Translation

Common Consequences

confidentiality integrity availability access control

Impacts

execute unauthorized code or commands varies by context read application data modify application data gain privileges or assume identity

Detection Methods

dynamic analysis with manual results interpretation dynamic analysis with automated results interpretation architecture or design review

Potential Mitigations

Phases:

architecture and design implementation installation operation system configuration

Descriptions:

• LLM-enabled applications should be designed to ensure proper sanitization of user-controllable input, ensuring that no intentionally misleading or dangerous characters can be included. Additionally, they should be designed in a way that ensures that user-controllable input is identified as untrusted and potentially dangerous.

• Ensure that model training includes training examples that avoid leaking secrets and disregard malicious inputs. Train the model to recognize secrets, and label training data appropriately. Note that due to the non-deterministic nature of prompting LLMs, it is necessary to perform testing of the same test case several times in order to ensure that troublesome behavior is not possible. Additionally, testing should be performed each time a new model is used or a model's weights are updated.

• During system configuration, the model could be fine-tuned to better control and neutralize potentially dangerous inputs.

• LLM prompts should be constructed in a way that effectively differentiates between user-supplied input and developer-constructed system prompting to reduce the chance of model confusion at inference-time.

• During deployment/operation, use components that operate externally to the system to monitor the output and act as a moderator. These components are called different terms, such as supervisors or guardrails.

AI Generated Translation

Common Consequences

riservatezza integrità disponibilità controllo degli accessi

Impacts

eseguire codice o comandi non autorizzati varia dal contesto leggere dati applicazione modificare dati applicazione ottenere privilegi o assumere identità

Detection Methods

analisi dinamica con interpretazione manuale dei risultati analisi dinamica con interpretazione automatica dei risultati revisione dell'architettura o del design

Potential Mitigations

Phases:

architettura e design implementazione installazione operazione configurazione sistema

Descriptions:

• Le applicazioni abilitate a LLM dovrebbero essere progettate per garantire una corretta sanificazione degli input controllabili dall'utente, assicurando che nessun carattere intenzionalmente fuorviante o pericoloso possa essere incluso. Inoltre, dovrebbero essere progettate in modo tale da garantire che gli input controllabili dall'utente siano identificati come non affidabili e potenzialmente pericolosi.

• Assicurarsi che l'addestramento del modello includa esempi di training che evitino la fuoriuscita di segreti e ignorino input dannosi. Addestrare il modello a riconoscere i segreti e etichettare correttamente i dati di training. Si noti che, a causa della natura non deterministica del prompting degli LLM, è necessario eseguire più volte il test dello stesso caso di test per garantire che comportamenti problematici non siano possibili. Inoltre, i test devono essere eseguiti ogni volta che viene utilizzato un nuovo modello o quando vengono aggiornati i pesi di un modello.

• Durante la configurazione del sistema, il modello potrebbe essere perfezionato per migliorare il controllo e neutralizzare input potenzialmente pericolosi.

• I prompt LLM dovrebbero essere costruiti in modo da differenziare efficacemente tra input fornito dall'utente e il prompt di sistema costruito dallo sviluppatore, al fine di ridurre la possibilità di confusione del modello durante il processo di inferenza.

• Durante la distribuzione/operazione, utilizzare componenti che operano esternamente al sistema per monitorare l'output e agire come moderatori. Questi componenti sono chiamati con termini diversi, come supervisori o guardrail.

CWE-1427

Common Consequences

Impacts

Detection Methods

Potential Mitigations

Common Consequences

Impacts

Detection Methods

Potential Mitigations

Iscriviti alla newsletter