Large Language Models (LLMs) are highly sensitive to how prompts are phrased – the quality of the response depends greatly on the prompt style and clarity. Prompt engineering is about giving clear instructions that align with how LLMs were trained, using formats that lead to useful outputs. Below is a structured guide of practical tips for writing effective prompts, each illustrated with real-world examples.
Be direct and unambiguous. State the task or question clearly, and avoid unnecessary polite fluff or filler that doesn't guide the model. Place instructions at the beginning of your prompt and separate them from any input text or context using delimiters like quotes or line breaks. This formatting makes it obvious what the model should do versus what text it should act upon. Include only relevant details needed to avoid overloading or confusing the model.
Example: Rather than a muddled prompt "I have some text for you, please if you don't mind could you maybe summarize it for me? Here it is: [text]", use a concise, structured format: "Summarize the text below as bullet points of key findings. Text: """[Paste text here]"""." This improved prompt clearly separates the instruction from the text and gets straight to the point. The model will know exactly that it needs to produce bullet-point findings from the given text, without extraneous verbiage.
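For programmatic use, the same principle can be baked into a small helper that keeps the instruction up front and fences off the input text with delimiters. This is a minimal sketch; the function name, delimiter choice, and bullet-count parameter are illustrative rather than a fixed convention:

```python
def build_summary_prompt(text: str, max_bullets: int = 5) -> str:
    """Build a prompt with the instruction first and the input text set off by delimiters."""
    instruction = (
        f"Summarize the text below as at most {max_bullets} bullet points of key findings."
    )
    # Triple quotes mark exactly where the input text begins and ends.
    return f'{instruction}\n\nText: """{text}"""'

print(build_summary_prompt("Quarterly revenue rose 12% while customer churn fell to 3%..."))
```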
Avoid lengthy or rambling prompts. Long, convoluted sentences can confuse the model and dilute the instruction. Aim for brevity while preserving needed context. If the task has multiple parts, consider listing them or breaking into steps rather than one giant paragraph. Using bullet points or numbered lists within your prompt for multiple requirements can enhance clarity. For instance, if asking for an analysis covering several criteria, format it as: "1. Analyze X, 2. Evaluate Y, 3. Conclude Z." This explicit structure makes it easier for the AI to address each point in order.
Be specific about what you want. Vague prompts lead to vague answers. Clearly define the desired outcome, format, or style in the prompt text. If you expect a brief answer, say "in 2-3 sentences" rather than "briefly," since terms like "brief" or "quickly" are ambiguous. For example, instead of "Provide a short description of the product", specify "Provide a 3-5 sentence description of the product highlighting its main benefit." This level of detail guides the model on length and focus, avoiding misunderstandings.
Tackle one thing at a time. If your query is complex or multifaceted, break it into smaller, manageable subtasks. LLMs handle step-by-step instructions better than a single overly broad question. This technique, known as task decomposition (closely related to chain-of-thought prompting), encourages the model to work through a problem in a logical sequence rather than jumping to a conclusion.
Example: Suppose you're planning a web development project. Instead of asking "What do I need to do to build this application?" in one go, you can prompt: "We are building feature X for our web app. Step 1: List the front-end development tasks required. Step 2: List the back-end development tasks required." By explicitly separating front-end and back-end tasks, you guide the model to organize its answer. The LLM might respond with something like:
Front-end tasks: Design the UI layout for feature X, implement responsive components, handle form inputs and validation, etc.
Back-end tasks: Set up API endpoints for feature X, implement business logic for data processing, update database schema and queries, etc.
This approach yields focused lists for each aspect of the project, ensuring neither side is overlooked. Breaking the prompt into steps or sections like this produces more organized answers and fewer errors on complex problems. It mirrors how a human would approach the task methodically, which helps the AI follow suit.
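If you are calling a model from code, the same decomposition can be expressed as one sub-prompt per step. The sketch below assumes the openai Python package (v1.x client) with an API key in the environment; the model name and step wording are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

feature = "feature X"
steps = [
    "List the front-end development tasks required.",
    "List the back-end development tasks required.",
]

for i, step in enumerate(steps, start=1):
    prompt = f"We are building {feature} for our web app. Step {i}: {step}"
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- Step {i} ---")
    print(response.choices[0].message.content)
```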
Guide the logical flow. You can explicitly instruct the model to think or answer step-by-step. Phrases like "think step by step" or enumerating the steps in the prompt encourage the model to follow a sequence. For example, if you have a complicated question, you might say: "Explain your reasoning step by step:" before the actual query. This is especially useful for multi-step reasoning (like math problems or troubleshooting) where you want the model's intermediate thought process.
Real-world use case: When asked a math word problem, adding "Show how you arrive at the answer, step by step." causes the model to list each calculation or logical step, improving transparency and often the accuracy of the result. By structuring the prompt to break down the reasoning, you reduce the model's cognitive load and get more reliable answers for complex tasks.
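In code, this is often nothing more than appending the reasoning instruction to the question before sending it. A minimal sketch; the wording of the suffix is one of several that work:

```python
def with_step_by_step(question: str) -> str:
    """Append an explicit reasoning instruction so the model shows intermediate steps."""
    return f"{question}\n\nShow how you arrive at the answer, step by step."

print(with_step_by_step(
    "A train travels 120 km in 1.5 hours. What is its average speed in km/h?"
))
```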
Prompt engineering is an iterative process. Rarely will your first prompt be the perfect one for a complicated task. A best practice is to start with an initial prompt, observe the output, and then refine the prompt based on how the model responded. Treat each model response as feedback about your prompt clarity. If the output is missing something or goes off track, tweak your instructions, add detail, or rephrase ambiguities, and try again. This loop of trial and improvement is key to honing an effective prompt.
Example: Imagine you initially prompt an LLM: "Write a short story about a robot." The result might be a very basic story with a generic plot. By reviewing that output, you realize you wanted something more specific or engaging. You then refine the prompt to: "Write a 500-word story about a sentient robot discovering human emotions for the first time. Include themes of self-discovery and the ethics of AI." This iterated prompt is more detailed about the desired content and themes. The outcome? The AI produces a much more nuanced and rich story that aligns with your intent – for instance, a poignant tale of a robot grappling with feelings, touching on ethical questions. This before-and-after demonstrates how adding specific guidance through iteration can dramatically improve the relevance and quality of the output.
When refining, adjust one aspect at a time so you can pinpoint what changes make a difference. For instance, if the response was too general, add context or constraints; if it was too verbose, explicitly limit the length next round. Keep notes of which phrasing yielded better results – over time you'll build a sense of what works best (your "master prompt"). Remember to save earlier versions in case a new change unintentionally makes things worse, so you can revert. Each iteration should bring the prompt closer to producing the exact kind of answer you need. Experienced prompt engineers often refine prompts multiple times, especially for complex queries, treating it as a gradual optimization process.
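A lightweight way to keep that history is to log each prompt version together with the output it produced and a short note on what to change next. The sketch below is one possible convention, not a required workflow; the file name and field names are illustrative:

```python
import json
from datetime import datetime

prompt_log = []

def record_attempt(prompt: str, output: str, notes: str = "") -> None:
    """Store one prompt/output pair so earlier versions can be compared or reverted to."""
    prompt_log.append({
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "prompt": prompt,
        "output": output,
        "notes": notes,
    })

record_attempt(
    "Write a short story about a robot.",
    "<model output here>",
    notes="Too generic; specify length, emotions, and ethics themes next round.",
)

with open("prompt_history.json", "w") as f:
    json.dump(prompt_log, f, indent=2)
```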
Set the stage for the answer. LLMs can produce more useful responses when given context about who they are or who the answer is for. By assigning the model a role or persona, you inform its tone and level of detail. For example, starting your prompt with "You are an experienced climate scientist. Explain the effects of rising sea levels... " will likely yield a more technical and authoritative answer than just "Explain the effects of rising sea levels." In a real-world scenario, prompting "As a customer support agent, respond to the following complaint..." makes the model adopt a polite, helpful tone appropriate for customer service. This role-play technique (also called role prompting) tailors the output to a specific voice or expertise, which is valuable if you need the answer in a certain style (e.g. a medical explanation vs. a casual conversation).
Specify the audience or tone if needed. If you want the answer in a particular tone or complexity level, say so explicitly. For instance, "Explain quantum computing to a high school student in a friendly, simple tone." The model will adjust its language accordingly (using simpler words, a teaching tone, etc.). Words like formal, informal, academic, humorous, optimistic, etc., included in your prompt act as signals for style. Example: "Give a professional, formal summary of the meeting transcript…" will differ from asking for a "casual summary of the transcript." By guiding the style and audience, you steer the model toward the kind of output you envision.
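With a chat-style API, the role or persona typically goes in the system message while the actual question goes in the user message. A minimal sketch assuming the openai Python package (v1.x client); the model name and persona wording are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        # The system message sets the persona, audience, and tone.
        {"role": "system", "content": (
            "You are an experienced climate scientist. "
            "Explain concepts accurately, in a friendly tone a high school student can follow."
        )},
        {"role": "user", "content": "Explain the effects of rising sea levels on coastal cities."},
    ],
)
print(response.choices[0].message.content)
```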
Use affirmative instructions and explicit constraints. Clearly tell the model what to do, not just what not to do. Prompts that only say "don't do X" can lead to the model either ignoring the instruction or focusing excessively on avoiding X while neglecting what you actually need. It's more effective to phrase constraints in a positive way. For example, instead of: "Do not include any confidential information in the answer," rephrase as: "Provide an answer without mentioning any confidential information." And rather than "Don't be too technical," say "Explain this in simple, non-technical terms." This way, you're guiding the model toward the behavior you want.
Similarly, if there are certain formats or content to avoid, it can help to also tell the model what alternative approach to take. For instance: "Avoid using first-person perspective; instead, describe the process from a third-person point of view." This dual instruction ("avoid X, do Y instead") leaves less room for the AI to guess the desired style.
Nudge the model with key words for specialized outputs. Tiny prompt tweaks can guide the format of the answer. If you expect a block of code, starting the prompt or an example with a relevant keyword can cue the model. For example, an OpenAI tip for code generation is to begin with an import statement or code delimiter – e.g. saying "Write a Python function that does X. Begin with import ..." nudges the model to output actual Python code. In practice, a prompt like:
# Task: Calculate factorial of a number.
# Requirements:
# - Provide a Python function.
# - Show usage example.
import
will likely lead the model to continue writing Python code (it sees the import and follows through with code). This strategy works similarly for SQL queries (e.g., starting with SELECT in the prompt). The general idea is to include a small trigger phrase or token that the model associates with a particular format, thereby guiding the output structure (code, lists, etc.) without having to say "Output code" explicitly.
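For reference, the kind of completion such a code-cued prompt tends to elicit looks something like the following. This is an illustrative continuation written by hand, not a guaranteed model output:

```python
import math  # the model typically continues the leading "import" cue with working code

def factorial(n: int) -> int:
    """Return n! for a non-negative integer n."""
    if n < 0:
        raise ValueError("factorial is undefined for negative numbers")
    return math.factorial(n)

# Usage example, as requested in the prompt
print(factorial(5))  # 120
```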
Show, don't just tell. Providing examples in your prompt can greatly improve the model's understanding of the task and the format you want. This is known as few-shot prompting – you give a few input-output pairs (that demonstrate the task) before asking the model to produce an answer for a new input. By showing the model what a correct output looks like, you allow it to infer the pattern or rules from those examples. This technique is especially useful when the task output has a specific style or structure that might not be obvious from a one-line instruction.
Example (Few-shot): If you want the model to translate sentences, you might prompt with:
English: Where is the library?
Spanish: ¿Dónde está la biblioteca?
English: How are you?
Spanish: ¿Cómo estás?
English: What time is it?
Spanish:
Here, by giving two example translations, the model learns the pattern that after "English: X", the next line should be "Spanish: [translation]". When it reaches the third query, it will follow the demonstrated format and produce the Spanish translation for "What time is it?". Real-world usage: developers often include 1-3 examples in prompts (like Q&A pairs, or input -> output cases) to guide the model's output format and style, essentially providing a miniature training set within the prompt. Make sure your examples are relevant and reflect the diversity of cases you expect, as the model will pattern-match to them closely.
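Programmatically, a few-shot prompt is just the demonstration pairs concatenated ahead of the new query. A minimal sketch; the helper name and labels are illustrative:

```python
examples = [
    ("Where is the library?", "¿Dónde está la biblioteca?"),
    ("How are you?", "¿Cómo estás?"),
]

def build_few_shot_prompt(query: str) -> str:
    """Prepend demonstration pairs so the model can infer the English -> Spanish pattern."""
    shots = "\n".join(f"English: {en}\nSpanish: {es}" for en, es in examples)
    # Leave the final "Spanish:" label empty so the model fills in the translation.
    return f"{shots}\nEnglish: {query}\nSpanish:"

print(build_few_shot_prompt("What time is it?"))
```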
Provide an output template. Another way to leverage examples is to literally outline the expected output format in the prompt. If you need the answer in a certain schema (a JSON, an outline, bullet points with specific labels, etc.), demonstrate that format so the model can mimic it. For instance, say you want to extract information and have it organized under specific headings. Your prompt could be:
"Extract the following from the text: Company Names, People Names, Topics, Themes. Format the output exactly like this example: Company Names: Name1, Name2, Name3 People Names: Person1, Person2 Topics: Topic1, Topic2 Themes: Theme1, Theme2 Text: """[input text]"""
By including a dummy template (with placeholders or examples) under a "Desired format" section, you show the model the exact layout you expect. The model is likely to fill in the blanks following that structure. In practice, this makes it much easier to parse the model's output later because it reliably adheres to the format you provided. It's essentially priming the model with an example answer shape. Research and experience indicate that models "respond better when shown specific format requirements", as it removes ambiguity about how the answer should look.
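Because the template pins down the layout, the reply is also easy to consume downstream. The sketch below parses output of the "Label: item1, item2" shape shown above; it assumes the model actually followed the template, so real code would add error handling:

```python
def parse_labeled_lines(output: str) -> dict[str, list[str]]:
    """Parse lines of the form 'Label: item1, item2' into a dict of lists."""
    result = {}
    for line in output.splitlines():
        if ":" in line:
            label, items = line.split(":", 1)
            result[label.strip()] = [item.strip() for item in items.split(",") if item.strip()]
    return result

model_output = (
    "Company Names: Acme Corp, Globex\n"
    "People Names: Jane Doe\n"
    "Topics: earnings, expansion\n"
    "Themes: growth, competition"
)
print(parse_labeled_lines(model_output))
```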
Use zero-shot vs. few-shot appropriately. If you have no examples to give (zero-shot), be extra sure to describe the task and output clearly. If zero-shot results are unsatisfactory, switch to few-shot by adding examples. For instance, when extracting keywords from text, an initial zero-shot prompt might just say "Extract keywords from the text:" which could produce inconsistent results. A refined prompt with a couple of examples (few-shot) explicitly showing Text -> Keywords pairs will likely yield a more consistent list of keywords for a new piece of text. Always test a prompt without examples first to see if the model's built-in knowledge suffices; if not, demonstrating the task through examples is a powerful next step.
Use neutral and precise language. The phrasing of your prompt can inadvertently introduce bias or ambiguity into the model's response. To get objective outputs, avoid loaded language or stereotypes in your instructions. For example, a prompt like "Why are doctors better at their jobs than nurses?" is leading and contains an implied bias (and could produce a problematic answer). A better approach would be a neutral prompt: "Compare the roles and responsibilities of doctors versus nurses." This framing doesn't assume one is "better" and invites a balanced explanation. Always review your prompt for any unintended bias – ensure you're not nudging the model toward a prejudiced viewpoint by the way you ask the question.
If the topic is potentially sensitive or could lead to biased content, you can explicitly instruct the model to be impartial. Including a line like "Ensure your answer is unbiased and free of stereotypes." sets a clear expectation. While the model might have inherent biases from training data, a direct prompt like that often helps it self-censor and double-check its output for fairness.
Avoid ambiguity. Ambiguous instructions often yield confusing answers or force the AI to make assumptions. Always clarify what each pronoun or placeholder refers to, and specify any context that might be missing. For example, if you ask "Do they make a good team?" the model might not know who "they" refers to unless you've mentioned it clearly earlier. Instead, ask "Do Alice and Bob make a good team?" if those are the subjects in question. Similarly, if your prompt says "List items from the last meeting" – does "last meeting" refer to a specific date or just the most recent one? Adding context like "List the action items from the project kickoff meeting (Jan 5, 2025)." removes doubt.
Quantify descriptors to pin down meaning. Words like "some," "several," or "recent" can be interpreted in many ways. If you want only two examples, say "two examples" instead of "some examples." If you say "recent research", consider specifying a timeframe like "research in the past 5 years" to guide the AI. Real-world tip: one team found that replacing an instruction "finish the summary quickly" with "finish the summary within 5 minutes" made the AI focus on brevity in content rather than rushing and potentially dropping detail. The clearer version set a concrete expectation (time frame), whereas "quickly" was open to interpretation.
Before finalizing a prompt, it can help to peer-review or self-review it for any possible double meanings or biases. Ask yourself: "Could this prompt be interpreted in more than one way? Could it inadvertently lead to an inappropriate or one-sided answer?" If yes, refine the wording. Over time, practicing this mindfulness in prompt writing will lead to more consistent and ethically sound outputs from the model.
Beyond the core strategies above, here are some extra best practices from experts and research to elevate your prompt crafting:
Provide necessary context: LLMs only know what was in their training data and what you include in the prompt; they have no access to your private documents, internal data, or project specifics unless you supply them. Always include any relevant background or data the model will need to perform the task. For instance, if you want an analysis of a dataset, describe what the data is or include a snippet. If you're continuing a conversation, briefly recap important points so the model doesn't lose the thread. Context helps the AI stay on target and generate more accurate responses.
Mind the context window: Keep prompts within the model's token limits and be concise. Every model has a context size (e.g., ~4096 tokens for GPT-3.5) beyond which it cannot remember. Extremely long prompts or too much extraneous text may get truncated or cause the model to ignore important details. Optimize by trimming unnecessary words or details and focusing on the core information. If you have a lot of info, consider summarizing it or using multiple turns. A good rule is to simplify and compress any lengthy background into essential points, so you stay within limits while still giving the needed context.
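You can check whether a prompt fits before sending it by counting tokens locally. The sketch below assumes the tiktoken package; the 4096-token budget mirrors the GPT-3.5 example above, and the reserve for the answer is an arbitrary illustrative choice:

```python
import tiktoken

CONTEXT_LIMIT = 4096        # e.g., a GPT-3.5-class context window
RESERVED_FOR_ANSWER = 500   # leave room for the model's reply

def fits_in_context(prompt: str, model: str = "gpt-3.5-turbo") -> bool:
    """Return True if the prompt leaves enough room for the answer within the context window."""
    encoding = tiktoken.encoding_for_model(model)
    prompt_tokens = len(encoding.encode(prompt))
    return prompt_tokens + RESERVED_FOR_ANSWER <= CONTEXT_LIMIT

print(fits_in_context("Summarize the following report: ..."))
```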
Experiment with prompt variations: Don't hesitate to try different phrasings or techniques (questions, directives, role instructions) to see which yields the best results. Sometimes reordering sentences or using a slightly different keyword can significantly change the outcome. For important use cases, it can be worth A/B testing two prompt wordings to compare which one the model handles better. Continual experimentation is part of prompt engineering – LLM behavior can be non-intuitive, so a tweak that isn't obviously better may still produce a better result. Keep notes on which prompts worked well for future reference.
Start simple, then add complexity: Begin with a straightforward prompt before piling on detailed instructions or multiple examples. If the simple prompt works, great – if not, you can progressively add more guidance (examples, format specs, etc.) as needed. This incremental approach helps identify the minimum instructions required for a good outcome, and avoids over-constraining the model with unnecessary rules from the start.
Use the latest and most suitable model: If you have access to different LLM versions, choose the one best suited to your task. Newer models are often "easier to prompt engineer" and more likely to follow instructions accurately. For example, GPT-4 generally understands complex prompts better than GPT-3. Similarly, some models are tuned for code, others for chat – use the appropriate model if possible (e.g., use a code-specialized model or mode when asking programming questions). An advanced model might give good results with a shorter prompt where an older one needed more explicit detail.
Adjust settings when available: In some scenarios (like using an API or certain interfaces), you can control generation parameters such as temperature or max tokens. These aren't part of the prompt text, but they influence outcomes. A low temperature (0–0.3) will make the model more deterministic and focused – ideal for factual answers or when you want to avoid creative deviations. A higher temperature can be used for brainstorming or creative writing to get more varied results. Always set max_tokens (or equivalent) high enough to not cut off the answer, but if you want to enforce brevity, you can limit it or explicitly tell the model a word/paragraph limit. While not "prompt text" per se, these settings complement your prompt design to guide the style of output.
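In API terms these settings sit alongside the prompt rather than inside it. A minimal sketch assuming the openai Python package (v1.x client); the model name and the specific values illustrate the trade-off rather than recommend defaults:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt = 'Summarize the key risks in the report below.\n\nReport: """[paste report here]"""'

response = client.chat.completions.create(
    model="gpt-4o",     # illustrative model name
    temperature=0.2,    # low temperature: more deterministic, suited to factual answers
    max_tokens=300,     # caps the reply length; raise it if answers get cut off
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```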
Iterate and verify outputs: After the model responds, verify that it meets your requirements (especially for critical applications). If something is off – e.g., the format isn't quite right or it included something you wanted excluded – refine your prompt and try again (as discussed in Tip 3). For critical or sensitive tasks, consider using a second prompt to double-check the first output (for example, ask the model to critique or verify the answer it just gave, which can catch errors or biases in the response). This kind of self-reflection prompting can improve the final quality. Always review the content for correctness and safety, since LLMs might produce confident-sounding but incorrect answers.
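A second-pass check can be as simple as feeding the first answer back with a critique instruction. A minimal sketch assuming the openai Python package (v1.x client); the helper and the critique wording are illustrative:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

draft = ask("Explain how TLS certificate validation works, in 5 bullet points.")

# Second pass: ask the model to critique and correct its own answer.
review = ask(
    "Review the answer below for factual errors, missing caveats, or bias. "
    "List any problems you find, then provide a corrected version.\n\n"
    f'Answer: """{draft}"""'
)
print(review)
```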
By following these best practices – from structuring your prompts clearly, to breaking down tasks, providing examples, and refining iteratively – you can significantly improve the effectiveness of your interactions with LLMs. Prompt crafting is part art and part science: use these guidelines as a starting framework, and continue to refine your technique with experience. A well-crafted prompt can save time, reduce frustration, and yield outputs that closely match your needs. Happy prompting!