What is prompt engineering?


Prompt engineering is the practice of designing and refining the instructions you give to AI models to get the best possible results. It is part writing, part experimentation, and part understanding how models think. A well-crafted prompt can be the difference between a useless response and a brilliant one.

Beyond Basic Prompting

────────────────────────────────────────

Everyone who has used ChatGPT has written prompts. But prompt engineering goes deeper. It is the systematic study of how input phrasing, structure, context, and constraints affect model output. Professional prompt engineers iterate on prompts the way software engineers iterate on code, testing variations, measuring results, and optimizing for specific outcomes.

The good news is that the core techniques are learnable. You do not need a machine learning background. You need clear thinking, an understanding of what models respond to, and a willingness to experiment.

Key Techniques

────────────────────────────────────────

### Zero-Shot Prompting

You ask the model to perform a task with no examples. Just the instruction.

"Classify this customer email as positive, negative, or neutral."

This works well for tasks the model has seen extensively during training. For common tasks like classification, summarization, and translation, zero-shot prompting is often sufficient.
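As a minimal sketch, a zero-shot prompt is just the instruction plus the input. The wording and label format below are illustrative choices, not a fixed recipe:

```python
def build_zero_shot_prompt(email_text: str) -> str:
    """Build a zero-shot classification prompt: instruction only, no examples."""
    return (
        "Classify this customer email as positive, negative, or neutral.\n"
        "Respond with only the label.\n\n"
        f"Email: {email_text}"
    )

prompt = build_zero_shot_prompt(
    "My order arrived two weeks late and the box was crushed."
)
```

Asking for "only the label" makes the output easy to parse programmatically, which matters once prompts feed into downstream code.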

### Few-Shot Prompting

You provide a few examples of the task before asking the model to perform it. This is one of the most reliable techniques for improving output quality.

You show 2-5 examples of inputs and desired outputs, then present the actual input. The model follows the pattern established by your examples. Few-shot prompting is especially powerful for tasks with specific formatting requirements or domain-specific conventions.
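A sketch of assembling a few-shot prompt from labeled examples (the `Email:`/`Sentiment:` framing is one arbitrary convention; the point is that the final input mirrors the examples exactly):

```python
def build_few_shot_prompt(examples, new_input):
    """Prepend worked examples so the model continues the pattern."""
    blocks = [f"Email: {text}\nSentiment: {label}\n" for text, label in examples]
    # The real input uses the same framing, with the answer left blank.
    blocks.append(f"Email: {new_input}\nSentiment:")
    return "\n".join(blocks)

examples = [
    ("Love the new dashboard, great work!", "positive"),
    ("The app crashes every time I log in.", "negative"),
    ("What are your support hours?", "neutral"),
]
prompt = build_few_shot_prompt(
    examples, "Shipping was fast but the item was damaged."
)
```

Ending the prompt at `Sentiment:` nudges the model to complete the pattern with just a label rather than a full sentence.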

### Chain-of-Thought (CoT)

You ask the model to work through its reasoning step by step before providing an answer. Adding "Let's think through this step by step" or providing an example with explicit reasoning dramatically improves performance on math, logic, and complex analysis tasks.

Chain-of-thought prompting works because it forces the model to allocate more computation to the problem. Each reasoning step is generated as output tokens, and each token generation step gives the model another chance to process information.
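In practice, chain-of-thought pairs two pieces: a prompt that elicits the reasoning, and a parser that extracts only the final answer. A minimal sketch (the `Answer:` convention is an illustrative choice):

```python
import re

def build_cot_prompt(question):
    """Ask for step-by-step reasoning, ending with a parseable answer line."""
    return (
        f"{question}\n\n"
        "Let's think through this step by step, then give the final answer "
        "on a line starting with 'Answer:'."
    )

def extract_final_answer(model_reply):
    """Pull the final answer out of a step-by-step reply; None if missing."""
    match = re.search(r"^Answer:\s*(.+)$", model_reply, re.MULTILINE)
    return match.group(1).strip() if match else None

reply = "Step 1: 17 * 3 = 51.\nStep 2: 51 + 4 = 55.\nAnswer: 55"
extract_final_answer(reply)  # -> "55"
```

The reasoning tokens do the computational work; the structured final line keeps the output machine-readable.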

### Self-Consistency

Run the same prompt multiple times and take the majority answer. This reduces the impact of random variations in model output. It is particularly useful for math and reasoning tasks where the model might occasionally make errors.
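The majority vote can be sketched in a few lines. Here `sample_fn` is a hypothetical stand-in for an API call made with temperature above zero, so repeated calls can differ:

```python
from collections import Counter

def self_consistent_answer(sample_fn, prompt, n=5):
    """Sample the same prompt n times and return the majority answer
    along with the fraction of samples that agreed with it."""
    answers = [sample_fn(prompt) for _ in range(n)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n

# Stubbed samples for illustration; a real sample_fn would call the model.
fake_samples = iter(["55", "55", "54", "55", "55"])
answer, agreement = self_consistent_answer(
    lambda p: next(fake_samples), "What is 17*3+4?"
)
```

The agreement fraction is useful on its own: low agreement signals a question the model finds genuinely hard.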

### Tree-of-Thought

An extension of chain-of-thought where the model explores multiple reasoning paths, evaluates them, and selects the most promising one. This is useful for complex problems with multiple possible approaches.
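One common way to implement this is a beam search over partial reasoning paths. In the sketch below, `expand` and `score` are hypothetical stubs; in a real system both would be model calls (one proposing next steps, one evaluating them):

```python
def tree_of_thought(expand, score, beam_width=2, depth=2):
    """Beam search over reasoning paths: expand each partial path with
    candidate next steps, score the candidates, keep only the best few."""
    frontier = [[]]  # each path is a list of reasoning steps
    for _ in range(depth):
        candidates = [path + [step]
                      for path in frontier
                      for step in expand(path)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return frontier[0]

# Toy stubs: always propose two steps, prefer paths containing "step A".
best = tree_of_thought(expand=lambda path: ["step A", "step B"],
                       score=lambda path: path.count("step A"))
```

The key difference from plain chain-of-thought is that weak reasoning branches are pruned before they can lead the model to a bad conclusion.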

System Prompts and Persona Setting

────────────────────────────────────────

Most providers support a system prompt, which is a special message that sets the overall behavior, tone, and constraints for the model. System prompts are powerful because they persist across the entire conversation.

Effective system prompts:

  • Define who the model is and what it does
  • Set the tone and communication style
  • Establish constraints and guardrails
  • Specify output format preferences
  • Provide domain context

A system prompt like "You are a senior Python developer. Answer questions concisely with code examples. Always mention potential pitfalls" consistently produces better technical responses than no system prompt at all.
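In an OpenAI-style chat API, the system prompt is simply the first message in the payload, with role `system`. A minimal sketch using the example above:

```python
def make_messages(user_question):
    """Chat payload: the system message persists for the whole conversation
    and shapes every subsequent response."""
    return [
        {
            "role": "system",
            "content": (
                "You are a senior Python developer. Answer questions concisely "
                "with code examples. Always mention potential pitfalls."
            ),
        },
        {"role": "user", "content": user_question},
    ]

messages = make_messages("How do I read a file line by line?")
```

On each later turn, the prior messages are re-sent with the new user message appended, so the system message keeps applying.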

Temperature, Top-p, and Other Parameters

────────────────────────────────────────

Beyond the prompt text itself, generation parameters significantly affect output:

[Temperature] controls randomness. A temperature of 0 produces the most deterministic output, always choosing the highest-probability token. Higher values (0.7-1.0) introduce more variety and creativity. For factual tasks, use low temperature. For creative tasks, use higher temperature.

[Top-p (nucleus sampling)] controls diversity differently. Instead of scaling probabilities, it considers only the most likely tokens that together account for a probability mass of p. A top-p of 0.9 means the model considers tokens until their cumulative probability reaches 90%.

[Max tokens] caps the response length. Setting this appropriately prevents the model from rambling and controls costs.

[Frequency penalty and presence penalty] (available in the OpenAI API) discourage repetition. Frequency penalty reduces the likelihood of tokens that have already appeared, proportional to how often they have appeared. Presence penalty applies a flat penalty to any token that has appeared at all.
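These parameters compose in a specific order inside the decoder: penalties adjust the raw scores, temperature scales them, and top-p truncates the resulting distribution before sampling. A toy single-step sketch (real implementations work on full vocabularies, but the interactions are the same):

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_p=0.9, counts=None,
                      freq_penalty=0.0, presence_penalty=0.0, rng=random):
    """Toy decoding step. logits: dict token -> raw score;
    counts: dict token -> times already generated in this response."""
    counts = counts or {}
    # 1. Frequency/presence penalties adjust logits before sampling.
    adj = {t: l - freq_penalty * counts.get(t, 0)
                - presence_penalty * (1 if counts.get(t, 0) else 0)
           for t, l in logits.items()}
    # 2. Temperature 0 means pure argmax: always the top token.
    if temperature == 0:
        return max(adj, key=adj.get)
    # Softmax with temperature-scaled logits.
    probs = {t: math.exp(l / temperature) for t, l in adj.items()}
    total = sum(probs.values())
    probs = {t: p / total for t, p in probs.items()}
    # 3. Top-p: keep the smallest set of tokens whose cumulative mass >= p.
    nucleus, cum = [], 0.0
    for t, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        nucleus.append((t, p))
        cum += p
        if cum >= top_p:
            break
    tokens, weights = zip(*nucleus)
    return rng.choices(tokens, weights=weights, k=1)[0]
```

For example, a heavily repeated token can be penalized out of the argmax position, and a very small top-p collapses the nucleus to just the most likely token.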

Prompt Templates and Reusability

────────────────────────────────────────

Production applications rarely use hardcoded prompts. Instead, they use prompt templates with variables that get filled in at runtime.

A classification template might look like: "Classify the following {document_type} into one of these categories: {categories}. Document: {document_text}"

Good prompt templates are:

  • [Parameterized]: Key values are variables, not hardcoded
  • [Versioned]: Track changes to prompts like you track code changes
  • [Tested]: Each template version is tested against a suite of examples
  • [Documented]: Include comments explaining why specific phrasing was chosen
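The four properties above can be sketched as a small wrapper around Python's `str.format`. The class name, version string, and notes below are illustrative, not a standard library:

```python
from dataclasses import dataclass

@dataclass
class PromptTemplate:
    """A parameterized, versioned prompt with an explanatory note."""
    version: str
    template: str
    notes: str = ""  # document why this phrasing was chosen

    def render(self, **kwargs):
        return self.template.format(**kwargs)

classify = PromptTemplate(
    version="1.2.0",
    template=("Classify the following {document_type} into one of these "
              "categories: {categories}. Document: {document_text}"),
    notes="Imperative phrasing tested better than question phrasing.",
)
prompt = classify.render(
    document_type="email",
    categories="positive, negative, neutral",
    document_text="Thanks, the refund arrived today.",
)
```

Bumping `version` on every wording change is what lets you tie evaluation results back to the exact prompt that produced them.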

Testing and Iterating on Prompts

────────────────────────────────────────

Prompt engineering is an empirical practice. You cannot predict from first principles whether "Classify this text" will outperform "What category does this text belong to?" You have to test.

Build an evaluation dataset of inputs with known correct outputs. Run your prompt against the dataset and measure accuracy, consistency, and quality. Change one variable at a time and measure again.
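The core evaluation loop is simple. In this sketch `model_fn` is a hypothetical stand-in for a real API call, stubbed here so the example is self-contained:

```python
def evaluate_prompt(prompt_fn, model_fn, dataset):
    """Run a prompt over labeled examples and return accuracy."""
    correct = 0
    for text, expected in dataset:
        prediction = model_fn(prompt_fn(text)).strip().lower()
        correct += prediction == expected
    return correct / len(dataset)

dataset = [
    ("great product, works perfectly", "positive"),
    ("broken on arrival", "negative"),
]
# Stub model for illustration; a real model_fn would call the API.
stub_model = lambda prompt: "positive" if "great" in prompt else "negative"
accuracy = evaluate_prompt(lambda t: f"Classify: {t}", stub_model, dataset)
```

Swapping in a second `prompt_fn` and re-running gives a direct A/B comparison, which is exactly the one-variable-at-a-time discipline described above.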

Tools like Promptfoo, LMSYS Chatbot Arena, and provider-specific playgrounds help with this process. Some teams build custom evaluation pipelines that run automatically when prompts change.

Common Patterns

────────────────────────────────────────

[Classification]: Give clear categories, optionally with descriptions and examples. Ask the model to respond with only the category name.

[Extraction]: Specify exactly what fields to extract and the desired format. Use structured output for reliability.

[Summarization]: Specify the desired length, audience, and focus. "Summarize this article in 3 bullet points for a technical audience, focusing on the methodology."

[Transformation]: Clearly define the input format, desired output format, and any rules for the transformation. "Convert this SQL query to a MongoDB query. Preserve all filtering conditions."

[Analysis]: Break complex analysis into steps. Ask the model to identify relevant factors, evaluate each one, and then synthesize a conclusion.
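The classification pattern above, for instance, pairs naturally with a validation step, since models sometimes reply with a sentence instead of a bare label. A sketch with hypothetical categories:

```python
CATEGORIES = ["billing", "technical", "account", "other"]

def build_classification_prompt(text):
    """Clear categories, plus an instruction to answer with the name only."""
    category_list = "\n".join(f"- {c}" for c in CATEGORIES)
    return (
        "Classify the support ticket into exactly one category:\n"
        f"{category_list}\n\n"
        "Respond with only the category name.\n\n"
        f"Ticket: {text}"
    )

def parse_category(model_reply):
    """Validate the reply; fall back to 'other' on anything unexpected."""
    label = model_reply.strip().lower()
    return label if label in CATEGORIES else "other"
```

The explicit fallback means a malformed reply degrades gracefully instead of crashing the pipeline.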

Provider-Specific Tips

────────────────────────────────────────

Different models respond differently to prompting techniques:

[OpenAI GPT models] respond well to explicit formatting instructions and few-shot examples. They follow system prompts closely.

[Anthropic Claude] excels with detailed, nuanced instructions. Claude responds well to being given a role and explicit reasoning frameworks. XML tags in prompts can help structure input for Claude.

[Google Gemini] handles multimodal prompts well. When working with images or video, be specific about what aspects to focus on.

[Mistral models] follow the OpenAI prompting style closely, making it easy to port prompts.

[Open source models] vary widely. Smaller models need simpler, more explicit prompts. Larger models like Llama 3 70B can handle complex instructions similar to commercial models.

The Role of Prompt Engineering as Models Improve

────────────────────────────────────────

There is an ongoing debate about whether prompt engineering will become less important as models get smarter. The reality is nuanced. Basic prompting is getting easier since newer models understand intent better with less guidance. But sophisticated prompt engineering for complex tasks, evaluation, and production reliability remains critical.

The skillset is evolving from "how do I get the model to understand me" toward "how do I design reliable, testable, maintainable AI workflows." That is a shift in emphasis, not a reduction in importance. As models become more capable, the ceiling for what good prompt engineering can accomplish rises with them.
