ML Playground / Prompt Engineering View Notebook

Prompt Engineering Basics

The practice of designing effective prompts to communicate with LLMs -- from zero-shot to advanced techniques like Tree of Thoughts and ReAct.

What is Prompt Engineering?

Prompt engineering is the practice of designing and refining inputs (prompts) to get accurate, relevant, and useful outputs from large language models (LLMs) like ChatGPT, Claude, or Gemini.

LLMs are sensitive to wording, structure, and context. A poorly crafted prompt leads to vague, irrelevant, or verbose answers. Good prompts produce consistent, reliable results.

How to Write a Good Prompt

Prompt Design Checklist
  1. Define the Role -- Tell the model who it is or how to behave
  2. State the Goal Clearly -- Specify exactly what you want done
  3. Provide Context/Constraints -- Give necessary details, limits, preferences
  4. Specify Output Format -- Table, JSON, bullet points, etc.
  5. Control Style/Tone -- Formal, casual, technical, creative
  6. Test and Refine -- Run the prompt, check outputs, tweak
  7. Keep it Clear and Concise -- Avoid ambiguity
prompt = [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "Who won the world series in 2020?" } ]

Basic Prompting Techniques

Zero-Shot Prompting

Ask the AI to perform a task with only instructions -- no examples. The model relies on its pre-trained knowledge.

prompt = [ { "role": "system", "content": "Goal: You are a trip planner. " "Your job is to help users plan trips and create itineraries. " "Ask the user if they already have a destination in mind. " "Create the trip plan based on their preferences, duration, and budget." } ]

One-Shot Prompting

Provide one example of the task before asking. The example shows the pattern, formatting, style, or tone you expect.

prompt = [ { "role": "system", "content": '{"goal": "You are a fitness coach chatbot.", ' '"examples": [{"user_query": "Create a 3-day workout plan", ' '"AI_answer": "Day 1: 20 min cardio... Day 2: Rest... Day 3: Strength training"}], ' '"output_format": {"question": "", "Answer": ""}}' } ]

Few-Shot Prompting

Provide 2-5 examples. Helps the model learn patterns when instructions alone are not enough.

prompt = [ { "role": "system", "content": '{"goal": "You are a programming chatbot.", ' '"examples": [' '{"user_query": "what is python", ' '"AI_answer": "Python is a high-level programming language."}, ' '{"user_query": "explain about IPL", ' '"AI_answer": "I am a Programming chatbot"}], ' '"output_format": {"question": "", "Answer": ""}}' } ]

Chain-of-Thought (CoT) Prompting

Instruct the model to think step-by-step before giving the final answer. This improves accuracy on math, logic, and multi-step reasoning tasks.

Without CoT: Q: 3 apples + 2 more? A: 5 With CoT: Q: 3 apples + 2 more? Show reasoning. A: Step 1: Start with 3. Step 2: Buy 2 more. Step 3: Total = 3 + 2 = 5 Answer: 5
prompt = [ { "role": "system", "content": "Goal: You are a helpful assistant. " "Always solve problems by explaining your reasoning step by step. " "Example: Q: 2 pencils + 3 more? " "A: Step 1: Start with 2. Step 2: Buy 3. Step 3: Total = 5. Answer: 5" }, { "role": "user", "content": "A bookstore sold 15 books on Monday and 20 on Tuesday. " "How many total? Explain." } ]

Advanced Prompting Techniques

Self-Consistency

Ask the same question multiple times (with temperature > 0), collect all answers, and pick the most common one. Like asking 5 friends to solve a math problem and trusting the majority.

import openai from collections import Counter def self_consistency(question, num_attempts=5): # Ask the same question multiple times and return the most common answer all_answers = [] for i in range(num_attempts): response = openai.chat.completions.create( model="gpt-4", messages=[ {"role": "user", "content": question + " Think step by step."} ], temperature=0.7 # Add randomness for different reasoning paths ) answer = response.choices[0].message.content final_answer = extract_final_answer(answer) all_answers.append(final_answer) print(f"Attempt {i+1}: {final_answer}") # Count votes and return majority vote_counts = Counter(all_answers) winner = vote_counts.most_common(1)[0][0] print(f"Winner (majority): {winner}") return winner def extract_final_answer(response_text): # Extract the last number from the response import re numbers = re.findall(r'\d+', response_text) return numbers[-1] if numbers else response_text question = "A store has 23 apples. 17 are sold. How many are left?" result = self_consistency(question, num_attempts=5)
Good ForNot Good For
Math problems, logic puzzlesCreative writing, essays
Factual questions, code debuggingOpen-ended or subjective questions

Tree of Thoughts (ToT)

The AI explores multiple reasoning paths like branches of a tree, evaluates which are promising, and backtracks if a path leads to a dead end. Unlike CoT (single linear path), ToT explores in parallel.

CoT: Pick one path and follow it. Wrong step = stuck. ToT: Explore multiple paths. Evaluate. Backtrack if needed. Result: GPT-4 solved only 4% of "Game of 24" with CoT, but 74% with ToT.
ToT Step-by-Step
  1. Generate multiple initial thoughts (branches)
  2. Evaluate each thought -- is it promising?
  3. Expand promising paths, prune bad ones
  4. Backtrack if a path fails, try alternatives
  5. Select the path that reaches the correct answer

ReAct (Reasoning + Acting)

The AI thinks out loud (Reasoning) and uses external tools (Acting) to solve problems. Instead of guessing, it can search the web, do calculations, or look up information.

Thought --> Action --> Observation --> Thought --> ... "I need to find X" --> search[query] --> "Result: ..." --> "Now I know..."
ToolWhat It DoesExample
search[query]Search the websearch[capital of Japan]
lookup[term]Look up in current pagelookup[population]
calculate[expr]Do mathcalculate[15 * 24 + 7]
finish[answer]Return final answerfinish[Tokyo]
react_prompt = ( "You are an assistant that solves problems by thinking step-by-step " "and using tools.\n\n" "Available Tools:\n" "- search[query]: Search the internet for information\n" "- calculate[expression]: Calculate a math expression\n" "- finish[answer]: Return the final answer\n\n" "Format:\n" "Thought: [your reasoning]\n" "Action: [tool_name][input]\n\n" "After each Action, you receive an Observation with the result.\n" "Then continue with another Thought." )

Reflexion

The AI tries something, checks if it worked, reflects on what went wrong, and tries again. A self-improvement loop with three components:

Actor (Generate)

Creates the initial solution (code, answer, etc.)

Evaluator (Score)

Tests if the solution is correct (pass/fail)

Self-Reflection (Analyze)

Figures out what went wrong and how to fix it

Attempt 1: Actor generates is_prime(n) Evaluator: is_prime(1) = True (WRONG -- 1 is not prime) Reflection: "Missing edge case for n <= 1" Attempt 2: Actor adds "if n <= 1: return False" Evaluator: 5/5 tests pass (SUCCESS)

Practical Prompt Patterns

Delimiters

Use special characters (###, ---, <tag>) to separate different parts of your prompt. Prevents confusion between instructions and content.

Negative Prompting

Tell the model what NOT to do. Sometimes easier than describing all wanted behavior.

prompt = { "role": "system", "content": "You are a customer support bot. " "DO NOT: discuss competitors, make promises about future features, " "share internal info, provide legal/financial advice, use slang." }

Output Format Constraints

Force the model to respond in a specific format (JSON, table, structured list).

prompt = ( "Extract information from this text and return as JSON only.\n\n" 'Text: "John Smith, age 32, software engineer at Google in San Francisco."\n\n' "Output format:\n" '{"name": "", "age": 0, "job": "", "company": "", "location": ""}\n\n' "Return ONLY valid JSON, no explanations." )

Role-Based Prompting

Give the model a specific persona to change its tone, expertise level, and approach.

# As a teacher prompt_teacher = { "role": "system", "content": "You are a patient elementary school teacher. " "Explain concepts simply using examples a 10-year-old would understand." } # As an expert prompt_expert = { "role": "system", "content": "You are a senior ML researcher with 20 years of experience. " "Provide technically precise answers with paper references." } # As a critic prompt_critic = { "role": "system", "content": "You are a harsh code reviewer. " "Find every possible issue, edge case, and improvement." }

Calling LLM APIs

OpenAI (GPT-4)

pip install openai. Uses chat.completions.create() with messages array.

Anthropic (Claude)

pip install anthropic. Uses messages.create() with system + messages.

Google (Gemini)

pip install google-generativeai. Uses generate_content() directly.

Key API Parameters

ParameterWhat It DoesTypical Values
temperatureControls randomness. 0 = deterministic, 1 = creative0.0 - 1.0
max_tokensMaximum response length100 - 4000
top_pNucleus sampling (alternative to temperature)0.1 - 1.0
frequency_penaltyReduces repetition0.0 - 2.0
presence_penaltyEncourages new topics0.0 - 2.0
# OpenAI example from openai import OpenAI client = OpenAI(api_key="your-api-key") response = client.chat.completions.create( model="gpt-4", messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is machine learning?"} ], temperature=0.7, max_tokens=500 ) print(response.choices[0].message.content)
# Anthropic Claude example import anthropic client = anthropic.Anthropic(api_key="your-api-key") message = client.messages.create( model="claude-sonnet-4-20250514", max_tokens=1024, system="You are a helpful coding assistant.", messages=[ {"role": "user", "content": "Write a Python function to reverse a string."} ] ) print(message.content[0].text)

Token Counting

APIs charge per token. Models have context limits (8K, 32K, 128K). Roughly: 1 token ~ 4 characters or 0.75 words in English.

import tiktoken encoder = tiktoken.encoding_for_model("gpt-4") text = "Hello, how are you doing today?" tokens = encoder.encode(text) print(f"Text: {text}") print(f"Tokens: {tokens}") print(f"Token count: {len(tokens)}") # Token count: 8

Common Pitfalls

Prompt Injection

Malicious users override system instructions. Defense: use delimiters, validate input, filter output.

Hallucinations

Model confidently generates false info. Defense: ask for sources, use RAG, lower temperature.

Context Length Limits

Long documents get truncated. Defense: chunk text, summarize first, use retrieval.

Inconsistent Outputs

Same prompt gives different results. Defense: temperature=0, seed parameter, specific instructions.

Prompt Debugging

When Your Prompt Is Not Working
  1. Start simple, then add complexity -- Do not write a 500-word prompt at once
  2. Test with multiple inputs -- Short, long, unusual, edge cases
  3. Ask the model to explain -- "What part of my prompt was unclear?"
  4. Put important instructions at START and END -- Primacy/recency effect
  5. Show exact format examples -- Do not just describe the format

Techniques Summary

TechniqueWhen to UseKey Idea
Zero-shotSimple, well-known tasksJust give instructions
One-shotNeed format/style consistencyShow one example
Few-shotComplex patternsShow 2-5 examples
Chain-of-ThoughtMath, logic, reasoning"Think step by step"
Self-ConsistencyNeed high accuracyMultiple tries + vote
Tree of ThoughtsExploration neededBranch and evaluate
ReActNeed external toolsThought -> Action -> Observe
ReflexionIterative improvementGenerate -> Reflect -> Improve

Good prompts work WITH the model's architecture, not against it. Modern LLMs process text through tokenization, embeddings, attention, and context layers. Clear structure and explicit instructions align with how these systems work internally.

Prompt Template

[ROLE] You are a [specific role with expertise]. [CONTEXT] Background information the model needs. [TASK] Specific instruction of what to do. [FORMAT] How the output should be structured. [CONSTRAINTS] What NOT to do, length limits, style requirements. [EXAMPLES] Input: X Output: Y (optional) [INPUT] The actual user input/data to process.

Prompt EngineeringZero-ShotFew-ShotCoTReActToTReflexionLLM APIs