
BERT / GPT Fine-Tuning

Take pre-trained language models and adapt them for your specific NLP tasks using HuggingFace Transformers.

What is Fine-Tuning?

Fine-tuning takes a pre-trained language model (BERT, GPT, RoBERTa) that has already learned language patterns from billions of words, and further trains it on your specific task with your labeled data. The model keeps its language understanding and just learns the new task on top.

BERT was pre-trained on all of English Wikipedia + BookCorpus (3.3 billion words). Fine-tuning lets you leverage all that knowledge with just a few hundred labeled examples.

Pre-Training vs Fine-Tuning

PRE-TRAINING (done by Google/OpenAI, takes weeks on hundreds of GPUs):
  Task: Masked Language Modeling (BERT) or next-token prediction (GPT)
  Data: Billions of words from the internet
  Result: General language understanding

FINE-TUNING (done by you, takes minutes to hours on 1 GPU):
  Task: Your specific task (sentiment, NER, QA, etc.)
  Data: Your labeled dataset (hundreds to thousands of examples)
  Result: Task-specific model with pre-trained knowledge
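The masked-LM pre-training objective can be illustrated without any model: hide a fraction of the tokens and train the model to recover them. A minimal pure-Python sketch (the whitespace tokenizer, fixed mask rate, and seed are simplifying assumptions; real BERT also sometimes keeps or randomizes the selected tokens instead of masking them):

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=42):
    """Replace roughly mask_rate of tokens with [MASK]; return masked sequence and targets."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok  # the model is trained to recover this original token
        else:
            masked.append(tok)
    return masked, targets

tokens = "the cat sat on the mat".split()
masked, targets = mask_tokens(tokens)
print(masked)   # → ['the', '[MASK]', 'sat', 'on', 'the', 'mat']
print(targets)  # → {1: 'cat'}
```

The loss is then computed only at the masked positions, which is why pre-training needs no human labels: the text itself supplies the targets.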

HuggingFace Ecosystem

transformers

Library with 100,000+ pre-trained models. Simple API for loading models and tokenizers.

datasets

Library with 1000s of ready-to-use datasets. Easy loading, processing, and caching.

Trainer

High-level training API. Handles training loops, evaluation, saving, logging automatically.

Pipeline

One-line inference. pipeline("sentiment-analysis")("I love this!") → Positive 0.99
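Under the hood, a classification pipeline is tokenize → model logits → softmax → top label. The post-processing step can be sketched with plain numpy (the logit values and label names here are made up for illustration):

```python
import numpy as np

def postprocess(logits, labels=("NEGATIVE", "POSITIVE")):
    """Turn raw model logits into the {'label', 'score'} dict a pipeline returns."""
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    probs = exp / exp.sum()
    top = int(probs.argmax())
    return {"label": labels[top], "score": float(probs[top])}

result = postprocess(np.array([-1.2, 3.4]))
print(result)  # → {'label': 'POSITIVE', 'score': 0.99...}
```

The `score` is just the softmax probability of the winning class, which is why pipeline outputs always sum to 1 across labels.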

Code: Sentiment Classification with BERT

from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset
import numpy as np
from sklearn.metrics import accuracy_score

# Load dataset
dataset = load_dataset("imdb")  # 25K train, 25K test movie reviews

# Load pre-trained BERT tokenizer and model
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},  # so inference reports readable labels
)

# Tokenize data
def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True)

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    learning_rate=2e-5,  # low learning rate for fine-tuning!
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

# Metrics
def compute_metrics(eval_pred):
    preds = np.argmax(eval_pred.predictions, axis=1)
    return {"accuracy": accuracy_score(eval_pred.label_ids, preds)}

# Train
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    compute_metrics=compute_metrics,
)
trainer.train()

# Inference
from transformers import pipeline
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
print(classifier("This movie was absolutely fantastic!"))
# → [{'label': 'POSITIVE', 'score': 0.998}]

Code: Named Entity Recognition (NER)

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load pre-trained NER model
model_name = "dslim/bert-base-NER"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

text = "Elon Musk founded SpaceX in Hawthorne, California in 2002."
entities = ner(text)
for entity in entities:
    print(f"  {entity['word']:20s} → {entity['entity_group']:5s} (score: {entity['score']:.3f})")

# Output:
#   Elon Musk            → PER   (score: 0.998)
#   SpaceX               → ORG   (score: 0.997)
#   Hawthorne            → LOC   (score: 0.993)
#   California           → LOC   (score: 0.999)

Common NLP Tasks

Task                     | Model Class                        | Example
Text Classification      | AutoModelForSequenceClassification | Sentiment, spam, topic
Named Entity Recognition | AutoModelForTokenClassification    | Find names, places, dates
Question Answering       | AutoModelForQuestionAnswering      | Extract answers from context
Summarization            | AutoModelForSeq2SeqLM              | Summarize articles
Translation              | AutoModelForSeq2SeqLM              | English → French
Text Generation          | AutoModelForCausalLM               | Complete text, chat
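For extractive question answering, the model emits one start logit and one end logit per token; the answer is the span that maximizes start + end with start ≤ end. The span-selection step can be sketched with numpy (the context, logit values, and max span length below are made up for illustration):

```python
import numpy as np

def best_span(start_logits, end_logits, max_len=15):
    """Pick (start, end) maximizing start_logits[s] + end_logits[e], with s <= e < s + max_len."""
    best, best_score = (0, 0), -np.inf
    for s in range(len(start_logits)):
        for e in range(s, min(s + max_len, len(end_logits))):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

context = "SpaceX was founded in 2002 by Elon Musk".split()
# Hypothetical logits peaking at "2002" for the question "When was SpaceX founded?"
start = np.array([0.1, 0.0, 0.2, 0.1, 5.0, 0.3, 0.2, 0.1])
end   = np.array([0.0, 0.1, 0.1, 0.2, 4.8, 0.2, 0.1, 0.3])
s, e = best_span(start, end)
print(" ".join(context[s:e + 1]))  # → 2002
```

The `s <= e` constraint is what prevents the model from returning an inverted span even when the end logit happens to peak before the start logit.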

Fine-Tuning Tips

Fine-tuning BERT-base needs roughly 11GB of GPU memory at settings like those above. If you're memory-limited, use DistilBERT (~40% smaller and faster while retaining ~97% of BERT's performance), reduce the batch size or max_length, or use Google Colab's free GPU.
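The low fine-tuning learning rate (2e-5 above) is typically combined with a linear warmup followed by linear decay; this is the shape of the scheduler Trainer uses by default (warmup is controlled by warmup_steps / warmup_ratio in TrainingArguments). A standalone sketch of that schedule, with step counts chosen just for illustration:

```python
def linear_schedule(step, total_steps, warmup_steps, peak_lr=2e-5):
    """Linear warmup from 0 to peak_lr, then linear decay back toward 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total, warmup = 1000, 100
lrs = [linear_schedule(s, total, warmup) for s in range(total)]
print(lrs[0])     # → 0.0 (start of warmup)
print(max(lrs))   # → 2e-05 (peak, reached at the end of warmup)
```

Warmup matters here because the freshly initialized classification head produces large gradients at first; ramping the learning rate up gradually keeps those early updates from disrupting the pre-trained weights.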