Recipe AI — Intelligent Meal Planning System
Challenge
Daily meal planning requires balancing nutrition, variety, visual appeal, and practicality — a complex optimization problem that professional nutritionists spend years mastering. For busy home cooks preparing meals day after day, it is an exhausting, never-ending challenge.
Solution
Built a multi-approach ML system: cosine similarity on nutritional vectors for recipe search, LSTM networks for temporally-varied menu prediction, and later ChatGPT API for recipe simplification.
Result
A working pipeline processing ~20,000 recipes with nutritional data, capable of finding nutritionally equivalent alternatives and predicting non-repetitive weekly menus.
The Problem: Meal Planning Is a Hidden Optimization Problem
“What should we cook tonight?” — this seemingly simple question hides a genuinely complex optimization problem that millions of home cooks face every day.
A professional nutritionist considers nutritional balance, variety, seasonal ingredients, visual appeal, cooking time, and cost simultaneously. They spend years learning to balance these factors intuitively. But for ordinary home cooks doing this every single day, the mental load is genuinely exhausting.
This project started over a decade ago — long before the current AI boom — as an attempt to solve this universal kitchen struggle with the tools available at the time: classical machine learning.
Approach: Three ML Techniques, One Problem
Rather than trying to solve everything with a single model, I broke the meal planning problem into three distinct sub-problems, each addressed with the most appropriate ML technique:
| Sub-Problem | Technique | What It Does |
|---|---|---|
| “Same nutrition, different meal” | Cosine Similarity | Finds nutritionally equivalent alternatives |
| “Don’t repeat meals” | LSTM Neural Network | Predicts varied menus based on recent history |
| “Too complicated to cook” | ChatGPT API | Simplifies elaborate recipes for weeknight cooking |
Each approach solves a piece of the puzzle that the others cannot.
Data Foundation
The foundation of the system is a unified dataset built from three separate sources:
| Dataset | Records | Contents |
|---|---|---|
| Recipes | 19,902 | Names, categories, serving sizes, cooking instructions |
| Ingredients | 196,126 | Ingredient names, quantities, and units linked to each recipe |
| Nutrition | — | Per-recipe nutritional breakdown (calories, protein, vitamins, minerals) |
After cleansing (removing missing IDs, fixing encodings, resolving inconsistencies) and merging via recipe_id, the final master dataset contained 19,312 recipes with complete nutritional profiles across 22 columns — each recipe a rich data point with metadata, a full ingredient list, and 13+ nutritional dimensions.
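A minimal sketch of that merge step with pandas. The column names and tiny inline frames are illustrative assumptions, not the project's actual schema; the point is that an inner join on `recipe_id` is what drops recipes lacking a complete nutritional profile:

```python
import pandas as pd

# Toy stand-ins for the cleaned source tables (schema is assumed)
recipes = pd.DataFrame({
    "recipe_id": [1, 2, 3],
    "name": ["Miso Soup", "Chicken Stir-Fry", "Tofu Casserole"],
})
nutrition = pd.DataFrame({
    "recipe_id": [1, 2],          # recipe 3 has no nutrition row
    "calories": [80, 420],
    "protein_g": [6.0, 32.0],
})

# Inner join on recipe_id keeps only recipes with nutritional data,
# mirroring the 19,902 -> 19,312 reduction described above
master = recipes.merge(nutrition, on="recipe_id", how="inner")
print(len(master))
```

An inner join (rather than a left join) is the natural choice here because a recipe without nutritional data cannot participate in any of the downstream similarity or prediction steps.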
Technique 1: Nutritional Similarity Search (Cosine Similarity)
Why Cosine Similarity?
When comparing recipes by nutrition, the naive approach would be Euclidean distance. But this is dominated by absolute scale: a difference of 100 calories would overshadow a difference of 0.1 mg of vitamin B2, even though both could be equally significant nutritionally.
Cosine similarity compares the direction of vectors rather than magnitude — two recipes with similar nutritional proportions score highly regardless of portion size.
Implementation
Each recipe is represented as a 19-dimensional nutritional vector. After StandardScaler normalization (mean=0, std=1 per nutrient), I compute pairwise cosine similarity across all 18,174 valid recipes:
```python
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

# Standardize each nutrient to mean 0, std 1 so no single
# nutrient dominates the comparison
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)               # (18174, 19)

# Pairwise cosine similarity across all recipes
similarity_matrix = cosine_similarity(X_scaled)  # (18174, 18174)
```
The system returns the top 30 most nutritionally similar recipes for any given dish. A chicken stir-fry might match with a pork and vegetable simmer, a tofu casserole, or a fish with root vegetables — all nutritionally equivalent but completely different in taste.
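The top-k lookup itself reduces to sorting one row of the similarity matrix. Here is a self-contained sketch with a toy 5-recipe, 4-nutrient matrix (the real system uses 18,174 recipes and 19 nutrients, and returns the top 30); the recipe labels in the comments are illustrative:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

# Toy nutritional matrix: 5 recipes x 4 nutrients (values invented)
X = np.array([
    [520, 30, 2.1, 0.4],   # 0: chicken stir-fry
    [540, 28, 2.0, 0.5],   # 1: pork and vegetable simmer (similar profile)
    [510, 29, 2.2, 0.4],   # 2: tofu casserole (similar profile)
    [120,  2, 0.1, 0.0],   # 3: fruit salad (very different)
    [980, 10, 0.5, 0.1],   # 4: cream pasta (very different)
], dtype=float)

X_scaled = StandardScaler().fit_transform(X)
sim = cosine_similarity(X_scaled)

def top_k_similar(idx, k=2):
    """Indices of the k most similar recipes, excluding idx itself."""
    order = np.argsort(sim[idx])[::-1]   # most similar first
    return [j for j in order if j != idx][:k]

# Recipes 1 and 2 rank closest to recipe 0; 3 and 4 rank far away
print(top_k_similar(0))
```

Precomputing the full matrix trades memory for speed; with ~18k recipes the matrix fits comfortably in RAM and every query becomes a single row sort.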
Menu-Level Extension
I extended this to full menu comparison (1,526 menus across 27 nutritional dimensions), enabling questions like: “Last Tuesday’s dinner was nutritionally great — what else can I cook that hits the same targets?”
Technique 2: Time-Series Menu Prediction (LSTM)
The Key Insight
Cosine similarity is stateless — it doesn’t know what you ate yesterday. It might suggest grilled chicken on Monday, Tuesday, and Wednesday. Nutritionally sound, but nobody wants that.
Then I realized: menu prediction is structurally identical to text generation.
| Text Generation | Menu Prediction |
|---|---|
| Vocabulary = words | Vocabulary = menu IDs |
| Sentence = sequence of words | Week = sequence of daily menus |
| Predict next word | Predict next day’s menu |
| Avoid repetition = good prose | Avoid repetition = varied meals |
This analogy directly determined the implementation — I used the exact same architecture that generates text to generate meal plans.
Architecture
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

model = Sequential([
    LSTM(128, input_shape=(7, vocab_size)),   # reads the past 7 days
    Dropout(0.2),                             # regularization
    Dense(vocab_size, activation="softmax"),  # probabilities over next menus
])
```
A sliding window of 7 days creates training samples: “given these 7 days of meals, what came next?” The LSTM learns weekly rhythms, consecutive avoidance, and seasonal clusters.
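The window construction can be sketched in a few lines. The menu IDs and vocabulary size below are invented for illustration (the real vocabulary covers 1,526 menus); one-hot encoding produces the `(7, vocab_size)` input shape the model above expects:

```python
import numpy as np

vocab_size = 10
history = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]  # daily menu IDs (illustrative)
window = 7

X, y = [], []
for i in range(len(history) - window):
    X.append(history[i:i + window])   # 7 consecutive days as input
    y.append(history[i + window])     # the menu that actually followed

# One-hot encode for the LSTM's (7, vocab_size) input shape
X = np.eye(vocab_size)[np.array(X)]   # (samples, 7, vocab_size)
y = np.eye(vocab_size)[np.array(y)]   # (samples, vocab_size)
print(X.shape, y.shape)
```

Each day of history beyond the first window yields one more training sample, so even a modest meal log produces a usable training set.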
Temperature Sampling: The Diversity Dial
Borrowed directly from text generation, the temperature parameter controls how conservative or adventurous the suggestions are:
| Temperature | Behavior |
|---|---|
| 0.3 | Safe staples — frequently-eaten favorites |
| 0.7 | Sweet spot — familiar yet varied |
| 1.0 | Standard probability distribution |
| 1.5 | Adventurous — explores unusual combinations |
This single parameter lets users tune the system to their comfort level — the same mechanism that makes GPT outputs more or less creative.
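The mechanism itself is a few lines: divide the log-probabilities by the temperature before re-normalizing and sampling. This is a generic sketch of standard temperature sampling, not the project's exact code; `probs` stands in for the LSTM's softmax output over menu IDs:

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Sample an index, with temperature controlling distribution sharpness."""
    rng = rng or np.random.default_rng()
    logits = np.log(np.asarray(probs) + 1e-9) / temperature
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    p = exp / exp.sum()
    return rng.choice(len(p), p=p)

probs = [0.6, 0.25, 0.1, 0.05]            # one dominant "safe" menu

# Low temperature sharpens the distribution (mostly menu 0);
# high temperature flattens it (more adventurous picks)
rng = np.random.default_rng(0)
picks_low  = [sample_with_temperature(probs, 0.3, rng) for _ in range(1000)]
picks_high = [sample_with_temperature(probs, 1.5, rng) for _ in range(1000)]
print(picks_low.count(0), picks_high.count(0))
```

With these numbers, temperature 0.3 pushes the dominant menu's probability above 0.9, while 1.5 pulls it below 0.5 — the same dial that makes GPT output more or less creative.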
Technique 3: Recipe Simplification (ChatGPT API)
The Problem ML Couldn’t Solve
The database contained nearly 20,000 recipes from professional cookbooks — many requiring 15+ ingredients, multi-step preparation, and specialized techniques. The cosine similarity engine could find alternatives and the LSTM could suggest variety, but if every suggestion was a 90-minute production, the system wasn’t actually solving the daily cooking problem.
Recipe simplification requires language understanding: knowing which ingredients are essential vs. optional, how to substitute techniques, how to preserve flavor while cutting complexity. No amount of numerical ML could do this.
LLM-Powered Transformation
When ChatGPT’s API became available, I added a simplification layer with explicit constraints:
- 10 ingredients or fewer (down from 15-20)
- 5 steps or fewer (down from 8-12)
- 30-minute target cooking time
- Preserve core nutrition (protein source, vegetable components)
- Generate a catchy name and structured JSON output
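A sketch of how such a constrained request and its validation might look. The prompt wording, JSON schema, and helper names below are my assumptions for illustration — the write-up doesn't show the actual prompts — and the example reply is hand-written, not a real API response:

```python
import json

def build_simplify_messages(recipe_text):
    """Assemble a chat-completion payload with explicit hard limits (assumed wording)."""
    system = (
        "Simplify this recipe: at most 10 ingredients, at most 5 steps, "
        "target 30 minutes, keep the protein source and vegetables. "
        "Reply with JSON: {name, ingredients, steps, minutes}."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": recipe_text},
    ]

def meets_constraints(reply_json):
    """Check the model's structured output against the hard limits."""
    r = json.loads(reply_json)
    return (len(r["ingredients"]) <= 10
            and len(r["steps"]) <= 5
            and r["minutes"] <= 30)

# Hand-written example of a conforming structured reply
reply = json.dumps({
    "name": "Quick Miso Mackerel Bowl",
    "ingredients": ["mackerel", "miso", "rice", "carrot", "onion",
                    "soy sauce", "mirin", "ginger"],
    "steps": ["Slice vegetables", "Simmer mackerel in miso glaze",
              "Cook rice", "Combine in bowl", "Garnish with ginger"],
    "minutes": 25,
})
print(meets_constraints(reply))
```

Validating the structured output programmatically matters at batch scale: over ~20,000 recipes, even a small rate of over-long or malformed replies needs to be caught and retried rather than silently stored.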
Concrete Example
A 16-ingredient, 8-step Braised Mackerel with Miso Glaze and Root Vegetables became an 8-ingredient, 5-step Quick Miso Mackerel Bowl cookable in 25 minutes — with protein and omega-3 preserved.
Batch processing ~20,000 recipes cost approximately $40 using GPT-3.5 Turbo (~20 million tokens).
Tech Stack
| Layer | Technology | Purpose |
|---|---|---|
| Data Pipeline | Python / pandas | Cleansing, merging, and structuring 3 datasets |
| Nutritional Search | scikit-learn (StandardScaler, cosine_similarity) | 19-dimensional recipe similarity across 18,174 recipes |
| Menu Prediction | Keras / TensorFlow (LSTM) | Time-series meal sequence prediction with temperature sampling |
| Recipe Simplification | OpenAI API (GPT-3.5 / GPT-4) | Natural language recipe transformation |
| Data Format | CSV, JSON | Unified master dataset and structured LLM outputs |
A Decade of Evolution
This project traces the evolution of applied AI itself:
| Phase | Era | Tool | Capability |
|---|---|---|---|
| 1. Classical Data Science | 2010s | pandas, scikit-learn | Precise numerical comparison, explainable results |
| 2. Deep Learning | Mid-2010s | Keras, LSTM | Temporal pattern learning, sequence generation |
| 3. Large Language Models | 2020s | OpenAI API | Natural language understanding and transformation |
Each phase didn’t replace the previous one — they complement each other. Cosine similarity provides the fastest nutritional matching. LSTM handles temporal sequencing. ChatGPT does what neither could: understand and rewrite natural language recipes.
What I Learned
Data Quality Determines Everything
After working with 20,000 recipes across a decade, the most consistent lesson: no algorithm can compensate for poor data quality. The time spent on cleansing and validation in the data pipeline phase paid dividends at every subsequent step.
Classical ML Still Has Its Place
Even in the age of LLMs, cosine similarity on well-structured numerical data is faster, cheaper, and more explainable than asking an LLM to compare recipes. The right tool depends on the problem, not the hype cycle.
The Best Time to Start Was Ten Years Ago
The problems worth solving haven’t changed. The tools have gotten dramatically better. Understanding why cosine similarity works for nutritional matching, or why temperature sampling produces diverse outputs, gives you a foundation that makes you a better engineer — even when working with modern LLMs.
Development Process in Detail
The full technical deep-dive is documented in a 4-part blog series:
| Part | Topic |
|---|---|
| Part 1 | Building the Data Pipeline — Cleansing 20,000 Recipes |
| Part 2 | Finding “Same Nutrition, Different Meal” with Cosine Similarity |
| Part 3 | Predicting Non-Boring Menus with LSTM Time Series |
| Part 4 | Transforming 20,000 Recipes with ChatGPT |