Recipe AI — Intelligent Meal Planning System

Challenge

Daily meal planning requires balancing nutrition, variety, visual appeal, and practicality — a complex optimization problem that professional nutritionists spend years mastering. For busy home cooks preparing meals every single day, this is an exhausting, never-ending challenge.

Solution

Built a multi-approach ML system: cosine similarity on nutritional vectors for recipe search, LSTM networks for temporally varied menu prediction, and, later, the ChatGPT API for recipe simplification.

Result

A working pipeline processing ~20,000 recipes with nutritional data, capable of finding nutritionally equivalent alternatives and predicting non-repetitive weekly menus.

The Problem: Meal Planning Is a Hidden Optimization Problem

“What should we cook tonight?” — this seemingly simple question is an incredibly complex optimization problem that millions of home cooks face every day.

A professional nutritionist considers nutritional balance, variety, seasonal ingredients, visual appeal, cooking time, and cost simultaneously. They spend years learning to balance these factors intuitively. But for ordinary home cooks doing this every single day, the mental load is genuinely exhausting.

This project started over a decade ago — long before the current AI boom — as an attempt to solve this universal kitchen struggle with the tools available at the time: classical machine learning.


Approach: Three ML Techniques, One Problem

Rather than trying to solve everything with a single model, I broke the meal planning problem into three distinct sub-problems, each addressed with the most appropriate ML technique:

| Sub-Problem | Technique | What It Does |
| --- | --- | --- |
| “Same nutrition, different meal” | Cosine Similarity | Finds nutritionally equivalent alternatives |
| “Don’t repeat meals” | LSTM Neural Network | Predicts varied menus based on recent history |
| “Too complicated to cook” | ChatGPT API | Simplifies elaborate recipes for weeknight cooking |

Each approach solves a piece of the puzzle that the others cannot.


Data Foundation

The foundation of the system is a unified dataset built from three separate sources:

| Dataset | Records | Contents |
| --- | --- | --- |
| Recipes | 19,902 | Names, categories, serving sizes, cooking instructions |
| Ingredients | 196,126 | Ingredient names, quantities, units linked to each recipe |
| Nutrition | — | Per-recipe nutritional breakdown (calories, protein, vitamins, minerals) |

After cleansing (removing missing IDs, fixing encodings, resolving inconsistencies) and merging via recipe_id, the final master dataset contained 19,312 recipes with complete nutritional profiles across 22 columns — each recipe a rich data point with metadata, a full ingredient list, and 13+ nutritional dimensions.
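The cleanse-and-merge step can be sketched with pandas. The file layout, column names, and toy rows below are illustrative assumptions, not the project's actual schema — only the `recipe_id` join key comes from the text:

```python
import pandas as pd

# Toy stand-ins for the three source files (real data: ~19,902 recipes,
# ~196,126 ingredient rows; column names here are assumptions).
recipes = pd.DataFrame({
    "recipe_id": [1, 2, 3],
    "name": ["Chicken Stir-Fry", "Tofu Casserole", "Miso Mackerel"],
})
ingredients = pd.DataFrame({
    "recipe_id": [1, 1, 2, 3],
    "ingredient_name": ["chicken", "broccoli", "tofu", "mackerel"],
})
nutrition = pd.DataFrame({
    "recipe_id": [1, 2],  # recipe 3 lacks nutrition data
    "calories": [520, 430],
    "protein_g": [38, 22],
})

# Collapse ingredients into one list per recipe, then inner-join on
# recipe_id so only recipes with complete profiles survive the merge.
ingredient_lists = (
    ingredients.groupby("recipe_id")["ingredient_name"]
    .apply(list).rename("ingredients").reset_index()
)
master = (
    recipes.merge(ingredient_lists, on="recipe_id", how="inner")
           .merge(nutrition, on="recipe_id", how="inner")
)
print(master)  # recipe 3 is dropped: only complete profiles remain
```

The inner joins are what shrink 19,902 raw recipes to the 19,312 with complete nutritional profiles.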


Technique 1: Nutritional Similarity Search (Cosine Similarity)

Why Cosine Similarity?

When comparing recipes by nutrition, the naive approach would be Euclidean distance. But this is dominated by absolute scale: a difference of 100 calories would overshadow a difference of 0.1 mg of vitamin B2, even though both could be equally significant nutritionally.

Cosine similarity compares the direction of vectors rather than magnitude — two recipes with similar nutritional proportions score highly regardless of portion size.
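A tiny numeric sketch of the difference, using made-up values for two nutrients ([calories, vitamin B2 in mg]):

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical recipes with identical nutritional proportions
# but different portion sizes.
single_portion = np.array([400.0, 0.4])
double_portion = np.array([800.0, 0.8])

print(cosine_sim(single_portion, double_portion))       # 1.0: same direction
print(np.linalg.norm(single_portion - double_portion))  # ~400: swamped by calories
```

Euclidean distance calls these recipes very different (driven entirely by the calorie axis), while cosine similarity correctly treats them as nutritionally identical in proportion.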

Implementation

Each recipe is represented as a 19-dimensional nutritional vector. After StandardScaler normalization (mean=0, std=1 per nutrient), I compute pairwise cosine similarity across all 18,174 valid recipes:

from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # (18174, 19)

similarity_matrix = cosine_similarity(X_scaled)  # (18174, 18174)

The system returns the top 30 most nutritionally similar recipes for any given dish. A chicken stir-fry might match with a pork and vegetable simmer, a tofu casserole, or a fish with root vegetables — all nutritionally equivalent but completely different in taste.
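The top-30 lookup over the precomputed matrix can be sketched like this (the helper and the 4×4 toy matrix are illustrative, not the project's code):

```python
import numpy as np

def top_similar(similarity_matrix, recipe_index, k=30):
    """Return indices of the k most similar recipes, excluding the query itself."""
    scores = similarity_matrix[recipe_index].copy()
    scores[recipe_index] = -np.inf          # never recommend the dish itself
    return np.argsort(scores)[::-1][:k]     # highest similarity first

# Toy 4x4 similarity matrix (symmetric, ones on the diagonal).
sim = np.array([
    [1.0, 0.9, 0.2, 0.7],
    [0.9, 1.0, 0.1, 0.5],
    [0.2, 0.1, 1.0, 0.3],
    [0.7, 0.5, 0.3, 1.0],
])
print(top_similar(sim, 0, k=2))  # [1 3]
```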

I extended this to full menu comparison (1,526 menus across 27 nutritional dimensions), enabling questions like: “Last Tuesday’s dinner was nutritionally great — what else can I cook that hits the same targets?”


Technique 2: Time-Series Menu Prediction (LSTM)

The Key Insight

Cosine similarity is stateless — it doesn’t know what you ate yesterday. It might suggest grilled chicken on Monday, Tuesday, and Wednesday. Nutritionally sound, but nobody wants that.

Then I realized: menu prediction is structurally identical to text generation.

| Text Generation | Menu Prediction |
| --- | --- |
| Vocabulary = words | Vocabulary = menu IDs |
| Sentence = sequence of words | Week = sequence of daily menus |
| Predict next word | Predict next day’s menu |
| Avoid repetition = good prose | Avoid repetition = varied meals |

This analogy directly determined the implementation — I used the exact same architecture that generates text to generate meal plans.

Architecture

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

model = Sequential([
    LSTM(128, input_shape=(7, vocab_size)),  # Past 7 days, one-hot encoded
    Dropout(0.2),
    Dense(vocab_size, activation="softmax")  # Probability over next day's menu
])
model.compile(loss="categorical_crossentropy", optimizer="adam")

A sliding window of 7 days creates training samples: “given these 7 days of meals, what came next?” The LSTM learns weekly rhythms, consecutive avoidance, and seasonal clusters.
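The window construction can be sketched as follows (the helper name and 10-day toy history are illustrative; before training, each ID would be one-hot encoded to match the `(7, vocab_size)` input shape):

```python
import numpy as np

def make_windows(menu_ids, window=7):
    """Slice a menu history into (past-7-days, next-day) training pairs."""
    X, y = [], []
    for i in range(len(menu_ids) - window):
        X.append(menu_ids[i:i + window])  # the 7-day context
        y.append(menu_ids[i + window])    # the menu that followed
    return np.array(X), np.array(y)

history = [3, 7, 1, 3, 9, 2, 5, 7, 4, 1]   # 10 days of menu IDs
X, y = make_windows(history, window=7)
print(X.shape)   # (3, 7): three training samples
print(y)         # [7 4 1]: the "next day" targets
```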

[Figure: Recipe AI system architecture and data flow]

Temperature Sampling: The Diversity Dial

Borrowed directly from text generation, the temperature parameter controls how conservative or adventurous the suggestions are:

| Temperature | Behavior |
| --- | --- |
| 0.3 | Safe staples — frequently-eaten favorites |
| 0.7 | Sweet spot — familiar yet varied |
| 1.0 | Standard probability distribution |
| 1.5 | Adventurous — explores unusual combinations |

This single parameter lets users tune the system to their comfort level — the same mechanism that makes GPT outputs more or less creative.
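The mechanism itself is a few lines: divide the log-probabilities by the temperature, re-normalize, then sample. A sketch with a made-up 4-menu distribution:

```python
import numpy as np

def rescale(probs, temperature):
    """Apply temperature to a softmax output before sampling from it."""
    logits = np.log(np.asarray(probs, dtype=float)) / temperature
    scaled = np.exp(logits - logits.max())   # subtract max for stability
    return scaled / scaled.sum()

probs = [0.7, 0.2, 0.08, 0.02]          # model's next-menu distribution
print(rescale(probs, 0.3).round(3))     # favorite dominates: safe staples
print(rescale(probs, 1.5).round(3))     # flatter: more adventurous picks

# Sampling a menu ID then uses the reshaped distribution:
rng = np.random.default_rng()
next_menu = rng.choice(len(probs), p=rescale(probs, 0.7))
```

Low temperature sharpens the distribution toward the favorite; high temperature flattens it, giving rarer menus a real chance of being picked.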


Technique 3: Recipe Simplification (ChatGPT API)

The Problem ML Couldn’t Solve

The database contained nearly 20,000 recipes from professional cookbooks — many requiring 15+ ingredients, multi-step preparation, and specialized techniques. The cosine similarity engine could find alternatives and the LSTM could suggest variety, but if every suggestion was a 90-minute production, the system wasn’t actually solving the daily cooking problem.

Recipe simplification requires language understanding: knowing which ingredients are essential vs. optional, how to substitute techniques, how to preserve flavor while cutting complexity. No amount of numerical ML could do this.

LLM-Powered Transformation

When ChatGPT’s API became available, I added a simplification layer with explicit constraints:

  • 10 ingredients or fewer (down from 15-20)
  • 5 steps or fewer (down from 8-12)
  • 30-minute target cooking time
  • Preserve core nutrition (protein source, vegetable components)
  • Generate a catchy name and structured JSON output
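Encoding those constraints as a prompt might look like the sketch below. The wording, JSON schema, and helper are hypothetical — only the constraints themselves come from the list above:

```python
# Hypothetical prompt builder; the exact wording and JSON keys are
# illustrative, not the project's actual prompt.
SIMPLIFY_RULES = (
    "Rewrite the recipe with at most 10 ingredients and at most 5 steps, "
    "targeting 30 minutes of cooking time. Preserve the main protein and "
    "vegetable components. Give it a catchy name. Respond as JSON with "
    'keys: "name", "ingredients", "steps", "time_minutes".'
)

def build_messages(recipe_text):
    return [
        {"role": "system", "content": SIMPLIFY_RULES},
        {"role": "user", "content": recipe_text},
    ]

messages = build_messages("Braised Mackerel with Miso Glaze (16 ingredients, 8 steps)")
# The messages would then be sent to the Chat Completions API, e.g.:
#   client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
```

Putting the constraints in the system message and the raw recipe in the user message keeps the rules fixed while batching thousands of recipes through the same prompt.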

Concrete Example

A 16-ingredient, 8-step Braised Mackerel with Miso Glaze and Root Vegetables became an 8-ingredient, 5-step Quick Miso Mackerel Bowl cookable in 25 minutes — with protein and omega-3 preserved.

Batch processing ~20,000 recipes cost approximately $40 using GPT-3.5 Turbo (~20 million tokens).
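The figure checks out as back-of-envelope arithmetic, assuming the launch-era GPT-3.5 Turbo rate of roughly $0.002 per 1,000 tokens (an assumption; pricing has changed since):

```python
# Back-of-envelope cost check at an assumed flat $0.002 / 1K tokens.
tokens = 20_000_000
rate_per_1k = 0.002
cost = tokens / 1_000 * rate_per_1k
print(f"${cost:.0f}")  # $40
```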


Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Data Pipeline | Python / pandas | Cleansing, merging, and structuring 3 datasets |
| Nutritional Search | scikit-learn (StandardScaler, cosine_similarity) | 19-dimensional recipe similarity across 18,174 recipes |
| Menu Prediction | Keras / TensorFlow (LSTM) | Time-series meal sequence prediction with temperature sampling |
| Recipe Simplification | OpenAI API (GPT-3.5 / GPT-4) | Natural language recipe transformation |
| Data Format | CSV, JSON | Unified master dataset and structured LLM outputs |

A Decade of Evolution

This project traces the evolution of applied AI itself:

| Phase | Era | Tool | Capability |
| --- | --- | --- | --- |
| 1. Classical Data Science | 2010s | pandas, scikit-learn | Precise numerical comparison, explainable results |
| 2. Deep Learning | Mid-2010s | Keras, LSTM | Temporal pattern learning, sequence generation |
| 3. Large Language Models | 2020s | OpenAI API | Natural language understanding and transformation |

Each phase didn’t replace the previous one — they complement each other. Cosine similarity provides the fastest nutritional matching. LSTM handles temporal sequencing. ChatGPT does what neither could: understand and rewrite natural language recipes.


What I Learned

Data Quality Determines Everything

After working with 20,000 recipes across a decade, the most consistent lesson: no algorithm can compensate for poor data quality. The time spent on cleansing and validation in the data pipeline phase paid dividends at every subsequent step.

Classical ML Still Has Its Place

Even in the age of LLMs, cosine similarity on well-structured numerical data is faster, cheaper, and more explainable than asking an LLM to compare recipes. The right tool depends on the problem, not the hype cycle.

The Best Time to Start Was Ten Years Ago

The problems worth solving haven’t changed. The tools have gotten dramatically better. Understanding why cosine similarity works for nutritional matching, or why temperature sampling produces diverse outputs, gives you a foundation that makes you a better engineer — even when working with modern LLMs.


Development Process in Detail

The full technical deep-dive is documented in a 4-part blog series:

| Part | Topic |
| --- | --- |
| Part 1 | Building the Data Pipeline — Cleansing 20,000 Recipes |
| Part 2 | Finding “Same Nutrition, Different Meal” with Cosine Similarity |
| Part 3 | Predicting Non-Boring Menus with LSTM Time Series |
| Part 4 | Transforming 20,000 Recipes with ChatGPT |
