Part 1: The Beginning and the Big Picture -- The Day ChatGPT Beat the Fund Managers
Introduction
“AI that predicts stock prices” — until recently, that sounded like something out of science fiction.
I am a software engineer who builds web services as an independent developer. I have always been interested in stock investing, but I had never formally studied technical analysis or fundamental analysis. I was, for all intents and purposes, an ordinary individual investor.
In this series, I want to share how I went from that starting point to fine-tuning an LLM (Large Language Model) for stock price prediction and launching it as a live web service.
I intend to be honest about everything — not just the successes, but the failures and detours as well. If you are an independent developer interested in trying LLM fine-tuning, or if you are curious about combining AI with financial data, I hope you will find something useful here.
ChatGPT Beat the Fund Managers
In 2023, Finder, a UK-based financial comparison site, ran an experiment that caught a lot of attention.
They had ChatGPT pick 38 stocks and build a portfolio. Over a 63-day period, that portfolio returned +4.9%. During the same period, the top 10 most popular funds in the UK (HSBC, Fidelity, etc.) averaged -0.8% — ChatGPT had significantly outperformed professional fund managers.
Over a cumulative two-year period, the ChatGPT portfolio reached +41.97% in returns, while the popular funds averaged +27.63% — a gap of more than 14 percentage points.
Of course, Finder themselves cautioned that “this does not mean you should use ChatGPT for investing.” It was a limited experiment, and the results could have been a matter of luck. Different market conditions might have yielded entirely different outcomes.
Still, what the experiment demonstrated was significant:
LLMs can “understand” a company’s situation from natural language and potentially apply that understanding to investment decisions.
“Wait, really? Then couldn’t the same thing work with Japanese stock market news?”
That was the starting point of this project.
Why an LLM? — How It Differs from Traditional Approaches
When you hear “stock price prediction,” the first things that come to mind are probably time-series models like LSTM (Long Short-Term Memory) or ARIMA (AutoRegressive Integrated Moving Average). These methods find patterns in historical price data and use them to predict future movements.
They are great at handling numerical data, but they have one major limitation: they cannot understand the content of news articles.
For example, suppose a company announces earnings with a “51% upward revision to ordinary income.” An LSTM can learn patterns from historical price charts, but it cannot comprehend the meaning of that news headline and reason about how it might affect the stock price the next day.
An LLM, on the other hand, can understand natural language. It can grasp that a “51% upward revision” is positive news, and furthermore, it can combine that understanding with context like the company’s industry, market capitalization, and recent price trends to predict the next day’s price movement.
The ability to integrate natural-language information from news with numerical data like stock prices and financial metrics, and to generate predictions from both, was the reason I chose LLM fine-tuning as my approach.
What I Built — The Senrigan Service
What I ultimately built is an AI stock price prediction web service called Senrigan (meaning “clairvoyance” in Japanese).
Service URL: https://senrigan.tech/
Senrigan takes five types of data as input and predicts the next day’s stock price movement:
| # | Data Type | Details |
|---|---|---|
| 1 | Company information | Industry, market capitalization, company description, etc. |
| 2 | News article text | Earnings announcements, PR releases, equity-related disclosures, etc. |
| 3 | Stock price data | OHLCV (Open, High, Low, Close, Volume) for the last 5 trading days |
| 4 | Financial data | 2 years of revenue, profit margins, EPS, ROA, ROE |
| 5 | Macroeconomic indicators | CPI, GDP, unemployment rate, policy interest rates, exchange rates |
It then outputs three prediction values in JSON format:
- Close today -> Open tomorrow (overnight movement)
- Close today -> Close tomorrow (full-day movement)
- Open tomorrow -> Close tomorrow (intraday movement)
Here is what an actual prediction result looks like:
```json
{
  "close_to_next_open": {"price": 3045, "change_pct": 0.0, "trend": "neutral"},
  "close_to_next_close": {"price": 3075, "change_pct": 0.98, "trend": "neutral"},
  "next_open_to_close": {"price": 3075, "change_pct": 1.0, "trend": "neutral"}
}
```
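Since the downstream pipeline depends on this JSON being well formed, it is worth validating the model's output before saving it. Here is a minimal Python sketch of such a check. The field names follow the sample above; the assumption that `trend` takes one of `up` / `neutral` / `down` is mine (only `neutral` appears in the sample), and this is not Senrigan's actual validation code:

```python
import json

# Example prediction output, copied from the sample above
raw = """
{
  "close_to_next_open": {"price": 3045, "change_pct": 0.0, "trend": "neutral"},
  "close_to_next_close": {"price": 3075, "change_pct": 0.98, "trend": "neutral"},
  "next_open_to_close": {"price": 3075, "change_pct": 1.0, "trend": "neutral"}
}
"""

REQUIRED_KEYS = ("close_to_next_open", "close_to_next_close", "next_open_to_close")

def parse_prediction(text: str) -> dict:
    """Parse the model's JSON output and verify the expected structure."""
    pred = json.loads(text)
    for key in REQUIRED_KEYS:
        entry = pred[key]  # raises KeyError if the model omitted a field
        # Each entry must carry all three values
        assert {"price", "change_pct", "trend"} <= entry.keys()
        # Assumed label set; the sample only shows "neutral"
        assert entry["trend"] in ("up", "neutral", "down")
    return pred

prediction = parse_prediction(raw)
print(prediction["close_to_next_close"]["change_pct"])  # 0.98
```

A check like this catches the most common failure mode of LLM JSON output: a missing or renamed key that would otherwise surface much later as a broken page on the website.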
When news is published, the system automatically collects data, the AI predicts the next day’s stock price movement, and the results are published on the website. It runs every day without human intervention.
Overall Architecture
The Senrigan service is composed of three projects:
```
+---------------------+     +--------------------+     +----------------+
|       meloik        |     |  assetai_firebase  |     |   stockSite    |
|  News collection    |     |   Firestore sync   |     |     Web UI     |
|  AI prediction      | --> |                    | --> |                |
|  Data generation    |     |                    |     |                |
+---------------------+     +--------------------+     +----------------+
   VPS (PHP/MySQL)             VPS (Python)             Vercel (Next.js)
```
meloik (Data Generation / PHP + MySQL)
This is where everything begins. It is a collection of PHP batch processes running on a VPS (virtual private server), responsible for:
- News collection: Gathering publicly disclosed corporate information from sources like TDnet (Timely Disclosure network)
- AI prediction: Sending prediction requests to the fine-tuned LLM
- Translation: Translating news and prediction results into English (multilingual support)
- Data generation: Collecting and formatting company information, stock prices, and financial data
The core prediction flow looks something like this:
```php
// Fetch prediction data (company info + news + prices + financials + macro indicators)
$jsonData = Utility::getPredictionData($db, $company['code'], $start_date, $end_date);

// Send the prediction request to the fine-tuned model via the OpenAI API
$response = callFineTunedModel($jsonData);

// Save the prediction results to MySQL
savePrediction($db, $code, $target_date, $response);
```
assetai_firebase (Firestore Sync / Python)
This project exports data from meloik’s MySQL database to Firebase (Firestore).
Why Firestore? Because it allows the frontend (Next.js) to retrieve data directly in a serverless manner. Exposing MySQL directly would be a security risk, and standing up a separate API server adds cost. By placing Firestore in between, the frontend can safely retrieve data using the Firestore SDK.
```python
# Fetch data from MySQL and write to Firestore
def save_to_firestore(collection_name, doc_id, data, force=False):
    doc_ref = db.collection(collection_name).document(doc_id)
    existing_doc = doc_ref.get()
    new_epoch = data.get("updated_at_epoch")
    if existing_doc.exists and not force:
        existing_data = existing_doc.to_dict()
        old_epoch = existing_data.get("updated_at_epoch")
        if old_epoch and new_epoch and old_epoch >= new_epoch:
            return False  # Skip if the existing data is newer (cost reduction)
    doc_ref.set(data)
    return True
```
On weekdays, incremental sync runs every 15 minutes, ensuring the latest data is always reflected in Firestore.
stockSite (Web UI / Next.js + Vercel)
This is the frontend that users actually see. It is built with Next.js and deployed on Vercel.
It fetches Firestore data using ISR (Incremental Static Regeneration) with a 5-minute cache interval. This keeps Firestore read costs down while displaying near-real-time information.
Data Flow — From News to Prediction Display
Here is the complete data flow of the Senrigan service, laid out chronologically:
```
1. News is published
   +-> meloik collects the news and stores it in MySQL
2. Prediction batch kicks off
   +-> Fetches company info, prices, financials, and macro indicators from the DB
   +-> Assembles the five data types into JSON
   +-> Sends the JSON to the fine-tuned LLM (OpenAI API)
   +-> Saves prediction results to MySQL
3. Translation batch kicks off
   +-> Translates news and prediction reasoning into English
4. Firestore sync
   +-> assetai_firebase exports incremental changes from MySQL to Firestore
5. Web display
   +-> stockSite fetches data from Firestore and renders it on screen
```
All of these batch processes are scheduled via crontab and run automatically during market hours on weekdays. No human intervention is required.
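To make the scheduling concrete, a crontab for this kind of weekday pipeline might look roughly like the sketch below. The script names, paths, and times here are hypothetical placeholders (only the 15-minute Firestore sync interval comes from the description above), so treat this as an illustration rather than Senrigan's actual crontab:

```
# Hypothetical weekday batch schedule (times illustrative, e.g. JST market hours)
*/10 9-15 * * 1-5  php /path/to/meloik/collect_news.php        # news collection
*/30 9-15 * * 1-5  php /path/to/meloik/run_predictions.php     # AI prediction batch
*/15 9-16 * * 1-5  python3 /path/to/assetai_firebase/sync.py   # incremental Firestore sync
```

The `1-5` field restricts each job to Monday through Friday, which is what keeps the whole pipeline dormant on weekends without any extra logic in the scripts themselves.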
The Road to LLM Fine-Tuning
Now, here is the main topic. The “AI prediction” component I mentioned in the architecture overview — the process of building a fine-tuned LLM — is the central theme of this series.
To give away the ending: I ultimately used the OpenAI API's fine-tuning feature to customize the gpt-4o-mini model. But getting there involved three major phases:
```
Phase 1: Local GPU (RTX 3060)
   -> Gave up due to insufficient VRAM
Phase 2: Fine-tuning on Google Colab
   -> Tried 3 models with much trial and error, could not achieve sufficient accuracy
Phase 3: OpenAI API fine-tuning
   -> Training completed in about 8 minutes, adopted for production
```
Honestly, Phase 1 and Phase 2 are stories of “things that didn’t work out.” But it was precisely because of that trial and error that I gained a deep understanding of LLM fine-tuning, and I believe it enabled me to make the right decision in Phase 3.
The Models I Tried
Here is a list of all the models that appear throughout this series, encountered during the process of trial and error:
| # | Model | Parameters | Phase | Result |
|---|---|---|---|---|
| 1 | ELYZA Llama-3-JP-8B | 8B (8 billion) | Phase 1, 2 | Ran out of VRAM locally. Training ran on Colab but accuracy was insufficient |
| 2 | llm-jp-3-7.2b-instruct3 | 7.2B (7.2 billion) | Phase 2 | Implemented additional training pipeline but accuracy was insufficient |
| 3 | rinna/japanese-gpt2-medium | - | Phase 2 | Used for GGUF conversion practice |
| 4 | gpt-4o-mini | - | Phase 3 | Adopted for production. Stable JSON output and sufficient accuracy |
I initially tried open-source models because I wanted to “have my own LLM running locally.” In the end, though, API-based fine-tuning turned out to be the practical solution for an independent developer.
Technologies and Methods Used
I used a variety of technologies throughout the fine-tuning process. Detailed explanations will come in later installments, but here is an overview:
| Technology | Overview | Coverage in This Series |
|---|---|---|
| Quantization | Reducing the precision of model weights to save memory | Explained in Part 3 |
| LoRA (Low-Rank Adaptation) | Fine-tuning by training only a small number of additional parameters | Explained in Part 3 |
| SFTTrainer | HuggingFace’s supervised fine-tuning trainer | Used in Part 4 |
| GGUF conversion | Converting models for local inference | Used in Part 4 |
| OpenAI Fine-tuning API | API-based fine-tuning | Detailed in Part 5 |
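To give a concrete feel for the last row of that table, here is a minimal Python sketch of what OpenAI API fine-tuning involves. The chat-format JSONL shape follows OpenAI's documented fine-tuning format, but the prompt text, file name, and model snapshot below are placeholders of my own, not Senrigan's actual training data:

```python
import json

# One training example in the chat-format JSONL that OpenAI fine-tuning expects.
# The user/assistant content here is a made-up placeholder.
example = {
    "messages": [
        {"role": "system", "content": "You are a stock price prediction assistant."},
        {"role": "user", "content": "<company info + news + prices + financials + macro, as JSON>"},
        {"role": "assistant", "content": json.dumps({
            "close_to_next_open": {"price": 3045, "change_pct": 0.0, "trend": "neutral"},
        })},
    ]
}

# A real training file would contain many such lines, one JSON object per line
with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")

# Uploading the file and starting the job (requires OPENAI_API_KEY, so not run here):
#
#   from openai import OpenAI
#   client = OpenAI()
#   file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
#   job = client.fine_tuning.jobs.create(
#       training_file=file.id,
#       model="gpt-4o-mini-2024-07-18",  # snapshot name may differ
#   )
```

Most of the effort is in assembling the JSONL lines; the job itself is two API calls, which is a large part of why Phase 3 went so quickly.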
Series Roadmap
This series is planned for a total of 8 installments. Here is a brief summary of each:
| Part | Title | Content |
|---|---|---|
| Part 1 (this article) | The Beginning and the Big Picture | Project motivation, Senrigan service overview, overall architecture |
| Part 2 | The Local GPU Challenge and Defeat | Taking on ELYZA 8B with an RTX 3060, and giving up due to VRAM limitations |
| Part 3 | LoRA and Quantization Explained | Illustrated explanation of the lightweight fine-tuning techniques |
| Part 4 | Stock Prediction on Colab | Trial and error with 3 models — from the Mount Fuji experiment to real data |
| Part 5 | OpenAI API Fine-Tuning | From the pivot in strategy to training completion in just 8 minutes |
| Part 6 | Training Data Design | Integrating 5 data types, data cleaning, and creating ground-truth labels |
| Part 7 | Choosing a Translation LLM | From DeepSeek to ChatGPT — how chasing low costs led to a painful lesson |
| Part 8 | MySQL to Firestore Migration and Production | The RDB-to-NoSQL challenges and cost optimization strategies |
While the focus is on technical content, I also plan to share the decision-making process as an independent developer and the lessons learned from failures along the way.
The Context of Independent Development
There is one thing I want to emphasize.
This project is, through and through, independent development. I do not have access to abundant GPU clusters, nor do I have a team of data scientists. All I had was a laptop with an RTX 3060, Google Colab’s free tier, and some OpenAI API credits.
Within those constraints, “how to realistically leverage LLMs” is the consistent theme running through this entire series. Even without cutting-edge GPUs or large-scale infrastructure, with enough ingenuity, you can integrate LLMs into your own service. I would be happy if I can show you that path.
In the next installment, I will tell the story of my first challenge — attempting fine-tuning on a local GPU (RTX 3060) — and how it ended in spectacular defeat.