Part 7: Choosing a Translation LLM -- From DeepSeek to ChatGPT

Introduction

The series so far has focused primarily on LLM fine-tuning for stock price prediction, but there is another important use of LLMs in the Senrigan service: multilingual support (translation).

I wanted to translate news collected in Japanese and the prediction results into English so that overseas users could also use the service. In the process of selecting a translation LLM provider, I was drawn to DeepSeek’s low cost — and ended up facing unexpected problems.

“Choosing an LLM provider based on cost alone will come back to bite you on quality” — that is the conclusion of this article.


The Motivation for Multilingual Support

The Senrigan service initially operated in Japanese only. However, I decided to add English support to broaden the user base.

There are primarily three items that need translation:

  1. News titles: Headlines of stock market news
  2. News summaries: Summaries of news article content
  3. Prediction reasoning: Explanations of why the AI made a particular prediction

These are generated daily and in significant volume. Manual translation was not realistic — automated LLM-based translation was essential.


Why I Initially Chose DeepSeek

The first translation LLM provider I chose was DeepSeek. The reasons were straightforward.

  1. Low cost: Significantly cheaper than OpenAI
  2. OpenAI-compatible API: The API interface is compatible with OpenAI’s. I only needed minor code modifications to use it
  3. Reasoning ability: The deepseek-reasoner model had a reputation for strong reasoning capability

The low cost was particularly attractive. Since I needed to translate dozens to hundreds of news articles daily, the per-article cost would make a large difference over time.

The API call code looked like this — almost identical to OpenAI’s.

// DeepSeek API call
$url = 'https://api.deepseek.com/chat/completions';
$headers = [
    'Content-Type: application/json',
    'Authorization: Bearer ' . $api_key // DeepSeek API key
];
$data = [
    'model' => 'deepseek-reasoner',
    'messages' => [
        ['role' => 'user', 'content' => $prompt]
    ],
    'stream' => false,
    'max_tokens' => 2000
];

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));
curl_setopt($ch, CURLOPT_TIMEOUT, 300); // 300-second timeout

$response = curl_exec($ch);
curl_close($ch);

Thanks to the OpenAI-compatible API, it worked just by changing the URL and model name. Integration was smooth.

But the problems started from there.


Problem 1: Painfully Slow Processing

The first issue I noticed was response latency.

When I sent a translation request, it sometimes took tens of seconds to several minutes before a response came back. Even setting the cURL timeout to 300 seconds (5 minutes) was not always enough.

curl_setopt($ch, CURLOPT_TIMEOUT, 300); // Even 300 seconds sometimes wasn't enough

What made this worse was that when a timeout occurred, the translated text would be cut off mid-sentence. Translations would end with something like “In the second quarter earnings, this company…” with the crucial conclusion missing.

When translating hundreds of news articles in a batch, several minutes per article means the total batch processing time becomes enormous. Since these batches are scheduled via crontab, there was a risk of the next batch starting before the previous one had finished.
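At minimum, the truncation itself can be detected before a half-finished translation reaches the database. A minimal sketch of the idea (in Python for brevity; it relies on the OpenAI-compatible response format, where `finish_reason` is `"stop"` on normal completion and `"length"` when output was cut off by `max_tokens`):

```python
def is_complete(response: dict) -> bool:
    """Return True only if the API reports the output finished normally.

    OpenAI-compatible APIs set finish_reason to "stop" on normal
    completion and "length" when the output hit max_tokens.
    """
    choice = response.get("choices", [{}])[0]
    if choice.get("finish_reason") != "stop":
        return False
    # Extra heuristic: a translation cut off by a client-side timeout
    # usually ends mid-sentence, with no terminal punctuation.
    text = choice.get("message", {}).get("content", "").strip()
    return text.endswith((".", "!", "?", "\u3002"))  # \u3002 = Japanese full stop
```

A batch can then retry or flag items where this check fails instead of storing a translation that ends mid-sentence.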


Problem 2: Chinese Text Leaking In

The second problem was that Chinese characters would appear in the translation output.

Despite requesting Japanese-to-English translation, parts of the output would sometimes come out in Chinese. This was particularly noticeable with financial terminology.

For example, when translating the Japanese term for “ordinary profit” into English, the output would sometimes contain Chinese expressions mixed in instead of “ordinary profit.”

This is likely caused by the fact that DeepSeek’s model was trained on a large volume of Chinese data. Since Japanese and Chinese share kanji characters, the model appears to get pulled into “Chinese mode” when handling specialized terms like financial vocabulary.

There was no way to display these translation results directly on the service. English translations with Chinese mixed in are incomprehensible to users. Manual review and correction became necessary, which undermined the whole point of automation.
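This kind of leakage is at least easy to screen for automatically: a Japanese-to-English translation should contain no Han (kanji/hanzi) characters at all, so any character in the CJK Unified Ideographs range is a red flag. A minimal sketch (Python; the helper name is mine, not from the actual codebase):

```python
def contains_han(text: str) -> bool:
    """True if text contains any CJK Unified Ideograph (U+4E00-U+9FFF).

    In a Japanese-to-English pipeline, any Han character in the
    *output* means the model drifted back into Japanese or Chinese.
    """
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)
```

Translations that trip this check can be automatically re-run rather than published.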


Problem 3: Input Data Leakage

The third problem was the most serious.

In the Senrigan service, I was including “market trending themes ranking” data as input when generating prediction reasoning. This information was meant to enrich the context of predictions, but I discovered that DeepSeek was outputting this input data verbatim.

In other words, data I passed in with “please use this information as reference to generate the prediction reasoning” was appearing directly as part of the prediction reasoning output. Internal ranking data was being displayed in a user-facing format.

This required immediate action.

// 20250329: Removed (DeepSeek was exposing this data)
// $themes = get_market_themes($db, $date_only);
$themes = [];

I removed the market theme ranking input entirely. This was not a fundamental fix, but a measure to avoid the problem.
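A more general safeguard, had I kept the ranking input, would be to scan the generated reasoning for verbatim chunks of the reference data before publishing. A hedged sketch of such a guard (Python; the helper name and the `min_len` threshold are my own illustration, not the service's actual code):

```python
def echoes_input(output: str, reference_lines: list[str], min_len: int = 10) -> bool:
    """True if any sufficiently long reference line appears verbatim in
    the model output -- a sign the LLM copied internal data through."""
    for line in reference_lines:
        line = line.strip()
        if len(line) >= min_len and line in output:
            return True
    return False
```

Items flagged by the guard could be regenerated or withheld instead of being shown to users.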


Unifying Under ChatGPT

With three problems piling up, I decided to abandon DeepSeek.

I unified all LLM processing under OpenAI (ChatGPT).

// DeepSeek -> Commented out
// $api_result = call_deepseek_api($db, $prompt, $result);

// Unified to ChatGPT (with retry)
$summary = call_chatgpt5($prompt, LLM_TYPE_CHATGPT_5NANO);

After switching to ChatGPT, all three problems were resolved.

  • Processing speed: Response time improved to a few seconds to around ten seconds. No more timeout concerns
  • Chinese text leakage: Completely eliminated
  • Input data leakage: The model faithfully follows prompt instructions and no longer outputs input data verbatim

Current Translation Pipeline

After unifying under ChatGPT, the service operates with the following model configuration.

Process                         | Model                    | Purpose
Prediction reasoning generation | gpt-5-nano               | Analyzing news and generating stock prediction reasoning
News translation                | gpt-5-mini               | Japanese-to-English translation of titles, summaries, and prediction reasoning
MSI news summarization          | gpt-5-mini               | News summarization for market sentiment indicators
Stock prediction                | gpt-4o-mini (fine-tuned) | Prediction using the fine-tuned model

The choice between gpt-5-nano and gpt-5-mini is based on the balance between processing demands and cost. Prediction reasoning generation is relatively lightweight, so nano suffices. Translation and summarization require higher quality output, so mini is used.

Translation Processing Flow

The translation batch (translate_english) works as follows:

// Fetch news to be translated (untranslated items)
$sql = "SELECT code, target_date, type, title, contents_summary, predict_reason
        FROM v_firestore_news
        WHERE COALESCE(is_translated_news, 0) = 0";

foreach ($rows as $row) {
    // Translate title
    $title_en = call_chatgpt5(
        $row['title'],
        LLM_TYPE_CHATGPT_5MINI,
        $system_prompt_title
    );

    // Translate summary
    $contents_summary_en = call_chatgpt5(
        $row['contents_summary'],
        LLM_TYPE_CHATGPT_5MINI,
        $system_prompt_summary
    );

    // Translate prediction reasoning
    $predict_reason_en = call_chatgpt5(
        $row['predict_reason'],
        LLM_TYPE_CHATGPT_5MINI,
        $system_prompt_reason
    );

    // Save translation results to DB
    // ...
}

The title, summary, and prediction reasoning are each translated individually, and the results are saved to MySQL.


Provider Switching Mechanism

Having gone through the DeepSeek-to-ChatGPT migration, I implemented a provider switching mechanism to prepare for the possibility of switching to yet another provider in the future.

// Switch translation provider via environment variable
$provider = getenv('TRANSLATE_PROVIDER');
if ($provider !== 'openai' && $provider !== 'ms' && $provider !== 'both') {
    $provider = 'openai'; // Default is OpenAI
}

The TRANSLATE_PROVIDER environment variable can toggle between three options:

Value  | Behavior
openai | Translate with OpenAI (ChatGPT) (default)
ms     | Translate with Microsoft Translator API
both   | Translate with both, prioritizing OpenAI results
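The dispatch logic for the three values can be sketched as follows (Python pseudocode of the idea; `translate_openai` and `translate_ms` are stand-ins for the real API wrappers, not the service's actual function names):

```python
# Placeholder backends -- in the real service these wrap the
# ChatGPT and Microsoft Translator API calls.
def translate_openai(text: str) -> str:
    return f"[openai] {text}"

def translate_ms(text: str) -> str:
    return f"[ms] {text}"

def translate(text: str, provider: str = "openai") -> str:
    """Dispatch to the configured translation provider.

    "both" calls OpenAI first and falls back to Microsoft
    Translator only when OpenAI returns an empty result.
    """
    if provider == "ms":
        return translate_ms(text)
    if provider == "both":
        openai_result = translate_openai(text)
        return openai_result if openai_result else translate_ms(text)
    return translate_openai(text)  # "openai" and any unknown value
```

Treating unknown values as "openai" mirrors the PHP default above, so a typo in the environment variable degrades safely.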

I Also Considered Microsoft Translator

I felt it was risky to rely solely on LLM-based translation, so I also evaluated the Microsoft Translator API. Microsoft has a long track record in traditional machine translation and offers high stability.

However, Microsoft Translator struggles with context-aware, natural translations. For financial news translation, you need more than simple word-for-word translation — the result must read naturally to an investor. Terms like “upward revision” need to be rendered in contextually appropriate English, not translated mechanically.

Ultimately, OpenAI was superior in terms of quality, so it is the only provider currently in use. Microsoft Translator remains in the codebase as a fallback mechanism.


The Cost Reality

Migrating from DeepSeek to ChatGPT did increase translation costs. However, when you factor in the cost of dealing with quality issues (manual review, correction work, bug fixes), the total cost may actually be lower with ChatGPT.

Factor                | DeepSeek                                    | ChatGPT
API fees              | Cheap                                       | Slightly higher
Processing speed      | Slow (longer batch times)                   | Fast
Quality checks        | Required (Chinese leakage, truncated text)  | Rarely needed
Manual corrections    | Frequent                                    | Almost none
Operational stability | Low (frequent timeouts)                     | High

Especially for independent development, “slightly more expensive but stable” is overwhelmingly easier than “cheap but unreliable.” Time spent investigating and fixing quality issues is a direct loss of development resources.


Prediction Reasoning Generation

Beyond translation, I also use LLMs for generating prediction reasoning.

The Senrigan service displays not just the predicted stock price but also “why the AI made that prediction.” This reasoning complements the prediction output from the fine-tuned model (gpt-4o-mini FT).

// Generate prediction reasoning
$prompt = "Based on the following news and company information, explain the reasoning "
        . "for the next-day stock price prediction.\n\n"
        . "News: " . $news_title . "\n"
        . "Summary: " . $news_summary . "\n"
        . "Company: " . $company_name . "\n"
        . "Industry: " . $industry;

$summary = call_chatgpt5($prompt, LLM_TYPE_CHATGPT_5NANO);

gpt-5-nano is used for prediction reasoning generation. The prediction itself is handled by the fine-tuned model, while a general-purpose model handles the explanation — a division of labor.

Since this prediction reasoning is also translated into English, the quality of the translation pipeline directly impacts the user experience.


Retry Mechanism

LLM APIs are not always stable. Temporary network issues or delays due to API server load can occur.

For this reason, API calls include a retry mechanism.

// Generate summary with call_chatgpt5 (max 3 retries)
$max_retries = 3;
for ($retry = 0; $retry < $max_retries; $retry++) {
    $summary = call_chatgpt5($prompt, LLM_TYPE_CHATGPT_5NANO);
    if ($summary !== false && $summary !== null && trim($summary) !== '') {
        break; // Exit loop on success
    }
    echo "Retry " . ($retry + 1) . "/{$max_retries}...\n"; // arithmetic is not allowed inside PHP string interpolation
    sleep(2); // Wait briefly before retrying
}

During the DeepSeek era, timeouts were frequent, making the retry mechanism essential. Since switching to ChatGPT, retries occur far less frequently, but I keep the mechanism in place as a safety net.
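If retries ever become frequent again, the fixed two-second wait can be upgraded to exponential backoff, which is gentler on an already-struggling API. A sketch of the idea (Python; `call_llm` is a stand-in for the real API wrapper, and the parameters are illustrative):

```python
import time

def call_with_retry(call_llm, prompt, max_retries=3, base_delay=2.0):
    """Retry an LLM call, doubling the wait after each failure
    (base_delay, 2*base_delay, 4*base_delay, ...).

    Returns the first non-empty result, or None if all attempts fail.
    """
    for attempt in range(max_retries):
        result = call_llm(prompt)
        if result is not None and str(result).strip() != "":
            return result
        if attempt < max_retries - 1:
            time.sleep(base_delay * (2 ** attempt))  # back off before retrying
    return None
```

The same structure drops straight into the PHP loop above by replacing `sleep(2)` with a delay derived from the attempt counter.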


Lessons Learned from LLM Provider Selection

Here are the lessons I took away from the DeepSeek-to-ChatGPT migration.

1. Do Not Choose Based on Cost Alone

Low cost is attractive, but the decision should be based on total cost, including the cost of dealing with quality issues. For processes like translation where quality is directly visible to users, compromising on quality is fatal.

2. Training Data Language Balance Matters for Multilingual Processing

DeepSeek’s Chinese leakage problem was rooted in the language balance of the model’s training data. The shared kanji characters between Japanese and Chinese likely triggered unexpected language switching. When doing multilingual processing, the languages a model was trained on is an important selection criterion.

3. Build Provider Switching Into Your Architecture

The LLM landscape is evolving rapidly. Today’s optimal solution may not be optimal tomorrow. Designing your system so that providers can be swapped easily ensures you can migrate quickly when a better model comes along.

4. Quality Testing on Production Data Is Essential

Problems I did not notice during testing (Chinese leakage, input data exposure) only surfaced amid the diversity of production data. It is important to verify quality not just with test data, but at production scale.


Summary

Key takeaways from the translation LLM selection experience:

  • Adopted DeepSeek as the translation LLM but encountered 3 problems (latency, Chinese text leakage, data exposure)
  • Unified under ChatGPT (gpt-5-nano / gpt-5-mini) to resolve all issues
  • Implemented a translation provider switching mechanism (TRANSLATE_PROVIDER)
  • Also evaluated Microsoft Translator as a fallback option
  • LLM providers should be selected based on the balance of quality, stability, and multilingual capability — not cost alone

Next time is the final installment. I will cover the data migration from MySQL to Firestore and the production operation of the Senrigan service.


Previous: Part 6 — “Training Data Design”

Next: Part 8 — “MySQL to Firestore Migration and Production: The Road from RDB to NoSQL”
