Sentiment Analysis with PHP and Twitter API


Introduction

Can human emotions be captured and quantified from text? This question sits at the intersection of natural language processing and data science, and it has practical implications across many domains:

  • Financial markets: Understanding the relationship between public sentiment and stock price movements.
  • Brand management: Measuring how consumers perceive a company or product in real time.
  • Product feedback: Determining whether responses to a new release are positive or negative at scale.
  • Public policy: Gauging public opinion on government decisions as they unfold.

Twitter (now X) was a natural data source for this kind of analysis. Its short-form, real-time nature means tweets often carry raw, unfiltered emotional signals. In this project, I built a dictionary-based sentiment analysis tool in PHP that collects tweets via the Twitter API, breaks them down through morphological analysis, and scores them against a sentiment polarity dictionary.

The full source code is available on GitHub.


Three Approaches to Sentiment Analysis

Before diving into the implementation, it is worth understanding the landscape of sentiment analysis methods. There are three primary approaches:

1. Dictionary-Based (Rule-Based) Analysis

This method relies on a pre-compiled dictionary where words are assigned numerical polarity scores. Each word in a sentence is looked up in the dictionary, and the scores are aggregated to determine overall sentiment. It is straightforward to implement and requires no training data.

2. Machine Learning

Since the 2010s, machine learning approaches have become widespread. Techniques such as Doc2Vec convert sentences into vector representations, allowing algorithms to learn contextual meaning rather than relying on individual word lookups. This enables more nuanced understanding of sentiment, including sarcasm and context-dependent expressions.

3. Cloud-Based NLP Services

Major cloud providers offer sentiment analysis as a managed service, leveraging large-scale AI models behind the scenes:

  • Google Cloud Natural Language
  • IBM Watson Tone Analyzer
  • Amazon Comprehend
  • Azure Cognitive Services

These services offer high accuracy with minimal setup, but come with API costs and data privacy considerations.

For this project, I chose the dictionary-based approach. It is the most transparent method — you can trace exactly why a sentence received a given score — and it provides a solid foundation for understanding how sentiment analysis works at a fundamental level.


How Dictionary-Based Sentiment Analysis Works

The process follows three steps:

Step 1: Collect Tweets via the Twitter API

The program accepts a search keyword, queries the Twitter Search API, and retrieves recent tweets matching that keyword.

Step 2: Morphological Analysis

Each tweet is decomposed into its constituent words through morphological analysis. This is the process of splitting a sentence into the smallest meaningful units and identifying their grammatical roles.

For example, given the sentence:

“Today is good weather.”

Morphological analysis produces:

| Word    | Part of Speech |
|---------|----------------|
| Today   | Noun           |
| is      | Verb           |
| good    | Adjective      |
| weather | Noun           |

For Japanese text, this step is essential because Japanese does not use spaces between words. The tool MeCab handles this decomposition. For English text, tokenization is more straightforward since words are already space-delimited, though MeCab can still be used with an English dictionary.
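For space-delimited English text, the rough shape of this tokenization step can be sketched without MeCab. This is a simplification for illustration only (it splits on non-letter characters and performs no part-of-speech tagging, and would mangle contractions like "don't"):

```php
<?php
// Naive English tokenizer: split on any run of non-letter characters.
// A stand-in for morphological analysis, for illustration only.
function tokenize($text)
{
    return preg_split('/[^A-Za-z]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(tokenize("Today is good weather."));
```

For Japanese, where no such shortcut exists, MeCab's full analysis is required.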

Step 3: Score Against the Sentiment Dictionary

Each morphologically analyzed word is looked up in a sentiment polarity dictionary. The dictionary assigns a real-valued score between -1 (strongly negative) and +1 (strongly positive) to each word.

The sentiment dictionary used in this project comes from the research of Professor Takamura at Tokyo Institute of Technology. The scores were computed automatically using a lexical network, mapping words from Iwanami dictionary (Japanese) and WordNet (English) to their semantic orientations.

For example, scoring the sentence “Today is good weather”:

| Word    | Score   |
|---------|---------|
| Today   | 0.0205  |
| good    | 1.0000  |
| weather | -0.0171 |
| Total   | 1.0034  |

Since the total score is positive, the sentence is classified as expressing positive sentiment.
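The scoring step above can be sketched in a few lines of PHP. The scores are the ones shown in the table; words without a dictionary entry (here, "is") simply contribute nothing:

```php
<?php
// Minimal sketch of dictionary scoring, using the example scores above.
$dictionary = array(
    'Today'   => 0.0205,
    'good'    => 1.0000,
    'weather' => -0.0171,
);

$total = 0.0;
foreach (array('Today', 'is', 'good', 'weather') as $word) {
    // Words missing from the dictionary contribute 0.
    $total += isset($dictionary[$word]) ? $dictionary[$word] : 0.0;
}

printf("%.4f\n", $total); // 1.0034
```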


Environment Setup

Prerequisites

The project was built on the following stack:

  • CentOS 6.8
  • Apache 2.2
  • PHP 5.6

Installing MeCab (Morphological Analysis Engine)

MeCab is one of the most widely used morphological analysis tools for Japanese. Even for primarily English analysis, installing it provides the tokenization infrastructure the PHP program depends on.

# Install MeCab
cd /usr/local/src
wget "https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7cENtOXlicTFaRUE"
tar xvfz mecab-0.996.tar.gz
cd mecab-0.996
./configure --with-charset=utf8
make
make install

# Verify installation
mecab -v
# mecab of 0.996

# Install IPA dictionary for MeCab
cd /usr/local/src
wget "https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7MWVlSDBCSXZMTXM"
tar zxfv mecab-ipadic-2.7.0-20070801.tar.gz
cd mecab-ipadic-2.7.0-20070801
./configure --with-charset=utf8
make
make install

The PHP extension for MeCab (php-mecab) bridges MeCab’s C library into PHP:

# Install php-mecab extension
cd /usr/local/src
git clone https://github.com/rsky/php-mecab.git
cd php-mecab/mecab/
phpize
./configure --with-php-config=/usr/bin/php-config --with-mecab=/usr/bin/mecab-config
make
make install

# Add extension to php.ini
echo "extension=/usr/lib64/php/modules/mecab.so" >> /etc/php.ini

# Restart Apache
service httpd restart

A quick test to verify MeCab works from PHP:

<?php
$options = array('-d', '/usr/local/src/mecab-ipadic-2.7.0-20070801');
$mecab = new \MeCab\Tagger($options);
$text = "test words";
$node = $mecab->parseToNode($text);

while ($node) {
    echo $node->getSurface() . " => " . $node->getFeature() . "\n";
    $node = $node->getNext();
}

Installing the Sentiment Dictionary

The sentiment polarity dictionary from Professor Takamura’s research page provides the scoring foundation. Each entry maps a word to a float value between -1 and +1, computed through a lexical network algorithm. Download and place it in the project directory.
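Loading the dictionary into memory might look like the sketch below. It assumes colon-delimited lines with the word in the first column and the score in the last (e.g. `good:adjective:0.9`); verify this against the file you actually downloaded, as the Japanese and English editions differ in their column layout:

```php
<?php
// Sketch: load a colon-delimited polarity dictionary into an
// associative array of word => score. Column layout is assumed.
function loadDictionary($path)
{
    $dict = array();
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $cols = explode(':', $line);
        if (count($cols) < 2) {
            continue; // skip malformed lines
        }
        $word  = $cols[0];
        $score = (float) end($cols); // score is the last column
        $dict[$word] = $score;
    }
    return $dict;
}
```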

Configuring the Twitter API

To retrieve tweets programmatically, you need a Twitter Developer account and API credentials. The process involves:

  1. Registering on the Twitter Developer Portal
  2. Creating a new application
  3. Generating your API key, API secret key, Access token, and Access token secret

Twitter API keys and tokens

These four credentials are what your PHP program uses to authenticate with the Twitter API. Store them securely — they grant full access to the API on your behalf.

Installing the Twitter OAuth Library

Twitter’s API requires OAuth authentication. Rather than implementing the OAuth handshake manually, I used the abraham/twitteroauth library, installed via Composer:

# Install Composer
curl -sS https://getcomposer.org/installer | php
mv composer.phar /usr/local/bin/composer

# Install the Twitter OAuth library
cd /path-to-project-directory/
composer require abraham/twitteroauth

Implementation

The full source code is on GitHub. Here I will walk through the key parts.

Configuration (config.php)

The configuration file holds the Twitter API credentials and analysis parameters:

<?php
// Twitter API credentials
define('TWITTER_API_KEY',    'your-api-key');
define('TWITTER_API_SECRET', 'your-api-secret');
define('ACCESS_TOKEN',       'your-access-token');
define('ACCESS_SECRET',      'your-access-token-secret');

// Analysis parameters
define('SEARCH_COUNT', 100);
define('PERMIT_ONLY_ADJECTIVE', 1);

The PERMIT_ONLY_ADJECTIVE flag controls whether the analysis considers all parts of speech or only adjectives. Adjectives tend to carry the strongest emotional signals (“beautiful”, “terrible”, “amazing”), so filtering to adjectives only can reduce noise from emotionally neutral words like nouns and verbs.

Fetching Tweets (main.php)

The core logic creates a TwitterOAuth instance and queries the Standard Search API:

// Create TwitterOAuth instance
$obj = new TwitterOAuth(
    TWITTER_API_KEY,
    TWITTER_API_SECRET,
    ACCESS_TOKEN,
    ACCESS_SECRET
);

// Search for recent tweets matching the keyword
$options = array(
    'q'           => $keyword,
    'lang'        => 'en',
    'result_type' => 'recent',
    'count'       => SEARCH_COUNT
);

$json = $obj->get("search/tweets", $options);

The search/tweets endpoint returns recent tweets matching the query. The lang parameter filters to English tweets, and count sets how many results to retrieve per request (the Standard Search API caps this at 100 per request, subject to a rate limit of 180 requests per 15-minute window on the free tier).
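TwitterOAuth::get() returns the decoded JSON response; for search/tweets, the matching tweets sit under the statuses field. The $json value below is a hand-built stand-in for illustration, but the iteration is the same as over a real response:

```php
<?php
// Stand-in for a search/tweets response (same shape as the real one).
$json = json_decode('{"statuses":[{"text":"good weather today"},{"text":"terrible traffic"}]}');

// Collect the text of each returned tweet for analysis.
$texts = array();
foreach ($json->statuses as $tweet) {
    $texts[] = $tweet->text;
}
```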

Sentiment Scoring

After retrieving tweets, the program:

  1. Feeds each tweet through MeCab for morphological analysis
  2. Extracts words (optionally filtering to adjectives only)
  3. Looks up each word in the sentiment dictionary
  4. Sums the scores to produce an overall sentiment value

$feelings = new Feelings(PERMIT_ONLY_ADJECTIVE);

The Feelings class encapsulates the dictionary lookup and scoring logic. A positive total score indicates overall positive sentiment; a negative score indicates negative sentiment. The magnitude reflects the strength of the expressed emotion.
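The actual Feelings class in the repository is not reproduced here; as a rough illustration of its responsibilities, a scorer of this shape (hypothetical name FeelingsSketch, illustrative dictionary values) might look like:

```php
<?php
// Stripped-down sketch of a Feelings-style scorer: dictionary lookup
// plus the optional adjectives-only filter. Not the repository's code.
class FeelingsSketch
{
    private $dict;
    private $onlyAdjectives;

    public function __construct(array $dict, $onlyAdjectives)
    {
        $this->dict = $dict;
        $this->onlyAdjectives = $onlyAdjectives;
    }

    // $tagged: list of array($word, $partOfSpeech) pairs from
    // morphological analysis.
    public function score(array $tagged)
    {
        $total = 0.0;
        foreach ($tagged as $pair) {
            list($word, $pos) = $pair;
            if ($this->onlyAdjectives && $pos !== 'adjective') {
                continue; // flag set: ignore non-adjectives
            }
            if (isset($this->dict[$word])) {
                $total += $this->dict[$word];
            }
        }
        return $total;
    }
}
```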


Running the Program

Opening index.php in the browser presents a simple interface for entering a search keyword:

Entering a keyword for sentiment analysis

After submitting a keyword, the program fetches matching tweets, runs the morphological analysis and dictionary scoring pipeline, and displays the aggregated result:

Sentiment analysis results

A positive score indicates that the collected tweets about that keyword express overall positive sentiment. A negative score indicates overall negative sentiment. The score’s magnitude reflects how strongly the sentiment leans in either direction.


Limitations and Future Directions

The dictionary-based approach has clear strengths: it is transparent, deterministic, and easy to debug. You can trace exactly which words contributed to a given score. However, it also has significant limitations:

  • No context awareness: The phrase “not good” contains the word “good” (positive), but the sentence is negative. A dictionary-based approach cannot capture negation or sarcasm.
  • Fixed vocabulary: Words not in the dictionary receive no score, effectively being ignored. Slang, neologisms, and domain-specific jargon are invisible to the system.
  • No learning: The system cannot improve with more data. Its accuracy is bounded by the quality and coverage of the dictionary.
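The negation blind spot is easy to demonstrate concretely. With an illustrative dictionary containing only "good", a pure lookup scores "not good" as positive, because "not" has no polarity entry while "good" does:

```php
<?php
// Demonstrating the negation blind spot of pure dictionary lookup.
// Dictionary value is illustrative.
$dictionary = array('good' => 1.0);

$total = 0.0;
foreach (explode(' ', 'not good') as $word) {
    $total += isset($dictionary[$word]) ? $dictionary[$word] : 0.0;
}
// $total is 1.0: classified positive, though the phrase is negative.
```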

For more accurate sentiment analysis, deep learning approaches — such as fine-tuned transformer models — can learn contextual meaning, handle negation, and adapt to new vocabulary. These methods require substantially more infrastructure (training data, GPU compute, model serving), but they represent the current state of the art in NLP.

This project served as a valuable exercise in understanding the fundamentals: how text is tokenized, how sentiment is quantified, and how external APIs can be integrated into a working analysis pipeline. These foundational concepts remain relevant regardless of which technique you ultimately choose.


Source code: github.com/matu79go/sentiment
