one official page which can describe the whole article including the main idea, the objective of the article, abstract, introduction and the conclusion.
THE JOURNAL OF FINANCE • VOL. LIX, NO. 3 • JUNE 2004
Is All That Talk Just Noise? The Information
Content of Internet Stock Message Boards
WERNER ANTWEILER and MURRAY Z. FRANK∗
Financial press reports claim that Internet stock message boards can move markets.
We study the effect of more than 1.5 million messages posted on Yahoo! Finance and
Raging Bull about the 45 companies in the Dow Jones Industrial Average and the
Dow Jones Internet Index. Bullishness is measured using computational linguistics
methods. Wall Street Journal news stories are used as controls. We find that stock
messages help predict market volatility. Their effect on stock returns is statistically
significant but economically small. Consistent with Harris and Raviv (1993), disagreement among the posted messages is associated with increased trading volume.
MANY PEOPLE ARE DEVOTING a considerable amount of time and effort creating and
reading the messages posted on Internet stock message boards. News stories
report that the message boards are having a significant impact on financial
markets. The Securities and Exchange Commission has prosecuted people for
Internet messages. All this attention to Internet stock messages caused us to
wonder whether these messages actually contain financially relevant information.1 We consider three specific issues. Does the number of messages posted
or the bullishness of these messages help to predict returns? Is disagreement
among the messages associated with more trades? Does the level of message
posting or the bullishness of the messages help to predict volatility?
The first issue is, does the level of message activity or the bullishness of the
messages successfully predict subsequent stock returns? This is the natural
starting place because a very high proportion of the messages contain explicit
assertions that the particular stock is either a good buy or a bad buy. Of course,
there are a great many previous empirical studies showing how hard it is to
predict stock returns by enough to cover transactions costs. We find that there
is evidence of a small degree of negative predictability even after controlling
for bid–ask bounce. When many messages are posted on a given day, there
is a statistically significant negative return on the next day. This return is
economically very small in comparison to plausible transactions costs.
∗ Both authors are at the Sauder School of Business, University of British Columbia. We would
like to thank Richard Arnott; Jack Cooney; Elizabeth Demers; Rick Green; Alan Kraus; John Ries;
Jacob Sagi; an anonymous referee; and the seminar audiences at UBC, the University of Toronto,
the 2002 Canadian Economics Association, the Nasdaq-Yale-JFM conference, the 2002 Northern
Finance Association, and the 2002 American Finance Association annual meetings for helpful
comments. We thank the SSHRC (501-2001-0034) for financial support. The second author thanks
the B. I. Ghert Family Foundation for financial support. All errors are ours.
1 Weiss (2000) provides a nice discussion of how message boards were regarded in the financial
press as of 2000.
The second issue is whether greater disagreement is associated with more
trades. This question was stimulated by reading the messages. At times the
message boards reflect considerable disagreement, while at other times much
greater consensus is evident.
Financial theory provides two distinct perspectives on disagreement. A traditional hypothesis is that disagreement induces trading. Hirshleifer (1977),
Diamond and Verrecchia (1981), Karpoff (1986), Harris and Raviv (1993) and
others have carried out related theoretical analysis. An alternative perspective
is found in the “no-trade theorem” of Milgrom and Stokey (1982). According
to the “no-trade theorem,” when one person considers trading with another
person, each of them needs to consider why the other person might be willing
to trade at a particular price. In some settings disagreement does not induce
trading; it leads to a revision of market prices and beliefs.
We consider differences of opinion both in contemporaneous regressions and
predictive regressions. We also condition on a variety of other factors. In the contemporaneous regressions disagreement is associated with more trades. However, there is a reversal on the next day such that trading volume is lower than
it otherwise would have been.
The third issue is whether the message boards help to predict volatility. A
remarkable range of sometimes quite odd things are said in the messages. This
leads to the hypothesis that perhaps the people posting the messages are real-world counterparts of the “noise traders” that are so often invoked in financial
theory. In order to test this idea we need to define and to model volatility. The
literature provides a large number of approaches to volatility.
Many recent studies have employed realized volatility instead of focusing on
the squared residuals from a returns regression.2 Andersen et al. (2001) have
used this approach in a study of the firms in the Dow Jones Industrial Average.
We consider the importance of the message boards within a fractionally integrated realized volatility news response function that follows their approach.
We also consider volatility vector autoregression that is related to Andersen
et al. (2002).
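As a rough illustration of the realized volatility idea mentioned above, the sketch below sums squared intraday log returns for one trading day and takes the square root. The function names, the use of log returns, and the sample prices are assumptions made for this example; the excerpt does not state the sampling frequency or the exact estimator the authors use.

```python
import math
from typing import List, Sequence


def log_returns(prices: Sequence[float]) -> List[float]:
    """Log returns between consecutive intraday price observations."""
    return [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]


def realized_variance(intraday_prices: Sequence[float]) -> float:
    """Sum of squared intraday log returns for one trading day."""
    return sum(r * r for r in log_returns(intraday_prices))


def realized_volatility(intraday_prices: Sequence[float]) -> float:
    """Square root of the realized variance."""
    return math.sqrt(realized_variance(intraday_prices))


# Hypothetical intraday prices for a single day (not data from the paper).
example_prices = [100.0, 100.5, 99.8, 100.2]
example_volatility = realized_volatility(example_prices)
```

Unlike the squared residual from a daily returns regression, this measure uses every intraday observation, so a day that ends where it started can still show high volatility.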
The GARCH class of volatility models remains popular. Recent studies such
as Hansen and Lunde (2001) and Engle and Patton (2001) show that it is hard to
beat a GARCH (1,1) within the class of GARCH models. However, there is some
evidence of an asymmetric response between positive and negative shocks, as in
Glosten, Jagannathan, and Runkle (1993). Accordingly, we have also considered
the effect of the message boards within the context of GARCH, EGARCH, and
GJR models. To save space these results are left to the technical appendix.3
2 Realized volatility follows from the work of French, Schwert, and Stambaugh (1987) and
Schwert (1990). It has been given theoretical foundations by Andersen et al. (2001).
3 A technical appendix to this paper is available on the web as a PDF file at http://pacific.commerce.ubc.ca/antweiler/public/noise-1.pdf. The appendix contains additional discussions, results, and robustness checks.
Volatility models are often estimated without exogenous variables. However,
it is well known that trading volume helps forecast volatility (see Jones, Kaul,
and Lipson (1994) for example). Glosten et al. (1993) and Engle and Patton
(2001) also fit models that include the Treasury bill rate as an exogenous factor.
We found evidence supporting the role of trading volume, but for our sample
we did not find any evidence that the Treasury bill rate helped to forecast
volatility. Thus, we include trading volume as an added factor in our volatility
We find that message posting helps to predict volatility. Perhaps due to multicollinearity, adding trading volume tends to reduce the impact of the number
of messages on market volatility. However, this reduction does not cause the
message board effect to vanish. Trading volume is the more important factor
for predicting the market volatility of some firms, while the messages are more
important for predicting the market volatility of other firms. Evidence for an
effect of bullishness or disagreement on volatility is weak.
Why do people post messages on Internet stock message boards? To properly
answer this question requires a theory of communication that also contains a
financial market. Such theories are starting to be developed. DeMarzo, Vayanos,
and Zwiebel (2001) argue that people overweight the opinions of those with
whom they talk. This kind of belief formation process can make it profitable
to be an influential agent. In the equilibrium all agents will want to listen to
other agents who are particularly influential since what they say will affect the
market. The model is of particular interest for our purposes since it provides
an explanation both for why people post messages on message boards, and for
why other people might choose to read the message boards.4
A somewhat different perspective is offered by Cao et al. (2002). They model
the importance of fixed costs of market participation, which implies that potential traders do not always trade. They argue that conversation is then potentially important: “The introduction of conversation among a subset of market
participants may have large effects on the equilibrium. A sidelined investor
who learns that another investor shares a similar signal may decide to participate” (p. 644). If the stock message boards permit this type of communication,
then the prediction is that message posting should be followed by trading. Our
evidence supports this prediction.
The previous literature contains a small number of papers that have examined the ability of message boards to predict stock returns. There are mixed
claims about whether public information on the Internet can predict subsequent stock returns. Our finding that higher message postings predict negative
subsequent returns has not previously been reported. Our result does seem to
be economically small but statistically robust. The previous literature has not
examined the issue of whether differences of opinion in the stock messages are
associated with more trades. We find that differences of opinion are associated
with more trades. Similarly, previous studies have not examined the connection
between stock message posting and stock market volatility.
4 Shiller (2000) also draws attention to the role of conversation, and suggests that information
passed through conversation may play an important role in informational cascades. Hong, Kubik,
and Stein (2002) provide indirect evidence of the importance of word of mouth communication. They
show that mutual fund managers’ trades in a given stock are connected to the trading decisions of
other fund managers located in the same city. They interpret the findings in terms of an epidemic
model of information spread by word of mouth. In contrast to the current paper, they can observe
the traders’ portfolios, but they do not have direct measures of the communication.
Turning to the individual papers in the previous literature, the first study
of Internet stock message boards was Wysocki (1999). For the 50 firms with
the highest posting volume between January and August 1998, he reports that
message posting did forecast next-day trading volume and next-day abnormal
stock returns. Using a broader sample of firms, Wysocki also measured the
cumulative message postings on Yahoo! Finance to July 1, 1998, and studied
the cross-sectional differences between firms in order to determine which firms
had a large number of messages posted. The firms with high message posting
activity were characterized by high market valuation relative to fundamentals;
high short seller activity; high trading volume; and high analyst following but
low institutional holdings.
Bagnoli, Beneish, and Watts (1999) compared the First Call analyst earnings
forecasts to unofficial “whispers.” The whispers were collected from a number of
sources including Internet web pages and news stories that reported the whisper forecasts. The analysts from First Call tended to underestimate corporate
earnings announcements, while the whispers tended to be more accurate.
In a study of stocks in the Internet service sector, Tumarkin and Whitelaw
(2001) found that the messages did not predict industry adjusted returns or
abnormal trading volume. Das and Chen (2001) is devoted to the development
of a new natural language algorithm to classify stock messages. They illustrate
its application on a selected sample of nine firms during the last quarter of
2000. They find that the stock messages reflect information rapidly but do not
forecast stock returns. Dewally (2000) collected stock recommendations from
two newsgroups (misc.invest.stocks and alt.invest.penny-stocks). He found that
there was no predictive content in the forecasts on these newsgroups. The recommended stocks typically had strong prior performance.
The rest of the paper is organized as follows. Section I discusses the messages
and how we extracted information from the texts. In Section II we describe a
number of the basic features of the data. The predictability of stock returns
is considered in Section III. Section IV presents the volatility results. Trading
volume is considered in Section V. Both the effect of disagreement on trading
volume and the predictability of trading volume are studied. The role of the
Wall Street Journal is considered in Section VI. We conclude in Section VII.
I. Message Board Data and Classification
During 2000, Yahoo! Finance and Raging Bull provided two of the largest and
most prominent sets of message boards. The sample of stocks being studied comprises
the 45 firms that together made up the Dow Jones Industrial Average (DIA)
and the Dow Jones Internet Commerce Index (XLK). These firms were fairly
large and well known.
Messages were downloaded from the Yahoo! Finance (YF) and Raging Bull
(RB) message boards using specialized software written by the authors. Downloaded messages were stored in a simple plain-text database format, one file
per day per company. Each message is uniquely identified by the bulletin board
code (YF or RB), the company’s ticker symbol, and the message board sequence
number. The file contents were then summarized in an index file that also lists
the date and time of posting, the message’s length in words, and the screen
name of the originator of the message.
DATE 2000/01/25 04:11
TITL ETYS will surprise all pt II
TEXT ETYS will surprise all when it drops to below 15$ a pop, and even then
TEXT it will be too expensive.
TEXT If the DOJ report is real, there will definately be a backlash against
TEXT the stock. Watch your asses. Get out while you can.
DATE 2000/03/29 11:39
TITL BUY ON DIPS – This is the opportunity
TEXT to make $$$ when IBM will be going up again following this profit taking
TEXT bout by Abbey Cohen and her brokerage firm.
TEXT IBM shall go up again after today.
Figure 1. Samples of bulletin board messages.
To understand the nature of the postings it is helpful to look at examples.
Figure 1 provides two fairly typical examples of messages in the database format. Each message is dated and timed to the minute, has a title, and has a
text. The text very often contains a predicted price change and at least some
explanation for the prediction. Most of the explanations are fairly short. The
number of words in a message is most frequently between 20 and 50. Relatively
few messages have more than about 200 words. It is fairly rare for a message
to have more than 500 words. More than 40% of the messages are posted by
people who post only a single message.5 However, there are some people who
are very active and account for more than 50 messages each.
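The record layout shown in Figure 1 is simple enough to parse in a few lines. The following is a minimal sketch, assuming that every message begins with a DATE line and that consecutive TEXT lines belong to the same message; the `Message` class and the function names are hypothetical and are not the authors' software.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Message:
    posted: datetime
    title: str
    text: str


def parse_messages(raw: str) -> List[Message]:
    """Parse DATE/TITL/TEXT records; each DATE line starts a new message."""
    messages: List[Message] = []
    for line in raw.splitlines():
        tag, _, value = line.partition(" ")
        if tag == "DATE":
            posted = datetime.strptime(value.strip(), "%Y/%m/%d %H:%M")
            messages.append(Message(posted, "", ""))
        elif tag == "TITL" and messages:
            messages[-1].title = value.strip()
        elif tag == "TEXT" and messages:
            # Join continuation lines of the body with single spaces.
            current = messages[-1]
            current.text = (current.text + " " + value.strip()).strip()
    return messages


def word_count(message: Message) -> int:
    """Message length in whitespace-separated words (body text only)."""
    return len(message.text.split())


# Second sample message from Figure 1.
SAMPLE = """DATE 2000/03/29 11:39
TITL BUY ON DIPS – This is the opportunity
TEXT to make $$$ when IBM will be going up again following this profit taking
TEXT bout by Abbey Cohen and her brokerage firm.
TEXT IBM shall go up again after today."""

parsed = parse_messages(SAMPLE)
```

Whether the authors counted title words toward message length is not stated in this excerpt, so `word_count` covers the body text only.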
5 We only observe the chosen screen name rather than the author’s actual name. Therefore, if one
author posts messages using more than one screen name we count these as if they were separate
Our message boards data contain more than 1.5 million text messages—far
too many to interpret manually. In order to assess the content of the stock messages we employ well established methods from computational linguistics. The
oldest algorithm used to interpret text is called Naive Bayes. We use this classic
algorithm as our main case. Another algorithm called Support Vector Machine
has become very popular for use in many classification problems, including text
classification.6 In order to ensure robustness we carry out all tests using both
algorithms to measure the messages. Both algorithms are used to code the
individual messages as bullish, bearish, or neither. The two algorithms produce quite similar results and so we only report the Naive Bayes results in the
A. Naive Bayes Message Coding
The Naive Bayes algorithm is the oldest of the algorithms used to classify
documents. Lewis (1998) provides a perspective on the history of the algorithm.
It continues to be among the most successful natural language algorithms.
For Naive Bayes text classification we have employed the Rainbow package
developed by McCallum (1996).7 The key assumption underlying the Naive
Bayes classification method is that occurrences of words are independent of
each other. The assumption of independence among words is the reason that
the algorithm is referred to as “naive.” Even though this is a highly unrealistic
assumption, Naive Bayes performs rather well in practice.8
In the context of text classification, Naive Bayes can be understood most
easily as a straightforward mechanism of updating odds ratios. Consider a
stream of words $W_i$ that are found either in a message of type $T$ or its anti-type $\tilde{T}$. Let $m_i$ be the number of occurrences of this word in type $T$, and let
$\tilde{m}_i$ be the number of occurrences in anti-type $\tilde{T}$. Further, let $n_i$ and $\tilde{n}_i$ denote
the total number of words in classes $T$ and $\tilde{T}$, respectively. For words found
in messages from the training set we observe the conditional probabilities
$P(W_i \mid T) = m_i/n_i$ and $P(W_i \mid \tilde{T}) = \tilde{m}_i/\tilde{n}_i$. Now consider Bayes’ rule, updating our
prior $P(T \mid W_{i-1})$ to posterior $P(T \mid W_i)$ when we observe word $W_i$ and thus $P(W_i \mid T)$
and $P(W_i \mid \tilde{T})$:
$$P(T \mid W_i) = \frac{P(T \mid W_{i-1})\,P(W_i \mid T)}{P(T \mid W_{i-1})\,P(W_i \mid T) + \bigl(1 - P(T \mid W_{i-1})\bigr)\,P(W_i \mid \tilde{T})}$$
That is easily rewritten in odds-ratios form as
$$\frac{P(T \mid W_i)}{1 - P(T \mid W_i)} = \frac{P(T \mid W_{i-1})}{1 - P(T \mid W_{i-1})} \cdot \frac{P(W_i \mid T)}{P(W_i \mid \tilde{T})}$$
with $P(T \mid W_0) \equiv P(T)$. Classifying a document thus amounts to multiplying odds
ratios when processing the document word by word. For reasons of computational accuracy, it is however common practice to add up logs of odds ratios,
$$\frac{P(T \mid W_N)}{1 - P(T \mid W_N)} = \frac{P(T)}{1 - P(T)} \exp\left[\sum_{i=1}^{N} \ln \frac{P(W_i \mid T)}{P(W_i \mid \tilde{T})}\right]$$
where $N$ is the number of words in a given document.9
6 We discuss the Support Vector method in the technical appendix.
7 This software can be downloaded freely for academic purposes from the web at
8 This approach, an example of a “bag of words” approach to text classification, makes no direct
use of the grammatical structure. As an empirical matter it has been found that a surprisingly
small amount is gained at substantial cost by attempting to exploit grammatical structure in the
algorithms. For a helpful discussion of the various approaches to analyzing text see Manning and
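The word-by-word update and its log-odds shortcut can be checked against each other in a few lines. This is a minimal sketch rather than the Rainbow implementation: the function names are hypothetical, the training documents are invented, and words missing from either class are simply skipped, which is an assumption made here to avoid division by zero (a real classifier would normally smooth the counts instead).

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def word_probabilities(documents: Iterable[List[str]]) -> Dict[str, float]:
    """Estimate P(W_i | class) as m_i / n_i from the training documents of one class."""
    counts = Counter(word for doc in documents for word in doc)
    total = sum(counts.values())
    return {word: m / total for word, m in counts.items()}


def bayes_step(prior: float, p_w_type: float, p_w_anti: float) -> float:
    """One application of Bayes' rule for a single observed word."""
    numerator = prior * p_w_type
    return numerator / (numerator + (1.0 - prior) * p_w_anti)


def posterior(words: List[str], prior: float,
              p_type: Dict[str, float], p_anti: Dict[str, float]) -> float:
    """P(T | W_N) obtained by adding log likelihood ratios to the prior log odds."""
    log_odds = math.log(prior / (1.0 - prior))
    for word in words:
        # Assumption: skip words not seen in both classes (no smoothing).
        if word in p_type and word in p_anti:
            log_odds += math.log(p_type[word] / p_anti[word])
    return 1.0 / (1.0 + math.exp(-log_odds))


# Invented training data for a "buy" type and its anti-type.
p_buy = word_probabilities([["buy", "buy", "up", "stock"]])
p_not_buy = word_probabilities([["sell", "stock", "up", "stock"]])
probability = posterior(["buy", "stock"], 0.5, p_buy, p_not_buy)
```

Here only "stock" appears in both classes, with a likelihood ratio of 0.25/0.5, so even prior odds become odds of one half. Summing logs gives the same answer as chaining `bayes_step`, but avoids underflow when a long message multiplies many small probabilities.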
We start by manually classifying a training data set of 1,000 messages. Based
on this training data set, the classification software filters our entire sample
of 1,559,621 messages to obtain buy, hold, or sell signals for each message. We
then aggregate the codings into indices that measure the bullishness of each
stock message board during each time period. For our analysis, we study time
periods of 15 minutes, one hour and one day.
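One way to picture the aggregation step is to floor each message's timestamp to the start of its interval and count the signals that fall into each interval. The sketch below does this for any interval length that divides a day. The helper names are hypothetical, and `net_buy_share` is only an illustrative summary, not the authors' bullishness index, which is not defined in this excerpt.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def bucket_start(ts: datetime, minutes: int) -> datetime:
    """Floor a timestamp to the start of its interval within the same day."""
    minute_of_day = ts.hour * 60 + ts.minute
    floored = (minute_of_day // minutes) * minutes
    return ts.replace(hour=floored // 60, minute=floored % 60,
                      second=0, microsecond=0)


def count_signals(coded: Iterable[Tuple[datetime, str]],
                  minutes: int = 15) -> Dict[datetime, Counter]:
    """Count buy/hold/sell codings per interval (15, 60, or 1440 minutes)."""
    counts: Dict[datetime, Counter] = defaultdict(Counter)
    for ts, signal in coded:
        counts[bucket_start(ts, minutes)][signal] += 1
    return dict(counts)


def net_buy_share(counts: Counter) -> float:
    """Illustrative only: (buy - sell) / (buy + sell), or 0.0 with no such signals."""
    buys, sells = counts["buy"], counts["sell"]
    return (buys - sells) / (buys + sells) if buys + sells else 0.0


# Hypothetical classifier output: (posting time, signal) pairs.
coded_messages = [
    (datetime(2000, 1, 25, 4, 11), "sell"),
    (datetime(2000, 1, 25, 4, 14), "buy"),
    (datetime(2000, 1, 25, 4, 16), "buy"),
]
quarter_hourly = count_signals(coded_messages, 15)
daily = count_signals(coded_messages, 1440)
```

Passing 60 or 1440 as `minutes` yields the hourly and daily series from the same coded messages.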
Usage of the Rainbow software package proceeded in three steps. First we
split the 1,000 messages into buy, sell, and hold messages stored in individual directories. In the second step we run the rainbow utility to process the
messages in the training data set. In the software we use the option settings
“naive bayes” for method, and “1000” for prune-vocab-by-infogain. The latter
restricts the number of words in the vocabulary to the top 1,000 words as
ranked by the average mutual information with the class …