About ten years ago, researchers in business economics started using textual analysis methods to analyze the verbal content of documents. In the field of finance, Tetlock (2007) is regarded as one of the “milestone” contributions. The paper analyzes the relation between the tone of the daily Wall Street Journal column “Abreast of the Market” and stock market returns over the subsequent days. In this course, students will learn how textual analysis methods work and how they can be implemented using Python.

The first part will introduce students to prominent papers on textual analysis. The lecturer will discuss the most commonly used methods for textual analysis, e.g., simple word counts and Naïve Bayes.

In the second part, the most commonly used text databases will be presented. For instance, the EDGAR (Electronic Data Gathering, Analysis, and Retrieval) system of the U.S. Securities and Exchange Commission (SEC) will be introduced.

The third and largest part of the course deals with implementing the textual analysis methods introduced in the first part in the programming language Python. In addition, students will use Python to obtain data from the databases introduced in the second part (e.g., from the EDGAR system).

As the third part starts with a general introduction to Python, no prior knowledge of or experience with Python is required.

The course explicitly targets students from all disciplines. Having some basic knowledge of economics is helpful but not required.

For the programming problems, participants need a desktop computer or laptop with the software “Anaconda” installed. “Anaconda” provides a handy user interface for Python and includes additional Python packages. It is available for free at https://www.anaconda.com/download/. Version 3.6 of Python (and not 2.7) is recommended for the course. “Anaconda” is available as a 32-bit or 64-bit version.

Introductory literature

Loughran, T., and B. McDonald (2016). Textual analysis in accounting and finance: A survey. Journal of Accounting Research, 54(4), 1187-1230.

Qualification objective

The students will learn to implement the following procedures in Python:

  • Download documents and files automatically from the internet
  • Edit text documents and search for information in documents using regular expressions
  • Perform a dictionary-based textual analysis
  • Determine measures of readability and document complexity
  • Apply introductory machine learning methods

The overall goal of the course is to provide students with the knowledge and tools to apply the procedures mentioned above to their research projects.
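
To give an impression of what such procedures can look like, the following minimal sketch combines regular expressions, a dictionary-based tone measure, and a crude readability proxy (average words per sentence). It is an illustration under stated assumptions, not course material: the two word lists are hypothetical and far too short for real use, and the function names are invented for this example. Research applications typically rely on published dictionaries.

```python
import re

# Hypothetical mini word lists, for illustration only.
NEGATIVE_WORDS = {"loss", "decline", "weak", "risk"}
POSITIVE_WORDS = {"gain", "growth", "strong", "improve"}

# Regular expressions: a word is a run of letters; a sentence ends
# with one or more of ". ! ?".
WORD_PATTERN = re.compile(r"[A-Za-z]+")
SENTENCE_END_PATTERN = re.compile(r"[.!?]+")


def tokenize(text):
    """Return the lowercase alphabetic words found in the text."""
    return [word.lower() for word in WORD_PATTERN.findall(text)]


def tone(text):
    """Return (positive count - negative count) / total word count.

    Returns 0.0 if the text contains no words.
    """
    words = tokenize(text)
    if not words:
        return 0.0
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    return (positive - negative) / len(words)


def average_sentence_length(text):
    """Return the average number of words per sentence.

    Returns 0.0 if the text contains no sentence with at least one word.
    """
    sentences = [
        part for part in SENTENCE_END_PATTERN.split(text)
        if WORD_PATTERN.search(part)
    ]
    if not sentences:
        return 0.0
    return len(tokenize(text)) / len(sentences)


sample = "Sales growth was strong. The outlook carries risk!"
print(tone(sample))                     # 0.125
print(average_sentence_length(sample))  # 4.0
```

In the sample text, two of the eight words are in the positive list and one is in the negative list, so the tone is (2 - 1) / 8 = 0.125. The text has two sentences, giving an average of four words per sentence.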