L697 - Advanced Topics in Information Systems:
Formal and Relational Concept Analysis

School of Library and Information Science
Indiana University
Summer I 1997

Instructor: Uta Priss
Email: upriss@indiana.edu
Office: 022 SLIS
Office phone: 812-855-2793
Office hours:

Introduction

Formal Concept Analysis is a relatively new, fast-growing field introduced by Rudolf Wille in 1982. Since then more than 250 publications on the subject have appeared, including several textbooks and conference proceedings. It provides a method of formal data analysis that has been applied successfully in many fields, such as medicine, psychology, musicology, linguistic databases, library and information science, software re-engineering, civil engineering, and ecology. (An extensive bibliography of Formal Concept Analysis is available online.) A main advantage of Formal Concept Analysis as a tool for formal data analysis is its ability to produce graphical visualizations of the inherent structures among data. Especially for social scientists, who often handle data sets that cannot be fully captured in quantitative analyses, Formal Concept Analysis extends the scientific toolbox of formal analysis methods. In this sense, statistics and Formal Concept Analysis complement each other.

In the field of information science there is even a further application: the mathematical lattices that are used in Formal Concept Analysis are orderings and can therefore be interpreted as classification systems. This leads to a new understanding of the structure of classification systems which can be controlled by the formal representation. This interpretation of classification structures is compatible with theories among library scientists (such as Bliss and Shera). Furthermore, Ranganathan's `facets' are represented as `scales' in the framework of Formal Concept Analysis. Formalized classification systems can be analysed according to the consistency of their relations. Thesauri can automatically be constructed from classes and their attributes, without having to create a hierarchy of classes by hand. As an example, an on-line library catalog using the Conceptual Diagrams of an automatically constructed class hierarchy has been implemented in the ZIT library in Darmstadt.
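As a minimal illustration of how a class hierarchy can be derived from objects and their attributes rather than built by hand, the following Python sketch computes the formal concepts of a tiny context. The documents, index terms, and function names are hypothetical and chosen only for illustration. The brute-force enumeration is a simple stand-in that is practical only for very small contexts; it is not the method used by the software named in this syllabus.

```python
from itertools import combinations

# Hypothetical formal context: each object (document) with its attributes (index terms).
context = {
    "doc1": {"lattice", "classification"},
    "doc2": {"lattice", "retrieval"},
    "doc3": {"lattice", "classification", "retrieval"},
    "doc4": {"retrieval"},
}
all_attributes = set().union(*context.values())


def extent(attrs):
    """Return the objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def intent(objs):
    """Return the attributes shared by every object in objs."""
    shared = set(all_attributes)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every attribute subset.

    Exponential in the number of attributes, so only suitable for toy examples.
    """
    concepts = set()
    attrs = sorted(all_attributes)
    for size in range(len(attrs) + 1):
        for subset in combinations(attrs, size):
            e = extent(set(subset))
            concepts.add((e, intent(e)))
    return concepts


def is_subconcept(c1, c2):
    """c1 is a subconcept of c2 when the extent of c1 is contained in the extent of c2."""
    return c1[0] <= c2[0]


if __name__ == "__main__":
    # Print concepts from the most general (largest extent) to the most specific.
    for ext, itt in sorted(formal_concepts(), key=lambda c: -len(c[0])):
        print(sorted(ext), sorted(itt))
```

Each resulting pair can be read as a class: the extent lists its members and the intent lists the attributes that define it. Ordering the pairs by extent inclusion yields the hierarchy directly from the data.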

Course Objectives

  1. To introduce a formal method of qualitative data analysis.
  2. To provide practical experience with basic data analysis techniques, such as selection, grouping and scaling of features.
  3. To develop the student's ability to understand the problems involved in the formalization of `informal' data.
  4. To teach practical skills of using the computer software DIAGRAM, ANACONDA, and TOSCANA.
  5. To provide practical experience with techniques of structuring graphical representations.
  6. To provide insights into the formal structure of classification systems.

Class Organization

The course consists of lectures by the instructor and class discussions. About 50% of the class time will be spent in practical training (lab sessions and computer lab sessions). During the first session students will form teams of two to three members. Each team will select a topic from a field of interest (for example, data used for research in another class or research project) that appears suitable for analysis with Formal Concept Analysis. The instructor will provide topics for groups that do not find an appropriate one. The students will develop a formal analysis of their data using the techniques they learn during the semester. In the 11th session the groups will present their results to the class.

Readings

Since the existing textbooks on Formal Concept Analysis are either written for mathematicians or written in German, there will be no textbook for the class. All readings for the class will be put on reserve in the SLIS library. The students are required to make their own photocopies of the two introductory papers (Wille (1996): `Introduction to FCA' and Wolff (1994): `A first course in FCA') and bring them to class, since these two papers will be used as the main material for several sessions. The readings should be read before the session to which they are assigned according to the class schedule.

Grading

The final course grade will be computed for each student on the basis of grades assigned for the following:

Class contribution 1/3
Group Project 1/3
Final Exam 1/3

Each student is expected to complete all course work by the end of the term. A grade of incomplete (I) will be assigned only if exceptional circumstances warrant.

Class contribution

Class contribution refers not to attendance but to the quality and quantity of contributions to the work of the class. Comments and questions are equally valuable if they help to clarify the topics and to move the discussion forward. The assignments and readings of each week must be completed before the class meeting so that substantive and meaningful contributions from the students are possible. Every student is required to demonstrate respect for the ideas, opinions, and feelings of all other members of the class.

Group presentation and project

During the first session students will form teams of two to three members. Each team will select a topic from a field of interest that appears suitable for analysis with Formal Concept Analysis. The students will develop a formal analysis of their data using the techniques they learn during the semester. In the 11th session the groups will present their results to the class. The class presentation will use several representation techniques learned during the semester and contain an interpretation of the results. The presentation will last for 20 to 30 minutes. Groups are encouraged to consult the instructor several times throughout the semester to clarify questions and to discuss ways of solving problems.

Final Exam

The final exam will be a take-home exam consisting of two (small) data sets to be modeled with Formal Concept Analysis and two essay questions. It will be distributed during the 9th session and it will be due at the beginning of the 12th session. Teamwork is not acceptable for the final exam.

Academic Dishonesty

Any assignment that contains plagiarized material or indicates any other form of dishonesty will receive, at a minimum, an automatic grade of F. A second instance will result in an automatic grade of F for the course.

Class Schedule

Session: 1. Formal Modeling of Data

  • How can informal data be formally investigated?
  • Data selection, coding and representation
  • Qualitative versus quantitative data analysis

    Assignment:

    Create a list of problems involved in formal data analysis methods. Assign the problems to the different stages of data analysis (such as data selection, coding and representation). Try to evaluate the severity of each problem.

    Readings:

    Wolff, Karl Erich (1995)
    Comparison of Graphical Data Analysis Methods. Proceedings SoftStat'95. (Hint: It is not necessary to understand the mathematical details of this paper. Study the "main problems" mentioned for the data analysis methods.)
    Schoenemann, Peter H. (1994)
    Measurement: The Reasonable Ineffectiveness of Mathematics in the Social Sciences. In: Borg; Mohler (eds.). Trends and Perspectives in Empirical Social Research. De Gruyter. pp. 149 - 159.
    de Leeuw, Jan (1994)
    Statistics and the Sciences. In: Borg; Mohler (eds.). Trends and Perspectives in Empirical Social Research. De Gruyter. pp. 138 - 148.

    Optional Readings:

    Gigerenzer, Gerd; Murray, David J. (1987)
    Emergence of Statistical Inference. In: Cognition as Intuitive Statistics. Lawrence Erlbaum, Hillsdale, New Jersey. pp. 1 - 28.
    Paulos, John Allen (1988)
    Innumeracy: Mathematical Illiteracy and Its Consequences. New York, Hill and Wang. (Hint: This book is not directly concerned with data analysis, but with basic mathematical operations, such as counting and estimating. The mistakes which Paulos describes are more likely to be found in newspaper statistics than in scientific research.)

Session: 2. Content Analysis - Linguistic and Computational Methods

  • Content analysis
  • Linguistic methods for text (discourse) analysis
  • The software GABEK

    Readings:

    Harris, Mary Dee (1985).
    Introduction to Natural Language Processing. pp 55 - 69 and 98 - 104
    Zelger, Josef (1993).
    A Dialogic Networking Approach to Information Retrieval. Preprint 24, University of Innsbruck.
    Zelger, Josef (1996).
    From Verbal Data to Practical Knowledge. In: Gaul; Pfeifer. From Data to Knowledge. Springer. pp. 458 - 465.
    Groeben, Norbert; Rustemeyer, Ruth (1994).
    On the Integration of Quantitative and Qualitative Methodological Paradigms (Based on the Example of Content Analysis). In: Borg; Mohler (eds.). Trends and Perspectives in Empirical Social Research. De Gruyter. pp. 308 - 325.

Session (Lab): 3. Formal Concepts and Concept Lattices

  • Designing formal contexts
  • Extracting concepts from formal contexts

    Readings:

    Wille, Rudolf (1996).
    Introduction to Formal Concept Analysis. Preprint, TH-Darmstadt. pp 1-4
    Wolff, Karl Erich (1994)
    A first Course in Formal Concept Analysis. Proceedings SoftStat'93. Gustav Fischer Verlag. pp 1-5

Session (Lab): 4. Line Diagrams of Concept Lattices

  • How to draw `nice' line diagrams

    Readings:

    Wille, Rudolf (1996).
    Introduction to Formal Concept Analysis. Preprint, TH-Darmstadt. pp 4-5

Session (Computer lab): 5. ANACONDA and DIAGRAM

  • The software ANACONDA and DIAGRAM
  • Drawing lattices with a computer

    Readings:

    Luksch; Skorsky; Wille (1985)
    On Drawing Concept Lattices with a Computer. In: Gaul; Schader (eds.). Classification as a Tool of Research. North-Holland. pp. 269 - 274.

Session: 6. Facet Theory

  • Facet theory
  • Faceted classification systems
  • Facets as scales in Formal Concept Analysis

    Readings:

    Borg, Ingwer (1994).
    Evolving Notions of Facet Theory. In: Borg; Mohler (eds.). Trends and Perspectives in Empirical Social Research. De Gruyter. pp. 178 - 200.
    Vickery, Brian C. (1972/1966)
    Faceted classification schemes. In: A. F. Painter (ed.). Reader in classification and descriptive cataloguing. NCR Microcard Editions. pp. 107 - 114.

Session (Lab): 7. Conceptual Scaling

  • Conceptual Scaling of many-valued contexts

    Readings:

    Wolff, Karl Erich (1994)
    A first Course in Formal Concept Analysis. Proceedings SoftStat'93. Gustav Fischer Verlag. pp 5-9
    Velleman, Paul; Wilkinson, Leland (1994)
    Nominal, Ordinal, Interval, and Ratio Typologies are Misleading. In: Borg; Mohler (eds.). Trends and Perspectives in Empirical Social Research. De Gruyter. pp. 161 - 177.

    Optional Reading:

    Ganter, Bernhard; Wille, Rudolf (1989)
    Conceptual Scaling. In: Roberts (ed.). Applications of combinatorics and graph theory to the biological and social sciences. Springer, Heidelberg.

Session (Computer lab): 8. Nested Line Diagrams and TOSCANA

  • Nested Line diagrams
  • Citation order
  • The software TOSCANA

    Readings:

    Wille, Rudolf (1996).
    Introduction to Formal Concept Analysis. Preprint, TH-Darmstadt. pp 6-7
    Scheich; Skorsky; Vogt; Wachter; Wille (1993).
    Conceptual Data Systems. In: Opitz; Lausen; Klar (eds.). Information and Classification. Springer, Berlin-Heidelberg-New York.
    Vogt, Frank; Wille, Rudolf (1995)
    TOSCANA - A Graphical Tool for Analyzing and Exploring Data. In: Tamassia; Tollis (eds.). Graph Drawing. Springer, Heidelberg.
    Skorsky, Martin
    TOSCANA Management System for Conceptual Data. Available at:
    http://www.mathematik.th-darmstadt.de/ags/ag1/software/ToscanaDemo/ToscanaDemo.html

Session: 9. Applications of Formal Concept Analysis to Library Classification Systems and the Internet

  • Large scale applications of Formal Concept Analysis
  • The WAVE and the GRIN project
  • The ZIT library catalog

    Readings:

    Kent, Robert; Neuss, Christian (1995)
    Creating a Web Analysis and Visualization Environment. http://wave.eecs.wsu.edu/WAVE/references.html
    Priss, Uta (1997)
    A Graphical Interface for Document Retrieval Based on Formal Concept Analysis. Proc. of the Midwest Artificial Intelligence and Cognitive Science Conference, May 1997. (to appear)

Session: 10. Relational Concept Analysis

  • Quantifiers in semantic relations
  • Graphical representations of relations

    Readings:

    Priss, Uta (1996)
    Relational Concept Analysis: Semantic Structures in Dictionaries and Lexical Databases. Dissertation. pp 42 - 47 and 51 - 57

Session: 11. Applications: Student Presentations


Session: 12. Conclusions

  • Conceptual knowledge systems
  • Attribute exploration
  • Knowledge Processing

    Readings:

    Wille, Rudolf (1992)
    Concept Lattices and Conceptual Knowledge Systems. Computers Math. Applic. Vol. 23. pp. 507 - 514.
    Stumme, Gerd (1995)
    Exploration Tools in Formal Concept Analysis. Preprint 1796, TH-Darmstadt.
    Wille, Rudolf ( ).
    Conceptual Landscapes of Knowledge: A Pragmatic Paradigm of Knowledge Processing.

    Uta Priss
    Fri Feb 21 08:54:31 EST 1997