L548: Computer Programming for Information Management
School of Library and Information Science
Indiana University
Fall 1999
Instructor: Uta Priss
Email: upriss@indiana.edu
Office: 029 SLIS
Phone: 812-855-2793
Office hours: Wednesday 2.00 - 3.00 or by appointment
Course Syllabus
Some class-related links:
- Student projects
- Some information on Ella
- Documentation for the CGI Perl module
- SQL Quick Reference (for students who will use ORACLE for their projects)
Introduction
This course introduces basic skills for programming and manipulation of
text-based information systems. Information management is a major task for
librarians and information professionals, who are asked to extract information
from sources on the WWW, design interactive text-based web interfaces to
information systems, use text that is (or should be) stored in a markup
format, and preprocess information for storage in databases.
This course teaches computer-based approaches to these tasks.
Currently the class is taught using Perl/CGI. Perl provides a good
introduction to general programming concepts. These concepts include basic
programming structures, such as control structures, file handling and
program design strategies. But they also include more advanced topics,
such as networking, text-based user interfaces, and basic
retrieval concepts. Perl allows rapid prototyping, which is appropriate for
applications in a fast-changing environment such as the WWW. Furthermore, Perl
is well suited to search engines, parsers and markup languages. Students
will develop a small information systems application as a project for this
class. The concepts are therefore not taught abstractly but as hands-on
experiences with WWW applications.
Course Objectives
This course
- teaches basic programming concepts and structures.
- introduces basic information processing and management concepts.
- uses small-scale but realistic examples of information management tasks.
- teaches the basics of Perl and Perl/CGI.
- provides an introduction to more advanced topics such as
object-oriented programming.
Prerequisites
L401 or consent of instructor. Especially important: basic Unix skills,
i.e. understanding of the Unix directory structure and ability to edit and
save files on a Unix computer; ability to create HTML web forms.
Class Organization
The class is taught as a combination of lecture and lab sessions.
The students will work on a semester project either as a team of two
members or individually. The results of the projects will be
presented during the last class session.
Computer Lab
The lab session is taught in GY226 (a Unix lab). All students must
create an account on the Unix Nations cluster at least 24 hours before the
first lab session.
If students want to practise in the Unix lab during other times,
they should first check the on-line availability schedule for the
lab. (Select month and lab "GY226" on this page.)
Readings
Required Textbook:
Randal L. Schwartz & Tom Christiansen: Learning Perl, 2nd Edition,
July 1997, O'Reilly
The book should be treated as an "encyclopedia". It contains detailed
technical information, which sometimes may be difficult to understand
for programming beginners. Some on-line tutorials may serve as
complementary, easier reading: Day 2 on this page or the Perl Tutorial.
Additional resources can be found at www.perl.com and on a local IU
resource page.
Grading
The grades are given according to the SLIS grading standards. Good work
that meets the course expectations will be assigned a grade of B. To get a
higher grade than B, the students must demonstrate above average
comprehension of the course materials, knowledge and/or effort.
The final course grade will be computed for each student on the basis
of grades assigned for the following:
| Class contribution and listserv discussions | 1/5 |
| Project                                     | 2/5 |
| Final exam                                  | 2/5 |
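As a rough illustration of the weighting above, the following Python sketch combines three component grades into one course grade. The numeric grade-point values and the name `course_grade` are assumptions made for this example only: the syllabus does not say how letter grades are converted to numbers, and the course itself is taught in Perl.

```python
# Weights taken from the grading table: 1/5, 2/5 and 2/5.
WEIGHTS = {
    "contribution": 1 / 5,
    "project": 2 / 5,
    "final_exam": 2 / 5,
}


def course_grade(points):
    """Return the weighted average of the component grade points.

    `points` maps each component name to a numeric grade on a
    hypothetical 4.0-style scale (an assumption, not a SLIS rule).
    """
    return sum(WEIGHTS[name] * points[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"contribution": 4.0, "project": 3.0, "final_exam": 3.5}
    print(round(course_grade(example), 2))
```

Because the project and the final exam each carry twice the weight of class contribution, a weak result in either one lowers the course grade twice as much as an equally weak contribution grade would.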
Each student is expected to complete all course work by the end of the
term. A grade of incomplete (I) will be assigned only if exceptional
circumstances warrant. Late work will be accepted only at the discretion
of the instructor and in every case will be automatically downgraded by
1/3 grade (e.g., a B+ becomes a B, a B- becomes a C+, etc.).
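The late-work rule can be read as moving one step down an ordered list of letter grades. The Python sketch below shows that reading; the full grade list and the name `downgrade_late` are assumptions for illustration (the syllabus only gives the B+ and B- examples), and the course itself is taught in Perl.

```python
# Letter grades from best to worst. Only the steps B+ -> B and B- -> C+
# are confirmed by the syllabus; the rest of the ordering is assumed.
GRADES = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "F"]


def downgrade_late(grade):
    """Return the grade one third of a letter below `grade`.

    The lowest grade cannot be reduced further and is returned unchanged.
    """
    position = GRADES.index(grade)
    return GRADES[min(position + 1, len(GRADES) - 1)]


if __name__ == "__main__":
    for grade in ("B+", "B-"):
        print(grade, "becomes", downgrade_late(grade))
```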
Class contribution and listserv discussions
Class contribution consists of class attendance, contributions to class
discussions and to discussions on the majordomo distribution list
(priss_l548@indiana.edu). It is required that every student
demonstrate respect for the ideas, opinions, and feelings of all other
members of the class.
Project
Teams and topics
Students can work on their projects either as a team of two members
or individually. The teams must be formed during the first week of class.
Each project will consist of developing an information processing
or information management tool. The tool should have a CGI-based user
interface.
Examples of projects include: a mail filtering system (allows users to extract
mail messages from a standard Unix mail folder based on certain
preferences), a search engine,
an information extraction tool for webpages (allows users to extract certain
information from a set of webpages), an HTML or XML viewer (displays markup
pages as text with certain formatting), a data preprocessing tool (prepares
data for input into a database or formats the output from a database), a
library access tool (formats user input for searching an electronic library
catalog), and an indexing tool (parses text and identifies important words
based on frequencies). Students may suggest other, similar topics.
Some of these topics require additional knowledge (such as databases or
XML) and should only be chosen by students who have acquired such knowledge
prior to this class. The students should discuss their choice of
topic with the instructor.
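To give a sense of scale for one of the example topics, here is a minimal sketch of the core of an indexing tool: it splits text into words and ranks them by frequency. It is written in Python for brevity, whereas class projects are expected to use Perl/CGI; the stopword list and the name `important_words` are hypothetical and chosen only for this example.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real tool would use a much longer one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for"}


def important_words(text, top_n=3):
    """Return the `top_n` most frequent non-stopwords with their counts."""
    # Lowercase the text and keep runs of letters as words.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = "Perl is a language for text. Text processing in Perl is fun; Perl rocks."
    print(important_words(sample, top_n=2))
```

A complete project would add a CGI form for submitting the text and would treat frequency only as a first approximation of importance.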
Project presentation, assignments and final project report
The students will present their tools during the last lab session
(December 9th, 1999).
Parts of the projects will be handed in as
assignments during the semester (see the Class Schedule).
The final project report is due on
December 9th, 1999. It must contain
information on the purpose, features and limits of the software
and indicate possible future extensions and improvements.
The source code of the tool should not be
included in the documentation but it must be made available
for evaluation by the instructor.
Grading of the projects
A total of up to 100 points will be given for the project.
Each assignment is worth 10 points, the presentation is
worth 10 points and 40 points will be given for the final project report
and the project as a whole.
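The point breakdown can be checked with a few lines of arithmetic. The sketch below (in Python, although the course uses Perl) assumes five graded assignments, a number obtained by counting the "To be handed in" items in the Class Schedule rather than one stated in this section; the constant and function names are made up for the example.

```python
# Assumption: five assignments, counted from the "To be handed in"
# entries in the Class Schedule.
ASSIGNMENTS = 5
POINTS_PER_ASSIGNMENT = 10
PRESENTATION_POINTS = 10
REPORT_AND_PROJECT_POINTS = 40


def max_project_points():
    """Return the maximum number of points available for the project."""
    return (ASSIGNMENTS * POINTS_PER_ASSIGNMENT
            + PRESENTATION_POINTS
            + REPORT_AND_PROJECT_POINTS)


if __name__ == "__main__":
    print(max_project_points())
```

Under that assumption, the assignments account for half of the project points, so work handed in during the semester matters as much as the presentation and final report combined.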
The project will be evaluated according to the following criteria:
- General project: The project should be appropriate
and feasible for Perl/CGI. It should be neither too comprehensive nor
too simple. It should provide a service that is relevant
and interesting for WWW users.
- Performance: The software must run without errors
and must do whatever is claimed in the documentation. The software
should not be too slow; for example, it should not try to download
huge files or graphics from the WWW while the user is waiting.
- Security: The software should not present an open security risk for
the CGI server, i.e. all security rules that are given in class
must be followed. The CGI scripts must not permit sending of
unsolicited email messages. All copyright and privacy rules that
are common law or issued by IU must be followed.
- Usability: The software must be usable by a person with moderate
computer literacy. The web pages and links must contain enough information
so that users know what the software does and what happens if they
follow links or press buttons. The structure of the pages should be
clear and consistent and should contain no redundancies.
The documentation should be clear and concise.
The web pages and documentation should be spell- and grammar-checked.
- Programming style: The source code of the script should be clear
and should have appropriate comments.
Final Exam
The final exam will be a take-home exam consisting of several small
information management tasks for which the students will write
appropriate Perl
scripts. The exam will be distributed at the conclusion of the class
on November 29th and will be due on
Monday, December 13th, 1999,
12.00 pm (Noon). Team work is not allowed for the final exam.
A note on plagiarism
The students must clearly indicate if they use materials from other
sources, such as textbooks or Internet webpages. Full citation information
must be given for such sources. Academic and personal misconduct by
students in this class are defined and dealt with according to the
procedures in the Code of Student Ethics.
Class Schedule
Week 1. Programming basics
Aug. 30, Sept. 2
Topics:
Introduction to information processing tasks; simple Perl programs;
scalar variables
Assignments:
- Read chapter 2 in Learning Perl
- Develop a plan for your information
processing tool: what do you want to accomplish with the tool? Which
components will your tool have? What are possible features and limits?
Find a name for your software tool.
Week 2. Operators, program design and control structures
Sept. 6, Sept. 9
Topics:
Program design; flowcharts; control structures
Assignments:
- Read chapter 4 in Learning Perl
- To be handed in by Sept. 13:
Draw flowcharts for components of your information processing tool.
Email the name of your project and a short description to the discussion
list. Hand in the flowcharts.
Week 3. Arrays, Input/Output, File handling
Sept. 13, Sept. 16
Topics:
Arrays and I/O; file handling
Assignments:
- Read chapters 3 and 6 and pp. 108 - 111 of chapter 10 in Learning Perl
- Chapter 3 exercises 1 and 2
Week 4. CGI I
Sept. 20, Sept. 23
Topics: HTML forms and how to process them with CGI
Assignments:
- Read chapter 19, pages 180 - 186, in Learning Perl
- To be handed in by Sept 27:
Create forms for your project and email
the URL of the forms to upriss@indiana.edu.
Week 5. Regular expressions I
Sept. 27, Sept. 30
Topics: Regular expressions
Assignments:
- Read chapter 7, pp. 76 - 81, in Learning Perl
Week 6. Regular expressions II
Oct. 4, Oct. 7
Topics:
Regular expressions; substitution, transliteration and split
Assignments:
- Read the rest of chapter 7 and chapter 15 in Learning Perl
- Chapter 7 exercise 1
Week 7. Programming in the large
Oct. 11, Oct. 14
Topics:
Functions, modular program design, local and global variables
Assignments:
- Read chapter 8 in Learning Perl
- Chapter 8 exercise 1
- To be handed in by Oct. 18:
Write the main (sub)routine of your project.
Print the source code of your main routine and hand it in.
Week 8. A Perl networking client
Oct. 18, Oct. 21
Topics:
Retrieving documents from the web via Perl
Assignments:
- Read Appendix C in Learning Perl
Week 9. CGI II
Oct. 25, Oct. 28
Topics:
Searching web pages on-line; security
Assignments:
- Read the rest of chapter 19 in Learning Perl
- To be handed in by Nov. 8:
Process the form input for your project in a secure manner.
Print the source code of the subroutine
that processes the form input and hand it in.
Week 10. Hashes
Nov. 4 (Note: there is no lecture on Nov. 1)
Topics:
Hashes (associative arrays); miscellaneous control structures
Assignments:
- Read chapters 5 and 9 in Learning Perl
- Chapter 5 exercises 1 and 2
Week 11. CGI III
Nov. 8, Nov. 11
Topics:
Environment variables, hidden text and cookies
Assignments:
- To be handed in by Nov 15:
Write a two page user manual for your project.
Print the manual and hand it in.
Week 12. The object-oriented paradigm I
Nov. 15, Nov. 18
Topics:
Objects, classes, methods
Assignments:
- Here is an optional reading on object-oriented Perl.
Follow the links: Object-oriented programming, Objects, ...,
Using Modules on that page.
Weeks 13 and 14. The object-oriented paradigm II
Nov. 22, Nov. 29, Dec. 2
Topics:
Class hierarchy, inheritance, polymorphism and encapsulation
Final exam
Week 15. Outlook and Team presentations
Dec. 6: Outlook
Dec. 9: Presentation of projects; project report is due
Final exam is due: Dec. 13