Return to Teaching

Advanced Text Algorithms

Description

Text algorithms are essential in many areas of science and information processing. Very often the information is represented as a written text in the form of, e.g., newspapers, books, computer hard-drives, optical disks, the genetic information of living organisms, etc. Specific text algorithms are needed when processing the information, whether in text editors or in (web)-searching engines. The intricacies of text algorithms represents a major topic of investigation in computer science. This course will present several advanced, practically relevant text algorithms and their interconnections. They include pattern matching and data compression algorithms, detection of periodicities, repetitions and symmetries in texts, and other algorithms.

Content

  • Pattern-matching algorithms: Knuth-Morris-Pratt algorithm, Boyer-Moore algorithm
  • String alignment algorithms: edit distance, optimal alignment, longest common subsequence, alignments with gaps
  • Approximate patter-matching algorithms
  • Suffix trees and text search applications
  • Symmetries and repetitions in texts
  • Constant-space algorithms
  • Text compression algorithms

Prerequisites

  • Algorithmics
  • Data structures
  • Note:
    • the prerequisites are preferable, though not compulsory. The course is self-contained, all the background will be introduced

Course format

  • Lectures: 28h, 14 lectures (including several exercise sessions)
  • Final examination:
    • 15 points (without bonus from exercises) – pass, mark 1/5
    • over 27 points -, mark 5/5
    • Bonus points
      • a maximum of 3 bonus points in the exam can be gained from completing 80% of the exercises
      • 1 bonus point can be gained for completing the custom-made “feedback form” (to be electronically filled, printed, and collected “on paper” only on 18- and 20.12.2018)

Literature

  1. Maxime Crochemore, Wojciech Rytter, Jewels of stringology, World Scientific, 2002.
  2. Maxime Crochemore, Christophe Hancart, Thierry Lecroq, Algorithms on strings, Cambridge University Press, 2001.
  3. Dan Gusfield, Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, 1999.

Credits

  • 5sp

Time schedule

  • Start date: 30st of October, 2018
  • End date: 18th of December, 2018
  • Tuesdays:
    • 15-17, 115A, Agora
  • Thursdays:
    • 10-12, 115A, Agora
  • Course exam:
    • 20th of December, 2018
    •  ( Dept.’s Exam date in January, )

 

Lecturer

  • Eugen Czeizler,
  • Faculty of Science and Engineering,
  • Åbo Akademi University,
  • firstname.lastname at abo.fi

 

Lecture Slides

(preliminary, to be updated during the course)

 

Exercises

 

Exercises_1: Exercise Set 1 (due 8.11.2018)

Exercises_2 Exercise Set 2 (due 27.11.2018)

 

Notes:

 

NOTE: Tuesday, 11.12.2018, there will be no lecture/exercises.

Advanced text algorithms – 2016

Description Text algorithms are essential in many areas of science and information processing. Very often the information is represented as a written text in the form of, e.g., newspapers, books, computer hard-drives, optical disks, the genetic information of living organisms, etc. Specific text algorithms are needed when processing the information, whether in text editors or …

Projects

Project1: Implement either Knuth-Morris-Pratt (5 points) or Boyer-Moore (7 points) algorithm for pattern matching. Run your algorithm on the following input. Take text from here (by CNN). Each student has his own pattern to match. Find your pattern from the table below. Your program should report all matches for the pattern and total number of comparisons performed. Bonus …