↑ Return to Teaching

Advanced Text Algorithms

Description

Text algorithms are essential in many areas of science and information processing. Very often the information is represented as a written text in the form of, e.g., newspapers, books, computer hard-drives, optical disks, the genetic information of living organisms, etc. Specific text algorithms are needed when processing the information, whether in text editors or in (web)-searching engines. The intricacies of text algorithms represents a major topic of investigation in computer science. This course will present several advanced, practically relevant text algorithms and their interconnections. They include pattern matching and data compression algorithms, detection of periodicities, repetitions and symmetries in texts, and other algorithms.

Content

  • Pattern-matching algorithms: Knuth-Morris-Pratt algorithm, Boyer-Moore algorithm
  • String alignment algorithms: edit distance, optimal alignment, longest common subsequence, alignments with gaps
  • Approximate patter-matching algorithms
  • Suffix trees and text search applications
  • Symmetries and repetitions in texts
  • Constant-space algorithms
  • Text compression algorithms

Prerequisites

  • Algorithmics
  • Data structures
  • Note:
    • the prerequisites are preferable, though not compulsory. The course is self-contained, all the background will be introduced

Course format

  • Lectures: 28h, 14 lectures
  • Exercises: 7 homework+demonstration sessions
  • Final examination:
    • 15 points – pass, mark 1/5
    • over 27 points -, mark 5/5
    • Bonus points from exercises
      • with enough gained bonus points one can skip the final exam, and even obtain the highest mark

Literature

  1. Maxime Crochemore, Wojciech Rytter, Jewels of stringology, World Scientific, 2002.
  2. Maxime Crochemore, Christophe Hancart, Thierry Lecroq, Algorithms on strings, Cambridge University Press, 2001.
  3. Dan Gusfield, Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, 1999.

Credits

  • 5sp

Time schedule

  • Start date: 31st of October, 2017
  • End date: 14th of December, 2017
  • Tuesdays:
    • 15-17, 115A, Agora
  • Thursdays:
    • 10-12, 115A, Agora
  • Course exam:
    • 15th of December, 2017
    • 12th of January, 2018
  • General exam:
    • 16th of February, 2018
    • 13th of April, 2018
  • Check the list below for more details (SUBJECT OF CHANGE):
October 31, 2017
  • October 31, 2017: Lecture 1: Intro
    3:15 pm - 5:00 pm, 115A, Agora,
November 2, 2017
  • November 2, 2017: Lecture 2: Notations and Definitions, Pattern Matching Algorithms
    10:15 am - 12:00 pm, 115A, Agora,
November 7, 2017
  • November 7, 2017: Lecture 3: Pattern Matching Algorithms
    3:15 pm - 5:00 pm, 115A, Agora,
November 9, 2017
  • November 9, 2017: Lecture 4: Pattern Matching Algorithms, Boyer-Moore Algorithm
    10:15 am - 12:00 pm, 115A, Agora,
November 14, 2017
  • November 14, 2017: Lecture 5: Suffix trees
    3:15 pm - 5:00 pm, 115A, Agora,
November 16, 2017
  • November 16, 2017: Lecture 6: Ukkonen algorithm, suffix trees and their applications
    10:15 am - 12:00 pm, 115A, Agora,
November 21, 2017
  • November 21, 2017: Lecture 7: Approximate pattern matching
    3:15 pm - 5:00 pm, 115A, Agora,
November 23, 2017
  • November 23, 2017: Lecture 8:Approximate pattern matching, local alignments and alignments with gaps
    10:15 am - 12:00 pm, 115A, Agora,
November 28, 2017
  • November 28, 2017: Lecture 9:K-difference Inexact Matching and Database Sequence Searching
    3:15 pm - 5:00 pm, 115A, Agora,
November 30, 2017
  • November 30, 2017: Lecture 10: Compression algorithms
    10:15 am - 12:00 pm, 115A, Agora,
December 5, 2017
  • December 5, 2017: Lecture 11: Detecting text regularities
    3:15 pm - 5:00 pm, 115A, Agora,
December 7, 2017
  • December 7, 2017: Lecture 12: Detecting text regularities: palindromes
    10:15 am - 12:00 pm, 115A, Agora,
December 12, 2017
  • December 12, 2017: Lecture 13: Sorting factors in text
    3:15 pm - 5:00 pm, 115A, Agora,
December 14, 2017
  • December 14, 2017: ---
    10:15 am - 12:00 pm, 115A, Agora,

Lecturer

  • Vladimir Rogojin,
  • Faculty of Science and Engineering,
  • Åbo Akademi University,
  • vrogojin at abo.fi

Previous course page

Advanced text algorithms – 2016

Description Text algorithms are essential in many areas of science and information processing. Very often the information is represented as a written text in the form of, e.g., newspapers, books, computer hard-drives, optical disks, the genetic information of living organisms, etc. Specific text algorithms are needed when processing the information, whether in text editors or …

View page »

Projects

Project1: Implement either Knuth-Morris-Pratt (5 points) or Boyer-Moore (7 points) algorithm for pattern matching. Run your algorithm on the following input. Take text from here (by CNN). Each student has his own pattern to match. Find your pattern from the table below. Your program should report all matches for the pattern and total number of comparisons performed. Bonus …

View page »