June 2011

Volume 26 Number 06

UI Frontiers - Principles of Pagination

By Charles Petzold | June 2011

For a couple of decades now, programmers specializing in computer graphics have known that the most difficult tasks do not involve bitmaps or vector graphics, but good old text.

Text is tough for several reasons, most of which relate to the fact that it’s usually desirable for the text to be readable. Also, the actual height of a piece of text is only casually related to its font size, and character widths vary depending on the character. Characters are often combined into words (which must be held together). Words are combined into paragraphs, which must be separated into multiple lines. Paragraphs are combined into documents, which must be either scrolled or separated into multiple pages.

In the last issue I discussed the printing support in Silverlight 4, and now I’d like to discuss printing text. The most important difference between displaying text on the screen and printing text on the printer can be summed up in a simple truth suitable for framing: The printer page has no scrollbars.

A program that needs to print more text than can fit on a page must separate the text into multiple pages. This is a non-trivial programming task known as pagination. I find it quite interesting that pagination has actually become more important in recent years, even as printing has become less important. Here’s another simple truth you can frame and put up on your wall: pagination—it’s not just for printers anymore.

Pick up any e-book reader (or any small device that lets you read periodicals or books, or even desktop book-reading software) and you’ll find documents that have been organized into pages. Sometimes these pages are preformatted and fixed (as with the PDF and XPS file formats), but in many cases pages can be dynamically reflowed (as with EPUB or proprietary e-book formats). For documents that can be reflowed, something as simple as changing the font size can require a whole section of the document to be repaginated dynamically while the user is waiting, probably impatiently.

Paginating on the fly—and doing it quickly—turns a non-trivial programming job into one that can be exceptionally challenging. But let’s not frighten ourselves too much. I’ll build up to the hard stuff over time, and for now will start off very simply.

Stacking TextBlocks

Silverlight provides several classes that display text:

  • The Glyphs element is perhaps the one class with which most Silverlight programmers are least familiar. The font used in Glyphs must be specified either with a URL or a Stream object, which makes the element most useful in fixed-page documents or document packages that rely heavily on specific fonts. I won’t be discussing the Glyphs element.
  • The Paragraph class is new with Silverlight 4 and mimics a prominent class in the document support of the Windows Presentation Foundation (WPF). But Paragraph is used mostly in conjunction with RichTextBox, and it’s not supported in Silverlight for Windows Phone 7.
  • And then there’s TextBlock, which is often used in a simple way by setting the Text property—but it can also combine text of different formats with its Inlines property. TextBlock also has the crucial ability to wrap text into multiple lines when the text exceeds the allowable width.

TextBlock has the virtue of being familiar to Silverlight programmers and suitable for our needs, so that’s what I’ll be using.

The SimplestPagination project (available with the downloadable code for this article) is designed to print plain-text documents. The program treats each line of text as a paragraph that might need to be wrapped into multiple lines. However, the program implicitly assumes that these paragraphs are not very long. This assumption comes from the program’s inability to break paragraphs across pages. (That’s the Simplest part of the SimplestPagination name.) If a paragraph doesn’t fit in the space remaining on a page, the whole paragraph is moved to the next page, and if the paragraph is too large for a single page, then it’s truncated.

You can run the SimplestPagination program at bit.ly/elqWgU. It has just two buttons: Load and Print. The Load button displays an OpenFileDialog that lets you pick a file from local storage. Print paginates it and prints it.

The OpenFileDialog provides an object of type FileInfo. The OpenText method of FileInfo returns a StreamReader, which has a ReadLine method for reading whole lines of text. Figure 1 shows the PrintPage handler.

Figure 1 The PrintPage Handler of SimplestPagination

void OnPrintDocumentPrintPage(
  object sender, PrintPageEventArgs args) {

  Border border = new Border {
    Padding = new Thickness(
      Math.Max(0, desiredMargin.Left - args.PageMargins.Left),
      Math.Max(0, desiredMargin.Top - args.PageMargins.Top),
      Math.Max(0, desiredMargin.Right - args.PageMargins.Right),
      Math.Max(0, desiredMargin.Bottom - args.PageMargins.Bottom))
  };

  StackPanel stkpnl = new StackPanel();
  border.Child = stkpnl;
  string line = leftOverLine;

  while ((leftOverLine != null) || 
    ((line = streamReader.ReadLine()) != null)) {

    leftOverLine = null;

    // Check for blank lines; print them with a space
    if (line.Length == 0)
      line = " ";

    TextBlock txtblk = new TextBlock {
      Text = line,
      TextWrapping = TextWrapping.Wrap
    };

    stkpnl.Children.Add(txtblk);
    border.Measure(new Size(args.PrintableArea.Width, 
      Double.PositiveInfinity));

    // Check if the page is now too tall
    if (border.DesiredSize.Height > args.PrintableArea.Height &&
      stkpnl.Children.Count > 1) {

      // If so, remove the TextBlock and save the text for later
      stkpnl.Children.Remove(txtblk);
      leftOverLine = line;
      break;
    }
  }

  if (leftOverLine == null)
    leftOverLine = streamReader.ReadLine();

  args.PageVisual = border;
  args.HasMorePages = leftOverLine != null;
}

As usual, the printed page is a visual tree. The root of this particular visual tree is the Border element, which is given a Padding property to obtain 48-unit (half-inch) margins as indicated in the desiredMargin field. The PageMargins property of the event arguments provides the dimensions of the unprintable margins of the page, so the Padding property needs to specify additional space to bring the total up to 48.

A StackPanel is then made a child of the Border, and TextBlock elements are added to the StackPanel. After each one, the Measure method of the Border is called with a horizontal constraint of the printable width of the page, and a vertical constraint of infinity. The DesiredSize property then reveals how big the Border needs to be. If the height exceeds the height of the PrintableArea, then the TextBlock must be removed from the StackPanel (but not if it’s the only one).

The leftOverLine field stores the text that didn’t get printed on the page. I also use it to signal that the document is complete by calling ReadLine on the StreamReader one last time. (Obviously if StreamReader had a PeekLine method, this field wouldn’t be required.)
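For illustration, here’s what such a PeekLine could look like as a small wrapper that buffers one line ahead. The PeekableLineReader class is hypothetical (neither StreamReader nor TextReader provides a PeekLine method). With it, a PrintPage handler could peek at the next line, consume it only if its TextBlock fits, and set HasMorePages from AtEnd, with no leftOverLine field.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Silverlight: wraps a TextReader and adds
// a PeekLine method by buffering one line ahead.
class PeekableLineReader
{
    readonly TextReader reader;
    string buffered;      // line fetched by the last PeekLine call
    bool hasBuffered;     // true if 'buffered' hasn't been consumed yet

    public PeekableLineReader(TextReader reader)
    {
        this.reader = reader ?? throw new ArgumentNullException(nameof(reader));
    }

    // Returns the next line without consuming it (null at end of input).
    public string PeekLine()
    {
        if (!hasBuffered)
        {
            buffered = reader.ReadLine();
            hasBuffered = true;
        }
        return buffered;
    }

    // Returns and consumes the next line (null at end of input).
    public string ReadLine()
    {
        string line = PeekLine();
        hasBuffered = false;
        return line;
    }

    // True when no lines remain; suitable for setting HasMorePages.
    public bool AtEnd => PeekLine() == null;
}
```

Because PeekLine caches the line it read, calling it repeatedly costs nothing extra, and ReadLine simply hands over the cached line.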

The downloadable code contains a Documents folder with a file named EmmaFirstChapter.txt. This is the first chapter of Jane Austen’s novel, “Emma,” specially prepared for this program: All the paragraphs are single lines, and they’re separated by blank lines. With the default Silverlight font, it’s about four pages in length. The printed pages aren’t easy to read, but that’s only because the lines are too wide for the font size.

This file also reveals a little problem with the program: One of the blank lines could end up as the first paragraph on a page. If that’s the case, it shouldn’t be printed, but handling it is just a matter of additional logic.
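The page-filling logic of Figure 1, plus the check for a blank line at the top of a page, can be modeled without Silverlight. The C# sketch below is hypothetical: the class name and the EstimateLines helper are made up, and EstimateLines is a crude stand-in for TextBlock wrapping and Measure that assumes every character has the same width and every line the same height. It keeps the same rules as the PrintPage handler: whole paragraphs are added until one overflows, and the overflowing paragraph moves to the next page unless it’s the only one on the page.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the Figure 1 page-filling loop (not Silverlight code).
static class WholeParagraphPaginator
{
    // Crude stand-in for TextBlock wrapping plus Measure: words are wrapped
    // greedily into lines of at most charsPerLine characters, and a word longer
    // than a line gets a line of its own. Returns the number of lines.
    public static int EstimateLines(string paragraph, int charsPerLine)
    {
        string[] words = paragraph.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length == 0)
            return 1;                       // a blank line still occupies one line

        int lines = 1, used = 0;
        foreach (string word in words)
        {
            int needed = used == 0 ? word.Length : used + 1 + word.Length;
            if (needed > charsPerLine && used > 0)
            {
                lines++;                    // wrap: this word starts a new line
                used = word.Length;
            }
            else
            {
                used = needed;
            }
        }
        return lines;
    }

    // Fills 'page' with whole paragraphs starting at index 'start' and returns
    // the index of the first paragraph for the next page (paragraphs.Count
    // when the document is finished).
    public static int FillPage(IList<string> paragraphs, int start,
                               int charsPerLine, int linesPerPage, List<string> page)
    {
        page.Clear();
        int height = 0;
        int index = start;

        while (index < paragraphs.Count)
        {
            // A blank line at the top of a page is skipped, not printed.
            if (page.Count == 0 && paragraphs[index].Trim().Length == 0)
            {
                index++;
                continue;
            }

            int lines = EstimateLines(paragraphs[index], charsPerLine);

            // Too tall? Save the paragraph for the next page, unless it is the
            // only one on this page (then it stays and would be truncated).
            if (height + lines > linesPerPage && page.Count > 0)
                break;

            page.Add(paragraphs[index]);
            height += lines;
            index++;
        }
        return index;
    }
}
```

The returned index plays the role of leftOverLine: it records where the next page begins.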

For printing text that has actual paragraphs, you could use blank lines between paragraphs, or you might prefer to have more control by setting the Margin property of TextBlock. It’s also possible to have a first-line indent by changing the statement that assigns the Text property of the TextBlock from this:

Text = line,
 
to this:
Text = "     " + line,

But neither of these techniques would work well when printing source code.

Splitting the TextBlock

After experimenting with the SimplestPagination program, you’ll probably conclude that its biggest flaw is the inability to break paragraphs across pages.

One approach to fixing this problem is illustrated in the BetterPagination program, which you can run at bit.ly/ekpdZb. This program is much like SimplestPagination except when adding a TextBlock to the StackPanel causes the total height to exceed the page. In SimplestPagination, this code simply removes the entire TextBlock from the StackPanel:

// Check if the page is now too tall
if (border.DesiredSize.Height > args.PrintableArea.Height &&
  stkpnl.Children.Count > 1) {

  // If so, remove the TextBlock and save the text for later
  stkpnl.Children.Remove(txtblk);
  leftOverLine = line;
  break;
}

BetterPagination now calls a method named RemoveText:

// Check if the page is now too tall
if (border.DesiredSize.Height > args.PrintableArea.Height) {
  // If so, remove some text and save it for later
  leftOverLine = RemoveText(border, txtblk, args.PrintableArea);
  break;
}

RemoveText is shown in Figure 2. The method simply removes one word at a time from the end of the Text property of the TextBlock and checks if that helps the TextBlock fit on the page. All the removed text is accumulated in a StringBuilder that the PrintPage handler saves as leftOverLine for the next page.

Figure 2 The RemoveText Method from BetterPagination

string RemoveText(Border border, 
  TextBlock txtblk, Size printableArea) {

  StringBuilder leftOverText = new StringBuilder();

  do {
    int index = txtblk.Text.LastIndexOf(' ');

    if (index == -1)
      index = 0;

    leftOverText.Insert(0, txtblk.Text.Substring(index));
    txtblk.Text = txtblk.Text.Substring(0, index);
    border.Measure(new Size(printableArea.Width, 
      Double.PositiveInfinity));

    if (index == 0)
      break;
  }
  while (border.DesiredSize.Height > printableArea.Height);

  return leftOverText.ToString().TrimStart(' ');
}

It’s not pretty, but it works. Keep in mind that if you’re dealing with formatted text (different fonts, font sizes, bold and italic), then you’ll be working not with the Text property of the TextBlock but with the Inlines property, and that complicates the process immensely.

And yes, there are definitely faster ways to do this, although they certainly are more complex. For example, a binary search can be implemented: Half the words can be removed, and if the rest fits on the page, half of what was removed can be restored, and if that doesn’t fit on the page, then half of what was restored can be removed, and so forth.
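The halving approach amounts to a binary search for the largest number of leading words that still fit. The sketch below is hypothetical, not code from BetterPagination: the caller-supplied fits callback stands in for setting the TextBlock’s Text to the first n words, calling Measure, and comparing DesiredSize.Height with the available height. For an n-word paragraph it needs only about log2(n) such tests instead of up to n.

```csharp
using System;

static class WordFitSearch
{
    // Returns the largest n in 0..wordCount for which fits(n) is true, using a
    // binary search. fits(n) must be monotonic (if n words fit, fewer words
    // fit too); zero words are assumed to fit.
    public static int MaxWordsThatFit(int wordCount, Func<int, bool> fits)
    {
        int lo = 0;           // largest count known to fit
        int hi = wordCount;   // upper bound on the answer
        while (lo < hi)
        {
            int mid = lo + (hi - lo + 1) / 2;   // round up so the range always shrinks
            if (fits(mid))
                lo = mid;     // mid words fit: the answer is at least mid
            else
                hi = mid - 1; // mid words overflow: the answer is below mid
        }
        return lo;
    }
}
```

In BetterPagination terms, the words beyond the returned count would become the left-over text for the next page.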

However, keep in mind that this is code written for printing. The bottleneck of printing is the printer itself, so while the code might spend a few more seconds testing each and every TextBlock, it’s probably not going to be noticeable.

But you might start wondering exactly how much is going on under the covers when you call Measure on the root element. Certainly all the individual TextBlock elements are getting Measure calls, and they’re using Silverlight internals to determine how much space that text string actually occupies with the particular font and font size.

You might wonder whether code like this would even be tolerable for paginating a document for a video display on a slow device.

So let’s try it.

Pagination on Windows Phone 7

My goal (which won’t be completed in this article) is to build an e-book reader for Windows Phone 7 suitable for reading plain-text book files downloaded from Project Gutenberg (gutenberg.org). As you may know, Project Gutenberg dates from 1971 and was the very first digital library. For many years, it focused on providing public-domain books—very often the classics of English literature—in a plain-text ASCII format. For example, the complete “Emma” by Jane Austen is the file gutenberg.org/files/158/158.txt.

Every book is identified by a positive integer that serves as its file name. As you can see, “Emma” is 158 and its text version is in the file 158.txt. In more recent years Project Gutenberg has provided other formats such as EPUB and HTML, but I’m going to stick with plain text for this project for obvious reasons of simplicity.

The EmmaReader project for Windows Phone 7 includes 158.txt as a resource and lets you read the entire book on the phone. Figure 3 shows the program running on the Windows Phone 7 emulator. For gesture support, the project requires the Silverlight for Windows Phone Toolkit, downloadable from silverlight.codeplex.com. Tap or flick left to go to the next page; flick right to go to the previous page.

Figure 3 EmmaReader Running on the Windows Phone 7 Emulator

The program has almost no features except those necessary to make it reasonably usable. Obviously I’ll be enhancing this program, particularly to allow you to read other books besides “Emma,” perhaps even books of your own choosing! But for nailing down the basics, it’s easier to focus on a single book.

If you examine 158.txt, you’ll discover the most significant characteristic of plain-text Project Gutenberg files: Each paragraph consists of one or more consecutive lines of up to about 72 characters, and paragraphs are separated by blank lines. To turn this into a format that TextBlock can wrap, some pre-processing is required to concatenate the consecutive lines of each paragraph into one. This is performed in the PreprocessBook method in EmmaReader. The entire book, including the zero-length lines separating paragraphs, is then stored as a field named paragraphs of type List<string>. This version of the program doesn’t attempt to divide the book into chapters.
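The PreprocessBook method isn’t listed here, so the following is only a sketch of one way to do the concatenation; the GutenbergText class and ToParagraphs method are hypothetical. It assumes paragraphs are separated by one or more blank lines, trims each line (which discards any deliberate indentation, as in verse), and keeps a single empty string between paragraphs, matching the zero-length separator lines described above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class GutenbergText
{
    // Hypothetical preprocessing: joins each run of consecutive non-blank
    // lines into one paragraph string and keeps one empty string between
    // paragraphs, so that TextBlock can do its own wrapping.
    public static List<string> ToParagraphs(TextReader reader)
    {
        var paragraphs = new List<string>();
        var current = new StringBuilder();
        string line;

        while ((line = reader.ReadLine()) != null)   // handles \n and \r\n
        {
            line = line.Trim();
            if (line.Length > 0)
            {
                if (current.Length > 0)
                    current.Append(' ');            // the line break becomes a space
                current.Append(line);
            }
            else if (current.Length > 0)
            {
                paragraphs.Add(current.ToString()); // a blank line ends the paragraph
                paragraphs.Add(string.Empty);       // keep one separator line
                current.Clear();
            }
            // Additional blank lines in a row are ignored.
        }

        if (current.Length > 0)
            paragraphs.Add(current.ToString());     // final paragraph without a blank line after it
        else if (paragraphs.Count > 0)
            paragraphs.RemoveAt(paragraphs.Count - 1); // drop the trailing separator

        return paragraphs;
    }
}
```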

As the book is paginated, every page is identified as an object of type PageInfo with just two integer properties: ParagraphIndex is an index into the paragraphs list and CharacterIndex is an index into the string for that paragraph. These two indices indicate the paragraph and character that begins the page. The two indices for the first page are obviously both zero. As each page is paginated, the indices for the next page are determined.
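A minimal sketch of such a type, for illustration only: the two property names come from the description above, but making PageInfo an immutable struct, the FirstPage field, and the document-order comparison are assumptions. Comparing the paragraph index first and the character index second puts page starts in reading order.

```csharp
using System;

// Sketch of PageInfo: the position where a page begins in the paragraphs list.
public readonly struct PageInfo : IEquatable<PageInfo>, IComparable<PageInfo>
{
    public PageInfo(int paragraphIndex, int characterIndex)
    {
        ParagraphIndex = paragraphIndex;
        CharacterIndex = characterIndex;
    }

    public int ParagraphIndex { get; }   // index into the paragraphs list
    public int CharacterIndex { get; }   // index into that paragraph's string

    // The first page starts at the first character of the first paragraph.
    public static readonly PageInfo FirstPage = new PageInfo(0, 0);

    // Document order: compare paragraphs first, then characters.
    public int CompareTo(PageInfo other) =>
        ParagraphIndex != other.ParagraphIndex
            ? ParagraphIndex.CompareTo(other.ParagraphIndex)
            : CharacterIndex.CompareTo(other.CharacterIndex);

    public bool Equals(PageInfo other) =>
        ParagraphIndex == other.ParagraphIndex && CharacterIndex == other.CharacterIndex;

    public override bool Equals(object obj) => obj is PageInfo other && Equals(other);

    public override int GetHashCode() => (ParagraphIndex * 397) ^ CharacterIndex;
}
```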

The program does not attempt to paginate the entire book at once. With the page layout I’ve defined and the default Silverlight for Windows Phone 7 font, “Emma” sprawls out to 845 pages and requires nine minutes to get there when running on a real device. Obviously the technique I’m using for pagination takes a toll: It requires Silverlight to perform a Measure pass for every paragraph added to a page, and many more whenever a paragraph continues from one page to the next, because words are then stripped off one at a time. I’ll be exploring some faster techniques in later columns.

But the program doesn’t need to paginate the entire book at once. As you start reading at the beginning of a book and proceed page by page, the program only needs to paginate one page at a time.
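A minimal sketch of this page-at-a-time approach, assuming a paginate callback that plays the role of the Paginate method in Figure 4: given where one page starts, it returns where the next page starts, or null after the last page. The LazyPageIndex class is hypothetical. It paginates forward only as far as a request requires and remembers every page start it has computed, so returning to an earlier page doesn’t require any repagination.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical page index that paginates on demand. Page starts are
// (Paragraph, Character) pairs, like the two PageInfo properties.
class LazyPageIndex
{
    readonly List<(int Paragraph, int Character)> starts =
        new List<(int Paragraph, int Character)> { (0, 0) };   // page 0
    readonly Func<(int Paragraph, int Character), (int Paragraph, int Character)?> paginate;
    bool reachedEnd;

    // paginate: given the start of a page, returns the start of the next page,
    // or null if that page is the last one.
    public LazyPageIndex(Func<(int Paragraph, int Character), (int Paragraph, int Character)?> paginate)
    {
        this.paginate = paginate ?? throw new ArgumentNullException(nameof(paginate));
    }

    // Number of pages whose starting positions are known so far.
    public int KnownPages => starts.Count;

    // Returns the start of the 0-based page, paginating forward only as far as
    // needed; pages already in the list are returned without repaginating.
    // Returns null if the book has fewer pages.
    public (int Paragraph, int Character)? GetPageStart(int pageNumber)
    {
        if (pageNumber < 0)
            throw new ArgumentOutOfRangeException(nameof(pageNumber));

        while (pageNumber >= starts.Count && !reachedEnd)
        {
            var next = paginate(starts[starts.Count - 1]);
            if (next == null)
                reachedEnd = true;       // the last known page is the final page
            else
                starts.Add(next.Value);
        }

        if (pageNumber < starts.Count)
            return starts[pageNumber];
        return null;
    }
}
```

For example, a request for page 99 paginates only the pages not yet known, and a following request for page 98 is answered directly from the list.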

Features and Needs

I originally thought that this first version of EmmaReader would have no features at all except those necessary to read the book from beginning to end. But that would be cruel. For example, suppose you’re reading the book, you’ve gotten to page 100 or so, and you turn off the screen to put the phone in your pocket. At that time, the program is tombstoned, which means that it’s essentially terminated. When you turn the screen back on, the program starts up afresh and you’re back at page one. You’d then have to tap 99 pages to continue reading where you left off!

For that reason, the program saves the current page number in isolated storage when the program is tombstoned or terminated. You’ll always jump back to the page you left. (If you experiment with this feature when running the program under Visual Studio, either on the emulator or an actual phone, be sure to terminate the program by pressing the Back button, not by stopping debugging in Visual Studio. Stopping debugging doesn’t allow the program to terminate correctly and access isolated storage.)

Saving the page number in isolated storage isn’t actually enough. If only the page number were saved, the program would have to paginate the first 99 pages in order to display the hundredth. The program needs at least the PageInfo object for that page.

But that single PageInfo object isn’t enough, either. Suppose the program reloads, it uses the PageInfo object to display page 100, and then you decide to flick your finger right to go to the previous page. The program doesn’t have the PageInfo object for page 99, so it needs to repaginate the first 98 pages.

For that reason, as you progressively read the book and each page is paginated, the program maintains a list of type List<PageInfo> with all the PageInfo objects that it has determined so far. This entire list is saved to isolated storage. If you experiment with the program’s source code (for example, changing the layout or the font size, or replacing the entire book with another), keep in mind that any change that affects pagination will invalidate this list of PageInfo objects. You’ll want to delete the program from the phone (or the emulator) by holding your finger on the program name on the start list and selecting Uninstall. This is currently the only way to erase the stored data from isolated storage.
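One way to persist that list, sketched with a made-up text format: a layout key, the current page number, and one line per known page start. The ReadingState class and the layoutKey parameter are hypothetical, not something EmmaReader does. The layout key would be a string describing everything that affects pagination (font size, page width and so on), so that a list saved under a different layout is rejected instead of producing wrong pages. On the phone, the TextWriter and TextReader would wrap a file in isolated storage; here they’re plain parameters so the sketch runs anywhere.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical persistence format: layout key, current page number, then one
// "paragraph,character" line per known page start. No error handling for
// corrupt files.
static class ReadingState
{
    public static void Save(TextWriter writer, string layoutKey, int currentPage,
                            IList<(int Paragraph, int Character)> pages)
    {
        writer.WriteLine(layoutKey);
        writer.WriteLine(currentPage.ToString(CultureInfo.InvariantCulture));
        foreach (var p in pages)
            writer.WriteLine(string.Format(CultureInfo.InvariantCulture,
                                           "{0},{1}", p.Paragraph, p.Character));
    }

    // Returns false (with page 0 and an empty list) when the data is missing or
    // was saved under a different layout key, since those page starts are stale.
    public static bool TryLoad(TextReader reader, string layoutKey, out int currentPage,
                               out List<(int Paragraph, int Character)> pages)
    {
        currentPage = 0;
        pages = new List<(int Paragraph, int Character)>();

        if (reader.ReadLine() != layoutKey)
            return false;

        string line = reader.ReadLine();
        if (line == null)
            return false;
        currentPage = int.Parse(line, CultureInfo.InvariantCulture);

        while ((line = reader.ReadLine()) != null)
        {
            string[] parts = line.Split(',');
            pages.Add((int.Parse(parts[0], CultureInfo.InvariantCulture),
                       int.Parse(parts[1], CultureInfo.InvariantCulture)));
        }
        return true;
    }
}
```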

Here’s the content Grid in MainPage.xaml:

<Grid x:Name="ContentPanel"
  Grid.Row="1" Background="White">
  <toolkit:GestureService.GestureListener>
    <toolkit:GestureListener
      Tap="OnGestureListenerTap"
      Flick="OnGestureListenerFlick" />
  </toolkit:GestureService.GestureListener>

  <Border Name="pageHost" Margin="12,6">
    <StackPanel Name="stackPanel" />
  </Border>
</Grid>

During pagination, the program obtains the ActualWidth and ActualHeight of the Border element and uses them in the same way that the PrintableArea property is used in the printing programs. The TextBlock elements for each paragraph (and the blank lines between the paragraphs) are added to the StackPanel.

The Paginate method is shown in Figure 4. As you can see, it’s very similar to the methods used in the printing programs except that it’s accessing a List of string objects based on paragraphIndex and characterIndex. The method also updates these values for the next page.

Figure 4 The Paginate Method in EmmaReader

void Paginate(ref int paragraphIndex, ref int characterIndex) {
  stackPanel.Children.Clear();

  while (paragraphIndex < paragraphs.Count) {
    // Skip if a blank line is the first paragraph on a page
    if (stackPanel.Children.Count == 0 &&
      characterIndex == 0 &&
      paragraphs[paragraphIndex].Length == 0) {
      paragraphIndex++;
      continue;
    }

    TextBlock txtblk = new TextBlock {
      Text = 
        paragraphs[paragraphIndex].Substring(characterIndex),
      TextWrapping = TextWrapping.Wrap,
      Foreground = blackBrush
    };

    // Check for a blank line between paragraphs
    if (txtblk.Text.Length == 0)
      txtblk.Text = " ";

    stackPanel.Children.Add(txtblk);
    stackPanel.Measure(new Size(pageHost.ActualWidth, 
      Double.PositiveInfinity));

    // Check if the StackPanel fits in the available height
    if (stackPanel.DesiredSize.Height > pageHost.ActualHeight) {
      // Strip words off the end until it fits
      do {
        int index = txtblk.Text.LastIndexOf(' ');

        if (index == -1)
          index = 0;

        txtblk.Text = txtblk.Text.Substring(0, index);
        stackPanel.Measure(new Size(pageHost.ActualWidth, 
          Double.PositiveInfinity));

        if (index == 0)
          break;
      }
      while (stackPanel.DesiredSize.Height > pageHost.ActualHeight);

      characterIndex += txtblk.Text.Length;

      // Skip over the space
      if (txtblk.Text.Length > 0)
        characterIndex++;

      break;
    }
    paragraphIndex++;
    characterIndex = 0;
  }

  // Flag the page beyond the last
  if (paragraphIndex == paragraphs.Count)
    paragraphIndex = -1;
}

As you can see in Figure 3, the program displays a page number. But notice that it does not display the total number of pages, because this can’t be determined until the entire book is paginated. If you’re familiar with commercial e-book readers, you’re probably aware that the display of page numbers and number of pages is a big issue.

One feature that users find necessary in e-book readers is the ability to change the font or font size. However, from the program’s perspective, this has deadly consequences: All the pagination information accumulated so far has to be discarded, and the book needs to be repaginated up to the current reading position, which no longer falls on the same page number it did before.
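One way to handle this (a sketch, not the article’s code): remember the reading position as a paragraph index and character index rather than as a page number, repaginate with the new font until the page starts pass that position, and then pick the page that contains it. The hypothetical FindPage method does that last step with a binary search over the new page starts, which must be in ascending document order and begin with (0, 0).

```csharp
using System;
using System.Collections.Generic;

static class PageLookup
{
    // Orders positions in document order: paragraph first, then character.
    static int Compare((int Paragraph, int Character) a, (int Paragraph, int Character) b) =>
        a.Paragraph != b.Paragraph ? a.Paragraph.CompareTo(b.Paragraph)
                                   : a.Character.CompareTo(b.Character);

    // Returns the number of the page containing 'position': the last page
    // whose start is at or before it.
    public static int FindPage(IReadOnlyList<(int Paragraph, int Character)> pageStarts,
                               (int Paragraph, int Character) position)
    {
        if (pageStarts.Count == 0)
            throw new ArgumentException("At least one page start is required.", nameof(pageStarts));

        int lo = 0, hi = pageStarts.Count - 1;
        while (lo < hi)
        {
            int mid = lo + (hi - lo + 1) / 2;          // round up so the range shrinks
            if (Compare(pageStarts[mid], position) <= 0)
                lo = mid;       // page 'mid' starts at or before the position
            else
                hi = mid - 1;   // page 'mid' starts after the position
        }
        return lo;
    }
}
```

Because the reading position is stored independently of any layout, it stays valid across font changes even though the page numbers do not.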

Another nice feature in e-book readers is the ability to navigate to the beginnings of chapters. Separating a book into chapters actually helps the program deal with pagination. Each chapter begins on a new page, so the pages in each chapter can be paginated independently of the other chapters. Jumping to the beginning of a new chapter is trivial. (However, if the user then flicks right to the last page of the previous chapter, the entire previous chapter must be repaginated!)
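Finding the chapter boundaries could be as simple as the following sketch, which records the index of the first paragraph of each chapter in the paragraphs list. The Chapters class and the isHeading callback are hypothetical; how to recognize a heading depends on the book’s formatting (for example, a paragraph beginning with “CHAPTER”). With these indices, each chapter’s paragraphs form an independent range that can be paginated on its own, and jumping to a chapter means starting pagination at its first paragraph.

```csharp
using System;
using System.Collections.Generic;

static class Chapters
{
    // Returns the index of the first paragraph of each chapter. 'isHeading'
    // is a caller-supplied test for a chapter heading. Any text before the
    // first heading (title page, contents) is treated as chapter 0.
    public static List<int> FindChapterStarts(IReadOnlyList<string> paragraphs,
                                              Func<string, bool> isHeading)
    {
        var starts = new List<int> { 0 };
        for (int i = 1; i < paragraphs.Count; i++)
            if (isHeading(paragraphs[i]))
                starts.Add(i);
        return starts;
    }
}
```

Chapter c then covers paragraphs from starts[c] up to (but not including) starts[c + 1], or to the end of the book for the last chapter.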

You’ll probably also agree that this program needs a better page transition. Having the new page just pop into place is unsatisfactory because it doesn’t provide adequate feedback that the page has actually turned, or that only one page has turned instead of multiple pages. The program definitely needs some work. Meanwhile, enjoy the novel.


Charles Petzold is a longtime contributing editor to MSDN Magazine. His recent book, “Programming Windows Phone 7” (Microsoft Press, 2010), is available as a free download at bit.ly/cpebookpdf.

Thanks to the following technical expert for reviewing this article: Jesse Liberty