Step 1: Downloading RSS Feeds

Applies to: Functional Programming

Authors: Tomas Petricek and Jon Skeet

Summary: This step of the tutorial explains how to implement the model of a news aggregator web page. The model asynchronously downloads RSS feeds and formats them into F# records.

This topic contains the following sections.

  • Implementing Model for News Items
  • Downloading and Parsing Feeds
  • Gathering the Home Page Data
  • Summary
  • Additional Resources
  • See Also

This article is associated with Real World Functional Programming: With Examples in F# and C# by Tomas Petricek with Jon Skeet from Manning Publications (ISBN 9781933988924, copyright Manning Publications 2009, all rights reserved). No part of these chapters may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, electrostatic, mechanical, photocopying, recording, or otherwise—without the prior written permission of the publisher, except in the case of brief quotations embodied in critical articles or reviews.

Implementing Model for News Items

This article describes how to create the first part of a News Aggregator web application. The tutorial begins by implementing the model, which is the component that provides the data and core logic for the web application. In the News Aggregator's case, it is quite clear what the model should do. It needs to download several RSS feeds in parallel and then load them into a data structure that can be used by the rest of the application.

If the requirements are well known, the model can be implemented first using F# Interactive to test its functionality. The rest of the web application can be implemented after the model behaves as expected. In a more complex application, the model typically evolves together with the application.

As mentioned in the tutorial overview, the starting point for the web application is an ASP.NET MVC 3 template (discussed in Tutorial: Creating a Web Project in F# Using an Online Template). All code discussed in this article will be added to the Model.fs file of the FSharpWeb.Core project.

Creating the Model Source File

The Model.fs file typically declares several types used to return data to other components and contains a module with functions for retrieving data. The structure of the News Aggregator model file is shown below. The aggregator uses the LINQ to XML library, so the project needs references to the System.Xml.dll and System.Xml.Linq.dll assemblies.

open System
open System.Net
open System.Xml
open System.Xml.Linq

// This area will contain types representing news items

module Model = 
    // This area will contain functions for reading data

The listing shows only the basic structure of the file. The following section starts filling in the blank spaces by looking at the code that downloads and parses RSS feeds.

Downloading and Parsing Feeds

The best way to develop the code that downloads and parses RSS feeds is to use F# Interactive. This way, you can write some code and immediately verify that it behaves as expected. This section follows this approach and starts with a simple snippet, looks at the results interactively, and then turns it into a model that will be used by the web application.

Testing the Feed Processing

Downloading an RSS feed can take some time, especially if the server is busy. To avoid blocking a thread, the snippet below implements the download using an F# asynchronous workflow. The actual downloading is done using a WebClient object and the AsyncDownloadString extension method, which is added by the F# core library.

After the feed is downloaded, it is loaded into an XDocument object, which provides an easy way of navigating through the XML elements. The following snippet shows two helpers that make this even easier. The xn function is used to create a value representing an XML name, and the dynamic operator (?) is used for reading the values of the elements with specified names:

let xn s = XName.Get(s)
let (?) (el:XElement) name = el.Element(xn name).Value

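let AsyncDownloadFeed(url) = async { 
    // 'use' disposes the WebClient when the workflow completes
    use wc = new WebClient()
    // Download the feed without blocking a thread
    let! rss = wc.AsyncDownloadString(Uri(url))
    let feed = XDocument.Parse(rss) 
    // Collect a (title, link) tuple for every <item> element
    let elements = 
        [ for el in feed.Descendants(xn "item") do 
              yield el?title, el?link ]
    return elements }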

This implementation of the dynamic operator (?) is not meant to be general; it is just a simple helper for this particular snippet (for example, it fails with a NullReferenceException if the requested element is missing). The RSS feed format contains an <item> element for each article. Each <item> contains nested elements such as <title> and <link>, whose values are plain text. When the snippet later obtains a reference el to an <item> element, the title and link values can be extracted just by writing el?title and el?link.
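
A more defensive variant of the helper could return an option value instead of failing when a child element is absent. The following is only a minimal sketch and is not part of the original tutorial; the tryChild name and the sample element are hypothetical. In older versions of F# Interactive, a reference to System.Xml.Linq.dll may need to be added first.

```fsharp
open System.Xml.Linq

let xn s = XName.Get(s)

// Hypothetical helper: yields None when the child element is missing
// instead of raising an exception.
let tryChild (el:XElement) name =
    match el.Element(xn name) with
    | null -> None
    | child -> Some child.Value

// Made-up sample element that has a <title> but no <link>
let item = XElement.Parse("<item><title>Hello</title></item>")

let title = tryChild item "title"   // Some "Hello"
let link = tryChild item "link"     // None
```

The caller can then decide how to handle a missing value, for example by skipping the item or substituting an empty string.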

The main function declared by the snippet is AsyncDownloadFeed. It downloads the specified RSS feed and loads it into an XDocument object. Then, it iterates over all <item> nodes and returns tuples containing the article title and link.

After selecting the code and entering it into F# Interactive, it is possible to test the function. The following listing shows a command that downloads a sample feed:

> let url = "http://feeds.guardian.co.uk/theguardian/world/rss"
  AsyncDownloadFeed(url) |> Async.RunSynchronously;;
val it : (string * string) list =
  [ ("Bin Laden wasn't sheltered by us, says Pakistan",
     "http://www.guardian.co.uk/world/2011/may/03/...");
    ("Gaddafi forces' bombardment cut off Misrata",
     "http://www.guardian.co.uk/world/2011/may/02/...");
    ("Air France crash: second black box recovered",
     "http://www.guardian.co.uk/world/2011/may/03/..."); ...]

The output shows several recent articles returned at the time of writing this article. Most importantly, it demonstrates that the code for downloading and parsing works as expected. The next section makes several improvements to the code so that it can be easily used later by the rest of the web application.
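
Because the contents of a live feed change over time, the parsing logic can also be checked against a small inline document that needs no network access. The sketch below assumes a made-up sample feed and a hypothetical parseItems function that contains only the parsing part of AsyncDownloadFeed:

```fsharp
open System.Xml.Linq

let xn s = XName.Get(s)
let (?) (el:XElement) name = el.Element(xn name).Value

// Hypothetical sample feed, invented for illustration only
let sampleRss = """<rss version="2.0">
  <channel>
    <title>Sample Feed</title>
    <item><title>First</title><link>http://example.com/1</link></item>
    <item><title>Second</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

// The same parsing steps as in AsyncDownloadFeed, without the download
let parseItems (rss:string) =
    let feed = XDocument.Parse(rss)
    [ for el in feed.Descendants(xn "item") do
          yield el?title, el?link ]

let items = parseItems sampleRss
// items = [ ("First", "http://example.com/1"); ("Second", "http://example.com/2") ]
```

Only the <item> elements are selected, so the <title> of the channel itself does not appear in the result.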

Exporting the Feed Data

The information obtained from the RSS feed is returned to the view of the application, which renders it for the user. When returning data to other components, it is generally recommended to use types with named properties or fields instead of tuples. Although tuples can be used from C#, they make code more difficult to read.

The easiest way to define a type with named members is to use an F# record type. The following type represents information about a single article:

type FeedLink = 
  { Title : string
    Link : string
    Description : string }

In addition to the title and link URL that were already used in the previous section, this type also contains a field to store a brief description of the article. Unfortunately, the length of the description returned by different RSS feeds varies, and some feeds even include HTML tags. To display a plain text description, the following function removes all of the HTML tags and truncates the text to 200 characters, appending an ellipsis when text is cut off. The function is deliberately simplistic and uses regular expressions (it requires opening the namespace System.Text.RegularExpressions). A more robust implementation would use an HTML parser instead.

open System.Text.RegularExpressions

let stripHtml (html:string) = 
    // Remove anything that looks like a tag (non-greedy match)
    let res = Regex.Replace(html, "<.*?>", "")
    if res.Length > 200 then res.Substring(0, 200) + "..." else res
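
The behavior of the function can be checked quickly in F# Interactive. The following sketch repeats the definition so that it is self-contained; the input strings are made up for illustration:

```fsharp
open System.Text.RegularExpressions

let stripHtml (html:string) =
    let res = Regex.Replace(html, "<.*?>", "")
    if res.Length > 200 then res.Substring(0, 200) + "..." else res

// Tags are removed and the text between them is kept
let short = stripHtml "<p>Hello <b>world</b></p>"   // "Hello world"

// Text longer than 200 characters is cut and marked with "..."
let truncated = stripHtml (String.replicate 250 "a")
// truncated.Length = 203 (200 characters plus the ellipsis)
```

Note that the pattern does not match across line breaks, so a tag that spans several lines would be left in the output. This is one of the reasons why an HTML parser is the more robust choice.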

Now it is very easy to modify the previous AsyncDownloadFeed function to return a sequence of FeedLink items instead of a list of tuples. The snippet builds a list with items (to ensure that it is fully evaluated) and then converts it to seq<FeedLink>, which is easier to use from C#:

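let AsyncDownloadFeed(url) = async { 
    // 'use' disposes the WebClient when the workflow completes
    use wc = new WebClient()
    let! rss = wc.AsyncDownloadString(Uri(url))
    let feed = XDocument.Parse(rss) 
    // Build a fully evaluated list of records, then expose it as seq<FeedLink>
    let elements = 
      [ for el in feed.Descendants(xn "item") do 
            yield { Title = el?title
                    Link = el?link
                    Description = stripHtml el?description } ] |> seq
    // The feed title is stored under rss/channel/title
    let name = feed.Element(xn "rss").Element(xn "channel")?title
    return name, elements }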

Before returning the collection of news links, the function also gets the title of the entire feed (which is available under the path rss/channel/title). Although it is generally preferable to use records (or classes), the name is returned together with the collection as a tuple. This is a very simple case where a tuple does not cause confusion (it is clear that the string value is the feed name).
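
If the tuple ever became confusing, for example after adding more values, the result could be wrapped in a record instead. The following is a hypothetical alternative, not what the tutorial uses; the Feed type and its field names are assumptions made for this sketch:

```fsharp
type FeedLink =
  { Title : string
    Link : string
    Description : string }

// Hypothetical record that could replace the (name, elements) tuple
type Feed =
  { Name : string
    Items : seq<FeedLink> }

// Made-up sample value showing how such a record would be constructed
let sample =
  { Name = "Sample Feed"
    Items = seq [ { Title = "First"
                    Link = "http://example.com/1"
                    Description = "" } ] }
```

Adopting this design would also require changing the types that consume the result, so the rest of the tutorial keeps the tuple.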

The last step in implementing the model is to add code that downloads several feeds in parallel and returns the result as a single record that can be later displayed on the home page of the application.

Gathering the Home Page Data

The News Aggregator home page displays the news from several RSS feeds. The data for the page can be stored in the following record:

type NewsPage = 
  { Feeds : seq<string * seq<FeedLink>> }

The record is quite simple. It contains only a single field named Feeds, which is a collection of tuples containing the heading and articles for each individual source. The field uses the seq<'T> type instead of an F# list so that the record is easy to use from C# (because the view component is implemented using embedded C#).

The following listing defines a function AsyncDownloadAll that asynchronously downloads data from three RSS feeds in parallel and returns the result using the NewsPage record:

let feeds = 
    [ "http://feeds.guardian.co.uk/theguardian/world/rss"
      "http://www.nytimes.com/services/xml/rss/nyt/GlobalHome.xml" 
      "http://feeds.bbci.co.uk/news/world/rss.xml" ]

let AsyncDownloadAll() = async {
    let! results = 
        [ for url in feeds -> AsyncDownloadFeed(url) ]
        |> Async.Parallel
    return { Feeds = results } }

The body of the function generates a list of asynchronous workflows and then combines them into a single workflow that runs them all in parallel using Async.Parallel. When the data is downloaded, the asynchronous body of the function resumes and wraps the result into a NewsPage record. Although the model now contains several types and functions, all of the code can still be entered into F# Interactive so the AsyncDownloadAll function can be tested in the same way as the previous function for downloading a single RSS feed.
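
The behavior of Async.Parallel can also be explored without network access. The sketch below replaces the real downloads with a hypothetical fakeDownload workflow that only waits for a given time; the names and delays are invented for illustration:

```fsharp
// Hypothetical stand-in for AsyncDownloadFeed: waits, then returns
// the name together with its length.
let fakeDownload (name:string) (delayMs:int) = async {
    do! Async.Sleep delayMs
    return name, name.Length }

let results =
    [ fakeDownload "guardian" 300
      fakeDownload "nyt" 100
      fakeDownload "bbc" 200 ]
    |> Async.Parallel
    |> Async.RunSynchronously
// results = [| ("guardian", 8); ("nyt", 3); ("bbc", 3) |]
```

The results are returned as an array in the same order as the input workflows, regardless of which one finishes first. In AsyncDownloadAll, this means that the feeds appear on the page in the order in which their URLs are listed.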

Summary

This step of the tutorial looked at the implementation of the model of a web application for aggregating the news from various RSS feeds. This part of the application is usually a set of standard F# modules and types that obtain data and expose them in a form that can easily be used from the rest of the application. To prevent the web server from blocking threads, the functionality was implemented using asynchronous workflows.

The next step shows how to integrate the model with the rest of the web application. ASP.NET MVC supports asynchronous controllers, which also enable writing the HTTP request processing without blocking threads.

Additional Resources

This tutorial demonstrates how to create a web application that downloads RSS feeds from other web sites and aggregates them. The best way to read the tutorial is to continue to the next step, which completes the application:

  • Step 2: Displaying Aggregated News Items

The structure of the web application was created using an online template, which is discussed in the following article:

  • Tutorial: Creating a Web Project in F# Using an Online Template

More complex web applications usually also store their data in databases. More information about working with databases using ADO.NET and LINQ to SQL can be found in a separate article.

To download the code snippets shown in this article, go to https://code.msdn.microsoft.com/Chapter-5-Bulding-Data-ec639934

See Also

This article is based on Real World Functional Programming: With Examples in F# and C#. Book chapters related to the content of this article are:

  • Book Chapter 7: “Designing data-centric programs” explains how to design applications whose primary purpose is to work with or create a data structure. This is very often the design used by ASP.NET MVC models.

  • Book Chapter 12: “Sequence expressions and alternative workflows” contains detailed information on processing in-memory data (such as lists and seq<'T> values) using higher-order functions and F# sequence expressions.

  • Book Chapter 13: “Asynchronous and data-driven programming” explains how asynchronous workflows work and uses them to write an interactive script that downloads a large dataset from the Internet.

The following MSDN documents are related to the topic of this article:

  • Asynchronous Workflows (F#) explains how to use F# asynchronous workflows for writing efficient computations that do not block the execution of other work.

  • Records (F#) provides more information about declaring and using records in F#.

  • XDocument Class Overview explains the structure of the XDocument type and how to use it to read and create XML data.

  • LINQ to XML provides detailed information about the LINQ to XML library including examples and tutorials (C# and Visual Basic).

Previous article: Tutorial: Creating a News Aggregator

Next article: Step 2: Displaying Aggregated News Items