Creating an Online RSS News Aggregator with ASP.NET

 

Scott Mitchell

Revised August 2003

Applies to:
    Microsoft® ASP.NET

Summary: Learn about displaying XML data in an ASP.NET Web page using the XML Web control to retrieve remote XML data, and about using the Repeater control to emit XML data from a database. With the ever-increasing demands of data sharing among disparate platforms, the use of XML has exploded over the past few years. Realizing this trend, Microsoft made sure to include robust XML support throughout the .NET Framework. For ASP.NET developers, this means that displaying and working with XML data in a Web page has never been simpler. Throughout this article we'll learn about XML and ASP.NET by building an RSS 2.0 syndication engine and an online news aggregator. This article assumes the reader is familiar with ASP.NET and XML. (20 printed pages)

Download RSSAggregator.msi.

Contents

Introduction
Syndicating Content with RSS 2.0
Creating the Syndication Output via an ASP.NET Web Page
Consuming a Syndication Feed in an ASP.NET Web Page
Displaying the List of Syndication Feeds
Displaying the News Items for a Particular Syndication Feed
Displaying the Details for a Particular News Item
Future Enhancements and Current Shortcomings
Summary

Introduction

With the rise of always-on Internet connections in homes and businesses, and the continued explosive growth of the World Wide Web and Internet-accessible applications, it is becoming more and more important for applications to be able to share data with each other. Sharing data among disparate platforms requires a platform-neutral data format that can be easily transmitted via standard Internet protocols, and this is where XML fits in. Since XML files are essentially simple text files with well-known encodings, and since XML parsers exist for all commonly used programming languages, XML data can be easily consumed by any platform.

A good example of data-sharing using XML is Web site syndication, commonly found in news sites and Web logs. With Web site syndication, a Web site publishes its latest content in an XML-formatted, Web-accessible syndication file. There are a number of syndication formats in use, one of the more popular ones being RSS 2.0. (RSS 2.0 Specification is published online at the Technology at Harvard Law site.) Additionally, MSDN® Magazine has a syndication file, MSDN Magazine: Current Issue, which lists the most recent MSDN Magazine articles with links to the online version.

Once a Web site has publicly published a syndication file, various clients may decide to consume it. There are a number of ways to consume a syndication file. Someone who runs a .NET resource Web site might want to add the latest MSDN Magazine article headlines on their Web site. Syndication files are also commonly consumed by news aggregator applications, which are applications designed specifically to retrieve and display syndication files from a variety of sources.

With this growing emphasis on XML data, being able to work with XML data in an ASP.NET Web page is more pertinent now than ever before. Since Web site syndication is becoming all the rage, in this article we'll build a Web site syndication file generator as well as an online news aggregator application. As we work on these two mini-projects throughout this article we'll examine how to access and display XML data from both a remote Web server and from the local file system. We'll look at how to display XML data in a myriad of ways, such as using a Repeater control and using the ASP.NET XML Web control.

Since I do not have limitless space for this article, I will assume that you are currently familiar with XSLT and XPath. If this is not the case, consider reading the following resources before continuing with this article:

Syndicating Content with RSS 2.0

The first mini-application we will be building in this article is a syndication file generator. For this mini-application, imagine that you work as a Web developer for a large news site (like MSNBC.com) where all of the news stories are stored in a Microsoft® SQL Server™ 2000 database. Specifically, the articles are stored in a table called Articles with the following germane fields:

  • ArticleID—an auto-increment primary key integer field uniquely identifying each article.
  • Title—a varchar(50), specifying the title of the news item,
  • Author—a varchar(50), specifying the author of the news item,
  • Description—a varchar(2000), providing a more in-depth description of the news item, and
  • DatePublished—a datetime indicating the date the news item was published.

Note that there might be other fields in the Articles table, but those listed above are the only fields we are interested in using for syndication. Furthermore, this is a very simplified data model; in a real-world setting you would likely have a more normalized database, such as having a separate table for authors, a many-to-many table joining authors and articles, and so on.

Our next step is to create an ASP.NET Web page that will display a list of the most recent news items as a properly formatted RSS 2.0 XML file. Before examining how to accomplish this transformation in an ASP.NET Web page, let's first take a moment to examine the RSS 2.0 specification. While looking over the specification, keep in mind that RSS is designed to provide a data model to syndicate content. Not surprisingly, then, it has a series of XML elements for information about the Web site syndicating the content, as well as a series of XML elements to describe a particular news item. Finally, don't forget that RSS syndication files, like any XML-formatted file, must adhere to XML formatting guidelines, namely that:

  • All XML elements be properly nested,
  • All attribute values be quoted, and
  • All instances of <, >, &, " and ' be replaced with &lt;, &gt;, &amp;, &quot; and &apos;, respectively.

Furthermore, XML files are case-sensitive, meaning that the opening and closing tags for an XML element must match in case as well as in spelling.

The root element in an RSS 2.0 file is the <rss> element. You can provide the version number in this element like so:

<rss version="2.0">
  ...
</rss>

The <rss> element has a single child element, <channel>, which describes the syndicated content. Inside the <channel> element there are three required child elements that describe the syndicating Web site. These three elements are:

  • title—Specifies the name of the syndication file, and typically includes the Web site's name,
  • link—the URL to the Web site, and
  • description—a short description of the Web site.

There are a number of optional elements to describe the Web site as well; see the RSS 2.0 Specification for more information on these elements.

Each news item is placed within an individual <item> element. The <channel> element can have an arbitrary number of <item> elements. Each <item> element can have a variety of child elements, the only requirement being that the <item> contain at least a <title> element or a <description> element as a child. A list of the more germane <item> child elements follows:

  • title—the title of the news item,
  • link—the URL to the news item,
  • description—a brief synopsis of the news item,
  • author—the author of the news item, and
  • pubDate—the published date of the news item.

A very simple RSS 2.0 syndication file is shown below. You can see another example RSS 2.0 file from RSS generated by Radio UserLand.

<rss version="2.0">
  <channel>
    <title>Latest DataWebControls.com FAQs</title>
    <link>http://datawebcontrols.com</link>
    <description>
        This is the syndication feed for the FAQs 
        at DataWebControls.com
    </description>
    <item>
      <title>Working with the DataGrid</title>
      <link>http://datawebcontrols.com/faqs/DataGrid.aspx</link>
      <pubDate>Mon, 07 Jul 2003 21:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Working with the Repeater</title>
      <description>
         This article examines how to work with the Repeater 
         control.
      </description>
      <link>http://datawebcontrols.com/faqs/Repeater.aspx</link>
      <pubDate>Tue, 08 Jul 2003 12:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>

One important thing to note here is the <pubDate> element's formatting. RSS requires that the date be formatted according to RFC 822, Date and Time Specification, which starts with an optional three-letter day abbreviation and comma, followed by a required day, then the three-letter abbreviated month, and then the year, followed by a time with time-zone name. Also notice that the <description> child element in the <item> element is optional: the first news item lacks a <description> element, while the second news item has one.
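As a rough illustration of the date format described above, the following sketch formats a .NET DateTime with the standard "R" format specifier. The class and method names (PubDateSketch, ToPubDate) are made up for this example. Because "R" always appends the literal "GMT" without converting the value, the sketch converts to universal time first.

```csharp
using System;
using System.Globalization;

public static class PubDateSketch
{
    // Hypothetical helper: returns a date string suitable for <pubDate>.
    public static string ToPubDate(DateTime value)
    {
        // "R" appends "GMT" but does not convert, so normalize to UTC first
        return value.ToUniversalTime().ToString("R", CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        DateTime published = new DateTime(2003, 7, 7, 21, 0, 0, DateTimeKind.Utc);
        Console.WriteLine(ToPubDate(published)); // Mon, 07 Jul 2003 21:00:00 GMT
    }
}
```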

Creating the Syndication Output via an ASP.NET Web Page

Now that we've seen how our news items are stored along with the RSS 2.0 specification, we're ready to create an ASP.NET Web page that, when requested, will return our Web site's syndicated content. More specifically, we'll create an ASP.NET Web page named rss.aspx that will return the five most recent news items from the Articles database table, formatted according to the RSS 2.0 specification.

There are a number of ways to accomplish this, which we'll examine shortly. First things first, though: we need to get the five most recent news items from the database. This can be done with the following SQL query:

SELECT TOP 5 ArticleID, Title, Author, Description, DatePublished
FROM Articles
ORDER BY DatePublished DESC

Once we have this information, we need to render it into an appropriate RSS 2.0 syndication file. The simplest and quickest way to display the database data as XML data is to use a Repeater control. Specifically, the Repeater will emit the <rss> and <channel> tags and the elements describing the Web site in its HeaderTemplate and FooterTemplate templates, and the <item> elements in the ItemTemplate template. The following is the HTML portion of our ASP.NET Web page (the .aspx file):

<%@ Page language="c#" ContentType="text/xml" Codebehind="rss.aspx.cs"
  AutoEventWireup="false" Inherits="SyndicationDemo.rss" %>
<asp:Repeater id="rptRSS" runat="server">
  <HeaderTemplate>
    <rss version="2.0">
      <channel>
        <title>ASP.NET News!</title>
        <link>http://www.ASPNETNews.com/Headlines/</link>
        <description>
          This is the syndication feed for ASPNETNews.com.
        </description>
  </HeaderTemplate>

  <ItemTemplate>
        <item>
          <title><%# FormatForXML(DataBinder.Eval(Container.DataItem,
                                              "Title")) %></title>
          <description>
             <%# FormatForXML(DataBinder.Eval(Container.DataItem, 
                                     "Description")) %>
          </description>
          <link>
             http://www.ASPNETNews.com/Story.aspx?ID=<%# 
                   DataBinder.Eval(Container.DataItem, "ArticleID") %>
          </link>
          <author><%# FormatForXML(DataBinder.Eval(Container.DataItem, 
                                             "Author")) %></author>
          <pubDate>
             <%# String.Format("{0:R}", 
                  DataBinder.Eval(Container.DataItem, 
                                         "DatePublished")) %>
           </pubDate>
        </item>
  </ItemTemplate>

  <FooterTemplate>
      </channel>
    </rss>  
  </FooterTemplate>
</asp:Repeater>

The first thing to note is that the above code example contains only the Repeater control and no other HTML markup or Web controls. This is because we want this page to emit nothing but XML output. In fact, if you look in the @Page directive you will find that the ContentType has been set to the XML MIME type (text/xml). Next, in the ItemTemplate, when adding the Title, Description and Author database fields to the XML output, we call the helper function FormatForXML(). This function is defined in the code-behind class (which we'll examine shortly) and simply replaces any illegal XML characters with their escaped, legal equivalents. Finally, the DatePublished database field, entered into the <pubDate> element, is formatted using the String.Format() method. The standard format specifier R produces the RFC 1123 pattern (for example, Mon, 07 Jul 2003 21:00:00 GMT). Note that R appends GMT without converting the value, so DatePublished should be stored as a GMT time.

The code-behind class for this Web page is rather straightforward. The Page_Load event handler simply binds the database query to the Repeater, while the FormatForXML() function does a few simple string replacements, if needed. For brevity, only the Page_Load event handler and FormatForXML() function are shown from the code-behind class:

private void Page_Load(object sender, System.EventArgs e)
{
   // Connect to the database; connectionString is a placeholder for
   // your own SQL Server connection string
   SqlConnection myConnection = new SqlConnection(connectionString);
   
   // Retrieve the SQL query results and bind it to the Repeater
   string SQL_QUERY = "SELECT TOP 5 ArticleID, Title, Author, " +
                          "Description, DatePublished " +
                      "FROM Articles ORDER BY DatePublished DESC";
   SqlCommand myCommand = new SqlCommand(SQL_QUERY, myConnection);

   // bind the results to the Repeater
   myConnection.Open();
   rptRSS.DataSource = myCommand.ExecuteReader();
   rptRSS.DataBind();
   myConnection.Close();
}


protected string FormatForXML(object input)
{
   string data = input.ToString();      // convert the input to a string

   // replace those characters disallowed in XML documents
   data = data.Replace("&", "&amp;");
   data = data.Replace("\"", "&quot;");
   data = data.Replace("'", "&apos;");
   data = data.Replace("<", "&lt;");
   data = data.Replace(">", "&gt;");

   return data;
}
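As an alternative to escaping strings by hand, an XML writer can do the escaping while it emits the elements. The sketch below is not the approach used in rss.aspx; it swaps the Repeater for the XmlTextWriter class and hard-codes one news item, purely to show that reserved characters in text content are escaped automatically. The class name RssWriterSketch and the sample title and link are assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Xml;

public static class RssWriterSketch
{
    // Hypothetical helper: builds a one-item RSS 2.0 document as a string.
    public static string BuildFeed(string itemTitle, string itemLink)
    {
        StringWriter buffer = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(buffer);
        writer.Formatting = Formatting.Indented;

        writer.WriteStartElement("rss");
        writer.WriteAttributeString("version", "2.0");
        writer.WriteStartElement("channel");
        writer.WriteElementString("title", "ASP.NET News!");
        writer.WriteElementString("link", "http://www.ASPNETNews.com/Headlines/");
        writer.WriteElementString("description",
            "This is the syndication feed for ASPNETNews.com.");

        writer.WriteStartElement("item");
        writer.WriteElementString("title", itemTitle); // the writer escapes & and <
        writer.WriteElementString("link", itemLink);
        writer.WriteEndElement(); // </item>

        writer.WriteEndElement(); // </channel>
        writer.WriteEndElement(); // </rss>
        writer.Close();
        return buffer.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(BuildFeed("Tips & Tricks for <asp:Repeater>",
            "http://www.ASPNETNews.com/Story.aspx?ID=1"));
    }
}
```

With this approach a helper like FormatForXML() is not needed for element text, at the cost of moving the feed layout out of the .aspx markup and into code.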

A screenshot of rss.aspx, when viewed through a browser, can be seen in Figure 1.

Figure 1. Rss.aspx, When Viewed Through a Browser

Before moving on to building an online news aggregator, let me mention some possible enhancements for the syndication engine. First, every time the rss.aspx Web page is requested, a database access occurs. If you expect a large number of people to be frequently accessing rss.aspx, it would be worthwhile to use output caching. Second, typically news sites break down their syndication into a number of categories. For example, News.com has specialized syndicated sections, such as content focusing on Enterprise Computing, E-Business, Communications and so on. Providing such support could be easily accomplished provided the Articles table has some sort of Category field. The rss.aspx Web page, then, could accept a querystring parameter indicating what category to display, and then retrieve only those news items for the specified category.

Consuming a Syndication Feed in an ASP.NET Web Page

In order to test the syndication engine we just created, let's build an online news aggregator that allows for any number of syndication feeds. The aggregator user interface will be fairly straightforward, comprising three frames, as shown in Figure 2. In the left frame, the various syndication feeds will be listed. In the top right frame, the news items for the selected syndication feed will be displayed. Finally, in the bottom right frame, the title and description of the selected news item will be displayed, with a link to the news item. Note that this UI is pretty much the de facto standard UI for aggregators of all kinds, including news aggregators, email clients and newsgroup readers.

Figure 2. A Screenshot of the News Aggregator User Interface

The first step is to create an HTML Web page that sets up the framed user interface. Fortunately, Visual Studio .NET 2003 makes this process quite simple: just choose to Add a New Item to your Web application Solution, choosing Frameset as the new item type. (I named this new file NewsAggregator.htm in my project. I left this as an HTML file as opposed to making it an ASP.NET Web page because this page will contain just the HTML to set up the three frames in the Web page. Each individual frame will be displaying an actual ASP.NET Web page.) This will launch the Frameset Template wizard shown in Figure 3. Simply pick the Nested Hierarchy option and click OK.

Figure 3. The Frameset Wizard in Visual Studio .NET 2003

The Frameset Template Wizard will then create an HTML Web page with the frame source already added. Merely set the src attribute of the left frame to DisplayFeeds.aspx, the URL of the ASP.NET Web page that will display the list of syndication feeds. That's it for the NewsAggregator.htm page.

Over the next three sections we'll look at creating the three components of the online news aggregator: DisplayFeeds.aspx, which displays the list of syndication feeds; DisplayNewsItems.aspx, which displays the news items for a particular syndication feed; and DisplayItem.aspx, which displays the details for a particular news item for a particular syndication feed.

Displaying the List of Syndication Feeds

We now need to create the DisplayFeeds.aspx ASP.NET Web page. This Web page will display the list of feeds we are subscribed to. For this demonstration, I decided to place the list of feeds in a database table called Feeds, although you could store the feeds just as well in an XML file. The Feeds table contains four fields:

  • FeedID—auto-incremented, integer primary key field used to uniquely identify each feed,
  • Title—a varchar(50) which stores the name of the feed,
  • URL—a varchar(150) that stores the URL to the RSS syndication feed, and
  • UpdateInterval—an integer field that specifies how often the feed should be updated, in minutes.

The DisplayFeeds.aspx Web page uses a DataGrid to display the list of syndication feeds. The DataGrid has a single HyperLinkColumn column, which displays the value of the Title field and links to the page DisplayNewsItems.aspx, passing along the FeedID field value in the querystring. The DataGrid declaration, with some peripheral markup omitted for brevity, can be seen below:

<asp:DataGrid id="dgFeeds" runat="server" 
             AutoGenerateColumns="False" ...>
   ...
   <Columns>
     <asp:HyperLinkColumn Target="rtop" 
         DataNavigateUrlField="FeedID" 
         DataNavigateUrlFormatString="DisplayNewsItems.aspx?FeedID={0}"
         DataTextField="Title" HeaderText="RSS Feeds">
     </asp:HyperLinkColumn>
   </Columns>
</asp:DataGrid>

The key thing to observe is the HyperLinkColumn. Note that its Target property is set so that when the user clicks on a syndication feed link, the DisplayNewsItems.aspx Web page is opened in the top right frame. Also, the DataNavigateUrlField, DataNavigateUrlFormatString, and DataTextField properties are set so that the hyperlink displays the title of the syndication feed and, when clicked, takes the user to DisplayNewsItems.aspx, passing along the value of the FeedID field in the querystring. (The code-behind class for this page simply retrieves the list of feeds from the Feeds table, ordered alphabetically by the Title field, and then binds the results to the DataGrid. For space considerations, this code is not presented here in the text of the article.)

Displaying the News Items for a Particular Syndication Feed

The next task that faces us is creating the DisplayNewsItems.aspx Web page. This page should display the titles of the news items in the selected syndication feed as hyperlinks such that when the hyperlink is clicked the description of the news item is shown in the bottom right frame. This task presents us with two primary challenges:

  • Accessing the RSS syndication feed from the feed's specified URL, and
  • Transforming the XML received into the appropriate HTML.

Fortunately with the .NET Framework, neither of these two challenges is particularly daunting. For the first task, realize that remote XML data can be loaded into an XmlDocument object with just two lines of code. For the second task, displaying XML in an ASP.NET Web page is a cinch with the ASP.NET XML Web control.

The XML Web control is designed to display raw or transformed XML data on a Web page. The first step in using the XML Web control is specifying the XML data source, which can be done through any one of three properties. Using the Document property, you can assign an XmlDocument instance as the XML data source for the XML Web control. If the XML data exists in a file on the Web server's file system, you can use the DocumentSource property, providing either a relative or absolute path to the XML file. Finally, if you have the XML data in a string, you can assign this string to the XML Web control's DocumentContent property.

Typically we'll want to transform the XML data in some manner before displaying it on the Web page. The XML Web control allows us to specify an XSLT stylesheet that will apply the transformation. Similar to the XML data, the XSLT stylesheet can be specified in one of two ways through one of two properties: the Transform property can be assigned an XslTransform instance or the TransformSource property can be set to a relative or absolute file path to the XSLT stylesheet on the local Web server. In a moment, we'll see an example of the XML Web control in action.

Let's get to creating the DisplayNewsItems.aspx Web page. Before we add an XML Web control and start creating the code-behind class, we need to add a small bit of client-side JavaScript to the HTML portion. Specifically, add the following <script> block in the <head> tag of the HTML portion:

<script language="javascript">
  // display a blank page in the bottom frame when the news items loads
  parent.rbottom.location.href = "about:blank";
</script>

This client-side JavaScript displays a blank page in the bottom right frame whenever DisplayNewsItems.aspx is loaded. To understand why we want to do this consider the following situation that can unfold if we omit this <script> block:

  1. The user clicks on a syndication feed from the left frame, thereby loading the feed's news items in the top right frame.
  2. The user then clicks one of the news items from the top right frame, thereby loading the news item's details in the bottom right frame.
  3. Now the user clicks on a different syndication feed from the left frame, thereby loading the new feed's news items in the top right frame.

At this point, the details from the previous syndication feed's news item are still in the bottom right frame! The above client-side script code alleviates this glitch by "wiping out" the contents of the bottom right frame every time a syndication feed from the left frame is clicked.

Now that we have taken care of that client-side scripting issue, let's turn our attention to adding the needed XML Web control. Once you have added the XML Web control, set its ID property to xmlNewsItems and its TransformSource property to NewsItems.xslt (the name of the XSLT stylesheet we'll create next). Now, in the Page_Load event handler we need to load the remote RSS syndication file into an XmlDocument instance, and then set the XML Web control's Document property to this XmlDocument instance.

private void Page_Load(object sender, System.EventArgs e)
{
   // Read the ID of the selected feed from the querystring
   int feedID = Int32.Parse(Request.QueryString["FeedID"]);

   // Connect to the database to find out the URL of the RSS feed;
   // connectionString is a placeholder for your own connection string
   SqlConnection myConnection = new SqlConnection(connectionString);
   
   // Build the query that retrieves the URL of the remote RSS syndication file
   string SQL_QUERY = "SELECT URL, UpdateInterval FROM Feeds " +
                      "WHERE FeedID = @FeedID";
   SqlCommand myCommand = new SqlCommand(SQL_QUERY, myConnection);

   SqlParameter feedParam = new SqlParameter("@FeedID", 
                                          SqlDbType.Int, 4);
   feedParam.Value = feedID;
   myCommand.Parameters.Add(feedParam);

   myConnection.Open();
   string feedURL = myCommand.ExecuteScalar().ToString();
   myConnection.Close();

   // Now that we have the feed URL, load it in into an XML document
   XmlDocument feedXML = new XmlDocument();
   feedXML.Load(feedURL);

   xmlNewsItems.Document = feedXML;
}

The most germane lines of code in the Page_Load event handler are the last three. These three lines of code create the new XmlDocument object, load in the remote RSS feed, and assign the XmlDocument object to the XML Web control's Document property. Isn't it impressive how simple it is to access remote XML data and display it in an ASP.NET Web page?
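To experiment with this outside of a Web page, the sketch below loads a small, made-up RSS document from a string (standing in for the remote feed) and lists the item titles with an XPath query. The class name FeedLoadSketch and the sample feed are assumptions for illustration; calling Load() with a URL, as in Page_Load above, would fetch a real feed instead.

```csharp
using System;
using System.Xml;

public static class FeedLoadSketch
{
    // Made-up sample feed used in place of a remote syndication file
    public const string SampleFeed =
        "<rss version=\"2.0\"><channel>" +
        "<title>Sample Feed</title>" +
        "<item><title>Working with the DataGrid</title></item>" +
        "<item><title>Working with the Repeater</title></item>" +
        "</channel></rss>";

    // Returns the title of every <item> in the feed, in document order.
    public static string[] GetItemTitles(XmlDocument feed)
    {
        XmlNodeList titles = feed.SelectNodes("/rss/channel/item/title");
        string[] result = new string[titles.Count];
        for (int i = 0; i < titles.Count; i++)
        {
            result[i] = titles[i].InnerText;
        }
        return result;
    }

    public static void Main()
    {
        XmlDocument feed = new XmlDocument();
        feed.LoadXml(SampleFeed); // feed.Load(url) would read a remote file instead

        foreach (string title in GetItemTitles(feed))
        {
            Console.WriteLine(title);
        }
    }
}
```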

All that we have left to do now is create the XSLT stylesheet, NewsItems.xslt. The first draft of this stylesheet can be seen below:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" 
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="html" omit-xml-declaration="yes" />

   <xsl:template match="/rss/channel">
      <b><xsl:value-of select="title" 
                   disable-output-escaping="yes" /></b>
      <xsl:for-each select="item">
         <li>
            <a>
               <xsl:attribute name="href">DisplayItem.aspx?ID=<xsl:number 
                  value="position()" /></xsl:attribute>
               <xsl:attribute name="target">rbottom</xsl:attribute>
               <xsl:value-of select="title"
                       disable-output-escaping="yes"  />
            </a>
            (<xsl:value-of select="pubDate" />)
         </li>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>

This XSLT stylesheet has a single template that matches on the /rss/channel XPath expression, outputting the value of the <title> element in a bold font. Next, it iterates through each of the <item> elements and, for each, displays a hyperlink to DisplayItem.aspx, passing the <item> element's position through the querystring. Note that this hyperlink also has its target attribute set to rbottom, the name of the bottom right frame. Finally, it displays the value of the <pubDate> element after each news item title.

There are a couple of items in the XSLT stylesheet that not everyone may be familiar with. The first of these is the disable-output-escaping="yes" attribute in the <xsl:value-of> elements. Essentially, this attribute setting informs the XSLT engine that it should not escape the characters that are illegal in XML: &, <, >, " and '. To understand what this accomplishes, realize that if this attribute were not set (or were set to the default value, "no"), a title containing an escaped & as &amp; would produce &amp; in the resulting HTML as well, as opposed to just &. If you think about this for a bit, you can see that this can cause a multitude of problems. For example, suppose the title for a syndication file is "Matt's &lt;i&gt;Cool&lt;/i&gt; Blog". If output escaping is not disabled, the output remains "Matt's &lt;i&gt;Cool&lt;/i&gt; Blog" and is shown in the Web page as the literal text "Matt's <i>Cool</i> Blog". With disable-output-escaping="yes", however, the output is not escaped and is emitted as "Matt's <i>Cool</i> Blog", thereby displaying in the Web page as the desired "Matt's Cool Blog" (with "Cool" in italics).
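The effect of the attribute can be observed with a small console sketch. It does not use the XML Web control; instead it assumes the XslCompiledTransform class and a one-template stylesheet built in code, and renders the same made-up title twice, once with each setting. The names EscapingSketch and Render are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class EscapingSketch
{
    // Made-up feed whose title contains escaped HTML markup
    const string Feed =
        "<rss version=\"2.0\"><channel>" +
        "<title>Matt's &lt;i&gt;Cool&lt;/i&gt; Blog</title>" +
        "</channel></rss>";

    // Builds a stylesheet whose only difference is the escaping setting.
    static string Stylesheet(string escaping)
    {
        return
            "<xsl:stylesheet version=\"1.0\" " +
            "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
            "<xsl:output method=\"html\" />" +
            "<xsl:template match=\"/rss/channel\">" +
            "<b><xsl:value-of select=\"title\" " +
            "disable-output-escaping=\"" + escaping + "\" /></b>" +
            "</xsl:template></xsl:stylesheet>";
    }

    // Transforms the sample feed with escaping set to "yes" or "no".
    public static string Render(string escaping)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        transform.Load(XmlReader.Create(new StringReader(Stylesheet(escaping))));

        XmlDocument feed = new XmlDocument();
        feed.LoadXml(Feed);

        StringWriter output = new StringWriter();
        transform.Transform(feed, null, output);
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine("no:  " + Render("no"));
        Console.WriteLine("yes: " + Render("yes"));
    }
}
```

With "no" the markup in the title stays escaped in the output; with "yes" the <i> tags are written through as real HTML.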

Another thing to note is the <a> element. Realize that this funky syntax ends up generating the following output:

<a href="DisplayItem.aspx?ID=position">news item title</a>

The reason we have to use this syntax is because in order to add an attribute to an element in an XSLT stylesheet you need to create the element and then inside of the element's tags, use the <xsl:attribute> syntax. There are some examples of this syntax available online at W3Schools, The <xsl:attribute> Element page.

Finally, note that the ID querystring value in the hyperlink is assigned the value from the <xsl:number> element, with a value of the position() function. The <xsl:number> element simply emits a number. The position() function is an XPath function that returns the ordinal position of the current node within the set of nodes being processed, which here is the set of <item> elements selected by the <xsl:for-each> element. This means the first news item will have a position() value of 1; the second, a position() value of 2; and so on. We need to record this value and pass it along via the querystring so that when the DisplayItem.aspx Web page is called, it knows what item from the RSS syndication feed we are interested in viewing.

The astute reader may have realized that our XSLT stylesheet is not complete due to the fact that the FeedID parameter is not passed through the querystring to the DisplayItem.aspx Web page. To see why this is a problem, recall that we are sending in the ID querystring parameter the ordinal position of the <item> element the user is interested in viewing the details for. That is, if the user clicks on the fourth news item, the page DisplayItem.aspx?ID=4 will be loaded in the bottom right frame. The problem is that DisplayItem.aspx cannot determine what feed the user is interested in viewing. There are a couple of ways to figure this out, such as having the bottom right frame use client-side JavaScript to read the URL of the top right frame, thereby ascertaining the value of FeedID. A simpler way, in my opinion, is to merely pass along the FeedID value through the querystring along with the ID parameter.

One difficulty that arises from this is that FeedID is not present in the RSS XML data, which is what the XSLT stylesheet is working with. The DisplayNewsItems.aspx Web page knows the FeedID, though, and needs to somehow let the XSLT stylesheet know this value. This can be accomplished through the use of XSLT parameters.

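Using XSLT parameters is fairly straightforward. In the XSLT stylesheet, you need to add an <xsl:param> element, which provides the name for the parameter, as a direct child of the <xsl:stylesheet> element. (Values supplied from outside the stylesheet are bound only to such top-level parameters; an <xsl:param> placed inside an <xsl:template> element is a template parameter and would not receive the value.) Let's call this parameter FeedID:

<xsl:stylesheet version="1.0" 
       xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:param name="FeedID" />

   <xsl:template match="/rss/channel">
                ...
   </xsl:template>
</xsl:stylesheet>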

Now, the parameter can be used in an <xsl:value-of> element using the following syntax:

<xsl:value-of select="$parameterName" />

Therefore, we can add the FeedID querystring parameter to the hyperlink by adding the following to our existing XSLT stylesheet:

<a>
  <xsl:attribute name="href">DisplayItem.aspx?ID=<xsl:number 
    value="position()" />&amp;FeedID=<xsl:value-of select="$FeedID" 
      /></xsl:attribute>

Note that we have an ampersand (escaped to &amp;) after the ID querystring parameter, and then have the querystring parameter FeedID with the value from the FeedID parameter. That's all we have to add to our XSLT stylesheet.

What remains is to set the parameter's value programmatically in the Page_Load event handler of the DisplayNewsItems.aspx Web page. This is accomplished using the XsltArgumentList class, which provides an AddParam() method. Once we have created an instance of this class and added the parameter, we simply assign the instance to the XML Web control's TransformArgumentList property. The following code shows the updated Page_Load event handler for DisplayNewsItems.aspx:

private void Page_Load(object sender, System.EventArgs e)
{
        ...

   // Now that we have the feed URL, load it in into an XML document
   XmlDocument feedXML = new XmlDocument();
   feedXML.Load(feedURL);

   xmlNewsItems.Document = feedXML;

   // Add the FeedID parameter to the XSLT stylesheet
   XsltArgumentList xsltArgList = new XsltArgumentList();
   xsltArgList.AddParam("FeedID", "", feedID);
   xmlNewsItems.TransformArgumentList = xsltArgList;
}
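The same parameter passing can be tried in isolation. The sketch below is a console-style approximation rather than the page's actual code: it assumes the XslCompiledTransform class in place of the XML Web control, a made-up two-item feed, and a cut-down stylesheet that declares FeedID as a top-level parameter. The names XsltParamSketch and Render are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltParamSketch
{
    // Made-up feed with two news items
    const string Feed =
        "<rss version=\"2.0\"><channel><title>Sample</title>" +
        "<item><title>First</title></item>" +
        "<item><title>Second</title></item>" +
        "</channel></rss>";

    // Cut-down stylesheet; FeedID is declared as a top-level parameter
    const string Stylesheet = @"<xsl:stylesheet version=""1.0""
    xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""html"" />
  <xsl:param name=""FeedID"" />
  <xsl:template match=""/rss/channel"">
    <xsl:for-each select=""item"">
      <a><xsl:attribute name=""href"">DisplayItem.aspx?ID=<xsl:number
        value=""position()"" />&amp;FeedID=<xsl:value-of
        select=""$FeedID"" /></xsl:attribute><xsl:value-of select=""title"" /></a>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>";

    // Transforms the sample feed, supplying feedID as the FeedID parameter.
    public static string Render(int feedID)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        transform.Load(XmlReader.Create(new StringReader(Stylesheet)));

        XmlDocument feed = new XmlDocument();
        feed.LoadXml(Feed);

        XsltArgumentList args = new XsltArgumentList();
        args.AddParam("FeedID", "", feedID); // empty string = no namespace

        StringWriter output = new StringWriter();
        transform.Transform(feed, args, output);
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Render(7));
    }
}
```

Each generated link should carry both the item position and the supplied feed identifier in its querystring.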

Displaying the Details for a Particular News Item

All that remains is to display the detailed information for the particular news item the user selected. This detailed information will be displayed in the bottom right frame, and will show the title of the news item, its description, and a link to the news item. As with the DisplayNewsItems.aspx Web page, DisplayItem.aspx starts by retrieving the remote RSS syndication feed based on the passed-in FeedID querystring parameter. It then uses an XML Web control to display the detailed information. In fact, the Page_Load event handler for the DisplayItem.aspx Web page is identical to the DisplayNewsItems.aspx Web page's Page_Load event handler save for two minor differences:

  • DisplayItem.aspx needs to read in the value of the ID querystring parameter, and
  • DisplayItem.aspx uses an XSLT parameter, but one different from DisplayNewsItems.aspx.

As with DisplayNewsItems.aspx, DisplayItem.aspx needs to pass a parameter into the XSLT stylesheet. Whereas DisplayNewsItems.aspx passed in the querystring parameter for FeedID, DisplayItem.aspx needs to pass in the ID querystring parameter, which indicates what news item the XSLT stylesheet should display. These minor changes are shown in the Page_Load event handler below; code identical to the Page_Load event handler from DisplayNewsItems.aspx has been omitted:

private void Page_Load(object sender, System.EventArgs e)
{
   // Read the feed and news item identifiers from the querystring
   int feedID = Int32.Parse(Request.QueryString["FeedID"]);
   int ID = Int32.Parse(Request.QueryString["ID"]);

   ...

   // Now that we have the feed URL, load it into an XML document
   XmlDocument feedXML = new XmlDocument();
   feedXML.Load(feedURL);

   // Bind the feed to this page's XML Web control
   xmlItem.Document = feedXML;

   // Add the ID parameter to the XSLT stylesheet
   XsltArgumentList xsltArgList = new XsltArgumentList();
   xsltArgList.AddParam("ID", "", ID);
   xmlItem.TransformArgumentList = xsltArgList;
}

The XSLT stylesheet to transform the XML data can be seen below:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="html" omit-xml-declaration="yes" />
   <xsl:param name="ID" />

   <xsl:template match="/rss/channel">
      <b><xsl:value-of select="item[$ID]/title" 
                           disable-output-escaping="yes" /></b>
      <p>
         <xsl:value-of select="item[$ID]/description" 
                           disable-output-escaping="yes" />
      </p>
      <a>
         <xsl:attribute name="href"><xsl:value-of 
           select="item[$ID]/link" /></xsl:attribute>
         <xsl:attribute name="target">_blank</xsl:attribute>
         Read More...
      </a>
   </xsl:template>
</xsl:stylesheet>

Note that an <xsl:param> element is used to declare the ID XSLT parameter. Then, in the various <xsl:value-of> elements, the ID parameter is used to select just the specific <item> element from the list of <item> elements. In XPath, the syntax elementName[i] selects just the i-th element with that name, counting from 1. For example, item[1] retrieves just the first <item> element, while item[2] retrieves just the second. Therefore, item[$ID] retrieves just the <item> element whose position is given by the ID XSLT parameter. (This relies on ID being passed in as a number, as it is in the Page_Load event handler above; a string value in the predicate would be treated as a Boolean test rather than as a position.)

Finally, note that the Read More... hyperlink, emitted near the end of the XSLT stylesheet, has its target attribute set to _blank, thereby causing a new window to be opened when a user clicks on the link.
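To see the positional predicate in isolation, the following minimal sketch evaluates the same kind of XPath expression directly against an XmlDocument, outside of any XSLT stylesheet. The class name, method name, and sample feed are hypothetical and exist only for illustration; they are not part of the article's download.

```csharp
using System;
using System.Xml;

public static class XPathPositionDemo
{
    // Hypothetical two-item feed used only for illustration.
    public const string SampleFeed =
        "<rss version=\"2.0\"><channel>" +
        "<item><title>First story</title></item>" +
        "<item><title>Second story</title></item>" +
        "</channel></rss>";

    // Returns the title of the <item> at the given 1-based position,
    // or null when there is no item at that position.
    public static string GetTitleAt(string feedXml, int position)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(feedXml);

        // item[n] is a positional predicate: it selects only the n-th <item>.
        XmlNode node = doc.SelectSingleNode(
            "/rss/channel/item[" + position + "]/title");
        return node == null ? null : node.InnerText;
    }
}
```

Calling XPathPositionDemo.GetTitleAt(XPathPositionDemo.SampleFeed, 2) selects the second item's title, while a position past the end of the list selects nothing and yields null.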

Future Enhancements and Current Shortcomings

One glaring shortcoming of the code we examined in this article is that each time the user clicks on a syndication feed in the left frame, or clicks on a news item title in the top right frame, the remote RSS syndication feed is loaded and parsed. Clearly this is inefficient: once you click on a syndication feed, all of its items have been loaded, so it is wasteful to reload the entire remote syndication feed each time the user clicks on one of the feed's news item titles. Not only is this approach inefficient, it is also impolite to the person or company providing the syndication service, as these incessant and needless requests place an unnecessary load on their Web server.

This shortcoming is overcome in the source code you can download with this article. Specifically, the .NET Data Cache is used to store the XmlDocument object for the various feeds. The cache duration is set to the value of the feed's UpdateInterval field in the Feeds table. (Of course, the feed's XmlDocument object could be evicted from the cache at an earlier point in time for a variety of reasons.)
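The idea of caching with an absolute expiration can be sketched without ASP.NET at all. The class below is only a stand-in for the Data Cache, written so that it can run on its own; the class name, its members, and the explicit clock argument are assumptions made for illustration and do not come from the downloadable source. It is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Illustrative stand-in for a cache with absolute expiration.
public class ExpiringCache
{
    private class Entry
    {
        public object Value;
        public DateTime ExpiresUtc;
    }

    private readonly Dictionary<string, Entry> entries =
        new Dictionary<string, Entry>();

    // Stores a value that expires the given number of minutes after nowUtc
    // (for a feed, this would be its UpdateInterval).
    public void Insert(string key, object value, DateTime nowUtc, int minutes)
    {
        Entry entry = new Entry();
        entry.Value = value;
        entry.ExpiresUtc = nowUtc.AddMinutes(minutes);
        entries[key] = entry;
    }

    // Returns the cached value, or null if it is missing or has expired.
    public object Get(string key, DateTime nowUtc)
    {
        Entry entry;
        if (!entries.TryGetValue(key, out entry))
            return null;

        if (nowUtc >= entry.ExpiresUtc)
        {
            entries.Remove(key);   // expired: evict it and report a miss
            return null;
        }
        return entry.Value;
    }
}
```

With an interval of 30 minutes, a lookup 10 minutes after the insert returns the stored object, while a lookup an hour later returns null, at which point the page would have to request the remote feed again.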

Another shortcoming of the system is that no state is shared between the top right and bottom right frames. To see where this can lead to trouble, consider the following actions:

  1. A user clicks on a syndication feed link from the left frame, loading the feed's news items in the top right frame. Assume that this syndication feed has an UpdateInterval value of 30, meaning that it expires in 30 minutes.
  2. In loading the news items in the top right frame, the feed is cached in the data cache.
  3. The user leaves for lunch.
  4. The Web site providing the syndication service adds a new news item.
  5. Our user returns from an hour-long lunch, meaning the XmlDocument for this feed in the data cache has expired.
  6. The user clicks on the first news item from the top right frame. This loads DisplayItem.aspx in the bottom right frame, passing in an ID parameter value of 1.
  7. DisplayItem.aspx cannot find the XmlDocument in the cache, so it requests the syndication feed from the remote source. In response it gets the new syndication feed (remember, a new news item was added back in step 4) and displays the first item (since the ID parameter equals 1).
  8. The user is shown the new news item, which is a bit perplexing, since the new news item is not displayed in the top right frame and was not the one they thought they clicked on.

This problem arises because the ID parameter does not uniquely identify each news item; rather, it is only an offset of the news item from a list of news items at a particular point in time. A good way of fixing this would be to stop using the data cache to store each syndication feed, and start using a database, or some other means of persistent storage (such as XML files on the Web server's local file system). By using a database, each news item can be given a unique identifier, which can then be passed down to the bottom right frame. This approach guarantees against the problem outlined above. Of course this adds some additional complexities, such as deciding when to clear old items out of the database.
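The difference between an offset and an identifier can be illustrated with a small sketch. Rather than the database approach described above, it looks an item up by the text of its <link> element, on the assumption (not guaranteed by every feed) that each item's link is unique and stable. The class and method names are hypothetical.

```csharp
using System;
using System.Xml;

public static class StableItemLookup
{
    // Finds the <item> whose <link> text equals the given URL,
    // regardless of where the item currently sits in the feed.
    // Returns null when no item matches.
    public static XmlNode FindByLink(XmlDocument feed, string link)
    {
        foreach (XmlNode item in feed.SelectNodes("/rss/channel/item"))
        {
            XmlNode linkNode = item.SelectSingleNode("link");
            if (linkNode != null && linkNode.InnerText == link)
                return item;
        }
        return null;
    }
}
```

If the site adds a new item at the top of the feed, item[1] now refers to a different story, but a lookup by link still returns the story the user originally clicked on.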

The existing application also lacks any exception handling. This should definitely be added. In particular, exception handling needs to be added for retrieving and loading the remote RSS syndication file into the XmlDocument object. The file might not exist, or might be malformed.
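As a rough sketch of what such handling might look like, the helper below wraps the call to XmlDocument.Load() and reports a failure instead of letting the exception reach the page. The class name, method name, and error messages are hypothetical. The set of exceptions caught is an assumption as well: XmlException covers malformed XML and IOException covers a missing local file, while the exception raised for a network failure depends on the runtime (WebException on the .NET Framework), so a real application may need to catch additional types.

```csharp
using System;
using System.IO;
using System.Net;
using System.Xml;

public static class SafeFeedLoader
{
    // Attempts to load the feed at feedUrl. Returns the document on success;
    // on failure returns null and describes the problem in error.
    public static XmlDocument TryLoad(string feedUrl, out string error)
    {
        error = null;
        try
        {
            XmlDocument doc = new XmlDocument();
            doc.Load(feedUrl);
            return doc;
        }
        catch (XmlException ex)
        {
            // The file was retrieved but is not well-formed XML.
            error = "The feed is not well-formed XML: " + ex.Message;
        }
        catch (WebException ex)
        {
            // The remote server could not be reached or returned an error.
            error = "The feed could not be retrieved: " + ex.Message;
        }
        catch (IOException ex)
        {
            // The file does not exist or could not be read.
            error = "The feed could not be read: " + ex.Message;
        }
        return null;
    }
}
```

A page could call TryLoad() in its Page_Load event handler and, when the result is null, display the error message in place of the XML Web control's output.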

There is a plethora of enhancements that one could easily add to this online news aggregator. An obvious one would be an administration page that allows the user to add, remove, and edit syndication feeds. It would also be nice to allow the user to group syndication feeds into user-created categories. The user interface is a bit hard on the eyes, but this could be quickly fixed by either augmenting the HTML generated by the XSLT stylesheets or adding stylesheets to the various frames to allow for a more aesthetic appearance. Finally, it would be worthwhile to add some <meta> HTML tags to cause the top right frame to refresh periodically so that the user can see new news items for a particular syndication feed without having to manually refresh the Web page.

Note (August 4, 2003): After publication of this article, a few readers emailed me about two potential issues with displaying the <description> of a particular RSS syndication item the way I show in the article:

  1. The disable-output-escaping attribute, which is used in the <xsl:value-of> elements, is not uniformly implemented by all XSLT processors. The .NET XSLT processor supports disable-output-escaping, but be aware of this if you attempt to port this application to other platforms.
  2. The HTML content in the <description> element is blindly emitted. This HTML might contain malicious code, such as <script> or <embed> blocks, which should, ideally, be stripped out. In order to strip out these potentially offensive HTML elements, you would likely need to use extension functions (see Extending XSLT with JScript, C#, and Visual Basic .NET). For more information on the need for stripping HTML content from an RSS feed, refer to this 'Dive Into Mark' blog entry.

Summary

In this article we examined building not only a syndication engine, but an online news aggregator as well. In building both products we worked with displaying XML data in an ASP.NET Web page. With the syndication engine, we displayed database data in an XML format using a Repeater control. In the news aggregator, we used XML Web controls with XSLT stylesheets.

I invite you to download the online news aggregator and make enhancements to it as you see fit. If you have any questions regarding the application or the concepts discussed in this article, don't hesitate to send me an email; I can be reached at mitchell@4guysfromrolla.com.

Happy Programming!

About the Author

Scott Mitchell, author of five ASP/ASP.NET books and founder of 4GuysFromRolla.com, has been working with Microsoft Web technologies for the past five years. An active member in the ASP and ASP.NET community, Scott is passionate about ASP and ASP.NET and enjoys helping others learn more about these exciting technologies. For information on the DataGrid, DataList, and Repeater controls check out Scott's latest book, ASP.NET Data Web Controls Kick Start (ISBN: 0672325012).
