E Pluriblog Unum: Merging RSS Feeds

Kent Sharkey
MSDN Content Strategist

December 2004

Summary: Kent Sharkey describes one technique for merging multiple RSS feeds into a new RSS feed, which enables merging common data into a single source to make it easier for users to find the information they want. (20 printed pages)

C# code sample

Download the MSDNMergeRssCS.msi file.

Visual Basic code sample

Download the MSDNMergeRssVB.msi file.

Contents

Introduction
What Is RSS?
RssClient Design
RssClient Implementation
Conclusion
Related Resources

Introduction

RSS (Really Simple Syndication) has become a popular format not only for syndicating weblogs (or blogs as they are more commonly known), but as a generic XML format for tracking updates to Web sites or applications as well. However, what if you have a number of feeds that you want to aggregate or merge? Perhaps the individual feeds themselves don't update frequently enough, or maybe you want to gather together a number of blogs into one place for convenience. This article shows one method of merging multiple RSS feeds to create a new feed. On MSDN, we use a similar technique to create team blogs that display entries from the various people that make up each product team.

What Is RSS?

RSS is an XML grammar used to syndicate (that is, distribute) content. It is a surprisingly simple format, and like other simple formats before it (such as HTML or SOAP), it has grown to find a number of uses. Many people think of RSS only as the distribution format for blogs. However, it is a much more generic tool. RSS can be used whenever you have changing data and want to provide a list of the changes. For example, a news service can expose its headlines as RSS. Clients of that RSS, whether they are desktop RSS aggregators (such as RSSBandit or SharpReader), online RSS aggregators (such as NewsGator or Bloglines), or other Web sites, can interpret this data and display it.

An RSS file is actually quite simple, and creating one by hand isn't that difficult. Most of the elements are optional, so you can simply use the ones you're most interested in. In addition, RSS 2.0 supports namespaces, enabling the creation of extensions to RSS. Because of the looseness of the RSS 2.0 specification and the addition of extensions, as well as the previous versions (0.9x and 1.0) of RSS, there can be quite a bit of variety in the actual information in an RSS feed. This complicates the parsing of RSS because you can't leverage a single XML Schema for validation. Figure 1 is a portion of the RSS feed from http://blogs.msdn.com, a site used by many Microsoft bloggers. You can see the standard RSS elements in this feed, as well as a number of extensions.

Figure 1. RSS feed with extensions

The RSS 2.0 format is composed of a mandatory root element (rss) with a version attribute. The rss element has a single child element (channel), which holds the elements that describe the feed as well as one or more item elements. Each item element contains the actual data in the feed in a series of optional sub-elements. Some of the more common elements used in channel and item are described in the following two tables. For more details, and for the complete list, see the RSS specification listed in the Related Resources section at the end of this article.

Table 1. Sub-elements of the channel element

Element | Description
title | Required element that gives an RSS feed a unique name.
link | Required element that gives a URL that provides additional information about the RSS feed. Generally the Web site where the RSS feed originates.
description | Required element that provides a description of the purpose of the feed or its content.
language | Optional element identifying the language of the RSS feed. This element should use the standard two-part (language-culture) names for the language, as used by the System.Globalization namespace, such as en-US for United States English, or fr-CA for French Canadian.
lastBuildDate | Optional element identifying the last time the feed was changed. This is in RFC 822 format.
pubDate | Optional element identifying the last time the feed was generated. This is in RFC 822 format.
ttl | Optional element that describes the number of minutes between updates.
generator | Optional element identifying the application used to generate the feed. This may be helpful when identifying the slight differences between feeds generated by the same source.

Table 2. Sub-elements of the item element

Element | Description
title | The title of this item. For example, the headline for a news item or title of a blog post.
link | The URL that points to the item in question. Different blogs interpret this element in one of two subtly different ways. Some use it to point at the news or blog item itself on the hosting Web site. Others use it to point at the subject that the item discusses. For example, you could have an item on a news site discussing the latest decision by the U.S. President. Using the first model, the link would point at the news story about it on the news site. Using the second model, the description of the item would contain the news story, but the link would point at the White House press release (basically the original source) about the decision. It's the little differences like this that make processing RSS so much fun.
description | The item in question. Again, this can be used in two compatible ways. Many RSS generators put the entire item in the description. That is, the description would include the entire news story. Others put only the abstract in the description element, and put the full item either externally, using the enclosure element, or internally, using a body element from an added XHTML namespace.
pubDate | Date and time of the posting. Different blogs use this value in slightly different ways. Other blogs don't use this value at all. See the I Hate Dates section below for details.
author | The actual author of the posting. This is often used by blogs that are maintained by multiple people, so that the person who wrote each posting can be recorded.
category | Each item element can contain zero or more category sub-elements. This enables the categorization of the feed. There is no standardized list, however, and each blog creator can make up their own, so it's up to the people reading the blog to determine that one person's "ASP.NET" is another person's "Web Development (.NET)".
guid | While many people make the mistake of thinking of this value as a GUID (Globally Unique Identifier), it only shares the same concept. The idea of the guid element in RSS is that each posting has some unique value. It may be the URL to the posting, it may be an actual GUID, or it may be something else.
enclosure | While part of the initial RSS 2.0 specification, the enclosure element has only recently begun to grow in popularity. The enclosure element is a URL to some other resource, such as a document, media file, or other large file. The reader then has the option of downloading this content when desired.

Aggregating multiple feeds is a process of downloading each feed and putting the items into the correct order based on pubDate (most recent first). Obviously, if you have the data yourself, it is easy to merge the content into an aggregated feed. This is why many online aggregators retrieve and store the content in a database. Storing the content also enables searching, building arbitrary feeds, and so on. However, I wanted to create a simple aggregator, without the additional dependency of a database or other storage requirements. Thus was born RssClient.
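
The core of the idea fits in a few lines. The following is a minimal sketch in Python (not the article's Visual Basic .NET code); it assumes the items have already been downloaded and parsed into dictionaries with a pubDate value, and the names merge_items, feed_a, and feed_b are made up for illustration.

```python
from datetime import datetime, timezone

def merge_items(feeds, limit=None):
    """Flatten the items of several feeds and order them newest first."""
    merged = [item for feed in feeds for item in feed]
    merged.sort(key=lambda item: item["pubDate"], reverse=True)
    return merged if limit is None else merged[:limit]

# Hypothetical, already-parsed items from two feeds.
feed_a = [
    {"title": "A1", "pubDate": datetime(2004, 11, 24, 16, 3, tzinfo=timezone.utc)},
    {"title": "A2", "pubDate": datetime(2004, 11, 20, 9, 0, tzinfo=timezone.utc)},
]
feed_b = [
    {"title": "B1", "pubDate": datetime(2004, 11, 22, 12, 30, tzinfo=timezone.utc)},
]

titles = [item["title"] for item in merge_items([feed_a, feed_b])]
print(titles)
```

Passing a limit keeps only the most recent items, which is the role the Count property plays in RssClient.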

RssClient Design

RssClient is a .NET class that provides the ability to download one or more RSS feeds and merge them into a new RSS 2.0 feed. The intent was to build this class so that it could be used in either ASP.NET or Windows Forms applications. That is, it should be able to use ASP.NET caching when available, but quietly deal with the lack of a cache. In addition, I wanted it to be easy enough that other developers could use the control without having to read a lot of documentation.

I often tend to design classes such as this one backwards. That is, I first write the code I'd like a developer using the class to write. Then I use this to build the API. In the case of RssClient, I wanted to write code similar to the following.

Dim client As New RssClient("URL to RSS")
Dim rss As String = client.GetRss()

However, this did not solve my need to merge multiple feeds into one. I briefly toyed with the idea of using a ParamArray or array parameter, but discounted these ideas because I thought they were ugly, too complex, or didn't feel like they were similar to the other .NET API calls. Instead, I decided to allow users one of two methods for adding multiple feeds, using a StringCollection or assigning the feed list using OPML (Outline Processor Markup Language).

OPML is another XML grammar. It was originally designed as the format for an outline editor. However, it is commonly used for blogrolls, which are lists of blogs that someone finds interesting. OPML is also a common import/export format used by most RSS aggregators. Therefore, it seemed like a convenient format to use to load a number of RSS feeds. When OPML is used for this purpose, it is quite simple, consisting of a number of outline elements. Each outline element has title, xmlUrl, and htmlUrl attributes. Here is a simple OPML file containing a number of ASP.NET feeds.

<?xml version="1.0" encoding="utf-8"?>
<opml>
  <body>
    <outline title="ASP.NET Headlines" 
      xmlUrl="http://www.asp.net/modules/articleRss.aspx?count=7&amp;mid=64"
      htmlUrl="http://www.asp.net" />
    <outline title="ASP.NET Developer Center on MSDN" 
      xmlUrl="http://msdn.microsoft.com/asp.net/rss.xml" 
      htmlUrl="http://msdn.microsoft.com/asp.net" />
    <outline title="Scott Guthrie's Weblog" 
      xmlUrl="http://weblogs.asp.net/scottgu/rss.aspx" 
      htmlUrl="http://weblogs.asp.net/scottgu" />
    <outline title="ASP.NET articles on 4GuysFromRolla " 
      xmlUrl="http://aspnet.4guysfromrolla.com/rss/rss.aspx" 
      htmlUrl="http://aspnet.4guysfromrolla.com/" />
    <outline title="ASP.NET items from KBAlertz" 
      xmlUrl="http://www.kbalertz.com/rss/aspnet.xml" 
      htmlUrl="http://www.kbalertz.com/technology_20.aspx" />
    <outline title="Brian Goldfarb's Weblog" 
      xmlUrl="http://weblogs.asp.net/bgold/rss.aspx" 
      htmlUrl="http://weblogs.asp.net/bgold" />
    <outline title="Shanku Niyogi's WebLog" 
      xmlUrl="http://weblogs.asp.net/shankun/Rss.aspx" 
      htmlUrl="http://weblogs.asp.net/shankun" />
    <outline title="ASP.NET articles on ASPAlliance" 
      xmlUrl="http://aspalliance.com/rss.aspx" 
      htmlUrl="http://aspalliance.com/ArticleListing.aspx?cId=1" />
  </body>
</opml>
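
To make the structure concrete, here is a small sketch in Python (not the article's Visual Basic .NET code) that pulls the feed URLs out of OPML text with the standard xml.etree.ElementTree module. The function name feed_urls and the shortened sample are illustrative only. The ampersand in the first URL is written as &amp;amp; because a bare ampersand inside an attribute value is not well-formed XML.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical OPML sample; the last outline has no feed URL.
OPML = """<?xml version="1.0" encoding="utf-8"?>
<opml>
  <body>
    <outline title="ASP.NET Headlines"
      xmlUrl="http://www.asp.net/modules/articleRss.aspx?count=7&amp;mid=64"
      htmlUrl="http://www.asp.net" />
    <outline title="Scott Guthrie's Weblog"
      xmlUrl="http://weblogs.asp.net/scottgu/rss.aspx"
      htmlUrl="http://weblogs.asp.net/scottgu" />
    <outline title="A folder without a feed" />
  </body>
</opml>"""

def feed_urls(opml_text):
    """Return the xmlUrl of every outline element that has one."""
    root = ET.fromstring(opml_text.encode("utf-8"))
    return [outline.get("xmlUrl")
            for outline in root.iter("outline")
            if outline.get("xmlUrl")]

urls = feed_urls(OPML)
print(urls)
```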

My sample implementation code then became some combination of using the collection, or adding through an OPML file.

Dim client As New RssClient
client.RssFiles.Add("URL1")
client.RssFiles.Add("URL2")
client.LoadOpmlFile("path to an OPML file")
Dim rss As String = client.GetRss()

The other aspects of the control grew out of this and other experimentation. Table 3 describes the properties of the class, while Table 4 shows the public methods.

Table 3. Public Properties of RssClient

Property | Type | Description
RssFiles | StringCollection | Read-only list of the URLs to the various RSS feeds to merge. You can add each RSS feed to this list directly, or use the LoadOpmlFile method (see below) to add a number of items from a file.
Count | Integer | Number of items to include in the final RSS feed. Default is to retrieve the complete list of items from all the merged feeds.
Title | String | Title to use for the merged RSS feed. The default is set to Merged RSS.
Link | String | URL to a site providing more information about the feed. The default is set to http://msdn.microsoft.com/aboutmsdn/rss, which is a page on MSDN discussing RSS.
Description | String | Text describing the purpose of the RSS feed. The default is set to Merger of multiple RSS feeds.
CacheFile | String | Relative path to a file that will be monitored for changes if used with ASP.NET. If this file is changed, the cache is invalidated, meaning that all feeds will be updated based on their current values. This is very useful if you need to get a rapid update and don't want to wait for the cached item to expire normally. The default value is a file named cache.file in the current directory.
CacheDuration | Integer | Number of minutes to hold each item in the cache if this class is used with ASP.NET. This reduces network traffic by not requesting the RSS feed more often than is necessary. While you could set this on the basis of the Time To Live (ttl) element in the RSS feed itself, it would also mean that you would only look for changes at this same rate. This could mean that a change would be ignored for a long period of time. It was due to these two considerations that 60 minutes was set as the default for this property.
ProxyName | String | The address or name of the proxy server that will be used if the RssClient is behind a firewall.
ProxyPort | String | String representation of the port number that will be used if the RssClient is behind a firewall. If ProxyName is set, but this property is not, it will default to port 80.

Table 4. Public methods of the RssClient

Method | Description
LoadOpmlFile | Loads one or more RSS feeds listed in an OPML (Outline Processor Markup Language) file. OPML is an XML grammar used to describe (among other things) a list of RSS feeds. These are commonly used by RSS aggregators as an interchange format, or as a blogroll. This method takes a string that represents a local, relative path to the OPML file. It then adds the listed RSS feeds to the RssFiles collection. A relative path was used to prevent possible security implications of someone downloading an external OPML file.
InvalidateCache | If the RssClient is used with ASP.NET, the individual RSS feeds will be cached to prevent requesting the feeds too often. However, there may be cases where you would want to re-request a feed immediately. Perhaps you have made a change to the RSS that you want people to know about before the normal cache expires. In this case, you can call InvalidateCache to remove one or all items. If you use the overload that takes a key (the URL of the item), only that item is removed from the cache. If you use the overload without parameters, all stored RSS feeds are removed from the cache.
GetRss | The method that actually retrieves and merges the requested RSS feeds. It returns a string containing the merged RSS feeds. Again, there are two overloads of this method. One (no parameters) returns the entire merged feed containing all the items from each feed. The other (integer parameter) returns the desired count of items. Which of these two you use depends on the overall purpose of the feed. If all you want to do is display all changes, then use the overload without parameters. However, as RSS feeds are typically small (10-15 items commonly), you may want to use the version that limits the overall size to a smaller number, keeping in mind that if you have a number of frequently changing RSS feeds, some items may be lost.

Now that you have seen the plan for the RssClient, it's time to start putting some code into these methods, and to write the private methods that will be used to actually perform the work.

RssClient Implementation

The bulk of the implementation of RssClient is composed of the property handlers and of the code required to download and merge the feeds. The property handlers themselves are quite simple, and are composed of private variables, and public property procedures to expose them. The only one that is slightly different is the RssFiles property. As this is a collection, it is created as a Read-Only property. Note that this does not mean that you cannot change the items stored in the collection, but only that the collection itself is fixed. You cannot create another StringCollection and assign it to this property. You must use the methods of RssFiles to add items to the collection.

Private _rssFiles As New StringCollection
Private _count As Integer

Public ReadOnly Property RssFiles() As StringCollection
    Get
        Return _rssFiles
    End Get
End Property

Public Property Count() As Integer
    Get
        Return _count
    End Get
    Set(ByVal Value As Integer)
        _count = Value
    End Set
End Property

Setting Up Defaults in the Constructors

The constructors of RssClient are used to set the defaults for the properties. This ensures that the properties have at least some values. The client developer can change these values after creating an RssClient object.

    Public Sub New()
        Me.New("Merged RSS feeds", _
          "http://msdn.microsoft.com/aboutmsdn/rss", _
          "Merger of multiple RSS feeds")
    End Sub

    Public Sub New(ByVal title As String, _
      ByVal link As String, _
      ByVal description As String)
        _title = title
        _link = link
        _description = description

        'also set defaults for caching, proxy
        _count = 15
        _cacheFile = "cache.file"
        _cacheDuration = 60
        _proxyName = ""
        _proxyPort = ""
        _context = HttpContext.Current

    End Sub

As you can see, the default (no parameter) constructor simply uses the other constructor to carry out its work. This ensures that the default properties are only set in one place in case they change. One other item worth noting is the call to HttpContext.Current. If the system is running under ASP.NET, this will return the current HttpContext of the request. The returned HttpContext can then be used to retrieve the intrinsics stored in the context, such as Request, Response, Trace, and Cache. If the system is not running within ASP.NET, such as a Windows Forms application, this property returns Nothing (null for C#). We can use this value to avoid attempting to access the non-existent cache.

Making a List

As described above, there are two methods for adding items to the list of RSS feeds that will be merged—the RssFiles collection and LoadOpmlFile. The RssFiles collection is shown above. It is a simple wrapper around the StringCollection class. LoadOpmlFile also adds to this collection. This ensures that both procedures can be used to extend the same list.

The file is opened and loaded into an XmlTextReader. The XmlTextReader was used as it is faster, and has a lower memory requirement than the XmlDocument. The XmlTextReader scans the file, looking for outline elements with an xmlUrl attribute. The contents of this attribute should be a URL pointing at an RSS feed. This URL is added to the RssFiles collection for later processing.

Public Sub LoadOpmlFile(ByVal path As String)
    'loads the OPML file
    ' adds all of the items in it to the RSSFiles collection

    Dim strm As FileStream
    Dim reader As XmlTextReader = Nothing

    Try
        strm = File.OpenRead(path)
        reader = New XmlTextReader(strm)
        Do While reader.Read
            If reader.Name = "outline" Then
                If reader.MoveToAttribute("xmlUrl") Then
                    Me.RssFiles.Add(reader.Value)
                End If
            End If
        Loop
    Finally
        'reader is still Nothing if the file could not be opened
        If Not reader Is Nothing Then
            reader.Close()
        End If
    End Try
End Sub

Being a Nice Network Citizen

Whenever you access data over a network, you should consider two techniques: asynchronous communication and caching. Asynchronous communication (using the BeginXXX/EndXXX methods) is useful when performing network access, as such calls can take a while. If the call is made synchronously, the user interface of the client application will freeze while the call is being made. Asynchronous communication makes the user interface more responsive. Similar apparent performance improvements can be obtained by caching data. This reduces the number of required network calls, making your application more responsive at the expense of the freshness of the data.

For the RssClient, I decided to only use caching, and leave adding asynchronous functionality for later (or as an exercise for the reader). Caching would provide two useful benefits:

  • It would improve the overall performance of the class because you would already have some RSS values in memory.
  • It would reduce the number of requests made to remote Web sites and the data transmitted over the wire.

The core routine that retrieves and merges the RSS is GetRss. There are two overloads of this function.

Public Function GetRss() As String
    'returns the generated RSS feed
    Dim size As Integer
    If Me.Count > 0 Then
        size = Me.Count
    Else
        size = Integer.MaxValue
    End If
    Return Me.GetRss(size)
End Function

Public Function GetRss(ByVal count As Integer) As String
    'returns the requested number of entries from the feed.
    Dim result As String = String.Empty
    Dim rssData As String
    Dim dependency As System.Web.Caching.CacheDependency

    'initialize the full list
    _fullList = New SortedList

    Me.Count = count

    For Each url As String In Me.RssFiles
        'check the cache first
        If Not _context Is Nothing Then
            rssData = _context.Cache.Item(url)
        Else
            rssData = Nothing
        End If

        If rssData Is Nothing Then
            'we need to download it first
            rssData = Me.DownloadFeed(url)

            'cache, and add the dependency
            If Not _context Is Nothing Then
                dependency = New _
                  System.Web.Caching.CacheDependency( _
                  _context.Server.MapPath(Me.CacheFile))
                _context.Cache.Insert(url, _
                  rssData, dependency, _
                  DateTime.Now.AddMinutes(Me.CacheDuration), _
                  TimeSpan.Zero)
            End If
        End If
        'merge it into the main list
        MergeRss(rssData)
    Next
    result = WriteMergedRss()
    Return result
End Function

The bulk of the processing is in the GetRss(Integer) version of this method. The version without parameters simply leverages the version that takes a count, passing in the Count property if it has been set or Int32.MaxValue (about 2 billion) otherwise, assuming that you won't be creating a feed with more than this many entries.

The GetRss method begins by looping through the requested URLs listed in RssFiles. For each URL, we first attempt to retrieve the value from the ASP.NET cache. When the feed is already cached, this avoids a network request and improves performance greatly. If it is not in the cache, or if we are not running under ASP.NET, we must download the contents of the feed. I'll look at how this is done shortly. After the RSS is downloaded, if the ASP.NET cache is available, the value is stored for later retrieval. In addition to simply adding to the cache, a time limit is set based on the CacheDuration. Once this time limit passes, the item is removed from the cache. Alternately, in case you need to expire it immediately, a CacheDependency on the CacheFile is added. If the CacheFile changes, all of the cached feeds expire immediately. Finally, you can programmatically expire items using the InvalidateCache methods.

Once each RSS feed is retrieved, its items are merged into the master list. Finally, the merged RSS feed is written out and returned to the calling application.
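
The check-the-cache-then-download pattern does not depend on ASP.NET. As a rough illustration, here is a Python sketch (not the article's Visual Basic .NET code) of a time-limited cache with an invalidate method. FeedCache, get_feed, and the injectable clock are hypothetical names introduced for this example, and the file-dependency feature of the ASP.NET cache is not modeled.

```python
import time

class FeedCache:
    """Keeps downloaded feeds for a fixed number of minutes."""

    def __init__(self, duration_minutes=60, clock=time.monotonic):
        self._duration = duration_minutes * 60  # seconds
        self._clock = clock
        self._entries = {}  # url -> (expires_at, data)

    def get(self, url):
        entry = self._entries.get(url)
        if entry is None:
            return None
        expires_at, data = entry
        if self._clock() >= expires_at:
            del self._entries[url]  # expired, force a fresh download
            return None
        return data

    def put(self, url, data):
        self._entries[url] = (self._clock() + self._duration, data)

    def invalidate(self, url=None):
        """Remove one feed, or every feed when no URL is given."""
        if url is None:
            self._entries.clear()
        else:
            self._entries.pop(url, None)

def get_feed(url, cache, download):
    """Return the cached feed, downloading and caching it on a miss."""
    data = cache.get(url)
    if data is None:
        data = download(url)
        cache.put(url, data)
    return data
```

The download argument is any function that takes a URL and returns the feed text, which keeps the caching logic separate from the network code.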

The DownloadFeed method retrieves the content for each RSS feed, and returns it as a string for processing.

Private Function DownloadFeed(ByVal url As String) As String
    Dim result As String = String.Empty
    Dim client As WebRequest
    Dim reader As StreamReader
    Dim proxy As WebProxy

    Try
        'create the request first so that the proxy can be assigned to it
        client = WebRequest.Create(url)

        Me.ProxyName = ConfigurationSettings.AppSettings.Item("proxyName")
        If Me.ProxyName <> String.Empty Then
            Me.ProxyPort = _
              ConfigurationSettings.AppSettings.Item("proxyPort")
            client.Proxy = New WebProxy(Me.ProxyName, CInt(Me.ProxyPort))
        End If

        reader = New StreamReader(client.GetResponse.GetResponseStream)
        result = reader.ReadToEnd
    Finally
        'reader is still Nothing if the request failed
        If Not reader Is Nothing Then
            reader.Close()
        End If
    End Try

    Return result
End Function

The DownloadFeed routine is a fairly basic use of the WebRequest family of classes. By using WebRequest, rather than HttpWebRequest, we don't tie the application into using HTTP, and this prepares the application for new forms of WebRequests going forward. In addition, this enables the URL to point at either a resource on the Internet (using HttpWebRequest) or on the local machine (using FileWebRequest).
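
The same scheme-independent style of download exists in other platforms. As a point of comparison, this Python sketch (not the article's code) uses urllib.request.urlopen, which accepts both http and file URLs. The function name download_feed, the 30-second timeout, and the assumption that the feed is UTF-8 encoded are all choices made for the example.

```python
import urllib.request

def download_feed(url, timeout=30):
    """Fetch a feed from an http(s) or file URL and return it as text.

    Assumes the feed is UTF-8 encoded; a fuller version would honor
    the encoding declared by the server or in the XML declaration.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A local test file can then stand in for a live feed by passing a file URL, such as one produced by pathlib.Path.as_uri().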

Getting the Merge On

Once each feed has been retrieved, we're ready to merge the items from the feed into our master list. Recall that we will be using a SortedList collection to store the items. This enables the easy sorting of the items by date. We can then walk the list in reverse order to create our feed.

Private Sub MergeRss(ByVal data As String)
    'merges the submitted RSS data into our sorted full list
    Dim nodes As XmlNodeList
    Dim doc As New XmlDocument
    Dim key As String
    Dim workDate As DateTime

    If data <> String.Empty Then
        doc.LoadXml(data)
        nodes = doc.SelectNodes("rss/channel/item")
        For Each node As XmlElement In nodes
            node = NormalizePublishDate(doc, node)
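            workDate = DateTime.Parse( _
              node.GetElementsByTagName("pubDate").Item(0).InnerText)
            'add to list, it will sort by the date
            key = String.Format("{0}_{1}", _
                workDate.ToString("u"), _
                node.ChildNodes(0).InnerText)
            'SortedList.Add throws on a duplicate key, so skip an item
            'that has the same date and title as one already merged
            If Not _fullList.ContainsKey(key) Then
                _fullList.Add(key, node)
            End If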
        Next
    End If
End Sub

I would normally use an XmlReader at this stage to process each block of XML. However, as I want to do more work on each item (to normalize the date used), I will use the XmlDocument. This allows the application to store an entire item node at a time, and makes changing and/or adding pubDate elements easy. The complete list of item elements is retrieved using SelectNodes. We then ensure that there is a valid pubDate element in each, holding the date format we're expecting. Then the complete item node, with all child nodes, is added to the SortedList. Initially, I was using the sortable format of the pubDate as the key in the list (using the "u" format string). However, I quickly found out that this wouldn't be sufficient. Some RSS feeds use a single date and time for a number of posts. Alternately, it is certainly possible that two feeds will have items that have been submitted at the same time. So, to solve this problem, I appended the text of the first child node (generally title) to the date to make the key unique.
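
The effect of the combined key is easy to see in isolation. The sketch below is Python rather than the article's Visual Basic .NET, and uses a plain dictionary plus sorted() in place of SortedList; the sort_key helper and the sample titles are invented for illustration. Two posts share a timestamp, yet both survive because the title is part of the key.

```python
from datetime import datetime, timezone

def sort_key(pub_date, title):
    """Build a key that sorts by date first, then by title."""
    # A zero-padded UTC timestamp sorts correctly as a plain string.
    stamp = pub_date.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%SZ")
    return f"{stamp}_{title}"

items = {}
same_time = datetime(2004, 11, 24, 16, 3, 24, tzinfo=timezone.utc)
for title in ("Post B", "Post A"):
    items[sort_key(same_time, title)] = title
later = datetime(2004, 11, 25, 9, 0, 0, tzinfo=timezone.utc)
items[sort_key(later, "Newer post")] = "Newer post"

# Walk the keys in reverse order to list the most recent item first.
newest_first = [items[key] for key in sorted(items, reverse=True)]
print(newest_first)
```

Items that share both a timestamp and a title would still collide, so the key is only as unique as that combination.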

I Hate Dates

One of the issues that will likely arise when dealing with RSS files from multiple sources is in the date formatting. If all of your feeds come from the same source, or if you are creating the feeds yourself, you can probably skip the upcoming rant.

So far, I have seen three major formats for the dates in various RSS 2.0 feeds (don't even get me started on RSS 0.9x and 1.0 feeds).

  • pubDate based on RFC 822. This is the original version of pubDate as defined in the RSS 2.0 specification. It enables the user to define the date on the basis of GMT (Greenwich Mean Time), or a local time zone (in the U.S.). If the local time zone version is used, DateTime.Parse generates an exception. An example of RFC 822 time is Sat, 07 Sep 2002 00:00:01 GMT or Thu, 11 Nov 2004 14:11:47 PST.
  • pubDate based on RFC 1123. This RFC updated the date format of RFC 822 (for example, by calling for four-digit years); it presents a more standardized version of the date format. The .NET Framework supports this format, using the "R" format string. It always formats the date based on GMT. For example, Thu, 11 Nov 2004 14:00:29 GMT.
  • Date based on the Dublin Core format. This is the same date proposed in RSS 1.0 and the Resource Description Framework (RDF), and as defined by ISO 8601. It may be based on GMT (if it ends with the 'Z' character), or it may be based on the local time zone. The .NET Framework produces a closely related sortable format with the "u" format string (which uses a space rather than the 'T' separator). This is a very sortable time format, a feature we'll take advantage of to help sort our newly merged feed. An example of the date is 2004-11-24T16:03:24Z or 1994-11-05T08:15:30-05:00.

As you can see, we have a smallish problem: we need to determine which of the three formats is being used by each feed. Ideally, the merged feed should standardize these different formats, and I have arbitrarily decided on the RFC 1123 format. This is mostly because it is compatible with RFC 822, but easier to write due to the framework support. I started to work on this functionality, then I realized that there must be something out there already written. Sure enough, as part of the RSSBandit desktop aggregator, there is a DateTimeExt class that does just this task. I decided to incorporate it into the project. I would have just added a reference to the compiled DLL, but it had a few other dependencies I didn't want to bring along. The project includes this class, with some minor changes to remove some of the logging and other functionality that depended on other code in RSSBandit (and also translated into Visual Basic .NET for that version of the aggregator).

The dates themselves are normalized—that is, converted into a standard format based on the pubDate and RFC 1123—in the NormalizePublishDate routine.

    Private Function NormalizePublishDate(ByVal doc As XmlDocument, _
      ByRef itemNode As XmlElement) As XmlElement
        Dim workDate As DateTime
        Dim work As String
        Dim workNode As XmlElement
        'date may be in one of three main formats (that I've seen so far)
        '  this routine attempts to convert them all into a single
        '   pubDate based on RFC1123
        '   see the article for more complaints, 
        '   see RssBandit (http://rssbandit.sourceforge.net)
        '   for a more full featured implementation
        If itemNode.GetElementsByTagName("pubDate").Count > 0 Then
            workNode = itemNode.GetElementsByTagName("pubDate").Item(0)
            work = workNode.InnerText
        Else
            'we may have Dublin Core date
            workNode = itemNode.GetElementsByTagName("date", _
              "http://purl.org/dc/elements/1.1/").Item(0)
            work = workNode.InnerText()
            'we should also create a pubDate for this element 
            'for use later
            Dim newNode As XmlElement = doc.CreateElement("pubDate")
            workNode = itemNode.AppendChild(newNode)
        End If
        'use the RssComponent.DateTimeExt from RssBandit to parse
        workDate = RssComponents.DateTimeExt.Parse(work)
        workNode.InnerText = workDate.ToString("R")

        Return itemNode
    End Function

The code first assumes that a pubDate is used, as this is the most common form of a publish date in RSS feeds. Failing this, the next most common form, the Dublin Core date, is extracted, and an empty pubDate element is added to hold the result. The DateTimeExt.Parse routine (from RSSBandit) can parse any of these three date formats, and convert to a normal DateTime object. It is then a simple matter of using the standard "R" date format to convert to an RFC 1123 compliant date. This ensures that all dates are consistent (and easier to parse with .NET in the future) in the new feed.
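
For readers who want to experiment with the three formats outside of .NET, the following Python sketch (not the article's code, and not the RSSBandit DateTimeExt class) performs a similar normalization with the standard library. It tries the RFC 822/1123 parser first, falls back to ISO 8601, and assumes that a date with no time zone is already in GMT; normalize_date is a made-up name.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def normalize_date(text):
    """Convert an RFC 822, RFC 1123, or ISO 8601 date to RFC 1123 (GMT)."""
    text = text.strip()
    try:
        # Handles "Thu, 11 Nov 2004 14:11:47 PST" style dates.
        parsed = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Handles "2004-11-24T16:03:24Z" style dates; older Python
        # versions need the trailing Z spelled as an explicit offset.
        parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: dates without a time zone are treated as GMT.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return format_datetime(parsed.astimezone(timezone.utc), usegmt=True)

for sample in ("Thu, 11 Nov 2004 14:11:47 PST",
               "Thu, 11 Nov 2004 14:00:29 GMT",
               "2004-11-24T16:03:24Z",
               "1994-11-05T08:15:30-05:00"):
    print(normalize_date(sample))
```

Each sample comes out in the same GMT-based form, so the merged feed carries a single date format regardless of where an item originated.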

Out with the New

Finally, we're ready to write out the newly merged RSS feed. Recall that the items are already sorted by date. Therefore, by stepping through the list backwards, the most recent item will be the first item, and we can simply write out the correct number of items.

Private Function WriteMergedRss() As String
    Dim mem As New MemoryStream
    Dim writer As New XmlTextWriter(mem, Encoding.UTF8)
    Dim reader As StreamReader
    Dim result As String = String.Empty
    Dim item As XmlElement

    Try
        With writer
            .WriteStartDocument()
            .WriteStartElement("rss")
            .WriteAttributeString("version", "2.0")
            .WriteStartElement("channel")
            .WriteElementString("title", Me.Title)
            .WriteElementString("link", Me.Link)
            .WriteElementString("description", Me.Description)

            'write out items 
            ' in the inverse of the order in the SortedList
            ' See -- I told you there was a reason for the Sortedlist
            ' note that I'm only writing a subset of 
            ' possible RSS items here
            ' Math.Max keeps the index in range when the list
            ' holds fewer than Me.Count items
            For i As Integer = _fullList.Count To _
              Math.Max(1, _fullList.Count - Me.Count + 1) Step -1
                item = _fullList.GetByIndex(i - 1)

                .WriteStartElement("item")
                .WriteStartElement("title")
                .WriteString( _
                   item.GetElementsByTagName("title").Item(0).InnerText)
                .WriteEndElement() 'title
                .WriteStartElement("link")
                .WriteString( _
                   item.GetElementsByTagName("link").Item(0).InnerText)
                .WriteEndElement() 'link
                .WriteStartElement("description")
                .WriteString( _
               item.GetElementsByTagName("description").Item(0).InnerXml)
                .WriteEndElement() 'description
                .WriteStartElement("pubDate")
                .WriteString( _
                   item.GetElementsByTagName("pubDate").Item(0).InnerText)
                .WriteEndElement() 'pubDate

                .WriteEndElement() 'item
            Next

            .WriteEndElement() 'channel
            .WriteEndDocument()
            .Flush()
        End With

        Try
            'move memory stream back to beginning and read
            mem.Position = 0
            reader = New StreamReader(mem)
            result = reader.ReadToEnd
        Finally
            reader.Close()
        End Try
    Finally
        writer.Close()
    End Try

    Return result
End Function
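
The backwards walk over the SortedList can be seen in isolation in the following minimal sketch. It assumes DateTime keys and plain string values in place of the XmlElement items, and the helper name TakeNewest is hypothetical. It is meant only to show the indexing, including what happens when fewer items are available than were requested.

```vb
Imports System
Imports System.Collections

Module NewestFirstSketch
    ' Hypothetical helper: return up to maxItems values, newest key first.
    Function TakeNewest(ByVal sorted As SortedList, _
                        ByVal maxItems As Integer) As ArrayList
        Dim result As New ArrayList
        ' SortedList keeps keys ascending, so the last index is the newest.
        ' Math.Max stops at index 0 when fewer than maxItems entries exist.
        Dim lowest As Integer = Math.Max(0, sorted.Count - maxItems)
        For i As Integer = sorted.Count - 1 To lowest Step -1
            result.Add(sorted.GetByIndex(i))
        Next
        Return result
    End Function

    Sub Main()
        Dim items As New SortedList
        items.Add(New DateTime(2004, 12, 1), "oldest")
        items.Add(New DateTime(2004, 12, 3), "newest")
        items.Add(New DateTime(2004, 12, 2), "middle")

        ' Prints "newest" and then "middle"
        For Each title As String In TakeNewest(items, 2)
            Console.WriteLine(title)
        Next
    End Sub
End Module
```

Asking for more items than the list contains simply returns every item, newest first, rather than raising an out-of-range error.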

XmlWriter is used here to provide a fast, low-memory means of writing out the XML. You have two convenient choices when using XmlWriter for in-memory writing: you can point the writer at a StringWriter or (as I have here) at a MemoryStream. Using a StringWriter is certainly easier, but the resulting XML is encoded as UTF-16, and we want it to be encoded as UTF-8. Therefore, the code writes to a MemoryStream, and a StreamReader is used to retrieve the content. When I did this the first time, I couldn't seem to get any data returned. This is because, after all the data has been written to the MemoryStream, the current position is at the end of the stream, so reads return no data. So, whenever you write to a Stream for later reading, always remember to set the Position back to the beginning of the Stream before you read.
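
The write, rewind, and read sequence can be reduced to a few lines. The sketch below is a minimal stand-alone version under stated assumptions: the module name, the function name, and the single title element are all hypothetical, and it is not the RssClient code itself.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module Utf8WriterSketch
    ' Hypothetical helper: build a tiny document in memory as UTF-8.
    Function BuildXml() As String
        Dim mem As New MemoryStream
        Dim writer As New XmlTextWriter(mem, Encoding.UTF8)
        Try
            writer.WriteStartDocument()
            writer.WriteElementString("title", "Merged feed")
            writer.WriteEndDocument()
            writer.Flush()

            ' After writing, Position is at the end; rewind before reading
            mem.Position = 0
            Dim reader As New StreamReader(mem, Encoding.UTF8)
            Return reader.ReadToEnd()
        Finally
            ' Closing the writer also closes the underlying stream
            writer.Close()
        End Try
    End Function

    Sub Main()
        Console.WriteLine(BuildXml())
    End Sub
End Module
```

The returned string should begin with an XML declaration that names utf-8 as the encoding. If the line that resets Position is removed, ReadToEnd returns an empty string, which is the symptom described above.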

Testing RssClient

The downloads for this article include two simple sample projects—one for Windows Forms and one for ASP.NET. The samples load a test OPML file and add another common RSS feed, then download and display the merged feed.

Figure 2. Windows Test Application

Figure 3. ASP.NET Test Application

Conclusion

While not everyone will need to merge multiple RSS feeds for an application, it is a handy tool to keep around. Once we had it available on MSDN, we began to find more and more uses for it. We used it first to create feeds of team members, and then to create topic RSS feeds based on sources such as recent articles, important team member and/or non-Microsoft blogs, upcoming relevant webcasts, and more. Giving the merged feeds a common scope makes it easier for people to find the information they need in one place.

Related Resources

Kent Sharkey is the Content Strategist for ASP.NET and Visual Studio content on MSDN. He is thinking of changing his name to "Really Simple" Sharkey to get some appropriate initials for the future. When he's not reading your e-mails, he tries to write and sleep (just not at the same time).
