E Pluriblog Unum: Merging RSS Feeds
Kent Sharkey
MSDN Content Strategist
December 2004
Summary: Kent Sharkey describes one technique for merging multiple RSS feeds into a new RSS feed, which enables merging common data into a single source to make it easier for users to find the information they want. (20 printed pages)
C# code sample
Download the MSDNMergeRssCS.msi file.
Visual Basic code sample
Download the MSDNMergeRssVB.msi file.
Contents
Introduction
What Is RSS?
RssClient Design
RssClient Implementation
Conclusion
Related Resources
Introduction
RSS (Really Simple Syndication) has become a popular format not only for syndicating weblogs (or blogs as they are more commonly known), but as a generic XML format for tracking updates to Web sites or applications as well. However, what if you have a number of feeds that you want to aggregate or merge? Perhaps the individual feeds themselves don't update frequently enough, or maybe you want to gather together a number of blogs into one place for convenience. This article shows one method of merging multiple RSS feeds to create a new feed. On MSDN, we use a similar technique to create team blogs that display entries from the various people that make up each product team.
What Is RSS?
RSS is an XML grammar used to syndicate (that is, distribute) content. It is a surprisingly simple format, and like other simple formats before it (such as HTML or SOAP), it has grown to find a number of uses. Many people think of RSS only as the distribution format for blogs. However, it is a much more generic tool. RSS can be used as a means of providing a list of changes whenever you have changing data. For example, a news service can expose its headlines as RSS. Clients of that RSS, whether they are desktop RSS aggregators (such as RSSBandit or SharpReader), online RSS aggregators (such as NewsGator or Bloglines), or other Web sites, can then interpret and display this data.
An RSS file is actually quite simple, and creating one by hand isn't that difficult. Most of the elements are optional, so you can simply use the ones you're most interested in. In addition, RSS 2.0 supports namespaces, enabling the creation of extensions to RSS. Because of the looseness of the RSS 2.0 specification and the addition of extensions, as well as the previous versions (0.9x and 1.0) of RSS, there can be quite a bit of variety in the actual information in an RSS feed. This complicates the parsing of RSS because you can't leverage a single XML Schema for validation. Figure 1 is a portion of the RSS feed from http://blogs.msdn.com, a site used by many Microsoft bloggers. You can see the standard RSS elements in this feed, as well as a number of extensions.
Figure 1. RSS feed with extensions
The RSS 2.0 format is composed of a mandatory root element (rss) with a version attribute. It has a single child element (channel) that describes the content in the RSS feed and contains one or more item elements. Each item element contains the actual data in the feed in a series of optional sub-elements. Some of the more common elements used in channel and item are described in the following two tables. For more details, and for the complete list, see the RSS specification listed in the Related Resources section at the end of this article.
Table 1. Sub-elements of the channel element
| Element | Description |
|---|---|
| title | Required element that gives an RSS feed a unique name. |
| link | Required element that gives a URL that provides additional information about the RSS feed. Generally the Web site where the RSS feed originates. |
| description | Required element that provides a description of the purpose of the feed or its content. |
| language | Optional element identifying the language of the RSS feed. This element should use the standard two part (Language-culture) names for the language, as used by the System.Globalization namespace, such as en-US for United States English, or fr-CA for French Canadian. |
| lastBuildDate | Optional element identifying the last time the feed was changed. This is in RFC 822 format. |
| pubDate | Optional element identifying the last time the feed was generated. This is in RFC 822 format. |
| ttl | Optional element that describes the number of minutes between updates. |
| generator | Optional element identifying the application used to generate the feed. This may be helpful when identifying the slight differences between feeds generated by the same source. |
Table 2. Sub-elements of the item element
| Element | Description |
|---|---|
| title | The title of this item. For example, the headline for a news item or title of a blog post. |
| link | The URL that points to the item in question. Different blogs use this element in one of two subtly different interpretations. Some use it to point at the news or blog item itself on the hosting Web site. Others use it to point at the subject that the item discusses. For example, you could have an item on a news site discussing the latest decision by the U.S. President. Using the first format, the link would point at the news story about it on the news site. Using the second model, the description of the item would contain the news story, but the link would point at the White House press release (basically the original source) about the decision. It's the little differences like this that make processing RSS so much fun. |
| description | The item in question. Again, this can be used in two compatible ways. Many RSS generators put the entire item in the description. That is, the description would include the entire news story. Others put only the abstract in the description element, and put the full item either externally, using the enclosure element, or internally using a body element from an added XHTML namespace. |
| pubDate | Date and time of the posting. Different blogs use this value in slightly different ways. Other blogs don't use this value at all. See the I Hate Dates section below for details. |
| author | The actual author of the posting. This is often used by blogs that are maintained by multiple people, so that each posting can record who wrote it. |
| category | Each item element can contain zero or more category sub-elements. This enables the categorization of the feed. There is no standardized list, however, and each blog creator can make up their own, so it's up to the people reading the blog to determine that one person's "ASP.NET" is another person's "Web Development (.NET)". |
| guid | While many people make the mistake of thinking of this value as a GUID (Globally Unique Identifier), it is only the same concept. The idea of the guid element in RSS is that each posting has some unique value. It may be the URL to the posting, it may be an actual GUID, or it may be something else. |
| enclosure | While part of the initial RSS 2.0 specification, the enclosure element has only recently begun to grow in popularity. The enclosure element is a URL to some other resource, such as a document, media file, or other large file. The reader then has the option of downloading this content when desired. |
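To make the structure described in the two tables concrete, here is a minimal sketch of a hand-written feed and a few lines that read it. The feed text, the example.com URL, and the MinimalRssSketch and CountItems names are placeholders invented for this illustration; they are not part of RssClient or the downloads.

```vbnet
Imports System
Imports System.Xml

Module MinimalRssSketch
    ' A hand-written feed using only a few of the elements from the tables above.
    ' The URL and the text are placeholders.
    Public Const SampleFeed As String = _
        "<rss version=""2.0""><channel>" & _
        "<title>Sample feed</title>" & _
        "<link>http://example.com/</link>" & _
        "<description>A two-item example</description>" & _
        "<item><title>First post</title>" & _
        "<pubDate>Thu, 11 Nov 2004 14:00:29 GMT</pubDate></item>" & _
        "<item><title>Second post</title>" & _
        "<pubDate>Wed, 24 Nov 2004 16:03:24 GMT</pubDate></item>" & _
        "</channel></rss>"

    ' Counts the item elements found under rss/channel.
    Public Function CountItems(ByVal rss As String) As Integer
        Dim doc As New XmlDocument
        doc.LoadXml(rss)
        Return doc.SelectNodes("rss/channel/item").Count
    End Function

    Sub Main()
        Console.WriteLine(CountItems(SampleFeed))
    End Sub
End Module
```

Note that both item elements sit inside channel, and that each one carries only the sub-elements its author chose to include.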
Aggregating multiple feeds is a process of downloading each feed and putting the items into the correct order based on pubDate (most recent first). Obviously, if you have the data yourself, it is easy to merge the content into an aggregated feed. This is why many online aggregators retrieve and store the content in a database. This also enables searching, arbitrary feeds, and so on. However, I wanted to create a simple aggregator, without the additional dependency of a database or other storage requirements. Thus was born RssClient.
RssClient Design
RssClient is a .NET class that provides the ability to download one or more RSS feeds and merge them into a new RSS 2.0 feed. The intent was to build this class so that it could be used in either ASP.NET or Windows Forms applications. That is, it should be able to use ASP.NET caching when available, but quietly deal with the lack of a cache. In addition, I wanted it to be easy enough that other developers could use the class without having to read a lot of documentation.
I often tend to design classes such as this one backwards. That is, I write the code I'd like the developer to write that would use the class. Then I use this to build the API. In the case of RssClient, I wanted to write code similar to the following.
Dim client As New RssClient("URL to RSS")
Dim rss As String = client.GetRss()
However, this did not solve my need to merge multiple feeds into one. I briefly toyed with the idea of using a ParamArray or array parameter, but discounted these ideas because I thought they were ugly, too complex, or didn't feel like they were similar to the other .NET API calls. Instead, I decided to allow users one of two methods for adding multiple feeds, using a StringCollection or assigning the feed list using OPML (Outline Processor Markup Language).
OPML is another XML grammar. It was originally designed as the format for an outline editor. However, it is commonly used for blogrolls, which are lists of blogs that someone finds interesting. OPML is also a common import/export format used by most RSS aggregators. Therefore, it seemed like a convenient format to use to load a number of RSS feeds. When OPML is used for this purpose, it is quite simple, consisting of a number of outline elements. Each outline element has a title, an xmlUrl, and an htmlUrl attribute. Here is a simple OPML file containing a number of ASP.NET feeds.
<?xml version="1.0" encoding="utf-8"?>
<opml>
<body>
<outline title="ASP.NET Headlines"
xmlUrl="http://www.asp.net/modules/articleRss.aspx?count=7&amp;mid=64"
htmlUrl="http://www.asp.net" />
<outline title="ASP.NET Developer Center on MSDN"
xmlUrl="http://msdn.microsoft.com/asp.net/rss.xml"
htmlUrl="http://msdn.microsoft.com/asp.net" />
<outline title="Scott Guthrie's Weblog"
xmlUrl="http://weblogs.asp.net/scottgu/rss.aspx"
htmlUrl="http://weblogs.asp.net/scottgu" />
<outline title="ASP.NET articles on 4GuysFromRolla "
xmlUrl="http://aspnet.4guysfromrolla.com/rss/rss.aspx"
htmlUrl="http://aspnet.4guysfromrolla.com/" />
<outline title="ASP.NET items from KBAlertz"
xmlUrl="http://www.kbalertz.com/rss/aspnet.xml"
htmlUrl="http://www.kbalertz.com/technology_20.aspx" />
<outline title="Brian Goldfarb's Weblog"
xmlUrl="http://weblogs.asp.net/bgold/rss.aspx"
htmlUrl="http://weblogs.asp.net/bgold" />
<outline title="Shanku Niyogi's WebLog"
xmlUrl="http://weblogs.asp.net/shankun/Rss.aspx"
htmlUrl="http://weblogs.asp.net/shankun" />
<outline title="ASP.NET articles on ASPAlliance"
xmlUrl="http://aspalliance.com/rss.aspx"
htmlUrl="http://aspalliance.com/ArticleListing.aspx?cId=1" />
</body>
</opml>
My sample implementation code then became some combination of using the collection, or adding through an OPML file.
Dim client As New RssClient
client.RssFiles.Add("URL1")
client.RssFiles.Add("URL2")
client.LoadOpmlFile("path to an OPML file")
Dim rss As String = client.GetRss()
The other aspects of the class grew out of this and other experimentation. Table 3 describes the properties of the class, while Table 4 shows the public methods.
Table 3. Public Properties of RssClient
| Property | Type | Description |
|---|---|---|
| RssFiles | StringCollection | Read-only list of the URLs to the various RSS feeds to merge. This can either be used to add each RSS feed to the list, or the LoadOpmlFile method (see below) can add a number of items from a file. |
| Count | Integer | Number of items to include in the final RSS feed. Default is to retrieve the complete list of items from all the merged feeds. |
| Title | String | Title to use for the merged RSS feed. The default is set to Merged RSS feeds. |
| Link | String | URL to a site providing more information about the feed. The default is set to http://msdn.microsoft.com/aboutmsdn/rss, which is a page on MSDN discussing RSS. |
| Description | String | Text describing the purpose of the RSS feed. The default is set to Merger of multiple RSS feeds. |
| CacheFile | String | Relative path to a file that will be monitored for changes if used with ASP.NET. If this file is changed, the cache is invalidated, meaning that all feeds will be updated based on their current values. This is very useful if you need to get a rapid update and don't want to wait for the cached item to expire normally. The default value is a file named cache.file in the current directory. |
| CacheDuration | Integer | Number of minutes to hold each item in the cache if this class is used with ASP.NET. This reduces network traffic by not requesting the RSS feed more often than is necessary. While you could set this on the basis of the Time To Live (TTL) element in the RSS feed itself, it would also mean that you would only look for changes at this same rate. This could mean that a change would be ignored for a long period of time. It was due to these two considerations that 60 minutes was set as the default for this property. |
| ProxyName | String | The address or name of the proxy server that will be used if the RssClient is behind a firewall. |
| ProxyPort | String | String representation of the port number that would be used if the RssClient is behind a firewall. If ProxyName is set, but this property is not, it will default to port 80. |
Table 4. Public methods of the RssClient
| Method | Description |
|---|---|
| LoadOpmlFile | Loads one or more RSS feeds listed in an OPML (Outline Processor Markup Language) file. OPML is an XML grammar used to describe (among other things) a list of RSS feeds. These are commonly used by RSS aggregators as an interchange format, or as a blogroll. This method takes a string that represents a local, relative path to the OPML file. It then adds the listed RSS feeds to the RssFiles collection. A relative path was used to prevent possible security implications of someone downloading an external OPML file. |
| InvalidateCache | If the RssClient is used with ASP.NET, the individual RSS feeds will be cached to prevent requesting the feeds too often. However, there may be cases where you would want to re-request a feed immediately. Perhaps you have made a change to the RSS that you want people to know about before the normal cache expires. In this case, you can call InvalidateCache to remove one or all items. If you use the overload that takes a key (the URL of the item), only that item is removed from the cache. If you use the overload without parameters, all stored RSS feeds are removed from the cache. |
| GetRss | The method that actually retrieves and merges the requested RSS feeds. It returns a string containing the merged RSS feeds. Again, there are two overloads of this method. One (no parameters) returns the entire merged feed containing all the items from each feed. The other (integer parameter) returns the desired count of items. Which of these two you use depends on the overall purpose of the feed. If all you want to do is display all changes, then use the no-parameter overload. However, as RSS feeds are typically small (10-15 items commonly), you may want to use the version that limits the overall size to a smaller number, keeping in mind that if you have a number of frequently changing RSS feeds, some items may be lost. |
Now that you have seen the plan for the RssClient, it's time to start putting some code into these methods, and to write the private methods that will be used to actually perform the work.
RssClient Implementation
The bulk of the implementation of RssClient is composed of the property handlers and of the code required to download and merge the feeds. The property handlers themselves are quite simple, and are composed of private variables, and public property procedures to expose them. The only one that is slightly different is the RssFiles property. As this is a collection, it is created as a Read-Only property. Note that this does not mean that you cannot change the items stored in the collection, but only that the collection itself is fixed. You cannot create another StringCollection and assign it to this property. You must use the methods of RssFiles to add items to the collection.
Private _rssFiles As New StringCollection
Private _count As Integer
Public ReadOnly Property RssFiles() As StringCollection
Get
Return _rssFiles
End Get
End Property
Public Property Count() As Integer
Get
Return _count
End Get
Set(ByVal Value As Integer)
_count = Value
End Set
End Property
Setting Up Defaults in the Constructors
The constructors of RssClient are used to set the defaults for the properties. This ensures that the properties have at least some values. The client developer can change these values after creating an RssClient object.
Public Sub New()
Me.New("Merged RSS feeds", _
"http://msdn.microsoft.com/aboutmsdn/rss", _
"Merger of multiple RSS feeds")
End Sub
Public Sub New(ByVal title As String, _
ByVal link As String, _
ByVal description As String)
_title = title
_link = link
_description = description
'also set defaults for caching, proxy
_count = 15
_cacheFile = "cache.file"
_cacheDuration = 60
_proxyName = ""
_proxyPort = ""
_context = HttpContext.Current
End Sub
As you can see, the default (no parameter) constructor simply uses the other constructor to carry out its work. This ensures that the default properties are only set in one place in case they change. One other item worth noting is the call to HttpContext.Current. If the system is running under ASP.NET, this will return the current HttpContext of the request. The returned HttpContext can then be used to retrieve the intrinsics stored in the context, such as Request, Response, Trace, and Cache. If the system is not running within ASP.NET, such as a Windows Forms application, this property returns Nothing (null for C#). We can use this value to avoid attempting to access the non-existent cache.
Making a List
As described above, there are two methods for adding items to the list of RSS feeds that will be merged—the RssFiles collection and LoadOpmlFile. The RssFiles collection is shown above. It is a simple wrapper around the StringCollection class. LoadOpmlFile also adds to this collection. This ensures that both procedures can be used to extend the same list.
The file is opened and loaded into an XmlTextReader. The XmlTextReader was used as it is faster, and has a lower memory requirement than the XmlDocument. The XmlTextReader scans the file, looking for outline elements with an xmlUrl attribute. The contents of this attribute should be a URL pointing at an RSS feed. This URL is added to the RssFiles collection for later processing.
Public Sub LoadOpmlFile(ByVal path As String)
    'loads the OPML file
    ' adds all of the items in it to the RSSFiles collection
    Dim strm As FileStream = Nothing
    Dim reader As XmlTextReader = Nothing
    Try
        strm = File.OpenRead(path)
        reader = New XmlTextReader(strm)
        Do While reader.Read
            If reader.Name = "outline" Then
                If reader.MoveToAttribute("xmlUrl") Then
                    Me.RssFiles.Add(reader.Value)
                End If
            End If
        Loop
    Finally
        'reader is still Nothing if the file could not be opened
        If Not reader Is Nothing Then
            reader.Close() 'also closes the underlying stream
        ElseIf Not strm Is Nothing Then
            strm.Close()
        End If
    End Try
End Sub
Being a Nice Network Citizen
Whenever you access data over a network, you should consider performing two things—asynchronous communication and caching. Asynchronous communication (using the BeginXXX/EndXXX methods) is useful when performing network access as such calls can take a while. If the call is made synchronously, the user interface of the client application will freeze while the call is being made. Asynchronous communication makes the user interface more responsive. Similar apparent performance improvements can be obtained by caching data. This reduces the number of required network calls, making your application more responsive at the expense of the freshness of the data.
For the RssClient, I decided to only use caching, and leave adding asynchronous functionality for later (or as an exercise for the reader). Caching would provide two useful benefits:
- It would improve the overall performance of the class because you would already have some RSS values in memory.
- It would reduce the number of requests made to remote Web sites and the data transmitted over the wire.
The core routine that retrieves and merges the RSS is GetRss. There are two overloads of this function.
Public Function GetRss() As String
'returns the generated RSS feed
Dim size As Integer
If Me.Count > 0 Then
size = Me.Count
Else
size = Integer.MaxValue
End If
Return Me.GetRss(size)
End Function
Public Function GetRss(ByVal count As Integer) As String
'returns the requested number of entries from the feed.
Dim result As String = String.Empty
Dim rssData As String
Dim dependency As System.Web.Caching.CacheDependency
'initialize the full list
_fullList = New SortedList
Me.Count = count
For Each url As String In Me.RssFiles
'check the cache first
If Not _context Is Nothing Then
rssData = _context.Cache.Item(url)
Else
rssData = Nothing
End If
If rssData Is Nothing Then
'we need to download it first
rssData = Me.DownloadFeed(url)
'cache, and add the dependency
If Not _context Is Nothing Then
dependency = New _
System.Web.Caching.CacheDependency( _
_context.Server.MapPath(Me.CacheFile))
_context.Cache.Insert(url, _
rssData, dependency, _
DateTime.Now.AddMinutes(Me.CacheDuration), _
TimeSpan.Zero)
End If
End If
'merge it into the main list
MergeRss(rssData)
Next
result = WriteMergedRss()
Return result
End Function
The bulk of the processing is in the GetRss(Integer) version of this method. The version without parameters simply calls the version that takes a count, passing in the Count property if one has been set and Int32.MaxValue (about 2 billion) otherwise, on the assumption that you won't be creating a feed with more than this many entries.
The GetRss method begins by looping through the requested URLs listed in RssFiles. For each URL, we first attempt to retrieve the value from the ASP.NET cache. This greatly improves performance when the feed has already been downloaded. If it is not in the cache, or if we are not running under ASP.NET, we must download the contents of the feed. I'll look at how this is done shortly. After the RSS is downloaded, if the ASP.NET cache is available, the value is stored for later retrieval. In addition to simply adding to the cache, a time limit is set based on the CacheDuration. Once this value expires, the item is removed from the cache. Alternatively, in case you need to expire it immediately, a CacheDependency on the CacheFile is added. If the CacheFile changes, all items in the cache will be expired immediately. Finally, you can programmatically expire items using the InvalidateCache methods.
Once each RSS feed is retrieved, it is merged into the master list of available feeds. Finally, the final, merged RSS feed is written out and returned to the calling application.
The DownloadFeed method retrieves the content for each RSS feed, and returns it as a string for processing.
Private Function DownloadFeed(ByVal url As String) As String
    Dim result As String = String.Empty
    Dim client As WebRequest
    Dim reader As StreamReader = Nothing
    Try
        'create the request first so that the proxy can be assigned to it
        client = WebRequest.Create(url)
        Me.ProxyName = ConfigurationSettings.AppSettings.Item("proxyName")
        If Me.ProxyName <> String.Empty Then
            Me.ProxyPort = _
                ConfigurationSettings.AppSettings.Item("proxyPort")
            'fall back to port 80 when no port is configured
            If Me.ProxyPort = String.Empty Then
                Me.ProxyPort = "80"
            End If
            client.Proxy = New WebProxy(Me.ProxyName, CInt(Me.ProxyPort))
        End If
        reader = New StreamReader(client.GetResponse.GetResponseStream)
        result = reader.ReadToEnd
    Finally
        'reader is still Nothing if the request failed
        If Not reader Is Nothing Then
            reader.Close()
        End If
    End Try
    Return result
End Function
The DownloadFeed routine is a fairly basic use of the WebRequest family of classes. By using WebRequest, rather than HttpWebRequest, we don't tie the application into using HTTP, and this prepares the application for new forms of WebRequests going forward. In addition, this enables the URL to point at either a resource on the Internet (using HttpWebRequest) or on the local machine (using FileWebRequest).
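As a rough illustration of that last point, the following sketch asks WebRequest.Create for a request and reports which concrete class came back. The WebRequestKinds module, the RequestKind function, and both URLs are hypothetical names chosen for this example. Nothing is downloaded, because the request is only constructed.

```vbnet
Imports System
Imports System.Net

Module WebRequestKinds
    ' Returns the name of the concrete class WebRequest.Create chooses for a URL.
    ' No network or disk access happens here; the request is only constructed.
    Public Function RequestKind(ByVal url As String) As String
        Dim request As WebRequest = WebRequest.Create(url)
        Return request.GetType().Name
    End Function

    Sub Main()
        ' Both URLs are placeholders, not feeds used by the article.
        Console.WriteLine(RequestKind("http://example.com/rss.xml"))
        Console.WriteLine(RequestKind("file:///C:/feeds/local.xml"))
    End Sub
End Module
```

On the .NET Framework this should report HttpWebRequest for the first URL and FileWebRequest for the second, which is why DownloadFeed can be pointed at a local test file without any code changes.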
Getting the Merge On
Once each feed has been retrieved, we're ready to merge the items from the feed into our master list. Recall that we will be using a SortedList collection to store the items. This enables the easy sorting of the items by date. We can then walk the list in reverse order to create our feed.
Private Sub MergeRss(ByVal data As String)
'merges the submitted RSS data into our sorted full list
Dim nodes As XmlNodeList
Dim doc As New XmlDocument
Dim key As String
Dim workDate As DateTime
If data <> String.Empty Then
doc.LoadXml(data)
nodes = doc.SelectNodes("rss/channel/item")
For Each node As XmlElement In nodes
node = NormalizePublishDate(doc, node)
workDate = _
DateTime.Parse(node.GetElementsByTagName("pubDate").Item(0).InnerText)
'add to list, it will sort by the date
key = String.Format("{0}_{1}", _
workDate.ToString("u"), _
node.ChildNodes(0).InnerText)
_fullList.Add(key, node)
Next
End If
End Sub
I would normally use an XmlReader at this stage to process each block of XML. However, as I want to do more work on each item (to normalize the date used), I will use the XmlDocument. This allows the application to store an entire item node at a time, and makes changing and/or adding pubDate elements easy. The complete list of item elements is retrieved using SelectNodes. We then ensure that there is a valid pubDate element in each, holding the date format we're expecting. Then the complete item node, with all child nodes, is added to the SortedList. Initially, I was using the sortable format of the pubDate as a key in the list (using the "u" format string). However, I quickly found out that this wouldn't be sufficient. Some RSS feeds use a single date and time for a number of posts. Alternatively, it is certainly possible that two feeds will have items that have been submitted at the same time. So, to solve this problem, I appended the text of the first child node (generally title) to the date to make the key unique.
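The effect of the composite key is easier to see in isolation. This is a small sketch, not the RssClient code itself: MergeKeySketch and MakeKey are names made up for the example, and the stored values are single letters rather than XML nodes.

```vbnet
Imports System
Imports System.Collections

Module MergeKeySketch
    ' Builds the composite key described above: sortable date, underscore, title.
    Public Function MakeKey(ByVal published As DateTime, ByVal title As String) As String
        Return String.Format("{0}_{1}", published.ToString("u"), title)
    End Function

    Sub Main()
        Dim items As New SortedList
        Dim sameTime As New DateTime(2004, 11, 24, 16, 3, 24)

        ' Two posts share a timestamp; the title keeps the keys distinct.
        items.Add(MakeKey(sameTime, "Second post"), "B")
        items.Add(MakeKey(sameTime, "First post"), "A")
        items.Add(MakeKey(New DateTime(2004, 12, 1, 9, 0, 0), "Newest post"), "C")

        ' Walk the list backwards so the most recent item comes first.
        For i As Integer = items.Count - 1 To 0 Step -1
            Console.WriteLine("{0} -> {1}", items.GetKey(i), items.GetByIndex(i))
        Next
    End Sub
End Module
```

Using the date alone as the key would make the second Add call throw, because SortedList does not accept duplicate keys. The same would still happen with the composite key if two items shared both the timestamp and the title, for example if the same feed were listed twice.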
I Hate Dates
One of the issues that will likely arise when dealing with RSS files from multiple sources is in the date formatting. If all of your feeds come from the same source, or if you are creating the feeds yourself, you can probably skip the upcoming rant.
So far, I have seen three major formats for the dates in various RSS 2.0 feeds (don't even get me started on RSS 0.9x and 1.0 feeds).
- pubDate based on RFC 822. This is the original version of pubDate as defined in the RSS 2.0 specification. It enables the user to define the date on the basis of GMT (Greenwich Mean Time), or a local time zone (in the U.S.). If the local time zone version is used, DateTime.Parse generates an exception. An example of RFC 822 time is Sat, 07 Sep 2002 00:00:01 GMT or Thu, 11 Nov 2004 14:11:47 PST.
- pubDate based on RFC 1123. This RFC superseded RFC 822; it presents a more standardized version of the date format. The .NET Framework supports this format, using the "R" format string. It always formats the date based on GMT. For example, Thu, 11 Nov 2004 14:00:29 GMT.
- Date based on the Dublin Core format. This is the same date proposed in RSS 1.0 and the Resource Description Framework (RDF), and it is defined by ISO 8601. It may be based on GMT (if it ends with the 'Z' character), or it may be based on the local time zone. The .NET Framework supports this format, using the "u" format string. It also has the advantage of being a very sortable time format, a feature we'll take advantage of to help sort our newly merged feed. An example of the date is 2004-11-24T16:03:24Z or 1994-11-05T08:15:30-05:00.
As you can see, we have a smallish problem: we need to determine which of the three formats is being used by each feed. Ideally, the merged feed should standardize these different formats, and I have arbitrarily decided on the RFC 1123 format. This is mostly because an RFC 1123 date is also acceptable as an RFC 822 date, but is easier to write due to the framework support. I started to work on this functionality, then realized that there must be something out there already written. Sure enough, as part of the RSSBandit desktop aggregator, there is a DateTimeExt class that does just this task. I decided to incorporate it into the project. I would have just added a reference to the compiled DLL, but it had a few other dependencies I didn't want to bring along. The project includes this class, with some minor changes to remove some of the logging and other functionality that depended on other code in RSSBandit (and also translated into Visual Basic .NET for that version of the aggregator).
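To see why a helper such as DateTimeExt is needed, here is a sketch that relies only on DateTime.Parse. It is not the RSSBandit code, the DateFormatSketch and ToRfc1123 names are invented for the example, and the exact parsing behavior may differ between versions of the .NET Framework.

```vbnet
Imports System
Imports System.Globalization

Module DateFormatSketch
    ' Parses a date with DateTime.Parse and re-emits it in RFC 1123 form.
    ' Returns Nothing when DateTime.Parse rejects the input.
    Public Function ToRfc1123(ByVal text As String) As String
        Try
            Dim parsed As DateTime = DateTime.Parse(text, _
                CultureInfo.InvariantCulture, DateTimeStyles.AdjustToUniversal)
            Return parsed.ToString("R", CultureInfo.InvariantCulture)
        Catch ex As FormatException
            Return Nothing
        End Try
    End Function

    Sub Main()
        ' RFC 1123 input
        Console.WriteLine(ToRfc1123("Thu, 11 Nov 2004 14:00:29 GMT"))
        ' Dublin Core (ISO 8601) input
        Console.WriteLine(ToRfc1123("2004-11-24T16:03:24Z"))
        ' A U.S. zone abbreviation is expected to be rejected here.
        Console.WriteLine(ToRfc1123("Thu, 11 Nov 2004 14:11:47 PST") Is Nothing)
    End Sub
End Module
```

The first two inputs should come back as RFC 1123 dates, while the PST example is the case the framework does not handle on its own and the one that makes a more forgiving parser worthwhile.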
The dates themselves are normalized—that is, converted into a standard format based on the pubDate and RFC 1123—in the NormalizePublishDate routine.
Private Function NormalizePublishDate(ByVal doc As XmlDocument, _
ByRef itemNode As XmlElement) As XmlElement
Dim workDate As DateTime
Dim work As String
Dim workNode As XmlElement
'date may be in one of three main formats (that I've seen so far)
' this routine attempts to convert them all into a single
' pubDate based on RFC1123
' see the article for more complaints,
' see RssBandit (http://rssbandit.sourceforge.net)
' for a more full featured implementation
If itemNode.GetElementsByTagName("pubDate").Count > 0 Then
workNode = itemNode.GetElementsByTagName("pubDate").Item(0)
work = workNode.InnerText
Else
'we may have Dublin Core date
workNode = itemNode.GetElementsByTagName("date", _
"http://purl.org/dc/elements/1.1/").Item(0)
work = workNode.InnerText()
'we should also create a pubDate for this element
'for use later
Dim newNode As XmlElement = doc.CreateElement("pubDate")
workNode = itemNode.AppendChild(newNode)
End If
'use the RssComponent.DateTimeExt from RssBandit to parse
workDate = RssComponents.DateTimeExt.Parse(work)
workNode.InnerText = workDate.ToString("R")
Return itemNode
End Function
The code first assumes that a pubDate is used as this is the most common form of a publish date in RSS feeds. Failing this, the next most common Dublin Core date is extracted, and an empty pubDate element added. The DateTimeExt.Parse routine (from RSSBandit) can parse any of these three date formats, and convert to a normal DateTime object. It is then a simple matter of using the standard "R" date format to convert to a RFC 1123 compliant date. This ensures that all dates are consistent (and easier to parse with .NET in the future) in the new feed.
Out with the New
Finally, we're ready to write out the newly merged RSS feed. Recall that the items are already sorted by date. Therefore, by stepping through the list backwards, the most recent item will be the first item, and we can simply write out the correct number of items.
Private Function WriteMergedRss() As String
    Dim mem As New MemoryStream
    Dim writer As New XmlTextWriter(mem, Encoding.UTF8)
    Dim reader As StreamReader = Nothing
    Dim result As String = String.Empty
    Dim item As XmlElement
    'index of the oldest item to write; never below zero, so that
    ' asking for more items than were merged writes the whole list
    Dim firstIndex As Integer = Math.Max(_fullList.Count - Me.Count, 0)
    Try
        With writer
            .WriteStartDocument()
            .WriteStartElement("rss")
            .WriteAttributeString("version", "2.0")
            .WriteStartElement("channel")
            .WriteElementString("title", Me.Title)
            .WriteElementString("link", Me.Link)
            .WriteElementString("description", Me.Description)
            'write out items
            ' in the inverse of the order in the SortedList
            ' See -- I told you there was a reason for the Sortedlist
            ' note that I'm only writing a subset of
            ' possible RSS items here
            For i As Integer = _fullList.Count - 1 To firstIndex Step -1
                item = _fullList.GetByIndex(i)
                .WriteStartElement("item")
                .WriteStartElement("title")
                .WriteString( _
                    item.GetElementsByTagName("title").Item(0).InnerText)
                .WriteEndElement() 'title
                .WriteStartElement("link")
                .WriteString( _
                    item.GetElementsByTagName("link").Item(0).InnerText)
                .WriteEndElement() 'link
                .WriteStartElement("description")
                .WriteString( _
                    item.GetElementsByTagName("description").Item(0).InnerXml)
                .WriteEndElement() 'description
                .WriteStartElement("pubDate")
                .WriteString( _
                    item.GetElementsByTagName("pubDate").Item(0).InnerText)
                .WriteEndElement() 'pubDate
                .WriteEndElement() 'item
            Next
            .WriteEndElement() 'channel
            .WriteEndDocument() 'also closes the rss element
            .Flush()
        End With
        Try
            'move memory stream back to beginning and read
            mem.Position = 0
            reader = New StreamReader(mem)
            result = reader.ReadToEnd
        Finally
            If Not reader Is Nothing Then
                reader.Close()
            End If
        End Try
    Finally
        writer.Close()
    End Try
    Return result
End Function
XmlTextWriter is used here to provide a fast, low-memory means of writing out the XML. You have two convenient choices when using XmlTextWriter for in-memory writing. You can either have the XmlTextWriter write to a StringWriter, or (as I have here) to a MemoryStream. Using a StringWriter is certainly easier, but the resulting XML is encoded as UTF-16, and we want it to be encoded as UTF-8. Therefore, the code writes to a MemoryStream, and a StreamReader is used to retrieve the content. When doing this the first time, I couldn't seem to get any data returned. This is because after writing all the data to the MemoryStream, the current position is at the end of the stream. Therefore, reads will return no data. So, whenever you want to write to a Stream for later reading, always remember to set the Position back to the beginning of the Stream before you read.
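The rewind problem is easy to reproduce on its own. The following sketch writes a tiny document to a MemoryStream and reads it back with and without resetting Position. StreamRewindSketch and WriteAndRead are names invented for this illustration, and the one-element document is a stand-in for a real feed.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module StreamRewindSketch
    ' Writes a tiny XML document to a MemoryStream as UTF-8 and reads it back.
    ' When rewind is False the read starts at the end of the stream.
    Public Function WriteAndRead(ByVal rewind As Boolean) As String
        Dim mem As New MemoryStream
        Dim writer As New XmlTextWriter(mem, Encoding.UTF8)
        writer.WriteStartDocument()
        writer.WriteElementString("rss", "sample")
        writer.WriteEndDocument()
        writer.Flush()

        If rewind Then
            mem.Position = 0
        End If

        Dim reader As New StreamReader(mem)
        Try
            Return reader.ReadToEnd()
        Finally
            reader.Close() ' also closes the underlying MemoryStream
        End Try
    End Function

    Sub Main()
        Console.WriteLine("Without rewind: {0} characters", WriteAndRead(False).Length)
        Console.WriteLine("With rewind: {0}", WriteAndRead(True))
    End Sub
End Module
```

Without the rewind the function returns an empty string. With it, the returned text begins with an XML declaration naming utf-8 as the encoding, which is the reason for choosing the MemoryStream over a StringWriter.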
Testing RssClient
The downloads for this article include two simple sample projects—one for Windows Forms and one for ASP.NET. The samples load a test OPML file and add another common RSS feed, then download and display the merged feed.

Figure 2. Windows Test Application

Figure 3. ASP.NET Test Application
Conclusion
While not everyone will need to merge multiple RSS feeds for an application, it is a handy tool to keep around. Once we had it available on MSDN, we began to find more and more uses for it. First as a means of creating feeds of team members, then as a means of creating topic RSS feeds, based off of feeds such as recent articles, important team member and/or non-Microsoft blogs, upcoming relevant webcasts, and more. By providing the merged feeds with a common scope, it makes it easier for people to find the information they need in one place.
Related Resources
- RSS 2.0 Specification
- Creating an Online RSS News Aggregator with ASP.NET by Scott Mitchell
- Building a Desktop News Aggregator by Dare Obasanjo
- Developing Priorities: Fun First by Duncan Mackenzie
- Content Syndication with RSS by Ben Hammersley
Kent Sharkey is the Content Strategist for ASP.NET and Visual Studio content on MSDN. He is thinking of changing his name to "Really Simple" Sharkey to get some appropriate initials for the future. When he's not reading your e-mails, he tries to write and sleep (just not at the same time).