Emily P. Lewis | June 21, 2010
Back in 1998, I was working as a Marketing Coordinator for an accounting firm. I was responsible for designing and writing content for our client newsletters, amongst a myriad of other print-focused activities. One day, my boss asked me to take a look at our corporate web site and rework the content.
As I went through the editing and re-writing process, I became increasingly interested in what was behind the scenes on the site. A few searches on the internet and I learned of something called HTML. And one fateful day, I opened Notepad and typed:
<h1>This is a test</h1>
I saved the text file with an .html extension, opened it in Internet Explorer, and was amazed. It was magic! I could publish instantly! And my world changed. I left marketing, began to teach myself HTML (and, later, CSS) and started a new adventure as a web professional.
Twelve years later, markup continues to hold this magical fascination for me. From my perspective, it is the foundation of the web. It is beautiful in its simplicity. And, despite my ever-growing skill set and today’s constantly-changing technologies, I still have an intense fascination with markup. For every site I develop, I try to write the leanest, most efficient and most semantic markup possible. And let’s not forget valid.
What saddens me is that not everyone has this commitment to good markup. Web sites are plagued with the diseases of tag soup, div-itis and span mania. And, frankly, I don’t understand why.
Sure, I get that there are constraints. Every designer and developer has them, whether it is a legacy system that limits the ability to change markup, or a boss who would rather allocate time and resources to something other than front-end development. But working within those constraints to develop excellent sites is the hallmark of a true professional.
Yet most of the people I’ve met who work in the web industry are true professionals. They are passionate and committed to their work, continually striving to create great web sites and applications. So, why do so many of them forget about the power of good markup? What I’ve come to realize is that it isn’t just about constraints. It isn’t about treating HTML like a red-headed stepchild. It isn’t even laziness (though I admit I think that sometimes).
The problem must be a lack of understanding and knowledge. Fortunately, that is something I can help change by taking you on a little journey in this article. We’ll explore what semantic markup is and looks like, as well as the benefits it offers. We’ll also take a look at how you can take semantics beyond your markup into CSS. And, lastly, we’ll see how semantic markup can be extended to provide valuable meaning to machines.
Semantic markup is markup that has meaning; markup that describes the content it contains, rather than what that content should look like. This means no more <font> tags, no more <b> tags, no more <table> elements used for layout.
Instead, semantic markup encourages you to think about the content first. So, if you have a list, you should use a list element such as <ul>. If you have a paragraph, you should use <p> instead of <div>. If you have a headline, you should use a heading element like <h2>.
Conversely, if you need to indent content, don’t use <blockquote> unless it is an actual quote. If you want content to appear big and bold, don’t use <h1> unless it is actually the primary heading of the page. For those types of presentational needs, CSS is your friend, not markup.
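To make the distinction concrete, here is a small sketch contrasting presentational markup with a semantic equivalent. The content and the class name are made up for illustration, and the CSS rule in the comment is just one way the presentation might be handled:

```html
<!-- Presentational: the markup describes appearance, not meaning -->
<font size="5"><b>Our Services</b></font>
<blockquote>We offer tax preparation and auditing.</blockquote>

<!-- Semantic: the markup describes what the content is -->
<h2>Our Services</h2>
<p class="intro">We offer tax preparation and auditing.</p>

<!-- The indent moves to the stylesheet, for example:
     .intro { margin-left: 2em; } -->
```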
Regardless of terminology (semantic markup or POSH, short for Plain Old Semantic HTML), your markup should also be valid. All this means is that your markup validates against the assigned DOCTYPE. Whether you are writing HTML 4, XHTML or even HTML5, your markup should contain no errors and no illegal elements or attributes.
How do you know if your markup is valid? Simple. Use the W3C’s online validator.
By this point, you may get the gist of semantic markup but you may also be wondering why it matters.
One of the foremost benefits is portability. Pages marked up with POSH not only work in web browsers, but all sorts of machines, including screen readers, mobile devices and text browsers. So, you write with POSH once, and you can deliver your content to a myriad of devices both now and in the future.
Semantic markup also offers efficiencies in the development process, which is particularly useful for teams. By using POSH, you establish a common vocabulary for your site markup. Developers don’t need to take extra time to consider what markup should be used for different types of content. A paragraph gets a <p>. An ordered list gets an <ol> and so on.
Further, by following these standards and eliminating unnecessary <div>s, <span>s and nested <table>s, markup is more readable. And readability has a huge impact on how easy it is to troubleshoot, debug and maintain.
Another benefit of semantic markup is that it provides the foundation for an accessible site. As I mentioned in my web accessibility and WAI-ARIA primer, assistive technologies such as screen readers navigate sites according to structure. For example, a site marked up with heading elements (<h1>-<h6>) to convey a hierarchical content structure gives screen reader users the ability to jump to different sections in the hierarchy.
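As a rough sketch of such a hierarchy (the headings are invented for illustration, and the indentation is only there to show the nesting levels), a page outline might look like:

```html
<h1>ABC Company</h1>
  <h2>Products</h2>
    <h3>Accounting Software</h3>
    <h3>Payroll Tools</h3>
  <h2>Services</h2>
    <h3>Tax Preparation</h3>
```

A screen reader user could jump from "Products" straight to "Services" by moving between the level-two headings.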
And, of course, WCAG 2.0 guidelines encourage semantic markup, not only for page structure, but also for lists (<ul>, <ol>, <dl>) and special text that requires emphasis (<strong>, <blockquote>, <abbr>). Further, these guidelines discourage the use of <table> elements for anything other than tabular data.
Pages developed with POSH typically are lighter-weight than their tag soup counterparts. This can often have a positive effect on page load times and responsiveness, which, in turn, can make for an improved user experience. Remember, not everyone has high-speed access and not everyone uses the latest and greatest technologies. POSH allows you to develop for the lowest common denominator so you can deliver your content efficiently to all users.
For years, I’ve heralded the benefits of semantic markup for organic SEO (Search Engine Optimization). I’d long heard the stories that search engines would bork on nested <table>s; that Google gives higher ranking to content contained by heading elements; that lean, semantic markup ensured pages would be crawled faster and more efficiently.
Unfortunately, most of the evidence I’ve found that supports these stories seems anecdotal. I struggled to find hard-and-fast facts from the major search engines that indicate valid, semantic markup improves search engine results. In fact, Google has even stated that CSS instead of <table>s for layout makes little difference to its algorithm.
The disparity between my perception and what the search engines are doing seems tied to the fact that search algorithms have improved over the years. Nested <table>s that used to cause problems for Google, for example, are no longer a barrier to content thanks to a more sophisticated algorithm.
That said, though, I personally suspect that there is something going on with search engines and semantic markup. For example, Google has recommended judicious use of the <h1> element to avoid potential penalties and it does recommend using heading tags appropriately. Google also encourages webmasters to write leaner markup to improve performance. To me, these recommendations mean Google’s algorithm is paying attention to markup and bloat to some extent.
Now that you know the meaning and benefits of semantic markup, let’s talk about actually putting it to use. First, though, I must admit that if you are new to it, semantic markup requires a bit more work and a change of mindset. If you’ve been developing sites using <table> tags to control layout or using <div> elements for every single bit of block-level content, it is going to feel strange and, perhaps, even frustrating to make the switch to POSH.
And, trust me, I feel your pain. I’ve been there. When I started learning HTML, I had no idea what web standards were, much less what semantic markup meant. My sites were the very definition of tag soup. But once I realized there were actual standards for HTML that offered a wealth of benefits I’d never even considered, I couldn’t, in good conscience, continue to develop sites as I had in the past.
If you have come to that same conclusion, here’s how I recommend you get started:
Of course, these steps are just how I go about my markup process. You may find some other workflow that works better for you (and, perhaps, your team). The point is to take time to be thoughtful. And, trust me, the more you do it, the easier it becomes and the faster you will get at writing semantic markup. But it does take a commitment and it does take time. Accept that going into it, and I suspect the process will be less frustrating.
Let’s apply these steps to some real-world examples, starting with some simple navigation links. From a content perspective, let’s use:
Home Products Services About
I’ve seen sites that mark up such content in the following fashion:
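A minimal sketch of that kind of markup, assuming each link is wrapped in its own <p> inside a containing <div> (the href values are placeholders):

```html
<div>
  <p><a href="/">Home</a></p>
  <p><a href="/products/">Products</a></p>
  <p><a href="/services/">Services</a></p>
  <p><a href="/about/">About</a></p>
</div>
```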
But let’s think about the content: Is each of those links really a paragraph? What purpose does the containing <div> serve?
From my perspective, the navigation links are actually a group of related content items. This makes them ideal candidates for the unordered list element <ul>. As such, a POSH approach to these navigation links would be:
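One way this list-based navigation might look, with placeholder href values assumed for the sketch:

```html
<ul>
  <li><a href="/">Home</a></li>
  <li><a href="/products/">Products</a></li>
  <li><a href="/services/">Services</a></li>
  <li><a href="/about/">About</a></li>
</ul>
```

Each link is an item in a group of related items, and no extra wrapper element is needed to hold them together.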
Now, let’s look at another content example, this time contact information for a site:
123 Main Street
Albuquerque, NM 87106
A non-semantic approach to marking up this content might look like:
<div>123 Main Street<br />Albuquerque, NM 87106</div>
For me, I would take a different approach focusing on semantics, such as:
<li>123 Main Street</li>
<li>Albuquerque, <abbr title="New Mexico">NM</abbr> 87106</li>
One of the more challenging aspects of writing semantic markup is figuring out the “right” elements to use. For some content, it is immediately clear that a <p> should be used or an <h1>. For other content, there can be more than one semantic approach.
Consider the ABC Company example above. In my semantic markup, I chose to use a definition list (<dl>). Some people may (strongly) disagree with this use of a <dl> because they follow the school of thought that definition lists should strictly be limited to content that reflects terms and definitions.
I, on the other hand, subscribe to the notion that definition lists can also be used for content items that have a direct relationship with each other, where the <dd> content relates to that of the <dt>. For ABC Company’s contact information, I see the address, email and phone information (contained by <dd>s) directly relating to the name of the company (contained by <dt>).
And it may not just be my use of a <dl> that some folks might disagree with. You will notice I used a <ul> for the address information. That’s because, in my mind, that information is a group of related content items.
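The structure described above might be sketched as follows. The email address and phone number are placeholders assumed for illustration, not real contact details:

```html
<dl>
  <dt>ABC Company</dt>
  <dd>
    <!-- Address lines as a group of related items -->
    <ul>
      <li>123 Main Street</li>
      <li>Albuquerque, <abbr title="New Mexico">NM</abbr> 87106</li>
    </ul>
  </dd>
  <dd>info@example.com</dd> <!-- placeholder email -->
  <dd>555-555-0100</dd>     <!-- placeholder phone -->
</dl>
```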
Further illustrating the subjective nature of semantic markup, I could’ve taken other approaches to this contact information, such as:
<li>123 Main Street</li>
<li>Albuquerque, <abbr title="New Mexico">NM</abbr> 87106</li>
Ultimately, there really isn’t a “right” semantic approach for all content (though some semantic purists may disagree). Again, I encourage you to focus on being thoughtful with your markup. Know your reasons why you chose certain elements over others and be open to learning different approaches. What makes sense to you semantically today may change as your knowledge grows.
Since the focus of this article is semantic markup, I must take some time to talk about HTML5. While it is still in draft form, HTML5 introduces a number of new semantic elements, which may reduce or eliminate your need to use <div>s for structure:
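As a rough illustration, a page skeleton built with a handful of HTML5's structural elements might look like the following. The choice of elements and the content are assumptions for this sketch, not a complete list of what the draft spec offers:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>ABC Company</title>
</head>
<body>
  <header><h1>ABC Company</h1></header>
  <nav>
    <ul>
      <li><a href="/">Home</a></li>
      <li><a href="/products/">Products</a></li>
    </ul>
  </nav>
  <article>
    <h2>Latest News</h2>
    <p>Article content goes here.</p>
  </article>
  <aside><p>Related content goes here.</p></aside>
  <footer><p>Contact information goes here.</p></footer>
</body>
</html>
```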
For the scope of this article, I’m not going to delve any further into HTML5. Plus, I’ve only developed one site with HTML5, and I’m still working to get my head around the nuances of the spec. I do recommend, though, that you check out these excellent resources which have proved useful to me:
Thus far, most of this discussion has focused on using semantic elements to define structure. But semantics can go beyond the markup elements themselves and into semantic attribute values to provide even more meaning.
As I touched on in Be a CSS Team Player, semantic naming conventions for class and id values can provide valuable context and meaning in your CSS. Consider these two style declarations:
.f23 { border: 1px solid #ff0; }
.alert { border: 1px solid #ff0; }
The property-value pairs in these examples are identical, but which has more meaning? I hope it is obvious that the second declaration has more meaning because of the class name used. “Alert” has far more context than “f23,” and more context can make it easier for you (and anyone else working in the CSS) to understand what content a style declaration affects.
More than just context, semantic (rather than presentational) class and id values can help ease maintenance and future development. Consider a site with sidebar content assigned id="rightSidebar". It seems semantic, but it is presentational, as well. So, if that site goes through a redesign, and that sidebar content is shifted to a new visual position, “rightSidebar” is no longer relevant.
This not only means the value needs to be changed in the CSS, but all references to that id in the markup will need to be changed. Instead, a purely semantic value, such as id="relatedContent" or even just plain id="sidebar" gives you needed flexibility for the future.
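A small sketch of the difference, with the content and the CSS rules in the comment assumed for illustration:

```html
<!-- Presentational id: describes where the content currently sits -->
<div id="rightSidebar">
  <h2>Related Articles</h2>
</div>

<!-- Semantic id: describes what the content is -->
<div id="relatedContent">
  <h2>Related Articles</h2>
</div>

<!-- With the semantic id, a redesign only touches the rule body:
     #relatedContent { float: right; }  might become
     #relatedContent { float: left; }
     and the markup stays as it is. -->
```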
In addition to helping maintenance and communication for you the developer, semantic values for attributes can also give machines (user agents, search engines, browsers, applications, etc.) important meaning for your content.
Microformats, for example, utilize semantic class names to describe common web content, such as contact information, events, blog posts and reviews. Machines that leverage microformats, meanwhile, can easily identify that content and extract it for a wide range of uses.
Consider the ABC Company example I used earlier. You and I, as humans, instantly recognize that content for what it is: name, address, phone and email information. Machines, however, don’t have that ability. To machines, that content is the same as any content on your page.
But if you apply the hCard microformat, you are telling machines that your content includes contact information. And with that, machines can transform your web content into, for example, a downloadable business card.
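Here is a minimal sketch of the ABC Company contact information with hCard class names applied. The class names (vcard, fn, org, adr, street-address, locality, region, postal-code, email, tel) come from the hCard microformat; the email address and phone number are placeholders assumed for this example, and the choice of wrapper elements is just one option:

```html
<div class="vcard">
  <!-- "fn org" marks the name as belonging to an organization -->
  <span class="fn org">ABC Company</span>
  <div class="adr">
    <span class="street-address">123 Main Street</span>
    <span class="locality">Albuquerque</span>,
    <abbr class="region" title="New Mexico">NM</abbr>
    <span class="postal-code">87106</span>
  </div>
  <a class="email" href="mailto:info@example.com">info@example.com</a>
  <span class="tel">555-555-0100</span>
</div>
```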
Another example of making your content machine-readable with semantic attribute values is WAI-ARIA. As I mentioned in my web accessibility and ARIA primer, you can assign roles to your markup elements to help assistive technologies understand document structure, what widgets do and how to interact with widgets.
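As a brief sketch, ARIA landmark roles might be applied to a typical page structure like this. The id values and the overall layout are assumptions for illustration; the role values are landmark roles defined by WAI-ARIA:

```html
<div id="header" role="banner">...</div>
<div id="nav" role="navigation">
  <ul>
    <li><a href="/">Home</a></li>
    <li><a href="/about/">About</a></li>
  </ul>
</div>
<div id="content" role="main">...</div>
<div id="sidebar" role="complementary">...</div>
<div id="footer" role="contentinfo">...</div>
```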
A further discussion of the machine-readable attributes used in microformats and ARIA is beyond the scope of this article. But they do play a part in the semantic markup discussion and I encourage you to read further.
For microformats, check out my 8-part blog series, Getting Semantic With Microformats. For ARIA, check out my Script Junkie primer.
Now that I’ve given you the basics of semantic markup and provided you information and resources to get started, I want to conclude our journey with a slight detour and discuss the purist thinking I alluded to earlier.
When I first got into web standards, I proudly referred to myself as a “standardista” and I frowned upon any markup that, in my mind, was non-semantic, didn’t follow standards and/or was invalid. I took a purist’s perspective and, to be honest, it was more about me feeling self-important and righteous than truly embracing what the web is about.
The reality is that non-semantic markup exists in today’s web. As long as CSS support is inconsistent across browsers, there is going to be a need to add <div>s and <span>s to achieve certain goals. As long as there are systems being used that restrict designers’ and developers’ ability to change markup, there will be non-semantic elements and even invalid markup. And let’s not forget what I said about subjectivity … not everyone is going to agree on every approach to POSH.
Even with validation, there are exceptions. I stated at the beginning of this article that semantic markup should also be valid. Yet, if you are embracing WAI-ARIA roles to add meaning and help users of assistive technologies, your markup won’t validate because the attributes are not defined in the DTDs for HTML or XHTML. There are ways around this, but the question becomes what is more important, a few validation errors or making sites and applications more accessible?
My point is that being a purist can be restrictive. Being a purist can limit your ability to understand what constraints many of today’s web professionals face. Being a purist can make it harder to engage in conversation about what is possible within those constraints.
For me, I still consider myself a standardista. But today, I seek balance more than I seek a sense of “being right.” I aim for semantic markup. I aim for lean markup. I aim for validity. But I also use non-semantic <div>s and <span>s. I may use attributes for machine readability that result in invalid markup. I even use CSS properties that don’t validate. And I don’t apologize for it. It doesn’t make me a bad developer. It makes me a realistic developer … one who knows the rules and can intelligently decide when exceptions are important for my sites and audiences.
This journey into semantic and standards-compliant markup is only part of the larger discussion. Check out these resources to get an even better understanding:
Emily Lewis is a freelance web designer of the standardista variety, which means she gets geeky about things like semantic markup and CSS, usability and accessibility. As part of her ongoing quest to spread the good word about standards, she writes about web design on her blog, A Blog Not Limited, and is the author of Microformats Made Simple and a contributing author for the HTML5 Cookbook. She’s also a guest writer for Web Standards Sherpa, .net magazine and MIX Online.
In addition to loving all things web, Emily is passionate about community building and knowledge sharing. She co-founded and co-manages Webuquerque, the New Mexico Adobe User Group for Web Professionals, and is a co-host of The ExpressionEngine Podcast. Emily also speaks at conferences and events all over the country, including SXSW, MIX, In Control, Voices That Matter, New Mexico Technology Council, InterLab and the University of New Mexico.