Automatically Generating Sitemap.xml Files For Jekyll

Michael Levin | Monday, April 18, 2011

Why Provide A Sitemap.xml File

Sometimes, crawlers just need a little help. Who can blame them? With hundreds of millions of websites on the internet for search engines to index, it's no wonder they might benefit from a boost. One way to help crawlers is by providing a sitemap.xml file. Not only does this file list all of the pages on a site that you want to make sure are indexed, but it can also provide useful metadata about each of these pages, such as the last modified date, change frequency, and priority. While search engines are already very good at finding your pages, the metadata can be particularly useful, because they aren't as good at determining the relative importance of pages or how often to check for changes.

You can find a detailed explanation of the structure of sitemap.xml files and their metadata at the sitemaps.org site.
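
As a rough illustration of that structure, here is a minimal Ruby sketch that builds a sitemap with one url entry per page, including the optional lastmod, changefreq, and priority metadata. It is not the plugin's code: the helper names (xml_escape, sitemap_url_entry, build_sitemap) and the example.com URL are placeholders chosen for this example.

```ruby
require "time" # provides Time#iso8601

# Characters that must be escaped inside XML text.
XML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
  '"' => "&quot;", "'" => "&apos;"
}.freeze

# Hypothetical helper: escape a string for use inside an XML element.
def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper: build one <url> entry. Only <loc> is required;
# the other elements are emitted only when a value is given.
def sitemap_url_entry(loc, lastmod: nil, changefreq: nil, priority: nil)
  lines = ["  <url>", "    <loc>#{xml_escape(loc)}</loc>"]
  lines << "    <lastmod>#{lastmod.utc.iso8601}</lastmod>" if lastmod
  lines << "    <changefreq>#{changefreq}</changefreq>" if changefreq
  lines << "    <priority>#{format('%.1f', priority)}</priority>" if priority
  lines << "  </url>"
  lines.join("\n")
end

# Hypothetical helper: wrap the entries in the <urlset> root element.
def build_sitemap(entries)
  body = entries.map do |e|
    sitemap_url_entry(e.fetch(:loc),
                      lastmod: e[:lastmod],
                      changefreq: e[:changefreq],
                      priority: e[:priority])
  end
  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    *body,
    "</urlset>"
  ].join("\n")
end

puts build_sitemap([
  { loc: "http://example.com/", lastmod: Time.utc(2011, 4, 18),
    changefreq: "weekly", priority: 0.8 }
])
```

Escaping the URL matters because characters such as & are not allowed unescaped in XML text.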

Once you create the XML file, you can use it for many different search engines. The simplest way to notify search engines about your sitemap.xml file is to include a reference in your robots.txt file. For my site, the entry looks like this: Sitemap: http://www.kinnetica.com/sitemap.xml. Google, Yahoo, Bing, and possibly other search engines will then be able to find the sitemap file and use it to help with indexing and crawling all of your pages and posts.

In addition, by signing up for their respective webmaster tools sites (Google Webmaster Tools, Yahoo Site Explorer, Bing Webmaster Tools), you can see which of the pages listed in your sitemap.xml have been indexed by each search engine. With Google Webmaster Tools, you can also let them know when your sitemap.xml has been updated and have them refresh their data.

Criteria For A Solution

When I started this site, I wanted to automate generating the sitemap.xml file. After reading the documentation on how to create Jekyll plugins, I decided to try to write a simple plugin that could generate this file for me.

These were my requirements:

  1. It should automatically be able to find all of the posts and pages on my site and provide the URLs.

  2. It should be able to automatically figure out the date the page or post was last modified. This is a little tricky, since a change to a layout file could change the output of many files (or possibly all of the posts at once). I decided to take the purist approach and calculate the last modified date as the latest modification date among the page or post itself and any layouts it uses. However, I might change my mind later and simply use the last modified date of the post.

  3. It should provide a way to exclude certain pages and posts if I don't want them in my sitemap.xml file. I am currently only excluding my atom.xml file.

  4. It should allow me to manually specify the change frequency and priority for any page or post. By simply providing that information in the YAML Front Matter, the plugin should be able to pick up this information and include it in the generated file.
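
Requirements 2 through 4 can be sketched in a few lines of Ruby. This is only an illustration of the ideas, not the plugin's actual code: the helper names, the exclusion list constant, and the front matter keys changefreq and priority are all assumptions made for this example.

```ruby
# Illustrative only: this is not the plugin's actual code.
# The helper names and the front matter keys "changefreq" and
# "priority" are assumptions made for this sketch.

EXCLUDED_URLS = ["/atom.xml"].freeze

# Requirement 2: the latest modification time among the source file
# and every layout file it uses.
def last_modified(source_path, layout_paths = [])
  ([source_path] + layout_paths).map { |path| File.mtime(path) }.max
end

# Requirement 3: skip anything on the exclusion list.
def include_in_sitemap?(url, excluded = EXCLUDED_URLS)
  !excluded.include?(url)
end

# Requirement 4: pick up optional overrides from a front matter hash,
# dropping any key that was not set.
def sitemap_metadata(front_matter)
  {
    changefreq: front_matter["changefreq"],
    priority: front_matter["priority"]
  }.compact
end
```

Taking the maximum of the modification times is what makes a layout edit bump the date of every page that uses that layout, which is the purist behavior described in requirement 2.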

Give It A Try

I've completed my first version and wanted to get it out there for people to try. It's currently available for download on GitHub. I have provided instructions on how to use it in the README file.

If this seems like it might be useful for you, please give it a try and let me know what you think via email or Twitter @kinnetica.

After all, it's not just the crawlers that can use a little help sometimes.

Jekyll - 7 Tips & Tricks

Michael Levin | Sunday, April 17, 2011

Instead of going with WordPress or Tumblr for this blog, I decided to try out Jekyll. I was very happy with the experience, and thought I would share some of the lessons I learned to hopefully make the process easier for others.

  1. Jekyll has very helpful documentation available on their GitHub wiki. However, I found it hard to find all the available content simply by browsing the links on the wiki Home page. Once you have the basics up and running, I suggest clicking on Pages to see all of the available material. The mailing list is also a good place to go for help.

  2. The documentation also has a listing of sites that were created using Jekyll. Almost all of them have code posted on GitHub. This is definitely worth sifting through to get a better idea of how Jekyll works and how it can be used to accomplish your vision.

  3. While you're initially developing your site, it's very convenient to be able to see changes automatically as you make them without having to constantly restart the Jekyll WEBrick server. Make sure to either put auto: true in your _config.yml or start the server using the --auto flag to see your changes in real time. This is not the default behavior. Other customizations to your _config.yml can be found here.

  4. While you're writing posts, there are two ways to keep them from appearing when your site is generated. The easiest is to include published: false in the YAML front matter of that post.

    Personally, I found it helpful to create a separate _drafts directory. This directory holds all of my unfinished posts. When the post is ready, I simply rename the file to reflect the correct date and copy it over to the _posts directory.

  5. If you're going to install and enable Pygments, remember that you will need to provide styles for syntax coloring. For my site, I experimented with some of the stylesheets found here. You will need to change the default class in the styles from .codehilite to .highlight.

    One other cool feature of Pygments is the ability to add line numbers to your code. For instance, if you were writing Ruby code and wanted to include line numbers, simply surround the code with { % highlight ruby linenos % } and { % endhighlight % } (omit the spaces between curly brace and percent sign). If you don't want line numbers, simply leave out the linenos keyword.

  6. If you are not yet familiar with Liquid, it's worth spending a little time getting up to speed. I've found the best resources to be the Liquid for Designers guide on GitHub and the documentation.

  7. One of the convenient features of Jekyll is the ability to add metadata to all of your pages in the front matter section. By using the YAML key: value syntax, you can define your own custom variables and reference them within your site.

    One use for this is being able to change the page title depending on the page being displayed. For instance, if you have Ham: Is It Superior To Bacon? as the title in the front matter of a post, having <title>{ { page.title } }</title> (omit the spaces between the curly brackets) in your layout file will set the page title to Ham: Is It Superior To Bacon? within that post.

    Another clever use of this trick is to add description and keywords custom variables to each post and page. You can then include these custom variables within the description and keywords meta tags in your template. The description will usually get picked up by Google and displayed in its search results. While Google ignores meta keywords, other search engines might still use them, so it might make sense to include them as well. Doing this will make it easier for search engines to display a relevant description and categorize all of your indexed pages.
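
Tips 4 and 7 both rely on YAML front matter, so it can help to see how that block might be read. The following Ruby sketch is not Jekyll's implementation: parse_post and published? are hypothetical helpers, and the sample post is made up. It splits a post into its front matter and body, then checks the published flag.

```ruby
require "yaml"

# Front matter is a YAML block between two lines of three dashes at
# the very top of the file.
FRONT_MATTER = /\A---[ \t]*\n(.*?)\n---[ \t]*\n/m

# Hypothetical helper: returns [front matter hash, body text].
# A file without front matter yields an empty hash and the full text.
def parse_post(text)
  match = FRONT_MATTER.match(text)
  return [{}, text] unless match

  data = YAML.safe_load(match[1]) || {}
  [data, match.post_match]
end

# Hypothetical helper: a post counts as published unless its front
# matter explicitly says published: false.
def published?(data)
  data.fetch("published", true) != false
end

post = <<~POST
  ---
  title: "Ham: Is It Superior To Bacon?"
  description: A made-up description for this example
  published: false
  ---
  Body of the post.
POST

data, body = parse_post(post)
puts data["title"]
puts published?(data)
```

Note that the title is quoted in this sample: a YAML value that itself contains a colon followed by a space has to be quoted, or the parser will reject it.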

The source code for my personal site is available on GitHub here. Feel free to use it to learn or as a starting point, but please make sure to remove all site-specific code (e.g. the Google Analytics snippet).

Kindle To Lead Ad-Subsidized Hardware Revolution

Michael Levin | Friday, April 15, 2011

You go online, you see ads. You turn on the television, you watch ads. You're driving on the highway, you pass ads. How often do you really get to make or save any money just for seeing ads?

Amazon recently announced an ad-subsidized version of its Kindle, which retails for $114 instead of $139. This results in a total savings of $25, or about an 18% discount. It is currently available for pre-order.

Knowing how Amazon operates, the ads will likely be well-designed and uniquely suited for the e-ink display. They will be much less distracting than the flashy, noisy online banner ads that we've all come to expect from web advertising. It's hard to imagine most consumers would pass up the savings, especially when the ads would cost them so little in annoyance.

From Amazon's perspective, this is a potential gold mine. Suppose Amazon sells a couple hundred thousand of the ad-subsidized Kindles within the first couple of weeks or months. By giving up $25 on each Kindle, they are earning the right to make ad revenue over the lifetime of each device. That's advertising over at least a year or two per customer for a measly $25.

Furthermore, these ads could easily be tailored to individual users. Since Amazon already knows which books you are buying, it can serve up relevant ads to its users and charge advertisers a premium for them.

If you believe most reports that Amazon is making the majority of its revenue from the Kindle through e-book sales and that the profit margin on the actual device has become razor-thin, this makes perfect sense. If they can significantly reduce the price of the hardware for consumers, they can attract more customers and sell many more e-books.

I believe the $25 discount is a conservative test of this new business model for Amazon. If it succeeds, the next stop would be $99, and possibly even lower. I would also not be surprised if the rumors regarding Amazon offering free Kindles to paying Amazon Prime members finally came true. Only it would be with the ad-subsidized model.

While ad-subsidized hardware hasn't yet established itself as a proven business model, other companies have considered similar ideas. There was an excellent article a while back on TechCrunch by MG Siegler about Google looking at giving away their Nexus One for free by subsidizing the device with ads. At the time, this plan was killed by the carriers.

If this model works successfully for Amazon, I suspect Google and other hardware companies will push harder to integrate this model into their products where it makes sense. Ad-subsidized phones and tablets would be the next logical step, leading to cheaper prices and even faster adoption.

For once, the possibility of having more ads in our lives actually seems exciting.

CSS Selection Attribute

Michael Levin | Sunday, April 10, 2011

One of the lesser-known features of CSS is the option to style text selected by the user. This is achieved using the ::selection pseudo-element. Here is an example:

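/* Standard pseudo-element */
::selection {
  color: #000;
  background: #fff;
}

/* Firefox-prefixed version; keep it as its own rule, because a browser
   drops a whole rule if it doesn't recognize one of its selectors */
::-moz-selection {
  color: #000;
  background: #fff;
}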

Firefox currently requires the -moz- prefix when using this pseudo-element. With that in place, it works in all modern browsers (Chrome, Firefox, Safari, Opera, IE9). There is no support in earlier versions of Internet Explorer.

Although a feature like this should be used sparingly, sometimes it can be used to produce good-looking results. An example of where this feature is used well is Paul Irish's CSS3 Please site.

The ::selection pseudo-element was originally included in the CSS3 draft spec, but was later removed. However, it has already been implemented in all major browsers and should continue to be supported in the future.

Note: Only the color, background, and background-color CSS properties will work with ::selection.

More Resources:

  1. MDN Documentation About ::selection

Come In, We're Open

Michael Levin | Sunday, April 10, 2011

Hello, there! My name is Michael Levin and I am a web developer living in the Washington, D.C. area. You can find out a little more about me by looking at my about and resume pages.

The main purpose of this blog will be to share my learning experiences related to web and software development and any resources that I find helpful along the way. I have committed myself to posting at least one new article a week.

I hope you'll stick around and learn along with me.

-Michael