Experimental Lex

Playing with words.

Tag Archives: writing

Lies, Damned Lies, and Mime Types

If you’ve ever had the joy of building EPUB files and struggled with the cryptic error messages produced by epubcheck, you might find this interesting. I was having a number of facepalm moments the other day while dealing with an image icon for a Twitter user. My new Twitter friend @longreads is giving me a steady stream of wonderful content to read. And yet, I was having an awful time trying to figure out why epubcheck was rejecting the longreads Twitter icon as having the wrong mime type.

Here is the url for the icon. It has a “.jpg” extension, which suggests that its mime type is “image/jpeg”. If you download the file and open it in an image viewer app like the Preview app for Mac, it will tell you that it is actually a PNG file. In my automated workflow, this little time bomb silently waits until the EPUB is created before it announces itself during epubcheck validation.

I was previously using the file extension to determine the mime type, and that was obviously not going to work in this world of deceptive file names. Moreover, there are image urls that do not have any file extension. So, I tried to get clever and check the “Content-Type” header in the response when downloading files. However, I even found that this was not always correct.

I did some research on existing open source tools (Java and Python-based) and it is surprising how many use the file extension as the main determinant. And the tools that actually read the file header were said to be buggy. So, it dawned on me later that I should just look into the epubcheck source code and find out how it was reading image types.

Here’s the source code for BitmapChecker.java in the epubcheck project. As is, it is not designed to be used externally, so I created a copy and compiled it into a command-line tool like epubcheck. It is called MimeCheck, and it reads the file header and returns the mime type. You can run it like this:

$ java -jar mimecheck-1.0.jar twitpic.jpg


That only works if the file is in the same directory as the mimecheck-1.0.jar file. You can also provide the absolute path and it works the same. Here’s a download link for the jar file if you are interested in using it.
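
For readers who only want the idea behind header-based detection, here is a minimal Python sketch. It is not a port of BitmapChecker.java: the signature table and the function names sniff_image_mime and sniff_file are illustrative assumptions, and only PNG, JPEG, and GIF are covered.

```python
# Sketch: decide an image's mime type from its leading bytes, ignoring
# both the file name and any Content-Type header.
# The table below is an assumption for illustration; it lists only three formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),   # 8-byte PNG signature
    (b"\xff\xd8\xff", "image/jpeg"),       # JPEG start-of-image marker
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_image_mime(header):
    """Return the mime type implied by the leading bytes, or None if unknown."""
    for magic, mime in SIGNATURES:
        if header.startswith(magic):
            return mime
    return None


def sniff_file(path):
    """Read only the first few bytes of a file and sniff its type."""
    with open(path, "rb") as f:
        return sniff_image_mime(f.read(16))
```

Calling sniff_file("twitpic.jpg") on a PNG that carries a “.jpg” name would report “image/png”, which is the kind of mismatch that trips up extension-based checks.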


Let me know if you have any questions or if you need the source code. I think I have a few other EPUB-related tools that I can contribute and perhaps turn this into an open source project on Google code. That depends on whether anyone else finds it useful. I realize that not many people need to do automated builds of EPUB files.


Personal Attention Databases

A number of years ago, the well-known entrepreneur Seth Goldstein, along with Wall Street legend Lew Ranieri, launched a startup called Root Markets. The focus of Root Markets was the attention economy and the concept was to give users the ability to track their online interests and share that data with Root Markets. The theory was that your attention had value because you were a potential sales lead for various products and services. And therefore, your “lead” data along with everyone else’s could be traded as commodities in an exchange marketplace. Definitely speculative, and perhaps a little ahead of its time.

It’s probably been a while since I or anyone else has thought about Root Markets. It seems that they quietly closed up their operations and very few noticed. Yet, the attention of Internet users is as important as it ever was and the attention economy continues to thrive and evolve. With the dramatic growth of mobile computing and social networks, the attention economy is expanding along new trajectories.

I bring up this obscure reference to Root Markets to introduce this concept of personal attention databases. One of the challenges that Root faced was storing and presenting your attention data in a useful way for users. While thinking about The Social Content Graph, I have been contemplating this problem – how to track the content and people that you like as you hop across devices, services, social networks, and content publishers. I called it a “complex beast” and that is an accurate description. To simplify things, I will start using the acronym “PAD” as a replacement for personal attention database.

In the first decade of the Internet, users bookmarked the content they wanted to keep or revisit. Social bookmarking sites like Delicious.com made it easier to manage thousands of bookmarks by applying “tags” to bookmarks and therefore making your bookmarks database searchable. Delicious.com is definitely one of my favorite Internet innovations ever created and I certainly hope they survive and continue to keep my universe of bookmarks at my fingertips. This was the first generation of PADs. And now, we are contemplating the next generation of PADs, capable of mapping the social content graph.

Attention Spaces

This supremely complex goal of mapping attention in your social content graph requires that we toss around some ideas about how it might work. One feature that does not exist today with social bookmarking is the logical separation of your attention “spaces”. For example, work versus home. I use Delicious avidly and it is really the only bookmarking tool I want to use. Yet, I sometimes use Google Bookmarks and Faves.com, because I want to separate the different pools of content.

The need for separate attention spaces becomes more critical when you try to imagine a personal attention database that includes your social graph. A useful comparison is the logical separation of your social networks across Facebook and LinkedIn. Obviously, you want to keep it professional on LinkedIn and not share YouTube videos or party pics, and vice versa with Facebook. Hence, our theoretical PAD design would automatically provide separation of Spaces, like Work Space and Personal Space. (If you have a better name for “spaces”, please let me know.)

Editor’s note: This article has already taken several days to write, so it’s time to wrap it up and come back to it later.

You heard the man. I’m closing out this post and will continue with it later. I’d like to avoid publishing something really dumb while pretending it is not. Next time, we will look at the content types we like to follow and the different patterns of content consumption.

The Social Content Graph

Greetings and happy holidays! Apologies for not publishing anything lately. I feel physical pain every day that goes by when I am not writing anything. Possibly, my choice of long-form articles has hampered me from reaching that tipping point where writing becomes an easy and fluid task. So, we will try a new approach by writing lighter pieces. Many of these might look like “fluff” pieces based on current news, personal views, and speculation, yet still within the general universe of digital publishing and mobile content. We can worry about how to re-assemble this content into a book some other day. The only thing that matters is actually writing.

With that said, I would like to discuss my vision of the social content graph. This is not a new term and I have not researched prior usage of this term. However, I do feel it is a meaningful term that has relevance to digital publishing. When you look at the entire spectrum of companies, platforms, software, and content in the extended digital content ecosystem today, you will recognize that reading is becoming more social.

That may sound like Captain Obvious talking. Nonetheless, this is the essence of the social content graph. The term “social graph” is well known and widely used in reference to a person’s social network. A person with a large number of Twitter followers and Facebook friends has a large social graph and the content they share has a powerful “network effect”. Perhaps it’s just the technical and contemporary way of quantifying popularity.

So is there such a thing as a content graph? A quick search on Google shows that it was a significant term in 2010. For example, here’s an interesting quote from this article called The Content Graph and the Future of Brands:

In the Social Graph, you’re defined by your friends. In the Content Graph, a content brand is defined by its distribution relationships with other content brands.

Unfortunately, that’s not the Content Graph I am thinking about. Instead, I am trying to express how digital content is published, consumed, shared, aggregated, republished, and consumed again in the digital world today. Publishers and content creators seek to publish content that is original and popular. Content is given life by consumers who share the content with others. Until the content is consumed, shared and discussed, the content barely exists. (Call up metaphors like “tree falling in the woods” or “Waiting for Godot”)

Consider the intertwining relationships between content creators, consumers, and companies like Twitter, Bit.ly, and Flipboard in our digital content ecosystem. Sharing content via Twitter is usually done with shortened urls (generated through services like Bit.ly), which redirect users to the original URL. The importance of short URLs is a by-product of the 140-character limit that is built into Twitter. While this limit originates from the character limit of SMS messages, it also provides a universal rule that makes all messages short and easy to browse.

Anatomy of a Tweet

When browsing through Twitter messages, you see a microcosm of specialized syntax designed to fit as much content and meaning as possible within the 140-character limit. For those new to Twitter, it can be a daunting experience trying to grok the meaning. The most basic tweet is just text from a Twitter user. In addition, a tweet can have any of the following:

  1. link: usually a shortened URL (example: http://bit.ly/eOsrVQ)
  2. #hashtag: one or more topic tags that serve as searchable metadata
  3. @username: a reply to or mention of another user
  4. RT (retweet): a flag that signifies when one user has republished another user’s tweet
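
The four elements in this list can be pulled out of a tweet with a few regular expressions. The Python sketch below is only an approximation: the patterns and the parse_tweet name are assumptions made for illustration, not Twitter’s own parsing rules, and the sample tweet is made up.

```python
import re

# Simplified patterns (assumptions, not Twitter's official grammar).
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")
RETWEET_RE = re.compile(r"^RT\s+@(\w+)")


def parse_tweet(text):
    """Split a tweet into links, hashtags, mentions, and a retweet flag."""
    links = URL_RE.findall(text)
    # Blank out links first so a "#fragment" inside a URL is not read as a hashtag.
    stripped = URL_RE.sub(" ", text)
    retweet = RETWEET_RE.match(stripped)
    return {
        "links": links,
        "hashtags": HASHTAG_RE.findall(stripped),
        "mentions": MENTION_RE.findall(stripped),
        "retweet_of": retweet.group(1) if retweet else None,
    }


# Made-up sample tweet, reusing the example link from the list above.
sample = "RT @longreads: Lies, Damned Lies, and Mime Types http://bit.ly/eOsrVQ #epub #publishing"
parsed = parse_tweet(sample)
```

One known limitation of this sketch: the link pattern takes everything up to the next whitespace, so punctuation typed directly after a URL ends up inside the link.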

In addition, your Twitter feed contains not only the people you follow, but also the extended conversation taking place between people in your Twitter network and their network. Each “@” mention is a clickable link that takes you to a user’s Twitter page. Thus, it becomes another point of interest as you browse for interesting content. Yes, it seems overwhelming and yet it happens to be the best way of getting the latest and most interesting content. In the end, the Twitter messages that contain links are often the ones that are the most interesting, and the Twitter users who share the most interesting content are usually the ones worth following.

Curated Content

This brings us to curated content, which is content that is shared and republished by tastemakers and thought leaders within different areas of interest. In contrast, content that is found through organic search is not hand-picked and the quality of search results can vary greatly.

Flipboard is an iPad app that presents streams of curated content that the reader chooses, across a number of topics. Most notably, Twitter integration in Flipboard presents the links shared by people in your Twitter network in a pleasing user interface that resembles a magazine. Hence, Flipboard and other reading apps like it are an important part of our social content graph. Of course, the content you consume in Flipboard is easy to share with others through the usual channels (Twitter, Facebook, e-mail, etc).

Social Reading

When you have reading devices and apps that have social networking “baked in”, you have the beginnings of a social reading experience. Curated content that is recommended to you through your social network is an entry point to social reading. Social reading is also found at a deeper level, where readers can share bookmarks, comments, and quotes from the content they are reading within their social network or with everyone. Such social reading features are found in the Amazon Kindle reader and may have originated there. We are starting to see this kind of social reading and sharing in education reading apps like Inkling.

Mapping the Social Content Graph

I’m going to admit that my concept of the Social Content Graph is still half-baked, and I think that’s ok for now. The point I am trying to make is that the social content graph is a complex beast, since it is a chain of people and content links. The reason why Flipboard is such an excellent reading experience is that it understands that this is a complex beast and it tries to organize it for you in a way that makes it pleasing to browse and consume.

And yet, Flipboard is just a reading experience and does not help you understand and organize your social content graph. You still need to bookmark or republish the content you like if you want to be able to find and re-read content in the future. That feels a little weird to me… sharing by Twitter just because I don’t have a convenient way of mapping and saving the parts that I want to keep.

In my mind, I have this mental image of a social content graph somehow looking like that clever visualization that you see in a LinkedIn profile that shows how you are related to another person. It nicely illustrates the “degrees of separation” between you and others on LinkedIn. And the ideal visualization of my social content graph would be something like that. It would show me a 2D/3D spatial view of the people I follow and the content I like, and it would let me pivot the view along the people axis and the content axis. Someday, it would be nice to explore the reverse angle and see the people who follow me or like the content I have shared or created. Yeah, whenever that happens.

Page Layout Hacks in HTML5 – Part 2

Let’s keep going on this topic. The first post laid out some of the challenges that were faced in using HTML5 as the display format for an iPad app. Essentially, that also means it was used as the source format where the text copy was mixed in with the code. That has to be some kind of programming anti-pattern where the view logic is mixed with the business logic. In this case, the business logic is defined by the user experience requirements, which call for horizontal page flipping with multi-column text layout and fully-justified text.

As we continue this exploration, it becomes more clear that the external development team did not understand the modern CSS technologies that are now available for handling these kinds of display requirements. All things considered, this area of HTML and CSS is somewhat obscure… really an “edge-case” for most of the web development world. Almost all prior web design and development has relied on arbitrary pagination of vertically scrollable pages instead of pagination based on a fixed page/column size. Without the devices that would impose specific pagination constraints, these programming techniques were mostly unknown and virtually irrelevant.

While we continue to research the so-called “right way” of structuring HTML5 content to support text reflow between columns and pages, we should still look at the hacks that were necessary. Hacks seem to be a constant in any programming endeavor.

Aesthetics of Magazine-Style Publishing

In newspaper and magazine publishing, multi-column article layout is the norm and full justification of text is pretty standard. The narrower columns are meant to be easier to read, since the readers’ eyes have to do less work while scanning words from one margin to the other continuously. This also improves the readability of the content, since it becomes less likely for the reader to get lost in the middle of a long sentence.

The full justification of text forces each line of paragraph text to use the full width of the column (with the exception of the last line), and this gives the content a cleaner look. Without the full justification, the text is (usually) left-justified and the other margin has a “ragged” edge.

This improved readability creates additional challenges. Often, there are places in the text where long words close together get word-wrapped in an inconvenient place which results in excessive spacing between words. With the narrower columns of multi-column layout, this problem is compounded.

In addition, the usual concerns of widows and orphans need to be addressed. When paragraphs are split across pages and columns, there are often problems where a single line from the beginning or end of a paragraph is stranded by itself at the top or bottom of a page/column.

Web-Based Reading

On web pages, article content is usually presented in a single column and readers are expected to scroll down the page to continue reading. In addition, the paragraph width for an article is usually narrow, which is meant to improve the readability. This has the side effect of making the article longer in terms of vertical space. A long article on a website might have pagination links, but there is no obligation for the page content to fit into a single viewable page area. Hence, vertical scrolling is usually a requirement in web-based reading.

E-Book Reading

For digital book publishing, page layout is less of a concern since paragraph text is meant to reflow smoothly regardless of screen size. Yet, e-book readers like the Kindle and iBooks are designed to simulate the book reading experience by providing pagination of content in a non-scrollable page. The horizontal page-flipping experience is also part of most e-book readers which is important to book-lovers and purists.

Tablet-Based Reading

The coming revolution in digital publishing is being driven by tablet-based devices like the iPad. For publishers who want to provide a magazine-style look and feel, it may be necessary to support a multi-column layout and often with full-justification of text. In the remainder of this article, I will share some of the tricks and hacks that might come in handy if you are forced down this perilous path.

Code Hacks

Force Justifying the Last Line

With fully-justified text, the last line at the bottom of a column or page is the one you have to worry about. In order to force-justify a line, you need to employ a trick that makes the paragraph think there is more text that continues on the next line. In the example below, we inserted a <span> tag with the style visibility:hidden and within the tag we have a long series of characters. Together, this simulates a long word that gets word-wrapped to the next line, yet is invisible.

<p>One morning, as Gregor Samsa was waking up from anxious dreams, he discovered that in his bed he had been changed into a monstrous verminous bug. He lay on his armour-hard back and saw, as he lifted his head up a little, <span style="visibility: hidden">aaaaaaaaaaaaaaa</span></p>

Tweaking Word Spacing

Sometimes, paragraph text will split across pages or columns in an unattractive way, for example by leaving an orphan line that continues in the next page or column. That will happen often. One way to handle this is to increase or decrease the number of pixels between each word by setting the word-spacing CSS style. This will often help you get the text reflow you want.

<p style="word-spacing:2px;">Gregor’s glance then turned to the window. The dreary weather&mdash;the rain drops were falling audibly down on the metal window ledge&mdash;made him quite melancholy.</p>

Note: the word-spacing style only accepts whole numbers when rendered by Safari/WebKit.

Squeezing Words Together

In some cases, a line is word-wrapped even though it looks like there is almost enough space to fit the next word on the same line. One trick you can apply is to use the HTML thin space entity, which is thinner than a standard space. I was surprised to find out about it and it’s very useful in this kind of situation.

for the contact felt like a cold shower all over&thinsp;him.

Keeping Words Together

On occasion, you may prefer to keep two words together, especially when a paragraph is split between pages. This can be achieved with the HTML non-breaking space (&nbsp;). However, the &nbsp; and &thinsp; entities both have a fixed width and will not stretch out in fully-justified paragraphs, which can make the word spacing look odd.

Superscript and Subscript Handling

Perfect and precise line height is important from a visual perspective and to ensure that paragraph height is consistent. When you have a two-column layout, it is important that the lines of text in each column are lined up precisely.

So, it is surprising to find out that the CSS line-height style can easily get broken by superscript or subscript text in a paragraph. If you have footnote markers, this will be a common problem. The fix for this is to modify the CSS for the <sup> or <sub> tags. Here’s an example:

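sup {
  /* keep the marker on the text baseline so it does not stretch the line */
  vertical-align: baseline;
  /* then raise it visually without affecting the line height */
  position: relative;
  bottom: 8px;
}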

Display Formats in Digital Publishing

This is an offshoot of the series “HTML5 in Digital Publishing”. In the first article, we tried to explain the significance of HTML5 and how it has become an important part of digital publishing and mobile devices. Today, we continue this analysis by addressing the rationale for choosing HTML5 versus other display formats.

In this article, we will focus on “display formats” in digital publishing. A display format defines how the content is rendered for display and viewed by the user. Therefore, a display format is also related to the technologies in the development platform that is used to publish content to the device. For example, the Apple iOS software development kit (SDK) is a development platform that targets a set of devices (iPhone, iPad, etc). With the iOS SDK, you have a choice of rendering content through HTML web views or native app components.

HTML5 as Display Format
The choice of HTML5 as a display format is easy to justify. Most tablet devices have strong support for HTML5 content views and this makes HTML5 a good platform-agnostic strategy. In the rapidly-evolving world of devices, we are seeing consistent and wide-spread support of HTML5, particularly through the WebKit browser engine. When you test complex HTML and CSS across different WebKit browsers, you usually see great consistency.

Images as Display Format
Image-based content display is a reasonable choice for some publishers. This option is especially suitable for photography and art where full-page image galleries are the desired experience. And yet, in a tablet device with a touch-based interface, an image gallery or slideshow can feel very flat and boring. When creating an image-based experience, it is a good idea to look for opportunities to add interactive features such as image pan and zoom, text layers, and visual navigation.

Using images as a display format also opens up the possibility of eliminating complex page layout issues by using images exported from programs like Adobe Photoshop and InDesign. By authoring complex text and image layouts in graphics/design programs and publishing images instead of text, you can guarantee absolute page fidelity when comparing the comps to the end product. However, the idea of publishing books without text might seem unsavory to some. It seems odd to remove the text from a book or magazine and only display the screenshots.

As you can guess, replacing text content with images would remove the possibility of searching and selecting text content, which is one of the promising features of digital e-readers.

Native as Display Format
We use this abstract term “native” when referring to content that is implemented through the programming language and tools required by the development platform. Among the current development platforms that have native programming languages are: Apple iOS, Android, Adobe Flash/AIR, and Windows Phone 7. “Native” also has the connotation of being expensive and proprietary, which is usually true. Native application programming requires specialized programmers and is often more time-consuming.

To make native app content more plausible as a display format in digital publishing, there are a few approaches you can take:

  1. Use a template-based system for loading data fields into content template(s):

    Native code will perform the task of reading data stored in a database or as structured data like XML and then injecting the data into a template. With native apps, a template is often a big chunk of code that places data on the screen and defines the layout, formatting, and effects to apply when rendering the content.

    The drawback here is the same as any template-based publishing system… the content templates can feel too restrictive and the content look-and-feel may look stagnant and boring. The art directors will never go along with it.

  2. Mix-and-match with HTML and images:

    Each development platform has support for web view and image view components and it is possible to create a native app solution that uses both to enrich the experience. Since most HTML5 and image-based solutions still require a native app as the container around the web view components, a mixture of native and other display formats will usually be present at some level.
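
To make the template-based approach from the first item concrete, here is a small sketch of the read-then-inject step. Python stands in for the native code, and the XML field names, the sample data, and the PAGE_TEMPLATE and render_article names are all hypothetical; a real native template would also carry layout and effects, which are left out here.

```python
import html
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical structured content, as it might be stored for one article.
ARTICLE_XML = """<article>
  <title>Lies, Damned Lies, and Mime Types</title>
  <byline>Experimental Lex</byline>
  <body>Extensions &amp; headers can disagree.</body>
</article>"""

# Hypothetical content template with one placeholder per data field.
PAGE_TEMPLATE = Template(
    '<h1>$title</h1>\n<p class="byline">$byline</p>\n<p>$body</p>'
)


def render_article(xml_text):
    """Read the data fields from XML and inject them into the template."""
    root = ET.fromstring(xml_text)
    # Escape each value so the data cannot break the surrounding markup.
    fields = {child.tag: html.escape(child.text or "") for child in root}
    return PAGE_TEMPLATE.substitute(fields)


page = render_article(ARTICLE_XML)
```

The drawback mentioned in the first item shows up even in this toy version: every article that passes through PAGE_TEMPLATE comes out with the same shape.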

EPUB and Kindle
Perhaps it’s unfair to lump these two together since they are competing e-book formats. However, as display formats they are similar enough to group together. These e-book formats both use HTML as the core document storage format and both have a standard packaging structure that defines the organization of files within a file structure.
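
For EPUB, the packaging structure is a zip archive with a few fixed rules: a file named mimetype comes first and is stored uncompressed, and META-INF/container.xml points to the package file. The Python sketch below builds a bare shell and inspects it. The OEBPS paths and the function names are assumptions for illustration, the shell has no package file so it is not a valid EPUB, and the Kindle format is packaged differently and is not covered.

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def build_epub_shell():
    """Build an in-memory zip that follows the EPUB container layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry must be first and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        # Hypothetical chapter file; a real book also needs OEBPS/content.opf.
        zf.writestr("OEBPS/chapter1.xhtml", "<html><body><p>Hi</p></body></html>",
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def inspect_epub(data):
    """Report whether the archive follows the basic container rules."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        entries = zf.infolist()
        names = zf.namelist()
        first = entries[0] if entries else None
        return {
            "mimetype_is_first": first is not None and first.filename == "mimetype",
            "mimetype_is_stored": first is not None
            and first.compress_type == zipfile.ZIP_STORED,
            "mimetype_value": zf.read("mimetype").decode("ascii")
            if "mimetype" in names else None,
            "has_container_xml": "META-INF/container.xml" in names,
        }


report = inspect_epub(build_epub_shell())
```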

E-books also speak to a narrower audience within the digital publishing universe. Most e-books will only contain chapters and paragraphs of text, presented in a format similar to the paper-based books that they may eventually replace.

EPUB and Kindle formats are both interesting beasts and we can learn much from how they are constructed. We will analyze them both later in a separate set of articles.

HTML5 in Digital Publishing: Part 1

This is the first in a series of posts on the use of HTML5 as a content format in digital publishing. This will be an informal journal with no real plan as to the number of posts or the topics that will be covered beyond the current post. In this first post, we will provide an intro to HTML5 and why it is relevant to digital publishing.

Explain HTML5
We should start by explaining what HTML5 is. I am sure it is not adequate to say that HTML5 is just a newer version of HTML. In general, I assume the audience here is kind of technical, but not necessarily involved in web development. So, I will start by explaining the big picture. Bear with me. This exploration is not intended to be a boring roundup of technology history. There’s a story with real meaning here.

Since the beginning of the Internet, the primary way of interacting with the Web has been through a web browser. The content that makes up a web page is assembled in a text structure called HTML and delivered to a web browser. HTML is a hierarchical text structure that resembles XML, which means that it has named elements (or “tags”) with metadata attributes that define specific page layout and formatting details. The HTML text that is rendered by a web browser will often have references to images and other media, and the browser will also fetch and display that content.
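
To see that hierarchy of tags, attributes, and media references for yourself, you can walk a small page with Python’s built-in HTML parser. This is only a sketch: the sample page, the OutlineParser class, and the outline_html helper are made up for illustration, and a real browser does far more than this.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they do not add nesting depth.
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}

# Made-up sample page.
PAGE = (
    '<html><body><h1 class="title">Hello</h1>'
    '<p>Text with an image: <img src="icon.png" alt="icon"></p>'
    "</body></html>"
)


class OutlineParser(HTMLParser):
    """Record each tag with its nesting depth and attributes."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # (depth, tag, attributes) in document order
        self.media = []    # image references a browser would go and fetch

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        self.outline.append((self.depth, tag, attributes))
        if tag == "img" and "src" in attributes:
            self.media.append(attributes["src"])
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1


def outline_html(text):
    """Parse HTML text and return (outline, media references)."""
    parser = OutlineParser()
    parser.feed(text)
    parser.close()
    return parser.outline, parser.media


outline, media = outline_html(PAGE)
for depth, tag, attributes in outline:
    print("  " * depth + tag, attributes)
```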

Altogether, that complex mass of tags and metadata is received by a web browser and translated for display on a computer screen for a person to view and interact with. In general, when we refer to “HTML”, we usually mean HTML4 and prior versions. With each new version of HTML, there are new features that are defined through new tags and attributes (usually with corresponding updates to CSS and Javascript). To support the new features, new web browsers are released and updated. This takes us back to “HTML5 is just a newer version of HTML”.

Just kidding. It’s much more than that.

HTML5 is Really About Mobile
With HTML5, we have a new and evolving world of Internet-connected devices that includes computers, televisions, and mobile devices. With mobile devices, especially smartphones and tablet devices, there is a driving need for alternate ways of viewing web content, due to the different content consumption habits of people when they are away from their computers and laptops. One major factor is the need for mobile devices to be able to display content for users who are not currently connected to the Internet or when mobile networks are too slow.

With the iPhone and the iPad, Apple redefined mobile content consumption by creating an app-centric universe of mobile apps. Instead of depending on the web browser and an Internet connection for content, apps are capable of delivering content and entertainment when the user is away from work/home or simply relaxing. With current and future generations of mobile devices, the web browser is no longer the primary means of interacting with the Internet.

Meanwhile, the definition of a web browser has changed, or maybe lost its original meaning as a program that can display websites. Custom apps are also capable of displaying web content, either from remote websites or from content stored locally. In mobile applications development, there is the notion of a “web view” component, which is like an embedded web browser that can display HTML content without looking like a web browser (with windows and tabs and menus, etc). The end-user may see it as richly-formatted content, while the source content may in fact be HTML.

Summary: Why HTML5 is Relevant
To bring this long-winded story home, I will summarize what this all means:

  • The browser is now embedded and invisible: The “webview” component in mobile apps is an HTML5-capable browser engine, but it doesn’t look like a browser. Very often, it is the WebKit rendering engine underneath, and that’s a good thing. This means you can expect consistency in the display of HTML5 content.
  • The Web is now local: Webview components are often used to display content that is stored locally on the device (and often deployed in the downloadable app). As users and devices become more mobile, the Web will be there with or without an Internet connection.
  • HTML is still a good publishing format: EBook readers like the Apple iBooks app use the WebKit browser engine to read HTML files included inside an EPUB file. On top of that, they add an interactive Table of Contents, bookmarks, and thumbnail navigation to make the book experience more exciting. You can do the same and create your own custom reader to deliver the experience you want.

Bottom line: HTML is no longer limited to the traditional web browser-based experience. And yet, it still supports the traditional browser-based content model.

HTML5 Features
HTML5, as a language that defines a number of features, was developed as the Internet evolved towards mobile computing. Without going into the details of each feature, the overall enhancements in HTML5 can be described as follows:

  1. Portable: The portability of mobile devices also requires a web content model that is capable of operating without an Internet connection. To support this need, HTML5 provides additional features like database storage to allow HTML5 content to store and query data in a local database instead of a remote website.
  2. Media-Capable: Online video and audio in desktop web browsers almost always depend on the Adobe Flash plug-in. With mobile devices, Flash does not have the same pervasiveness because of performance constraints on those devices and licensing issues. One of the key goals of HTML5 is to provide built-in media players for video and audio content.
  3. Canvas Animation: Again, without the Flash plugin, there is a need to provide advanced animation capabilities. The HTML5 Canvas, with lots of help from Javascript, aims to provide this.
  4. Location-Aware: To provide location-based experiences in web content, HTML5 provides support for geolocation data for the current user location (if the user gives permission to share their geolocation info).

NEXT: Choosing a Content Format for Digital Publishing
So far, we have only started to explain the role of HTML5 in our evolving world of Internet devices. Next time, we will need to address the rationale for choosing HTML5 and what the other options are. When you consider the alternatives, you might decide that HTML5 is the best approach. Let the smackdown begin.

Anthologize WordPress-to-EPUB Publishing

I recently wrote about my interest in WordPress-Based Publishing Tools and I am continuing that thread with a test-drive of the Anthologize WordPress plugin. The story behind Anthologize is interesting. It is an open source plugin for WordPress that originated from an innovation project called One Week | One Tool hosted by the Center for History and New Media at George Mason University. I totally love the tagline “Digital Humanities Barn Raising”. (As an aside, I am curious whether digital humanities will be a mainstream college degree in the near future).

Anthologize will appeal to a specific audience in the realm of digital publishing: WordPress-based publishers interested in publishing to the EPUB format. The EPUB format is an e-book format that has been adopted by most e-book readers (pretty much every device except the Amazon Kindle). I should also mention that Anthologize can export to other formats besides EPUB, such as PDF. It is very possible that Anthologize will support other digital formats and workflows in the future.

There are a number of programs that let you create and assemble book content and export to the EPUB format. However, these programs and tools often require advanced technical skills, which makes them unattractive to content creators, who really want the kind of simplicity provided by blog platforms. Hence, the idea of integrating EPUB authoring tools into WordPress is an attractive one. Note, however, that custom plugins like this can only be installed on independent blog servers running the WordPress software and not on hosted blog sites on WordPress.com.

Getting Started
While I have nearly 20 years of hard technology experience and can create and deploy massive Internet sites across a dozen servers, I still shy away from managing my own blog server. This blog, Experimental Lex, is hosted on WordPress.com and I have other blogs on Blogger and Tumblr. I, too, like simplicity. I want my writing persona to never think about server technologies or hackers or whatever.

So it’s a little amusing that I have to set up my own WordPress server to try out Anthologize. That’s okay. I’m sure it won’t be the only WordPress-based digital publishing solution that I will need to explore. (Note to self: must try out CoverPad). If you have ever tried or witnessed a WordPress install, you will know that it’s a piece of cake. A few minutes (maybe 5) and you should be up and running.

Installing Anthologize is quite easy as well. Look for the “Plugins” menu in the left navigation and click on “Add New”. You can then search for “Anthologize” by name and WordPress will download and install it.

It is worth noting that the latest release of Anthologize is still at version 0.5-alpha, which means it is not quite complete.

Test Drive
After the plugin is installed, you will have an “Anthologize” menu in the left nav. Click on “My Projects” and you will see the empty “My Projects” page.

At this point, it is probably worthwhile to discuss the concepts and terminology used in Anthologize. A “project” represents a collection of content that will be assembled and exported as an EPUB or other digital reader format. You start by creating a project with a name and then start adding “parts” to it. In e-book terms, the parts will be chapters and other sections that make up a book.

In the screenshot, you can see that I created a project called “Digital Future” and added parts for “Introduction” and “Wordpress-Driven Publishing”. Since this is a brand-spanking new WordPress blog, the only blog post I had was the “Hello World” post. I can drag-and-drop “posts” from the “Items” section of the screen into the Parts organizer.

Anthologize copies each post that you drag over and creates a “Library Item”. Several items can be added to a part and you can re-order or edit the content of the item within the Anthologize editor. Since it is a copy of the original post, the edits you make will not affect the real blog post.
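To make the project / part / item relationship concrete, here is a minimal sketch of it in Python. The class and method names are my own invention for illustration and are not taken from the Anthologize code. It only shows the idea that an item is a copy of a post, so editing the item leaves the post alone.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Post:
    """A blog post as it lives in WordPress (hypothetical shape)."""
    title: str
    body: str


@dataclass
class LibraryItem:
    """An independent copy of a post, owned by a part."""
    title: str
    body: str


@dataclass
class Part:
    """A chapter or section holding an ordered list of items."""
    name: str
    items: List[LibraryItem] = field(default_factory=list)

    def add_post(self, post: Post) -> LibraryItem:
        # Copy the post's fields instead of keeping a reference to the post.
        item = LibraryItem(title=post.title, body=post.body)
        self.items.append(item)
        return item


@dataclass
class Project:
    """A collection of parts that will be exported together."""
    name: str
    parts: List[Part] = field(default_factory=list)


hello = Post("Hello World", "Welcome to WordPress.")
project = Project("Digital Future")
intro = Part("Introduction")
project.parts.append(intro)

item = intro.add_post(hello)
item.body = "Welcome to my e-book."  # edit the copy inside the project

print(hello.body)  # the original post still has its own text
```

Keeping the copy separate is what lets the book version drift away from the blog version without any surprises on the live site.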

Importing Content
Up until this point, I had not considered the “Import Content” menu in the left navigation. I assumed it might be some kind of upload tool that converts Microsoft Word docs, etc. Not so! It actually lets you pull content from an RSS feed into Anthologize.

I tried it out with the RSS feed from Paidcontent.org and within moments I had a rich collection of content to use in my sample project.

Ok, maybe that’s cool if you are one of those content pirates who rip off other people’s work. (just kidding) It wasn’t until some time later that I realized that you could pull your own RSS feed content. That means you could download blog posts from your hosted blog site using the RSS feed, assuming that the feed contains the full posts. And if you are running Anthologize on your own computer like I am, you can use this setup for assembling and creating e-books.
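The "full posts" caveat is easy to check before you import anything. Here is a rough sketch in Python that reads an RSS 2.0 feed and reports, for each item, whether it carries a full body or only a summary. It assumes the feed puts the full text in a content:encoded element, which is common for WordPress feeds but not guaranteed. The sample feed and the function name are made up for the example, and this is not how Anthologize itself does the import.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module that defines <content:encoded>.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"

# A made-up feed with one full post and one excerpt-only post.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Full post</title>
      <description>Short summary.</description>
      <content:encoded><![CDATA[<p>The whole article body.</p>]]></content:encoded>
    </item>
    <item>
      <title>Excerpt only</title>
      <description>Just a teaser...</description>
    </item>
  </channel>
</rss>
"""


def read_items(feed_xml):
    """Return a (title, body, is_full) tuple for each item in the feed."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        full = item.findtext(CONTENT_NS + "encoded")
        summary = item.findtext("description", default="")
        # Prefer the full body; fall back to the summary when it is absent.
        body = full if full is not None else summary
        results.append((title, body, full is not None))
    return results


for title, body, is_full in read_items(SAMPLE_FEED):
    print(title, "->", "full post" if is_full else "summary only")
```

If most items come back as summary only, the feed is probably set to publish excerpts, and the imported content will not be much use for a book.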

Now that’s a pretty big deal, since this truly becomes a more powerful digital publishing workflow. This system was designed with the understanding that the blog publishing workflow is different from publishing an e-book. Yet, it lets you re-use and re-integrate your content in a fairly smooth way.

Exporting Content
The Export Content feature is pretty barebones at this stage. As you can see in the screenshots below, you have some very basic controls over the metadata and formatting of the exported EPUB file. Still, it is very gratifying to see that it successfully produces a valid EPUB file that looks pretty decent in terms of formatting.
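If you want a quick sanity check on an exported file before running a full validator, a few lines of Python will do. An EPUB is a ZIP archive whose first entry must be an uncompressed file named mimetype containing application/epub+zip, and it must include META-INF/container.xml. The sketch below checks only those container rules and is no substitute for epubcheck. The function names and the in-memory sample are my own.

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def check_epub_container(data):
    """Return a list of problems found in the EPUB's ZIP container."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            problems.append("first entry is not 'mimetype'")
        elif infos[0].compress_type != zipfile.ZIP_STORED:
            problems.append("'mimetype' entry is compressed")
        elif zf.read("mimetype") != EPUB_MIMETYPE:
            problems.append("unexpected mimetype contents")
        if "META-INF/container.xml" not in zf.namelist():
            problems.append("missing META-INF/container.xml")
    return problems


def build_sample(mimetype_first=True):
    """Build a tiny stand-in archive in memory (not a complete EPUB)."""
    entries = [
        ("mimetype", EPUB_MIMETYPE, zipfile.ZIP_STORED),
        ("META-INF/container.xml", b"<container/>", zipfile.ZIP_DEFLATED),
    ]
    if not mimetype_first:
        entries.reverse()
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, payload, method in entries:
            zf.writestr(name, payload, compress_type=method)
    return buf.getvalue()


print(check_epub_container(build_sample()))
print(check_epub_container(build_sample(mimetype_first=False)))
```

To try it on a real export, read the file as bytes and pass them in, for example check_epub_container(open("book.epub", "rb").read()), where book.epub stands in for whatever your export is called.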



Inkling and the Reinvention of Education

We live in exciting times. The next decade or so will see the gradual exit of paper-based books and magazines as printed content moves from the physical world to digital. The digital trendsetters, who already read books and other content on tablet devices like the iPad and Kindle and countless other devices to come, have already embraced this future. The upcoming battle between Apple, Amazon, and Google on this playing field is mainly focused on this audience. However, I continue to wonder if this playing field will eventually encompass education.

The education market opportunity for digital publishing is almost an accepted fact. We probably agree that it’s going to be huge. Yet, we may not agree on what products and solutions will succeed in this market. The reading experience for novels and magazines on tablet devices is a more linear experience than the kind of interactive reading and problem-solving that students must do as part of their homework and curriculum.

And so it is always exciting to hear about new companies and products in the e-learning marketplace that are pioneering the way. Inkling is one such company that is offering an interactive reading/learning software platform for the iPad. When I heard about Inkling, I had to give it a try.

As an iPad app, installation was dead simple. It’s also a free download, so there were no excuses at all. Upon install and after you complete the registration form, you find yourself looking at the one book that is pre-installed. An essential classic — The Elements of Style. Except that it’s not really the Strunk and White edition. It’s the Inkling Edition “based on the work of William Strunk, Jr.”.

You can either feel horrified or amused at this point. They are messing with the classics and adapting them for the digital future. If you want the original Strunk and White, you probably want it to look like the original, front to back. And this is not the old-timer’s original edition. In the first 2 to 3 pages, you will see commentary like “ftw” and “wtf” as part of the dialogue.

However, if you accept that the digital textbooks of the future need to be updated and adapted for the next generation of students, then you will likely enjoy the experience. I’m in this camp and I found it to be intriguing and inspiring… with some reservations.

Test Drive
Here are a few screenshots to help you visualize the Inkling experience. If you have an iPad, just download it and try it yourself. In the opening screen after signing up / signing in, you see the one book. In Part 2 of this article, I will discuss the Inkling business model and speculate about the digital publishing workflow behind this system. It can’t possibly be easy.


Table of Contents

When you open the book, you immediately can see that this is not a traditional table of contents. The TOC is one of the primary areas for enhancement in any digital publication and this one wants you to know that the linear TOC you knew as a child is a piece of history.


Cover Page

On the cover page, you see a nice splashy title and image along with a toolbar on the left and a call-to-action graphic near the bottom to instruct you on scrolling the content using a swipe gesture. Note that you will need to do a heavy swipe from the bottom of the screen to the top.


Navigation Menu

The menu button in the toolbar shows you several actions available to you. The “Highlights” and “Notes” menu options are particularly interesting. According to the Inkling website, these can contain shared notes provided by other students reading the same pages in the same book.


Search Tool

And lastly, here’s a screenshot of the Search Tool, also found in the toolbar. I found it strange that there were no matches for the word “Footnote”. I felt sure that Strunk and White addressed this topic. We will have to do a little fact-checking.


WordPress-Based Publishing Tools

As the digital publishing future unfolds, there are several unknowns we often think about. We like to speculate about which devices will win or lose, which e-book format will dominate, etc. Essentially, these are questions about how users will consume written content. From the publisher’s perspective, there are also questions about what tools are best to use. Often this is a complicated decision based on the reading platforms that the publisher wants to target.   

Blog-based publishers are an important market segment that is starting to adapt to the changing landscape of digital publishing. In the last decade, they had it easy in the sense that they only had to target one device (personal computers) and essentially one format (html)*. Now and in the rapidly-approaching future, blog-based publishers will need to evaluate how their blog content looks in a tablet-based browser and whether they need their own custom reader app.

In general, blog-based publishers will prefer solutions that let them preserve their existing publishing tools and workflow, so they can continue to concentrate on writing great content. Since a large percentage of blog publishers use WordPress, it makes sense for them to evaluate solutions based on WordPress plugins. For example, Akismet is a WordPress plugin that helps control and moderate comment spam and is provided in the standard WordPress installation.

CoverPad / PadPressed

A few days ago, I mentioned the CoverPad app from PadPressed, which offers a set of publishing tools and apps that help publishers target the iPad. PadPressed makes this possible through their custom WordPress plugin which they sell as a commercial product. This highlights an interesting digital publishing workflow for blog-based publishers who want to reach the growing market of tablet-based readers.

I visited the PadPressed website to try it out, but found that I needed to buy the product first. The pricing seemed fair enough for publishers who are using WordPress. However, since I don’t have my own WordPress server, I was a little shy about paying just to test it. Eventually, I will get around to it.

It is worth noting that PadPressed has stated that they will be moving away from WordPress-only solutions and toward CMSs in general. It seems they have their eye on the larger market of digital publishing and not just content published via WordPress.


Another intriguing WordPress plugin called Anthologize shows great potential. This plugin helps you assemble and build content in the EPUB format, which is the book format used in the Apple iBooks app and several other reader apps available for Apple iOS and Android devices. It originated from the One Week | One Tool project hosted by the Center for History and New Media at George Mason University. It is now an open source project that is quickly evolving. Here’s a brief description pulled from the Anthologize website:

Anthologize is a free, open-source, plugin that transforms WordPress 3.0 into a platform for publishing electronic texts. Grab posts from your WordPress blog, import feeds from external sites, or create new content directly within Anthologize. Then outline, order, and edit your work, crafting it into a single volume for export in several formats, including—in this release—PDF, ePUB, TEI.

I don’t have my own WordPress server, but I suddenly have one good reason to set one up. By the way, you have to have your own WordPress installation (not necessarily your own server), since the plugin is not meant to be installed on a hosted WordPress service. Since I was just testing it out, I installed the whole WordPress stack on my computer along with the Anthologize plugin.

I will save the review for another post, but I can say that I was definitely impressed. As an early alpha version 0.5, it shows lots of potential and clearly shows that the Anthologize creator(s) have a good understanding of the digital publishing workflow possibilities.

The Future of QR Codes and Digital Content

I’ve been thinking about QR codes for a while now. Usually, I dismiss them as a silly technology looking for a purpose. If you don’t know what QR codes are, perhaps it’s because they are the kind of emerging technology that only geeks know of and care about. Rather than try to explain them myself, let’s defer to some better write-ups:


From the Wikipedia page:

A QR Code is a matrix barcode (or two-dimensional code), readable by QR scanners, mobile phones with a camera, and smartphones. The code consists of black modules arranged in a square pattern on white background. The information encoded can be text, URL or other data.

And an excellent article on QR codes at SXSW:

While QR codes have reached a mainstream Japanese audience, in the U.S. QR code usage is limited to alpha geeks–and not all of them are sold on the idea. … Many think QR codes are gimmicky, clumsy, not used well or enough, or that they’re “a solution looking for a problem.”

“QR” supposedly stands for “quick response” but it’s possible that this definition was applied later, just like RSS was rebranded as “Really Simple Syndication”. Even this friendlier definition of RSS was still too geeky and obscure for normal people and eventually “feed” became the mainstream term used to refer to RSS content and other subscribed content. Yes, I’m referring to Twitter and Facebook status updates, since these are also linear streams of content presented in reverse chronological order.

Anyway, the purpose of this article is not to rant about QR codes. The real purpose is to discover the future of this technology, which may be something besides QR codes. This week, I came across this Engadget article about Microsoft Tags and it captured my attention. Here’s an excerpt:

Microsoft might be late to the cameraphone-able barcode game, but it appears to be making up for lost time. Its multi-colored (and, frankly, rather attractive) Tag barcodes added a few important innovations on top of the general QR code concept, and apparently to good effect: 2 billion Tags have been printed since the January 2009 launch, and 1 billion of those Tags were printed in the past four months. Sounds like Microsoft has found some momentum, and they claim to have gained a lead in the publishing industry already.

Here’s what I find interesting about this:

  1. Microsoft has a history of creating products based on pre-existing technologies and giving them simple and obvious names. The best example is Microsoft Windows. The name alone explains what it is and most people like things to be simple and obvious, especially with computers. At the time, the alternative was a Macintosh, and the name did not explain what it was. Instead, it was often perceived as “expensive”, among other things.
  2. “Tags” is another example of this practice. Most people can recognize “tag” and guess that it is like a label that helps identify something.   
  3. In a weird way, the colorful Microsoft Tags have a vague similarity to the Windows logo, while a QR code seems to evoke old technologies that lacked color like Xerox copies and UPC bar codes.
  4. They already have excellent uptake with 2 billion tags. I suppose that translates to 2 billion products or ads or physical items that have a unique tag barcode.

If you visit the Tags website, the story gets more intriguing. Microsoft even did a nice job with the tagline: “Connecting Real Life and the Digital World using Mobile Barcodes!” Wow, since when did Microsoft figure out how to market things? (Which, by the way, is different from choosing simple product names.)

Microsoft’s approach with Tags seems to be spot-on. They are touting how Tags provides benefits that businesses really care about. Specifically, metrics and reporting are important to large companies whenever they make an investment in media or advertising. Omniture is a great example of this. Companies want to know the ROI of their online properties and marketing campaigns and are willing to pay millions to get the pretty executive reports.

And that’s when I had my “Aha” moment. Tags (or some future version of it) will be the Omniture of the future. The measurement of attention will move beyond websites to the mobile world, which is a mixture of mobile device users, mobile Internet, and the interaction with all content wherever it happens to be. That includes publishing, outdoor ads, local businesses, television, Internet video, coupons, etc. All content and all attention will need to be measurable.

Well, we’re not quite there yet. I feel that the basic problem with Tags and similar barcodes is that they depend on user-initiated interaction with a displayed code. Currently, the interaction model requires a user to take a picture of the code using their mobile device camera. That’s kind of clunky. The call-to-action is really not there unless you specifically draw attention to it and offer an incentive.

What really needs to happen is for mobile devices to be auto-aware of nearby Tags, and hence Tags would need some kind of broadcast mechanism or wireless network to publish to. It seems likely that you would also need micro-circuits that uniquely identify the Tag. In simple terms, I think the Tags of the future will be more like stickers with embedded circuits and code that mobile devices can interact with.

Perhaps in a future article, we can discuss how Tags are being used in the industry and how they might be used in digital publishing.