Japan: Newsgraphy and HatenarMaps

With an ever-increasing amount of news making its way onto the Internet every day, the question of how to parse and interpret this information is becoming more and more critical. Services such as Newsmap tackle this problem by visualizing data from news flows in a 2-dimensional space in an attempt to reveal patterns in news reporting.

Japanese blogger and engineer id:kaiseh [ja] has taken a somewhat different approach with their visualization tool HatenarMaps [ja], which made a splash [ja] in June of this year, and more recently with Newsgraphy, launched on September 25th. Both tools use Voronoi Treemaps to display data in a 2-dimensional space: HatenarMaps takes its information from Japan's popular blogging platform Hatena Diary [ja], while Newsgraphy draws on news feeds from Yahoo! News [ja]. id:kaiseh has also developed TopHatenar [ja], a popular [ja] visualization service for tracking the top bloggers on Hatena Diary by number of bookmarks and RSS readership.

Screenshot from HatenarMaps.

In a post from June 9th, id:kaiseh explains how HatenarMaps works:


To put it in simple terms, this service is an experiment in which regions [of a 2-dimensional map] are allocated to users of Hatena Diary according to the number of times they have been bookmarked, with regions of similar users placed next to each other.

地図の全体を見渡すことで、はてダの大まかなトレンドを掴むこともできるし、スケールを拡大していけば個別記事に到達することもできます。さらに、 Google Mapsで検索するような感覚ではてなidやキーワードを入力して地図を探索したり、「去年と今年で勢力図がどう変わったか」を調べることもできます。

Surveying the entire map enables [the user] to grasp broad trends in Hatena Diary, whereas by zooming in one arrives at individual blog entries. It's also possible to search the map by entering a Hatena id or keyword, similar to the way search works in Google Maps, and to compare differences in relationships of influence from year to year.

To get a good idea of how HatenarMaps works, have a look at the fantastic video demonstrating the visualization algorithm. The video cannot be embedded directly; to watch it, please see the pages at Hatena (no registration) or NicoNico Douga (requires registration, English instructions for which can be found here).

Video from NicoNico Douga demonstrating the visualization algorithm used in HatenarMaps.

HatenarMaps is developed in Java, and uses the following method to reduce the set of all Hatena users to a manageable size:


First, the top 1000 users of Hatena Diary are selected according to the following rules.

1. 被ブクマ数総計の多いユーザから順に抽出。
2. ただし、RSS購読者数が上位2000位以内にランクしていないユーザは除外。

1. Extract users with high bookmark totals in order [of number of bookmarks].
2. However, exclude users whose RSS readership numbers are not ranked in the top 2000.

(The blogger notes that the above steps use the database from TopHatenar.)
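The two selection rules can be sketched roughly as follows. This is a minimal illustration, not the author's Java code: the `User` record, its field names, and the `select_top_users` function are hypothetical, and the real service reads these figures from the TopHatenar database.

```python
from dataclasses import dataclass


@dataclass
class User:
    hatena_id: str
    bookmark_total: int   # total bookmarks received (assumed field name)
    rss_subscribers: int  # RSS readership (assumed field name)


def select_top_users(users, limit=1000, rss_rank_cutoff=2000):
    """Pick up to `limit` users by bookmark total, skipping low-RSS users."""
    # Rule 2: only users ranked within the top `rss_rank_cutoff`
    # by RSS readership are eligible.
    by_rss = sorted(users, key=lambda u: u.rss_subscribers, reverse=True)
    eligible = {u.hatena_id for u in by_rss[:rss_rank_cutoff]}

    # Rule 1: walk through users in descending order of bookmark total.
    by_bookmarks = sorted(users, key=lambda u: u.bookmark_total, reverse=True)
    return [u for u in by_bookmarks if u.hatena_id in eligible][:limit]
```

With the defaults, a heavily bookmarked user who falls outside the top 2000 by RSS readership is skipped, and the next most-bookmarked eligible user takes the slot.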

Next, vectors are calculated for each of these users:


Next, a term vector is created for each of these 1000 Hatena users by summing together the frequency of tags assigned to the [user's] top 5 high-bookmarked diary entries in Hatena Bookmarks. However, since this would [add up to] tens of thousands of tags in total, low-frequency tags are dropped, decreasing the dimensionality of the vector to 300.
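As a rough sketch of this step, assuming each diary entry is available as a `(bookmark_count, tags)` pair where `tags` lists every tag assigned by bookmarkers (repeats included). The function name and input layout are assumptions for illustration; the post does not show the actual implementation.

```python
from collections import Counter


def build_term_vectors(user_entries, top_entries=5, dimensions=300):
    """user_entries: {user id: [(bookmark_count, [tag, ...]), ...]}."""
    per_user = {}
    overall = Counter()
    for user, entries in user_entries.items():
        # Keep only the user's most-bookmarked entries.
        top = sorted(entries, key=lambda e: e[0], reverse=True)[:top_entries]
        counts = Counter()
        for _, tags in top:
            counts.update(tags)  # sum tag frequencies across those entries
        per_user[user] = counts
        overall.update(counts)

    # Drop low-frequency tags: keep the `dimensions` most frequent overall.
    vocabulary = [tag for tag, _ in overall.most_common(dimensions)]
    vectors = {
        user: [counts[tag] for tag in vocabulary]
        for user, counts in per_user.items()
    }
    return vocabulary, vectors
```

Every user ends up with a vector of the same length, one component per surviving tag, which is what makes the users comparable in the clustering step.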

And finally, groups are created through clustering:


Once the vector has been determined for each user, non-hierarchical clustering using the k-means technique is performed, dividing the 1000 users into 100 clusters. As a consequence, users who have strong similarities in their Hatena Bookmark tags (i.e. whose diaries have similar themes) are grouped together.
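For readers unfamiliar with k-means, here is a minimal pure-Python sketch of the standard algorithm (Lloyd's iteration). The function name, the fixed iteration count, and the random initialization are choices made for this example; the post does not say how HatenarMaps initializes or terminates its clustering.

```python
import random


def kmeans(vectors, k, iterations=20, seed=0):
    """Cluster equal-length numeric vectors into k groups."""
    rng = random.Random(seed)
    # Start from k distinct input vectors chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid
        # (squared Euclidean distance).
        for i, v in enumerate(vectors):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

In the setting described above, `vectors` would hold the 1000 term vectors and `k` would be 100.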


At this point, each group is given a name. The term vectors of all users belonging to the group are summed together, and the tag with the strongest representation is adopted as the name of the group.
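The naming rule amounts to summing the vectors in each cluster and picking the largest component. A small sketch, where `name_clusters` and its arguments are hypothetical names: `vocabulary` is the ordered tag list behind the vector components and `labels` gives each user's cluster.

```python
def name_clusters(vocabulary, vectors, labels):
    """Return {cluster label: tag with the largest summed weight}."""
    totals = {}
    for vector, label in zip(vectors, labels):
        summed = totals.setdefault(label, [0] * len(vocabulary))
        for i, value in enumerate(vector):
            summed[i] += value  # component-wise sum over the cluster
    return {
        label: vocabulary[max(range(len(vocabulary)), key=summed.__getitem__)]
        for label, summed in totals.items()
    }
```

If two tags tie for the largest sum, this sketch picks the one that comes first in `vocabulary`; the post does not say how ties are handled.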

For further details about how HatenarMaps works, please see the original post in Japanese.

Map of news by subject using Newsgraphy

Newsgraphy is a service with a similar concept to that of HatenarMaps. In a September 25th post, id:kaiseh explains:

6月に公開して大きな反響をいただいたHatenarMapsの可視化手法を、Yahoo!のトピックスAPIから取得したニュース記事に適用して、いろいろと機能強化を施したものがNewsgraphyです。Mashup Award 4thにも応募しています。

Newsgraphy applies the visualization technique from HatenarMaps, which drew a big response when it was made public in June, to news articles obtained from the Yahoo! Topics API, with various functional enhancements. I've also submitted it to Mashup Award 4th [see site in English].

追記(2008/9/26): 「HatenarMapsの可視化手法を適用」と書きましたが、これは二次元平面へのマッピング手法(Voronoi Treemap)のことで、クラスタリング手法は含んでいません。Newsgraphyは、Yahoo!で分類済みのニュースカテゴリ階層を使用しています。

Note (2008/9/26): When I wrote that [Newsgraphy] “applies the visualization technique from HatenarMaps”, I meant the technique for mapping to a two-dimensional surface (Voronoi Treemap), and not the clustering technique. Newsgraphy makes use of the news category hierarchy already classified by Yahoo!

The blogger then contrasts Newsgraphy with newsmap:


Newsmap is well-known in the area of news visualization, but what I was aiming for in developing [Newsgraphy] was a site that is more interesting and more practically useful than newsmap.

and with HatenarMaps:


In HatenarMaps, each region is very angular, and I had the feeling that as a map it was somewhat inorganic, so [with Newsgraphy] I made the topography more fractal-like. I think it has an appearance that is very nature-like.
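The post does not say how the fractal-like borders are generated. One common way to turn a straight region edge into a rough, natural-looking boundary is midpoint displacement, sketched below purely as an illustration. The function name and the `depth` and `roughness` parameters are assumptions, not details from Newsgraphy.

```python
import random


def roughen_edge(start, end, depth=4, roughness=0.3, rng=None):
    """Replace the segment start-end with a jagged polyline."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    if depth == 0:
        return [start, end]
    (x1, y1), (x2, y2) = start, end
    # Shift the midpoint perpendicular to the segment by a random
    # fraction of the segment's length.
    offset = rng.uniform(-roughness, roughness)
    mid = (
        (x1 + x2) / 2 - (y2 - y1) * offset,
        (y1 + y2) / 2 + (x2 - x1) * offset,
    )
    left = roughen_edge(start, mid, depth - 1, roughness, rng)
    right = roughen_edge(mid, end, depth - 1, roughness, rng)
    return left + right[1:]  # avoid repeating the shared midpoint
```

Because each displacement scales with the length of the segment being split, the detail shrinks at every level of recursion, which gives the coastline-like look. The endpoints stay fixed, so neighboring regions that share an edge would still meet at the same corners.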

In Newsgraphy, it is possible to narrow the range of visualization through use of a calendar:


When you specify a range of dates using the calendar, regions are highlighted corresponding to news that was reported during that period. This function makes it possible to observe transitions, in chronological order, between high-profile topics.
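In essence, the calendar acts as a date filter over the news items. A minimal sketch, assuming each item is a dict with hypothetical `id` and `published` keys:

```python
from datetime import date


def highlight_regions(news_items, start, end):
    """Return the ids of items published from start to end, inclusive."""
    return {
        item["id"]
        for item in news_items
        if start <= item["published"] <= end
    }
```

Sliding the `start` and `end` dates forward one step at a time would show which topics light up in turn, which is the chronological view the blogger describes.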

Visualization of news in a specific time range (white regions) using Newsgraphy. The grey region is labeled “LDP elections”, and the white regions are news stories on September 22nd on this topic.

id:kaiseh notes later that it is possible to highlight regions by searching for keywords. The default mode colors regions differently according to the category of news, with a 3-color classification scheme used to enable users to survey news from various perspectives.

There is another visualization mode for person/place/organization, which functions as follows:


From results of a morphological analysis of the news headline:

* 特定の人物に関わるニュース
* 特定の(地理的な)場所に関わるニュース
* 特定の企業や組織に関わるニュース

* News involving specific persons
* News involving specific (geographic) places
* News involving specific companies or organizations


are distinguished, and each is represented by an icon: green icons for people, blue icons for places, and red icons for organizations.
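A sketch of the icon assignment, assuming a morphological analyzer has already tagged the headline and returned `(token, kind)` pairs. The tag names `person`, `place`, and `organization` are placeholders invented for this example and do not correspond to any particular analyzer's output.

```python
# Colors as described in the post; the keys are assumed tag names.
ICON_COLORS = {"person": "green", "place": "blue", "organization": "red"}


def icons_for_headline(tagged_tokens):
    """Return the icon colors for a headline, in order of first appearance."""
    icons = []
    for _, kind in tagged_tokens:
        color = ICON_COLORS.get(kind)
        if color and color not in icons:
            icons.append(color)  # one icon per category, no duplicates
    return icons
```

A headline that mentions both a person and a place would get a green and a blue icon under this scheme, while one with no recognized entities would get none.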

Visualization in Newsgraphy by person (green) / place (blue) / company or organization (red)

Region for news about marriages and divorces of famous entertainers, mostly green because it is news about people.

Finally, news can also be categorized by coloring it according to how recent it is. In this method:


The color of regions corresponds to when the news came out: regions for recent news items are colored a verdant green, whereas older news items are colored an earthy brown.
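One simple way to produce such a gradient is to interpolate linearly between a green and a brown according to the item's age. The RGB values and the 30-day cutoff below are assumptions for illustration; the post does not give the actual colors or time scale.

```python
from datetime import date

GREEN = (34, 139, 34)  # assumed "verdant green"
BROWN = (139, 69, 19)  # assumed "earthy brown"


def recency_color(published, today, max_age_days=30):
    """Blend from GREEN (published today) to BROWN (max_age_days or older)."""
    age = (today - published).days
    # Clamp the blend factor to the range 0..1.
    t = min(max(age / max_age_days, 0.0), 1.0)
    return tuple(round(g + (b - g) * t) for g, b in zip(GREEN, BROWN))
```

Items older than the cutoff all receive the same brown, so a topic whose coverage peaked weeks ago appears uniformly brown.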

News about the recent tainted rice scandal, visualized by how recent news items are. The scandal was covered most heavily a few weeks ago, so most news items are brown (old) and not green (new).

The blogger finishes the post with a look to the future of Newsgraphy:


Crawling only started in September, so it only includes recent information, but I think that in the next half-year or year, as data is collected and the map develops, it will become possible to do really interesting analysis of news trends.
