–en– Semantic Tag Cloud


In enterprise content management, search has been a central component for many years. Two kinds of search are well established:

  • Full text search: this requires an indexing service or server and computing capacity.
  • Filter navigation, also called meta search or attribute search.
    Content must be well attributed so that a search via attributes makes sense.

In a meeting with an important customer on the subject of faceted search, the following questions and requirements came up:

“Why are no documents shown whose content is related to this document?”

“Can I choose the entry level of the successive filter myself, or must the user always go through the whole search tree from top to bottom?”

“I want to determine the result / target of certain queries.”

From this brainstorming with users who have no deeper technical knowledge, I took the three fundamental ideas and developed the following functional description. I am aware that the result is semantic only in a rudimentary sense. However, I want to reach the target in several steps. For this reason, the source code of the plug-in has been written to be easy to read.

In parallel with this plug-in, an adapted search based on EMC’s Explore is being developed. xCP 2.0 in particular already offers four kinds of search (real-time, historical, full text and task-list query), which can be extended accordingly. The real-time query is especially interesting because the search result can be updated after each new input.

Functional description

This plug-in extends the functionality of a simple tag cloud in WordPress in three ways:

  1. Keywords are assigned to each article in the blog
  2. Keywords have an importance and are displayed accordingly
  3. Keywords can point to a certain article and not to the tag search

Keywords are assigned to blog articles

Each article in the blog gets its own set of keywords, which is displayed separately. The advantage is that pages can show different keywords as well as overlapping ones.

Cloud including all available tags from Mr.Crazyapples’s blog.

For example, the keywords Audi, BMW, Mercedes and Peugeot are assigned to the page ‘cars’. The page ‘bicycles’ has the keywords Hercules, Gudereit, BMW and Peugeot. The keywords BMW and Peugeot can be prioritized differently on the two pages, depending on the context.
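The cars and bicycles example can be sketched as a small data structure. This is only an illustration in Python, not the plug-in’s actual code: the names `Keyword`, `PAGE_KEYWORDS` and `keywords_for`, the importance scale from 1 to 5 and the concrete importance values are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyword:
    label: str
    importance: int  # assumed scale: 1 (minor) to 5 (major)


# Hypothetical per-page assignments; the importance values are invented.
PAGE_KEYWORDS = {
    "cars": [
        Keyword("Audi", 4),
        Keyword("BMW", 5),
        Keyword("Mercedes", 4),
        Keyword("Peugeot", 3),
    ],
    "bicycles": [
        Keyword("Hercules", 4),
        Keyword("Gudereit", 4),
        Keyword("BMW", 1),
        Keyword("Peugeot", 2),
    ],
}


def keywords_for(page: str) -> list[Keyword]:
    """Return the keywords of one page, most important first."""
    keywords = PAGE_KEYWORDS.get(page, [])
    # Sort by descending importance, then alphabetically for a stable order.
    return sorted(keywords, key=lambda k: (-k.importance, k.label))


if __name__ == "__main__":
    for page in PAGE_KEYWORDS:
        print(page, [(k.label, k.importance) for k in keywords_for(page)])
```

Because the importance is stored per page and not per keyword, BMW can lead the cloud on ‘cars’ and sit at the end of the cloud on ‘bicycles’.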

All this requires that tags are assigned deliberately and in a suitable number. A number of 12 to at most 24 tags has proven reasonable. With too few keywords the cloud looks incomplete and/or many keywords end up with the same priority. Too many keywords, on the other hand, reduce clarity, and the important relationships no longer stand out.
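The rule of thumb of 12 to 24 tags could be checked with a few lines like the following. This is a minimal sketch: the function name `check_tag_count` and the returned strings are assumptions, and the limits are only the values mentioned above, not fixed settings of the plug-in.

```python
MIN_TAGS = 12  # lower bound mentioned in the text
MAX_TAGS = 24  # upper bound mentioned in the text


def check_tag_count(tags):
    """Classify a tag list as 'too few', 'ok' or 'too many'."""
    # Count each distinct tag once, ignoring case and surrounding spaces.
    distinct = {tag.strip().lower() for tag in tags if tag.strip()}
    if len(distinct) < MIN_TAGS:
        return "too few"
    if len(distinct) > MAX_TAGS:
        return "too many"
    return "ok"


if __name__ == "__main__":
    print(check_tag_count(["Audi", "BMW", "Mercedes", "Peugeot"]))
```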

Prioritizing keywords

Tag clouds commonly prioritize keywords by frequency: a frequent word counts as more important than one that occurs only once or twice. In principle this is correct. However, it has disadvantages, for example when only a few tags are assigned and many of the displayed keywords are infrequent. In that case all tags appear to be equally important.
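The weakness of pure frequency weighting can be made visible with a small sketch. It assumes the common approach of interpolating a font size linearly between a minimum and a maximum; the function name `font_sizes_by_frequency` and the pixel values are assumptions for illustration.

```python
def font_sizes_by_frequency(counts, min_px=12, max_px=32):
    """Map tag usage counts to font sizes by linear interpolation."""
    if not counts:
        return {}
    lowest = min(counts.values())
    highest = max(counts.values())
    if lowest == highest:
        # Every tag is used equally often, so frequency cannot tell them
        # apart and all tags get the same size.
        return {tag: min_px for tag in counts}
    span = max_px - min_px
    return {
        tag: round(min_px + (count - lowest) * span / (highest - lowest))
        for tag, count in counts.items()
    }


if __name__ == "__main__":
    # Many infrequent tags: the cloud degenerates to a uniform list.
    print(font_sizes_by_frequency({"installation": 1, "security": 1, "users": 1}))
    # Clearly different frequencies: sizes differ as expected.
    print(font_sizes_by_frequency({"installation": 10, "security": 1}))
```

In the first call every tag has been used once, so all three end up at the same size and the cloud carries no information about importance.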

Tags relative to the specific page.

Prioritizing keywords explicitly makes it possible to mark a tag as important even if it is assigned only once.

For example, a DMS contains many documents on server installation, server administration, user management and so on. There is only one document on server security, and its tag does not apply to the other documents. Nevertheless, the document ‘Server security’ must always be displayed as important in the tag cloud.
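One way to express this is to let an explicitly assigned importance override the weight derived from frequency. The sketch below assumes weights between 0 and 1; the function name `effective_weights`, the parameter `pinned` and the example counts are hypothetical and not taken from the plug-in.

```python
def effective_weights(counts, pinned=None):
    """Combine frequency-based weights with explicitly assigned ones.

    counts: tag -> number of documents carrying the tag
    pinned: tag -> explicit weight between 0 and 1 (overrides frequency)
    """
    pinned = pinned or {}
    highest = max(counts.values(), default=0)
    weights = {}
    for tag, count in counts.items():
        if tag in pinned:
            # An explicit importance always wins over the frequency.
            weights[tag] = pinned[tag]
        else:
            weights[tag] = count / highest if highest else 0.0
    return weights


if __name__ == "__main__":
    counts = {
        "server installation": 8,
        "server administration": 6,
        "user management": 5,
        "server security": 1,  # assigned only once
    }
    print(effective_weights(counts, pinned={"server security": 1.0}))
```

Without the `pinned` entry, ‘server security’ would receive the lowest weight in this cloud; with it, the tag is displayed as prominently as the most frequent one.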

Permalinking keywords

In the demonstration before the brainstorming, it was also criticized that a click in the cloud always calls the search function. Certain keywords should instead link only to one specific document. This idea is interesting even though it contradicts the basic principle of search. I have taken it up and want to test this feature in practice.
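The link decision itself is simple and can be sketched as follows. The function name `keyword_url`, the mapping `permalinks`, the base URL and the `/tag/<slug>/` pattern for the tag search are assumptions for this illustration; the real URLs depend on the permalink settings of the blog.

```python
from urllib.parse import quote


def keyword_url(label, permalinks, base="https://example.org"):
    """Return the link target for a keyword in the cloud.

    permalinks: keyword label -> URL of the one document it should open.
    Keywords without an entry fall back to the ordinary tag search.
    """
    target = permalinks.get(label)
    if target is not None:
        return target
    # Assumed slug rule: lower case, spaces replaced by hyphens.
    slug = quote(label.strip().lower().replace(" ", "-"))
    return f"{base}/tag/{slug}/"


if __name__ == "__main__":
    links = {"Server security": "https://example.org/docs/server-security/"}
    print(keyword_url("Server security", links))  # fixed document
    print(keyword_url("User management", links))  # tag search
```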