June 27, 2005

Sig Rinde talks about tags. And he has created a new gizmo that allows you to play with his ideas:

Even with tags we easily become overwhelmed and would require some data-structure to find our way. Technorati follows 1.3 million tags now!

Every person on this planet has a tag: a name, a social security number, etc. 6.45 billion of them.

This experiment:

Uses multiple tag choices to choose and find.

Using multiple tags, about 20 tags would cover the 1.3 million single-use tags at Technorati.

Using multiple tags, about 33 tags could give a unique identity to every person in the whole world.

(Quite a few years since I studied statistics, believe I'm in the ballpark, but anybody out there who could corroborate?)
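One way to check the ballpark, assuming each tag is a plain yes/no choice (an assumption; the experiment itself may count differently): n such tags can tell apart at most 2^n items, so the question is the smallest n with 2^n at least the number of items. The name `tags_needed` is made up for this sketch.

```python
def tags_needed(items: int) -> int:
    """Smallest n such that 2**n >= items, treating each tag as on/off."""
    if items < 1:
        raise ValueError("items must be at least 1")
    # (items - 1).bit_length() equals ceil(log2(items)) for items >= 1
    # and avoids floating-point rounding.
    return (items - 1).bit_length()


technorati_tags = 1_300_000       # figure quoted above
world_population = 6_450_000_000  # figure quoted above

print(tags_needed(technorati_tags))   # 21
print(tags_needed(world_population))  # 33
```

Under that assumption the 33 holds for the population figure, while 1.3 million needs 21 rather than 20 (2^20 is 1,048,576), so "about 20" is indeed in the ballpark.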

And 20-30 tags are less cumbersome to navigate than 1.3 million, or 6 billion!

Multiple tags can replace any single tag, however unique that is.

You're tagged with your name. That does not say much, does it? Unless I know you of course.

Now try multiple tags. Add 10 tags (red hair, tall, birthplace, etc.) and you may be one of 153,000 with the exact same tag set. Add yet another one that says more about you, say 'Italian speaking', and voilà, only 9,675 individuals have the same tags. Add one more: now 634 identicals. Add two more, and 'highlighting' exactly those 14 tags gives one return: you.
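The narrowing-down described here can be sketched as a subset test: an item matches when its tag set contains every tag that has been selected. The people and tags below are invented for illustration, and `matching` is a hypothetical helper, not the code behind the experiment.

```python
# Toy data: each person is described by a set of tags (all invented).
people = {
    "anna":  {"red hair", "tall", "Italian speaking"},
    "bruno": {"red hair", "tall"},
    "carla": {"red hair", "Italian speaking"},
    "dino":  {"tall", "Italian speaking"},
}


def matching(selected):
    """Names whose tag set contains every selected tag."""
    return sorted(name for name, tags in people.items() if selected <= tags)


print(matching({"red hair"}))                              # ['anna', 'bruno', 'carla']
print(matching({"red hair", "tall"}))                      # ['anna', 'bruno']
print(matching({"red hair", "tall", "Italian speaking"}))  # ['anna']
```

Each added tag can only shrink the result, which is the effect the numbers above describe.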

Ditto for plants, ditto for file structure on your computer, goodbye folders and search. Etc.

Add that a set of tags gives immediate (and complete!) information about the object. Far beyond what a two-dimensional system may give (first and middle name plus family name do not give much information).

And that is what knowledge is all about. Expand on that.

Time for a remake of Carl von Linné's work?

This new gizmo is built around the stuff he's building over at Thingamy, of course.

We've all heard of prime numbers. Is there such a thing as "prime tags"?

And if so, what are they called, and how many of them are there?

[Disclaimer: Sig and I work together.]

This is cute, and a very nice use of Javascript and CSS. The real problem, however, with a simple boolean search like this is how do we determine the 20 tags that can define any blog entry?

Are there 20 or 30 keywords that, in combination, could describe any book ever written? Don't "folksonomies" exist because it's so much easier to freely tag things than to fit them into predefined buckets (whether it's one bucket or multiple buckets)?

Just some thoughts... any ideas how one would go about determining this "master list"?

Posted by: toby at June 27, 2005 9:25 PM

Toby, you hit the real issue there!

And Hugh, another bull's-eye, "prime tags" I like a lot! A bit like common denominators...

And after all, why do people sometimes get into heated discussions, fights, even wars - semantics, cultural differences in how we understand words...

Most tags or keywords could probably be represented or replaced by another set of tags, perhaps tags that in themselves could shed more light on the "meaning" of the tag, keyword, or word of the author?

Carl Linnaeus found that using family, form and so forth in the name would tell more about the plant than just "dandelion" (the French call it 'piss en lit', which tells a bit more :). And that is what is called 'knowledge'. Relationships between objects.

In that sense folksonomies do not advance knowledge as such, even if they're colourful, interesting etc.

Let's keep this discussion going, this I like a lot! :-D

Posted by: sig at June 27, 2005 10:01 PM

Prime Tags

raving lunacy

Mine All Mine!!!!!

Posted by: the head lemur at June 27, 2005 11:31 PM

It's easy to visualize how you can create a limited set of tags for a given set of things that can be classified into a nice taxonomy (e.g. "People" or "Books"). But can you really come up with a small set of tags that would classify "Everything"?

Posted by: William at June 28, 2005 1:09 AM

Now, you're talking. This is exactly what I was thinking was needed to classify blogs/webcasts so that like minded could find each other. I knew you'd know.

Posted by: Shelley Noble at June 28, 2005 1:42 AM

I suspect that prime tags are the tags that have more rather than less commonly shared meaning in defined areas of human activity or knowledge ... like "skates", "sticks", "rules", "teams" etc. in hockey, or "yarn", "stitches", "patterns" for knitting, and so on.

The confluence of structured taxonomies dancing back and forth with miscellaneous author-designated tags that reflect the participation and interaction of people sharing and searching for meanings or pertinent information ... now there's something I will watch develop with great interest.

There once was a paper I read titled "Cooperative Classification And Communication Using Shared Metadata" .. I believe the author was the person (aah .. the wonder of Google - Adam Mathes) who came up with the term folksonomies, or is one of the people in that tribe who follow and lead the development of the understanding of folksonomies.

Posted by: Jon Husband at June 28, 2005 4:48 AM

Yer blog doesn't like html ... here's the URL of said paper


Posted by: Jon Husband at June 28, 2005 4:49 AM

Thanks, Jon. And let's not forget Clay Shirky: "Ontology is Overrated: Categories, Links, and Tags".


Posted by: hugh macleod at June 28, 2005 11:10 AM

Hugh, your ability to pick labels for ideas that are both provocative and meaningful is truly impressive.

That is a real skill... and 'Prime Tags' is an excellent example.

Posted by: Alex at June 28, 2005 11:28 AM

Sean McGrath recently posted on "semantic primes" (click my name for the link).

If you Google "fractal thicket" you'll find my approach to breaking down 'composite' semantics into basic concepts like person-place-thing.

A Flickr pic with two people and a pizza could be tagged "person1 person2 thing1" for starters...

Posted by: Jorn Barger at June 28, 2005 12:04 PM

That Clay Shirky article reminded me of the 'pathway problem' - where do you put a pathway across a park? If you build the path first, it's like a hierarchical categorisation - in Clay's words the 'Yahoo' approach. If you let the people use the park without pathways, they will create them for you - more the 'Google' approach, or del.icio.us tags where the 'pathways' to information will be formed by people tagging links.

Posted by: Ric at June 28, 2005 2:12 PM

Ric, will follow your suggestion, Junior promises that you (and everybody else) shall be able to add tags to any post and comment (but not remove them, of course) in the 'experiment'... in next version... soon (see geek definition of 'soon') :-)

That'll enable the pathway development nicely, perhaps...

Posted by: sig at June 28, 2005 4:21 PM

There are an infinite number of prime numbers, of course, which is a reassuring thought when making the analogy between tags for information and these numerical building blocks (that even though information is categorised, it is not limited to finite categories).

Posted by: Sarah B at June 28, 2005 5:13 PM

The more I think about it, the less I think trying to limit the number of tags is going to work. Whatever set you come up with, there will always be that one situation where yet-another-tag is going to be required. I'm thinking a better approach might be to create classes of tags. (E.g. color: red-green-blue, genre: humor-drama-scifi, etc.) This would increase the ability to identify like things as they would tend to use tags from the same set of classes. For instance, something referencing a person would use tags that describe hair color (blond), ethnicity (Hispanic), and so on.

Posted by: Wiliam at June 28, 2005 5:26 PM

William, nobody's saying the number of tags should be limited to a certain number. Sig's point was how surprisingly few are needed in order to handle large amounts of information.

And the tags will be created from the bottom-up, not the top-down, I'm guessing.

Posted by: hugh macleod at June 28, 2005 5:31 PM

William's idea of creating classes of tags might be called a tagsonomy. :-)

There are techniques for automatically gleaning "concepts" from a textual document that have been in use for quite some time by knowledge management software. The idea is to avoid requiring people to add their own tags. It has been a while since I followed the market, so all of the company names I used to know have disappeared in the Great Popping of the dot com Bubble, but Intellisophic has something like what I'm talking about:


Fascinating stuff.

Posted by: Matt at June 28, 2005 7:01 PM

As far as I know, my name is a prime tag, because as far as I can tell, I'm the only person on earth who has my name, or has ever had it. My last name is pretty rare, I know almost everyone who has it in the US, and there are very few in Europe from what I can gather.

So there can be simple prime tags that are unique identifiers.

Posted by: Jeff Zugale at June 28, 2005 7:07 PM

It seems to me that an interesting solution would be to take the way people are tagging things and algorithmically determine the optimal set of tags.

The only algorithms I've seen run on folksonomies so far have been simple co-occurrence metrics, which tell you "related" tags. I have a couple of ideas of how individual tags can be aggregated, which I'm trying out. I'll keep you posted.
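For anyone who hasn't seen one, a co-occurrence metric can be as simple as counting how often two tags appear on the same item. A minimal sketch, with invented posts and a hypothetical `related` helper (this is not any particular site's algorithm):

```python
from collections import Counter
from itertools import combinations

# Toy data: the set of tags attached to each post (all invented).
posts = [
    {"tags", "folksonomy", "blogs"},
    {"tags", "folksonomy"},
    {"tags", "search"},
    {"blogs", "search"},
]

# Count every unordered pair of tags that appears on the same post.
pair_counts = Counter()
for tags in posts:
    for pair in combinations(sorted(tags), 2):
        pair_counts[pair] += 1


def related(tag):
    """Tags that co-occur with `tag`, most frequent first."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == tag:
            scores[b] += n
        elif b == tag:
            scores[a] += n
    return scores.most_common()


print(related("tags")[0])  # ('folksonomy', 2)
```

Raw counts favour popular tags, so real systems usually normalise them somehow; that step is left out here.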

Posted by: toby at June 28, 2005 7:32 PM

[NOTE TO SELF:] Stick to cartooning. This is so out of your league...

Posted by: hugh at June 28, 2005 8:16 PM

Learning which properties or tags act as good identifiers is a big topic in machine learning, and there are several algorithms for it. Finding the 'prime tags' for some finite group of items is a matter of finding which tag most effectively splits the overall group into smaller subgroups, recursively, until you are left with unique results. Decision trees are a popular type of machine learning classifier and are an excellent example of this technique.
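A rough sketch of that recursive splitting, assuming each item is just a set of tags. It picks the tag that divides the group most evenly, which is a simple stand-in for the information-gain measures real decision-tree learners use; the items and the names `best_tag` and `split` are invented for illustration.

```python
def best_tag(group, items):
    """Tag whose presence/absence splits `group` most evenly, or None."""
    candidates = sorted(set().union(*(items[name] for name in group)))
    best, best_score = None, len(group)
    for tag in candidates:
        with_tag = sum(1 for name in group if tag in items[name])
        score = max(with_tag, len(group) - with_tag)  # size of the larger side
        if score < best_score:  # a tag shared by everyone never qualifies
            best, best_score = tag, score
    return best


def split(group, items):
    """Split recursively until a group is one item or cannot be split."""
    tag = best_tag(group, items) if len(group) > 1 else None
    if tag is None:
        return sorted(group)
    yes = [name for name in group if tag in items[name]]
    no = [name for name in group if tag not in items[name]]
    return {tag: {"yes": split(yes, items), "no": split(no, items)}}


# Toy data (invented): four items, each described by two tags.
items = {
    "dandelion": {"plant", "yellow"},
    "rose": {"plant", "red"},
    "fire truck": {"red", "vehicle"},
    "taxi": {"vehicle", "yellow"},
}

tree = split(list(items), items)
print(tree)
```

On this toy data two tags, "plant" and "red", are enough to single out each of the four items.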

Posted by: Marty at June 28, 2005 9:23 PM

I think that if you consider a formal taxonomy, you'll have your idea of prime tags. In fact, it's one of the ways that Tim Berners-Lee intended digital marking (tags) to be used, I believe, when he wrote his original article on the Semantic Web.

An ontology is a superset of a taxonomy (taxonomy is the hierarchical, object-oriented view of knowledge categories: your prime tags, especially if you leave off several layers of leaf nodes). A taxonomy becomes a knowledge representation only when each level of its hierarchy is filled with many enumerations of data. If you just take the taxonomy (and as I suggested, cut off several layers of leaf nodes), then you have basically a nice tree structure of prime tags.

It could work, you know . . .


Posted by: Chuck Turnitsa at July 1, 2005 5:28 AM