Today a friend asked me what I did in the "Ranking of Weapons in Chinese Kung-Fu Novels" article. Here is a backup copy of my reply.
Chinese link: http://cos.name/2013/02/jinyong-fiction-mining/
- Find the raw texts of 14 novels by a famous Chinese kung-fu novelist. Each novel is at least 100K Chinese words long.
- Apply some natural language processing techniques... mainly extracting the weapon and character name keywords from the long texts.
- Define a relationship (formally, a link in a network) between two keywords if they appear in the same paragraph.
- Run the social network analysis, which produces the graphs you see here: http://cos.name/2013/02/jinyong-fiction-mining/
- Rank these weapons by their PageRank score in the network.
- Some other analyses, e.g. clustering by novel, relationships between characters, etc.
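For readers who want to see the idea in code, here is a minimal sketch of the linking and ranking steps above. It is not the code behind the article: the paragraphs and keyword names are made-up placeholders, keywords are matched by plain substring search, and PageRank is a simple hand-written power iteration rather than any particular library's implementation.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(paragraphs, keywords):
    """Link two keywords whenever both occur in the same paragraph."""
    weights = defaultdict(float)
    for paragraph in paragraphs:
        # Plain substring matching; real text would need proper keyword extraction.
        present = sorted(k for k in keywords if k in paragraph)
        for a, b in combinations(present, 2):
            weights[(a, b)] += 1.0
    graph = {k: {} for k in keywords}
    for (a, b), w in weights.items():
        graph[a][b] = w
        graph[b][a] = w
    return graph


def pagerank(graph, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration on an undirected graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by isolated nodes is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            total = sum(graph[v].values())
            for u, w in graph[v].items():
                new_rank[u] += damping * rank[v] * w / total
        rank = new_rank
    return rank


# Toy data: hypothetical placeholder names, not taken from the novels.
paragraphs = [
    "hero_a drew sword_x against hero_b",
    "hero_b carried sword_x and saber_y",
    "hero_c admired sword_x",
    "hero_a met hero_c",
]
weapons = ["sword_x", "saber_y"]
characters = ["hero_a", "hero_b", "hero_c"]

graph = cooccurrence_graph(paragraphs, weapons + characters)
scores = pagerank(graph)
weapon_ranking = sorted(weapons, key=lambda w: scores[w], reverse=True)
print(weapon_ranking)
```

The edge weight counts how many paragraphs two keywords share, so a weapon that keeps showing up next to well-connected characters ends up with a higher score. Whether to weight edges at all, and what damping factor to use, are choices made here only for illustration.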
I tried to describe these steps in a relatively plain tone to make them sound less subjective. Recently I have been listening to a lot of research presented by industry people, so here are my comments:
- The industry is really good at renaming existing things to make them easier to interpret, but these are perhaps decorations rather than fundamental innovations.
- Not many machine learning people think about their algorithms deeply. They see whether an algorithm works, but they probably do not know when it would fail.
- There is always a trade-off between consistency and efficiency, or between technical beauty and usefulness.
...just my opinion. It always depends on the purpose.
I have many beautiful memories from my travels. This is definitely one of the best:
A dream garden in the dusty city center of Kathmandu... unbelievable and such a sharp contrast.
Perhaps it is also related to my state of mind at that time... no work to follow, no exam to prepare for, no deadline to meet, no pressure at all. Single, alone, among strangers, wandering, aimless, relaxed, flowers, afternoon tea... the best combination ever.