十八般武艺,谁主天下?(Ranking of Weapons in Chinese Kung-Fu Novels)

Today a friend asked me what I did in the "Ranking of Weapons in Chinese Kung-Fu Novels" article. Here is a backup copy of my reply.

Chinese link: http://cos.name/2013/02/jinyong-fiction-mining/

-------------the analysis----------------

  • Collect the raw text of 14 novels by a famous Chinese Kung-Fu novelist. Each novel is at least 100K Chinese words long.
  • Apply some natural language processing techniques, mainly to extract the weapon and character-name keywords from the long texts.
  • Define a relationship (formally, a link in a network) between two keywords if they appear in the same paragraph.
  • Run a social network analysis on the resulting network. This produces the graphs shown here: http://cos.name/2013/02/jinyong-fiction-mining/
  • Rank the weapons by their PageRank score in the network, e.g.

剑 Sword  0.018411053
刀 Knife  0.017516021
掌 Palm   0.017137869
抓 Claw   0.011880115
拳 Fist   0.011605281

  • Some other analyses, e.g. clustering by novel, relationships between characters, etc.
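
The co-occurrence and ranking steps above can be sketched roughly as follows. This is only an illustration in plain Python, not the code used for the article: the function names, the toy paragraphs, the English placeholder keywords, and the damping factor of 0.85 are all assumptions, and keyword matching is done by simple substring search instead of a real word segmenter.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(paragraphs, keywords):
    """Count how often each pair of keywords appears in the same paragraph."""
    edges = Counter()
    for paragraph in paragraphs:
        # Naive substring match; real Chinese text would need word segmentation.
        present = sorted(k for k in keywords if k in paragraph)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank on an undirected graph, computed by power iteration."""
    neighbors = {}
    for (a, b), weight in edges.items():
        neighbors.setdefault(a, {})[b] = weight
        neighbors.setdefault(b, {})[a] = weight
    nodes = list(neighbors)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    # Total edge weight per node, used to split a node's rank among its links.
    strength = {node: sum(neighbors[node].values()) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            incoming = sum(
                rank[other] * weight / strength[other]
                for other, weight in neighbors[node].items()
            )
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Toy data (made up for illustration, not taken from the novels).
keywords = ["sword", "knife", "palm", "hero_a", "hero_b"]
weapons = {"sword", "knife", "palm"}
paragraphs = [
    "hero_a drew his sword and met the knife of hero_b",
    "hero_b struck with his palm",
    "hero_a sheathed the sword",
    "a paragraph with no keyword at all",
]

edges = cooccurrence_edges(paragraphs, keywords)
ranks = pagerank(edges)

# Keep only the weapon nodes and sort them by score, highest first.
weapon_ranking = sorted(
    ((w, ranks[w]) for w in weapons if w in ranks),
    key=lambda item: item[1],
    reverse=True,
)
for weapon, score in weapon_ranking:
    print(f"{weapon}\t{score:.6f}")
```

Character names stay in the network while the scores are computed, so a weapon's score reflects its links to characters as well as to other weapons; only the final listing is restricted to weapons.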