Today a friend asked me what I did in my "Ranking of Weapons in Chinese Kung-Fu Novels" article. Here is a backup copy of my reply.
Chinese link: http://cos.name/2013/02/jinyong-fiction-mining/
-------------the analysis----------------
- Find the raw text of 14 novels by a famous Chinese kung-fu novelist. Each novel is at least 100K Chinese words long.
- Apply some natural language processing techniques, mainly to extract the weapon and character-name keywords from the long texts.
- Define a relationship (formally, a link in a network) between two keywords if they appear in the same paragraph.
- Run a social network analysis on that network. The resulting graphs are here: http://cos.name/2013/02/jinyong-fiction-mining/
- Rank the weapons by their PageRank score in the network, e.g.
剑 Sword 0.018411053
刀 Knife 0.017516021
掌 Palm 0.017137869
抓 Claw 0.011880115
拳 Fist 0.011605281
- Some other analyses, e.g. clustering by novel, relationships between characters, etc.
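
For readers who want to try the co-occurrence and ranking steps, here is a minimal Python sketch. It is not the code used for the article. The function names, the toy English paragraphs and the keyword list are all made up for illustration. It also assumes plain substring matching for finding keywords, which is cruder than proper Chinese word segmentation (a one-character keyword such as 剑 would also match inside longer words). The damping factor of 0.85 is the usual PageRank default, not a value taken from the article.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_edges(paragraphs, keywords):
    """Count, for each keyword pair, the paragraphs that contain both."""
    weights = defaultdict(int)
    for paragraph in paragraphs:
        # Assumption: naive substring matching instead of word segmentation.
        present = sorted(k for k in keywords if k in paragraph)
        for a, b in combinations(present, 2):
            weights[(a, b)] += 1
    return dict(weights)


def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank on an undirected graph, by power iteration."""
    neighbours = defaultdict(dict)
    for (a, b), w in edges.items():
        neighbours[a][b] = w
        neighbours[b][a] = w
    nodes = sorted(neighbours)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    # Total link weight per node; every node here has at least one link.
    strength = {node: sum(neighbours[node].values()) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            for other, w in neighbours[node].items():
                new_rank[other] += damping * rank[node] * w / strength[node]
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical toy data standing in for the novels' paragraphs.
    paragraphs = [
        "Guo Jing drew his sword against the palm strike.",
        "The sword met the knife.",
        "Guo Jing answered with a palm strike and a fist.",
    ]
    keywords = ["sword", "knife", "palm", "fist", "Guo Jing"]
    scores = pagerank(cooccurrence_edges(paragraphs, keywords))
    for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(word, round(score, 4))
```

Keywords that never share a paragraph with another keyword do not enter the network at all in this sketch, so they get no score. The scores of the remaining keywords sum to 1.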