Thursday, January 15, 2009
Funny Byproducts of Research
I've recently begun working full-throttle on my Master's thesis. The first stage requires crawling a number of pages and looking for certain HTML features within them. Unfortunately, the feature I am interested in can be used in multiple ways, so I need to check manually that each page uses it the way I need. Luckily, I was able to build an interface that significantly speeds this up, but the whole process still takes several hours of clicking 'keep' or 'reject' buttons. On the bright side, I got to see first-hand what an eclectic collection of pages I've crawled:
- several pages of sumo wrestler bios and statistics
- lots of pages about the odds on horse races
- many pages in Estonian
- a lot of pages about finances, weather, and disaster information
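The automatic part of the filtering described in the post, finding crawled pages that contain a given HTML feature before reviewing them by hand, could be sketched roughly like this in Python. The post does not name the feature or the tools used, so the `table` tag and the helper names `count_feature` and `candidates_for_review` are assumptions for illustration only.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags with a given name (a hypothetical stand-in feature)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == self.tag:
            self.count += 1


def count_feature(html, tag="table"):
    """Return how many times `tag` is opened in the given HTML string."""
    parser = TagCounter(tag)
    parser.feed(html)
    parser.close()
    return parser.count


def candidates_for_review(pages, tag="table"):
    """Return the URLs of pages that contain `tag` at least once.

    `pages` maps a URL to its already-downloaded HTML. The result is only
    a shortlist: deciding whether the tag is used in the wanted way still
    needs the manual keep/reject step.
    """
    return [url for url, html in pages.items() if count_feature(html, tag) > 0]
```

A pass like this only narrows the crawl down to pages worth looking at; it cannot tell apart the different ways the same feature is used, which is why the manual review is still needed.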