Donna Bergmark
Cornell Computer Science
Ithaca, NY 14850 USA
Email: bergmark@cs.cornell.edu
January 2006
A very good use for a highly extensible crawler is to build a focused crawler: one that can quickly and efficiently collect the URLs of Web pages relevant to a given topic.
In the Fall of 2002, five Cornell M. Eng. students put together a very respectable collection-building system using a pluggable Web crawler based on Mercator. My retirement project is to migrate that system from Mercator to Heritrix. If the project succeeds, we will have a nascent collection synthesis system built on an open source Web crawler.
So far it is going well, and I am having lots of fun.
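The focused-crawling idea described above can be illustrated with a small best-first crawl. This is a minimal sketch under stated assumptions, not the Mercator- or Heritrix-based system itself. The in-memory "Web", the keyword-overlap relevance score, the `threshold` and `max_pages` parameters, and the function names are all hypothetical, chosen only to show the core loop: fetch the most promising URL first, keep pages that score as on-topic, and queue their out-links with a priority derived from the parent page's score.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
Web = Dict[str, Tuple[str, List[str]]]


def relevance(text: str, topic: Set[str]) -> float:
    """Fraction of topic keywords found in the page text (a stand-in scorer)."""
    words = set(text.lower().split())
    return len(words & topic) / len(topic)


def focused_crawl(web: Web, seeds: List[str], topic: Set[str],
                  threshold: float = 0.5, max_pages: int = 100) -> List[str]:
    """Best-first crawl: always fetch the frontier URL with the highest priority."""
    # heapq is a min-heap, so priorities are stored negated; seeds get top priority.
    frontier: List[Tuple[float, str]] = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen: Set[str] = set(seeds)
    collection: List[str] = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in web:  # dead link in the toy graph
            continue
        text, links = web[url]
        fetched += 1
        score = relevance(text, topic)
        if score >= threshold:  # on-topic page: add it to the collection
            collection.append(url)
        # Each newly discovered link inherits its parent's score as its priority.
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return collection


if __name__ == "__main__":
    toy_web: Web = {
        "a": ("digital library crawler", ["b", "c"]),
        "b": ("cooking recipes", ["d"]),
        "c": ("web crawler for digital library collections", ["d"]),
        "d": ("library", []),
    }
    print(focused_crawl(toy_web, ["a"], {"digital", "library", "crawler"}))
```

Letting children inherit the parent's score is the simplest way to make the crawler prefer links found on on-topic pages. A real focused crawler would use a stronger relevance measure and might update a URL's priority when it is discovered again from a better page. This sketch does neither.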
1999-Jan 2004: Researcher and Programmer/Analyst Specialist for the Cornell Digital Library Research Group. My projects included reference linking, Web crawling, and the National Science Digital Library (NSDL) project. The work on automatic collection building by web crawling won the best paper award at JCDL 2002.
1998-1999: Programmer/Analyst Specialist for the Cornell Network Research Group, specializing in Computer Telephony and integrating the PSTN with the Internet. Led a research project of 9 students in the Spring and Summer of 1999, which produced a component-based telephony API. Solaris, Microsoft NT; Java, JTAPI, C/C++, Lucent PBX and Dialogic gateway programming. Streaming media and sound technologies.
Carleton College and Boston University, BA in History
Cornell University, MS in Computer Science
Recent extramural courses in computer science, materials science, chemistry, astronomy, classics, linguistics and physics.
Completed Alexander Hamilton Seminar on Installing & Managing NT Server 4.0.
Compilers and languages, particularly for parallel processing systems. General interest in high performance computing, especially for science and engineering applications. Recent interest in network programming (Java, LDAP, IP Telephony and Web crawling).
D. Bergmark, S. Hitchcock, et al. Open Citation Linking. D-Lib Magazine (8,10). October 2002.
D. Bergmark, C. Lagoze, and A. Sbityakov. Focused Crawls, Tunneling, and Digital Libraries. ACM European Conference on Digital Libraries, Rome, Italy, September 16-18, 2002.
D. Bergmark. Using High Performance Systems to Build Collections for a Digital Library. Proceedings of the 2002 International Conference on Parallel Processing Workshops (ICPP 2002 Workshops), Vancouver, Canada, August 18-21, 2002.
D. Bergmark. Collection Synthesis. ACM Proceedings of the Joint Conference on Digital Libraries 2002, Portland, Oregon, July 2002 (best paper award).
D. Bergmark, P. Phempoonpanich, and S. Zhao. Scraping the ACM Digital Library. SIGIR Forum (35,2). Fall 2001.
D. Bergmark and C. Lagoze. An Architecture for Automatic Reference Linking. Proceedings of the European Conference on Digital Libraries, Darmstadt, DE, September 2001.
D. Bergmark. Automatic Extraction of Reference Linking Information from Online Documents. Technical Report TR 2000-1821, Computer Science Department, Cornell University, November 2000.
D. Bergmark and C. Lagoze. Reference Linking the Web's Scholarly Papers. Technical Report TR 2001-1835, Computer Science Department, Cornell University, February 2001.
D. Bergmark, W. Arms, and C. Lagoze. An Architecture for Reference Linking. Technical Report TR 2000-1820, Computer Science Department, Cornell University, October 2000.
D. Bergmark. Link Accessibility in Electronic Journal Articles. Technical Report TR 2000-1793, Computer Science Department, Cornell University, March 2000.
D. Bergmark and S. Keshav. Building Blocks for IP Telephony. IEEE Communications Magazine, pages 88-94, April 2000.
D. Bergmark. ITX Programmer's Guide. Cornell Computer Science Technical Report TR99-1768. http://www.cs.cornell.edu/cnrg/telephony/JavaDocs/HTML_Guide/HTML_Guide.html.
D. Bergmark. Tools for HPF Programmers. A tutorial presented at Supercomputing '97, November 1997.
B. Appelbe and D. Bergmark. Software tools for high-performance computing: Survey and recommendations. Scientific Programming, pages 239-249, Fall 1996.
D. Bergmark. Optimization and parallelization of a commodity trade model for the IBM SP2 using parallel programming tools. In Proceedings of the 1995 International Conference on Supercomputing, Barcelona, Spain, pages 227-236, July 1995.
D. Bergmark. The optimization and parallelization of an economic model using KSR programming tools. Invited talk, KSR Users' Group Meeting, Manchester, U.K., July 1994.
C.M. Pancake and D. Bergmark. Do parallel languages respond to the needs of scientific programmers? Computer, 23(12):13-23, December 1990.
A longer list of my work can be found at http://www.cs.cornell.edu/bergmark/resume_long.html.