Glossary
Digital libraries have absorbed terminology from many fields, including computing, libraries, publishing, law, and more. This glossary gives brief explanations of how some common terms are used in digital libraries today, which may not be the usage in other contexts. Often the use in digital libraries has diverged from or extended the original sense of a term.
- AACR2 (Anglo-American Cataloguing Rules)
- A set of rules that describe the content that is contained in library catalog records.
- abstracting and indexing services
- Secondary information services that provide searching of scholarly and scientific information, in particular of individual journal articles.
- access management
- Control of access to material in digital libraries. Sometimes called terms and conditions or rights management.
- ACM Digital Library
- A digital library of the journals and conference proceedings published by the Association for Computing Machinery.
- Alexandria Digital Library
- A digital library of geospatial information, based at the University of California, Santa Barbara.
- American Memory and the National Digital Library Program
- The Library of Congress's digital library of materials converted from its primary source materials related to American history.
- applet
- A small computer program that can be transmitted from a server to a client computer and executed on the client.
- archives
- Collections with related systems and services, organized to emphasize the long-term preservation of information.
- Art and Architecture Thesaurus
- A controlled vocabulary for fine art, architecture, decorative art, and material culture, a project of the J. Paul Getty Trust.
- artifact
- A physical object in a library, archive, or museum.
- ASCII (American Standard Code for Information Interchange)
- A coding scheme that represents individual characters as 7 or 8 bits; printable ASCII is a subset of ASCII.
- authentication
- Validation of a user, a computer, or some digital object to ensure that it is what it claims to be.
- authorization
- Giving permission to a user or client computer to access specific information and carry out approved actions.
- automatic indexing
- Creation of catalog or indexing records using computer programs, not human cataloguers.
- Boolean searching
- Methods of information retrieval where a query consists of a sequence of search terms, combined with operators, such as "and", "or", and "not".
- browser
- A general-purpose user interface, used with the web and other online information services. Also known as a web browser.
- browsing
- Exploration of a body of information, based on the organization of the collections or scanning lists, rather than by direct searching.
- cache
- A temporary store that is used to keep a readily available copy of recently used data or any data that is expected to be used frequently.
- California Digital Library
- A digital library that serves the nine campuses of the University of California.
- catalog
- A collection of bibliographic records created according to an established set of rules.
- classification
- An organization of library materials by a hierarchy of subject categories.
- client
- A computer that acts on behalf of a user, including a user's personal computer, or another computer that appears to a server to have that function.
- CGI (Common Gateway Interface)
- A programming interface that enables a web browser to be an interface to information services other than web sites.
- Chemical Abstracts
- A secondary information service for chemistry.
- CNI (Coalition for Networked Information)
- A partnership of the Association of Research Libraries and Educause to collaborate on academic networked information.
- complex object
- Library object that is made up from many inter-related elements or digital objects.
- compression
- Reduction in the size of digital materials by removing redundancy or by approximation; lossless compression can be reversed; lossy compression cannot be reversed, since information is lost by approximation.
- computational linguistics
- The branch of natural language processing that deals with grammar and linguistics.
- controlled vocabulary
- A set of subject terms, and rules for their use in assigning terms to materials for indexing and retrieval.
- conversion
- Transformation of information from one medium to another, including from paper to digital form.
- CORBA
- A standard for distributed computing where an object on one computer invokes an Object Request Broker (ORB) to interact with an object on another computer.
- CORE
- A project from 1991 to 1995 by Bellcore, Cornell University, OCLC, and the American Chemical Society to convert chemistry journals to digital form.
- Cryptolope
- Secure container used to buy and sell content securely over the Internet, developed by IBM.
- CSS (Cascading Style Sheets)
- System of style sheets for use with HTML, the basis of XSL.
- CSTR (Computer Science Technical Reports project)
- A DARPA-funded research project with CNRI and five universities, from 1992 to 1996.
- DARPA (Defense Advanced Research Projects Agency)
- A major sponsor of computer science research in the U.S., including digital libraries. Formerly ARPA.
- data type
- Structural metadata associated with digital data that indicates the digital format or the application used to process the data.
- DES (Data Encryption Standard)
- A method for private key encryption.
- Dewey Decimal Classification
- A classification scheme for library materials which uses a numeric code to indicate subject areas.
- desktop metaphor
- User interface concept on personal computers that represents information as files and folders on a desktop.
- Dienst
- An architecture for digital library services and an open protocol that provides those services, developed at Cornell University, used in NCSTRL.
- digital archeology
- The process of retrieving information from damaged, fragmentary, and archaic data sources.
- Digital Libraries Initiative
- A digital libraries research program. In Phase 1, from 1994 to 1998, NSF/DARPA/NASA funded six university projects. Phase 2 began in 1998/9.
- digital object
- An item as stored in a digital library, consisting of data, metadata, and an identifier.
- digital signature
- A cryptographic code consisting of a hash, to indicate that data has not changed, that can be decrypted with the public key of the creator of the signature.
- dissemination
- The transfer from the stored form of a digital object in a repository to a client.
- distributed computing
- Computing systems in which services to users are provided by teams of computers collaborating over a network.
- D-Lib Magazine
- A monthly, online publication about digital libraries research and innovation.
- DLITE
- An experimental user interface used with the Stanford University InfoBus.
- document
- Digital object that is the analog of a physical document, especially textual materials; a document model is an object model for documents.
- domain name
- The name of a computer on the Internet; the domain name service (DNS) converts domain names to IP addresses.
- DOI (Digital Object Identifier)
- An identifier used by publishers to identify materials published electronically, a form of handle.
- DSSSL (Document Style Semantics and Specification Language)
- A general purpose system of style sheets for SGML.
- DTD (Document Type Definition)
- A mark-up specification for a class of documents, defined within the SGML framework.
- Dublin Core
- A simple set of metadata elements used in digital libraries, primarily to describe digital objects and for collections management, and for exchange of metadata.
- dynamic object
- Digital object where the dissemination presented to the user depends upon the execution of a computer program, or other external activity.
- EAD (Encoded Archival Description)
- A DTD used to encode electronic versions of finding aids for archival materials.
- electronic journal
- An online publication that is organized like a traditional printed journal, either an online version of a printed journal or a journal that has only an online existence.
- eLib
- A British program of innovation, around the theme of electronic publication.
- emulation
- Replication of a computing system to process programs and data from an early system that is no longer available.
- encryption
- Techniques for encoding information for privacy or security, so that it appears to be random data; the reverse process, decryption, requires knowledge of a digital key.
- entities and elements
- In a mark-up language, entities are the basic unit of information, including character entities; elements are strings of entities that form a structural unit.
- expression
- The realization of a work, by expressing the abstract concept as actual words, sounds, images, etc.
- fair use
- A concept in copyright law that allows limited use of copyright material without requiring permission from the rights holders, e.g., for scholarship or review.
- federated digital library
- A group of digital libraries that support common standards and services, thus providing interoperability and a coherent service to users.
- field, subfield
- An individual item of information in a structured record, such as a catalog or database record.
- fielded searching
- Methods for searching textual materials, including catalogs, where search terms are matched against the content of specified fields.
- finding aid
- A textual document that describes holdings of an archive, library, or museum.
- firewall
- A computer system that screens data passing between network segments, used to provide security for a private network at the point of connection to the Internet.
- first sale
- A concept in copyright law that permits the purchaser of a book or other object to transfer it to somebody else, without requiring permission from the rights holders.
- FTP (File Transfer Protocol)
- A protocol used to transmit files between computers on the Internet.
- full text searching
- Methods for searching textual materials where the entire text is matched against a query.
- gatherer
- A program that automatically assembles indexing information from digital library collections.
- gazetteer
- A database used to translate between different representations of geospatial references, such as place names and geographic coordinates.
- genre
- The class or category of an object when considered as an intellectual work.
- geospatial information
- Information that is referenced by a geographic location.
- gif
- A format for storing compressed images.
- Google
- A web search program that ranks web pages in a list of hits by giving weight to the links that reference a specific page.
- gopher
- A pre-web protocol used for building digital libraries, now largely obsolete.
- handle
- A system of globally-unique names for Internet resources and a computer system for managing them, developed by CNRI; a form of URN.
- Harvest
- A research project that developed an architecture for distributed searching, including protocols and formats.
- hash
- A short value calculated from digital data that serves to distinguish it from other data.
- HighWire Press
- A publishing venture, from Stanford University Libraries, that provides electronic versions of journals, on behalf of learned and professional societies.
- hit
- 1. An incoming request to a web server or other computer system.
2. In information retrieval, a document that is discovered in response to a query.
- home page
- The introductory page to a collection of information on the web.
- HTML (Hyper-Text Mark-up Language)
- A simple mark-up and formatting language for text, with links to other objects, used with the web.
- HTTP (Hyper-Text Transport Protocol)
- The basic protocol of the web, used for communication between browsers and web sites.
- hyperlink
- A network link from one item in a digital library or web site to another.
- ICPSR (Inter-university Consortium for Political and Social Research)
- An archive of social science datasets, based at the University of Michigan.
- identifier
- A string of characters that identifies a specific resource in a digital library or on a network.
- IETF (Internet Engineering Task Force)
- The body that coordinates the technological development of the Internet, including standards.
- InfoBus
- An approach to interoperability that uses proxies as interfaces between existing systems, developed at Stanford University.
- information discovery
- General term covering all strategies and methods of finding information in a digital library.
- information retrieval
- Searching a body of information for objects that match a search query.
- Informedia
- A research program and digital library of segments of video, based at Carnegie Mellon University.
- Inspec
- An indexing service for physics, engineering, computer science, and related fields.
- Internet
- An international network, consisting of independently managed networks using the TCP/IP protocols and a shared naming system. A successor to the ARPAnet.
- Internet RFC series
- The technical documentation of the Internet, provided by the Internet Engineering Task Force. Internet Drafts are preliminary versions of RFCs.
- interoperability
- The task of building coherent services for users from components that are technically different and independently managed.
- inverted file
- A list of the words in a set of documents and their locations within those documents; an inverted list is the list of locations for a given word.
- item
- A specific piece of material in a digital library; a single instance or copy of a manifestation.
- Java
- A programming language used for writing mobile code, especially for user interfaces, developed by Sun Microsystems.
- JavaScript
- A scripting language used to embed executable instructions in a web page.
- JPEG
- A format for storing compressed images.
- JSTOR
- A subscription service, initiated by the Andrew W. Mellon Foundation, to convert back runs of important journals and make them available to academic libraries.
- key
- A digital code used to encrypt or decrypt messages. Private key encryption uses a single, secret key. Dual key (public key) encryption uses two keys of which one is secret and one is public.
- legacy system
- An existing system, usually a computer system, that must be accommodated in building new systems.
- lexicon
- A linguistic tool with information about the morphological variations and grammatical usage of words.
- Lexis
- A legal information service, a pioneer of full-text information online.
- Los Alamos E-Print Archives
- An open-access site for rapid distribution of research papers in physics and related disciplines.
- manifestation
- Form given to an expression of a work, e.g., by representing it in digital form.
- MARC (Machine-Readable Cataloging)
- A format used by libraries to store and exchange catalog records.
- mark-up language
- Codes embedded in a document that describe its structure and/or its format.
- Medline
- An indexing service for research in medicine and related fields, provided by the National Library of Medicine.
- MELVYL
- A shared digital library system for academic institutions in California; part of the California Digital Library.
- Memex
- A concept of an online library suggested by Vannevar Bush in 1945.
- Mercury
- An experimental digital library project to mount scientific journals online at Carnegie Mellon University from 1987 to 1993.
- MeSH (Medical Subject Headings)
- A set of subject terms and an associated thesaurus used to describe medical research, maintained by the National Library of Medicine.
- metadata
- Data about other data, commonly divided into descriptive metadata such as bibliographic information, structural metadata about formats and structures, and administrative metadata, which is used to manage information.
- migration
- Preservation of digital content, where the underlying information is retained but older formats and internal structures are replaced by newer.
- MIME (Internet Media Type)
- A scheme for specifying the data type of digital material.
- mirror
- A computer system that contains a duplicate copy of information stored in another system.
- mobile code
- Computer programs or parts of programs that are transmitted across a network and executed by a remote computer.
- morphology
- Grammatical and other variants of words that are derived from the same root or stem.
- Mosaic
- The first widely-used web browser, developed at the University of Illinois.
- MPEG
- A family of formats for compressing and storing digitized video and sound.
- multimedia
- A combination of several media types in a single digital object or collection, e.g., images, audio, video.
- natural language processing
- Use of computers to interpret and manipulate words as part of a language.
- NCSTRL (Networked Computer Science Technical Reports Library)
- An international distributed library of computer science materials and services, based at Cornell University.
- Netlib
- A digital library of mathematical software and related collections.
- NSF (National Science Foundation)
- U.S. government agency that supports science and engineering, including digital libraries research.
- object
- A technical computing term for an independent piece of computer code with its data. Hence, object-oriented programming, and distributed objects, where objects are connected over a network.
- object model
- A description of the structural relationships among components of a library object including its metadata.
- OCLC (Online Computer Library Center)
- An organization that provides, among other services, a bibliographic utility for libraries to share catalog records.
- OPAC (online public access catalog)
- An online library catalog used by library patrons.
- open access
- Resources that are openly available to users with no requirements for authentication or payment.
- optical character recognition
- Automatic conversion of text from a digitized image to computer text.
- Pad++
- An experimental user interface for access to large collections of information, based on semantic zooming.
- page description language
- A system for encoding documents that precisely describes their appearance when rendered for printing or display.
- PDF (Portable Document Format)
- A page description language developed by Adobe Corporation to store and render images of pages.
- peer review
- The procedure by which academic journal articles are reviewed by other researchers before being accepted for publication.
- Perseus
- A digital library of hyperlinked sources in classics and related disciplines, based at Tufts University.
- policy
- A rule established by the manager of a digital library that specifies which users should be authorized to have what access to which materials.
- port
- A method used by TCP to specify which program running on a computer should process a message arriving over the Internet.
- PostScript
- A programming language to create graphical output for printing, used as a page description language.
- precision
- In information retrieval, the percentage of hits found by a search that satisfy the request that generated the query.
- presentation profile
- Guidelines associated with a digital object that suggest how it might be presented to a user.
- protocol
- A set of rules that describe the sequence of messages sent across a network, specifying both syntax and semantics.
- proxy
- A computer that acts as a bridge between two computer systems that use different standards, formats, or protocols.
- publish
- To make information available and distribute it to the public.
- PURL (Persistent URL)
- A method of providing persistent identifiers using standard web protocols, developed by OCLC.
- query
- A textual string, possibly structured, that is used in information retrieval, the task being to find objects that match the words in the query.
- ranked searching
- Methods of information retrieval that return a list of documents, ranked in order of how well each matches the query.
- RDF (Resource Description Framework)
- A method for specifying the syntax of metadata, used to exchange metadata.
- RealAudio
- A format and protocol for compressing and storing digitized sound, and transmitting it over a network to be played in real time.
- recall
- In information retrieval, the percentage of the items in a body of material that would satisfy a request and are actually found by a search.
- refresh
- To make an exact copy of data from older media to newer for long-term preservation.
- render
- To transform digital information in the form received from a repository into a display on a computer screen, or for other presentation to the user.
- replication
- Making copies of digital material for backup, performance, reliability, or preservation.
- repository
- A computer system used to store digital library collections and disseminate them to users.
- RSA encryption
- A method of dual key (public key) encryption.
- scanning
- Method of conversion in which a physical object, e.g., a printed page, is represented by a digital grid of pixels.
- search term
- A single term within a query, usually a single word or short phrase.
- secondary information
- Information sources that describe other (primary) information, e.g., catalogs, indexes, and abstracts; used to find information and manage collections.
- security
- Techniques and practices that preserve the integrity of computer systems, and digital library services and collections.
- server
- Any computer on a network, other than a client, that stores collections or provides services.
- SGML (Standard Generalized Markup Language)
- A system for creating mark-up languages that represent the structure of a document.
- SICI (Serial Item and Contribution Identifier)
- An identifier for an issue of a serial or an article contained within a serial.
- speech recognition
- Automatic conversion of spoken words to computer text.
- STARTS
- An experimental protocol for use in distributed searching, which enables a client to combine results from several search engines.
- stemming
- In information retrieval, reduction of morphological variants of a word to a common stem.
- stop word
- A word that is so common that it is ignored in information retrieval. A set of such words is called a stop list.
- structural type
- Metadata that indicates the structural category of a digital object.
- style sheet
- A set of rules that specify how mark-up in a document translates into the appearance of the document when rendered.
- subscription
- In a digital library, a payment made by a person or an organization for access to specific collections and services, usually for a fixed period, e.g., one year.
- subsequent use
- Use made of digital materials after they leave the control of a digital library.
- tag
- A special string of characters embedded in marked-up text to indicate the structure or format.
- TCP/IP
- The base protocols of the Internet. IP uses numeric IP addresses to join network segments; TCP provides reliable delivery of messages between networked computers.
- TEI (Text Encoding Initiative)
- A project to represent texts in digital form, emphasizing the needs of humanities scholars. Also the DTD used by the program.
- TeX
- A method of encoding text that precisely describes its appearance when printed, especially good for mathematical notation. LaTeX is a version of TeX.
- thesaurus
- A linguistic tool that relates words by meaning.
- Ticer Summer School
- A program at Tilburg University to educate experienced librarians about digital libraries.
- Tipster
- A DARPA program of research to improve the quality of text processing methods, including information retrieval.
- transliteration
- A systematic way to convert characters in one alphabet or phonetic sounds into another alphabet.
- TREC (Text Retrieval Conferences)
- Annual conferences in which methods of text processing are evaluated against standard collections and tasks.
- truncation
- Use of the first few letters of a word as a search term in information retrieval.
- Tulip
- An experiment in which Elsevier Science scanned materials science journals and a group of universities mounted them on local computers.
- UDP
- An Internet protocol that transmits data packets without guaranteeing delivery, ordering, or retransmission of lost packets.
- Unicode
- A 16-bit code to represent the characters used in most of the world's scripts. UTF-8 is an alternative encoding in which one or more 8-bit bytes represent each Unicode character.
- union catalog
- A single catalog that contains records about materials in several collections or libraries.
- URL (Uniform Resource Locator)
- A reference to a resource on the Internet, specifying a protocol, a computer, a file on that computer, and parameters. An absolute URL specifies a location as a domain name or IP address; a relative URL specifies a location relative to the current file.
- URN (Uniform Resource Name)
- Location-independent names for Internet resources.
- WAIS
- An early version of Z39.50, used in digital libraries before the web, now largely obsolete.
- Warwick Framework
- A general model that describes the various parts of a complex object, including the various categories of metadata.
- watermark
- A code embedded into digital material that can be used to establish ownership; it may be visible or invisible to the user.
- web crawler
- A web indexing program that builds an index by following hyperlinks continuously from web page to web page.
- webmaster
- A person who manages web sites.
- web search services
- Commercial services that provide searching of the web, including: Yahoo, Altavista, Excite, Lycos, Infoseek, etc.
- web site
- A collection of information on the web; usually stored on a web server.
- Westlaw
- A legal information service provided by West Publishing.
- World Wide Web (web)
- An interlinked set of information sources on the Internet, and the technology they use, including HTML, HTTP, URLs, and MIME.
- World Wide Web Consortium (W3C)
- An international consortium based at M.I.T. that coordinates technical developments of the web.
- work
- The underlying intellectual abstraction behind some material in a digital library.
- Xerox Digital Property Rights Language
- Syntax and rules for expressing rights, conditions, and fees for digital works.
- XSL (eXtensible Style Language)
- System of style sheets for use with XML, derived from CSS.
- XML (eXtensible Mark-up Language)
- A simplified version of SGML intended for use with online information.
- Z39.50
- A protocol that allows a computer to search collections of information on a remote system, create sets of results for further manipulation, and retrieve information; mainly used for bibliographic information.
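
Several of the information retrieval terms defined above (inverted file, stop word, Boolean searching, precision, and recall) can be illustrated together in a few lines of Python. The following is a minimal sketch rather than a description of any particular system: the function names, the tiny stop list, the three sample documents, and the set of "relevant" documents are all assumptions made up for the illustration. Precision and recall are returned as fractions rather than percentages.

```python
from collections import defaultdict

# Illustrative stop list (an assumption; real stop lists are much longer).
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def tokenize(text):
    """Lower-case the text, split on whitespace, and drop stop words."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def build_inverted_file(documents):
    """Map each word to its inverted list of (doc_id, position) pairs.

    Positions are counted after stop words have been removed.
    """
    inverted = defaultdict(list)
    for doc_id, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            inverted[word].append((doc_id, position))
    return inverted


def boolean_and(inverted, terms):
    """Return the ids of documents that contain every search term."""
    doc_sets = [
        {doc_id for doc_id, _ in inverted.get(term.lower(), [])}
        for term in terms
    ]
    return set.intersection(*doc_sets) if doc_sets else set()


def precision_recall(hits, relevant):
    """Precision = relevant hits / all hits; recall = relevant hits / all relevant."""
    found = len(hits & relevant)
    precision = found / len(hits) if hits else 0.0
    recall = found / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sample collection.
documents = {
    "d1": "The catalog of the digital library",
    "d2": "A digital archive of scanned maps",
    "d3": "Library catalog records and metadata",
}

inverted = build_inverted_file(documents)
hits = boolean_and(inverted, ["digital", "library"])

# Assume a human judge decided that d1 and d3 satisfy the request.
precision, recall = precision_recall(hits, relevant={"d1", "d3"})
print(sorted(hits), precision, recall)
```

In this sample the query "digital and library" finds only d1, so every hit is relevant (precision 1.0), but one of the two relevant documents is missed (recall 0.5).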