Bibliogram
A bibliogram is a verbal construct made when noun phrases from extended stretches of text are ranked high to low by their frequency of co-occurrence with one or more user-supplied seed terms. Each bibliogram has three components:
  • A seed term that sets a context.
  • Words that co-occur with the seed across some set of records.
  • Counts (frequencies) by which co-occurring words can be ordered high to low.
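
The construction described above can be sketched in a few lines of Python. This is a minimal sketch, not White's procedure or any vendor's implementation. It assumes that each record is represented as a set of index terms, and it counts a term once for each record that also contains the seed. Ties are broken alphabetically. The function name `bibliogram` and the sample records are hypothetical.

```python
from collections import Counter
from typing import Iterable


def bibliogram(records: Iterable[set[str]], seed: str) -> list[tuple[str, int]]:
    """Rank terms by how many records they share with the seed term.

    Assumption: each record is a set of index terms (descriptors,
    author names, journal names, ...). A term's count is the number of
    records that contain both it and the seed.
    """
    counts: Counter[str] = Counter()
    for record in records:
        if seed in record:
            # Count every other term in a record that contains the seed.
            counts.update(term for term in record if term != seed)
    # High to low by frequency; alphabetical tie-break keeps output stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical records: each set holds the index terms of one article.
    records = [
        {"Smith", "Creativity", "Creativity Tests"},
        {"Smith", "Creativity", "Divergent Thinking"},
        {"Smith", "Creativity", "Problem Solving"},
        {"Jones", "Creativity", "Anxiety"},
    ]
    for term, n in bibliogram(records, seed="Smith"):
        print(n, term)
```

Changing what the record sets contain (for example cited authors, co-authors, or journal names) produces the other kinds of bibliogram that the article describes.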


The term was introduced in 2005 by Howard D. White to name the linguistic object studied, but not previously named, in informetrics, scientometrics, and bibliometrics. The noun phrases in the ranking may be authors, journals, subject headings, or other indexing terms. The "stretches of text" may be a book, a set of related articles, a subject bibliography, a set of Web pages, and so on. Bibliograms are always generated from writings, usually from scholarly or scientific literatures.

As a family of term-frequency distributions, the bibliogram has frequently been written about under descriptions such as:
  • positive skew distribution
  • empirical hyperbolic
  • scale-free (see also Scale-free network)
  • power law
  • size frequency distribution
  • reverse-J
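
A rough check of the power-law description is to fit a straight line to log frequency against log rank, as in a Zipf plot. The sketch below applies ordinary least squares to the log-log points. This is a crude descriptive heuristic, and maximum-likelihood methods are generally preferred for rigorous power-law fitting. The function name `zipf_exponent` is hypothetical, and ranks are assigned 1, 2, ..., n without regard to ties.

```python
import math
from typing import Sequence


def zipf_exponent(frequencies: Sequence[float]) -> float:
    """Estimate s in f(r) ~ C / r**s from frequencies sorted high to low.

    Fits log f = log C - s * log r by ordinary least squares, with ranks
    r = 1, 2, ..., n. A rough descriptive fit, not a goodness-of-fit test.
    """
    if len(frequencies) < 2 or min(frequencies) <= 0:
        raise ValueError("need at least two positive frequencies")
    xs = [math.log(r) for r in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the fitted slope is -s


if __name__ == "__main__":
    # Hypothetical counts that follow f(r) = 60 / r exactly.
    print(round(zipf_exponent([60, 30, 20, 15, 12, 10]), 3))  # 1.0
```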


It is sometimes called a "core and scatter" distribution. The "core" consists of relatively few top-ranked terms that account for a disproportionately large share of co-occurrences overall. The "scatter" consists of relatively many lower-ranked terms that account for the remaining share of co-occurrences. Usually the top-ranked terms are not tied in frequency, but identical frequencies and tied ranks become more common as the frequencies get smaller. At the bottom of the distribution, a long tail of terms is tied in rank because each co-occurs with the seed term only once.
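
The core-and-scatter split and the tied ranks can be illustrated with a short sketch. The cutoff used here, which takes as the core the fewest top-ranked terms that together reach half of all co-occurrences, is a hypothetical choice for illustration and is not a definition from the literature. Ranks use standard competition ranking ("1224"), so tied terms share a rank. The function names and the data are hypothetical.

```python
def competition_ranks(freqs: list[int]) -> list[int]:
    """Standard competition ("1224") ranks for frequencies sorted high to low."""
    ranks: list[int] = []
    for i, f in enumerate(freqs):
        if i > 0 and f == freqs[i - 1]:
            ranks.append(ranks[-1])  # tied with the previous term
        else:
            ranks.append(i + 1)      # 1 + number of terms ranked above
    return ranks


def split_core_scatter(ranked, core_share=0.5):
    """Split a ranked bibliogram into (core, scatter).

    `ranked` is a list of (term, count) pairs sorted high to low. The core
    is the shortest top-ranked prefix whose counts reach `core_share` of
    all co-occurrences (a hypothetical cutoff, not a standard one).
    """
    total = sum(n for _, n in ranked)
    running = 0
    for i, (_, n) in enumerate(ranked):
        running += n
        if running >= core_share * total:
            return ranked[: i + 1], ranked[i + 1 :]
    return ranked, []


if __name__ == "__main__":
    # Hypothetical bibliogram: (term, co-occurrence count), high to low.
    ranked = [("a", 6), ("b", 4), ("c", 3), ("d", 2),
              ("e", 2), ("f", 1), ("g", 1), ("h", 1)]
    core, scatter = split_core_scatter(ranked)
    print("core:", [t for t, _ in core])                          # ['a', 'b']
    print("ranks:", competition_ranks([n for _, n in ranked]))    # [1, 2, 3, 4, 4, 6, 6, 6]
    print("singleton tail:", [t for t, n in ranked if n == 1])    # ['f', 'g', 'h']
```

The cutoff is applied to a running total, so it can fall inside a group of tied terms. A fuller treatment might keep tied terms together on the same side of the split.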

In most cases bibliograms can be described by power laws such as Zipf's law and Bradford's law. In this regard, they have long been studied by mathematicians and statisticians in information science. However, these treatments typically ignore the qualitative meanings of the ranked terms themselves, which are often of interest in their own right. For example, the following bibliogram was made with an author's name as seed and shows the descriptors that co-occur with her name in the ERIC database. The descriptors are ranked by how many of her articles they were used to index:

6 Creativity
4 Creativity Tests
3 Divergent Thinking
2 Elementary School Mathematics
2 Instruction
2 Mathematics Education
2 Problem Solving
2 Research
2 Time
1 Acceleration
1 Anxiety
1 Beginning Teachers
1 Behavioral Objectives
1 Child Development
1 Classroom Techniques
1 Cognitive Development
etc.

This author is a researcher in education, and the terms profile her intellectual interests over the years. In general, bibliograms can be used to:
  • suggest additional terms for search strategies
  • characterize the work of scholars, scientists, or institutions
  • show who an author cites over time
  • show who cites an author over time
  • show the other authors with whom an author is co-cited over time
  • show the subjects associated with a journal or an author
  • show the authors, organizations, or journals associated with a subject
  • show library classification codes associated with subject headings and vice versa
  • show the popularity of items in the collections of libraries
  • model the structure of literatures with title terms, descriptors, author names, or journal names


Bibliograms can be created with the RANK command on Dialog (other vendors have similar commands), ranking options within WorldCat, HistCite, Google Scholar, and inexpensive content analysis software.

White suggests that bibliograms have a parallel construct in what he calls associograms. These are the rank-ordered lists of word association norms studied in psycholinguistics. They are similar to bibliograms in statistical structure but are not generated from writings. Rather, they are generated by presenting panels of people with a stimulus term (which functions like a seed term) and tabulating the words they associate with the seed by frequency of co-occurrence. They are currently of interest to information scientists as a nonstandard way of creating thesauri for document retrieval.
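
The tabulation step for an associogram can be sketched as follows. The sketch assumes that each respondent gives one word per stimulus. It also normalizes case and surrounding whitespace before counting, which is another assumption. The function name and the sample responses are hypothetical, invented for illustration rather than taken from any published association norms.

```python
from collections import Counter


def associogram(responses: list[str]) -> list[tuple[str, int]]:
    """Rank the words a panel gave for one stimulus term, high to low."""
    # Assumed normalization: ignore case and surrounding whitespace.
    counts = Counter(word.strip().lower() for word in responses)
    # Sort by frequency, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical responses from five panel members to the stimulus "bread".
    panel = ["butter", "Butter ", "jam", "loaf", "butter"]
    print(associogram(panel))  # [('butter', 3), ('jam', 1), ('loaf', 1)]
```

Apart from the normalization, the counting is the same as for a bibliogram. The difference lies in where the co-occurrences come from: panel responses instead of written records.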

Examples

Other examples of bibliograms are the ordered set of an author's co-authors, or the list of authors published in a specific journal together with their numbers of articles. A popular example is the list of additional titles suggested for purchase when you search for an item on Amazon. These suggested titles are the top terms in the "core" of a bibliogram formed with your search term as seed. The frequencies are counts of the times they have been co-purchased with the seed.

Examples of associograms may be found in the Edinburgh Associative Thesaurus.

Other methods

Related but distinct methods are used in data clustering and data mining.
Google Sets (see http://questsin.blogspot.com/2005/06/in-search-for-answers-another.html for a description of its algorithm) also creates lists of terms associated with a given set of terms.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.