PubMed Help

Appendices

Section Contents

How PubMed works: automatic term mapping

Cookies

Search Field Descriptions and Tags

MeSH Subheadings

MeSH Subheading hierarchies

PubMed Publication Types

Stopwords

MEDLINE display format

NLM author indexing policy

Grant code and institute abbreviations used in grant numbers

PubMed character conversions

Computation of Related Articles

PMID to PMC ID Converter

How PubMed works: automatic term mapping

Untagged terms that are entered in the search box are matched, in this order, against the MeSH (Medical Subject Headings) translation table, the Journals translation table, the Full Author translation table, the Author index, the Full Investigator (Collaborator) translation table, and the Investigator (Collaborator) index.

When a match is found for a term or phrase in a translation table, the mapping process is complete and does not continue to the next translation table.
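The first-match rule can be sketched in a few lines of Python. This is only an illustration: the table contents, the `TABLES` list, and the `first_match` helper are invented for the example, and PubMed's real tables and matching logic are not shown in this help text.

```python
from typing import Optional

# Tiny stand-in tables (hypothetical contents), listed in the lookup order
# described above.
TABLES = [
    ("MeSH translation table", {"child rearing", "toothache"}),
    ("Journals translation table", {"endocrine pathology"}),
    ("Full Author translation table", {"julia s wong"}),
    ("Author index", set()),
    ("Full Investigator (Collaborator) translation table", {"harry janes"}),
    ("Investigator (Collaborator) index", set()),
]


def first_match(term: str) -> Optional[str]:
    """Return the name of the first table containing the term, else None."""
    key = " ".join(term.lower().split())  # normalize case and spacing
    for name, entries in TABLES:
        if key in entries:
            return name  # mapping stops at the first hit
    return None


print(first_match("Child Rearing"))
print(first_match("endocrine pathology"))
```

Because the loop returns on the first hit, a term found in the MeSH table is never checked against the later tables in this sketch.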

1. MeSH translation table

The MeSH Translation Table contains:

  • MeSH terms
  • The See-Reference mappings (also known as entry terms) for MeSH terms
  • MeSH Subheadings
  • Publication Types
  • Pharmacologic action terms
  • Terms derived from the Unified Medical Language System (UMLS) that have equivalent synonyms or lexical variants in English
  • Supplementary concept (substance) names and their synonyms.

If a match is found in this translation table, the term will be searched as MeSH (that includes the MeSH term and any specific terms indented under that term in the MeSH hierarchy), and in all fields.

For example, if you enter child rearing in the search box, PubMed will translate this search to: "child rearing"[MeSH Terms] OR ("child"[All Fields] AND "rearing"[All Fields]) OR "child rearing"[All Fields]
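The shape of that translation can be reproduced with a small helper. The function name `translate_mesh_phrase` is hypothetical, and the sketch assumes the phrase has already been matched to a MeSH term; it only assembles the query string shown in the example.

```python
def translate_mesh_phrase(phrase: str, mesh_term: str) -> str:
    """Build the MeSH / ANDed-words / whole-phrase translation string."""
    words = phrase.split()
    parts = [f'"{mesh_term}"[MeSH Terms]']
    if len(words) > 1:
        # Each word of a multi-word phrase is also searched in all fields.
        anded = " AND ".join(f'"{w}"[All Fields]' for w in words)
        parts.append(f"({anded})")
    parts.append(f'"{phrase}"[All Fields]')
    return " OR ".join(parts)


print(translate_mesh_phrase("child rearing", "child rearing"))
```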

If you enter a MeSH term that is also a Pharmacologic Action, PubMed will search the term as [MeSH Terms], as [Pharmacologic Action], or in [All Fields].

If you enter an entry term for a MeSH term the translation will also include an all fields search for the MeSH term associated with the entry term. For example, a search for odontalgia will translate to: "toothache"[MeSH Terms] OR "toothache"[All Fields] OR "odontalgia"[All Fields] because Odontalgia is an entry term for the MeSH term toothache.
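As a rough sketch, the entry-term case can be written as a lookup followed by string assembly. The `ENTRY_TERMS` dictionary and the `translate_entry_term` function are assumptions made for illustration; only the odontalgia/toothache pair comes from the text above.

```python
# Hypothetical one-entry lookup: entry term -> MeSH term.
ENTRY_TERMS = {"odontalgia": "toothache"}


def translate_entry_term(entry_term: str) -> str:
    """Translate an entry term into its MeSH and all-fields searches."""
    key = entry_term.lower()
    mesh_term = ENTRY_TERMS[key]
    return (
        f'"{mesh_term}"[MeSH Terms] OR '
        f'"{mesh_term}"[All Fields] OR '
        f'"{key}"[All Fields]'
    )


print(translate_entry_term("odontalgia"))
```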

Substance name mappings do not include a mapping for individual terms in a phrase, e.g., IL-22 will not include IL[All Fields] AND 22[All Fields].

MeSH term mappings that include a standalone number or single character do not include a mapping for individual terms in a phrase, e.g., Protein C will not include Protein[All Fields] or C[All Fields].
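One possible reading of these two exceptions is a simple predicate that decides whether the individual words of a matched phrase are also searched. The function `maps_individual_words` and its `is_substance_name` flag are hypothetical; how PubMed actually detects these cases is not described here.

```python
def maps_individual_words(phrase: str, is_substance_name: bool = False) -> bool:
    """Return True if the phrase's words would also be searched one by one."""
    if is_substance_name:
        return False  # substance names are never broken into words
    tokens = phrase.split()
    if len(tokens) < 2:
        return False  # a single word has nothing to break apart
    # A standalone number or single character blocks the word-level mapping.
    return not any(t.isdigit() or len(t) == 1 for t in tokens)


print(maps_individual_words("child rearing"))
print(maps_individual_words("Protein C"))
print(maps_individual_words("IL-22", is_substance_name=True))
```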

More information about automatic term mapping:

  • Click Details to verify how your terms are translated. If you want to report a translation that does not seem accurate for your search topic, please e-mail the information to the NLM Help Desk.

2. Journals translation table

The Journals translation table contains the:

  • full journal title
  • title abbreviation
  • ISSN.

These will automatically map to the journal abbreviation that is used to search journals in PubMed and in all fields. For example, a search for endocrine pathology will translate to: "Endocr Pathol"[Journal] OR ("endocrine"[All Fields] AND "pathology"[All Fields]) OR "endocrine pathology"[All Fields]
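The journal translation follows the same pattern as the MeSH one, with the journal abbreviation in place of the MeSH term. The sketch below uses a hypothetical `JOURNALS` lookup and `translate_journal` function; the only real entry is the one taken from the example above.

```python
# Hypothetical lookup: full journal title -> title abbreviation.
JOURNALS = {"endocrine pathology": "Endocr Pathol"}


def translate_journal(query: str) -> str:
    """Map a journal title to its abbreviation plus all-fields searches."""
    key = query.lower()
    parts = [f'"{JOURNALS[key]}"[Journal]']
    words = key.split()
    if len(words) > 1:
        anded = " AND ".join(f'"{w}"[All Fields]' for w in words)
        parts.append(f"({anded})")
    parts.append(f'"{key}"[All Fields]')
    return " OR ".join(parts)


print(translate_journal("endocrine pathology"))
```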

3. Full Author translation table

The full author translation table includes full author names for articles published from 2002 forward, if available. Enter a full author name in natural or inverted order, e.g., julia s wong or wong julia s.

More information about full author searching:

  • A comma following the last name for searching is optional. For some names, however, it is necessary to distinguish which name is the last name by using the comma following the last name, e.g., james, ryan.
  • Omit periods after initials and put all suffixes at the end, e.g., vollmer charles jr
  • Initials and suffixes are not required. If you include a middle initial or suffix, you will retrieve only citations for articles that were published using the middle initial or suffix.
  • To distinguish author initials that may match a full author name, use the [fau] search tag, e.g., peterson do[fau].

4. Author index

If the term is not found in the above tables, except for Full Author, and is not a single term, PubMed checks the author index for a match. When combining multiple authors, to avoid a match with full author names, include initials or use the [au] search tag, e.g., ryan[au] james[au].

5. Full Investigator (Collaborator) translation table

The full investigator (collaborator) translation table includes full names, if available. Enter a full investigator name in natural or inverted order, e.g., harry janes or janes harry.

6. Investigator (Collaborator) index

If the term is not found in the above tables, except for Full Author, and is not a single term, PubMed checks the investigator index for a match.

7. If no match is found?

PubMed breaks apart the phrase and repeats the above automatic term mapping process until a match is found. PubMed ignores stopwords in searches.

If there is no match, the individual terms will be combined (ANDed) together and searched in all fields.
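The final fallback can be sketched as follows. The `STOPWORDS` set is a tiny illustrative subset rather than PubMed's actual stopword list, and `fallback_query` is a hypothetical name; the sketch covers only the last step, where nothing matched and the remaining words are ANDed together.

```python
# Illustrative subset only; the real stopword list is longer.
STOPWORDS = {"a", "of", "the", "in"}


def fallback_query(phrase: str) -> str:
    """AND together the non-stopword terms, each searched in all fields."""
    terms = [w for w in phrase.lower().split() if w not in STOPWORDS]
    return " AND ".join(f'"{t}"[All Fields]' for t in terms)


print(fallback_query("effects of xyzzy"))
```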

Cookies

A "cookie" is information stored by a Web site server (e.g., PubMed) on your computer. See the NLM Privacy Policy for additional information.

In the case of PubMed, it is information about your interactions that may be needed later to perform a function. Cookies allow PubMed to provide more interactive features such as Preview/Index, Clipboard, History, My NCBI and paging through results. Cookies placed by PubMed are removed from your computer after a set time period unless you choose to use a persistent cookie with the My NCBI automatic sign in function.

To use these interactive features you need to enable cookies on your computer. Please consult your browser's Help for information on enabling cookies.

If you have problems using cookie-dependent features of PubMed even after enabling cookies, possible reasons may include:

  • Cookies are blocked by your provider or institution. Check with your Internet provider and/or the system administrator at your institution to see if cookies can be accepted. Even if you have them enabled in your Web browser, if they are blocked by your provider or institution (e.g., by a firewall, proxy server, etc.), cookie-dependent features of PubMed won't work.
  • Your computer's date and time settings are incorrect. Check your computer's time settings to ensure that they are correct.

MeSH Subheadings

See the MeSH Subheadings and scope notes and allowable categories on the NLM website.

MEDLINE display format

The MEDLINE Display Format tags table defines the data tags that compose the PubMed MEDLINE display format. The tags are presented in alphabetical order. Some of the tags (e.g., CIN) are not mandatory and therefore will not be found in every PubMed MEDLINE display format. Other tags (e.g., AU, MH, RN) may occur multiple times in one record. This format is available for exporting citations into a reference management software program.
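A small parser shows how repeated tags such as AU can be collected from an exported record. This is a sketch under stated assumptions: it assumes each field starts with a tag of up to four characters followed by a hyphen, and that wrapped values continue on indented lines. That layout is not specified in this section, and the sample record and the `parse_medline_record` name are invented for illustration.

```python
from collections import defaultdict

# Invented sample record, not a real citation.
SAMPLE = """PMID- 12345
TI  - An invented title that wraps
      onto a second line.
AU  - Wong JS
AU  - Ryan J
MH  - Toothache
"""


def parse_medline_record(text: str) -> dict:
    """Group field values by tag; repeated tags yield several values."""
    fields = defaultdict(list)
    tag = None
    for line in text.splitlines():
        if len(line) >= 5 and line[4] == "-" and line[:4].strip():
            tag = line[:4].strip()              # assumed tag layout
            fields[tag].append(line[5:].strip())
        elif tag is not None and line.startswith(" ") and line.strip():
            fields[tag][-1] += " " + line.strip()  # continuation line
    return dict(fields)


record = parse_medline_record(SAMPLE)
print(record["AU"])
print(record.get("CIN", []))  # optional tags may simply be absent
```

Returning a list per tag keeps the handling uniform for tags that occur once and tags that repeat.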

Not all fields are searchable in PubMed. See Search Field Descriptions and Tags.

NLM Author Indexing Policy

NLM's author indexing policy is as follows:

  • 1966 - 1984: MEDLINE did not limit the number of authors.
  • 1984 - 1995: The NLM limited the number of authors to 10, with "et al" as the eleventh occurrence.
  • 1996 - 1999: The NLM increased the limit from 10 to 25. If there were more than 25 authors, the first 24 were listed, the last author was used as the 25th, and the twenty-sixth and beyond became "et al."
  • 2000 - Present: MEDLINE does not limit the number of authors.
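The limits in this list can be expressed as a short function. It is only a sketch: `indexed_authors` is a hypothetical helper, the year is taken to be the year the policy applied to the citation, and because 1984 appears in two of the ranges above, the code arbitrarily treats 1984 as falling under the 10-author limit.

```python
ET_AL = "et al"


def indexed_authors(authors: list, year: int) -> list:
    """Return the author list as it would appear under the stated limits."""
    # Assumption: 1984 is treated as part of the 10-author period.
    if 1984 <= year <= 1995 and len(authors) > 10:
        return authors[:10] + [ET_AL]
    if 1996 <= year <= 1999 and len(authors) > 25:
        # First 24 authors, then the last author as the 25th, then "et al".
        return authors[:24] + [authors[-1], ET_AL]
    return list(authors)  # no limit in the other periods


names = [f"Author{i}" for i in range(1, 31)]  # 30 invented authors
print(len(indexed_authors(names, 1990)))
print(indexed_authors(names, 1997)[-2:])
```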

Note:

Until 1990, only five transliterated (Japanese and Cyrillic) authors were included on each citation. Since 1990, the first ten transliterated authors have been entered. Chinese ideograms for co-authors are not transliterated at all if the journal lists only a single transliterated name in the table of contents.

PubMed Character Conversions

PubMed gives certain characters a special meaning in searches and converts others to spaces. See PubMed character conversions.

Computation of Related Articles

The neighbors of a document are those documents in the database that are the most similar to it. The similarity between documents is measured by the words they have in common, with some adjustment for document lengths. To carry out such a program, one must first define what a word is. For us, a word is basically an unbroken string of letters and numerals with at least one letter of the alphabet in it. Words end at hyphens, spaces, new lines, and punctuation. A list of 132 common, but uninformative, words (also known as stopwords) is eliminated from processing at this stage. Next, a limited amount of stemming of words is done, but no thesaurus is used in processing. Words from the abstract of a document are classified as text words. Words from titles are also classified as text words, but words from titles are added in a second time to give them a small advantage in the local weighting scheme. MeSH terms are placed in a third category, and a MeSH term with a subheading qualifier is entered twice, once without the qualifier and once with it. If a MeSH term is starred (indicating a major concept in a document), the star is ignored. These three categories of words (or phrases in the case of MeSH) comprise the representation of a document. No other fields, such as Author or Journal, enter into the calculations.
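The word definition and the double counting of title words can be sketched as below. The sketch makes several simplifying assumptions: it handles ASCII letters and digits only, uses a three-word stand-in for the 132-word stopword list, and skips stemming and MeSH terms entirely. The names `words` and `document_terms` are hypothetical.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[A-Za-z0-9]+")   # unbroken run of letters and numerals
STOPWORDS = {"in", "is", "the"}       # stand-in for the real stopword list


def words(text: str) -> list:
    """Split text into words that contain at least one letter."""
    tokens = TOKEN.findall(text.lower())
    return [
        t for t in tokens
        if any(c.isalpha() for c in t) and t not in STOPWORDS
    ]


def document_terms(title: str, abstract: str) -> Counter:
    """Count text words, adding title words a second time."""
    counts = Counter(words(abstract))
    title_words = words(title)
    counts.update(title_words)
    counts.update(title_words)  # the second pass favors title words
    return counts


print(words("IL-22 in the 5-HT2 receptor, 1999"))
print(document_terms("Toothache in children", "Toothache is common"))
```

Note that "IL-22" yields only "il" here: the hyphen ends the word, and "22" is dropped because it contains no letter.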

Having obtained the set of terms that represent each document, the next step is to recognize that not all words are of equal value. Each time a word is used, it is assigned a numerical weight. This numerical weight is based on information that the computer can obtain by automatic processing. Automatic processing is important because the number of different terms that have to be assigned weights is close to two million for this system. The weight or value of a term is dependent on three types of information: 1) the number of different documents in the database that contain the term; 2) the number of times the term occurs in a particular document; and 3) the number of term occurrences in the document. The first of these pieces of information is used to produce a number called the global weight of the term. The global weight is used in weighting the term throughout the database. The second and third pieces of information pertain only to a particular document and are used to produce a number called the local weight of the term in that specific document. When a word occurs in two documents, its weight is computed as the product of the global weight times the two local weights (one pertaining to each of the documents).

The global weight of a term is greater for the less frequent terms. This is reasonable because the presence of a term that occurred in most of the documents would really tell one very little about a document. On the other hand, a term that occurred in only 100 documents of one million would be very helpful in limiting the set of documents of interest. A word that occurred in only 10 documents is likely to be even more informative and will receive an even higher weight.
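The text does not give the global weight formula, so the sketch below substitutes a common inverse-document-frequency form, log(N / n), purely to illustrate the ordering described above. Both the formula and the `global_weight` name are assumptions, not PubMed's actual computation.

```python
import math


def global_weight(doc_count: int, total_docs: int) -> float:
    """Stand-in global weight: rarer terms score higher (log of N / n)."""
    return math.log(total_docs / doc_count)


N = 1_000_000
for n in (900_000, 100, 10):
    print(n, round(global_weight(n, N), 2))
```

With this stand-in, a term found in 10 documents outweighs one found in 100, which in turn far outweighs a term found in most of the database.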

The local weight of a term is the measure of its importance in a particular document. Generally, the more frequent a term is within a document, the more important it is in representing the content of that document. However, this relationship is saturating, i.e., as the frequency continues to go up, the importance of the word increases less rapidly and finally comes to a finite limit. In addition, we do not want a longer document to be considered more important just because it is longer; therefore, a length correction is applied. This local weight computation is based on the Poisson distribution and the formula can be found in Lin J and Wilbur WJ.
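The two properties named here, saturation and length correction, can be demonstrated with a simple stand-in function. To be clear, this is not the Poisson-based formula from Lin and Wilbur; the form k / (k + c · length / average length), the constant `c`, and the name `local_weight` are all assumptions chosen only because they show the same qualitative behavior.

```python
def local_weight(term_count: int, doc_length: int,
                 avg_length: float, c: float = 1.0) -> float:
    """Stand-in local weight that saturates below 1 and penalizes length."""
    length_factor = c * doc_length / avg_length
    return term_count / (term_count + length_factor)


# Gains shrink as the count rises, and the weight never reaches 1.
for k in (1, 2, 3, 10, 100):
    print(k, round(local_weight(k, 100, 100.0), 3))

# The same count in a document twice as long earns a lower weight.
print(round(local_weight(2, 200, 100.0), 3))
```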

The similarity between two documents is computed by adding up the weights (local wt1 × local wt2 × global wt) of all of the terms the two documents have in common. This provides an indication of how related two documents are. The resultant score is an example of a vector score. Vector scoring was originated by Gerard Salton and has a long history in text retrieval. The interested reader is referred to Salton, Automatic Text Processing, Reading, MA: Addison-Wesley, 1989 for further information on this topic. Our approach differs from other approaches in the way we calculate the local weights for the individual terms. Once the similarity score of a document in relation to each of the other documents in the database has been computed, that document's neighbors are identified as the most similar (highest scoring) documents found. These closely related documents are pre-computed for each document in PubMed so that when you select Related Articles, the system has only to retrieve this list. This enables a fast response time for such queries.
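The scoring and ranking steps can be sketched directly from this description. The weights below are invented numbers, and the names `similarity` and `neighbors` are hypothetical; the sketch takes the local and global weights as given rather than computing them.

```python
def similarity(doc1: dict, doc2: dict, global_wt: dict) -> float:
    """Sum local1 * local2 * global over the terms both documents share."""
    common = doc1.keys() & doc2.keys()
    return sum(doc1[t] * doc2[t] * global_wt[t] for t in common)


def neighbors(target: str, corpus: dict, global_wt: dict, top_n: int = 2) -> list:
    """Rank the other documents by similarity to the target, best first."""
    scores = [
        (similarity(corpus[target], local_wts, global_wt), doc_id)
        for doc_id, local_wts in corpus.items()
        if doc_id != target
    ]
    scores.sort(reverse=True)
    return [doc_id for _, doc_id in scores[:top_n]]


# Invented local weights per document and global weights per term.
GLOBAL = {"toothache": 2.0, "child": 1.0, "pain": 1.5}
CORPUS = {
    "doc1": {"toothache": 0.5, "child": 0.2},
    "doc2": {"toothache": 0.4, "pain": 0.9},
    "doc3": {"child": 0.3},
}

print(similarity(CORPUS["doc1"], CORPUS["doc2"], GLOBAL))
print(neighbors("doc1", CORPUS, GLOBAL))
```

In a precomputed setup like the one described, the output of a function such as `neighbors` would be stored per document so that a request only needs to read the saved list.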

