Wikipedia:Identifying and using primary sources
This is an explanatory essay. This page provides additional information about concepts in the page(s) it supplements. This page is not one of Wikipedia's policies or guidelines, as it has not been thoroughly vetted by the community.
Identifying and using primary and secondary sources requires careful thought and some extra knowledge on the part of Wikipedia's editors.
In determining the type of source, there are three separate, basic characteristics to identify:
- Is this source self-published or not? (If so, then see the Verifiability policy on self-published sources.)
- Is this source independent or third-party, or is it closely affiliated with the subject?
- Is this source primary or not?
Every possible combination of these three traits has been seen in sources on Wikipedia. Any combination of these three traits can produce a source that is usable for some purpose in a Wikipedia article. Identifying these characteristics will help you determine how you can use these sources.
This page deals primarily with the last question: identifying and correctly using primary and non-primary sources.
Source classification in the real world
The concept of primary, secondary, and tertiary sources originated with the academic discipline of historiography. The point was to give historians a handy way to indicate how close the source of a piece of information was to the actual events.
Importantly, the concept developed to deal with "events", rather than ideas or abstract concepts. A primary source was a source that was created at about the same time as the event, regardless of the source's contents. So while a dictionary is a classic example of a tertiary source for the meanings of words, an ancient dictionary is actually a primary source—for the meanings of words in the ancient world.
There are no quaternary sources: Either the source is primary, or it describes, comments on, or analyzes primary sources (in which case, it is secondary), or it relies heavily or entirely on secondary sources (in which case, it is tertiary). The first published source for any given fact is always considered a primary source.
The historians' concept has been extended into other fields, with partial success.
Wikipedia is not the real world
Wikipedia does not use these terms exactly as academics do. There are at least two major definitions of secondary source in use on Wikipedia. This page deals primarily with the classification of reliable sources in terms of article content. The classification used specifically for notability is addressed in a separate section at the end.
How to classify a source
Imagine that an army conquered a small country 200 years ago, and you have the following sources:
- a proclamation of victory written at the time of the conquest,
- a diary written by someone who lived at the time and talks about it,
- a book, written 150 years later, that analyzes the proclamation,
- an academic journal article, written two years ago, that examines the diary, and
- an encyclopedia entry, written last year, that is based on both the book and the journal article.
Both the proclamation and the diary are primary sources. These primary sources have advantages: they were written at the time, and so are free of the opinions and fictions imposed by later generations. They also have disadvantages: the proclamation might contain propaganda designed to pacify the conquered country, or omit politically inconvenient facts, or overstate the importance of other facts, or be designed to stroke the new ruler's ego. The diary will reflect the prejudices of its author, and its author might be unaware of relevant facts.
The book and the journal article are secondary sources. These secondary sources have advantages: The authors were not involved in the event, so they have the emotional distance that allows them to analyze the events dispassionately. They also have disadvantages: The authors are writing about what other people said happened, and cannot use their own experience to correct any errors or omissions. The authors may be unable to see clearly past their own cultural lens, and as a result they may unconsciously emphasize things important to their own cultures and times, while overlooking things important to the actual actors.
The encyclopedia article is a tertiary source. It has advantages: it summarizes information. It also has disadvantages: in relying on secondary sources, the encyclopedia article will repeat, and may accidentally amplify, any distortions or errors in those sources. It may also add its own interpretation.
This sort of simple example is what the source classification system was intended to deal with. It has, however, been stretched to cover much more complicated situations.
Uses in fields other than history
In science, data is primary, and the first publication of any idea or experimental result is always a primary source. Narrative reviews, systematic reviews and meta-analyses are considered secondary sources, because they are based on and analyze or interpret (rather than merely citing) these original experimental reports.
In the fine arts, a work of art is always a primary source. This means that novels, plays, paintings, sculptures, and such are always primary sources. Statements made by or works written by the artists about their artwork might be primary or secondary. Critiques and reviews by art critics are secondary sources.
Not a matter of counting the number of links in the chain
Consider the simple example above: the original proclamation is a primary source. Is the book necessarily a secondary source?
The answer is: not always. If the book merely quotes the proclamation (such as re-printing a section in a sidebar or the full text in an appendix, or showing an image of the signature or the official seal on the proclamation) with no analysis or commentary, then the book is just a newly printed copy of the primary source, rather than being a secondary source. The text and images of the proclamation always remain primary sources.
It's not a matter of counting up the number of sources in a chain. The first published source is always a primary source, but it is possible to have dozens of sources without having any secondary or tertiary sources. If Alice writes down an idea, and Bob simply quotes her work, and Chris refers to Bob's quotation, and Daisy cites Chris, and so forth, you very likely have a string of primary sources, rather than one primary, one secondary, one tertiary, and all subsequent sources with made-up classification names.
Characteristics of a secondary source
- A secondary source is built from primary sources. Secondary sources are not required to provide you with a bibliography, but you should have some reason to believe that the source is building on the foundation of prior sources rather than starting with all-new material. For example, century-old love letters on display at a museum are primary sources; a secondary source might analyze the contents of these letters. The fact that the analysis is based on these letters would be evident from the description in the source, even if the source contained no footnotes.
- A secondary source is significantly separated from these primary sources. A reporter's notebook is an (unpublished) primary source, and the news story published by the reporter based on those notes is also a primary source. This is because the sole purpose of the notes in the notebook is to produce the news report. If a journalist later reads dozens of these primary-source news reports and uses those articles to write a book about a major event, then this resulting work is a secondary source. This separation is not defined by elapsed time or by geographical distance.
- A secondary source usually provides analysis, commentary, evaluation, context, and interpretation. It is this act of going beyond simple description, and telling us the meaning behind the simple facts, that makes secondary sources valuable to Wikipedia.
- Reputable secondary sources are usually based on more than one primary source. High-quality secondary sources often synthesize together multiple primary sources, in due proportion to the expert-determined quality of the primary sources. This helps us present the material in due proportion to the sources' actual importance, rather than in due proportion to the size of the sources' publicity budgets.
All sources are primary for something
Every source is the primary source for something, such as the name of its author, its title, or its date of publication. For example, no matter what kind of book it is, the copyright page inside the front of a book is a primary source for the date of the book's publication.
More importantly, many high-quality sources contain both primary and secondary material. A textbook might include commentary on the proclamation (which is secondary material) as well as the full text of the proclamation (which is primary material). A peer-reviewed journal article may begin by summarizing previously published work to place the new work in context (which is secondary material) before proceeding into a description of a novel idea (which is primary material). An author might write a book about an event that is mostly a synthesis of primary-source news stories (which is secondary material), but he might add occasional information about personal experiences or new material from recent interviews (which is primary material). The book about love letters might analyze the letters (which is secondary material) and provide a transcription of the letters in an appendix (which is primary material). The work based on previously published sources is probably a secondary source; the new information is a primary source.
"Secondary" is not another way to spell "good"
"Secondary" is not, and should not be, a bit of jargon used by Wikipedians to mean "good" or "reliable" or "usable". Secondary does not mean that the source is independent, authoritative, high-quality, accurate, fact-checked, expert-approved, subject to editorial control, or published by a reputable publisher. Secondary sources can be unreliable, biased, self-serving and self-published.
According to our content guideline on identifying reliable sources, a reliable source has the following characteristics:
- It has a reputation for fact-checking and accuracy.
- It is published by a reputable publishing house, rather than by the author(s).
- It is "appropriate for the material in question", i.e., the source is directly about the subject, rather than mentioning something unrelated in passing.
- It is a third-party or independent source.
- It has a professional structure in place for deciding whether to publish something, such as editorial oversight or peer review processes.
A primary source can have all of these qualities, and a secondary source may have none of them.
"Primary" is not another way to spell "bad"
"Primary" is not, and should not be, a bit of jargon used by Wikipedians to mean "bad" or "unreliable" or "unusable". While primary sources are less likely to be fully independent, they can be authoritative, high-quality, accurate, fact-checked, expert-approved, subject to editorial control and published by a reputable publisher. Primary sources can be reliable, and they can be used. However, there are limitations in what primary sources can be used for.
You are allowed to use primary sources... carefully
Material based on primary sources can be a valuable and appropriate addition to articles.
Primary sources may only be used on Wikipedia to make straightforward, descriptive statements that any educated person—with access to the source but without specialist knowledge—will be able to verify are directly supported by the source. This person does not have to be able to determine that the material in the article or in the primary source is True™. The goal is only that the person could compare the primary source with the material in the Wikipedia article, and agree that the primary source actually, directly says just what we're saying it does.
- Examples
- An article about the conquest of the hypothetical country above: The proclamation itself is an acceptable primary source for a simple description of the proclamation, including its size, whether it was written in blackletter calligraphy, whether it is signed or has an official seal, and what words, dates, or names were on it. Anyone should be able to look at an image of the proclamation and see that it was all written on one page, whether it used that style of calligraphy, and so forth. However, the proclamation's authenticity, meaning, relevance, importance, typicality, influences, and so forth should all be left to the book that analyzed it, not to Wikipedia's editors.
- An article about a novel: The novel itself is an acceptable primary source for information about the plot, the names of the characters, or other contents in the book: Any educated person can read Jane Austen's Pride and Prejudice and discover that the main character's name is Elizabeth. It is not an acceptable source for claims about the book's style, themes, foreshadowing, symbolic meaning, values, importance, or other matters of critical analysis, interpretation, or evaluation: No one will find a direct statement of this material in the book.
- An article about a painting: The painting itself is an acceptable primary source for information about the colors, shapes, and figures in the painting. Any educated person can look at Georgia O'Keeffe's Cow Skull: Red, White, and Blue, and see that it is a painting of a cow's skull on a background of red, white, and blue. It is not an acceptable source for claims about the artist's motivation, allusions or relationships to other works, the meaning of the figures in the painting, or any other matters of analysis, interpretation, or evaluation: Looking at the painting does not tell anyone why the artist chose these colors, whether she meant to evoke religious or patriotic sentiments, or what motivated the composition.
- An article about a business: The organization's own website is an acceptable (although possibly incomplete) primary source for information about what the company says about itself and for most basic facts about its history, products, employees, finances, and facilities. It is not likely to be an acceptable source for most claims about how it or its products compare to similar companies and their products (e.g., "OurCo's Foo is better than Brand X"), although it will be acceptable for some simple, objective comparison claims ("OurCo is the oldest widget business in Smallville" or "OurCo sells more widgets than anyone else in New Zealand"). It is never an acceptable source for claims that evaluate or analyze the company or its actions, such as an analysis of its marketing strategies (e.g., "OurCo's sponsorship of National Breast Cancer Month is an effective tool in expanding sales to middle-aged, middle-class American women").
Secondary sources for notability
One rough rule of thumb for identifying primary sources is this: if the source is noticeably closer to the event than you are, then it's a primary source. For example, if an event occurred on January 1, 1800, and a newspaper article appeared about it the next day, then Wikipedia (and all historians) considers the newspaper article a primary source.
However, Wikipedia fairly often writes about current events. As a result, an event may happen on Monday afternoon, may be written about in Tuesday morning's newspapers, and may be added to Wikipedia just minutes later. Many editors—especially those with no training in historiography—call these newspaper articles "secondary sources", by which they mean "please don't delete this article" sources.
Typically, very recent newspaper articles are mislabeled as "secondary sources" during AfD discussions, in an attempt to finesse the general notability guideline's requirement that secondary sources exist, when no true secondary sources actually exist. It is difficult, if not impossible, to find true secondary sources for run-of-the-mill events and breaking news. Editors are usually willing to overlook this error for recent events. However, once a couple of years have passed, if no true secondary sources can be found, the article is usually deleted.