This is a well-bandied statistic on the internet - "80% of technical information is found only in patents". (the corollary being, "therefore you need to search patents" - no argument here)   But I'm trying to find some definitive proof for this, the piece of research that everyone is apparently citing, because nobody seems to actually give the citation.

I can find presentations on the EPO website giving this figure as gospel, but with no research citation to support it.  Everyone else seems to be quoting the EPO, and even ripping off their slides, which is an interesting take on copyright protection!  I can find nothing on the WIPO website, or the European Commission website (other than EPO stuff), but their search engines aren't really letting me look for what I want to look for.  So where did this 80% figure come from?  And how do we know it's true?

If anyone can point me at original research proving the point, I'd be grateful. It's for a potential article for a general information professional audience (non-patent specialists), so they would expect proper citations if I'm spouting 'facts' at them.





  1. I guess that it was probably true 15-20 or more years ago. I already remember that statement from 2001-2002, when Espacenet was just taking its first steps.

    Today the amount of any kind of information grows exponentially and the availability of information is faster, wider and easier. 

    For example, take YouTube, Facebook, SlideShare, LinkedIn, etc. - very often the information about technology X is published in social media a long time before the patent application regarding technology X is published.

    1. Margaret, you might have a look at the "Eighth Technology Assessment and Forecast Report" from the USPTO of 1977, see . Starting from page 23, the chapter "The Uniqueness of Patents as a Technological Resource" might help you.


      1. Thank you, Michael. That's the sort of thing I had in mind, but unfortunately it is rather too old to be cited in anything other than a historical context now. (60s & 70s)  I'll keep it in reserve though!

    2. Yes, good point.  I guess it would be hard to 'prove' any particular percentage of uniqueness within patent information, given the fast-changing data volumes, and the fast-changing information sharing technologies these days.  So even if 80% was true at one time, it may not be today.  But it still comes back to my original point - where did that 80% figure come from?  Have people just plucked a figure from the air?

  2. Dear Margaret,

    Here is something I found (I don't have actual copies of the sources, but you may have ways to get them):

    1) 1997: Title: Kompetenzorientierte, analysegestützte Technologiestrategieerarbeitung (dissertation; author: Matthias Ehrat)

    2) 1998: Patente im strategischen Marketing: Sicherung der Wettbewerbsfähigkeit durch systematische Patentanalyse und Patentnutzung (page 43) (or see the book: Patentbewertung: Ein Praxisleitfaden zum Patentmanagement)

    3) CAFC Judge NEWMAN also said something similar in "In re Alappat, 33 F.3d 1526"

    Hope that helps (smile)

  3. How would you even tell what percentage of the information in patents is duplicated elsewhere?

    I'm pretty sure I've seen some data from Chemical Abstracts showing that most compounds in the CA Registry have been mentioned in only one publication, and the vast majority of those publications were patents.  I have no idea when or where that was reported, but it could be the source of the recurrent claim of 80% only in patents.  If that's the source, the percentage of unique disclosures in patents ought to have increased after CA started indexing compound disclosures without physical properties.  But compounds in Registry aren't "technology," they're just a small slice of it.

    My own observation is that the amount of overlap between information published in patents and elsewhere is extremely technology dependent.  Some technologies have a well developed scholarly literature, and academic research is routinely published in journals.  Other technologies have few or no scholarly journals, so patents are the only places research results are published.  I've done searches where every reference I retrieve in CA is a patent, and that's just chemically related technology. There are searches for which no non-patent literature exists, especially in what they call the mechanical arts.

  4. The one piece of information I know is from the reference by Mucke, H. A. (2011). "Relating patenting and peer-review publications: an extended perspective on the vascular health and risk management literature." Vasc Health Risk Manag 7: 265-272. An excerpt from the abstract reads, “more than two-thirds of the patent applications had no such companion paper in a scientific journal.”

  5. Various majority percentages for the share of all chemical (or technical) information that is found in patents have been cited for decades.  I remember Derwent people citing them, and later CAS people.  I have no idea where the data comes from or who was the first to cite it.  As Edlyn says, it's probably dependent on the technology.

  6. It can hardly be disputed that the percentage of technical information/knowledge disclosed ONLY in patent documents varies depending on the specific technology.

    Personally, I also agree that, even so, it is far from clear that an exact percentage can be worked out.

    I think the question as raised by Margaret is probably ONLY for informational/curiosity purposes.

    Meanwhile, it would be interesting if we could track down the "first citation source", which would at least show that the 2/3 or 80 or 80-90 percent claim is not just a "rumor"...

  7. Thanks, everyone, for comments so far.  This query arose from me planning to write for non-patent information specialists, and obviously the first point to address is "why you need to consider searching patents".  The equally obvious answer is the usual one of "80% of 'stuff ' is found only in patents.."  but then the scientist in me waved her hand and asked "How do they know that? Where's the proof?"

    Hence my turning to this expert community, 'cause if you guys don't know the source, it can't be true!  (smile)      And I agree ... how could you 'prove' something like that anyway?

    I'll check out the things people have cited for me so far, but it's not life and death, it really was just fact-checking and curiosity.  I may just hedge my bets in the end, and say something like "most technical information - depending on area - may be found only in patents..."

  8. I use the format "It is generally accepted that ......a significant volume of technical and scientific information is only available through patent publications"


  9. Hi Margaret,

    The reference I have always quoted is the one mentioned already by Michael Schwantner: ‘8th USPTO Technology Assessment and Forecast Report. Section II: “The uniqueness of patents as a technological resource”, USPTO 1977’

    My notes on this report are as follows:

    In 1977 the USPTO Office of Technology Assessment and Forecast published two studies on this topic which are often referred to. The report was the Eighth Technology Assessment and Forecast Report. Section II: The uniqueness of patents as a technological resource, Part I, Patents as a technological resource, Part II, The Patent File as a technological resource.

    To summarize: The first study seeks an answer to the question: How much technology disclosed in patents is reported in the non-patent literature? The project employed a random sample of 435 U.S. patents, each of which was extensively searched in the non-patent literature. Conclusion: "The results of this study show that about 8 out of 10 U.S. patents contain technology not disclosed in the non-patent literature." (p. 37)

    Hope this helps,


  10. After a discussion in the WON Board about whether this number, which is cited so often, is true, Mervyn Bregonje (WON board member at that time) investigated this in the CAS databases and presented his findings at a WON meeting. Afterwards, this was also published in World Patent Information in December 2005 with the title "Patents: A unique source for scientific technical information in chemistry related industry?" (DOI). This study shows that even within the chemical area the numbers vary per subarea. For instance, Mervyn found that for polymers 60-70% of the first publications are patent publications, whereas for metallocenes it is only 5-15%.

    Mervyn cites among others Terapane et al. who "concluded, after he carried out a random sample of 435 mainly US patents, that 70% of new technology was not disclosed later elsewhere, 13% partly and 16% was completely disclosed in non-patent literature later" (Terapane JF. A unique source of information. Chemtech 1978;8:272–6.). I assume that this is the same publication that Bob refers to although the conclusion seems to be slightly different.


    1. Thanks for your "due diligence".  Interesting that so little metallocene information is published first in patents, since metallocene catalysts are used extensively in polyolefin production.  Maybe the figure refers to new compositions rather than further extension of utility.

      I recall that sometime in the late 70s various information specialists, including vendors of databases and systems, started pitching patents as an essential source of chemical (and other) information.  This prompted me to look through my collection of books on searching for information, especially chemical, e.g. Maizell, Wiggins, etc.  Most referred to the value of chemical patent information as often being unique, but no percentages were given.  I would have hoped that representatives (especially "veterans") from CAS, Derwent, etc. would have chimed in on this string.

  11. Excellent, thank you!  I'm sure there's more than enough in the trail above to meet my modest needs.  Many thanks to everyone for your assistance.

  12. I too have a hard time believing that high stat for technical information in patents. I run many searches where I know that the information I need will be in literature and not in patents. So, it's likely dependent on the technology and its 'newness'.

  13. Chanced upon a WIPO document which says "80% of the disclosures in patents are not published in any other form". This seems more plausible as this is measurable. Apparently, USPTO did exactly this assessment.

  14. Dear All,


    please be careful with mixing up the different numbers.

    The statement of Margaret that started this page was:

    "80% of technical information is found only in patents"


    The probable source for this 80% figure as suggested by Michael Schwantner is:

    "Eighth Technology Assessment and Forecast Report" from the USPTO of 1977

    which, according to Bob Stembridge states:

    "The results of this study show that about 8 out of 10 U.S. patents contain technology not disclosed in the non-patent literature." (p. 37)


    This seems to support the 80% figure, but unfortunately it does not tell us what percentage of the information published in non-patent publications is not made available in patents, so it does not support the exact statement as quoted by Margaret.


    Bettina de Jong points to a publication by Mervyn Bregonje in WPI 2005 and correctly summarises that it studied the sources of first publication for several classes of chemicals. Bob Buntrock correctly mentions the surprising number for metallocenes and concludes : "Maybe the figure refers to new compositions rather than further extension of utility".


    The method as used for this study was the following:

    For the registry entries in CAS Registry that fit a defined substructure query, statistics were run over the document type of the documents in Chem. Abstr. that initiated the creation of found registry numbers.

    So indeed it could be that for the metallocenes most of the publications and most of the technical information is made available via the patent publications, but the majority of the first descriptions of new metallocenes apparently is in non-patent publications.


    This summary is intended to be a reminder to all readers to be careful when using statistics. Always check how the analysed dataset was compiled and what was actually analysed.

    We see many end-user tools being offered to management or other less informed staff, promising interesting analysis of and insight into patent information. It is our task to stay sharp and warn of the limitations of such tools.



    Perhaps another message to include in your paper: "know what you quote"

  15. Dear All,

    It's really amazing that we still use figures that are more than 35 years old, and with a bias we cannot ignore, because the data relate only to U.S. patents.

    Margaret's concern was also shared a few years ago on the IPKat blog.

  16. I once tried to work out the comparable number of new scientific publications published every year - and found a figure of 1.35 million - see 

    "Bjork, B et al, Scientific journal publishing – yearly volume and open access availability, Information Research 14(1) March, 2009"

    So the ~ 2 million patents published annually are ahead of scientific papers, although these figures say nothing about the considerable overlap between the two.

    What the quoted '80%' figure also ignores is technical information 'published' in new products etc. Given all of the above, I find it very hard to believe.


    Speaking of urban legends, I once looked up the quote routinely attributed to Henry Ford: “If I had asked people what they wanted, they would have said faster horses.”

    There is no evidence that he said that either:

    But one quote that does appear to be attributed properly is this one:

    “In God we trust; all others bring data.” by Deming - and which appears to be very relevant to this conversation.


    1. > I once looked up the quote routinely attributed to Henry Ford: “If I had asked people what they wanted, they would have said faster horses.”

      Reminded me of a quote from the Steve Jobs book [1]:

      In his Apple office, 1982: Asked if he wanted to do market research, he said, “No, because customers don’t know what they want until we’ve shown them.”

        [1] Isaacson, W. (2011). Steve Jobs. New York: Simon & Schuster.



  17. This is a partial layman's proof by contradiction that the statement "80% of technical information is found only in patents" [1] is false - sorry for the bummer.

    Section 1. Most common words in English [2]

    Rank 1 = the
    Rank 2 = be
    Rank 3 = to
    Rank 4 = of
    Rank 5 = and
    Rank 6 = a


    Section 2. From Non-Patent Technical Databases

    [D1a] ScienceDirect
    a = 12,197,919 (4 Apr 2014)
    the / be / to / of / and = Invalid keyword (4 Apr 2014)

    [D1b] EbscoHost
    a = 17,001,836

    [D1c] JSTOR
    a = 5,721,803

    [D1d] ProQuest Dissertation & Theses
    a = 1,613,284

    [D1e] Web of Science (1900 - 25 Feb 2015)
    the = 35,101,530
    a = 28,090,911

    [D1f] SpringerLink
    a = 8,057,690

    [D1g] ProQuest Medline
    a = 19,266,736

    [D1h] Inspec
    a = 13,334,341

    [D1i] World Wide Science
    a = 384,462,717

    [D1j] Networked Digital Library of Theses and Dissertations
    a = 3,890,354

    [D1k] Microsoft Academic Search
    a = 28,881,001

    [D1l] Google Scholar
    a = 8,790,000


    2.1. D1a + D1b + D1c + D1d + D1e + D1f + D1g + D1h + D1j + D1k + D1l = 127,589,878

    2.2 If there is a way to do a unique(D1a, D1b, D1c, D1d, D1e, D1f, D1g, D1h, D1j, D1k, D1l), that may give us a lower limit in terms of total no. of non-patent technical papers.
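
    The tally in 2.1 and the de-duplication idea in 2.2 can be sketched roughly as follows. This is only an illustration: the counts are the "a" hit counts listed in Section 2 (D1i left out, as in 2.1), the helper names `sum_counts` and `unique_records` and the toy identifiers are hypothetical, and a real de-duplication would need a shared record identifier such as a DOI, which these hit counts do not provide. Because some counts were taken on different dates, the recomputed sum need not match the total quoted in 2.1.

```python
# Hit counts for the keyword "a" as listed in Section 2 (D1i is left out, as in 2.1).
NON_PATENT_COUNTS = {
    "D1a ScienceDirect": 12_197_919,
    "D1b EbscoHost": 17_001_836,
    "D1c JSTOR": 5_721_803,
    "D1d ProQuest Dissertation & Theses": 1_613_284,
    "D1e Web of Science": 28_090_911,
    "D1f SpringerLink": 8_057_690,
    "D1g ProQuest Medline": 19_266_736,
    "D1h Inspec": 13_334_341,
    "D1j NDLTD": 3_890_354,
    "D1k Microsoft Academic Search": 28_881_001,
    "D1l Google Scholar": 8_790_000,
}


def sum_counts(counts):
    """Naive total: a record held by several databases is counted several times."""
    return sum(counts.values())


def unique_records(*record_sets):
    """Union of record identifiers across databases (hypothetical: needs e.g. DOIs)."""
    merged = set()
    for records in record_sets:
        merged |= set(records)
    return merged


naive_total = sum_counts(NON_PATENT_COUNTS)

# Toy identifiers only, to show why the naive total overstates unique records.
db_x = {"doi:1", "doi:2", "doi:3"}
db_y = {"doi:2", "doi:3", "doi:4"}
unique_total = len(unique_records(db_x, db_y))

print(naive_total)
print(len(db_x) + len(db_y), unique_total)
```

    In the toy example the two databases hold six records between them but only four distinct ones, which is the kind of overlap that 2.2 is asking about.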


    Section 3. From Patent Databases

    [D2a] (1782 - 4 April 2014)
    the = 46,780,657
    be = 21,220,425
    to = 41,719,897
    of = 46,975,038
    a = 51,288,486 (as at 25 Feb 2015)
    and = 46,572,567

    [D2b] PatentScope
    a = 27,022,583

    [D2c] FPO
    a = 31,106,638

    Section 4. From Search Engines

    a = 25,270,000,000 (as at 25 Feb 2015)

    a = 32,500,000,000


    Section 5. Yeap's Conjecture

    Yeap's conjecture states that there are at least twice as many non-patent technical publications as patents.

    Interpretation 1: Less than 33.3% of technical information is found only in patents.
    Interpretation 2: More than 66.6% of technical information is found in non-patent publications.
    Interpretation 3. The statement that "80% of technical information is found only in patents" is false.
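
    The arithmetic behind Interpretations 1 to 3 can be sketched as below. It rests on assumptions that should be stated loudly: a count of documents is treated as a proxy for an amount of "technical information", and no document is counted on both sides. The function name `patent_only_share_upper_bound` is hypothetical.

```python
from fractions import Fraction


def patent_only_share_upper_bound(patents, non_patents):
    """Upper bound on the share of technical documents that are patents.

    Assumption: document counts stand in for 'technical information'. Even if
    every patent held information found nowhere else, the patent-only share
    could not exceed patents / (patents + non_patents).
    """
    return Fraction(patents, patents + non_patents)


# Under the conjecture, non-patent publications number at least twice the patents.
p = 1
bound = patent_only_share_upper_bound(p, 2 * p)

print(bound)                    # 1/3
print(bound < Fraction(8, 10))  # True
```

    With exactly twice as many non-patent publications the bound is one third; any larger ratio pushes it lower, which is where the "less than 33.3%" in Interpretation 1 comes from.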

    Section 6: Suggestions for the PIUG community

    1. To conduct the same search using the common English keywords ("a", "the", "be", "to", "of", "and") on other patent databases (in particular Thomson Innovation, TotalPatent & Patbase) and/or non-patent databases. Let us know your findings and we'll add to this list accordingly.
    2. Note that for the non-patent databases, we are limited to digital technical information only. If you have access to statistics relating to (a) paper-based technical information or (b) private technical information, among others, do keep us updated.
    3. Open to other novel approaches to prove or disprove the statement that 80% of technical information is found only in patents.



  18. It's good to see that there is some follow up research to the 8th Report (cited above). I believe the 80% quote that started this is an example of the "telephone game" - after 35+ years, some misinterpretation has crept in. What they actually found was that 80% of the technical information disclosed in patents is not disclosed anywhere else. The report actually goes further by saying that (in their sample), 90% were not significantly disclosed in the literature. (There were a few "partially disclosed" examples.)

    The 8th report is actually pretty interesting, containing 2 studies looking at technical information published in patents vs. what is disclosed in NPL, with both studies based solely on chemical literature (Chem Abs) and chemical patents. It's been a while since I read it, so I don't want to make up what I remember about the methodology, but I was impressed.  

    So even when the report was new, the point isn't that patents have 80% of what's out there, it was that 80%+ of what's in patents never gets published anywhere else. Still significant, but a slightly different slant.

    1. Donna,

      Many thanks for your clarification. What you say makes a lot of sense, is believable - and I might even start using this in my presentations. 

    2. 1. Having reviewed the source (TA & F 8th Report), I found that one of the conclusions made by the authors is: "8 out of 10 U.S. patents contain technology not disclosed in the non-patent literature." (highlighted in yellow)

      2. Noticed the tiny sample size in this research (highlighted in magenta).

      3. Consequently, the generalisation based on a tiny sample size that 80% of patents contain tech not disclosed in the non-patent literature is unsound - another bummer.
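
      One rough way to put a number on the sampling uncertainty is a normal-approximation confidence interval. The sketch below assumes the 435 patents were a simple random sample and uses the 80% figure and the sample size of 435 quoted earlier in this thread; the function name `proportion_confidence_interval` is hypothetical. The interval only covers sampling error: it says nothing about bias in how the sample was drawn, or about whether a 1977 result for U.S. patents still holds today.

```python
import math


def proportion_confidence_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) interval for a sample proportion.

    p_hat: observed proportion, n: sample size, z: critical value
    (1.96 corresponds to roughly 95% confidence).
    """
    margin = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_confidence_interval(0.80, 435)
print(f"{low:.3f} to {high:.3f}")
```

      Under those assumptions the interval comes out at roughly 76% to 84%, so whether the generalisation is sound depends more on how representative the sample was than on its size alone.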

    3. Hi Donna,

      I agree that we may have a "telephone game" situation here up till 6 Apr 2014.

      RS1, Reference Statement 1 (1977): "8 out of 10 U.S. patents contain technology not disclosed in the non-patent literature."
      Source: Baruch, J., Parker, L., Lawson, W., USPTO, 1978. Technology Assessment & Forecast: Eighth Report, December 1977. United States Patent and Trademark Office.

      RS2, Reference Statement 2 (2007): "up to 80% of current technical knowledge can only be found in patent documents"

      RS3a, Reference Statement 3a (2004): "Patents contain more than 80% of all  technical  information worldwide"
      RS3b, Reference Statement 3b (2004): "80% of the disclosures in patents are never published in any other form."


      1. Scenario 1 is a visualisation of RS1 and was shown to be unsound due to sample size issue.

      2. RS2 (visualised in Scenario 2), RS3a (visualised in Scenario 3) and RS3b (visualised in Scenario 4) may be due to the 'telephone game' effect. It would be good to have confirmation from the respective authors of those papers.

      3. RS3b is also unsound - based on the partial picture of the world's digital information given by Bing's & Google's search results.

      4. Since RS3b is unsound, the implication is that RS3a (from the same paper) may also be unsound.

      5. If (RS2 is based on RS1) then

      RS2 is also unsound;


      if Yeap's conjecture is true, then RS2 is false.

  19. Rex

    I'm not convinced using "a" is a good measure.  Your other words are all fairly distinct whereas "a" will be used in many more instances other than the indefinite article, for example in a list (a, b, c etc) or as kind codes.

    1. Hi Frazer,

      1. Am aware that the approach I've used may be non-obvious to some.

      2. In the case of the patent databases, every single patent document is relevant to the conjecture, so 'a' is sufficient to obtain an upper limit. The fields searched were title & abstract. It would be more insightful if we had similar results from TI, TotalPatent, STN & Patbase, to obtain a possibly higher upper limit for patents.

      Similar considerations apply to the non-patent technical databases.

      3. Additionally, the results from the search engines (Google & Bing) were deliberately excluded in the computation. It was published for us to get a sense of a possible upper limit on information (technical and non-technical) in general.

      I appreciate your comment.

      1. A striking similarity to the premise of this discussion is what Dr. Peter Rusch found when he was creating an early chemical dictionary covering the period from when CAS started using Registry Numbers in 1967 until almost 1982.  His research showed that over that period 77.43% of the chemical substances that were published were published only once.  This number is extremely close to the 80% that is discussed here.

        It would be interesting to note how many of these singly-indexed-substance publications were patents.  While difficult to prove now, it can be conjectured that a chemical published in a patent may not be published in non-patent literature because (1) it is either of no further interest to anyone or (2) it is of interest but because it is patented, researchers tended to shy away from it.  Of course, the most successful chemicals would probably be published more than once as interest grew and studies showed possible new uses or effects, etc. and new patents and technical publications would be plentiful.   I tend to think it more likely that chemists are very prolific in creating new substances, but the rest of the world is slow in accepting these discoveries. 

        Granted, some chemists may be more interested in getting a patent than in making something that the business world could embrace, but the fact that almost 80% of chemicals were only published once leads one into discussions very similar to the breakdown of what is being discussed in the thread.  And, if a chemist really wanted to be published, a technical journal is an alternative to a patent, though each goes through its own peer or technical review.  If it were the case that a chemist wanted only to be published, then a significant number of these singly-indexed-substances would have been defensive or statutory invention registrations back when these were the fad, because such were easier to obtain than other patents and could be made available in a shorter time than many peer-reviewed journals.

        Perhaps this is all coincidence, but the numbers are so close as to make one imagine a bit.

        1. Ron, thanks for reminding us of the work of Pete Rusch on the frequency of publications for chemical compounds.  This was undoubtedly done when DIALOG was mounting the CA Registry file and, due to its size, they split it into two parts, with the singly reported compounds in one file and all of the rest in the other.  Of course, singly reported compounds often "graduated" to the other file upon additional reports.  One had to search both files to be comprehensive.  He might have, or know of, or have access to data on how many of those were patents.

          Others from other patent A&I organizations often cited similar statistics.  Unfortunately, many of the "old-timers" are no longer active in the profession so getting their input would be difficult.  Given his varied background, I was hoping that Eliot Linder would contribute to this string.

          Donna Hopkins' correct interpretation of the 8th report is the figure that should be quoted.  Further breakdown on singly reported compounds in patents would be helpful to see where chemical information fits in.

          If a researcher is working for industry, the usual procedure for disclosure is and has been to apply for patents first and then possibly consider non-patent publication.  Ron is correct that defensive publications were an alternative, but only in a minority of cases.  Of course, a lot of chemistry is chemical technology, not necessarily involving new compounds, so that's another aspect of the uniqueness of chemical information.

          Back in the early days of the registry file, a number of singly cited compounds appeared, typically monomers from which new polymers were prepared.  Both were cited but, frustratingly, details were often not given for the unique monomer.  They became "orphan" singly reported compounds, never again to be discussed (I called them "I've got it on my shelf and you don't" compounds) but, as a result, morphed into defensive publications since they were prior art and could not be patented by anyone else.  C'est la guerre.

  20. Bob,

    Like many of us, I remember that 80% mythical number but do not recall its source.  What I do recall is this: The CAIS of API was abstracting and indexing both NPL and patents, and creating, respectively, two databases, APILIT and APIPAT.  Many of the  researchers at the petroleum refining and petrochemical companies had been used to browsing and/or searching the journal literature.  Thus we had some convincing to do as to the value of APIPAT, of both the researchers and those doing searching on their behalf.  That number, plus some good examples, were part of the process.

    When we speak of comparative amounts of information, knowledge, and technology, I am reminded of the onerous task of indexing Markush groups, which we partially solved at API, with "encouragement" from Stu Kaback, by creating a template to generate the thousands of permutations, within some limitations.  How does one measure how much information that is?  I, for one, cannot say.

    In the end, I am inclined to agree with Nigel Clarke's comment above regarding use of "significant amount" in this environment with so many variables.


  21. Hello Everyone,

    Actually, if you look at the entire Chemical Abstracts literature database (CAplus) for all of the patents they cover, and extract all of the substances from those documents in the Registry file you will find that more like 95% of the discrete substances are not published outside of the patent literature.

    If you would like to read the details please have a look at the blog post I published today on this topic using the New STN system:

    In the post I describe the method and how I came to this conclusion.

    Thanks, Tony

    1. Hi Tony,

      Thanks for the insightful experiment.

      1. Of the 46,449,600 chemical substances, wonder what is the no. of patents that they are associated with?

      2. Could you help obtain the patent numbers based on top-level IPC codes, eg. IPC C = 35,000,000, IPC A = ? IPC B = ? ... IPC H = ...?

      3. Are the patents based on family (ie. invention)?



      1. Hello Rex,

        I think I can answer some of your questions:

        1. While there are over 9M patents in the CAplus database, the 49M substances are actually coming from only 5,633,959 of them. I didn't explicitly point that out in the post but the difference between the total references containing the 49M substances (24,824,536) and the number of non-patent references (19,190,577) is about 5.6M. This means there are some patent documents that do not have discrete, novel substances, according to CAS' rules for capturing RNs within them.
        2. I am not certain that organizing the substances by top-level IPCs will be that useful since a single substance could be in many patents potentially across a collection of IPC classes. I imagine that the number of documents per substance follows a power law distribution where a relatively small number of substances are responsible for the majority of the references and many of the compounds are only found in a single reference.
        3. The patent records in CAplus are organized by family so this information is not associated with individual patent documents.

        Hope this helps, and thanks for commenting on the experiment.

        Thanks, Tony

        1. >I am not certain that organizing the substances by top-level IPCs will be that useful since a single substance could be in many patents
          >potentially across a collection of IPC classes.

          I am fairly certain that it would be insightful. Is it easy to do a top level IPC clustering on the 5.6m patents using STN?

    2. Hi Tony,

      >if you look at the entire Chemical Abstracts literature database (CAplus) for all of the patents they cover, and extract all of the substances
      >from those documents in the Registry file you will find that more like 95% of the discrete substances are not published outside of the
      >patent literature.

      Would Scenario 5 be a correct visualisation of your CAplus experiment?

      Wonder if there is a way to conduct Scenario 6?

  22. I think all compounds receiving a RN will obtain a C class in IPC, so you will get a 100% result for C. Other classes such as A or D will be use classes rather than compound classes

  23. This conversation is bothering me for a couple of reasons -

    First not all chemical patents are on a new substance - could be a new use, new method of manufacture/purification/etc, so just because the substance is also in the literature, it doesn't mean the true "technology" disclosed ever appears in the literature.

    And second, most patents (and their claims) try to be as broad as possible. So they may mention "amines" or polymers having a certain range of molecular weights. The people at CAS do try to ferret out what the substances really are, but I don't know that it's always possible. So limiting by substance, or even by those patents that specify the substance, is just one more way to achieve "false" results.

    And then there are the patents that are not chemical....

    From my perspective (and limited experience with private industry), it seems unsurprising that companies will allow their researchers to disclose (only as much as required) information in order to protect it (patent), but not be quite as lenient when it comes to broadcasting it (publication/conference presentations). So I am not at all surprised that, at least as far as what comes out of corporate America, information in the patent literature may never appear elsewhere. 


  24. I agree with Donna, based on considerably longer experience working in industry. 

    Corporations get patents because they need to protect inventions, and they need the broadest protection they can get.  Corporate researchers have always been encouraged to disclose inventions in patents, and they've never been encouraged to publish in journals or conference proceedings.  One inviolate rule is that no manuscript or abstract can be submitted for publication until after their content has been disclosed in a patent application. 

    Because of the nature of academic publishing, manuscripts tend to emphasize real experimental results, while patent applications try to illustrate the full scope of the generic disclosure with specific examples.  From personal experience, I know that if Compound A shows real promise in the lab and the inventors foresee similar activity from similar chemical structures, the patent agent will include the similar structures in a Markush and include examples of every compound that's been made and many that haven't.  Chemical Abstracts will register every one of those examples, but they don't represent different "information," just different embodiments of the Markush structure.  And unless one or another of the exemplified compounds tests out as well as Compound A, any later publications will ignore them.

    It should go without saying that most Registry Numbers appear only once, and only in a family of patent documents.  That's how patents are written. 

    Let's not get carried away with the statistics.  Counting Registry Numbers isn't the same thing as counting units of information.


    1. Edlyn hit the nail on the head.  In addition, a patent can have thousands of compounds claimed using Markush structures, but a journal article with that many compounds would be very boring reading!  You don't run across Markush structures in journal articles because the authors are trying to talk about known compounds and their activities, reactions, uses, etc., and thus tend to limit their coverage to the actual compounds that have been created, tested and analyzed.  The opposite is true in patents in which the inventor is trying to claim a broader category of compounds to keep others from making a small change and creating something similar.  The fact that CAS registers these compounds means that we have a handle on searching them, but simply because there is now a means of searching them doesn't mean that there is any more information other than the original reference.  This is explained by the 77+ percent of all substances that were only found once in the literature.

      Again, chemists tend to claim a lot of compounds in patents; most have not actually been created, tested or analyzed, but there is a vested interest in claiming them.  As such, there is very little interest in the academic community in most of these compounds, because they have only been claimed and not researched further. These compounds tend not to make it into the journal literature, so the idea that patents contain information not found elsewhere starts to ring true. 

    2. One of the issues I have is that the discussion has been centering around chemical patents, whereas I think the 80% figure was intended to refer to all patents, and if you count in all of the "better mousetrap" ones and others in areas where there is not much publication in technical literature, as Edlyn has pointed out previously, the figure can probably be justified, although impossible to prove.