As a member of the PIUG Vendors and Producers Committee, I act as "monitor" for databases and other sources not in the patent database mainstream. I'm opening this forum to promote discussion and to gather input for an article in the PIUG newsletter. Please add your input on this diverse topic.

Patent searchers need to search not only sources covering patents but also others covering non-patent information about science and technology of interest. Of course, information and data in these or any other published source have importance to patent searchers, especially in the realm of prior art.

Just as a number of free patent files have become available, free and "open" sources of more generalized chemical information are appearing. Challenging the perceived domination of databases from Chemical Abstracts Service (CAS), especially the CAS Registry System and CAS Registry Numbers (CASRN), several Open Access (OA) advocates are promoting the use of free and open sources of chemical information and data and even alternative registry systems. In a prior entry in his blog (http://wwmm.ch.cam.ac.uk/blogs/murrayrust/) [yes, it works], Peter Murray-Rust throws down the gauntlet and promotes a number of free/open chemical databases including "Chemspider, Pubchem, ChEBI, Wikipedia, The Blue Obelisk, CrystalEye, and Open Notebook Science". They allegedly feature "openness, re-use and sharing, immediacy, innovation, linkedness, semantics, quality control, and community". He and others also promote the use of InChI nomenclature and structure representation as an open alternative to CASRN.

As previously discussed, quality control for PubChem (http://pubchem.ncbi.nlm.nih.gov/) is often problematic. An effort is underway to provide quality control for chemical information in Wikipedia (http://en.wikipedia.org/wiki/Wikipedia:Chemistry). We'd welcome input on experience with these and other such sources of chemical information.

Additional sources are cited at http://zusammen.metamolecular.com/2009/03/09/sixty-four-free-chemistry-databases-serialized . In addition, there's the still somewhat mysterious (at least it only has limited publicity) Common Chemistry site (http://www.commonchemistry.org/), "...Common Chemistry from Chemical Abstracts Service (CAS), a web resource that contains CAS Registry Numbers for approximately 7,800 chemicals of widespread general public interest." Since it's from CAS, the accuracy and quality control are assured but the free, public availability is an innovation.

In addition, fee-based services like Beilstein are facing difficulties. Elsevier is bundling access to them, via CROSSFIRE, along with their chemical reactions resource, Reaxys (http://info.reaxys.com/index.php). Academics are especially concerned with the effective increase in cost. Non-academic users, also concerned with costs although possibly not as much as the academics, are also concerned with support of the service. Of course, Beilstein is available via STN but the pricing is considerably higher ("outrageous" in one patent searcher's opinion).

Let's keep this string going. I won't be attending the PIUG meeting but I'll monitor this Forum and report on it.

-- Bob Buntrock, Buntrock Associates, buntrock16@myfairpoint.net

12 Comments

  1. Bob,

    Thank you!  I really appreciate the info you've collected and disseminated.

    Personally, I'm using the PTO's EAST system, which is very limited in registry and structure searching, so outside sources for this are needed.  But I do not do these types of searches often enough to justify paying the "outrageous" Vendor rates, so your links are great.

    Dominic
    www.demarcoip.com

  2. Hi Bob,

    In your posting above, you used quotation marks when referring to 'open', and then say the databases allegedly [my emphasis] feature "openness, re-use and sharing, immediacy, innovation, linkedness, semantics, quality control, and community". Do you have doubts as to whether these sources are genuinely open or make good on their other claims? It may not be intentional, but some of the phrasing suggests you have a bias against anything that hasn't been developed by CAS. As the previous posting suggests, there are many people interested in exploring these options, but I wonder if they will get a fair hearing in this forum.

    Regards,

    Nicko Goncharoff

    Managing Director

    SureChem Inc.

    1. Nicko,

      As your aforementioned "previous posting suggests" poster, I have to say relax.  I don't know Bob at all, but I'm pretty sure he's not purposely slandering your product in any way, shape or form.  We, as patent information users, rely on the tools we know and will always have an inherent bias towards them, especially if they have served us well for 10 or 20 years.

      That said, any of the free and open-source projects Bob mentioned that piqued my curiosity might be the greatest thing since sliced bread and become the industry standard in a decade, but they also may be gone next week when the venture capital money runs out.  Most do not have track records and their claims to greatness can often be exaggerated.  For every Wikipedia that survives and becomes useful, how many others do not survive?

      Hopefully your quick defense of SureChem will cause a few PIUG members to give it a test run and perhaps in a few months we'll get some feedback.  But without really vigorous testing or a very large authority's stamp of approval, no professional will vouch for an otherwise unknown system.  As an example, just look at the way we disparage Google Patents in its current incarnation.  A great tool to knock out your uncle's crazy ideas with a three-minute search, but no one will ever sign a clearance search performed with it.

      Dominic
      www.demarcoip.com

  3. Nicko, no offence intended.  Admittedly, I should have used "reportedly" when referring to these quoted databases since I have familiarity only with PubChem and Wikipedia.  I didn't have a chance to check them out since I've been busy with other matters and I wanted to post this forum before many contributors left for the PIUG meeting (and give you something else to discuss).  To show my unfamiliarity, which of these databases are you (and SureChem) associated with?  If none of the above, please furnish a link.

    Going over the quoted features, I have little doubt that these sources are open, reusable, shared, immediate (I would hope), innovative, and linked.  Semantics and quality control are potentially more problematic, and community depends on the size, capability, and true participation of the community.  My skepticism of quality control on PubChem is well known in another forum (CHMINF-L; Chemical Information List).

    Disclaimer time.  My knowledge of CAS and its databases goes way back (50 years counting printed CA, almost 40 using various online versions).  I've published extensively on CAS products especially the CAS Registry System.  And yes, after going independent, I've consulted for them on occasion (but no longer).  For features that a patent searcher (or any other for that matter) needs to trust, they're the benchmark the rest of you are aiming for.  Bias on my part?  Based on long-term experience.

    Over the course of my career, I've loved doing comparisons of ANY of the wide variety of chem and tech info services and products available and I deeply appreciate such comparisons and evaluations by other info pros.  That's the primary intent of this forum.  To paraphrase the movie tag line, "Show us the data".  Please participate.  Give yourself a "fair hearing".

    Dominic, thanks for your views and defense.

    -- Bob Buntrock

    buntrock16@myfairpoint.net


    1. Gentlemen,

      I think my posting must have been misinterpreted. First and foremost, I did not take Bob's comments in any way as slandering or disparaging SureChem. I actually wasn't even thinking of SureChem when writing my post. While I know and work with some fine folks in the Open Access chemistry community, SureChem is a commercial product and as such doesn't really fall into that category. So no defense of SureChem was intended, and Bob, certainly no offense was taken!

      My point was more that it seemed like Bob's intro to this thread implicitly questioned the viability or utility of many of the Open Access chemistry initiatives, and I was merely noting that tone. As I did say, perhaps it was not intentional. While I don't for a moment question the issues of quality assurance surrounding many of these databases, I do think that given time they can become valuable resources for science (if not IP searching!) and as such should be encouraged.

      While I'm clarifying: I don't question the quality and value of CAS products, especially the registry system. It's the product of 100 years of solid science and effort. I guess I feel that it should not be an either-or issue - CAS vs. Open Access. At SureChem we see ourselves as a useful and efficient complement to other patent chemistry search services. Likewise I think the likes of ChemSpider and PubChem should be seen as viable complements to, not replacements of, established data sources like CAS Registry. I don't think anyone should aim to replace CAS, but rather seek to broaden coverage of chemistry information, and expand access to it.

      On to more interesting matters. We have conducted some comparisons of SureChem and STN, though I can't release the results, as they have been conducted by our pharma customers and involve proprietary compounds. However, we are commissioning a comparative study that is both pharmaceutically relevant AND can be publicly released. When we have that I will be more than happy to post it here. Thanks for the invitation to do so.

      Look forward to continuing the conversation.

      Nicko

  4. Note the important news item re the acquisition of ChemSpider by RSC ( RSC acquires ChemSpider - News Release ).  ChemSpider is one of the free resources cited above and of interest to the chemical information community, including patent information.  Further contributions re any of these resources will be welcomed to this forum.

    -- Bob Buntrock, Buntrock Associates, buntrock16@myfairpoint.net

  5. Below is a notice and press release from CAS re Common Chemistry, the previously somewhat mysterious free database of 7,800 CAS Registry Numbers (CASRN).  Obviously not comprehensive, but with CAS behind it, of good quality.  Also note the relationship with the Wikipedia volunteers.

    -- Bob Buntrock

    CAS Launches Free Web-Based Resource "Common Chemistry(tm)" for General
    Public: Links to Wikipedia records provided in collaboration with
    Wikipedia volunteers 
    In 2008, CAS and Wikipedia announced an agreement to work together to
    provide accurate CAS Registry Numbers(r) for current substances of
    widespread general public interest.
    CAS is pleased to announce the official launch of Common Chemistry
    (www.commonchemistry.org <http://www.commonchemistry.org/> ), a free
    web-based resource for information on chemical substances of general and
    widespread interest. 
    This resource is helpful to non-chemists and others who know either a
    chemical name or a CAS Registry Number of a common everyday chemical and
    want to pair both pieces of information.  
    Common Chemistry contains approximately 7,800 chemicals of widespread
    and general interest, as well as all 118 elements from the periodic
    table. With the exception of some of the elements, all other substances
    in this collection were deemed of widespread interest by having been
    cited 1,000 or more times in the CAS databases. Examples of substances
    in Common Chemistry include widely recognizable ones such as caffeine,
    benzoyl peroxide (acne treatment), and sodium chloride (table salt).
    CAS thanks the Wikipedia volunteers, especially Professor Martin Walker
    at SUNY Potsdam, who collaborated with CAS to provide the links to
    Wikipedia records (when available). The Common Chemistry database will
    be updated periodically and Wikipedia links will be added when possible.
    To view the news release, visit
    www.cas.org/newsevents/releases/commonchemistry051209.html.   
    Crystal Poole Bradley
    CAS

    CAS Launches Free Web-Based Resource "Common Chemistry" for General Public

    Links to Wikipedia records provided in collaboration with Wikipedia volunteers
    Chemical Abstracts Service (CAS), a division of the American Chemical Society and the most comprehensive and authoritative source of chemical information, has launched a new, free, web-based resource called Common Chemistry.
    This resource is helpful to non-chemists and others who might know either a chemical name or a CAS Registry Number of a common everyday chemical and want to pair both pieces of information.
    Common Chemistry contains approximately 7,800 chemicals of widespread and general interest, as well as all 118 elements from the periodic table. With the exception of some of the elements, all other substances in this collection were deemed of widespread interest by having been cited 1,000 or more times in the CAS databases. Examples of substances in Common Chemistry include widely recognizable ones such as caffeine, benzoyl peroxide (acne treatment), and sodium chloride (table salt).
    "Anyone can easily search by CAS Registry Number or chemical name and confirm the substance details, such as the CAS Registry Number, chemical names or synonyms, molecular formula, chemical structure, and a reciprocal Wikipedia link when available," said Christine McCue, CAS Vice President, Marketing. "Visitors also have the ability to bookmark the page using social media tools, such as Delicious and Digg."
    While not intended to be a comprehensive CAS Registry Number (CAS RN) lookup service, Common Chemistry does provide access to information on chemicals of general interest. The CAS Registry Number is recognized throughout the world as the most commonly used, unique identifier of chemical substances. The full CAS REGISTRY(SM) database contains more than 46 million organic and inorganic substances. Research discovery and patent tools such as SciFinder and STN allow users to search the entire database.
    CAS thanks the Wikipedia volunteers, especially Professor Martin Walker at SUNY Potsdam, who collaborated with CAS to provide the links to Wikipedia records (when available). The Common Chemistry database will be updated periodically and Wikipedia links will be added when possible. Updated 5/12/2009 11:40:03 AM 
    Copyright © 2009 American Chemical Society

  6. Just a few words on behalf of the "WikiChemists" at Wikipedia. First of all, thanks to Bob for remarking on the effort that Wikipedia is putting into quality control: it's a big job, even for a small database such as WP:CHEM.
    Our link up with CAS is part of that quality control effort ("verification and validation" in WikiJargon). CASRNs provide us with one baseline, chemical structures (codified as InChIs) provide us with a second. Every database contains errors – even (SHOCK HORROR!) the CAS Registry – but the very concept of an "error" can be subjective at times.
    I know that at least a couple of patent professionals have used Wikipedia as part of their searching, because they've been kind enough to take a few minutes of their time to give us their comments and suggestions. Such comments and suggestions are always welcome (HINT)! We would like to introduce structure searching (probably as some specialized plugin) as many users have suggested, but again it's a big job.
    Wikipedia isn't really set up for patent searching. Between WP:CHEM and WP:PHARM, we probably have details of about 10,000 chemical compounds, while other free-access databases number their entries in the millions. That ratio will not change, as Wikipedia continues to offer a distinct product to hundreds of millions of users each month: the key to success is offering a distinct product. All the same, I can understand a searcher doing a quick check on Wikipedia, just as I would understand that the searcher/examiner on US7479949 would be upset when people say that the "invention" was already described on Wikipedia. Still, Wikipedia was cited in 508 U.S. patents issued in 2008 http://patentlibrarian.blogspot.com/2009/02/wikipedia-references-increase.html, a minuscule proportion of the total, but one which is growing.
    Finally, I should add that we have an article on "Internet as a source of prior art" http://en.wikipedia.org/wiki/Internet_as_a_source_of_prior_art: like all Wikipedia articles, it should be treated as a convenient starting point for further research, not as an authoritative statement in itself.

    P.S. For those who are interested, a few more details on the WP-CAS link up and WP's future projects can be found here: http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2009-05-18/Chemistry_data

  7. More on Free Chemical Structure Databases

    Steve Heller recently posted (on CHMINF-L, the Chemical Information list) the announcement that the ALA cited PubChem as one of the "Best Free Reference Websites of 2009".  In an addendum, PubChem was stated to have >63 million records of chemical structures ("substances") of which >37 million are "unique" (compounds).  Kudos to PubChem, but when I questioned the latter figure (PubChem is notably redundant, ca. 2:1 even from their figures), I precipitated a vigorous discussion on the true number of truly unique compounds in PubChem (CAS has ca. 46 million non-sequence compounds in CAS Registry, including polymers, mixtures, etc.).

    Initially, the reason given for the high number of unique compounds was that PubChem listed a large number of "unpublished" compounds (substances with no published citations).  Others weighed in with the observation that even CAS REG has records for "unpublished" compounds so that even with different missions, CAS and NIH should have extensive overlap in this area.  However, CAS has extensive editing, curation, and revision processes (human and machine-based); PubChem's similar efforts are largely machine-based.

    Antony Williams (ChemSpider) summarized much of the discussion and the situation of errors and redundancy in his blog: http://tinyurl.com/nepaho .  The emphasis was on PubChem and CAS but the chemical indexing of Wikipedia was also discussed.  For more than a year, CAS has been using ChemSpider input.  Williams also questions the value of listing compounds with no associated data or information.  The upshot is that no database is perfect, much less complete.  However, those with more extensive editing, especially by humans, are more likely to be accurate.

    The SureChem service was also cited briefly.  This is a searching service for chemical structures in patents with both free and fee capabilities.  Input to this discussion by users of SureChem is requested.

    It would seem to me that the definitions of "published" and "unpublished" are becoming increasingly blurred.  Bottom line for patent searchers:  Prior art searching for chemical structures and entities should include these "non-traditional" sources.  However, caveat emptor is even more important because these sources may be less accurate.

    1. Bob,

      This is a nice summary of the CHMINF-L discussion on free vs. CAS Registry compound counts. Your comments about the importance of non-CAS compounds to patent searchers are well taken, but the situation is a bit more complicated than whether a compound from an unpublished source has been published because it's been indexed.

      US patent law requires a disclosure be enabling to support a patent, and a similar principle applies to cited prior art - if a patent examiner cites a publication of a chemical structure that simply shows a diagram or name and doesn't teach how to make or use the compound, a good patent attorney/agent can usually talk a patent examiner out of rejecting claims to the compound. An additional reference can supply the missing information, but a simple index record won't suffice to make the rejection stick. This is why patent examiners cite patents and articles and hardly ever cite Registry listings.

      Similar principles apply to other patent offices. My favorite example of a rejection that went away is an EP application with a rejection based on a CAS Registry search. The applicant's agent pointed out to the EPO that the compounds were indexed incorrectly - the article from which they'd been indexed had not really specified the positions of substitution in the molecules. Those compounds aren't in Registry any more - CAS deleted them because they'd never been described in a published document.

      1. Edlyn, thanks for your comments.  However, I believe that the distinctions between published and unpublished, indexed and not indexed, are being blurred.  Many of these "unpublished" compounds, in PubChem or even CAS, have some data and maybe even a link or two cited (of course, many do not).

        The indexing criterion used by examiners cuts both ways.  I know of an instance where the corporate Patent Attorney (a PhD Organic Chemist, incidentally) had found that a disclosed novel compound had been previously indexed.  It had, but the only "mention" in the article (Japanese, in English) was incidental in the journal abstract.  CA indexed it, so what to do.  No data, no nothing.  A search of Science Citation Index was requested but the dozen or so citing references only referred to the NMR study which was the main thrust of the paper BUT FOR COMPOUNDS OTHER THAN THE ONE OF INTEREST.  The compound was deleted from the disclosure.  C'est la guerre.

  8. Hi Bob,

    This is a fascinating discussion.  You mentioned earlier that you were looking for input/additional information on the SureChem service - I just wanted to add that Intellogist has a write-up on the service with some details, screenshots, etc. at http://www.intellogist.com/wiki/Report:SureChem