Strategic Librarian

Using strategy to develop the law firm library.


Findability vs Searching

I can see a summary of the searches that lead people to Strategic Librarian.  It’s interesting to see the words and ideas that people use while searching.  For example, “how to write a business case” leads searchers to the post I did on writing business cases.  I was baffled recently when I found the search string “to do both make”.  I still wonder which post of mine led that searcher to this blog, but it reminded me that search is a bit of an art form that many haven’t mastered. 

I often refer back to Roy Tennant’s quote,  “Librarians like to search, everyone else wants to find.”  Since I do think this is true, I was interested to read Mark Hall’s recent On the Mark post Finding, not searching is what really counts.  Thank you to Janice LaChance, SLA CEO, for pointing out this posting in her blog, Executive Connections.

Hall reports on the research being conducted by Carl Frappaolo, vice president of market intelligence for AIIM.  In summarizing a recent survey, Frappaolo reported that:

His survey, taken in May among 528 respondents, … indicates 52% of business users acknowledge that the enterprise search process has gotten easier over the past two years, but half of them (49%) still find it difficult and time-consuming.

He also notes that “49% of any given company’s employees were clueless on advanced search techniques, like Booleans or even multi-term queries.” 

Frappaolo contends that IT is “throwing a lot of good search tools at siloed content” and that the main problem is that “nobody owns the strategy for findability.” 

Search is tough to get right in most organizations.  Besides there being no owner of strategy, other roadblocks exist:

  • Most members of an organization do not understand how search works; they just know that many times it doesn’t
  • Leadership, who need to understand more detail to define and approve search projects and expenses, don’t think there is a problem and don’t have the time or interest to learn
  • IT thinks putting a search strategy in place is a simple task – index, search, find
  • Content owners don’t think to share with one another or don’t understand the value of their information to others
  • Members of an organization may be working on the same issue, idea, project, etc., and not know it

Knowledge management practitioners are in a unique position to help solve these problems, if they get the support, as their work crosses boundaries and silos in the organization.   Having done a knowledge audit, they can draw connections that would have otherwise remained unknown.  With this knowledge they can define the search/find strategy needed and then work with IT toward a successful solution that works for everyone.

Several past surveys by various firms support the need to improve search/find.  In 2005, Outsell, Inc. reported:

Today’s professionals spend most of their time (53 percent) seeking out information. Four years ago, knowledge workers were able to spend 58 percent of their time analyzing and applying what they had found. Collectively, the time spent gathering and looking for information translates to an estimated 5.4 billion lost hours per year for US corporations.

With these kinds of statistics, you would think corporate leadership would be very interested in finding a solution to improve efficiencies.

I am most likely “preaching to the choir” as most of you already know what I am talking about.  How do we get leadership and IT to understand that there is a way to quit focusing on searching and start focusing on finding?  While not the solution, I think we need to start by giving up our own love affair with Search and start learning more about Find.  Does anyone have next steps? 

An article with a different take on search: Foraging for Information with Search, Search Insider, July 10, 2008



5 Reasons Librarians are Better than Search Engines.

I have long agreed with Danny Sullivan of SearchEngineLand.com on most things but I have to respectfully disagree with one of his recent comments with regard to Microsoft’s and Google’s book digitization projects.  Sullivan, quoted in In Microsoft vs Google, Search is True Prize, published in the Guardian Unlimited on February 2, 2008 and currently available via the Semantic Web Company website, said:

The projects are strategic, said Danny Sullivan, Editor-in-Chief of SearchEngineLand.com. Sullivan said Google sets the tone by spending large sums of money to develop new businesses without rushing to make money back. Books is one example. It undertakes many “pie-in-the-sky” projects betting some will become big money-spinners once they are popular, allowing Google to sell advertising alongside them.  “Microsoft and Google are both building libraries and the way you get the books off the shelves at these digital libraries is through their search engine. Their search engine is an electronic librarian,” Sullivan said.  “The battle shouldn’t be over getting the books, the battle should be over who is building the best librarian.”

Yes, search engines have come a long way since the 1990s, but they have not reached the capabilities that would put them on par with a librarian.  What skills and knowledge do librarians possess that search engines don’t?

  • Librarians have critical thinking skills that allow them to look at a question from many angles before working on the answer. 
  • Librarians understand nuances that aren’t contained in the text of a book or web site.
  • Librarians have multi-dimensional problem solving skills.  They understand that questions could lead to more questions and answers could lead to more problems.
  • Librarians recognize differences in their users that search engines have yet to learn.  Humans know more about human motivation than computers could ever understand.  
  • Librarians ask questions.  They are taught to ferret out the researcher’s real question through reference interviews.  Researchers often don’t know how to ask the right question to get the answer they are seeking.  Reference interviews aren’t set questions and answers that a computer can put forth and understand.  They are discussions between two human beings that lead to a better understanding of the question by both parties and better answers for the researcher.

My sister, while doing some research recently, likened searching on the internet to setting off fireworks.  The search is like an explosion that sends millions of answers off in several directions much like a firework can send fire and color up into the night for all to see.  While reviewing one set of answers, another set of fireworks could be set off distracting you from what you already found and taking you further away from what you need.  There is no one to guide you to the right materials.

Building a better librarian (er, search engine) is a lofty goal and we all know we need better search engines.  However, thinking of a search engine as a librarian is a bit shortsighted.  I’ve worked with good librarians who, even while using bad search engines, find information we would have thought could not be found or used a few years back. 

If an improved search engine could make resources easier to find, think of what a librarian with a good search engine could do for your company.  A search engine, no matter how good it gets, is still a tool.  Librarians add a human element to online researching.  They are the guides that can keep fireworks associated with holidays, not searching.



Retro Thoughts on Search Engines

Back in 1996, I worked with the Minnesota State Law Library to develop the court decisions archive for the Minnesota Appellate Court System.  After many years of working in libraries including 10 working in law firm libraries, I was finally getting my MLS and had managed to arrange an independent study that required me to study what other courts were doing with their decisions and what software those courts were using to make the decisions searchable.   Once I learned which search engines were being used, I needed to evaluate which one to use for this project.   To do that, I wrote a paper describing the criteria to use to evaluate this type of software along with several appendices that contained the evaluation of a number of search engines in use during that time.  

The paper was published on the web back in 1996.  In addition to this paper, I also wrote one titled State Court Decisions on the Internet that summarized what was available and what functionality each site had.  Following that, I created a web site with the same name that served as a directory of state court decision sites.  That was taken down eventually as I moved into another position and did not have time to maintain it.  I only have a print copy of the appendices, which are of little use given how long ago I wrote the paper unless someone wants to go down memory lane with WAIS or Swish or the like, but I did find an electronic version of the search engine paper.  

I do think the main part of the article still has value as it was one of the first basic web postings on the topic and some of the ideas hold true even if the language I used may not be used today to describe functionality, etc.   I will follow this posting with another in the near future that provides an updated look at search engines as we know them today.

Web Search Engines  

Nina Platt, November 1996  

Regular users of the Internet’s World Wide Web have become very familiar with the sites that have been developed that allow users to search across all or a part of the Web.  Examples of these sites include Infoseek, Altavista, Hotbot, etc.  They are great tools for finding information that is stored across the Web and can assist users in finding information that would otherwise be difficult to locate.  In addition to these tools, individual websites have begun adding searching capabilities that allow users to search all or a part of their own sites.  This paper focuses on the search engines that have been used by the sites that provide access to state appellate court decisions.  Many of the sites listed in the paper, State Court Decisions on the Internet, have provided access to decisions through a variety of search engines which will be described throughout the following pages. 

Structure of Search Engine 

The search software on the web (more often referred to as search engines or search tools) consists of database management tools.  Some were developed over the years to manage data and have now been enhanced to allow users access over the Internet; others were developed specifically for the Internet.  To further complicate this explanation, third party or individual developers have also created Web interfaces that allow Web browsers to communicate with search engines.  Additional differences that can be attributed to search engines include: 

  • They are created to manage databases with structured fields (e.g., Foxpro), databases of textual documents (e.g., Excite), or databases that have both structured fields and text (e.g., freeWAIS-sf). 
  • They are created to manage relational databases (e.g., Dbase) or flat files (e.g., Filemaker Pro). 
  • They are created to manage databases that have index files (that contain each word that has been indexed) but maintain the original documents in directories on the computer (e.g., Swish), databases that have index files and store the documents in a data file (e.g., Folio), and databases that have index files and data files and also maintain the original documents for displaying or downloading (e.g., Basis Webserver).
  • They are developed to support different operating systems (e.g., UNIX, Windows NT, Windows 95).
  • They can be accessed through a variety of interfaces including Web browsers, Telnet connections, modem connections, connections using a GUI Windows interface, etc.

This paper will deal with search engines that have been developed to manage databases or document collections that may have both full text and structured fields. 

Indexing 

Before a database can be searched it must be indexed.  Indexing generally consists of systematically going through all documents that have been designated for indexing and creating a file of terms found in the documents.  The index will also include pointers back to the original document or to a record in a data file.  This allows the user to search on a term and receive a grouping of records or documents that contain the term.  The search engines examined use a variety of capabilities to make databases searchable.  They may or may not include the ability to:

  • Specify multiple directories and files to index
  • Recursively index subdirectories (index all subdirectories without having to specify each subdirectory) 
  • Specify file extensions of files to be indexed
  • Specify stopwords (commonly used words that should not be indexed) 
  • Provide incremental indexing (when new documents are added, the entire file system does not have to be reindexed)
  • Provide dynamic indexing (documents can be indexed while users are searching)
  • Schedule indexing when site is not in use
  • Index across servers on the same network or across networks
  • Merge indexes
  • Create individual indexes for different collections of documents
  • Add structured fields that will be indexed
  • Index a variety of document formats including HTML, ASCII, PDF, wordprocessing files, etc.
  • Index HTML tags: <meta>, <head>, <body>, <title>, header (<h1> to <h6>), emphasized (<i>, <b>, <em>, <strong>), or comment tags
  • Index a protected server (one that requires user authentication to access)
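
As a rough illustration of the indexing process described above, the following Python sketch builds a small index of terms with pointers back to the documents that contain them. It is a minimal sketch, not how any of the engines in this paper work: the stopword list, the document names, and the function names (tokenize, build_index, add_document) are all assumptions made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; a real engine would let the administrator set it.
STOPWORDS = {"a", "an", "and", "of", "the", "to", "was"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def add_document(index, name, text, stopwords=STOPWORDS):
    """Incremental indexing: add one document without rebuilding the index."""
    for term in tokenize(text):
        if term not in stopwords:
            index[term].add(name)  # pointer back to the original document


def build_index(documents, stopwords=STOPWORDS):
    """Map each indexed term to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        add_document(index, name, text, stopwords)
    return index


# Hypothetical documents standing in for files found in a directory.
docs = {
    "opinion1.txt": "The court affirmed the judgment of the district court.",
    "opinion2.txt": "The appeal was dismissed.",
}
index = build_index(docs)

# A new document arrives later and is added without reindexing the others.
add_document(index, "opinion3.txt", "Judgment reversed and remanded.")

print(sorted(index["judgment"]))  # ['opinion1.txt', 'opinion3.txt']
```

Searching on a term is then a lookup in the index rather than a scan of every document, which is why the database must be indexed before it can be searched.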

Searching 

Once the database has been indexed a form or script is used to provide access by searching.  The searching capabilities of a database can be different even if developers are using the same search engine.  The differences are due in part to how the database was indexed and in part to how the search interface has been set up.  Some search engines do not include a searching function.  The developers of these engines created the ability to index and left it up to third parties or individuals to create the scripts or forms that are used in searching.  As with indexing, the search capabilities of a database can be different depending on what the developer included in the search function.  They may or may not include the ability to search using:

  • Natural query language.  This allows users to enter a question or phrase that best describes the topic for which they are searching.
  • Boolean operators (AND, OR, NOT).  These are connectors that allow users to search where all terms are contained in documents (AND), where any terms are contained in documents (OR), or where one term but not the other is contained in documents (NOT).  The default that is used when not entering a connector is generally AND or OR.
  • Proximity operators.  These connectors allow users to search where a term is found within so many characters of another term (W/number of characters), or where a term is found ADJacent to or NEAR another term.  Another capability offered by some database managers allows the user to specify the order of the terms (e.g., database BEFORE manager). 
  • Phrase searching.  Allows users to search for an exact phrase.
  • Thesaurus.  Uses an operator that replaces terms with synonyms or provides the user with a summary of broader or narrower terms and/or synonyms. 
  • Concept searching.  Similar to thesaurus, this function will search on all variations of a term. 
  • Wildcards.  Allows users to truncate terms when they want variations of a term or insert wildcards when they are not sure of the spelling or want to specify how many characters should be replaced by the wildcards.  A single string wildcard can be used to replace one character (e.g., Anders?n for Anderson or Andersen).  Multiple characters can be replaced by using more than one single string wildcard (e.g., act??? would retrieve action or acting).  A character string wildcard can be used to search words that contain the same string of characters (e.g., dark* would retrieve darker, darkness, darkest, etc).  Wildcards can be used for prefixes, suffixes, or characters within a word depending on how the software was developed. 
  • Exact match.  Allows users to search on the term exactly as it is entered.  This is useful if the database was set up to search for the singular and plural variation of a term. 
  • Fuzzy match.  Returns records with words that have a similar spelling to search terms. 
  • Numeric operators like equals, greater than, less than, etc.  Returns records with a specific alpha numeric value. 
  • Range operator.  Returns records within a range of values. 
  • Fielded searches.  Allows users to search on a specific field or fields in the database. 
  • Query by example.  Enables users to find other documents similar to a document in the current result set that the user finds relevant. 
  • Advisors.  Provide tips on how to construct a better query.  
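
To make the Boolean operators in the list above concrete, here is a minimal Python sketch that treats each term's list of documents as a set. The index contents and the function names (docs_for, search_and, search_or, search_not) are hypothetical and are not taken from any of the products discussed in this paper.

```python
# Hypothetical inverted index: term -> set of documents that contain the term.
index = {
    "negligence": {"a.txt", "b.txt"},
    "contract": {"b.txt", "c.txt"},
    "damages": {"a.txt", "c.txt", "d.txt"},
}


def docs_for(term):
    """Return the documents containing a term (empty set if not indexed)."""
    return index.get(term.lower(), set())


def search_and(*terms):
    """AND: documents that contain all of the terms."""
    sets = [docs_for(t) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(*terms):
    """OR: documents that contain any of the terms."""
    return set().union(*(docs_for(t) for t in terms))


def search_not(term, excluded):
    """NOT: documents that contain one term but not the other."""
    return docs_for(term) - docs_for(excluded)


print(sorted(search_and("negligence", "contract")))  # ['b.txt']
print(sorted(search_or("negligence", "contract")))   # ['a.txt', 'b.txt', 'c.txt']
print(sorted(search_not("damages", "contract")))     # ['a.txt', 'd.txt']
```

The choice of default connector matters: the same two terms return one document with AND and three with OR in this small example.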

Additional features that may be part of the search function are:

  • Users can select to search on one or more databases
  • Users can select the max number of records to return in a result set
  • Users can choose between a simple or advanced search form
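
The wildcard and fuzzy match functions described earlier in this section can be sketched with Python's standard library. This is only an illustration under assumptions: the vocabulary is made up, and fnmatch and difflib stand in for whatever matching logic a given search engine actually uses.

```python
import difflib
import fnmatch

# Hypothetical list of terms found in an index.
vocabulary = ["acting", "action", "actor", "andersen", "anderson",
              "dark", "darker", "darkness"]


def wildcard_match(pattern, terms):
    """Return indexed terms matching a pattern; ? is one character, * is any run."""
    return [t for t in terms if fnmatch.fnmatchcase(t, pattern.lower())]


def fuzzy_match(word, terms, cutoff=0.8):
    """Return indexed terms whose spelling is similar to the word entered."""
    return difflib.get_close_matches(word.lower(), terms, n=5, cutoff=cutoff)


print(wildcard_match("Anders?n", vocabulary))  # ['andersen', 'anderson']
print(wildcard_match("act???", vocabulary))    # ['acting', 'action']
print(wildcard_match("dark*", vocabulary))     # ['dark', 'darker', 'darkness']
print(fuzzy_match("andersan", vocabulary))     # similar spellings, best match first
```

Note that in this sketch dark* also matches dark itself, because the asterisk can stand for zero characters; an engine could just as easily require at least one.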

Results Display 

Once the database has been searched, the results must be returned to the user in a usable format.  The format should include enough of a description of the records or documents returned to allow the user to make a decision about which document he/she wishes to display.  As with the indexing and searching, the results display will be different depending on the database manager used.  They may or may not include the ability to display the following: 

  • Title of document/record
  • Author of document/record
  • Description or summary of document/record
  • Size of document/record
  • Relevance ranking of document/record
  • Number of documents/records found
  • Database from which the document was retrieved
  • Search terms used
  • Date document was created or indexed
  • Database fields as specified by database administrator or user

Additional features that may be part of the results display function include:

  • Results display can be modified by administrator or by user. 
  • Terms searched on are highlighted in the document. 
  • Users can navigate between search terms (or hits) within the retrieved documents. 
  • Users can select to display various formats of the same document (e.g., HTML, wordprocessing, PDF). 
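
A results display with several of the elements listed above (title, number of documents found, a simple relevance ranking, and highlighted search terms) might look like the following Python sketch. The documents are invented, ranking by number of hits is an assumption chosen for simplicity, and the ** markers are just a stand-in for whatever highlighting an interface would use.

```python
import re

# Hypothetical documents: title -> text.
documents = {
    "State v. Smith": "The court held that the search was unlawful. "
                      "The search lacked a warrant.",
    "Jones v. Doe": "The contract claim failed. No search issue was raised.",
    "In re Estate of Lee": "The will was admitted to probate.",
}


def count_hits(text, term):
    """Count whole-word occurrences of the term, ignoring case."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def highlight(text, term):
    """Mark each occurrence of the term so it stands out in the display."""
    pattern = r"\b(" + re.escape(term) + r")\b"
    return re.sub(pattern, r"**\1**", text, flags=re.IGNORECASE)


def search_results(term, max_records=10):
    """Return matching records, most hits first, up to max_records."""
    hits = []
    for title, text in documents.items():
        n = count_hits(text, term)
        if n:
            hits.append({"title": title, "hits": n,
                         "summary": highlight(text, term)})
    hits.sort(key=lambda record: record["hits"], reverse=True)
    return hits[:max_records]


results = search_results("search")
print(f'{len(results)} documents found for "search"')
for record in results:
    print(f'{record["title"]} (hits: {record["hits"]})')
    print("  " + record["summary"])
```

The max_records parameter corresponds to the option of letting users choose how many records are returned in a result set.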

Descriptions of Search Engines 

As mentioned above, the creators of the websites that have searchable archives of court decisions have used a variety of search engines to index and provide searching capabilities for their users.  These search engines include: 

  • Applesearch
  • Excerpt
  • Excite
  • Folio
  • Frontpage
  • Fulcrum
  • Isearch
  • PL Web
  • Swish
  • TEAMate
  • WAIS
  • WebFind/WebIndex (WebSite) 

Appendix A includes a description of these search engines based on the criteria listed above.  It also includes descriptions of Basis Webserver and ht://Dig, two search engines which look very interesting.  The information on various search engines covered in this paper was collected through an analysis of the documentation posted at the website of the organization or individual who developed the software and/or through a survey sent to the developer.   

WAIS is a name that is used for a variety of products that are both freeware and commercial.  They include the original WAIS search engine that was developed by Thinking Machines Corporation.  WAIS Inc. now sells a commercial version of WAIS and owns the trademarks “WAIS” and “Wide Area Information Servers”.  Variations of WAIS that have been developed include freeWAIS (freeware), freeWAIS-sf (freeware), WAIS (commercial), NT Wais Toolkit (commercial).  For purposes of simplification, freeWAIS-sf is the only WAIS product described in the descriptions.  Swish, which is also included in the evaluations, is a simplified version of WAIS.  Swishgate and wwwwais are WWW to WAIS gateways which allow users to search and display the results of searches done on a WAIS or Swish database.  Descriptions of these products are included. 

It is assumed that all search engines discussed require an HTTP-compliant web server with the exception of Frontpage and Website (both are Web server products with built-in searching functions).  

How to Choose a Search Engine 

The purpose of this study was to gather information to be used in determining which search engine should be used to create a searchable archive of court decisions.  Selecting a search engine is a difficult task because of the variables involved.  To make a decision one must answer the following questions: 

  • What platform will you be using?  Examine the platforms supported carefully.  The search engine generally will only run on specific platforms with specific versions of the platform’s operating system.
  • How important is easy installation and low maintenance?  If a website creator finds that time is a precious commodity or doesn’t have the technical know-how to undertake a complicated installation then he/she should look for a search engine that is easy to install and maintain. 
  • Do you want to maintain a collection of files stored in directories or do you want to maintain a datafile that stores the documents?  There are advantages to both.  The search engine that requires maintenance of a directory or directories of files does not require that you import each file into the database.  The search engine that requires the documents be stored in a datafile allows the administrator the luxury of only having to maintain one or more files (depending on the structure of the database).
  • What documents do you want to index and search?  If the documents that are going to be included in the database are spread across file servers or directories, then the search engine chosen must include the ability to index documents wherever they are located.
  • Do you want to maintain individual databases or be able to choose one or more databases to search?  If so, then the search engine has to have the capability to create multiple databases and the search interface must provide users with the option to select databases.
  • What search capabilities do you want to offer your users?  The searching functions must be examined carefully to see that they meet user needs.  See Appendix B. 
  • Do you want to add structured fields to your documents?  If so, the search engine must support the addition of structured fields.  The advantages to adding structured fields are many including the ability to search on specific criteria.  The disadvantages include the increased amount of time needed to maintain the database.
  • How do you want the results displayed?  Do you need the ability to modify the results display that is delivered with the search engine?  As with the searching functions, the results display function must be examined carefully to see that it meets user needs.  See Appendix C.
  • Do you want your users to be able to modify the results display?  Different users may find that they need to display different components of the documents they’ve retrieved.
  • Do you want your users to be able to view the HTML or ASCII form of the documents and be able to download the original wordprocessing file?  The various search engines handle this in different ways (some of which are more time consuming) and some do not offer the function. 
  • How much do you want to pay for the search engine?  The costs of the search engine range anywhere from free to thousands of dollars.  See Appendix D. 

These are just a few of the questions that must be considered before selecting a search engine.  To make a good selection, end users should be included in the evaluation process.  This will ensure that the database that is developed will meet user needs.  Also, there is no perfect search engine.  All of the features of the products must be examined and tradeoffs must be made depending on what is more important.  For example, a website developer may find that the users must have Boolean operators but don’t need the ability to modify search results.  

Conclusion 

As mentioned earlier, the research for this paper was initiated in order to determine which search engine should be used for a searchable archive of court decisions.  And as mentioned earlier, there is no perfect search engine that will provide all of the functions needed.  Selection is a three-step process: 

  • Determine the functionality needed
  • Compare what is needed with the functionality provided by the various products
  • Determine what tradeoffs you are willing to make
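
The three steps above can be sketched as a simple comparison of feature sets in Python. Everything in this example is hypothetical: the engine names, the feature labels, and the split between required and nice-to-have features are placeholders, not findings from the paper's appendices.

```python
# Step 1: determine the functionality needed (hypothetical requirements).
required = {"boolean", "phrase", "fielded"}
nice_to_have = {"fuzzy", "modify_display"}

# Hypothetical products and the features each one provides.
engines = {
    "Engine A": {"boolean", "phrase", "fielded", "fuzzy"},
    "Engine B": {"boolean", "phrase", "fuzzy", "modify_display"},
    "Engine C": {"boolean", "phrase", "fielded"},
}


def evaluate(features):
    """Step 2: compare what is needed with what one product provides."""
    return {
        "missing_required": sorted(required - features),
        "tradeoffs": sorted(nice_to_have - features),
    }


def shortlist(candidates):
    """Step 3: keep products that meet every requirement, fewest tradeoffs first."""
    report = {name: evaluate(features) for name, features in candidates.items()}
    names = [n for n, r in report.items() if not r["missing_required"]]
    names.sort(key=lambda n: len(report[n]["tradeoffs"]))
    return names, report


names, report = shortlist(engines)
for name in names:
    print(name, "gives up:", report[name]["tradeoffs"])
```

In this made-up comparison Engine B is dropped because it lacks a required feature, and the remaining two differ only in which optional features must be given up, which is the tradeoff decision the third step describes.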