Thunderbird
Q and A Web Technology

Copyright 2003 by Roger Charles Garrett

This article is a work in progress.

This article discusses a proposed extension to HTML that would allow web search engines to more efficiently and effectively answer questions posed to the search engine.

The Issue

There are two primary uses for web search engines, determined by the result expected by the user.

On the one hand there are those who want to locate the most important web pages related to some topic or key word. For example, I’m looking for anything and everything on Dinosaurs so I type “Dinosaur” into the search engine and it returns a list of (potentially tens of thousands of) web pages that in one way or another are related to dinosaurs. The engine ranks them in some fashion and, if I’m lucky, the ones near the top of the list will contain the information that I’m most interested in.

On the other hand there are those who want an answer to a very specific question. Retrieving a huge list of pages related to some keyword is generally a very inefficient approach for this case. The user then has to read through often significant amounts of text on multiple web sites before the actual answer to the question is encountered, if at all. Suppose that I want to know when the dinosaurs were killed off by the asteroid, so I type in the question, “When were the dinosaurs killed off by the asteroid?” What I want is a precise answer to the question. But what I generally get from a search engine is an endless list of web pages, some of which, if I’m lucky, will actually address the question.

The Proposed Solution

The proposed solution is a combination of a new HTML tag and new processing by web search engines. The new HTML tag allows web page designers to embed explicit question and answer information into their web pages and the proposed processing by search engines allows the user of the search engine to get exact answers to questions.

The Benefit to Search Engines

Search engines will benefit by using this new approach to questions and answers because users will be more likely to come to their search engine if it provides better answers. And more visits to the search engine means more potential income from the advertisers on the search engine.

The Benefit to Users of Search Engines

The users of search engines will benefit because they will get precise answers to their questions. The web will become a more informative and helpful resource to one and all.

The Benefit to Web Sites

Web sites that include the proposed QANDA tags within their pages will benefit because search engines will be more likely to link through to their site, since priority will be given by the search engines to pages that directly address the user’s specific question, as opposed to sites that merely contain keywords found in the user’s question. At a minimum, web sites would modify their FAQ (Frequently Asked Questions) pages so that they employ the QANDA tags. But they would be well advised to provide additional QANDA tags covering the entire range of questions that reasonably apply to their site’s business or objectives.

The New Approach...

The new approach involves a new HTML tag and new processing by search engines.

The <QANDA> HTML Tag

I am proposing that a new HTML tag be instituted: one which will allow web page designers to include explicit question and answer information in their web pages, which can then be processed by search engines and which would allow web users to get specific answers to specific questions. The general format of the new web tag would be as follows:

<QANDA

The QANDA tag

The <QANDA> tag provides all the information regarding a single, specific question and the answer to that question.

The QUESTION attribute

The QUESTION attribute provides the text of the question. For example: “When were the dinosaurs killed off by the asteroid?”

The optional ALTERNATEQUESTION attribute

The optional ALTERNATEQUESTION attribute provides a reasonable alternative to the specified QUESTION. That is, it provides a different way that the question might be posed. For example: “At what time in history did the dinosaurs cease to exist?” or “When did the dinosaurs disappear?” The purpose of this attribute is to ensure that, no matter how the user might pose the question, the answer can be provided (without having to include a separate QANDA tag for each and every alternative phrasing of the question). There can be any number of ALTERNATEQUESTION attributes within a given QANDA tag.

The ANSWER attribute

The ANSWER attribute provides the actual text of the answer to the question. For example, “According to the current understanding of the demise of the dinosaurs, an asteroid the size of Manhattan collided with the Earth approximately 66 million years ago.”

The optional ANSWERAT attribute

The optional ANSWERAT attribute provides the URL of a web page that contains the answer to the question. This attribute is recommended when the answer to a question cannot be fully provided by a short paragraph or two of text (as would normally be provided within the ANSWER attribute) or when the answer is long and involved or requires pictures or other explanatory methods.

The optional ANSWERSPOKEN attribute

The optional ANSWERSPOKEN attribute provides the URL of an audio file and is intended to provide the spoken version of the text contained within the ANSWER attribute or the text on the web page specified by the ANSWERAT attribute. This attribute makes the answer accessible to those with vision problems. In addition, it would be useful for users accessing the answer via telephone or PDAs (i.e. small computers with limited screen space). Also, there are cases where a written answer is insufficient for fully conveying the necessary inflection of the answer.

The optional ANSWERAUDIO attribute

The optional ANSWERAUDIO attribute provides the URL of an audio file that provides the answer to the QUESTION. This audio could be different from the audio provided by the ANSWERSPOKEN attribute, in that it would not necessarily be an exact match to the text provided in the ANSWER or ANSWERAT attributes. The ANSWERAUDIO file might include, for example, sound effects that aid in the description of the answer.

The optional ANSWERVIDEO attribute

The optional ANSWERVIDEO attribute provides the URL of a video file that provides the answer to the QUESTION.

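To make the attribute list above concrete, here is a minimal sketch of how a spider might pull QANDA tags out of a page. It is only an illustration: the exact tag syntax is assumed (attributes written as ordinary quoted HTML attributes), the sample page and its example.com URL are made up, and the names QandaExtractor and extract_qanda are hypothetical. It uses Python's standard html.parser module, which lowercases attribute names and keeps repeated attributes, so several ALTERNATEQUESTION attributes on one tag survive.

```python
from html.parser import HTMLParser

# Answer-bearing attribute names from the article, lowercased because
# html.parser normalizes attribute names to lowercase.
ANSWER_FORMS = ("answer", "answerat", "answerspoken", "answeraudio", "answervideo")


class QandaExtractor(HTMLParser):
    """Collect one record per <QANDA> start tag found in a page."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag != "qanda":
            return
        record = {"question": None, "alternatequestions": [], "answers": {}}
        # attrs is a list of (name, value) pairs; duplicates are preserved.
        for name, value in attrs:
            if name == "question":
                record["question"] = value
            elif name == "alternatequestion":
                record["alternatequestions"].append(value)
            elif name in ANSWER_FORMS:
                record["answers"][name] = value
        self.records.append(record)


def extract_qanda(html_text):
    """Return the QANDA records found in html_text (hypothetical helper)."""
    parser = QandaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.records


# Made-up sample page; the tag layout is an assumption, not the article's spec.
SAMPLE_PAGE = """
<html><head>
<QANDA QUESTION="When were the dinosaurs killed off by the asteroid?"
       ALTERNATEQUESTION="When did the dinosaurs disappear?"
       ALTERNATEQUESTION="At what time in history did the dinosaurs cease to exist?"
       ANSWERAT="http://example.com/dinosaurs.html">
</head><body></body></html>
"""

if __name__ == "__main__":
    for record in extract_qanda(SAMPLE_PAGE):
        print(record["question"], "->", record["answers"])
```

Keeping the answer forms in a dictionary keyed by attribute name lets a search engine choose whichever form suits the user's device, for example the spoken version for a telephone user.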
A minimal QANDA tag

A minimal QANDA tag must contain a QUESTION attribute and at least one of the various forms of the ANSWER attribute.

The PARAM tag within a QUESTION attribute

The QUESTION attribute, which specifies the text of the question, may contain optional “parameters” embedded within the text. Each parameter has the form <PARAM name of parameter>. These embedded PARAMs provide a means of generalizing questions. For example, <PARAM geographic area> indicates that any geographic area may be substituted within the text of the question at the position of the parameter. Thus, you could have a QUESTION attribute such as:

QUESTION="What is the temperature in <PARAM geographic area>?"

In this case, when the user of a search engine has entered the question “What is the temperature in New York?” the search engine would recognize (assuming suitable intelligence on the part of the search engine) that New York is a geographic area and would match that question with any <QANDA> tag which contains the above QUESTION attribute. In other words, the search engine would translate “What is the temperature in New York?” into “What is the temperature in <PARAM geographic area>?” and look for any <QANDA> tags (in its database of QANDA tags, gleaned from spidering the Internet) that contain a QUESTION="What is the temperature in <PARAM geographic area>?" attribute.

If a match is found then the search engine “passes” the actual value for the PARAM, which in this case would mean “passing” “New York” instead of <PARAM geographic area>. If the matched <QANDA> tag contains a simple ANSWER attribute then any <PARAM> tags within the ANSWER attribute are replaced with the values matched by the corresponding <PARAM> tags in the QUESTION attribute.

Here’s an example:

Some web site contains the following QANDA tag:

<QANDA

A search engine comes across this <QANDA> tag during its normal spidering operations and stores it in its database.

A user goes to the search engine and types in the question: What is the temperature in New York?

The search engine searches within its database of <QANDA> tags for an exact match on the entered question and fails to find one.

The search engine attempts to translate the question into a “parameterized” question. It recognizes “New York” as being a geographic area and translates the original question into “What is the temperature in <PARAM geographic area>?”

Using this QUESTION the search engine locates a <QANDA> tag with that same QUESTION, i.e. the one described above.

The search engine extracts the related ANSWER attribute, namely, “The temperature in <PARAM geographic area> is 37 degrees.”, substitutes the original New York for the <PARAM geographic area> and generates the final answer as: “The temperature in New York is 37 degrees.” which it then presents to the user as the answer to his question.

This is clearly a very simplified example and would not be very useful to the user, since he would get 37 degrees as the temperature no matter what geographic area he specified, but it suffices to show the intent of parameters within the QUESTION and ANSWER attributes.

PARAM values

Possible values for the PARAM tag would include:

geographic area

Others would be added as the use of the QANDA feature becomes more commonplace.

Multiple PARAM tags with the same value type

It would be possible for a given QUESTION attribute to contain two or more of the same type of PARAM tag. For example, we might want to process a question such as: What is the distance between New York City and Chicago? The corresponding QANDA QUESTION attribute would therefore require two PARAM tags with geographic area values. In order to accommodate this we allow for subscripting of the value type, as in:

QUESTION="What is the distance between <PARAM geographic area[1]>

The FUNCTION tag within the QUESTION attribute

The QUESTION attribute may also contain embedded <FUNCTION> tags which specify a function, typically a Perl or Java script (embedded within the HTML file), that is intended to return a text string.

More info about the ANSWER attribute

The ANSWER attribute may contain “parameters” embedded within the text, referring to the parameters within the QUESTION attribute.

The AUTHOR attribute

The optional AUTHOR attribute specifies the name of the person who authored the QANDA tag.

The CONTACTEMAIL attribute

The optional CONTACTEMAIL attribute specifies the email address of the person who authored the QANDA tag.

The KEYWORDS attribute

The optional KEYWORDS attribute specifies keywords and categories that are associated with the QUESTION. For example, if I have a website that deals with telescope optics I would specify the words “telescope” and “optics” as keywords for each of the QANDA tags that I include within my web pages.

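Returning to the parameterized temperature example earlier in this section, the matching steps can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not a definitive design: the article does not say how a search engine would recognize a geographic area, so a tiny hard-coded list (KNOWN_GEOGRAPHIC_AREAS) stands in for that intelligence, and the dictionary QANDA_DB and the function answer_question are hypothetical names.

```python
# Hypothetical stand-in for the engine's ability to recognize place names.
KNOWN_GEOGRAPHIC_AREAS = ("New York", "Chicago")

PARAM_TAG = "<PARAM geographic area>"

# Hypothetical QANDA database: QUESTION text -> ANSWER text, as it might
# look after spidering the example tag from the article.
QANDA_DB = {
    "What is the temperature in <PARAM geographic area>?":
        "The temperature in <PARAM geographic area> is 37 degrees.",
}


def answer_question(question, db=QANDA_DB, areas=KNOWN_GEOGRAPHIC_AREAS):
    """Answer a question by exact match, then by parameterized match."""
    # Step 1: look for an exact match on the question as typed.
    if question in db:
        return db[question]
    # Step 2: translate the question into a parameterized question by
    # replacing a recognized geographic area with the PARAM tag.
    for area in areas:
        if area in question:
            parameterized = question.replace(area, PARAM_TAG, 1)
            template = db.get(parameterized)
            if template is not None:
                # Step 3: pass the actual value into the ANSWER text.
                return template.replace(PARAM_TAG, area)
    # No exact or parameterized match was found.
    return None


if __name__ == "__main__":
    print(answer_question("What is the temperature in New York?"))
```

As in the article's own example, the answer is 37 degrees whatever area is asked about; the sketch only shows how the value found in the question is carried over into the answer.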
When a search engine encounters a QANDA tag during its spidering of the internet it will add the information contained within the tag to its own (i.e. the search engine’s) database. This QANDA database would most likely be separate from the standard database that search engines maintain (which primarily depends upon the frequency of words encountered in each page and certain key words located in the HEAD tag of web pages). The QANDA database would, at its simplest, be a list of Questions and related Answers (simple ANSWER attributes as well as ANSWERAT, ANSWERSPOKEN, ANSWERAUDIO and ANSWERVIDEO attributes) as extracted from QANDA tags.

QANDA tags with FUNCTION tags within the QUESTION attribute

For QANDA tags whose QUESTION attribute contains one or more FUNCTION tags, the related function itself, extracted from the applicable web page, would also be maintained in the database, so that the search engine could call it as necessary when responding to a question and constructing its related answer. [This concept needs more work, since functions might rely on other functions or be expected to run on the host of the web page.]

PROCESSING OF A <QANDA> TAG BY A WEB BROWSER

The processing of a QANDA tag by a web browser depends upon whether the tag resides in the <HEAD> or the <BODY> of the HTML file. If it resides within the <HEAD> tag then the web browser simply ignores it. QANDA tags residing within the <HEAD> tag are intended for reference and use by search engines and are not intended to be displayed on the web page defined within the HTML file. If a QANDA tag resides within the <BODY> tag of an HTML file then the browser is expected to present a reasonable representation of the question and answer contained within the QANDA tag. How it actually displays it is up to the designer of the web browser.

Smart Search Engines

Ideally a smart search engine would allow the user to enter “background information” and then pose the question. The engine would then phrase the actual question “in light of” the background information provided. For example, a user might enter something like:

OK, here’s the thing. I have an assignment from my biology teacher to figure out

The only actual question here is “What do I do?” But that certainly is, in itself, an unanswerable question. A smart search engine, however, would be able to figure out from the “background” information (i.e. the sentences leading up to that final question) that the actual question being asked is something like: How many types of tree toads live in South America? And if we’re lucky there will be an entry in the search engine’s QANDA database along the lines of How many types of <PARAM animal> live in <PARAM geographic area>? and the search engine will therefore be able to answer the question.

Evaluating Answers

Since the search engine relies on the QANDA tags embedded within pages on the internet, the search engine should provide some means of feedback from users so that it can evaluate the suitability of the answers that it is providing. For example, if I ask “When does the next flight for Chicago leave from Manchester airport?” and the answer that I get is “Your train leaves in ten minutes” then there is something wrong with the QANDA tag that resulted in that answer. The search engine should allow me to let it know that the answer wasn’t suitable and that therefore the QANDA tag should be evaluated (by an actual person!) and possibly discarded from the database. Likewise, if I ask the above question and the answer takes me to a totally unsuitable (e.g. pornographic) web page (via an ANSWERAT attribute) then I should be able to easily report that problem to the search engine and have it corrected.

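One way the complaint mechanism just described might be recorded is sketched below. It is purely illustrative: the article does not specify how a search engine would store feedback, so the QandaEntry record, its field names, the report_unsuitable function and the review_threshold parameter are all assumptions. The sketch only queues an entry for a human reviewer; whether to discard it is left to that person, as the article suggests.

```python
from dataclasses import dataclass


@dataclass
class QandaEntry:
    """Hypothetical database record for one spidered QANDA tag."""
    question: str
    answer: str
    unsuitable_reports: int = 0
    needs_human_review: bool = False


def report_unsuitable(entry, review_threshold=1):
    """Record a user's complaint and queue the entry for human review.

    review_threshold is an assumed tuning knob: the number of complaints
    needed before a person is asked to evaluate the entry.
    """
    entry.unsuitable_reports += 1
    if entry.unsuitable_reports >= review_threshold:
        entry.needs_human_review = True
    return entry


if __name__ == "__main__":
    entry = QandaEntry(
        question="When does the next flight for Chicago leave from Manchester airport?",
        answer="Your train leaves in ten minutes",
    )
    report_unsuitable(entry)
    print(entry.needs_human_review)
```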
Conversely, if I find that I get an extremely helpful answer to some question, it would be advisable for the search engine to allow me to let it know how pleased I was with the answer, so that the search engine can raise the “reliability factor” of the related QANDA entry (and possibly give it preference over similar but lower-rated entries).

Entering Q and A information directly to the Search Engine

The search engine should also provide a means by which Q and A information can be entered directly at the search engine site. A user might be able to provide very helpful Q and As, but he should not be required to actually build a web page with the embedded QANDA tag in order for it to become a part of the search engine’s QANDA database. He should, instead, be able to enter such information directly at the search engine site, via a suitable form.

Web Site’s Use of QANDA Tags

Web sites would