WO2001052078A1 - Dead hyper link detection method and system - Google Patents

Dead hyper link detection method and system

Publication number
WO2001052078A1
Authority
WO
WIPO (PCT)
Prior art keywords
hyperlinks
document
elements
invalid
valid
Prior art date
Application number
PCT/US2001/001214
Other languages
French (fr)
Inventor
Brian Mcginty
Original Assignee
Screamingmedia Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2000-01-14
Filing date
2001-01-12
Publication date
2001-07-19
Application filed by Screamingmedia Inc.
Priority to AU2001227909A
Publication of WO2001052078A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/955: Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566: URL specific, e.g. using aliases, detecting broken or misspelled links

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and system for automatically checking the validity of hyperlinks embedded in web pages being served to clients by web servers. In response to a web page request, a server will parse the document (step 412) and separate the hyperlinks from the other elements in the document (steps 414, 416). The server will then review the hyperlinks to determine whether "dead links" are present (step 418). The server will then either remove each dead link or strip away the tags that empower the link, thus making the link look like plain text. The server will then reconstruct the document including the hypertext links and other elements and send the document to the requestor (steps 422, 424).

Description

DEAD HYPER LINK DETECTION METHOD AND SYSTEM
Field of Invention
The present invention relates generally to the field of document retrieval and interaction on a distributed computer network. More specifically, the present invention relates to a system for post processing embedded hyperlinks.
Background of the Invention
The World Wide Web (WWW) may be broadly described as a virtual collection of documents with a user being able to access and retrieve these documents through existing telephone or data lines. Documents accessible on the WWW have the capability to direct users to other documents on the web using linking information embedded in the text itself. Typically, the documents are stored in hypertext markup language (HTML) format. Using hypertext linking, an author will integrate references directly into the text of a document which point to other related items of information. Uniform resource locators (URLs) provide a way of converting the integrated reference to a real location where the related information will be located on the Internet. It is possible that links that are valid when they are included in these pages may become defunct or "dead links" over time.
Summary of the Invention
An aspect of the present invention involves a method of testing embedded hyperlinks including receiving a document request from a client; parsing a first document to determine if elements in the first document contain hyperlinks; separating the elements into hyperlinks and all other non-hyperlink elements; testing the hyperlinks in a first document in parallel to determine if the hyperlinks are valid hyperlinks or invalid hyperlinks by comparing the hyperlinks to a predetermined rule set; adding the valid hyperlinks to a list including the other non-hyperlink elements; generating a second document from the list; and providing the second document to the client.
Another aspect of the present invention involves a system including a memory device which stores a first document; and a processor in communication with the memory device, said processor configured to: receive a document request from a client; parse the first document to determine if elements in the first document contain said hyperlinks; separate the elements into hyperlinks and all other non-hyperlink elements; test hyperlinks in said first document to determine if said hyperlinks are valid hyperlinks or invalid hyperlinks by comparing the hyperlinks to a predetermined rule set; add the valid hyperlinks to a list including the other non-hyperlink elements; generate a second document using the list; and provide said second document to said client.
Other and further aspects of the present invention will become apparent during the course of the following description and by reference to the attached drawings.
Brief Description of the Drawings
Figure 1 illustrates a block diagram of an internet client/server relationship;
Figure 2 illustrates a block diagram of the server of Figure 1;
Figure 3 illustrates an HTML document in an exploded view;
Figure 4 illustrates a flow chart of the process of link validation of an embodiment of the present invention;
Figure 5 illustrates a first subroutine of the flow chart of Figure 4 in which the hypertext links and other text are separated;
Figure 6 illustrates a second subroutine of the flow chart of Figure 4 in which the hypertext links are tested to determine if they are valid; and
Figure 7 illustrates an alternative embodiment of the present invention which includes a modification of the subroutine of Figure 5 so that invalid hypertext links are processed to strip away the HTML tags.
Detailed Description of the Preferred Embodiments
The ability of a web server application to ascertain the validity of embedded links in web pages at request time is critical for the credibility of a web site. With more and more web sites moving into the e-commerce arena, this question of web site credibility is becoming even more sensitive. The present invention is capable of detecting defunct hyperlinks as soon as they become accessible. Embodiments of the present invention disclosed herein relate to the serving of web pages or documents by Internet web servers. The pages or documents discussed in this application may be in Hyper Text Markup Language (HTML), Standard Generalized Markup Language (SGML), Extensible Markup Language (XML) or any other format which uses a tagging architecture. In the following discussion of this application, HTML will be used for example purposes only.
The embodiments disclosed herein include a method and system for checking the validity of HTML hyperlinks embedded in HTML web pages being served to clients by a server. This is true of web servers that serve static (or non-changing) HTML web pages or application web servers that serve dynamic HTML web pages. Static web pages are HTML web pages that are written or "constructed" at some point in time and then remain unchanged until a web site administrator manually removes them, updates them, or replaces them with entirely new pages. Dynamic HTML web pages are web pages served through some type of application server utilizing HTML templates and some type of dynamic page generation mechanism. In both cases it is possible that links that are valid when they are included in these pages may become defunct or "dead links" over time.
With reference to the Figures, several embodiments of the present invention will now be shown and described. Referring to Figure 1, electronic content distribution system 100 includes a server 110 and a user computer/client 140 both of which are connected across network backbone 105. Network backbone 105 may include an internet backbone, an intranet backbone or any other conventional network backbone or a combination thereof.
Server 110 may be a conventional server which includes conventional computer hardware and functionality. Server 110 may be associated with a web site or a content provider, such as a publisher (e.g., a magazine publisher, book publisher, etc.), a news agency, or any distributor or provider of electronic content. Electronic content may correspond to any publications (e.g., a news or magazine article), reports, technical papers and so forth. Electronic content may include a content body including documents with text and/or images with associated metadata as well as traditional index fields generally provided in a header or trailer section of this electronic content. Server 110 is configured to perform automatic dead link checking of hyperlinks to determine if dead links appear in a content body of the electronic content.
Fig. 2 is a schematic block diagram illustrating the components of server 110 of Fig. 1. Conventional computer components are included, such as a processor 200, user input devices 205, e.g., keyboard, mouse, etc., for receiving user inputs, network interface 210 for interconnection to the network backbone 105, RAM 215, ROM 220, display 225 and storage device 230. Storage device 230 stores the software which implements the present invention.
Turning to Figure 1, a request is sent from user computer 140 onto the network backbone 105 for a particular document or other piece of information. The requested document 320, as shown in Figure 3, is stored on server 110. The document 320 may include highlighted text 322 which includes hidden embedded links to other related information as prepared by hypertext authoring tools. The present invention will automatically perform a dead link check on any hyperlinks in the document 320 before sending the document to the user computer 140.
Figure 4 illustrates a flow diagram of the elemental steps of a first embodiment of the present invention. In a first step 410, a user accesses an Internet resource, such as an HTML page, which is served by the server 110. In step 412, the server 110 will, before serving the page to the user, parse that page and isolate the HTML hyperlinks that are embedded in that page. Figure 5 illustrates step 412 in more detail. In step 412a, a comparison is performed between the HTML page and a predefined rule set. Since all HTML hyperlinks employ a defined syntax, the server 110 can work from this predetermined rule set for parsing and isolating these links. This predetermined rule set can optionally be augmented through the use of a web server configuration file. This configuration file may employ an HTML hyperlink meta language that will allow the server 110 to dynamically learn at initialization time the syntax and nature of the HTML hyperlinks that must be isolated. In step 412b, a decision is made whether the text is a hyperlink. If so, it is added to the list of "N" hyperlinks in step 412c (with N representing a number greater than or equal to 0). If the text is not a hyperlink, it is added in step 412d to the list of all other HTML elements which are not hyperlinks. In step 412e, the system determines whether all of the document has been checked and, if not, returns to step 412a to continue checking the document. If the entire document has been reviewed, then the hyperlink parsing is completed in step 412f and the program returns to the flowchart of Figure 4.
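The separation of steps 412a through 412f can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the rule set is reduced to a single regular expression (`ANCHOR_RULE`), and the names `Element` and `split_page` are hypothetical. Each element records its position in the original page so that the two lists can later be recombined in order.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical rule set: one pattern describing the syntax of an HTML anchor.
ANCHOR_RULE = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*["\']([^"\']*)["\'][^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


@dataclass
class Element:
    position: int               # order of the element in the original page
    markup: str                 # raw markup of the element
    url: Optional[str] = None   # target URL, set only for hyperlinks
    text: str = ""              # visible link text, set only for hyperlinks


def split_page(page: str) -> Tuple[List[Element], List[Element]]:
    """Split a page into a hyperlink list and a list of all other elements."""
    hyperlinks: List[Element] = []
    others: List[Element] = []
    cursor = 0
    position = 0
    for match in ANCHOR_RULE.finditer(page):
        # Markup between the previous link and this one is a non-hyperlink element.
        if match.start() > cursor:
            others.append(Element(position, page[cursor:match.start()]))
            position += 1
        hyperlinks.append(
            Element(position, match.group(0), match.group(1), match.group(2))
        )
        position += 1
        cursor = match.end()
    # Whatever follows the last link is also a non-hyperlink element.
    if cursor < len(page):
        others.append(Element(position, page[cursor:]))
    return hyperlinks, others
```

A regular expression is used here only to keep the sketch short; a production parser would more likely rely on a real HTML parser, since anchor syntax in the wild is more varied than one pattern can capture.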
Figure 4 shows that in steps 414 and 416 the hyperlink list of the "N" links and the list of other non-hyperlink HTML elements are separated. Once the server 110 has isolated the list of hyperlinks for a given web page, it may in step 418 employ a multi-threaded socket initiator to simultaneously create hypertext transfer protocol (HTTP) socket connections to all the hyperlinks in the hyperlink list and allow the hyperlinks to be tested in parallel. These socket connections will begin retrieving the specified web pages, looking in particular for web server error messages in the HTTP headers of the incoming pages. For example, a 404 return code signifies that the web page in question no longer exists at the specified location. Once the HTTP header is read, the socket connection may be terminated. It is then a matter of parsing and interpreting the headers for the various web pages.
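The parallel test of step 418 might be approximated as below. The sketch substitutes a thread pool issuing HTTP HEAD requests through the Python standard library for the multi-threaded socket initiator described above; that substitution, the function names `fetch_status` and `fetch_statuses_in_parallel`, the timeout, and the worker count are all assumptions made for illustration. A HEAD request returns the headers without the body, which mirrors the idea of terminating the connection once the header has been read.

```python
import http.client
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response was obtained."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered with an error status such as 404.
        return error.code
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Unreachable host, malformed URL, timeout, and similar failures.
        return None


def fetch_statuses_in_parallel(
    urls: List[str],
    fetch: Callable[[str], Optional[int]] = fetch_status,
    max_workers: int = 8,
) -> Dict[str, Optional[int]]:
    """Request every URL concurrently and map each URL to its status code."""
    if not urls:
        return {}
    workers = min(max_workers, len(urls))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(fetch, urls))
    return dict(zip(urls, statuses))
```

The `fetch` parameter is injectable so that the function can be exercised without network access, for example by passing the `get` method of a dictionary of canned status codes. Some servers reject HEAD requests, so a fuller implementation would probably fall back to GET.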
Figure 6 discloses step 418 in more detail. In step 418a, hyperlinks 1 to N are tested. If the first through "N" hyperlinks are valid as determined in steps 418a through 418c then these hyperlinks are given a Boolean value of VALID and added to the list of valid hypertext links in 418d. If these hyperlinks are not valid, then the hyperlink is given the Boolean value of NOT VALID and not added to the list of valid hyperlinks and the program returns to the flowchart of Figure 4.
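Steps 418a through 418d, which assign each tested hyperlink a Boolean value and collect the valid ones, could look like the following sketch. The input is assumed to be a mapping from URL to HTTP status code (or `None` when no response arrived). The description names only the 404 code explicitly, so the rule that any 2xx or 3xx status counts as VALID is an assumption, as are the names `LinkStatus` and `classify_links`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class LinkStatus:
    url: str
    valid: bool  # True stands for VALID, False for NOT VALID


def classify_links(
    statuses: Dict[str, Optional[int]],
) -> Tuple[List[LinkStatus], List[str]]:
    """Build the per-link status table and the list of valid hyperlinks."""
    table: List[LinkStatus] = []
    valid_links: List[str] = []
    for url, status in statuses.items():
        # Assumed rule: a response in the 2xx or 3xx range means the link is alive.
        valid = status is not None and 200 <= status < 400
        table.append(LinkStatus(url, valid))
        if valid:
            valid_links.append(url)
    return table, valid_links
```

Keeping the full table alongside the list of valid links matches the two data structures the description goes on to mention: one holding every link with its status, the other feeding the recombination step.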
At this point the server 110 has the HTML web page parsed into a dynamic data structure with the hyperlinks separated from the remaining page elements. The server 110 also has a dynamic data structure that has a list of the page's internal links and a Boolean value that represents each link's web status (i.e., VALID or NOT VALID). The server 110 will recombine the VALID hyperlinks with the other HTML elements in step 420 and omit any hyperlinks having a NOT VALID value. The server 110 will recompose the elements of the page in step 422. In this way the user will never see invalid or defunct links being served by the web site that employs a server 110 such as this. In an alternative embodiment disclosed in Figure 7, subroutine 418 will be modified so that server 110 will recompose the page with the non-valid link but will strip away the HTML tags that empower that link, thus making the link look like plain text. In this embodiment, the net result is the same. A user will never click on a hyperlink that takes them to a defunct page. Subroutine 418 will be modified to include steps 418e through 418g in which, if a hyperlink is found to be invalid, the tag will be stripped and the link will be made to look like text and added to the VALID hyperlink list.
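Both ways of recomposing the page, omitting a dead link entirely or stripping its tags so that only plain text remains, can be illustrated with one small function. This is a sketch under assumptions: the anchor pattern `ANCHOR_RULE`, the function name `rebuild_page`, and the `strip_tags` flag are hypothetical, and "omitting" a link is interpreted here as removing the whole anchor element including its text. Links missing from the validity mapping are treated as NOT VALID.

```python
import re
from typing import Dict

# Hypothetical rule describing the syntax of an HTML anchor.
ANCHOR_RULE = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*["\']([^"\']*)["\'][^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def rebuild_page(page: str, validity: Dict[str, bool], strip_tags: bool = False) -> str:
    """Recompose a page, keeping VALID links and handling NOT VALID ones.

    strip_tags=False: the dead link is omitted from the second document.
    strip_tags=True:  the anchor tags are removed and the link text is kept.
    """

    def replace(match: "re.Match[str]") -> str:
        url, text = match.group(1), match.group(2)
        if validity.get(url, False):
            return match.group(0)  # VALID: keep the anchor unchanged
        return text if strip_tags else ""

    return ANCHOR_RULE.sub(replace, page)
```

For instance, given a page with one live and one dead link, the default call drops the dead anchor, while `strip_tags=True` leaves its text in place as ordinary prose, so the surrounding sentence still reads naturally.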
The foregoing is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the law. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.

Claims

1. A method of testing embedded hyperlinks comprising: receiving a document request from a client; parsing a first document to determine if elements in the first document contain hyperlinks; separating the elements into hyperlinks and all other non-hyperlink elements; testing the hyperlinks in a first document in parallel to determine if said hyperlinks are valid hyperlinks or invalid hyperlinks by comparing the hyperlinks to a predetermined rule set; adding the valid hyperlinks to a list including the other non-hyperlink elements; generating a second document from said list; and providing said second document to said client.
2. A method comprising: automatically testing hyperlinks in a first document to determine if said hyperlinks are valid hyperlinks or invalid hyperlinks; and generating a second document using the valid hyperlinks.
3. The method of claim 2, further comprising: stripping tags from the invalid hyperlinks and adding the invalid hyperlinks to the second document.
4. The method of claim 2, wherein said testing of the hyperlinks is performed in parallel.
5. The method of claim 2, further comprising: receiving a document request from a client; and providing the second document to the client.
6. The method of claim 2 further comprising: parsing the first document to determine if elements in the first document contain said hyperlinks.
7. The method of claim 2 further comprising: separating the hyperlinks from other elements in the first document; and adding the valid hyperlinks to the other elements before generating said second document.
8. The method of claim 2, wherein said parsing step includes comparing said elements to a predetermined rule set.
9. The method of claim 2, wherein said first and second documents are static web pages.
10. The method of claim 2, wherein said first and second documents are dynamic web pages.
11. The method of claim 2, wherein said first and second documents are written in a format from one of the group consisting of HTML, SGML, and XML.
12. The method of claim 2, further comprising: stripping tags from the invalid hyperlinks and adding the invalid hyperlinks to the list.
13. A system comprising: a memory device which stores a first document; and a processor in communication with said memory device, said processor configured to: automatically test hyperlinks in said first document to determine if said hyperlinks are valid hyperlinks or invalid hyperlinks; and generate a second document using the valid hyperlinks.
14. The system of claim 13, said processor further configured to: strip tags from the invalid hyperlinks and add the invalid hyperlinks to the second document.
15. The system of claim 13, said processor further configured to: test said hyperlinks in parallel.
16. The system of claim 13, said processor further configured to: parse the first document to determine if elements in the first document contain said hyperlinks.
17. A system comprising: a memory device which stores a first document; and a processor in communication with said memory device, said processor configured to: receive a document request from a client; parse the first document to determine if elements in the first document contain said hyperlinks; separate the elements into hyperlinks and all other non-hyperlink elements; test hyperlinks in said first document to determine if said hyperlinks are valid hyperlinks or invalid hyperlinks by comparing the hyperlinks to a predetermined rule set; add the valid hyperlinks to a list including the other non-hyperlink elements; generate a second document using the list; and provide said second document to said client.
18. A system comprising: means for automatically testing hyperlinks in a first document to determine if said hyperlinks are valid hyperlinks or invalid hyperlinks; and means for generating a second document using the valid hyperlinks.
19. The system of claim 18, further comprising: means for stripping tags from the invalid hyperlinks and adding the invalid hyperlinks to the second document.
20. The system of claim 18, further comprising: a means for parsing the first document to determine if elements in the first document contain hyperlinks.
PCT/US2001/001214 2000-01-14 2001-01-12 Dead hyper link detection method and system WO2001052078A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001227909A AU2001227909A1 (en) 2000-01-14 2001-01-12 Dead hyper link detection method and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US48343900A 2000-01-14 2000-01-14
US09/483,439 2000-01-14

Publications (1)

Publication Number Publication Date
WO2001052078A1 (en) 2001-07-19

Family

ID=23920028

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/001214 WO2001052078A1 (en) 2000-01-14 2001-01-12 Dead hyper link detection method and system

Country Status (2)

Country Link
AU (1) AU2001227909A1 (en)
WO (1) WO2001052078A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5995099A (en) * 1996-06-10 1999-11-30 Horstmann; Jens U. Method for creating and maintaining page links
US6035330A (en) * 1996-03-29 2000-03-07 British Telecommunications World wide web navigational mapping system and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LEUNG: "A tool for testing hypermedia systems", EUROMICRO CONFERENCE, IEEE, vol. 2, 1999, pages 203, XP002939083 *
STOTTS: "Petri-net-based hypertext: document structure with browsing semantics", ACM TRANSACTIONS ON INFORMATION SYSTEMS, vol. 7, January 1989 (1989-01-01), pages 3 - 29, XP002939082 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6519616B1 (en) * 1999-12-31 2003-02-11 Qwest Communications International, Inc. Web site quality assurance system and method
US7222101B2 (en) * 2001-02-26 2007-05-22 American Express Travel Related Services Company, Inc. System and method for securing data through a PDA portal
US9842345B2 (en) 2001-03-29 2017-12-12 Gula Consulting Limited Liability Company System and method for networked loyalty program
WO2005015387A2 (en) * 2003-07-17 2005-02-17 International Business Machines Corporation Method and system for automatic adjustment of entitlements in a distributed data processing environment
WO2005015387A3 (en) * 2003-07-17 2005-06-16 Ibm Method and system for automatic adjustment of entitlements in a distributed data processing environment
CN100424636C (en) * 2003-07-17 2008-10-08 国际商业机器公司 Method and system for automatic adjustment of entitlements in a distributed data processing environment
EP1677215A1 (en) 2004-12-30 2006-07-05 Microsoft Corporation Methods and apparatus for the evalution of aspects of a web page
US7536389B1 (en) 2005-02-22 2009-05-19 Yahoo ! Inc. Techniques for crawling dynamic web content
EP1739603A1 (en) * 2005-06-28 2007-01-03 Hurra Communications GmbH Client-server system, server and method for outputting at least one information concerning an online shop or a product offered by the online shop on a network page
US7610267B2 (en) * 2005-06-28 2009-10-27 Yahoo! Inc. Unsupervised, automated web host dynamicity detection, dead link detection and prerequisite page discovery for search indexed web pages
US7590634B2 (en) 2005-12-09 2009-09-15 Microsoft Corporation Detection of inaccessible resources
US8973821B1 (en) 2006-05-25 2015-03-10 Sean I. Mcghie Conversion/transfer of non-negotiable credits to entity independent funds
US8833650B1 (en) 2006-05-25 2014-09-16 Sean I. Mcghie Online shopping sites for redeeming loyalty points
US8944320B1 (en) 2006-05-25 2015-02-03 Sean I. Mcghie Conversion/transfer of non-negotiable credits to in-game funds for in-game purchases
US8950669B1 (en) 2006-05-25 2015-02-10 Sean I. Mcghie Conversion of non-negotiable credits to entity independent funds
US9704174B1 (en) 2006-05-25 2017-07-11 Sean I. Mcghie Conversion of loyalty program points to commerce partner points per terms of a mutual agreement
US10062062B1 (en) 2006-05-25 2018-08-28 Jbshbm, Llc Automated teller machine (ATM) providing money for loyalty points
US8209599B2 (en) * 2009-04-23 2012-06-26 Xerox Corporation Method and system for handling references in markup language documents
US20100275117A1 (en) * 2009-04-23 2010-10-28 Xerox Corporation Method and system for handling references in markup language documents
CN104504097A (en) * 2014-12-29 2015-04-08 北京奇虎科技有限公司 Live link rule mining method and device, and searching method and device
CN104572928A (en) * 2014-12-29 2015-04-29 北京奇虎科技有限公司 Dead link rule digging method, dead link rule digging device, searching method and searching device

Also Published As

Publication number Publication date
AU2001227909A1 (en) 2001-07-24

Legal Events

Date Code Title Description
AK Designated states. Kind code of ref document: A1. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW
AL Designated countries for regional patents. Kind code of ref document: A1. Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG
121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
REG Reference to national code. Ref country code: DE. Ref legal event code: 8642
122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase. Ref country code: JP