WWW(3)                     Library Functions Manual                     WWW(3)

NAME
       WWW - World Wide Web Package

SYNOPSIS
       extract_description( FILE )
       extract_meta( FILE, NAME )
       hyperlink( LIST )

DESCRIPTION
       This package provides utility functions for the World Wide Web to
       extract descriptions or meta information from files, and to hyperlink
       text.

SUBROUTINES
       The following Perl subroutines are defined and available:

       extract_description( FILE )
              Extracts a description from an HTML or plain text file given by
              the FILE name; FILE should be an absolute path. The first
              $description::chars (default: 2048) characters are read. If
              the file ends in one of the extensions htm, html, or shtml, it
              is presumed to be an HTML file; if the file ends in txt, it is
              presumed to be a plain text file. Other extensions are not
              recognized and no description is returned for them.

              For HTML files, first, if a <META NAME="description"
              CONTENT="..."> or a <META NAME="DC.description" CONTENT="...">
              (Dublin Core) element is found, then the words specified as
              the value of the CONTENT attribute are returned as the
              description. Otherwise, all HTML comments, text between
              <SCRIPT>, <STYLE>, and <TITLE> tags, and all other HTML tags
              are stripped. If <AREA ... ALT="..."> or <IMG ... ALT="...">
              elements are found, then the words specified as the value of
              the ALT attributes are extracted.

              Finally, for either HTML or plain text files, at most
              $description::words (default: 50) words are returned.

       extract_meta( FILE, NAME )
              Extracts the value of the CONTENT attribute from a META
              element having the given NAME attribute from an HTML file
              given by the FILE name; FILE should be an absolute path. The
              file must end in one of the extensions htm, html, or shtml to
              be considered an HTML file. The first $description::chars
              (default: 2048) characters are read. The characters are cached
              between consecutive calls using the same filename.
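The META lookup described for extract_description() and extract_meta() can be illustrated with a minimal sketch. This is not the package's implementation: the helper name sketch_meta_content, the regular expression, and the requirement that NAME precede CONTENT with both values double-quoted are assumptions made for the example. It works on a string rather than a file so that it is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of the WWW package: returns the CONTENT value
# of the first META element whose NAME matches, or nothing if none is found.
sub sketch_meta_content {
    my ( $html, $name ) = @_;

    # Examine only the leading characters; 2048 mirrors the documented
    # default of $description::chars.
    my $head = substr( $html, 0, 2048 );

    # Assumption: NAME comes before CONTENT and both values are double-quoted.
    if ( $head =~ /<meta\s+name\s*=\s*"\Q$name\E"\s+content\s*=\s*"([^"]*)"/i ) {
        return $1;
    }
    return;
}

my $html = '<HTML><HEAD><META NAME="description" CONTENT="A short summary.">'
         . '</HEAD><BODY>Text</BODY></HTML>';

my $description = sketch_meta_content( $html, 'description' );
print defined $description ? "$description\n" : "(none)\n";
```

A real document may order or quote the attributes differently, so this pattern is deliberately narrower than whatever matching the package actually performs.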
       hyperlink( LIST )
              Adds hyperlinks to strings; that is, strings containing
              substrings that are valid URLs (according to RFC 1630) have
              the appropriate HTML tags ``wrapped'' around them so that they
              are selectable when displayed in a browser. The ftp, gopher,
              http, https, mailto, news, telnet, and wais URLs are
              recognized. Example:

                     Read all about it at http://www.usatoday.com/

              becomes:

                     Read all about it at <A HREF="http://www.usatoday.com/">http://www.usatoday.com/</A>

SEE ALSO
       perl(1)

       Tim Berners-Lee. ``Universal Resource Identifiers in WWW,'' Request
       for Comments 1630, Network Working Group of the Internet Engineering
       Task Force, June 1994.

       Tim Berners-Lee, Larry Masinter, and Mark McCahill. ``Uniform
       Resource Locators (URL),'' Request for Comments 1738, Network Working
       Group, 1994.

       Dave Raggett, Arnaud Le Hors, and Ian Jacobs. ``Notes on helping
       search engines index your Web site,'' HTML 4.0 Specification,
       Appendix B: Performance, Implementation, and Design Notes, World Wide
       Web Consortium, April 1998.

       --. ``Objects, Images, and Applets: How to specify alternate text,''
       HTML 4.0 Specification, §13.8, World Wide Web Consortium, April 1998.

       Dublin Core Directorate. ``The Dublin Core: A Simple Content
       Description Model for Electronic Resources.''

       Larry Wall, et al. Programming Perl, 3rd ed., O'Reilly & Associates,
       Inc., Sebastopol, CA, 2000.

AUTHOR
       Paul J. Lucas <pauljlucas@mac.com>

WWW                            February 12, 2000                        WWW(3)
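The wrapping that hyperlink( LIST ) is described as performing can be approximated as follows. This is only a sketch under stated assumptions: sketch_hyperlink is a hypothetical stand-in, its URL pattern is a simplification rather than the RFC 1630 grammar, and returning modified copies (instead of changing the arguments in place) is a choice made for the example, not documented behavior.

```perl
use strict;
use warnings;

# Hypothetical stand-in for hyperlink(); the real subroutine's matching
# rules and calling convention may differ.
sub sketch_hyperlink {
    my @strings = @_;    # work on copies of the arguments
    for ( @strings ) {
        # Assumption: a URL is one of the recognized schemes, a colon, and a
        # run of characters that are not whitespace, quotes, or angle brackets.
        s{\b((?:ftp|gopher|https?|mailto|news|telnet|wais):[^\s<>"]+)}
         {<A HREF="$1">$1</A>}g;
    }
    return @strings;
}

my ( $linked ) = sketch_hyperlink( 'Read all about it at http://www.usatoday.com/' );
print "$linked\n";
```

Because the simplified pattern stops only at whitespace, quotes, and angle brackets, punctuation that directly follows a URL (such as a sentence-ending period) would be captured as part of the link.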