CLVIII. Tidy Functions

��

Tidy is a binding for the Tidy HTML clean and repair utility which allows you to not only clean and otherwise manipulate HTML documents, but also traverse the document tree.

�ݨD

To use Tidy, you will need libtidy installed, available on the tidy homepage http://tidy.sourceforge.net/.

�w��

Tidy is currently available for PHP 4.3.x and PHP 5 as a PECL extension from http://pecl.php.net/package/tidy.

注: Tidy 1.0 is just for PHP 4.3.x, while Tidy 2.0 is just for PHP 5.

If PEAR is available on your *nix-like system you can use the pear installer to install the tidy extension, by the following command: pecl install tidy.

You can always download the tar.gz package and install tidy by hand:

範例 1. tidy install by hand in PHP 4.3.x

gunzip tidy-xxx.tgz
tar -xvf tidy-xxx.tar
cd tidy-xxx
phpize
./configure && make && make install

Windows users can download the extension dll from http://pecl4win.php.net/ext.php/php_tidy.dll.

In PHP 5 you need only to compile using the --with-tidy option.

���ɴ�պA

php.ini�]�w�|�v�T�o�Ǩ�ƪ�欰�C

表格 1. Tidy Configuration Options

NameDefaultChangeableChangelog
tidy.default_config""PHP_INI_SYSTEMAvailable since PHP 5.0.0.
tidy.clean_output"0"PHP_INI_PERDIRAvailable since PHP 5.0.0.
For further details and definitions of the PHP_INI_* constants, see the 附錄 G.

�o�̬�²�u�c��O��C

tidy.default_config string

Default path for tidy config file.

tidy.clean_output boolean

Turns on/off the output repairing by Tidy.

警告

Do not turn on tidy.clean_output if you are generating non-html content such as dynamic images.

�귽��

�o�ө��S�w�q���귽�C

�w��w�q��O

tidyNode

Methods

Properties

  • value - the value of the node (e.g. the html text)

  • name - the name of the tag (e.g. html, a, etc..)

  • type - the type of the node (one of the constants above, e.g. TIDY_NODETYPE_PHP)

  • line* - the line where the node starts

  • column* - the column where the node starts

  • proprietary* - TRUE if the node refers to a proprietary tag

  • id - the ID of the tag (one of the constants above, e.g. TIDY_TAG_FRAME)

  • attribute - an array with the attributes of the current node, or NULL if there aren't any

  • child - an array with the child tidyNodes, or NULL if there aren't any

注: The properties marked with * are just available since PHP 5.1.0.

�w��w�q�`��

�H�U�`�ƥѦ��w�q�A�u�b�o�ө��Q�sĶ�PHP�ι��ɴ�Q�ʺA��J�ɦ�ġC

Each TIDY_TAG_XXX represents a HTML tag. For example, TIDY_TAG_A represents a <a href="XX">link</a> tag. Each TIDY_ATTR_XXX represents a HTML atribute. For example TIDY_ATTR_HREF would represent the href atribute in the previous example.

The following constants are defined:

表格 2. tidy tag constants

constant
TIDY_TAG_UNKNOWN
TIDY_TAG_A
TIDY_TAG_ABBR
TIDY_TAG_ACRONYM
TIDY_TAG_ALIGN
TIDY_TAG_APPLET
TIDY_TAG_AREA
TIDY_TAG_B
TIDY_TAG_BASE
TIDY_TAG_BASEFONT
TIDY_TAG_BDO
TIDY_TAG_BGSOUND
TIDY_TAG_BIG
TIDY_TAG_BLINK
TIDY_TAG_BLOCKQUOTE
TIDY_TAG_BODY
TIDY_TAG_BR
TIDY_TAG_BUTTON
TIDY_TAG_CAPTION
TIDY_TAG_CENTER
TIDY_TAG_CITE
TIDY_TAG_CODE
TIDY_TAG_COL
TIDY_TAG_COLGROUP
TIDY_TAG_COMMENT
TIDY_TAG_DD
TIDY_TAG_DEL
TIDY_TAG_DFN
TIDY_TAG_DIR
TIDY_TAG_DIV
TIDY_TAG_DL
TIDY_TAG_DT
TIDY_TAG_EM
TIDY_TAG_EMBED
TIDY_TAG_FIELDSET
TIDY_TAG_FONT
TIDY_TAG_FORM
TIDY_TAG_FRAME
TIDY_TAG_FRAMESET
TIDY_TAG_H1
TIDY_TAG_H2
TIDY_TAG_H3
TIDY_TAG_H4
TIDY_TAG_H5
TIDY_TAG_H6
TIDY_TAG_HEAD
TIDY_TAG_HR
TIDY_TAG_HTML
TIDY_TAG_I
TIDY_TAG_IFRAME
TIDY_TAG_ILAYER
TIDY_TAG_IMG
TIDY_TAG_INPUT
TIDY_TAG_INS
TIDY_TAG_ISINDEX
TIDY_TAG_KBD
TIDY_TAG_KEYGEN
TIDY_TAG_LABEL
TIDY_TAG_LAYER
TIDY_TAG_LEGEND
TIDY_TAG_LI
TIDY_TAG_LINK
TIDY_TAG_LISTING
TIDY_TAG_MAP
TIDY_TAG_MARQUEE
TIDY_TAG_MENU
TIDY_TAG_META
TIDY_TAG_MULTICOL
TIDY_TAG_NOBR
TIDY_TAG_NOEMBED
TIDY_TAG_NOFRAMES
TIDY_TAG_NOLAYER
TIDY_TAG_NOSAVE
TIDY_TAG_NOSCRIPT
TIDY_TAG_OBJECT
TIDY_TAG_OL
TIDY_TAG_OPTGROUP
TIDY_TAG_OPTION
TIDY_TAG_P
TIDY_TAG_PARAM
TIDY_TAG_PLAINTEXT
TIDY_TAG_PRE
TIDY_TAG_Q
TIDY_TAG_RP
TIDY_TAG_RT
TIDY_TAG_RTC
TIDY_TAG_RUBY
TIDY_TAG_S
TIDY_TAG_SAMP
TIDY_TAG_SCRIPT
TIDY_TAG_SELECT
TIDY_TAG_SERVER
TIDY_TAG_SERVLET
TIDY_TAG_SMALL
TIDY_TAG_SPACER
TIDY_TAG_SPAN
TIDY_TAG_STRIKE
TIDY_TAG_STRONG
TIDY_TAG_STYLE
TIDY_TAG_SUB
TIDY_TAG_TABLE
TIDY_TAG_TBODY
TIDY_TAG_TD
TIDY_TAG_TEXTAREA
TIDY_TAG_TFOOT
TIDY_TAG_TH
TIDY_TAG_THEAD
TIDY_TAG_TITLE
TIDY_TAG_TR
TIDY_TAG_TR
TIDY_TAG_TT
TIDY_TAG_U
TIDY_TAG_UL
TIDY_TAG_VAR
TIDY_TAG_WBR
TIDY_TAG_XMP

表格 3. tidy attribute constants

constant
TIDY_ATTR_UNKNOWN
TIDY_ATTR_ABBR
TIDY_ATTR_ACCEPT
TIDY_ATTR_ACCEPT_CHARSET
TIDY_ATTR_ACCESSKEY
TIDY_ATTR_ACTION
TIDY_ATTR_ADD_DATE
TIDY_ATTR_ALIGN
TIDY_ATTR_ALINK
TIDY_ATTR_ALT
TIDY_ATTR_ARCHIVE
TIDY_ATTR_AXIS
TIDY_ATTR_BACKGROUND
TIDY_ATTR_BGCOLOR
TIDY_ATTR_BGPROPERTIES
TIDY_ATTR_BORDER
TIDY_ATTR_BORDERCOLOR
TIDY_ATTR_BOTTOMMARGIN
TIDY_ATTR_CELLPADDING
TIDY_ATTR_CELLSPACING
TIDY_ATTR_CHAR
TIDY_ATTR_CHAROFF
TIDY_ATTR_CHARSET
TIDY_ATTR_CHECKED
TIDY_ATTR_CITE
TIDY_ATTR_CLASS
TIDY_ATTR_CLASSID
TIDY_ATTR_CLEAR
TIDY_ATTR_CODE
TIDY_ATTR_CODEBASE
TIDY_ATTR_CODETYPE
TIDY_ATTR_COLOR
TIDY_ATTR_COLS
TIDY_ATTR_COLSPAN
TIDY_ATTR_COMPACT
TIDY_ATTR_CONTENT
TIDY_ATTR_COORDS
TIDY_ATTR_DATA
TIDY_ATTR_DATAFLD
TIDY_ATTR_DATAPAGESIZE
TIDY_ATTR_DATASRC
TIDY_ATTR_DATETIME
TIDY_ATTR_DECLARE
TIDY_ATTR_DEFER
TIDY_ATTR_DIR
TIDY_ATTR_DISABLED
TIDY_ATTR_ENCODING
TIDY_ATTR_ENCTYPE
TIDY_ATTR_FACE
TIDY_ATTR_FOR
TIDY_ATTR_FRAME
TIDY_ATTR_FRAMEBORDER
TIDY_ATTR_FRAMESPACING
TIDY_ATTR_GRIDX
TIDY_ATTR_GRIDY
TIDY_ATTR_HEADERS
TIDY_ATTR_HEIGHT
TIDY_ATTR_HREF
TIDY_ATTR_HREFLANG
TIDY_ATTR_HSPACE
TIDY_ATTR_HTTP_EQUIV
TIDY_ATTR_ID
TIDY_ATTR_ISMAP
TIDY_ATTR_LABEL
TIDY_ATTR_LANG
TIDY_ATTR_LANGUAGE
TIDY_ATTR_LAST_MODIFIED
TIDY_ATTR_LAST_VISIT
TIDY_ATTR_LEFTMARGIN
TIDY_ATTR_LINK
TIDY_ATTR_LONGDESC
TIDY_ATTR_LOWSRC
TIDY_ATTR_MARGINHEIGHT
TIDY_ATTR_MARGINWIDTH
TIDY_ATTR_MAXLENGTH
TIDY_ATTR_MEDIA
TIDY_ATTR_METHOD
TIDY_ATTR_MULTIPLE
TIDY_ATTR_NAME
TIDY_ATTR_NOHREF
TIDY_ATTR_NORESIZE
TIDY_ATTR_NOSHADE
TIDY_ATTR_NOWRAP
TIDY_ATTR_OBJECT
TIDY_ATTR_OnAFTERUPDATE
TIDY_ATTR_OnBEFOREUNLOAD
TIDY_ATTR_OnBEFOREUPDATE
TIDY_ATTR_OnBLUR
TIDY_ATTR_OnCHANGE
TIDY_ATTR_OnCLICK
TIDY_ATTR_OnDATAAVAILABLE
TIDY_ATTR_OnDATASETCHANGED
TIDY_ATTR_OnDATASETCOMPLETE
TIDY_ATTR_OnDBLCLICK
TIDY_ATTR_OnERRORUPDATE
TIDY_ATTR_OnFOCUS
TIDY_ATTR_OnKEYDOWN
TIDY_ATTR_OnKEYPRESS
TIDY_ATTR_OnKEYUP
TIDY_ATTR_OnLOAD
TIDY_ATTR_OnMOUSEDOWN
TIDY_ATTR_OnMOUSEMOVE
TIDY_ATTR_OnMOUSEOUT
TIDY_ATTR_OnMOUSEOVER
TIDY_ATTR_OnMOUSEUP
TIDY_ATTR_OnRESET
TIDY_ATTR_OnROWENTER
TIDY_ATTR_OnROWEXIT
TIDY_ATTR_OnSELECT
TIDY_ATTR_OnSUBMIT
TIDY_ATTR_OnUNLOAD
TIDY_ATTR_PROFILE
TIDY_ATTR_PROMPT
TIDY_ATTR_RBSPAN
TIDY_ATTR_READONLY
TIDY_ATTR_REL
TIDY_ATTR_REV
TIDY_ATTR_RIGHTMARGIN
TIDY_ATTR_ROWS
TIDY_ATTR_ROWSPAN
TIDY_ATTR_RULES
TIDY_ATTR_SCHEME
TIDY_ATTR_SCOPE
TIDY_ATTR_SCROLLING
TIDY_ATTR_SELECTED
TIDY_ATTR_SHAPE
TIDY_ATTR_SHOWGRID
TIDY_ATTR_SHOWGRIDX
TIDY_ATTR_SHOWGRIDY
TIDY_ATTR_SIZE
TIDY_ATTR_SPAN
TIDY_ATTR_SRC
TIDY_ATTR_STANDBY
TIDY_ATTR_START
TIDY_ATTR_STYLE
TIDY_ATTR_SUMMARY
TIDY_ATTR_TABINDEX
TIDY_ATTR_TARGET
TIDY_ATTR_TEXT
TIDY_ATTR_TITLE
TIDY_ATTR_TOPMARGIN
TIDY_ATTR_TYPE
TIDY_ATTR_USEMAP
TIDY_ATTR_VALIGN
TIDY_ATTR_VALUE
TIDY_ATTR_VALUETYPE
TIDY_ATTR_VERSION
TIDY_ATTR_VLINK
TIDY_ATTR_VSPACE
TIDY_ATTR_WIDTH
TIDY_ATTR_WRAP
TIDY_ATTR_XML_LANG
TIDY_ATTR_XML_SPACE
TIDY_ATTR_XMLNS

表格 4. tidy nodetype constants

constantdescription
TIDY_NODETYPE_ROOTroot node
TIDY_NODETYPE_DOCTYPEdoctype
TIDY_NODETYPE_COMMENTHTML comment
TIDY_NODETYPE_PROCINSProcessing Instruction
TIDY_NODETYPE_TEXTText
TIDY_NODETYPE_STARTstart tag
TIDY_NODETYPE_ENDend tag
TIDY_NODETYPE_STARTENDempty tag
TIDY_NODETYPE_CDATACDATA
TIDY_NODETYPE_SECTIONXML section
TIDY_NODETYPE_ASPASP code
TIDY_NODETYPE_JSTEJSTE code
TIDY_NODETYPE_PHPPHP code
TIDY_NODETYPE_XMLDECLXML declaration

�d��

This simple example shows basic Tidy usage.

範例 2. Basic Tidy usage

<?php
ob_start
();
?>
<html>a html document</html>
<?
$html
= ob_get_clean();

// Specify configuration
$config = array(
           
'indent'         => true,
           
'output-xhtml'   => true,
           
'wrap'           => 200);

// Tidy
$tidy = new tidy;
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();

// Output
echo $tidy;
?>

內容目錄
ob_tidyhandler --  ob_start callback function to repair the buffer
tidy_access_count --  Returns the Number of Tidy accessibility warnings encountered for specified document
tidy_clean_repair --  Execute configured cleanup and repair operations on parsed markup
tidy_config_count --  Returns the Number of Tidy configuration errors encountered for specified document
tidy::__construct --  Constructs a new tidy object
tidy_diagnose --  Run configured diagnostics on parsed and repaired markup
tidy_error_count --  Returns the Number of Tidy errors encountered for specified document
tidy_get_body --  Returns a tidyNode Object starting from the <body> tag of the tidy parse tree
tidy_get_config --  Get current Tidy configuration
tidy_get_error_buffer --  Return warnings and errors which occurred parsing the specified document
tidy_get_head --  Returns a tidyNode Object starting from the <head> tag of the tidy parse tree
tidy_get_html_ver --  Get the Detected HTML version for the specified document
tidy_get_html --  Returns a tidyNode Object starting from the <html> tag of the tidy parse tree
tidy_get_opt_doc --  Returns the documentation for the given option name
tidy_get_output --  Return a string representing the parsed tidy markup
tidy_get_release --  Get release date (version) for Tidy library
tidy_get_root --  Returns a tidyNode object representing the root of the tidy parse tree
tidy_get_status --  Get status of specified document
tidy_getopt --  Returns the value of the specified configuration option for the tidy document
tidy_is_xhtml --  Indicates if the document is a XHTML document
tidy_is_xml --  Indicates if the document is a generic (non HTML/XHTML) XML document
tidy_load_config --  Load an ASCII Tidy configuration file with the specified encoding
tidy_node->get_attr --  Return the attribute with the provided attribute id
tidy_node->get_nodes --  Return an array of nodes under this node with the specified id
tidy_node->next --  Returns the next sibling to this node
tidy_node->prev --  Returns the previous sibling to this node
tidy_parse_file --  Parse markup in file or URI
tidy_parse_string --  Parse a document stored in a string
tidy_repair_file --  Repair a file and return it as a string
tidy_repair_string --  Repair a string using an optionally provided configuration file
tidy_reset_config --  Restore Tidy configuration to default values
tidy_save_config --  Save current settings to named file
tidy_set_encoding --  Set the input/output character encoding for parsing markup
tidy_setopt --  Updates the configuration settings for the specified tidy document
tidy_warning_count --  Returns the Number of Tidy warnings encountered for specified document
tidyNode->hasChildren --  Returns true if this node has children
tidyNode->hasSiblings --  Returns true if this node has siblings
tidyNode->isAsp --  Returns true if this node is ASP
tidyNode->isComment --  Returns true if this node represents a comment
tidyNode->isHtml --  Returns true if this node is part of a HTML document
tidyNode->isJste --  Returns true if this node is JSTE
tidyNode->isPhp --  Returns true if this node is PHP
tidyNode->isText --  Returns true if this node represents text (no markup)