- # sax js
 
- A sax-style parser for XML and HTML.
 
- Designed with [node](http://nodejs.org/) in mind, but should work fine in
 
- the browser or other CommonJS implementations.
 
- ## What This Is
 
- * A very simple tool to parse through an XML string.
 
- * A stepping stone to a streaming HTML parser.
 
- * A handy way to deal with RSS and other mostly-ok-but-kinda-broken XML
 
-   docs.
 
- ## What This Is (probably) Not
 
- * An HTML Parser - That's a fine goal, but this isn't it.  It's just
 
-   XML.
 
- * A DOM Builder - You can use it to build an object model out of XML,
 
-   but it doesn't do that out of the box.
 
- * XSLT - No DOM = no querying.
 
- * 100% Compliant with (some other SAX implementation) - Most SAX
 
-   implementations are in Java and do a lot more than this does.
 
- * An XML Validator - It does a little validation when in strict mode, but
 
-   not much.
 
- * A Schema-Aware XSD Thing - Schemas are an exercise in fetishistic
 
-   masochism.
 
- * A DTD-aware Thing - Fetching DTDs is a much bigger job.
 
- ## Regarding `<!DOCTYPE`s and `<!ENTITY`s
 
- The parser will handle the basic XML entities in text nodes and attribute
 
- values: `&amp; &lt; &gt; &apos; &quot;`. It's possible to define additional
 
- entities in XML by putting them in the DTD. This parser doesn't do anything
 
- with that. If you want to listen to the `ondoctype` event, and then fetch
 
- the doctypes, and read the entities and add them to `parser.ENTITIES`, then
 
- be my guest.
 
- Unknown entities will fail in strict mode, and in loose mode, will pass
 
- through unmolested.
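As a rough illustration of the strict/loose difference described above, the sketch below decodes the five predefined entities and either throws on an unknown one (strict) or leaves it alone (loose). This is not sax-js's implementation: `decodeEntities`, `BASIC_ENTITIES`, and the `extra` parameter are hypothetical names, and numeric character references are left out for brevity.

```javascript
// Hypothetical helper, not part of sax-js: it only mirrors the documented behavior.
var BASIC_ENTITIES = { amp: "&", lt: "<", gt: ">", apos: "'", quot: '"' };

function decodeEntities(text, strict, extra) {
  // `extra` stands in for entities you might collect from a DTD yourself.
  var table = Object.assign({}, BASIC_ENTITIES, extra || {});
  return text.replace(/&([A-Za-z_][\w.-]*);/g, function (match, name) {
    if (Object.prototype.hasOwnProperty.call(table, name)) return table[name];
    if (strict) throw new Error("Invalid character entity: " + match);
    return match; // loose mode: leave the unknown entity untouched
  });
}

console.log(decodeEntities("a &lt; b &amp;&amp; c", true));
console.log(decodeEntities("&copy; 2010", false));
console.log(decodeEntities("&copy; 2010", true, { copy: "\u00a9" }));
```

The third call shows why adding your own entries matters in strict mode: without the extra `copy` entry, the same input would throw.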
 
- ## Usage
 
- ```javascript
 
- var sax = require("./lib/sax"),
 
-   strict = true, // set to false for html-mode
 
-   parser = sax.parser(strict);
 
- parser.onerror = function (e) {
 
-   // an error happened.
 
- };
 
- parser.ontext = function (t) {
 
-   // got some text.  t is the string of text.
 
- };
 
- parser.onopentag = function (node) {
 
-   // opened a tag.  node has "name" and "attributes"
 
- };
 
- parser.onattribute = function (attr) {
 
-   // an attribute.  attr has "name" and "value"
 
- };
 
- parser.onend = function () {
 
-   // parser stream is done, and ready to have more stuff written to it.
 
- };
 
- parser.write('<xml>Hello, <who name="world">world</who>!</xml>').close();
 
- // stream usage
 
- // takes the same options as the parser
 
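- var fs = require("fs"), // needed for the pipe example below
-   options = { trim: true }, // example settings; see "Arguments"
-   saxStream = require("sax").createStream(strict, options)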
 
- saxStream.on("error", function (e) {
 
-   // unhandled errors will throw, since this is a proper node
 
-   // event emitter.
 
-   console.error("error!", e)
 
-   // clear the error
 
-   this._parser.error = null
 
-   this._parser.resume()
 
- })
 
- saxStream.on("opentag", function (node) {
 
-   // same object as above
 
- })
 
- // pipe is supported, and it's readable/writable
 
- // same chunks coming in also go out.
 
- fs.createReadStream("file.xml")
 
-   .pipe(saxStream)
 
-   .pipe(fs.createWriteStream("file-copy.xml"))
 
- ```
 
- ## Arguments
 
- Pass the following arguments to the parser function.  All are optional.
 
- `strict` - Boolean. Whether or not to be a jerk. Default: `false`.
 
- `opt` - Object bag of settings regarding string formatting.  All default to `false`.
 
- Settings supported:
 
- * `trim` - Boolean. Whether or not to trim text and comment nodes.
 
- * `normalize` - Boolean. If true, then turn any whitespace into a single
 
-   space.
 
- * `lowercase` - Boolean. If true, then lowercase tag names and attribute names
 
-   in loose mode, rather than uppercasing them.
 
- * `xmlns` - Boolean. If true, then namespaces are supported.
 
- * `position` - Boolean. If false, then don't track line/col/position.
 
- * `strictEntities` - Boolean. If true, only parse [predefined XML
 
-   entities](http://www.w3.org/TR/REC-xml/#sec-predefined-ent)
 
-   (`&amp;`, `&apos;`, `&gt;`, `&lt;`, and `&quot;`)
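To make the string-formatting settings concrete, here is a minimal sketch of what `trim`, `normalize`, and `lowercase` do to a value, as described in the list above. The helper names `applyTextOptions` and `looseName` are hypothetical and are not part of the sax-js API; the parser applies these settings internally.

```javascript
// Hypothetical helpers, not part of the sax-js API.
function applyTextOptions(opt, text) {
  if (opt.trim) text = text.trim(); // drop leading/trailing whitespace
  if (opt.normalize) text = text.replace(/\s+/g, " "); // collapse whitespace runs
  return text;
}

// Loose-mode name casing: uppercase by default, lowercase if requested.
function looseName(opt, name) {
  return opt.lowercase ? name.toLowerCase() : name.toUpperCase();
}

var raw = "  Hello,\n   world  ";
console.log(JSON.stringify(applyTextOptions({}, raw)));
console.log(JSON.stringify(applyTextOptions({ trim: true, normalize: true }, raw)));
console.log(looseName({}, "Item"), looseName({ lowercase: true }, "Item"));
```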
 
- ## Methods
 
- `write` - Write bytes onto the stream. You don't have to do this all at
 
- once. You can keep writing as much as you want.
 
- `close` - Close the stream. Once closed, no more data may be written until
 
- it is done processing the buffer, which is signaled by the `end` event.
 
- `resume` - To gracefully handle errors, assign a listener to the `error`
 
- event. Then, when the error is taken care of, you can call `resume` to
 
- continue parsing. Otherwise, the parser will not continue while in an error
 
- state.
 
- ## Members
 
- At all times, the parser object will have the following members:
 
- `line`, `column`, `position` - Indications of the position in the XML
 
- document where the parser currently is looking.
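For intuition about how these three counters relate, here is a small sketch that walks a string and updates them character by character. It assumes zero-based `line` and `column` values and a `position` that counts characters consumed so far; `trackPosition` is a hypothetical helper, not a sax-js function.

```javascript
// Hypothetical helper: assumes zero-based line/column counters.
function trackPosition(text) {
  var state = { line: 0, column: 0, position: 0 };
  for (var i = 0; i < text.length; i++) {
    state.position++;
    if (text.charAt(i) === "\n") {
      state.line++; // a newline starts a new line...
      state.column = 0; // ...and resets the column
    } else {
      state.column++;
    }
  }
  return state;
}

console.log(trackPosition("<a>\n  <b/>"));
```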
 
- `startTagPosition` - Indicates the position where the current tag starts.
 
- `closed` - Boolean indicating whether or not the parser can be written to.
 
- If it's `true`, then wait for the `ready` event to write again.
 
- `strict` - Boolean indicating whether or not the parser is a jerk.
 
- `opt` - Any options passed into the constructor.
 
- `tag` - The current tag being dealt with.
 
- And a bunch of other stuff that you probably shouldn't touch.
 
- ## Events
 
- All events emit with a single argument. To listen to an event, assign a
 
- function to `on<eventname>`. Functions get executed in the this-context of
 
- the parser object. The list of supported events is also in the exported
 
- `EVENTS` array.
 
- When using the stream interface, assign handlers using the EventEmitter
 
- `on` function in the normal fashion.
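The `on<eventname>` convention can be pictured with a tiny dispatcher. This is only a sketch of the pattern described above (single argument, parser as `this`); `fakeParser` and `emit` are hypothetical stand-ins, not the library's internals.

```javascript
// Hypothetical stand-in for a parser object; only the dispatch pattern matters.
var fakeParser = { line: 3, seen: [] };

function emit(parser, eventName, data) {
  var handler = parser["on" + eventName];
  // Call the handler with the parser as `this` and exactly one argument.
  if (typeof handler === "function") handler.call(parser, data);
}

fakeParser.ontext = function (t) {
  // `this` is the parser, so members like `line` are reachable here.
  this.seen.push("line " + this.line + ": " + t);
};

emit(fakeParser, "text", "hello");
emit(fakeParser, "comment", "ignored"); // no oncomment handler, so nothing happens
console.log(fakeParser.seen);
```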
 
- `error` - Indication that something bad happened. The error will be hanging
 
- out on `parser.error`, and must be deleted before parsing can continue. By
 
- listening to this event, you can keep an eye on that kind of stuff. Note:
 
- this happens *much* more in strict mode. Argument: instance of `Error`.
 
- `text` - Text node. Argument: string of text.
 
- `doctype` - The `<!DOCTYPE` declaration. Argument: doctype string.
 
- `processinginstruction` - Stuff like `<?xml foo="blerg" ?>`. Argument:
 
- object with `name` and `body` members. Attributes are not parsed, as
 
- processing instructions have implementation dependent semantics.
 
- `sgmldeclaration` - Random SGML declarations. Stuff like `<!ENTITY p>`
 
- would trigger this kind of event. This is a weird thing to support, so it
 
- might go away at some point. SAX isn't intended to be used to parse SGML,
 
- after all.
 
- `opentagstart` - Emitted immediately when the tag name is available,
 
- but before any attributes are encountered.  Argument: object with a
 
- `name` field and an empty `attributes` set.  Note that this is the
 
- same object that will later be emitted in the `opentag` event.
 
- `opentag` - An opening tag. Argument: object with `name` and `attributes`.
 
- In non-strict mode, tag names are uppercased, unless the `lowercase`
 
- option is set.  If the `xmlns` option is set, then it will contain
 
- namespace binding information on the `ns` member, and will have a
 
- `local`, `prefix`, and `uri` member.
 
- `closetag` - A closing tag. In loose mode, tags are auto-closed if their
 
- parent closes. In strict mode, well-formedness is enforced. Note that
 
- self-closing tags will have `closetag` emitted immediately after `opentag`.
 
- Argument: tag name.
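Because open and close events arrive in document order, a plain stack is usually enough to know where you are in the tree. The sketch below feeds a hand-written event sequence to two handlers instead of running a real parser; the handler bodies, `stack`, and `paths` are illustrative assumptions. With sax-js you would assign such functions to `parser.onopentag` and `parser.onclosetag`.

```javascript
// Hypothetical handlers that track the current element path with a stack.
var stack = [];
var paths = [];

function onopentag(node) {
  stack.push(node.name);
  paths.push(stack.join("/"));
}

function onclosetag(name) {
  var open = stack.pop();
  if (open !== name) throw new Error("Unexpected close tag: " + name);
}

// Simulated event order for <a><b/><c>x</c></a>.
onopentag({ name: "a", attributes: {} });
onopentag({ name: "b", attributes: {} });
onclosetag("b"); // self-closing tag: the close follows the open immediately
onopentag({ name: "c", attributes: {} });
onclosetag("c");
onclosetag("a");

console.log(paths);
```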
 
- `attribute` - An attribute node.  Argument: object with `name` and `value`.
 
- In non-strict mode, attribute names are uppercased, unless the `lowercase`
 
- option is set.  If the `xmlns` option is set, it will also contain namespace
 
- information.
 
- `comment` - A comment node.  Argument: the string of the comment.
 
- `opencdata` - The opening tag of a `<![CDATA[` block.
 
- `cdata` - The text of a `<![CDATA[` block. Since `<![CDATA[` blocks can get
 
- quite large, this event may fire multiple times for a single block, if it
 
- is broken up into multiple `write()`s. Argument: the string of random
 
- character data.
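Since `cdata` may fire several times for one block, a handler that wants the whole block has to collect the pieces between `opencdata` and `closecdata`. A minimal sketch follows, using simulated events rather than a real parser; `handlers`, `chunks`, and `blocks` are hypothetical names.

```javascript
// Hypothetical handlers; with sax-js these would be assigned to the parser
// as onopencdata, oncdata, and onclosecdata.
var chunks = null;
var blocks = [];

var handlers = {
  onopencdata: function () { chunks = []; },
  oncdata: function (text) { chunks.push(text); },
  onclosecdata: function () {
    blocks.push(chunks.join("")); // reassemble the full block
    chunks = null;
  }
};

// Simulated events for one CDATA block that arrived across two write() calls.
handlers.onopencdata();
handlers.oncdata("if (a < b) ");
handlers.oncdata("{ run(); }");
handlers.onclosecdata();

console.log(blocks);
```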
 
- `closecdata` - The closing tag (`]]>`) of a `<![CDATA[` block.
 
- `opennamespace` - If the `xmlns` option is set, then this event will
 
- signal the start of a new namespace binding.
 
- `closenamespace` - If the `xmlns` option is set, then this event will
 
- signal the end of a namespace binding.
 
- `end` - Indication that the closed stream has ended.
 
- `ready` - Indication that the stream has reset, and is ready to be written
 
- to.
 
- `noscript` - In non-strict mode, `<script>` tags trigger a `"script"`
 
- event, and their contents are not checked for special xml characters.
 
- If you pass `noscript: true` in the options, then this behavior is suppressed.
 
- ## Reporting Problems
 
- It's best to write a failing test if you find an issue.  I will always
 
- accept pull requests with failing tests if they demonstrate intended
 
- behavior, but it is very hard to figure out what issue you're describing
 
- without a test.  Writing a test is also the best way for you yourself
 
- to figure out if you really understand the issue you think you have with
 
- sax-js.
 
 