A Uniform Resource Identifier (URI) is a string of characters that unambiguously identifies a
particular resource. To guarantee uniformity, all URIs follow a predefined set of syntax rules,[1] but also maintain extensibility through a separately defined hierarchical naming scheme (e.g. http://).
particular resource. To guarantee uniformity, all URIs follow a predefined set of syntax rules,[1] but also maintain extensibility through a separately defined hierarchical naming scheme (e.g. http://).
URI = scheme:[//authority]path[?query][#fragment]
where the authority component divides into three subcomponents:
authority = [userinfo@]host[:port]
he URI comprises:
- A non-empty scheme component followed by a colon (:), consisting of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus (+), period (.), or hyphen (-). Although schemes are case-insensitive, the canonical form is lowercase and documents that specify schemes must do so with lowercase letters. Examples of popular schemes include http, https, ftp, mailto, file, data, and irc. URI schemes should be registered with the Internet Assigned Numbers Authority (IANA), although non-registered schemes are used in practice.[b]
- An optional authority component preceded by two slashes (//), comprising:
- An optional userinfo subcomponent that may consist of a user name and an optional password preceded by a colon (:), followed by an at symbol (@). Use of the format username:password in the userinfo subcomponent is deprecated for security reasons. Applications should not render as clear text any data after the first colon (:) found within a userinfo subcomponent unless the data after the colon is the empty string (indicating no password).
- A host subcomponent, consisting of either a registered name (including but not limited to a hostname), or an IP address. IPv4 addresses must be in dot-decimal notation, and IPv6 addresses must be enclosed in brackets ([]).[10][c]
- An optional port subcomponent preceded by a colon (:).
- A path component, consisting of a sequence of path segments separated by a slash (/). A path is always defined for a URI, though the defined path may be empty (zero length). A segment may also be empty, resulting in two consecutive slashes (//) in the path component. A path component may resemble or map exactly to a file system path, but does not always imply a relation to one. If an authority component is present, then the path component must either be empty or begin with a slash (/). If an authority component is absent, then the path cannot begin with an empty segment, that is with two slashes (//), as the following characters would be interpreted as an authority component.[12] The final segment of the path may be referred to as a 'slug'.
Query delimiter | Example |
---|---|
Ampersand (&) | key1=value1&key2=value2 |
Semicolon (;)[d][incomplete short citation] | key1=value1;key2=value2 |
- An optional query component preceded by a question mark (?), containing a query string of non-hierarchical data. Its syntax is not well defined, but by convention is most often a sequence of attribute–value pairs separated by a delimiter.
- An optional fragment component preceded by a hash (#). The fragment contains a fragment identifier providing direction to a secondary resource, such as a section heading in an article identified by the remainder of the URI. When the primary resource is an HTML document, the fragment is often an id attribute of a specific element, and web browsers will scroll this element into view.
The following figure displays example URIs and their component parts.
userinfo host port
┌──┴───┐ ┌──────┴──────┐ ┌┴┐
https://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top
└─┬─┘ └───────────┬──────────────┘└───────┬───────┘ └───────────┬─────────────┘ └┬┘
scheme authority path query fragment
ldap://[2001:db8::7]/c=GB?objectClass?one
└┬─┘ └─────┬─────┘└─┬─┘ └──────┬──────┘
scheme authority path query
mailto:John.Doe@example.com
└─┬──┘ └────┬─────────────┘
scheme path
news:comp.infosystems.www.servers.unix
└┬─┘ └─────────────┬─────────────────┘
scheme path
tel:+1-816-555-1212
└┬┘ └──────┬──────┘
scheme path
telnet://192.0.2.16:80/
└─┬──┘ └─────┬─────┘│
scheme authority path
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
└┬┘ └──────────────────────┬──────────────────────┘
scheme path
Uniform Resource Locator (URL) refers to the subset of URIs
The generic URI syntax consists of a hierarchical sequence of components referred to as the scheme, authority, path, query, and fragment.
The scheme and path components are required, though the path may be empty (no characters). When authority is present, the path mush either be empty or begin with a slash ("/") character. when the authority is not present, the path cannot begin with two slash characters.
No comments:
Post a Comment