Format and indent XML
Why format XML?
XML formatting (also called XML pretty printing or XML indentation) consists of reorganising a compact or minified XML document into an indented, line-by-line structure. Well-formatted XML improves human readability, which is essential when reading a SOAP response, browsing an RSS feed or inspecting the content of a configuration file.
Concretely, XML is formatted for four main reasons:
- Debugging: quickly spot an unclosed tag, a missing attribute or an inconsistent structure.
- Manual integration: cleanly copy a fragment into a configuration file without breaking existing indentation.
- Code review: comparing two versions of an XML document in a Git diff becomes readable when each tag is on its own line.
- Documentation and examples: formatted XML is much easier to follow in technical docs or a README.
Typical use cases
XML remains pervasive in the software ecosystem. Here are the formats developers most often run through an XML formatter:
- SOAP responses: <soap:Envelope> envelopes returned by SOAP web services usually arrive minified.
- RSS and Atom feeds: to inspect or audit a syndication feed.
- Spring, Maven (pom.xml), Ant, Ivy configurations: Java build and dependency files.
- AndroidManifest.xml and other XML resources of an Android project.
- SVG files: a vector graphic exported from a design tool is often compressed onto a single line.
- E-commerce exports: Google Shopping product feeds, Magento, Shopify or PrestaShop exports.
- OOXML documents: .docx, .xlsx and .pptx files are ZIP archives containing XML files that often need to be inspected.
- XML sitemaps: the sitemap.xml served to search engines.
How XML formatting works
An XML formatter does not simply add whitespace at random. It performs two steps:
- Parsing: the document is read then turned into a DOM tree (Document Object Model). This step also validates that the XML is well-formed (correctly nested tags, attributes in quotes, a single root element).
- Indented serialisation: the DOM tree is re-emitted as text with line breaks between elements and indentation proportional to the depth in the tree.
This tree-based approach guarantees that the document's logical structure remains strictly identical: only non-significant whitespace between tags is modified. CDATA sections, comments and processing instructions are preserved. More advanced tools such as XSLT transformations and XPath queries operate on a comparable tree representation of the document.
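The two steps described above can be sketched with Python's standard library. This is only an illustration under stated assumptions, not the implementation behind this tool: the function name pretty_print is hypothetical, and xml.dom.minidom is assumed as the DOM parser.

```python
from xml.dom import minidom


def pretty_print(xml_text: str, indent: str = "  ") -> str:
    """Parse xml_text into a DOM tree, then re-serialise it with indentation.

    Hypothetical helper for illustration only.
    """
    # Step 1: parsing. minidom raises xml.parsers.expat.ExpatError
    # when the input is not well-formed.
    dom = minidom.parseString(xml_text)
    # Step 2: indented serialisation, with indentation proportional to depth.
    return dom.toprettyxml(indent=indent)


compact = '<root><item id="1">val</item><item id="2">val2</item></root>'
print(pretty_print(compact))
```

The sketch targets compact input. When the input is already indented, minidom keeps the existing whitespace text nodes and toprettyxml adds its own around them, so the output gains blank lines; a real formatter would strip those nodes first.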
How to use the XML formatter
The procedure is intentionally simple:
- Paste your XML document into the text area, or upload an .xml file.
- Submit the form: the document is parsed and indented automatically.
- The formatted result appears in the output area. In case of a parsing error, a message indicates the nature and approximate position of the problem.
- Copy the result with the dedicated button, or download it to embed directly in your project.
XML indentation best practices
A few conventions help produce clean, maintainable XML:
- Indentation: 2 or 4 spaces depending on your team's convention. Stay consistent across the whole project.
- Line breaks: one element per line, which makes Git diffs workable.
- CDATA sections: a good formatter fully preserves <![CDATA[ ... ]]> blocks without reindenting their content, which is treated as raw text.
- Encoding: always declare the encoding on the first line with <?xml version="1.0" encoding="UTF-8"?>. UTF-8 is the de facto standard.
- Attributes: if an element has many attributes, some teams put them on separate lines for readability; otherwise they stay on the same line as the opening tag.
- Comments: <!-- ... --> should be kept as is by the formatter.
Before / after example
Here is a concrete example of a compact XML document reformatted by the formatter.
Before (single line, unreadable):
<root><item id="1">val</item><item id="2">val2</item></root>
After (formatted with 2 spaces):
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <item id="1">val</item>
  <item id="2">val2</item>
</root>
The logical structure is strictly identical: only the visual layout changes.
Full example
A larger XML document after formatting looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <user id="12345">
    <name>John Doe</name>
    <email>johndoe@example.com</email>
    <address>
      <street>123 Main St</street>
      <city>Springfield</city>
      <state>IL</state>
      <postalCode>62704</postalCode>
      <country>USA</country>
    </address>
    <phoneNumbers>
      <phoneNumber type="home">555-1234</phoneNumber>
      <phoneNumber type="work">555-5678</phoneNumber>
    </phoneNumbers>
    <preferences>
      <newsletter>true</newsletter>
      <theme>dark</theme>
    </preferences>
    <lastLogin>2024-06-14T09:30:00Z</lastLogin>
  </user>
</root>
Frequently asked questions
What is well-formed XML?
An XML document is well-formed when it follows the basic syntactic rules: a single root element, correctly nested and closed tags, attributes in quotes, escaped special characters (&, <, >). This is a prerequisite for parsing the document. It should not be confused with valid XML, which additionally complies with an XSD schema or a DTD.
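A well-formedness check can be sketched by simply attempting to parse the document. The example below is an illustration, not this tool's code: the function name find_syntax_error is hypothetical, and Python's xml.etree.ElementTree is assumed as the parser. It reports the position of the first syntax error only and says nothing about validity against a schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def find_syntax_error(xml_text: str) -> Optional[Tuple[int, int, str]]:
    """Return (line, column, message) for the first syntax error, or None.

    Hypothetical helper: None means the document is well-formed.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        return line, column, str(err)
    return None


print(find_syntax_error("<root><item>ok</item></root>"))  # well-formed
print(find_syntax_error("<root><item></root>"))           # <item> is never closed
```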
What is the difference between pretty printing and XML minification?
Pretty printing adds line breaks and indentation for human reading. Minification does the opposite: it removes all non-significant whitespace to reduce the document's size, which is useful for network transit (SOAP responses, XML APIs). Both documents are semantically equivalent.
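Minification can be sketched as the reverse operation: parse the document, drop whitespace-only text between tags, and serialise without indentation. The function name minify is hypothetical, and the sketch assumes that whitespace-only text is never significant, which does not hold for content marked with xml:space="preserve". It also relies on xml.etree.ElementTree, whose default parser discards comments and processing instructions, so it is only suitable for documents without them.

```python
import xml.etree.ElementTree as ET


def minify(xml_text: str) -> str:
    """Remove whitespace-only text between tags (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # .text is the text right after the opening tag,
        # .tail is the text right after the closing tag.
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None
    return ET.tostring(root, encoding="unicode")


formatted = '<root>\n  <item id="1">val</item>\n  <item id="2">val2</item>\n</root>'
print(minify(formatted))
```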
Are comments and CDATA sections preserved during formatting?
Yes. A correct formatter preserves <!-- ... --> comments, <![CDATA[ ... ]]> sections as well as processing instructions (<? ... ?>). The content of a CDATA section stays intact without reindentation, since it is treated as raw text. Check the result if your comments have documentary importance.
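One way to check this on your own documents is to format a sample and confirm that the fragments you care about survive verbatim. The sketch below assumes Python's xml.dom.minidom as the formatter, not this tool, and the helper name format_and_check is hypothetical.

```python
from typing import List, Tuple
from xml.dom import minidom


def format_and_check(xml_text: str, fragments: List[str]) -> Tuple[str, List[str]]:
    """Format xml_text and list the fragments missing from the output.

    Hypothetical helper: an empty list means every fragment was preserved.
    """
    formatted = minidom.parseString(xml_text).toprettyxml(indent="  ")
    missing = [fragment for fragment in fragments if fragment not in formatted]
    return formatted, missing


source = (
    "<config><!-- keep me -->"
    "<script><![CDATA[if (a < b) { run(); }]]></script>"
    "<?target data?></config>"
)
formatted, missing = format_and_check(
    source,
    ["<!-- keep me -->", "<![CDATA[if (a < b) { run(); }]]>", "<?target data?>"],
)
print(formatted)
print("missing fragments:", missing)
```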
Are XML namespaces (xmlns) handled?
Yes. Namespace declarations (xmlns:soap="...") and the associated prefixes (<soap:Envelope>) are preserved as is. The formatter neither renames prefixes nor moves declarations in the tree, which guarantees the document's semantics remain identical. Default namespaces (without prefix) are also kept at their declaration level.
Should I indent with spaces or tabs?
Both work, but spaces are the dominant convention because they display identically everywhere (terminals, browsers, Git diff). Use 2 or 4 spaces depending on how deeply nested your XML is: 2 for deeply nested documents (SVG, Spring configurations), 4 for flat documents. What matters most is consistency throughout the project.
Does the formatter change the document's semantics?
No. Only whitespace between tags is adjusted. The DOM structure, attribute values and textual content of elements remain identical. One caveat: whitespace marked as significant (with xml:space="preserve") must be kept as is. Check the output if your data contains meaningful whitespace.
Can XML be validated at the same time as formatted?
The formatter already performs a syntactic validation (well-formed XML) since it has to parse the document. Validation against an XSD schema or a DTD requires a dedicated tool: structural validation is beyond the scope of a simple formatter. Tools such as xmllint --schema or Xerces are suitable for this extra step.
Example request
curl -X POST https://cdrn.fr/api/v1/tools/xml-formatter/execute \
  -H "Content-Type: application/json" \
  -d '{"input":"..."}'
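The same request can be sketched in Python with the standard library. The endpoint URL and the input field come from this page; the helper names build_request and format_remote are hypothetical, and because the response format is not described here, the sketch returns the raw response body rather than assuming a structure.

```python
import json
import urllib.request

API_URL = "https://cdrn.fr/api/v1/tools/xml-formatter/execute"


def build_request(xml_text: str) -> urllib.request.Request:
    """Build the POST request with a JSON payload (hypothetical helper)."""
    payload = json.dumps({"input": xml_text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def format_remote(xml_text: str, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text.

    The response shape is an unknown here, so nothing is decoded beyond UTF-8.
    """
    with urllib.request.urlopen(build_request(xml_text), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (performs a real network call):
# print(format_remote("<root><item>val</item></root>"))
```

Using json.dumps for the payload avoids escaping quotes by hand, which is easy to get wrong when the XML contains attributes.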
Input schema
| Field | Type | Required | Default |
|---|---|---|---|
| input | text | ✓ | – |
Endpoints
- GET https://cdrn.fr/api/v1/tools: lists every available tool
- GET https://cdrn.fr/api/v1/tools/xml-formatter: returns the schema for this tool
- POST https://cdrn.fr/api/v1/tools/xml-formatter/execute: runs this tool with a JSON payload