
New principle: Discourage polyglot formats #239

@hober

This is a generalization of @dbaron's concern in #128.

Polyglot formats tend to lead to interoperability problems, so we should discourage defining them.

Polyglot formats are formats defined so that a single document can be processed by two or more different kinds of processors with roughly equivalent results. For instance, it's possible to write a computer program that is simultaneously valid C and valid C++. Another example is Polyglot Markup, an abandoned attempt to define a markup syntax that was simultaneously valid HTML and XHTML, whose documents would produce roughly equivalent DOM trees whether parsed with an HTML parser or an XML parser.

Authors tend to test their documents with only one kind of processor, so they inadvertently introduce errors that only the other kind of processor would catch. In the case of Polyglot Markup, this happened when authors introduced XML errors into their documents but tested only with an HTML parser. Consumers using an XML parser would then see an XML parser error screen instead of the document.
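A minimal sketch of this failure mode, assuming Python's standard-library `html.parser` as a stand-in for an HTML parser (it is lenient, but it does not implement the full HTML parsing algorithm that browsers use) and `xml.etree.ElementTree` as the XML parser. The helpers `html_tags` and `xml_tags` and the sample snippets are hypothetical. The well-formed snippet yields the same element names under both parsers; dropping the slash from `<br/>` goes unnoticed by an HTML-only test but is a fatal well-formedness error for the XML consumer.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Records start-tag names in document order.

    HTMLParser's default handle_startendtag() forwards "<br/>"-style tags
    to handle_starttag(), so self-closing tags are recorded here too.
    """

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags(doc):
    """Hypothetical helper: element names as seen by the lenient HTML-side parser."""
    collector = TagCollector()
    collector.feed(doc)
    collector.close()
    return collector.tags


def xml_tags(doc):
    """Hypothetical helper: element names as seen by a strict XML parser.

    Raises ET.ParseError if the document is not well-formed XML.
    """
    root = ET.fromstring(doc)
    return [el.tag for el in root.iter()]  # pre-order, i.e. document order


polyglot = "<p>Line one<br/>Line two</p>"
broken = "<p>Line one<br>Line two</p>"  # slash dropped; only ever tested as HTML

print(html_tags(polyglot), xml_tags(polyglot))  # ['p', 'br'] ['p', 'br']
print(html_tags(broken))                        # ['p', 'br']: the HTML-side test passes
try:
    xml_tags(broken)
except ET.ParseError as err:
    print("XML consumer gets a parse error:", err)
```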

If the polyglot format contains fields that only one kind of processor uses, those fields are likely to suffer bit rot when authors routinely test their documents with only the other kind of processor. (For instance, if authors routinely use plain JSON parsers to test their JSON-LD, the @context section is likely to bit-rot. Downstream consumers of the document who use a JSON-LD processor will start encountering bugs. They'll report them upstream, and the document's author will say "it works for me, it must be a bug in your software".)
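A minimal sketch of that drift, under loud assumptions: `toy_expand` is a hypothetical, greatly simplified stand-in for JSON-LD expansion (it handles only a flat `@context` mapping terms to IRI strings, and returns a flat dict rather than real expanded JSON-LD), and the schema.org IRI and sample document are illustrative. It does mirror one real rule of JSON-LD expansion: with no `@vocab`, a key that has no term definition is silently dropped. Here an author added `price` and checked the result with a plain JSON parser, which sees it; nobody updated `@context`, so the JSON-LD-side view loses it.

```python
import json

# Illustrative document: "price" was added later, but @context was never updated.
DOC = """
{
  "@context": {"name": "http://schema.org/name"},
  "name": "Example Widget",
  "price": "9.99"
}
"""


def plain_json_view(text):
    """What a JSON-only consumer sees: every non-keyword key; @context is ignored."""
    data = json.loads(text)
    return {k: v for k, v in data.items() if not k.startswith("@")}


def toy_expand(text):
    """Hypothetical, greatly simplified stand-in for JSON-LD expansion.

    Handles only a flat @context of term -> IRI strings. As in real
    expansion without @vocab, keys with no term definition are dropped.
    """
    data = json.loads(text)
    context = data.get("@context", {})
    expanded = {}
    for key, value in data.items():
        if key.startswith("@"):
            continue  # skip keywords such as @context itself
        iri = context.get(key)
        if iri is not None:
            expanded[iri] = value
    return expanded


print(plain_json_view(DOC))  # {'name': 'Example Widget', 'price': '9.99'}
print(toy_expand(DOC))       # {'http://schema.org/name': 'Example Widget'}
```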
