So `#` starts a line comment but `#t` is a boolean. Yeah, that's never gonna hurt anyone.
gotta have some examples on the landing before the fold, otherwise i have no reason to explore
https://preserves.dev/TUTORIAL.html
Not a fan of the annotation syntax (which can be used as comment syntax): having # followed by a space character behave differently feels strange.
Tutorial link too hard to click? Seriously: these comments inevitably come up for most language related pages, and I don’t see how they don’t fall under the site guidelines of “Please don't post shallow dismissals”.
The comment specifically said "on the landing before the fold", so yours is the shallow dismissal. I agree with the comment; lead with the things we shall judge ye by.
I made a similar comment. I think they come up because people are genuinely interested in a project and trying to offer the creator a fresh perspective. When creating a project where you're the domain expert it's so easy to get stuck in your own head, and then you start explaining the project to a newcomer by diving into deep details when they don't understand the starting point.
Until a project has a lot of traction (think docker, react, django not uv or jq) it's very safe to assume that every visitor to your page doesn't understand the background.
> This is a good time to mention that even though from a semantic perspective sets and dictionaries do not carry information about the ordering of their elements.
Except they do in Python. It is extremely useful, surprisingly often.
Python remembers order, and exposes it in its iterations, but doesn't use it in its equivalence over dictionaries (== semantics).
(ETA: What are you quoting there? I don't think that text appears on the Preserves site) (ETA2: Ah, it's the tutorial. Cool)
Not sure about the previous post, but also python’s OrderedDict collection guarantees order-sensitive equality checks. [1]
Plain dict maintains insertion order but equality checks only check that the key/value pairs are the same. [2] [3]
[1] https://docs.python.org/3/library/collections.html#:~:text=e...
[2] https://docs.python.org/3/library/stdtypes.html#:~:text=dict...
[3] https://docs.python.org/3/library/stdtypes.html#:~:text=dict...
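A minimal Python sketch of the distinction described above, using only the standard library. The variable names are just for illustration.

```python
from collections import OrderedDict

a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}

# Plain dicts remember insertion order and expose it when iterated...
order_a = list(a)
order_b = list(b)

# ...but == on plain dicts only compares the key/value pairs.
plain_equal = (a == b)

# Equality between two OrderedDicts also takes order into account.
ordered_equal = (OrderedDict(a) == OrderedDict(b))

print(order_a, order_b)  # ['x', 'y'] ['y', 'x']
print(plain_equal)       # True
print(ordered_equal)     # False
```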
Yea. Just pointing out that the order can be significant, so throwing it away isn't generally a good idea.
Is it true that Records are the same as Dictionaries, because the labels in the records can have any value?
Interesting how, on one hand, the size of SignedInteger is unlimited, but on the other hand there is a ByteString. A ByteString could also have been represented as a sequence of SignedInteger. I also wonder if it would not be better to have a Unicode character as an atomic unit and represent a string as a sequence of Unicode characters.
This makes me wonder whether this is a high-level data model or yet another data representation.
No, a record is a tagged (sequence of) value(s).
If you put a single dictionary-valued "field" in a record, you get a variation with named fields.
Records have positional "fields" because of the Scheme heritage of the design.
--
Re bytestring -- yes there are some concessions to real machines/languages in there that aren't absolutely required. Other examples include booleans and strings, which could have been <true> and <false> and <string [65 66 67]> etc respectively.
There's a little more on this topic in footnote 2 on the "conventions" page: https://preserves.dev/conventions.html#fn:why-dictionaries
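To make the record/dictionary distinction above concrete, here is a rough Python model of "a tagged sequence of values". The `Record` class, the labels and the field values are all made up for illustration; this is not how any Preserves library represents records.

```python
from dataclasses import dataclass
from typing import Any, Tuple

@dataclass(frozen=True)
class Record:
    """Hypothetical model: a label (any value) plus positional fields."""
    label: Any
    fields: Tuple[Any, ...]

# Positional fields: meaning comes from position, not from a key.
date = Record("Date", (1821, 2, 3))

# Named-field variation: a single dictionary-valued field.
person = Record("person", ({"name": "Alice", "age": 30},))

year = date.fields[0]
name = person.fields[0]["name"]

# A record is not a dictionary: the label tags the whole value.
same_fields_other_label = Record("Deadline", (1821, 2, 3))
labels_matter = (date != same_fields_other_label)
```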
Looks like a lot of excellent work!
Are there any good examples of nontrivial schemas etc?
Thanks. Yes there are: see sections 15, 16 and 17 of https://synit.org/book/, where Preserves, Preserves Schemas, and the Syndicated Actor Model make up a reactive replacement system layer for Linux (essentially an alternative to systemd).
Does Preserves have a page with a comparison to other common serialization languages?
As someone familiar with Protobuf, comparing Preserves vs Protobuf text format, here's my quick comparison between the two after reading through the tutorial:
- Preserves' Symbol is very close to Protobuf enums. But Symbol can contain characters like dash
- There doesn't seem to be an equivalent of Preserves' Record in Protobuf, but the tutorial's example of using <Unknown ...> to denote a missing <Date ...> can be simulated using the `oneof` field in Protobuf.
- Having to write #t/#f in Preserves is unfortunate. I guess this is the result of it being a schemaless serialization language, with potential parsing ambiguity with a Symbol?
- Protobuf has a way to annotate the schema and reuse the annotations at runtime, very similar to Preserves' annotations.
Protobufs are designed to support schema evolution without explicit versioning. (All fields are optional so they can be added or dropped, provided field numbers aren’t reused.)
It looks like Preserves just uses version numbers in its schemas. On the other hand, you can read the data without a schema, similar to JSON.
The version number is the schema language version, not the version of the collection of types described in the file.
The schema language is extensible/evolvable in that pattern matching ignores extra entries in a sequence and extra key/value pairs in a dictionary. So you could have a "version 1" of a schema with
and a "version 2" with
Then, Person.v2 from "version 2" would be parseable by Person from "version 1", and Person from "version 1" would parse using "version 2" as a Person.v1.
The schema language is in production but the design is still a work in progress and I expect more changes before a 1.0 release of the schema language.
(The schema language is completely separate from the preserves data model, by the way -- one could imagine other schema languages being used instead/as well)
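The schema snippets are missing from the comment above, so here is only a rough sketch of the idea in plain Python. It is not Preserves Schema syntax, and the field names (`name`, `email`) and function names are invented for illustration. It shows a dictionary pattern that ignores extra key/value pairs, and a "version 2" reader that falls back to the older shape.

```python
# Hypothetical field names; this is plain Python, not Preserves Schema syntax.
PERSON_V1_KEYS = ("name",)
PERSON_V2_KEYS = ("name", "email")

def match_dict(required_keys, value):
    """Match a dictionary pattern: required keys must be present, extras are ignored."""
    if not isinstance(value, dict):
        return None
    if any(key not in value for key in required_keys):
        return None
    return {key: value[key] for key in required_keys}

def parse_person_v1(value):
    """A "version 1" reader that only knows the v1 shape."""
    return match_dict(PERSON_V1_KEYS, value)

def parse_person_v2(value):
    """A "version 2" reader: try the newer shape first, then fall back to the older one."""
    matched = match_dict(PERSON_V2_KEYS, value)
    if matched is not None:
        return ("v2", matched)
    matched = match_dict(PERSON_V1_KEYS, value)
    if matched is not None:
        return ("v1", matched)
    return None

new_data = {"name": "Alice", "email": "alice@example.org"}
old_data = {"name": "Bob"}

# The old reader accepts new data by ignoring the extra "email" entry.
old_reader_on_new_data = parse_person_v1(new_data)
# The new reader accepts old data by recognising it as the v1 variant.
new_reader_on_old_data = parse_person_v2(old_data)
```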
Might be an ignorant question but why not just use XML? It seems like XML could do all this, from my limited reading?
XML would make a fine choice. It lacks atomic data types other than text, and compound data types other than sequences, unless you count element attributes, which are in a kind of awkward position because of the historical development of the language. Preserves has a richer suite of primitive data types and decomposes XML's elements into separate notions of map, sequence, and tagged value.
XML would require a schema to express the concepts in Preserves, or JSON for that matter.
The reason JSON is lower friction than XML for data representation is that you get basic data representations (numbers, strings, arrays, maps) for free in a natural native syntax that happens to parallel multiple programming languages.
XML, in contrast, is a meta-language that allows schema to express different data representations. You've got to use attributes and elements to represent data and data types. XSD is a common datatype schema, but it's quite verbose, and data serialization looks very different from what it looks like in a programming language representation.
Preserves looks like a superset of JSON. It includes additional data representation concepts through syntax extensions, but the idea is the same.
What I don't see is a standard way to map record types (like "irl" in the tutorial) to a unique identifier like a URI/IRI, or something like a CURIE. That kind of feature would allow Preserves to better describe standardized record types.
Here is the ABNF:
https://preserves.dev/preserves-text.html
Or, in "quick reference card" form: https://preserves.dev/cheatsheet.html
The syntax isn't the most interesting part though; the thing that distinguishes it from most other data languages out there is that it has semantics (= a rigorous definition of when values are equal and when they aren't). So you can use Preserves semantics with JSON syntax (a subset of Preserves' text syntax) as one way of getting actually-meaningful JSON.
Plus, comments (and other annotations) ;-)
Why/when/where would I need this?
I was going to ask the exact same question. The title makes it sound like something I might be interested in, then I visited the page and I have no idea what it does.
After some brief reading of the docs, I'm trying to write one-sentence explanations. Maybe this will be helpful to you.
What
Preserves is a specification and set of libraries in popular languages that lets you reliably exchange data between XML, JSON and EDN.
Who
Preserves is built for (data engineers|data framework writers) to reliably interchange data.
Why
Formats like JSON in particular are imprecise. Preserves forces you to deal with these vagaries up front.
What else?
With P-Expressions you can search a Preserves-compliant data source much like you would query JSON with jq.
Who Not? Who shouldn't use this
This will not help a data analyst exchange data between CSV and Excel.
Thank you! That makes more sense now
I have no idea if the project author would agree with those sentences; I was just proposing them.
Useful if you have a JSON-keyed table, for example: JSON lacks a useful (standardised) equivalence relation, meaning you get weak and/or implementation-specific guarantees about how key lookup works. Equivalences were the motivation for developing Preserves: I was (and am) working on a generalized approach to messaging middleware, you might say, meaning that things like "patterns over values" and "filters" and "value-keyed tables" are all things I need to talk about. (This all comes out of RabbitMQ/AMQP thinking back in the day and my PhD-and-after work subsequently.)
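To illustrate the "JSON-keyed table" problem with Python's standard `json` module: parsed JSON objects cannot be used directly as dictionary keys, so a table needs some canonical form, and the choice of canonical form quietly decides which values count as "the same key". The `canonical_key` helper below is an ad hoc assumption for this sketch, not anything defined by JSON or by Preserves.

```python
import json

def canonical_key(value):
    """Ad hoc canonical form: sorted object keys, no insignificant whitespace."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"))

table = {}
table[canonical_key(json.loads('{"a": 1, "b": 2}'))] = "first"

# Key order in the JSON text does not matter under this canonical form...
hit = table.get(canonical_key(json.loads('{"b": 2, "a": 1}')))

# ...but 1 and 1.0 produce different keys, even though Python itself
# considers the two parsed values equal.
miss = table.get(canonical_key(json.loads('{"a": 1.0, "b": 2}')))
python_says_equal = (json.loads('{"a": 1}') == json.loads('{"a": 1.0}'))

print(hit)                # first
print(miss)               # None
print(python_says_equal)  # True
```

Another language or library could reasonably make the opposite choice on each of these points, which is the kind of implementation-specific behaviour the comment describes.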
CSTML is targeting many of the same weaknesses in JSON. It's fun to see a whole different, competing set of design choices at work. I had a very different take on schema validation and how to use the < syntax.
Do you have a link for CSTML, please? Googling is showing a bunch of possibilities none of which look quite relevant enough to be right...
https://github.com/bablr-lang/
https://bablr.org/playground
Thank you!
Looking at the headings in that TOC, “Preserves” is a bit of an unfortunate naming choice grammatically.
Yeah I struggle with "Preserve" vs "Preserves" sometimes. Was there something in particular that struck you as unfortunate, though?
I cannot find an Emacs mode. Does anybody know of one?
https://gitlab.com/preserves/preserves/-/blob/main/preserves... -- crude but effective! (I use it all the time)
So EDN?
Yeah EDN is quite similar. Preserves has no nil, allows any value as a tag, gets into the weeds more on when strings are equal or not, doesn't distinguish lists and vectors, and doesn't require each kind of tagged element to define an equivalence. And it has annotations (vs EDN's comments) and embedded values.
It seems like it doesn't natively support decimals (needed for financial data) or any sort of date or datetime?
Date and datetime are left to convention: https://preserves.dev/conventions.html#dates-and-times
Decimals I'm on the fence about. Some discussion here: https://gitlab.com/preserves/preserves/-/issues/10