Why does Scrivener use &nbsp; for separating sections?

Robotech_Master
Posts: 44
Joined: Sun Feb 12, 2012 3:42 pm
Platform: Windows

Tue May 03, 2016 12:18 am Post

I was in a discussion over on the Mobileread forum with Jim Chapman, developer of the Windows 10 EPUB-reading app Freda, asking him to add support so that my Scrivener-compiled EPUBs would display properly in it. (See the other thread I started in this forum.) He agreed to amend Freda to throw in a line break for the <p>&nbsp;</p> format Scrivener uses, then he added:

Regarding the 'right' way to represent vertical space in epub files: It is pretty clear that using a p element containing only &nbsp; is an ugly hack, and is the wrong thing to do, in terms of web standards (see for instance the discussion at http://stackoverflow.com/questions/1484 ... tor-or-not ). The right thing is certainly to use CSS styles to add a margin of the appropriate size. I'd hope that the implementers of Scrivener etc. will get round to fixing their program at some point, to do the right thing. I don't feel especially proud of having changed Freda to fit in with their broken interpretation of the xhtml standard ;-)


Is that right? If so, why does Scrivener do it this way? It seems that keeping to XHTML standards should be the way to go.

So why doesn't Scrivener? Should it? I'd really like to know.

Robotech_Master
Posts: 44
Joined: Sun Feb 12, 2012 3:42 pm
Platform: Windows

Tue May 03, 2016 3:02 pm Post


AmberV
Posts: 20613
Joined: Sun Jun 18, 2006 4:30 am
Platform: Mac + Linux
Location: Santiago de Compostela, Galiza

Wed May 04, 2016 12:52 am Post

We are in complete agreement that using empty paragraphs containing non-breaking spaces, or sequences of <br/>s, to affect spacing is not at all a good way to make a well-formed page. To be clear, doing this does not exactly break standards, and you won’t get an error from the W3C or an ePub validator, but it is a bit like using the Tab key to indent your paragraphs instead of proper ruler formatting. I see in your blog post you did represent that point of view as well. I think everyone who works with HTML feels it is ugly and, when in a circumstance where CSS can be used to generate spacing, will use CSS instead.

We don’t have a lot of control over the conversion from RTF to HTML, to be clear. By and large we don’t generate HTML; we make a document look right and then ask the engine to turn that into HTML/CSS that looks as close as possible to the original. What we can do is often limited to what can be done with Find and Replace All: just searching for string patterns and changing them. As you can imagine, that’s a fragile approach that must be used very carefully. I’ve added a note to see if we can search for empty paragraphs and replace them with 1em-height CSS margins, but I can’t promise anything. We would basically have to look for these stub lines, remove them, and then hack a class into the prior paragraph’s style attribute. That’s not impossible, but again, doing that with search and replace alone is risky. What if the thing above the stub paragraph is another stub paragraph, or something other than a body text paragraph? What if the stub paragraph isn’t actually meant to be a scene separator and is being generated for some other reason?
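To make the search-and-replace idea concrete, here is a minimal sketch in Python. It is not how Scrivener or its converter works; the function names and the `scene-end` class are hypothetical, and it assumes the class is added to the paragraph's `class` attribute and paired with a stylesheet rule such as `.scene-end { margin-bottom: 1em; }`. It is pattern matching, not a real HTML parser, so it carries the fragility described above. It deliberately leaves a stub alone when the thing before it is not an ordinary paragraph.

```python
import re

# One <p>...</p> element plus trailing whitespace (\b keeps <pre> from matching).
PARA = re.compile(r"<p\b([^>]*)>(.*?)</p>\s*", re.DOTALL | re.IGNORECASE)
# Body of a "stub" paragraph: nothing but a single non-breaking space.
NBSP_ONLY = re.compile(r"^\s*(?:&nbsp;|&#160;|\u00a0)\s*$")


def add_class(p_html, css_class):
    """Add css_class to the opening <p> tag; do nothing if already present."""
    tag = re.match(r"<p\b([^>]*)>", p_html, re.IGNORECASE)
    attrs = tag.group(1)
    existing = re.search(r'class="([^"]*)"', attrs)
    if existing:
        classes = existing.group(1).split()
        if css_class in classes:
            return p_html
        attrs = (attrs[:existing.start(1)]
                 + " ".join(classes + [css_class])
                 + attrs[existing.end(1):])
    else:
        attrs += f' class="{css_class}"'
    return "<p" + attrs + ">" + p_html[tag.end():]


def replace_stub_paragraphs(html, css_class="scene-end"):
    """Drop nbsp-only paragraphs and mark the paragraph before each run."""
    out = []          # output pieces, in document order
    last_para = None  # index in `out` of the preceding real paragraph, if any
    pos = 0
    for m in PARA.finditer(html):
        between = html[pos:m.start()]
        if between:
            out.append(between)
            if between.strip():
                last_para = None  # other markup (heading, image...) intervened
        if NBSP_ONLY.match(m.group(2)):
            if last_para is not None:
                # Remove the stub; consecutive stubs collapse into one margin.
                out[last_para] = add_class(out[last_para], css_class)
            else:
                out.append(m.group(0))  # nothing safe to attach to: keep it
        else:
            out.append(m.group(0))
            last_para = len(out) - 1
        pos = m.end()
    out.append(html[pos:])
    return "".join(out)


if __name__ == "__main__":
    sample = ('<p class="body">End of scene one.</p>\n'
              "<p>&nbsp;</p>\n"
              '<p class="body">Start of scene two.</p>\n')
    print(replace_stub_paragraphs(sample))
```

Even this small version has to make policy decisions (collapse repeated stubs, skip stubs after headings) that a plain Find and Replace All cannot express, which illustrates why the approach is risky.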

I’m not even sure if using padding on the ultimate paragraph of a scene is the right approach either. I would think that a section separator should be a discrete element that could be styled centrally and referred to in the DOM as a specific semantic thing, rather than one paragraph with way more padding below it than most paragraphs.

Looking to the future (somewhat long-term) we hope to introduce an optional and more advanced approach to ePub generation that would result in extremely clean HTML code, using another engine entirely. It’s still far too early to give any details or estimations on how it will work though.
.:.
Ioa Petra'ka
“Whole sight, or all the rest is desolation.” —John Fowles

Robotech_Master
Posts: 44
Joined: Sun Feb 12, 2012 3:42 pm
Platform: Windows

Wed May 04, 2016 10:57 am Post

Thanks for the response!

But what about the breaks between separate scenes? Those aren't represented by a blank paragraph that would get replaced by a non-breaking space; they're not represented by anything at all but generated by Scrivener. Apparently Scrivener throws a non-breaking space paragraph in there, too. Why not <hr /> instead?

AmberV
Posts: 20613
Joined: Sun Jun 18, 2006 4:30 am
Platform: Mac + Linux
Location: Santiago de Compostela, Galiza

Wed May 04, 2016 7:43 pm Post

That’s what I was referring to when I said we make a document look a certain way and then pass that to the HTML converter. The software inserts an empty paragraph and the HTML converter dutifully inserts what is necessary to make an empty paragraph display in most contexts. I’m not aware of any construct in RTF that would universally be considered an <hr/> in the modern sense of the element.

Have you considered using MultiMarkdown with Scrivener? A lot of what I’m saying here is owing to the limitation of being an RTF based editor and trying to generate clean HTML out of that. MMD works by ignoring all of the rich text stuff and using Scrivener more like a plain-text editor with a simple syntax based heavily on Markdown. MMD itself does not have an ePub generator, but (a) the HTML5 it produces is super clean and semantic, and (b) there is another tool called Pandoc which can take MMD files created in Scrivener and turn them into ePubs—it does a pretty good job of it, too.
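For anyone wanting to try the Pandoc route, here is a rough sketch of driving it from Python. The file names are placeholders and the helper names are hypothetical. It assumes `pandoc` is installed and on the PATH, and that the file compiled from Scrivener is MultiMarkdown, which Pandoc reads with its `markdown_mmd` input format. Treat it as a starting point, not a recommended workflow.

```python
import shutil
import subprocess


def build_pandoc_command(mmd_path, epub_path):
    """Return the argument list for converting a MultiMarkdown file to EPUB."""
    return [
        "pandoc",
        "--from", "markdown_mmd",  # parse the input as MultiMarkdown
        "--to", "epub",            # write an EPUB container
        "--output", epub_path,
        mmd_path,
    ]


def compile_epub(mmd_path, epub_path):
    """Run pandoc; raise if it is missing or exits with an error."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    subprocess.run(build_pandoc_command(mmd_path, epub_path), check=True)


if __name__ == "__main__":
    # Placeholder file names: substitute whatever Scrivener compiled.
    print(" ".join(build_pandoc_command("draft.md", "draft.epub")))
```

Keeping the command construction separate from the call that runs it makes it easy to inspect or log the exact command before anything is executed.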