TeXDown

mn
mnott
Posts: 32
Joined: Fri Nov 18, 2016 1:12 pm
Platform: Mac + Linux

Fri Nov 18, 2016 1:21 pm Post

Hello all,

I'd like to contribute something here. I've just started my PhD project using Scrivener, and have been a LaTeX user for 25 years. I've managed to make my LaTeX template live within Scrivener, but have run into a number of issues:

  • I find Scrivener's use of dynamically generated file names and RTF challenging when it comes to versioning, e.g. with Git.
  • I find Scrivener's compilation and export to be very slow, even when just exporting to plain text files. Also, Scrivener on Mac seems to crash consistently when the target file already exists.
  • MultiMarkdown does not really support what I often use with LaTeX, i.e., a variety of reference commands, citation commands, etc. Neither does Pandoc. I don't really need conversion to anything other than LaTeX, and I like the idea of simplifying the commands I use most often.

To address these issues, I've written a little script that does this:

  • Provides a number of LaTeX commands in Markdown-style syntax
  • Can work on plain LaTeX files
  • Can directly extract files from Scrivener and convert them to plain text, and is very fast at that
  • Hence can work as a pre-processor for the build scripts I already have for my LaTeX builds

I'm providing this script to the community - feel free to comment, improve and use it as you like.

Here is the link:

https://github.com/mnott/texdown

nontroppo
Posts: 749
Joined: Mon Mar 05, 2007 5:22 pm
Platform: Mac
Location: Airstrip One

Fri Nov 18, 2016 5:47 pm Post

Very cool addition to the writing format options for using Scrivener, thank you! Just to clarify, this scans the Scrivener project directly so you do not use the compile mechanism?

And out of curiosity, what are the limitations with Pandoc you are overcoming?

mn
mnott

Sat Nov 19, 2016 1:21 am Post

It parses the scrivx file directly, i.e., it does not use Scrivener's compile option - which I found to be very slow.

I've just added support for itemize, to the extent that Scrivener (or my RTF parser) supports itemized lists: in essence, when I parse RTF back to plain text, I get a maximum of two levels of items. That is fine for me and allowed me to add support for it today (in an ugly way, but it works). Of course you can always just use plain LaTeX.

W.r.t. Pandoc: Just look at my documentation. I am somewhat sure that you can have support for all of that using Pandoc's approach in terms of scripting etc. For me, I really wanted to have a fast version which I can understand and easily adapt, should I need to. I've been working with it today for the whole day, doing some literature review, and it works absolutely awesome - extremely fast, and not interfering with my workflow.

With that, I can use Scrivener for all my academic writing, as I can now just shuffle around the blocks of text that I am working on. My enclosing project just contains the related LaTeX front/back matter templates that I use. I run TeXDown on the enclosing folder, and I get out a simple LaTeX file which I can run through my LaTeX build.

I'll probably next add support for locating folders hierarchically (at the moment it takes the first match it finds, as root folder).

mn
mnott

Sun Nov 20, 2016 9:34 pm Post

BTW I've just added the option to respect Scrivener's "IncludeInCompilation" metadata field - in a recursive way, which means, if you have some content that you want to exclude, you just unselect that field at the top node.

For example, I may have a list of text nodes, each for a given article that I'm reading. Each text node is for one article, and each text node will have children like Research Question, Research Methods, Research Methods/Dependent Variables, ..., Results, and so forth, and if I don't want to have a review of a given article being reported in the compilation of my literature review, I just uncheck the root node for that given article.

Scrivener itself will leave the child nodes activated, which is not so useful in this case. So my little script will just stop processing child node branches the moment it hits one node that has been excluded.
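
The pruning described above can be sketched in a few lines. This is only an illustration, not TeXDown's actual Perl code: the XML below is a hypothetical, heavily simplified stand-in for a .scrivx binder, and the element names (BinderItem, IncludeInCompile, Children) as well as the helper included_ids are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the binder part of a .scrivx file.
SAMPLE = """
<ScrivenerProject>
  <Binder>
    <BinderItem ID="1" Type="Folder">
      <Title>Literature Review</Title>
      <MetaData><IncludeInCompile>Yes</IncludeInCompile></MetaData>
      <Children>
        <BinderItem ID="2" Type="Text">
          <Title>Article A</Title>
          <MetaData><IncludeInCompile>No</IncludeInCompile></MetaData>
          <Children>
            <BinderItem ID="3" Type="Text">
              <Title>Research Question</Title>
              <MetaData><IncludeInCompile>Yes</IncludeInCompile></MetaData>
            </BinderItem>
          </Children>
        </BinderItem>
        <BinderItem ID="4" Type="Text">
          <Title>Article B</Title>
          <MetaData><IncludeInCompile>Yes</IncludeInCompile></MetaData>
        </BinderItem>
      </Children>
    </BinderItem>
  </Binder>
</ScrivenerProject>
"""


def included_ids(item):
    """Yield IDs depth-first; stop descending once a node is excluded."""
    if item.findtext("MetaData/IncludeInCompile", default="No") != "Yes":
        return  # excluded: the whole branch below this node is skipped
    yield int(item.get("ID"))
    for child in item.findall("Children/BinderItem"):
        yield from included_ids(child)


root = ET.fromstring(SAMPLE)
ids = [i for top in root.findall("Binder/BinderItem") for i in included_ids(top)]
print(ids)  # [1, 4]
```

Node 3 is still flagged "Yes" in the sample, but it never shows up because its parent (node 2) was unchecked.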

mn
mnott

Sun Nov 20, 2016 11:44 pm Post

...and I've added the option to collate the exported project out of multiple, disjoint folders, and also to use absolute folder names. That allows you to have some more technical folders like

Code: Select all

/LaTeX/Articles/Front Matter
/LaTeX/Articles/Back Matter


You can then combine those with some content, say "Literature Review", that may live somewhere else. All of those can contain child objects, and here's how to collate the TeX file:

Code: Select all

./texdown.pl Dissertation.scriv -projects "/LaTeX/Articles/Front Matter" -projects "Literature Review" -projects "/LaTeX/Articles/Back Matter" -incompilation -listids -listsections


The last two parameters, -listids and -listsections, alternatively allow you to just print out the hierarchy of objects inside the scrivener database (-listsections) as well as their corresponding IDs (-listids) - which of course just happen to also be the names of the rtf files in Files/Docs.

Which, incidentally, gives you an interesting way to understand the mapping of Scrivener content to rtf files, for your whole database, like so:

Code: Select all

./texdown.pl Dissertation.scriv -projects / -listids -listsections
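
As a rough sketch of how absolute folder names and bare names could be resolved against the binder: the nested-tuple binder, the "Drafts" folder, and the helpers walk and resolve below are all hypothetical and only illustrate the "full path vs. first match" idea described in this thread, not how TeXDown implements it.

```python
# Hypothetical binder as nested (title, children) tuples; not Scrivener's format.
BINDER = ("", [
    ("LaTeX", [
        ("Articles", [
            ("Front Matter", []),
            ("Back Matter", []),
        ]),
    ]),
    ("Drafts", [
        ("Literature Review", [("Zahra 2015", [])]),
    ]),
])


def walk(node, prefix=""):
    """Yield (path, node) for every node below the given one, depth-first."""
    _title, children = node
    for child in children:
        path = f"{prefix}/{child[0]}"
        yield path, child
        yield from walk(child, path)


def resolve(spec):
    """Absolute specs must match the full path; bare names take the first hit."""
    for path, _node in walk(BINDER):
        if spec.startswith("/"):
            if path == spec:
                return path
        elif path.rsplit("/", 1)[-1] == spec:
            return path
    return None


specs = ["/LaTeX/Articles/Front Matter", "Literature Review", "/LaTeX/Articles/Back Matter"]
resolved = [resolve(s) for s in specs]
print(resolved)
```

The resolved paths come back in the order the specs were given, which is what allows front matter, content, and back matter to be collated in sequence.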

mn
mnott

Mon Nov 21, 2016 3:38 pm Post

Little further update: I've added support for configuration files. That means you can now have a very complex structure in Scrivener, pulling assets from all over the place, and still compile them easily. For example, let's say you have three locations from which you need to take content, in order, parse it, and throw it at LaTeX:

Code: Select all

/LaTeX/Article/Frontmatter
"Research Design"
/LaTeX/Article/Backmatter


Normally you'd have to specify all of that on the command line. But we can put it in a configuration file Dissertation.cfg, which lives in the same directory as Dissertation.scriv (all those locations can of course be further specified on the command line; let's take the easiest case):

Code: Select all

[rd]
; Research Design
p=/LaTeX/Article/Frontmatter, "Research Design", /LaTeX/Article/Backmatter


Rather than doing this:

Code: Select all

./texdown.pl Dissertation -c -p /LaTeX/Article/Frontmatter "Research Design" /LaTeX/Article/Backmatter


You can do this:

Code: Select all

./texdown.pl Dissertation -c -p rd


Essentially, this creates an indirection of the specification of the projects. As a motivation, here's the output of a sample call (using the -l parameter to list, rather than process, the assets):

Code: Select all

./texdown.pl Dissertation -c -l -p rd
[     287] /Frontmatter
[     279] /Research Design
[     289] /Research Design/Title Page
[     282] /Research Design/The Research Problem
[     283] /Research Design/The Research Question
[     290] /Research Design/The Research Strategy
[     291] /Research Design/The Research Assumptions
[     292] /Research Design/The Research Assumptions/Ontological
[     293] /Research Design/The Research Assumptions/Epistemological
[     294] /Research Design/The Research Paradigms
[     295] /Research Design/The Relevant Concepts and Theory
[     296] /Research Design/The Hypotheses
[     297] /Research Design/The Data
[     298] /Research Design/The Data/Sources
[     299] /Research Design/The Data/Types
[     300] /Research Design/The Data/Forms
[     301] /Research Design/The Data/Methods of Selection
[     302] /Research Design/The Data/Methods of Data Collection
[     303] /Research Design/Potential Problems and Limitations
[     288] /Backmatter


The numbers at the beginning of the lines are the internal IDs of Scrivener, and, by coincidence, also the file names of the rtf files as per the Files/Docs directory.

If not using "-l", all of those assets are going to be parsed and appended into one output, in their order given. This allows you to have, as shown in the above example, a "typical" LaTeX Frontmatter and Backmatter in some location, and then just pull in the actual content from whatever stuff you are writing at the moment.
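
The indirection through the configuration file can be illustrated roughly as follows. This is a sketch under assumptions, not TeXDown's own code: it reads the [rd] section shown earlier with Python's standard configparser and csv modules, and the helper expand_projects is a made-up name.

```python
import configparser
import csv

# The configuration from the post above, inlined so the sketch is self-contained.
CFG_TEXT = """
[rd]
; Research Design
p=/LaTeX/Article/Frontmatter, "Research Design", /LaTeX/Article/Backmatter
"""


def expand_projects(args, cfg):
    """Replace any argument naming a config section by that section's p= list."""
    out = []
    for arg in args:
        if cfg.has_section(arg) and cfg.has_option(arg, "p"):
            # csv copes with the quoted, comma-separated entries
            row = next(csv.reader([cfg.get(arg, "p")], skipinitialspace=True))
            out.extend(row)
        else:
            out.append(arg)  # not an alias: keep the argument as given
    return out


cfg = configparser.ConfigParser()
cfg.read_string(CFG_TEXT)
projects = expand_projects(["rd"], cfg)
print(projects)
```

Arguments that do not name a section pass through unchanged, so aliases and literal folder names can be mixed on one command line in this sketch.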

Oh yes, and other than that, I've simplified the command line parameters a lot.

mn
mnott

Tue Nov 22, 2016 7:25 pm Post

Out of the need to find something across all my files in my Scrivener library, while not wanting to use its search function but rather do it via the command line, I've added that to TeXDown.

Let's assume you have used the command \parta somewhere in your files.

Let's define a shell function that will help us find things easily anyway. Just enter this on the command line:

Code: Select all

ff () { find . -type f -iname "*$1" -print0 | xargs -0 grep -i "$2" ; }


You can of course, if you like, put it into your

Code: Select all

~/.profile


Let's use it from inside the directory where our Dissertation.scriv file lives:

Code: Select all

ff rtf parta
./Dissertation.scriv/Files/Docs/216.rtf:\\def\\parta\{Thesis\}\
./Dissertation.scriv/Files/Docs/281.rtf:\\def\\parta\{Thesis\}\


OK, so we know now that we want to look at the Scrivener objects that would correspond to those files 216 and 281.

How do we find where these objects are, in Scrivener? Here's how:

Code: Select all

./texdown.pl Dissertation -i 216 281
/Dissertation/
/Trash/LaTeX - Front Matter/
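
The reverse lookup from a document ID to its place in the binder could look roughly like this. The XML is again a hypothetical, simplified stand-in for the .scrivx file (the Trash ID of 1 is invented), and id_to_path is an illustrative helper rather than anything taken from TeXDown.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified binder; only the IDs 216 and 281 come from the post.
SAMPLE = """
<ScrivenerProject>
  <Binder>
    <BinderItem ID="216"><Title>Dissertation</Title></BinderItem>
    <BinderItem ID="1"><Title>Trash</Title>
      <Children>
        <BinderItem ID="281"><Title>LaTeX - Front Matter</Title></BinderItem>
      </Children>
    </BinderItem>
  </Binder>
</ScrivenerProject>
"""


def id_to_path(items, prefix=""):
    """Build a dict mapping each document ID to its binder path."""
    paths = {}
    for item in items:
        path = f"{prefix}/{item.findtext('Title')}"
        paths[int(item.get("ID"))] = path + "/"
        paths.update(id_to_path(item.findall("Children/BinderItem"), path))
    return paths


paths = id_to_path(ET.fromstring(SAMPLE).findall("Binder/BinderItem"))
for doc_id in (216, 281):
    print(paths[doc_id])
```

Building the whole map once keeps repeated lookups cheap when several IDs are passed at the same time.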

mn
mnott

Tue Nov 22, 2016 7:37 pm Post

And one more thing: Since grepping over rtf files is only so good, I've also added an actual command line search for Scrivener to TeXDown:

It allows you to perform a search on any of your Scrivener
projects. The search string can be plain text, or even a regular
expression. So here's how we are going to search for all the
sections that we have in our roilr project, as defined in the
configuration file:

Code: Select all

  ./texdown.pl Dissertation -c -p roilr  -s "section"
  [     322] /ROI - Literature Review/coakes - 2011 - sustainable innovation and right to market: \section[Coakes, Smith, and Alwis (201
  [     335] /ROI - Literature Review/desouza - 2011 - intrapreneurship managing ideas within your: \section[Desouza (2011)]{\citet{Desouz
  [     348] /ROI - Literature Review/dyduch - 2008 - corporate entrepreneurship measurement for improving organizational: \section[Dyduch (2008)]{\citet{Dyduch:
  [     361] /ROI - Literature Review/hornsby - 2002 - middle managers' perception of the internal environment: \section[Hornsby, Kuratko and Zahra (2
  [     175] /ROI - Literature Review/zahra - 2015 - corporate entrepreneurship as knowledge creation and conversion: \section[Zahra (2015)]{\citet{Zahra:20
  [     167] /ROI - Literature Review/zahra - 1993 - a conceptual model of entrepreneurship as firm behavior: \section[Zahra (1993)]{\citet{Zahra:19


Let's assume we want to see only those sections which have Zahra
in them:

Code: Select all

  ./texdown.pl Dissertation -c -p roilr -s "section.*Zahra"
  [     361] /ROI - Literature Review/hornsby - 2002 - middle managers' perception of the internal environment: \section[Hornsby, Kuratko and Zahra (2
  [     175] /ROI - Literature Review/zahra - 2015 - corporate entrepreneurship as knowledge creation and conversion: \section[Zahra (2015)]{\citet{Zahra:20
  [     167] /ROI - Literature Review/zahra - 1993 - a conceptual model of entrepreneurship as firm behavior: \section[Zahra (1993)]{\citet{Zahra:19


Let's search for those where Zahra is the first author, judged
by how close the name is to the \section tag:

Code: Select all

  ./texdown.pl Dissertation -c -p roilr -s "section.{1,5}?Zahra"
  [     175] /ROI - Literature Review/zahra - 2015 - corporate entrepreneurship as knowledge creation and conversion: \section[Zahra (2015)]{\citet{Zahra:2015aa}}
  [     167] /ROI - Literature Review/zahra - 1993 - a conceptual model of entrepreneurship as firm behavior: \section[Zahra (1993)]{\citet{Zahra:1993aa}}


Finally, as we can combine this with the other command line
parameters, let's not have TeXDown parse the markdown code
first, but search for all places where we may have left \section
commands in plain LaTeX code:

Code: Select all

  ./texdown.pl Dissertation -p / -n -s '\\section'
  [      94] /Dissertation/LaTeX - Front Matter/01 - Appendix/03 - Symbols/00 - Manual: % \section{Some Greek symbols}


Don't forget to use four backslashes (\\\\) if you use double
quotes, or two (\\) if you use single quotes.

Unlike what is shown here, the output is actually nicely colored.
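
To see why the three patterns above select different lines, here is a small sketch using Python's re module. That is an assumption of convenience: TeXDown itself is a Perl script, but these particular patterns behave the same way in both engines. The sample strings are the truncated ones from the output above, and grep is a hypothetical helper.

```python
import re

# Sample lines copied (truncated as shown) from the search output above.
lines = [
    r"\section[Hornsby, Kuratko and Zahra (2",
    r"\section[Zahra (2015)]{\citet{Zahra:2015aa}}",
    r"\section[Desouza (2011)]{\citet{Desouz",
]


def grep(pattern, candidates):
    """Return the candidates in which the pattern matches anywhere."""
    rx = re.compile(pattern)
    return [c for c in candidates if rx.search(c)]


anywhere = grep(r"section.*Zahra", lines)            # Zahra anywhere after "section"
first_author = grep(r"section.{1,5}?Zahra", lines)   # Zahra within 5 characters
literal = grep(r"\\section", lines)                  # a literal backslash + "section"

print(len(anywhere), len(first_author), len(literal))  # 2 1 3
```

The bounded `.{1,5}?` rejects the Hornsby line because more than five characters separate "section" from "Zahra" there.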

nontroppo

Wed Nov 23, 2016 1:55 am Post

Configuration files are cool, they remind me a bit of Panzer for Pandoc.

I have a feature query. Not sure if you use snapshots when you write, but they are an excellent way to keep named revisions for each document in your project. If I make corrections from a collaborator, submit a referee revision or whatnot I create a named snapshot of all my documents. However, you can only really benefit from looking at snapshots one-by-one within the Inspector.

I often need to generate a version of my project from a particular revision (i.e. create the first revision of the project submitted to Journal X). This is incredibly laborious, so I asked in a Wishlist post to allow Compile to regex replace a named revision of a document:

Compile a Named Snapshot

I have toyed with the idea of doing this myself from the Scrivener XML using a script but never had the time to sit down and do it. TeXDown already does most of what is needed, and so I wondered whether being able to pass a snapshot regex would be difficult to add.

Also I do use Pandoc as I only occasionally use LaTeX. I think TeXDown in its plain text mode should also work if I want to compile a file I could then pass to pandoc?

mn
mnott

Wed Nov 23, 2016 7:36 am Post

Interesting.

First, since I'm new to using Scrivener, how do you actually create a snapshot of stuff? I see the menu entry, but it only allows me to take a snapshot of what I have manually selected - not even of child documents. Am I missing something?

So on that level, what I'd like to do:

- create/rename/delete/diff snapshots from the command line, on a selection of stuff.
- list snapshots that are available, then select what I want to have

But on the other hand, doesn't this rather say that what we really want is better Git support? Why reinvent the wheel?

Now Git of course is not really great with RTF files, and it surely is not so great with dynamically generated filenames either.

This is something I am already thinking about, but note that we can only work on what Scrivener exposes.

Sometimes it's more than you think. For example, yesterday I found out from another post on the forum that Scrivener supports footnotes, and it took me just about an hour to include support for them in TeXDown. For document ID 123, there's not only a 123.rtf but also a 123.comments, which contains XML with the content of the footnotes as RTF in CDATA fields. That is easy to parse, and since it is RTF again, I just search-and-replace the content of 123.comments into the RTF content of 123.rtf, without even trying to completely understand the RTF structure.

I don't know whether that's risky, really.
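
The search-and-replace idea can be illustrated with a toy example. Everything here is hypothetical: the [[note:ID]] marker syntax, the Comments/Comment XML layout, and the helper merge_footnotes are invented for illustration and do not reflect Scrivener's real on-disk format or TeXDown's code. Only the general shape (note text stored in CDATA, spliced back into the main text by ID) follows the description above.

```python
import re
import xml.etree.ElementTree as ET

# Invented stand-ins for the main document text and its .comments file.
MAIN_TEXT = "Innovation needs slack[[note:A1]] and sponsorship[[note:B2]]."
COMMENTS_XML = """
<Comments>
  <Comment ID="A1"><![CDATA[first note]]></Comment>
  <Comment ID="B2"><![CDATA[second note]]></Comment>
</Comments>
"""


def merge_footnotes(main, comments_xml):
    """Splice each note's text into the main text at its marker."""
    notes = {c.get("ID"): (c.text or "") for c in ET.fromstring(comments_xml)}

    def repl(match):
        # Unknown IDs become empty footnotes rather than raising an error.
        return r"\footnote{" + notes.get(match.group(1), "") + "}"

    return re.sub(r"\[\[note:(\w+)\]\]", repl, main)


merged = merge_footnotes(MAIN_TEXT, COMMENTS_XML)
print(merged)
```

Using a replacement function instead of a replacement string avoids having to escape the backslash in \footnote.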

Now, on your other question, yes of course, you can just switch off the parser using -n.

nontroppo

Wed Nov 23, 2016 12:32 pm Post

I doubt we'll ever get a Git friendly Scrivener (Keith is very much pro-RTF, and using numbered rather than named docs), I think this has been discussed many times before. Though I'd be happy if Scrivener did this, it is a fringe feature most users would not use so I understand Keith's reticence.

For a named snapshot, you do need to expand all folders, so I simply expand all in either the binder tree or the outliner (⌘9), select all (shift+select in binder, or ⌘a in outliner), then ⌘⇧5 to create a named snapshot for the selected documents. Very quick.

I had a quick check, snapshots are not listed in the .scrivx file, so you need to parse the Project.scriv/Snapshots folder. Each numbered document that has a snapshot gets a numbered folder (10.rtf gets a 10.snapshots folder), containing datestamp.RTF files and an index.xml. Therefore to resolve the names one would need to recurse the directories and parse the index.xml files to generate a list of name↔datestamp. I suppose easier would be just to use a datestamp filter. So for example using -r or --revision:

Code: Select all

texdown.pl -n --revision '2016-11-01-13-00-31' -p myproject


…would parse the scrivx for the to-compile docs list as you currently do, then check the snapshots of each doc to match that datestamp.rtf to use instead of the current doc.
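
A minimal sketch of that lookup, assuming the folder layout described above (Files/Docs/ID.rtf for current documents and Snapshots/ID.snapshots/datestamp.rtf for snapshots). The --revision flag is only a proposal at this point, and the helper pick_source, the lowercase .rtf extension, and the throwaway project tree are assumptions for illustration.

```python
from pathlib import Path
import tempfile


def pick_source(project, doc_id, revision=None):
    """Return the snapshot file for `revision` if it exists, else the current doc."""
    current = project / "Files" / "Docs" / f"{doc_id}.rtf"
    if revision:
        snap = project / "Snapshots" / f"{doc_id}.snapshots" / f"{revision}.rtf"
        if snap.exists():
            return snap
    return current


# Build a throwaway project tree: document 10 has a snapshot, document 11 does not.
project = Path(tempfile.mkdtemp()) / "Dissertation.scriv"
(project / "Files" / "Docs").mkdir(parents=True)
(project / "Snapshots" / "10.snapshots").mkdir(parents=True)
for name in ("10.rtf", "11.rtf"):
    (project / "Files" / "Docs" / name).write_text("current")
(project / "Snapshots" / "10.snapshots" / "2016-11-01-13-00-31.rtf").write_text("old")

chosen = [
    pick_source(project, doc_id, "2016-11-01-13-00-31").relative_to(project).as_posix()
    for doc_id in (10, 11)
]
print(chosen)
```

Documents without a matching snapshot fall back to their current version, so a compile from a revision still yields a complete document.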

-------------------

Regarding the command-line interface to generate a snapshot, in theory it is very simple, as all it requires is adding a datestamp.rtf and updating the index.xml. But I don't know how Scrivener reads this: does it use an internal cache on load, or does it read afresh from disk each time the snapshot panel is shown?

And yes, the Scrivener project format is awesome — it is great this is a folder with readable indexes and clear layout. Your tool is a great example of the benefits of this!

mn
mnott

Wed Nov 23, 2016 2:32 pm Post

True about the openness of the format, and yes, it specifically allows some hacking like I did. Just like he says in this video: https://realm.io/news/altconf-wil-shipley-git-document-format/.

I think you'd have to close Scrivener before doing file level operations. And yes, you could trigger snapshots like this - I understand the logic behind it, it is simple and can be parsed.

So I've looked a bit into the Git issue, see here: http://www.mnott.de/unscrivening/ - I think the real option would really only be for the Scrivener people to embed the Git API as suggested by the people from OmniFocus, in the above video. That would clearly add value, and would allow some more control.

That leaves me with re-considering the workflow that you had suggested - how would it really work? How would you want it to work? Do you want to export a given snapshot (i.e., compile in scrivener speak), do you want to import it as a new document tree, or what do you really want to do?

mn
mnott

Wed Nov 23, 2016 2:37 pm Post

Oh man, I didn't see that you suggested an interface. Yes, so you want to compile a certain version / revision, by name or time stamp, and then just use that one to run an export. You'd probably also want a list of snapshots that are available from any asset, somehow like I did in my (unrelated) cccrestore (http://github.com/mnott/cccrestore/). But yes, I think the export can be done - essentially you would of course get a detached compilation - it's not going to be working two ways (I always think round-trip editing).

mn
mnott

Wed Nov 23, 2016 2:53 pm Post

Oh dear, and yes. That shows you why you just don't want to create your own snapshot or whatever feature if someone else has already done it in a better way.

I mean, come on. I've just created two snapshots. Several hundred files were created. Of course, because every time every single revision is created as its own file. Compare that to what Git does, and it is like so much last century :) Really, I think Scrivener could so much benefit from embedding Git behind the scenes, like they did at Omni.

No wonder taking snapshots takes forever, and even if we want to live with many files, hey, what I cannot really live with is to have to manually select the assets - I mean yes, I understand how you "expand all", but really??? Then later if I want to remove a snapshot, I can only remove like one by one. I cannot rename snapshots. I will make mistakes taking snapshots. That's why we need a command line interface, anyway. I won't be able to get a merge, that's for sure - because that, as I've blogged about, will come at a loss of things. I'll be able to make a diff only if I export the whole tree into a file system - and really, why wouldn't I anyway. Rather than using snapshots, I could just perhaps create a 2 way import/export for Scrivener, writing out into real directory structures, and then mapping it back in using the scrivx file.

I could even do an exporter that pulls out only part of scrivener's assets into another scrivener "file" (I mean, .scriv) so that you can then give away only that one. And probably we could import back if we understood the UUID scheme behind in the scrivx.

Hey, this started out for me wanting to use Scrivener for my writing. It is good at that already. I can use it as a frontend to LaTeX, and that significantly improves my workflow. I'll think about what else I need, but honestly, I could also just forget about merging, and live with Git for my snapshots, as branches - I'd be done at that point then already. I could even clean out all my snapshots regularly, since I don't really need them - at least not when I've committed / pushed to my git repository, as I'll be able to pull back out from there anyway.

Tower + Kaleidoscope are pretty good at comparing, only not at merging. So that should be potentially good enough.

What do you think?

nontroppo

Thu Nov 24, 2016 4:15 am Post

Ha, I had mostly just discounted using Git with Scrivener due to there being no plans for support from Keith.

My needs are pretty simple personally, I've always used named snapshots at important points in all my projects, and I really miss a compile-from-snapshot tool. As you said I'm only thinking of this as a one-way route. I do very much understand why a better versioning mechanism, where the project context is understood, would be wonderful. I just don't see that the significant complexity of hacking a solution would really be worth it. Named snapshots are the closest thing we have[1], and I use Dropbox versioning as a backup if I make a change I forgot needed a named snapshot.

Anyway your unscrivenings post was an interesting read!

----
[1] one caveat of snapshots is that they don't version the footnotes/comments, right?