Split text into several files, reassemble it when compiling

M55
Posts: 4
Joined: Sun Mar 21, 2021 9:39 am
Platform: Mac

Sun Mar 21, 2021 10:10 am Post

Hi everyone,

just a quick question: in a structure like the one shown in this image

[Attachment: scriv1.jpg]


I would like to have the compile output to pull the sub-chapter content from the text files, so if Text 1 contains "...bla bla bla end of text 1 " and Text 2 contains "start of text 2 bla bla bla...", Sub-Chapter 1 compiles to

...bla bla bla end of text 1 start of text 2 bla bla bla...

without any added breaks, spaces or new lines and the whole of Sub-Chapter 1 still formatted according to the section layout that is assigned to it.

I can't figure out how to define the section layouts for Text and Sub-Chapter. Can anybody give me a hint or am I trying to do something that's just not possible?


Thanks,

Marc

Merx
Posts: 89
Joined: Sat Feb 27, 2021 6:59 am
Platform: Mac

Sun Mar 21, 2021 10:57 am Post

Does this similar thread help?

viewtopic.php?p=324608#p324608

Merx

M55
Posts: 4
Joined: Sun Mar 21, 2021 9:39 am
Platform: Mac

Sun Mar 21, 2021 12:45 pm Post

Thanks Merx - kind of. I was hoping to avoid the additional step, but it's a possible workaround. My bigger problem is that the section layout settings for the "Sub-Chapter" level don't include the "Text" level, for example "Number of words to make uppercase" has no effect, probably because the "Sub-Chapter" actually doesn't have any words.

I guess that's just how Scrivener works and I need to rethink the layout.

gr
Posts: 2314
Joined: Wed Feb 14, 2007 3:57 am
Platform: Mac + iOS
Location: Florida

Sun Mar 21, 2021 1:49 pm Post

You can do this. You can make a custom compile that removes the minimal carriage return between text docs. In this way you can even have sentences that span documents and which will get fused on compile. The trick is to realize that the Replacements specified in compile are performed after any separators between sections are added.

Suppose you have assigned a section type T to all the text docs you want to treat -- that is, whenever two such text docs are adjacent in the binder, you want them to compile as truly continuous text.

Edit compile settings as follows:

1) In the Separators section, find the appropriate type for the text docs you want to treat (all adjacent text files? or just adjacent text files assigned to a certain section layout?) and specify a Separator Between Sections. Set that to Custom and type something that will not otherwise occur in your text. Suppose we use the symbol '@' by way of example.

2) In the Replacements section, add a replacement entry and enter this string in the Replace field:

Code: Select all

\n@\n

Leave the With field blank (assuming you don't want to introduce even a space character between docs). Check the RegEx option.

That's it!

Alternative: You could instead skip step (2) above and use the Replacements tab on the right side of the Compile dialog box instead. This would be useful if you wanted to be able to toggle on/off the replacement. If you went this route, you might want to include also a replacement to replace '@' with nothing. In this way you could switch the fusion on (enable the former, disable the latter replacement) or switch fusion off (disable the former, enable the latter replacement).
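
For anyone who wants to see the mechanism outside Scrivener, here is a minimal Python sketch of the idea. It is only a simulation under stated assumptions: that compile places the custom separator on its own line between adjacent docs, and that the Replacements run afterwards on the assembled text. The names `compile_with_separator` and `fuse` are hypothetical and are not part of Scrivener.

```python
import re

SEPARATOR = "@"  # custom separator; must not occur anywhere in the real text


def compile_with_separator(docs, separator=SEPARATOR):
    """Assumed compile behaviour: the separator sits on its own line between docs."""
    return f"\n{separator}\n".join(docs)


def fuse(compiled, separator=SEPARATOR):
    """Mimic the RegEx replacement: delete newline + separator + newline."""
    pattern = r"\n" + re.escape(separator) + r"\n"
    return re.sub(pattern, "", compiled)


docs = ["...bla bla bla end of text 1 ", "start of text 2 bla bla bla..."]
compiled = compile_with_separator(docs)
print(repr(compiled))  # shows the separator flanked by newlines
print(fuse(compiled))  # the two docs joined with nothing in between
```

Because the newlines are part of the pattern, a stray '@' elsewhere in the text is left alone; only the separator line that compile inserted is removed.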

gr
gr : Scrivener user : not affiliated with Lit^Lat
"Nothing, like something, happens anywhere." —Philip Larkin

brookter
Posts: 2301
Joined: Wed Mar 18, 2009 12:22 pm
Platform: Mac

Sun Mar 21, 2021 2:41 pm Post

GR has explained how to join the 'Text sections' together (as I was typing a reply...).

You can also make the first X words of the subchapter be upper case.

1. Instead of having a separate binder document for the first 'text', just put that content into the text of the Subchapter itself, and set it to have the first X words in upper case.

2. In the Separators section, set your custom separator for your Text section type/layout as BOTH 'before' and 'between'.

Here's a dummy project illustrating the binder layout and output:
[Attachment: Screenshot 2021-03-21 at 14.40.09.png]


The Separator panel looks like this:

[Attachment: Screenshot 2021-03-21 at 14.23.34.png]


And the replacement screen like this

[Attachment: Screenshot 2021-03-21 at 14.24.03.png]

You only have to set all this up once.

[BTW, of course you could just have a fifth Section Type and Layout to take care of the first text in a subchapter, but there's really no point when you can use the Subchapter itself...]

HTH.

Merx
Posts: 89
Joined: Sat Feb 27, 2021 6:59 am
Platform: Mac

Sun Mar 21, 2021 3:33 pm Post

gr and brookter, many thanks.

Merx

kewms
Posts: 7595
Joined: Fri Feb 02, 2007 5:22 pm
Platform: Mac

Sun Mar 21, 2021 3:59 pm Post

M55 wrote:Thanks Merx - kind of. I was hoping to avoid the additional step, but it's a possible workaround. My bigger problem is that the section layout settings for the "Sub-Chapter" level don't include the "Text" level, for example "Number of words to make uppercase" has no effect, probably because the "Sub-Chapter" actually doesn't have any words.

I guess that's just how Scrivener works and I need to rethink the layout.


Note that you can create whatever additional Section Layouts you need.

Katherine
Scrivener Support Team

M55
Posts: 4
Joined: Sun Mar 21, 2021 9:39 am
Platform: Mac

Mon Mar 22, 2021 8:26 pm Post

Thanks everybody for all the replies.

brookter wrote:Instead of having a separate binder document for the first 'text', just put that content into the text of the Subchapter itself, and set it to have the first X words in upper case.


It looks like that's not even necessary and I could just use 'first X words in uc' in the Text sections - only the first Text section ever follows a page break, so the result looks exactly as if the actual text were contained in the Sub-Chapter files.

gr wrote:The trick is to realize that the Replacements specified in compile are performed after any separators between sections are added.


I tried to follow your advice, but no luck. Here's what I did:


[Attachment: scriv-concat.jpg]



I think you're correct that the replacements are applied after the addition of separators, but it looks like they are also applied only after the text is converted to HTML, at least in my case. So I tried removing the complete

Code: Select all

<p class="separator">XXXX</p>


and that did work.

Next I tried a regex that should match the structure around the XXXX separator and replace it with a concatenation of the two paragraphs around it: https://regex101.com/r/FYJ6vS/1 seems to work fine in the tester, but I couldn't get the same result in Scrivener (I know that HTML and regex isn't exactly the best idea anyway).

@Katherine: can you comment on the order of Replacements and if I butchered the regex somehow?


Thanks,

Marc

kewms
Posts: 7595
Joined: Fri Feb 02, 2007 5:22 pm
Platform: Mac

Mon Mar 22, 2021 9:28 pm Post

Yes, Compile format Replacements are invoked after the text is converted to HTML, as one of the uses for Replacements is to save you from having to type the full HTML (or LaTeX) syntax for complex elements. See Section 23.4.4 in the Scrivener manual. (You're actually using a Replacement in the Compile Format, discussed in Section 24.15, but the syntax is the same.)

I'm not very familiar with regex myself, so I'll leave that part of your question to others.

Katherine
Scrivener Support Team

gr
Posts: 2314
Joined: Wed Feb 14, 2007 3:57 am
Platform: Mac + iOS
Location: Florida

Tue Mar 23, 2021 2:45 am Post

Marc,

But why are you wildcarding so many parts of the target string? The bit of html you are trying to replace will always be exactly the same. It is a constant. So take that exact string, flank it with newline specifications and then insert backslashes in front of any of the inner characters that need escaping. Isn’t the regex you want just this:

Code: Select all

 <p class="separator">XXXX<\/p>


gr

Yeah, did not know you were aiming for epub. I tested my original write up only with pdf as target format.

brookter
Posts: 2301
Joined: Wed Mar 18, 2009 12:22 pm
Platform: Mac

Tue Mar 23, 2021 3:12 am Post

Doesn't it have to be a bit more complicated than that though? You're removing the separator line, but you're still leaving the </p> from the preceding section, and the <p class= etc from the next one, so won't it still treat them as separate paragraphs? I think you may also need to escape the ".

I tried doing this outside compilation (as a test with vim regex) with

Code: Select all

:%s/<\/p>\n<p class=\"separator\">XXXX<\/p>\n<p id=\"doc\d\+\">//g


and it worked (though it's probably quite fragile! and dependent on my test project setup rather than being general.) I haven't tried putting that through the replacement dialogue yet as I should have been in bed hours ago...
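
For readers who would rather test this outside vim, the same substitution can be sketched in Python. This is only an illustration: the sample HTML is an assumption modelled on the pattern above (a closing </p>, the separator paragraph on its own line, then an opening <p id="docNN">), not real Scrivener output.

```python
import re

# Assumed shape of the compiled HTML around one separator.
html = (
    '<p id="doc10">end of text 1 </p>\n'
    '<p class="separator">XXXX</p>\n'
    '<p id="doc11">start of text 2</p>'
)

# Remove, in one go: the close of the previous paragraph, the separator
# line, and the opening tag of the next paragraph.
pattern = re.compile(r'</p>\n<p class="separator">XXXX</p>\n<p id="doc\d+">')

merged = pattern.sub("", html)
print(merged)
```

Note that Python's `re` needs neither the escaped quotes nor the escaped slash, and `\d+` works without a backslash before the plus sign.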

M55
Posts: 4
Joined: Sun Mar 21, 2021 9:39 am
Platform: Mac

Tue Mar 23, 2021 8:52 am Post

@gr: Sorry, I should have mentioned I'm compiling to epub. My bad. The regex you posted would leave

Code: Select all

</p></div><div class="snippet"><p class="ps2" id="doc11">


between the two Text blocks (just like Brookter said), so they wouldn't be merged together.

@Brookter: You're right of course, and the regex I actually came up with was https://regex101.com/r/FYJ6vS/1/, but I couldn't make it work in Scrivener. I agree this is messy and doesn't look reliable at all. Using vim or sed for this might work, but I think I'll try compiling to Markdown first. That should be a lot easier to clean up.
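
To make the difference concrete, here is a rough Python sketch of post-processing the compiled HTML outside Scrivener. Everything about the sample markup is an assumption pieced together from the fragments quoted in this thread; in particular, the position of the separator paragraph between the closing </div> and the next <div class="snippet"> is a guess, and the function names are hypothetical. It is not a tested recipe for real epub output.

```python
import re


def strip_separator_only(html, marker="XXXX"):
    """Remove just the separator paragraph (the simpler replacement)."""
    return html.replace(f'<p class="separator">{marker}</p>', "")


def merge_at_separator(html, marker="XXXX"):
    """Remove the separator plus the closing/opening tags around it."""
    pattern = re.compile(
        r"</p>\s*</div>\s*"
        r'<p class="separator">' + re.escape(marker) + r"</p>\s*"
        r'<div class="[^"]*">\s*<p\b[^>]*>'
    )
    return pattern.sub("", html)


# Assumed structure of two adjacent Text blocks in the compiled HTML.
sample = (
    '<div class="snippet"><p class="ps2" id="doc10">end of text 1 </p></div>'
    '<p class="separator">XXXX</p>'
    '<div class="snippet"><p class="ps2" id="doc11">start of text 2</p></div>'
)

print(strip_separator_only(sample))  # still two separate paragraphs
print(merge_at_separator(sample))    # one paragraph, tags still balanced
```

The second function removes one closing </p></div> pair and one opening <div><p> pair together, so the remaining markup stays well-formed. It would still be fragile against any change in the generated class names or nesting.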

gr
Posts: 2314
Joined: Wed Feb 14, 2007 3:57 am
Platform: Mac + iOS
Location: Florida

Tue Mar 23, 2021 10:36 pm Post

Ah yes. I forgot about the need to do the equivalent of removing the pre and post newlines!