1.7.0.1 - Web Import fails

Lilith
Posts: 351
Joined: Sat Nov 13, 2010 3:50 pm
Platform: Windows
Location: Belgium

Mon Feb 03, 2014 9:02 am Post

Hi,

Maybe there is something that I didn't understand correctly.

I just tried to import this page from Wikipedia : http://fr.wikipedia.org/wiki/Les_Noces_ ... osenkreutz
... and got a fail.

Except for the pictures, Wikipedia has always given the best results on importing pages.
This time, I notice that only two options are left: PDF and MHT, neither of them selected by default.

First try: without selecting either. I get a message saying "import of URL failed".

Second try: as MHT. I get a blank page with a link to the original web page. The link opens Internet Explorer, which I never use (I'm on Firefox, and Scrivener has always used Firefox when prompted to open an imported web page).

Third try: as PDF. I get a single blank page, although the web page should give me about 3 A4 pages.

Sanguinius
Posts: 612
Joined: Sun Dec 04, 2011 4:16 pm
Platform: Windows

Mon Feb 03, 2014 2:33 pm Post

I can confirm this behavior. I tried importing the page Lilith tried and got the same result. Then I tried reimporting a webpage that I had successfully imported into a project with 1.5.6. The same thing happened.

On a side note, what exactly is the "MHT" format? Is this the same as "Dynamic Web Browser?"

Also, are we really no longer going to have the option of importing a webpage as an "Image (Browser Quality)", or "Plain Text?" I use both of these, depending on the webpage, as I don't always want all of the images or links and sometimes just want the text.

tiho_d
Posts: 968
Joined: Tue Sep 13, 2011 1:14 pm
Platform: Linux + Windows

Mon Feb 03, 2014 3:02 pm Post

In this release of Scrivener we changed the web rendering engine that provides MHT support (similar to webarchive on the Mac), so some pages that rendered fine previously may have issues at the moment. Please let us know some examples so that we can fine-tune the functionality.

MHT is a web format that contains all the info from the webpage. It is not the same as the Dynamic Web Page option that existed in previous versions. As Scrivener is not yet able to display MHT files, it loads the MHT file in a browser. Native MHT display within Scrivener will come in a later version, but not in the next official release. Keep in mind that the MHT file with its full contents will be available even without an Internet connection, and it is the preferred way of storing web pages at the moment. Sometimes rendering the webpage to PDF might fail, depending on the technologies used within the webpage.
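
To illustrate what "contains all the info from the webpage" means in practice, here is a minimal sketch that lists the parts stored inside an MHT file. It assumes the archive is a standard MIME multipart/related message, which is how MHTML is usually written; the function name `list_mht_parts` is hypothetical and is not part of Scrivener.

```python
import email
from email import policy


def list_mht_parts(raw: bytes) -> list[tuple[str, str]]:
    """Return (content type, Content-Location) for each leaf part of an MHT archive.

    Assumption: the file is a MIME multipart/related message.
    """
    message = email.message_from_bytes(raw, policy=policy.default)
    parts = []
    for part in message.walk():
        # Skip the multipart container itself; only leaf parts hold page resources.
        if part.is_multipart():
            continue
        location = part.get("Content-Location", "")
        parts.append((part.get_content_type(), str(location)))
    return parts
```

For a saved page, this would typically show one `text/html` part plus one part per embedded image or stylesheet, which is why the file remains viewable offline.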

We are still considering whether to keep the previous web import formats, such as Plain Text Only and Image (Browser Quality). Extracting text from a web page via "Select" > "Copy" > "Paste" is probably the best way to extract selected text. Image (Browser Quality) has always been a compromise, as it does not allow text to be extracted, and we believe MHT does a better job here with more consistent results. If you find the old options very useful, please let us know, along with your reasons why the new options do not cover your needs.
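
For anyone who wants only the text from a page already saved as MHT, the following is a rough sketch of one way to pull it out outside Scrivener. It assumes the archive is a standard MIME multipart/related message with a `text/html` part; the names `mht_to_text` and `_TextCollector` are hypothetical, and the result is unformatted text, not a replacement for the old Plain Text import.

```python
import email
from email import policy
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def mht_to_text(raw: bytes) -> str:
    """Return the text of the first text/html part, one text run per line."""
    message = email.message_from_bytes(raw, policy=policy.default)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            collector = _TextCollector()
            collector.feed(part.get_content())
            collector.close()
            return "\n".join(collector.chunks)
    raise ValueError("no text/html part found in archive")
```

A typical use would be `mht_to_text(open("page.mht", "rb").read())`, where `page.mht` stands for a file dragged out of the binder.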

Thanks Lilith & Sanguinius

garpu
Posts: 2026
Joined: Mon Oct 25, 2010 9:38 pm
Platform: Linux

Mon Feb 03, 2014 5:23 pm Post

Same thing's happening for me. I tried a fairly complicated page (http://tardis.wikia.com/wiki/Time_Lord) and a simple one (http://www.ingoodtastestore.com/recipes ... squash.asp).
Slackware-current 64-bit, XFCE

MimeticMouton
Posts: 8722
Joined: Wed May 05, 2010 5:39 am
Platform: Mac + Windows
Location: city of rain

Mon Feb 03, 2014 6:42 pm Post

The butternut squash recipe imports as PDF for me without issue; what result are you getting?

For the others, something you can try meanwhile is importing as MHT, then opening that in Chrome, using the Print to PDF option from there, and importing the resulting PDF into Scrivener. Depending on the page setup, that doesn't always work--some web elements in a page just don't convert well to PDF--but I've had a few occasions where that has given a somewhat better result.

Internet Explorer will be the default for opening MHT files unless you set it to something else. Chrome can read these natively; other browsers might need a plug-in to read them. You can change the default program for the .mht extension by dragging out the downloaded MHT file from Scrivener's binder to the desktop, then right-clicking it and choosing Open With... then "Choose default program". Select the program you want to use and open it, leaving "Always use the selected program to open this kind of file" checked.
Jennifer Hughes
(MM for short)

Lilith
Posts: 351
Joined: Sat Nov 13, 2010 3:50 pm
Platform: Windows
Location: Belgium

Mon Feb 03, 2014 9:37 pm Post

Hi,

I get garpu's recipe fine in PDF.
The tardis.wikia file gives me the same result as the other page I tried: nothing usable with either option.

This one works in PDF (not in MHT): http://www.literatureandlatte.com/forum ... m.php?f=47

I have also tried this one (as MHT): http://www.humanosphere.info/2013/11/il ... en-8-mois/

It took a long time before I decided to cancel.
I got a message saying that the import failed, and then Scrivener simply closed both projects that were open, did not show the start panel, and gave no information about a crash.


Since you ask, Jennifer, I do prefer the option where the page can be viewed offline as if it were the original. Whether I have the pictures or not is not that important.
My favourite option has always been the "html" (thus not the viewer).

garpu
Posts: 2026
Joined: Mon Oct 25, 2010 9:38 pm
Platform: Linux

Tue Feb 04, 2014 3:37 am Post

Ah OK...I should've read MM's first response before posting. Yeah, I get a link, but wasn't actually getting a rendered web page. PDFs are blank for both.
Slackware-current 64-bit, XFCE

BigBearGeek
Posts: 1
Joined: Tue May 14, 2013 10:06 pm
Platform: Windows

Sat May 17, 2014 5:32 am Post

Hi All,

I had the same problem on Windows 8.1 with Bitdefender antivirus.

The problem was Bitdefender blocking two programs when I tried to import the PDF from Internet Explorer.

Once I allowed the programs to run, they worked!!!!!

Anyway, I'm so jazzed I can import to PDF for research - it really helps me a lot.

BTW -- I almost didn't notice the antivirus block, and yours may block without notifying you, so it's something to check.

Hope this helps someone

Mark