# NZZ Downloader

The [NZZ](https://en.wikipedia.org/wiki/Neue_Z%C3%BCrcher_Zeitung) is the
Swiss newspaper of record. Its first issue appeared all the way back in 1780.
Even better, you can download every single issue ever released (if you have a
subscription, of course).

This little tool helps you download all issues released in a specified time
span.

It was written because the archive website is, in the author's opinion, not
very friendly, and of course because the website cannot download everything in
a time span.

Because the archive website makes heavy use of JavaScript, this is done with
[selenium](https://www.selenium.dev/) remote-controlling a browser (Firefox in
this case). This is also why it is not all that fast, but that is OK.

Please only use this with your own credentials; the journalists deserve to be
paid for their work.

![screenshot](screenshot.jpg)

## Installation

You need to be comfortable with the command line to use the NZZ downloader. It
has only been tested on Linux systems, though it should work fine on Windows or
macOS.

- [NodeJS](https://nodejs.org/en/download/) (the LTS version is fine)
- [Firefox](https://www.mozilla.org/en-US/firefox/download/thanks/)
- [geckodriver](https://github.com/mozilla/geckodriver/releases)
- [nzz.js](https://code.vanwa.ch/sebastian/nzz-downloader/-/releases)

## Usage

```
Usage: nzz.js -f [date] -t [date] -o [path] -u [username] -p [password]

Options:
      --version   Show version number             [boolean]
  -h, --help      Show help                       [boolean]
  -f, --from      Earliest issue to download.     [default: "2020-12-23"]
  -t, --to        Latest issue to download.       [default: "2020-12-23"]
  -o, --out       Download directory.             [default: "./nzz"]
  -u, --user      Username for the nzz archive.   [required]
  -p, --password  Password for the user.          [required]
```
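
The `--from` and `--to` options describe a span of dates. As a rough illustration of what that means, here is a minimal sketch that expands such a span into one `YYYY-MM-DD` string per day. It assumes the span is inclusive on both ends; `parseIsoDate` and `datesInRange` are hypothetical names for this example and are not taken from nzz.js, which may handle dates differently.

```javascript
// Hypothetical helpers, not taken from nzz.js.
const DAY_MS = 24 * 60 * 60 * 1000;

// Parse a strict YYYY-MM-DD string into a Date at midnight UTC.
function parseIsoDate(text) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(text);
  if (!match) {
    throw new Error(`Expected YYYY-MM-DD, got "${text}"`);
  }
  const [year, month, day] = match.slice(1).map(Number);
  const date = new Date(Date.UTC(year, month - 1, day));
  // Date.UTC rolls impossible dates over into the next month,
  // so check that the parts survived the round trip.
  if (
    date.getUTCFullYear() !== year ||
    date.getUTCMonth() !== month - 1 ||
    date.getUTCDate() !== day
  ) {
    throw new Error(`Not a calendar date: "${text}"`);
  }
  return date;
}

// List every day from `from` to `to` (assumed inclusive) as YYYY-MM-DD.
function datesInRange(from, to) {
  const start = parseIsoDate(from);
  const end = parseIsoDate(to);
  if (start > end) {
    throw new Error(`"${from}" is later than "${to}"`);
  }
  const dates = [];
  for (let d = start; d <= end; d = new Date(d.getTime() + DAY_MS)) {
    dates.push(d.toISOString().slice(0, 10));
  }
  return dates;
}

console.log(datesInRange("1780-01-30", "1780-02-02"));
```

Working in UTC keeps daylight-saving changes from shifting a date. Not every calendar day necessarily has an issue, so a list like this is only a set of candidates to look up.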

### Examples

Download all existing issues from 1780-01-01 until 1780-02-29 to the default
directory `./nzz`:

```
./nzz.js -u 'myuser@example.com' -p 'mypassword' -f 1780-01-01 -t 1780-02-29
```

## Caveats

You need a good internet connection, as the program only waits a couple of
seconds until the download of an issue can start. Unfortunately, this is hard
to solve.

If you get strange errors about elements not being visible, wait a bit and try
again; it's usually a network problem.
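
The wait-and-retry advice above could in principle be automated. The following is a minimal sketch, assuming the flaky step is wrapped in an async function. `sleep`, `withRetries`, and the default values are hypothetical and not part of nzz.js.

```javascript
// Hypothetical helpers, not part of nzz.js.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run an async step up to `attempts` times, waiting a little longer
// before each new attempt. Rethrows the last error if every attempt fails.
async function withRetries(step, { attempts = 3, delayMs = 2000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await step(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Linear back-off: delayMs, then 2 * delayMs, and so on.
        await sleep(delayMs * attempt);
      }
    }
  }
  throw lastError;
}
```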

The proper way of doing this would be to figure out how the calls to the
backend work and make those calls directly, instead of using the heavy-handed
approach of instrumenting a browser.

## Licence

Licensed under the [MPL 2.0](https://www.mozilla.org/en-US/MPL/2.0/).