# NZZ Downloader

The [NZZ](https://en.wikipedia.org/wiki/Neue_Z%C3%BCrcher_Zeitung) is the
Swiss newspaper of record. Its first issue was published all the way back in
1780. Even better, you can download every single issue ever released (if you
have a subscription, of course).

This little tool helps with downloading all issues released in a specified time
span.

It was written because the archive website is not very friendly and because it
offers no way to download everything within a specified time span at once.

![screenshot](screenshot.jpg)

## Installation

You need to be comfortable with the command line to use the NZZ downloader. It
has only been tested on *nix systems, though it should work fine on Windows or
macOS.

### Prerequisites

- [Firefox](https://www.mozilla.org/en-US/firefox/download/thanks/)
- [geckodriver](https://github.com/mozilla/geckodriver/releases)
- [Rust](https://www.rust-lang.org/learn/get-started)

Build the binaries with cargo: `cargo build --release` (they are created as
`target/release/{nzz-cookie,nzz-download}`).

## Usage

### Authentication

The downloader needs an authentication cookie to work. The `nzz-cookie` binary
generates it, with a lot of help from the WebDriver API and geckodriver.

```
Usage: nzz-cookie --username <USERNAME>

Options:
  -u, --username <USERNAME>  Username [env: USERNAME=]
  -h, --help                 Print help
  -V, --version              Print version

Provide the password from stdin.
```

The resulting cookie is printed to stdout.

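Because the cookie is plain text on stdout, it can be redirected into a file and fed to several `nzz-download` runs. The following is a minimal sketch, assuming the cookie stays valid for as long as it is reused; the function name `save_cookie` and its arguments are made up for illustration and are not part of the tool:

```shell
# Hypothetical helper, not part of nzz-cookie.
# usage: save_cookie <username> <password-file> <cookie-file>
save_cookie() (
    # Run in a subshell so the restrictive umask does not leak out;
    # a newly created cookie file is readable by the current user only.
    umask 077
    nzz-cookie --username "$1" <"$2" >"$3"
)
```

For example, `save_cookie 'myuser@example.com' pw cookie` followed by `nzz-download -f 2024-06-01 -t 2024-06-05 <cookie`.
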
### Download

```
Usage: nzz-download [OPTIONS] --from <FROM> --to <TO>

Options:
  -f, --from <FROM>              Earliest issue to download (like 1780-12-31) [env: FROM=]
  -t, --to <TO>                  Latest issue to download (like 1780-12-31) [env: TO=]
  -o, --output-dir <OUTPUT_DIR>  Output directory [env: OUTPUT_DIR=] [default: ./nzz]
  -h, --help                     Print help
  -V, --version                  Print version

Provide the authentication cookie from stdin.
```

### Example

Log in and use the resulting cookie to download all issues from 2024-06-01 until
2024-06-05 to the default directory `./nzz`, reading the password from the file
`pw`:

```
nzz-cookie -u 'myuser@example.com' <pw | nzz-download -f 2024-06-01 -t 2024-06-05
```

## Caveats

There are no retries on failed downloads so far; the tool just crashes. Until
that is fixed, I would advise against downloading big ranges at once.

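Until retries are built in, a shell wrapper can rerun a failed download. The following is a minimal sketch, assuming the cookie was saved to a file beforehand (for example with `nzz-cookie -u 'myuser@example.com' <pw >cookie`) and is still valid. `retry`, `download_range`, `COOKIE_FILE` and `RETRY_DELAY` are hypothetical names, not part of the tool, and the sketch makes no assumption about whether a rerun skips issues that were already downloaded:

```shell
# Hypothetical helpers, not part of nzz-download.

# Rerun a command until it succeeds or the attempt limit is reached.
# usage: retry <max-attempts> <command> [args...]
retry() {
    max=$1
    shift
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-5}"   # seconds to wait between attempts
    done
}

# Download one date range. The cookie file is reopened on every call so that
# each attempt gets the full cookie on stdin.
# usage: download_range <from> <to>
download_range() {
    nzz-download --from "$1" --to "$2" <"${COOKIE_FILE:-cookie}"
}
```

With these defined, `retry 3 download_range 2024-06-01 2024-06-05` tries the range up to three times. Keeping each range small limits how much has to be repeated after a crash.
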
## License

Licensed as [AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0.html).