NZZ Downloader

The NZZ is the Swiss newspaper of record. Its first issue appeared all the way back in 1780. Even better, you can download every single issue ever released (if you have a subscription, of course).

This little tool helps with downloading all released issues in a specified time span.

It was written because the archive website is not very friendly and offers no way to download everything within a specified time span.

Installation

You need to be comfortable with the command line to use the NZZ downloader. It has only been tested on *nix systems, though it should work fine on Windows or macOS.

Prerequisites

Build the binaries with cargo: cargo build --release (they are created as target/release/{nzz-cookie,nzz-download}).

Usage

Authentication

The downloader needs an authentication cookie to work. The nzz-cookie binary generates it (with the huge help of the WebDriver API and geckodriver).

Usage: nzz-cookie --username <USERNAME>

Options:
  -u, --username <USERNAME>  Username [env: USERNAME=]
  -h, --help                 Print help
  -V, --version              Print version

Provide the password from stdin.

The resulting cookie is printed to stdout.

Download

Usage: nzz-download [OPTIONS] --from <FROM> --to <TO>

Options:
  -f, --from <FROM>              Earliest issue to download (like 1780-12-31) [env: FROM=]
  -t, --to <TO>                  Latest issue to download (like 1780-12-31) [env: TO=]
  -o, --output-dir <OUTPUT_DIR>  Output directory [env: OUTPUT_DIR=] [default: ./nzz]
  -h, --help                     Print help
  -V, --version                  Print version

Provide the authentication cookie from stdin.

Example

Log in with the password read from the file pw, then use the resulting cookie to download all issues from 2024-06-01 until 2024-06-05 to the default directory "./nzz":

nzz-cookie -u 'myuser@example.com' <pw | nzz-download -f 2024-06-01 -t 2024-06-05
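
Since nzz-cookie prints the cookie to stdout and nzz-download reads it from stdin, the two steps can also be run separately. The following is only a sketch: it assumes that a cookie stays valid for more than one run, which I have not verified. The file name nzz.cookie and the helper names save_cookie and download_with_cookie are made up for illustration.

```shell
# Hypothetical location for the saved cookie; override by setting COOKIE_FILE.
COOKIE_FILE="${COOKIE_FILE:-./nzz.cookie}"

# Log in once and store the cookie in a file only the current user can read.
# Runs in a subshell so the umask change does not leak into the caller.
# Usage: save_cookie USERNAME PASSWORD_FILE
save_cookie() (
    umask 077
    nzz-cookie -u "$1" <"$2" >"$COOKIE_FILE"
)

# Download one range, feeding the stored cookie to nzz-download on stdin.
# Usage: download_with_cookie FROM TO
download_with_cookie() {
    nzz-download -f "$1" -t "$2" <"$COOKIE_FILE"
}
```

With these helpers, the example above becomes save_cookie 'myuser@example.com' pw followed by download_with_cookie 2024-06-01 2024-06-05. Treat the cookie file like a password and delete it when you are done.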

Caveats

Failed downloads are not retried so far; the program just crashes. Until that is fixed, I would advise against downloading big ranges at once.
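
Until retries exist in the tool itself, a wrapper script can split a big range into small chunks and rerun a chunk when it fails. This is a rough sketch, not part of the project. It assumes GNU date (for date -d), that --from and --to are both inclusive, and that the cookie was saved beforehand with something like nzz-cookie -u USER <pw >nzz.cookie and is still valid. All function names, COOKIE_FILE and RETRY_DELAY are made up for illustration.

```shell
# Hypothetical cookie location and pause between attempts (in seconds).
COOKIE_FILE="${COOKIE_FILE:-./nzz.cookie}"
RETRY_DELAY="${RETRY_DELAY:-10}"

# Add N days to an ISO date (YYYY-MM-DD). Relies on GNU date.
add_days() {
    date -u -d "$1 +$2 days" +%Y-%m-%d
}

# Turn YYYY-MM-DD into the integer YYYYMMDD so dates can be compared numerically.
date_key() {
    echo "$1" | tr -d '-'
}

# Print one "FROM TO" pair per line, covering FROM..TO in chunks of DAYS days.
chunk_ranges() {
    cur=$1; last=$2; days=$3
    while [ "$(date_key "$cur")" -le "$(date_key "$last")" ]; do
        end=$(add_days "$cur" $((days - 1)))
        if [ "$(date_key "$end")" -gt "$(date_key "$last")" ]; then
            end=$last
        fi
        echo "$cur $end"
        cur=$(add_days "$end" 1)
    done
}

# Run a command up to TRIES times; fail once all attempts are used up.
retry() {
    tries=$1; shift
    n=1
    until "$@"; do
        [ "$n" -ge "$tries" ] && return 1
        n=$((n + 1))
        sleep "$RETRY_DELAY"
    done
}

# Download a single chunk, feeding the saved cookie on stdin.
download_one() {
    nzz-download -f "$1" -t "$2" <"$COOKIE_FILE"
}

# Download FROM..TO in chunks of DAYS days (default 7), three attempts each.
# The body is a subshell, so "exit" only leaves this function.
download_chunked() (
    chunk_ranges "$1" "$2" "${3:-7}" | while read -r from to; do
        retry 3 download_one "$from" "$to" || {
            echo "giving up on $from..$to" >&2
            exit 1
        }
    done
)
```

For example, download_chunked 1990-01-01 1990-12-31 would walk through the year one week at a time and stop at the first chunk that fails three times in a row. Whether rerunning a chunk downloads issues that are already on disk again depends on nzz-download, so small chunks keep the repeated work low.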

License

Licensed under AGPL-3.0.