If You initiate litigation against any entity by asserting a patent -infringement claim (excluding declaratory judgment actions, -counter-claims, and cross-claims) alleging that a Contributor Version -directly or indirectly infringes any patent, then the rights granted to -You by any and all Contributors for the Covered Software under Section -2.1 of this License shall terminate. - -5.3. In the event of termination under Sections 5.1 or 5.2 above, all -end user license agreements (excluding distributors and resellers) which -have been validly granted by You or Your distributors under this License -prior to termination shall survive termination. - -************************************************************************ -* * -* 6. Disclaimer of Warranty * -* ------------------------- * -* * -* Covered Software is provided under this License on an "as is" * -* basis, without warranty of any kind, either expressed, implied, or * -* statutory, including, without limitation, warranties that the * -* Covered Software is free of defects, merchantable, fit for a * -* particular purpose or non-infringing. The entire risk as to the * -* quality and performance of the Covered Software is with You. * -* Should any Covered Software prove defective in any respect, You * -* (not any Contributor) assume the cost of any necessary servicing, * -* repair, or correction. This disclaimer of warranty constitutes an * -* essential part of this License. No use of any Covered Software is * -* authorized under this License except under this disclaimer. * -* * -************************************************************************ - -************************************************************************ -* * -* 7. 
Limitation of Liability * -* -------------------------- * -* * -* Under no circumstances and under no legal theory, whether tort * -* (including negligence), contract, or otherwise, shall any * -* Contributor, or anyone who distributes Covered Software as * -* permitted above, be liable to You for any direct, indirect, * -* special, incidental, or consequential damages of any character * -* including, without limitation, damages for lost profits, loss of * -* goodwill, work stoppage, computer failure or malfunction, or any * -* and all other commercial damages or losses, even if such party * -* shall have been informed of the possibility of such damages. This * -* limitation of liability shall not apply to liability for death or * -* personal injury resulting from such party's negligence to the * -* extent applicable law prohibits such limitation. Some * -* jurisdictions do not allow the exclusion or limitation of * -* incidental or consequential damages, so this exclusion and * -* limitation may not apply to You. * -* * -************************************************************************ - -8. Litigation -------------- - -Any litigation relating to this License may be brought only in the -courts of a jurisdiction where the defendant maintains its principal -place of business and such litigation shall be governed by laws of that -jurisdiction, without reference to its conflict-of-law provisions. -Nothing in this Section shall prevent a party's ability to bring -cross-claims or counter-claims. - -9. Miscellaneous ----------------- - -This License represents the complete agreement concerning the subject -matter hereof. If any provision of this License is held to be -unenforceable, such provision shall be reformed only to the extent -necessary to make it enforceable. Any law or regulation which provides -that the language of a contract shall be construed against the drafter -shall not be used to construe this License against a Contributor. - -10. 
Versions of the License ---------------------------- - -10.1. New Versions - -Mozilla Foundation is the license steward. Except as provided in Section -10.3, no one other than the license steward has the right to modify or -publish new versions of this License. Each version will be given a -distinguishing version number. - -10.2. Effect of New Versions - -You may distribute the Covered Software under the terms of the version -of the License under which You originally received the Covered Software, -or under the terms of any subsequent version published by the license -steward. - -10.3. Modified Versions - -If you create software not governed by this License, and you want to -create a new license for such software, you may create and use a -modified version of this License if you rename the license and remove -any references to the name of the license steward (except to note that -such modified license differs from this License). - -10.4. Distributing Source Code Form that is Incompatible With Secondary -Licenses - -If You choose to distribute Source Code Form that is Incompatible With -Secondary Licenses under the terms of this version of the License, the -notice described in Exhibit B of this License must be attached. - -Exhibit A - Source Code Form License Notice -------------------------------------------- - - This Source Code Form is subject to the terms of the Mozilla Public - License, v. 2.0. If a copy of the MPL was not distributed with this - file, You can obtain one at http://mozilla.org/MPL/2.0/. - -If it is not possible or desirable to put the notice in a particular -file, then You may include the notice in a location (such as a LICENSE -file in a relevant directory) where a recipient would be likely to look -for such a notice. - -You may add additional accurate notices of copyright ownership. 
- -Exhibit B - "Incompatible With Secondary Licenses" Notice ---------------------------------------------------------- - - This Source Code Form is "Incompatible With Secondary Licenses", as - defined by the Mozilla Public License, v. 2.0. diff --git a/README.md b/README.md index da40a92..198e3ad 100644 --- a/README.md +++ b/README.md @@ -1,74 +1,79 @@ # NZZ Downloader The [NZZ](https://en.wikipedia.org/wiki/Neue_Z%C3%BCrcher_Zeitung) is the Swiss -Swiss newspaper of record. Its first issue was all the way back in 1780. It's -even better that you can download every single issue ever released (if you have -a subscription of course). +Swiss newspaper of record. Its first issue was all the way back in 1780. Even +better you can download every single issue ever released (if you have a +subscription of course). -This little tool helps you with downloading all released issues in a specified -time span. +This little tool helps with downloading all released issues in a specified time +span. -It was written because the archive website is not very friendly in the author's -opinion and of course because it is not possible to download everything in a -time span. - -Because the archive website makes heavy use of javascript this is done with -[selenium](https://www.selenium.dev/) to remote control a browser (firefox in -this case). This is also why it is not all that fast but that is ok. - -Please only use this with your own credentials, the journalists deserve to be -paid for their work. +It was written because the archive website is not very friendly and because it +is not possible to download everything within a specified time span. ![screenshot](screenshot.jpg) ## Installation You need to be comfortable with the command line to use the nzz downloader and -it has only been tested on linux systems though it should work fine on Windows -or macOS. +it has only been tested on *nix systems though it should work fine on Windows or +macOS. 
+ +### Prerequisites -- [NodeJS](https://nodejs.org/en/download/) (the LTS version is fine) - [Firefox](https://www.mozilla.org/en-US/firefox/download/thanks/) - [geckodriver](https://github.com/mozilla/geckodriver/releases) -- [nzz.js](https://code.vanwa.ch/sebastian/nzz-downloader/-/releases) +- [rust](https://www.rust-lang.org/learn/get-started) + +Build the binaries with cargo: `cargo build --release` (they are created as +`target/release/{nzz-cookie,nzz-download}`). ## Usage +### Authentication + +The downloader needs an authentication cookie to work. With the binary +`nzz-cookie` (and the huge help of the webdriver api + geckodriver) this can be +generated. + ``` -Usage: nzz.js -f [date] -t [date] -o [path] -u [usernane] -p [password] +Usage: nzz-cookie --username <USERNAME> Options: - --version Show version number [boolean] - -h, --help Show help [boolean] - -f, --from Earliest issue to download. [default: "2020-12-23"] - -t, --to Latest issue to download. [default: "2020-12-23"] - -o, --out Download directory. [default: "./nzz"] - -u, --user Username for the nzz archive. [required] - -p, --password Password for the user. [required] + -u, --username <USERNAME> Username [env: USERNAME=] + -h, --help Print help + -V, --version Print version + +Provide the password from stdin. ``` -### Examples +The resulting cookie is printed to stdout. -Download all existing issues from 1780-01-01 until 1780-02-30 to the default -directory "./nzz" +### Download ``` -./nzz.js -u 'myuser@example.com' -p 'mypassword' -f 1780-01-01 -t 1780-02-30 +Usage: nzz-download [OPTIONS] --from <FROM> --to <TO> + +Options: + -f, --from <FROM> Earliest issue to download (like 1780-12-31) [env: FROM=] + -t, --to <TO> Latest issue to download (like 1780-12-31) [env: TO=] + -o, --output-dir <OUTPUT_DIR> Output directory [env: OUTPUT_DIR=] [default: ./nzz] + -h, --help Print help + -V, --version Print version + +Provide the authentication cookie from stdin. 
``` -## Caveats +### Example -You need a good internet connection, as the program only waits a couple seconds -until a download of an issue can start. This is something that is hard to solve -unfortunately. +Login and use the resulting cookie to download all issues from 2024-06-01 until +2024-06-05 to the default directory "./nzz", reading the password from the file +`pw`: -If you get strange errors about elements not being visible, wait a bit and try -again, it's usually a network problem. +``` +nzz-cookie -u 'myuser@example.com' ) -> Result>> { + if TcpStream::connect_timeout(&GECKODRIVER_HOST.parse()?, Duration::from_secs(2)).is_err() { + let handle = spawn(async move { + let mut child = Command::new(GECKODRIVER_BINARY) + .stdout(Stdio::null()) + .spawn() + .expect("failed to run binary"); + tokio::select! { + _ = stop_rx => { + child.kill().await.expect("Failed to kill process"); + } + result = child.wait() => { + result.expect("Child process wasn't running"); + } + } + }); + Ok(Some(handle)) + } else { + Ok(None) + } +} diff --git a/cookie/src/lib.rs b/cookie/src/lib.rs new file mode 100644 index 0000000..974d374 --- /dev/null +++ b/cookie/src/lib.rs @@ -0,0 +1,79 @@ +//! Login to the NZZ archive and print the authentication cookie for further usage. +//! +//! Leveraging webdriver + geckodriver. + +use std::time::Duration; + +use anyhow::Result; +use cli::Config; +use fantoccini::{elements::Element, Client, ClientBuilder, Locator}; +use serde_json::json; +use tokio::{sync::oneshot, time::sleep}; + +use crate::geckodriver::GECKODRIVER_HOST; + +pub mod cli; +pub mod geckodriver; + +const LOGIN_URL: &str = "https://zeitungsarchiv.nzz.ch/"; + +/// Entrypoint to login to the NZZ archive. 
+pub async fn run(args: Config, pw: &str) -> Result<()> {
+    let (stop_tx, stop_rx) = oneshot::channel();
+    let driver_handle = geckodriver::run(stop_rx).await?;
+
+    let driver_args = json!({ "moz:firefoxOptions": {"args": ["-headless"]} });
+
+    let client = ClientBuilder::native()
+        .capabilities(driver_args.as_object().unwrap().clone())
+        .connect(&format!("http://{GECKODRIVER_HOST}"))
+        .await?;
+
+    client.goto(LOGIN_URL).await?;
+    let login_button: Element = element_from_css(&client, ".fup-menu-login-container").await?;
+    sleep(Duration::from_millis(500)).await;
+    login_button.click().await?;
+
+    let login_iframe = element_from_css(&client, r#"iframe[id^="piano"]"#).await?;
+    login_iframe.enter_frame().await?;
+
+    let email_input = element_from_css(&client, r#"input[name="email"]"#).await?;
+    email_input.send_keys(&args.username).await?;
+
+    let pw_input: Element = element_from_css(&client, r#"input[type="password"]"#).await?;
+    pw_input.send_keys(pw).await?;
+
+    let submit = element_from_css(&client, r#"button[class="btn prime"]"#).await?;
+    submit.click().await?;
+
+    let main_frame = client.window().await?;
+    client.switch_to_window(main_frame).await?;
+
+    element_from_css(&client, ".fup-login-open.fup-button.fup-s-menu-login-open").await?;
+
+    let cookies = client.get_all_cookies().await?;
+    let cobbled_cookies = cookies
+        .into_iter()
+        .map(|cookie| format!("{}={}", cookie.name(), cookie.value()))
+        .fold(String::new(), |mut acc, word| {
+            if !acc.is_empty() {
+                acc.push(';');
+            }
+            acc.push_str(&word);
+            acc
+        });
+    println!("{cobbled_cookies}");
+
+    client.close().await?;
+
+    if let Some(driver_handle) = driver_handle {
+        let _ = stop_tx.send(());
+        driver_handle.abort();
+    }
+
+    Ok(())
+}
+
+async fn element_from_css(client: &Client, selector: &str) -> Result<Element> {
+    Ok(client.wait().for_element(Locator::Css(selector)).await?)
+}
diff --git a/cookie/src/main.rs b/cookie/src/main.rs
new file mode 100644
index 0000000..e199540
--- /dev/null
+++ b/cookie/src/main.rs
@@ -0,0 +1,24 @@
+use std::io::{self, Read};
+
+use anyhow::Result;
+use clap::Parser;
+use nzz_cookie::cli::Config;
+
+#[tokio::main]
+async fn main() -> Result<()> {
+    let args = Config::parse();
+
+    let pw = read_pw().unwrap_or_else(|_| panic!("Provide the password via stdin"));
+    nzz_cookie::run(args, &pw).await?;
+    Ok(())
+}
+
+/// Read password from stdin.
+fn read_pw() -> Result<String> {
+    let stdin = io::stdin();
+    let mut buffer = String::new();
+
+    stdin.lock().read_to_string(&mut buffer)?;
+    let cookie
+serde = { version = "1.0.203", features = ["derive"] }
+serde_json = { workspace = true }
+tempfile = "3.10.1"
+tokio = { workspace = true }
+time = { version = "0.3.36", features = ["macros", "serde", "formatting", "parsing" ] }
+tracing = "0.1.40"
+tracing-subscriber = "0.3.18"
diff --git a/download/src/cli.rs b/download/src/cli.rs
new file mode 100644
index 0000000..d68456a
--- /dev/null
+++ b/download/src/cli.rs
@@ -0,0 +1,31 @@
+//! Cli interface.
+
+use std::path::PathBuf;
+
+use clap::Parser;
+use time::error::Parse;
+use time::Date;
+
+use crate::date::FORMAT;
+
+/// Parse a date provided as a cli argument.
+fn parse_date(input: &str) -> Result<Date, Parse> {
+    Date::parse(input, FORMAT)
+}
+
+/// Download issues of the NZZ newspaper
+#[derive(Parser)]
+#[command(version, about, long_about = None, after_help = "Provide the authentication cookie from stdin.")]
+pub struct Config {
+    /// Earliest issue to download (like 1780-12-31)
+    #[arg(short, long, env, value_parser=parse_date)]
+    pub from: Date,
+
+    /// Latest issue to download (like 1780-12-31)
+    #[arg(short, long, env, value_parser=parse_date)]
+    pub to: Date,
+
+    /// Output directory.
+    #[arg(short, long, env, default_value = "./nzz")]
+    pub output_dir: PathBuf,
+}
diff --git a/download/src/date.rs b/download/src/date.rs
new file mode 100644
index 0000000..1c020df
--- /dev/null
+++ b/download/src/date.rs
@@ -0,0 +1,27 @@
+//! Utilities for handling dates.
+
+use serde::{Deserialize, Deserializer, Serializer};
+use time::format_description::FormatItem;
+use time::macros::format_description;
+use time::Date;
+
+/// Date format for newspaper issues (YYYY-mm-dd)
+pub const FORMAT: &[FormatItem<'_>] = format_description!("[year]-[month]-[day]");
+
+/// Serialize a date to a String with serde.
+pub fn serialize<S>(value: &Date, serializer: S) -> Result<S::Ok, S::Error>
+where
+    S: Serializer,
+{
+    let formated = value.format(&FORMAT).unwrap();
+    serializer.serialize_str(&formated)
+}
+
+/// Deserialize a String to a Date with serde.
+pub fn deserialize<'de, D>(deserializer: D) -> Result<Date, D::Error>
+where
+    D: Deserializer<'de>,
+{
+    let s: &str = Deserialize::deserialize(deserializer)?;
+    Date::parse(s, FORMAT).map_err(serde::de::Error::custom)
+}
diff --git a/download/src/download.rs b/download/src/download.rs
new file mode 100644
index 0000000..2885cdc
--- /dev/null
+++ b/download/src/download.rs
@@ -0,0 +1,52 @@
+//! Handle downloads of newspaper issues.
+
+use std::{
+    fs::{self},
+    io::{Cursor, Read},
+    path::Path,
+};
+
+use anyhow::Result;
+use tracing::{debug, info};
+
+use crate::{nzz::Issue, pdf};
+
+/// Download all pages of the provided `issues` and save them merged to the directory `output_dir`.
+///
+/// Create `output_dir` if it does not exist.
+pub async fn fetch(issues: Vec<Issue>, output_dir: &Path) -> Result<()> {
+    debug!("ensuring {output_dir:?} exists");
+    fs::create_dir_all(output_dir)?;
+
+    for issue in issues {
+        info!("saving issue {}", issue.publication_date);
+
+        let tmp_dir = tempfile::tempdir()?;
+        let mut pages = Vec::new();
+        for (i, page) in issue.pages.into_iter().enumerate() {
+            debug!(
+                "fetching issue {}, page {}: {page}",
+                issue.publication_date,
+                i + 1
+            );
+
+            let response = reqwest::Client::new().get(page).send().await?;
+            let mut content = Cursor::new(response.bytes().await?);
+            let mut page_data = Vec::new();
+            content.read_to_end(&mut page_data)?;
+
+            let tmp_page = tmp_dir.path().join(i.to_string());
+            fs::write(&tmp_page, page_data)?;
+            pages.push(tmp_page);
+        }
+
+        let issue_name = format!("nzz_{}.pdf", issue.publication_date);
+        let issue_path = output_dir.join(issue_name);
+        let issue_title = format!("NZZ {}", issue.publication_date);
+
+        pdf::merge(pages, &issue_path, &issue_title)?;
+        debug!("issue {} saved", issue.publication_date);
+    }
+
+    Ok(())
+}
diff --git a/download/src/lib.rs b/download/src/lib.rs
new file mode 100644
index 0000000..419724f
--- /dev/null
+++ b/download/src/lib.rs
@@ -0,0 +1,19 @@
+//! A small utility to download issues of the NZZ newspaper.
+
+use anyhow::Result;
+
+use cli::Config;
+
+pub mod cli;
+pub mod date;
+pub mod download;
+pub mod nzz;
+pub mod pdf;
+
+/// Entry point to download nzz issues.
+pub async fn run(args: Config, cookie: &str) -> Result<()> {
+    let issues = nzz::fetch(cookie, args.from, args.to).await?;
+    download::fetch(issues, &args.output_dir).await?;
+
+    Ok(())
+}
diff --git a/download/src/main.rs b/download/src/main.rs
new file mode 100644
index 0000000..3c8b88c
--- /dev/null
+++ b/download/src/main.rs
@@ -0,0 +1,29 @@
+use std::io::{self, Read};
+
+use anyhow::Result;
+use clap::Parser;
+use nzz_download::cli::Config;
+
+#[tokio::main]
+async fn main() -> Result<()> {
+    if std::env::var_os("RUST_LOG").is_none() {
+        std::env::set_var("RUST_LOG", "info");
+    }
+
+    tracing_subscriber::fmt::init();
+
+    let args = Config::parse();
+    let cookie = read_cookie().expect("Provide the authentication cookie via stdin");
+
+    nzz_download::run(args, &cookie).await
+}
+
+/// Read authentication cookie from stdin.
+fn read_cookie() -> Result<String> {
+    let stdin = io::stdin();
+    let mut buffer = String::new();
+
+    stdin.lock().read_to_string(&mut buffer)?;
+    let cookie = buffer.trim();
+    Ok(cookie.to_string())
+}
diff --git a/download/src/nzz.rs b/download/src/nzz.rs
new file mode 100644
index 0000000..4b1f88a
--- /dev/null
+++ b/download/src/nzz.rs
@@ -0,0 +1,198 @@
+//! Handle information relating to NZZ issues.
+
+use anyhow::Result;
+use serde::{Deserialize, Serialize};
+use time::Date;
+use tracing::info;
+
+const SEARCH_URL: &str = "https://zeitungsarchiv.nzz.ch/solr-epaper-search/1.0/search";
+const ISSUE_URL: &str = "https://zeitungsarchiv.nzz.ch/archive/1.0/getPages";
+
+#[derive(Debug, Serialize, Deserialize)]
+struct SearchData {
+    query: String,
+    offset: u32,
+    #[serde(rename = "sortField")]
+    sort_field: String,
+    #[serde(rename = "sortOrder")]
+    sort_order: String,
+    #[serde(
+        rename = "startDate",
+        serialize_with = "crate::date::serialize",
+        deserialize_with = "crate::date::deserialize"
+    )]
+    start_date: Date,
+    #[serde(
+        rename = "endDate",
+        serialize_with = "crate::date::serialize",
+        deserialize_with = "crate::date::deserialize"
+    )]
+    end_date: Date,
+}
+
+#[derive(Debug, Serialize, Deserialize)]
+struct SearchResult {
+    data: SearchInfo,
+}
+
+#[derive(Debug, Serialize, Deserialize)]
+struct SearchInfo {
+    total: u32,
+    offset: u32,
+    #[serde(rename = "pageSize")]
+    page_size: u32,
+    #[serde(rename = "resData")]
+    res_data: Vec<IssueData>,
+}
+
+#[derive(Debug, Serialize, Deserialize)]
+struct IssueData {
+    #[serde(rename = "editionId")]
+    edition_id: u32,
+    #[serde(rename = "pageNumber")]
+    page_nr: u32,
+    #[serde(
+        rename = "publicationDate",
+        deserialize_with = "crate::date::deserialize"
+    )]
+    publication_date: Date,
+}
+
+#[derive(Debug, Serialize, Deserialize)]
+struct PagesResult {
+    data: PagesInfo,
+}
+
+#[derive(Debug, Serialize, Deserialize)]
+struct PagesInfo {
+    pages: Vec<Page>,
+}
+
+#[derive(Debug, Serialize, Deserialize)]
+struct Page {
+    #[serde(rename = "pmPageNumber")]
+    page_nr: u32,
+    #[serde(rename = "pageDocUrl")]
+    doc: PageDoc,
+}
+
+#[derive(Debug, Serialize, Deserialize)]
+struct PageDoc {
+    #[serde(rename = "HIGHRES")]
+    link: PageHighRes,
+}
+
+#[derive(Debug, Serialize, Deserialize)]
+struct PageHighRes {
+    url: String,
+}
+
+/// A single NZZ issue.
+#[derive(Debug, Clone)]
+pub struct Issue {
+    /// Date of publication.
+    pub publication_date: Date,
+    /// Ordered vector of page urls in the issue.
+    pub pages: Vec<String>,
+}
+
+impl SearchData {
+    pub fn new(offset: u32, start_date: Date, end_date: Date) -> Self {
+        Self {
+            query: "".to_string(),
+            offset,
+            sort_field: "media_ts".to_string(),
+            sort_order: "desc".to_string(),
+            start_date,
+            end_date,
+        }
+    }
+}
+
+/// Search all issues between `from` and `to` (inclusive) using an `offset` into the results.
+async fn offset_search(offset: u32, cookie: &str, from: Date, to: Date) -> Result<SearchInfo> {
+    let data = SearchData::new(offset, from, to);
+    let result: SearchResult = reqwest::Client::new()
+        .post(SEARCH_URL)
+        .header("Cookie", cookie)
+        .json(&data)
+        .send()
+        .await?
+        .json()
+        .await?;
+    Ok(result.data)
+}
+
+/// Only keep first pages, they are enough to get the edition id.
+fn filter_issues(unfiltered_issues: Vec<IssueData>) -> Vec<IssueData> {
+    unfiltered_issues
+        .into_iter()
+        .filter(|info| info.page_nr == 1)
+        .collect()
+}
+
+/// Search all issues between `from` and `to` (inclusive) respecting pagination.
+async fn search(cookie: &str, from: Date, to: Date) -> Result<Vec<IssueData>> {
+    info!("looking for issues between {from} to {to}");
+    let mut result = offset_search(0, cookie, from, to).await?;
+    let mut issues: Vec<IssueData> = filter_issues(result.res_data);
+
+    while result.offset + result.page_size < result.total {
+        result = offset_search(result.offset + result.page_size, cookie, from, to).await?;
+        issues.extend(filter_issues(result.res_data));
+    }
+
+    Ok(issues)
+}
+
+/// Fetch all page urls for the issue with edition id `edition_id` and order them by page number.
+async fn build_pages(cookie: &str, edition_id: u32) -> Result<Vec<String>> {
+    let result: PagesResult = reqwest::Client::new()
+        .post(ISSUE_URL)
+        .header("Cookie", cookie)
+        .json(&serde_json::json!({
+            "editionId": edition_id,
+        }))
+        .send()
+        .await?
+        .json()
+        .await?;
+
+    let mut pages: Vec<(u32, String)> = result
+        .data
+        .pages
+        .into_iter()
+        .map(|page| (page.page_nr, page.doc.link.url))
+        .collect();
+    pages.sort_by(|a, b| a.0.cmp(&b.0));
+    let pages = pages.into_iter().map(|page| page.1).collect();
+
+    Ok(pages)
+}
+
+/// Fetch all page urls for `issues`.
+async fn build_issues(cookie: &str, issues: Vec<IssueData>) -> Result<Vec<Issue>> {
+    let mut hydrated_issues = Vec::new();
+    for issue in issues {
+        info!(
+            "fetching page information for issue {}",
+            issue.publication_date
+        );
+        let pages = build_pages(cookie, issue.edition_id).await?;
+        hydrated_issues.push(Issue {
+            publication_date: issue.publication_date,
+            pages,
+        });
+    }
+
+    Ok(hydrated_issues)
+}
+
+/// Fetch issue information in the date range `from` - `to` (inclusive) using `cookie` for
+/// authentication.
+pub async fn fetch(cookie: &str, from: Date, to: Date) -> Result<Vec<Issue>> {
+    let issues = search(cookie, from, to).await?;
+    let issues = build_issues(cookie, issues).await?;
+
+    Ok(issues)
+}
diff --git a/download/src/pdf.rs b/download/src/pdf.rs
new file mode 100644
index 0000000..b5135c9
--- /dev/null
+++ b/download/src/pdf.rs
@@ -0,0 +1,177 @@
+//! Manipulate pdf documents.
+
+use std::{
+    collections::BTreeMap,
+    path::{Path, PathBuf},
+};
+
+use anyhow::Result;
+use lopdf::{Dictionary, Document, Object, ObjectId};
+
+const METADATA_TITLE: &str = "Title";
+const METADATA_PRODUCER: &str = "Producer";
+const PDF_VERSION: &str = "1.8";
+const PRODUCER: &str = "NZZ Downloader";
+
+/// Merge the provided pdfs in the `input` vector to one pdf in `out`, setting its title to
+/// `title`.
+///
+/// The code is from https://github.com/J-F-Liu/lopdf/blob/6b04581640e061bfeb39b585e50a7e9d102b8fe2/examples/merge.rs
+/// with some modifications. I have no clue about PDF structure and this is still a bit of a
+/// mystery to me.
+pub fn merge(input: Vec<PathBuf>, out: &Path, title: &str) -> Result<()> {
+    let mut max_id = 1;
+    let mut documents_pages = BTreeMap::new();
+    let mut documents_objects = BTreeMap::new();
+    let mut merged_doc = Document::with_version(PDF_VERSION);
+
+    for pdf in input {
+        let mut doc = Document::load(pdf)?;
+
+        doc.renumber_objects_with(max_id);
+
+        max_id = doc.max_id + 1;
+
+        documents_pages.extend(
+            doc.get_pages()
+                .into_values()
+                .map(|object_id| (object_id, doc.get_object(object_id).unwrap().to_owned()))
+                .collect::<BTreeMap<ObjectId, Object>>(),
+        );
+        documents_objects.extend(doc.objects);
+    }
+
+    let mut catalog_object: Option<(ObjectId, Object)> = None;
+    let mut pages_object: Option<(ObjectId, Object)> = None;
+
+    // Process all objects except "Page" type
+    for (object_id, object) in documents_objects.iter() {
+        // We have to ignore "Page" (as are processed later), "Outlines" and "Outline" objects
+        // All other objects should be collected and inserted into the main Document
+        match object.type_name().unwrap_or("") {
+            "Catalog" => {
+                // Collect a first "Catalog" object and use it for the future "Pages"
+                catalog_object = Some((
+                    if let Some((id, _)) = catalog_object {
+                        id
+                    } else {
+                        *object_id
+                    },
+                    object.clone(),
+                ));
+            }
+            "Pages" => {
+                // Collect and update a first "Pages" object and use it for the future "Catalog"
+                // We have also to merge all dictionaries of the old and the new "Pages" object
+                if let Ok(dictionary) = object.as_dict() {
+                    let mut dictionary = dictionary.clone();
+                    if let Some((_, ref object)) = pages_object {
+                        if let Ok(old_dictionary) = object.as_dict() {
+                            dictionary.extend(old_dictionary);
+                        }
+                    }
+
+                    pages_object = Some((
+                        if let Some((id, _)) = pages_object {
+                            id
+                        } else {
+                            *object_id
+                        },
+                        Object::Dictionary(dictionary),
+                    ));
+                }
+            }
+            "Page" => {}     // Ignored, processed later and separately
+            "Outlines" => {} // Ignored, not supported yet
+            "Outline" => {}  // Ignored, not supported yet
+            _ => {
+                merged_doc.max_id += 1;
+                merged_doc.objects.insert(*object_id, object.clone());
+            }
+        }
+    }
+
+    for (object_id, object) in documents_pages.iter() {
+        if let Ok(dictionary) = object.as_dict() {
+            let mut dictionary = dictionary.clone();
+            dictionary.set("Parent", pages_object.as_ref().unwrap().0);
+
+            merged_doc
+                .objects
+                .insert(*object_id, Object::Dictionary(dictionary));
+        }
+    }
+
+    let catalog_object = catalog_object.unwrap();
+    let pages_object = pages_object.unwrap();
+
+    // Build a new "Pages" with updated fields
+    if let Ok(dictionary) = pages_object.1.as_dict() {
+        let mut dictionary = dictionary.clone();
+
+        // Set new pages count
+        dictionary.set("Count", documents_pages.len() as u32);
+
+        // Set new "Kids" list (collected from documents pages) for "Pages"
+        dictionary.set(
+            "Kids",
+            documents_pages
+                .into_keys()
+                .map(Object::Reference)
+                .collect::<Vec<_>>(),
+        );
+
+        merged_doc
+            .objects
+            .insert(pages_object.0, Object::Dictionary(dictionary));
+    }
+
+    // Build a new "Catalog" with updated fields
+    if let Ok(dictionary) = catalog_object.1.as_dict() {
+        let mut dictionary = dictionary.clone();
+        dictionary.set("Pages", pages_object.0);
+        dictionary.remove(b"Outlines"); // Outlines not supported in merged PDFs
+
+        merged_doc
+            .objects
+            .insert(catalog_object.0, Object::Dictionary(dictionary));
+    }
+
+    merged_doc.trailer.set("Root", catalog_object.0);
+
+    set_metadata(METADATA_TITLE, title, &mut merged_doc);
+    set_metadata(METADATA_PRODUCER, PRODUCER, &mut merged_doc);
+
+    // Update the max internal ID as wasn't updated before due to direct objects insertion
+    merged_doc.max_id = merged_doc.objects.len() as u32;
+
+    // Reorder all new Document objects
+    merged_doc.renumber_objects();
+
+    merged_doc.compress();
+    merged_doc.save(out)?;
+
+    Ok(())
+}
+
+/// Set metadata `key` to `value`.
+///
+/// Add the `Info` trailer to the pdf document if it does not yet exist.
+fn set_metadata(key: &str, value: &str, doc: &mut Document) {
+    let info_dict_id = match doc.trailer.get(b"Info") {
+        Ok(&Object::Reference(id)) => id,
+        _ => {
+            // without this the following add_object call overwrites an existing
+            // object at max_id
+            doc.max_id += 1;
+
+            let id = doc.add_object(Dictionary::new());
+            doc.trailer.set("Info", Object::Reference(id));
+            id
+        }
+    };
+
+    if let Some(Object::Dictionary(ref mut info_dict)) = doc.objects.get_mut(&info_dict_id) {
+        info_dict.set(key, Object::string_literal(value));
+    }
+}
diff --git a/flake.lock b/flake.lock
index f1b9252..6998b46 100644
--- a/flake.lock
+++ b/flake.lock
@@ -1,5 +1,24 @@
 {
   "nodes": {
+    "fenix": {
+      "inputs": {
+        "nixpkgs": "nixpkgs",
+        "rust-analyzer-src": "rust-analyzer-src"
+      },
+      "locked": {
+        "lastModified": 1719901701,
+        "narHash": "sha256-7yztwIit3Ei6wySJDmRjhrP2VWwfoYifofwJPRXdjDQ=",
+        "owner": "nix-community",
+        "repo": "fenix",
+        "rev": "b7a33f57c6756e4c50f9c46189f8374841c764e8",
+        "type": "github"
+      },
+      "original": {
+        "owner": "nix-community",
+        "repo": "fenix",
+        "type": "github"
+      }
+    },
     "flake-utils": {
       "inputs": {
         "systems": "systems"
@@ -19,6 +38,22 @@
       }
     },
     "nixpkgs": {
+      "locked": {
+        "lastModified": 1719690277,
+        "narHash": "sha256-0xSej1g7eP2kaUF+JQp8jdyNmpmCJKRpO12mKl/36Kc=",
+        "owner": "nixos",
+        "repo": "nixpkgs",
+        "rev": "2741b4b489b55df32afac57bc4bfd220e8bf617e",
+        "type": "github"
+      },
+      "original": {
+        "owner": "nixos",
+        "ref": "nixos-unstable",
+        "repo": "nixpkgs",
+        "type": "github"
+      }
+    },
+    "nixpkgs_2": {
       "locked": {
         "lastModified": 1719075281,
         "narHash": "sha256-CyyxvOwFf12I91PBWz43iGT1kjsf5oi6ax7CrvaMyAo=",
@@ -36,8 +71,26 @@
     },
     "root": {
       "inputs": {
+        "fenix": "fenix",
         "flake-utils": "flake-utils",
-        "nixpkgs": "nixpkgs"
+        "nixpkgs": "nixpkgs_2"
+      }
+    },
+    "rust-analyzer-src": {
+      "flake": false,
+      "locked": {
+        "lastModified": 1719842244,
+        "narHash":
"sha256-QMWaT8yN6AeyUpbLMakahwdZhiNdYvlUVOd07zbi7ss=", + "owner": "rust-lang", + "repo": "rust-analyzer", + "rev": "1b283db47f8de1412c851c92bb4ce4ef039ff8ff", + "type": "github" + }, + "original": { + "owner": "rust-lang", + "ref": "nightly", + "repo": "rust-analyzer", + "type": "github" } }, "systems": { diff --git a/flake.nix b/flake.nix index 0bf17dc..72fa8d3 100644 --- a/flake.nix +++ b/flake.nix @@ -3,14 +3,21 @@ inputs = { nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable"; flake-utils.url = "github:numtide/flake-utils"; + fenix.url = "github:nix-community/fenix"; }; outputs = - { nixpkgs, flake-utils, ... }: + { + nixpkgs, + flake-utils, + fenix, + ... + }: flake-utils.lib.eachDefaultSystem ( system: let pkgs = import nixpkgs { inherit system; }; + rust = fenix.packages.${system}.stable; in { devShells.default = @@ -18,7 +25,9 @@ mkShell { buildInputs = [ geckodriver - nodejs_22 + rust.toolchain + cargo-deny + rust-analyzer ]; }; } diff --git a/nzz.js b/nzz.js deleted file mode 100755 index fd84830..0000000 --- a/nzz.js +++ /dev/null @@ -1,335 +0,0 @@ -#!/usr/bin/env node -/* Copyright (c) Sebastian Hugentobler 2024. - * - * This Source Code Form is subject to the terms of the Mozilla Public - * License, v. 2.0. If a copy of the MPL was not distributed with this - * file, You can obtain one at https://mozilla.org/MPL/2.0/. 
*/ - -const tmp = require("tmp"); -const path = require("path"); -const fs = require("fs"); -const { Builder, By, Capabilities, Key, until } = require("selenium-webdriver"); -const fx = require("selenium-webdriver/firefox"); -const yargs = require("yargs/yargs"); -const { hideBin } = require("yargs/helpers"); - -const URL = "https://zeitungsarchiv.nzz.ch/"; -const WAIT_TIMEOUT = 10000; -const TIMEOUT_MSG = `Timeout after ${WAIT_TIMEOUT / 1000} seconds.`; -const SEARCH_WAIT_TIMEOUT = 15000; -const SEARCH_TIMEOUT_MSG = `Timeout after ${SEARCH_WAIT_TIMEOUT / 1000} seconds.`; -const DOWNLOAD_TIMEOUT = 20000; -const USER_AGENT = - "Mozilla/5.0 (X11; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0"; - -Date.prototype.isoDate = function () { - return `${this.getFullYear()}-${String(this.getMonth() + 1).padStart(2, "0")}-${String(this.getDate()).padStart(2, "0")}`; -}; - -Date.prototype.nzzDate = function () { - return `${String(this.getDate()).padStart(2, "0")}.${String(this.getMonth() + 1).padStart(2, "0")}.${this.getFullYear()}`; -}; - -Date.prototype.addDays = function (days) { - const date = new Date(this.valueOf()); - date.setDate(date.getDate() + days); - return date; -}; - -function sleep(ms) { - return new Promise((resolve) => { - setTimeout(resolve, ms); - }); -} - -/** - * Wait for an in progress download to finish and move the file to the correct - * destination. - * - * @param {fs.PathLike} tmpDir Download directory. - * @param {fs.PathLike} outDir Final destination directory. - * @param {Date} date Date of the issue. 
- */ -function moveDownload(tmpDir, outDir, date) { - return new Promise((resolve) => { - const srcFile = path.join( - tmpDir, - `Gesamtausgabe_NZZ_-_Neue_Zürcher_Zeitung_${date.isoDate()}.pdf`, - ); - - if (!fs.existsSync(srcFile)) { - setTimeout(() => moveDownload(tmpDir, outDir, date), 2000); - } - - const destFile = path.join(outDir, `${date.isoDate()}.pdf`); - try { - fs.copyFileSync(srcFile, destFile); - } catch { - // this means we tried to download a wrong issue - } - resolve(); - }); -} - -/** - * Enter the dates of the issue to download. - * - * @param {WebDriver} driver Selenium driver to use. - * @param {Date} date Date of the issue. - */ -async function enterDate(driver, date) { - const dateString = date.nzzDate(); - - const startDate = await driver.wait( - until.elementLocated(By.css("input.fup-s-date-start")), - WAIT_TIMEOUT, - TIMEOUT_MSG, - ); - await driver.wait(until.elementIsVisible(startDate), WAIT_TIMEOUT); - await driver.actions().scroll(0, 0, 0, 0, startDate).perform(); - await startDate.clear(); - await startDate.sendKeys(dateString); - - const endDate = await driver.wait( - until.elementLocated(By.css("input.fup-s-date-end")), - WAIT_TIMEOUT, - TIMEOUT_MSG, - ); - await endDate.clear(); - await endDate.sendKeys(dateString + Key.ENTER); - await sleep(500); -} - -/** - * Login to the NZZ archive. - * - * @param {WebDriver} driver Selenium driver to use. - * @param {String} user Username for the login. - * @param {String} password Password for the user. 
- */ -async function login(driver, user, password) { - console.log("logging in..."); - - await driver.get(URL); - await sleep(500); - - const loginButton = await driver.findElement( - By.css(".fup-menu-login-container"), - ); - await driver.wait(until.elementIsVisible(loginButton), WAIT_TIMEOUT); - await loginButton.click(); - - const iframe = await driver.findElement(By.css('iframe[id^="piano"]')); - await driver.switchTo().frame(iframe); - - const emailField = await driver.wait( - until.elementLocated(By.css('input[name="email"]')), - WAIT_TIMEOUT, - TIMEOUT_MSG, - ); - - await driver.wait(until.elementIsVisible(emailField), WAIT_TIMEOUT); - await emailField.sendKeys(user); - - const pwField = await driver.wait( - until.elementLocated(By.css('input[type="password"]')), - WAIT_TIMEOUT, - TIMEOUT_MSG, - ); - await pwField.sendKeys(password); - - const submitButton = await driver.wait( - until.elementLocated(By.css('button[class="btn prime"]')), - WAIT_TIMEOUT, - TIMEOUT_MSG, - ); - await driver.wait(until.elementIsVisible(submitButton), WAIT_TIMEOUT); - await submitButton.click(); - await driver.switchTo().defaultContent(); - await sleep(500); - - const loginMenu = await driver.wait( - until.elementLocated( - By.css(".fup-login-open.fup-button.fup-s-menu-login-open"), - ), - WAIT_TIMEOUT, - TIMEOUT_MSG, - ); - await driver.wait(until.elementIsVisible(loginMenu), WAIT_TIMEOUT); -} - -/** - * Start the download of a full issue. - * - * @param {WebDriver} driver Selenium driver to use. 
- */ -async function download(driver) { - const menu = await driver.wait( - until.elementLocated(By.css(".fup-menu-item-download")), - WAIT_TIMEOUT, - TIMEOUT_MSG, - ); - await menu.click(); - - const download = await driver.wait( - until.elementLocated(By.css(".fup-s-menu-download-edition-confirmation")), - WAIT_TIMEOUT, - TIMEOUT_MSG, - ); - await download.click(); - - const loadingMask = await driver.wait( - until.elementLocated(By.css(".fup-loading-mask")), - WAIT_TIMEOUT, - TIMEOUT_MSG, - ); - await driver.wait(until.elementIsVisible(loadingMask), DOWNLOAD_TIMEOUT); - await driver.wait(until.stalenessOf(loadingMask), DOWNLOAD_TIMEOUT); - - const back = await driver.wait( - until.elementLocated(By.css(".fup-s-menu-back")), - WAIT_TIMEOUT, - TIMEOUT_MSG, - ); - await driver.wait(until.elementIsVisible(back), WAIT_TIMEOUT); - // back.click(); - await sleep(500); -} - -/** - * Download all issues in a certain time span. - * - * @param {WebDriver} driver Selenium driver to use. - * @param {Date} from Earliest issue to download. - * @param {Date} to Latest issue to download. - * @param {fs.PathLike} tmpDir Download directory. - * @param {fs.PathLike} outDir Final destination directory. 
- */ -async function findIssues(driver, from, to, tmpDir, outDir) { - from = from.addDays(-1); - - while (from.toDateString() !== to.toDateString()) { - from = from.addDays(1); - console.log(`checking ${from.isoDate()}...`); - - await enterDate(driver, from); - - try { - const articles = await driver.wait( - until.elementsLocated(By.css(".fup-archive-result-item-article-title")), - SEARCH_WAIT_TIMEOUT, - SEARCH_TIMEOUT_MSG, - ); - - await articles[0].click(); - await sleep(500); - - await download(driver); - console.log(`\tdownloading...`); - await sleep(500); - - // do this in the background - moveDownload(tmpDir, outDir, from); - - await driver.get(URL); - } catch { - // this means there is no issue on the searched date - // move along with the next date - console.log(`\tno issues`); - } - } -} - -/** - * Setup the headless browser and download all issues in the specified time span. - * - * @param {Date} from Earliest issue to download. - * @param {Date} to Latest issue to download. - * @param {String} user Username for the nzz archive. - * @param {String} password Password for the user. - * @param {fs.PathLike} outDir Final destination directory. 
- */ -async function run(from, to, user, password, outDir) { - if (!fs.existsSync(outDir)) { - fs.mkdirSync(outDir); - } - - const tmpDir = tmp.dirSync(); - console.log(`downloading to ${outDir} (tmp dir: ${tmpDir.name})...`); - - const fxOptions = new fx.Options() - .addArguments("-headless") - .setPreference("pdfjs.disabled", true) - .setPreference("general.useragent.override", USER_AGENT) - .setPreference("browser.helperApps.neverAsk.openFile", "application/pdf") - .setPreference("browser.download.folderList", 2) - .setPreference("browser.download.manager.showWhenStartingout", false) - .setPreference("browser.download.dir", tmpDir.name) - .setPreference("browser.helperApps.neverAsk.saveToDisk", "application/pdf"); - - const caps = new Capabilities(); - caps.setPageLoadStrategy("normal"); - - const driver = await new Builder() - .withCapabilities(caps) - .forBrowser("firefox") - .setFirefoxOptions(fxOptions) - .build(); - - try { - await login(driver, user, password); - await findIssues(driver, from, to, tmpDir.name, outDir); - await sleep(1000); - } finally { - await fs.rm(tmpDir.name, { recursive: true }, (e) => { - if (e) { - console.error(`failed to remove tmp directory: ${e}`); - } - }); - await driver.quit(); - } -} - -/** - * Parse arguments and start the downloading of the issues. 
- */ -(function init() { - const now = new Date(); - const nowString = now.isoDate(); - - const argv = yargs(hideBin(process.argv)) - .usage( - "Usage: $0 -f [date] -t [date] -o [path] -u [usernane] -p [password]", - ) - .demandOption(["u", "p"]) - .help("h") - .describe("f", "Earliest issue to download.") - .describe("t", "Latest issue to download.") - .describe("o", "Download directory.") - .describe("u", "Username for the nzz archive.") - .describe("p", "Password for the user.") - .alias("h", "help") - .alias("f", "from") - .alias("t", "to") - .alias("o", "out") - .alias("u", "user") - .alias("p", "password") - .default("f", nowString) - .default("t", nowString) - .default("o", "./nzz") - .epilog("Copyright (c) Sebastian Hugentobler 2024") - .example( - "$0 -u 'myuser@example.com' -p 'mypassword' -f 1780-01-01 -t 1780-02-30", - 'Download all existing issues from 01-01-1780 until 30-02-1780 to the default directory "./nzz"', - ).argv; - - const from = new Date(argv.from); - const to = new Date(argv.to); - - if (from > to) { - console.error('"from" date must be before "to" date'); - process.exit(1); - } - - run(from, to, argv.user, argv.password, argv.out); -})(); diff --git a/package-lock.json b/package-lock.json deleted file mode 100644 index 9627b2c..0000000 --- a/package-lock.json +++ /dev/null @@ -1,528 +0,0 @@ -{ - "name": "nzz-downloader", - "version": "0.1.3", - "lockfileVersion": 2, - "requires": true, - "packages": { - "": { - "name": "nzz-downloader", - "version": "0.1.3", - "license": "MPL-2.0", - "dependencies": { - "selenium-webdriver": "4.22.0", - "tmp": "0.2.3", - "yargs": "17.7.2" - } - }, - "node_modules/ansi-regex": { - "version": "5.0.1", - "resolved": "https://registry.npmjs.org/ansi-regex/-/ansi-regex-5.0.1.tgz", - "integrity": "sha512-quJQXlTSUGL2LH9SUXo8VwsY4soanhgo6LNSm84E1LBcE8s3O0wpdiRzyR9z/ZZJMlMWv37qOOb9pdJlMUEKFQ==", - "engines": { - "node": ">=8" - } - }, - "node_modules/ansi-styles": { - "version": "4.3.0", - "resolved": 
"https://registry.npmjs.org/ansi-styles/-/ansi-styles-4.3.0.tgz", - "integrity": "sha512-zbB9rCJAT1rbjiVDb2hqKFHNYLxgtk8NURxZ3IZwD3F6NtxbXZQCnnSi1Lkx+IDohdPlFp222wVALIheZJQSEg==", - "dependencies": { - "color-convert": "^2.0.1" - }, - "engines": { - "node": ">=8" - }, - "funding": { - "url": "https://github.com/chalk/ansi-styles?sponsor=1" - } - }, - "node_modules/cliui": { - "version": "8.0.1", - "resolved": "https://registry.npmjs.org/cliui/-/cliui-8.0.1.tgz", - "integrity": "sha512-BSeNnyus75C4//NQ9gQt1/csTXyo/8Sb+afLAkzAptFuMsod9HFokGNudZpi/oQV73hnVK+sR+5PVRMd+Dr7YQ==", - "dependencies": { - "string-width": "^4.2.0", - "strip-ansi": "^6.0.1", - "wrap-ansi": "^7.0.0" - }, - "engines": { - "node": ">=12" - } - }, - "node_modules/color-convert": { - "version": "2.0.1", - "resolved": "https://registry.npmjs.org/color-convert/-/color-convert-2.0.1.tgz", - "integrity": "sha512-RRECPsj7iu/xb5oKYcsFHSppFNnsj/52OVTRKb4zP5onXwVF3zVmmToNcOfGC+CRDpfK/U584fMg38ZHCaElKQ==", - "dependencies": { - "color-name": "~1.1.4" - }, - "engines": { - "node": ">=7.0.0" - } - }, - "node_modules/color-name": { - "version": "1.1.4", - "resolved": "https://registry.npmjs.org/color-name/-/color-name-1.1.4.tgz", - "integrity": "sha512-dOy+3AuW3a2wNbZHIuMZpTcgjGuLU/uBL/ubcZF9OXbDo8ff4O8yVp5Bf0efS8uEoYo5q4Fx7dY9OgQGXgAsQA==" - }, - "node_modules/core-util-is": { - "version": "1.0.3", - "resolved": "https://registry.npmjs.org/core-util-is/-/core-util-is-1.0.3.tgz", - "integrity": "sha512-ZQBvi1DcpJ4GDqanjucZ2Hj3wEO5pZDS89BWbkcrvdxksJorwUDDZamX9ldFkp9aw2lmBDLgkObEA4DWNJ9FYQ==" - }, - "node_modules/emoji-regex": { - "version": "8.0.0", - "resolved": "https://registry.npmjs.org/emoji-regex/-/emoji-regex-8.0.0.tgz", - "integrity": "sha512-MSjYzcWNOA0ewAHpz0MxpYFvwg6yjy1NG3xteoqz644VCo/RPgnr1/GGt+ic3iJTzQ8Eu3TdM14SawnVUmGE6A==" - }, - "node_modules/escalade": { - "version": "3.1.1", - "resolved": "https://registry.npmjs.org/escalade/-/escalade-3.1.1.tgz", - "integrity": 
"sha512-k0er2gUkLf8O0zKJiAhmkTnJlTvINGv7ygDNPbeIsX/TJjGJZHuh9B2UxbsaEkmlEo9MfhrSzmhIlhRlI2GXnw==", - "engines": { - "node": ">=6" - } - }, - "node_modules/get-caller-file": { - "version": "2.0.5", - "resolved": "https://registry.npmjs.org/get-caller-file/-/get-caller-file-2.0.5.tgz", - "integrity": "sha512-DyFP3BM/3YHTQOCUL/w0OZHR0lpKeGrxotcHWcqNEdnltqFwXVfhEBQ94eIo34AfQpo0rGki4cyIiftY06h2Fg==", - "engines": { - "node": "6.* || 8.* || >= 10.*" - } - }, - "node_modules/immediate": { - "version": "3.0.6", - "resolved": "https://registry.npmjs.org/immediate/-/immediate-3.0.6.tgz", - "integrity": "sha512-XXOFtyqDjNDAQxVfYxuF7g9Il/IbWmmlQg2MYKOH8ExIT1qg6xc4zyS3HaEEATgs1btfzxq15ciUiY7gjSXRGQ==" - }, - "node_modules/inherits": { - "version": "2.0.4", - "resolved": "https://registry.npmjs.org/inherits/-/inherits-2.0.4.tgz", - "integrity": "sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ==" - }, - "node_modules/is-fullwidth-code-point": { - "version": "3.0.0", - "resolved": "https://registry.npmjs.org/is-fullwidth-code-point/-/is-fullwidth-code-point-3.0.0.tgz", - "integrity": "sha512-zymm5+u+sCsSWyD9qNaejV3DFvhCKclKdizYaJUuHA83RLjb7nSuGnddCHGv0hk+KY7BMAlsWeK4Ueg6EV6XQg==", - "engines": { - "node": ">=8" - } - }, - "node_modules/isarray": { - "version": "1.0.0", - "resolved": "https://registry.npmjs.org/isarray/-/isarray-1.0.0.tgz", - "integrity": "sha512-VLghIWNM6ELQzo7zwmcg0NmTVyWKYjvIeM83yjp0wRDTmUnrM678fQbcKBo6n2CJEF0szoG//ytg+TKla89ALQ==" - }, - "node_modules/jszip": { - "version": "3.10.1", - "resolved": "https://registry.npmjs.org/jszip/-/jszip-3.10.1.tgz", - "integrity": "sha512-xXDvecyTpGLrqFrvkrUSoxxfJI5AH7U8zxxtVclpsUtMCq4JQ290LY8AW5c7Ggnr/Y/oK+bQMbqK2qmtk3pN4g==", - "dependencies": { - "lie": "~3.3.0", - "pako": "~1.0.2", - "readable-stream": "~2.3.6", - "setimmediate": "^1.0.5" - } - }, - "node_modules/lie": { - "version": "3.3.0", - "resolved": "https://registry.npmjs.org/lie/-/lie-3.3.0.tgz", - "integrity": 
"sha512-UaiMJzeWRlEujzAuw5LokY1L5ecNQYZKfmyZ9L7wDHb/p5etKaxXhohBcrw0EYby+G/NA52vRSN4N39dxHAIwQ==", - "dependencies": { - "immediate": "~3.0.5" - } - }, - "node_modules/pako": { - "version": "1.0.11", - "resolved": "https://registry.npmjs.org/pako/-/pako-1.0.11.tgz", - "integrity": "sha512-4hLB8Py4zZce5s4yd9XzopqwVv/yGNhV1Bl8NTmCq1763HeK2+EwVTv+leGeL13Dnh2wfbqowVPXCIO0z4taYw==" - }, - "node_modules/process-nextick-args": { - "version": "2.0.1", - "resolved": "https://registry.npmjs.org/process-nextick-args/-/process-nextick-args-2.0.1.tgz", - "integrity": "sha512-3ouUOpQhtgrbOa17J7+uxOTpITYWaGP7/AhoR3+A+/1e9skrzelGi/dXzEYyvbxubEF6Wn2ypscTKiKJFFn1ag==" - }, - "node_modules/readable-stream": { - "version": "2.3.8", - "resolved": "https://registry.npmjs.org/readable-stream/-/readable-stream-2.3.8.tgz", - "integrity": "sha512-8p0AUk4XODgIewSi0l8Epjs+EVnWiK7NoDIEGU0HhE7+ZyY8D1IMY7odu5lRrFXGg71L15KG8QrPmum45RTtdA==", - "dependencies": { - "core-util-is": "~1.0.0", - "inherits": "~2.0.3", - "isarray": "~1.0.0", - "process-nextick-args": "~2.0.0", - "safe-buffer": "~5.1.1", - "string_decoder": "~1.1.1", - "util-deprecate": "~1.0.1" - } - }, - "node_modules/require-directory": { - "version": "2.1.1", - "resolved": "https://registry.npmjs.org/require-directory/-/require-directory-2.1.1.tgz", - "integrity": "sha1-jGStX9MNqxyXbiNE/+f3kqam30I=", - "engines": { - "node": ">=0.10.0" - } - }, - "node_modules/safe-buffer": { - "version": "5.1.2", - "resolved": "https://registry.npmjs.org/safe-buffer/-/safe-buffer-5.1.2.tgz", - "integrity": "sha512-Gd2UZBJDkXlY7GbJxfsE8/nvKkUEU1G38c1siN6QP6a9PT9MmHB8GnpscSmMJSoF8LOIrt8ud/wPtojys4G6+g==" - }, - "node_modules/selenium-webdriver": { - "version": "4.22.0", - "resolved": "https://registry.npmjs.org/selenium-webdriver/-/selenium-webdriver-4.22.0.tgz", - "integrity": "sha512-GNbrkCHmy249ai885wgXqTfqL2lZnclUH/P8pwTDIqzyFxU3YhDiN7p/c9tMFA4NhgRdEBO2QCG+CWmG7xr/Mw==", - "dependencies": { - "jszip": "^3.10.1", - "tmp": "^0.2.3", - "ws": 
">=8.16.0" - }, - "engines": { - "node": ">= 14.21.0" - } - }, - "node_modules/setimmediate": { - "version": "1.0.5", - "resolved": "https://registry.npmjs.org/setimmediate/-/setimmediate-1.0.5.tgz", - "integrity": "sha512-MATJdZp8sLqDl/68LfQmbP8zKPLQNV6BIZoIgrscFDQ+RsvK/BxeDQOgyxKKoh0y/8h3BqVFnCqQ/gd+reiIXA==" - }, - "node_modules/string_decoder": { - "version": "1.1.1", - "resolved": "https://registry.npmjs.org/string_decoder/-/string_decoder-1.1.1.tgz", - "integrity": "sha512-n/ShnvDi6FHbbVfviro+WojiFzv+s8MPMHBczVePfUpDJLwoLT0ht1l4YwBCbi8pJAveEEdnkHyPyTP/mzRfwg==", - "dependencies": { - "safe-buffer": "~5.1.0" - } - }, - "node_modules/string-width": { - "version": "4.2.3", - "resolved": "https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz", - "integrity": "sha512-wKyQRQpjJ0sIp62ErSZdGsjMJWsap5oRNihHhu6G7JVO/9jIB6UyevL+tXuOqrng8j/cxKTWyWUwvSTriiZz/g==", - "dependencies": { - "emoji-regex": "^8.0.0", - "is-fullwidth-code-point": "^3.0.0", - "strip-ansi": "^6.0.1" - }, - "engines": { - "node": ">=8" - } - }, - "node_modules/strip-ansi": { - "version": "6.0.1", - "resolved": "https://registry.npmjs.org/strip-ansi/-/strip-ansi-6.0.1.tgz", - "integrity": "sha512-Y38VPSHcqkFrCpFnQ9vuSXmquuv5oXOKpGeT6aGrr3o3Gc9AlVa6JBfUSOCnbxGGZF+/0ooI7KrPuUSztUdU5A==", - "dependencies": { - "ansi-regex": "^5.0.1" - }, - "engines": { - "node": ">=8" - } - }, - "node_modules/tmp": { - "version": "0.2.3", - "resolved": "https://registry.npmjs.org/tmp/-/tmp-0.2.3.tgz", - "integrity": "sha512-nZD7m9iCPC5g0pYmcaxogYKggSfLsdxl8of3Q/oIbqCqLLIO9IAF0GWjX1z9NZRHPiXv8Wex4yDCaZsgEw0Y8w==", - "engines": { - "node": ">=14.14" - } - }, - "node_modules/util-deprecate": { - "version": "1.0.2", - "resolved": "https://registry.npmjs.org/util-deprecate/-/util-deprecate-1.0.2.tgz", - "integrity": "sha512-EPD5q1uXyFxJpCrLnCc1nHnq3gOa6DZBocAIiI2TaSCA7VCJ1UJDMagCzIkXNsUYfD1daK//LTEQ8xiIbrHtcw==" - }, - "node_modules/wrap-ansi": { - "version": "7.0.0", - "resolved": 
"https://registry.npmjs.org/wrap-ansi/-/wrap-ansi-7.0.0.tgz", - "integrity": "sha512-YVGIj2kamLSTxw6NsZjoBxfSwsn0ycdesmc4p+Q21c5zPuZ1pl+NfxVdxPtdHvmNVOQ6XSYG4AUtyt/Fi7D16Q==", - "dependencies": { - "ansi-styles": "^4.0.0", - "string-width": "^4.1.0", - "strip-ansi": "^6.0.0" - }, - "engines": { - "node": ">=10" - }, - "funding": { - "url": "https://github.com/chalk/wrap-ansi?sponsor=1" - } - }, - "node_modules/ws": { - "version": "8.17.1", - "resolved": "https://registry.npmjs.org/ws/-/ws-8.17.1.tgz", - "integrity": "sha512-6XQFvXTkbfUOZOKKILFG1PDK2NDQs4azKQl26T0YS5CxqWLgXajbPZ+h4gZekJyRqFU8pvnbAbbs/3TgRPy+GQ==", - "engines": { - "node": ">=10.0.0" - }, - "peerDependencies": { - "bufferutil": "^4.0.1", - "utf-8-validate": ">=5.0.2" - }, - "peerDependenciesMeta": { - "bufferutil": { - "optional": true - }, - "utf-8-validate": { - "optional": true - } - } - }, - "node_modules/y18n": { - "version": "5.0.5", - "resolved": "https://registry.npmjs.org/y18n/-/y18n-5.0.5.tgz", - "integrity": "sha512-hsRUr4FFrvhhRH12wOdfs38Gy7k2FFzB9qgN9v3aLykRq0dRcdcpz5C9FxdS2NuhOrI/628b/KSTJ3rwHysYSg==", - "engines": { - "node": ">=10" - } - }, - "node_modules/yargs": { - "version": "17.7.2", - "resolved": "https://registry.npmjs.org/yargs/-/yargs-17.7.2.tgz", - "integrity": "sha512-7dSzzRQ++CKnNI/krKnYRV7JKKPUXMEh61soaHKg9mrWEhzFWhFnxPxGl+69cD1Ou63C13NUPCnmIcrvqCuM6w==", - "dependencies": { - "cliui": "^8.0.1", - "escalade": "^3.1.1", - "get-caller-file": "^2.0.5", - "require-directory": "^2.1.1", - "string-width": "^4.2.3", - "y18n": "^5.0.5", - "yargs-parser": "^21.1.1" - }, - "engines": { - "node": ">=12" - } - }, - "node_modules/yargs-parser": { - "version": "21.1.1", - "resolved": "https://registry.npmjs.org/yargs-parser/-/yargs-parser-21.1.1.tgz", - "integrity": "sha512-tVpsJW7DdjecAiFpbIB1e3qxIQsE6NoPc5/eTdrbbIC4h0LVsWhnoa3g+m2HclBIujHzsxZ4VJVA+GUuc2/LBw==", - "engines": { - "node": ">=12" - } - } - }, - "dependencies": { - "ansi-regex": { - "version": "5.0.1", - "resolved": 
"https://registry.npmjs.org/ansi-regex/-/ansi-regex-5.0.1.tgz", - "integrity": "sha512-quJQXlTSUGL2LH9SUXo8VwsY4soanhgo6LNSm84E1LBcE8s3O0wpdiRzyR9z/ZZJMlMWv37qOOb9pdJlMUEKFQ==" - }, - "ansi-styles": { - "version": "4.3.0", - "resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-4.3.0.tgz", - "integrity": "sha512-zbB9rCJAT1rbjiVDb2hqKFHNYLxgtk8NURxZ3IZwD3F6NtxbXZQCnnSi1Lkx+IDohdPlFp222wVALIheZJQSEg==", - "requires": { - "color-convert": "^2.0.1" - } - }, - "cliui": { - "version": "8.0.1", - "resolved": "https://registry.npmjs.org/cliui/-/cliui-8.0.1.tgz", - "integrity": "sha512-BSeNnyus75C4//NQ9gQt1/csTXyo/8Sb+afLAkzAptFuMsod9HFokGNudZpi/oQV73hnVK+sR+5PVRMd+Dr7YQ==", - "requires": { - "string-width": "^4.2.0", - "strip-ansi": "^6.0.1", - "wrap-ansi": "^7.0.0" - } - }, - "color-convert": { - "version": "2.0.1", - "resolved": "https://registry.npmjs.org/color-convert/-/color-convert-2.0.1.tgz", - "integrity": "sha512-RRECPsj7iu/xb5oKYcsFHSppFNnsj/52OVTRKb4zP5onXwVF3zVmmToNcOfGC+CRDpfK/U584fMg38ZHCaElKQ==", - "requires": { - "color-name": "~1.1.4" - } - }, - "color-name": { - "version": "1.1.4", - "resolved": "https://registry.npmjs.org/color-name/-/color-name-1.1.4.tgz", - "integrity": "sha512-dOy+3AuW3a2wNbZHIuMZpTcgjGuLU/uBL/ubcZF9OXbDo8ff4O8yVp5Bf0efS8uEoYo5q4Fx7dY9OgQGXgAsQA==" - }, - "core-util-is": { - "version": "1.0.3", - "resolved": "https://registry.npmjs.org/core-util-is/-/core-util-is-1.0.3.tgz", - "integrity": "sha512-ZQBvi1DcpJ4GDqanjucZ2Hj3wEO5pZDS89BWbkcrvdxksJorwUDDZamX9ldFkp9aw2lmBDLgkObEA4DWNJ9FYQ==" - }, - "emoji-regex": { - "version": "8.0.0", - "resolved": "https://registry.npmjs.org/emoji-regex/-/emoji-regex-8.0.0.tgz", - "integrity": "sha512-MSjYzcWNOA0ewAHpz0MxpYFvwg6yjy1NG3xteoqz644VCo/RPgnr1/GGt+ic3iJTzQ8Eu3TdM14SawnVUmGE6A==" - }, - "escalade": { - "version": "3.1.1", - "resolved": "https://registry.npmjs.org/escalade/-/escalade-3.1.1.tgz", - "integrity": 
"sha512-k0er2gUkLf8O0zKJiAhmkTnJlTvINGv7ygDNPbeIsX/TJjGJZHuh9B2UxbsaEkmlEo9MfhrSzmhIlhRlI2GXnw==" - }, - "get-caller-file": { - "version": "2.0.5", - "resolved": "https://registry.npmjs.org/get-caller-file/-/get-caller-file-2.0.5.tgz", - "integrity": "sha512-DyFP3BM/3YHTQOCUL/w0OZHR0lpKeGrxotcHWcqNEdnltqFwXVfhEBQ94eIo34AfQpo0rGki4cyIiftY06h2Fg==" - }, - "immediate": { - "version": "3.0.6", - "resolved": "https://registry.npmjs.org/immediate/-/immediate-3.0.6.tgz", - "integrity": "sha512-XXOFtyqDjNDAQxVfYxuF7g9Il/IbWmmlQg2MYKOH8ExIT1qg6xc4zyS3HaEEATgs1btfzxq15ciUiY7gjSXRGQ==" - }, - "inherits": { - "version": "2.0.4", - "resolved": "https://registry.npmjs.org/inherits/-/inherits-2.0.4.tgz", - "integrity": "sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ==" - }, - "is-fullwidth-code-point": { - "version": "3.0.0", - "resolved": "https://registry.npmjs.org/is-fullwidth-code-point/-/is-fullwidth-code-point-3.0.0.tgz", - "integrity": "sha512-zymm5+u+sCsSWyD9qNaejV3DFvhCKclKdizYaJUuHA83RLjb7nSuGnddCHGv0hk+KY7BMAlsWeK4Ueg6EV6XQg==" - }, - "isarray": { - "version": "1.0.0", - "resolved": "https://registry.npmjs.org/isarray/-/isarray-1.0.0.tgz", - "integrity": "sha512-VLghIWNM6ELQzo7zwmcg0NmTVyWKYjvIeM83yjp0wRDTmUnrM678fQbcKBo6n2CJEF0szoG//ytg+TKla89ALQ==" - }, - "jszip": { - "version": "3.10.1", - "resolved": "https://registry.npmjs.org/jszip/-/jszip-3.10.1.tgz", - "integrity": "sha512-xXDvecyTpGLrqFrvkrUSoxxfJI5AH7U8zxxtVclpsUtMCq4JQ290LY8AW5c7Ggnr/Y/oK+bQMbqK2qmtk3pN4g==", - "requires": { - "lie": "~3.3.0", - "pako": "~1.0.2", - "readable-stream": "~2.3.6", - "setimmediate": "^1.0.5" - } - }, - "lie": { - "version": "3.3.0", - "resolved": "https://registry.npmjs.org/lie/-/lie-3.3.0.tgz", - "integrity": "sha512-UaiMJzeWRlEujzAuw5LokY1L5ecNQYZKfmyZ9L7wDHb/p5etKaxXhohBcrw0EYby+G/NA52vRSN4N39dxHAIwQ==", - "requires": { - "immediate": "~3.0.5" - } - }, - "pako": { - "version": "1.0.11", - "resolved": 
"https://registry.npmjs.org/pako/-/pako-1.0.11.tgz", - "integrity": "sha512-4hLB8Py4zZce5s4yd9XzopqwVv/yGNhV1Bl8NTmCq1763HeK2+EwVTv+leGeL13Dnh2wfbqowVPXCIO0z4taYw==" - }, - "process-nextick-args": { - "version": "2.0.1", - "resolved": "https://registry.npmjs.org/process-nextick-args/-/process-nextick-args-2.0.1.tgz", - "integrity": "sha512-3ouUOpQhtgrbOa17J7+uxOTpITYWaGP7/AhoR3+A+/1e9skrzelGi/dXzEYyvbxubEF6Wn2ypscTKiKJFFn1ag==" - }, - "readable-stream": { - "version": "2.3.8", - "resolved": "https://registry.npmjs.org/readable-stream/-/readable-stream-2.3.8.tgz", - "integrity": "sha512-8p0AUk4XODgIewSi0l8Epjs+EVnWiK7NoDIEGU0HhE7+ZyY8D1IMY7odu5lRrFXGg71L15KG8QrPmum45RTtdA==", - "requires": { - "core-util-is": "~1.0.0", - "inherits": "~2.0.3", - "isarray": "~1.0.0", - "process-nextick-args": "~2.0.0", - "safe-buffer": "~5.1.1", - "string_decoder": "~1.1.1", - "util-deprecate": "~1.0.1" - } - }, - "require-directory": { - "version": "2.1.1", - "resolved": "https://registry.npmjs.org/require-directory/-/require-directory-2.1.1.tgz", - "integrity": "sha1-jGStX9MNqxyXbiNE/+f3kqam30I=" - }, - "safe-buffer": { - "version": "5.1.2", - "resolved": "https://registry.npmjs.org/safe-buffer/-/safe-buffer-5.1.2.tgz", - "integrity": "sha512-Gd2UZBJDkXlY7GbJxfsE8/nvKkUEU1G38c1siN6QP6a9PT9MmHB8GnpscSmMJSoF8LOIrt8ud/wPtojys4G6+g==" - }, - "selenium-webdriver": { - "version": "4.22.0", - "resolved": "https://registry.npmjs.org/selenium-webdriver/-/selenium-webdriver-4.22.0.tgz", - "integrity": "sha512-GNbrkCHmy249ai885wgXqTfqL2lZnclUH/P8pwTDIqzyFxU3YhDiN7p/c9tMFA4NhgRdEBO2QCG+CWmG7xr/Mw==", - "requires": { - "jszip": "^3.10.1", - "tmp": "^0.2.3", - "ws": ">=8.16.0" - } - }, - "setimmediate": { - "version": "1.0.5", - "resolved": "https://registry.npmjs.org/setimmediate/-/setimmediate-1.0.5.tgz", - "integrity": "sha512-MATJdZp8sLqDl/68LfQmbP8zKPLQNV6BIZoIgrscFDQ+RsvK/BxeDQOgyxKKoh0y/8h3BqVFnCqQ/gd+reiIXA==" - }, - "string_decoder": { - "version": "1.1.1", - "resolved": 
"https://registry.npmjs.org/string_decoder/-/string_decoder-1.1.1.tgz", - "integrity": "sha512-n/ShnvDi6FHbbVfviro+WojiFzv+s8MPMHBczVePfUpDJLwoLT0ht1l4YwBCbi8pJAveEEdnkHyPyTP/mzRfwg==", - "requires": { - "safe-buffer": "~5.1.0" - } - }, - "string-width": { - "version": "4.2.3", - "resolved": "https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz", - "integrity": "sha512-wKyQRQpjJ0sIp62ErSZdGsjMJWsap5oRNihHhu6G7JVO/9jIB6UyevL+tXuOqrng8j/cxKTWyWUwvSTriiZz/g==", - "requires": { - "emoji-regex": "^8.0.0", - "is-fullwidth-code-point": "^3.0.0", - "strip-ansi": "^6.0.1" - } - }, - "strip-ansi": { - "version": "6.0.1", - "resolved": "https://registry.npmjs.org/strip-ansi/-/strip-ansi-6.0.1.tgz", - "integrity": "sha512-Y38VPSHcqkFrCpFnQ9vuSXmquuv5oXOKpGeT6aGrr3o3Gc9AlVa6JBfUSOCnbxGGZF+/0ooI7KrPuUSztUdU5A==", - "requires": { - "ansi-regex": "^5.0.1" - } - }, - "tmp": { - "version": "0.2.3", - "resolved": "https://registry.npmjs.org/tmp/-/tmp-0.2.3.tgz", - "integrity": "sha512-nZD7m9iCPC5g0pYmcaxogYKggSfLsdxl8of3Q/oIbqCqLLIO9IAF0GWjX1z9NZRHPiXv8Wex4yDCaZsgEw0Y8w==" - }, - "util-deprecate": { - "version": "1.0.2", - "resolved": "https://registry.npmjs.org/util-deprecate/-/util-deprecate-1.0.2.tgz", - "integrity": "sha512-EPD5q1uXyFxJpCrLnCc1nHnq3gOa6DZBocAIiI2TaSCA7VCJ1UJDMagCzIkXNsUYfD1daK//LTEQ8xiIbrHtcw==" - }, - "wrap-ansi": { - "version": "7.0.0", - "resolved": "https://registry.npmjs.org/wrap-ansi/-/wrap-ansi-7.0.0.tgz", - "integrity": "sha512-YVGIj2kamLSTxw6NsZjoBxfSwsn0ycdesmc4p+Q21c5zPuZ1pl+NfxVdxPtdHvmNVOQ6XSYG4AUtyt/Fi7D16Q==", - "requires": { - "ansi-styles": "^4.0.0", - "string-width": "^4.1.0", - "strip-ansi": "^6.0.0" - } - }, - "ws": { - "version": "8.17.1", - "resolved": "https://registry.npmjs.org/ws/-/ws-8.17.1.tgz", - "integrity": "sha512-6XQFvXTkbfUOZOKKILFG1PDK2NDQs4azKQl26T0YS5CxqWLgXajbPZ+h4gZekJyRqFU8pvnbAbbs/3TgRPy+GQ==", - "requires": {} - }, - "y18n": { - "version": "5.0.5", - "resolved": 
"https://registry.npmjs.org/y18n/-/y18n-5.0.5.tgz", - "integrity": "sha512-hsRUr4FFrvhhRH12wOdfs38Gy7k2FFzB9qgN9v3aLykRq0dRcdcpz5C9FxdS2NuhOrI/628b/KSTJ3rwHysYSg==" - }, - "yargs": { - "version": "17.7.2", - "resolved": "https://registry.npmjs.org/yargs/-/yargs-17.7.2.tgz", - "integrity": "sha512-7dSzzRQ++CKnNI/krKnYRV7JKKPUXMEh61soaHKg9mrWEhzFWhFnxPxGl+69cD1Ou63C13NUPCnmIcrvqCuM6w==", - "requires": { - "cliui": "^8.0.1", - "escalade": "^3.1.1", - "get-caller-file": "^2.0.5", - "require-directory": "^2.1.1", - "string-width": "^4.2.3", - "y18n": "^5.0.5", - "yargs-parser": "^21.1.1" - } - }, - "yargs-parser": { - "version": "21.1.1", - "resolved": "https://registry.npmjs.org/yargs-parser/-/yargs-parser-21.1.1.tgz", - "integrity": "sha512-tVpsJW7DdjecAiFpbIB1e3qxIQsE6NoPc5/eTdrbbIC4h0LVsWhnoa3g+m2HclBIujHzsxZ4VJVA+GUuc2/LBw==" - } - } -} diff --git a/package.json b/package.json deleted file mode 100644 index 0c6de55..0000000 --- a/package.json +++ /dev/null @@ -1,13 +0,0 @@ -{ - "name": "nzz-downloader", - "version": "0.1.3", - "description": "", - "main": "nzz.js", - "author": "Sebastian Hugentobler", - "license": "MPL-2.0", - "dependencies": { - "selenium-webdriver": "4.22.0", - "tmp": "0.2.3", - "yargs": "17.7.2" - } -} diff --git a/screenshot.jpg b/screenshot.jpg index 2b74f5c..18c91e8 100644 Binary files a/screenshot.jpg and b/screenshot.jpg differ