From d8568f6991eb4cfec74774fa6bd8791714d58937 Mon Sep 17 00:00:00 2001
From: Hongbo Wu
Date: Wed, 6 Jul 2022 22:29:17 +0800
Subject: [PATCH 1/8] Add test

---
 .../expected-metadata.json                    |   12 +
 .../www.computerenhance.com/expected.html     |  160 +++
 .../www.computerenhance.com/source.html       | 1235 +++++++++++++++++
 .../www.computerenhance.com/url.txt           |    1 +
 4 files changed, 1408 insertions(+)
 create mode 100644 packages/readabilityjs/test/test-pages/www.computerenhance.com/expected-metadata.json
 create mode 100644 packages/readabilityjs/test/test-pages/www.computerenhance.com/expected.html
 create mode 100644 packages/readabilityjs/test/test-pages/www.computerenhance.com/source.html
 create mode 100644 packages/readabilityjs/test/test-pages/www.computerenhance.com/url.txt

diff --git a/packages/readabilityjs/test/test-pages/www.computerenhance.com/expected-metadata.json b/packages/readabilityjs/test/test-pages/www.computerenhance.com/expected-metadata.json
new file mode 100644
index 000000000..93191051a
--- /dev/null
+++ b/packages/readabilityjs/test/test-pages/www.computerenhance.com/expected-metadata.json
@@ -0,0 +1,12 @@
+{
+  "title": "No really, why can't we have raw UDP in JavaScript?",
+  "byline": "Casey Muratori",
+  "dir": null,
+  "excerpt": "In my opinion, the pat answers about security are incomplete. I'd like to see a detailed writeup of specifically why a raw UDP API cannot be made as secure as current HTTPS.",
+  "siteName": "Computer, Enhance!",
+  "siteIcon": "https://substackcdn.com/icons/substack/favicon.ico",
+  "previewImage": "https://substackcdn.com/image/fetch/w_1200,h_600,c_limit,f_jpg,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F43e258db-6164-4e47-835f-d11f10847d9d_5616x3744.jpeg",
+  "publishedDate": "2022-07-05T02:58:16.000Z",
+  "language": "English",
+  "readerable": true
+}
diff --git a/packages/readabilityjs/test/test-pages/www.computerenhance.com/expected.html b/packages/readabilityjs/test/test-pages/www.computerenhance.com/expected.html
new file mode 100644
index 000000000..c5b043949
--- /dev/null
+++ b/packages/readabilityjs/test/test-pages/www.computerenhance.com/expected.html
@@ -0,0 +1,160 @@
+
+
+
+

In my opinion, the pat answers about security are incomplete. I'd like to see a detailed writeup of specifically why a raw UDP API cannot be made as secure as current HTTPS.

+
+
+
+
+ + + A sculpture of a cartoon character stuck in a pipe. + + +
+
+

+ By now I should know better than to ask on Twitter for a “rigorous analysis” of anything. As George W. Bush said, “Fool me once, shame on you… fool me can’t get fooled again.” +

+

I don’t want to be “fool me can’t get fooled again”, so I officially give up on technical tweets. Today’s the last day I will ever post anything technical on Twitter, I promise. Instead, you will be forced to endure yet another Substack, so I can post 3,000-word posts that no one will read.

+

Here we go:

+

The goal with raw UDP is very simple: better performance and security on the server side.

+

+ HTTPS is an unbaked sausage made by grinding pure text HTTP with TLS and encasing the result in an arbitrary selection of third-party animal intestine… err, I mean, “highly secure” certificates provided by arbitrarily selected certificate providers. Implementing HTTPS is a massive amount of code that is inexorably slow. It is not only theoretically difficult to secure completely, but is insecure in practice in popular implementations available to the public. +

+

+ Oh, and the certificate authorities are also insecure, by the way - but that’s another story (and another, and another, and ) +

+

It also relied (up until recently) on TCP, which, unless you plan to write a completely custom network stack for every type of server/NIC you ever use, requires the underlying kernel to understand and track network connections. This means that you inherit substantial overhead, and perhaps vulnerabilities as well, from the TCP/IP substrate before you even begin to write your server code.

+

If you were a large company with significant academic and engineering resources, you might instead want to design your own private secure protocol that:

+
    +
  1. +

    Uses encryption you control, so it cannot be bypassed by hacking the certificate authority,

    +
  2. +
  3. +

    Uses UDP to avoid having OS connection state on the server side, and

    +
  4. +
  5. +

    Uses a well-designed, known packet structure to improve throughput and reduce security vulnerabilities from HTTP/TLS parsing.

    +
  6. +
+

+ The first thing on that list is half-possible now. Although there’s nothing you can (ever[1]) do to avoid man-in-the-middle attacks the very first time someone interacts with your server, web APIs have long made it possible to store data on the client for later use. One use for that data would be storing your own set of public keys. +

+

+ So even using nothing newer than XHR and cookies, you could theoretically add your own layer of encryption to anything you send to the server. This would ensure that any subsequent hack of the certificate authority could not inspect or modify your packets. It’d be much less efficient than rolling your own top-to-bottom, because now you pay the entire cost for your encryption and TLS. But you can do it. +

+

+ It’s slow, but possible. Call it half-possible, like I did above. +

+

+ The second thing on the list is sort-of possible now as well. If you can somehow manage to use HTTP/3 exclusively as your target platform, you will still be talking HTTP but you’ll be doing it over UDP instead of TCP, and can manage connection state however you wish without OS intervention. +

+

+ It is probably unrealistic to assume that you could do this in practice today. If you didn’t care about broad compatibility, you probably wouldn’t be deploying on the web anyway, so presumably the current adoption of HTTP/3 is insufficient. But at least it exists, and perhaps if adoption continues to grow, eventually it will be possible to require HTTP/3 without losing a significant number of users. For now, it’s only something you can do on the side - you still have to have a traditional HTTPS fallback. +

+

+ Which brings us to the third item on the list, and the real sticking point. As far as I’m aware, no current or planned future Web API ever lets you do number three. There are many new web “technologies” swarming around the custom packet idea (WebRTC, WebSockets, WebTransport), but to the best of my knowledge, all of them require an HTTPS connection to be made first, so your “custom packet” servers still need to implement all of HTTPS anyway. +

+

I can imagine someone raising the following objection at this point: “If you don’t support HTTPS on the server, how do you serve the WASM/JavaScript/whatever with the custom packet logic in the first place?”

+

That’s a reasonable question.

+

The answer is, the two most logical deployment scenarios I can think of both involve a separate server (or process) for the initial HTTPS transaction.

+

The first is what I imagine would be the most common: you upload to a CDN a traditional web package containing the PWA-style web worker necessary to do your own custom packet logic. The CDN serves this (static) content everywhere for you. They obviously implement HTTPS already, because that’s what they do for a living, and they’re not your servers anyway so you don’t care.

+

+ The second would be less common, but plausible: you run your own CDN-equivalent, because you’re just that hard core. But you expect that your HTTPS code is more vulnerable than your custom code, since HTTPS is vastly more complicated and has ridiculous things in it like arbitrary text parsing, which no one in their right mind would ever put into a “secure” protocol. So you cabin your HTTPS server instances into their own restricted processes or own machines entirely. This prevents exploits of the HTTPS code from affecting anything other than newly connecting users - existing users (who are only talking to your custom servers) remain unharmed. +

+

In neither scenario do you actually include HTTPS code in any of the processes running your actual secure server.

+

So that’s the hopefully-at-least-somewhat-convincing explanation of why someone might want raw UDP. Now the question is, can raw UDP be provided by a browser in a way that is “secure”?

+

+ I’m putting a lot of these words in scare quotes because browsers aren’t secure for any serious definition of that word, and hopefully that is overwhelmingly obvious to everyone who has ever used one. But just to be clear about the landscape, there are two different ways browsers are not secure: +

+
    +
  1. +

    + The web as a platform consists of massive, overlapping, poorly-specified APIs that require millions of lines of code to fully implement. As a result, browsers inexorably have an effectively infinite number of security exploits waiting to be found. +

    +
  2. +
  3. +

    Browsers include the ability, sans exploit, to transmit information from the client computer to any number of remote servers. Without the ability to control this behavior, the user’s data could be misappropriated.

    +
  4. +
+

Clearly, for raw UDP, we only care about the second one of these. The first one happens in browsers all the time already and there’s no reason to suspect that raw UDP would somehow have more implementation code vulnerabilities on average than any other part of the sprawling browser substrate.

+

+ So the question is, assuming the browser has not been exploited, what is the security standard for web features, and can raw UDP be implemented under that standard or not? +

+

As a point of comparison, I will use the example of the current camera/microphone/location policy as it presently exists. That will be our “gold standard”, since if it were not considered “secure” by web implementers, presumably it would not have been knowingly shipped in web browsers everywhere for the past several years.

+

As everyone who uses a web browser knows, a web site at present is allowed to ask you for permission, temporarily or permanently (your choice), to access your camera, microphone, and location data. Once you say “yes” to any one of these things, that site can transmit that data anywhere in the world, and use it for any purpose, trivially.

+

Allow me to provide a worked example.

+

+ Suppose I partner with Jeffrey Toobin to make a cybersex conduit site for people who, like him, see the value in quickly switching tabs away from your work meetings to get down to some real business. We launch cyberballsdeep.net, and it’s a big success. +

+

When a user visits our site, they see at most two security-related things:

+
    +
  1. +

    An allow/deny request for access to the microphone and camera, and

    +
  2. +
  3. +

    A lock icon indicating that the connection has been signed by a third party warranting that this connection is end-to-end encrypted from the user’s machine to some server somewhere with the secure keys for cyberballsdeep.net.

    +
  4. +
+

Assuming you click “allow” - which you have to in order to use the service - the servers at cyberballsdeep.net can now do anything they want with your (very sensitive) video data. They can, for example, record you while you are toobin’ and play it back at any time, anywhere, at their discretion. They could play it on a billboard in Times Square, they could send it to your spouse - anything goes.

+

So the “security standard” that you are getting, in practice, exactly mirrors the two things you saw:

+
    +
  1. +

    You know your sensitive data will not be captured unless you click “allow”, and

    +
  2. +
  3. +

    You know that nobody will be able to see your sensitive data unless either cyberballsdeep.net or the issuing certificate authority let them (either intentionally, or unintentionally if they’ve been hacked).

    +
  4. +
+

+ That’s it. You don’t know anything else. In practice, you basically have no security guarantees other than a warrant that your sensitive data will go to a particular named party first before it goes somewhere else. +

+

+ Hopefully we can all agree that this extremely low bar for security is the only hurdle one should have to clear in order to dismiss concerns of “security” as a reason not to implement a feature in a W3C spec. It’s not much, but it is something. +

+

+ OK, finally, with all that out of the way, this is what I actually wanted someone to point me to when I asked about this on Twitter. I just wanted to see that someone, somewhere, had worked out exactly why UDP could not be made to fit the same security model considered acceptable across other basic web features already deployed and considered “secure”. +

+

Since nobody sent me such a thing, I am still stuck with my own security modeling, with nothing to compare against. My model goes something like this:

+

Step one - the “allow/deny” step - is easy for raw UDP to provide. The browser is still sitting between the JavaScript/WASM layer and the OS sockets layer, so it can ensure that inbound and outbound packets are filtered any way the browser wishes.

+

This means that it would be trivial for a browser to only allow UDP packets to and from servers that the user has authorized, as it does with microphone, camera, and location data. Any site that wishes to access raw UDP simply provides a hostname to the browser, and the browser asks the user whether they wish to allow the page to communicate with that site.

+

Furthermore, since the browser already allows the page to send as much HTTPS data as it wants back to the originating site, one could optionally allow any site to send UDP packets back to its own (exact) originating IP without asking the user. This is not necessary for raw UDP to work, but I can’t think of any violation of “step one” that would happen as a result, so it could be considered.

+

+ Note that this is not true for something like camera/microphone/location data. Those are additional data sources to which the page gets access, so if anything, raw UDP permission is less dangerous in terms of user permission, since at no time does the page itself get additional access to the user’s data, regardless of whether they allow UDP communication. +

+

Which brings us to step two.

+

As far as I can tell, there’s actually nothing special about step two. The original web page was served by HTTPS, obviously, since that’s the only way the browser supports getting WASM/JavaScript downloaded in the first place. So the originating server and code are already exactly as “secure” as they would be in any other scenario.

+

The user had to affirmatively allow the destination name, so the page can only send UDP to a specifically approved endpoint.

+

+ So the only question is, can the user be sure that the data sent to that endpoint is encrypted such that only the endpoint or the certificate authority can decrypt it? +

+

+ I can’t know the hivemind of a W3C committee (thank the heavens). But if I had to guess, I would suspect that this is why they didn’t want to allow raw UDP (or raw TCP for that matter). In their mind, it probably seems less secure than HTTPS to allow a web page to implement its own secure UDP protocol. +

+

+ However, to my mind, this is based upon a flawed assumption. That assumption is that somehow web implementers can be trusted to deploy their encryption keys securely, but cannot be trusted to deploy their protocol securely. +

+

To be more specific, HTTPS can be intercepted trivially if the attacker A) has a machine on the route between the endpoints and B) has access to the server’s keys, or any certificate authority’s signing capability. (A) either happens or it doesn’t - there’s no way to control it - so (B) is really the entire question.

+

So the notion that allowing web pages to use UDP for transmission is less secure than HTTPS seems to me to be predicated on the notion that web developers can be trusted to do something complicated in one place (run a set of servers without leaking keys), but also cannot be trusted to do something complicated in another (download, for example, a JavaScript UDP encryption library and use it).

+

Stated alternately, the hard constraint on the client side that you can’t roll your packet code “for security reasons” is nowhere to be found on the server side. There is no requirement anywhere in W3C or anywhere else that says your web server has to be… well… anything at all, really. You can just go ahead and write your own code from top to bottom. You can even have a dedicated web page on your site that has the entire cryptographic key set for the server posted on it for people to cut-and-paste, so everyone can impersonate your server to anyone, anywhere, at any time. You can leave a thumb drive with your keys at the bar. You can generate your keys with a random seed of 0x000000000000000000. Anything goes.

+

+ Nobody seems to be panicked about this. Nobody has pushed the policy that the W3C should standardize on a specific web server deployment that you are forced to use, or a set of n of them made by Google/Mozilla/Apple, or what have you. It is just assumed that everyone is allowed to write their own server packet handling, but that no one is allowed to write their own client packet handling. +

+

So that’s what I would like explained. Internet, justify this!

+

I have seen people mention (but not support) a claim that raw UDP would cause “denial of service” problems because malicious web pages would send UDP packets to random servers in an attempt to overload them. This claim seems completely baseless to me, because there is no reason why you can’t employ the relevant XHR DDoS restrictions to UDP. If DDoS was the concern, just require that UDP packets be sent exclusively within the same domain as the originating code.

+

+ Furthermore, you could restrict the port ranges of raw web UDP to some assigned range. A new port range could be explicitly reserved just for raw web UDP if that makes people more comfortable, so it could literally be discarded at the gateway on any network that doesn’t want to support raw UDP for web, making it easier to deal with than UDP attacks from native code and viruses which can choose their ports at will. +

+

+ At that point, I fail to see how raw UDP from the browser could be significantly more dangerous than XHR, unless I am missing some particularly clever use of UDP. And again, that’s why I asked for writeups in my original tweet. I’m totally willing to believe I’m missing something, but I want to see a complete technical explanation about what it is. +

+

+ Now, none of this is the same as saying I can’t see how you would perform DDoS attacks with raw UDP. I certainly can. I just can’t see how you would perform them more easily than with XHR, which obviously is considered “secure”. +

+

As a simple example, suppose a commercial CDN distributes the payload of ddosfuntimes.com. On the main page, there’s an XHR to target.ddosfuntimes.com. Even though the CDN is a completely different set of IP addresses as target.ddosfuntimes.com, this is completely legal under XHR policy.

+

The owners of ddosfuntimes.com can go ahead and set the IP address in their DNS records to point target.ddosfuntimes.com at any server they want, and they will receive all the XHR traffic from every browser that visits the page. And to the best of my knowledge, there isn’t a damn thing the target can do about that.

+

So unless I’m missing something, XHR already allows you to target any website you wish with unwanted traffic from anyone who visits your site. So why the concern about UDP?

+
+
+
\ No newline at end of file
diff --git a/packages/readabilityjs/test/test-pages/www.computerenhance.com/source.html b/packages/readabilityjs/test/test-pages/www.computerenhance.com/source.html
new file mode 100644
index 000000000..a17839fed
--- /dev/null
+++ b/packages/readabilityjs/test/test-pages/www.computerenhance.com/source.html
@@ -0,0 +1,1235 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ No really, why can't we have raw UDP in JavaScript?
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ + +
+ +
+
+
+ +
+
+
+
+
+
+ 18 Comments +
+
+
+
+
+ +
+
+ +
+
+
+
+ + + +
+
+
+
+
+
+ + + + + +
+
+ +
+
+
+ Jul 5 +
+
+

+ My sense is that the game they are playing is blame management and plausible deniability. +

+

+ Without https, it becomes plausible for banks and other websites where security is paramount to blame the web standards for lacking a way to secure connections when they leak sensitive user data. +

+

+ So what the committees and browser vendors really want is a way for the browsers to easily know that all connections with this site are "secured". Now, if information leaks, the blame is solely on the site operators. +

+

+ Currently they can do this if the site uses https. +

+

+ If you introduce UDP to the mix, and tell them "I will encrypt the packets myself", then the browser has no way to tell whether the connection is secure or not, so they will default to telling the user that this website uses an insecure connection. +

+

+ This would not be so problematic, except I think they want to eventually deprecate non-secure connections. +

+

+ Efficiency and simplicity are the last things they care about. They will only care about them when someone demonstrates the existence of a clearly superior web application that cannot be implemented without a certain feature. I think this is why wasm got standardized. +

+
+
+ Expand full comment +
+
+
+ +
+
+ 1 reply +
+
+
+
+
+
+
+
+ + + + + +
+
+ +
+
+ +
+

+ Create your own client app. This is very much trying to fit a square peg into a round hole. +

+

+ If you want to, you can even give your client app an address bar, and let others use your app for their servers. Then you won't even need to touch html or css or JavaScript. +

+
+
+ Expand full comment +
+
+
+ +
+
+
+
+
+
+
16 more comments… +
+
+
+ +
+ + + + +
+ +
+
+
+
+
+
+
+
+
+
diff --git a/packages/readabilityjs/test/test-pages/www.computerenhance.com/url.txt b/packages/readabilityjs/test/test-pages/www.computerenhance.com/url.txt
new file mode 100644
index 000000000..453926e3e
--- /dev/null
+++ b/packages/readabilityjs/test/test-pages/www.computerenhance.com/url.txt
@@ -0,0 +1 @@
+https://www.computerenhance.com/p/no-really-why-cant-we-have-raw-udp
\ No newline at end of file

From d18d2e480f21f0dde0d834f36103da665cdfe751 Mon Sep 17 00:00:00 2001
From: Hongbo Wu
Date: Wed, 6 Jul 2022 22:31:14 +0800
Subject: [PATCH 2/8] rename folder

---
 .../expected-metadata.json                                       | 0
 .../expected.html                                                | 0
 .../{www.computerenhance.com => computerenhance.com}/source.html | 0
 .../{www.computerenhance.com => computerenhance.com}/url.txt     | 0
 4 files changed, 0 insertions(+), 0 deletions(-)
 rename packages/readabilityjs/test/test-pages/{www.computerenhance.com => computerenhance.com}/expected-metadata.json (100%)
 rename packages/readabilityjs/test/test-pages/{www.computerenhance.com => computerenhance.com}/expected.html (100%)
 rename packages/readabilityjs/test/test-pages/{www.computerenhance.com => computerenhance.com}/source.html (100%)
 rename packages/readabilityjs/test/test-pages/{www.computerenhance.com => computerenhance.com}/url.txt (100%)

diff --git a/packages/readabilityjs/test/test-pages/www.computerenhance.com/expected-metadata.json b/packages/readabilityjs/test/test-pages/computerenhance.com/expected-metadata.json
similarity index 100%
rename from packages/readabilityjs/test/test-pages/www.computerenhance.com/expected-metadata.json
rename to packages/readabilityjs/test/test-pages/computerenhance.com/expected-metadata.json
diff --git a/packages/readabilityjs/test/test-pages/www.computerenhance.com/expected.html b/packages/readabilityjs/test/test-pages/computerenhance.com/expected.html
similarity index 100%
rename from packages/readabilityjs/test/test-pages/www.computerenhance.com/expected.html
rename to packages/readabilityjs/test/test-pages/computerenhance.com/expected.html
diff --git a/packages/readabilityjs/test/test-pages/www.computerenhance.com/source.html b/packages/readabilityjs/test/test-pages/computerenhance.com/source.html
similarity index 100%
rename from packages/readabilityjs/test/test-pages/www.computerenhance.com/source.html
rename to packages/readabilityjs/test/test-pages/computerenhance.com/source.html
diff --git a/packages/readabilityjs/test/test-pages/www.computerenhance.com/url.txt b/packages/readabilityjs/test/test-pages/computerenhance.com/url.txt
similarity index 100%
rename from packages/readabilityjs/test/test-pages/www.computerenhance.com/url.txt
rename to packages/readabilityjs/test/test-pages/computerenhance.com/url.txt

From b317a0877b40f9e04ac2b1513e02b0f0d7c7970f Mon Sep 17 00:00:00 2001
From: Hongbo Wu
Date: Thu, 7 Jul 2022 10:48:04 +0800
Subject: [PATCH 3/8] Remove foot and footnote from negative score list

---
 packages/readabilityjs/Readability.js | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/packages/readabilityjs/Readability.js b/packages/readabilityjs/Readability.js
index 829f814f5..742db3928 100644
--- a/packages/readabilityjs/Readability.js
+++ b/packages/readabilityjs/Readability.js
@@ -178,7 +178,7 @@ Readability.prototype = {
   },
     positive: /article|body|content|entry|hentry|h-entry|main|page|pagination|post|text|blog|story|tweet(-\w+)?|instagram|image|container-banners/i,
-    negative: /\bad\b|hidden|^hid$| hid$| hid |^hid |banner|combx|comment|com-|contact|foot|footer|footnote|gdpr|masthead|media|meta|outbrain|promo|related|scroll|share|shoutbox|sidebar|skyscraper|sponsor|shopping|tags|tool|widget|controls|video-controls/i,
+    negative: /\bad\b|hidden|^hid$| hid$| hid |^hid |banner|combx|comment|com-|contact|footer|gdpr|masthead|media|meta|outbrain|promo|related|scroll|share|shoutbox|sidebar|skyscraper|sponsor|shopping|tags|tool|widget|controls|video-controls/i,
     extraneous: /print|archive|comment|discuss|e[\-]?mail|share|reply|all|login|sign|single|utility/i,
     byline: /byline|author|dateline|writtenby|p-author/i,
     publishedDate: /published|modified|created|updated/i,

From c4e8a443765657d3840c9352a4d59d5add8343c6 Mon Sep 17 00:00:00 2001
From: Hongbo Wu
Date: Thu, 7 Jul 2022 10:48:16 +0800
Subject: [PATCH 4/8] Update generated html

---
 .../test-pages/computerenhance.com/expected.html    | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/packages/readabilityjs/test/test-pages/computerenhance.com/expected.html b/packages/readabilityjs/test/test-pages/computerenhance.com/expected.html
index c5b043949..d8559f1f7 100644
--- a/packages/readabilityjs/test/test-pages/computerenhance.com/expected.html
+++ b/packages/readabilityjs/test/test-pages/computerenhance.com/expected.html
@@ -155,6 +155,19 @@

As a simple example, suppose a commercial CDN distributes the payload of ddosfuntimes.com. On the main page, there’s an XHR to target.ddosfuntimes.com. Even though the CDN is a completely different set of IP addresses as target.ddosfuntimes.com, this is completely legal under XHR policy.

The owners of ddosfuntimes.com can go ahead and set the IP address in their DNS records to point target.ddosfuntimes.com at any server they want, and they will receive all the XHR traffic from every browser that visits the page. And to the best of my knowledge, there isn’t a damn thing the target can do about that.

So unless I’m missing something, XHR already allows you to target any website you wish with unwanted traffic from anyone who visits your site. So why the concern about UDP?

+
+

1

+
+

+ This is way off topic, but in case it struck people as odd: all secure systems have a root trust problem. At some point you have to get something from somebody that you will just blindly trust. This is the root of the chain of trust, and unfortunately, there’s really nothing you can do to make it secure. You just have to hope that this initial exchange is trusted. +

+

So in the case of web browsers, you have to keep in mind that HTTPS doesn’t actually guarantee you anything beyond a chain of trust. You are implicitly trusting that a) nobody messed with the browser when you downloaded it, b) none of the certificate authorities trusted by that browser download have been compromised, c) the certificate for signing browser root certificate updates hasn’t itself been compromised.

+

Etc., etc.

+

+ So in general, when we talk about adding security to a protocol, we can only talk about securing it up to a point. No matter what we do, there will never be a way for it to be completely secure, because the chain of trust is not infinite, and any of its endpoints (in this case, the browser itself or any certificate authority) can lie to you for as long as it takes for a security firm to catch them doing it. +

+
+
\ No newline at end of file

From cf5f577c142d40e40e1821e910977ea1e570676a Mon Sep 17 00:00:00 2001
From: Hongbo Wu
Date: Wed, 13 Jul 2022 16:23:36 +0800
Subject: [PATCH 5/8] regenerate test page for danwang with footnote added

---
 .../test-pages/danwang/expected-metadata.json |   2 +
 .../test/test-pages/danwang/expected.html     | 126 ++++++++++++++++--
 2 files changed, 120 insertions(+), 8 deletions(-)

diff --git a/packages/readabilityjs/test/test-pages/danwang/expected-metadata.json b/packages/readabilityjs/test/test-pages/danwang/expected-metadata.json
index ebd3ec373..06e4f9573 100644
--- a/packages/readabilityjs/test/test-pages/danwang/expected-metadata.json
+++ b/packages/readabilityjs/test/test-pages/danwang/expected-metadata.json
@@ -4,7 +4,9 @@
   "dir": null,
   "excerpt": "Centralized campaigns of inspiration; Proust; rejecting complacency and decadence; the pandemic in Beijing; brown sauce; riding a bike; rejuvenation.",
   "siteName": "Dan Wang",
+  "siteIcon": "https://danwang.co/wp-content/uploads/2014/09/dan-wang-shopify12.png",
   "previewImage": "https://i1.wp.com/danwang.co/wp-content/uploads/2021/01/nasa-titan.jpg?fit=700%2C1044&ssl=1",
   "publishedDate": "2021-01-01T15:44:10.000Z",
+  "language": "English",
   "readerable": true
 }
diff --git a/packages/readabilityjs/test/test-pages/danwang/expected.html b/packages/readabilityjs/test/test-pages/danwang/expected.html
index 20a144bac..86553e0f5 100644
--- a/packages/readabilityjs/test/test-pages/danwang/expected.html
+++ b/packages/readabilityjs/test/test-pages/danwang/expected.html
@@ -1,4 +1,4 @@
-
+
@@ -22,7 +22,7 @@

When it’s not being vague, the party can be trying to have things both ways. Xi declared at the third plenum in 2013 that market forces would have a “decisive” role in allocating resources, while at the same time the state sector would have a “leading” role. It’s not unusual to see a great deal of semantic acrobatics. Deng declared that socialism means the capacity to concentrate resources to accomplish great tasks; under that definition, the Apollo and Manhattan projects were socialism. In July, Xi reminded us that “socialism with Chinese characteristics has many distinctive features, but its most essential is leadership by the Chinese Communist Party.”

In other words, socialism with Chinese characteristics means the party is never wrong. Either the market or the state sector can be more important at any moment: it is the party’s pleasure to decide.

Centralized campaigns of inspiration, which usually manifest through fixing slogans, are a distinctive feature of the Chinese political system. In the US, political candidates trot out slogans when they run for election; in China, one is never far from the next big named initiative. At its best, defining major goals is the essence of political leadership, and nowhere is this principle better illustrated than Apollo. John F. Kennedy announced the target in 1961: land a man on the moon and return him safely to earth before the decade was out. By fixing this clear goal,

-

as well as committing the necessary spending, he accelerated the creation, development, and deployment of technologies that made the lunar landings possible. 

+

as well as committing the necessary spending, he accelerated the creation, development, and deployment of technologies that made the lunar landings possible.

Xi grasps this idea of leadership. In his tenure, he has unleashed a torrent of new initiatives. In my view, he feels that the practice of governing China under socialism cannot be an exercise in sustained mendacity. The political system can no longer continue to be an unstable structure based on ad hoc compromises; instead it must have a clear organizational structure, with the party at the top. And the ruling party needs to have the political consciousness of an effective governing force.

Consider two of his most important initiatives: the campaign against corruption and the move toward law-based governance. Xi has decided that corruption is not a mystery to be endured, but a problem to be solved. A few years past the peak of the crackdown, it’s fair to say that the campaign hasn’t solely been effective in removing his adversaries, but has also been broad enough to restore some degree of public confidence in government. A few commentators contend that removal of opportunities for graft have prompted talented people to leave government. But the flip side of that coin has been the improvement in morale among the civil servants who found corruption among colleagues to be intolerable, and can finally see themselves doing public work well. 

And for years, Xi has emphasized following clear rules of written procedure, under the rubric of “law-based governance.”

@@ -40,11 +40,11 @@

Given the importance of the slogan, it’s worthwhile to try to come to terms with the fondness and reverence his generation has for the party’s early days. Many of the people tormented by the party center, including Deng and Xi’s father, have ended up being fiercely loyal to the party.

That shows not just that human nature is complex, but also that the revolutionary heritage of the party instills pride. The CCP started out as a combat party constantly at the mercy of forces grander than itself, achieving its goals after an unusually long struggle that repeatedly brought it to the brink of death. Daniel Koss reminds us that the longer that revolutionary parties have to struggle before consolidating power, the more stronger their ideological commitments and the greater their governance durability tend to be.

Xi is keen to reflect upon the regime’s history. He has decided that the party must believe in itself, and that it is correct to do so: “If our Party members and officials are firm in their ideals and convictions and maintain high morale in their activities and initiatives, and if our people are high-spirited and determined, then we will surely create many miracles.”

-

Furthermore, he has stated: “The prospects are bright but the challenges are severe. All comrades must aim high and look far, be alert to dangers even in times of calm, have the courage to pursue reform and break new ground, and never become hardened to change.” 

+

Furthermore, he has stated: “The prospects are bright but the challenges are severe. All comrades must aim high and look far, be alert to dangers even in times of calm, have the courage to pursue reform and break new ground, and never become hardened to change.”

Thus I’ve arrived at the idea that a commitment to centralized campaigns of inspiration, represented by the tendency to fix clear goals, is the booster stage required to leave the gravitational pull of decadence and complacency. Ross Douthat laments that “a consistent ineffectuality in American governance is just the way things are.”

And he references Jacques Barzun, who defines a decadent society as one that is “peculiarly restless, for it sees no clear lines of advance.” As a society turns developed, its main problems become social: an organizational sclerosis, which no technology is sophisticated enough to solve. No great effort is required to identify the comprehensive paralysis in the US. And that is the political and social current that Xi is trying to reverse in China.

One way to do that is to continue to pursue GDP growth, which has mostly become an unfashionable idea today in the west. Xi reminded the state in July that “economic work must be our core task, if we succeed in that, then the rest of our tasks become easy.”

-

Barry Naughton has noted that “China’s system of incentives for local bureaucrats to encourage growth is extremely unusual, and seems only to exist in China. It is a blunt and powerful instrument.” 

+

Barry Naughton has noted that “China’s system of incentives for local bureaucrats to encourage growth is extremely unusual, and seems only to exist in China. It is a blunt and powerful instrument.”

This emphasis on growth makes it less likely for China to develop into American complacency or decadence. There are other types of paralysis that it stands a good chance of avoiding. With its emphasis on the real economy, it is trying to avoid the fate of Hong Kong, where local elites have reorganized the productive forces completely around sustaining high property prices and managing mainland liquidity flows. With its emphasis on economic growth, it cannot be like Taiwan, whose single bright corporate beacon is surrounded by a mass of firms undergoing genteel decline. With its emphasis on manufacturing, it cannot be like the UK, which is so successful in the sounding-clever industries—television, journalism, finance, and universities—while seeing a falling share of R&D intensity and a global loss of standing among its largest firms.

Douthat’s book does not deal seriously with China, only with a fantasy of a universally-surveilled society under the rubric of a social credit system. If he did engage more seriously, he might pick up what Frank Pieke has termed “neo-socialism,” which is the attempt to harness market liberalization to strengthen state capacity and a more Leninist party.

In return, the state provides purpose and direction, as well as inspiring the rest of society with a transformative mission. It helps, of course, that Xi is a genuine believer in socialism, which to him is both an instrument as well as an end. He’s leveraging that belief to reject decadence and assert agency to point out new lines of advance.

@@ -54,8 +54,7 @@

That was quite a lot of theory. Where does it fall apart?

Xi has said: “If we turn a blind eye to challenges, or even dodge or disguise them; if we fear to advance in the face of challenges and sit by and watch the unfolding calamity; then they will grow beyond our control and cause irreparable damage.”

Instead of heeding this warning, authorities in Wuhan suppressed reporting of a spread of a novel virus. At a time when they should have imposed restrictions, they congregated thousands around a gigantic potluck. That has indeed unfolded into a calamity.

-

Xi has said: “Some officials are perfunctory in their work, shirking responsibility when troubles come and dodging thorny problems. They like to report every trifle to their superiors for approval or directives. In doing so, they appear to be abiding by the rules but are actually avoiding responsibilities. Some make ill-considered or purely arbitrary decisions. They place themselves above the party organization and allow no dissenting voices.”

-

 But as economic growth slows down, the country is doubling down on centralized government. Over the last several years, the state is taking more of a leading role in the economy, which means a larger role for bureaucrats.

+

Xi has said: “Some officials are perfunctory in their work, shirking responsibility when troubles come and dodging thorny problems. They like to report every trifle to their superiors for approval or directives. In doing so, they appear to be abiding by the rules but are actually avoiding responsibilities. Some make ill-considered or purely arbitrary decisions. They place themselves above the party organization and allow no dissenting voices.”

 

But as economic growth slows down, the country is doubling down on centralized government. Over the last several years, the state is taking more of a leading role in the economy, which means a larger role for bureaucrats.

Xi has said: “Self-criticism needs to be specific about our problems and needs to touch underlying questions… We must be gratified when told of our errors; we must not shy away from our shortcomings. We must accommodate different opinions and sharp criticism.”

When medical professionals spoke up about a strange new virus circulating in Wuhan, police gave them reprimands. More and more often, the state is simply arresting critics. Even though the government has every reason to be confident about the effectiveness of its virus containment, it has issued a jail sentence to a citizen journalist under the catch-all charge of “picking quarrels and provoking trouble.” For all the emphasis on seeking truth from facts, the state still maintains this practice of shooting the messenger or jailing its critics.

On its own terms, the party center’s instruction is unevenly followed. And there are plenty of reasons to doubt the sustainability of Chinese growth that exist beyond the party’s capacity for self-reform. The following have all received extensive treatment: demographics will be a clear and serious drag in only a few years; an uncomfortable buildup of debt is now accompanied by growing investor discomfort with strategic defaults; the environment is bearing greater stresses; and based on the state’s aggression abroad and the operation of detention camps for minority groups at home, the rest of the world has become much less friendly towards China. One can add more items here, I want to consider the problems with centralized campaigns of inspiration.

@@ -137,7 +136,7 @@

In the early months of the pandemic, I picked up a new skill: riding a bike. I’ve always been mortified to admit that I never properly knew how. With the encouragement of kind and patient friends, I’ve enjoyed cycling so much that it has become the primary way I get around Beijing. The city is good for cyclists, with its wide bicycle paths and flat roads. (Given the behavior of most drivers though, Beijing requires taking seriously the principle of safety first.) My favorite activity has become to cycle to the Forbidden City and back home, a nice hour-long ride that I would do after lunch. I’m still enjoying the feeling of gliding down a road on my own propulsion, which gives me a sense of slight unreality. That’s been good for thinking: I wrote significant chunks of this letter while riding down Beijing’s second and fourth ring roads.

This year marks my seventh of not drinking. I expect that I’m in the best shape of my life, given that, regular bike rides, occasional badminton sessions, and working out with my personal trainer three times a week. Still, I’m exhausted. That doesn’t mean it’s time to slow down. There are too many interesting things left to do.

- +

Titan, a planet-sized moon of Saturn, has a thick atmosphere and liquid oceans. It and Europa—one of the moons of Jupiter, which might have warm liquid oceans—offer the best chances of discovering extraterrestrial life in our solar system. Credit: JPL @@ -158,10 +157,121 @@

+
+
+
    +
  1. +

    see Anne-Marie Brady: Marketing Dictatorship: Propaganda and Thought Work in Contemporary China +

    +
  2. + +
  3. +

    中国特色社会主义有很多特点和特征,但最本质的特征是坚持中国共产党领导。http://www.qstheory.cn/dukan/qs/2020-07/15/c_1126234524.htm +

    +
  4. + +
  5. +

    For more, see Charles Fishman’s excellent One Giant Leap, which showed how NASA had to invent a thousand and one technologies to reach the moon +

    +
  6. + +
  7. +

    Sometimes translated as “rule of law”: 依法治国 +

    +
  8. + +
  9. +

    + http://www.xinhuanet.com/english/2020-06/07/c_139120424.htm +

    +
  10. + +
  11. +

    see Dan Grover on the UI changes that Chinese apps made: http://dangrover.com/blog/2020/04/05/covid-in-ui.html +

    +
  12. + +
  13. +

    That’s a broad and unfair generalization, I know. This Economist leader offers a more nuanced view: https://www.economist.com/briefing/2020/08/15/xi-jinping-is-trying-to-remake-the-chinese-economy +

    +
  14. + +
  15. +

    This is my translation of 不忘初心、牢记使命. There are variations on the third line, I included one I’ve seen: 永远奋斗 +

    +
  16. + +
  17. +

    see this excellent discussion between Frederick Teiwes and Joseph Torigian https://omny.fm/shows/the-little-red-podcast/xi-dada-and-daddy-power-the-party-and-the-presiden +

    +
  18. + +
  19. +

    From Dialectical Materialism Is the Worldview and Methodology of Chinese Communists, 广大党员、干部理想信念坚定、干事创业精气神足,人民群众精神振奋、发愤图强,就可以创造出很多人间奇迹 http://www.qstheory.cn/dukan/qs/2018-12/31/c_1123923896.htm +

    +
  20. + +
  21. +

    Report to the 19th party congress: http://www.xinhuanet.com/english/download/Xi_Jinping’s_report_at_19th_CPC_National_Congress.pdf +

    +
  22. + +
  23. +

    see The Decadent Society +

    +
  24. + +
  25. +

    经济工作是中心工作,党的领导当然要在中心工作中得到充分体现,抓住了中心工作这个牛鼻子,其他工作就可以更好展开。http://www.qstheory.cn/dukan/qs/2020-07/15/c_1126234524.htm +

    +
  26. + +
  27. +

    see Frank Pieke’s Knowing China +

    +
  28. + +
  29. +

    see Dialectical Materialism Is the Worldview and Methodology of Chinese Communists 如果对矛盾熟视无睹,甚至回避、掩饰矛盾,在矛盾面前畏缩不前,坐看矛盾恶性转化,那就会积重难返,最后势必造成无法弥补的损失。 http://www.qstheory.cn/dukan/qs/2018-12/31/c_1123923896.htm +

    +
  30. + +
  31. +

    from the speech at the Third Plenary Session of the 19th Central Commission for Discipline Inspection +

    +
  32. + +
  33. +

    from Goals of the Aspiration and Mission Education Campaign, May 31 2019 +

    +
  34. + +
  35. +

    http://www.chinafilm.gov.cn/chinafilm/contents/141/2533.shtml +

    +
  36. + +
  37. +

    Wang Hongsheng, a boss at Jinghai, admits to fretting about interruptions to chick supplies, even wondering if President Donald Trump might curb American exports. https://www.economist.com/china/2020/10/31/high-tech-chickens-are-a-case-study-of-why-self-reliance-is-so-hard +

    +
  38. + +
  39. +

    see this WSJ story https://www.wsj.com/articles/the-u-s-vs-china-the-high-cost-of-the-technology-cold-war-11603397438 and Doug Fuller’s claim on Tokyo Electron https://www.jhuapl.edu/assessing-us-china-technology-connections/publications +

    +
  40. + +
  41. +

    This is admittedly a bit of my own fanciful translation of 必须看到,实体经济是基础,各种制造业不能丢,作为14亿人口的大国,粮食和实体产业要以自己为主,这一条绝对不能丢 http://www.qstheory.cn/dukan/qs/2020-10/31/c_1126680390.htm +

    +
  42. + +
+
- \ No newline at end of file + \ No newline at end of file From 2e645b29b237ab5f1f7dba9e11b52107e9d85ee3 Mon Sep 17 00:00:00 2001 From: Hongbo Wu Date: Wed, 13 Jul 2022 16:24:33 +0800 Subject: [PATCH 6/8] regenerate test pages with footnote added --- .../test/test-pages/garymarcus/expected.html | 4 ++ .../sciencedirect/expected-metadata.json | 2 + .../test-pages/sciencedirect/expected.html | 42 +++++++++++++++---- 3 files changed, 40 insertions(+), 8 deletions(-) diff --git a/packages/readabilityjs/test/test-pages/garymarcus/expected.html b/packages/readabilityjs/test/test-pages/garymarcus/expected.html index f2b7d8024..a18902025 100644 --- a/packages/readabilityjs/test/test-pages/garymarcus/expected.html +++ b/packages/readabilityjs/test/test-pages/garymarcus/expected.html @@ -46,6 +46,10 @@

Epilogue:

Last word to philosopher poet Jag Bhalla

+
+

1

+

To be triply sure I asked Aguera y Arcas if I could have access to LaMDA; so far Google has been unwilling to let pesky academics like me have a look see. I’ll report back if that changes.

+
\ No newline at end of file diff --git a/packages/readabilityjs/test/test-pages/sciencedirect/expected-metadata.json b/packages/readabilityjs/test/test-pages/sciencedirect/expected-metadata.json index 6af058d1a..ffec318f4 100644 --- a/packages/readabilityjs/test/test-pages/sciencedirect/expected-metadata.json +++ b/packages/readabilityjs/test/test-pages/sciencedirect/expected-metadata.json @@ -4,7 +4,9 @@ "dir": null, "excerpt": "The “Weak Garden of Eden” model for the origin and dispersal of modern humans (Harpendinget al., 1993) posits that modern humans spread into separate …", "siteName": null, + "siteIcon": "https://sdfestaticassets-eu-west-1.sciencedirectassets.com/shared-assets/13/images/favSD.ico", "previewImage": "https://ars.els-cdn.com/content/image/1-s2.0-S0047248420X00121-cov150h.gif", "publishedDate": null, + "language": "English", "readerable": true } diff --git a/packages/readabilityjs/test/test-pages/sciencedirect/expected.html b/packages/readabilityjs/test/test-pages/sciencedirect/expected.html index a83f5cc56..de2c78a3b 100644 --- a/packages/readabilityjs/test/test-pages/sciencedirect/expected.html +++ b/packages/readabilityjs/test/test-pages/sciencedirect/expected.html @@ -1,27 +1,53 @@ -
-
+
+
-

Elsevier logo

+

Elsevier logo

-

Elsevier +

Elsevier

-

Journal of Human Evolution +

Journal of Human Evolution

Abstract

-

The “Weak Garden of Eden” model for the origin and dispersal of modern humans (Harpendinget al., 1993) posits that modern humans spread into separate regions from a restricted source, around 100 ka (thousand years ago), then passed through population bottlenecks. Around 50 ka, dramatic growth occurred within dispersed populations that were genetically isolated from each other. Population growth began earliest in Africa and later in Eurasia and is hypothesized to have been caused by the invention and spread of a more efficient Later Stone Age/Upper Paleolithic technology, which developed in equatorial Africa.

-

Climatic and geological evidence suggest an alternative hypothesis for Late Pleistocene population bottlenecks and releases. The last glacial period was preceded by one thousand years of the coldest temperatures of the Later Pleistocene (∼71–70 ka), apparently caused by the eruption of Toba, Sumatra. Toba was the largest known explosive eruption of the Quaternary. Toba's volcanic winter could have decimated most modern human populations, especially outside of isolated tropical refugia. Release from the bottleneck could have occurred either at the end of this hypercold phase, or 10,000 years later, at the transition from cold oxygen isotope stage 4 to warmer stage 3. The largest populations surviving through the bottleneck should have been found in the largest tropical refugia, and thus in equatorial Africa. High genetic diversity in modern Africans may thus reflect a less severe bottleneck rather than earlier population growth.

+

The “Weak Garden of Eden” model for the origin and dispersal of modern humans (Harpendinget al., 1993) posits that modern humans spread into separate regions from a restricted source, around 100 +   + ka (thousand years ago), then passed through population bottlenecks. Around 50 +   + ka, dramatic growth occurred within dispersed populations that were genetically isolated from each other. Population growth began earliest in Africa and later in Eurasia and is hypothesized to have been caused by the invention and spread of a more efficient Later Stone Age/Upper Paleolithic technology, which developed in equatorial Africa. +

+

Climatic and geological evidence suggest an alternative hypothesis for Late Pleistocene population bottlenecks and releases. The last glacial period was preceded by one thousand years of the coldest temperatures of the Later Pleistocene (∼71–70 +   + ka), apparently caused by the eruption of Toba, Sumatra. Toba was the largest known explosive eruption of the Quaternary. Toba's volcanic winter could have decimated most modern human populations, especially outside of isolated tropical refugia. Release from the bottleneck could have occurred either at the end of this hypercold phase, or 10,000 years later, at the transition from cold oxygen isotope stage 4 to warmer stage 3. The largest populations surviving through the bottleneck should have been found in the largest tropical refugia, and thus in equatorial Africa. High genetic diversity in modern Africans may thus reflect a less severe bottleneck rather than earlier population growth. +

Volcanic winter may have reduced populations to levels low enough for founder effects, genetic drift and local adaptations to produce rapid population differentiation. If Toba caused the bottlenecks, then modern human races may have differentiated abruptly, only 70 thousand years ago.

+
+
+
+ +
+
+

P. Mellars

+
+
+
+
+ f1 +
+
+

E-mail: Ambrose@uiuc.edu

+
+
+
-
\ No newline at end of file +
\ No newline at end of file From f0f7aa5a6d413890f1c5f429c50ad5018f1b01a9 Mon Sep 17 00:00:00 2001 From: Hongbo Wu Date: Thu, 14 Jul 2022 10:59:44 +0800 Subject: [PATCH 7/8] rename jsdom to linkedom --- packages/readabilityjs/Readability.js | 1 - packages/readabilityjs/test/generate-testcase.js | 11 +++++------ .../{test-jsdomparser.js => test-linkedomparser.js} | 2 +- packages/readabilityjs/test/test-readability.js | 4 ++-- 4 files changed, 8 insertions(+), 10 deletions(-) rename packages/readabilityjs/test/{test-jsdomparser.js => test-linkedomparser.js} (99%) diff --git a/packages/readabilityjs/Readability.js b/packages/readabilityjs/Readability.js index 742db3928..c486f6362 100644 --- a/packages/readabilityjs/Readability.js +++ b/packages/readabilityjs/Readability.js @@ -2879,7 +2879,6 @@ Readability.prototype = { * 4. Replace the current DOM tree with the new one. * 5. Read peacefully. * - * @return void **/ parse: async function() { // Avoid parsing too large documents, as per configuration option diff --git a/packages/readabilityjs/test/generate-testcase.js b/packages/readabilityjs/test/generate-testcase.js index 2cd20ff0b..05a94e8cd 100644 --- a/packages/readabilityjs/test/generate-testcase.js +++ b/packages/readabilityjs/test/generate-testcase.js @@ -6,7 +6,6 @@ var prettyPrint = require("./utils").prettyPrint; var htmltidy = require("htmltidy2").tidy; var { Readability, isProbablyReaderable } = require("../index"); -var JSDOMParser = require("../JSDOMParser"); const { generate: generateRandomUA } = require("modern-random-ua/random_ua"); const puppeteer = require('puppeteer'); const { parseHTML } = require("linkedom"); @@ -226,12 +225,12 @@ async function runReadability(source, destPath, metadataDestPath) { var uri = "http://fakehost/test/page.html"; var myReader, result, readerable; try { - // Use jsdom for isProbablyReaderable because it supports querySelectorAll - var jsdom = parseHTML(source).document; - readerable = isProbablyReaderable(jsdom); + // 
Use linkedom for isProbablyReaderable because it supports querySelectorAll + var dom = parseHTML(source).document; + readerable = isProbablyReaderable(dom); // We pass `caption` as a class to check that passing in extra classes works, // given that it appears in some of the test documents. - myReader = new Readability(jsdom, { classesToPreserve: ["caption"], url: uri }); + myReader = new Readability(dom, { classesToPreserve: ["caption"], url: uri }); result = await myReader.parse(); } catch (ex) { console.error(ex); @@ -274,7 +273,7 @@ if (process.argv.length < 3) { if (process.argv[2] === "all") { fs.readdir(testcaseRoot, function (err, files) { if (err) { - console.error("error reading testcaseses"); + console.error("error reading testcases"); return; } diff --git a/packages/readabilityjs/test/test-jsdomparser.js b/packages/readabilityjs/test/test-linkedomparser.js similarity index 99% rename from packages/readabilityjs/test/test-jsdomparser.js rename to packages/readabilityjs/test/test-linkedomparser.js index 982647fd5..0d80b57ca 100644 --- a/packages/readabilityjs/test/test-jsdomparser.js +++ b/packages/readabilityjs/test/test-linkedomparser.js @@ -10,7 +10,7 @@ var BASETESTCASE = '

Some text and a var baseDoc = new JSDOMParser().parse(BASETESTCASE, "http://fakehost/"); -describe("Test JSDOM functionality", function() { +describe("Test linkedom functionality", function() { function nodeExpect(actual, expected) { try { expect(actual).eql(expected); diff --git a/packages/readabilityjs/test/test-readability.js b/packages/readabilityjs/test/test-readability.js index 6db89071e..ac0d3dc8a 100644 --- a/packages/readabilityjs/test/test-readability.js +++ b/packages/readabilityjs/test/test-readability.js @@ -326,8 +326,8 @@ describe("Test pages", function() { describe(testPage.dir, function() { var uri = "http://fakehost/test/page.html"; - runTestsWithItems("jsdom", function(source) { - var doc =parseHTML(source).document; + runTestsWithItems("linkedom", function(source) { + var doc = parseHTML(source).document; removeCommentNodesRecursively(doc); return doc; }, testPage.source, testPage.expectedContent, testPage.expectedMetadata, uri); From a39a82bd4283136fbf3407b655455e2f9be13a26 Mon Sep 17 00:00:00 2001 From: Hongbo Wu Date: Thu, 14 Jul 2022 11:08:32 +0800 Subject: [PATCH 8/8] fix tests --- .../test/test-pages/sciencedirect/expected.html | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/packages/readabilityjs/test/test-pages/sciencedirect/expected.html b/packages/readabilityjs/test/test-pages/sciencedirect/expected.html index de2c78a3b..17228ea7c 100644 --- a/packages/readabilityjs/test/test-pages/sciencedirect/expected.html +++ b/packages/readabilityjs/test/test-pages/sciencedirect/expected.html @@ -16,13 +16,13 @@

Abstract

-

The “Weak Garden of Eden” model for the origin and dispersal of modern humans (Harpendinget al., 1993) posits that modern humans spread into separate regions from a restricted source, around 100 +

The “Weak Garden of Eden” model for the origin and dispersal of modern humans (Harpendinget al., 1993) posits that modern humans spread into separate regions from a restricted source, around 100    - ka (thousand years ago), then passed through population bottlenecks. Around 50 + ka (thousand years ago), then passed through population bottlenecks. Around 50    ka, dramatic growth occurred within dispersed populations that were genetically isolated from each other. Population growth began earliest in Africa and later in Eurasia and is hypothesized to have been caused by the invention and spread of a more efficient Later Stone Age/Upper Paleolithic technology, which developed in equatorial Africa.

-

Climatic and geological evidence suggest an alternative hypothesis for Late Pleistocene population bottlenecks and releases. The last glacial period was preceded by one thousand years of the coldest temperatures of the Later Pleistocene (∼71–70 +

Climatic and geological evidence suggest an alternative hypothesis for Late Pleistocene population bottlenecks and releases. The last glacial period was preceded by one thousand years of the coldest temperatures of the Later Pleistocene (∼71–70    ka), apparently caused by the eruption of Toba, Sumatra. Toba was the largest known explosive eruption of the Quaternary. Toba's volcanic winter could have decimated most modern human populations, especially outside of isolated tropical refugia. Release from the bottleneck could have occurred either at the end of this hypercold phase, or 10,000 years later, at the transition from cold oxygen isotope stage 4 to warmer stage 3. The largest populations surviving through the bottleneck should have been found in the largest tropical refugia, and thus in equatorial Africa. High genetic diversity in modern Africans may thus reflect a less severe bottleneck rather than earlier population growth.

@@ -50,4 +50,4 @@
-
\ No newline at end of file +