Commit Graph

50 Commits

Author SHA1 Message Date
e2a0c41528 Add readability test for sydney.com 2022-06-27 22:26:41 +08:00
a3e5d6d817 Remove ad 2022-06-27 11:40:18 +08:00
2d4ee151d1 Generate test page 2022-06-27 11:12:34 +08:00
324c4ee6e7 Give .container-banners higher weight 2022-06-23 21:58:37 +08:00
bf05d1782f Regenerate test case Cont 2022-06-23 21:14:06 +08:00
68692111d9 Regenerate test case 2022-06-23 21:11:34 +08:00
13a95f4e44 Add test case for energias-renovables.com 2022-06-23 00:31:23 -07:00
486f22a594 remove redundant async 2022-06-15 22:31:55 +08:00
079857d6c0 update test case 2022-06-15 22:27:41 +08:00
2aafd39650 add fastcompany.com to the non-script hosts list 2022-06-15 22:27:25 +08:00
e028e2e440 generate test page for fast company 2022-06-15 21:22:25 +08:00
5625a055d8 Update test case 2022-06-13 17:36:23 +08:00
64673bd6db Add test case 2022-06-13 15:15:03 +08:00
8eecdd60e3 Add test case for https://infoproc.blogspot.com/2022/05/theodore-postol-nuclear-weapons-missile.html 2022-06-09 16:37:21 +08:00
35ca7ede82 Add test case for danluu 2022-06-06 22:37:05 +08:00
704726dc6a Improve parsing of channel news asia 2022-06-03 14:03:14 -07:00
304fe70113 Fix tests 2022-06-01 10:30:23 +08:00
98ecdcff80 Fix test case of setting tweet-placeholder 2022-05-31 22:51:27 +08:00
404805e0c0 Make async calls to parse() 2022-05-31 22:51:00 +08:00
417ed0a4eb Fetch tweet id from url 2022-05-31 20:02:54 +08:00
b6fef171be If we have a node with only one child element which has the placeholder class, keep it 2022-05-31 14:18:39 +08:00
22f5e1cc32 Fix embedded tweets being deleted when simplifying nested elements 2022-05-31 13:42:16 +08:00
cc5bdf96f5 Update electrek test case 2022-05-31 12:52:33 +08:00
bce50c2a92 Update electrek test case 2022-05-31 11:11:06 +08:00
9dc7fd4c4c Add test case for electrek 2022-05-30 22:47:47 +08:00
35dcd00ec3 Add test case for github blog 2022-05-18 15:52:47 +08:00
0e4cec5e25 Update tests 2022-05-18 11:28:47 +08:00
6795508942 Add test for readability on youtube embeds 2022-05-18 09:55:36 +08:00
bdfa76d716 Remove listnav elements from articles 2022-05-12 15:58:24 -07:00
d542d31aed Fix gflownet test generation 2022-05-11 21:23:30 +08:00
82fb8151a4 Fix generate tests 2022-05-10 21:10:20 +08:00
96b543946d Temporarily disable customer content serializer test 2022-05-10 21:10:06 +08:00
ffa5dee721 Use linkedom in readability tests 2022-05-10 18:40:52 +08:00
cb7f30607a Use linkedom in readability test isProbablyReaderable 2022-05-10 18:33:29 +08:00
6ef14e1f91 Mark ft.com page as readerable 2022-05-05 09:28:22 -07:00
384c5dbf9f Improve rendering of the Financial Times 2022-05-05 09:14:51 -07:00
74693d40c0 Mark page as readerable 2022-05-04 15:01:35 -07:00
8386aebaf8 Remove ads from fiercepharma page 2022-05-04 14:22:08 -07:00
a24b976546 Remove lazy loaded srcset elements
Some tools, like Jetpack (https://jetpack.com/support/lazy-images/),
use a temporary srcset element set to a data image when lazy
loading; these are later removed by JS. We test whether there is
a valid src attribute and whether the srcset attribute is a data
embed in order to remove these.
2022-04-29 10:05:33 -07:00
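The check described in the commit above (a valid src plus a srcset that is only a data embed) might look roughly like the following. This is a minimal sketch, not the project's code: `ImgAttrs`, `isDataUri`, `hasPlaceholderSrcset`, and `stripPlaceholderSrcset` are hypothetical names, and treating "valid src" as "non-empty and not itself a data URI" is an assumption.

```typescript
// Hypothetical attribute bag for an <img>; not a type from the project.
interface ImgAttrs {
  src?: string;
  srcset?: string;
}

// True when the value is a data URI (e.g. a 1x1 placeholder GIF).
const isDataUri = (value: string): boolean =>
  value.trim().toLowerCase().startsWith("data:");

// Assumption: "valid src" means non-empty and not a data URI itself.
function hasPlaceholderSrcset(img: ImgAttrs): boolean {
  const src = img.src?.trim() ?? "";
  const srcset = img.srcset?.trim() ?? "";
  return src !== "" && !isDataUri(src) && srcset !== "" && isDataUri(srcset);
}

// Drop the placeholder srcset so only the real src remains.
function stripPlaceholderSrcset(img: ImgAttrs): ImgAttrs {
  if (!hasPlaceholderSrcset(img)) return img;
  return { src: img.src };
}
```

An image whose srcset lists real candidates (for example `a.jpg 1x, b.jpg 2x`) is returned unchanged, so only the lazy-loading placeholder is affected.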
d2bb359f5c Handle srcset image density specifications 2022-04-26 15:18:23 -07:00
72a231c97e Do not proxy image data uri (#421)
* do not proxy image data uri

* rename data uri

* rename data uri in test
2022-04-14 12:59:12 +08:00
4d01f689b2 replace tables of article content with divs for newsletters 2022-04-11 20:00:11 +08:00
6d405432af add site_name and site_icon to page model and return in resolver (#341)
* add site_name and site_icon to page model and return in resolver

* fix tests
2022-03-30 10:43:10 +08:00
12af64609c Fix readability issues with null style elements
isProbablyVisible can fail in this case because style can be
undefined on an element.
2022-03-23 13:35:00 -07:00
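A guard of the kind this commit describes could be sketched as below. The `StyledNode` type and the function name `isProbablyVisibleSafe` are hypothetical, and the real visibility check may test more conditions; the point is only that `style` is read through optional chaining so a missing style object no longer throws.

```typescript
// Hypothetical minimal node shape; some DOM implementations may leave `style` unset.
interface StyledNode {
  style?: { display?: string } | null;
  hasAttribute(name: string): boolean;
}

// Treat a node with no style object as visible instead of throwing.
function isProbablyVisibleSafe(node: StyledNode): boolean {
  const display = node.style?.display;
  return display !== "none" && !node.hasAttribute("hidden");
}
```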
960a22d50c Fix/city journal parsing (#266)
* remove arrow image when parsing

* ignore m_article classname element which indicates a mobile version of the website

* generate test page for city journal
2022-03-21 22:53:21 +08:00
0361ef86fa Better handling of HTML entities in descriptions
The HTML code method didn't implement all possible
entities, causing some (usually rquote) to display.
2022-03-14 11:02:08 -07:00
fc7d972855 Fix typo in readability date handling causing this parse issue
Can remove our special handler for the published date now that we
are pulling it out correctly.
2022-03-14 10:20:19 -07:00
8a2bb0f49d Handle blogger sites that display the full feed on the article page 2022-03-10 13:48:39 -08:00
234dba9174 Improve readability of channelnewsasia
This uses negative lookahead to reject nodes that have outstream
ads embedded. Previously they were being accepted because they
contained `$article` in the class name.
2022-03-04 10:34:44 -08:00
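The negative-lookahead idea from this commit can be illustrated with a small pattern. This is a sketch under assumptions: the class-name fragments `article` and `outstream` and the helper `looksLikeArticleNode` are illustrative, since the log does not show the project's actual regular expression.

```typescript
// Accept class names containing "article" unless "outstream" appears anywhere.
// The (?!.*outstream) lookahead at the start rejects the whole string up front.
const ARTICLE_CANDIDATE = /^(?!.*outstream).*article/i;

// Hypothetical helper name; the real matcher may differ.
function looksLikeArticleNode(className: string): boolean {
  return ARTICLE_CANDIDATE.test(className);
}
```

Anchoring the lookahead at the start of the string means an ad container is rejected regardless of whether the ad marker comes before or after the article marker in the class list.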
84f32935f5 Open source omnivore 2022-02-11 09:24:33 -08:00