e2a0c41528
Add readability test for sydney.com
2022-06-27 22:26:41 +08:00
a3e5d6d817
Remove ad
2022-06-27 11:40:18 +08:00
2d4ee151d1
Generate test page
2022-06-27 11:12:34 +08:00
324c4ee6e7
Give .container-banners higher weight
2022-06-23 21:58:37 +08:00
bf05d1782f
Regenerate test case Cont
2022-06-23 21:14:06 +08:00
68692111d9
Regenerate test case
2022-06-23 21:11:34 +08:00
13a95f4e44
Add test case for energias-renovables.com
2022-06-23 00:31:23 -07:00
486f22a594
remove redundant async
2022-06-15 22:31:55 +08:00
079857d6c0
update test case
2022-06-15 22:27:41 +08:00
2aafd39650
add fastcompany.com to the non-script hosts list
2022-06-15 22:27:25 +08:00
e028e2e440
generate test page for fast company
2022-06-15 21:22:25 +08:00
5625a055d8
Update test case
2022-06-13 17:36:23 +08:00
64673bd6db
Add test case
2022-06-13 15:15:03 +08:00
8eecdd60e3
Add test case for https://infoproc.blogspot.com/2022/05/theodore-postol-nuclear-weapons-missile.html
2022-06-09 16:37:21 +08:00
35ca7ede82
Add test case for danluu
2022-06-06 22:37:05 +08:00
704726dc6a
Improve parsing of channel news asia
2022-06-03 14:03:14 -07:00
304fe70113
Fix tests
2022-06-01 10:30:23 +08:00
98ecdcff80
Fix test case of setting tweet-placeholder
2022-05-31 22:51:27 +08:00
404805e0c0
Make async calls to parse()
2022-05-31 22:51:00 +08:00
417ed0a4eb
Fetch tweet id from url
2022-05-31 20:02:54 +08:00
b6fef171be
If we have a node with only one child element which has the placeholder class, keep it
2022-05-31 14:18:39 +08:00
22f5e1cc32
Fix embedded tweets being deleted when simplifying nested elements
2022-05-31 13:42:16 +08:00
cc5bdf96f5
Update electrek test case
2022-05-31 12:52:33 +08:00
bce50c2a92
Update electrek test case
2022-05-31 11:11:06 +08:00
9dc7fd4c4c
Add test case for electrek
2022-05-30 22:47:47 +08:00
35dcd00ec3
Add test case for github blog
2022-05-18 15:52:47 +08:00
0e4cec5e25
Update tests
2022-05-18 11:28:47 +08:00
6795508942
Add test for readability on youtube embeds
2022-05-18 09:55:36 +08:00
bdfa76d716
Remove listnav elements from articles
2022-05-12 15:58:24 -07:00
d542d31aed
Fix gflownet test generation
2022-05-11 21:23:30 +08:00
82fb8151a4
Fix generate tests
2022-05-10 21:10:20 +08:00
96b543946d
Temporarily disable customer content serializer test
2022-05-10 21:10:06 +08:00
ffa5dee721
Use linkedom in readability tests
2022-05-10 18:40:52 +08:00
cb7f30607a
Use linkedom in readability test isProbablyReaderable
2022-05-10 18:33:29 +08:00
6ef14e1f91
Mark ft.com page as readerable
2022-05-05 09:28:22 -07:00
384c5dbf9f
Improve rendering of the Financial Times
2022-05-05 09:14:51 -07:00
74693d40c0
Mark page as readerable
2022-05-04 15:01:35 -07:00
8386aebaf8
Remove ads from fiercepharma page
2022-05-04 14:22:08 -07:00
a24b976546
Remove lazy loaded srcset elements
...
Some tools, like Jetpack (https://jetpack.com/support/lazy-images/),
use a temporary srcset attribute set to a data image when lazy
loading; it is later removed by JS. To remove these, we test whether
there is a valid src attribute and whether the srcset attribute is a
data embed.
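
The check described above could look roughly like the following sketch. It is not the code from this commit: the `ImageAttrs` shape and the names `isDataUri` and `stripLazySrcset` are hypothetical, and a real implementation would read the attributes from a DOM element rather than a plain object.

```typescript
// Hypothetical attribute bag standing in for an <img> element's attributes.
interface ImageAttrs {
  src?: string;
  srcset?: string;
}

// True when the value is an inline data URI rather than a fetchable URL.
function isDataUri(value: string): boolean {
  return value.trim().toLowerCase().startsWith("data:");
}

// Drop a placeholder srcset only when a real (non-data) src is present.
function stripLazySrcset(attrs: ImageAttrs): ImageAttrs {
  const hasValidSrc = !!attrs.src && !isDataUri(attrs.src);
  const hasPlaceholderSrcset = !!attrs.srcset && isDataUri(attrs.srcset);
  if (hasValidSrc && hasPlaceholderSrcset) {
    // Keep src, discard the lazy-loading placeholder.
    return { src: attrs.src };
  }
  return attrs;
}
```

A srcset that points at real image files is left untouched, so only the lazy-loading placeholder case is affected.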
2022-04-29 10:05:33 -07:00
d2bb359f5c
Handle srcset image density specifications
2022-04-26 15:18:23 -07:00
72a231c97e
Do not proxy image data uri ( #421 )
...
* do not proxy image data uri
* rename data uri
* rename data uri in test
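
As an illustration of the rule in this commit, a proxy helper can return data URIs unchanged. This is a sketch only: `proxiedImageUrl`, the `proxyBase` parameter, and the `?url=` query format are assumptions, not the project's actual proxy API.

```typescript
// Hypothetical helper: build a proxied image URL, leaving data URIs alone.
function proxiedImageUrl(url: string, proxyBase: string): string {
  // A data URI already carries the image bytes inline, so there is nothing to proxy.
  if (url.trim().toLowerCase().startsWith("data:")) {
    return url;
  }
  // Assumed query format, for illustration only.
  return `${proxyBase}?url=${encodeURIComponent(url)}`;
}
```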
2022-04-14 12:59:12 +08:00
4d01f689b2
replace tables of article content with divs for newsletters
2022-04-11 20:00:11 +08:00
6d405432af
add site_name and site_icon to page model and return in resolver ( #341 )
...
* add site_name and site_icon to page model and return in resolver
* fix tests
2022-03-30 10:43:10 +08:00
12af64609c
Fix readability issues with null style elements
...
isProbablyVisible can fail in this case because style could be
undefined on an element.
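
A minimal sketch of a guard for this failure mode, assuming a simplified node shape. `NodeLike` is hypothetical and the function body is not the code from this commit; it only shows how optional chaining avoids reading `display` from a missing `style`.

```typescript
// Hypothetical minimal node shape; some DOM implementations leave style unset.
interface NodeLike {
  style?: { display?: string } | null;
  hasAttribute(name: string): boolean;
}

// Treat a node as visible unless it is explicitly hidden.
function isProbablyVisible(node: NodeLike): boolean {
  // Optional chaining yields undefined instead of throwing when style is null or undefined.
  const display = node.style?.display;
  return display !== "none" && !node.hasAttribute("hidden");
}
```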
2022-03-23 13:35:00 -07:00
960a22d50c
Fix/city journal parsing ( #266 )
...
* remove arrow image when parsing
* ignore m_article classname element which indicates a mobile version of the website
* generate test page for city journal
2022-03-21 22:53:21 +08:00
0361ef86fa
Better handling of HTML entities in descriptions
...
The HTML code method didn't implement all possible
entities, causing some (usually rquote) to display.
2022-03-14 11:02:08 -07:00
fc7d972855
Fix typo in readability date handling causing this parse issue
...
Can remove our special handler for the published date now that we
are pulling it out correctly.
2022-03-14 10:20:19 -07:00
8a2bb0f49d
Handle blogger sites that display the full feed on the article page
2022-03-10 13:48:39 -08:00
234dba9174
Improve readability of channelnewsasia
...
This uses negative lookahead to reject nodes that have outstream
ads embedded. Previously they were being accepted because they
contained `$article` in the class name.
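
The negative-lookahead idea can be sketched as follows. The pattern and the `outstream` and `article` tokens are assumptions chosen for illustration; the actual expression in the commit may differ.

```typescript
// Match class names containing "article", unless "outstream" appears anywhere in them.
// (?!.*outstream) is a negative lookahead anchored at the start of the string.
const ARTICLE_WITHOUT_OUTSTREAM = /^(?!.*outstream).*article/i;

// Hypothetical helper wrapping the pattern.
function looksLikeArticle(className: string): boolean {
  return ARTICLE_WITHOUT_OUTSTREAM.test(className);
}
```

Because the lookahead runs before the rest of the pattern, a class name that mentions both tokens is rejected regardless of their order.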
2022-03-04 10:34:44 -08:00
84f32935f5
Open source omnivore
2022-02-11 09:24:33 -08:00