f2feb6fa6e
Handle pulling locale from Open Graph metadata
...
This also fixes parsing of language codes that are split by an
underscore.
2022-12-16 09:19:07 +08:00
ce44c8e529
Default parent classes to an empty array
2022-11-30 11:59:42 +08:00
548f317607
Skip cleaning the first few paragraphs of NYTimes articles, which have too many links
2022-11-30 11:04:05 +08:00
cc77e72a5b
Correctly parse names out of itemprop/author segments
...
If these nodes are set up correctly with structured data, parse
out the name instead of taking the entire textContent.
2022-11-25 10:20:00 +08:00
5d9b705be3
Fix issue where stripping classNames could cause a crash
2022-11-17 15:50:32 +08:00
1b62ada73e
Add the overlay element to unlikelyCandidates and give the player element a positive score
2022-11-17 11:27:51 +08:00
13e925c503
Allow Kaltura videos through in readability
2022-11-17 11:27:51 +08:00
81c125ed50
Handle cases where className is a dictionary instead of a string
2022-11-01 12:16:29 +08:00
9b53b09d51
Fix check for isOmnivoreNode
2022-11-01 10:52:29 +08:00
4afd598ada
Check the correct node when detecting omnivore nodes
2022-11-01 10:31:50 +08:00
cc91e43572
Handle embedded tweets in substack emails
...
This does a few things:
- tags static tweets found in substack emails with a special class
- upgrades readability to ignore special class names
- reduces some readability debug output
2022-10-31 21:28:36 +08:00
303165d47c
Add icon to the ignored matchlist
2022-10-27 12:06:38 +08:00
990759da73
Save base64 encoded image site icon in page
2022-10-18 15:32:30 +08:00
3f82c0af01
Handle cases where tweet items don't have parents
2022-10-18 13:46:37 +08:00
57676d381c
Better handle <a> elements with no parent when looking for tweet placeholders
2022-10-18 12:13:54 +08:00
57575c75aa
Remove the subscriber dialog that was part of the header before
2022-10-12 14:31:34 +08:00
0116304284
Don't exclude substacks in article headers
...
Substack uses this class on header elements like <h1> to give
them anchors in the documents, but we mark content with `header`
in its class name as unlikely.
2022-10-12 14:27:31 +08:00
7187326d90
Use li-date and post-tag selectors instead of post-meta, as post-meta usually has useful data
2022-10-04 15:06:54 +08:00
8a4777011f
Improve readability of dev.to pages
2022-10-04 13:40:26 +08:00
3adf0fe428
Update dock removal for NYT
2022-09-28 18:21:49 +08:00
e8213eff97
Revert typo
2022-09-28 16:51:00 +08:00
9d223bb4e2
Better handling of NYT podcast transcripts
2022-09-28 16:31:35 +08:00
8ff356d3bc
Keep the market-info class in Morning Brew newsletters
2022-09-15 16:22:12 +08:00
fa731acd74
Create new newsletter if old one is deleted
2022-09-14 19:14:58 +08:00
6435a20246
Remove social links
2022-09-14 16:59:21 +08:00
1563ef9131
replace tables whose role is not presentation with divs for newsletters
2022-09-14 16:59:21 +08:00
1be0dd50ef
Remove anchors whose class names contain "tw-text-substack-secondary", as they are used for Substack subscription
2022-09-08 22:19:27 +08:00
f0f7aa5a6d
rename jsdom to linkedom
2022-07-14 10:59:44 +08:00
b317a0877b
Remove foot and footnote from negative score list
2022-07-07 10:48:04 +08:00
7957cd2126
Do not remove content in .menu-opacity element
2022-06-27 22:27:38 +08:00
bcebe738ff
Update Readability-readerable
2022-06-27 11:56:21 +08:00
a3e5d6d817
Remove ad
2022-06-27 11:40:18 +08:00
2504348936
Fix getting nodes from null document element
2022-06-27 11:09:38 +08:00
324c4ee6e7
Give .container-banners higher weight
2022-06-23 21:58:37 +08:00
86a9383b53
Do not remove content in .container-banners class
2022-06-23 21:11:23 +08:00
8535534709
Fix classname = null exception by checking element parent node nullability before checking classname
2022-06-16 18:09:23 +08:00
ff87371d21
Make the tweet a single child in the parent class=tweet element even if there is more than one Twitter link in it
2022-06-13 17:36:15 +08:00
f8dd405c3f
Ignore link density for links inside the .post-body div (the main content)
2022-06-09 16:37:03 +08:00
704726dc6a
Improve parsing of Channel News Asia
2022-06-03 14:03:14 -07:00
0b0edd3e69
Make fetching tweet url async
2022-05-31 22:50:14 +08:00
417ed0a4eb
Fetch tweet id from url
2022-05-31 20:02:54 +08:00
b6fef171be
If we have a node with only one child element which has the placeholder class, keep it
2022-05-31 14:18:39 +08:00
22f5e1cc32
Fix embedded tweets being deleted when simplifying nested elements
2022-05-31 13:42:16 +08:00
a34806a782
Fix tests
2022-05-26 10:55:21 +08:00
7d4d1d7b67
Parse language in readability
2022-05-26 10:55:21 +08:00
bdfa76d716
Remove listnav elements from articles
2022-05-12 15:58:24 -07:00
2755da16a9
Fix not getting iframe src
2022-05-11 19:25:12 +08:00
2152a9e466
Fix bug when getting embedded class lists
2022-05-10 16:57:38 +08:00
76d47f7dc5
Fix updating live collections
2022-05-10 16:57:02 +08:00
79a941a2b6
Use options.url by default if it exists
2022-05-10 16:56:09 +08:00