310ad5de1d
get published date from url
2023-09-28 17:35:33 +08:00
45b7c2b619
get published date from time element
2023-09-28 17:14:17 +08:00
55e274a32c
better matching of published date; avoid removing date strings that are not a published date
2023-09-28 10:34:05 +08:00
3399213328
add test cases from economist and caixin
2023-09-27 15:32:42 +08:00
60b7d500a2
fix long published date not parsed correctly
2023-09-26 21:41:27 +08:00
d37cb7fda1
fix published date in chinese not parsed correctly
2023-09-26 20:48:01 +08:00
e38411af33
Boost content length of emoji
2023-09-12 16:49:05 +08:00
08dbe2dead
Handle the ignore density check in the getLinkDensity function
2023-09-12 16:04:58 +08:00
293becf596
Ignore link density checks in newsletters
2023-09-12 15:53:43 +08:00
51a2029f65
fix title not fetched correctly for some chinese websites
2023-08-16 10:45:48 +08:00
3119471d1c
do not remove QuestionHeader
2023-08-15 21:31:19 +08:00
94b7399b1c
Add points for any commas within this paragraph
2023-08-15 21:21:17 +08:00
2f0c830843
Improve readability for lesswrong.com
2023-07-24 13:13:50 +08:00
48ed5ec745
Hide webflow test elements
2023-07-24 12:35:18 +08:00
a964c59d80
fix: missing links
...
* skip removing <a> elements with published date in the url
2023-06-26 11:50:50 +08:00
2fbee1e831
Remove webflow invisible elements
2023-04-28 19:57:13 +08:00
fbb638619c
Mark the related stories and social buttons as unlikely candidates
2023-04-19 17:04:01 +08:00
add54b1e35
For lazy loaded images use their lazy src as the src URL
2023-04-11 10:58:06 +08:00
eb58bf11ba
Force to use content handler of piped.video when saving from extensions
2023-04-10 20:52:09 +08:00
deff73953a
Do not delete embedded iframe of piped video
2023-04-06 16:30:52 +08:00
2378abef4a
Merge pull request #1962 from omnivore-app/fix/newline-in-author
...
Remove \n and extra spaces from author, then trim it
2023-03-31 10:21:42 +08:00
f77aae9810
Remove \n and extra spaces from author, then trim it
2023-03-30 21:55:41 +08:00
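A minimal sketch of the author cleanup this commit describes, assuming it amounts to collapsing whitespace runs (including newlines) into single spaces and trimming the ends. The function name `cleanAuthor` is hypothetical and not taken from the repository.

```typescript
// Hypothetical helper: the name and exact behavior are assumptions for illustration.
// Collapses every run of whitespace (spaces, tabs, \n) into one space,
// then strips leading and trailing whitespace.
function cleanAuthor(raw: string): string {
  return raw.replace(/\s+/g, " ").trim();
}
```

For example, `cleanAuthor("  Jane\n   Doe \n")` yields `"Jane Doe"`.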
db687f151b
Strip the tl_article_header element
2023-03-30 19:31:43 +08:00
895e50201a
Fix tests
2023-03-15 19:36:53 +08:00
aeb09539cc
Fallback to hostname
2023-03-15 13:24:17 +08:00
aae6759bcb
return published date if the class name is omnivore-published-date which we added when we scraped the article
2023-03-13 12:08:01 +08:00
1b58804547
Add points for any commas (including those in CJK language)
2023-02-15 17:12:28 +08:00
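As a rough illustration of counting commas across scripts, the sketch below awards one point per comma-like character. The function name `commaPoints` and the exact character set are assumptions for illustration; the project's own list may differ.

```typescript
// Assumed comma set: ASCII comma, Arabic comma, ideographic comma,
// presentation-form commas, small comma, and fullwidth comma.
const COMMAS = /[\u002C\u060C\u3001\uFE10\uFE11\uFE50\uFF0C]/g;

// Hypothetical scoring helper: one point per comma found in the paragraph text.
function commaPoints(text: string): number {
  const matches = text.match(COMMAS);
  return matches ? matches.length : 0;
}
```

With this set, a Chinese sentence punctuated with fullwidth or ideographic commas scores the same way as an English sentence with ASCII commas.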
fc0bbe391a
Merge pull request #1805 from omnivore-app/fix/content-parsing
...
fix/content parsing
2023-02-14 14:15:46 +08:00
cc8b1cefdb
Preserve <pre> elements with prism- class and identify them as code blocks
2023-02-14 12:33:59 +08:00
9fc77c62d6
Merge pull request #1795 from omnivore-app/feat/fallback-urls-for-images
...
Add the original URL as a fallback when creating URL proxies
2023-02-13 16:55:24 +08:00
6b4c34bec1
Add wechat test page
2023-02-10 13:57:21 +08:00
af55adbef8
Add the original URL as a fallback when creating URL proxies
2023-02-09 17:06:14 +08:00
48255ffbf9
Fix not showing images in wechat articles
2023-01-30 15:03:55 +08:00
963e768996
Fix not showing images in wechat articles
2023-01-30 15:03:43 +08:00
57289cb0c4
Improve label search in the apply views
2023-01-23 12:43:27 +08:00
a0f51c94ee
Add another social media footer class to unlikely candidates
2023-01-03 12:33:12 +08:00
7c39db207b
Replace createArticle with savePage in puppeteer-parse service
2022-12-28 10:15:05 +08:00
f2feb6fa6e
Handle pulling locale from open graph metadata
...
This also fixes parsing language codes which are split by an
underscore.
2022-12-16 09:19:07 +08:00
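A minimal sketch of handling underscore-separated language codes such as the `en_US` form commonly seen in Open Graph `og:locale` values. The function name `normalizeLocale` and the choice to emit a hyphenated `lang-REGION` string are assumptions for illustration, not the commit's actual implementation.

```typescript
// Hypothetical helper: accepts "en_US", "en-us", or "en" and returns a
// hyphen-separated code with a lowercase language and uppercase region.
function normalizeLocale(raw: string): string {
  // Split on either separator and drop empty pieces.
  const parts = raw.trim().split(/[_-]/).filter((p) => p.length > 0);
  if (parts.length === 0) return "";
  const lang = parts[0].toLowerCase();
  const region = parts.length > 1 ? parts[1].toUpperCase() : "";
  return region ? `${lang}-${region}` : lang;
}
```

Under these assumptions, `normalizeLocale("en_US")` returns `"en-US"` and `normalizeLocale("fr")` returns `"fr"`.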
ce44c8e529
Default parent classes being an empty array
2022-11-30 11:59:42 +08:00
548f317607
Skip cleaning the first few paragraphs of nytimes articles, which have too many links
2022-11-30 11:04:05 +08:00
cc77e72a5b
Correctly parse names out of itemprop/author segments
...
If these nodes are set up correctly with structured data, parse
out the name instead of taking the entire textContent.
2022-11-25 10:20:00 +08:00
5d9b705be3
Fix issue where stripping classNames could cause a crash
2022-11-17 15:50:32 +08:00
1b62ada73e
Make overlay elements unlikelyCandidates and give player elements a positive score
2022-11-17 11:27:51 +08:00
13e925c503
Allow kaltura videos through in readability
2022-11-17 11:27:51 +08:00
81c125ed50
Handle cases where className is a dictionary instead of string
2022-11-01 12:16:29 +08:00
9b53b09d51
Fix check for isOmnivoreNode
2022-11-01 10:52:29 +08:00
4afd598ada
Check the correct node when detecting omnivore nodes
2022-11-01 10:31:50 +08:00
cc91e43572
Handle embedded tweets in substack emails
...
This does a few things:
- tags static tweets found in substack emails with a special class
- upgrades readability to ignore special class names
- reduces some readability debug output
2022-10-31 21:28:36 +08:00
303165d47c
Add icon to the ignored matchlist
2022-10-27 12:06:38 +08:00
990759da73
Save base64 encoded image site icon in page
2022-10-18 15:32:30 +08:00