Commit Graph

121 Commits

SHA1 Message Date
4d0f1bec88 Add support for embedding TikTok videos 2024-05-13 13:30:54 +08:00
e031f4f81c deprioritize jsonld preview image 2024-03-19 15:51:20 +08:00
4cfe4f95ae Remove bottom-wrapper element which is added on NYT and some other sites RSS feeds 2024-02-20 11:26:21 +08:00
f33ff113db Improve readability for the verge 2024-02-15 10:57:36 +08:00
d5d581fc54 remove redundant logs 2024-01-29 14:32:43 +08:00
fd7c2ffb49 fix code blocks not formatted correctly in articles from wechat official accounts 2024-01-25 16:30:59 +08:00
dae90aaffa fix code blocks not formatted correctly 2024-01-25 16:30:59 +08:00
b3f052860a Merge pull request #3175 from omnivore-app/fix/detect-language (detect language from html content) 2024-01-03 17:50:01 +08:00
1a371a80b1 remove hidden labels from substack post by readability 2023-12-20 16:18:25 +08:00
77110640a1 detect language from content 2023-11-27 23:09:20 +08:00
10a21adc33 detect language from content if locale not found in metadata 2023-11-27 22:54:39 +08:00
310ad5de1d get published date from url 2023-09-28 17:35:33 +08:00
45b7c2b619 get published date from time element 2023-09-28 17:14:17 +08:00
55e274a32c better match of published date and avoid removing date string which is not a published date 2023-09-28 10:34:05 +08:00
3399213328 add test cases from economist and caixin 2023-09-27 15:32:42 +08:00
60b7d500a2 fix long published date not parsed correctly 2023-09-26 21:41:27 +08:00
d37cb7fda1 fix published date in chinese not parsed correctly 2023-09-26 20:48:01 +08:00
e38411af33 Boost content length of emoji 2023-09-12 16:49:05 +08:00
08dbe2dead Handle the ignore density check in the getLinkDensity function 2023-09-12 16:04:58 +08:00
293becf596 Ignore link density checks in newsletters 2023-09-12 15:53:43 +08:00
51a2029f65 fix title not fetched correctly for some chinese websites 2023-08-16 10:45:48 +08:00
3119471d1c do not remove QuestionHeader 2023-08-15 21:31:19 +08:00
94b7399b1c Add points for any commas within this paragraph 2023-08-15 21:21:17 +08:00
2f0c830843 Improve readability for lesswrong.com 2023-07-24 13:13:50 +08:00
48ed5ec745 Hide webflow test elements 2023-07-24 12:35:18 +08:00
a964c59d80 fix: missing links (skip removing <a> elements with published date in the url) 2023-06-26 11:50:50 +08:00
2fbee1e831 Remove webflow invisible elements 2023-04-28 19:57:13 +08:00
fbb638619c Mark the related stories and social buttons as unlikely candidates 2023-04-19 17:04:01 +08:00
add54b1e35 For lazy loaded images use their lazy src as the src URL 2023-04-11 10:58:06 +08:00
eb58bf11ba Force to use content handler of piped.video when saving from extensions 2023-04-10 20:52:09 +08:00
deff73953a Do not delete embedded iframe of piped video 2023-04-06 16:30:52 +08:00
2378abef4a Merge pull request #1962 from omnivore-app/fix/newline-in-author (Remove \n and extra spaces from author, then trim it) 2023-03-31 10:21:42 +08:00
f77aae9810 Remove \n and extra spaces from author, then trim it 2023-03-30 21:55:41 +08:00
db687f151b Strip the tl_article_header element 2023-03-30 19:31:43 +08:00
895e50201a Fix tests 2023-03-15 19:36:53 +08:00
aeb09539cc Fallback to hostname 2023-03-15 13:24:17 +08:00
aae6759bcb return published date if the class name is omnivore-published-date which we added when we scraped the article 2023-03-13 12:08:01 +08:00
1b58804547 Add points for any commas (including those in CJK language) 2023-02-15 17:12:28 +08:00
fc0bbe391a Merge pull request #1805 from omnivore-app/fix/content-parsing (fix/content parsing) 2023-02-14 14:15:46 +08:00
cc8b1cefdb Preserve <pre> elements with prism- class and identify them as code blocks 2023-02-14 12:33:59 +08:00
9fc77c62d6 Merge pull request #1795 from omnivore-app/feat/fallback-urls-for-images (Add the original URL as a fallback when creating URL proxies) 2023-02-13 16:55:24 +08:00
6b4c34bec1 Add wechat test page 2023-02-10 13:57:21 +08:00
af55adbef8 Add the original URL as a fallback when creating URL proxies 2023-02-09 17:06:14 +08:00
48255ffbf9 Fix not showing images in wechat articles 2023-01-30 15:03:55 +08:00
963e768996 Fix not showing images in wechat articles 2023-01-30 15:03:43 +08:00
57289cb0c4 Improve label search in the apply views 2023-01-23 12:43:27 +08:00
a0f51c94ee Add another social media footer class to unlikely candidates 2023-01-03 12:33:12 +08:00
7c39db207b Replace createArticle with savePage in puppeteer-parse service 2022-12-28 10:15:05 +08:00
f2feb6fa6e Handle pulling locale from open graph metadata. This also fixes parsing language codes which are split by an underscore. 2022-12-16 09:19:07 +08:00
ce44c8e529 Default parent classes being an empty array 2022-11-30 11:59:42 +08:00