Commit Graph

244 Commits

SHA1 Message Date
b3f052860a Merge pull request #3175 from omnivore-app/fix/detect-language: detect language from html content 2024-01-03 17:50:01 +08:00
9922e0b2d8 Update generated html 2023-12-20 08:19:27 +00:00
1a371a80b1 remove hidden labels from substack post by readability 2023-12-20 16:18:25 +08:00
77110640a1 detect language from content 2023-11-27 23:09:20 +08:00
10a21adc33 detect language from content if locale not found in metadata 2023-11-27 22:54:39 +08:00
c4773dc904 Landing page improvements and various supporting improvements 2023-10-24 09:43:39 +01:00
313fd77bef Update generated html 2023-10-20 18:59:22 +08:00
1b1cce7485 disable javascript for the host 2023-10-20 18:59:22 +08:00
3bd43048a4 add test case for forte labs newsletter 2023-10-20 18:59:22 +08:00
2b23b0e002 Update generated html 2023-09-28 09:37:48 +00:00
001403c02d fix tests 2023-09-28 17:36:27 +08:00
310ad5de1d get published date from url 2023-09-28 17:35:33 +08:00
45b7c2b619 get published date from time element 2023-09-28 17:14:17 +08:00
f0abdd654a Update generated html 2023-09-28 02:35:06 +00:00
55e274a32c better match of published date and avoid removing date string which is not a published date 2023-09-28 10:34:05 +08:00
0ccf332ab0 Update generated html 2023-09-27 07:33:55 +00:00
3399213328 add test cases from economist and caixin 2023-09-27 15:32:42 +08:00
60b7d500a2 fix long published date not parsed correctly 2023-09-26 21:41:27 +08:00
d37cb7fda1 fix published date in chinese not parsed correctly 2023-09-26 20:48:01 +08:00
53a6a5e6b9 Update generated html 2023-09-12 08:50:11 +00:00
e38411af33 Boost content length of emoji 2023-09-12 16:49:05 +08:00
08dbe2dead Handle the ignore density check in the getLinkDensity function 2023-09-12 16:04:58 +08:00
293becf596 Ignore link density checks in newsletters 2023-09-12 15:53:43 +08:00
f157063187 Update generated html 2023-08-16 03:58:00 +00:00
ad6ce8077b specially handle zhihu.com 2023-08-16 11:54:48 +08:00
51a2029f65 fix title not fetched correctly for some chinese websites 2023-08-16 10:45:48 +08:00
3119471d1c do not remove QuestionHeader 2023-08-15 21:31:19 +08:00
94b7399b1c Add points for any commas within this paragraph 2023-08-15 21:21:17 +08:00
d955e53fdd generate test page for zhihu 2023-08-15 11:15:19 +08:00
7641a2567e disable extensions too 2023-08-02 16:12:24 +08:00
4eab6ea6d2 remove hardware acceleration 2023-08-02 16:07:43 +08:00
a97fcd1e88 do not use single process in chromium 2023-08-02 15:58:32 +08:00
63cbb3011e upgrade puppeteer and update chromium args 2023-08-02 15:33:15 +08:00
837bea4913 Update generated html 2023-07-24 05:15:02 +00:00
2f0c830843 Improve readability for lesswrong.com 2023-07-24 13:13:50 +08:00
48ed5ec745 Hide webflow test elements 2023-07-24 12:35:18 +08:00
545e396d6e Update generated html 2023-07-24 12:21:51 +08:00
9d49b683f5 Add test for webflow page that includes an embedded textbox 2023-07-24 12:17:45 +08:00
e446634504 Update generated html 2023-06-26 08:41:23 +00:00
244fb4ccb5 fix: removing node with background image 2023-06-26 16:40:14 +08:00
90d41c5e85 Update generated html 2023-06-26 04:03:48 +00:00
a964c59d80 fix: missing links (skip removing <a> elements with published date in the url) 2023-06-26 11:50:50 +08:00
8d3db4161b Update generated html 2023-06-22 09:41:25 +00:00
9d1eb3bfe6 add testcase 2023-06-22 16:32:30 +08:00
2aa2d09cef Update generated html 2023-05-26 09:50:22 +00:00
141266461c add test case for lesswrong 2023-05-26 17:49:12 +08:00
d1c69303b3 fix readability's substack test page failure by mocking the tweet redirect link 2023-05-10 18:09:07 +08:00
736f003b37 Update generated html 2023-04-28 11:58:22 +00:00
2fbee1e831 Remove webflow invisible elements 2023-04-28 19:57:13 +08:00
815aea8375 Update generated html 2023-04-19 09:07:11 +00:00