Commit Graph

176 Commits

SHA1 Message Date
aae6759bcb Return published date if the class name is omnivore-published-date, which we added when we scraped the article 2023-03-13 12:08:01 +08:00
83bff96699 Update generated html 2023-02-15 09:19:01 +00:00
97cfd1376f Add test pages 2023-02-15 17:18:04 +08:00
1b58804547 Add points for any commas (including those in CJK language) 2023-02-15 17:12:28 +08:00
53c3bfff43 Update generated html 2023-02-14 07:04:19 +00:00
3c97545fc1 Regenerate test pages 2023-02-14 15:01:17 +08:00
fc0bbe391a Merge pull request #1805 from omnivore-app/fix/content-parsing
fix/content parsing
2023-02-14 14:15:46 +08:00
7fc468b2bd Update generated html 2023-02-14 04:34:50 +00:00
cc8b1cefdb Preserve <pre> elements with prism- class and identify them as code blocks 2023-02-14 12:33:59 +08:00
4513b5931b Update generated html 2023-02-13 14:14:39 +00:00
69486a8527 Add readability test for yuyue.com 2023-02-13 17:01:48 +08:00
9fc77c62d6 Merge pull request #1795 from omnivore-app/feat/fallback-urls-for-images
Add the original URL as a fallback when creating URL proxies
2023-02-13 16:55:24 +08:00
96a9c80960 Update tests 2023-02-13 15:26:33 +08:00
1a1617e86c Add readability test for habr.com 2023-02-10 17:13:18 +08:00
2f32b01a61 Update generated html 2023-02-10 07:46:02 +00:00
6b4c34bec1 Add wechat test page 2023-02-10 13:57:21 +08:00
15d417410e Add GitHub action to generate static html for readability and distiller test pages 2023-02-10 12:40:12 +08:00
eba25f8307 Update python script to write content into index.html 2023-02-10 12:08:20 +08:00
c285a017c3 Increase view.html iframe height 2023-02-09 22:20:58 +08:00
af55adbef8 Add the original URL as a fallback when creating URL proxies 2023-02-09 17:06:14 +08:00
5903160eb7 Add dom-distiller generated test pages 2023-02-09 15:36:26 +08:00
48255ffbf9 Fix not showing images in wechat articles 2023-01-30 15:03:55 +08:00
963e768996 Fix not showing images in wechat articles 2023-01-30 15:03:43 +08:00
57289cb0c4 Improve label search in the apply views 2023-01-23 12:43:27 +08:00
0edd91057e Update user interface for file import tool 2023-01-03 17:49:07 +08:00
742ff7aa69 Update test case user agent 2023-01-03 12:33:50 +08:00
a0f51c94ee Add another social media footer class to unlikely candidates 2023-01-03 12:33:12 +08:00
51544dfa50 Use same user agents in generate-testcase as in puppeteer 2023-01-03 11:48:07 +08:00
abd1b5916f Merge pull request #1490 from omnivore-app/fix/parse-guardian
Fix parsing guardian news
2022-12-29 14:27:43 +08:00
7c39db207b Replace createArticle with savePage in puppeteer-parse service 2022-12-28 10:15:05 +08:00
3277b1d229 Add another testcase for locale parsing 2022-12-16 09:24:36 +08:00
f2feb6fa6e Handle pulling locale from open graph metadata
This also fixes parsing language codes which are split by an
underscore.
2022-12-16 09:19:07 +08:00
34029bee7e Update getting started guide, improve section on newsletter forwards, fix link to Logseq guide 2022-12-14 17:59:32 +08:00
6463231edc Generate test page 2022-12-01 14:44:31 +08:00
ce44c8e529 Default parent classes being an empty array 2022-11-30 11:59:42 +08:00
4bedfd2abe Fix test 2022-11-30 11:10:26 +08:00
548f317607 Skip cleaning the first few paragraphs of nytimes articles, which have too many links 2022-11-30 11:04:05 +08:00
2564c20f86 generate test page for nytimes.com 2022-11-29 22:17:46 +08:00
754ee36773 Add a test case from stackoverflow for author parsing 2022-11-25 10:45:23 +08:00
cc77e72a5b Correctly parse names out of itemprop/author segments
If these nodes are set up correctly with structured data, parse
out the name instead of taking the entire textContent.
2022-11-25 10:20:00 +08:00
5d9b705be3 Fix issue where stripping classNames could cause a crash 2022-11-17 15:50:32 +08:00
1b62ada73e Make overlay element an unlikelyCandidates and give player element positive score 2022-11-17 11:27:51 +08:00
13e925c503 Allow kaltura videos through in readability 2022-11-17 11:27:51 +08:00
4ee5675e23 Add gdcvault parsing test-case 2022-11-17 11:27:51 +08:00
81c125ed50 Handle cases where className is a dictionary instead of string 2022-11-01 12:16:29 +08:00
9b53b09d51 Fix check for isOmnivoreNode 2022-11-01 10:52:29 +08:00
4afd598ada Check the correct node when detecting omnivore nodes 2022-11-01 10:31:50 +08:00
cc91e43572 Handle embedded tweets in substack emails
This does a few things:
- tags static tweets found in substack emails with a special class
- upgrades readability to ignore special class names
- reduces some readability debug output
2022-10-31 21:28:36 +08:00
e01ff35a01 Add test case from brookings.edu
This article has a bunch of embedded icons in the content, things
like an indicator for an external link.
2022-10-27 12:06:40 +08:00
303165d47c Add icon to the ignored matchlist 2022-10-27 12:06:38 +08:00