Commit Graph

155 Commits

Author SHA1 Message Date
48255ffbf9 Fix not showing images in wechat articles 2023-01-30 15:03:55 +08:00
963e768996 Fix not showing images in wechat articles 2023-01-30 15:03:43 +08:00
57289cb0c4 Improve label search in the apply views 2023-01-23 12:43:27 +08:00
0edd91057e Update user interface for file import tool 2023-01-03 17:49:07 +08:00
742ff7aa69 Update test case user agent 2023-01-03 12:33:50 +08:00
a0f51c94ee Add another social media footer class to unlikely candidates 2023-01-03 12:33:12 +08:00
51544dfa50 Use same user agents in generate-testcase as in puppeteer 2023-01-03 11:48:07 +08:00
abd1b5916f Merge pull request #1490 from omnivore-app/fix/parse-guardian
Fix parsing guardian news
2022-12-29 14:27:43 +08:00
7c39db207b Replace createArticle with savePage in puppeteer-parse service 2022-12-28 10:15:05 +08:00
3277b1d229 Add another testcase for locale parsing 2022-12-16 09:24:36 +08:00
f2feb6fa6e Handle pulling locale from open graph metadata
This also fixes parsing language codes which are split by an
underscore.
2022-12-16 09:19:07 +08:00
34029bee7e Update getting started guide, improve section on newsletter forwards, fix link to Logseq guide 2022-12-14 17:59:32 +08:00
6463231edc Generate test page 2022-12-01 14:44:31 +08:00
ce44c8e529 Default parent classes to an empty array 2022-11-30 11:59:42 +08:00
4bedfd2abe Fix test 2022-11-30 11:10:26 +08:00
548f317607 Skip cleaning the first few paragraphs of nytimes articles, which have too many links 2022-11-30 11:04:05 +08:00
2564c20f86 generate test page for nytimes.com 2022-11-29 22:17:46 +08:00
754ee36773 Add a test case from stackoverflow for author parsing 2022-11-25 10:45:23 +08:00
cc77e72a5b Correctly parse names out of itemprop/author segments
If these nodes are set up correctly with structured data, parse
out the name instead of taking the entire textContent.
2022-11-25 10:20:00 +08:00
5d9b705be3 Fix issue where stripping classNames could cause a crash 2022-11-17 15:50:32 +08:00
1b62ada73e Make overlay element an unlikelyCandidates and give player element positive score 2022-11-17 11:27:51 +08:00
13e925c503 Allow kaltura videos through in readability 2022-11-17 11:27:51 +08:00
4ee5675e23 Add gdcvault parsing test-case 2022-11-17 11:27:51 +08:00
81c125ed50 Handle cases where className is a dictionary instead of string 2022-11-01 12:16:29 +08:00
9b53b09d51 Fix check for isOmnivoreNode 2022-11-01 10:52:29 +08:00
4afd598ada Check the correct node when detecting omnivore nodes 2022-11-01 10:31:50 +08:00
cc91e43572 Handle embedded tweets in substack emails
This does a few things:
- tags static tweets found in substack emails with a special class
- upgrades readability to ignore special class names
- reduces some readability debug output
2022-10-31 21:28:36 +08:00
e01ff35a01 Add test case from brookings.edu
This article has a bunch of embedded icons in the content, things
like an indicator for an external link.
2022-10-27 12:06:40 +08:00
303165d47c Add icon to the ignored matchlist 2022-10-27 12:06:38 +08:00
990759da73 Save base64 encoded image site icon in page 2022-10-18 15:32:30 +08:00
3f82c0af01 Handle cases where tweet items don't have parents 2022-10-18 13:46:37 +08:00
57676d381c Better handle <a> elements with no parent when looking for tweet placeholders 2022-10-18 12:13:54 +08:00
08a1d4af91 Update the Getting started guide for new users 2022-10-17 17:11:25 +08:00
7af068e8dc Header fix adds the headers back in for this test case 2022-10-12 14:51:59 +08:00
989f0e2e31 Add test from the new Omnivore Getting Started article 2022-10-12 14:34:36 +08:00
57575c75aa Remove the subscriber dialog that was part of the header before 2022-10-12 14:31:34 +08:00
0116304284 Don't exclude substacks in article headers
Substack uses this class on header elements like <h1> to give
them anchors in the documents, but we mark content with `header`
in its class name as unlikely.
2022-10-12 14:27:31 +08:00
7187326d90 Use li-date and post-tag selectors instead of post-meta as that usually has useful data 2022-10-04 15:06:54 +08:00
8a4777011f Improve readability of dev.to pages 2022-10-04 13:40:26 +08:00
57ca5ed6f8 Merge pull request #1246 from omnivore-app/feature/subscription-icon
feature/subscription icon
2022-09-28 22:10:23 +08:00
4aceac5f04 Revert "Get favicon from URL if not found from doc"
This reverts commit 7306d65642.
2022-09-28 18:25:11 +08:00
3adf0fe428 Update dock removal for NYT 2022-09-28 18:21:49 +08:00
08329ba5cf Update NYT test page 2022-09-28 18:20:12 +08:00
7306d65642 Get favicon from URL if not found from doc 2022-09-28 17:18:00 +08:00
e8213eff97 Revert typo 2022-09-28 16:51:00 +08:00
ae5b9a3fd3 Add tests for NYT podcasts page 2022-09-28 16:32:12 +08:00
9d223bb4e2 Better handling of NYT podcast transcripts 2022-09-28 16:31:35 +08:00
3710a13890 Add more test pages 2022-09-27 22:26:27 +08:00
3524f77339 Add more test pages 2022-09-27 22:23:31 +08:00
2a97284f5b Add more test pages 2022-09-27 22:11:43 +08:00
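
Commit f2feb6fa6e in the list above mentions pulling the locale from Open Graph metadata and fixing language codes that are split by an underscore (Open Graph's `og:locale` uses the `language_TERRITORY` form, such as `en_US`). The following is a minimal sketch of that kind of normalization, not the repository's code: the function name `normalize_locale`, the regular expression, and the choice to return `None` for unrecognized input are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption, not taken from the repository):
# a 2-3 letter language code, optionally followed by "-" or "_" and
# either a 2-letter region or a 3-digit area code.
_LOCALE_RE = re.compile(r"^([A-Za-z]{2,3})(?:[-_]([A-Za-z]{2}|\d{3}))?$")


def normalize_locale(raw: Optional[str]) -> Optional[str]:
    """Normalize values like 'en_US' or 'en-us' to the hyphenated form 'en-US'.

    Returns None when the input is empty or does not look like a locale.
    """
    if not raw:
        return None
    match = _LOCALE_RE.match(raw.strip())
    if match is None:
        return None
    language, region = match.groups()
    language = language.lower()
    # Keep the region only when one was present in the input.
    return f"{language}-{region.upper()}" if region else language
```

Accepting both separators in one pattern means the same helper could handle an `og:locale` value and an HTML `lang` attribute; input that matches neither form is rejected rather than guessed at.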