
88 Commits

Author SHA1 Message Date
27157006c1 use private bucket to upload page events (#244)
* use private bucket to upload page events

* fix tests

* add GCS_UPLOAD_PRIVATE_BUCKET in test env

* allow GCS_UPLOAD_PRIVATE_BUCKET to be empty
2022-03-16 14:39:07 +08:00
e652a6ea8c Rebased version of the elastic PR (#225)
* Add elastic to our docker compose

* add AND/OR/NOT search operations

* add elastic and create article in elastic

* change error code when elastic throws error

* add search pages in elastic

* add search by labels

* Add elastic to GitHub Action

* Update elastic version

* Fix port for elastic

* add url in search query

* Set elastic features when running tests

* add debug logs

* Use localhost instead of service hostname

* refresh elastic after create/update

* update search labels query

* add typescript support

* search pages in elastic

* fix search queries

* use elastic for saving page

* fix test failure

* update getArticle api to use elastic

* use generic get page function

* add elastic migration python script

* fix bulk helper param

* save elastic page id in article_saving_request instead of postgres article_id

* fix page archiving and deleting

* add tests for deleteArticle

* remove custom date type in elastic mappings, which does not exist in older versions of elastic

* fix timestamp format issue

* add tests for save reading progress

* add tests for save file

* optimize search results

* add alias to index

* update migration script to receive env var as params

* Add failing test to validate we don't decrease reading progress

This test is failing with Elastic because we aren't fetching
the reading progress from elastic here, and are fetching it
from postgres.

* Rename readingProgress to readingProgressPercent

This is the name stored in elastic, so fixes issues pulling the
value out.

* Linting

* Add failing test for creating highlights w/elastic

This test fails because the highlight can't be looked up. Is there
a different ID we should be passing in to query for highlights,
or do we need to update the query to look for elastic_id?

* add tests code coverage threshold

* update nyc config

* include more files in test coverage

* change alias name

* update updateContent to update pages in elastic

* remove debug log

* fix createhighlight test

* search pages by alias in elastic

* update set labels and delete labels in elastic

* migration script enumeration

* make BULK_SIZE an env var

* fix pdf search indexing

* debug github action exit issue

* call pubsub when create/update/delete page in elastic

* fix json parsing bug and reduce reading data from file

* replace a deprecated pubsub api call

* debug github action exit issue

* debug github action exit issue

* add handler to upload elastic page data to GCS

* fix tests

* Use http_auth instead of basic_auth

* add index creation and existing postgres tables update in migration script

* fix a typo to connect to elastic

* rename readingProgress to readingProgressPercent

* migrate elastic_page_id in highlights and article_saving_request tables

* update migration script to include number of updated rows

* update db migration query

* read index mappings from file

* fix upload pages to gcs

* fix tests failure due to pageContext

* fix upload file id not exist error

* Handle savedAt & isArchived attributes w/out querying elastic

* Fix prettier issues

* fix content-type mismatching

* revert pageId to linkId because frontend was not deployed yet

* fix newsletters and attachments not being saved in elastic

* put linkId in article for setting labels

* exclude originalHtml from the search results to improve performance

* exclude content from the search results to improve performance

* remove score sorting

* do not refresh immediately to reduce searching and indexing time

* do not replace the backup data in gcs

* fix no article id defined in articleSavingRequest

* add logging of elastic api running time

* reduce home feed pagination size to 15

* reduce home feed pagination size to 10

* stop revalidating first page

* do not use a separate api to fetch reading progress

* Remove unused comment

* get reading progress if it does not exist

* replace ngram tokenizer with standard tokenizer

* fix tests

* remove .env.local

* add sort keyword in searching to sort by score

Co-authored-by: Hongbo Wu <hongbo@omnivore.app>
2022-03-16 12:08:59 +08:00
a4533dc016 Merge pull request #201 from omnivore-app/feature/beehiiv-newsletter-support
Support newsletters hosted on beehiiv
2022-03-15 14:03:55 -07:00
8e1b4fb1a4 Formatting 2022-03-14 15:36:17 -07:00
a81181ee60 Don't make queries for readingProgressPercent unless we have to 2022-03-14 15:22:53 -07:00
78660c886d rm debug 2022-03-13 09:06:15 -07:00
a874482d11 Don't perform an extra query for isArchived 2022-03-13 09:00:25 -07:00
bea7d084c4 SetClaims when creating an email article 2022-03-09 19:45:52 -08:00
f7814a0c4a Remove unused function 2022-03-09 19:45:31 -08:00
b0fe9059a9 Don't try to generate highlight URL previews until share is re-enabled 2022-03-09 09:49:03 -08:00
384a0771a7 Merge pull request #200 from omnivore-app/fix/validate-urls
Use the validateUrl method to validate URLs
2022-03-08 20:28:33 -08:00
c45c408c14 Fix formatting 2022-03-08 15:27:05 -08:00
e8fca4a7a9 Remove debug line 2022-03-08 15:20:24 -08:00
26dadab4aa rm debug, we don't need to set claims on create 2022-03-08 15:10:01 -08:00
b982bf34d6 SetClaims on userArticle create/update in saveEmail 2022-03-08 14:43:10 -08:00
9ae81d7394 Add extra debugging on newsletter save errors 2022-03-08 14:00:35 -08:00
2184c2a8d3 Parse online URLs for beehiiv newsletters 2022-03-07 15:49:44 -08:00
2cb5cc065a Add support for identifying newsletters hosted on beehiiv.com 2022-03-07 15:23:56 -08:00
be7e36ffed Use the validateUrl method to validate URLs 2022-03-07 14:34:32 -08:00
b6fd3e786e Fix parsing authors from page metadata 2022-03-03 19:40:02 -08:00
49092b707d Remove async 2022-03-03 19:31:51 -08:00
05373ba3c7 add method to parse content metadata 2022-03-03 17:10:06 -08:00
c2e08d0e8f Fetch title and author from page metadata if possible 2022-03-03 15:20:58 -08:00
b17f22949c Use the readability title if available 2022-03-03 14:21:30 -08:00
a4bb67deee Use the readability byline for email authors if available 2022-03-03 14:21:03 -08:00
b326a5f8e7 Add more matches on substack icons 2022-03-03 13:42:30 -08:00
484cd78ac5 prettier 2022-03-02 23:14:10 -08:00
65ce8353dc Attempt to pull URLs for probable newsletter emails out of content 2022-03-02 23:09:10 -08:00
21329949e5 Fix to email address in saveNewsletterEmail 2022-03-02 22:06:15 -08:00
02d505a7e2 Add some debugging 2022-03-02 21:39:10 -08:00
8fe9827ad2 If an email appears to be a newsletter, add it to the library 2022-03-02 20:47:53 -08:00
65cc666579 Pass HTML instead of a JSDOM into isProbablyNewsletter to better encapsulate 2022-03-02 20:38:11 -08:00
c4e237927d Allow any on GCP func 2022-03-02 20:27:40 -08:00
e660e41ed5 Fix warnings 2022-03-02 19:49:10 -08:00
9206230659 Better name for the save newsletter service 2022-03-02 19:45:28 -08:00
d66c114a7d Remove file, this is moved into the parser 2022-03-02 16:36:36 -08:00
b5f9478350 Don't mutate function input 2022-03-02 16:34:56 -08:00
f7f83fe080 New function to determine if an HTML blob is probably a newsletter based on its content 2022-03-02 16:31:15 -08:00
505f888e37 Update to latest intercom client 2022-03-01 11:20:31 -08:00
fc9aa9452c Add a flag in readability to retain table elements in newsletter emails (#152)
* add a flag in readability to retain table elements in newsletter emails

* remove header of axios newsletters
2022-03-01 11:49:38 +08:00
7bf454ae91 use dataloader to fetch all labels of a list of linkIds in a single q… (#133)
* use dataloader to fetch all labels of a list of linkIds in a single query and cached

* add labels in GQL query in frontend
2022-02-28 12:13:26 +08:00
328ebc48cb Apply code block highlighting before running DOM clean 2022-02-26 19:04:32 -08:00
fd39923907 Prettier improvements 2022-02-26 16:38:43 -08:00
84fbc9cd27 Add code highlighting using highlight.js 2022-02-26 14:57:59 -08:00
42f2cffdf8 remove debug log 2022-02-24 14:47:33 +08:00
276205d52a add linkId and labels in article type 2022-02-24 14:44:49 +08:00
c7841b8e8a remove duplicate links because of join with link_labels table 2022-02-24 14:44:49 +08:00
fc18004a5d add labels in search query 2022-02-24 14:44:49 +08:00
edb9339897 add labels and link_labels table name 2022-02-24 14:44:49 +08:00
5651c92efe ignore case when checking name uniqueness 2022-02-24 14:44:49 +08:00