| Commit | Message | Date |
| --- | --- | --- |
| 706ab6075d | double capacity of content-fetch worker | 2024-08-23 12:23:22 +08:00 |
| eec2734f5a | keep up to 1 hour complete tasks and up to 1 day failed tasks | 2024-08-22 15:27:53 +08:00 |
| f0865654dc | fix: allow content-fetch to still process http requests | 2024-08-22 15:14:41 +08:00 |
| aa2caca944 | do not cache content from https://jacksonh.org | 2024-08-21 18:47:23 +08:00 |
| 0366c426bc | process up to 2 jobs concurrently | 2024-08-21 17:58:58 +08:00 |
| 6de285432d | fix importer status not updated if failed to fetch content | 2024-08-21 16:53:50 +08:00 |
| 72d89308c5 | process up to 10 jobs concurrently | 2024-08-21 12:24:35 +08:00 |
| 0fbc6d0a87 | update importer to use content-fetch queue | 2024-08-21 12:24:35 +08:00 |
| 5bd272dde0 | use one queue with different priority for fetching content of rss feed item or saved url | 2024-08-21 12:24:35 +08:00 |
| d29ac109cb | slow and rss queue process 5 tasks/s and normal queue process 100 tasks per second | 2024-08-21 12:24:35 +08:00 |
| 34edbeba56 | fix dockerfile | 2024-08-21 12:24:35 +08:00 |
| 08fbb8aebf | use different queues for fast,slow and rss content fetch jobs | 2024-08-21 12:24:35 +08:00 |
| 87b4ec503e | enqueue content-fetch task to the queue | 2024-08-21 12:24:35 +08:00 |
| e3eae1c96c | create a worker to process content-fetch job | 2024-08-21 12:24:35 +08:00 |
| 4674321531 | reduce blocking domain to 1 hour | 2024-08-18 12:37:10 +08:00 |
| 322f736fe0 | stop storing original html in the database | 2024-07-31 19:14:38 +08:00 |
| 0e0c4bddac | block failed domains | 2024-07-24 16:55:50 +08:00 |
| 31fe4b65a0 | remove readability from content-fetch | 2024-07-24 12:53:41 +08:00 |
| 29a5b20d2c | remove scrapingbee from content-fetch | 2024-07-24 12:17:13 +08:00 |
| 75338f5927 | bypass cloudflare captcha | 2024-07-10 14:43:47 +08:00 |
| 73e180f43d | add more dependencies to docker container | 2024-07-09 19:16:21 +08:00 |
| c75cbb39d6 | injecting webgl fingerprint | 2024-07-09 14:11:31 +08:00 |
| dd01202374 | do not cache some urls | 2024-07-05 19:46:18 +08:00 |
| 728059c6f8 | do not cache some urls | 2024-07-05 19:05:36 +08:00 |
| b38b28c75e | create a browser singleton instance and checks browser existence before creating context | 2024-07-04 19:12:42 +08:00 |
| bbc7b5e600 | use @omnivore/utils in import-handler | 2024-07-03 22:20:27 +08:00 |
| 59c826fd5e | use @omnivore/utils in content-fetch | 2024-07-03 21:58:22 +08:00 |
| f2ff4b7b0a | fix: only send content_fetch_failure event to analytics | 2024-05-31 12:44:01 +08:00 |
| fc9d5c64ec | do not fail if cache missed | 2024-05-17 17:27:34 +08:00 |
| 6f2aa2e0cd | add more logs | 2024-05-17 17:19:55 +08:00 |
| 52ebf466e3 | get content from cache first when saving url | 2024-05-17 16:46:54 +08:00 |
| 9c3d619ad5 | put locale and timezone in cache key | 2024-05-17 16:22:20 +08:00 |
| dde9f16396 | put error message in the analytic event | 2024-05-17 16:16:44 +08:00 |
| f3ce6f4d4e | catch content fetch result in redis | 2024-05-17 15:55:28 +08:00 |
| efb9b6b139 | add source to the content_fetch event | 2024-05-17 14:54:46 +08:00 |
| 9dee510be1 | fix rss | 2024-05-14 20:18:18 +08:00 |
| cce5f2463d | still use redis for cache | 2024-05-14 17:16:26 +08:00 |
| 04ba62977e | fix rebase conflicts | 2024-05-14 17:14:41 +08:00 |
| e093c9e096 | fix comment | 2024-05-14 17:14:41 +08:00 |
| 3e925e0193 | update comment | 2024-05-14 17:14:41 +08:00 |
| 5bd157ca25 | hash url as the key | 2024-05-14 17:14:41 +08:00 |
| 7a0b2f3d33 | upload file only not exists | 2024-05-14 17:14:41 +08:00 |
| 9286174ec7 | upload and download original content from GCS | 2024-05-14 17:14:40 +08:00 |
| 33e1c4dd00 | remove flush method from analytics class | 2024-05-13 19:10:14 +08:00 |
| 7634ed667f | capture total time of fetching a page | 2024-05-13 17:01:52 +08:00 |
| f64bd4700f | update analytic event details | 2024-05-13 15:18:04 +08:00 |
| a924c8448b | capture content-fetch success and error events | 2024-05-13 14:55:48 +08:00 |
| 0c0a95a79c | fix newsletter dir not saved correctly | 2024-04-24 21:10:13 +08:00 |
| 824b256d20 | fix memory leak from axios error | 2024-04-24 15:55:54 +08:00 |
| 7f441b4ff3 | dedupe save-page job | 2024-04-23 21:44:25 +08:00 |