Database schema
All schemas need to ensure that we support required operations.
articles
Attribute | Data Type / Properties | Constraints / Indexing |
---|---|---|
item_id | BIGINT, Primary Key | NOT NULL, AUTO INCREMENT, UNIQUE |
user_id | BIGINT, Foreign Key → users.user_id | Indexed, NOT NULL |
status | SMALLINT | |
favorite | BOOLEAN | DEFAULT false |
given_url | TEXT | NOT NULL |
given_title | TEXT | |
resolved_title | TEXT | |
resolved_url | TEXT | |
domain_id | BIGINT, Foreign Key -> domains.domain_id | Domain of given_url |
origin_domain_id | BIGINT, Foreign Key -> domains.domain_id | Domain of resolved_url |
excerpt | TEXT | |
is_article | BOOLEAN | DEFAULT false |
is_index | BOOLEAN | DEFAULT false |
has_video | SMALLINT | DEFAULT false |
has_image | SMALLINT | DEFAULT false |
word_count | INTEGER | |
time_added | TIMESTAMP | Indexed, DEFAULT now() |
time_updated | TIMESTAMP | |
time_favorited | TIMESTAMP | |
top_image_url | TEXT | |
author_id | BIGINT, Foreign Key → authors.author_id | Indexed |
Description of the fields
item_id
: Unique identifier for the saved itemuser_id
: id of the user to which this article/item belongsstatus
: 0 = active, 1 = archived, 2 = deletedfavorite
: weather article is favorited by the user. 1=favorited, 0=not favoritedresolved_title
: title, found by the parser/backendresolved_url
: final URL resolved to by the parser/backend. For example, by following redirectsexcerpt
: excerpt of the article found by the parser/backendis_article
: weather the item is an articleis_index
: true if parser/backedn thinks this is an index page- 1=parser thinks this item is an index page
- 0=otherwise
has_video
:- 0=no video;
- 1=has a video in the body of the article;
- 2=is a video
has_image
:- 0=no image;
- 1=has an image in the body of the article;
- 2=is an image
word_count
: number of words in the articletime_added
: timestamp (with timezone info), at which the article was addedtime_updated
: timestamp (with timezone info), at which the article was last updated (like last read/opened)time_favorited
: time at which the article was favoritedtop_image_url
: URL of the first found image in html pageauthor_id
: id of the author of the article
Details:
- this is the main and most important table, which stores information about the stored links.
- we anticipate queries of the form
-- find all articles that belong to user with user_id -> 123
SELECT * FROM articles WHERE user_id = 123 ORDER BY time_added
- or we might need articles that were added after "1 january 2024";
SELECT * FROM articles WHERE user_id = 123 AND time_added > '2024-01-01';
- we may also need to filter by
status
andfavorite
users
Attribute | Data Type / Properties | Constraints / Indexing |
---|---|---|
provider | TEXT | NOT NULL |
user_id | SERIAL, Primary Key | NOT NULL, UNIQUE |
TEXT | UNIQUE, Indexed | |
hashed_password | TEXT | |
name | TEXT | NOT NULL |
details:
login and signup
We should be able to support both traditional email-password
based
logins as well as o-auth
based logins/signups.
- traditional login and signup
When the user signs up, firstly we ask for the following information:
name | email | password
After, we have this information, then we use something like:
-- insert the newly created user into the system
INSERT INTO users (provider, email, hashed_password, name)
VALUES ('open-pocket', 'xyz@mail.com, '12d8287g1', 'xyz');
Note that open-pocket
is the provider here, since, the user is
registering through our platform. And user_id
will be generated
autormatically.
Password hashing should be done in the application using
bcrypt
.
Now when the user tries to login.
We ask for the following information:
email | password
Then we do something like:
-- find the hashed_password for user with email=email
SELECT hashed_password FROM users WHERE email=[supplied_email]
If this query executes successfully, but returns no rows. Then, the user
is not present in the system (not registered). Otherwise, the
hashed_password
must match hash_of_supplied_password
for successfull
login.
- O-auth login and signup
O-auth logins will be supported in the future, once we are done with
email-password
logins. The provider
field will store the O-auth
provider's name (e.g. google, github etc.). And additional fields will
be added as required.
tags
Attribute | Data Type / Properties | Constraints / Indexing |
---|---|---|
tag_id | SERIAL, Primary Key | NOT NULL, UNIQUE |
user_id | INTEGER, Foreign Key → Users.user_id | NOT NULL |
item_id | BIG INTEGER, Foreign Key → Articles.item_id | NOT NULL |
tag_name | TEXT | Indexed, NOT NULL |
Further Indexes and Constraints:
- Index on
(user_id, item_id)
- Index on
(user_id, tag_name)
(tag_name, user_id, item_id)
are UNIQUE combined. Therefore, no duplicate tag is allowed
Notes:
- this table basically contains: an article with
tag=tag_name
on article with item_id, belonging to user of user_id - the fields: (tag_name, user_id, item_id) combined must be unique
- we will be required to handle queries of the form:
-- get all tags on an article with `item_id` and which belongs to
-- user with `user_id`
SELECT tag_name FROM tags WHERE item_id=123 AND user_id=123;
-
therefore, it is beneficial to create an index on (item_id, user_id)
-
here's another use case:
find all articles belonging to user with user_id
and with given
tag_name
:
SELECT item_id FROM tags WHERE user_id=123 AND tag_name="xyz";
- therefore it might be beneficial to make an index on (user_id, tag_name) columns
Images
Attribute | Data Type / Properties | Constraints / Indexing |
---|---|---|
image_id | BIGINT, Primary Key | NOT NULL, AUTO INCREMENT, UNIQUE |
article_id | BIGINT, Foreign Key → Articles.item_id | Indexed, NOT NULL |
src | TEXT | NOT NULL |
width | INTEGER | |
height | INTEGER | |
credit | TEXT | |
caption | TEXT |
Videos
Attribute | Data Type / Properties | Constraints / Indexing |
---|---|---|
video_id | BIGINT, Primary Key | NOT NULL, AUTO INCREMENT, UNIQUE |
article_id | BIGINT, Foreign Key → Articles.item_id | Indexed, NOT NULL |
src | TEXT | NOT NULL |
width | INTEGER | |
height | INTEGER | |
type | VARCHAR(50) | |
vid | VARCHAR(100) |
authors
Attribute | Data Type / Properties | Constraints / Indexing |
---|---|---|
id | BIGINT, Primary Key | NOT NULL, AUTO INCREMENT, UNIQUE |
name | VARCHAR(255) | NOT NULL |
url | TEXT |
domains
Attribute | Data Type / Properties | Constraints / Indexing |
---|---|---|
domain_id | BIGINT, Primary Key | NOT NULL, AUTO INCREMENT, UNIQUE |
name | TEXT | NOT NULL |
This parameter is require, since
/add
returns it.