Anime next was a web application that I created to find more anime recommendations. There was a time when I would watch over 1000 episodes of anime in one year. Eventually I ran out of anime to watch, and then I realized: I'm a developer?!?!??!! So I made a program to find new series for me to watch.
Yes. I needed more recommendations ASAP. So I started coding. This project has 3 parts: the scraper, the matcher & the website. (please say these in super hero voices)
THE SCRAPER is a very very very very important part of this application. It creates the database of series, which contains everything the matcher makes its decisions from. But where can you find out what people have watched????
MyAnimeList!!!!!! (short: MAL, I'll use this from now on so don't be looking like "?????" if you don't understand)
MAL is an awesome website used for ~~data mining~~ keeping track of your watched series.
Luckily all the information is publicly accessible!! Yay me!!!!!
They even make scraping users possible with a page that shows recently online users.
So I started the scraper:
```javascript
import axios from 'axios';

while (true) { // pls don't judge
  const response = await axios.get('https://myanimelist.net/users.php');
  const dataIActuallyNeed = response.data.split('<div class="picSurround"><a href="/profile/');
  const usernames = [];
  // start at 1: chunk 0 is the page HTML before the first profile link
  for (let i = 1; i < dataIActuallyNeed.length; i++) {
    const username = dataIActuallyNeed[i].split('"')[0];
    if (username.length < 30 && !username.includes(' ')) {
      usernames.push(username);
    }
  }
}
```
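If you want to poke at the split-and-filter trick without hammering MAL, it can be pulled out into a small pure function. This is only a sketch: the marker string is the one from the scraper above, but `extractUsernames` and the sample HTML are made up for illustration.

```javascript
// The marker that precedes every profile link on the page (taken from the scraper above).
const PROFILE_MARKER = '<div class="picSurround"><a href="/profile/';

// Hypothetical helper: returns the usernames found in one page of HTML.
function extractUsernames(html) {
  return html
    .split(PROFILE_MARKER)
    .slice(1)                           // chunk 0 is everything before the first profile link
    .map(chunk => chunk.split('"')[0])  // the username runs up to the closing quote of the href
    .filter(name => name.length > 0 && name.length < 30 && !name.includes(' '));
}

// Made-up sample page with two profile links.
const sampleHtml =
  '<html><body>' +
  PROFILE_MARKER + 'alice"><img></a></div>' +
  PROFILE_MARKER + 'bob_42"><img></a></div>' +
  '</body></html>';

console.log(extractUsernames(sampleHtml)); // [ 'alice', 'bob_42' ]
```

Splitting on a hard-coded marker breaks as soon as the page markup changes, so a real HTML parser would be sturdier. For a quick scraper it does the job though.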
Now we've got an array with some random usernames. Yay
Now we'll have to figure out a way to see which series the users have watched...
Oh!!! MAL has an API for that 😮😮😮😮😮
I guess we'll be using that
```javascript
// (this is in a function btw)
const series = [];
let url = 'https://api.myanimelist.net/v2/users/' + username + '/animelist?fields=list_status&limit=100';

while (true) {
  // here the request happens with Axios :0
  // (getPage is a placeholder for that call, assumed to resolve to undefined when the request fails)
  const response = await getPage(url);
  if (typeof response === 'undefined' || typeof response.data.error !== 'undefined') {
    break;
  }

  series.push(...response.data.data);

  // no "next" link means this was the last page
  if (typeof response.data.paging?.next === 'undefined') {
    break;
  }
  url = response.data.paging.next;
}
```
EZ right???? Let's imagine I put each user's list of anime IDs in one collection and the information about each anime (with the number of times it's been seen) in another one, and continue to the next part.
THE MATCHER is as important (maybe even more important) as THE SCRAPER. To be honest, it's probably more important: if the scraper goes offline, the matcher will still be able to make recommendations.
At the beginning of this project, I was desperate for more series. And when I say desperate I fr fr mean like desperate desperate. Soooo, at the beginning I just had a simple query which would get the top x most-watched series. Super simple, and kind of effective.
But after some time, I realized I didn't actually really like these series.
Yes they were fun, but I like other series more.
So then the personalisation part came in.
How can I make an algorithm strong enough to bring me new recommendations that I'll like???
For the next part you'll have to know what's in the database, so let me show it with some JSON that gets saved.
```javascript
// matchCollection
{ a: number[] }

// animeCollection
{ animeId: number, image: string, title: string, count: number }
```
`count` will be incremented if a document with the same `animeId` is already recorded.
tldr; `animeCollection` contains info about which anime it is, and how many times it has been seen.
`matchCollection` contains documents with lists of what has been seen by other people.
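To make the two collections a bit more concrete, here's a rough in-memory sketch of what saving one user's list could look like. Big assumptions: `recordWatchList` and the `store` object are made up, a plain array and a `Map` stand in for the real database collections, and a new series starts at a `count` of 1.

```javascript
// Hypothetical helper: saves one scraped watch list into in-memory stand-ins
// for matchCollection (an array) and animeCollection (a Map keyed by animeId).
function recordWatchList(store, series) {
  // one matchCollection document per user: just the list of anime IDs
  store.matchCollection.push({ a: series.map(s => s.animeId) });

  for (const { animeId, title, image } of series) {
    const existing = store.animeCollection.get(animeId);
    if (existing) {
      existing.count += 1; // already recorded: increment the count
    } else {
      // first sighting: assumption that count starts at 1
      store.animeCollection.set(animeId, { animeId, image, title, count: 1 });
    }
  }
}

const store = { matchCollection: [], animeCollection: new Map() };
recordWatchList(store, [
  { animeId: 1, title: 'A', image: 'a.jpg' },
  { animeId: 2, title: 'B', image: 'b.jpg' },
]);
recordWatchList(store, [{ animeId: 2, title: 'B', image: 'b.jpg' }]);

console.log(store.animeCollection.get(2).count); // 2
```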
Before we can actually start writing the algorithm, we first need to give every document in `matchCollection` a `weight`. The `weight` is a number generated by comparing my list of watched series with other people's lists in `matchCollection`: it's how many series we have in common.
```javascript
// let's pretend we got a variable "animeIds": number[]
const animes = await matchCollection.aggregate([
  {
    $addFields: {
      weight: { $size: { $setIntersection: ["$a", animeIds] } }
    }
  }
]).toArray();
```
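If the aggregation looks like magic, here's roughly the same calculation in plain JavaScript. It's a sketch for illustration only: `overlapWeight` and `addWeights` are made-up names, and the real thing runs inside the database.

```javascript
// Counts how many distinct IDs two lists share, like $size over $setIntersection.
function overlapWeight(theirIds, myIds) {
  const mine = new Set(myIds);
  // $setIntersection treats arrays as sets, so a duplicate ID only counts once
  return new Set(theirIds.filter(id => mine.has(id))).size;
}

// Adds a weight field to every document, like the $addFields stage does.
function addWeights(docs, myIds) {
  return docs.map(doc => ({ ...doc, weight: overlapWeight(doc.a, myIds) }));
}

const weighted = addWeights([{ a: [1, 2, 3] }, { a: [7, 8] }], [2, 3, 4]);
console.log(weighted.map(doc => doc.weight)); // [ 2, 0 ]
```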
EZPZ LEMON SQUEEZY RIGHT?!?!?!?!!
So now, we got all the information for the algorithm :D
Let's see...
```javascript
// animeList: { animeId: number, weight: number }[]
// contains all the unique series found with the code above

// sort series by weight, highest first
animeList.sort((a, b) => b.weight - a.weight);

animeList = animeList.slice(0, 200); // only first 200 results (otherwise this'll take too long)

for (let i = 0; i < animeList.length; i++) {
  // Add the other information, like count and title
  animeList[i] = { ...await database.getInfo(animeList[i].animeId), ...animeList[i] };
}

// sort for the last time, but this time on popularity (lowest count first)
animeList.sort((a, b) => {
  if (typeof a.count !== 'number' || typeof b.count !== 'number') return 0;
  return a.count - b.count;
});

// return best 100 results
return animeList.slice(0, 100);
```
The code above first sorts the series by weight. Weight is the score of how much the people who watched a series match with me. Only the 200 best matches are kept. After that, it also gets the count of how many times each series has been seen, and sorts on that. Then it sends the top 100 results. That's really it
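One step I skipped over is how the weighted documents turn into a list of unique series. Here's one possible way to do it, and it's an assumption, not a law of nature: every series I haven't seen gets the highest weight of anybody who watched it. `buildCandidates` is a made-up name for this sketch.

```javascript
// Hypothetical helper: collapses weighted watch lists into unique candidates.
// Assumption: a series inherits the highest weight among the users who watched it.
function buildCandidates(weightedDocs, myIds) {
  const seen = new Set(myIds);
  const best = new Map(); // animeId -> highest weight found so far

  for (const { a, weight } of weightedDocs) {
    for (const animeId of a) {
      if (seen.has(animeId)) continue; // no point recommending what I've already watched
      if (weight > (best.get(animeId) ?? -1)) best.set(animeId, weight);
    }
  }

  return [...best].map(([animeId, weight]) => ({ animeId, weight }));
}

const candidates = buildCandidates(
  [{ a: [1, 2, 3], weight: 2 }, { a: [3, 4], weight: 0 }, { a: [2, 5], weight: 1 }],
  [1, 2]
);
console.log(candidates);
// [ { animeId: 3, weight: 2 }, { animeId: 4, weight: 0 }, { animeId: 5, weight: 1 } ]
```

Summing the weights instead of taking the maximum would favour series that many similar users watched, so which one to pick is a matter of taste.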
The website was an interface for me to just click on a button and see my options. (that's it)