Meta-PokéBase Q&A

Hey, it's time to infringe on Fizz's trademarked "overlong discussion threads"!

I've been considering for a long time how to make it easier to contribute to Pokemon Database. We have the corrections thread of course, which is great, but it still takes time for me to input the corrections. So it would be useful to have a method where people can contribute in a direct way, then I (and potentially moderators) can just click a button to approve contributions and they would appear on the site.

Another request I've had many times is for a downloadable database of Pokemon info. Something like this would probably work best on GitHub, as that's where everyone goes for "open source". Pokemon Showdown is there, as is the Veekun database.

So hopefully we can kill two Pidgeys with one Stone Edge. There are two ways I can think of to handle this:

Option 1: allow contributions through the site, then automatically send to GitHub.

Contribution through the site would be using a basic form. For example if you were on the Squirtle page (and logged into Pokebase) there would be a button to fix some information, and it would come up like this:

[Image: PokemonDb form]

It would have everything on there like stats, egg group etc. So you'd change whichever one was wrong and submit. Then I'd have a page where I see the suggested changes, and click a button to approve/reject it. For the "downloadable database" side, I can set up something to export our database to an appropriate format for GitHub. But no contributions would be allowed via GitHub, only via the site.
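A minimal sketch of the approve/reject step described above, purely for illustration. The names `Suggestion` and `ReviewQueue` are hypothetical, the dictionary stands in for the real site database, and the field values are placeholders. The point is only that a submitted change sits in a queue and the live data changes when someone approves it.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    """One proposed change to a single field, waiting for review."""
    pokemon: str          # page the form was submitted from, e.g. "squirtle"
    field_name: str       # which box was edited
    new_value: object
    status: str = "pending"


class ReviewQueue:
    def __init__(self, site_data):
        self.site_data = site_data   # stands in for the live site database
        self.suggestions = []

    def submit(self, pokemon, field_name, new_value):
        """Record a suggestion; the live data is not touched yet."""
        suggestion = Suggestion(pokemon, field_name, new_value)
        self.suggestions.append(suggestion)
        return suggestion

    def approve(self, suggestion):
        """Apply a pending suggestion to the live data."""
        self._require_pending(suggestion)
        self.site_data[suggestion.pokemon][suggestion.field_name] = suggestion.new_value
        suggestion.status = "approved"

    def reject(self, suggestion):
        """Discard a pending suggestion without changing the live data."""
        self._require_pending(suggestion)
        suggestion.status = "rejected"

    @staticmethod
    def _require_pending(suggestion):
        if suggestion.status != "pending":
            raise ValueError(f"suggestion is already {suggestion.status}")
```

In use, `queue.submit("squirtle", "species", "corrected value")` would return a pending suggestion, and the site data would only change once `queue.approve(...)` is called on it.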

Pros:

  • simpler for the average visitor, they just type in some boxes
  • the site would be updated quicker (as soon as a mod approves the change)

Option 2: allow contributions via GitHub, then automatically update the site

With this method we'd have all the data in a readable text format, then people would be able to send "pull requests" on GitHub to suggest updates. Format would be something along the lines of:

bulbasaur:
    name: Bulbasaur
    national: 1
    gen: 1
    type1: grass
    type2: poison
    species: Seed
    moves:
        - growl
        - tackle
        - vine-whip
        - leech-seed
    ...
ivysaur:
    name: Ivysaur
    national: 2
    ...

This is probably the same format I would export to if using option 1.
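As a rough sketch of what such an export could look like, the function below writes a nested dictionary in the indented layout shown above. It is an assumption, not the actual exporter: `to_text` is a hypothetical name, the sample data contains only the fields listed in the example, and the emitter is hand-written so the sketch needs nothing beyond the standard library.

```python
def to_text(pokedex):
    """Render {slug: {field: value}} in the indented layout from the post.

    Lists (such as moves) become "- item" lines one level deeper.
    """
    lines = []
    for slug, entry in pokedex.items():
        lines.append(f"{slug}:")
        for key, value in entry.items():
            if isinstance(value, list):
                lines.append(f"    {key}:")
                lines.extend(f"        - {item}" for item in value)
            else:
                lines.append(f"    {key}: {value}")
    return "\n".join(lines) + "\n"


# Sample data copied from the example above (only the fields shown there).
SAMPLE = {
    "bulbasaur": {
        "name": "Bulbasaur",
        "national": 1,
        "gen": 1,
        "type1": "grass",
        "type2": "poison",
        "species": "Seed",
        "moves": ["growl", "tackle", "vine-whip", "leech-seed"],
    },
    "ivysaur": {"name": "Ivysaur", "national": 2},
}

if __name__ == "__main__":
    print(to_text(SAMPLE), end="")
```

Keeping one field per line is what makes this kind of file pleasant to review: a correction shows up as a one-line change.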

Pros:

  • GitHub is where programmers go so we may get more contributions from technical people (though not sure if this matters)
  • easier for people to comment on each other's changes
  • easier to contribute mass updates - for example instead of having to submit 20 forms you can make 20 changes in one text file and submit that

Cons:

  • importing back into the site will be more difficult for various reasons; not everything would be in GitHub (e.g. internal ID numbers) and I'd need to match stuff up to import
  • updating the site wouldn't be instant
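To illustrate the "match stuff up" step from the first con, here is a small sketch that pairs entries from the text file with rows on the site by a shared key. Everything here is assumed for the example: the `slug` key, the `match_ids` name, and the ID numbers are made up, since the real internal IDs are not public.

```python
def match_ids(repo_entries, site_rows):
    """Pair each repo entry with the site's internal ID via its slug.

    Returns (matched, unmatched): a {slug: internal_id} dict and a list
    of slugs that exist in the repo but not on the site yet.
    """
    id_by_slug = {row["slug"]: row["id"] for row in site_rows}
    matched, unmatched = {}, []
    for slug in repo_entries:
        if slug in id_by_slug:
            matched[slug] = id_by_slug[slug]
        else:
            unmatched.append(slug)
    return matched, unmatched


if __name__ == "__main__":
    # Hypothetical internal IDs, for illustration only.
    site_rows = [{"id": 101, "slug": "bulbasaur"}, {"id": 102, "slug": "ivysaur"}]
    repo_entries = {"bulbasaur": {}, "ivysaur": {}, "brand-new-entry": {}}
    print(match_ids(repo_entries, site_rows))
```

Anything left in the unmatched list would need a decision before import, for example creating a new row or flagging a typo in the key.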

Let me know what you think about these two ideas, which you'd prefer, and/or if you have any other ideas!

@SYL Pokemaster would have to approve everything, so he/she/it can just not approve a request like that.
commented 1 day ago by sumwun
Yes the others are correct, in both options all changes would need to be approved. Either by me, or potentially some trusted users like the mods here.
For option 2, GitHub has a feature called "pull requests" where you make a change and it comes up for approval. Anyone can comment (and vote) on it, but only certain people can accept it.
For option 1, I'd need to code up my own version of that. But I do already have my own forms set up for updating the site so half the work is done.
Does this mean I can one day update this page?: https://pokemondb.net/spinoff
I kinda like option 1.
Is this still happening?

1 Answer


This is awesome. Thanks for getting us involved, because this feature could turn into something really cool. You cover all bases with the main site for the general population and the repo for everyone looking to contribute or use the data. Just sounds like we're winning on all fronts!

As much as it would be cool to personally press the button that updates what we see on each page, I think the second option is far and away the better route to take. Thoughts, etc:

  • I massively value the collaborative aspect to this; it has a lot of potential. The first option won't allow different people to discuss individual submissions in the way the second would. GitHub has robust commenting and maintenance features and we might as well use them. Submissions need to be fact-checked and validated, and making that process open to everyone (and not just the mods clicking the button) is a practical move in my opinion.
  • Creating a form might be more accessible to the average user, but ultimately it's achieving virtually the same thing as a pull request. It convolutes the connection between each submission, the database itself and the repo, and I personally don't think speed is worth that cost.
  • I could see the form receiving a greater quantity of responses than a purely GitHub-based solution, but I could also see it receiving much lower quality responses. Running this publicly through GitHub only sounds like a way to discourage unresearched or blatant troll submissions. Quality over quantity basically!
  • As you mention, the more this is tied in with GitHub, the better personnel we're likely to get. That absolutely matters because every community can make use of people with technical know-how. These are the people most likely to care for our cause, which can only improve the quality of responses.

Then I also have a few questions about how this would all be happening. Not experienced with databases beyond high school SQL crash courses, so forgive me if any of this sounds dumb!

  • In option 1, would the GitHub repo update at the same time we 'approve' each submission for the site itself? If so, does that mean that as part of option 2, we could also have the form, and automate each submission as a pull request? If not, then I like the first option even less anyway. It would cause the repo to fall behind the site itself which is bad in my opinion, even if the repo isn't explicitly intended for development.
  • Again, forgive my inexperience with databases: but I'm curious where the problem comes in with importing back to the site. Is there an issue with merely creating a mirror of the actual database you use for the site on GitHub? I can see how at that point the repo transcends being only a downloadable database, but to me, that's part of the point of option 2.
  • Precisely, what is going to be available for users to contribute to and/or get as part of the downloadable? You've made pretty clear it'll include each Pokedex entry, but the areas of the site that would make the best use of user contributions are the itemdex, location guides and the page descriptions for moves, abilities and items. Do we get to help with those as well? :D
Hey Fizz thanks for the detailed response. You’re right that GitHub already has great collaboration tools, which would save me a lot of time making my own. (On the other side I still have to write an importer, but on balance that will probably be less work.)

> In option 1, would the GitHub repo update at the same time we 'approve' each submission for the site itself?

My original thought was to have a script that I ran myself every so often. Running it after every minor submission would be too much, but it could be automated to run every day.

> ... have the form, and automate each submission as a pull request?

I think there are ways to do that, so it could be a nice addition further down the line. Getting the best of both options.

> Precisely, what is going to be available for users to contribute to

I haven’t decided on an exact list, but all the basics should be there: Pokemon, moves, abilities, items, locations, and their mappings (e.g. item locations, Pokemon learnsets). Plus events would be useful. I think the big initial focus would be locations.

> the page descriptions for moves, abilities and items

This part I haven’t fully decided on. The longer descriptions are what makes us unique and are not “data” as such, so I’m not sure about “giving them away” in a GitHub repo. This is partly why I was originally leaning towards option 1, as people could contribute to those (and perhaps full pages like the breeding guide), but they don’t need to be in GitHub.

> Is there an issue with merely creating a mirror of the actual database

There are various fields in the tables that are specific to this site and wouldn’t be in the repo. For example there are a couple of fields controlling how Pokemon are displayed on the site (new Pokemon are hidden from certain lists until the game is released). Plus flags for whether different images can be displayed.
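A tiny sketch of how an export might leave those site-specific fields out. The field names in `SITE_ONLY_FIELDS` are invented for the example (the real column names are not given here), and `public_view` is a hypothetical helper.

```python
# Hypothetical names for columns that only matter to the site itself.
SITE_ONLY_FIELDS = {"id", "hidden_until_release", "can_show_artwork"}


def public_view(row):
    """Return a copy of a database row without the site-only fields."""
    return {key: value for key, value in row.items() if key not in SITE_ONLY_FIELDS}


if __name__ == "__main__":
    row = {
        "id": 101,                   # made-up internal ID
        "name": "Bulbasaur",
        "national": 1,
        "hidden_until_release": 0,
        "can_show_artwork": 1,
    }
    print(public_view(row))
```

The flip side is that the exported file can no longer recreate the full row on its own, which is where the import difficulty comes from.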

> I'm curious where the problem comes in with importing back to the site.

The main issue is: if the repo doesn’t have every field (as mentioned above) then the importer has to update every row one by one instead of doing a mass update. That (a) is much slower and (b) makes deleted items difficult to handle.
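To make that concrete, here is a rough sketch of a row-by-row import using Python's built-in `sqlite3` module. It is not the site's actual importer: the `pokemon` table, its columns, the `hidden` flag and the `old-entry` row are all assumptions for the demo. It updates only the columns the repo knows about, so site-only columns survive, and treats any row missing from the repo as deleted.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory table with one site-only column (hidden)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pokemon ("
        " slug TEXT PRIMARY KEY,"
        " name TEXT,"
        " national INTEGER,"
        " hidden INTEGER DEFAULT 0)"   # site-only flag, never in the repo
    )
    conn.executemany(
        "INSERT INTO pokemon (slug, name, national, hidden) VALUES (?, ?, ?, ?)",
        [("bulbasaur", "Bulbasur", 1, 1), ("old-entry", "Old Entry", 9999, 0)],
    )
    return conn


def import_entries(conn, repo_entries):
    """Apply repo data one row at a time; return the slugs that were deleted."""
    cur = conn.cursor()
    for slug, entry in repo_entries.items():
        # Touch only repo-known columns so site-only ones keep their values.
        cur.execute(
            "UPDATE pokemon SET name = ?, national = ? WHERE slug = ?",
            (entry["name"], entry["national"], slug),
        )
        if cur.rowcount == 0:  # no existing row matched: this is a new entry
            cur.execute(
                "INSERT INTO pokemon (slug, name, national) VALUES (?, ?, ?)",
                (slug, entry["name"], entry["national"]),
            )
    # Deleted items: rows on the site whose slug no longer appears in the repo.
    existing = [row[0] for row in cur.execute("SELECT slug FROM pokemon")]
    removed = [slug for slug in existing if slug not in repo_entries]
    cur.executemany("DELETE FROM pokemon WHERE slug = ?", [(s,) for s in removed])
    conn.commit()
    return removed
```

A mass update would instead wipe the table and reload it, which is simpler but would lose the `hidden` values; a real importer would probably also want to report deletions for review rather than apply them straight away.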

But I’ll look into that in more detail and see how difficult it really is.
Yeah no problem! I'd been in anticipation of this since you first hinted toward it, so I'm quite happy to invest a bit of time into it.
Automating as much of this as possible would be nice long-term, so I'd definitely see how far that will go before you have to start doing things personally. The automatic form in particular would be a really cool feature, best of both worlds as you mention.
I definitely agree with your feelings on the site's original descriptions; though I do think they're a feature that a lot of people would enjoy contributing to. Maybe if the form from option 1 also gets developed, a separate one could be made for these descriptions, which would submit to somewhere away from the main repo? It would be a good way to increase their contents and keep them updated. A lesser priority, but a consideration long-term perhaps? Otherwise it's good to know so much of the data on the site would be involved in this :)
I think I'm on the same page with the database now, which is nice lol. What's good is that it's at least possible, right? Hopefully some tricks to optimise the process reveal themselves at some point. I guess the other question is whether it's worth restructuring the data to increase its compatibility? If the GitHub repo is viewed as a long-term thing, then some compromise for its sake might be worth the bother.
What do you mean by “restructuring the data to increase its compatibility?”
Thinking about it, this would probably slow down the main site too much for what it's worth; but I was thinking it would be possible to move the fields that won't be on the repo to their own separate tables? Then again I'm mostly out of my depth discussing databases so feel free to correct me.
I see. Yes that would be possible, but ultimately I don’t think it makes any difference. I’m not planning to dump database tables directly (as in, CSV files or SQL code) as I think that’s too complicated. So either way I’ll have to write a script to export and import data.

Now I’ve thought about it, the way I would do it shouldn’t be much slower. And I think handling deleted items is doable, if complex.

Gonna try and get some basic stuff together in a repo soon as a starter. We can then build on it in time.