
KBDB?

Posted: 26 Jul 2013, 19:36
by matt3o
Do you think a keyboard database would be desirable? I know there's the wiki, and that's a great resource, but I was thinking about something that could be easily crawled and filtered. For example, it might be possible to list all TKL+MX+LID keyboards with a nice "instant search", maybe with links to where you can buy them. Or you could order keyboards by size? Or by weight? Or by year... well, you get the point.

I would start with keyboards "that you can actually buy". Having a backlog of ALL keyboards would be great, but maybe a bit too much, and maybe the wiki is a better place for those decks.

What do you think? Would you like something like that? I could set up the software and server if needed; would you help me fill the DB?

Posted: 26 Jul 2013, 20:35
by Muirium
YES!

Here's some dimensions (in mm, sourced from various places including my own measurements) to get things started.

Code:

Apple Wireless Aluminium        280x130x15    A1314
Apple Keyboard II               400x150x40    M0487
Apple Pro Keyboard              460x150x35    M7803

HHKB Pro 2                      294x108x32

Filco Minila                    297x124x40
Filco Majestouch 2 Tenkeyless   356x135x33

Ducky DK9087 Shine 2            358x140x52

Cherry G80-3800                 445x150x25

KBC Poker 60%                   290x102x38
KBT Pure 60%                    290x102x38
KBT Race 75%                    310x120x50

Noppoo Choc Mini                340x160x40

Keycool 87                      363x143x38
Keycool 104                     443x155x38

IBM model M space saving        406x189x41
A real database would clearly be a huge improvement. Live ones first!
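
As a rough illustration of the "order keyboards by size" idea, here is a minimal sketch that parses dimension strings like the ones above and sorts by width. It assumes the three numbers are width x depth x height (the list does not say which is which), and `parseDims` and `byWidth` are made-up names, not part of any existing tool.

```javascript
// Hypothetical helper: turn "280x130x15" into numbers (all in mm).
// Assumption: the order is width x depth x height.
function parseDims(text) {
  const parts = text.split('x').map(Number);
  if (parts.length !== 3 || parts.some(Number.isNaN)) {
    throw new Error('Expected WxDxH in mm, got: ' + text);
  }
  const [width, depth, height] = parts;
  return { width, depth, height };
}

// A few rows copied from the list above.
const keyboards = [
  { name: 'HHKB Pro 2', dims: '294x108x32' },
  { name: 'Filco Majestouch 2 Tenkeyless', dims: '356x135x33' },
  { name: 'KBC Poker 60%', dims: '290x102x38' },
  { name: 'Cherry G80-3800', dims: '445x150x25' },
];

// Build new objects with numeric fields, then sort narrowest first.
const byWidth = keyboards
  .map((kb) => ({ name: kb.name, ...parseDims(kb.dims) }))
  .sort((a, b) => a.width - b.width);

console.log(byWidth.map((kb) => kb.name + ' (' + kb.width + ' mm)'));
```

Sorting by depth or height would only need a different comparison in the `sort` callback.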

Posted: 26 Jul 2013, 21:43
by Daniel Beardsmore
I started writing exactly this (with the plan to cover both keyboards and switches), with the idea to tie it back to the wiki, and posed the idea to webwit, who wasn't interested. Ideally you'd use Semantic MediaWiki (which is his suggested option), then all the data would only ever be entered in a single place. The downside is that it would make working with the wiki more complicated and much more demanding in terms of consistency, and all the technical folk here already seem to find the idea of using MediaWiki so incredibly horrifying (I have no idea why: MediaWiki is not remotely difficult to work with, even if it desperately needs some changes, such as dedicated image metadata fields).

I figured that a simple database would be something that people might actually make an effort to enter data into, and I'd be the poor sap who'd have to keep both in sync backwards and forwards …

Posted: 26 Jul 2013, 21:54
by Muirium
A quick and dirty trial run definitely seems worthwhile, as a simple independent database. Big picture dependencies and implications can be tackled depending on interest. I'd surely like to run queries for simple but hard to find stuff: like density!

That always surprises me when I lift a keyboard. No simple way to know just by looking at it.

Posted: 26 Jul 2013, 21:58
by webwit
It's not that I'm not interested, but that I want keyboard and switch info to comply with Boyce–Codd normal form. With two systems we will have data redundancy, which leads to data anomalies.

In human words, not only would you have to enter data in two locations, but you know how things go, it will lead to different data about the same keyboard or switch because people will only update one system.

People have tried a quick and dirty trial before, but you end up with lots of tables and relations (a keyboard is of a model which may be of a family and has a brand behind which there is a manufacturer; it has a layout and switches, which are of a type, by a manufacturer, etc. etc.). And you have to get that design right from the start.

Posted: 26 Jul 2013, 22:03
by Daniel Beardsmore
And when you only have the wiki, you flat out don't have the information at all …

Posted: 26 Jul 2013, 22:05
by Muirium
Ain't that the truth!

Matt's suggestion interests me because it is a lateral one. Instead of cataloging keyboards (with all of the taxonomy implications Webwit mentioned) we just list them, with the emphasis equal across all columns: width, weight, key number, price. Sometimes a name is just a name. Especially when you're a dazzled newb who has good questions but no bullshit detector yet for telling fact from opinion.

Posted: 26 Jul 2013, 22:10
by Halvar
The only successful crowd-filled table-based databases that I have seen so far on the internet are music and movie databases, and they are mostly filled by people with large collections that see filling out forms as a give-and-take in order to catalogue their own collections.

I don't know why exactly people like to write articles and postings and post pictures but don't like to fill databases and don't do it well if they do, but that's how it seems to be from my experience. So I'm afraid this will only work if there are a handful of very dedicated individuals at the center that do most of the work of actual input and completion while users merely contribute data in freeform format most of the time.

Posted: 26 Jul 2013, 22:15
by Muirium
The good thing about the tabular data we're talking about is that it's freely available online without much sleuthing. And the best thing is that there are far fewer keyboards in current production (or even throughout history) than there are movies, books and songs. We're more on a par with models of car. Especially if we just ignore all the uninteresting stuff. (Which comes naturally…)

Posted: 26 Jul 2013, 22:26
by webwit
Make wiki pages with tables of keyboard, switches etc. and their main characteristics. But that isn't a database. The nice thing about a relational database is that you can ask it any question, like give me all linear switches below 60g activation.
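
To make the "linear switches below 60g" question concrete, here is a minimal sketch of that query over two related tables held as plain arrays. Every name and number in it is an invented placeholder (`Maker A`, `Switch X`, `actuationGrams` and so on), not real switch data, and a real setup would more likely be an SQL query along the lines of `WHERE type = 'linear' AND actuation < 60`.

```javascript
// Placeholder rows: names and numbers are invented for illustration only.
const manufacturers = [
  { id: 1, name: 'Maker A' },
  { id: 2, name: 'Maker B' },
];
const switches = [
  { id: 10, name: 'Switch X', type: 'linear', actuationGrams: 45, manufacturerId: 1 },
  { id: 11, name: 'Switch Y', type: 'tactile', actuationGrams: 55, manufacturerId: 1 },
  { id: 12, name: 'Switch Z', type: 'linear', actuationGrams: 80, manufacturerId: 2 },
];

// "Give me all linear switches below N grams", joined to their manufacturer.
function linearBelow(maxGrams) {
  // Lookup table so the join is a single Map access per switch.
  const makerById = new Map(manufacturers.map((m) => [m.id, m.name]));
  return switches
    .filter((s) => s.type === 'linear' && s.actuationGrams < maxGrams)
    .map((s) => ({ name: s.name, maker: makerById.get(s.manufacturerId) }));
}

console.log(linearBelow(60));
```

The point of keeping manufacturers in their own table is that a maker's name is stored once and referenced by id, which is the redundancy argument made earlier in the thread.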

Posted: 26 Jul 2013, 22:30
by Muirium
Any question? I'll ask it if cylindrical or spherical is better. Bet the bloody thing answers: 42.

Posted: 26 Jul 2013, 22:44
by ne0phyte
What if we scraped this -> http://deskthority.net/wiki/Category:Li ... _keyboards

And the details on every linked page. If the detail fields (the right panel) for each keyboard were consistent that could work. We'd have a list of all keyboards and could automatically fill a database based on the wiki data.

That is indeed pretty hacky, but as long as the data (the detail fields + format) is consistent it should be doable.

To read that one could use Java + XSL templates to preprocess + XSD and JAXB to read everything into objects and from there fill a DB.

Posted: 26 Jul 2013, 22:44
by webwit
You can ask it that question, but it will take a couple of million years to compute.

Posted: 26 Jul 2013, 23:11
by Daniel Beardsmore
ne0phyte wrote: To read that one could use Java + XSL templates to preprocess + XSD and JAXB to read everything into objects and from there fill a DB.
Ouch.

Most of the desired data isn't recorded anyway, so if you're going to add it all, you may as well use something like Semantic MediaWiki and add it into a genuinely machine-readable format instead of entering it only to have the computer rip it all out backwards.

The only data that you need to read, though, is all the Template:Infobox dkeyboard and Template:Infobox dswitch transclusions. This already is a disjoint database, and ideally you'd simply engineer a solution that treats each transclusion as a row in the relevant table, where the infobox template fields are the table columns.
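
The "each transclusion is a row" idea can be sketched roughly as follows. This is only an illustration under loud assumptions: the sample wikitext and its field names (`name`, `switches`, `interface`) are placeholders rather than the real parameters of Template:Infobox dkeyboard, `infoboxToRow` is a made-up function, and the parsing breaks as soon as a value contains a nested template or a piped link.

```javascript
// Hypothetical wikitext; the field names are placeholders, not the real
// parameters of Template:Infobox dkeyboard.
const wikitext = [
  'Intro text before the infobox.',
  '{{Infobox dkeyboard',
  '| name = Example Board',
  '| switches = Example switch',
  '| interface = PS/2',
  '}}',
  'Article body.',
].join('\n');

// Treat one transclusion as one row: template fields become columns.
// Assumes no nested templates or piped links inside the field values.
function infoboxToRow(text, templateName) {
  const start = text.indexOf('{{' + templateName);
  if (start === -1) return null;
  const end = text.indexOf('}}', start);
  if (end === -1) return null;
  // Everything between the template name and the closing braces.
  const body = text.slice(start + 2 + templateName.length, end);
  const row = {};
  for (const part of body.split('|')) {
    const eq = part.indexOf('=');
    if (eq === -1) continue; // skip blanks and unnamed parameters
    row[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
  }
  return row;
}

console.log(infoboxToRow(wikitext, 'Infobox dkeyboard'));
```

Running the same function with `'Infobox dswitch'` over switch pages would fill the second table in the same way.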

(In terms of image metadata (description, source, date, author, copyright, comments and notes) I would prefer that this be part of the actual UI, with the ability to modify these fields as part of the process of uploading a new image revision, so that the date, description, comments and notes fields are explicitly tied (albeit many to one) to the image copies instead of having them uncoupled. This way, people would be forced to enter that data and then they'd have to lie about their stolen images, and might feel bad and not do it. It would also avoid the recent changes page looking like a dog's breakfast when using the imagedesc template to enter this data, as the wiki engine is too effing stupid to display just the description part of the template transclusion.)

However, the good thing about the DT wiki is that it's an old (?), simple version of MediaWiki instead of the wretched monstrosity that Wikipedia are using now. The software is NOT going in the right direction.

Posted: 26 Jul 2013, 23:25
by webwit
I do update to the latest version once in a while. Haven't checked out the latest. Rich editor was a go, huh?

Posted: 26 Jul 2013, 23:29
by ne0phyte
Getting the data is a piece of cake actually. Try running this on any Deskthority page. It returns the detail fields of the first 5 keyboards of the list of all keyboards.

But I'm not sure how to group/sort them.
Code:

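// Fetch the category page, then log the infobox rows of the first 5 linked keyboards.
$.ajax({
	url: 'http://deskthority.net/wiki/Category:List_of_all_keyboards',
	success: function (data) {
		// Links to the individual keyboard pages in the category listing.
		var hrefs = $(data).find('.mw-content-ltr td ul li a');
		// slice() avoids iterating over the jQuery object's own properties,
		// which a plain for...in loop would also visit.
		hrefs.slice(0, 5).each(function () {
			$.ajax({
				url: this.href,
				success: function (data) {
					console.log('---------');
					// Each infobox row holds one label/value pair.
					$(data).find('.infobox tbody tr').each(function () {
						console.log(this.innerText.split('\n'));
					});
				}
			});
		});
	}
});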

Posted: 26 Jul 2013, 23:32
by Findecanor
I think that it should be possible to make a relatively simple system that auto-converts from info-boxes and categories to a table format whenever there is a change to a wiki page.

Posted: 26 Jul 2013, 23:32
by Daniel Beardsmore
Firefox gives me:

[22:32:09.083] ReferenceError: hrefs is not defined @ Scratchpad/1:15

Posted: 26 Jul 2013, 23:33
by webwit

Posted: 26 Jul 2013, 23:33
by ne0phyte
Sorry guys. Copy & paste error. Works now as intended.

EDIT: The XML export contains the wiki markup. It's easier to parse the site with jQuery and JavaScript and return JSON imo.
EDIT2: That's a bit more load, but I doubt that ~250 requests every once in a while matter :P
It could also be run headless with node.js.

The problem of grouping it in a useful way remains. I wonder if a cheap fuzzy search instead of an actual database is easier.
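
For a sense of how cheap such a search could be, here is a minimal sketch. It is a loose substring match over all fields rather than true fuzzy matching (no typo tolerance), the record shape is an assumption, and `haystack` and `search` are made-up names. The sample names and sizes come from the dimension list earlier in the thread.

```javascript
// Sample records; names and sizes are taken from the dimension list earlier
// in the thread, the record shape itself is an assumption.
const records = [
  { name: 'Filco Minila', size: '297x124x40' },
  { name: 'Filco Majestouch 2 Tenkeyless', size: '356x135x33' },
  { name: 'KBT Race 75%', size: '310x120x50' },
  { name: 'Keycool 87', size: '363x143x38' },
];

// Flatten every field of a record into one lowercase string.
function haystack(record) {
  return Object.values(record).join(' ').toLowerCase();
}

// A record matches when every word of the query occurs somewhere in it.
function search(query, rows) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  return rows.filter((row) => {
    const text = haystack(row);
    return words.every((word) => text.includes(word));
  });
}

console.log(search('filco tenkey', records).map((r) => r.name));
```

Because it never looks at field names, this works on inconsistent scraped data, but it also cannot answer range questions such as "narrower than 300 mm".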

Posted: 27 Jul 2013, 00:36
by matt3o
okay I'm glad there's some discussion on this topic.

first of all I don't see the wiki and the db as colliding projects. Actually the DB could be a sort of an "index" for the wiki. I can create the code for mediawiki automatically if needed (or even send the info directly if there's an API).

Like webwit said, the db is for searching (within relational tables), the wiki is for all that geeky info. How many times have you seen on the forum: "what mechanical keyboard under $100?" or "what 60% with backlight?" or "what switches does XY have?" or "what is the smallest mechanical I can buy?".

The db answers exactly those questions. If you want to go deeper you can always search the wiki... or we could simply integrate the info in the wiki into the db if it turns out to be a better medium.

Of course the DB code would be open source, collaboration free, and data stored under creative commons.

Posted: 27 Jul 2013, 01:01
by webwit
It would be interesting, but I don't think you can do it yet because infoboxes and assigned categories aren't consistent enough yet. You can try and see what happens though if you feel like writing a little scraping bot. Semantic wiki is still the more stable solution, as it is an integrated solution instead of a separate scraping solution. And scraped data doesn't become relational tables unless all the right properties and relations are added, which is exactly what the semantic wiki does. You just need to see pages as not only textual content but also views to set and add properties and relations to a database. I haven't got any experience with it though, so I don't know its limitations, if any, compared to a normalized SQL database. And I think the downside is still that it makes it more difficult for the more casual potential editor who just wants to write an article about his keyboard.

Posted: 27 Jul 2013, 01:58
by Halvar
Continuing to play advocatus diaboli:

Maybe we could start with a little challenge:

Can we collect links to any comparable-size databases on any community forums on any topics on the internet using any technology that work well, are filled not by a mod team but by the users of the community and can be queried in a way remotely similar to what you have in mind? Just as an example and role model?

Should be easy, shouldn't it?

If we find a few, they will give us good ideas. If we don't, that might say something, too.

Posted: 27 Jul 2013, 02:20
by ne0phyte
I still think scraping the wiki and making that data accessible with filters/fuzzy search would be the easiest thing without having any redundancies or changes to the wiki software.

Posted: 27 Jul 2013, 02:27
by Halvar
I think so, too. To make a good wiki template for keyboard descriptions with many clearly defined fields, and to scrape the database only from that is the best bet.

EDIT: I now think differently about this. This route either results in a sub-par database, or it will turn the wiki into a collection of wiki markup coded database tables unmaintainable for most prospective contributors.

Posted: 27 Jul 2013, 02:57
by webwit
No one is stopping you and I wish you luck, but you're not addressing the issues. And while keyboards form a relatively finite topic, the wiki is nowhere near completion, so it's a moving target. I think it will be more difficult to make and maintain than you think, but maybe you'll prove me wrong and provide us with a nice searchable database ;) The wiki is open source so you don't need permission or anything, and if you make something nice and want to integrate it into deskthority, we can host it, or you can host it yourself if you prefer that.

Posted: 27 Jul 2013, 08:39
by matt3o
webwit wrote: I think the downside is still that it makes it more difficult for the more casual potential editor who just wants to write an article about his keyboard.
I mostly agree with this. The idea is to have a very limited number of mandatory fields and then let the community complete the keyboard sheet (with maybe a reward system to spice it up a bit).

Also, it would be trivial to add a "collection manager" where you can set your owned or wished-for keyboards, maybe even with a rating system.

Regarding hosting this thing, I'd say let's try to bring something up first and see if it receives some sort of attention. Project name (with a possibly available domain)?

Posted: 27 Jul 2013, 10:39
by Muirium
I like KBDB. Easy to remember because of the obvious similarity with IMDB. (But hopefully not in interface!)

Having a look at Hover.com (which is reputable enough not to snag domains and ransom them back to you) I found that kbdb.com is not available, but kbdb.co, .it and .eu are all there and kbdb.net costs a bomb! For .com the closest is kb-db.com.

I'd go with kbdb.it (information technology as well as Italia) if you want a name and domain suggestion. Short, memorable and straight to the point.

Posted: 27 Jul 2013, 10:50
by ne0phyte
I see a small problem with some of the infobox fields. There are sometimes enumerations like this:

Code:

Wyse proprietary (840358, 840361), AT (900840), AT (900840), PS/2 (900866)
and currently I'm simply splitting the detail fields by comma which results in this:

Code:

 "Interface":[
            "Wyse proprietary (840358",
            "840361)",
            "AT (900840)",
            "PS/2 (900866)"
         ],
The parsing of metrics, keyboard price and production year is also hard as it is ("$200-300", "$6000 (for the whole notebook)", etc).
I guess a fuzzy search should still work on that data.
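
One possible way around both problems, sketched under assumptions: split only on commas that sit outside parentheses, and pull a numeric range out of price strings while keeping any trailing remark. `splitTopLevel` and `parsePrice` are made-up helper names, and the price pattern only covers the two shapes quoted above.

```javascript
// Split on commas that are not inside parentheses.
function splitTopLevel(text) {
  const parts = [];
  let depth = 0;
  let current = '';
  for (const ch of text) {
    if (ch === '(') depth += 1;
    if (ch === ')') depth = Math.max(0, depth - 1);
    if (ch === ',' && depth === 0) {
      parts.push(current.trim());
      current = '';
    } else {
      current += ch;
    }
  }
  if (current.trim() !== '') parts.push(current.trim());
  return parts;
}

// Pull a low/high dollar range out of strings like "$200-300"; anything after
// the numbers (such as "(for the whole notebook)") is kept as a note.
function parsePrice(text) {
  const match = /^\$(\d+)(?:\s*-\s*\$?(\d+))?\s*(.*)$/.exec(text.trim());
  if (!match) return null;
  const low = Number(match[1]);
  const high = match[2] === undefined ? low : Number(match[2]);
  return { low, high, note: match[3] };
}

console.log(splitTopLevel('Wyse proprietary (840358, 840361), AT (900840), PS/2 (900866)'));
console.log(parsePrice('$200-300'));
console.log(parsePrice('$6000 (for the whole notebook)'));
```

Values that match neither pattern come back as `null`, so they could be left as free text for the fuzzy search instead of being dropped.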

But apart from that problem here is the first full list of keyboards as JSON: http://pastebin.com/FUBYej4U

I'm still sure that with minor corrections in the wiki the scraping and generating of a database is a good short-term solution to make the wiki data accessible with various filters and grouping.

Posted: 27 Jul 2013, 10:53
by Muirium
Looks a good start.