Just a quick post to bring everyone's attention to a why-did-I-not-think-of-that project by the name of Byzantium. From the site:
The goal of Project Byzantium is to develop a communication system by which users can connect to each other and share information in the absence of convenient access to the Internet. This is done by setting up an ad-hoc wireless mesh network that offers services which replace popular websites often used for this purpose, such as Twitter and IRC.
These services and web apps were selected because they are the ones most often used by activists around the world to find one another, exchange information, post media, and organize. They were also selected because they stand the best chance of being easy to use by our intended userbase, which are people using mobile devices like smartphones, MP3 players, and tablet PCs.
Unlike most mesh implementations, a Byzantium Mesh requires no specialized equipment that may not be easy to get during an emergency, just an x86 computer with at least one 802.11 a/b/g/n wireless interface.
I am a strong, strong believer that, in most cases, commodity electronics are now more than cheap and powerful enough to replace dedicated, specialized hardware. What we used to do in hardware can now be done in software with at least as much safety and security. In some cases it can be done better, since the software often has knowledge of the underlying systems (think ZFS as opposed to dedicated RAID controllers).
Anyway, I had a love-at-first-site (get it?!) reaction to this project so I just figured I would help spread the word.
I have been working with a whole mess of data as of late in a series of MyISAM tables. The short of it is I have a 256GB XML file that I want in MySQL. A few tools designed for this specific data already exist. The problem is that they import everything when I only want about 75% of the data, and they were not as fast as I know they could be. So, seeing as I have so much free time, I decided to explore this a bit further and wrote my own PHP script to handle the job.
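For the curious, here is a rough sketch of the "stream the file and keep only what you want" idea. It is written in Python rather than the PHP I actually used, and the element names (`dump`, `page`, `id`, `title`, `text`) are made up for illustration. It assumes the records are direct children of the root element.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tag, wanted_fields):
    """Yield one dict per record element, keeping only the wanted child fields."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            yield {
                child.tag: child.text
                for child in elem
                if child.tag in wanted_fields
            }
            # Drop records that were already handled so memory use stays flat,
            # which matters when the file is far larger than RAM.
            root.clear()


# Usage with a tiny in-memory stand-in for the real file (hypothetical data).
sample = (
    b"<dump>"
    b"<page><id>1</id><title>A</title><text>skip me</text></page>"
    b"<page><id>2</id><title>B</title><text>skip me</text></page>"
    b"</dump>"
)
for record in iter_records(io.BytesIO(sample), "page", {"id", "title"}):
    print(record)
```

The point is only that the parser never holds the whole document at once and that unwanted fields are dropped before anything reaches the database.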
I came to the conclusion that the indexes were what was killing me. The database ends up being ~3.5 billion records (yes, it is nearly that much) and indexes are kind of a very important thing. First I tried with the indexes enabled for the whole import. I do not know how long it would have taken because the larger the database grew the slower it got, kind of like how you can never quite reach the speed of light. After that I tried creating the tables without any indexing and using ALTER TABLE after. It looked good at first, but when you have such huge tables it gets slower and slower with each added column (i.e. the first column indexed plenty fast but the second ran at half the speed, the third at half that, etc.). The problem was that, using ALTER TABLE, you have to add them one-by-one.
After further research it seems to me that the best way to do this is to add the indexes while creating the table and then just disable them. Before any of the data is inserted you do an ALTER TABLE table DISABLE KEYS and SET FOREIGN_KEY_CHECKS=0. After you have done your massive, massive import you just do SET FOREIGN_KEY_CHECKS=1 and ALTER TABLE table ENABLE KEYS to re-enable them. This way MySQL does all the work at once, and virtually any time you let MySQL handle the work itself (versus multiple calls via an external script) you end up with a big benefit.
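The order of operations above can be sketched roughly like this. It is Python rather than my PHP, and it assumes a DB-API cursor from a MySQL driver that uses `%s` placeholders; `keys_disabled` and `bulk_insert` are names I made up for the example. Worth noting: MyISAM does not enforce foreign keys, so the FOREIGN_KEY_CHECKS toggle is kept here only to mirror the steps as described.

```python
from contextlib import contextmanager


@contextmanager
def keys_disabled(cursor, table):
    """Turn index maintenance off for the block, then back on afterwards."""
    cursor.execute(f"ALTER TABLE `{table}` DISABLE KEYS")
    cursor.execute("SET FOREIGN_KEY_CHECKS=0")
    try:
        yield cursor
    finally:
        # Re-enable in reverse order; ENABLE KEYS rebuilds the indexes in one go.
        cursor.execute("SET FOREIGN_KEY_CHECKS=1")
        cursor.execute(f"ALTER TABLE `{table}` ENABLE KEYS")


def bulk_insert(cursor, table, columns, rows, batch_size=1000):
    """Insert rows in batches while the table's keys are disabled."""
    column_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO `{table}` ({column_list}) VALUES ({placeholders})"
    with keys_disabled(cursor, table):
        batch = []
        for row in rows:
            batch.append(tuple(row))
            if len(batch) >= batch_size:
                cursor.executemany(sql, batch)
                batch = []
        if batch:  # flush whatever is left over
            cursor.executemany(sql, batch)
```

Table and column names are interpolated directly into the SQL, so this is only safe with names you control yourself.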
A few notes:
- The primary keys are still updated during insertion. I suppose you could use ALTER TABLE after to add those but I am pretty happy with the speed of this setup.
- I also tried INSERT DELAYED (both alone and in combination with the above) and it resulted in a ~10% increase in time.
I just found this article. It seems I was wrong about having to index one column at a time, but there are a few caveats. The most notable one is that the order in which you specify the columns directly affects whether the index can be used or not. Anyway, this is a huge topic and that is why there are DBAs. Good luck to us all…
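To make the column-order caveat concrete, here is a toy model of the leftmost-prefix rule for multi-column indexes. It is a deliberate simplification (equality filters only, no range conditions, none of the optimizer's cleverness), and the function and column names are invented for the example.

```python
def usable_index_prefix(index_columns, equality_columns):
    """Return the leading index columns a query's equality filters can use."""
    usable = []
    for column in index_columns:
        if column not in equality_columns:
            break  # the first column the query skips ends the usable prefix
        usable.append(column)
    return usable


index = ["last_name", "first_name", "city"]

# Filtering on last_name and city: only the leading column helps.
print(usable_index_prefix(index, {"last_name", "city"}))

# Filtering on first_name and city: the index starts with a column the
# query never mentions, so in this model none of it is usable.
print(usable_index_prefix(index, {"first_name", "city"}))
```

In other words, an index on (last_name, first_name, city) behaves like indexes on (last_name), (last_name, first_name) and (last_name, first_name, city), but not like one on (first_name) or (city) alone.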
This post represents a paradigm shift.
What was once a repository for the everyday crap floating around in my brain is no more. This blog is now a repository for B- everyday crap floating around in my head.
Most posts before this point, minus one or two decent ones, are to be ignored.
No, that subject is not a mistake. I was being clever, doink.
Jonathan Kaufman died on 2009.02.23 of cancer.
The world is worse off for it.
There are things I do not see myself doing, assuming I even could. For example, I would not write parody or clever word-play songs. There are already so many talented people doing that (They Might Be Giants, Jonathan Coulton, Paul and Storm, “Weird Al” Yankovic). It is not that I should never try because I will never be as good; it is that I would rather spend that time listening to their stuff.
There are, however, things I can and may do. I am already writing games, for one (no, Simple Tanks is not dead, but so very far from it). I fiddle with my guitar every now and again.
There. I posted something new.
What better time than now to blow shit the fuck up?
Every now and again one comes across something that makes them laugh so hard they hurt. Often they keep looking at it over the course of weeks (at least I know I do). A recent Penny-Arcade strip had this effect on me.
The nuance in the faces of the third panel is priceless (or as much nuance as you can find in a mono-colored web comic). I have yet to buy a house and these might be things I need to be aware of. Also ancient Indian burial grounds.