I'm learning Lisp with a friend and "blogging" it as he and I go. So far, nothing much. Just getting into the first few chapters. I'll probably throw some more languages in there as I go.
My boss gave a short, high-bandwidth talk on Scala today. Bad ass little language. I may get to start using it at work.
In the "always hire good people" department, I was extremely pleased that all of my coworkers are very well versed in languages and computer sciencey stuff. At my last job, people knew how to write good software. I have no idea if the new guys are heavy-weight application developers, but that's not what this job is about. It was very nice to have everyone immediately understand tuples and higher level language concepts, mainly so they can teach me. I took this job because it takes me into an area I haven't worked in before; namely a more rigorous and scientific look at software as she is run.
Most of the jobs I've had are application jobs, which are fun. But they're not always challenging in the "learning more new stuff" way. So far, this job is.
Posted Mon May 19 21:54:09 2008
A friend of mine asked me about a piece of software that solves a simple problem : How do you share an address book in an office or home setting?
Well, most people use Exchange or something with their mail server to manage all that stuff because, generally, address book == email addresses. However, that's not always the case, and many places don't run Exchange or whatever. So how do you get/share/update addresses?
I've used LDAP for this over the years but it's consistently been a serious pain in the ass, either because of broken/stupid/bad tools, broken/stupid/bad server support, or the inherently hard to debug encoding scheme that LDAP uses from the X.500 spec. I tried OpenLDAP and it hurt. I'm sure it's better now or at least I hope it is.
Anyway, I tried looking for a SOHO solution to the address book problem and I found a bunch of pages describing, in glorious detail, how to set one up on a Linux box using OpenLDAP. Except it's very complicated; you have to worry about setting up the default schema, setting the server up correctly, so on and so forth. I'm a geek and this was a bit much for me, let alone the semi-technical friend I'm giving this software to.
Plus, this feels like using a sledgehammer when all I need is a light tap. I really just want this thing to do a few things and do them well :
- Simple user management
- Simple CRUD
- Simple backup
LDAP should provide all of this, but I've seen very few shops grow this solution themselves. Either they're a huge organization or they paid someone (IBM/SUN) to create the solution for them.
Why LDAP at all? Because every mail client and address book client I know uses it. Mac, Linux, Windows. It doesn't matter, they all support LDAP in some form.
So I go to look at http://directory.apache.org because I figure, hey, it's Java + LDAP + Apache, it's gotta be at least OK to use. But I was wrong. First, there's this highly misguided screed about how directories just never got a fair shake. Let me address them in order :
- "First there is no formal education around directory technologies while several courses around relational database systems exist" - Bullshit. This is not a real reason at all. People have adopted many technologies without formal education long after they were in school or even getting training. HTTP wasn't popular when most of the guys/gals I've worked with were in school and they know how to use it well. Same with many other protocols, including IM via XMPP.
- "Developers may also fall back to using an RDBMS instead of a Directory because of a lack of rich integration tier constructs (like triggers, views and stored procedures)." This one may be partly right, but for the wrong reasons. Yes, lack of integration is a reason I, and others I've talked to, haven't used LDAP more in the software we write. But that's not because of triggers or views or stored procedures. That's because talking to, debugging, and managing the software relationship with an LDAP server is hard, and the server itself is very, very hard to set up. There's almost always a schism between the person using the LDAP server and the person administering the LDAP server. What happens if it fails? How do you set up failover? How do you debug the connection on the wire?
- "Perhaps the most significant factor driving these faulty decisions is the lack of LDAP tooling. There are myriads of RDBMS tools for various aspects: tuning, accessing, and designing". This has some merit, but it still doesn't get at the crux of the matter, namely that LDAP is no longer Lightweight relative to the options available to developers. When is the last time anyone willingly chose to use ASN.1? Do developers even know what that is nowadays? When was the last time a non-admin set up OpenLDAP or bothered to learn the schema structure?
Part of the reason I avoid LDAP in the software I write is that I already have something for storing data and searching through that data, and then I have an RDBMS as well. I don't need Yet Another Technology to fill this need. The world in which LDAP lives is a fantasy world that doesn't fill more than a few needs outside of the security realm. It could be used for more, and I bought into LDAP when I first read about it, but several months of pain during setup and management of an LDAP server, not to mention the pain of client configuration and management, taught me that there are better ways of managing this data.
LDAP isn't "hard" in the conventional sense, and it's not even "hard" to understand. It's hard to use as a tool because of arcane decisions made decades ago that have never been revisited.
But, to get to the title of this post, the biggest problem I have with Apache lately is the documentation that makes no sense and is hard to parse. I chuckled when I read "This page needs to be overworked". It's almost engrish. Ivy used to suffer from this (still does in some places) but is being improved now that it's a mainstream project.
This isn't meant to pick on non-English speaking/writing developers, but it is a growing problem with documentation on projects I want to use. It's spectacular that people are writing projects in other countries and releasing them! That's kick ass. However, it's hard to use the projects if the documentation is bad/poor/hard to read. This is true whether the documentation is written in proper English or improper English. But it seems like documentation is slowly degrading for many open source software projects.
There are a few projects that are shining examples of good documentation; projects like Spring and Hibernate. Much of the Apache Commons. But then you find projects like Directory and, well, good luck. I wanted to try and write a quick application to manage my address book, but I think I'm going to try writing it from scratch rather than use Apache DS, mainly because of the lack and quality of documentation.
I'm not suggesting that people learn English. However, it is the de facto technology language, so that should be the target eventually for a project. Still, write the documentation in the language of your choice and then ask someone to "port" it to English! At least I have a chance at using an automatic translator (or a friend who speaks your language) to get at your documentation. And maybe it'll be more complete if you write it in the language you're comfortable with?
The problem has many facets, but ultimately it just makes me not want to use the software, which is pretty much the opposite of the Open Source intention.
Posted Sat May 10 10:15:58 2008
So, I've been writing an HTTP proxy again, for various reasons. I wanted to learn MINA and also I have nefarious purposes for a proxy that can do "stuff".
Anyway, on Windows, the Proxy is pretty reliable. I need to get the build cleaned up (so far it's just a hack project) so I can try it on Linux. So far, I get intermittent TCP issues with the proxy on my Mac. I'm thinking it's the glorious neglect of Java that the Apple camp keeps perpetuating.
Meanwhile, I found this out by using the glorious Wireshark. I tried Eavesdrop, my normal sniffing favorite on Mac, but it seems to be strangely blind to anything on the loopback interface, even when captured with tcpdump. Anyway, Wireshark can't seem to actually sniff traffic (dunno why), but it will open the tcpdump files so I can see that, hey!, when I get the "weird behavior", the TCP ack/syn process seems to just fall silent. Interesting.
I'll have to screw around with this more tomorrow.
Posted Thu May 1 23:40:28 2008
I want a new tool. Something that solves all of my problems. This is a simple want, but seems to be hard to solve.
Over the years, I've taken responsibility for being the de facto build-meister at most jobs. Back in the C++ days (remember those? Remember CORBA? Right) it was gnu-make and whatever POS we had to use because we used VC++ (I wasn't far enough along to know about Vi or Emacs). I learned Make and the intricacies of just how screwed up VC++'s settings could be. It sucked. We also had to install a bunch of crap on everyone's machine because the idea of having a portable install was years, nay, decades away.
Then I began to write in Java and things were simple. Simple because there weren't Java libraries to import and our code base was one huge monolith. We were the monkeys, throwing bones into the air and hoping they didn't hit us in the head. We tried using Make for Java and that was downright painful, given the fact that we built on Windows and our tools didn't get set up well in Make.
I tried Ant. Glorious day! It did 80% of what I needed out of the box! We were still checking libraries (the few we had) into our SCM. Well, at least the build process worked across Windows and our new *nix environment (that barely ran Java).
Then the Ant scripts grew, and the Ant developers decreed that Ant was not to be a programming language (like Make, but not like Make). And Ant's utility was diminished, but only slightly since I had already spent a lot of time learning Ant and could make it dance nicely. However, still checking libraries into the SCM was pretty lame.
Various companies and startups later, Ant hasn't progressed much. It's still the standard, but it's showing its age. And we're still checking libraries into a centralized place.
Enter Maven. Maven did everything Ant did and more. It managed dependencies and helped you with build standards and and and and... like the over-excited puppy, it tried to be everything to everyone and pleased a good half of the people we ran across. Except no one could remember exactly which prototype to use and, dammit, why couldn't we just go back to Ant since it did everything right without the developers having to remember the various incantations?
Sure. Back to Ant. And lo, Ant gains import and macrodef! (Write Ant tasks you say! You've obviously never developed Ant tasks, I say) Hey, look, you can do some crazy things with this! The Ant task itself will execute a build script when not embedded in a task. Hmm, I can use that to bootstrap all these modules I've created over the years. Kick ass! Now I can easily and modularly upgrade/add/standardize all of our build scripts. Wonderful.
Except that now I have various issues, one being the bootstrap itself (written in Ant, and thus a wee bit limited in its ability to accomplish useful tasks in a small bundle, and a bitch to upgrade), the other being the job of importing new libraries and functionality. The second is fairly easy, except you have to make your scripts depend on the conventions of your imported library (damn, this is almost as bad as Maven...)
I started out writing Formicidae. This is/was/could be a set of ant scripts + libraries that are easily imported into your scripts. The "bundles" are downloaded from the interwebs (or your cached and possibly modified copies) and expose the various steps you're most likely interested in, either as macrodefs or tasks (with well-known entry points, via pre-defined properties). Bojangles was an Ant task to do just that; manage the dependencies between modules, etc.
Sheesh! There's a mouthful. But it solved a serious problem. Namely never using Maven (more on this later). Seriously though, it did give me a way to manage my build scripts, keep them small and tidy, but it makes for a maintenance pain, since I've sort of broken the integration with many IDEs and Ant as well (Apparently many IDEs don't take kindly to you "doing stuff" to the task model inside of Ant, despite it being a legitimate extension of Ant).
I'm still checking libraries into my SCM. Crap. Enter Ivy. Yay! Something to manage library dependencies and it seems to not groan too much under the weight for the 10-level deep, 100+ node dependency graph we use. Excellent. Except there are issues with the documentation (Hello non-native English documenters! I'm sorry, but the docs suck(ed)) and the betas and alphas and generally it's overcomplicated and seems to do strange things.
Alrighty. So, let's tackle Maven for a moment. Why avoid it? Well, some of it is elegance (which Maven lacks, unless you agree with Maven) and the rest of it is having to bend my projects around Maven. There is a huge upside, which is being able to not have to worry about all of this. However, people tend to integrate with Ant and Ant alone. And I have a hard time convincing people to do something I tend to dislike, which is doing a bunch of work just to get it to work with one specific tool. Ant at least worked around my project and stayed out of my way. Maven... tends to be a bit pushy. I don't always have the same build targets, or cycles, and I don't have time to fix every project to make sure it fits. I don't have to do this with Ant, why do I have to do it with Maven?
Now I'm thinking about how to replace Ivy and it seems to be surprisingly easy. JGraphT includes pretty much everything you need to build and analyze the dependency graph. The remainder seems to revolve around resolution and other issues.
However, ultimately, I'd love to have a tool (or series of tools) that accomplish several goals. I want a layered approach to my build system :
- Simple, easily extensible, and modular build tool. Handles building without "state" (versions, etc).
- Project management (namely dependency management) and release management with limited state (versions)
- Build server that is project aware and can drive the remainder of the process
Instead of buying the farm with every project, I like being able to ease into the level of management I want with each tool. I sort of get this with Ant + Ivy + Hudson today, but it's clumsy.
So, now I'm going to go figure out how to walk the vertices in JGraphT since I've got a prototype resolver built.
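For the curious, walking a dependency graph so that every library shows up after the things it depends on looks roughly like the sketch below. This is plain JDK code, not JGraphT and not my actual resolver; the DependencyGraph class and the module names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy dependency graph (hypothetical): each module maps to the modules it depends on. */
class DependencyGraph {
    private final Map<String, List<String>> deps = new LinkedHashMap<>();

    /** Record that module depends on each entry of dependsOn. */
    void add(String module, String... dependsOn) {
        deps.computeIfAbsent(module, k -> new ArrayList<>());
        for (String d : dependsOn) {
            deps.get(module).add(d);
            // Make sure leaf modules are known to the graph too.
            deps.computeIfAbsent(d, k -> new ArrayList<>());
        }
    }

    /** Returns modules ordered so every dependency precedes its dependents. */
    List<String> resolutionOrder() {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String module : deps.keySet()) {
            visit(module, done, inProgress, order);
        }
        return order;
    }

    /** Depth-first walk; a module seen again while still in progress means a cycle. */
    private void visit(String module, Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(module)) {
            return;
        }
        if (!inProgress.add(module)) {
            throw new IllegalStateException("Dependency cycle involving " + module);
        }
        for (String d : deps.get(module)) {
            visit(d, done, inProgress, order);
        }
        inProgress.remove(module);
        done.add(module);
        order.add(module);
    }
}
```

With "app" depending on "service" and "log4j", and "service" depending on "log4j", resolutionOrder() gives log4j, service, app. The graph walk is the easy part; version conflicts and where the jars actually come from are what remain.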
Someday, I may have rebuilt all of this, but for now I'll settle for something that can read Ivy and Maven dep-graphs and pull down the proper libraries.
Posted Mon Apr 28 20:19:54 2008
It's what I am.
This guy is pretty good. He's goofy and that sort of makes it better.
I'll catch up sooner or later.
Posted Wed Apr 2 23:57:44 2008
I didn't make it all the way through Cloverfield. I went with a buddy and he became nauseous watching the movie. It was bad anyway, I didn't mind missing it.
So, tonight, I say to myself "Well, I have a few minutes to kill, wouldn't it be neat to watch that again?". I find a screener somewhere on the interwebs. It's low quality and shaky.
Wait! The original was low quality and shaky! How do I tell the difference?
Talk about unintended consequences. Your movie looks the same no matter how bad the quality is.
Posted Thu Mar 13 00:19:46 2008
While looking at Guice, I started to think about the decisions I make when I code. One of these decisions is how tightly I couple my code with third party code.
We had a problem today at work with BDB. It wasn't BDB's fault; I changed something and introduced a rather nasty bug that QA didn't catch, so data wasn't persisting correctly (BTW, no SQL in the runtime FTW). But this raised the point that, if we had written the code, we could have made it safer. This gets into the classical "Buy versus Build" problem, but I'll sidestep that and focus more on how tightly bound our code is to BDB in this case.
Our primary service has a service interface and a BDB specific provider. If we needed to swap something out it would be both contained and easy; we'd simply implement a new class based upon the previous interface, since the logic for the application is strictly contained within the storage mechanism (it's a simple service, stores data, retrieves data).
This is a pretty standard model in the Java world. I've called it the SPI model for a long time, though I think it has a different name. Anyway, the gist of the matter is you have a core set of APIs that consumers use, and a set of "internal APIs" (called SPIs, the S is for suck, er, Service) that the people providing those APIs implement. There's some glue code in-between for consistency, management, or whatever else the "framework provider" is offering.
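Here's roughly what that shape looks like, boiled down. None of this is our real code: StorageProvider, InMemoryProvider, and DataService are names I'm inventing for the sketch, and the in-memory map just stands in for whatever a BDB-backed provider would do.

```java
import java.util.HashMap;
import java.util.Map;

/** The SPI (hypothetical): what a storage provider has to implement. */
interface StorageProvider {
    void put(String key, String value);
    String get(String key);
}

/** One provider. A BDB-specific class would implement the same interface. */
class InMemoryProvider implements StorageProvider {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public void put(String key, String value) {
        data.put(key, value);
    }

    @Override
    public String get(String key) {
        return data.get(key);
    }
}

/** The API consumers call. The glue (validation, in this sketch) lives here. */
class DataService {
    private final StorageProvider provider;

    DataService(StorageProvider provider) {
        this.provider = provider;
    }

    void store(String key, String value) {
        if (key == null || key.isEmpty()) {
            throw new IllegalArgumentException("key is required");
        }
        provider.put(key, value);
    }

    String retrieve(String key) {
        return provider.get(key);
    }
}
```

Swapping storage means writing another StorageProvider and handing it to the DataService constructor; consumers of DataService never notice.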
I thought of another case where I've abandoned this idea completely, and yet have managed to both survive and even excel, namely the dreaded Logging. Why is logging so painful? Because, like UI design, everyone has an opinion and a set of requirements that are greatly divergent. Anyway, there's JUL (java.util.logging), Log4j (from Apache), Commons Logging (also from Apache, woot), homemade loggers created by people unfamiliar with code reuse (or Google), and then whackjobs who created frameworks years after these have been around (I'm looking at you Sun, with JUL).
Now, the "standard" SPI would be JUL, because the JDK provides it for you, it does... most of what you need, and worst case, Log4j/Commons Logging/etc have bridge code to take your log messages from JUL to their logging subsystem.
So why do I doggedly use Log4j? Well, after thinking back, I realize I use it because I know how it works and it was (is?) far less complicated than the other options. Or at least, it's the evil I know. I probably should switch to using JUL and inject Log4j after the fact (as my preferred logging API). I probably should also inject the logging object itself, although that seems like a cross cutting concern of sorts... anyway.
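To make the "code against JUL, plug in something else later" idea concrete, here's a minimal sketch using only java.util.logging. ForwardingHandler is a made-up stand-in for a real bridge; it just collects formatted lines where a real one would hand each record to Log4j or whatever backend you prefer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Stand-in for a bridge (hypothetical): receives JUL records and forwards them elsewhere. */
class ForwardingHandler extends Handler {
    final List<String> forwarded = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
        if (!isLoggable(record)) {
            return;
        }
        // A real bridge would pass the record to the other logging framework here.
        forwarded.add(record.getLevel() + " " + record.getLoggerName() + ": " + record.getMessage());
    }

    @Override
    public void flush() {
        // Nothing is buffered in this sketch.
    }

    @Override
    public void close() {
        // No resources to release.
    }

    /** Attach a new forwarding handler to the logger and silence the default console output. */
    static ForwardingHandler attachTo(Logger logger) {
        logger.setUseParentHandlers(false);
        ForwardingHandler handler = new ForwardingHandler();
        logger.addHandler(handler);
        return handler;
    }
}
```

Application code only ever calls Logger.getLogger(...) and the usual warning()/info() methods, so the choice of backend stays in one place. The caller should hold on to the Logger it passes in, since JUL only keeps weak references to loggers.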
I started bitching about Guice months ago, by saying "Jeez, it makes you tightly bind to their libraries!". Not really realizing I already make this choice in several ways, I didn't think much more of it, but now realize that... well, I do this with Spring too. We (at work), myself, and I have enough Spring configuration lying around to choke a cat, and it is, effectively, code.
Configuration as code! Exactly what I was hoping to avoid. I mean, that's the point of configuration, right?!?!? It's only the bits you need to tell the app what to do. However... sans some sort of Workflow Configuration Management Engine (wha?), we end up "telling the app what to do" by wiring together disparate components, feeding each one configuration (as XML, with Spring).
So, I tightly couple my configuration, er, code-as-configuration to Spring at this point. I can't "Just Startup My Application" because I need Spring to put the pieces together for me. The pieces can be reused without Spring, which is a bonus, but... not really. They still have to be wired together. And you still have to have the libraries I've used (log4j, etc), and you'll probably have to use Ivy to get them, since I patently refuse to check Jars into my SCM.
Where's my flexibility of not having tightly coupled code? I don't think I have it, actually. It'll be just as much work to run away from Guice if I wire it into code as to replace the Spring configuration going to Guice. I think. I plan to find out, at least.
Anyway, the gist of this screed is "The choices I've made in the name of flexibility haven't really given me more flexibility". I've simply traded one set of configuration for code, and vice-versa. I still have to test the application with the configuration (which is hard-ish with Spring, actually, since I don't like inheriting from their objects).
I've come to the opinion that Annotations probably don't suck as much as I previously thought. Despite using 1.5 for ... years (I think), I haven't used them, but Guice provides a compelling usage. And annotations are probably easier to remove from code than actual object usage (like log4j).
So, I'm going to :
- Try Guice
- Take another look at Annotations
- Not worry so much about my choice of libraries and tight binding... for now
So far, I've used Spring exclusively for my DI. However, it is pretty tedious, and since I pretty much only use the explicit binding of Spring, I end up with oodles of XML configuration.
I think I'm going to try Guice for a while and see how it goes. It should streamline my usage of DI (since I started doing similar things with Spring anyway), which would be peachy. It doesn't, however, take care of configuration, which Spring sort of did for me. So I'm watching the tutorial to figure out how to integrate Guice and then use (insert some other tool here) for management of configuration.
I'll post more when I know it.
Updated : I think I now understand the tradeoffs between what I like about Guice and what I like about Spring.
Guice (ratings are on scale of (+) suck (1) to awesome (5), (-) from shucks (1) to FOAD (5)):
- +++ Focused on DI (sort of, they threw in a support for dev/prod envs)
- ++++ Splits concerns of component configuration (which port do I listen on? What's my log directory?) and component wiring (dependency management between components)
- +++++ Wiring can be flexibly composed from auto-wiring and manual wiring
- +++ (Unsure) Seems to be far better at supporting Unit/Mock testing
- ? Some configuration support (Dynamic enum support, could be neat?)
- --- Requires you to use Guice annotations
Spring (same scale):
- ++ Spring Annotations are mostly external usage
- +++ Doesn't require you to tie your code to Spring
- ---- Wiring together modules from different projects with their own configuration is, at best, hackish (at work and home I use imports to handle the problem, with some dynamic application contexts passed in on CLI)
- ---- XML configuration is very verbose, and since the auto-wiring has sucked from experience, I use explicit XML
- --- Configuration and wiring are merged
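The configuration-versus-wiring split is easier to see in code than in a ratings list. The sketch below is hand-rolled constructor injection in plain Java, not Guice and not Spring; every class name in it is invented for the example, as are the "port" and "log.dir" property keys.

```java
import java.util.Properties;

/** Component configuration (hypothetical): plain values, no knowledge of other components. */
class ServerConfig {
    final int port;
    final String logDirectory;

    ServerConfig(Properties props) {
        // Assumed property keys and defaults, chosen only for this sketch.
        this.port = Integer.parseInt(props.getProperty("port", "8080"));
        this.logDirectory = props.getProperty("log.dir", "logs");
    }
}

class AuditLog {
    private final String directory;

    AuditLog(String directory) {
        this.directory = directory;
    }

    String describe() {
        return "audit log in " + directory;
    }
}

class Server {
    private final int port;
    private final AuditLog audit;

    Server(int port, AuditLog audit) {
        this.port = port;
        this.audit = audit;
    }

    String describe() {
        return "server on port " + port + " with " + audit.describe();
    }
}

/** Component wiring: the only place that knows which object needs which. */
class Wiring {
    static Server build(ServerConfig config) {
        AuditLog audit = new AuditLog(config.logDirectory);
        return new Server(config.port, audit);
    }
}
```

Changing the port touches only the properties; changing which AuditLog the Server gets touches only Wiring. When both live in the same XML file, those two kinds of change get tangled together.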
I think the primary hesitancy I have with Guice is using their Annotations. However, we've treated the Spring configuration as code for quite a long time now. We have quite a bit of Spring configuration as well, easily more than would be added through simple annotations. Also, Guice seems to allow for building a framework via its annotations and contexts to make thread safety easier, both by making it more explicit and automated.
Overall, it seems like you could use Guice for wiring and Spring for the Web Kitchen Sink Framework, or Spring for both, but Guice seems inherently simpler. I'm sold enough to try it out. Now I need a toy project to try it out on. Hmmm.
Posted Tue Mar 11 19:14:38 2008
I've been using Hudson at home and, as the build server is exposed to the interwebs, I want to secure the pages. However, Hudson doesn't have detailed instructions on how to do this. Turns out, it's not Hudson specific at all, but instead relies on how you set your container realms up.
This page tells you how to add the login-config to your web.xml (in this case, add it to jetty's webdefault.xml, since repackaging Hudson is sort of lame). Then you copy a realm into the jetty.xml file, probably looking like so :
<Set name="UserRealms">
  <Array type="org.mortbay.jetty.security.UserRealm">
    <Item>
      <New class="org.mortbay.jetty.security.HashUserRealm">
        <Set name="name">DefaultRealm</Set>
        <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
        <Set name="refreshInterval">5</Set>
      </New>
    </Item>
  </Array>
</Set>
Then you edit your realm.properties, and you should be good to go.
Posted Sat Mar 8 19:35:10 2008