because I can, even though I don't want to

Wednesday, November 18, 2009

Knot Pop Quiz

THE QUESTION
what knot is this? any idea?

THE ANSWER
Bowline.  In this form it's sometimes called a 'double bowline'.

In the old days this knot was used to tie the rope around your waist, with no harness. If the leader fell and the rope didn't break, the result was not pretty: broken ribs were common. When harnesses first came around, the bowline, and later the double bowline, was the standard tie-in knot. Some people still use it to tie in. It's supposed to be easier to untie after being loaded than a figure eight, but it has sharper bends in it, which means it's slightly weaker than the figure eight. Any bend in a rope lowers its breaking strength, and the sharper the bend, the more the rope is weakened. The figure eight has the least sharp bends of any known knot, making it the 'strongest'. It is also easy to tie and easy to glance at and verify that it is tied correctly, which are the main reasons it is preferred by most climbers today.

Note: This post was ghost written by me for "could have been my first guest blogger but refused" Neal Harder in response to http://twitter.com/sudarkoff/status/5836400206. aka I just copied his email response into this form since he's "allergic" to blogging. 

Wednesday, October 28, 2009

While rewriting our Berkeley XMLDB services on Twisted, I decided to implement some CouchDB-style interfaces, such as responding to POST and GET requests for DB actions. Oh, and don't forget the XML-RPC interfaces already in use, which have to stay backwards compatible! It was surprisingly simple after staring at the docs for a day or two, so I'm posting a piece of the code in the hope that the example makes things easier for people new to Twisted. Below you'll find an example which not only covers the POST/GET/XML-RPC thing, but also has thread pools, deferreds, XML-RPC kwargs, and a reactor example. There are some pieces missing since the server is actually much more complicated, but hopefully you get the idea. Bon appétit!


#! /usr/bin/python

import os
import atexit
import signal
import sys
from xxx.services.utilities import write_pid, exit_function, handle_sigterm

from twisted.web import xmlrpc, server, resource
from twisted.internet import defer
from twisted.internet import threads  # used by _deferToThread below
from twisted.python import threadpool
from twisted.internet import reactor

'''
Configure logging
logging is a blocking operation, which violates twisted principles
so please use the logging module, for warnings/errors that should be
seen by everyone. DB queries et al, should go to the twisted logging
module, since theirs is non-blocking.

http://twistedmatrix.com/trac/wiki/TwistedLogging
'''
from twisted.python import log


'''
We are using two different versions of twisted between machines -
so lets see if we can accommodate both easily
'''
from twisted import version as tversion
# Compare (major, minor) together so that e.g. 9.0 also counts as
# newer than 8.1; checking the minor number alone would get that wrong.
TWISTED_ABOVE_8_1 = (tversion.major, tversion.minor) >= (8, 2)

class TXmldb(xmlrpc.XMLRPC):
    isLeaf = True

    def __init__(self):
        xmlrpc.XMLRPC.__init__(self)

        # Set up a thread pool. `config` (MIN_THREADS, MAX_THREADS) is one
        # of the pieces left out of this excerpt.
        self.threadpool = threadpool.ThreadPool(config.MIN_THREADS, config.MAX_THREADS)
        self.threadpool.start()  # the pool runs no work until it is started


    # support for handling rest requests as well
    def render_GET(self, request):
        # who are we going to call
        func = request.path[1:] # omit leading slash

        if func == 'get':
            # read the argument into the same name that is passed on below
            howMuchHotness = request.args['howMuchHotness'][0]
            defer.maybeDeferred(self.__get, howMuchHotness).addCallbacks(
                self.finishup,
                errback=self.error,
                callbackArgs=(request,))
        else:
            return "No comprende senor!"

        return server.NOT_DONE_YET


    def render_POST(self, request):
        request.setHeader("Connection", "Keep-Alive")
        # if this is an xmlrpc call, there will be no args
        # since it will be all marshalled up into a content
        # body. So, checking for arg length should tell us
        # accurately if this is xmlrpc or not
        if not len(request.args):
            return xmlrpc.XMLRPC.render_POST(self, request)

        func = request.path[1:] # omit leading slash

        # otherwise handle this like a post
        data = request.args['data'][0]
        if func == 'add':
            # self.__add is one of the pieces not shown in this excerpt
            defer.maybeDeferred(self.__add, data).addCallbacks(
                self.finishup,
                errback=self.error,
                callbackArgs=(request,))
        else:
            return "No comprende senor!"

        return server.NOT_DONE_YET



    def error(self, failure):
        # an errback is always handed the Failure as its first argument
        log.err(failure, "ewe...")


    def finishup(self, result, request):
        # post and get only take string results
        if result == True:
            result = "1"
        if result in [False, None]:
            result = ""

        request.write(result)
        request.finish()


    '''
    ---------------
    XMLRPC INTERFACES

    Note: In the past, we called this with positional arguments
    but we also want support for passing a dictionary of keyword values.
    Theoretically the argument-ordered one gets phased out, but I
    can see value in having both. The 'k' suffix stands for "kwargs".
    ---------------
    '''
    def xmlrpc_get(self, howMuchHotness):
        return self._deferToThread(self.__get, howMuchHotness)

    def xmlrpc_getk(self, kwargs):
        return self._deferToThread(self.__get, **kwargs)

    def __get(self, howMuchHotness):
        return "did something %s" % howMuchHotness

    def _deferToThread(self, f, *args, **kwargs):
        # f has to be a callable (not a method name) for both branches
        if TWISTED_ABOVE_8_1:
            return threads.deferToThreadPool(reactor, self.threadpool, f, *args, **kwargs)
        else:
            d = defer.Deferred()
            self.threadpool.callInThread(threads._putResultInDeferred, d, f, args, kwargs)
            return d


    def __del__(self):
        self.threadpool.stop()


def start(port, schema=None):
    port = int(port)
    log.msg("Initializing txmldb on port %s ..." % (port,))
    r = TXmldb()
    reactor.listenTCP(port, server.Site(r))
    reactor.run()
    return reactor

def deamonize(port=7080, logFile="/var/log/txmldb.log", pidFile="/var/tmp/txmldb.pid"):
    fp = open(logFile, 'a+b')
    log.startLogging(fp)
    try:
        pid = os.fork()
        if pid > 0:
            # Exit first parent
            sys.exit(0)
    except OSError, e:
        print >>sys.stderr, "fork #1 failed: %d (%s)" % (
            e.errno, e.strerror)
        sys.exit(1)

    # Decouple from parent environment
    os.chdir("/")
    os.setsid()
    os.umask(0)

    # Do second fork
    try:
        pid = os.fork()
        if pid > 0:
            # Exit from second parent
            write_pid(pid, pidFile)
            sys.exit(0)
    except OSError, e:
        print >>sys.stderr, "fork #2 failed: %d (%s)" % (
            e.errno, e.strerror)
        sys.exit(1)
    atexit.register(exit_function, pidFile)
    #signal.signal(signal.SIGTERM,handle_sigterm)

    # Start the daemon main loop
    start(port)

if __name__ == "__main__":
    import getopt
    args = sys.argv[1:]
    # Same defaults as deamonize(); passing None would override them
    # and break open(logFile) when -l is not given.
    port = 7080
    logFile = "/var/log/txmldb.log"
    pidFile = "/var/tmp/txmldb.pid"
    try:
        opts, args = getopt.getopt(args, "p:l:P:d")
    except getopt.GetoptError, e:
        print "%s. options are -l (log file location) -P (pid file location) -d (detach) and -p (port)" % e
        sys.exit(2)  # non-zero exit for a usage error

    detach = False
    for opt, value in opts:
        if opt == "-p":
            port = value
        elif opt == "-l":
            logFile = value
        elif opt == "-P":
            pidFile = value
        elif opt == "-d":
            detach = True
    if detach:
        deamonize(port, logFile, pidFile)
    else:
        start(port)
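
For anyone wondering why the empty `request.args` check in render_POST can tell the two kinds of POST apart, here is a rough client-side sketch. It is Python 3 standard library only and not part of the server above. As far as I understand it, Twisted fills `request.args` from the query string and from form-encoded bodies, so an XML-RPC payload (sent as `text/xml`) leaves it empty. The method name `get` and the `data` field mirror the server; the helper names `xmlrpc_request` and `form_request` are made up for illustration.

```python
import xmlrpc.client
from urllib.parse import urlencode


def xmlrpc_request(method, *params):
    """Return (content_type, body) for an XML-RPC call (hypothetical helper)."""
    body = xmlrpc.client.dumps(params, methodname=method)
    return "text/xml", body


def form_request(**fields):
    """Return (content_type, body) for an ordinary form POST (hypothetical helper)."""
    return "application/x-www-form-urlencoded", urlencode(fields)


# The XML-RPC call is marshalled into one XML document: no form fields at all.
rpc_type, rpc_body = xmlrpc_request("get", "very")

# The REST-style call carries its payload in a named form field instead.
form_type, form_body = form_request(data="<doc/>")

print(rpc_type, len(rpc_body))
print(form_type, form_body)
```

Either body can then be POSTed to the service with any HTTP client; only the second one would show up under `request.args['data']`.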


Thursday, January 8, 2009

Plone LDAP and 450% speed increase rendering page load time

"Where I be workin' now we's goin through trubles, perfomance troubles solved by ma jigga... me?"...

ok - so I can't rap. big deal, neither can you. point is that we have been investigating the curmudgeonliness in our plone 2.5.3 custom archetypes based product and came across this gem of a performance fart. our setup is weird, I confess, and I would be surprised if this actually applies to anyone, but nonetheless, thar she is.

we have many different user base DNs in active directory that share the same group DNs (scalability reasons), and those groups map to zope roles. so we make plenty o' calls to fetch group members in order to list them. turns out that this setup had something weird: our manager dn had permission to list the members of other portals' DNs but not to retrieve them. so if our user dn from one portal instance was "OU=AWESOME,DC=WE_ARE" and another was "OU=OK,DC=WE_ARE", they could share a groups DN of "OU=EDITORS_GROUP,DC=WE_ARE". The query to ldap for members of the editors group would then return every user it can list, not edit, from both portals since they share this grouping. Seems harmless enough, right?

WRONG

(that could not be dramatic enough).

so for each user that comes back from the groups listing, there is a call to get that user. if that user call fails (i.e. the permission fails) the user is just omitted from the list. so if those two portals each have 50 users in them, then there are 100 calls to get users from either portal, even though only 50% of them can succeed. oh, and each call is 1/10th of a second each (I <3>

     security.declareProtected(manage_users, 'getGroupedUsers')
     def getGroupedUsers(self, groups=None):
         """ Return all those users that are in a group """
         all_dns = {}
         users = []
         member_attrs = list(Set(GROUP_MEMBER_MAP.values()))

         if groups is None:
             groups = self.getGroups()

         for group_id, group_dn in groups:
             group_details = self.getGroupDetails(group_id)
             for key, vals in group_details:
                 if key in member_attrs or key == '':
                     # If the key is an empty string then the groups are
                     # stored inside the user folder itself.
                     for dn in vals:
                         all_dns[dn] = 1

         for dn in all_dns.keys():
             # Only attempt to retrieve the user if their DN
             # matches the Users Base DN
+            if not dn.count(self.users_base):
+                user = None
+            else:
                 try:
                     user = self.getUserByDN(dn)
                 except:
                     user = None
             if user is not None:
                 users.append(user.__of__(self))
         return tuple(users)
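
To put rough numbers on the patch above, here is a small standalone sketch (plain Python, no LDAP or Zope involved). It uses the figures from this post: two portals with 50 users each and about 0.1 seconds per lookup. The function names `under_base` and `plan_lookups` and the `CN=user…` entries are invented for the example, and the check is a suffix match on the DN, which is a slightly stricter variation of the `dn.count(self.users_base)` substring test in the patch.

```python
def under_base(dn, base_dn):
    """True when dn sits at or below base_dn (case-insensitive suffix match)."""
    dn = dn.strip().lower()
    base = base_dn.strip().lower()
    return dn == base or dn.endswith("," + base)


def plan_lookups(member_dns, base_dn, seconds_per_call=0.1):
    """Return (dns worth fetching, cost without the filter, cost with it)."""
    wanted = [dn for dn in member_dns if under_base(dn, base_dn)]
    before = len(member_dns) * seconds_per_call
    after = len(wanted) * seconds_per_call
    return wanted, before, after


# Made-up group membership: 50 users from each of the two portals.
members = (["CN=user%d,OU=AWESOME,DC=WE_ARE" % i for i in range(50)] +
           ["CN=user%d,OU=OK,DC=WE_ARE" % i for i in range(50)])

wanted, before, after = plan_lookups(members, "OU=AWESOME,DC=WE_ARE")
print(len(members), "lookups before,", len(wanted), "after")
print(round(before, 1), "seconds before,", round(after, 1), "after")
```

The saving grows with every extra portal that shares the group, since all of those lookups were doomed to fail anyway.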

Monday, October 20, 2008

If it's not asynchronous yet, make it

I am at war. It is a war against apis, query strikes, and unicode bombs. The enemy is Berkeley XMLDB, and I'm tired of losing. 

My Battle of Bunker Hill has been using this embedded database in a server environment. XMLDB and its corresponding Python bindings are neither optimized nor designed for server environments. Because it's embedded, it has some very negative server-side effects:
  • if the db segfaults, the server segfaults
  • since the python bindings are a swig wrapper, the c libs segfault easily
  • if the db hangs, the server hangs
  • no persistent connections are a performance hit
  • multiple dbs are a management headache
  • multi-threaded processes are not well supported out of the box, and servers are multi-threaded
  • multi-processes are not supported at all, and can cause mega corruption of the database
  • if anything goes wrong with anything, recovery has to be single process single thread, meaning that all connections have to be terminated to bring one db back online
All these things add up to down time, down time, down time.  Down time means I'm losing. It stresses you out to the land of disappearing productivity and forces mistakes.  And then there is the ever present fear that one day a recovery will corrupt it all. XMLDB, I'm coming to fix you. You better be ready.

Now I need a game plan.  First things first, tackle swig and any nasty exceptions that come out of it.  Minimize encoding errors, separate interfaces, and wrap, wrap, wrap. Protection is the name of the game, and to ease the blow of attacks I'm gonna need some good armor. With every call to swig there will be a try/except/finally there on the defense.
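
Here is roughly what I mean by armor, as a minimal sketch rather than the real thing. Everything in it is hypothetical: `guarded`, `GuardedCallError` and `fetch_document` are names invented for the example, and `fetch_document` only stands in for a call into the SWIG bindings. Note that a try/except can only catch Python exceptions; a real segfault in the C library still takes the process down, which is why the separate server matters too.

```python
import functools
import logging

log = logging.getLogger("xmldb.guard")


class GuardedCallError(Exception):
    """Raised in place of whatever the wrapped binding call threw."""


def guarded(cleanup=None):
    """Wrap a call in try/except/finally; cleanup always runs."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # Log the original error, then hand callers one known type.
                log.warning("%s failed: %r", func.__name__, exc)
                raise GuardedCallError("%s failed: %s" % (func.__name__, exc))
            finally:
                if cleanup is not None:
                    cleanup()
        return wrapper
    return decorate


released = []


@guarded(cleanup=lambda: released.append("container"))
def fetch_document(name):
    # Stand-in for a call into the SWIG bindings (hypothetical).
    if not isinstance(name, str):
        raise TypeError("document name must be text")
    return "<doc name='%s'/>" % name


print(fetch_document("a"))
```

The point of the single exception type is that callers only ever need one `except` clause, no matter what the bindings decide to throw that day.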

Java weenies seem to have a web-friendly XMLDB interface that's filled with interfaces of wrappers of SOAP and jsr***. It's a great idea, but it just won't work here. Adding Java to a Python architecture is just going to make environment variables clash, and maintenance will bring my men down. To counter, I've started playing with Twisted, as it's guaranteed single threaded and can manage its threads to aid recovery. This should help separate the server from the db, a strategy that should have been painfully obvious from the start. It should also maintain an XMLDB version of "persistent connections". Finally, a separate server means that if XMLDB goes down, it doesn't drag Plone into the mud with it. More men on the field means more threads could die, but it also means more threads to fight.

Living in a world of synchronous calls has been a dangerous game. Ideally I would like to update things on the spot, but anytime I depend on something being up, it is down. Storing data in a processing queue (such as a fancy xml feeds folder) is not premature optimization, just a smart way to never underestimate the power of the "retry" from transactions of web requests.  That's one thing java heads got right and I hope it will be the nuclear bomb in my currently weak arsenal.
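
A processing queue with retries can be sketched in a few lines. This is a toy in-memory version under my own assumptions, not the feeds-folder setup itself: `RetryQueue`, `flaky_store` and the `rec-1`/`rec-2` items are all invented for illustration, and a real queue would persist its items somewhere that survives a restart.

```python
from collections import deque


class RetryQueue:
    """Hold work items and retry the ones whose handler raised."""

    def __init__(self, handler, max_attempts=3):
        self.handler = handler
        self.max_attempts = max_attempts
        self.pending = deque()   # (item, attempts so far)
        self.failed = []         # items that used up all their attempts

    def put(self, item):
        self.pending.append((item, 0))

    def drain(self):
        """Try each currently queued item once; return the number that succeeded."""
        done = 0
        for _ in range(len(self.pending)):
            item, attempts = self.pending.popleft()
            try:
                self.handler(item)
                done += 1
            except Exception:
                attempts += 1
                if attempts >= self.max_attempts:
                    self.failed.append(item)
                else:
                    self.pending.append((item, attempts))  # try again next pass
        return done


seen = set()
stored = []


def flaky_store(item):
    # Pretend the database is down the first time each item arrives.
    if item not in seen:
        seen.add(item)
        raise IOError("db is down")
    stored.append(item)


queue = RetryQueue(flaky_store)
queue.put("rec-1")
queue.put("rec-2")
first_pass = queue.drain()
second_pass = queue.drain()
print(first_pass, second_pass, stored)
```

The web request only has to get the item into the queue; whether the database is up at that exact moment stops mattering.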

Onwards and upwards I guess: XMLDB, prepare to behave.

Friday, June 13, 2008

My Abstraction Optimization

Here at Anus Health, you kinda get used to doing things multiple, unnecessary times: redoing the work of minions, retrying bad data calls, or explaining complicated migration procedures to pointy haired bosses. The worst is writing code - write, rewrite with a better data model, reformat, add a tiny feature, clean - it never ends. With over 15,000 patient records filling over 13GB of data, today is the day where I need to start thinking about performance and rewriting better, more scalable code...

Abstract Out, Then Optimize In
I tend to build software according to the adage "if it doesn't make sense yet, just build another layer of abstraction". You know what? It works. Put time up front to make really nice abstractions and even the dumbest of colleagues are knockin' out working code like monkeys eat lice. But there is a price for abstraction in the end, and that tends to be performance. Every layer that you added is spinning the disk better than Punjabi MC and eating more RAM than your mom. The classic example of this is my good friend Plone. Amazing to quickly buildout an app, but I'll be damned if after 6 years of development experience, intense caching policies, and 16GB of RAM I still can't get it running much faster than a hobo on a train track.

In my last post, I talked a lot of shit about not worrying about scalability. If you're lucky, you get to the point where you actually have to start worrying because good caching and hardware is only gonna take you so far. The interesting thing about this is that after you spend all that time abstracting away from the nitty gritty, you now have to optimize back down through the layers to find efficiencies: build things up and then tear them down. This is exactly what Twitter is doing. Now that things are huge, start replacing those yummy developer abstractions with good old nasty C. Instead of using an ORM, write your reports direct from MySQL, then take it a notch further and read directly from disk. Optimization high five - whipish!

The thing that uber sucks about optimizing an abstracted system is the fact that these optimizations are not easy to write. Current interfaces are entangled in kludges deeper than a whale's vagina. The little pieces of backwards compatibility from your 1034th version of the photo album picture are just barely holding up as it is. What is a developer to do? Have the Java weenies been right all along?!? Stay tuned...

Next time on Sincerely, Management:
  • "My Responsibility Response" - Things get wacky at Anus Health when Liz learns that Plone can't do everything, especially reports. Guest starring "IT guy".
  • "My Feed Lust" - After a losing battle with IP latency and unreliable 3rd party API responses, Liz shows instantaneous response the door. Not suitable for children under 25.
  • "My New XML Fetish" - Liz sends JSON & GET packing after they discover her steamy affair with XML, starting a dangerous journey away from friendly 3rd party developer API land.


Monday, May 19, 2008

Opportunity Knocking: SWF ISO QA Cowboy

Note: This post is only reflective of my desire to work with someone who doesn't suck, and not of my employer. I have to say, though, that I have considerable pull so get at me if you want to rock with yours truly every day to make some kick ass software that gets used by real people at a well funded start-up.

Here I am, lonely, and looking for a mate... my one and only work mate! I know that you are out there, and you can roll with the punches like the best of them. I know that you are unfazed by "emergencies" and have corrupted your own database a time or two only to breathe and recover gracefully (at least the second time). I know you aren't afraid of restarting a service live if timed right and that you truly understand that logs are there for more than just wasting disk cycles. You know when to yell at me for my coding sins and when to give me slack for a 95% pass day.

Save me from myself and my kludgey digressions. Are you out there?

MISSION:

Help us get our test on by setting up testing protocols, enforcing test suites, and anything else that makes a sturdier code base and happier customers. We have regression/unit testing in place but are looking for someone to enforce that they get run often as well as wrangle together the rest of the test picture pieces including pre and post deploy testing, bugs, new features, and performance.

A strong head and leadership qualities are a must: we are growing fast so don't be surprised if you are expected to voice opinions and yell about the right way to do things now and then. You will be given any tools you need to maintain system integrity and enforce test policy. This is the perfect position for someone who is ready to step it up a notch and help lead a small company to success-ville.

For those that are interested and able, this position can easily and quickly turn into a developer position.

SYSTEM:

We work primarily with Python and the Plone framework but dabble in many arenas such as COM, XMLRPC, XMLDB, Adobe Livecycle Workflow and Designer, etc... depending on what needs to be glued at the time. You will learn and understand how all these pieces fit together into one beautiful system and are expected to care for each piece like it was written by you.

REQUIREMENTS:
  • BS in Computer Science or must possess portfolio proving proficiency otherwise
  • 1-2 years experience testing or building web applications, preferably with dynamic languages and kudos if that language is Python
  • Ability and excitement to learn new languages/technologies in a web based environment
  • Recent grads will be considered if internship experience is close - we will gladly feed and grow you into the position
  • Must know when to ask questions and when to ask Google
  • Self-sufficient and flexible
  • Excellent troubleshooting, analytical, documentation and communication skills
  • Must be dependable, have a positive attitude, and be a team player
Knowledge of any of the following is awesome:
  • Experience working with and/or developing in Plone
  • Familiarity with Test Driven Development and Agile Methodologies
  • History working with XML databases
  • Experience with Adobe PDF and/or Livecycle Designer
COMPENSATION:

Competitive salary based on experience with all the startup stock yummies.

Monday, May 12, 2008

I Poop on Designing for Scalability

Dear John -

Is it OK that I answer your question with a blog post? It seems very web 2.0, and my whole goal in life is to be more web 2.0 than any other web 2.0 weenie out there. Plus, maybe there is a reader out there, and maybe they have a better response.

Unfortunately, I've done a bunch of scaling in my short time on this planet (especially working with the notoriously performance-angsty Plone) and I have to side with the camp that says you should rarely spend time scaling architecture in the beginning. Here are some yummy reasons why you should wait to pop your scaling cherry:
  • your web app may never take off and you have wasted time scaling when you could have spent time writing a killer feature
  • if it does take off, pay someone else to worry about it. If you read anything about scaling twitter, read about how they "fired" Blaine Cook and hired a bunch of scalability experts instead. Zing!
  • caching goes a loooong way. I like squid myself - it's a sexy beast if a little hard to configure - and I hear varnish is pretty top notch too. Hell, httpd has a nice accelerator built in too if you need a quick fix. Let's take a moment and remember WHY caching works so well: it serves up content that an app server like RoR or Django could give two craps about such as javascript, css and images. Have you ever looked at how much time your browser spends loading this stuff? It's a lot, and cache servers know when to expire, refresh, reload http headers, etc. Your cache server may be your users' browsers' best friend.
  • you can and should throw more hardware at the problem first. it's cheap enough these days and it will buy you the time you need to develop a real scaling solution
  • Jared Spool gave a great talk at SXSW about actual performance and perceived performance. the performance of your pages (i.e. load time) is almost always secondary to perceived performance, the time it takes your user to complete a task. For example, amazon.com takes on average an unthinkable time to load pages but it is commonly perceived as the fastest site out there because of its 1 click functionality to get things done.
  • IP latency is a factor. If you are integrating with any external site (who isn't these days?), chances are that you will see more lag time from waiting for responses for that than any kludgey code you can write
  • it will rarely clog where you think it will clog so take that pipe dream to the dump and let the bums sleep in it
That being said, here are some things you can think about now if you are worried:
  • don't write stupid code. if you notice yourself writing 4 nested loops, think - hey, that could be nasty later on.
  • write good db queries and for the love of god don't use your code to filter out the results. in the same respect, don't access database variables more than needed - make a higher level variable and reference that forever. watch out for this in ORMs - they can be very inefficient for scaling even though they make code writing fast. but read bullet 1 above first.
  • know your language/tools. if you are using RoR or something that is known to be slow, anticipate it being a problem later and rewriting key parts in a language like C. oh, your language doesn't have C wrappers? that could be a problem ...
  • think about concurrency while you are writing KNOWN expensive operations. Think: does this action rely on another action before going to the next step? For example, we have a process which needs to create an appointment in Exchange and then attach 2 files to the resulting appointment. Instead of writing a sequential piece of code like so: (create appointment (2s) > attach file 1 (60s) > attach file 2 (120s) > report results (2s)), totaling 184 seconds, think of doing a threaded version: (create appt (2s) > report results (2s) > kick off concurrent attach threads (120s)), totaling 124 seconds. Web peeps don't think about concurrency enough. Be different.
  • put a round robin queue that diverts from dead parents so you can hot patch stuff live if your code requires restart for changes to take effect (i.e. compiled code). this has saved me a hundred times over and eliminates downtime while still allowing you to respond to heinous bugs.
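
The appointment example from the concurrency bullet can be sketched like this. It is a toy under stated assumptions: the step durations are the ones quoted above, shrunk so that one "second" is a millisecond, `time.sleep` stands in for the real Exchange calls, and the names `run`, `book_appointment`, `sequential_cost` and `concurrent_cost` are invented for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SCALE = 0.001  # one "second" from the example above is 1 ms here

# Durations quoted in the bullet, in seconds.
STEPS = {"create": 2, "attach_1": 60, "attach_2": 120, "report": 2}


def run(step):
    """Pretend to do the work by sleeping for the scaled duration."""
    time.sleep(STEPS[step] * SCALE)
    return step


def sequential_cost():
    # create > attach 1 > attach 2 > report, one after another
    return sum(STEPS.values())


def concurrent_cost():
    # create > report > both attachments at once: only the slower one counts
    return STEPS["create"] + STEPS["report"] + max(STEPS["attach_1"], STEPS["attach_2"])


def book_appointment():
    done = [run("create"), run("report")]
    with ThreadPoolExecutor(max_workers=2) as pool:
        # The two attachments only depend on the appointment existing,
        # not on each other, so they can run side by side.
        done += list(pool.map(run, ["attach_1", "attach_2"]))
    return done


print(sequential_cost(), "vs", concurrent_cost())
print(book_appointment())
```

The user also hears back after about 4 "seconds" instead of 184, because reporting no longer waits for the attachments.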
Last but not least, don't forget Knuth's famous words of wisdom: "Premature optimization is the root of all evil!!!"

This data streams crap seems like it was written by an architecture astronaut. Yeah right! Who really writes code like that? (hint: no one).

Hth,

Liz

>>>
...
So, although I truly want to know how you're doing, I also have a question for ya. It stems from thinking long and hard how to go about building the architecture for my latest startup. I've read interesting posts such as Two data streams for a happy website (http://gojko.net/2008/03/03/two-data-streams-for-a-happy-website/) and Scalability (http://romeda.org/blog/2008/05/scalability.html) from Twitter's Blaine Cook, but it's hard to figure out where to go from here.

Basically... how much time do I spend planning how the architecture can scale before I know the metrics. I've read that I shouldn't worry about it until I'm there, while others say to keep 2 separate data streams (one that requires users to be logged in and one that doesn't). I expect to have to tweak the solution if we do reach limits, but the more I know before I start writing code the better.

Have you (or friends you know) hit this problem? What advice could you provide?

Thanks in advance!

Best,
John
>>>