The Cult of G.E.N.I.U.S. Pt. 3
A Speculative Exploration of the Possibility of a Full-Stack AI Architect
Part 3: An A.I. Firm ‘Strategy’
In this Post[1]:
Doing the Work
Developing an Autonomous A.I. Workflow
Building a Firm-wide, A.I. Ecosystem
Doing the Work
‘Firm strategy’ has varied interpretations, and I know plenty of architects whose strategy only has 2 steps: 1) Get the Work and 2) Do the Work. We’re going to set aside wider discussions of strategy involving market segmentation, branding, etc., although I will be discussing them in a future article. For now, we’re going to focus on Point No. 2: how to do the work, and specifically how to organize a workflow at Lang, Shelley & Associates.
‘Doing the work’ is so ordinary to most architects that they would probably not think of it as a ‘strategy.’ It’s just what you do. Get the design goals from the client, go see the site, document the site, confirm the budget, lay out some schematics, and so forth. My research with GPT indicates that it knows these steps, in general terms. However, I wouldn’t want a ‘one-size-fits-all’ approach for my clients, because my clients are special and my approach is so unique. The idea would be to mimic not just a human architect, but me specifically, and that requires that G.E.N.I.U.S. develop a workflow customized to both my professional approach and the instructions previously given by the human client.
Developing an Autonomous A.I. Workflow
To develop a workflow, we’re going to turn to the relatively new field of ‘autonomous’ AI, which is producing tools that assemble individual AIs into intelligent sequences.[2] Think of it like an AI project manager: it selects and executes the tools necessary to accomplish a given objective, and the ‘tools’ are often other AIs. To be clear, this isn’t Artificial General Intelligence (AGI). These are programs that connect other programs into sequential chains of tasks as necessary for a given objective, and in most cases they use LLMs to communicate.
Microsoft’s Jarvis utilizes ChatGPT and the AI model hub at Hugging Face to create unique solutions to any specified request.
With a little help, it could take a user’s request in natural language through ChatGPT (e.g. ‘design a façade system where all materials can be sourced within 500 miles of the project’s location’), and then sort through the Hugging Face Model Hub’s library of AI models to find those applicable to the request. You can think of each model as a ‘MiniAI’ designed to execute a particular task. There are ‘MiniAIs’ for Object Detection, Text Classification, Automatic Speech Recognition, Image Classification, Visual Question Answering, etc. In full disclosure, I haven’t done a survey to see which might be useful to architectural practice, but there are roughly 169,000 such models on Hugging Face as of publication.
Jarvis then selects the appropriate models, sequences them as necessary to complete the request, and generates the result.
This allows individual AI models to be created by domain experts, stored in a central repository, and recruited whenever your AI has a chore that requires them.
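To make that concrete, here is a minimal Python sketch of the ‘recruit a MiniAI from the Hub’ idea, assuming the huggingface_hub and transformers packages are installed. The task name, the search settings, and the site photo are my own illustrative choices, not anything taken from Jarvis itself.

```python
# A minimal sketch: search the Hugging Face Hub for a model that can handle a
# given task, then load it as a ready-to-use "MiniAI." Illustrative only.
from huggingface_hub import HfApi
from transformers import pipeline

api = HfApi()

# Step 1: the orchestrator decides it needs an image-classification MiniAI
# (in Jarvis, ChatGPT makes this call from the user's natural-language request).
task = "image-classification"

# Step 2: find candidate models for that task on the Hub, ranked by downloads.
candidates = api.list_models(filter=task, sort="downloads", direction=-1, limit=3)
model_id = next(iter(candidates)).id

# Step 3: load the chosen model and put it to work.
classifier = pipeline(task, model=model_id)
print(classifier("site_photo.jpg"))  # hypothetical photo of the project site
```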
BabyAGI and AutoGPT take this a step further, though they do largely the same thing. When you give them a particular objective, they lay out the tasks they regard as necessary to complete that objective, sequence them, and execute them. BabyAGI works by maintaining a task list on a continuous loop. The loop has three steps:
1) Take the first item from the list of tasks and send it to an execution agent, which uses OpenAI to finish the task based on the information given. (FYI, the first task can be ‘generate a task list for the given objective.’) It then improves the result and saves it in a storage location (Pinecone, in this case).
2) Based on the original objective and the outcome of the previous task, use a task-creation agent to make any new tasks it deems necessary to accomplish the original objective.
3) Using a prioritization agent, rearrange the existing list of tasks (minus the already completed task, but including any newly created tasks) to move closer to the objective.
It then cycles through again, for as long as necessary to accomplish the original objective. The result is ‘autonomy’ in the sense that BabyAGI can create and execute tasks it wasn’t asked to do, so long as they’re in alignment with the original objective specified by a human. Such a platform could be used to complete the higher-level, multi-task functions of practice. One could tell an AI ‘redesign the building to account for a 10% higher occupant load’ without telling it all the steps necessary to accomplish that objective.
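For readers who like to see the gears, here is a compressed, illustrative sketch of that three-step loop in Python. It uses the OpenAI chat API for all three agents and a plain Python list in place of Pinecone; the prompts, the five-cycle cap, and the function names are my own stand-ins, not BabyAGI’s actual code.

```python
# A compressed, illustrative sketch of a BabyAGI-style loop: execute, create,
# prioritize. A plain list stands in for the Pinecone vector store.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    """Send a single prompt to the model and return its text reply."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model will do for this sketch
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

objective = "Redesign the building to account for a 10% higher occupant load."
tasks = ["Generate a task list for the given objective."]
results = []  # stands in for the Pinecone storage

for _ in range(5):  # cap the loop for the example; BabyAGI runs until done
    if not tasks:
        break
    task = tasks.pop(0)

    # 1) Execution agent: complete the first task using the objective and prior results.
    result = ask(f"Objective: {objective}\nCompleted so far: {results}\nYour task: {task}")
    results.append({"task": task, "result": result})

    # 2) Task-creation agent: propose any new tasks the last result makes necessary.
    new_tasks = ask(
        f"Objective: {objective}\nLast result: {result}\n"
        f"Existing tasks: {tasks}\nList any new tasks needed, one per line."
    ).splitlines()
    tasks.extend(t.strip("- ").strip() for t in new_tasks if t.strip())

    # 3) Prioritization agent: re-order the remaining tasks toward the objective.
    reordered = ask(
        f"Objective: {objective}\nReorder these tasks by priority, one per line:\n"
        + "\n".join(tasks)
    ).splitlines()
    tasks = [t.strip("- ").strip() for t in reordered if t.strip()]

print(results)
```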
While I haven’t tried either platform for this application (and to my knowledge, no one has tried them on architectural applications), these sorts of ‘autonomous’ AIs are going to revolutionize architectural practice. If an architectural design process consists of 10,000 steps, executed in meaningful order, any AI architect would need the capability to sequence steps in an intelligent way, without being told the exact order. It would require four specific task capabilities:
Generating new tasks to pursue an objective
Sequencing tasks
Completing tasks
Re-ordering tasks based on new data.
Which is exactly what BabyAGI does. Moreover, it’s possible to stack tasks on top of tasks. So you could have BabyAGIs recruiting other BabyAGIs and cooperating on projects.
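To illustrate the stacking idea, here is a toy sketch of an agent that hands any task it judges too broad to a sub-agent, which then treats that task as its own objective. The planner, the ‘too broad’ test, and the executor are trivial stand-ins; in a real system each would be an LLM or MiniAI call like the ones sketched above.

```python
# An illustrative sketch of "agents recruiting agents": oversized tasks are
# delegated to nested sub-agents rather than executed directly.

def plan_tasks(objective: str) -> list[str]:
    # Stand-in planner: in a real system this would be an LLM-generated task list.
    return [f"survey scope of: {objective}", f"produce deliverable for: {objective}"]

def is_too_broad(task: str) -> bool:
    # Stand-in heuristic: delegate anything still phrased as a scoping exercise.
    return task.startswith("survey scope of:")

def execute_task(task: str) -> str:
    # Stand-in executor: in a real system this calls an LLM or a recruited MiniAI.
    return f"done: {task}"

def run_agent(objective: str, depth: int = 0, max_depth: int = 2) -> list[str]:
    """Pursue an objective, delegating oversized tasks to nested sub-agents."""
    results = []
    for task in plan_tasks(objective):
        if depth < max_depth and is_too_broad(task):
            # Recruit a sub-agent: the task becomes the sub-agent's own objective.
            results.extend(run_agent(task, depth + 1, max_depth))
        else:
            results.append(execute_task(task))
    return results

print(run_agent("Run a price check on all possible substitutions"))
```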
At the time of publication, several programmers have already tried to build this kind of intelligence into more UX-friendly forms, mostly running within your existing browser (e.g. Godmode.space). Most are still buggy and get stuck easily. But the technology only debuted two weeks ago, and I think we can expect it to evolve quickly.
Will the future be that simple? And how would that apply to architectural practice, specifically? Are you really just going to ask a program like AutoGPT or BabyAGI to ‘design a building’ and then watch it execute everything else? Maybe, eventually, but there are a lot of steps between here and there. However, the ‘steps’ also benefit from the same technological advances, and we should expect them to unfold exponentially.

It’s hard to say which of the dimensions of NLGAI are most revolutionary, but one I would put at the top of the list is this: it democratizes coding. Since computers were invented, they have come with usability thresholds that kept people from maximally exploiting their utility. At one point, you needed a giant room and specialized technicians just to operate one. The PC revolution, the smartphone revolution, and other minor revolutions all represent thresholds that, when crossed, have allowed more and more people to leverage these remarkable machines.
NLGAI represents the crossing of another (and perhaps the final) threshold: coding knowledge. Computer ecosystems (e.g. the App Store, or Silicon Valley itself) are organized such that anyone with coding knowledge can contribute to the overall base of ‘what computers can do.’ If you want an app to track your dog’s weight-loss journey, it’s entirely possible that someone in the global software development community had the same idea, coded an app for it, and made it available on the App Store. But if they didn’t, you were faced with A) coding it yourself, B) hiring a programmer to make it for you, or C) just forcing your dog onto the scale every morning and reminding him about the importance of healthy dietary choices.
It hasn’t felt particularly limiting, because the global software development community is massive, and it produces more apps in a day than you could try in a year. But I’m sure everyone has had those moments of thinking ‘Ugh, why can’t my computer just do X?!’ Put simply, it can’t, because no one has programmed it to do so. No programmer thought to program it to do X, and you don’t know how to do that yourself.
NLGAI, because it can write code, opens the door to everyone being a programmer. It’s like having a software engineer at your personal beck and call. It will also generate a revolution in ‘single-serve software’: software that you ‘write’ to accomplish a task that you A) don’t know how to do, B) are too busy to do, or C) just don’t want to do. Of course, you don’t actually write it; you just tell G.E.N.I.U.S. to do it, and he writes the code necessary to execute that task.
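As a hypothetical example, the sketch below asks the model to write a throwaway script for a one-off chore. The chore, the prompt wording, and the file name are my own invention; the point is the pattern, including the human review step before anything actually runs.

```python
# An illustrative sketch of "single-serve software": ask the model to write a
# throwaway script for a one-off chore, review it yourself, then run it once.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

chore = (
    "Write a short Python script that renames every PDF in the current folder "
    "to the pattern 'LSA_<original name>_<today's date>.pdf'."
)

reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": chore}],
)
script = reply.choices[0].message.content

# Print the generated code for a human check before anything is executed.
print(script)
# If it looks right, save it, run it once, and throw it away:
# with open("single_serve.py", "w") as f: f.write(script)
```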
Building a Firm-wide, A.I. Ecosystem
It seems daunting to get a platform like G.E.N.I.U.S. or BabyAGI to execute all 10,000 steps in a design process, until we consider it as an ecosystem. I’ve prepared an example here, where Architects Adele, Beyonce, and Cher collaborate to build out the AI Library at LSA, to give you an idea of how the right innovation ecosystem can support AI integration at a given firm.
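Separate from that worked example, it may help to picture what ‘the AI Library at LSA’ could look like under the hood: something like a CAD block library, but with named, tagged entries that an AI project manager can look up by task. The sketch below is purely illustrative; the entries, tags, and loader functions are hypothetical placeholders for whatever MiniAIs, scripts, and assets a firm actually builds or recruits.

```python
# A purely illustrative sketch of a firm-wide "AI library," organized the way a
# CAD block library is: named, tagged entries that an orchestrator can look up.
from typing import Callable

AI_LIBRARY: dict[str, dict] = {}

def register(name: str, tags: list[str], loader: Callable):
    """Add a MiniAI, script, or asset to the firm-wide library."""
    AI_LIBRARY[name] = {"tags": tags, "loader": loader}

def recruit(tag: str) -> list[str]:
    """Return the names of every library entry matching a task tag."""
    return [name for name, entry in AI_LIBRARY.items() if tag in entry["tags"]]

# Hypothetical entries added by the firm's architects over time:
register("substitution_price_check", ["pricing", "spec"], loader=lambda: print("load pricing MiniAI"))
register("window_partition_script", ["facade", "layout"], loader=lambda: print("load layout script"))
register("firm_titleblock_blocks", ["drafting", "cad"], loader=lambda: print("load CAD blocks"))

# The AI project manager asks the library for anything tagged 'pricing':
for name in recruit("pricing"):
    AI_LIBRARY[name]["loader"]()
```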
Interlude
Welcome back. I apologize for the lengthy example, but it substantiates an important point: the manner in which architects, going forward, will create and deploy armies of MiniAIs to execute different tasks within an overall design process isn’t all that strange. It’s the same process by which architects have been building script libraries and CAD block libraries for decades. Instead of generating assets with which you can perform a task (e.g. making a CAD block for your firm’s CAD block library) or generating scripts (really, just tasks) with which you can achieve a goal (e.g. writing a script to lay out window partitions based on a fixed structural grid), you’ll be generating intelligences that achieve a goal by creating, recruiting, and deploying both scripts and assets. It is an extension of the mixing and matching that architects have always been doing, at least since the dawn of CAD.

In making their own jobs easier, the human architects at LSA make G.E.N.I.U.S.’s job easier too. This will likely spark an employee revolt, as Adele, Beyonce, and Cher catch on to the idea that they are building their own replacement. But that has been part of the point all along. Architects need to find something else to do. Tasks like “Run a Price Check on all Possible Substitutions” are easily automated through autonomous AI. The process of human replacement will start there. But as the simple tasks are automated, they set the stage for the slightly more complex tasks to be automated. As those are automated, they set the stage for the medium-difficulty tasks to be automated. And so forth. AI will build on its own progress, and gradually displace most of what an architect does throughout the day.
That’s depressing! Let’s move on to something fun: Schematic Design! (On Friday.) And then on Monday we’ll do DDs and CDs. Subscribe below to be instantly notified, and here’s a final check-in before we go:
Check-in: By now, G.E.N.I.U.S. should have a robust program brief based on one or more interviews with the potential client. The interviews have been conducted using my digital self and my experience, transmuted through a platform like ChatGPT or another. G.E.N.I.U.S. now also has an AI ‘project manager’ who’s going to execute the remaining processes. That AI project manager has access to a growing library of MiniAIs, scripts, blocks, etc., with which it’s going to pursue the building design.
[1] Reading Notes:
“GPT” refers to both ChatGPT and GPT-4, since both were used in the construction of the model. ChatGPT and GPT-4 have different capabilities, and wherever the use of one was specific, it is noted as such; otherwise, “GPT” could refer to either.
References to ‘the article’ refer to “In the Future, Everyone’s an Architect (and why that’s a good thing)” parts 1 and 2, available on Design Intelligence. If you haven’t read them, you should probably begin by doing so, as they give important context to what follows.
References to the ‘technical addendum’ refer to the previous technical addendum to “In the Future, Everyone’s an Architect (and why that’s a good thing),” which was written to provide a detailed record of how the original AI architect/client exchange was developed.
References to ‘the video’ refer to the video included in “In the Future, Everyone’s an Architect (and why that’s a good thing),” which is also available on my YouTube channel.
I developed the initial language model on March 12th, 2023, and developed the video in the days after. I submitted the article for publication on April 7th, 2023, and it was published on April 26th, 2023. The speculations are current as of the date of publication; however, I expect them to be rapidly outmoded. I’ll be updating my Substack periodically to track new developments at the intersection of AI and architecture, but will leave these articles in their original form.
[2] It’s ‘new’ as in the last couple of weeks.