RAG AI: ‘Do it yourself,’ says NYC data scientist

Source is ComputerWeekly.com

Organisations should build their own generative artificial intelligence (GenAI) systems based on retrieval augmented generation (RAG), using open source products such as DeepSeek and Llama.

This is according to Alaa Moussawi, chief data scientist at New York City Council, who recently spoke at the Leap 2025 tech event in Saudi Arabia.

The event, held near the Saudi capital Riyadh, majored on AI and came as the desert kingdom announced $15bn of planned investment in AI.

But, says Moussawi, there’s nothing to stop any organisation testing and deploying AI with very little outlay at all, as he described the council’s first such project way back in 2018. 

New York City Council is the legislative branch of the New York City government, mainly responsible for passing laws and the budget in the city. The council has 51 elected officials plus attorneys and policy analysts. 

What Moussawi’s team set out to do was make the legislative process more fact-based and evidence-driven and make the everyday work of attorneys, policy analysts and elected officials smoother. 

First AI app built in 2018

To that end, Moussawi’s team built its first AI-like app – a duplicate checker for legislation – for production use at the council in 2018. 

Whenever a council member has an idea for legislation, it’s put into the database and timestamped so it can be checked for originality and credited to the elected official who made that law come to fruition. 

There are tens of thousands of ideas in the system and a key step in the legislative process is to check whether an idea has been proposed before.

“If it was, then the idea must be credited to that official,” says Moussawi. “It is a very contentious thing. We’ve had errors happen in the past where a bill got to the point of being voted on and finally another council member recalled they had proposed the idea, but the person who had done the duplicate check manually had somehow missed it.”

By today’s standards, it’s a rudimentary model, says Moussawi. It uses Google’s Word2Vec, which was released in 2013 and captures information about the meaning of words based on those around it. 

“It’s somewhat slow,” says Moussawi. “But the important thing is that while it might take a bit of time – five or 10 seconds to return similarity rankings – it’s much faster than a human and it makes their jobs much easier.” 

Vector embedding 

The key technology behind the duplicate checker is vector embedding, which is effectively a list of numbers – the vectors – that represent the position of a word in a high-dimensional vector space. 

“That could often consist of over a thousand dimensions,” says Moussawi. “A vector embedding is really just a list of numbers.” 

Moussawi demonstrated the idea by simplifying things down to two dimensions. Using playing cards as an example, you can take the vector for “royalty” and the vector for “woman”, and adding them together should give you the vector for “queen”. 

“Strong vector embeddings can derive these relationships from the data,” says Moussawi. “Similarly, if you added the vectors for ‘royalty’ and ‘man’, you can expect to get the vector for ‘king’.”
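The arithmetic Moussawi describes can be sketched with toy two-dimensional vectors. The values below are illustrative only, not taken from any real embedding model; one dimension loosely stands for “royalty-ness”, the other for gender.

```python
import numpy as np

# Toy 2-D "embeddings" (hypothetical values, not from a real model):
# dimension 0 ~ royalty-ness, dimension 1 ~ gender.
vecs = {
    "royalty": np.array([1.0, 0.0]),
    "woman":   np.array([0.0, 1.0]),
    "man":     np.array([0.0, -1.0]),
    "queen":   np.array([1.0, 1.0]),
    "king":    np.array([1.0, -1.0]),
}

def nearest(v, vocab):
    """Return the vocabulary word whose vector is closest to v."""
    return min(vocab, key=lambda w: np.linalg.norm(vocab[w] - v))

# "royalty" + "woman" lands nearest to "queen"; "royalty" + "man" to "king".
print(nearest(vecs["royalty"] + vecs["woman"], vecs))  # queen
print(nearest(vecs["royalty"] + vecs["man"], vecs))    # king
```

Real embeddings learn such directions from data rather than having them hand-assigned, and they live in hundreds or thousands of dimensions rather than two.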

That’s essentially the technology in the council’s duplicate checker. It trains itself by using the full set of texts to generate its vector embeddings. 

“Then it sums over all the word embeddings to create an idea vector,” he says. “We can measure the distance between this idea for a law and another idea for a law. You could measure it with your ruler if you were working with two-dimensional space, or you apply the Pythagorean theorem extended to a higher dimensional space, which is fairly straightforward. And that’s all there is to it – the measure of distance between two ideas.”
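The two steps he describes, summing word embeddings into an idea vector and measuring Euclidean distance between ideas, can be sketched as follows. The embedding table here is hypothetical; the council's system derives its vectors with Word2Vec.

```python
import numpy as np

def idea_vector(text, embeddings):
    """Sum the embeddings of every known word in the text."""
    words = [w for w in text.lower().split() if w in embeddings]
    return np.sum([embeddings[w] for w in words], axis=0)

def distance(a, b):
    """Euclidean distance: the Pythagorean theorem extended to n dimensions."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Tiny hypothetical embedding table (real models use hundreds of dimensions).
emb = {
    "ban":     np.array([1.0, 0.2, 0.0]),
    "plastic": np.array([0.1, 1.0, 0.3]),
    "bags":    np.array([0.0, 0.9, 0.4]),
    "straws":  np.array([0.1, 0.8, 0.5]),
}

a = idea_vector("ban plastic bags", emb)
b = idea_vector("ban plastic straws", emb)
print(distance(a, b))  # a small distance suggests near-duplicate ideas
```

A duplicate check then reduces to ranking all stored idea vectors by their distance to the new idea and flagging the closest ones for human review.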

Moussawi is a strong advocate that organisations should get their hands dirty with generative AI (GenAI). He’s a software engineering PhD and a close student of developments – through the various iterations of neural networks – but is keen to stress their limitations.

“AI text models, including the state-of-the-art models we use today, are about simply predicting the next best word in a sequence of words and repeating the process,” he says. “So, for example, if you ask a large language model [LLM], ‘Why did the chicken cross the road?’, it’s going to pump it into the model and predict the next word, ‘the’, and the next one, ‘chicken’ and so on.

“That’s really all it’s doing, and this should somewhat make you understand why LLMs are actually not intelligent or don’t have true thought the way we do.

“By contrast, I’m explaining a concept to you and I’m trying to relay that idea and I’m finding the words to express that idea. A large language model has no idea what word is going to come next in the sequence. It’s not thinking about a concept.”
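The next-word loop he describes can be caricatured in a few lines. Here a hard-coded lookup table stands in for the neural network; a real LLM instead produces a probability distribution over its whole vocabulary at each step, but the generate-append-repeat structure is the same.

```python
# Toy sketch of autoregressive generation: repeatedly predict the next
# word and append it. A hypothetical hard-coded table replaces the model.
NEXT = {
    "why": "did", "did": "the", "the": "chicken",
    "chicken": "cross", "cross": "the",
}

def generate(prompt, steps):
    words = prompt.lower().split()
    for _ in range(steps):
        nxt = NEXT.get(words[-1])  # "predict" the next word
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)

print(generate("why", 5))  # why did the chicken cross the
```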

According to Moussawi, the big breakthrough in the scientific community in 2020 was the realisation that compute, datasets and parameters could keep scaling: throwing ever more compute power at the models kept delivering better performance.   

He stresses that organisations should bear in mind that the science behind the algorithms isn’t secret knowledge: “We have all these open source models like DeepSeek and Llama. But the important takeaway is that the fundamental architecture of the technology did not really change very much; we just made it more efficient. These LLMs didn’t magically learn to think all of a sudden. We just made them more efficient.”

Why you should DIY AI

Coming up to date, Moussawi says New York City Council has banned the use of third-party LLMs in the workplace over security concerns. The organisation has therefore opted for open source models, which avoid the risks that come with cloud-based subscriptions or third-party APIs. 

“With the release of the first Llama models, we started tinkering on our local cluster, and you should too. There are C++ implementations that can be run on your laptop. You can do some surprisingly good inference, and it’s great for developing a proof-of-concept, which is what we did at the council.

“The first thing to do is to index documents into some vector database. This is all work you just do once on the back end to set up your system, so that’s ready to be queried based on the vector database that you’ve built. 

“Next, you need to set up a pipeline to retrieve the documents relevant to a given query. The idea is that you ask it a prompt and you’d run that vector against your vector database – legal memos you’ve stored in your vector database or plain language summaries or other legal documents that you’ve copied from wherever, depending on your domain. 

“This process is known as retrieval augmented generation or RAG and it’s a great way to provide your model with scope regarding what its output should be limited to. This significantly reduces hallucinations – and, since it’s pulling the documents that it’s responding with from the vector database, it can cite sources.”
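The pipeline Moussawi outlines, index once, then retrieve per query, can be sketched end to end. The `embed` function below is a crude bag-of-words stand-in for a real embedding model, and the document texts are invented for illustration; in production you would plug in a proper embedding model and a real vector database.

```python
import numpy as np
import zlib

DIM = 32  # toy embedding dimension

def embed(text):
    """Stand-in embedder: hash words into a normalised bag-of-words vector."""
    v = np.zeros(DIM)
    for w in text.lower().split():
        v[zlib.crc32(w.encode()) % DIM] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

# 1. One-time back-end step: index documents into a "vector database".
docs = [
    "Legal memo: sidewalk vendor licensing rules",
    "Plain-language summary: plastic bag ban legislation",
    "Legal memo: noise ordinance enforcement hours",
]
index = [(d, embed(d)) for d in docs]

def retrieve(query, k=1):
    """2. Retrieval: rank stored documents by cosine similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: -float(pair[1] @ q))
    return [d for d, _ in ranked[:k]]

# 3. The retrieved text is pasted into the LLM prompt as context, which
#    constrains the answer and lets the model cite its sources.
print(retrieve("plastic bag ban")[0])
```

The retrieved documents, not the model's parametric memory, become the basis of the answer, which is what reduces hallucinations and makes source citation possible.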

These, says Moussawi, provide guardrails for your model and give the end user a way to ensure the output is legitimate because sources are being cited.

And that’s exactly what Moussawi’s team did, and his message, while he awaits delivery of the council data science team’s first GPUs, is: “What are you waiting for?” 
