India’s AI journey has reached a significant milestone. Neocambrian AI, a startup founded by entrepreneur Abhinav Kukreja, has officially launched with a clear and ambitious vision: to create large-scale human action datasets for robotics and embodied AI systems. The company is positioning itself at the intersection of Physical AI development and data infrastructure, an area that is quickly becoming one of the most competitive in global technology.
The launch sends a signal: the quest to build Physical AI models is no longer only the West’s story. India, with its large population, varied work environments and growing pool of technical talent, has started to make its presence felt in the discussion.
Physical AI: What It Is and Why It Needs New Data
Physical AI is AI that operates in and interacts with the real, physical world. It enables robots, autonomous systems, and embodied machines to understand physical environments, interact with objects, move through spaces, and adapt to unpredictable real-world situations.
The data needed to train these systems is completely different from the data used to train language models. Text-based AI models were trained on trillions of words extracted from the internet, a corpus that is huge, diverse and widely accessible. There is no equivalent for Physical AI. Training a robot to perform a task requires detailed records of human physical actions, such as picking up objects, opening doors, pouring liquids, walking through kitchens, navigating corridors, and performing thousands of other common manual actions.
Abhinav Kukreja has described this gap directly. At the start of the LLM revolution, massive text corpora were already publicly available. Robotics, he says, lacks internet-scale datasets of that kind. Neocambrian AI aims to fill that void.
India’s First Robotics Data Factory

The company’s primary product is a high-fidelity database of human action at pre-training scale. Neocambrian AI gathers this data with a complex hardware setup. The toolkit includes egocentric video capture systems that record first-person human activity, motion-tracking hardware that maps human body movements in 3D, stereo capture rigs that record environments with accurate depth, and upgraded UMI (Universal Manipulation Interface) devices designed specifically for robotics training workflows.
Neocambrian AI says it has built India’s first robotics data factory, a dedicated facility designed to scale the creation of structured and annotated human action data. The facility is not a byproduct of the company’s positioning; it is the core of its value proposition.
In addition, the company has announced plans to share thousands of hours of collected data for free with researchers in India who are developing vision-language-action (VLA) models and world models. This open-access approach directly benefits Indian AI researchers, who compete with far better-funded Western AI labs.
The Founder: Abhinav Kukreja’s Background
Abhinav Kukreja is no stranger to starting technology ventures. Before founding Neocambrian AI, he founded DataVantage, a platform providing AI-powered marketing workflows for medium and large technology businesses. His background in enterprise AI and data pipelines gives him a practical understanding of how to scale structured data creation, which applies directly to the work Neocambrian AI is attempting in robotics.
His move to Physical AI data infrastructure is part of a wider trend of seasoned AI entrepreneurs transitioning to a new space. Entrepreneurs with data skills are realizing that, even as the limits of language models become more apparent and embodied intelligence emerges as the next frontier, the infrastructure still required to train Physical AI is vast.
India as a Global Hub for Physical AI Datasets
One of the most interesting parts of Neocambrian AI’s thesis is its reasoning for why India is an ideal place to do this sort of work. Kukreja has highlighted three advantages.
First, the country has a massive and heterogeneous labour force that can undertake data collection tasks at scale. Second, its real-world settings are diverse, from dense urban areas to smaller towns and a wide range of domestic environments, which is useful for training robust robotics models. Third, India has experience coordinating large numbers of distributed human tasks in the field, which makes the logistics of large-scale field data collection easier than in markets with higher labour costs and less experience of such coordination.
These factors make it reasonable to believe that India may become a major provider of training data for robotics makers around the world, rather than just a market for their products.
Serious Competition in a Growing Field
Neocambrian AI is entering an area that is gaining traction and attracting attention from startups as well as from the large companies they target. Home services companies and robotics data collection companies have been quietly testing the idea for several months. A second Indian startup, Pronto, has been running tests on collecting data for Physical AI. A home services platform, Snabbit, has confirmed that it was approached by US-based Human Archive about similar data collection projects but ultimately decided not to proceed.
These developments point to a real tension in the Physical AI space. On one side, demand for a wide variety of real-world human action data is increasing rapidly. On the other, questions of privacy, consent and ethics around collecting large amounts of behavioural data remain unanswered. The question is how companies entering this space will address these concerns transparently as regulatory interest grows in India and elsewhere.
One indication that Neocambrian AI has considered factors beyond revenue is its decision to release data to Indian researchers free of charge. But consent frameworks and data governance for the workers it deploys in its data factory will also matter as the industry evolves.
What This Means for India’s AI Ecosystem
The launch of Neocambrian AI adds a new component to India’s AI landscape. So far, a large portion of Indian AI initiatives have focused on software services, LLM applications, and AI-powered products built on global foundation models. Physical AI data infrastructure is a different kind of play: it concerns the training layer of next-generation robotics systems, which sits closer to hardware than software.
If Neocambrian AI can deliver high-quality, reliable datasets for robotics training at Indian scale, it could attract partnerships with global organizations, such as robotics developers, research institutes, and AI labs, that are looking for precisely this type of data. That would be a differentiated contribution from the Indian AI ecosystem to the global robotics revolution.
The timing is deliberate. The need for varied, meaningful training data will increase as humanoid robots move out of the lab and into commercial products. Entering the data factory space early, before the market consolidates and matures, is a calculated risk, and one that could pay significant dividends.
Frequently Asked Questions
What is Neocambrian AI?
Neocambrian AI is an Indian startup led by Abhinav Kukreja that generates human action datasets for robotics and embodied AI training on a large scale. It says it has built India’s first dedicated robotics data factory, which captures structured physical behaviour data at pre-training scale using egocentric video capture, motion tracking, stereo rigs and UMI devices.
What is Physical AI, and why does it need new data?
Physical AI refers to AI systems that interact with the real world: robots, autonomous machines and embodied agents. Unlike text-based models trained on internet-scale text corpora, Physical AI systems require large amounts of recorded human physical activity to learn how to manipulate objects, interact with the environment and perform real-world tasks. This type of training data is not yet available at the scale needed, and that is the gap Neocambrian AI aims to fill.
Who is Abhinav Kukreja?
Abhinav Kukreja, the CEO of Neocambrian AI, previously founded DataVantage, a firm that supplies AI-driven marketing workflow automation for medium to large businesses. His experience with structured data systems and enterprise AI directly informs Neocambrian AI’s approach to structured data creation.
Will Neocambrian AI share its data for free?
Yes. The company has stated that it will release thousands of hours of robotics training data for free to Indian researchers who are developing vision-language-action (VLA) models and world models. This makes Neocambrian AI both a commercial data provider and a participant in the open AI research community in India.
Why is India well suited to Physical AI data collection?
India’s three strengths are a large and cost-effective workforce for data collection at scale, diverse real-world environments that add variety to the datasets, and experience in coordinating distributed services. Together, these contribute to India’s ability to become a global source of Physical AI training data for robotics developers.
