Los administradores de TransicionEstructural no se responsabilizan de las opiniones vertidas por los usuarios del foro. Cada usuario asume la responsabilidad de los comentarios publicados.
0 Usuarios y 1 Visitante están viendo este tema.
Hugging Face Clones OpenAI's Deep Research In 24 HoursPosted by BeauHD on Thursday February 06, 2025 @04:08PM from the that-was-quick dept.An anonymous reader quotes a report from Ars Technica:CitarOn Tuesday, Hugging Face researchers released an open source AI research agent called "Open Deep Research," created by an in-house team as a challenge 24 hours after the launch of OpenAI's Deep Research feature, which can autonomously browse the web and create research reports. The project seeks to match Deep Research's performance while making the technology freely available to developers. "While powerful LLMs are now freely available in open-source, OpenAI didn't disclose much about the agentic framework underlying Deep Research," writes Hugging Face on its announcement page. "So we decided to embark on a 24-hour mission to reproduce their results and open-source the needed framework along the way!"Similar to both OpenAI's Deep Research and Google's implementation of its own "Deep Research" using Gemini (first introduced in December -- before OpenAI), Hugging Face's solution adds an "agent" framework to an existing AI model to allow it to perform multi-step tasks, such as collecting information and building the report as it goes along that it presents to the user at the end. The open source clone is already racking up comparable benchmark results. After only a day's work, Hugging Face's Open Deep Research has reached 55.15 percent accuracy on the General AI Assistants (GAIA) benchmark, which tests an AI model's ability to gather and synthesize information from multiple sources. OpenAI's Deep Research scored 67.36 percent accuracy on the same benchmark with a single-pass response (OpenAI's score went up to 72.57 percent when 64 responses were combined using a consensus mechanism).As Hugging Face points out in its post, GAIA includes complex multi-step questions such as this one: "Which of the fruits shown in the 2008 painting 'Embroidery from Uzbekistan' were served as part of the October 1949 breakfast menu for the ocean liner that was later used as a floating prop for the film 'The Last Voyage'? Give the items as a comma-separated list, ordering them in clockwise order based on their arrangement in the painting starting from the 12 o'clock position. Use the plural form of each fruit." To correctly answer that type of question, the AI agent must seek out multiple disparate sources and assemble them into a coherent answer. Many of the questions in GAIA represent no easy task, even for a human, so they test agentic AI's mettle quite well.Open Deep Research "builds on OpenAI's large language models (such as GPT-4o) or simulated reasoning models (such as o1 and o3-mini) through an API," notes Ars. "But it can also be adapted to open-weights AI models. The novel part here is the agentic structure that holds it all together and allows an AI language model to autonomously complete a research task."The code has been made public on GitHub.
On Tuesday, Hugging Face researchers released an open source AI research agent called "Open Deep Research," created by an in-house team as a challenge 24 hours after the launch of OpenAI's Deep Research feature, which can autonomously browse the web and create research reports. The project seeks to match Deep Research's performance while making the technology freely available to developers. "While powerful LLMs are now freely available in open-source, OpenAI didn't disclose much about the agentic framework underlying Deep Research," writes Hugging Face on its announcement page. "So we decided to embark on a 24-hour mission to reproduce their results and open-source the needed framework along the way!"Similar to both OpenAI's Deep Research and Google's implementation of its own "Deep Research" using Gemini (first introduced in December -- before OpenAI), Hugging Face's solution adds an "agent" framework to an existing AI model to allow it to perform multi-step tasks, such as collecting information and building the report as it goes along that it presents to the user at the end. The open source clone is already racking up comparable benchmark results. After only a day's work, Hugging Face's Open Deep Research has reached 55.15 percent accuracy on the General AI Assistants (GAIA) benchmark, which tests an AI model's ability to gather and synthesize information from multiple sources. OpenAI's Deep Research scored 67.36 percent accuracy on the same benchmark with a single-pass response (OpenAI's score went up to 72.57 percent when 64 responses were combined using a consensus mechanism).As Hugging Face points out in its post, GAIA includes complex multi-step questions such as this one: "Which of the fruits shown in the 2008 painting 'Embroidery from Uzbekistan' were served as part of the October 1949 breakfast menu for the ocean liner that was later used as a floating prop for the film 'The Last Voyage'? Give the items as a comma-separated list, ordering them in clockwise order based on their arrangement in the painting starting from the 12 o'clock position. Use the plural form of each fruit." To correctly answer that type of question, the AI agent must seek out multiple disparate sources and assemble them into a coherent answer. Many of the questions in GAIA represent no easy task, even for a human, so they test agentic AI's mettle quite well.
OpenAI Researchers Find That Even the Best AI Is "Unable To Solve the Majority" of Coding ProblemsOpenAI researchers have admitted that even the most advanced AI models still are no match for human coders — even though CEO Sam Altman insists they will be able to beat "low-level" software engineers by the end of this year.In a new paper, the company's researchers found that even frontier models, or the most advanced and boundary-pushing AI systems, "are still unable to solve the majority" of coding tasks.The researchers used a newly-developed benchmark called SWE-Lancer, built on more than 1,400 software engineering tasks from the freelancer site Upwork. Using the benchmark, OpenAI put three large language models (LLMs) — its own o1 reasoning model and flagship GPT-4o, as well as Anthropic's Claude 3.5 Sonnet — to the test.Specifically, the new benchmark evaluated how well the LLMs performed with two types of tasks from Upwork: individual tasks, which involved resolving bugs and implementing fixes to them, or management tasks that saw the models trying to zoom out and make higher-level decisions. (The models weren't allowed to access the internet, meaning they couldn't just crib similar answers that'd been posted online.)The models took on tasks cumulatively worth hundreds of thousands of dollars on Upwork, but they were only able to fix surface-level software issues, while remaining unable to actually find bugs in larger projects or find their root causes. These shoddy and half-baked "solutions" are likely familiar to anyone who's worked with AI — which is great at spitting out confident-sounding information that often falls apart on closer inspection.Though all three LLMs were often able to operate "far faster than a human would," the paper notes, they also failed to grasp how widespread bugs were or to understand their context, "leading to solutions that are incorrect or insufficiently comprehensive."As the researchers explained, Claude 3.5 Sonnet performed better than the two OpenAI models pitted against it and made more money than o1 and GPT-4o. Still, the majority of its answers were wrong, and according to the researchers, any model would need "higher reliability" to be trusted with real-life coding tasks.Put more plainly, the paper seems to demonstrate that although these frontier models can work quickly and solve zoomed-in tasks, they're are nowhere near as skilled at handling them as human engineers.Though these LLMs have advanced rapidly over the past few years and will likely continue to do so, they're not skilled enough at software engineering to replace real-life people quite yet — not that that's stopping CEOs from firing their human coders in favor of immature AI models.
OpenAI Plots Charging $20,000 a Month For PhD-Level AgentsPosted by msmash on Wednesday March 05, 2025 @12:00PM from the up-next dept.OpenAI is preparing to launch a tiered pricing structure for its AI agent products, with high-end research assistants potentially costing $20,000 per month, [alternative source] according to The Information. The AI startup, which already generates approximately $4 billion in annualized revenue from ChatGPT, plans three service levels: $2,000 monthly agents for "high-income knowledge workers," $10,000 monthly agents for software development, and $20,000 monthly PhD-level research agents. OpenAI has told some investors that agent products could eventually constitute 20-25% of company revenue, the report added.
Google DeepMind Is Hiring a 'Post-AGI' Research ScientistPosted by msmash on Tuesday April 15, 2025 @02:02PM from the how-about-that dept.An anonymous reader shares a report:CitarNone of the frontier AI research labs have presented any evidence that they are on the brink of achieving artificial general intelligence, no matter how they define that goal, but Google is already planning for a "Post-AGI" world by hiring a scientist for its DeepMind AI lab to research the "profound impact" that technology will have on society."Spearhead research projects exploring the influence of AGI on domains such as economics, law, health/wellbeing, AGI to ASI [artificial superintelligence], machine consciousness, and education," Google says in the first item on a list of key responsibilities for the job. Artificial superintelligence refers to a hypothetical form of AI that is smarter than the smartest human in all domains. This is self explanatory, but just to be clear, when Google refers to "machine consciousness" it's referring to the science fiction idea of a sentient machine.OpenAI CEO Sam Altman, DeepMind CEO Demis Hassabis, Elon Musk, and other major and minor players in the AI industry are all working on AGI and have previously talked about the likelihood of humanity achieving AGI, when that might happen, and what the consequences might be, but the Google job listing shows that companies are now taking concrete steps for what comes after, or are at least are continuing to signal that they believe it can be achieved.
None of the frontier AI research labs have presented any evidence that they are on the brink of achieving artificial general intelligence, no matter how they define that goal, but Google is already planning for a "Post-AGI" world by hiring a scientist for its DeepMind AI lab to research the "profound impact" that technology will have on society."Spearhead research projects exploring the influence of AGI on domains such as economics, law, health/wellbeing, AGI to ASI [artificial superintelligence], machine consciousness, and education," Google says in the first item on a list of key responsibilities for the job. Artificial superintelligence refers to a hypothetical form of AI that is smarter than the smartest human in all domains. This is self explanatory, but just to be clear, when Google refers to "machine consciousness" it's referring to the science fiction idea of a sentient machine.OpenAI CEO Sam Altman, DeepMind CEO Demis Hassabis, Elon Musk, and other major and minor players in the AI industry are all working on AGI and have previously talked about the likelihood of humanity achieving AGI, when that might happen, and what the consequences might be, but the Google job listing shows that companies are now taking concrete steps for what comes after, or are at least are continuing to signal that they believe it can be achieved.
CitarGoogle DeepMind Is Hiring a 'Post-AGI' Research ScientistPosted by msmash on Tuesday April 15, 2025 @02:02PM from the how-about-that dept.An anonymous reader shares a report:CitarNone of the frontier AI research labs have presented any evidence that they are on the brink of achieving artificial general intelligence, no matter how they define that goal, but Google is already planning for a "Post-AGI" world by hiring a scientist for its DeepMind AI lab to research the "profound impact" that technology will have on society."Spearhead research projects exploring the influence of AGI on domains such as economics, law, health/wellbeing, AGI to ASI [artificial superintelligence], machine consciousness, and education," Google says in the first item on a list of key responsibilities for the job. Artificial superintelligence refers to a hypothetical form of AI that is smarter than the smartest human in all domains. This is self explanatory, but just to be clear, when Google refers to "machine consciousness" it's referring to the science fiction idea of a sentient machine.OpenAI CEO Sam Altman, DeepMind CEO Demis Hassabis, Elon Musk, and other major and minor players in the AI industry are all working on AGI and have previously talked about the likelihood of humanity achieving AGI, when that might happen, and what the consequences might be, but the Google job listing shows that companies are now taking concrete steps for what comes after, or are at least are continuing to signal that they believe it can be achieved.Saludos.
Gemini AI Solves Coding Problem That Stumped 139 Human Teams At ICPC World FinalsPosted by BeauHD on Wednesday September 17, 2025 @04:02PM from the artificial-brain-power dept.An anonymous reader quotes a report from Ars Technica:CitarLike the rest of its Big Tech cadre, Google has spent lavishly on developing generative AI models. Google's AI can clean up your text messages and summarize the web, but the company is constantly looking to prove that its generative AI has true intelligence. The International Collegiate Programming Contest (ICPC) helps make the point. Google says Gemini 2.5 participated in the 2025 ICPC World Finals, turning in a gold medal performance. According to Google this marks "a significant step on our path toward artificial general intelligence."Every year, thousands of college-level coders participate in the ICPC event, facing a dozen deviously complex coding and algorithmic puzzles over five grueling hours. This is the largest and longest-running competition of its type. To compete in the ICPC, Google connected Gemini 2.5 Deep Think to a remote online environment approved by the ICPC. The human competitors were given a head start of 10 minutes before Gemini began "thinking."According to Google, it did not create a freshly trained model for the ICPC like it did for the similar International Mathematical Olympiad (IMO) earlier this year. The Gemini 2.5 AI that participated in the ICPC is the same general model that we see in other Gemini applications. However, it was "enhanced" to churn through thinking tokens for the five-hour duration of the competition in search of solutions. At the end of the time limit, Gemini managed to get correct answers for 10 of the 12 problems, which earned it a gold medal. Only four of 139 human teams managed the same feat. "The ICPC has always been about setting the highest standards in problem-solving," said ICPC director Bill Poucher. "Gemini successfully joining this arena, and achieving gold-level results, marks a key moment in defining the AI tools and academic standards needed for the next generation."Gemini's solutions are available on GitHub.
Like the rest of its Big Tech cadre, Google has spent lavishly on developing generative AI models. Google's AI can clean up your text messages and summarize the web, but the company is constantly looking to prove that its generative AI has true intelligence. The International Collegiate Programming Contest (ICPC) helps make the point. Google says Gemini 2.5 participated in the 2025 ICPC World Finals, turning in a gold medal performance. According to Google this marks "a significant step on our path toward artificial general intelligence."Every year, thousands of college-level coders participate in the ICPC event, facing a dozen deviously complex coding and algorithmic puzzles over five grueling hours. This is the largest and longest-running competition of its type. To compete in the ICPC, Google connected Gemini 2.5 Deep Think to a remote online environment approved by the ICPC. The human competitors were given a head start of 10 minutes before Gemini began "thinking."According to Google, it did not create a freshly trained model for the ICPC like it did for the similar International Mathematical Olympiad (IMO) earlier this year. The Gemini 2.5 AI that participated in the ICPC is the same general model that we see in other Gemini applications. However, it was "enhanced" to churn through thinking tokens for the five-hour duration of the competition in search of solutions. At the end of the time limit, Gemini managed to get correct answers for 10 of the 12 problems, which earned it a gold medal. Only four of 139 human teams managed the same feat. "The ICPC has always been about setting the highest standards in problem-solving," said ICPC director Bill Poucher. "Gemini successfully joining this arena, and achieving gold-level results, marks a key moment in defining the AI tools and academic standards needed for the next generation."