
Building a Semantic Search Engine and Open-Status Classifier over the ResearchMath-14k Dataset

by Josh
June 5, 2026
in AI, Analytics and Automation

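from sentence_transformers import util

# `model`, `emb`, `work`, and `TEXT_COL` are not defined in this excerpt; the code
# expects them to exist already (the row order of `emb` must match `work`).
def search(query, k=5):
    # Embed the query, then rank every row of `emb` by cosine similarity.
    q = model.encode([query], normalize_embeddings=True)
    sims = util.cos_sim(q, emb)[0].cpu().numpy()
    idx = sims.argsort()[::-1][:k]  # indices of the k highest similarities
    print(f'\n=== Query: "{query}" ===')
    for rank, i in enumerate(idx, 1):
        row = work.iloc[i]
        print(f"\n[{rank}] sim={sims[i]:.3f} | {row['taxonomy_level_1']} "
              f"| status={row['open_status']}")
        # Show a one-line preview of the problem text.
        print("   ", row[TEXT_COL][:260].replace("\n", " "), "...")

search("rational points on hyperelliptic curves")
search("multiplicativity of maximal output p-norm of a quantum channel")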
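
The search step above ranks rows by cosine similarity. As a minimal, self-contained sketch of that ranking step alone, the snippet below uses small hand-made vectors in place of real sentence embeddings and a hypothetical helper named `top_k_cosine` (not part of the tutorial or of any library). It assumes every vector is non-zero, and it illustrates that once vectors are scaled to unit length, cosine similarity is just a dot product.

```python
import numpy as np

def top_k_cosine(query_vec, matrix, k=5):
    """Return (indices, scores) of the k rows of `matrix` most similar to `query_vec`.

    Hypothetical helper for illustration; assumes all vectors are non-zero.
    """
    q = np.asarray(query_vec, dtype=np.float64)
    m = np.asarray(matrix, dtype=np.float64)
    # Scale to unit length so that a dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    sims = m @ q
    k = min(k, len(sims))
    # Sort by descending similarity and keep the first k indices.
    idx = np.argsort(-sims, kind="stable")[:k]
    return idx, sims[idx]

# Toy 2-D "embeddings" (made-up data, not from ResearchMath-14k).
toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
idx, scores = top_k_cosine([1.0, 0.1], toy, k=2)
print(idx, scores)
```

Because the tutorial's `search` already requests `normalize_embeddings=True` for the query, the same dot-product shortcut would apply there if `emb` was also computed with normalization; that is an assumption, since the embedding step is not shown in this excerpt.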
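
# Train a logistic-regression classifier on the embeddings to predict open_status.
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, ConfusionMatrixDisplay

y = work["open_status"].values
# Stratified 75/25 split keeps class proportions equal in both halves.
# RANDOM_STATE is not defined in this excerpt; the code expects it to exist already.
Xtr, Xte, ytr, yte = train_test_split(
    emb, y, test_size=0.25, random_state=RANDOM_STATE, stratify=y)

# class_weight="balanced" reweights classes inversely to their frequency.
clf = LogisticRegression(max_iter=2000, class_weight="balanced", C=2.0)
clf.fit(Xtr, ytr)
pred = clf.predict(Xte)

print("\n=== open_status classifier (embeddings + logistic regression) ===")
print(classification_report(yte, pred))

# Row-normalized confusion matrix: each row (true class) sums to 1.
fig, ax = plt.subplots(figsize=(7, 6))
ConfusionMatrixDisplay.from_predictions(
    yte, pred, ax=ax, cmap="Blues", xticks_rotation=45,
    normalize="true", values_format=".2f")
ax.set_title("open_status confusion matrix (row-normalized)")
plt.tight_layout()
plt.show()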
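
The plot above passes `normalize="true"`, which divides each row of the confusion matrix by the number of true examples in that class, so the diagonal reads as per-class recall. The following sketch reproduces that arithmetic by hand with NumPy. The helper `row_normalized_confusion` and the labels `"open"` and `"resolved"` are hypothetical, chosen only for illustration; the actual `open_status` values are not shown in this excerpt.

```python
import numpy as np

def row_normalized_confusion(y_true, y_pred, labels):
    """Count (true, predicted) pairs, then divide each row by its total.

    Hypothetical helper for illustration only.
    """
    index = {label: n for n, label in enumerate(labels)}
    counts = np.zeros((len(labels), len(labels)), dtype=np.float64)
    for t, p in zip(y_true, y_pred):
        counts[index[t], index[p]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no true examples stay at zero instead of dividing by zero.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

# Made-up labels and predictions, not results from the tutorial's classifier.
labels = ["open", "resolved"]
y_true = ["open", "open", "open", "resolved"]
y_pred = ["open", "resolved", "open", "resolved"]
cm = row_normalized_confusion(y_true, y_pred, labels)
print(cm)
```

In this toy case two of the three true `"open"` items are predicted correctly, so the top-left cell is 2/3. Reading the matrix row by row like this keeps a rare class from being visually drowned out by a frequent one.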
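
# Find the most similar pair of distinct problems in the corpus.
import numpy as np

# Full pairwise similarity matrix (N x N); memory grows quadratically with N.
sims = util.cos_sim(emb, emb).cpu().numpy()
# Mask self-similarity with -inf so a diagonal entry can never win the argmax.
np.fill_diagonal(sims, -np.inf)
i, j = np.unravel_index(sims.argmax(), sims.shape)
print(f"\nMost similar pair (cos={sims[i, j]:.3f}):")
for n in (i, j):
    print(f"\n  paper_id={work.iloc[n]['paper_id']} | "
          f"{work.iloc[n]['taxonomy_level_1']}")
    print("   ", work.iloc[n][TEXT_COL][:240].replace("\n", " "), "...")

print("\nDone. Set SAMPLE_SIZE=None at the top to run on the full 14.1k rows.")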
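
On the full 14.1k rows, a dense pairwise similarity matrix holds about 14,100 × 14,100 float32 values, roughly 0.8 GB. If that is too much for the available memory, one option is to scan the matrix in row chunks and keep only the best pair seen so far. The sketch below is one way to do that with plain NumPy. It is not the tutorial's method, and the helper `most_similar_pair` and its `chunk_size` parameter are hypothetical names introduced here. It assumes all rows are non-zero vectors.

```python
import numpy as np

def most_similar_pair(emb, chunk_size=1024):
    """Return (i, j, cosine) for the most similar pair of distinct rows.

    Hypothetical helper: processes `chunk_size` rows at a time so only a
    (chunk_size x n) block of similarities is held in memory at once.
    """
    x = np.asarray(emb, dtype=np.float32)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-length rows
    best = (-1, -1, -np.inf)
    for start in range(0, len(x), chunk_size):
        block = x[start:start + chunk_size] @ x.T  # cosine similarities
        rows = np.arange(block.shape[0])
        block[rows, start + rows] = -np.inf  # mask each row's self-similarity
        r, c = np.unravel_index(block.argmax(), block.shape)
        if block[r, c] > best[2]:
            best = (start + int(r), int(c), float(block[r, c]))
    return best

# Toy vectors (made-up data): rows 0 and 2 point in nearly the same direction.
toy = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
i, j, cos = most_similar_pair(toy, chunk_size=2)
print(i, j, round(cos, 3))
```

The pair may come back in either order, since similarity is symmetric. A smaller `chunk_size` lowers peak memory at the cost of more loop iterations.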


