{"id":3613,"date":"2025-09-20T21:47:31","date_gmt":"2025-09-20T21:47:31","guid":{"rendered":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/?p=3613"},"modified":"2025-09-20T21:47:57","modified_gmt":"2025-09-20T21:47:57","slug":"openbench-ar","status":"publish","type":"post","link":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/?p=3613","title":{"rendered":"OpenBench-AR"},"content":{"rendered":"\n<figure class=\"wp-block-embed is-type-wp-embed is-provider-spectrcyde wp-block-embed-spectrcyde\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"dRDblIZTVs\"><a href=\"https:\/\/172-234-197-23.ip.linodeusercontent.com\/?page_id=3610\">OpenBench-AR<\/a><\/blockquote><iframe class=\"wp-embedded-content\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; visibility: hidden;\" title=\"&#8220;OpenBench-AR&#8221; &#8212; Spectrcyde\" src=\"https:\/\/172-234-197-23.ip.linodeusercontent.com\/?page_id=3610&#038;embed=true#?secret=qYPwqvNmrk#?secret=dRDblIZTVs\" data-secret=\"dRDblIZTVs\" width=\"600\" height=\"338\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">OpenBench-AR: Reproducible Benchmarks for RF-to-AR Systems<\/h3>\n\n\n\n<p><strong>Overall Assessment<\/strong>: This paper (aimed at a workshop like MLSys Artifact Evaluation or ReproNLP) proposes OpenBench-AR, an open-source suite for standardizing evaluations in RF-to-AR systems (e.g., Wi-Fi\/UWB sensing integrated with AR overlays). It addresses a pressing issue in ML\/HCI reproducibility by bundling traces, metrics schemas, and auto-generation scripts for figures\/tables. The work is timely, given the reproducibility crises in ML [1,2], and fills a gap in RF-AR research, which often lacks traceable workflows. 
Strengths include practicality (one-command repro) and alignment with artifact tracks.<\/p>\n\n\n\n<p><strong>Novelty and Significance<\/strong>: RF-AR systems (e.g., for triage\/threat detection) are emerging, but fragmented evaluations hinder progress\u2014e.g., custom datasets without provenance [1]. OpenBench-AR&#8217;s end-to-end repro (traces + metrics + LaTeX gen) is a useful contribution, enabling &#8220;one-command&#8221; figure reproduction across hardware. It captures &#8220;dynamic interactions&#8221; [2] via versioned traces and manifests, promoting traceability. Significance is high for niche communities (e.g., wearable AR in obscured envs), potentially standardizing benchmarks like latency\/power in RF pipelines. However, it&#8217;s not groundbreaking; similar suites exist (e.g., Open RL Benchmark, NCBench for genomics). No exact matches found in searches for this title or related RF-AR benchmarks, suggesting originality, but cite more (e.g., MLPerf for ML repro). As part of a potential series (tying to author&#8217;s prior works on Triage-AR, RF Biomarker Sensing, Network-Degraded Ops, Glass UX\u2014sharing themes like RF fusion, AR overlays), it could unify evaluations.<\/p>\n\n\n\n<p><strong>Technical Quality<\/strong>: The design is robust: traces include raw RF data (CSV at 200Hz), labels, system logs (CPU\/power), and env params\u2014comprehensive for repro. JSON schema for metrics (latency breakdown, FPS, power, user perf) ensures structure, with manifests linking to commits\/hardware for provenance. Scripts for PGFPlots\/CSV output are innovative for LaTeX workflows, reducing manual errors (e.g., <code>make figures<\/code>). Client simulator for traces is practical for controlled repro. Limitations: Traces are simulator-based\u2014lacks real hardware diversity (e.g., phone vs. glasses); metrics focus on system-level, not end-user (e.g., SA scores). No details on dataset size\/diversity (e.g., scenarios like occlusion\/motion). 
The artifact is &#8220;ready for evaluation tracks&#8221;\u2014commendable, but a GitHub link or size statistics should be provided. It ties well to RF-AR pipelines (e.g., from the prior works), but the trace format should be formalized (e.g., with schema examples).<\/p>\n\n\n\n<p><strong>Evaluation<\/strong>: The demonstration (reproducing latency\/FPS\/power on prior pipelines) is promising but incomplete in the provided text\u2014page 2 likely details results. It shows reproduction across hardware, a key claim. The metrics are appropriate (e.g., energy per alert, miss rate). Weaknesses: no quantitative results here (e.g., reproduction time savings, error rates); presumably page 2 includes figures\/tables. There is no user study or comparison to ad-hoc reproduction. For artifact evaluation, ensure a Dockerized setup for easy runs.<\/p>\n\n\n\n<p><strong>Clarity and Presentation<\/strong>: Concise and clear, with logical sections: problem, design (traces, metrics, generation). Examples (script commands) aid usability. The README guidance is a plus. Minor issues: references are sparse (only 2)\u2014expand to 5+ (e.g., on RF datasets). No typos were found. As an anonymous submission, it is clean for blind review.<\/p>\n\n\n\n<p><strong>Ethical Considerations<\/strong>: Promotes open science, reducing barriers\u2014positive. No evident risks (e.g., no privacy-sensitive data in the traces).<\/p>\n\n\n\n<p><strong>Relation to Screenshots\/Series<\/strong>: This meta-paper complements the series: it standardizes evaluations for Triage-AR (AR triage), RF Biomarker Sensing (phone RF fusion for K9 replacement), Network-Degraded Ops (DTN for AR resilience), and Glass UX (multi-target AR UI). If by the same authors (e.g., Benjamin J. Gilbert in non-anonymous versions), it unifies their pipeline\u2014note this as related work.<\/p>\n\n\n\n<p><strong>Recommendations for Revision<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Major<\/strong>: Include sample results\/figures from page 2; add real vs. 
sim trace comparison; expand the references.<\/li>\n\n\n\n<li><strong>Minor<\/strong>: Detail dataset statistics (e.g., hours of traces); provide an artifact URL; formalize a JSON schema snippet.<\/li>\n\n\n\n<li><strong>Publication Fit<\/strong>: Ideal for SysML or CHI workshops on reproducibility. Searches (Sep 20, 2025) show no prior publication\u2014submit for an artifact badge.<\/li>\n<\/ul>\n\n\n\n<p>A solid tool for the community; with the full content, its impact would be stronger.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenBench-AR: Reproducible Benchmarks for RF-to-AR Systems Overall Assessment: This paper (aimed at a workshop like MLSys Artifact Evaluation or ReproNLP) proposes OpenBench-AR, an open-source suite for standardizing evaluations in RF-to-AR systems (e.g., Wi-Fi\/UWB sensing integrated with AR overlays). It addresses a pressing issue in ML\/HCI reproducibility by bundling traces, metrics schemas, and auto-generation scripts for figures\/tables. The&hellip;&nbsp;<a href=\"https:\/\/172-234-197-23.ip.linodeusercontent.com\/?p=3613\" rel=\"bookmark\"><span 
class=\"screen-reader-text\">OpenBench-AR<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":2926,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"neve_meta_sidebar":"","neve_meta_container":"","neve_meta_enable_content_width":"","neve_meta_content_width":0,"neve_meta_title_alignment":"","neve_meta_author_avatar":"","neve_post_elements_order":"","neve_meta_disable_header":"","neve_meta_disable_footer":"","neve_meta_disable_title":"","footnotes":""},"categories":[10],"tags":[],"class_list":["post-3613","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-signal_scythe"],"_links":{"self":[{"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=\/wp\/v2\/posts\/3613","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3613"}],"version-history":[{"count":1,"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=\/wp\/v2\/posts\/3613\/revisions"}],"predecessor-version":[{"id":3614,"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=\/wp\/v2\/posts\/3613\/revisions\/3614"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=\/wp\/v2\/media\/2926"}],"wp:attachment":[{"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3613"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/172
-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3613"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/172-234-197-23.ip.linodeusercontent.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3613"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}